| date | tags |
| --- | --- |
| 2022-04-23 | paper, deep-learning, activations, gelu |
Dan Hendrycks, Kevin Gimpel
arXiv Preprint
Year: 2016
This paper introduces GELUs, a new activation function for neural networks.
The activation weights its input by the CDF of the standard Gaussian distribution, as follows.
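$$\text{GELU}(x) = x\,\Phi(x) = x \cdot \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$

where $\Phi$ is the CDF of the standard Gaussian and $\operatorname{erf}$ is the error function.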
To avoid evaluating the error-function integral exactly, the paper also gives cheaper approximations, as follows.
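$$\text{GELU}(x) \approx 0.5\,x\left(1 + \tanh\!\left[\sqrt{2/\pi}\,\left(x + 0.044715\,x^{3}\right)\right]\right)$$

or, even more cheaply,

$$\text{GELU}(x) \approx x\,\sigma(1.702\,x)$$

with $\sigma$ the logistic sigmoid.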
The authors claim that this function is more robust to noise than ReLU and ELU. Additionally, the paper presents many experiments (image classification, POS tagging, speech recognition) showing that GELU outperforms ReLU and ELU, converging faster and to a better optimum.
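For concreteness, here is a minimal NumPy/SciPy sketch (not code from the paper; the function names are mine) of the exact GELU and the two approximations above, printing how far the approximations deviate from the exact form on a small grid.

```python
import numpy as np
from scipy.special import erf


def gelu_exact(x):
    # Exact GELU from the paper: x * Phi(x), with Phi the standard Gaussian CDF.
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))


def gelu_tanh(x):
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def gelu_sigmoid(x):
    # Sigmoid approximation: x * sigmoid(1.702 * x).
    return x / (1.0 + np.exp(-1.702 * x))


if __name__ == "__main__":
    x = np.linspace(-5.0, 5.0, 101)
    print("max |exact - tanh|   :", np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))
    print("max |exact - sigmoid|:", np.max(np.abs(gelu_exact(x) - gelu_sigmoid(x))))
```

In practice the tanh variant tracks the exact function closely, while the sigmoid variant trades some accuracy for speed, which is how the paper positions it.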