Posts

Showing posts from April 26, 2019

What is GELU activation?

I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as $$\mathrm{GELU}(x) = xP(X \le x) = x\Phi(x),$$ which is approximated by $$0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,\left(x + 0.044715x^3\right)\right]\right).$$ Could you simplify the equation and explain how it has been approximated?

Tags: activation-function, bert, mathematics

Asked Apr 18 at 8:06 by thanatoz

...
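For anyone wanting to check the two forms numerically, here is a minimal sketch. It assumes X is a standard normal variable, so that Φ(x) = 0.5(1 + erf(x/√2)), and compares that exact form against the tanh expression from the question. The function names `gelu_exact` and `gelu_tanh` are made up for illustration and are not taken from the BERT code.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF (assumed)."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def gelu_tanh(x: float) -> float:
    """The tanh form quoted in the question."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both values and their absolute difference at a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        exact = gelu_exact(x)
        approx = gelu_tanh(x)
        print(f"x={x:+.1f}  exact={exact:+.6f}  tanh={approx:+.6f}  "
              f"diff={abs(exact - approx):.2e}")
```

On the range checked here the two forms stay within about 1e-3 of each other, which suggests the tanh term is acting as a cheap stand-in for the erf inside Φ(x) rather than being a different function.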