What is GELU activation?
I was going through the BERT paper, which uses the GELU (Gaussian Error Linear Unit) activation, defined as $$\mathrm{GELU}(x) = xP(X \le x) = x\Phi(x),$$ where $\Phi$ is the standard normal CDF. The paper approximates this as $$0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,(x + 0.044715x^3)\right]\right).$$ Could you simplify the equation and explain how this approximation is derived?
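For concreteness, the two forms can be compared numerically. Below is a minimal sketch (my own, not from the paper) that evaluates the exact definition via the standard normal CDF, written with `math.erf`, alongside the tanh approximation:

```python
import math

def gelu_exact(x):
    # GELU(x) = x * Phi(x); Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # The tanh approximation quoted in the BERT paper
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}: exact={gelu_exact(x):.6f}  tanh={gelu_tanh(x):.6f}")
```

For moderate inputs the two agree to within roughly $10^{-3}$, which is why the tanh form is used as a cheap drop-in replacement.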
activation-function bert mathematics
asked Apr 18 at 8:06
thanatoz