IGLU: The Integrated Gaussian Linear Unit Activation Function
This paper introduces IGLU, a novel parametric activation function derived as a scale mixture of GELU gates, which yields a Cauchy CDF gate with heavy-tailed gradient properties and robustness against vanishing gradients. We also present IGLU-Approx, a computationally efficient rational approximation. Across vision and language tasks, both variants achieve competitive or superior performance compared to standard baselines such as ReLU and GELU.
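The abstract does not state the closed form, so the following is a minimal sketch under stated assumptions: GELU gates $x \cdot \Phi(x/\sigma)$, when scale-mixed so that the Gaussian gate becomes a Cauchy CDF, give $\mathrm{IGLU}(x) = x \cdot \left(\tfrac{1}{2} + \tfrac{1}{\pi}\arctan(x/\gamma)\right)$ with a learnable scale $\gamma$. The class name `IGLU`, the parameter `gamma`, and the rational gate in `iglu_approx` are all assumptions for illustration, not the paper's definitions.

```python
import math
import torch


class IGLU(torch.nn.Module):
    """Sketch of IGLU under the assumed form x * F_gamma(x),
    where F_gamma is the CDF of a Cauchy(0, gamma) distribution
    (the gate obtained by scale-mixing Gaussian GELU gates)."""

    def __init__(self, gamma: float = 1.0):
        super().__init__()
        # Learnable scale: the "parametric" part of the activation (assumed).
        self.gamma = torch.nn.Parameter(torch.tensor(gamma))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cauchy CDF gate: F(x) = 1/2 + arctan(x / gamma) / pi.
        gate = 0.5 + torch.atan(x / self.gamma) / math.pi
        return x * gate


def iglu_approx(x: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Hypothetical rational stand-in for the arctan gate, illustrating the
    *idea* of IGLU-Approx; the paper's actual approximation may differ."""
    z = x / gamma
    # Simple rational sigmoid-like gate: 1/2 + z / (2 * (1 + |z|)),
    # which matches the Cauchy gate at z = 0 and in both tails.
    gate = 0.5 + z / (2.0 * (1.0 + z.abs()))
    return x * gate


if __name__ == "__main__":
    x = torch.randn(4)
    print(IGLU()(x))        # exact (arctan) gate
    print(iglu_approx(x))   # cheap rational gate
```

Avoiding `arctan` in the approximate variant replaces a transcendental call with one division and one absolute value per element, which is the usual motivation for a rational approximation of a smooth gate.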