Polynomial, trigonometric, and tropical activations

This paper introduces a family of polynomial, trigonometric, and tropical activation functions derived from orthonormal bases. Combined with variance-preserving initialization, these activations allow deep models such as GPT-2 and ConvNeXt to be trained successfully while mitigating gradient instability. They also offer polynomial interpretability and facilitate fine-tuning through Hermite interpolation.

Ismail Khalfaoui-Hassani, Stefan Kesselheim · 2026-03-03 · cs.CL
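
To make the idea of an activation built on an orthonormal basis with variance-preserving initialization more concrete, here is a minimal sketch. It assumes the probabilists' Hermite polynomials, normalized by sqrt(k!), as the basis, and a standard Gaussian input; the function names and the equal-weight initialization are hypothetical illustrations and are not taken from the paper. Under these assumptions, the second moment of the output equals the sum of squared coefficients, so unit-norm coefficients keep it at 1.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def hermite_activation(x, coeffs):
    """Evaluate sum_k coeffs[k] * He_k(x) / sqrt(k!).

    He_k are the probabilists' Hermite polynomials; dividing by sqrt(k!)
    makes them orthonormal under the standard Gaussian measure.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    scale = np.array(
        [1.0 / math.sqrt(math.factorial(k)) for k in range(len(coeffs))]
    )
    return hermeval(x, coeffs * scale)


def unit_norm_coeffs(degree):
    """Hypothetical initialization: equal weights whose squares sum to 1."""
    return np.full(degree + 1, 1.0 / math.sqrt(degree + 1))


# Empirical check: for x ~ N(0, 1), E[f(x)^2] should be close to
# sum(coeffs**2), which is 1 for the initialization above.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
coeffs = unit_norm_coeffs(3)
y = hermite_activation(x, coeffs)
second_moment = float(np.mean(y ** 2))
print(second_moment)
```

The same orthonormality argument would apply to other bases paired with their matching input distribution; the Gaussian/Hermite pairing is chosen here only because it is simple to verify numerically.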