POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
The paper introduces POET-X, a memory-efficient and scalable variant of the POET framework. By optimizing the orthogonal equivalence transformations at POET's core, POET-X enables stable pretraining of billion-parameter large language models on a single GPU, overcoming the high memory and computational cost of the original implementation.
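
In POET-style training, a weight matrix W0 is kept fixed while two orthogonal factors R and Q are learned, so the effective weight R W0 Q retains W0's singular values. The NumPy sketch below illustrates this spectrum-preserving property of orthogonal equivalence transformations; it is a minimal illustration of the idea under that assumed parameterization, not POET-X's implementation, and the helper names are ours.

```python
import numpy as np

def random_orthogonal(n, rng):
    # Hypothetical helper: QR decomposition of a Gaussian matrix
    # yields an orthogonal factor (sign-fixed for uniformity).
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
m, n = 8, 6
W0 = rng.standard_normal((m, n))  # frozen base weight (assumption)
R = random_orthogonal(m, rng)     # left orthogonal factor (stand-in for a learned one)
Q = random_orthogonal(n, rng)     # right orthogonal factor (stand-in for a learned one)

W = R @ W0 @ Q                    # orthogonal equivalence transformation

# Orthogonal factors leave singular values unchanged, so W and W0
# share the same spectrum -- the property POET-style training exploits.
assert np.allclose(np.linalg.svd(W, compute_uv=False),
                   np.linalg.svd(W0, compute_uv=False))
```

Because only the orthogonal factors would be trained in such a scheme, the spectral structure of the base weights is preserved throughout optimization, which is the stability property the abstract alludes to.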