Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure
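This paper presents what it describes as the industry's first cloud-based embodied intelligence training platform built on a thousand-GPU cluster and the LeRobot framework. By restructuring the data pipeline, optimizing model training algorithms (for example, FlashAttention and FP8 quantization), and building elastic infrastructure, it speeds up GR00T-N1.5 model training by 40x and establishes an end-to-end evaluation loop, laying key technical groundwork for the next generation of autonomous intelligent robots.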
Chen Zhou, Haoran Sun, Hedan Yang, Jing Long, Junwu Xiong, Luqiao Wang, Mingxi Luo, Qiming Yang, Shuai Di, Song Wang, Tianyun Zhao, Wanting Xu, Wen Huang, Xiaodong Bai, Xiaomeng Tian, Xiaolong Xiang, Yicheng Gong, Yongjian Guo, Yucheng Guo, Yunxuan Ma, Yu Wei, Zhong Guan, Zhen Sun
Fri, 13 Ma
cs.AI