Evaluating Synthetic Data for Baggage Trolley Detection in Airport Logistics

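This paper proposes a synthetic data generation pipeline built on a high-fidelity digital twin of Algiers International Airport, constructed with NVIDIA Omniverse, to address the privacy and data-diversity challenges of baggage trolley detection. Experiments show that a hybrid training strategy incorporating a small amount of real annotated data substantially reduces annotation cost while reaching detection accuracy (mAP@50 of 0.94) that matches or exceeds the baseline trained on the full real dataset.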

Abdeldjalil Taibi, Mohmoud Badlis, Amina Bensalem, Belkacem Zouilekh, Mohammed Brahimi · 2026-03-10 · cs.LG

Compressed Proximal Federated Learning for Non-Convex Composite Optimization on Heterogeneous Data

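This paper proposes FedCEF, a new federated composite optimization algorithm. By decoupling the proximal update from communication and combining error feedback with control variates, it addresses the challenges that non-smooth regularization, data heterogeneity, and biased compression pose in non-convex composite optimization, achieving communication-efficient distributed training with robust convergence under extreme compression ratios.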

Pu Qiu, Chen Ouyang, Yongyang Xiong, Keyou You, Wanquan Liu, Yang Shi · 2026-03-10 · cs.LG

Scalable Training of Mixture-of-Experts Models with Megatron Core

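This paper presents the system-level co-optimization approach in Megatron Core for scalable training of Mixture-of-Experts (MoE) models. By integrating multiple innovations across memory, communication, and computation, it enables efficient, production-ready training of very large models such as DeepSeek-V3 and Qwen3 on NVIDIA GB300/GB200 clusters.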

Zijie Yan (NVIDIA), Hongxiao Bai (NVIDIA), Xin Yao (NVIDIA), Dennis Liu (NVIDIA), Tong Liu (NVIDIA), Hongbin Liu (NVIDIA), Pingtian Li (NVIDIA), Evan Wu (NVIDIA), Shiqing Fan (NVIDIA), Li Tao (NVIDIA), Robin Zhang (NVIDIA), Yuzhong Wang (NVIDIA), Shifang Xu (NVIDIA), Jack Chang (NVIDIA), Xuwen Chen (NVIDIA), Kunlun Li (NVIDIA), Yan Bai (NVIDIA), Gao Deng (NVIDIA), Nan Zheng (NVIDIA), Vijay Anand Korthikanti (NVIDIA), Abhinav Khattar (NVIDIA), Ethan He (NVIDIA), Soham Govande (NVIDIA), Sangkug Lym (NVIDIA), Zhongbo Zhu (NVIDIA), Qi Zhang (NVIDIA), Haochen Yuan (NVIDIA), Xiaowei Ren (NVIDIA), Deyu Fu (NVIDIA), Tailai Ma (NVIDIA), Shunkang Zhang (NVIDIA), Jiang Shao (NVIDIA), Ray Wang (NVIDIA), Santosh Bhavani (NVIDIA), Xipeng Li (NVIDIA), Chandler Zhou (NVIDIA), David Wu (NVIDIA), Yingcan Wei (NVIDIA), Ashwath Aithal (NVIDIA), Michael Andersch (NVIDIA), Mohammad Shoeybi (NVIDIA), Jiajie Yao (NVIDIA), June Yang (NVIDIA) · 2026-03-10 · cs.LG

Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization

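This paper proposes a primal-dual algorithm that combines neural critic estimation with natural policy gradient. Using neural tangent kernel theory, it proves global convergence and constraint-violation guarantees for infinite-horizon average-reward constrained Markov decision processes (CMDPs) under general policy parameterization with a multi-layer neural network critic.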

Anirudh Satheesh, Pankaj Kumar Barman, Washim Uddin Mondal, Vaneet Aggarwal · 2026-03-10 · cs.LG

Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models

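To address training bottlenecks in modern code generation models, this paper proposes MicroCoder-GRPO, an algorithm with three innovations including conditional truncation masking, and releases alongside it the more challenging MicroCoder-Dataset and the more efficient MicroCoder-Evaluator. Extensive experiments demonstrate significant performance gains on LiveCodeBench v6 and yield 34 key training insights.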

Zongqian Li, Shaohan Huang, Zewen Chi, Yixuan Su, Lexin Zhou, Li Dong, Nigel Collier, Furu Wei · 2026-03-10 · cs.LG

Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

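This paper proposes a four-stage data processing framework with automatic difficulty filtering, builds the MicroCoder dataset with an emphasis on novelty and difficulty, and shows through reinforcement learning that the dataset offers a significant advantage in improving code generation models' ability to solve hard problems.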

Zongqian Li, Tengchao Lv, Shaohan Huang, Yixuan Su, Qinzheng Sun, Qiufeng Yin, Ying Xin, Scarlett Li, Lei Cui, Nigel Collier, Furu Wei · 2026-03-10 · cs.LG

ProgAgent: A Continual RL Agent with Progress-Aware Rewards

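ProgAgent is a continual reinforcement learning agent that combines progress-aware reward learning with a JAX-native high-throughput architecture. It extracts dense rewards from unlabeled expert videos, introduces adversarial regularization to handle distribution shift, and integrates PPO with mechanisms such as coreset replay. Together these address catastrophic forgetting and the difficulty of reward specification in lifelong robot learning, and the agent significantly outperforms existing baselines on multiple benchmarks and real-robot tasks.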

Jinzhou Tan, Gabriel Adineera, Jinoh Kim · 2026-03-10 · cs.LG