FAST: An Efficient Scheduler for All-to-All GPU Communication

FAST is an efficient scheduler designed to overcome the scalability and performance limitations of existing solutions for All-to-All(v) communication in dynamic Mixture-of-Experts workloads by addressing traffic skew and incast congestion while drastically reducing synthesis time on modern GPU clusters.

Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi | 2026-03-09 | cs

DVD-Quant: Data-free Video Diffusion Transformers Quantization

This paper introduces DVD-Quant, a novel data-free post-training quantization framework for Video Diffusion Transformers that utilizes Bounded-init Grid Refinement, Auto-scaling Rotated Quantization, and $\delta$-Guided Bit Switching to achieve a $2\times$ speedup and enable W4A4 quantization without compromising visual fidelity.

Zhiteng Li, Hanxuan Li, Junyi Wu, Kai Liu, Haotong Qin, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang | 2026-03-09 | cs

Instance Data Condensation for Image Super-Resolution

This paper introduces Instance Data Condensation (IDC), a novel framework utilizing Random Local Fourier Feature Extraction and Multi-level Feature Distribution Matching to synthesize a highly compact (10% volume) dataset for Image Super-Resolution that achieves performance comparable to the original full dataset while significantly reducing computational and storage requirements.

Tianhao Peng, Ho Man Kwan, Yuxuan Jiang, Ge Gao, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull | 2026-03-09 | cs

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$

This paper introduces "Linear Layouts," a novel framework that models tensor layouts as linear algebra operations over $\mathbb{F}_2$ to enable generic, efficient, and bug-free layout definitions and conversions for deep learning workloads, successfully integrating with the Triton compiler to overcome the limitations of existing case-by-case approaches.

Keren Zhou, Mario Lezcano, Adam Goucher, Akhmed Rakhmati, Jeff Niu, Justin Lebar, Pawel Szczerbuk, Peter Bell, Phil Tillet, Thomas Raoux, Zahi Moudallal | 2026-03-09 | cs
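
To make the Linear Layouts summary concrete, here is a minimal sketch of what "a layout as a linear map over $\mathbb{F}_2$" can mean. It is an illustration under stated assumptions, not the paper's or Triton's implementation: the function name `apply_layout` and the 3-bit `bases` table are hypothetical. An index is treated as a vector of bits, each input bit has an image (a column of the matrix), and addition over $\mathbb{F}_2$ is XOR.

```python
def apply_layout(bases, index):
    """Apply a linear map over F2 to an integer index.

    bases[i] is the image of input bit i (one matrix column, packed
    into an int). The result is the XOR of the columns selected by
    the set bits of index, since addition over F2 is XOR.
    """
    if index < 0 or index >> len(bases):
        raise ValueError("index uses more bits than the layout defines")
    out = 0
    for bit, column in enumerate(bases):
        if (index >> bit) & 1:
            out ^= column
    return out


# Hypothetical 3-bit layout: bit 1 of the input also toggles bit 0 of
# the output, which gives a simple XOR swizzle.
bases = [0b001, 0b011, 0b100]

mapped = [apply_layout(bases, i) for i in range(8)]
```

Because the map is linear, `apply_layout(bases, a ^ b)` equals `apply_layout(bases, a) ^ apply_layout(bases, b)` for any valid `a` and `b`, and composing or inverting layouts reduces to matrix operations over $\mathbb{F}_2$.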

ROS-related Robotic Systems Development with V-model-based Application of MeROS Metamodel

This paper proposes a structured methodology that integrates the Robot Operating System (ROS) with Model-Based Systems Engineering (MBSE) through a specialized SysML metamodel called MeROS and an adapted V-model, aiming to enhance the semantic coherence, structural traceability, and reliable coordination of complex heterogeneous robotic systems.

Tomasz Winiarski, Jan Kaniuka, Daniel Giełdowski, Jakub Ostrysz, Krystian Radlak, Dmytro Kushnir | 2026-03-09 | cs

TrinityDNA: A Bio-Inspired Foundational Model for Efficient Long-Sequence DNA Modeling

TrinityDNA is a novel, bio-inspired foundational model that integrates structural feature capture, symmetry handling, multi-scale attention, and evolutionary training to efficiently model long DNA sequences, significantly advancing gene function prediction and regulatory discovery while introducing a new long-sequence CDS annotation benchmark.

Qirong Yang, Yucheng Guo, Zicheng Liu, Yujie Yang, Qijin Yin, Siyuan Li, Shaomin Ji, Linlin Chao, Xiaoming Zhang, Stan Z. Li | 2026-03-09 | cs

Bridging Simulation and Usability: A User-Friendly Framework for Scenario Generation in CARLA

This paper introduces an interactive, no-code framework with a graphical interface and graph-based representation to democratize scenario generation for autonomous driving validation in CARLA, enabling non-technical users to efficiently create, manage, and execute diverse test scenarios without programming expertise.

Ahmed Abouelazm, Mohammad Mahmoud, Conrad Walter, Oleksandr Shchetsura, Erne Hussong, Helen Gremmelmaier, J. Marius Zöllner | 2026-03-09 | cs