Information Routing in Atomistic Foundation Models: How Task Alignment and Equivariance Shape Linear Disentanglement

This paper introduces Compositional Probe Decomposition (CPD) to demonstrate that linear disentanglement of geometric and compositional information in atomistic foundation models is primarily driven by task alignment rather than architecture, revealing a significant performance gradient where models trained on specific properties like HOMO-LUMO gaps outperform energy-trained models and exhibit symmetry-dependent information routing.

Joshua SteierTue, 10 Ma🤖 cs.LG

ARC-AGI-2 Technical Report

This paper presents a transformer-based system that significantly advances ARC performance by integrating a compact task encoding, symmetry-based data augmentation, test-time LoRA adaptation, and multi-perspective decoding to enable efficient neural inference and human-level generalization from few examples.

Wallyson Lemes de Oliveira, Mekhron Bobokhonov, Matteo Caorsi, Aldo Podestà, Gabriele Beltramo, Luca Crosato, Matteo Bonotto, Federica Cecchetto, Hadrien Espic, Dan Titus Salajan, Stefan Taga, Luca Pana, Joe CarthyTue, 10 Ma💬 cs.CL

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

This paper demonstrates that current LLM-as-a-Judge frameworks fail to reliably measure adversarial robustness due to unaccounted distribution shifts that degrade performance to near-random levels, often leading to inflated attack success rates, and proposes new benchmarks to address these evaluation flaws.

Leo Schwinn, Moritz Ladenburger, Tim Beyer, Mehrnaz Mofakhami, Gauthier Gidel, Stephan GünnemannTue, 10 Ma💬 cs.CL

Distributionally Robust Geometric Joint Chance-Constrained Optimization: Neurodynamic Approaches

This paper introduces a two-time scale neurodynamic duplex approach utilizing projection equations to solve distributionally robust geometric joint chance-constrained optimization problems with unknown distributions, demonstrating convergence to the global optimum through neural networks in applications such as shape optimization and telecommunications.

Ange Valli (L2S), Siham Tassouli (OPTIM), Abdel Lisser (L2S)Tue, 10 Ma🔢 math

Building the ethical AI framework of the future: from philosophy to practice

This paper proposes an ethics-by-design control architecture that operationalizes AI governance across the entire lifecycle by embedding philosophical reasoning into a triple-gate enforcement structure (Metric, Governance, and Eco) with measurable triggers and audit trails, thereby translating normative commitments into testable controls compatible with existing MLOps pipelines and major regulatory frameworks like the EU AI Act and NIST RMF.

Jasper Kyle CatapangTue, 10 Ma💻 cs

Scale Dependent Data Duplication

This paper demonstrates that data duplication is scale-dependent, revealing that as model capability and corpus size increase, semantically equivalent documents behave like exact duplicates by producing aligned gradients and causing accelerated semantic collisions, which leads to rapidly increasing training losses for larger models and necessitates new scaling laws to accurately predict performance.

Joshua Kazdan, Noam Levi, Rylan Schaeffer, Jessica Chudnovsky, Abhay Puri, Bo He, Mehmet Donmez, Sanmi Koyejo, David DonohoTue, 10 Ma🤖 cs.LG

Multi-Agent DRL for V2X Resource Allocation: Disentangling Challenges and Benchmarking Solutions

This paper addresses the lack of systematic evaluation in Multi-Agent Deep Reinforcement Learning for C-V2X resource allocation by introducing a disentangled benchmark suite of interference games and diverse datasets to isolate specific challenges, ultimately identifying policy robustness and generalization across vehicular topologies as the primary hurdle and demonstrating the superiority of actor-critic methods over value-based approaches.

Siyuan Wang, Lei Lei, Pranav Maheshwari, Sam Bellefeuille, Kan Zheng, Dusit NiyatoTue, 10 Ma🤖 cs.LG

Scaling Strategy, Not Compute: A Stand-Alone, Open-Source StarCraft II Benchmark for Accessible Reinforcement Learning Research

To address the complexity gap between StarCraft II's full game and its mini-games, this paper introduces the Two-Bridge Map Suite, an open-source, lightweight benchmark that isolates tactical navigation and combat skills to enable accessible reinforcement learning research under realistic compute budgets.

Sourav Panda, Shreyash Kale, Tanmay Ambadkar, Abhinav Verma, Jonathan DodgeTue, 10 Ma🤖 cs.LG

Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness

The paper demonstrates that unlike in domains with external verifiers, scaling inference compute through crowd wisdom strategies fails to improve LLM truthfulness in unverified settings because correlated model errors and the inability to distinguish social prediction from truth verification cause aggregation to reinforce shared misconceptions rather than identify correct answers.

Yegor Denisov-Blanch, Joshua Kazdan, Jessica Chudnovsky, Rylan Schaeffer, Sheng Guan, Soji Adeshina, Sanmi KoyejoTue, 10 Ma🤖 cs.LG

Annealed Co-Generation: Disentangling Variables via Progressive Pairwise Modeling

This paper proposes Annealed Co-Generation (ACG), a framework that replaces high-dimensional joint diffusion modeling with a low-dimensional, pairwise approach coupled through a three-stage annealing process to achieve efficient and consistent multivariate co-generation for scientific applications like flow-field completion and antibody generation.

Hantao Zhang, Jieke Wu, Mingda Xu, Xiao Hu, Yingxuan You, Pascal FuaTue, 10 Ma🤖 cs.LG

Evo: Autoregressive-Diffusion Large Language Models with Evolving Balance

The paper introduces Evo, a novel large language model that unifies autoregressive and diffusion-based generation within a continuous evolutionary latent framework, enabling adaptive balancing of planning and refinement to achieve state-of-the-art performance across diverse benchmarks while maintaining fast inference speeds.

Junde Wu, Minhao Hu, Jiayuan Zhu, Yuyuan Liu, Tianyi Zhang, Kang Li, Jingkun Chen, Jiazhen Pan, Min Xu, Yueming JinTue, 10 Ma🤖 cs.LG

Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks

This paper proposes a novel topology-aware framework that leverages domain-specific foundation models, a graph tokenizer for multiplex connectivity, and knowledge distillation to achieve robust zero-shot interaction prediction in multiplex biological networks, outperforming state-of-the-art methods.

Alana Deng, Sugitha Janarthanan, Yan Sun, Zihao Jing, Pingzhao HuTue, 10 Ma🤖 cs.LG