CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases

This paper introduces CCR-Bench, a novel benchmark designed to rigorously evaluate large language models on complex, real-world industrial tasks involving entangled content-format requirements and intricate logical workflows, revealing significant performance gaps in even state-of-the-art models.

Xiaona Xue, Yiqiao Huang, Jiacheng Li, Yuanhang Zheng, Huiqi Miao, Yunfei Ma, Rui Liu, Xinbao Sun, Minglu Liu, Fanyu Meng, Chao Deng, Junlan Feng · 2026-03-10 · cs.CL

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

This paper introduces a particle filtering framework to rigorously analyze the accuracy-cost tradeoffs of parallel inference methods in large language models, establishing theoretical guarantees and identifying fundamental limits while demonstrating that sampling error alone does not fully predict final model accuracy.

Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, Akshay Krishnamurthy · 2026-03-10 · cs.LG

Designing probabilistic AI monsoon forecasts to inform agricultural decision-making

This paper presents a decision-theory framework and a blended AI-statistical forecasting system that successfully delivered skillful, tailored monsoon onset predictions to 38 million Indian farmers in 2025, enabling better agricultural decision-making under uncertainty.

Colin Aitken, Rajat Masiwal, Adam Marchakitus, Katherine Kowal, Mayank Gupta, Tyler Yang, Amir Jina, Pedram Hassanzadeh, William R. Boos, Michael Kremer · 2026-03-10 · cs.LG

EveryQuery: Zero-Shot Clinical Prediction via Task-Conditioned Pretraining over Electronic Health Records

EveryQuery is a novel electronic health record foundation model that achieves efficient, zero-shot clinical prediction by directly estimating outcome likelihoods through task-conditioned pre-training, thereby outperforming computationally expensive autoregressive baselines—particularly for rare events—while currently facing limitations in complex disjunctive reasoning tasks.

Payal Chandak, Gregory Kondas, Isaac Kohane, Matthew McDermott · 2026-03-10 · cs

Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases

This paper introduces Rel-MOSS, a relation-centric deep learning framework that addresses class imbalance in relational databases. It employs a relation-wise gating controller and a relation-guided minority synthesizer to improve the representation and over-sampling of minority entities, significantly outperforming existing methods on entity classification tasks.

Jun Yin, Peng Huo, Bangguo Zhu, Hao Yan, Senzhang Wang, Shirui Pan, Chengqi Zhang · 2026-03-10 · cs.LG

IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation

The paper proposes IMSE, a test-time adaptation method that fine-tunes only the singular values of Vision Transformer linear layers via a spectral mixture of experts and a diversity maximization loss to prevent feature collapse, achieving state-of-the-art performance with significantly fewer trainable parameters.

Sunghyun Baek (Korea Advanced Institute of Science and Technology), Jaemyung Yu (Korea Advanced Institute of Science and Technology), Seunghee Koh (Korea Advanced Institute of Science and Technology), Minsu Kim (LG Energy Solution), Hyeonseong Jeon (LG Energy Solution), Junmo Kim (Korea Advanced Institute of Science and Technology) · 2026-03-10 · cs

ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework

This paper introduces ELLMob, a self-aligned Large Language Model framework that leverages Fuzzy-Trace Theory to reconcile habitual patterns with event constraints, addressing the lack of event-annotated datasets and significantly improving the generation of human mobility trajectories during major societal events like typhoons, pandemics, and the Olympics.

Yusong Wang, Chuang Yang, Jiawei Wang, Xiaohang Xu, Jiayi Xu, Dongyuan Li, Chuan Xiao, Renhe Jiang · 2026-03-10 · cs.LG

Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs

This paper introduces EvoStage, an evolutionary paradigm that leverages large language models in a stagewise, multi-agent approach with real-time feedback to overcome the limitations of black-box modeling, generating algorithm designs that outperform both human experts and existing methods on complex industrial tasks such as chip placement and black-box optimization.

Chen Lu, Ke Xue, Chengrui Gao, Yunqi Shi, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou · 2026-03-10 · cs

Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning

This paper introduces HILA, a Human-In-the-Loop Multi-Agent Collaboration framework that employs Dual-Loop Policy Optimization to train agents with metacognitive policies for dynamically deferring to human experts and continuously improving their reasoning capabilities, thereby overcoming the static knowledge limitations of purely autonomous systems.

Wei Yang, Defu Cao, Jiacheng Pang, Muyan Weng, Yan Liu · 2026-03-10 · cs

VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

VORL-EXPLORE is a hybrid learning and planning framework for multi-robot exploration in dynamic environments that couples task allocation with motion execution via a shared navigability fidelity signal, enabling adaptive arbitration between global and reactive policies to prevent bottlenecks and ensure robust, collision-free coverage.

Ning Liu, Sen Shen, Zheng Li, Sheng Liu, Dongkun Han, Shangke Lyu, Thomas Braunl · 2026-03-10 · cs

OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

The paper introduces OSExpert, a computer-use agent that leverages a GUI-based depth-first search exploration algorithm to discover action primitives and self-construct a skill curriculum, thereby significantly improving performance and efficiency on complex tasks to approach human expert levels.

Jiateng Liu, Zhenhailong Wang, Rushi Wang, Bingxuan Li, Jeonghwan Kim, Aditi Tiwari, Pengfei Yu, Denghui Zhang, Heng Ji · 2026-03-10 · cs