Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness

The paper demonstrates that, unlike in domains with external verifiers, scaling inference compute through crowd wisdom strategies fails to improve LLM truthfulness in unverified settings: correlated model errors, together with models' inability to distinguish predicting what others will answer from verifying what is true, cause aggregation to reinforce shared misconceptions rather than surface correct answers.

Yegor Denisov-Blanch, Joshua Kazdan, Jessica Chudnovsky, Rylan Schaeffer, Sheng Guan, Soji Adeshina, Sanmi Koyejo · 2026-03-10 · cs.LG
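
To make the failure mode concrete, here is a minimal toy simulation, not taken from the paper: the 60% per-model accuracy, the 15-model ensemble, and the all-or-nothing shared-misconception model are illustrative assumptions. It shows that majority voting lifts accuracy when errors are independent but is capped by the rate at which all models share the same wrong answer:

```python
# Toy simulation: majority voting under independent vs. correlated errors.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_models, p_correct = 10_000, 15, 0.6

def majority_accuracy(shared_error_rate: float) -> float:
    """Accuracy of a 15-model majority vote on binary questions.

    With probability `shared_error_rate` a question hits a shared
    misconception and every model gives the same wrong answer, so no
    amount of aggregation can recover it. Otherwise errors are
    independent and voting helps (the Condorcet regime).
    """
    correct = 0
    for _ in range(n_trials):
        if rng.random() < shared_error_rate:
            votes = np.zeros(n_models)                  # all wrong together
        else:
            votes = rng.random(n_models) < p_correct    # independent votes
        correct += votes.sum() > n_models / 2
    return correct / n_trials

for rho in (0.0, 0.2, 0.4):
    print(f"shared-misconception rate {rho:.1f}: "
          f"majority accuracy {majority_accuracy(rho):.3f}")
# Independent errors: voting lifts 0.6 per-model accuracy to roughly 0.79.
# Shared misconceptions cap the ensemble at (1 - rho) times that ceiling.
```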

Annealed Co-Generation: Disentangling Variables via Progressive Pairwise Modeling

This paper proposes Annealed Co-Generation (ACG), a framework that replaces high-dimensional joint diffusion modeling with a low-dimensional pairwise approach, coupled through a three-stage annealing process, to achieve efficient and consistent multivariate co-generation for scientific applications such as flow-field completion and antibody generation.

Hantao Zhang, Jieke Wu, Mingda Xu, Xiao Hu, Yingxuan You, Pascal Fua · 2026-03-10 · cs.LG
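
The summary gives only the shape of the method, so the following is a rough one-dimensional sketch of what "pairwise models coupled by a staged annealing schedule" could look like. The stage boundaries, the linear ramp, and the pairwise score function are all invented for illustration, not ACG's actual components:

```python
# Toy three-stage annealing schedule coupling pairwise updates.
import numpy as np

def coupling_weight(t: float) -> float:
    """Coupling strength over normalized time t in [0, 1].

    Stage 1 (t < 1/3): variables evolve independently (weight 0).
    Stage 2: pairwise coupling ramps up linearly.
    Stage 3 (t > 2/3): full pairwise coupling enforces consistency.
    """
    if t < 1 / 3:
        return 0.0
    if t < 2 / 3:
        return 3.0 * (t - 1 / 3)
    return 1.0

def pairwise_score(x_i: float, x_j: float) -> float:
    # Stand-in for a learned low-dimensional pairwise score model:
    # pull each variable toward its partner (a toy consistency term).
    return x_j - x_i

def acg_step(x: np.ndarray, t: float, step: float = 0.1) -> np.ndarray:
    """One annealed update over all variable pairs of a toy state."""
    w = coupling_weight(t)
    drift = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(len(x)):
            if i != j:
                drift[i] += w * pairwise_score(x[i], x[j])
    return x + step * drift / max(len(x) - 1, 1)

x = np.array([0.0, 1.0, 4.0])        # three coupled variables
for k in range(30):
    x = acg_step(x, t=k / 29)
print(x)  # variables converge toward consistency as coupling anneals on
```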

Evo: Autoregressive-Diffusion Large Language Models with Evolving Balance

The paper introduces Evo, a large language model that unifies autoregressive and diffusion-based generation within a continuous evolutionary latent framework. By adaptively balancing planning against refinement, Evo achieves state-of-the-art performance across diverse benchmarks while maintaining fast inference.

Junde Wu, Minhao Hu, Jiayuan Zhu, Yuyuan Liu, Tianyi Zhang, Kang Li, Jingkun Chen, Jiazhen Pan, Min Xu, Yueming Jin · 2026-03-10 · cs.LG
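
As a guess at the control flow the summary implies, here is a schematic block that blends an autoregressive proposal with a diffusion-style refinement pass through a learned gate. The module names, the sigmoid gate, and the fixed number of refinement steps are assumptions, not Evo's actual architecture:

```python
# Schematic AR/diffusion blending block with a learned balance gate.
import torch
import torch.nn as nn

class ToyEvoBlock(nn.Module):
    def __init__(self, d: int = 32, vocab: int = 100):
        super().__init__()
        self.ar_head = nn.Linear(d, vocab)   # plans the next token
        self.denoise = nn.Linear(d, d)       # refines the latent
        self.balance = nn.Linear(d, 1)       # evolving AR/diffusion gate

    def forward(self, h: torch.Tensor, refine_steps: int = 3):
        g = torch.sigmoid(self.balance(h))   # per-position balance in (0, 1)
        z = h
        for _ in range(refine_steps):        # diffusion-like refinement pass
            z = z + self.denoise(z)
        h_mix = g * h + (1 - g) * z          # blend planning and refinement
        return self.ar_head(h_mix)           # next-token logits

block = ToyEvoBlock()
h = torch.randn(2, 8, 32)                    # (batch, seq, dim)
print(block(h).shape)                        # torch.Size([2, 8, 100])
```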

Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks

This paper proposes a topology-aware framework that combines domain-specific foundation models, a graph tokenizer for multiplex connectivity, and knowledge distillation to achieve robust zero-shot interaction prediction in multiplex biological networks, outperforming state-of-the-art methods.

Alana Deng, Sugitha Janarthanan, Yan Sun, Zihao Jing, Pingzhao Hu · 2026-03-10 · cs.LG
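
The distillation step admits a compact sketch: a small student learns interaction scores from a frozen domain foundation model (the teacher) by blending soft teacher targets with hard labels. The temperature, weighting, and binary interact/no-interact framing below are assumptions, and the paper's graph tokenizer is abstracted into precomputed logits:

```python
# Standard soft-target knowledge distillation for interaction prediction.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend soft teacher targets with hard interaction labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2                       # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(16, 2, requires_grad=True)  # interact / not
teacher_logits = torch.randn(16, 2)                       # frozen teacher
labels = torch.randint(0, 2, (16,))
print(distillation_loss(student_logits, teacher_logits, labels))
```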

From ARIMA to Attention: Power Load Forecasting Using Temporal Deep Learning

This paper empirically demonstrates that a Transformer model using self-attention outperforms traditional ARIMA and recurrent approaches (LSTM, BiLSTM) in short-term power load forecasting on PJM data, achieving a MAPE of 3.8% and highlighting the effectiveness of attention-based architectures for capturing complex temporal patterns.

Suhasnadh Reddy Veluru, Sai Teja Erukude, Viswa Chaitanya Marella · 2026-03-10 · cs.LG
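
The headline metric is MAPE, which has a one-line definition; below it is a minimal Transformer-encoder forecaster of the general kind the paper compares. The hyperparameters, 24-hour window, and synthetic data are illustrative assumptions, not the paper's configuration:

```python
# MAPE metric plus a minimal Transformer-encoder load forecaster.
import torch
import torch.nn as nn

def mape(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    """Mean absolute percentage error, in percent."""
    return (100 * (y_true - y_pred).abs() / y_true.abs()).mean()

class LoadForecaster(nn.Module):
    def __init__(self, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)     # next-hour load

    def forward(self, x: torch.Tensor):       # x: (batch, window, 1)
        h = self.encoder(self.embed(x))       # self-attention over the window
        return self.head(h[:, -1])            # predict from the last step

model = LoadForecaster()
x = torch.rand(8, 24, 1) + 1.0                # 24-hour load windows
y_true = torch.rand(8, 1) + 1.0
print(mape(y_true, model(x)).item())          # untrained, so large
```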

HEARTS: Benchmarking LLM Reasoning on Health Time Series

The paper introduces HEARTS, a comprehensive benchmark of 16 real-world health datasets and 110 tasks spanning four reasoning capabilities. It reveals that current large language models significantly underperform specialized models on health time-series analysis, struggling with multi-step temporal reasoning and falling back on simple heuristics.

Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, Yuzhe Yang · 2026-03-10 · cs.LG
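
To make "multi-step temporal reasoning" concrete, here is a toy task in the spirit of a benchmark item (find when a heart-rate trend reverses), scored by exact match. The task format, the scoring rule, and the shallow baseline standing in for an LLM call are all assumptions; none of HEARTS's 110 tasks are reproduced here:

```python
# Toy temporal-reasoning task: locate the turning point of a trend.
import numpy as np

rng = np.random.default_rng(1)

def make_task():
    """Heart-rate series rises then falls; answer = turning-point day."""
    turn = int(rng.integers(5, 25))
    series = np.concatenate([
        60.0 + np.arange(turn),                 # upward trend
        60.0 + turn - np.arange(30 - turn),     # downward trend
    ]) + rng.normal(0, 0.3, 30)
    return series, turn

def shallow_heuristic(series: np.ndarray) -> int:
    # Placeholder for an LLM call; it ignores the series and always
    # guesses the middle day, the kind of shortcut the paper reports.
    return 15

tasks = [make_task() for _ in range(200)]
acc = np.mean([shallow_heuristic(s) == t for s, t in tasks])
print(f"shallow-heuristic accuracy: {acc:.2f}")  # near chance (~0.05)
```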

Trust Aware Federated Learning for Secure Bone Healing Stage Interpretation in e-Health

This paper proposes a trust-aware federated learning framework that uses an Adaptive Trust Score Scaling and Filtering mechanism to secure bone healing stage interpretation in e-Health, mitigating the impact of unreliable or adversarial participants while preserving model integrity and predictive performance.

Paul Shepherd, Tasos Dagiuklas, Bugra Alkan, Joaquim Bastos, Jonathan Rodriguez · 2026-03-10 · cs.LG
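
A hedged sketch of trust-weighted aggregation in the spirit of the paper's Adaptive Trust Score Scaling and Filtering: score each client update by its distance to a robust reference, filter low-trust outliers, and average the survivors by trust. The exponential scaling and the threshold rule below are assumptions, not the paper's definitions:

```python
# Trust-scored, outlier-filtered federated aggregation (toy version).
import numpy as np

def trust_aggregate(updates: np.ndarray, tau: float = 0.5) -> np.ndarray:
    """Aggregate client updates of shape (n_clients, n_params)."""
    center = np.median(updates, axis=0)              # robust reference point
    dist = np.linalg.norm(updates - center, axis=1)  # deviation per client
    trust = np.exp(-dist / (dist.mean() + 1e-8))     # adaptive trust scaling
    trust[trust < tau * trust.max()] = 0.0           # filter outliers
    return (trust[:, None] * updates).sum(0) / trust.sum()

honest = np.random.default_rng(2).normal(1.0, 0.1, (9, 4))
poisoned = np.full((1, 4), -10.0)                    # adversarial client
agg = trust_aggregate(np.vstack([honest, poisoned]))
print(agg)  # stays near the honest mean of ~1.0 despite the attacker
```

A plain mean over the same ten updates would be dragged to roughly -0.1 per coordinate, which is the failure the trust filter is meant to prevent.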