In-Context Reinforcement Learning for Tool Use in Large Language Models

This paper proposes In-Context Reinforcement Learning (ICRL), a novel framework that eliminates the need for supervised fine-tuning by leveraging few-shot prompting during reinforcement learning rollouts to progressively teach large language models how to effectively use external tools, ultimately achieving state-of-the-art performance in a data-efficient, zero-shot manner.

Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh | 2026-03-10 | 💻 cs

DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

This paper introduces DSH-Bench, a comprehensive benchmark featuring a hierarchical subject taxonomy, granular difficulty and scenario classification, and a novel Subject Identity Consistency Score (SICS) metric to systematically evaluate and diagnose subject-driven text-to-image generation models.

Zhenyu Hu, Qing Wang, Te Cao, Luo Liao, Longfei Lu, Liqun Liu, Shuang Li, Hang Chen, Mengge Xue, Yuan Chen, Chao Deng, Peng Shu, Huan Yu, Jie Jiang | 2026-03-10 | 💻 cs

DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning

This paper introduces the Dual-Consensus Weak-to-Strong (DC-W2S) framework, which enhances the reliability of Process Reward Models in biological reasoning by strategically filtering noisy weak supervision signals through self- and neighborhood-consensus metrics to enable robust training without exhaustive expert annotation.

Chi-Min Chan, Ehsan Hajiramezanali, Xiner Li, Edward De Brouwer, Carl Edwards, Wei Xue, Sirui Han, Yike Guo, Gabriele Scalia | 2026-03-10 | 🤖 cs.LG

UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking

This paper identifies the critical limitation of current LLM-based agents in accessing unindexed information, introduces the first dedicated UIS-QA benchmark to quantify this challenge, and proposes UIS-Digger, a multi-agent framework that significantly outperforms state-of-the-art models by effectively combining dual-mode browsing and file parsing to retrieve vital unindexed data.

Chang Liu, Chuqiao Kuang, Tianyi Zhuang, Yuxin Cheng, Huichi Zhou, Xiaoguang Li, Lifeng Shang | 2026-03-10 | 💻 cs

SaiVLA-0: Cerebrum-Pons-Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action

SaiVLA-0 introduces a neuroscience-inspired, compute-aware Vision-Language-Action framework featuring a tripartite Cerebrum-Pons-Cerebellum architecture that decouples high-level semantics from real-time control to achieve modular scalability, active foveated vision, and significant improvements in training efficiency and task success rates.

Xiang Shi, Wenlong Huang, Menglin Zou, Xinhai Sun | 2026-03-10 | 🤖 cs.LG

An explainable hybrid deep learning-enabled intelligent fault detection and diagnosis approach for automotive software systems validation

This paper proposes a novel explainable hybrid deep learning framework that combines 1D-CNN and GRU architectures with interpretability techniques such as Integrated Gradients (IGs) and SHAP to enhance fault detection, diagnosis, and root cause analysis in automotive software system validation while overcoming the limitations of traditional black-box models.

Mohammad Abboush, Ehab Ghannoum, Andreas Rausch | 2026-03-10 | 💻 cs

Evidence-Driven Reasoning for Industrial Maintenance Using Heterogeneous Data

This paper introduces the Condition Insight Agent, a deployed decision-support framework that integrates heterogeneous industrial data sources through constrained, rule-verified LLM reasoning to generate evidence-grounded maintenance explanations and actionable advice while ensuring reliability and human oversight.

Fearghal O'Donncha, Nianjun Zhou, Natalia Martinez, James T Rayfield, Fenno F. Heath III, Abigail Langbridge, Roman Vaculin | 2026-03-10 | 💻 cs

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

This paper reveals that hidden states in end-to-end full-duplex speech models like SALM-Duplex and Moshi significantly leak speaker identity, and proposes two streaming anonymization methods using Stream-Voice-Anon that effectively mitigate this privacy risk while maintaining low-latency dialogue performance.

Nikita Kuzmin, Tao Zhong, Jiajun Deng, Yingke Zhu, Tristan Tsoi, Tianxiang Cao, Simon Lui, Kong Aik Lee, Eng Siong Chng | 2026-03-10 | 💻 cs

TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

This paper introduces TildeOpen LLM, a 30-billion-parameter open-weight model that achieves superior performance across 34 European languages, particularly for low-resource groups, by employing curriculum learning and dataset upsampling to address data imbalances without requiring increased computational resources.

Toms Bergmanis, Martins Kronis, Ingus Jānis Pretkalninš, Dāvis Nicmanis, Jelizaveta Jelinska, Roberts Rozis, Rinalds Vīksna, Mārcis Pinnis | 2026-03-10 | 💬 cs.CL

MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data

This paper proposes MM-TS, a novel framework for multi-modal contrastive learning that dynamically adjusts temperature and margin schedules based on local data distribution to address long-tail imbalances, unifying InfoNCE and max-margin objectives to achieve state-of-the-art performance across multiple image- and video-language datasets.

Siarhei Sheludzko, Dhimitrios Duka, Bernt Schiele, Hilde Kuehne, Anna Kukleva | 2026-03-10 | 💻 cs

Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules

This paper critiques the reliance of current tabular foundation model benchmarks on point-estimate metrics like MSE, advocating instead for the adoption of proper scoring rules such as CRPS to evaluate probabilistic forecasts and the use of finetuning or promptable strategies to align model inductive biases with distributional regression goals.

Jonas Landsgesell, Pascal Knoll | 2026-03-10 | 🤖 cs.LG
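
To make the contrast between point-estimate metrics and proper scoring rules concrete, here is a minimal sketch of a sample-based CRPS estimate, using the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. It is an illustration only and is not taken from the paper; the function name `crps_from_samples` and the toy numbers are assumptions.

```python
def crps_from_samples(samples, y):
    """Estimate CRPS for one observation y from forecast samples.

    Uses CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with both expectations
    replaced by averages over the supplied samples (hypothetical helper,
    not the paper's code). Lower is better.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast sample")
    # Average distance between forecast samples and the observation.
    term_obs = sum(abs(x - y) for x in samples) / n
    # Average pairwise distance between forecast samples (forecast spread).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


# A point forecast reduces CRPS to the absolute error.
point_score = crps_from_samples([3.0], y=1.0)

# A two-sample ensemble that brackets the observation.
ensemble_score = crps_from_samples([0.0, 2.0], y=1.0)
```

In this toy case the ensemble's mean is exactly the observation, so its squared error would be zero, yet its CRPS is still positive because the score also accounts for how the forecast mass is spread around the outcome.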

Alignment-Aware and Reliability-Gated Multimodal Fusion for Unmanned Aerial Vehicle Detection Across Heterogeneous Thermal-Visual Sensors

This paper proposes two novel fusion strategies, Registration-aware Guided Image Fusion (RGIF) and Reliability-Gated Modality-Attention Fusion (RGMAF), which effectively integrate heterogeneous thermal and visual sensor data to significantly enhance unmanned aerial vehicle detection performance across diverse perspectives and resolutions.

Ishrat Jahan, Molla E Majid, M Murugappan, Muhammad E. H. Chowdhury, N. B. Prakash, Saad Bin Abul Kashem, Balamurugan Balusamy, Amith Khandakar | 2026-03-10 | 💻 cs