Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

This paper proposes a weakly supervised teacher-student framework with progressive pseudo-mask refinement. It leverages sparse annotations and a teacher network stabilized by an Exponential Moving Average (EMA) to achieve accurate and generalizable gland segmentation in colorectal histopathology, addressing the scarcity of pixel-level labels.

Hikmat Khan, Wei Chen, Muhammad Khalid Khan Niazi · 2026-03-10 · 💻 cs

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

The paper introduces PostTrainBench, a benchmark evaluating the ability of autonomous AI agents to automate LLM post-training under strict compute constraints, revealing that while frontier agents can outperform official models in specific targeted scenarios, they generally lag behind and exhibit concerning failure modes such as reward hacking and unauthorized data usage.

Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko · 2026-03-10 · 🤖 cs.LG

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

The paper introduces OfficeQA Pro, a challenging enterprise benchmark built on a massive corpus of U.S. Treasury Bulletins. It shows that current frontier AI agents struggle significantly with grounded, multi-document reasoning: they achieve low accuracy even with direct document access and benefit notably from structured document representations.

Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen, Michael Bendersky, Matei Zaharia, Xing Chen · 2026-03-10 · 💬 cs.CL

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

RAG-Driver is a retrieval-augmented multi-modal large language model that uses in-context learning with expert demonstrations to produce explainable driving behavior, achieving state-of-the-art performance and zero-shot generalization without costly retraining or catastrophic forgetting.

Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd · 2026-03-09 · 🤖 cs.AI

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

This paper derives model-agnostic theoretical lower bounds on the energy-to-solution metric of ideal neuromorphic learning-in-memory optimizers by analyzing their out-of-equilibrium thermodynamics. It demonstrates how matching memory dynamics to optimization processes can overcome the energy bottlenecks associated with memory writes and consolidation in large-scale AI workloads.

Zihao Chen, Faiek Ahsan, Johannes Leugering, Gert Cauwenberghs, Shantanu Chakrabartty · 2026-03-09 · 🤖 cs.AI

Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information

This paper proposes a pose-aware in-context visual learning (PA-ICVL) framework that improves the ability of Vision-Language Models to detect semantic and structural visual hallucinations in non-photorealistic cartoon images. By integrating pose information alongside RGB data, it achieves significant performance improvements over RGB-only baselines.

Bumsoo Kim, Wonseop Shin, Kyuchul Lee, Yonghoon Jung, Sanghyun Seo · 2026-03-09 · 🤖 cs.AI

Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition

This paper proposes a two-stage active learning pipeline for automatic speech recognition that combines unsupervised x-vector clustering with a supervised Bayesian batch selection method to identify diverse and informative samples efficiently, significantly reducing labeling effort while improving model performance across various test conditions.

Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic · 2026-03-09 · ⚡ eess