How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection

This study evaluates denoising strategies for multilingual sentence difficulty detection with BERT. It finds that pre-trained models are inherently robust to noise, but that explicit noise filtering techniques such as Gaussian Mixture Models significantly boost performance on smaller datasets and help produce cleaner corpora, while gains on larger datasets are marginal.

Nouran Khallaf, Serge Sharoff | 2026-03-10 | cs.CL

Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios

This paper addresses the challenge of domain-specific machine translation quality estimation in low-resource scenarios by demonstrating that while prompt-only methods are fragile for open-weight models, adapting intermediate Transformer layers via Low-Rank Adaptation (ALOPE) and Low-Rank Multiplicative Adaptation (LoRMA) significantly improves robustness and performance across English-to-Indic language pairs.

Namrata Patil Gurav, Akashdeep Ranu, Archchana Sindhujan, Diptesh Kanojia | 2026-03-10 | cs.LG

SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions

This Systematization of Knowledge (SoK) paper establishes the first unified framework for Agentic Retrieval-Augmented Generation (RAG) by formalizing autonomous loops as decision-making processes, proposing a comprehensive taxonomy and architectural decomposition, critiquing current evaluation limitations and systemic risks, and outlining critical research directions for building reliable and scalable agentic systems.

Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire | 2026-03-10 | cs.CL

Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

This paper introduces OAKS, a benchmark designed to evaluate large language models' ability to adapt to continuously evolving knowledge streams, revealing that current state-of-the-art models and agentic memory systems struggle with accurate state-tracking and are highly susceptible to distraction in dynamic environments.

Jiyeon Kim, Hyunji Lee, Dylan Zhou, Sue Hyun Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Sungmin Cha, Minjoon Seo | 2026-03-10 | cs.CL

AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions

This paper introduces AQuA, a fine-grained dataset that categorizes ambiguous visual questions into four levels with corresponding optimal response strategies, demonstrating that fine-tuning Vision-Language Models on this dataset enables them to effectively recognize ambiguity and adaptively generate context-appropriate responses such as seeking clarification or listing alternatives, thereby outperforming existing baselines.

Jihyoung Jang, Hyounghun Kim | 2026-03-10 | cs.CL

Generalization in Online Reinforcement Learning for Mobile Agents

This paper addresses the underexplored challenge of generalization in online reinforcement learning for mobile GUI agents by introducing the AndroidWorld-Generalization benchmark and a scalable GRPO-based training system, demonstrating that while RL significantly improves zero-shot performance on unseen task instances, generalization to new templates and applications remains difficult and benefits from test-time few-shot adaptation.

Li Gu, Zihuan Jiang, Zhixiang Chi, Huan Liu, Ziqiang Wang, Yuanhao Yu, Glen Berseth, Yang Wang | 2026-03-10 | cs.LG

Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

This paper introduces Dial, a knowledge-grounded framework that addresses the challenges of generating executable SQL across heterogeneous database systems by employing dialect-aware logical planning, a hierarchical intent-aware knowledge base, and an execution-driven debugging loop, achieving significant improvements in translation accuracy and dialect feature coverage on the newly constructed DS-NL2SQL benchmark.

Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu | 2026-03-10 | cs.LG

Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs

This paper shows that diffusion language models, unlike autoregressive models, develop hierarchical internal representations with early-layer redundancy, which enables a training-free layer-skipping inference method that significantly reduces computational cost while maintaining high performance.

Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli | 2026-03-10 | cs.CL

Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech

This paper introduces Bolbosh, the first open-source neural Text-to-Speech system for Kashmiri, which utilizes a script-aware, supervised cross-lingual adaptation strategy based on Optimal Transport Conditional Flow Matching and a three-stage acoustic enhancement pipeline to overcome the limitations of zero-shot multilingual baselines and achieve significantly higher speech quality and intelligibility.

Tajamul Ashraf, Burhaan Rasheed Zargar, Saeed Abdul Muizz, Ifrah Mushtaq, Nazima Mehdi, Iqra Altaf Gillani, Aadil Amin Kak, Janibul Bashir | 2026-03-10 | cs.CL

TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning

TableMind++ extends the existing TableMind framework for tool-augmented table reasoning with uncertainty-aware inference that mitigates hallucinations through memory-guided plan pruning, confidence-based action refinement, and dual-weighted trajectory aggregation, achieving superior performance on diverse benchmarks.

Mingyue Cheng, Shuo Yu, Chuang Jiang, Xiaoyu Tao, Qingyang Mao, Jie Ouyang, Qi Liu, Enhong Chen | 2026-03-10 | cs.CL

MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

This paper introduces MAWARITH, a large-scale Arabic dataset, and MIR-E, an evaluation metric, to benchmark and improve large language models' ability to perform complex, multi-step reasoning in Islamic inheritance law. It finds that while advanced models like Gemini-2.5-flash achieve high performance, many others struggle with critical legal rules and error propagation.

Abdessalam Bouchekif, Shahd Gaben, Samer Rashwani, Somaya Eltanbouly, Mutaz Al-Khatib, Heba Sbahi, Mohammed Ghaly, Emad Mohamed | 2026-03-10 | cs.CL