cs.CL papers | Gist.Science

To Predict or Not to Predict? Towards reliable uncertainty estimation in the presence of noise

This study evaluates uncertainty estimation methods for multilingual text classification under noisy conditions, finding that while softmax-based approaches struggle in low-resource or domain-shift scenarios, Monte Carlo dropout offers robust calibration and significantly improves performance by enabling abstention on uncertain predictions.

Nouran Khallaf, Serge Sharoff | 2026-03-10 | cs.CL

How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty Detection

This study evaluates various denoising strategies for multilingual sentence difficulty detection using BERT, finding that while pre-trained models possess inherent noise robustness, explicit noise filtering techniques like Gaussian Mixture Models significantly boost performance on smaller datasets and help produce cleaner corpora, even if gains are marginal on larger datasets.

Nouran Khallaf, Serge Sharoff | 2026-03-10 | cs.CL

RILEC: Detection and Generation of L1 Russian Interference Errors in English Learner Texts

This paper introduces RILEC, a large-scale dataset and a generative framework for detecting and creating L1 Russian interference errors in English learner texts, demonstrating that models fine-tuned on this augmented data significantly improve the identification of specific error types like transliteration and tense misuse.

Darya Kharlamova, Irina Proskurina | 2026-03-10 | cs.CL

Position: LLMs Must Use Functor-Based and RAG-Driven Bias Mitigation for Fairness

This position paper proposes a dual-pronged framework for mitigating biases in large language models by integrating category-theoretic functor-based transformations to structurally map semantic domains to unbiased forms and retrieval-augmented generation to dynamically inject diverse external knowledge during inference.

Ravi Ranjan, Utkarsh Grover, Agorista Polyzou | 2026-03-10 | cs.CL

Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios

This paper addresses the challenge of domain-specific machine translation quality estimation in low-resource scenarios by demonstrating that while prompt-only methods are fragile for open-weight models, adapting intermediate Transformer layers with ALOPE, which relies on Low-Rank Adaptation, and with Low-Rank Multiplicative Adaptation (LoRMA) significantly improves robustness and performance across English-to-Indic language pairs.

Namrata Patil Gurav, Akashdeep Ranu, Archchana Sindhujan, Diptesh Kanojia | 2026-03-10 | cs.LG

SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions

This Systematization of Knowledge (SoK) paper establishes the first unified framework for Agentic Retrieval-Augmented Generation (RAG) by formalizing autonomous loops as decision-making processes, proposing a comprehensive taxonomy and architectural decomposition, critiquing current evaluation limitations and systemic risks, and outlining critical research directions for building reliable and scalable agentic systems.

Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire | 2026-03-10 | cs.CL

Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

This paper introduces OAKS, a benchmark designed to evaluate large language models' ability to adapt to continuously evolving knowledge streams, revealing that current state-of-the-art models and agentic memory systems struggle with accurate state-tracking and are highly susceptible to distraction in dynamic environments.

Jiyeon Kim, Hyunji Lee, Dylan Zhou, Sue Hyun Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Sungmin Cha, Minjoon Seo | 2026-03-10 | cs.CL

AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions

This paper introduces AQuA, a fine-grained dataset that categorizes ambiguous visual questions into four levels with corresponding optimal response strategies, demonstrating that fine-tuning Vision-Language Models on this dataset enables them to effectively recognize ambiguity and adaptively generate context-appropriate responses such as seeking clarification or listing alternatives, thereby outperforming existing baselines.

Jihyoung Jang, Hyounghun Kim | 2026-03-10 | cs.CL

Generalization in Online Reinforcement Learning for Mobile Agents

This paper addresses the underexplored challenge of generalization in online reinforcement learning for mobile GUI agents by introducing the AndroidWorld-Generalization benchmark and a scalable GRPO-based training system, demonstrating that while RL significantly improves zero-shot performance on unseen task instances, generalization to new templates and applications remains difficult and benefits from test-time few-shot adaptation.

Li Gu, Zihuan Jiang, Zhixiang Chi, Huan Liu, Ziqiang Wang, Yuanhao Yu, Glen Berseth, Yang Wang | 2026-03-10 | cs.LG

Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning

The paper proposes PACT, a fine-tuning framework that preserves LLM safety alignment by selectively constraining the model's confidence on a small subset of safety-related tokens during training, thereby preventing alignment drift without compromising downstream task performance.

Guoli Wang, Haonan Shi, Tu Ouyang, An Wang | 2026-03-10 | cs.LG

Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

This paper introduces Dial, a knowledge-grounded framework that addresses the challenges of generating executable SQL across heterogeneous database systems by employing dialect-aware logical planning, a hierarchical intent-aware knowledge base, and an execution-driven debugging loop, achieving significant improvements in translation accuracy and dialect feature coverage on the newly constructed DS-NL2SQL benchmark.

Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu | 2026-03-10 | cs.LG

Image Generation Models: A Technical History

This paper provides a comprehensive technical survey of the history and evolution of image generation models, detailing the objectives, architectures, and limitations of various approaches from VAEs to diffusion methods, while also addressing recent advancements in video generation and the critical challenges of robustness and responsible deployment.

Rouzbeh Shirvani | 2026-03-10 | cs.CL

The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling

This paper introduces the Dual-Stream Transformer, an architecture that decomposes the residual stream into separate token and context streams with tunable mixing strategies to achieve a balance between high interpretability and minimal performance loss while demonstrating robustness to attention amplification.

J. Clayton Kerce, Alexis Fox | 2026-03-10 | cs.LG

Cross-Modal Taxonomic Generalization in (Vision-) Language Models

This paper demonstrates that vision-language models can recover and generalize taxonomic knowledge (hypernyms) from language representations even when deprived of explicit visual evidence during training, provided that the counterfactual image-label mappings maintain high visual coherence within categories.

Tianyang Xu, Marcelo Sandoval-Castaneda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra | 2026-03-10 | cs.CL

Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs

This paper shows that, unlike autoregressive models, diffusion language models develop distinct, hierarchical internal representations with redundant early layers, enabling a novel, training-free layer-skipping inference method that significantly reduces computational costs while maintaining high performance.

Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli | 2026-03-10 | cs.CL

A Joint Neural Baseline for Concept, Assertion, and Relation Extraction from Clinical Text

This paper proposes a novel end-to-end joint neural system that simultaneously optimizes concept recognition, assertion classification, and relation extraction for clinical text, significantly outperforming traditional pipeline baselines across all three tasks.

Fei Cheng, Ribeka Tanaka, Sadao Kurohashi | 2026-03-10 | cs.CL

Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech

This paper introduces Bolbosh, the first open-source neural Text-to-Speech system for Kashmiri, which utilizes a script-aware, supervised cross-lingual adaptation strategy based on Optimal Transport Conditional Flow Matching and a three-stage acoustic enhancement pipeline to overcome the limitations of zero-shot multilingual baselines and achieve significantly higher speech quality and intelligibility.

Tajamul Ashraf, Burhaan Rasheed Zargar, Saeed Abdul Muizz, Ifrah Mushtaq, Nazima Mehdi, Iqra Altaf Gillani, Aadil Amin Kak, Janibul Bashir | 2026-03-10 | cs.CL

TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning

TableMind++ enhances the existing TableMind framework for tool-augmented table reasoning by introducing an uncertainty-aware inference framework that mitigates hallucinations through memory-guided plan pruning, confidence-based action refinement, and dual-weighted trajectory aggregation, thereby achieving superior performance on diverse benchmarks.

Mingyue Cheng, Shuo Yu, Chuang Jiang, Xiaoyu Tao, Qingyang Mao, Jie Ouyang, Qi Liu, Enhong Chen | 2026-03-10 | cs.CL

Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data

The paper introduces "Accent Vector," a novel method that enables fine-grained, controllable accent manipulation in multilingual Text-to-Speech systems by deriving accent characteristics from native non-English speech, thereby eliminating the need for accented training data.

Thanathai Lertpetchpun, Thanapat Trachu, Jihwan Lee, Tiantian Feng, Dani Byrd, Shrikanth Narayanan | 2026-03-10 | cs.CL

MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

The paper introduces MAWARITH, a large-scale Arabic dataset and the MIR-E evaluation metric designed to benchmark and improve large language models' ability to perform complex, multi-step reasoning for Islamic inheritance law, revealing that while advanced models like Gemini-2.5-flash achieve high performance, many others struggle with critical legal rules and error propagation.

Abdessalam Bouchekif, Shahd Gaben, Samer Rashwani, Somaya Eltanbouly, Mutaz Al-Khatib, Heba Sbahi, Mohammed Ghaly, Emad Mohamed | 2026-03-10 | cs.CL
