cs.CL papers | Gist.Science

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

The paper introduces DataChef-32B, a reinforcement learning-based system that automates the end-to-end generation of optimal data recipes for adapting large language models (LLMs) to specific tasks, achieving performance comparable to or exceeding that of human-curated pipelines and official checkpoints.

Yicheng Chen, Zerun Ma, Xinchen Xie, Yining Li, Kai Chen · 2026-03-09 · 🤖 cs.AI

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

This systematic literature review critiques the "ground truth" paradigm in machine learning as a positivistic fallacy that misinterprets human disagreement as noise, arguing instead for pluralistic annotation infrastructures that treat diverse subjective perspectives as high-fidelity signals essential for building culturally competent models.

Sheza Munir, Benjamin Mah, Krisha Kalsi, Shivani Kapania, Julian Posada, Edith Law, Ding Wang, Syed Ishtiaque Ahmed · 2026-03-09 · 🤖 cs.AI

IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR

This paper introduces IntelliAsk, a question-generation model trained with RLVR (reinforcement learning with verifiable rewards) using a novel reward model, IntelliReward, and the DAPO optimization algorithm. The model produces high-quality, evidence-based research questions that outperform those of human reviewers and strong baselines in expert evaluations, and the training also improves its broader reasoning and writing capabilities.

Karun Sharma, Vidushee Vats, Shengzhi Li, Yuxiang Wang, Zhongtian Sun, Prayag Tiwari · 2026-03-09 · 🤖 cs.AI

Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference

This paper proposes a revised annotation scheme for cross-document coreference resolution that treats coreference chains as discourse elements to better capture lexical diversity and framing variation in news media. Reannotating the NewsWCL50 and ECB+ datasets, the authors show that this approach enables more balanced, discourse-aware analysis.

Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Norman Meuschke, Bela Gipp · 2026-03-09 · 💬 cs.CL

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR → LLM Pipelines?

This paper challenges the assumption that Speech LLMs inherently outperform ASR → LLM pipelines. Using matched-backbone testing and mechanistic analysis, it shows that current Speech LLMs often function as expensive cascades that rely on text representations and can even underperform traditional pipelines under noisy conditions.

Jayadev Billa · 2026-03-09 · 🤖 cs.AI

Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?

This paper introduces novel "Text-to-Big SQL" evaluation metrics to address the limitations of existing benchmarks in assessing production-level LLM agents, demonstrating that traditional Text-to-SQL metrics fail to capture critical cost, latency, and efficiency implications that arise when scaling to large datasets.

Germán T. Eizaguirre, Lars Tissen, Marc Sánchez-Artigas · 2026-03-09 · 💬 cs.CL

Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

This paper reframes the modality collapse observed in multimodal LLMs as a mismatched decoding problem, demonstrating through information-theoretic analysis and empirical validation that the accessibility of non-text information is fundamentally limited by the decoder's training objective and scoring rule rather than the encoder's architecture or alignment.

Jayadev Billa · 2026-03-09 · 🤖 cs.AI

CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

The paper proposes CoME, a novel mobile agent architecture that employs four specialized experts, a progressive training strategy, and an InfoGain-Driven DPO method to achieve balanced, decoupled enhancement of hybrid reasoning capabilities, outperforming existing dense and MoE approaches on the AITZ and AMEX datasets.

Yuxuan Liu, Weikai Xu, Kun Huang, Changyu Chen, Jiankun Zhao, Pengzhi Gao, Wei Liu, Jian Luan, Shuo Shang, Bo Du, Ji-Rong Wen, Rui Yan · 2026-03-09 · 🤖 cs.AI

Verify as You Go: An LLM-Powered Browser Extension for Fake News Detection

This paper introduces Aletheia, a novel LLM-powered browser extension that combines Retrieval-Augmented Generation with interactive user features to effectively detect fake news and provide transparent, evidence-based explanations, outperforming existing baselines in both detection accuracy and user engagement.

Dorsaf Sallami, Esma Aïmeur · 2026-03-09 · 💬 cs.CL

Omni-C: Compressing Heterogeneous Modalities into a Single Dense Encoder

The paper introduces Omni-C, a single dense Transformer encoder that compresses heterogeneous modalities (text, audio, and images) into shared representations via unimodal contrastive pretraining. This eliminates the parameter overhead and routing complexity of Mixture-of-Experts architectures while achieving comparable performance with significantly reduced memory usage.

Kin Wai Lau, Yasar Abbas Ur Rehman, Lai-Man Po, Pedro Porto Buarque de Gusmão · 2026-03-09 · 🤖 cs.AI

Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding

This paper establishes that although language-equivalent context-free grammars yield identical token masks in grammar-constrained decoding, their structural differences significantly affect computational efficiency by introducing variable state-space blowups and ambiguity costs. From this analysis it derives fundamental lower bounds on decoding work and new distortion metrics for masked sampling.

Faruk Alpay, Bilge Senturk · 2026-03-09 · 🤖 cs.LG

EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair

The paper introduces EigenData, a self-evolving multi-agent platform that automates the synthesis, auditing, and repair of high-quality function-calling training data. The authors demonstrate its effectiveness by systematically correcting the Berkeley Function-Calling Leaderboard (BFCL-V3), yielding model rankings that correlate better with human judgments of functional correctness.

Jiaao Chen, Jingyuan Qi, Mingye Gao, Wei-Chen Wang, Hanrui Wang, Di Jin · 2026-03-09 · 🤖 cs.AI

Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment

This paper proposes CDDS, a novel cross-modal alignment algorithm that utilizes a dual-path UNet for constrained decoupling of semantic and modality components and a distribution sampling method to bridge the modality gap, thereby achieving superior semantic consistency and outperforming state-of-the-art methods by 6.6% to 14.2%.

Xiang Ma, Lexin Fang, Litian Xu, Caiming Zhang · 2026-03-09 · 🤖 cs.LG

CBR-to-SQL: Rethinking Retrieval-based Text-to-SQL using Case-based Reasoning in the Healthcare Domain

The paper introduces CBR-to-SQL, a Case-Based Reasoning framework that improves Text-to-SQL generation in healthcare by utilizing abstract case templates and a two-stage retrieval process to achieve higher accuracy, sample efficiency, and robustness compared to standard Retrieval-Augmented Generation methods on the MIMICSQL dataset.

Hung Nguyen, Hans Moen, Pekka Marttinen · 2026-03-09 · 🤖 cs.AI

NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution

The paper introduces NOTAI.AI, an explainable framework that combines curvature-based signals with neural and stylometric features in an XGBoost classifier to detect machine-generated text, using SHAP and an LLM layer to generate structured, natural-language rationales for its decisions.

Oleksandr Marchenko Breneur, Adelaide Danilov, Aria Nourbakhsh, Salima Lamsiyah · 2026-03-09 · 💬 cs.CL

Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs

This paper investigates how Chain-of-Thought prompting exacerbates the leakage of personally identifiable information in large language models, showing that leakage varies significantly with model family and reasoning budget. It then evaluates several lightweight inference-time gatekeepers and proposes hybrid policies that balance reasoning utility with privacy protection.

Patrick Ahrend, Tobias Eder, Xiyang Yang, Zhiyi Pan, Georg Groh · 2026-03-09 · 💬 cs.CL

RACAS: Controlling Diverse Robots With a Single Agentic System

The paper introduces RACAS, a robot-agnostic agentic system that uses natural language communication between LLM/VLM-based modules to control diverse robotic platforms without requiring code modifications or retraining, successfully demonstrating its effectiveness across wheeled, multi-jointed, and underwater robots.

Dylan R. Ashley, Jan Przepióra, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet, Shinkyu Park, Eric Feron, Jürgen Schmidhuber · 2026-03-09 · 🤖 cs.AI

The Fragility Of Moral Judgment In Large Language Models

This study demonstrates that large language models' moral judgments are highly fragile and manipulable, as they are significantly more influenced by narrative perspective, persuasion cues, and evaluation protocols than by the underlying moral substance of a dilemma.

Tom van Nuenen, Pratik S. Sachdeva · 2026-03-09 · 🤖 cs.AI

FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation

FreeTxt-Vi is a free, open-source web toolkit that integrates a benchmarked bilingual Vietnamese-English NLP pipeline for segmentation, sentiment analysis, and summarization, enabling non-programmers to analyze text data while demonstrating competitive performance against existing baselines.

Hung Nguyen Huy, Mo El-Haj, Dawn Knight, Paul Rayson · 2026-03-09 · 💬 cs.CL

Autonomous Algorithm Discovery for Ptychography via Evolutionary LLM Reasoning

The paper introduces Ptychi-Evolve, an autonomous framework that leverages large language models and evolutionary mechanisms to automatically discover and evolve novel regularization algorithms for ptychography, achieving significant reconstruction quality improvements across diverse imaging datasets.

Xiangyu Yin, Ming Du, Junjing Deng, Zhi Yang, Yimo Han, Yi Jiang · 2026-03-09 · 🤖 cs.AI

