cs.CL papers | Gist.Science

SarcasmMiner: A Dual-Track Post-Training Framework for Robust Audio-Visual Sarcasm Reasoning

SarcasmMiner is a reinforcement learning-based post-training framework that employs a dual-track distillation strategy with a generative reward model and group relative policy optimization to substantially improve the robustness of audio-visual sarcasm reasoning and reduce hallucinations in foundation models.

Zhu Li, Yongjian Chen, Huiyuan Lai + 3 more | 2026-03-06 | 💬 cs.CL

Knowledge Divergence and the Value of Debate for Scalable Oversight

This paper establishes a formal geometric framework linking AI debate and RLAIF by demonstrating that the value of debate scales with knowledge divergence between models, transitioning from negligible benefit to essential oversight as representations diverge, while identifying specific regimes where debate unlocks inaccessible outcomes or risks coordination failure.

Robin Young | 2026-03-06 | 🤖 cs.LG

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

WavSLM is a single-stream speech language model that achieves competitive speech generation and consistency without text supervision by quantizing and distilling WavLM representations into a single codebook for autoregressive next-chunk prediction.

Luca Della Libera, Cem Subakan, Mirco Ravanelli | 2026-03-06 | 🤖 cs.AI

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

The paper introduces Med-V1, a family of efficient 3-billion-parameter small language models trained on synthetic data that achieve performance comparable to frontier models like GPT-5 for biomedical evidence attribution and hallucination detection, while enabling scalable applications such as analyzing citation validity and identifying evidence misattributions in clinical guidelines.

Qiao Jin, Yin Fang, Lauren He + 12 more | 2026-03-06 | 🤖 cs.AI

PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration

This paper introduces PersianPunc, a large-scale dataset of 17 million samples, and a fine-tuned ParsBERT model that achieves a 91.33% macro-averaged F1 score for Persian punctuation restoration, offering a more efficient and accurate alternative to large language models for real-time applications.

Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery | 2026-03-06 | 🤖 cs.AI

A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes

This paper introduces a freely accessible, multilingual corpus of original and human-annotated Easy-to-Read texts in Spanish, Catalan, and Italian, designed to support automatic text simplification research and enhance access to democratic participatory processes.

Stefan Bott, Verena Riegler, Horacio Saggion + 2 more | 2026-03-06 | 💬 cs.CL

Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

This paper investigates model merging as a scalable alternative to full fine-tuning for multi-domain ASR, benchmarking 11 algorithms across 10 European Portuguese domains and introducing a novel "BoostedTSV-M" method that outperforms full fine-tuning while preserving out-of-distribution generalization.

Carlos Carvalho, Francisco Teixeira, Thomas Rolland + 1 more | 2026-03-06 | 💬 cs.CL

DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning

The paper proposes DiSCTT, a difficulty-aware, consensus-guided self-curriculum framework that dynamically allocates supervised fine-tuning or reinforcement learning strategies based on instance-level agreement among reasoning trajectories, thereby achieving more stable, efficient, and accurate test-time adaptation for large language models on heterogeneous reasoning tasks.

Mohammad Mahdi Moradi, Sudhir Mudur | 2026-03-06 | 💬 cs.CL

Progressive Residual Warmup for Language Model Pretraining

The paper proposes Progressive Residual Warmup (ProRes), a method that stabilizes and accelerates the pretraining of Transformer-based language models by implementing an "early layer learns first" philosophy where deeper layers gradually activate only after shallower layers have settled, resulting in faster convergence and improved downstream performance.

Tianhao Chen, Xin Xu, Lu Yin + 4 more | 2026-03-06 | 💬 cs.CL

An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs

This study demonstrates that carefully fine-tuned low-parameter LLMs (<4B) utilizing Chain-of-Thought reasoning and neighbor-word analysis can achieve Word Sense Disambiguation performance comparable to or exceeding state-of-the-art high-parameter models like GPT-4-Turbo, while significantly reducing computational and energy costs.

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough | 2026-03-06 | 💬 cs.CL

Dissociating Direct Access from Inference in AI Introspection

This paper demonstrates that AI models detect injected internal representations through two distinct mechanisms—probability-matching based on prompt anomalies and direct, content-agnostic access to internal states—where the latter allows anomaly detection without reliable semantic identification, aligning with established theories in philosophy and psychology.

Harvey Lederman, Kyle Mahowald | 2026-03-06 | 🤖 cs.AI

Ensembling Language Models with Sequential Monte Carlo

This paper introduces a unified framework for ensembling diverse language models via $f$-ensemble distributions and proposes a byte-level sequential Monte Carlo algorithm to sample from these distributions, effectively overcoming challenges like mismatched vocabularies and biased approximations to improve performance on structured text generation tasks.

Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland + 5 more | 2026-03-06 | 🤖 cs.AI

Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry

This paper introduces the Distributed Partial Information Puzzle (DPIP) and its associated multimodal dataset to evaluate how well current large language models and logic-based systems can construct common ground under epistemic asymmetry, revealing that modern LLMs struggle to accurately track task progression and belief states compared to axiomatic approaches.

Yifan Zhu, Mariah Bradford, Kenneth Lai + 4 more | 2026-03-06 | 🤖 cs.AI

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

This paper introduces FlashAttention-4, a co-designed algorithm and kernel implementation for Blackwell GPUs that addresses asymmetric hardware scaling through asynchronous pipelines, software-emulated operations, and CuTe-DSL-based development to achieve up to a 1.3$\times$ speedup over cuDNN and significantly faster compile times.

Ted Zadouri, Markus Hoehnerbach, Jay Shah + 3 more | 2026-03-06 | 💬 cs.CL

DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates

This paper introduces DEBISS, a novel corpus of spoken and individual semi-structured debates designed to address the scarcity of debate datasets by providing comprehensive NLP annotations for tasks such as speech-to-text, speaker diarization, argument mining, and debater quality assessment.

Klaywert Danillo Ferreira de Souza, David Eduardo Pereira, Cláudio E. C. Campelo + 1 more | 2026-03-06 | 💬 cs.CL

NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance

This paper introduces NCTB-QA, a large-scale Bangla educational question-answering dataset featuring a balanced distribution of answerable and unanswerable questions with adversarial distractors, and demonstrates that fine-tuning transformer models on this domain-specific data significantly improves performance in low-resource settings.

Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim | 2026-03-06 | 💬 cs.CL

Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval

This paper introduces INTRA, a novel method that leverages internal LLM representations to achieve state-of-the-art fact-checking performance without relying on external retrieval, thereby addressing limitations in scalability and generalization across diverse sources, languages, and knowledge domains.

Artem Vazhentsev, Maria Marina, Daniil Moskovskiy + 8 more | 2026-03-06 | 🤖 cs.AI

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

This paper demonstrates that reasoning models often exhibit "performative chain-of-thought" by generating tokens without revealing their internal beliefs, yet activation probing can detect these hidden certainties early to enable significant token reduction while distinguishing genuine uncertainty in complex tasks.

Siddharth Boppana, Annabel Ma, Max Loeffler + 5 more | 2026-03-06 | 🤖 cs.AI

Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

This paper leverages Chinese open-weight LLMs that censor politically sensitive topics as a natural testbed to evaluate honesty elicitation and lie detection techniques, finding that methods like few-shot prompting and self-classification effectively increase truthful responses and detect falsehoods, though no approach completely eliminates deception.

Helena Casademunt, Bartosz Cywiński, Khoi Tran + 3 more | 2026-03-06 | 🤖 cs.AI

The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

This paper reveals that while massive activations and attention sinks frequently co-occur in Transformers due to the pre-norm architecture, they actually serve distinct global and local functions respectively, with the former acting as implicit parameters and the latter modulating short-range attention dependencies.

Shangwen Sun, Alfredo Canziani, Yann LeCun + 1 more | 2026-03-06 | 🤖 cs.AI
