cs.CL papers | Gist.Science

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation

The paper introduces POET-X, a memory-efficient and scalable variant of the POET framework that utilizes optimized orthogonal equivalence transformations to enable the stable pretraining of billion-parameter large language models on a single GPU, overcoming the high memory and computational costs of the original implementation.

Zeju Qiu, Lixin Liu, Adrian Weller + 2 more2026-03-06🤖 cs.AI

Catch Me If You Can Describe Me: Open-Vocabulary Camouflaged Instance Segmentation with Diffusion

This paper proposes a novel diffusion-based method for Open-Vocabulary Camouflaged Instance Segmentation (OVCIS) that effectively fuses multi-scale textual-visual features to overcome the challenges of blending boundaries and segmenting unseen object classes, demonstrating superior performance on benchmarks with applications in surveillance, wildlife monitoring, and military reconnaissance.

Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo + 4 more2026-03-05🤖 cs.AI

RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference

This paper proposes RAEE, a robust retrieval-augmented framework that models early exit as a distribution prediction problem to leverage similar data's exit information for guiding adaptive layer termination, thereby accelerating LLM inference while maintaining or enhancing zero-shot performance across multiple tasks.

Lianming Huang, Shangyu Wu, Yufei Cui + 6 more2026-03-05💬 cs.CL

Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph

This paper proposes a novel framework that integrates Large Language Models with the Australian National University's Computer Science Scholarly Knowledge Graph, utilizing a Deep Document Model and KG-enhanced Query Processing to enable accurate, fine-grained semantic retrieval of research artifacts through automatic LLM-SPARQL fusion.

Runsong Jia, Bowen Zhang, Sergio J. Rodríguez Méndez + 1 more2026-03-05🤖 cs.AI

Manipulating language models' training data to study syntactic constraint learning: the case of English passivization

This study demonstrates that neural network language models can learn English passivization exceptions by leveraging both frequency-based entrenchment and semantic affectedness from their training data, validating the utility of manipulating training corpora to investigate the sources of linguistic evidence in acquisition.

Cara Su-Yi Leong, Tal Linzen2026-03-05💬 cs.CL

LMUnit: Fine-grained Evaluation with Natural Language Unit Tests

This paper introduces LMUnit, a framework that utilizes natural language unit tests and a unified scoring model to overcome the limitations of human and automated evaluation, significantly improving inter-annotator agreement and achieving state-of-the-art performance on key benchmarks.

Jon Saad-Falcon, Rajan Vivek, William Berrios + 6 more2026-03-05🤖 cs.AI

Preference Leakage: A Contamination Problem in LLM-as-a-judge

This paper identifies and empirically validates "preference leakage," a pervasive contamination problem where LLM judges exhibit significant bias toward student models with which they share a relationship (such as being the same model, having an inheritance link, or belonging to the same family), thereby challenging the reliability of LLM-as-a-judge evaluation paradigms.

Dawei Li, Renliang Sun, Yue Huang + 6 more2026-03-05🤖 cs.AI

OSCAR: Online Soft Compression And Reranking

OSCAR is a novel online soft compression and reranking method that dynamically reduces computational overhead in Retrieval-Augmented Generation pipelines at inference time, achieving significant speed-ups with minimal accuracy loss across various large language model sizes.

Maxime Louis, Thibault Formal, Hervé Dejean + 1 more2026-03-05🤖 cs.AI

Generating Fine Details of Entity Interactions

This paper introduces \data, a dataset of fine-grained interaction prompts, and proposes \model, a novel framework leveraging Multimodal Large Language Models for prompt decomposition, image critique, and targeted refinement to significantly enhance the generation of complex object interactions in text-to-image synthesis.

Xinyi Gu, Jiayuan Mao2026-03-05🤖 cs.LG

When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

This paper introduces Noise-to-Meaning Recursive Self-Improvement (N2M-RSI), a minimal formal model demonstrating that AI agents feeding their own outputs back as inputs can trigger unbounded internal complexity growth once a specific information-integration threshold is crossed, while remaining implementation-agnostic and scalable to agent swarms.

Rintaro Ando2026-03-05🤖 cs.AI

Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information

This paper introduces PubHealthBench, a new benchmark comprising over 8,000 questions derived from UK government guidance, to evaluate LLMs on public health knowledge and finds that while state-of-the-art models excel in multiple-choice tasks, their performance on free-form responses remains limited, highlighting the need for additional safeguards in real-world applications.

Joshua Harris, Fan Grayson, Felix Feldman + 8 more2026-03-05🤖 cs.LG

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

This paper introduces Multi-Objective Balanced Covering (MoB), a novel visual token pruning framework that leverages Hausdorff distance and $\epsilon$ -covering theory to derive a closed-form error bound and dynamically balance prompt alignment with visual preservation, achieving significant inference acceleration with minimal performance loss across diverse multimodal models.

Yangfu Li, Hongjian Zhan, Tianyi Chen + 2 more2026-03-05💬 cs.CL

R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

The paper introduces R1-Code-Interpreter, a general-purpose LLM trained via multi-stage reinforcement learning and supervised fine-tuning on a diverse dataset of 144 tasks, which significantly improves reasoning accuracy and enables emergent self-checking behaviors, outperforming both text-only and tool-augmented GPT-4o models.

Yongchao Chen, Yueying Liu, Junwei Zhou + 5 more2026-03-05🤖 cs.AI

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

This paper introduces Supervised Calibration (SC), a loss-minimization framework that enhances In-Context Learning in Large Language Models by learning optimal per-class affine transformations to correct systematic biases and alter decision boundary orientations, thereby achieving state-of-the-art performance across multiple models and datasets.

Korel Gundem, Juncheng Dong, Dennis Zhang + 2 more2026-03-05🤖 cs.AI

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models

This paper investigates how preference models overrely on superficial artifacts like length and style rather than substantive quality, quantifying this miscalibration against human preferences and demonstrating that a counterfactual data augmentation method effectively mitigates these biases while preserving overall performance.

Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi + 1 more2026-03-05💬 cs.CL

CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering

This paper introduces CounselBench, a large-scale benchmark developed with mental health professionals to evaluate and adversarially stress-test large language models on realistic open-ended mental health questions, revealing significant gaps in safety, personalization, and the reliability of automated evaluation compared to human expert judgment.

Yahan Li, Jifan Yao, John Bosco S. Bunyi + 3 more2026-03-05💬 cs.CL

Query-Level Uncertainty in Large Language Models

This paper proposes "Internal Confidence," a novel, training-free method that estimates query-level uncertainty by leveraging self-evaluations across model layers and tokens to detect knowledge boundaries, thereby enabling more efficient and trustworthy adaptive inference strategies like retrieval-augmented generation and model cascading.

Lihu Chen, Gerard de Melo, Fabian M. Suchanek + 1 more2026-03-05💬 cs.CL

Context Biasing for Pronunciation-Orthography Mismatch in Automatic Speech Recognition

This paper proposes a novel context biasing method for automatic speech recognition that leverages user-provided on-the-fly corrections of substitution errors to effectively resolve pronunciation-orthography mismatches, achieving a 22% to 34% relative improvement in biased word error rates without compromising overall system performance.

Christian Huber, Alexander Waibel2026-03-05🤖 cs.LG

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

The paper introduces RLVER, a novel reinforcement learning framework that utilizes verifiable emotion rewards from simulated users to significantly enhance the empathetic capabilities of large language models while preserving their logical and coding competencies.

Peisong Wang, Ruotian Ma, Bang Zhang + 13 more2026-03-05🤖 cs.AI

UQLM: A Python Package for Uncertainty Quantification in Large Language Models

The paper introduces UQLM, a Python package that leverages state-of-the-art uncertainty quantification techniques to generate confidence scores for detecting hallucinations and enhancing the reliability of Large Language Model outputs.

Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik + 3 more2026-03-05🤖 cs.AI

← Previous Next →