When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?

This paper presents the first empirical study on training LLMs to abstain from answering in temporal question answering by combining Chain-of-Thought supervision with Reinforcement Learning, demonstrating that this approach significantly outperforms existing models in accuracy and reliability while revealing the limitations of implicit reasoning cues and supervised fine-tuning.

Xinyu Zhou, Chang Jin, Carsten Eickhoff + 2 more · 2026-03-05 · cs.AI

Rewards as Labels: Revisiting RLVR from a Classification Perspective

This paper proposes "Rewards as Labels" (REAL), a novel framework that reformulates Reinforcement Learning with Verifiable Rewards as a classification problem to address gradient misassignment and domination issues in methods like GRPO, thereby achieving superior training stability and performance on mathematical reasoning benchmarks compared to state-of-the-art baselines.

Zepeng Zhai, Meilin Chen, Jiaxuan Zhao + 3 more · 2026-03-05 · cs.LG

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

This paper introduces the first NLP dataset for Meenzerisch, the endangered dialect of Mainz, and demonstrates that current large language models struggle to generate or define its words, achieving accuracy below 10% even with few-shot learning and rule extraction, thereby highlighting an urgent need for further research and resources to preserve German dialects.

Minh Duc Bui, Manuel Mager, Peter Herbert Kann + 1 more · 2026-03-05 · cs.CL