Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis

This paper presents a comparative analysis demonstrating that GraphRAG, a knowledge graph-based retrieval system with specific customizations, outperforms the standard RGB baseline in robustness across noise, integration, negative rejection, and counterfactual scenarios, offering valuable insights for building more reliable Retrieval-Augmented Generation systems.

Hazem Amamou, Stéphane Gagnon, Alan Davoust, Anderson R. Avila · 2026-03-09 · cs.CL

Cultural Perspectives and Expectations for Generative AI: A Global Survey Approach

This paper presents findings from a large-scale global survey that explores diverse cultural perspectives on Generative AI, distilling community-defined understandings of culture to propose recommendations for more inclusive and sensitive AI development, including participatory approaches and frameworks for addressing cultural boundaries.

Erin van Liemt, Renee Shelby, Andrew Smart, Sinchana Kumbale, Richard Zhang, Neha Dixit, Qazi Mamunur Rashid, Jamila Smith-Loud · 2026-03-09 · cs.AI

Structured Multidimensional Representation Learning for Large Language Models

This paper introduces the L-Transformer, a novel architecture that utilizes structured spectral factorization via the L-product to decompose the embedding space into independent spectral sub-transformers, achieving significant parameter reduction (up to 75%) while maintaining competitive performance and introducing beneficial frequency-based inductive biases.

Alaa El Ichi, Khalide Jbilou, Mohamed El Guide, Franck Dufrenois · 2026-03-09 · cs.CL
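The "up to 75%" parameter reduction in the L-Transformer summary is consistent with a simple counting argument: replacing one dense d×d projection with k independent sub-projections over d/k-dimensional sub-spaces cuts parameters by a factor of k. The sketch below is only that back-of-envelope arithmetic, not the paper's actual L-product spectral factorization; the function names are mine.

```python
def dense_params(d: int) -> int:
    # A full d x d projection matrix over the whole embedding space.
    return d * d

def blockwise_params(d: int, k: int) -> int:
    # Split the embedding into k independent sub-spaces of size d/k;
    # each sub-transformer gets its own (d/k) x (d/k) projection.
    assert d % k == 0, "embedding dim must divide evenly into sub-spaces"
    sub = d // k
    return k * sub * sub

d = 512
full = dense_params(d)
split = blockwise_params(d, 4)
reduction = 1 - split / full
print(full, split, reduction)  # reduction is 0.75 for k = 4
```

With k = 4 the block-diagonal layout keeps one quarter of the dense parameter count, matching the headline figure; larger k would reduce further at the cost of less cross-sub-space mixing.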

CodeScout: Contextual Problem Statement Enhancement for Software Agents

The paper introduces CodeScout, a framework that enhances software agent performance by performing lightweight pre-exploration of codebases to convert underspecified user requests into comprehensive, actionable problem statements, resulting in a 20% improvement in resolution rates on the SWEBench-Verified benchmark.

Manan Suri, Xiangci Li, Mehdi Shojaie, Songyang Han, Chao-Chun Hsu, Shweta Garg, Aniket Anand Deshmukh, Varun Kumar · 2026-03-09 · cs.CL

NERdME: a Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories

The paper introduces NERdME, a new dataset of 200 manually annotated README files containing over 10,000 labeled spans across 10 entity types, designed to bridge the gap in scholarly information extraction by enabling the automatic indexing of implementation-level research artifacts in code repositories.

Genet Asefa Gesese, Zongxiong Chen, Shufan Jiang, Mary Ann Tan, Zhaotai Liu, Sonja Schimmler, Harald Sack · 2026-03-09 · cs.CL

PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models

This paper introduces PVminer, a benchmark for structured extraction of patient voice from patient-generated text, and presents PVminerLLM, a supervised-fine-tuned large language model that significantly outperforms prompt-based baselines in extracting codes, sub-codes, and evidence spans, enabling scalable analysis of non-clinical health drivers.

Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Ashley Hagaman, Sarah Lowe, Aimee Roundtree · 2026-03-09 · cs.AI

RouteGoT: Node-Adaptive Routing for Cost-Efficient Graph of Thoughts Reasoning

RouteGoT is a budget-controllable, node-adaptive routing framework that optimizes Graph of Thoughts reasoning by dynamically assigning strong models to critical planning and synthesis tasks while utilizing lightweight models for easier subtasks, thereby significantly improving accuracy and reducing token consumption compared to existing methods.

Yuhang Liu, Ruijie Wang, Yunlong Chu, Bing Hao, Yumeng Lin, Shengzhong Liu, Minglai Shao · 2026-03-09 · cs.CL
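The routing idea in the RouteGoT summary can be illustrated with a toy dispatcher: planning and synthesis nodes go to a strong model while routine expansion nodes use a cheap one, subject to a running token budget. This is a hypothetical sketch of the general pattern; the node-type names, costs, and `route` function are illustrative and not RouteGoT's actual API.

```python
# Node types treated as critical in this sketch (assumed, per the summary).
CRITICAL_TYPES = {"plan", "synthesize"}

def route(node_type: str, remaining_budget: int,
          strong_cost: int = 10, light_cost: int = 1):
    """Pick a model tier for one reasoning node, honoring the budget."""
    if node_type in CRITICAL_TYPES and remaining_budget >= strong_cost:
        return "strong", strong_cost
    return "light", light_cost

budget = 25
assignments = []
for node in ["plan", "expand", "expand", "synthesize", "expand"]:
    model, cost = route(node, budget)
    budget -= cost
    assignments.append((node, model))
print(assignments, "budget left:", budget)
```

Even this crude policy shows the cost shape the paper targets: most nodes run on the light model, and the expensive model is reserved for the few nodes whose quality dominates the final answer.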

Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

This paper empirically evaluates the effectiveness and limitations of many-shot prompting for test-time adaptation in large language models, finding that while it benefits structured tasks with high information gain, its performance is highly sensitive to selection strategies and often yields limited improvements for open-ended generation.

Shubhangi Upasani, Chen Wu, Jay Rainton, Bo Li, Changran Hu, Qizheng Zhang, Urmish Thakker · 2026-03-09 · cs.LG
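Mechanically, many-shot test-time adaptation amounts to assembling a long prompt from a pool of labeled examples before the query; the summary's point about sensitivity to selection strategies shows up in which examples get packed in. Below is a minimal sketch of such a prompt builder; the strategy names and format are my own stand-ins, not the paper's.

```python
import random

def build_many_shot_prompt(pool, query, n_shots, strategy="random", seed=0):
    """Assemble a many-shot prompt from a pool of (input, label) pairs.

    'random' and 'first_n' are placeholder selection strategies; real
    selectors (e.g. similarity-based) are what the paper finds matters.
    """
    if strategy == "random":
        shots = random.Random(seed).sample(pool, n_shots)
    else:  # "first_n"
        shots = pool[:n_shots]
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in shots)
    return f"{demos}\nQ: {query}\nA:"

pool = [(f"question {i}", f"answer {i}") for i in range(100)]
prompt = build_many_shot_prompt(pool, "new question", n_shots=3,
                                strategy="first_n")
print(prompt)
```

Swapping the `strategy` argument changes only which demonstrations are packed, yet per the summary that choice can swing downstream accuracy substantially, especially outside structured tasks.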

ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

ReflexiCoder is a novel reinforcement learning framework that internalizes structured self-reflection and self-correction capabilities into an LLM's weights, enabling it to autonomously generate, debug, and optimize code without external feedback while achieving state-of-the-art performance and improved token efficiency across multiple benchmarks.

Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim · 2026-03-09 · cs.LG

Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation

This paper introduces CoCA, a reinforcement learning framework that shifts the paradigm from answer-first to confidence-first by jointly optimizing a model's pre-answer confidence calibration and answer accuracy through segmented credit assignment, thereby enabling more reliable uncertainty estimation without compromising performance.

Changcheng Li, Jiancan Wu, Hengheng Zhang, Zhengsu Chen, Guo An, Junxiang Qiu, Xiang Wang, Qi Tian · 2026-03-09 · cs.CL

Learning Next Action Predictors from Human-Computer Interaction

This paper introduces LongNAP, a user model that leverages a large-scale dataset of 360K annotated multimodal interactions and a hybrid parametric/in-context learning approach to significantly outperform existing baselines at predicting a user's next action by reasoning over their full interaction history.

Omar Shaikh, Valentin Teutschbein, Kanishk Gandhi, Yikun Chi, Nick Haber, Thomas Robinson, Nilam Ram, Byron Reeves, Sherry Yang, Michael S. Bernstein, Diyi Yang · 2026-03-09 · cs.CL