cs.CL papers | Gist.Science

ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

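ReflexiCoder is a reinforcement-learning framework that internalizes the full trajectory of generation, reflection, and self-correction into the model weights. This lets large language models debug code on their own, with no external feedback or execution engine. It matches or surpasses GPT-5.1 on several benchmarks while substantially reducing inference compute.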

Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim · 2026-03-09 · 🤖 cs.LG

ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning

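This paper presents ROSE, an improvement on SparseGPT. It adds pre-pruning, a loss-based two-level reordering strategy, and adaptive identification of columnar layers. Together these remove the suboptimality caused by SparseGPT's fixed pruning order and give more accurate one-shot pruning across several mainstream large language models.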

Mingluo Su, Huan Wang · 2026-03-09 · 🤖 cs.LG

Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation

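The paper proposes CoCA, a reinforcement-learning framework based on GRPO. It combines a new "confidence first, then answer" paradigm with a segmented reward to jointly optimize a large language model's confidence calibration and answer accuracy. The result is markedly more reliable uncertainty estimates with no loss in answer quality.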

Changcheng Li, Jiancan Wu, Hengheng Zhang, Zhengsu Chen, Guo An, Junxiang Qiu, Xiang Wang, Qi Tian · 2026-03-09 · 💬 cs.CL
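GRPO (Group Relative Policy Optimization), which CoCA builds on, scores each sampled response against the other responses drawn for the same prompt instead of relying on a learned value critic. Below is a minimal sketch of that group-relative advantage step. The reward values are hypothetical, and CoCA's own segmented reward for confidence and answer is not reproduced here.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled response relative to the others for the same prompt.

    GRPO-style normalization: subtract the group mean reward and divide by the
    group standard deviation (population std here; implementations differ).
    `eps` guards against division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
example_rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(example_rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

Responses that beat their group's average receive a positive advantage and are reinforced. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.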

VerChol -- Grammar-First Tokenization for Agglutinative Languages

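The paper presents VerChol, a "grammar-first" tokenization method designed for agglutinative languages such as Tamil and Turkish. It targets a weakness of mainstream byte-pair encoding (BPE): because BPE ignores morphological boundaries, it fragments words and inflates token counts.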

Prabhu Raja · 2026-03-09 · 💬 cs.CL

Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

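This paper targets the consistency defects that are common when large language models generate long stories. It introduces ConStory-Bench, a benchmark of 2,000 prompts with a fine-grained taxonomy of 19 error types, together with ConStory-Checker, an automatic detection tool. Experiments show three key patterns: factual and temporal errors are the most frequent, errors cluster in the middle of a narrative, and they concentrate in high-entropy text segments.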

Junjie Li, Xinrui Guo, Yuhao Wu, Roy Ka-Wei Lee, Hongzhi Li, Yutao Xie · 2026-03-09 · 🤖 cs.AI

Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions

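The paper proposes a new semantic-tagging approach built on an ensemble of large language models (LLMs). Two metrics, content preservation rate (CPR) and tag well-formedness (TWF), are used to select the best output. The result is accurate, low-hallucination, and cost-effective automated cleaning and tagging of UN Security Council resolutions.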

Hussein Ghaly · 2026-03-09 · 💬 cs.CL

InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning

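InfoGatherer is a framework that gathers information by combining retrieved documents with targeted follow-up questions. It formally models uncertainty with an evidence network based on Dempster-Shafer theory, which supports more reliable and interpretable decisions in high-stakes domains such as law and medicine.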

Maksym Taranukhin, Shuyue Stella Li, Evangelos Milios, Geoff Pleiss, Yulia Tsvetkov, Vered Shwartz · 2026-03-09 · 💬 cs.CL
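InfoGatherer's evidence network rests on Dempster-Shafer theory. The core operation of that theory is Dempster's rule of combination, which works in three steps. Mass assigned by two independent sources is multiplied over intersecting hypothesis sets. Mass that lands on the empty set counts as conflict. The remaining mass is then renormalized. The sketch below is a textbook illustration of that rule, not InfoGatherer's implementation. The frame {A, B} and both mass assignments are hypothetical.

```python
from itertools import product

# A mass function maps subsets of the frame of discernment to belief mass.
Mass = dict[frozenset[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Fuse two independent mass functions with Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that the two sources disagree on
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m: Mass, hyp: frozenset[str]) -> float:
    """Lower bound: mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hyp)


def plausibility(m: Mass, hyp: frozenset[str]) -> float:
    """Upper bound: mass not contradicting the hypothesis."""
    return sum(w for s, w in m.items() if s & hyp)


A, B, AB = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
# Hypothetical evidence: a retrieved document leans toward A,
# while a user's answer to a follow-up question leans toward B.
doc_evidence: Mass = {A: 0.6, AB: 0.4}
answer_evidence: Mass = {B: 0.5, AB: 0.5}

fused = dempster_combine(doc_evidence, answer_evidence)
print(round(belief(fused, A), 3), round(plausibility(fused, A), 3))  # 0.429 0.714
```

The gap between belief and plausibility makes the remaining uncertainty explicit. In a sketch like this, a system could use a wide gap as a signal to ask another question rather than commit to an answer.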

Learning Next Action Predictors from Human-Computer Interaction

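The paper proposes LongNAP, a user model that combines parametric learning with in-context learning. Trained on large-scale annotated data of naturalistic interaction, it predicts users' multimodal next actions and can thus anticipate user needs proactively in complex interaction contexts.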

Omar Shaikh, Valentin Teutschbein, Kanishk Gandhi, Yikun Chi, Nick Haber, Thomas Robinson, Nilam Ram, Byron Reeves, Sherry Yang, Michael S. Bernstein, Diyi Yang · 2026-03-09 · 💬 cs.CL

Addressing the Ecological Fallacy in Larger LMs with Human Context

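The paper proposes correcting the ecological fallacy in large language models by incorporating author context (the HuLM task). Experiments on an 8B Llama model show that human-aware fine-tuning (HuFT) or continued pre-training substantially improves performance on several downstream tasks.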

Nikita Soni, Dhruv Vijay Kunjadiya, Pratham Piyush Shah, Dikshya Mohanty, H. Andrew Schwartz, Niranjan Balasubramanian · 2026-03-09 · 🤖 cs.AI

Implicit Style Conditioning: A Structured Style-Rewrite Framework for Low-Resource Character Modeling

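The paper proposes a framework that combines explicit style disentanglement (covering lexical, syntactic, and pragmatic dimensions) with implicit chain-of-thought distillation. It enables small language models to generate high-fidelity, character-styled text under low-resource conditions, clearly outperforming larger baseline models.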

Chanhui Zhu · 2026-03-09 · 🤖 cs.LG

Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models

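Grounded in interactionist and constructivist theories from psychology, the paper proposes a machine-learning approach that fuses individual traits with situational features. It uses large language models to analyze social-media data and predict mental-health status, remaining competitive while substantially improving interpretability.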

Nikita Soni, August Håkan Nilsson, Syeda Mahwish, Vasudha Varadarajan, H. Andrew Schwartz, Ryan L. Boyd · 2026-03-09 · 🤖 cs.AI

Imagine How To Change: Explicit Procedure Modeling for Change Captioning

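This paper proposes ProCap, a framework that shifts from static image comparison to dynamic process modeling. It uses sparse keyframes and learnable procedure queries to capture the change process explicitly, producing change captions that more accurately describe both what differs between two images and how the change came about.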

Jiayang Sun, Zixin Guo, Min Cao, Guibo Zhu, Jorma Laaksonen · 2026-03-09 · 🤖 cs.AI

Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL

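The paper proposes Track-SQL, a framework with two extractive modules: a semantic-enhanced schema extractor and a schema-aware context extractor. These modules address generative language models' weaknesses in handling context and dynamic schema linking in multi-turn Text-to-SQL. Track-SQL achieves state-of-the-art results on the SParC and CoSQL datasets.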

Bingfeng Chen, Shaobin Shi, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao · 2026-03-09 · 💬 cs.CL

cs.CL

MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing

ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning

Evaluating Austrian A-Level German Essays with Large Language Models for Automated Essay Scoring

Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality

DeepSight: Bridging Depth Maps and Language with a Depth-Driven Multimodal Model

Making Implicit Premises Explicit in Logical Understanding of Enthymemes

Diffusion Language Models Are Natively Length-Aware