cs.HC papers | Gist.Science

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

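This paper presents NeuralOS, a neural framework that combines a recurrent neural network with a diffusion-based renderer to predict and generate realistic operating system GUI frame sequences directly from user input. It also shows that applications that are not installed can be simulated using synthetic data alone.

Luke Rivard, Sun Sun, Hongyu Guo, Wenhu Chen, Yuntian Deng | Fri, 13 Ma | 💬 cs.CL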

TRACE: AI-Assisted Assessment of Collaborative Projects in Computer Science Education

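This paper presents TRACE, a semi-automated, AI-assisted framework that evaluates both overall project quality and individual contributions in collaborative computer science education projects by mining code repositories, analyzing communication data, and applying AI-assisted analysis. Pilot results indicate that the framework improves the fairness, transparency, and scalability of assessment while reducing instructor workload and raising student satisfaction.

Songmei Yu, Andrew Zagula | Fri, 13 Ma | 🤖 cs.AI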

Agentic Explainable Artificial Intelligence (Agentic XAI) Approach To Explore Better Explanation

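This study proposes an Agentic Explainable Artificial Intelligence (Agentic XAI) framework that combines SHAP-based explainability with iterative refinement by a multimodal large language model. A case study on rice yield in Japan confirms that the method significantly improves explanation quality for non-experts. It also reveals that excessive iteration degrades quality, which establishes early stopping as key to optimizing practical utility.

Tomoaki Yamaguchi, Yutong Zhou, Masahiro Ryo, Keisuke Katsura | Fri, 13 Ma | 🤖 cs.AI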

Learning Through Dialogue: Engagement and Efficacy Matter More Than Explanations

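Analyzing 397 human-AI conversations, this study finds that an LLM's effect on users' political knowledge and confidence does not depend simply on how rich its explanations are. It depends heavily on interaction dynamics such as users' cognitive engagement, reflective insight, and political efficacy.

Shaz Furniturewala, Gerard Christopher Yeo, Kokil Jaidka | Fri, 13 Ma | 💬 cs.CL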

Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing?

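This study systematically evaluates large language models on automatic post-editing. It finds that although proprietary models reach near-human editing quality, they fail to make effective use of document-level context when correcting errors. Their high cost and latency also make real-world deployment difficult, and existing automatic metrics do not accurately reflect their quality gains.

Ahrii Kim, Seong-heum Kim | Fri, 13 Ma | 💬 cs.CL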

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

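This paper proposes a benchmark of small language models for leader-follower role classification in human-machine interaction. Using a dataset augmented with synthetic data, it shows that fine-tuned small models (such as Qwen2.5-0.5B) achieve high-accuracy, low-latency classification in zero-shot mode and outperform prompt engineering approaches, but suffer degraded performance in one-shot mode because of the longer context.

Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr | Fri, 13 Ma | ⚡ eess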

Exploring Collatz Dynamics with Human-LLM Collaboration

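Through human-LLM collaboration and large-scale computational exploration, this paper uncovers structural properties of the Collatz iteration, including modal shuffling and a "burst-gap" decomposition. It proves several key lemmas and proposes a conditional framework for convergence based on an orbit distribution conjecture, though the core hypothesis remains to be verified.

Edward Y. Chang | Fri, 13 Ma | 🔢 math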

"I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue

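In an experiment with 160 participants, this study shows that dialogue-based AI interventions promote recognition of ableist microaggressions more effectively than reading alone. Inclusive coaching provided effective cognitive scaffolding while remaining balanced, whereas biased coaching improved differentiation but increased negative affect. These results reveal the trade-offs involved in integrating bias cues into AI dialogue systems.

Atieh Taheri, Hamza El Alaoui, Patrick Carrington, Jeffrey P. Bigham | Fri, 13 Ma | 🤖 cs.AI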

Ghost Framing Theory: Exploring the role of generative AI in new venture rhetorical legitimation

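This paper proposes Ghost Framing Theory to explain how generative AI, through its rhetorical properties, joins founders and investors in constructing a hybrid agent, which in turn reshapes the rhetorical legitimacy and resonance mechanisms of new ventures through a recursive, iterative process.

Greg Nyilasy | Fri, 13 Ma | 🤖 cs.AI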

Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

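This paper argues that the conclusion of Ramaswamy et al., that consumer health AI carries a serious risk of missed diagnoses, stems mainly from an unrealistic, exam-style evaluation format (for example, forced-choice options and a ban on follow-up questions). Evaluations that simulate real user interaction show significantly higher AI triage accuracy, indicating that evaluation method, not model capability, is the key factor behind the apparent "triage failure".

David Fraile Navarro, Farah Magrabi, Enrico Coiera | Fri, 13 Ma | 🤖 cs.AI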

Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment

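Through a field experiment on a medical crowdsourcing platform, this paper demonstrates that balanced feedback, a probabilistic labeling interface, and pipeline-level linear log-odds recalibration effectively mitigate human labelers' cognitive biases in rare-event detection. This significantly improves the classification performance and probability calibration of a downstream convolutional neural network.

Gunnar P. Epping, Andrew Caplin, Erik Duhaime, William R. Holmes, Daniel Martin, Jennifer S. Trueblood | Fri, 13 Ma | 💰 q-fin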

AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions

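This paper argues that in high-stakes decisions that cannot be verified immediately, such as clinical diagnosis and investment decisions, frontier large language models fall into a specific failure mode called "helicoid dynamics". The system accurately identifies its own errors and loops, yet, owing to limitations in how it was trained, it tends to choose comfort over rigor, so reliability declines exactly when the stakes are highest. On this basis the paper proposes twelve testable hypotheses to guide future AI oversight and human-AI collaboration.

Alejandro R Jadad | Fri, 13 Ma | 🤖 cs.AI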

A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy

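Drawing on interview data from the LT-LiDER project and applying Chesterman's framework of translation ethics, this paper analyzes how efficiency, technological values, and human values (such as expertise, oversight, and accountability) are being repositioned in the language and translation industry under automation, and how "adaptability" has emerged as a core mediating value. It argues that automation reshapes translation values rather than replacing them, producing an interdependence in which technological efficiency enables human communicative work.

María Isabel Rivas Ginel, Janiça Hackenbuchner, Alina Secară, Ralph Krüger, Caroline Rossi | Fri, 13 Ma | 💬 cs.CL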

cs.HC

From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration

Modeling Trial-and-Error Navigation With a Sequential Decision Model of Information Scent

An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies

Human-Centred LLM Privacy Audits: Findings and Frictions

MHDash: An Online Platform for Benchmarking Mental Health-Aware AI Assistants

A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding

ExSampling: a system for the real-time ensemble performance of field-recorded environmental sounds