cs.CY papers | Gist.Science

Measuring AI R&D Automation

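The paper argues that current data poorly reflect the true extent of AI R&D automation (AIRDA) and its effects on capabilities and safety. It therefore proposes a new set of metrics spanning capital investment, researchers' time allocation, and system safety incidents, and recommends that companies, third-party organizations, and governments jointly track them in order to better assess the consequences of AIRDA, implement safety measures, and stay informed about the pace of AI development.

Alan Chan, Ranay Padarath, Joe Kwon + 2 more | 2026-03-06 | 💻 cs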

Signal in the Noise: Decoding the Reality of Airline Service Quality with Large Language Models

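This study uses large language models to analyze more than 16,000 online reviews, revealing EgyptAir service pain points that traditional metrics fail to capture (such as poor communication and staff attitude), and demonstrating the framework's effectiveness in turning unstructured passenger feedback into actionable strategic intelligence.

Ahmed Dawoud, Osama El-Shamy, Ahmed Habashy | 2026-03-06 | 💻 cs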

Invariant Causal Routing for Governing Social Norms in Online Market Economies

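This paper proposes a governance framework called Invariant Causal Routing (ICR), which combines counterfactual reasoning with invariant causal discovery to identify stable policy-norm causal relationships across heterogeneous environments. The result is a set of interpretable interventions, able to generalize across distributions, for steering social norms in online market economies.

Xiangning Yu, Qirui Mi, Xiao Xue + 4 more | 2026-03-06 | 💻 cs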

Token Taxes: mitigating AGI's economic risks

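The paper proposes a "token tax" on model inference (a usage-based surcharge applied at the point of sale). By leveraging existing compute governance infrastructure to capture value where AI is used rather than where it is hosted, the tax aims to mitigate economic risks that artificial general intelligence (AGI) may bring, such as tax base erosion, declining living standards, and loss of civic empowerment.

Lucas Irwin, Tung-Yu Wu, Fazl Barez | 2026-03-06 | 💻 cs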

A Case Study in Responsible AI-Assisted Video Solutions: Multi-Metric Behavioral Insights in a Public Market Setting

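Through a case study at a downtown public market, this work demonstrates a responsible AI video solution that, while strictly adhering to privacy and ethical standards, uses computer vision to extract multi-dimensional behavioral insights such as foot-traffic direction, dwell time, and movement patterns, offering a practical path toward better crowd-flow management in urban spaces.

Mehrnoush Fereydouni, Eka Ebong, Sahar Maleki + 3 more | 2026-03-06 | 💻 cs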

Stan: An LLM-based thermodynamics course assistant

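This paper introduces "Stan", a thermodynamics course assistant built on locally deployed open-source models (such as the Llama 3.1 language model and the Whisper speech recognition model). Using retrieval-augmented generation, it gives students precise answers grounded in a textbook index and gives instructors structured analyses that include lecture summaries, points of student confusion, and teaching examples, supporting both teaching and learning while preserving data privacy and keeping costs under control.

Eric M. Furst, Vasudevan Venkateshwaran | 2026-03-06 | 🔬 physics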

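Generalizing Fair Top-$k$ Selection: An Integrative Approach

This paper addresses fair top-$k$ selection with multiple protected groups. It shows that the problem is computationally intractable under existing assumptions, proposes an efficient algorithm for small $k$, introduces utility loss as a new disparity measure to improve the stability of the scoring function, and, through engineering trade-offs, achieves strong performance on real-world datasets.

Guangya Cai | 2026-03-06 | 💻 cs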

Analysis of Terms of Service on Social Media Platforms: Consent Challenges and Assessment Metrics

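This study examines how social media platforms obtain user consent through terms of service (ToS). It proposes and applies a three-dimensional assessment framework covering textual readability, semantic transparency, and interface design commitments, and uses it to analyze the ToS of 13 major platforms. The analysis reveals significant shortcomings in language complexity, non-committal wording, and disclosure of data practices. The authors conclude that although ToS formally include consent mechanisms, in practice they often limit users' clear understanding and autonomous choice, and they call for repositioning ToS as key documents that shape the conditions of consent in order to promote more ethical consent design.

Yong-Bin Kang, Anthony McCosker | 2026-03-06 | 💻 cs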

Evaluating and Correcting Human Annotation Bias in Dynamic Micro-Expression Recognition

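This paper proposes a new architecture called GAMDSS, which optimizes spatiotemporal modeling through a dynamic keyframe re-selection strategy. It reduces human annotation bias in cross-cultural micro-expression datasets and improves recognition performance without increasing the number of model parameters.

Feng Liu, Bingyu Nan, Xuezhong Qian + 1 more | 2026-03-06 | 💻 cs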

Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses

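Through a meta-analysis of 890 results, this study reveals systematic shortcomings of AI models in automated short-answer scoring: poor adaptation to item difficulty, the effect of architecture choice (decoders underperform encoders), diminishing returns from vocabulary size, racial bias, and sensitivity to wording. It calls for system designs that account for the statistical limitations of autoregressive models.

Michael Hardy | 2026-03-06 | 💬 cs.CL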

Differential Privacy in Two-Layer Networks: How DP-SGD Harms Fairness and Robustness

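Using a unified feature-centric framework, this paper theoretically explains how differentially private stochastic gradient descent (DP-SGD) harms fairness and robustness in two-layer ReLU convolutional neural networks through an imbalanced signal-to-noise ratio, and shows that the paradigm of public pre-training followed by private fine-tuning is not necessarily effective under feature distribution shift.

Ruichen Xu, Kexin Chen | 2026-03-06 | 🤖 cs.LG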

Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis

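In a randomized experiment with 164 law students, this study finds that about ten minutes of targeted generative AI training not only significantly increases tool usage on a legal analysis task but also improves scores by the equivalent of one-third of a letter grade. Tool access alone, without training, fails to improve performance and even leads to shorter answers. This suggests that, in knowledge-intensive domains, complementary user training is essential to unlocking the productivity of generative AI.

Benjamin M. Chen, Hong Bao | 2026-03-06 | 🤖 cs.AI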

Small Changes, Big Impact: Demographic Bias in LLM-Based Hiring Through Subtle Sociocultural Markers in Anonymised Resumes

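This study shows that even when resumes are anonymized, large language models can still infer applicants' race and gender from subtle sociocultural markers such as language and hobbies, and exhibit systematic preferences for particular groups (such as Chinese and white males). Prompts that ask the model to explain its decisions actually amplify this bias.

Bryan Chen Zhengyu Tan, Shaun Khoo, Bich Ngoc Doan + 3 more | 2026-03-06 | 💻 cs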

cs.CY

Cognitive Warfare: Definition, Framework, and Case Study

The role of spatial scales in assessing urban mobility models

NL2GDS: LLM-aided interface for Open Source Chip Design

Synthetic emotions and consciousness: exploring architectural boundaries

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Dutch Metaphor Extraction from Cancer Patients' Interviews and Forum Data using LLMs and Human in the Loop

A Systematic Analysis of Biases in Large Language Models
