cs.AI Papers | Gist.Science

mAVE: A Watermark for Joint Audio-Visual Generation Models

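This paper proposes mAVE, the first watermarking framework designed specifically for joint audio-visual generation models. By cryptographically binding the audio and video latent spaces without any fine-tuning, it closes the "swap attack" vulnerability that existing methods face because they treat the two modalities separately, and it achieves near-perfect binding integrity and copyright protection with zero performance loss.

Luyang Si, Leyi Pan, Lijie Wen | 2026-03-10 | 💻 cs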

Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

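This study empirically evaluates how well large language models can turn goal playable patterns (GPCs) into compilable Unity game code under structural constraints, comparing direct generation against a pipeline built on a human-authored intermediate representation (IR). It identifies structural "grounding" and "hygiene" failures as the main failure modes of current models in code generation.

Hugh Xuechen Liu, Kıvanç Tatar | 2026-03-10 | 💻 cs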

Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation

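This paper proposes PSAD, a personalized reranking framework that combines semi-autoregressive generation with online knowledge distillation. A user profile network strengthens user-item interactions, which addresses the trade-off between generation quality and inference latency in generative reranking. The framework significantly outperforms existing state-of-the-art methods on multiple datasets.

Kai Cheng, Hao Wang, Wei Guo, Weiwen Liu, Yong Liu, Yawen Li, Enhong Chen | 2026-03-10 | 💻 cs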

Vision Language Models Cannot Reason About Physical Transformation

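Using the newly built ConservationBench benchmark, this paper finds that current vision language models do not truly understand conservation laws when confronted with physical transformations. Their performance is close to random guessing and is misled by textual priors, which indicates that they lack the reasoning ability to keep physical properties invariant under transformation in dynamic scenes.

Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng | 2026-03-10 | 💻 cs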

Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information

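This paper presents an LLM-based Werewolf AI agent developed for the AIWolfDial 2024 shared task. By using dialogue summaries and manually designed persona information, the agent produces more consistent utterances and more coherent character traits.

Yoshiki Tanaka, Takumasa Kaneko, Hiroki Onozeki, Natsumi Ezure, Ryuichi Uehara, Zhiyang Qi, Tomoya Higuchi, Ryutaro Asahara, Michimasa Inaba | 2026-03-10 | 💬 cs.CL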

aCAPTCHA: Verifying That an Entity Is a Capable Agent via Asymmetric Hardness

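This paper proposes aCAPTCHA, a time-constrained security protocol based on the asymmetric difficulty gap between human cognition and AI processing. It verifies action, reasoning, and memory capabilities to distinguish humans, scripts, and agents, addressing the problem of entity-type verification for autonomous AI agents in network security.

Zuyao Xu, Xiang Li, Fubin Wu, Yuqi Qiu, Lu Sun, FaSheng Miao | 2026-03-10 | 💻 cs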

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

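This paper proposes the EyExIn framework, which combines expert-aware dual-stream encoding, semantic-adaptive gated fusion, and adaptive deep expert injection. It addresses hallucinations in retinal vision language models that arise when language priors dominate fine-grained pathology perception and reasoning, and it significantly improves the accuracy and trustworthiness of ophthalmic visual question answering.

Shuai Lu, Meng Wang, Jia Guo, Jiawei Du, Bo Liu, Shengzhu Yang, Weihang Zhang, Huazhu Fu, Huiqi Li | 2026-03-10 | 💻 cs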

Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language

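Existing methods for emotion recognition in conversation struggle to capture subtle and complex emotional states. To address this limitation, the paper proposes a new task, Emotion Transcription in Conversation (ETC), and builds a dataset of Japanese natural-language emotion descriptions paired with categorical labels, with the aim of advancing more expressive conversational emotion understanding.

Yoshiki Tanaka, Ryuichi Uehara, Koji Inoue, Michimasa Inaba | 2026-03-10 | 💬 cs.CL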

Fine-Grained Table Retrieval Through the Lens of Complex Queries

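This paper proposes DCTR, a fine-grained table retrieval mechanism that uses fine-grained typed query decomposition and global connectivity awareness. It addresses the retrieval challenges of question answering over relational databases for complex open-domain queries, and on industry benchmarks it proves robust to highly compositional queries and densely connected databases.

Wojciech Kosiuk, Xingyu Ji, Yeounoh Chung, Fatma Özcan, Madelon Hulsebos | 2026-03-10 | 💬 cs.CL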

Improving reasoning at inference time via uncertainty minimisation

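This paper proposes an inference-time method that minimizes uncertainty by maximizing the model's internal "self-certainty". Selection happens at the level of thoughts rather than tokens. With only a small number of samples, the method significantly improves large language model performance on mathematical reasoning tasks, and it shows that certainty in early reasoning steps is a key predictor of final accuracy.

Nicolas Legrand, Kenneth Enevoldsen, Márton Kardos, Kristoffer Nielbo | 2026-03-10 | 💻 cs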

Learning to Rank the Initial Branching Order of SAT Solvers

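This paper proposes a preprocessing method that uses a graph neural network to predict the initial branching order of a SAT solver. It significantly speeds up solving on random 3-CNF and pseudo-industrial benchmarks and generalizes well. On more complex industrial instances the benefit is limited, both because the solver's dynamic heuristics override the initial order and because of the complexity of the instances themselves.

Arvid Eriksson (KTH Royal Institute of Technology), Gabriel Poesia (Kempner Institute at Harvard University), Roman Bresson (Mohamed Bin Zayed University of Artificial Intelligence), Karl Henrik Johansson (KTH Royal Institute of Technology), David Broman (KTH Royal Institute of Technology) | 2026-03-10 | 💻 cs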

From State Changes to Creative Decisions: Documenting and Interpreting Traces Across Creative Domains

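Existing methods for recording traces of creative activity fail to link those traces to intent and high-level creative decisions. This paper proposes three complementary approaches: managing generative AI state through a node-based interface, building a visual vocabulary of creation, and embedding semantic history into interaction state. Together they aim to better capture and interpret creative practice across domains.

Xiaohan Peng, Sotiris Piliouras, Carl Abou Saada Nujaim | 2026-03-10 | 💻 cs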

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

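Targeting execution-layer vulnerabilities of autonomous agents such as prompt injection, this paper proposes a four-layer governance architecture (LGA) comprising sandbox isolation, intent verification, zero-trust authorization, and audit logging. Using a bilingual benchmark and experiments across multiple models, it shows that the architecture blocks the vast majority of malicious tool calls while keeping latency low.

Yuxu Ge | 2026-03-10 | 💻 cs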

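$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving

This paper proposes Re², a method for reinforcement learning with re-solving. It trains large language models to flexibly abandon unproductive reasoning paths and solve the problem again, which significantly improves reasoning performance and mitigates overthinking without requiring supervised fine-tuning.

Pinzheng Wang, Shuli Xu, Juntao Li, Yu Luo, Dong Li, Jianye Hao, Min Zhang | 2026-03-10 | 💻 cs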

A Miniature Brain Transformer: Thalamic Gating, Hippocampal Lateralization, Amygdaloid Salience, and Prefrontal Working Memory in Attention-Coupled Latent Memory

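This paper proposes a miniature brain Transformer architecture that integrates modules simulating brain regions such as the thalamus, amygdala, prefrontal cortex, and cerebellum. Its experiments reveal a counterintuitive key finding: inhibitory callosal coupling alone cannot produce hippocampal lateralization. A prefrontal working memory buffer is needed to break the symmetry, and only then does a sharp phase transition into the lateralized state occur.

Hong Jeong | 2026-03-10 | 💻 cs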

VINO: Video-driven Invariance for Non-contextual Objects via Structural Prior Guided De-contextualization

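This paper proposes the VINO framework, which uses structural priors to generate non-semantic views and builds an asymmetric distillation task. It addresses the contextual shortcut problem in video self-supervised learning caused by co-moving foreground and background, and it learns robust feature representations with strong object-centric invariance.

Seul-Ki Yeom, Marcel Simon, Eunbin Lee, Tae-Ho Kim | 2026-03-10 | 💻 cs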

A Hybrid LTR-based System via Social Context Embedding for Recommending Solutions of Software Bugs in Developer Communities

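This paper proposes a hybrid recommender system based on learning to rank (LTR). It uses deep learning to mine social context embeddings from Stack Overflow, helping developers efficiently retrieve and recommend the most relevant solutions to software bugs in developer communities. It reaches about 78% accuracy when recommending the top 10 answers.

Fouzi Harrag, Mokdad Khemliche | 2026-03-10 | 💻 cs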

LEPA: Learning Geometric Equivariance in Satellite Remote Sensing Data with a Predictive Architecture

Learning When to Cooperate Under Heterogeneous Goals

Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving
