cs.CR Papers | Gist.Science

Unclonable Encryption in the Haar Random Oracle Model

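This paper constructs, in the Haar random oracle model, the first unclonable encryption scheme that supports key reuse and encryption of arbitrary-length messages. The construction rests on a new "unitary reprogramming lemma" and builds on the path-recording framework. It establishes that computationally secure unclonable encryption exists in "MicroCrypt," a world in which one-way functions may not exist.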

James Bartusek, Eli Goldin | Fri, 13 Ma | ⚛️ quant-ph

KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

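Graph-based retrieval-augmented generation (GraphRAG) systems depend on external data, which exposes them to security risks. This paper proposes KEPo, a new poisoning attack that fabricates knowledge-evolution paths to inject poisoned events into the knowledge graph, misleading the large language model into producing the attacker's intended harmful answers. In both single-target and multi-target attack settings, KEPo achieves a higher attack success rate than existing methods.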

Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang | Fri, 13 Ma | 🤖 cs.LG

Strict Optimality of Frequency Estimation Under Local Differential Privacy

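This paper proves that, under local differential privacy, frequency estimators with a symmetric extremal configuration and a specifically optimized support size achieve strictly optimal accuracy. It also proposes a corresponding generation algorithm and an improved Count-Mean Sketch scheme. Both the theoretical derivation and the experiments indicate that the method reaches the theoretical optimum for large dictionaries and is practical.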

Mingen Pan | Fri, 13 Ma | 🔢 math

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

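For autonomous large language model agents such as OpenClaw, this paper proposes a lifecycle security framework covering five stages: initialization, input, reasoning, decision, and execution. It systematically analyzes compound threats such as indirect prompt injection and skill supply-chain contamination, exposes the limitations of existing defenses, and proposes comprehensive mitigation strategies for each stage.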

Xinhao Deng, Yixiang Zhang, Jiaqing Wu, Jiaqi Bai, Sibo Yi, Zhuoheng Zou, Yue Xiao, Rennai Qiu, Jianan Ma, Jialuo Chen, Xiaohu Du, Xiaofang Yang, Shiwen Cui, Changhua Meng, Weiqiang Wang, Jiaxing Song, Ke Xu, Qi Li | Fri, 13 Ma | 🤖 cs.AI

Exponential-Family Membership Inference: From LiRA and RMIA to BaVarIA

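This paper shows that mainstream membership inference attacks such as LiRA, RMIA, and BASE all fall within an exponential-family log-likelihood-ratio framework, each with different distributional assumptions. Building on this, it proposes BaVarIA, a Bayesian variance inference attack based on conjugate priors, which addresses the variance-estimation bottleneck under small shadow-model budgets and delivers stable performance that exceeds existing methods across multiple datasets and budget settings.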

Rickard Brännvall | Fri, 13 Ma | 🤖 cs.LG

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

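This paper identifies the "trusted executor dilemma" facing high-privilege LLM agents, which cannot distinguish malicious instructions from legitimate documentation. Using the ReadSecBench benchmark, it confirms that instruction injection embedded in documentation can achieve a data-leakage success rate of up to 85%, and that existing defenses struggle to mitigate this structural threat without producing false positives.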

Ching-Yu Kao, Xinfeng Li, Shenyu Dai, Tianze Qiu, Pengcheng Zhou, Eric Hanchen Jiang, Philip Sperl | Fri, 13 Ma | 🤖 cs.AI

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

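This paper proposes "Mirror," a data-curation design pattern that builds a strictly paired 32-cell mirror topology to train a lightweight linear classifier. It shows that, for first-layer screening in prompt injection detection, strict data geometry does more than model scale to deliver a millisecond-latency, high-recall, auditable defense.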

J Alex Corll | Fri, 13 Ma | 🤖 cs.AI

On the Possible Detectability of Image-in-Image Steganography

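This paper shows that image-in-image steganography schemes are highly detectable because of the mixing introduced by the embedding process. It proposes a simple, interpretable steganalysis method based on higher-order moments of independent component analysis. In experiments the method distinguishes cover images from stego images with accuracy of up to 84.6%, and these schemes also prove highly detectable by conventional steganalysis.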

Antoine Mallet (CRIStAL), Patrick Bas (CRIStAL) | Fri, 13 Ma | ⚡ eess

Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks

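Using a harmful-knowledge dataset and nine harmless tasks, this study systematically evaluates how mainstream large language models behave when the user input to a harmless task contains harmful content. It finds that even the latest models, including GPT-5.2 and Gemini-3-Pro, often fail to refuse such content as an ethically aware human would, exposing a significant gap in content-level ethical alignment.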

Junjie Chu, Yiting Qu, Ye Leng, Michael Backes, Yun Shen, Savvas Zannettou, Yang Zhang | Fri, 13 Ma | 🤖 cs.AI

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

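This paper proposes the delayed backdoor attack (DBA), a new threat paradigm that introduces a temporal dimension to decouple malicious behavior from trigger exposure. It designs a mechanism based on nonlinear decay (DND) that uses common words as triggers, achieving a high attack success rate after a controllable delay while maintaining high clean accuracy and evading existing defenses.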

Zikang Ding, Haomiao Yang, Meng Hao, Wenbo Jiang, Kunlan Xiang, Runmeng Du, Yijing Liu, Ruichen Zhang, Dusit Niyato | Fri, 13 Ma | 🤖 cs.AI

HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

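This paper introduces HomeSafe-Bench, a benchmark for evaluating how well vision-language models detect unsafe actions in household scenarios. It also designs HD-Guard, a hierarchical dual-brain architecture that balances real-time inference efficiency against the accuracy of deep multimodal detection.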

Jiayue Pu, Zhongxiang Sun, Zilu Zhang, Xiao Zhang, Jun Xu | Fri, 13 Ma | 🤖 cs.AI

Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems

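This paper proposes "Cascade," a framework that systematically composes traditional software vulnerabilities (such as code injection) with hardware attacks (such as Rowhammer or timing attacks) to amplify threats against compound AI systems. The composed attacks achieve security violations such as jailbreaks or data leakage without modifying the model itself.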

Sarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner, Jose Sanchez Vicarte, Mohit Tiwari | Fri, 13 Ma | 🤖 cs.AI

Understanding Disclosure Risk in Differential Privacy with Applications to Noise Calibration and Auditing (Extended Version)

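This paper argues that reconstruction robustness (ReRo) can be misleading when used to assess differential privacy risk, and proposes a unified metric, "reconstruction advantage." By establishing tight bounds between noise and attack advantage, it enables more precise noise calibration and system auditing.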

Patricia Guerra-Balboa, Annika Sauer, Héber H. Arcolezi, Thorsten Strufe | Fri, 13 Ma | 🔢 math

cs.CR

Security Considerations for Artificial Intelligence Agents

STAMP: Selective Task-Aware Mechanism for Text Privacy

Detecting LLM-Generated Peer Reviews

Integer Factorization via Tensor Network Schnorr's Sieving

PrometheusFree: Concurrent Detection of Laser Fault Injection Attacks in Optical Neural Networks

Probabilistic Counters for Privacy Preserving Data Aggregation

Automated TEE Adaptation with LLMs: Identifying, Transforming, and Porting Sensitive Functions in Programs