cs.AI papers | Gist.Science

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

This position paper reframes multi-agent memory as a computer architecture challenge by proposing a three-layer hierarchy and identifying critical protocol gaps, with a specific focus on resolving multi-agent memory consistency as the primary obstacle to building reliable and scalable collaborative systems.

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, Jishen Zhao | 2026-03-12 | 🤖 cs.AI

The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification

This paper introduces the Epistemic Support-Point Filter (ESPF), a provably optimal evidence-only filter that combines Jaynesian maximum entropy propagation with Popperian falsification updates to minimize worst-case epistemic ignorance, outperforming Bayesian approaches while recovering the Kalman filter in the Gaussian limit.

Moriba Kemessia Jah | 2026-03-12 | 🔢 math

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

This paper introduces HTMuon, a heavy-tailed spectral correction method that improves upon the Muon optimizer by preserving parameter interdependencies while inducing heavier-tailed weight spectra, resulting in consistent performance gains in LLM pretraining and image classification alongside theoretical convergence guarantees.

Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang | 2026-03-12 | 🤖 cs.LG

ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models

This paper introduces ADVERSA, an automated red-teaming framework that evaluates LLM safety by measuring continuous guardrail degradation and judge reliability across multi-turn interactions, revealing that successful jailbreaks in frontier models often occur early in conversations rather than accumulating over sustained adversarial pressure.

Harry Owiredu-Ashley | 2026-03-12 | 🤖 cs.AI

Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models

This paper pioneers the application of sparse autoencoders to the Chronos-T5 time series foundation model, revealing a depth-dependent causal hierarchy where mid-encoder features responsible for change detection are more critical to forecasting accuracy than the semantically rich but less causally influential features in the final encoder layer.

Anurag Mishra | 2026-03-12 | 🤖 cs.LG

Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation

This study evaluates 319 LLM-generated patches for 64 Java security vulnerabilities, revealing that while models often preserve functionality, they frequently fail to address security issues due to semantic misunderstandings, resulting in a low overall success rate of 24.8% and highlighting the critical need for rigorous validation before deployment.

Amir Al-Maamari | 2026-03-12 | 🤖 cs.AI

Marginals Before Conditionals

This paper demonstrates that neural networks learning a conditional task with K-fold ambiguity first converge to a marginal solution characterized by a log K loss plateau, which persists until an internally assembled selector-routing head triggers a sharp, collective transition to the full conditional solution, a process governed by dataset size and stabilized by gradient noise.

Mihir Sahasrabudhe | 2026-03-12 | 🤖 cs.LG

TASER: Task-Aware Spectral Energy Refine for Backdoor Suppression in UAV Swarms Decentralized Federated Learning

This paper proposes TASER, a decentralized defense framework for UAV swarms that mitigates stealthy backdoor attacks in Federated Learning by leveraging spectral energy refinement to structurally disrupt malicious tasks while preserving main-task accuracy, offering a more efficient alternative to complex outlier detection methods.

Sizhe Huang, Shujie Yang | 2026-03-12 | 🤖 cs.AI

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models

The paper introduces "Amnesia," a lightweight adversarial attack that manipulates the internal activation states of open-weight Large Language Models to bypass existing safety mechanisms and induce harmful or antisocial behaviors without requiring any fine-tuning.

Ali Raza, Gurang Gupta, Nikolay Matyunin, Jibesh Patra | 2026-03-12 | 🤖 cs.AI

Digging Deeper: Learning Multi-Level Concept Hierarchies

This paper introduces Multi-Level Concept Splitting (MLCS) and Deep-HiCEMs to overcome the limitations of shallow hierarchies in concept-based models by automatically discovering multi-level concept structures from coarse annotations and enabling effective interventions at various levels of abstraction.

Oscar Hill, Mateo Espinosa Zarlenga, Mateja Jamnik | 2026-03-12 | 🤖 cs.LG

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

KernelSkill is a multi-agent framework that enhances GPU kernel optimization by replacing opaque LLM heuristics with a knowledge-driven, dual-level memory architecture of expert skills, achieving state-of-the-art speedups and a 100% success rate on KernelBench.

Qitong Sun, Jun Han, Tianlin Li, Zhe Tang, Sheng Chen, Fei Yang, Aishan Liu, Xianglong Liu, Yang Liu | 2026-03-12 | 🤖 cs.LG

ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping

The paper introduces ES-dLLM, a training-free inference acceleration framework for Diffusion Large Language Models that significantly boosts throughput by dynamically skipping tokens in early layers based on intermediate representation variations and confidence scores, achieving up to a 16.8$\times$ speedup while maintaining generation quality.

Zijian Zhu, Fei Ren, Zhanhong Tan, Kaisheng Ma | 2026-03-12 | 🤖 cs.LG

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

This paper proposes a "multi-stream perturbation attack" that exploits vulnerabilities in the step-by-step reasoning of thinking-mode LLMs by interweaving multiple task streams to disrupt safety alignment, causing high attack success rates and inducing reasoning collapse or repetitive outputs across various models.

Fan Yang | 2026-03-12 | 🤖 cs.AI

Execution Is the New Attack Surface: Survivability-Aware Agentic Crypto Trading with OpenClaw-Style Local Executors

This paper proposes Survivability-Aware Execution (SAE), a middleware framework for OpenClaw-style agentic crypto trading systems that enforces non-bypassable invariants like exposure budgets and order-rate limits to mitigate execution-induced losses from untrusted prompts or compromised skills, demonstrating significant reductions in maximum drawdown and risk metrics through offline replay testing.

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Serhii Hovorov, Sofiia Pidturkina | 2026-03-12 | 🤖 cs.AI

Equivariant Asynchronous Diffusion: An Adaptive Denoising Schedule for Accelerated Molecular Conformation Generation

This paper introduces Equivariant Asynchronous Diffusion (EAD), a novel model that combines the strengths of auto-regressive and synchronous approaches through an adaptive denoising schedule to effectively capture molecular hierarchy and achieve state-of-the-art 3D molecular conformation generation.

Junyi An, Chao Qu, Yun-Fei Shi, Zhijian Zhou, Fenglei Cao, Yuan Qi | 2026-03-12 | 🧬 q-bio

Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models

This paper introduces Code-Space Response Oracles (CSRO), a novel framework that replaces black-box deep reinforcement learning oracles with Large Language Models to generate human-readable, interpretable multi-agent policies as code, achieving competitive performance while enabling the discovery of complex, explainable strategies.

Daniel Hennes, Zun Li, John Schultz, Marc Lanctot | 2026-03-12 | 🤖 cs.AI

Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

This paper proposes a hardware-efficient "soft sparsity" paradigm for CNNs that utilizes a Most Significant Bit (MSB) proxy to skip negligible non-zero multiplications, achieving significant MAC and power reductions with zero accuracy loss while outperforming traditional zero-skipping methods.

Vishal Shashidhar, Anupam Kumari, Roy P Paily | 2026-03-12 | 🤖 cs.LG

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

The paper introduces CLIPO, a method that integrates contrastive learning into policy optimization to generalize Reinforcement Learning with Verifiable Rewards (RLVR) by capturing invariant structures across correct reasoning paths, thereby mitigating hallucinations and improving the generalization and robustness of Large Language Models.

Sijia Cui, Pengyu Cheng, Jiajun Song, Yongbo Gai, Guojun Zhang, Zhechao Yu, Jianhe Lin, Xiaoxi Jiang, Guanjun Jiang | 2026-03-12 | 🤖 cs.LG

Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

This paper argues that the "Lost in the Middle" phenomenon in large language models is an inherent geometric property of causal decoder architectures, present at initialization: the interplay of causal masking and residual connections creates a structurally hostile "dead zone" in the middle of the context, and this bias persists even after standard pretraining.

Borun D Chowdhury | 2026-03-12 | 🤖 cs.LG

AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models

The paper proposes AR-VLA, a standalone autoregressive Action Expert that maintains long-lived memory to generate continuous, context-aware action sequences, effectively addressing the frequency mismatch between fast control and slow reasoning while outperforming traditional reactive Vision-Language-Action models in trajectory smoothness and task success.

Yutong Hu, Jan-Nico Zaech, Nikolay Nikolov, Yuanqi Yao, Sombit Dey, Giuliano Albanese, Renaud Detry, Luc Van Gool, Danda Paudel | 2026-03-12 | 🤖 cs.AI
