cs.LG papers | Gist.Science

ContextBench: Modifying Contexts for Targeted Latent Activation

This paper introduces ContextBench, a benchmark for evaluating methods that generate fluent inputs to trigger specific latent features in language models, and demonstrates that enhanced Evolutionary Prompt Optimization variants achieve state-of-the-art performance in balancing elicitation strength with linguistic fluency.

Robert Graham, Edward Stevinson, Leo Richter, Alexander Chia, Joseph Miller, Joseph Isaac Bloom2026-03-09🤖 cs.AI

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

The paper introduces Sysformer, a novel approach that safeguards frozen large language models by learning to adapt system prompts in the embedding space to significantly improve safety robustness against harmful inputs and jailbreaking attacks without requiring costly model fine-tuning.

Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar2026-03-09🤖 cs.AI

SPoT: Subpixel Placement of Tokens in Vision Transformers

The paper introduces SPoT, a novel tokenization strategy that positions Vision Transformer tokens continuously within images via an oracle-guided search to overcome grid-based limitations, significantly reducing the required token count while improving performance and interpretability.

Martine Hjelkrem-Tan, Marius Aasan, Gabriel Y. Arteaga, Adín Ramírez Rivera2026-03-09🤖 cs.LG

Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding

This paper introduces Quantifying Cross-Attention Interaction (QCAI), a novel post-hoc explainable AI method that interprets cross-attention mechanisms in encoder-decoder transformers to improve the understanding of TCR-pMHC binding, achieving state-of-the-art performance on the newly established TCR-XAI benchmark of 274 experimentally determined structures.

Jiarui Li, Zixiang Yin, Haley Smith, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu2026-03-09🤖 cs.LG

Temporal Misalignment Attacks against Multimodal Perception in Autonomous Driving

This paper introduces DejaVu, a novel attack that exploits in-vehicular network vulnerabilities to induce subtle temporal misalignments between camera and LiDAR streams, thereby severely degrading multimodal fusion-based perception tasks like object detection and tracking in autonomous driving systems.

Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Ning Zhang, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou2026-03-09🤖 cs.LG

Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

This paper proposes a novel student-teacher framework for autonomous driving that utilizes a graph-based multi-agent RL teacher to automatically generate diverse, adaptive traffic curricula, enabling a student agent to achieve superior robustness and balanced driving performance compared to traditional rule-based approaches.

Ahmed Abouelazm, Johannes Ratz, Philip Schörner, J. Marius Zöllner2026-03-09🤖 cs.LG

Merging Memory and Space: A State Space Neural Operator

The paper proposes the State Space Neural Operator (SS-NO), a parameter-efficient architecture that extends structured state space models with adaptive damping and learnable frequency modulation to achieve state-of-the-art performance in learning solution operators for diverse time-dependent partial differential equations.

Nodens Koren, Samuel Lanthaler2026-03-09🤖 cs.LG

Multivariate Fields of Experts for Convergent Image Reconstruction

This paper introduces Multivariate Fields of Experts, a new image prior framework that generalizes existing methods using multivariate potential functions to achieve fast, interpretable, and theoretically guaranteed convergence in various inverse problems, outperforming univariate models while approaching deep learning performance with significantly fewer parameters and data.

Stanislas Ducotterd, Michael Unser2026-03-09🤖 cs.LG

Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression

This paper provides a theoretical characterization of the Expectation-Maximization algorithm's behavior in overspecified two-component mixed linear regression, establishing that unbalanced initial mixing weights yield linear convergence and optimal statistical accuracy, whereas balanced initial weights result in sublinear convergence and degraded accuracy.

Zhankun Luo, Abolfazl Hashemi2026-03-09🤖 cs.LG

Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

This paper introduces Kernel VICReg, a novel self-supervised learning framework that extends the VICReg objective into a Reproducing Kernel Hilbert Space to capture nonlinear dependencies and improve representation learning performance on datasets with complex geometric structures.

M. Hadi Sepanj, Benyamin Ghojogh, Saed Moradi, Paul Fieguth2026-03-09🤖 cs.LG

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

The paper introduces ScaleZero, a multi-task planning model that combines a Mixture-of-Experts architecture to mitigate gradient conflicts and an online Dynamic Parameter Scaling strategy to efficiently allocate capacity, achieving performance comparable to specialized single-task agents with significantly reduced sample complexity.

Yuan Pu, Yazhe Niu, Jia Tang, Junyu Xiong, Shuai Hu, Hongsheng Li2026-03-09🤖 cs.LG

Quantum parameter estimation with uncertainty quantification from continuous measurement data using neural network ensembles

This paper demonstrates that deep neural network ensembles enable accurate, real-time quantum parameter estimation with well-calibrated uncertainty quantification and drift detection, offering a faster alternative to traditional Bayesian inference methods without sacrificing accuracy.

Amanuel Anteneh2026-03-09⚛️ quant-ph

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

This paper introduces "Answer-Then-Check," a novel safety alignment method that enhances LLM robustness against jailbreak attacks by training models to generate direct answers internally and then critically evaluate their safety before responding, achieving superior protection with reduced over-refusal while maintaining general reasoning capabilities through the newly constructed 80K-sample ReSA dataset.

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li2026-03-09🤖 cs.AI

VEGA: Electric Vehicle Navigation Agent via Physics-Informed Neural Operator and Proximal Policy Optimization

VEGA is an electric vehicle navigation system that combines a physics-informed neural operator for real-time vehicle parameter estimation with a Proximal Policy Optimization agent for efficient, charge-aware route and charging stop planning, demonstrating superior inference speed and generalization across international road networks compared to traditional energy-aware baselines.

Hansol Lim, Minhyeok Im, Jonathan Boyack, Jee Won Lee, Jongseong Brad Choi2026-03-09🤖 cs.LG

Spectral/Spatial Tensor Atomic Cluster Expansion with Universal Embeddings in Cartesian Space

This paper introduces the Tensor Atomic Cluster Expansion (TACE), a unified atomistic machine learning framework that employs irreducible Cartesian tensors to efficiently model both scalar and tensorial observables without complex angular-momentum coupling, demonstrating robust accuracy and scalability across diverse chemical systems and tasks.

Zemin Xu, Wenbo Xie, P. Hu2026-03-09🔬 cond-mat.mtrl-sci

C^2Prompt: Class-aware Client Knowledge Interaction for Federated Continual Learning

This paper proposes C²Prompt, a novel federated continual learning method that mitigates temporal and spatial forgetting by introducing a local class distribution compensation mechanism and a class-aware prompt aggregation scheme to enhance class-wise knowledge coherence across distributed clients.

Kunlun Xu, Yibo Feng, Jiangmeng Li, Yongsheng Qi, Jiahuan Zhou2026-03-09🤖 cs.LG

Auto-Regressive U-Net for Full-Field Prediction of Shrinkage-Induced Damage in Concrete

This paper proposes a computationally efficient dual-network architecture combining an auto-regressive U-Net and a CNN to predict time-dependent full-field damage evolution and key mechanical properties in concrete, thereby enabling insights into aggregate effects and optimizing mix designs for improved durability.

Liya Gaynutdinova, Petr Havlásek, Ondřej Rokoš, Fleur Hendriks, Martin Doškář2026-03-09🤖 cs.LG

Taxonomy-aware Dynamic Motion Generation on Hyperbolic Manifolds

This paper introduces GPHDM, a novel framework that extends Gaussian Process Dynamical Models to hyperbolic manifolds to generate physically consistent, human-like robot motions by preserving the hierarchical taxonomy and temporal dynamics of movement.

Luis Augenstein, Noémie Jaquier, Tamim Asfour, Leonel Rozo2026-03-09🤖 cs.LG

Planner Aware Path Learning in Diffusion Language Models Training

This paper addresses the training-inference mismatch in diffusion language models caused by planner-based sampling strategies by deriving a new Planned Evidence Lower Bound (P-ELBO) and introducing Planner Aware Path Learning (PAPL), a simple training modification that aligns training with planned inference to achieve significant performance gains across protein, text, and code generation tasks.

Fred Zhangzhi Peng, Zachary Bezemek, Jarrid Rector-Brooks, Shuibai Zhang, Anru R. Zhang, Michael Bronstein, Alexander Tong, Avishek Joey Bose2026-03-09🤖 cs.LG

Diffusion Alignment as Variational Expectation-Maximization

The paper introduces Diffusion Alignment as Variational Expectation-Maximization (DAV), an iterative framework that alternates between test-time search for diverse, reward-aligned samples and model refinement to optimize diffusion models for downstream objectives while mitigating reward over-optimization and mode collapse.

Jaewoo Lee, Minsu Kim, Sanghyeok Choi, Inhyuck Song, Sujin Yun, Hyeongyu Kang, Woocheol Shin, Taeyoung Yun, Kiyoung Om, Jinkyoo Park2026-03-09🤖 cs.LG

← Previous Next →