cs.LG papers | Gist.Science

Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks

This paper proposes a scalable Heterogeneous Graph Neural Network (HGNN) that employs a multi-task learning paradigm to simultaneously perform particle vertex association and graph pruning, thereby significantly improving beauty hadron reconstruction performance and inference efficiency for complex particle collision events at the Large Hadron Collider.

William Sutcliffe, Marta Calvi, Simone Capelli + 5 more | 2026-03-09 | ⚛️ hep-ex

RM-R1: Reward Modeling as Reasoning

The paper introduces Reasoning Reward Models (ReasRMs), specifically the RM-R1 family, which reformulate reward modeling as a reasoning task using a chain-of-rubrics mechanism and a two-stage training pipeline to achieve superior interpretability and performance compared to existing large-scale models.

Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji | 2026-03-09 | 🤖 cs.AI

Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias

This paper introduces a variant of Polyak's stepsizes for entropic mirror descent to solve linear systems without restrictive domain assumptions, establishing sublinear and linear convergence rates, strengthening $\ell_1$-norm implicit bias bounds, and generalizing results to arbitrary convex $L$-smooth functions while proposing an exponentiation-free alternative method.

Yura Malitsky, Alexander Posch | 2026-03-09 | 🤖 cs.LG

Maximizing Asynchronicity in Event-based Neural Networks

This paper introduces EVA, a novel event-by-event asynchronous-to-synchronous (A2S) framework inspired by language modeling that generates highly expressive features, outperforming prior methods in recognition tasks and achieving state-of-the-art results in detection for event-based vision.

Haiqing Hao, Nikola Zubic, Weihua He, Zhipeng Sui, Davide Scaramuzza, Wenhui Wang | 2026-03-09 | 🤖 cs.AI

ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

The paper introduces ESGenius, the first comprehensive benchmark comprising a curated corpus of authoritative ESG documents and a rigorously validated question-answer dataset, which reveals that while large language models exhibit moderate zero-shot performance in sustainability domains, their accuracy significantly improves when grounded in retrieval-augmented generation (RAG) using the provided source materials.

Chaoyue He, Xin Zhou, Yi Wu + 9 more | 2026-03-09 | 💬 cs.CL

ContextBench: Modifying Contexts for Targeted Latent Activation

This paper introduces ContextBench, a benchmark for evaluating methods that generate fluent inputs to trigger specific latent features in language models, and demonstrates that enhanced Evolutionary Prompt Optimization variants achieve state-of-the-art performance in balancing elicitation strength with linguistic fluency.

Robert Graham, Edward Stevinson, Leo Richter, Alexander Chia, Joseph Miller, Joseph Isaac Bloom | 2026-03-09 | 🤖 cs.AI

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

The paper introduces Sysformer, a novel approach that safeguards frozen large language models by learning to adapt system prompts in the embedding space to significantly improve safety robustness against harmful inputs and jailbreaking attacks without requiring costly model fine-tuning.

Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar | 2026-03-09 | 🤖 cs.AI

SPoT: Subpixel Placement of Tokens in Vision Transformers

The paper introduces SPoT, a novel tokenization strategy that positions Vision Transformer tokens continuously within images via an oracle-guided search to overcome grid-based limitations, significantly reducing the required token count while improving performance and interpretability.

Martine Hjelkrem-Tan, Marius Aasan, Gabriel Y. Arteaga, Adín Ramírez Rivera | 2026-03-09 | 🤖 cs.LG

Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding

This paper introduces Quantifying Cross-Attention Interaction (QCAI), a novel post-hoc explainable AI method that interprets cross-attention mechanisms in encoder-decoder transformers to improve the understanding of TCR-pMHC binding, achieving state-of-the-art performance on the newly established TCR-XAI benchmark of 274 experimentally determined structures.

Jiarui Li, Zixiang Yin, Haley Smith, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu | 2026-03-09 | 🤖 cs.LG

Temporal Misalignment Attacks against Multimodal Perception in Autonomous Driving

This paper introduces DejaVu, a novel attack that exploits in-vehicular network vulnerabilities to induce subtle temporal misalignments between camera and LiDAR streams, thereby severely degrading multimodal fusion-based perception tasks like object detection and tracking in autonomous driving systems.

Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Ning Zhang, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou | 2026-03-09 | 🤖 cs.LG

Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

This paper proposes a novel student-teacher framework for autonomous driving that utilizes a graph-based multi-agent RL teacher to automatically generate diverse, adaptive traffic curricula, enabling a student agent to achieve superior robustness and balanced driving performance compared to traditional rule-based approaches.

Ahmed Abouelazm, Johannes Ratz, Philip Schörner, J. Marius Zöllner | 2026-03-09 | 🤖 cs.LG

Merging Memory and Space: A State Space Neural Operator

The paper proposes the State Space Neural Operator (SS-NO), a parameter-efficient architecture that extends structured state space models with adaptive damping and learnable frequency modulation to achieve state-of-the-art performance in learning solution operators for diverse time-dependent partial differential equations.

Nodens Koren, Samuel Lanthaler | 2026-03-09 | 🤖 cs.LG

Multivariate Fields of Experts for Convergent Image Reconstruction

This paper introduces Multivariate Fields of Experts, a new image prior framework that generalizes existing methods using multivariate potential functions. It yields fast, interpretable reconstruction with theoretically guaranteed convergence in various inverse problems, outperforming univariate models while approaching deep learning performance with significantly fewer parameters and less data.

Stanislas Ducotterd, Michael Unser | 2026-03-09 | 🤖 cs.LG

Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression

This paper provides a theoretical characterization of the Expectation-Maximization algorithm's behavior in overspecified two-component mixed linear regression, establishing that unbalanced initial mixing weights yield linear convergence and optimal statistical accuracy, whereas balanced initial weights result in sublinear convergence and degraded accuracy.

Zhankun Luo, Abolfazl Hashemi | 2026-03-09 | 🤖 cs.LG

Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

This paper introduces Kernel VICReg, a novel self-supervised learning framework that extends the VICReg objective into a Reproducing Kernel Hilbert Space to capture nonlinear dependencies and improve representation learning performance on datasets with complex geometric structures.

M. Hadi Sepanj, Benyamin Ghojogh, Saed Moradi, Paul Fieguth | 2026-03-09 | 🤖 cs.LG

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

The paper introduces ScaleZero, a multi-task planning model that combines a Mixture-of-Experts architecture to mitigate gradient conflicts and an online Dynamic Parameter Scaling strategy to efficiently allocate capacity, achieving performance comparable to specialized single-task agents with significantly reduced sample complexity.

Yuan Pu, Yazhe Niu, Jia Tang, Junyu Xiong, Shuai Hu, Hongsheng Li | 2026-03-09 | 🤖 cs.LG

Quantum parameter estimation with uncertainty quantification from continuous measurement data using neural network ensembles

This paper demonstrates that deep neural network ensembles enable accurate, real-time quantum parameter estimation with well-calibrated uncertainty quantification and drift detection, offering a faster alternative to traditional Bayesian inference methods without sacrificing accuracy.

Amanuel Anteneh | 2026-03-09 | ⚛️ quant-ph

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

This paper introduces "Answer-Then-Check," a safety alignment method that enhances LLM robustness against jailbreak attacks by training models to generate direct answers internally and then critically evaluate their safety before responding. Using the newly constructed 80K-sample ReSA dataset, it achieves superior protection with reduced over-refusal while maintaining general reasoning capabilities.

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li | 2026-03-09 | 🤖 cs.AI

VEGA: Electric Vehicle Navigation Agent via Physics-Informed Neural Operator and Proximal Policy Optimization

VEGA is an electric vehicle navigation system that combines a physics-informed neural operator for real-time vehicle parameter estimation with a Proximal Policy Optimization agent for efficient, charge-aware route and charging stop planning, demonstrating superior inference speed and generalization across international road networks compared to traditional energy-aware baselines.

Hansol Lim, Minhyeok Im, Jonathan Boyack, Jee Won Lee, Jongseong Brad Choi | 2026-03-09 | 🤖 cs.LG

Spectral/Spatial Tensor Atomic Cluster Expansion with Universal Embeddings in Cartesian Space

This paper introduces the Tensor Atomic Cluster Expansion (TACE), a unified atomistic machine learning framework that employs irreducible Cartesian tensors to efficiently model both scalar and tensorial observables without complex angular-momentum coupling, demonstrating robust accuracy and scalability across diverse chemical systems and tasks.

Zemin Xu, Wenbo Xie, P. Hu | 2026-03-09 | 🔬 cond-mat.mtrl-sci
