Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

This paper introduces "Answer-Then-Check," a safety alignment method that makes LLMs more robust to jailbreak attacks by training them to first draft a direct answer internally and then critically evaluate its safety before responding. Trained on the newly constructed 80K-sample ReSA dataset, the method improves jailbreak defense and reduces over-refusal while maintaining general reasoning capabilities.

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li · 2026-03-09 · cs.AI

VEGA: Electric Vehicle Navigation Agent via Physics-Informed Neural Operator and Proximal Policy Optimization

VEGA is an electric vehicle navigation system that combines a physics-informed neural operator for real-time vehicle parameter estimation with a Proximal Policy Optimization agent for efficient, charge-aware route and charging stop planning, demonstrating superior inference speed and generalization across international road networks compared to traditional energy-aware baselines.

Hansol Lim, Minhyeok Im, Jonathan Boyack, Jee Won Lee, Jongseong Brad Choi · 2026-03-09 · cs.LG

Auto-Regressive U-Net for Full-Field Prediction of Shrinkage-Induced Damage in Concrete

This paper proposes a computationally efficient dual-network architecture that combines an auto-regressive U-Net and a CNN to predict time-dependent full-field damage evolution and key mechanical properties in concrete, providing insight into aggregate effects and supporting the optimization of mix designs for improved durability.

Liya Gaynutdinova, Petr Havlásek, Ondřej Rokoš, Fleur Hendriks, Martin Doškář · 2026-03-09 · cs.LG

Planner Aware Path Learning in Diffusion Language Models Training

This paper addresses the training-inference mismatch in diffusion language models caused by planner-based sampling strategies by deriving a new Planned Evidence Lower Bound (P-ELBO) and introducing Planner Aware Path Learning (PAPL), a simple training modification that aligns training with planned inference to achieve significant performance gains across protein, text, and code generation tasks.

Fred Zhangzhi Peng, Zachary Bezemek, Jarrid Rector-Brooks, Shuibai Zhang, Anru R. Zhang, Michael Bronstein, Alexander Tong, Avishek Joey Bose · 2026-03-09 · cs.LG

Diffusion Alignment as Variational Expectation-Maximization

The paper introduces Diffusion Alignment as Variational Expectation-Maximization (DAV), an iterative framework that alternates between test-time search for diverse, reward-aligned samples and model refinement to optimize diffusion models for downstream objectives while mitigating reward over-optimization and mode collapse.

Jaewoo Lee, Minsu Kim, Sanghyeok Choi, Inhyuck Song, Sujin Yun, Hyeongyu Kang, Woocheol Shin, Taeyoung Yun, Kiyoung Om, Jinkyoo Park · 2026-03-09 · cs.LG

Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits

This paper addresses the online minimization of polarization and disagreement in the Friedkin-Johnsen opinion dynamics model under incomplete information by proposing a two-stage low-rank matrix bandit algorithm that achieves a cumulative regret of $\widetilde{\mathcal{O}}\big(\max(\tfrac{1}{\kappa},\sqrt{|V|})\sqrt{|V|T}\big)$ through subspace estimation and linear bandit optimization.

Federico Cinus, Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi · 2026-03-09 · cs.LG

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

This paper demonstrates that while standard decoder-only models underperform compared to encoder-only architectures in cross-modal adaptation for partial differential equations, introducing novel bidirectionality-mimicking techniques like Parallel Flipping and Sequence Doubling effectively closes this performance gap.

Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam · 2026-03-09 · cs.LG

Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence

This paper demonstrates that injecting external verification into synthetic data retraining can prevent model collapse and yield near-term improvements, though theoretical analysis and experiments across linear regression, VAEs, and LLMs show that long-term performance ultimately converges to the verifier's knowledge center and may plateau or decline if the verifier is imperfect.

Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu · 2026-03-09 · cs.LG

Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning

This paper presents a real-time online framework that utilizes modified sliding-window Hankel Dynamic Mode Decomposition with singular-value hard thresholding and Cadzow projection to denoise partial measurements and construct predictive models for dynamic obstacle motion, enabling stable, variance-aware forecasting suitable for robotic motion planning.

Stella Kombo, Masih Haseli, Skylar X. Wei, Joel W. Burdick · 2026-03-09 · cs.LG