Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits

This paper addresses the online minimization of polarization and disagreement in the Friedkin-Johnsen opinion dynamics model under incomplete information by proposing a two-stage low-rank matrix bandit algorithm that achieves a cumulative regret of $\widetilde{\mathcal{O}}\big(\max(\tfrac{1}{\kappa},\sqrt{|V|})\sqrt{|V|T}\big)$ through subspace estimation and linear bandit optimization.
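For background, the objective being minimized comes from the standard Friedkin-Johnsen model: with graph Laplacian $L$ and innate opinions $s$, equilibrium opinions are $z^* = (I+L)^{-1}s$, and for mean-centered $s$ the polarization-disagreement index equals $s^\top (I+L)^{-1} s$. A minimal sketch of that objective (the known offline quantity, not the paper's two-stage bandit algorithm):

```python
import numpy as np

def fj_equilibrium(L, s):
    """Friedkin-Johnsen equilibrium opinions z* = (I + L)^{-1} s
    for a graph Laplacian L and innate opinions s."""
    n = len(s)
    return np.linalg.solve(np.eye(n) + L, s)

def polarization_disagreement(L, s):
    """Polarization + disagreement at equilibrium; for mean-centered
    innate opinions this equals s^T (I + L)^{-1} s."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()                  # mean-center innate opinions
    z = fj_equilibrium(L, s)
    pol = float(z @ z)                # polarization: spread of opinions
    dis = float(z @ (L @ z))          # disagreement: sum of edge tensions
    return pol + dis
```

The identity follows because $z^\top z + z^\top L z = z^\top (I+L) z = s^\top (I+L)^{-1} s$ when $z = (I+L)^{-1} s$.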

Federico Cinus, Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi · 2026-03-09 · 🤖 cs.LG

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

This paper demonstrates that while standard decoder-only models underperform compared to encoder-only architectures in cross-modal adaptation for partial differential equations, introducing novel bidirectionality-mimicking techniques like Parallel Flipping and Sequence Doubling effectively closes this performance gap.

Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam · 2026-03-09 · 🤖 cs.LG

Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence

This paper demonstrates that injecting external verification into synthetic data retraining can prevent model collapse and yield near-term improvements, though theoretical analysis and experiments across linear regression, VAEs, and LLMs show that long-term performance ultimately converges to the verifier's knowledge center and may plateau or decline if the verifier is imperfect.

Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu · 2026-03-09 · 🤖 cs.LG

Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning

This paper presents a real-time online framework that utilizes modified sliding-window Hankel Dynamic Mode Decomposition with singular-value hard thresholding and Cadzow projection to denoise partial measurements and construct predictive models for dynamic obstacle motion, enabling stable, variance-aware forecasting suitable for robotic motion planning.
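As a rough illustration of the denoise-and-predict idea (not the paper's modified sliding-window variant with Cadzow projection), a basic Hankel-DMD one-step predictor with singular-value hard thresholding might look like:

```python
import numpy as np

def hankel(signal, depth):
    """Stack depth-long sliding windows of a 1-D signal into a Hankel matrix."""
    n = len(signal) - depth + 1
    return np.column_stack([signal[i:i + depth] for i in range(n)])

def dmd_predict(signal, depth=8, tol=1e-10):
    """One-step prediction via a rank-truncated least-squares fit between
    shifted Hankel snapshot matrices (hard singular-value thresholding)."""
    H = hankel(np.asarray(signal, dtype=float), depth)
    X, Y = H[:, :-1], H[:, 1:]               # snapshots and one-step shifts
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))  # hard threshold: keep dominant modes
    A = Y @ Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T  # linear propagator
    return (A @ H[:, -1])[-1]                # newest entry of propagated window
```

On exactly linear dynamics such as a geometric sequence, the rank-1 truncation recovers the propagator and the prediction is exact; on noisy partial measurements, the thresholding is what provides the denoising.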

Stella Kombo, Masih Haseli, Skylar X. Wei, Joel W. Burdick · 2026-03-09 · 🤖 cs.LG

FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle

The paper introduces FireScope, a novel VLM-based framework and accompanying FireScope-Bench dataset that leverage chain-of-thought reasoning to significantly improve the generalization, interpretability, and accuracy of cross-continental wildfire risk prediction by integrating visual, climatic, and geographic factors.

Mario Markov (INSAIT, Sofia University "St. Kliment Ohridski"), Stefan Maria Ailuro (INSAIT, Sofia University "St. Kliment Ohridski"), Luc Van Gool (INSAIT, Sofia University "St. Kliment Ohridski"), Konrad Schindler (ETH Zurich), Danda Pani Paudel (INSAIT, Sofia University "St. Kliment Ohridski") · 2026-03-09 · 🤖 cs.LG

SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization

The paper proposes SPINE, a token-selective test-time reinforcement learning framework that improves reasoning model performance by updating only high-entropy decision-critical tokens with entropy-band regularization, thereby preventing response collapse and enhancing stability without requiring external labels or reward models.
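The token-selection idea can be sketched in a few lines: compute per-position predictive entropy from the logits and keep only positions whose entropy falls inside a band, so updates touch high-uncertainty decision tokens but skip the extremes. The band limits `low`/`high` below are illustrative placeholders, not the paper's values:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def select_tokens(logits, low=0.5, high=2.0):
    """Boolean mask over token positions whose predictive entropy lies
    in the band [low, high]; only these positions would receive updates."""
    p = softmax(np.asarray(logits, dtype=float))
    ent = -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=-1)
    return (ent >= low) & (ent <= high)
```

Near-deterministic tokens (entropy close to 0) and near-uniform tokens (entropy close to log of vocabulary size) both fall outside the band and are left untouched.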

Jianghao Wu, Yasmeen George, Jin Ye, Yicheng Wu, Daniel F. Schmidt, Jianfei Cai · 2026-03-09 · 🤖 cs.AI

Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function

This paper introduces Soft Q-based Diffusion Finetuning (SQDF), a novel KL-regularized reinforcement learning method that employs a reparameterized policy gradient of a training-free soft Q-function, enhanced by discount factors, consistency models, and off-policy replay buffers, to effectively align diffusion models with downstream objectives while mitigating reward over-optimization and preserving sample diversity.

Hyeongyu Kang, Jaewoo Lee, Woocheol Shin, Kiyoung Om, Jinkyoo Park · 2026-03-09 · 🤖 cs.AI

Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity

This paper proposes a novel training framework that leverages the $\alpha$-divergence family to explicitly filter incorrect answers and control the precision-diversity trade-off, thereby overcoming the diversity loss inherent in standard Reinforcement Learning and achieving state-of-the-art performance on the Lean theorem-proving benchmark.
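The knob behind the precision-diversity trade-off is the standard Amari $\alpha$-divergence, $D_\alpha(p\|q) = \frac{1}{\alpha(1-\alpha)}\big(1 - \sum_x p(x)^\alpha q(x)^{1-\alpha}\big)$, which interpolates between mode-seeking reverse KL ($\alpha \to 0$) and mass-covering forward KL ($\alpha \to 1$). A minimal sketch of the divergence itself (background math only, not the paper's training loop):

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence D_alpha(p || q) over discrete distributions.
    alpha -> 1 recovers KL(p || q) (mass-covering); alpha -> 0 recovers
    KL(q || p) (mode-seeking)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 0.0):
        return float(np.sum(q * np.log(q / p)))
    return float((1.0 - np.sum(p**alpha * q**(1.0 - alpha))) / (alpha * (1.0 - alpha)))
```

At $\alpha = \tfrac12$ the divergence is symmetric in its arguments, sitting exactly between the two KL extremes.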

Germán Kruszewski, Pierre Erbacher, Jos Rozen, Marc Dymetman · 2026-03-09 · 🤖 cs.AI

DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection

DFIR-DETR is a transformer-based small object detector that addresses key limitations in standard architectures by introducing Dynamic Content-Feature Aggregation for adaptive attention, a norm-preserving Dynamic Feature Pyramid Network for detail recovery, and a Frequency-domain Iterative Refinement module to preserve high-frequency boundaries, achieving state-of-the-art performance on NEU-DET and VisDrone benchmarks with high efficiency.
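The frequency-domain refinement idea can be illustrated schematically (the paper's module is learned; this is only the underlying decomposition): split a feature map into low- and high-frequency parts with an FFT mask, where the high-frequency residual carries the boundary detail that small-object detection needs to preserve.

```python
import numpy as np

def highfreq_residual(feat, cutoff=0.25):
    """Split a 2-D feature map with a radial FFT mask and return the
    high-frequency residual (boundary detail); cutoff is illustrative."""
    F = np.fft.fftshift(np.fft.fft2(feat))      # center the spectrum
    h, w = feat.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * (r <= cutoff))))
    return feat - low                           # high-frequency residual
```

A constant map has no high-frequency content, so its residual is zero, and applying the split twice changes nothing, since the residual already lives entirely in the high band.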

Bo Gao, Jingcheng Tong, Xingsheng Chen, Han Yu, Zichen Li · 2026-03-09 · 🤖 cs.LG