Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning

This paper introduces Quality over Quantity (QoQ), a systematic framework that leverages influence functions to automatically curate high-quality robot learning demonstrations by quantifying each sample's contribution to reducing validation loss, thereby significantly improving policy performance over manual or heuristic data selection methods.

Haeone Lee, Taywon Min, Junsu Kim, Sinjae Kang, Fangchen Liu, Lerrel Pinto, Kimin Lee · 2026-03-11 · cs.LG
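As a generic illustration of the influence-function machinery the summary above refers to (a toy linear-regression sketch, not the paper's QoQ implementation; all variable names and data are illustrative), the classic first-order estimate scores each training point z_i by how much its removal would change validation loss: I(z_i) ≈ -∇L_val(θ)ᵀ H⁻¹ ∇L_i(θ).

```python
import numpy as np

# Toy linear regression: score each training sample by its estimated effect
# on validation loss, using the first-order influence approximation
#   I(z_i) ≈ -grad_val(theta)^T H^{-1} grad_i(theta).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)
Xv = rng.normal(size=(10, 3))
yv = Xv @ w_true + 0.1 * rng.normal(size=10)

# Fit by least squares (loss = mean squared error).
w = np.linalg.lstsq(X, y, rcond=None)[0]

H = 2.0 * X.T @ X / len(X)                    # Hessian of the training loss
g_val = 2.0 * Xv.T @ (Xv @ w - yv) / len(Xv)  # gradient of validation loss
Hinv_gval = np.linalg.solve(H, g_val)

# Per-sample training gradients, shape (n_train, n_features).
g_train = 2.0 * X * (X @ w - y)[:, None]
influence = -g_train @ Hinv_gval              # higher = more helpful sample

ranked = np.argsort(influence)                # most harmful samples first
print(influence.shape)
```

Curation then keeps the highest-influence samples and drops the rest; the expensive part at scale is the H⁻¹-vector product, typically approximated rather than solved exactly.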

Adaptive Active Learning for Online Reliability Prediction of Satellite Electronics

This paper proposes a novel integrated online reliability prediction framework for satellite electronics that combines a Wiener process-based degradation model with a two-stage adaptive active learning strategy to significantly improve prediction accuracy while reducing data requirements under limited and variable operational conditions.

Shixiang Li, Yubin Tian, Dianpeng Wang, Piao Chen, Mengying Ren · 2026-03-11 · cs.LG
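To make the Wiener-process degradation model concrete (a Monte Carlo sketch with illustrative parameters, not the paper's calibrated model or its active-learning loop): degradation follows X(t) = μt + σB(t), and reliability at time t is the probability the path is still below a failure threshold D.

```python
import numpy as np

# Monte Carlo sketch of a Wiener-process degradation model.
# mu = drift, sigma = diffusion, D = failure threshold (illustrative values).
rng = np.random.default_rng(1)
mu, sigma, D = 0.5, 0.2, 5.0

t = np.linspace(0.0, 12.0, 121)
dt = t[1] - t[0]

# Simulate 10,000 degradation paths X(t) = mu*t + sigma*B(t) via increments.
increments = mu * dt + sigma * np.sqrt(dt) * rng.normal(size=(10_000, len(t) - 1))
X = np.cumsum(increments, axis=1)

# Reliability at each time step = fraction of paths still below the threshold.
reliability = (X < D).mean(axis=0)
print(reliability[0], reliability[-1])
```

With drift 0.5 the mean degradation reaches the threshold D = 5 around t = 10, so reliability is near 1 early and drops late; the closed-form alternative uses the inverse-Gaussian first-passage-time distribution.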

Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems

This paper verifies that persistent observers in causally invariant hypergraph substrates satisfy the Conant-Ashby Good Regulator Theorem, thereby necessitating internal models that lead to natural gradient descent as the unique learning rule and yielding a model-dependent closed-form formula for Vanchurin's regime parameter α with a quantum-classical threshold at κ(F) = 2.

Max Zhuravlev · 2026-03-11 · cs.LG
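For readers unfamiliar with the natural gradient descent the summary names as the unique learning rule, the update preconditions the ordinary gradient by the inverse Fisher metric F: θ ← θ - η F⁻¹∇L(θ). A minimal numeric sketch (illustrative metric and loss, not the paper's hypergraph setup; here the metric happens to match the loss curvature, so κ(F) = 4 for both):

```python
import numpy as np

# Natural gradient descent on a toy quadratic loss L(x) = 0.5 x^T A x,
# preconditioning each step by a fixed Fisher-style metric F.
F = np.array([[4.0, 0.0], [0.0, 1.0]])   # metric with condition number kappa(F) = 4
A = np.array([[4.0, 0.0], [0.0, 1.0]])   # loss curvature (same for this toy)

x = np.array([1.0, 1.0])
eta = 0.5
for _ in range(20):
    grad = A @ x                          # ordinary gradient
    x = x - eta * np.linalg.solve(F, grad)  # natural gradient step: F^{-1} grad

print(np.round(x, 6))  # converges to the minimum at the origin
```

Because F⁻¹ cancels the anisotropy of the loss, both coordinates shrink at the same rate; plain gradient descent with one learning rate would converge unevenly across the two directions.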

PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing

This paper proposes a Reconfigurable Intelligent Surface (RIS)-aided semantic-aware Vehicular Edge Computing framework that utilizes a Proximal Policy Optimization (PPO) and Linear Programming (LP) hybrid scheme to jointly optimize offloading ratios, semantic symbols, and RIS phase shifts, achieving a 40–50% reduction in end-to-end latency compared to existing methods.

Wei Feng, Jingbo Zhang, Qiong Wu, Pingyi Fan, Qiang Fan · 2026-03-11 · cs.LG
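The PPO half of the hybrid scheme rests on the standard clipped surrogate objective, which limits how far one update can move the policy. A minimal sketch of that objective (generic PPO, not the paper's joint-optimization code; inputs are illustrative):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss: -E[min(r*A, clip(r, 1-eps, 1+eps)*A)].

    ratio: pi_new(a|s) / pi_old(a|s) per sample; advantage: estimated A(s, a).
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(ratio * advantage, clipped).mean()

# A large update (ratio 1.5) on a positive advantage is clipped at 1 + eps = 1.2,
# so the gradient incentive to push the ratio further vanishes.
print(ppo_clip_loss(np.array([1.5]), np.array([1.0])))  # → -1.2
```

In hybrid schemes like the one summarized above, PPO typically handles the discrete or non-convex decisions while the LP solves the remaining convex subproblem exactly at each step.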

Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms

This paper identifies and theoretically proves that unmasked policy gradient algorithms systematically suppress valid actions at unvisited states due to parameter sharing and gradient propagation, a failure mode that action masking avoids and that can be mitigated in unmasked settings through feasibility classification.

Renos Zabounidis, Roy Siegelmann, Mohamad Qadri, Woojun Kim, Simon Stepputtis, Katia P. Sycara · 2026-03-11 · cs.LG
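The action masking that the summary contrasts with unmasked training is usually implemented by setting invalid-action logits to -∞ before the softmax, so invalid actions get zero probability and contribute no gradient. A generic sketch (standard masking, not the paper's feasibility-classification mitigation):

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Softmax over logits with invalid actions forced to probability zero."""
    masked = np.where(valid_mask, logits, -np.inf)
    z = masked - masked.max()      # shift for numerical stability
    p = np.exp(z)                  # exp(-inf) = 0, so invalid actions drop out
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
valid = np.array([True, False, True, True])
probs = masked_policy(logits, valid)
print(np.round(probs, 4))  # action 1 (invalid) has probability exactly 0
```

The failure mode the paper analyzes arises when this mask is absent: parameter sharing lets gradients from visited states drive down the logits of actions that are valid elsewhere.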

Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon

This paper proposes a data-driven framework that harmonizes heterogeneous driving cycle data and employs statistical and deep learning models to enable efficient, probabilistic prediction of voltage hysteresis factors in silicon-graphite anode batteries, thereby improving state-of-charge estimation and generalizability across different vehicle models.

Runyao Yu, Viviana Kleine, Philipp Gromotka, Thomas Rudolf, Adrian Eisenmann, Gautham Ram Chandra Mouli, Peter Palensky, Jochen L. Cremer · 2026-03-11 · cs.LG

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

This paper introduces DCPO, a framework that resolves the inherent gradient conflict between accuracy and calibration in Reinforcement Learning from Verifiable Rewards by decoupling reasoning and confidence objectives, thereby achieving state-of-the-art calibration performance without compromising model accuracy.

Zhengzhao Ma, Xueru Wen, Boxi Cao, Yaojie Lu, Hongyu Lin, Jinglin Yang, Min He, Xianpei Han, Le Sun · 2026-03-11 · cs.LG
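Calibration claims like the one above are conventionally measured with Expected Calibration Error (ECE): bin predictions by confidence and weight each bin's |accuracy - mean confidence| gap by its size. A small sketch with toy data (the standard metric, not DCPO's evaluation pipeline):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            total += in_bin.mean() * gap  # weight gap by bin's share of samples
    return total

conf = np.array([0.9, 0.8, 0.7, 0.95, 0.6])   # model's stated confidence
hit = np.array([1, 1, 0, 1, 1], dtype=float)  # whether each answer was correct
print(round(ece(conf, hit), 4))
```

A perfectly calibrated model (70% confidence right 70% of the time, and so on) scores 0; the gradient conflict the paper addresses arises because rewarding accuracy alone tends to push stated confidence toward 1 regardless of correctness.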

Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning

This paper proposes a Probability of Necessity and Sufficiency (PNS)-based regularization method for Class-Incremental Learning that utilizes a dual-scope counterfactual generator to mitigate feature collisions caused by intra-task shortcut reliance and inter-task semantic confusion, thereby ensuring both the causal completeness and separability of task-specific representations.

Zhen Zhang, Jielei Chu, Tianrui Li · 2026-03-11 · cs.AI

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning

RubiCap introduces a novel reinforcement learning framework that leverages LLM-generated rubrics to create structured, multi-faceted reward signals for dense image captioning, thereby overcoming the limitations of supervised distillation and deterministic checkers to achieve state-of-the-art performance and superior word efficiency across various benchmarks.

Tzu-Heng Huang, Sirajul Salekin, Javier Movellan, Frederic Sala, Manjot Bilkhu · 2026-03-11 · cs.AI

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Latent-DARM is a novel latent-space communication framework that bridges Discrete Diffusion Language Models for global planning and Autoregressive Models for fluent execution, significantly improving reasoning accuracy on benchmarks like DART-5 and AIME2024 while drastically reducing token usage compared to state-of-the-art reasoning models.

Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann, Salman Khan, Wei Chen · 2026-03-11 · cs.AI