Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments
This paper introduces Joint MDPs (JMDPs), a formalism that augments standard MDPs with a multi-action sample transition model specifying the joint distribution of counterfactual one-step outcomes. This structure supports the derivation of Bellman operators and convergent dynamic programming algorithms for environments with coupled dynamics.
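The core idea can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's actual construction: a single noise draw `w` drives the outcomes of every action at a state, so the one-step counterfactuals across actions are jointly distributed (here via common random numbers), while ordinary value iteration still operates on the induced marginal transition model.

```python
import random

# Hypothetical toy JMDP: names, dynamics, and rewards are illustrative
# assumptions chosen for this sketch only.
# States 0..4; action 0 = "safe", action 1 = "risky". A shared noise draw w
# determines the outcome of BOTH actions, coupling their counterfactuals.

N_STATES = 5
ACTIONS = (0, 1)
GAMMA = 0.9

def sample_step(s, a, w):
    """Sample transition model evaluated for one action: next state and
    reward as deterministic functions of (state, action, noise)."""
    p_up = 0.8 if a == 0 else 0.5          # action-dependent success prob
    s_next = min(s + 1, N_STATES - 1) if w < p_up else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

def joint_counterfactuals(s, w):
    """Multi-action view: because w is shared, the one-step outcomes of all
    actions at state s are jointly distributed (same w, different actions)."""
    return {a: sample_step(s, a, w) for a in ACTIONS}

def value_iteration(n_samples=4000, iters=60, seed=0):
    """Standard value iteration on Monte Carlo estimates of the marginal
    transition model. Expected return depends only on the marginals; the
    joint model above additionally pins down how counterfactuals co-vary."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_samples)]
    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [
            max(
                sum(
                    (r + GAMMA * V[s2]) / n_samples
                    for s2, r in (sample_step(s, a, w) for w in draws)
                )
                for a in ACTIONS
            )
            for s in range(N_STATES)
        ]
    return V

outcomes = joint_counterfactuals(2, w=0.6)   # one shared draw, both actions
V = value_iteration()
print(outcomes)   # the two actions land in different states under the SAME w
print([round(v, 3) for v in V])
```

Under `w = 0.6`, the safe action succeeds (`0.6 < 0.8`) while the risky one fails (`0.6 >= 0.5`), so the same draw yields different counterfactual next states; a marginal-only MDP model could not express this correlation between the two actions' outcomes.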