MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation

The paper introduces MIRACL, a novel hierarchical Meta-MORL framework that enables few-shot generalization and efficient adaptation for multi-objective multi-echelon supply chain optimization by decomposing tasks into structured subproblems and employing a Pareto-based strategy to achieve superior performance over conventional baselines.

Rifny Rachman, Josh Tingey, Richard Allmendinger, Wei Pan, Pradyumn Shukla, Bahrul Ilmi Nasution · 2026-03-09 · cs.LG

Score-Guided Proximal Projection: A Unified Geometric Framework for Rectified Flow Editing

This paper introduces Score-Guided Proximal Projection (SGPP), a unified geometric framework that reformulates Rectified Flow editing as a proximal optimization problem to overcome the limitations of existing inversion and sampling methods by theoretically guaranteeing manifold convergence while enabling a continuous, training-free trade-off between identity preservation and generative flexibility.

Vansh Bansal, James G Scott · 2026-03-09 · cs.LG
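The "continuous trade-off between identity preservation and generative flexibility" that the SGPP summary describes is characteristic of proximal formulations generally. A toy sketch (not the paper's method): the proximal step for a quadratic identity-preservation penalty, argmin over x of ||x - z||^2 + lam * ||x - x_ref||^2, has a closed form in which lam smoothly interpolates between the generative proposal z and the reference x_ref. The names z, x_ref, and lam are illustrative assumptions.

```python
import numpy as np

def proximal_blend(z: np.ndarray, x_ref: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form minimizer of ||x - z||^2 + lam * ||x - x_ref||^2.

    A generic proximal-step illustration, not SGPP itself: lam = 0 keeps the
    generative proposal z untouched, while large lam pulls the edit toward
    the identity reference x_ref.
    """
    return (z + lam * x_ref) / (1.0 + lam)

# Sweeping lam traces a continuous path between flexibility and fidelity.
z = np.array([1.0, 0.0])      # hypothetical edited sample
x_ref = np.array([0.0, 1.0])  # hypothetical identity reference
edit = proximal_blend(z, x_ref, 1.0)  # midpoint at lam = 1
```

Setting the derivative of the objective to zero gives 2(x - z) + 2*lam*(x - x_ref) = 0, which rearranges directly to the blend above; this is the one-parameter dial the proximal view provides.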

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

This paper proposes the Disentangled Safety Hypothesis (DSH), which reveals that large language models separate safety "recognition" and "refusal execution" into distinct geometric subspaces, enabling the development of the Refusal Erasure Attack (REA) to bypass safety mechanisms by surgically disabling the refusal axis while preserving harmful content generation.

Jinman Wu, Yi Xie, Shen Lin, Shiqian Zhao, Xiaofeng Chen · 2026-03-09 · cs.AI
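The "surgically disabling the refusal axis" described in the DSH summary resembles the well-known directional-ablation technique: project the component along an estimated direction out of a hidden state. A minimal sketch under that assumption; the hidden state h and refusal direction r are hypothetical stand-ins for quantities the paper would estimate from model activations.

```python
import numpy as np

def ablate_direction(h: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of hidden state h along direction r.

    Generic directional ablation: after this step, h carries no signal
    along r, while all orthogonal components (here, the model's other
    capabilities) are left untouched.
    """
    r_hat = r / np.linalg.norm(r)        # unit-normalize the direction
    return h - (h @ r_hat) * r_hat       # subtract the projection onto r_hat

# Example: a state with components both on and off the ablated axis.
h = np.array([3.0, 4.0, 0.0])
r = np.array([1.0, 0.0, 0.0])
h_abl = ablate_direction(h, r)
```

The geometric point matches the summary's claim: zeroing one axis (refusal execution) can leave the rest of the representation (including harmful-content generation) intact.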

First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints

This paper proposes a first-order Softmax-Weighted Switching Gradient method for distributed stochastic minimax optimization under stochastic constraints, achieving optimal oracle complexity and high-probability convergence guarantees in both full and partial client participation settings while avoiding the instability of traditional primal-dual approaches.

Zhankun Luo, Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi · 2026-03-09 · cs.LG

The Coordination Gap: Alternation Metrics for Temporal Dynamics in Multi-Agent Battle of the Exes

This paper introduces temporally sensitive Alternation (ALT) metrics to reveal that conventional outcome-based evaluations can severely mischaracterize multi-agent coordination, as demonstrated by Q-learning agents in a Battle of the Exes variant that achieve high traditional fairness scores but perform significantly worse than random baselines in actual turn-taking dynamics.

Nikolaos Al. Papadopoulos, Konstantinos Psannis · 2026-03-09 · cs.LG
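A temporally sensitive alternation metric of the kind the summary describes can be illustrated with a simple switch-rate over round winners; this is a hypothetical instantiation, not necessarily the paper's exact ALT definition.

```python
def alternation_rate(winners) -> float:
    """Fraction of consecutive rounds in which the winning agent changes.

    Perfect turn-taking scores 1.0; a fixed winner scores 0.0. Unlike
    outcome-share fairness, this is sensitive to the temporal ordering
    of wins, not just their totals.
    """
    if len(winners) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(winners, winners[1:]))
    return switches / (len(winners) - 1)

# Both sequences give each agent 50% of the wins, so outcome-based
# fairness scores are identical, yet the temporal dynamics differ sharply.
turn_taking = ["A", "B", "A", "B", "A", "B"]
one_sided   = ["A", "A", "A", "B", "B", "B"]
```

This is exactly the gap the paper points at: identical aggregate fairness can mask the difference between genuine coordination and blocky, uncoordinated play.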

Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

This paper empirically evaluates the effectiveness and limitations of many-shot prompting for test-time adaptation in large language models, finding that while it benefits structured tasks with high information gain, its performance is highly sensitive to selection strategies and often yields limited improvements for open-ended generation.

Shubhangi Upasani, Chen Wu, Jay Rainton, Bo Li, Changran Hu, Qizheng Zhang, Urmish Thakker · 2026-03-09 · cs.LG
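Many-shot prompting for test-time adaptation amounts to packing a large number of in-context examples ahead of the query. A minimal sketch of the naive baseline, assuming (input, output) pairs and a plain text template; the summary's finding that results are "highly sensitive to selection strategies" means the first-k choice below is precisely the part a real system would replace.

```python
def build_many_shot_prompt(examples, query: str, k: int) -> str:
    """Assemble a many-shot prompt from the first k (input, output) pairs.

    Taking the first k examples is the simplest selection strategy; the
    paper's point is that smarter selection can matter more than k itself.
    """
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples[:k])
    return f"{shots}\n\nInput: {query}\nOutput:"

# Two of three demonstrations are included; the query is left open-ended.
demos = [("2+2", "4"), ("3+5", "8"), ("7+1", "8")]
prompt = build_many_shot_prompt(demos, "6+6", k=2)
```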

ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

ReflexiCoder is a novel reinforcement learning framework that internalizes structured self-reflection and self-correction capabilities into an LLM's weights, enabling it to autonomously generate, debug, and optimize code without external feedback while achieving state-of-the-art performance and improved token efficiency across multiple benchmarks.

Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim · 2026-03-09 · cs.LG

Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

This paper proposes three bias mitigation techniques—top-k concept filtering, removal of biased concepts, and adversarial debiasing—to address information leakage in Concept Bottleneck Models, thereby achieving superior fairness-performance tradeoffs for interpretable image classification compared to prior work.

Schrasing Tong, Antoine Salaun, Vincent Yuan, Annabel Adeyeri, Lalana Kagal · 2026-03-09 · cs.LG
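Of the three mitigation techniques the summary lists, top-k concept filtering is the most mechanical: pass only the k most salient concept activations through the bottleneck so that incidental, potentially bias-carrying signal in low-magnitude concepts is discarded. A generic sketch under that assumption (the scores array and k are illustrative, not the paper's configuration):

```python
import numpy as np

def topk_concept_filter(scores: np.ndarray, k: int) -> np.ndarray:
    """Zero out all but the k highest-magnitude concept activations.

    Restricting the bottleneck to its strongest concepts is one way to
    limit information leakage: weak activations that encode spurious or
    protected attributes never reach the downstream classifier.
    """
    if k >= scores.size:
        return scores.copy()
    keep = np.argsort(np.abs(scores))[-k:]   # indices of the top-k magnitudes
    out = np.zeros_like(scores)
    out[keep] = scores[keep]
    return out

scores = np.array([0.9, -0.05, 0.4, 0.02, -0.6])
filtered = topk_concept_filter(scores, k=2)   # keeps only 0.9 and -0.6
```

The other two techniques (removing concepts flagged as biased, and adversarial debiasing) act on which concepts exist and how they are learned, rather than on which activations pass through at inference time.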

Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning

This paper introduces Reference-guided Policy Optimization (RePO), a novel framework that combines reinforcement learning with verifiable rewards and supervised reference guidance to effectively balance exploration and exploitation in molecular optimization tasks where only single-reference data is available, thereby outperforming existing SFT and RLVR baselines.

Xuan Li, Zhanke Zhou, Zongze Li, Jiangchao Yao, Yu Rong, Lu Zhang, Bo Han · 2026-03-09 · cs.AI