Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment
This paper challenges prior claims that Best-of-N sampling is suboptimal for inference-time alignment. It shows that, under practical assumptions and when performance is measured by win rate rather than expected reward, a properly tuned Best-of-N is both statistically and computationally optimal. The authors also propose a simple variant that eliminates reward hacking without sacrificing performance.
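For context, Best-of-N is the simple procedure of drawing N candidate responses from a base policy and returning the one scored highest by a reward model. A minimal sketch, using toy stand-in functions (`toy_generate`, `toy_reward` are hypothetical placeholders, not the paper's models):

```python
import random

def best_of_n(prompt, generate, reward, n=4, seed=0):
    """Best-of-N: draw n candidate responses from the base policy
    and return the one with the highest reward-model score."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins for a language model and a reward model (hypothetical).
def toy_generate(prompt, rng):
    # A real generator samples a full response; here, a tagged string.
    return f"{prompt}-{rng.randint(0, 99)}"

def toy_reward(response):
    # Score by the numeric suffix; a real reward model scores quality.
    return int(response.rsplit("-", 1)[1])

print(best_of_n("hello", toy_generate, toy_reward, n=8))
```

Increasing `n` trades extra inference compute for a higher-reward selection; the paper's analysis concerns how to tune this trade-off so the resulting policy is optimal under win-rate evaluation.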