Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models

This paper introduces TableEG, a framework that leverages fine-tuned large language models to generate authentic, distribution-aligned synthetic errors in tabular data, thereby addressing the scarcity of real-world error datasets and establishing a robust benchmark for evaluating data cleaning techniques.
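
TableEG itself fine-tunes an LLM to produce realistic errors; as a rough illustration of the task it automates, the sketch below injects naive synthetic errors (missing cells and character-swap typos) into a pandas DataFrame. The `inject_errors` helper and its error rates are illustrative assumptions, not part of the paper's method.

```python
import numpy as np
import pandas as pd

def inject_errors(df, missing_rate=0.05, typo_rate=0.02, seed=0):
    """Naively corrupt a clean table: blank out cells and garble strings.

    Hand-written baseline for illustration only; TableEG instead asks a
    fine-tuned LLM for realistic, distribution-aligned errors.
    """
    rng = np.random.default_rng(seed)
    dirty = df.copy()
    n_rows, n_cols = dirty.shape

    # Randomly blank out a fraction of all cells.
    mask = rng.random((n_rows, n_cols)) < missing_rate
    dirty = dirty.mask(pd.DataFrame(mask, index=dirty.index, columns=dirty.columns))

    # Introduce simple character-swap typos into string columns.
    for col in dirty.select_dtypes(include="object"):
        for i in dirty.index[rng.random(n_rows) < typo_rate]:
            val = dirty.at[i, col]
            if isinstance(val, str) and len(val) > 2:
                j = rng.integers(len(val) - 1)
                dirty.at[i, col] = val[:j] + val[j + 1] + val[j] + val[j + 2:]
    return dirty

clean = pd.DataFrame({"city": ["Beijing", "Shanghai", "Shenzhen"],
                      "pop_millions": [21.5, 24.9, 17.6]})
print(inject_errors(clean, missing_rate=0.2, typo_rate=0.5))
```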

Xinyuan Liu, Jiahui Chen, Bocheng Hu, Yu Sun, Xinyang Chen, Shaoxu Song, Yongxin Tong · Tue, 10 Ma · 🤖 cs.LG

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

This paper identifies a pervasive "agreement bias" in Multimodal LLM verifiers that causes them to over-validate agent behavior, and proposes a lightweight Self-Grounded Verification (SGV) method that significantly improves failure detection and task completion across web navigation, computer use, and robotics by decoupling prior generation from trajectory evaluation.
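
SGV is essentially a two-step prompting protocol. The sketch below outlines it with a placeholder `query_mllm` function standing in for whatever MLLM client is used; both the function and the prompt wording are assumptions, not the paper's exact prompts.

```python
def query_mllm(prompt, images=None):
    """Placeholder for an MLLM call (e.g. via some API client); not a real API."""
    raise NotImplementedError

def self_grounded_verify(task, trajectory_frames):
    # Step 1: elicit the verifier's own prior about what success should look like,
    # *without* showing it the agent's trajectory, so it cannot simply agree with it.
    prior = query_mllm(
        f"Task: {task}\n"
        "Describe the conditions and observations you would expect to see "
        "if this task were completed successfully."
    )

    # Step 2: evaluate the trajectory against that self-generated prior,
    # rather than asking directly whether the agent succeeded.
    verdict = query_mllm(
        f"Task: {task}\n"
        f"Expected success conditions:\n{prior}\n"
        "Given the attached trajectory frames, state which conditions are met "
        "and conclude SUCCESS or FAILURE.",
        images=trajectory_frames,
    )
    return verdict
```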

Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira · Tue, 10 Ma · 🤖 cs.LG

Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

This paper proposes a tree-based Weak-to-Strong Generalization framework that leverages Monte Carlo Tree Search to organize both successful and failure trajectories from weak models, thereby significantly enhancing the reasoning and decision-making capabilities of strong models in complex interactive environments.
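
The framework organizes weak-model rollouts, including failures, in a search tree. As a generic reference point (not the paper's exact procedure), the snippet below shows the standard UCT rule used in Monte Carlo Tree Search to decide which child to explore next, where `c` trades off exploitation of high-value branches against exploration of rarely visited ones.

```python
import math

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0   # accumulated reward; failure trajectories contribute low reward

def uct_select(node, c=1.4):
    """Pick the child maximizing the UCT score (assumes node.visits >= 1)."""
    def score(child):
        if child.visits == 0:
            return float("inf")            # always try unvisited children first
        exploit = child.value_sum / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=score)
```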

Ruimeng Ye, Zihan Wang, Yang Xiao, Zinan Ling, Manling Li, Bo Hui · Tue, 10 Ma · 🤖 cs.LG

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

This paper investigates how malicious auditees can construct fairness-compliant yet representative-looking samples from non-compliant distributions to deceive auditors, formalizes these manipulation strategies using optimal transport and entropic projections, and proposes statistical tests to detect such distributional manipulation attacks.
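
The manipulation strategies are formalized with optimal transport and entropic projections. As a pointer to that primitive only (not the paper's attack or its detection tests), the sketch below runs a generic Sinkhorn iteration in NumPy, which alternates entropic projections onto the two marginal constraints of an entropy-regularized transport problem.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport plan between histograms a and b."""
    K = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # alternate scalings = entropic projections
        u = a / (K @ v)               #   onto the row/column marginal constraints
    return u[:, None] * K * v[None, :]

# Toy example: move mass between two 3-point distributions on a line.
x = np.array([0.0, 1.0, 2.0])
cost = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.3, 0.5]), cost)
print(P.round(3), P.sum(axis=1))      # rows sum (approximately) to the source histogram
```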

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes · Tue, 10 Ma · 🤖 cs.LG

Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models

This paper introduces a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that exposes a critical "Benchmarking Gap" in medical large language models, revealing that despite high static benchmark scores, most models exhibit profound brittleness, privacy leaks, bias, and hallucinations when subjected to continuous, adversarial stress-testing.

Jiazhen Pan, Bailiang Jian, Paul Hager, Yundi Zhang, Che Liu, Friedrike Jungmann, Hongwei Bran Li, Chenyu You, Junde Wu, Jiayuan Zhu, Fenglin Liu, Yuyuan Liu, Niklas Bubeck, Christian Wachinger, Chen (Cherise) Chen, Zhenyu Gong, Cheng Ouyang, Georgios Kaissis, Benedikt Wiestler, Daniel Rueckert · Tue, 10 Ma · 🤖 cs.LG

CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

The paper introduces CauKer, a novel algorithm that combines Gaussian Process kernel composition with Structural Causal Models to generate diverse, causally coherent synthetic time series, enabling sample-efficient pre-training of classification foundation models that exhibit clear scaling laws across varying dataset sizes and model capacities.
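
CauKer composes GP kernels and wires the resulting signals through a structural causal model. The sketch below shows only the first ingredient in a naive form, sampling one series from a sum of an RBF and a periodic kernel via scikit-learn, with a trivial hand-written "cause → effect" link standing in for the SCM; it is an illustration of the building blocks, not the CauKer algorithm.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)[:, None]

# Compose kernels: smooth trend (RBF) plus seasonality (periodic).
kernel = 1.0 * RBF(length_scale=2.0) + 0.5 * ExpSineSquared(length_scale=1.0, periodicity=2.5)
K = kernel(t) + 1e-6 * np.eye(len(t))          # jitter for numerical stability

# Sample a 'cause' series from the GP prior, then derive a noisy 'effect' series,
# a toy stand-in for the structural causal model used in the paper.
cause = rng.multivariate_normal(np.zeros(len(t)), K)
effect = 0.8 * np.roll(cause, 5) + 0.1 * rng.standard_normal(len(t))
series = np.stack([cause, effect])             # one synthetic multivariate sample
print(series.shape)
```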

Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, Ievgen Redko · Tue, 10 Ma · 🤖 cs.LG

GraphProp: Training the Graph Foundation Models using Graph Properties

GraphProp is a two-phase framework for training graph foundation models that first learns structural generalization by predicting graph invariants and then leverages these representations as positional encodings to enhance cross-domain performance in graph-level tasks, particularly outperforming existing methods in scenarios with limited data or missing node attributes.
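
GraphProp's first phase supervises the model with graph invariants. The snippet below computes a few cheap graph-level invariants with networkx, purely to illustrate the kind of structural supervision targets involved; the paper's exact set of properties may differ.

```python
import networkx as nx

def structural_targets(G):
    """A few graph-level invariants usable as supervision targets."""
    return {
        "n_nodes": G.number_of_nodes(),
        "n_edges": G.number_of_edges(),
        "density": nx.density(G),
        "n_triangles": sum(nx.triangles(G).values()) // 3,
        "avg_clustering": nx.average_clustering(G),
        "degree_assortativity": nx.degree_assortativity_coefficient(G),
    }

print(structural_targets(nx.karate_club_graph()))
```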

Ziheng Sun, Qi Feng, Lehao Lin, Chris Ding, Jicong Fan · Tue, 10 Ma · 🤖 cs.LG

Constraint Learning in Multi-Agent Dynamic Games from Demonstrations of Local Nash Interactions

This paper presents an inverse dynamic game algorithm that uses mixed-integer linear programs to learn parametric constraints from multi-agent interaction demonstrations by encoding Karush-Kuhn-Tucker conditions, thereby providing theoretical guarantees for recovering inner approximations of safe and unsafe sets to enable robust motion planning.
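
To make the KKT-encoding idea concrete, here is a deliberately tiny toy in PuLP rather than the paper's formulation: one agent minimizes x² subject to an unknown constraint x ≥ θ, a single demonstration x* = 1 is assumed locally optimal, and stationarity plus big-M complementary slackness recover θ as a mixed-integer linear feasibility problem.

```python
import pulp

x_star, M = 1.0, 100.0                       # demonstrated decision, big-M constant

prob = pulp.LpProblem("constraint_recovery", pulp.LpMinimize)
theta = pulp.LpVariable("theta", lowBound=-10, upBound=10)   # unknown constraint parameter
lam = pulp.LpVariable("lam", lowBound=0)                     # KKT multiplier (dual feasibility)
z = pulp.LpVariable("z", cat="Binary")                       # constraint active/inactive indicator

prob += 0 * theta                            # pure feasibility problem
prob += 2 * x_star - lam == 0                # stationarity of x^2 + lam*(theta - x) at x_star
prob += theta - x_star <= 0                  # primal feasibility of the demonstration
prob += lam <= M * z                         # big-M complementary slackness:
prob += x_star - theta <= M * (1 - z)        #   lam > 0  =>  constraint active (theta = x_star)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(theta), pulp.value(lam))    # recovers theta = 1, lam = 2
```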

Zhouyu Zhang, Chih-Yuan Chiu, Glen Chou · Tue, 10 Ma · 🤖 cs.LG

CbLDM: A Diffusion Model for recovering nanostructure from atomic pair distribution function

This paper proposes CbLDM, a Condition-based Latent Diffusion Model that utilizes conditional priors and Laplacian matrices to effectively and stably recover the nanostructures of monometallic nanoparticles from their atomic pair distribution functions, addressing the highly ill-posed nature of the inverse problem.
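
As a minimal pointer to one ingredient named in the summary, the snippet below builds an unnormalized graph Laplacian from interatomic distances by thresholding them into an adjacency matrix; the cutoff and the toy coordinates are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def laplacian_from_coords(coords, cutoff=3.0):
    """Unnormalized graph Laplacian L = D - A from atomic coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    A = ((dist < cutoff) & (dist > 0)).astype(float)   # bond if within the cutoff radius
    return np.diag(A.sum(axis=1)) - A

coords = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.5, 0.0], [5.5, 0.0, 0.0]])
L = laplacian_from_coords(coords)
print(np.linalg.eigvalsh(L))   # count of ~0 eigenvalues = number of connected components
```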

Jiarui Cao, Zhiyang Zhang, Heming Wang, Jun Xu, Ling Lan, Simon J. L. Billinge, Ran Gu · Tue, 10 Ma · 🔬 cond-mat.mtrl-sci

Entropy-Driven Curriculum for Multi-Task Training in Human Mobility Prediction

This paper proposes a unified training framework that combines entropy-driven curriculum learning, which sequences training from simple to complex trajectories based on Lempel-Ziv compression, with multi-task learning to simultaneously optimize location, distance, and direction predictions, thereby achieving state-of-the-art performance and significantly faster convergence in human mobility prediction.
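
To illustrate the curriculum signal (assuming, as the summary suggests, that complexity is scored per discretized location sequence), the sketch below estimates Lempel-Ziv complexity by counting distinct phrases in an LZ78-style parse and sorts trajectories from simple to complex; it is a generic proxy, not the paper's exact implementation.

```python
def lz_complexity(sequence):
    """Count distinct phrases in an LZ78-style parse of a symbol sequence."""
    phrases, current = set(), ""
    for symbol in sequence:
        current += str(symbol) + ","
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

# Toy trajectories as sequences of discretized location IDs.
trajectories = [
    [1, 2, 1, 2, 1, 2, 1, 2],        # highly regular commute
    [1, 5, 3, 9, 2, 7, 4, 8],        # erratic movement
    [1, 2, 3, 1, 2, 3, 4, 1],
]

# Curriculum order: train on low-complexity (predictable) trajectories first.
curriculum = sorted(trajectories, key=lz_complexity)
for traj in curriculum:
    print(lz_complexity(traj), traj)
```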

Tianye Fang, Xuanshu Luo, Martin Werner · Tue, 10 Ma · 🤖 cs.LG

Synthetic data for ratemaking: imputation-based methods vs adversarial networks and autoencoders

This paper benchmarks Multivariate Imputation by Chained Equations (MICE) against deep generative models like Variational Autoencoders and Conditional Tabular GANs for synthetic ratemaking data, finding that MICE offers a simpler yet high-fidelity alternative that effectively preserves statistical distributions and supports robust Generalized Linear Model training.
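
MICE fills in each column by chained regressions on the others. As a rough stand-in, scikit-learn's IterativeImputer follows the same chained-equations idea, and the sketch below uses it to "synthesize" values by masking part of a toy table and imputing the blanks; this is a simplification of the paper's full generation setup, and the columns are invented for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Toy ratemaking-like table: [driver_age, vehicle_age, annual_km, claim_cost].
real = np.column_stack([
    rng.integers(18, 80, 500),
    rng.integers(0, 20, 500),
    rng.normal(12000, 4000, 500),
    rng.gamma(2.0, 500.0, 500),
]).astype(float)

# Mask 30% of the cells, then fill them back in with chained regressions.
masked = real.copy()
masked[rng.random(real.shape) < 0.3] = np.nan
synthetic = IterativeImputer(max_iter=10, sample_posterior=True,
                             random_state=0).fit_transform(masked)
print(synthetic[:3].round(1))
```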

Yevhen Havrylenko, Meelis Käärik, Artur Tuttar · Tue, 10 Ma · 🤖 cs.LG

Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization

This paper proposes the F²SA-p method, which utilizes p-th order finite differences to achieve a nearly optimal $\tilde{\mathcal{O}}(p\epsilon^{-4-p/2})$ complexity for finding $\epsilon$-stationary points in stochastic bilevel optimization with highly smooth objectives, thereby improving upon previous first-order bounds and matching the fundamental lower bound.
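
The key primitive is a higher-order finite-difference approximation. As a plain one-dimensional illustration (not the paper's bilevel estimator), the second- and fourth-order central differences below approximate f'(x); higher-order stencils shrink the truncation error in the same spirit.

```python
import numpy as np

def fd_grad_2nd(f, x, h=1e-4):
    """Second-order central difference: truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def fd_grad_4th(f, x, h=1e-2):
    """Fourth-order central difference: truncation error O(h^4)."""
    return (-f(x + 2*h) + 8*f(x + h) - 8*f(x - h) + f(x - 2*h)) / (12 * h)

f, x = np.sin, 1.0
print(abs(fd_grad_2nd(f, x) - np.cos(x)))   # tiny error with a fine step h = 1e-4
print(abs(fd_grad_4th(f, x) - np.cos(x)))   # comparable error despite the coarser h = 1e-2
```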

Lesi Chen, Junru Li, El Mahdi Chayti, Jingzhao Zhang · Tue, 10 Ma · 🤖 cs.LG

Behavioral Inference at Scale: The Fundamental Asymmetry Between Motivations and Belief Systems

Through large-scale experiments with over 1.5 million LLM-generated behavioral sequences, this paper reveals a fundamental asymmetry in behavioral inference where agent motivations are nearly perfectly recoverable while belief systems remain largely opaque due to inherent information-theoretic limits and architectural constraints, particularly within a "neutral zone" of behavioral ambiguity.

Jason Starace, Terence Soule · Tue, 10 Ma · 🤖 cs.LG

Physics-Aware Neural Operators for Direct Inversion in 3D Photoacoustic Tomography

The paper introduces PANO, a physics-aware neural operator that performs direct, single-pass inversion of raw sensor data into high-quality 3D photoacoustic images, outperforming traditional algorithms and enabling real-time reconstruction across diverse sparse acquisition settings to facilitate the clinical translation of 3D PACT.
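
The summary does not pin down PANO's internals, so the block below only sketches a generic Fourier-style spectral convolution layer in PyTorch, a common building block of neural operators, to indicate the kind of component such a model composes; it is an assumption for illustration, not PANO itself.

```python
import torch

class SpectralConv1d(torch.nn.Module):
    """Generic 1D spectral convolution: mix a truncated set of Fourier modes."""
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        self.modes = modes
        self.weights = torch.nn.Parameter(
            (1 / (in_ch * out_ch)) * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

    def forward(self, x):                                   # x: (batch, in_ch, n)
        x_ft = torch.fft.rfft(x)                            # to the frequency domain
        out_ft = torch.zeros(x.size(0), self.weights.size(1), x_ft.size(-1),
                             dtype=torch.cfloat, device=x.device)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weights)
        return torch.fft.irfft(out_ft, n=x.size(-1))        # back to the physical domain

layer = SpectralConv1d(in_ch=4, out_ch=8, modes=16)
print(layer(torch.randn(2, 4, 128)).shape)                  # torch.Size([2, 8, 128])
```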

Jiayun Wang, Yousuf Aborahama, Arya Khokhar, Yang Zhang, Chuwei Wang, Karteekeya Sastry, Julius Berner, Yilin Luo, Boris Bonev, Zongyi Li, Kamyar Azizzadenesheli, Lihong V. Wang, Anima Anandkumar · Tue, 10 Ma · 🤖 cs.LG