Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

This paper introduces Gome, a gradient-based MLE agent that outperforms traditional tree search methods on MLE-Bench by mapping diagnostic reasoning to gradient computation, demonstrating that as LLM reasoning capabilities improve, gradient-based optimization becomes increasingly superior to exhaustive enumeration.

Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian · Wed, 11 Ma · cs.AI

Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation

Pri4R is a simple yet effective method that enhances Vision-Language-Action models with an implicit understanding of world dynamics by training them to predict 3D point tracks using privileged 4D information, thereby significantly improving physical manipulation performance without adding inference overhead.

Jisoo Kim, Jungbin Cho, Sanghyeok Chu, Ananya Bal, Jinhyung Kim, Gunhee Lee, Sihaeng Lee, Seung Hwan Kim, Bohyung Han, Hyunmin Lee, Laszlo A. Jeni, Seungryong Kim · Wed, 11 Ma · cs.AI

Zero-Shot and Supervised Bird Image Segmentation Using Foundation Models: A Dual-Pipeline Approach with Grounding DINO 1.5, YOLOv11, and SAM 2.1

This paper proposes a dual-pipeline framework for bird image segmentation that pairs a frozen SAM 2.1 backbone with either a zero-shot Grounding DINO 1.5 detector or a supervised, fine-tuned YOLOv11 detector, achieving state-of-the-art performance on the CUB-200-2011 dataset while eliminating the need to retrain the segmentation model for different species or domains.

Abhinav Munagala · Wed, 11 Ma · cs.AI

Breaking the Factorization Barrier in Diffusion Language Models

The paper introduces Coupled Discrete Diffusion (CoDD), a hybrid framework that overcomes the "factorization barrier" in diffusion language models by replacing fully factorized outputs with a lightweight probabilistic inference layer, thereby enabling efficient parallel generation of coherent, high-quality text without the prohibitive costs of full joint modeling or reinforcement learning.

Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu · Wed, 11 Ma · cs.AI
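The "factorization barrier" can be seen in a toy case. The sketch below is a generic illustration, not CoDD itself: it takes a hypothetical joint distribution over two token positions, builds the fully factorized approximation (the product of per-position marginals, which is the factorized distribution minimizing KL(p || q)), and measures how much probability lands on incoherent combinations. The helper names `marginals` and `factorized` are illustrative.

```python
import itertools

# Hypothetical joint distribution over two token positions:
# only the coherent pairs "New York" and "Los Angeles" carry probability.
joint = {
    ("New", "York"): 0.5,
    ("Los", "Angeles"): 0.5,
}

def marginals(joint, position):
    """Marginal distribution of the token at one position."""
    out = {}
    for pair, p in joint.items():
        out[pair[position]] = out.get(pair[position], 0.0) + p
    return out

def factorized(joint):
    """Fully factorized approximation: product of the per-position marginals."""
    m0, m1 = marginals(joint, 0), marginals(joint, 1)
    return {(a, b): m0[a] * m1[b] for a, b in itertools.product(m0, m1)}

approx = factorized(joint)
# Probability the factorized model assigns to pairs absent from the true joint.
incoherent_mass = sum(p for pair, p in approx.items() if pair not in joint)
print(f"mass on incoherent pairs: {incoherent_mass:.2f}")  # prints: mass on incoherent pairs: 0.50
```

Even though each per-position marginal is exact, half of the factorized model's mass goes to pairs such as ("New", "Angeles"). Avoiding such mixtures requires modeling dependencies between positions during parallel decoding, which is the role the summary assigns to CoDD's probabilistic inference layer.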

Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision

This paper proposes an energy-aware spike budgeting framework that integrates experience replay, learnable neuron parameters, and an adaptive scheduler to effectively mitigate catastrophic forgetting while optimizing both accuracy and energy efficiency in Spiking Neural Networks across diverse frame-based and event-based neuromorphic vision benchmarks.

Anika Tabassum Meem, Muntasir Hossain Nadid, Md Zesun Ahmed Mia · Wed, 11 Ma · cs.AI
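The summary lists experience replay as one component but does not say how the replay memory is filled. A common choice for a bounded memory is reservoir sampling, sketched below as an assumption rather than the paper's policy. `ReservoirReplayBuffer` is a hypothetical helper, and the spike budgeting and adaptive scheduler are not modeled.

```python
import random
from typing import List


class ReservoirReplayBuffer:
    """Fixed-capacity replay memory filled by reservoir sampling (hypothetical helper).

    After `seen` insertions, every example has probability capacity / seen of
    being stored, so earlier tasks stay represented as new tasks stream in.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[tuple] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: tuple) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)  # memory not full yet: always keep
        else:
            j = self.rng.randrange(self.seen)  # uniform index in [0, seen)
            if j < self.capacity:
                self.items[j] = item  # overwrite a random stored example

    def sample(self, k: int) -> List[tuple]:
        """Draw up to k stored examples without replacement for a replay batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))


# Usage: stream two tasks of (task_id, example_id) pairs, then draw a replay batch
# that would be mixed into training on the next task.
buffer = ReservoirReplayBuffer(capacity=100)
for task_id in (0, 1):
    for i in range(1000):
        buffer.add((task_id, i))
replayed = buffer.sample(32)
```

Reservoir sampling needs no knowledge of task boundaries, which keeps the memory policy independent of the energy-related components the paper adds on top.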

WebAccessVL: Violation-Aware VLM for Web Accessibility

The paper introduces WebAccessVL, a violation-aware vision-language model that automatically edits website HTML to fix WCAG2 accessibility violations while preserving visual design, achieving a 96% reduction in violations and outperforming GPT-5 through a supervised image-conditioned program synthesis approach enhanced by a checker-in-the-loop refinement strategy.

Amber Yijia Zheng, Jae Joong Lee, Bedrich Benes, Raymond A. Yeh · Wed, 11 Ma · cs.AI
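A checker-in-the-loop strategy can be pictured as a loop: run an automatic accessibility checker, hand any violations back to the editor, and repeat until the checker passes or a round limit is reached. The sketch below assumes a toy checker for a single rule (every `<img>` needs an `alt` attribute) and a rule-based stand-in for the model. `check`, `refine`, and `naive_fix` are hypothetical names, not WebAccessVL's interface.

```python
import re
from html.parser import HTMLParser
from typing import Callable, List


class MissingAltChecker(HTMLParser):
    """Toy checker for one WCAG rule: every <img> needs an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.violations.append(f"img missing alt: {attributes.get('src', '?')}")


def check(html: str) -> List[str]:
    """Return the list of violations found in an HTML string."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.violations


def refine(html: str, propose_fix: Callable[[str, List[str]], str], max_rounds: int = 3) -> str:
    """Re-run the checker after each proposed edit; stop once the page is clean."""
    for _ in range(max_rounds):
        violations = check(html)
        if not violations:
            break
        html = propose_fix(html, violations)
    return html


def naive_fix(html: str, violations: List[str]) -> str:
    """Hypothetical stand-in for the model: add alt="" to the first <img> lacking alt."""
    return re.sub(r'<img(?![^>]*\balt=)', '<img alt=""', html, count=1)


page = '<p>Logo</p><img src="logo.png"><img src="hero.jpg">'
fixed = refine(page, naive_fix)
print(fixed)
```

An empty `alt=""` only marks an image as decorative; repairing an informative image needs descriptive text, which is where an image-conditioned model would do the real work while the checker decides when to stop.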

Automating Forecasting Question Generation and Resolution for AI Evaluation

This paper presents an automated system that uses LLM-powered web research agents to generate and resolve diverse, real-world forecasting questions at scale, demonstrating question quality and resolution rates that surpass those of human-curated platforms while effectively evaluating and improving AI forecasting performance.

Nikos I. Bosse, Peter Mühlbacher, Jack Wildman, Lawrence Phillips, Dan Schwarz · Wed, 11 Ma · cs.AI

CLEAR-Mamba: Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification

The paper introduces CLEAR-Mamba, an enhanced MedMamba framework featuring a hypernetwork-based adaptive conditioning layer and a reliability-aware prediction scheme, which achieves superior accuracy and trustworthiness in multi-sequence ophthalmic angiography classification by addressing challenges in generalization and confidence estimation.

Zhuonan Wang, Wenjie Yan, Wenqiao Zhang, Xiaohui Song, Jian Ma, Ke Yao, Yibo Yu, Beng Chin Ooi · Wed, 11 Ma · cs.AI

MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search

The paper proposes Manifold-Consistent Graph Indexing (MCGI), a geometry-aware, disk-resident indexing method that leverages Local Intrinsic Dimensionality to dynamically adapt search strategies, achieving significantly higher throughput and lower latency than state-of-the-art baselines on billion-scale datasets by resolving the Euclidean-Geodesic mismatch in high-dimensional spaces.

Dongfang Zhao · Wed, 11 Ma · cs.AI
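Local Intrinsic Dimensionality is commonly estimated per point with a maximum-likelihood estimator computed from the distances to the k nearest neighbours. The summary does not say which estimator MCGI uses, so the sketch below only illustrates the quantity under that assumption: data lying on a low-dimensional manifold inside a 64-D Euclidean space gets an LID near its intrinsic dimension, not 64. `lid_mle` and `embedded_cube` are hypothetical helpers.

```python
import numpy as np


def lid_mle(query: np.ndarray, data: np.ndarray, k: int = 50) -> float:
    """MLE of local intrinsic dimensionality at `query` from its k nearest neighbours.

    Uses LID = -( (1/k) * sum_{i=1..k} log(r_i / r_k) )^(-1), where
    r_1 <= ... <= r_k are the distances from `query` to its k nearest points.
    """
    dists = np.linalg.norm(data - query, axis=1)
    dists = np.sort(dists[dists > 0])[:k]  # exclude the query itself
    return float(-1.0 / np.mean(np.log(dists / dists[-1])))


def embedded_cube(n: int, intrinsic_dim: int, ambient_dim: int, rng) -> np.ndarray:
    """n points uniform in [-1, 1]^intrinsic_dim, zero-padded to ambient_dim."""
    pts = np.zeros((n, ambient_dim))
    pts[:, :intrinsic_dim] = rng.uniform(-1.0, 1.0, size=(n, intrinsic_dim))
    return pts


rng = np.random.default_rng(0)
plane = embedded_cube(20_000, 2, 64, rng)  # 2-D manifold in 64-D space
solid = embedded_cube(20_000, 5, 64, rng)  # 5-D manifold in 64-D space

# Average per-point estimates over 50 query points to reduce variance.
lid_plane = np.mean([lid_mle(plane[i], plane) for i in range(50)])
lid_solid = np.mean([lid_mle(solid[i], solid) for i in range(50)])
print(f"plane: {lid_plane:.2f}  solid: {lid_solid:.2f}")
```

The plane's estimate comes out close to 2 and the 5-D set scores clearly higher, even though both sets share the same 64-D ambient space. This gap between ambient and local dimension is the kind of signal a geometry-aware index could use to adapt its search per region.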

EMFusion: Conditional Diffusion Framework for Trustworthy Frequency Selective EMF Forecasting in Wireless Networks

This paper introduces EMFusion, a conditional multivariate diffusion-based framework that leverages a residual U-Net with cross-attention and imputation-based sampling to provide accurate, uncertainty-quantified, frequency-selective electromagnetic field forecasts for wireless network planning, significantly outperforming existing baseline models.

Zijiang Yan, Yixiang Huang, Jianhua Pei, Hina Tabassum, Luca Chiaraviglio · Wed, 11 Ma · cs.AI

Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

This paper introduces ELERAG, an enhanced Retrieval-Augmented Generation system that integrates Wikidata-based Entity Linking and a hybrid re-ranking strategy to significantly improve factual accuracy in Italian educational question-answering, particularly outperforming standard methods in domain-specific contexts while demonstrating the importance of domain-adapted strategies.

Francesco Granata, Francesco Poggi, Misael Mongiovì · Wed, 11 Ma · cs.AI