cs.AI papers | Gist.Science

RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

This paper introduces RbtAct, a framework that leverages peer review rebuttals as implicit supervision to train large language models to generate more actionable and specific review feedback through a novel perspective-conditioned task and a new dataset called RMR-75K.

Sihong Wu, Yiling Ma, Yilun Zhao, Tiansheng Hu, Owen Jiang, Manasi Patwardhan, Arman CohanWed, 11 Ma🤖 cs.AI

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

This paper introduces MUGEN, a comprehensive benchmark revealing that Large Audio-Language Models struggle with multi-audio understanding as input scaling increases, and demonstrates that combining training-free strategies like Audio-Permutational Self-Consistency with Chain-of-Thought can significantly improve performance.

Chih-Kai Yang, Yun-Shao Tsai, Yu-Kai Guo, Ping-Le Tsai, Yen-Ting Piao, Hung-Wei Chen, Ting-Lin Hsiao, Yun-Man Hsu, Ke-Han Lu, Hung-yi LeeWed, 11 Ma🤖 cs.AI

Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning

Mousse is a novel optimizer that improves upon the Muon algorithm by integrating Shampoo's Kronecker-factored preconditioning to adaptively handle the heavy-tailed curvature of deep neural networks, thereby achieving faster training convergence with negligible computational overhead.

Yechen Zhang, Shuhao Xing, Junhao Huang, Kai Lv, Yunhua Zhou, Xipeng Qiu, Qipeng Guo, Kai ChenWed, 11 Ma🤖 cs.AI

ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

The paper introduces ActiveUltraFeedback, an efficient active learning pipeline that leverages uncertainty estimates and novel selection strategies like Double Reverse Thompson Sampling to generate high-quality preference data, enabling Large Language Models to achieve superior alignment performance with as little as one-sixth of the annotated data required by static baselines.

Davit Melikidze, Marian Schneider, Jessica Lam, Martin Wertich, Ido Hakimi, Barna Pásztor, Andreas KrauseWed, 11 Ma🤖 cs.AI

ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling

The paper proposes ESAinsTOD, a unified end-to-end schema-aware instruction-tuning framework that leverages full-parameter LLM fine-tuning with instruction and schema alignment mechanisms to achieve superior performance, generalization in low-resource settings, and robustness against noise across diverse task-oriented dialog benchmarks.

Dechuan Teng, Chunlin Lu, Libo Qin, Wanxiang CheWed, 11 Ma🤖 cs.AI

AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering

This paper introduces AutoViVQA, a large-scale automatically constructed dataset for Vietnamese Visual Question Answering, and evaluates transformer-based multimodal models alongside various automatic metrics to assess their performance and alignment with human judgment in the Vietnamese context.

Nguyen Anh Tuong, Phan Ba Duc, Nguyen Trung Quoc, Tran Dac Thinh, Dang Duy Lan, Nguyen Quoc Thinh, Tung LeWed, 11 Ma🤖 cs.AI

Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records

This study demonstrates that a custom Transformer architecture outperforms both traditional machine learning models and zero-shot generative LLMs in automatically classifying cardiac risk from large-context, unstructured Dutch electronic health records, offering a robust alternative to manual administrative coding for geriatric cardiovascular risk management.

Jacopo Vitale, David Della Morte, Luca Bacco, Mario Merone, Mark de Groot, Saskia Haitjema, Leandro Pecchia, Bram van EsWed, 11 Ma🤖 cs.AI

GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation

This paper introduces an open-source framework for Graph Neural Network-based Time Series Anomaly Detection to enable reproducible experimentation and critical evaluation, demonstrating that GNNs enhance both detection performance and interpretability while highlighting the need for standardized metrics and thresholding strategies.

Federico Bello, Gonzalo Chiarlone, Marcelo Fiori, Gastón García González, Federico LarrocaWed, 11 Ma🤖 cs.AI

When to Lock Attention: Training-Free KV Control in Video Diffusion

KV-Lock is a training-free framework for DiT-based video diffusion models that dynamically adjusts background key-value locking and classifier-free guidance scales based on hallucination detection to simultaneously enhance foreground quality and maintain background consistency.

Tianyi Zeng, Jincheng Gao, Tianyi Wang, Zijie Meng, Miao Zhang, Jun Yin, Haoyuan Sun, Junfeng Jiao, Christian Claudel, Junbo Tan, Xueqian WangWed, 11 Ma🤖 cs.AI

MM-tau-p $^2$ : Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings

This paper introduces MM-tau-p $^2$ , a novel benchmark featuring 12 metrics to holistically evaluate the robustness and efficiency of multi-modal LLM agents in dual-control settings, specifically assessing their performance with and without user persona adaptation across domains like telecom and retail.

Anupam Purwar, Aditya ChoudharyWed, 11 Ma🤖 cs.AI

Grounding Synthetic Data Generation With Vision and Language Models

This paper proposes a vision-language grounded framework for interpretable synthetic data generation and evaluation in remote sensing, introducing the ARAS400k dataset which demonstrates that augmenting real data with synthetic images consistently outperforms real-data-only baselines in semantic segmentation and image captioning tasks.

Ümit Mert Ça\u{g}lar, Alptekin TemizelWed, 11 Ma🤖 cs.AI

A Variational Latent Equilibrium for Learning in Cortex

This paper proposes a biologically plausible, local learning framework for time-continuous neuronal networks that approximates backpropagation through time by deriving real-time error dynamics from a prospective energy function, thereby unifying and extending the Generalized Latent Equilibrium model to enable spatiotemporal credit assignment consistent with brain circuitry.

Simon Brandt, Paul Haider, Walter Senn, Federico Benitez, Mihai A. PetroviciWed, 11 Ma🤖 cs.AI

Routing without Forgetting

The paper introduces Routing without Forgetting (RwF), a transformer architecture that addresses Online Continual Learning by replacing iterative gradient-based specialization with dynamic, single-step associative retrieval of input-conditioned prompts via energy-based layers, thereby achieving superior performance on class-incremental benchmarks without explicit task identifiers.

Alessio Masano, Giovanni Bellitto, Dipam Goswani, Joost Van de Weijer, Concetto SpampinatoWed, 11 Ma🤖 cs.AI

Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

This paper demonstrates that Mamba-2's state space duality can be implemented entirely using standard XLA primitives without custom kernels, achieving portable, host-synchronization-free $O(1)$ autoregressive caching with high performance across CPU, NVIDIA GPU, and Google Cloud TPU hardware.

Cosmo SantoniWed, 11 Ma🤖 cs.AI

Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

This paper introduces Efficient Draft Adaptation (EDA), a parameter- and data-efficient framework that restores speculative decoding performance on fine-tuned target models through a decoupled architecture, data regeneration strategy, and sample selection mechanism, achieving superior acceptance lengths with significantly reduced training costs compared to full retraining.

Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Yuhao Chen, Qingyu Zhang, Jixiang Luo, Xuelong Li, Rongrong JiWed, 11 Ma🤖 cs.AI

Evolving Prompt Adaptation for Vision-Language Models

The paper proposes EvoPrompt, a novel framework that achieves stable, knowledge-preserving adaptation of vision-language models in few-shot scenarios by employing a modality-shared projector and an evolutionary training strategy that decouples prompt updates into directional and magnitude components to prevent catastrophic forgetting.

Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu, Yang LiWed, 11 Ma🤖 cs.AI

Temporal-Conditioned Normalizing Flows for Multivariate Time Series Anomaly Detection

This paper introduces temporal-conditioned normalizing flows (tcNF), a novel autoregressive framework that enhances multivariate time series anomaly detection by effectively modeling temporal dependencies and uncertainty to identify low-probability events with improved accuracy and robustness.

David Baumgartner, Helge Langseth, Kenth Engø-Monsen, Heri RamampiaroWed, 11 Ma🤖 cs.AI

EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation

EvoDriveVLA is a novel Vision-Language-Action model for autonomous driving that overcomes perception degradation and planning instability through a collaborative distillation framework combining self-anchored visual constraints and oracle-guided trajectory optimization to achieve state-of-the-art performance.

Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Wang Zijian, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang, Xianming Liu, Shuchang Zhou Liu, Yang Wang, Shanghang ZhangWed, 11 Ma🤖 cs.AI

Declarative Scenario-based Testing with RoadLogic

This paper introduces RoadLogic, an open-source framework that bridges declarative OpenSCENARIO specifications and executable simulations by combining Answer Set Programming, motion planning, and specification-based monitoring to automatically generate diverse, realistic, and compliant autonomous vehicle testing scenarios.

Ezio Bartocci, Alessio Gambi, Felix Gigler, Cristinel Mateis, Dejan NičkovicWed, 11 Ma🤖 cs.AI

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

This paper introduces Variational Mixture-of-Experts Routing (VMoER), a scalable Bayesian framework that confines uncertainty quantification to the expert-selection stage of Mixture-of-Experts Transformers, achieving significant improvements in calibration, stability, and out-of-distribution detection with negligible computational overhead.

Albus Yizhuo Li, Matthew WickerWed, 11 Ma🤖 cs.AI

← Previous Next →

cs.AI