Routing without Forgetting

The paper introduces Routing without Forgetting (RwF), a transformer architecture for online continual learning that replaces iterative gradient-based specialization with dynamic, single-step associative retrieval of input-conditioned prompts via energy-based layers, thereby achieving superior class-incremental performance without explicit task identifiers.

Alessio Masano, Giovanni Bellitto, Dipam Goswami, Joost Van de Weijer, Concetto Spampinato · Wed, 11 Ma… · cs.AI
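
The retrieval step, a single associative lookup in place of gradient-based prompt tuning, can be sketched as a modern-Hopfield-style softmax readout over a learned prompt pool. The class name, shapes, and the `beta` temperature below are illustrative assumptions, not RwF's actual layer:

```python
# Minimal sketch of single-step associative prompt retrieval
# (modern-Hopfield-style energy readout). Names and shapes are
# illustrative assumptions, not RwF's exact mechanism.
import torch
import torch.nn.functional as F

class AssociativePromptPool(torch.nn.Module):
    def __init__(self, n_prompts: int, key_dim: int, prompt_len: int,
                 embed_dim: int, beta: float = 4.0):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(n_prompts, key_dim))      # one key per stored prompt
        self.prompts = torch.nn.Parameter(torch.randn(n_prompts, prompt_len, embed_dim))
        self.beta = beta  # inverse temperature of the energy readout

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, key_dim), e.g. a [CLS] feature of a frozen backbone.
        # One softmax step performs the energy minimization; no per-task
        # gradient-based specialization is needed at retrieval time.
        attn = F.softmax(self.beta * query @ self.keys.T, dim=-1)            # (batch, n_prompts)
        return torch.einsum("bn,nld->bld", attn, self.prompts)               # (batch, prompt_len, embed_dim)
```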

A Variational Latent Equilibrium for Learning in Cortex

This paper proposes a biologically plausible, local learning framework for time-continuous neuronal networks that approximates backpropagation through time by deriving real-time error dynamics from a prospective energy function, thereby unifying and extending the Generalized Latent Equilibrium model to enable spatiotemporal credit assignment consistent with brain circuitry.

Simon Brandt, Paul Haider, Walter Senn, Federico Benitez, Mihai A. Petrovici · Wed, 11 Ma… · cs.AI
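
The idea of deriving real-time errors from a prospective quantity can be illustrated with a toy Euler-integrated network in which each neuron carries a prospective potential ŭ = u + τ·du/dt and weights follow a local error rule. The constants, the stand-in target, and the update rule below are assumptions for illustration; the paper's variational energy function is not reproduced here:

```python
# Toy sketch of latent-equilibrium-style dynamics: neurons carry a
# prospective potential u_breve = u + tau * du/dt, and weights are
# updated from local, real-time errors. All constants and the update
# rule are illustrative assumptions, not the paper's equations.
import numpy as np

rng = np.random.default_rng(0)
tau, dt, eta = 10.0, 0.1, 1e-3
W = rng.normal(scale=0.1, size=(5, 3))   # hidden <- input weights
u = np.zeros(5)                          # membrane potentials

def phi(x):                              # rate nonlinearity
    return np.tanh(x)

for step in range(1000):
    x = rng.normal(size=3)               # input rates
    target = np.zeros(5)                 # stand-in top-down target signal
    du = (-u + W @ phi(x)) / tau         # leaky integration toward input drive
    u_breve = u + tau * du               # prospective potential: looks ahead in time
    e = target - phi(u_breve)            # local, instantaneous error
    W += eta * np.outer(e, phi(x))       # local Hebbian-style error correction
    u += dt * du                         # Euler step of the neuronal dynamics
```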

PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories: A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

PRECEPT is a unified test-time adaptation framework that enhances LLM agent resilience by integrating deterministic exact-match rule retrieval, conflict-aware memory with Bayesian reliability, and the Pareto-guided COMPASS prompt-evolution loop to achieve superior compositional generalization, continuous learning, and robustness against knowledge drift and adversarial inputs.

Arash Shahmansoori · Wed, 11 Ma… · cs.AI
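
Two of PRECEPT's ingredients, deterministic exact-match rule retrieval and a Bayesian reliability score per rule, admit a compact sketch: a dictionary lookup plus a Beta posterior whose mean decays under conflicting outcomes. The data model and update rule are assumptions for illustration:

```python
# Sketch of exact-match rule retrieval with Beta-posterior reliability.
# The Rule schema and the success/failure update are assumptions.
from dataclasses import dataclass

@dataclass
class Rule:
    action: str
    successes: int = 1   # Beta(1, 1) uniform prior
    failures: int = 1

    @property
    def reliability(self) -> float:
        # Posterior mean of Beta(successes, failures)
        return self.successes / (self.successes + self.failures)

rulebook: dict[str, Rule] = {}

def retrieve(context_key: str) -> Rule | None:
    # Exact-match lookup: deterministic, no embedding similarity involved.
    return rulebook.get(context_key)

def update(context_key: str, action: str, succeeded: bool) -> None:
    rule = rulebook.setdefault(context_key, Rule(action))
    if succeeded:
        rule.successes += 1
    else:
        rule.failures += 1   # conflicting outcomes lower the rule's reliability

update("door_locked", "use_key", True)
update("door_locked", "use_key", False)
print(retrieve("door_locked").reliability)  # 0.5 after one success, one failure
```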

MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

This paper introduces MiniAppBench, a comprehensive benchmark derived from real-world data to evaluate LLMs' ability to generate principle-driven interactive HTML applications, alongside MiniAppEval, an agentic framework that uses browser automation to assess these applications across intention, static, and dynamic dimensions.

Zuhao Zhang, Chengyue Yu, Yuante Li, Chenyi Zhuang, Linjian Mo, Shuai Li · Wed, 11 Ma… · cs.AI
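
A dynamic check in the spirit of MiniAppEval can be sketched with a headless browser: load the generated HTML, trigger an interaction, and verify the page actually changed. The selectors and the toy counter app are assumptions; the real framework's checks across intention, static, and dynamic dimensions are far richer:

```python
# Sketch of a browser-automation dynamic check (Playwright sync API).
# Selector names and the toy app are illustrative assumptions.
from playwright.sync_api import sync_playwright

def dynamic_check(html: str, button_selector: str, result_selector: str) -> bool:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)                  # load the generated mini-app
        before = page.inner_text(result_selector)
        page.click(button_selector)             # simulate the user action
        after = page.inner_text(result_selector)
        browser.close()
        return before != after                  # interactivity actually happened

html = """<button id="inc" onclick="r.textContent=+r.textContent+1">+1</button>
<span id="r">0</span>"""
print(dynamic_check(html, "#inc", "#r"))        # True for this toy counter
```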

When to Lock Attention: Training-Free KV Control in Video Diffusion

KV-Lock is a training-free framework for DiT-based video diffusion models that dynamically adjusts background key-value locking and classifier-free guidance scales based on hallucination detection to simultaneously enhance foreground quality and maintain background consistency.

Tianyi Zeng, Jincheng Gao, Tianyi Wang, Zijie Meng, Miao Zhang, Jun Yin, Haoyuan Sun, Junfeng Jiao, Christian Claudel, Junbo Tan, Xueqian Wang · Wed, 11 Ma… · cs.AI
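
The "locking" idea can be sketched as overwriting the key/value projections at background token positions with cached projections from a reference pass, gated by a hallucination signal. The shapes and the gating rule below are assumptions, not the paper's exact mechanism:

```python
# Sketch of background key-value locking inside one attention call.
# Shapes and the lock trigger are illustrative assumptions.
import torch

def locked_attention(q, k, v, k_ref, v_ref, bg_mask, lock: bool):
    # q, k, v:       (tokens, dim) current-pass projections
    # k_ref, v_ref:  (tokens, dim) cached projections from a reference pass
    # bg_mask:       (tokens,) bool, True where the token is background
    if lock:  # e.g. enabled when a hallucination detector fires
        k = torch.where(bg_mask[:, None], k_ref, k)   # freeze background keys
        v = torch.where(bg_mask[:, None], v_ref, v)   # freeze background values
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v   # foreground tokens still attend freely; background stays put
```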

GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation

This paper introduces an open-source framework for Graph Neural Network-based Time Series Anomaly Detection to enable reproducible experimentation and critical evaluation, demonstrating that GNNs enhance both detection performance and interpretability while highlighting the need for standardized metrics and thresholding strategies.

Federico Bello, Gonzalo Chiarlone, Marcelo Fiori, Gastón García González, Federico Larroca · Wed, 11 Ma… · cs.AI
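
The scoring-and-thresholding step whose lack of standardization the paper highlights is typically: normalize per-sensor forecasting residuals, aggregate them into one score, and threshold. The robust scaling and quantile threshold below are common choices, assumed here rather than taken from the paper:

```python
# Sketch of residual-based anomaly scoring and thresholding for
# multivariate time series. Normalization and threshold are assumptions.
import numpy as np

def anomaly_scores(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    err = np.abs(y_true - y_pred)                        # (time, sensors) residuals
    med = np.median(err, axis=0)
    iqr = np.quantile(err, 0.75, axis=0) - np.quantile(err, 0.25, axis=0)
    norm = (err - med) / (iqr + 1e-8)                    # robust per-sensor scaling
    return norm.max(axis=1)                              # worst sensor drives the score

def detect(scores: np.ndarray, train_scores: np.ndarray, q: float = 0.999) -> np.ndarray:
    threshold = np.quantile(train_scores, q)             # fit on anomaly-free data
    return scores > threshold                            # boolean anomaly flags
```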

Logics-Parsing-Omni Technical Report

This paper introduces the Omni Parsing framework and the Logics-Parsing-Omni model, which unify document, image, and audio-visual parsing through a three-level hierarchical paradigm of holistic detection, fine-grained recognition, and multi-level interpreting to transform unstructured multimodal signals into traceable, evidence-based structured knowledge.

Xin An, Jingyi Cai, Xiangyang Chen, Huayao Liu, Peiting Liu, Peng Wang, Bei Yang, Xiuwen Zhu, Yongfan Chen, Baoyu Hou, Shuzhao Li, Weidong Ren, Fan Yang, Jiangtao Zhang, Xiaoxiao Xu, Lin Qu · Wed, 11 Ma… · cs.AI

EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages

The paper introduces EsoLang-Bench, a benchmark built on esoteric programming languages to probe whether large language models genuinely reason. It reveals a dramatic gap between the models' high scores on standard benchmarks and their near-zero accuracy on tasks that require acquiring a new language through documentation and experimentation rather than through memorization.

Aman Sharma, Paras Chopra · Wed, 11 Ma… · cs.AI
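
Scoring model-written esolang programs requires actually executing them. Below is a minimal Brainfuck interpreter of the kind one could use for that; Brainfuck is a natural example, though whether it is among EsoLang-Bench's languages is an assumption:

```python
# Minimal step-limited Brainfuck interpreter for executing and
# checking candidate programs. Tape size and step budget are arbitrary.
def run_bf(code: str, stdin: str = "", steps: int = 100_000) -> str:
    tape, ptr, pc, out, inp = [0] * 30_000, 0, 0, [], list(stdin)
    jumps, stack = {}, []
    for i, c in enumerate(code):                 # pre-match brackets
        if c == "[": stack.append(i)
        elif c == "]": jumps[i] = stack.pop(); jumps[jumps[i]] = i
    while pc < len(code) and steps > 0:
        c, steps = code[pc], steps - 1
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",": tape[ptr] = ord(inp.pop(0)) if inp else 0
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
        pc += 1
    return "".join(out)

assert run_bf("++++++++[>++++++++<-]>+.") == "A"   # 8*8 + 1 = 65 = 'A'
```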

Automatic Cardiac Risk Management Classification Using Large-Context Electronic Patient Health Records

This study demonstrates that a custom Transformer architecture outperforms both traditional machine learning models and zero-shot generative LLMs in automatically classifying cardiac risk from large-context, unstructured Dutch electronic health records, offering a robust alternative to manual administrative coding for geriatric cardiovascular risk management.

Jacopo Vitale, David Della Morte, Luca Bacco, Mario Merone, Mark de Groot, Saskia Haitjema, Leandro Pecchia, Bram van Es · Wed, 11 Ma… · cs.AI
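
One common way to classify records longer than a transformer's context window is to chunk the tokens, encode each chunk, pool, and classify. Whether the paper's custom architecture works this way is an assumption; the sketch only illustrates the long-context idea:

```python
# Sketch of a chunk-then-pool transformer classifier for long EHRs.
# Architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class ChunkedRecordClassifier(nn.Module):
    def __init__(self, vocab=30_000, dim=256, chunk=512, n_classes=2):
        super().__init__()
        self.chunk = chunk
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (length,) one long record; split into fixed-size chunks
        chunks = token_ids[: len(token_ids) // self.chunk * self.chunk]
        chunks = chunks.view(-1, self.chunk)          # (n_chunks, chunk)
        h = self.encoder(self.embed(chunks))          # (n_chunks, chunk, dim)
        doc = h.mean(dim=(0, 1))                      # pool chunks into one vector
        return self.head(doc)                         # cardiac-risk class logits
```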

AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering

This paper introduces AutoViVQA, a large-scale automatically constructed dataset for Vietnamese Visual Question Answering, and evaluates transformer-based multimodal models alongside various automatic metrics to assess their performance and alignment with human judgment in the Vietnamese context.

Nguyen Anh Tuong, Phan Ba Duc, Nguyen Trung Quoc, Tran Dac Thinh, Dang Duy Lan, Nguyen Quoc Thinh, Tung Le · Wed, 11 Ma… · cs.AI

ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling

The paper proposes ESAinsTOD, a unified end-to-end schema-aware instruction-tuning framework that leverages full-parameter LLM fine-tuning with instruction and schema alignment mechanisms to achieve superior performance, generalization in low-resource settings, and robustness against noise across diverse task-oriented dialog benchmarks.

Dechuan Teng, Chunlin Lu, Libo Qin, Wanxiang Che · Wed, 11 Ma… · cs.AI
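
"Schema-aware" minimally implies serializing the domain schema into the instruction so the model's outputs stay aligned with legal slots and values. The template below is an assumption; the paper's actual prompt format is not reproduced here:

```python
# Sketch of a schema-serialized instruction prompt for dialog state
# tracking. The template and slot names are illustrative assumptions.
def build_prompt(schema: dict[str, list[str]], history: list[str], utterance: str) -> str:
    schema_str = "; ".join(
        f"{slot}: {' | '.join(values) if values else 'free-form'}"
        for slot, values in schema.items()
    )
    return (
        "You are a task-oriented dialog system. Track the dialog state "
        f"using only these slots.\nSchema: {schema_str}\n"
        f"History: {' '.join(history)}\nUser: {utterance}\nState:"
    )

schema = {"hotel-area": ["north", "south", "centre"], "hotel-name": []}
print(build_prompt(schema, ["I need a hotel."], "Somewhere in the centre."))
```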

ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

The paper introduces ActiveUltraFeedback, an efficient active learning pipeline that leverages uncertainty estimates and novel selection strategies like Double Reverse Thompson Sampling to generate high-quality preference data, enabling Large Language Models to achieve superior alignment performance with as little as one-sixth of the annotated data required by static baselines.

Davit Melikidze, Marian Schneider, Jessica Lam, Martin Wertich, Ido Hakimi, Barna Pásztor, Andreas Krause · Wed, 11 Ma… · cs.AI
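
Uncertainty-driven pair selection can be sketched with plain Thompson sampling: draw one plausible reward per response and annotate the top pair under that draw. This is generic Thompson sampling, not a reconstruction of the paper's Double Reverse variant, whose details are not specified here:

```python
# Sketch of Thompson-sampling pair selection for preference annotation.
# The Gaussian posteriors and the top-2 rule are assumptions.
import numpy as np

def select_pair(mu: np.ndarray, sigma: np.ndarray, rng=np.random.default_rng()):
    # mu, sigma: per-response reward mean and uncertainty from a reward model
    sampled = rng.normal(mu, sigma)          # one draw from each reward posterior
    best, second = np.argsort(sampled)[::-1][:2]
    return int(best), int(second)            # (chosen, rejected) sent for annotation

mu = np.array([0.2, 0.9, 0.5, 0.4])
sigma = np.array([0.5, 0.1, 0.6, 0.2])
print(select_pair(mu, sigma))                # high-mean or high-uncertainty responses win
```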

OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences

The paper introduces OOD-MMSafe, a benchmark revealing significant causal blindness in current Multimodal Large Language Models regarding hidden consequences, and proposes the Consequence-Aware Safety Policy Optimization (CASPO) framework to effectively mitigate these risks by shifting safety alignment from intent detection to consequence projection.

Ming Wen, Kun Yang, Jingyu Zhang, Yuxuan Liu, Shiwen Cui, Shouling Ji, Xingjun Ma · Wed, 11 Ma… · cs.AI

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

This paper introduces MUGEN, a comprehensive benchmark revealing that Large Audio-Language Models struggle with multi-audio understanding as the number of audio inputs grows, and demonstrates that combining training-free strategies such as Audio-Permutational Self-Consistency with Chain-of-Thought prompting can significantly improve performance.

Chih-Kai Yang, Yun-Shao Tsai, Yu-Kai Guo, Ping-Le Tsai, Yen-Ting Piao, Hung-Wei Chen, Ting-Lin Hsiao, Yun-Man Hsu, Ke-Han Lu, Hung-yi Lee · Wed, 11 Ma… · cs.AI
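
Taking the name "Audio-Permutational Self-Consistency" at face value, the strategy can be sketched as querying the model on every ordering of the input clips and majority-voting the answers; `ask_model` is a hypothetical stand-in for a real audio-language model call:

```python
# Sketch of permutation-based self-consistency over audio inputs.
# `ask_model` and the voting rule are illustrative assumptions.
from collections import Counter
from itertools import permutations

def apsc_answer(ask_model, clips: list[str], question: str) -> str:
    votes = Counter(
        ask_model(list(order), question)     # same question, permuted clip order
        for order in permutations(clips)
    )
    return votes.most_common(1)[0][0]        # answer stable across orderings wins

# Toy model whose answer flips with clip order; the vote recovers "dog".
fake = lambda order, q: "dog" if order[0] != "siren.wav" else "car"
print(apsc_answer(fake, ["dog.wav", "rain.wav", "siren.wav"], "What animal?"))
```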