Routing without Forgetting

The paper introduces Routing without Forgetting (RwF), a transformer architecture for online continual learning that replaces iterative gradient-based specialization with dynamic, single-step associative retrieval of input-conditioned prompts via energy-based layers, thereby achieving superior class-incremental performance without explicit task identifiers.

Alessio Masano, Giovanni Bellitto, Dipam Goswami, Joost Van de Weijer, Concetto Spampinato · Wed, 11 Ma… · cs.AI
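
The retrieval step, a single associative lookup in place of gradient-based prompt tuning, can be sketched as a modern-Hopfield-style softmax readout over a learned prompt pool. The class name, shapes, and the `beta` temperature below are illustrative assumptions, not RwF's actual layer:

```python
# Minimal sketch of single-step associative prompt retrieval
# (modern-Hopfield-style energy readout). Names and shapes are
# illustrative assumptions, not RwF's exact mechanism.
import torch
import torch.nn.functional as F

class AssociativePromptPool(torch.nn.Module):
    def __init__(self, n_prompts: int, key_dim: int, prompt_len: int,
                 embed_dim: int, beta: float = 4.0):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(n_prompts, key_dim))      # one key per stored prompt
        self.prompts = torch.nn.Parameter(torch.randn(n_prompts, prompt_len, embed_dim))
        self.beta = beta  # inverse temperature of the energy readout

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, key_dim), e.g. a [CLS] feature of a frozen backbone.
        # One softmax step performs the energy minimization; no per-task
        # gradient-based specialization is needed at retrieval time.
        attn = F.softmax(self.beta * query @ self.keys.T, dim=-1)            # (batch, n_prompts)
        return torch.einsum("bn,nld->bld", attn, self.prompts)               # (batch, prompt_len, embed_dim)
```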

A Variational Latent Equilibrium for Learning in Cortex

This paper proposes a biologically plausible, local learning framework for time-continuous neuronal networks that approximates backpropagation through time by deriving real-time error dynamics from a prospective energy function, thereby unifying and extending the Generalized Latent Equilibrium model to enable spatiotemporal credit assignment consistent with brain circuitry.

Simon Brandt, Paul Haider, Walter Senn, Federico Benitez, Mihai A. Petrovici · Wed, 11 Ma… · cs.AI
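
The idea of deriving real-time errors from a prospective quantity can be illustrated with a toy Euler-integrated network in which each neuron carries a prospective potential ŭ = u + τ·du/dt and weights follow a local error rule. The constants, the stand-in target, and the update rule below are assumptions for illustration; the paper's variational energy function is not reproduced here:

```python
# Toy sketch of latent-equilibrium-style dynamics: neurons carry a
# prospective potential u_breve = u + tau * du/dt, and weights are
# updated from local, real-time errors. All constants and the update
# rule are illustrative assumptions, not the paper's equations.
import numpy as np

rng = np.random.default_rng(0)
tau, dt, eta = 10.0, 0.1, 1e-3
W = rng.normal(scale=0.1, size=(5, 3))   # hidden <- input weights
u = np.zeros(5)                          # membrane potentials

def phi(x):                              # rate nonlinearity
    return np.tanh(x)

for step in range(1000):
    x = rng.normal(size=3)               # input rates
    target = np.zeros(5)                 # stand-in top-down target signal
    du = (-u + W @ phi(x)) / tau         # leaky integration toward input drive
    u_breve = u + tau * du               # prospective potential: looks ahead in time
    e = target - phi(u_breve)            # local, instantaneous error
    W += eta * np.outer(e, phi(x))       # local Hebbian-style error correction
    u += dt * du                         # Euler step of the neuronal dynamics
```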

PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories: A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

PRECEPT is a unified test-time adaptation framework that enhances LLM agent resilience by integrating deterministic exact-match rule retrieval, conflict-aware memory with Bayesian reliability, and the Pareto-guided COMPASS prompt-evolution loop to achieve superior compositional generalization, continuous learning, and robustness against knowledge drift and adversarial inputs.

Arash Shahmansoori · Wed, 11 Ma… · cs.AI
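
Two of PRECEPT's ingredients, deterministic exact-match rule retrieval and a Bayesian reliability score per rule, admit a compact sketch: a dictionary lookup plus a Beta posterior whose mean decays under conflicting outcomes. The data model and update rule are assumptions for illustration:

```python
# Sketch of exact-match rule retrieval with Beta-posterior reliability.
# The Rule schema and the success/failure update are assumptions.
from dataclasses import dataclass

@dataclass
class Rule:
    action: str
    successes: int = 1   # Beta(1, 1) uniform prior
    failures: int = 1

    @property
    def reliability(self) -> float:
        # Posterior mean of Beta(successes, failures)
        return self.successes / (self.successes + self.failures)

rulebook: dict[str, Rule] = {}

def retrieve(context_key: str) -> Rule | None:
    # Exact-match lookup: deterministic, no embedding similarity involved.
    return rulebook.get(context_key)

def update(context_key: str, action: str, succeeded: bool) -> None:
    rule = rulebook.setdefault(context_key, Rule(action))
    if succeeded:
        rule.successes += 1
    else:
        rule.failures += 1   # conflicting outcomes lower the rule's reliability

update("door_locked", "use_key", True)
update("door_locked", "use_key", False)
print(retrieve("door_locked").reliability)  # 0.5 after one success, one failure
```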

MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

This paper introduces MiniAppBench, a comprehensive benchmark derived from real-world data to evaluate LLMs' ability to generate principle-driven interactive HTML applications, alongside MiniAppEval, an agentic framework that uses browser automation to assess these applications across intention, static, and dynamic dimensions.

Zuhao Zhang, Chengyue Yu, Yuante Li, Chenyi Zhuang, Linjian Mo, Shuai Li · Wed, 11 Ma… · cs.AI
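
A dynamic check in the spirit of MiniAppEval can be sketched with a headless browser: load the generated HTML, trigger an interaction, and verify the page actually changed. The selectors and the toy counter app are assumptions; the real framework's checks across intention, static, and dynamic dimensions are far richer:

```python
# Sketch of a browser-automation dynamic check (Playwright sync API).
# Selector names and the toy app are illustrative assumptions.
from playwright.sync_api import sync_playwright

def dynamic_check(html: str, button_selector: str, result_selector: str) -> bool:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)                  # load the generated mini-app
        before = page.inner_text(result_selector)
        page.click(button_selector)             # simulate the user action
        after = page.inner_text(result_selector)
        browser.close()
        return before != after                  # interactivity actually happened

html = """<button id="inc" onclick="r.textContent=+r.textContent+1">+1</button>
<span id="r">0</span>"""
print(dynamic_check(html, "#inc", "#r"))        # True for this toy counter
```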

When to Lock Attention: Training-Free KV Control in Video Diffusion

KV-Lock is a training-free framework for DiT-based video diffusion models that dynamically adjusts background key-value locking and classifier-free guidance scales based on hallucination detection to simultaneously enhance foreground quality and maintain background consistency.

Tianyi Zeng, Jincheng Gao, Tianyi Wang, Zijie Meng, Miao Zhang, Jun Yin, Haoyuan Sun, Junfeng Jiao, Christian Claudel, Junbo Tan, Xueqian Wang · Wed, 11 Ma… · cs.AI
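
The "locking" idea can be sketched as overwriting the key/value projections at background token positions with cached projections from a reference pass, gated by a hallucination signal. The shapes and the gating rule below are assumptions, not the paper's exact mechanism:

```python
# Sketch of background key-value locking inside one attention call.
# Shapes and the lock trigger are illustrative assumptions.
import torch

def locked_attention(q, k, v, k_ref, v_ref, bg_mask, lock: bool):
    # q, k, v:       (tokens, dim) current-pass projections
    # k_ref, v_ref:  (tokens, dim) cached projections from a reference pass
    # bg_mask:       (tokens,) bool, True where the token is background
    if lock:  # e.g. enabled when a hallucination detector fires
        k = torch.where(bg_mask[:, None], k_ref, k)   # freeze background keys
        v = torch.where(bg_mask[:, None], v_ref, v)   # freeze background values
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v   # foreground tokens still attend freely; background stays put
```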

GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation

This paper introduces an open-source framework for Graph Neural Network-based Time Series Anomaly Detection to enable reproducible experimentation and critical evaluation, demonstrating that GNNs enhance both detection performance and interpretability while highlighting the need for standardized metrics and thresholding strategies.

Federico Bello, Gonzalo Chiarlone, Marcelo Fiori, Gastón García González, Federico Larroca · Wed, 11 Ma… · cs.AI
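
The scoring-and-thresholding step whose lack of standardization the paper highlights is typically: normalize per-sensor forecasting residuals, aggregate them into one score, and threshold. The robust scaling and quantile threshold below are common choices, assumed here rather than taken from the paper:

```python
# Sketch of residual-based anomaly scoring and thresholding for
# multivariate time series. Normalization and threshold are assumptions.
import numpy as np

def anomaly_scores(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    err = np.abs(y_true - y_pred)                        # (time, sensors) residuals
    med = np.median(err, axis=0)
    iqr = np.quantile(err, 0.75, axis=0) - np.quantile(err, 0.25, axis=0)
    norm = (err - med) / (iqr + 1e-8)                    # robust per-sensor scaling
    return norm.max(axis=1)                              # worst sensor drives the score

def detect(scores: np.ndarray, train_scores: np.ndarray, q: float = 0.999) -> np.ndarray:
    threshold = np.quantile(train_scores, q)             # fit on anomaly-free data
    return scores > threshold                            # boolean anomaly flags
```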

Logics-Parsing-Omni Technical Report

This paper introduces the Omni Parsing framework and the Logics-Parsing-Omni model, which unify document, image, and audio-visual parsing through a three-level hierarchical paradigm of holistic detection, fine-grained recognition, and multi-level interpreting to transform unstructured multimodal signals into traceable, evidence-based structured knowledge.

Xin An, Jingyi Cai, Xiangyang Chen, Huayao Liu, Peiting Liu, Peng Wang, Bei Yang, Xiuwen Zhu, Yongfan Chen, Baoyu Hou, Shuzhao Li, Weidong Ren, Fan Yang, Jiangtao Zhang, Xiaoxiao Xu, Lin Qu · Wed, 11 Ma… · cs.AI

EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages

The paper introduces EsoLang-Bench, a benchmark built on esoteric programming languages to probe whether large language models genuinely reason. It reveals a dramatic gap between the models' high scores on standard benchmarks and their near-zero accuracy on tasks that require acquiring a new language through documentation and experimentation rather than through memorization.

Aman Sharma, Paras Chopra · Wed, 11 Ma… · cs.AI
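
Scoring model-written esolang programs requires actually executing them. Below is a minimal Brainfuck interpreter of the kind one could use for that; Brainfuck is a natural example, though whether it is among EsoLang-Bench's languages is an assumption:

```python
# Minimal step-limited Brainfuck interpreter for executing and
# checking candidate programs. Tape size and step budget are arbitrary.
def run_bf(code: str, stdin: str = "", steps: int = 100_000) -> str:
    tape, ptr, pc, out, inp = [0] * 30_000, 0, 0, [], list(stdin)
    jumps, stack = {}, []
    for i, c in enumerate(code):                 # pre-match brackets
        if c == "[": stack.append(i)
        elif c == "]": jumps[i] = stack.pop(); jumps[jumps[i]] = i
    while pc < len(code) and steps > 0:
        c, steps = code[pc], steps - 1
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",": tape[ptr] = ord(inp.pop(0)) if inp else 0
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
        pc += 1
    return "".join(out)

assert run_bf("++++++++[>++++++++<-]>+.") == "A"   # 8*8 + 1 = 65 = 'A'
```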

Automatic Cardiac Risk Management Classification Using Large-Context Electronic Patient Health Records

This study demonstrates that a custom Transformer architecture outperforms both traditional machine learning models and zero-shot generative LLMs in automatically classifying cardiac risk from large-context, unstructured Dutch electronic health records, offering a robust alternative to manual administrative coding for geriatric cardiovascular risk management.

Jacopo Vitale, David Della Morte, Luca Bacco, Mario Merone, Mark de Groot, Saskia Haitjema, Leandro Pecchia, Bram van Es · Wed, 11 Ma… · cs.AI
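
One common way to classify records longer than a transformer's context window is to chunk the tokens, encode each chunk, pool, and classify. Whether the paper's custom architecture works this way is an assumption; the sketch only illustrates the long-context idea:

```python
# Sketch of a chunk-then-pool transformer classifier for long EHRs.
# Architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class ChunkedRecordClassifier(nn.Module):
    def __init__(self, vocab=30_000, dim=256, chunk=512, n_classes=2):
        super().__init__()
        self.chunk = chunk
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (length,) one long record; split into fixed-size chunks
        chunks = token_ids[: len(token_ids) // self.chunk * self.chunk]
        chunks = chunks.view(-1, self.chunk)          # (n_chunks, chunk)
        h = self.encoder(self.embed(chunks))          # (n_chunks, chunk, dim)
        doc = h.mean(dim=(0, 1))                      # pool chunks into one vector
        return self.head(doc)                         # cardiac-risk class logits
```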

AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering

This paper introduces AutoViVQA, a large-scale automatically constructed dataset for Vietnamese Visual Question Answering, and evaluates transformer-based multimodal models alongside various automatic metrics to assess their performance and alignment with human judgment in the Vietnamese context.

Nguyen Anh Tuong, Phan Ba Duc, Nguyen Trung Quoc, Tran Dac Thinh, Dang Duy Lan, Nguyen Quoc Thinh, Tung Le · Wed, 11 Ma… · cs.AI

ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling

The paper proposes ESAinsTOD, a unified end-to-end schema-aware instruction-tuning framework that leverages full-parameter LLM fine-tuning with instruction and schema alignment mechanisms to achieve superior performance, generalization in low-resource settings, and robustness against noise across diverse task-oriented dialog benchmarks.

Dechuan Teng, Chunlin Lu, Libo Qin, Wanxiang Che · Wed, 11 Ma… · cs.AI
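
"Schema-aware" minimally implies serializing the domain schema into the instruction so the model's outputs stay aligned with legal slots and values. The template below is an assumption; the paper's actual prompt format is not reproduced here:

```python
# Sketch of a schema-serialized instruction prompt for dialog state
# tracking. The template and slot names are illustrative assumptions.
def build_prompt(schema: dict[str, list[str]], history: list[str], utterance: str) -> str:
    schema_str = "; ".join(
        f"{slot}: {' | '.join(values) if values else 'free-form'}"
        for slot, values in schema.items()
    )
    return (
        "You are a task-oriented dialog system. Track the dialog state "
        f"using only these slots.\nSchema: {schema_str}\n"
        f"History: {' '.join(history)}\nUser: {utterance}\nState:"
    )

schema = {"hotel-area": ["north", "south", "centre"], "hotel-name": []}
print(build_prompt(schema, ["I need a hotel."], "Somewhere in the centre."))
```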

ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

The paper introduces ActiveUltraFeedback, an efficient active learning pipeline that leverages uncertainty estimates and novel selection strategies like Double Reverse Thompson Sampling to generate high-quality preference data, enabling Large Language Models to achieve superior alignment performance with as little as one-sixth of the annotated data required by static baselines.

Davit Melikidze, Marian Schneider, Jessica Lam, Martin Wertich, Ido Hakimi, Barna Pásztor, Andreas Krause · Wed, 11 Ma… · cs.AI
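
Uncertainty-driven pair selection can be sketched with plain Thompson sampling: draw one plausible reward per response and annotate the top pair under that draw. This is generic Thompson sampling, not a reconstruction of the paper's Double Reverse variant, whose details are not specified here:

```python
# Sketch of Thompson-sampling pair selection for preference annotation.
# The Gaussian posteriors and the top-2 rule are assumptions.
import numpy as np

def select_pair(mu: np.ndarray, sigma: np.ndarray, rng=np.random.default_rng()):
    # mu, sigma: per-response reward mean and uncertainty from a reward model
    sampled = rng.normal(mu, sigma)          # one draw from each reward posterior
    best, second = np.argsort(sampled)[::-1][:2]
    return int(best), int(second)            # (chosen, rejected) sent for annotation

mu = np.array([0.2, 0.9, 0.5, 0.4])
sigma = np.array([0.5, 0.1, 0.6, 0.2])
print(select_pair(mu, sigma))                # high-mean or high-uncertainty responses win
```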

OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences

The paper introduces OOD-MMSafe, a benchmark revealing significant causal blindness in current Multimodal Large Language Models regarding hidden consequences, and proposes the Consequence-Aware Safety Policy Optimization (CASPO) framework to effectively mitigate these risks by shifting safety alignment from intent detection to consequence projection.

Ming Wen, Kun Yang, Jingyu Zhang, Yuxuan Liu, Shiwen Cui, Shouling Ji, Xingjun Ma · Wed, 11 Ma… · cs.AI

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

This paper introduces MUGEN, a comprehensive benchmark revealing that Large Audio-Language Models struggle with multi-audio understanding as the number of audio inputs grows, and demonstrates that combining training-free strategies such as Audio-Permutational Self-Consistency with Chain-of-Thought prompting can significantly improve performance.

Chih-Kai Yang, Yun-Shao Tsai, Yu-Kai Guo, Ping-Le Tsai, Yen-Ting Piao, Hung-Wei Chen, Ting-Lin Hsiao, Yun-Man Hsu, Ke-Han Lu, Hung-yi Lee · Wed, 11 Ma… · cs.AI
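
Taking the name "Audio-Permutational Self-Consistency" at face value, the strategy can be sketched as querying the model on every ordering of the input clips and majority-voting the answers; `ask_model` is a hypothetical stand-in for a real audio-language model call:

```python
# Sketch of permutation-based self-consistency over audio inputs.
# `ask_model` and the voting rule are illustrative assumptions.
from collections import Counter
from itertools import permutations

def apsc_answer(ask_model, clips: list[str], question: str) -> str:
    votes = Counter(
        ask_model(list(order), question)     # same question, permuted clip order
        for order in permutations(clips)
    )
    return votes.most_common(1)[0][0]        # answer stable across orderings wins

# Toy model whose answer flips with clip order; the vote recovers "dog".
fake = lambda order, q: "dog" if order[0] != "siren.wav" else "car"
print(apsc_answer(fake, ["dog.wav", "rain.wav", "siren.wav"], "What animal?"))
```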