Grounding Synthetic Data Generation With Vision and Language Models
This paper proposes a vision-language grounded framework for interpretable synthetic data generation and evaluation in remote sensing. It introduces the ARAS400k dataset and shows that models trained on real data augmented with synthetic images consistently outperform real-data-only baselines on semantic segmentation and image captioning tasks.
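The real-plus-synthetic augmentation setup could be sketched roughly as below. This is a minimal illustration assuming a PyTorch-style training pipeline; the dataset objects, the `synthetic_ratio` parameter, and the helper name `build_augmented_dataset` are hypothetical and not taken from the paper.

```python
# Minimal sketch of mixing real and synthetic training samples, assuming
# PyTorch map-style datasets that yield (image, mask) or (image, caption) pairs.
# All names here are illustrative, not the paper's actual pipeline.
import random
from torch.utils.data import ConcatDataset, DataLoader, Subset

def build_augmented_dataset(real_ds, synthetic_ds, synthetic_ratio=0.5, seed=0):
    """Combine all real samples with a fraction of synthetic samples.

    synthetic_ratio: number of synthetic samples added per real sample.
    """
    n_synth = min(len(synthetic_ds), int(len(real_ds) * synthetic_ratio))
    rng = random.Random(seed)
    synth_indices = rng.sample(range(len(synthetic_ds)), n_synth)
    return ConcatDataset([real_ds, Subset(synthetic_ds, synth_indices)])

# Usage sketch:
# train_ds = build_augmented_dataset(real_train_ds, synthetic_train_ds, 0.5)
# loader = DataLoader(train_ds, batch_size=16, shuffle=True)
```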