cs.AI papers | Gist.Science

MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs

This paper presents MM-ISTS, a multimodal framework that leverages vision-text large language models and a novel two-stage encoding mechanism to enhance irregularly sampled time series forecasting by integrating temporal, visual, and textual modalities for improved pattern recognition and contextual understanding.

Zhi Lei, Chenxi Liu, Hao Miao, Wanghui Qiu, Bin Yang, Chenjuan Guo | 2026-03-09 | cs.AI

Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration

This paper identifies a "linguistic blindness" failure mode in Vision-Language-Action (VLA) models where they ignore contradictory instructions in favor of visual priors, and proposes IGAR, a train-free attention recalibration method that effectively restores language grounding and prevents erroneous actions without requiring model retraining.

Ninghao Zhang, Bin Zhu, Shijie Zhou, Jingjing Chen | 2026-03-09 | cs.AI

Demystifying KAN for Vision Tasks: The RepKAN Approach

The paper introduces RepKAN, a novel dual-path architecture that combines CNN efficiency with KAN's non-linear power to achieve state-of-the-art performance and explicit physical interpretability in remote sensing image classification.

Minjong Cheon | 2026-03-09 | cs.AI

MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing

This paper introduces MASFactory, a graph-centric framework that utilizes a human-in-the-loop "Vibe Graphing" approach to automatically compile natural language intents into executable multi-agent system workflows, thereby addressing challenges in manual implementation, component reuse, and context integration while demonstrating effectiveness across seven public benchmarks.

Yang Liu, Jinxuan Cai, Yishen Li, Qi Meng, Zedi Liu, Xin Li, Chen Qian, Chuan Shi, Cheng Yang | 2026-03-09 | cs.AI

Sensitivity-Aware Retrieval-Augmented Intent Clarification

This paper proposes a three-step framework to develop sensitivity-aware retrieval-augmented intent clarification systems that balance user utility with the protection of sensitive information in domains like healthcare and legal contexts by defining attack models, designing retrieval-level defenses, and establishing evaluation metrics for the protection-utility trade-off.

Maik Larooij | 2026-03-09 | cs.AI

Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

This paper investigates failure modes in lightweight Vision-Language Models for automated driving by analyzing intermediate activations to reveal that while some visual concepts like object presence are linearly encoded, others like orientation are not, leading to either perceptual or cognitive failures that are further exacerbated by object distance.

Nikos Theodoridis, Reenu Mohandas, Ganesh Sistu, Anthony Scanlan, Ciarán Eising, Tim Brophy | 2026-03-09 | cs.AI

TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation

TempoSyncDiff is a reference-conditioned latent diffusion framework that employs teacher-student distillation and temporal regularization to enable low-latency, temporally stable, and identity-consistent audio-driven talking head generation suitable for edge deployment.

Soumya Mazumdar, Vineet Kumar Rakesh | 2026-03-09 | cs.AI

Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation

This paper introduces PyPDDLEngine, an open-source PDDL simulation engine that enables agentic LLM planning via step-wise interaction, demonstrating that while this approach yields a modest 3% success rate improvement over direct LLM planning on Blocksworld tasks, it incurs significantly higher costs and lacks the external verification mechanisms that drive success in other coding agent applications.

Kai Göbel, Pierrick Lorang, Patrik Zips, Tobias Glück | 2026-03-09 | cs.AI

Evaluating Austrian A-Level German Essays with Large Language Models for Automated Essay Scoring

This paper evaluates the performance of four state-of-the-art open-weight Large Language Models on Austrian A-level German essay grading using rubric-based prompts, finding that while they can apply standardized criteria, their low agreement rates with human experts (maximum 40.6% on sub-dimensions and 32.8% on final grades) render them currently unsuitable for real-world automated scoring.

Jonas Kubesch, Lena Huber, Clemens Havas | 2026-03-09 | cs.AI

Aggregative Semantics for Quantitative Bipolar Argumentation Frameworks

This paper introduces "aggregative semantics," a novel family of gradual semantics for Quantitative Bipolar Argumentation Frameworks that enhances interpretability and parametrisability by separately aggregating attackers and supporters in a three-stage computation before combining them with an argument's intrinsic weight.

Yann Munro, Isabelle Bloch, Marie-Jeanne Lesot | 2026-03-09 | cs.AI

Text-Driven Emotionally Continuous Talking Face Generation

This paper introduces the novel task of Emotionally Continuous Talking Face Generation (EC-TFG) and proposes the TIE-TFG model, which utilizes text and varying emotion descriptions to synthesize realistic videos featuring smooth, natural emotional transitions rather than fixed expressions.

Hao Yang, Yanyan Zhao, Tian Zheng, Hongbo Zhang, Bichen Wang, Di Wu, Xing Fu, Xuda Zhi, Yongbo Huang, Hao He | 2026-03-09 | cs.AI

Lifelong Embodied Navigation Learning

This paper introduces Uni-Walker, a lifelong embodied navigation framework that addresses catastrophic forgetting in large language model-based agents by decoupling navigation knowledge into shared and task-specific components using DE-LoRA, knowledge inheritance, and expert subspace orthogonality to enable continuous adaptation across diverse scenes and instruction styles.

Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han | 2026-03-09 | cs.AI

StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation

StreamVoiceAnon+ is a streaming speaker anonymization system that preserves emotional content by combining supervised finetuning with neutral-emotion pairs and frame-level acoustic distillation, achieving significant improvements in emotion preservation (49.2% UAR) and intelligibility (5.77% WER) while maintaining strong privacy and zero inference latency overhead.

Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng | 2026-03-09 | cs.AI

Offline Materials Optimization with CliqueFlowmer

This paper introduces CliqueFlowmer, an offline model-based optimization framework that integrates clique-based optimization with transformer and flow generation to effectively discover materials with superior target properties, outperforming traditional generative baselines.

Jakub Grudzien Kuba, Benjamin Kurt Miller, Sergey Levine, Pieter Abbeel | 2026-03-09 | cs.AI

Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality

This study demonstrates that exposing Large Language Models to domain-specific texts via continued pre-training shapes distinct machine personalities that influence problem-solving, revealing a "Suppression Advantage" where reduced social traits enhance complex reasoning while identifying a bimodal competence peak between "Expressive Generalists" and "Suppressed Specialists."

Xi Wang, Mengdie Zhuang, Jiqun Liu | 2026-03-09 | cs.AI

Making Implicit Premises Explicit in Logical Understanding of Enthymemes

This paper proposes a neuro-symbolic pipeline that integrates large language models for generating implicit premises and translating natural language into logical formulas, alongside a SAT-based reasoner, to systematically decode enthymemes and verify logical entailment, demonstrating promising performance on existing datasets.

Xuyao Feng, Anthony Hunter | 2026-03-09 | cs.AI

A Hazard-Informed Data Pipeline for Robotics Physical Safety

This paper introduces a structured Robotics Physical Safety Framework that bridges classical risk engineering with modern machine learning by utilizing explicit asset declaration, vulnerability enumeration, and hazard-driven synthetic data generation to train models on formalized safety envelopes.

Alexei Odinokov, Rostislav Yavorskiy | 2026-03-09 | cs.AI

A Causal Graph Approach to Oppositional Narrative Analysis

This paper proposes a graph-based framework that models narratives as entity-interaction graphs and employs causal estimation to distill minimal causal subgraphs, thereby achieving superior performance in classifying oppositional narratives while mitigating the biases inherent in traditional black-box models.

Diego Revilla, Martin Fernandez-de-Retana, Lingfeng Chen, Aritz Bilbao-Jayo, Miguel Fernandez-de-Retana | 2026-03-09 | cs.AI

Partial Policy Gradients for RL in LLMs

This paper introduces a partial policy gradient method for reinforcement learning in LLMs that optimizes subsets of future rewards to enable the reliable learning and comparison of diverse policy classes, such as greedy, K-step lookahead, and segment policies, which demonstrate varying effectiveness across different persona-alignment conversational tasks.

Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai | 2026-03-09 | cs.AI

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion

Place-it-R1 is an end-to-end framework that leverages Multimodal Large Language Models (MLLMs) with Chain-of-Thought reasoning to orchestrate video diffusion via a "Think-then-Place" paradigm, ensuring physically consistent and environment-aware video object insertion through iterative refinement and user-controllable plausibility-fidelity trade-offs.

Bohai Gu, Taiyi Wu, Dazhao Du, Jian Liu, Shuai Yang, Xiaotong Zhao, Alan Zhao, Song Guo | 2026-03-09 | cs.AI
