cs.AI papers | Gist.Science

Aero-Promptness: Drag-Aware Aerodynamic Manipulability for Propeller-driven Vehicles

This paper introduces Drag-Aware Aerodynamic Manipulability (DAAM), a geometric framework for control allocation in redundant multirotors. It uses a Riemannian metric that explicitly accounts for motor torque limits and aerodynamic drag, yielding a state-dependent manipulability volume that serves as a natural barrier function for redundancy resolution. The paper also characterizes the resulting smooth manifolds and global jump discontinuities.

Antonio Franchi | 2026-03-10 | 🔢 math

ViSA-Enhanced Aerial VLN: A Visual-Spatial Reasoning Enhanced Framework for Aerial Vision-Language Navigation

This paper proposes the ViSA-enhanced framework, a triple-phase collaborative architecture that leverages structured visual prompting to enable Vision-Language Models to perform direct spatial reasoning on image planes, achieving a 70.3% improvement in success rate over state-of-the-art aerial Vision-Language Navigation methods on the CityNav benchmark.

Haoyu Tong, Xiangyu Dong, Xiaoguang Ma, Haoran Zhao, Yaoming Zhou, Chenghao Lin | 2026-03-10 | 💻 cs

PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

This paper introduces PIRA-Bench, a novel benchmark, and PIRF, a baseline framework, both designed to advance GUI agents from reactive instruction-following to proactive intent recommendation by evaluating their ability to anticipate user needs from noisy, continuous visual inputs.

Yuxiang Chai, Shunye Tang, Han Xiao, Rui Liu, Hongsheng Li | 2026-03-10 | 💻 cs

FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning

FedMomentum is a novel federated fine-tuning framework that preserves LoRA training momentum and ensures mathematically correct aggregation by using singular value decomposition (SVD) to extract dominant update directions while retaining residual components, thereby achieving faster convergence and higher accuracy than existing methods.

Peishen Yan, Yang Hua, Hao Wang, Jiaru Zhang, Xiaoyu Wu, Tao Song, Haibing Guan | 2026-03-10 | 🤖 cs.LG
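
The FedMomentum summary turns on one point: averaging LoRA factors separately is generally not the same as averaging their products. The sketch below illustrates that point and an SVD-based re-factorization. It is an illustration only, not the authors' implementation: the function name `aggregate_lora`, the square-root split of singular values, and returning the residual as a dense matrix are all assumptions made here.

```python
import numpy as np

def aggregate_lora(updates, rank):
    """Aggregate client LoRA updates given as (B, A) pairs, B: (d, r), A: (r, k).

    Hypothetical helper: averages the full products, then re-factorizes
    the average with a truncated SVD and returns what the truncation drops.
    """
    # Average the products B_i @ A_i; mean(B_i) @ mean(A_i) differs in general.
    delta = sum(B @ A for B, A in updates) / len(updates)
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(s[:rank])
    B_new = U[:, :rank] * root           # dominant directions, shape (d, rank)
    A_new = root[:, None] * Vt[:rank]    # shape (rank, k)
    residual = delta - B_new @ A_new     # component outside the top-rank subspace
    return B_new, A_new, residual

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(8, 2)), rng.normal(size=(2, 6))) for _ in range(3)]
B_new, A_new, residual = aggregate_lora(clients, rank=2)

exact = sum(B @ A for B, A in clients) / len(clients)
naive = (np.mean([B for B, _ in clients], axis=0)
         @ np.mean([A for _, A in clients], axis=0))
```

Here `B_new @ A_new + residual` reproduces the exact average, while `naive` does not. How the paper itself stores or reuses the residual is not stated in the summary.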

Alignment-Process-Outcome: Rethinking How AIs and Humans Collaborate

This paper proposes a unified dynamic framework using "task" and "intent" lenses to reconceptualize human-AI collaboration, arguing that alignment, process structure, and outcome quality are non-linearly related and require a structural analysis beyond simple outcome metrics.

Haichang Li, Anjun Zhu, Arpit Narechania | 2026-03-10 | 💻 cs

Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model

This paper introduces MambaDance, a novel dance generation framework that replaces Transformers with a Mamba-based diffusion model and employs a Gaussian-based beat representation to effectively capture the sequential, rhythmic, and music-synchronized nature of dance across varying sequence lengths.

Sangjune Park, Inhyeok Choi, Donghyeon Soon, Youngwoo Jeon, Kyungdon Joo | 2026-03-10 | 💻 cs
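
One plausible reading of a "Gaussian-based beat representation" is a per-frame curve that peaks at each beat and decays smoothly around it, rather than a binary beat flag. The sketch below shows that reading only. The function name `gaussian_beat_curve`, the `sigma` width, and taking the maximum over beats are assumptions made here, not details from the paper.

```python
import numpy as np

def gaussian_beat_curve(beat_frames, num_frames, sigma=2.0):
    """Return a length-num_frames curve with a Gaussian bump at each beat frame.

    Hypothetical helper: sigma is the bump width in frames.
    """
    beats = np.asarray(beat_frames, dtype=float)
    if beats.size == 0:
        return np.zeros(num_frames)
    t = np.arange(num_frames, dtype=float)[:, None]   # (num_frames, 1)
    bumps = np.exp(-0.5 * ((t - beats[None, :]) / sigma) ** 2)
    return bumps.max(axis=1)                          # nearest beat dominates

curve = gaussian_beat_curve([10, 30], num_frames=40)
```

The curve equals 1 exactly on beat frames and falls toward 0 between them, which gives a model a graded signal of how close each frame is to a beat.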

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

DyLLM is a training-free inference framework that accelerates Masked Diffusion Language Models by identifying and recomputing only the "salient tokens" whose representations change between denoising steps, while reusing cached activations for the temporally stable rest, achieving up to 9.6x higher throughput with minimal accuracy loss.

Younjoo Lee, Junghoo Lee, Seungkyun Dan, Jaiyoung Park, Jung Ho Ahn | 2026-03-10 | 💬 cs.CL
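
The recompute-or-reuse idea in the DyLLM summary can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: salience is assumed to mean that a token's context vector changed between two adjacent denoising steps, measured here by cosine similarity against a hypothetical `threshold`, and `layer_fn` stands in for whatever expensive per-token computation is being skipped.

```python
import numpy as np

def select_salient(prev_ctx, curr_ctx, threshold=0.99):
    """Boolean mask of tokens whose context changed between two steps."""
    num = np.sum(prev_ctx * curr_ctx, axis=-1)
    den = (np.linalg.norm(prev_ctx, axis=-1)
           * np.linalg.norm(curr_ctx, axis=-1) + 1e-12)
    return num / den < threshold

def partial_step(prev_ctx, curr_ctx, cache, layer_fn, threshold=0.99):
    """Recompute layer_fn only for salient tokens; reuse the cache elsewhere."""
    salient = select_salient(prev_ctx, curr_ctx, threshold)
    out = cache.copy()
    if salient.any():
        out[salient] = layer_fn(curr_ctx[salient])
    return out, salient

prev = np.eye(4)
curr = prev.copy()
curr[2] = [1.0, 1.0, 0.0, 0.0]           # only token 2 changes
cache = np.zeros((4, 4))                 # stand-in for cached activations
out, salient = partial_step(prev, curr, cache, layer_fn=lambda x: 2.0 * x)
```

Only token 2 is recomputed here; the other three rows of `out` come straight from `cache`.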

GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables

The paper proposes GCGNet, a Graph-Consistent Generative Network that integrates a Variational Generator, Graph Structure Aligner, and Graph Refiner to jointly model temporal and channel correlations in a noise-robust manner, thereby outperforming state-of-the-art methods in time series forecasting with exogenous variables.

Zhengyu Li, Xiangfei Qiu, Yuhan Zhu, Xingjian Wu, Jilin Hu, Chenjuan Guo, Bin Yang | 2026-03-10 | 🤖 cs.LG

Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout

This paper presents a robust multimodal framework for the 10th ABAW Expression Recognition Challenge that utilizes a dual-branch Transformer with safe cross-attention and modality dropout to dynamically fuse audio and visual data, effectively addressing partial occlusions, missing modalities, and class imbalance to achieve 60.79% accuracy on the Aff-Wild2 validation set.

Jun Yu, Naixiang Zheng, Guoyuan Wang, Yunxiang Zhang, Lingsi Zhu, Jiaen Liang, Wei Huang, Shengping Liu | 2026-03-10 | 💻 cs

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

CDRRM introduces a contrast-driven framework that generates high-quality, interpretable rubrics from preference pairs to guide reward modeling, achieving state-of-the-art performance and superior data efficiency while mitigating common evaluation biases.

Dengcan Liu, Fengkai Yang, Xiaohan Wang, Shurui Yan, Jiajun Chai, Jiahao Li, Yikun Ban, Zhendong Mao, Wei Lin, Guojun Yin | 2026-03-10 | 🤖 cs.LG

S2S-FDD: Bridging Industrial Time Series and Natural Language for Explainable Zero-shot Fault Diagnosis

The paper proposes S2S-FDD, a novel framework that bridges the semantic gap between high-dimensional industrial time-series signals and natural language by converting sensor data into descriptive summaries and utilizing a multi-turn tree-structured reasoning process with historical documents to achieve explainable, zero-shot fault diagnosis.

Baoxue Li, Chunhui Zhao | 2026-03-10 | 💻 cs

Speed3R: Sparse Feed-forward 3D Reconstruction Models

Speed3R is a sparse feed-forward 3D reconstruction model that overcomes the quadratic computational bottleneck of dense attention by employing a dual-branch mechanism to focus on informative tokens, achieving a 12.4x inference speedup with minimal accuracy trade-offs.

Weining Ren, Xiao Tan, Kai Han | 2026-03-10 | 💻 cs

ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning

ImageEdit-R1 is a novel multi-agent framework that employs reinforcement learning to coordinate specialized vision-language and generative agents, enabling dynamic, context-aware image editing that outperforms existing monolithic models and baselines in handling complex, multi-step user instructions.

Yiran Zhao, Yaoqi Ye, Xiang Liu, Michael Qizhe Shieh, Trung Bui | 2026-03-10 | 💻 cs

In-Context Reinforcement Learning for Tool Use in Large Language Models

This paper proposes In-Context Reinforcement Learning (ICRL), a novel framework that eliminates the need for supervised fine-tuning by leveraging few-shot prompting during reinforcement learning rollouts to progressively teach large language models how to effectively use external tools, ultimately achieving state-of-the-art performance in a data-efficient, zero-shot manner.

Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh | 2026-03-10 | 💻 cs

DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

This paper introduces DSH-Bench, a comprehensive benchmark featuring a hierarchical subject taxonomy, granular difficulty and scenario classification, and a novel Subject Identity Consistency Score (SICS) metric to systematically evaluate and diagnose subject-driven text-to-image generation models.

Zhenyu Hu, Qing Wang, Te Cao, Luo Liao, Longfei Lu, Liqun Liu, Shuang Li, Hang Chen, Mengge Xue, Yuan Chen, Chao Deng, Peng Shu, Huan Yu, Jie Jiang | 2026-03-10 | 💻 cs

DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning

This paper introduces the Dual-Consensus Weak-to-Strong (DC-W2S) framework, which enhances the reliability of Process Reward Models in biological reasoning by strategically filtering noisy weak supervision signals through self- and neighborhood-consensus metrics to enable robust training without exhaustive expert annotation.

Chi-Min Chan, Ehsan Hajiramezanali, Xiner Li, Edward De Brouwer, Carl Edwards, Wei Xue, Sirui Han, Yike Guo, Gabriele Scalia | 2026-03-10 | 🤖 cs.LG

UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking

This paper identifies the critical limitation of current LLM-based agents in accessing unindexed information, introduces the first dedicated UIS-QA benchmark to quantify this challenge, and proposes UIS-Digger, a multi-agent framework that significantly outperforms state-of-the-art models by effectively combining dual-mode browsing and file parsing to retrieve vital unindexed data.

Chang Liu, Chuqiao Kuang, Tianyi Zhuang, Yuxin Cheng, Huichi Zhou, Xiaoguang Li, Lifeng Shang | 2026-03-10 | 💻 cs

SaiVLA-0: Cerebrum-Pons-Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action

SaiVLA-0 introduces a neuroscience-inspired, compute-aware Vision-Language-Action framework featuring a tripartite Cerebrum-Pons-Cerebellum architecture that decouples high-level semantics from real-time control to achieve modular scalability, active foveated vision, and significant improvements in training efficiency and task success rates.

Xiang Shi, Wenlong Huang, Menglin Zou, Xinhai Sun | 2026-03-10 | 🤖 cs.LG

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Foley-Flow introduces a novel video-to-audio generation framework that achieves superior semantic and rhythmic synchronization by aligning unimodal encoders through masked audio-visual modeling and employing a dynamic conditional flow that utilizes temporally varying video features to guide audio synthesis.

Shentong Mo, Yibing Song | 2026-03-10 | 🤖 cs.LG

DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding

The paper introduces DARC, a retraining-free, inference-time method that mitigates the brittleness of standard preference alignment by framing response selection as a distributionally robust, risk-sensitive decision-making process to explicitly manage annotator disagreement and tail risk without compromising average quality.

Mingxi Zou, Jiaxiang Chen, Junfan Li, Langzhang Liang, Qifan Wang, Xu Yinghui, Zenglin Xu | 2026-03-10 | 🤖 cs.LG
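
To make "risk-constrained" response selection concrete, the sketch below picks, among candidate responses scored by several annotators, the one with the best average score whose worst-case tail stays above a floor. It is a generic illustration, not DARC's actual formulation: the lower-tail CVaR risk measure, the `alpha` and `floor` parameters, and the fallback rule are all assumptions made here.

```python
import numpy as np

def lower_cvar(scores, alpha=0.25):
    """Mean of the worst alpha-fraction of annotator scores per candidate."""
    k = max(1, int(np.ceil(alpha * scores.shape[-1])))
    worst = np.sort(scores, axis=-1)[..., :k]
    return worst.mean(axis=-1)

def pick_response(scores, alpha=0.25, floor=0.0):
    """scores: (num_candidates, num_annotators). Returns the chosen index."""
    mean = scores.mean(axis=1)
    tail = lower_cvar(scores, alpha)
    feasible = tail >= floor
    if not feasible.any():
        # Assumed fallback: no candidate meets the floor, take the safest one.
        return int(np.argmax(tail))
    return int(np.argmax(np.where(feasible, mean, -np.inf)))

scores = np.array([
    [0.9, 0.9, 0.9, -0.8],   # best average, but one annotator strongly objects
    [0.5, 0.4, 0.5, 0.4],    # slightly lower average, no strong objection
    [0.1, 0.2, 0.1, 0.2],
])
choice = pick_response(scores, alpha=0.25, floor=0.0)
mean_only = int(np.argmax(scores.mean(axis=1)))
```

Selecting by mean alone picks candidate 0, while the tail constraint rules it out and picks candidate 1, at a small cost in average score.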
