cs.AI papers | Gist.Science

SaiVLA-0: Cerebrum-Pons-Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action

SaiVLA-0 introduces a neuroscience-inspired, compute-aware Vision-Language-Action framework featuring a tripartite Cerebrum-Pons-Cerebellum architecture that decouples high-level semantics from real-time control to achieve modular scalability, active foveated vision, and significant improvements in training efficiency and task success rates.

Xiang Shi, Wenlong Huang, Menglin Zou, Xinhai Sun | 2026-03-10 | 🤖 cs.LG

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Foley-Flow introduces a video-to-audio generation framework that improves semantic and rhythmic synchronization by aligning unimodal encoders through masked audio-visual modeling and by employing a dynamic conditional flow that uses temporally varying video features to guide audio synthesis.

Shentong Mo, Yibing Song | 2026-03-10 | 🤖 cs.LG

DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding

The paper introduces DARC, a retraining-free, inference-time method that mitigates the brittleness of standard preference alignment by framing response selection as a distributionally robust, risk-sensitive decision-making process to explicitly manage annotator disagreement and tail risk without compromising average quality.

Mingxi Zou, Jiaxiang Chen, Junfan Li, Langzhang Liang, Qifan Wang, Xu Yinghui, Zenglin Xu | 2026-03-10 | 🤖 cs.LG

Gradually Excavating External Knowledge for Implicit Complex Question Answering

This paper proposes a gradual knowledge excavation framework that enables large language models to iteratively acquire external information and perform logical reasoning, achieving state-of-the-art performance on complex open-domain question answering with significantly fewer parameters.

Chang Liu, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Edmund Y. Lam, Ngai Wong | 2026-03-10 | 💬 cs.CL

An explainable hybrid deep learning-enabled intelligent fault detection and diagnosis approach for automotive software systems validation

This paper proposes an explainable hybrid deep learning framework that combines 1D-CNN and GRU architectures with interpretability techniques such as Integrated Gradients (IG) and SHAP to enhance fault detection, diagnosis, and root cause analysis in automotive software system validation while overcoming the limitations of traditional black-box models.

Mohammad Abboush, Ehab Ghannoum, Andreas Rausch | 2026-03-10 | 💻 cs

Evidence-Driven Reasoning for Industrial Maintenance Using Heterogeneous Data

This paper introduces the Condition Insight Agent, a deployed decision-support framework that integrates heterogeneous industrial data sources through constrained, rule-verified LLM reasoning to generate evidence-grounded maintenance explanations and actionable advice while ensuring reliability and human oversight.

Fearghal O'Donncha, Nianjun Zhou, Natalia Martinez, James T Rayfield, Fenno F. Heath III, Abigail Langbridge, Roman Vaculin | 2026-03-10 | 💻 cs

Evolution Strategy-Based Calibration for Low-Bit Quantization of Speech Models

This paper introduces ESC, an Evolution Strategy-based calibration method that addresses the unique challenges of audio signal quantization by optimizing activation scaling, thereby achieving near-lossless performance for INT4 and full INT8 quantization across multiple speech tasks.

Lucas Rakotoarivony | 2026-03-10 | 💻 cs

Is continuous CoT better suited for multi-lingual reasoning?

This paper demonstrates that performing reasoning in a continuous latent space via the CODI framework significantly outperforms standard explicit reasoning in multilingual settings, particularly for low-resource and zero-shot scenarios, while achieving substantial compression of reasoning traces.

Ali Hamza Bashir, Behzad Shomali, Markus Frey, Mehdi Ali, Rafet Sifa, David Berghaus | 2026-03-10 | 🤖 cs.LG

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

This paper reveals that hidden states in end-to-end full-duplex speech models like SALM-Duplex and Moshi significantly leak speaker identity, and proposes two streaming anonymization methods using Stream-Voice-Anon that effectively mitigate this privacy risk while maintaining low-latency dialogue performance.

Nikita Kuzmin, Tao Zhong, Jiajun Deng, Yingke Zhu, Tristan Tsoi, Tianxiang Cao, Simon Lui, Kong Aik Lee, Eng Siong Chng | 2026-03-10 | 💻 cs

TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

This paper introduces TildeOpen LLM, a 30-billion-parameter open-weight model that achieves superior performance across 34 European languages, particularly for low-resource groups, by employing curriculum learning and dataset upsampling to address data imbalances without requiring increased computational resources.

Toms Bergmanis, Martins Kronis, Ingus Jānis Pretkalninš, Dāvis Nicmanis, Jelizaveta Jelinska, Roberts Rozis, Rinalds Vīksna, Mārcis Pinnis | 2026-03-10 | 💬 cs.CL

MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data

This paper proposes MM-TS, a novel framework for multi-modal contrastive learning that dynamically adjusts temperature and margin schedules based on local data distribution to address long-tail imbalances, unifying InfoNCE and max-margin objectives to achieve state-of-the-art performance across multiple image- and video-language datasets.

Siarhei Sheludzko, Dhimitrios Duka, Bernt Schiele, Hilde Kuehne, Anna Kukleva | 2026-03-10 | 💻 cs

Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules

This paper critiques the reliance of current tabular foundation model benchmarks on point-estimate metrics like MSE, advocating instead for the adoption of proper scoring rules such as CRPS to evaluate probabilistic forecasts and the use of finetuning or promptable strategies to align model inductive biases with distributional regression goals.

Jonas Landsgesell, Pascal Knoll | 2026-03-10 | 🤖 cs.LG

Alignment-Aware and Reliability-Gated Multimodal Fusion for Unmanned Aerial Vehicle Detection Across Heterogeneous Thermal-Visual Sensors

This paper proposes two novel fusion strategies, Registration-aware Guided Image Fusion (RGIF) and Reliability-Gated Modality-Attention Fusion (RGMAF), which effectively integrate heterogeneous thermal and visual sensor data to significantly enhance unmanned aerial vehicle detection performance across diverse perspectives and resolutions.

Ishrat Jahan, Molla E Majid, M Murugappan, Muhammad E. H. Chowdhury, N. B. Prakash, Saad Bin Abul Kashem, Balamurugan Balusamy, Amith Khandakar | 2026-03-10 | 💻 cs

Revisiting Gradient Staleness: Evaluating Distance Metrics for Asynchronous Federated Learning Aggregation

This paper extends the adaptive aggregation method of AsyncFedED by exploring alternative distance metrics to better capture gradient staleness in asynchronous federated learning, demonstrating that specific metrics improve convergence, accuracy, and stability under heterogeneous and non-IID conditions.

Patrick Wilhelm, Odej Kao | 2026-03-10 | 🤖 cs.LG

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration

SplitAgent introduces a novel distributed architecture that enables privacy-preserving collaboration between enterprise and cloud AI agents by utilizing context-aware dynamic sanitization, differential privacy, and zero-knowledge verification to achieve high task accuracy while significantly reducing data leakage compared to static approaches.

Jianshu She | 2026-03-10 | 💻 cs

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

This paper introduces a systematic framework for Large Audio-Language Models that reformulates ambiguous emotion recognition as a distributional reasoning problem, utilizing an ambiguity-aware objective and structured chain-of-thought supervision to significantly improve performance on standard benchmarks.

Xiaofeng Yu, Jiaheng Dong, Jean Honorio, Abhirup Ghosh, Hong Jia, Ting Dang | 2026-03-10 | 💻 cs

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

This paper investigates the continuation-triggered jailbreak phenomenon in large language models, revealing through mechanistic interpretability analysis that its root cause lies in the inherent competition between the model's intrinsic continuation drive and its safety alignment defenses, while also identifying distinct behavioral patterns in safety-critical attention heads across different architectures.

Yonghong Deng, Zhen Yang, Ping Jian, Xinyue Zhang, Zhongbin Guo, Chengzhi Li | 2026-03-10 | 🤖 cs.LG

Exploring Deep Learning and Ultra-Widefield Imaging for Diabetic Retinopathy and Macular Edema

This study uses the MICCAI 2024 UWF4DR dataset to benchmark state-of-the-art deep learning models (CNNs, Vision Transformers, and foundation models) in both the spatial and frequency domains on three ultra-widefield imaging tasks: image quality assessment, referable diabetic retinopathy detection, and diabetic macular edema identification. It demonstrates that feature-level fusion and frequency-domain representations yield robust and explainable results.

Pablo Jimenez-Lizcano, Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Guillermo González de Rivera, Ruben Vera-Rodriguez, Julian Fierrez | 2026-03-10 | 💻 cs

Fibration Policy Optimization

This paper introduces Fibration Policy Optimization (FiberPO), a unified framework that bridges trust-region theory and compositional algebraic structures to enable principled, multi-scale stability control in large language model training through the novel Aggregational Policy Censoring Objective and Fiber Bundle Gating mechanism.

Chang Li, Tshihao Tsu, Yaren Zhang, Chao Xue, Xiaodong He | 2026-03-10 | 🤖 cs.LG

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

The paper introduces FinToolBench, the first real-world, runnable benchmark that evaluates LLM agents on 760 executable financial tools using a novel framework assessing timeliness, intent, and regulatory compliance, alongside a proposed finance-aware baseline named FATR to advance trustworthy agentic AI in finance.

Jiaxuan Lu, Kong Wang, Yemin Wang, Qingmei Tang, Hongwei Zeng, Xiang Chen, Jiahao Pi, Shujian Deng, Lingzhi Chen, Yi Fu, Kehua Yang, Xiao Sun | 2026-03-10 | 💻 cs
