cs.AI papers | Gist.Science

CLoE: Expert Consistency Learning for Missing Modality Segmentation

The paper proposes CLoE, a consistency-driven framework that enhances missing-modality medical image segmentation by enforcing decision-level agreement among modality experts on both global and clinically critical foreground regions, thereby improving robustness and generalization compared to state-of-the-art methods.

Xinyu Tong, Meihua Zhou, Bowu Fan, Haitao Li · Wed, 11 Ma · 🤖 cs.AI
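To make "decision-level agreement" concrete, here is a minimal sketch of one possible consistency penalty. It assumes each available modality expert outputs a per-pixel foreground probability map. The function name `consistency_loss`, the squared-difference measure, and the `fg_weight` parameter are illustrative assumptions, not CLoE's published loss.

```python
from itertools import combinations


def consistency_loss(expert_probs, fg_mask, fg_weight=1.0):
    """Mean pairwise disagreement between modality experts (hypothetical sketch).

    expert_probs: one flat list of foreground probabilities per available expert.
    fg_mask: flat list of 0/1 flags marking the clinically critical foreground.
    """
    pairs = list(combinations(range(len(expert_probs)), 2))
    if not pairs:
        return 0.0  # a single expert has nothing to agree with
    n_pix = len(fg_mask)
    n_fg = sum(fg_mask)
    total = 0.0
    for i, j in pairs:
        # Squared per-pixel difference between the two experts' decisions.
        diffs = [(a - b) ** 2 for a, b in zip(expert_probs[i], expert_probs[j])]
        global_term = sum(diffs) / n_pix
        # The same disagreement, restricted to foreground pixels.
        fg_term = sum(d for d, m in zip(diffs, fg_mask) if m) / n_fg if n_fg else 0.0
        total += global_term + fg_weight * fg_term
    return total / len(pairs)
```

Weighting the foreground term separately keeps small lesions from being swamped by the much larger background when disagreement is averaged over the whole image.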

DenoiseSplat: Feed-Forward Gaussian Splatting for Noisy 3D Scene Reconstruction

The paper proposes DenoiseSplat, a feed-forward 3D Gaussian splatting method that achieves robust novel-view synthesis from noisy multi-view images by training end-to-end on a newly constructed noisy-clean benchmark without requiring 3D ground truth.

Fuzhen Jiang, Zhuoran Li, Yinlin Zhang · Wed, 11 Ma · 🤖 cs.AI

DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data

This paper introduces DendroNN, a novel dendrocentric neural network that leverages non-differentiable sequence detection and a rewiring phase to efficiently classify event-based spatiotemporal data, achieving competitive accuracy with up to 4x higher energy efficiency than state-of-the-art neuromorphic hardware through a dedicated asynchronous digital architecture.

Jann Krausse, Zhe Su, Kyrus Mama, Maryada, Klaus Knobloch, Giacomo Indiveri, Jürgen Becker · Wed, 11 Ma · 🤖 cs.AI
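The phrase "non-differentiable sequence detection" can be illustrated with a toy detector that fires when events arrive on an ordered set of input channels, each within a time window of the previous match. The `SequenceDendrite` class, its greedy matching rule, and the `window` parameter are assumptions for illustration; they are not DendroNN's neuron model, and the rewiring phase is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SequenceDendrite:
    """Toy ordered-sequence detector (hypothetical; not DendroNN's neuron model)."""
    pattern: tuple  # ordered input channel IDs that must fire in sequence
    window: float   # max gap allowed between consecutive matched events

    def detect(self, events):
        """Return the timestamps at which the full pattern completes.

        events: iterable of (timestamp, channel) pairs sorted by timestamp.
        """
        fired = []
        idx = 0          # how many pattern elements have been matched so far
        last_t = None    # timestamp of the most recent matched event
        for t, ch in events:
            if idx > 0 and t - last_t > self.window:
                idx = 0  # partial match expired: the sequence arrived too slowly
            if ch == self.pattern[idx]:
                idx += 1
                last_t = t
                if idx == len(self.pattern):
                    fired.append(t)  # pattern complete: the dendrite "fires"
                    idx = 0
            elif ch == self.pattern[0]:
                idx = 1  # an out-of-order event that can start a fresh match
                last_t = t
        return fired
```

Because the output is a discrete event (fire or not) rather than a smooth function of the inputs, there is no gradient to follow, which is why such detectors need training procedures other than backpropagation.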

Multi-model approach for autonomous driving: A comprehensive study on traffic sign-, vehicle- and lane detection and behavioral cloning

This study presents a multi-model deep learning approach that combines pre-trained and custom neural networks with data augmentation and transfer learning to support autonomous driving, addressing traffic sign classification, vehicle and lane detection, and behavioral cloning across diverse datasets.

Kanishkha Jaisankar, Pranav M. Pawar, Diana Susane Joseph, Raja Muthalagu, Mithun Mukherjee · Wed, 11 Ma · 🤖 cs.AI

BridgeDiff: Bridging Human Observations and Flat-Garment Synthesis for Virtual Try-Off

BridgeDiff is a diffusion-based framework that improves virtual try-off by bridging the gap between on-body observations and flat-garment synthesis through a Garment Condition Bridge Module for robust appearance inference and a Flat Structure Constraint Module for enhanced structural stability.

Shuang Liu, Ao Yu, Linkang Cheng, Xiwen Huang, Li Zhao, Junhui Liu, Zhiting Lin, Yu Liu · Wed, 11 Ma · 🤖 cs.AI

Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics

This paper presents a scalable, reinforcement learning-driven simulation framework featuring a full-body musculoskeletal model that enables the quantitative co-optimization of robotic structural design and control policies by providing direct access to internal human biomechanical metrics for interactive robotics.

Chenhui Zuo, Jinhao Xu, Michael Qian Vergnolle, Yanan Sui · Wed, 11 Ma · 🤖 cs.AI

Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing

This paper investigates emotion as a latent factor influencing LLM attention and reasoning, introducing the AURA-QA dataset and an emotional regularization framework that demonstrably improves reading comprehension performance across both emotionally varying and standard benchmarks.

Benjamin Reichman, Adar Avasian, Samuel Webster, Larry Heck · Wed, 11 Ma · 🤖 cs.AI

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Latent-DARM is a novel latent-space communication framework that bridges Discrete Diffusion Language Models for global planning and Autoregressive Models for fluent execution, significantly improving reasoning accuracy on benchmarks like DART-5 and AIME2024 while drastically reducing token usage compared to state-of-the-art reasoning models.

Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann, Salman Khan, Wei Chen · Wed, 11 Ma · 🤖 cs.AI

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

DuplexCascade is a VAD-free, cascaded ASR-LLM-TTS pipeline that enables full-duplex speech-to-speech dialogue by converting long turns into micro-turns and utilizing special control tokens to coordinate turn-taking while preserving the conversational intelligence of large language models.

Jianing Yang, Yusuke Fujita, Yui Sudo · Wed, 11 Ma · 🤖 cs.AI

Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation

This paper proposes a physics-informed generative modeling framework that derives a differentiable, distributional traffic dynamics model from stochastic Ito-type equations, enabling the estimation of traffic density distributions, credible intervals, and congestion risks through a score network trained with denoising score matching and Fokker-Planck residual loss.

Wuping Xin · Wed, 11 Ma · 🤖 cs.AI
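For reference, the textbook relationships behind this kind of model can be written out. Assume a one-dimensional Itô SDE for a traffic state variable X_t (for example, density at a location) with drift μ and diffusion σ. Its probability density p(x, t) then obeys the Fokker-Planck equation, and a score network s_θ ≈ ∂_x log p can be fit by denoising score matching with a noise scale η. A Fokker-Planck residual loss penalizes how far a candidate density violates that equation. These are the standard forms only; how the paper parameterizes the drift and diffusion, discretizes the residual, and weights the two losses is not stated in this summary.

```latex
dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t

\partial_t p(x,t) = -\partial_x\big[\mu(x,t)\,p(x,t)\big]
                    + \tfrac{1}{2}\,\partial_x^2\big[\sigma^2(x,t)\,p(x,t)\big]

\mathcal{L}_{\mathrm{DSM}}(\theta) =
  \mathbb{E}_{x \sim p,\ \tilde{x} \sim \mathcal{N}(x,\,\eta^2)}
  \Big[\big\| s_\theta(\tilde{x}) + (\tilde{x} - x)/\eta^2 \big\|^2\Big]

\mathcal{L}_{\mathrm{FP}} =
  \mathbb{E}_{(x,t)}\Big[\big(\partial_t p + \partial_x(\mu p)
  - \tfrac{1}{2}\,\partial_x^2(\sigma^2 p)\big)^2\Big]
```

The DSM target (x − x̃)/η² is the score of the Gaussian corruption kernel, so minimizing the loss drives s_θ toward the score of the noise-smoothed density.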

Reinforced Generation of Combinatorial Structures: Ramsey Numbers

This paper applies AlphaEvolve, an LLM-based code mutation agent, as a unified meta-algorithm that improves the lower bounds of five classical Ramsey numbers and recovers or matches existing bounds in various other cases, suggesting a shift from bespoke search methods to a single, adaptable framework for generating combinatorial structures.

Ansh Nagda, Prabhakar Raghavan, Abhradeep Thakurta · Wed, 11 Ma · 🤖 cs.AI
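A lower bound R(s, t) > n is certified by exhibiting a graph on n vertices with no s-clique and no independent set of size t; this is the kind of object such a search must output. The brute-force checker below is a minimal sketch, practical only for small n. It is exercised on the classical Paley-graph constructions for R(3, 3) > 5 and R(4, 4) > 17; it is not AlphaEvolve and does not reproduce the paper's new bounds.

```python
from itertools import combinations


def paley_graph(q):
    """Adjacency sets of the Paley graph on Z_q (q prime, q = 1 mod 4)."""
    residues = {(x * x) % q for x in range(1, q)}
    return {v: {u for u in range(q) if u != v and (u - v) % q in residues}
            for v in range(q)}


def has_clique(adj, k, complement=False):
    """True if the graph (or its complement) contains a k-vertex clique."""
    def linked(a, b):
        # In complement mode, two distinct vertices are linked iff NOT adjacent.
        return (b in adj[a]) != complement
    return any(all(linked(a, b) for a, b in combinations(subset, 2))
               for subset in combinations(adj, k))


def certifies_lower_bound(adj, s, t):
    """A graph on n vertices with no s-clique and no t-independent set shows R(s, t) > n."""
    return not has_clique(adj, s) and not has_clique(adj, t, complement=True)
```

For example, `certifies_lower_bound(paley_graph(17), 4, 4)` returns `True`, matching the known value R(4, 4) = 18.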

ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

ZeroWBC is a novel framework that enables natural, versatile whole-body control for humanoid robots by learning visuomotor policies directly from human egocentric videos, thereby eliminating the need for expensive and time-consuming teleoperation data collection.

Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang, Xuelong Li · Wed, 11 Ma · 🤖 cs.AI

GIAT: A Geologically-Informed Attention Transformer for Lithology Identification

The paper proposes GIAT, a novel Geologically-Informed Attention Transformer that integrates Category-Wise Sequence Correlation filters into the self-attention mechanism to guide lithology identification with geological priors, achieving state-of-the-art accuracy and enhanced interpretability on well log datasets.

Jie Li, Qishun Yang, Nuo Li · Wed, 11 Ma · 🤖 cs.AI
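One generic way to inject a prior into self-attention is to add a bias matrix to the attention logits before the softmax. The sketch below assumes a precomputed `prior[i][j]` score between depth samples i and j and a hypothetical mixing weight `alpha`. In GIAT this role would be played by the Category-Wise Sequence Correlation filters, whose exact form is not given here, so this illustrates additive attention biasing rather than GIAT's architecture.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def biased_attention(q, k, v, prior, alpha=1.0):
    """Scaled dot-product attention with an additive prior on the logits.

    q, k, v: lists of vectors, one per depth sample.
    prior[i][j]: hypothetical prior score for sample i attending to sample j.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + alpha * prior[i][j]
                  for j, kj in enumerate(k)]
        w = softmax(logits)
        # Weighted sum of value vectors, one output channel at a time.
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out
```

When the prior is zero this reduces to ordinary attention; a strong prior can override the learned similarity, which is where the interpretability of a geologically motivated bias would come from.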

Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL

This paper proposes a cost-effective framework that leverages structurally informative but functionally imperfect LLM-generated RTL to train netlist representation models, effectively overcoming data scarcity and outperforming methods reliant on scarce high-quality labeled datasets.

Siyang Cai, Cangyuan Li, Yinhe Han, Ying Wang · Wed, 11 Ma · 🤖 cs.AI

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning

RubiCap introduces a novel reinforcement learning framework that leverages LLM-generated rubrics to create structured, multi-faceted reward signals for dense image captioning, thereby overcoming the limitations of supervised distillation and deterministic checkers to achieve state-of-the-art performance and superior word efficiency across various benchmarks.

Tzu-Heng Huang, Sirajul Salekin, Javier Movellan, Frederic Sala, Manjot Bilkhu · Wed, 11 Ma · 🤖 cs.AI
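A rubric-based reward can be sketched as the weighted fraction of rubric criteria a caption satisfies. The `judge` callable stands in for whatever model grades each criterion. The toy keyword judge, the weights, and the function names are illustrative assumptions rather than RubiCap's actual reward design.

```python
from typing import Callable, Sequence, Tuple

Rubric = Sequence[Tuple[str, float]]  # (criterion text, weight) pairs


def rubric_reward(caption: str, rubric: Rubric, judge: Callable[[str, str], bool]) -> float:
    """Weighted fraction of rubric criteria that `judge` says the caption meets."""
    total = sum(weight for _, weight in rubric)
    if total <= 0:
        return 0.0
    earned = sum(weight for criterion, weight in rubric if judge(caption, criterion))
    return earned / total


def keyword_judge(caption: str, criterion: str) -> bool:
    # Toy stand-in for an LLM grader: the criterion is met if its key phrase appears.
    return criterion.lower() in caption.lower()


rubric = [("red car", 2.0), ("traffic light", 1.0), ("pedestrian", 1.0)]
print(rubric_reward("A red car waits at a traffic light.", rubric, keyword_judge))  # 0.75
```

Unlike a single pass/fail check, the reward gives partial credit per criterion, which is what makes the signal multi-faceted.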

Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning

This paper proposes a Probability of Necessity and Sufficiency (PNS)-based regularization method for Class-Incremental Learning that utilizes a dual-scope counterfactual generator to mitigate feature collisions caused by intra-task shortcut reliance and inter-task semantic confusion, thereby ensuring both the causal completeness and separability of task-specific representations.

Zhen Zhang, Jielei Chu, Tianrui Li · Wed, 11 Ma · 🤖 cs.AI
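The Probability of Necessity and Sufficiency comes from Pearl's counterfactual framework. For a binary cause X and outcome Y, with y_x denoting "Y = y under the intervention do(X = x)" and y' denoting "Y ≠ y", PNS is the probability that Y would occur with X and would not occur without it. It is not identifiable in general but is bounded by interventional probabilities (Tian and Pearl's bounds), and it equals the lower bound when Y is monotone in X. These are the standard definitions; how the paper turns them into a regularizer for feature expansion is not reproduced here.

```latex
\mathrm{PNS} = P\big(y_x,\ y'_{x'}\big)

\max\{0,\ P(y_x) - P(y_{x'})\} \;\le\; \mathrm{PNS} \;\le\; \min\{P(y_x),\ P(y'_{x'})\}

\text{under monotonicity:}\quad \mathrm{PNS} = P(y_x) - P(y_{x'})
```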

QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model

The paper proposes QUSR, a novel diffusion-based image super-resolution model that combines an Uncertainty-Guided Noise Generation module to adaptively perturb high-uncertainty regions and a Quality-Aware Prior leveraging Multimodal Large Language Models to guide restoration, thereby achieving high-fidelity results in real-world scenarios with unknown and non-uniform degradations.

Junjie Yin, Jiaju Li, Hanfa Xing · Wed, 11 Ma · 🤖 cs.AI
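The idea of perturbing high-uncertainty regions more strongly can be sketched by scaling per-pixel Gaussian noise with an uncertainty map in [0, 1]. The linear scaling rule, `sigma_max`, and the function name are assumptions; QUSR's Uncertainty-Guided Noise Generation module is not reproduced here.

```python
import random


def uncertainty_guided_noise(image, uncertainty, sigma_max=0.1, seed=None):
    """Add zero-mean Gaussian noise whose std grows linearly with per-pixel uncertainty.

    image, uncertainty: flat lists of equal length; uncertainty values in [0, 1].
    """
    if len(image) != len(uncertainty):
        raise ValueError("image and uncertainty must have the same length")
    rng = random.Random(seed)
    noisy = []
    for x, u in zip(image, uncertainty):
        u = min(max(u, 0.0), 1.0)  # clamp to the assumed [0, 1] range
        noisy.append(x + rng.gauss(0.0, sigma_max * u))  # std = sigma_max * u
    return noisy
```

Confident pixels (u near 0) pass through almost unchanged, so the diffusion process spends its generative capacity on the regions where the degradation is least certain.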

DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

DexHiL is the first integrated human-in-the-loop framework for dexterous Vision-Language-Action models that combines coordinated arm-hand teleoperation with intervention-aware data sampling to significantly improve post-training performance and reliability in complex manipulation tasks.

Yifan Han, Zhongxi Chen, Yuxuan Zhao, Congsheng Xu, Yanming Shao, Yichuan Peng, Yao Mu, Wenzhao Lian · Wed, 11 Ma · 🤖 cs.AI
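Intervention-aware sampling can be illustrated by upweighting timesteps recorded during or just after a human correction when drawing post-training batches. The weighting scheme and the `boost` and `horizon` parameters are hypothetical; the summary does not describe DexHiL's actual sampling rule.

```python
import random


def intervention_weights(intervened, boost=3.0, horizon=2):
    """Per-step sampling weights.

    intervened: list of bools, True where a human took over at that timestep.
    The intervention step and the following `horizon` steps get weight `boost`;
    all other steps keep weight 1.0.
    """
    weights = [1.0] * len(intervened)
    for t, flag in enumerate(intervened):
        if flag:
            for s in range(t, min(t + horizon + 1, len(intervened))):
                weights[s] = boost
    return weights


def sample_batch(steps, intervened, batch_size, seed=None, **kwargs):
    """Draw a batch (with replacement) biased toward intervention segments."""
    rng = random.Random(seed)
    return rng.choices(steps, weights=intervention_weights(intervened, **kwargs), k=batch_size)
```

Sampling with replacement keeps autonomous rollouts in the mix while making the rarer corrective segments appear more often than their raw frequency.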

PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings

The paper introduces PM-Nav, a novel framework that leverages priori-semantic maps and hierarchical chain-of-thought prompting to overcome the challenges of language-driven navigation in functional buildings with highly similar features, achieving substantial performance improvements over existing methods in both simulation and real-world environments.

Jiang Gao, Xiangyu Dong, Haozhou Li, Haoran Zhao, Yaoming Zhou, Xiaoguang Ma · Wed, 11 Ma · 🤖 cs.AI

VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs

VIVID-Med introduces a novel framework that leverages a frozen large language model as a structured semantic teacher to pretrain lightweight, deployable medical Vision Transformers via a Unified Medical Schema and Structured Prediction Decomposition, achieving state-of-the-art performance across diverse medical imaging tasks with significantly reduced data requirements compared to existing vision-language models.

Xiyao Wang, Xiaoyu Tan, Yang Dai, Yuxuan Fu, Shuo Li, Xihe Qiu · Wed, 11 Ma · 🤖 cs.AI
