cs.AI papers | Gist.Science

LaVCa: LLM-assisted Visual Cortex Captioning

The paper proposes LaVCa, a data-driven approach that uses large language models to generate natural-language captions for the images to which individual voxels are selective, yielding more accurate and detailed interpretations of human visual cortex voxel selectivity than existing deep neural network-based methods and revealing fine-grained functional differentiation within the visual cortex.

Takuya Matsuyama, Shinji Nishimoto, Yu Takagi | Tue, 10 Ma | cs.LG

Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

This paper proposes a Clustering-On-Difficulty (COD) framework that groups tasks by their difficulty scaling features to predict downstream LLM performance with high accuracy (1.55% error), effectively addressing challenges like emergent capabilities and inconsistent scaling patterns.

Chengyin Xu, Kaiyuan Chen, Xiao Li, Ke Shen, Chenggang Li | Tue, 10 Ma | cs.LG

Subclass Classification of Gliomas Using MRI Fusion Technique

This study proposes a high-accuracy glioma subclass classification framework that fuses 2D and 3D UNET-segmented multimodal MRI images using weighted averaging and classifies them via a pre-trained ResNet50 model, achieving a 99.25% accuracy rate.

Kiranmayee Janardhan, Christy Bobby Thomas | Tue, 10 Ma | cs

Deep Learning-Based Approach for Automatic 2D and 3D MRI Segmentation of Gliomas

This paper proposes a deep learning-based approach utilizing UNET, Inception, and ResNet architectures to achieve automatic 2D and 3D glioma segmentation on BraTS datasets, demonstrating that a ResNet model effectively balances computational efficiency and spatial accuracy to significantly improve diagnosis with high accuracy and Dice scores.

Kiranmayee Janardhan, Christy Bobby T | Tue, 10 Ma | cs

Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems in Minecraft

This paper proposes a novel dual-thread framework that enables concurrent planning and acting with interruptible execution for LLM-based multi-agent systems, overcoming the limitations of serialized paradigms to enhance real-time responsiveness and adaptability in dynamic environments like Minecraft.

Yaoru Li, Shunyu Liu, Tongya Zheng, Li Sun, Mingli Song | Tue, 10 Ma | cs

Enhancing Alzheimer's Diagnosis: Leveraging Anatomical Landmarks in Graph Convolutional Neural Networks on Tetrahedral Meshes

This paper proposes a novel transformer-based geometric deep learning model that tokenizes tetrahedral meshes with anatomical landmarks to accurately classify Alzheimer's disease and predict brain amyloid positivity in medium-risk individuals, offering a robust alternative to costly and invasive PET scans.

Yanxi Chen, Mohammad Farazi, Zhangsihao Yang, Yonghui Fan, Nicholas Ashton, Eric M Reiman, Yi Su, Yalin Wang | Tue, 10 Ma | cs

ViLAM: Distilling Vision-Language Reasoning into Attention Maps for Social Robot Navigation

ViLAM is a novel method that distills vision-language reasoning from large Vision-Language Models into spatial attention maps to guide socially compliant robot navigation, achieving significant improvements in success rates through real-world validation.

Mohamed Elnoor, Kasun Weerakoon, Gershom Seneviratne, Jing Liang, Vignesh Rajagopal, Dinesh Manocha | Tue, 10 Ma | cs

IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models

The paper proposes IMPACT, a novel motion planning framework that leverages Vision-Language Models to infer environment semantics and generate anisotropic cost maps, enabling a contact-aware A* planner to safely navigate cluttered environments by distinguishing between acceptable and dangerous object contacts.

Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita | Tue, 10 Ma | cs.LG

Engineering Systems for Data Analysis Using Interactive Structured Inductive Programming

The paper introduces iProg, an interactive tool that leverages a structured communication protocol between humans and large language models to decompose scientific data analysis tasks into declarative Data Flow Diagrams and generate corresponding code, thereby achieving significantly faster development, higher code quality, and better performance than traditional Low Code/No Code alternatives.

Shraddha Surana, Ashwin Srinivasan, Michael Bain | Tue, 10 Ma | cs

More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models

This paper reveals that while Large Language Models overrepresent female characters due to fine-tuning, they paradoxically still reinforce traditional occupational gender stereotypes more than real-world labor data, highlighting the need for nuanced bias mitigation strategies.

Evan Chen, Run-Jun Zhan, Yan-Bai Lin, Hung-Hsuan Chen | Tue, 10 Ma | cs.CL

From 2D Alignment to 3D Plausibility: Unifying Heterogeneous 2D Priors and Penetration-Free Diffusion for Occlusion-Robust Two-Hand Reconstruction

This paper proposes a unified framework for occlusion-robust two-hand reconstruction that combines a fusion-alignment encoder to implicitly integrate heterogeneous 2D structural priors from vision foundation models with a penetration-free diffusion model that guides 3D pose generation toward collision-free, kinematically coherent interactions.

Gaoge Han, Yongkang Cheng, Zhe Chen, Shaoli Huang, Tongliang Liu | Tue, 10 Ma | cs

More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty

The paper introduces EDU-PRM, an entropy-driven process reward model that automatically identifies reasoning step boundaries using predictive entropy to eliminate manual annotations, achieving state-of-the-art performance with only 1.5% of the training data while significantly improving accuracy and reducing token usage.

Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Huacong Xu, Yuxian Wang, Wu Ning, Qian Chen, Mofan Peng, Zijie Chen, Peishuo Su, Yitong Li | Tue, 10 Ma | cs.LG

MediTools -- Medical Education Powered by LLMs

This paper introduces MediTools, an AI-powered prototype application that leverages large language models to revolutionize medical education through interactive dermatology case simulations, enhanced literature analysis, and automated medical news summaries, while validating its potential through a survey of medical professionals and students.

Amr Alshatnawi, Remi Sampaleanu, David Liebovitz | Tue, 10 Ma | cs

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

This paper presents a comprehensive review that consolidates fragmented evaluation efforts into a unified taxonomy of approximately 60 benchmarks, surveys AI-agent frameworks and collaboration protocols, and explores real-world applications and future research directions for autonomous AI agents.

Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah | Tue, 10 Ma | cs.LG

SFIBA: Spatial-based Full-target Invisible Backdoor Attacks

The paper proposes SFIBA, a spatial-based full-target invisible backdoor attack that ensures trigger specificity and stealthiness in black-box settings by restricting triggers to local spatial regions and employing a frequency-domain injection method, thereby achieving high attack performance while evading existing defenses.

Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Zhishuai Li, Weifeng Liu | Tue, 10 Ma | cs

Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning

This paper introduces Task 5 of the DCASE 2025 Challenge, a multi-domain Audio Question Answering benchmark designed to evaluate and advance the acoustic reasoning capabilities of audio-language models across diverse scenarios including bioacoustics, temporal soundscapes, and complex real-world clips.

Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro | Tue, 10 Ma | cs.CL

Precision Proactivity: Measuring Cognitive Load in Real-World AI-Assisted Work

This study of financial professionals using AI reveals that while AI-generated content improves task quality, unsolicited proactive interventions significantly increase extraneous cognitive load, particularly for less experienced users, and thereby degrade performance more severely than intrinsic task complexity does.

Brandon Lepine, Juho Kim, Pamela Mishkin, Matthew Beane | Tue, 10 Ma | cs

Ready2Unlearn: A Learning-Time Approach for Preparing Models with Future Unlearning Readiness

This paper introduces Ready2Unlearn, a proactive, model-agnostic training-time optimization approach that leverages meta-learning principles to prepare machine learning models for efficient and principled future unlearning, shifting the focus from reactive post-deployment algorithms to forward-looking readiness.

Hanyu Duan, Yi Yang, Ahmed Abbasi, Kar Yan Tam | Tue, 10 Ma | cs.LG

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

FreeKV is a training-free framework that combines speculative retrieval, fine-grained correction, and hybrid CPU-GPU memory management to significantly accelerate KV cache retrieval for large language models, achieving up to a 13× speedup over state-of-the-art methods while maintaining near-lossless accuracy.

Guangda Liu, Chengwei Li, Zhenyu Ning, Jing Lin, Yiwu Yao, Danning Ke, Minyi Guo, Jieru Zhao | Tue, 10 Ma | cs.LG

MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision

MAS-ZERO is a novel, self-evolved inference-time framework that automatically designs, critiques, and refines multi-agent system configurations for specific tasks without requiring a validation set, achieving significant performance improvements over manual and existing automatic baselines across reasoning, coding, and agentic benchmarks.

Zixuan Ke, Austin Xu, Yifei Ming, Xuan-Phi Nguyen, Ryan Chin, Caiming Xiong, Shafiq Joty | Tue, 10 Ma | cs.LG
