cs.AI papers | Gist.Science

Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations

This paper proposes a transformer-based framework for skin cancer case retrieval that effectively combines reference images and textual descriptors by learning hierarchical representations and performing joint global-local alignment, thereby achieving state-of-the-art performance on the Derm7pt dataset to support clinical decision-making.

Yuheng Wang, Yuji Lin, Dongrun Zhu, Jiayue Cai, Sunil Kalia, Harvey Lui, Chunqi Chang, Z. Jane Wang, Tim K. Lee | Wed, 11 Ma | cs.AI

Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

This paper proposes a unified taxonomy and evaluation framework for latent world models in automated driving, organizing design choices by latent representations and structural priors while identifying key internal mechanics and research directions to enhance robustness, generalization, and deployability.

Rongxiang Zeng, Yongqi Dong | Wed, 11 Ma | cs.AI

Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting

This study demonstrates that integrating sentiment scores derived from a finetuned Qwen3 model analyzing English and Chinese news significantly enhances aluminum price forecasting accuracy and economic utility, particularly during periods of high market volatility, compared to traditional tabular data models.

Alvaro Paredes Amorin, Andre Python, Christoph Weisser | Wed, 11 Ma | cs.AI

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

GST-VLA introduces a novel framework that enhances Vision-Language-Action models by converting visual observations into anisotropic 3D Gaussian spatial tokens and employing 3D Depth-Aware Chain-of-Thought reasoning to achieve state-of-the-art performance on precision-demanding robotic manipulation tasks.

Md Selim Sarowar, Omer Tariq, Sungho Kim | Wed, 11 Ma | cs.AI

A Text-Native Interface for Generative Video Authoring

This paper introduces Doki, a text-native interface that enables users of varying expertise to author generative videos by defining assets, scenes, and edits directly within a freeform text document, thereby shifting video creation from specialized tools to a natural writing process.

Xingyu Bruce Liu, Mira Dontcheva, Dingzeyu Li | Wed, 11 Ma | cs.AI

Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation

The paper proposes Sim2Act, a robust simulation-to-decision framework that enhances policy reliability in mission-critical domains by combining an adversarial calibration mechanism to align simulation fidelity with decision impact and a group-relative perturbation strategy to stabilize learning without overly conservative constraints.

Hongyu Cao, Jinghan Zhang, Kunpeng Liu, Dongjie Wang, Feng Xia, Haifeng Chen, Xiaohua Hu, Yanjie Fu | Wed, 11 Ma | cs.AI

WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion

This paper introduces WS-Net, a deep unmixing framework that combines state-space modeling, wavelet-fused encoding, and a specialized weak signal attention mechanism to effectively recover weak spectral signals and significantly improve abundance estimation accuracy in hyperspectral images under low signal-to-noise conditions.

Zekun Long, Ali Zia, Guanyiman Fu, Vivien Rolland, Jun Zhou | Wed, 11 Ma | cs.AI

PlayWorld: Learning Robot World Models from Autonomous Play

PlayWorld introduces a fully autonomous pipeline that trains high-fidelity, physically consistent video world models from unsupervised robot self-play, outperforming human-collected data in predicting complex interactions and significantly boosting real-world reinforcement learning success rates.

Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar | Wed, 11 Ma | cs.AI

Automating Detection and Root-Cause Analysis of Flaky Tests in Quantum Software

This paper presents an automated pipeline that uses Large Language Models to detect and diagnose flaky tests in quantum software, expanding an existing dataset by 54% and demonstrating that models like Google Gemini can reach F1-scores of up to 0.9643 in classifying flakiness and identifying root causes.

Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang | Wed, 11 Ma | cs.AI

The Missing Memory Hierarchy: Demand Paging for LLM Context Windows

This paper introduces Pichay, a demand paging system that treats LLM context windows as a memory hierarchy rather than a static cache, successfully reducing context consumption by up to 93% in production by evicting stale content and dynamically reloading it only when needed.

Tony Mason | Wed, 11 Ma | cs.AI

AI Phenomenology for Understanding Human-AI Experiences Across Eras

This paper proposes "AI phenomenology" as a research framework that prioritizes users' first-person lived experiences over traditional performance metrics to better understand and guide the bidirectional alignment between humans and AI systems, offering a set of methodological tools, design concepts, and a research agenda derived from three empirical studies.

Bhada Yun, Evgenia Taranova, Dana Feng, Renn Su, April Yi Wang | Wed, 11 Ma | cs.AI

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

This paper introduces CMA-ES-IG, an algorithm that enhances robot preference learning by generating perceptually distinct and informative queries, thereby improving scalability, robustness, and user experience compared to existing state-of-the-art methods.

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Mataric | Wed, 11 Ma | cs.AI

Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis

This paper analyzes gender bias in audio deepfake detection using the ASVspoof 5 dataset and a ResNet-18 classifier, demonstrating that while aggregate metrics like Equal Error Rate may suggest low disparity, fairness-aware evaluation reveals significant gender-specific error distributions that necessitate more equitable and robust detection systems.

Aishwarya Fursule, Shruti Kshirsagar, Anderson R. Avila | Wed, 11 Ma | cs.AI

Security Considerations for Multi-agent Systems

This study systematically characterizes the unique threat landscape of multi-agent AI systems and empirically evaluates 16 security frameworks, revealing that none achieves majority coverage of the identified risks, with Non-Determinism and Data Leakage being the most under-addressed domains.

Tam Nguyen, Moses Ndebugre, Dheeraj Arremsetty | Wed, 11 Ma | cs.AI

Arbiter: Detecting Interference in LLM Agent System Prompts

This paper introduces Arbiter, a framework that combines formal rules with multi-model LLM analysis to detect interference patterns in coding agent system prompts, revealing that prompt architecture influences failure types and that multi-model evaluation uncovers distinct vulnerabilities missed by single-model approaches.

Tony Mason | Wed, 11 Ma | cs.AI

Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds

This paper introduces Semantic Level of Detail (SLoD), a framework that utilizes heat kernel diffusion on hyperbolic manifolds to enable continuous, principled control over knowledge abstraction levels in AI memory systems, automatically detecting emergent semantic boundaries in both synthetic and real-world knowledge graphs without manual supervision.

Edward Izgorodin | Wed, 11 Ma | cs.AI

Automated Tensor-Relational Decomposition for Large-Scale Sparse Tensor Computation

This paper introduces EinSum, a tensor-relational extension of Einstein Summation Notation that automatically rewrites computations to leverage efficient numerical kernels for dense operations while utilizing relational systems to manage large-scale sparsity.

Yuxin Tang, Zhiyuan Xin, Zhimin Ding, Xinyu Yao, Daniel Bourgeois, Tirthak Patel, Chris Jermaine | Wed, 11 Ma | cs.AI

BiCLIP: Domain Canonicalization via Structured Geometric Transformation

The paper introduces BiCLIP, a simple and parameter-efficient framework that achieves state-of-the-art few-shot domain adaptation for vision-language models by applying a structured geometric transformation to align multimodal features across disparate domains using a small set of anchor samples.

Pranav Mantini, Shishir K. Shah | Wed, 11 Ma | cs.AI

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

The paper introduces VoxEmo, a comprehensive benchmark and toolkit for evaluating Speech Large Language Models on speech emotion recognition across 35 corpora and 15 languages, featuring a distribution-aware soft-label protocol that reveals how these models uniquely align with human subjective emotion distributions despite trailing supervised baselines in hard-label accuracy.

Hezhao Zhang, Huang-Cheng Chou, Shrikanth Narayanan, Thomas Hain | Wed, 11 Ma | cs.AI

PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration

PathoScribe is a unified retrieval-augmented large language model framework that transforms static pathology archives into an active, reasoning-enabled clinical intelligence platform, enabling natural language case retrieval, automated cohort construction, and real-time diagnostic support with high accuracy and efficiency.

Abdul Rehman Akbar, Samuel Wales-McGrath, Alejadro Levya, Lina Gokhale, Rajendra Singh, Wei Chen, Anil Parwani, Muhammad Khalid Khan Niazi | Wed, 11 Ma | cs.AI
