cs.AI papers | Gist.Science

Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement

This paper demonstrates that frozen vision-language model features contain rich, continuous geometric information that outperforms text-based outputs by 3.3x, revealing that the accuracy bottleneck stems from training objectives and autoregressive generation rather than representational limitations, as evidenced by high-precision linear probes and consistent performance across diverse encoder architectures.

Yakov Pyotr Shkolnikov · 2026-03-09 · 🤖 cs.AI

PONTE: Personalized Orchestration for Natural Language Trustworthy Explanations

PONTE is a human-in-the-loop framework that enhances the reliability and personalization of Explainable AI narratives by modeling user adaptation as a closed-loop process involving preference modeling, grounded generation, and iterative verification to overcome the limitations of one-size-fits-all approaches and Large Language Model hallucinations.

Vittoria Vineis, Matteo Silvestri, Lorenzo Antonelli, Filippo Betello, Gabriele Tolomei · 2026-03-09 · 🤖 cs.AI

NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches

The paper introduces NOBLE, a pretraining architecture that permanently augments transformer linear layers with learnable nonlinear low-rank branches (specifically using CosNet activation), achieving significant training efficiency and speedups across various models with minimal parameter and time overhead, though its benefits may be hindered by certain stochastic data augmentations.

Ethan Smith (Canva Research) · 2026-03-09 · 🤖 cs.AI

COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics

COLD-Steer is a training-free framework that steers large language models by approximating the representational changes of gradient descent on in-context examples, achieving up to 95% steering effectiveness with 50 times fewer samples than existing methods.

Kartik Sharma, Rakshit S. Trivedi · 2026-03-09 · 🤖 cs.AI

Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education

This paper presents an artificial intelligence system trained on over 45,000 ultrasound images that achieves diagnostic accuracy comparable to senior radiologists for fetal orofacial clefts, significantly enhances junior radiologists' performance when used as a copilot, and accelerates clinical expertise development for rare conditions.

Yuanji Zhang, Yuhao Huang, Haoran Dou, Xiliang Zhu, Chen Ling, Zhong Yang, Lianying Liang, Jiuping Li, Siying Liang, Rui Li, Yan Cao, Yuhan Zhang, Jiewei Lai, Yongsong Zhou, Hongyu Zheng, Xinru Gao, Cheng Yu, Liling Shi, Mengqin Yuan, Honglong Li, Xiaoqiong Huang, Chaoyu Chen, Jialin Zhang, Wenxiong Pan, Alejandro F. Frangi, Guangzhi He, Xin Yang, Yi Xiong, Linliang Yin, Xuedong Deng, Dong Ni · 2026-03-09 · 🤖 cs.AI

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

RAMoEA-QA is a hierarchically routed generative model that employs a two-stage conditional specialization mechanism—combining an Audio Mixture-of-Experts for acoustic encoding and a Language Mixture-of-Adapters for query intent—to achieve state-of-the-art robustness and accuracy in respiratory audio question answering across diverse devices, environments, and task shifts.

Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo · 2026-03-09 · 🤖 cs.AI

LiveSense: A Real-Time Wi-Fi Sensing Platform for Range-Doppler on COTS Laptop

LiveSense is a cross-platform system that transforms commercial off-the-shelf Wi-Fi 6E/7 laptops into real-time, centimeter-accurate Range-Doppler sensors capable of extracting synchronized channel state information and performing on-device signal processing to detect distance, velocity, and micro-motions while maintaining simultaneous communication.

Jessica Sanson, Rahul C. Shah, Maximilian Pinaroc, Cagri Tanriover, Valerio Frascolla · 2026-03-09 · 🤖 cs.AI

Boosting deep Reinforcement Learning using pretraining with Logical Options

This paper proposes Hybrid Hierarchical RL (H^2RL), a two-stage framework that leverages logical option-based pretraining to inject symbolic structure into deep reinforcement learning agents, effectively mitigating reward misalignment and improving long-horizon decision-making while outperforming existing neural, symbolic, and neuro-symbolic baselines.

Zihan Ye, Phil Chau, Raban Emunds, Jannis Blüml, Cedric Derstroff, Quentin Delfosse, Oleg Arenz, Kristian Kersting · 2026-03-09 · 🤖 cs.AI

SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

The paper introduces SUREON, a large-scale surgical video QA dataset derived from academic lecture narrations, and presents two specialized vision-language models that demonstrate superior surgical reasoning capabilities by explicitly interpreting intent, rationale, and future steps in surgical scenes.

Alejandra Perez, Anita Rau, Lee White, Busisiwe Mlambo, Chinedu Nwoye, Muhammad Abdullah Jamal, Omid Mohareri · 2026-03-09 · 🤖 cs.AI

Fly360: Omnidirectional Obstacle Avoidance within Drone View

This paper introduces Fly360, a lightweight two-stage perception-decision pipeline that enables drones with panoramic views to achieve stable, omnidirectional obstacle avoidance by converting RGB observations into depth maps and employing a fixed random-yaw training strategy, outperforming traditional forward-view baselines in both simulation and real-world scenarios.

Xiangkai Zhang, Dizhe Zhang, WenZhuo Cao, Zhaoliang Wan, Yingjie Niu, Lu Qi, Xu Yang, Zhiyong Liu · 2026-03-09 · 🤖 cs.AI

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

The paper proposes BEVLM, a framework that bridges the gap between spatially consistent Bird's-Eye View representations and Large Language Models by distilling semantic knowledge, thereby significantly enhancing both cross-view reasoning accuracy and safety-critical end-to-end driving performance.

Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding · 2026-03-09 · 🤖 cs.AI

Motion Illusions Generated Using Predictive Neural Networks Also Fool Humans

This paper introduces the Evolutionary Illusion GENerator (EIGen), a generative model based on video predictive neural networks that creates new visual motion illusions, which are confirmed to fool human participants. The results support the hypothesis that such illusions arise from the brain's predictive processing rather than raw visual input, and they highlight the value of studying "motivated failures" in AI research.

Lana Sinapayen, Eiji Watanabe · 2026-03-06 · 💻 cs

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

This paper introduces EHRSQL, a practical text-to-SQL benchmark for electronic health records built from real hospital staff queries, which challenges models to handle diverse clinical needs, complex time expressions, and unanswerable questions to bridge the gap between research and healthcare deployment.

Gyubok Lee, Hyeonji Hwang, Seongsu Bae + 6 more · 2026-03-06 · 💻 cs

Deep Learning Meets Mechanism Design: Key Results and Some Novel Applications

This paper reviews the technical details and key results of using deep learning to design mechanisms that approximately satisfy conflicting properties like incentive compatibility and welfare maximization, demonstrating the approach's effectiveness through case studies in vehicular energy management, mobile network resource allocation, and agricultural procurement auctions.

V. Udaya Sankar, Vishisht Srihari Rao, Mayank Ratan Bhardwaj + 1 more · 2026-03-06 · 💻 cs

Seeing Through Uncertainty: A Free-Energy Approach for Real-Time Perceptual Adaptation in Robust Visual Navigation

This paper introduces FEP-Nav, a biologically-inspired framework that enables robust real-time visual navigation by minimizing Variational Free Energy through a dual-mechanism architecture of top-down decoding and adaptive normalization, allowing autonomous agents to maintain performance under noisy and shifting sensory conditions without gradient-based updates.

Maytus Piriyajitakonkij, Rishabh Dev Yadav, Mingfei Sun + 2 more · 2026-03-06 · 💻 cs

Large Language Models are Contrastive Reasoners

This paper introduces Contrastive Prompting, a simple zero-shot method that significantly enhances large language models' reasoning capabilities across arithmetic, commonsense, and symbolic tasks by instructing them to generate both correct and incorrect answers, often outperforming state-of-the-art prompting techniques without requiring hand-crafted examples.

Liang Yao · 2026-03-06 · 💻 cs

Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods

This paper proposes a novel two-phase learning framework that distills privileged information from LKH-generated expert trajectories to enable a non-holonomic vehicle to solve Dubins Traveling Salesman Problems with Neighborhoods approximately 50 times faster than traditional methods while ensuring all task points are visited.

Min Kyu Shin, Su-Jeong Park, Seung-Keol Ryu + 2 more · 2026-03-06 · 💻 cs

Parallel Split Learning with Global Sampling

This paper introduces Parallel Split Learning with Global Sampling (GPSL), a server-driven scheme that fixes the global batch size and uses pooled-level proportions to draw local samples without replacement, thereby eliminating rounding bias, stabilizing optimization under non-IID data, and achieving centralized-like accuracy with negligible overhead.

Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush + 1 more · 2026-03-06 · 💻 cs
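
To make the sampling idea in the GPSL summary concrete, here is a minimal sketch of one way a server could fix the global batch size and derive per-client batch sizes by drawing without replacement from the pooled data. It is an illustration built only from the one-sentence summary, not the authors' implementation: the function name `draw_local_batch_sizes`, the client sizes, and the batch size of 64 are all hypothetical.

```python
import random
from collections import Counter


def draw_local_batch_sizes(client_sizes, global_batch_size, rng):
    """Split a fixed global batch across clients by sampling the pool.

    Hypothetical helper (not from the paper): each pooled sample is tagged
    with its owning client, and `global_batch_size` tags are drawn without
    replacement. The per-client counts always sum to the global batch size,
    so no rounding of fractional proportions is needed.
    """
    if global_batch_size > sum(client_sizes):
        raise ValueError("global batch size exceeds the pooled dataset size")
    # One entry per remaining sample, labeled with its client id.
    pool = [cid for cid, size in enumerate(client_sizes) for _ in range(size)]
    # random.Random.sample draws without replacement.
    drawn = rng.sample(pool, global_batch_size)
    counts = Counter(drawn)
    return [counts[cid] for cid in range(len(client_sizes))]


# Assumed example: three clients with unequal data, global batch of 64.
rng = random.Random(0)
local_sizes = draw_local_batch_sizes([500, 300, 200], 64, rng)
print(local_sizes, sum(local_sizes))
```

On average the counts follow each client's share of the pooled data, while the total stays fixed at the chosen global batch size for every step.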

Why Is Anything Conscious?

This paper proposes a formal framework arguing that consciousness arises naturally in biological systems as a subjective, valence-based interpretation of sensory information driven by the imperative of survival, where qualitative "good or bad" processing precedes neutral property representation and ultimately grounds meaning in the avoidance of death.

Michael Timothy Bennett, Sean Welsh, Anna Ciaunica · 2026-03-06 · 💻 cs

Path Planning for Masked Diffusion Model Sampling

This paper introduces Path Planning (P2), a novel inference sampling strategy for Masked Diffusion Models that decomposes generation into planning and denoising stages to enable iterative token refinement, thereby establishing a new expanded evidence lower bound and achieving state-of-the-art performance across diverse domains including protein sequences, RNA, math, storytelling, and code generation.

Fred Zhangzhi Peng, Zachary Bezemek, Sawan Patel + 5 more · 2026-03-06 · 💻 cs
