cs.AI papers | Gist.Science

Iterative Quantum Feature Maps

The paper proposes Iterative Quantum Feature Maps (IQFMs), a hybrid quantum-classical framework that constructs deep architectures by iteratively connecting shallow, noise-resilient quantum feature maps with classically computed weights to mitigate hardware limitations and achieve performance comparable to classical neural networks without optimizing variational quantum parameters.

Nasa Matsumoto, Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima | 2026-03-09 | ⚛️ quant-ph

SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability

SPARC introduces a novel framework that unifies concept representations across diverse AI architectures and modalities by enforcing global sparsity and cross-reconstruction loss, thereby creating a shared latent space that enables direct cross-model interpretability and applications like text-guided localization without manual alignment.

Ali Nasiri-Sarvi, Hassan Rivaz, Mahdi S. Hosseini | 2026-03-09 | 🤖 cs.AI

Bridging MOOCs, Smart Teaching, and AI: A Decade of Evolution Toward a Unified Pedagogy

This paper proposes a unified instructional framework that integrates MOOCs, Smart Teaching, and AI into a coherent, teaching-driven pedagogy, formalizing them as a layered knowledge transformation model to maximize systemic educational potential through structured exposure, adaptive allocation, and efficiency amplification.

Bo Yuan, Jiazi Hu | 2026-03-09 | 🤖 cs.AI

ExDD: Explicit Dual Distribution Learning for Surface Defect Detection via Diffusion Synthesis

The paper introduces ExDD, a novel framework for industrial surface defect detection that overcomes data scarcity and uniform outlier assumptions by explicitly modeling dual feature distributions via parallel memory banks and generating context-aware synthetic defects using latent diffusion models.

Muhammad Aqeel, Federico Leonardi, Francesco Setti | 2026-03-09 | 🤖 cs.AI

A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature

This paper presents a multimodal large language model-based multi-agent system that significantly outperforms existing state-of-the-art methods in automatically extracting structured chemical information from diverse and complex literature graphics, thereby advancing AI-driven chemical research.

Yufan Chen, Ching Ting Leung, Bowen Yu, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao | 2026-03-09 | 🤖 cs.AI

MAP: Mitigating Hallucinations in Large Vision-Language Models with Map-Level Attention Processing

This paper introduces MAP, a training-free decoding method that mitigates hallucinations in Large Vision-Language Models by interpreting hidden states as a 2D semantic map and employing layer-wise criss-cross attention and global-local logit fusion to aggregate widely distributed factual information for improved factual consistency.

Chenxi Li, Yichen Guo, Benfang Qian, Jinhao You, Kai Tang, Yaosong Du, Zonghao Zhang, Xiande Huang | 2026-03-09 | 🤖 cs.AI

VLMQ: Token Saliency-Driven Post-Training Quantization for Vision-language Models

This paper introduces VLMQ, a post-training quantization framework tailored for vision-language models that leverages a gradient-driven importance factor to address visual over-representation and modality gaps, thereby achieving state-of-the-art performance across various model sizes and low-bit settings.

Yufei Xue, Yushi Huang, Jiawei Shao, Lunjie Zhu, Chi Zhang, Xuelong Li, Jun Zhang | 2026-03-09 | 🤖 cs.AI

SGDFuse: SAM-Guided Diffusion Model for High-Fidelity Infrared and Visible Image Fusion

The paper proposes SGDFuse, a novel two-stage conditional diffusion model guided by Segment Anything Model (SAM) semantic masks, which achieves high-fidelity infrared and visible image fusion by leveraging explicit semantic priors to preserve key targets and minimize artifacts for superior downstream task performance.

Xiaoyang Zhang, Jinjiang Li, Guodong Fan, Yakun Ju, Linwei Fan, Jun Liu, Alex C. Kot | 2026-03-09 | 🤖 cs.AI

Handling Infinite Domain Parameters in Planning Through Best-First Search with Delayed Partial Expansions

This paper proposes a best-first search algorithm utilizing delayed partial expansions to explicitly treat control parameters as decision points within infinite domains, offering a complete and competitive alternative to existing constraint-based approaches for automated planning.

Ángel Aso-Mollar, Diego Aineto, Enrico Scala + 1 more | 2026-03-09 | ⚡ eess

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

This paper introduces "Answer-Then-Check," a novel safety alignment method that enhances LLM robustness against jailbreak attacks by training models to generate direct answers internally and then critically evaluate their safety before responding, achieving superior protection with reduced over-refusal while maintaining general reasoning capabilities through the newly constructed 80K-sample ReSA dataset.

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li | 2026-03-09 | 🤖 cs.AI

Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

This paper addresses the inconsistency and structural biases in existing latency metrics for simultaneous speech-to-text translation by introducing a comprehensive meta-evaluation, proposing new metrics (YAAL and LongYAAL) and a resegmentation tool (SoftSegmenter), and implementing these solutions within the OmniSTEval toolkit to enable more reliable system assessments.

Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar | 2026-03-09 | 🤖 cs.AI

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

The paper introduces LikePhys, a training-free evaluation method using likelihood preferences to assess intuitive physics understanding in video diffusion models, demonstrating that current models show improving capabilities in physical reasoning as they scale despite challenges with complex dynamics.

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini | 2026-03-09 | 🤖 cs.AI

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Phys2Real is a real-to-sim-to-real reinforcement learning framework that enhances sim-to-real transfer for precise robotic manipulation by fusing vision-language model-inferred physical parameter priors with online interactive adaptation through uncertainty-aware ensemble estimation.

Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager | 2026-03-09 | 🤖 cs.AI

CanvasMAR: Improving Masked Autoregressive Video Prediction With Canvas

CanvasMAR enhances masked autoregressive video prediction by introducing a global "canvas" prior and a motion-aware curriculum to generate high-fidelity, coherent videos with fewer sampling steps, achieving performance that rivals advanced diffusion-based methods.

Zian Li, Muhan Zhang | 2026-03-09 | 🤖 cs.AI

Just-In-Time Objectives: A General Approach for Specialized AI Interactions

This paper introduces "Just-In-Time Objectives," a framework that passively observes user behavior to infer and rapidly optimize for specific, real-time goals, enabling large language models to generate specialized tools and responses that significantly outperform standard generic interactions.

Michelle S. Lam, Omar Shaikh, Hallie Xu, Alice Guo, Diyi Yang, Jeffrey Heer, James A. Landay, Michael S. Bernstein | 2026-03-09 | 🤖 cs.AI

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

The paper introduces 3DThinker, a novel framework that enables vision-language models to perform 3D spatial reasoning from limited views by aligning their internal representations with a 3D foundation model and refining the reasoning process through outcome-based optimization, all without requiring explicit 3D prior inputs or labeled 3D training data.

Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan, Xiang An, Yan Feng, Peng Pei, Xunliang Cai, Ruqi Huang | 2026-03-09 | 🤖 cs.AI

Automated Coding of Communication Data Using ChatGPT: Consistency Across Subgroups

This study demonstrates that ChatGPT-based coding of communication data performs consistently across gender and racial/ethnic subgroups, matching human rater reliability and validating its potential for large-scale collaborative assessments.

Jiangang Hao, Wenju Cui, Patrick Kyllonen, Emily Kerzabi | 2026-03-09 | 🤖 cs.AI

Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People

This paper introduces the Collaborative Battleship task to evaluate language models' information-seeking abilities and proposes Monte Carlo inference strategies inspired by Bayesian Experimental Design that significantly improve both question asking and answer accuracy, enabling weaker models to outperform humans and frontier models in strategic decision-making tasks.

Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum | 2026-03-09 | 🤖 cs.AI

REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering

This paper introduces REx86, a locally deployable, fine-tuned Qwen2.5-Coder-7B model that significantly enhances x86 assembly reverse engineering by improving code comprehension and accuracy while addressing the privacy and security limitations of cloud-based LLMs.

Darrin Lea, James Ghawaly, Golden Richard + 2 more | 2026-03-09 | 🤖 cs.AI

LA-MARRVEL: A Knowledge-Grounded, Language-Aware LLM Framework for Clinically Robust Rare Disease Gene Prioritization

LA-MARRVEL is a knowledge-grounded, language-aware LLM framework that significantly improves rare disease gene prioritization accuracy by using structured, phenotype-rich prompts to generate clinically robust, ACMG-aligned reasoning without disrupting existing diagnostic pipelines.

Jaeyeon Lee, Lin Yao, Hyun-Hwan Jeong, Zhandong Liu | 2026-03-09 | 🤖 cs.AI
