cs.AI papers | Gist.Science

VLMQ: Token Saliency-Driven Post-Training Quantization for Vision-language Models

This paper introduces VLMQ, a post-training quantization framework tailored for vision-language models that leverages a gradient-driven importance factor to address visual over-representation and modality gaps, thereby achieving state-of-the-art performance across various model sizes and low-bit settings.

Yufei Xue, Yushi Huang, Jiawei Shao, Lunjie Zhu, Chi Zhang, Xuelong Li, Jun Zhang2026-03-09🤖 cs.AI

SGDFuse: SAM-Guided Diffusion Model for High-Fidelity Infrared and Visible Image Fusion

The paper proposes SGDFuse, a novel two-stage conditional diffusion model guided by Segment Anything Model (SAM) semantic masks, which achieves high-fidelity infrared and visible image fusion by leveraging explicit semantic priors to preserve key targets and minimize artifacts for superior downstream task performance.

Xiaoyang Zhang, jinjiang Li, Guodong Fan, Yakun Ju, Linwei Fan, Jun Liu, Alex C. Kot2026-03-09🤖 cs.AI

Handling Infinite Domain Parameters in Planning Through Best-First Search with Delayed Partial Expansions

This paper proposes a best-first search algorithm utilizing delayed partial expansions to explicitly treat control parameters as decision points within infinite domains, offering a complete and competitive alternative to existing constraint-based approaches for automated planning.

Ángel Aso-Mollar, Diego Aineto, Enrico Scala + 1 more2026-03-09⚡ eess

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

This paper introduces "Answer-Then-Check," a novel safety alignment method that enhances LLM robustness against jailbreak attacks by training models to generate direct answers internally and then critically evaluate their safety before responding, achieving superior protection with reduced over-refusal while maintaining general reasoning capabilities through the newly constructed 80K-sample ReSA dataset.

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li2026-03-09🤖 cs.AI

Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

This paper addresses the inconsistency and structural biases in existing latency metrics for simultaneous speech-to-text translation by introducing a comprehensive meta-evaluation, proposing new metrics (YAAL and LongYAAL) and a resegmentation tool (SoftSegmenter), and implementing these solutions within the OmniSTEval toolkit to enable more reliable system assessments.

Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar2026-03-09🤖 cs.AI

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

The paper introduces LikePhys, a training-free evaluation method using likelihood preferences to assess intuitive physics understanding in video diffusion models, demonstrating that current models show improving capabilities in physical reasoning as they scale despite challenges with complex dynamics.

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini2026-03-09🤖 cs.AI

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Phys2Real is a real-to-sim-to-real reinforcement learning framework that enhances sim-to-real transfer for precise robotic manipulation by fusing vision-language model-inferred physical parameter priors with online interactive adaptation through uncertainty-aware ensemble estimation.

Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager2026-03-09🤖 cs.AI

CanvasMAR: Improving Masked Autoregressive Video Prediction With Canvas

CanvasMAR enhances masked autoregressive video prediction by introducing a global "canvas" prior and a motion-aware curriculum to generate high-fidelity, coherent videos with fewer sampling steps, achieving performance that rivals advanced diffusion-based methods.

Zian Li, Muhan Zhang2026-03-09🤖 cs.AI

Just-In-Time Objectives: A General Approach for Specialized AI Interactions

This paper introduces "Just-In-Time Objectives," a framework that passively observes user behavior to infer and rapidly optimize for specific, real-time goals, enabling large language models to generate specialized tools and responses that significantly outperform standard generic interactions.

Michelle S. Lam, Omar Shaikh, Hallie Xu, Alice Guo, Diyi Yang, Jeffrey Heer, James A. Landay, Michael S. Bernstein2026-03-09🤖 cs.AI

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

The paper introduces 3DThinker, a novel framework that enables vision-language models to perform 3D spatial reasoning from limited views by aligning their internal representations with a 3D foundation model and refining the reasoning process through outcome-based optimization, all without requiring explicit 3D prior inputs or labeled 3D training data.

Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan, Xiang An, Yan Feng, Peng Pei, Xunliang Cai, Ruqi Huang2026-03-09🤖 cs.AI

Automated Coding of Communication Data Using ChatGPT: Consistency Across Subgroups

This study demonstrates that ChatGPT-based coding of communication data performs consistently across gender and racial/ethnic subgroups, matching human rater reliability and validating its potential for large-scale collaborative assessments.

Jiangang Hao, Wenju Cui, Patrick Kyllonen, Emily Kerzabi2026-03-09🤖 cs.AI

Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People

This paper introduces the Collaborative Battleship task to evaluate language models' information-seeking abilities and proposes Bayesian Experimental Design-inspired Monte Carlo inference strategies that significantly enhance both question-asking and answer-accuracy, enabling weaker models to outperform humans and frontier models in strategic decision-making tasks.

Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum2026-03-09🤖 cs.AI

REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering

This paper introduces REx86, a locally deployable, fine-tuned Qwen2.5-Coder-7B model that significantly enhances x86 assembly reverse engineering by improving code comprehension and accuracy while addressing the privacy and security limitations of cloud-based LLMs.

Darrin Lea, James Ghawaly, Golden Richard + 2 more2026-03-09🤖 cs.AI

LA-MARRVEL: A Knowledge-Grounded, Language-Aware LLM Framework for Clinically Robust Rare Disease Gene Prioritization

LA-MARRVEL is a knowledge-grounded, language-aware LLM framework that significantly improves rare disease gene prioritization accuracy by using structured, phenotype-rich prompts to generate clinically robust, ACMG-aligned reasoning without disrupting existing diagnostic pipelines.

Jaeyeon Lee, Lin Yao, Hyun-Hwan Jeong, Zhandong Liu2026-03-09🤖 cs.AI

The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models

This paper introduces the Cultural Reference Transformation (CRT) metric to evaluate how diffusion models navigate the tension between memorization and generalization in culturally iconic contexts, revealing that model behavior depends on distinct recognition and realization mechanisms influenced by factors like data frequency, textual uniqueness, and reference popularity.

Maria-Teresa De Rosa Palmini, Eva Cetinic2026-03-09🤖 cs.AI

Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function

This paper introduces Soft Q-based Diffusion Finetuning (SQDF), a novel KL-regularized reinforcement learning method that employs a reparameterized policy gradient of a training-free soft Q-function, enhanced by discount factors, consistency models, and off-policy replay buffers, to effectively align diffusion models with downstream objectives while mitigating reward over-optimization and preserving sample diversity.

Hyeongyu Kang, Jaewoo Lee, Woocheol Shin, Kiyoung Om, Jinkyoo Park2026-03-09🤖 cs.AI

XR-DT: Extended Reality-Enhanced Digital Twin for Safe Motion Planning via Human-Aware Model Predictive Path Integral Control

This paper introduces XR-DT, an Extended Reality-enhanced Digital Twin framework that integrates a novel Human-Aware Model Predictive Path Integral (HA-MPPI) controller with an attention-based trajectory prediction model to enable safe, efficient, and interpretable motion planning for mobile robots operating alongside humans.

Tianyi Wang, Jiseop Byeon, Ahmad Yehia, Yiming Xu, Jihyung Park, Tianyi Zeng, Sikai Chen, Ziran Wang, Junfeng Jiao, Christian Claudel2026-03-09🤖 cs.AI

Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity

This paper proposes a novel training framework that leverages the $\alpha$ -divergence family to explicitly filter incorrect answers and control the precision-diversity trade-off, thereby overcoming the diversity loss inherent in standard Reinforcement Learning and achieving state-of-the-art performance on the Lean theorem-proving benchmark.

Germán Kruszewski, Pierre Erbacher, Jos Rozen, Marc Dymetman2026-03-09🤖 cs.AI

Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation

This paper proposes a point cloud-based framework for event-driven human pose estimation that leverages spatiotemporal properties through novel temporal slicing and sequencing modules alongside an edge-enhanced representation, achieving improved accuracy and efficiency on the DHP19 dataset without converting event streams into dense frames.

Haoxian Zhou, Chuanzhi Xu, Langyi Chen, Pengfei Ye, Haodong Chen, Yuk Ying Chung, Qiang Qu2026-03-09🤖 cs.AI

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation

The paper introduces A-3PO, a method that accelerates asynchronous LLM training by 1.8x by approximating the computationally expensive proximal policy in Decoupled PPO through simple interpolation, thereby eliminating the need for extra forward passes while maintaining comparable performance.

Xiaocan Li, Shiliang Wu, Zheng Shen2026-03-09🤖 cs.AI

← Previous Next →