cs.CV papers | Gist.Science

Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression

The paper proposes ST-Lite, a training-free KV cache compression framework that exploits the uniformly high sparsity of GUI attention patterns through a dual-branch scoring policy combining spatial saliency and trajectory-aware semantic gating, achieving significant decoding acceleration with minimal performance loss in long-horizon GUI agents.

Bowen Zhou, Zhou Xu, Wanli Li + 2 more | 2026-03-03 | 🤖 cs.LG

Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning

This paper proposes LoDA, a task-driven subspace decomposition method for LoRA-based continual learning that enhances knowledge sharing and isolation by decoupling general and task-specific directions through energy-based objectives and gradient-aligned optimization, thereby outperforming existing approaches.

Lingfeng He, De Cheng, Huaijie Wang + 3 more | 2026-03-03 | 🤖 cs.LG

SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models

SKeDA is a generative watermarking framework for text-to-video diffusion models that enhances robustness against frame reordering and temporal distortions through Shuffle-Key-based Distribution-preserving Sampling and Differential Attention, while maintaining high video generation quality.

Yang Yang, Xinze Zou, Zehua Ma + 2 more | 2026-03-03 | 🤖 cs.AI

A Case Study on Concept Induction for Neuron-Level Interpretability in CNN

This case study demonstrates that a concept induction-based framework for interpreting hidden neurons in CNNs, previously validated on ADE20K, successfully generalizes to the SUN2012 scene recognition dataset, confirming its broader applicability.

Moumita Sen Sarma, Samatha Ereshi Akkamahadevi, Pascal Hitzler | 2026-03-03 | 🤖 cs.AI

Stateful Token Reduction for Long-Video Hybrid VLMs

This paper proposes a stateful, progressive token reduction framework with a unified language-aware scoring mechanism for hybrid video VLMs, achieving significant prefilling speedups while maintaining near-baseline accuracy by addressing the layerwise instability of token importance in architectures combining attention and state-space blocks.

Jindong Jiang, Amala Sanjay Deshmukh, Kateryna Chumachenko + 7 more | 2026-03-03 | 🤖 cs.AI

AdURA-Net: Adaptive Uncertainty and Region-Aware Network

This paper proposes AdURA-Net, a geometry-driven deep learning framework that combines adaptive dilated convolutions with a dual-head loss function to effectively handle diagnostic uncertainty and improve reliability in multilabel thoracic disease classification.

Antik Aich Roy, Ujjwal Bhattacharya | 2026-03-03 | 🤖 cs.AI

Optimisation of SOUP-GAN and CSR-GAN for High Resolution MR Images Reconstruction

This research enhances MR image reconstruction by optimizing SOUP-GAN and CSR-GAN models through architectural modifications and hyperparameter tuning, demonstrating that CSR-GAN excels in preserving high-frequency details while SOUP-GAN produces superior structural clarity with reduced noise.

Muneeba Rashid, Hina Shakir, Humaira Mehwish + 2 more | 2026-03-03 | ⚡ eess

Efficient Flow Matching for Sparse-View CT Reconstruction

This paper proposes FMCT and its efficient variant EFMCT, which leverage the deterministic nature of Flow Matching and a velocity field reuse strategy to achieve high-quality, computationally efficient sparse-view CT reconstruction with bounded error and significantly fewer neural network function evaluations compared to diffusion-based methods.

Jiayang Shi, Lincen Yang, Zhong Li + 3 more | 2026-03-03 | ⚡ eess

TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models

The TACIT Benchmark introduces a programmatic visual reasoning evaluation framework featuring 10 tasks across 6 domains with dual-track generative and discriminative assessments, utilizing deterministic computer-vision verification and structurally rigorous distractors to overcome the limitations of existing language-dependent and subjective benchmarks.

Daniel Nobrega Medeiros | 2026-03-03 | 🤖 cs.AI

VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models

The paper proposes VisRef, a computationally efficient test-time scaling framework that improves multi-modal reasoning performance by dynamically re-injecting a diverse, semantically relevant coreset of visual tokens to prevent models from losing focus on image content during extended textual reasoning.

Soumya Suvra Ghosal, Youngeun Kim, Zhuowei Li + 6 more | 2026-03-03 | 🤖 cs.AI

Physical Evaluation of Naturalistic Adversarial Patches for Camera-Based Traffic-Sign Detection

This paper evaluates the effectiveness of Naturalistic Adversarial Patches in physically disrupting traffic-sign detection for autonomous vehicles by introducing the customized CompGTSRB dataset and validating the attack's impact on a YOLOv5 model through systematic experiments on a Quanser QCar testbed.

Brianna D'Urso, Tahmid Hasan Sakib, Syed Rafay Hasan + 1 more | 2026-03-03 | 🤖 cs.AI

Pretty Good Measurement for Radiomics: A Quantum-Inspired Multi-Class Classifier for Lung Cancer Subtyping and Prostate Cancer Risk Stratification

This paper introduces a quantum-inspired multi-class classifier based on the Pretty Good Measurement (PGM) that reformulates classification as quantum state discrimination, demonstrating competitive and often superior performance in radiomics tasks for lung cancer subtyping and prostate cancer risk stratification compared to established classical baselines.

Giuseppe Sergioli, Carlo Cuccu, Giovanni Pasini + 4 more | 2026-03-03 | ⚛️ quant-ph

Scaling Quantum Machine Learning without Tricks: High-Resolution and Diverse Image Generation

This paper presents a novel, end-to-end quantum Wasserstein GAN framework that overcomes previous scaling limitations by utilizing advanced image loading techniques and tailored variational circuit architectures to generate high-resolution, diverse images from full MNIST, Fashion-MNIST, and Street View House Numbers datasets without relying on dimensionality reduction or patch-based tricks.

Jonas Jäger, Florian J. Kiwit, Carlos A. Riofrío | 2026-03-03 | ⚛️ quant-ph

Adversarial Patch Generation for Visual-Infrared Dense Prediction Tasks via Joint Position-Color Optimization

This paper proposes AP-PCO, a joint position-color optimization framework that generates cross-spectral adversarial patches to effectively attack visual-infrared dense prediction systems by simultaneously perturbing both modalities while maintaining stealth through color adaptation.

He Li, Wenyue He, Weihang Kong + 1 more | 2026-03-03 | 💻 cs

Ozone Cues Mitigate Reflected Downwelling Radiance in LWIR Absorption-Based Ranging

This paper introduces quadspectral and hyperspectral passive LWIR ranging methods that use ozone absorption features to estimate and mitigate reflected downwelling radiance, reducing ranging error from over 100 meters to as low as 1.2 meters.

Unay Dorken Gallastegi, Wentao Shangguan, Vaibhav Choudhary + 4 more | 2026-03-03 | ⚡ eess

Seeking Necessary and Sufficient Information from Multimodal Medical Data

This paper proposes a novel multimodal learning framework that decomposes representations into invariant and specific components to derive tractable Probability of Necessity and Sufficiency (PNS) objectives, thereby enhancing predictive performance and robustness to missing modalities in medical data analysis.

Boyu Chen, Weiye Bao, Junjie Liu + 5 more | 2026-03-03 | 💻 cs

Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees

Proof-of-Perception (PoP) is a tool-using multimodal reasoning framework that generates calibrated, stepwise uncertainty via conformal sets to dynamically allocate computational resources, thereby reducing hallucinations and improving accuracy-efficiency trade-offs compared to existing baselines.

Arya Fayyazi, Haleh Akrami | 2026-03-03 | 💻 cs

Diffusion-Based Low-Light Image Enhancement with Color and Luminance Priors

This paper proposes a novel conditional diffusion framework for low-light image enhancement that utilizes a Structured Control Embedding Module (SCEM) to decompose input images into physical priors, achieving state-of-the-art performance and strong generalization across multiple benchmarks without fine-tuning.

Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral | 2026-03-03 | 💻 cs

Percept-Aware Surgical Planning for Visual Cortical Prostheses with Vascular Avoidance

This paper presents a percept-aware surgical planning framework that optimizes electrode placement for cortical visual prostheses by formulating it as a differentiable constrained optimization problem, which simultaneously maximizes perceptual reconstruction fidelity and adheres to critical vascular safety and anatomical feasibility constraints.

Galen Pogoncheff, Alvin Wang, Jacob Granley + 1 more | 2026-03-03 | 💻 cs

Deep Learning-Based Meat Freshness Detection with Segmentation and OOD-Aware Classification

This study presents a deep learning framework for meat freshness detection that combines U-Net-based segmentation with OOD-aware classification, demonstrating that EfficientNet-B0 achieves the highest accuracy (98.10%) on RGB images while supporting practical on-device deployment via TensorFlow Lite.

Hutama Arif Bramantyo, Mukarram Ali Faridi, Rui Chen + 2 more | 2026-03-03 | ⚡ eess
