cs.CV papers | Gist.Science

Vessel-Aware Deep Learning for OCTA-Based Detection of AMD

This paper proposes a vessel-aware deep learning framework for detecting age-related macular degeneration (AMD) in OCTA images by integrating external multiplicative attention with clinically meaningful vascular biomarkers, specifically tortuosity and dropout maps, to guide the model toward physiologically relevant regions and improve interpretability.

Margalit G. Mitzner, Moinak Bhattacharya, Zhilin Zou, Chao Chen, Prateek Prasanna | 2026-03-10 | 💻 cs

Heterogeneous Decentralized Diffusion Models

This paper introduces an efficient framework for heterogeneous decentralized diffusion models that enables experts to train with mixed objectives (DDPM and Flow Matching) and reduced resource requirements, achieving a 16x decrease in compute and 14x reduction in data compared to prior approaches while improving image quality and diversity.

Zhiying Jiang, Raihan Seraj, Marcos Villagra, Bidhan Roy | 2026-03-10 | 🤖 cs.LG

ButterflyViT: 354× Expert Compression for Edge Vision Transformers

ButterflyViT introduces a geometric parameterization method that treats Mixture of Experts as rotations of a shared quantized substrate, achieving a 354× memory reduction for Vision Transformers on edge devices while maintaining accuracy through spatial smoothness regularization.

Aryan Karmore | 2026-03-10 | 💻 cs

XMACNet: An Explainable Lightweight Attention based CNN with Multi Modal Fusion for Chili Disease Classification

This paper introduces XMACNet, an explainable, lightweight CNN that combines self-attention mechanisms with multi-modal fusion of RGB images and vegetation indices to achieve high-accuracy chili disease classification suitable for edge deployment.

Tapon Kumer Ray, Rajkumar Y, Shalini R, Srigayathri K, Jayashree S, Lokeswari P | 2026-03-10 | 💻 cs

EarthBridge: A Solution for 4th Multi-modal Aerial View Image Challenge Translation Track

This paper introduces EarthBridge, a high-fidelity cross-modal translation framework that combines Diffusion Bridge Implicit Models and Contrastive Unpaired Translation to translate between SAR, EO, and IR aerial imagery; it placed second in the 4th Multi-modal Aerial View Image Challenge.

Zhenyuan Chen, Guanyuan Shen, Feng Zhang | 2026-03-10 | 💻 cs

HiDE: Hierarchical Dictionary-Based Entropy Modeling for Learned Image Compression

The paper proposes HiDE, a hierarchical dictionary-based entropy modeling framework for learned image compression that enhances coding efficiency by decomposing external priors into global and local dictionaries with cascaded retrieval and employing a context-aware parameter estimator to achieve significant BD-rate savings over state-of-the-art methods.

Haoxuan Xiong, Yuanyuan Xu, Kun Zhu, Yiming Wang, Baoliu Ye | 2026-03-10 | 💻 cs

A Hybrid Machine Learning Model for Cerebral Palsy Detection

This paper presents a hybrid machine learning model that combines VGG19, EfficientNet, and ResNet50 for feature extraction with a Bi-LSTM classifier to achieve 98.83% accuracy in the early detection of Cerebral Palsy from MRI images, outperforming several individual pre-trained models.

Karan Kumar Singh, Nikita Gajbhiye, Gouri Sankar Mishra | 2026-03-10 | 💻 cs

Step-Level Visual Grounding Faithfulness Predicts Out-of-Distribution Generalization in Long-Horizon Vision-Language Models

This paper establishes that the quality of a model's step-level visual grounding, quantified by the Step Grounding Rate (SGR), serves as a robust and independent predictor of out-of-distribution generalization in long-horizon vision-language models, outperforming traditional final-answer accuracy metrics.

Md Ashikur Rahman, Md Arifur Rahman, Niamul Hassan Samin, Abdullah Ibne Hanif Arean, Juena Ahmed Noshin | 2026-03-10 | 💻 cs

MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies

This paper introduces MotionBits, a novel concept and learning-free segmentation method that identifies the smallest manipulable rigid bodies through kinematic spatial twist equivalence, outperforming state-of-the-art embodied perception models on the new MoRiBo benchmark and enabling more effective downstream robotic manipulation and reasoning tasks.

Howard H. Qian, Kejia Ren, Yu Xiang, Vicente Ordonez, Kaiyu Hang | 2026-03-10 | 💻 cs

Active View Selection with Perturbed Gaussian Ensemble for Tomographic Reconstruction

This paper introduces Perturbed Gaussian Ensemble, an active view selection framework for sparse-view CT that leverages stochastic density scaling of uncertain Gaussian primitives to identify high-variance projections, thereby significantly improving reconstruction fidelity and reducing geometric artifacts compared to existing methods.

Yulun Wu, Ruyi Zha, Wei Cao, Yingying Li, Yuanhao Cai, Yaoyao Liu | 2026-03-10 | 💻 cs

An Extended Topological Model For High-Contrast Optical Flow

This paper introduces an extended 3-manifold topological model for high-contrast optical flow that resolves the limitations of previous torus-based approaches by identifying that the most significant motion patches are concentrated near binary step-edge circles rather than the torus, thereby offering new insights into the topological and geometric structures underlying visual data inference.

Brad Turow, Jose A. Perea | 2026-03-10 | 🔢 math

ColonSplat: Reconstruction of Peristaltic Motion in Colonoscopy with Dynamic Gaussian Splatting

This paper introduces ColonSplat, a dynamic Gaussian Splatting framework that achieves superior 3D reconstruction of peristaltic colon motion by preserving global geometric consistency, supported by a new synthetic benchmark dataset called DynamicColon and a critical analysis of existing methods' limitations.

Weronika Smolak-Dyżewska, Joanna Kaleta, Diego Dall'Alba, Przemysław Spurek | 2026-03-10 | 💻 cs

IGLU: The Integrated Gaussian Linear Unit Activation Function

This paper introduces IGLU, a novel parametric activation function derived from a scale mixture of GELU gates that utilizes a Cauchy CDF to provide heavy-tailed gradient properties and robustness against vanishing gradients, alongside a computationally efficient rational approximation (IGLU-Approx) that achieves competitive or superior performance across vision and language tasks compared to standard baselines like ReLU and GELU.

Mingi Kang, Zai Yang, Jeova Farias Sales Rocha Neto | 2026-03-10 | 🤖 cs.LG

A prior information informed learning architecture for flying trajectory prediction

This paper proposes a hardware-efficient trajectory prediction framework that integrates environmental priors with a Dual-Transformer-Cascaded (DTC) architecture to accurately predict the landing points of flying objects, such as tennis balls, and outperforms existing methods in complex real-world scenarios.

Xianda Huang, Zidong Han, Ruibo Jin, Zhenyu Wang, Wenyu Li, Xiaoyang Li, Yi Gong | 2026-03-10 | 💻 cs

PICS: Pairwise Image Compositing with Spatial Interactions

The paper introduces PICS, a self-supervised framework that improves pairwise image compositing by employing an Interaction Transformer with mask-guided Mixture-of-Experts and adaptive blending to explicitly model spatial interactions and preserve physical consistency between objects and backgrounds.

Hang Zhou, Xinxin Zuo, Sen Wang, Li Cheng | 2026-03-10 | 💻 cs

OPTED: Open Preprocessed Trachoma Eye Dataset Using Zero-Shot SAM 3 Segmentation

This paper introduces OPTED, an open-source preprocessed trachoma eye dataset derived from 2,832 images using a zero-shot SAM 3 pipeline to automatically extract and standardize regions of interest, thereby addressing the scarcity of high-quality data for automated trachoma classification in Sub-Saharan Africa.

Kibrom Gebremedhin, Hadush Hailu, Bruk Gebregziabher | 2026-03-10 | 💻 cs

Learning From Design Procedure To Generate CAD Programs for Data Augmentation

This paper proposes a novel data augmentation paradigm that leverages Large Language Models to generate diverse, industry-resembling CAD programs by conditioning them on reference surfaces and modeling procedures, thereby addressing the scarcity of complex, spline-based geometric data in existing training sets.

Yan-Ying Chen, Dule Shu, Matthew Hong, Andrew Taber, Jonathan Li, Matthew Klenk | 2026-03-10 | 🤖 cs.LG

PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection

PaQ-DETR is a unified object detection framework that addresses query utilization imbalance by dynamically generating image-specific queries from shared latent patterns and employing a quality-aware one-to-many assignment strategy, resulting in consistent mAP improvements across various DETR backbones.

Zhengjian Kang, Jun Zhuang, Kangtong Mo, Qi Chen, Rui Liu, Ye Zhang | 2026-03-10 | 💻 cs

DLRMamba: Distilling Low-Rank Mamba for Edge Multispectral Fusion Object Detection

The paper proposes DLRMamba, a novel framework for edge-based multispectral object detection that combines a Low-Rank SS2D module to reduce parameter redundancy with a Structure-Aware Distillation strategy to preserve feature fidelity, achieving superior efficiency and accuracy on resource-constrained hardware.

Qianqian Zhang, Leon Tabaro, Ahmed M. Abdelmoniem, Junshe An | 2026-03-10 | 💻 cs

Small Target Detection Based on Mask-Enhanced Attention Fusion of Visible and Infrared Remote Sensing Images

This paper introduces ESM-YOLO+, a lightweight visible-infrared fusion network that employs a Mask-Enhanced Attention Fusion module and training-time Structural Representation enhancement to achieve high-precision small-target detection in complex remote sensing scenes while significantly reducing model complexity compared to baselines.

Qianqian Zhang, Xiaolong Jia, Ahmed M. Abdelmoniem, Li Zhou, Junshe An | 2026-03-10 | 💻 cs
