cs.CV papers | Gist.Science

Modelling and Simulation of Neuromorphic Datasets for Anomaly Detection in Computer Vision

To address the scarcity of Dynamic Vision Sensor data, this paper introduces ANTShapes, a Unity-based simulation framework that generates customizable neuromorphic datasets with statistically labeled anomalies for training event-based computer vision models.

Mike Middleton, Teymoor Ali, Hakan Kayan + 6 more | 2026-03-02 | 🤖 cs.LG

All in One: Unifying Deepfake Detection, Tampering Localization, and Source Tracing with a Robust Landmark-Identity Watermark

This paper proposes a unified proactive forensics framework that employs an innovative 152-dimensional landmark-identity watermark (LIDMark) and a Factorized-Head Decoder to simultaneously achieve robust deepfake detection, tampering localization, and source tracing within a single, imperceptible system.

Junjiang Wu, Liejun Wang, Zhiqing Guo | 2026-03-02 | 💻 cs

Few-Shot Continual Learning for 3D Brain MRI with Frozen Foundation Models

This paper proposes a few-shot continual learning framework for 3D brain MRI that combines a frozen foundation model with task-specific Low-Rank Adaptation (LoRA) modules to effectively handle sequential tumor segmentation and brain age estimation tasks without catastrophic forgetting, while using fewer than 0.1% trainable parameters per task.

Chi-Sheng Chen, Xinyu Zhang, Guan-Ying Chen + 3 more | 2026-03-02 | ⚡ eess

Automated Dose-Based Anatomic Region Classification of Radiotherapy Treatment for Big Data Applications

This paper presents a scalable, automated deep-learning software solution that accurately classifies radiotherapy treatment sites into six anatomic regions by analyzing dose-volume overlaps with segmented organs, thereby overcoming metadata inconsistencies to enable reliable curation of large-scale, multi-institutional radiotherapy datasets.

Justin Hink, Yasin Abdulkadir, Jack Neylon + 1 more | 2026-03-02 | 🔬 physics

LE-NeuS: Latency-Efficient Neuro-Symbolic Video Understanding via Adaptive Temporal Verification

LE-NeuS is a latency-efficient neuro-symbolic framework for long-form video question answering that cuts inference latency overhead from 90x to roughly 10x that of base VLMs while preserving accuracy gains, using CLIP-guided adaptive frame sampling and batched proposition detection.

Shawn Liang, Sahil Shah, Chengwei Zhou + 5 more | 2026-03-02 | 💻 cs

No Calibration, No Depth, No Problem: Cross-Sensor View Synthesis with 3D Consistency

This paper introduces a calibration-free cross-sensor view synthesis framework that leverages a match-densify-consolidate pipeline and 3D Gaussian Splatting to generate aligned RGB-X data without requiring expensive sensor calibration or 3D priors for the non-RGB modality.

Cho-Ying Wu, Zixun Huang, Xinyu Huang + 1 more | 2026-03-02 | 💻 cs

Evidential Neural Radiance Fields

This paper introduces Evidential Neural Radiance Fields, a probabilistic framework that enables the simultaneous quantification of both aleatoric and epistemic uncertainty in 3D scene modeling through a single forward pass without compromising rendering quality or incurring significant computational overhead.

Ruxiao Duan, Alex Wong | 2026-03-02 | 🤖 cs.AI

CycleBEV: Regularizing View Transformation Networks via View Cycle Consistency for Bird's-Eye-View Semantic Segmentation

CycleBEV is a training-only regularization framework that enhances Bird's-Eye-View semantic segmentation by introducing an inverse view transformation network to enforce cycle consistency between perspective and BEV spaces, thereby improving geometric and semantic feature learning without increasing inference complexity.

Jeongbin Hong, Dooseop Choi, Taeg-Hyun An + 2 more | 2026-03-02 | 🤖 cs.AI

Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning

This paper introduces HDFLIM, a framework that achieves efficient image captioning by aligning frozen vision and language models through hyperdimensional computing operations like binding and bundling, thereby eliminating the need for computationally intensive multimodal fine-tuning while maintaining performance comparable to end-to-end training methods.

Abhishek Dalvi, Vasant Honavar | 2026-03-02 | 🤖 cs.AI
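
The binding and bundling operations named in the HDFLIM summary can be illustrated with a minimal, generic hyperdimensional computing sketch. This is not the paper's implementation: the bipolar encoding, the dimensionality `DIM`, and the names `img_a`, `word_cat`, and so on are assumptions chosen for illustration only.

```python
import random

DIM = 10_000  # assumed hypervector dimensionality (illustrative)

def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Binding: elementwise product; the result is dissimilar to both inputs."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Bundling: elementwise majority vote; the result stays similar to each input."""
    sums = [sum(column) for column in zip(*vectors)]
    return [1 if s >= 0 else -1 for s in sums]

def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / DIM

rng = random.Random(0)
# Hypothetical stand-ins for encoded image features and caption words.
img_a, img_b, img_c = (random_hv(rng) for _ in range(3))
word_cat, word_dog, word_bird = (random_hv(rng) for _ in range(3))

# Store three image-word associations in a single memory vector.
memory = bundle(bind(img_a, word_cat), bind(img_b, word_dog), bind(img_c, word_bird))

# Binding is its own inverse for bipolar vectors, so binding the memory
# with an image key yields a noisy copy of the associated word.
recovered = bind(memory, img_a)
sim_cat = similarity(recovered, word_cat)
sim_dog = similarity(recovered, word_dog)
```

In this toy setup `sim_cat` comes out well above `sim_dog`, which stays near zero, so the associated word can be picked by nearest-neighbour lookup without any gradient-based training.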

Incremental dimension reduction for efficient and accurate visual anomaly detection

This paper proposes an incremental dimension reduction algorithm that processes image features in batches to update truncated singular value decomposition, thereby enabling efficient and accurate visual anomaly detection on large-scale datasets with minimal memory overhead.

Teng-Yok Lee | 2026-03-02 | 💻 cs
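
The idea of updating a truncated singular value decomposition batch by batch can be sketched as follows. This is a common textbook-style update, not necessarily the algorithm proposed in the paper; the function name `update_truncated_svd`, the row-wise feature layout, and the synthetic data are assumptions.

```python
import numpy as np

def update_truncated_svd(S, Vt, batch, k):
    """Fold a new batch of row features into a rank-k factorization.

    S (k,) and Vt (k, d) summarize all rows seen so far, in the sense that
    X.T @ X is approximated by Vt.T @ diag(S**2) @ Vt.
    """
    if S is None:
        stacked = batch
    else:
        # The k scaled right singular vectors stand in for all earlier rows.
        stacked = np.vstack([S[:, None] * Vt, batch])
    _, S_new, Vt_new = np.linalg.svd(stacked, full_matrices=False)
    return S_new[:k], Vt_new[:k]

# Synthetic rank-3 features: 200 samples, 16 dimensions (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 16))

k = 3
S, Vt = None, None
for start in range(0, X.shape[0], 50):
    S, Vt = update_truncated_svd(S, Vt, X[start:start + 50], k)

# Reference: singular values from a single SVD of the full matrix.
S_full = np.linalg.svd(X, compute_uv=False)[:k]
```

Only a `(k + batch_size) x d` matrix is held in memory at each step. When the data has rank at most `k`, as in this example, the incremental singular values match the full decomposition; for higher-rank data the truncation makes the result approximate.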

Extended Reality (XR): The Next Frontier in Education

This article examines how Extended Reality (XR) transforms education through immersive learning while addressing the significant barriers of cost, technical complexity, and ethical data concerns that must be overcome to balance innovation with accessibility and security.

Shadeeb Hossain | 2026-03-02 | 💻 cs

Egocentric Visibility-Aware Human Pose Estimation

This paper addresses the challenge of keypoint invisibility in egocentric human pose estimation by introducing the large-scale, visibility-annotated Eva-3M dataset and the novel EvaPose method, which leverages explicit visibility information to achieve state-of-the-art performance.

Peng Dai, Yu Zhang, Yiqiang Feng + 2 more | 2026-03-02 | 💻 cs

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

This paper introduces DLEBench, the first benchmark designed to evaluate Instruction-based Image Editing Models on small-scale object editing through a challenging dataset of 1,889 samples and a refined dual-mode evaluation protocol, revealing significant performance gaps in current models.

Shibo Hong, Boxian Ai, Jun Kuang + 5 more | 2026-03-02 | 🤖 cs.AI

BuildAnyPoint: 3D Building Structured Abstraction from Diverse Point Clouds

BuildAnyPoint is a novel generative framework that leverages a Loosely Cascaded Diffusion Transformer and autoregressive mesh generation to reconstruct structured 3D building abstractions from diverse and sparse point clouds, achieving superior surface accuracy and distribution uniformity compared to prior methods.

Tongyan Hua, Haoran Gong, Yuan Liu + 3 more | 2026-03-02 | 💻 cs

Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering

This paper introduces Semantically Decoupled Latent Steering (SDLS), a training-free inference-time framework that utilizes LLM-driven semantic decomposition and QR-based orthogonalization to generate intervention vectors that specifically suppress prior-comparison hallucinations in radiology report generation while preserving clinical accuracy.

Ao Li, Rui Liu, Mingjie Li + 6 more | 2026-03-02 | 💻 cs
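
As a rough illustration of what QR-based orthogonalization of a steering vector can look like, the sketch below removes from a candidate direction every component that lies in a set of directions to be left untouched. How SDLS actually builds its intervention vectors is not described in the summary; the function `orthogonalize_against`, the `protected` matrix, and the random data are hypothetical.

```python
import numpy as np

def orthogonalize_against(target, protected):
    """Return a unit vector: `target` with its component in span(protected rows) removed.

    Assumes the rows of `protected` are linearly independent.
    """
    # Reduced QR: the columns of Q form an orthonormal basis of the protected span.
    Q, _ = np.linalg.qr(protected.T)
    residual = target - Q @ (Q.T @ target)
    return residual / np.linalg.norm(residual)

rng = np.random.default_rng(0)
protected = rng.normal(size=(3, 32))  # hypothetical directions to preserve
target = rng.normal(size=32)          # hypothetical raw steering direction

steer = orthogonalize_against(target, protected)
leakage = protected @ steer           # projections onto the protected directions
```

Adding a multiple of `steer` to a hidden state then leaves its projections onto the protected directions unchanged, which is the property a decoupled intervention needs.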

Vision-Language Semantic Grounding for Multi-Domain Crop-Weed Segmentation

The paper proposes Vision-Language Weed Segmentation (VL-WS), a novel framework that leverages vision-language alignment and a unified multi-domain training corpus to achieve superior generalization and data efficiency in fine-grained crop-weed segmentation across diverse agricultural environments.

Nazia Hossain, Xintong Jiang, Yu Tian + 3 more | 2026-03-02 | 💻 cs

Any Model, Any Place, Any Time: Get Remote Sensing Foundation Model Embeddings On Demand

To address the challenges of heterogeneity in remote sensing foundation models, this paper introduces rs-embed, a Python library that enables users to retrieve embeddings from any supported model for any location and time range through a unified, single-line interface.

Dingqi Ye, Daniel Kiv, Wei Hu + 2 more | 2026-03-02 | 🤖 cs.LG

HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit

HiDrop is a novel framework that significantly accelerates Multimodal Large Language Models (MLLMs) by aligning token pruning with hierarchical layer functions through Late Injection, Concave Pyramid Pruning, and Early Exit mechanisms, achieving a 90% reduction in visual tokens with a 1.72x training speedup while maintaining original performance.

Hao Wu, Yingqi Fan, Jinyang Dai + 3 more | 2026-03-02 | 💬 cs.CL

A Reliable Indoor Navigation System for Humans Using AR-based Technique

This paper proposes a reliable indoor navigation system for humans that integrates Vuforia Area Target for environment modeling, AI NavMesh for pathfinding, and the A* algorithm to deliver faster, more accurate, and intuitive real-time guidance compared to traditional signage and GPS-based methods.

Vijay U. Rathod, Manav S. Sharma, Shambhavi Verma + 3 more | 2026-03-02 | 💻 cs
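
The A* search mentioned in the summary can be sketched on a simple occupancy grid. The paper runs pathfinding on a Unity NavMesh; the grid representation, 4-connected moves, and Manhattan heuristic below are simplifying assumptions made for illustration.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (blocked) cells."""
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start, then reverse.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None  # goal unreachable

# A wall with a single gap at column 2 separates start and goal.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
blocked = astar([[0, 1, 0]], (0, 0), (0, 2))
```

Here `path` routes through the gap at `(1, 2)`, and `blocked` is `None` because the wall leaves no route. On a NavMesh the same search would run over polygon adjacency rather than grid cells.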

EgoGraph: Temporal Knowledge Graph for Egocentric Video Understanding

EgoGraph is a training-free, dynamic knowledge graph framework for ultra-long egocentric video understanding that combines a unified schema with temporal relational modeling to capture long-term cross-entity dependencies, thereby overcoming the limitations of existing methods and achieving state-of-the-art performance on long-term video question answering benchmarks.

Shitong Sun, Ke Han, Yukai Huang + 2 more | 2026-03-02 | 💻 cs
