cs.CV papers | Gist.Science

Accelerating Text-to-Video Generation with Calibrated Sparse Attention

The paper introduces CalibAtt, a training-free method that accelerates text-to-video generation by identifying and skipping stable, negligible attention connections through an offline calibration process, achieving up to 1.58x speedup while maintaining generation quality across various models.

Shai Yehezkel, Shahar Yadin, Noam Elata + 2 more | 2026-03-06 | 💻 cs

FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning

FaceCam is a novel system that generates high-quality portrait videos with customizable camera trajectories by introducing a scale-aware conditioning representation and specialized data generation strategies, effectively overcoming geometric distortions and visual artifacts common in existing methods without relying on 3D priors.

Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu | 2026-03-06 | 💻 cs

Transformer-Based Inpainting for Real-Time 3D Streaming in Sparse Multi-Camera Setups

This paper proposes a novel, resolution-independent, transformer-based inpainting module that utilizes spatio-temporal embeddings and adaptive patch selection to effectively complete missing textures in real-time 3D streaming from sparse multi-camera setups, achieving superior quality-speed trade-offs compared to existing methods.

Leif Van Holland, Domenic Zingsheim, Mana Takhsha + 4 more | 2026-03-06 | 💻 cs

Volley Revolver: A Novel Matrix-Encoding Method for Privacy-Preserving Neural Networks (Inference)

This paper introduces "Volley Revolver," a novel matrix-encoding method that enables efficient privacy-preserving neural network inference via homomorphic encryption, demonstrated by a convolutional neural network that classifies 32 encrypted MNIST images in approximately 287 seconds on a public cloud while requiring only a single ciphertext upload.

John Chiang | 2026-03-05 | 💻 cs

Schrödinger's Camera: First Steps Towards a Quantum-Based Privacy Preserving Camera

This paper proposes a novel quantum-based privacy-preserving camera system that stores low-resolution imagery in reversible quantum states and utilizes a double deep Q-learning algorithm to dynamically balance privacy and utility before measurement, demonstrating the feasibility of controlling both aspects in simulation.

Hannah Kirkland, Sanjeev J. Koppal | 2026-03-05 | ⚛️ quant-ph

GeoTop: Advancing Image Classification with Geometric-Topological Analysis

GeoTop is a mathematically principled framework that unifies Topological Data Analysis and Lipschitz-Killing Curvatures to resolve the diagnostic ambiguity of topologically equivalent structures by integrating robust topological signatures with precise geometric features, thereby achieving superior accuracy and interpretability in image classification tasks such as skin lesion diagnosis.

Mariem Abaach, Ian Morilla | 2026-03-05 | 🤖 cs.LG

Catch Me If You Can Describe Me: Open-Vocabulary Camouflaged Instance Segmentation with Diffusion

This paper proposes a novel diffusion-based method for Open-Vocabulary Camouflaged Instance Segmentation (OVCIS) that effectively fuses multi-scale textual-visual features to overcome the challenges of blending boundaries and segmenting unseen object classes, demonstrating superior performance on benchmarks with applications in surveillance, wildlife monitoring, and military reconnaissance.

Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo + 4 more | 2026-03-05 | 🤖 cs.AI

Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

This paper presents Export3D, a one-shot 3D-aware portrait animation method that utilizes a contrastive pre-training framework and a tri-plane generator to achieve expression-controllable, view-varying image synthesis while effectively eliminating undesirable appearance swaps during cross-identity expression transfer.

Taekyung Ki, Dongchan Min, Gyeongsu Chae | 2026-03-05 | 🤖 cs.AI

FireANTs: Adaptive Riemannian Optimization for Multi-Scale Diffeomorphic Matching

The paper introduces FireANTs, a training-free, GPU-accelerated multi-scale Adaptive Riemannian Optimization algorithm that achieves significantly faster and more memory-efficient dense diffeomorphic image matching than both traditional methods and deep learning approaches while maintaining robust generalization across diverse modalities and anatomical structures.

Rohit Jena, Pratik Chaudhari, James C. Gee | 2026-03-05 | 💻 cs

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

Merlin is a novel 3D vision-language foundation model trained on a massive dataset of over 15,000 abdominal CT scans and associated clinical data that outperforms existing 2D and specialized models across diverse diagnostic, prognostic, and quality-related tasks while demonstrating strong generalization across multiple institutions.

Louis Blankemeier, Ashwin Kumar, Joseph Paul Cohen + 37 more | 2026-03-05 | 🤖 cs.AI

Natural Adversaries: Fuzzing Autonomous Vehicles with Realistic Roadside Object Placements

This paper introduces TrashFuzz, a black-box fuzzing algorithm that manipulates the realistic placement of common roadside objects to generate adversarial scenarios causing autonomous vehicles to misperceive traffic signals and violate traffic laws, demonstrating significant vulnerabilities in the Apollo system without relying on unnatural adversarial patches.

Yang Sun, Haoyu Wang, Christopher M. Poskitt + 1 more | 2026-03-05 | 💻 cs

FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models

The paper proposes FINE, a novel pre-training method that factorizes diffusion model weights into shared, size-agnostic "learngenes" and layer-specific components, enabling the efficient initialization of variable-sized models without repeated pre-training while achieving state-of-the-art performance across diverse resource-constrained deployments.

Yucheng Xie, Fu Feng, Ruixiao Shi + 4 more | 2026-03-05 | 💻 cs

Scaling Laws For Diffusion Transformers

This paper establishes the first scaling laws for Diffusion Transformers (DiT) by demonstrating a power-law relationship between pretraining loss and compute across a broad range of budgets, enabling accurate predictions of optimal model size, data requirements, and synthesis quality for future large-scale deployments.

Zhengyang Liang, Hao He, Ceyuan Yang + 1 more | 2026-03-05 | 💻 cs

TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control

TextMaster is a unified framework that achieves realistic and controllable text editing by integrating high-resolution glyph information, perceptual loss, and an attention-based layout mechanism to overcome existing limitations in stroke accuracy and style control.

Zhenyu Yan, Jian Wang, Aoqiang Wang + 3 more | 2026-03-05 | 💻 cs

FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation

The paper introduces FlowCLAS, a hybrid framework that enhances normalizing flows for anomaly segmentation by integrating a contrastive loss with outlier exposure to bridge the performance gap between generative and discriminative methods, achieving state-of-the-art results on multiple robotics benchmarks.

Chang Won Lee, Selina Leveugle, Svetlana Stolpner + 4 more | 2026-03-05 | 🤖 cs.LG

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

This paper introduces VideoMindPalace, a framework that structures long-form video understanding into a topologically organized semantic graph based on hand-object interactions, activity zones, and layout mapping, alongside a new benchmark (VMB), to significantly enhance the spatio-temporal coherence and human-aligned reasoning capabilities of Large Vision Language Models.

Zeyi Huang, Yuyang Ji, Xiaofang Wang + 11 more | 2026-03-05 | 💻 cs

DCENWCNet: A Deep CNN Ensemble Network for White Blood Cell Classification with LIME-Based Explainability

The paper proposes DCENWCNet, a novel deep CNN ensemble model that integrates three uniquely configured architectures to achieve state-of-the-art accuracy and robustness in white blood cell classification on the Raabin-WBC dataset, while employing LIME to enhance model interpretability and trust in automated diagnosis.

Sibasish Dhibar | 2026-03-05 | 🤖 cs.AI

Token Adaptation via Side Graph Convolution for Efficient Fine-tuning of 3D Point Cloud Transformers

This paper introduces STAG, a parameter-efficient fine-tuning method for 3D point cloud Transformers that utilizes a parallel graph convolutional side network to significantly reduce computational and memory costs while maintaining classification accuracy, alongside the release of a new comprehensive benchmark, PCC13.

Takahiko Furuya | 2026-03-05 | 💻 cs

A dataset of high-resolution plantar pressures for gait analysis across varying footwear and walking speeds

This paper introduces the UNB StepUP-P150 dataset, a large-scale, high-resolution collection of plantar pressure data from 150 individuals across varying walking speeds and footwear conditions, designed to advance research in biometric gait recognition, biomechanics, and deep learning.

Robyn Larracy, Angkoon Phinyomark, Ala Salehi + 5 more | 2026-03-05 | 🤖 cs.LG

Generative Human Geometry Distribution

This paper proposes a novel two-stage generative framework that encodes human geometry distributions as 2D feature maps within an SMPL domain to achieve high-fidelity, pose-conditioned avatar generation with significantly improved geometry quality compared to existing state-of-the-art methods.

Xiangjun Tang, Biao Zhang, Peter Wonka | 2026-03-05 | 💻 cs
