cs.CV Papers | Gist.Science

SoFlow: Solution Flow Models for One-Step Generative Modeling

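This paper proposes Solution Flow Models (SoFlow), a new one-step generation framework that addresses the inefficiency of the multi-step generation process in diffusion and flow matching models. SoFlow supports Classifier-Free Guidance without computing Jacobian-vector products (JVPs) and outperforms MeanFlow on ImageNet.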

Tianze Luo, Haotian Yuan, Zhuang Liu | 2026-03-03 | cs.LG

AI-Powered Dermatological Diagnosis: From Interpretable Models to Clinical Implementation A Comprehensive Framework for Accessible and Trustworthy Skin Disease Detection

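This paper develops an interpretable multimodal AI framework that combines family history data with clinical images to improve diagnostic accuracy for skin diseases. It also presents a comprehensive scheme for applying the framework effectively in clinical practice, with prospective clinical validation across diverse healthcare settings planned as future work.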

Satya Narayana Panda, Vaishnavi Kukkala, Spandana Iyer | 2026-03-03 | cs.AI

GeoTeacher: Geometry-Guided Semi-Supervised 3D Object Detection

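This paper introduces GeoTeacher, which aims to improve 3D object detection when labeled data is limited. It proposes a keypoint-based geometric relation supervision module that transfers the teacher model's geometric knowledge, together with a volume-level data augmentation strategy that includes a distance-decay mechanism.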

Jingyu Li, Xiaolong Zhao, Zhe Liu + 2 more | 2026-03-03 | cs

ForCM: Forest Cover Mapping from Multispectral Sentinel-2 Image by Integrating Deep Learning with Object-Based Image Analysis

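This study proposes ForCM, which combines deep learning models (UNet, ResUNet, AttentionUNet, and others) with object-based image analysis (OBIA). Using Sentinel-2 satellite imagery of the Amazon rainforest, it shows that ForCM raises forest cover mapping accuracy to as high as 95.64%, an improvement over conventional OBIA methods.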

Maisha Haque, Israt Jahan Ayshi, Sadaf M. Anis + 8 more | 2026-03-03 | cs.AI

cs.CV

Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization

Aligned explanations in neural networks

TP-Blend: Textual-Prompt Attention Pairing for Precise Object-Style Blending in Diffusion Models

Copy-Trasform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints

Counterfactual Explanations on Robust Perceptual Geodesics

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

When Anomalies Depend on Context: Learning Conditional Compatibility for Anomaly Detection

Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning

Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning

CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Investigating Disability Representations in Text-to-Image Models

RFDM: Residual Flow Diffusion Model for Efficient Causal Video Editing

Single-Slice-to-3D Reconstruction in Medical Imaging and Natural Objects: A Comparative Benchmark with SAM 3D

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation