cs.CV papers | Gist.Science

Twin Co-Adaptive Dialogue for Progressive Image Generation

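This paper proposes "Twin-Co", a framework that progressively refines image generation and resolves ambiguity through a dialogue agent synchronized with user feedback, improving both user experience and generation quality.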

Jianhui Wang, Yangfan He, Yan Zhong + 12 more | 2026-02-26 | 💻 cs

Identifying Memorization of Diffusion Models through $p$-Laplace Analysis: Estimators, Bounds and Applications

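This paper numerically approximates the $p$-Laplacian operator from the score function learned by a diffusion model, proves theoretical error bounds, and shows that memorization of training data can be identified effectively even without conditioning text.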

Jonathan Brokman, Itay Gershon, Amit Giloni + 4 more | 2026-02-26 | 🔢 math

Transformer-based cardiac substructure segmentation from contrast and non-contrast computed tomography for radiotherapy planning

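This study fine-tunes the pretrained transformer-based SMIT model with balanced curriculum learning and demonstrates that it maintains the accuracy and robustness of cardiac substructure segmentation needed for radiotherapy planning across diverse patients and imaging conditions, while substantially reducing the amount of labeled training data.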

Aneesh Rangnekar, Nikhil Mankuzhy, Jonas Willmann + 5 more | 2026-02-26 | ⚡ eess


cs.CV

Twin Co-Adaptive Dialogue for Progressive Image Generation

Identifying Memorization of Diffusion Models through $p$-Laplace Analysis: Estimators, Bounds and Applications

Transformer-based cardiac substructure segmentation from contrast and non-contrast computed tomography for radiotherapy planning

JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models

Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

Capturing Stable HDR Videos Using a Dual-Camera System

Training-free Mixed-Resolution Latent Upsampling for Spatially Accelerated Diffusion Transformers

PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

LLaDA-MedV: Exploring Large Language Diffusion Models for Biomedical Image Understanding

Lang2Lift: A Language-Guided Autonomous Forklift System for Outdoor Industrial Pallet Handling

Voxel Densification for Serialized 3D Object Detection: Mitigating Sparsity via Pre-serialization Expansion

Variation-aware Vision Token Dropping for Faster Large Vision-Language Models

MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification

JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation

Uncovering Grounding IDs: How External Cues Shape Multimodal Binding

Hallucination Filtering in Radiology Vision-Language Models Using Discrete Semantic Entropy

ImpMIA: Leveraging Implicit Bias for Membership Inference Attack

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Caption-Driven Explainability: Probing CNNs for Bias via CLIP
