Prompt-Based Caption Generation for Single-Tooth Dental Images Using Vision-Language Models
This paper addresses the lack of specialized dental datasets by proposing a framework in which Vision-Language Models (VLMs), steered by guided prompts, generate high-quality, holistic captions for single-tooth RGB images. The resulting captions are intended to enable more comprehensive dental image analysis.
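As a rough illustration of what a guided prompt could look like, the sketch below assembles a structured instruction that asks a model to cover several aspects of a single-tooth image in one caption. The function name `build_guided_prompt`, the prompt wording, and the aspect list are all hypothetical placeholders chosen for illustration; they are not taken from the paper, and the call to an actual Vision-Language Model is deliberately left out.

```python
from typing import Sequence

# Hypothetical aspects, for illustration only. The paper's actual prompt
# content is not stated in this summary.
DEFAULT_ASPECTS = (
    "tooth type",
    "visible surface condition",
    "color",
    "surrounding tissue",
)


def build_guided_prompt(aspects: Sequence[str] = DEFAULT_ASPECTS) -> str:
    """Return a text prompt that guides a captioning model through each aspect."""
    if not aspects:
        raise ValueError("at least one aspect is required")
    lines = [
        "You are shown an RGB image of a single tooth.",
        "Describe it in one coherent caption that covers each of the following:",
    ]
    # Number the aspects so the model can address them one by one.
    lines += [f"{i}. {aspect}" for i, aspect in enumerate(aspects, start=1)]
    lines.append("Only describe what is visible in the image.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_guided_prompt())
```

The resulting string would be passed, together with the image, to whichever Vision-Language Model is in use; enumerating the aspects explicitly is one simple way to push the caption toward a holistic description rather than a single observation.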