VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
VIVID-Med is a framework that uses a frozen large language model as a structured semantic teacher to pretrain lightweight, deployable medical Vision Transformers. Through a Unified Medical Schema and a Structured Prediction Decomposition, it achieves state-of-the-art performance across diverse medical imaging tasks while requiring substantially less data than existing vision-language models.
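
To make the setup concrete, here is a minimal sketch of what LLM-supervised structured pretraining could look like: a small ViT-style encoder whose CLS representation feeds one classification head per schema field, trained against per-field labels that a frozen LLM teacher would extract from free-text reports. The schema fields, their class counts, the encoder dimensions, and the `llm_targets` placeholder are all illustrative assumptions, not the paper's actual schema or teacher.

```python
import torch
import torch.nn as nn

# Hypothetical schema: field name -> number of classes (illustrative only).
SCHEMA = {"modality": 5, "anatomy": 12, "finding": 30, "severity": 4}

class LightweightViT(nn.Module):
    """Small ViT-style encoder: patch embedding + transformer encoder."""
    def __init__(self, img_size=224, patch=16, dim=256, depth=6, heads=8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        return self.encoder(x)[:, 0]                         # CLS token as image feature

class StructuredHeads(nn.Module):
    """Structured prediction decomposition: one classifier per schema field."""
    def __init__(self, dim, schema):
        super().__init__()
        self.heads = nn.ModuleDict({k: nn.Linear(dim, n) for k, n in schema.items()})

    def forward(self, feat):
        return {k: head(feat) for k, head in self.heads.items()}

def llm_targets(reports):
    """Stand-in for the frozen-LLM teacher: in the real pipeline this would
    parse free-text reports into per-field labels; here we return random
    class indices purely for illustration."""
    batch = len(reports)
    return {k: torch.randint(0, n, (batch,)) for k, n in SCHEMA.items()}

# One pretraining step: sum of per-field cross-entropy losses.
vit, heads = LightweightViT(), StructuredHeads(256, SCHEMA)
opt = torch.optim.AdamW(list(vit.parameters()) + list(heads.parameters()), lr=1e-4)

images = torch.randn(4, 3, 224, 224)
targets = llm_targets(["report text"] * 4)

opt.zero_grad()
logits = heads(vit(images))
loss = sum(nn.functional.cross_entropy(logits[k], targets[k]) for k in SCHEMA)
loss.backward()
opt.step()
```

The decomposition into per-field heads is what keeps the student lightweight at deployment: only the ViT encoder and the task-relevant heads ship, while the LLM teacher is used once, offline, to produce structured labels.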