SNPgen: Phenotype-Supervised Genotype Representation and Synthetic Data Generation via Latent Diffusion
SNPgen is a two-stage conditional latent diffusion framework that generates privacy-preserving, phenotype-aligned synthetic genotype data, enabling machine learning models trained on synthetic samples to achieve predictive performance comparable to those trained on real data while maintaining strict privacy guarantees and preserving key genetic structures.