SGG-R: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation
The paper introduces SGG-R, a structured reasoning framework for end-to-end unbiased Scene Graph Generation. It combines chain-of-thought-guided supervised fine-tuning with relation augmentation and reinforcement learning with a novel dual-granularity reward scheme, improving recall and reducing bias on long-tailed distributions.
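
To make "recall" and "bias on long-tailed distributions" concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes a scene graph can be simplified to a set of (subject, predicate, object) triplets and contrasts plain recall with a per-predicate mean recall, which is one common way to expose bias toward frequent predicates. The function names and the toy data are hypothetical, and real benchmarks additionally match bounding boxes and rank the top-K predictions, which is omitted here.

```python
from collections import defaultdict
from typing import Set, Tuple

# Simplifying assumption: a scene graph is a set of (subject, predicate, object) triplets.
Triplet = Tuple[str, str, str]


def recall(pred: Set[Triplet], gt: Set[Triplet]) -> float:
    """Fraction of ground-truth triplets that also appear in the prediction."""
    if not gt:
        return 0.0
    return len(pred & gt) / len(gt)


def mean_recall(pred: Set[Triplet], gt: Set[Triplet]) -> float:
    """Average of per-predicate recalls, so a rare predicate weighs as much as a frequent one."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for triplet in gt:
        predicate = triplet[1]
        totals[predicate] += 1
        if triplet in pred:
            hits[predicate] += 1
    if not totals:
        return 0.0
    return sum(hits[p] / totals[p] for p in totals) / len(totals)


# Toy data (hypothetical): "on" is frequent, "riding" is rare.
gt = {
    ("man", "on", "horse"),
    ("hat", "on", "man"),
    ("cup", "on", "table"),
    ("man", "riding", "horse"),
}
# A biased prediction that only ever outputs the frequent predicate.
pred = {
    ("man", "on", "horse"),
    ("hat", "on", "man"),
    ("cup", "on", "table"),
}

print(recall(pred, gt))       # 0.75
print(mean_recall(pred, gt))  # 0.5
```

In this toy case the biased prediction still reaches a recall of 0.75, but its mean recall drops to 0.5 because the rare predicate "riding" is never recovered. That gap is the kind of long-tail bias an unbiased method tries to close.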