SC3: The Multi-Solvent Solubility Challenge and Benchmark
This paper introduces SC3, a rigorously curated multi-solvent solubility benchmark with a recalibrated aleatoric limit and advanced evaluation metrics. The benchmark reveals that current state-of-the-art models remain significantly less reliable than previously assumed, and it highlights the critical role of calibrated uncertainty in future improvements.