DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
The paper introduces DataChef-32B, a reinforcement-learning-based system that automates the end-to-end generation of optimal data recipes for adapting large language models (LLMs) to specific tasks. Models adapted with its recipes perform comparably to, or better than, those produced by human-curated pipelines and official checkpoints.