Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

This paper introduces MicroCoder, a high-quality dataset of curated, recent, and challenging competitive programming problems built with a four-stage framework that includes automatic difficulty filtering; reinforcement learning on it significantly boosts coding-model performance on unseen hard tasks compared to existing baselines.

Zongqian Li, Tengchao Lv, Shaohan Huang, Yixuan Su, Qinzheng Sun, Qiufeng Yin, Ying Xin, Scarlett Li, Lei Cui, Nigel Collier, Furu Wei

Published Tue, 10 Ma

Imagine you are trying to teach a brilliant but inexperienced apprentice how to solve complex puzzles. You have two choices for your training manual:

  1. The "Easy Mode" Book: A massive library containing thousands of puzzles, but 80% of them are simple riddles like "What is 2+2?" or "Name a fruit." The apprentice gets bored, thinks they are a genius, but fails miserably when faced with a real, tricky challenge.
  2. The "Hard Mode" Bootcamp: A much smaller, carefully curated book containing only the toughest, most recent, and most interesting puzzles. The apprentice struggles at first, makes many mistakes, but eventually learns to think deeply and solve problems they never thought possible.

This paper is about building that "Hard Mode" Bootcamp for AI coding models.

Here is the story of how the researchers at Microsoft and Cambridge built MicroCoder, a new dataset designed to make AI coders significantly smarter.

The Problem: Too Much "Fluff"

For a long time, the datasets used to train AI coders were like that "Easy Mode" book. They were huge, but they had three big issues:

  • Too Easy: They were filled with simple problems that didn't challenge the AI.
  • Outdated: They used old problems that the AI had already seen during its initial training, so it was just "memorizing" answers rather than learning.
  • Messy: The instructions were inconsistent (some asked for code in one format, others in another), and many problems had broken test cases (like a math problem with a missing number).
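The three failure modes above can be sketched as a simple audit pass over a problem pool. This is a minimal illustration, not the paper's pipeline; the record fields (`difficulty`, `published`, `tests`) and the cutoff date are hypothetical assumptions:

```python
from datetime import date

# Hypothetical cutoff: keep only problems newer than the model's pretraining data.
CUTOFF = date(2024, 1, 1)

def audit(problem: dict) -> list[str]:
    """Flag the three failure modes: too easy, outdated, or broken tests."""
    flags = []
    if problem["difficulty"] <= 1:            # too easy to teach anything
        flags.append("too_easy")
    if problem["published"] < CUTOFF:         # likely memorized during pretraining
        flags.append("stale")
    if not problem["tests"] or any(t["expected"] is None for t in problem["tests"]):
        flags.append("broken_tests")          # missing or incomplete test cases
    return flags

pool = [
    {"difficulty": 1, "published": date(2019, 5, 2), "tests": []},
    {"difficulty": 4, "published": date(2024, 6, 1),
     "tests": [{"input": "3 4", "expected": "7"}]},
]
clean = [p for p in pool if not audit(p)]  # only the fresh, hard, well-tested problem survives
```

In a real pipeline each check would be far more involved (contamination detection, test-case execution), but the shape of the filter is the same.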

The Solution: The "Four-Stage Filter"

The researchers didn't just grab a bunch of problems; they built a sophisticated assembly line to process them. Think of it as a high-end coffee roaster that sorts beans to ensure only the best make it into your cup.

  1. Collection (Gathering the Beans): They gathered raw problems from all over the internet and various coding competition platforms.
  2. Processing (Cleaning and Roasting): They translated everything into English, fixed broken images or formulas, and standardized the instructions so every problem looked the same. They also generated new test cases to ensure the problems were solvable.
  3. Filtering (The Difficulty Sorter): This is the magic step. They used a special AI "judge" to rate every problem on a scale of 1 to 5 based on five different factors (like how hard the logic is, how much knowledge you need, etc.).
    • The Analogy: Imagine a bouncer at an exclusive club. The bouncer checks the ID of every problem. If it's too easy (a "1"), it gets kicked out. If it's challenging (a "4" or "5"), it gets in.
  4. Verification (The Final Taste Test): Humans and machines double-checked the final list to make sure the problems actually worked and weren't duplicates.
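The filtering step above can be sketched in a few lines. Here the LLM judge is stubbed out with a toy scoring function, and the five factor names are illustrative guesses, not the paper's actual rubric; the real system would call a judge model for each score:

```python
# Illustrative factor names; the paper's actual rubric may differ.
FACTORS = ["logic", "knowledge", "implementation", "edge_cases", "insight"]

def judge(problem: str, factor: str) -> int:
    """Stand-in for an LLM judge; returns a 1-5 score for one factor.

    In practice this would be a model call; here we fake it for the demo.
    """
    return 5 if "hard" in problem else 2

def difficulty(problem: str) -> float:
    """Average the five per-factor scores into one difficulty rating."""
    scores = [judge(problem, f) for f in FACTORS]
    return sum(scores) / len(scores)

def bouncer(problems: list[str], threshold: float = 4.0) -> list[str]:
    """The 'bouncer': admit only problems rated at or above the threshold."""
    return [p for p in problems if difficulty(p) >= threshold]

kept = bouncer(["an easy warm-up", "a hard graph problem"])
```

The design choice worth noting: scoring five separate factors and aggregating them is more robust than asking the judge for a single overall number, since each factor anchors the model to a concrete aspect of difficulty.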

The Result: MicroCoder

The result is MicroCoder, a dataset of about 13,300 problems.

  • It is smaller than some other datasets (which have hundreds of thousands of problems), but it is much denser with quality.
  • It focuses on fresh problems (so the AI hasn't seen them before) and hard problems.

The Proof: Training the AI

The researchers trained an AI model with reinforcement learning on MicroCoder and compared it to models trained on the old, massive "Easy Mode" datasets.

  • The "3x" Boost: Within just 300 steps of training, the MicroCoder-trained model improved three times faster than the others.
  • The "Hard" Wins: The biggest gains were on medium and hard problems. The AI didn't just get better at easy stuff; it learned to tackle the complex logic it previously failed at.
  • The "Recency" Factor: Because the problems were new, the AI couldn't cheat by memorizing answers; it had to actually learn how to think.

Why This Matters

Think of it like athletic training. If you only run on a flat, easy treadmill, you won't get very fit. But if you train on a steep, rocky mountain trail, your muscles will grow stronger, and you'll be able to run anywhere.

MicroCoder is that rocky mountain trail for AI. It proves that quality beats quantity. By carefully curating data that is difficult and fresh, we can build AI models that are much more capable at solving real-world, complex coding problems.

In short: Stop feeding the AI a mountain of easy homework. Give it a few hard, fresh challenges, and watch it learn to think like a master programmer.