SWE-Fuse: Empowering Software Agents via Issue-free Trajectory Learning and Entropy-aware RLVR Training

SWE-Fuse is a novel training framework that enhances software engineering agents by fusing issue-free trajectory learning with entropy-aware RLVR to overcome the limitations of noisy real-world issue descriptions, achieving state-of-the-art performance on the SWE-bench Verified benchmark.

Xin-Cheng Wen, Binbin Chen, Haoxuan Lan, Hang Yu, Peng Di, Cuiyun Gao

Published Tue, 10 Ma

Imagine you are trying to teach a very smart but inexperienced apprentice how to fix broken machines in a giant, chaotic factory. This is what the paper SWE-Fuse is all about: teaching Artificial Intelligence (AI) to fix real-world software bugs.

Here is the story of how they did it, using simple analogies.

The Problem: The "Confusing Instruction Manual"

Usually, when we teach an AI to fix code, we give it a "ticket" (a bug report) that says, "The machine is making a weird noise." The AI then tries to fix it.

But in the real world, these tickets are often garbage.

  • The Mismatch: Sometimes the ticket says, "The engine is overheating," but the actual broken part is a loose screw in the radio.
  • The Result: The AI gets confused. It spends hours trying to fix the engine, only to realize the radio was the problem all along. It wastes time and learns the wrong lessons because the "instruction manual" (the bug description) was lying or unclear.

The Solution: SWE-Fuse (The Smart Training Camp)

The researchers built a new training system called SWE-Fuse. Instead of just reading the confusing tickets, they taught the AI a better way to learn. They used two main tricks:

Trick 1: The "Blindfolded Detective" (Issue-Free Learning)

Imagine you are training a detective. Instead of telling them, "The butler did it," you just show them the crime scene and say, "Find out what happened."

  • How it works: SWE-Fuse takes some training examples and hides the bug description. It forces the AI to look at the code, run tests, see what breaks, and figure out the problem on its own.
  • The Analogy: It's like teaching a chef to cook by giving them a list of ingredients and a broken dish, rather than a recipe that says "Add salt." The chef learns to taste and adjust (debug) rather than just blindly following a potentially wrong recipe.
  • Why it helps: This stops the AI from getting distracted by bad instructions. It learns the process of fixing things, not just memorizing the answers to specific questions.
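To make the "blindfolded detective" idea concrete, here is a minimal sketch of how issue-free training data could be built. This is an illustration of the concept, not the paper's actual pipeline: the field names (`issue`, `repo_snapshot`, `trajectory`) and the 50/50 masking ratio are assumptions.

```python
import random

def make_training_examples(samples, issue_free_ratio=0.5, seed=0):
    """Build training examples, hiding the issue text for a fraction of them.

    `samples` is a list of dicts with 'issue', 'repo_snapshot', and
    'trajectory' keys -- illustrative names, not from the paper.
    """
    rng = random.Random(seed)
    examples = []
    for s in samples:
        hide_issue = rng.random() < issue_free_ratio
        # Issue-free prompt: the agent must diagnose the bug from the
        # code and failing tests alone, instead of trusting the report.
        prompt = (
            "Explore the repository, run the tests, and fix the failure."
            if hide_issue
            else f"Issue report:\n{s['issue']}\n\nFix the bug."
        )
        examples.append({
            "prompt": prompt,
            "context": s["repo_snapshot"],
            "target": s["trajectory"],
        })
    return examples
```

With the issue text hidden, the supervision signal rewards the diagnostic process itself, so a misleading report can no longer steer the model toward the wrong fix.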

Trick 2: The "Adaptive Coach" (Entropy-Aware RL)

Once the AI has learned the basics, they put it in a "Reinforcement Learning" gym. This is the RLVR from the title: Reinforcement Learning with Verifiable Rewards, meaning the AI tries to fix bugs, gets an objective score (do the tests pass?), and tries again.

Usually, coaches are too strict or too loose.

  • Too strict: If the AI is confused (high "entropy" or uncertainty), a strict coach punishes it for trying new things. The AI gets scared to explore.
  • Too loose: If the AI is confident (low entropy), a loose coach lets it make wild guesses that might break everything.

SWE-Fuse's Coach is "Entropy-Aware":

  • When the AI is confused: The coach says, "Hey, you're not sure? Go ahead and try wild ideas! Explore!" (Relaxed rules).
  • When the AI is confident: The coach says, "You know what you're doing? Stick to the plan and be precise." (Strict rules).
  • The Analogy: Think of it like a parent teaching a child to ride a bike. When the child is wobbling and unsure, the parent holds the seat tight but encourages them to pedal. When the child is riding smoothly, the parent lets go and says, "Keep your balance, don't swerve!"
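The adaptive-coach idea can be sketched as an entropy-dependent clip range in a PPO-style policy update: when the policy's token distribution has high entropy (the AI is unsure), the clip range widens to permit bigger, more exploratory updates; when entropy is low, it tightens. The specific interpolation schedule and constants below are assumptions for illustration, not the paper's exact formula.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_aware_clip(ratio, advantage, probs,
                       base_eps=0.2, max_eps=0.4, entropy_ref=1.0):
    """PPO-style clipped objective with an entropy-dependent clip range.

    High entropy (uncertain policy) -> eps grows toward max_eps,
    allowing larger policy updates (exploration). Low entropy
    (confident policy) -> eps shrinks toward base_eps (precision).
    """
    h = token_entropy(probs)
    # Interpolate the clip range with entropy, capped at max_eps.
    eps = base_eps + (max_eps - base_eps) * min(h / entropy_ref, 1.0)
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    # Standard pessimistic (min) clipped surrogate.
    return min(ratio * advantage, clipped_ratio * advantage)
```

The same update step thus behaves like the relaxed coach on uncertain tokens and the strict coach on confident ones, without any manual switching.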

The Results: The Underdog Wins

The researchers tested this new training method on a famous challenge called SWE-bench, which is like the "Olympics" for AI code-fixing.

  • The Old Way: Other AI models (even very big ones) struggled. They got confused by the messy real-world tickets and often failed.
  • The SWE-Fuse Way: Their model, even though it was smaller (32 billion "brain cells" compared to some giants with hundreds of billions), won.
    • It solved 60.2% of the problems.
    • The next best open-source model only solved about 54%.
    • It even beat some massive, expensive, closed-source models from tech giants.

The Big Takeaway

The paper proves that you don't need the biggest, most expensive AI to fix software. You just need to teach it how to think rather than what to read.

By teaching the AI to ignore bad instructions and learn by doing (debugging step-by-step), and by coaching it with the right amount of freedom at the right time, they created a software repair expert that is smarter, faster, and more reliable than the competition.

In short: SWE-Fuse taught the AI to be a detective who solves crimes by looking at the evidence, rather than a student who just memorizes the answer key (even when the answer key is wrong).