LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms

This empirical study compares LLM-based and search-based merge conflict resolution tools, revealing that while LLMs excel with imbalanced content, search-based methods offer superior robustness and generalization, ultimately suggesting that hybrid systems combining both paradigms are necessary for optimal performance.

Original authors: Heleno de Souza Campos Junior, Leonardo Gresta Paulino Murta

Published 2026-05-19 (author reviewed)

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you and a friend are both editing the same document at the same time. You both make changes to the same paragraph, and when you try to combine your work, the computer throws up its hands and says, "I don't know which version to keep!" This is called a merge conflict.

For decades, developers have had to manually fix these conflicts, which is tedious and prone to mistakes. Recently, two new "smart helpers" have emerged to solve this problem automatically. This paper is a head-to-head race between these two helpers to see which one is better.

The Two Contenders

Think of the two helpers as having very different personalities and skill sets:

1. The "Super-Reader" (LLM-based approach, represented by MergeGen)

  • How it works: This helper is like a brilliant student who has read millions of books and code documents. It doesn't really "calculate" the answer; instead, it uses its memory of how things usually look to guess the best solution. It predicts the next word or line based on patterns it has learned.
  • The Analogy: It's like a chef who has tasted thousands of soups. If you give it a recipe with a missing ingredient, it doesn't measure the spices; it just "knows" what the soup should taste like based on experience and adds the right amount.

2. The "Puzzle Solver" (Search-based approach, represented by SBCR)

  • How it works: This helper is a methodical engineer. It doesn't know what code means; it just sees lines of text. It treats the conflict like a giant jigsaw puzzle. It tries millions of different combinations of the existing lines, checking each one to see which mix looks the most like the original versions. It uses a simple rule: "The best solution is usually a mix that looks somewhat like both parents."
  • The Analogy: It's like a detective who has no idea who the suspect is, so they try every possible combination of alibis and clues until they find the one that fits the facts perfectly. It doesn't guess; it tests.
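To make the "Puzzle Solver" idea concrete, here is a minimal sketch of a search over line combinations. It is not the SBCR algorithm from the paper: the hill-climbing loop, the `fitness` function (average similarity to both parents, measured with Python's `difflib`), and all function names and parameters are assumptions chosen for illustration only.

```python
import difflib
import random

def similarity(a, b):
    """Similarity of two lists of lines, in [0, 1], via difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def fitness(candidate, ours, theirs):
    """Hypothetical fitness: reward candidates that resemble both parents."""
    return (similarity(candidate, ours) + similarity(candidate, theirs)) / 2

def mutate(candidate, pool, rng):
    """Return a copy with one line inserted, removed, or swapped at random."""
    new = list(candidate)
    op = rng.choice(["insert", "remove", "swap"])
    if op == "insert" or not new:
        # Only lines that already exist in a parent can be inserted.
        new.insert(rng.randint(0, len(new)), rng.choice(pool))
    elif op == "remove":
        new.pop(rng.randrange(len(new)))
    elif len(new) > 1:
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new

def search_resolution(ours, theirs, steps=2000, seed=0):
    """Hill climbing over candidates built only from the parents' lines."""
    pool = ours + theirs
    if not pool:
        return [], 1.0  # nothing to combine
    rng = random.Random(seed)
    best = list(ours)  # start from one parent
    best_score = fitness(best, ours, theirs)
    for _ in range(steps):
        candidate = mutate(best, pool, rng)
        score = fitness(candidate, ours, theirs)
        if score > best_score:  # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score
```

Calling `search_resolution(["a = 1", "b = 2"], ["a = 1", "c = 3"])` returns a candidate made only of lines from the two versions, together with its score. Note that the search never looks at what the code means; it only compares text, which matches the behavior described above.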

The Race: What Happened?

The researchers pitted these two against thousands of real-world conflicts from open-source projects (written in languages such as Java, C#, and JavaScript). Here is what they found:

1. The "Super-Reader" wins when things are messy.
When the two versions of the code were very different in size (e.g., one version added a huge paragraph while the other deleted a single line), the Super-Reader was amazing. Because it learned from so much data, it could understand the context and pick the right lines, even if the balance was weird. It was also much faster, solving conflicts in a blink of an eye.

2. The "Puzzle Solver" wins when things are balanced.
When the two versions were similar in size and structure, the Puzzle Solver was the champion. It found the perfect mix of lines more often than the Super-Reader. It was also more reliable when the code contained weird symbols, non-English text, or was extremely long.

3. The "Super-Reader" has a few bad habits.

  • Memorization: Sometimes, the Super-Reader got "stuck" on a specific example it had seen during training and repeated that answer, even when it was wrong for the current conflict. This is called overfitting: it memorized the answers instead of learning the lesson.
  • Short Attention Span: If the code chunk was too huge, the Super-Reader would get overwhelmed and stop writing halfway through, leaving the conflict half-solved.
  • Language Barrier: If the code had comments in a language the model wasn't trained on, it got confused.

4. The "Puzzle Solver" is a bit slow but steady.
It takes longer to solve the puzzle because it has to test many combinations. However, it never gets confused by long text or strange languages because it treats everything as simple text. It doesn't "memorize" anything, so it doesn't overfit.

The Big Conclusion: No "Silver Bullet"

The paper concludes that neither helper is perfect on its own.

  • If you give the Super-Reader a small, messy conflict (one where the two versions differ a lot in size), it's a genius.
  • If you give the Puzzle Solver a huge, balanced, or weirdly formatted conflict, it's the reliable workhorse.

The Solution?
The authors suggest building a hybrid system—a "Traffic Cop" that looks at the conflict first.

  • If the conflict is small and messy, the Traffic Cop sends it to the Super-Reader.
  • If the conflict is huge, balanced, or contains weird characters, the Traffic Cop sends it to the Puzzle Solver.
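A "Traffic Cop" along these lines could be sketched as a small routing function. This is an illustration under stated assumptions, not the authors' design: the feature definitions (`imbalance`, `has_non_ascii`), the thresholds `max_llm_lines` and `min_imbalance`, and the returned labels are all hypothetical, and real thresholds would need to be tuned on data.

```python
def imbalance(ours, theirs):
    """Size imbalance in [0, 1]: 0 means equal line counts, 1 means one side is empty."""
    longer = max(len(ours), len(theirs))
    if longer == 0:
        return 0.0
    return abs(len(ours) - len(theirs)) / longer

def has_non_ascii(lines):
    """True if any line contains a character outside plain ASCII."""
    return any(not line.isascii() for line in lines)

def route(ours, theirs, max_llm_lines=50, min_imbalance=0.5):
    """Pick a resolver label for one conflict (thresholds are assumed values)."""
    total = len(ours) + len(theirs)
    # Huge conflicts or unusual characters go to the search-based tool.
    if total > max_llm_lines or has_non_ascii(ours + theirs):
        return "search"
    # Small conflicts with lopsided sides go to the LLM-based tool.
    if imbalance(ours, theirs) >= min_imbalance:
        return "llm"
    # Small, balanced conflicts also go to the search-based tool.
    return "search"
```

For example, a conflict with ten lines on one side and one on the other is routed to `"llm"`, while a five-versus-five conflict is routed to `"search"`. Checking size and character set first means the cheap tests run before any tool is invoked.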

By letting the right tool do the right job, we can create a system that is both fast and accurate, saving developers from the headache of manual merging.

Summary in One Sentence

This paper finds that while AI "guessers" are fast and great at messy problems, "searchers" are more reliable for complex or weird ones, and the best future tool will be a smart combination of both.
