Conventional Commit Classification using Large Language Models and Prompt Engineering

This paper demonstrates that training-free large language models, particularly DeepSeek-R1-32B using few-shot prompting, can effectively classify conventional commits from code diffs, offering a practical alternative to traditional supervised machine learning approaches.

Original authors: H. M. Sazzad Quadir, Sakib Al Hasan, Md. Nurul Ahad Tawhid

Published 2026-05-06 ✓ Author reviewed


Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you are the manager of a massive, chaotic library where thousands of books are being added every day. To keep things organized, the library has a strict rule: every new book must have a specific label on its spine (like "New Feature," "Bug Fix," or "Documentation") so that robots can automatically sort them, update the catalog, and tell visitors what's new.

However, in reality, the people adding the books often ignore the rules. They scribble messy notes like "fixed the thing" or "changed some code," making it impossible for the robots to know what category the book belongs to.

This paper is about getting a super-smart robot (an AI) to read those messy notes and figure out the correct label, without first making it study thousands of labeled examples.

The Problem: Messy Notes vs. Strict Rules

In software development, programmers write "commit messages" (notes) every time they save changes to their code. The industry has a standard format called Conventional Commits that acts like a strict filing system: it requires each message to start with a type tag, such as feat: for a new feature or fix: for a bug fix.
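To make the "strict filing system" concrete, here is a minimal Python sketch of a checker for the first line of a commit message. It assumes a header of the form type(scope)!: description with lower-case types only. The KNOWN_TYPES set is also an assumption: only feat and fix come from the Conventional Commits spec itself, the rest follow a widely used convention, and the paper's own label set may differ.

```python
import re
from typing import Optional

# Assumed label set: feat and fix come from the Conventional Commits spec;
# the others follow a widely used convention (the paper's set may differ).
KNOWN_TYPES = {"feat", "fix", "docs", "style", "refactor", "perf",
               "test", "build", "ci", "chore", "revert"}

# Header shape: <type>[(scope)][!]: <description>  (lower-case types only)
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"             # commit type, e.g. "fix"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<desc>\S.*)$"             # colon, space, non-empty description
)


def commit_type(message: str) -> Optional[str]:
    """Return the type if the message's first line is a valid header, else None."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = HEADER_RE.match(first_line)
    if match is None or match.group("type") not in KNOWN_TYPES:
        return None
    return match.group("type")


print(commit_type("fix(parser): handle empty input"))  # fix
print(commit_type("fixed the thing"))                   # None
```

A note like "fixed the thing" comes back as None. Notes like that are exactly what the classifier in this paper is meant to label.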

But humans are messy. They often forget the tags. Traditionally, to fix this, researchers would build a custom robot by feeding it thousands of labeled examples (like a student memorizing a textbook). This takes a lot of time and data.

The New Approach: The "Prompt" Strategy

Instead of training a new robot from scratch, the authors asked: Can we just give a very smart, pre-existing AI a set of instructions (a "prompt") to do the job?

They treated the AI like a brilliant intern who already knows a lot about language but needs to know exactly what task to do. They tested three different ways of giving instructions:

  1. Zero-Shot (The "Just Tell Me" Approach):

    • The Analogy: You walk up to the intern and say, "Here is a messy note. Please tell me what category it belongs to based on the rules." You give no examples.
    • Result: The intern can make a reasonable guess, but without examples it is less accurate, because it doesn't know exactly what you want.
  2. Few-Shot (The "Show Me Examples" Approach):

    • The Analogy: You say, "Here is a messy note that means 'New Feature.' Here is another that means 'Bug Fix.' Now, look at this new messy note and tell me what it is." You show the intern a few clear examples first.
    • Result: This worked best. The intern picked up the pattern quickly and labeled the notes most accurately.
  3. Chain-of-Thought (The "Think Out Loud" Approach):

    • The Analogy: You say, "Before you give me the answer, please write down your step-by-step reasoning: 'I see the word 'fix', so I think it's a bug...'"
    • Result: Surprisingly, this didn't help. For this task of assigning a single label, making the intern "think out loud" added extra work without making the final answer more accurate. It was like asking a librarian to write an essay before shelving each book; it slowed them down without improving the result.
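The three instruction styles can be sketched as plain prompt strings. This is a hypothetical illustration, not the authors' actual prompts. The wording, the LABELS list, and the function names are all assumptions. The input is a code change, since the paper classifies commits from code diffs.

```python
from typing import List, Tuple

# Hypothetical label set and wording; not the authors' actual prompts.
LABELS = ["feat", "fix", "docs", "refactor", "test", "chore"]
INSTRUCTION = (
    "Classify the code change below as exactly one Conventional Commits "
    "type from this list: " + ", ".join(LABELS) + "."
)


def zero_shot_prompt(diff: str) -> str:
    # Instructions only: no worked examples.
    return f"{INSTRUCTION}\n\nChange:\n{diff}\nType:"


def few_shot_prompt(diff: str, examples: List[Tuple[str, str]]) -> str:
    # A few (change, label) demonstrations come before the real query.
    shots = "\n\n".join(f"Change:\n{ex}\nType: {label}" for ex, label in examples)
    return f"{INSTRUCTION}\n\n{shots}\n\nChange:\n{diff}\nType:"


def chain_of_thought_prompt(diff: str) -> str:
    # Ask for step-by-step reasoning first, then the label on its own line.
    return (
        f"{INSTRUCTION}\n\nChange:\n{diff}\n\n"
        "Think step by step about what the change does, then end with "
        "a final line of the form 'Type: <type>'."
    )


examples = [("+def export_csv(rows): ...", "feat"),
            ("-if i > n:\n+if i >= n:", "fix")]
print(few_shot_prompt("+## Installation", examples))
```

The zero-shot and few-shot versions differ only in the block of demonstrations. The chain-of-thought version changes the expected output from a bare label to reasoning followed by a label, and that extra text is where the additional generation cost comes from.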

The Contenders: How Big Does the Brain Need to Be?

The researchers tested three different "interns" (AI models) of varying sizes:

  • Mistral-7B: A medium-sized brain (7 billion parameters).
  • LLaMA-3-8B: A slightly larger brain (8 billion parameters).
  • DeepSeek-R1-32B: A giant brain (32 billion parameters).

The Finding: The bigger brain won. The DeepSeek-R1-32B was the most accurate at reading the messy notes and finding the right label. This suggests that for this kind of task, having a larger, more powerful AI model makes a real difference.
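This summary says only that DeepSeek-R1-32B was the most accurate and does not name the metrics behind that claim. The functions below sketch two common ways to score this kind of comparison: plain accuracy and macro-averaged F1 over predicted versus gold commit types. The choice of metrics is an assumption, so check the original paper for what it actually reports.

```python
from collections import Counter
from typing import List


def _check(gold: List[str], pred: List[str]) -> None:
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of commits whose predicted type equals the gold type."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-type F1 = 2TP / (2TP + FP + FN)."""
    _check(gold, pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true type g
    labels = set(gold) | set(pred)
    per_label = [2 * tp[t] / (2 * tp[t] + fp[t] + fn[t]) for t in labels]
    return sum(per_label) / len(per_label)


gold = ["feat", "fix", "fix", "docs"]
pred = ["feat", "fix", "feat", "docs"]
print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Macro-F1 is worth reporting alongside accuracy when some commit types (docs, for example) are much rarer than others. In that case a model can reach high accuracy while mostly ignoring the rare types.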

The Bottom Line

The paper concludes that you don't need to build a custom machine learning model from scratch to organize messy software notes. Instead, you can use a powerful, pre-existing AI and simply give it a few good examples (Few-Shot prompting) to get the job done.

  • Best Strategy: Show the AI a few examples first.
  • Best AI: The largest model tested, DeepSeek-R1-32B.
  • Not Worth It: Making the AI write out a long reasoning process before answering, which added work without improving accuracy on this task.

This approach saves time and effort because it skips collecting and labeling thousands of training examples, so developers can start labeling commits automatically right away.
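Automating this means calling a model and turning its free-text reply into a single label. The sketch below makes several assumptions. The generate callable is a hypothetical stand-in for whatever inference API you use, not a real library function. The label set is hypothetical, and so is the fallback to chore when no label is found. The sketch also strips <think>...</think> blocks, because DeepSeek-R1 models typically wrap their reasoning in those tags before giving the final answer.

```python
import re
from typing import Callable, Optional

# Hypothetical label set; substitute your project's categories.
LABELS = ("feat", "fix", "docs", "refactor", "test", "chore")

# DeepSeek-R1 models typically put their reasoning inside <think>...</think>.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# Whole-word match so that "fixed" is not read as "fix".
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b")


def parse_label(response: str) -> Optional[str]:
    """Drop any reasoning block, then return the last allowed label mentioned."""
    answer = THINK_RE.sub("", response).lower()
    found = LABEL_RE.findall(answer)
    return found[-1] if found else None


def label_commit(prompt: str, message: str,
                 generate: Callable[[str], str],
                 fallback: str = "chore") -> str:
    """Prefix a commit message with the type the model predicts for `prompt`."""
    label = parse_label(generate(prompt)) or fallback
    return f"{label}: {message}"


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call, for demonstration only.
    return "<think>A boundary check changed.</think>\nType: fix"


prompt = "Change:\n-if i > n:\n+if i >= n:\nType:"
print(label_commit(prompt, "fixed the thing", fake_model))  # fix: fixed the thing
```

The prompt passed in can use any of the three styles described earlier. Taking the last allowed label in the reply is a heuristic. It suits replies that reason first and answer last, but a stricter output format would be more robust in practice.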
