Bug Severity Prediction in Software Projects Using Supervised Machine Learning Models

Imagine you are the captain of a massive, busy ship (your software project). Every day, hundreds of crew members shout out problems: "The engine is making a weird noise!" "The lifeboat latch is stuck!" "The coffee machine is leaking!"

Some of these problems are critical (the ship might sink if we don't fix the engine now). Others are minor (the coffee machine can wait until lunch).

In the real world of software, these "shouts" are called Bug Reports. In big projects like Eclipse (a huge software tool), tens of thousands of these reports pile up over time. Trying to read them all and decide which ones are emergencies is like trying to find a needle in a haystack while blindfolded. It's slow and tiring, and people make mistakes when they are tired or biased.

This thesis is about building a smart robot assistant (Machine Learning) that can read these reports instantly and tell the captain: "Captain, ignore the coffee machine. The engine is on fire! Fix that first!"

Here is a simple breakdown of how the author, Nafisha, built this robot and what she found.

1. The Goal: Sorting the Noise from the Fire

The main problem is Bug Severity Prediction.

The Old Way: A human manager reads a report and guesses, "Hmm, this sounds serious," or "This sounds like a typo."
The New Way: A computer program reads the report, learns from thousands of past examples, and instantly assigns a severity level: Critical, Major, Minor, or Trivial.
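For readers who like to see the idea in code, here is a minimal sketch of "learning from past examples." It uses a tiny word-counting classifier (Naive Bayes), chosen only because it fits in a few lines; the summary above does not say it was one of the thesis's models. The toy reports and the function names (`tokenize`, `train`, `predict`) are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase each word and strip simple punctuation.
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]

def train(reports):
    """Count how often each word appears under each severity label."""
    word_counts = defaultdict(Counter)  # severity -> word -> count
    label_counts = Counter()            # severity -> number of reports
    vocab = set()
    for text, severity in reports:
        label_counts[severity] += 1
        for word in tokenize(text):
            word_counts[severity][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the severity whose past reports best match the words in text."""
    word_counts, label_counts, vocab = model
    total_reports = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus log likelihood of each known word (add-one smoothing).
        score = math.log(label_counts[label] / total_reports)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical past reports (not taken from the Eclipse dataset).
past_reports = [
    ("Workbench crashes on startup and loses all data", "Critical"),
    ("Crash when saving project, data lost", "Critical"),
    ("Editor freezes when opening large file", "Major"),
    ("Build fails after refactoring rename", "Major"),
    ("Tooltip text overlaps the icon", "Minor"),
    ("Dialog button slightly misaligned", "Minor"),
    ("Typo in preferences label", "Trivial"),
    ("Typo in about dialog text", "Trivial"),
]

model = train(past_reports)
print(predict(model, "Crash on startup, data lost"))
print(predict(model, "Typo in the label of the dialog"))
```

With only eight examples this is a toy, but the shape is the same as the real thing: past reports with known severities go in, and a label for a new report comes out.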

2. The Ingredients: The "School of Fish"

To teach the robot, the author needed a giant library of past stories.

The Dataset: She used the Eclipse Bugzilla database. Think of this as a massive archive of 88,682 past "shouts" from the ship's crew.
The Problem: The library was unbalanced. There were thousands of "minor" complaints (like a loose screw) but very few "critical" ones (like a sinking hull). If you teach a robot with mostly minor complaints, it will become lazy and think everything is minor.
The Fix: The author used a technique called SMOTE (Synthetic Minority Over-sampling Technique). Imagine you have a tiny pile of "Critical" cards and a huge pile of "Minor" cards. SMOTE does not simply photocopy the Critical cards; it creates new, realistic ones by blending pairs of similar Critical cards. This balances the deck, so the robot learns to pay attention to the emergencies.
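The core move in SMOTE can be sketched in a few lines: pick a rare example, pick one of its nearest rare neighbours, and create a new point somewhere on the line between them. The sketch below illustrates that idea only and is not the implementation used in the thesis; in practice a library such as imbalanced-learn would normally be used. SMOTE works on numbers, so bug reports would first have to be turned into numeric feature vectors. The function name `smote_like` and the sample points are hypothetical.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (excluding base itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # how far to move from base towards the neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical numeric features for three "Critical" reports.
critical = [(1.0, 5.0), (1.5, 4.5), (2.0, 5.5)]
new_points = smote_like(critical, n_new=4)
print(new_points)
```

Because every new point sits between two real ones, the synthetic "Critical" examples stay inside the region where real critical reports live, which is why they count as realistic and not random.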

3. The Contest: Who is the Best Detective?

The author didn't just build one robot; she built 10 different detectives (Machine Learning Models) and put them in a contest to see who could sort the bugs best.

Here are the main contestants:

The Old School Detectives (Linear Models): Like Logistic Regression and SVM. They are simple, fast, and good at following strict rules.
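The Team Players (Ensemble Trees): Like XGBoost, LightGBM, and CatBoost. Imagine a committee of many small decision trees, where each new member focuses on correcting the mistakes of those before it and the combined verdict decides the outcome. They are very powerful and usually very accurate.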
The Super-Reader (DistilBERT): This is a Deep Learning model. Think of it as a detective who has read every book in the library and understands the nuance of language. It doesn't just look for keywords; it understands the feeling of the sentence.

4. The Results: Who Won?

After running the contest, here is what happened:

The Overall Champions (Accuracy): DistilBERT and XGBoost won the race for general accuracy. They were the best at getting the right answer most of the time.
- Analogy: DistilBERT is like a genius who understands the whole story, while XGBoost is like a super-efficient committee that never misses a detail.
The "Safety First" Champion (Recall): Logistic Regression was the best at finding the most dangerous bugs, even if it sometimes cried "Wolf!" a little too often.
- Analogy: If you are looking for a shark in the ocean, you want a detector that screams "SHARK!" even if it's just a shadow. You don't want it to miss the real shark. Logistic Regression is that paranoid, safety-first detector.
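To see why "most accurate" and "best at catching critical bugs" can point to different winners, here is a small sketch that computes both scores by hand. The labels are invented to make the contrast obvious and are not results from the thesis; `cautious` and `paranoid` are hypothetical detectors.

```python
def accuracy(y_true, y_pred):
    """Share of all reports that were labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Share of the reports that truly have `label` which were caught."""
    predictions_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predictions_for_label) / len(predictions_for_label)

# Ten made-up reports: 2 truly Critical, 8 truly Minor.
y_true = ["Critical"] * 2 + ["Minor"] * 8

# A detector that never raises the alarm.
cautious = ["Minor"] * 10

# A detector that catches both real emergencies but also raises 3 false alarms.
paranoid = ["Critical"] * 2 + ["Critical"] * 3 + ["Minor"] * 5

for name, y_pred in [("cautious", cautious), ("paranoid", paranoid)]:
    print(name, accuracy(y_true, y_pred), recall(y_true, y_pred, "Critical"))
```

In this toy case the cautious detector scores higher on accuracy (8 of 10 correct) while missing every real emergency, and the paranoid one scores lower on accuracy (7 of 10) while catching both. That is the trade-off behind the two kinds of champion.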

5. The Big Takeaway

The study found that there is no single "perfect" robot. It depends on what you need:

If you want pure accuracy (getting the most right answers overall), use the Team Players (XGBoost) or the Super-Reader (DistilBERT).
If you want to miss as few critical disasters as possible (even if you get a few false alarms), use the Old School Detective (Logistic Regression).

6. Why Does This Matter to You?

You might not be a software engineer, but this matters because:

Safety: It helps prevent apps from crashing or banking systems from failing.
Speed: It saves developers hours of reading boring reports, so they can fix the real problems faster.
Trust: When software works better and crashes less, you have more trust in the apps you use every day.

In a Nutshell

This thesis is about teaching computers to be better at prioritizing emergencies. By using smart algorithms, we can turn a chaotic pile of complaints into a clear, organized to-do list, ensuring that the most dangerous fires are put out before they burn the whole house down.

