Towards a Taxonomy of Software Log Smells

This paper presents a taxonomy of nine log smells derived from a survey of 51 studies to help developers write better logging code, while also mapping these issues to existing repair tools and highlighting critical gaps in current research and tooling.

Nyyti Saarimäki, Donghwan Shin, Domenico Bianculli

Published Wed, 11 Ma

Imagine you are a detective trying to solve a crime in a massive, chaotic city. Your only clues are the diaries (logs) left behind by the city's citizens (the software). If the diaries are written in a secret code, filled with typos, missing pages, or contradictory stories, you'll never solve the crime. You'll be stuck in the dark.

This paper is about fixing those diaries.

The authors, researchers from Luxembourg and the UK, noticed that while software engineers are great at building apps, they often write terrible "diaries" (logs) about what those apps are doing. These bad logs lead to confusion, wasted time, and even security leaks.

To help, the team created a "Smell Taxonomy." Think of this as a Wanted Poster for bad logging habits. Just like a detective knows that "smoke" might mean a fire, developers need to recognize "smells" that mean something is wrong with their code.

Here is the breakdown of their findings, translated into everyday language:

1. What is a "Log Smell"?

In software, a "smell" isn't a bad odor. It's a warning sign. It's like seeing a wobbly leg on a table. The table still stands (the software still runs), but it's a sign that something is poorly designed and might collapse later.

The authors found 9 specific "Smells" that developers should avoid. They grouped them into two categories: The Code (how the diary is written) and The Diary (what the diary actually says).

The 9 "Smells" (The Bad Habits)

A. Smells in the Diary (The Log Files)

These are problems you see when you actually read the logs.

  1. Format Turmoil (The Messy Handwriting): Imagine one person writes in cursive, another in block letters, and a third uses a different language. If your logs don't follow a standard format, it's impossible to search them or read them quickly.
  2. Undercover Identifier (The Anonymous Note): A diary entry says, "The bank vault was opened," but it doesn't say who opened it. Without a name (or ID) attached to the log, you don't know which part of the system caused the problem.
  3. Mercurial Logging Level (The False Alarm): This is like a smoke detector that screams "FIRE!" when you just toast a piece of bread. The log says "ERROR" for a minor issue, or "INFO" for a disaster. It makes it hard to know what actually matters.
  4. Deceptive Variable (The Missing Ingredient): The log says, "The car broke down," but it forgets to say which car or what part broke. It's like a recipe that says "add spices" but doesn't list which ones.
  5. Message Madness (The Nonsense Story): The log entries are full of typos, bad grammar, or confusing sentences. For example, one entry reads "Connect to data base" when the author meant "Connected to database." Small slips like this make the story hard to follow and hard to search.
  6. Logging Lost in the Wind (The Missing Page): The diary has a gap. The software crashed, but the log entry that should explain why is simply missing. It's like a detective arriving at a crime scene and finding the most important page torn out.
  7. Landfill Logs (The Garbage Dump): This is the opposite of missing pages. The software writes too much. It logs every single breath the computer takes. The important clues are buried under thousands of lines of useless noise, like trying to find a diamond in a mountain of trash.
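
To make a few of these diary smells concrete, here is a minimal Python sketch of what avoiding them might look like. It is an illustration, not code from the paper: the logger name `orders_demo`, the `request_id` field, and the `process_order` function are hypothetical. Every entry shares one format (addressing Format Turmoil), carries an identifier (Undercover Identifier), uses a level that matches the severity (Mercurial Logging Level), and names the affected order (Deceptive Variable).

```python
import io
import logging


def build_logger(stream):
    """Return a logger whose every line follows one fixed format."""
    logger = logging.getLogger("orders_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # One shared format: level, component, identifier, then the message.
    handler.setFormatter(logging.Formatter(
        "%(levelname)s %(name)s request_id=%(request_id)s %(message)s"
    ))
    logger.addHandler(handler)
    return logger


def process_order(logger, request_id, order_id, ok):
    """Hypothetical business function that logs its outcome."""
    context = {"request_id": request_id}  # the identifier travels with the entry
    if ok:
        # Routine success is INFO, and the message says which order.
        logger.info("order processed order_id=%s", order_id, extra=context)
    else:
        # A real failure is ERROR, again naming the affected order.
        logger.error("order failed order_id=%s", order_id, extra=context)


buffer = io.StringIO()
log = build_logger(buffer)
process_order(log, "req-42", 1001, ok=True)
process_order(log, "req-43", 1002, ok=False)
output = buffer.getvalue()
print(output, end="")
# INFO orders_demo request_id=req-42 order processed order_id=1001
# ERROR orders_demo request_id=req-43 order failed order_id=1002
```

Because every line has the same shape, a search for `request_id=req-43` or for `ERROR` finds exactly the entries that matter. A real system would normally add a timestamp to the format; it is left out here only to keep the output predictable.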

B. Smells in the Code (How the Diary is Written)

These are problems in the actual programming code that generates the logs.

  1. Sleeping Guards (The Lazy Bouncer): Imagine a bouncer at a club who is supposed to check IDs but falls asleep. In code, a "guard" is a quick check of whether a log message will actually be recorded. When the guard is missing, the computer does the hard work of preparing a log entry even when that entry will be thrown away because nobody is looking at it. It wastes energy and slows down the system.
  2. Skeleton in the Closet (The Messy Room): The code that writes the logs is itself messy, duplicated, or hard to understand. If the "writer" is confused, the "diary" will be too.
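
The Sleeping Guards smell can be sketched in a few lines of Python, assuming it refers to log statements that do expensive work without first checking whether the log level is enabled. The function `expensive_summary` and the logger name `guard_demo` are hypothetical stand-ins used only for illustration.

```python
import logging

calls = {"count": 0}


def expensive_summary():
    """Stand-in for costly work done only to build a log message."""
    calls["count"] += 1
    return "summary of a large data structure"


logger = logging.getLogger("guard_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)             # DEBUG messages are disabled
logger.addHandler(logging.NullHandler())
logger.propagate = False

# Unguarded: the argument is computed before the logger discards the message.
logger.debug("state: %s", expensive_summary())
unguarded_calls = calls["count"]

# Guarded: the cheap level check runs first, so the costly call is skipped.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state: %s", expensive_summary())
guarded_calls = calls["count"] - unguarded_calls

print(unguarded_calls, guarded_calls)  # prints: 1 0
```

The unguarded statement pays for `expensive_summary()` even though nothing is written. The guarded version skips that work whenever DEBUG is switched off.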

2. Why Does This Happen? (The Causes)

The paper also looked at why these smells exist. It's usually not because developers are lazy, but because of:

  • No Rulebook: Everyone has their own way of writing logs.
  • Experience Gap: New developers might not know what to log.
  • Too Many Tools: Using different "notebooks" (libraries) that don't talk to each other.
  • Neglect: Updating the app but forgetting to update the logs.

3. What Happens If We Ignore Them? (The Consequences)

If you ignore these smells, the "diary" becomes useless, leading to:

  • Leaking Secrets: Accidentally writing passwords or private user data into the log file.
  • Time Travel Confusion: The log says Event B happened before Event A, but in reality, it was the other way around.
  • Slow Motion: The system slows down because it's busy writing useless logs.
  • Ghost Bugs: The act of logging something actually changes how the software behaves, creating new bugs.
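
As one illustration of guarding against the "Leaking Secrets" consequence, the sketch below masks anything that looks like a password or token before it reaches the log. This is a hypothetical safeguard written for this summary, not a tool described in the paper. The pattern, the `RedactSecrets` class, and the logger name `auth_demo` are assumptions, and a simple pattern like this will not catch every kind of secret.

```python
import io
import logging
import re

# Hypothetical pattern: key=value pairs whose key looks sensitive.
SECRET_PATTERN = re.compile(r"(password|token)=\S+", re.IGNORECASE)


class RedactSecrets(logging.Filter):
    """Mask sensitive values in a record before a handler writes it."""

    def filter(self, record):
        message = record.getMessage()  # message with its arguments filled in
        record.msg = SECRET_PATTERN.sub(r"\1=[REDACTED]", message)
        record.args = ()               # arguments are already merged into msg
        return True                    # keep the record, now sanitized


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
handler.addFilter(RedactSecrets())

logger = logging.getLogger("auth_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.handlers.clear()
logger.propagate = False
logger.addHandler(handler)

logger.info("login attempt user=%s password=%s", "alice", "hunter2")
redacted = buffer.getvalue()
print(redacted, end="")
# INFO login attempt user=alice password=[REDACTED]
```

A filter like this is only a safety net. The more reliable fix is to avoid passing secrets to the logger in the first place.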

4. The Toolkit (The Solutions)

The researchers checked if there are any tools to fix these smells.

  • Good News: There are about 16 tools that can help fix things like "False Alarms" (wrong log levels) or "Missing Pages" (missing logs).
  • Bad News: There are almost no tools to fix the "Messy Handwriting" (Format Turmoil) or the "Lazy Bouncer" (Sleeping Guards). These are areas where future researchers need to invent new tools.

The Big Takeaway

Writing good logs is like writing a good mystery novel: it needs a clear plot, consistent characters, and no missing pages.

This paper gives developers a checklist (the taxonomy) to spot the bad habits before they cause a disaster. It tells us: "Hey, your logs smell like a landfill; let's clean them up before the system crashes."

By understanding these "smells," developers can write better software, debug faster, and sleep better at night knowing their digital diaries are actually useful.