Random irregular histograms

This paper introduces a fully automatic, fully Bayesian method for constructing irregular histograms that simultaneously selects the optimal number and location of bins, demonstrating consistency and minimax convergence rates while performing competitively with existing procedures in simulations.

Oskar Høgberg Simensen, Dennis Christensen, Nils Lid Hjort

Published 2026-03-06

Imagine you are a cartographer trying to draw a map of a mysterious, foggy island. You have a bunch of GPS pings (data points) from explorers who walked around the island, but you don't know the terrain. Your goal is to create a map that shows where the mountains are (peaks/modes), where the valleys are, and how steep the slopes are.

In statistics, this is called density estimation. The oldest and most famous tool for this is the histogram.

The Old Way: The "Cookie Cutter" Approach

Traditionally, making a histogram is like using a cookie cutter. You decide, "I'm going to slice the island into 10 equal-sized strips." You count how many explorers fell into each strip and draw a bar.

The Problem: The island isn't flat. Some areas are flat plains, while others have jagged, narrow peaks.

  • If you make your strips too wide, you smooth out the jagged peaks. You might miss a tiny mountain entirely because it got buried in a wide, flat strip.
  • If you make your strips too narrow, your map looks like a jagged, noisy mess. You might think a single rock is a mountain just because you got unlucky with the data.
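
To make the trade-off concrete, here is a minimal sketch of the cookie-cutter approach in Python. The sample (a flat plain plus one narrow spike near 0.5) and the function name `regular_histogram` are made up for illustration and do not come from the paper.

```python
import random

def regular_histogram(data, k, lo=0.0, hi=1.0):
    """Return the bar heights (densities) of a k-bin equal-width histogram on [lo, hi]."""
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # Clamp the index so that x == hi lands in the last bin.
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    n = len(data)
    # Dividing by n * width makes the total area of the bars equal to 1.
    return [c / (n * width) for c in counts]

random.seed(0)
# Hypothetical island: 900 pings spread evenly, 100 pings on a narrow peak.
data = [random.random() for _ in range(900)]
data += [random.gauss(0.5, 0.005) for _ in range(100)]
data = [x for x in data if 0.0 <= x <= 1.0]

for k in (4, 20, 200):
    heights = regular_histogram(data, k)
    print(k, "strips: tallest bar", round(max(heights), 2),
          "shortest bar", round(min(heights), 2))
```

With few strips the spike is averaged into its wide neighbours, and with many strips the spike stands out but the flat part turns uneven. No single value of `k` handles both regions well.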

The big question for statisticians has always been: How do I choose the perfect width for my strips? Most methods try to find a "Goldilocks" width that works for the whole map, but this often fails when the landscape changes.

The New Idea: The "Smart, Shapeshifting" Map

This paper proposes a new method called the Random Irregular Histogram. Instead of using cookie cutters of equal size, imagine you have a magical, shapeshifting ruler.

  • Where the terrain is flat: Your ruler stretches out, making wide strips. This smooths out the noise and gives you a clear view of the plains.
  • Where the terrain is jagged (near a peak): Your ruler shrinks down, making tiny, narrow strips. This allows you to zoom in and see the exact shape of the mountain without blurring it.
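
As a rough illustration of what "different sized strips" means, the sketch below computes bar heights for hand-picked, unequal bin edges. The data, the edges, and the name `irregular_histogram` are assumptions for this example; the paper chooses the edges automatically.

```python
from bisect import bisect_right

def irregular_histogram(data, edges):
    """Bar heights for bins given by sorted edges.

    Assumes every data point lies between edges[0] and edges[-1].
    """
    n = len(data)
    counts = [0] * (len(edges) - 1)
    for x in data:
        # bisect_right finds the bin; the min() keeps x == edges[-1] in the last bin.
        j = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[j] += 1
    # Height = share of the data in the bin divided by the bin's width,
    # so a narrow bin with many points becomes a tall bar.
    return [c / (n * (edges[j + 1] - edges[j])) for j, c in enumerate(counts)]

# Hypothetical pings: sparse on both sides, crowded around 0.5.
data = [0.05, 0.2, 0.35, 0.49, 0.5, 0.5, 0.51, 0.51, 0.7, 0.9]
edges = [0.0, 0.45, 0.55, 1.0]   # wide strip, narrow strip, wide strip
heights = irregular_histogram(data, edges)
print([round(h, 2) for h in heights])
```

Because each height is divided by its own strip width, the bars still cover a total area of 1, just as in an ordinary histogram.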

The authors call this "irregular" because the strips are different sizes. They use a Bayesian approach, which is like having a very smart, cautious guide who says: "Based on the data we have, here is the most likely map. If the data is noisy, I'll smooth it out. If the data shows a sharp spike, I'll zoom in."

How It Works (The Magic Trick)

The authors didn't just guess where to put the lines. They used a mathematical "search engine" to find the best possible map.

  1. The Search: They considered a vast number of possible ways to slice the data.
  2. The Score: They gave every map a score based on two things:
    • Fit: Does the map match the GPS pings?
    • Simplicity: Is the map too complicated? (They don't want a map with a million tiny strips just to fit one weird data point).
  3. The Winner: They picked the map with the highest score.

Because they used an efficient computer algorithm (dynamic programming), they could find the highest-scoring map quickly, without checking every candidate one by one, even with large amounts of data.
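
The search-score-winner recipe can be sketched with a small dynamic program. This is not the authors' scoring rule: the score below is a generic "fit minus penalty per strip" stand-in, and the grid size `m`, the `penalty` value, and the name `best_partition` are all assumptions made for illustration.

```python
import math
import random

def best_partition(data, m=50, penalty=3.0):
    """Choose bin edges on a grid of m cells over [0, 1] by dynamic programming.

    Score of a map = histogram log-likelihood (fit) - penalty * number of bins
    (simplicity). The penalty is an illustrative stand-in, not the paper's score.
    Assumes all data lie in [0, 1].
    """
    n = len(data)
    # cum[i] = number of points falling in the first i grid cells.
    cum = [0] * (m + 1)
    for x in data:
        cum[min(int(x * m), m - 1) + 1] += 1
    for i in range(m):
        cum[i + 1] += cum[i]

    def gain(h, i):
        # Fit contribution of one bin covering grid cells h, ..., i - 1.
        c = cum[i] - cum[h]
        if c == 0:
            return 0.0
        return c * math.log(c / (n * (i - h) / m))

    best = [-math.inf] * (m + 1)  # best[i]: top score using the first i cells
    prev = [0] * (m + 1)          # prev[i]: where the last bin of that map starts
    best[0] = 0.0
    for i in range(1, m + 1):
        for h in range(i):
            s = best[h] + gain(h, i) - penalty
            if s > best[i]:
                best[i], prev[i] = s, h

    # Walk backwards through prev to recover the winning edges.
    cuts, i = [m], m
    while i > 0:
        i = prev[i]
        cuts.append(i)
    return [c / m for c in reversed(cuts)]

random.seed(0)
# Hypothetical sample: a flat plain plus a narrow peak near 0.5.
data = [random.random() for _ in range(900)]
data += [random.gauss(0.5, 0.005) for _ in range(100)]
data = [x for x in data if 0.0 <= x <= 1.0]

edges = best_partition(data)
print(edges)
```

The double loop reuses the best score for every shorter prefix, so the work grows with the square of the grid size rather than with the number of possible maps, which doubles with every extra grid line.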

Why This Matters: Finding the "Hidden Mountains"

The paper shows that this new method is a superhero at finding modes (the peaks of the distribution).

  • The Old Way: If you have a mountain range with one huge peak and one tiny, sharp peak nearby, the old "equal strip" method usually misses the tiny peak. It smooths it over because it's trying to be fair to the whole map.
  • The New Way: It zooms in on the tiny peak, making a very narrow strip just for that spot, so you can see it clearly.
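
One simple way to count the "mountains" on a finished map is to count bars that are taller than both neighbours. The helper below is a sketch of that convention; the name `count_modes` is hypothetical, and this summary does not say how the paper itself counts modes.

```python
def count_modes(heights):
    """Count local maxima in a list of bar heights (a flat-topped peak counts once)."""
    # Collapse runs of equal heights so a plateau is treated as a single bar.
    distinct = [h for i, h in enumerate(heights) if i == 0 or h != heights[i - 1]]
    modes = 0
    for i, h in enumerate(distinct):
        # Treat the area outside the map as infinitely low ground.
        left = distinct[i - 1] if i > 0 else float("-inf")
        right = distinct[i + 1] if i + 1 < len(distinct) else float("-inf")
        if h > left and h > right:
            modes += 1
    return modes

# Hypothetical bar heights: one big peak and one small peak.
print(count_modes([0.2, 1.5, 0.4, 0.9, 0.3]))
```

A map whose strips are too wide can merge the small peak into its neighbours and report one mode where there are two.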

The Analogy:
Imagine listening to a song.

  • Regular Histograms are like listening to the song through a low-quality speaker that averages the sound. You hear the bass and the melody, but you miss the tiny, high-pitched whistle in the background.
  • This New Method is like a high-fidelity sound engineer who knows exactly when to turn up the volume on the bass and when to isolate the whistle. It adapts to each part of the music.

The Results

The authors tested their method against a range of existing histogram procedures using:

  1. Fake Data: They created 16 different "islands" (some with one peak, some with ten, some with weird shapes). Their method was usually the best at finding the peaks and didn't mess up the overall shape.
  2. Real Data:
    • Old Faithful Geyser: The time between eruptions has two distinct patterns (short waits and long waits). Their map showed these two patterns clearly, while the old method made it look messy.
    • Gene Research: In a study about breast cancer genes, they had to find how many genes were "active." Their map found a sharp spike of active genes right at the start, which the old method smoothed over and missed.

The Bottom Line

This paper gives us a new, automatic tool for drawing histograms. It doesn't require you to guess the settings. It automatically figures out where to be smooth and where to be sharp.

  • For the Statistician: It's a fast, fully automatic method with proven guarantees (consistency and minimax convergence rates) that performs competitively with existing procedures in simulations.
  • For You: It's like having a map that automatically zooms in on the interesting parts of the world and zooms out on the boring parts, so you never miss a hidden mountain again.