Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective

This paper applies a probabilistic machine learning framework to analyze Collatz stopping times up to $10^7$, demonstrating that a Bayesian hierarchical Negative Binomial regression outperforms a mechanistic odd-block generator in predictive likelihood while revealing that low-order modular structure significantly drives the observed heterogeneity.

Nicolò Bonacorsi, Matteo Bordoni

Published 2026-03-06

Here is an explanation of the paper in everyday language, with a few creative metaphors along the way.

The Big Picture: The Collatz Game

Imagine a simple game played with numbers. You pick a number, and you follow two rules:

  1. If it's even, cut it in half.
  2. If it's odd, triple it and add one.

You keep doing this until you reach the number 1. The "stopping time" is simply how many steps it takes to get there.
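The two rules above take only a few lines of code. Here is a minimal sketch (the function name is ours, not the paper's):

```python
def collatz_stopping_time(n: int) -> int:
    """Count how many steps it takes n to reach 1 under the Collatz rules."""
    steps = 0
    while n != 1:
        # Even: halve it. Odd: triple it and add one.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_stopping_time(27))  # → 111: even a small start can take a long ride
```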

The big mystery (the Collatz Conjecture) is whether every number eventually reaches 1. Nobody has proven it yet. But this paper doesn't try to prove the math; instead, it asks: "If we treat these numbers like a random crowd, what does the 'time to finish' look like, and can we predict it?"

The authors looked at the first 10 million numbers and tried to build two different "crystal balls" to predict how long the game lasts for any given number.


The Problem: It's Messy and Unpredictable

If you plot the stopping times for 10 million numbers, it looks like a chaotic mess.

  • The "Long Tail": Most numbers finish quickly, but a few take forever. It's like a race where most people finish in 10 minutes, but a few run for 10 hours.
  • The "Stripes": When you look closely, the numbers aren't random. They form invisible "stripes" or bands. Some numbers always take longer than others just because of their specific "remainder" when divided by 8 (like how some days of the week are busier than others).

The authors wanted to build models to explain this mess.
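You can see the "stripes" directly by averaging stopping times within each residue class mod 8. A small-scale sketch (the paper goes up to $10^7$; we use a much smaller range so it runs in seconds):

```python
from collections import defaultdict

def collatz_stopping_time(n):
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Average stopping time per residue class n mod 8 over a small range.
totals, counts = defaultdict(float), defaultdict(int)
for n in range(1, 50_000):
    totals[n % 8] += collatz_stopping_time(n)
    counts[n % 8] += 1

for r in range(8):
    print(f"n = {r} (mod 8): mean stopping time = {totals[r] / counts[r]:.1f}")
```

The eight averages come out noticeably different, which is exactly the band structure the paper's models try to capture.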


Model 1: The "Statistical Weather Forecaster" (Bayesian Regression)

The Analogy: Imagine you are trying to predict how long a commute will take. You know that traffic is usually worse at 5 PM (time of day) and worse on rainy days (weather). You don't need to know the physics of every car engine; you just need the patterns.

How it works:
The authors built a Bayesian Negative Binomial regression: a model for count data whose variance is allowed to exceed its mean, which is exactly what you need for messy, over-dispersed data like this.

  • The Inputs: They fed the model two simple clues:
    1. Log(n): How big the number is (bigger numbers generally take longer, but not in a straight line).
    2. n mod 8: The remainder when you divide the number by 8. This captures those "stripes" we saw earlier.
  • The Magic: The model doesn't just give one answer. It gives a range of probabilities. It says, "For this number, there's a 90% chance the game lasts between 100 and 200 steps."
  • The Result: This model was the champion. It predicted the stopping times better than anything else. It admitted, "I'm not perfect, but I'm very good at guessing the average and the uncertainty."
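In symbols, the model says the expected number of steps is $\log \mu(n) = \beta_0 + \beta_1 \log n + \alpha_{n \bmod 8}$, with the actual count drawn from a Negative Binomial around that mean. A pure-Python sketch of that likelihood follows; the parameter values are made up for illustration, since the paper infers them with Bayesian methods:

```python
import math

def nb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Negative Binomial log-probability with mean mu and dispersion alpha
    (variance = mu + alpha * mu**2, so it can be wider than a Poisson)."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def predicted_mean(n: int, beta0: float, beta1: float, residue_effects: list) -> float:
    """Model mean: log(mu) = beta0 + beta1 * log(n) + effect of (n mod 8)."""
    return math.exp(beta0 + beta1 * math.log(n) + residue_effects[n % 8])

# Made-up illustrative parameters (the paper learns these from the data).
beta0, beta1 = 2.0, 0.55
residue_effects = [0.0, 0.1, -0.05, 0.08, -0.02, 0.12, -0.04, 0.15]

mu = predicted_mean(27, beta0, beta1, residue_effects)
print(f"predicted mean steps for n=27: {mu:.1f}")
print(f"log-probability of observing 111 steps: {nb_log_pmf(111, mu, alpha=0.1):.2f}")
```

Because the model outputs a full distribution rather than a point guess, "a 90% chance of 100 to 200 steps" falls out naturally from the Negative Binomial's quantiles.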

Model 2: The "Mechanical Toy" (Odd-Block Generator)

The Analogy: Imagine a Rube Goldberg machine. Instead of guessing the outcome, you try to rebuild the machine itself. You know the gears turn, the balls drop, and the levers flip. You want to simulate the machine step-by-step to see where the ball lands.

How it works:
The Collatz game has a hidden rhythm. When you hit an odd number, you do 3n + 1, which makes it even. Then you divide by 2 repeatedly until it's odd again.

  • The authors realized this is like a block of steps. You jump from one odd number to the next, and the "distance" of the jump depends on how many times you have to divide by 2.
  • The Twist: Instead of calculating the exact math for every step (which is slow), they turned it into a dice game. They said, "Let's pretend the length of these jumps is random, but follows a specific pattern."
  • The Refinement: At first, they used the same simple rule for every jump: flip a fair coin until it lands tails, and the jump length is the number of flips (a geometric distribution). It was okay, but not great. Then, they realized the "stripes" mattered. So, they made the coin conditional. If the number is a certain "type" (based on the mod 8 rule), they used a differently weighted coin.
  • The Result: This model is mechanically faithful. It explains why the game behaves the way it does. However, as a pure predictor, it was less accurate than the statistical "Weather Forecaster." It was like a detailed simulation of a car engine that was slightly less accurate at predicting traffic than a simple app that just looks at the time of day.
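Here is a minimal Monte Carlo sketch of the unconditional version of this idea (our own simplified rendering, not the authors' exact generator): instead of computing the halvings after each $3n+1$ step, draw them geometrically and track the number's logarithm until it drifts down to zero.

```python
import math
import random

def simulated_stopping_time(n: int, rng: random.Random) -> int:
    """Odd-block sketch: after each 3n+1 step, the number of halvings k is
    drawn geometrically (P(k) = 2**-k, k >= 1) instead of computed exactly.
    We track log(n) and stop when it drifts down to zero."""
    steps = 0
    while n % 2 == 0:               # clear the initial even part exactly
        n //= 2
        steps += 1
    log_n = math.log(n)
    while log_n > 0:
        k = 1
        while rng.random() < 0.5:   # each additional halving with prob 1/2
            k += 1
        steps += 1 + k              # one 3n+1 step plus k halvings
        log_n += math.log(3) - k * math.log(2)  # average drift: log(3/4) < 0
    return steps

rng = random.Random(0)
sims = [simulated_stopping_time(10**6 + 1, rng) for _ in range(1000)]
print(f"simulated mean stopping time near 10^6: {sum(sims) / len(sims):.0f}")
```

The refined model in the paper makes those geometric weights depend on the residue class mod 8, which is where the "weighted coin" comes in.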

The Showdown: Which is Better?

The authors put the two models head-to-head on a test set of numbers they hadn't seen before.

  1. The Statistical Model (Weather Forecaster): Won easily. It assigned the highest probability to the actual stopping times on the held-out numbers, i.e., the best predictive likelihood. It's the best tool if you just want to know "How long will this take?"
  2. The Mechanical Model (Rube Goldberg Machine): Lost on pure accuracy, but won on understanding. It showed us that the "stripes" (the mod 8 rule) are real and important. It proved that the randomness isn't just noise; it's structured by the number's shape.

The Takeaway

The paper teaches us two things:

  1. Sometimes, simple statistics beat complex simulations. If you just want to predict the outcome, a smart statistical model that learns from the data is often better than trying to simulate every single rule of the universe.
  2. Structure hides in the noise. Even in a chaotic system like the Collatz game, there are hidden patterns (like the mod 8 stripes) that drive the behavior. By combining the statistical power with the mechanical understanding, we get the full picture.

In short: The authors didn't solve the Collatz Conjecture, but they built a very good map of the territory, showing us where the mountains are and how the rivers flow, using both a satellite view (statistics) and a ground-level tour (mechanics).