When the Next Step Is Not One Step: Distribution-Aware Execution Modeling for Concurrent Go Programs

This paper introduces a distribution-aware approach to modeling the execution of concurrent Go programs. It treats scheduler nondeterminism as a training signal, fine-tuning a 7B model on empirical event distributions. The result is state-of-the-art accuracy and improved calibration on real-world bug predictions, along with formal guarantees for detecting specific goroutine leaks.

Original authors: Kaviru Hapuarachchi

Published 2026-06-17

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a robot to predict the next move in a game of chess. If the game is standard chess, the rules are fixed: the same move from the same position always produces the same board. The robot just needs to learn the pattern.

But now, imagine a game of chess played in a chaotic, noisy room where three different people are all trying to move the pieces at the same time, and a random wind blows the board around. Sometimes, if you move a pawn, the wind might knock it over. Sometimes, a player might grab a piece before you can. Sometimes, the opponent might decide to move a different piece entirely.

This is the problem with computer programs that run multiple tasks at once (concurrent programs).

The paper tackles exactly this chaos. Here is the breakdown in simple terms:

The Problem: "One Answer" vs. "Many Possible Answers"

In traditional computer science, when a program runs, we usually assume it follows a straight line. If you give it the same input, it gives the same output.

  • The Old Way: Researchers trained AI models to predict the one next step a program would take. They treated the program like a straight line.
  • The Reality: In concurrent programs (like those written in the Go language), the "scheduler" (the part of the runtime that decides which task runs when) is like a chaotic referee. If you run the same program twice, it might do A then B, or B then A. Both orderings are valid.
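To make the "A then B, or B then A" point concrete, here is a minimal Go sketch. It is only an illustration, not code from the paper: the function name `observeOrder` and the labels "A" and "B" are made up for this example. Two goroutines race to record their label, and the scheduler decides who gets there first.

```go
package main

import (
	"fmt"
	"sync"
)

// observeOrder (a hypothetical helper for this illustration) starts two
// goroutines that each append their label to a shared string, then returns
// the order in which they managed to do so.
func observeOrder() string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		order string
	)
	for _, label := range []string{"A", "B"} {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			mu.Lock() // the mutex prevents a data race; it does not fix the order
			order += l
			mu.Unlock()
		}(label)
	}
	wg.Wait()
	return order
}

func main() {
	// Run the same tiny program many times and tally which ordering occurred.
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[observeOrder()]++
	}
	// The tallies depend on the machine and the scheduler, so no fixed
	// output is shown here.
	fmt.Println(counts)
}
```

Every run returns either "AB" or "BA". Neither is a bug; the program simply has more than one valid execution.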

If you train an AI to guess just one answer when there are actually three valid answers, it learns the wrong lesson. It's like forcing a weather forecaster to announce only "It will rain" when the honest forecast is "It might rain, it might snow, or it might be sunny": the forecaster picks one and hopes for the best.

The Solution: Predicting the "Weather Forecast"

The authors realized they shouldn't treat the chaos as a mistake. Instead, they treated the chaos as data.

  1. Run it many times: They took a program and ran it hundreds of times.
  2. Count the outcomes: They noticed that while the order changed, some patterns emerged. For example, "Event A" happened 60% of the time, "Event B" happened 30%, and "Event C" happened 10%.
  3. Teach the AI the distribution: Instead of teaching the AI to guess "Event A," they taught it to guess the whole forecast: "There is a 60% chance of A, 30% of B, and 10% of C."
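The counting step can be sketched in a few lines of Go. This is an assumption-laden illustration rather than the authors' pipeline: `EmpiricalDistribution` is a hypothetical name, and the ten observations below are invented to reproduce the 60/30/10 example.

```go
package main

import "fmt"

// EmpiricalDistribution (hypothetical helper) takes the next event observed
// in each run and converts the raw counts into relative frequencies.
func EmpiricalDistribution(observed []string) map[string]float64 {
	dist := make(map[string]float64)
	if len(observed) == 0 {
		return dist
	}
	for _, ev := range observed {
		dist[ev]++ // count how often each event came next
	}
	for ev := range dist {
		dist[ev] /= float64(len(observed)) // normalize counts to probabilities
	}
	return dist
}

func main() {
	// Invented data: the event that came next in each of 10 runs.
	runs := []string{"A", "A", "A", "A", "A", "A", "B", "B", "B", "C"}
	fmt.Println(EmpiricalDistribution(runs)) // map[A:0.6 B:0.3 C:0.1]
}
```

The resulting map is the "forecast" the model is asked to reproduce, instead of the single label "A".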

They used a training objective based on KL divergence (a standard measure of how far one probability distribution is from another) to train a 7-billion-parameter AI model to match these real-world percentages rather than just guessing a single winner.
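For readers who want to see what a KL-style comparison looks like, here is a small Go sketch. It makes several assumptions the summary does not confirm: the direction KL(empirical || predicted), natural logarithms, and a tiny floor `eps` to avoid dividing by zero. The name `KLDivergence` and the example numbers are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// KLDivergence (hypothetical helper) computes
// KL(p || q) = sum over events x of p(x) * log(p(x) / q(x)), in nats.
// Here p plays the role of the measured distribution and q the prediction.
func KLDivergence(p, q map[string]float64) float64 {
	const eps = 1e-9 // assumed floor so a missing prediction does not divide by zero
	var kl float64
	for ev, px := range p {
		if px == 0 {
			continue // events that never happened contribute nothing
		}
		qx := q[ev]
		if qx < eps {
			qx = eps
		}
		kl += px * math.Log(px/qx)
	}
	return kl
}

func main() {
	empirical := map[string]float64{"A": 0.6, "B": 0.3, "C": 0.1}
	singleGuess := map[string]float64{"A": 1.0} // "always A", the old-style target
	forecast := map[string]float64{"A": 0.55, "B": 0.35, "C": 0.10}

	fmt.Println("single guess:", KLDivergence(empirical, singleGuess))
	fmt.Println("forecast:    ", KLDivergence(empirical, forecast))
}
```

A prediction that puts all its weight on one event is penalized heavily for the outcomes it ruled out, while a forecast close to the measured percentages scores near zero.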

The Results: Did it Work?

They tested this on real-world, messy code from famous systems like Kubernetes and Google's gRPC.

  • The AI vs. The Experts: The fine-tuned AI (trained on less than 1,000 examples) got the next step right 36.2% of the time.
  • The Competition: This beat a very powerful, pre-trained AI (Gemini 3.5 Flash) that had not been trained on this specific type of problem and got only 34.8% right.
  • The "Calibration" Win: Even more importantly, the new AI was better at knowing when it was unsure. If the situation was chaotic, the AI said, "I'm not sure, it could be anything." If the situation was predictable, it said, "I'm pretty sure." The old way of training made the AI confidently wrong more often.
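One common way to put a number on "knowing when it is unsure" is expected calibration error (ECE), which compares how confident a model says it is with how often it is actually right. The summary does not say which calibration metric the paper uses, so treat the Go sketch below as a generic illustration; the `Prediction` type, the function name, and the sample data are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Prediction (hypothetical type) records the probability the model gave its
// top guess and whether that guess turned out to be right.
type Prediction struct {
	Confidence float64
	Correct    bool
}

// ExpectedCalibrationError groups predictions into equal-width confidence
// bins and returns the weighted average gap between confidence and accuracy.
func ExpectedCalibrationError(preds []Prediction, bins int) float64 {
	if len(preds) == 0 || bins <= 0 {
		return 0
	}
	sumConf := make([]float64, bins)
	sumHit := make([]float64, bins)
	count := make([]int, bins)
	for _, p := range preds {
		b := int(p.Confidence * float64(bins))
		if b >= bins {
			b = bins - 1 // confidence of exactly 1.0 goes in the last bin
		}
		if b < 0 {
			b = 0
		}
		sumConf[b] += p.Confidence
		if p.Correct {
			sumHit[b]++
		}
		count[b]++
	}
	var ece float64
	for b := 0; b < bins; b++ {
		if count[b] == 0 {
			continue
		}
		n := float64(count[b])
		gap := math.Abs(sumConf[b]/n - sumHit[b]/n)
		ece += n / float64(len(preds)) * gap
	}
	return ece
}

func main() {
	// Invented data: a model that is often confident and wrong.
	overconfident := []Prediction{
		{0.95, false}, {0.95, true}, {0.90, false}, {0.90, false},
	}
	fmt.Println(ExpectedCalibrationError(overconfident, 10))
}
```

A score near zero means stated confidence and real accuracy line up; a large score is the "confidently wrong" behavior described above.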

The Limits: Where the Ceiling Is

The paper is very honest about what the AI can't do yet:

  • The Accuracy Ceiling: The AI tops out around 35–36% accuracy. It can't get much higher because some events are so rare (like a specific type of glitch) that the AI never sees them enough to learn them.
  • The "One Step" Problem: The AI is great at predicting the very next move. But if you ask it to predict the next 10 moves in a row, it falls apart after about one step. It's like a person who can tell you what happens in the next second of a movie, but if you ask them to predict the whole plot, they start making things up.

The "Leak" Discovery

The authors also found a specific "signature" for a type of computer bug called a "goroutine leak" (where a task gets stuck and never finishes).

  • They proved mathematically that if a task gets stuck in a specific type of waiting loop, the chance of it ever "waking up" is zero.
  • This isn't something the AI learned by guessing; it follows from the rules of the Go language itself. The AI correctly learned that "Wake Up" is impossible in this specific scenario, which is a good sign that it is picking up the logic, not just memorizing numbers.
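For a feel of what a goroutine leak looks like, here is one common pattern in Go: a task waits to receive on a channel that nobody will ever send on or close. This is an illustration only and may not be the exact signature the paper formalizes; the helper names `leak` and `goroutinesAdded` are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// leak (hypothetical helper) starts a goroutine that waits on a channel
// which is never sent to and never closed. Once leak returns, no other code
// holds the channel, so the receive can never complete.
func leak() {
	ch := make(chan int)
	go func() {
		<-ch                   // blocks forever
		fmt.Println("woke up") // never reached
	}()
}

// goroutinesAdded reports how many more goroutines exist after calling f.
func goroutinesAdded(f func()) int {
	before := runtime.NumGoroutine()
	f()
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines left behind:", goroutinesAdded(leak))
}
```

The stuck goroutine stays in memory for the life of the program. In this pattern, "wake up" has probability zero because of how channels work, not because of scheduler luck.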

Summary

In short, the paper's message is: stop trying to force a chaotic, multi-path system into a single straight line. Instead, show the AI the whole map of possibilities. It won't be perfect, and it can't predict long chains of events yet, but it becomes much better at understanding the nature of the chaos and knowing when it's guessing.

They released their code, data, and tools so others can try to build on this "chaos-aware" approach.
