Learning Shortest Paths with Generative Flow Networks

This paper proposes a novel framework that leverages Generative Flow Networks with flow regularization to solve shortest path problems in non-acyclic graphs, demonstrating competitive performance in permutation environments and Rubik's Cube solving with reduced test-time search budgets.

Nikita Morozov, Ian Maksimov, Daniil Tiapkin, Sergey Samsonov

Published 2026-03-03

Imagine you are standing in a massive, dark, and confusing maze. Your goal is to find the quickest way out. Usually, to solve this, you might try to memorize the entire map, or you might use a flashlight to check every single path one by one until you find the exit. This is how traditional computer algorithms work: they are like careful, slow explorers checking every turn.

But what if you could train a super-intelligent guide who doesn't just know the map, but intuitively knows the shortest route to the exit from anywhere in the maze, without ever needing to check the wrong turns?

That is exactly what this paper proposes using a new AI tool called GFlowNets (Generative Flow Networks). Here is the breakdown of their idea using simple analogies.

1. The Problem: The "Infinite Loop" Trap

In many complex puzzles (like a Rubik's Cube), you can make a move, then undo it, then make it again. This creates a "cycle."

  • Older AI methods often assume the puzzle looks like a one-way map (a tree or a directed acyclic graph), where you can never return to a state you have already visited. They struggle when you can go in circles.
  • The New Idea: The authors realized that if you teach an AI to be "lazy" about how long its journey takes, it will naturally learn to take the shortest path.

2. The Core Insight: The "Lazy Traveler"

Imagine you are training a robot to walk from the start of a maze to the finish.

  • Standard Training: You tell the robot, "Go to the finish!" It might wander around, take a detour, go in a circle, and eventually get there. It learns a way, but maybe not the best way.
  • The Paper's Trick: The authors add a "penalty" for walking too far. They tell the robot: "You get a reward for reaching the goal, but you lose points for every single step you take."

The Magic:
If the robot wants to maximize its score (get the reward while losing the fewest points), it is forced to stop wandering. It realizes that the only way to win is to never take a step that doesn't bring it closer to the goal.

  • If it takes a wrong turn, it wastes energy.
  • If it loops back, it wastes energy.
  • Result: The robot's brain (the AI policy) eventually learns to assign zero probability to any path that isn't the shortest one. It literally stops "thinking" about the long, winding roads.
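The "lazy traveler" effect can be seen in a toy sketch. This is not the paper's GFlowNet training procedure; it is a hypothetical five-node maze where every path from start to goal is weighted by a per-step penalty, mimicking a terminal reward that shrinks with every extra step. As the penalty grows, the probability mass concentrates on the shortest route.

```python
import math

# Hypothetical toy maze as an adjacency list (names are made up for illustration).
# Short route: S -> A -> G (2 steps). Long route: S -> B -> C -> G (3 steps).
graph = {
    "S": ["A", "B"],
    "A": ["G"],
    "B": ["C"],
    "C": ["G"],
    "G": [],
}

def all_paths(node, goal, path=None):
    """Enumerate all simple paths from node to goal (skipping cycles)."""
    path = (path or []) + [node]
    if node == goal:
        yield path
        return
    for nxt in graph[node]:
        if nxt not in path:
            yield from all_paths(nxt, goal, path)

def path_probs(step_penalty):
    """Weight each path by exp(-penalty * steps), then normalize.

    This mimics a terminal reward that loses points for every step taken."""
    paths = list(all_paths("S", "G"))
    weights = [math.exp(-step_penalty * (len(p) - 1)) for p in paths]
    total = sum(weights)
    return {tuple(p): w / total for p, w in zip(paths, weights)}

for lam in [0.1, 1.0, 5.0]:
    probs = path_probs(lam)
    best = max(probs, key=probs.get)
    print(lam, best, round(probs[best], 3))
```

With a tiny penalty the two routes get nearly equal probability; with a large penalty the shortest route takes almost all of it, which is the sense in which the policy "stops thinking" about the detours.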

3. The "Backwards" Approach

Here is the cleverest part of their method.
Usually, when you solve a maze, you start at the beginning and try to find the exit.

  • The Paper's Strategy: They teach the AI to work backwards.
    • Imagine the "Exit" is a special "Sink" (a black hole) that sucks everything in.
    • The AI learns to start at the Exit and walk backwards to the Start.
    • Because the AI is penalized for taking too many steps, it learns the most direct route backwards.
    • Once it learns this, you just flip the map over. Now, when you are at the Start, the AI knows exactly which way to go to reach the Exit in the fewest steps.
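The backwards idea can be illustrated with a minimal sketch (again, not the paper's learned backward policy): run a breadth-first search outward from the goal over a small hypothetical cyclic state graph to get every state's distance-to-goal, then "flip the map" with a forward greedy policy that always steps to a neighbor closer to the goal.

```python
from collections import deque

# Hypothetical cyclic state graph (every move is reversible, like a puzzle).
edges = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
}
GOAL = 4

def dist_to_goal():
    """BFS outward from the goal: 'walking backwards from the exit'."""
    dist = {GOAL: 0}
    queue = deque([GOAL])
    while queue:
        s = queue.popleft()
        for nxt in edges[s]:
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return dist

def greedy_path(start):
    """Flip the map: from any state, step to the neighbor closest to the goal."""
    dist = dist_to_goal()
    path = [start]
    while path[-1] != GOAL:
        path.append(min(edges[path[-1]], key=dist.get))
    return path

print(greedy_path(0))  # prints [0, 1, 3, 4]
```

In the paper the exact distances are not computed by brute force; the point of the sketch is only the direction of learning: knowledge gathered outward from the goal turns into a forward policy that never wastes a step.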

4. Real-World Tests: Rubik's Cubes and Swapping

The authors tested this on two very hard problems:

  1. The Swap Puzzle: Imagine a line of people in random order. You can only swap neighbors. How do you get them in order? The AI learned to sort them in close to the fewest possible swaps.
  2. The Rubik's Cube: This is the ultimate maze. There are billions of possible positions.
    • The Competition: Other AI methods (like DeepCubeA) use a "search" strategy. They look ahead at many possibilities, like a chess player calculating 10 moves ahead. This takes a lot of computer power and time.
    • The GFlowNet Result: Their AI learned the "intuition" of the shortest path.
    • The Win: When solving a Rubik's Cube, their method found solutions just as short as the best existing methods, but with a much smaller test-time search budget. It was like comparing a hiker who checks every trail (other AIs) to a hiker who has a GPS that only shows the direct path (this new AI).
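For the swap puzzle there is a classical baseline worth knowing: the minimum number of adjacent swaps needed to sort a sequence equals its inversion count, and a bubble-sort-style policy achieves exactly that number, since each swap removes exactly one inversion. A small sketch (this is the textbook result, not the paper's learned policy):

```python
def inversions(seq):
    """Minimum number of adjacent swaps to sort = number of inversions."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )

def sort_by_swaps(seq):
    """Greedy 'shortest path' policy: swap any adjacent out-of-order pair.

    Each swap removes exactly one inversion, so the route is optimal."""
    seq = list(seq)
    moves = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(seq) - 1):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                moves += 1
                swapped = True
    return seq, moves

line = [3, 1, 4, 2]
print(inversions(line))     # prints 3 (inversions: (3,1), (3,2), (4,2))
print(sort_by_swaps(line))  # prints ([1, 2, 3, 4], 3)
```

This gives a concrete yardstick for what "perfect strategy" means in the swap environment: a learned policy is optimal when its trajectory lengths match the inversion counts.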

5. Why This Matters

This paper changes how we think about finding the shortest path.

  • Before: We thought we needed complex math or massive search trees to find the best route.
  • Now: We can just train a probabilistic model to be "efficient" (minimize steps), and the math guarantees it will find the shortest path automatically.

In a nutshell:
The authors discovered that if you teach an AI to hate wasting time, it becomes a master of finding the shortest path. They turned a complex navigation problem into a simple lesson on efficiency, proving that sometimes the best way to get somewhere is to stop taking detours.
