DAP: A Discrete-token Autoregressive Planner for Autonomous Driving

DAP is a compact, discrete-token autoregressive planner that jointly forecasts BEV semantics and ego trajectories with reinforcement learning fine-tuning, achieving state-of-the-art performance on autonomous driving benchmarks despite a limited 160M parameter budget.

Bowen Ye, Bin Zhang, Hang Zhao

Published 2026-03-06

Imagine you are teaching a robot to drive a car. For a long time, the best way to do this was to show the robot thousands of hours of human driving videos and say, "Copy exactly what the human did." This is called Imitation Learning.

But here's the problem: If the robot just copies the human, it doesn't really understand the road. If the human makes a tiny mistake, the robot copies it. If the human gets distracted, the robot gets distracted. It's like a student who memorizes the answers to a math test but doesn't understand the math. If the test changes slightly, the student fails.

The paper introduces DAP (Discrete-token Autoregressive Planner), a new way to teach self-driving cars that is smarter, more efficient, and more like how a human brain actually thinks.

Here is the breakdown of how DAP works, using simple analogies:

1. The "Discrete Token" Idea: Turning the World into Lego Bricks

Most self-driving AI tries to understand the world as a continuous, blurry stream of pixels (like a high-definition video). This is heavy and hard to process.

DAP's approach: It turns the entire driving scene into a set of Lego bricks (called "discrete tokens").

  • Instead of seeing a smooth curve of a road, it sees "Block A," "Block B," "Block C."
  • Instead of a smooth steering angle, it sees "Turn Left a little," "Go Straight," "Turn Right a little."

Why is this cool? Just as Large Language Models turn words into tokens to write stories, DAP turns the driving world into tokens to "write" a driving plan. This makes the math much simpler and allows the model to learn faster and scale up easily.
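To make the "Lego brick" idea concrete, here is a minimal sketch of discretizing a continuous steering angle into token IDs. The bin count, steering range, and function names are illustrative assumptions, not the paper's actual tokenizer:

```python
# Illustrative sketch: map a continuous steering angle to a discrete token.
# num_bins and max_angle are made-up values, not from the DAP paper.

def tokenize_steering(angle_deg: float, num_bins: int = 7, max_angle: float = 35.0) -> int:
    """Map a continuous steering angle to one of num_bins token IDs."""
    # Clamp to the physical steering range.
    angle = max(-max_angle, min(max_angle, angle_deg))
    # Normalize to [0, 1] and bucket into equal-width bins.
    frac = (angle + max_angle) / (2 * max_angle)
    return min(int(frac * num_bins), num_bins - 1)

def detokenize_steering(token: int, num_bins: int = 7, max_angle: float = 35.0) -> float:
    """Recover the bin-center angle for a token (a lossy inverse)."""
    width = 2 * max_angle / num_bins
    return -max_angle + (token + 0.5) * width
```

With 7 bins, straight ahead (0°) lands in the middle bin (token 3), and detokenizing returns the bin center, so a little precision is traded for a vocabulary the model can predict like words.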

2. The "Autoregressive" Part: Reading the Future One Word at a Time

There are two ways to predict the future:

  • The Old Way (Non-Autoregressive): The AI looks at the current scene and tries to spit out the entire next 5 seconds of driving in one giant burst. It's like trying to guess the ending of a movie before you've seen the middle.
  • The DAP Way (Autoregressive): The AI predicts the next split second, then uses that prediction to guess the next split second, and so on. It's like reading a book one word at a time. You read word 1, then use that to understand word 2, then word 3.

The Benefit: This creates a chain of logic. If the car predicts a pedestrian is about to step out in the next second, it can use that information to decide how to brake in the second after that. It builds a story of the future, step-by-step.
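The step-by-step chain above can be sketched as a toy autoregressive loop. Here `next_token` is a stand-in for the real transformer forward pass; its rule and the 10-token vocabulary are invented for illustration:

```python
# Toy autoregressive rollout: each predicted token is appended to the
# history and conditions the next prediction, one step at a time.

def next_token(history):
    """Placeholder for a transformer forward pass. Here: a trivial
    deterministic rule over a hypothetical 10-token vocabulary."""
    return (history[-1] + 1) % 10

def rollout(seed_token, steps):
    tokens = [seed_token]
    for _ in range(steps):
        # The model sees everything predicted so far, like reading a
        # book one word at a time.
        tokens.append(next_token(tokens))
    return tokens
```

The key property is the feedback: prediction t becomes part of the input for prediction t+1, which is exactly how the planner can react to its own forecast of a pedestrian before planning the braking step.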

3. The Secret Sauce: "World Modeling" (Predicting the Scene AND the Car)

This is the biggest innovation. Most planners only predict: "Where will my car go?"

DAP predicts two things simultaneously:

  1. Where the car will go.
  2. How the world around the car will change.

The Analogy: Imagine you are playing chess.

  • A bad player only thinks: "I will move my knight here."
  • A grandmaster thinks: "I will move my knight here, AND I predict my opponent will move their pawn there, AND then I will move my bishop."

DAP does this for driving. It predicts the future traffic, the pedestrians, and the road signs (the "World") at the same time as it plans its own moves. By forcing the AI to predict the environment, it learns why it needs to move the way it does. It's no longer just copying; it's understanding cause and effect.
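One simple way to let a single autoregressive model predict both the world and the ego car is to interleave scene tokens and action tokens into one stream. This is a hedged sketch of that idea; the actual sequence layout in DAP may differ:

```python
# Sketch: weave BEV scene tokens and ego action tokens into one sequence,
# so one autoregressive model forecasts both. Token values are made up.

def interleave(scene_tokens, action_tokens):
    """Build [ (scene, s0), (action, a0), (scene, s1), (action, a1), ... ]."""
    assert len(scene_tokens) == len(action_tokens)
    stream = []
    for s, a in zip(scene_tokens, action_tokens):
        stream.append(("scene", s))   # what the world looks like at step t
        stream.append(("action", a))  # what the ego car does at step t
    return stream
```

Because scene and action tokens share one sequence, predicting the next action is always conditioned on the freshly predicted scene, which is the "chess grandmaster" behavior the analogy describes.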

4. The "Coach" (Reinforcement Learning)

The paper describes a second training stage called SAC-BC (Soft Actor-Critic combined with a Behavior-Cloning term), which fine-tunes the imitation-trained planner with reinforcement learning.

  • Stage 1 (Imitation): The robot watches a human drive and learns the basics.
  • Stage 2 (The Coach): The robot drives in a simulator and gets a "score."
    • Did you hit a wall? Bad score.
    • Did you drive smoothly? Good score.
    • Did you stay in the lane? Good score.

Even if the human driver made a risky move, the "Coach" tells the robot, "Actually, that was dangerous. Don't do that." This teaches the robot to be safer than the human it was copying.
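The "score" the coach hands out can be pictured as a simple reward function. The terms and weights below are illustrative assumptions in the spirit of simulator-based fine-tuning, not the paper's actual reward:

```python
# Toy driving reward: punish collisions hard, and gently penalize
# jerky motion and drifting out of the lane center. All weights are
# invented for illustration.

def reward(collided: bool, jerk: float, lane_offset_m: float) -> float:
    r = 0.0
    if collided:
        r -= 10.0                      # safety: hard collision penalty
    r -= 0.5 * abs(jerk)               # comfort: abrupt accel changes
    r -= 1.0 * abs(lane_offset_m)      # lane-keeping: distance from center
    return r
```

A risky human maneuver that ends in a scrape scores badly here even though the imitation data contained it, which is how the RL stage can push the planner to be safer than its demonstrations.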

5. Why is this a Big Deal? (Efficiency)

Usually, to get a robot to be this smart, you need a massive computer brain (like a supercomputer with billions of parameters).

DAP is tiny.

  • It uses only 160 million parameters.
  • For context, many other state-of-the-art models use billions of parameters.
  • The Result: DAP is as smart as the giants but runs on a much smaller, cheaper, and faster computer. It's like fitting a Ferrari engine into a compact car.

Summary

DAP is a self-driving planner that:

  1. Simplifies the world into Lego-like blocks (Tokens).
  2. Reads the future step-by-step (Autoregressive).
  3. Predicts the environment along with its own moves (World Modeling), so it understands why it's driving.
  4. Gets coached by a reward system to be safer than the humans it learned from.
  5. Does all this with a tiny, efficient brain that doesn't need a supercomputer.

It's a shift from "blindly copying" to "understanding and predicting," making self-driving cars that are not just smart, but also safe and efficient.