A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

This paper introduces a vision-based reinforcement learning agent that achieves champion-level performance in Gran Turismo 7. Trained with an asymmetric actor-critic framework, the agent relies solely on ego-centric camera views and onboard sensors, eliminating the need for external global localization while outperforming the game's built-in drivers.

Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman

Published Tue, 10 Ma

Imagine you are teaching a robot to drive a race car at 200 mph. The biggest challenge isn't just making the car go fast; it's making smart decisions when you can't see everything around you.

This paper introduces a new "champion-level" AI driver that plays the video game Gran Turismo 7. Here is the simple breakdown of how they did it, using some everyday analogies.

1. The Problem: The "GPS vs. The Driver" Dilemma

Most previous AI racers were like super-powered GPS systems. They knew exactly where every other car was, the precise shape of the track, and the speed of everyone else because the game gave them a cheat sheet (global data).

  • The Flaw: In the real world, you can't have a GPS that tells you exactly where every other car is in real-time. Real drivers rely on their eyes and what they feel in the seat.
  • The Goal: The researchers wanted to build an AI that drives like a human, using only a camera (eyes) and a speedometer (body sensors), without any cheat codes.

2. The Solution: The "Student and the Coach" (Asymmetric Learning)

To teach this AI, they used a clever training method called an Asymmetric Actor-Critic framework. Think of it like a driving school:

  • The Student (The Actor): This is the AI that actually drives the car during the race. It is blind to the "big picture." It only sees what the camera sees (the road ahead, the car in front) and feels the steering wheel. It has to guess where the other cars are based on memory and what it sees.
  • The Coach (The Critic): This is the teacher. During training, the Coach has super-vision. It sees the entire track, knows exactly where every opponent is, and knows their speed.
  • How it works: The Student tries to drive. The Coach watches with its super-vision, grades the Student's moves, and says, "You turned too early because you didn't realize the car behind you was speeding up!" The Student learns from this feedback but doesn't get the super-vision itself. This way, the Student learns to be a champion using only its own eyes.
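The Student/Coach split can be sketched in a few lines of code. This is a toy illustration with hypothetical names and hand-written rules, not the paper's actual deep networks; the point is only the information asymmetry: the actor's function signature never accepts global state.

```python
# Toy sketch of the asymmetric actor-critic split (hypothetical names;
# the real agent uses deep neural networks, not these simple rules).

def actor_policy(local_obs):
    # The "Student": sees only ego-centric inputs (camera features, speed).
    # Toy rule: steer back toward the visible track center.
    return {"steer": -local_obs["track_offset"], "throttle": 1.0}

def critic_value(global_state):
    # The "Coach": used during training only. It scores the situation with
    # privileged global information (every car's position, full track shape),
    # here penalizing nearby opponents the actor may not have seen.
    gaps = [abs(p - global_state["ego_pos"]) for p in global_state["opponent_pos"]]
    return global_state["progress"] - 0.5 * sum(1.0 for g in gaps if g < 5.0)

# Conceptually, the critic's value grades the actor's action, but the
# update never hands the actor the global state -- at race time the
# actor runs alone on local_obs.
local_obs = {"track_offset": 0.2}
global_state = {"ego_pos": 100.0, "opponent_pos": [103.0, 250.0], "progress": 100.0}
action = actor_policy(local_obs)
value = critic_value(global_state)
```

Because only the critic consumes privileged data, the critic can simply be discarded after training, leaving a driver that needs nothing beyond its own camera and sensors.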

3. The "Short-Term Memory" Trick

Racing is chaotic. If a car passes you and goes around a blind corner, you can't see it anymore. A standard AI might forget it exists.

  • The Fix: The researchers gave the "Student" a recurrent neural network, which is like a short-term memory.
  • The Analogy: Imagine you are playing tag in a dark room. Even if the person you are chasing runs behind a pillar and disappears from your view, your memory tells you, "They are still there, moving left." The AI uses this memory to remember where opponents were a second ago, so it doesn't crash into them when they reappear.
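The "tag in a dark room" idea can be made concrete with a tiny hand-rolled memory update. This is a hypothetical illustration, not the paper's recurrent network: while the opponent is visible we store its position and estimate its velocity; while it is occluded we extrapolate, much as an RNN's hidden state carries information forward between frames.

```python
# Toy "short-term memory" for an occluded opponent (hypothetical example;
# the actual agent uses a recurrent neural network's hidden state).

def update_memory(memory, observation, dt=1.0):
    if observation is not None:
        # Opponent visible: store position, estimate velocity from last frame.
        prev = memory.get("pos")
        vel = (observation - prev) / dt if prev is not None else 0.0
        return {"pos": observation, "vel": vel}
    # Opponent occluded: advance the remembered position at the
    # remembered velocity -- "they are still there, moving left."
    return {"pos": memory["pos"] + memory["vel"] * dt, "vel": memory["vel"]}

mem = update_memory({}, 10.0)    # opponent seen at position 10
mem = update_memory(mem, 12.0)   # seen at 12 -> estimated velocity 2.0
mem = update_memory(mem, None)   # behind a blind corner: extrapolate
```

After the occluded frame, the memory predicts the opponent at position 14.0, so the agent can still plan around a car it cannot currently see.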

4. The Training: "Practice Makes Perfect"

The AI didn't just learn by driving once.

  • The Gym: They trained it in a digital gym (the game) against 19 other AI opponents.
  • The Reset Button: Sometimes, AI gets "stuck" in bad habits (like always hugging the left wall). The researchers hit a "reset button" on the AI's brain halfway through training. This forced the AI to forget its bad habits and relearn from scratch using a wider variety of scenarios, making it smarter and more adaptable.
  • Visual Noise: They also taught the AI to ignore weird visual glitches (like a sudden shift in the camera angle) so it wouldn't panic if the view got slightly blurry.
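The two robustness tricks above can be sketched as follows. These are deliberately minimal, hypothetical versions: a weight re-initialization standing in for the mid-training "reset button," and a pixel shift standing in for the camera-jitter augmentation.

```python
import random

# Minimal sketches of the two training tricks (hypothetical versions;
# the paper's actual procedures operate on deep network weights and
# rendered camera frames).

def reset_weights(params, rng):
    # "Reset button": discard learned weights and re-initialize randomly,
    # while keeping the training pipeline and experience variety intact.
    return [rng.uniform(-0.1, 0.1) for _ in params]

def augment_view(pixels, rng, max_shift=2):
    # "Visual noise": circularly shift the image a few pixels, simulating
    # a sudden change in camera angle the policy must learn to tolerate.
    shift = rng.randint(-max_shift, max_shift)
    return pixels[shift:] + pixels[:shift]

rng = random.Random(7)
fresh = reset_weights([0.9, -0.8, 0.5], rng)     # old habits wiped
shifted = augment_view([1, 2, 3, 4], rng)        # same pixels, new viewpoint
```

In both cases the content of training stays the same; what changes is that the network can no longer rely on a single lucky initialization or a single fixed camera framing.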

5. The Results: Beating the Humans

They tested this new AI on three famous tracks (Tokyo, Spa, and Le Mans) against:

  1. GT Sophy: A previous, super-smart AI that used the "cheat sheet" (global data).
  2. Human Experts: Professional gamers.
  3. Human Champions: World-class racing drivers.

The Outcome:

  • The new "Vision-Only" AI beat the Human Champions and matched or beat the "Cheat Sheet" AI.
  • It started from the very last position (20th place) and fought its way to 1st place consistently.
  • It learned to overtake cars safely, using the camera to judge gaps just like a human would.

The Big Picture

This paper proves that you don't need a supercomputer with a perfect map of the world to drive a race car. You just need a good pair of eyes, a good memory, and a smart way to learn.

It's a huge step forward because it shows that AI can learn to drive in the messy, unpredictable real world just by looking at what's in front of it, paving the way for self-driving cars that don't need expensive, perfect sensors to navigate traffic.