YOLO-NAS-Bench: A Surrogate Benchmark with Self-Evolving Predictors for YOLO Architecture Search

This paper introduces YOLO-NAS-Bench, the first surrogate benchmark for YOLO-style object detectors. It employs a self-evolving mechanism that iteratively refines a LightGBM predictor, enabling efficient and accurate discovery of high-performing architectures that surpass official YOLO baselines.

Zhe Li, Xiaoyu Ding, Jiaxin Zheng, Yongtao Wang

Published Wed, 11 Ma

Imagine you are a master chef trying to invent the world's best burger. You have a pantry full of ingredients (different types of buns, meats, cheeses, sauces) and a kitchen with a limited amount of time.

The Problem:
To find the perfect burger, you'd ideally want to cook and taste every single possible combination of ingredients. But there are millions of combinations! If you cooked one burger every hour, it would take you 100 years to try them all. In the world of AI, this is called Neural Architecture Search (NAS). Researchers want to automatically design the best AI "burger" (an object detector that finds things in photos), but training each design takes days of supercomputer time. It's too expensive and slow to try them all.

The Old Way:
Previously, researchers had a "menu" for simple tasks (like recognizing if a picture is a cat or a dog), but they didn't have a good menu for the complex task of finding objects in a scene (like spotting a cat and a dog in a busy park). They had to build their own test kitchens from scratch every time, making it hard to compare who was actually the best chef.

The Solution: YOLO-NAS-Bench
The authors of this paper built the first-ever "Tasting Menu" specifically for YOLO-style AI chefs. Think of it as a massive, pre-cooked database of 1,000 different burger recipes, where they already know exactly how good each one tastes (how accurate it is) and how long it takes to cook (how fast it is).

Here is how they made it even better, using a clever trick called the Self-Evolving Predictor:

1. The "Crystal Ball" (The Surrogate Predictor)

Instead of cooking every new burger idea from scratch, they trained a "Crystal Ball": a fast machine-learning model called a LightGBM predictor.

  • How it works: You tell the Crystal Ball, "I want a burger with a thick bun, spicy sauce, and double cheese."
  • The Magic: The Crystal Ball looks at its memory of the 1,000 burgers it already knows and says, "Based on what I've seen, this new combination will probably be a 9/10 on taste and take 10 minutes to cook."
  • The Benefit: This saves days of cooking time. You can test thousands of ideas in seconds just by asking the Crystal Ball.
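To make the Crystal Ball concrete, here is a toy sketch of the surrogate-predictor interface. The paper trains LightGBM regressors on real YOLO architectures; this stand-in uses inverse-distance weighting over a tiny made-up table, and the feature names (depth, width, etc.) are hypothetical.

```python
# Toy stand-in for the paper's LightGBM surrogate. An architecture is
# encoded as a feature vector; the predictor returns (accuracy, latency)
# without ever training the design. All numbers below are illustrative.
import math

class SurrogatePredictor:
    """Predicts (accuracy, latency_ms) for a new architecture by
    inverse-distance weighting over already-evaluated designs."""
    def __init__(self, known):
        # known: list of (feature_vector, accuracy, latency_ms)
        self.known = known

    def predict(self, feats):
        weights, acc, lat = [], 0.0, 0.0
        for f, a, l in self.known:
            w = 1.0 / (math.dist(feats, f) + 1e-9)  # closer = more weight
            weights.append(w)
            acc += w * a
            lat += w * l
        total = sum(weights)
        return acc / total, lat / total

# The real benchmark holds 1,000 fully trained designs; two shown here.
# Hypothetical features: [depth, channel_width, num_detection_heads]
bench = [([3, 64, 2], 0.42, 8.0), ([5, 128, 3], 0.51, 14.0)]
oracle = SurrogatePredictor(bench)
acc, lat = oracle.predict([4, 96, 2])  # query a brand-new design
```

Because the prediction is a weighted average of known results, it always lands between the best and worst designs it has seen, which is exactly the "cafeteria food critic" limitation the next section fixes.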

2. The "Self-Evolving" Loop (The Secret Sauce)

Here is the paper's biggest innovation.

  • The Flaw: At first, the Crystal Ball was trained only on random burgers. It was good at guessing average burgers, but it wasn't great at spotting the absolute best ones. It was like a food critic who had only ever eaten cafeteria food and couldn't really tell a "good" burger from a "Michelin-star" burger.
  • The Fix: The authors created a loop where the Crystal Ball tries to find the best burgers it thinks exist.
    1. The Crystal Ball guesses which new, uncooked recipes might be amazing.
    2. The researchers actually cook (train) those specific "promising" recipes.
    3. They feed the results back to the Crystal Ball.
    4. Repeat: Now the Crystal Ball has tasted more "Michelin-star" burgers. It gets smarter at spotting the winners.

They did this 10 times. The Crystal Ball started with 1,000 recipes and ended up with 1,500, but the quality of its knowledge skyrocketed because it focused on the high-performance ones.
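The four-step loop above can be sketched in a few lines. This is a minimal toy: the "true accuracy" function and the budget sizes are made up, and a nearest-neighbour lookup stands in for refitting LightGBM, but the predict → train the top picks → refit cycle mirrors the paper's 10 rounds.

```python
# Sketch of the self-evolving loop: the predictor proposes promising
# designs, only those get (expensively) trained, and the results are
# fed back to refit the predictor. All specifics are illustrative.
import random
random.seed(0)

def true_accuracy(x):
    # Stand-in for actually training a YOLO variant: a smooth function
    # peaking at x = 0.7 with accuracy 1.0.
    return 1.0 - (x - 0.7) ** 2

def fit(data):
    # Stand-in for refitting LightGBM: predict a design's score from
    # its nearest already-evaluated neighbour.
    def predict(x):
        return min(data, key=lambda p: abs(p[0] - x))[1]
    return predict

# Step 0: a seed set of randomly sampled, fully trained designs.
data = [(x, true_accuracy(x)) for x in (random.random() for _ in range(50))]

for generation in range(10):                  # the paper runs 10 rounds
    predictor = fit(data)                     # 3./4. refit on everything
    candidates = [random.random() for _ in range(500)]
    candidates.sort(key=predictor, reverse=True)
    for x in candidates[:5]:                  # 1. guess the winners
        data.append((x, true_accuracy(x)))    # 2. actually "train" them

best = max(data, key=lambda p: p[1])
```

Because each round evaluates only the designs the predictor rates highest, the dataset fills up with high performers (here growing from 50 to 100 points, just as the paper's grows from 1,000 to 1,500), and the predictor gets sharper exactly where it matters.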

3. The Result: Beating the Pros

Once the Crystal Ball was super-smart, the researchers used it to search for the ultimate AI design.

  • They asked the Crystal Ball to find the best designs within specific time limits (like "find me the best burger that takes under 20 minutes to cook").
  • The Outcome: The designs the Crystal Ball found were better than the official, human-designed YOLO models (versions 8 through 12).
  • The Analogy: It's like a computer program looking at a menu of 1,000 burgers, predicting which new combinations would be best, and then inventing a burger that tastes better than the famous "Big Mac" or "Whopper," all without the chef having to spend years in the kitchen.
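The constrained search in the bullets above reduces to a simple filter-then-maximize step. The sketch below assumes the predictor returns (accuracy, latency) pairs; the design names and numbers are hypothetical.

```python
# Hardware-constrained search: keep candidates whose *predicted* latency
# fits the budget, then take the best predicted accuracy among them.
def constrained_search(candidates, predict, latency_budget_ms):
    feasible = [c for c in candidates
                if predict(c)[1] <= latency_budget_ms]
    return max(feasible, key=lambda c: predict(c)[0], default=None)

# Hypothetical (accuracy, latency_ms) predictions for four designs.
preds = {"A": (0.48, 9.0), "B": (0.55, 21.0),
         "C": (0.52, 15.0), "D": (0.50, 11.0)}
best = constrained_search(list(preds), preds.get, latency_budget_ms=20.0)
# "B" is the most accurate overall but blows the 20 ms budget,
# so the search returns the best design that fits.
```

Running the same search with different budgets yields a whole family of designs, which is how the paper produces competitors for each YOLO size class rather than a single model.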

Summary

  • The Bottleneck: Designing AI is too slow because training takes forever.
  • The Benchmark: They built a library of 1,000 pre-tested AI designs (YOLO-NAS-Bench).
  • The Predictor: They built a "Crystal Ball" that predicts how good a new design will be without training it.
  • The Evolution: They made the Crystal Ball smarter by feeding it the best designs it discovered, creating a self-improving cycle.
  • The Win: Using this system, they found AI designs that are faster and more accurate than the current state-of-the-art human designs.

In short, they built a simulation lab where AI architects can test millions of ideas instantly, and they taught the simulation to get better at finding the winners by learning from its own discoveries.