Imagine you are trying to bake the perfect cake, but you are working in a brand-new, high-tech kitchen (the Huawei Ascend NPU) that very few cooks have worked in before.
In the old, popular kitchen (the NVIDIA GPU), there are thousands of recipe books, cooking shows, and expert chefs to help you. If you want to bake a cake, you can just look up a recipe, tweak it, and you're good to go.
But in this new kitchen, there are no recipe books. The instructions are written in a strange, complex language, and if you get the timing or the ingredient placement even slightly wrong, the oven explodes (the code fails to compile). This is the "knowledge bottleneck" the paper talks about.
Enter AscendOptimizer, a smart, self-teaching robot chef designed to solve this problem without needing a human expert or a massive library of old recipes.
Here is how it works, broken down into simple steps:
1. The Two-Part Cake (The Problem)
Making a high-performance operator on this chip isn't just about the cooking (the math); it's about two things working together:
- The Delivery Driver (Host-side Tiling): This decides how to chop the ingredients into small, manageable chunks and move them from the pantry to the counter. If the chunks are too big, the counter gets cluttered. If they are too small, the driver wastes time walking back and forth.
- The Chef (Device-side Kernel): This is the actual cooking. Once each chunk reaches the counter, the chef decides how to mix and bake it efficiently.
The problem is that these two are coupled. You can't just fix the delivery driver without checking if the chef can handle the new chunks, and vice versa.
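The coupling can be made concrete with a toy cost model in Python. Everything in it is a hypothetical simplification, not Ascend's real cost model: the per-trip overhead, the buffer capacity, and the idea that double buffering just halves the usable buffer. It also ignores the transfer/compute overlap that makes double buffering worthwhile in practice.

```python
import math

def tile_cost(n_elems: int, tile: int, overhead: float, per_elem: float) -> float:
    """Toy cost: each trip pays a fixed overhead, plus a cost per element moved."""
    trips = math.ceil(n_elems / tile)
    return trips * overhead + n_elems * per_elem

def best_tile(n_elems: int, buffer_capacity: int, num_buffers: int,
              overhead: float = 100.0, per_elem: float = 1.0) -> int:
    """Return the cheapest tile size that still fits in the on-chip buffer.

    num_buffers stands in for a kernel-side choice: double buffering (2)
    splits the same buffer in two, so each tile must be half as large.
    """
    max_tile = buffer_capacity // num_buffers  # bigger tiles would not fit ("cluttered counter")
    # Ties go to the smallest tile, because min() keeps the first minimum it sees.
    return min(range(1, max_tile + 1),
               key=lambda t: tile_cost(n_elems, t, overhead, per_elem))

print(best_tile(1000, 256, num_buffers=1))  # 250 (4 trips)
print(best_tile(1000, 256, num_buffers=2))  # 125 (8 trips)
```

Changing a single kernel-side choice (`num_buffers`) changes the best tiling. That is the sense in which the driver and the chef cannot be tuned independently.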
2. The Robot's Strategy: A Two-Step Dance
Since the robot can't look up a recipe, it has to invent one through trial and error, but it does so efficiently, using two different tricks.
Step 1: The "Evolutionary" Guessing Game (Optimizing the Delivery)
The robot starts with a basic plan for moving ingredients. It then tries many small variations: "What if I move 10% more? What if I move them in a zig-zag?"
- The Magic: It doesn't just guess; it tests every guess on the real hardware immediately.
- The Filter: If a guess causes the oven to explode (compile error) or the cake to burn (wrong math), it instantly throws that idea away.
- The Result: Over time, the robot "evolves" a delivery plan that is closely tuned to the physical limits of this specific kitchen, moving data as fast as it can without crashing the system.
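The guess, test, and filter cycle can be sketched as a generic (mu + lambda)-style evolutionary loop. This is a minimal sketch under stated assumptions, not necessarily the exact algorithm the paper uses. The mutation rule, the population size, and `fake_hardware` (a stand-in for compiling and timing on a real NPU) are all hypothetical.

```python
import random

def mutate(config: dict, rng: random.Random) -> dict:
    """Hypothetical mutation: rescale one tiling parameter by a random factor."""
    child = dict(config)
    key = rng.choice(sorted(child))
    child[key] = max(1, int(child[key] * rng.choice([0.5, 0.9, 1.1, 2.0])))
    return child

def evolve(initial: dict, evaluate, generations: int = 50,
           population: int = 8, seed: int = 0):
    """Evolve tiling configs; evaluate returns a latency, or None if the run failed."""
    rng = random.Random(seed)
    latency = evaluate(initial)
    if latency is None:
        raise ValueError("the starting config must run correctly")
    pool = [(latency, initial)]
    for _ in range(generations):
        children = []
        for _, parent in pool:
            child = mutate(parent, rng)
            result = evaluate(child)      # every guess is measured, not predicted
            if result is not None:        # compile error or wrong output: discard
                children.append((result, child))
        # Survivors: the fastest valid configs seen so far.
        pool = sorted(pool + children, key=lambda item: item[0])[:population]
    return pool[0]

def fake_hardware(config: dict):
    """Hypothetical stand-in for a real benchmark run."""
    tile = config["tile"]
    if tile > 128:                        # pretend larger tiles overflow the buffer
        return None
    return 1000 / tile + 0.5 * tile       # many small trips vs. oversized tiles

best_latency, best_config = evolve({"tile": 8}, fake_hardware)
print(best_config, round(best_latency, 2))
```

Invalid candidates never enter the pool, so the search can only drift toward configurations that actually compile and produce correct results.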
Step 2: The "Rewind" Trick (Optimizing the Cooking)
This is the cleverest part. The robot needs to learn how to cook better, but it has no "Good vs. Bad" examples to study. So, it creates its own examples!
- The Rewind: The robot takes a "good" piece of code it has already found and deliberately makes it worse. It removes a shortcut, slows down a process, or makes the code messier, so the result still works but runs slower.
- The Lesson: Now it has a "Bad" version and a "Good" version. It asks its AI brain: "What exactly did I change to make this slow? How do I fix it?"
- The Library: It writes down these "Bad-to-Good" fixes in a notebook (an Experience Bank).
- The Application: When it encounters a new, slow operator, it looks at the problem, checks its notebook, and says, "Ah, this looks like that time I broke the mixing speed. I know how to fix it!" It then applies that fix.
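Below is a minimal sketch of the rewind-and-retrieve idea. In the description above, the question "what did I change?" goes to the robot's AI brain, and the notebook stores its answers. Here, kernels are modeled as hypothetical sets of optimization tags, so the diagnosis is simply the tag that was removed. Retrieval uses Jaccard similarity, which is a stand-in chosen for illustration, not the matching method of the actual system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Experience:
    context: frozenset  # optimizations still present in the slow ("bad") version
    fix: str            # the optimization whose removal made it slow

def rewind(good_kernel: frozenset):
    """Make bad/good pairs by removing one optimization at a time."""
    return [(good_kernel - {opt}, good_kernel, opt) for opt in sorted(good_kernel)]

def build_bank(good_kernels):
    """Record each bad-to-good difference as a reusable experience."""
    bank = []
    for good in good_kernels:
        for bad, _good, opt in rewind(good):
            bank.append(Experience(context=bad, fix=opt))
    return bank

def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def suggest_fixes(slow_kernel: frozenset, bank, k: int = 2):
    """Return up to k fixes from the most similar past 'bad' versions."""
    candidates = [e for e in bank if e.fix not in slow_kernel]  # skip fixes already applied
    candidates.sort(key=lambda e: jaccard(e.context, slow_kernel), reverse=True)
    fixes = []
    for e in candidates:
        if e.fix not in fixes:
            fixes.append(e.fix)
        if len(fixes) == k:
            break
    return fixes

bank = build_bank([
    frozenset({"double_buffer", "fused_cast", "vectorized_mul"}),
    frozenset({"aligned_copy", "double_buffer"}),
])
print(suggest_fixes(frozenset({"vectorized_mul"}), bank))  # ['double_buffer', 'fused_cast']
```

The key property is that no human-labeled examples are needed. Every notebook entry comes from code the robot already knows is good.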
3. The Loop: Dancing Together
The robot doesn't just do Step 1 then Step 2 once. It dances between them:
- It tweaks the Delivery to make the ingredients arrive faster.
- It tweaks the Cooking to make the processing faster.
- It checks the results. If the new cooking style needs different delivery chunks, it goes back to Step 1.
- It keeps switching back and forth, refining the whole process until further switches stop making it faster.
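The back-and-forth can be sketched as plain alternating (coordinate-descent-style) optimization over two discrete choices. The tile sizes, kernel variant names, buffer size, and latency numbers are all hypothetical.

```python
def alternate(tilings, kernels, latency, tiling, kernel, max_rounds=10):
    """Re-tune tiling, then kernel, and repeat until a full round gives no gain."""
    best = latency(tiling, kernel)
    for _ in range(max_rounds):
        improved = False
        # Step 1: best delivery plan (tiling) for the current cooking (kernel).
        t = min(tilings, key=lambda x: latency(x, kernel))
        if latency(t, kernel) < best:
            tiling, best, improved = t, latency(t, kernel), True
        # Step 2: best kernel for the (possibly new) tiling.
        k = min(kernels, key=lambda x: latency(tiling, x))
        if latency(tiling, k) < best:
            kernel, best, improved = k, latency(tiling, k), True
        if not improved:  # neither step helped: stop switching
            break
    return tiling, kernel, best

BUFFER = 256  # hypothetical on-chip buffer size, in elements
DATA = 1024   # hypothetical tensor size, in elements

def toy_latency(tile, kernel):
    buffers = 2 if kernel == "double_buffer" else 1
    if tile * buffers > BUFFER:
        return float("inf")      # does not fit: treated like a failed build
    per_trip = 8 if kernel == "double_buffer" else 20  # made-up costs
    return (DATA // tile) * per_trip

print(alternate([32, 64, 128], ["single_buffer", "double_buffer"],
                toy_latency, tiling=32, kernel="single_buffer"))
# (128, 'double_buffer', 64)
```

Each step changes only one half at a time. As a result, this style of alternation can stall in a local optimum when the best pairing requires changing both halves at once.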
The Results
The researchers tested this robot on 127 real-world tasks.
- The Baseline: The standard, open-source code (the "average chef").
- The Result: AscendOptimizer made the code run 1.19 times faster on average.
- The Wow Factor: For nearly half of the tasks, it was significantly faster, with some tasks running 2x faster or more.
Why This Matters
Before this, if you wanted to write fast code for Huawei's chips, you needed a rare, expensive human expert who knew all the secrets. Now, this "episodic agent" (the robot) can bootstrap its own expertise. It learns by doing, by breaking things on purpose to understand how to fix them, and by constantly testing on the real hardware.
It's like teaching a robot to drive a Formula 1 car on a track it's never seen before. You don't give it a manual; you let it crash a few times and learn from each crash, until it laps the track faster than a human driver without a manual could.