Extension of ACETONE C code generator for multi-core architectures

This paper proposes extending the ACETONE C code generator, originally limited to sequential code, to support multi-core architectures by formally defining a processor assignment problem and outlining a future implementation of parallel code generation, scheduling heuristics, and synchronization mechanisms.

Yanis Aït-Aïssa (IRIT-TRACES), Thomas Carle (IRIT-TRACES), Sergei Chichin, Benjamin Lesage, Claire Pagetti

Published Wed, 11 Ma

Imagine you are the manager of a busy kitchen in a high-stakes restaurant (like an airplane's navigation system). Your goal is to prepare a complex dish (a Deep Neural Network) that must be perfect and, crucially, finished within a strict, predictable time limit. If the dish isn't ready on time, the plane can't land safely.

For a long time, this kitchen had only one chef (a single-core processor). This chef was incredibly reliable and followed a strict recipe book (the ACETONE code generator) that ensured the food was always safe and the cooking time could be predicted down to the second. However, as the recipes got more complex, one chef simply couldn't cook fast enough.

The problem? The restaurant couldn't afford to buy a magical, specialized robot chef (a dedicated AI accelerator) yet. They had to stick with their existing team of human chefs (multi-core CPUs). But hiring more chefs introduced a new problem: coordination. If you just tell five chefs to start cooking at once, they might bump into each other, fight over the same ingredients, or wait around doing nothing.

This paper is about teaching the ACETONE system how to manage a team of chefs instead of just one. Here is how the authors approach it, broken down into simple concepts:

1. The Recipe Map (The DAG)

First, the team realized that a complex recipe isn't just a list of steps; it's a map. Some steps must happen in order (you can't frost the cake before baking it), but other steps can happen at the same time (chopping vegetables and boiling water).

  • The Analogy: They turned the neural network into a flowchart (called a directed acyclic graph, or DAG). Imagine a subway map where some stations are connected by tracks (dependencies). You can't get to the next station until you pass the current one, but some lines run parallel.
  • The Goal: The system needs to figure out which chef works on which part of the map so that the whole meal is ready as fast as possible without anyone stepping on toes.
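To make the recipe map concrete, here is a minimal sketch that groups the steps of a small DAG into "levels" whose members do not depend on each other and could therefore be handed to different chefs. The layer names and the helper `parallel_levels` are hypothetical illustrations, not part of ACETONE.

```python
from collections import defaultdict

def parallel_levels(nodes, edges):
    """Group DAG nodes into levels. Nodes in the same level have no
    dependency on each other, so they could run on different cores."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start with the nodes that depend on nothing.
    ready = sorted(n for n in nodes if indegree[n] == 0)
    levels = []
    while ready:
        levels.append(ready)
        next_ready = []
        for n in ready:
            for s in successors[n]:
                indegree[s] -= 1
                if indegree[s] == 0:   # all of s's inputs are now available
                    next_ready.append(s)
        ready = sorted(next_ready)

    if sum(len(level) for level in levels) != len(nodes):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return levels

# Hypothetical network: one input feeds two branches that are merged again.
NODES = ["input", "conv_a", "conv_b", "concat", "dense"]
EDGES = [
    ("input", "conv_a"), ("input", "conv_b"),
    ("conv_a", "concat"), ("conv_b", "concat"),
    ("concat", "dense"),
]
LEVELS = parallel_levels(NODES, EDGES)
```

In this toy graph only `conv_a` and `conv_b` share a level, so they are the only two steps that can be cooked at the same time.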

2. The Scheduling Puzzle (The Brain)

The hardest part is deciding who does what. This is a math puzzle.

  • The Old Way: The researchers looked at an existing mathematical formulation (Integer Linear Programming, or ILP) to solve this. It was like trying to solve a Rubik's cube by checking every single possible move. It worked for small puzzles but took forever for big ones.
  • The New Way: They turned to heuristics, rules of thumb that solve the puzzle much faster without guaranteeing the best possible answer. There are two main strategies:
    • The "Fast & Good Enough" Strategy (ISH): Like a head chef who quickly assigns tasks to whoever is standing nearest to the ingredients. It's fast and gets the job done, but maybe not perfectly optimized.
    • The "Copy-Paste" Strategy (DSH): Sometimes, a chef has to wait for an ingredient to arrive from another station. Instead of waiting, this strategy says, "Let's just have a second chef cook a small batch of that ingredient right here!" It duplicates work to save time, trading a little extra computation and memory for less waiting.
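The "fast and good enough" idea can be sketched as a greedy list scheduler: take the steps in dependency order and give each one to the chef who can finish it earliest, counting a delay whenever an ingredient has to cross counters. This is a simplified illustration, not the paper's ISH or DSH; the function `list_schedule`, the durations, and the communication cost are all assumed values chosen for the example.

```python
def list_schedule(durations, edges, num_cores, comm_cost):
    """Greedy list scheduling: place each task on the core where it can
    finish earliest. Assumes the graph is a DAG."""
    preds = {t: [] for t in durations}
    for src, dst in edges:
        preds[dst].append(src)

    core_free = [0] * num_cores   # time at which each core becomes idle
    placed = {}                   # task -> (core, start, finish)
    remaining = list(durations)
    while remaining:
        # Pick the first task whose predecessors have all been placed.
        task = next(t for t in remaining if all(p in placed for p in preds[t]))
        remaining.remove(task)
        best = None
        for core in range(num_cores):
            # Data produced on another core arrives comm_cost later.
            data_ready = max(
                (placed[p][2] + (0 if placed[p][0] == core else comm_cost)
                 for p in preds[task]),
                default=0,
            )
            start = max(core_free[core], data_ready)
            finish = start + durations[task]
            if best is None or finish < best[2]:
                best = (core, start, finish)
        placed[task] = best
        core_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placed.values())
    return placed, makespan

# Hypothetical durations (arbitrary time units) for a small network.
DURATIONS = {"input": 1, "conv_a": 4, "conv_b": 4, "concat": 1, "dense": 2}
EDGES = [
    ("input", "conv_a"), ("input", "conv_b"),
    ("conv_a", "concat"), ("conv_b", "concat"),
    ("concat", "dense"),
]
_, ONE_CORE = list_schedule(DURATIONS, EDGES, num_cores=1, comm_cost=1)
_, TWO_CORES = list_schedule(DURATIONS, EDGES, num_cores=2, comm_cost=1)
```

With these made-up numbers the total time drops from 12 units on one core to 9 on two, because the two branches overlap. The result is not guaranteed to be optimal, which is the trade-off a heuristic accepts in exchange for speed.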

3. The Handshake (Synchronization)

When Chef A finishes chopping onions and needs to pass them to Chef B (who is on a different counter), they can't just throw them across the room. They need a system to ensure Chef B doesn't grab the onions before they are chopped, and Chef A doesn't drop new onions on top of the old ones.

  • The Analogy: They built a flag system in the shared memory.
    • Chef A places the onions on a specific spot on the counter and raises a red flag.
    • Chef B looks at the red flag. If it's up, they know the onions are ready. They take them, lower the flag, and start cooking.
  • This ensures that even though the chefs are working in parallel, they never mess up the order of operations.
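The handshake can be sketched with two threads sharing a one-item slot and a flag. This is only an illustration of the protocol described above, assuming a single producer and a single consumer; the names `FlaggedSlot`, `producer`, `consumer`, and `run_handshake` are hypothetical, and on real multi-core hardware the ordering between the data write and the flag write would additionally need to be enforced (for example with atomic operations or memory barriers).

```python
import threading
import time

class FlaggedSlot:
    """One-item mailbox: `full` is the flag, `value` is the payload."""
    def __init__(self):
        self.full = False
        self.value = None

def producer(slot, items):
    for item in items:
        while slot.full:         # wait until the consumer has lowered the flag
            time.sleep(0)        # yield so the other thread can run
        slot.value = item        # write the data first...
        slot.full = True         # ...then raise the flag

def consumer(slot, count, out):
    for _ in range(count):
        while not slot.full:     # wait until the producer has raised the flag
            time.sleep(0)
        out.append(slot.value)   # read the data first...
        slot.full = False        # ...then lower the flag

def run_handshake(items):
    """Pass every item from the producer thread to the consumer thread."""
    slot, out = FlaggedSlot(), []
    threads = [
        threading.Thread(target=producer, args=(slot, items)),
        threading.Thread(target=consumer, args=(slot, len(items), out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

RECEIVED = run_handshake(["onions", "carrots", "garlic"])
```

Because the producer never writes while the flag is up and the consumer never reads while it is down, each item is delivered exactly once and in order.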

4. The Result: A Faster, Safer Kitchen

The team tested this new system on a real kitchen (a Texas Instruments computer chip with 4 cores).

  • The Outcome: By splitting the work among the cores and using their new "flag" system, they managed to cook the meal 8% faster overall.
  • The Real Win: While 8% sounds small, in the world of airplanes, that extra time is huge. More importantly, they proved that they could do this predictably. They knew exactly how long the worst-case scenario would take, which is the golden rule for safety-critical systems.

Why This Matters

Before this paper, if you wanted to run complex AI on an airplane, you had to wait for new, expensive hardware or accept that it would be too slow. This paper shows that we can take existing, reliable multi-core processors and make them work together like a well-oiled team.

It's like opening the unused lanes of an existing highway and adding a smart traffic light system. You don't need to build a new road; you just need to teach the cars (the code) how to drive in parallel without crashing. This makes advanced AI safer and more practical for the future of aviation.