Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators

This paper proposes a joint hardware-workload co-optimization framework that uses an evolutionary algorithm to design generalized in-memory computing accelerators. The resulting designs significantly reduce the energy-delay-area product across multiple neural network workloads, overcoming the limitations of single-workload specialized designs.

Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil, Khaled N. Salama

Published 2026-03-05

Imagine you are a chef trying to design the ultimate kitchen.

The Problem: The "One-Trick Pony" Kitchen

In the world of AI (Artificial Intelligence), computers need special "kitchens" called accelerators to cook up complex recipes (neural networks) quickly and without wasting energy.

Currently, most engineers design these kitchens for just one specific recipe.

  • If you want to cook a giant turkey (a huge AI model), you build a massive oven with huge burners.
  • If you want to cook a delicate soufflé (a small AI model), you build a tiny, precise stove.

The problem? Most real-world devices (like your phone or a self-driving car) need to cook many different recipes at once. If you use the "turkey kitchen" to make a soufflé, it's wasteful and slow. If you use the "soufflé kitchen" for a turkey, it burns out.

Existing methods try to solve this by either:

  1. Designing a kitchen for the biggest recipe (which is overkill for small tasks).
  2. Designing a kitchen for one specific recipe at a time (which doesn't work if you need to switch tasks).

The Solution: The "Universal Kitchen"

This paper introduces a new way to design a Universal Kitchen (a generalized In-Memory Computing accelerator) that can cook any recipe efficiently, from soufflés to turkeys, without wasting energy.

The authors call this "Joint Hardware-Workload Co-Optimization."

Here is how they did it, using some simple analogies:

1. The "Taste Test" Strategy (The Algorithm)

Instead of just guessing what the perfect kitchen looks like, they used a smart search method called a Genetic Algorithm. Think of it like a reality TV cooking competition:

  • The Contestants: They generate thousands of random kitchen designs (different sizes of ovens, different numbers of burners, different layouts).
  • The Judges: They don't just test one dish. They cook four to nine different recipes on every single kitchen design.
  • The Score: They give a score based on how fast it cooked, how much electricity it used, and how much counter space it took.
  • The Evolution: The worst kitchens are thrown out. The best ones "mate" (combine their best features) to create new, better kitchens for the next round.
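The competition above can be sketched in a few lines of Python. This is a toy illustration, not the authors' actual framework: the hardware knobs, their ranges, and the `edap` scoring function are all made-up stand-ins (a real framework would call a hardware simulator to get energy, delay, and area numbers).

```python
import random

# Hypothetical search space: each "kitchen" (accelerator design) is a list
# of hardware knobs. Names and ranges are illustrative only.
CHOICES = [
    [64, 128, 256, 512],   # crossbar size
    [4, 8, 16, 32],        # number of tiles
    [4, 6, 8],             # ADC precision (bits)
    [32, 64, 128, 256],    # buffer size (KB)
]

def random_design():
    return [random.choice(options) for options in CHOICES]

def edap(design, workloads):
    """Toy stand-in for the energy-delay-area product, averaged over ALL
    workloads (the judges taste every recipe, not just one)."""
    return sum(w / (design[0] * design[1]) + design[2] * design[3]
               for w in workloads) / len(workloads)

def evolve(workloads, pop_size=20, generations=30, seed=0):
    random.seed(seed)
    pop = [random_design() for _ in range(pop_size)]
    for _ in range(generations):
        # Score every contestant, keep the best half, discard the rest.
        pop.sort(key=lambda d: edap(d, workloads))
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            # "Mating": each knob is inherited from one of the two parents,
            # then one knob is randomly mutated.
            child = [random.choice(pair) for pair in zip(a, b)]
            i = random.randrange(len(child))
            child[i] = random.choice(CHOICES[i])
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda d: edap(d, workloads))

# Four "recipes": toy workload sizes standing in for real neural networks.
best = evolve(workloads=[1e6, 5e6, 2e7, 1e8])
```

The key detail matching the paper's idea is inside `edap`: fitness averages over every workload, so a design only survives if it cooks all the recipes well.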

2. The "Smart Start" (Hamming Distance Sampling)

Usually, these competitions start with random kitchens. Sometimes, you get lucky and start with a great kitchen; other times, you start with a disaster. This leads to inconsistent results.

The authors added a clever twist: The Diversity Check.
Before the competition even starts, they look at all the random kitchens and pick the ones that are most different from each other.

  • Analogy: Imagine picking a team of explorers. Instead of picking 5 people who all look the same, you pick one who is tall, one who is short, one who is an expert in deserts, and one who is an expert in snow. This ensures you cover all possibilities and don't get stuck in a "local trap" (like only exploring the desert when you needed to find a mountain).
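The "Diversity Check" can be sketched as greedy max-min selection over Hamming distance (the number of knobs on which two designs differ). The pool size, knob counts, and the greedy strategy here are illustrative assumptions; the paper's exact sampling procedure may differ.

```python
import random

def hamming(a, b):
    """Number of positions where two designs choose different options."""
    return sum(x != y for x, y in zip(a, b))

def diverse_sample(pool, k):
    """Greedy max-min selection: start with one design, then repeatedly add
    the candidate whose minimum Hamming distance to the already-chosen set
    is largest. This spreads the starting team across the search space
    instead of leaving it to luck."""
    chosen = [pool[0]]
    while len(chosen) < k:
        best = max((c for c in pool if c not in chosen),
                   key=lambda c: min(hamming(c, s) for s in chosen))
        chosen.append(best)
    return chosen

random.seed(1)
# Illustrative pool: 100 random designs, each with 4 knobs of 4 options.
pool = [[random.randrange(4) for _ in range(4)] for _ in range(100)]
team = diverse_sample(pool, 5)
```

Like picking the explorer team: each new member is the candidate least similar to everyone already picked.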

3. The Four-Phase Cooking Process

They didn't just run the competition once. They ran it in four distinct phases, like a chef refining a dish:

  • Phase 1 (Exploration): Throw everything at the wall. Try wild, crazy kitchen layouts to see what's possible.
  • Phase 2 (Transition): Start narrowing it down. Keep the good ideas but mix them carefully.
  • Phase 3 (Convergence): Focus on the top contenders. Make small, precise tweaks.
  • Phase 4 (Fine-Tuning): The final polish. Adjust the knobs by a tiny fraction to get the perfect temperature.
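One simple way to picture the four phases is as a schedule that shrinks how aggressively the algorithm mutates designs over time. The phase boundaries and mutation rates below are invented for illustration; the paper's actual per-phase settings are not reproduced here.

```python
# Hypothetical four-phase schedule: each entry is
# (name, fraction of total generations, mutation rate).
PHASES = [
    ("exploration", 0.40, 0.50),   # wild changes: try crazy layouts
    ("transition",  0.30, 0.20),   # mix good ideas more carefully
    ("convergence", 0.20, 0.05),   # small, precise tweaks
    ("fine-tuning", 0.10, 0.01),   # final polish
]

def mutation_rate(gen, total_gens):
    """Return the mutation rate in effect at generation `gen`."""
    progress = gen / total_gens
    cumulative = 0.0
    for name, fraction, rate in PHASES:
        cumulative += fraction
        if progress < cumulative:
            return rate
    return PHASES[-1][2]  # last phase covers any rounding at the tail

rates = [mutation_rate(g, 100) for g in range(100)]
```

Early generations shake the pot hard; by the final phase, the algorithm is only nudging the knobs by a tiny fraction.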

The Results: Why It Matters

The results were impressive. By using this "Universal Kitchen" approach:

  • Energy Savings: They reduced the combined energy-and-time cost (the energy-delay-area product) by up to 95% compared to forcing a "biggest recipe" kitchen to do everything.
  • No Compromise: They proved you don't have to sacrifice performance to get a general-purpose machine. The "Universal Kitchen" was almost as good as a "Specialist Kitchen" for every single recipe.
  • Flexibility: They tested this on two different types of "kitchen tools" (RRAM and SRAM memory) and even looked at how the cost of building the kitchen changes if you use different manufacturing technologies (like 7nm vs. 32nm chips).

The Big Picture

Think of this paper as a blueprint for building smart, adaptable AI chips. Instead of building a custom car for every single road trip, they figured out how to build one super-car that handles highways, dirt roads, and city streets equally well, saving money and fuel in the process.

This is a huge step forward for making AI faster, cheaper, and more energy-efficient for the devices we use every day.