Benchmarking Quantum Computers: Towards a Standard Performance Evaluation Approach

This paper reviews classical and quantum benchmarking methodologies to identify the unique challenges of evaluating quantum processors, assesses existing metrics against quality attributes, and proposes general guidelines to establish a standardized performance evaluation framework akin to SPEC for the quantum computing industry.

Arturo Acuaviva, David Aguirre, Rubén Peña, Mikel Sanz


Here is an explanation of the paper, "Benchmarking Quantum Computers: Towards a Standard Performance Evaluation Approach," translated into simple language with creative analogies.

The Big Picture: The "Car Showroom" Problem

Imagine you walk into a car showroom. One salesperson tells you, "My car is the fastest because it has 500 horsepower!" Another says, "Mine is the best because it gets 50 miles per gallon!" A third claims, "Mine is the safest because it has 10 airbags!"

You are confused. How do you compare them? Are you looking for a race car, a family hauler, or an off-roader? In the early days of classical computers (the ones we use today), manufacturers did the same thing. They invented their own tests to make their machines look good, often cheating by optimizing their computers specifically for those tests. This led to a mess where no one knew which computer was actually "better."

Eventually, the computer industry created a "referee" called SPEC (the Standard Performance Evaluation Corporation). SPEC defined a standard set of driving tests (like a 100-mile highway drive, a steep hill climb, and a city commute) so everyone could compare cars fairly.

Now, the quantum computing world is in the same mess. We have different types of quantum computers (some use trapped ions, some use superconducting circuits, some use light). They are all very different, and there is no agreed-upon way to say which one is the "best." This paper argues that we need to create our own "SPEC" for quantum computers to stop the marketing hype and start real progress.


Part 1: Why We Can't Just Copy-Paste Old Rules

The authors explain that we can't just take the old rules for regular computers and slap them onto quantum computers. It's like trying to use a ruler to measure the temperature of a soup.

The Quantum "Magic" (and the Mess):

  • Fragile Qubits: Classical bits are like light switches (On or Off). Quantum bits (qubits) are like spinning coins. They are in a state of "both heads and tails" until you look at them. But if you touch them, or if the room gets too hot, they stop spinning and fall flat. This is called noise.
  • The "Reset" Button: Every time a quantum computer finishes a calculation, it has to be reset to zero. It's like a runner who has to walk all the way back to the starting line after every single lap. This takes time and energy.
  • The "Black Box" Problem: In a classical computer, you can see exactly what the processor is doing. In a quantum computer, the answer is probabilistic. You run the same test 1,000 times, and you get slightly different results each time. You have to guess the "true" answer based on the pattern.

Because of these weird rules, a simple number like "Speed" doesn't mean the same thing for a quantum computer as it does for a laptop.


Part 2: The Five "Golden Rules" for a Good Test

The paper suggests that if we want to test quantum computers fairly, any test (benchmark) must follow five golden rules. Think of these as the rules for a fair Olympic sport:

  1. Relevance: The test must actually matter. If you are testing a race car, don't measure how well it parks. For quantum computers, we need tests that actually solve problems we care about (like simulating new medicines), not just random math puzzles that don't mean anything.
  2. Reproducibility: If I run the test today and you run it tomorrow with the same settings, we should get the same result (see the sketch after this list). If the results change wildly because the machine is "moody," the test is useless.
  3. Fairness: The test shouldn't favor one type of machine over another just because of how it's written. It shouldn't be like a race where one runner has to run through mud while the other runs on a track.
  4. Verifiability: We need to be able to check the work. If a quantum computer says "I solved this," we need a way to prove it didn't just guess. (This is hard in quantum physics because sometimes the answer is too complex for us to check with our current tools).
  5. Usability: The test shouldn't be so expensive or complicated that only the biggest companies can afford to run it. It needs to be accessible.
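
To make rule 2 concrete, here is a minimal sketch, assuming a toy software benchmark: publishing the random seed alongside the settings lets anyone re-create the exact same test instance. The `run_benchmark` function and its numbers are illustrative, not from the paper. (On real quantum hardware the device noise itself cannot be seeded, so reproducibility there means statistically consistent results rather than bit-identical ones.)

```python
# Rule 2, Reproducibility: publish the seed and settings so anyone can
# regenerate the exact same benchmark instance you ran.
import random

def run_benchmark(seed: int, shots: int = 1000) -> float:
    """Toy benchmark: estimate a success rate from seeded pseudo-random samples."""
    rng = random.Random(seed)  # same seed -> same sequence of "measurements"
    successes = sum(rng.random() < 0.7 for _ in range(shots))
    return successes / shots

# Anyone re-running with the published seed gets the identical number.
print(run_benchmark(seed=42))
print(run_benchmark(seed=42))  # prints the same value again
```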

Part 3: The "Toolbox" of Metrics (The Scorecards)

The authors reviewed many different ways people are currently trying to score quantum computers. They found that most of these "scorecards" are flawed.

  • The "Number of Qubits" Trap: Some people say, "I have 100 qubits, you have 50, so I win!" The authors say this is like saying, "I have a bigger engine, so I win," without checking if the engine actually works. A noisy 100-qubit computer might be worse than a clean 10-qubit one.
  • Quantum Volume (QV): This is a popular test that measures the largest "square" circuit (as many layers deep as it is wide in qubits) a computer can run reliably. It's a good start, but it's like measuring a car only by how fast it can drive in a straight line. It ignores how well it handles turns or bad weather.
  • The "Cheating" Problem: Many tests require a supercomputer to check the answer. But if the quantum computer is supposed to be better than a supercomputer, how can we check the answer if the supercomputer can't do it? This creates a paradox.

The paper concludes that no single number can tell us if a quantum computer is good. We need a whole "report card" with many different grades.


Part 4: The Roadmap (How to Fix It)

The authors propose a step-by-step plan to fix the industry:

  1. Match the Test to the Era: We are currently in the "NISQ" era (Noisy Intermediate-Scale Quantum). This is the "prototype" phase. We shouldn't be testing these machines on problems they can't solve yet (like breaking encryption). We should test them on things they can do, to help engineers improve the hardware.
  2. Report "Base" and "Peak" Scores (sketched after this list):
    • Base Score: How does the computer perform with standard, no-frills settings? (Like a car in "Eco Mode"). This allows fair comparison.
    • Peak Score: How does it perform when experts tweak every setting to get the absolute best result? (Like a car in "Race Mode"). This shows the machine's potential.
  3. Create a "Referee" Organization (SPEQC): The authors propose creating a non-profit organization called SPEQC (Standard Performance Evaluation for Quantum Computers).
    • Think of SPEQC as the "Olympic Committee" for quantum computing.
    • They would create the standard tests.
    • They would make sure everyone follows the rules.
    • They would publish the results so consumers (scientists and companies) know which hardware is actually the best.
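
A minimal sketch of what such a published "report card" might look like, with base and peak scores side by side. The test names and numbers below are hypothetical placeholders of mine; the paper proposes the base/peak split, not this exact format.

```python
# A hypothetical report-card format for the base/peak idea: every
# standardized test gets both an out-of-the-box and an expert-tuned score.
from dataclasses import dataclass

@dataclass
class BenchmarkScore:
    test: str     # which standardized test was run
    base: float   # standard, no-frills settings ("Eco Mode")
    peak: float   # expert-tuned settings ("Race Mode")

report_card = [
    BenchmarkScore(test="random-circuit-sampling", base=0.61, peak=0.78),  # made-up numbers
    BenchmarkScore(test="chemistry-simulation",    base=0.42, peak=0.55),  # made-up numbers
]

for score in report_card:
    print(f"{score.test}: base={score.base:.2f}, peak={score.peak:.2f}")
```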

The Final Takeaway

The paper is a call to action. It says: "Stop the marketing games. Let's build a fair, standardized way to measure quantum computers so we can actually make them better."

Just as the car industry needed standard crash tests and fuel economy ratings to become a mature industry, quantum computing needs standard benchmarks to move from "science fiction" to "real-world technology." Without these standards, we risk building the wrong things, wasting money, and getting confused by fake progress.