Unfair Sampling of Quantum Annealing in Weighted Graph Bipartitioning Problems

This study demonstrates that increasing the penalty coefficient in weighted graph bipartitioning problems generally improves the sampling fairness of quantum annealing across most instances, despite the trade-off of reduced ground-state probability under practical conditions.

Original authors: Shunta Ide, Shu Tanaka

Published 2026-04-14

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: Finding the Best Way to Split a Group

Imagine you are a party planner trying to split a group of 12 friends into two equal teams for a game. You want the teams to be perfectly balanced (50/50), but you also want to minimize the number of arguments between people who hate each other.

This is a combinatorial optimization problem. In the real world, there might be many different ways to split the group that result in the exact same "perfect" score. These are called degenerate ground states.

The Problem:
You have a super-smart computer (a Quantum Annealer) that is supposed to find these perfect splits. Ideally, if there are four perfect ways to split the group, the computer should find each of those four ways exactly 25% of the time. This is called fair sampling.

However, the researchers found that the computer is biased. It tends to pick two of the perfect splits 40% of the time each, and the other two only 10% of the time. It's like a referee who secretly prefers one team over another, even though both teams are equally good. This is unfair sampling.

The Solution: The "Penalty" Knob

To force the computer to respect the "equal team size" rule, scientists use a trick called the Penalty Method.

Think of the computer's goal as a hiker trying to find the lowest point in a valley (the best solution).

The Objective: The hiker wants to find the deepest valley (minimize arguments).
The Constraint: The hiker must stay on a specific path (keep teams equal size).

If the hiker steps off the path (unequal teams), they get hit with a "penalty." In the computer's code, this is a Penalty Coefficient (let's call it the Penalty Knob).

Low Knob: The penalty for stepping off the path is a gentle tap on the wrist.
High Knob: The penalty is a giant boulder dropped on your foot.

What the Researchers Discovered

The team (Shunta Ide and Shu Tanaka) asked a simple question: What happens to the "fairness" of the computer's choices if we turn up the Penalty Knob?

They tested this using a simulation and a real quantum computer made by D-Wave. Here is what they found:

1. The Trade-Off (The "Speed vs. Accuracy" Dilemma)

When they turned the Penalty Knob up high:

Good News: The computer became much more fair. It started picking all the perfect solutions with equal probability. It stopped favoring the "easy" ones.
Bad News: On short runs, the computer became less likely to land on any perfect solution at all. Because the penalty was so heavy, the "landscape" became very steep, making it harder for the computer to reach the bottom of the valley in the time allowed.

Analogy: Imagine you are trying to find a specific key in a dark room.

If you turn the lights up just a little (low penalty), you might find the key quickly, but you might only find the one that looks the most like a key, ignoring the others.
If you turn the lights up to maximum brightness (high penalty), you can see all the keys clearly and pick them randomly (fairness). But, the room is now so bright and confusing that it takes you much longer to find any key.

2. The Real-World Test

They also ran the problem on the actual D-Wave machine. Even though real hardware is noisy and imperfect, the same overall pattern held: turning up the penalty made the sampling fairer, while making a perfect solution somewhat less likely to turn up.

3. The "Mostly True" Rule

They ran randomly generated test problems at different group sizes (from 4 friends up to 12).

The Result: For the larger groups (8 to 12), turning up the penalty knob made the sampling steadily fairer in about 70% to 75% of cases. For the smallest groups the rate was higher still.
The Catch: It didn't work for every single case. Sometimes, raising the penalty made fairness go up and down rather than steadily up. But for most cases, the "High Penalty = Fairer Results" rule held up.

Why Does This Matter?

Usually, scientists tune the Penalty Knob just to make sure the computer follows the rules (e.g., "Don't give me a team with 3 people and 9 people"). They treat it like a simple on/off switch for rules.

This paper shows that the Penalty Knob is actually a dial for fairness.

If you just need one answer, you might keep the knob low to get a result fast.
If you need to understand the variety of all possible solutions (like in drug discovery or financial modeling where you need to see all options), you should turn the knob up to ensure you aren't missing hidden solutions just because the computer is biased.

The Bottom Line

Quantum computers are great at solving hard puzzles, but they have a habit of being "picky" and ignoring some perfect solutions. This study found a simple way to fix that bias: Increase the penalty for breaking the rules.

It's like telling a biased judge, "If you don't pick from the full list of qualified candidates, you get in big trouble." The judge might take longer to make a decision, but when they do, they will pick from the whole list fairly.

Future Work: The researchers admit they don't fully understand why this happens yet. They plan to dig deeper into the physics to see if they can design even better ways to make these quantum computers fair, perhaps by changing the "rules of the game" entirely rather than just adding penalties.

1. Problem Statement

Quantum Annealing (QA) is a leading approach for solving combinatorial optimization problems. However, a critical limitation known as unfair sampling persists: even in the adiabatic limit (sufficiently long annealing times), degenerate ground states (multiple optimal solutions) are not sampled with equal probability. This bias hinders applications requiring solution diversity, such as combinatorial counting and diversity-aware optimization.

This issue is particularly acute in constrained combinatorial optimization problems, where constraints are typically enforced via a penalty method. In this method, a penalty term ( $H_{const}$ ) is added to the objective Hamiltonian ( $H_{obj}$ ) with a coefficient $\mu$ . While the penalty coefficient is traditionally tuned to ensure feasibility (i.e., finding a valid solution), its specific influence on sampling fairness among degenerate ground states remains poorly understood.

The authors investigate this gap using the Weighted Graph Bipartitioning Problem (GBP) as a testbed. GBP involves partitioning a graph's vertices into two equal-sized subsets to minimize the cut weight, a problem with a hard equality constraint.

2. Methodology

The study combines numerical simulation with experimental validation on quantum annealing hardware:

Hamiltonian Formulation:
The problem is mapped to an Ising Hamiltonian. The total problem Hamiltonian is defined as:
$H_p = H_{obj} + \mu H_{const}$
where $H_{obj}$ represents the cut weight and $H_{const} = \left(\sum_i \sigma_i^z\right)^2$ enforces the equal-partition constraint. The penalty coefficient is decomposed as $\mu = \mu_{opt} + \mu_+$ , where $\mu_{opt}$ is the minimum required for feasibility, and $\mu_+$ is the additional penalty.
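To make the formulation concrete, here is a minimal classical sketch that evaluates $H_{obj} + \mu H_{const}$ on every spin configuration of a toy graph and lists the degenerate ground states. The cut-weight form $\sum_{i<j} w_{ij}(1 - s_i s_j)/2$, the 4-vertex ring with unit weights, and the function names are assumptions made for illustration; the paper's exact encoding and instances may differ.

```python
import itertools
import numpy as np

def gbp_energy(spins, weights, mu):
    """Classical energy H_obj + mu * H_const for one spin configuration."""
    s = np.asarray(spins)
    # Assumed cut-weight form: edge (i, j) contributes w_ij when s_i != s_j.
    # np.triu(..., k=1) counts each undirected edge once.
    cut = np.sum(np.triu(weights, k=1) * (1 - np.outer(s, s)) / 2)
    penalty = np.sum(s) ** 2  # zero only when the two sides are equal in size
    return cut + mu * penalty

def ground_states(weights, mu):
    """Brute-force the minimum energy and all configurations that reach it."""
    n = weights.shape[0]
    energies = {
        spins: gbp_energy(spins, weights, mu)
        for spins in itertools.product((-1, 1), repeat=n)
    }
    e_min = min(energies.values())
    return e_min, [s for s, e in energies.items() if np.isclose(e, e_min)]

# Hypothetical toy instance: a 4-vertex ring with unit edge weights.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    W[i, j] = W[j, i] = 1.0

for mu in (0.0, 1.0):
    e_min, states = ground_states(W, mu)
    print(f"mu={mu}: E_min={e_min}, ground states={states}")
```

With $\mu = 0$ the lowest-energy configurations put every vertex on one side, which violates the constraint. With a large enough $\mu$ the ground states are balanced splits, and this toy instance has several of them with the same energy, which is exactly the degenerate situation where fairness becomes a question.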
Numerical Simulations:
- The time-dependent Schrödinger equation was solved using QuTiP to simulate QA dynamics.
- Single Instance Analysis: A specific 6-spin GBP instance was analyzed to characterize the relationship between annealing time ( $T$ ), penalty coefficient ( $\mu_+$ ), and sampling distribution.
- Scaling Analysis: 100 randomly generated instances were created for system sizes $N \in \{4, 6, 8, 10, 12\}$ . The study used a normalized penalty parameter $\lambda \in [0, 1]$ to cover the full range of $\mu$ .
- Metric: Sampling fairness was quantified using Shannon Entropy ( $S$ ) of the ground-state probability distribution. $S = \log_2 D$ (where $D$ is degeneracy) indicates perfect fairness; lower values indicate bias.
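As a small worked example of the fairness metric, the sketch below computes the Shannon entropy for a perfectly fair distribution over four ground states and for a biased one (40%, 40%, 10%, 10%). It assumes the probabilities are renormalized over the ground states before the entropy is taken; the function name is hypothetical and the paper's exact normalization may differ.

```python
import math

def sampling_entropy(probs):
    """Shannon entropy in bits of a distribution over degenerate ground states.

    Assumption: probabilities are renormalized to sum to 1 over the ground
    states, so the maximum value is log2(D) for degeneracy D.
    """
    total = sum(probs)
    normalized = [p / total for p in probs if p > 0]
    return -sum(p * math.log2(p) for p in normalized)

fair = sampling_entropy([0.25, 0.25, 0.25, 0.25])
biased = sampling_entropy([0.4, 0.4, 0.1, 0.1])
print(f"fair:   {fair:.3f} bits")    # log2(4) = 2 bits
print(f"biased: {biased:.3f} bits")  # about 1.722 bits, below the maximum
```

The gap between the measured entropy and $\log_2 D$ is what gives a single number for how unfair the sampling is.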
Experimental Validation:
- Experiments were conducted on the D-Wave Advantage2 System 1.13.
- Techniques included Parallel Annealing (using 400 distinct embeddings) and Spin Reversal Transformation (SRT) (10 gauge transformations) to mitigate hardware noise and systematic errors.
- The annealing time was set to $200\,\mu\mathrm{s}$ (10x the default) to approach the adiabatic regime.
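The idea behind the spin reversal transformation can be sketched in a few lines: a random sign (gauge) $g_i \in \{-1, +1\}$ is applied to each spin, the problem coefficients are flipped accordingly, and samples are mapped back by multiplying with $g$. The sketch below is a generic NumPy illustration with hypothetical function names and a general Ising form $\sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j$; it is not D-Wave's API and not necessarily how the authors implemented it.

```python
import numpy as np

def spin_reversal_transform(h, J, rng):
    """Apply a random gauge g: h_i -> g_i * h_i, J_ij -> g_i * g_j * J_ij."""
    g = rng.choice([-1, 1], size=len(h))
    return g, h * g, J * np.outer(g, g)

def ising_energy(s, h, J):
    """Ising energy; J is upper triangular so each pair is counted once."""
    return float(h @ s + s @ J @ s)

rng = np.random.default_rng(0)
n = 5
h = rng.normal(size=n)
J = np.triu(rng.normal(size=(n, n)), k=1)

g, h_t, J_t = spin_reversal_transform(h, J, rng)
s = rng.choice([-1, 1], size=n)

# The configuration g * s in the transformed problem has the same energy as
# s in the original problem, so the energy landscape is unchanged.
print(ising_energy(s, h, J), ising_energy(g * s, h_t, J_t))
```

Because the logical problem is unchanged while the physical signs on each qubit and coupler differ from gauge to gauge, averaging over several gauges tends to cancel sign-dependent hardware biases.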

3. Key Contributions

Systematic Investigation of Penalty Coefficients: The paper is the first to systematically analyze how the penalty coefficient in constrained QA affects sampling fairness, moving beyond its traditional role of merely ensuring feasibility.
Identification of a Regime-Dependent Trade-off: The study reveals that the effect of the penalty coefficient on fairness depends on the annealing regime:
- Short Annealing Times: Increasing the penalty improves fairness but reduces the total ground-state probability (a trade-off).
- Near-Adiabatic Regimes: Increasing the penalty improves fairness without sacrificing ground-state probability.
Hardware Verification: The authors confirmed that unfair sampling persists on actual D-Wave hardware and that the qualitative trend of increasing fairness with higher penalties holds true, despite thermal noise.
Scaling Analysis: A comprehensive scaling study up to 12 spins demonstrates that while the trend is not universal, it holds for the majority of instances.

4. Key Results

Single Instance Behavior:
- For a 6-spin instance with 4 degenerate ground states, increasing $\mu_+$ suppressed the sampling bias.
- At short annealing times ( $T \lesssim 10^3$ ), higher $\mu_+$ increased entropy (fairness) but decreased the total probability of finding any ground state ( $P_{GS}$ ).
- At long annealing times ( $T \approx 10^4$ ), $P_{GS}$ remained near unity for all $\mu_+$ , and increasing $\mu_+$ monotonically increased entropy, effectively eliminating the trade-off.
D-Wave Hardware Results:
- Unfair sampling was observed on the D-Wave system, though the magnitude was lower than in simulations (likely due to thermal relaxation and noise).
- The entropy initially increased slightly with $\mu_+$ , dipped at $\mu_+ \approx 0.4$ , and then increased again. Crucially, the states that were biased in simulation remained the biased states on hardware, confirming the structural nature of the bias.
- As $\mu_+$ increased, the sampling distribution became closer to uniform, even as $P_{GS}$ decreased.
Scaling Analysis (N = 4 to 12):
- For $N=4$ , all instances showed fair sampling regardless of $\lambda$ .
- For larger $N$ , the behavior became complex, with some instances showing non-monotonic entropy changes.
- Monotonic Increase Rate: Table I shows the fraction of instances where entropy increased monotonically with the penalty coefficient:
  - $N=4$ : 100%
  - $N=6$ : 91%
  - $N=8$ : 74%
  - $N=10$ : 72%
  - $N=12$ : 74%
- The rate saturates around 70–75% for larger systems, indicating that increasing the penalty coefficient improves fairness in the majority of cases, even if not universally.
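A monotonic increase rate of this kind could be computed as in the sketch below, which checks each instance's entropy curve over increasing $\lambda$ and reports the fraction that never decreases. The non-strict comparison, the numerical tolerance, and the sample curves are all assumptions for illustration; they are not the paper's data or its exact criterion.

```python
def is_monotonic_increasing(entropies, tol=1e-9):
    """True if the entropy never drops (beyond tol) as the penalty grows."""
    return all(b >= a - tol for a, b in zip(entropies, entropies[1:]))

def monotonic_rate(curves, tol=1e-9):
    """Fraction of instances whose entropy curve is monotonically increasing."""
    flags = [is_monotonic_increasing(c, tol) for c in curves]
    return sum(flags) / len(flags)

# Hypothetical entropy curves, one list per instance, ordered by lambda.
curves = [
    [1.20, 1.45, 1.70, 1.90],  # steadily fairer
    [2.00, 2.00, 2.00, 2.00],  # already fair everywhere
    [1.50, 1.62, 1.55, 1.80],  # dips in the middle: not monotonic
]
rate = monotonic_rate(curves)
print(f"monotonic increase rate: {rate:.0%}")
```

Counting a flat curve as monotonic is consistent with $N=4$ being reported at 100% even though those instances are fair at every $\lambda$, but that reading is an inference rather than something the summary states.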

5. Significance and Future Directions

New Perspective on Penalty Design: The findings suggest that the penalty coefficient should not be tuned solely for feasibility but also as a control parameter for sampling diversity.
Theoretical Implications: The results highlight a gap in theoretical understanding, particularly regarding degenerate perturbation theory in constrained settings. The mechanism driving the bias remains an open question.
Future Work: The authors propose investigating the role of noise/thermal relaxation quantitatively, exploring larger degeneracies, extending the analysis to other constrained problems, and utilizing structure-aware drivers (e.g., XY-mixers or QAOA) that inherently preserve constraints to potentially eliminate unfair sampling.

In conclusion, this work provides empirical evidence that increasing the penalty coefficient is a viable strategy to mitigate unfair sampling in constrained quantum annealing, offering a practical lever for improving solution diversity in real-world applications.