Planted-solution SAT and Ising benchmarks from integer factorization

This paper introduces a scalable and verifiable family of planted-solution benchmarks for SAT solvers and Ising optimization, derived from integer factorization constraints, which exhibit exponential runtime growth relative to the bit-length of the factors.

Original authors: Itay Hen

Published 2026-04-14

CC BY 4.0

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to build a massive, intricate puzzle. Usually, when researchers test how good a computer is at solving puzzles, they either give it a pile of random junk (which is hard to verify) or a puzzle they made up that looks nice but doesn't get harder in a predictable way.

This paper introduces a new, super-organized puzzle based on something we all know: multiplication.

Here is the breakdown of what the authors did, using simple analogies:

1. The Core Idea: The "Reverse Multiplication" Puzzle

Think of multiplication like a factory assembly line. You take two numbers (let's call them Prime A and Prime B), run them through a machine, and out comes a big number (Product N).

The Normal Way: You know A and B, you press "Go," and the machine spits out N. Easy.
The Puzzle (Factorization): You are given only N. You have to figure out what A and B were. For large numbers this is believed to be hard, and that difficulty underpins encryption schemes such as RSA that help protect credit card numbers on the internet.

The authors built a puzzle where the computer has to act like a detective. It has to find the two secret numbers (A and B) that, when multiplied, create N. But here's the trick: The authors know the answer. They planted the solution (A and B) inside the puzzle. This means they can check if the computer is right or wrong instantly.

2. How They Built the Puzzle: The "Carry-Over" Chain Reaction

To turn multiplication into a puzzle a computer can solve, they broke it down into tiny logical steps (like "Is this bit 0 or 1?").

Imagine you are doing long multiplication by hand on a piece of paper. When you multiply two columns, sometimes the result is too big for that column, so you have to "carry over" a number to the next column.

The Magic: In this puzzle, that "carry-over" isn't just a small note; it's a domino effect.
A tiny change in the first column can send a ripple of carries all the way across the page, affecting columns far away.
The authors realized that these "ripples" create a massive, complex web of connections. It's like a game of telephone where a whisper at the start of the line gets amplified and distorted as it travels to the end.

3. Why It's a Great Test (The "Stress Test")

The authors wanted to see how fast modern puzzle-solving programs (called SAT solvers) could solve these puzzles.

The Growth: They found that as they made the numbers slightly bigger (adding just one or two bits, that is, binary digits), the puzzle didn't just get a little harder; it got exponentially harder.
The Analogy: Imagine climbing a ladder. In a normal puzzle, every rung is the same height. In this puzzle, every time you add a rung, the ladder doubles in height.
The Result: When they tested it, the solvers took roughly twice as long for every extra bit (binary digit) added to the secret numbers. This kind of steep, predictable difficulty is what researchers need to stress-test new solvers, including physics-based and quantum optimization hardware.

4. Two Ways to Look at the Puzzle

The paper is special because it translates the same puzzle into two different languages:

SAT (Logic Language): Like a giant "True/False" checklist.
Ising (Physics Language): Like a magnetic puzzle where you have to align tiny magnets (spins) to find the lowest energy state.

This is like giving a mechanic the same car engine problem, but describing it once in English and once in Spanish. It allows researchers to test different types of computers (logic-based vs. physics-based) on the exact same problem to see which one is better.

5. The "Blueprint"

The authors didn't just build one puzzle; they built a machine that can generate as many puzzles as you like.

You tell the machine: "I want a puzzle with 20-bit numbers."
It instantly builds a unique, verifiable puzzle with a known answer.
It's scalable: You can make it easy (small numbers) or impossible (huge numbers) just by turning a dial.

Summary

Think of this paper as the invention of a perfectly calibrated stress test for computers.

Before this, testing computers on factorization was like trying to guess how strong a bridge is by throwing random rocks at it. Now, the authors have built a crane that drops weights of exact, known sizes onto the bridge. They can see exactly when the bridge bends, how much it bends, and if it breaks, all while knowing exactly what the "correct" answer should be.

This helps scientists understand the limits of current technology and prepares us for the future, where we might need to know if our digital locks are truly safe against super-powerful new computers.

1. Problem Statement

The paper addresses a critical gap in the benchmarking of Satisfiability (SAT) solvers and Ising optimization machines. Existing benchmarks generally fall into two categories:

Random Ensembles (e.g., random k-SAT): These offer systematic scalability and hardness near thresholds but lack a known "ground truth" solution, making it impossible to verify if a solver found the optimal answer.
Crafted Instances: These have known solutions but often lack controlled, single-parameter scaling or realistic structural complexity.

The authors propose a new class of planted-solution benchmarks derived from integer factorization. The goal is to create instances where the solution (the prime factors $p$ and $q$) is known by design, the difficulty scales systematically with a single parameter (bit-length $d$), and the structure reflects the deterministic, long-range correlations found in real-world arithmetic problems, rather than random disorder.

2. Methodology

The construction encodes the arithmetic constraints of multiplying two $d$-bit primes ($N = p \times q$) into a Constraint Satisfaction Problem (CSP). The pipeline consists of three stages:

A. Clause Generation from Binary Multiplication

The authors model the standard shift-and-add binary multiplication algorithm.

Partial Products: For every bit pair $(p_i, q_j)$, an AND operation generates a partial product placed in column $k = i+j$.
Column Contraction: Columns with multiple entries are reduced using half-adder logic. Two entries $x, y$ in a column are contracted into a sum ($x \oplus y$, remaining in the column) and a carry ($x \land y$, promoted to the next column).
Pinning: The final bits of the product $N$ are known constants. The single remaining variable in each column is "pinned" to match the corresponding bit of $N$.
Result: This generates a system of AND clauses (carries), XOR clauses (sums), and pinning constraints.
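
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' generator: the function names `build_instance` and `satisfies`, the variable-naming scheme, and the order in which column entries are paired are all assumptions made for this example.

```python
from collections import defaultdict


def build_instance(N, d):
    """Build gates and pins for N = p * q with d-bit factors p and q.

    Returns (gates, pins). gates is a list of (op, out, x, y) tuples with
    op in {"AND", "XOR"}; pins maps a variable name to its required bit.
    """
    gates, pins = [], {}
    columns = defaultdict(list)

    # Partial products: p_i AND q_j lands in column i + j.
    for i in range(d):
        for j in range(d):
            out = f"pp_{i}_{j}"
            gates.append(("AND", out, f"p{i}", f"q{j}"))
            columns[i + j].append(out)

    # Column contraction with half adders, lowest column first.
    fresh = 0
    k = 0
    while k in columns:
        col = columns[k]
        while len(col) > 1:
            x, y = col.pop(), col.pop()
            s, c = f"s{fresh}", f"c{fresh}"
            fresh += 1
            gates.append(("XOR", s, x, y))  # sum stays in column k
            gates.append(("AND", c, x, y))  # carry moves to column k + 1
            col.append(s)
            columns[k + 1].append(c)
        # Pinning: the surviving entry must equal bit k of N.
        pins[col[0]] = (N >> k) & 1
        k += 1
    return gates, pins


def satisfies(gates, pins, p, q, d):
    """Check whether candidate factors (p, q) satisfy every constraint."""
    val = {f"p{i}": (p >> i) & 1 for i in range(d)}
    val.update({f"q{j}": (q >> j) & 1 for j in range(d)})
    for op, out, x, y in gates:  # gates are stored in dependency order
        val[out] = val[x] & val[y] if op == "AND" else val[x] ^ val[y]
    return all(val[v] == bit for v, bit in pins.items())


gates, pins = build_instance(5 * 7, d=3)
print(satisfies(gates, pins, 5, 7, d=3))  # planted solution: True
print(satisfies(gates, pins, 3, 7, d=3))  # wrong factors: False
```

Because each half adder preserves the weighted sum of its column ($x + y = (x \oplus y) + 2(x \land y)$), the single bits left after contraction spell out the product in binary, which is why pinning them to the bits of $N$ accepts exactly the valid factor pairs.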

B. Boolean Preprocessing

Before converting to standard CNF, the system undergoes iterative logical simplification:

Propagation: Fixed variables (from pinning) are substituted throughout the system.
Simplification: Logical rules (e.g., $0 \land x = 0$, $x \oplus x = 0$) are applied to eliminate variables and clauses.
Equivalence Merging: Variables determined to be equal or complementary are merged using a union-find data structure.
Outcome: This significantly reduces the instance size for small $d$, though the reduction ratio decreases as $d$ increases, leaving a "hard core" of residual constraints.
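
The equivalence-merging step can be illustrated with a union-find structure that also tracks whether two variables are equal or complementary. The sketch below is an assumption about how such a structure might look (the class name `ParityUnionFind` and its methods are hypothetical); the summary above only states that a union-find data structure is used.

```python
class ParityUnionFind:
    """Union-find over Boolean variables, tracking equal/complementary links."""

    def __init__(self):
        self.parent = {}
        self.parity = {}  # parity[v] = v XOR parent[v]

    def find(self, v):
        """Return (root, bit) such that v == root XOR bit."""
        if v not in self.parent:
            self.parent[v] = v
            self.parity[v] = 0
        if self.parent[v] == v:
            return v, 0
        root, bit = self.find(self.parent[v])
        self.parity[v] ^= bit      # path compression keeps parity to the root
        self.parent[v] = root
        return root, self.parity[v]

    def union(self, a, b, diff):
        """Record a == b XOR diff. Return False if this contradicts earlier facts."""
        ra, pa = self.find(a)
        rb, pb = self.find(b)
        if ra == rb:
            return (pa ^ pb) == diff
        self.parent[ra] = rb
        self.parity[ra] = pa ^ pb ^ diff
        return True


# An XOR gate z = x XOR y whose output is pinned to 0 forces x == y;
# pinned to 1, it forces x == NOT y.
uf = ParityUnionFind()
uf.union("x", "y", 0)           # from a gate pinned to 0
uf.union("y", "w", 1)           # from a gate pinned to 1
print(uf.find("x"), uf.find("w"))
print(uf.union("x", "w", 0))    # contradiction with the two facts above: False
```

Once variables are merged this way, each equivalence class can be replaced by a single representative (possibly negated), which shrinks the residual system before CNF conversion.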

C. Conversion to DIMACS CNF and Ising Form

CNF: Residual AND and XOR clauses are converted to Conjunctive Normal Form (CNF) using standard encodings (3 clauses for AND, 4 for XOR).
Ising Compilation: The system is mapped to a quadratic Ising Hamiltonian $H(s)$.
- Boolean variables map to spins $s_i \in \{-1, +1\}$.
- Gadgets: AND constraints are mapped to 3-spin quadratic gadgets. XOR constraints (which are cubic in spins) require one auxiliary spin to reduce to quadratic order.
- The ground state of the resulting Hamiltonian corresponds exactly to the planted factorization.
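
The encodings mentioned above can be checked by brute force. The sketch below uses the standard Tseitin clauses for AND (3 clauses) and XOR (4 clauses), and a textbook quadratic penalty for the AND constraint. The spin convention $b = (1 + s)/2$ and the penalty coefficients are assumptions for illustration; the paper's exact gadgets and normalization may differ.

```python
from itertools import product


def and_cnf(z, x, y):
    """Clauses for z <-> (x AND y); literals are signed DIMACS-style integers."""
    return [[-z, x], [-z, y], [z, -x, -y]]


def xor_cnf(z, x, y):
    """Clauses for z <-> (x XOR y)."""
    return [[-z, x, y], [-z, -x, -y], [z, -x, y], [z, x, -y]]


def cnf_satisfied(clauses, assignment):
    """assignment maps a variable index to 0 or 1."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def and_penalty(sx, sy, sz):
    """Quadratic penalty for z = x AND y, evaluated on spins in {-1, +1}.

    Uses x*y - 2*(x + y)*z + 3*z with bits b = (1 + s) / 2 (assumed convention).
    The penalty is 0 when the constraint holds and at least 1 otherwise.
    """
    x, y, z = (1 + sx) // 2, (1 + sy) // 2, (1 + sz) // 2
    return x * y - 2 * (x + y) * z + 3 * z


# Exhaustively confirm that each encoding accepts exactly the valid rows.
for x, y, z in product((0, 1), repeat=3):
    a = {1: x, 2: y, 3: z}
    assert cnf_satisfied(and_cnf(3, 1, 2), a) == (z == (x & y))
    assert cnf_satisfied(xor_cnf(3, 1, 2), a) == (z == (x ^ y))
    spins = [2 * b - 1 for b in (x, y, z)]
    assert (and_penalty(*spins) == 0) == (z == (x & y))
print("all encodings verified")
```

No analogous three-variable quadratic penalty exists for XOR, because the constraint $s_z = -s_x s_y$ is cubic in the spins; that is why the construction introduces one auxiliary spin per XOR.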

3. Key Contributions and Theoretical Analysis

A. Quartic Scaling ($d^4$)

The paper derives exact closed-form expressions for the instance size.

Mechanism: The key driver of complexity is carry cascading. A contraction in column $k$ generates a carry for $k+1$, which increases the entry count in the next column, requiring more contractions, and so on.
Column Population: The number of entries in column $k$ ($m_k$) grows quadratically, peaking at $O(d^2)$ near the middle of the multiplication table.
Total Size: Summing the contractions over $O(d^2)$ active columns results in a total number of constraints and variables scaling as $\Theta(d^4)$.
Significance: This is a direct consequence of the arithmetic structure, distinguishing it from random SAT ensembles where size scales linearly or quadratically with the number of variables.
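
The quartic growth can be reproduced with a simple counting model. The sketch below assumes the plain half-adder scheme described under Column Contraction (each contraction removes one entry from its column and sends one carry to the next); the function name `contraction_count` is hypothetical, and the paper's exact closed-form expressions are not reproduced here.

```python
def contraction_count(d):
    """Count half-adder contractions when multiplying two d-bit numbers."""
    # Column k of the partial-product table holds min(k, 2d-2-k) + 1 entries.
    entries = [min(k, 2 * d - 2 - k) + 1 for k in range(2 * d - 1)]
    total, carries, k = 0, 0, 0
    while k < len(entries) or carries > 0:
        m = (entries[k] if k < len(entries) else 0) + carries
        carries = m - 1  # m entries need m - 1 contractions, each emits a carry
        total += carries
        k += 1
    return total


# Doubling d should multiply the count by roughly 2**4 = 16 for large d.
for d in (8, 16, 32, 64):
    ratio = contraction_count(2 * d) / contraction_count(d)
    print(d, contraction_count(d), round(ratio, 2))
```

In this model the carries keep the column chain alive long after the last partial product, so the number of active columns grows like $d^2$ and the total count like $d^4$, in line with the scaling stated above.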

B. Long-Range Correlations

Unlike random SAT or frustration-based planted instances, these benchmarks possess deterministic, long-range correlations. A change in a low-order bit can propagate carries all the way to the most significant bit (distance $\sim d^2$), creating a complex interaction graph with:

Heterogeneous degree distribution (high-degree spins near the peak column).
Long-range edges connecting distant columns.
Hierarchical community structure based on column contraction.
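
A small arithmetic example makes the carry ripple concrete. The helper `changed_bits` below is a hypothetical illustration, not part of the paper's software: it lists which bits of the product change when a single bit of one factor is flipped.

```python
def changed_bits(p, q, flip=0):
    """Bit positions where p * q changes when bit `flip` of p is toggled."""
    diff = (p * q) ^ ((p ^ (1 << flip)) * q)
    return [k for k in range(diff.bit_length()) if (diff >> k) & 1]


# Two 5-bit primes: 17 * 31 = 527 = 0b1000001111.
# Flipping the lowest bit of 17 gives 16 * 31 = 496 = 0b0111110000.
print(changed_bits(17, 31))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Here a one-bit change in the lowest position of a factor alters every bit of the 10-bit product, including the most significant one.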

C. Dual Representation

The construction provides a unified benchmark available in both CNF (for SAT solvers) and Ising (for classical/quantum optimizers), allowing for direct cross-platform comparison using the same ground truth.

4. Results

Empirical Benchmarking

The authors tested the instances on state-of-the-art Conflict-Driven Clause Learning (CDCL) SAT solvers (Kissat 3.0 and CaDiCaL 1.5) for bit-lengths $d$ ranging from 8 to 27.

Runtime Scaling: The median runtime $T$ grows exponentially with the bit-length $d$.
- Fitted scaling: $T \sim 2^{\beta d}$ where $\beta \approx 1$ .
- Interpretation: Each additional bit roughly doubles the median runtime.
Solver Consistency: Both solvers exhibited nearly identical growth rates, suggesting the difficulty is inherent to the problem structure (carry correlations) rather than solver-specific heuristics.
Difficulty: At $d=27$ ($N \approx 10^{16}$), runtimes reached $\sim 10^4$ seconds, indicating that instances with $d \ge 35$ will serve as rigorous stress tests for modern solvers.
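
An exponent $\beta$ in a law of the form $T \sim 2^{\beta d}$ can be estimated as the slope of $\log_2 T$ against $d$. The sketch below uses ordinary least squares, which is an assumption (the fitting procedure is not described in this summary), and the runtimes are synthetic placeholders generated from the formula itself, not measurements from the paper.

```python
import math


def fit_beta(ds, runtimes):
    """Least-squares slope of log2(runtime) versus bit-length d."""
    ys = [math.log2(t) for t in runtimes]
    n = len(ds)
    mean_d = sum(ds) / n
    mean_y = sum(ys) / n
    num = sum((d - mean_d) * (y - mean_y) for d, y in zip(ds, ys))
    den = sum((d - mean_d) ** 2 for d in ds)
    return num / den


# Synthetic placeholder data (NOT from the paper): T = 1e-3 * 2**d.
ds = list(range(8, 28))
runtimes = [1e-3 * 2.0 ** d for d in ds]
print(round(fit_beta(ds, runtimes), 3))  # 1.0 for this synthetic input
```

With real data, `runtimes` would hold the median solve time at each bit-length, and a fitted slope near 1 would correspond to the runtime doubling with each additional bit.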

Ising Model Properties

The compiled Ising models have a known ground-state energy $E_0$.
The spectral gap (minimum energy penalty for violating a constraint) is bounded below by 2, providing a structured "staircase" penalty landscape.

5. Significance and Impact

New Benchmark Regime: This work introduces a benchmark family that is scalable, verifiable, and structurally rich. It fills the void between random ensembles (hard but unverified) and simple crafted instances (verified but structurally simple).
Solver Stress Testing: The $d^4$ scaling and exponential runtime growth make these instances ideal for probing the limits of CDCL heuristics and quantum annealing/optimization hardware.
Arithmetic Structure: By deriving hardness from the deterministic propagation of arithmetic carries, the paper provides a testbed for understanding how solvers handle long-range dependencies, a feature common in real-world problems but often absent in random benchmarks.
Open Source: The authors provide open-source software to generate these instances, enabling reproducibility and further research.

In summary, the paper successfully transforms the arithmetic problem of integer factorization into a rigorous, scalable, and verifiable benchmark suite for SAT and Ising solvers, demonstrating that the inherent carry-propagation structure of binary multiplication creates a uniquely challenging class of optimization problems.