Training single-electron and single-photon stochastic physical neural networks

This paper proposes and demonstrates the training of novel single-electron and single-photon stochastic physical neural networks, showing that using empirical outputs in the backward pass enables these noise-resilient architectures to achieve over 97% accuracy on MNIST digit classification despite high stochasticity and model uncertainty.

Original authors: Tong Dou, Shiro Kumara, Josh Burns, Ethan Sigler, Parth Girdhar, David Petty, Gerard Milburn, Jo Plested, Matt Woolley

Published 2026-04-14

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ✨ This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Idea: Building Brains Out of Dice and Light

Imagine you are trying to build a super-smart computer brain (a neural network) to recognize handwritten numbers, like the digits on a mail envelope.

Usually, we build these brains using silicon chips that act like perfect, deterministic calculators. If you give them the same input twice, they give the exact same answer twice. But these chips are hungry; they eat a lot of electricity and generate a lot of heat.

This paper asks a bold question: What if we built these brains using the actual, messy, random nature of the physical world? What if, instead of a perfect calculator, our "neurons" were like rolling dice or flipping a coin?

The authors propose a new type of computer called a Physical Neural Network (PNN). In these networks, the "neurons" aren't software code; they are real physical devices (like tiny electronic dots or beams of light) that behave randomly.

The Problem: How Do You Train a Random Brain?

Here is the tricky part: In a normal computer, when the network makes a mistake, the computer can trace exactly why it happened and how to fix it. It's like a student who gets a math problem wrong, looks at the steps, and says, "Ah, I added 2+2 wrong."

But in a Stochastic PNN, the "neuron" is a coin flip.

Input: "Is this a 7?"
Neuron: Flips a coin.
Output: "Heads (Yes)!" or "Tails (No)!"

If the coin lands on "No" and the answer was actually "Yes," the computer can't just look at the math and say, "I should have added differently." The coin is random! You can't calculate the gradient (the direction to fix the mistake) easily because the output is just a random 0 or 1.

It's like trying to teach a student who only answers questions by rolling dice. How do you tell them to study harder if they just got lucky?

The Solution: Three New Types of "Dice Rollers"

The authors designed three specific physical devices to act as these random neurons:

The Single-Electron Transistor (The Tiny Electron Gate):
Imagine a tiny room (a quantum dot) where electrons (tiny charged particles) want to get in. The "pre-activation" (the input signal) is like a gate that controls how likely an electron is to tunnel through.
- The Randomness: Sometimes an electron sneaks in; sometimes it doesn't. It's a game of chance.
- The Output: If an electron is inside, the neuron says "1" (Active). If the room is empty, it says "0" (Inactive).
The Single-Photon Detector (The Light Catcher):
Imagine a very sensitive camera that can detect just one photon (a particle of light).
- The Randomness: You shine a dim light at it. Sometimes the camera clicks (detects a photon), sometimes it doesn't. It's based on the random nature of light.
- The Output: Click = "1", No Click = "0".
The True Single-Photon Neuron (The Quantum Switch):
This is the most futuristic one. Imagine a machine that shoots exactly one photon at a time. This photon hits a special mirror (a beam splitter) that can either let it pass or reflect it.
- The Randomness: The photon has a 50/50 (or adjustable) chance of going one way or the other.
- The Output: If it goes to detector A, the neuron is "1". If it goes to detector B, it's "0".

The Training Strategy: How to Teach the Dice

Since the neurons are random, the authors had to invent new ways to "teach" the network. They tested three main strategies:

1. The "Perfect Knowledge" Method (True Probability)

Imagine you are the teacher, and you know the exact odds of the coin landing on heads (e.g., 70%). Even though the student (the computer) only sees the result of one flip, you, the teacher, know the underlying probability.

How it works: During training, the computer ignores the single random flip and instead uses the mathematical average (the 70% chance) to figure out how to adjust the weights.
Result: This works great, but it requires knowing the exact physics of the device perfectly, which is hard in the real world.

2. The "Empirical" Method (Learning from Samples)

Now, imagine you don't know the odds. You only see the results of the coin flips.

How it works: The computer looks at the actual results (e.g., "It flipped heads 3 times out of 10"). It uses these real-world samples to guess the direction to move. It's like a student who learns by trial and error, keeping a tally of what happened.
Result: This is very practical because you don't need to know the perfect physics. The paper shows that even with very few samples (just a few coin flips), the network can still learn to recognize digits with over 97% accuracy.

3. The "Straight-Through" Method (The Magic Trick)

This is a clever shortcut.

How it works: During the "forward pass" (making a guess), the computer lets the coin flip happen (it's random). But during the "backward pass" (learning from mistakes), the computer pretends the coin flip was smooth and predictable, as if it were a normal calculator.
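Result: It's a bit of a lie, but it still lets the network learn despite the randomness. In the paper's tests, though, using this shortcut in the hidden layer topped out at around 93% accuracy, below the roughly 98% reached with the empirical method.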

The Results: Why This Matters

The authors tested these ideas on a classic task: recognizing handwritten numbers (the MNIST dataset).

High Accuracy: Even though the neurons were "noisy" and random, the network achieved 97%+ accuracy.
Robustness: The network didn't break when the "coins" were biased or the "light" was dim. It handled the noise naturally.
Few Trials Needed: You don't need to flip the coin a million times to get a good answer. Even flipping it just a few times per layer was enough to train the brain effectively.

The Takeaway: Embracing the Chaos

The main message of this paper is that we don't need to fight against the randomness of the physical world to build smart computers. In fact, we can embrace it.

Think of it like this:

Old Way: Build a perfect, rigid, energy-hungry robot that tries to eliminate all mistakes.
New Way: Build a flexible, low-energy system that uses the natural "fuzziness" of electrons and photons as a feature, not a bug.

By treating the randomness as a built-in part of the brain (like a probabilistic switch), we can create computers that are potentially much faster and use a fraction of the energy of today's supercomputers. This opens the door to "green" AI that runs on the fundamental laws of physics rather than brute-force digital calculation.

1. Problem Statement

Deep learning faces escalating computational and energetic costs, motivating the exploration of Physical Neural Networks (PNNs) where computation is performed directly via physical processes. However, a central challenge in PNNs is training in the presence of device non-idealities and intrinsic noise.

The Regime Shift: In conventional analog PNNs, noise is often treated as a small perturbation to deterministic signals. However, in extreme low-energy regimes (using single-electrons or single-photons), noise is fundamental (e.g., charge discreteness, photon shot noise).
The Core Issue: In these regimes, neuron outputs are inherently stochastic and discrete (binary outcomes like 0 or 1) rather than continuous values. Standard backpropagation fails because a sampled binary output is not a differentiable function of the neuron's pre-activation.
The Training Bottleneck: While "physics-aware" training exists, it often assumes access to the underlying activation probabilities or infinite sampling (infinite trials). In real hardware, only a limited number of stochastic samples (trials) are accessible per neuron evaluation. The paper asks: Can stochastic PNNs be reliably trained using only limited sampling data, especially when the output layer operates at a low Signal-to-Noise Ratio (SNR)?

2. Methodology

A. Physical Realizations of Stochastic Neurons

The authors propose and model three specific types of Physical Stochastic Neurons (PSNs) where the activation is a probabilistic switch $h_i \in \{0, 1\}$ that takes the value 1 with probability $p_{PSN}(z_i)$:

Single-Photon Detector (SPD) Neuron: Uses coherent light where photon counting follows a Poisson distribution. The "click" probability (at least one photon detected) is $1 - e^{-\lambda}$, where $\lambda$ is the mean number of detected photons.
Single-Electron Transistor (SET) Neuron: Based on a semiconductor quantum dot in the Coulomb blockade regime. The charge state (occupied/unoccupied) is governed by tunneling rates, resulting in a Fermi-Dirac distribution that maps to a sigmoid activation probability.
True Single-Photon (TSP) Neuron: A novel proposal using a deterministic single-photon source driving a controllable beam-splitter-like interaction (e.g., in an optomechanical system). The stochasticity arises from measuring the occupation of a secondary mode.
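The two activation probabilities stated above (SPD and SET) can be illustrated with a minimal NumPy sketch. The function names are hypothetical, the SET curve is taken as a plain logistic sigmoid with no device-specific scaling, and the mapping from the pre-activation to the mean photon number is left out because this summary does not specify it. The TSP neuron is omitted for the same reason.

```python
import numpy as np

def spd_click_probability(mean_photons):
    """P(at least one photon) for Poisson counts with the given mean."""
    return 1.0 - np.exp(-np.asarray(mean_photons, dtype=float))

def set_occupation_probability(z):
    """Sigmoid occupation probability (assumed unscaled logistic form)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def sample_neuron(p, num_trials, rng):
    """Draw binary outcomes; result has shape (num_trials, *p.shape)."""
    p = np.asarray(p, dtype=float)
    return (rng.random((num_trials,) + p.shape) < p).astype(np.int64)

rng = np.random.default_rng(0)
p_spd = spd_click_probability([0.1, 1.0, 3.0])       # dim to brighter light
p_set = set_occupation_probability([-2.0, 0.0, 2.0])
samples = sample_neuron(p_set, num_trials=5, rng=rng)  # K = 5 trials
empirical_mean = samples.mean(axis=0)                  # estimate of p_set
```

Each row of `samples` is one trial of the layer, and `empirical_mean` is the sample average that a finite-trial training scheme would have access to in place of `p_set`.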

B. Training Strategies

The paper investigates three distinct backpropagation-compatible estimators for training these networks on the MNIST dataset (784-400-10 architecture):

True Probability (TP) Approach (Benchmark):
- Assumes the activation probability $p_{PSN}(z)$ is known or can be computed exactly.
- In the backward pass, gradients are propagated through the expectation value (the smooth probability function) rather than the discrete sample.
- Limitation: Requires knowledge of the underlying physics model and infinite trials to approximate the expectation perfectly.
Empirical Gradient (EG) Estimator:
- Designed for scenarios where $p_{PSN}(z)$ is unknown and only discrete samples are available.
- Replaces the unknown probability $p$ in the gradient derivative with the empirical sample mean ($\hat{h}$) from $K$ trials.
- Requires the activation probability to have an "autonomous representation" (the derivative can be expressed solely as a function of the probability itself, e.g., $p'(z) = p(1-p)$ for sigmoid).
- This acts as an implicit regularizer by embracing the sampling noise.
Straight-Through (ST) Estimator:
- A heuristic that bypasses the stochastic sampling and nonlinearity entirely in the backward pass by substituting a surrogate gradient (typically the identity matrix).
- The paper extends this to the output layer by replacing true probabilities with discrete sampled labels.
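The difference between the three strategies comes down to which local derivative is used for a neuron in the backward pass. The sketch below shows this for a sigmoid-type neuron, using the autonomous form $p'(z) = p(1-p)$ mentioned above. It is a simplified illustration under those assumptions, with hypothetical function names, and not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_samples(z, num_trials, rng):
    """Stochastic forward pass: K binary samples per neuron."""
    p = sigmoid(z)
    return (rng.random((num_trials,) + z.shape) < p).astype(float)

def backward_factor(z, samples, mode):
    """Local derivative used in the backward pass under each strategy."""
    if mode == "TP":   # true probability: needs the device model p(z)
        p = sigmoid(z)
        return p * (1.0 - p)
    if mode == "EG":   # empirical gradient: sample mean stands in for p
        h_hat = samples.mean(axis=0)
        return h_hat * (1.0 - h_hat)
    if mode == "ST":   # straight-through: identity surrogate
        return np.ones_like(z)
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(1)
z = np.array([-1.0, 0.0, 2.0])
samples = forward_samples(z, num_trials=4, rng=rng)
factors = {m: backward_factor(z, samples, m) for m in ("TP", "EG", "ST")}
```

One property of this particular form is worth noting: with a single trial the sample mean is exactly 0 or 1, so the EG factor $\hat{h}(1-\hat{h})$ vanishes. In this sketch at least two trials are needed before the EG factor can be non-zero.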

C. Output Layer Handling

Smoothing: To handle the numerical singularity of Cross-Entropy (CE) loss when a target class is not sampled in finite trials ($K < \infty$), the authors introduce a sample smoothing technique (inspired by label smoothing) to ensure non-zero probabilities for all classes.
Softmax vs. Linear: The paper compares standard Softmax+CE against Linear+MSE formulations, analyzing the trade-offs in shallow vs. deep architectures.
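To see why smoothing is needed, consider an output layer that returns only $K$ sampled class labels. If the target class never appears among them, its empirical probability is zero and the cross-entropy loss is infinite. The sketch below uses a label-smoothing-style mix with the uniform distribution. This formula and the value of `epsilon` are assumptions for illustration, since the paper's exact smoothing rule is not given in this summary.

```python
import numpy as np

def smoothed_class_probabilities(sampled_labels, num_classes, epsilon=0.1):
    """Empirical class frequencies mixed with a uniform distribution.

    Assumed form: (1 - epsilon) * frequency + epsilon / num_classes.
    """
    counts = np.bincount(sampled_labels, minlength=num_classes).astype(float)
    empirical = counts / counts.sum()
    return (1.0 - epsilon) * empirical + epsilon / num_classes

def cross_entropy(probabilities, target):
    """Cross-entropy loss for a single example with an integer target."""
    return -np.log(probabilities[target])

labels = np.array([3, 3, 7, 3, 1])   # K = 5 hypothetical sampled labels
probs = smoothed_class_probabilities(labels, num_classes=10)
loss_unsampled = cross_entropy(probs, target=4)  # class 4 was never sampled
```

Without the smoothing term, `probs[4]` would be exactly zero and the loss for target class 4 would diverge. With it, every class keeps a probability of at least `epsilon / num_classes`, so the loss stays finite.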

3. Key Contributions

Novel Neuron Models: Introduction of the True Single-Photon (TSP) stochastic neuron, offering a pathway to fully quantum stochastic PNNs, alongside detailed models for SET and SPD neurons.
Finite-Sampling Training Framework: Development and validation of the Empirical Gradient (EG) estimator, demonstrating that PNNs can be trained effectively without access to the exact activation probabilities, relying solely on discrete samples.
Robustness to Noise: Demonstration that high-accuracy training is possible even with very few trials per layer (e.g., $K=2$ to $10$), challenging the notion that stochastic PNNs require massive sampling budgets to converge.
Architecture Analysis: Systematic comparison of training configurations (TP vs. EG vs. ST) across hidden and output layers, revealing that combining EG in hidden layers with ST in the output layer yields optimal performance.
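As background for the few-trial claim above, the following standard results for $K$ independent Bernoulli samples with success probability $p$ show how the sample mean and the sigmoid-type EG factor behave at small $K$. These are textbook identities added for illustration, not results taken from the paper.

```latex
\hat{h} = \frac{1}{K}\sum_{k=1}^{K} h^{(k)}, \qquad
\mathbb{E}[\hat{h}] = p, \qquad
\operatorname{Var}[\hat{h}] = \frac{p(1-p)}{K}, \qquad
\mathbb{E}\big[\hat{h}(1-\hat{h})\big] = \Big(1 - \frac{1}{K}\Big)\, p(1-p).
```

The last identity follows from $\mathbb{E}[\hat{h}^2] = \operatorname{Var}[\hat{h}] + p^2$. It indicates that the factor $\hat{h}(1-\hat{h})$ underestimates $p(1-p)$ by a factor of $1 - 1/K$ on average: it is zero at $K=1$, half the true value at $K=2$, and 90% of it at $K=10$.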

4. Results

Accuracy with Few Trials: Using the EG estimator in the hidden layer and the TP approach (infinite trials) in the output layer, the network achieves >97% test accuracy on MNIST with as few as 2–3 trials per hidden neuron.
Fully Empirical Training: Even when applying the EG estimator to both hidden and output layers (fully stochastic forward and backward pass), the network converges to high accuracy (>97%) as the number of trials increases, though it is more sensitive to low trial counts initially.
ST Estimator Performance: Using ST estimators in the hidden layer limits performance (saturating around 93%), whereas using EG in the hidden layer allows the network to reach ~98% accuracy.
Output Layer Sampling: Extending stochastic sampling to the output layer (discrete class labels) is feasible and energy-efficient. With sample smoothing, the performance gap between stochastic and deterministic output layers diminishes significantly as trial counts reach $K \approx 5$–$10$.
Depth vs. Activation: While linear output activations with MSE loss underperform in single-hidden-layer networks, adding a second hidden layer allows them to match the performance of Softmax+CE configurations.

5. Significance

Bridging Theory and Hardware: This work provides a practical framework for training PNNs that acknowledges the fundamental discreteness and stochasticity of quantum and nanoscale devices, moving beyond the "noise-as-perturbation" paradigm.
Energy Efficiency: By demonstrating that high accuracy can be achieved with few trials (low sampling cost), the paper suggests that stochastic PNNs could operate with significantly lower energy consumption than digital counterparts, which require high-precision, deterministic arithmetic.
Scalability: The proposed EG and ST estimators allow for training without requiring a perfect physical model of the device during the backward pass, making the approach more robust to device drift and calibration errors in real-world hardware.
Quantum Potential: The inclusion of the TSP neuron and the discussion of quantum advantages position this work as a foundational step toward Quantum Machine Learning architectures that leverage intrinsic quantum stochasticity rather than fighting against it.

In conclusion, the paper establishes that embracing stochasticity through physics-aware training strategies (specifically the EG estimator) enables the reliable training of deep physical neural networks, even under severe constraints of energy, noise, and limited sampling.