Predictive Coding Graphs are a Superset of Feedforward Neural Networks

This paper demonstrates that predictive coding graphs constitute a mathematical superset of feedforward neural networks, thereby strengthening their theoretical foundation in machine learning and highlighting the importance of network topology.

Björn van Zwol

Published Mon, 09 Ma

Here is an explanation of the paper "Predictive Coding Graphs are a Superset of Feedforward Neural Networks" using simple language and creative analogies.

The Big Idea: From a Straight Line to a Web

Imagine you have a standard Feedforward Neural Network (FNN). Think of this like a conveyor belt in a factory.

  • Raw materials (data) enter at the start.
  • They move in one direction: Station A → Station B → Station C.
  • At each station, a worker (a neuron) does a specific job and passes the item to the next.
  • Once the item reaches the end, you get a final product (a prediction).
  • The Catch: If a mistake happens at Station C, the only way to fix it is to stop the whole line, send a message all the way back to Station A to tell them what went wrong, and then restart. This "sending the message back" is called Backpropagation. It works well, but it's rigid and requires the whole factory to stop and talk to each other in a very specific order.
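The conveyor belt and its "message sent back" can be sketched in a few lines of NumPy. This is a generic two-station toy of my own (with a squared error at the end), not the paper's code: data moves strictly forward, and the fix travels back station by station in reverse.

```python
import numpy as np

# A generic two-station "conveyor belt" (toy illustration, not from the paper).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # Station A -> Station B
W2 = rng.normal(size=(1, 3))   # Station B -> Station C
x = rng.normal(size=2)         # raw materials entering the line
target = np.array([1.0])       # what the final product should look like

# Forward: items move in one direction only.
h = W1 @ x                     # Station B's output
y = W2 @ h                     # the final product

# Backward: the mistake is found at the end, then the message travels
# all the way back down the line, blaming the last station first.
err = y - target                    # gradient of 0.5 * ||y - target||^2
grad_W2 = np.outer(err, h)          # how Station C should change
grad_W1 = np.outer(W2.T @ err, x)   # blame passed back to Station A
```

Note the rigid ordering: `grad_W1` cannot be computed until `err` has been carried back through `W2`, which is exactly the "stop the whole line" property the analogy describes.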

Now, meet the paper's hero: Predictive Coding Graphs (PCGs).
Think of a PCG not as a conveyor belt, but as a busy, chaotic open-plan office.

  • Everyone (every node) is talking to everyone else.
  • Instead of just waiting for the next item, every worker is constantly guessing what the next item should look like based on what they see.
  • If your guess is wrong, you feel a "tension" (error). You adjust your guess until the tension goes away.
  • The Magic: In this office, you don't just talk forward. You can talk backward, sideways, or even talk to yourself. The whole office adjusts together to find the best solution.
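One worker's behaviour can be written as gradient descent on its own local "tension". The toy below is my own linear sketch, not the paper's notation: a single node holds a guess, feels one tension from the neighbour that predicts it and another from the neighbour it predicts, and nudges its guess until the two balance. No global coordinator is involved.

```python
# One "worker" in the office (a linear toy of my own, not the paper's notation).
# The node holds a guess x, feels tension from the neighbour that predicts it
# (err_up) and from the neighbour it predicts (err_down), and moves its guess
# downhill on the total squared tension until both are balanced.
def settle(pred_in, obs, w=2.0, lr=0.05, steps=500):
    x = 0.0
    for _ in range(steps):
        err_up = x - pred_in              # "my neighbour expected pred_in of me"
        err_down = obs - w * x            # "I expected my neighbour to be w * x"
        x -= lr * (err_up - w * err_down) # gradient of 0.5*(err_up^2 + err_down^2)
    return x
```

With `pred_in=1` and `obs=4` the node settles near 1.8, the compromise `(pred_in + w*obs) / (1 + w**2)` where the two tensions cancel, purely through local adjustments.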

The Paper's Two Big Discoveries

The author, Björn van Zwol, proves two main things that connect these two very different worlds.

1. The "Quiet Time" Discovery (Testing Phase)

The Analogy: Imagine the factory (FNN) and the office (PCG) are both trying to assemble a toy.

  • During Training (Learning): The office is loud. People are arguing, guessing, and adjusting their positions to figure out how to build the toy. The factory is quiet, just following a strict script.
  • During Testing (Inference): This is when the toy is finished and you just need to show it to a customer.
    • The paper proves that when the office (PCG) is just "showing off" the final toy, it behaves exactly like the factory (FNN). The chaotic guessing stops, and the workers line up in a straight line, passing the toy down the chain.
    • Why it matters: This means PCGs are just as good at making predictions as the standard networks we use today. They aren't a "worse" alternative; they are a "different" one that ends up doing the same job perfectly.
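This "quiet time" claim can be checked numerically on a tiny linear toy (my own sketch, with only the input clamped, not the paper's experiments): let the office settle by reducing its total prediction-error energy, then compare where the hidden and output nodes end up against a plain feedforward pass.

```python
import numpy as np

# Tiny linear toy (my own illustration): two weight matrices, input clamped.
rng = np.random.default_rng(0)
W1 = 0.5 * rng.normal(size=(3, 2))
W2 = 0.5 * rng.normal(size=(1, 3))
x0 = rng.normal(size=2)                 # the clamped input

# The factory: one strict forward sweep.
ff1 = W1 @ x0
ff2 = W2 @ ff1

# The office: start the nodes anywhere and let them settle by descending
# the energy E = 0.5*||x1 - W1 x0||^2 + 0.5*||x2 - W2 x1||^2.
x1 = rng.normal(size=3)
x2 = rng.normal(size=1)
for _ in range(2000):
    e1 = x1 - W1 @ x0                   # tension at the hidden node
    e2 = x2 - W2 @ x1                   # tension at the output node
    x1 -= 0.1 * (e1 - W2.T @ e2)        # each node reduces its own tension
    x2 -= 0.1 * e2

# After settling, the office reproduces the factory's answer.
print(np.allclose(x1, ff1, atol=1e-6), np.allclose(x2, ff2, atol=1e-6))
```

Both checks come out `True`: with the input clamped and nothing else constrained, zero tension everywhere is achievable, and the settled state is exactly the feedforward pass.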

2. The "Master Blueprint" Discovery (The Superset)

The Analogy: Imagine the factory (FNN) is a specific type of house: a bungalow. It has a front door, a hallway, and a back door. It's simple and straight.

  • The paper proves that the PCG is not just another house; it is the entire city of architecture.
  • The PCG is a universal blueprint that can build any house.
  • If you want a bungalow (a standard FNN), you just take the PCG blueprint and tape over the windows and doors that aren't needed. You "mask" the extra connections.
  • But because the PCG is a "superset," it can also build a house with a spiral staircase (loops), a house with a secret tunnel to the back (backward connections), or a house where the kitchen talks to the bedroom (lateral connections).

The "Superset" Concept:

  • FNNs are a small circle inside a big circle.
  • PCGs are the big circle.
  • Every FNN is a PCG, but not every PCG is an FNN.
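The circle picture can be made concrete with a connection mask (a toy encoding of my own, not the paper's construction): describe who may talk to whom as a boolean matrix, receivers as rows and senders as columns. The full PCG allows every entry; a layered FNN is what remains after taping most of them over.

```python
import numpy as np

# Six workers: 0-1 input, 2-3 hidden, 4-5 output (toy encoding of my own).
# mask[i, j] == True means "node i listens to node j".
n = 6
pcg_mask = np.ones((n, n), dtype=bool)   # the big circle: anything goes,
                                         # including loops, backward and
                                         # sideways connections

fnn_mask = np.zeros((n, n), dtype=bool)  # the small circle: a bungalow
fnn_mask[np.ix_([2, 3], [0, 1])] = True  # hidden listens to input
fnn_mask[np.ix_([4, 5], [2, 3])] = True  # output listens to hidden

# Every FNN connection is also a PCG connection...
print(np.all(fnn_mask <= pcg_mask))           # True
# ...but the PCG keeps connections no FNN has, e.g. input listening to output:
print(pcg_mask[0, 4] and not fnn_mask[0, 4])  # True
```

That one-way containment is the whole "superset" claim in miniature: every FNN is a masked PCG, while the unmasked entries give the PCG houses no bungalow can be.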

Why Should We Care? (The "So What?")

1. It's More "Biological" (Like the Human Brain)
Our brains don't work like conveyor belts. Neurons in the brain talk to each other in loops and webs. Backpropagation (the standard training method) is mathematically clever but biologically implausible: there is no known mechanism for neurons to send a precise error signal backward along the exact same connections they use to fire forward. PCGs work more like how our brains actually seem to process information: by constantly predicting and correcting errors in real time, using only local signals.

2. It Unlocks New Architectures
Because PCGs allow "backward" and "sideways" connections, they can solve problems that standard networks struggle with.

  • Skip Connections: You might have heard of "ResNets" (Residual Networks), which are famous for making very deep networks trainable. They use "skip connections" (jumping over layers). The paper shows that these are just a specific, easy-to-make version of a PCG.
  • The Future: If we can train these weird, loop-filled networks (which standard methods can't do easily), we might discover new types of AI that are more robust, efficient, or creative.
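To see how a skip connection is just "one more unmasked entry", here is a scalar toy of my own (not the paper's construction): all three nodes share one big weight matrix, and the architecture is chosen purely by which entries the mask leaves open.

```python
import numpy as np

# Three scalar nodes: 0 input, 1 hidden, 2 output (toy of my own devising).
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3))               # one big "city" weight matrix

chain_mask = np.zeros((3, 3))
chain_mask[1, 0] = chain_mask[2, 1] = 1.0 # plain bungalow: in -> hid -> out

skip_mask = chain_mask.copy()
skip_mask[2, 0] = 1.0                     # ResNet-style tunnel: in -> out too

def forward(mask, x_in, depth=2):
    """Propagate the clamped input through the masked connections."""
    x = np.array([x_in, 0.0, 0.0])
    for _ in range(depth):                # one sweep per layer of depth
        x = (W * mask) @ x
        x[0] = x_in                       # the input stays clamped
    return x[2]

print(forward(chain_mask, 1.0))           # W[2,1] * W[1,0]
print(forward(skip_mask, 1.0))            # W[2,1] * W[1,0] + W[2,0]
```

The only difference between the two architectures is the single entry `mask[2, 0]`: the chain's output is the product of the two forward weights, while the skip version adds the direct input-to-output term on top.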

3. The Trade-off
There is a catch. Running a "conveyor belt" (FNN) is fast. Running a "busy office" (PCG) where everyone talks to everyone takes more time and computing power. The paper admits this: PCGs are slower to run right now. But, the author suggests that the extra flexibility might be worth the extra time, just like a complex Swiss Army knife is more useful than a simple knife, even if it takes longer to open.

Summary in One Sentence

This paper proves that Predictive Coding Graphs are the "Swiss Army Knife" of neural networks: they include all the standard "knives" (Feedforward Networks) we use today as simple special cases, but they also have the potential to be much more complex, flexible, and brain-like tools for the future of AI.