A Hybrid Residue Floating Numerical Architecture with Formal Error Bounds for High Throughput FPGA Computation

This paper introduces the Hybrid Residue Floating Numerical Architecture (HRFNA), a formally verified numerical system combining carry-free residue arithmetic with lightweight exponent scaling that achieves significantly higher throughput, reduced resource usage, and improved energy efficiency on FPGAs compared to IEEE 754 standards while maintaining rigorous, bounded numerical error.

Mostafa Darvishi

Published Wed, 11 Ma

Imagine you are running a massive, high-speed factory that processes numbers. This factory is built on a special kind of machine called an FPGA (Field-Programmable Gate Array), which is like a Lego set for computers that can be rearranged to do specific jobs incredibly fast.

For decades, the factory has used a standard method called Floating-Point Arithmetic (like the math your calculator uses). It's very flexible and handles both huge numbers and tiny fractions. But it's also clunky and slow. Every time two numbers meet, they have to stop, line up their decimal points, check whether the result is too big, and shuffle bits around. It's like two people trying to have a conversation who have to stop after every sentence to check their watches, adjust their glasses, and make sure they are speaking at the same volume. It works, but it wastes a lot of time and energy.

The Problem: The "Traffic Jam"

The author of this paper, Mostafa Darvishi, noticed that this "stop-and-check" process creates a traffic jam in the factory. The machines are so busy organizing the numbers that they can't actually do the math as fast as they could.

The Solution: The "HRFNA" Factory

The paper introduces a new system called HRFNA (Hybrid Residue–Floating Numerical Architecture). Think of HRFNA as a completely redesigned factory floor that uses a clever trick to avoid the traffic jam.

Here is how it works, using a simple analogy:

1. The "Residue" System (The Parallel Assembly Lines)

Imagine you have a number, say 123. In the old system, you write it out digit by digit as 1-2-3 and carry digits from one column to the next whenever a column overflows.
In HRFNA, instead of writing out the whole number, we break it down into three different "views" using three different clocks (moduli). Each clock shows only the remainder left after dividing the number by that clock's size.

  • Clock A says: "It's 3."
  • Clock B says: "It's 1."
  • Clock C says: "It's 5."

The magic is that you can do math on these three views simultaneously and independently.

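  • If you want to multiply two numbers, you multiply their Clock A readings (the "3s"), their Clock B readings (the "1s"), and their Clock C readings (the "5s") at the exact same time, wrapping each result around its own clock.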
  • No carrying over! There is no waiting for one line to finish before the next one starts. It's like having 100 workers painting a wall at the same time, rather than one worker painting the whole thing. This is the "Carry-Free" part.
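
To make the clock idea concrete, here is a minimal Python sketch of residue arithmetic. The moduli 7, 11, and 13 and all function names are assumptions picked for illustration; they are not taken from the paper, and they are not the clocks that produce the 3, 1, 5 readings above. The only requirement is that the moduli share no common factors, so that every number below their product has a unique set of readings.

```python
from math import prod

# Hypothetical moduli for illustration; they must be pairwise coprime.
MODULI = (7, 11, 13)  # unique representation for 0 <= x < 7 * 11 * 13 = 1001

def to_residues(x):
    """Return the 'clock readings' of x: one remainder per modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Add channel by channel; no channel needs data from another."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiply channel by channel; again fully independent per channel."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, MODULI))

def from_residues(r):
    """Recover the integer with the Chinese Remainder Theorem."""
    M = prod(MODULI)
    total = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        total += ri * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return total % M

a = to_residues(123)               # (4, 2, 6)
b = to_residues(8)                 # (1, 8, 8)
print(from_residues(rns_mul(a, b)))  # 984, the same as 123 * 8
```

Each channel only ever touches small numbers, which is why hardware can run the channels side by side. The catch is that the answer is only correct while the true result stays below the product of the moduli (1001 here).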

2. The "Floating" Part (The Volume Knob)

The problem with the "Residue" system is that it's great at math, but it's bad at knowing how big the number actually is. It's like having three people describing a car, but none of them know if it's a toy car or a truck.

HRFNA adds a single "Volume Knob" (an exponent) to the whole group.

  • The three views do the math fast and furious.
  • The Volume Knob just sits there, watching.
  • If the numbers get too huge (like if the factory starts producing giant trucks instead of toy cars), the Volume Knob gets turned down once to shrink everything back to a manageable size.
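
One way to picture the views plus the Volume Knob is as a pair: a tuple of residues that encodes an integer N, and one shared exponent e, with the number's value being N × 2^e. The sketch below assumes that layout; the class name `Hybrid`, the four moduli, and the helper functions are hypothetical and not the paper's actual format. It handles unsigned multiplication only.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical pairwise-coprime moduli (251 is prime, 253 = 11*23,
# 255 = 3*5*17, 256 = 2**8); not taken from the paper.
MODULI = (251, 253, 255, 256)

@dataclass(frozen=True)
class Hybrid:
    residues: tuple  # integer significand N, stored as one remainder per modulus
    exponent: int    # shared scale: value = N * 2**exponent

def encode(n, exponent=0):
    """Pack a non-negative integer significand and a power-of-two exponent."""
    return Hybrid(tuple(n % m for m in MODULI), exponent)

def mul(a, b):
    """Residues multiply channel by channel; the exponents simply add."""
    res = tuple((x * y) % m for x, y, m in zip(a.residues, b.residues, MODULI))
    return Hybrid(res, a.exponent + b.exponent)

def decode(h):
    """Rebuild N with the Chinese Remainder Theorem, then apply the exponent."""
    M = prod(MODULI)
    n = sum(r * (M // m) * pow(M // m, -1, m)
            for r, m in zip(h.residues, MODULI)) % M
    return n * 2.0 ** h.exponent

x = encode(3, -1)         # 3 * 2**-1 = 1.5
y = encode(9, -2)         # 9 * 2**-2 = 2.25
print(decode(mul(x, y)))  # 3.375
```

Notice that `mul` never inspects or shifts the significand: the exponent update is a single integer addition that runs beside the residue channels.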

3. The "Normalization" (The Rare Cleanup)

In the old system, you had to check and adjust the volume knob after every single math problem.
In HRFNA, the Volume Knob only gets adjusted when the numbers get really big.

  • Analogy: Imagine a chef chopping vegetables. In the old system, the chef stops after every chop to measure the pile. In HRFNA, the chef chops, chops, chops, and only stops once every hour to measure the pile and maybe move it to a bigger bowl.
  • This "stop" is called Normalization. Because it happens so rarely, the factory never stops for long.
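
The chef's routine can be sketched as a running product that rescales only when a cheap bookkeeping counter says the numbers might be about to overflow. Everything here is an assumption made for illustration: the constants `CAPACITY_BITS` and `KEEP_BITS`, the bit-counting trigger, and the use of a plain Python integer as a stand-in for the residue channels. It is not the paper's normalization circuit, and the error bound checked at the end belongs to this toy model only.

```python
CAPACITY_BITS = 48  # hypothetical: bits the residue channels can represent
KEEP_BITS = 16      # hypothetical: significand bits kept after normalizing

def running_product(factors):
    """Multiply positive integers (each below 2**32), normalizing only when needed.

    Returns (n, e, normalizations) where the approximate product is n * 2**e.
    """
    n, e = 1, 0         # n stands in for the residue-encoded significand
    bound = 1           # cheap upper bound on n.bit_length()
    normalizations = 0
    for f in factors:
        if bound + f.bit_length() > CAPACITY_BITS:
            # The rare "stop": drop low bits and remember them in the exponent.
            drop = max(n.bit_length() - KEEP_BITS, 0)
            n >>= drop
            e += drop
            bound = n.bit_length()
            normalizations += 1
        n *= f                     # the common case: just keep chopping
        bound += f.bit_length()    # a product's bit length is at most the sum
    return n, e, normalizations

factors = [1000] * 20
n, e, count = running_product(factors)
exact = 1000 ** 20
approx = n * 2 ** e
print(count, "normalizations for", len(factors), "multiplications")
print("relative error:", (exact - approx) / exact)
```

In this model each normalization throws away less than one part in 2^15 of the value, so after k normalizations the relative error stays below k × 2^-15. That is the flavor of guarantee a bounded-error design aims for: the loss is small, it only happens at known points, and it can be added up in advance.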

Why is this a Big Deal?

The paper proves that this new system isn't just a cool trick; it's mathematically sound and safe.

  • Speed: Because the factory doesn't stop to check the volume knob constantly, it runs 2.4 times faster than the old system.
  • Efficiency: It uses 38–55% less space on the chip (like fitting a bigger factory into a smaller building).
  • Accuracy: The author proved that the "Volume Knob" adjustments introduce very tiny errors, and these errors are predictable and bounded. It's not a "guess"; it's a calculated, safe margin of error.
  • Stability: The author tested it on complex tasks like solving physics equations (ODE solvers) and multiplying huge matrices. The system didn't crash or drift off course; it stayed stable for millions of steps.

The Bottom Line

Think of HRFNA as a high-speed train compared to the old stop-and-go bus.

  • The Bus (Floating-Point) stops at every station to pick up and drop off passengers (normalization), making the trip slow and expensive.
  • The Train (HRFNA) runs on parallel tracks (Residue) and only stops at major terminals (Normalization) when absolutely necessary. It gets you to the destination much faster, uses less fuel, and arrives with a predictable schedule.

This new architecture is a promising option for scientific computing, AI, and engineering simulations running on FPGAs, offering the best of both worlds: the speed of carry-free residue math and the flexibility to handle both huge and tiny numbers.