Torus embeddings

This paper proposes adapting deep learning frameworks to utilize torus embeddings, which leverage native integer overflow for efficient quantization and TinyML deployment while achieving performance and stability comparable to standard hyperspherical embeddings.

Dan Stowell

Published 2026-03-04

Imagine you are trying to organize a massive library of information. In the world of Artificial Intelligence (AI), this "library" is made of embeddings—mathematical maps that turn complex data (like images of cats or songs of birds) into lists of numbers so computers can understand them.

For a long time, AI researchers have been organizing these lists of numbers in two main ways:

  1. The Infinite Room (Euclidean Space): You can put the numbers anywhere, but they can get lost or drift too far apart.
  2. The Giant Bubble (Hypersphere): You force all the numbers to sit on the surface of a giant, invisible ball. This keeps them organized and close together, which is great for finding similar items.

The Problem with the Bubble
While the "Giant Bubble" works well for training AI, it's a nightmare for the actual computers that run these models in the real world (like your phone, a smart thermostat, or a tiny sensor).

Why? Because the numbers on a bubble are messy decimals (like 3.14159...). Most everyday computers, especially the tiny, low-power ones, are built to handle simple whole numbers (integers) very efficiently. When you try to squeeze the messy "bubble" numbers into simple whole numbers, you lose a lot of detail, like trying to fit a high-definition photo into a pixelated 8-bit video game. It's inefficient and wastes the computer's potential.
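To make the "lost detail" concrete, here is a toy illustration (my own, not code from the paper): squeezing one coordinate of a point on the unit circle into a signed 8-bit integer and back, and measuring what the rounding throws away.

```python
def quantize_int8(x: float) -> int:
    """Map a float in [-1, 1] to a signed 8-bit integer."""
    return max(-128, min(127, round(x * 127)))

def dequantize_int8(q: int) -> float:
    """Map the 8-bit integer back to an approximate float."""
    return q / 127

x = 0.70710678  # one coordinate of a point on the unit circle
q = quantize_int8(x)
x_hat = dequantize_int8(q)
print(q)                       # 90
print(abs(x - x_hat) < 0.002)  # True: small, but real, lost detail
```

Multiply that small error across hundreds of coordinates and millions of comparisons, and the "pixelation" adds up.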

The Solution: The Donut (Torus)
Dan Stowell, the author of this paper, suggests a new way to organize the library: The Donut (or Torus).

Think of a video game like Pac-Man or Asteroids. If you walk off the right edge of the screen, you instantly reappear on the left. If you walk off the top, you reappear at the bottom. The world wraps around itself.

In math, this is called a Torus.

  • The Analogy: Instead of a sphere where you have to deal with tricky curves, imagine a flat square grid where the edges are glued together.
  • Why it's great for computers: Computers are already built to handle "wrapping around." If you add two numbers and the result is too big, the computer just "wraps around" to the start (like a clock going from 12 back to 1). This is called overflow arithmetic. It's the fastest, most basic thing a computer can do.

By designing the AI's map to be a "Donut" instead of a "Bubble," the data fits perfectly into the computer's native language (simple whole numbers) without needing complex conversions.
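The wraparound idea can be sketched in a few lines (my own toy illustration, not code from the paper): on an 8-bit "circle" of 256 positions, addition wraps around automatically, and distances respect the wraparound too.

```python
M = 256  # number of positions on the circle (one byte)

def wrap_add(a: int, b: int) -> int:
    """Addition with wraparound -- what 8-bit hardware does for free."""
    return (a + b) % M

def circle_dist(a: int, b: int) -> int:
    """Shortest distance around the circle, going either way."""
    d = abs(a - b) % M
    return min(d, M - d)

print(wrap_add(200, 100))   # 44: walked off the edge, reappeared
print(circle_dist(250, 5))  # 11: near the "seam", still close neighbours
```

On real chips the `% M` is not even an extra instruction; it is just what happens when a byte overflows.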

How They Tried It
The author tested two ways to make this "Donut map":

  1. The "Clifford" Method: This was like trying to fold a piece of paper into a donut shape using complex origami. It worked, but it was numerically unstable, and training sometimes broke down entirely.
  2. The "Pairwise Normalization" Method: This was like taking pairs of numbers and gently twisting them into a circle. This method was stable, easy to train, and performed just as well as the traditional "Bubble" method.
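Here is a hedged sketch of the "pairwise normalization" idea as described above (the function name and details are my own, not the paper's code, and it assumes an even-length vector): group the embedding's numbers into pairs and scale each pair to unit length, so each pair sits on a circle and the whole vector on a torus.

```python
import math

def pairwise_normalize(v, eps=1e-8):
    """Scale each consecutive pair (x, y) to unit length."""
    out = []
    for i in range(0, len(v), 2):
        x, y = v[i], v[i + 1]
        r = math.sqrt(x * x + y * y) + eps  # eps guards against (0, 0)
        out.extend([x / r, y / r])
    return out

v = [3.0, 4.0, -1.0, 1.0]
print([round(z, 3) for z in pairwise_normalize(v)])
# → [0.6, 0.8, -0.707, 0.707]
```

Each pair of outputs satisfies x² + y² ≈ 1, i.e. each pair is a point on its own circle.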

The Results

  • Performance: The "Donut" maps worked just as well as the "Bubble" maps for recognizing cats, dogs, and bird songs.
  • Efficiency: When they compressed the data to be tiny (using very few bits, like 1 or 8 bits), the "Donut" maps held their shape better. They didn't lose as much detail as the "Bubble" maps did.
  • The Future: This is a big deal for TinyML (AI on tiny devices). If you want to put a smart AI on a battery-powered sensor in a forest to listen for birds, you don't have a supercomputer. You have a simple chip. The "Donut" method lets these simple chips run powerful AI models efficiently, because donut-shaped embeddings speak the chip's native language.
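Why do the donut maps "hold their shape" under compression? A toy sketch (my own illustration, not the paper's code): store each circle coordinate as one byte. Because both the byte and the circle wrap around, the quantization error is the same small amount everywhere on the circle, with no awkward edges or poles.

```python
import math

def angle_to_byte(theta: float) -> int:
    """Map an angle in [0, 2*pi) to one of 256 wrapped positions."""
    return round(theta / (2 * math.pi) * 256) % 256

def byte_to_angle(b: int) -> float:
    """Map the byte back to an angle."""
    return b / 256 * 2 * math.pi

theta = 6.2  # just below 2*pi, i.e. right next to the "seam"
b = angle_to_byte(theta)
err = abs(byte_to_angle(b) - theta)
err = min(err, 2 * math.pi - err)      # wraparound-aware error
print(b)                               # 253
print(err < math.pi / 256)             # True: at most half a step off
```

A sphere squeezed into integers has no such uniform guarantee; the torus does, because the integer grid and the shape wrap the same way.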

In a Nutshell
The paper argues that instead of forcing AI data into a complex, curved shape (a sphere) that doesn't fit well with simple computer chips, we should shape the data like a donut. This shape naturally fits the way computers count and wrap around, making AI faster, more efficient, and perfect for running on the small, everyday devices that surround us.
