Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet

The paper introduces Covenant-72B, a 72-billion-parameter language model pre-trained on 1.1 trillion tokens through the largest permissionless, globally distributed training collaboration to date. It demonstrates that open, blockchain-supported participation can achieve performance competitive with centralized training at unprecedented scale.

Joel Lidin, Amir Sarfi, Erfan Miahi, Quentin Anthony, Shivam Chauhan, Evangelos Pappas, Benjamin Thérien, Eugene Belilovsky, Samuel Dare

Published Tue, 10 Ma

Imagine you want to build a giant, super-smart robot brain (a Large Language Model, or LLM) that knows everything in the world. Usually, only the richest companies can do this because they need to buy thousands of expensive super-computers and hook them all together in a single, massive data center. It's like trying to build a skyscraper, but you can only use one crane, and that crane costs a billion dollars.

The Covenant-72B paper is about a radical new way to build that skyscraper.

Instead of one giant crane, they used thousands of small, ordinary cranes scattered all over the world, connected by the regular internet. And the best part? Anyone could bring their crane to the job site, no permission needed.

Here is how they pulled off this massive, decentralized construction project, explained simply:

1. The Problem: The "Internet Traffic Jam"

If you try to connect thousands of computers over the regular internet to train a giant AI, you hit a wall. The internet is slow and unreliable compared to the super-fast cables inside a data center.

  • The Analogy: Imagine trying to coordinate a dance routine with 20,000 people spread across different countries. If everyone has to shout their moves to everyone else every single second, the noise and lag would make the dance impossible. The computers would spend more time waiting for messages than actually learning.
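To see why the "traffic jam" is real, here is a back-of-envelope calculation. The 72-billion-parameter count comes from the paper; the fp16 precision and the 100 Mbit/s home connection are illustrative assumptions, not figures from the paper:

```python
# Back-of-envelope: why naively sharing full model updates over home
# internet fails. Assumed numbers: 72B parameters, 2 bytes each (fp16),
# and a 100 Mbit/s home connection.
params = 72e9
bytes_per_param = 2                            # fp16 (assumption)
payload_gb = params * bytes_per_param / 1e9    # ~144 GB per full sync

link_mbps = 100                                # home link (assumption)
seconds = payload_gb * 8 / (link_mbps / 1e3)   # GB -> gigabits, / Gbit/s
hours = seconds / 3600

print(f"{payload_gb:.0f} GB per sync, ~{hours:.1f} hours to send once")
```

A single uncompressed synchronization would take hours on its own, and training requires many thousands of synchronizations: this is why heavy compression is not optional but essential.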

2. The Solution: The "Group Chat" Strategy

The team used a clever trick called SparseLoCo.

  • The Analogy: Instead of sending a full, high-definition video of every move (which is huge and slow), the computers only send a text message saying, "I moved my left foot up."
  • How it works: The computers do most of the work locally (learning on their own) and only occasionally send a tiny, heavily compressed summary of what they learned to the group. A technique called "error feedback" remembers whatever detail didn't fit into the summary and folds it into the next message, so nothing important is lost over time. It's like sending a postcard instead of a movie, but keeping notes on everything the postcard left out so it can go on the next one.
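The "postcard" trick above can be sketched as top-k compression with an error-feedback buffer: send only the largest entries of an update, and carry everything else forward. This is a minimal illustration in the spirit of SparseLoCo, not the paper's actual implementation; the 1% ratio and array size are made up:

```python
import numpy as np

def compress_with_error_feedback(grad, error_buf, k_ratio=0.01):
    """Send only the largest entries; remember the rest for next round."""
    corrected = grad + error_buf                # add what we failed to send before
    k = max(1, int(k_ratio * corrected.size))
    idx = np.argsort(np.abs(corrected))[-k:]    # indices of the top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                # the tiny "text message" we transmit
    new_error = corrected - sparse              # leftover detail, carried forward
    return sparse, new_error

rng = np.random.default_rng(0)
error = np.zeros(1000)
for step in range(5):
    grad = rng.standard_normal(1000)            # stand-in for a local update
    msg, error = compress_with_error_feedback(grad, error)
    # msg has ~1% nonzero entries; the other 99% accumulate in `error`
    # and get their chance to be sent in a later round
```

The key property is that `sparse + new_error` always equals the full corrected update, so information is deferred rather than discarded.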

3. The "Trustless" Element: The Blockchain Bouncer

Usually, when you let random people join a project, you worry they might cheat, break things, or send garbage data.

  • The Analogy: Imagine a massive potluck dinner where anyone can bring a dish. How do you know someone didn't bring a plate of rocks?
  • The Fix: They used a system called Gauntlet (running on a blockchain). Think of Gauntlet as a super-strict, automated food critic.
    • Every time someone brings a "dish" (a piece of AI learning data), the critic tastes it immediately.
    • If the dish tastes good (the math checks out), the critic gives them points and adds their dish to the main pot.
    • If the dish is bad or they are trying to cheat, they get zero points and are ignored.
    • This creates a system where people are rewarded for being honest and helpful, and there is no need for a central boss to say, "You are allowed to join."
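The bouncer logic above can be sketched as a simple scoring rule: try each contributed update against a check, pay out points only if it actually helps, and merge only accepted updates. Everything below (the quadratic toy loss, the point values) is illustrative and not the real Gauntlet protocol:

```python
# Toy "bouncer": validate each contributed update before it joins the
# shared model. Honest work earns points; garbage earns zero and is ignored.
model, target = 0.0, 10.0
loss = lambda m: (m - target) ** 2       # stand-in for a real validation check

def score_update(model, delta):
    """Accept an update only if it lowers the loss; reward the improvement."""
    before, after = loss(model), loss(model + delta)
    return max(0.0, before - after)

honest_delta = 1.0      # nudges the model toward the target
garbage_delta = -5.0    # a "plate of rocks"

points = {}
for name, delta in [("honest", honest_delta), ("cheater", garbage_delta)]:
    pts = score_update(model, delta)
    points[name] = pts
    if pts > 0:
        model += delta                   # only accepted dishes go into the pot

print(points)
```

Because the check is automatic and the reward depends only on measured improvement, no central authority has to decide in advance who is trustworthy.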

4. The Result: A Giant Brain Built by the Crowd

They managed to train a 72-billion-parameter model (a very large AI) using this method.

  • The Scale: They fed the AI about 1.1 trillion tokens (small pieces of words) of training data.
  • The Hardware: They didn't use a supercomputer. They used a mix of computers owned by volunteers, some with 8 powerful graphics cards, all connected via regular home internet.
  • The Performance: The resulting AI, Covenant-72B, performs competitively with models trained by big tech companies in their expensive data centers, even though it was built by a crowd of strangers over the regular internet.

5. Why This Matters

This is a huge deal for democratization.

  • Before: Only the "rich kids" (big tech companies) could build the smartest AI because they had the money for the expensive data centers.
  • Now: This paper proves that if you have a smart enough way to coordinate, anyone with a computer and an internet connection can help build the next generation of AI. It turns AI training from a "closed club" into an "open global party."

Summary

The paper describes a successful experiment where they built a world-class AI by:

  1. Letting anyone join (no whitelist).
  2. Using a "text message" system to avoid internet traffic jams.
  3. Using a blockchain bouncer to stop cheaters.
  4. Proving that a crowd of strangers can build a giant brain just as well as a single giant company.

It's the difference between building a house with one expensive crane versus building it with a thousand volunteers using hand tools, but doing it so efficiently that the house ends up just as strong.