Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks

Threadle is a high-performance, memory-efficient C# engine designed to store and query massive, multilayer, mixed-mode networks by utilizing a pseudo-projection approach that avoids the prohibitive memory costs of materializing two-mode data projections.

Carl Nordlund, Yukun Jiao

Published 2026-03-06

Imagine you are trying to map the entire social world of a country—every person, every job, every school, and every family connection. You want to know who knows whom, not just directly, but through shared experiences like working in the same office or living in the same neighborhood.

The problem is that this map is massive. If you tried to draw every single connection on a giant piece of paper, the paper would be bigger than the Earth, and your computer would run out of memory long before it could hold the whole thing.

This is the problem Threadle solves.

Here is a simple breakdown of what Threadle is, how it works, and why it's a game-changer, using some everyday analogies.

1. The Problem: The "Catastrophic Explosion"

In the world of data, we often look at two types of connections:

  • One-mode: Person A is friends with Person B. (Simple.)
  • Two-mode: Person A and Person B both work at "Company X." They aren't directly connected, but they share a "membership."

To analyze this, old software usually tries to project the data. It takes "Company X" and draws a line between every single pair of employees.

  • If a company has 10 people, that's 45 lines.
  • If a company has 10,000 people, that's about 50 million lines (49,995,000, to be exact).
  • If you do this for an entire country's workforce, you end up with trillions of lines.
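The numbers above come from counting pairs: a group of n people contains n × (n − 1) / 2 distinct pairs. A minimal sketch of that arithmetic (the function name is purely illustrative and not part of Threadle):

```python
def projected_edges(members: int) -> int:
    """Number of distinct pairs among `members` people who share one affiliation."""
    return members * (members - 1) // 2


# Group sizes chosen only to illustrate how quickly the count grows.
for size in (10, 10_000, 1_000_000):
    print(f"{size:>9,} members -> {projected_edges(size):,} projected lines")
```

The count grows with the square of the group size, which is why a handful of very large workplaces or schools can dominate the memory cost of a projection.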

The Analogy: Imagine you have a library with 20 million books. The old way of organizing them is to write down every single pair of books that share a topic. If two books are about "cats," you write "Book A + Book B." If you have 10,000 books about cats, you end up with a list so long it would take up a warehouse. Your computer tries to read this list, runs out of space, and crashes.

2. The Solution: The "Pseudo-Projection" Magic

Threadle is a new tool (written in a language called C#) that refuses to write that massive list. Instead, it uses a clever trick called Pseudo-Projection.

The Analogy:
Instead of writing down every pair of friends in a club, Threadle just keeps a list of who belongs to which club.

  • Old Way: "Alice is friends with Bob, Charlie, Dave... (list of 10,000 names)."
  • Threadle Way: "Alice is in the 'Gym Club'. Bob is in the 'Gym Club'."

When you ask Threadle, "Is Alice connected to Bob?", it doesn't look at a giant list of friends. It simply checks: "Are they both in the Gym Club?"

  • If yes -> They are connected.
  • If no -> They aren't.

It does this without ever drawing the millions of lines between them, so the cost of a check depends on how many memberships the two people hold, not on how many pairs exist. It's like checking a guest list for a party instead of asking every guest to introduce themselves to every other guest.
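The membership check can be sketched in a few lines. This is a toy illustration of the idea only: Threadle itself is written in C#, and the class and method names below are hypothetical, not its actual API or data layout.

```python
from collections import defaultdict


class AffiliationLayer:
    """Toy two-mode layer: stores memberships only, never the projected lines."""

    def __init__(self) -> None:
        # person -> set of groups the person belongs to
        self.groups_of = defaultdict(set)

    def add_membership(self, person: str, group: str) -> None:
        self.groups_of[person].add(group)

    def connected(self, a: str, b: str) -> bool:
        """True if a and b are different people who share at least one group."""
        groups_a = self.groups_of.get(a, set())
        groups_b = self.groups_of.get(b, set())
        # Treating a person as not connected to themselves is a choice made here.
        return a != b and not groups_a.isdisjoint(groups_b)


layer = AffiliationLayer()
layer.add_membership("Alice", "Gym Club")
layer.add_membership("Bob", "Gym Club")
layer.add_membership("Carol", "Chess Club")

print(layer.connected("Alice", "Bob"))    # both are in the Gym Club
print(layer.connected("Alice", "Carol"))  # no shared club
```

Storage here grows with the number of memberships rather than with the number of pairs, which is the property the analogy describes.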

3. The Result: Fitting an Ocean in a Teacup

The paper demonstrates this with a massive test case:

  • The Data: 20 million people (the whole population of a small country).
  • The Connections: Equivalent to 8 trillion potential relationships.
  • The Old Way: Would require 64 Terabytes of RAM (far more than any ordinary workstation can hold).
  • Threadle's Way: Fits into 20 Gigabytes of RAM (within reach of a high-end laptop).

The Compression: Threadle squeezed that massive ocean of data into a teacup, achieving a compression ratio of over 2,000 to 1.
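As a rough sanity check on these figures, the sketch below assumes 8 bytes per projected edge and decimal units. That per-edge cost is an assumption chosen because it reproduces the 64-terabyte figure; the paper's own accounting may differ.

```python
EDGES = 8 * 10**12            # 8 trillion projected relationships (from the text)
BYTES_PER_EDGE = 8            # assumption for illustration, not taken from the paper
THREADLE_BYTES = 20 * 10**9   # 20 gigabytes, decimal units (from the text)

projected_bytes = EDGES * BYTES_PER_EDGE
projected_tb = projected_bytes / 10**12
ratio = projected_bytes / THREADLE_BYTES

print(f"Materialized projection: {projected_tb:.0f} TB")
print(f"Ratio to 20 GB: {ratio:,.0f} to 1")
```

Under these assumptions the ratio comes out at 3,200 to 1, which is consistent with the "over 2,000 to 1" figure quoted above.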

4. How It Works in Real Life

Threadle isn't just a storage box; it's a query engine. It comes with two main parts:

  1. The Engine (Threadle): The heavy lifter that stores the data efficiently. It's like the engine of a car.
  2. The Interface (threadleR): A tool that lets researchers (using the R programming language) talk to the engine. It's like the steering wheel and dashboard.

Researchers can ask complex questions like:

  • "Find everyone who shares a workplace with Person X."
  • "Trace a path from Person A to Person Z through schools and workplaces."
  • "Simulate how a rumor spreads through these layers."

Because Threadle keeps the data within the memory a single machine has, researchers can run these simulations on entire populations instead of just small, unrepresentative samples.
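The first two questions in the list above can be sketched on toy data. Everything here is hypothetical: the layer names, the people, and the function names are invented for illustration, and this is not the Threadle or threadleR API.

```python
from collections import deque

# Hypothetical toy data: layer name -> group -> members.
LAYERS = {
    "workplace": {"Acme": {"A", "B"}, "Globex": {"C", "D"}},
    "school": {"Northside": {"B", "C"}, "Southside": {"D", "Z"}},
}


def neighbors(person, layers=LAYERS):
    """Everyone who shares at least one group with `person` in any layer.

    Scans every group for simplicity; a real engine would keep a
    person -> groups index instead.
    """
    found = set()
    for groups in layers.values():
        for members in groups.values():
            if person in members:
                found |= members
    found.discard(person)
    return found


def shortest_path(start, goal, layers=LAYERS):
    """Breadth-first search that computes neighbors on the fly.

    Returns the list of people from start to goal, or None if unreachable.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbors(current, layers):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


print(neighbors("A"))
print(shortest_path("A", "Z"))
```

The search never builds a person-to-person edge list: each step derives the neighbors it needs from the membership data, so the traversal works directly on the compact representation.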

5. Why Does This Matter?

For decades, sociologists and data scientists had to choose between accuracy (using all the data) and feasibility (using a small sample).

  • If they used all the data, their computers would crash.
  • If they used a small sample, they might miss important patterns.

Threadle removes this trade-off. It allows scientists to study the full picture of human society (kinship, jobs, education, and housing) all at once, in real time, on a standard computer.

Summary

Threadle is a memory-efficient engine that lets us store and analyze the social networks of entire nations without needing a supercomputer. It does this by refusing to draw every single connection line, instead keeping a compact list of memberships and checking connections on the fly. It turns a "trillion-edge problem" into a "20-gigabyte solution."