Nonparametric two-sample hypothesis testing for low-rank random graphs of differing sizes

This paper proposes a nonparametric two-sample hypothesis test for low-rank random graphs of differing sizes, utilizing maximum mean discrepancy on optimally transported graph embeddings to determine if the networks originate from the same distribution.

Joshua Agterberg, Minh Tang, Carey Priebe

Published Tue, 10 Ma

Imagine you are a detective trying to solve a mystery: Are two different social networks built on the same "blueprint," or are they fundamentally different?

In the world of data science, these "networks" are just maps of connections—like who follows whom on Twitter, which neurons fire together in a brain, or which people are friends in a group. Usually, statisticians compare two networks by looking at them side-by-side, assuming they have the exact same number of people and that Person A in Network 1 is the same as Person A in Network 2.

But in real life, that's rarely true. One network might have 1,000 people, and the other 5,000. The people are different, and the names don't match up. How do you tell if they are "the same kind of network" without knowing who is who?

This paper by Agterberg, Tang, and Priebe proposes a clever new way to answer that question. Here is the breakdown using simple analogies.

1. The Problem: Comparing Apples to Oranges (That Look Similar)

Imagine you have two jars of marbles.

  • Jar A has 100 marbles.
  • Jar B has 1,000 marbles.

You want to know: Do these jars contain marbles from the same factory?

If you just look at the colors, you might get confused. Maybe Jar A has mostly red marbles, but Jar B has a mix of red, blue, and green. Is that because they are different factories, or just because Jar B is bigger and has more variety?

In network terms, the "marbles" are the people (vertices), and the "connections" between them are the edges. The authors want to know if the underlying "factory settings" (the probability distribution) that created these connections are the same, even if the networks are different sizes and the people are strangers to each other.

2. The Solution: The "Shadow" Technique (Latent Space)

The authors use a concept called Generalized Random Dot Product Graphs (GRDPG).

Think of every person in a network as having a secret "ID card" hidden in their pocket. This ID card isn't a name; it's a set of coordinates in a hidden, multi-dimensional space (like a secret map).

  • If two people have ID cards that are close together on this secret map, they are likely to be friends.
  • If they are far apart, they probably won't connect.

The network we see (the web of connections) is just the shadow cast by these hidden ID cards. The paper's method tries to reverse-engineer the shadows to find the hidden ID cards.
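The "ID card" picture can be sketched in a few lines of NumPy. This is a toy illustration of the dot-product-graph idea, not the paper's estimator: the number of vertices, the latent dimension, and the uniform positions are illustrative choices made so that every dot product is a valid probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each of n vertices gets a hidden "ID card": a latent position in
# d-dimensional space. The chance two vertices connect is the dot
# product of their positions (close cards => likely friends).
n, d = 200, 2
X = rng.uniform(0.2, 0.7, size=(n, d))  # chosen so dot products stay in (0, 1)

P = X @ X.T                 # matrix of edge probabilities
np.fill_diagonal(P, 0)      # no self-loops

# The observed network is the "shadow": each potential edge flips
# its own coin with the corresponding probability.
coin_flips = rng.random((n, n)) < P
A = np.triu(coin_flips, 1)
A = (A | A.T).astype(int)   # symmetric 0/1 adjacency matrix
```

Recovering an estimate of `X` from `A` alone (usually via the adjacency matrix's leading eigenvectors) is the "reverse-engineering the shadows" step.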

3. The Twist: The "Rotating Room" (Indefinite Orthogonal Transformations)

Here is where it gets tricky. The "secret map" has a weird property: it can be rotated or flipped without changing the shadows, so the ID cards can only ever be recovered up to such a transformation.

Imagine you are in a room with a unique pattern on the floor. If you rotate the room 90 degrees, the pattern looks different, but the shape of the room is the same. In math terms, this is called an indefinite orthogonal transformation.

Because of this, you can't just line up the ID cards from Network A and Network B directly. You have to figure out how to rotate the ID cards of Network B so they align perfectly with Network A. If you can rotate them to match, the networks are "twins." If you can't, they are different.
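This invariance can be verified numerically in a toy case. Below, `Ipq = diag(1, -1)` is the signature matrix and `Q` is a hyperbolic "rotation" from the indefinite orthogonal group O(1,1); the latent positions are arbitrary illustrative values, and the resulting matrix is treated only as a "shape," not as calibrated probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Signature matrix with one positive and one negative direction.
Ipq = np.diag([1.0, -1.0])

# A member Q of the indefinite orthogonal group O(1,1):
# it satisfies Q.T @ Ipq @ Q == Ipq (a hyperbolic rotation).
t = 0.8
Q = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])

X = rng.normal(size=(5, 2))        # some latent positions (illustrative)

P1 = X @ Ipq @ X.T                 # connection "shape" cast by X
P2 = (X @ Q) @ Ipq @ (X @ Q).T     # same shape after transforming X by Q

print(np.allclose(P1, P2))         # True: the shadows are unchanged
```

Because `P1` and `P2` are identical, no test based on the observed network can tell `X` apart from `X @ Q`, which is exactly why an alignment step is needed before comparing two networks.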

4. The Tool: The "Optimal Transport" Dance

How do you find the right rotation? The authors use a technique called Optimal Transport.

Think of it like a dance floor.

  • You have a group of dancers (Network A's ID cards) on the left.
  • You have another group (Network B's ID cards) on the right.
  • You want to pair them up so that the total distance they have to walk to meet is minimized.

The authors developed an algorithm (using something called the Sinkhorn algorithm) that efficiently figures out the best rotation and pairing. It's like a super-smart matchmaker that says, "Okay, if we spin Network B this way, these two people are a perfect match."
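The matchmaking step can be sketched with plain Sinkhorn iterations. This is a minimal version under simplifying assumptions: the regularization strength, iteration count, and cost normalization are illustrative choices, and the paper's full procedure also searches over the rotation, which is omitted here.

```python
import numpy as np

def sinkhorn(C, reg=0.5, iters=300):
    """Entropic-regularized optimal transport via Sinkhorn iterations.

    C is the matrix of "walking distances" between the two groups of
    dancers. Returns a soft pairing (transport plan) whose rows and
    columns sum to uniform weights over each group.
    """
    n, m = C.shape
    a = np.full(n, 1.0 / n)          # uniform mass on group A
    b = np.full(m, 1.0 / m)          # uniform mass on group B
    K = np.exp(-C / reg)             # Gibbs kernel
    u = np.ones(n)
    for _ in range(iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

# Toy use: pair up two point clouds of different sizes.
rng = np.random.default_rng(2)
XA = rng.normal(size=(30, 2))
XB = rng.normal(size=(50, 2))
C = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(-1)
C /= C.max()                         # normalize costs for numerical stability
plan = sinkhorn(C)                   # plan[i, j] = mass moved from i to j
```

Note that the two clouds need not be the same size: the plan simply spreads each dancer's mass over several partners, which is what makes the method work for networks with different numbers of vertices.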

5. The Test: The "Maximum Mean Discrepancy" (MMD)

Once the networks are rotated and aligned, the authors base their test on a statistic called the Maximum Mean Discrepancy (MMD), which measures the distance between two distributions from samples.

Imagine you have two bags of marbles again. You want to know if they are from the same factory. You take a handful from each bag and ask a very sensitive detector: "How different do these handfuls look?"

  • If the detector says "Zero difference," the networks are likely the same.
  • If it says "Huge difference," they are different.

The paper proves that this detector works even when:

  1. The networks are sparse (very few connections, like a ghost town).
  2. The networks have negative eigenvalues (a mathematical quirk that arises when "opposites attract," that is, when dissimilar vertices are the ones most likely to connect).
  3. The networks are different sizes.
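The detector can be sketched as the standard unbiased MMD estimate with a Gaussian kernel. This is not the paper's exact test statistic, which is applied to the estimated, aligned latent positions and comes with a calibrated rejection threshold; the bandwidth and the sample data below are illustrative.

```python
import numpy as np

def mmd_squared(X, Y, bandwidth=1.0):
    """Unbiased estimate of squared Maximum Mean Discrepancy with a
    Gaussian kernel: near zero when X and Y come from the same
    "factory" (distribution), clearly positive when they do not."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))

    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = kernel(X, X), kernel(Y, Y), kernel(X, Y)
    # "Unbiased" version: drop each point's similarity to itself.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

# Different sample sizes are fine -- the key requirement here.
rng = np.random.default_rng(3)
same = mmd_squared(rng.normal(size=(300, 2)),
                   rng.normal(size=(400, 2)))          # same distribution
diff = mmd_squared(rng.normal(size=(300, 2)),
                   rng.normal(loc=3.0, size=(400, 2))) # shifted distribution
```

Run on two handfuls from the same distribution, `same` hovers near zero; with a shifted second sample, `diff` is clearly positive, which is the "zero difference vs. huge difference" readout described above.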

6. Why This Matters

Previous methods were like trying to compare two puzzles where you knew exactly which piece went where. This new method works even if:

  • The puzzles have different numbers of pieces.
  • The pieces are scrambled.
  • The puzzles are made of "fuzzy" or "negative" pieces.

In summary:
The authors built a mathematical "universal translator." They take two messy, different-sized networks, extract their hidden "DNA" (latent positions), rotate them until they align, and then check if their DNA matches. If it does, the networks are statistically identical, regardless of their size or how the people are named.

This is a huge step forward for analyzing real-world data, where we rarely have perfect, matched datasets. It allows scientists to compare a small brain network to a large social network and ask: "Are they built on the same fundamental rules?"