Leveraging Non-linear Dimension Reduction and Random Walk Co-occurrence for Node Embedding

The paper introduces COVE, an explainable high-dimensional node embedding method based on random walk co-occurrence and non-linear dimension reduction, which, when processed with UMAP and HDBSCAN, achieves performance comparable to the Louvain algorithm on clustering and link prediction tasks.

Ryan DeWolfe

Published 2026-03-02

Imagine you have a massive, tangled ball of yarn. Each knot in the yarn is a person (or a place, or a website), and the strands connecting them are their relationships. This is a graph in the world of data science.

The goal of this paper is to figure out how to untangle this ball of yarn and lay it flat on a table so we can see which knots naturally group together (like friends in a social circle or airports in the same continent). This process is called Node Embedding.

Here is the story of how the author, Ryan DeWolfe, solved a tricky problem with a new method called COVE.

The Old Way: The "Tiny Room" Problem

For a long time, scientists tried to flatten these complex networks into a very small space, usually just 2 dimensions (like a flat piece of paper). They used a technique called a "Random Walk."

Think of a Random Walk like a blindfolded person wandering through a city. If two people keep bumping into each other while wandering, they must live in the same neighborhood. The old algorithms assumed that if you squeeze all this "neighborhood" information into a tiny 2D room, you can still see the groups clearly.

The Problem: It's like trying to flatten a 3D globe onto a flat map. When you squish a complex structure into a tiny 2D space, things get distorted. The "neighborhoods" (communities) get mashed together, and you can no longer tell who belongs to which group.
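To make the "bumping into each other" idea concrete, here is a minimal sketch of random-walk co-occurrence counting in Python. The toy graph, walk length, and window size are illustrative choices, not the paper's settings.

```python
import random
from collections import defaultdict

def random_walks(graph, walk_length=10, walks_per_node=20, seed=0):
    """Generate simple unbiased random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_length - 1):
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

def cooccurrence_counts(walks, window=2):
    """Count how often two nodes appear within `window` steps of each other."""
    counts = defaultdict(int)
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                if u != v:
                    counts[frozenset((u, v))] += 1
    return counts

# Toy graph: two triangles {a,b,c} and {x,y,z} joined by one bridge edge c-x.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["y", "z", "c"], "y": ["x", "z"], "z": ["x", "y"],
}
counts = cooccurrence_counts(random_walks(graph))
# Nodes inside the same triangle co-occur far more often than nodes on
# opposite sides of the bridge -- the "same neighborhood" signal.
```

Running this, the pair ("a", "b") racks up a large count while ("a", "z"), which sits more than two steps away, never co-occurs within the window at all.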

The New Idea: The "Big Warehouse" Approach

Ryan DeWolfe asked a simple question: "Why do we have to squeeze everything into a tiny room immediately?"

His answer: Don't squeeze it yet.

Instead, he proposed a new method called COVE. Here is how it works, using a creative analogy:

  1. The Big Warehouse (High Dimensions):
    Imagine you have a massive, multi-story warehouse with thousands of aisles. Instead of forcing everyone into a tiny 2D room, you let everyone wander around in this huge warehouse.

    • In COVE, the "warehouse" is a high-dimensional space (think of it as having 128 or more dimensions, not just X and Y).
    • The method calculates exactly who hangs out with whom based on the "Random Walk" (the blindfolded wandering). Because the warehouse is so big, there is plenty of room for every distinct group to spread out without bumping into each other. The groups stay perfectly separate and clear.
  2. The Magic Lens (UMAP):
    Now, you have a perfect, crystal-clear map of the groups in this giant warehouse, but you still need to show it to a human on a 2D screen.

    • This is where UMAP comes in. Think of UMAP as a magic lens or a high-tech camera.
    • Unlike the old methods that squished the data before taking the picture, COVE takes the perfect high-res photo in the big warehouse first, and then uses the magic lens to flatten it down to 2D.
    • Because the data was so clear in the warehouse to begin with, the flattened 2D version still keeps the groups distinct.
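The two steps above can be sketched end to end. This is a loose illustration, not the paper's exact recipe: each node's "warehouse" coordinate is taken to be its row of co-occurrence counts, and the commented-out UMAP call stands in for the "magic lens" (it requires the umap-learn package).

```python
import random
from collections import defaultdict

# Toy graph: two triangles joined by one bridge edge c-x.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["y", "z", "c"], "y": ["x", "z"], "z": ["x", "y"],
}
nodes = sorted(graph)

rng = random.Random(0)
counts = defaultdict(int)
for start in nodes:                       # many short walks per node
    for _ in range(50):
        walk = [start]
        for _ in range(9):
            walk.append(rng.choice(graph[walk[-1]]))
        for i, u in enumerate(walk):      # window of 2 along each walk
            for v in walk[i + 1 : i + 3]:
                if u != v:
                    counts[(u, v)] += 1
                    counts[(v, u)] += 1

# Each node's "warehouse" coordinate: its co-occurrence counts with every
# node, i.e. an n-dimensional vector (n = number of nodes).
embedding = {u: [counts[(u, v)] for v in nodes] for u in nodes}

# To show it on a 2D screen, flatten with UMAP (requires umap-learn):
# import umap
# flat = umap.UMAP(n_components=2).fit_transform(list(embedding.values()))
```

Note that the "warehouse" dimension grows with the graph: every node gets one coordinate per other node, so distinct groups have plenty of room to stay separate before the lens ever touches them.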

The Results: What Happened?

The author tested this new "Warehouse + Magic Lens" approach against the old "Tiny Room" methods.

  • Clustering (Finding Groups): When communities were identified (like telling which airports are in Europe vs. Asia, by running HDBSCAN on the flattened map), the new method matched the best existing tools, such as the Louvain algorithm, and sometimes did slightly better. It saw the groups clearly, whereas the old methods often mixed them up.
  • Link Prediction (Guessing Connections): They also tried to guess missing connections (e.g., "Will these two airports start flying to each other?"). The new method performed just as well as the old ones.
  • The "Explainable" Bonus: Because the method is based on simple math (counting how often nodes appear together in a walk) rather than a "black box" neural network, it's easier to understand why the algorithm made its decisions.
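As a hypothetical illustration of how link prediction with such embeddings can work (the scoring rule, airport labels, and the numbers below are assumptions for the sketch, not taken from the paper), one common approach is to rank candidate pairs by the similarity of their embedding vectors:

```python
import math

# Made-up high-dimensional embeddings (co-occurrence count vectors).
emb = {
    "JFK": [9.0, 7.0, 1.0, 0.0],
    "LHR": [8.0, 6.0, 0.0, 1.0],
    "HND": [0.0, 1.0, 9.0, 8.0],
}

def cosine(u, v):
    """Cosine similarity: close to 1.0 for similar vectors, 0.0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Rank candidate links: the pair with the most similar "co-travel"
# profiles is the most plausible missing route.
score_jfk_lhr = cosine(emb["JFK"], emb["LHR"])
score_jfk_hnd = cosine(emb["JFK"], emb["HND"])
```

Under this toy scoring, JFK-LHR comes out far more plausible than JFK-HND, because their co-occurrence profiles point in nearly the same direction.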

The "Secret Sauce"

The paper also mentions a little trick that makes the "Magic Lens" (UMAP) work even better. By default, the lens starts from a rough initial guess at the layout, which can leave the final picture blurry. The author realized that the original shape of the yarn ball (the graph itself) could supply that starting layout, giving the lens a "head start" and making the final picture even sharper.
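One plausible way to give UMAP such a head start (a sketch, not necessarily the paper's construction): compute a spectral layout of the graph from its Laplacian eigenvectors and pass it as the initial coordinates. The toy graph and parameter choices here are illustrative.

```python
import numpy as np

# Adjacency matrix of a tiny toy graph: two triangles (nodes 0-2 and 3-5)
# joined by one bridge edge between nodes 2 and 3.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Spectral layout: the low-frequency eigenvectors of the graph Laplacian
# give 2D coordinates that reflect the graph's original shape.
L = np.diag(A.sum(axis=1)) - A
vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
init_coords = vecs[:, 1:3]          # skip the trivial constant eigenvector

# Hand this layout to UMAP as a head start (requires umap-learn):
# import umap
# flat = umap.UMAP(n_components=2, init=init_coords).fit_transform(features)
```

The first non-trivial eigenvector already places the two triangles on opposite sides, so the lens begins with the communities roughly where they belong instead of scattered at random.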

The Bottom Line

The paper argues that we have been trying to force complex data into a box that is too small for too long. By letting the data breathe in a large, high-dimensional space first (COVE) and then using modern tools to gently flatten it (UMAP), we get clearer, more accurate maps of our data.

It's the difference between trying to fold a giant quilt into a pocket (old way) versus laying the quilt out on a huge table to see the pattern, and then carefully rolling it up to fit in the pocket (COVE). The pattern remains much clearer.
