t-SNE Exaggerates Clusters, Provably

This paper proves that t-SNE visualizations cannot reliably reflect the strength of input clustering or the extremity of outliers, thereby challenging the common belief that t-SNE outputs accurately preserve the structure of the original data.

Noah Bergam, Szymon Snoeck, Nakul Verma

Published 2026-03-03

The Big Picture: The "Magic Map" That Lies to You

Imagine you have a giant, messy room filled with thousands of different objects (data points). Some are clearly grouped together (like all the red balls in one corner, all the blue books in another), and some are just random junk scattered everywhere.

You want to take a photo of this room and shrink it down to a flat, 2D poster so you can see the patterns. t-SNE is the most popular "magic camera" used by scientists to do this. It's famous because it usually does a great job: it pulls similar things together and pushes different things apart, making beautiful, colorful clusters on a screen.

The Problem: This paper proves that t-SNE behaves a bit like a stage magician. It is so good at making things look organized that it can trick you into seeing structure where none exists, or hide structure that is actually there.

The authors prove two main things:

  1. It exaggerates clusters: It can make one big, uniform pile of sand look like a set of distinct islands.
  2. It hides outliers: If you have a single, weird object that doesn't fit anywhere (an outlier), t-SNE will often force it to fit in with the crowd, hiding its weirdness.

1. The "Imposter" Clusters (Making the Messy Look Organized)

The Analogy: Imagine you have a group of people standing in a room.

  • Scenario A: They are standing in two tight, distinct groups (Team Red and Team Blue), far apart from each other.
  • Scenario B: They are all standing in one giant, loosely spread-out circle, with no real groups at all.

If you ask t-SNE to take a picture of both scenarios, it can produce essentially the same picture for both.

What the paper proves:
The authors show that you can take a dataset that is perfectly un-clustered (like Scenario B) and tweak the distances between the points just a tiny bit. Even though the tweaked data is still almost perfectly un-clustered, t-SNE can spit out a beautiful, clean picture with two distinct islands (Scenario A).

The Takeaway:
If you see a pretty, clustered t-SNE plot, you cannot be sure that your data actually has those clusters. The map might be lying, like a weather map that shows a sunny day while you are standing in a thunderstorm. t-SNE optimizes how well its picture matches each point's list of likely neighbors, not how faithfully it preserves the real distances, so a clean-looking map is not proof of clean clusters.
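What t-SNE actually optimizes can be written down in a few lines. In standard t-SNE, the input data are summarized as neighbor probabilities P, the 2D map Y gets its own similarities Q from a Student-t kernel, and the algorithm moves Y to minimize the mismatch KL(P ‖ Q). The minimal sketch below computes Q and that cost in plain Python; the helper names are our own, and the gradient-descent loop that moves Y is omitted. One consequence worth noticing: the input enters only through P, so any two datasets with the same P have the same best map.

```python
import math

def student_t_affinities(Y):
    """Map-space similarities q_ij: Student-t (Cauchy) kernel 1 / (1 + |y_i - y_j|^2),
    normalized over all ordered pairs i != j."""
    n = len(Y)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
    total = sum(sum(row) for row in w)
    return [[wij / total for wij in row] for row in w]

def kl_divergence(P, Q):
    """KL(P || Q) over off-diagonal pairs: the cost t-SNE minimizes by moving the map Y."""
    return sum(
        P[i][j] * math.log(P[i][j] / Q[i][j])
        for i in range(len(P))
        for j in range(len(P))
        if i != j and P[i][j] > 0
    )

# A 2D map with two tight pairs far apart.
Y = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
Q = student_t_affinities(Y)

# Hypothetical input whose neighbor probabilities are perfectly uniform (no clusters).
n = len(Y)
P_uniform = [[0.0 if i == j else 1.0 / (n * (n - 1)) for j in range(n)] for i in range(n)]

print(kl_divergence(Q, Q))          # prints 0.0: a map that matches P exactly costs nothing
print(kl_divergence(P_uniform, Q))  # positive: this clustered map mismatches a clusterless P
```

The cost only measures agreement between neighbor probabilities; nothing in it asks the map to keep cluster gaps or distances at their true scale.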

2. The "House of Cards" Instability (One Tiny Change, Total Chaos)

The Analogy: Imagine a house of cards. A perfectly balanced structure looks great, but one tiny breath of air (a tiny change in the data) can bring the whole thing down.

What the paper proves:
t-SNE is incredibly unstable. Take a dataset shaped like a "regular simplex": a geometric shape where every point is exactly the same distance from every other point, like the four corners of a perfect pyramid (a regular tetrahedron). t-SNE can turn such data into any shape you want just by changing the distances between points by a tiny amount (on the order of 1%).
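A minimal sketch of why near-equidistant data are so fragile, assuming standard t-SNE's perplexity calibration: each point's kernel width is tuned until its neighbor distribution has a fixed "perplexity" (an effective neighbor count). On an exact simplex every neighbor is equally likely no matter the width. After a ±1% nudge, the calibration narrows the kernel until those tiny differences decide who counts as a neighbor. The helper names and the parameters (30 points, perplexity 5) are illustrative choices, not taken from the paper, and this is not the paper's proof.

```python
import math
import random

def conditional_probs(sq_dists, beta):
    """p_{j|i} proportional to exp(-beta * d_ij^2), with beta = 1 / (2 sigma_i^2).
    Subtracting the smallest distance avoids underflow and cancels on normalization."""
    m = min(sq_dists)
    w = [math.exp(-beta * (d - m)) for d in sq_dists]
    s = sum(w)
    return [x / s for x in w]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def calibrate_row(sq_dists, perplexity, iters=100):
    """Bisect on log(beta) until exp(entropy) of the row equals the target perplexity."""
    target = math.log(perplexity)
    lo, hi = -30.0, 30.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(conditional_probs(sq_dists, math.exp(mid))) > target:
            lo = mid  # distribution too flat: use a narrower kernel
        else:
            hi = mid
    return conditional_probs(sq_dists, math.exp((lo + hi) / 2))

n, perplexity = 30, 5
rng = random.Random(0)

# One vertex of a regular simplex (e.g. the standard basis vectors of R^n):
# squared distance exactly 2 to each of the other n - 1 vertices.
exact = [2.0] * (n - 1)
# The same vertex after nudging each squared distance by at most 1%.
nudged = [2.0 * (1 + rng.uniform(-0.01, 0.01)) for _ in range(n - 1)]

p_exact = calibrate_row(exact, perplexity)
p_nudged = calibrate_row(nudged, perplexity)
print(max(p_exact), min(p_exact))    # uniform: every vertex is equally likely as a neighbor
print(max(p_nudged), min(p_nudged))  # sharply peaked on a handful of "neighbors"
```

Which few points end up as the favored neighbors is decided entirely by the random 1% nudge, which is the sense in which the input gives t-SNE almost no real guidance.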

The Takeaway:
Because high-dimensional data (like gene sequences or text) often behaves like this "perfect pyramid" where distances are all similar, t-SNE is essentially playing with a house of cards. A tiny, invisible change in the data can make the clusters appear, disappear, or merge completely. You can't trust the stability of the picture.
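The claim that high-dimensional distances bunch together (often called concentration of distances) is easy to check. This sketch, with made-up parameters, draws 50 random Gaussian points in increasing dimensions and reports the ratio of the largest to the smallest pairwise distance. As the dimension grows, the ratio moves toward 1, meaning the data look more and more like the near-regular simplex described above.

```python
import math
import random

def distance_spread(n_points, dim, seed=0):
    """Ratio of the largest to the smallest pairwise Euclidean distance among
    n_points standard Gaussian samples in `dim` dimensions."""
    rng = random.Random(seed)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return max(dists) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_spread(50, dim), 3))
```

In 2 dimensions the closest and farthest pairs differ by orders of magnitude; in 1000 dimensions every pair sits at nearly the same distance.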

3. The "Poison Point" Attack (One Bad Apple Spoils the Bunch)

The Analogy: Imagine a classroom where the students are sitting in two distinct groups: Math kids and Art kids.
Now, imagine you sneak one single student into the room who sits exactly in the middle of the room, equidistant from everyone.

What the paper proves:
If you add just one "poison point" (a single data point placed strategically in the center), t-SNE's picture of the clusters can collapse entirely.

  • Before: The Math and Art kids are clearly separated.
  • After: t-SNE gets confused. Because the poison point is the "nearest neighbor" to almost everyone, t-SNE drags all the Math and Art kids toward the poison point. The two distinct groups merge into one big, messy blob.

The Takeaway:
t-SNE is incredibly fragile. An attacker (or an unlucky data error) only needs to add one well-placed point to completely hide the real clusters in your data.
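The "nearest neighbor to almost everyone" effect can be reproduced with a hypothetical toy dataset: two Gaussian clusters in 500 dimensions whose centers are 20 units apart. In high dimensions the per-point noise dominates within-cluster distances, so a single point at the midpoint between the centers ends up closer to every point than that point's own cluster-mates. This sketch does not run t-SNE; it only checks the nearest-neighbor structure that t-SNE's input probabilities are built from, and all parameters are our own choices rather than the paper's construction.

```python
import math
import random

def two_clusters(n_per_cluster, dim, separation, seed=0):
    """Hypothetical data: two Gaussian clouds (unit variance per coordinate)
    whose centers are `separation` apart along the first axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, shift in ((0, 0.0), (1, separation)):
        for _ in range(n_per_cluster):
            p = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            p[0] += shift
            points.append(p)
            labels.append(label)
    return points, labels

def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (excluding i itself)."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )

dim, sep = 500, 20.0
points, labels = two_clusters(n_per_cluster=30, dim=dim, separation=sep)
n = len(points)

# Before: fraction of points whose nearest neighbor is in their own cluster.
same_cluster = sum(labels[nearest_neighbor(points, i)] == labels[i] for i in range(n)) / n

# Add one "poison point" at the midpoint between the two cluster centers.
poisoned = points + [[sep / 2] + [0.0] * (dim - 1)]
poison_index = n
captured = sum(nearest_neighbor(poisoned, i) == poison_index for i in range(n)) / n

print(same_cluster, captured)
```

Before the poison point is added, neighbors stay within their own cluster; afterwards, the single midpoint becomes the nearest neighbor for the original points, so the neighbor relationships that used to separate the two groups now all run through one shared point.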

4. The "Outlier" Eraser (Hiding the Weirdos)

The Analogy: Imagine a party where 99 people are dancing in a circle, and one person is standing 100 feet away in the corner, screaming.

  • A normal camera (like PCA): would show the crowd in the center and the screaming person far away.
  • t-SNE: would drag the screaming person right into the middle of the dance circle and make them look like they are part of the group.

What the paper proves:
t-SNE cannot reliably show how extreme an outlier is. Its goal is to keep each point close to its neighbors, and every point, however isolated, gets assigned some neighbors. So a point that is far from everything is still pulled toward someone, even if that someone is far away in reality.
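One mechanism behind this, assuming standard t-SNE input probabilities, can be seen directly. Each point's kernel width is calibrated to a fixed perplexity, so an isolated point simply gets a much wider kernel, and its conditional probabilities still sum to 1. After the usual symmetrization P_ij = (p_{j|i} + p_{i|j}) / (2n), every point, however far away, keeps at least 1/(2n) of the total affinity, which is half the average. The sketch below, with hypothetical data (20 inliers plus one point at distance 100) and our own helper names, illustrates that floor; it is not the paper's proof.

```python
import math
import random

def conditional_probs(sq_dists, beta):
    """p_{j|i} proportional to exp(-beta * d_ij^2); shifting by the minimum avoids underflow."""
    m = min(sq_dists)
    w = [math.exp(-beta * (d - m)) for d in sq_dists]
    s = sum(w)
    return [x / s for x in w]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def calibrate_row(sq_dists, perplexity, iters=100):
    """Return (p_{.|i}, beta) with beta = 1 / (2 sigma_i^2) found by bisection on log(beta)."""
    target = math.log(perplexity)
    lo, hi = -30.0, 30.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(conditional_probs(sq_dists, math.exp(mid))) > target:
            lo = mid
        else:
            hi = mid
    beta = math.exp((lo + hi) / 2)
    return conditional_probs(sq_dists, beta), beta

def tsne_input_affinities(points, perplexity):
    """Symmetrized t-SNE input affinities P_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = len(points)
    cond = [[0.0] * n for _ in range(n)]
    betas = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        sq = [math.dist(points[i], points[j]) ** 2 for j in others]
        p, beta = calibrate_row(sq, perplexity)
        betas.append(beta)
        for j, pj in zip(others, p):
            cond[i][j] = pj
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
    return P, betas

rng = random.Random(0)
points = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
points.append([100.0, 0.0])  # hypothetical extreme outlier, far outside the inlier cloud
P, betas = tsne_input_affinities(points, perplexity=5)

n = len(points)
outlier_mass = sum(P[-1])
average_mass = 1.0 / n  # all P_ij sum to 1, so the average row sum is 1/n
print(outlier_mass, average_mass, 1 / (2 * n))
```

Because the outlier's calibrated kernel is far wider than the inliers' (its beta is much smaller), the input probabilities treat it as having ordinary neighbors, and the optimizer then pulls it toward them in the map.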

The Takeaway:
If you are using t-SNE to find fraud (like a credit card thief who behaves very differently from normal users), t-SNE will likely hide the thief by tucking them into the crowd of normal users, making them invisible. If you need to find the "weirdos," do not rely on t-SNE alone.


Summary: What Should You Do?

The authors aren't saying "Stop using t-SNE." They are saying: "Don't trust it blindly."

  • Don't assume: Just because you see a cluster, it doesn't mean the data is actually clustered.
  • Don't ignore: Just because you don't see an outlier, it doesn't mean one isn't there.
  • Be skeptical: t-SNE is a tool for exploration, not for proof. It's great for getting a "vibe" of the data, but if you need to draw scientific conclusions from the shapes you see, double-check them with other methods (like PCA) or with direct measurements on the original data.
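As one concrete cross-check in the spirit of the PCA comparison from the party analogy, the sketch below projects hypothetical data (99 ordinary points in 10 dimensions plus one point 100 units away) onto the first principal component, computed with plain power iteration so it needs no libraries. Unlike t-SNE, this linear projection keeps the outlier far from the crowd. A real analysis would normally use a library PCA; this dependency-free version is only an illustration.

```python
import math
import random

def top_principal_component(X, iters=200):
    """Mean and leading eigenvector of the sample covariance of X, via power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[k] for row in X) / n for k in range(d)]
    C = [[x - m for x, m in zip(row, mean)] for row in X]  # centered data
    v = [1.0] * d
    for _ in range(iters):
        Cv = [sum(a * b for a, b in zip(row, v)) for row in C]      # C v
        w = [sum(C[i][k] * Cv[i] for i in range(n)) for k in range(d)]  # C^T (C v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def pca_scores(X):
    """Coordinate of each point along the first principal component."""
    mean, v = top_principal_component(X)
    return [sum((x - m) * vk for x, m, vk in zip(row, mean, v)) for row in X]

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(99)]
X.append([100.0] + [0.0] * 9)  # the hypothetical "screaming guest"
scores = [abs(s) for s in pca_scores(X)]
print(max(scores[:-1]), scores[-1])
```

The outlier's score is many times larger than any inlier's, so its extremity stays visible in a way the t-SNE map does not guarantee.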

In short: t-SNE is a talented artist who loves to paint pretty pictures, but sometimes it paints things that aren't really there, or erases the things that are. Always check the canvas against the reality.
