K-Means as a Radial Basis Function Network: A Variational and Gradient-Based Equivalence

This paper establishes a rigorous variational and gradient-based equivalence between K-Means and differentiable Radial Basis Function networks, proving that the latter converges to the former as temperature vanishes and proposing Entmax-1.5 to ensure stable training for end-to-end differentiable clustering.

Felipe de Jesus Felix Arredondo, Alejandro Ucan-Puc, Carlos Astengo Noguez

Published 2026-03-06

Here is an explanation of the paper "K-Means as a Radial Basis Function Network" using simple language, analogies, and metaphors.

The Big Idea: Bridging Two Worlds

Imagine you have two very different tools for organizing a messy room:

  1. The "Hard" Organizer (K-Means): This tool is like a strict librarian. It looks at a book and immediately slaps a label on it: "This goes in the History bin." It's fast and efficient, but once the label is on, it can't be changed. If you try to teach a robot to learn while it's labeling, the robot gets confused because the "labeling" step is a sudden, jerky jump, not a smooth slide.
  2. The "Soft" Organizer (RBF Networks): This tool is like a gentle artist. Instead of a hard label, it paints a soft, fuzzy cloud around the book. The book is 90% History, 10% Biography. It's smooth, flexible, and a robot can easily learn from it because the changes are gradual.

The Problem: The strict librarian (K-Means) is great at finding groups, but it breaks the robot's brain (the neural network) because it's not "differentiable" (you can't calculate a smooth slope for it). The gentle artist (RBF) is great for robots, but it's often seen as just a "soft approximation" of the librarian, not the real deal.

The Paper's Solution: The authors prove that these two tools are actually the same person wearing different hats.

They show that if you take the "Soft" Organizer and turn a specific dial (called temperature, σ) all the way down to zero, the soft clouds snap into hard labels. The "Soft" Organizer becomes the "Hard" Organizer.


The Core Concepts Explained

1. The Temperature Dial (σ)

Imagine you are trying to decide which of three friends to sit with at lunch.

  • High Temperature (Hot Day): You are indecisive. You split your attention: 40% with Friend A, 30% with Friend B, and 30% with Friend C. You are "spread out." This is the Soft state.
  • Low Temperature (Freezing Day): You are desperate for warmth. You immediately run to the friend who is closest to you and sit only with them. You ignore the others completely. This is the Hard (K-Means) state.

The paper proves that as you slowly turn the dial from "Hot" to "Freezing," the soft, fuzzy decision smoothly transforms into the hard, binary decision without breaking anything.
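The temperature dial is easy to see in code. Here is a minimal NumPy sketch (my own illustration, not the paper's implementation): soft assignments are a softmax over negative squared distances, and shrinking `sigma` collapses them toward a one-hot K-Means label.

```python
import numpy as np

def soft_assign(x, centers, sigma):
    """Soft cluster assignment: a softmax over negative squared distances.
    As sigma -> 0 this collapses to a hard (one-hot) K-Means label."""
    d2 = np.sum((centers - x) ** 2, axis=1)  # squared distance to each center
    logits = -d2 / sigma                     # the temperature dial
    logits -= logits.max()                   # stabilize the exponentials
    p = np.exp(logits)
    return p / p.sum()

centers = np.array([[0.0], [1.0], [3.0]])
x = np.array([0.4])                          # closest to the first center

hot = soft_assign(x, centers, sigma=5.0)     # indecisive: weight spread across centers
cold = soft_assign(x, centers, sigma=0.01)   # decisive: snaps to the nearest center
```

On a "hot day" the weights are spread across all three centers; on a "freezing day" the first entry is essentially 1 and the rest essentially 0.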

2. The "Gamma-Convergence" (The Magic Bridge)

In math, proving that two things become the same as a limit approaches zero is hard. The authors used a concept called Γ-convergence.

  • Analogy: Imagine a hill with a valley at the bottom.
    • The Soft version is a wide, gentle bowl.
    • The Hard version is a sharp, deep V-shape.
    • The paper proves that as you shrink the bowl, it doesn't just get smaller; it morphs perfectly into the V-shape. The bottom of the bowl (the best solution) lands exactly on the bottom of the V.
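In symbols, one standard way to write the two energies (my notation, which may differ from the paper's): the smooth "bowl" is a log-sum-exp soft-min of squared distances, and as σ → 0 it recovers the classic sharp K-Means objective.

```latex
% Soft (log-sum-exp) energy: a smooth stand-in for the K-Means cost
F_\sigma(c_1,\dots,c_K) \;=\; -\,\sigma \sum_{i=1}^{n} \log \sum_{k=1}^{K}
  \exp\!\Big(-\tfrac{\|x_i - c_k\|^2}{\sigma}\Big)
\;\xrightarrow[\;\sigma \to 0\;]{}\;
\sum_{i=1}^{n} \min_{k} \|x_i - c_k\|^2 \;=\; F_0(c_1,\dots,c_K).
```

The wide bowl is F_σ, the sharp V is F_0, and Γ-convergence is the guarantee that the bottom of the bowl lands on the bottom of the V (minimizers converge to minimizers, not just values).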

3. The Gradient Problem (The "Stuck" Robot)

Neural networks learn by sliding down a hill (gradient descent).

  • The Issue: If you use the "Hard" K-Means, the hill has a cliff. If the robot is on the edge, it doesn't know which way to slide because the ground is flat until it suddenly drops. The robot gets stuck.
  • The Fix: By using the "Soft" version (RBF), the hill is smooth. The robot can slide down easily. The paper shows that if you slide down this smooth hill and then turn the temperature dial to zero, the robot ends up in the exact same spot as if it had tried to jump off the cliff.
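The cliff versus the smooth slope can be checked numerically. In this sketch (my own toy setup, with made-up numbers), the hard K-Means loss has a gradient that jumps abruptly as a center crosses the decision boundary, while the soft log-sum-exp loss has a well-defined gradient right at the boundary:

```python
import numpy as np

def hard_loss(c1, x=0.5, c2=1.0):
    # K-Means cost: the point pays only the distance to its nearest center
    return min((x - c1) ** 2, (x - c2) ** 2)

def soft_loss(c1, sigma=0.5, x=0.5, c2=1.0):
    # Smooth "soft-min": -sigma * log-sum-exp of the scaled distances
    d = np.array([(x - c1) ** 2, (x - c2) ** 2])
    return -sigma * np.log(np.sum(np.exp(-d / sigma)))

def num_grad(f, c1, eps=1e-5):
    # Central finite-difference derivative with respect to the center c1
    return (f(c1 + eps) - f(c1 - eps)) / (2 * eps)

# c1 = 0 is the decision boundary: the point x = 0.5 is equidistant from both centers
g_hard_left = num_grad(hard_loss, -0.01)   # point belongs to c2: gradient is 0 (flat ground)
g_hard_right = num_grad(hard_loss, +0.01)  # point belongs to c1: gradient jumps to about -1
g_soft = num_grad(soft_loss, 0.0)          # smooth everywhere, even at the boundary
```

The hard gradient is 0 on one side of the boundary and about -0.98 on the other (the cliff); the soft gradient interpolates between them, so gradient descent never gets stuck on flat ground.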

4. The "Entmax-1.5" Solution (The Safety Valve)

There was a catch. When the temperature gets too low (very close to zero), the math gets unstable. The numbers get so huge or so tiny that computers crash (like trying to divide by zero).

  • The Fix: The authors swapped the standard "Softmax" math for a new tool called Entmax-1.5.
  • Analogy: Think of Softmax as a balloon that inflates until it pops. Entmax-1.5 is a balloon that has a safety valve. It still snaps to a hard decision (0% or 100%), but it does so without the math exploding. It allows the computer to handle the "Freezing Day" scenario without crashing.
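Below is a small NumPy sketch of the Entmax-1.5 mapping (the bisection search for the threshold τ is my own simple implementation, not the authors' code). Unlike softmax, it can output exact zeros, so assignments can go fully hard without any exponentials overflowing:

```python
import numpy as np

def entmax15(z, iters=60):
    """Entmax-1.5: p_i = max(z_i / 2 - tau, 0) ** 2, with the threshold tau
    found by bisection so that the probabilities sum to 1.  Low-scoring
    entries get *exactly* zero -- the 'safety valve' behavior."""
    z = np.asarray(z, dtype=float)
    lo, hi = z.max() / 2 - 1.0, z.max() / 2   # tau is always bracketed here
    for _ in range(iters):
        tau = (lo + hi) / 2
        total = np.sum(np.maximum(z / 2 - tau, 0.0) ** 2)
        lo, hi = (tau, hi) if total > 1.0 else (lo, tau)
    return np.maximum(z / 2 - (lo + hi) / 2, 0.0) ** 2

p_soft = entmax15([2.0, 1.0, -2.0])                   # weakest score gets exactly 0
p_hard = entmax15(np.array([2.0, 1.0, -2.0]) / 0.01)  # low temperature: one-hot
```

Dividing the scores by a temperature is my way of mimicking the σ → 0 limit: at low temperature the output is a clean one-hot vector, with no division-by-zero-style blowup along the way.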

Why Does This Matter? (The Real-World Impact)

Before this paper, if you wanted to use K-Means inside a modern AI (like a self-driving car or a chatbot), you had to do it in two separate steps:

  1. Run K-Means to find groups.
  2. Feed those groups into the AI.

This is like baking a cake, taking it out of the oven, and then trying to frost it with a different machine that doesn't talk to the oven.

With this new method:
You can bake and frost the cake in one continuous motion. The AI can now learn how to group things and learn how to recognize patterns at the same time.

  • Joint Optimization: The AI doesn't just "guess" the groups; it adjusts the groups while it learns to see the data better.
  • End-to-End: You can plug this clustering tool directly into a deep neural network, and the whole system learns together, making the AI smarter and more efficient.
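As a toy end-to-end illustration (entirely my own, with synthetic 1-D data standing in for features a network might produce): because the assignments are soft, plain gradient descent on the clustering loss moves badly initialized centers onto the true cluster means, with no separate K-Means step:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two obvious 1-D clusters around 0 and 10
x = np.concatenate([rng.normal(0.0, 0.3, 50), rng.normal(10.0, 0.3, 50)])
c = np.array([3.0, 6.0])                   # badly initialized centers
sigma = 1.0                                # temperature of the soft assignment

for _ in range(200):
    d2 = (x[:, None] - c[None, :]) ** 2    # squared distances, shape (100, 2)
    logits = -d2 / sigma
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)      # soft responsibilities per point
    grad = -2.0 * np.sum(p * (x[:, None] - c[None, :]), axis=0)
    c -= 0.005 * grad                      # one plain gradient-descent step
# c is now close to the true cluster means (about 0 and 10)
```

In a real pipeline the same gradient would also flow backward into the network that produced `x`, which is exactly the "bake and frost in one motion" picture: the features and the clusters improve together.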

Summary in One Sentence

The authors proved that the rigid, old-school K-Means algorithm is actually just a "frozen" version of a smooth, modern neural network, and by using a special mathematical trick (Entmax-1.5), we can now melt them together so AI can learn to group data and recognize patterns all at once.