Combining Serverless and High-Performance Computing Paradigms to support ML Data-Intensive Applications

This paper extends Cylon, a high-performance distributed data frame solution, with a serverless communicator that uses NAT Traversal TCP Hole Punching to enable direct communication between AWS Lambda functions, achieving scaling efficiency within 6.5% of traditional EC2 clusters for data-intensive machine learning applications.

Mills Staylor, Arup Kumar Sarker, Gregor von Laszewski, Geoffrey Fox, Yue Cheng, Judy Fox

Published 2026-03-06

Here is an explanation of the paper, translated into simple language with some creative analogies.

The Big Picture: The "Serverless" Dilemma

Imagine you are running a massive cooking competition. You have thousands of chefs (data processors) who need to work together to create a giant feast (a Machine Learning model).

Traditionally, you would rent a huge, expensive kitchen building (a Data Center or HPC Cluster). You have to buy the ovens, pay the electricity, hire the security guards, and keep the building warm even when no one is cooking. It's reliable, but it's expensive and rigid.

Then, Serverless Computing (like AWS Lambda) came along. This is like renting a kitchen by the minute. You only pay for the exact seconds your chefs are chopping vegetables. If you need 100 chefs for 5 minutes, you pay for that. If you need 0 chefs, you pay nothing. It's incredibly flexible and cheap for small tasks.
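The pay-per-use trade-off comes down to simple arithmetic. A function job is billed for the memory and time it actually uses, while a reserved machine is billed for every hour it is up. The sketch below encodes that comparison. The function names are hypothetical, and every price and job size is a made-up placeholder. None of them are AWS's current rates or the paper's measured setup.

```python
def serverless_cost(n_functions: int, memory_gb: float, seconds: float,
                    price_per_gb_s: float, price_per_request: float) -> float:
    """Pay-per-use: billed only for the GB-seconds the functions actually run."""
    gb_seconds = n_functions * memory_gb * seconds
    return gb_seconds * price_per_gb_s + n_functions * price_per_request


def always_on_cost(hourly_rate: float, hours_running: float) -> float:
    """Reserved machine: billed for every hour it is up, busy or idle."""
    return hourly_rate * hours_running


# Placeholder inputs (hypothetical, for illustration only).
job = serverless_cost(n_functions=64, memory_gb=2.0, seconds=10.0,
                      price_per_gb_s=0.00002, price_per_request=0.0000002)
idle_day = always_on_cost(hourly_rate=1.00, hours_running=24)

# How many such bursts per day before keeping a machine on becomes cheaper?
break_even_jobs = idle_day / job
print(f"per job ${job:.4f}, always-on ${idle_day:.2f}/day, "
      f"break-even at ~{break_even_jobs:.0f} jobs/day")
```

With these placeholder numbers, serverless wins as long as the bursts are rare. Only when jobs arrive hundreds of times a day does a machine that is always on start to pay for itself.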

The Problem:
In a traditional kitchen, chefs stand next to each other and can shout instructions or hand ingredients directly to one another instantly. In the "Serverless" kitchen, the chefs are scattered across different buildings. To pass a recipe or an ingredient, they have to mail it to a central post office (like Amazon S3 or Redis), wait for it to be sorted, and then the other chef picks it up.

This "mailing" process is slow. For complex tasks like training AI, where chefs need to swap data constantly, this delay kills performance. It's like trying to build a house by mailing bricks back and forth instead of just handing them to the bricklayer next to you.

The Solution: "Cylon" and the "Hole Punch"

The researchers in this paper extended a tool called Cylon, a high-performance distributed data frame library, so that it runs efficiently inside the serverless kitchen. Think of Cylon as a universal translator and a super-fast delivery service for the chefs.

Their big breakthrough was solving the communication problem using a trick called NAT Traversal (TCP Hole Punching).

The Analogy: The Secret Handshake
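Imagine two chefs, Chef A and Chef B, are in different buildings. Each building has a security guard (a NAT, which behaves much like a firewall) who blocks any call coming in from outside unless someone inside has already called that number. On top of that, neither chef knows the other's outside phone number.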

  1. The Old Way (S3/Redis): Chef A writes a note, walks to the post office, drops it in a box. Chef B has to walk to the post office, check the box, and pick it up. This takes forever.
  2. The New Way (Hole Punching): Chef A and Chef B both call a neutral "Rendezvous Server" (a friend who knows both of them). The friend sees which outside number each call came from and tells each chef, "Here is the other chef's direct number."
  3. The Punch: Both chefs now dial each other's outside number at about the same time. Each guard sees an outgoing call to that number, so when the other chef's call arrives from it, the guard treats it as an expected reply and lets it through. Those outgoing calls "punch a hole" in the wall, and Chef A and Chef B can now talk directly, without going through the post office.
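The handshake above can be modeled in a few lines. The sketch below is a toy simulation, not FMI's or Cylon's code. `SimulatedNAT` and `RendezvousServer` are hypothetical names, and all addresses come from documentation-reserved or private ranges. It assumes each NAT keeps the same public port for a given private endpoint no matter where it sends, which is the condition under which hole punching works. Real TCP hole punching also needs both sides to reuse one local port and attempt connections simultaneously. The model abstracts that detail away.

```python
from dataclasses import dataclass, field

Endpoint = tuple[str, int]  # (IP address, port)


@dataclass
class SimulatedNAT:
    """Toy NAT in front of one private host (hypothetical model, not a library).

    Mapping: a private endpoint keeps the same public port for every destination.
    Filtering: inbound traffic is admitted only from remote endpoints that the
    private side has already sent to.
    """
    public_ip: str
    next_port: int = 40000
    mappings: dict = field(default_factory=dict)  # private endpoint -> public port
    allowed: set = field(default_factory=set)     # {(public port, remote endpoint)}

    def outbound(self, private: Endpoint, remote: Endpoint) -> Endpoint:
        """Send a packet out; returns the public endpoint the remote side sees."""
        if private not in self.mappings:
            self.mappings[private] = self.next_port
            self.next_port += 1
        port = self.mappings[private]
        self.allowed.add((port, remote))  # replies from `remote` may now pass
        return (self.public_ip, port)

    def inbound(self, public_port: int, remote: Endpoint) -> bool:
        """True if a packet from `remote` addressed to `public_port` gets through."""
        return (public_port, remote) in self.allowed


class RendezvousServer:
    """Records the public endpoint it observes for each registered peer."""

    def __init__(self) -> None:
        self.seen: dict[str, Endpoint] = {}

    def register(self, peer_id: str, observed: Endpoint) -> None:
        self.seen[peer_id] = observed

    def lookup(self, peer_id: str) -> Endpoint:
        return self.seen[peer_id]


SERVER: Endpoint = ("203.0.113.10", 3478)  # hypothetical rendezvous address
nat_a, nat_b = SimulatedNAT("198.51.100.1"), SimulatedNAT("198.51.100.2")
a_priv, b_priv = ("10.0.0.5", 5000), ("10.0.1.7", 5000)
rendezvous = RendezvousServer()

# Step 1: both peers contact the rendezvous server, which notes their public endpoints.
rendezvous.register("A", nat_a.outbound(a_priv, SERVER))
rendezvous.register("B", nat_b.outbound(b_priv, SERVER))
a_pub, b_pub = rendezvous.lookup("A"), rendezvous.lookup("B")

# Before punching, B's unsolicited packet to A is dropped by A's NAT.
before = nat_a.inbound(a_pub[1], b_pub)

# Step 2 (the punch): each side sends outbound to the other's public endpoint.
nat_a.outbound(a_priv, b_pub)
nat_b.outbound(b_priv, a_pub)

# Now each NAT treats the other's traffic as an expected reply.
after_a = nat_a.inbound(a_pub[1], b_pub)
after_b = nat_b.inbound(b_pub[1], a_pub)
print(before, after_a, after_b)  # False True True
```

The mapping assumption is important. A NAT might pick a fresh public port for each destination, a so-called symmetric NAT. In that case the endpoint seen by the rendezvous server would be useless to the other peer, the punch would fail, and traffic would have to be relayed instead.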

What Did They Actually Do?

  1. Built the Bridge: They took FMI, an existing message-passing library for serverless functions that already supported this kind of hole punching, and integrated it into Cylon, a high-speed data processing tool, as its communicator.
  2. The Test: They ran a large distributed "Join" operation (matching rows from two big tables on a shared key, which forces every worker to exchange data with every other worker) on 64 serverless functions (AWS Lambda).
  3. The Result:
    • Speed: The serverless chefs kept pace with the traditional chefs in the big data center. As the job was spread across more workers, their scaling efficiency stayed within 6.5% of traditional EC2 clusters.
    • Communication: Their direct "hole punching" method was 10 to 100 times faster than using the "post office" (S3/Redis) method.
    • Cost: For tasks that arrive suddenly and briefly (like a sudden surge in data), serverless was much cheaper. They calculated that a complex data job cost about 3 cents on serverless, whereas keeping a traditional machine running just to wait for that job would cost much more, because you pay for every idle hour.
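The join benchmark shows why communication dominates. In a straightforward hash-partitioned join, every worker splits its rows by key and sends one piece to every other worker. With 64 workers, a single shuffle therefore involves 64 × 63 = 4,032 point-to-point transfers per table. The sketch below simulates that pattern in one process using plain Python lists. `partition`, `shuffle`, `local_hash_join`, and `distributed_join` are hypothetical names. This is an illustration of the technique, not Cylon's implementation, which runs on separate workers and moves data through its communicator.

```python
from collections import defaultdict

Row = dict


def partition(rows: list[Row], key: str, n_workers: int) -> list[list[Row]]:
    """Hash-partition rows so equal keys always go to the same worker."""
    parts: list[list[Row]] = [[] for _ in range(n_workers)]
    for row in rows:
        parts[hash(row[key]) % n_workers].append(row)
    return parts


def shuffle(parts_per_worker: list[list[list[Row]]]) -> list[list[Row]]:
    """All-to-all exchange: worker d receives partition d from every worker.

    In a serverless deployment each (sender, destination) pair is one message;
    this is the traffic that direct connections speed up versus S3/Redis.
    """
    n = len(parts_per_worker)
    inbox: list[list[Row]] = [[] for _ in range(n)]
    for sender_parts in parts_per_worker:
        for dest, part in enumerate(sender_parts):
            inbox[dest].extend(part)
    return inbox


def local_hash_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Join two row lists held by one worker using an in-memory hash index."""
    index: defaultdict = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def distributed_join(left_shards: list[list[Row]], right_shards: list[list[Row]],
                     key: str) -> list[Row]:
    """Shuffle both tables by key, then let each worker join its own bucket."""
    n = len(left_shards)
    left_in = shuffle([partition(s, key, n) for s in left_shards])
    right_in = shuffle([partition(s, key, n) for s in right_shards])
    result: list[Row] = []
    for w in range(n):  # in a real cluster these local joins run in parallel
        result.extend(local_hash_join(left_in[w], right_in[w], key))
    return result


# Two workers, each starting with one shard of each table.
left_shards = [[{"id": 1, "a": "x"}, {"id": 2, "a": "y"}], [{"id": 3, "a": "z"}]]
right_shards = [[{"id": 3, "b": 30}], [{"id": 1, "b": 10}, {"id": 4, "b": 40}]]
joined = sorted(distributed_join(left_shards, right_shards, "id"), key=lambda r: r["id"])
print(joined)
# [{'id': 1, 'a': 'x', 'b': 10}, {'id': 3, 'a': 'z', 'b': 30}]
```

Matching rows may start on different workers. For example, `id` 3 begins on worker 1 in the left table and on worker 0 in the right table. The join is correct only because the shuffle brings equal keys together, which is why this operation is a good stress test for the communication layer.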

Why Does This Matter?

This paper shows that serverless computing isn't just for simple, boring tasks anymore.

  • Before: People thought serverless was only good for "embarrassingly parallel" work (tasks where everyone works alone, like counting pixels in a million photos).
  • Now: The results suggest serverless can handle tightly coupled teamwork, where everyone needs to talk to everyone else constantly. That is the communication pattern behind workloads like training AI models or analyzing earthquake data.

The "So What?" for You

If you are a scientist, a doctor, or a data analyst:

  • You may not need to buy or reserve a supercomputer for big data analysis, especially when the work comes in short bursts.
  • You can use the cloud's "pay-as-you-go" model to run massive AI experiments.
  • You can process huge datasets (like genome sequences or weather patterns) on demand, and faster than earlier serverless approaches allowed, because your "chefs" can finally talk to each other directly instead of waiting for the mail.

In short: The researchers figured out how to let serverless functions "hold hands" directly. This turns a slow, disconnected network into something that behaves much like a high-performance cluster, one you only pay for when you use it.