DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries

This paper introduces DNS-GT, a self-supervised Transformer-based model that learns contextual domain name embeddings from DNS query sequences and outperforms existing baselines on tasks such as domain classification and botnet detection.

Massimiliano Altieri, Ronan Hamon, Roberto Corizzo, Michelangelo Ceci, Ignacio Sanchez

Published Fri, 13 Ma

Imagine you are a security guard at a massive, busy airport. Your job is to spot troublemakers before they cause harm. In the digital world, this "airport" is a computer network, and the "passengers" are the tiny requests computers make to find websites (these are called DNS queries).

For years, security guards (cybersecurity systems) have relied on two main methods:

  1. The "Wanted Poster" approach: They check if a passenger matches a known criminal's face (a signature). If there is a match, the passenger gets stopped. But this fails against new criminals who haven't been caught yet.
  2. The "Suspicious Behavior" approach: They look for people acting weird. But traditional computer programs are bad at understanding context. They might flag a person for running, not realizing that person is just late for a flight, not fleeing a crime.

This paper introduces a new, super-smart security guard called DNS-GT. Here is how it works, explained simply:

1. The Problem: The "Word" vs. The "Sentence"

Old methods treated every website request like a single word in a dictionary. They learned that "google.com" is usually good and "bad-site.com" is usually bad.

  • The Flaw: This is like trying to understand a movie by only looking at individual frames. You miss the story.
  • The Reality: A website isn't just a name; it's part of a conversation. If a computer asks for "google.com" followed by "youtube.com," that's normal. But if it asks for "google.com" followed by a weird, random string of letters and then a known virus site, that's a suspicious story.

2. The Solution: The "Super-Reader" (DNS-GT)

The authors built DNS-GT, which is like a super-advanced reader that doesn't just memorize words; it understands the story behind them.

It uses two powerful tools:

  • The Transformer (The Context Master): This is the same technology behind modern AI chatbots. It looks at a whole sentence of requests at once. It understands that the meaning of a request changes based on the requests around it.
  • The Graph (The Relationship Map): Imagine drawing lines between people who are talking to each other. This model draws lines between related requests, ignoring the ones that don't fit the conversation. It focuses only on the relevant connections.
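
To make the "relationship map" idea concrete, here is a minimal sketch of attention that is only allowed to flow along graph edges. Everything in it is an assumption for illustration: the `scores` values are made up, the `adjacency` map is hand-written, and the function name `masked_attention_weights` is hypothetical. The summary above does not say how DNS-GT actually builds its graph, so this only shows the general trick of ignoring unrelated requests.

```python
import math

def masked_attention_weights(scores, adjacency):
    """Row-wise softmax over `scores`, restricted to positions the graph allows.

    scores[i][j]    -- raw relevance of request j to request i (illustrative numbers)
    adjacency[i][j] -- True if the graph links request i to request j
    """
    weights = []
    for row_scores, row_adj in zip(scores, adjacency):
        allowed = [s for s, a in zip(row_scores, row_adj) if a]
        if not allowed:
            # A request with no neighbours attends to nothing.
            weights.append([0.0] * len(row_scores))
            continue
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if a else 0.0
                for s, a in zip(row_scores, row_adj)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Three requests in one session; the third is the odd one out.
queries = ["google.com", "youtube.com", "xkqzjv.biz"]
scores = [
    [2.0, 1.0, 0.5],
    [1.0, 2.0, 0.5],
    [0.5, 0.5, 2.0],
]
# Hypothetical graph: the first two requests are linked, the third only to itself.
adjacency = [
    [True, True, False],
    [True, True, False],
    [False, False, True],
]

attn = masked_attention_weights(scores, adjacency)
for name, row in zip(queries, attn):
    print(name, [round(w, 3) for w in row])
```

In this toy run, the first two requests share their attention with each other and give zero weight to the unrelated third request, which is the "ignore what doesn't fit the conversation" behaviour described above.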

3. How It Learns: The "Fill-in-the-Blank" Game

You might wonder, "How does it learn without a teacher telling it what is bad?"
The model plays a game called "Fill-in-the-Blank."

  • The Setup: The computer feeds the model a long list of website requests from a normal user.
  • The Trick: The training process secretly hides (masks) one of the requests from the model.
  • The Challenge: The model has to guess what the hidden request was, based only on the other requests in the list.
    • Example: If the list is [facebook.com, instagram.com, <MASK>, whatsapp.com], the model should guess <MASK> is likely messenger.com or something similar.
  • The Result: By playing this game millions of times with real data, the model learns the "grammar" of normal internet behavior. It learns what a "normal sentence" looks like.
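
The hiding step of the game can be sketched in a few lines. This is a simplified illustration, not the authors' code: the helper `mask_one` and the fixed random seed are assumptions, and it hides exactly one request per list as in the description above. It produces the two things a training step would need: the list with a blank in it, and the answer the model is supposed to guess.

```python
import random

MASK = "<MASK>"

def mask_one(sequence, rng):
    """Hide one randomly chosen request.

    Returns (masked_sequence, position, hidden_domain). The original
    sequence is left untouched.
    """
    pos = rng.randrange(len(sequence))
    masked = list(sequence)   # copy so the caller's list is not modified
    target = masked[pos]      # the answer the model must recover
    masked[pos] = MASK
    return masked, pos, target

rng = random.Random(0)        # fixed seed so the run is repeatable
session = ["facebook.com", "instagram.com", "messenger.com", "whatsapp.com"]

masked, pos, target = mask_one(session, rng)
print("model sees: ", masked)
print("must guess: ", target, "at position", pos)
```

During training, the model's guess for the blank would be compared with `target`, and the model would be nudged toward the right answer. No human ever has to label anything, because the answer comes from the data itself.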

4. Catching the Criminals

Once the model is trained, it can spot the bad guys in two ways:

  • The "Out of Place" Detector: If a computer suddenly makes a series of requests that don't make sense together (like a sentence with random words thrown in), the model finds them hard to predict. That confusion is a red flag. It means, "This story doesn't make sense; someone is lying."
  • The Botnet Hunter: Botnets are armies of infected computers. They often talk to each other in weird patterns. Because DNS-GT understands the context of the whole group, it can spot these coordinated, unnatural patterns much better than old methods.
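
One way to turn "the model gets confused" into a number is to measure how surprised a predictor is by each request given the rest of the list. The sketch below does this with a deliberately crude stand-in: a co-occurrence counter instead of a Transformer. The function names, the tiny training set, and the smoothing choice are all assumptions made for illustration; DNS-GT's real scoring is not described in this summary.

```python
import math
from collections import Counter

def train_cooccurrence(sessions):
    """Count how often each domain, and each pair of domains, shares a session."""
    pair_counts, domain_counts = Counter(), Counter()
    for session in sessions:
        unique = set(session)
        for d in unique:
            domain_counts[d] += 1
            for other in unique:
                if other != d:
                    pair_counts[(other, d)] += 1
    return pair_counts, domain_counts

def surprise(session, pair_counts, domain_counts):
    """Average negative log-probability of each request given its neighbours.

    Higher means the list looks less like the training data.
    Expects at least two requests in the session.
    """
    vocab_size = len(domain_counts) + 1   # one extra slot for unseen domains
    total = 0.0
    for i, d in enumerate(session):
        context = session[:i] + session[i + 1:]
        # Add-one smoothing keeps never-seen pairs from having zero probability.
        probs = [(pair_counts[(c, d)] + 1) / (domain_counts[c] + vocab_size)
                 for c in context]
        total += -math.log(sum(probs) / len(probs))
    return total / len(session)

# Toy "normal" traffic (made up for this example).
normal = [
    ["facebook.com", "instagram.com", "whatsapp.com"],
    ["facebook.com", "instagram.com", "messenger.com"],
    ["google.com", "youtube.com", "gmail.com"],
    ["google.com", "youtube.com", "facebook.com"],
]
pair_counts, domain_counts = train_cooccurrence(normal)

typical = ["facebook.com", "instagram.com", "whatsapp.com"]
odd = ["facebook.com", "xkqzjv.biz", "gmail.com"]

typical_score = surprise(typical, pair_counts, domain_counts)
odd_score = surprise(odd, pair_counts, domain_counts)
print("typical:", round(typical_score, 3), "odd:", round(odd_score, 3))
```

The list containing the random-looking domain gets the higher surprise score. A real system would pick a threshold on that score and would use the trained model's own predictions rather than simple counts.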

5. Why This is a Big Deal

  • No "Wanted Posters" Needed: It doesn't need to know the specific name of a new virus to catch it. It just knows the virus is acting "weird" in the story.
  • No Labels Needed: It can learn from raw data without anyone having to label every single request as "good" or "bad" (which is hard and expensive to do).
  • Adaptable: Just like a human guard who learns from experience, this model can be fine-tuned to catch specific types of threats, like botnets or phishing scams, very quickly.

The Bottom Line

Think of DNS-GT as a security guard who has read every book in the library and can instantly tell if a sentence is a lie, even if the liar is using a new name. It moves cybersecurity from "checking a list of names" to "understanding the story," making it much harder for cybercriminals to hide in plain sight.