Graph Neural Networks in EEG-based Emotion Recognition: A Survey

This survey provides a comprehensive review of Graph Neural Networks in EEG-based emotion recognition, organizing existing methods under a unified framework of graph construction stages, grounding each stage in its physiological rationale, and outlining open challenges and future directions.

Chenyu Liu, Yuqiu Deng, Yihao Wu, Ruizhi Yang, Zhongruo Wang, Liangwei Zhang, Siyun Chen, Tianyi Zhang, Yang Liu, Yi Ding, Liming Zhai, Ziyu Jia, Xinliang Zhou

Published 2026-03-05

Imagine your brain is a bustling, high-tech city. Every time you feel an emotion—like joy, anger, or sadness—it's not just one part of the city reacting; it's a complex conversation happening between different neighborhoods (the brain regions).

EEG (electroencephalography) is like a drone flying over this city, recording the electrical chatter of the streets. Emotion Recognition is the art of listening to that chatter to figure out what the city is feeling.

For a long time, computers tried to listen to this chatter by treating each street (electrode) as an isolated voice. But emotions are a group effort. A new type of AI called Graph Neural Networks (GNNs) has emerged, which is much better at understanding how these neighborhoods talk to each other.

This paper is a survey (a big map) of how researchers are currently using these "City-Listening" GNNs to understand emotions. The authors realized that while everyone is building these maps, they are doing it in very different ways. So, they created a Unified Framework—a standard blueprint—to organize all these methods into three simple construction stages.
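Before walking through the blueprint, it helps to see what a GNN actually does with a brain "city." The sketch below is not the survey's own method but a generic graph-convolution step (in the style of standard GCNs): each electrode updates its features by mixing in its neighbours' features through the adjacency matrix. All names here are illustrative.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: every node (electrode) averages over its
    neighbours with symmetric normalisation, then applies a linear map + ReLU.

    H: (n, f_in) node-feature matrix
    A: (n, n) adjacency matrix (who talks to whom)
    W: (f_in, f_out) trainable weights
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops so a node keeps its own signal
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2} normalisation
    H_new = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_new, 0.0)             # ReLU non-linearity
```

Stacking a few such layers lets information from one neighbourhood reach distant neighbourhoods, which is exactly the "group effort" view of emotion the survey argues for.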

Here is the blueprint, explained with everyday analogies:

Stage 1: The Node-Level (Choosing the "Characters")

Before you can map a conversation, you need to decide who is speaking. In the brain, these "speakers" are the electrodes on your scalp.

  • The Question: What data do we feed the computer for each speaker?
  • The Options:
    • Univariate (The Specialist): We pick just one type of data for each speaker. Maybe we only listen to the speed of their speech (Time Domain) or the pitch of their voice (Frequency Domain). This is the most common approach.
    • Hybrid (The Generalist): We give the computer a mix of everything—speed, pitch, and volume all at once. It's like giving the detective a full dossier instead of just one clue. It's powerful but can be messy if there isn't enough data to sort it all out.
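A common univariate choice in this literature is a frequency-domain feature per electrode, such as log band power over the classic EEG rhythms (log power of a roughly Gaussian band signal approximates differential entropy, a popular EEG emotion feature). The sketch below is a minimal illustration, not any specific paper's pipeline; the band cut-offs and sampling rate are assumptions.

```python
import numpy as np

def band_power_features(eeg, fs=128, bands=None):
    """Per-electrode frequency-domain node features (each speaker's 'pitch').

    eeg: (n_electrodes, n_samples) raw signal
    Returns an (n_electrodes, n_bands) node-feature matrix.
    """
    if bands is None:
        # Conventional EEG rhythms; exact cut-offs vary across papers.
        bands = {"theta": (4, 8), "alpha": (8, 13),
                 "beta": (13, 30), "gamma": (30, 45)}
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=1)) ** 2 / eeg.shape[1]
    feats = []
    for lo, hi in bands.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].sum(axis=1))   # power in this band
    return np.log(np.stack(feats, axis=1) + 1e-12)  # log-power ≈ differential entropy
```

A hybrid variant would simply concatenate this matrix with time-domain features (e.g. signal statistics) along the feature axis, giving each node the "full dossier."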

Stage 2: The Edge-Level (Drawing the "Connections")

Now that we have our speakers, we need to draw lines between them to show who is talking to whom. These lines are the Edges.

  • The Question: How do we decide who is connected to whom?
  • The Options:
    • Model-Independent (The Mapmaker): We draw lines based on fixed rules, like a physical map. "If two speakers are standing next to each other, they must be connected." Or, "If their voices sound similar, they are connected." These rules don't change; they are based on physics or biology.
    • Model-Dependent (The Detective): We let the computer learn the connections. As it studies the data, it says, "Hey, Speaker A and Speaker C seem to be conspiring together right now, even though they are far apart." The computer draws the lines itself based on what it learns, making the map smarter but more complex.
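The two edge-building philosophies can be sketched side by side. The first function builds a model-independent adjacency from fixed physical rules (electrodes close together on the scalp get strong edges, via a Gaussian distance kernel); the second is a model-dependent adjacency where a trainable score matrix, normalised row-wise, is what the network learns by backpropagation. Both are generic illustrations under assumed parameters, not the survey's specific formulas.

```python
import numpy as np

def distance_adjacency(coords, sigma=0.1, threshold=0.0):
    """Fixed-rule edges: nearby electrodes are strongly connected.

    coords: (n, 3) electrode positions on the scalp.
    """
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    A = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian kernel: closer => larger weight
    np.fill_diagonal(A, 0.0)                 # no self-edges
    A[A < threshold] = 0.0                   # optionally prune weak links
    return A

def learned_adjacency(W):
    """Learned edges: W is an (n, n) trainable score matrix (updated by
    backprop in practice); a row-wise softmax turns scores into edge weights
    that sum to 1 per node."""
    e = np.exp(W - W.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)
```

The learned variant is what lets the "detective" connect distant regions that a purely physical map would never link.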

Stage 3: The Graph-Level (Building the "City Structure")

Finally, we assemble the speakers and connections into a full structure to understand the whole city's mood.

  • The Question: What kind of city layout are we building?
  • The Options:
    • Multi-Graph (The Multi-Layered City): Instead of one map, we build several at once. One map shows how neighborhoods talk horizontally (left to right), another shows vertical talk (front to back), and another shows how they talk over time. We combine all these views for a 3D understanding.
    • Hierarchical Graph (The Neighborhood Hierarchy): We group speakers into neighborhoods, then group neighborhoods into districts. This helps the computer understand both the local gossip (within a neighborhood) and the city-wide trends (between districts).
    • Time Series Graph (The Movie): Emotions change over time. This approach treats the city as a movie, not a photo. It looks at how the connections change from second to second.
    • Sparse Graph (The Filter): In a real city, not everyone talks to everyone. This approach tries to cut out the noise, keeping only the most important connections to avoid getting overwhelmed by irrelevant chatter.
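Of the four layouts, the sparse graph is the easiest to make concrete: starting from any dense adjacency (fixed or learned), keep only each node's k strongest connections and zero out the rest. This top-k scheme is one common sparsification recipe, shown here as a generic sketch rather than any single paper's method.

```python
import numpy as np

def topk_sparsify(A, k=8):
    """'The Filter': keep only each node's k strongest connections.

    A: (n, n) dense adjacency with non-negative weights.
    Returns a copy where each row has at most k non-zero entries.
    """
    A = A.copy()
    # argsort ascending per row; everything except the last k entries is "weak"
    weak = np.argsort(A, axis=1)[:, :-k]
    np.put_along_axis(A, weak, 0.0, axis=1)
    return A
```

The other layouts compose with this one: a time-series graph is just a sequence of such (possibly sparsified) adjacency snapshots, and a multi-graph stacks several adjacencies (spatial, spectral, temporal) for the same set of nodes.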

What's Next? (The Future Directions)

The authors point out that while we are doing well, there are still some blind spots:

  1. The "Time Travel" Problem: Current maps often miss how a conversation in one neighborhood now affects a different neighborhood later. We need a "fully connected time graph" to catch these delayed reactions.
  2. Graph Condensation (The Compression): Our current maps are huge and full of redundant details. We need a way to compress the city into a tiny, efficient model that keeps only the essential emotional information, making it faster to run.
  3. Heterogeneous Graphs (The Multi-Sensor City): Emotions aren't just in the brain; they affect your heart and sweat glands too. Future maps should combine brain data with heart data to get the full picture of human emotion.
  4. Dynamic Graphs (The Living City): Most maps are static snapshots. We need maps that breathe and change shape in real-time as the emotion evolves, capturing the fluid nature of human feelings.

The Bottom Line

This paper is a guidebook for engineers and scientists. It says: "Stop reinventing the wheel. Here is a standard way to build these emotion-detecting AI systems, broken down into three clear steps. If you follow this blueprint, you can build better, faster, and more accurate tools to understand the human heart and mind."