MEGC2026: Micro-Expression Grand Challenge on Visual Question Answering

The MEGC 2026 challenge introduces two new tasks, Micro-Expression Video Question Answering (ME-VQA) and Micro-Expression Long-Video Question Answering (ME-LVQA), which leverage the multimodal reasoning capabilities of large vision-language models to advance the analysis of facial micro-expressions in both short and long video sequences.

Xinqi Fan, Jingting Li, John See, Moi Hoon Yap, Su-Jing Wang, Adrian K. Davison

Published Wed, 11 Ma

Imagine you are at a high-stakes poker game. Everyone is trying to look calm, but one player just got a terrible hand. For a split second—faster than a camera shutter—his face twitches. His lip curls slightly, or his eyebrow raises. Then, just as quickly, he masks it with a poker face. That tiny, involuntary flicker is a Micro-Expression (ME). It's the truth trying to peek out from behind a lie.

For years, scientists have been trying to build computers that can spot these fleeting "truth flashes." But now, the game has changed. The MEGC 2026 (Micro-Expression Grand Challenge 2026) is a new competition asking a different, more human question: "Can you not just spot the twitch, but actually talk about it?"

Here is a simple breakdown of what this paper is about, using some everyday analogies.

The Big Idea: From "Spotting" to "Chatting"

In the past, computers were like security guards with a checklist. They would scan a face and say, "Yes, I see a micro-expression," or "No, I don't."

MEGC 2026 is upgrading the computer to be more like a detective with a notebook. Instead of just checking a box, the computer is given a video and asked natural questions like:

  • "What emotion was the person feeling right before they smiled?"
  • "How many times did they try to hide their anger?"
  • "Describe the specific muscle movement around the eyes."

This is powered by Multimodal Large Language Models (MLLMs). Think of these models as super-smart students who have read every book on human emotion and watched millions of videos. They can look at a video and "chat" about what they see, combining visual clues with language skills.
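
To make the "chat about a video" idea concrete, here is a minimal sketch of how a clip and a question might be packaged for a chat-style MLLM. The message structure mirrors the format many vision-language chat APIs accept, but the field names and file name here are illustrative, not the challenge's official interface.

```python
def build_vqa_prompt(video_path: str, question: str) -> list[dict]:
    """Package a video clip and a natural-language question as a
    chat-style message, the kind of input most MLLMs consume.
    (Illustrative structure, not the official MEGC 2026 interface.)"""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},   # the face clip
                {"type": "text", "text": question},       # the question
            ],
        }
    ]

# Hypothetical usage: this message list would then be passed to a model.
messages = build_vqa_prompt(
    "clip_017.mp4",
    "Is this a happy or an angry micro-expression? Explain your answer.",
)
```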

The Two Main Challenges (The Tasks)

The competition has two levels, like a video game with a "Tutorial" and a "Boss Level."

Level 1: The Short-Clip Detective (ME-VQA)

  • The Scenario: You are shown a very short video clip (a few seconds long) of someone's face.
  • The Task: You are asked a specific question about that clip.
    • Example: "Did the person show a 'lip corner depressor' (a sign of sadness)?" or "Is this a happy or angry micro-expression?"
  • The Goal: The computer must answer in full sentences, explaining why it thinks that. It's like a teacher asking a student to explain their answer, not just give a number.

Level 2: The Long-Haul Detective (ME-LVQA)

  • The Scenario: This is the hard mode. You are given a long video (like a whole conversation or a tense meeting) that might last several minutes.
  • The Task: The video is full of normal talking, laughing, and frowning. Hidden inside are tiny, fleeting micro-expressions. You have to find them and answer complex questions.
    • Example: "How many times did the person try to suppress a smile during the meeting?" or "List all the different facial movements that happened."
  • The Challenge: This is like finding a needle in a haystack, except the haystack is moving and the needle is invisible to the naked eye. The computer also has to remember what it saw minutes earlier in order to answer questions about the video as a whole.
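
One standard way systems cope with long videos is to split them into overlapping windows of frames, so each chunk fits in the model's context while no micro-expression is lost at a boundary. The sketch below shows the idea; the window and stride values are illustrative, not what any particular MEGC 2026 baseline uses.

```python
def sliding_windows(num_frames: int, window: int = 150, stride: int = 75):
    """Split a long video into overlapping frame windows.

    With stride < window, consecutive chunks overlap, which reduces the
    chance of cutting a fleeting micro-expression in half at a boundary.
    Example values: 150 frames is 5 seconds at 30 fps (illustrative).
    Returns a list of (start_frame, end_frame) pairs.
    """
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return [(s, min(s + window, num_frames)) for s in starts]

# A 10-second clip at 30 fps (300 frames) yields three overlapping chunks:
# [(0, 150), (75, 225), (150, 300)]
chunks = sliding_windows(300)
```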

The "Test Drive" (Baseline Results)

The authors of the paper didn't just propose the idea; they tried it out using two powerful AI models from the Qwen family of vision-language models. Think of these models as two different cars the researchers rented to see if they could win the race.

  • The Zero-Shot Test (Driving without a map): They asked the AI to do the job without any special training on micro-expressions.
    • Result: The AI was okay at spotting big, obvious emotions (like "Is he happy?"), but it was terrible at spotting the tiny, subtle ones. It was like a driver who can see a stop sign but misses a small pothole.
  • The Fine-Tuned Test (Driving with a map): They gave the AI a crash course using specific micro-expression videos.
    • Result: The AI got better at writing good sentences and describing what it saw. However, it still struggled with the hardest part: counting exactly how many micro-expressions happened and pinpointing exactly when they occurred in long videos.
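
"Pinpointing exactly when" a micro-expression occurred is usually judged with temporal intersection-over-union between the predicted time interval and the annotated one. Here is a minimal sketch of that metric, assuming intervals are (start, end) pairs in seconds; the challenge's official scoring may differ in detail.

```python
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Intersection-over-union of two time intervals (start, end).

    1.0 means the prediction matches the ground-truth interval exactly;
    0.0 means they don't overlap at all. A common way to score temporal
    localization (illustrative, not the official MEGC 2026 metric).
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

A model that "misses the pothole" entirely scores 0.0, while one that is merely a fraction of a second off still earns partial credit.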

The Takeaway

The paper concludes that while AI is getting smarter at "talking" about emotions, it still has a long way to go to truly "understand" the subtle, split-second lies and truths of the human face.

  • The Good News: AI can now describe micro-expressions in natural language, making the technology more useful for real-world applications (like lie detection or mental health analysis).
  • The Bad News: Long videos are still too confusing for current AI. The models get lost in the noise of normal facial movements and miss the tiny details.

In short: MEGC 2026 is inviting researchers to teach computers how to be better observers of human emotion, moving from simple "spotting" to complex "storytelling" about what people are really feeling.