SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding

The paper introduces SEED, a semantic evaluation metric for visual brain decoding that combines three complementary components to align much more closely with human judgments. Applied to current state-of-the-art models, it reveals critical limitations that older metrics hide, and the authors open-source their data to guide future work.

Juhyeon Park, Peter Yongho Kim, Jiook Cha, Shinjae Yoo, Taesup Moon

Published 2026-02-25
📖 5 min read · 🧠 Deep dive

Imagine you are an artist who can paint pictures straight out of someone's head. You ask a person to look at a photo of a teddy bear, then use a computer to translate their brain activity into a new image.

Now, imagine you show that new painting to a group of people and ask, "How close is this to the original teddy bear?"

  • The People say: "It looks like a cat. It's cute, but it's definitely not a bear."
  • The Old Computer Score says: "98% Perfect! Great job!"

This is the problem this paper, SEED, is trying to solve.

The Problem: The "Fake Perfect" Score

For a long time, scientists evaluating brain-decoding models have used a set of standard "rulers" (metrics) to grade how well the AI paints what the brain sees.

Think of these old rulers like a strict geometry teacher. They check if the lines are straight and the colors are in the right spots. If the AI draws a cat instead of a bear, but the cat is the same size and color as the bear, the geometry teacher gives it an A+.

But humans don't grade like that. We care about meaning. If you ask for a bear and get a cat, you failed, even if the cat is drawn perfectly. The old rulers were giving "A+" grades to paintings that were semantically wrong, making researchers think the technology was better than it actually was.

The Solution: SEED (The "Human-Like" Judge)

The authors created a new grading system called SEED (Semantic Evaluation for Visual Brain Decoding). Instead of just checking lines and pixels, SEED tries to grade the painting the way a human does. It uses three different "judges" to give a final score:

  1. The Object Detective (Object F1):

    • Analogy: Imagine a game of "I Spy."
    • How it works: This judge looks at the original photo and the AI's painting and asks, "Did the AI find the main things?" If the original has a dog and a ball, does the painting have a dog and a ball? If the AI swapped the dog for a cat, this judge gives a low score. It's like checking if the ingredients in a cake are actually flour and eggs, not just if the cake looks round.
  2. The Storyteller (Cap-Sim):

    • Analogy: Imagine two people describing a photo to a blind friend.
    • How it works: This judge asks an AI to write a sentence describing the original photo and another sentence describing the AI's painting. Then it compares how close the two stories are in meaning.
    • Example: If the original is "A man skiing on a snowy hill" and the painting is "A woman skiing on a sunny beach," the stories are very different. Even if the shapes look similar, the story is wrong. This catches details like gender, background, and actions that the Object Detective might miss.
  3. The Vibe Checker (EffNet):

    • Analogy: A quick gut feeling.
    • How it works: This is an existing metric built on a pretrained image-recognition network (EfficientNet). It checks the overall "feel" and structure of the image, acting as a safety net to make sure the picture isn't just a random mess. (A rough code sketch of all three judges follows this list.)
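
To make the three judges more concrete, here is a minimal Python sketch of how scores like these could be computed. This is an illustration under stated assumptions, not the paper's actual implementation: the function names are hypothetical, the object lists are assumed to come from some off-the-shelf object detector, and the caption embeddings and image features are assumed to be precomputed by a sentence encoder and an EfficientNet, respectively.

```python
from collections import Counter

import numpy as np


def object_f1(true_objects, decoded_objects):
    """Hypothetical "Object Detective": F1 overlap between the object labels
    a detector finds in the original image and in the reconstruction.
    Both inputs are lists of category names."""
    true_counts = Counter(true_objects)
    decoded_counts = Counter(decoded_objects)
    # Objects that appear in both images (duplicates counted).
    overlap = sum((true_counts & decoded_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(decoded_counts.values())
    recall = overlap / sum(true_counts.values())
    return 2 * precision * recall / (precision + recall)


def cap_sim(orig_caption_emb, recon_caption_emb):
    """Hypothetical "Storyteller": cosine similarity between sentence
    embeddings of two machine-written captions, one per image."""
    a = np.asarray(orig_caption_emb, dtype=float)
    b = np.asarray(recon_caption_emb, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def effnet_score(orig_features, recon_features):
    """Hypothetical "Vibe Checker": Pearson correlation between EfficientNet
    feature vectors of the two images; higher means a closer overall match."""
    a = np.asarray(orig_features, dtype=float)
    b = np.asarray(recon_features, dtype=float)
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```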

The Final Score: SEED takes the average of the three judges' scores. If the AI gets a high score, it means it got the objects right, the story right, and the vibe right.
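
Continuing the same sketch: since the post describes the final score as a plain average of the three judges, combining them is a one-liner. The inputs below are made up purely for illustration.

```python
def seed_score(f1, cap, eff):
    """Toy combination: the average of the three judges' scores,
    each assumed to lie roughly in [0, 1]."""
    return (f1 + cap + eff) / 3.0


# Right objects, near-identical captions and features: every judge
# is happy, so the toy SEED score comes out close to 1.
print(seed_score(
    object_f1(["dog", "ball"], ["dog", "ball"]),        # 1.0
    cap_sim([0.1, 0.9, 0.2], [0.12, 0.88, 0.25]),       # ~1.0
    effnet_score([0.3, 0.5, 0.7], [0.32, 0.49, 0.71]),  # ~1.0
))
```

Swap the dog for a cat and the Object F1 term collapses, usually dragging the caption term down with it, so the average falls even when pixel-level metrics stay flattering.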

What Did They Find?

The authors tested this new system on the best brain-decoding models currently available (the "champions" of the field).

  • The Shocking Result: Even the "champion" models, which were getting near-perfect scores on the old rulers, were actually failing the SEED test.
  • The "Near-Miss" Problem: They found that models often confuse similar things. They might turn a dog into a cat, or a truck into a bus. To the old rulers, this was a small mistake. To SEED (and humans), it's a big failure.
  • The Missing Details: Sometimes the models got the main object right (a bird) but missed the details (the bird was facing the wrong way, or the background was a jungle instead of a forest).

Why Does This Matter?

Think of it like training a student.

  • If you only use the Old Rulers, you tell the student, "You're doing great!" even when they are drawing the wrong animal. The student stops trying to improve because they think they've already won.
  • With SEED, you tell the student, "You drew a cat, but I asked for a bear. You need to learn the difference."

The Takeaway

This paper is a wake-up call. It says, "Stop using the old, broken rulers that give fake perfect scores." By using SEED, researchers can finally see the real mistakes their AI is making. This will help them build better brain-decoding tools that don't just look "okay" to a computer, but actually make sense to human brains.

In short: SEED is the new, honest teacher that makes sure the AI is actually learning to see what we see, not just guessing the right shape.
