EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

This paper introduces EduAIGV-1k, the first benchmark dataset for evaluating AI-generated educational videos on math concepts. It also proposes EduVQA, a novel assessment framework whose Structured 2D Mixture-of-Experts module leverages fine-grained perceptual and prompt-alignment annotations to outperform existing video quality baselines.

Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui

Published 2026-03-04

Imagine you've just built a magical video machine that can turn any sentence you write into a moving picture. It's amazing! You type "a cat eating pizza," and poof, there's a cat eating pizza.

But now, imagine you want to use this machine to teach a 5-year-old how to count or understand shapes. Suddenly, the stakes get higher. If the machine draws "three blue blocks" but accidentally makes them red, or if the blocks melt into a puddle halfway through the video, the lesson fails. The child learns the wrong thing.

This is the problem the EduVQA paper sets out to solve. Here is the story of how they tackled it, explained simply:

1. The Problem: The "Magic" Machine is a Bad Teacher

Current AI video generators are great at making cool, artistic videos for movies or TikTok. But they are terrible at being precise teachers. They often:

  • Miscount: They might draw "five apples" but only show four.
  • Get Confused: They might make a triangle rotate the wrong way.
  • Glitch: The video might flicker or the objects might jump around weirdly.

Existing tools for judging video quality are like art critics. They ask, "Is this pretty? Is the lighting good?" They don't ask, "Did the AI actually follow the math instructions?"

2. The Solution Part 1: Building a "Report Card" (The Dataset)

To fix this, the researchers built a massive test bank called EduAIGV-1k. Think of this as a giant library of 1,130 math videos.

  • The Prompts: They wrote 113 specific math instructions (like "Show a square turning into a circle" or "Count three red balls").
  • The Generators: They fed these instructions into 10 different AI video machines (the "students").
  • The Grading: Instead of just giving a video a score of "5 out of 10," human experts graded them on a detailed report card with two main sections:
    1. The "Look" (Perceptual Quality): Is the video smooth? Do the edges look sharp, or is it blurry? (Like checking if a drawing is neat).
    2. The "Meaning" (Prompt Alignment): Did the AI actually do what you asked? If you said "four blue cars," did it show exactly four, and were they blue? (Like checking if the student answered the math problem correctly).
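To make the two-axis report card concrete, here is a minimal sketch of what one entry in such a dataset might look like. The field names are illustrative assumptions, not the paper's actual schema; only the counts (113 prompts, 10 generators, 1,130 videos) come from the text above.

```python
from dataclasses import dataclass

# Hypothetical record for one graded video; field names are
# illustrative, not EduAIGV-1k's real schema.
@dataclass
class VideoRecord:
    prompt: str              # the math instruction, e.g. "Count three red balls"
    generator: str           # which of the 10 AI video models made it
    perceptual_score: float  # the "Look": smoothness, sharpness
    alignment_score: float   # the "Meaning": did it follow the instruction?

# The dataset size follows directly from the setup:
n_prompts, n_generators = 113, 10
print(n_prompts * n_generators)  # 1130 videos, matching EduAIGV-1k
```

Keeping the two scores separate is the key design choice: a video can be perfectly smooth (high perceptual score) while still showing four apples instead of five (low alignment score).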

3. The Solution Part 2: The "Super-Grader" (The EduVQA Model)

Now that they had a library of graded videos, they needed a robot that could grade new videos automatically. They built EduVQA.

Think of EduVQA as a super-smart teaching assistant with a special brain structure called S2D-MoE (Structured 2D Mixture-of-Experts). Here is a simple analogy for how it works:

  • The Old Way (Single Grader): Imagine one teacher trying to grade a student's essay. They have to check grammar, spelling, plot, and math all at once. They might get tired and miss small details.
  • The EduVQA Way (The Expert Panel): Imagine a team of specialists working together:
    • Expert A only looks at the spatial stuff (is the drawing clear?).
    • Expert B only looks at the time stuff (is the movement smooth?).
    • Expert C only checks the words (did the AI count correctly?).
    • Expert D checks the whole story (does it make sense?).

The magic of EduVQA is that these experts talk to each other. They share their findings so the final grade isn't just a sum of parts, but a smart, connected judgment. If the "Word Expert" sees a mistake, it tells the "Overall Expert," "Hey, the whole video is wrong because the numbers are wrong!"
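The expert-panel idea above can be sketched as a toy mixture-of-experts in a few lines. This is NOT the paper's S2D-MoE architecture (which is a learned neural module); it is only a hand-rolled illustration, under the assumption that each expert emits a score in [0, 1] and the gate weights low scores more heavily, so one expert's alarm can drag down the final verdict:

```python
import math

# Toy "expert panel": four experts each score one aspect of a video,
# and a gate mixes their opinions instead of simply averaging them.
# This is an illustrative sketch, not the paper's learned S2D-MoE.
def expert_panel(features: dict) -> float:
    scores = [
        features["spatial"],   # Expert A: is each frame clear?
        features["temporal"],  # Expert B: is the motion smooth?
        features["text"],      # Expert C: does it match the prompt?
        features["global"],    # Expert D: does the whole video make sense?
    ]
    # "Experts talk to each other": the gate looks at all scores at once,
    # and a very low score (e.g. a counting error) gets a large weight,
    # pulling the overall grade down.
    gate = [math.exp(-4 * s) for s in scores]
    total = sum(gate)
    weights = [g / total for g in gate]
    return sum(w * s for w, s in zip(weights, scores))

good = expert_panel({"spatial": 0.9, "temporal": 0.9, "text": 0.9, "global": 0.9})
bad  = expert_panel({"spatial": 0.9, "temporal": 0.9, "text": 0.1, "global": 0.9})
# With all experts at 0.9 the verdict stays 0.9; flip just the "text"
# expert to 0.1 and the verdict collapses well below the average.
```

A plain average of the "bad" case would be 0.7, which looks passable; the gated mix drops it far lower, capturing the intuition that a video with the wrong count has failed no matter how pretty it is.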

4. Why This Matters

Before this paper, if you wanted to make an AI video for a school, you'd have to guess if it was good. Now, with EduVQA:

  • Teachers can trust that the AI videos they use will actually teach the right concepts.
  • Developers have a clear target to aim for. They can say, "My AI needs to get a higher score on 'counting accuracy' before I release it."
  • Kids get better learning tools where the math is actually correct, not just pretty.

The Bottom Line

The researchers built a giant math-video test and a smart grading robot to ensure that when AI makes videos for kids, it doesn't just look cool—it actually teaches the lesson correctly. They are turning the "magic" of AI into a reliable tool for education.