Do Large Language Models Understand Data Visualization Principles?

This paper presents the first systematic evaluation of large language and vision-language models as flexible validators of data visualization principles. It finds they can detect and fix design violations, but a persistent gap with symbolic solvers remains, along with a striking asymmetry: the models are better at correcting errors than at reliably detecting them.

Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi

Published 2026-02-24

Imagine you are a chef trying to teach a very smart, but inexperienced, sous-chef how to cook a perfect meal. You have a strict recipe book (the Data Visualization Principles) that says things like, "Never use red for a cold dish," or "Always list ingredients from smallest to largest."

For years, we've had a Robot Chef (the old Symbolic Systems) that checks the recipe. It's incredibly precise because it follows a rigid, mathematical rulebook. But here's the catch: to teach the robot, a human expert has to write out every single rule in a complex computer language. If you want to add a new rule, you have to hire a programmer to rewrite the robot's brain. It's accurate, but it's slow and inflexible.

Now, enter the New Sous-Chef: a Large Language Model (LLM). This is an AI that has read millions of cookbooks and knows what "good food" feels like. It doesn't need a rigid rulebook; it just needs you to tell it, "Hey, make sure this dish follows the rules."

This paper is the ultimate taste test. The researchers wanted to see: Can this new AI Sous-Chef actually understand the rules of good cooking, or is it just guessing?

The Experiment: The "Chart" Kitchen

The researchers set up two kitchens to test the AI:

  1. The Synthetic Kitchen (The Practice Range): They generated 2,000 fake charts (recipes) using a computer. They deliberately messed them up in specific ways (e.g., using the wrong color for a category, or cutting off the bottom of a graph). They knew exactly which rules were broken because a computer (the Robot Chef) had written the recipe.
  2. The Real Kitchen (The Restaurant): They took 300 real charts that humans had actually made and published online. They checked these against the rules too.
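The synthetic side can be pictured as a generator that starts from a clean chart spec and injects exactly one known violation, so the ground-truth label comes for free. Here is a minimal sketch in Python; the spec fields and the two rules are illustrative assumptions, not the paper's actual catalogue:

```python
import copy

# A tiny chart spec, loosely styled after Vega-Lite (illustrative only).
BASE_SPEC = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative",
              "scale": {"zero": True}},
    },
}

# rule id -> mutation that breaks the spec in exactly that way
VIOLATIONS = {
    "bar-zero-baseline":
        lambda s: s["encoding"]["y"]["scale"].update({"zero": False}),
    "ordered-color-for-nominal-field":
        lambda s: s["encoding"].update(
            {"color": {"field": "category", "type": "ordinal"}}),
}

def make_violated_spec(rule_id):
    """Copy the clean spec, inject one violation, return (spec, label)."""
    spec = copy.deepcopy(BASE_SPEC)
    VIOLATIONS[rule_id](spec)
    return spec, rule_id
```

Because the generator knows which mutation it applied, every practice chart comes pre-labeled with the rule it breaks, which is what makes grading the AI on 2,000 charts feasible.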

They asked the AI two main questions:

  • The Inspector: "Look at this recipe. Did I break any rules?"
  • The Fixer: "You broke a rule. Now, rewrite the recipe to fix it without breaking anything else."
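In code, the two roles are simply two different prompts over the same chart spec. A hedged sketch, with `call_llm` standing in for whatever chat-model client you use; the prompt wording is an assumption, not the paper's:

```python
import json

def build_inspector_prompt(spec: dict, rules: list[str]) -> str:
    """'Did I break any rules?' -- ask the model to act as a linter."""
    return (
        "You are a data-visualization linter. List every one of these "
        f"rules that the chart below violates: {', '.join(rules)}\n\n"
        f"Chart spec:\n{json.dumps(spec, indent=2)}"
    )

def build_fixer_prompt(spec: dict, broken_rule: str) -> str:
    """'Rewrite the recipe' -- ask for a corrected spec, nothing more."""
    return (
        f"The chart below violates the rule '{broken_rule}'. Return a "
        "corrected JSON spec that fixes it without changing anything "
        f"else.\n\nChart spec:\n{json.dumps(spec, indent=2)}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder: swap in a real client (OpenAI, Gemini, a local model)."""
    raise NotImplementedError
```

Note the asymmetry baked into the prompts: the Inspector must name what is wrong, while the Fixer only has to produce something right.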

The Results: The Good, The Bad, and The Weird

Here is what they found, translated into everyday terms:

1. The AI is a decent Inspector, but not perfect.
When the AI looked at the fake charts, the best models (like Gemini-2.5-Flash) got about 68% of the rules right. That's like a student getting a B- on a test. They were good at spotting obvious mistakes (like a bar chart that looks like a line), but they struggled with subtle, tricky rules (like "don't use color to show order").

  • The Analogy: The AI can tell you the soup is too salty, but it might miss that the chef forgot to peel the carrots.

2. Showing the AI the Picture didn't help much.
The researchers gave some AI models both the text recipe and a picture of the finished dish. They hoped seeing the "ugly" chart would help the AI spot the error.

  • The Result: It helped a little, but far less than hoped. The models still leaned mostly on the text recipe rather than the visual "vibe" of the chart. It's like handing a taste-tester a photo of the dish: nice to have, but they still judge mostly from the written description.

3. The "Fixer" is surprisingly better than the "Inspector."
This was the most surprising twist! When the AI was asked to detect a mistake, it was okay. But when asked to fix the mistake, it got much better (up to 94% success rate).

  • The Analogy: Imagine a student who struggles to identify why a sentence is grammatically wrong. But if you say, "Rewrite this sentence to be correct," they suddenly write a perfect sentence. The AI is better at doing the right thing than explaining what was wrong.
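One way to make "fix it without breaking anything else" measurable is to re-run a symbolic checker on the model's output: the fix counts only if the target rule now passes and no previously satisfied rule regresses. A toy sketch; the two rule checkers here are invented for illustration:

```python
# Hypothetical symbolic checkers; each returns True when the spec complies.
RULES = {
    "bar-zero-baseline":
        lambda s: s["encoding"]["y"]["scale"].get("zero", False),
    "no-dual-axis":
        lambda s: "y2" not in s["encoding"],
}

def fix_succeeded(original: dict, fixed: dict, target_rule: str) -> bool:
    """A fix counts only if the target rule now passes and every rule
    that passed before still passes (no collateral damage)."""
    if not RULES[target_rule](fixed):
        return False
    return all(check(fixed) for check in RULES.values()
               if check(original))
```

Scoring this way lets the rigid Robot Chef grade the flexible Sous-Chef: the LLM proposes the repair, and the symbolic rulebook verifies it.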

4. Open Source vs. The Big Brands.
The "Big Brand" models (like GPT-4 and Gemini) were generally better than the "Open Source" models (the free, community-built ones). However, the best open-source model was catching up fast, showing that you don't always need the most expensive tool to get a good meal.

The Big Takeaway

The paper concludes that AI is a promising new tool for checking our data charts, but it's not ready to replace the expert human or the rigid robot just yet.

  • The Promise: AI can act as a flexible, conversational editor. You can say, "Make this chart follow the rules," and it will likely do a great job fixing it.
  • The Limit: It still gets confused by the subtle, nuanced rules of human perception. It's like a sous-chef who knows how to chop vegetables perfectly but doesn't quite understand why a certain garnish looks unappetizing.

In short: We are no longer just building robots that follow rules; we are teaching them to understand the spirit of the rules. They are getting there, but they still need a human chef to double-check the final dish.
