Text to Automata Diagrams: Comparing TikZ Code Generation with Direct Image Synthesis

This study evaluates the effectiveness of vision-language and large language models in converting scanned student-drawn automata diagrams into TikZ code, finding that while direct image-to-text generation often yields errors, human-corrected descriptions significantly improve the accuracy of the resulting digital diagrams for educational applications like automated grading.

Ethan Young, Zichun Wang, Aiden Taylor, Chance Jewell, Julian Myers, Satya Sri Rajiteswari Nimmagadda, Anthony White, Aniruddha Maiti, Ananya Jana

Published 2026-03-10

Imagine you are a teacher grading a stack of homework. The students have drawn complex maps of "robot brains" (called automata diagrams) on paper. These drawings are messy, scribbled in pencil, and look different for every student. You want to turn these messy sketches into clean, perfect digital diagrams so you can grade them automatically or show them on a screen.

This paper is about a team of researchers trying to build a robot translator to do this job. They wanted to see if AI could look at a messy student drawing, describe it in words, and then turn those words back into a perfect digital drawing.

Here is how they did it, explained with some everyday analogies:

1. The Goal: The "Rosetta Stone" for Robot Brains

The researchers wanted to create a pipeline (a step-by-step process) that works like this:

  1. Input: A messy, hand-drawn sketch of a robot brain.
  2. Step A: An AI "Eye" looks at the sketch and writes a description in plain English.
  3. Step B: A human teacher reads that description and fixes any mistakes (like a proofreader).
  4. Step C: A second AI "Hand" takes the description and writes code (called TikZ) that draws a perfect, clean version of the diagram.
  5. Output: A digital diagram that looks just like what the student intended to draw, but without the messy pencil marks.
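The five steps above can be sketched as a tiny program. Everything here is a hypothetical placeholder: the function names, the stubbed descriptions, and the generated TikZ are illustrations of the pipeline's shape, not the authors' actual implementation (a real system would call a vision-language model in `describe_image` and a large language model in `generate_tikz`).

```python
# Sketch of the paper's pipeline. All names and behaviors are
# hypothetical stand-ins, not the authors' real code.

def describe_image(image, exam_question=None):
    """Step A, the AI "Eye": describe the sketch in plain English.
    Stubbed with a canned description; a real system would call a
    vision-language model here."""
    desc = "two states; arrow from q0 to q1 labeled 'a'; q0 is the start state"
    if exam_question:
        desc += f" (described with exam context: {exam_question!r})"
    return desc

def human_correct(description):
    """Step B, the human proofreader, stubbed as a trivial fix."""
    return description.replace("arrow", "directed arrow")

def generate_tikz(description):
    """Step C, the AI "Hand": turn the description into TikZ code
    (stubbed with a fixed two-state diagram)."""
    return ("% generated from: " + description + "\n"
            "\\begin{tikzpicture}[->, auto]\n"
            "  \\node[state, initial] (q0) {$q_0$};\n"
            "  \\node[state, right=of q0] (q1) {$q_1$};\n"
            "  \\path (q0) edge node {a} (q1);\n"
            "\\end{tikzpicture}")

def pipeline(image, exam_question=None, human_in_loop=True):
    """Run the full sketch-to-TikZ pipeline.
    human_in_loop=True is Path B below; False is the raw Path A."""
    desc = describe_image(image, exam_question)
    if human_in_loop:
        desc = human_correct(desc)
    return generate_tikz(desc)

tikz = pipeline("scan_042.png", exam_question="accept an even number of 1s")
```

The only structural difference between the two experimental paths in the next section is whether `human_correct` runs.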

2. The Experiment: The "Blind Taste Test"

The researchers tested two different paths to see which one worked better:

  • Path A (The Raw AI): The AI looks at the drawing, writes a description, and immediately passes it to the "Hand" AI to draw the code.
  • Path B (The Human Editor): The AI looks at the drawing, writes a description, a human reads it and fixes the errors (like "Wait, that arrow is pointing the wrong way!"), and then passes the corrected description to the "Hand" AI.

They also tested two different ways of asking the AI to describe the drawing:

  • The "Blind" Prompt: Just showing the picture.
  • The "Context" Prompt: Showing the picture plus the original exam question (e.g., "Draw a machine that counts even numbers").
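To make the two prompt styles concrete, here is one hypothetical way they could differ; the paper's exact prompt wording is not reproduced here, so treat the strings below purely as an illustration of "blind" versus "with context."

```python
# Hypothetical illustration of the two prompt styles; the authors'
# actual prompt text is not shown in this summary.

def build_prompt(with_context, exam_question=None):
    base = ("Describe this hand-drawn automaton: list every state, "
            "mark the start and accepting states, and give each "
            "transition as 'from --label--> to'.")
    if with_context and exam_question:
        # "Context" prompt: prepend the original exam question
        return f"Exam question: {exam_question}\n{base}"
    return base  # "Blind" prompt: the image alone

blind = build_prompt(with_context=False)
context = build_prompt(with_context=True,
                       exam_question="Draw a machine that counts even numbers")
```

The extra line of context gives the model a hypothesis about what the messy strokes are supposed to mean, which is why (as the results below show) it reduces description errors.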

3. The Big Problems: Where the AI Got Lost

The researchers found that the AI "Eye" is good at seeing shapes, but it often misses the logic.

  • The Analogy: Imagine the AI is a tourist taking a photo of a city map. It sees the lines and the dots, but it doesn't understand that the red line is a highway and the blue line is a river. It might say, "There is a line here," when it should say, "There is a highway connecting these two cities."
  • The Result: When the AI wrote the description without human help, it often missed arrows, got the directions wrong, or forgot which "city" was the starting point.

4. The Two Ways to Draw: "Painting" vs. "Blueprints"

The researchers tried two methods to turn the text back into a picture:

  1. Direct Image Synthesis: Asking the AI to just "paint" a picture based on the text. This is like asking an artist to draw a map from a story. It's fast, but the artist might get the details wrong.
  2. TikZ Code Generation: Asking the AI to write code (a set of strict instructions) that a computer uses to draw the map. This is like giving a robot a blueprint. If the blueprint is right, the robot builds it perfectly.

The Surprise: The "Blueprint" method (TikZ code) worked much better than the "Painting" method. Even if the text description had small errors, the code-based approach was more consistent and accurate.
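To show what the "blueprint" output looks like, here is a minimal TikZ diagram of a simple automaton, using the standard `automata` and `positioning` TikZ libraries. This is a generic example of the target format, not a diagram from the paper itself.

```latex
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{automata, positioning}
\begin{document}
% DFA accepting binary strings with an even number of 1s
\begin{tikzpicture}[shorten >=1pt, node distance=2.5cm, auto]
  \node[state, initial, accepting] (q0) {$q_0$};
  \node[state, right=of q0]        (q1) {$q_1$};
  \path[->]
    (q0) edge[bend left]  node {1} (q1)   % reading a 1 flips parity
    (q1) edge[bend left]  node {1} (q0)
    (q0) edge[loop above] node {0} (q0)   % reading a 0 keeps parity
    (q1) edge[loop above] node {0} (q1);
\end{tikzpicture}
\end{document}
```

Because every state, arrow, and label is an explicit instruction, small wording slips in the text description tend to produce locally wrong but still well-formed diagrams, whereas a "painted" image can drift everywhere at once.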

5. The Verdict: Humans Are Still the Boss

Here is what they discovered:

  • AI alone is messy: When the AI tried to describe the drawing on its own, it made too many mistakes to be useful for grading.
  • Human editing is magic: When a human took 5 minutes to fix the AI's description, the final result was almost perfect.
  • Context helps: If you tell the AI what the student was supposed to draw (the exam question), it makes fewer mistakes.
  • Code is king: Turning text into code (TikZ) is a more reliable way to recreate diagrams than asking the AI to just "draw" the image directly.

Why Does This Matter?

Think of this as building a super-automated teaching assistant.

  • For Teachers: Instead of squinting at 50 messy pencil drawings at 2 AM, the system could turn them into clean digital maps, highlight where the student made a mistake (like a missing arrow), and give a grade.
  • For Students: It could give instant feedback: "Hey, you drew the start circle here, but the rules say it should be there."

In short: The AI is a great assistant, but it's not ready to work alone yet. It needs a human to double-check its notes, and it works best when it's building a blueprint (code) rather than trying to paint a picture.