Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs

This paper introduces Feynman, a scalable knowledge-infused agent that generates high-quality, well-aligned diagram-caption pairs using an iterative code-planning and optimization-based rendering pipeline, resulting in a 100k+ dataset and a new benchmark called Diagramma for evaluating visual reasoning in vision-language models.

Zixin Wen, Yifu Cai, Kyle Lee, Sam Estep, Josh Sunshine, Aarti Singh, Yuejie Chi, Wode Ni

Published 2026-03-16

Imagine you want to teach a robot how to draw a perfect map of a city, but the robot is terrible at two things: knowing the facts (like where the library is) and drawing the lines (making the streets look neat).

Most current AI models are like a brilliant art student who has never studied geography, or a geography professor who can't hold a pencil. They either draw beautiful but factually wrong maps, or they write perfect descriptions of cities that look like messy scribbles when drawn.

This paper introduces FEYNMAN, a new AI agent that solves this problem by acting like a super-efficient construction crew rather than a single artist. Here is how it works, broken down into simple steps:

1. The Problem: The "All-in-One" Trap

Usually, when you ask an AI to "draw a diagram of a chemical reaction," it tries to do everything at once: think of the chemistry, decide what the atoms look like, and draw the picture.

  • The Result: It often gets the chemistry wrong, or the picture looks messy and unreadable. It's like asking a chef to design the menu, cook the meal, and plate it all in one second. Something usually goes wrong.

2. The Solution: The FEYNMAN Assembly Line

FEYNMAN changes the game by breaking the job into four distinct stages, passing the work down the line from one specialist to the next.

Step 1: The "Idea Generator" (The Architect)

First, FEYNMAN asks a capable AI model, playing the role of a human expert, to just list the facts.

  • Analogy: Imagine an architect sitting down and saying, "Okay, for this house, we need a kitchen, a living room, and a roof. The kitchen needs a stove."
  • FEYNMAN doesn't draw anything yet. It just writes down the knowledge (the "ideas") needed for the diagram. The goal is to get the science or math right before a single line is drawn.

Step 2: The "Translator" (The Blueprint Maker)

Next, FEYNMAN takes those facts and translates them into a special language called "Substance."

  • Analogy: The architect hands the list to a translator who speaks "Blueprint." The translator writes: "Place the stove here. Put the table there."
  • Crucially, this language doesn't say exactly where to put the pixels. It just says what needs to be there and how they relate to each other.
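
To make the "what, not where" idea concrete, here is a minimal Python sketch of a declarative description. The names Obj, Relation, and to_program are hypothetical and used only for illustration, and the printed lines are loosely modeled on Substance-style statements rather than taken from the paper. Notice that no coordinates appear anywhere.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Obj:
    """A thing that must appear in the diagram (hypothetical structure)."""
    kind: str  # e.g. "Set"
    name: str  # e.g. "A"


@dataclass(frozen=True)
class Relation:
    """A relationship between named objects (hypothetical structure)."""
    predicate: str         # e.g. "IsSubset"
    args: Tuple[str, ...]  # names of the objects involved


def to_program(objects, relations):
    """Write objects and relations as declarative lines, with no positions."""
    lines = [f"{o.kind} {o.name}" for o in objects]
    lines += [f"{r.predicate}({', '.join(r.args)})" for r in relations]
    return "\n".join(lines)


objects = [Obj("Set", "A"), Obj("Set", "B")]
relations = [Relation("IsSubset", ("B", "A"))]
program = to_program(objects, relations)
print(program)
```

The description only says that B is a subset of A. Where each shape ends up on the page is left to the next step.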

Step 3: The "Artistic Builder" (The Renderer)

This is where the magic happens. FEYNMAN uses a tool called PENROSE.

  • Analogy: Imagine a robot builder who has a blueprint. Every time you ask it to build the house, it follows the rules (kitchen next to living room) but randomly decides the exact shade of the walls, the angle of the roof, or the size of the windows.
  • Because the rules are strict but the details are random, PENROSE can generate many different versions of the same diagram. They all mean the same thing, but they all look different. This gives us "visual diversity" without losing the "knowledge accuracy."
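
The "strict rules, random details" idea can be sketched in a few lines of Python. This is a simplified stand-in, not how PENROSE works internally: PENROSE uses optimization-based rendering, while this sketch simply redraws random circles until the rule "B sits inside A" holds. The function names and the numeric ranges are assumptions chosen for illustration.

```python
import math
import random


def contains(outer, inner):
    """True if circle `inner` lies entirely inside circle `outer`.

    Each circle is a tuple (center_x, center_y, radius).
    """
    (ox, oy, orad), (ix, iy, irad) = outer, inner
    return math.hypot(ox - ix, oy - iy) + irad <= orad


def sample_layout(seed, max_tries=10_000):
    """Draw random circles until the rule 'B inside A' is satisfied."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        a = (rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(20, 40))
        b = (rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(5, 15))
        if contains(a, b):
            return {"A": a, "B": b}
    raise RuntimeError("no valid layout found")


# Same rule, different seeds: every layout obeys the rule but looks different.
layouts = [sample_layout(seed) for seed in range(5)]
for layout in layouts:
    print(layout)
```

Each seed gives a different picture, yet every one of them shows B inside A, which is the whole point of separating the rule from the drawing.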

Step 4: The "Quality Control" (The Panel of Judges)

Before FEYNMAN saves the diagram, it shows it to a panel of other AIs (the "judges").

  • Analogy: Think of a reality TV show where the judges critique the house. "Hey, that stove is too close to the window," or "The text is too small to read."
  • If the judges say it's bad, FEYNMAN goes back, fixes the code, and tries again. It keeps doing this until the judges are satisfied.
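
The critique-and-retry loop can be sketched as follows. In FEYNMAN the judges are AI models and the fix is an edit to the diagram code; here both are replaced by simple hypothetical stubs (text_size_judge, spacing_judge, revise) so the loop itself is easy to follow. The thresholds and the max_rounds cap are assumptions, not details from the paper.

```python
def text_size_judge(diagram):
    """Return a complaint string, or None if this judge is satisfied."""
    if diagram["font_size"] < 10:
        return "text is too small to read"
    return None


def spacing_judge(diagram):
    """Return a complaint string, or None if this judge is satisfied."""
    if diagram["gap"] < 5:
        return "shapes are too close together"
    return None


def revise(diagram, complaints):
    """Stub 'fix' step: nudge whichever setting was criticized."""
    fixed = dict(diagram)
    for complaint in complaints:
        if "text" in complaint:
            fixed["font_size"] += 2
        if "close" in complaint:
            fixed["gap"] += 2
    return fixed


def refine(diagram, judges, max_rounds=10):
    """Repeat judge -> revise until no judge complains or rounds run out."""
    for round_no in range(max_rounds):
        verdicts = (judge(diagram) for judge in judges)
        complaints = [c for c in verdicts if c is not None]
        if not complaints:
            return diagram, round_no
        diagram = revise(diagram, complaints)
    return diagram, max_rounds


final, rounds = refine({"font_size": 6, "gap": 1},
                       [text_size_judge, spacing_judge])
print(final, rounds)
```

Capping the number of rounds is a practical design choice in this sketch: it keeps a diagram that can never satisfy the judges from looping forever.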

3. The Big Win: A Massive Library of Diagrams

Because this process is automated and efficient, FEYNMAN didn't just make a few diagrams. It built a giant library of over 100,000 high-quality diagrams covering math, science, and computer science.

  • The library cost less than $400 to make (using a cheap AI model for the heavy lifting).
  • The authors also built a new test called DIAGRAMMA to see if other AI models can actually understand these diagrams. The results? Even the smartest AIs today struggle with them, suggesting that diagrams are still a hard challenge for machines.
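
As a quick back-of-the-envelope check on those numbers, the Python snippet below works out an upper bound on the cost per diagram. It uses only the two figures quoted above (less than $400 in total, more than 100,000 diagrams), so the true per-diagram cost is below whatever it prints.

```python
total_cost_usd = 400    # upper bound from the text ("less than $400")
num_diagrams = 100_000  # lower bound from the text ("over 100,000")

# Largest possible cost per diagram given those two bounds.
max_cost_per_diagram = total_cost_usd / num_diagrams
print(f"under ${max_cost_per_diagram:.3f} per diagram")
```

That is less than half a cent per diagram.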

Why This Matters

Think of FEYNMAN as the industrial revolution for educational diagrams.

  • Before: A teacher had to spend hours drawing a diagram by hand, or an AI would draw a messy, confusing one.
  • Now: FEYNMAN can quickly and cheaply generate thousands of clear, accurate, and diverse diagrams for textbooks, websites, and AI training.

It separates the brain (the knowledge) from the hand (the drawing), allowing both to do what they do best. This means we can finally teach AI to "see" and "understand" complex visual concepts, just like a human does.
