CircuitSense: A Hierarchical MLLM Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process

The paper introduces CircuitSense, a hierarchical benchmark of over 8,000 circuit problems that evaluates Multi-modal Large Language Models across perception, analysis, and design tasks. It reveals a critical performance gap: models excel at visual recognition but struggle to derive the symbolic equations and perform the mathematical reasoning essential for engineering design.

Arman Akbari, Jian Gao, Yifei Zou, Mei Yang, Jinru Duan, Dmitrii Torbunov, Yanzhi Wang, Yihui Ren, Xuan Zhang

Published 2026-03-03

Imagine you are trying to teach a super-smart robot how to be an electrical engineer. You show it pictures of circuit diagrams (the blueprints of electronic devices) and ask it to do three things:

  1. See: "What parts are in this picture?" (Is that a resistor or a capacitor?)
  2. Think: "If I turn this on, what will happen mathematically?" (Can you write the formula that describes how the electricity flows?)
  3. Build: "Design a new circuit that meets these specific rules."

This paper, CircuitSense, is a giant test designed to see if today's most advanced AI robots (called Multi-modal Large Language Models, or MLLMs) can actually do these things, or if they are just very good at guessing.

The "CircuitSense" Exam

The researchers built a massive exam with over 8,000 questions. They didn't just use old textbook problems; they invented a "synthetic generator" (like a video game engine) to create brand-new, never-before-seen circuit puzzles. This ensures the AI can't just cheat by memorizing answers from the internet.

The exam is organized like a ladder of difficulty:

  • Level 1 (The Bottom): Simple resistor networks (like a basic ladder).
  • Levels 2-4 (The Middle): Increasingly complex circuits that add transistors (the tiny electronic switches inside every chip).
  • Level 5 (The Top): System-level blueprints, like looking at a whole radio or a computer chip as a set of black boxes.
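
To give a flavor of what the bottom rung looks like, here is a minimal sketch (with made-up resistor values) of reducing a small resistor ladder by the usual series/parallel rules; this is the kind of Level-1 reasoning the benchmark starts with:

```python
# Level-1 flavor: collapsing a hypothetical resistor ladder step by step
# using the two basic combination rules. All values are illustrative.

def series(*rs):
    """Resistors in series simply add."""
    return sum(rs)

def parallel(*rs):
    """Resistors in parallel combine by reciprocal sums."""
    return 1 / sum(1 / r for r in rs)

# A two-rung ladder: 100 ohms in series with
# (200 ohms in parallel with a 50 + 150 ohm branch).
r_total = series(100, parallel(200, series(50, 150)))
print(r_total)  # 100 + (200 * 200) / 400 = 200.0
```

The point is that even the "easy" level already requires applying rules in the right order, not just naming parts.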

The test covers three main skills:

  • Perception: Spotting the parts (easy for AI).
  • Analysis: Deriving the math equations from the picture (the hard part).
  • Design: Creating a working circuit from scratch (the hardest part).
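
To make the Perception/Analysis split concrete, here is a toy sketch of the Analysis skill: turning a hypothetical two-resistor voltage divider into its symbolic output equation with SymPy. The circuit and variable names are illustrative, not taken from the benchmark:

```python
# Toy illustration of the "Analysis" tier: given a circuit's topology
# (here, a hypothetical two-resistor voltage divider), derive the
# symbolic output equation rather than merely naming the components.
import sympy as sp

Vin, Vout, R1, R2 = sp.symbols('V_in V_out R1 R2', positive=True)

# Kirchhoff's current law at the output node: the current flowing in
# through R1 equals the current flowing out through R2 to ground.
kcl = sp.Eq((Vin - Vout) / R1, Vout / R2)

solution = sp.solve(kcl, Vout)[0]
print(solution)  # V_out = V_in * R2 / (R1 + R2)
```

This is exactly the kind of picture-to-formula step the paper reports models failing at: the equation is trivial for a student, but it cannot be produced by pattern-matching alone.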

The Big Surprise: The "Eye" vs. The "Brain"

The results were shocking, like finding out a student who aced the reading comprehension test failed the math test.

  • The Eyes are Great: When asked to simply identify parts (e.g., "Point to the capacitor"), the top AI models got it right 85% to 100% of the time. They can "see" the circuit perfectly.
  • The Brain is Broken: When asked to derive the math equation (e.g., "Write the formula for how this circuit amplifies sound"), the same models crashed. Their accuracy dropped to below 19%.

The Analogy:
Imagine showing a human a picture of a car engine and asking, "What is this?" They say, "It's a V8 engine." (Perfect score!).
Then you ask, "If I turn the key, how much torque will the wheels produce at 3,000 RPM?"
The model tries to answer but just guesses random numbers or makes up formulas that look like math but don't work. It's like a person who can recognize a piano but can't play a single note.

Why Does This Matter?

The paper argues that for AI to be a true "engineer's assistant," it needs to do more than just recognize patterns. It needs Symbolic Reasoning.

  • Pattern Matching: "I've seen this shape before; it's usually a resistor." (AI is good at this).
  • True Understanding: "Because this is a resistor connected to a capacitor in this specific way, the voltage will drop by 50% at this frequency." (AI is bad at this).
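
The resistor-capacitor claim in that second bullet can actually be checked with a few lines of numeric code. The sketch below (with assumed, made-up component values) computes the gain of a simple RC low-pass filter and finds the frequency where the output really does drop to half the input:

```python
# Numeric sketch of "true understanding": for an RC low-pass filter,
# the output/input voltage ratio at frequency f is |1 / (1 + j*w*R*C)|
# with w = 2*pi*f. Component values below are assumed for illustration.
import cmath

R = 1_000.0  # resistance in ohms (assumed)
C = 1e-6     # capacitance in farads (assumed)

def gain(freq_hz: float) -> float:
    """Magnitude of the RC low-pass transfer function at freq_hz."""
    w = 2 * cmath.pi * freq_hz
    return abs(1 / (1 + 1j * w * R * C))

# Solve |H| = 1/2 analytically: w*R*C = sqrt(3),
# so f = sqrt(3) / (2 * pi * R * C).
f_half = 3 ** 0.5 / (2 * cmath.pi * R * C)
print(round(gain(f_half), 3))  # close to 0.5
```

A model that truly understands the circuit can derive `f_half` symbolically before plugging in numbers; a pattern-matcher can only recognize that the picture "looks like a filter."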

The researchers found that the AI models that were slightly better at doing the math (deriving equations) were also the only ones that could successfully design new circuits. This proves that math is the bridge between seeing a picture and building a machine. Without the math, the AI is just a sophisticated photo album, not an engineer.

The Takeaway

CircuitSense is a wake-up call. It tells us that while AI is amazing at looking at pictures and chatting, it is still terrible at the core of engineering: translating a visual blueprint into a working mathematical model.

Until AI can reliably do the math behind the picture, it cannot be trusted to design the critical systems that run our world (like the power grid or medical devices). The paper suggests that future AI research needs to focus less on "seeing better" and more on "thinking mathematically."