LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments

This paper presents an autonomous scanning probe microscopy framework that integrates symbolic regression with large language models to generate and evaluate new physical hypotheses from sparse experimental data, successfully discovering interpretable voltage-time growth laws for ferroelectric domain switching without pre-specified models.

Original authors: Boris Slautin, Utkarsh Pratiush, Yu Liu, Kamyar Barakati, Sergei Kalinin

Published 2026-05-11

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine a scientist working in a lab with a super-powerful microscope. In the past, this scientist would have to decide exactly what to measure, run the test, look at the results, and then decide what to do next. This is slow and relies heavily on the scientist's own intuition.

In recent years, scientists have built "self-driving" labs. These are like autonomous cars for science: the computer controls the microscope, runs experiments, and tweaks the settings to find the best results as fast as possible. However, there's a catch: these self-driving labs are usually very good at optimizing (finding the best setting) but terrible at discovering new laws. They can tell you "this voltage makes the biggest dot," but they can't tell you why or write down a new rule of physics that explains it. They are stuck inside a box of ideas the human programmer gave them.

This paper introduces a new system that breaks out of that box. It teaches the computer not just to find the best answer, but to invent new theories based on what it sees.

Here is how the system works, using a simple analogy:

The Two-Brain System

Think of this new system as a team of two very different robots working together on a puzzle.

1. The "Pattern Finder" (Symbolic Regression)
Imagine a robot that is incredibly good at math but has no common sense. You give it a few scattered data points (like a few dots on a graph), and it starts churning out dozens of different math formulas that could connect those dots.

  • What it does: It generates wild guesses like "The size of the dot equals the voltage times the square root of time" or "The size equals the voltage plus a random number."
  • The Problem: Because it has no common sense, it might suggest formulas that are mathematically perfect but physically impossible (like saying a dot gets smaller when you turn up the power). It's like a student who memorized a math textbook but doesn't understand how the real world works.
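To make the "Pattern Finder" idea concrete, here is a minimal sketch in Python. It is not the paper's code: real symbolic regression searches over a huge space of expressions, while this toy version only fits a small, hand-written list of candidate formulas (the hypothetical `CANDIDATES` dictionary) and ranks them by how well they match the data. The data points are made up for illustration.

```python
import numpy as np

# Hypothetical candidate library (an assumption, not from the paper).
# Each entry maps (V, t) to one feature; the rule is size = a * feature + b.
CANDIDATES = {
    "a*V + b": lambda V, t: V,
    "a*t + b": lambda V, t: t,
    "a*V*sqrt(t) + b": lambda V, t: V * np.sqrt(t),
    "a*V*log(t) + b": lambda V, t: V * np.log(t),
}

def fit_candidates(V, t, size):
    """Least-squares fit of every candidate; returns (name, coeffs, mse) sorted by mse."""
    results = []
    for name, feature in CANDIDATES.items():
        X = np.column_stack([feature(V, t), np.ones_like(V)])
        coeffs, *_ = np.linalg.lstsq(X, size, rcond=None)
        mse = float(np.mean((X @ coeffs - size) ** 2))
        results.append((name, coeffs, mse))
    return sorted(results, key=lambda r: r[2])

# Synthetic data, made up for illustration (five "zap" settings).
V = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # voltage
t = np.array([0.5, 2.0, 0.1, 1.0, 5.0])    # pulse time
size = 3.0 * V * np.sqrt(t) + 1.0          # generated from the third candidate

ranked = fit_candidates(V, t, size)
for name, _, mse in ranked:
    print(f"{name:18s} mse = {mse:.3g}")
```

Notice that the ranking uses fit error alone. Nothing here stops a physically silly formula from scoring well, which is exactly the gap the second robot is meant to fill.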

2. The "Physics Professor" (The Large Language Model)
Now, imagine a second robot that is a super-smart physics professor. This robot has read an enormous amount of physics. It doesn't do the math itself; instead, it acts as a judge.

  • What it does: It looks at the wild formulas generated by the "Pattern Finder" and says, "Wait a minute. That formula says the dot grows backward in time? That's impossible. Throw it out."
  • The Magic: It ranks the formulas based on whether they make sense in the real world. It picks the ones that follow the rules of physics (like "dots should get bigger with more voltage") and explains why they are good.
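The kind of judgment the "Physics Professor" makes can be roughly imitated with a couple of hard-coded sanity checks. The sketch below is only a stand-in under that assumption: in the paper the judging is done by a large language model, not by rules like these, and the function name `plausibility` and the voltage and time ranges are hypothetical.

```python
import numpy as np

def plausibility(rule, v_range=(1.0, 10.0), t_range=(0.01, 10.0), n=25):
    """Fraction of two sanity checks passed: size grows with V, size grows with t."""
    V, T = np.meshgrid(np.linspace(*v_range, n),
                       np.linspace(*t_range, n), indexing="ij")
    size = rule(V, T)
    # Strict growth along the voltage axis (axis 0) and the time axis (axis 1).
    grows_with_v = bool(np.all(np.diff(size, axis=0) > 0))
    grows_with_t = bool(np.all(np.diff(size, axis=1) > 0))
    return (int(grows_with_v) + int(grows_with_t)) / 2

# Three illustrative rules (made up, not results from the paper).
rules = {
    "V*sqrt(t)": lambda V, t: V * np.sqrt(t),   # grows with both knobs
    "t only":    lambda V, t: t + 0.0 * V,      # ignores voltage
    "10 - V*t":  lambda V, t: 10.0 - V * t,     # shrinks as you zap harder
}
scores = {name: plausibility(rule) for name, rule in rules.items()}
print(scores)
```

A language model can go further than this, for example by explaining why a formula is suspicious, but the role is the same: filter and rank candidates by whether they make physical sense.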

The Experiment: Growing Tiny Electric Bubbles

To test this, the researchers used a special microscope to poke a tiny piece of material called PZT (lead zirconate titanate, a ferroelectric ceramic whose built-in electric polarization can be flipped). When they zap it with electricity, a tiny "bubble" of switched polarization grows.

  • The Goal: They wanted to find the rule that explains how big that bubble gets based on how long they zap it and how hard they zap it.
  • The Process:
    1. Start: They started with just five randomly chosen measurements (five different zap settings).
    2. The Loop:
      • The "Pattern Finder" looked at the five results and wrote down 50 possible math rules.
      • The "Physics Professor" read them, gave them scores, and picked the best one.
      • The computer then used that best rule to decide where to zap next to learn more.
      • They did this 10 times, adding more data each round.
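The loop above can be sketched as a short simulation. Everything specific here is an assumption made for illustration: the `measure` function is a made-up stand-in for the microscope, the `RULES` list is a tiny hand-written candidate set, fit error stands in for the "Physics Professor" score, and the choice of the next zap (measure where the two best rules disagree most) is one simple heuristic, not necessarily the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(V, t):
    """Hypothetical microscope stand-in: a made-up law plus a little noise."""
    return 3.0 * V * np.log1p(t) + rng.normal(0.0, 0.05)

# Hypothetical candidate rules of the form size = a * feature(V, t) + b.
RULES = {
    "a*V + b": lambda V, t: V,
    "a*t + b": lambda V, t: t,
    "a*V*sqrt(t) + b": lambda V, t: V * np.sqrt(t),
    "a*V*log(1+t) + b": lambda V, t: V * np.log1p(t),
}

def fit(feature, V, t, y):
    """Least-squares fit; returns (coefficients, mean squared error)."""
    X = np.column_stack([feature(V, t), np.ones_like(V)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, float(np.mean((X @ coef - y) ** 2))

def predict(feature, coef, V, t):
    return coef[0] * feature(V, t) + coef[1]

# All zap settings the simulated microscope is allowed to try.
grid_V, grid_t = [a.ravel() for a in
                  np.meshgrid(np.linspace(1, 10, 10), np.logspace(-2, 1, 10))]

# Start: five random settings.
idx = rng.choice(grid_V.size, size=5, replace=False)
V, t = grid_V[idx], grid_t[idx]
y = np.array([measure(v, s) for v, s in zip(V, t)])

# The loop: fit, rank, pick the next zap, measure, repeat 10 times.
for step in range(10):
    fits = sorted(((name, *fit(f, V, t, y)) for name, f in RULES.items()),
                  key=lambda r: r[2])
    (best, c1, _), (second, c2, _) = fits[0], fits[1]
    # Assumed heuristic: zap where the two best rules disagree the most.
    gap = np.abs(predict(RULES[best], c1, grid_V, grid_t)
                 - predict(RULES[second], c2, grid_V, grid_t))
    k = int(np.argmax(gap))
    V, t = np.append(V, grid_V[k]), np.append(t, grid_t[k])
    y = np.append(y, measure(grid_V[k], grid_t[k]))
    print(f"round {step + 1}: best rule so far = {best}")
```

Measuring where the leading rules disagree is a cheap way to make each new data point as informative as possible, which matters when every measurement is slow.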

The Result: From Guessing to Understanding

At the very beginning, the "Pattern Finder" was confused. It suggested silly rules, like "The bubble size depends only on time, not voltage." The "Physics Professor" gave these low scores and said, "No, that doesn't make sense."

As the experiment continued and the computer gathered more data, the "Pattern Finder" started suggesting smarter rules. Finally, the "Physics Professor" picked a winner: a rule that said the bubble grows based on both the voltage and the time, specifically following a pattern where growth slows down over time (like a "creep" motion).

Why is this a big deal?
In previous experiments, scientists had to tell the computer, "Here are three possible rules; pick the best one." The computer just chose from the list.
In this new experiment, the computer created the rule itself from the data, and the "Physics Professor" judged it to be physically sensible. The system didn't just find the best setting; it came up with its own interpretable description of how the material behaves.

The Bottom Line

This paper shows a way to turn autonomous science from a "search engine" (which just finds the best answer in a list) into a "scientist" (which can write new laws of physics). By combining a math-bot that generates ideas with an AI-bot that checks if those ideas make sense, the system can learn complex physical rules on its own, starting from almost nothing.