Enhancing Authorship Attribution with Synthetic Paintings

This study demonstrates that augmenting limited real artwork datasets with synthetic images generated via DreamBooth fine-tuning of Stable Diffusion significantly improves the accuracy and generalization of computational authorship attribution models.

Clarissa Loures, Caio Hosken, Luan Oliveira, Gianlucca Zuin, Adriano Veloso

Published 2026-03-05

Imagine you are a detective trying to solve a mystery: Who painted this picture?

Usually, art experts solve this by studying the artist's life, the chemicals in the paint, and the specific way they hold a brush. But today, we want to teach computers to do this. The problem? Computers are like hungry students; they need to see thousands of examples to learn a style. But for many famous painters, we only have a handful of paintings left in the world. It's like trying to teach someone to recognize a specific singer's voice by only letting them hear three songs.

This paper is about a clever trick to solve this "not enough data" problem. Here is the story of how they did it, explained simply.

1. The Problem: The "Empty Classroom"

The researchers wanted to teach a computer to distinguish between seven British painters from the 1700s and 1800s. These painters were like neighbors who lived in the same town, wore similar clothes, and painted similar landscapes. They were very hard to tell apart.

To make it worse, the computer only had 7 to 25 photos of each artist to study. That's like trying to learn a new language by reading just a few pages of a dictionary. The computer kept getting confused.

2. The Solution: The "AI Photocopier"

Instead of waiting for more real paintings to surface (which isn't going to happen), the researchers decided to invent new ones.

They used a powerful AI tool called Stable Diffusion, fine-tuned with a technique called DreamBooth. Think of this AI as a super-talented art student who has studied the real paintings.

  • The Training: They showed the AI just a few real paintings of "Artist A."
  • The Prompt: They told the AI, "Draw a painting in the style of Artist A."
  • The Result: The AI didn't copy the real paintings exactly. Instead, it learned the vibe, the brushstrokes, and the colors, and then created brand new, fake paintings that looked like they were painted by that artist.
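The prompting step above can be sketched in a few lines. This is a minimal, hypothetical sketch: DreamBooth binds the fine-tuned style to a rare identifier token, and the token shown here ("sks"), the artist names, and the function name are my own illustrative assumptions, not details from the paper.

```python
# Sketch of building DreamBooth-style generation prompts, one per
# synthetic image we want. The "sks" token is the rare identifier that
# DreamBooth fine-tuning commonly binds the learned style to (an
# assumption here, not the paper's exact prompt).
def build_style_prompts(artists, images_per_artist=100, token="sks"):
    """Return a dict mapping each artist to a list of prompts."""
    prompts = {}
    for artist in artists:
        # The prompt references the bound token plus the artist label,
        # asking for a new painting in the learned style.
        prompt = f"a painting in the style of {token} {artist}"
        prompts[artist] = [prompt] * images_per_artist
    return prompts

artist_prompts = build_style_prompts(["Artist A", "Artist B"], images_per_artist=3)
print(artist_prompts["Artist A"][0])
# → a painting in the style of sks Artist A
```

In practice each prompt would be fed to the fine-tuned Stable Diffusion model (e.g. via the Hugging Face diffusers library) to produce one image; the sketch only shows the bookkeeping around it.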

They made 100 of these "fake" paintings for each artist. Now, instead of having 10 real examples, the computer had 10 real ones plus 100 fake ones.
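The resulting hybrid training set can be pictured as a simple labelled list: each artist's few real images plus their ~100 synthetic ones, tagged by origin. The file paths below are hypothetical placeholders.

```python
# Minimal sketch of assembling the hybrid dataset the post describes:
# every sample carries its image path, its artist label, and whether it
# is a real painting or an AI-generated one.
def build_hybrid_dataset(real, synthetic):
    """real / synthetic: dicts mapping artist -> list of image paths."""
    dataset = []
    for artist, paths in real.items():
        dataset += [(p, artist, "real") for p in paths]
        dataset += [(p, artist, "synthetic") for p in synthetic.get(artist, [])]
    return dataset

real = {"Artist A": [f"real_a_{i}.jpg" for i in range(10)]}
synthetic = {"Artist A": [f"fake_a_{i}.png" for i in range(100)]}
data = build_hybrid_dataset(real, synthetic)
print(len(data))  # 10 real + 100 synthetic = 110
```

Keeping the origin tag on each sample is what makes the four experimental "recipes" below easy to run: each recipe is just a different filter over this one list.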

3. The Experiment: Mixing the Ingredients

The team ran four different tests to see which "recipe" worked best:

  • Recipe A (Real Only): The computer studied only the few real paintings. (The "Starving Student" approach).
  • Recipe B (Fake Only): The computer studied only the AI-generated paintings. (The "Imagination Only" approach).
  • Recipe C (Fake to Real): The computer studied the fake paintings but was tested on the real ones. (The "Theory vs. Practice" approach).
  • Recipe D (The Hybrid Mix): The computer studied a mix of real and fake paintings together. (The "Best of Both Worlds" approach).
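The four recipes above boil down to choosing which data sources feed the training set and which feed the test set. A sketch, assuming samples are tagged with a "real"/"synthetic" origin; the test-set side of recipes A, B, and D is inferred from the post's description, and the dict layout is my own.

```python
# The four train/test "recipes" written out as configurations.
RECIPES = {
    "A": {"train": {"real"},              "test": {"real"}},       # real only
    "B": {"train": {"synthetic"},         "test": {"synthetic"}},  # fake only
    "C": {"train": {"synthetic"},         "test": {"real"}},       # fake -> real
    "D": {"train": {"real", "synthetic"}, "test": {"real"}},       # hybrid mix
}

def select(dataset, sources):
    """Keep only samples whose origin tag is in the allowed sources."""
    return [s for s in dataset if s[2] in sources]

dataset = [("img0.jpg", "Artist A", "real"),
           ("img1.png", "Artist A", "synthetic")]
train = select(dataset, RECIPES["D"]["train"])
print(len(train))  # the hybrid recipe keeps both samples
```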

4. The Results: What Worked?

Here is what they found, using some fun analogies:

  • The "Fake Only" Surprise: When the computer studied only the AI-generated art, it actually got really good at recognizing the style! It was like a student who memorized the textbook so well they could ace the test. However, this only worked if the test was also fake.
  • The "Domain Gap" (The Reality Check): When they trained the computer on fake art and then tested it on real art, it stumbled. It's like teaching someone to drive on a video game simulator and then putting them behind the wheel of a real car on a rainy day. The real car felt different.
  • The Winner (The Hybrid Mix): The best results came from Recipe D. By mixing the few real paintings with the many fake ones, the computer learned the "rules" of the style without getting confused by the lack of data.
    • For the artists with the fewest real paintings (the ones in the most trouble), this trick was a lifesaver. Their accuracy jumped significantly.
    • For the artists who already had lots of paintings, the fake ones didn't help much. It's like adding extra water to a soup that is already perfectly seasoned; it doesn't make it better, and sometimes it makes it watery.

5. The Catch: The AI's "Bad Habits"

There was one funny glitch. The AI, when generating new paintings, kept making them look "cropped" (cut off at the edges), even though the researchers told it not to.

  • Why? Because the few real paintings they used to train the AI happened to have a lot of cut-off figures in them. The AI learned, "Oh, this artist likes to cut off the edges!" and copied that bad habit.
  • Lesson: Garbage in, garbage out. If your training data has flaws, the AI will copy those flaws.

The Big Takeaway

This paper shows that synthetic data is a powerful tool for art authentication, but it works best as a supplement, not a replacement.

Think of it like this: If you are trying to learn a song and you only have one recording, you might struggle. But if you have that one recording plus a bunch of AI-generated covers of the same song, you can finally hear the melody clearly and recognize the artist, even if the AI covers aren't perfect.

In short: When real art is rare, AI-generated art can fill the gaps, helping computers become better art detectives. But we still need the real thing to make sure the AI isn't just making things up!