Imagine you have a magical, super-smart artist named "AI." You can tell this artist, "Draw me a scene of people cooking in the 18th century," or "Show me a group of people working in the 1950s." The artist is incredibly talented and can create beautiful pictures in seconds.
But here's the catch: Does this artist actually know history, or are they just guessing based on old movies and stereotypes?
This paper, titled "Synthetic History," is like a report card for these AI artists. The researchers wanted to see if AI models (specifically "Diffusion Models" like Stable Diffusion and FLUX) can accurately depict the past, or if they are just making up a "fake history" that looks cool but is factually wrong.
To test this, they created a massive "History Exam" called HistVis.
The Setup: The AI History Exam
The researchers didn't ask the AI to draw specific famous people (like Napoleon) or specific events (like the moon landing). Instead, they asked it to draw universal human activities (like "eating," "dancing," "working," or "praying") across 10 different time periods (from the 1600s to today).
They asked three different AI artists to draw each scene 10 times, producing 30,000 images in total. They then graded these images on three specific subjects:
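The headline number follows from simple combinatorics: 30,000 images divided by 3 models, 10 time periods, and 10 generations per prompt implies 100 activities. Here is a minimal sketch of how such a prompt grid could be built; the activity and period strings below are illustrative placeholders, not the benchmark's actual wording.

```python
from itertools import product

# Placeholder lists -- the real benchmark's exact activities, period labels,
# and prompt wording may differ. The counts are what matter here.
activities = [f"activity_{i}" for i in range(100)]  # 30,000 / (3 * 10 * 10) = 100
periods = ["17th century", "18th century", "19th century", "1900s", "1910s",
           "1930s", "1950s", "1970s", "1990s", "2020s"]
models = ["model_a", "model_b", "model_c"]  # hypothetical model names
images_per_prompt = 10

# One text prompt per (activity, period) pair.
prompts = [f"A person {a} in the {p}" for a, p in product(activities, periods)]

# Each model renders every prompt ten times.
total_images = len(prompts) * len(models) * images_per_prompt
print(total_images)  # 100 * 10 * 3 * 10 = 30000
```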
1. The "Style" Test: Does the AI think the past is black and white?
The Metaphor: Imagine if you asked a photographer to take a picture of a 17th-century village. If they automatically gave you a black-and-white sketch or an old-fashioned engraving, even though you didn't ask for that, they are relying on a stereotype.
The Finding: The AI artists have a "default setting" for every era.
- For the 17th and 18th centuries, the models almost always drew scenes as old engravings or paintings, even when explicitly asked for a realistic photo.
- For the 19th century, they switched to drawings.
- For the 20th century, they finally started producing photographs.
- The Problem: The AI isn't thinking about what the world actually looked like; it's just following a visual rulebook it learned from its training data. It assumes the past must look like an old painting.
2. The "Time Travel" Test: Did the AI bring a smartphone to the 1700s?
The Metaphor: This is the "Back to the Future" test. If you see Marty McFly wearing his sneakers in 1885, that's a mistake. The researchers looked for anachronisms—objects that didn't exist in that time period.
The Finding: The AI is a terrible time traveler.
- They found modern headphones in 18th-century music scenes.
- They found vacuum cleaners in 19th-century homes.
- They found smartphones in 1950s photos.
- The Problem: The AI focuses too much on the activity (e.g., "listening to music") and forgets the time (e.g., "1700s"). It thinks, "Music = Headphones," ignoring that headphones didn't exist yet. One model (SD3) was particularly bad at this, putting modern gadgets in almost 25% of its 1930s images.
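The anachronism check boils down to a date comparison: does any object in the image postdate the era the prompt asked for? A toy sketch of that logic (the invention years are rough, illustrative assumptions, and the paper's actual detection pipeline is not shown here):

```python
# Approximate, illustrative invention years -- assumptions for this sketch,
# not values taken from the paper.
INVENTION_YEAR = {
    "violin": 1550,
    "vacuum cleaner": 1901,
    "headphones": 1910,
    "smartphone": 2007,
}

def anachronisms(detected_objects, scene_year):
    """Return the objects that did not yet exist in the depicted year.

    Objects missing from the table are assumed safe (year 0).
    """
    return [obj for obj in detected_objects
            if INVENTION_YEAR.get(obj, 0) > scene_year]

# An 18th-century music scene containing a violin and headphones:
print(anachronisms(["violin", "headphones"], 1750))  # ['headphones']
```

In practice the hard part is the first step, reliably listing what objects appear in a generated image; the date comparison itself is trivial.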
3. The "Who Was There?" Test: Are the people realistic?
The Metaphor: Imagine a school play about the 17th century. If the director casts only white men for every single role, even in a scene where women and people of other races were historically present, the play feels wrong.
The Finding: The AI has a hard time guessing the right mix of people.
- Gender: The AI often over-casts men in roles like "cooking" or "dining," even though historical data suggests women did a lot of this work. Conversely, it sometimes under-casts women in education, even when they were present.
- Race: The AI tends to make almost everyone White in older time periods, only adding diversity as it gets closer to the present day. It struggles to imagine a diverse world in the past, often defaulting to a "Western-centric" view.
- The Problem: The AI is projecting today's biases or a simplified version of history onto the past, rather than reflecting the complex reality of who was actually there.
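Conceptually, this test compares the mix of people the AI draws against a historical baseline. A minimal sketch with made-up numbers (the 9-to-1 split and the 30% baseline below are hypothetical, not the paper's data):

```python
from collections import Counter

def group_shares(labels):
    """Fraction of each perceived-group label among the people depicted."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Toy example: 9 of 10 people in "cooking" images read as male,
# versus a hypothetical historical baseline of 30% male.
observed = group_shares(["male"] * 9 + ["female"])
baseline_male = 0.30
gap = round(observed["male"] - baseline_male, 2)
print(gap)  # 0.6 -- the over-representation gap
```

A large positive gap means the model over-casts that group relative to the baseline; a negative gap means under-casting.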
The Big Conclusion
The researchers tried to "fix" the AI by giving it better instructions (like saying, "Make it look like a real photo, not a drawing"). But the AI was stubborn. It kept falling back on its old habits.
In simple terms:
Current AI models are like tourists with a camera who have never actually visited the places they are photographing. They rely on postcards and movies to guess what a place looks like. They get the general vibe right, but they miss the details, bring the wrong props, and often get the people wrong.
Why Does This Matter?
If schools, museums, or news outlets start using these AI images to teach history or show the past, they might accidentally teach us fake history. We might start believing that the past was always black and white, that only men did certain jobs, or that modern technology has always existed.
This paper is a wake-up call: Before we trust AI to tell us stories about our past, we need to teach it to be a better historian, not just a better artist.