Imagine you are playing a game of "Connect the Dots," but instead of drawing lines between numbers, you are connecting real-world people, places, and ideas with invisible threads of knowledge.
This paper introduces a new game called CREATE. It's designed to test how "creative" Artificial Intelligence (AI) really is. But here's the twist: it doesn't ask the AI to write a poem or paint a picture. Instead, it asks the AI to find hidden, interesting, and surprising connections between two things that seem unrelated.
The Core Idea: The "Mental Web"
Think of an AI's brain as a giant, messy spiderweb of facts.
- The Old Way: Most AI tests ask the AI to find the nearest thread. "Who is the father of George Washington?" (Easy: Augustine Washington).
- The CREATE Way: This test asks the AI to find the weird threads. "How is Dakota Johnson connected to a character in Lord of the Rings?"
A normal answer might be: "They are both associated with movies." (Boring, weak thread).
A creative answer might be: "Dakota Johnson is the stepdaughter of Antonio Banderas. Antonio Banderas voiced Puss in Boots in the Shrek films. Shrek is a fantasy story, just like Lord of the Rings." (Surprising, strong, and fun thread).
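To make the "web of threads" picture concrete, here is a minimal sketch that stores a few facts as a tiny graph and lists every short chain linking two names. It is only an illustration: the triples, the relation labels, and the helper names build_graph and find_paths are assumptions made for this example, not the paper's data or code.

```python
from collections import defaultdict

# Toy facts written as (head, relation, tail). Illustrative only.
TRIPLES = [
    ("Dakota Johnson", "stepdaughter of", "Antonio Banderas"),
    ("Antonio Banderas", "appeared in", "Shrek"),
    ("Shrek", "same genre as", "The Lord of the Rings"),
    ("Dakota Johnson", "works in", "Film"),
    ("The Lord of the Rings", "adapted as", "Film"),
]


def build_graph(triples):
    """Index triples as an adjacency list, ignoring edge direction."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((relation, head))
    return graph


def find_paths(graph, start, goal, max_hops=3):
    """Return every simple path from start to goal using at most max_hops edges."""
    paths = []

    def walk(node, visited, path):
        if node == goal:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                path.append((node, relation, neighbor))
                walk(neighbor, visited | {neighbor}, path)
                path.pop()

    walk(start, {start}, [])
    return paths


graph = build_graph(TRIPLES)
for path in find_paths(graph, "Dakota Johnson", "The Lord of the Rings"):
    print(" -> ".join([path[0][0]] + [step[2] for step in path]))
```

In this toy web the search turns up both kinds of thread: the short, generic one through "Film" and the longer, more specific one through Antonio Banderas and Shrek. Finding threads is the easy part; judging which ones are interesting is what the test is really about.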
The Rules of the Game
To win at CREATE, the AI has to follow two golden rules:
- Be Specific (The "Strong Thread" Rule):
Imagine you are tying a knot. A weak knot is made of loose string (e.g., "They both exist"). A strong knot is made of steel cable (e.g., "They are step-relatives"). The AI gets points for finding "steel cable" connections, not "loose string" ones.
- Be Diverse (The "New Path" Rule):
If the AI finds five ways to connect two people, but all five ways go through "Hollywood movies," it's not very creative. It's just repeating the same pattern. The AI needs to find paths through different worlds: maybe one through sports, one through family, and one through geography. It's like finding five different routes to the same city: one by train, one by boat, one by foot, etc.
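One rough way to picture the "New Path" rule is to label each connection with the world it passes through and count how many different worlds show up. The sketch below does exactly that. The domain labels and the function name domain_diversity are hypothetical, and the paper may well measure diversity in a different way.

```python
def domain_diversity(path_domains):
    """Share of distinct domains among the paths (1.0 means every path is different)."""
    if not path_domains:
        return 0.0
    return len(set(path_domains)) / len(path_domains)


# Five routes that all run through the same world.
all_hollywood = ["film", "film", "film", "film", "film"]
# Five routes through five different worlds.
varied = ["film", "sports", "family", "geography", "music"]

print(domain_diversity(all_hollywood))  # low: one pattern repeated
print(domain_diversity(varied))         # high: every route is new
```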
The Scoreboard: "Creative Utility"
How do we grade the AI? The authors invented a score called Creative Utility.
Think of it like a treasure hunt.
- If the AI finds one amazing treasure (a very specific, surprising connection), that's good.
- If it finds many treasures, but they are all the same kind of rock, that's okay.
- If it finds many different kinds of treasures (gold, jewels, ancient maps) that are all high quality, that's a perfect score.
The paper also introduces a "Patience" meter. If a user is impatient, they only care about the best single answer. If they are patient, they want a whole list of great, different answers. The AI is scored based on how well it handles both.
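The sketch below shows one way a score could combine quality, variety, and patience. It is not the paper's formula. The function creative_utility, the quality numbers, the domain labels, and the rule that a repeated domain earns nothing extra are all assumptions chosen to keep the example small.

```python
def creative_utility(answers, patience):
    """Toy score: quality-ranked answers, discounted by patience, repeats earn nothing.

    answers  -- list of (quality, domain) pairs, quality between 0 and 1
    patience -- between 0 (only the top answer counts) and 1 (every answer counts fully)
    """
    ranked = sorted(answers, key=lambda answer: answer[0], reverse=True)
    seen_domains = set()
    total = 0.0
    for rank, (quality, domain) in enumerate(ranked):
        # Assumption: an answer from an already-seen domain adds no new value.
        novelty = 0.0 if domain in seen_domains else 1.0
        seen_domains.add(domain)
        total += (patience ** rank) * quality * novelty
    return total


same_kind = [(0.9, "film"), (0.8, "film"), (0.7, "film")]
different_kinds = [(0.9, "film"), (0.8, "family"), (0.7, "sports")]

for patience in (0.0, 0.5, 1.0):
    print(patience,
          creative_utility(same_kind, patience),
          creative_utility(different_kinds, patience))
```

With patience set to 0, both lists score the same, because only the single best answer counts. As patience grows, the list with different kinds of treasure pulls ahead, which is the behavior the treasure hunt picture describes.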
What Did They Find? (The Plot Twist)
The researchers tested the smartest AI models available (like GPT-5, Claude, and Gemini). Here is what happened:
- The "Thinking" Trap: You might think that if you tell an AI to "think harder" or give it more time to reason, it will get more creative. Not necessarily. The paper found that just giving the AI more "thinking tokens" (more time to chat with itself) didn't always make it find better connections. Sometimes, it just made the AI repeat the same ideas over and over.
- The "Creative Prompt" Myth: Telling the AI "Be creative!" in the instructions didn't help much. It's like telling a person to "be funny" right before a joke; it doesn't guarantee a laugh. The AI needs better tools, not just better instructions.
- The Factuality Trade-off: The AI models that found the most creative and diverse paths sometimes made up facts (hallucinations). The models that stuck most carefully to the facts sometimes played it too safe and found boring connections. The best models had to strike a delicate balance: Be bold, but be true.
Why Does This Matter?
We often ask AI to write stories or solve math problems. But true creativity is about seeing the world differently. It's about connecting a "what" with a "why" in a way no one expected.
This paper is like a gym for the AI's imagination. It shows us that while AI is getting better at finding these hidden connections, it still struggles to be truly "human" in its creativity. It often gets stuck in loops or plays it too safe.
In a nutshell: CREATE is a new test that asks AI, "Can you connect the dots in a way that surprises me, without making things up?" The answer is: "Getting there, but we still have a long way to go."