Imagine you are teaching a robot how to clean your room. You show it how to pick up a sock and put it in the hamper. You show it how to pick up a book and put it on the shelf.
If the robot is truly smart, it should be able to figure out how to pick up a sock and then put it on the shelf, or pick up a book and then put it in the hamper, even if you never specifically showed it those exact combinations. This ability to mix and match learned skills to solve new problems is called compositional generalization. It's a superpower of human intelligence, but for Artificial Intelligence (AI), it's like trying to teach a dog to do calculus.
This paper introduces a new tool called COGITAO (a mouthful, so let's just call it the "Lego Lab") designed to test exactly how good AI is at this kind of mixing-and-matching.
The Problem: AI is a "Pattern Matcher," Not a "Thinker"
Current AI models (like the ones powering chatbots or image generators) are incredibly good at memorizing patterns. If you show them a million pictures of a cat, they can spot a cat. But if you ask them to do something slightly new—like "rotate the cat and then make it blue"—they often get confused. They tend to just guess based on what they've seen before, rather than actually understanding the rules of how to combine actions.
The researchers wanted a way to test this without the messiness of the real world (like bad lighting or messy rooms). They needed a clean, controlled environment.
The Solution: The "Lego Lab" (COGITAO)
Think of COGITAO as a giant, infinite digital sandbox made of grids (like graph paper).
- The Objects: Instead of real cats or cars, the AI sees simple shapes (squares, circles, weird blobs) made of colored pixels.
- The Actions: The researchers created a "toolbox" of 28 simple moves. You can rotate a shape, move it up, flip it, change its color, or cut a piece off.
- The Game: The AI is given an "Input Grid" (a picture of shapes) and a "Rule" (a list of moves, like "Rotate 90 degrees, then move up"). It has to draw the "Output Grid" (what the picture looks like after the moves).
The magic of COGITAO is that it can generate millions of unique puzzles. It can make the rules easy (just move one shape) or incredibly hard (rotate three shapes, flip two, and change the colors of all of them in a specific order).
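The grid-plus-moves setup can be sketched in a few lines of Python. Everything here is illustrative (the function names, the "0 means empty" convention, the 4x4 board), not the benchmark's actual code, but it captures the idea: a rule is just an ordered list of moves applied to an input grid.

```python
import numpy as np

def rotate_90(grid):
    # Rotate the whole grid 90 degrees counter-clockwise.
    return np.rot90(grid)

def move_up(grid):
    # Shift everything up one row, padding the bottom with 0 (empty).
    return np.vstack([grid[1:], np.zeros((1, grid.shape[1]), dtype=grid.dtype)])

def apply_rule(grid, rule):
    # A "rule" is an ordered list of moves, applied left to right.
    for move in rule:
        grid = move(grid)
    return grid

# Input grid: a single 2-pixel "shape" of color 3 on a 4x4 board.
inp = np.zeros((4, 4), dtype=int)
inp[2:4, 1] = 3

# The AI's job: given `inp` and the rule "rotate 90, then move up",
# produce this output grid.
out = apply_rule(inp, [rotate_90, move_up])
```

Generating a new puzzle is then just a matter of sampling a fresh input grid and a fresh list of moves, which is how the benchmark can produce millions of unique tasks.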
The Experiment: Testing the AI's Brain
The researchers took several state-of-the-art AI models (some designed for vision, some for language, some designed to "think" step-by-step) and put them through the COGITAO Lab.
They ran two main types of tests:
The "Mix-and-Match" Test (Compositional Generalization):
- Training: The AI learns to "Rotate" and "Move Up" separately. It also learns to do "Rotate then Move Up."
- The Test: The AI is asked to do "Move Up then Rotate."
- The Result: Even though the AI knew both moves perfectly, it failed miserably when asked to swap the order. It was like a chef who knows how to chop onions and how to fry eggs but, when asked to fry an egg and then chop an onion, ends up frying the onion and chopping the egg.
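Why does the order matter so much? Because these moves don't commute: applying the same two moves in a different order genuinely produces a different output grid, so "Move Up then Rotate" is a new task, not a rephrasing of an old one. A tiny sketch (illustrative coordinate math for a single pixel, not the benchmark's code) makes this concrete:

```python
N = 4  # grid size

def rotate_90(p):
    # Rotate a pixel at (row, col) 90 degrees counter-clockwise on an N x N grid.
    r, c = p
    return (N - 1 - c, r)

def move_up(p):
    # Move a pixel up by one row.
    r, c = p
    return (r - 1, c)

start = (3, 0)  # a pixel in the bottom-left corner

a = move_up(rotate_90(start))  # "Rotate, then Move Up"
b = rotate_90(move_up(start))  # "Move Up, then Rotate"
```

The two results land on different cells, so a model that merely memorized "Rotate then Move Up" has no correct answer to copy when the order is flipped.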
The "New Environment" Test (Systematic Generalization):
- Training: The AI learns to move shapes on a small 10x10 grid with 2 shapes.
- The Test: The AI is asked to move shapes on a huge 20x20 grid with 10 shapes.
- The Result: The AI got confused. It couldn't scale its logic up. It was like teaching someone to drive in a parking lot, then expecting them to drive a Formula 1 car on a highway immediately.
The Big Surprise: "Stubbornness"
The paper found something fascinating about how the AI failed. They called it "Stubbornness."
When the AI faced a new, tricky puzzle, it didn't try to figure out the new rules. Instead, it just ignored the new instructions and did what it was trained to do most often.
- Example: If the AI was trained mostly on "Move Right," and you asked it to "Move Left," it would often just "Move Right" anyway. It was too lazy to learn the new rule and just defaulted to its old habit.
Why Does This Matter?
You might think, "So what? It's just a grid game."
But this is a huge deal for the future of AI.
- Real-World Robots: If a robot can't learn to "open the fridge" and "get the milk" separately and then combine them to "get the milk from the fridge," it will never be useful in a real house.
- True Intelligence: Humans can learn a few basic concepts and combine them in infinite ways. Current AI is stuck in a loop of memorization. COGITAO proves that simply making AI bigger or training it on more data isn't the answer. We need to build AI that actually understands how to combine ideas, not just how to copy them.
The Takeaway
COGITAO is like a stress test for the human-like reasoning of AI. It shows us that while our current AI is a brilliant memorizer, it is still a terrible "combinator." It can't easily mix and match its skills to solve new problems.
The paper concludes that until we can build AI that passes the COGITAO test, we are still far from creating machines that truly think like humans. We are building smart parrots that can repeat what they've heard, but we haven't yet built the thinkers who can write their own songs.