Imagine you are sitting at a table with a friend. You are looking at a piece of paper that says "81". Your friend is sitting directly across from you, looking at the other side of that same piece of paper.
If you asked your friend, "What do you see?" a human would answer instantly: "I see 18." Why? Because from your friend's side the paper is rotated 180 degrees: the 8 still looks like an 8 and the 1 still looks like a 1, but their order is reversed, so the whole thing reads backward.
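To make the flip concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that computes how a string of rotation-friendly digits reads after a 180-degree turn:

```python
# Minimal sketch (not from the paper) of the 180-degree flip described above.
# Digits that survive a 180-degree rotation map to themselves or a partner
# (0->0, 1->1, 8->8, 6->9, 9->6); the digit order also reverses.
ROTATED = {"0": "0", "1": "1", "8": "8", "6": "9", "9": "6"}

def flip_180(text: str) -> str:
    """Return how a string of rotatable digits reads after a 180-degree turn."""
    return "".join(ROTATED[ch] for ch in reversed(text))

print(flip_180("81"))  # -> "18", what the friend across the table reads
```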
This paper asks a very simple question: Can AI do this?
The researchers created a test called FlipSet to see if Vision-Language Models (the smart AI systems that can "see" images and talk about them) can understand what someone else sees when they are looking at the world from a different angle.
Here is the breakdown of what they found, using some everyday analogies:
1. The Big Problem: The "Selfie" Habit
The researchers tested 103 different AI models. The result was shocking: 9 out of 10 models failed.
Instead of imagining what the monkey (the "friend" in the picture) sees, the AI almost always just described what it (the camera) sees.
- The Camera sees: "81"
- The Monkey sees: "18"
- The AI says: "81"
The paper calls this Egocentric Bias. It's like a toddler who thinks that because they see a toy on the left, everyone else must see it on the left too. The AI is stuck in its own "selfie" perspective and cannot mentally step into someone else's shoes.
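One way to picture how such a bias can be measured: compare the model's answer against two ground truths, the camera's view and the other viewer's view. The sketch below is purely illustrative; the function and labels are hypothetical, not the paper's actual evaluation code.

```python
# Illustrative only: a hypothetical scorer for one FlipSet-style question.
# camera_view and other_view are the two ground-truth readings; the function
# labels the model's answer as correct, egocentric, or something else.
def classify_answer(model_answer: str, camera_view: str, other_view: str) -> str:
    answer = model_answer.strip()
    if answer == other_view:
        return "correct"      # took the other agent's perspective
    if answer == camera_view:
        return "egocentric"   # reported its own (camera) view instead
    return "other_error"

print(classify_answer("81", camera_view="81", other_view="18"))  # -> "egocentric"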
2. The "Three-Part Puzzle" Experiment
To figure out why the AI was failing, the researchers didn't just ask the hard question. They broke the task down into three smaller puzzles to see which parts the AI could handle and where it broke down.
Think of the AI's brain as having three different tools:
Tool A: The "Social Awareness" Tool (Theory of Mind)
- The Question: "Does the monkey see something different than the camera?"
- The Result: The AI is a genius here (90% success). It knows that if you sit across from me, you see a different view. It understands the concept of "other people."
Tool B: The "Mental Gymnast" Tool (Mental Rotation)
- The Question: "If I take the text '81' and spin it 180 degrees, all by itself, what does it look like?"
- The Result: The AI is okay, but shaky (26% success). It can sometimes figure out how shapes flip, but it's not great at it.
Tool C: The "Grand Finale" (Putting it together)
- The Question: "What does the monkey see?" (This requires using Tool A and Tool B at the same time).
- The Result: The AI crashes completely (10% success).
3. The "Broken Assembly Line"
This is the most important discovery. The AI has the parts, but it can't assemble them.
Imagine a car factory.
- The factory is great at making wheels (Social Awareness).
- The factory is decent at making engines (Mental Rotation).
- But when they try to put the wheels and engine together to make a car (Perspective Taking), the car falls apart.
The researchers call this a Compositional Deficit. The AI knows the pieces, but it lacks the "glue" to combine them in a real-world situation. It's like having a dictionary and a thesaurus, but not knowing how to write a sentence that makes sense.
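As a rough back-of-the-envelope illustration (my own, not an analysis from the paper): if the two sub-skills were simply chained independently, the success rates quoted above would predict roughly 23% on the full task, yet the observed rate is only 10%.

```python
# Back-of-the-envelope check (not from the paper), using the rates quoted above.
p_awareness = 0.90   # Tool A: knows the other viewer sees something different
p_rotation  = 0.26   # Tool B: can mentally rotate the text 180 degrees
p_observed  = 0.10   # Tool C: actually answering "what does the monkey see?"

p_if_independent = p_awareness * p_rotation  # naive "chain the skills" estimate
print(f"Expected if skills simply chained: {p_if_independent:.0%}")  # ~23%
print(f"Observed on the full task:         {p_observed:.0%}")        # 10%
```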
4. Why "Thinking Harder" Didn't Help
The researchers tried to help the AI by asking it to "think step-by-step" (a technique called Chain-of-Thought). Usually, this helps AI solve math or logic problems.
But here, it made things worse. It was like asking a person who is bad at math to "talk through their steps" while trying to solve a problem they don't understand. The AI would confidently say, "I see the camera, so the answer is 81," and then write a long, convincing paragraph explaining why that is correct, even though it was wrong. It was "hallucinating" logic to support its bad guess.
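For readers unfamiliar with the technique, Chain-of-Thought simply means adding an instruction like "reason step by step before answering" to the prompt. Here is a hypothetical sketch of the two prompt styles (illustrative wording only, not the paper's exact prompts):

```python
# Illustrative prompt construction; the wording is hypothetical,
# not the exact prompts used in the paper.
question = "The monkey sits across the table from the camera. What text does the monkey see?"

direct_prompt = question + "\nAnswer with the text only."

chain_of_thought_prompt = (
    question
    + "\nThink step by step: first decide whether the monkey's view differs from"
      " the camera's, then mentally rotate the text, then give the final answer."
)
```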
The Bottom Line
This paper tells us that while AI is getting very good at recognizing objects and understanding language, it is still terrible at "stepping into someone else's shoes."
Current AI models are like mirrors: they reflect exactly what is in front of them. They haven't yet learned to be windows you can look through to see the world from a different angle. Until this egocentric bias is fixed, AI will struggle to interact with us in complex, real-world social situations where understanding another person's point of view is crucial.