When word order matters: human brains represent sentence meaning differently from large language models

This study using 7T fMRI data reveals that while large language models outperform order-agnostic models, they still fail to capture human brain representations of sentence meaning as effectively as models explicitly designed to encode structural relations, underscoring the critical role of sentence structure in human cognition.

Original authors: Fodor, J., Murawski, C., Suzuki, S.

Published 2026-03-18

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Question: Do AI Brains Think Like Human Brains?

Imagine you have two chefs.

  • Chef A is a human who has cooked for decades. They understand that "The dog chased the cat" is very different from "The cat chased the dog," even though the ingredients (the words) are exactly the same. They understand the story.
  • Chef B is a super-advanced robot (a Large Language Model, like the ones powering modern AI). It has read every recipe book in the world. It can write a perfect sentence about a dog chasing a cat.

The big question scientists have been asking is: When Chef B writes a sentence, does it "understand" the story the same way Chef A does? Or is the robot just a fancy parrot that mimics the sound of understanding without actually getting the meaning?

This paper says: The robot is good at mimicking, but it's failing the "story test."


The Experiment: The "Word Swap" Game

To find the answer, the researchers didn't just ask the AI to write essays. They set up a tricky game to see how the brain and the AI handle sentence structure.

They created 108 special sentences. Think of these sentences as LEGO sets.

  • The "Same" Set: You have a blue brick, a red brick, and a green brick.
  • The "Swapped" Set: You have the exact same blue, red, and green bricks, but you rearrange them.

The Trap:
If you just look at the list of bricks (the words), the two sets look identical.

  • Sentence 1: "The cameraman brought the equipment to the director."
  • Sentence 2: "The director brought the cameraman to the equipment."

To a computer that only counts words, these two sentences look identical: they contain exactly the same words, just in a different order. To a human, they are totally different stories! In the first, the cameraman does the bringing; in the second, the director does.
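To make that concrete, here is a minimal sketch (ours, not the paper's code) of what a pure word-counting model sees. It builds word-count vectors for the two sentences and measures their cosine similarity; because the swapped sentences use exactly the same words, the score is a perfect 1.0.

```python
from collections import Counter
from math import sqrt

def bag_of_words_cosine(a: str, b: str) -> float:
    """Cosine similarity between raw word-count vectors (word order is ignored)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() | cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)

s1 = "The cameraman brought the equipment to the director"
s2 = "The director brought the cameraman to the equipment"
print(bag_of_words_cosine(s1, s2))  # 1.0 -- same word counts, so the two stories look identical
```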

The Test:
The researchers put 30 people in an ultra-high-field (7T) fMRI scanner (a machine that takes detailed pictures of brain activity) and showed them these sentences. They also asked a group of people online to rate how similar the sentences were.

Then, they asked: "Which computer model's 'brain' matches the human brain's reaction to these swapped sentences?"

They tested four types of "computer brains":

  1. The "Word Bag" (Mean Model): A computer that just throws all the words in a bucket and averages them. It ignores order completely.
  2. The "Transformer" (The AI): The fancy AI models (like GPT-4) that we use today.
  3. The "Graph" Model: A computer that draws a map of how words connect to each other (like a family tree).
  4. The "Hybrid" Model: A mix of the two, designed specifically to understand roles (who did what to whom).

The Results: The Robot Missed the Mark

Here is what happened when they compared the computer models to the human brain:

1. The "Word Bag" was terrible.
As expected, the model that ignores word order got a negative match score, meaning its similarity judgments ran in the opposite direction from the brain's. It thought the swapped sentences were almost identical, while the human brain knew they were completely different stories.

2. The "Transformer" (AI) was better, but still wrong.
The AI models were much better than the "Word Bag." They did a decent job. However, when the researchers looked closely at the "Swapped" sentences, the AI still thought they were too similar.

  • The Analogy: Imagine the AI is like a student who memorized the vocabulary list but didn't read the instructions. It sees the words "cameraman," "director," and "equipment" and thinks, "Ah, these are the same ingredients, so the dish must be the same!" It missed the recipe.
  • The Brain: The human brain, however, lit up differently for the swapped sentences. It knew the roles had changed. The AI failed to match this pattern.

3. The "Hybrid" and "Graph" models won.
The models that were explicitly built to understand who did what to whom (the semantic roles) matched the human brain the best.

  • The Analogy: These models were like a detective who doesn't just list the suspects; they draw a map of who did what to whom. When the roles were swapped, the detective (and the human brain) said, "Wait, the plot has changed!"
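Here is a toy illustration (again ours, not the paper's actual graph or hybrid model) of why role-aware representations pull the swapped sentences apart. Each sentence is encoded as a set of (role, word) pairs, so swapping who did what changes the representation even though the words are identical.

```python
# Toy role-aware encoding: each sentence becomes a set of (role, word) pairs,
# so swapping who-did-what changes the representation even with identical words.

def role_pairs(agent: str, verb: str, theme: str, goal: str) -> frozenset:
    return frozenset([("agent", agent), ("verb", verb),
                      ("theme", theme), ("goal", goal)])

def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two sets: 1.0 = identical, 0.0 = nothing shared."""
    return len(a & b) / len(a | b)

s1 = role_pairs(agent="cameraman", verb="brought", theme="equipment", goal="director")
s2 = role_pairs(agent="director", verb="brought", theme="cameraman", goal="equipment")

# Only the shared verb overlaps: 1/7 ~= 0.14, versus 1.0 for the word-bag model.
print(jaccard(s1, s2))
```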

The "Long Sentence" Surprise

There was one other weird thing they found.
When people read very long sentences, their brains lit up in a very similar way, regardless of what the sentence actually meant.

  • The Analogy: It's like walking into a gym. Whether you are lifting a heavy weight or just stretching, your heart rate goes up and you sweat. The brain seems to have a "long sentence mode" where it just gets ready to work hard, ignoring the specific details for a moment. The researchers had to account for this "gym mode" to see the real differences in how the brain understood the stories.
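This summary doesn't say exactly how the researchers corrected for the length effect, but a standard way to remove a shared "gym mode" signal is a partial correlation: regress the length-driven similarity out of both the model's and the brain's similarity scores, then correlate what remains. A minimal sketch, assuming the confound is expressed as a vector of pairwise length differences:

```python
import numpy as np
from scipy.stats import rankdata

def partial_spearman(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Spearman correlation between x and y after regressing z out of both."""
    def residualize(a: np.ndarray, confound: np.ndarray) -> np.ndarray:
        a_r, c_r = rankdata(a), rankdata(confound)
        slope, intercept = np.polyfit(c_r, a_r, 1)
        return a_r - (slope * c_r + intercept)
    return np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]

# Hypothetical flattened pairwise scores for 500 sentence pairs.
rng = np.random.default_rng(1)
length_diff = rng.random(500)                    # |length_i - length_j| per pair
brain_rdm = length_diff + 0.3 * rng.random(500)  # brain dissimilarity partly length-driven
model_rdm = length_diff + 0.3 * rng.random(500)  # model dissimilarity, same confound

# The raw correlation is inflated by the shared length effect; partialling
# out length reveals how much meaning-driven agreement remains.
print(partial_spearman(model_rdm, brain_rdm, length_diff))
```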

The Bottom Line

What does this mean for us?

  1. AI is impressive, but it's not human. Large Language Models are amazing at generating text and answering questions. But they don't "think" about sentence structure the way our brains do. They are more like super-advanced pattern matchers than true understanders.
  2. Structure matters. The human brain cares deeply about the order of words and the roles they play. If you swap the subject and the object, the meaning changes completely, and our brains know it instantly. Current AI models are still a bit "clumsy" at this specific task.
  3. We need better models. To build AI that truly understands us, we might need to move away from just "predicting the next word" and start building models that explicitly map out the relationships between words, just like our brains do.

In short: The AI can write a poem, but it doesn't quite understand the story behind the words the way a human does. It's a brilliant mimic, but not a true thinker yet.
