Shape vs. Context: Examining Human--AI Gaps in Ambiguous Japanese Character Recognition

This paper investigates the behavioral gap between humans and Vision-Language Models in recognizing ambiguous Japanese characters, demonstrating that their decision boundaries differ in shape-only tasks, though contextual information can partially align VLM behavior with human judgments.

Daichi Haraguchi

Published 2026-03-02
📖 5 min read · 🧠 Deep dive

The Big Idea: Are AI Brains Like Human Brains?

Imagine you have a super-smart robot that can read almost anything perfectly. It gets 99% of the answers right on a test. But does it "think" the way you do when it's confused?

This paper asks a simple question: When the robot sees something blurry or ambiguous, does it guess the same way a human would?

The author, Daichi Haraguchi, decided to test this using two very similar Japanese characters that look almost identical:

  • ソ (So): a short tick plus a long stroke that sweeps down from the top right.
  • ン (n): a short tick plus a long stroke that sweeps up from the bottom left.

The difference between them is tiny—just the angle of one little line. To humans, this is a classic "is it a duck or a rabbit?" optical illusion.

The Experiment: Blurring the Lines

To test this, the researcher didn't just use clear pictures. He used a special AI tool (a β-VAE) to create a smooth gradient of images.

The Analogy: The Color Mixer
Imagine you have a bucket of blue paint (the character "So") and a bucket of red paint (the character "n").

  • Step 1: You pour 100% blue. It's clearly blue.
  • Step 2: You pour 100% red. It's clearly red.
  • Step 3: You mix them. You get purple. Then darker purple. Then lighter purple.

The researcher created 15 versions of these characters, ranging from "100% So" to "100% n," with every tiny shade of "maybe-So-maybe-n" in between.
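Under the hood, this paint mixing is just a straight line in the β-VAE's latent space. The paper's code isn't reproduced here, so the sketch below is only an illustration of the idea: `encode` and `decode` are placeholder stand-ins for a trained β-VAE's encoder and decoder, and linear interpolation between the two latent codes is an assumption about the exact morphing procedure.

```python
import numpy as np

# Placeholder stand-ins for a trained beta-VAE's encoder/decoder.
# In the study these would map character images to/from a learned
# latent space; here they are dummies so the interpolation runs.
def encode(image: np.ndarray) -> np.ndarray:
    """Map a 32x32 image to a latent vector (placeholder)."""
    return image.reshape(-1)[:16].astype(np.float32)

def decode(z: np.ndarray) -> np.ndarray:
    """Map a latent vector back to a 32x32 image (placeholder)."""
    return np.tile(z, 64)[:1024].reshape(32, 32)

# Encode a clean "So" and a clean "n", then walk the straight line
# between their latent codes in 15 steps: the paint-mixing idea.
img_so = np.zeros((32, 32), dtype=np.float32)  # stand-in for a clean ソ
img_n = np.ones((32, 32), dtype=np.float32)    # stand-in for a clean ン
z_so, z_n = encode(img_so), encode(img_n)

morphs = []
for alpha in np.linspace(0.0, 1.0, 15):        # 0.0 = pure So, 1.0 = pure n
    z_mix = (1 - alpha) * z_so + alpha * z_n   # linear mix in latent space
    morphs.append(decode(z_mix))
```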

He then asked two groups to look at these blurry images and guess what they were:

  1. Humans: Real people taking a survey.
  2. AI Models: Two famous AI chatbots (GPT and Gemini).

Part 1: The "Shape-Only" Test (Looking at the Blurry Blob)

The Setup: The AI and humans were shown only the single, blurry character. No other words, no context. Just the blob.
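In practice, "no context" means the prompt itself gives nothing away: one image, one forced choice. The paper's exact prompts aren't published here, so the sketch below is a hypothetical version; `ask_vlm` stands in for whichever VLM API (GPT, Gemini, ...) you actually call.

```python
import base64

def build_prompt() -> str:
    # Forced choice, zero context: the model sees one character and
    # must pick between the two readings.
    return (
        "The image shows a single Japanese katakana character. "
        "Answer with exactly one character: ソ or ン."
    )

def image_to_data_url(path: str) -> str:
    # Encode a morph image so it can be attached to a chat-style
    # VLM request as a data URL.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:image/png;base64,{b64}"

def classify_all(morph_paths, ask_vlm):
    # `ask_vlm(prompt, image_url) -> str` is a hypothetical wrapper
    # around whichever VLM you query.
    votes = []
    for path in morph_paths:
        answer = ask_vlm(build_prompt(), image_to_data_url(path))
        votes.append(1 if "ン" in answer else 0)  # 1 = voted "n"
    return votes
```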

The Result:

  • Humans: As the image got more like "n," humans smoothly switched their votes from "So" to "n." It was a clean, logical line.
  • The AI: The AI was weird.
    • One AI (GPT) kept insisting it was "So" even when the image was almost 100% "n." It was stubborn.
    • The other AI (Gemini) was confused and didn't switch its vote as smoothly as humans did.

The Takeaway: Even when the picture is clear enough for a human to be 100% sure, the AI may still hesitate or guess wrong based on its own internal biases. The models don't see the world the same way we do.
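One way to make "decision boundary" concrete is to fit a psychometric (logistic) curve to each group's votes across the 15 morph steps: where the curve crosses 50% is the boundary, and its steepness is how cleanly the votes flip. The numbers below are synthetic, chosen only to mimic the qualitative pattern described above (humans flip near the middle, the stubborn model flips late); they are not the paper's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Psychometric curve: P(vote 'n') at morph level x.

    x0 is the decision boundary (50% point); k is the steepness."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

x = np.linspace(0.0, 1.0, 15)        # morph level: 0 = pure So, 1 = pure n
# Synthetic vote fractions that only mimic the described pattern:
human_p = logistic(x, 0.50, 12.0)    # humans flip cleanly near the middle
model_p = logistic(x, 0.85, 6.0)     # the "stubborn" model flips late

(h_x0, h_k), _ = curve_fit(logistic, x, human_p, p0=[0.5, 10.0])
(m_x0, m_k), _ = curve_fit(logistic, x, model_p, p0=[0.5, 10.0])

# A gap in x0 (where the flip happens) or in k (how sharply it
# happens) is exactly the kind of misalignment the paper measures.
print(f"human boundary: {h_x0:.2f}  model boundary: {m_x0:.2f}")
```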

Part 2: The "Context" Test (The Sentence Puzzle)

The Setup: Now, the researcher put that same blurry character inside a word.

  • Example: He took the word "Dance" (ダンス) and replaced its middle character with the blurry blob.
  • Scenario A: The word is "Dance" (ダンス). The context strongly suggests the blob is "n" (ン).
  • Scenario B: The word is "So-so" (ソソ). The context suggests the blob is "So" (ソ).

The Question: Does putting the blurry blob in a sentence help the AI guess correctly, just like it helps humans?
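Building such a stimulus is mostly image plumbing: render the clean neighboring characters and drop the morphed blob in between. A minimal Pillow sketch follows; the font path is an assumption (point it at any font on your system that covers katakana), and `morph_img` would be one of the 15 morphed images from earlier.

```python
from PIL import Image, ImageDraw, ImageFont

def make_context_stimulus(morph: Image.Image, left: str, right: str,
                          font_path: str = "NotoSansJP-Regular.otf") -> Image.Image:
    # Place the ambiguous blob between two clean katakana characters.
    # font_path is an assumption; use any katakana-capable font.
    cell = morph.size[0]
    font = ImageFont.truetype(font_path, cell)
    canvas = Image.new("L", (cell * 3, cell), color=255)
    draw = ImageDraw.Draw(canvas)
    draw.text((0, 0), left, font=font, fill=0)          # clean left character
    canvas.paste(morph, (cell, 0))                      # ambiguous blob
    draw.text((cell * 2, 0), right, font=font, fill=0)  # clean right character
    return canvas

# Scenario A: the blob between ダ and ス reads as "ダンス" if it is ン.
# stimulus = make_context_stimulus(morph_img, "ダ", "ス")
```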

The Result:

  • Humans: We are great at using context. If we see "D_nce," we instantly know it's "Dance," even if the 'a' is scribbled out.
  • The AI: It got better, but not perfectly.
    • When the word had other clear clues (like another "n" elsewhere in the word), the AI started acting more like a human.
    • However, the AI still had its own "personality." Sometimes it ignored the context and stuck to its weird shape-based biases from the first test.

The Analogy: The Detective

  • Humans are like a detective who looks at the crime scene (the shape) but also checks the alibi (the context). If the alibi is strong, they ignore the blurry fingerprint.
  • The AI is like a detective who is obsessed with the fingerprint. Even if the alibi says "It's definitely the butler," the AI might still say, "But the fingerprint looks a little like the gardener!"

Why Does This Matter?

You might think, "Well, if the AI gets the right answer 95% of the time, who cares how it thinks?"

The Author's Point:
It matters because accuracy isn't everything.

  • If an AI makes a mistake, we want to know why.
  • If an AI is confident but wrong because it ignores context, that's dangerous in real life (like in medical diagnosis or self-driving cars).
  • This study shows that we can't just test AI by giving them clear pictures. We have to test them when things are blurry and confusing to see if they think like us.

The Conclusion

The paper concludes that AI and humans are not aligned in how they handle ambiguity.

  • AI has its own "decision boundaries" that are different from ours.
  • Context helps, but it doesn't fix the AI's weird brain completely.

The Final Lesson: To truly understand if AI is "safe" or "aligned" with humans, we shouldn't just ask, "Did it get the answer right?" We need to ask, "Did it figure it out the way a human would?" And right now, the answer is: Not quite.
