Quantifying and extending the coverage of spatial categorization data sets

This paper demonstrates that large language models can closely match human spatial categorization labels, and uses them to guide the strategic expansion of the Topological Relations Picture Series (TRPS), producing a new 42-scene set that covers spatial relations better than previous extensions.

Wanchun Li, Alexandra Carstensen, Yang Xu, Terry Regier, Charles Kemp

Published Wed, 11 Ma

Imagine you are trying to map out the entire world of "where things are." You want to know how people in different countries describe the position of a cup on a table, a bird in a cage, or a shadow on a wall.

For decades, researchers have used a specific "photo album" called the TRPS (Topological Relations Picture Series) to do this. It has 71 pictures showing objects in various positions. But here's the problem: the album is incomplete. It's like trying to map the entire ocean using only a few pictures of the shoreline. It misses vast areas of the "ocean" of spatial language, especially for languages other than English.

This paper is about how the authors used Artificial Intelligence (specifically Large Language Models or LLMs) to fix this map, fill in the missing pieces, and figure out which new pictures and languages are most important to add next.

Here is the breakdown of their approach using simple analogies:

1. The Problem: The "Missing Puzzle Pieces"

Think of the existing 71 pictures as a puzzle that only shows the edges. Researchers know there are many more ways to describe space (like "among," "under," "left of," or "outside"), but they don't have pictures for all of them.

  • The Challenge: To make a complete map, they need to add hundreds of new pictures and test them in dozens of languages. Doing this by hand (hiring humans to draw pictures and label them) is too slow and expensive.

2. The Solution: The "AI Intern"

The authors decided to use an AI (specifically a model called Gemini) as a super-fast research assistant.

  • The Experiment: They showed the AI 220 different pictures (the original 71 plus roughly 150 new ones) and asked it: "If you were a native speaker of Spanish, Chinese, or French, what word would you use to describe this picture?"
  • The Test: They checked if the AI's answers matched what real humans would say.
  • The Result: The AI was surprisingly good! It agreed with human answers about 80–90% of the time. It's not perfect, but it's accurate enough to be a reliable "draftsman."
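That agreement check boils down to a simple comparison: for each picture, does the AI's label match the human one? Here is a minimal sketch, assuming exact-match scoring; the picture IDs and labels below are invented for illustration, and the paper's actual evaluation may use a more nuanced metric.

```python
def agreement(llm_labels, human_labels):
    """Fraction of pictures where the LLM's label matches the human label."""
    matches = sum(
        1 for pic, label in llm_labels.items()
        if human_labels.get(pic) == label
    )
    return matches / len(llm_labels)

# Hypothetical labels for three scenes.
llm_labels = {"cup_on_table": "on", "bird_in_cage": "in", "ring_on_finger": "on"}
human_labels = {"cup_on_table": "on", "bird_in_cage": "in", "ring_on_finger": "around"}

print(agreement(llm_labels, human_labels))  # 2 of 3 labels match
```

At scale, running this over hundreds of pictures and many languages is what lets the authors quantify how trustworthy the "AI intern" is before relying on it.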

3. The Strategy: The "Coverage Map"

Now that they had an AI that could guess labels for any language and any picture, they needed a way to decide which new pictures to actually test with real humans. They didn't want to just add random pictures; they wanted to fill the "gaps" in the map.

They used a concept called Coverage, which is like a net catching fish:

  • Imagine the "universe of all possible spatial scenes" is a huge ocean full of different fish (scenes).
  • The old 71 pictures are a small net that only catches the fish near the shore.
  • The goal is to cast a bigger net that catches fish from the deep ocean, the reefs, and the open sea.
  • The AI helped them simulate casting nets in different places. They asked: "If we add this new picture, does it catch a type of 'fish' (spatial concept) that our current net is missing?"
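The "casting nets" idea can be read as a greedy coverage procedure: repeatedly pick the candidate scene that catches the most concepts your current set is still missing. The sketch below is one plausible implementation under that assumption; the scenes, the concept sets, and the greedy strategy itself are illustrative, not the paper's exact method.

```python
def pick_scenes(candidates, budget):
    """Greedily pick scenes that cover the most not-yet-covered concepts."""
    covered, chosen = set(), []
    for _ in range(budget):
        # Score each remaining candidate by how many new concepts it adds.
        best = max(candidates, key=lambda s: len(candidates[s] - covered))
        if not candidates[best] - covered:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered

# Hypothetical candidate scenes, each "catching" some spatial concepts.
candidates = {
    "apple_in_bowl":    {"in"},
    "cat_under_table":  {"under", "near"},
    "keys_among_coins": {"among", "near"},
    "dot_left_of_line": {"left-of"},
}
chosen, covered = pick_scenes(candidates, budget=2)
```

With a budget of two, the greedy pass first takes the scene covering two new concepts, then whichever remaining scene adds one more, so a small budget still buys broad coverage.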

4. The Outcome: A Better Map

Using this AI-assisted strategy, they created a new set of 42 pictures (called the LCXRK set).

  • The Result: When they measured how much of the "ocean" these new pictures covered, they found that their new set covered the space much better than previous attempts.
  • The Language Test: They also used the AI to figure out which languages were missing from their study. The AI suggested that Portuguese and Romanian were very different from the languages they already had, so those should be the next ones to test with real humans.
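One way to measure "very different" between languages is to compare how each one groups the same scenes: two languages disagree when one uses the same word for a pair of scenes while the other splits them apart. The sketch below shows that idea; the languages, words, and scenes are invented, and the paper's actual dissimilarity measure may differ.

```python
from itertools import combinations

def partition_distance(lang_a, lang_b):
    """Fraction of scene pairs that one language groups under the same term
    while the other assigns different terms."""
    pairs = list(combinations(sorted(lang_a), 2))
    disagreements = sum(
        1 for s1, s2 in pairs
        if (lang_a[s1] == lang_a[s2]) != (lang_b[s1] == lang_b[s2])
    )
    return disagreements / len(pairs)

# Hypothetical labelings: Spanish "en" lumps scenes that English splits.
english = {"cup_on_table": "on", "ring_on_finger": "on", "apple_in_bowl": "in"}
spanish = {"cup_on_table": "en", "ring_on_finger": "en", "apple_in_bowl": "en"}

print(partition_distance(english, spanish))
```

A language scoring high on this distance against everything already in the dataset is exactly the kind the AI would flag as the most informative one to test next with real speakers.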

5. Why This Matters

This paper isn't saying "AI will replace human scientists." Instead, it's saying "AI is the best tool to help us plan our experiments."

  • Before: Researchers had to guess which pictures to draw and which languages to study.
  • Now: They can use AI to simulate thousands of scenarios, find the gaps in their knowledge, and then use real humans to verify the most important ones.

In a nutshell: The authors used AI to build a better "menu" of spatial descriptions. They used the AI to taste-test thousands of combinations, figured out which dishes (scenes and languages) were missing from the menu, and created a new, more complete menu that covers the whole world of how we describe "where things are."