Understanding Wikidata Qualifiers: An Analysis and Taxonomy

This paper analyzes the semantics and usage of Wikidata qualifiers to develop a refined taxonomy based on frequency and diversity metrics, aiming to improve knowledge graph querying, inference, and contributor guidance.

Gilles Falquet, Sahar Aljalbout

Published 2026-03-13

Imagine Wikidata as a massive, global library where every book (or "item") has a card catalog entry. Usually, these entries are simple facts: "George C. Scott was married to Colleen Dewhurst."

But life is complicated. What if they were only married for five years? What if that marriage ended in divorce? What if the source of this information is just a rumor?

On a simple library card, you can't write all that without making a mess. Wikidata handles it with "qualifiers." Think of qualifiers as sticky notes you can attach to a fact to add context, dates, or warnings.
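To make the sticky-note idea concrete, here is a minimal sketch of a fact with and without qualifiers. The `Statement` class and the `describe` helper are hypothetical names invented for this illustration, not Wikidata's actual data model or API, and the years are left as placeholders rather than real dates.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A fact plus the qualifiers ("sticky notes") attached to it (hypothetical model)."""
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def describe(statement: Statement) -> str:
    """Render the fact, followed by its qualifiers in brackets if it has any."""
    base = f"{statement.subject} | {statement.prop} | {statement.value}"
    if not statement.qualifiers:
        return base
    notes = "; ".join(f"{name} = {val}" for name, val in statement.qualifiers.items())
    return f"{base} [{notes}]"


# The bare fact from the library-card example.
bare_fact = Statement("George C. Scott", "spouse", "Colleen Dewhurst")

# The same fact with sticky notes attached. YEAR_A and YEAR_B are placeholders.
qualified_fact = Statement(
    "George C. Scott",
    "spouse",
    "Colleen Dewhurst",
    qualifiers={"start time": "YEAR_A", "end time": "YEAR_B", "end cause": "divorce"},
)

print(describe(bare_fact))
print(describe(qualified_fact))
```

The point of the sketch is that the main fact stays the same in both cases; the qualifiers only add context around it.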

The authors of this paper are like a team of librarians who decided to organize the entire "Sticky Note Department" because there were too many different types of notes, and nobody knew which one to use when.

Here is the breakdown of their findings, explained with everyday analogies:

1. The Problem: Too Many Sticky Notes

The authors looked at the Wikidata database and found 2,240 different types of sticky notes (qualifiers).

  • The Confusion: When a volunteer tries to add a fact, they are overwhelmed. "Do I use the 'Start time' note? The 'Time Period' note? The 'Happened In' note?"
  • The Mess: Some notes are used millions of times (like "Start time"), while others are used only once or twice. Some notes are used correctly; others are used in ways the creators never intended.

2. The Method: How to Find the "Important" Notes

The authors didn't just look at how often a note was used (Frequency). They also looked at how versatile it was (Diversity).

  • The Analogy: Imagine a tool in a toolbox.
    • Hammer: Used 1,000 times, but only for nails. (High frequency, low diversity).
    • Swiss Army Knife: Used 100 times, but for cutting, screwing, opening bottles, and sawing. (Lower frequency, high diversity).
  • The Discovery: The authors realized that the most "important" qualifiers are the Swiss Army Knives, the ones that help describe facts in many different ways. To find them, they used a diversity index borrowed from ecology, where it measures biodiversity, and selected the 300 most useful notes.
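The hammer versus Swiss Army Knife distinction can be sketched in a few lines. This summary does not name the ecological formula the authors used, so the example below assumes the Shannon diversity index, a common biodiversity measure, purely for illustration. It also assumes that "diversity" means the spread of a qualifier across the properties it is attached to. The usage data is a made-up toy sample, not real Wikidata counts.

```python
import math
from collections import Counter

# Toy sample of (qualifier, property it was attached to) pairs. Invented data.
usages = [
    ("chromosome", "genomic start"),
    ("chromosome", "genomic start"),
    ("chromosome", "genomic start"),
    ("chromosome", "genomic start"),
    ("start time", "spouse"),
    ("start time", "position held"),
    ("start time", "employer"),
]


def frequency(qualifier, usages):
    """How many times the qualifier is used at all."""
    return sum(1 for name, _ in usages if name == qualifier)


def shannon_diversity(qualifier, usages):
    """Shannon index over the properties the qualifier appears with.

    Assumption: this stands in for the paper's ecology-based formula,
    which is not specified in this summary.
    """
    counts = Counter(prop for name, prop in usages if name == qualifier)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


for name in ("chromosome", "start time"):
    print(name, frequency(name, usages), round(shannon_diversity(name, usages), 3))
```

In this toy sample, "chromosome" behaves like the hammer (more uses, all on one property, so its diversity is zero), while "start time" behaves like the Swiss Army Knife (fewer uses, spread evenly over three properties).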

3. The Solution: The New "Sticky Note" Taxonomy

The authors created a new filing system (a taxonomy) to sort these 300 top notes into four main drawers. This helps users know exactly which drawer to open.

Drawer A: The "When and Where" (Context/Validity)

These notes tell you when or where a fact is true. Without them, the fact might be false.

  • Analogy: A "Sale" sign in a store window.
  • Example: "The store is open" may be true only until 5 PM (End time) or only in Germany (Valid in place).
  • Key Notes: Start time, End time, Valid in place.

Drawer B: The "How Sure Are We?" (Epistemic/Uncertainty)

These notes tell you how confident we are about the fact.

  • Analogy: A weather forecast.
  • Example: "It will rain tomorrow" is a hypothesis (uncertain), and "built circa 1900" is a date known only approximately (imprecise).
  • Key Notes: Sourcing circumstances (e.g., "hypothesis"), Earliest date, Latest date.

Drawer C: The "Structure" (Structural)

These notes act like labels on parts of a machine. They don't change the meaning of the fact; they just organize the data so computers can read it better.

  • Analogy: A recipe card where "2 cups" is the amount and "flour" is the ingredient. You need both to make sense of the number.
  • Example: If you say a gene starts at position "100," that number is useless unless you say which chromosome it's on. The "Chromosome" note is a structural qualifier.
  • Key Notes: Chromosome, Astronomical filter, Catalog.

Drawer D: The "Extra Info" (Additional)

These are the bonus facts. They don't change the truth of the statement; they just add flavor.

  • Analogy: A movie poster that says "Starring Tom Hanks" (the main fact) and adds "Filmed in 1994" or "Directed by Steven Spielberg" (the extra notes).
  • Sub-types:
    • Sequence: "He was the 39th President."
    • Cause/Effect: "The bridge collapsed because of an earthquake."
    • Role: "Tom Hanks played the role of a pilot."
    • Source: "This fact comes from a 2020 Census."
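The four drawers can be read as a simple lookup from a qualifier to its category. The sketch below uses only the "Key Notes" listed above for Drawers A to C; the `Drawer` enum, the `QUALIFIER_DRAWER` table, and the `drawer_for` function are hypothetical names for this illustration. No specific qualifiers are assigned to Drawer D here, because the text above names only its sub-types.

```python
from enum import Enum


class Drawer(Enum):
    CONTEXT = "Context/Validity"
    EPISTEMIC = "Epistemic/Uncertainty"
    STRUCTURAL = "Structural"
    ADDITIONAL = "Additional"


# Only the key notes named in the text are listed; the paper covers about 300.
QUALIFIER_DRAWER = {
    "start time": Drawer.CONTEXT,
    "end time": Drawer.CONTEXT,
    "valid in place": Drawer.CONTEXT,
    "sourcing circumstances": Drawer.EPISTEMIC,
    "earliest date": Drawer.EPISTEMIC,
    "latest date": Drawer.EPISTEMIC,
    "chromosome": Drawer.STRUCTURAL,
    "astronomical filter": Drawer.STRUCTURAL,
    "catalog": Drawer.STRUCTURAL,
}


def drawer_for(qualifier):
    """Return the drawer for a qualifier label, or None if it is not in the table."""
    return QUALIFIER_DRAWER.get(qualifier.strip().lower())


for label in ("Start time", "Latest date", "Chromosome", "some unlisted note"):
    drawer = drawer_for(label)
    print(label, "->", drawer.value if drawer else "not classified in this sketch")
```

A lookup like this is what lets a tool suggest the right drawer to a volunteer instead of showing the full list of qualifiers.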

4. Why This Matters (The "So What?")

The authors argue that this new filing system solves three big problems:

  1. For the Volunteer (The Creator): Instead of guessing which sticky note to use, they can look at the "When/Where" drawer or the "Uncertainty" drawer and find the right tool instantly.
  2. For the Computer (The Query): When you ask Wikidata a complex question like, "Show me all marriages that ended in divorce between 1960 and 1970," the computer can recognize "divorce" as a cause (a Cause/Effect note from the "Extra Info" drawer) and "1960 to 1970" as a time context (a note from the "When and Where" drawer). It makes searching much smarter.
  3. For Future Libraries: If anyone wants to build a new knowledge graph (a digital brain), they shouldn't just copy Wikidata's messy list. They should use this new, organized structure from the start.
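The divorce question from point 2 can be sketched as a filter over qualifiers. This is an in-memory illustration, not a real Wikidata query: the `statements` list uses invented placeholder people and years, and `divorces_between` is a hypothetical helper. It reads the question as "the marriage ended between the two years, and the end cause was divorce."

```python
# Invented placeholder data; each statement carries its qualifiers in a dict.
statements = [
    {"subject": "Person A", "prop": "spouse", "value": "Person B",
     "qualifiers": {"start time": 1955, "end time": 1965, "end cause": "divorce"}},
    {"subject": "Person C", "prop": "spouse", "value": "Person D",
     "qualifiers": {"start time": 1950, "end time": 1968, "end cause": "death of spouse"}},
    {"subject": "Person E", "prop": "spouse", "value": "Person F",
     "qualifiers": {"start time": 1962, "end time": 1975, "end cause": "divorce"}},
    {"subject": "Person G", "prop": "spouse", "value": "Person H",
     "qualifiers": {}},
]


def divorces_between(statements, first_year, last_year):
    """Return (subject, spouse) pairs whose marriage ended in divorce in the given range."""
    matches = []
    for statement in statements:
        notes = statement["qualifiers"]
        if statement["prop"] != "spouse":
            continue
        # Cause/Effect note: the marriage must have ended because of divorce.
        if notes.get("end cause") != "divorce":
            continue
        # Context note: the end time must fall inside the requested range.
        end_year = notes.get("end time")
        if end_year is not None and first_year <= end_year <= last_year:
            matches.append((statement["subject"], statement["value"]))
    return matches


print(divorces_between(statements, 1960, 1970))
```

Knowing which drawer each qualifier belongs to is what tells the filter to treat "end cause" as a reason and "end time" as a time constraint, rather than as interchangeable notes.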

The Bottom Line

This paper is a guidebook for organizing the chaos of "extra details" in a giant database. By sorting sticky notes into logical categories (Time, Certainty, Structure, and Extras), the authors aim to make Wikidata easier for humans to write and easier for computers to understand. The goal is to turn a messy pile of notes into a well-organized filing cabinet.