VoxKnesset: A Large-Scale Longitudinal Hebrew Speech Dataset for Aging Speaker Modeling

This paper introduces VoxKnesset, a large-scale open-access dataset of 2,300 hours of longitudinal Hebrew parliamentary speech spanning 2009–2025. The authors use it to benchmark speaker verification and age prediction over time, and find significant performance degradation in standard models as speakers age.

Yanir Marmor, Arad Zulti, David Krongauz, Adam Gabet, Yoad Snapir, Yair Lifshitz, Eran Segal

Published 2026-03-06

Imagine you have a friend you've known for 15 years. A computer that has heard their voice recently would recognize them easily today. But give that same computer only a recording from 15 years ago, and it might get confused. Why? Because voices change with age, just as faces do: they can become deeper, raspier, or softer.

Most computer systems today are trained on "snapshots"—they hear a voice once and assume it never changes. This paper introduces a solution to that problem: a massive new library of voices called VoxKnesset.

Here is the breakdown of what the researchers did, using some everyday analogies.

1. The Problem: The "Time Travel" Gap

Think of current voice AI (like Siri or Alexa) as a photographer who takes only one photo of you and then tries to recognize you a year later. If you've gained weight, grown a beard, or just aged, the photo no longer matches reality.

For years, scientists didn't have enough data to teach computers how voices change over time. They had:

  • High-quality photos: But only one per person (like a yearbook).
  • Long-term videos: But they were blurry, small, or didn't have accurate names/ages attached.

They needed a "Time-Lapse Video" of human voices that was also accurate and huge.

2. The Solution: The "Parliamentary Time Machine"

The researchers found the perfect place to get this data: The Israeli Knesset (Parliament).

Why there?

  • The "Stage": Members of Parliament (MPs) speak in the same room, with similar microphones, for decades. It's a controlled environment.
  • The "Cast": There are hundreds of MPs. Some serve for 15+ years.
  • The "Script": The Knesset keeps official, verified records of who spoke, when, and exactly what they said. There is no guessing ages from blurry photos.

They built VoxKnesset, a dataset containing 2,300 hours of Hebrew speech from 393 different people, recorded over 16 years (2009–2025). It's like having a time-lapse camera on 393 different people at once, with speakers ranging in age from their 30s to their 80s and some of them followed for well over a decade.

3. The Experiment: Testing the "Aging" AI

The team took this new library and tested the world's best voice AI models to see how they handled aging. They asked three big questions:

A. Can the AI tell how old someone is?

  • The Result: Yes, but with a catch.
  • The Analogy: Imagine guessing someone's age from a single photo. You might get close. Now imagine being shown two photos of the same person taken 10 years apart and being asked how much they aged in between. That second task is where the AI gets confused.
  • The Finding: The AI was good at guessing a person's age based on a single snapshot (cross-sectional). But if you asked it to track the change over time, it failed. It couldn't tell the difference between "Person A is old" and "Person A got older."
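
To make the difference between the two kinds of evaluation concrete, here is a minimal sketch in Python. The records, the two function names, and the choice of mean absolute error are illustrative assumptions, not the paper's data or its exact metrics. The toy predictions are deliberately built so that every single guess is close, yet the predicted change within each speaker is far too small.

```python
from statistics import mean

# Hypothetical toy records (NOT from VoxKnesset): (speaker_id, true_age, predicted_age).
records = [
    ("A", 40, 44), ("A", 50, 46),
    ("B", 60, 64), ("B", 70, 66),
]

def cross_sectional_mae(rows):
    """Average error when each recording is judged on its own (a 'snapshot')."""
    return mean(abs(pred - true) for _, true, pred in rows)

def longitudinal_mae(rows):
    """Average error in the predicted *change* in age within each speaker."""
    by_speaker = {}
    for spk, true, pred in rows:
        by_speaker.setdefault(spk, []).append((true, pred))
    errors = []
    for recs in by_speaker.values():
        recs.sort()  # earliest recording first (sorted by true age)
        (t0, p0), (t1, p1) = recs[0], recs[-1]
        errors.append(abs((p1 - p0) - (t1 - t0)))
    return mean(errors)

print("cross-sectional error (years):", cross_sectional_mae(records))
print("longitudinal error (years):", longitudinal_mae(records))
```

In this toy case the model is never more than a few years off on any one recording, but it sees each speaker as aging only 2 years when they actually aged 10. That is the "is old" versus "got older" gap described above.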

B. Does aging break voice security?

  • The Result: Yes, significantly.
  • The Analogy: Imagine your voice is a key to your house. If you lose weight or get a cold, the key might still fit. But if you age 15 years, the key might not fit at all.
  • The Finding: The researchers tested "Speaker Verification" (using voice as a password). Over a 15-year gap, the error rate more than doubled. A system that was 98% accurate became much less reliable. This is a huge problem for banks or security systems that rely on voice ID.
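
A rough sketch of how such a comparison could be measured: score every trial, then compute an error rate separately for short and long time gaps. The summary above only says "error rate"; the equal error rate (EER) used here is a common choice for speaker verification and is an assumption, as are the function name and all of the scores, which are invented for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by trying every observed score as the accept threshold."""
    best_gap, eer = None, None
    for thr in sorted(set(genuine) | set(impostor)):
        # False rejection: the true speaker scores below the threshold.
        frr = sum(s < thr for s in genuine) / len(genuine)
        # False acceptance: an impostor scores at or above the threshold.
        far = sum(s >= thr for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Hypothetical similarity scores (NOT results from the paper).
impostor_scores = [0.1, 0.2, 0.3, 0.4]       # different speakers
genuine_short_gap = [0.9, 0.8, 0.85, 0.7]    # same speaker, recordings close in time
genuine_long_gap = [0.9, 0.6, 0.35, 0.25]    # same speaker, recordings many years apart

eer_short = equal_error_rate(genuine_short_gap, impostor_scores)
eer_long = equal_error_rate(genuine_long_gap, impostor_scores)
print("short-gap EER:", eer_short)
print("long-gap EER:", eer_long)
```

The impostor scores are identical in both cases. Only the same-speaker scores drift downward with the time gap, and that drift alone is enough to push the error rate up.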

C. Can we fix it?

  • The Result: Yes, if we train the AI differently.
  • The Analogy: Instead of teaching the AI to recognize a "static photo," the researchers taught it to recognize a "movie."
  • The Finding: When they trained a model specifically on pairs of recordings from the same person (e.g., "This is the same guy, 5 years apart"), the AI learned to track the aging process. It could finally see the "temporal signal"—the subtle changes that happen as a person grows older.
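
The pairing idea can be sketched as a small data-preparation step: group recordings by speaker, then emit every same-speaker pair together with the number of years between the two recordings. The metadata layout, the identifiers, and the `min_gap_years` parameter are all hypothetical; the summary does not describe the authors' actual pipeline or training objective.

```python
from itertools import combinations

# Hypothetical metadata rows (NOT the real VoxKnesset schema):
# (speaker_id, recording_year, utterance_id).
utterances = [
    ("mk_001", 2010, "u1"),
    ("mk_001", 2015, "u2"),
    ("mk_001", 2024, "u3"),
    ("mk_002", 2012, "u4"),
    ("mk_002", 2013, "u5"),
]

def same_speaker_pairs(rows, min_gap_years=1):
    """Return (earlier_utt, later_utt, gap_in_years) for each same-speaker pair."""
    by_speaker = {}
    for spk, year, utt in rows:
        by_speaker.setdefault(spk, []).append((year, utt))
    pairs = []
    for recs in by_speaker.values():
        # Sorting puts the earlier recording first in every pair.
        for (y0, u0), (y1, u1) in combinations(sorted(recs), 2):
            gap = y1 - y0
            if gap >= min_gap_years:
                pairs.append((u0, u1, gap))
    return pairs

pairs = same_speaker_pairs(utterances, min_gap_years=5)
for earlier, later, gap in pairs:
    print(earlier, later, gap)
```

Each pair carries its own label (the year gap), so a model trained on such pairs has a direct signal about how much time has passed between two recordings of the same voice, rather than only an absolute age for each recording.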

4. Why This Matters

This isn't just about Hebrew speakers or politicians. It's about the future of how we interact with machines.

  • Security: If your voice is your password, we need to know how to update that password as you age, so you don't get locked out of your bank account in 10 years.
  • Health: Doctors might use voice analysis to detect diseases like Parkinson's or Alzheimer's. To do that, they need to know what "normal aging" sounds like so they don't mistake a natural change for a disease.
  • Fairness: Most AI is trained on young voices. This dataset helps make AI work better for older adults, who are often left out of the tech world.

The Bottom Line

The authors released this massive dataset (VoxKnesset) to the public. They are essentially handing the world a "Time-Lapse Video" of human voices to help engineers build smarter, more human-like AI that understands that we all change with time.