Computational lexical analysis of Flamenco genres

This study employs computational lexical analysis and machine learning on over 2,000 Flamenco lyrics to accurately classify traditional genres (*palos*), identify their unique semantic fields, and map inter-genre relationships that reveal historical connections and evolutionary patterns within this cultural heritage.

Pablo Rosillo-Rodes, Maxi San Miguel, David Sanchez

Published 2026-03-09

Imagine Flamenco not just as music you hear, but as a massive, living library of stories. For centuries, these stories have been passed down orally, like secret family recipes, from generation to generation. They are rich, emotional, and deeply tied to the history of Spain. But until now, no one had tried to count the ingredients in these recipes to see exactly what makes each dish unique.

This paper is like hiring a super-smart, tireless robot chef to taste-test over 2,000 Flamenco songs and figure out exactly what "flavor" belongs to which style.

Here is the breakdown of their delicious discovery:

1. The Problem: Too Many Styles, Too Much Confusion

Flamenco has many different "styles" (called palos). Think of them as sub-genres within Flamenco: some are sad and slow, others are fast and party-like. Traditionally, experts tell them apart by listening to the rhythm or the guitar chords. But sometimes, even experts argue about whether a song is one style or another.

The researchers asked: "Can we tell these styles apart just by reading the lyrics, like a detective solving a case by looking at the words used?"

2. The Method: The "Word Detective"

The team used a machine learning model called a "Multinomial Naive Bayes classifier" to act as a word detective. The model estimates how often each word appears in each style, then picks the style whose typical vocabulary best matches a new lyric. They fed it thousands of lyrics so it could learn the "vocabulary fingerprint" of each style.

  • The Training: Imagine showing the robot 100 songs of a sad style (Seguiriyas) and 100 songs of a party style (Bulerías). The robot learns that the sad songs often use words like "pain," "soul," and "God," while the party songs use words like "dance," "wine," and "beautiful."
  • The Test: Then, they gave the robot new songs it had never seen before and asked, "What style is this?"
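For readers who like to peek inside the robot, the training and test steps above can be sketched in a few lines of Python. This is only a minimal illustration of a Multinomial Naive Bayes classifier, not the authors' code: the function names `train_nb` and `predict_nb`, the whitespace tokenizer, the smoothing value, and the four toy "lyrics" are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, alpha=1.0):
    """Count words per style. docs is a list of (style, lyric) pairs."""
    word_counts = defaultdict(Counter)  # style -> word -> count
    doc_counts = Counter()              # style -> number of lyrics
    vocab = set()
    for style, lyric in docs:
        tokens = lyric.lower().split()
        word_counts[style].update(tokens)
        doc_counts[style] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts,
            "vocab": vocab, "alpha": alpha}

def predict_nb(model, lyric):
    """Return the style with the highest log-probability for this lyric."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_style, best_score = None, -math.inf
    for style, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][style]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # prior: how common the style is
        for token in lyric.lower().split():
            if token not in model["vocab"]:
                continue  # skip words never seen during training
            # Laplace-smoothed probability of this word in this style
            score += math.log((counts[token] + alpha)
                              / (total_words + alpha * vocab_size))
        if score > best_score:
            best_style, best_score = style, score
    return best_style

# Toy, invented training lyrics (hypothetical, for illustration only)
toy_lyrics = [
    ("seguiriya", "pain soul god pain death"),
    ("seguiriya", "soul sorrow god death"),
    ("buleria", "dance wine beautiful dance"),
    ("buleria", "wine party dance cadiz"),
]
model = train_nb(toy_lyrics)
prediction = predict_nb(model, "my soul is in pain")
print(prediction)
```

A real experiment would use Spanish lyrics, a proper tokenizer, and a held-out test set. The toy version only shows the core idea: count words per style, then ask which style makes a new lyric most probable.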

3. The Results: The Robot Got It Right!

The robot was surprisingly good at its job. It could correctly identify the style of a song just by its words about 85% to 90% of the time for the most distinct styles.

  • The "Sad" Styles: Styles like Seguiriyas and Soleá were like a dark, heavy coat. Their lyrics were full of words about suffering, death, the soul, and God. They spoke the language of deep emotion and Gypsy heritage.
  • The "Party" Styles: Styles like Bulerías and Alegrías were like a bright, colorful carnival. Their lyrics were full of geography (names of cities like Cádiz and Seville), love, and celebration.
  • The "Love" Styles: Some styles, like Fandangos, were obsessed with romance, using words like "woman," "heart," and "five senses."
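How might one find the words that give each style its "flavor"? One simple option, sketched below, is to compare how often a word appears in one style against how often it appears in all the others. The scoring rule (a smoothed log-ratio of relative frequencies), the helper name `distinctive_words`, and the toy word counts are assumptions for illustration; this summary does not say which measure the paper actually uses.

```python
import math
from collections import Counter

def distinctive_words(counts_by_style, style, top_n=3, alpha=1.0):
    """Rank words by how much more frequent they are in `style` than elsewhere."""
    target = Counter(counts_by_style[style])
    rest = Counter()
    for other, counts in counts_by_style.items():
        if other != style:
            rest.update(counts)
    vocab = set(target) | set(rest)
    # Add-alpha smoothing so words absent on one side do not cause log(0)
    target_total = sum(target.values()) + alpha * len(vocab)
    rest_total = sum(rest.values()) + alpha * len(vocab)
    scores = {
        word: math.log((target[word] + alpha) / target_total)
              - math.log((rest[word] + alpha) / rest_total)
        for word in vocab
    }
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

# Toy, invented word counts (hypothetical, not taken from the paper)
toy_counts = {
    "seguiriya": Counter({"pain": 5, "soul": 4, "love": 1}),
    "buleria": Counter({"dance": 5, "wine": 4, "love": 2}),
    "fandango": Counter({"love": 6, "heart": 4, "woman": 3}),
}
seguiriya_words = distinctive_words(toy_counts, "seguiriya", top_n=2)
fandango_words = distinctive_words(toy_counts, "fandango", top_n=2)
print(seguiriya_words, fandango_words)
```

Notice that in this toy data "love" is the most frequent Fandango word, yet it does not top the Fandango ranking because the other styles use it too. That is the point of comparing against the rest of the corpus rather than just counting.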

4. The "Family Tree" of Flamenco

The most exciting part of the paper is when they mapped out how these styles are related. They treated the lyrics like DNA. If two styles use very similar words, they are "cousins." If they use totally different words, they are "strangers."

They built a network map (a family tree) that revealed:

  • The "Gypsy" Branch: Seguiriyas and Soleá are very close relatives, sharing a deep, ancient vocabulary of pain and spirituality.
  • The "Málaga" Branch: Malagueñas and Fandangos are siblings, likely born from the same region (Málaga) and sharing themes of love and sorrow.
  • The "Tango" Cousins: Tangos and Tientos are so similar in their word choices that the computer treated them as almost the same thing. This supports the historical view that Tientos is essentially a slow version of Tangos.
  • The "Hub": Bulerías acted like the "popular kid" at school who knows everyone. It has the most diverse vocabulary and connects all the other styles together, likely because it's the style used for the final, wild party at the end of a Flamenco show.
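The "lyrics as DNA" idea can also be sketched in code. The example below treats each style as a bag of word counts, scores every pair of styles with cosine similarity, and keeps the strong pairs as edges of a network. Cosine similarity, the 0.5 threshold, and the toy counts are assumptions chosen for illustration; the paper may well use a different distance measure.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy, invented word counts (hypothetical, not taken from the paper)
toy_counts = {
    "tangos": Counter({"love": 4, "street": 3, "mother": 2}),
    "tientos": Counter({"love": 4, "street": 2, "mother": 3}),
    "seguiriyas": Counter({"pain": 5, "death": 3, "mother": 1}),
}

# Similarity for every unordered pair of styles
similarity = {
    (s1, s2): cosine(toy_counts[s1], toy_counts[s2])
    for s1, s2 in combinations(sorted(toy_counts), 2)
}

# The "closest cousins" are the pair with the highest similarity
closest_pair = max(similarity, key=similarity.get)

# Keep only strong links as edges of the "family tree" network
edges = [pair for pair, score in similarity.items() if score > 0.5]
print(closest_pair, edges)
```

In this toy data only Tangos and Tientos end up connected, mirroring the "cousins" result described above. A hub like Bulerías would appear as a style with edges to many others.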

5. Why This Matters

Before this study, understanding the history of Flamenco was like trying to understand a family tree by just looking at old photos. It was subjective and open to debate.

This study is like getting a DNA test for Flamenco. It suggests that:

  1. Words matter: The lyrics aren't just random; they are a coded map of the culture's history, geography, and emotions.
  2. History is hidden in the text: The computer found historical connections (like the link between Tangos and Tientos) just by counting words, giving quantitative support to what historians had long suspected.
  3. It's a new way to listen: We can now "read" the soul of a Flamenco genre without even hearing the music.

In a nutshell: The researchers used a computer to read over 2,000 Flamenco lyrics and found that every style has its own "word personality." By mapping these personalities, they drew a new family tree that shows how the styles are related, and showed that the lyrics alone carry much of each style's identity.
