PsihoRo: Depression and Anxiety Romanian Text Corpus

The paper introduces PsihoRo, the first open-source Romanian corpus for depression and anxiety, which was constructed using open-ended responses and standardized screening surveys (PHQ-9 and GAD-7) from 205 participants to address the lack of mental health resources for the Romanian language in NLP.

Alexandra Ciobotaru, Ana-Maria Bucur, Liviu P. Dinu

Published Thu, 12 Ma

Imagine you are trying to understand the mood of a whole country by listening to its people talk. For years, scientists have been able to do this fairly easily in English, using vast libraries of text to spot signs of sadness or worry. For Romanian speakers, though, the shelf was empty. There was no open-source collection of Romanian texts about depression and anxiety, no "Rosetta Stone" to translate their words into data about mental health.

This paper introduces PsihoRo, a new open-source text corpus (a library of texts) designed to fill that gap. Think of it as the first dedicated "mental health map" for the Romanian language.

Here is a simple breakdown of how they built it and what they found:

1. How They Built the Map (The Data Collection)

Instead of just scrolling through social media (which is like listening to people shout their best moments at a party), the researchers invited 205 people to answer questions quietly and honestly, in their own words.

  • The Questions: Participants answered six open-ended questions. Some were about heavy topics like war, politics, and stress (the "stormy weather"), while others were about resilience and community (the "sunshine").
  • The Check-Up: To make sure the data was accurate, the researchers didn't just guess who was struggling. Everyone also filled in two standard screening questionnaires, a kind of medical "thermometer": the PHQ-9 (for depression) and the GAD-7 (for anxiety). Each question is answered on a 0 to 3 scale and the answers are added up, so a higher total means more severe symptoms. These are screening tools, not a diagnosis.
  • The Result: They ended up with a collection of 205 honest stories, each paired with that person's screening scores, all completely anonymous.
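For readers curious how those questionnaire scores work, here is a minimal Python sketch. Each PHQ-9 and GAD-7 item is answered from 0 to 3 and the answers are summed (0 to 27 for PHQ-9, 0 to 21 for GAD-7); the severity bands are the standard published cutoffs. The "at risk" cutoff of 10 is an assumption for illustration, not necessarily the threshold the PsihoRo authors used, and the function names are hypothetical.

```python
# Standard severity bands: (lowest total score in the band, label).
PHQ9_BANDS = [(0, "minimal"), (5, "mild"), (10, "moderate"),
              (15, "moderately severe"), (20, "severe")]
GAD7_BANDS = [(0, "minimal"), (5, "mild"), (10, "moderate"), (15, "severe")]

# Assumption for illustration only: treat a total of 10 or more as "at risk".
AT_RISK_CUTOFF = 10


def total_score(answers, n_items):
    """Sum the item answers after checking there are n_items answers, each 0-3."""
    if len(answers) != n_items:
        raise ValueError(f"expected {n_items} answers, got {len(answers)}")
    if any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("each answer must be 0, 1, 2 or 3")
    return sum(answers)


def severity(score, bands):
    """Return the label of the highest band whose lower bound the score reaches."""
    label = bands[0][1]
    for lower, name in bands:
        if score >= lower:
            label = name
    return label


def screen(phq9_answers, gad7_answers):
    """Score both questionnaires and flag totals at or above AT_RISK_CUTOFF."""
    phq = total_score(phq9_answers, 9)
    gad = total_score(gad7_answers, 7)
    return {
        "phq9": phq,
        "phq9_severity": severity(phq, PHQ9_BANDS),
        "depression_risk": phq >= AT_RISK_CUTOFF,
        "gad7": gad,
        "gad7_severity": severity(gad, GAD7_BANDS),
        "anxiety_risk": gad >= AT_RISK_CUTOFF,
    }


if __name__ == "__main__":
    print(screen([1, 2, 1, 2, 1, 1, 0, 1, 2], [1, 1, 0, 1, 1, 0, 1]))
```

In this example the PHQ-9 total is 11 (moderate, flagged as at risk) and the GAD-7 total is 5 (mild, not flagged).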

2. The Detective Work (The Analysis)

Once they had the stories, they used three different "magnifying glasses" to look for patterns:

  • The Word Counter (LIWC): They used LIWC (Linguistic Inquiry and Word Count), software that sorts words into psychological categories and reports what share of a text each category makes up.
    • The Big Surprise: In English, using the word "I" a lot is a classic sign of depression. In Romanian, this signal did not hold up. Romanian is a "pro-drop" language: the verb form usually signals who is speaking, so the subject "I" (eu) is often left out without sounding odd. As a result, the researchers found that counting "I" is not a useful marker for Romanian mental health analysis.
    • What actually worked? They found that people with higher anxiety or depression scores used more words about their bodies (like "headache" or "stomach"), more words showing uncertainty ("maybe," "perhaps"), and fewer words about success or fun.
  • The Emotion Radar: They built an AI to detect emotions like sadness, fear, or joy.
    • People at risk for depression spoke a lot about sadness.
    • People at risk for anxiety spoke a lot about fear and trust (often worrying about their families).
    • Interestingly, the emotion of "surprise" was completely missing from the data!
  • The Theme Finder (Topic Modeling): They grouped the conversations by theme.
    • Politics and the economy came up across the whole sample (a common concern in Romania right now).
    • However, the at-risk groups focused more on how these external stresses were draining their energy, affecting their sleep, and making them feel isolated from their communities.
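To make the word-counter idea and the pro-drop problem concrete, here is a small Python sketch of LIWC-style counting: each category is a list of words (a trailing `*` matches any word starting with that stem), and the output is the percentage of all words that fall into each category. The mini-dictionary and function names are hypothetical; the real LIWC dictionaries are licensed and far larger, and this is not the authors' pipeline.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; NOT the real LIWC lexicon.
# An entry ending in "*" matches any word that starts with that stem.
CATEGORIES = {
    "i_pronoun": ["eu"],                          # explicit first-person subject pronoun
    "body": ["cap", "capul", "stomac*", "durer*", "somn*"],
    "tentative": ["poate", "probabil", "cumva"],  # uncertainty words
    "achievement": ["reușit*", "succes*"],
}


def tokenize(text):
    # \w is Unicode-aware for str patterns, so letters like ă, î, ș, ț stay inside words
    return re.findall(r"\w+", text.lower())


def matches(token, entry):
    """Exact match, or prefix match when the entry ends with '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_rates(text, categories=CATEGORIES):
    """Return, for each category, the percentage of tokens that match it."""
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        for cat, entries in categories.items():
            if any(matches(tok, e) for e in entries):
                counts[cat] += 1
    total = len(tokens)
    return {cat: (100.0 * counts[cat] / total if total else 0.0) for cat in categories}


if __name__ == "__main__":
    # "I work a lot and my head hurts. Maybe I don't sleep enough." (no explicit "eu")
    print(category_rates("Lucrez mult și mă doare capul. Poate nu dorm destul."))
```

In the example sentence the speaker is clearly talking about themselves, yet the "eu" rate is 0 because the verbs carry the subject. The body and uncertainty categories still register, at 10% of the ten words each.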

3. Why This Matters

Think of this research as giving a translator to a doctor who speaks English but needs to treat a Romanian patient.

  • No More Guessing: Before this, researchers had to assume that English patterns carried over to Romanian. This paper shows that languages can have different "fingerprints" for mental health.
  • A First Step: While 205 people might sound like a small crowd, it's a big step forward for a language that previously had no open resource of this kind. It's a foundation on which better AI tools, better therapy apps, and more targeted support for Romanians can be built.

In a nutshell: The researchers built a new, honest collection of Romanian texts about mental health. They discovered that to spot depression and anxiety in Romanian, you have to look at what people say about their bodies and how uncertain they sound, not just how often they say "I." It's a crucial first step toward helping Romanian speakers feel seen and understood.