Multilingual AI-Driven Password Strength Estimation with Similarity-Based Detection

Imagine you are trying to build a digital bouncer for a nightclub (your website). This bouncer's job is to stand at the door and check everyone's ID (their password) to make sure it's strong enough to keep the bad guys out.

For a long time, this bouncer was a bit old-fashioned. He only knew how to check for specific rules: "Do you have a capital letter? A number? A symbol?" But hackers are smart; they know these rules and can easily guess passwords that follow them.

This research paper introduces a super-smart, multilingual bouncer who learns by listening to how real people actually create passwords, rather than just following a rulebook. Here is the story of how they built him, broken down simply.

1. The Old Way vs. The New Way

The Old Way (PassGAN):
Previously, to train a bouncer, researchers used a complex, heavy-duty machine called a GAN (Generative Adversarial Network). Think of this like a robot chef that needs to be fed millions of leaked passwords (stolen from data breaches) to learn how to cook up new, realistic passwords.

The Problem: This robot is expensive to run, needs a massive kitchen (computer power), and requires a diet of stolen data, which is ethically messy.

The New Way (ChatGPT):
The researchers asked: "What if we just asked a super-intelligent language assistant (like ChatGPT) to cook up a list of passwords instead?"

The Analogy: Instead of a robot chef grinding through millions of data points, you just ask a knowledgeable friend, "Hey, give me 6,000 realistic passwords that people in India or the UK might use."
The Result: Surprisingly, this "friend" did just as good a job as the heavy-duty robot chef, but it was faster, cheaper, and didn't need to eat stolen data.

2. The "Multilingual" Twist

Most password bouncers only speak English. But people in India (and everywhere else) often mix languages. They might use an English word like "Love" combined with an Indian name like "Raja" or a food item like "Dosa."

The Experiment: The researchers taught their new bouncer three things:
1. English-only passwords.
2. Indian-only passwords (using names, foods, and cultural words).
3. Mixed passwords (English + Indian).

They found that the Mixed bouncer was the strongest. Why? Because real humans are messy! We mix languages when we think of passwords. By training the bouncer to understand this mix, it became much better at guessing what a real human would type.

3. The "Fuzzy Match" Detective

Here is the cleverest part. In the past, if a hacker guessed "Password123" and the real password was "Password124," the system would say, "No match, you failed."

But hackers are sneaky. They often guess passwords that are almost right.

The Old Method: A strict librarian who only accepts the exact book title.
The New Method (Jaro Similarity): A fuzzy detective.

The researchers used a tool called the Jaro function. Imagine the detective looks at two passwords and says, "These aren't identical, but they look 80% similar. That's close enough to be a threat!"

They set a "similarity threshold" of 0.5. If two passwords are at least 50% alike, the bouncer flags them as weak. This mimics how real hackers actually attack: by guessing variations, not just exact copies.

4. The Big Wins

The results were impressive:

The Indian Test: When they tested the bouncer on real Indian passwords, it got a 99.97% match rate. It was almost perfect! This is huge because, until now, no one had built a specialized bouncer for Indian passwords.
The English Test: The new ChatGPT method beat the old PassGAN robot in some areas and matched it in others, proving you don't need the heavy robot anymore.
The Mixed Test: The bouncer trained on mixed languages performed better than the one trained only on English, proving that cultural context matters.

5. Why This Matters

Safety: It helps websites create better "strength meters" that tell users, "Hey, 'Raja123' is weak because it's too predictable," before they even sign up.
Ethics: We don't need to rely on stolen databases to train our security tools anymore; we can use AI to generate realistic examples safely.
Simplicity: You don't need a supercomputer to do this. A simple AI prompt can do the heavy lifting.

The Bottom Line

This paper is like upgrading a security guard from a rule-following robot to a culturally aware, multilingual human who understands that people mix languages and make small typos. By using modern AI (ChatGPT) and a "fuzzy" matching system, we can build better, fairer, and more effective password protectors for everyone, not just English speakers.

The only catch? The researchers could only get a small batch of passwords out of the AI (about 6,666 per list) because of limits on how much ChatGPT would generate at once. If they could generate a million, the bouncer would probably be even sharper!

Here is a detailed technical summary of the research paper "Multilingual AI-Driven Password Strength Estimation with Similarity-Based Detection" by Nikitha M. Palaniappan and Ying He.

1. Problem Statement

The paper addresses the persistent vulnerability of user-chosen passwords, which often follow predictable patterns despite security policies. Traditional Password Strength Meters (PSMs) rely on rule-based checks or entropy calculations, which are insufficient against modern, large-scale guessing attacks driven by leaked datasets and advanced computational power.

Existing data-driven approaches, such as PassGAN (which is built on Generative Adversarial Networks, or GANs), have improved password guessing but suffer from:

High Computational Cost: Training complex neural architectures requires significant resources.
Language Limitation: Most models are trained exclusively on English datasets, failing to capture the linguistic diversity of global users (specifically non-English speakers).
Rigid Matching: Traditional evaluation relies on exact matches, whereas real-world attackers often succeed with passwords that are similar but not identical to the target.
Ethical Concerns: Reliance on massive leaked datasets for training raises privacy and ethical issues.

The research aims to determine if Large Language Models (LLMs), specifically ChatGPT, can serve as a more efficient, multilingual alternative to GANs for generating realistic password datasets, and if incorporating non-English (Indian) data improves PSM performance.

2. Methodology

The study proposes a data-driven approach that replaces the complex training of GANs with prompt-based generation using ChatGPT.

A. Dataset Generation

Instead of training a neural network on leaked data, the authors used ChatGPT to generate three distinct password datasets (6,666 passwords each):

English Dataset: Common English words and patterns.
Indian Dataset: Culturally specific references (names, foods, religious terms).
Mixed Dataset: A combination of English and Indian word fragments.

Constraints: All generated passwords were 8–10 characters long and included at least one uppercase letter, one lowercase letter, one number, and one symbol to ensure structural consistency.
Prompting: Prompts were designed to generate "meaningful parts of real words" rather than random characters to mimic human behavior.
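
The structural constraints above can be checked mechanically. The following is a minimal sketch, not the authors' code: the function name `meets_constraints` and the sample passwords are hypothetical, and "symbol" is assumed to mean ASCII punctuation because the summary does not define it.

```python
import string

MIN_LEN, MAX_LEN = 8, 10  # length bounds stated in the summary


def meets_constraints(password: str) -> bool:
    """Check the 8-10 character rule plus one upper, lower, digit, and symbol."""
    if not (MIN_LEN <= len(password) <= MAX_LEN):
        return False
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # Assumption: a "symbol" is any ASCII punctuation character.
    has_symbol = any(c in string.punctuation for c in password)
    return has_upper and has_lower and has_digit and has_symbol


# Hypothetical candidates, for illustration only.
candidates = ["Raja@1234", "LoveDosa1", "Pass#1", "dosa!2024"]
valid = [p for p in candidates if meets_constraints(p)]
print(valid)  # ['Raja@1234']
```

A check like this could be run over each generated list to discard entries that drift from the requested structure.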

B. Testing Datasets

The generated passwords were tested against two real-world leaked datasets:

English Test: 11,356 passwords from the LinkedIn breach.
Indian Test: 7,675 passwords from a specific Indian leaked dataset.
Note: Test datasets were filtered to match the 8–10 character length constraint.
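
The length filter applied to the test sets amounts to a one-line selection. This sketch assumes the leaked passwords are already loaded as a list of strings; `filter_by_length` and the sample values are hypothetical.

```python
def filter_by_length(passwords, min_len=8, max_len=10):
    """Keep only passwords whose length falls within [min_len, max_len]."""
    return [p for p in passwords if min_len <= len(p) <= max_len]


# Hypothetical sample standing in for a leaked test set.
leaked_sample = ["abc123", "sunshine1", "Password124", "qwerty2020"]
print(filter_by_length(leaked_sample))  # ['sunshine1', 'qwerty2020']
```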

C. Similarity-Based Matching (Jaro Function)

To simulate realistic attack scenarios where attackers guess variations of a password, the study moved away from exact matching.

Metric: The Jaro similarity function was used to calculate the similarity between generated and leaked passwords.
Threshold: A similarity score of 0.5 was established as the matching threshold. Scores $\ge$ 0.5 were considered a "match."
Rationale: This accounts for typos, slight variations, and common substitutions that exact matching would miss.
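
For reference, the Jaro similarity between two strings $s_1$ and $s_2$ is usually defined as follows. The summary does not say which variant or library the authors used, so the textbook definition is an assumption here:

$sim_j(s_1, s_2) = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)$ for $m > 0$, and $sim_j(s_1, s_2) = 0$ when $m = 0$

Here $m$ is the number of matching characters (equal characters at most $\lfloor \max(|s_1|, |s_2|)/2 \rfloor - 1$ positions apart) and $t$ is half the number of matched characters that appear in a different order in the two strings.

As a worked example, take "Password123" and "Password124". Both have length 11, the first ten characters match in place, and nothing is out of order, so $m = 10$ and $t = 0$:

$sim_j = \frac{1}{3}\left(\frac{10}{11} + \frac{10}{11} + \frac{10}{10}\right) = \frac{31}{33} \approx 0.94$

That is well above the 0.5 threshold, so the pair would count as a match even though an exact comparison fails.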

D. Evaluation Metric

Performance was measured using Matching Accuracy ($A$):
$A = \frac{M}{N_{test}}$
Where $M$ is the number of successful matches (similarity $\ge$ 0.5) and $N_{test}$ is the total number of passwords in the test set.
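
The metric can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code: it uses the standard Jaro definition, and it assumes a test password counts as matched when at least one generated password reaches the threshold (the summary does not spell out this rule). The names `jaro` and `matching_accuracy` and the sample lists are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def matching_accuracy(generated, test_set, threshold=0.5):
    """A = M / N_test, where M counts test passwords with any match >= threshold."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    matched = sum(
        1
        for target in test_set
        if any(jaro(guess, target) >= threshold for guess in generated)
    )
    return matched / len(test_set)


# Hypothetical lists, for illustration only.
generated = ["Raja@1234", "LoveDosa#1"]
test_set = ["Raja@1235", "zzzzzzzz"]
print(matching_accuracy(generated, test_set))  # 0.5
```

The pairwise comparison is quadratic in the list sizes, which is manageable for sets of a few thousand passwords like those described here.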

3. Key Contributions

ChatGPT as a PassGAN Alternative: The study demonstrates that ChatGPT can generate realistic password lists comparable to PassGAN without the need for computationally expensive training or access to massive leaked datasets.
Multilingual PSM Development: This is the first work to develop and evaluate a PSM specifically tailored for Indian passwords, addressing a significant gap in existing literature which focuses almost exclusively on English.
Similarity-Based Detection: The integration of the Jaro similarity function allows for the detection of "near-miss" passwords, providing a more accurate reflection of real-world attack success rates than exact matching.
Performance Validation: The results indicate that multilingual training (mixing English and Indian data) yields higher accuracy in guessing English passwords than English-only training (99.92% vs. 78.08% against the LinkedIn set), suggesting that cross-lingual patterns exist in user password creation.

4. Results

The experiments yielded the following key findings (using a Jaro threshold of 0.5):

Experiment	Comparison (generated set vs. reference set)	Accuracy
English vs. English	ChatGPT (English) vs. PassGAN (English)	100%
Baseline	PassGAN vs. LinkedIn (English)	96.00%
Indian Specific	ChatGPT (Indian) vs. Leaked Indian	99.97%
Multilingual	ChatGPT (Mixed) vs. LinkedIn (English)	99.92%
English Only	ChatGPT (English) vs. LinkedIn (English)	78.08%

Key Insight: The Mixed Language model achieved the highest accuracy (99.92%) against the English LinkedIn dataset, outperforming both the PassGAN baseline (96.00%) and the English-only ChatGPT model (78.08%).
Indian Model: The Indian-specific model achieved near-perfect accuracy (99.97%) against the Indian leaked dataset, validating the efficacy of culturally aware generation.

5. Significance and Conclusion

Feasibility of LLMs: The study confirms that generative AI tools like ChatGPT are viable, efficient, and effective alternatives to GANs for password research. They offer faster development, easier dataset creation, and avoid the ethical pitfalls of using leaked databases for training.
Importance of Multilingualism: The results strongly suggest that incorporating non-English data improves the modeling of real-world password behavior. Users often mix languages or use cultural references, and a monolingual model fails to capture these nuances.
Practical Application: The proposed method provides a lightweight framework for developing secure, language-aware PSMs, particularly for under-represented languages such as those spoken in India.

Limitations & Future Work:
The study was constrained by the limited number of passwords ChatGPT could generate per session (approx. 6,666 per set) compared to the millions used in PassGAN studies. Additionally, the generated passwords followed a rigid 8–10 character structure. Future work should explore larger datasets, more diverse languages (e.g., Chinese), and advanced semantic matching techniques like cosine similarity and vector embeddings.