Goldfish: Monolingual Language Models for 350 Languages

The paper introduces Goldfish, a suite of over 1,000 small monolingual language models covering 350 languages. Despite their modest size, these models outperform large multilingual models on perplexity and grammatical fluency, particularly for low-resource languages where existing models often struggle.

Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

Published 2026-03-09

Imagine the world of artificial intelligence as a giant library. For a long time, the librarians (AI researchers) have been building one massive, super-detailed encyclopedia that tries to cover every language on Earth at once. These are the "multilingual models" like BLOOM or XGLM. They are like a single, enormous chef trying to cook a perfect meal for 7,000 different cultures simultaneously.

The problem? When this giant chef tries to cook a meal for a small, remote village (a low-resource language), the food often comes out burnt or bland. The chef is so busy juggling the huge, popular cuisines (like English, Spanish, or Chinese) that they don't have enough time, ingredients, or attention to get the small village dishes right.

Enter "Goldfish."

This paper introduces a new project called Goldfish. Instead of one giant chef, the researchers decided to hire thousands of tiny, specialized chefs. Each Goldfish model is a small, single-language expert dedicated to just one language.

Here is the breakdown of what they did and why it matters, using some simple analogies:

1. The "Big Fish" vs. The "Goldfish"

  • The Big Fish (Multilingual Models): These are the massive AI models everyone talks about. They are huge (billions of parameters) and trained on terabytes of data. But because they are so big, they often treat low-resource languages as an afterthought. It's like a giant ocean liner trying to dock at a tiny, rocky pier; it just doesn't fit well, and the passengers (the AI's predictions) get seasick.
  • The Goldfish (Monolingual Models): The researchers created over 1,000 tiny models. Each one is small (only 125 million parameters) and focuses on a single language. They are like a fleet of small, agile boats. Even though each boat is small, they are perfectly designed for their specific harbor.
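
In practice, using one of these "small boats" is simple. Below is a minimal sketch of loading and sampling from a single Goldfish model with Hugging Face transformers. The repo ID is an assumption based on the release's language/script/data-size naming pattern; verify the exact ID on the hub before relying on it:

```python
# Minimal sketch: load one Goldfish model and generate a few tokens.
# NOTE: the repo ID below is an assumed example following the
# {language}_{script}_{size} naming pattern; check the Hugging Face hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "goldfish-models/zul_latn_10mb"  # e.g. Zulu, 10MB of training text
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Sawubona"  # "Hello" in Zulu
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```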

2. The "Memory" Problem

The name "Goldfish" is a deliberate joke. Real goldfish are famous for having short memories, and the researchers say their models are similar: they are small, there are many of them, and they don't carry "memories" of the entire world's data.

But here's the twist: For low-resource languages, having a short memory is actually a superpower.

  • The giant multilingual models try to remember everything, which confuses them when they try to speak a rare language.
  • The Goldfish models only remember the specific language they were trained on. Because they aren't distracted by 500 other languages, they actually speak the target language much more clearly and grammatically.

3. The "Recipe" (Data)

Imagine you want to learn to cook a specific dish, say "Quechua Stew."

  • The Old Way: You give a master chef a giant bag of ingredients containing 10,000 pounds of beef, 5,000 pounds of potatoes, and only 10 grams of the special Quechua spices. The chef makes a stew, but it tastes mostly like beef.
  • The Goldfish Way: You give a small, dedicated cook a small bag containing only the perfect amount of Quechua spices and ingredients. Even though the bag is small, the cook makes a perfect stew because they aren't distracted by 15,000 pounds of beef and potatoes.

The researchers found that for many languages, these small, dedicated models actually performed better than the giant ones, even though the giant ones had far more total data. In fact, for some low-resource languages the giant multilingual models did worse than even a simple "bigram" model (which is like a toddler guessing the next word based only on the word immediately before it), while the Goldfish models beat both.
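
To make the "bigram" baseline concrete, here is a toy version in Python. A bigram model literally just counts how often each word follows each other word in the training text; add-one smoothing keeps unseen word pairs from getting zero probability. This is an illustrative sketch, not the paper's implementation:

```python
import math
from collections import Counter

def bigram_perplexity(train_tokens, test_tokens):
    """Add-one-smoothed bigram model: predict each word from the one before it."""
    vocab = set(train_tokens) | set(test_tokens)
    V = len(vocab)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    unigrams = Counter(train_tokens)
    log_prob = 0.0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        # P(word | prev) with add-one smoothing
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
        log_prob += math.log(p)
    n = len(test_tokens) - 1
    return math.exp(-log_prob / n)

train = "the cat sat on the mat the cat ate".split()
test = "the cat sat on the mat".split()
print(bigram_perplexity(train, test))
```

Lower perplexity means the model is less "surprised" by the test text; any serious language model should beat this baseline easily, which is why losing to it is so damning for the giant models.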

4. The "Grammar" vs. "Reasoning" Test

The researchers tested the Goldfish in two ways:

  1. Grammar (Can it speak correctly?): The Goldfish passed with flying colors. They generated text that sounded natural and grammatically correct for their specific languages (a sketch of how such a test can work follows this list).
  2. Reasoning (Can it solve logic puzzles?): The Goldfish struggled here. They were about as smart as a random guess.
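
One common way to run a grammar test of this kind, sketched below as a plausible approach rather than the paper's exact benchmark, is to check whether the model assigns higher probability to a grammatical sentence than to a minimally different ungrammatical one. The model ID is again an assumption:

```python
# Sketch of a minimal-pair grammar test: a model that "knows" the
# language should assign higher probability to the grammatical sentence.
# NOTE: the model ID is an assumed example; verify on the Hugging Face hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sentence_logprob(model, tokenizer, text):
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position's logits predict the *next* token.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

model_id = "goldfish-models/eng_latn_100mb"  # assumed naming
tok = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(model_id)

good = "The keys to the cabinet are on the table."
bad = "The keys to the cabinet is on the table."
print(sentence_logprob(lm, tok, good) > sentence_logprob(lm, tok, bad))  # expect: True
```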

The Analogy: Think of the Goldfish as a local tour guide.

  • If you ask the guide, "What is the best way to say 'hello' in this village?" or "How do I order coffee here?", they will give you a perfect, grammatically correct answer.
  • But if you ask the guide, "Explain the theory of relativity" or "Solve this complex math problem," they will be lost. They know their language perfectly, but they haven't been trained to be a genius philosopher.

Why This Matters

For a long time, if you spoke a language with few speakers (say, only 100,000 people), you were largely ignored by AI: the big models simply didn't devote enough data or attention to your language.

Goldfish changes the game by saying: "We don't need a giant brain to speak your language well. We just need a small, focused one."

They released these models for 350 different languages, many of which had never had a dedicated AI model before. This is like handing a dictionary and a grammar book to 350 different communities that were previously left out of the digital conversation.

The Bottom Line

The paper proves that bigger isn't always better. For the world's smaller languages, a small, specialized, monolingual "Goldfish" model is often a much better speaker than a giant, distracted multilingual "Whale." It's a step toward making AI fairer and more inclusive for everyone, not just the speakers of the world's biggest languages.