Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

This paper presents a case study on developing a Multimodal Large Language Model for the low-resource Basque language, demonstrating that strong performance can be achieved with approximately 20% Basque multimodal data and that a Basque-adapted language model backbone is not strictly necessary.

Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune

Published 2026-03-05

Imagine you have a brilliant, world-traveled chef (a Large Language Model) who can cook amazing meals in English. This chef knows everything about the world, can describe pictures, and answer questions about them. However, if you ask them to cook a traditional Basque dish or describe a picture using the Basque language, they stumble. They might understand the ingredients, but they don't know the local recipes or the specific words to describe the flavors.

This paper is about teaching that world-famous chef how to cook delicious Basque meals, without having to train a brand-new Basque chef from scratch.

Here is the story of how the researchers did it, broken down into simple concepts:

1. The Problem: The "Language Gap"

Right now, the smartest AI models are like chefs trained mostly on English recipes. If you ask them about low-resource languages (languages with very little data on the internet, like Basque), they perform poorly. It's like asking a French chef to cook a traditional Scottish stew; they might guess, but the result won't be authentic or accurate.

2. The Solution: Building a New Kitchen

Since there were no existing "Basque recipe books" (datasets) for teaching AI about images and text, the researchers had to create them from scratch.

  • The Translation Factory: They took huge libraries of English image descriptions and questions (like "What is in this picture?") and translated them into Basque.
  • The Result: They built a massive new library containing over 3 million image-text pairs in Basque. Think of this as creating a massive, high-quality cookbook specifically for Basque cuisine.
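The "translation factory" idea above can be sketched in a few lines. This is a toy illustration, not the paper's actual pipeline: the `translate_en_to_eu` function below is a hypothetical stand-in (a two-entry dictionary), where the researchers would have used a real machine-translation system.

```python
# Toy sketch of the "translation factory": take English image-text pairs
# and translate the text side into Basque, keeping the images untouched.

def translate_en_to_eu(text: str) -> str:
    """Hypothetical English->Basque translator (toy dictionary for illustration)."""
    toy_dict = {
        "What is in this picture?": "Zer dago irudi honetan?",
        "A dog on a beach.": "Txakur bat hondartzan.",
    }
    return toy_dict.get(text, text)

def build_basque_dataset(english_pairs):
    """Translate the text of each (image_id, text) pair; images stay as-is."""
    return [(image_id, translate_en_to_eu(text)) for image_id, text in english_pairs]

english_pairs = [
    ("img_001", "What is in this picture?"),
    ("img_002", "A dog on a beach."),
]
basque_pairs = build_basque_dataset(english_pairs)
for image_id, text in basque_pairs:
    print(image_id, text)
```

Scaled up to millions of pairs, this is how an existing English "library" becomes a Basque one without collecting new images.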

3. The Experiment: Two Different Chefs

The researchers tested two different "chefs" (AI backbones) to see who could learn Basque best:

  1. The English Specialist (Llama): A chef who only speaks English and knows the world, but has never heard of Basque.
  2. The Basque Native (Latxa): A chef who already speaks Basque fluently and knows the local culture.

They trained both chefs using a mix of English and Basque "recipes" (data) to see who would become the better Basque cook.

4. The Big Surprises (The Findings)

Surprise #1: You Don't Need a Full Basque Library
The researchers thought they needed a library that was 100% Basque to get good results. Instead, they found that just 20% Basque data mixed with 80% English data was enough to create a top-tier Basque chef.

  • The Analogy: It's like learning to cook a specific regional dish. You don't need to live in that region your whole life. If you have a great base of general cooking skills (English) and just a few specific local recipes (20% Basque), you can still make an amazing meal.
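The 20/80 mixture can be sketched as a simple sampling rule: each training example is drawn from the Basque pool with probability 0.2 and from the English pool otherwise. The pool contents and function names here are illustrative, not the paper's actual data loader.

```python
import random

def mix_datasets(basque_pool, english_pool, basque_ratio=0.2,
                 n_samples=10_000, seed=0):
    """Sample a training stream with roughly `basque_ratio` Basque examples."""
    rng = random.Random(seed)
    mixed = []
    for _ in range(n_samples):
        pool = basque_pool if rng.random() < basque_ratio else english_pool
        mixed.append(rng.choice(pool))
    return mixed

# Illustrative pools: (language tag, example id)
basque_pool = [("eu", f"eu_example_{i}") for i in range(100)]
english_pool = [("en", f"en_example_{i}") for i in range(100)]

mixed = mix_datasets(basque_pool, english_pool)
share_eu = sum(1 for lang, _ in mixed if lang == "eu") / len(mixed)
print(f"Basque share: {share_eu:.1%}")  # hovers around 20%
```

The point of the finding is that this small Basque share, riding on a large English majority, was enough for top-tier Basque performance.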

Surprise #2: The English Chef is Just as Good
They expected the "Basque Native" chef (Latxa) to be much better because they already knew the language. But the "English Specialist" (Llama) performed almost exactly the same!

  • The Analogy: It turns out that if you teach a generalist chef a few specific local recipes, they can cook the local dish just as well as a local chef who was born there. You don't need a native speaker to build a strong Basque AI; you just need a smart generalist with a few Basque instructions.

Surprise #3: Text-Only Practice Helps
They also found that if the chef practiced only writing in Basque (without looking at pictures), it actually helped them get better at looking at pictures and describing them in Basque.

  • The Analogy: It's like practicing your vocabulary by reading a book in a foreign language. Even if you aren't looking at pictures, reading the words helps your brain understand how to describe those pictures later.

5. The Conclusion: A Blueprint for the World

The main takeaway is that we don't need to build a massive, expensive, native-language AI from the ground up for every small language in the world.

Instead, we can take a powerful, general AI (like the English chef), give it a small dose of local data (20% Basque), and maybe some text-only practice, and it will become a strong, capable AI for that language.

Why does this matter?
This is a "recipe" that can be used for hundreds of other low-resource languages (like Welsh, Catalan, or indigenous languages) that currently get ignored by big tech. It opens the door for these languages to join the AI revolution without needing millions of dollars in data collection.

In short: You don't need a native speaker to teach an AI a new language; you just need a smart teacher and a few good textbooks.