RILEC: Detection and Generation of L1 Russian Interference Errors in English Learner Texts

This paper introduces RILEC, a large-scale dataset and a generative framework for detecting and creating L1 Russian interference errors in English learner texts, demonstrating that models fine-tuned on this augmented data significantly improve the identification of specific error types like transliteration and tense misuse.

Darya Kharlamova, Irina Proskurina

Published Tue, 10 Ma

Imagine you are trying to learn to play the piano, but your brain is secretly trying to play the violin at the same time. Every time you press a piano key, your muscle memory from the violin makes your finger slip to the wrong note. In the world of language learning, this is called L1 Interference. It's when your native language (like Russian) "hacks" your second language (English), causing you to make specific, predictable mistakes.

For example, a Russian speaker might write "If we will have enough time" instead of "If we have enough time," because Russian conditional clauses use the future tense where English does not. Or they might write "cassa" instead of "cashier" because they are literally spelling the Russian word with English letters.

This paper introduces a new tool called RILEC to help teachers and students catch these specific "violin slips" in English essays. Here is how they built it, explained simply:

1. The Problem: Not Enough "Mistake" Data

To teach a computer to spot these specific Russian-style errors, you need a massive library of examples. But real student essays are hard to find, and even harder to find where someone has carefully labeled why the mistake happened. It's like trying to teach a doctor to diagnose a rare disease when you only have five patient files.

2. The Solution: The "Mistake Factory" (RILEC)

The authors built RILEC (Russian L1 Interference Learner English Corpus). Think of this as a giant, high-tech "Mistake Factory."

They started with a real collection of essays from Russian students (about 6,000 sentences). But they knew that wasn't enough to train a super-smart AI. So, they invented three different ways to manufacture new, realistic mistakes to fill the gaps:

  • The "Robot Coach" (PPO-Optimized Models): Imagine a robot that has read thousands of essays. They trained this robot using a special technique called PPO (think of it as a video game reward system). Every time the robot made a mistake that looked like a real Russian learner's error, it got a "gold star." If it made a normal mistake, it got nothing. Over time, the robot learned to generate thousands of new sentences that sound exactly like a Russian learner struggling with English.
  • The "Rule Book" (Rule-Based): For some very specific errors (like mixing up tenses or spelling Russian words with English letters), they wrote a strict set of instructions. It's like a mad-libs game: "Take this sentence, find the year, and change the verb to the wrong tense." This ensures they get plenty of examples for the tricky, rule-heavy errors.
  • The "Creative Writer" (LLM Prompting): They asked a very smart AI (like a creative writing assistant) to look at a real mistake and say, "Okay, make up a new story that uses this exact same mistake." This helped them generate errors that were more natural and varied.
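The "Rule Book" idea can be sketched in a few lines. This is a minimal, hypothetical illustration of rule-based error injection, not the authors' actual rules: the function names, regex patterns, and the tiny transliteration lexicon are all assumptions made for the example.

```python
import re

def inject_tense_error(sentence: str) -> str:
    """Force future tense inside an if-clause, mimicking Russian
    conditionals ('If we have time' -> 'If we will have time')."""
    return re.sub(r"\b([Ii]f \w+) (have|go|come|see)\b", r"\1 will \2", sentence)

def inject_transliteration(sentence: str, lexicon: dict[str, str]) -> str:
    """Swap an English word for a transliterated Russian equivalent
    (e.g. 'cashier' -> 'cassa')."""
    for english, translit in lexicon.items():
        sentence = re.sub(rf"\b{english}\b", translit, sentence)
    return sentence

if __name__ == "__main__":
    s = "If we have enough time, we will pay the cashier."
    s = inject_tense_error(s)
    s = inject_transliteration(s, {"cashier": "cassa"})
    print(s)  # If we will have enough time, we will pay the cassa.
```

Because each rule is deterministic, the injected error comes pre-labeled with its type, which is exactly what makes this method useful for the rare, rule-heavy error categories.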

By combining these three methods, they expanded their dataset from 6,000 sentences to 18,000+ sentences. It's like turning a small seed into a massive forest of examples.

3. The Result: A Super-Spotter

Once they had this massive library of "Mistake Factory" data, they trained a new AI model to be a Super-Spotter.

  • Before: If you showed a computer a student essay, it might say, "This sentence is wrong," but it wouldn't know why. It's like a teacher saying, "You got this wrong," without explaining the rule.
  • After: The new model, trained on RILEC, can say, "You used the wrong tense because you are thinking in Russian," or "You spelled 'cashier' as 'cassa' because of transliteration."
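The before/after difference boils down to the shape of the output. A minimal sketch of what a RILEC-style labeled prediction might look like, assuming a span-plus-error-type annotation scheme (the label names and data structure here are illustrative, not the paper's actual format):

```python
from dataclasses import dataclass

@dataclass
class InterferenceError:
    span: str          # the flagged text
    error_type: str    # e.g. "TENSE_SEMANTICS", "TRANSLITERATION"
    explanation: str   # why the L1 caused it

def explain(errors: list[InterferenceError]) -> list[str]:
    """Turn structured predictions into teacher-friendly feedback."""
    return [f"'{e.span}': {e.error_type} - {e.explanation}" for e in errors]

found = [
    InterferenceError("will have", "TENSE_SEMANTICS",
                      "Russian conditionals use the future tense"),
    InterferenceError("cassa", "TRANSLITERATION",
                      "Russian word spelled with Latin letters"),
]
print("\n".join(explain(found)))
```

A plain grammar checker stops at the span; the interference-aware model fills in the other two fields, which is what turns "you got this wrong" into an explanation.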

4. Why This Matters

The paper found that this "Mistake Factory" approach worked incredibly well.

  • Accuracy: The model got very good at spotting specific error types like Transliteration (spelling Russian words with English letters) and Tense Semantics (using a tense that doesn't match the intended time), scoring over 90% accuracy on those.
  • The "Human" Touch: Even though the data was made by machines, the mistakes felt real. The model learned the style and logic of a Russian learner, not just random typos.

The Big Picture

Think of this research as building a specialized translator for errors. Instead of just fixing the grammar, it translates the student's brain back to the teacher. It explains, "Ah, I see what happened! You were thinking in Russian, and your brain translated 'cassa' directly."

This helps teachers give better feedback and helps students understand why they are making mistakes, rather than just being told they are wrong. It turns the frustrating process of learning a new language into a clearer, more logical journey.