Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment

The paper introduces Agnostics, a language-agnostic reinforcement learning pipeline that enables large language models to learn coding in low-resource languages by evaluating code solely through its external I/O behavior. This eliminates the need for language-specific datasets and infrastructure while achieving state-of-the-art performance across multiple models and languages.

Aleksander Boruch-Gruszecki, Yangtian Zi, Zixuan Wu, Tejas Oberoi, Carolyn Jane Anderson, Joydeep Biswas, Arjun Guha

Published 2026-03-03

Imagine you have a brilliant, world-class chef (a Large Language Model) who can cook incredible meals in French, Italian, and American cuisine. These are the "high-resource" languages like Python and JavaScript. But if you ask this chef to cook a traditional dish from a small, remote village—say, a specific type of stew from a tiny island in the Pacific (a "low-resource" language like Fortran, Julia, or R)—they stumble. They might not know the ingredients, or worse, they might try to cook it using French techniques that just don't work.

The problem isn't just that the chef hasn't read many cookbooks from that island (lack of training data). The bigger issue is that every time you want to teach them a new island's cuisine, you have to hire a whole new team of translators, build a new kitchen, and write a new set of rules for how to taste the food. It's expensive, slow, and tedious.

Enter "Agnostics."

The paper introduces a new method called Agnostics that solves this by changing the rules of the game. Instead of asking the chef, "Did you cook this exactly like a French recipe?", Agnostics asks a much simpler question: "Does the food taste right?"

Here is how it works, broken down into simple analogies:

1. The "Black Box" Tasting Test

In the old way, to check if a chef cooked a dish correctly, you had to inspect their recipe step-by-step. If they used the wrong knife or the wrong spice, you failed them. This required a human expert who knew that specific language perfectly.

Agnostics uses a "Black Box" approach.

  • The Setup: You give the chef a list of ingredients (Input) and a description of the final dish (Expected Output).
  • The Test: The chef cooks the dish. You don't care how they cooked it or what language they used. You just taste the final dish.
  • The Verdict: If the taste matches the description, the chef gets a gold star (a reward). If it tastes like mud, they get nothing.

Because the test only cares about the result (the taste), the chef can use any language to cook it. They can use a French knife, a Japanese wok, or a Swiss army knife. As long as the final stew tastes right, they win.
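The "tasting test" can be sketched in a few lines of Python. This is a minimal illustration of I/O-based verification, not the paper's actual harness: the function name `io_reward` and the exact-match-on-stdout policy are assumptions for the example.

```python
import subprocess

def io_reward(run_cmd, stdin_text, expected_stdout, timeout=10):
    """Black-box check: run a candidate program (written in ANY language)
    with the given input, and reward it only if its output matches.
    The verifier never inspects the source code itself."""
    try:
        result = subprocess.run(
            run_cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # the dish never made it to the table: no reward
    # Compare only observable output; the "recipe" is irrelevant.
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0
```

Because `run_cmd` is just a shell command, the same function scores a Fortran binary, an R script, or a Lua program without any language-specific logic.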

2. The "Universal Kitchen" (The Environment)

Usually, to teach a model a new language, you need to build a custom kitchen for that language. Agnostics builds a Universal Kitchen.

  • It's a container (like a shipping container) that holds everything needed to run code.
  • To add a new language (like R or OCaml), you just drop in a tiny, 4-line instruction manual (a YAML config file) that says: "Here is how to turn on the stove and how to serve the food."
  • Suddenly, the Universal Kitchen can handle Lua, Julia, R, OCaml, and Fortran without needing a complete rebuild.
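A config in that spirit might look like the sketch below. The field names here are illustrative guesses, not the paper's exact schema; the point is that a few lines telling the kitchen how to "turn on the stove" (compile) and "serve the food" (run) are all a new language needs.

```yaml
# Hypothetical per-language config (field names are illustrative)
language: fortran
extension: .f90
compile: gfortran {src} -o {bin}
run: ./{bin}
```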

3. The "Trial and Error" Coach (Reinforcement Learning)

The paper uses a technique called Reinforcement Learning with Verifiable Rewards (RLVR). Think of this as a coach who doesn't give you a textbook answer but lets you practice until you get it right.

  • The model tries to solve a problem.
  • The "Tasting Test" (the verifier) checks the result.
  • If it's right, the model gets a reward and learns, "Hey, that way of thinking worked!"
  • If it's wrong, the model gets no reward and tries again, adjusting its strategy.

Because the test is purely about the output, the model learns the logic of the language rather than just memorizing syntax. It learns how to think in that language.

The Results: Small Models, Big Wins

The most exciting part of this paper is what happened when they tried it.

  • They took a small, 4-billion-parameter model (think of it as a smart intern) and trained it on these low-resource languages using Agnostics.
  • The Result: This "intern" started performing as well as, or even better than, massive 70-billion-parameter models (the "Master Chefs") that had been trained on everything.
  • They tested this on five difficult languages (Lua, Julia, R, OCaml, Fortran) and the small model became a master of all of them.

Why This Matters

Before Agnostics, if you wanted an AI to help a scientist write code in Fortran (used for weather modeling) or R (used for statistics), you had to hope the AI was already good at it, or spend months building custom tools.

With Agnostics, you can take any programming language, write a tiny 4-line config file, and instantly start training an AI to be an expert in it. It turns the process of teaching AI new languages from "building a new factory for every product" into "just changing the recipe card."

In short: Agnostics stops worrying about how the code is written and starts focusing on what the code does. It's the difference between grading a student on their handwriting versus grading them on whether they actually solved the math problem. And it turns out, once you stop caring about the handwriting, the students learn much faster.
