SIEVE: Sample-Efficient Parametric Learning from Natural Language

The paper proposes SIEVE, a sample-efficient parametric learning method. Using a novel synthetic data generation pipeline (SIEVE-GEN), it decomposes natural-language context into atomic units and internalizes them into model weights from as few as three query examples, outperforming prior distillation methods on reasoning tasks.

Parth Asawa, Alexandros G. Dimakis, Matei Zaharia

Published 2026-04-06

Imagine you have a brilliant but very forgetful assistant (an AI model). Right now, if you want them to learn a new set of rules—like "how to calculate discounts in a store" or "the specific trade rules of the NBA"—you have to hand them a giant, thick instruction manual every single time they ask a question. This is called In-Context Learning. It works, but it's clumsy. The assistant has to read the whole manual for every single question, which is slow, uses up a lot of their "brain space," and they forget the rules as soon as you close the book.

The paper proposes a new method called SIEVE to fix this. Instead of handing the assistant the manual every time, SIEVE helps them memorize the rules permanently so they can answer questions without the book.

Here is the problem: Usually, to teach an AI to memorize something, you need thousands of examples and a human expert to grade every single answer. That's expensive and slow.

SIEVE solves this by being a "Smart Tutor" that works with just three examples.

Here is how it works, using a simple analogy:

The Problem: The "Kitchen Chaos"

Imagine you are teaching a chef (the AI) how to make 30 different types of soups. You have a giant cookbook with 30 different recipes (the Context).

  • Old Way (In-Context Learning): Every time the chef wants to make soup, you hand them the entire 30-recipe book. They have to flip through it to find the right one. It's messy and slow.
  • Bad Way (Traditional Learning): You try to train the chef directly on your examples. But you only have 3 of them, so the chef memorizes those three soups, gets confused by everything else, and fails.

The SIEVE Solution: The "Smart Filter"

The authors realized something clever: Not every rule applies to every question.

  • If you are making Tomato Soup, you don't need the rules for Fish Soup.
  • If you are trading a basketball player, you don't need the rules about referee fouls.

SIEVE uses a three-step magic trick called SIEVE-GEN to teach the chef efficiently:

  1. The Decomposition (Breaking it down):
    Instead of treating the cookbook as one giant block, SIEVE breaks it down into individual "recipe cards" (atomic units). Now, instead of a 30-page book, you have 30 separate cards.

  2. The Back-Translation (Inventing the questions):
    SIEVE takes your 3 seed examples (the handful of question–answer pairs you provided) and uses a "base" AI to invent thousands of new soup questions.

    • The Trick: It doesn't just ask "Make soup." It asks, "Make a soup using only the Tomato and Basil rules."
    • It pairs each new question with only the specific recipe cards needed for that question. It filters out the noise.
  3. The Verification (The Quality Check):
    Before teaching the chef, SIEVE double-checks: "Did we really need the 'Spicy Pepper' rule for this Tomato soup?" If not, it throws that card away. This ensures the chef only learns the exact connection between a question and the right rule.
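The three steps above can be sketched in code. This is a hypothetical, runnable toy—the function names (`decompose`, `back_translate`, `verify`) and the template-based question generation are illustrative stand-ins for what the paper does with an actual LLM, not the authors' implementation.

```python
def decompose(context: str) -> list[str]:
    """Step 1 (Decomposition): split the big context into atomic
    rule units -- the individual 'recipe cards'."""
    return [rule.strip() for rule in context.split("\n") if rule.strip()]


def back_translate(seed_questions: list[str], rules: list[str],
                   n_variants: int = 2) -> list[dict]:
    """Step 2 (Back-Translation): invent new questions, each paired with
    only the specific rules it needs. A real pipeline would prompt a base
    LLM conditioned on the seed examples; we fake it with templates."""
    synthetic = []
    for rule in rules:
        for i in range(n_variants):
            q = f"Variant {i}: make a soup using only the rule '{rule}'"
            synthetic.append({"question": q, "rules": [rule]})
    return synthetic


def verify(example: dict) -> dict:
    """Step 3 (Verification): quality check -- drop any paired rule that
    the question does not actually rely on, so the chef only learns true
    question-to-rule connections."""
    kept = [r for r in example["rules"] if r in example["question"]]
    return {"question": example["question"], "rules": kept}


context = "Tomato soup needs basil\nFish soup needs lemon"
rules = decompose(context)
data = [verify(ex) for ex in back_translate(["Make tomato soup"], rules)]
```

The key design choice this illustrates: every synthetic training example carries only its relevant slice of the context, so the later training stage never has to untangle which rule mattered for which question.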

The Result: The "Internalized Chef"

SIEVE then fine-tunes the chef on these thousands of "filtered" examples, so the knowledge is internalized rather than looked up.

  • Before: The chef needed the book on the table to answer.
  • After: The chef has the rules burned into their brain (the model's "weights"). They can answer any soup question instantly, without the book, and they are actually better at it than when they were reading the book.

Why is this a big deal?

  • Sample Efficiency: You only need 3 examples to start the whole process. You don't need a team of experts to write 10,000 questions.
  • Better than "Reading": In tests, the AI trained with SIEVE performed better than an AI that was just reading the manual every time.
  • Works on Hard Stuff: It worked on complex tasks like calculating retail discounts, understanding NBA trade rules, and even translating languages from a 50,000-word grammar book (which is too big for a normal AI to hold in its "short-term memory").

The Bottom Line

Think of SIEVE as a way to turn a "reference librarian" (who needs to look things up) into a "subject matter expert" (who just knows the answer) using very little data. It does this by realizing that you don't need to study the whole library to learn a specific topic; you just need to study the relevant pages for the specific questions you are asking.

This means in the future, your AI assistants could learn your personal preferences, your company's specific rules, or new languages just by you giving them a few examples, and then they would "remember" it forever without needing to be reminded.
