PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories

A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

PRECEPT is a unified test-time adaptation framework for making LLM agents more resilient. It combines deterministic exact-match rule retrieval, conflict-aware memory with Bayesian reliability, and the Pareto-guided COMPASS prompt-evolution loop to improve compositional generalization, continual learning, and robustness to knowledge drift and adversarial inputs.

Arash Shahmansoori

Published Wed, 11 Ma

Imagine you are hiring a very smart but slightly forgetful and overconfident assistant to manage a complex logistics company. You give them a rulebook, but the world changes every day: ports close, new laws appear, and sometimes the rulebook itself contains outdated or even wrong advice.

Traditional AI assistants (the "baselines" in the paper) try to learn by reading their own notes written in plain English. The problem? As the number of rules grows, their notes become a mess. They start misreading their own handwriting, confusing "Ship to Hamburg" with "Ship to London" because the words look similar. They get stuck in loops, repeating the same mistakes, and they can't easily combine simple rules to solve complex problems.

PRECEPT is a new framework designed to fix this. Think of it as upgrading your assistant from a "note-taker" to a "super-organized librarian with a magic rulebook."

Here is how PRECEPT works, broken down into three simple superpowers, plus a coach that keeps improving the assistant's instructions:

1. The Magic Index Card System (Deterministic Retrieval)

The Problem: Imagine trying to find a specific rule in a 1,000-page diary written in long paragraphs. If you ask, "What do I do if it's raining AND the truck is broken?", the assistant has to read the whole page, guess the meaning, and might get it wrong.

The PRECEPT Solution: Instead of a diary, PRECEPT uses a giant, perfect index card system.

  • Every rule is written on a card with a unique barcode (a "condition key").
  • When a task comes in, the assistant doesn't "read" or "interpret" the rules. It simply scans the barcode.
  • The Analogy: It's like using a vending machine. You don't ask the machine, "What do you think I want?" You press button "A1," and it gives you exactly the soda you asked for. No guessing, no misreading.
  • The Result: Because the lookup is an exact match on the barcode rather than an interpretation of text, the assistant finds the right rule instantly and with 0% retrieval error, even with thousands of rules. It can also stack rules together like Lego blocks (e.g., "Safety" rules always beat "Speed" rules) without getting confused.
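To make the index-card idea concrete, here is a minimal sketch of exact-match retrieval. It is an illustration under stated assumptions, not PRECEPT's actual implementation: the names `RuleStore`, `condition_key`, and `TIER_PRIORITY`, and the choice of an order-independent set as the "barcode", are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional

# Hypothetical priority tiers: a lower number wins, so "safety" beats "speed".
TIER_PRIORITY = {"safety": 0, "speed": 1}


@dataclass(frozen=True)
class Rule:
    action: str
    tier: str


def condition_key(conditions: Iterable[str]) -> FrozenSet[str]:
    """Normalize conditions into an order-independent, hashable 'barcode'."""
    return frozenset(c.strip().lower() for c in conditions)


class RuleStore:
    def __init__(self) -> None:
        self._rules: Dict[FrozenSet[str], Rule] = {}

    def add(self, conditions: Iterable[str], action: str, tier: str) -> None:
        self._rules[condition_key(conditions)] = Rule(action, tier)

    def lookup(self, conditions: Iterable[str]) -> Optional[Rule]:
        """Exact-match lookup: a dictionary hit or None, never a fuzzy guess."""
        return self._rules.get(condition_key(conditions))

    def compose(self, conditions: Iterable[str]) -> List[Rule]:
        """Stack single-condition rules, highest-priority tier first."""
        atomic_hits = []
        for cond in sorted(condition_key(conditions)):
            rule = self._rules.get(frozenset([cond]))
            if rule is not None:
                atomic_hits.append(rule)
        return sorted(atomic_hits, key=lambda r: TIER_PRIORITY[r.tier])


store = RuleStore()
store.add(["rain"], "reduce_speed", "safety")
store.add(["late"], "take_highway", "speed")
store.add(["rain", "truck_broken"], "dispatch_backup", "safety")

print(store.lookup(["Truck_Broken", "rain"]))  # exact hit, order and case ignored
print(store.lookup(["rain", "fog"]))           # unseen combination: None, no guess
print(store.compose(["late", "rain"]))         # safety rule listed before speed rule
```

The design point is that the lookup is a hash-table hit: either the key exists or it does not, so a growing rulebook cannot cause one rule to be mistaken for a similar-looking one.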

2. The "Trust but Verify" Detective (Conflict Resolution)

The Problem: Sometimes, your assistant has two sources of information:

  1. The Old Manual (Static): A dusty book from 2010 that says "Always ship via Port A."
  2. The Live News (Dynamic): A breaking news alert saying "Port A is closed."

Old assistants might get confused, trying to blend the two, or they might stubbornly stick to the old manual because it "sounds" authoritative.

The PRECEPT Solution: PRECEPT acts like a detective with a lie detector.

  • It treats the "Old Manual" and the "Live News" as two different witnesses.
  • It uses a mathematical "trust score" (Bayesian reliability). If the Old Manual says one thing and the Live News says another, and the Live News has been proven right recently, the detective immediately ignores the Old Manual.
  • The Analogy: Imagine a courtroom. The Old Manual is a witness who has been wrong before. The Live News is a witness who just saw the event. PRECEPT doesn't just listen to both; it weighs their credibility. If the Old Manual is lying (or just outdated), PRECEPT silences it and follows the truth.
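One common way to build such a trust score is a Beta-Bernoulli model: count how often each source has been right or wrong and use the posterior mean as its reliability. The sketch below assumes that model and a uniform prior; the `Source` class, the `resolve` function, and the simple "higher reliability wins" rule are illustrative assumptions, not the paper's exact formulation.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    successes: float = 1.0  # Beta prior alpha (uniform prior, an assumption)
    failures: float = 1.0   # Beta prior beta

    def record(self, correct: bool) -> None:
        """Update the counts after the source's claim is checked against reality."""
        if correct:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def reliability(self) -> float:
        """Posterior mean of Beta(successes, failures)."""
        return self.successes / (self.successes + self.failures)


def resolve(claim_a: str, source_a: Source, claim_b: str, source_b: Source) -> str:
    """Return the agreed claim, or the claim from the more reliable source on conflict."""
    if claim_a == claim_b:
        return claim_a
    return claim_a if source_a.reliability >= source_b.reliability else claim_b


manual = Source("old_manual")
news = Source("live_news")
for _ in range(3):
    manual.record(correct=False)  # the manual has been wrong three times
    news.record(correct=True)     # the live feed has been right three times

print(manual.reliability, news.reliability)
print(resolve("ship via Port A", manual, "ship via Port B", news))
```

With these counts the manual's reliability falls to 1/5 and the live feed's rises to 4/5, so the conflict resolves in favor of the live feed.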

3. The "Self-Correcting GPS" (Drift Adaptation)

The Problem: Imagine you are driving to a destination. You have a GPS that says "Turn Left." You turn left, but there's a wall. The GPS is wrong because the road changed (drift). A bad GPS would keep telling you to turn left, over and over, until you crash.

The PRECEPT Solution: PRECEPT has a smart "forget" button.

  • If the GPS says "Turn Left" and you hit a wall, PRECEPT doesn't just try again. It immediately deletes the "Turn Left" rule from its memory.
  • It marks that specific route as "Forbidden" so it never tries it again.
  • It then explores new paths until it finds the correct one, and then it saves the new correct rule.
  • The Analogy: It's like a GPS that learns from its mistakes in real-time. If a road is closed, it doesn't just say "Try again"; it permanently reroutes and updates the map so no one else gets stuck there.
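The delete, forbid, explore, and save steps above can be sketched as follows. This is a simplified illustration: `DriftAwareMemory`, its `act` method, and the `try_action` callback (which stands in for actually executing an action and observing success or failure) are hypothetical names, and the real system may invalidate rules under different conditions.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class DriftAwareMemory:
    def __init__(self) -> None:
        self.rules: Dict[str, str] = {}              # condition key -> learned action
        self.forbidden: Set[Tuple[str, str]] = set()  # (key, action) pairs known to fail

    def act(
        self,
        key: str,
        candidates: Iterable[str],
        try_action: Callable[[str], bool],
    ) -> Optional[str]:
        learned = self.rules.get(key)
        if learned is not None:
            if try_action(learned):
                return learned
            # The stored rule failed: delete it and forbid it for this key.
            del self.rules[key]
            self.forbidden.add((key, learned))

        # Explore the remaining options, skipping anything already forbidden.
        for action in candidates:
            if (key, action) in self.forbidden:
                continue
            if try_action(action):
                self.rules[key] = action  # save the new correct rule
                return action
            self.forbidden.add((key, action))
        return None


memory = DriftAwareMemory()
memory.rules["to_depot"] = "turn_left"  # stale rule: the road has changed


def road(action: str) -> bool:
    """Toy environment in which only turning right works now."""
    return action == "turn_right"


options = ["turn_left", "go_straight", "turn_right"]
print(memory.act("to_depot", options, road))  # explores, then settles on turn_right
print(memory.act("to_depot", options, road))  # reuses the newly saved rule directly
```

Because failed actions are recorded per condition key, the second call goes straight to the saved rule and never retries the dead ends.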

The "Coach" (COMPASS)

Finally, PRECEPT has a Coach named COMPASS.

  • While the assistant is working, the Coach watches how well they are doing.
  • If the assistant is struggling with a specific type of problem, the Coach rewrites the assistant's "instruction manual" (the prompt) to make it smarter.
  • But the Coach is smart too: it doesn't just guess. It tests the new instructions in a simulation first. If the new instructions are better, it installs them. If not, it keeps the old ones.
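The "test first, install only if better" step can be sketched with a Pareto-dominance check, in the spirit of the "Pareto-guided" label. This is an assumption-laden sketch: the objective names, the `evaluate` callback (standing in for the simulation), and the strict "must dominate the current prompt" acceptance rule are illustrative, and the actual COMPASS loop may instead maintain a whole Pareto front of candidates.

```python
from typing import Callable, Dict

Scores = Dict[str, float]  # objective name -> score, higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)


def evolve_prompt(
    current: str,
    candidate: str,
    evaluate: Callable[[str], Scores],
) -> str:
    """Install the candidate prompt only if it Pareto-dominates the current one."""
    if dominates(evaluate(candidate), evaluate(current)):
        return candidate
    return current


# Toy stand-in for a simulation run; the numbers are made up for illustration.
SIMULATED: Dict[str, Scores] = {
    "v1": {"first_try_success": 0.60, "step_efficiency": 0.70},
    "v2": {"first_try_success": 0.75, "step_efficiency": 0.70},  # better, never worse
    "v3": {"first_try_success": 0.80, "step_efficiency": 0.50},  # a trade-off
}

print(evolve_prompt("v1", "v2", SIMULATED.__getitem__))  # v2 is installed
print(evolve_prompt("v2", "v3", SIMULATED.__getitem__))  # v2 is kept
```

Requiring dominance means a rewrite that helps one objective while hurting another is not installed automatically, which matches the "if not, it keeps the old ones" behavior described above.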

Why Does This Matter?

In the real world, AI agents often fail because they try to "think" too much about simple rules. They get overwhelmed by complexity.

PRECEPT's core argument is that, for rule-following tasks like these, structure beats scale.

  • Instead of making the AI "smarter" (which is hard and expensive), PRECEPT gives it a better structure (exact rules, conflict detection, and self-correction).
  • The Result: In tests, PRECEPT solved complex logistics problems 40% more often on the first try than other top AI methods. It made 60% fewer mistakes, learned from its errors instantly, and could handle situations where the rules changed overnight.

In short: PRECEPT turns a chaotic guessing game into a precise, reliable machine by giving the AI a perfect filing system, a lie detector, and a GPS that never gets stuck in a loop.