Did You Forget What I Asked? Prospective Memory Failures in Large Language Models

This paper demonstrates that large language models suffer significant prospective memory failures when asked to follow formatting constraints under concurrent task load, with terminal constraints the most fragile. Salience-enhanced prompting largely restores compliance, but the constraints themselves can also severely degrade accuracy on the main task.

Avni Mittal

Published 2026-03-26

Imagine you are a highly skilled chef (the AI) who has been asked to cook a complex, multi-course meal (solving a math problem or writing a summary). But before you start chopping, the customer hands you a sticky note with a very specific rule: "When you put the final dish on the table, you must write 'Bon Appétit!' in all capital letters."

If the meal is simple, like a grilled cheese sandwich, you'll probably remember the note. But if the meal is a complex 10-course banquet requiring intense focus, you might get so lost in the cooking that you forget the sticky note entirely. You serve a perfect meal, but you forget to write the sign-off.

This paper, "Did You Forget What I Asked?", investigates exactly this phenomenon in Large Language Models (LLMs). The researchers call it "Prospective Memory Failure."

Here is the breakdown of their findings using simple analogies:

1. The "Cognitive Load" Problem

The Analogy: Imagine you are juggling.

  • Task A: Keep the formatting rules in your head (the sticky note).
  • Task B: Solve a difficult math problem or summarize a long story (the juggling).

The researchers found that when the "juggling" gets too hard (like solving complex math), the model drops the "sticky note." The harder the task, the more likely the AI is to forget the formatting rules.

  • The Result: When asked to do a hard task and follow a rule at the same time, the AI's compliance dropped by 2 to 21 percentage points compared with following the rule alone. For some models and hard tasks, the drop reached 50 points.

2. The "End of the Line" Trap

The Analogy: Think of a train journey.

  • Continuous Rules: "Don't wear a hat during the whole trip." (Easy to remember because you check it every time you get on a train car).
  • Terminal Rules: "When the train stops at the final station, wave a red flag." (Hard to remember because you have to wait until the very end, after hours of travel).

The study found that Terminal Constraints (rules that must be done at the very end, like "end your sentence with a specific phrase") are the most likely to be forgotten.

  • Why? By the time the AI finishes writing hundreds of words of content, the instruction to "end with X" has faded from its "working memory."
  • The Exception: Avoidance Rules (like "don't use commas") are very hard to forget because the AI has to check for them every single time it types a word.
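The difference between these rule types comes down to *when* a violation can even be detected. A minimal sketch below makes this concrete; the checker functions and the specific required ending are illustrative stand-ins, not code or constraints from the paper:

```python
def violates_avoidance(text: str) -> bool:
    """Avoidance rule ('don't use commas'): the first comma anywhere
    is a violation, so the rule is implicitly re-checked on every token."""
    return "," in text

def violates_continuous(text: str) -> bool:
    """Continuous rule ('use all caps'): must hold for every letter."""
    return any(c.islower() for c in text)

def violates_terminal(text: str, required_ending: str = "THE END") -> bool:
    """Terminal rule ('end with X'): only the final characters matter,
    so the check cannot fire until generation is complete."""
    return not text.strip().endswith(required_ending)

output = "A SHORT ANSWER WITHOUT COMMAS. THE END"
print(violates_avoidance(output))   # False
print(violates_continuous(output))  # False
print(violates_terminal(output))    # False
```

Because the terminal check only becomes relevant after hundreds of tokens have been generated, there is no intermediate moment at which the model is forced to notice a slip, which matches the paper's finding that these rules are forgotten most often.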

3. The "Highlighter" Solution

The Analogy: Imagine you are studying for a test.

  • Natural Method: The rule is buried in a paragraph of text.
  • Salience Method: You use a bright yellow highlighter and write the rule in big, bold letters at the top, then write "DON'T FORGET THIS!" at the bottom.

The researchers discovered a simple fix: make the instruction salient, so the model cannot miss it.

  • They added a clear header like "IMPORTANT FORMATTING INSTRUCTION:" and a reminder at the end like "Remember to follow ALL instructions above."
  • The Result: This simple trick recovered the AI's performance, bringing compliance back up to 90–100%, even on the hardest tasks. It's like giving the AI a second pair of eyes right before it finishes.
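The article quotes the header and reminder text directly, so the technique can be sketched as a small prompt wrapper. The overall layout (constraint first, task in the middle, reminder last) and the function name are my assumptions; the header and footer strings come from the article:

```python
def make_salient(task: str, constraint: str) -> str:
    """Wrap a formatting constraint with a prominent header and a
    trailing reminder, per the salience fix described in the article.
    The exact ordering of sections is an illustrative choice."""
    return (
        "IMPORTANT FORMATTING INSTRUCTION:\n"
        f"{constraint}\n\n"
        f"{task}\n\n"
        "Remember to follow ALL instructions above."
    )

prompt = make_salient(
    task="Solve: what is 17 * 24? Show your reasoning.",
    constraint="End your final answer with the phrase 'DONE'.",
)
print(prompt)
```

Placing the reminder at the very end is the key move: it sits closest to where generation begins, right when the model is about to "walk away from the sticky note."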

4. The "Two-Way Traffic" Jam

The Analogy: The traffic runs both ways. It's not just the formatting that suffers; the cooking suffers too. If you force the chef to focus too hard on the "Bon Appétit!" sign, they might burn the steak.

  • The Finding: Adding formatting rules actually made the AI worse at the main task.
  • Example: One model's math accuracy dropped from 93% to 27% just because it was trying to follow a formatting rule at the same time. The AI was so busy trying to remember the rule that it messed up the math.

5. The "Stacking" Disaster

The Analogy: Asking the chef to do five things at once.

  • "Use all caps."
  • "No commas."
  • "End with a poem."
  • "Use exactly 3 bullet points."
  • "Summarize this 50-page book."

When you stack multiple rules on top of a hard task, the AI's performance collapses.

  • The Result: With 5 rules and a hard task, one model's ability to follow all rules dropped below 50%. The "highlighter" trick (the reminder) stopped working as well when there were too many rules to remember.

The Big Takeaway

AI models aren't "forgetting" because they are stupid or because the text disappeared from the screen. They are forgetting because their "attention" is being pulled in two directions at once.

What should we do?

  1. Don't bury the lead: If you want an AI to follow a rule, put it in a big, bold box and remind the model again at the end.
  2. Watch out for the end: Rules that need to happen at the very end of the response are the most fragile.
  3. One thing at a time: If you need the AI to do a hard math problem, don't ask for 5 different formatting tricks at the same time. It will likely fail at both.

In short: AI is like a brilliant but easily distracted student. If you want them to remember the rules, you have to shout them out clearly right before they finish the test.
