If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

This paper introduces LIFESTATE-BENCH, a benchmark built on narrative datasets such as Hamlet to evaluate lifelong learning in large language models. It finds that non-parametric methods (keeping memory in the context) outperform parametric methods (updating the model's weights) at managing stateful interactions, but that all models still suffer catastrophic forgetting over extended engagements.

Original authors: Siqi Fan, Xiusheng Huang, Yiqun Yao, Xuezhi Fang, Kang Liu, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang

Published 2026-04-14

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are talking to a very talented actor who has memorized the entire script of a play. They can recite lines perfectly, but here's the catch: they have no memory of what happened in the previous scene. Every time you start a new conversation, they are a blank slate, pretending to be a character for the first time, even if you've been talking for hours.

This is how most Large Language Models (LLMs) work today. They are "stateless," meaning they don't naturally remember their own story as it unfolds.

This paper introduces a new way to test if we can teach these AI actors to actually remember their life story and evolve as characters, just like humans do. Here is the breakdown in simple terms:

1. The Problem: The "Amnesiac Actor"

Humans learn by accumulating experiences. If you meet someone today, remember them tomorrow, and meet them again next week, your relationship changes. You know their secrets, their moods, and your shared history.

Current AI models are like actors who forget the plot every time the curtain rises. If you ask them, "Do you remember that we fought yesterday?" they might say, "I don't know, who are you?" because they don't have a built-in "memory bank" that updates as the conversation goes on.

2. The Solution: "LIFESTATE-BENCH"

The authors created a new test called LIFESTATE-BENCH. Think of this as a long-running TV drama designed specifically to test the AI's memory.

Instead of short, random chats, they gave the AI a script (based on Shakespeare's Hamlet and some made-up stories) with a clear timeline. The AI had to play a character through multiple "episodes."

The test asks three specific types of questions to see if the AI is actually "living" the story:

  • Self-Awareness: "Who are you right now?" (Does the AI remember its role?)
  • Fact Memory: "What happened in the last scene?" (Did it remember the plot details?)
  • Relationship Shift: "How do you feel about this other character now?" (Did the relationship change because of what happened yesterday?)
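As a rough illustration, the three probe types above could be organized and scored like this. This is a hypothetical sketch, not the paper's actual schema or scoring method; the field names, keywords, and the keyword-overlap metric are all assumptions made for clarity.

```python
# Hypothetical sketch of LIFESTATE-BENCH-style probes; field names and
# the keyword-overlap scoring are illustrative assumptions, not the
# paper's actual evaluation code.

probes = [
    {
        "type": "self_awareness",
        "question": "Who are you right now?",
        "expected_keywords": ["Hamlet", "prince"],
    },
    {
        "type": "fact_memory",
        "question": "What happened in the last scene?",
        "expected_keywords": ["ghost", "Claudius"],
    },
    {
        "type": "relationship_shift",
        "question": "How do you feel about Claudius now?",
        "expected_keywords": ["distrust", "anger"],
    },
]

def score(answer: str, probe: dict) -> float:
    """Fraction of expected keywords that appear in the model's answer."""
    hits = sum(kw.lower() in answer.lower()
               for kw in probe["expected_keywords"])
    return hits / len(probe["expected_keywords"])

# A model answer that recalls the plot perfectly scores 1.0 on fact memory:
print(score("The ghost told me that Claudius killed my father.", probes[1]))
```

The point of separating the three probe types is that a model can ace fact memory while failing relationship shift, which, as the paper's results suggest, is exactly the failure mode to look for.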

3. The Experiment: Two Ways to Remember

The researchers tested two different ways to help the AI remember:

  • Method A: The "Photo Album" (Non-Parametric)
    Imagine giving the AI a giant scrapbook of everything that happened so far. Every time a new question comes, you hand the AI the whole book (or a summarized version of it) to read before answering.

    • Result: This worked much better. The more context the AI could read, the better it remembered the story.
  • Method B: The "Brain Surgery" (Parametric)
    Imagine trying to permanently rewrite the AI's internal weights (its parameters) so that it "learns" the new facts and no longer needs the scrapbook. This is like trying to teach a dog a new trick by physically changing its brain structure.

    • Result: This was less effective. The AI tended to "forget" old things as it learned new things (a problem called "catastrophic forgetting"). It was like the AI was so busy learning the new scene that it erased the memory of the previous one.
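The "photo album" idea from Method A can be sketched in a few lines: keep an external, append-only log of events and prepend it (or a trimmed version of it) to every prompt, instead of touching the model's weights. This is a minimal sketch under assumed names (`EpisodicMemory`, `record`, `build_prompt` are invented for illustration), not the authors' implementation.

```python
# Minimal sketch of a non-parametric ("photo album") memory: the model's
# weights never change; instead, accumulated events are replayed in the
# prompt. Class and method names are illustrative assumptions.

class EpisodicMemory:
    """External memory that accumulates events across episodes."""

    def __init__(self, max_events: int = 50):
        self.events: list[str] = []   # the scrapbook of everything so far
        self.max_events = max_events

    def record(self, episode: int, event: str) -> None:
        self.events.append(f"[Episode {episode}] {event}")

    def build_context(self) -> str:
        # If the log grows too long, keep only the most recent events;
        # a real system would summarize the older ones instead of dropping them.
        return "\n".join(self.events[-self.max_events:])

    def build_prompt(self, question: str) -> str:
        return (
            "You are playing Hamlet. Your life so far:\n"
            f"{self.build_context()}\n\n"
            f"Question: {question}"
        )

memory = EpisodicMemory()
memory.record(1, "The ghost of your father accuses Claudius of murder.")
memory.record(2, "You stage a play to test Claudius's guilt.")
prompt = memory.build_prompt("How do you feel about your uncle now?")
print(prompt)
```

Method B would instead fine-tune the model on each episode's events, which is where the catastrophic forgetting described above shows up: updating weights for the new scene can overwrite what was learned from the old one.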

4. The Big Discovery

The study found that current AI is still terrible at long-term storytelling.

  • They forget easily: As the story got longer, the AI's performance dropped. It started mixing up who was the villain and who was the hero.
  • Reading helps more than learning: The "Photo Album" method (giving the AI the history to read) was far superior to trying to "train" the AI to remember.
  • Relationships are hard: The AI was okay at remembering facts ("The king died"), but terrible at understanding how relationships changed ("Now I hate my uncle because he killed my father").

The Takeaway

This paper is a reality check. While AI can chat like a human, it doesn't yet have a "soul" or a continuous life story. It's like a brilliant improvisational actor who forgets the plot the moment the scene ends.

To make AI truly useful for long-term companionship or complex storytelling, we need to stop trying to force them to "memorize" everything in their brain and instead give them better tools to review their history as the story progresses. The authors' new benchmark is a tool to help developers figure out how to fix this memory gap.
