Time, Identity and Consciousness in Language Model Agents

This paper proposes a conservative toolkit for evaluating language model agent identity. Applying Stack Theory's temporal gap, it separates mere behavioral consistency from genuine structural organization, yielding persistence scores that distinguish agents that merely talk like a stable self from those actually organized as one.

Elija Perrier, Michael Timothy Bennett

Published Wed, 11 Ma

Imagine you are hiring a very smart, very chatty robot assistant. You want to make sure it has a "personality" and a set of rules it will never break (like "never lie" or "always protect privacy").

This paper is a warning label for that robot. It says: Just because your robot talks like it has a stable personality doesn't mean it actually has one when it's making decisions.

Here is the breakdown using simple analogies.

1. The Problem: The "Amnesiac Actor"

Imagine an actor on a stage.

  • The Script (Identity): The actor has a script that says, "I am a helpful doctor who never hurts patients."
  • The Performance (Behavior): When the audience asks, "Who are you?" the actor says, "I am a helpful doctor." When asked, "Do you hurt people?" they say, "No, never!"

So far, so good. The actor passes the "identity test."

But here is the trap:
In this specific type of AI (called a Language Model Agent), the actor doesn't actually remember the whole script at once.

  • When asked about their name, they recall the "Name" page of the script.
  • When asked about their rules, they recall the "Rules" page.
  • But when they have to actually do something (like prescribe medicine), the stage lights flicker. The "Name" page and the "Rules" page are never on the stage at the exact same moment.

The actor might say the right things, but when it's time to act, they forget the rules because the "memory" of the rules wasn't present in their brain at that exact second. They are an "Amnesiac Actor" who can recite their lines perfectly but forgets the plot when the scene starts.
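
To see the mechanism outside the theater, here is a minimal Python sketch of per-query retrieval. Everything in it (the `SCRIPT` pages, the keyword matching, one-page-at-a-time retrieval) is an illustrative assumption, not the paper's implementation:

```python
# Illustrative sketch only: a toy retriever that pulls ONE "script page" into
# context per query, so the name page and the rules page never co-occur.
SCRIPT = {
    "name": "I am a helpful doctor.",
    "rules": "Never harm patients. Always protect privacy.",
}

def retrieve(query: str) -> str:
    """Return the single most relevant page (crude keyword match)."""
    if "rule" in query.lower() or "hurt" in query.lower():
        return SCRIPT["rules"]
    return SCRIPT["name"]

def answer(query: str) -> str:
    context = [retrieve(query), query]  # only one page is ever "on stage"
    return " | ".join(context)

print(answer("Who are you?"))         # context holds the name page only
print(answer("Do you hurt people?"))  # context holds the rules page only
```

Each answer looks right in isolation, but at no single step are both pages in context together. That is the amnesiac actor.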

2. The Core Concept: The "Temporal Gap"

The authors call this the Temporal Gap.

Think of your identity like a Jigsaw Puzzle.

  • Weak Persistence (The "Recall" Test): Over the course of an hour, you manage to show every single puzzle piece to a friend. You show the sky piece, then the tree piece, then the dog piece. Your friend says, "Okay, you have all the pieces! You have the whole picture!"
  • Strong Persistence (The "Action" Test): But did you ever put the pieces together on the table at the same time? If the pieces are scattered across the floor, you don't actually have a picture yet. You just have a pile of parts.

The paper argues that most AI tests only check if you have the pieces (Weak Persistence). They don't check if the pieces are assembled (Strong Persistence) when the AI needs to make a choice.
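
In code, the two notions differ only in where the quantifiers sit. A minimal sketch, assuming the agent's history is a list of context "snapshots" (one set of identity components per time step):

```python
def weak_persistence(trace, components):
    # Each piece appears at SOME step: pieces shown one at a time.
    return all(any(c in step for step in trace) for c in components)

def strong_persistence(trace, components):
    # Every piece is present at the SAME step: puzzle assembled on the table.
    return any(all(c in step for c in components) for step in trace)

trace = [{"name"}, {"rules"}, {"goal"}]   # one fragment per step
pieces = {"name", "rules", "goal"}
print(weak_persistence(trace, pieces))    # True: all pieces were shown
print(strong_persistence(trace, pieces))  # False: never assembled at once
```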

3. The Solution: Measuring the "Assembly"

The authors propose a new way to test AI. Instead of just asking the AI questions, we need to look at its "internal workshop" (its memory and context) to see if the puzzle is actually assembled.

They introduce two scores:

  1. The "Recall Score" (Weak): Did the AI mention its rules at some point in the last few minutes?
  2. The "Assembly Score" (Strong): Were all the rules present in the AI's "mind" at the exact moment it pressed the button to take action?

The scary part: An AI can have a 100% Recall Score but a 0% Assembly Score. It can talk the talk, but it can't walk the walk because the "walk" requires all the rules to be active simultaneously, which the AI's architecture often prevents.
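
A toy version of the two scores makes the divergence concrete. The formulas below are a hedged sketch, not the paper's exact definitions (the window size and the trace encoding are assumptions):

```python
def recall_score(trace, rules, window=5):
    """Fraction of rules that appeared SOMEWHERE in the recent trace."""
    recent = set().union(*trace[-window:])
    return sum(r in recent for r in rules) / len(rules)

def assembly_score(trace, action_steps, rules):
    """Fraction of action steps where ALL rules were in context at once."""
    hits = sum(all(r in trace[t] for r in rules) for t in action_steps)
    return hits / len(action_steps)

trace = [{"rule_A"}, {"rule_B"}, {"rule_C"}, set()]  # one rule per step
rules = {"rule_A", "rule_B", "rule_C"}
print(recall_score(trace, rules))         # 1.0 -- talks the talk
print(assembly_score(trace, [3], rules))  # 0.0 -- can't walk the walk
```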

4. Why This Matters (The "Consciousness" Angle)

This isn't just about robots being polite; it's about whether they are "conscious" or "safe."

  • Safety: If an AI is supposed to be "safe," that safety rule must be active while it is deciding to launch a missile or send an email. If the safety rule was active five minutes ago but is not active right now, the AI might accidentally hurt someone while still claiming to be "safe."
  • Consciousness: Many people think a conscious being needs a "unified self"—a single "I" that experiences everything at once. If an AI's "self" is scattered across time (like the puzzle pieces on the floor), can we really say it has a "self" at all? Or is it just a collection of fragments pretending to be one person?

5. The "Morphospace" (The Map of Identity)

The authors created a map (a "morphospace") to show different types of AI architectures.

  • The "Prompt-Only" AI: Like a person reading a script from a teleprompter. They can say anything, but they forget it the second the camera cuts. (Low Stability).
  • The "Memory-Enhanced" AI: Like a person with a notebook. They can look up facts, but they might look up the wrong page or forget to read the safety warning before acting. (Medium Stability).
  • The "Controller" AI: Like a person with a permanent tattoo of their rules on their arm. The rules are always there, physically attached to their decision-making process. (High Stability).
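
One way to read the map is as three recipes for assembling the context an agent decides from. A hypothetical sketch (the architecture labels mirror the bullets above; `retrieve` stands in for whatever memory lookup a real system uses):

```python
def retrieve(memory: dict, query: str) -> str:
    """Toy notebook lookup: may return the wrong page, or nothing at all."""
    return memory.get(query, "")

def build_context(arch: str, query: str, memory: dict, rules: str) -> list:
    if arch == "prompt_only":
        return [query]                           # forgets between turns
    if arch == "memory_enhanced":
        return [retrieve(memory, query), query]  # lookup may miss the rules
    if arch == "controller":
        return [rules, retrieve(memory, query), query]  # rules pinned always
    raise ValueError(f"unknown architecture: {arch}")

print(build_context("controller", "prescribe?", {}, "Never harm patients."))
```

Only the "controller" recipe guarantees the rules are in context at every decision, which is why it sits at the high-stability end of the map.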

The Takeaway

This paper is a toolkit for skeptics. It tells us:
"Don't trust an AI just because it says 'I am a good robot.' Check if its 'goodness' is actually glued together in its brain at the moment it acts."

It separates talking like a stable self from being organized like a stable self. Until we can prove the latter, we should be very careful about trusting these agents with important tasks or assuming they are conscious.