Imagine you have a magical, slightly eccentric translator named "The Echo." You give it a sentence, and it whispers back a slightly different version. Then, you take that new version, feed it back to the Echo, and ask for another version. You keep doing this over and over again.
What happens to the sentence after 50 rounds? Does it stay the same? Does it get weirder? Does it eventually turn into gibberish, or does it get stuck in a loop?
This paper, titled "Markovian Generation Chains in Large Language Models," is essentially a study of what happens when we play this "Telephone Game" with AI, but with a scientific twist. The authors treat the AI not just as a tool, but as a character in a game of chance.
Here is the breakdown using simple analogies:
1. The Core Concept: The "No-Memory" Game
Usually, when we talk to an AI, we have a conversation. The AI remembers what we said five minutes ago.
In this experiment, the authors strip the AI of its memory. Every time they ask for a rewrite, they only give the AI the immediate previous sentence and a set of instructions (like "rewrite this"). They don't let the AI see the original sentence or the history of the conversation.
- The Analogy: Imagine a game of "Telephone" where the person in the middle is blindfolded and has amnesia. They only hear the person right next to them, repeat it back, and then forget everything else. The paper calls this a Markovian Generation Chain. It's a chain of events where the future depends only on the present, not the past.
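The memoryless loop can be sketched in a few lines of Python. The `rewrite` function below is a toy stand-in (a random synonym swap) for the real model call, which is an assumption of this sketch; the key point is that each step only ever sees the previous sentence, never the original:

```python
import random

# Toy stand-in for one LLM rewrite step (hypothetical: a real chain
# would call a language model here). A random synonym swap is enough
# to show the mechanics.
SWAPS = {"begin": "start", "start": "begin",
         "prologue": "preface", "preface": "prologue"}

def rewrite(sentence: str, rng: random.Random) -> str:
    words = sentence.split()
    i = rng.randrange(len(words))
    words[i] = SWAPS.get(words[i], words[i])
    return " ".join(words)

def run_chain(seed_sentence: str, steps: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    history = [seed_sentence]
    for _ in range(steps):
        # Markov property: only history[-1] is passed in; the chain
        # never sees the original sentence or any earlier version.
        history.append(rewrite(history[-1], rng))
    return history

chain = run_chain("we begin with a prologue", 50)
```

Swapping the toy `rewrite` for an actual model call (prompt: "rewrite this" plus the previous sentence only) reproduces the paper's setup.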
2. The Two Modes of the Game: The Robot vs. The Gambler
The authors tested two different ways the AI decides what to say next.
Greedy Decoding (The Robot): The AI always picks the single most probable next word, which makes the process fully deterministic.
- What happens: The sentence gets stuck very quickly. It's like a ball rolling down a hill and landing in a small hole. Once it hits the bottom, it just sits there, or bounces back and forth between two very similar spots.
- The Result: The text becomes repetitive. After a few turns, you get the exact same sentence, or two sentences that swap back and forth forever. The "diversity" dies.
Sampling-Based Decoding (The Gambler): The AI is allowed to take a risk. It picks words based on probability, but sometimes it picks a less common word to add variety (controlled by a "temperature" setting).
- What happens: The sentence keeps changing. It's like a drunk person walking through a maze. They might wander around for a long time, exploring new paths, before they eventually get stuck in a loop.
- The Result: The text stays fresh for much longer. You get many unique versions of the sentence before it finally repeats.
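The two modes can be contrasted on a toy next-word distribution. The scores below are made up for illustration, not taken from a real model; the Robot returns the same token every time, while the Gambler spreads its picks:

```python
import math
import random

# Hypothetical next-token scores (logits), invented for this sketch.
logits = {"the": 2.0, "a": 1.5, "some": 0.5, "zany": -1.0}

def greedy(logits: dict) -> str:
    # The Robot: always take the single highest-scoring token.
    return max(logits, key=logits.get)

def sample(logits: dict, temperature: float, rng: random.Random) -> str:
    # The Gambler: turn scores into probabilities (softmax), with
    # temperature flattening (hot) or sharpening (cold) the distribution.
    scaled = {t: s / temperature for t, s in logits.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / z for t, s in scaled.items()}
    return rng.choices(list(probs), weights=list(probs.values()))[0]

rng = random.Random(0)
greedy_picks = {greedy(logits) for _ in range(100)}            # one token
sampled_picks = {sample(logits, 1.5, rng) for _ in range(100)}  # several
```

Run repeatedly, `greedy_picks` never changes, while `sampled_picks` contains a mix of tokens, which is exactly the determinism-versus-variety trade-off described above.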
3. The "Drift" and the "Loop"
The paper discovered two main behaviors:
- The Loop (Recurrent Set): Eventually, almost every sentence will repeat. The AI might say, "We begin with a prologue," then "We start with a prologue," then back to "We begin with a prologue." It gets trapped in a tiny cycle.
- The Drift (Transient Phase): Before it gets stuck, the sentence wanders. If you use the "Gambler" mode, the sentence might drift far away from the original meaning, or it might just get more and more elaborate.
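Given a recorded chain of sentences, splitting it into its transient "drift" phase and its recurrent loop is simple bookkeeping. Here is a sketch on hypothetical states A through D:

```python
def find_cycle(states: list):
    # Scan the chain and report when it first revisits a state:
    # everything before that revisit is the transient "drift" phase,
    # everything from the first visit onward is the recurrent loop.
    seen = {}
    for step, state in enumerate(states):
        if state in seen:
            return {"transient_length": seen[state],
                    "cycle_length": step - seen[state]}
        seen[state] = step
    return None  # no repeat observed in this window

# Hypothetical chain: drifts through A and B, then loops C <-> D.
chain = ["A", "B", "C", "D", "C", "D", "C"]
info = find_cycle(chain)
```

For this chain, `find_cycle` reports a transient phase of length 2 (A, B) and a cycle of length 2 (C, D). In the paper's terms, sampling-based decoding tends to lengthen the transient phase before the cycle begins.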
The Temperature Analogy:
Think of the "Temperature" setting like the heat in a room.
- Low Heat (Cold): The molecules (words) move slowly. They settle into a neat, rigid pattern quickly. (Greedy decoding).
- High Heat (Hot): The molecules are bouncing around wildly. They take longer to settle, creating more chaos and variety before they finally calm down.
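In code, temperature is just a divisor applied to the model's scores before they become probabilities via softmax. A minimal illustration, with made-up scores:

```python
import math

def softmax(scores: list, temperature: float) -> list:
    # Divide every score by the temperature, then normalize with the
    # standard exp-and-divide (softmax); subtracting the max keeps the
    # exponentials numerically stable.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

scores = [2.0, 1.0, 0.0]          # hypothetical token scores
cold = softmax(scores, 0.1)       # near one-hot: the top token dominates
hot = softmax(scores, 5.0)        # near uniform: every token stays in play
```

At low temperature the top token soaks up nearly all the probability (the "cold" rigid pattern), while at high temperature the distribution flattens toward uniform (the "hot" chaos).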
4. Why Does This Matter?
You might ask, "Who cares if an AI repeats itself after 50 tries?"
The authors point out that this is exactly how the real world is starting to work:
- The "Broken Telephone" of the Future: Imagine a news article written by an AI, then rewritten by another AI for a different audience, then summarized by a third AI, and so on. If these AIs don't have memory of the original source, the story could slowly mutate, lose its meaning, or get stuck in a weird loop of nonsense.
- Multi-Agent Systems: We are building systems where AI agents talk to other AI agents. If Agent A talks to Agent B, and Agent B talks to Agent C, and they all forget the original context, the conversation could degrade rapidly.
5. The Big Takeaway
The paper's core finding is that repeatedly feeding AI output back into AI input is risky.
- If you want consistency and safety, you use "Greedy" mode, but you risk the text getting boring and stuck in a loop.
- If you want creativity and variety, you use "Sampling" mode, but the text might drift away from the original meaning or take a very long time to settle.
In a nutshell:
This paper is a warning label for the future of AI. It tells us that if we let AI talk to itself too many times without a human in the loop to check the facts, the conversation will either get stuck in a boring loop or drift into a hallucination. It's a study of how information degrades (or evolves) when passed through a machine that has no memory of where it started.