Telogenesis: Goal Is All U Need

Imagine you are a detective trying to solve a mystery in a giant, dark warehouse filled with 50 different rooms. You only have enough flashlight battery power to check one room at a time.

Your goal isn't just to "look around"; it's to figure out which room has changed the most recently. Maybe someone moved a box in Room 4, or a light flickered in Room 12. If you miss these changes, you fail.

The paper you shared, titled "Telogenesis: Goal Is All U Need," is about how a robot (or an AI) can decide which room to check next without being told what to do.

Here is the simple breakdown of their discovery:

1. The Old Way vs. The New Way

The Old Way (The "Coverage" Strategy):
Imagine a security guard who walks in a perfect circle, checking Room 1, then Room 2, then Room 3, and so on, no matter what.

Pros: Eventually, they see everything.
Cons: If a fire starts in Room 4, the guard might not get there for a long time if they are currently in Room 40. They are slow to react to changes.

The New Way (The "Telogenesis" Strategy):
Imagine a detective who doesn't have a list. Instead, they have an internal "hunch meter" that tells them where to look based on three feelings:

Ignorance (The "I don't know" feeling): "I've barely seen Room 12 at all, so I have almost no idea what's in it. I should check it."
Surprise (The "Wait, what?" feeling): "I looked at Room 5 yesterday and it was blue. Today it's red! Something changed! I need to look at it again immediately."
Staleness (The "It's been too long" feeling): "I haven't looked at Room 9 since breakfast. Even if I didn't see a change yesterday, it's been so long that it might have changed by now. I should check it."

The AI combines these three feelings into a single score. The room with the highest score gets the flashlight. This is Telogenesis: creating your own goals (where to look) from the inside out, rather than waiting for a boss to tell you.

2. The Big Surprise: It Depends on How You Measure Success

The researchers found something weird and fascinating about how we judge these detectives.

Scenario A: The "God's Eye View" Judge
If a judge looks at the entire warehouse and says, "You failed because you didn't know the color of the walls in 40 rooms," the Old Way (circular walking) wins. It's better at knowing the average state of everything.
Scenario B: The "Real World" Judge
But in the real world, the detective doesn't know what's in the rooms they haven't visited. The only thing that matters is: "How fast did you notice the fire?"
When the researchers measured success by speed of detection, the New Way (the hunch-based detective) crushed the Old Way.
- As the warehouse got bigger (more rooms), the Old Way got slower and slower.
- The New Way stayed about equally fast across the warehouse sizes they tested.

The Lesson: If you are limited in attention (like a human or a robot with limited battery), you shouldn't try to know everything. You should try to notice changes as fast as possible.

3. The "Magic" Discovery: Learning Without a Teacher

In the final experiment, the researchers made the detective even smarter. They let the detective learn which rooms change often.

The Setup: Some rooms were "chaotic" (changing every few minutes). Other rooms were "boring" (staying the same for hours). The detective was not told which was which.
The Result: The detective started paying more attention to the chaotic rooms and less to the boring ones.
How? The "Staleness" feeling (the "it's been too long" timer) automatically sped up for the chaotic rooms because they kept surprising the detective. The timer slowed down for the boring rooms.

The system taught itself the structure of the environment just by paying attention to its own confusion and surprise. It didn't need a teacher, a reward, or a manual. It just needed to notice what it didn't know.

The Takeaway

The title "Goal Is All U Need" is a play on a famous AI paper title ("Attention Is All You Need").

The authors are saying: You don't need an external boss to tell you what to do. If you have a brain that tracks what it doesn't know (ignorance), what surprises it (error), and what it's ignored for too long (staleness), you can generate your own goals.

In a world where we can't look at everything at once, the best strategy isn't to try to know everything. It's to listen to your own curiosity and rush toward the things that are most likely to have changed.

In short: Don't wait for a goal to be given to you. Your own confusion and curiosity are the goals you need to survive.

Here is a detailed technical summary of the paper "Telogenesis: Goal Is All U Need" by Zhuoran Deng et al.

1. Problem Statement

The central problem addressed is endogenous goal generation in autonomous agents. While Goal-Conditioned Reinforcement Learning (GCRL) allows agents to pursue diverse objectives, these goals are invariably specified externally. Biological agents, conversely, generate their own exploratory targets from internal cognitive states (e.g., uncertainty, surprise) without external reward signals.

The authors ask: Can attentional priorities—a minimal form of goal—emerge endogenously from an agent's internal cognitive state (epistemic gaps) without external rewards? Furthermore, they investigate how the choice of evaluation metrics (global error vs. local change detection) influences the perceived efficacy of such endogenous strategies in partially observable environments.

2. Methodology: The Telogenesis Framework

The authors propose Telogenesis (origin of purpose from within), a framework where a priority function $\pi_i(t)$ generates observation targets based on three types of epistemic gaps:

A. The Priority Function

For an agent maintaining a Bayesian world model over $N$ variables (observing $b \ll N$ per step), the priority score for variable $i$ at time $t$ is:
$\pi_i(t) = w_1 \tilde{\sigma}^2_i(t) + w_2 \tilde{S}_i(t) + w_3 (1 - e^{-\lambda \Delta t_i})$

Ignorance ($\tilde{\sigma}^2_i$): Normalized posterior variance. High when data is insufficient.
Surprise ($\tilde{S}_i$): Normalized prediction error ($|x_i - \hat{x}_i| / (\hat{\sigma}_i + \epsilon)$). Spikes when observations violate expectations, signaling model mismatch.
Staleness ($1 - e^{-\lambda \Delta t_i}$): A saturating function of time since the last observation. This is the key innovation: it generates priority for variables that have not been observed recently, based purely on temporal reasoning, preventing the agent from ignoring unobserved variables.

Selection Mechanism: Targets are selected via a softmax distribution over $\pi_i(t)$, controlled by a temperature parameter $\tau$ and an activation threshold $\theta$.
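To make the scoring and selection step concrete, here is a minimal Python sketch of the priority function and a softmax draw over it. The function names, default weights, and the way the threshold $\theta$ is applied (masking out variables whose priority is at or below it) are assumptions for illustration; the summary above only states that a temperature and a threshold exist, not how the paper implements them.

```python
import numpy as np

def priority(var_norm, surprise_norm, dt, w=(1.0, 1.0, 1.0), lam=0.1):
    """Score each variable as w1*ignorance + w2*surprise + w3*staleness."""
    var_norm = np.asarray(var_norm, dtype=float)
    surprise_norm = np.asarray(surprise_norm, dtype=float)
    dt = np.asarray(dt, dtype=float)
    # Staleness saturates toward 1 as time since last observation grows.
    staleness = 1.0 - np.exp(-lam * dt)
    return w[0] * var_norm + w[1] * surprise_norm + w[2] * staleness

def select_targets(pi, b, tau=0.5, theta=0.0, rng=None):
    """Sample up to b distinct targets from a softmax over priorities.

    Assumption: variables with pi <= theta are masked out; if none pass
    the threshold, all variables are eligible.
    """
    rng = np.random.default_rng() if rng is None else rng
    pi = np.asarray(pi, dtype=float)
    active = pi > theta
    if not active.any():
        active = np.ones_like(active)
    logits = np.where(active, pi / tau, -np.inf)
    logits -= logits[active].max()  # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    k = min(b, int(active.sum()))
    return rng.choice(len(pi), size=k, replace=False, p=p)

# Three variables with equal ignorance, no surprise, and increasing time
# since last observation: the most stale one gets the highest priority.
pi = priority([0.2, 0.2, 0.2], [0.0, 0.0, 0.0], [0, 1, 100])
targets = select_targets(pi, b=2, rng=np.random.default_rng(0))
```

Lowering `tau` makes the draw closer to a greedy argmax, while raising it approaches uniform random selection.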

B. Experimental Setup

The framework was validated across three experiments:

Minimal System: $N=6$ scalar variables with asymmetric noise. Regime switches occur every $T_r$ ticks. Compared against Random, Rotation (deterministic cycle), and Error-driven (greedy) baselines.
Liminal Environment: A modular, partially observable world ($N=16$, 4 modules) with heterogeneous dynamics and variable coupling.
Emergent Structure Learning: An extension where the staleness decay rate $\lambda$ is learnable per variable via a surprise-weighted exponential moving average, without supervision on the environment's volatility structure.

3. Key Contributions

Unified Priority Function: Formalized a scalar priority function decomposing cognitive deficits into Ignorance, Surprise, and Staleness.
Metric-Dependent Reversal: Identified a critical methodological finding where the superiority of a strategy depends entirely on the evaluation metric:
- Global Prediction Error: Coverage-based strategies (e.g., Rotation) appear optimal.
- Change Detection Latency: Priority-guided allocation (Telogenesis) dominates, with advantages scaling monotonically with environmental complexity.
Power Law Scaling: Demonstrated that detection latency follows a power law in the attention budget $b$, with a larger exponent magnitude under priority-guided allocation ($L \propto b^{-0.55}$) than under rotation ($L \propto b^{-0.40}$), yielding higher marginal returns on attention.
Unsupervised Structure Recovery: Showed that making $\lambda$ learnable allows the system to spontaneously recover the latent volatility structure of the environment (differentiating high vs. low volatility variables) without external labels or rewards.

4. Key Results

A. Component Ablation

In the minimal system, all three components (Ignorance, Surprise, Staleness) were necessary.

Ignorance only: Performed no better than random selection.
Ignorance + Surprise: Improved focus on switching variables but failed for unobserved variables.
Full Function (with Staleness): Significantly outperformed Random and Ignorance-only baselines ($p < 10^{-6}$).

B. The Metric Reversal

Global Error: Under this metric (assuming omniscient access to all states), Rotation and Error-driven strategies often outperformed Telogenesis because they ensure uniform coverage or react directly to known errors.
Change Detection Latency: When measured by the time to detect a regime switch, Telogenesis significantly outperformed Rotation.
- As dimensionality ($N$) increased from 8 to 48, the performance gap widened: Cohen's $d$ moved from $-0.27$ (a small effect by the usual convention) to $-0.95$ (a large one), with negative values meaning lower latency for Telogenesis.
- Rotation latency scaled linearly with $N$ (must complete a full cycle), while Telogenesis latency remained approximately constant ($\sim 4$ ticks) because Staleness and Surprise directed attention to likely-changing variables.
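The linear scaling of Rotation can be seen from a simple idealized calculation. The sketch below assumes a round-robin observer that checks `budget` variables per tick, a change that occurs at a uniformly random point in the cycle, and detection on the first revisit. These assumptions and the function name are illustrative and are not taken from the paper's experimental setup.

```python
import math

def rotation_latency(n_vars, budget):
    """Return (worst_case, mean) ticks until a round-robin observer
    revisits a variable that has just changed.

    Assumes the change lands at a uniformly random phase of the cycle,
    so the wait is uniform on 1..cycle ticks.
    """
    cycle = math.ceil(n_vars / budget)  # ticks for one full sweep
    worst = cycle
    mean = (cycle + 1) / 2
    return worst, mean

for n in (8, 16, 48):
    print(n, rotation_latency(n, budget=1))
```

Under these assumptions both the worst case and the mean grow in proportion to $N$ for a fixed budget, which is the behavior the results above attribute to Rotation.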

C. Power Law in Attention Budget

For $N=48$, detection latency $L$ vs. budget $b$ followed:

Priority: $L \propto b^{-0.55}$
Rotation: $L \propto b^{-0.40}$
The steeper exponent for Telogenesis indicates that additional attention budget yields proportionally larger improvements in detection speed compared to fixed rotation strategies.
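As a quick worked example of what the two exponents imply, the snippet below computes the factor by which latency shrinks when the budget is multiplied by some amount, using only the reported exponents. The function name is hypothetical, and the calculation assumes the power law holds exactly over the range considered.

```python
def latency_ratio(budget_factor, exponent):
    """Factor by which latency changes when budget is scaled by
    budget_factor, under L proportional to b ** (-exponent)."""
    return budget_factor ** (-exponent)

priority_ratio = latency_ratio(2, 0.55)  # doubling the budget
rotation_ratio = latency_ratio(2, 0.40)
print(round(priority_ratio, 3), round(rotation_ratio, 3))
```

Doubling the budget leaves roughly 68% of the original latency under priority-guided allocation, compared with roughly 76% under rotation.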

D. Emergent Structure Learning (Experiment 3)

In the Liminal environment with heterogeneous volatility:

The system learned distinct decay rates $\lambda_i$ for each variable without supervision.
High-volatility variables converged to $\bar{\lambda}_{high} \approx 0.289$.
Low-volatility variables converged to $\bar{\lambda}_{low} \approx 0.202$.
Statistical significance: $t(49) = 22.5, p < 10^{-6}$.
The system self-organized to attend more frequently to volatile variables, effectively recovering the ground-truth environmental structure through the closed loop of priority $\to$ observation $\to$ error $\to$ $\lambda$ update.
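The summary does not give the exact update rule for $\lambda_i$, so the following is only a sketch of one plausible form of a surprise-driven exponential moving average. The function names, the learning rate `alpha`, the bounds `lam_min` and `lam_max`, and the linear mapping from surprise to a target rate are all assumptions, and the constants are not meant to reproduce the reported values of 0.289 and 0.202.

```python
def update_lambda(lam, surprise, alpha=0.05, lam_min=0.05, lam_max=1.0):
    """Move lambda toward a target that rises with normalized surprise.

    Assumed form: an exponential moving average with rate alpha toward
    a target interpolated linearly between lam_min and lam_max.
    """
    s = min(max(surprise, 0.0), 1.0)  # clamp surprise to [0, 1]
    target = lam_min + (lam_max - lam_min) * s
    return (1.0 - alpha) * lam + alpha * target

def run(surprises, lam0=0.2):
    """Apply the update once per observation, starting from lam0."""
    lam = lam0
    for s in surprises:
        lam = update_lambda(lam, s)
    return lam

# A variable that keeps surprising the agent versus one that rarely does.
lam_volatile = run([0.8] * 500)
lam_quiet = run([0.1] * 500)
```

In this toy version the frequently surprising variable ends up with the larger decay rate, so its staleness term saturates sooner and it is revisited more often, which mirrors the qualitative separation described above.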

5. Significance and Implications

Redefining Adaptation: The paper argues that in partially observable worlds, the primary adaptive challenge is not "minimizing global error" (which requires omniscience) but "detecting where error has appeared." Telogenesis optimizes for this agent-centric metric.
Endogenous Goal Formation: It demonstrates that epistemic gaps alone are sufficient to generate adaptive priority structures. This suggests a pathway toward goal formation in agents that do not rely on external reward functions.
Cognitive Architecture: The work proposes a distinct computational layer between the world model and policy, where attentional priorities drive observation selection.
Unsupervised Learning: The spontaneous recovery of environmental volatility structure suggests that systems with world models and endogenous goal mechanisms can learn latent environmental properties without supervision.

Conclusion: The paper concludes that "Goal is all u need." By formalizing the generation of attentional priorities from internal cognitive states (ignorance, surprise, staleness), agents can achieve superior adaptation in complex, partially observable environments without external rewards, provided the evaluation metric aligns with the agent's actual capabilities (change detection rather than global omniscience).