Imagine you are trying to teach a robot how to play a video game, like FrozenLake (where you slide a character across ice to a goal without falling into holes).
Usually, when we teach robots, we use a method called "Imitation Learning." It's like showing the robot a video of a human playing the game and saying, "Do exactly what they did." The robot memorizes the specific moves: "When I'm at square A, go right. When I'm at square B, go down."
The Problem: This approach is fragile. If you change the game slightly—say, you move the goal to a different spot or add a new hole—the robot gets confused. It's like a student who memorized the answers to a math test but doesn't understand the math itself. If the numbers change, they fail.
The Solution: This paper proposes a smarter way. Instead of just memorizing moves, the system tries to discover the rules of the game itself. It looks at the game logs and asks: "What are the underlying laws that make this game work?"
Here is how they did it, broken down into simple concepts:
1. The "Detective" Phase (Finding the Functions)
Imagine you are a detective looking at a series of photos of a moving car. You see the car at position 10, then position 11, then 12.
- Old way: You just note "Car was at 10, then 11."
- This paper's way: The system acts like a detective using a special tool (called SyGuS) to figure out the mechanism. It realizes: "Ah! The car isn't just moving randomly; it's following a rule:
New Position = Old Position + 1."
The system automatically discovers these "rules of motion" (like adding 1, subtracting 1, or comparing coordinates) without anyone telling it what they are. It figures out that the player moves by +1 or -1 and that holes are static obstacles.
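The detective step can be pictured with a toy sketch. This is not the paper's actual SyGuS solver — real SyGuS tools search a user-supplied grammar systematically — but it shows the core idea: enumerate candidate rules and keep only those consistent with every observed transition in the game logs.

```python
# Toy sketch of SyGuS-style rule discovery (illustrative, not the paper's tool):
# keep the candidate update rules that explain every observed transition.

# Observed (old_position, new_position) pairs from the game logs.
transitions = [(10, 11), (11, 12), (12, 13)]

# A tiny hand-written "grammar" of candidate rules. A real SyGuS solver
# generates these systematically from a formal grammar.
candidates = {
    "pos + 1": lambda p: p + 1,
    "pos - 1": lambda p: p - 1,
    "pos":     lambda p: p,
    "2 * pos": lambda p: 2 * p,
}

consistent = [
    name for name, rule in candidates.items()
    if all(rule(old) == new for old, new in transitions)
]

print(consistent)  # only "pos + 1" survives
```

With three data points, only the "+1" rule fits — exactly the "rule of motion" the detective was after.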
2. The "Storyteller" Phase (The New Language)
Once the system knows the rules of motion, it needs to write a "specification" (a set of instructions) for the robot.
- The Old Language (LTL): This is like writing a story using only "Yes/No" switches. To say "Don't fall in the hole," you'd have to list every single hole coordinate: "Never be at (1,1) OR (0,3) OR (3,2)." This is clumsy, and it breaks the moment you add a new hole.
- The New Language (TSLf): This is like writing a story using variables and relationships. The system writes: "Always stay away from any coordinate that matches a hole."
- It's the difference between memorizing a phone book (Old) and understanding the concept of "dialing a number" (New).
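Here is the same contrast as two Python predicates over a trace of player positions (the hole coordinates and function names are illustrative, not the paper's notation):

```python
# Hedged sketch: the same safety idea written two ways, as predicates
# over a trace (a list of visited positions).

holes = {(1, 1), (0, 3), (3, 2)}

def safe_old_style(trace):
    # "LTL-style": every hole is hard-coded into the formula itself.
    # Add a new hole and you must rewrite the specification.
    return all(pos != (1, 1) and pos != (0, 3) and pos != (3, 2)
               for pos in trace)

def safe_new_style(trace, holes):
    # "TSLf-style": one parametric rule over whatever holes exist.
    # Change the board and the same specification still applies.
    return all(pos not in holes for pos in trace)

trace = [(0, 0), (0, 1), (1, 2)]
print(safe_old_style(trace), safe_new_style(trace, holes))  # True True
```

The first function is the phone book; the second is the concept of dialing.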
3. The "Teacher" Phase (Mining the Rules)
The system looks at examples of winning games (positive traces) and losing games (negative traces).
- It sees that in winning games, the player eventually reaches the goal.
- It sees that in losing games, the player hits a hole.
- It combines these observations into a master rule: "Eventually reach the goal, BUT always avoid anything that looks like a hole."
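A minimal sketch of that mining step, again in illustrative Python (the paper mines TSLf formulas, not Python functions): a candidate master rule is kept only if it is satisfied by every winning trace and violated by every losing trace.

```python
# Hedged sketch of the "teacher" step: check that the candidate rule
# "eventually reach the goal AND always avoid holes" separates the
# winning traces from the losing ones. Coordinates are made up.

goal = (3, 3)
holes = {(1, 1), (2, 3)}

def satisfies(trace):
    eventually_goal = any(pos == goal for pos in trace)
    always_safe = all(pos not in holes for pos in trace)
    return eventually_goal and always_safe

positive = [[(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]]
negative = [[(0, 0), (1, 0), (1, 1)]]  # steps into a hole

# Keep the rule only if it separates the two sets of traces.
assert all(satisfies(t) for t in positive)
assert not any(satisfies(t) for t in negative)
print("candidate rule separates winning from losing traces")
```

A rule that passed this check on the logs becomes the robot's specification.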
4. The Result: A Super-Adaptable Robot
When the researchers tested this, the results were impressive:
- Sample Efficiency: The system learned the game with very few examples (sometimes as few as 20). The "memorizing" robots needed thousands of examples to get even close.
- Generalization: When they changed the game (moved the holes, made the grid bigger, or even changed the physics so the player moved differently), the system didn't break. Because it learned the logic (e.g., "avoid holes"), it could apply that logic to a completely new board. The memorizing robots failed immediately.
A Creative Analogy: The Chess Player
- The Old Way (Imitation Learning): You show a robot 1,000 videos of a Grandmaster playing chess. The robot memorizes: "If the Knight is on B1, move to C3." If you change the board setup, the robot panics because it has never seen this specific setup before.
- This Paper's Way: The robot watches the videos and figures out the rules of chess: "Knights move in an L-shape," "You lose if your King is captured," and "You win if you checkmate."
- Now, if you put the pieces on a 10x10 board or change the starting positions, the robot still knows how to play because it understands the principles, not just the specific moves.
Why This Matters
This paper is a step toward Symbolic Reinforcement Learning. Instead of just "guessing" the right move based on trial and error (like a neural network), the AI builds a formal model of the world. It learns the "laws of physics" and "laws of logic" of its environment.
This makes AI more robust, requires less data to learn, and allows it to adapt to new situations instantly—just like a human who understands the rules of a game can walk into a new version of that game and start playing immediately.