Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories

This paper introduces a self-supervised, transformer-based approach that generates oracle trajectories by scrambling and unscrambling mathematical expressions to achieve near-perfect symbolic simplification in high-energy physics tasks, significantly outperforming prior reinforcement learning and regression methods.

David Shih

Published Fri, 13 Ma

Imagine you have a giant, messy pile of Lego bricks. Somewhere in that pile, there is a tiny, perfect, single Lego brick hidden. Your goal is to find that one perfect brick and get rid of all the junk.

This is essentially what symbolic simplification is in mathematics and physics. Scientists often start with a massive, complicated equation (the messy pile) that describes how particles interact. They know that, deep down, this equation should simplify into a tiny, elegant, and beautiful formula (the single perfect brick). But finding that path through the mess is incredibly hard because there are billions of ways to rearrange the pieces, and most of them lead to dead ends or even bigger messes.

This paper introduces a new way for computers to learn how to clean up these mathematical messes. Here is the breakdown of their clever approach:

1. The Problem: The "Unscramble" Puzzle

Think of a Rubik's Cube. If you twist it randomly, it gets messy. If you want to solve it, you need to know the specific sequence of moves to get back to the solved state.

  • The Old Way: Previous AI methods tried to learn by guessing. They would look at a messy equation and try to guess the simplified answer directly (like trying to solve a Rubik's Cube by painting the solved colors onto it instead of finding the moves). Or they used "Reinforcement Learning," where the AI tries millions of random moves, gets a "reward" only when it wins, and slowly learns. This is slow and often gets stuck.
  • The New Way: The author, David Shih, realized that while solving a puzzle is hard, making a puzzle is easy.

2. The Secret Sauce: "Scramble and Reverse"

Instead of teaching the AI how to solve a puzzle from scratch, the author taught it how to unscramble a puzzle he created himself.

Here is the step-by-step magic trick:

  1. Start Simple: Take a beautiful, simple equation (the "Goal").
  2. Scramble It: Randomly apply mathematical rules to make it messy and complicated. (Imagine taking a neat sentence and randomly swapping words, adding synonyms, and rearranging the grammar until it's a jumbled mess).
  3. Record the Steps: As you scramble it, write down exactly what you did.
  4. Reverse the Tape: Now, take that messy equation and play the recording backward. You now have a perfect "Oracle Trajectory"—a step-by-step guide showing exactly how to go from the mess back to the simple answer.
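The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual pipeline: the rewrite rules, the string-based expression encoding, and the move names here are invented stand-ins for whatever rewrite system the real training setup uses.

```python
import random

# Hypothetical invertible rewrite rules: each makes the expression
# messier, and we record the name of the inverse ("simplifying") move.
SCRAMBLE_RULES = [
    (lambda e: f"({e} + 0)", "drop_add_zero"),
    (lambda e: f"({e} * 1)", "drop_mul_one"),
    (lambda e: f"(2 * {e} - {e})", "collect_terms"),
]

def make_oracle_trajectory(simple_expr, n_steps, seed=0):
    """Scramble `simple_expr`, then return (messy_expr, moves), where
    `moves` is the recorded scramble played backward: a step-by-step
    guide from the mess back to the simple form."""
    rng = random.Random(seed)
    expr, inverse_moves = simple_expr, []
    for _ in range(n_steps):
        rule, inverse = rng.choice(SCRAMBLE_RULES)
        expr = rule(expr)              # Step 2: make it messier
        inverse_moves.append(inverse)  # Step 3: record how to undo it
    # Step 4: reverse the tape -- the last scramble is undone first.
    return expr, inverse_moves[::-1]

messy, trajectory = make_oracle_trajectory("x + y", n_steps=3, seed=42)
# Each (intermediate messy state, next move) pair then becomes one
# supervised training example for the model.
```

The key point the sketch captures: generating a (problem, solution-path) pair costs almost nothing, so millions of labeled trajectories can be produced without ever solving a hard puzzle.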

The AI is trained on millions of these "reverse tapes." It learns: "When I see this specific type of mess, the next best move is to do X." Because the AI learns the process of simplifying one step at a time, rather than guessing the whole answer at once, it becomes incredibly good at it.

3. The "Smart" AI Architecture

The AI uses a special type of neural network called a Transformer (the same technology behind tools like ChatGPT).

  • Permutation Equivariance: In math, the order of terms in an addition doesn't matter (A + B is the same as B + A). The AI is designed to understand this. It treats the equation like a bag of marbles rather than a line of marbles. It doesn't care if you shuffle the order; it still knows what to do.
  • The "Soft" Loss: Sometimes, there isn't just one "right" move. There might be three different moves that all lead to the same simplified result. The AI is taught to accept any of those valid moves as a success, rather than being penalized for picking a different valid path.
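The "soft" loss idea can be sketched as a cross-entropy taken against a *set* of acceptable moves rather than a single one-hot target. The move scores and set of valid moves below are made up for illustration; the paper's actual loss may differ in its details.

```python
import math

def soft_loss(logits, valid_moves):
    """Cross-entropy that accepts *any* valid move: sum the model's
    probability mass over all moves known to lead to the same simplified
    result, and penalize only the total. A standard one-hot target would
    punish the model for picking a different-but-equally-valid move."""
    # Numerically stable softmax over the move logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Probability of "some valid move", then its negative log-likelihood.
    p_valid = sum(probs[i] for i in valid_moves)
    return -math.log(p_valid)

# The model strongly prefers move 2; moves {1, 2} are both valid.
loss_soft = soft_loss([0.1, 1.0, 3.0], valid_moves={1, 2})
loss_hard = soft_loss([0.1, 1.0, 3.0], valid_moves={1})  # one-hot on move 1
assert loss_soft < loss_hard  # the soft loss doesn't punish a valid pick
```

The design choice this captures: when several moves are interchangeable, forcing the model toward one arbitrary "correct" label injects noise into training; crediting the whole valid set removes that noise.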

4. The Results: Beating the Experts

The author tested this on two very difficult physics problems:

  1. Dilogarithms: Complex functions that appear in quantum physics calculations.
  2. Scattering Amplitudes: Equations describing how particles collide and scatter.

The Scoreboard:

  • Previous Best AI: Got about 92% to 96% of the problems right.
  • This New AI: Got 99.9% of the problems right.

It didn't just get the easy ones right; it solved the hardest, most scrambled versions that confused the old models.

5. The "Super-Size" Challenge

To prove it was truly powerful, they tried to simplify a real-world physics problem that was too big for the AI to handle in one go: an equation with over 200 terms, while the AI was only trained on expressions with up to about 25 terms.

They used a "divide and conquer" strategy:

  • Contrastive Grouping: They broke the giant 200-term mess into smaller, manageable chunks (like sorting a huge pile of laundry into small baskets).
  • Beam Search: They let the AI explore multiple possible paths at once, keeping the best ones and discarding the dead ends.
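The beam-search part of the strategy can be sketched as follows. The move generator and scoring function are stand-ins for what the trained model would provide (here, "lower term count is better"); the toy "expressions" are just integers counting terms.

```python
def beam_search(start, propose_moves, score, beam_width=3, max_steps=100):
    """Keep the `beam_width` most promising partial simplifications at
    each step instead of committing to a single greedy path.
    `propose_moves(expr)` yields candidate next expressions (in practice,
    the transformer's top-ranked moves); `score(expr)` ranks candidates
    (lower is better). Both are assumed interfaces, not the paper's API."""
    beam = [start]
    best = start
    for _ in range(max_steps):
        candidates = []
        for expr in beam:
            candidates.extend(propose_moves(expr))
        if not candidates:
            break  # every path in the beam is a dead end
        candidates.sort(key=score)
        beam = candidates[:beam_width]  # keep the best, discard the rest
        if score(beam[0]) < score(best):
            best = beam[0]
    return best

# Toy demo: each "move" may shrink the expression by 1-2 terms or grow
# it by 3; greedy search could stall, but the beam keeps shrinking paths.
moves = lambda n: [n - 2, n - 1, n + 3] if n > 2 else []
shortest = beam_search(200, moves, score=lambda n: n)
assert shortest <= 2  # a ~200-term mess reduced to a couple of terms
```

The width-versus-depth trade-off is the point: a wider beam explores more alternative simplification paths per step at higher cost, which matters when many moves look equally promising locally but only some lead to the short final formula.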

The Result: They achieved a 100% success rate, turning massive, unwieldy Feynman diagram calculations into the famous, elegant "Parke-Taylor" formula.

Why This Matters

This paper shows that we don't need to teach AI to be a genius mathematician from scratch. Instead, if we teach it to recognize patterns in how things get messy, it can learn to clean them up perfectly.

It's like teaching a child to tidy their room not by showing them the final clean room, but by showing them how to pick up one sock, then one shirt, then one toy, over and over again. Eventually, they learn the habit of cleaning, and they can clean a room they've never seen before.

This approach could revolutionize how physicists calculate particle interactions, potentially leading to new discoveries in the universe by making the math manageable again.