Ancestral state reconstruction with discrete characters using deep learning

This study adapts the deep learning software phyddle to perform ancestral state reconstruction for discrete characters. It shows that phyddle matches Bayesian inference on simple models and small trees, and that it offers a viable alternative for complex models with intractable likelihoods. The approach is validated on empirical datasets including Liolaemus lizards and the 2014 Ebola outbreak.

Nagel, A. A., Landis, M. J.

Published 2026-03-21

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a detective trying to solve a cold case. You have a family tree of suspects (a phylogeny) and you know what the current suspects look like (their traits, like eye color or location). Your goal is to figure out what their great-great-grandparents looked like, even though those ancestors are long dead and left no photos. This is the problem of Ancestral State Reconstruction (ASR).

For decades, detectives have used a specific set of mathematical rules (called "likelihood-based methods") to solve this. These rules work great when the crime scene is simple. But if the crime scene is messy, complex, or involves rules that don't fit the standard math (like how a virus spreads through a city), those old rules break down. They get stuck because the math becomes too hard to solve.
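
To make the "old rules" concrete, here is a minimal sketch of a likelihood-based reconstruction. It is an illustration under stated assumptions, not the method or code used in the paper: the three-species tree, the tip traits, the rate of 0.3, and all function names are hypothetical, and the trait is assumed to flip between two states at one symmetric rate. The calculation works from the tips toward the root (Felsenstein's pruning algorithm) and then reports how probable each state is for the oldest ancestor.

```python
import math

def p_transition(same: bool, rate: float, t: float) -> float:
    """Probability of ending in the same (or the other) state after time t
    under a symmetric two-state Markov model with the given rate."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 + 0.5 * decay if same else 0.5 - 0.5 * decay

def partial_likelihoods(node, rate):
    """Return [L(state 0), L(state 1)] for the data below `node`.

    A tip is an int (its observed state). An internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if node == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, t in node:
        child_l = partial_likelihoods(child, rate)
        for s in (0, 1):
            # Sum over the unknown state at the child end of the branch.
            result[s] *= sum(
                p_transition(s == c, rate, t) * child_l[c] for c in (0, 1)
            )
    return result

def root_posterior(tree, rate, prior=(0.5, 0.5)):
    """Marginal posterior probability of each state at the root."""
    likes = partial_likelihoods(tree, rate)
    joint = [prior[s] * likes[s] for s in (0, 1)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical tree ((A:1, B:1):1, C:2) with observed traits A=0, B=0, C=1.
tree = [([(0, 1.0), (0, 1.0)], 1.0), (1, 2.0)]
posterior = root_posterior(tree, rate=0.3)
print(posterior)  # state 0 comes out as the more probable root state
```

This works because every step has a formula that can be written down. When the process that generates the tree and the traits has no such formula, this kind of calculation is not available.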

Enter the new detective: Deep Learning.

This paper adapts an existing deep learning tool called PHYDDLE to this problem. Think of PHYDDLE not as a rule-follower, but as a super-smart student who learns by watching thousands of practice cases. Instead of trying to solve a complex equation, it looks at patterns in the data and says, "I've seen this pattern before; in 90% of those cases, the ancestor was this way."

Here is a breakdown of how the authors tested this new detective, using simple analogies:

1. The Training Camp (Simulation)

Before sending PHYDDLE out to solve real crimes, the authors had to train it. They created a massive "training camp" with 500,000 fake family trees and fake evolutionary histories.

  • The Analogy: Imagine a video game where you play thousands of levels to learn the rules. PHYDDLE played these evolutionary levels over and over, learning to guess the past based on the present.
  • The Challenge: The authors had to make sure the training games were diverse enough. If they only trained PHYDDLE on small, simple trees, it would be terrible at solving cases with huge, complex trees. They had to teach it to handle trees of all shapes and sizes.
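
The idea of learning from simulated practice cases can be sketched in a few lines. This is a toy illustration, not PHYDDLE's pipeline or API: it assumes one fixed three-species tree, a two-state trait that flips at a made-up symmetric rate, and it uses a simple lookup table of counts as a stand-in for a neural network. All names and numbers here are hypothetical. The "student" simulates many cases where the true ancestor is known, then answers a new case by asking how often each ancestor state produced the same pattern at the tips.

```python
import math
import random
from collections import Counter, defaultdict

RATE = 0.3  # assumed rate of change between the two trait states

def evolve(state, t, rng):
    """Evolve a binary trait along a branch of length t (symmetric model)."""
    p_change = 0.5 - 0.5 * math.exp(-2.0 * RATE * t)
    return 1 - state if rng.random() < p_change else state

def simulate_case(rng):
    """Simulate one practice case on the fixed tree ((A:1, B:1):1, C:2)."""
    root = rng.randrange(2)          # true ancestral state, known in training
    inner = evolve(root, 1.0, rng)   # ancestor shared by A and B
    tips = (
        evolve(inner, 1.0, rng),     # species A
        evolve(inner, 1.0, rng),     # species B
        evolve(root, 2.0, rng),      # species C
    )
    return tips, root

def train(n_cases, seed=0):
    """Count how often each root state produced each tip pattern."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_cases):
        tips, root = simulate_case(rng)
        counts[tips][root] += 1
    return counts

def predict(counts, tips):
    """Estimated probability of each root state given a tip pattern."""
    seen = counts.get(tips, Counter())
    total = sum(seen.values())
    if total == 0:
        return None  # pattern never seen during training
    return [seen[s] / total for s in (0, 1)]

counts = train(200_000)
probs = predict(counts, (0, 0, 1))
print(probs)  # state 0 should come out as the more probable root state
```

With enough simulated cases, the table's answer approaches the exact probability under the same assumptions. The `None` branch shows the weakness the bullet above describes: a pattern that never appeared in training cannot be answered. A lookup table cannot generalize to large or varied trees, which is why a neural network and a diverse training set are needed in practice.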

2. The Test Drive (Simple vs. Complex)

The authors put PHYDDLE to the test in two scenarios:

  • Scenario A: The Simple Crime (Small Trees, Simple Rules)

    • The Setup: A small family tree with just a few branches and simple rules (like a coin flip determining a trait).
    • The Result: PHYDDLE performed almost perfectly, matching the results of the old, trusted mathematical methods.
    • The Takeaway: For simple cases, the new AI detective is just as good as the old-school math detective.
  • Scenario B: The Complex Crime (Big Trees, Messy Rules)

    • The Setup: Huge family trees with hundreds of branches, or complex rules where traits change depending on how fast species are born or die (like the Ebola virus spreading).
    • The Result: PHYDDLE still did a decent job, but it started to make more mistakes than the old math methods. As the trees got bigger, the AI got a bit "confused."
    • The Takeaway: The AI is great, but it's not magic. It struggles when the family tree gets too big or the rules get too complicated, likely because it hasn't seen every possible variation of a huge tree during training.

3. Real-World Cases (The Empirical Tests)

Finally, they used PHYDDLE on two real-life mysteries:

  • Case 1: The Lizards of South America (Liolaemus)

    • The Mystery: Did these lizards evolve in the high mountains (Andes) or the lowlands?
    • The Result: PHYDDLE's guess was very similar to the traditional method. It successfully mapped out where the lizard ancestors likely lived, showing that the AI can handle real biological data.
  • Case 2: The 2014 Ebola Outbreak

    • The Mystery: Where did the virus start, and how did it move between different districts in Sierra Leone?
    • The Twist: This is a "hard" problem. The virus spreads in a way that doesn't have a simple math formula (likelihood) to solve it. Traditional methods struggle here.
    • The Result: PHYDDLE was able to reconstruct the virus's journey. It inferred that the outbreak most likely started in the eastern region (labeled State 0 in the analysis) and spread outward. This is a big win because it tackled a problem that was previously very difficult to crack with standard math.

The Verdict: What Does This Mean?

Think of likelihood-based methods as a calculator. A calculator is incredibly precise and accurate, but it can only solve problems where you can write down a clear equation. If the equation is too messy, the calculator gives an error.

Think of Deep Learning (PHYDDLE) as a human expert. It might not be 100% perfect on every single calculation, but it can look at a messy, complex situation and make a very good guess based on experience and pattern recognition.

The Bottom Line:
This paper shows that we can now use AI to tackle evolutionary mysteries that were previously very difficult to crack because the math was too hard. While the AI isn't perfect yet (it gets a bit less accurate on very large trees), it opens the door to studying complex biological processes, like how diseases spread or how species adapt to changing environments, without getting stuck on the math.

It's like giving evolution a new pair of eyes that can see patterns in the chaos, helping us understand the history of life on Earth a little better.
