A New Information Theoretic Approach Shows that Mixture Models Outperform Partitioned Models for Phylogenetic Analyses of Amino Acid Data

By applying the newly introduced marginal Akaike information criterion (mAIC) to diverse empirical datasets, this study demonstrates that mixture models universally outperform partitioned models for phylogenetic analyses of amino acid data, highlighting the importance of further developing mixture models for accurate evolutionary inference.

Ren, H., Jiang, C., Wong, T. K. F., Shao, Y., Susko, E., Minh, B. Q., Lanfear, R.

Published 2026-03-18

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to reconstruct the family tree of a massive, ancient clan. You have a huge pile of old letters (DNA or protein sequences) from hundreds of different relatives. The goal is to figure out who is related to whom and how they evolved over millions of years.

To do this, scientists use mathematical "models" to guess how these letters changed over time. For a long time, there were two main ways to build these models: Partitioned Models and Mixture Models.

This paper is like a referee blowing the whistle to settle a decades-long debate: Which method actually works better? And the answer is a resounding victory for the Mixture Models.

Here is the breakdown in simple terms:

1. The Two Contenders

The Partitioned Model: The "Strict Teacher"
Imagine a classroom where the teacher divides the students into groups based on their height.

  • Group A (Tall kids) gets a specific rule: "You can only wear blue shoes."
  • Group B (Short kids) gets a different rule: "You can only wear red shoes."
  • The Problem: In real life, a tall kid might sometimes wear red shoes, and a short kid might wear blue. The "Strict Teacher" forces everyone into a box, and if a student is misclassified, the wrong rule applies to them for good. In science, this is called "partitioning": you sort the sites of your sequence data into pre-defined buckets and apply one evolutionary model to each whole bucket, as the sketch below illustrates.
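
To make the bucket logic concrete, here is a minimal toy sketch in Python. The numbers are invented purely for illustration; real phylogenetic likelihoods come from substitution models evaluated on a tree, not from a lookup table.

```python
import math

# Hypothetical likelihoods of four alignment sites under two candidate
# models (invented numbers, purely for illustration).
site_likelihoods = {
    "model_A": [0.9, 0.2, 0.8, 0.1],
    "model_B": [0.1, 0.7, 0.3, 0.9],
}

# A partition scheme fixes each site's model up front.
partition = ["model_A", "model_A", "model_B", "model_B"]

# Total log-likelihood: each site contributes only the score of the
# model its bucket was assigned, whether or not that model fits it.
log_lik = sum(
    math.log(site_likelihoods[model][i])
    for i, model in enumerate(partition)
)
print(f"partitioned log-likelihood: {log_lik:.3f}")  # about -3.024
```

Notice that the third site actually fits model_A better (0.8 vs 0.3), but the bucket assignment locks it into model_B. That is the misclassified student.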

The Mixture Model: The "Flexible Chef"
Now, imagine a chef making a giant stew. Instead of separating ingredients into bowls first, the chef throws everything into one pot.

  • The chef knows that some ingredients (like carrots) behave one way, while others (like potatoes) behave differently.
  • The chef doesn't force the carrots to stay in a "carrot zone." Instead, the chef calculates the flavor of every single ingredient based on how it actually behaves in the pot.
  • The Advantage: It's flexible. It allows a specific site in the sequence to act like a "carrot" even if its neighbors act like "potatoes," with no pre-defined boxes required. The sketch below scores the same toy data the mixture way.
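
Here is the mixture version of the earlier sketch (again with invented numbers; in real analyses the weights are estimated from the data and the components are full substitution models):

```python
import math

# Same invented per-site likelihoods as in the partitioned sketch.
site_likelihoods = {
    "model_A": [0.9, 0.2, 0.8, 0.1],
    "model_B": [0.1, 0.7, 0.3, 0.9],
}
weights = {"model_A": 0.5, "model_B": 0.5}  # estimated from data in practice

# Each site's likelihood is a weighted average over ALL components,
# so no site is ever locked out of the model that suits it best.
log_lik = sum(
    math.log(sum(w * site_likelihoods[m][i] for m, w in weights.items()))
    for i in range(4)
)
print(f"mixture log-likelihood: {log_lik:.3f}")  # about -2.783
```

On these made-up numbers the mixture scores better (-2.783 vs -3.024) precisely because the partition misassigned one site. With a perfect partition the comparison could flip, which is exactly why a fair scorecard matters.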

2. The Big Problem: The "Ruler" Was Broken

For years, scientists tried to compare these two methods using a standard ruler called AIC (Akaike Information Criterion). Think of AIC as a scorecard that tells you which model fits the data best. Lower scores are better.
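
For the curious, the scorecard itself is one short, standard formula (this is textbook AIC, not something new to this paper):

```latex
% Akaike information criterion: trades goodness of fit against complexity.
% k = number of free parameters; \hat{L} = the model's maximized likelihood.
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

A better fit shrinks the second term, while every extra parameter adds two points of penalty, so the winner is the model that explains the data well without bloating itself.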

The Catch: The old ruler was biased!

  • It was designed to measure the "Strict Teacher" (Partitioned models).
  • When they tried to measure the "Flexible Chef" (Mixture models) with this same ruler, the scores were unfair. It was like trying to time a marathon runner with a stopwatch built for a snail.
  • Because of this broken ruler, scientists often thought the "Strict Teacher" was better, even when the "Flexible Chef" was actually doing a better job.

3. The New Solution: A Fair Ruler (mAIC)

The authors of this paper introduced a new, fair ruler called mAIC (marginal AIC).

  • This new ruler measures both the "Strict Teacher" and the "Flexible Chef" on the same scale.
  • That levels the playing field, so we can finally see which model actually fits the data better; the sketch below shows the comparison logic.
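
As a minimal sketch of that comparison logic, here is the ordinary AIC scorecard applied to two hypothetical fits (numbers invented for illustration). The paper's mAIC presumably swaps in a marginal likelihood, as the name suggests, so that both model types are scored on the same footing; see the paper for the exact definition.

```python
def aic(num_params: int, max_log_lik: float) -> float:
    """Standard AIC scorecard: lower is better."""
    return 2 * num_params - 2 * max_log_lik

# Hypothetical fits of the two contenders to the same alignment
# (invented numbers, for illustration only).
partitioned_score = aic(num_params=120, max_log_lik=-50_500.0)
mixture_score = aic(num_params=80, max_log_lik=-50_100.0)

print(f"partitioned: {partitioned_score:,.0f}")  # 101,240
print(f"mixture:     {mixture_score:,.0f}")      # 100,360
print("winner:", "mixture" if mixture_score < partitioned_score else "partitioned")
```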

4. The Results: The Chef Wins!

The researchers took nine massive datasets (representing insects, plants, fungi, bacteria, and ancient archaea) and ran them through both models using the new fair ruler.

The Outcome:

  • The Flexible Chef (Mixture Models) won every single time.
  • The "Strict Teacher" (Partitioned Models) was consistently outperformed.
  • The difference wasn't small; it was massive. In some cases, the Mixture Model was thousands of points better on the scorecard.

Why did the Chef win?
Real evolution is messy. Sites in a sequence don't always follow the neat rules of the "Strict Teacher." Sometimes a specific part of a protein behaves uniquely, regardless of which "bucket" you put it in. The Mixture Model captures this messy reality far more faithfully, while the Partitioned Model imposes a rigid structure that doesn't exist in nature.

5. The "Robustness" Test: Does the Tree Hold Up?

To be sure, the scientists didn't just look at the scorecard. They also did a "stress test."

  • They took the family tree and removed one relative at a time to see if the tree fell apart or stayed strong.
  • Result: Both methods were pretty good at keeping the tree standing, but the Mixture Models were slightly more consistent. (A generic version of this stress test is sketched below.)
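
The paper's exact protocol isn't detailed here, but the general shape of such a leave-one-out stress test looks like the sketch below; infer_tree and trees_agree are hypothetical placeholders standing in for a real phylogenetics toolkit.

```python
def stress_test(alignment, taxa, infer_tree, trees_agree):
    """Leave-one-out robustness check (a generic sketch, not the paper's
    exact method). infer_tree and trees_agree are hypothetical
    placeholders for functions from a real phylogenetics toolkit."""
    reference = infer_tree(alignment, taxa)
    survived = 0
    for taxon in taxa:
        # Drop one relative and rebuild the family tree without them.
        reduced = [t for t in taxa if t != taxon]
        candidate = infer_tree(alignment, reduced)
        # Did the key relationships survive the removal?
        if trees_agree(reference, candidate, ignoring=taxon):
            survived += 1
    # Fraction of removals the tree survived: higher means more robust.
    return survived / len(taxa)
```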

The Takeaway for Everyone

For a long time, scientists were using a broken ruler that made them think the "Strict Teacher" (Partitioned Models) was the best way to study evolution.

This paper says: "Stop using the old ruler. Switch to the new one (mAIC), and you'll see that the 'Flexible Chef' (Mixture Models) is the superior method."

What does this mean for the future?

  • Scientists should stop forcing their data into rigid boxes.
  • They should embrace the flexible, "mix-and-match" approach of Mixture Models.
  • This will lead to more accurate family trees of life, helping us understand how animals, plants, and bacteria actually evolved.

In short: Nature is too complex for rigid boxes. We need flexible models to understand it.
