geneML: Gene annotation across diverse fungal species using deep learning

The paper introduces geneML, a fast and open-source deep learning tool that significantly improves the accuracy, sensitivity, and biological completeness of gene and alternative transcript prediction across diverse fungal genomes compared to existing methods like BRAKER3 and AUGUSTUS.

Original authors: Vader, L., Harvey, C. J., Weber, T., Hon, L. S.

Published 2026-05-21


Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to read a massive, ancient library of books written in a strange, messy code. This library belongs to the world of fungi (mushrooms, molds, yeasts, etc.). Each book is a genome, and the "words" inside are genes. For a long time, scientists have struggled to figure out exactly where one word ends and another begins, especially because these fungal books are written in many different dialects and often have sentences that can be rearranged in multiple ways (called alternative splicing).

Enter geneML, a new digital assistant designed specifically to read these fungal books.

Here is how it works, using some simple comparisons:

1. The "Smart Reader" vs. The "Old Dictionary"

Previously, scientists used tools like BRAKER3 to find genes. Think of BRAKER3 as a very careful librarian who relies heavily on a physical dictionary (protein hints) to find words. It's good, but it sometimes misses words or gets confused by the messy handwriting.

geneML is like a super-smart reader who has studied thousands of fungal books and learned the patterns of the language itself using deep learning (a type of artificial intelligence). Instead of just looking up words in a dictionary, it understands the flow and structure of the sentences.

2. Catching More Words Without Making Mistakes

When the researchers tested geneML on nine different types of fungi, it did a better job than the old librarian.

  • The Score: Compared with BRAKER3, the overall accuracy score rose from about 65% to 67%.
  • The Magic: The real win was that geneML found more genes (it caught 69% of them, compared with BRAKER3's 64%) without making more mistakes. It didn't just guess randomly; it actually found hidden words that the old tool missed.
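To make "found more genes without making more mistakes" concrete, here is a small Python sketch of how such scores are commonly computed. Everything in it is an assumption for illustration: the gene coordinates are made up, the function name is hypothetical, and the exact-match rule is a simplification that may differ from how the paper scored its predictions.

```python
def sensitivity_and_precision(reference, predicted):
    """Compare predicted genes with a trusted reference set.

    A prediction counts as correct only if it matches a reference gene exactly
    (a simplifying assumption made for this sketch).
    """
    reference, predicted = set(reference), set(predicted)
    true_positives = len(reference & predicted)
    # Sensitivity: what share of the real genes did the tool find?
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Precision: what share of the tool's predictions were real genes?
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Made-up toy genes, written as (start, end, strand).
reference = {(100, 900, "+"), (1500, 2400, "-"), (3000, 3800, "+"), (4200, 5000, "+")}
old_tool = {(100, 900, "+"), (1500, 2400, "-"), (6000, 6500, "+")}
new_tool = {(100, 900, "+"), (1500, 2400, "-"), (3000, 3800, "+"), (6000, 6500, "+")}

for name, predictions in [("old tool", old_tool), ("new tool", new_tool)]:
    sens, prec = sensitivity_and_precision(reference, predictions)
    print(f"{name}: sensitivity={sens:.2f}, precision={prec:.2f}")
```

In this toy case both tools make one wrong call, but the new tool finds one extra real gene, so its sensitivity rises without any additional false predictions. That is the pattern the bullet points describe.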

3. Speed: The Fast Courier

You might think a super-smart AI would take forever to think, but geneML is surprisingly fast. It can read an entire fungal genome in about 6 minutes on a standard computer. That's like reading a whole novel in the time it takes to brew a strong cup of coffee.

4. Handling the "Twist" in the Story

Fungal genes are tricky because they can be "cut and pasted" in different ways to create different versions of the same story (this is called alternative splicing). Most tools struggle with this, but geneML is one of the few that can handle these twists.

  • When tested against real experimental data from a fungus called Fusarium graminearum, geneML correctly identified 41% of these different story versions.
  • The old tool (AUGUSTUS) only found 33%.
  • More importantly, geneML was more precise, meaning when it said it found a version, it was right 71% of the time, compared to the old tool's 49%.
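The two kinds of numbers above (how many versions were found, and how often a call was right) are often combined into a single F1 score, the harmonic mean of the two. The sketch below is plain arithmetic on the percentages quoted in this summary; the F1 values it prints are derived here for illustration and are not figures reported by the paper.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (0.0 if both are zero)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity) for alternative transcripts, as quoted above.
tools = {
    "geneML": (0.71, 0.41),
    "AUGUSTUS": (0.49, 0.33),
}

for name, (precision, sensitivity) in tools.items():
    print(f"{name}: F1 = {f1_score(precision, sensitivity):.2f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a tool cannot score well by guessing a lot (high sensitivity, low precision) or by being overly cautious (the reverse).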

5. Finding the Missing Pieces

Finally, the researchers used geneML to re-read a set of already "corrected" fungal books. They found that geneML spotted 15% more complete genes than the original annotations. It's like finding that a puzzle was missing a few corner pieces, and geneML was the one to spot them, making the final picture of the fungus much more complete and biologically accurate.

The Bottom Line:
geneML is a free, open-source tool that acts like a faster, sharper, and more attentive reader for fungal genomes. It finds more genes, handles complex sentence structures better, and gets through a whole genome in minutes. You can find it online at the GitHub link provided in the paper.
