Viral non-coding RNA structure annotation and API-based data retrieval with Rfam and R2DT

This paper presents computational protocols and practical examples for automating viral non-coding RNA annotation and programmatically retrieving Rfam data via its RESTful API, while leveraging R2DT to generate comprehensive 2D structure visualizations for integration into bioinformatics and machine learning workflows.

Original authors: Muston, P., Triebel, S., Nawrocki, E., Ontiveros-Palacios, N., Jandalala, I., Sweeney, B., Bateman, A., Marz, M., Petrov, A. I., Madrigal, P.

Published 2026-05-14

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine the world of viruses as a massive library of instruction manuals. Inside these manuals, there are special sections written in a secret code called "non-coding RNA." These sections don't tell the virus how to build proteins; instead, they fold into specific 3D shapes that act like tiny tools or switches, controlling how the virus operates.

This paper presents a set of computational protocols and worked examples, a kind of guidebook, to help scientists find and understand these secret sections. Here is how the paper breaks it down, using simple comparisons:

1. The Master Blueprint (Rfam)
Think of Rfam as a giant, highly organized encyclopedia of these RNA shapes. It doesn't just list the letters of the code; it provides the "family albums" for thousands of different RNA types. For every family, it shows the consensus shape its members share (like a standard blueprint) and the rules for how they fold. This encyclopedia is essential for scientists trying to figure out what these mysterious RNA shapes are doing in the new virus genomes they discover.

2. The Automated Detective (Annotation Protocols)
The paper presents a new "detective kit" for computers. Instead of a scientist manually reading through a virus's entire instruction manual to find these RNA shapes, this kit allows a computer to scan a whole viral genome automatically. It acts like a high-speed scanner that highlights every time it finds a known RNA shape, tagging it instantly so researchers know exactly where the important parts are.
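
The scanning step can be sketched in Python. This summary does not name the scanning software, so the sketch assumes Infernal's `cmscan` run against the Rfam covariance models, which is how Rfam documents genome annotation; the paper's exact protocol may differ. The file names (`Rfam.cm`, `Rfam.clanin`, `hits.tblout`, `genome.fa`) and the helper function names are illustrative assumptions.

```python
def fasta_length(fasta_text: str) -> int:
    """Count residues in FASTA text, ignoring header lines and whitespace."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def database_size_mb(total_residues: int) -> float:
    """Search-space size in megabases for cmscan's -Z option.

    Both strands are scanned, so the residue count is doubled.
    """
    return total_residues * 2 / 1_000_000


def build_cmscan_command(
    genome_fasta: str,
    total_residues: int,
    rfam_cm: str = "Rfam.cm",          # assumed local copy of the Rfam models
    clanin: str = "Rfam.clanin",       # assumed local copy of the clan file
    tblout: str = "hits.tblout",       # hypothetical output file name
) -> list[str]:
    """Build (but do not run) a cmscan command for one genome."""
    return [
        "cmscan",
        "-Z", f"{database_size_mb(total_residues):.6f}",
        "--cut_ga",        # use each family's curated gathering threshold
        "--rfam",          # Rfam's fast filter settings
        "--nohmmonly",     # always use the full covariance model
        "--tblout", tblout,
        "--fmt", "2",      # tabular format that records overlapping hits
        "--clanin", clanin,
        rfam_cm,
        genome_fasta,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Infernal is installed and `Rfam.cm` has been indexed with `cmpress`. Each row of the resulting table is one "highlighted" RNA element with its family and genome coordinates.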

3. The Magic Drawing Board (R2DT)
Once the computer finds these shapes, they need to be seen. The paper uses R2DT, an existing tool that works like a magic drawing board. You can feed it a single virus's code or a collection of different viruses (an alignment), and it automatically generates clear, easy-to-read 2D diagrams of the RNA structures. It turns complex, invisible folding patterns into visual maps that are far easier to inspect and compare.
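
As a rough sketch of the drawing step, the Python below writes sequences to FASTA and builds a command for R2DT's `draw` mode, which takes an input FASTA file and an output directory. It assumes a local or containerized R2DT installation where `r2dt.py` is on the path; the helper names and file names are hypothetical. Only single-sequence input is sketched, because this summary does not give the invocation used for alignments.

```python
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format name -> sequence pairs as FASTA text, wrapped at `width` columns."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def build_r2dt_draw_command(input_fasta: str, output_dir: str) -> list[str]:
    """Build (but do not run) an R2DT command that draws every sequence in a FASTA file."""
    return ["r2dt.py", "draw", input_fasta, output_dir]
```

A typical use would be to write `to_fasta(...)` to a file such as `hits.fasta`, then run the command with `subprocess.run(cmd, check=True)` and collect the diagrams from the output directory.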

4. The Direct Phone Line (The API)
Finally, the paper explains how to talk directly to the Rfam encyclopedia using a "phone line" called an API. Usually, you might have to visit a website and click through many pages to get data. The API lets computer programs dial Rfam directly. Researchers can ask specific questions like, "Send me the family details for this RNA," "Download the list of all similar sequences," or "Check if this new virus sequence matches any known family." The encyclopedia replies with the data in a machine-readable format ready for analysis.
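
The first two kinds of request can be sketched with Python's standard library. The URL patterns follow Rfam's public API documentation as best understood here and should be checked against the current docs before use; the function names are illustrative, and the accession format check (`RF` plus five digits) is an assumption added for safety. Sequence search is left out of the sketch.

```python
import json
import re
import urllib.request

RFAM_BASE = "https://rfam.org"
_ACCESSION = re.compile(r"RF\d{5}")  # assumed accession format, e.g. RF00360


def _check(accession: str) -> str:
    """Reject strings that do not look like an Rfam accession."""
    if not _ACCESSION.fullmatch(accession):
        raise ValueError(f"not an Rfam accession: {accession!r}")
    return accession


def family_json_url(accession: str) -> str:
    """URL for a family's summary record in JSON."""
    return f"{RFAM_BASE}/family/{_check(accession)}?content-type=application/json"


def seed_alignment_url(accession: str) -> str:
    """URL for a family's seed alignment in Stockholm format."""
    return f"{RFAM_BASE}/family/{_check(accession)}/alignment/stockholm"


def fetch_family(accession: str, timeout: float = 30.0) -> dict:
    """Download and parse the family record (performs a network request)."""
    request = urllib.request.Request(
        family_json_url(accession),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_family("RF00360")` would return the family record as a Python dictionary, which can then be fed straight into an analysis script. Keeping URL construction separate from the network call makes the request logic easy to test offline.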

In Summary
The paper is essentially a "How-To" guide for scientists. It teaches them how to use Rfam (the encyclopedia) and R2DT (the drawing board) together with a direct digital connection (the API) to automatically find, visualize, and study the hidden RNA structures inside viruses. This helps researchers plug this information directly into their own computer programs, compare different viruses, or use it to train artificial intelligence systems.
