Bioinformatics sits at the intersection of biology and data science, using computational tools to decode the complexity of living systems. From mapping the human genome to tracking how viruses evolve, the field turns raw biological data into insights that drive modern medicine and research, and you do not need a supercomputer to understand the basics.

On Gist.Science, we ensure you never miss a breakthrough by processing every new preprint in this category directly from bioRxiv. Our team provides both plain-language explanations and detailed technical summaries for each paper, making cutting-edge discoveries accessible to everyone regardless of their background.

Below are the latest bioinformatics papers added from bioRxiv, ready for you to explore with clarity and depth.

BTEXgenie: A curated and user-friendly tool for profile HMM-based substrate-specific annotation of BTEX degradation genes

BTEXgenie is a curated, user-friendly tool that utilizes custom profile hidden Markov models to achieve significantly higher sensitivity than existing databases in detecting substrate-specific genes involved in aerobic and anaerobic BTEX degradation, while also providing integrated pathway and genomic visualizations for environmental and comparative genomic studies.

Qu, J., Garber, A. I., Armbruster, C. R. | 2026-05-15 | 💻 bioinformatics

TDP-43 regulates chromatin looping and gene transcription through binding and stabilizing DNA G-quadruplex structures

This study reveals that TDP-43 regulates gene transcription and facilitates long-range chromatin looping by binding to and stabilizing DNA G-quadruplex structures at chromatin loop anchors, thereby providing a mechanistic explanation for gene dysregulation in diseases associated with TDP-43 dysfunction.

Yang, F., Zhang, S., Guo, X., Qiao, Y., Zhang, Y., Sun, H., Chen, X., Wang, H. | 2026-05-15 | 💻 bioinformatics

A modular Bayesian framework for inferring transmission networks from polyclonal infections, with application to Plasmodium falciparum

This paper introduces a modular Bayesian framework, exemplified by the Plasmotrack software for *Plasmodium falciparum*, that reconstructs directed transmission networks from polyclonal infections by accommodating multiple genetic sources and unobserved parents to estimate key public health metrics.

Murphy, M. R., Nielsen, R., Perkins, A., Greenhouse, B. | 2026-05-15 | 💻 bioinformatics

Viral non-coding RNA structure annotation and API-based data retrieval with Rfam and R2DT

This paper presents computational protocols and practical examples for automating viral non-coding RNA annotation and programmatically retrieving Rfam data via its RESTful API, while leveraging R2DT to generate comprehensive 2D structure visualizations for integration into bioinformatics and machine learning workflows.

Muston, P., Triebel, S., Nawrocki, E., Ontiveros-Palacios, N., Jandalala, I., Sweeney, B., Bateman, A., Marz, M., Petrov, A. I., Madrigal, P. | 2026-05-14 | 💻 bioinformatics
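
To give a feel for the programmatic Rfam retrieval mentioned in the summary above, here is a minimal sketch, not the paper's own protocol. It assumes the family endpoint pattern `https://rfam.org/family/<accession>?content-type=application/json`; check it against the current Rfam API documentation before relying on it. The helper names `rfam_family_url` and `fetch_family` are hypothetical.

```python
import json
import re
import urllib.request

# Assumed base URL for the Rfam website and its REST-style endpoints.
RFAM_BASE = "https://rfam.org"


def rfam_family_url(accession: str) -> str:
    """Build the (assumed) JSON endpoint URL for one Rfam family accession."""
    # Rfam family accessions have the form "RF" followed by five digits.
    if not re.fullmatch(r"RF\d{5}", accession):
        raise ValueError(f"not an Rfam family accession: {accession!r}")
    return f"{RFAM_BASE}/family/{accession}?content-type=application/json"


def fetch_family(accession: str, timeout: float = 30.0) -> dict:
    """Download and parse the family record. Requires network access."""
    with urllib.request.urlopen(rfam_family_url(accession), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL here; call fetch_family("RF00360") to hit the network.
    print(rfam_family_url("RF00360"))
```

Keeping URL construction separate from the network call makes the request easy to inspect and test offline.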

PXN Unlocks the Power of Public Gene Expression Data Through Cross-Technology Integration

The paper introduces PXN, a probabilistic machine learning framework that overcomes cross-platform incompatibility in public gene expression data by seamlessly translating diverse datasets (including bridging microarray and RNA-seq technologies) into a unified representation, thereby significantly enhancing the accuracy and statistical power of large-scale integrative biological analyses.

Sui, Z., Yu, D., Erdengasileng, A., Zhang, J., Qiu, X. | 2026-05-14 | 💻 bioinformatics

Cataloging cysteines in ECOD domains using a protein language model

The authors developed TriCyP, a protein language model-based tool that accurately predicts cysteine functional states (disulfide bonding, metal coordination, and free thiols) from predicted structures, enabling a proteome-scale catalog of 2.7 million cysteines across ECOD domains that reveals distinct biological patterns and identifies novel metal-binding families and potential protein-protein interactions.

Yuan, R. D., Durham, J., Cong, Q., Schaeffer, R. D. D. | 2026-05-14 | 💻 bioinformatics

A Context-Specific, Literature-Supported Framework for Validating Stress Response Differentially Expressed Gene Sets

This paper presents a context-specific framework that validates stress-response gene sets by leveraging protein-protein interaction networks restricted to differentially expressed genes, demonstrating that biologically supported "Principal Response" genes form significantly interconnected subnetworks across temperature conditions.

Frishman, B. A., Gonzalez, J. L., Forbes, V. E. | 2026-05-13 | 💻 bioinformatics
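
The idea of restricting a protein-protein interaction network to differentially expressed genes, as described in the summary above, can be illustrated with a small sketch. This is not the authors' implementation: the gene names and edges are made-up toy data, the function names are hypothetical, and the code shows only the restriction step plus a simple density measure, not the significance testing the paper reports.

```python
def induced_subnetwork(edges, degs):
    """Keep only interactions whose two endpoints are both in the DEG set."""
    degs = set(degs)
    # frozenset makes each edge undirected, so (a, b) and (b, a) collapse.
    return {frozenset((a, b)) for a, b in edges
            if a in degs and b in degs and a != b}


def edge_density(sub_edges, nodes):
    """Fraction of all possible node pairs that are connected by an edge."""
    n = len(set(nodes))
    possible = n * (n - 1) // 2
    return len(sub_edges) / possible if possible else 0.0


# Toy interaction list and DEG set (illustrative only, not from the paper).
ppi_edges = [
    ("HSP70", "HSP90"),
    ("HSP90", "HSF1"),
    ("HSF1", "ACT1"),
    ("ACT1", "TUB1"),
]
degs = {"HSP70", "HSP90", "HSF1"}

sub = induced_subnetwork(ppi_edges, degs)
density = edge_density(sub, degs)
print(len(sub), round(density, 3))
```

In practice, the observed density would be compared against randomly drawn gene sets of the same size to judge whether the subnetwork is more interconnected than expected by chance.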