Bioinformatics sits at the intersection of biology and data science, using computational tools to decode the complexity of living systems. From mapping the human genome to tracking how viruses evolve, the field turns raw biological data into insights that drive modern medicine and research, and you do not need a supercomputer to understand the basics.

On Gist.Science, we process every new preprint in this category directly from bioRxiv so that you never miss a breakthrough. Our team provides both a plain-language explanation and a detailed technical summary for each paper, making new discoveries accessible to readers of any background.

Below are the latest bioinformatics papers added from bioRxiv, ready for you to explore with clarity and depth.

Metabarcode and transcriptome datasets of Pinus sylvestris to assess fungal phyllosphere and disease dynamics.

This paper presents a comprehensive dataset of ITS2 metabarcoding and RNA-seq profiles from 200 and 48 *Pinus sylvestris* genotypes, respectively, to investigate how host genotype influences foliar fungal communities and disease susceptibility in the context of Dothistroma needle blight.

Moore, B., Perry, A., Kaur, S., Crampton, B., Gurung, A., Beaton, J., Smith, V. A., Morris, J., Hedley, P. E., Nemeth, K., Barber, H., Cavers, S., Jones, S. 2026-05-18 💻 bioinformatics

Combining amino acid frequency and 1D convolutional neural network embeddings for the identification of protein-protein interactions using a random forest classifier

This study proposes a two-stage framework that combines amino acid frequency features with latent representations learned by a 1D convolutional neural network autoencoder, demonstrating that a random forest classifier trained on this hybrid feature set significantly improves the accuracy of predicting protein-protein interactions compared to using frequency features alone.

Sindhi, N. A., Pawar, N., Dixson, J., Garcia, D. 2026-05-18 💻 bioinformatics

Genome-wide computational prediction of miRNAs encoded by influenza A virus (H3N2) predicts target genes involved in pulmonary and antiviral innate immunity

This study employs a genome-wide computational pipeline to predict influenza A virus (H3N2)-encoded miRNAs and their target genes, revealing a network of host genes involved in pulmonary and antiviral innate immunity that may clarify viral pathogenesis and suggest therapeutic targets.

Siddiqi, M. A., Kumar, H., Mazumder, M. 2026-05-18 💻 bioinformatics

KaryoScope: rapid, alignment-free sequence annotation for the pangenome era

KaryoScope is a rapid, alignment-free tool that enables base-resolution annotation of diverse genomic features across entire pangenome assemblies in minutes, effectively characterizing previously inaccessible variable regions like centromeres and subtelomeres to support comparative and clinical analysis.

Ranallo-Benavidez, T. R., Chen, Y.-A., Potapova, T. A., Alanko, J. N., Loucks, H., Lucas, J., Human Pangenome Reference Consortium, Guarracino, A., Puglisi, S. J., Marchet, C., Miga, K. H., Gerton, J (…) 2026-05-17 💻 bioinformatics

Hidden State Genomics: Graph-Based Analysis of Sparse Auto-Encoder Feature Activity in Genomic Language Models

This study employs sparse autoencoders and graph-based analysis to reveal that the Nucleotide Transformer v2 genomic language model encodes granular sequence syntax and local biophysical constraints rather than complex regulatory logic, explaining its strong performance on specific molecular tasks but weaker capabilities in broader regulatory inference.

Kmiec, E., O'Brien, S., McCoy, M. 2026-05-16 💻 bioinformatics

TAMIPAMI: Software and methods for PAM/TAM identification for CRISPR and OMEGA gene editing systems

This paper introduces TAMIPAMI, a streamlined experimental and computational framework that simplifies PAM/TAM identification for CRISPR and OMEGA systems by requiring only a single control library, utilizing a novel algorithm to define minimal degenerate motifs, and offering accessible web and command-line tools for rapid characterization.

Orosco, C., Jain, P. K., Rivers, A. R. 2026-05-16 💻 bioinformatics