plsMD: A plasmid reconstruction tool from short-read assemblies

The paper introduces plsMD, a computational tool that reconstructs full plasmid sequences from short-read whole-genome sequencing data by integrating Unicycler assemblies with replicon and plasmid databases. The authors report that it outperforms existing methods in accuracy, enabling more robust phylogenetic and antimicrobial resistance tracking studies.

Lotfi, M., Jalal, D., Sayed, A. A. | 2026-03-18 | 💻 bioinformatics

usiGrabber: Automating the curation of proteomics spectra data at scale, making large datasets ready for use in machine learning systems

The paper introduces usiGrabber, a scalable and portable framework that automates the extraction and curation of large-scale proteomics spectra data from public repositories such as PRIDE. The authors demonstrate it by rapidly constructing a large phosphorylation dataset and using it to retrain a machine learning classifier, which reaches performance comparable to manually curated models.

Auge, G., Clausen, M., Ketterer, K. + 7 more | 2026-03-18 | 💻 bioinformatics

Hierarchical genomic feature annotation with variable-length queries

This paper introduces HKS, a data structure built on the Spectral Burrows-Wheeler Transform that enables exact, lossless hierarchical annotation of variable-length k-mers. It resolves multi-matches through a user-defined category hierarchy and improves specificity with a context-aware smoothing algorithm, achieving high accuracy in genomic feature assignment with performance comparable to existing tools such as Kraken2.

Alanko, J. N., Ranallo-Benavidez, T. R., Barthel, F. P. + 2 more | 2026-03-18 | 💻 bioinformatics

PREMISE: A Quality-Aware Probabilistic Framework for Pathogen Resolution and Source Assignment in Viral mNGS

The paper introduces PREMISE, a high-performance, quality-aware probabilistic framework that uses alignment-based Expectation-Maximization to overcome the limitations of k-mer methods. Applied to metagenomic sequencing data from Influenza A viruses, it enables accurate identification of viral subtypes, estimation of relative abundances, and detection of complex events such as reassortment and recombination.

Vijendran, S., Dorman, K., Anderson, T. K. + 1 more | 2026-03-18 | 💻 bioinformatics

GOTFlow: Learning Directed Population Transitions from Cross-Sectional Biomedical Data with Optimal Transport

GOTFlow is a framework that applies graph-constrained optimal transport in a learned latent space to infer directed, interpretable population transitions and molecular drivers from cross-sectional biomedical data. It is designed to overcome the limitations of existing methods in modeling non-linear, heterogeneous, and unbalanced biological dynamics.

Wright, G., Alzaid, E., Muter, J. + 2 more | 2026-03-18 | 💻 bioinformatics