Assessing the potential of bee-collected pollen sequence data to train machine learning models for geolocation of sample origin

This study shows that supervised machine learning models, specifically Random Forest and k-Nearest Neighbors, can accurately predict the geographic origin of samples from bee-collected pollen DNA metabarcoding data. The results indicate that raw sequence abundance is sufficient for reliable geolocation, without time-consuming taxonomic assignment.

Hayes, R. A., Kern, A. D., Ponisio, L. C. | 2026-04-01 | bioinformatics

VicMAG, an open-source tool for visualizing circular metagenome-assembled genomes highlighting bacterial virulence and antimicrobial resistance

The authors present VicMAG, an open-source visualization tool designed to comprehensively display circular metagenome-assembled genomes (cMAGs) with annotations for virulence factors, antimicrobial resistance genes, and mobile genetic elements, thereby facilitating holistic surveillance of bacterial pathogen spread in clinical and environmental settings.

Tsuda, Y., Tanizawa, Y., Vu, T. M. H. + 7 more | 2026-04-01 | bioinformatics

Protein Language Models Outperform BLAST for Evolutionarily Distant Enzymes: A Systematic Benchmark of EC Number Prediction

This study systematically benchmarks protein language models (PLMs) against BLAST for Enzyme Commission (EC) number prediction. Simple MLP classifiers match BLAST's performance on in-distribution proteins and significantly outperform it on proteins from evolutionarily distant organisms, suggesting that smaller PLMs with lightweight classifiers are both efficient and better suited to remote homology detection.

Sathyamoorthy, R., Puri, M. | 2026-04-01 | bioinformatics

Amino acid substitutomics: profiling amino acid substitutions at proteomic scale unveils biological implication and escape mechanism in cancer

This study introduces "AA substitutomics," a novel proteomic pipeline using the PIPI-C tool to identify widespread post-translational amino acid substitutions in cancer that are largely absent from genomic databases, thereby revealing critical biological implications and mechanisms of drug resistance and immune escape.

Zhao, P., DAI, S., Lai, S. + 3 more | 2026-03-31 | bioinformatics

Deep representation learning for temporal inference in cancer omics: a systematic review

This systematic review examines the application of deep representation learning, particularly Variational Autoencoders (VAEs), in cancer omics. It highlights their current dominance in subtyping and prognosis, identifies the scarcity of longitudinal data as a major barrier to modeling cancer's temporal dynamics, and proposes using VAEs as generative models to advance time-based cancer staging.

Prol-Castelo, G., Cirillo, D., Valencia, A. | 2026-03-31 | bioinformatics

Transcriptional Hysteresis and Irreversibility in Periodontitis Revealed by Single-Cell Latent Manifold Modeling

By integrating single-cell RNA sequencing with variational autoencoder modeling and agentic AI simulations, this study quantifies the transcriptional hysteresis and irreversible structural collapse in severe periodontitis, introducing a Regenerative Permission Index (RPI) that predicts the failure of biomaterial interventions in advanced disease states.

Yadalam, P. K. | 2026-03-31 | bioinformatics

Modeling gene regulatory perturbations via deep learning from high-throughput reporter assays

This paper introduces BlueSTARR, a retrainable deep learning framework that leverages whole-genome STARR-seq data to predict the regulatory effects of noncoding variants, revealing global signatures of purifying selection and demonstrating the model's ability to capture distance- and treatment-dependent transcription factor binding patterns.

Venukuttan, R., Doty, R., Thomson, A. + 10 more | 2026-03-31 | bioinformatics

Cell type composition drives patient stratification in single-cell RNA-seq cohorts

This study demonstrates that simple, interpretable cell-type composition metrics, particularly centered log-ratio (CLR)-transformed proportions, outperform complex computational methods for unsupervised patient stratification in single-cell RNA-seq cohorts, because they capture clinically relevant variation driven by cellular heterogeneity. It also introduces the open-source R package scECODA to facilitate this approach.

Halter, C., Andreatta, M., Carmona, S. | 2026-03-31 | bioinformatics
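
For readers unfamiliar with the centered log-ratio transform named in the summary above, here is a minimal sketch of the standard formula applied to one sample's cell-type counts. It is not taken from scECODA: the function name `clr_transform`, the pseudocount of 0.5 (used only to avoid taking the log of zero), and the example counts are all assumptions made for illustration.

```python
import math

def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's cell-type counts.

    The pseudocount is an illustrative assumption; it keeps zero counts
    from producing log(0).
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    # Log of each cell type's proportion within the sample.
    logs = [math.log(v / total) for v in shifted]
    # Subtracting the mean log is equivalent to dividing each proportion
    # by the geometric mean of all proportions before taking the log.
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical counts for three cell types in one patient sample.
sample_clr = clr_transform([120, 30, 50])
```

CLR values within a sample sum to zero, so each value reads as a cell type's abundance relative to the sample's geometric mean rather than as a raw proportion.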

Protein Language Model Decoys for Target Decoy Competition in Proteomics: Quality Assessment and Benchmarks

This study introduces protein language model-based decoys for proteomics target-decoy competition and benchmarks them against classical methods, finding that while they offer superior sequence-level indistinguishability and diagnostic value, they currently do not outperform traditional reverse decoys in overall search performance.

Reznikov, G., Kusters, F., Mohammadi, M. + 2 more | 2026-03-31 | bioinformatics
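
As background for the summary above, the following is a rough sketch of how a classical reverse decoy and a basic target-decoy false discovery rate estimate are commonly computed. It does not reflect the paper's pipeline or any specific search engine: the function names, the simple decoys/targets estimator, and the example scores are assumptions for illustration, and real tools often use variants (for example, keeping the C-terminal residue fixed when reversing, or adding 1 to the decoy count).

```python
def reverse_decoy(sequence):
    """Build a classical decoy by reversing the target sequence."""
    return sequence[::-1]

def estimate_fdr(matches, threshold):
    """Estimate FDR among matches scoring at or above the threshold.

    `matches` is a list of (score, is_decoy) pairs, where each pair is the
    best-scoring candidate for one spectrum after targets and decoys
    competed. The estimate is decoys / targets, capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)

# Hypothetical competition winners: (score, is_decoy).
example_matches = [(10.0, False), (9.0, False), (8.0, True),
                   (7.0, False), (6.0, True)]
example_fdr = estimate_fdr(example_matches, threshold=7.0)
```

In this scheme, decoy hits above the threshold stand in for the unknown number of incorrect target hits, which is why decoys need to be hard to distinguish from real sequences.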