GraphHDBSCAN*: Graph-based Hierarchical Clustering on High Dimensional Single-cell RNA Sequencing Data

The paper introduces GraphHDBSCAN*, a hyperparameter-free, graph-based hierarchical clustering method that effectively recovers both fine-grained flat partitions and biologically meaningful hierarchical structures in high-dimensional, sparse single-cell RNA sequencing data, outperforming existing state-of-the-art approaches.

Ghoreishi, S. A., Szmigiel, A. W., Nagai, J. S. + 3 more · 2026-03-26 · 💻 bioinformatics

OPTIMIS: Optimizing Personalized Therapies through Integrated Multiscale Intelligent Simulation

The OPTIMIS framework addresses the challenge of controlling complex multiscale biological systems by integrating a hybrid stochastic-deterministic model into a differentiable Neural ODE surrogate, enabling deep reinforcement learning agents to predict and prevent dangerous immune reactions in engineered cellular therapies with success rates above 70%.

Su, Z., Wu, Y. · 2026-03-26 · 💻 bioinformatics

Is metabolism spatially optimized? Structural modeling of consecutive enzyme pairs reveals no evidence for spatial optimization of catalytic site proximity.

This study utilizes structural modeling and computational analysis of 107 consecutive enzyme pairs in *E. coli* to demonstrate that, despite a tendency for these enzymes to interact, their catalytic sites are not systematically positioned in spatially optimized configurations to facilitate metabolite transfer.

Algorta, J., Walther, D. · 2026-03-26 · 💻 bioinformatics

Signature Distance: Generalizing Energy Statistics

This paper introduces Signature Distance (SD), a structural generalization of energy distance that compares empirical distributions via sorted pointwise distance profiles to detect local density and topological changes, offering a differentiable and computationally efficient metric for improved generative model evaluation, hypothesis testing, and data augmentation in high-dimensional biological data.

Lazzaro, N., Marchesi, R., Leonardi, G. + 6 more · 2026-03-25 · 💻 bioinformatics

Fitness translocation: improving variant effect prediction with biologically-grounded data augmentation

This paper introduces "fitness translocation," a data augmentation strategy that leverages variant fitness data from homologous proteins to generate synthetic training examples in embedding space, thereby significantly improving the accuracy of protein variant effect prediction models, particularly when training data is scarce.

Mialland, A., Fukunaga, S., Katsuki, R. + 3 more · 2026-03-25 · 💻 bioinformatics

Interpretable multi-omics machine learning reveals drought-driven shifts in plant-microbe interactions

By integrating genomic, metabolomic, and microbiome data from 198 soybean accessions, this study employs an interpretable machine learning approach to reveal that the isoflavone daidzin and the bacterium *Candidatus Nitrosocosmicus* are key drivers of drought resilience through specific plant-microbe interactions in the rhizosphere.

Yoshioka, H., Debeljak, P., Prado, S. + 3 more · 2026-03-25 · 💻 bioinformatics