Importance of taking Single Amino Acid Variant and accessory proteome variability into account in Data Independent Acquisition Proteomics: illustrated with Legionella pneumophila analysis

This study demonstrates that integrating single amino acid variants and accessory proteome variability into a DIA-NN workflow significantly enhances protein identification and coverage in *Legionella pneumophila* proteomics, enabling more accurate bacterial proteotyping and a deeper understanding of allelic diversity.

Dupas, A., Ibranosyan, M., Ginevra, C. + 2 more | 2026-04-03 | 💻 bioinformatics

Anonymized Somatic Tumor Twins (STTs) enable open genome data sharing and use in research and clinical oncology

This paper introduces GenomeAnonymizer, a novel method that generates privacy-preserving Somatic Tumor Twins (STTs) from tumor-normal DNA sequences, effectively removing germline variants while retaining critical somatic information to enable open data sharing and accelerate oncology research and clinical decision-making.

Gaitan, N., Martin, R., Tello, D. + 10 more | 2026-04-03 | 💻 bioinformatics

Optimisation of Weighted Ensembles of Genomic Prediction Models in Maize

This study evaluates three weight optimisation approaches (linear transformation, Nelder-Mead, and Bayesian) for weighted ensembles of genomic prediction models in maize. These methods generally improve prediction accuracy over naive equal-weight ensembles, particularly when the optimal weights differ significantly from an equal distribution, but no single approach was clearly superior across all scenarios.

Tomura, S., Powell, O. M., Wilkinson, M. J. + 2 more | 2026-04-02 | 💻 bioinformatics

Benchmarking Agentic Bioinformatics Systems for Complex Protein-Set Retrieval: A Coccolithophore Calcification Case Study

This study benchmarks three agentic bioinformatics systems on a complex protein-retrieval task involving coccolithophore calcification, revealing that while some agents generate larger protein sets, the Codex system achieves superior performance through a better balance of sensitivity and specificity, higher relevance, and greater repeatability.

Zhang, X. | 2026-04-02 | 💻 bioinformatics

Resolution of recursive data corruption to transform T-cell epitope discovery

This study identifies a critical methodological flaw where immunopeptidomics datasets are contaminated by computational predictions, causing a recursive bias that inflates benchmark performance while failing in clinical applications, and proposes the deepMHCflare model trained on clean data to successfully overcome this limitation and improve T-cell epitope discovery.

Preibisch, G., Tyrolski, M., Kucharski, P. + 6 more | 2026-04-02 | 💻 bioinformatics

Towards a Cytometry Foundation Model: Interpretable Sample-level Predictive Modelling via Pretrained Transformers

This paper introduces the Generalised Pretrained Cytometry Transformer (GPCT), an interpretable foundation model that leverages a novel pretraining regime to learn transferable cellular representations from heterogeneous marker panels, thereby overcoming scalability limitations and enabling high-accuracy, biologically validated sample-level predictions across diverse flow cytometry datasets.

Zhuang, Z., Mashford, B. S., Zheng, L. + 1 more | 2026-04-02 | 💻 bioinformatics

A structure-informed deep learning framework for modeling TCR-peptide-HLA interactions

This paper introduces StriMap, a structure-informed deep learning framework that accurately models TCR-peptide-HLA interactions to enable the identification of antigenic drivers in autoimmunity and the prioritization of targets for immunotherapy, as demonstrated by its successful discovery of validated molecular mimics linking ankylosing spondylitis and inflammatory bowel disease.

Cao, K., Li, R., Strazar, M. + 8 more | 2026-04-02 | 💻 bioinformatics

CardamomOT: a mechanistic optimal transport-based framework for gene regulatory network inference, trajectory reconstruction and generative modeling

CardamomOT is a unified mechanistic optimal transport framework that overcomes previous limitations in inferring gene regulatory networks and reconstructing unobserved protein trajectories from single-cell RNA sequencing time series, thereby enabling accurate generative modeling of cellular responses to perturbations.

Mauge, Y., Ventre, E. | 2026-04-02 | 💻 bioinformatics

When Multimodal Fusion Fails: Contrastive Alignment as a Necessary Stabilizer for TCR-Peptide Binding Prediction

The paper introduces TRACE, a multimodal framework that employs CLIP-style intra-entity contrastive alignment to stabilize TCR-peptide binding predictions by regularizing noisy structural data against strong sequence embeddings, thereby demonstrating that constrained modality interaction is more critical than naive fusion for robust bioinformatics performance.

Qi, C., Wang, W., Fang, H. + 1 more | 2026-04-02 | 💻 bioinformatics