bioinformatics papers

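Bioinformatics is a bridge between biology and computer science: it uses powerful algorithms and data analysis techniques to turn vast amounts of genetic information into understandable scientific findings. The field no longer depends on observation under a microscope. Instead, it mines the genome through code, helping scientists understand disease mechanisms, track viral variants, and advance precision medicine.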

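In this dedicated Gist.Science section, we continuously track the latest preprints from bioRxiv so that you can see new developments as soon as they appear. Our team processes every newly uploaded preprint in depth, providing a detailed technical summary together with a carefully written plain-language explanation that makes complex biological data easier to follow.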

Below are several of the most recently posted studies in the field, offering a look at the latest progress in the digitization of life.

Neurotox: Deep learning decodes conserved hallmarks of neurotoxicity across venomous species

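This study develops a deep learning framework named Neurotox. By analyzing 200,000 protein sequences, it shows that neurotoxicity is not determined solely by isolated key contact residues, but arises from distributed sequence features that shape secondary-structure organization and receptor interactions.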

Bedraoui, A., El Mejjad, S., Enezari, S., El Hajji, F. Z., Galan, J., El Fatimy, R., Daouda, T. | 2026-03-10 | bioinformatics

Counting strands in outer membrane beta-barrels

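This study improves the PolarBearal algorithm by integrating three structural criteria, achieving high-accuracy (97%) automated strand-count annotation for more than 570,000 bacterial outer membrane β-barrel protein structures in the AlphaFold2 database. This addresses both the inefficiency of manual counting and the inability of existing algorithms to handle structural complexity, and it provides a large-scale dataset to support structure-function studies, evolutionary analysis, and drug design for outer membrane proteins.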

Lim, S., Nimmagadda, T., Khamis, A., Montezano, D., Feehan, R., Copeland, M., Slusky, J. | 2026-03-10 | bioinformatics

PhosSight: a Unified Deep Learning Framework Boosting and Accelerating Phosphoproteome Identification to Enable Biological Discoveries

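PhosSight is a unified deep learning framework. By introducing the PhosDetect model to accurately predict peptide detectability, it addresses missing data and search-efficiency bottlenecks in both DDA and DIA modes, substantially increases the identification depth of the phosphoproteome, and helps uncover new prognostic kinase targets such as MARK2.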

Wang, B., Cheng, Z., She, C., Zhang, J., Lv, L., Zhu, H., Liu, L., Fu, Y., Yi, X. | 2026-03-10 | bioinformatics

Improving Causal Gene Identification Using Large Language Models

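This study evaluates and optimizes the performance of large language models at identifying causal genes for complex diseases by combining retrieval-augmented generation (RAG) with genomic distance information. It finds that each of the two improves prediction accuracy on its own, but that using them together yields diminishing returns, revealing both the potential and the limits of hybrid approaches that fuse structured features with unstructured text data.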

Ofer, D., Kaufman, H. | 2026-03-10 | bioinformatics

Inferring large networks with matrix factorisation to capture non-linear dependencies among genes using sparse single-cell profiles

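This paper proposes a network inference method named NIRD, which uses matrix factorization and tree-ensemble regression to handle the sparsity of single-cell transcriptome data and thereby capture non-linear dependencies among genes. It shows superior performance in removing batch effects and in predicting transcription factor targets in combination with RNA velocity.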

Jha, I. P., Meshran, A. G., Kumar, V., Natarajan, K. N., KUMAR, V. | 2026-03-10 | bioinformatics

bioinformatics

Exploring per-base quality scores as a surrogate marker of cell-free DNA fragmentome

FAMUS: A Few-Shot Learning Framework for Large-Scale Protein Annotation

Developing SCL2205: A Protein Sequence-based Spatial Modelling Dataset for the Protein Language Model Frontier

Intrinsic dataset features drive mutational effect prediction by protein language models

Phosphorylation of a tumor-derived ASXL2 epitope remodels peptide-HLA binding affinity and interaction dynamics