Improving generalization of machine learning-identified biomarkers using causal modelling with examples from immune receptor diagnostics
Nature Machine Intelligence (IF 23.8), Pub Date: 2024-01-24, DOI: 10.1038/s42256-023-00781-8
Milena Pavlović, Ghadi S. Al Hajj, Chakravarthi Kanduri, Johan Pensar, Mollie E. Wood, Ludvig M. Sollid, Victor Greiff, Geir K. Sandve

Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make for a concrete discussion, we focus on a specific, recently established high-dimensional biomarker—adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modelling improves machine learning-based biomarker robustness by identifying stable relations between variables and guiding the adjustment of the relations and variables that vary between populations.




Updated: 2024-01-26