DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-02-01, DOI: 10.1186/s13321-024-00808-1
Jonghyun Lee, Dae Won Jun, Ildae Song, Yun Kim

The drug discovery process is demanding and time-consuming, and machine learning-based research is increasingly proposed to enhance its efficiency. A significant challenge in this field is predicting whether a drug molecule will interact with a target protein. A recent study addressed this challenge with encoders that leverage prior knowledge of molecular and protein structures, which notably improved prediction performance on the drug-target interaction (DTI) task. Nonetheless, the target encoders employed in previous studies have a computational complexity that increases quadratically with the input length, which limits their practical utility. To overcome this limitation, we adopt a hint-based learning strategy to develop a compact and efficient target encoder. With an adaptation parameter, our model can blend general knowledge and target-oriented knowledge to build features of the protein sequences. This approach yielded considerable performance enhancements and improved learning efficiency on three benchmark datasets: BIOSNAP, DAVIS, and BindingDB. Furthermore, our method requires only 7.7 GB of video RAM (VRAM) during the training phase (16.24% of that of the previous state-of-the-art model). This makes training and inference feasible even with constrained computational resources.
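The blending step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the function name `blend_features`, the use of a sigmoid to keep the adaptation weight in (0, 1), and the convex-combination form are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unconstrained value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend_features(general, target_specific, raw_lambda):
    """Blend two equal-length feature vectors with one adaptation parameter.

    Assumption (not from the paper): the blend is a convex combination,
    weight * general + (1 - weight) * target_specific, with
    weight = sigmoid(raw_lambda). In a real model raw_lambda would be
    learned during training; here it is passed in by hand.
    """
    if len(general) != len(target_specific):
        raise ValueError("feature vectors must have the same length")
    weight = sigmoid(raw_lambda)
    return [weight * g + (1.0 - weight) * t
            for g, t in zip(general, target_specific)]


# Toy feature vectors standing in for protein-sequence embeddings.
general = [1.0, 0.0, 2.0]    # hypothetical general-knowledge features
specific = [0.0, 1.0, 4.0]   # hypothetical target-oriented features

# raw_lambda = 0 gives weight 0.5, i.e. an equal mix of both sources.
blended = blend_features(general, specific, raw_lambda=0.0)
print(blended)
```

A large positive `raw_lambda` pushes the result toward the general features and a large negative one toward the target-oriented features, which is one simple way a single parameter could control the mix.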

Updated: 2024-02-01