MulTFBS: A Spatial-Temporal Network with Multichannels for Predicting Transcription Factor Binding Sites
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-05-11, DOI: 10.1021/acs.jcim.3c02088
Jujuan Zhuang, Xinru Huang, Shuhan Liu, Wanquan Gao, Rui Su, Kexin Feng

Revealing the mechanisms that influence transcription factor binding specificity is key to understanding gene regulation. Previous studies have successfully used DNA double-helix structure and one-hot embedding to design computational methods for predicting transcription factor binding sites (TFBSs). However, although a DNA sequence can be viewed as a kind of biological language, the word embedding representations used in natural language processing have not been properly considered in TFBS prediction models. In this work, we integrate different types of DNA sequence features into a multichannel deep learning framework, MulTFBS, in which three kinds of features are trained in separate channels: one-hot encoding, word embedding encoding (which can incorporate contextual information and extract global features of the sequences), and three-dimensional structural features of the double helix. To extract high-level sequence information effectively, the framework uses a spatial-temporal network that combines convolutional neural networks and bidirectional long short-term memory networks with an attention mechanism. Compared with six state-of-the-art methods on 66 universal protein-binding microarray data sets of different transcription factors, MulTFBS performs best on all data sets in the regression tasks, with an average R² of 0.698 and an average Pearson correlation coefficient (PCC) of 0.833, which are 5.4% and 3.2% higher, respectively, than those of the second-best method, CRPTS. In addition, we evaluate the classification performance of MulTFBS in distinguishing bound from unbound regions on TF ChIP-seq data. The results show that our framework also performs well in TFBS classification tasks.
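
As a rough illustration of two ideas named in the abstract, the sketch below shows how a DNA sequence might be turned into a one-hot matrix and into overlapping k-mer "words" (the usual input to a word embedding layer), and how the two reported regression metrics, R² and PCC, are computed. This is a minimal sketch and not the authors' implementation: the function names, the choice of k = 3, and the handling of ambiguous bases are all assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence

BASES = "ACGT"  # assumed channel order; the abstract does not specify one


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 matrix.

    Bases outside A/C/G/T (e.g. N) become all-zero rows (an assumption).
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1.

    k = 3 is a hypothetical choice used only for illustration.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) between two equal-length series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In a multichannel setup like the one described, the one-hot matrix and the k-mer tokens would feed separate input channels, while R² and PCC would be computed between measured and predicted binding intensities for each data set.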

Updated: 2024-05-11