Research Article

Mol. Cells 2010; 30(2): 99-105

Published online August 31, 2010

https://doi.org/10.1007/s10059-010-0093-0

© The Korean Society for Molecular and Cellular Biology

A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach

Yudong Cai1,2,*, ZhiSong He3, Xiaohe Shi4, Xiangying Kong4,5, Lei Gu6, and Lu Xie7

1 Institute of System Biology, Shanghai University, Shanghai 200244, People’s Republic of China
2 Centre for Computational Systems Biology, Fudan University, Shanghai 200433, People’s Republic of China
3 Department of Bioinformatics, College of Life Sciences, Zhejiang University, Zhejiang 310058, People’s Republic of China
4 Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine, People’s Republic of China
5 State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai 200025, People’s Republic of China
6 Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
7 Shanghai Center for Bioinformation Technology, Shanghai 200235, People’s Republic of China

*Correspondence: cai_yud@yahoo.com.cn

Received: August 14, 2010; Revised: April 6, 2010; Accepted: April 22, 2010

Abstract

Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers need a better understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, a computational prediction approach is needed. Some in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used the properties of the micro-environment of each amino acid, taken from the AAIndex, together with conservation scores. In cross-validation testing, the method achieved an overall accuracy of 94.89%. These results indicate that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.

Keywords: bioinformatics, data mining, machine learning, mRMR, protein-DNA interaction
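
As a rough illustration of what a sequence-based "micro-environment" encoding can look like, the sketch below describes each residue by one physicochemical property of itself and of its sequence neighbours. It is not the authors' implementation. The function name `window_features`, the window half-width, the zero padding at the chain ends, and the choice of the Kyte-Doolittle hydropathy scale as the example property are all assumptions made here for illustration. The abstract does not specify any of them, and the conservation scores it mentions are not included.

```python
# Illustrative sketch only; names, window size and padding are assumptions.

# Kyte-Doolittle hydropathy values, used here as one example of a
# per-amino-acid property scale of the kind collected in AAIndex.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, scale, half_window=2, pad_value=0.0):
    """Return one feature vector per residue.

    Each vector holds the property value of the residue and of its
    `half_window` neighbours on either side. Positions beyond the ends
    of the sequence, and residues missing from `scale`, get `pad_value`.
    """
    features = []
    for i in range(len(sequence)):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < len(sequence):
                row.append(scale.get(sequence[j], pad_value))
            else:
                row.append(pad_value)
        features.append(row)
    return features


if __name__ == "__main__":
    # Hypothetical short peptide, one vector of length 2 * half_window + 1
    # per residue.
    for residue, vector in zip("MKRLS", window_features("MKRLS", KYTE_DOOLITTLE)):
        print(residue, vector)
```

Vectors of this kind, one per residue and labelled as binding or non-binding, are the sort of input a classifier could be trained and cross-validated on. Using several property scales would simply concatenate further columns onto each vector.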

Molecules and Cells

eISSN 0219-1032