Mol. Cells 2010; 30(2): 99-105
Published online July 23, 2010
https://doi.org/10.1007/s10059-010-0093-0
© The Korean Society for Molecular and Cellular Biology
*Correspondence: cai_yud@yahoo.com.cn
Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, a computational prediction approach is needed. Some in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method for predicting protein-DNA binding residues. To make these predictions, we used conservation scores together with properties of the micro-environment of each amino acid, taken from the AAIndex. In cross-validation testing, we obtained an overall accuracy of 94.89%. Our results indicate that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.
Keywords: bioinformatics, data mining, machine learning, mRMR, protein-DNA interaction
Published online August 31, 2010 https://doi.org/10.1007/s10059-010-0093-0
Yudong Cai1,2,*, ZhiSong He3, Xiaohe Shi4, Xiangying Kong4,5, Lei Gu6, and Lu Xie7
1Institute of System Biology, Shanghai University, Shanghai 200244, People’s Republic of China, 2Centre for Computational Systems Biology, Fudan University, Shanghai 200433, People’s Republic of China, 3Department of Bioinformatics, College of Life Sciences, Zhejiang University, ZheJiang 310058, People’s Republic of China, 4Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine, People’s Republic of China, 5State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai 200025, People’s Republic of China, 6Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Germany, 7Shanghai Center for Bioinformation Technology, Shanghai 200235, People’s Republic of China