Research Article

Mol. Cells 2010; 30(2): 99-105

Published online July 23, 2010

https://doi.org/10.1007/s10059-010-0093-0

© The Korean Society for Molecular and Cellular Biology

A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach

Yudong Cai1,2,*, ZhiSong He3, Xiaohe Shi4, Xiangying Kong4,5, Lei Gu6, and Lu Xie7

1Institute of System Biology, Shanghai University, Shanghai 200244, People’s Republic of China
2Centre for Computational Systems Biology, Fudan University, Shanghai 200433, People’s Republic of China
3Department of Bioinformatics, College of Life Sciences, Zhejiang University, ZheJiang 310058, People’s Republic of China
4Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine, People’s Republic of China
5State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai 200025, People’s Republic of China
6Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
7Shanghai Center for Bioinformation Technology, Shanghai 200235, People’s Republic of China

*Correspondence: cai_yud@yahoo.com.cn

Received: August 14, 2010; Revised: April 6, 2010; Accepted: April 22, 2010

Abstract

Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, a computational prediction approach is needed. Some in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used properties of the micro-environment of each amino acid, taken from the AAIndex, together with conservation scores. In cross-validation tests, the method achieved an overall accuracy of 94.89%. Our results indicate that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.
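The abstract does not give the window size, the set of AAIndex properties, or the classifier, so the following is only a minimal sketch of how a residue's sequence micro-environment might be turned into a feature vector. It assumes a symmetric sliding window and uses the Kyte-Doolittle hydropathy scale as a single stand-in property; the function names, the `half_width` parameter, and the zero padding at the sequence ends are all hypothetical choices for illustration, not the authors' implementation.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in for
# the many physicochemical indices a method like this might draw on.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, center, half_width=3, pad_value=0.0):
    """Property values for the residues around `center` (hypothetical helper).

    Positions that fall outside the sequence, and residues missing from
    the property table, are filled with `pad_value`.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.append(HYDROPATHY.get(sequence[pos], pad_value))
        else:
            features.append(pad_value)
    return features


def encode_protein(sequence, half_width=3):
    """One feature vector per residue, each of length 2 * half_width + 1."""
    return [window_features(sequence, i, half_width)
            for i in range(len(sequence))]


if __name__ == "__main__":
    for residue, vector in zip("MKR", encode_protein("MKR", half_width=1)):
        print(residue, vector)
```

In a fuller pipeline, each vector would be extended with further properties and a conservation score for every window position, then passed through feature selection and a classifier evaluated by cross-validation.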

Keywords: bioinformatics, data mining, machine learning, mRMR, protein-DNA interaction

Published online August 31, 2010 https://doi.org/10.1007/s10059-010-0093-0


Molecules and Cells

eISSN 0219-1032