Research Article

Mol. Cells 2010; 30(2): 99-105

Published online July 23, 2010

https://doi.org/10.1007/s10059-010-0093-0

© The Korean Society for Molecular and Cellular Biology

A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach

Yudong Cai1,2,*, ZhiSong He3, Xiaohe Shi4, Xiangying Kong4,5, Lei Gu6, and Lu Xie7

1Institute of System Biology, Shanghai University, Shanghai 200244, People’s Republic of China, 2Centre for Computational Systems Biology, Fudan University, Shanghai 200433, People’s Republic of China, 3Department of Bioinformatics, College of Life Sciences, Zhejiang University, ZheJiang 310058, People’s Republic of China, 4Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine, People’s Republic of China, 5State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai 200025, People’s Republic of China, 6Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Germany, 7Shanghai Center for Bioinformation Technology, Shanghai 200235, People’s Republic of China

*Correspondence: cai_yud@yahoo.com.cn

Received: August 14, 2010; Revised: April 6, 2010; Accepted: April 22, 2010

Abstract

Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many other vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers need a better understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, a computational prediction approach is needed. Some in silico methods have been developed, but they still have considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used properties from the AAIndex that describe the micro-environment of each amino acid, together with conservation scores. In cross-validation testing, the method achieved an overall accuracy of 94.89%. These results indicate that the amino acid micro-environment is important for DNA binding and that it can be used to identify protein-DNA binding sites.
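To illustrate what a micro-environment encoding of a residue might look like, the following is a minimal sketch rather than the authors' implementation. It assumes a sliding window around each residue, uses the Kyte-Doolittle hydropathy scale as a stand-in for the AAIndex properties, and takes per-residue conservation scores as a precomputed list. The window size, the padding value, and the names `window_features`, `half_width`, and `pad_value` are hypothetical choices that the abstract does not specify.

```python
# Kyte-Doolittle hydropathy values, used here as a single stand-in for the
# AAIndex properties mentioned in the abstract (an illustrative assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, conservation, half_width=2, pad_value=0.0):
    """Build one feature vector per residue from its sequence neighbourhood.

    Each window position contributes two numbers: the hydropathy of the
    residue at that position and its conservation score. Positions that
    fall outside the sequence are filled with pad_value.
    """
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have equal length")

    features = []
    for center in range(len(sequence)):
        row = []
        for offset in range(-half_width, half_width + 1):
            pos = center + offset
            if 0 <= pos < len(sequence):
                row.append(HYDROPATHY[sequence[pos]])
                row.append(conservation[pos])
            else:
                # Hypothetical padding for windows that overrun the termini.
                row.extend([pad_value, pad_value])
        features.append(row)
    return features


# Toy input: a three-residue sequence with made-up conservation scores.
example = window_features("AKR", [0.1, 0.9, 0.5], half_width=1)
```

Each resulting vector could then be passed, with a binding or non-binding label, to a feature selection step such as mRMR and to a classifier evaluated by cross-validation. Those steps are not shown here because the abstract gives no details about them.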

Keywords: bioinformatics, data mining, machine learning, mRMR, protein-DNA interaction

Published online August 31, 2010 https://doi.org/10.1007/s10059-010-0093-0

Molecules and Cells

eISSN 0219-1032