Mol. Cells 2019; 42(2): 166-174
Published online February 28, 2019
https://doi.org/10.14348/molcells.2018.0403
© The Korean Society for Molecular and Cellular Biology
Correspondence to: pcronald@ucdavis.edu (PR); insuklee@yonsei.ac.kr (IL)
Bacterial species in the genus
Keywords: co-functional network, plant bacterial pathogen,
TCSs control diverse aspects of biological function in bacteria. For example, RpfC and RpfG (
Although many genes involved in bacterial virulence have been identified through molecular and cellular studies, elucidation of virulence-associated processes of
A comprehensive understanding of regulatory circuits controlling bacterial virulence requires insight into the collaborative and regulatory interactions among multiple genes. Molecular interaction networks have proven useful in such endeavors (Cowen et al., 2017). Large-scale molecular networks have been constructed for many organisms, ranging from unicellular microbes to humans and crops, by both experimental and computational approaches. However, experimental mapping of molecular interactions has been conducted in only a limited number of species.
The availability of data from functional genomics, comparative genomics, and proteomics facilitates a genome-wide scale analysis of gene function. Because datasets from each technique are incomplete, error-prone, and limited in sensitivity, a single dataset alone is insufficient to fully describe a particular biological process. However, such datasets can be integrated to generate a more accurate and comprehensive view of gene function than is contained in any single dataset. For example, genome-scale co-functional networks have enabled effective integration of heterogeneous genomics data, significantly enhancing both accuracy and comprehensiveness of the molecular network models (Shim et al., 2017). These networks can then be utilized for prioritizing candidate genes for biological processes or complex traits of interest. For example, genome-scale co-functional networks for
In this study, we present a genome-scale co-functional network of
The genome set of
Co-functional gene networks were constructed by supervised machine learning processes, which require gold-standard data for benchmarking inferred models. Gold-standard data play critical roles in error-tolerant and unbiased learning. We compiled 11,669 positive gold-standard co-functional gene pairs from Gene Ontology Biological Process (GOBP) annotations (
To evaluate the constructed network, we independently generated a second set of gold-standard co-functional gene pairs based on the MetaCyc pathway database (Caspi et al., 2016). This set comprises 7,457 positive and 250,664 negative gold-standard gene pairs. To evaluate the capacity of the network to retrieve known genes for each pathway, we used only the 145 MetaCyc pathway terms that have at least five member genes.
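The pair-generation step can be illustrated with a minimal sketch. It assumes, as is common for this kind of benchmark but not stated explicitly in this section, that two genes form a positive pair when they share at least one pathway term and a negative pair when both are annotated but share none. The function name and the toy annotation are hypothetical.

```python
from itertools import combinations


def gold_standard_pairs(pathway_to_genes):
    """Return (positive, negative) sets of unordered gene pairs.

    Assumption: genes sharing at least one pathway term form a positive
    pair; annotated genes sharing no term form a negative pair.
    """
    pathways = {term: set(genes) for term, genes in pathway_to_genes.items()}
    annotated = sorted(set().union(*pathways.values())) if pathways else []

    positive = set()
    for genes in pathways.values():
        positive.update(frozenset(p) for p in combinations(sorted(genes), 2))

    # Every pair of annotated genes that is not positive is negative.
    all_pairs = {frozenset(p) for p in combinations(annotated, 2)}
    return positive, all_pairs - positive


# Toy annotation (hypothetical gene and pathway names).
toy = {"pathA": ["g1", "g2", "g3"], "pathB": ["g3", "g4"]}
toy_positive, toy_negative = gold_standard_pairs(toy)
```

Because negative pairs are drawn from all annotated genes, they typically far outnumber positive pairs, which is in line with the 7,457 versus 250,664 split reported above.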
We benchmarked inferred co-functional gene pairs for given genomics data (
Where
If there was no data intrinsic score associated with the inferred gene pairs, the calculated
Inferred co-functional links with assigned
where
Co-expressed genes across various biological conditions are likely to be co-regulated genes for a process. We inferred co-functional links from co-expression across
A protein domain is a structural and functional unit of a protein. Therefore, proteins with similar functions tend to have similar domain compositions. We constructed domain profiles for coding genes of
In prokaryotic genomes, genes that operate in the same process are often encoded as a co-transcribed gene cluster called an operon. Therefore, we may infer co-functional gene pairs from their genomic proximity across prokaryotic genomes (Dandekar et al., 1998). We previously found that two measures of gene neighborhood, distance-based gene neighborhood (DGN) and probability-based gene neighborhood (PGN), are complementary and that their integration can increase the coverage and accuracy of a co-functional network based on gene neighborhood (Shin et al., 2014). We inferred two co-functional networks by DGN and PGN across 1,626 prokaryotic genomes as described in our previous work (Shin et al., 2014), then integrated them into a single network for gene neighborhood.
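The intuition behind gene neighborhood can be sketched with a deliberately simplified score. This is not the DGN or PGN measure of Shin et al. (2014), whose definitions are not reproduced here; it only counts the genomes in which two gene families lie close together. The input layout, the `max_distance` threshold, and the treatment of genomes as linear coordinates are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def neighborhood_support(genome_positions, max_distance=500):
    """Count genomes in which two gene families are near each other.

    genome_positions maps genome -> {gene family: coordinate}. Coordinates
    are treated as linear (circular chromosomes and strand are ignored),
    which is a simplifying assumption.
    """
    support = Counter()
    for positions in genome_positions.values():
        for fam_a, fam_b in combinations(sorted(positions), 2):
            if abs(positions[fam_a] - positions[fam_b]) <= max_distance:
                support[(fam_a, fam_b)] += 1
    return support


# Toy input (hypothetical genomes, families, and coordinates).
toy_genomes = {
    "genome1": {"famA": 100, "famB": 250, "famC": 5000},
    "genome2": {"famA": 900, "famB": 1000},
    "genome3": {"famA": 10, "famC": 200000},
}
toy_support = neighborhood_support(toy_genomes)
```

Pairs that are neighbors in many genomes, such as `famA` and `famB` in the toy input, would be the candidates for co-functional links.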
Functionally coupled genes are often gained or lost together during speciation because of their functional constraints. Therefore, we may infer co-functional links from the similarity of phylogenetic profiles, which are patterns of presence and absence of homologous genes in the genomes of many other species (Kensche et al., 2008). We refer to the species used to construct the phylogenetic profiles as reference species. We previously found that this network inference could be more effective with separate phylogenetic profiles for each domain of the tree of life: Archaea, Bacteria, and Eukarya (Shin and Lee, 2015). Therefore, we constructed phylogenetic profiles for each of the three domains based on the best BLASTP hit score of all
As functional genes are evolutionarily conserved between species (orthologs), functional associations between genes can also be evolutionarily conserved between species (associalogs) (Kim et al., 2013). To identify orthology relationships, we used the InParanoid algorithm (Sonnhammer and Ostlund, 2015), which includes inparalogous relationships for gene pairs with similar functions. We then identified evolutionarily conserved co-functional links between two species with the following InParanoid weighted
To generate a
To analyze the differential expression levels of PXO_RS05990 (
Plants were inoculated by
The workflow for constructing XooNet is summarized in Fig. 1A and described in detail in the Materials and Methods section. We constructed four component networks from
To evaluate the quality of XooNet without over-fitting, we needed a set of co-functional gene pairs independent of the training data to serve as a test data set. For this assessment, we compiled co-functional gene pairs from MetaCyc pathway annotations, a database independent of the GOBP and KEGG annotations used to generate the co-functional gene pairs for training XooNet. Consistent with their independent origin, the test gene pairs based on MetaCyc pathway annotations overlap with only 9% of the gold-standard gene pairs used for network training. We observed a 20-fold enrichment of MetaCyc pathway gene pairs in the integrated XooNet compared with random gene pairs. This result indicates that integration of diverse genomics data effectively improved the quality of XooNet (Fig. 1B).
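A fold enrichment of this kind can be computed as sketched below. The sketch assumes one plausible definition: the fraction of positive pairs among the network links covered by the test set, divided by the fraction of positive pairs among all test-set pairs (the expectation for random gene pairs). The function name and toy data are hypothetical, and the exact procedure behind Fig. 1B may differ.

```python
def fold_enrichment(network_links, positive, negative):
    """Positive rate among benchmarked links over the background rate.

    positive and negative are sets of frozenset gene pairs; links that
    appear in neither set are ignored because they cannot be scored.
    """
    links = {frozenset(link) for link in network_links}
    hit_pos = len(links & positive)
    hit_neg = len(links & negative)
    if hit_pos + hit_neg == 0:
        raise ValueError("no network link overlaps the test set")
    precision = hit_pos / (hit_pos + hit_neg)
    background = len(positive) / (len(positive) + len(negative))
    return precision / background


# Toy test set (hypothetical genes): 2 positive and 6 negative pairs.
toy_pos = {frozenset(p) for p in [("a", "b"), ("c", "d")]}
toy_neg = {frozenset(p) for p in [("a", "c"), ("a", "d"), ("b", "c"),
                                  ("b", "d"), ("a", "e"), ("b", "e")]}
# One link is positive, one is negative, one is not covered by the test set.
toy_links = [("a", "b"), ("a", "c"), ("x", "y")]
toy_enrichment = fold_enrichment(toy_links, toy_pos, toy_neg)  # 0.5 / 0.25
```

Under this definition, a value of 1 means the network does no better than random pairing, so larger values indicate that linked genes share pathways more often than expected by chance.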
Next, we assessed the capability of XooNet to predict pathways in
To survive in ever-changing environments, bacteria have evolved regulatory circuits that coordinate expression of one set of genes in one environment and a different set of genes in another. These regulatory circuits are generally operated by TCSs. Perturbation of key TCS regulators can cause differential expression of other genes, providing clues to the cellular pathways regulated by those TCSs. For example, mutations of two TCS regulators, StoS and SreK, which positively regulate extracellular polysaccharide production and swarming in
We implemented two network-based algorithms for generating functional hypotheses, which can be accessed on the XooNet “network-search” page (
As mentioned above,
To identify such regulators, we submitted the five
To assess the function of the five candidate genes in regulation of
We also tested whether the four mutant strains produce functionally active Rax/PctB proteins and activate the XA21 immune receptor in rice. We inoculated the mutant strains on both TP309 and XA21-TP309 rice plants. All mutant strains were as virulent on TP309 plants as the wild-type strain and formed long lesions (>13 cm) (Fig. 4A). On XA21-TP309 plants, the mutants still induced the XA21-mediated immune response and formed short lesions (<6 cm), like the wild-type strain (Fig. 4B), indicating that all Rax/PctB proteins are functional in the mutants.
We have demonstrated XooNet’s ability to predict a wide variety of cellular pathways through its high retrieval efficacy for known genes of the same MetaCyc pathways. In addition, we showed that XooNet could reconstruct pathways regulated by two TCS regulators, StoS and SreK, which regulate extracellular polysaccharide production and swarming in
In this report, we present XooNet, a genome-scale co-functional network of
Component networks from ten distinct data types in XooNet
Code | Description | #links |
---|---|---|
XO-CX | Inferred co-functional network from co-expression in | 12,269 |
XO-DP | Inferred co-functional network from similarity of domain profiles between two | 3,078 |
XO-GN | Inferred co-functional network from gene neighborhood of | 18,487 |
XO-PG | Inferred co-functional network from similarity of phylogenetic profiles between two | 36,244 |
BA-HT | Associalogs transferred from high-throughput protein-protein interactions in five bacterial species | 5,522 |
EC-CC | Associalogs transferred from co-citation of | 19,210 |
EC-CX | Associalogs transferred from co-expression of | 20,283 |
EC-LC | Associalogs transferred from literature-curated protein-protein interactions in | 828 |
PA-CC | Associalogs transferred from co-citation of | 10,929 |
PA-CX | Associalogs transferred from co-expression of | 6,808 |
XooNet | An integrated co-functional network for | 106,000 |
Hanhae Kim1,4,6, Anna Joe2,5,6, Muyoung Lee1, Sunmo Yang1, Xiaozhi Ma3, Pamela C. Ronald2,5,*, and Insuk Lee1,*
1Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea, 2Department of Plant Pathology and the Genome Center, University of California, CA 95616, USA, 3Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China, 4Bio and Basic Science R&D Coordination Division, Korea Institute of S&T Evaluation and Planning, Seoul, Korea, 5Feedstocks Division, Joint Bioenergy Institute, CA 94608, USA