Mol. Cells 2016; 39(9): 692-698
Published online September 9, 2016
https://doi.org/10.14348/molcells.2016.0148
© The Korean Society for Molecular and Cellular Biology
*Correspondence: jbkim@konkuk.ac.kr
Advances in next-generation sequencing (NGS) technologies have enabled population-level studies for many animals to unravel the relationships between genotypic differences and traits of specific populations. The objective of this study was to perform evolutionary analysis of single nucleotide polymorphisms (SNPs) in genes of Korean native cattle Hanwoo, obtained through NGS-based resequencing, in comparison to SNP data from four other cattle breeds (Jersey, Simmental, Angus, and Holstein) and four related species (pig, horse, human, and mouse) obtained from public databases. We analyzed population structures and differentiation levels for the five cattle breeds and estimated species-specific SNPs with their origins and phylogenetic relationships among species. In addition, we identified Hanwoo-specific genes and proteins, and determined distinct changes in protein-protein interactions among five species (cattle, pig, horse, human, and mouse) in the STRING network database by additionally considering indirect protein interactions. We found that the Hanwoo population was clearly different from the other four cattle populations. There were Hanwoo-specific genes related to its meat trait. Protein interaction rewiring analysis also confirmed that there were Hanwoo-specific protein-protein interactions that might have contributed to its unique meat quality.
Keywords: evolutionary analyses, Hanwoo, interaction network, single nucleotide polymorphism, resequencing
Next-generation sequencing (NGS) technologies (Metzker, 2010) have enabled the accumulation of population-scale DNA sequence data. NGS has provided opportunities as well as challenges to many population-based genome projects such as the 1000 Genomes Project (Genomes Project et al., 2010), the 1000 Bull Genomes Project (Hayes, 2012), the International HapMap Project (International HapMap, 2003), and the Drosophila Population Genomics Project (Begun et al., 2007). In addition, various species- and breed-specific studies have been conducted to identify unique genomic features. For example, novel nonsynonymous mutations specific to dogs living in high-altitude areas have been identified through sequencing of 60 individual dogs (Gou et al., 2014). A similar study of a pig population sequenced 69 individuals and yielded a set of loci related to genetic adaptation to high- and low-latitude environments (Ai et al., 2015). In addition, sequencing data of 234 bulls from the 1000 Bull Genomes Project have been used to identify variants associated with milk production level and curly coat (Daetwyler et al., 2014). The Gir cattle population has also been analyzed by sequencing 11 individuals, which revealed a number of loci associated with osmotic stress and heat shock that can influence adaptation to tropical climates (Liao et al., 2013). Recently, several studies have been performed on Hanwoo, an indigenous and representative cattle breed of Korea. The Hanwoo breed has evolved from the 1960s to the present in Korea with genetic improvement associated with meat traits (Lee et al., 2014). For example, a comparative study on three cattle breeds (Hanwoo, Black Angus, and Holstein) has been performed to reveal genetic and genomic characteristics specific to the Hanwoo breed (Lee et al., 2013). 
Using whole-genome sequencing, a similar comparative analysis has been performed to identify variations in economically important traits in three Korean cattle breeds (Hanwoo, Jeju Heugu, and Korean Holstein) (Choi et al., 2014). Moreover, potential selective-sweep regions have been discovered through sequencing 10 Hanwoo and 10 Yanbian cattle individuals (Choi et al., 2015). However, most of these studies have focused on the identification of breed-specific variants and traits. Less attention has been paid to evolutionary and network-level features that could explain the uniqueness of a breed. Therefore, the objective of this study was to perform an evolutionary analysis of the Hanwoo cattle breed from the perspective of breed-specific single-nucleotide polymorphisms (SNPs), genes, and proteins, using resequencing data of Hanwoo cattle and a protein-protein interaction database. Specifically, we analyzed the population structure and differentiation of five cattle breeds (Hanwoo, Jersey, Simmental, Angus, and Holstein). We identified cattle breed-specific SNPs and their evolutionary origins. In addition, we discovered Hanwoo-specific genes/proteins. Moreover, we investigated how the interactions among Hanwoo-specific proteins might have been rewired during evolution.
The DNA extraction protocol was approved by the Committee on Ethics of Animal Experiments, National Institute of Animal Science, Republic of Korea (Permit Number: NIAS2015-774). Genomic DNA was extracted from AI bull semen straws or blood samples obtained from the Hanwoo Improvement Center of the National Agricultural Cooperative Federation in the Republic of Korea with permission from the owners.
We generated whole-genome resequencing data from Hanwoo (N = 126). Hanwoo samples were obtained from the Hanwoo Improvement Center (National Agricultural Cooperative Federation, Republic of Korea). Indexed shotgun paired-end (PE) libraries with an average insert size of 500 bp were generated using the TruSeq Nano DNA Library Prep Kit (Illumina, USA) following the standard Illumina sample-preparation protocol. Briefly, 200 ng of gDNA was fragmented with a Covaris M220 (USA) to obtain a median fragment size of ∼500 bp. The fragmented DNA was end-repaired, A-tailed, and ligated to indexed adapters (∼125 bp). Gel-based size selection was performed on the adapter-ligated DNA to obtain fragments in the range of 550 to 650 bp, followed by eight cycles of PCR amplification. Size-selected libraries were analyzed with an Agilent 2100 Bioanalyzer (Agilent Technologies) to determine the size distribution and to check for adapter contamination. Libraries without adapter contamination were sequenced on Illumina HiSeq 2500 (2 × 125 bp paired-end) and NextSeq 500 (2 × 150 bp paired-end) platforms.
Resequencing data of the 126 Hanwoo genomes and sequencing data of the other four cattle breeds (Jersey, Simmental, Angus, and Holstein; N = 10 for each breed) collected from the NCBI SRA database were aligned to the bovine reference genome assembly (UMD 3.1) using Bowtie2 v2.2.4 with default parameters (Langmead and Salzberg, 2012). SAMtools v1.1 (Li et al., 2009) was used for converting (SAM/BAM), sorting, and indexing alignments. Picard tools v1.125 (
The evoSNPI pipeline was developed to predict the evolutionary origins of SNPs and the rewiring of protein interactions among related species (Cho et al., 2015). Using evoSNPI, we found target species-specific genes/proteins as well as changes in protein interactions among different species from the SNP data. The input for evoSNPI included the following: (i) VCF files containing SNP information for each species obtained by an independent SNP calling pipeline, (ii) pairwise whole-genome alignments between a chosen reference and all other species, and (iii) a phylogenetic tree in Newick format. First, evoSNPI was used to find SNPs in orthologous positions given the pairwise whole-genome alignments, using the liftOver tool from the UCSC Genome Browser (Karolchik et al., 2003). InteractiVenn (Heberle et al., 2015) was then used to visualize the orthologous information. Second, the evolutionary origin of each SNP was inferred from its position, from whether it exists in orthologous regions, and by the maximum parsimony algorithm (Takahashi and Nei, 2000). Once the evolutionary origins of SNPs were predicted, the number of SNPs on each branch of the phylogenetic tree was recorded. Third, target species-specific nonsynonymous SNPs and the genes associated with those SNPs were identified. Finally, interactions among the proteins of those genes in different species were identified from the STRING network database (Szklarczyk et al., 2014), one of the largest databases of protein-protein interactions covering many species. In this step, the Random Walk with Restart (RWR) algorithm (Kim et al., 2008) was applied to the STRING network to incorporate proteins indirectly linked to the original target species-specific proteins. Specifically, each protein in the STRING network was ranked by a score representing its closeness to the original protein set. The top 5% of those proteins were used as additional proteins. 
Edge scores in the STRING network database were normalized to between 0 and 1. These scores were used to quantify the similarity and difference in protein interactions among different species. Orthologous protein information was obtained from OrthoDB, which covers 3,027 complete genomes including 61 vertebrate species (Kriventseva et al., 2015; Waterhouse et al., 2013).
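The network-extension step can be illustrated with a minimal sketch of Random Walk with Restart. This is not the evoSNPI implementation: the function names, the restart probability of 0.3, and the choice to take the top fraction among non-seed proteins are assumptions made for illustration, and edge weights are assumed to be already normalized to between 0 and 1.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every protein by its closeness to the seed set.

    edges: dict mapping (protein_a, protein_b) -> weight in (0, 1] (undirected).
    seeds: iterable of seed proteins (e.g. breed-specific proteins).
    restart: probability of jumping back to a seed (assumed value).
    """
    # Build a symmetric weighted adjacency map.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)
    seeds_in_network = {s for s in seeds if s in adj}
    if not seeds_in_network:
        raise ValueError("no seed protein is present in the network")

    # Restart vector: uniform over the seeds found in the network.
    p0 = {n: (1.0 / len(seeds_in_network) if n in seeds_in_network else 0.0)
          for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            # Spread the remaining mass of u over its neighbours by edge weight.
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


def top_additional(scores, seeds, fraction=0.05):
    """Return the highest-scoring non-seed proteins (top `fraction`, at least one)."""
    seeds = set(seeds)
    ranked = sorted((n for n in scores if n not in seeds),
                    key=lambda n: scores[n], reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]
```

Proteins that are strongly connected to the seed set accumulate more probability mass than distant ones, which is why the ranking can recover indirectly linked proteins that a direct-neighbour lookup would miss.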
ANNOVAR v2015JUN17 (Wang et al., 2010) and SnpEff v4.1 (Cingolani et al., 2012) with the Ensembl gene annotation database (UMD3.1) were used to annotate SNPs of the five cattle breeds. Hanwoo-specific nonsynonymous SNPs and the genes containing them were identified by comparing Hanwoo to the other breeds. Hanwoo-specific genes were then analyzed to find overrepresented biological functions using the PANTHER website (Mi et al., 2005). Enriched biological functions associated with Hanwoo-specific genes (Bonferroni-corrected
The VCF files of ten randomly selected Hanwoo individuals and of the other four cattle breeds were generated in the SNP calling step, merged with VCFtools v0.1.13 (Danecek et al., 2011), and converted to PLINK format files (.ped and .map) using PLINK v1.90b (Purcell et al., 2007). Additional filtering was carried out with PLINK using the following parameters: --geno 0.01 --maf 0.05 --hwe 0.000001. Principal component analysis (PCA) was performed with GCTA v1.24.4 (Yang et al., 2011) in two steps: (i) calculation of the genetic relationship matrix (GRM) with the “--make-grm” option, and (ii) estimation of the first four principal components with the “--pca 4” option. R was used to generate the PCA plot. Population structure was inferred with ADMIXTURE v1.3.0 (Alexander et al., 2009) and visualized with CLUMPAK (Kopelman et al., 2015).
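To make the two PCA steps concrete, the sketch below builds a genetic relationship matrix from allele counts and extracts its leading eigenvector by power iteration. It is an illustration only, not GCTA itself: it assumes the commonly cited GRM definition, A_jk = (1/N) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)), genotypes coded as 0/1/2 with no missing values, and hypothetical function names. Only the first principal component is computed here.

```python
import math


def genetic_relationship_matrix(genotypes):
    """genotypes: list of individuals, each a list of allele counts (0, 1, 2) per SNP."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    used = 0
    for i in range(n_snp):
        column = [ind[i] for ind in genotypes]
        p = sum(column) / (2.0 * n_ind)  # allele frequency of SNP i
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNPs carry no information
        used += 1
        denom = 2.0 * p * (1.0 - p)
        centered = [x - 2.0 * p for x in column]
        for j in range(n_ind):
            for k in range(n_ind):
                grm[j][k] += centered[j] * centered[k] / denom
    if used == 0:
        raise ValueError("no polymorphic SNP available")
    return [[value / used for value in row] for row in grm]


def first_principal_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric positive semidefinite matrix (power iteration)."""
    n = len(matrix)
    vec = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec
```

Individuals from the same population tend to receive coordinates of the same sign and similar magnitude on the leading component, which is the pattern a PCA plot of breeds makes visible.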
To identify regions of population differentiation among Hanwoo, Jersey, Simmental, Angus, and Holstein, the mean Z-transformed Fst values [Z(Fst)] were calculated with VCFtools v0.1.13 (Danecek et al., 2011) for 100 kbp non-overlapping genomic windows on all chromosomes, using the VCF files from the population structure analysis. Gene-level analysis was performed in genomic windows with an extremely high Z(Fst) value (> 5) by identifying enriched Gene Ontology (GO) terms in those regions using the getBM function in the biomaRt R package (Durinck et al., 2005). In this analysis, copy number variable regions were collected from the literature (Bickhart et al., 2012; Choi et al., 2013; 2016), and genes within those regions were excluded. The Manhattan plot of Z(Fst) values was generated with the qqman R package (Li et al., 2015).
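The window-level selection can be sketched as follows. The code assumes that per-window mean Fst values have already been computed (for example by VCFtools) and are supplied as a dictionary; the function names and the use of the population standard deviation are assumptions, since the text does not state which estimator was used.

```python
import statistics


def z_transform(window_fst):
    """window_fst: dict mapping (chromosome, window_start) -> mean Fst of the window."""
    values = list(window_fst.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (assumed)
    if sd == 0.0:
        raise ValueError("all windows have the same Fst; Z-scores are undefined")
    return {window: (fst - mean) / sd for window, fst in window_fst.items()}


def highly_differentiated(window_fst, threshold=5.0):
    """Windows whose Z(Fst) exceeds the threshold, sorted by genomic position."""
    z_scores = z_transform(window_fst)
    return sorted(window for window, z in z_scores.items() if z > threshold)
```

Because the Z-transform is relative to the genome-wide distribution of window values, a threshold of 5 picks out only windows that are extreme compared with the rest of the genome rather than windows above a fixed Fst value.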
A total of 16,361,482, 7,313,386, 8,180,573, 7,085,527 and 8,125,851 SNPs were identified from Hanwoo, Jersey, Simmental, Angus, and Holstein, respectively (Materials and Methods; Table 1). Among them, 14,551,596 (88.94%), 7,283,202 (99.59%), 8,159,778 (99.75%), 7,064,818 (99.71%) and 8,097,083 (99.65%) SNPs of Hanwoo, Jersey, Simmental, Angus, and Holstein, respectively, were reported in the dbSNP database (version 146). The transition-to-transversion ratio (Ti/Tv) was also calculated to evaluate SNP quality. The Ti/Tv ratios for Hanwoo, Jersey, Simmental, Angus, and Holstein were 2.29, 2.25, 2.24, 2.22 and 2.24, respectively. To identify SNPs explaining phenotypic differences in each cattle breed, we annotated all SNPs with 19 functional categories, such as synonymous, nonsynonymous, intron, and untranslated regions (Supplementary Table S2). The majority of SNPs were found in intergenic (72% in Hanwoo and 73% in the other four cattle breeds) and intron (27% in Hanwoo and 26% in the other four cattle breeds) regions. Only a small fraction of SNPs (1.2, 1.1, 1.0, 1.1 and 1.1% in Hanwoo, Jersey, Simmental, Angus, and Holstein, respectively) were detected in genic regions including exonic, splice site, and untranslated regions (Supplementary Table S2).
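For readers unfamiliar with the Ti/Tv quality check, the following minimal sketch shows how the ratio is obtained from a list of biallelic SNPs. The function names and the (reference, alternate) input format are assumptions for illustration; the tools used in this study report the ratio directly.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T)."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def titv_ratio(snps):
    """snps: iterable of (ref, alt) single-base allele pairs."""
    transitions = 0
    transversions = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions
```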
After filtering SNPs by population statistics such as minor allele frequency, genotyping rate, and Hardy-Weinberg equilibrium (Materials and Methods), a total of 1,826,768 SNPs from ten individuals of each cattle breed were used to analyze population structure. In this analysis, ten randomly selected Hanwoo individuals were used to reduce sample-size bias. We first used principal component analysis (PCA) to identify the relationships among the five cattle populations using the SNP data. As shown in Fig. 1A, the Hanwoo population was distinctly separated from the other four cattle populations. Interestingly, individuals of the Jersey and Angus populations were relatively more dispersed.
Next, we further analyzed the population structure of the five cattle populations using ADMIXTURE to estimate individual ancestry and admixture proportions (Materials and Methods). Population structure plots for the number of clusters K from two to seven were drawn (Supplementary Fig. S2). Assuming that there were five ancestral populations (Fig. 1B), the Hanwoo population was clearly differentiated from the other populations. Even when K was increased to seven, the Hanwoo population was still clustered as one distinct group and showed clear separation from the other four populations (Supplementary Fig. S2).
To identify the regions associated with population differentiation in the five cattle populations, we calculated the mean of Z-transformed Fst [Z(Fst)] values from SNPs in 100 kbp non-overlapping genomic regions (Materials and Methods). As shown in Fig. 2, 62 significant regions [Z(Fst) > 5] were identified as regions that explain population differentiation, with a total of 86 genes including 2,390 SNPs across all chromosomes. Some highly scored differentiated regions included the PSAT1 gene associated with metabolic process (GO:0008152), the BLCAP gene related to protein binding (GO:0005515), and the FBLIM1 gene with functions in the regulation of protein localization (GO:0032880), mitochondrial inner membrane (GO:0005743), and filamin binding (GO:0031005). We also compared genes in the 62 significant regions to known cattle trait-associated genes (Kawahara-Miki et al., 2011), and found the SLC43A3 gene with a transmembrane transport function (GO:0055085) and the LEPR gene with leptin-related functions (GO:0033210, GO:0038021 and GO:0044321), both of which have a clear association with the meat trait of cattle.
Recently, we have developed a pipeline called evoSNPI (Cho et al., 2015) to predict the evolutionary origin of SNPs and the rewiring of protein-protein interactions among different species (Materials and Methods). We applied the evoSNPI pipeline to the Hanwoo SNP data as well as SNP data from the other cattle breeds (
Based on the identified SNPs in orthologous positions, we extracted Hanwoo-specific genes, defined as genes with nonsynonymous SNPs found only in the Hanwoo breed. As a result, we found 1,509 Hanwoo-specific genes corresponding to 1,646 proteins identified in the Ensembl BioMart database (Kinsella et al., 2011). To explain the specificity of Hanwoo, we extended the gene-level analysis to a rewiring analysis of protein-protein interactions. “Rewiring” is a widely used term in systems biology for changes in the interactions among proteins (or genes), and this analysis can reveal systems-level characteristics of Hanwoo compared with other related species. Starting from the initial Hanwoo-specific genes/proteins (1,509/1,646), we first ran the Random Walk with Restart algorithm and found closely associated additional proteins based on the STRING network database (Supplementary Table S3). We also compared the extended protein set (a total of 2,592 proteins) to known cattle trait-associated genes. As a result, 76 of the 2,592 extended proteins were cattle trait-associated ones, including MYOD1, MYH3, and PYGM (Supplementary Table S4). The majority (63 out of 76) of these known cattle trait-associated genes were associated with meat quality. Eleven genes were associated with milk production, while eleven genes were related to growth (Supplementary Table S4). Next, we compared protein-protein interactions of the 19 Hanwoo-specific proteins with orthologous information in OrthoDB among the five species (Fig. 4). After converting edge scores to values between 0 and 1, the STRING network database was used to report similarity or difference in protein-protein interactions among species. The degree of network rewiring of 15 protein-protein interaction pairs of extended Hanwoo-specific proteins in the five species, which have an edge-score difference of 0.2 or higher between cattle and the other two species, is shown in Fig. 4. 
For example, there was no interaction between EGR3 and FGF2 in cattle, although this interaction was observed in the other species (0.36, 0.60, 0.36, and 0.53 in pig, horse, mouse, and human, respectively). In contrast, there were many interactions found exclusively in cattle, including ABL and FGF2, GPX3 and CAT, and PTN and FGF2. Results for all other protein pairs are summarized in Supplementary Table S5. Examples of rewired protein interactions among cattle, pig, and horse are shown in Fig. 5. There were unique interactions between ABL and ANAS and between MYOD1 and IL2 in cattle. These unique interactions were not observed in pig or horse. In contrast, the interaction between PBX2 and CREBBP in pig and horse was not observed in the cattle network. In Fig. 5, red-colored genes are known to be associated with a meat, growth, or milk trait in cattle (Kawahara-Miki et al., 2011).
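The comparison of edge scores across species can be sketched as below. The EGR3 and FGF2 scores are the values reported above, with the absent cattle interaction represented as 0.0; the second pair (P1, P2) and its scores are hypothetical and included only as a contrast. The rule used here, that a pair counts as rewired when its score differs from the cattle score by at least 0.2 in at least two other species, is one possible reading of the criterion and is an assumption, as are the function and parameter names.

```python
def rewired_pairs(scores_by_pair, reference="cattle", threshold=0.2, min_species=2):
    """Return protein pairs whose normalized edge score differs from the reference.

    scores_by_pair: dict mapping (protein_a, protein_b) -> {species: score in [0, 1]}.
    A pair is reported when at least `min_species` other species differ from the
    reference species by `threshold` or more (assumed criterion).
    """
    rewired = []
    for pair, scores in scores_by_pair.items():
        ref_score = scores.get(reference, 0.0)  # missing edge treated as score 0
        differing = [species for species, score in scores.items()
                     if species != reference and abs(score - ref_score) >= threshold]
        if len(differing) >= min_species:
            rewired.append(pair)
    return sorted(rewired)


example_scores = {
    # Values reported in the text for EGR3 and FGF2 (no interaction in cattle).
    ("EGR3", "FGF2"): {"cattle": 0.0, "pig": 0.36, "horse": 0.60,
                       "mouse": 0.36, "human": 0.53},
    # Hypothetical conserved pair, for contrast only.
    ("P1", "P2"): {"cattle": 0.50, "pig": 0.55, "horse": 0.45,
                   "mouse": 0.50, "human": 0.60},
}
```

With these inputs, only the EGR3 and FGF2 pair is reported, because the hypothetical pair never differs from the cattle score by 0.2 or more.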
In this study, 126 individuals of the Korean native cattle Hanwoo were subjected to whole-genome resequencing using high-throughput next-generation sequencing technologies, and compared in terms of SNPs with the genomes of four other cattle breeds: Jersey, Simmental, Angus, and Holstein. The four cattle breeds were selected because they are all used as resources for meat or milk production, just as Hanwoo is the most important meat resource in Korea. In addition, these four cattle breeds have been widely used in recent studies of Hanwoo (Choi et al., 2014; Daetwyler et al., 2014; Ramey et al., 2013; Stothard et al., 2011), and their phylogenetic relationship was recently investigated (Decker et al., 2009).
We conducted population structure and differentiation analyses using SNPs to explain the population genetic similarities and differences among the five cattle breeds. The Hanwoo population was clearly separated from the other four cattle populations, which suggests that the Hanwoo breed might have a unique set of SNPs compared with the other cattle breeds, and that such unique SNPs could explain phenotypic differences of Hanwoo such as meat quality. The population structure of the five cattle populations was similar to that in a previously reported phylogenomic study on cattle breeds (Decker et al., 2009). We discovered several candidate regions covering highly differentiated SNPs among the five cattle populations. From the GO enrichment analysis for the genes in these regions, metal ion binding, protein localization, mitochondrial inner membrane, and filamin binding functions were identified as enriched biological functions. Among them, the metal ion binding function is closely related to skeletal muscle, which is responsible for meat quality (Jeremiah et al., 2003).
We also performed network-based evolutionary analyses using the evoSNPI pipeline and found changes in protein-protein interactions related to Hanwoo-specific genes among five species (cattle, pig, horse, human, and mouse). There were Hanwoo-specific genes with nonsynonymous SNPs that were not present in the other four cattle breeds or the other four species. Most of them are associated with the meat trait of Hanwoo. Rewired protein-protein interaction analysis among different species also identified Hanwoo breed-specific protein-protein interactions present only in the Hanwoo breed network, such as interactions between EGR3 and FGF2, between ANAS and ALB, between CTSC and SLC46A2, between MYOD1 and IL2, between ALB and CXXC1, and between CREBBP and PBX2. In addition, the interaction between ALB and MTR was present in both the Hanwoo and pig networks, but the interaction in the Hanwoo network was stronger. Among them, three proteins, MTR, FGF2, and EGR3, have metabolism-related functions, which are known to be a critical factor in making marbled meat in cattle (Lim et al., 2015). Therefore, this analysis confirmed that there were Hanwoo-specific protein interactions that might have contributed to its unique meat quality. This analysis enables the investigation of additional genes (and proteins) that interact with the original breed-specific genes (and proteins) discovered from direct genetic differences alone, and the identification of systems-level features and their evolutionary changes relevant to phenotypic differences.
Table 1. Statistics of SNPs identified from Hanwoo, Jersey, Simmental, Angus, and Holstein cattle breeds
Cattle breeds | No. of SNPs | Found in dbSNP^a | Ti/Tv ratio^b |
---|---|---|---|
Hanwoo | 16,361,482 | 14,551,596 (88.94%) | 2.29 (0.00001) |
Jersey | 7,313,386 | 7,283,202 (99.59%) | 2.25 (0.00001) |
Simmental | 8,180,573 | 8,159,778 (99.75%) | 2.24 (0.00018) |
Angus | 7,085,527 | 7,064,818 (99.71%) | 2.22 (0.00000) |
Holstein | 8,125,851 | 8,097,083 (99.65%) | 2.24 (0.00013) |
^a The number of SNPs found in the dbSNP database (version 146). Fractions are in parentheses.
^b The ratio of the number of transitions to the number of transversions. Standard deviations are in parentheses.
Published online September 30, 2016 https://doi.org/10.14348/molcells.2016.0148
Daehwan Lee1,4, Minah Cho1,4, Woon-young Hong1, Dajeong Lim2, Hyung-Chul Kim2, Yong-Min Cho2, Jin-Young Jeong2, Bong-Hwan Choi2, Younhee Ko3, and Jaebum Kim1,*
1Department of Stem Cell and Regenerative Biology, Konkuk University, Seoul 05029, Korea, 2National Institute of Animal Science, Wanju 55365, Korea, 3Department of Clinical Genetics, Department of Pediatrics, Yonsei University College of Medicine, Seoul 03722, Korea, 4These authors contributed equally to this work.
Correspondence to:*Correspondence: jbkim@konkuk.ac.kr
Advances in next generation sequencing (NGS) technologies have enabled population-level studies for many animals to unravel the relationships between genotypic differences and traits of specific populations. The objective of this study was to perform evolutionary analysis of single nucleotide polymorphisms (SNP) in genes of Korean native cattle Hanwoo in comparison to SNP data from four other cattle breeds (Jersey, Simmental, Angus, and Holstein) and four related species (pig, horse, human, and mouse) obtained from public databases through NGS-based resequencing. We analyzed population structures and differentiation levels for the five cattle breeds and estimated species-specific SNPs with their origins and phylogenetic relationships among species. In addition, we identified Hanwoo-specific genes and proteins, and determined distinct changes in protein-protein interactions among five species (cattle, pig, horse, human, mouse) in the STRING network database by additionally considering indirect protein interactions. We found that the Hanwoo population was clearly different from the other four cattle populations. There were Hanwoo-specific genes related to its meat trait. Protein interaction rewiring analysis also confirmed that there were Hanwoo-specific protein-protein interactions that might have contributed to its unique meat quality.
Keywords: evolutionary analyses, Hanwoo, interaction network, single nucleotide polymorphism, resequencing
Next-generation sequencing (NGS) technologies (Metzker, 2010) have enabled the accumulation of population-scale DNA sequence data. NGS has provided opportunities as well as challenges to many population-based genome projects such as the 1000 genomes project (Genomes Project et al., 2010), the 1000 bull genomes project (Hayes, 2012), the international HapMap project (International HapMap, 2003), and the Drosophila population genomics project (Begun et al., 2007). In addition, various species- and breed-specific studies have been conducted to identify unique genomic features. For example, novel nonsynonymous mutations specific to dogs living at high altitude areas have been identified though sequencing of 60 individual dogs (Gou et al., 2014). Similar study has been conducted for a pig population by sequencing 69 individuals, yielding a set of loci related to genetic adaptation to a high- and low-latitude environments (Ai et al., 2015). In addition, sequencing data of 234 bulls from the 1000 bull genome projects have been used to identify variants and traits associated with milk production level and curly coat (Daetwyler et al., 2014). Gir cattle population has also been analyzed through sequencing 11 individuals, resulting in the finding of a number of loci associated with osmotic stress and heat shock that can influence their adaptation to tropical climates (Liao et al., 2013). Recently, several studies have been performed on Hanwoo cattle breed, which is indigenous and representative cattle breed in Korea. The Hanwoo breed has evolved from the 1960s to the present in Korea with genetic improvement associated with meat traits (Lee et al., 2014). For examples, a comparative study on three cattle breeds (Hanwoo, Black Angus, and Holstein) has been performed to reveal genetic and genomic characteristics specific to the Hanwoo breed (Lee et al., 2013). 
Using whole-genome sequencing, a similar comparative analysis has been performed to identify variations in economically important traits in three Korean cattle breeds (Hanwoo, Jeju Heugu, and Korean Holstein) (Choi et al., 2014). Moreover, potential selective-sweep regions have been discovered through sequencing 10 Hanwoo and 10 Yanbian cattle individuals (Choi et al., 2015). However, most of these studies have usually focused on the identification of breed-specific variants and traits. Less attention has been paid to evolutionary and network-level perspective features to explain their uniqueness. Therefore, the objective of this study was to perform evolutionary analysis for Hanwoo cattle breed in the perspective of breed-specific single-nucleotide polymorphisms (SNPs), genes, and proteins through resequencing of Hanwoo cattles and build a protein-protein interaction database. Specifically, we analyzed the population structure and differentiation of five cattle breeds (Hanwoo, Jersey, Simmental, Angus, and Holstein). We identified cattle breed-specific SNPs and their evolutionary origins. In addition, we discovered Hanwoo-specific genes/proteins. Moreover, we investigated how these interactions among Hanwoo-specific proteins might have been rewired during evolution.
The DNA extraction protocol was approved by the Committee on Ethics of Animal Experiments, National Institute of Animal Science, Republic of Korea (Permit Number: NIAS2015-774). Genomic DNAs were extracted from AI bull semen straws or blood samples obtained from the Hanwoo Improvement Center of the National Agricultural Cooperative Federation in Republic of Korea with permission from the owners.
We generated whole-genome resequencing data from Hanwoo (N = 126). Hanwoo samples were obtained from the Hanwoo Improvement Center (National Agricultural Cooperative Federation, Republic of Korea). Indexed shotgun paired-end (PE) libraries with average inserts of 500 bp were generated using TruSeq Nano DNA Library Prep Kit (Illumina, USA) following standard Illumina sample-preparation protocol. Briefly, 200 ng of gDNAs were fragmented with Covaris M220 (USA) to obtain median fragment size of ∼500 bp. These fragmented DNAs were end repaired followed by A-tailing and ligation to indexed adapter (∼125 bp adapter). Gel-based size selection was performed for adapter-ligated DNAs to generate DNAs in the range of 550 to 650 bp. PCR amplification was performed in eight cycles. Size-selected libraries were analyzed with Agilent 2100 Bioanalyzer (Agilent Technologies) to determine the size distribution and determine whether there was adapter contamination. The resulting libraries without adaptor contamination were sequenced on Illumina HiSeq 2500 (2 × 125 bp paired-end sequences) and NextSeq500 (2 × 150 bp paired-end sequences) sequencing platforms.
Resequenced data of the 126 Hanwoo genomes and the sequencing data of other four cattle breeds (Jersey, Simmental, Angus, and Holstein; N = 10 for all breeds) collected from the NCBI SRA database were aligned to bovine reference genome assembly (UMD 3.1) using Bowtie2 v2.2.4 with default parameters (Langmead and Salzberg, 2012). SAMtools v1.1 (Li et al., 2009) was used for converting (SAM/BAM), sorting, and indexing alignments. Picard tools v1.125 (
The evoSNPI pipeline has been developed to predict evolutionary origins of SNPs and rewiring information of protein interactions among related species (Cho et al., 2015). Using evoSNPI, we found target species-specific genes/proteins as well as changes in protein interactions among different species from the SNP data. Input for evoSNPI included the following: (i) VCF files containing SNP information for each species obtained by independent SNP calling pipeline, (ii) pairwise whole-genome alignments between a chosen reference and all other species, and (iii) a phylogenetic tree in newick format. First, evoSNPI was used to find SNPs in orthologous positions given pairwise whole-genome alignments using liftover tool from UCSC Genome Browser (Karolchik et al., 2003). Interactivenn (Heberle et al., 2015) was then used to visualize orthologous information. Second, the evolutionary origin of SNPs was inferred based on position information of SNPs, whether those SNPs exist in orthologous regions, and the maximum parsimony algorithm (Takahashi and Nei, 2000). Once the evolutionary origins of SNPs were predicted, the number of SNPs on each branch of a phylogenetic tree was recorded. Third, target species-specific nonsynonymous SNPs and associated genes with those SNPs were identified. Finally, interactions among proteins of the genes in different species were identified from the STRING network database (Szklarczyk et al., 2014), one of the largest database of protein-protein interactions of many species. In this step, Random Walk with Restart (RWR) algorithm (Kim et al., 2008) was applied to the STRING network database to incorporate indirectly linked proteins with the original target species-specific proteins. Specifically, each protein in the STRING network was ranked with a score representing the degree of closeness with the original protein sets. The top 5% of those proteins were used as additional proteins. 
Edge scores in the STRING network database were normalized (between 0 to 1). These scores were used to quantify the similarity and difference in protein interactions among different species. Orthologous protein information was obtained from OrthoDB which covers 3,027 complete genomes including 61 vertebrate species (Kriventseva et al., 2015; Waterhouse et al., 2013).
ANNOVAR v 2015JUN17 (Wang et al., 2010) and SnpEff v4.1 (Cingolani et al., 2012) with Ensembl gene annotation database (UMD3.1) were used to annotate SNPs of the five cattle breeds. Hanwoo-specific nonsynonymous SNPs and genes in the Hanwoo breed were identified by comparing Hanwoo to other breeds. Hanwoo-specific genes were then analyzed to find overrepresented biological functions using panther website (Mi et al., 2005). Enriched biological functions associated with Hanwoo-specific genes (Bonferroni-corrected
The VCF files from ten randomly selected Hanwoo individuals and from the other four cattle breeds were generated in the SNP calling step, merged with VCFtools v 0.1.13 (Danecek et al., 2011), and converted to PLINK format files (.ped and .map) using PLINK v1.90b (Purcell et al., 2007). Additional filtering was carried out in PLINK with the following parameters: --geno 0.01 --maf 0.05 --hwe 0.000001. Principal component analysis (PCA) was performed with GCTA v1.24.4 (Yang et al., 2011) in two steps: (i) calculation of the genetic relationship matrix (GRM) with the parameter “--make-grm”, and (ii) estimation of the first four principal components with the parameter “--pca 4”. R was used to generate the PCA plot. Population structure was inferred with ADMIXTURE v1.3.0 (Alexander et al., 2009) and visualized with CLUMPAK (Kopelman et al., 2015).
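To make the meaning of the first two filtering thresholds concrete, the following Python sketch applies a missing-genotype filter and a minor allele frequency filter to a single SNP. It only illustrates what the --geno and --maf thresholds do and is not PLINK code; the genotype encoding (0, 1, or 2 copies of the alternate allele, `None` for a missing call) and the function name are assumptions, and the Hardy-Weinberg test (--hwe) is deliberately left out.

```python
def passes_filters(genotypes, max_missing=0.01, min_maf=0.05):
    """Return True if one SNP survives the missingness and MAF filters.

    genotypes: one entry per individual; 0, 1, or 2 alternate alleles,
               or None when the genotype call is missing (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    # Missingness filter: drop SNPs whose missing-call rate exceeds the threshold.
    missing_rate = 1.0 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False
    # MAF filter: the minor allele is whichever allele is rarer.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf
```

With 50 individuals (ten per breed) and a missingness threshold of 0.01, a single missing genotype already gives a missing rate of 0.02, so under this sketch only SNPs called in every individual are retained.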
To identify regions of population differentiation among Hanwoo, Jersey, Simmental, Angus, and Holstein, the mean Z-transformed Fst values [Z(Fst)] were calculated for 100 kbp non-overlapping genomic windows on all chromosomes with VCFtools v 0.1.13 (Danecek et al., 2011), using the VCF files from the population structure analysis. Gene-level analysis was performed on genomic windows with extremely high Z-transformed Fst values (> 5) by identifying enriched Gene Ontology (GO) terms in those regions using the getBM function in the biomaRt R package (Durinck et al., 2005). In this analysis, copy number variable regions were collected from the literature (Bickhart et al., 2012; Choi et al., 2013; 2016), and genes within those regions were excluded. The Manhattan plot of Z(Fst) values was generated with the qqman R package (Li et al., 2015).
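The windowing and Z-transformation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure run by VCFtools: per-SNP Fst values are taken as given, the Z-transformation is assumed to be (window mean minus genome-wide mean of window means) divided by the population standard deviation of window means, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def window_means(snps, window_size=100_000):
    """Average per-SNP Fst within non-overlapping windows.

    snps: iterable of (chromosome, 1-based position, Fst) tuples.
    Returns {(chromosome, window index): mean Fst}.
    """
    buckets = defaultdict(list)
    for chrom, pos, fst in snps:
        buckets[(chrom, (pos - 1) // window_size)].append(fst)
    return {key: mean(values) for key, values in buckets.items()}


def z_transform(window_fst):
    """Z-transform window-level Fst values across all windows."""
    mu = mean(window_fst.values())
    sigma = pstdev(window_fst.values())
    if sigma == 0:
        raise ValueError("all windows have the same Fst; Z-scores are undefined")
    return {key: (value - mu) / sigma for key, value in window_fst.items()}


def outlier_windows(snps, window_size=100_000, threshold=5.0):
    """Return the windows whose Z(Fst) exceeds the threshold, sorted by position."""
    z = z_transform(window_means(snps, window_size))
    return sorted(key for key, score in z.items() if score > threshold)
```

Windows returned by `outlier_windows` play the role of the highly differentiated regions that are passed on to the gene-level GO analysis.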
A total of 16,361,482, 7,313,386, 8,180,573, 7,085,527, and 8,125,851 SNPs were identified from Hanwoo, Jersey, Simmental, Angus, and Holstein, respectively (Materials and Methods; Table 1). Among them, 14,551,596 (88.94%), 7,283,202 (99.59%), 8,159,778 (99.75%), 7,064,818 (99.71%), and 8,097,083 (99.65%) SNPs of Hanwoo, Jersey, Simmental, Angus, and Holstein, respectively, were reported in the dbSNP database (version 146). The transition-to-transversion ratio (Ti/Tv) was also calculated to evaluate SNP quality. The Ti/Tv ratios for Hanwoo, Jersey, Simmental, Angus, and Holstein were 2.29, 2.25, 2.24, 2.22, and 2.24, respectively. To identify SNPs explaining phenotypic differences in each cattle breed, we annotated all SNPs with 19 functional categories, such as synonymous, nonsynonymous, intron, and untranslated regions (Supplementary Table S2). The majority of SNPs were found in intergenic regions (72% in Hanwoo and 73% in the other four cattle breeds) and intron regions (27% in Hanwoo and 26% in the other four cattle breeds). Only a small fraction of SNPs (1.2, 1.1, 1.0, 1.1, and 1.1% in Hanwoo, Jersey, Simmental, Angus, and Holstein, respectively) were detected in genic regions including exonic, splice site, and untranslated regions (Supplementary Table S2).
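For readers unfamiliar with the Ti/Tv statistic, the following Python sketch shows how it is computed from biallelic SNPs. It is a generic illustration, not the tool used in this study; the input format (a list of reference and alternate base pairs) and the function names are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine with purine or pyrimidine with pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps):
    """Compute the transition-to-transversion ratio.

    snps: iterable of (ref, alt) single-base pairs for biallelic SNPs.
    """
    transitions = transversions = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution; ignore
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions
```

For example, the four substitutions A>G, C>T, A>C, and G>A contain three transitions and one transversion, giving a ratio of 3.0.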
After filtering SNPs on population statistics such as minor allele frequency, genotype rate, and Hardy-Weinberg equilibrium (Materials and Methods), a total of 1,826,768 SNPs from ten individuals of each cattle breed were used to analyze population structure. In this analysis, ten randomly selected Hanwoo individuals were used to reduce sample size bias. We first used principal component analysis (PCA) to identify the relationships among the five cattle populations using the SNP data. As shown in Fig. 1A, the Hanwoo population was distinctly separated from the other four cattle populations. Interestingly, individuals of the Jersey and Angus populations were relatively more dispersed.
Next, we further analyzed the population structure of the five cattle populations using ADMIXTURE to estimate individual ancestry and admixture proportions (Materials and Methods). Population structure plots were drawn for the number of clusters K from two to seven (Supplementary Fig. S2). Assuming that there were five ancestral populations (Fig. 1B), the Hanwoo population was clearly differentiated from the other populations. Even when K was increased to seven, the Hanwoo population was still clustered as one distinct group and showed clear separation from the other four populations (Supplementary Fig. S2).
To identify the regions associated with population differentiation in the five cattle populations, we calculated the mean of Z-transformed Fst [Z(Fst)] values from SNPs in 100 kbp non-overlapping genomic regions (Materials and Methods). As shown in Fig. 2, 62 significant regions [Z(Fst) > 5] were identified as regions explaining population differentiation, with a total of 86 genes harboring 2,390 SNPs across all chromosomes. Some highly scored differentiation regions included the PSAT1 gene associated with metabolic process (GO:0008152), the BLCAP gene related to protein binding (GO:0005515), and the FBLIM1 gene with functions in the regulation of protein localization (GO:0032880), mitochondrial inner membrane (GO:0005743), and filamin binding (GO:0031005). We also compared genes in the 62 significant regions to known cattle trait-associated genes (Kawahara-Miki et al., 2011), and found the SLC43A3 gene with transmembrane transport (GO:0055085) function and the LEPR gene with leptin-related functions (GO:0033210, GO:0038021, and GO:0044321), both of which have a clear association with the meat trait of cattle.
Recently, we have developed a pipeline called evoSNPI (Cho et al., 2015) to predict the evolutionary origin of SNPs and the rewiring of protein-protein interactions among different species (Materials and Methods). We applied the evoSNPI pipeline to the Hanwoo SNP data as well as SNP data from the other cattle breeds (
Based on the identified SNPs in orthologous positions, we extracted Hanwoo-specific genes, defined as genes with nonsynonymous SNPs found only in the Hanwoo breed. As a result, we found 1,509 Hanwoo-specific genes corresponding to 1,646 proteins identified in the Ensembl BioMart database (Kinsella et al., 2011). To explain the specificity of Hanwoo, we extended the gene-level analysis to a rewiring analysis of protein-protein interactions. “Rewiring” is a widely used term in systems biology for changes in interactions among proteins (or genes), and this analysis can reveal the systems-level characteristics of Hanwoo compared with other related species. Starting from the initial Hanwoo-specific genes/proteins (1,509/1,646), we first ran the Random Walk with Restart algorithm and found additional closely associated proteins based on the STRING network database (Supplementary Table S3). We also compared the extended protein set (a total of 2,592 proteins) to known cattle trait-associated genes. As a result, 76 of the 2,592 extended proteins were cattle trait-associated ones, including MYOD1, MYH3, and PYGM (Supplementary Table S4). The majority (63 out of 76) of the known cattle trait-associated genes were associated with meat quality. Eleven genes were associated with milk production, while eleven genes were related to growth (Supplementary Table S4). Next, we compared protein-protein interactions of the 19 Hanwoo-specific proteins with orthologous information in OrthoDB among the five species (Fig. 4). After converting edge scores to values between 0 and 1, the STRING network database was used to report similarities or differences in protein-protein interactions among species. The degree of network rewiring of 15 protein-protein interaction pairs of extended Hanwoo-specific proteins in the five species, which have an edge-score difference of 0.2 or higher between cattle and the other two species, is shown in Fig. 4.
For example, there was no interaction between EGR3 and FGF2 in cattle breeds, although this interaction was observed in the other species (0.36, 0.60, 0.36, and 0.53 in pig, horse, mouse, and human, respectively). In contrast, there were many interactions found exclusively in the cattle breeds, including ABL and FGF2, GPX3 and CAT, and PTN and FGF2. Results for all other protein pairs are summarized in Table S5. Examples of rewired protein interactions among cattle, pig, and horse are shown in Fig. 5. There were unique interactions between ABL and ANAS and between MYOD1 and IL2 in cattle. These unique interactions were not observed in pig or horse. In contrast, the interaction between PBX2 and CREBBP found in pig and horse was not observed in the cattle network. In Fig. 5, red-coloured genes are known to be associated with a meat, growth, or milk trait in cattle (Kawahara-Miki et al., 2011).
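The comparison of edge scores across species can be sketched in Python as below. This is only an illustration of the idea and not the evoSNPI code: it assumes that edge scores are already normalized to the range 0 to 1, that a missing interaction counts as a score of 0, and that a pair is reported when the score in the target species differs by at least 0.2 from at least two other species. The function name and these parameter choices are hypothetical.

```python
def rewired_pairs(scores_by_species, target="cattle", min_diff=0.2, min_species=2):
    """Find protein pairs whose edge score in `target` differs from other species.

    scores_by_species: {species: {frozenset({protein_a, protein_b}): score in [0, 1]}}
    A pair absent from a species is treated as score 0.0 (assumption).
    Returns {pair: {other species: score difference relative to target}}.
    """
    all_pairs = set()
    for edges in scores_by_species.values():
        all_pairs.update(edges)
    target_edges = scores_by_species[target]
    result = {}
    for pair in all_pairs:
        target_score = target_edges.get(pair, 0.0)
        diffs = {
            species: edges.get(pair, 0.0) - target_score
            for species, edges in scores_by_species.items()
            if species != target
        }
        changed = sum(1 for d in diffs.values() if abs(d) >= min_diff)
        if changed >= min_species:
            result[pair] = diffs
    return result
```

Applied to the EGR3 and FGF2 scores quoted above (absent in cattle; 0.36, 0.60, 0.36, and 0.53 in pig, horse, mouse, and human), every difference exceeds 0.2, so the pair would be reported as rewired under these assumptions.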
In this study, 126 individuals of the Korean native cattle Hanwoo were subjected to whole-genome resequencing using high-throughput next-generation sequencing technologies, and compared in terms of SNPs with the genomes of four other cattle breeds: Jersey, Simmental, Angus, and Holstein. These four cattle breeds were selected because they are all used for meat or milk production, just as Hanwoo is the most important meat resource in Korea. In addition, these four cattle breeds have been widely used in recent Hanwoo-related studies (Choi et al., 2014; Daetwyler et al., 2014; Ramey et al., 2013; Stothard et al., 2011) and their phylogenetic relationship was recently investigated (Decker et al., 2009).
We conducted population structure and differentiation analyses using SNPs to explain the population genetic similarities and differences among the five cattle breeds. The Hanwoo population was clearly separated from the other four cattle populations, which suggests that the Hanwoo breed might have a unique set of SNPs compared to the other cattle breeds, and that such unique SNPs can explain the phenotypic differences of Hanwoo such as meat quality. The population structure of the five cattle populations was similar to that in a previously reported phylogenomics study on cattle breeds (Decker et al., 2009). We discovered several candidate regions covering highly differentiated SNPs among the five cattle populations. From the GO enrichment analysis for the genes in these regions, metal ion binding, protein localization, mitochondrial inner membrane, and filamin binding functions were identified as enriched biological functions. Among them, the metal ion binding function is closely related to skeletal muscle, which is responsible for meat quality (Jeremiah et al., 2003).
We also performed network-based evolutionary analyses using the evoSNPI pipeline and found changes in protein-protein interactions related to Hanwoo-specific genes compared with other species (cattle, pig, horse, human, and mouse). There were Hanwoo-specific genes with nonsynonymous SNPs not present in the other four cattle breeds or the other four species. Most of them are associated with the meat trait of Hanwoo. Rewired protein-protein interaction analysis among different species also identified Hanwoo breed-specific protein-protein interactions present only in the Hanwoo breed network, such as interactions between EGR3 and FGF2, between ANAS and ALB, between CTSC and SLC46A2, between MYOD1 and IL2, between ALB and CXXC1, and between CREBBP and PBX2. In addition, the interaction between ALB and MTR was present in both the Hanwoo and pig networks, but the interaction in the Hanwoo network was stronger. Among them, three proteins, MTR, FGF2, and EGR3, are related to metabolism-related functions, which are known as a critical factor in making marbled meat in cattle (Lim et al., 2015). Therefore, this analysis confirmed that there were Hanwoo-specific protein interactions that might have contributed to its unique meat quality. This analysis enables the investigation of additional genes (and proteins) interacting with the original breed-specific genes (and proteins) discovered using only direct genetic differences, and the identification of systems-level features and their evolutionary changes relevant to phenotypic differences.
Table 1. Statistics of SNPs identified from Hanwoo, Jersey, Simmental, Angus, and Holstein cattle breeds.
Cattle breed | No. of SNPs | Found in dbSNP (a) | Ti/Tv ratio (b) |
---|---|---|---|
Hanwoo | 16,361,482 | 14,551,596 (88.94%) | 2.29 (0.00001) |
Jersey | 7,313,386 | 7,283,202 (99.59%) | 2.25 (0.00001) |
Simmental | 8,180,573 | 8,159,778 (99.75%) | 2.24 (0.00018) |
Angus | 7,085,527 | 7,064,818 (99.71%) | 2.22 (0.00000) |
Holstein | 8,125,851 | 8,097,083 (99.65%) | 2.24 (0.00013) |
(a) The number of SNPs found in the dbSNP database (version 146). Fractions are in parentheses.
(b) The ratio of the number of transitions to the number of transversions. Standard deviations are in parentheses.