Mol. Cells 2016; 39(2): 141-148
Published online January 7, 2016
https://doi.org/10.14348/molcells.2016.2264
© The Korean Society for Molecular and Cellular Biology
*Correspondence: sykwon@kribb.re.kr
Oriental melon (C
Keywords: genetic linkage map, Korean melon, simple sequence repeat, single-nucleotide polymorphism, transcriptome analysis
Melon (
Melon is a diploid species, with a basic number of chromosomes (x = 12 [2x = 2n = 24]) and an estimated genome size of 450 Mb, similar to that of rice (419 Mb). The melon genome is being sequenced as part of the Spanish Genome Initiative (MELONOMICS). Moreover, BAC libraries, high-resolution genetic maps, oligo-based microarrays, and a large number of transcriptome sequences (RNA-Seq and expressed sequence tags [ESTs]) for melon are also available as genetic and genomic tools.
Oriental melon (
Due to the high-throughput capacity of next-generation sequencing (NGS) technology, which was developed in the last decade, transcriptome analysis has become widely used for genome-scale studies. Transcriptome analysis can be used to profile gene expression and identify novel transcripts, splicing isoforms, and sequence variations, including single-nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). In the present work, we generated a total of 67,440,566,178 bases of raw sequence (67.4 Gb) from the female and male flowers, leaves, roots, and fruit of two oriental melon varieties (KM and NW). A total of 64,998 transcripts from KM and 100,234 from NW specimens were
To characterize the oriental melon transcriptome and increase the sequence coverage of
Sequence data with a quality score above 20 (Q ≥ 20) were extracted using SolexaQA (Cox et al., 2010; Kim et al., 2014). Sequence reads from different tissue samples were
The quality-checked reads from each tissue were merged and used for transcript assembly. The assembled transcripts were validated by direct comparison with gene sequences in the SEEDERS plant annotation database using BLASTX (e-value ≤ 1e-05) (Altschul et al., 1990). Protein sequences with the highest similarity were retrieved for further analysis. Short reads of KM transcripts were mapped to the MELONOMICS melon genome (
For gene ontology (GO) term analysis, the assembled loci were annotated to the GO database (downloaded from
Illumina sequencing was used to generate mRNA libraries for the various oriental melon tissues examined. Reads for each sequence tag were mapped to the assembled loci using Bowtie (mismatch ≤ 2 bp); the number of clean mapped reads for each locus was determined, and the data were normalized using the DESeq library in R. Only transcripts with a tag count ≥ 50 were retained for further analysis. Genes differentially expressed between samples were identified based on the fold-change in expression, with the results analyzed by
To identify SNPs, a quality check of the KM and NW raw reads was performed using the SolexaQA package. The raw reads were aligned against melon mRNA sequences using TopHat, with modified default parameters (mismatches [-N] = 1; maximum insertion length = 1; minimum intron length [-i] = 50; maximum intron length [-I] = 14,018; mate inner distance [-r] = 350; segment mismatches = 1; maximum segment intron = 100), and the results were saved as a BAM file for further analysis using SAMTools (Kim et al., 2014; Li et al., 2009).
Using the varFilter command in SAMTools, SNPs were called only for variable positions with a minimum mapping quality (-Q) of 30. The minimum and maximum read depths were set at 3 and 1,000, respectively. Significant SNP sites among the sequences of transcripts from KM, NW, and melon were identified using a Perl script developed in-house (Supplementary Fig. S2).
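The thresholds above can be illustrated with a minimal Python sketch. It does not reproduce the SAMTools varFilter implementation or the in-house Perl script; the record layout (dictionaries with `mapq` and `depth` keys) and the function names are hypothetical and serve only to show how the three cut-offs combine.

```python
# Thresholds taken from the text; everything else is an illustrative assumption.
MIN_MAPQ = 30     # minimum mapping quality
MIN_DEPTH = 3     # minimum read depth
MAX_DEPTH = 1000  # maximum read depth


def passes_filters(site):
    """Return True if a candidate variant site meets all three thresholds."""
    return site["mapq"] >= MIN_MAPQ and MIN_DEPTH <= site["depth"] <= MAX_DEPTH


def filter_snps(sites):
    """Keep only the candidate sites that pass the quality and depth filters."""
    return [site for site in sites if passes_filters(site)]


# Hypothetical candidate sites (not data from the study).
candidates = [
    {"pos": 101, "mapq": 45, "depth": 12},    # kept
    {"pos": 202, "mapq": 20, "depth": 50},    # dropped: mapping quality too low
    {"pos": 303, "mapq": 60, "depth": 2},     # dropped: depth too low
    {"pos": 404, "mapq": 60, "depth": 5000},  # dropped: depth too high
]
kept = filter_snps(candidates)
```

With these hypothetical records, only the site at position 101 is retained.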
To identify SSRs, assembled transcripts of NW specimens were formatted according to the SSR Locator’s protocol (da Maia et al., 2008). Perfect SSRs (designated ‘P-type’ SSRs) forming dimer to hexamer motifs with more than five repeat units and located more than 100 bp from other SSRs were selected. Imperfect SSRs (designated ‘I-type’ SSRs) were selected by allowing for 5 bp of erroneous sequence. Previously reported criteria (Garg et al., 2011; Kong et al., 2007) were used to select SSRs. Primers were designed using Primer3 in the SSR Locator. Using the designed primer sets, virtual PCR was performed with SSRs from NW specimens according to the SSR Locator’s protocol. Transcripts containing one P-type SSR were selected and used for the development of KM, NW, and melon markers. The following selection criteria were used for the primers: (i) the expected amplicon size should be the same as that of the virtual PCR; and (ii) the primer sets for the different SSRs should not overlap. The selected primer sets were used for virtual PCR analysis of the KM and melon sequence data to distinguish NW-specific marker candidates (Supplementary Fig. S3).
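As a rough illustration of how perfect SSRs with di- to hexanucleotide motifs can be located, the following Python sketch uses regular expressions. It is not the SSR Locator algorithm: the function name and the default `min_repeats=5` (chosen because Table 4 starts at five repeat units) are assumptions, and the 100 bp spacing rule and imperfect (I-type) SSRs are not handled.

```python
import re


def find_perfect_ssrs(seq, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect SSRs with 2-6 bp motifs.

    Illustrative only: motifs that are themselves repeats of a shorter unit
    (e.g. 'AA' or 'GAGA') are skipped so each SSR is reported once.
    """
    seq = seq.upper()
    hits = []
    for k in range(2, 7):
        # One motif of length k followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            redundant = any(
                motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
            )
            if not redundant:
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence: a (GA)6 repeat and a (GAA)5 repeat.
example = "TTT" + "GA" * 6 + "CCC" + "GAA" * 5 + "T"
ssrs = find_perfect_ssrs(example)
```

For the hypothetical sequence above, the sketch reports a GA repeat of six units starting at index 3 and a GAA repeat of five units starting at index 18.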
An oriental melon genetic map was constructed using MAPMAKER 3.0/EXP (Lander et al., 1987) with 234 dCAPS (Neff et al., 1998) and 25 SSR markers. An F2 population derived from a cross between NW and KM was used for mapping. Recombination fractions were converted to map distances in centimorgans (cM) using the Kosambi mapping function (Kosambi, 1943).
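For reference, the Kosambi mapping function converts a recombination fraction r (0 ≤ r < 0.5) into a map distance d = 25 ln[(1 + 2r) / (1 − 2r)] cM. A minimal Python sketch of this conversion follows; the function name is illustrative, and the calculation is independent of how MAPMAKER applies it internally.

```python
import math


def kosambi_cm(r):
    """Convert a recombination fraction to a Kosambi map distance in cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# A recombination fraction of 0.10 corresponds to roughly 10.1 cM.
distance = kosambi_cm(0.10)
```

For small r the distance is close to 100r cM, and it grows without bound as r approaches 0.5, which reflects the allowance for crossover interference in the Kosambi model.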
To perform transcriptome analysis, RNA-Seq data were generated from five different tissues (female and male flowers, fruit, leaves, and roots) of two oriental melon cultivars (KM and NW). In total, 30.5 Gb (251,752,490 raw reads) and 36.8 Gb (287,233,170 raw reads) of KM and NW sequence data, respectively, were generated using an Illumina HiSeq 2000 (Supplementary Table S1). The quality of the sequence data (Q ≥ 20) was assessed using SolexaQA, and the reads were trimmed and sorted by length using the DynamicTrim and LengthSort programs, respectively.
Transcripts of each oriental melon cultivar were assembled using Velvet (v1.2.07) and Oases (v0.2.08) (
Putative functions of the assembled transcripts were annotated using BLASTP (e-value ≤ 1e-06) with the SEEDERS non-redundant protein database. Of the 64,998 KM transcripts, 36,871 were assigned to 21,363 reference proteins, and 64,149 of 100,234 NW transcripts were assigned to 21,914 reference proteins (Table 2). To classify the functions of the assembled oriental melon loci, GO term analysis was performed using TAIR identification information. A total of 42,386 KM and 42,743 NW transcripts were assigned to 23 functional categories: 13 ‘biological process’ categories, 7 ‘cellular component’ categories, and 3 ‘molecular function’ categories (Fig. 2). For both the KM and NW transcripts, ‘cellular process’, ‘cell and cell part’, and ‘catalytic activity’ were the most common terms in the ‘biological process,’ ‘cellular component,’ and ‘molecular function’ categories, respectively. These GO term data will be used for further studies of the characteristics of oriental melon by functional profiling, prediction of gene function, and functional categorization of genes (Rhee et al., 2008).
Constitutive promoters, such as those for ubiquitin and 35S, are used in plant genetic engineering to express genes of interest in a wide range of species (Brisson, 1984; Cornejo et al., 1993). However, overexpression using constitutive promoters may lead to undesirable pleiotropic effects in transgenic plants (Hsieh et al., 2002; Kasuga et al., 1999). The use of tissue-specific promoters with particular developmental expression patterns has been suggested as a strategy to avoid such undesirable pleiotropic effects (Kasuga et al., 2004). Therefore, the development of tissue-specific promoters capable of driving transgene expression is an important area of research in plant genetic engineering (Potenza, 2004).
For unbiased detection of tissue-specific expressed transcripts, statistical analysis of a large number of raw reads was performed. Fisher’s exact test was used to compare the proportion of given transcripts among all transcripts in the different tissues. Non-normalized oriental melon cDNA libraries were prepared from the female and male flowers, fruit, leaves, and roots of each of the oriental melon cultivars. The sequence tags of short reads were mapped to each transcript using Bowtie, and the number of mapped reads for each transcript was then determined. The mapped read counts were normalized using the DESeq library in R. Statistical analyses identified 1,169 and 2,504 tissue-specific transcripts in the KM and NW cultivars, respectively. For the KM cultivar, 144, 107, 231, 256, and 431 transcripts were specific to the female flower, male flower, fruit, leaves, and roots, respectively. In the case of the NW cultivar, 75, 1,121, 368, 150, and 790 transcripts were specific to the female flower, male flower, fruit, leaves, and roots, respectively (Supplementary Tables S2 and S5). The functions of the KM and NW transcripts were predicted by identifying orthologues using melon mRNA sequences. In total, 16,873 of 27,427 melon mRNAs were identified as orthologues of 48,144 KM transcripts and 66,216 NW transcripts. Among these transcripts, 914 were KM-specific, identified based on 746 melon mRNAs, whereas 1,070 NW-specific transcripts were identified based on 573 melon mRNAs (Supplementary Table S3). These tissue-specific candidates will be validated using RT-PCR, and the promoter regions will be investigated for cloning tissue-specific promoters. Furthermore, functional studies of tissue-specific genes will provide additional insights into plant development.
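To make the statistical comparison concrete, the following Python sketch computes a two-sided Fisher's exact test for a single transcript from a 2 × 2 table of read counts. The table layout (tissue of interest versus all other tissues; reads mapped to the transcript versus all other mapped reads), the function name, and the example counts are assumptions for illustration, as the exact contingency table used in the study is not specified here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts: reads for one transcript in root vs. the other tissues.
#                 transcript reads   other reads
# root                  120             9,880
# other tissues          15            39,985
p_value = fisher_exact_two_sided(120, 9880, 15, 39985)
```

A very small p-value for such a table would indicate that the transcript is strongly over-represented in the tissue of interest. In practice, a correction for testing many transcripts would also be needed.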
Molecular markers are important resources for constructing high-density genetic maps such as those used in crop breeding and for the identification of traits of interest. Since NGS technology was developed, many plant genomes have been sequenced, including that of melon (Garcia-Mas et al., 2012). In addition, a large amount of sequence data for melon has been accumulated over the past several years (Gonzalez-Ibeas et al., 2007; 2010; Portnoy et al., 2011; Rodriguez-Moreno et al., 2011). SSRs and SNPs are increasingly used in the construction of melon genetic maps (Blanca et al., 2011; 2012; Diaz et al., 2011; Kong et al., 2011). SNPs and SSRs were identified among the KM, NW, and melon transcripts using the assembled transcripts and melon mRNA sequences. A total of 7,871 SNPs covering 2,156 loci and 3,110 transcripts were identified between the KM and NW cultivars (KM/NW), and 4,752 SNPs were identified in exon regions. Between the KM and melon sequences (KM/melon), 3,730 SNPs were identified covering 1,063 loci and 1,547 transcripts, and 2,297 SNPs were identified in exons (Table 3; Supplementary Table S4).
The distribution of synonymous and non-synonymous SNPs among the 12 melon chromosomes was also investigated. In both the KM/NW and KM/melon comparisons, chromosome 2 carried significantly more synonymous and non-synonymous SNPs than the other chromosomes (Fig. 3). The frequency of SNPs between KM and NW was expected to be low, as the sequences were derived from two near-isogenic lines. However, more SNPs were found between KM and NW than between KM and melon. The melon sample used for genome sequencing was a doubled-haploid line derived from a cross between PI 161375 (Songwhan Charmi, SC) and ‘Piel de sapo’ (PS) (Oliver et al., 2001). The NW line was bred from a cross between an EunCheon-type commercial F1 variety and a Chinese landrace melon, and EunCheon itself is derived from a cross between the Japanese landrace Charmi and a small melon. Thus, half of the melon reference genome originates from Songwhan Charmi (PI 161375), whereas the NW genome contains sequence from several melon landraces. This difference in ancestry likely explains why genome variation between KM and NW was higher than that between KM and the melon reference, and consequently why fewer SNPs were identified in KM/melon than in KM/NW. Data regarding the synonymous and non-synonymous SNPs of NW and melon compared with KM are provided in Supplementary Tables S6 and S7. The GO terms for the KM and NW transcripts or KM and melon mRNA sequences with SNPs were sorted, and the rate of synonymous and non-synonymous SNPs was determined (Supplementary Table S8).
dCAPS primers were designed based on 277 SNPs of KM and NW transcripts and used to screen polymorphic markers. dCAPS primer sets for 245 of the 277 SNPs were moderately amplified in both oriental melon cultivars. The amplified products were digested using restriction enzymes specific to sites in the primer sequences. A total of 234 PCR products exhibited polymorphism between the KM and NW cultivars, and 16 SNP markers are shown in Supplementary Fig. S4.
The KM and NW transcript datasets and the melon mRNA sequences were screened for the presence of di-, tri-, tetra-, penta-, and hexanucleotide SSR motifs. Motifs ranging from dimers to hexamers with more than five repeat units were selected (Garg et al., 2011; Kong et al., 2007). A total of 10,709 SSRs were identified among 9,938 KM transcripts (Supplementary Table S9). The major types of SSRs identified were dinucleotides (6,429), followed by trinucleotides (3,736), tetranucleotides (318), pentanucleotides (127), and hexanucleotides (99) (Table 4). The most frequent SSR motif was GA/TC (1,776), followed by AG/CT (1,574), AT/AT (1,171), TA/TA (1,148), and GAA/TTC (778) (Supplementary Table S10). In NW, 15,662 SSRs were identified among 14,436 transcripts (Supplementary Table S9). The major SSR motifs were dinucleotides (8,708) and trinucleotides (6,219), followed by tetranucleotides (412), hexanucleotides (166), and pentanucleotides (157) (Table 4). The GA/TC motif exhibited the highest frequency (2,731), followed by AG/CT (2,401), GAA/TTC (1,399), and AT/AT (1,310) (Supplementary Table S10).
For NW, primers were designed for SSR marker candidates from the transcripts. Transcripts containing one SSR marker candidate were selected for screening polymorphisms among KM, NW, and melon using virtual PCR (Supplementary Table S11). The presence and size of amplicons resulting from the virtual PCR analyses were compared between NW and KM or NW and melon (Supplementary Table S12) for use in further marker development. Of 8,052 SSR marker candidates, 64 were selected for PCR analysis to determine the presence of polymorphisms between KM and NW. A total of 25 SSR markers exhibited a polymorphic pattern between the two cultivars; the results for 16 markers are shown in Supplementary Fig. S5.
The polymorphisms of the 64 SSR markers and 277 dCAPS markers designed from SNP markers were analyzed in the parent lines, and 25 SSR and 234 SNP markers exhibited polymorphisms. All of these polymorphic markers were screened in 94 F2 population plants and exhibited co-dominant Mendelian segregation. The genotypes of these 259 markers were used to construct an oriental melon genetic linkage map consisting of 12 linkage groups covering 926 cM, with an average map distance between markers of 3.7 cM. A total of 248 markers were assigned to the 12 linkage groups, and 11 markers were unlinked (Fig. 4). The chromosome numbers for all of the linkage groups were determined based on alignment of our DNA marker sequences with the melon genome sequence data (Garcia-Mas et al., 2012); details regarding the markers are provided in Supplementary Table S13.
The oriental melon genetic linkage map constructed based on the 25 SSR and 234 dCAPS SNP markers spanned 926 cM. Although not all of the markers were distributed evenly among the 12 linkage groups, analysis using the χ² goodness-of-fit test showed no significant distortions from the expected Mendelian ratio for any of the markers. The largest linkage group was on Ch6, spanning 114 cM, whereas the smallest was on Ch11, spanning 28.2 cM. Because no markers common to other previously published melon linkage maps were used in our study, it was difficult to estimate the consensus between the genome structures. A previously reported integrated melon genetic map was constructed using 1,592 markers from 8 independent mapping experiments and spanned 1,150 cM across the 12 linkage groups (Diaz et al., 2011). In this integrated map, the genetic length of the linkage groups ranged from 73 to 119 cM. The significantly shorter genetic length on Ch11 and lower resolution of linkage groups (8 gaps of more than 20 cM) in our map compared with the previously reported integrated map are most likely due to our development of marker types only from transcriptome data or to the use of an insufficient number of markers. It will therefore be necessary to use more markers derived from the genome sequence database and consensus markers positioned based on several mapping experiments to construct a linkage map in which the markers are evenly distributed and cover all of the genome.
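The goodness-of-fit test mentioned above can be sketched in Python for a single co-dominant marker, assuming the standard 1:2:1 genotype ratio expected in an F2 population. The genotype counts below are hypothetical, and the function name is illustrative; the critical value 5.991 is the χ² threshold for 2 degrees of freedom at α = 0.05.

```python
CRITICAL_CHI2_DF2_ALPHA05 = 5.991  # chi-square critical value, df = 2, alpha = 0.05


def chi_square_f2(n_aa, n_ab, n_bb):
    """Chi-square statistic for a co-dominant marker against a 1:2:1 F2 ratio."""
    n = n_aa + n_ab + n_bb
    observed = (n_aa, n_ab, n_bb)
    expected = (n / 4, n / 2, n / 4)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_aa, n_ab, n_bb):
    """True if segregation deviates significantly from 1:2:1 at alpha = 0.05."""
    return chi_square_f2(n_aa, n_ab, n_bb) > CRITICAL_CHI2_DF2_ALPHA05


# Hypothetical genotype counts for one marker scored in 94 F2 plants.
stat = chi_square_f2(24, 47, 23)
distorted = is_distorted(24, 47, 23)
```

With these hypothetical counts the statistic is about 0.02, far below the critical value, so the marker would be treated as segregating in the expected Mendelian ratio.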
Table 1. Metrics of oriental melon

Metric | KM | NW |
---|---|---|
Number of assembled transcripts ( | 64,998 | 100,234 |
Minimum length (bp) | 200 | 117 |
Maximum length (bp) | 13,444 | 11,659 |
Mean length (bp) | 706 | 739 |
N50 | 939 | 1,138 |
Number of assembled loci | 49,409 | 51,557 |
Table 2. Functional annotation statistics of assembled oriental melon transcripts

Cultivar | Number of total transcripts | Number of annotated transcripts (e-value ≤ 1e-05) | Number of unigenes |
---|---|---|---|
KM | 64,998 | 36,871 | 21,363 |
NW | 100,234 | 64,149 | 21,914 |
Table 3. Number of SNPs among KM, NW and Melon

Metric | KM/NW | KM/Melon |
---|---|---|
Number of SNPs detected | 7,871 | 3,730 |
Number of SNPs at CDS | 4,752 | 2,297 |
Number of Loci with SNPs | 2,156 | 1,063 |
Number of Transcripts with SNPs | 3,110 | 1,547 |
Table 4. Types of SSRs according to motif length in KM and NW transcripts

Number of repeat units | KM Di- | KM Tri- | KM Tetra- | KM Penta- | KM Hexa- | NW Di- | NW Tri- | NW Tetra- | NW Penta- | NW Hexa- |
---|---|---|---|---|---|---|---|---|---|---|
5 | 2,807 | 1,505 | 180 | 83 | 65 | 3,917 | 2,533 | 208 | 96 | 108 |
6 | 1,137 | 806 | 77 | 30 | 21 | 1,529 | 1,300 | 119 | 40 | 40 |
7 | 686 | 485 | 27 | 8 | 10 | 874 | 794 | 33 | 16 | 13 |
8 | 515 | 298 | 15 | 4 | 2 | 654 | 454 | 30 | 2 | 2 |
9 | 392 | 197 | 10 | - | 1 | 491 | 358 | 11 | - | 3 |
10 | 248 | 124 | 3 | - | - | 342 | 202 | 1 | 3 | - |
11 | 152 | 96 | 1 | 1 | - | 221 | 169 | 5 | - | - |
12 | 125 | 59 | 2 | 1 | - | 178 | 116 | 4 | - | - |
13 | 92 | 44 | 1 | - | - | 103 | 63 | - | - | - |
14 | 62 | 29 | 2 | - | - | 78 | 52 | 1 | - | - |
15 | 45 | 17 | - | - | - | 83 | 35 | - | - | - |
16 | 37 | 17 | - | - | - | 40 | 33 | - | - | - |
17 | 35 | 16 | - | - | - | 52 | 33 | - | - | - |
18 | 31 | 25 | - | - | - | 44 | 54 | - | - | - |
19 | 11 | 18 | - | - | - | 25 | 23 | - | - | - |
20 | 15 | - | - | - | - | 11 | - | - | - | - |
21 | 14 | - | - | - | - | 13 | - | - | - | - |
22 | 8 | - | - | - | - | 18 | - | - | - | - |
23 | 2 | - | - | - | - | 11 | - | - | - | - |
24 | 6 | - | - | - | - | 4 | - | - | - | - |
25 | 3 | - | - | - | - | 7 | - | - | - | - |
26 | 2 | - | - | - | - | 4 | - | - | - | - |
27 | - | - | - | - | - | 2 | - | - | - | - |
28 | 2 | - | - | - | - | 5 | - | - | - | - |
29 | 2 | - | - | - | - | 2 | - | - | - | - |
30 | - | - | - | - | - | - | - | - | - | - |
Total | 6,429 | 3,736 | 318 | 127 | 99 | 8,708 | 6,219 | 412 | 157 | 166 |
Published online February 29, 2016 https://doi.org/10.14348/molcells.2016.2264
Hyun A Kim1,5, Ah-Young Shin1,5, Min-Seon Lee2, Hee-Jeong Lee2, Heung-Ryul Lee2, Jongmoon Ahn2, Seokhyeon Nahm2, Sung-Hwan Jo3, Jeong Mee Park1,4, and Suk-Yoon Kwon1,4,*
1Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea, 2Nongwoo Bio Co., Ltd., Yeoju 469-885, Korea, 3SEEDERS, Daeduk Industry Academic Cooperation Building, Daejeon 34016, Korea, 4Biosystems and Bioengineering Program, University of Science and Technology, Daejeon 305-350, Korea, 5These authors contributed equally to this work.
Correspondence to:*Correspondence: sykwon@kribb.re.kr
Oriental melon (C
Keywords: genetic linkage map, Korean melon, simple sequence repeat, single-nucleotide polymorphism, transcriptome analysis
Melon (
Melon is a diploid species, with a basic number of chromosomes (x = 12 [2x = 2n = 24]) and an estimated genome size of 450 Mb, similar to that of rice (419 Mb). The melon genome is being sequenced as part of the Spanish Genome Initiative (MELOGENOMICS). Moreover, BAC libraries, high-resolution genetic maps, oligo-based microarrays, and a large number of transcriptome sequences (RNA-Seq and expressed sequence tag(EST)) for melon are also available as genetic and genomic tools.
Oriental melon (
Due to the high-throughput capacity of next-generation sequencing (NGS) technology, which was developed in the last decade, transcriptome analysis has become widely used for genome-scale studies. Transcriptome analysis can be used to profile gene expression and identify novel transcripts, splicing isoforms, and sequence variations, including single-nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). In the present work, we generated a total of 67,440,566,178 raw sequence reads (67.4 Gb) from the female and male flowers, leaves, roots, and fruit of two oriental melon varieties (KM and NW). A total of 64,998 transcripts from KM and 100,234 from NW specimens were
To characterize the oriental melon transcriptome and increase the sequence coverage of
Sequence data with a quality score above 20 (Q ≥ 20) were extracted using SolexaQA (Cox et al., 2010; Kim et al., 2014). Sequence reads from different tissue samples were
The quality-checked reads from each tissue were merged and used for transcript assembly. The assembled transcripts were validated by direct comparison with gene sequences in the SEEDERS plant annotation database using BLASTX (evalue ≤ 1e?05) (Altschul et al., 1990). Protein sequences with the highest similarity were retrieved for further analysis. Short reads of KM transcripts were mapped to the MELONOMICS melon genome (
For gene ontology (GO) term analysis, the assembled loci were annotated to the GO database (downloaded from
Illumina sequencing was used to generate mRNA libraries for the various oriental melon tissues examined. Reads for each sequence tag were mapped to the assembled loci using Bowtie (mismatch ≤ 2 bp); the number of clean mapped reads for each locus was determined, and the data were normalized using the DESeq library in R. Only transcripts with a tag count ≥ 50 were retained for further analysis. Genes differentially expressed between samples were identified based on the fold-change in expression, with the results analyzed by
To identify SNPs, a quality check of the KM and NW raw reads was performed using the Solexa QA package. The raw reads were aligned against melon mRNA sequences using TopHat, with modified default parameters (mismatches [?N] = 1; maximum insertion length = 1; minimum intron length [?i] = 50; maximum intron length [?I] = 14,018; mate inner distance [?r] = 350; segment mismatches = 1; maximum segment intron = 100), and the results were saved as a BAM file for further analysis using SAMTools (Kim et al., 2014; Li et al., 2009).
Using the varFilter command in SAMTools, SNPs were called only for variable positions with a minimum mapping quality (?Q) of 30. The minimum and maximum read depths were set at 3 and 1,000, respectively. Significant SNP sites among the sequences of transcripts from KM, NW, and melon were identified using a Perl script developed in-house (Supplementary Fig. S2).
To identify SSRs, assembled transcripts of NW specimens were formatted according to the SSR Locator’s protocol (da Maia et al., 2008). Perfect SSRs (designated ‘P-type’ SSRs) forming dimer to hexamer motifs with more than five repeat units and located more than 100 bp from other SSRs were selected. Imperfect SSRs (designated ‘I-type’ SSRs) were selected by allowing for 5 bp of erroneous sequence. Previously reported criteria (Garg et al., 2011; Kong et al., 2007) were used to select SSRs. Primers were designed using Primer 3 in the SSR Locator. Using the designed primer sets, virtual PCR was performed with SSRs from NW specimens according to the SSR Locator’s protocol. Transcripts containing one P-type SSR were selected and used for the development of KM, NW, and melon markers. The following selection criteria were used for the primers: (i) the expected amplicon size should be the same as that of the virtual PCR; and (ii) the primer sets for the different SSRs should not overlap. The selected primer sets were used for virtual PCR analysis of the KM and melon sequence data to distinguish NW-specific marker candidates (Supplementary Fig. S3).
An oriental melon genetic map was constructed using MAPMAKER 3.0/EXP (Lander et al., 1987) with 234 dCAPS (Neff et al., 1998) and 25 SSR markers. F2 population NW and KM specimens were used for mapping. Recombination fractions were converted to map distances in centimorgans (cM) using the Kosambi mapping function (Kosambi, 1943).
To perform transcriptome analysis, RNA-Seq data were generated from five different tissues (female and male flowers, fruit, leaves, and roots) of two oriental melon cultivars (KM and NW). In total, 30.5 Gb (251,752,490 raw reads) and 36.8 Gb (287,233,170 raw reads) of KM and NW sequence data, respectively, were generated using an Illumina HighSeq 2000 (Supplementary Table S1). The quality of the sequence data (Q ≥ 20) was assessed using SolexaQA, and the reads were trimmed and sorted by length using the DynamicTrim and LengthSort programs, respectively.
Transcripts of each oriental melon cultivar were assembled using Velvet (v1.2.07) and Oases (v0.2.08) (
Putative functions of the assembled transcripts were annotated using BLASTP (e-value ≤ 1e?06) with the SEEDERS non-redundant protein database. Of the 64,998 KM transcripts, 36,871 were assigned to 21,363 reference proteins, and 64,149 of 100,234 NW transcripts were assigned to 21,914 reference proteins (Table 2). To classify the functions of the assembled oriental melon loci, GO term analysis was performed using TAIR identification information. A total of 42,386 KM and 42,743 NW transcripts were assigned to 23 functional categories: 13 ‘biological process’ categories, 7 ‘cellular component” categories, and 3 ‘molecular function’ categories (Fig. 2). For both the KM and NW transcripts, ‘cellular process’, ‘cell and cell part’, and ‘catalytic activity’ were the most common terms in the ‘biological process,’ ‘cellular component,’ and ‘molecular function’ categories, respectively. These GO term data will be used for further studies of the characteristics of oriental melon by functional profiling, prediction of gene function, and functional categorization of genes (Rhee et al., 2008).
Constitutive promoters, such as those for ubiquitin and 35S, are used in plant genetic engineering to express genes of interest in a wide range of species (Brisson, 1984; Cornejo et al., 1993). However, overexpression using constitutive promoters may lead to undesirable pleiotropic effects in transgenic plants (Hsieh et al., 2002; Kasuga et al., 1999). The use of tissue-specific promoters with particular developmental expression patterns has been suggested as a strategy to avoid such undesirable pleiotropic effects (Kasuga et al., 2004). Therefore, the development of tissue-specific promoters capable of driving transgene expression is an important area of research in plant genetic engineering (Potenza, 2004).
For unbiased detection of tissue-specific expressed transcripts, statistical analysis of a large number of raw reads was performed. Fisher’s exact test was used to compare the proportion of given transcripts among all transcripts in the different tissues. Non-normalized oriental melon cDNA libraries were prepared from the female and male flowers, fruit, leaves, and roots of each of the oriental melon cultivars. The sequence tags of short reads were mapped to each transcript using Bowtie, and the number of mapped reads for each transcript was then determined. The mapped reads were normalized using the DESeq library in R script. Statistical analyses identified 1,169 and 2,504 tissue-specific transcripts in the KM and NW cultivars, respectively. For the KM cultivar, 144, 107, 231, 256, and 431 transcripts were specific to the female flower, male flower, fruit, leaves, and roots, respectively. In the case of the NW cultivar, 75, 1,121, 368, 150, and 790 transcripts were specific to the female flower, male flower, fruit, leaves, and roots, respectively (Supplementary Tables S2 and S5). The functions of the KM and NW transcripts were predicted by identifying orthologues using melon mRNA sequences. In total, 16,873 of 27,427 melon mRNAs were identified as orthologues of 48,144 KM transcripts and 66,216 NW transcripts. Among these transcripts, 914 were KM-specific, identified based on 746 melon mRNAs, whereas 1,070 NW-specific transcripts were identified based on 573 melon mRNAs (Supplementary Table S3). These tissue-specific candidates will be validated using RT-PCR, and the promoter regions will be investigated for cloning tissue-specific promoters. Furthermore, functional studies of tissue-specific genes will provide additional insights into plant development.
Molecular markers are important resources for constructing high-density genetic maps such as those used in crop breeding and for the identification of traits of interest. Since NGS technology was developed, many plant genomes have been sequenced, including that of melon (Garcia-Mas et al., 2012). In addition, a large amount of sequence data for melon has been accumulated over the past several years (Gonzalez-Ibeas et al., 2007; 2010; Portnoy et al., 2011; Rodriguez-Moreno et al., 2011). SSRs and SNPs are increasingly used in the construction of melon genetic maps (Blanca et al., 2011; 2012; Diaz et al., 2011; Kong et al., 2011). SNPs and SSRs were identified among the KM, NW, and melon transcripts using the assembled transcripts and melon mRNA sequences. A total of 7,871 SNPs covering 2,156 loci and 3,110 transcripts were identified between the KM and NW cultivars (KM/NW), and 4,752 SNPs were identified in exon regions. Between the KM and melon sequences (KM/melon), 3,730 SNPs were identified covering 1,063 loci and 1,547 transcripts, and 2,297 SNPs were identified in exons (Table 3; Supplementary Table S4).
The distribution of synonymous and non-synonymous SNPs among the 12 melon chromosomes was also investigated. The number of synonymous and non-synonymous SNPs in chromosome 2 was significantly larger in KM/NW and KM/melon (Fig. 3). The frequency of SNP occurrence between KM and NW was expected to be low, as the sequences were derived from two near-isogenic lines. However, the number of SNPs between KM/NW was larger than between KM/melon. The melon samples used for genome sequencing were double-haploid line, derived from the cross between PI 161375 (Song-whan Charmi) (SC) and the ‘Piel de sapo’ (PS) (Oliver et al., 2001). The NW line was bred from the cross between EunCheon type commercial Fl variety and Chinese landrace melon. Furthermore, EunCheon is derived from the cross between Japanese landrace Charmi and small melon. Half of genome sequences in melon reference were Songwhan Charmi (PI 161375) while NW genome sequence were consisted with lots of melon landrace genome. Thus genome variations of KM and NW were higher than those of KM and melon reference. Consequently, fewer SNPs were identified in KM/melon compared with KM/NW. Data regarding the synonymous and non-synonymous SNPs of NW and melon compared with KM are provided in Supplementary Tables S6 and S7. The GO terms for the KM and NW transcripts or KM and melon mRNA sequences with SNPs were sorted, and the rate of synonymous and non-synonymous SNPs was determined (Supplementary Table S8).
dCAPS primers were designed based on 277 SNPs of KM and NW transcripts and used to screen polymorphic markers. dCAPS primer sets for 245 of the 277 SNPs were moderately amplified in both oriental melon cultivars. The amplified products were digested using restriction enzymes specific to sites in the primer sequences. A total of 234 PCR products exhibited polymorphism between the KM and NW cultivars, and 16 SNP markers are shown in Supplementary Fig. S4.
A general screening of unigenes in the KM and NW transcript datasets and the melon mRNA sequences was performed for the presence of di-, tri-, tetra-, penta-, and hexa-SSR motifs. Motifs ranging from dimers to hexamers with at least five repeat units were selected (Garg et al., 2011; Kong et al., 2007). A total of 10,709 SSRs were identified among 9,938 KM transcripts (Supplementary Table S9). The major types of SSRs identified were dinucleotides (6,429), followed by trinucleotides (3,736), tetranucleotides (318), pentanucleotides (127), and hexanucleotides (99) (Table 4). The most frequent SSR motif was GA/TC (1,776), followed by AG/CT (1,574), AT/AT (1,171), TA/TA (1,148), and GAA/TTC (778) (Supplementary Table S10). In NW, 15,662 SSRs were identified among 14,436 transcripts (Supplementary Table S9). The major SSR types were dinucleotides (8,708) and trinucleotides (6,219), followed by tetranucleotides (412), hexanucleotides (166), and pentanucleotides (157) (Table 4). The GA/TC motif exhibited the highest frequency (2,731), followed by AG/CT (2,401), GAA/TTC (1,399), and AT/AT (1,310) (Supplementary Table S10).
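The SSR screening criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it detects perfect repeats with a regular expression, uses a minimum of five repeat units for every motif length (consistent with Table 4), and is not the software used in this study. The names `find_ssrs` and `count_by_class` are hypothetical.

```python
import re
from collections import Counter

# Motif lengths screened: dimers to hexamers.
MOTIF_CLASSES = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def find_ssrs(seq, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in sorted(MOTIF_CLASSES):
        # A k-mer followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (homopolymer) or "GAGA" (already counted as "GA").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def count_by_class(hits):
    """Tally SSRs by motif length class (di, tri, ...)."""
    return Counter(MOTIF_CLASSES[len(motif)] for _, motif, _ in hits)


if __name__ == "__main__":
    toy = "TTT" + "GA" * 6 + "CCGT" + "GAA" * 5 + "ACGT" + "AT" * 4
    hits = find_ssrs(toy)
    print(hits)
    print(count_by_class(hits))
```

In the toy sequence, the four-unit AT run is ignored because it falls below the five-repeat threshold.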
For NW, primers were designed for SSR marker candidates from the transcripts. Transcripts containing one SSR marker candidate were selected for screening polymorphisms among KM, NW, and melon using virtual PCR (Supplementary Table S11). The presence and size of amplicons resulting from the virtual PCR analyses were compared between NW and KM or NW and melon (Supplementary Table S12) for use in further marker development. Of 8,052 SSR marker candidates, 64 were selected for PCR analysis to determine the presence of polymorphisms between KM and NW. A total of 25 SSR markers exhibited a polymorphic pattern between the two cultivars; the results for 16 markers are shown in Supplementary Fig. S5.
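The idea behind the virtual PCR comparison can be sketched as follows. This is a simplified illustration, not the tool used in this study: it assumes exact primer matches on the forward strand of each transcript, and the function names `revcomp` and `virtual_pcr` as well as the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def virtual_pcr(template, fwd, rev):
    """Return the predicted amplicon length, or None if a primer does not bind.

    Assumes exact matches: `fwd` is searched as written, and the reverse
    primer is searched as its reverse complement downstream of `fwd`.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    left, right = "ATGCCGTA", "TTGACCGG"          # toy flanking sequences
    km = "GG" + left + "GA" * 6 + right + "CC"    # toy allele, 6 repeats
    nw = "GG" + left + "GA" * 9 + right + "CC"    # toy allele, 9 repeats
    rev_primer = revcomp(right)
    sizes = virtual_pcr(km, left, rev_primer), virtual_pcr(nw, left, rev_primer)
    print(sizes, "polymorphic" if sizes[0] != sizes[1] else "monomorphic")
```

A candidate is of interest for marker development when both amplicons are present and differ in size.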
The polymorphisms of the 64 SSR markers and 277 dCAPS markers designed from SNP markers were analyzed in the parent lines, and 25 SSR and 234 SNP markers exhibited polymorphisms. All of these polymorphic markers were screened in 94 F2 population plants and exhibited co-dominant type Mendelian segregation. The genotypes of these 259 markers were used to construct an oriental melon genetic linkage map consisting of 12 linkage groups covering 926 cM, with an average map distance between markers of 3.7 cM. A total of 248 markers were assigned to the 12 linkage groups, and 11 markers were unlinked (Fig. 4). The chromosome numbers for all of the linkage groups were determined based on alignment of our DNA marker sequences with the melon genome sequence data (Garcia-Mas et al., 2012). Details regarding the markers are provided in Supplementary Table S13.
The oriental melon genetic linkage map constructed based on the 25 SSR and 234 dCAPS SNP markers spanned 926 cM. Although not all of the markers were distributed evenly among the 12 linkage groups, analysis using the χ² goodness-of-fit test showed no significant distortions from the expected Mendelian ratio for any of the markers. The largest linkage group was on Ch6, spanning 114 cM, whereas the smallest was on Ch11, spanning 28.2 cM. Because no markers common to other previously published melon linkage maps were used in our study, it was difficult to estimate the consensus between the genome structures. A previously reported integrated melon genetic map was constructed using 1,592 markers from 8 independent mapping experiments and spanned 1,150 cM across the 12 linkage groups (Diaz et al., 2011). In this integrated map, the genetic length of the linkage groups ranged from 73 to 119 cM. The significantly shorter genetic length on Ch11 and the lower resolution of the linkage groups (8 gaps of more than 20 cM) in our map compared with the previously reported integrated map are most likely due to our development of marker types only from transcriptome data or to the use of an insufficient number of markers. It will therefore be necessary to use more markers derived from the genome sequence database and consensus markers positioned based on several mapping experiments to construct a linkage map in which the markers are evenly distributed and cover all of the genome.
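The goodness-of-fit test for segregation distortion can be illustrated with a short Python sketch. It assumes a co-dominant marker in an F2 population with an expected 1:2:1 genotype ratio (2 degrees of freedom) and a 0.05 significance threshold; the genotype counts below are invented for illustration and are not data from this study, and the function name `chi_square_f2` is hypothetical.

```python
import math


def chi_square_f2(n_a, n_h, n_b):
    """Chi-square goodness-of-fit test against a 1:2:1 F2 ratio.

    n_a, n_h, n_b: counts of parent-A homozygotes, heterozygotes and
    parent-B homozygotes. Returns (statistic, p_value).
    """
    total = n_a + n_h + n_b
    observed = (n_a, n_h, n_b)
    expected = (total / 4, total / 2, total / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts for two markers scored in 94 F2 plants.
    for counts in [(25, 45, 24), (40, 40, 14)]:
        stat, p = chi_square_f2(*counts)
        verdict = "distorted" if p < 0.05 else "fits 1:2:1"
        print(counts, round(stat, 3), round(p, 4), verdict)
```

A marker whose p-value falls below the chosen threshold would be flagged as showing segregation distortion.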
Table 1. Metrics of oriental melon
Metric | KM | NW |
---|---|---|
Number of assembled transcripts ( | 64,998 | 100,234 |
Minimum length (bp) | 200 | 117 |
Maximum length (bp) | 13,444 | 11,659 |
Mean length (bp) | 706 | 739 |
N50 | 939 | 1,138 |
Number of assembled loci | 49,409 | 51,557 |
Table 2. Functional annotation statistics of assembled oriental melon transcripts.
Cultivar | Number of total transcripts | Number of annotated transcripts (e-value ≤ 1e-05) | Number of unigenes |
---|---|---|---|
KM | 64,998 | 36,871 | 21,363 |
NW | 100,234 | 64,149 | 21,914 |
Table 3. Number of SNPs among KM, NW and Melon.
Category | KM/NW | KM/Melon |
---|---|---|
Number of SNPs detected | 7,871 | 3,730 |
Number of SNPs at CDS | 4,752 | 2,297 |
Number of Loci with SNPs | 2,156 | 1,063 |
Number of Transcripts with SNPs | 3,110 | 1,547 |
Table 4. Types of SSRs according to motif length in KM and NW transcripts.
Number of repeat units | KM Di- | KM Tri- | KM Tetra- | KM Penta- | KM Hexa- | NW Di- | NW Tri- | NW Tetra- | NW Penta- | NW Hexa- |
---|---|---|---|---|---|---|---|---|---|---|
5 | 2,807 | 1,505 | 180 | 83 | 65 | 3,917 | 2,533 | 208 | 96 | 108 |
6 | 1,137 | 806 | 77 | 30 | 21 | 1,529 | 1,300 | 119 | 40 | 40 |
7 | 686 | 485 | 27 | 8 | 10 | 874 | 794 | 33 | 16 | 13 |
8 | 515 | 298 | 15 | 4 | 2 | 654 | 454 | 30 | 2 | 2 |
9 | 392 | 197 | 10 | - | 1 | 491 | 358 | 11 | - | 3 |
10 | 248 | 124 | 3 | - | - | 342 | 202 | 1 | 3 | - |
11 | 152 | 96 | 1 | 1 | - | 221 | 169 | 5 | - | - |
12 | 125 | 59 | 2 | 1 | - | 178 | 116 | 4 | - | - |
13 | 92 | 44 | 1 | - | - | 103 | 63 | - | - | - |
14 | 62 | 29 | 2 | - | - | 78 | 52 | 1 | - | - |
15 | 45 | 17 | - | - | - | 83 | 35 | - | - | - |
16 | 37 | 17 | - | - | - | 40 | 33 | - | - | - |
17 | 35 | 16 | - | - | - | 52 | 33 | - | - | - |
18 | 31 | 25 | - | - | - | 44 | 54 | - | - | - |
19 | 11 | 18 | - | - | - | 25 | 23 | - | - | - |
20 | 15 | - | - | - | - | 11 | - | - | - | - |
21 | 14 | - | - | - | - | 13 | - | - | - | - |
22 | 8 | - | - | - | - | 18 | - | - | - | - |
23 | 2 | - | - | - | - | 11 | - | - | - | - |
24 | 6 | - | - | - | - | 4 | - | - | - | - |
25 | 3 | - | - | - | - | 7 | - | - | - | - |
26 | 2 | - | - | - | - | 4 | - | - | - | - |
27 | - | - | - | - | - | 2 | - | - | - | - |
28 | 2 | - | - | - | - | 5 | - | - | - | - |
29 | 2 | - | - | - | - | 2 | - | - | - | - |
30 | - | - | - | - | - | - | - | - | - | - |
Total | 6,429 | 3,736 | 318 | 127 | 99 | 8,708 | 6,219 | 412 | 157 | 166 |