Cathepsin F, which is encoded by
Alternative splicing (AS) is an important and ubiquitous molecular mechanism that increases the diversity and complexity of eukaryotic genomes by generating different isoforms from a single gene without significantly increasing genome size (Kim et al., 2014; 2016). High-throughput sequencing data have revealed that over 90% of human genes undergo AS, and that this process is more frequent in higher than in lower eukaryotes (Ast, 2004; Pan et al., 2008). The four major types of AS events are exon creation or loss (skipping), alternative 5′ splice sites, alternative 3′ splice sites, and intron retention (Ast, 2004; Park et al., 2015a). In the human genome, 40% of new exons are generated by AS, and most of these are cassette exons (inclusion or skipping of a single exon) (Zhang and Chasin, 2006). In addition, over 90% of the primate-specific cassette exons (recently generated exons) overlap with transposable elements (TEs) and 62% overlap with
Cathepsin F, a protein that is encoded by the Cathepsin F gene (
Animal preparation and study design followed the Guidelines of the Institutional Animal Care and Use Committee (KRIBB-AEC-16067) of the Korea Research Institute of Bioscience and Biotechnology (KRIBB). Rhesus and crab-eating monkeys were provided by the National Primate Research Center of the Republic of Korea or imported from China under a Convention on International Trade in Endangered Species of Wild Fauna and Flora permit.
Total RNA samples extracted from
Using a standard protocol, genomic DNA from heparinized blood samples was extracted from the following species: (1) HU: human (
Complementary DNA was generated using the GoScript Reverse Transcription (RT) System (Promega). Following the manufacturer’s instructions, 500 ng total RNA, 1 μl oligo(dT)15 primer, 1 μl random primer, 4 μl GoScript 5× reaction buffer, 2 μl MgCl2, 1 μl nucleotide mix, 0.5 μl recombinant RNasin® ribonuclease inhibitor (Promega), 1 μl GoScript reverse transcriptase, and nuclease-free water (up to 20 μl) were added to a microcentrifuge tube, thoroughly mixed, and incubated for 1 h at 42°C. The expression levels of
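The reaction volumes listed for the RT step can be checked with a short calculation. The sketch below is illustrative only: it takes the component volumes from the protocol above and computes how much RNA stock and nuclease-free water are needed to reach 20 μl. The function name `rt_reaction_volumes` and the RNA stock concentration are hypothetical, since the stock concentrations are not reported here.

```python
# Fixed component volumes (in microliters) as listed in the RT protocol.
FIXED_COMPONENTS_UL = {
    "oligo(dT)15 primer": 1.0,
    "random primer": 1.0,
    "GoScript 5x reaction buffer": 4.0,
    "MgCl2": 2.0,
    "nucleotide mix": 1.0,
    "RNasin ribonuclease inhibitor": 0.5,
    "GoScript reverse transcriptase": 1.0,
}
TOTAL_VOLUME_UL = 20.0   # final reaction volume stated in the protocol
RNA_INPUT_NG = 500.0     # total RNA input stated in the protocol


def rt_reaction_volumes(rna_conc_ng_per_ul):
    """Return (rna_ul, water_ul) for one 20 ul reaction.

    rna_conc_ng_per_ul is the RNA stock concentration, an assumed input
    that is not given in the text.
    """
    if rna_conc_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    rna_ul = RNA_INPUT_NG / rna_conc_ng_per_ul
    fixed_ul = sum(FIXED_COMPONENTS_UL.values())  # 10.5 ul in total
    water_ul = TOTAL_VOLUME_UL - fixed_ul - rna_ul
    if water_ul < 0:
        raise ValueError("RNA stock is too dilute to fit 500 ng into 20 ul")
    return rna_ul, water_ul
```

For a hypothetical stock at 250 ng/μl, `rt_reaction_volumes(250.0)` gives 2 μl of RNA and 7.5 μl of water. The fixed reagents occupy 10.5 μl, so any stock below roughly 53 ng/μl cannot deliver 500 ng within the 20 μl reaction.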
Genomic DNA from the primate species listed above was amplified using primer pairs designed from sequences highly conserved between human and non-human primates (
PCR products were separated on a 1.5% agarose gel, purified with the Gel SV Extraction kit (GeneAll), and cloned into the pGEM-T Easy vector (Promega). The cloned DNA was isolated using the Plasmid DNA Mini-prep kit (GeneAll). Primate DNA samples and alternative transcripts were sequenced by Macrogen, Inc.
To estimate the integration time of
In a previous study based on large-scale transcriptome sequencing and genetic analyses of 16 tissues from male and female crab-eating monkeys, we identified a specific AS event (relative to the human genome) corresponding to the integration of the
Our comparative structure analysis indicated that the integration of the
The NETWORK analysis revealed that the integration time of
To validate the
Approximately 45% of the human genome consists of TEs, most of which (about 90%) are retroelements such as human endogenous retroviruses (about 8%), long interspersed elements (about 20%), and SINEs (about 13%) (Bannert and Kurth, 2004; Schmitz and Brosius, 2011).
In our previous transcriptome study, we identified
Reverse transcription-PCR results revealed six transcript variants of the
To investigate the expression levels of the original and the six variant transcripts of the
The human CTSF propeptide consists of a signal peptide, a cystatin-like domain, an I29 inhibitor domain, and the mature form of cathepsin F (Jeric et al., 2013). Previous studies have revealed cysteine-cathepsin-related activation of programmed cell death (apoptosis) (Guicciardi et al., 2004; Repnik et al., 2012), but the physiological functions of CTSF have not been thoroughly investigated. Analysis of the translated sequences of the six transcript variants in the present study revealed that the V1 and V6 transcripts encode 464 amino acids, whereas the V2–V5 transcripts encode 500 amino acids. The C-terminal regions of these transcripts also differed from those of the human (484 amino acids), rhesus monkey (490 amino acids), and crab-eating monkey (490 amino acids) reference genes (Fig. 6). These different C-terminal sequences were derived from several AS events, including
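To illustrate how an encoded protein length follows from a transcript's coding sequence, the sketch below translates a coding sequence with the standard genetic code and stops at the first stop codon. It is a minimal example under stated assumptions and not the analysis pipeline used in this study: the function name `translate` is hypothetical, the input is assumed to start at the start codon and to contain only unambiguous bases, and the sequences in the usage note are toy sequences, not CTSF transcripts.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over
# T, C, A, G at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to, but excluding, the first stop codon.

    Assumes the sequence starts in frame; trailing bases that do not form a
    complete codon are ignored. Ambiguous bases such as N raise KeyError.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA input
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For the toy sequence `ATGGCCTAA`, `translate` returns `MA`, so `len(translate(...))` is 2. Applied to the coding region of each variant, the same length count would show how an AS event that shifts the stop codon changes the number of encoded amino acids.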
Non-human primates are the most valuable animal model species for biomedical research in microbiology, vaccine development, biochemistry, and neuroscience (Park et al., 2015a; Rhesus Macaque Genome et al., 2007), as they share more biological and behavioral similarities with humans, and are genetically closer to them, than other animal models such as rodents, rabbits, and dogs (Carlsson et al., 2004; Huh et al., 2012; Park et al., 2015b). Rhesus and crab-eating monkeys are the most widely and frequently used non-human primate species (Huh et al., 2012; Rhesus Macaque Genome et al., 2007). Previous studies demonstrated that missense mutations in the