Mol. Cells 2021; 44(9): 658-669
Published online September 27, 2021
https://doi.org/10.14348/molcells.2021.0173
© The Korean Society for Molecular and Cellular Biology
Correspondence to : schoi@kangwon.ac.kr (SSC); lkkim@yuhs.ac (LKK)
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
Enhancers have been conventionally perceived as cis-acting elements that provide binding sites for trans-acting factors. However, recent studies have shown that enhancers are transcribed and that these transcripts, called enhancer RNAs (eRNAs), have a regulatory function. Here, we identified putative eRNAs by profiling and determining the overlap between noncoding RNA expression loci and eRNA-associated histone marks such as H3K27ac and H3K4me1 in hepatocellular carcinoma (HCC) cell lines. Of the 132 HCC-derived noncoding RNAs, 74 overlapped with the eRNA loci defined by the FANTOM consortium, and 65 were located in the proximal regions of genes differentially expressed between normal and tumor tissues in TCGA dataset. Interestingly, knockdown of two selected putative eRNAs, THUMPD3-AS1 and LINC01572, led to downregulation of their target mRNAs and to a reduction in the proliferation and migration of HCC cells. Additionally, the expression of these two noncoding RNAs and target mRNAs was elevated in tumor samples in the TCGA dataset, and high expression was associated with poor survival of patients. Collectively, our study suggests that noncoding RNAs such as THUMPD3-AS1 and LINC01572 (i.e., putative eRNAs) can promote the transcription of genes involved in cell proliferation and differentiation and that the dysregulation of these noncoding RNAs can cause cancers such as HCC.
Keywords enhancer RNA, hepatocellular carcinoma, long noncoding RNA, RNA polymerase II, transcribed enhancer, transcriptional regulation
Tens of thousands of eRNAs have been further systematically identified and characterized under the basic definition of ncRNAs transcribed from enhancer regions marked by two histone modification signals, H3K4me1 and H3K27ac, in various human tissues and cell types through the Functional Annotation of the Mammalian Genome (FANTOM) consortium (https://fantom.gsc.riken.jp/) (Andersson et al., 2014; Creyghton et al., 2010). FANTOM has thus far compiled more than 65,000 enhancers that have been identified to produce eRNAs (Andersson et al., 2014). It is becoming clear that active enhancers in the genome are associated with eRNA production, i.e., transcription of eRNAs could be a hallmark for the active state of enhancers (Andersson et al., 2014; De Santa et al., 2010; Hah et al., 2013; Kim et al., 2010; Lam et al., 2013; Li et al., 2013; Melgar et al., 2011; Wu et al., 2014).
Although few findings contradicting the functional importance of eRNAs as
Recently, the functional importance of eRNAs has been studied in the context of the regulation of inflammatory responses. For instance, the production of eRNAs in the enhancer region of
To date, no eRNAs involved in hepatocellular carcinoma (HCC) have been reported. In the present work, we attempted to identify putative eRNAs that are commonly expressed in different human HCC cell lines by integrating two types of chromatin signals, H3K4me1 and H3K27ac, and information about ncRNAs derived from RNA-seq data. Subsequently, we identified putative target mRNAs of each ncRNA identified as a putative eRNA by investigating correlations with the expression levels of the mRNAs. Furthermore, we attempted to prove the function by conducting knockdown (KD) experiments for two selected putative eRNAs:
Chromatin immunoprecipitation sequencing (ChIP-seq) and RNA sequencing (RNA-seq) data were obtained from seven hepatocellular cell lines: Huh-7, Huh7.5, FT3-7, PLC-PRF-5, SNU182, SNU387, and SNU449 (Alexander et al., 1976; Ku and Park, 2005; Yi, 2010). Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) (for Huh-7, Huh7.5, and FT3-7 cells; Gibco, USA) or Roswell Park Memorial Institute-1640 medium (for PLC-PRF-5, SNU182, SNU387, and SNU449 cells; HyClone, USA) containing 10% fetal bovine serum (FBS; Gibco) and 1% antibiotic-antimycotic (Gibco). All of these cell lines were tested for mycoplasma contamination in the Core facility of National Cancer Center, Korea, and verified to be negative.
Total RNA was isolated using Hybrid-R (GeneAll, Korea) for mRNA sequencing, library preparation was performed with DNase I-treated total RNA using a TruSeq kit (Illumina, USA), and paired-end sequencing was performed using the HiSeq platform (Illumina). The total RNA-seq and ChIP-seq reads for each cell line are summarized in Supplementary Tables S1 and S2, respectively. The peak calling results of the two active histone marks, H3K4me1 and H3K27ac, positioned in the genomic regions where the two putative eRNAs located are listed in Supplementary Table S3.
The ChIP-seq and RNA-seq data set analyzed in this study have been deposited in the Gene Expression Omnibus with the project number PRJNA737566 (Reviewer access: https://dataview.ncbi.nlm.nih.gov/object/PRJNA737566?reviewer=r8n75e8rhnls10h1m1nt440f07). ChIP-seq data visualized in UCSC Browser (http://genome.ucsc.edu/s/yeeun/eRNA_study_ChIP%2Dseq).
RNA-seq data for five different cancer types were downloaded from TCGA (https://portal.gdc.cancer.gov/), including liver hepatocellular carcinoma (LIHC), breast cancer (BRCA), colorectal adenocarcinoma (COAD), stomach adenocarcinoma (STAD), and lung adenocarcinoma (LUAD) data. RNA-seq data from 40 LIHC, 102 BRCA, 39 COAD, 16 STAD, and 52 LUAD matched normal-tumor samples of the same patient were used for our analysis.
After the quality of raw sequencing reads was checked using ‘FastQC’ (v0.11.8) (Andrews, 2010) for the RNA-seq data generated from the seven hepatocellular cell lines, the contaminating adaptor and unpaired reads were excluded using ‘Trimmomatic’ (v0.38; parameters: PE LEADING, 3; TRAILING, 3; SLIDINGWINDOW, 4:15; MINLEN, 36) (Bolger et al., 2014). The cleaned reads were then aligned to the reference genome (GRCh38/hg38; http://asia.ensembl.org/) using ‘Tophat’ (v2.1.1; parameters: --library-type fr-firststrand) (Trapnell et al., 2009), and the fragments per kilobase of transcript per million mapped reads (FPKM) values were obtained by running ‘Cuffquant’ (v2.2.1; parameters: --library-type fr-firststrand --multi-read-correct --frag-bias-correct) (Trapnell et al., 2010). After normalization of the 7 cell line library sizes using ‘Cuffnorm’ (v2.2.1; parameters: --library-type fr-firststrand --library-norm-method classic-fpkm) (Trapnell et al., 2010), ncRNAs with FPKM ≥ 1 were mapped to the genomic regions where the active enhancer marks H3K27ac and H3K4me1 were located.
The ChIP-seq data generated from the seven hepatocellular cell lines for H3K27ac and H3K4me1 were mapped to the reference genomic regions as follows. The quality check and trimming procedures for raw ChIP-seq reads were the same as those used for the RNA-seq data described above. The cleaned ChIP-seq data were then aligned to the reference genome (GRCh38/hg38) using ‘Bowtie’ (v1.3.0; parameters: -S --fr) (Langmead et al., 2009). Subsequently, the reads mapped to black-list regions from the ENCODE portal (identifier: ENCFF356LFX; https://www.encodeproject.org/) (Sloan et al., 2016) were filtered out using bedtools (2.27.0; parameters: intersect v) (Quinlan and Hall, 2010) and nonuniquely aligned reads and potential PCR duplicates were excluded using ‘MarkDuplicates’ of ‘Picard’ (v2.25.0; REMOVE_DUPLICATES = true, ASSUME_SORTED = true, VALIDATION_STRINGENCY = LENIENT) (Picard Toolkit, 2019), and peak calling and peak annotation were performed using ‘MACS2’ (v2.2.4; parameters: callpeak -f BAMPE -p 0.0001 [--narrow | --broad] -g hs --SPMR [--call-summits | --no-model] -B --keep-dup 1 --mfold 10 100) (Zhang et al., 2008) and ‘annotatePeaks.pl’ of ‘HOMER’ (v4.11.1; parameter: hg38) (Heinz et al., 2010), respectively.
The human HCC cell line FT3-7 was cultured in DMEM (HyClone) with 10% FBS (HyClone) and penicillin-streptomycin (Gibco) and maintained at 37°C in an incubator containing 5% CO2. To knock down two selected eRNAs,
Cell proliferation was assessed by cell counting kit-8 (CCK-8; Dojindo, Japan) assays following the manufacturer’s protocol. In brief, cells were seeded in 96-well plates, and 10 µl of CCK-8 reagent was added at the indicated time points (0, 24, 48, and 72 h). Cell viability was measured with a microplate reader (Molecular Devices, USA) at 450 nm.
The cells were plated in 96-well plates (3595; Corning, USA). The proliferation of cells was monitored for 4 days by an IncuCyte Live-Cell analysis system (Sartorius, Germany).
A total of 2,000 cells were plated in 6-well plates and maintained for 2 weeks. To visualize and count colonies grown for 2 weeks, cells were fixed with 4% paraformaldehyde and stained with 1% crystal violet.
To evaluate the invasion or migration of cancer cells, transwell plates (8 µm pore size, 3422; Corning) were coated with Matrigel (354234; BD, USA) or gelatin (G1393; Sigma, USA). Cells (3 × 105/ml cells in 100 µl for invasion assays and 1.5 × 105/ml cells in 100 µl for migration assays) in serum-free medium were added to the upper chamber, and 600 µl of 10% FBS medium was added to the lower chamber. After 24-48 h, the invaded and migrated cells were fixed with 4% paraformaldehyde and stained with 1% crystal violet. Six random fields were photographed at ×200 magnification with a microscope (Carl Zeiss, Germany).
Triplicate RNA-seq data were generated from each of the two KD experiments, i.e., the
Kaplan–Meier survival plots and hazard ratios (HRs) for the two selected ncRNAs,
All statistical tests and plots were conducted using the R (4.0.3) and Python (3.6.8) languages. Kendall’s correlation analysis and linear regression analysis were performed with the ‘scipy.stats.stats’ and ‘sklearn.linear_model’ packages, respectively, of Python (3.6.8). Gene ontology (GO) analysis was performed using the DAVID tool (https://david.ncifcrf.gov/home.jsp) (Jiao et al., 2012). Other batch works were performed with home-built Python scripts.
Active enhancers with two histone modification signals, H3K4me1 (i.e., an enhancer chromatin mark) and H3K27ac (i.e., an active chromatin mark), are known to produce eRNAs (Creyghton et al., 2010; Heintzman et al., 2007; 2009). Figure 1 shows our workflow used for identifying and detecting eRNAs: we used two types of ChIP-seq signals marking active enhancers and total RNA-seq data. Our rationale for identifying eRNAs was to select only ncRNAs transcribed from the genomic regions that are supposed to be active enhancer regions.
Briefly, only those ncRNAs transcribed from genomic positions at which the two types of ChIP-seq signals were commonly located were selected after excluding other RNAs transcribed from the genomic regions mapped to transcription start sites (TSSs), transcription termination sites (TTSs), and exons of protein-coding genes (Fig. 1A). A list of the ncRNAs that were considered to be transcribed from active enhancer regions was generated for each of seven different HCC cell lines: FT3-7, Huh-7, Huh7.5, PLC-PRF-5, SNU182, SNU387, and SNU449. Subsequently, those ncRNAs commonly identified from four different cell lines were defined as putative eRNAs with high confidence (Fig. 1A).
We decided to overlap the lists of eRNAs from four cell lines, Huh7.5, SNU182, FT3-7, and PLC-PRF-5, rather than from all seven cell lines because principal component analysis (PCA) showed that the gene expression profiles of three cell lines, SNU387, SNU449, and Huh-7, were substantially different from those of the remaining four cell lines (Supplementary Fig. S1). As expected, the list of overlapping eRNAs from all seven cell lines that included the three cell lines with very different gene expression profiles had a substantially reduced number of confident eRNAs such that further exploration would have been difficult. The numbers of ChIP-seq signals, transcribed ncRNAs, and ncRNAs detected in the active enhancer regions are summarized in Supplementary Table S4.
Consequently, a total of 132 ncRNAs that were commonly detected in the four cell lines were defined to be putative eRNAs (Supplementary Table S5), i.e., ncRNAs transcribed from active enhancer regions. Interestingly, all 132 ncRNAs were lncRNAs by definition with lengths > 200 nt. Small ncRNAs, such as miRNAs, piRNAs, and siRNAs, were also present in the individual cell lines we used to produce the ChIP-seq and RNA-seq data, but all disappeared when the ncRNAs from the four different cell types overlapped. Subsequently, by referencing the annotation information in GTF files (GRCh38/hg38), the lncRNAs were subcategorized into antisense ncRNAs, long intergenic noncoding RNA (lincRNAs), pseudogenes, sense overlapping ncRNAs and sense intronic ncRNAs, of which antisense ncRNAs and lincRNAs were the most frequent types (Fig. 1B). No chromosomal bias seemed to be present in the ncRNAs identified as putative eRNAs (Supplementary Table S5). Approximately 56% (74/132) of the genomic positions that were found to produce ncRNAs identified as putative eRNAs also overlapped with the ‘transcribed eRNAs’ identified with FANTOM, producing a significant enrichment of FANTOM eRNAs in the list of eRNAs detected in the present work (chi-squared test,
Next, to determine whether the expression of 132 ncRNAs identified as putative eRNAs is altered during tumor formation, we investigated the expression profiles of these ncRNAs in TCGA LIHC data. Approximately 49.2% (65/132) of the 132 ncRNAs were found to be significantly differently expressed (i.e., DEGs) between normal and tumor tissues in the TCGA-LIHC dataset (Fig. 2A). The volcano plot in Fig. 2B shows that the magnitude of the changes in the expression levels of these ncRNA genes was relatively weak but still significant. Notably, the number of upregulated ncRNAs was greater than that of downregulated ncRNAs (49 upregulated and 16 downregulated). The ChIP-seq peak locations and shapes for two putative eRNAs,
We next tried to identify putative eRNA-target mRNA pairs by correlation analysis using information on the expression levels of these genes from the seven different HCC cell lines. The rationale for this approach was that the expression levels of an eRNA should be correlated (either positively or negatively, although a positive correlation seems to be more likely assuming that enhancers act as
As a result, we found a total of putative 44,650 eRNA-target mRNA pairs (highlighted in dark blue in Fig. 3A) using weak thresholds (
Subsequently, we investigated whether these correlated genes were located in the same topology associating domains (TADs) where each of the ncRNAs identified as putative eRNAs was positioned. Surprisingly, there were few genes in the same TADs as
We next investigated which genes had their expression altered by knocking down these two ncRNAs under the assumption that the expression of target genes should be altered by KD of their upstream regulator eRNAs. For this purpose, ASOs were employed to KD expression in the FT3-7 HCC cell line. One ASO was designed to target exon 3 of
DEGs before and after KD were then identified using the threshold
We next analyzed which genes in the lists from the
Seven genes,
Similarly, five genes,
Furthermore, the expression levels of some of these genes were found to be associated with the overall survival of LIHC patients (Figs. 5E and 5F): upregulation of the genes conferred an approximately two times more likely chance of poor survivals (odds ratio: approximately 1.6-2.0) (Fig. 5G).
As mentioned in the Introduction, several eRNAs are known to have oncogenic potential. Consistently, we showed (Fig. 5) that the two putative eRNAs,
Subsequently, we examined whether these putative eRNAs could affect the invasion and migration of HCC cells using transwell assays. Invaded and migrated cells were detected after allowing the cells to migrate for 24-48 h. As shown in Figs. 6F and 6G, the invasion and migration abilities of
In the present work, a total of 132 putative eRNAs were identified by intersecting the genomic positions identified from three types of information: (i) enhancer chromatin marks (H3K4me1), (ii) active chromatin marks (H3K27ac), and (iii) transcribed ncRNAs. Subsequently, we identified putative target mRNAs for each of the 132 ncRNAs and tested the hypothesis that the ncRNAs identified as putative eRNAs could regulate the expression of downstream target genes by knocking down two selected eRNAs,
The fact that enhancer regions in the genome are transcribed by RNA pol II to produce ncRNAs has been confirmed by numerous studies (Andersson et al., 2014; Arner et al., 2015; De Santa et al., 2010; Kim et al., 2010; Melgar et al., 2011) since the first reports in 2010 (Kim et al., 2010), which have integrated gene expression information with data regarding active chromatin and open chromatin marks such as H3K27ac, DNase hypersensitivity sites, and enhancer chromatin marks such as H3K4me1 and RNA pol II marks. Here, we integrated two active enhancer chromatin marks, H3K4me1 and H3K27ac, along with transcribed ncRNA data. Only those ncRNAs transcribed from supposedly active enhancer regions commonly detected in four different HCC cell lines were defined to be putative eRNAs with high confidence. Putative target mRNAs of each ncRNA defined to be an eRNA were identified by selecting eRNA-mRNA pairs with significantly high correlation in seven HCC cell lines (Supplementary Tables S6 and S7) because we assumed that such a correlation indicates eRNA regulation of the target mRNA. Several other studies have already successfully estimated the relationship between eRNAs and their target mRNAs (Carullo et al., 2020; Chen et al., 2018; Hsieh et al., 2014; Mousavi et al., 2013; Schaukowitch et al., 2014; Zhang et al., 2019).
We also investigated how often the putative target mRNAs defined by the correlation analysis were located within the TADs of
Interestingly, the thirteen genes located within the
While extensive transcription in enhancer genomic regions may no longer be a novel aspect of genomic metabolism, the mechanisms of action by which eRNAs are involved remain to be further studied. As seen in Fig. 1B, we showed that all the putative eRNAs identified in the present work were lncRNAs, and the action mechanism of eRNAs can be inferred based on the known mechanisms of action of lncRNAs. As summarized by Ali and Grote (2020), lncRNAs can regulate gene expression not only through transcriptional products but also through other mechanisms of action. Namely, lncRNAs are known (i) to function as
Taken together, our results confirm that eRNAs are widely transcribed in active enhancer regions, consistent with many other previous studies. Our results also suggest that the conclusion that eRNAs are
This research was supported by National Research Foundation of Korea (NRF) (NRF-2019R1A5A6099645, NRF-2017M3A9G7073033) and by the Ministry of Education, Science, and Technology (MEST) (2019R1A2C 1002350).
L.K.K., T.K., and S.S.C. conceived and designed the experiments. Y.E.L., J.L., Y.S.L., J.J.J., H.W., and H.I.C. performed experiments and data analyses. Y.S.L., Y.G.C., and T.K.K. contributed reagents and materials. T.K., L.K.K., T.K.K., and S.S.C. wrote the paper. All authors read and approved the manuscript.
The authors have no potential conflicts of interest to disclose.
Mol. Cells 2021; 44(9): 658-669
Published online September 30, 2021 https://doi.org/10.14348/molcells.2021.0173
Copyright © The Korean Society for Molecular and Cellular Biology.
Ye-Eun Lee1,8 , Jiyeon Lee2,8
, Yong Sun Lee3
, Jiyoung Joan Jang3
, Hyeonju Woo4
, Hae In Choi5
, Young Gyu Chai6
, Tae-Kyung Kim7
, TaeSoo Kim4
, Lark Kyun Kim2,*
, and Sun Shim Choi1,*
1Division of Biomedical Convergence, College of Biomedical Science, Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon 24341, Korea, 2Severance Biomedical Science Institute, Graduate School of Medical Science, Brain Korea 21 Project, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06230, Korea, 3Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang 10408, Korea, 4Department of Life Science and the Research Center for Cellular Homeostasis, Ewha Womans University, Seoul 03760, Korea, 5Department of Bionanotechnology, Hanyang University, Seoul 04673, Korea, 6Department of Molecular and Life Science, Hanyang University, Ansan 15588, Korea, 7Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, Korea, 8These authors contributed equally to this work.
Correspondence to:schoi@kangwon.ac.kr (SSC); lkkim@yuhs.ac (LKK)
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
Enhancers have been conventionally perceived as cis-acting elements that provide binding sites for trans-acting factors. However, recent studies have shown that enhancers are transcribed and that these transcripts, called enhancer RNAs (eRNAs), have a regulatory function. Here, we identified putative eRNAs by profiling and determining the overlap between noncoding RNA expression loci and eRNA-associated histone marks such as H3K27ac and H3K4me1 in hepatocellular carcinoma (HCC) cell lines. Of the 132 HCC-derived noncoding RNAs, 74 overlapped with the eRNA loci defined by the FANTOM consortium, and 65 were located in the proximal regions of genes differentially expressed between normal and tumor tissues in TCGA dataset. Interestingly, knockdown of two selected putative eRNAs, THUMPD3-AS1 and LINC01572, led to downregulation of their target mRNAs and to a reduction in the proliferation and migration of HCC cells. Additionally, the expression of these two noncoding RNAs and target mRNAs was elevated in tumor samples in the TCGA dataset, and high expression was associated with poor survival of patients. Collectively, our study suggests that noncoding RNAs such as THUMPD3-AS1 and LINC01572 (i.e., putative eRNAs) can promote the transcription of genes involved in cell proliferation and differentiation and that the dysregulation of these noncoding RNAs can cause cancers such as HCC.
Keywords: enhancer RNA, hepatocellular carcinoma, long noncoding RNA, RNA polymerase II, transcribed enhancer, transcriptional regulation
Tens of thousands of eRNAs have been further systematically identified and characterized under the basic definition of ncRNAs transcribed from enhancer regions marked by two histone modification signals, H3K4me1 and H3K27ac, in various human tissues and cell types through the Functional Annotation of the Mammalian Genome (FANTOM) consortium (https://fantom.gsc.riken.jp/) (Andersson et al., 2014; Creyghton et al., 2010). FANTOM has thus far compiled more than 65,000 enhancers that have been identified to produce eRNAs (Andersson et al., 2014). It is becoming clear that active enhancers in the genome are associated with eRNA production, i.e., transcription of eRNAs could be a hallmark for the active state of enhancers (Andersson et al., 2014; De Santa et al., 2010; Hah et al., 2013; Kim et al., 2010; Lam et al., 2013; Li et al., 2013; Melgar et al., 2011; Wu et al., 2014).
Although few findings contradicting the functional importance of eRNAs as
Recently, the functional importance of eRNAs has been studied in the context of the regulation of inflammatory responses. For instance, the production of eRNAs in the enhancer region of
To date, no eRNAs involved in hepatocellular carcinoma (HCC) have been reported. In the present work, we attempted to identify putative eRNAs that are commonly expressed in different human HCC cell lines by integrating two types of chromatin signals, H3K4me1 and H3K27ac, and information about ncRNAs derived from RNA-seq data. Subsequently, we identified putative target mRNAs of each ncRNA identified as a putative eRNA by investigating correlations with the expression levels of the mRNAs. Furthermore, we attempted to prove the function by conducting knockdown (KD) experiments for two selected putative eRNAs:
Chromatin immunoprecipitation sequencing (ChIP-seq) and RNA sequencing (RNA-seq) data were obtained from seven hepatocellular cell lines: Huh-7, Huh7.5, FT3-7, PLC-PRF-5, SNU182, SNU387, and SNU449 (Alexander et al., 1976; Ku and Park, 2005; Yi, 2010). Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) (for Huh-7, Huh7.5, and FT3-7 cells; Gibco, USA) or Roswell Park Memorial Institute-1640 medium (for PLC-PRF-5, SNU182, SNU387, and SNU449 cells; HyClone, USA) containing 10% fetal bovine serum (FBS; Gibco) and 1% antibiotic-antimycotic (Gibco). All of these cell lines were tested for mycoplasma contamination in the Core facility of National Cancer Center, Korea, and verified to be negative.
Total RNA was isolated using Hybrid-R (GeneAll, Korea) for mRNA sequencing, library preparation was performed with DNase I-treated total RNA using a TruSeq kit (Illumina, USA), and paired-end sequencing was performed using the HiSeq platform (Illumina). The total RNA-seq and ChIP-seq reads for each cell line are summarized in Supplementary Tables S1 and S2, respectively. The peak calling results of the two active histone marks, H3K4me1 and H3K27ac, positioned in the genomic regions where the two putative eRNAs located are listed in Supplementary Table S3.
The ChIP-seq and RNA-seq data set analyzed in this study have been deposited in the Gene Expression Omnibus with the project number PRJNA737566 (Reviewer access: https://dataview.ncbi.nlm.nih.gov/object/PRJNA737566?reviewer=r8n75e8rhnls10h1m1nt440f07). ChIP-seq data visualized in UCSC Browser (http://genome.ucsc.edu/s/yeeun/eRNA_study_ChIP%2Dseq).
RNA-seq data for five different cancer types were downloaded from TCGA (https://portal.gdc.cancer.gov/), including liver hepatocellular carcinoma (LIHC), breast cancer (BRCA), colorectal adenocarcinoma (COAD), stomach adenocarcinoma (STAD), and lung adenocarcinoma (LUAD) data. RNA-seq data from 40 LIHC, 102 BRCA, 39 COAD, 16 STAD, and 52 LUAD matched normal-tumor samples of the same patient were used for our analysis.
After the quality of raw sequencing reads was checked using ‘FastQC’ (v0.11.8) (Andrews, 2010) for the RNA-seq data generated from the seven hepatocellular cell lines, the contaminating adaptor and unpaired reads were excluded using ‘Trimmomatic’ (v0.38; parameters: PE LEADING, 3; TRAILING, 3; SLIDINGWINDOW, 4:15; MINLEN, 36) (Bolger et al., 2014). The cleaned reads were then aligned to the reference genome (GRCh38/hg38; http://asia.ensembl.org/) using ‘Tophat’ (v2.1.1; parameters: --library-type fr-firststrand) (Trapnell et al., 2009), and the fragments per kilobase of transcript per million mapped reads (FPKM) values were obtained by running ‘Cuffquant’ (v2.2.1; parameters: --library-type fr-firststrand --multi-read-correct --frag-bias-correct) (Trapnell et al., 2010). After normalization of the 7 cell line library sizes using ‘Cuffnorm’ (v2.2.1; parameters: --library-type fr-firststrand --library-norm-method classic-fpkm) (Trapnell et al., 2010), ncRNAs with FPKM ≥ 1 were mapped to the genomic regions where the active enhancer marks H3K27ac and H3K4me1 were located.
The ChIP-seq data generated from the seven hepatocellular cell lines for H3K27ac and H3K4me1 were mapped to the reference genomic regions as follows. The quality check and trimming procedures for raw ChIP-seq reads were the same as those used for the RNA-seq data described above. The cleaned ChIP-seq data were then aligned to the reference genome (GRCh38/hg38) using ‘Bowtie’ (v1.3.0; parameters: -S --fr) (Langmead et al., 2009). Subsequently, the reads mapped to black-list regions from the ENCODE portal (identifier: ENCFF356LFX; https://www.encodeproject.org/) (Sloan et al., 2016) were filtered out using bedtools (2.27.0; parameters: intersect v) (Quinlan and Hall, 2010) and nonuniquely aligned reads and potential PCR duplicates were excluded using ‘MarkDuplicates’ of ‘Picard’ (v2.25.0; REMOVE_DUPLICATES = true, ASSUME_SORTED = true, VALIDATION_STRINGENCY = LENIENT) (Picard Toolkit, 2019), and peak calling and peak annotation were performed using ‘MACS2’ (v2.2.4; parameters: callpeak -f BAMPE -p 0.0001 [--narrow | --broad] -g hs --SPMR [--call-summits | --no-model] -B --keep-dup 1 --mfold 10 100) (Zhang et al., 2008) and ‘annotatePeaks.pl’ of ‘HOMER’ (v4.11.1; parameter: hg38) (Heinz et al., 2010), respectively.
The human HCC cell line FT3-7 was cultured in DMEM (HyClone) with 10% FBS (HyClone) and penicillin-streptomycin (Gibco) and maintained at 37°C in an incubator containing 5% CO2. To knock down two selected eRNAs,
Cell proliferation was assessed by cell counting kit-8 (CCK-8; Dojindo, Japan) assays following the manufacturer’s protocol. In brief, cells were seeded in 96-well plates, and 10 µl of CCK-8 reagent was added at the indicated time points (0, 24, 48, and 72 h). Cell viability was measured with a microplate reader (Molecular Devices, USA) at 450 nm.
The cells were plated in 96-well plates (3595; Corning, USA). The proliferation of cells was monitored for 4 days by an IncuCyte Live-Cell analysis system (Sartorius, Germany).
A total of 2,000 cells were plated in 6-well plates and maintained for 2 weeks. To visualize and count colonies grown for 2 weeks, cells were fixed with 4% paraformaldehyde and stained with 1% crystal violet.
To evaluate the invasion or migration of cancer cells, transwell plates (8 µm pore size, 3422; Corning) were coated with Matrigel (354234; BD, USA) or gelatin (G1393; Sigma, USA). Cells (3 × 105/ml cells in 100 µl for invasion assays and 1.5 × 105/ml cells in 100 µl for migration assays) in serum-free medium were added to the upper chamber, and 600 µl of 10% FBS medium was added to the lower chamber. After 24-48 h, the invaded and migrated cells were fixed with 4% paraformaldehyde and stained with 1% crystal violet. Six random fields were photographed at ×200 magnification with a microscope (Carl Zeiss, Germany).
Triplicate RNA-seq data were generated from each of the two KD experiments, i.e., the
Kaplan–Meier survival plots and hazard ratios (HRs) for the two selected ncRNAs,
All statistical tests and plots were conducted using the R (4.0.3) and Python (3.6.8) languages. Kendall’s correlation analysis and linear regression analysis were performed with the ‘scipy.stats.stats’ and ‘sklearn.linear_model’ packages, respectively, of Python (3.6.8). Gene ontology (GO) analysis was performed using the DAVID tool (https://david.ncifcrf.gov/home.jsp) (Jiao et al., 2012). Other batch works were performed with home-built Python scripts.
Active enhancers with two histone modification signals, H3K4me1 (i.e., an enhancer chromatin mark) and H3K27ac (i.e., an active chromatin mark), are known to produce eRNAs (Creyghton et al., 2010; Heintzman et al., 2007; 2009). Figure 1 shows our workflow used for identifying and detecting eRNAs: we used two types of ChIP-seq signals marking active enhancers and total RNA-seq data. Our rationale for identifying eRNAs was to select only ncRNAs transcribed from the genomic regions that are supposed to be active enhancer regions.
Briefly, only those ncRNAs transcribed from genomic positions at which the two types of ChIP-seq signals were commonly located were selected after excluding other RNAs transcribed from the genomic regions mapped to transcription start sites (TSSs), transcription termination sites (TTSs), and exons of protein-coding genes (Fig. 1A). A list of the ncRNAs that were considered to be transcribed from active enhancer regions was generated for each of seven different HCC cell lines: FT3-7, Huh-7, Huh7.5, PLC-PRF-5, SNU182, SNU387, and SNU449. Subsequently, those ncRNAs commonly identified from four different cell lines were defined as putative eRNAs with high confidence (Fig. 1A).
We decided to overlap the lists of eRNAs from four cell lines, Huh7.5, SNU182, FT3-7, and PLC-PRF-5, rather than from all seven cell lines because principal component analysis (PCA) showed that the gene expression profiles of three cell lines, SNU387, SNU449, and Huh-7, were substantially different from those of the remaining four cell lines (Supplementary Fig. S1). As expected, the list of overlapping eRNAs from all seven cell lines that included the three cell lines with very different gene expression profiles had a substantially reduced number of confident eRNAs such that further exploration would have been difficult. The numbers of ChIP-seq signals, transcribed ncRNAs, and ncRNAs detected in the active enhancer regions are summarized in Supplementary Table S4.
Consequently, a total of 132 ncRNAs that were commonly detected in the four cell lines were defined to be putative eRNAs (Supplementary Table S5), i.e., ncRNAs transcribed from active enhancer regions. Interestingly, all 132 ncRNAs were lncRNAs by definition with lengths > 200 nt. Small ncRNAs, such as miRNAs, piRNAs, and siRNAs, were also present in the individual cell lines we used to produce the ChIP-seq and RNA-seq data, but all disappeared when the ncRNAs from the four different cell types overlapped. Subsequently, by referencing the annotation information in GTF files (GRCh38/hg38), the lncRNAs were subcategorized into antisense ncRNAs, long intergenic noncoding RNA (lincRNAs), pseudogenes, sense overlapping ncRNAs and sense intronic ncRNAs, of which antisense ncRNAs and lincRNAs were the most frequent types (Fig. 1B). No chromosomal bias seemed to be present in the ncRNAs identified as putative eRNAs (Supplementary Table S5). Approximately 56% (74/132) of the genomic positions that were found to produce ncRNAs identified as putative eRNAs also overlapped with the ‘transcribed eRNAs’ identified with FANTOM, producing a significant enrichment of FANTOM eRNAs in the list of eRNAs detected in the present work (chi-squared test,
Next, to determine whether the expression of 132 ncRNAs identified as putative eRNAs is altered during tumor formation, we investigated the expression profiles of these ncRNAs in TCGA LIHC data. Approximately 49.2% (65/132) of the 132 ncRNAs were found to be significantly differently expressed (i.e., DEGs) between normal and tumor tissues in the TCGA-LIHC dataset (Fig. 2A). The volcano plot in Fig. 2B shows that the magnitude of the changes in the expression levels of these ncRNA genes was relatively weak but still significant. Notably, the number of upregulated ncRNAs was greater than that of downregulated ncRNAs (49 upregulated and 16 downregulated). The ChIP-seq peak locations and shapes for two putative eRNAs,
We next tried to identify putative eRNA-target mRNA pairs by correlation analysis using information on the expression levels of these genes from the seven different HCC cell lines. The rationale for this approach was that the expression levels of an eRNA should be correlated (either positively or negatively, although a positive correlation seems to be more likely assuming that enhancers act as
As a result, we found a total of putative 44,650 eRNA-target mRNA pairs (highlighted in dark blue in Fig. 3A) using weak thresholds (
Subsequently, we investigated whether these correlated genes were located in the same topology associating domains (TADs) where each of the ncRNAs identified as putative eRNAs was positioned. Surprisingly, there were few genes in the same TADs as
We next investigated which genes had their expression altered by knocking down these two ncRNAs under the assumption that the expression of target genes should be altered by KD of their upstream regulator eRNAs. For this purpose, ASOs were employed to KD expression in the FT3-7 HCC cell line. One ASO was designed to target exon 3 of
DEGs before and after KD were then identified using the threshold
We next analyzed which genes in the lists from the
Seven genes,
Similarly, five genes,
Furthermore, the expression levels of some of these genes were found to be associated with the overall survival of LIHC patients (Figs. 5E and 5F): upregulation of the genes conferred an approximately two times more likely chance of poor survivals (odds ratio: approximately 1.6-2.0) (Fig. 5G).
As mentioned in the Introduction, several eRNAs are known to have oncogenic potential. Consistently, we showed (Fig. 5) that the two putative eRNAs,
Subsequently, we examined whether these putative eRNAs could affect the invasion and migration of HCC cells using transwell assays. Invaded and migrated cells were detected after allowing the cells to migrate for 24-48 h. As shown in Figs. 6F and 6G, the invasion and migration abilities of
In the present work, a total of 132 putative eRNAs were identified by intersecting the genomic positions identified from three types of information: (i) enhancer chromatin marks (H3K4me1), (ii) active chromatin marks (H3K27ac), and (iii) transcribed ncRNAs. Subsequently, we identified putative target mRNAs for each of the 132 ncRNAs and tested the hypothesis that the ncRNAs identified as putative eRNAs could regulate the expression of downstream target genes by knocking down two selected eRNAs,
The fact that enhancer regions in the genome are transcribed by RNA pol II to produce ncRNAs has been confirmed by numerous studies (Andersson et al., 2014; Arner et al., 2015; De Santa et al., 2010; Kim et al., 2010; Melgar et al., 2011) since the first reports in 2010 (Kim et al., 2010), which have integrated gene expression information with data regarding active chromatin and open chromatin marks such as H3K27ac, DNase hypersensitivity sites, and enhancer chromatin marks such as H3K4me1 and RNA pol II marks. Here, we integrated two active enhancer chromatin marks, H3K4me1 and H3K27ac, along with transcribed ncRNA data. Only those ncRNAs transcribed from supposedly active enhancer regions commonly detected in four different HCC cell lines were defined to be putative eRNAs with high confidence. Putative target mRNAs of each ncRNA defined to be an eRNA were identified by selecting eRNA-mRNA pairs with significantly high correlation in seven HCC cell lines (Supplementary Tables S6 and S7) because we assumed that such a correlation indicates eRNA regulation of the target mRNA. Several other studies have already successfully estimated the relationship between eRNAs and their target mRNAs (Carullo et al., 2020; Chen et al., 2018; Hsieh et al., 2014; Mousavi et al., 2013; Schaukowitch et al., 2014; Zhang et al., 2019).
We also investigated how often the putative target mRNAs defined by the correlation analysis were located within the TADs of
Interestingly, the thirteen genes located within the
While extensive transcription in enhancer genomic regions may no longer be a novel aspect of genomic metabolism, the mechanisms of action by which eRNAs are involved remain to be further studied. As seen in Fig. 1B, we showed that all the putative eRNAs identified in the present work were lncRNAs, and the action mechanism of eRNAs can be inferred based on the known mechanisms of action of lncRNAs. As summarized by Ali and Grote (2020), lncRNAs can regulate gene expression not only through transcriptional products but also through other mechanisms of action. Namely, lncRNAs are known (i) to function as
Taken together, our results confirm that eRNAs are widely transcribed in active enhancer regions, consistent with many other previous studies. Our results also suggest that the conclusion that eRNAs are
This research was supported by National Research Foundation of Korea (NRF) (NRF-2019R1A5A6099645, NRF-2017M3A9G7073033) and by the Ministry of Education, Science, and Technology (MEST) (2019R1A2C 1002350).
L.K.K., T.K., and S.S.C. conceived and designed the experiments. Y.E.L., J.L., Y.S.L., J.J.J., H.W., and H.I.C. performed experiments and data analyses. Y.S.L., Y.G.C., and T.K.K. contributed reagents and materials. T.K., L.K.K., T.K.K., and S.S.C. wrote the paper. All authors read and approved the manuscript.
The authors have no potential conflicts of interest to disclose.
Cho-Won Kim, Hong Kyu Lee, Min-Woo Nam, Youngdong Choi, and Kyung-Chul Choi
Mol. Cells 2022; 45(12): 935-949 https://doi.org/10.14348/molcells.2022.0105Chaehwan Oh, Dahyeon Koh, Hyeong Bin Jeon, and Kyoung Mi Kim
Mol. Cells 2022; 45(9): 603-609 https://doi.org/10.14348/molcells.2022.0056Mijin Park, Byul Moon, Jong-Hwan Kim, Seung-Jin Park, Seon-Kyu Kim, Kihyun Park, Jaehoon Kim, Seon-Young Kim, Jeong-Hoon Kim, and Jung-Ae Kim
Mol. Cells 2022; 45(8): 550-563 https://doi.org/10.14348/molcells.2022.0009