Mol. Cells 2017; 40(12): 889-896
Published online December 20, 2017
https://doi.org/10.14348/molcells.2017.0263
© The Korean Society for Molecular and Cellular Biology
*Correspondence: hirose@igm.hokudai.ac.jp
Nuclear bodies are subnuclear, spheroidal, and membraneless compartments that concentrate specific proteins and/or RNAs. They serve as sites of biogenesis, storage, and sequestration of specific RNAs, proteins, or ribonucleoprotein complexes. Recent studies reveal that a subset of nuclear bodies in various eukaryotic organisms is constructed using architectural long noncoding RNAs (arcRNAs). Here, we describe the unifying mechanistic principles of the construction and function of these bodies, especially focusing on liquid-liquid phase separation induced by architectural molecules that form multiple weakly adhesive interactions. We also discuss three possible advantages of using arcRNAs rather than architectural proteins to build the bodies: position-specificity, rapidity, and economy in sequestering nucleic acid-binding proteins. Moreover, we introduce two recently devised methods to discover novel arcRNA-constructed bodies; one that focuses on the RNase-sensitivity of these bodies, and another that focuses on “semi-extractability” of arcRNAs.
Keywords: architectural RNA, liquid-liquid phase separation, low-complexity domain, multivalency, nuclear body, prion-like domain
Cellular bodies or condensates are subcellular, membraneless, and spheroidal compartments approximately 0.2 to 2 μm in diameter. Nuclear bodies are cellular bodies within the nucleus, and include the nucleolus, Cajal body, histone locus body, Polycomb body, promyelocytic leukemia body, nuclear speckle, and paraspeckle (Fig. 1). Each nuclear body is defined by enrichment of a specific marker protein and/or RNA, and functions as the site of biogenesis, storage, or sequestration of specific RNAs, proteins, or ribonucleoprotein (RNP) complexes. Most cellular bodies exhibit liquid droplet-like features: they are demixed (phase-separated) from the surrounding nucleoplasm or cytoplasm, they can fuse into a larger droplet, and shearing forces can deform them and break them into smaller droplets (Banani et al., 2017, 2011; Brangwynne et al., 2011). Such liquid-liquid phase separation (LLPS) is accomplished by molecules that can form multiple intermolecular interactions, a property called multivalency, which is a classical concept in polymer chemistry (Molliex et al., 2015; Nott et al., 2015). Above a threshold concentration, molecules that form multivalent interactions can self-assemble into large oligomers or polymers, often causing LLPS and thereby enabling body formation. Indeed, concentrating the key body component at a particular cellular site induces formation of the nucleolus, Cajal body, paraspeckle, and histone locus body (Berry et al., 2015; Kaiser et al., 2008; Mao et al., 2011). LLPS is often enabled by proteins that contain a region enriched with a limited number of amino acid types, referred to as a low-complexity domain (LCD) (Fig. 2) (Kato et al., 2012; Molliex et al., 2015).
LCDs lack a defined three-dimensional structure, which most other classical protein domains possess, and such intrinsically disordered LCDs provide the basis for multivalent weakly adhesive intermolecular interactions, such as electrostatic interactions (e.g., charge, cation-pi, and dipole-dipole), pi stacking interactions, and hydrophobic interactions (Fig. 2) (Brangwynne et al., 2015; Nott et al., 2015; Petri et al., 2012; Reichheld et al., 2017). These weak interactions are short lived and provide little structural order to the peptide chain, consistent with the dynamic nature of phase-separated liquids. Although a cellular body as a whole is often stably maintained over hours or even days, the protein components of the body are usually dynamic, exchanging rapidly with the surrounding nucleoplasm or cytoplasm on timescales of seconds (Fig. 2) (Dundr et al., 2004; Mao et al., 2011).
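The threshold behavior described above can be illustrated with a minimal sketch. It assumes an idealized two-phase system with a single scaffold species at equilibrium, in which the dilute phase stays at a saturation concentration and any excess scaffold partitions into a dense phase (simple mass conservation). The function name `condensed_fraction` and all concentrations are hypothetical and are not taken from the studies cited here. Real nuclear bodies are multicomponent and may not be at equilibrium.

```python
def condensed_fraction(c_total, c_sat, c_dense):
    """Volume fraction occupied by the dense phase in an idealized
    two-phase system (assumption: one scaffold species, equilibrium).

    c_total: overall scaffold concentration (arbitrary units)
    c_sat:   saturation (threshold) concentration of the dilute phase
    c_dense: scaffold concentration inside the dense phase
    """
    if not (0 < c_sat < c_dense):
        raise ValueError("expected 0 < c_sat < c_dense")
    if c_total <= c_sat:
        return 0.0  # below the threshold: no body forms
    if c_total >= c_dense:
        return 1.0  # the whole volume is dense phase
    # Mass conservation: c_total = f * c_dense + (1 - f) * c_sat
    return (c_total - c_sat) / (c_dense - c_sat)


# Hypothetical numbers: threshold of 1 unit, dense phase at 101 units.
for c in (0.5, 1.0, 2.0, 11.0):
    print(f"c_total = {c:5.1f} -> dense-phase fraction = "
          f"{condensed_fraction(c, 1.0, 101.0):.3f}")
```

In this sketch no dense phase exists until the total concentration exceeds the threshold, after which the dense-phase volume grows with the excess. This matches the qualitative observation that locally concentrating a key body component induces body formation.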
The formation of various cellular bodies depends on the specific structural proteins, e.g., Coilin in Cajal bodies or TIA1 in stress granules. A subset of nuclear bodies, however, use specific long noncoding RNA (lncRNA) or pre-mRNA as their scaffolding molecule. These scaffolding lncRNAs and pre-mRNAs are defined as “architectural lncRNAs (arcRNAs)” (Chujo et al., 2016). An RNA qualifies as an arcRNA if: 1) it is localized and enriched in a specific nuclear body, and 2) it constructs and stabilizes the body structure. The latter can be shown by RNA depletion to disrupt the body or artificial RNA tethering to construct the body. Presently, five lncRNAs and one pre-mRNA can firmly be classified as arcRNAs (Table 1). These six established arcRNAs and the nuclear bodies that they build are: 1) mammalian nuclear paraspeckle assembly transcript 1 isoform 2 (NEAT1_2) lncRNA in the paraspeckle (Sasaki et al., 2009; Shevtsov and Dundr, 2011), 2) intergenic spacer (IGS) lncRNA in the amyloid body (Audas et al., 2012), 3) human satellite III (SatIII) lncRNA in the nuclear stress body (Valgardsdottir et al., 2005), 4) histone pre-mRNA in the histone locus body (Shevtsov and Dundr, 2011), 5)
ArcRNAs share four common characteristics that are closely connected with the benefits of using RNA, rather than protein, as the architectural molecule of nuclear bodies. These benefits are discussed below.
First, the expression of arcRNAs is temporarily upregulated by specific stimuli (Table 1). For example, NEAT1_2 expression is enhanced upon cellular stresses such as proteasome inhibition, viral infection, or hypoxia, and NEAT1_2 promotes cell survival under such adverse conditions (Choudhry et al., 2014; Hirose et al., 2014; Imamura et al., 2014). NEAT1_2 is also transcriptionally upregulated during and required for mammalian corpus luteum formation and mammary gland development (Nakagawa et al., 2014; Standaert et al., 2014). Moreover, NEAT1_2 is upregulated in many cancer tissues and variously regulates cancer progression (Adriaens et al., 2016; Chakravarty et al., 2014; Choudhry et al., 2014; Mello et al., 2017).
Second, arcRNAs sequestrate various regulatory proteins, such as RNA-binding proteins, DNA-binding proteins, and E3 ubiquitin ligases, thereby affecting gene expression patterns. For example, upon NEAT1_2 upregulation and paraspeckle number increase, nucleoplasmic SFPQ protein is sequestrated into paraspeckles, reducing SFPQ-mediated transcription of target genes such as
Third, all six arcRNA-constructed nuclear bodies are assembled at the site of arcRNA transcription (Fig. 3) (Audas et al., 2012; Clemson et al., 2009; Dangli et al., 1983; Liu et al., 2006; Rizzi et al., 2004; Shimada et al., 2003).
Fourth, most arcRNAs are transcribed from or include repetitive sequences. IGS and SatIII lncRNAs are transcribed from the ribosomal DNA repeat intergenic regions and peri-centromeric SatIII repeats, respectively (Audas et al., 2012; Rizzi et al., 2004).
ArcRNA-constructed nuclear bodies contain various protein components, and there are several commonalities between the body proteins (Fig. 3). First, many are RNA-binding proteins, as expected given the requirement of RNA to build the bodies. Second, arcRNA-constructed nuclear bodies are often enriched with RNA-binding proteins that also contain a form of LCD called a prion-like domain (PLD) (Table 1) (Chujo et al., 2016; King et al., 2012; Naganuma et al., 2012). The PLD is rich in amino acids such as tyrosine, asparagine, glutamine, and glycine, and can form protein–protein interactions. Some of the PLD-containing RNA-binding proteins may bridge RNPs to form large nuclear bodies. Indeed, each paraspeckle contains about 50 NEAT1_2 molecules, and the PLDs of paraspeckle proteins such as FUS and RBM14 are required to construct paraspeckles (Chujo et al., 2017; Hennig et al., 2015). Moreover, recombinant RBM14 or FUS can form hydrogel in a PLD-dependent manner (Hennig et al., 2015). These results suggest that PLDs of RBM14 and FUS may induce phase separation to form paraspeckles in the nucleoplasm. Also, the PLD of HNRNPD is required to construct the SNB, and PLD-containing proteins (e.g., Nona and Hrb87F) are enriched in the
ArcRNAs are widely used as the architectural molecule of nuclear bodies in a variety of eukaryotes, suggesting that RNA was evolutionarily adopted as the scaffold molecule in some cases. In this section, we discuss three possible reasons why RNA is used as the architectural molecule of nuclear bodies: position-specificity, rapidity, and economy in sequestering nucleic acid-binding proteins.
First, the use of RNA enables position-specific nuclear body formation, which in turn allows gene regulation at a specific nuclear region. To induce LLPS to form membraneless subcellular bodies, it is necessary to raise the concentration of the key scaffold molecules above a threshold. In the nucleus, RNAs of a single species can be highly concentrated at the site of transcription, if the transcription level is high enough (Figs. 1 and 3). In fact, all six known nuclear bodies with arcRNA scaffolds are formed at the site of arcRNA transcription (Audas et al., 2012; Clemson et al., 2009; Dangli et al., 1983; Liu et al., 2006; Rizzi et al., 2004; Shimada et al., 2003). Moreover, by simply repeating a specific scaffold sequence in one RNA molecule, the scaffold sequence can easily be enriched in one place, as in the case of SatIII (containing repeats of 158 nt), Hsr omega (containing repeats of 280 nt), and DM1-related RNA with triplet repeat expansion (Garbe et al., 1986; Moyzis et al., 1987; Taneja et al., 1995). In addition, RNA can increase the local concentration of the specific RNA-binding proteins with which it interacts. As many RNA-binding proteins bear PLDs (Table 1), arcRNA can increase the local concentration of PLD-containing RNA-binding proteins to promote PLD-mediated LLPS and construct a massive complex. For example, the PLD-containing RNA-binding protein FUS forms higher-order assemblies at a low concentration in the presence but not in the absence of RNAs (Schwartz et al., 2013). Through nuclear body formation at the transcription sites of arcRNAs, arcRNAs can regulate the local nucleoplasmic concentration of freely available regulatory proteins by sequestrating them into the nuclear body, which may eventually regulate local gene expression.
Second, RNA enables rapid and reversible sequestration of nucleic acid-binding proteins. Interestingly, all the known nuclear bodies with arcRNA scaffolds form or increase in size or number upon specific stimuli (Table 1). When such stimuli are no longer present, arcRNAs can be quickly degraded and proteins trapped in the bodies disperse into the nucleoplasm, rapidly allowing gene expression to return to normal. For example,
Third, it is more economical to use RNAs than proteins to sequestrate a massive number of various nucleic acid-binding proteins. The formation of a nuclear body requires the accumulation of specific proteins at specific sites in the nucleus. Therefore, it is beneficial for the architectural molecules to be able to trap a massive number of proteins. One protein can usually be captured by one or several protein domains containing tens to hundreds of amino acids, whereas one protein can be captured by only 4–17 nt of RNA (Lunde et al., 2007; Prikryl et al., 2011). Thus, one protein comprising 100 amino acids can capture only one or two proteins, whereas one RNA molecule comprising 100 nt may capture 5–20 proteins. One arcRNA molecule, for instance a 23-kb NEAT1_2 RNA molecule, may capture thousands of proteins. To date, more than 60 protein species have been identified as components of paraspeckles (Yamazaki and Hirose, 2015). The majority of these are RNA-binding proteins and transcription factors. Such collective binding and sequestration of various nucleic acid-binding proteins helps to regulate the overall gene expression. Also, whereas longer polypeptides of random sequences can easily aggregate and become toxic to cells, longer RNAs of random sequences remain soluble due to the charged phosphate backbone. This may be one reason most proteins are shorter than 2,000 amino acids, whereas many lncRNAs and mRNA precursors comprise 10,000 nt. A longer architectural molecule enables multivalent interaction with a massive number of nucleic acid-binding proteins, which facilitates the regulation of gene expression. Whereas body proteins rapidly exchange in and out of the body, arcRNA can stay inside the body (Mao et al., 2011), which is presumably due to the high molecular weight of arcRNA and the great number of interactions it makes. Such soluble, highly multivalent, and static features may make arcRNAs competent architectural molecules. 
RNA hardly forms insoluble aggregates and the reading frames of lncRNAs do not need to remain translatable; therefore, the sequence of an architectural lncRNA is likely to be only constrained by the requirement to form the correct RNA–protein interaction, and an arcRNA has flexibility to add and change sequences. Such advantages would allow lncRNAs to increase the diversity of their binding partners and the combinations of proteins that are incorporated into nuclear bodies, enabling the formation of a wide variety of nuclear bodies for faster adaptation to environmental changes.
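The arithmetic behind the economy argument can be sketched as follows. This is an idealized estimate that assumes binding sites of 4–17 nt (the range given above) are packed end to end without overlap along the RNA. The function name `binding_site_range` is hypothetical. The upper bound assumes every site is the shortest possible and the RNA is fully occupied, so it is optimistic; the text's estimate of 5–20 proteins per 100 nt is somewhat more conservative.

```python
def binding_site_range(length_nt, min_site_nt=4, max_site_nt=17):
    """Return (low, high) counts of non-overlapping protein-binding
    sites that fit on an RNA of length_nt nucleotides.

    Assumption: sites are packed end to end; low uses the longest
    site (17 nt), high uses the shortest site (4 nt).
    """
    low = length_nt // max_site_nt
    high = length_nt // min_site_nt
    return low, high


for name, length in [("100-nt RNA", 100), ("NEAT1_2 (about 23 kb)", 23_000)]:
    low, high = binding_site_range(length)
    print(f"{name}: roughly {low} to {high} binding sites")
```

For a 23-kb RNA the estimate falls between about 1,350 and 5,750 sites, which is consistent with the statement that one NEAT1_2 molecule may capture thousands of proteins.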
Formation of known arcRNA-dependent nuclear bodies is initiated by expression of arcRNAs induced during specific stresses and developmental stages. Considering that there are over 20,000 lncRNAs with unknown functions (Hon et al., 2017), it is likely that additional arcRNA-dependent nuclear bodies remain to be identified in various cellular contexts. Here, we introduce two recently reported methods to discover novel nuclear bodies with arcRNA scaffolds: 1) RNase-sensitivity screening of nuclear bodies (Mannen et al., 2016), and 2) transcriptome-wide screening of semi-extractable RNA (Chujo et al., 2017).
The structures of known arcRNA-constructed nuclear bodies can be disrupted by RNase treatment of permeabilized cells. Thus, in combination with a Venus-tagged cDNA expression library, one can seek proteins that form foci that are disrupted upon RNase treatment. In the reported RNase-sensitivity screen, a Venus-tagged cDNA expression library of 10,432 proteins was first used to select 463 proteins that formed foci within the nucleus (Hirose and Goshima, 2015). The cells were then permeabilized with weak detergent and treated with RNases, identifying 32 proteins in HeLa cells that form RNase-sensitive foci in the nucleus. Many of the identified proteins were known components of various nuclear bodies. However, six proteins were new components of the SNB, one of which, HNRNPD, was required for SNB formation. This study also showed that both the RNA-binding domain and the PLD of HNRNPD are required for SNB formation; the PLD of HNRNPD enabled HNRNPD–HNRNPD and HNRNPD–Sam68 interactions. These results collectively demonstrate that SNB formation requires (a) putative arcRNA(s), RNA–protein interactions, and protein–protein interactions. The molecular entity of the SNB arcRNA remains to be identified.
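The successive filtering steps of the RNase-sensitivity screen can be summarized with a short calculation. The counts are those quoted above; the stage labels and the helper `funnel_report` are illustrative names only and are not part of the published pipeline.

```python
# Counts quoted in the text for each stage of the screen.
stages = [
    ("Venus-tagged cDNA library", 10_432),
    ("proteins forming nuclear foci", 463),
    ("proteins with RNase-sensitive foci", 32),
    ("new SNB components among the hits", 6),
]


def funnel_report(stages):
    """Return (label, count, percent_of_previous_stage) per stage.

    The first stage has no previous stage, so its percentage is None.
    """
    report = []
    previous = None
    for label, count in stages:
        pct = None if previous is None else 100.0 * count / previous
        report.append((label, count, pct))
        previous = count
    return report


report = funnel_report(stages)
for label, count, pct in report:
    suffix = "" if pct is None else f" ({pct:.2f}% of previous stage)"
    print(f"{label}: {count}{suffix}")
```

The calculation shows that only a few percent of the library formed nuclear foci, and only a few percent of those foci were RNase-sensitive.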
Transcriptome-wide screening of semi-extractable RNAs looks for RNAs that are unusually difficult to extract from cells. This method originated from the serendipitous finding that cellular NEAT1_2 is extremely difficult to extract using conventional RNA extraction methods. Complete extraction of cellular NEAT1_2 requires either needle shearing or heating of the cell lysates in an RNA extraction reagent such as TRIzol, a property that was named semi-extractability. One feature of phase-separated liquid droplets is that they can be broken down by shearing forces. Thus, the requirement for needle shearing to extract NEAT1_2 suggests that, after cells are lysed with TRIzol, paraspeckles might still remain as phase-separated droplets unless sheared with a needle or heated. Semi-extractability of NEAT1_2 was largely dependent on the FUS protein, especially its PLD, which also indicates that LLPS of paraspeckles is the cause of semi-extractability. As LLPS and PLDs are often linked to nuclear body formation, it was hypothesized that unidentified semi-extractable RNAs might also form nuclear bodies. Transcriptome-wide analysis in HeLa cells identified 45 semi-extractable RNAs, of which the 10 most abundant all showed predominant localization in nuclear body-like foci. Of these, three RNAs (CCAT1 lncRNA, LINC00473 lncRNA, and ADARB2 pre-mRNA) were confirmed to form foci not only at the transcription site, but also outside of transcription sites, strongly suggesting formation of novel nuclear bodies. To firmly establish these RNAs as arcRNAs, body marker proteins need to be identified, and disintegration of body marker foci upon RNA depletion needs to be confirmed. As this semi-extractable RNA screening can be performed relatively easily, the method is useful for finding novel arcRNAs in various cellular contexts. Taken together, the two approaches above suggest the presence of unidentified arcRNAs in the human transcriptome. 
Identification and characterization of the additional arcRNAs will unveil the commonality in this class of RNA in terms of the sequence elements and modes of action.
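The idea behind the semi-extractability screen can be sketched as a comparison of RNA abundance between a conventional extraction and an improved extraction (needle shearing or heating). The sketch below is an illustration under stated assumptions, not the published analysis pipeline: the abundance values, the 2-fold cutoff, the pseudocount, the `control_mRNA` entry, and the function name `semi_extractable` are all hypothetical.

```python
def semi_extractable(conventional, improved, min_fold=2.0, pseudocount=1.0):
    """Flag RNAs recovered much better by improved extraction.

    conventional, improved: dicts mapping RNA name -> abundance
    (for example, normalized read counts; hypothetical values here).
    min_fold and pseudocount are illustrative assumptions.
    Returns {rna: fold_change}, sorted by decreasing fold change.
    """
    hits = {}
    for rna, conv in conventional.items():
        if rna not in improved:
            continue  # skip RNAs not measured in both extractions
        fold = (improved[rna] + pseudocount) / (conv + pseudocount)
        if fold >= min_fold:
            hits[rna] = fold
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical abundances: NEAT1_2 is poorly recovered without shearing
# or heating, whereas a typical mRNA is recovered equally well either way.
conventional = {"NEAT1_2": 10.0, "control_mRNA": 100.0}
improved = {"NEAT1_2": 200.0, "control_mRNA": 105.0}

hits = semi_extractable(conventional, improved)
for rna, fold in hits.items():
    print(f"{rna}: {fold:.1f}-fold higher with improved extraction")
```

With these made-up numbers only NEAT1_2 passes the cutoff. In a real screen the candidates flagged this way would still need localization and depletion experiments before they could be called arcRNAs, as noted above.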
Recent studies have greatly advanced our mechanistic understanding of the formation and function of arcRNA-constructed nuclear bodies. However, many important questions remain to be answered, including the following:
What are the fundamental rules of RNA sequences and/or RNA structures that enable LLPS of the nuclear bodies? Are they clusters of binding sites for PLD-containing RNA-binding proteins?
There are more than 20,000 human lncRNAs with unknown functions. Are there novel, uncharacterized arcRNAs that function during specific cellular stresses and developmental stages?
There is a large gap between the molecular functions of arcRNA-constructed nuclear bodies (e.g., protein sequestration) and the physiological functions of the bodies (e.g., organ development). How does the molecular role of the nuclear bodies link to the physiological functions?
Table 1. The six known nuclear bodies built on arcRNAs
Nuclear body | arcRNA | Inducing stress or signal | Repetitive sequences in arcRNA | Body proteins | Molecular, cellular, and physiological functions |
---|---|---|---|---|---|
Paraspeckle | NEAT1 | Viral infection, hypoxia, MG132, corpus luteum formation, mammary gland development | Fragments of transposable elements | > 60 proteins including PSPC1ᵃ, SFPQᵃ, NONO, RBM14ᵃ, HNRNPK, FUSᵃ, DAZAP1ᵃ, HNRNPH3ᵃ, HNRNPA1ᵃ, HNRNPR, HNRNPUL1ᵃ, TDP-43ᵃ, BRG1ᵇ, BRMᵇ, BAF155ᵇ (Yamazaki and Hirose, 2015) | Sequestration of RNA-binding proteins and transcription factors. Retention of mouse CTN RNA. |
Amyloid body | IGS | Acidosis, heat shock, transcriptional stress | Ribosomal DNA repeats | VHL, DNMT1, POLD1, HSP70, MDM2, RPA40, RPA16, NOL1, NOM1, NOP52, PES1, RRP1B, SENP3 | Immobilization of body proteins, activation of HIF. |
Nuclear stress body | Satellite III | Heat shock, cadmium | Satellite III repeats | SRSF1, SAFB, TDP-43ᵃ, HSF1, NFAT5ᵃ, BRG1ᵇ | Sequestration of RNA-binding proteins and transcription factors |
Histone locus body | Histone mRNA precursor | Interphase, especially G1 and S phases | Unknown | FLASH, LSM10, LSM11, SLBP, SYMPLEKIN | Processing of histone pre-mRNAs |
Omega speckle | Hsr omega | Heat shock, amides, ecdysone hormone | Tandem repeats of 280 nt for 10 kb | Nonaᵃ, Sex-lethalᵃ, sans fille, PEPᵃ, Hrb87Fᵃ, Hrp40ᵃ, Hrb57Aᵃ, ISWIᵇ | Normal development (nullisomic animals are embryonic lethal or extremely weak) |
Mei2 dot | meiRNA | Entry into meiosis | U(U/C)AAAC | Mei2, Mmi1 | Sequestration of RNA-binding proteins |
ᵃ Proteins containing prion-like domains.
ᵇ Chromatin remodeling complex protein.
Published online December 31, 2017 https://doi.org/10.14348/molcells.2017.0263
Takeshi Chujo, and Tetsuro Hirose*
Institute for Genetic Medicine, Hokkaido University, Sapporo 060-0815, Japan
Correspondence to:*Correspondence: hirose@igm.hokudai.ac.jp
Nuclear bodies are subnuclear, spheroidal, and membraneless compartments that concentrate specific proteins and/or RNAs. They serve as sites of biogenesis, storage, and sequestration of specific RNAs, proteins, or ribonucleoprotein complexes. Recent studies reveal that a subset of nuclear bodies in various eukaryotic organisms is constructed using architectural long noncoding RNAs (arcRNAs). Here, we describe the unifying mechanistic principles of the construction and function of these bodies, especially focusing on liquid-liquid phase separation induced by architectural molecules that form multiple weakly adhesive interactions. We also discuss three possible advantages of using arcRNAs rather than architectural proteins to build the bodies: position-specificity, rapidity, and economy in sequestering nucleic acid-binding proteins. Moreover, we introduce two recently devised methods to discover novel arcRNA-constructed bodies; one that focuses on the RNase-sensitivity of these bodies, and another that focuses on “semi-extractability” of arcRNAs.
Keywords: architectural RNA, liquid-liquid phase separation, low-complexity domain, multivalency, nuclear body, prion-like domain
Cellular bodies or condensates are subcellular, membraneless, and spheroidal compartments approximately 0.2 to 2 μm in diameter. Nuclear bodies are cellular bodies within the nucleus, and include the nucleolus, Cajal body, histone locus body, Polycomb body, promyelocytic leukemia body, nuclear speckle, and paraspeckle (Fig. 1). Each nuclear body is defined by enrichment of a specific marker protein and/or RNA, and functions as the site of biogenesis, storage, or sequestration of specific RNAs, proteins, or ribonucleoprotein (RNP) complexes. Most cellular bodies exhibit liquid droplet-like features; the cellular bodies are demixed (phase-separated) from the surrounding nucleoplasm or cytoplasm, they can fuse and become a larger droplet, and shearing forces can deform and break down the droplets to smaller droplets (Banani et al., 2017, 2011; Brangwynne et al., 2011). Such liquid-liquid phase separation (LLPS) is accomplished by molecules that can form multiple intermolecular interactions, a property that is called multivalency, a classical concept in polymer chemistry (Molliex et al., 2015; Nott et al., 2015). Above a threshold concentration, molecules that form multivalent interactions can self-assemble into large oligomers or polymers, often causing LLPS to enable body formation. Indeed, concentrating the key body component at a particular cellular site induces formation of nucleolus, Cajal body, paraspeckle, and histone locus body (Berry et al., 2015; Kaiser et al., 2008; Mao et al., 2011). LLPS is often enabled by proteins that contain a region enriched with a limited number of amino acid types, a domain referred to as low-complexity domain (LCD) (Fig. 2) (Kato et al., 2012; Molliex et al., 2015). 
LCDs lack a defined three-dimensional structure, which most other classical protein domains possess, and such intrinsically disordered LCDs provide the basis for multivalent weakly adhesive intermolecular interactions, such as electrostatic interactions (e.g., charge, cation-pi, and dipole-dipole), pi stacking interactions, and hydrophobic interactions (Fig. 2) (Brangwynne et al., 2015; Nott et al., 2015; Petri et al., 2012; Reichheld et al., 2017). These weak interactions are short lived and provide little structural order to the peptide chain, consistent with the dynamic nature of phase-separated liquids. Although a cellular body as a whole is often stably maintained over hours or even days, the protein components of the body are usually dynamic, exchanging rapidly with the surrounding nucleoplasm or cytoplasm on timescales of seconds (Fig. 2) (Dundr et al., 2004; Mao et al., 2011).
The formation of various cellular bodies depends on the specific structural proteins, e.g., Coilin in Cajal bodies or TIA1 in stress granules. A subset of nuclear bodies, however, use specific long noncoding RNA (lncRNA) or pre-mRNA as their scaffolding molecule. These scaffolding lncRNAs and pre-mRNAs are defined as “architectural lncRNAs (arcRNAs)” (Chujo et al., 2016). An RNA qualifies as an arcRNA if: 1) it is localized and enriched in a specific nuclear body, and 2) it constructs and stabilizes the body structure. The latter can be shown by RNA depletion to disrupt the body or artificial RNA tethering to construct the body. Presently, five lncRNAs and one pre-mRNA can firmly be classified as arcRNAs (Table 1). These six established arcRNAs and the nuclear bodies that they build are: 1) mammalian nuclear paraspeckle assembly transcript 1 isoform 2 (NEAT1_2) lncRNA in the paraspeckle (Sasaki et al., 2009; Shevtsov and Dundr, 2011), 2) intergenic spacer (IGS) lncRNA in the amyloid body (Audas et al., 2012), 3) human satellite III (SatIII) lncRNA in the nuclear stress body (Valgardsdottir et al., 2005), 4) histone pre-mRNA in the histone locus body (Shevtsov and Dundr, 2011), 5)
ArcRNAs share four common characteristics that are closely connected with the benefits of using RNA, but not protein, as the architectural molecules of nuclear bodies. These benefits will be discussed below.
First, the expression of arcRNAs is temporarily upregulated by specific stimuli (Table 1). For example, NEAT1_2 expression is enhanced upon cellular stresses such as proteasome inhibition, viral infection, or hypoxia, and NEAT1_2 promotes cell survival under such adverse conditions (Choudhry et al., 2014; Hirose et al., 2014; Imamura et al., 2014). NEAT1_2 is also transcriptionally upregulated during and required for mammalian corpus luteum formation and mammary gland development (Nakagawa et al., 2014; Standaert et al., 2014). Moreover, NEAT1_2 is upregulated in many cancer tissues and variously regulates cancer progression (Adriaens et al., 2016; Chakravarty et al., 2014; Choudhry et al., 2014; Mello et al., 2017).
Second, arcRNAs sequestrate various regulatory proteins, such as RNA-binding proteins, DNA-binding proteins, and E3 ubiquitin ligases, thereby affecting gene expression patterns. For example, upon NEAT1_2 upregulation and paraspeckle number increase, nucleoplasmic SFPQ protein is sequestrated into paraspeckles, reducing SFPQ-mediated transcription of target genes such as
Third, all six arcRNA-constructed nuclear bodies are assembled at the site of arcRNA transcription (Fig. 3) (Audas et al., 2012; Clemson et al., 2009; Dangli et al., 1983; Liu et al., 2006; Rizzi et al., 2004; Shimada et al., 2003).
Fourth, most arcRNAs are transcribed from or include repetitive sequences. IGS and SatIII lncRNAs are transcribed from the ribosomal DNA repeat intergenic regions and peri-centromeric SatIII repeats, respectively (Audas et al., 2012; Rizzi et al., 2004).
ArcRNA-constructed nuclear bodies contain various protein components, and there are several commonalities between the body proteins (Fig. 3). First, many are RNA-binding proteins, as expected given the requirement of RNA to build the bodies. Second, arcRNA-constructed nuclear bodies are often enriched with RNA-binding proteins that also contain a form of LCD called a prion-like domain (PLD) (Table 1) (Chujo et al., 2016; King et al., 2012; Naganuma et al., 2012). The PLD is rich in amino acids such as tyrosine, asparagine, glutamine, and glycine, and can form protein–protein interactions. Some of the PLD-containing RNA-binding proteins may bridge RNPs to form large nuclear bodies. Indeed, each paraspeckle contains about 50 NEAT1_2 molecules, and the PLDs of paraspeckle proteins such as FUS and RBM14 are required to construct paraspeckles (Chujo et al., 2017; Hennig et al., 2015). Moreover, recombinant RBM14 or FUS can form hydrogel in a PLD-dependent manner (Hennig et al., 2015). These results suggest that PLDs of RBM14 and FUS may induce phase separation to form paraspeckles in the nucleoplasm. Also, the PLD of HNRNPD is required to construct the SNB, and PLD-containing proteins (e.g., Nona and Hrb87F) are enriched in the
ArcRNAs are widely used as the architectural molecule of nuclear bodies in a variety of eukaryotes, suggesting that RNA was evolutionarily adopted as the scaffold molecule in some cases. In this section, we discuss three possible reasons why RNA is used as the architectural molecule of nuclear bodies: position-specificity, rapidity, and economy in sequestering nucleic acid-binding proteins.
First, the use of RNA enables position-specific nuclear body formation, which enables gene regulation at a specific nuclear region. To induce LLPS to form membraneless subcellular bodies, it is necessary to raise the concentration of the key scaffold molecules above a threshold. In the nucleus, certain homogeneous RNAs can be highly concentrated at the site of transcription, if the transcription level is high enough (Fig. 1 and 3). In fact, all six known nuclear bodies with arcRNA scaffolds are formed at the site of arcRNA transcription (Audas et al., 2012; Clemson et al., 2009; Dangli et al., 1983; Liu et al., 2006; Rizzi et al., 2004; Shimada et al., 2003). Moreover, by simply repeating a specific scaffold sequence in one RNA molecule, the scaffold sequence can easily be enriched in one place, as in the case of SatIII (containing repeats of 158 nt), Hsr omega (containing repeats of 280 nt), and DM1-related RNA with triplet repeat expansion (Garbe et al., 1986; Moyzis et al., 1987; Taneja et al., 1995). In addition, RNA can increase the local concentration of specific RNA-binding proteins that the RNA interacts with. As many RNA-binding proteins bear PLDs (Table 1), arcRNA can increase the local concentration of PLD-containing RNA-binding proteins to promote PLD-mediated LLPS to construct a massive complex. For example, PLD-containing RNA-binding protein FUS forms high-order assemblies at a low concentration in the presence but not in the absence of RNAs (Schwartz et al., 2013). Through nuclear body formation at the transcription sites of arcRNAs, arcRNAs can regulate the local nucleoplasmic concentration of freely available regulatory proteins by sequestrating them into the nuclear body, which may eventually regulate local gene expression.
Second, RNA enables rapid and reversible sequestration of nucleic acid-binding proteins. Interestingly, all the known nuclear bodies with arcRNA scaffolds form or increase in size or number upon specific stimuli (Table 1). When such stimuli are no longer present, arcRNAs can be quickly degraded and proteins trapped in the bodies disperse into the nucleoplasm, rapidly allowing gene expression to return to normal. For example,
Third, it is more economical to use RNAs than proteins to sequestrate a massive number of various nucleic acid-binding proteins. The formation of a nuclear body requires the accumulation of specific proteins at specific sites in the nucleus. Therefore, it is beneficial for the architectural molecules to be able to trap a massive number of proteins. One protein can usually be captured by one or several protein domains containing tens to hundreds of amino acids, whereas one protein can be captured by only 4–17 nt of RNA (Lunde et al., 2007; Prikryl et al., 2011). Thus, one protein comprising 100 amino acids can capture only one or two proteins, whereas one RNA molecule comprising 100 nt may capture 5–20 proteins. One arcRNA molecule, for instance a 23-kb NEAT1_2 RNA molecule, may capture thousands of proteins. To date, more than 60 protein species have been identified as components of paraspeckles (Yamazaki and Hirose, 2015). The majority of these are RNA-binding proteins and transcription factors. Such collective binding and sequestration of various nucleic acid-binding proteins helps to regulate the overall gene expression. Also, whereas longer polypeptides of random sequences can easily aggregate and become toxic to cells, longer RNAs of random sequences remain soluble due to the charged phosphate backbone. This may be one reason most proteins are shorter than 2,000 amino acids, whereas many lncRNAs and mRNA precursors comprise 10,000 nt. A longer architectural molecule enables multivalent interaction with a massive number of nucleic acid-binding proteins, which facilitates the regulation of gene expression. Whereas body proteins rapidly exchange in and out of the body, arcRNA can stay inside the body (Mao et al., 2011), which is presumably due to the high molecular weight of arcRNA and the great number of interactions it makes. Such soluble, highly multivalent, and static features may make arcRNAs competent architectural molecules. 
RNA rarely forms insoluble aggregates, and the reading frames of lncRNAs do not need to remain translatable; therefore, the sequence of an architectural lncRNA is likely to be constrained only by the requirement to form the correct RNA–protein interactions, and an arcRNA has the flexibility to gain and change sequences. Such advantages would allow lncRNAs to increase the diversity of their binding partners and the combinations of proteins that are incorporated into nuclear bodies, enabling the formation of a wide variety of nuclear bodies for faster adaptation to environmental changes.
Formation of known arcRNA-dependent nuclear bodies is initiated by expression of arcRNAs induced during specific stresses and developmental stages. Considering that there are over 20,000 lncRNAs with unknown functions (Hon et al., 2017), it is likely that additional arcRNA-dependent nuclear bodies remain to be identified in various cellular contexts. Here, we introduce two recently reported methods to discover novel nuclear bodies with arcRNA scaffolds: 1) RNase-sensitivity screening of nuclear bodies (Mannen et al., 2016), and 2) transcriptome-wide screening of semi-extractable RNA (Chujo et al., 2017).
The structures of known arcRNA-constructed nuclear bodies can be disrupted by RNase treatment of permeabilized cells. Thus, in combination with a Venus-tagged cDNA expression library, one can search for proteins that form foci that are disrupted upon RNase treatment. In the reported RNase-sensitivity screen, a Venus-tagged cDNA expression library of 10,432 proteins was first used to select 463 proteins that formed foci within the nucleus (Hirose and Goshima, 2015). Then, the cells were permeabilized with a weak detergent and treated with RNases, which identified 32 proteins in HeLa cells that form RNase-sensitive foci in the nucleus. Many of the identified proteins were known components of various nuclear bodies. However, six proteins were new components of the SNB, one of which, HNRNPD, was required for SNB formation. This study also showed that both the RNA-binding domain and the PLD of HNRNPD are required for SNB formation; the PLD of HNRNPD enabled HNRNPD–HNRNPD and HNRNPD–Sam68 interactions. These results collectively demonstrate that SNB formation requires (a) putative arcRNA(s), RNA–protein interactions, and protein–protein interactions. The molecular entity of the SNB arcRNA remains to be identified.
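The narrowing of candidates in this screen can be summarized as a simple funnel. The sketch below only recomputes proportions from the three counts reported above; the stage labels and the helper name `funnel_percentages` are hypothetical and are not part of the published workflow.

```python
# Counts reported in the text for the RNase-sensitivity screen.
stages = [
    ("Venus-tagged cDNA library", 10_432),
    ("Proteins forming nuclear foci", 463),
    ("Proteins forming RNase-sensitive foci", 32),
]


def funnel_percentages(stages):
    """Return (label, count, percent of previous stage, percent of library)."""
    library_size = stages[0][1]
    rows = []
    previous = library_size
    for label, count in stages:
        rows.append((label, count,
                     100.0 * count / previous,
                     100.0 * count / library_size))
        previous = count
    return rows


for label, count, pct_prev, pct_lib in funnel_percentages(stages):
    print(f"{label}: {count} "
          f"({pct_prev:.1f}% of previous stage, {pct_lib:.1f}% of library)")
```

Read this way, only a small fraction of the library forms nuclear foci at all, and only a minority of those foci depend on RNA, which is the population in which arcRNA-constructed bodies are sought.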
Transcriptome-wide screening of semi-extractable RNAs looks for RNAs that are unusually difficult to extract from cells. This method originated from the serendipitous finding that cellular NEAT1_2 is extremely difficult to extract using conventional RNA extraction methods. Complete extraction of cellular NEAT1_2 requires either needle shearing or heating of the cell lysate in an RNA extraction reagent such as TRIzol, a property that was named semi-extractability. One feature of phase-separated liquid droplets is that they can be broken down by shearing forces. Thus, the requirement for needle shearing to extract NEAT1_2 suggests that, after cells are lysed with TRIzol, paraspeckles might somehow remain as phase-separated droplets unless they are sheared with a needle or heated. Semi-extractability of NEAT1_2 was largely dependent on the FUS protein, especially its PLD, which also indicates that LLPS of paraspeckles is the cause of semi-extractability. As LLPS and PLDs are often linked to nuclear body formation, it was hypothesized that unidentified semi-extractable RNAs might also form nuclear bodies. Transcriptome-wide screening in HeLa cells identified 45 semi-extractable RNAs, of which the 10 most abundant all showed predominant localization in nuclear body-like foci. Of these, three RNAs (CCAT1 lncRNA, LINC00473 lncRNA, and ADARB2 pre-mRNA) were confirmed to form foci not only at their transcription sites but also away from them, strongly suggesting the formation of novel nuclear bodies. To firmly establish these RNAs as arcRNAs, body marker proteins need to be identified, and disintegration of the body marker foci upon RNA depletion needs to be confirmed. As this semi-extractable RNA screening can be performed relatively easily, the method is useful for finding novel arcRNAs in various cellular contexts. Taken together, the two approaches above suggest the presence of unidentified arcRNAs in the human transcriptome.
Identification and characterization of additional arcRNAs will reveal what members of this class of RNA have in common in terms of sequence elements and modes of action.
Recent studies have greatly advanced our mechanistic understanding of the formation and function of arcRNA-constructed nuclear bodies. However, many important questions remain to be answered, including the following:
What are the fundamental rules of RNA sequences and/or RNA structures that enable LLPS of the nuclear bodies? Are they clusters of binding sites for PLD-containing RNA-binding proteins?
There are more than 20,000 human lncRNAs with unknown functions. Are there novel, uncharacterized arcRNAs that function during specific cellular stresses and developmental stages?
There is a large gap between the molecular functions of arcRNA-constructed nuclear bodies (e.g., protein sequestration) and the physiological functions of the bodies (e.g., organ development). How does the molecular role of the nuclear bodies link to the physiological functions?
Table 1. The six known nuclear bodies built on arcRNAs.
Nuclear body | arcRNA | Inducing stress or signal | Repetitive sequences in arcRNA | Body proteins | Molecular, cellular, and physiological functions |
---|---|---|---|---|---|
Paraspeckle | NEAT1 | Viral infection, hypoxia, MG132, corpus luteum formation, mammary gland development | Fragments of transposable elements | > 60 proteins including PSPC1^a, SFPQ^a, NONO, RBM14^a, HNRNPK, FUS^a, DAZAP1^a, HNRNPH3^a, HNRNPA1^a, HNRNPR, HNRNPUL1^a, TDP-43^a, BRG1^b, BRM^b, BAF155^b (Yamazaki and Hirose, 2015) | Sequestration of RNA-binding proteins and transcription factors. Retention of mouse CTN RNA. |
Amyloid body | IGS | Acidosis, heat shock, transcriptional stress | Ribosomal DNA repeats | VHL, DNMT1, POLD1, HSP70, MDM2, RPA40, RPA16, NOL1, NOM1, NOP52, PES1, RRP1B, SENP3 | Immobilization of body proteins, activation of HIF. |
Nuclear stress body | Satellite III | Heat shock, cadmium | Satellite III repeats | SRSF1, SAFB, TDP-43^a, HSF1, NFAT5^a, BRG1^b | Sequestration of RNA-binding proteins and transcription factors |
Histone locus body | Histone mRNA precursor | Interphase, especially G1 and S phases | Unknown | FLASH, LSM10, LSM11, SLBP, SYMPLEKIN | Processing of histone pre-mRNAs |
Omega speckle | Hsr omega | Heat shock, amides, ecdysone hormone | Tandem repeats of 280 nt for 10 kb | Nona^a, Sex-lethal^a, sans fille, PEP^a, Hrb87F^a, Hrp40^a, Hrb57A^a, ISWI^b | Normal development (Nullisomy animals are embryonic lethal or extremely weak) |
Mei2 dot | meiRNA | Entry into meiosis | U(U/C)AAAC | Mei2, Mmi1 | Sequestration of RNA-binding proteins |
^a Proteins containing prion-like domains.
^b Chromatin remodeling complex protein.