Mol. Cells 2019; 42(7): 512-522
Published online July 26, 2019
https://doi.org/10.14348/molcells.2019.0137
© The Korean Society for Molecular and Cellular Biology
Correspondence to: ijung@kaist.ac.kr
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
Chromosomes located in the nucleus form discrete units of genetic material composed of DNA and protein complexes. The genetic information is encoded in linear DNA sequences, but its interpretation requires an understanding of the three-dimensional (3D) structure of chromosomes, in which highly condensed chromatin packing within the nucleus can juxtapose distant DNA sequences to precisely control gene expression. Recent technological innovations in exploring higher-order chromatin structure have uncovered organizational principles of the 3D genome and its various biological implications. Very recently, it has been reported that large-scale genomic variations may disrupt higher-order chromatin organization and, as a consequence, contribute substantially to disease-specific gene regulation in a range of human diseases. Here, we review recent developments in studying the effect of structural variations on gene regulation, and the detection and interpretation of structural variations in the context of 3D chromatin structure.
Keywords: 3D chromatin structure, gene regulation, Hi-C, structural variation, topologically associating domain
The mammalian genome must be tightly packaged to fit its large size into the limited space of the nucleus. This raises long-standing questions about how the genome is folded into a three-dimensional (3D) structure within the nucleus and how this folding affects genome function. In the last decade, the development of chromosome conformation capture (3C) technology and its variants has revolutionized the analysis of 3D genome organization, providing higher resolution than imaging-based methods, and has uncovered basic principles underlying chromatin folding (Dekker et al., 2002; Dostie et al., 2006; Lieberman-Aiden et al., 2009; Rao et al., 2014; Simonis et al., 2006; Zhao et al., 2006). These ‘C’ technologies first fragment chromatin by restriction enzyme digestion and then ligate the fragments intramolecularly, converting spatially proximal DNA fragments into unique DNA ligation products. As a result, ligation frequency can serve as an indicator of the spatial proximity of two genomic loci, regardless of their linear genomic distance (Dekker et al., 2002). The 3C method detects ligation products one at a time by polymerase chain reaction (PCR) amplification with locus-specific primers, thereby measuring one-to-one interactions. The development of various 3C-based methods combined with genome-wide, high-throughput approaches has enabled systematic detection of chromatin interactions at increasing scales and resolutions. Both 4C (circular chromosome conformation capture/chromosome conformation capture-on-chip) (Simonis et al., 2006; Splinter et al., 2012; van de Werken et al., 2012; Zhao et al., 2006) and 5C (3C-carbon copy) (Dostie et al., 2006) methods begin with 3C templates.
4C methods capture one-to-all chromatin interactions by measuring the ligation frequencies between one locus (a bait region) and all other genomic loci, using inverse PCR with primers for the bait sequence to amplify its ligation partners. In contrast, 5C uses multiplexed ligation-mediated amplification to quantify all potential interactions among the targeted genomic loci in the 3C library, thereby detecting many-to-many interactions. Hi-C (high-throughput chromosome conformation capture), which marks ligation junctions with biotin and couples the assay with high-throughput sequencing, allowed all-to-all chromatin contacts to be captured in a genome-wide, unbiased manner (Lieberman-Aiden et al., 2009; Rao et al., 2014). The unprecedented resolution and comprehensive view of 3D genome maps enabled by Hi-C have revolutionized the characterization of higher-order chromatin structure (Dixon et al., 2012; Guo et al., 2015; Jin et al., 2013; Lieberman-Aiden et al., 2009; Rao et al., 2014; 2017; Schmitt et al., 2016; Vian et al., 2018).
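As a rough illustration of how the one-to-one, one-to-all, many-to-many, and all-to-all designs relate, the following Python sketch bins hypothetical ligation read pairs from a single chromosome into a Hi-C-style contact matrix. It then extracts 3C-, 4C-, and 5C-like views from that matrix. The function names are illustrative assumptions. Real 4C and 5C data come from separate experiments, and real Hi-C pipelines also filter, deduplicate, and normalize reads, which this toy omits.

```python
def build_contact_matrix(read_pairs, chrom_length, bin_size):
    """Bin (pos1, pos2) ligation read pairs from one chromosome into a
    symmetric all-to-all contact matrix (Hi-C-like)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        if not (0 <= pos1 < chrom_length and 0 <= pos2 < chrom_length):
            continue  # skip pairs that fall outside the chromosome
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror so the matrix stays symmetric
    return matrix


def one_to_one(matrix, i, j):
    """3C-like view: contact count between two chosen bins."""
    return matrix[i][j]


def one_to_all(matrix, bait):
    """4C-like 'virtual viewpoint': contacts of one bait bin with all bins."""
    return list(matrix[bait])


def many_to_many(matrix, targets):
    """5C-like view: sub-matrix restricted to a chosen set of target bins."""
    return [[matrix[i][j] for j in targets] for i in targets]
```

Reading a single row with `one_to_all` corresponds to what is often displayed as a "virtual 4C" profile from Hi-C data. The full matrix keeps every pairwise view at once.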
These ‘C’-based technological innovations have not only uncovered the principles of higher-order chromatin structure, but have also revealed that the 3D genome is tightly coupled with other nuclear processes, including cellular differentiation and reprogramming (Dixon et al., 2015; Krijger et al., 2016; Siersbaek et al., 2017), DNA replication (Pope et al., 2014), and X chromosome inactivation (Crane et al., 2015; Engreitz et al., 2013; Giorgetti et al., 2016). In particular, the interplay between 3D chromatin structure and transcription, orchestrated by multiple chromatin regulators, specific transcription factors, and long non-coding RNAs, plays a critical role in determining cell fate (Chen et al., 2016; Chong et al., 2018; de Wit et al., 2013; Stadhouders et al., 2018; 2019). In this respect, recent studies have highlighted disorganization of 3D chromatin structure as a cause of aberrant gene regulation in various human diseases (Flavahan et al., 2016; Franke et al., 2016; Hnisz et al., 2016; Lupianez et al., 2015). In addition to epigenetic alterations, the disorganized 3D genome often involves large-scale genomic variations, including deletions, inversions, duplications, and translocations (Lupianez et al., 2015). Although only a handful of studies have systematically investigated the relationship between large-scale genomic variations and 3D chromatin structure (Chakraborty and Ay, 2017; Dixon et al., 2018; Weischenfeldt et al., 2017), the field is growing rapidly because of its significant biological implications. Here, we introduce recent studies on the basic components of the 3D genome that mediate the effect of large-scale genomic variations on gene regulation, and the ways in which these variations can be precisely detected and interpreted in the context of 3D chromatin structure.
Extensive analyses of genome-wide chromatin contact maps have revealed that the genome is hierarchically organized in the nucleus into multiple layers. These range from chromatin loops that connect distant DNA elements such as enhancers and promoters, to larger chromosomal domains known as topologically associating domains (TADs), to compartmentalized structures within the chromosome. To understand the effect of large-scale genomic variations on 3D chromatin structure, we first briefly discuss recent discoveries regarding the basic principles of higher-order chromatin organization.
The first Hi-C study generated genome-wide chromatin contact maps at megabase (Mb) resolution and revealed a plaid pattern in the 3D chromatin structure of the interphase nucleus in the mammalian genome (Fig. 1A). This pattern indicates that the mammalian genome is spatially compartmentalized into two parts, labeled compartments A and B (Lieberman-Aiden et al., 2009). Compartment A and B regions span multiple megabases, and regions in the same compartment tend to be spatially closer to each other than regions in different compartments (Lieberman-Aiden et al., 2009). This spatial segregation of chromatin is strongly associated with various nuclear structures. For example, compartment A is often found in the nuclear interior and corresponds to euchromatin regions, whereas compartment B is highly concordant with nuclear lamina-associated domains and heterochromatin regions (Pombo and Dillon, 2015; van Steensel and Belmont, 2017). The spatial segregation of chromatin is also dynamically reorganized during cellular differentiation and differs between cell types, concordant with cell type-specific gene expression patterns (Dixon et al., 2015; Schmitt et al., 2016), suggesting a close relationship between 3D chromatin structure and gene regulation.
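The plaid pattern can be turned into A/B labels using the approach introduced in the first Hi-C study. Contacts are first normalized by their expected value at each genomic distance. The rows of the resulting matrix are then correlated, and each bin is labeled by the sign of the leading eigenvector (Lieberman-Aiden et al., 2009). The pure-Python sketch below illustrates that idea on toy data. The function names, the power-iteration solver, and the use of a gene-density-like track to orient the otherwise arbitrary eigenvector sign are our assumptions, not a reference implementation.

```python
import math


def observed_over_expected(matrix):
    """Divide each entry by the mean contact at the same genomic distance
    (the same diagonal), removing the distance-decay trend."""
    n = len(matrix)
    expected = [sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def compartment_labels(oe_matrix, orientation_track, iterations=200):
    """Label bins 'A' or 'B' from the sign of the leading eigenvector of the
    bin-by-bin Pearson correlation matrix of an observed/expected matrix."""
    n = len(oe_matrix)
    corr = [[_pearson(oe_matrix[i], oe_matrix[j]) for j in range(n)] for i in range(n)]
    # Power iteration: a correlation matrix is positive semidefinite, so this
    # converges to the eigenvector with the largest eigenvalue, provided the
    # start vector is not orthogonal to it.
    v = [1.0 + 0.01 * i for i in range(n)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # The eigenvector sign is arbitrary: orient it to correlate positively
    # with an external track (e.g. gene density), then call positive bins A.
    if _pearson(v, orientation_track) < 0:
        v = [-x for x in v]
    return ["A" if x > 0 else "B" for x in v]
```

For real data, one would apply `observed_over_expected` to a raw matrix first, typically one chromosome at a time, and pass the result to `compartment_labels`.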
High-resolution chromatin contact maps from recent Hi-C studies have revealed sub-Mb-scale domains. Fragments within the same domain have more chromatin contacts with each other than with fragments in different domains. These distinct units are now referred to as TADs (Fig. 1B) (Dixon et al., 2012; Nora et al., 2012). Although the mechanisms underlying TAD formation remain to be clarified, the most promising model is loop extrusion. In this model, CTCF proteins bound at TAD boundaries dimerize, and cohesin stabilizes the resulting loops, thereby demarcating TADs (Fig. 1C) (Rao et al., 2014; 2017; Sanborn et al., 2015; Schwarzer et al., 2017; Vian et al., 2018). Several lines of evidence strongly support the view that TADs are basic units of 3D chromatin structure. First, TAD boundaries are well conserved during cellular differentiation (Dixon et al., 2015) (Fig. 1D) and even between species (Dixon et al., 2012) (Fig. 1E). Second, although chromatin interactions are dynamically reorganized during cellular differentiation, the changes occur in a TAD-wise manner: contacts tend to increase or decrease together across fragments in the same TAD (Fig. 1F). Lastly, long-range enhancer-promoter interactions are restricted by TAD boundaries (Fig. 1G). Mammalian genomes, especially the human genome, can be uniquely characterized by the enrichment of
Because proper chromatin folding is crucial for gene regulation, disruption of TAD boundaries or disorganization of inter-TAD structures can cause aberrant gene expression by exposing genes to inappropriate regulatory elements. Mutations in genes encoding structural proteins such as CTCF and cohesin have frequently been linked to human diseases and developmental abnormalities, which supports a close link between chromatin folding and disease development (Hnisz et al., 2016; Katainen et al., 2015). Two mechanisms have recently been proposed for how TAD alteration contributes to abnormal gene expression (Valton and Dekker, 2016). In the first, domains are locally disrupted by deletion or dysregulation of TAD boundaries, leading to fusion of two adjacent TADs (Flavahan et al., 2016; Hnisz et al., 2016). In the second, existing TADs are broken and the resulting TAD fragments form new combinations, creating new TADs without directly affecting TAD boundaries (Groschel et al., 2014; Northcott et al., 2014). Large-scale genomic variations known as structural variations (SVs), including duplications, deletions, inversions, and translocations, are often involved in TAD boundary disruption or disorganization of inter-TAD structures.
Genomic deletions spanning TAD boundaries, or inhibition of structural protein binding at boundary sites, can result in fusion of neighboring TADs, a phenomenon known as TAD fusion (Fig. 2A). Duplications spanning TAD boundaries, in turn, can cause the formation of neo-TADs (Franke et al., 2016; Hnisz et al., 2016; Lupianez et al., 2015). TAD fusion changes the spatial proximity between enhancers and promoters that were originally located in two different TADs, which can lead to abnormal gene activation. For example, recurrent deletions of boundary sequences that up-regulate prominent proto-oncogenes via TAD fusion were identified in T cell acute lymphoblastic leukemia (T-ALL) patient samples (Hnisz et al., 2016). This is evidence for oncogene activation resulting from
SVs between TADs (deletions, inversions, translocations, and duplications spanning multiple TADs) can rearrange sub-regions of TADs to create new TADs, a phenomenon known as neo-TAD formation or TAD shuffling. For example, a balanced translocation or inversion swaps sub-regions between two TADs. Similarly, deletion of a region spanning multiple TADs may generate shuffled TADs by joining sub-regions of two distant TADs (Fig. 2B). Further in-depth investigation is required to generalize the effect of TAD alteration on enhancer-promoter interactions. Nevertheless, several studies have clearly shown that TAD shuffling caused by inter-TAD SVs can be critical in oncogenesis. This has been observed in acute myeloid leukemia (AML) patients in whom relocation of an enhancer by an inversion on chromosome 3 activates
Aberrant gene expression resulting from SV-driven TAD alteration emphasizes the importance of accurately detecting and interpreting large-scale genomic rearrangements in various human diseases. Current whole-genome sequencing (WGS)-based approaches have made significant contributions. However, they are limited in precisely detecting and interpreting SVs because of sample purity, repetitive sequences, transposons, complex genomic rearrangements, and limited sequencing depth and read length (Alkan et al., 2011; Campbell et al., 2008; Rausch et al., 2012; Tattini et al., 2015; Zhang et al., 2018). These limitations lead to large discrepancies among the SVs detected by different calling algorithms (Kosugi et al., 2019). In this respect, information on the 3D architecture of the genome offers a new strategy for precisely detecting and interpreting SVs. Indeed, recent studies have demonstrated the effectiveness of 3D chromatin contact maps in characterizing large-scale SVs, particularly in cancer cell lines and primary tumor samples (Dixon et al., 2018; Harewood et al., 2017; Rickman et al., 2012).
In principle, WGS methods detect an SV from the reads spanning its breakpoint. However, the number of reads supporting the SV is limited by the read coverage at the breakpoint and by the fraction of clones carrying the SV. Therefore, the detection efficiency of SVs is strongly affected, and often limited, by sequencing depth and allele frequency (Fig. 3A, top right). In chromatin contact maps, by contrast, reads spanning the breakpoint are not the only evidence for the SV. Reads from ligated DNA fragments located near the breakpoint also support its presence (Fig. 3A, bottom right). Because ligation products arise from random collisions in the crowded nucleus, their frequency decays steeply with increasing genomic distance, approximately following a power law (Fig. 3A, bottom right) (Lieberman-Aiden et al., 2009). An SV shortens the genomic distance between the fragments flanking its breakpoints (denoted ‘d2’ in Fig. 3A, left and bottom right). These fragments are therefore ligated far more often than in the absence of the SV (denoted ‘d1’ in Fig. 3A, left and bottom right). Unlike chromatin contact maps, WGS read coverage does not change with the genomic distance between A and B (Fig. 3A, top right). Consequently, chromatin contact maps are very sensitive for detecting such large-scale SVs regardless of the original linear genomic distance. This is a great advantage for detecting SVs in low-purity samples such as human tumor tissues.
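The distance argument can be made concrete with a toy calculation. The sketch assumes that contact frequency falls off as a power law with an exponent of about 1, roughly what the first Hi-C study reported. It sums the expected contacts between a few bins on each side of a junction, before and after an SV brings the two sides together. The function names, bin size, and flank width are illustrative assumptions.

```python
def contact_frequency(distance_bp, bin_size=10_000, alpha=1.0):
    """Toy expected contact frequency, assumed to decay as a power law
    P(s) ~ s^-alpha; distances shorter than one bin are floored at one bin."""
    bins = max(distance_bp / bin_size, 1.0)
    return bins ** (-alpha)


def hic_junction_signal(distance_bp, flank_bins=5, bin_size=10_000, alpha=1.0):
    """Sum of expected contacts between flank_bins bins on each side of a
    junction whose two sides lie distance_bp apart on the chromosome."""
    return sum(contact_frequency(distance_bp + (i + j) * bin_size, bin_size, alpha)
               for i in range(flank_bins) for j in range(flank_bins))


def wgs_split_reads(coverage, purity):
    """Toy expected number of WGS reads spanning a breakpoint: it depends on
    coverage and the fraction of cells carrying the SV, not on distance."""
    return coverage * purity
```

With these defaults, bringing loci that were 50 Mb apart (`hic_junction_signal(50_000_000)`) into direct contact (`hic_junction_signal(0)`) raises the expected junction signal by more than three orders of magnitude. The toy WGS count, by contrast, does not change with distance. Multiplying both by tumor purity lowers the absolute counts but leaves the Hi-C fold change intact.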
Beyond detecting SVs, it is also crucial to correctly determine their type (inversion, duplication, deletion, translocation, or more complex forms) to understand their effect on gene regulation. WGS determines the type of an SV from the mapped orientation of reads at its breakpoints. Therefore, if the reads cannot cover the whole rearranged region, or if the SV is complex, the exact type of SV cannot be clearly distinguished from read orientation alone (Alkan et al., 2011; Soylev et al., 2019). For example, balanced inversions and duplications containing inverted segments are difficult to classify solely by read mapping orientation (Soylev et al., 2019). In this regard, chromatin contact maps are highly useful, because different types of SVs produce distinct chromatin contact signatures (gradient patterns). In these patterns, the number of chromatin interactions is highest at the breakpoint and gradually decreases with increasing distance from it (Figs. 3B–3E). The direction of the gradient reflects the orientation of the rearranged segments and thereby indicates the type of the corresponding SV. For example, a deletion produces new chromatin contacts between regions upstream of the start breakpoint and regions downstream of the end breakpoint, generating new signal in the upper part of the breakpoint coordinates (Fig. 3B). An inversion produces gradient patterns on both the left and right sides of the breakpoint coordinates, known as the ‘butterfly’ shape (Fig. 3C) (Harewood et al., 2017). Reciprocal and nonreciprocal translocations can be distinguished by gradient patterns on both sides, similar to those of an inversion, or on only one side, respectively (Fig. 3D). The gradient patterns of duplications can be classified into three sub-types (Fig. 3E).
Tandem duplications without an inverted fragment produce a gradient pattern at the bottom of the breakpoint coordinates. In contrast, duplications with an inverted fragment produce a gradient pattern on either the left or the right side of the breakpoint coordinates, depending on the position of the inverted duplicated DNA fragment.
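These signatures follow from a simple model: contacts decay with distance along the rearranged (derivative) chromosome, but reads are mapped back to reference coordinates. The sketch below simulates a deletion, an inversion, and a tandem duplication on a toy chromosome of bins. It then classifies each by which of the four blocks around the breakpoint pair gains contacts relative to an unrearranged map. Quadrants are named by genomic side rather than by position in a plotted figure. The power-law model, window size `w`, fold threshold, and all names are assumptions for illustration, and this is not one of the published SV callers.

```python
def derivative_bins(n_bins, sv_type, start, end):
    """Reference bins, in order, along a toy derivative chromosome in which
    reference bins [start, end) are deleted, inverted, or tandemly duplicated."""
    ref = list(range(n_bins))
    segment = ref[start:end]
    if sv_type == "none":
        return ref
    if sv_type == "deletion":
        return ref[:start] + ref[end:]
    if sv_type == "inversion":
        return ref[:start] + segment[::-1] + ref[end:]
    if sv_type == "tandem_duplication":
        return ref[:start] + segment + segment + ref[end:]
    raise ValueError(f"unknown SV type: {sv_type}")


def simulate_map(n_bins, derivative, alpha=1.0):
    """Contacts decay as d^-alpha along the derivative chromosome but are
    recorded in reference coordinates, as when reads are mapped to the reference."""
    m = [[0.0] * n_bins for _ in range(n_bins)]
    for p, a in enumerate(derivative):
        for q, b in enumerate(derivative):
            if p != q:
                m[a][b] += abs(p - q) ** (-alpha)
    return m


def quadrant_folds(obs, ref, start, end, w=3):
    """Observed/reference contact ratio in the four w x w blocks around the
    breakpoint pair. Keys are (side of start, side of end): '-' upstream, '+' downstream."""
    sides = {"-": lambda b: range(b - w, b), "+": lambda b: range(b, b + w)}
    folds = {}
    for s1, rows in sides.items():
        for s2, cols in sides.items():
            cells = [(i, j) for i in rows(start) for j in cols(end)]
            o = sum(obs[i][j] for i, j in cells)
            e = sum(ref[i][j] for i, j in cells)
            folds[(s1, s2)] = o / e if e > 0 else 0.0
    return folds


SIGNATURES = {
    frozenset({("-", "+")}): "deletion",
    frozenset({("-", "-"), ("+", "+")}): "inversion",
    frozenset({("+", "-")}): "tandem_duplication",
}


def classify(obs, ref, start, end, min_fold=3.0):
    """Match the set of enriched quadrants against the toy signatures."""
    folds = quadrant_folds(obs, ref, start, end)
    enriched = frozenset(k for k, v in folds.items() if v >= min_fold)
    return SIGNATURES.get(enriched, "unclassified")
```

In this toy, a deletion enriches only the block joining upstream-of-start to downstream-of-end. An inversion enriches two opposite blocks (a butterfly-like pair), and a tandem duplication enriches the block inside the duplicated segment. This appears consistent with the qualitative descriptions above, although real maps add noise, copy-number effects, and local TAD structure.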
Because chromatin contact maps are highly sensitive to large-scale SVs and can resolve SV orientation, several computational algorithms have been developed to identify SVs from them. Combined approaches using multiple platforms, including Hi-C and WGS, have also been proposed to improve SV detection power (Chaisson et al., 2019; Dixon et al., 2018; Harewood et al., 2017; Jacobson et al., 2019; Rickman et al., 2012).
Recent cancer genomic studies have shown that multiple large-scale SVs can be superimposed and affect an entire chromosome, contributing to oncogenesis or cancer progression (Notta et al., 2016). However, SV detection with WGS can yield many false positives due to inaccurate mapping of short sequencing reads, and thus requires elaborate and laborious manual curation. Consequently, complex forms of SVs are challenging to identify using WGS. Here, we address this issue by describing how chromatin contact information can be used to identify and interpret complex SVs.
First, chromatin contact maps can provide direct evidence for the existence of a chromosome-wide SV, which can greatly reduce the number of false positive calls. For example, paired-end reads spanning regions A and B in WGS can arise from three possible scenarios (Fig. 4A, left). They can result from ring chromosome formation, which is recognized as an extremely large-scale duplication event. Alternatively, they can result from an extremely large-scale deletion. Lastly, they could reflect a simple mapping error caused by sequence similarity between A′ and B, resulting in a false positive SV call. The orientation of mapped reads is not sufficient to distinguish these three cases (Fig. 4A, middle), so additional information such as copy number profiles is required. Chromatin contact maps, however, show a unique gradient pattern for each case (Fig. 4A, right). A gradient pattern with increased contact signal at the bottom of the breakpoint indicates an actual rearrangement of the chromosome structure. A gradient pattern at the top of the breakpoint instead confirms an extremely large-scale deletion. In the case of a mapping error, no gradient pattern would be observed in the chromatin contact map.
Second, chromatin contact maps can provide linkage information between chromosomal segments shattered by a complex genomic rearrangement. This allows the aberrant genome to be reconstructed and the effect of complex SVs to be understood. For example, chromothripsis is one of the most dramatic types of chromosome-wide genomic rearrangement (Stephens et al., 2011). In chromothripsis, a chromosome is shattered into multiple pieces in a single catastrophic event, and a derivative chromosome then forms through aberrant repair of the broken fragments, resulting in massive rearrangement and loss of fragments (Fig. 4B, left). Because of the complexity of this rearrangement, a one-dimensional genomic alteration diagram generated from WGS data can mislead the interpretation of the rearranged chromatin structure (Fig. 4B, middle). The chromatin contact map, however, provides linkage information between aberrantly repaired adjacent fragments. Because chromothripsis arises from a single event, each fragment shares no more than two breakpoints (one with each neighbor), a feature clearly seen in chromatin contact maps (Fig. 4B, right). The derivative chromosome can therefore be reconstructed from the linkage information in chromatin contact maps (Burton et al., 2013), which in turn allows accurate prediction of the effect of chromothripsis on genome function.
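Linkage-based reconstruction can be illustrated with a greedy toy. Contacts between fragment ends are scored, and the strongest end pairs are joined, with each end used at most once and cycles forbidden, so that every fragment has at most two neighbors. This is a hypothetical sketch, not the scaffolding algorithm of Burton et al. (2013). The fragment end labels, the scores, and the greedy rule are all assumptions.

```python
def greedy_chain(n_fragments, end_scores, min_score=0.0):
    """Order shattered fragments into chains from end-to-end contact scores.

    end_scores maps ((frag_a, end_a), (frag_b, end_b)) -> score, where an end
    is "L" or "R" in reference orientation. Returns chains as lists of
    (fragment, orientation), "+" meaning read L->R as in the reference.
    """
    parent = list(range(n_fragments))  # union-find over fragments

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    partner = {}  # fragment end -> the end it is joined to
    for (end_a, end_b), score in sorted(end_scores.items(), key=lambda kv: -kv[1]):
        if score <= min_score or end_a in partner or end_b in partner:
            continue  # weak link, or one of the ends is already used
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a == root_b:
            continue  # same fragment, or the join would close a cycle
        parent[root_a] = root_b
        partner[end_a], partner[end_b] = end_b, end_a

    chains, seen = [], set()
    for f in range(n_fragments):
        if f in seen:
            continue
        if (f, "L") not in partner:
            orient = "+"  # free left end: start here, read forward
        elif (f, "R") not in partner:
            orient = "-"  # free right end: start here, read reversed
        else:
            continue  # interior fragment, reached later from a chain end
        chain, cur = [], f
        while True:
            seen.add(cur)
            chain.append((cur, orient))
            exit_end = (cur, "R") if orient == "+" else (cur, "L")
            nxt = partner.get(exit_end)
            if nxt is None:
                break
            cur = nxt[0]
            orient = "+" if nxt[1] == "L" else "-"  # entering via L reads forward
        chains.append(chain)
    return chains
```

The no-cycle rule and the one-join-per-end rule encode the "at most two breakpoints per fragment" property. Fragments with no strong partner end up as singleton chains, which in a real analysis might correspond to lost or unlinked pieces.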
Lastly, chromatin contact maps can also be used to detect complex SVs intuitively. For example, chromoplexy is another dramatic form of chromosome-wide rearrangement, in which multiple chromosomes are cleaved and joined together in a single catastrophic event (Fig. 4C, left) (Baca et al., 2013). The example shows chromoplexy among three chromosomes, with points A and B marking the breakpoints. Chromoplexy is well characterized by distinct ‘closed-chain’ patterns in circos plots. However, it is hard to recognize when unrelated translocations are located nearby (black lines in Fig. 4C, middle). In chromatin contact maps, by contrast, the breakpoints of translocations involved in chromoplexy (blue and red dotted lines in Fig. 4C, right) align along a line because they share breakpoint coordinates (highlighted in yellow in Fig. 4C, right). Unrelated translocations do not fall on these alignment lines (black arrows in Fig. 4C, right). This feature can be used to understand complex chromosomal rearrangements more intuitively.
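The shared-breakpoint idea can also be checked computationally. The hypothetical sketch below links translocation calls whose breakends fall within a tolerance of each other on the same chromosome. It reports groups of three or more calls in which every breakend is shared with exactly one other call, i.e. a closed chain. The tolerance, the data layout, and the function name are assumptions, and this is not a published chromoplexy caller.

```python
def chromoplexy_candidates(translocations, tol=1_000):
    """Group translocations that share breakpoints and keep closed chains.

    translocations: list of ((chrom1, pos1), (chrom2, pos2)) breakend pairs.
    Two breakends 'share' a breakpoint if they lie on the same chromosome
    within tol bp. A candidate closed chain has >= 3 translocations in which
    every breakend is shared with exactly one breakend of another call.
    """
    n = len(translocations)
    parent = list(range(n))  # union-find over translocation calls

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ends = [(i, side) for i in range(n) for side in (0, 1)]
    matches = {end: 0 for end in ends}
    for x in range(len(ends)):
        for y in range(x + 1, len(ends)):
            (i, si), (j, sj) = ends[x], ends[y]
            if i == j:
                continue
            (ci, pi), (cj, pj) = translocations[i][si], translocations[j][sj]
            if ci == cj and abs(pi - pj) <= tol:
                matches[ends[x]] += 1
                matches[ends[y]] += 1
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values()
            if len(g) >= 3 and all(matches[(i, s)] == 1 for i in g for s in (0, 1))]
```

For example, three calls joining chr1-chr2, chr2-chr3, and chr3-chr1 at shared positions form one candidate group. A fourth call on chr1 several hundred kilobases away stays unlinked, mirroring the unrelated translocations that do not fall on the alignment lines.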
SVs are important sources of genomic diversity between individuals (Chaisson et al., 2019; Levy-Sakin et al., 2019) and are involved in disease-specific gene regulation. Thus, precise detection and interpretation of large-scale genomic rearrangements are crucial, but conventional WGS-based methods are limited in achieving this. As an alternative or supplementary approach, recent studies strongly suggest that exploiting 3D genome structure is highly effective for detecting and interpreting large-scale SVs (Chaisson et al., 2019; Dixon et al., 2018; Harewood et al., 2017; Jacobson et al., 2019; Rickman et al., 2012). Nevertheless, several challenges remain. The first major drawback of SV identification based on chromatin contact maps is resolution: small SVs that do not substantially change chromatin interactions cannot be detected. Hi-C-based chromatin contact maps are currently very useful for detecting large-scale SVs but are limited for SVs below the 1 Mb scale. The gradient patterns generated by such SVs are weak relative to the already high contact frequencies between fragments at short genomic distances in the original map. Furthermore, the resolution of chromatin contact maps is generally tens of kilobases, which is too coarse to determine the exact breakpoints of SVs. New computational methods, such as deep learning algorithms, are needed to detect smaller SVs precisely and to compute exact breakpoints from Hi-C data alone. Second, the inability to predict the effect of SVs in the context of the 3D genome hinders interpretation of their functional consequences. Owing to limited knowledge of non-coding regulatory regions in the genome, it is difficult to predict the pathogenicity of each SV in the context of higher-order chromatin structure.
Although new bioinformatic approaches are being developed to address this challenge (Weischenfeldt et al., 2017), complex SVs make these effects even harder to predict. Thus, integrating SV-driven changes in 3D chromatin structure with the basic principles of gene regulation is essential for predicting the pathogenicity of each SV.
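The resolution limit can be illustrated with a back-of-the-envelope calculation. Assume that contact frequency decays roughly as a power law of genomic distance with exponent α. The contact between the two bins flanking a deletion then increases by about (SV size / bin size + 1)^α, so the signal shrinks toward a mere doubling as the SV approaches the bin size. The sketch below computes this toy fold change and the bin-level interval to which a breakpoint can be localized. α = 1 and the function names are assumptions.

```python
def deletion_junction_fold(sv_size_bp, bin_size_bp, alpha=1.0):
    """Toy fold increase in contacts between the two bins flanking a deletion.
    Before the deletion they are (sv_size/bin_size + 1) bins apart; afterwards
    they are adjacent (1 bin). Assumes power-law decay P(s) ~ s^-alpha."""
    bins_apart_before = sv_size_bp / bin_size_bp + 1
    return bins_apart_before ** alpha


def breakpoint_interval(position_bp, bin_size_bp):
    """In a binned map, a breakpoint can only be localized to its whole bin."""
    start = (position_bp // bin_size_bp) * bin_size_bp
    return start, start + bin_size_bp
```

For example, a 100 kb deletion viewed at 10 kb resolution yields only an 11-fold toy increase at the junction. A deletion the size of a single bin merely doubles it, and in real maps this signal must also compete with noise and local TAD structure. Likewise, at 40 kb resolution a breakpoint at position 1,234,567 can only be placed within the interval 1,200,000 to 1,240,000.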
In conclusion, there are indeed many hurdles in applying this new strategy to the detection and interpretation of SVs. Nevertheless, the development of new computational methods and integrative approaches will provide a very powerful tool for comprehensively understanding complex SV-driven genome rearrangements in both normal and disease contexts, beyond the one-dimensional genome.
This work was funded by the Ministry of Science, ICT, and Future Planning through the National Research Foundation in the Republic of Korea (2017R1C1B2008838) and by the Korean Ministry of Health and Welfare (HI17C0328).
Published online July 31, 2019 https://doi.org/10.14348/molcells.2019.0137
Kyukwang Kim¹,², Junghyun Eom¹,², and Inkyung Jung¹,*
¹Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea; ²These authors contributed equally to this work.
Correspondence to:ijung@kaist.ac.kr
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
Chromosomes located in the nucleus form discrete units of genetic material composed of DNA and protein complexes. The genetic information is encoded in linear DNA sequences, but its interpretation requires an understanding of three-dimensional (3D) structure of the chromosome, in which distant DNA sequences can be juxtaposed by highly condensed chromatin packing in the space of nucleus to precisely control gene expression. Recent technological innovations in exploring higher-order chromatin structure have uncovered organizational principles of the 3D genome and its various biological implications. Very recently, it has been reported that large-scale genomic variations may disrupt higher-order chromatin organization and as a consequence, greatly contribute to disease-specific gene regulation for a range of human diseases. Here, we review recent developments in studying the effect of structural variation in gene regulation, and the detection and the interpretation of structural variations in the context of 3D chromatin structure.
Keywords: 3D chromatin structure, gene regulation, Hi-C, structural variation, topologically associating domain
The mammalian genome needs to be tightly packaged into the nucleus in order to fit its relatively large size to the limited space inside nucleus. Thus, there exist long-lasting questions on how the genome is organized into a three-dimensional (3D) structure to be folded into the nucleus and its consequential effect on the genome function. In the last decade, the development of chromosome conformation capture (3C) technology and its variations have revolutionized the analysis of the 3D genome organization at high-resolution compared to imaging based methods and uncovered basic principles underlying chromatin folding (Dekker et al., 2002; Dostie et al., 2006; Lieberman-Aiden et al., 2009; Rao et al., 2014; Simonis et al., 2006; Zhao et al., 2006). These ‘C’ technologies first fragmentize chromatin by restriction enzyme digestion and ligate intra-molecules to convert spatially proximal DNA fragments into a unique DNA ligation product. As a result, the ligation frequency can be an indicator of the spatial distance between two genomic loci, regardless of their linear genomic distance (Dekker et al., 2002). The 3C method detects ligation products one at a time by polymerase chain reaction (PCR) amplification using locus-specific primers as a measurement of one-to-one interactions. Systematic detection of chromatin interactions at increasing scales and resolutions was enabled by developments of various 3C-based methods in conjunction with genome-wide high-throughput approaches. Both 4C (circular chromosome conformation capture/chromosome conformation capture-on-chip) (Simonis et al., 2006; Splinter et al., 2012; van de Werken et al., 2012; Zhao et al., 2006) and 5C (3C-carbon copy) (Dostie et al., 2006) methods begin with 3C templates. 
The 4C methods capture one-to-all chromatin interactions by detecting the ligation frequencies between one locus (a bait region) and all other genomic loci using inverse PCR that uses primers for the bait sequence to amplify its ligation partners. In contrast to 4C, 5C uses multiplexed ligation-mediated amplification to quantify all potential interactions between the targeted genomic loci of the 3C library to detect many-to-many interactions. The development of Hi-C (high-throughput chromosome conformation capture) technology, combined with high throughput sequencing technologies and biotin mark at ligation junctions, allowed the capture of all-to-all chromatin contacts in a genome-wide and unbiased manner (Lieberman-Aiden et al., 2009; Rao et al., 2014). The unprecedented resolution and comprehensive view of 3D genome maps enabled by Hi-C has revolutionized the characterization of higher-order chromatin structure (Dixon et al., 2012; Guo et al., 2015; Jin et al., 2013; Lieberman-Aiden et al., 2009; Rao et al., 2014; 2017; Schmitt et al., 2016; Vian et al., 2018).
These ‘C’-based technological innovations not only uncover the principles of higher-order chromatin structure, but also reveal that 3D genome is tightly coupled with other nuclear processes including cellular differentiation and reprogramming (Dixon et al., 2015; Krijger et al., 2016; Siersbaek et al., 2017), DNA replication (Pope et al., 2014), and X chromosome inactivation (Crane et al., 2015; Engreitz et al., 2013; Giorgetti et al., 2016). Especially, the interplay between 3D chromatin structure and transcription has a critical role in determining the cell fate, orchestrated by multiple chromatin regulators, specific transcription factors, and long non-coding RNAs (Chen et al., 2016; Chong et al., 2018; de Wit et al., 2013; Stadhouders et al., 2018; 2019). In this aspect, recent studies highlighted the disorganization of 3D chromatin structure as a cause of aberrant gene regulation mechanisms in various human diseases (Flavahan et al., 2016; Franke et al., 2016; Hnisz et al., 2016; Lupianez et al., 2015). In addition to the epigenetic alterations, the disorganized 3D genome often involves large scale genomic variations including deletions, inversions, duplications, and translocations (Lupianez et al., 2015). Although only handful of studies have systematically investigated the relationship between large scale genomic variations and 3D chromatin structure (Chakraborty and Ay, 2017; Dixon et al., 2018; Weischenfeldt et al., 2017), the field is rapidly growing due to its significant biological implications. Here, we introduce recent studies on the basic components of the 3D genome that mediates the effect of large scale genomic variations in gene regulation and the ways in which we can precisely detect and interpret these variations in the context of 3D chromatin structure.
Extensive analyses of genome-wide chromatin contact maps have uncovered that the genome is hierarchically organized into multiple layers in the nucleus, from chromatin loops that connect distant DNA fragments such as enhancer and promoter, to larger chromosomal domains known as topologically associating domains (TADs) and compartmentalized structures in the chromosome. In order to understand the effect of large-scale genomic variations on 3D chromatin structure, we first briefly discuss recent discoveries regarding the basic principles of higher-order chromatin organization.
The first Hi-C study generated genome-wide chromatin contact maps at megabase (Mb) resolution and revealed a plaid pattern in the 3D chromatin structure of the interphase nucleus in the mammalian genome (Fig. 1A), indicating that the genome is spatially partitioned into two compartments, labelled A and B (Lieberman-Aiden et al., 2009). Compartment A/B regions span multiple megabases, and regions in the same compartment tend to be spatially closer to each other than regions in different compartments (Lieberman-Aiden et al., 2009). This spatial segregation of chromatin is closely associated with various nuclear structures. For example, compartment A is often found in the nuclear interior and in euchromatin regions, whereas compartment B is highly concordant with nuclear lamina-associated domains and heterochromatin regions (Pombo and Dillon, 2015; van Steensel and Belmont, 2017). This compartmentalization is also dynamically reorganized during cellular differentiation and differs between cell types, in concordance with cell type-specific gene expression patterns (Dixon et al., 2015; Schmitt et al., 2016), suggesting a close relationship between 3D chromatin structure and gene regulation.
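Compartments of this kind are typically identified by principal component analysis of a distance-normalized contact matrix, as in Lieberman-Aiden et al. (2009). The following is a minimal sketch of that idea, not the exact published pipeline: it assumes a dense, symmetric, already-balanced NumPy matrix for a single chromosome, and the function names and the optional `orient_with` track are hypothetical. Because the sign of an eigenvector is arbitrary, the sketch flips it so that it correlates positively with a user-supplied per-bin track (gene density is one common choice).

```python
import numpy as np


def observed_over_expected(contacts):
    """Divide every diagonal by its mean, removing the distance-decay trend."""
    n = contacts.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for d in range(n):
        diag = np.diagonal(contacts, offset=d).astype(float)
        mean = diag.mean()
        if mean > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / mean
            oe[idx + d, idx] = diag / mean
    return oe


def compartment_track(contacts, orient_with=None):
    """Leading eigenvector of the O/E correlation matrix (A/B compartment signal).

    orient_with: optional per-bin track (hypothetical, e.g. gene density); the
    eigenvector sign is flipped so that it correlates positively with it.
    """
    corr = np.nan_to_num(np.corrcoef(observed_over_expected(contacts)))
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    track = eigvecs[:, np.argmax(eigvals)]
    if orient_with is not None and np.corrcoef(track, orient_with)[0, 1] < 0:
        track = -track
    return track


# Toy plaid map: 60 bins in alternating 10-bin blocks; contacts decay with
# distance and are doubled between bins carrying the same label.
labels = (np.arange(60) // 10) % 2
dist = np.abs(np.subtract.outer(np.arange(60), np.arange(60)))
toy = np.where(labels[:, None] == labels[None, :], 2.0, 1.0) / (dist + 1)
ev = compartment_track(toy, orient_with=labels)
print(np.round(np.corrcoef(ev, labels)[0, 1], 2))
```

In the toy matrix, the positive entries of the returned track mark the bins with label 1, which play the role of compartment A here; on real data the orientation track decides which sign is called A.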
High-resolution chromatin contact maps from more recent Hi-C studies have revealed sub-Mb domains in which fragments located in the same domain have more chromatin contacts with each other than with fragments located in different domains. These distinct units are now referred to as TADs (Fig. 1B) (Dixon et al., 2012; Nora et al., 2012). Although the mechanisms underlying TAD formation are yet to be fully clarified, the most promising model is loop extrusion, in which CTCF proteins bound at TAD boundaries dimerize and cohesin complexes stabilize the resulting loops, thereby demarcating TADs (Fig. 1C) (Rao et al., 2014; 2017; Sanborn et al., 2015; Schwarzer et al., 2017; Vian et al., 2018). Several lines of evidence strongly support the notion that TADs are basic units of 3D chromatin structure. First, TAD boundaries are well conserved during cellular differentiation (Dixon et al., 2015) (Fig. 1D) and even between species (Dixon et al., 2012) (Fig. 1E). Second, although chromatin interactions are dynamically reorganized during cellular differentiation, the changes occur in a TAD-wise manner: contacts either increase or decrease similarly across the fragments within the same TAD (Fig. 1F). Lastly, long-range enhancer-promoter interactions are restricted by TAD boundaries (Fig. 1G). Mammalian genomes, especially the human genome, can be uniquely characterized by the enrichment of
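One widely used way to turn the TAD definition above (more contacts within a domain than across it) into boundary calls is an insulation-style score (Crane et al., 2015): slide a square along the diagonal and look for positions where few contacts cross. The sketch below is a simplified, hypothetical version that omits the normalization and thresholds of published tools; the window size, the function names, and the toy matrix are assumptions.

```python
import numpy as np


def insulation_profile(contacts, window):
    """Mean contact in the window x window square bridging the gap after each bin.

    scores[i] summarizes contacts between bins (i-window, i] and (i, i+window];
    a low value means few contacts cross the gap between bin i and bin i+1.
    """
    n = contacts.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window - 1, n - window):
        scores[i] = contacts[i - window + 1:i + 1, i + 1:i + 1 + window].mean()
    return scores


def boundaries_from_profile(scores):
    """Report the first bin of a new domain at every strict local minimum."""
    found = []
    for i in range(1, len(scores) - 1):
        # comparisons with NaN are False, so edge bins without a score are skipped
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]:
            found.append(i + 1)
    return found


# Toy map: three 20-bin domains with 3x more contacts inside a domain
bins = np.arange(60)
domain = bins // 20
dist = np.abs(np.subtract.outer(bins, bins))
toy = np.where(domain[:, None] == domain[None, :], 3.0, 1.0) / (dist + 1)
print(boundaries_from_profile(insulation_profile(toy, window=5)))  # [20, 40]
```

The score is flat inside a domain and dips where the square straddles a boundary, so the minima recover the domain starts at bins 20 and 40.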
Since proper chromatin folding is crucial for gene regulation, disruption of TAD boundaries or disorganization of inter-TAD structures can cause aberrant gene expression by exposing genes to inappropriate regulatory elements. The frequent association of mutations in genes encoding structural proteins such as CTCF and cohesin with human diseases and developmental abnormalities supports a close link between chromatin folding and disease development (Hnisz et al., 2016; Katainen et al., 2015). Two mechanisms have recently been proposed for how TAD alterations contribute to abnormal gene expression (Valton and Dekker, 2016). One mechanism disrupts domains locally by deleting or dysregulating TAD boundaries, leading to the fusion of two adjacent TADs (Flavahan et al., 2016; Hnisz et al., 2016). The other breaks existing TADs without directly affecting TAD boundaries, and the resulting TAD fragments recombine to create new TADs (Groschel et al., 2014; Northcott et al., 2014). Large-scale genomic variations known as structural variations (SVs), including duplications, deletions, inversions, and translocations, are often involved in TAD boundary disruption or disorganization of inter-TAD structures.
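The two mechanisms above can be illustrated with a deliberately simplified one-dimensional model in which TADs are the intervals between boundary coordinates and an SV is a function that remaps reference coordinates onto the rearranged allele. All coordinates and function names are hypothetical, boundaries are treated as points, and the model assumes that domains on the rearranged allele are delimited by whichever boundaries survive, ignoring any change in boundary strength.

```python
import bisect
from functools import partial


def tad_index(pos, boundaries):
    """Index of the domain containing pos, given sorted boundary coordinates."""
    return bisect.bisect_right(boundaries, pos)


def through_deletion(pos, start, end):
    """Coordinate on the deleted allele; None if pos lies inside [start, end)."""
    if pos < start:
        return pos
    if pos >= end:
        return pos - (end - start)
    return None


def through_inversion(pos, start, end):
    """Coordinate after inverting [start, end); positions outside are unchanged."""
    return start + end - 1 - pos if start <= pos < end else pos


def same_tad_after(a, b, boundaries, remap):
    """Do loci a and b share a domain once boundaries and loci are remapped by an SV?"""
    new_bounds = sorted(x for x in map(remap, boundaries) if x is not None)
    a2, b2 = remap(a), remap(b)
    if a2 is None or b2 is None:
        return False  # one of the two elements was deleted
    return tad_index(a2, new_bounds) == tad_index(b2, new_bounds)


# Hypothetical coordinates: TAD boundaries at 1, 2, and 3 Mb
bounds = [1_000_000, 2_000_000, 3_000_000]
deletion = partial(through_deletion, start=1_900_000, end=2_100_000)    # removes the 2 Mb boundary
inversion = partial(through_inversion, start=1_500_000, end=2_500_000)  # moves it to 1,999,999

# TAD fusion: an enhancer (1.5 Mb) and a promoter (2.3 Mb) start in different domains
print(same_tad_after(1_500_000, 2_300_000, bounds, lambda p: p))  # False
print(same_tad_after(1_500_000, 2_300_000, bounds, deletion))     # True

# TAD shuffling: an enhancer at 1.6 Mb leaves its partner at 1.2 Mb and joins a gene at 2.8 Mb
print(same_tad_after(1_600_000, 1_200_000, bounds, inversion))    # False
print(same_tad_after(1_600_000, 2_800_000, bounds, inversion))    # True
```

The deletion removes the boundary between the enhancer and the promoter (fusion), whereas the inversion leaves the boundary intact but carries the enhancer across it (shuffling).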
A genomic deletion spanning a TAD boundary, or inhibition of structural protein binding at boundary sites, can result in the fusion of neighboring TADs, a phenomenon known as TAD fusion (Fig. 2A), whereas a duplication spanning a TAD boundary can cause the formation of neo-TADs (Franke et al., 2016; Hnisz et al., 2016; Lupianez et al., 2015). TAD fusion changes the spatial proximity between enhancers and promoters that were originally located in two different TADs, which can lead to abnormal gene activation. For example, recurrent deletions of boundary sequences that up-regulate prominent proto-oncogenes via TAD fusion were identified in T cell acute lymphoblastic leukemia (T-ALL) patient samples (Hnisz et al., 2016). This is evidence for oncogene activation resulting from
SVs between TADs (deletions, inversions, translocations, and duplications spanning multiple TADs) can rearrange sub-regions of TADs to create new TADs, a phenomenon known as neo-TAD formation or TAD shuffling. For example, a balanced translocation or inversion exchanges sub-regions between two TADs, and a deletion spanning multiple TADs may generate a shuffled TAD by joining sub-regions of two distant TADs (Fig. 2B). Although further in-depth investigation is required to generalize the effect of TAD alteration on enhancer-promoter interactions, several studies have clearly shown that TAD shuffling caused by inter-TAD SVs can be critical in oncogenesis, as observed in acute myeloid leukemia (AML) patients, in whom relocation of an enhancer by an inversion in chromosome 3 activates
Aberrant gene expression resulting from SV-driven TAD alteration emphasizes the importance of accurately detecting and interpreting large-scale genomic rearrangements in various human diseases. However, despite their significant contributions, current whole-genome sequencing (WGS)-based approaches are limited in the precise detection and interpretation of SVs because of sample purity, repeat sequences, transposons, complex genomic rearrangements, and limited sequencing depth and read length (Alkan et al., 2011; Campbell et al., 2008; Rausch et al., 2012; Tattini et al., 2015; Zhang et al., 2018). These limitations lead to large discrepancies between the SVs detected by different calling algorithms (Kosugi et al., 2019). In this context, information on the 3D architecture of the genome can be employed as a new strategy to precisely detect and interpret SVs. Indeed, recent studies have demonstrated the effectiveness of 3D chromatin contact maps in characterizing large-scale SVs, particularly in cancer cell lines and primary tumor samples (Dixon et al., 2018; Harewood et al., 2017; Rickman et al., 2012).
In principle, WGS methods detect each SV from the reads spanning its breakpoints. However, the number of reads supporting an SV is limited by the read coverage at the breakpoint and by the fraction of clones carrying the SV. The detection efficiency of SVs is therefore strongly affected, and often limited, by sequencing depth and allele frequency (Fig. 3A top right). In chromatin contact maps, by contrast, reads spanning the breakpoints are not the only evidence for an SV: reads corresponding to ligated DNA fragments located near the breakpoints further support its presence (Fig. 3A bottom right). Because of random collisions in the crowded nucleus, the frequency of ligated DNA fragments decays steeply with increasing genomic distance, approximately as a power law (Lieberman-Aiden et al., 2009) (Fig. 3A bottom right). Ligated reads near the breakpoints are therefore detected at a much higher level once the SV has shortened the genomic distance between them (denoted ‘d2’ in Fig. 3A left and bottom right) than in the absence of the SV (denoted ‘d1’ in Fig. 3A left and bottom right). Unlike chromatin contact maps, WGS read coverage does not change with the genomic distance between A and B (Fig. 3A top right). Chromatin contact maps are therefore very sensitive to large-scale SVs regardless of the original linear genomic distance, which is a great merit for detecting SVs in low-purity samples such as human tumor tissues.
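The contrast between WGS and Hi-C evidence can be put into numbers with a back-of-the-envelope model. The sketch assumes that contact frequency scales as distance to the power of minus alpha, with alpha = 1 as a common approximation (not a value taken from this review), that a given fraction of chromosome copies in the sample carry the SV, and that WGS support is simply coverage times that fraction. The function names and example values are hypothetical.

```python
def hic_fold_enrichment(d1, d2, sv_fraction, alpha=1.0):
    """Expected A-B contact frequency relative to the no-SV baseline.

    d1: genomic distance between A and B without the SV (bp).
    d2: distance after the rearrangement (bp).
    sv_fraction: fraction of chromosome copies carrying the SV.
    Assumes contact probability ~ distance ** -alpha.
    """
    without_sv = d1 ** -alpha
    with_sv = d2 ** -alpha
    mixed = sv_fraction * with_sv + (1.0 - sv_fraction) * without_sv
    return mixed / without_sv


def wgs_supporting_reads(coverage, sv_fraction):
    """Expected reads spanning a breakpoint; independent of d1 and d2."""
    return coverage * sv_fraction


# Hypothetical tumor sample: 20% of copies carry an SV that moves B from 10 Mb to 100 kb of A
print(hic_fold_enrichment(10_000_000, 100_000, 0.2))  # about 20.8-fold
print(wgs_supporting_reads(30, 0.2))                   # about 6 reads at 30x coverage
```

Under these assumptions the Hi-C enrichment grows with the ratio d1/d2, so even a minor subclone produces a strong signal for a long-range rearrangement, whereas the WGS expectation stays the same whatever d1 and d2 are.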
In addition to detecting SVs, it is also crucial to correctly interpret their type (inversion, duplication, deletion, translocation, or a more complex form) to understand their effects on gene regulation. WGS, however, relies on the mapped orientation of reads at breakpoints to determine the SV type. Therefore, if the reads cannot cover the whole rearranged region, or in the case of complex SVs, the exact SV type cannot be clearly determined from read orientation alone (Alkan et al., 2011; Soylev et al., 2019). For example, balanced inversions and duplications containing inverted segments are difficult to classify solely by read mapping orientation (Soylev et al., 2019). In this regard, chromatin contact maps are highly useful, since each type of SV produces a unique chromatin contact signature (gradient pattern), in which the number of chromatin interactions is highest at the breakpoint and gradually decreases with increasing distance from it (Figs. 3B–3E). The direction of the gradient reflects the orientation of the rearranged segments and thereby indicates the SV type. For example, a deletion produces new contacts between regions upstream of the start breakpoint and regions downstream of the end breakpoint, generating new signals in the upper part of the breakpoint coordinates (Fig. 3B). An inversion produces gradient patterns on both the left and right sides of the breakpoint coordinates, known as the ‘butterfly’ shape (Fig. 3C) (Harewood et al., 2017). Reciprocal and nonreciprocal translocations can be distinguished by the presence of gradient patterns on both sides, similar to those of an inversion, or on only one side, respectively (Fig. 3D). Duplications produce gradient patterns that fall into three sub-types (Fig. 3E).
Tandem duplications without an inverted fragment produce a gradient pattern at the bottom of the breakpoint coordinates. In contrast, a duplication with an inverted fragment produces a gradient pattern on either the left or the right side of the breakpoint coordinates, depending on the position of the inverted duplicated DNA fragment.
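The link between gradient direction and SV type can be reproduced with a small simulation. The sketch builds the rearranged chromosome from reference segments, gives each pair of positions a contact weight of distance^-1 along the rearranged chromosome (an assumption), projects the weights back onto reference coordinates, and subtracts a no-SV map. It then checks which of the four windows around the breakpoint pair (b1, b2) gains contacts. The quadrant labels (L/R of each breakpoint) are this sketch's own convention and are not meant to match the panel orientation of Fig. 3; the threshold `frac` and all names are hypothetical.

```python
import numpy as np


def derivative_order(segments):
    """Reference bins in the order they appear on the rearranged chromosome.

    segments: list of (start, end, strand); end is exclusive, strand is '+' or '-'.
    """
    order = []
    for start, end, strand in segments:
        bins = list(range(start, end))
        order.extend(bins if strand == "+" else bins[::-1])
    return np.array(order)


def toy_contact_map(order, n_bins, alpha=1.0):
    """Contacts ~ (distance along the rearranged chromosome) ** -alpha, in reference bins."""
    pos = np.arange(len(order))
    dist = np.abs(np.subtract.outer(pos, pos)).astype(float)
    weights = np.zeros_like(dist)
    nonzero = dist > 0
    weights[nonzero] = dist[nonzero] ** -alpha
    size = len(order)
    rows, cols = np.repeat(order, size), np.tile(order, size)
    contacts = np.zeros((n_bins, n_bins))
    np.add.at(contacts, (rows, cols), weights.ravel())  # duplicated bins accumulate
    return contacts


def quadrant_excess(sv_map, ref_map, b1, b2, w):
    """Excess contacts in the four w x w windows around breakpoints b1 < b2.

    First letter: rows left (L, i < b1) or right (R, i >= b1) of b1.
    Second letter: columns left (L, j < b2) or right (R, j >= b2) of b2.
    """
    ex = sv_map - ref_map
    return {
        "LL": ex[b1 - w:b1, b2 - w:b2].sum(),
        "LR": ex[b1 - w:b1, b2:b2 + w].sum(),
        "RL": ex[b1:b1 + w, b2 - w:b2].sum(),
        "RR": ex[b1:b1 + w, b2:b2 + w].sum(),
    }


SIGNATURES = {
    frozenset({"LR"}): "deletion",
    frozenset({"RL"}): "tandem duplication",
    frozenset({"LL", "RR"}): "balanced inversion (butterfly)",
}


def classify(excess, frac=0.25):
    """Name the SV from the set of windows whose excess exceeds frac * maximum."""
    top = max(excess.values())
    if top <= 0:
        return "no signal"
    hits = frozenset(k for k, v in excess.items() if v > frac * top)
    return SIGNATURES.get(hits, "other / complex")


n, b1, b2, w = 200, 80, 120, 10
ref = toy_contact_map(np.arange(n), n)
svs = {
    "deletion": [(0, b1, "+"), (b2, n, "+")],
    "inversion": [(0, b1, "+"), (b1, b2, "-"), (b2, n, "+")],
    "tandem duplication": [(0, b2, "+"), (b1, n, "+")],
}
for name, segs in svs.items():
    sv_map = toy_contact_map(derivative_order(segs), n)
    print(name, "->", classify(quadrant_excess(sv_map, ref, b1, b2, w)))
```

In this toy setting a deletion enriches only the window joining the left side of b1 to the right side of b2, a tandem duplication only the opposite window, and a balanced inversion two diagonally opposed windows, which is the simulated counterpart of the butterfly pattern.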
Because chromatin contact maps are highly sensitive to large-scale SVs and allow precise interpretation of SV orientation, several computational algorithms have been developed to identify SVs from them, and combined approaches using multiple platforms, including Hi-C and WGS, have been proposed to improve SV detection power (Chaisson et al., 2019; Dixon et al., 2018; Harewood et al., 2017; Jacobson et al., 2019; Rickman et al., 2012).
Recent cancer genomic studies have shown that multiple large-scale SVs can be superimposed and affect an entire chromosome, contributing to oncogenesis or cancer progression (Notta et al., 2016). However, SV detection with WGS can yield many false-positive SVs due to inaccurate mapping of short sequencing reads and thus requires elaborate and laborious manual curation. It is therefore challenging to identify complex forms of SVs using WGS. Here, we describe how chromatin contact information can be used to identify and interpret complex SVs.
First, chromatin contact maps can provide direct evidence for the existence of a chromosome-wide SV, which can greatly reduce the number of false-positive SVs. For example, paired-end reads spanning regions A and B in WGS can arise from three possible cases (Fig. 4A left). They can result from ring chromosome formation, which is recognized as an extremely large-scale duplication event; from an extremely large-scale deletion; or simply from mis-mapping of paired-end reads due to the sequence similarity between A′ and B, resulting in a false-positive SV call. The orientation of mapped reads is not sufficient to distinguish these three cases (Fig. 4A middle), so additional information such as copy number variation profiles is required. In contrast, chromatin contact maps show a distinct gradient pattern for each case (Fig. 4A right). A gradient of increased contact signals at the bottom of the breakpoint indicates an actual rearrangement of the chromosome structure, whereas a gradient at the top of the breakpoint validates the occurrence of an extremely large-scale deletion. In the case of a mapping error, no gradient pattern is observed in the chromatin contact map.
Second, chromatin contact maps can provide linkage information between chromosomal segments shattered by a complex genomic rearrangement, allowing the aberrant genome to be reconstructed so that the effect of complex SVs can be understood. For example, chromothripsis is one of the most dramatic types of chromosome-wide genomic rearrangement (Stephens et al., 2011): a whole chromosome is shattered into multiple pieces by a single catastrophic event, and a derivative chromosome is then formed by aberrant repair of the broken fragments, resulting in massive rearrangements and loss of fragments (Fig. 4B left). Because of this complexity, a one-dimensional genomic alteration diagram generated from WGS data can mislead the interpretation of the rearranged chromatin structure (Fig. 4B middle). The chromatin contact map, however, provides linkage information between aberrantly repaired adjacent fragments: because chromothripsis is caused by a single event, each fragment shares no more than two breakpoints, a feature clearly seen in chromatin contact maps (Fig. 4B right). The derivative chromosome can therefore be reconstructed from the linkage information obtained from chromatin contact maps (Burton et al., 2013), which in turn allows accurate prediction of the effect of chromothripsis on genome function.
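The reconstruction step above can be sketched as greedy end-joining: fragment ends with the strongest contact are joined first, each end is used at most once (consistent with each fragment sharing no more than two breakpoints), and joins that would close a loop are rejected. This is not the procedure of Burton et al. (2013), which uses a more elaborate clustering, ordering, and orientation scheme; it is a hypothetical sketch that assumes contact scores between fragment ends have already been computed, and `min_score`, the fragment names, and the scores are illustrative assumptions.

```python
def reconstruct_chains(fragments, junction_scores, min_score=10.0):
    """Greedily join fragment ends by decreasing contact score.

    fragments: fragment names.
    junction_scores: {((frag, end), (frag, end)): score}; end is "head" (left end
        in reference orientation) or "tail" (right end).
    Returns chains of (fragment, strand) in derivative-chromosome order.
    """
    parent = {f: f for f in fragments}

    def find(x):  # union-find root, used to reject cycles
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    partner = {}
    ranked = sorted(junction_scores.items(), key=lambda kv: kv[1], reverse=True)
    for (end_a, end_b), score in ranked:
        if score < min_score or end_a in partner or end_b in partner:
            continue
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a == root_b:
            continue  # would create a cycle or join a fragment to itself
        partner[end_a], partner[end_b] = end_b, end_a
        parent[root_a] = root_b

    chains, seen = [], set()
    for frag in fragments:
        if frag in seen:
            continue
        free = [e for e in ("head", "tail") if (frag, e) not in partner]
        if not free:
            continue  # interior fragment; reached later from a chain end
        chain, cur, entry = [], frag, free[0]
        while True:
            seen.add(cur)
            # entering through the head means the fragment is read forward
            chain.append((cur, "+" if entry == "head" else "-"))
            exit_end = "tail" if entry == "head" else "head"
            nxt = partner.get((cur, exit_end))
            if nxt is None:
                break
            cur, entry = nxt
        chains.append(chain)
    return chains


# Hypothetical shattered region whose fragments were repaired as A(+) C(-) B(+)
scores = {
    (("A", "tail"), ("C", "tail")): 50.0,
    (("C", "head"), ("B", "head")): 40.0,
    (("A", "tail"), ("B", "head")): 5.0,  # weak background contact, ignored
}
print(reconstruct_chains(["A", "B", "C"], scores))
```

Each returned chain is one candidate derivative chromosome; the strand sign records whether a fragment was re-inserted in its reference orientation or inverted.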
Lastly, chromatin contact maps can also be used to detect complex SVs intuitively. For example, chromoplexy is another dramatic form of chromosome-wide rearrangement, in which multiple chromosomes are cleaved and joined together through a single catastrophic event (Fig. 4C left) (Baca et al., 2013). The example shows chromoplexy between three chromosomes, with points A and B as breakpoints. Although chromoplexy is well characterized by its distinct ‘closed-chain’ pattern in circos plots, it is hard to recognize when other, unrelated translocations are located nearby (black lines in Fig. 4C middle). In the chromatin contact map, however, chromoplexy generates a unique signature: the breakpoints (blue and red dotted lines in Fig. 4C right) of the translocations involved are aligned along a line because they share breakpoint coordinates (highlighted in yellow in Fig. 4C right), whereas unrelated translocations do not fall on these alignment lines (black arrows in Fig. 4C right). This feature can be used to understand complex chromosomal rearrangements more intuitively.
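The ‘closed-chain’ idea can be made concrete as a graph in which breakpoints are nodes and translocation junctions are edges: translocations belonging to one chromoplexy event share breakpoint coordinates and therefore close a cycle, whereas an unrelated translocation remains an isolated edge. The following is a hypothetical sketch operating on junction coordinates (for example, read off a contact map), not a method from Baca et al. (2013); the tolerance `tol` for treating two breakpoints as shared is an arbitrary assumption on the order of a Hi-C bin, and the example coordinates are invented.

```python
from collections import defaultdict


def snap_breakpoints(breakpoints, tol):
    """Merge breakpoints on the same chromosome lying within tol bp of a cluster start."""
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)
    rep = {}
    for chrom, positions in by_chrom.items():
        anchor = None
        for pos in sorted(set(positions)):
            if anchor is None or pos - anchor > tol:
                anchor = pos
            rep[(chrom, pos)] = (chrom, anchor)
    return rep


def closed_chains(translocations, tol=50_000, min_chromosomes=3):
    """Return index lists of translocations that form a closed chain.

    translocations: list of ((chrom, pos), (chrom, pos)) junctions.
    A connected component qualifies when every shared breakpoint joins exactly
    two junctions (so the component is a single cycle) and the cycle involves
    at least min_chromosomes chromosomes.
    """
    rep = snap_breakpoints([bp for t in translocations for bp in t], tol)
    incident = defaultdict(set)
    for idx, (a, b) in enumerate(translocations):
        incident[rep[a]].add(idx)
        incident[rep[b]].add(idx)
    chains, visited = [], set()
    for start in incident:
        if start in visited:
            continue
        nodes, edges, stack = set(), set(), [start]
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in nodes:
                continue
            nodes.add(node)
            for idx in incident[node]:
                edges.add(idx)
                stack.extend(rep[bp] for bp in translocations[idx])
        visited |= nodes
        is_cycle = all(len(incident[v]) == 2 for v in nodes) and len(edges) == len(nodes)
        if is_cycle and len({chrom for chrom, _ in nodes}) >= min_chromosomes:
            chains.append(sorted(edges))
    return chains


junctions = [
    (("chr1", 10_000_000), ("chr2", 20_000_000)),
    (("chr2", 20_010_000), ("chr3", 5_000_000)),
    (("chr3", 5_020_000), ("chr1", 10_005_000)),
    (("chr1", 50_000_000), ("chr5", 7_000_000)),  # unrelated translocation
]
print(closed_chains(junctions))  # [[0, 1, 2]]
```

The three junctions that share breakpoints on chr1, chr2, and chr3 are reported as one chain, while the nearby but unrelated chr1-chr5 translocation is left out.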
SVs are important in generating genomic diversity between individuals (Chaisson et al., 2019; Levy-Sakin et al., 2019) and are involved in disease-specific gene regulation mechanisms. Thus, precise detection and interpretation of large-scale genomic rearrangements are crucial, but conventional WGS-based methods are limited in solving this problem. As an alternative or supplementary approach, a set of recent studies strongly suggests that 3D genome structure is highly efficient for detecting and interpreting large-scale SVs (Chaisson et al., 2019; Dixon et al., 2018; Harewood et al., 2017; Jacobson et al., 2019; Rickman et al., 2012). Nevertheless, several challenges remain. The first major drawback of SV identification based on chromatin contact maps is resolution: small SVs that do not significantly change chromatin interactions cannot be detected in the map. Hi-C-based chromatin contact maps are currently very useful for detecting large-scale SVs, but they are limited in detecting SVs smaller than about 1 Mb, because the gradient patterns generated by such SVs are weak compared with the already high contact frequencies between fragments at short genomic distances in the original map. Furthermore, the resolution of chromatin contact maps is generally tens of kilobases, which is too coarse to determine the exact breakpoints of SVs. New computational methods, such as deep learning algorithms, are required to precisely detect smaller SVs and compute exact breakpoints from Hi-C data alone. Second, the inability to predict the effect of SVs in the context of the 3D genome hinders the interpretation of their functional consequences. Owing to limited knowledge of non-coding regulatory regions in the genome, it is difficult to predict the pathogenicity of each SV in the context of higher-order chromatin structure.
Although new bioinformatics approaches are being developed to address this challenge (Weischenfeldt et al., 2017), complex SVs make the prediction of their effects even more difficult. Thus, integrating SV-driven 3D chromatin structure with the basic principles of gene regulation is essential for predicting the pathogenicity of each SV.
In conclusion, there are still many hurdles in applying this new strategy to the detection and interpretation of SVs, but the development of new computational methods and integrative approaches should yield a powerful tool for comprehensively understanding the complex genome rearrangements driven by SVs, in both normal and disease contexts, beyond the one-dimensional genome.
This work was funded by the Ministry of Science, ICT, and Future Planning through the National Research Foundation in Republic of Korea (2017R1C1B2008838) and Korean Ministry of Health and Welfare (HI17C0328).