Mol. Cells 2015; 38(6): 475-481
Published online May 19, 2015
https://doi.org/10.14348/molcells.2015.0103
© The Korean Society for Molecular and Cellular Biology
*Correspondence: jskim01@snu.ac.kr
Programmable nucleases, which include zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and RNA-guided engineered nucleases (RGENs) repurposed from the type II clustered, regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9) system, are now widely used for genome editing in higher eukaryotic cells and whole organisms, revolutionizing almost every discipline in biological research, medicine, and biotechnology. All of these nucleases, however, induce off-target mutations at sites homologous in sequence with on-target sites, limiting their utility in many applications including gene or cell therapy. In this review, we compare methods for detecting nuclease off-target mutations. We also review methods for profiling genome-wide off-target effects and discuss how to reduce or avoid off-target mutations.
Keywords: Cas9, CRISPR, genome editing, off-target, TALEN, ZFN
Genome editing is a method of modifying genome sequences in cells and whole organisms via custom-designed programmable nucleases (Kim and Kim, 2014), which cleave chromosomal DNA in a targeted manner, producing site-specific DNA double-strand breaks (DSBs). These DSBs are efficiently repaired in cells by endogenous DNA repair systems known as homologous recombination (HR) and non-homologous end joining (NHEJ), often causing site-specific genome modifications. This technique is now widely used in research, medicine, and biotechnology, as highlighted by the choice of genome editing as the 2011 Method of the Year and one of the 2015 Methods to Watch by Nature Methods and as a Breakthrough of the Year runner-up by Science in 2013. For example, gene knockout using engineered nucleases enables identification and validation of druggable target genes. Gene correction in stem and somatic cells can lead to gene therapy for the treatment of various genetic and non-genetic diseases.
Three different classes of programmable nucleases have been developed: zinc finger nucleases (ZFNs) (Bibikova et al., 2003; Kim et al., 2009; 2010; Urnov et al., 2005), transcription activator-like effector nucleases (TALENs) (Kim et al., 2013a; 2013c; Miller et al., 2011), and RNA-guided endonucleases (RGENs) (Cho et al., 2013a; 2013b; Cong et al., 2013; Mali et al., 2013b) repurposed from the type II CRISPR system, an adaptive immune response in bacteria and archaea.
ZFNs and TALENs consist of a common nuclease domain derived from FokI, a type IIS restriction enzyme, and distinct DNA-binding domains: ZFNs use zinc fingers (Kim et al., 1996), whereas TALENs employ TAL effectors derived from Xanthomonas, a plant pathogen (Boch et al., 2009; Moscou and Bogdanove, 2009). These DNA-binding domains can be engineered to target user-defined DNA sequences. Because the FokI nuclease domain must dimerize to cleave DNA (Bitinaite et al., 1998), these FokI-based nucleases function as pairs, contributing to their high specificities. Typically, a ZFN pair recognizes an 18- to 36-bp DNA sequence, and a TALEN pair recognizes a 30- to 40-bp DNA sequence, surpassing the complexity of the human genome (4^16 ≈ 4.3 billion > 3.2 billion, the size of the human genome). In practice, however, these nucleases can induce off-target mutations. Furthermore, many ZFNs, especially those made using publicly available zinc-finger resources, are cytotoxic (Kim et al., 2009), which may arise from their off-target effects. Custom-made ZFNs, available from a commercial source, are more potent and specific but are expensive. ZFNs prefer guanine-rich target sequences, limiting targetable sites.
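The arithmetic behind this comparison can be checked with a short sketch. It assumes, as a simplification, that the four bases are equally likely and independent at every position, so that a site of length n has 4^n possible sequences; the function names are illustrative and do not come from the original article.

```python
# Rough uniqueness check: number of distinct DNA sequences of length n
# compared with the size of the human genome (about 3.2 billion bp, as quoted in the text).
HUMAN_GENOME_BP = 3_200_000_000


def sequence_space(n: int) -> int:
    """Number of distinct DNA sequences of length n (4 possible bases per position)."""
    return 4 ** n


def min_unique_length(genome_bp: int = HUMAN_GENOME_BP) -> int:
    """Smallest length n for which 4**n exceeds the genome size."""
    n = 1
    while sequence_space(n) <= genome_bp:
        n += 1
    return n


if __name__ == "__main__":
    # 16 bp is the threshold; 18 bp and 30 bp are the lower bounds for ZFN and TALEN pairs.
    for n in (16, 18, 30):
        print(n, sequence_space(n), sequence_space(n) > HUMAN_GENOME_BP)
    print("shortest length exceeding the genome size:", min_unique_length())
```

Real genomes are not random sequences, so this only bounds the idealized case; it does not predict whether a given site is actually unique.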
TALENs, the 2nd generation of programmable nucleases, can be designed to target almost any DNA sequence, a critical advantage over ZFNs and RGENs. Unlike zinc fingers that recognize 3-bp sub-sites, TAL effector modules recognize single bases. Four different modules, each specific to one of the four bases, are used to make TALENs. TAL effector arrays often consist of up to 20 modules, making it time-consuming and labor-intensive to prepare plasmids that encode TALENs. In general, TALENs are not cytotoxic, but can induce off-target mutations (Mussolino et al., 2011). Fortunately, TALEN off-target effects can be avoided by choosing unique target sequences that differ by at least 7 nucleotides from any other site in the genome (Kim et al., 2013a). A web-based resource (
CRISPR/Cas-derived RGENs constitute yet another class of programmable nucleases. RGENs consist of a target-specific CRISPR RNA (crRNA), a target-independent trans-activating crRNA (tracrRNA), and Cas9, the protein component, which originates from Streptococcus pyogenes. Essential portions of crRNA and tracrRNA can be linked to form a single-chain guide RNA (sgRNA) (Jinek et al., 2012). Both crRNAs and sgRNAs function as guide RNAs (gRNAs) to direct Cas9 to target sites. The specificity of an RGEN is determined by both the gRNA, which hybridizes with a 20-bp target DNA sequence, and Cas9, which recognizes the 5′-NGG-3′ sequence known as the protospacer-adjacent motif (PAM). New RGENs with desired specificity are prepared by replacing
Programmable nucleases can cut their target sites efficiently, inducing site-specific DSBs in the genome, but can also create unwanted cleavages at off-target sites with high sequence homology to on-target sites, often inducing off-target mutations. Thus, both zinc finger proteins and TAL effector arrays can bind to homologous sites, leading to off-target DNA cleavages. RGEN off-target mutations are caused by both Cas9 and gRNAs. The optimal PAM sequence recognized by Cas9 derived from S. pyogenes is 5′-NGG-3′. However, Cas9 can cleave sites with a 5′-NAG-3′ or 5′-NGA-3′ PAM, albeit less efficiently (Hsu et al., 2013). A few nucleotide mismatches between a 20-nt gRNA sequence and a target DNA sequence are also tolerated by an RGEN. Mismatches in the PAM-distal sequence at the 5′ terminus are tolerated better than are those in the 10- to 12-nt PAM-proximal sequence, often termed a seed region. Furthermore, RGENs can cleave off-target sites with a few extra or missing nucleotides that can produce a DNA or RNA bulge, respectively (Lin et al., 2014).
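The PAM and mismatch rules described above can be illustrated with a minimal sketch. It only tallies features of a candidate site and is not a published scoring scheme: the function names, the labels "optimal" and "suboptimal", and the default seed length of 12 nt (the text gives 10 to 12 nt) are assumptions made for illustration, and bulges are not modeled.

```python
SEED_LENGTH = 12  # assumed PAM-proximal seed length; the text gives 10 to 12 nt


def pam_class(pam: str) -> str:
    """Classify a 3-nt PAM for S. pyogenes Cas9.

    NGG is the optimal PAM; NAG and NGA can be cleaved less efficiently.
    """
    pam = pam.upper()
    if len(pam) != 3:
        raise ValueError("PAM must be 3 nt long")
    if pam[1:] == "GG":
        return "optimal"
    if pam[1:] in ("AG", "GA"):
        return "suboptimal"
    return "none"


def count_mismatches(guide: str, site: str, seed_length: int = SEED_LENGTH):
    """Return (PAM-distal mismatches, seed mismatches).

    Both sequences are written 5'->3' as DNA and must have equal length.
    The PAM lies 3' of the protospacer, so the seed is the last
    `seed_length` positions and the PAM-distal region is everything before it.
    """
    guide, site = guide.upper(), site.upper()
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length (bulges are not modeled)")
    boundary = len(guide) - seed_length
    distal = sum(g != s for g, s in zip(guide[:boundary], site[:boundary]))
    seed = sum(g != s for g, s in zip(guide[boundary:], site[boundary:]))
    return distal, seed


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"      # made-up 20-nt guide sequence
    candidate = "TCGTACGTACGTACGTACGA"  # differs at the 5' end and at the 3' end
    print(count_mismatches(guide, candidate), pam_class("CAG"))
```

Separating seed from PAM-distal mismatches reflects the observation in the text that the two are tolerated to different degrees; how to weight them is left open here.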
Imprecise repair of on- and off-target DNA cleavages can give rise to gross chromosomal rearrangements such as deletions (Lee et al., 2010), inversions (Lee et al., 2012; Park et al., 2014), and translocations (Brunet et al., 2009; Cho et al., 2014), in addition to local mutations. An example is a ZFN designed to target the C-C chemokine receptor 5 (
Various methods, which include Sanger sequencing, high-throughput sequencing, restriction fragment length polymorphism (RFLP) analysis, and mismatch-sensitive enzymes, have been developed for detecting indels induced by erroneous NHEJ repair of DSBs. Sanger sequencing of DNA from individual clones is the gold standard for confirming nuclease-triggered mutations at on- or off-target sites, but this method is time-consuming and cost-inefficient when many samples need to be analyzed in parallel. High-throughput sequencing enables accurate measurements of indel frequencies at up to hundreds of on- and off-target sites at once. Although this method is highly sensitive, allowing detection of indels that are induced with frequencies that range from 0.01% to 1% (∼0.1% on average), care must be taken to discard false-positive sequence reads that result from PCR artifacts and to include a negative control (no nuclease expression) at each target site (Cho et al., 2014). A web-based tool (
Mismatch-sensitive nucleases, which include T7 endonuclease I (T7E1) (Kim et al., 2009) and CEL-I enzyme (a.k.a. Surveyor nuclease), are widely used to measure indel frequencies in bulk populations of cells. These enzymes recognize and cleave heteroduplexes formed by hybridization of wild-type and mutant DNA sequences or of two different mutant DNA sequences. PCR amplicons treated with these enzymes are then subjected to agarose gel electrophoresis. The size and intensity of cleaved DNA bands provide accurate measurements of mutation frequencies. Although these enzymes can detect both indels and point mutations, T7E1 is more sensitive to indels than CEL-I enzyme (Vouillot et al., 2015). Because programmable nucleases rarely produce point mutations (Kim et al., 2013b), T7E1 is preferred for detecting nuclease-induced mutations. In fact, under optimal conditions, T7E1 can detect indels that are induced at frequencies below 1% (Kim et al., 2013a).
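As an illustration of how band intensities can be turned into a mutation frequency, the sketch below uses a commonly applied estimate that is not given in this article: if strands reanneal at random and every duplex containing a mutant strand is cleaved, the uncut fraction equals (1 − m)^2 for a mutant fraction m, so m = 1 − sqrt(1 − fraction cut). The function name and the input values are hypothetical.

```python
import math
from typing import Sequence


def indel_fraction_from_bands(uncut: float, cut_bands: Sequence[float]) -> float:
    """Estimate the mutant fraction from gel band intensities.

    Assumes random reannealing and complete cleavage of every duplex that
    contains a mutant strand, so that uncut fraction = (1 - m) ** 2.
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cut = cut / total
    return 1.0 - math.sqrt(1.0 - fraction_cut)


if __name__ == "__main__":
    # Hypothetical intensities: one uncut band and two cleavage products.
    estimate = indel_fraction_from_bands(uncut=81.0, cut_bands=[10.0, 9.0])
    print(f"estimated indel frequency: {estimate:.1%}")
```

The estimate inherits the limits noted in the text: homozygous biallelic mutations that form no heteroduplexes would be missed, and incomplete digestion would lead to underestimation.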
In contrast to agarose gel electrophoresis, polyacrylamide gel electrophoresis (PAGE) can be used to separate heteroduplexed DNA from homoduplexed DNA without the use of mismatch-sensitive nucleases (Zhu et al., 2014). However, accurate quantitation of mutation frequencies using this method is difficult because multiple DNA bands are obtained.
Programmable nucleases often induce homozygous biallelic mutations in a cell or an organism, leading to a complete gene disruption or knockout. These mutations cannot be detected by T7E1 or CEL-I, because heteroduplexes are not formed. RGENs can be used for RFLP analysis to distinguish homozygous biallelic mutants from wild-type sequences or monoallelic mutations (Kim et al., 2014a), because RGENs cannot cleave the indel sequences that they themselves induce in cells. In fact, RFLP analysis using conventional restriction enzymes was one of the first methods for detecting mutations induced by programmable nucleases in cells (Urnov et al., 2005). However, unlike RGEN-RFLP, this method is limited by the availability of appropriate restriction sites in a target DNA site. Fluorescence PCR (Kim et al., 2011) and DNA melting analysis (Parant et al., 2009) can also be used for measuring genome editing activities of programmable nucleases, but these methods require special devices.
Several different methods have been used to identify nuclease off-target sites: bioinformatic prediction based on sequence homology, chromatin immunoprecipitation coupled with deep sequencing (ChIP-Seq), systematic evolution of ligands by exponential amplification (SELEX), integrase-deficient lentivirus (IDLV) capture in cells,
Cas9 and other programmable nucleases can induce off-target mutations at sites that differ from their on-target sites by several nucleotides. This means that more than 10,000 potential off-target sites identified based on sequence homology must be examined. A web-based computer program, Cas-OFFinder (Bae et al., 2014), can be used to list all of these homologous sites, but measuring indel frequencies at these sites one by one is an almost impossible task. To profile genome-wide off-target effects of engineered nucleases in an unbiased manner, SELEX (Miller et al., 2011) and ChIP-Seq (Kuscu et al., 2014; Wu et al., 2014) have been used. These methods rely on DNA binding
IDLV capture and
These two methods are complementary but neither of them is comprehensive. Gabriel et al. (2011) and Pattanayak et al. (2011) have applied IDLV capture and in vitro selection, respectively, to examine off-target effects of the same
Whole genome/exome sequencing of clonal populations of human cells in which a gene of interest was modified using ZFNs (Yusa et al., 2011), TALENs (Smith et al., 2014; Veres et al., 2014), or RGENs (Cho et al., 2014; Kim et al., 2015) revealed remarkable specificities of these nucleases. Although these nucleases have detectable off-target effects in a bulk population of cells, off-target mutations are almost absent in the entire human exome or genome of an individual clone. This is because mutation frequencies at off-target sites are usually orders of magnitude smaller than those at on-target sites. Off-target mutations that occur at a frequency of 10% cannot be revealed by sequencing of DNA from just a few clones with a typical depth of 30X.
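The sampling argument can be made concrete with a small sketch. It assumes, as a simplification not stated in the article, that each clone independently carries a given off-target mutation with probability equal to its frequency in the bulk population; the function names are illustrative.

```python
import math


def prob_seen_in_clones(frequency: float, n_clones: int) -> float:
    """Probability that at least one of n independent clones carries the mutation."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return 1.0 - (1.0 - frequency) ** n_clones


def clones_needed(frequency: float, target_probability: float) -> int:
    """Smallest number of clones giving at least the target detection probability."""
    if not 0.0 < frequency < 1.0 or not 0.0 < target_probability < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - frequency))


if __name__ == "__main__":
    # A mutation present at 10% in the bulk population, sampled in three clones.
    print(f"{prob_seen_in_clones(0.10, 3):.3f}")
    print(clones_needed(0.10, 0.95))
```

Under this assumption, a 10% mutation is more likely to be missed than found when only a few clones are sequenced, and rarer mutations require correspondingly more clones.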
In the following two sections, we review recently improved methods for profiling genome-wide off-target sites of Cas9 nucleases and various approaches for reducing or avoiding their off-target effects. Some of these methods can also be applied to other programmable nucleases.
Recently, four different methods have been reported for identifying potential off-target sites of RGENs in a bulk population of cells (Figs. 1C–1F). Genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq) represents an improvement over IDLV capture (Tsai et al., 2015). Blunt-ended, double-stranded phosphorothioate oligodeoxynucleotides (dsODNs) can be captured at on-target and off-target sites when DSBs are repaired by NHEJ in cells. These dsODN integration sites are mapped in the genome by PCR amplification and deep sequencing. High-throughput genomic translocation sequencing (HTGTS) exploits translocations that are induced by erroneous ligation of on-target and off-target sites in cells (Frock et al., 2015). HTGTS identifies off-target sites by using the on-target DSB as a ‘bait’ to catch ‘prey’ sequences that are translocated to the on-target site. High-throughput sequencing is used to determine prey sequences that correspond to off-target sites.
Off-target DSBs can also be captured in fixed, permeable cells. Breaks labelling, enrichment on streptavidin and next-generation sequencing (BLESS) is performed by labelling DSBs present in fixed cells using biotinylated oligonucleotides, which are then enriched and subjected to deep sequencing (Crosetto et al., 2013; Ran et al., 2015). Because this method captures DSBs at a single moment, many bona fide off-target cleavage sites can be missed, resulting in poor sensitivity.
Cell-free genomic DNA can be used to profile nuclease off-target effects
A key difference between these methods is whether genomic DNA is cleaved in cells or
It is unknown to what extent programmable nucleases are limited by chromatin. Because HTGTS, GUIDE-seq, and BLESS profile nuclease cleavage sites in cells, off-target sites captured in one cell type could be different from those in other cell types, owing to differential chromatin accessibility in each cell type. Digenome-seq functions independently of the cell type, because naked, chromatin-free genomic DNA is used.
DSB repair by NHEJ in cells often results in deletions of up to hundreds of base pairs at cleavage sites. As a result, neither HTGTS nor GUIDE-seq can pinpoint off-target sites: one must search for potential off-target sites based on the sequence homology around captured sites. Neither BLESS nor Digenome-seq depends on NHEJ, and both can pinpoint cleavage sites at single-nucleotide resolution.
To determine which method is most sensitive and comprehensive, one needs to test the same nucleases using each of these methods. Only one sgRNA, specific to the VEGF-A site, has been tested by HTGTS (Frock et al., 2015), GUIDE-seq (Tsai et al., 2015), and Digenome-seq (Kim et al., 2015) thus far. Each of these methods revealed a different set of potential off-target sites, suggesting that no method is comprehensive. However, most of these candidate sites were invalidated by targeted deep sequencing. Importantly, these three methods commonly identified a total of 7 off-target sites in addition to the on-target site in the human genome. Notably, Digenome-seq identified one additional bona fide off-target site, with an indel frequency of 0.065%, which was validated using deep sequencing.
First, the choice of unique target sequences, which differ from any other sites in the genome by at least 2 or 3 nucleotides in a 20-nt sequence, is important for avoiding off-target effects (Cho et al., 2014). RGENs discriminate efficiently against potential off-target sites with mismatches in the PAM sequence and the seed region upstream of the PAM sequence. A web-based computer algorithm is available (
Third, paired nickases can generate two single-strand breaks or nicks on different DNA strands, producing a composite DSB and doubling the specificity of genome editing (Fig. 2B) (Cho et al., 2014; Kim et al., 2012; Mali et al., 2013a; Ran et al., 2013). Both ZFNs and Cas9 can be converted to nickases by inactivating one active site. Cas9 has two active sites, each cleaving either the Watson or Crick strand. Among the two nickase forms, D10A Cas9 appears more efficient and robust than H840A Cas9. One caveat to this approach is that two active sgRNAs are required to make a functional Cas9 nickase pair. Furthermore, target sequences must contain two PAM sequences, limiting the choice of targetable sites.
Fourth, the use of recombinant Cas9 protein [commercially available from ToolGen (
Since we and others reported RNA-guided genome editing in human cells in January 2013 (Cho et al., 2013a; Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013b), the CRISPR-Cas9 system has been widely used in many labs around the world to modify genomes in various organisms and cells. Although Cas9 nucleases have off-target effects, whole genome/exome sequencing of gene-modified clones shows that these nucleases are highly specific. Recent methods such as Digenome-seq and GUIDE-seq that profile genome-wide off-target sites in a bulk population of cells reveal a broad spectrum of sgRNA specificities. Certain sgRNAs are remarkably specific, resulting in no measurable off-target mutations, whereas others are promiscuous. To find rules that govern sgRNA specificity, one needs to profile the off-target effects of as many sgRNAs as possible at the genome-wide level. Digenome-seq is appealing in this regard, because it can be multiplexed without increasing the sequencing depth. Hundreds of sgRNAs can be tested in a single assay.
Genome-wide off-target profiling methods yield a list of potential off-target sites that are cleaved under certain conditions. Bona fide off-target sites must be validated by targeted deep sequencing. Unfortunately, validation of off-target effects is limited by intrinsic errors of sequencing platforms, which are in the range of 1% to 0.01% (0.1% on average). A more sensitive method is needed to confirm that certain sgRNAs do not induce off-target mutations with indel frequencies below 0.01% in the entire genome. Highly specific and efficient nucleases will enable applications in somatic gene and cell therapy and possibly in human germline genome editing to prevent the transmission of fatal genetic mutations.
Web-based tools available for guide-RNA design
Name | Developer | Address |
---|---|---|
Cas-OFFinder | Jin-Soo Kim lab, Seoul National University | |
CHOPCHOP | George Church lab, Harvard University | |
CRISPR Design | Feng Zhang lab, Massachusetts Institute of Technology | |
CRISPR Design tool | The Broad Institute of Harvard and MIT | |
CRISPR/Cas9 gRNA finder | Jack Lin lab, University of Colorado | |
CRISPRfinder | Christine Pourcel lab, Université Paris-Sud 11 | |
E-CRISP | Boutros lab, DKFZ German Cancer Research Center | |
ZiFiT | Keith Joung lab, Harvard University | |
Published online June 30, 2015 https://doi.org/10.14348/molcells.2015.0103
Taeyoung Koo1,4,5, Jungjoon Lee2,5, and Jin-Soo Kim1,3,4,*
1Center for Genome Engineering, Institute for Basic Science, Daejeon 305-811, Korea, 2The Institute of Molecular Biology and Genetics, Seoul National University, Seoul 151-742, Korea, 3Department of Chemistry, Seoul National University, Seoul 151-742, Korea, 4University of Science and Technology, Daejeon 305-350, Korea, 5These authors contributed equally to this work.
Correspondence to:*Correspondence: jskim01@snu.ac.kr
Programmable nucleases, which include zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and RNA-guided engineered nucleases (RGENs) repurposed from the type II clustered, regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9) system are now widely used for genome editing in higher eukaryotic cells and whole organisms, revolutionising almost every discipline in biological research, medicine, and biotechnology. All of these nucleases, however, induce off-target mutations at sites homologous in sequence with on-target sites, limiting their utility in many applications including gene or cell therapy. In this review, we compare methods for detecting nuclease off-target mutations. We also review methods for profiling genome-wide off-target effects and discuss how to reduce or avoid off-target mutations.
Keywords: Cas9, CRISPR, genome editing, off-target, TALEN, ZFN
Genome editing is a method of modifying genome sequences in cells and whole organisms via custom-designed programmable nucleases (Kim and Kim, 2014), which cleave chromosomal DNA in a targeted manner, producing site-specific DNA double-strand breaks (DSBs). These DSBs are efficiently repaired in cells by endogenous DNA repair systems known as homologous recombination (HR) and non-homologous end joining (NHEJ), often causing site-specific genome modifications. This technique is now widely used in research, medicine, and biotechnology, a phenomenon that is highlighted by the choice of genome editing as the 2011 Method of the Year and one of the 2015 Methods to Watch by Nature Methods and as a Breakthrough of the Year runner-up by Science in 2013. For example, gene knockout using engineered nucleases enables identification and validation of drug-able target genes. Gene correction in stem and somatic cells can lead to gene therapy for the treatment of various genetic and non-genetic diseases.
Three different classes of programmable nucleases have been developed: zinc finger nucleases (ZFNs) (Bibikova et al., 2003; Kim et al., 2009; 2010; Urnov et al., 2005), transcription activator-like effector nucleases (TALENs) (Kim et al., 2013a; 2013c, Miller et al., 2011), and RNA-guided endonucleases (RGENs) (Cho et al., 2013a; 2013b; Cong et al., 2013; Mali et al., 2013b) repurposed from the type II CRISPR system, an adaptive immune response in bacteria and archea.
ZFNs and TALENs consist of a common nuclease domain derived from FokI, a type IIS restriction enzyme, and distinct DNA-binding domains: ZFNs use zinc fingers (Kim et al., 1996), whereas TALENs employ TAL effectors derived from Xanthomonas, a plant pathogen (Boch et al., 2009; Moscou and Bogdanove, 2009). These DNA-binding domains can be engineered to target user-defined DNA sequences. Because the FokI nuclease domain must dimerize to cleave DNA (Bitinaite et al., 1998), these FokI-based nucleases function as pairs, contributing to their high specificities. Typically, a ZFN pair recognizes an 18-to 36-bp DNA sequence, and a TALEN pair recognizes a 30- to 40-bp DNA sequence, surpassing the complexity of the human genome (4E16 = 4.3 billion > 3.2 billion, the size of the human genome). In practice, however, these nucleases can induce off-target mutations. Furthermore, many ZFNs, especially those made using publically-available zinc-finger resources, are cytotoxic (Kim et al., 2009), which may arise from their off-target effects. Custom-made ZFNs, available from a commercial source, are more potent and specific but are expensive. ZFNs prefer guanine-rich target sequences, limiting targetable sites.
TALENs, the 2nd generation of programmable nucleases, can be designed to target almost any DNA sequence, a critical advantage over ZFNs and RGENs. Unlike zinc fingers that recognize 3-bp sub-sites, TAL effector modules recognize single bases. Four different modules, each specific to one of the four bases, are used to make TALENs. TAL effector arrays often consist of up to 20 modules, making it time-consuming and labor-intensive to prepare plasmids that encode TALENs. In general, TALENs are not cytotoxic, but can induce off-target mutations (Mussolino et al., 2011). Fortunately, TALEN off-target effects can be avoided by choosing unique target sequences that differ by at least 7 nucleotides from any other site in the genome (Kim et al., 2013a). A web-based resource (
CRISPR/Cas-derived RGENs constitute yet another class of programmable nucleases. RGENs consist of a target-specific CRISPR RNA (crRNA), a target-independent trans-activating crRNA (tracrRNA), and Cas9, the protein component originated from Streptococcus pyogenes. Essential portions of crRNA and tracrRNA can be linked to form a single-chain guide RNA (sgRNA) (Jinek et al., 2012). Both crRNAs and sgRNAs function as guide RNAs (gRNAs) to direct Cas9 to target sites. The specificity of an RGEN is determined by both the gRNA, which hybridizes with a 20-bp target DNA sequence, and Cas9, which recognizes the 5′-NGG-3′ sequence known as the protospacer-adjacent motif (PAM). New RGENs with desired specificity are prepared by replacing
Programmable nucleases can cut their target sites efficiently inducing site-specific DSBs in the genome, but can also create unwanted cleavages at off-target sites with high sequence homology to on-target sites, often inducing off-target mutations. Thus, both zinc finger proteins and TAL effector arrays can bind to homologous sites, leading to off-target DNA cleavages. RGEN off-target mutations are caused by both Cas9 and gRNAs. The optimal PAM sequence recognized by Cas9 derived from S. pyogenes is 5′-NGG-3′. However, Cas9 can cleave sites with a 5′-NAG-3′ or 5′-NGA-3′ PAM albeit less efficiently (Hsu et al., 2013). A few nucleotide mismatches between a 20-nt gRNA sequence and a target DNA sequence is also tolerated by an RGEN. Mismatches in the PAM-distal sequence at the 5′ terminus are tolerated better than are those in the 10-to 12-nt PAM-proximal sequence, often termed a seed region. Furthermore, RGENs can cleave off-target sites with a few extra or missing nucleotides that can produce a DNA or RNA bulge, respectively (Lin et al., 2014).
Imprecise repair of on- and off-target DNA cleavages can give rise to gross chromosomal rearrangements such as deletions (Lee et al., 2010), inversions (Lee et al., 2012; Park et al., 2014), and translocations (Brunet et al., 2009; Cho et al., 2014), in addition to local mutations. An example is a ZFN designed to target the C-C chemokine receptor 5 (
Various methods, which include Sanger sequencing, high-throughput sequencing, restriction fragment length polymorphism (RFLP) analysis, mismatch-sensitive enzymes, have been developed for detecting indels induced by erroneous NHEJ repair of DSBs. Sanger sequencing of DNA from individual clones is the gold standard for confirming nuclease-triggered mutations at on- or off-target sites, but this method is time-consuming and cost-inefficient when many samples need to be analyzed in parallel. High-throughput sequencing enables accurate measurements of indel frequencies at up to hundreds of on- and off-target sites at once. Although this method is highly sensitive, allowing detection of indels that are induced with frequencies that range from 0.01% to 1% (∼0.1% on average), care must be taken to discard false-positive sequence reads that result from PCR artifacts and to include a negative control (no nuclease expression) at each target site (Cho et al., 2014). A web-based tool (
Mismatch-sensitive nucleases, which include T7 endonuclease I (T7E1) (Kim et al., 2009) and CEL-I enzyme (a.k.a. Surveyor nuclease) are widely used to measure indel frequencies in bulk populations of cells. These enzymes recognize and cleave heteroduplexes formed by hybridization of wild-type and mutant DNA sequences or of two different mutant DNA sequences. PCR amplicons treated with these enzymes are then subjected to agarose gel electrophoresis. The size and intensity of cleaved DNA bands provide accurate measurements of mutation frequencies. Although these enzymes can detect both indels and point mutations, T7E1 is more sensitive to indels than CEL-I enzyme (Vouillot et al., 2015). Because programmable nucleases rarely produce point mutations (Kim et al., 2013b), T7E1 is preferred for detecting nuclease-induced mutations. In fact, under optimal conditions, T7E1 can detect indels that are induced at frequencies below 1% (Kim et al., 2013a).
In contrast to agarose gel electrophoresis, polyacrylamide gel electrophoresis (PAGE) can be used to separate heteroduplexed DNA from homoduplexed DNA by without the use of mismatch-sensitive nucleases (Zhu et al., 2014). However, accurate quantitation of mutation frequencies using this method is difficult because multiple DNA bands are obtained.
Programmable nucleases often induce homozygous biallelic mutations in a cell or an organism, leading to a complete gene disruption or knockout. These mutations cannot be detected by T7E1 or CEL-I, because heteroduplexes are not formed. RGENs can be used for RFLP analysis to distinguish homozygous biallelic mutants from wild-type sequences or monoallelic mutations (Kim et al., 2014a). Thus, RGENs cannot cleave indel sequences induced by themselves in cells. In fact, RFLP analysis using conventional restriction enzymes was one of the first methods for detecting mutations induced by programmable nucleases in cells (Urnov et al., 2005). However, unlike RGEN-RFLP, this method is limited by the availability of appropriate restriction sites in a target DNA site. Fluorescence PCR (Kim et al., 2011) and DNA melting analysis (Parant et al., 2009) can also be used for measuring genome editing activities of programmable nucleases, but these methods require special devices.
Several different methods have been used to identify nuclease off-target sites: bioinformatic prediction based on sequence homology, chromatin immunoprecipitation coupled with deep sequencing (ChIP-Seq), systematic evolution of ligands by exponential amplification (SELEX), integrase-deficient lentivirus (IDLV) capture in cells,
Cas9 and other programmable nucleases can induce off-target mutations at sites that differ from their on-target sites by several nucleotides. This means that more than 10,000 potential off-target sites identified based on sequence homology must be examined. A web-based computer program, CAS-OFFinder (Bae et al., 2014), can be used to list all of these homologous sites, but measuring indel frequencies at these sites one by one is an almost impossible task. To profile genome-wide off-target effects of engineered nucleases in an unbiased manner, SELEX (Miller et al., 2011) and ChIP-Seq (Kuscu et al., 2014; Wu et al., 2014) have been used. These methods rely on DNA binding
IDLV capture and
These two methods are complementary but neither of them is comprehensive. Gabriel et al. (2011) and Pattanayak et al. (2011) have applied IDLV capture and in vitro selection, respectively, to examine off-target effects of the same
Whole genome/exome sequencing of clonal populations of human cells in which a gene of interest was modified using ZFNs (Yusa et al., 2011), TALENs (Smith et al., 2014; Veres et al., 2014), or RGENs (Cho et al., 2014; Kim et al., 2015) revealed remarkable specificities of these nucleases. Although these nucleases have detectable off-target effects in a bulk population of cells, off-target mutations are almost absent in the entire human exome or genome of an individual clone. This is because mutation frequencies at off-target sites are usually orders of magnitude smaller than those at on-target sites. Off-target mutations that occur at a frequency of 10% cannot be revealed by sequencing of DNA from just a few clones with a typical depth of 30X.
In the following two sections, we review recently improved methods for profiling genome-wide off-target sites of Cas9 nucleases and various approaches for reducing or avoiding their off-target effects. Some of these methods can also be applied to other programmable nucleases.
Recently, four different methods have been reported for identifying potential off-target sites of RGENs in a bulk population of cells (Figs. 1C?1F). Genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq) represents an improvement over IDLV capture (Tsai et al., 2015). Blunt-ended, double-stranded phophothiorate oligodeoxynucleotides (dsODNs) can be captured at on-target and off-target sites, when DSBs are repaired by NHEJ in cells. These dsODN integration sites are mapped in the genome by PCR amplification and deep sequencing. High-throughput genomic translocation sequencing (HTGTS) exploits translocations that are induced by erroneous ligation of on-target and off-target sites in cells (Frock et al., 2015). HTGTS identifies off-target sites by using the on-target DSB as a ‘bait’ to catch ‘prey’ sequences that are trans-located to the on-target site. High-throughput sequencing is used to determine prey sequences that correspond to off-target sites.
Off-target DSBs can also be captured in fixed, permeable cells. Breaks labelling, enrichment on streptavidin and next-generation sequencing (BLESS) is performed by labelling DSBs present in fixed cells using biotinylated oligonucleotides, which are then enriched and subjected to deep sequencing (Crosetto et al., 2013; Ran et al., 2015). Because this method captures only the DSBs present at a single moment, many bona fide off-target cleavage sites can be missed, resulting in poor sensitivity.
Cell-free genomic DNA can be used to profile nuclease off-target effects
A key difference between these methods is whether genomic DNA is cleaved in cells or in vitro.
It is unknown to what extent programmable nucleases are limited by chromatin. Because HTGTS, GUIDE-seq, and BLESS profile nuclease cleavage sites in cells, off-target sites captured in one cell type could be different from those in other cell types, owing to differential chromatin accessibility in each cell type. Digenome-seq functions independently of the cell type, because naked, chromatin-free genomic DNA is used.
DSB repair by NHEJ in cells often results in deletions of up to hundreds of base pairs at cleavage sites. As a result, neither HTGTS nor GUIDE-seq can pinpoint off-target sites: one must search for potential off-target sites based on sequence homology around captured sites. Neither BLESS nor Digenome-seq depends on NHEJ, and both can pinpoint cleavage sites at single-nucleotide resolution.
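A homology search of this kind can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: it assumes a 20-nt target followed by a 5'-NGG-3' PAM, scans both strands of a sequence window around a captured site, and reports sites within a chosen number of mismatches. The function names, the mismatch threshold, and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(window: str, target: str, max_mismatches: int = 3):
    """Scan both strands of `window` for sites that resemble `target`.

    A site is reported when it is followed by an NGG PAM (assumed here)
    and differs from `target` by at most `max_mismatches` bases.
    Returns (strand, offset, site, pam, mismatches) tuples; for the "-"
    strand, the offset is counted on the reverse-complemented window.
    """
    target = target.upper()
    n = len(target)
    hits = []
    strands = (("+", window.upper()), ("-", reverse_complement(window.upper())))
    for strand, seq in strands:
        for i in range(len(seq) - n - 3 + 1):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue  # no NGG PAM immediately 3' of this position
            site = seq[i:i + n]
            mismatches = count_mismatches(site, target)
            if mismatches <= max_mismatches:
                hits.append((strand, i, site, pam, mismatches))
    return sorted(hits, key=lambda hit: hit[4])


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    target = "GATTACAGATTACAGATTAC"
    window = "TT" + "GATTACAGATCACAGATTAC" + "TGG" + "AA"
    for hit in find_candidate_sites(window, target):
        print(hit)
```

The window size and mismatch threshold determine how many candidates are reported, so any candidate found this way still needs experimental validation.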
To determine which method is most sensitive and comprehensive, one needs to test the same nucleases using each of these methods. Only one sgRNA, specific to the VEGF-A site, has been tested by HTGTS (Frock et al., 2015), GUIDE-seq (Tsai et al., 2015), and Digenome-seq (Kim et al., 2015) thus far. Each of these methods revealed a different set of potential off-target sites, suggesting that no method is comprehensive. However, most of these candidate sites were invalidated by targeted deep sequencing. Importantly, all three methods identified the same 7 off-target sites, in addition to the on-target site, in the human genome. Notably, Digenome-seq identified one additional bona fide off-target site, with an indel frequency of 0.065%, which was validated using deep sequencing.
First, the choice of unique target sequences, which differ from any other sites in the genome by at least 2 or 3 nucleotides in a 20-nt sequence, is important for avoiding off-target effects (Cho et al., 2014). RGENs discriminate efficiently against potential off-target sites with mismatches in the PAM sequence and the seed region upstream of the PAM sequence. A web-based computer algorithm is available (
Third, paired nickases can generate two single-strand breaks or nicks on different DNA strands, producing a composite DSB and doubling the specificity of genome editing (Fig. 2B) (Cho et al., 2014; Kim et al., 2012; Mali et al., 2013a; Ran et al., 2013). Both ZFNs and Cas9 can be converted to nickases by inactivating one active site. Cas9 has two active sites, each cleaving either the Watson or the Crick strand. Of the two nickase forms, D10A Cas9 appears more efficient and robust than H840A Cas9. One caveat to this approach is that two active sgRNAs are required to make a functional Cas9 nickase pair. Furthermore, target sequences must contain two PAM sequences, limiting the choice of targetable sites.
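The two-PAM requirement can be illustrated with a short sketch that lists candidate nickase pairs in a sequence. Several choices here are assumptions made for illustration and are not stated in the text above: an NGG PAM, 20-nt guides, an arrangement in which the two PAMs face outward (a bottom-strand site appears on the top strand as CCN followed by its target), and an arbitrary maximum gap between the two target sequences. The function name and the example sequence are hypothetical.

```python
import re


def find_nickase_pairs(seq: str, max_gap: int = 30, guide_len: int = 20):
    """List (left_pam_pos, right_pam_pos, gap) for candidate nickase pairs.

    Assumed layout on the top strand (illustrative only):
        CCN [guide_len nt] ... gap ... [guide_len nt] NGG
    `gap` is the number of bases between the two target sequences.
    """
    seq = seq.upper()
    # Bottom-strand sites: CCN on the top strand, followed by the target.
    left_pams = [
        m.start() for m in re.finditer(r"(?=CC[ACGT])", seq)
        if m.start() + 3 + guide_len <= len(seq)
    ]
    # Top-strand sites: the target followed by NGG.
    right_pams = [
        m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq)
        if m.start() - guide_len >= 0
    ]
    pairs = []
    for left in left_pams:
        left_end = left + 3 + guide_len      # first base after the left target
        for right in right_pams:
            right_start = right - guide_len  # first base of the right target
            gap = right_start - left_end
            if 0 <= gap <= max_gap:
                pairs.append((left, right, gap))
    return pairs


if __name__ == "__main__":
    # Made-up sequence: one CCN site and one NGG site separated by 5 bases.
    example = "CCA" + "A" * 20 + "TTTTT" + "A" * 20 + "AGG"
    print(find_nickase_pairs(example))
```

Because both sites must be present in a suitable arrangement, far fewer positions qualify than for a single sgRNA, which is the limitation noted above.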
Fourth, the use of recombinant Cas9 protein [commercially available from ToolGen (
Since we and others reported RNA-guided genome editing in human cells in January 2013 (Cho et al., 2013a; Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013b), the CRISPR-Cas9 system has been widely used in labs around the world to modify genomes in various organisms and cells. Although Cas9 nucleases have off-target effects, whole genome/exome sequencing of gene-modified clones shows that these nucleases are highly specific. Recent methods such as Digenome-seq and GUIDE-seq that profile genome-wide off-target sites in a bulk population of cells reveal a broad spectrum of sgRNA specificities. Certain sgRNAs are remarkably specific, resulting in no measurable off-target mutations, whereas others are promiscuous. To find rules that govern sgRNA specificity, one needs to profile the off-target effects of as many sgRNAs as possible at the genome-wide level. Digenome-seq is appealing in this regard, because it can be multiplexed without increasing the sequencing depth. Hundreds of sgRNAs can be tested in a single assay.
Genome-wide off-target profiling methods yield a list of potential off-target sites that are cleaved under certain conditions. Bona fide off-target sites must be validated by targeted deep sequencing. Unfortunately, validation of off-target effects is limited by the intrinsic error rates of sequencing platforms, which range from 0.01% to 1% (0.1% on average). A more sensitive method is needed to confirm that certain sgRNAs do not induce off-target mutations with indel frequencies below 0.01% in the entire genome. Highly specific and efficient nucleases will enable applications in somatic gene and cell therapy and possibly in human germline genome editing to prevent the transmission of fatal genetic mutations.
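The effect of this error floor can be sketched with a simple binomial calculation. The sketch assumes, for illustration, that sequencing errors produce indel-like reads independently at a fixed background rate, and asks how likely it is to see at least the observed number of indel reads from errors alone. The function names, read depth, and read counts are hypothetical; in practice the background would be estimated from an untreated control sample.

```python
from math import exp, lgamma, log


def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1.0 - p)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that background
    errors alone yield k or more indel-like reads out of n reads."""
    if k <= 0:
        return 1.0
    tail = sum(exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)


if __name__ == "__main__":
    reads = 100_000          # hypothetical read depth at one candidate site
    background = 0.001       # 0.1% error rate, the average quoted above
    for indel_reads in (50, 100, 200):
        p_value = prob_at_least(indel_reads, reads, background)
        freq = 100.0 * indel_reads / reads
        print(f"{indel_reads} indel reads ({freq:.2f}%): "
              f"P(background alone) = {p_value:.3g}")
```

Under these assumptions, an observed indel frequency at or below the background rate cannot be separated from sequencing error regardless of read depth, which is why a more sensitive validation method is needed.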
Web-based tools available for guide-RNA design.
Name | Developer | Address |
---|---|---|
Cas-OFFinder | Jin-Soo Kim lab, Seoul National University | |
CHOPCHOP | George Church lab, Harvard University | |
CRISPR Design | Feng Zhang lab, Massachusetts Institute of Technology | |
CRISPR Design tool | The Broad Institute of Harvard and MIT | |
CRISPR/Cas9 gRNA finder | Jack Lin lab, University of Colorado | |
CRISPRfinder | Christine Pourcel lab, Université Paris-Sud 11 | |
E-CRISP | Boutros lab, DKFZ German Cancer Research Center | |
ZiFiT | Keith Joung lab, Harvard University | |