Mol. Cells 2021; 44(3): 127-135
Published online March 31, 2021
https://doi.org/10.14348/molcells.2021.0002
© The Korean Society for Molecular and Cellular Biology
Correspondence to: jp24@kaist.ac.kr
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
Since the introduction of RNA sequencing (RNA-seq) as a high-throughput mRNA expression analysis tool, this procedure has been increasingly implemented to identify cell-level transcriptome changes in a myriad of model systems. However, early methods processed cell samples in bulk, and therefore the unique transcriptomic patterns of individual cells were lost due to data averaging. The recent and continuous development of new single-cell RNA sequencing (scRNA-seq) toolkits has enabled researchers to compare transcriptomes at a single-cell resolution, thus facilitating the analysis of individual cellular features and a deeper understanding of cellular functions. At the same time, the rapid evolution of high-throughput single-cell “omics” tools has created the need for effective hypothesis verification strategies. This need can be addressed by coupling cell engineering techniques with single-cell sequencing, an approach that has been successfully employed to gain further insights into disease pathogenesis and the dynamics of differentiation trajectories. This review therefore discusses the current status of cell engineering toolkits and their contributions to single-cell and genome-wide data collection and analyses.
Keywords: cell engineering, CRISPR screening, lineage tracing, single-cell multi-omics
Since the first single-cell transcriptome analysis in 2009, the throughput of single-cell transcriptomic techniques has grown exponentially, allowing for a single study to characterize millions of cells (Svensson et al., 2018). Additionally, single-cell approaches are no longer limited to RNA analyses, but can also be used to characterize DNA and proteins (Lee et al., 2020). This powerful technique has been adopted in many fields of the life sciences and has rapidly expanded our understanding of biological systems. For example, a transcriptome and an open chromatin atlas of the embryonic development process of humans and mice are being created, thus providing new insights into the mechanisms by which gene expression modulates an individual’s developmental process (Cao et al., 2019; 2020; He et al., 2020a; Park et al., 2020; Pijuan-Sala et al., 2020). Moreover, the human immune system’s response to COVID-19 is being currently studied at various levels to gain insights into potential molecular mechanisms that could be targeted to control this disease (Sungnak et al., 2020; Zhang et al., 2020). The Human Cell Atlas is an international collaborative initiative that has provided countless researchers with a platform to produce data cooperatively, as well as to compare and analyze their results and focus their efforts towards a single common goal (Panina et al., 2020; Regev et al., 2017; 2018).
The ever-increasing wealth of single-cell data is deepening our understanding of the structure of the human body. However, to further this understanding, a system that efficiently validates new hypotheses is urgently needed. Interestingly, single-cell techniques can be combined with various cell engineering techniques to provide a platform for efficient hypothesis verification, and several single-cell engineering toolkits are being developed based on this strategy. Therefore, this review will address how cell engineering techniques such as CRISPR screening and lineage tracing are being combined with single-cell techniques to usher in a new era of cell engineering (Table 1).
Early single-cell sequencing approaches depended on amplifying the genetic materials of a single cell to create an RNA-seq library (Tang et al., 2009). Later on, two layers of barcode systems were introduced to increase the throughput of single-cell techniques. Among these, cell barcodes (CBCs) are incorporated during reverse transcription (RT) or template switching steps. This allows for the simultaneous preparation of multiple cells by pooling them after cDNA synthesis, increasing the throughput and efficiency of scRNA-seq library preparation. The second barcoding technique consists of a unique molecular identifier (UMI), which corrects the quantification error caused by polymerase chain reaction (PCR) amplification bias by adding random barcodes during the RT reaction (Kivioja et al., 2011).
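The two barcode layers described above can be illustrated with a minimal sketch. The read format (a tuple of cell barcode, UMI, and gene) and the function name `count_umis` are assumptions made for illustration only; production pipelines additionally correct sequencing errors in barcodes and UMIs, which is omitted here.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, a
    hypothetical simplified format. Reads that share all three values
    are treated as PCR copies of a single original molecule.
    """
    molecules = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        # A set keeps each UMI once, so PCR duplicates are not recounted.
        molecules[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}

# Toy data: five reads from two cells pooled into one library.
reads = [
    ("AAAC", "GTCA", "ACTB"),
    ("AAAC", "GTCA", "ACTB"),  # PCR duplicate of the read above
    ("AAAC", "TTGA", "ACTB"),
    ("CCGT", "GTCA", "ACTB"),
    ("CCGT", "ACGT", "GAPDH"),
]
counts = count_umis(reads)
```

In this toy input, the cell barcode assigns each read back to its cell of origin after pooling, while the UMI prevents the duplicated read from inflating the ACTB count of the first cell.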
In order to separate the cells using primers with unique barcode sequences, a multi-well plate system was first implemented, which allowed for a throughput of 100 to 1,000 cells (Hashimshony et al., 2012; Islam et al., 2011; Ramsköld et al., 2012) (Fig. 1). Afterward, the development of microfluidic systems in which each cell is mixed with CBC-specific RT primer conjugated-beads within a single droplet further increased the throughput to more than 10,000 cells (Klein et al., 2015; Macosko et al., 2015). One of the most recent advancements in this field includes
To sequence the genetic materials of single cells, solid tissues must first be dissociated. However, it is difficult to dissociate tissues with hard-to-release cell types while avoiding damage to fragile tissues. Many attempts have been made to achieve this balance, such as cryopreservation and methanol fixation. However, each of these approaches has its unique limitations (major loss of epithelial cell types and ambient RNA leakage, respectively), which negatively affect the final scRNA-seq results (Denisenko et al., 2020; Slyper et al., 2020). Single-nucleus RNA-seq (snRNA-seq) was developed to solve these problems. snRNA-seq decouples tissue acquisition from immediate sample processing, thus circumventing the inherent difficulties of obtaining fresh tissues for scRNA-seq analyses, as well as the potential loss of sensitive cells due to enzymatic digestion. Therefore, this strategy can be applied to tissues rich in cell types that are difficult to isolate intact (e.g., neurons, adipocytes, and skeletal muscle cells), to archived frozen clinical materials, and to tissues that must be frozen to register into specific coordinates. Moreover, given that snRNA-seq can be used to handle minute frozen specimens, large-scale studies from tissue atlases to longitudinal clinical trials and human genetics can be performed (Ernst et al., 2020; Gaublomme et al., 2019; Rozenblatt-Rosen et al., 2020).
Cell hashing was recently introduced to further increase the throughput of scRNA-seq. In this technique, each sample is labeled with unique ‘hashtag’ barcodes using oligonucleotide-conjugated antibodies (Stoeckius et al., 2017; 2018). This can also be applied to snRNA-seq using DNA-barcoded antibodies targeting the nuclear pore complex (Gaublomme et al., 2019). Moreover, MULTI-seq utilizes chemically modified oligos that can directly stain cellular membranes, making the staining process much easier. Similarly, short barcoding oligo (SBO) barcoding introduces oligos into cells using liposomal transfection, thus enabling single-cell experiment multiplexing. sci-Plex is another newly developed strategy that simplifies sample labeling further by directly labeling nuclei with unmodified single-stranded DNA oligos (Srivatsan et al., 2020). Combined with the sci-RNA-seq combinatorial indexing technique, this approach provides a platform to multiplex hundreds of different conditions, thus rendering a total throughput of millions of cells.
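How hashtag counts translate into sample assignments can be sketched as follows. This is a deliberately simplified threshold heuristic, not the statistical procedure of the cited cell hashing studies; the function name `classify_cell`, the hashtag names, and the thresholds `min_count` and `min_ratio` are assumptions chosen for illustration.

```python
def classify_cell(hashtag_counts, min_count=10, min_ratio=3.0):
    """Assign one cell to a sample from its hashtag oligo counts.

    `hashtag_counts` maps hashtag name -> count for a single cell
    barcode. The thresholds are illustrative assumptions, not
    published defaults.
    """
    ranked = sorted(hashtag_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_tag, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    if top < min_count:
        return "negative"  # no hashtag detected with confidence
    if second >= min_count and top < min_ratio * second:
        return "doublet"   # two samples contribute to one barcode
    return top_tag

singlet = classify_cell({"HTO1": 120, "HTO2": 4, "HTO3": 2})
doublet = classify_cell({"HTO1": 80, "HTO2": 65, "HTO3": 1})
empty = classify_cell({"HTO1": 3, "HTO2": 1, "HTO3": 2})
```

Because a barcode carrying two strong hashtags most likely contains cells from two samples, hashing also exposes inter-sample doublets, which is one reason droplets can be loaded more densely when samples are hashed.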
The process of demultiplexing and doublet detection was further enhanced with the creation of demuxlet, a computational algorithm inspired by algorithms that were initially developed to detect DNA contamination in sequencing samples. Demuxlet requires no oligo or antibody labels; samples can be demultiplexed as long as the genotypes of the pooled individuals are available. Additionally, this approach uses genetic variation to simultaneously demultiplex and detect doublets in multiplexed Drop-seq data from more than two individuals, an achievement that was thought to be impossible prior to the development of this algorithm (Kang et al., 2018). The development of Souporcell then allowed for an increased genotype calling efficiency, thus enabling the determination of genotypes
In the field of functional genomics, diverse cell engineering tools such as shRNAs or CRISPRs have been used to modify gene expression. For instance, a pooled screening strategy has been designed to efficiently and simultaneously test the function of multiple genes (Sharma and Petsalaki, 2018). In pooled screening, cells are targeted by a pool of viral vector libraries with shRNAs or CRISPR sgRNAs, and the relative enrichment of specific DNA sequences in cells with certain phenotypes is measured to identify the genes associated with that phenotype (Joung et al., 2017; Sanjana, 2017; Sims et al., 2011). Despite being favorably used to perform efficient and scalable parallel cell modifications, the effectiveness of this pooled screening approach is limited to simple phenotypes only.
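The enrichment measurement at the core of a pooled screen can be sketched as a log2 fold change of normalized guide frequencies between a phenotype-selected population and a control population. The function name, the guide names, and the pseudocount are assumptions for illustration; dedicated screen analysis tools apply more careful normalization and statistical testing.

```python
import math

def log2_fold_changes(control, selected, pseudocount=1.0):
    """Compare guide frequencies between two cell populations.

    `control` and `selected` map guide name -> read count. A
    pseudocount (an illustrative choice) avoids division by zero
    for guides that drop out of one population.
    """
    n = len(control)
    total_c = sum(control.values()) + pseudocount * n
    total_s = sum(selected.get(g, 0) for g in control) + pseudocount * n
    result = {}
    for guide, c_count in control.items():
        freq_c = (c_count + pseudocount) / total_c
        freq_s = (selected.get(guide, 0) + pseudocount) / total_s
        result[guide] = math.log2(freq_s / freq_c)
    return result

# Hypothetical counts: sgA is enriched among cells with the phenotype.
control = {"sgA": 100, "sgB": 100, "sgC": 100}
selected = {"sgA": 400, "sgB": 100, "sgC": 100}
lfc = log2_fold_changes(control, selected)
```

A readout of this kind yields a single number per guide, which is why pooled screens without single-cell profiling are restricted to phenotypes that can be selected for, such as survival or marker expression.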
To perform pooled screening with scRNA-seq readouts, genetic modifications (e.g., sgRNAs) must be detected with a scRNA-seq technique. However, all available scRNA-seq platforms rely on oligo-dT priming prior to cDNA synthesis, thus capturing only polyadenylated (poly(A)) RNA transcripts. Nonetheless, most genetic modifications used in pooled screening (sgRNAs or shRNAs) do not contain poly(A) tails. In 2016, several independent studies overcame this issue by inserting barcodes associated with individual sgRNAs into poly(A) reporter transcripts, which resulted in successful CRISPR screening at a single-cell resolution (Adamson et al., 2016; Datlinger et al., 2017; Dixit et al., 2016; Jaitin et al., 2016; Xie et al., 2017) (Fig. 2). Further, Jaitin et al. (2016) developed CRISP-seq, an approach that combines pooled CRISPR screening with scRNA-seq. Specifically, CRISPR interference (CRISPRi) was combined with a poly(A) unique guide index (UGI) to characterize cellular responses at a single-cell resolution. Similarly, Dixit et al. (2016) developed Perturb-seq, a conceptually similar method of perturbation analysis that utilizes droplet-based microfluidics in place of the micro-well plate-based format used by CRISP-seq. Mosaic-seq, which couples mosaic single-cell analysis with indexed CRISPR sequencing, employs a lentiviral dCas9-KRAB-blast vector carrying the epigenetic modifier KRAB, a repressor of enhancer function, to quantify enhancer repression at a single-cell resolution (Xie et al., 2017). Moreover, recent advances have enabled the direct detection of sgRNAs by scRNA-seq by incorporating a sgRNA-specific RT primer (Replogle et al., 2020), thus facilitating CRISPR scRNA-seq screening without the need to build complex barcoded libraries.
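Once each cell has been linked to its sgRNA, the transcriptomic effect of a perturbation can be read out by grouping cells by guide. The sketch below assumes hypothetical inputs: `cell_guides` maps each cell to one assigned guide, `expression` holds normalized expression values per cell, and "sgNT" denotes a non-targeting control. It is not the analysis pipeline of any of the cited methods.

```python
from collections import defaultdict
from statistics import mean

def mean_expression_by_guide(cell_guides, expression, gene):
    """Average the expression of `gene` over cells sharing a guide.

    `cell_guides` maps cell ID -> assigned guide; `expression` maps
    cell ID -> {gene: value}. Genes absent from a cell count as 0.0.
    """
    grouped = defaultdict(list)
    for cell, guide in cell_guides.items():
        grouped[guide].append(expression[cell].get(gene, 0.0))
    return {guide: mean(values) for guide, values in grouped.items()}

# Hypothetical toy data: two control cells and two perturbed cells.
cell_guides = {"c1": "sgNT", "c2": "sgNT", "c3": "sgGATA1", "c4": "sgGATA1"}
expression = {
    "c1": {"GATA1": 4.0},
    "c2": {"GATA1": 6.0},
    "c3": {"GATA1": 1.0},
    "c4": {},
}
means = mean_expression_by_guide(cell_guides, expression, "GATA1")
effect = means["sgGATA1"] - means["sgNT"]  # negative value: knockdown
```

The same grouping can be repeated for every measured gene, which is what turns a pooled screen into a transcriptome-wide phenotype for each perturbation.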
Since its development, CRISPR scRNA-seq screening has been applied to address diverse biological questions. For example, Norman et al. (2019) applied Perturb-seq to manipulate a large number of gene pairs and measure the resulting changes in cell state. The authors then created a gene interaction (GI) manifold (high-dimensional surface) that can be interpreted and modeled to gain insights into how complex phenotypes emerge. Such large-scale GI analyses may render important insights into how complex, multigenic interactions govern biological traits and disease risks, such as synthetic lethal interactions in cancer and the discovery of gene targets that lessen the severity of genetic diseases (Norman et al., 2019). To cite another example, Jin et al. (2020) applied Perturb-seq
Even though Perturb-seq was an instrumental breakthrough in scRNA-seq techniques, its widespread implementation remains limited by several inherent flaws. Specifically, this approach is prohibitively costly (even for non-genome-scale screens), lowly expressed genes and small effects are not efficiently measured, and a multiple-testing problem greatly undermines data analysis. To address these problems, Targeted Perturb-seq (TAP-seq) amplifies genes of interest (rather than the whole transcriptome), thus lowering sequencing requirements by up to 50-fold. This alleviates the multiple-hypothesis testing problem encountered in whole-transcriptome screens, increases the sensitivity towards small expression changes and lowly expressed genes, and enables the efficient retrieval of sgRNA identities. This decrease in requisites and increase in sensitivity has broadened the applicability of TAP-seq to a wide range of functional genomics applications, including studies where phenotypes of interest are characterized by expression changes in small gene sets (Schraivogel et al., 2020).
In addition to CRISPR-based engineering, other gene-editing tools are also being incorporated into single-cell analyses. For example, exogenous cDNAs can be identified in scRNA-seq by incorporating specific markers or barcode sequences into their UTR. This scheme has been applied to test the effect of various oncogene variants (Ursu et al., 2020) or to identify transcription factor (TF) sets that can transdifferentiate fibroblasts to neurons (Luginbühl et al., 2019). shRNA-mediated scRNA-seq screens have also been reported, including the identification of shRNAs expressed from pol II transcripts using scRNA-seq (Aarts et al., 2017).
The range of cellular features that can be analyzed with single-cell engineering toolkits evolves as novel screening systems for single-cell sequencing methods are developed. Early screening systems and toolkits were limited to scRNA analyses; however, recent advancements have shifted this focus toward multi-modal single-cell analyses. Various data from different ‘omics’ such as transcriptomics and proteomics (i.e., “multi-omics”) are also being integrated with cell engineering toolkits in single-cell sequencing.
ATAC-seq (assay for transposase-accessible chromatin using sequencing) is an approach that utilizes Tn5 transposases to tag regulatory regions for chromatin accessibility (Buenrostro et al., 2013). Through many collaborative efforts, scATAC-seq was developed to identify chromatin accessibility variations between cell subpopulations within a sample at single-cell resolution (Buenrostro et al., 2015; Chen et al., 2018; Cusanovich et al., 2015). Perturb-ATAC-seq is a method that combines CRISPR guide RNAs and open chromatin sites detected through ATAC-seq with multiplexed CRISPR interference or knockout. Rubin et al. (2019) utilized this approach to compare changes in chromatin states and B lymphoblasts landscapes during CRISPR modifications in broadly-expressed, lineage-specific
Much like the development of Perturb-ATAC-seq, Frangieh et al. (2020) recently developed Perturb-CITE-seq, a method that integrates Perturb-seq and CITE-seq to conduct scRNA-seq profiling and epitope sequencing of single-cell surface proteins under specific perturbations caused by multiplexed CRISPR-mediated gene inactivation. Cellular Indexing of Transcriptomes and Epitopes (CITE-seq) is a high-throughput technique that is widely used to quantify single-cell mRNA and surface protein expression through oligonucleotide-labeled antibodies (Stoeckius et al., 2017). The integration of single-cell CRISPR-Cas9 functional screening and CITE-seq allows for more efficient identification of genes, as many relevant phenotypes are known to be best understood functionally at the protein level rather than at the transcript level (Yang et al., 2020). Expanded CRISPR-compatible CITE by sequencing (ECCITE-seq) was developed by Mimitou et al. (2019) to improve the original CITE-seq toolkit even further to reach a new level of multi-modal applicability in single-cell multi-omics research. ECCITE-seq can be used to profile not only the transcriptome of single cells, but also their cell hashing tags, T cell antigen receptors (clonotypes), CRISPR perturbations (through sgRNA capture), and surface proteins.
Until recently, only surface proteins could be analyzed at a single-cell resolution due to the intricacies of accessing the cell interior for intracellular protein sequencing via cellular fixation (Saliba et al., 2014). Intracellular staining and sequencing (INs-seq) was developed by Katzenelenbogen et al. (2020) to enable intracellular protein immunodetection via cellular fixation, which could then be analyzed using scRNA-seq. Using this approach, the authors fixed and permeabilized cells using a fixative based on methanol and ammonium sulfate solutions that precipitates proteins, inhibits enzymatic activity, and preserves RNA. More importantly, this approach enables immuno-intracellular staining while preserving mRNA integrity. Permeabilized cells were then intracellularly labeled with fluorophore-conjugated antibodies, then sorted by fluorescence-activated cell sorting (FACS) followed by scRNA-seq utilizing plate-based or microfluidics-based approaches. Comparisons with commonly used paraformaldehyde (Thomsen et al., 2016), methanol (Alles et al., 2017), and dithiobis(succinimidyl propionate) (Attar et al., 2018) fixation methods demonstrated that INs-seq preserved mRNA more effectively. INs-seq can therefore be used to characterize intracellular and post-translationally-modified proteins (PTM), signaling pathways, TF, and metabolism-related proteins at a single-cell resolution when coupled with scRNA-seq, thus enabling the analysis of intracellular signals that may not be typical of specific cell lineages.
Deriving lineage relationships between cells within a developing organism has long been a primary focus in the field of developmental biology, with fate mapping methods constantly being created to achieve this goal. Bulk lineage tracing has existed long before single-cell sequencing. However, the possibility of internal heterogeneity in specific cell populations or a few off-target cells has not been comprehensively assessed (Wagner and Klein, 2020). Single-cell screening systems have evolved to not only identify single-cell features, thus increasing the precision of the labeling process to limit potential cellular heterogeneity, but also to link the data obtained from various cells within a cell type to elucidate potential relationships. Lineage tracing can therefore provide critical insights into the pluripotency of stem cells. In turn, the findings gathered from such experiments could be used to treat diseases and aid in the development of regenerative medicine. Retroviral labeling is an earlier method of lineage tracing that utilizes libraries containing reporter transgenes (e.g., beta-galactosidase and GFP) and DNA fragment barcodes for clonal relationship analysis via PCR amplification followed by sequencing (Walsh and Cepko, 1992). However, although this procedure has been successfully used in the past to elucidate lineage relationships, it has a few limitations that must be considered prior to its implementation. Concretely, this method cannot be used to characterize cell lines at a single-cell resolution and is therefore unsuitable in instances where higher-dimensional and more complex data is needed, thus highlighting the limitations of this approach compared to more recent techniques. Retroviral vectors can become spontaneously silent during the experiment, and therefore certain experimental paradigms such as histochemical labeling may become more challenging (Ginsberg and Che, 2004; Mayer et al., 2015). 
Furthermore, with the barcode method implemented in retroviral labeling, only the cells with the ability to divide can pass down the barcode to their progeny, thus limiting the cells and methods that can be used.
Recently developed methods have begun to address many of the limitations of previous techniques such as retroviral labeling, including the implementation of microfluidics platforms in combination with scRNA-seq to characterize both interclonal and intraclonal variability of CD8 T-cells (Kimmerling et al., 2016). One of the most significant recent advancements in this technique was reported by McKenna et al. (2016), in which the authors developed the Genome Editing of Synthetic Target Arrays for Lineage Tracing (GESTALT) method. This approach utilizes CRISPR-Cas9 genome-editing technology to create unique mutation barcode patterns (McKenna et al., 2016). These unique mutations are accumulated after several sequential cellular divisions, which are then recovered using targeted sequencing, thus enabling the identification of lineage relationships between cells. In turn, this allows for a more efficient analysis of development (Kalhor et al., 2018) and does not have as many limitations as previous methods, including potential silencing from retroviral labeling. A disadvantage of GESTALT, however, is that it cannot yet determine the precise anatomical position and cell type of each assayed cell, meaning that other methods may be needed if the aforementioned information is required in a specific experiment. Raj et al. (2018) recently developed scGESTALT, which makes use of large-scale transcriptional profiling via the inDrops microfluidic platform (Zilionis et al., 2017) to extend the CRISPR-Cas9 lineage tracing abilities of the original GESTALT technique (Raj et al., 2018). Therefore, this approach offers a means for more efficient cell fate mapping and the elucidation of clonal relationships at the single-cell level.
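The principle that accumulated edits encode lineage can be illustrated with a small sketch. Each cell is represented by the set of edits recovered from its barcode array, and cells sharing a larger fraction of edits are inferred to have diverged more recently. The edit labels, the function name, and the use of Jaccard similarity are assumptions for illustration; the cited methods reconstruct full lineage trees with dedicated algorithms.

```python
from itertools import combinations

def shared_edit_similarity(cell_edits):
    """Score every pair of cells by the fraction of edits they share.

    `cell_edits` maps cell ID -> set of edit labels (hypothetical
    "site:edit" strings). Returns Jaccard similarity per cell pair.
    """
    sims = {}
    for a, b in combinations(sorted(cell_edits), 2):
        edits_a, edits_b = cell_edits[a], cell_edits[b]
        union = edits_a | edits_b
        sims[(a, b)] = len(edits_a & edits_b) / len(union) if union else 0.0
    return sims

# Toy barcodes: all cells share an early edit at site1; cell1 and cell2
# also share a later edit at site2, so they are more closely related.
cell_edits = {
    "cell1": {"site1:del3", "site2:ins2"},
    "cell2": {"site1:del3", "site2:ins2", "site4:del7"},
    "cell3": {"site1:del3", "site3:del5"},
}
sims = shared_edit_similarity(cell_edits)
closest_pair = max(sims, key=sims.get)
```

In the single-cell variants of these methods, each set of edits is recovered together with the transcriptome of the same cell, so that the inferred relationships can be overlaid on cell type identities.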
ScarTrace is another technique that utilizes CRISPR-Cas9 technology to induce double-strand breaks, thus resulting in different-length insertions or deletions at various positions (scars) to create heritable labels (Alemany et al., 2018). Recent developments have furthered the evolution of CRISPR scarring methods such as ScarTrace and other barcoding methods. For example, iTracer (He et al., 2020b) is a novel technique that takes advantage of previously developed complex barcode libraries (Weinreb et al., 2020) and an inducible Cas9 scarring system similar to the one used in the lineage tracing by nuclease-activated editing of ubiquitous sequences (LINNAEUS) method (Spanjaard et al., 2018) to reconstruct the lineage trajectories and fate decisions of induced pluripotent stem cells (iPSCs). CellTagging is another recently developed method that allows for the parallel capture of lineage information and cell identity through a combinatorial cell indexing approach with lentiviral barcoding (Kong et al., 2020). This protocol consists of generating complex plasmid and lentivirus CellTag libraries to label cells, followed by sequential CellTagging throughout a given biological process. Lentiviral barcodes were incorporated via the integration of a constitutively expressed GFP-encoding gene with random barcodes engineered into the 3’ untranslated region sequence, after which additional barcoding rounds were applied to mark successive lineage restriction events, followed by scRNA-seq analysis. Biddy et al. (2018) used CellTagging on mouse fibroblast cells, focusing on the direct reprogramming of fibroblasts into induced endoderm progenitors (iEPs). Therefore, increasing the throughput of novel lineage tracing methods could provide a robust and more streamlined basis for the elucidation of relationships between larger populations of cells within a cell line.
This review discussed not only the recent developments in scRNA-seq technology and its advantages, but also the increased applicability of single-cell techniques, particularly when coupled with cell engineering toolkits. The combination of genetic perturbation approaches (e.g., CRISPR, CRISPRi, CRISPRa, TF over-expression, and lineage tracing, among others) with multimodal, single-cell, genome-wide phenotyping is revolutionizing the field of functional genomics by generating biometric data at unprecedented rates. Medical research benefits greatly from single-cell modification technologies, as various responses to a myriad of TFs can be studied at a single-cell resolution across multiple cell lines. However, despite the recent developments of perturbation-mediated toolkits such as Perturb-ATAC-seq and Perturb-CITE-seq, many techniques are yet to implement these perturbation methods. Therefore, the future co-implementation of genetic perturbation and lineage tracing into novel methods will be essential to further the field of single-cell multimodal omics. These changes will significantly improve the depth to which the clonal relationships between cells can be characterized and aid in the development of stem cell research. Moreover, INs-seq’s ability to analyze intracellular proteins is one of the most recent breakthroughs in the effort to dissect the role of individual proteins in single-cell physiology. Further integration of intracellular protein analysis and continuous expansion into new modalities will likely enhance our understanding of the various interacting cell lines that comprise the human body, as well as the potential molecular and cellular mechanisms that may be targeted to treat and prevent diseases. 
Finally, increasing the throughput of current approaches from hundreds of thousands to millions of cells would allow for a significantly more streamlined whole-genome profiling of entire organisms, thus surpassing the capacity of current next-generation sequencing technologies.
This work was supported by the POSCO Science Fellowship of the POSCO TJ Park Foundation.
S.L., J.K., and J.E.P. wrote the manuscript. J.E.P. created the figures.
The authors have no potential conflicts of interest to disclose.
Table 1. Representative studies on single-cell engineering
Name of the technique | Perturbation mechanism | Detection mechanism | No. of perturbations | No. of cells | Modality | Reference |
---|---|---|---|---|---|---|
Perturb-seq | CRISPRi | sgRNA barcode | 67 sgRNAs (24 genes) | ~30,000 | RNA | Dixit et al. (2016) |
CRISP-seq | CRISPR KO | sgRNA barcode | 57 sgRNAs (22 genes) | 6,144 | RNA | Jaitin et al. (2016) |
Mosaic-seq | CRISPRi | sgRNA barcode | 241 sgRNAs (71 enhancers) | 12,444 | RNA | Xie et al. (2017) |
CROP-seq | CRISPR KO | sgRNA barcode | 48 sgRNAs (20 genes) | N/A | RNA | Datlinger et al. (2017) |
Perturb-ATAC-seq | CRISPRi | sgRNA direct capture/barcode | ~190 sgRNAs (63 genes) | ~4,300 | Chromatin | Rubin et al. (2019) |
ECCITE-seq | CRISPR KO | sgRNA direct capture | N/A | N/A | RNA, surface protein | Mimitou et al. (2019) |
Convert-seq | cDNA | cDNA sequence | 20 genes (transcription factors) | 466 | RNA | Luginbühl et al. (2019) |
Perturb-CITE-seq | CRISPR KO | sgRNA barcode | ~750 sgRNAs (~250 genes) | ~218,000 | Surface protein | Frangieh et al. (2020) |
Spear-ATAC-seq | CRISPRi | sgRNA DNA PCR | 414 sgRNAs | 104,592 | Chromatin | Pierce et al. (2020) |
Targeted-Perturb-seq (TAP-seq) | CRISPRi | sgRNA barcode | 1,790 enhancers | 231,667 | RNA | Schraivogel et al. (2020) |
Perturb-seq | cDNA | cDNA barcode | 200 cancer gene variants | >300,000 | RNA | Ursu et al. (2020) |
Sean Lee1,2, Jireh Kim1,2, and Jong-Eun Park1,*
1Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea; 2These authors contributed equally to this work.
Correspondence to:jp24@kaist.ac.kr
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
Since the introduction of RNA sequencing (RNA-seq) as a high-throughput mRNA expression analysis tool, this procedure has been increasingly implemented to identify cell-level transcriptome changes in a myriad of model systems. However, early methods processed cell samples in bulk, and therefore the unique transcriptomic patterns of individual cells would be lost due to data averaging. Nonetheless, the recent and continuous development of new single-cell RNA sequencing (scRNA-seq) toolkits has enabled researchers to compare transcriptomes at a single-cell resolution, thus facilitating the analysis of individual cellular features and a deeper understanding of cellular functions. Nonetheless, the rapid evolution of high throughput single-cell “omics” tools has created the need for effective hypothesis verification strategies. Particularly, this issue could be addressed by coupling cell engineering techniques with single-cell sequencing. This approach has been successfully employed to gain further insights into disease pathogenesis and the dynamics of differentiation trajectories. Therefore, this review will discuss the current status of cell engineering toolkits and their contributions to single-cell and genome-wide data collection and analyses.
Keywords: cell engineering, CRISPR screening, lineage tracing, single-cell multi-omics
Since the first single-cell transcriptome analysis in 2009, the throughput of single-cell transcriptomic techniques has grown exponentially, allowing for a single study to characterize millions of cells (Svensson et al., 2018). Additionally, single-cell approaches are no longer limited to RNA analyses, but can also be used to characterize DNA and proteins (Lee et al., 2020). This powerful technique has been adopted in many fields of the life sciences and has rapidly expanded our understanding of biological systems. For example, a transcriptome and an open chromatin atlas of the embryonic development process of humans and mice are being created, thus providing new insights into the mechanisms by which gene expression modulates an individual’s developmental process (Cao et al., 2019; 2020; He et al., 2020a; Park et al., 2020; Pijuan-Sala et al., 2020). Moreover, the human immune system’s response to COVID-19 is being currently studied at various levels to gain insights into potential molecular mechanisms that could be targeted to control this disease (Sungnak et al., 2020; Zhang et al., 2020). The Human Cell Atlas is an international collaborative initiative that has provided countless researchers with a platform to produce data cooperatively, as well as to compare and analyze their results and focus their efforts towards a single common goal (Panina et al., 2020; Regev et al., 2017; 2018).
The ever-increasing wealth of single-cell data is deepening our understanding of the structure of the human body. However, to further this understanding, a system that efficiently validates new hypotheses is urgently needed. Interestingly, single-cell techniques can be combined with various cell engineering techniques to provide a platform for efficient hypothesis verification, and several single-cell engineering toolkits are being developed based on this strategy. Therefore, this review will address how cell engineering techniques such as CRISPR screening and lineage tracing are being combined with single-cell techniques to usher in a new era of cell engineering (Table 1).
Early single-cell sequencing approaches depended on amplifying the genetic materials of a single cell to create an RNA-seq library (Tang et al., 2009). Later on, two layers of barcode systems were introduced to increase the throughput of single-cell techniques. Among these, cell barcodes (CBCs) are incorporated during reverse transcription (RT) or template switching steps. This allows for the simultaneous preparation of multiple cells by pooling them after cDNA synthesis, increasing the throughput and efficiency of scRNA-seq library preparation. The second barcoding technique consists of a unique molecular identifier (UMI), which corrects the quantification error caused by polymerase chain reaction (PCR) amplification bias by adding random barcodes during the RT reaction (Kivioja et al., 2011).
In order to separate the cells using primers with unique barcode sequences, a multi-well plate system was first implemented, which allowed for a throughput of 100 to 1,000 cells (Hashimshony et al., 2012; Islam et al., 2011; Ramsköld et al., 2012) (Fig. 1). Afterward, the development of microfluidic systems in which each cell is mixed with CBC-specific RT primer conjugated-beads within a single droplet further increased the throughput to more than 10,000 cells (Klein et al., 2015; Macosko et al., 2015). One of the most recent advancements in this field includes
To sequence the genetic materials of single cells, solid tissues must first be dissociated. However, it is difficult to dissociate tissues with hard-to-release cell types while avoiding damage to fragile tissues. Many attempts have been made to achieve this balance, such as cryopreservation and methanol fixation. However, each of these approaches has its unique limitations (major loss of epithelial cell types and ambient RNA leakage, respectively), which negatively affect the final scRNA-seq results (Denisenko et al., 2020; Slyper et al., 2020). Single-nucleus RNA-seq (snRNA-seq) was developed to solve these problems. snRNA-seq decouples tissue acquisition from immediate sample processing, thus circumventing the inherent difficulties of obtaining fresh tissues for scRNA-seq analyses, as well as the potential loss of sensitive cells due to enzymatic digestion. Therefore, this strategy can be applied to hard-to-dissociate tissues, such as tissues rich in cell types that are difficult to release (e.g., neurons, adipocytes, and skeletal muscle cells), archived frozen clinical materials, and tissues that must be frozen to register into specific coordinates. Moreover, given that snRNA-seq can be used to handle minute frozen specimens, large-scale studies from tissue atlases to longitudinal clinical trials and human genetics can be performed (Ernst et al., 2020; Gaublomme et al., 2019; Rozenblatt-Rosen et al., 2020).
Cell hashing was recently introduced to further increase the throughput of scRNA-seq. In this technique, each sample is labeled with unique ‘hashtag’ barcodes using oligonucleotide-conjugated antibodies (Stoeckius et al., 2017; 2018). This can also be applied to snRNA-seq using DNA-barcoded antibodies targeting the nuclear pore complex (Gaublomme et al., 2019). Moreover, MULTI-seq utilizes chemically modified oligos that can directly stain cellular membranes, making the staining process much easier. Similarly, short barcoding oligo (SBO) barcoding introduces oligos into cells using liposomal transfection, thus enabling single-cell experiment multiplexing. sci-Plex is another newly developed strategy that simplifies labeling further by directly labeling nuclei with unmodified single-stranded DNA oligos (Srivatsan et al., 2020). Combined with the sci-RNA-seq combinatorial indexing technique, this approach provides a platform to multiplex hundreds of different conditions, thus rendering a total throughput of millions of cells.
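How hashtag counts translate into sample assignments can be sketched as follows. This is a simplified illustration, not the procedure of any published tool: the fixed count `threshold`, the function name `classify_cell`, and the hashtag names are all assumptions, and real demultiplexing methods generally estimate a separate cutoff for each hashtag from the data.

```python
def classify_cell(hashtag_counts, threshold=20):
    """Assign one cell to a sample from its hashtag oligo counts.

    `hashtag_counts` maps hashtag name -> count for a single cell barcode.
    A hashtag is called positive when its count reaches `threshold`
    (an assumed, fixed cutoff used only for illustration).
    """
    positive = sorted(tag for tag, n in hashtag_counts.items() if n >= threshold)
    if not positive:
        return "negative"   # no hashtag detected: empty droplet or unlabeled cell
    if len(positive) > 1:
        return "doublet"    # two samples in one droplet
    return positive[0]      # singlet assigned to its sample of origin


singlet = classify_cell({"HTO_A": 250, "HTO_B": 3})
doublet = classify_cell({"HTO_A": 180, "HTO_B": 210})
negative = classify_cell({"HTO_A": 2, "HTO_B": 5})
```

Because cross-sample doublets carry two hashtags, they can be flagged and removed, which is what permits loading more cells per run.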
The process of demultiplexing and doublet detection was further enhanced with the creation of demuxlet, a computational algorithm inspired by algorithms that were initially developed to detect DNA contamination in sequencing samples. Even without an oligo or antibody, demuxlet allows researchers to hash their samples even when only the genotype is available. Additionally, this approach can simultaneously demultiplex and detect doublets from more than two individuals from multiplexed Drop-seq using genetic variations, an achievement that was thought to be impossible prior to the development of this algorithm (Kang et al., 2018). The development of Souporcell then allowed for an increased genotype calling efficiency, thus enabling the determination of genotypes
In the field of functional genomics, diverse cell engineering tools such as shRNAs or CRISPRs have been used to modify gene expression. For instance, a pooled screening strategy has been designed to efficiently and simultaneously test the function of multiple genes (Sharma and Petsalaki, 2018). In pooled screening, cells are targeted by a pool of viral vector libraries with shRNAs or CRISPR sgRNAs, and the relative enrichment of specific DNA sequences in cells with certain phenotypes is measured to identify the genes associated with that phenotype (Joung et al., 2017; Sanjana, 2017; Sims et al., 2011). Despite being favorably used to perform efficient and scalable parallel cell modifications, the effectiveness of this pooled screening approach is limited to simple phenotypes only.
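The "relative enrichment" readout of a conventional pooled screen can be made concrete with a short sketch. It assumes sgRNA read counts are available for a reference population and a phenotypically selected population; the function name, the pseudocount, and the toy counts are hypothetical choices for illustration.

```python
import math


def log2_enrichment(selected, reference, pseudocount=1.0):
    """Log2 ratio of each guide's frequency in selected vs. reference cells.

    Both arguments map guide name -> read count. Counts are converted to
    frequencies so that differences in sequencing depth cancel out, and a
    pseudocount (an assumed value) avoids division by zero.
    """
    n_guides = len(reference)
    total_sel = sum(selected.values()) + pseudocount * n_guides
    total_ref = sum(reference.values()) + pseudocount * n_guides
    scores = {}
    for guide, ref_count in reference.items():
        sel_freq = (selected.get(guide, 0) + pseudocount) / total_sel
        ref_freq = (ref_count + pseudocount) / total_ref
        scores[guide] = math.log2(sel_freq / ref_freq)
    return scores


# Toy counts: sgA is over-represented among cells with the phenotype.
reference = {"sgA": 100, "sgB": 100, "sgC": 100}
selected = {"sgA": 400, "sgB": 100, "sgC": 100}
scores = log2_enrichment(selected, reference)
```

Each guide is reduced to a single number per phenotype, which illustrates why this readout suits simple phenotypes such as survival or a sortable marker but cannot describe a transcriptome-wide response.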
To perform pooled screening with scRNA-seq readouts, the genetic perturbation in each cell (e.g., its sgRNA) must be detectable by scRNA-seq. However, widely used scRNA-seq platforms rely on oligo-dT priming for cDNA synthesis and therefore capture only polyadenylated (poly(A)) RNA transcripts, whereas most perturbation reagents used in pooled screening (sgRNAs or shRNAs) do not contain poly(A) tails. Beginning in 2016, several independent studies overcame this issue by inserting barcodes associated with individual sgRNAs into poly(A) reporter transcripts, which resulted in successful CRISPR screening at a single-cell resolution (Adamson et al., 2016; Datlinger et al., 2017; Dixit et al., 2016; Jaitin et al., 2016; Xie et al., 2017) (Fig. 2). For example, Jaitin et al. (2016) developed CRISP-seq, an approach that combines pooled CRISPR screening with scRNA-seq. Specifically, CRISPR interference (CRISPRi) was combined with a poly(A) unique guide index (UGI) to characterize cellular responses at a single-cell resolution. Dixit et al. (2016) developed Perturb-seq, a conceptually similar method that uses droplet-based microfluidics in place of the micro-well plate-based workflow of CRISP-seq. Mosaic-seq (mosaic single-cell analysis by indexed CRISPR sequencing) used a lentiviral dCas9-KRAB-blast vector, which carries the epigenetic modifier KRAB, a repressor of enhancer function, to quantify enhancer repression at a single-cell resolution (Xie et al., 2017). Moreover, recent advances have enabled the direct detection of sgRNAs by scRNA-seq by incorporating a sgRNA-specific RT primer (Replogle et al., 2020), thus facilitating CRISPR scRNA-seq screening without the need to build complex barcoded libraries.
Since its development, CRISPR scRNA-seq screening has been applied to address diverse biological questions. For example, Norman et al. (2019) applied Perturb-seq to manipulate a large number of gene pairs and measure the resulting changes in cell state. The authors then created a gene interaction (GI) manifold (high-dimensional surface) that can be interpreted and modeled to gain insights into how complex phenotypes emerge. Such large-scale GI analyses may render important insights into how complex, multigenic interactions govern biological traits and disease risks, such as synthetic lethal interactions in cancer and the discovery of gene targets that lessen the severity of genetic diseases (Norman et al., 2019). To cite another example, Jin et al. (2020) applied Perturb-seq
Even though Perturb-seq was an instrumental breakthrough in scRNA-seq techniques, its widespread implementation remains limited due to its inherent flaws. Specifically, this approach is prohibitively costly (even for non-genome-scale screens), lowly expressed genes and small effects are not efficiently measured, and a multiple-testing problem greatly undermines data analysis. To solve this problem, Targeted Perturb-seq (TAP-seq) amplifies genes of interest (rather than the whole transcriptome), thus lowering sequencing requirements up to 50-fold. This solves the multiple-hypothesis testing problem encountered in whole transcriptome screens, increases the sensitivity towards small expression changes and lowly expressed genes, and enables the efficient retrieval of sgRNA identities. This decrease in requisites and increase in sensitivity has broadened the applicability of TAP-seq to a wide range of functional genomics applications, including studies where phenotypes of interest are characterized by expression changes in small gene sets (Schraivogel et al., 2020).
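The effect of restricting the readout to a gene panel on the multiple-testing burden can be shown with simple arithmetic. The sketch below uses a Bonferroni correction purely as an illustration (the source does not state which correction is used), and the numbers of perturbations and genes are assumed values rather than figures from the TAP-seq study.

```python
def bonferroni_threshold(n_perturbations, n_genes, alpha=0.05):
    """Per-test p-value cutoff when every perturbation-gene pair is tested."""
    return alpha / (n_perturbations * n_genes)


# Assumed, illustrative screen sizes.
whole_transcriptome = bonferroni_threshold(n_perturbations=1000, n_genes=20000)
targeted_panel = bonferroni_threshold(n_perturbations=1000, n_genes=100)
fold_relaxation = targeted_panel / whole_transcriptome
```

Under these assumptions, testing 100 genes instead of 20,000 relaxes the per-test cutoff 200-fold, so smaller expression changes can reach significance with the same number of cells.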
In addition to CRISPR-based engineering, other gene-editing tools are also being incorporated into single-cell analyses. For example, exogenous cDNAs can be identified in scRNA-seq by incorporating specific markers or barcode sequences into their UTR. This scheme has been applied to test the effect of various oncogene variants (Ursu et al., 2020) or to identify transcription factor (TF) sets that can transdifferentiate fibroblasts to neurons (Luginbühl et al., 2019). shRNA-mediated scRNA-seq screens have also been reported, including the identification of shRNAs expressed from pol II transcripts using scRNA-seq (Aarts et al., 2017).
The range of cellular features that can be analyzed with single-cell engineering toolkits evolves as novel screening systems for single-cell sequencing methods are developed. Early screening systems and toolkits were limited to scRNA analyses; however, recent advancements have shifted this focus toward multi-modal single-cell analyses. Various data from different ‘omics’ such as transcriptomics and proteomics (i.e., “multi-omics”) are also being integrated with cell engineering toolkits in single-cell sequencing.
ATAC-seq (assay for transposase-accessible chromatin using sequencing) is an approach that utilizes Tn5 transposases to tag regulatory regions for chromatin accessibility (Buenrostro et al., 2013). Through many collaborative efforts, scATAC-seq was developed to identify chromatin accessibility variations between cell subpopulations within a sample at single-cell resolution (Buenrostro et al., 2015; Chen et al., 2018; Cusanovich et al., 2015). Perturb-ATAC-seq is a method that combines CRISPR guide RNAs and open chromatin sites detected through ATAC-seq with multiplexed CRISPR interference or knockout. Rubin et al. (2019) utilized this approach to compare changes in chromatin states and B lymphoblasts landscapes during CRISPR modifications in broadly-expressed, lineage-specific
Much like the development of Perturb-ATAC-seq, Frangieh et al. (2020) recently developed Perturb-CITE-seq, a method that integrates Perturb-seq and CITE-seq to conduct scRNA-seq profiling and epitope sequencing of single-cell surface proteins under specific perturbations caused by multiplexed CRISPR-mediated gene inactivation. Cellular Indexing of Transcriptomes and Epitopes (CITE-seq) is a high-throughput technique that is widely used to quantify single-cell mRNA and surface protein expression through oligonucleotide-labeled antibodies (Stoeckius et al., 2017). The integration of single-cell CRISPR-Cas9 functional screening and CITE-seq allows for more efficient identification of genes, as many relevant phenotypes are known to be best understood functionally at the protein level rather than at the transcript level (Yang et al., 2020). Expanded CRISPR-compatible CITE by sequencing (ECCITE-seq) was developed by Mimitou et al. (2019) to improve the original CITE-seq toolkit even further to reach a new level of multi-modal applicability in single-cell multi-omics research. ECCITE-seq can be used to profile not only the transcriptome but also cell hashtags, T cell antigen receptors (clonotypes), surface proteins, and CRISPR perturbations (through sgRNA capture) in single cells.
Until recently, only surface proteins could be analyzed at a single-cell resolution due to the intricacies of accessing the cell interior for intracellular protein sequencing via cellular fixation (Saliba et al., 2014). Intracellular staining and sequencing (INs-seq) was developed by Katzenelenbogen et al. (2020) to enable intracellular protein immunodetection via cellular fixation, which could then be analyzed using scRNA-seq. Using this approach, the authors fixed and permeabilized cells using a fixative based on methanol and ammonium sulfate solutions that precipitates proteins, inhibits enzymatic activity, and preserves RNA. More importantly, this approach enables immuno-intracellular staining while preserving mRNA integrity. Permeabilized cells were then intracellularly labeled with fluorophore-conjugated antibodies, then sorted by fluorescence-activated cell sorting (FACS) followed by scRNA-seq utilizing plate-based or microfluidics-based approaches. Comparisons with commonly used paraformaldehyde (Thomsen et al., 2016), methanol (Alles et al., 2017), and dithiobis(succinimidyl propionate) (Attar et al., 2018) fixation methods demonstrated that INs-seq preserved mRNA more effectively. INs-seq can therefore be used to characterize intracellular and post-translationally-modified proteins (PTM), signaling pathways, TF, and metabolism-related proteins at a single-cell resolution when coupled with scRNA-seq, thus enabling the analysis of intracellular signals that may not be typical of specific cell lineages.
Deriving lineage relationships between cells within a developing organism has long been a primary focus in the field of developmental biology, with fate mapping methods constantly being created to achieve this goal. Bulk lineage tracing has existed long before single-cell sequencing. However, the possibility of internal heterogeneity in specific cell populations or a few off-target cells has not been comprehensively assessed (Wagner and Klein, 2020). Single-cell screening systems have evolved to not only identify single-cell features, thus increasing the precision of the labeling process to limit potential cellular heterogeneity, but also to link the data obtained from various cells within a cell type to elucidate potential relationships. Lineage tracing can therefore provide critical insights into the pluripotency of stem cells. In turn, the findings gathered from such experiments could be used to treat diseases and aid in the development of regenerative medicine. Retroviral labeling is an earlier method of lineage tracing that utilizes libraries containing reporter transgenes (e.g., beta-galactosidase and GFP) and DNA fragment barcodes for clonal relationship analysis via PCR amplification followed by sequencing (Walsh and Cepko, 1992). However, although this procedure has been successfully used in the past to elucidate lineage relationships, it has a few limitations that must be considered prior to its implementation. Concretely, this method cannot be used to characterize cell lines at a single-cell resolution and is therefore unsuitable in instances where higher-dimensional and more complex data is needed, thus highlighting the limitations of this approach compared to more recent techniques. Retroviral vectors can become spontaneously silent during the experiment, and therefore certain experimental paradigms such as histochemical labeling may become more challenging (Ginsberg and Che, 2004; Mayer et al., 2015). 
Furthermore, with the barcode method implemented in retroviral labeling, only the cells with the ability to divide can pass down the barcode to their progeny, thus limiting the cells and methods that can be used.
Recently developed methods have begun to address many of the limitations of previous techniques such as retroviral labeling, including the implementation of microfluidics platforms in combination with scRNA-seq to characterize both interclonal and intraclonal variability of CD8 T-cells (Kimmerling et al., 2016). One of the most significant recent advancements in this technique was reported by McKenna et al. (2016), in which the authors developed the Genome Editing of Synthetic Target Arrays for Lineage Tracing (GESTALT) method. This approach utilizes CRISPR-Cas9 genome-editing technology to create unique mutation barcode patterns (McKenna et al., 2016). These unique mutations are accumulated after several sequential cellular divisions, which are then recovered using targeted sequencing, thus enabling the identification of lineage relationships between cells. In turn, this allows for a more efficient analysis of development (Kalhor et al., 2018) and does not have as many limitations as previous methods, including potential silencing from retroviral labeling. A disadvantage of GESTALT, however, is that it cannot yet determine the precise anatomical position and cell type of each assayed cell, meaning that other methods may be needed if the aforementioned information is required in a specific experiment. Raj et al. (2018) recently developed scGESTALT, which makes use of large-scale transcriptional profiling via the inDrops microfluidic platform (Zilionis et al., 2017) to extend the CRISPR-Cas9 lineage tracing abilities of the original GESTALT technique (Raj et al., 2018). Therefore, this approach offers a means for more efficient cell fate mapping and the elucidation of clonal relationships at the single-cell level.
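The principle that accumulated edits encode relatedness can be sketched in a few lines. The representation below is an assumption made for illustration: each cell's barcode is a tuple with one entry per target site, `"-"` marks an unedited site, and labels such as `"2D"` are hypothetical edit names. Published reconstructions use phylogenetic methods rather than this pairwise comparison.

```python
UNEDITED = "-"


def shared_edits(barcode_a, barcode_b):
    """Count target sites that carry the same edit in both barcodes."""
    return sum(
        1
        for a, b in zip(barcode_a, barcode_b)
        if a == b and a != UNEDITED
    )


def closest_relative(name, barcodes):
    """Return the other cell sharing the most edits with `name`."""
    others = [cell for cell in barcodes if cell != name]
    return max(others, key=lambda cell: shared_edits(barcodes[name], barcodes[cell]))


# Toy barcodes with four target sites each (hypothetical edits).
barcodes = {
    "cell1": ("2D", "-", "5I", "-"),
    "cell2": ("2D", "-", "5I", "3D"),  # shares two edits with cell1
    "cell3": ("2D", "7D", "-", "-"),   # shares only the earliest edit
    "cell4": ("-", "-", "-", "1I"),    # unrelated lineage
}
relative = closest_relative("cell1", barcodes)
```

Edits shared by many cells are interpreted as early events and edits shared by few cells as late events, which is the basis for ordering branches in a lineage tree.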
ScarTrace is another technique that utilizes CRISPR-Cas9 technology to induce double-strand breaks, thus resulting in different-length insertions or deletions at various positions (scars) to create heritable labels (Alemany et al., 2018). Recent developments have furthered the evolution of CRISPR scarring methods such as ScarTrace and other barcoding methods. For example, iTracer (He et al., 2020b) is a novel technique that takes advantage of previously developed complex barcode libraries (Weinreb et al., 2020) and an inducible Cas9 scarring system similar to the one used in the lineage tracing by nuclease-activated editing of ubiquitous sequences (LINNAEUS) method (Spanjaard et al., 2018) to reconstruct the lineage trajectories and fate decisions of induced pluripotent stem cells (iPSCs). CellTagging is another recently developed method that allows for the parallel capture of lineage information and cell identity through a combinatorial cell indexing approach with lentiviral barcoding (Kong et al., 2020). This protocol consists of generating complex plasmid and lentivirus CellTag libraries to label cells, followed by sequential CellTagging throughout a given biological process. Lentiviral barcodes were incorporated via the integration of a constitutively expressed GFP-encoding gene with random barcodes engineered into the 3’ untranslated region sequence, after which additional barcoding rounds were applied to mark successive lineage restriction events, followed by scRNA-seq analysis. Biddy et al. (2018) used CellTagging on mouse fibroblast cells, focusing on the direct reprogramming of fibroblasts into induced endoderm progenitors (iEPs). Therefore, increasing the throughput of novel lineage tracing methods could provide a robust and more streamlined basis for the elucidation of relationships between larger populations of cells within a cell line.
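Grouping cells into clones from combinatorial lentiviral barcodes can be sketched as follows. The approach shown, linking cells whose barcode sets have a high Jaccard similarity and taking connected groups as clones, is offered as an illustration under assumptions: the `min_similarity` cutoff of 0.7, the function names, and the tag labels are hypothetical and are not taken from the CellTagging protocol as described here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of barcodes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def call_clones(cell_tags, min_similarity=0.7):
    """Group cells into clones by similarity of their barcode sets.

    `cell_tags` maps cell name -> set of detected barcodes. Cells whose
    Jaccard similarity reaches `min_similarity` (an assumed cutoff) are
    linked, and each connected group of cells is reported as one clone.
    """
    parent = {cell: cell for cell in cell_tags}

    def find(cell):
        # Follow parent links to the group representative (union-find).
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    for a, b in combinations(cell_tags, 2):
        if jaccard(cell_tags[a], cell_tags[b]) >= min_similarity:
            parent[find(a)] = find(b)

    clones = {}
    for cell in cell_tags:
        clones.setdefault(find(cell), set()).add(cell)
    return sorted(clones.values(), key=sorted)


cell_tags = {
    "cell1": {"T1", "T2", "T3"},
    "cell2": {"T1", "T2", "T3"},
    "cell3": {"T1", "T2", "T3", "T4"},  # one extra tag: similarity 0.75
    "cell4": {"T9"},
}
clones = call_clones(cell_tags)
```

A similarity cutoff below 1 tolerates barcodes that are missed in some cells because of transcript dropout, at the cost of occasionally merging distinct clones.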
This review discussed not only the recent developments in scRNA-seq technology and its advantages, but also the increased applicability of single-cell techniques, particularly when coupled with cell engineering toolkits. The combination of genetic perturbation approaches (e.g., CRISPR, CRISPRi, CRISPRa, TF over-expression, and lineage tracing, among others) with multimodal, single-cell, genome-wide phenotyping is revolutionizing the field of functional genomics by generating biometric data at unprecedented rates. Medical research benefits greatly from single-cell modification technologies, as various responses to a myriad of TFs can be studied at a single-cell resolution across multiple cell lines. However, despite the recent developments of perturbation-mediated toolkits such as Perturb-ATAC-seq and Perturb-CITE-seq, many techniques are yet to implement these perturbation methods. Therefore, the future co-implementation of genetic perturbation and lineage tracing into novel methods will be essential to further the field of single-cell multimodal omics. These changes will significantly improve the depth to which the clonal relationships between cells can be characterized and aid in the development of stem cell research. Moreover, INs-seq’s ability to analyze intracellular proteins is one of the most recent breakthroughs in the effort to dissect the role of individual proteins in single-cell physiology. Further integration of intracellular protein analysis and continuous expansion into new modalities will likely enhance our understanding of the various interacting cell lines that comprise the human body, as well as the potential molecular and cellular mechanisms that may be targeted to treat and prevent diseases. 
Finally, increasing the throughput of current approaches from hundreds of thousands to millions of cells would allow for a significantly more streamlined whole-genome profiling of entire organisms, thus surpassing the capacity of current next-generation sequencing technologies.
This work was supported by the POSCO Science Fellowship of the POSCO TJ Park Foundation.
S.L., J.K., and J.E.P. wrote the manuscript. J.E.P. created the figures.
The authors have no potential conflicts of interest to disclose.
Table 1. Representative studies on single-cell engineering.
Name of the technique | Perturbation mechanism | Detection mechanism | No. of perturbations | No. of cells | Modality | Reference |
---|---|---|---|---|---|---|
Perturb-seq | CRISPRi | sgRNA barcode | 67 sgRNAs (24 genes) | ~30,000 | RNA | Dixit et al. (2016) |
CRISP-seq | CRISPR KO | sgRNA barcode | 57 sgRNAs (22 genes) | 6,144 | RNA | Jaitin et al. (2016) |
Mosaic-seq | CRISPRi | sgRNA barcode | 241 sgRNAs (71 enhancers) | 12,444 | RNA | Xie et al. (2017) |
CROP-seq | CRISPR KO | sgRNA barcode | 48 sgRNAs (20 genes) | N/A | RNA | Datlinger et al. (2017) |
Perturb-ATAC-seq | CRISPRi | sgRNA direct capture/barcode | ~190 sgRNAs (63 genes) | ~4,300 | Chromatin | Rubin et al. (2019) |
ECCITE-seq | CRISPR KO | sgRNA direct capture | N/A | N/A | RNA, surface protein | Mimitou et al. (2019) |
Convert-seq | cDNA | cDNA sequence | 20 genes (transcription factors) | 466 | RNA | Luginbühl et al. (2019) |
Perturb-CITE-seq | CRISPR KO | sgRNA barcode | ~750 sgRNAs (~250 genes) | ~218,000 | Surface protein | Frangieh et al. (2020) |
Spear-ATAC-seq | CRISPRi | sgRNA DNA PCR | 414 sgRNAs | 104,592 | Chromatin | Pierce et al. (2020) |
Targeted-Perturb-seq (TAP-seq) | CRISPRi | sgRNA barcode | 1,790 enhancers | 231,667 | RNA | Schraivogel et al. (2020) |
Perturb-seq | cDNA | cDNA barcode | 200 cancer gene variants | >300,000 | RNA | Ursu et al. (2020) |