Four flavors of CRISPR knockout screens

About a year and a half ago, George Church’s lab demonstrated the utility of the CRISPR/Cas9 system for genome engineering in human cells [Mali & Yang 2013]. A short guide RNA (sgRNA) complementary to a 20 bp sequence in the genome complexes with the Cas9 nuclease and leads it to create a targeted double-strand break, thus creating an opportunity for homologous recombination (for knock-ins) or non-homologous end joining (for knockouts). In cultured cells, the technique is fairly efficient at creating knock-ins by homologous recombination (Mali describes rates of 3 to 8%) and incredibly efficient at creating knockouts (we’ve heard rumors of efficiencies of up to 80% homozygous deletion for specific sgRNAs).

While Mali and Yang targeted a specific gene of interest (PPP1R12C) as a demonstration, they bioinformatically predicted 190,000 sgRNAs for targeting loci across the human genome, foreshadowing the use of CRISPR for genome-wide screens. One year later (this spring), four papers appeared in the literature almost synchronously, each demonstrating the use of pooled CRISPR libraries for conducting genetic screens in mammalian cells [Wang 2014, Shalem & Sanjana 2014, Koike-Yusa & Li 2014, Zhou 2014].

Genetic screens involve examining in parallel the effect of a wide variety of genetic mutations on a particular cellular phenotype. These have long been popular in yeast [Forsburg 2001] but until recently were difficult to achieve in mammalian cells, which are diploid and don’t mate with one another in culture. The solution in mammalian cells has been to use RNAi screens – transfect cells with thousands of different siRNAs and see which ones alter the phenotype. But the 50 or 80% knockdown typically achievable with mass-produced siRNAs is often insufficient to confer a strong phenotypic change. CRISPR/Cas9, then, finally provided an opportunity to examine the phenotypic effects of total knockout. Design a library of sgRNAs that create indels via non-homologous end joining in early constitutive exons, and you can knock out nearly every gene.
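To make the library-design step a little more concrete, here is a minimal sketch of how candidate sgRNA target sites might be enumerated in an exon sequence. It relies on one fact not spelled out above: the commonly used S. pyogenes Cas9 only cuts where the 20 bp target is immediately followed by an NGG motif (the PAM). The function name and the toy sequence are invented for illustration, only the given strand is scanned, and this is not the design pipeline of any of the papers discussed here.

```python
import re

def find_protospacers(seq, guide_len=20):
    """Return (start, protospacer, pam) for each guide_len-mer that is
    immediately followed by an NGG PAM on the given strand.

    Hypothetical helper: real library design also scans the reverse
    strand and filters for off-target matches, which is omitted here.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Toy sequence, made up purely for illustration.
toy_exon = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for start, protospacer, pam in find_protospacers(toy_exon):
    print(start, protospacer, pam)
```

A real library would restrict the scan to early constitutive exons and keep several well-behaved sites per gene.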

What is striking about the first four papers to develop this approach is just how similar their strategies are, despite being developed in parallel.

Note, first, that even highly efficient sgRNAs don’t create homozygous deletions in 100% of cells. Therefore if you expose a batch of cells to the CRISPR reagents and then measure their phenotype in bulk, you will be measuring some mix of the homozygous knockout phenotype, heterozygous knockout phenotype, and unaffected cells. This “arrayed CRISPR” approach, at least when paired with bulk phenotype measurement, would not necessarily be much more powerful than RNAi screens. Instead, the power of CRISPR lies in the ability to select clones that do have homozygous deletions. Therefore each of these four papers used a “pooled CRISPR library” approach: transduce a batch of cells with viral vectors containing a mix of different sgRNAs, then progressively select the cells for phenotypes of interest, then measure (using next-generation sequencing) which sgRNAs you’ve thereby depleted or enriched.
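The final "measure which sgRNAs were depleted or enriched" step boils down to comparing sgRNA read counts before and after selection. The sketch below shows one simple way to do that, as a log2 fold change of depth-normalized counts. The function name, the pseudocount, and the toy counts are all assumptions made for illustration, and none of the four papers necessarily computed enrichment this way.

```python
import math

def log2_fold_changes(before, after, pseudocount=1.0):
    """Map each sgRNA to the log2 fold change of its normalized read count.

    before/after: dicts of sgRNA name -> raw read count from sequencing.
    Counts are scaled to reads per million so that samples sequenced to
    different depths are comparable; the pseudocount avoids log(0) for
    sgRNAs that drop out entirely.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for sgrna, count_before in before.items():
        count_after = after.get(sgrna, 0)
        rpm_before = 1e6 * count_before / total_before
        rpm_after = 1e6 * count_after / total_after
        changes[sgrna] = math.log2((rpm_after + pseudocount) / (rpm_before + pseudocount))
    return changes

# Made-up counts: one sgRNA takes over the pool, the others are depleted.
before = {"sgRNA_1": 500, "sgRNA_2": 500, "sgRNA_3": 500}
after = {"sgRNA_1": 1400, "sgRNA_2": 60, "sgRNA_3": 40}
for name, lfc in sorted(log2_fold_changes(before, after).items()):
    print(name, round(lfc, 2))
```

Positive values indicate sgRNAs that were enriched by the selection and negative values indicate sgRNAs that were depleted.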

How, then, to select for phenotypes of interest? All four of these studies used cell death as the phenotype. First, you clone the Cas9 nuclease into a lentivirus and transduce cells with the lentivirus so that Cas9 is stably integrated into their genome. Then you expose the cells to lentiviruses expressing the myriad sgRNAs, and give them anywhere from a couple of days to a couple of weeks for the sgRNAs to be expressed and create indels in target genes. Now you’ve got a collection of cells – all mixed together in one flask, mind you – that have thousands of different genes knocked out. Then you expose the cells to something that normally kills them – say, the chemotherapy drugs 6-thioguanine and etoposide [Wang 2014] or vemurafenib [Shalem & Sanjana 2014], anthrax or diphtheria toxin [Zhou 2014], or Clostridium septicum alpha toxin [Koike-Yusa & Li 2014].

You use these toxins to kill off the cells day by day, sequencing as you go. These are all toxins that will kill any normal cell, and most genes have no effect at all on this toxicity, so overwhelmingly you find that the sgRNAs are under negative selection – the cells with those genes knocked out are getting killed off. By contrast, if there are a few genes that are on-pathway, that actually mediate the toxicity, then knocking them out will render the cells resistant, and the sgRNAs that cause these knockouts will be under massive positive selection. For instance, after a few days of exposure to anthrax toxin, most of the several hundred sgRNAs tested by [Zhou 2014] were virtually eliminated, while predictably enough, sgRNAs targeting the anthrax toxin receptor (ANTXR1) were enriched 1000-fold.

Similarly, the other authors also report that the most obviously expected genes for each toxin were indeed selected. For instance, DNA topoisomerase II (TOP2A) knockout was favored under conditions of etoposide poisoning [Wang 2014]. Two studies each found that sgRNAs targeting four genes in the DNA mismatch repair pathway were enriched when cells were treated with 6-thioguanine [Wang 2014, Koike-Yusa & Li 2014], i.e., knockouts that allowed the cells to ignore their own DNA damage encouraged the cells to proliferate. In addition to showing that known biology could be recapitulated using CRISPR libraries, these studies also reported novel validated hits, revealing new genes involved in, for example, vemurafenib resistance [Shalem & Sanjana 2014].

The broad strokes of the approaches used in these papers are so similar that it takes a very close reading to see what’s different. One study used mouse embryonic stem cells [Koike-Yusa & Li 2014]; the others used human cell lines [Wang 2014, Shalem & Sanjana 2014, Zhou 2014]. One study was circumspect, validating the approach first in the “near-haploid” human cell line KBM7 before moving on to diploid cells [Wang 2014], while the others plunged straight into diploid cell editing. One group created a small (<1000 sgRNAs) library targeting a few hundred candidate genes [Zhou 2014], while the others went genome-wide, or nearly so.

The libraries contained an average of about 3 [Shalem & Sanjana 2014, Zhou 2014], 5 [Koike-Yusa & Li 2014], or 10 [Wang 2014] sgRNAs per targeted gene, and partly as a consequence, the groups used different statistical tests to assess which genes were enriched. For instance, [Zhou 2014] used DESeq2, which assumes a negative binomial distribution of read count data, while [Wang 2014] simply applied a Kolmogorov-Smirnov test to the counts of the 10 sgRNAs for each gene before and after selection.

Some of the studies reported ways to improve the CRISPR library approach. For instance, [Wang 2014] targeted essential ribosomal proteins without which cells cannot live, figured that the selection of sgRNAs under this paradigm should correlate with knockout efficiency, and on that basis reported tips for optimal sgRNA design. One study [Koike-Yusa & Li 2014] provided what was (for me) a helpful explanation of why multiple sgRNAs are needed per gene: genes where only one sgRNA is enriched are likely to be false positives. For instance, they might represent off-target effects of that sgRNA, or they might represent “passenger sgRNAs” in cells where a different “driver sgRNA” confers the resistance phenotype.

I am no expert on this material, so if I’ve missed another important distinction between some of these papers, please leave me a comment below.
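The reasoning about needing multiple sgRNAs per gene can be sketched as a simple filter: only call a gene a hit if at least two independent sgRNAs targeting it are enriched. This is a deliberately crude stand-in, not the DESeq2 or Kolmogorov-Smirnov analyses the papers actually used, and the function name, cutoffs, and toy data are all assumptions for illustration.

```python
from collections import defaultdict

def call_gene_hits(sgrna_lfc, sgrna_to_gene, lfc_cutoff=2.0, min_guides=2):
    """Return a sorted list of genes supported by several enriched sgRNAs.

    sgrna_lfc: dict of sgRNA name -> log2 fold change after selection.
    sgrna_to_gene: dict of sgRNA name -> gene that sgRNA targets.
    A gene is called only if at least min_guides of its sgRNAs reach
    lfc_cutoff, which screens out single-sgRNA artifacts such as
    off-target effects or "passenger" sgRNAs.
    """
    enriched_per_gene = defaultdict(int)
    for sgrna, lfc in sgrna_lfc.items():
        if lfc >= lfc_cutoff:
            enriched_per_gene[sgrna_to_gene[sgrna]] += 1
    return sorted(g for g, n in enriched_per_gene.items() if n >= min_guides)

# Made-up example: GENE_A has two enriched sgRNAs, GENE_B only one.
lfc = {"A_1": 5.1, "A_2": 4.3, "A_3": 0.2, "B_1": 6.0, "B_2": -0.4, "B_3": 0.1}
targets = {name: "GENE_" + name[0] for name in lfc}
print(call_gene_hits(lfc, targets))
```

In this toy data GENE_B's single strongly enriched sgRNA is exactly the kind of result that would be treated as a likely false positive.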

Any way you slice it, pooled CRISPR libraries are an incredibly powerful new tool for genetic screens. The genome-wide libraries have all been deposited in Addgene, so now you too can join in the fun and start screening for genes whose knockout confers a phenotype of your choosing. Though all of these studies used cell death for selection, there seems in principle to be no reason you couldn’t employ, say, FACS to select based on immunoreactivity or reporter gene expression. As time goes on it will be interesting to see what new variations on this method appear in the literature.