The world now has two major technologies for targeted editing of the genome: zinc finger nucleases (ZFNs) and TAL effector nucleases (TALENs).  In principle these technologies are capable of targeting virtually any site in the genome and editing the DNA that is there.  These have already begun to revolutionize research and medicine, and their potential over the coming years seems enormous, so I sat down to wrap my head around how they work.

Gene targeting, while fairly new, isn’t as new as ZFNs or TALENs.  In the late 1980s, Capecchi, Evans and Smithies figured out that the cellular process of homologous recombination could be hijacked to swap out genomic DNA for a new sequence of interest.  Homologous recombination – where enzymes swap two similar pieces of DNA for each other – is a mechanism native to all living organisms.  Bacteria use it to exchange useful genes with one another; we animals use it to criss-cross our pairs of chromosomes during meiosis, which is important because it allows natural selection to act on individual genes rather than entire chromosomes. Otherwise a really beneficial gene could never enjoy positive selection if it happened to be just downstream from a really bad gene.  Anyway, Capecchi, Evans and Smithies figured out that if you could just create DNA homologous (i.e. similar, not identical) to an existing gene in an organism and get that DNA into the nucleus of a cell – which requires electroporation to permeabilize the cell membrane – there was some chance it would recombine and be integrated into the cell’s genome.  By applying this to mouse embryonic stem cells and recombining mutant, non-functional versions of genes into the mouse genome to replace working genes, they invented the knockout mouse, an achievement that won them a 2007 Nobel Prize.

That achievement has been the foundation of pretty much all reverse genetics (i.e. create a genotype, study the resulting phenotype) since then.  But just crossing your fingers and hoping for the homologous DNA to recombine is a fairly inefficient process – only a very small fraction of cells get successful recombination.

The process is inefficient in part because homologous recombination requires that the DNA first undergo a double-stranded break at a relevant location, and then the repair mechanisms have to repair that break by fusing the new (rather than old) DNA to the broken end.  So if you were to sit down and brainstorm ways to try to speed this process up, you might hit upon the idea of cutting the genomic DNA first, to create the double-stranded break.

Biology has just such a mechanism for creating double-stranded breaks: restriction enzymes. Some species of bacteria evolved these enzymes to chop up the DNA of invading bacteriophages (viruses that infect bacteria).  The bacteria methylate those sequences of their own DNA to prevent friendly fire. Brilliant.  For decades now, we humans have hijacked this mechanism for all manner of biology protocols.  If you think back, you might remember that in the days before sequencing, we used them for DNA fingerprinting in forensics and paternity tests.  Restriction enzymes have single nucleotide resolution in their binding specificity, so one SNP can be the difference between having a restriction site and not having a restriction site.  Therefore if you expose two different people’s DNA to the same restriction enzyme, it will cut them in different places, resulting in fragments of different lengths.  These differences in lengths, called restriction fragment length polymorphisms or RFLPs, featured prominently in the OJ trial.

The trouble with restriction enzymes is that their recognition sequences are really short, and therefore any given restriction enzyme will cut in thousands of places or more in a large genome.  For instance, the EcoRI and SmaI restriction enzymes’ recognition sites:

…

Just six bases each.  What was needed, instead, was a restriction enzyme that would recognize a much longer sequence – long enough to be unique to one position in the whole genome.
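
To make the "too many sites" problem concrete, here is a minimal Python sketch. It scans a made-up toy sequence for the EcoRI site (GAATTC) and estimates how often any six-base site would turn up by chance. The 3 billion base genome size and the assumption of equal, independent base frequencies are simplifications of mine, and the function names are purely illustrative.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every match of `site` in `sequence`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are not missed.
        start = sequence.find(site, start + 1)
    return positions


def expected_sites(genome_length: int, site_length: int) -> float:
    """Expected number of matches in a random sequence with equal base frequencies."""
    return genome_length / 4 ** site_length


toy = "ttGAATTCaaccGAATTCgg"  # made-up sequence, not real genomic DNA
print(find_sites(toy, "GAATTC"))                 # [2, 12]
print(round(expected_sites(3_000_000_000, 6)))   # 732422
```

Real genomes are not random, so the true count for any one enzyme will differ, but the order of magnitude is the point: a six-base site is nowhere near unique.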

Meganucleases are just that.  Despite the prefix, they don’t recognize a 1 million base pair sequence, just a 12-40 base pair sequence, but in many cases that is already enough to have one, and only one, recognition site in an entire organism’s genome.  So if the genomic site you want to target just happens to bear the sequence recognized by a known meganuclease, then you’re in business.  But there are 4^18/2 possible 18 base pair sequences you might want to target, and you’re not just going to stumble upon 4^18/2 different meganucleases to do the job.  People have tried using random mutagenesis screens to generate new meganucleases, with some success, but we’re still far from being able to target just any possible sequence.
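
For a feel of the numbers, here is a quick back-of-the-envelope sketch in Python. My reading of the division by two is that a sequence and its reverse complement describe the same double-stranded site (this slightly undercounts palindromes, which pair with themselves). The 3 billion base genome and the uniform random-sequence model are assumptions of mine, not figures from any paper.

```python
def distinct_double_stranded_sites(length: int) -> int:
    """Approximate number of distinct double-stranded sites of a given length.

    4**length counts single-strand sequences; halving treats a sequence and
    its reverse complement as one site (palindromes are slightly undercounted).
    """
    return 4 ** length // 2


def expected_chance_hits(genome_length: int, site_length: int) -> float:
    """Expected chance matches to one site, on either strand, in a random genome."""
    return genome_length / distinct_double_stranded_sites(site_length)


print(f"{distinct_double_stranded_sites(18):,}")             # 34,359,738,368
print(round(expected_chance_hits(3_000_000_000, 18), 3))     # 0.087
```

So under these assumptions an 18 base pair site is expected to appear by chance well under once per genome, which is why 18 bases is roughly where "unique" starts to be plausible.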

Meganucleases have proven useful and you can buy them from Cellectis today – they’re by no means out of the picture.  But they never revolutionized science because they just weren’t programmable enough.  You couldn’t design them from the ground up to do your bidding.

Enter the Cys2His2 zinc finger protein.  These are sequence-specific DNA-binding proteins that act as transcription factors, i.e. regulating gene expression by binding selectively to gene promoters.  Lots of species have them, including us humans (the EGR1 gene encodes a zinc finger transcription factor).  But while we have thousands of transcription factors that are no more ‘reprogrammable’ than a meganuclease, the beauty of the zinc finger is that it’s modular.  Each zinc finger protein has two to four domains (the ‘fingers’), each with one α-helix that fits into the major groove of a DNA double helix and binds selectively to a 3 base pair sequence.

The most commonly used zinc fingers have three ‘fingers’ (DNA-binding domains), meaning you’ve got 9-base specificity.  The zinc finger proteins are then used in pairs, giving you 18-base specificity. This is enough to uniquely target almost any sequence you want to.

Zinc fingers, by themselves, just bind DNA.  So if all you needed to do was design a protein to selectively bind a promoter to activate a gene, you’d be done right there.   But if you want to use them to create targeted breaks in DNA, you need a pair of scissors to go with them.  Naturally you’d look around for another DNA-cleaving protein or domain of a protein that you could fuse to the zinc fingers.  All restriction enzymes have DNA-cleaving capability, but you know how complicated protein folding is: most of them have the DNA-cleaving capability all bound up with the DNA-binding capability and you can’t cleanly separate the two.  But the FokI restriction enzyme, isolated by Sugisaki & Kanazawa 1981,  has just what you need: a DNA-cleaving domain that has no sequence-specificity at all and can function completely separate from the enzyme’s DNA-binding domain.

Now picture two zinc finger proteins, each with three fingers for a total of 18 base pair specificity, bound upstream and downstream of a site of interest, each fused to one FokI DNA-cleaving domain so that the two FokI domains sit in between them.  That’s a zinc finger nuclease.  People have even engineered the FokI cleavage domains on the two proteins to differ slightly, so that they can cleave DNA only as heterodimers (a combination of the two different FokI domains) rather than as homodimers, which reduces off-target effects.  You can also go longer for more specificity: CompoZr custom ZFNs, sold by Sigma-Aldrich under sole commercial license to sell ZFNs, recognize 24 – 36 bases.  (By the way, although ZFNs are patented you can still legally roll your own, and the Zinc Finger Consortium provides open source protocols for synthesizing ZFNs as well as software to design them.)

The design of zinc fingers to target a specific DNA sequence is not quite as perfect or elegant as you might hope: try Googling ‘zinc finger code’ and see if you have any better luck than I did.  I was hoping for a beautiful table showing which amino acid sequences in the alpha helix correspond to which DNA base triplets.  It seems it’s not quite that simple because the fingers interact with each other and change each other’s folding properties: “In both natural and designed multifinger ZFPs… individual fingers do not always function as completely modular units” [Hurt 2003].  So there is still a bit of trial and error and you can’t always target the exact sequence you want to and you might need to settle for another site a little ways away from your ideal site.

Zinc finger nucleases are all grown up at this point, having already made it to a clinical trial.    But all the buzz right now is about a new, comparatively untested but very promising alternative: transcription activator-like effector nucleases, or TALENs.

Like ZFNs and most other cool things in biology, humans didn’t invent TALENs from scratch – instead we hijacked biological tools from a highly specialized organism that had evolved to do what we want to do. TALENs make use of DNA-binding proteins from the Xanthomonas genus of bacteria. Xanthomonas are plant pathogens that produce proteins to specifically bind to and up- or down-regulate promoter sequences in host genes important to the disease process – much like a transcription activator would. Hence the “transcription activator-like” part of the name. The central region of the TAL protein is made up of 17 to 18 repeats of the same 34 amino acid unit, with amino acids 12 and 13 of each unit determining which single DNA nucleotide it binds to.  And unlike zinc fingers, these units are pretty much completely modular and spell out an almost-unambiguous code for the targeted DNA sequence.  Here’s the beautiful elegant table you’ve been waiting for:

Amino acids 12 & 13    DNA letter targeted
NI                     A
HD                     C
NN                     G or A
NK                     G
NG                     T

(Sources: Cellectis & Zhang 2011)
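
Because the code is a simple lookup, it is easy to sketch in Python. The mapping below is copied from the table; everything else (the function names, and my choice of NK rather than NN for G so that each repeat reads one base unambiguously) is an assumption made for illustration, not a recommendation from the cited sources.

```python
# Amino acids 12 & 13 of each repeat -> DNA base(s) bound, per the table above.
RVD_TO_BASES = {"NI": "A", "HD": "C", "NN": "GA", "NK": "G", "NG": "T"}

# Reverse lookup used for design. Assumption of this sketch: prefer NK for G,
# since the table lists NN as binding G or A.
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NK", "T": "NG"}


def design_repeats(target: str) -> list[str]:
    """Return one amino-acid pair per base of the target DNA sequence."""
    repeats = []
    for base in target.upper():
        if base not in BASE_TO_RVD:
            raise ValueError(f"unsupported base: {base!r}")
        repeats.append(BASE_TO_RVD[base])
    return repeats


def could_bind(repeats: list[str], dna: str) -> bool:
    """True if every repeat's amino-acid pair accepts the base at its position."""
    dna = dna.upper()
    return len(repeats) == len(dna) and all(
        base in RVD_TO_BASES[pair] for pair, base in zip(repeats, dna)
    )


print(design_repeats("GATTACA"))  # ['NK', 'NI', 'NG', 'NG', 'NI', 'HD', 'NI']
```

The `could_bind` helper shows where the "almost" in almost-unambiguous comes from: a repeat array built with NN would accept either G or A at that position.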

So with, say, 17 repeats = 17*34 = 578 amino acids plus the non-repeating C- and N-terminal sequences of a TAL, you can uniquely target a 17 base pair DNA sequence.  Use two TALs and you can recognize 34 bases; put FokI DNA-cleaving domains in between them, and you’ve got a TALEN.
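
Here is a rough Python sketch of that arithmetic and of picking the two half-sites. It assumes the two TALs bind opposite strands with the FokI domains facing each other across a spacer, so the second half-site is read as a reverse complement. The 15 base spacer default is a placeholder I picked for illustration, not a value from the sources cited here, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def repeat_region_length(n_repeats: int, unit_length: int = 34) -> int:
    """Amino acids in the repeat region alone (terminal sequences excluded)."""
    return n_repeats * unit_length


def talen_half_sites(sequence: str, spacer_start: int,
                     site_len: int = 17, spacer_len: int = 15) -> tuple[str, str]:
    """Return (left, right) target sequences flanking the spacer.

    spacer_len is an assumed placeholder value, not taken from the text.
    """
    left_start = spacer_start - site_len
    right_end = spacer_start + spacer_len + site_len
    if left_start < 0 or right_end > len(sequence):
        raise ValueError("sequence too short for the requested sites")
    left = sequence[left_start:spacer_start].upper()
    right_top_strand = sequence[spacer_start + spacer_len:right_end]
    # The second TAL reads the opposite strand, hence the reverse complement.
    return left, reverse_complement(right_top_strand)


print(repeat_region_length(17))  # 578
print(talen_half_sites("AACCGGGGGGTTAC", 4, site_len=4, spacer_len=6))  # ('AACC', 'GTAA')
```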

The process for synthesizing TALENs – creating your custom amino acid sequence to bind the DNA sequence of interest, adding FokI, and so on – is not exactly trivial. For a 20-page protocol on how to make your own, see Sanjana 2012, and for some open source resources to help you design it, see Cornell University’s TALE-NT tools.  Or you can buy a custom-designed one from Cellectis, which holds the exclusive commercial license on TALENs, for $5000.

TALENs are quite new at this point but have already been used in some cool applications. Ding 2013 reports on using them to make isogenic models of human disease – pairs of cell lines that are (theoretically) identical except for the single site of a disease-causing genetic mutation. (This is what we at Prion Alliance have proposed to do for FFI using ZFNs in our Rare Disease Challenge proposal.) In practice, the cell lines didn’t turn out quite identical: the TALENs exhibited minimal off-target effects, but simply growing cells in culture leads to the gradual accumulation of mutations here and there (just as aging leads to gradual accumulation of somatic mutations in cells in your body).

Durai 2005, writing about ZFNs, says that making a double-stranded DNA break at an appropriate location can increase the efficiency of homologous recombination by 50,000-fold. But even with highly site-specific ZFNs or TALENs there is still no one ‘at the wheel’ making sure the ZFNs/TALENs and the new homologous DNA all end up at the right place at the right time and that DNA repair mechanisms stitch the chromosome back together in the way you want. So after you’ve used your ZFNs/TALENs you have to do sequencing to see what fraction of the cells got successful integration of the edits you were trying to make. In practice it will be some small-ish fraction of cells (say, anywhere from 1.6% to 34% in a recent report by Ding 2013) and so you’ll select those for your further experiments and cull the rest.

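
To get a feel for what those percentages mean at the bench, here is a small Python sketch of how many clones you would need to sequence to be reasonably sure of finding at least one edited clone. It assumes each clone is edited independently with the same probability, which is my simplification; the 95% confidence level and the function name are likewise just illustrative choices.

```python
import math


def clones_to_screen(edit_fraction: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one edited clone among n) >= confidence.

    Assumes clones are edited independently, each with probability edit_fraction.
    """
    if not 0 < edit_fraction < 1:
        raise ValueError("edit_fraction must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Solve (1 - edit_fraction)**n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - edit_fraction))


print(clones_to_screen(0.016))  # 186
print(clones_to_screen(0.34))   # 8
```

So across the range reported by Ding 2013, the screening burden under these assumptions runs from a handful of clones to a couple hundred.
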
That’s one reason why these technologies are still a long way from allowing us to perform gene therapy in every cell of an adult human. But in the immediate term, these will be amazing tools for research, letting us create new models of human diseases. In the slightly longer term, there is also the possibility that they’ll play a role in making it possible to transplant modified versions of your own cells back into your body, as has been done for HIV. While that’s harder to do with brain than blood, Stem Cells Inc has already brought human neural stem cell transplants to the clinical development stage, so don’t rule it out.

update 2013-01-22: I just learned that the folks behind the Ding 2013 paper are sharing their plasmids through addgene.org as a public resource to help other researchers build their own TALENs. Awesome!

Also another quick update: if you watched the Nature Biotechnology video on YouTube at the top of this post, you should be aware that this graphic at 1:24 is not really accurate:

The above graphic depicts the template for homology-directed repair (HDR) as being of the same length as the piece of DNA that was cut out of the original genome. In fact, HDR depends upon homology between the new piece of DNA and the existing, uncut parts of the genome, so the introduced piece of DNA on top that you are trying to knock into the genome would need to extend beyond the points of the cut. So it should look something more like this:

Although, while qualitatively more correct, even this graphic does not properly depict the length of homology required for HDR. For ZFNs, apparently the new template DNA need only extend ~100 bases beyond the cut site on each side. That’s a big improvement over conventional knock-in without ZFNs/TALENs, which requires several kb of homology on each side of the desired mutation.

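
The point about homology arms can be sketched in a few lines of Python: the donor template is the new sequence flanked on both sides by stretches copied from the uncut genome. The 100 base default comes from the figure quoted above for ZFNs; the function name and the toy sequence are made up for illustration, and a real design involves considerations this sketch ignores.

```python
def build_donor(genome: str, edit_start: int, edit_end: int,
                new_sequence: str, arm_len: int = 100) -> str:
    """Return left homology arm + new sequence + right homology arm.

    genome[edit_start:edit_end] is the stretch being replaced; the arms are
    copied from the unmodified genome immediately flanking it.
    """
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("edit coordinates fall outside the genome")
    if edit_start < arm_len or len(genome) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[edit_start - arm_len:edit_start]
    right_arm = genome[edit_end:edit_end + arm_len]
    return left_arm + new_sequence + right_arm


toy_genome = "AAAACCCCGGGGTTTT"  # made-up sequence
print(build_donor(toy_genome, 6, 10, "TA", arm_len=4))  # AACCTAGGTT
```

Note that the donor is longer than the replaced stretch by twice the arm length, which is exactly what the graphic in the video leaves out.
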
update 2013-01-22 #2: The question arose today as to whether TALENs, like ZFNs, could be designed to only function as heterodimers so as to reduce off-target effects. The answer is yes: Cade 2012 uses ‘obligate heterodimer’ TALENs in zebrafish and compares the results to homodimeric TALENs. Predictably, the ‘obligate heterodimer’ TALENs have fewer off-target effects.

update 2013-01-22 #3: We also had a discussion about costs for TALENs. As mentioned above, Cellectis sells custom-designed TALENs for $5000 a pair.  Buying the requisite kit from addgene will run you just $300, plus several days of labor (assuming you know what you’re doing!). And although patent application EP2510096A2 looks to preclude anyone but Cellectis from selling TALENs commercially, research institutions do produce them internally as a service to research groups within the institution. I’m told that the ‘all-in’ price for getting TALEN-modified cell lines runs about $14,000.  That includes design and assembly of TALENs, screening for cells that take up the TALEN plasmids, then screening for successfully modified cells and unmodified wild-type cells.

update 2013-06-07: Sigh – outdated no sooner than it was posted!  Now the hot new genome-editing technology is CRISPR, see e.g. Cong 2013.  Also a colleague pointed out that the screenshot from that YouTube video above is inaccurate in another way – most of the time you don’t need to make two double-stranded breaks, just one, in order to get homology-directed repair to kick in.