These are my notes from lecture 21 in Harvard’s BCMP 200: Molecular Biology course, delivered by Stirling Churchman on November 3, 2014.


The invention of microarrays led to a focus on mRNA abundance. This is distinct from mRNA identity. mRNA identity actually has several dimensions:

  1. Covalent identity, which arises from:
    • splicing
    • RNA editing
    • Transcription start and stop sites
    • polyadenylation (start site and length of polyA tail)
    • 5’ capping
  2. Non-covalent identity, which arises from:
    • Secondary structure
    • Tertiary structure
    • Bound proteins

Note that naked ssRNA is a highly toxic species due to its ability to form duplexes with DNA. Therefore from the moment it is transcribed it is bound by proteins which protect it.

mRNA identity (on all these dimensions) affects the composition of the resulting protein, the rate at which the message is translated and degraded, and where it is localized subcellularly.

Open questions in the field include:

  1. How are co-transcriptional processes coordinated with transcriptional elongation?
  2. How does RNA pol II break through physical barriers along the DNA?
  3. How does RNA pol II handle “genomic traffic”?
  4. How does transcriptional pausing affect mRNA identity and abundance?

Transcription is highly conserved between prokaryotes, eukaryotic nuclei, and mitochondria, and much of our knowledge of the process comes from bacteria.

Transcriptional pausing

RNA pol II will pause at replication forks, nucleosomes, tightly bound DNA-binding proteins, and sometimes even on naked DNA. Bob Kingston developed a system to study transcriptional pausing a few decades ago [Kingston & Chamberlin 1981]. More recently the Block lab has developed a single-molecule system based on “optical tweezers” [Herbert 2006], which they use to study transcription kinetics.

They’ve used this system to study the effects of NusG, an elongation factor in bacteria [Herbert 2010]. Carlos Bustamante’s group has found that nucleosomes cause transcriptional pausing in eukaryotes [Bintu 2012]. DST1 (the yeast gene encoding TFIIS) helps RNA pol II recover from backtracking by stimulating cleavage of the backtracked transcript, so that elongation can resume.

Why does pausing exist? It is not universal: for instance, T7 phage polymerase rarely pauses, and people have identified point mutations that make prokaryotic or eukaryotic polymerases pause less often. Therefore, it is thought that polymerase pausing must be an evolved mechanism that was positively selected. In some cases it has been shown to:

  1. allow RNA pol II to proceed through nucleosomes
  2. help to allow RNA secondary or tertiary structure to form
  3. allow co-transcriptional processes to occur
  4. improve fidelity by allowing backtracking and subsequent removal of mis-incorporated bases

To study transcriptional pausing in vivo we would ideally like to measure RNA pol II density at high resolution, which isn’t yet possible. The closest things so far are:

  • RNA pol II ChIP and ChIP-seq
  • Global Run-On Sequencing (GRO-seq)
  • Native elongating transcript sequencing (NET-seq), which will be the subject of Churchman’s research seminar

RNA pol II ChIP has, at best, 50 bp resolution.

GRO-seq was developed by the Lis lab [Core 2008]. You purify nuclei and break up the nuclear envelope, wash out native NTPs, and add in ATP, GTP, CTP and Br-UTP (a brominated analog of UTP that an antibody can recognize). You then let transcription resume very briefly, only for 50-100 bp, then use an antibody against the incorporated bromouridine to purify the newly transcribed pieces. Thus you can figure out where RNA pol II was when you added the Br-UTP. GRO-seq has a couple of disadvantages. First, its resolution is still only ~50 bp, though they recently developed a new version called PRO-seq which comes close to 1 bp resolution. Second, the incorporation of labeled nucleotides occurs in vitro, so it is not guaranteed that the results fully represent what goes on in vivo. For instance, some factors associated with RNA pol II might not re-activate in vitro, in which case fewer than 100% of the transcripts that were being transcribed would resume.

Nonetheless, GRO-seq has helped elucidate a lot of transcription biology, including the phenomenon of promoter-proximal pausing, which occurs at many genes. In promoter-proximal pausing, RNA pol II starts transcribing and then pauses right away for a long time. It may be poised, waiting for permissive chromatin or for heat shock or developmental signals, so that transcription can continue instantly when the signal does arrive. It may also allow integration of multiple signal sources: maybe one signal tells it to initiate and another tells it to elongate, so that promoter-proximal pausing creates an AND logic gate for these two signals. Finally, it may just serve as a checkpoint early in the elongation process.
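The AND-gate idea can be written out as a toy truth table. This is only a sketch of the logic described above, not a model of any real gene; the function and argument names are made up for illustration.

```python
def productive_elongation(initiation_signal, release_signal):
    """Toy model of promoter-proximal pausing as an AND gate.

    initiation_signal: hypothetical signal that lets RNA pol II initiate
                       and reach the promoter-proximal pause.
    release_signal:    hypothetical second signal that releases the pause.
    """
    # Polymerase only becomes paused if it was told to initiate.
    paused = bool(initiation_signal)
    # A full-length transcript needs a paused polymerase AND the release signal.
    return paused and bool(release_signal)


# Print the full truth table for the two signals.
for init in (False, True):
    for release in (False, True):
        print(init, release, productive_elongation(init, release))
```

Only the last row (both signals present) gives productive elongation, which is what makes the pause behave like an AND gate.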

The C-terminal domain (CTD) of RNA pol II is important in regulating elongation. This was also the subject of [Kwon 2013], which we covered in BBS 230 week 5. In yeast, the CTD is made up of 26 tandem repeats of YSPTSPS, with only a few of the 26 copies diverging from this heptapeptide sequence. In humans, there are 52 repeats. The amino acids in the CTD are heavily post-translationally modified, chiefly by phosphorylation of the Y, T and S and cis-trans isomerization of the P. Glycosylation, ubiquitination, and methylation are sometimes seen as well. ChIP-seq has shown an enrichment of S5 phosphorylation at transcription start sites, whereas S2 phosphorylation is seen later in genes [Mayer 2010].
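To get a feel for how many modifiable positions this implies, here is a minimal sketch that builds an idealized yeast CTD from 26 perfect copies of the consensus heptad and counts the residues. It assumes no divergent repeats (the real CTD has a few), and the `X5@3`-style labels are just a convenience invented for this example.

```python
HEPTAD = "YSPTSPS"      # consensus heptapeptide from the notes
N_REPEATS_YEAST = 26    # repeat count in yeast


def build_ctd(n_repeats, heptad=HEPTAD):
    """Idealized CTD sequence: n perfect copies of the heptad (an assumption)."""
    return heptad * n_repeats


def site_labels(n_repeats, heptad=HEPTAD):
    """Label each residue as <amino acid><heptad position>@<repeat number>."""
    labels = []
    for repeat in range(1, n_repeats + 1):
        for pos, aa in enumerate(heptad, start=1):
            labels.append(f"{aa}{pos}@{repeat}")
    return labels


ctd = build_ctd(N_REPEATS_YEAST)
labels = site_labels(N_REPEATS_YEAST)

# Y, S and T can be phosphorylated; P can undergo cis-trans isomerization.
phospho_sites = [label for label in labels if label[0] in "YST"]
proline_sites = [label for label in labels if label[0] == "P"]

print(len(ctd), len(phospho_sites), len(proline_sites))  # 182 130 52
```

Under these assumptions the yeast CTD is 182 residues long, with 130 potential phosphorylation sites (Y1, S2, T4, S5, S7 in each repeat) and 52 prolines (P3, P6), which gives some sense of how large the space of possible modification states is.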

The general model that is emerging is that the RNA pol II CTD goes through different stages of phosphorylation at different stages of elongation. At the beginning it undergoes S5 phosphorylation and binds Cet1 (for 5’ capping) and Nrd1. By the middle it undergoes S2 phosphorylation and binds Spt6 and Set2, which are general elongation factors that inhibit transcription termination. By the end it binds Rtt103 and Pcf11, and this enables polyadenylation and 3’ termination.

Say you heard that Y1 could be phosphorylated and you want to know more. A good first step would be to raise an antibody specific to the CTD with Y1p and perform ChIP-seq. Indeed, that’s how the Cramer lab started to characterize Y1 phosphorylation [Mayer 2012]. The most difficult part was raising a specific antibody. They used a panel of synthetic 14-mer peptides to confirm that their antibody, named 3D12, only pulled down peptides with Y1p. The result of the ChIP-seq was that Y1p turns out to peak near the end of transcripts and to drop off shortly before the polyadenylation site, and this is true across a bunch of differently sized genes. Because Y1p disappears around the same time as transcription termination factors arrive, they hypothesized that maybe Y1p’s function is to prevent a transcript from being terminated until “its time has come.” They therefore synthesized 14-mer peptides with or without Y1p and tested whether they would bind the CTD-interacting domains of a bunch of different factors, using fluorescence anisotropy. In this technique, you fluorescently label the synthetic peptide, excite it with polarized light, and measure how polarized the emitted light still is. If the peptide is bound to another protein, it will “tumble” more slowly, so it barely rotates between absorbing and emitting and the emitted light stays largely polarized. When unbound, it will “tumble” rapidly, so the emission is spread over many orientations and the polarization is mostly lost.
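The anisotropy readout can be made concrete with two standard textbook relations that are not specific to [Mayer 2012]: the definition r = (I_par - I_perp) / (I_par + 2 I_perp), and the Perrin equation r = r0 / (1 + tau/theta), where tau is the fluorescence lifetime and theta is the rotational correlation time. The sketch below uses made-up lifetimes and tumbling times purely for illustration; none of the numbers come from the paper.

```python
def anisotropy(i_parallel, i_perpendicular):
    """Anisotropy from emission intensities measured parallel and
    perpendicular to the polarization of the exciting light."""
    total = i_parallel + 2.0 * i_perpendicular
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - i_perpendicular) / total


def perrin_anisotropy(r0, lifetime_ns, rot_corr_time_ns):
    """Perrin equation: steady-state anisotropy of a tumbling fluorophore."""
    return r0 / (1.0 + lifetime_ns / rot_corr_time_ns)


# Hypothetical values, chosen only to show the trend:
R0 = 0.4            # limiting anisotropy for a non-rotating fluorophore
LIFETIME_NS = 4.0   # assumed fluorescence lifetime of the label
THETA_FREE_NS = 0.5    # assumed tumbling time of the free 14-mer peptide
THETA_BOUND_NS = 10.0  # assumed tumbling time once bound to a protein domain

free = perrin_anisotropy(R0, LIFETIME_NS, THETA_FREE_NS)
bound = perrin_anisotropy(R0, LIFETIME_NS, THETA_BOUND_NS)
print(f"free peptide:  r = {free:.3f}")
print(f"bound peptide: r = {bound:.3f}")
```

With these assumed values the bound peptide has a several-fold higher anisotropy than the free one, so titrating in a CTD-interacting domain and watching r rise is a way to detect binding.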

The role of dynamic CTD phosphorylation in transcription elongation is reviewed in [Heidemann 2013].