Molecular Biology 15: 'Gene regulation I - histones and the histone code'

These are my notes from lecture 15 in Harvard’s BCMP 200: Molecular Biology course, delivered by Timur Yusufzai on October 15, 2014.

Introduction to histones

DNA in nuclei is organized into chromatin. Chromatin is “the natural state of the genome”. Essentially all DNA is covered in or wrapped around proteins - there is no naked DNA to be found. Euchromatin is loose chromatin, whereas heterochromatin is condensed, closely packed chromatin.

Dividing cells have mostly euchromatin. Non-dividing cells have most genes turned off and can therefore condense more of their chromatin into heterochromatin. Even within non-dividing cells, though, there is some euchromatin, and it is physically separate from the heterochromatin.

Chromatin structure can be influenced by:

Histone variants
Histone modifications
Chromatin remodeling
DNA methylation
Antisense RNA

A nucleosome is 147 bp of DNA wrapped around a histone core. Yes, the term “nucleosome” refers to the DNA and the protein together. The DNA wraps around the histone core in a left-handed fashion (though some people believe that in a tiny minority of centromeric CENP-A nucleosomes the wrapping is right-handed).

Most nucleosomes are octamers containing two H2A-H2B dimers and one H3-H4 tetramer [see PDB# 1EQZ].

The term histone variants refers not to genetic variants but rather to different proteins that can be components of nucleosomes, for instance CENP-A in place of H3. H2AZ is a variant that has been suggested to form more stable nucleosomes than H2A [PDB# 1F66], though this is controversial.

Linker histones come in 7 subtypes in mammals and are involved in condensing chromatin [Routh 2008].

Histone modifications are post-translational modifications to histone proteins. Their existence was discovered by [Allfrey 1964], who mixed radiolabeled chemical groups into nuclear extracts and showed that ¹⁴C-acetate and ¹⁴C-methyl were incorporated into histones in the absence of translation, suggesting a post-translational mechanism.

We now know that histone modifications include:

Acetylation
Methylation
Phosphorylation
Ubiquitination
ADP-ribosylation
Sumoylation

The first three make up by far the majority of histone modifications.

H3 is the most frequently modified histone, and most of its modifications occur in its N terminus (for instance H3K4me3, H3K27me3). Lysine is modified by covalent attachment of either one acetyl group or one, two or three methyl groups to the nitrogen in its R group.
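
Mark names like H3K4me3 pack four pieces of information into one string: the histone, the residue (one-letter amino acid code), its position, and the modification. As a rough sketch of how to pull these apart, here is a small parser. The names `HistoneMark` and `parse_mark` are made up for illustration, and the pattern only covers simple single-site names as written in these notes (it assumes `me`, `ac`, `p` and `ub` as the suffixes), not shorthand like H3K4me2/3.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str            # e.g. "H3", "H2A", "H2AX"
    residue: str            # one-letter amino acid code, e.g. "K" for lysine
    position: int           # residue number counted from the N terminus
    modification: str       # "me", "ac", "p" or "ub"
    degree: Optional[int]   # 1, 2 or 3 for methylation, None otherwise


# Histone name (lazy, so the residue letter is not swallowed), residue,
# position, modification suffix, and an optional methylation degree.
_MARK_RE = re.compile(r"^(H[1-4][A-Z]*?)([KSTR])(\d+)(me|ac|p|ub)([123])?$")


def parse_mark(name: str) -> HistoneMark:
    """Split a simple mark name such as 'H3K27me3' into its components."""
    match = _MARK_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification, degree = match.groups()
    # Only methylation carries a degree, and it must always carry one.
    if (modification == "me") != (degree is not None):
        raise ValueError(f"unexpected methylation degree in {name!r}")
    return HistoneMark(
        histone, residue, int(position), modification,
        int(degree) if degree else None,
    )


print(parse_mark("H3K4me3"))
print(parse_mark("H2AK119ub"))
```

The lazy `*?` in the histone group is what lets H2AK119ub resolve to histone H2A and residue K rather than failing on the "A".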

Factors involved in histone modification include “writers” (e.g. histone acetyltransferase, HAT), “erasers” (e.g. histone deacetylase, HDAC) and “readers” (see table below). There are many tens of proteins in each category, and they often function as part of large complexes involved in transcription and/or DNA repair. In fact, most transcription factor and DNA repair complexes include at least one histone-modifying enzyme.

Acetylation of a lysine neutralizes its positive charge. Methylation does not change lysine’s charge. The two modifications are mutually exclusive on any given lysine: a residue that is acetylated cannot also be methylated, and vice versa.
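
To make the charge bookkeeping concrete, here is a minimal sketch that sums the charge contributed by a handful of lysines. The state labels and the function name `tail_lysine_charge` are my own, and the positions in the example are arbitrary illustrations rather than measured data. Giving each position exactly one state string is how the sketch models the fact that acetylation and methylation cannot coexist on the same lysine.

```python
from typing import Mapping

# Charge on a lysine side chain in each state, per the notes:
# acetylation removes the positive charge, methylation leaves it in place.
LYSINE_CHARGE = {
    "unmodified": +1,
    "me1": +1,
    "me2": +1,
    "me3": +1,
    "ac": 0,
}


def tail_lysine_charge(states: Mapping[int, str]) -> int:
    """Sum the charge over lysines, given a map of position -> state."""
    return sum(LYSINE_CHARGE[state] for state in states.values())


# Hypothetical example: four lysines, two of them acetylated.
example = {4: "me3", 9: "ac", 14: "ac", 27: "unmodified"}
print(tail_lysine_charge(example))  # 2
```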

Histone modification “readers” have a variety of reader domains specialized to different modifications:

reader domain	modifications recognized
Bromodomain	H3 and H4 acetylation
PHD fingers	H3K4me2/3
Chromodomain	Unmodified H3 and H3K4/K9/K27/K36 [me2/3]
WD40	H3K4me2
MBT	H4K20me1/2 and H3K4/9me1/2
Tudor	H4K20me1/2/3
14-3-3	H3S10p and H3S28p
BRCT	H2AX S139p
UIM	H2AK119ub and H2BK120ub
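
One way to query the table above is to store it as a plain dictionary and search the "modifications recognized" column by substring. This is only a sketch: `READER_DOMAINS` and `readers_matching` are names I made up, the 14-3-3 row is entered as H3S10p and H3S28p, and the search is deliberately naive, so it will miss shorthand such as the "9" in "H3K4/9me1/2" if you search for "K9".

```python
from typing import List

# Reader domain -> modifications recognized, copied from the table.
READER_DOMAINS = {
    "Bromodomain": "H3 and H4 acetylation",
    "PHD fingers": "H3K4me2/3",
    "Chromodomain": "Unmodified H3 and H3K4/K9/K27/K36 [me2/3]",
    "WD40": "H3K4me2",
    "MBT": "H4K20me1/2 and H3K4/9me1/2",
    "Tudor": "H4K20me1/2/3",
    "14-3-3": "H3S10p and H3S28p",
    "BRCT": "H2AX S139p",
    "UIM": "H2AK119ub and H2BK120ub",
}


def readers_matching(keyword: str) -> List[str]:
    """Return reader domains whose table entry contains the keyword."""
    return [
        domain
        for domain, recognized in READER_DOMAINS.items()
        if keyword in recognized
    ]


print(readers_matching("H4K20"))  # ['MBT', 'Tudor']
print(readers_matching("H3K4"))   # ['PHD fingers', 'Chromodomain', 'WD40', 'MBT']
```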

There are crystal structures demonstrating how many of these recognition events occur [e.g. Zeng 2008].

It took a long time to discover demethylation. Monomethylated lysine can be demethylated by the enzyme LSD1, which removes one hydrogen from the nitrogen and one hydrogen from the methyl group and transfers these hydrogens to oxygen to make H₂O₂. The second step of the reaction releases formaldehyde (CH₂O), and this reaction is a major contributor to the 0.1 mM levels of formaldehyde found in our blood.

New histone modifications continue to be identified, usually by mass spec. Once a modification has been identified, the next step is to raise an antibody against it. Once you have an antibody you can perform ChIP, Western blots, and IMF (presumably immunofluorescence). There are whole companies based on selling antibodies specific to various histone modifications. In a recent quality assessment, >200 commercially available antibodies were tested for their specificity to modified histones versus unmodified recombinant histones purified from E. coli [Egelhofer 2011]. Only about 2/3 possessed the specificity for which they were advertised - the remainder were non-specific or gave no signal at all. You can use peptide pull-down to identify readers of various histone modifications [e.g. Wysocka 2005].

Experiments using gamma irradiation to cause double-strand breaks have shown that each DSB causes H2AX to be phosphorylated at residue S139 across ~2 Mb of DNA. Mdc1 reads the H2AX S139 phosphorylation and recruits DSB repair factors [Stucki 2005, Lee 2005]. This whole process happens within seconds, reflecting the urgency of stabilizing DSBs. It is believed that H2AX is phosphorylated across 2 Mb of chromatin in order to amplify the signal so that it is recognized more quickly.

Controversies in the histone world

The definition of “epigenetic” is itself controversial. Conrad “Wad” Waddington originally coined the term “epigenetic” but it acquired more of its current meaning decades later, when Arthur Riggs used it to refer to “heritable changes in gene function that cannot be explained by changes in DNA sequence” (1996). At a CSHL meeting in 2008, “epigenetic” was redefined to specifically refer to “a stably heritable phenotype resulting from changes in a chromosome without alterations in the DNA sequence”, but some of the authors of that definition no longer agree with it. In particular, whether a modification must be “heritable” in order to be called “epigenetic” is controversial.

We also do not know how histone modifications are distributed between daughter duplexes upon DNA replication. One model holds that each H3-H4 tetramer from the parent duplex splits into two H3-H4 dimers, one of which goes to each daughter duplex. An alternate model holds that H3-H4 tetramers never split, and instead that every other histone core goes to each daughter duplex. One famous paper showed no histone splitting [Xu 2010] but subsequent work (including from that same group) has shown evidence for histone splitting. The issue remains controversial.
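
The two models can be contrasted with a toy sketch. It assumes, for illustration only, that each daughter duplex ends up with as many nucleosomes as the parent and that any missing H3-H4 dimers are supplied by newly synthesized histones. The function names and the "old"/"new" labels are mine, not from the lecture.

```python
from typing import List, Tuple

# One H3-H4 tetramer, written as its two H3-H4 dimers ("old" or "new").
Tetramer = Tuple[str, str]
Daughters = Tuple[List[Tetramer], List[Tetramer]]


def segregate_split(n_parental: int) -> Daughters:
    """Splitting model: each parental tetramer gives one dimer to each
    daughter, where it pairs with a newly synthesized dimer."""
    daughter_a = [("old", "new") for _ in range(n_parental)]
    daughter_b = [("old", "new") for _ in range(n_parental)]
    return daughter_a, daughter_b


def segregate_intact(n_parental: int) -> Daughters:
    """Non-splitting model: tetramers stay whole and alternate between
    daughters; the other daughter gets an all-new tetramer at that site."""
    daughter_a: List[Tetramer] = []
    daughter_b: List[Tetramer] = []
    for i in range(n_parental):
        if i % 2 == 0:
            daughter_a.append(("old", "old"))
            daughter_b.append(("new", "new"))
        else:
            daughter_a.append(("new", "new"))
            daughter_b.append(("old", "old"))
    return daughter_a, daughter_b


def count_old(duplex: List[Tetramer]) -> int:
    """Count parental H3-H4 dimers on one daughter duplex."""
    return sum(tetramer.count("old") for tetramer in duplex)


a, b = segregate_split(4)
print(a)  # every nucleosome is a mixed old/new tetramer
a, b = segregate_intact(4)
print(a)  # all-old and all-new tetramers alternate along the duplex
```

Both models hand each daughter the same number of parental dimers. They differ in whether every nucleosome carries half of the old marks, or every other nucleosome carries all of them.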

The “histone code hypothesis” holds that combinations of histone modifications encode instructions. A single H3 protein within a nucleosome has 10 lysines that can each have 4 methylation states (un-, mono-, di-, and tri-), so each H3 alone has 4¹⁰ (= about 1 million) possible states. Then there are two H3 units per nucleosome, and other modifications possible as well, and so on. But in reality, most histone modifications occur in groups. Usually one nucleosome will contain either a bunch of activating modifications or a bunch of silencing modifications. So although some people speak of a “code”, only a limited number of really distinct histone states are actually observed.
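
The combinatorics are quick to check. The sketch below uses the figures from these notes (10 lysines, 4 methylation states) and, as a simplifying assumption of mine, treats the two H3 copies in a nucleosome as distinguishable and independent. The function names are illustrative.

```python
METHYLATION_STATES = 4   # un-, mono-, di- and tri-methylated
LYSINES_PER_H3 = 10      # figure used in these notes


def states_per_h3() -> int:
    """Methylation states of one H3 if every lysine varies independently."""
    return METHYLATION_STATES ** LYSINES_PER_H3


def states_per_h3_pair() -> int:
    """States for the two H3 copies in a nucleosome, assuming the copies
    are distinguishable and independent (an assumption, not a measurement)."""
    return states_per_h3() ** 2


print(states_per_h3())       # 1048576
print(states_per_h3_pair())  # 1099511627776
```

So the theoretical space is about a million states per H3 and about a trillion per pair, before counting any other modification type, which is what makes the small number of states actually observed so striking.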