Read with caution!

This post was written during early stages of trying to understand a complex scientific problem, and we didn't get everything right. The original author no longer endorses the content of this post. It is being left online for historical reasons, but read at your own risk.

I just got back from Jeremy England‘s lecture at MIT this afternoon entitled “Shape shifting: protein statistical physics as a linear programming problem” in which he described his lab’s approach to modeling protein folding.

Beginning with an overview of the field, he observed that much of the study of protein folding is devoted to “high-resolution” models of proteins, i.e. models down to the level of individual hydrogen bonds. X-ray crystallography can deliver this level of resolution for proteins in their crystal state, but in silico modeling at that resolution is too computationally expensive for many proteins, so accurate high-resolution models cannot always be made. Moreover, proteins do not fold in isolation but in interaction with the cytoplasmic environment, often with chaperones or in particular pockets of the cell, so the problem of protein folding is really “more than one problem.”

Instead, England proposes that delivering a low-resolution but holistic picture of the protein’s overall structure and behavior can be both doable and informative.

He introduced a fairly simple model in which free energy is a function solely of steric repulsion between atoms and the differing degrees of hydrophobicity of the amino acids. Amino acids that are more hydrophobic than others in the protein “want” to be at the center of the folded structure, while less hydrophobic ones “want” to be on the outside. One can then use linear programming to minimize the free energy under the constraints that (1) amino acids that are adjacent in the polypeptide chain must stay adjacent, and (2) the root-mean-square distance of the protein’s mass from its center must be √(3/5)·R, where R is the radius of the folded structure (this is the value for a uniformly dense sphere).
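Where the √(3/5)·R comes from: in a uniformly dense ball of radius R, the fraction of mass in a thin shell at radius r is 3r²dr/R³, so the mean squared distance from the center is the integral of r²·(3r²/R³) dr from 0 to R, which equals 3R²/5, and the root-mean-square distance is √(3/5)·R. Below is a hypothetical, stripped-down version of the burial problem, not England’s actual formulation. It assumes each residue’s squared distance from the center, x_i = r_i², is the decision variable (so the compactness constraint becomes linear), uses a made-up hydrophobicity scale where larger h_i means more hydrophobic, takes the energy to be Σ h_i·x_i, and drops both the chain-adjacency constraint and steric repulsion. With only a sum constraint and the bounds 0 ≤ x_i ≤ R², this LP is a continuous knapsack problem, so filling the outer shell with the least hydrophobic residues first is exactly optimal.

```python
import math


def burial_lp(hydrophobicity, radius):
    """Hypothetical toy burial LP (not England's actual model).

    Minimizes sum(h[i] * x[i]) over x[i] = r_i**2, subject to
      0 <= x[i] <= radius**2               (residue lies inside the sphere)
      sum(x) == (3/5) * n * radius**2      (uniform-ball compactness)
    Chain adjacency and steric repulsion are deliberately omitted.
    """
    n = len(hydrophobicity)
    cap = radius ** 2
    budget = 3 * n * cap / 5  # total squared radius the residues must share
    x = [0.0] * n
    # Exchange argument: spend the budget on the smallest h first,
    # i.e. push the least hydrophobic residues out to the surface.
    for i in sorted(range(n), key=lambda j: hydrophobicity[j]):
        take = min(cap, budget)
        x[i] = take
        budget -= take
        if budget <= 0:
            break
    return [math.sqrt(xi) for xi in x]  # distances from the center


# Made-up hydrophobicity values for a five-residue toy chain.
h = [2.0, -1.0, 0.5, -2.0, 1.0]
print(burial_lp(h, radius=1.0))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Because LP optima sit at vertices of the feasible region, every residue in this toy version ends up either fully buried or at the surface, except at most one; a chain-adjacency constraint is the kind of extra condition that would force smoother profiles.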

Linear programming thus yields a variety of “burial modes” for the protein, i.e. ways of burying the hydrophobic amino acids in the structure’s center. Averaging across several of the lowest-energy burial modes gives a plausible structure for the protein, and if you plot each amino acid’s predicted distance from the center alongside its actual distance (for proteins of known structure), the two curves share many of the same peaks and valleys.
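A minimal sketch of the averaging step, assuming each burial mode is stored as an (energy, distances) pair with one distance-from-center value per residue. The function names, the toy numbers, and the use of a Pearson correlation to quantify “sharing peaks and valleys” are illustrative choices, not details from the talk.

```python
import math


def average_low_energy_modes(modes, k):
    """Per-residue mean distance over the k lowest-energy burial modes.

    modes: list of (energy, [distance of residue 0, residue 1, ...]) pairs.
    """
    lowest = sorted(modes, key=lambda mode: mode[0])[:k]
    n_residues = len(lowest[0][1])
    return [
        sum(dists[i] for _, dists in lowest) / len(lowest)
        for i in range(n_residues)
    ]


def pearson(a, b):
    """Pearson correlation between two equal-length distance profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data: three modes for a four-residue chain, plus a made-up
# "known structure" profile to compare against.
modes = [
    (-3.0, [0.2, 0.9, 0.5, 1.0]),
    (-2.5, [0.4, 0.8, 0.3, 0.9]),
    (1.0, [1.0, 0.1, 1.0, 0.0]),  # high energy, excluded when k=2
]
predicted = average_low_energy_modes(modes, k=2)
observed = [0.25, 0.9, 0.45, 1.0]
print(predicted)
print(round(pearson(predicted, observed), 3))
```

A correlation close to 1 means the predicted and observed profiles rise and fall together along the chain, which is one way to put a number on the visual comparison.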

Of course, compared with the full complexity of proteins, the model is ridiculously simple: of all the amino acids’ properties (size, polarity, aromaticity, etc.), only hydrophobicity is considered, and secondary structures (α-helices, β-sheets) are not modeled at all.

But what is valuable about this model is that it allows you to make reasonable predictions about the allosteric properties of the protein. Allostery occurs when binding (e.g. of a small molecule) at one site on the protein induces a conformational change that alters the shape of other parts of the protein. England finds that the parts of a protein with high variance in distance from center across burial modes are the parts most likely to change when allosteric binding occurs elsewhere. He also finds that if two parts of the protein are strongly covariant, then binding at one site is likely to alter the other. The actual binding sites and corresponding conformational changes of proteins with known structures apparently correlate well with the model’s predictions.
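The two allostery heuristics can be sketched the same way. This assumes the burial modes are given as a list of per-residue distance profiles, all weighted equally (a simplification); the residues with the largest variance across modes are flagged as candidates for conformational change, and other residues are ranked by the magnitude of their covariance with a chosen binding site. Function names and data are hypothetical.

```python
def residue_stats(profiles):
    """Per-residue means and the residue-by-residue covariance matrix.

    profiles: list of burial modes, each a list of distances from center.
    Uses population covariance (divides by the number of modes).
    """
    m = len(profiles)
    n = len(profiles[0])
    means = [sum(p[i] for p in profiles) / m for i in range(n)]
    cov = [
        [sum((p[i] - means[i]) * (p[j] - means[j]) for p in profiles) / m
         for j in range(n)]
        for i in range(n)
    ]
    return means, cov


def most_variable(profiles, top):
    """Indices of the `top` residues with the highest variance across modes."""
    _, cov = residue_stats(profiles)
    variances = [cov[i][i] for i in range(len(cov))]
    return sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)[:top]


def coupled_to(profiles, site):
    """Other residues ranked by |covariance| with a given binding site."""
    _, cov = residue_stats(profiles)
    others = [j for j in range(len(cov)) if j != site]
    return sorted(others, key=lambda j: abs(cov[site][j]), reverse=True)


# Toy example: three burial modes of a four-residue chain.
profiles = [
    [0.1, 0.9, 0.5, 0.8],
    [0.1, 0.2, 0.4, 0.3],
    [0.1, 0.7, 0.6, 0.6],
]
print(most_variable(profiles, top=2))  # [1, 3]
print(coupled_to(profiles, site=1))    # [3, 2, 0]
```

Dividing by the number of modes rather than by one less does not change any of these rankings, since every entry is scaled by the same constant; normalizing the covariances into correlations would be a reasonable alternative if residues differ widely in variance.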

As a caveat, England noted that the model really only works for a single protein domain, since it assumes that all of the most hydrophobic residues want to be at a single center rather than the protein having multiple centers. The viable range of the model is about 100–300 amino acids.

One audience member asked whether the model could be used to predict how two proteins (say, two copies of the same protein) would interact with one another. England’s answer was roughly: certainly yes, and it will, but I need more students to do the work!

For further questions England directs us to his website: