These are my notes from the BroadE workshop “Quantitative proteomics in biology and medicine”, taught by various scientists from the Proteomics Platform at the Broad Institute, 12:00p - 5:00p on November 9, 2016.
Karl Clauser: fundamentals of peptide and protein mass spectrometry
Mass spec instruments have four responsibilities:
- Create ions from analyte molecules
- Separate ions by charge and mass
- Detect ions and determine mass/charge (m/z) ratio
- Select and fragment ions of interest to obtain structural information
Mass spec experiments can broadly be divided into two types. Discovery mass spec is global, for instance, proteome-wide — it detects everything. Targeted mass spec is the selective observation of just a few ions of interest. Broad has three types of instrument: the Fusion Lumos (good for discovery), the Quantiva (good for targeted MS), and the Q-Exactive Plus (good for both).
Most elements have more than one stable isotope. For instance, 98.89% of carbon is 12C, which has a mass of 12.0000 Da, while 1.11% is 13C, which has a mass of 13.0034 Da. 99.64% of nitrogen is 14N, which is 14.0031 Da, and the other 0.36% is 15N, which is 15.0001 Da. Note that the mass contributed by the marginal neutron differs between a carbon nucleus and a nitrogen nucleus (a consequence of differences in nuclear binding energy). High resolution mass spectrometers can resolve the distinct isotope peaks for ions containing 0, 1, 2, etc. minor isotopes.
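A quick arithmetic check on the figures above shows why this matters: the mass added by one extra neutron differs measurably between elements.

```python
# Monoisotopic masses (Da) as quoted above
C12, C13 = 12.0000, 13.0034
N14, N15 = 14.0031, 15.0001

# mass contributed by the "extra" neutron in each element
delta_C = C13 - C12  # 1.0034 Da
delta_N = N15 - N14  # 0.9970 Da
```

The two deltas differ in the third decimal place, which is exactly the kind of difference a high-resolution instrument can exploit when resolving isotope peaks.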
Peptides can be sequenced using an MS/MS instrument. If you want to measure the masses of intact precursor ions, you only need one mass spec step, so the first instrument can simply scan, say, m/z ratios of 350-2000, and the second instrument can detect individual m/z values in that mixture. Alternatively, the first instrument can scan only a narrow range (for instance, m/z = 834-838) and then fling the ions at a collision cell which fragments them, with the fragments then detected by the second instrument. Peptides can fragment in several ways: for instance, b ion formation is cleavage of the peptide bond, whereas y ion formation involves moving a proton around. Sometimes MS/MS gives a very clean spectrum that is readily interpretable and can be narrowed down to one or a few possible parent peptides. Other times, data can be sparse. However, much of the time you don’t need to determine peptide sequences de novo — you simply look them up in a database of peptides that could arise from the translation of the human transcriptome. There are various algorithms for this. It turns out that peptides of <6 amino acids are rarely unique in the human proteome, and are generally ignored in data analysis; peptides of length 6-11 are often unique in sequence but not in amino acid composition; and peptides of >11 amino acids are usually unique in amino acid composition, so you don’t even need to sequence them [Clauser 1999].
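To make the b/y ion idea concrete, here is a minimal sketch computing singly charged b and y fragment m/z values for a toy tripeptide. The constants are standard monoisotopic values (not from the talk), and the residue mass table is truncated to just the residues used.

```python
PROTON = 1.00728   # mass of a proton, Da
WATER = 18.01056   # mass of H2O, Da
# standard monoisotopic residue masses (Da), truncated to the residues used here
RESIDUE = {'G': 57.02146, 'A': 71.03711, 'S': 87.03203}

def fragment_ions(peptide):
    masses = [RESIDUE[aa] for aa in peptide]
    n = len(masses)
    # b ions: N-terminal fragments from cleavage of a peptide bond
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y ions: C-terminal fragments, which keep the C-terminal water and pick up a proton
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

b_ions, y_ions = fragment_ions("GAS")
```

A useful sanity check when interpreting spectra: each complementary b/y pair sums to the precursor [M+H]+ mass plus one extra proton.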
So a modern MS/MS instrument can readily detect and quantify abundant peptides, separating them by retention time and then m/z ratio. Once you have measured peptides, you need to figure out what proteins they came from. Even a peptide of >11 amino acids, however, sometimes has two or more possible proteins of origin — this is unlikely to arise by pure coincidence, and instead usually comes from paralogs, conserved domains, etc. There are therefore various approaches for deconvoluting peptide abundances to yield protein abundances, though none of these algorithms is perfect. Fundamentally, one has four choices of which protein to assign the ambiguous peptide to:
- The protein with the most peptides.
- Both/all possible proteins.
- None (just ignore this peptide).
- The protein with the most similar quantitation to the peptide.
Today, most software presents the user with options 1-3 and the user must choose one approach to apply globally. Efforts are currently underway to develop smarter software that chooses among options 1-4 on the fly for each individual peptide.
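Option 1 can be sketched as follows — a hypothetical toy implementation, not the Platform's actual software: first count uniquely mapping peptides per protein, then hand each shared peptide to the best-supported candidate.

```python
from collections import Counter

def assign_peptides(peptide_to_proteins):
    """Assign each peptide to one protein, 'most peptides wins' style (option 1)."""
    # count distinct peptides that map uniquely, as the initial evidence per protein
    counts = Counter()
    for pep, prots in peptide_to_proteins.items():
        if len(prots) == 1:
            counts[prots[0]] += 1
    # assign every peptide (shared or not) to the best-supported candidate;
    # ties broken alphabetically just to keep the sketch deterministic
    return {pep: sorted(prots, key=lambda p: (-counts[p], p))[0]
            for pep, prots in peptide_to_proteins.items()}

# invented example: P1 has two unique peptides, P2 has one, so P1 claims the shared one
assignment = assign_peptides({"PEPA": ["P1"], "PEPB": ["P1"],
                              "PEPC": ["P2"], "SHARED": ["P1", "P2"]})
```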
Here are the possible steps in a sample prep for proteomics (in order):
- Solubilize the cells/tissue/fluid
- Fractionate the protein, for instance by size (SDS-PAGE), solubility (detergents and centrifugation), and/or immunoenrichment or depletion
- Digest the protein, usually with trypsin
- Fractionate the peptides, for instance with immunoenrichment
There is no truly absolute quantification with mass spec. The end result of proteomics data analysis is always a ratio. For instance, the ratio of wild type to mutant, the ratio of unlabeled to labeled peptide, etc. If your goal is to quantify the relative abundance of a peptide under two different conditions (e.g. mutant vs. wild-type cells, compound-treated vs. untreated cells, etc.) then there are basically three approaches, each with a different “plex level” (how many comparisons can be made in one batch):
- Unlabeled - just mix sample A and sample B. You can only do one comparison this way.
- Chemically labeled - can do up to ~10 comparisons.
- Metabolically labeled - (stable isotope labeling of amino acids in cell culture or SILAC) - can do up to 3 comparisons.
It is worth noting at least a couple of the fundamental differences between proteomics analysis and, say, RNA-seq:
- Because there is no analog of PCR for proteins, there is no way to amplify signal. Mass spec must cope with a tremendous dynamic range, and often only the most abundant ~20-50% of features can be detected.
- Because trypsin cuts at specific sites (after K or R, except when followed by P), all peptides produced by a complete trypsin digestion are non-overlapping — there is none of the overlap that random shearing produces in DNA and RNA sequencing. This is actually an advantage: by avoiding redundancy it helps your dynamic range, which would otherwise be even worse.
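The cut-site rule stated above is easy to encode; here is a sketch of an in-silico tryptic digest (the example sequence is arbitrary).

```python
def trypsin_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ''
        if aa in 'KR' and next_aa != 'P':
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # whatever remains after the last cut site
        peptides.append(sequence[start:])
    return peptides

peptides = trypsin_digest("MKWVTFISLLFLFSSAYSRGVFRRDAHKPSEK")
```

Note that the peptides tile the input exactly once — the non-overlapping property discussed above.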
Targeted MS differs from discovery MS in that you first mass-select for the peptide of interest before fragmenting it. Discovery MS is a powerful tool, but like any “ome”-wide approach, it has its false discovery rate, and many supposed biomarkers reported in the literature based on discovery MS experiments turn out to be false. Discovery is usually performed with only a small sample size (say, 10 samples). While this sample size is useful for hypothesis generation, the only way to validate a measurement as being biologically or clinically meaningful is to do quantitative measurements in a much larger number of samples. This translation from discovery MS based hypotheses to targeted MS validation is a principal challenge in proteomics today [Rifai 2006, Gillette & Carr 2013].
Namrata Udeshi: applications of quantitative discovery proteomics in biology and medicine part 1
Common questions that the Proteomics Platform at Broad is tasked with asking include:
- Which proteins are significantly up- or down-regulated after a perturbation such as a drug treatment, or a gene knockout.
- Which post-translational modifications are altered as a result of such a perturbation.
- Which proteins are enriched in particular cellular compartments?
- Which signaling pathways are activated in a particular cell state?
- Which proteins interact with a protein of interest?
“Next generation” proteomics has arrived: in the past ~3 years, the number of distinct proteins and post-translational modifications (PTMs) that can be detected and quantified in human tissue samples has increased by 4- to 5-fold.
It has been estimated that the average human cell expresses ~10,000 distinct proteins, with concentrations spanning 7 orders of magnitude, from ~10 copies to multiple tens of millions of copies per cell [Beck 2011]. Deeper fractionation allows you to detect a broader dynamic range of these proteins [Mertins 2013].
Broad has large-scale methods in place for analyzing three PTMs: phosphorylation, ubiquitination, and acetylation. For reference, one recommended review [Aebersold & Mann 2016] has a discussion of PTM detection in proteomics. The principle here is that you get much better dynamic range by enriching for your (less abundant) proteins or PTMs of interest. For instance, phosphorylated peptides can be enriched by immobilized metal affinity chromatography (IMAC). Titanium dioxide and phospho-tyrosine specific antibodies are also methods that are used [Grimsrud 2010].
Philipp Mertins: applications of quantitative discovery proteomics in biology and medicine part 2
Spatial proteomics relies on labeling of proteins in specific cellular compartments. This can be accomplished using APEX, an engineered peroxidase enzyme which can be expressed as a fusion protein with a protein known to localize to a particular compartment of interest [Martell 2012]. This way you can specifically label proteins in that compartment. For instance they used this to selectively label mitochondrial matrix proteins [Rhee 2013].
Temporal proteomics relies on differential labeling of cells over time. You can get up to 3-plex labeling with SILAC using light, “medium”, and heavy media. For instance, grow cells in heavy media, and switch to medium media right before a perturbation of interest, all the while comparing to light media cell lysate as an internal control. They recently used this approach to characterize the temporal response to lipopolysaccharide stimulation in cells [Jovanovic 2015].
They also do global proteome profiling of human clinical samples, such as plasma or tumors. Often dynamic range can be improved by using techniques such as abundant protein depletion [Keshishian 2007, Shi 2012].
D.R. Mani: statistical analysis for proteomics and proteogenomics
This talk is a quick overview of how proteomics data are processed and analyzed.
Proteomics data are usually normalized in some way. Often you see proteomics data either Z scored (mean-centered and standard deviation (SD)-scaled) or median-centered and median absolute deviation (MAD)-scaled. They also sometimes use 2-component normalization. Here’s how that works. You assume that most proteins are unchanged (“unregulated”) between the two conditions being compared (log2 ratio ≈ 0) while a few proteins are differentially regulated (|log2 ratio| >> 0). You therefore perform the normalization using the mean and SD (or median and MAD) of only the unregulated proteins. This is better than naive normalization because including the regulated proteins would inflate the scale estimate and make the truly differentially regulated proteins appear less changed than they are.
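A simplified sketch of the 2-component idea, assuming the unregulated proteins can be crudely identified by a hard cutoff on |log2 ratio| (the cutoff of 1.0 is an invented illustration; a real implementation would fit a mixture model rather than use a cutoff):

```python
import statistics

def two_component_normalize(log2_ratios, cutoff=1.0):
    # treat proteins with small |log2 ratio| as the "unregulated" component
    unregulated = [x for x in log2_ratios if abs(x) < cutoff]
    center = statistics.median(unregulated)
    mad = statistics.median(abs(x - center) for x in unregulated)
    scale = 1.4826 * mad  # scale MAD to be consistent with SD for normal data
    # center and scale ALL proteins using statistics from the unregulated ones
    return [(x - center) / scale for x in log2_ratios]

# invented example: five unregulated proteins plus one strongly up-regulated one
normalized = two_component_normalize([0.1, -0.1, 0.0, 0.2, -0.2, 5.0])
```

Because the outlier at 5.0 is excluded from the scale estimate, it remains far out in the tail after normalization instead of being shrunk toward zero.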
The goal of most discovery proteomics analyses is to identify a set of peptides or proteins whose abundance differs between two conditions. This is hypothesis testing, and so several models are possible:
- One-sample t test if comparing case vs. control to ask if log(case/control ratio) is different from zero.
- Two-sample t test if comparing two conditions vs. control to ask if log(condition A/control) differs from log(condition B/control).
- F test or longitudinal analysis if comparing multiple groups to ask if any of the log(group_i/reference) values are different from zero.
“Moderated tests” — versions of the above using Bayesian methods — are helpful for small sample sizes and are almost always used in discovery MS.
Multiple testing correction is usually performed using Benjamini-Hochberg FDR correction.
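Benjamini-Hochberg correction itself is simple enough to sketch in a few lines: rank the p values, scale each by m/rank, and enforce monotonicity walking from the largest rank down.

```python
def bh_qvalues(pvals):
    """Convert nominal p values to Benjamini-Hochberg q values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices sorted by p
    q = [0.0] * m
    prev = 1.0
    for rank in range(m, 0, -1):  # walk from largest p to smallest
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)  # q can never exceed the next-larger q
        q[i] = prev
    return q

# invented example p values, in their original (unsorted) order
qvals = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```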
Sometimes the goal of the analysis is to identify a set of statistically significant differential markers (SAM) for making predictions. This can use a range of techniques such as random forests or genetic algorithms.
Another common goal of proteomics analysis is to use clustering to ask whether proteomics reflects the same differences between, say, various cell types or various tumors, as established by individual markers or RNA-seq analysis.
Finally, proteomics data often feed into pathway enrichment analysis. Specifically, you end up asking whether one list of things (say, differentially regulated proteins from a proteomics experiment) is enriched within another list of things (say, genes assigned to a particular pathway). This is usually done using a Fisher’s exact test with Benjamini-Hochberg FDR correction. Alternatively, GSEA [Subramanian 2005] is a popular tool which doesn’t just compare membership in list 1 to membership in list 2; instead, it compares a quantitative value for list 1 (e.g. each protein’s enrichment score in a proteomics experiment) versus membership in list 2 (e.g. genes in a pathway).
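The overlap test described above reduces to a hypergeometric tail probability. A minimal sketch (variable names invented): given N background genes, K pathway members, and n differentially regulated hits, compute the probability of seeing k or more pathway members among the hits by chance.

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """One-sided Fisher's exact / hypergeometric tail: P(overlap >= k)."""
    # sum hypergeometric probabilities for overlaps of k up to the maximum possible
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# invented toy numbers: 10 background genes, 5 in the pathway, 5 hits, all 5 overlap
p = enrichment_pvalue(N=10, K=5, n=5, k=5)
```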
Monica Schenone: affinity enrichment proteomics - applications in chemistry and biology
Phenotypic screening can yield small molecule hits that are active in a cellular context. GWAS and other genomic association methods can point to loci that harbor susceptibility alleles for disease. In each case, we know that something (a small molecule or a gene) is active but we don’t know how or why — we don’t know what that thing does. Affinity enrichment proteomics is one approach to answering this question. The approach basically relies on guilt by association — for instance, a protein that is physically associated with a small molecule might be its mechanistic target.
Classical affinity enrichment relies on running proteins out on a gel and looking for bands that are present or enriched only in the experimental condition and not in the control condition. This is very vulnerable to differences in wash conditions and differences in sample handling, and the information you get out is not super quantitative. Mass spec offers a better approach.
While the entity of interest could be a protein, a small molecule, or even a nucleic acid, the fundamental approach is the same: affinity enrichment by tethering the entity of interest to solid support. In each case, it is considered essential to use the most relevant possible cell model — for instance, the disease-relevant cell type for a GWAS hit, or the cell type in which a small molecule hit was identified as bioactive.
Typical experimental details:
- 1e7 cells per condition, for ~1 mg total protein
- Assay development step to optimize conditions and parameters such as input protein, affinity matrix amount, wash conditions, choice of control
- The experiment should be replicated with a label swap at a minimum.
- Performed on the Q-Exactive with unbiased data acquisition methods.
- Data processing uses MaxQuant for SILAC or SpectrumMill for iTRAQ, and statistical analysis uses a moderated t test and mixed modeling.
- The final output of the experiment is a rank-ordered list of candidate interactors with nominal p values and q values after FDR correction.
For identifying the target of a small molecule [Ong 2009], the currently favored experimental design is as follows. An analog of the small molecule, confirmed to still be bioactive, is tethered to solid support and used to affinity enrich proteins from two conditions: cells treated with DMSO, and cells treated with DMSO containing the free compound. The principle here is that the free compound will block the target protein, so you should see enrichment of that target exclusively in the unblocked (DMSO) condition. For recent examples see [Chou 2015, de Waal 2016].
Karsten Krug: building interactive data analysis tools
A typical data analysis work flow in, for example, the affinity enrichment experiments described above, would be to process the raw data through SpectrumMill or similar to obtain a table of proteins with expression values, then QC and filter the data (for instance, removing common contaminants and selecting only proteins quantified by 2 or more peptides), then perform normalization (such as Z scoring), then do statistical analysis such as moderated t tests. In order to create a solution that lets users control this workflow without needing to write code themselves, the Proteomics Platform is developing a Shiny app.
Hasmik Keshishian & Jake Jaffe: targeted MS and its application in biology and medicine
As mentioned earlier, discovery MS is unbiased and proteome-wide, whereas targeted MS specifically aims to detect and quantify one or a few peptides of interest [Gillette & Carr 2013, Carr 2014]. Targeted MS can be 50-1000X more sensitive than discovery MS. Targeted MS can be multiplexed to measure >150 peptides at once [Ippoliti 2016]. Targeted MS gives you a quantification of every selected peptide in every experiment, because you’re looking specifically for each peptide so if you don’t see it you can say with confidence that its abundance was below the limit of detection, whereas if you don’t see something in discovery MS, that could just be chance. Quantification in targeted MS is highly precise and can detect small changes in peptide abundance.
Multiple reaction monitoring (MRM, also known as selected reaction monitoring or SRM), is a targeted MS method performed on triple quadrupole mass spectrometers. Parallel reaction monitoring (PRM) is a different targeted MS method performed on MS/MS machines. In MS/MS, you first mass-select a peptide ion, then fragment it and detect all the fragment ions, whereas in MRM you fragment and then detect only selected fragment ions. MRM is the most sensitive method.
In developing an MRM assay, the first question is which peptides you want to detect. You can use prediction algorithms and proteomics data from public databases, but the best approach is to do your own unbiased experiments to figure out which peptides are detectable for your protein of interest. Then you choose one amino acid in each peptide of interest, usually the C-terminal one, and order a synthetic peptide where that particular amino acid is isotopically labeled. You then spike the heavy peptides into the samples before running the mass spec. Quantification in MRM is based on comparing the area under the peaks for light vs. heavy peptides. In principle this provides absolute quantification, although the fact that the heavy peptides don’t go through all the same sample processing steps as the light peptides means that data are better interpreted as a relative quantification which is precise but not necessarily accurate [Gillette & Carr 2013].
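The quantification step described above boils down to a ratio of peak areas. A toy sketch, using trapezoidal integration over hypothetical retention-time/intensity traces:

```python
def peak_area(times, intensities):
    """Trapezoidal integration of a chromatographic peak."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area

def light_heavy_ratio(times, light, heavy):
    # endogenous (light) signal relative to the spiked heavy internal standard
    return peak_area(times, light) / peak_area(times, heavy)

# invented traces: triangular peaks, light half the height of heavy
ratio = light_heavy_ratio([0.0, 1.0, 2.0], [0.0, 2.0, 0.0], [0.0, 4.0, 0.0])
```

Since the heavy peptide was spiked at a known amount, multiplying this ratio by that amount gives a nominal quantity — though, per the caveat above, it is best read as precise relative quantification rather than truly absolute.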
Once you have the MRM assay up and running, you want to validate it. Run some samples in triplicate to assess the precision of the assay across technical replicates. Run a dose response or dilution series to determine the assay’s lower limit of detection (LOD) and lower limit of quantification (LOQ). Then test it on a variety of actual clinical samples to confirm that the assay works for each of them.
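One common textbook convention for estimating LOD and LOQ from repeated blank measurements is mean + 3 SD and mean + 10 SD respectively; this is a generic convention, not necessarily the Platform's exact procedure.

```python
import statistics

def lod_loq(blank_signals):
    """Estimate LOD and LOQ from replicate blank measurements (mean + 3/10 SD)."""
    mu = statistics.mean(blank_signals)
    sd = statistics.stdev(blank_signals)  # sample standard deviation
    return mu + 3 * sd, mu + 10 * sd

# invented blank replicates with mean 1.0 and SD 0.1
lod, loq = lod_loq([0.9, 1.0, 1.1])
```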
If done right, MRM can be a very robust and quantitative assay. Many studies of MRM have been done in human plasma, a fluid in which a few incredibly abundant proteins comprise the majority of total protein. With abundant protein depletion and minimal fractionation [Keshishian 2007], a stable isotope dilution (SID) MRM assay can often achieve a lower limit of detection for a protein in plasma around ~100 ng/mL [Gillette & Carr 2013] and sometimes as low as 1-10 ng/mL with a ~3-15% coefficient of variation (CV) [Keshishian 2007]. If you need more sensitivity than that, you can first immunoenrich, either with an antibody to the whole protein, or multiple antibodies to each individual peptide of interest (the latter approach is called SISCAPA) [Whiteaker 2011]. An 8-plex MRM assay with SISCAPA can achieve an LOD of ~1 ng/mL in a 30 μL volume of plasma [Whiteaker 2011]. Studies using known quantities of protein spiked into human plasma have demonstrated good intra- and inter-laboratory reproducibility for MRM both with [Kuhn 2012] or without [Addona 2009] SISCAPA. Note that the CV depends on where you are within the assay’s dynamic range, and is higher near the lower limit of detection.