Chemical biology tutorial 1: cMAP workshop

These are my notes from tutorial 1 in Harvard’s Chemistry 101: Chemical Biology Towards Precision Medicine course, taught by Dr. Paul Clemons on September 24, 2015.

Profiling small molecules by measuring multiple cellular responses, possibly in multiple cell types, can be a way of asking whether a small molecule does something new, or something we’ve seen before. The multiple responses measured could be mRNA transcripts, morphological features, cytotoxicity, or mining of diverse assay readouts from public databases such as PubChem.

Connectivity map (cMAP) [Lamb 2006] exists to catalog small molecule responses in a common language: genome-wide mRNA expression profiles. For an early example of using it to generate a therapeutic hypothesis, see [Wei 2006]. The data now publicly available are cMAP 02, which was released in July 2008.

Queries can include at most 500 “up” probes and 500 “down” probes. To get down to this number, you can tighten your cutoff for what qualifies as an “up” or “down” probe; +0.67 and -0.67 are usually good.
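As a rough illustration of that thresholding step, here is a minimal Python sketch. It assumes you already have one score per probe (for example a log ratio; these notes don't record the exact scale the ±0.67 cutoff applies to). The function name `select_probes` and the example scores are made up for illustration and are not part of cMAP.

```python
def select_probes(scores, cutoff=0.67, max_probes=500):
    """Split per-probe scores into 'up' and 'down' lists.

    scores: dict mapping ProbeSet ID -> score (scale is an assumption).
    Probes at or beyond +/-cutoff are kept; if more than max_probes
    qualify, only the most extreme ones are retained.
    """
    up = [(p, s) for p, s in scores.items() if s >= cutoff]
    down = [(p, s) for p, s in scores.items() if s <= -cutoff]
    up.sort(key=lambda ps: ps[1], reverse=True)  # most positive first
    down.sort(key=lambda ps: ps[1])              # most negative first
    return ([p for p, _ in up[:max_probes]],
            [p for p, _ in down[:max_probes]])


# Placeholder scores, purely illustrative.
example = {
    "1007_s_at": 1.2,
    "1053_at": 0.7,
    "117_at": 0.1,
    "121_at": -0.9,
    "1255_g_at": -0.5,
}
up, down = select_probes(example)
print(up)    # ['1007_s_at', '1053_at']
print(down)  # ['121_at']
```

If more than 500 probes pass the cutoff, this sketch keeps the most extreme ones. Raising the cutoff until the list is short enough, as described above, is an equally reasonable choice.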

The simplest task you can do in cMAP is query > instance query, which lets you choose the already-existing data on one of the compounds used in cMAP, and check for signatures that correlate with that compound.

The underlying data for cMAP are in Affymetrix ProbeSet ID space. If you have a list of gene symbols (for an example see Table 1 of [Rieck 2009]), you will need to convert these gene symbols to Affymetrix ProbeSet IDs using the “NetAffx Query” tool available here. Export a plain text file with one ProbeSet ID per line and a .grp file extension, then upload one file with all “up” probes and one with all “down” probes into cMAP via the query > load signature dialogue. You can then use these data as a signature to query against, using the query > signature query menu. In the example discussed here [Rieck 2009], the authors had noticed that pancreatic beta islet cells, which are usually non-dividing, sometimes divide during pregnancy. So they measured changes in beta islet cell gene expression during pregnancy, and then queried cMAP to see if there were compounds that could induce this “pregnancy-like” state.
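Writing the .grp files is easy to script. The sketch below assumes the only format requirements are the ones noted above: plain text, one ProbeSet ID per line, and a .grp extension. The helper name `write_grp`, the file name, and the probe IDs are placeholders, and the conversion from gene symbols to ProbeSet IDs is assumed to have been done already.

```python
import tempfile
from pathlib import Path


def write_grp(probe_ids, path):
    """Write ProbeSet IDs to a .grp file, one per line.

    Blank entries and repeated IDs are dropped, keeping first-seen order.
    Returns the list of IDs actually written.
    """
    path = Path(path)
    if path.suffix != ".grp":
        raise ValueError("expected a .grp file extension")
    cleaned = []
    for pid in probe_ids:
        pid = pid.strip()
        if pid and pid not in cleaned:
            cleaned.append(pid)
    path.write_text("\n".join(cleaned) + "\n")
    return cleaned


# Placeholder IDs; a real list would come from the NetAffx conversion.
out_dir = Path(tempfile.mkdtemp())
up_file = out_dir / "pregnancy_up.grp"
written = write_grp(["1007_s_at", "1053_at", "1007_s_at", ""], up_file)
print(up_file.read_text())
```

You would call it once for the “up” list and once for the “down” list, then upload both files through the load signature dialogue.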

The next generation of cMAP uses the Luminex 1000 (L1000) assay to measure levels of 1000 “landmark” transcripts and, from those, infer the levels of other correlated transcripts. This is much faster and higher throughput than Affy arrays. Thus, whereas the 2008 version of cMAP only included 3 cell lines (with 85% of measurements taken in a single cell line), the new cMAP includes a vastly greater number of cell lines, as well as more compounds and more genetic perturbations.

Profiling of cell morphological measurements via “cell painting” with many fluorescent probes and high-content image analysis using CellProfiler has also emerged as another “universal language” for what compounds do [Gustafsdottir & Ljosa 2013]. This has been used to perform “lead-hopping”, in which you define a signature (in gene expression and cell morphology space) of an active lead compound you’re interested in, discovered through some other assay, then run a similarity search for other compounds that yield similar signatures. If you find a novel, structurally unrelated compound with a similar signature, then you run it through the original assay again. This is useful if your original compound has a desired activity but also has some other liability, such as difficulty or expense of synthesis, poor pharmacokinetics, or off-target toxicity.
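To make the similarity search step of lead-hopping concrete, here is a toy Python sketch. It assumes each compound's signature has already been reduced to a numeric vector of features (expression and/or morphology), and it uses cosine similarity purely as a stand-in; these notes don't record which similarity measure was actually used. The compound names and numbers are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile has no defined direction
    return dot / (norm_a * norm_b)


def rank_similar(lead, library):
    """Rank library compounds by similarity to the lead signature."""
    scored = [(name, cosine(lead, profile)) for name, profile in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented signatures, purely for illustration.
lead = [1.0, -0.5, 2.0, 0.0]
library = {
    "cmpd_A": [0.9, -0.4, 1.8, 0.1],    # close to the lead
    "cmpd_B": [-1.0, 0.5, -2.0, 0.0],   # opposite of the lead
    "cmpd_C": [0.0, 1.0, 0.0, 1.0],     # mostly unrelated
}
ranking = rank_similar(lead, library)
for name, score in ranking:
    print(name, round(score, 3))
```

The top-ranked compounds that are structurally unrelated to the lead are the ones you would take back to the original assay.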

It is also possible to use these gene expression and morphology signatures to explore structure-activity relationships, asking which chemical moieties are correlated with a given signature [Wawer 2014a]. And you can quickly check whether a novel compound has a novel activity or a known activity [Wawer 2014b].

Another resource is the Cancer Therapeutics Response Portal, a collaboration between Broad and Novartis [Basu 2013]. It is a richer genomic characterization of cancer cells than was available previously. The phenotype of interest in studying cancer cells is cell death. Cells were characterized in terms of genetic mutations, gene expression, phosphoproteome, etc., and then dose-response kill curves were measured for each cell type for hundreds of compounds.