This past week I discovered that data from two enormous high-throughput screening efforts to find compounds that reduce PrP expression are available in PubChem.  In one assay, ~335,000 compounds were screened for their ability to reduce PrP translation, and in another, ~370,000 compounds were screened for their ability to reduce cell surface PrPC.  These are some exciting public datasets I plan to use as practice while teaching myself cheminformatics on this blog. Read on for details.


Previously I’ve reviewed high-throughput screens for antiprion compounds.  There have been several published efforts using a few different screening approaches, totaling over 30,000 molecules screened. In the time since I wrote that post, a new, much larger screen from the Prusiner lab was published, encompassing ~54,000 compounds screened in dividing and non-dividing ScN2a cells [Silber 2013].  That paper includes not only screening data but also a great deal of followup effort including in vivo pharmacokinetics on 28 compounds. That work is quite far along and will be the subject of my next blog post.

Then this past week I learned that results from two other unpublished screens are available in PubChem.  Both datasets were already public there; I just hadn’t noticed them:

Screenshot of PubChem while searching for antiprion screens

These screens don’t contain as much followup work yet but the number of compounds in the primary screens is huge, and since the data are all public this is just what I’ve been wanting to play with to learn some cheminformatics. In this post I’ll introduce the datasets and what I was able to learn about them.

PrP 5′UTR inhibitor screen

This dataset is deposited in PubChem under BioAssay ID 488894.  The study was completed in late 2010 under principal investigator Dr. Jack Rogers at Harvard.

Dr. Rogers studies translational regulation, iron metabolism, and Alzheimer’s disease. In particular, he’s interested in ways to downregulate translation of amyloid precursor protein (APP) as a therapeutic strategy for Alzheimer’s. The APP 5′UTR contains stem loops that act as iron response elements (IREs), which recruit IRE binding protein 1 or 2 (IREB1 and IREB2) to promote translation of APP. This binding event looked like a potential drug target, so several years ago, Rogers spearheaded two screens at the Columbia University Molecular Screening Center to find small molecules that inhibit or activate APP translation. In the primary screen, the APP 5′UTR was fused to luciferase to create a bioluminescence readout, i.e. the wells luminesce by default, but if a compound can inhibit the IREBs binding to the 5′UTR, then you get no signal.  For those assays, only the primary screening data are in PubChem.  The work on APP was first published at a preliminary stage [Bandyopadhyay 2006] and eventually led to 13 hits, which were published this summer [Bandyopadhyay 2013].

While he was at it, Dr. Rogers also tried this screening approach against alpha synuclein and PrP.  The screen for SNCA (alpha synuclein) 5′UTR inhibitors was a massive effort at the Broad Institute including several secondary screens and counterscreens, all of which are in PubChem. In fact, one of the counterscreens (of ~2000 compounds) was against the PrP 5′UTR, on the logic that they wanted compounds that specifically inhibited SNCA translation and didn’t affect other proteins such as PrP.  (Compounds that inhibit both SNCA and PrP translation are more likely to just be mucking up ribosomes or something undesirable).  The anti-synuclein effort led to one new small molecule probe which was published a couple of years ago [Ross 2011].  There was also a screen for SNCA translation activators.

And finally, of most interest to me, Rogers also conducted a primary screen of 335,011 compounds to find PrP 5′UTR inhibitors, summarized as assay 488894.  In fact, after the primary screen there were 7 confirmatory screens, checking the top hits to make sure they weren’t just luciferase inhibitors, didn’t also inhibit APP translation, and actually did inhibit PrP translation itself (as opposed to just inhibiting the luciferase – PrP 5′UTR construct).  The screen used the MLPCN compound library. The data from the primary and confirmatory screens were deposited in PubChem in 2010 but, as far as I can tell, findings were never published. That might mean there weren’t any hits worth following up on, so I won’t get my hopes up, but this should still be an interesting dataset to play with.

PrP-FEHTA times 370K

Earlier this year Corinne Lasmezas and the folks at Scripps Florida published their new PrP-FEHTA assay, a modified FRET assay for compounds that deplete cell surface PrPC [Karapetyan & Sferrazza 2013].  The initial publication included the screening method, results from a screen of the MSDI US Drug Collection, and followup work on two hits – tacrolimus and astemizole.  It appears that now these authors have extended their screen to the current MLPCN collection, totaling 370,276 compounds. Primary screen data are deposited as PubChem BioAssay 720640.  These data are brand new as of September 2013. As far as I can tell the methods, antibodies, cell lines, etc. are the same as described in the original paper [see supplement]. Compounds were screened in singlicate at 13.8 μM, and only the primary screening data are available at present.  Presumably the Scripps folks are working on secondary and counterscreens right now and this work will be published somewhere down the road. In the meantime, it’s exciting to be able to see the raw data!

Next steps

There are all kinds of cool things one can do with these data. In my chemical informatics posts so far, all I’ve really done is learn the ways that data are stored, draw molecules, calculate chemical properties, and do some principal components analysis on them. Next on my list has been to learn how to do SAR: finding molecular fragments and scaffolds associated with bioactivity in an assay. There are lots of packages that can do SAR out of the box (the Prusiner lab uses SARvision in their new paper [Silber 2013]), but I want to understand the principles, so my plan is to code my own exploratory analysis.
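
One of the simplest SAR-style questions is whether a given fragment turns up among the actives more often than chance would predict. Here is a minimal Python sketch of that idea using a one-sided hypergeometric test. It assumes each compound has already been flagged for whether it contains the fragment (that step needs a cheminformatics toolkit and isn't shown). The function name and the counts are made up for illustration, and this is not the method used by SARvision or in any of the cited papers.

```python
from math import comb


def fragment_enrichment_p(n_total, n_active, n_with_fragment, n_active_with_fragment):
    """One-sided hypergeometric p-value for fragment enrichment among actives.

    Returns the probability of seeing at least `n_active_with_fragment`
    actives among the `n_with_fragment` compounds carrying the fragment,
    if actives were scattered at random across all `n_total` compounds.
    """
    upper = min(n_active, n_with_fragment)
    denom = comb(n_total, n_with_fragment)
    numer = sum(
        comb(n_active, k) * comb(n_total - n_active, n_with_fragment - k)
        for k in range(n_active_with_fragment, upper + 1)
    )
    return numer / denom


# Made-up toy counts: 20 compounds screened, 5 active,
# 4 contain the fragment, and 3 of those 4 are active.
p_value = fragment_enrichment_p(20, 5, 4, 3)
print(f"enrichment p-value: {p_value:.4f}")
```

Exact binomial coefficients are fine for toy counts like these. With hundreds of thousands of compounds and many candidate fragments, a statistics library and a multiple-testing correction would be the more sensible route.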

What’s also cool is that one screen (the 5′UTR inhibitor) is focused on one mechanism of action, blocking PrP translation, while the other (PrP-FEHTA) allows for a variety of mechanisms of action, including but not limited to translation. And a lot of the same molecules were included in both screens. So it should be possible to cross-check whether the hits from one screen were active in the other. Plus there are the APP and SNCA 5′UTR inhibition data from the other Jack Rogers screens to see how specific the PrP 5′UTR inhibitors are.  I’d also like to look for the published antiprion hits from other screens, especially those with unknown mechanisms of action. There are many natural compounds and FDA-approved drugs with some limited amount of antiprion activity in [Kocisko 2003, Poncet-Montange 2011, Silber 2013], for most of which the mechanism of action is unknown, and it will be interesting to see if any of those were active in these new screens.
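
Cross-checking two screens boils down to joining their result tables on compound ID. Below is a rough Python sketch, assuming each screen has been exported to CSV with a `cid` column and an `outcome` column whose value is `active` for hits. Those column names, the function names, and the toy rows are my own placeholders, not PubChem's actual export format.

```python
import csv
import io


def load_outcomes(csv_text):
    """Map compound ID -> True if the compound was called active."""
    outcomes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        outcomes[int(row["cid"])] = row["outcome"].strip().lower() == "active"
    return outcomes


def cross_check(screen_a, screen_b):
    """Compare activity calls for the compounds tested in both screens."""
    shared = screen_a.keys() & screen_b.keys()
    return {
        "shared": len(shared),
        "active_in_both": sorted(c for c in shared if screen_a[c] and screen_b[c]),
        "active_only_in_a": sorted(c for c in shared if screen_a[c] and not screen_b[c]),
        "active_only_in_b": sorted(c for c in shared if screen_b[c] and not screen_a[c]),
    }


# Made-up toy rows standing in for the two screens' exports.
utr_csv = "cid,outcome\n1,active\n2,inactive\n3,active\n4,inactive\n"
fehta_csv = "cid,outcome\n2,active\n3,active\n4,inactive\n5,active\n"

result = cross_check(load_outcomes(utr_csv), load_outcomes(fehta_csv))
print(result)
```

Only compounds present in both tables are compared, so a hit that was never tested in the other screen is not counted as a disagreement.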

I’ve set up a new git repository to hold all my code analyzing these datasets. It’s all open source, so feel free to follow along and/or contribute!