In our study last year, by comparing case and population control allele frequencies, we managed to come to conclusions about the penetrance of only 14 out of the now 66 reportedly pathogenic PRNP variants [Minikel 2016]. The remainder were too rare, both in prion disease cases and population controls, for allele frequency comparisons to reach any meaningful conclusion. But it would be very useful to have at least an educated guess about the penetrance of the remaining variants. Some are probably high penetrance (say, >50% lifetime risk), others might confer a risk that is increased above the population baseline but still low, and some might be completely benign. Where to begin?

To start to get at this problem, I recently took a deeper dive into the literature on these 66 reportedly pathogenic variants, looking at the human genetic evidence for each variant’s pathogenicity. Here are the criteria I looked at:

  • Mendelian segregation. In a group of closely related individuals, if most or all of the individuals with a particular genetic variant develop a particular disease and the people without that variant do not, this is called Mendelian segregation. Every geneticist agrees that segregation is a valuable clue in investigating pathogenicity, but the context matters a lot. The American College of Medical Genetics and Genomics (ACMG), for instance, considers segregation to be only “supporting” evidence (as opposed to “strong” or “very strong” evidence) of pathogenicity, because all it shows is that the locus (a segment of a chromosome) is linked to disease; it doesn’t prove that any one specific variant is causal [Richards 2015]. But in prion disease there is only one causal gene, PRNP; all known pathogenic variants in it are protein-coding; and it has just one short open reading frame that can easily be sequenced. The upshot is that if a rare protein-altering variant in PRNP segregates with prion disease, it is the causal variant. Investigators trying to establish a novel gene as disease-causing might want to see segregation in two different families before they’re confident, but for a novel variant in a well-established disease gene, I think that segregation even in one family (especially if that’s the only family with the variant) is pretty strong evidence, though perhaps not definitive proof, of fairly high penetrance. I therefore searched the literature to see, for each variant, whether there was even one family with at least three closely related affected individuals in a pattern consistent with Mendelian segregation. If there was, I considered this evidence for high penetrance.
  • De novo variants. Just because a disease is genetic doesn’t mean it’s inherited: the average person has ~60–70 de novo (spontaneous) mutations in their genome, mutations that neither of their parents had, ~1 of which falls in a protein-coding portion of a gene [Michaelson 2012, Kong 2012]. So if a person with a de novo mutation in PRNP has prion disease, that mutation is probably highly penetrant. As a back-of-the-envelope argument: the average person has only about 1 protein-coding de novo SNP or indel in their entire genome; there are ~20,000 genes, of which PRNP is one of the smaller ones; and only ~20,000 prion disease cases have ever come to the attention of the modern medical establishment. It’s therefore unlikely that there has ever been even one individual who had sporadic prion disease and just happened, by coincidence, to carry a benign de novo mutation in PRNP. Others who do variant classification seem to agree: ACMG considers de novo status to be “strong evidence” of pathogenicity [Richards 2015]. I considered de novo status to be evidence for high penetrance.
  • Homozygotes. Almost all cases of genetic prion disease are in heterozygotes, people with just one mutant copy of the PRNP gene. But a few variants have been seen in homozygotes. For E200K, which has ample evidence for high penetrance and is found in some dense founder clusters around the world [Lee 1999], this isn’t too surprising. But for a couple of variants that don’t have evidence for high penetrance, the presence of an affected homozygote suggests that the variant at least confers an increased risk of prion disease. That’s because these PRNP variants are so rare that even one affected homozygote can represent a deviation from Hardy-Weinberg equilibrium that is very unlikely to arise by chance. Let me explain. The variants in question have allele frequencies ≪0.1% in the general population (based on ExAC continental populations). A variant with an allele frequency (AF) of, say, 0.1% has a heterozygote frequency of about 2 in 1,000 and, under random mating, a homozygote frequency of 1 in 1,000,000, so there are about 2,000 heterozygotes out there for every homozygote. Now consider that 1 of the 4 known affected V203I individuals is a homozygote [Komatsu 2014], and 1 of 3 affected Q212P individuals is a homozygote [Beck 2010, Minikel 2016]. In both cases, the homozygotes had no family history of the disease on either side of the family. Even without doing any formal math, it’s clear that these numbers are fairly unlikely to arise by chance if the variant confers no risk. A likely explanation is that one mutant allele confers an elevated but still low risk, while two mutant alleles confer a higher risk. Thus, in these cases, an affected homozygote provides some evidence that a variant confers a risk increased above the baseline.
  • Case/control enrichment. For completeness, I also noted the handful of variants for which we have evidence that the variant is more common in cases than controls [Minikel 2016], as this is evidence for increased risk or, with very strong enrichment, high penetrance.
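The Hardy-Weinberg argument in the homozygote bullet can be made concrete. The sketch below assumes that, if a variant confers no risk, affected carriers are a random sample of all carriers, and that genotypes follow Hardy-Weinberg proportions (random mating). It plugs in the illustrative AF of 0.1% from the text; since the real variants have AF ≪0.1%, the true homozygote share, and hence the chance probabilities, would be smaller still. The function names are hypothetical, chosen for this example.

```python
from math import comb


def hwe_genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies for an allele with frequency p."""
    q = 1.0 - p
    return {"ref_hom": q * q, "het": 2 * p * q, "alt_hom": p * p}


def p_at_least_k_homozygotes(p, n_affected_carriers, k=1):
    """Chance of seeing >= k homozygotes among n affected carriers, assuming
    the variant confers no risk (so affected carriers are a random sample of
    all carriers) and the population is in Hardy-Weinberg equilibrium."""
    g = hwe_genotype_freqs(p)
    # Fraction of carriers (het + hom) who are homozygous for the variant.
    h = g["alt_hom"] / (g["het"] + g["alt_hom"])
    # Binomial upper tail: P(X >= k) with X ~ Binomial(n, h).
    return sum(
        comb(n_affected_carriers, i) * h**i * (1 - h) ** (n_affected_carriers - i)
        for i in range(k, n_affected_carriers + 1)
    )


if __name__ == "__main__":
    p = 0.001  # illustrative AF of 0.1%, an upper bound for these variants
    g = hwe_genotype_freqs(p)
    print(f"hets per homozygote: {g['het'] / g['alt_hom']:.0f}")
    print(f"1 homozygote among 4 affected (V203I-like): {p_at_least_k_homozygotes(p, 4):.4f}")
    print(f"1 homozygote among 3 affected (Q212P-like): {p_at_least_k_homozygotes(p, 3):.4f}")
```

With AF = 0.1% this gives about 1,998 heterozygotes per homozygote, and roughly a 0.2% (1 of 4) or 0.15% (1 of 3) chance of seeing at least one homozygote among the affected carriers under the no-risk model.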

All that said, here’s what I found (last updated 2017-06-27):

variant | evidence for high penetrance | evidence for increased risk | refs | comments
P39L | | | Bernardi 2014 |
2-OPRD | | | Beck 2001, Capellari 2002 |
1-OPRI | | | Laplanche 1995, Pietrini 2003 |
2-OPRI | | | Hill 2006 |
3-OPRI | | | Nishida 2004 |
4-OPRI | | | Kaski 2011 | most cases have negative family history
5-OPRI | Mendelian segregation | | Mead 2007 |
6-OPRI | Mendelian segregation | | Mead 2006 |
7-OPRI | Mendelian segregation | | Goldfarb 1991 |
8-OPRI | Mendelian segregation | | Goldfarb 1991, Laplanche 1999 |
9-OPRI | Mendelian segregation, de novo | | Krasemann 1995, Sanchez-Valle 2008 |
12-OPRI | Mendelian segregation | | Kumar 2011 |
P84S | | | Jones 2014 |
S97N | | | Zheng 2008 |
P102L | Mendelian segregation | case/control enrichment | Webb 2008 |
P105L | Mendelian segregation | | Yamada 1999 | 2 sibs affected & genotyped, 1 ungenotyped parent likely affected
P105S | | | Tunnell 2008 |
P105T | Mendelian segregation | | Rogaeva 2006 |
G114V | Mendelian segregation | | Rodriguez 2005, Liu 2010 | pedigree suggests penetrance high though not 100%
A117V | Mendelian segregation | case/control enrichment | Hsiao 1991 |
129insLGGLGGYV | de novo | | Hinnell 2011 |
G131V | | | Panegyres 2001, Jansen 2011 | positive family history in one case
S132I | Mendelian segregation | | Hilton 2009 | extensive family history, only proband genotyped
A133V | | | Rowe 2007 |
Y145X | | | Kitamoto 1993 |
R148H | | | Krebs 2005 |
R156C | | | Kenny 2017 |
Q160X | Mendelian segregation | | Fong & Rojas 2016 |
Y163X | Mendelian segregation | | Mead 2013 |
D167G | | | Bishop 2009 |
D167N | | | Beck 2010 |
V176G | | | Simpson 2013 |
D178Efs25X | Mendelian segregation | | Matsuzono 2013 | only proband genotyped
D178N | Mendelian segregation, de novo | case/control enrichment | Medori 1992, Dagvadorj 2002 |
V180I | | case/control enrichment | Hitoshi 1993 |
T183A | Mendelian segregation | | Nitrini 1997 |
H187R | Mendelian segregation | | Butefisch 2000 |
T188A | | | Collins 2000 |
T188K | | | Roeber 2008 | multiple cases with negative family history
T188R | | | Roeber 2008, Tartaglia 2010 |
T193I | | | Kotta 2006 |
K194E | | | Takada 2017 |
E196A | | | Zhang 2014 |
E196K | Mendelian segregation | | Peoc’h 2000 | only proband genotyped
F198S | | | Hsiao 1992 |
F198V | | | Zheng 2008 |
E200G | | | Kim 2013 |
E200K | Mendelian segregation | homozygote, case/control enrichment | Hsiao 1991 |
T201S | | | Parvez 2010 |
D202G | Mendelian segregation | | Heinemann 2008 | only proband genotyped
D202N | | | Piccardo 1998 |
V203I | | homozygote | Komatsu 2014 |
R208C | | | Zheng 2008 |
R208H | | | Mastrianni 1996 |
V210I | | case/control enrichment | Ripoll 1993, Pocchiari 1993 |
E211D | Mendelian segregation | | Peoc’h 2012 | supplement describes 1 family with 3 affected
E211Q | | | Peoc’h 2000 | 2 sibs affected
Q212P | | homozygote | Beck 2010 |
I215V | | | Munoz-Nieto 2013 |
Q217R | | | Hsiao 1992 | 2 affected
Y218N | Mendelian segregation | | Alzualde 2010 |
A224V | | | Watts 2015 |
Y226X | | | Jansen 2010 |
Q227X | | | Jansen 2010 |
M232R | | case/control enrichment | Hitoshi 1993 |
M232T | | | Bratosiewicz 2000 |
P238S | | | Windl 1999 |

In total, then, 25 out of the 66 have evidence for either Mendelian segregation or de novo status according to these criteria. These are all likely to be high penetrance variants. For some of these we can say definitively that penetrance is high, when the family is large or when there is dramatic case/control enrichment. For the rest it’s likely, although it’s conceivable for some variants that the penetrance is somewhat more modest and maybe there just happened to be a family with three affecteds by coincidence.

There are also probably some variants that are genuinely high penetrance but have no boxes checked above. For instance, sometimes a family history just isn’t available for a patient, or there is a history of disease but the family never speaks about it and so the younger generation doesn’t know, or the family history appears negative only due to adoption or a non-paternity event, or the variant is de novo but the parents are already deceased and there are no siblings, so it is impossible to prove that it’s de novo. When there’s only one patient and the variant is also ultra-rare in controls, it’s hard to say anything completely definitive.

Overall, then, I am not asserting that any of the criteria above are proof positive for a particular risk classification, but I think they each make a classification more or less likely, and are worth noting. Many papers on genetic prion disease include a figure with a diagram of PRNP’s coding sequence, sometimes with elements of protein secondary structure noted, and all the reportedly pathogenic mutations indicated. I set out to make a new such figure with variants shaded by their level of human genetic evidence (evidence for high penetrance, evidence for increased risk, or no evidence) and sized by the number of cases in our recent case series [Minikel 2016]. Here’s what I came up with:

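[Figure: reportedly pathogenic variants along the PRNP coding sequence, shaded by level of human genetic evidence and sized by number of cases in Minikel 2016.]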
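The three-level shading described above amounts to a simple precedence rule over the table’s two evidence columns: any evidence for high penetrance wins; otherwise any evidence for increased risk; otherwise no evidence. A minimal sketch, assuming each variant is stored as a record with those two columns; the `Variant` type and `evidence_tier` function are hypothetical illustrations, not the code used to draw the figure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    # e.g. "Mendelian segregation", "de novo"
    high_penetrance_evidence: Tuple[str, ...] = ()
    # e.g. "homozygote", "case/control enrichment"
    increased_risk_evidence: Tuple[str, ...] = ()


def evidence_tier(v: Variant) -> str:
    """Return the shading tier, taking the strongest evidence available."""
    if v.high_penetrance_evidence:
        return "evidence for high penetrance"
    if v.increased_risk_evidence:
        return "evidence for increased risk"
    return "no evidence"


# A few rows transcribed from the table above.
variants = [
    Variant("P39L"),
    Variant("P102L", ("Mendelian segregation",), ("case/control enrichment",)),
    Variant("129insLGGLGGYV", ("de novo",)),
    Variant("V180I", (), ("case/control enrichment",)),
    Variant("Q212P", (), ("homozygote",)),
]

if __name__ == "__main__":
    for v in variants:
        print(f"{v.name}: {evidence_tier(v)}")
```

Encoding the rule as a strict precedence means a variant like P102L, which has both kinds of evidence, is shaded only once, at its strongest tier.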

As a future direction, there are additional sources of information that should be useful in classification that I haven’t gotten into here but may go through and annotate in the future:

  • More could probably be done with allele frequency comparisons. For instance, in some cases a variant’s sheer frequency in controls is too high for it to be highly penetrant, which can be enough to suggest that certain variants are likely benign.
  • Multiple isolated cases without family history provide some evidence against high penetrance.
  • CpG variants with a low case count are probably not high penetrance. CpG variants (C → T transitions where the next base is G, or equivalently G → A on the opposite strand) are the most frequent type of DNA mutation, occurring 10X more often than non-CpG transitions and 100X more often than transversions [Samocha 2014, Lek 2016]. CpG variants are responsible for all three of the most prevalent, most highly recurrent PRNP mutations in cases (P102L, D178N, and E200K). Based on mutation rates, we can estimate that other CpG variants in PRNP have probably arisen, very roughly, as many times in the world population as those three, so they’ve had roughly as many chances to produce prion disease cases if they were highly penetrant. So if you see a CpG variant that has few cases and no Mendelian segregation, it’s likely to confer at worst a low risk of disease, and perhaps no risk at all.
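The first idea above, that control frequency alone can rule out high penetrance, follows from Bayes’ rule: P(disease | variant) = P(variant | disease) × P(disease) / P(variant). Because P(variant | disease) can be at most 1, a variant’s carrier frequency in controls caps its possible penetrance at (baseline lifetime risk) ÷ (control carrier frequency). A minimal sketch; the function names and every number in the example are illustrative placeholders, not estimates for any real PRNP variant, and the population carrier frequency is assumed to be approximated by the control carrier frequency.

```python
def implied_penetrance(case_carrier_freq, control_carrier_freq, baseline_lifetime_risk):
    """Bayes' rule: P(D|V) = P(V|D) * P(D) / P(V).

    Approximates P(V), the population carrier frequency, by the carrier
    frequency in controls (about 2x the allele frequency for rare variants).
    """
    return case_carrier_freq * baseline_lifetime_risk / control_carrier_freq


def max_penetrance(control_carrier_freq, baseline_lifetime_risk):
    """Upper bound on penetrance if every case carried the variant (P(V|D) = 1)."""
    return min(1.0, implied_penetrance(1.0, control_carrier_freq, baseline_lifetime_risk))


if __name__ == "__main__":
    # Placeholder inputs (assumptions), not data for any real PRNP variant.
    baseline_risk = 1 / 5000  # assumed lifetime risk of prion disease
    control_freq = 1 / 1000   # assumed carrier frequency in controls
    print("cap from controls alone:", max_penetrance(control_freq, baseline_risk))
    print("if 1% of cases carry it:", implied_penetrance(0.01, control_freq, baseline_risk))
```

With these placeholder inputs, even if every prion disease case carried the variant, its penetrance could not exceed about 20%; if only 1% of cases carried it, the implied penetrance would be about 0.2%.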