motivation

The concept of heritability is defined roughly as “the proportion of the variation in a trait that can be explained by genetic factors”. There are a variety of approaches to calculating heritability based on comparing identical vs. non-identical twins, natural siblings vs. adopted siblings, groups of family members or even unrelated individuals. All of these approaches share a unifying theme in that they ask the question “are more genetically similar individuals also more phenotypically similar than would be expected by chance?” By asking this question, these methods can try to estimate the proportion of phenotypic variance in the population that appears to track with the genome – even if we can’t pinpoint any particular genetic variant as being associated with the phenotype.

Meanwhile, genome-wide association studies (and more recently exome sequencing studies) try to find particular genetic variants (or at least loci) associated with a trait. For disease traits, studies of families with Mendelian forms of the disease will also reveal particular genes involved in the disease.

Since the point of heritability studies is to determine the total contribution of the genome to a trait, it would be surprising if a particular trait were estimated to have zero heritability, yet had specific genetic variants associated with it.  If we had an example of such a trait, perhaps it could teach us a lot about what we’re doing wrong when we try to estimate heritability.

It was recently suggested to me that Parkinson’s disease may be just such a trait.  The claim is that Parkinson’s has many known genetic risk factors yet was for years believed to have zero heritability and thought to be, therefore, purely sporadic in nature.

The purpose of this post is to dissect this claim by reviewing the genetic factors involved in Parkinson’s disease and the literature on heritability of Parkinson’s disease.

genetic factors in Parkinson’s disease

There are a lot of studies on genetic factors in Parkinson’s disease (PD). Note that the average person has something like a 1-2% risk of developing Parkinson’s in their lifetime [Van den Eeden 2003 (ft), de Lau & Breteler 2006 (ft), 23andMe Parkinson's primer]. The best review and meta-analysis of Parkinson’s GWAS I found was Lill 2012. Here are a few of the best-established genes in PD risk:

gene | description
LRRK2 | Mutations in LRRK2 (pronounced "lark 2" and formerly known as PARK8) cause a dramatically increased risk of PD. The most famous one, G2019S (rs34637584), which Sergey Brin has, causes about a 28% risk of PD by age 60 [Healy 2008] compared to ~1% in people without the mutation [de Lau & Breteler 2006 (ft)]. It might be considered a Mendelian form of the disease, albeit one with a late enough age of onset that many people are censored, making the penetrance incomplete. G2019S is common in Jews and North African Arabs; there are also other mutations found in other populations such as G2385R in East Asia [An 2008, Zabetian 2009, Kim 2010, Tan 2010]. More common variants in LRRK2 may also be risk factors [Simon-Sanchez 2009].
SNCA | SNCA encodes alpha synuclein, the protein which appears to be the "prion" directly involved in PD [Luk 2012]. Not surprisingly, GWAS have identified an association between SNCA variants and PD risk [Satake 2009, IPDGC 2011, Saad 2011, UKPDC 2011]. One report suggests that a repeat length polymorphism in the gene's promoter that affects expression levels might be responsible [Mata 2010].
MAPT | MAPT encodes Tau, which is infamous for mutations that cause frontotemporal dementia as well as for its role in Alzheimer's disease. GWAS have also found that variants in MAPT affect PD risk [Simon-Sanchez 2009, IPDGC 2011, UKPDC 2011].
GBA | GBA is best known as the gene responsible for Gaucher's disease, but some mutations in GBA also increase PD risk by 5-fold to 10-fold [Aharon-Peretz 2004, Sidransky 2009, Lesage 2011, Anheim 2012].

In addition, several other genes have shown a reproducible association in GWAS: BST1, CCDC62/HIP1R, DGKQ/GAK, MCCC1/LAMP3, PARK16, STK39, SYT11/RAB25 [reviewed and meta-analyzed in Lill 2012; data browsable on PDGene.org]. All of these associations appear to be based solely on common variants with no Mendelian mutations identified yet, and have risk ratios between 0.5 and 2, meaning that the higher risk allele doesn’t quite double your risk. There are also a few other Mendelian forms of parkinsonism – in PINK1, DJ1, UCHL1, & PARK2 – whose classification as strict PD is not clear [Farrer 2006 (ft)].

In sum, based on the genetic studies it seems safe to say that:

  • There are a lot of genetic factors involved in Parkinson’s disease risk, and
  • Some of these factors have a large effect size

heritability in Parkinson’s disease

A useful review is [Farrer 2006 (ft)].  Of note, he points out three very old citations that provided the earliest reports of Mendelian forms of Parkinsonism [Leroux 1880, Allan 1937, Mjones 1950].  Clearly, it was recognized long ago that genetic factors were capable of causing / contributing to PD, but I’ll focus here on studies that explicitly tried to estimate heritability.

The table below summarizes the PD heritability studies I found. I cannot claim this is an exhaustive list of PD heritability studies. I started with a Google scholar search for “parkinson’s heritability”, focusing on studies that were either large or had a large number of citations, then also checked references of those studies. In particular I made sure I had included the relevant studies cited by Do 2011, Farrer 2006 (ft) and the 23andMe Parkinson’s primer (accessed September 17, 2013).

Because the number of patients screened for PD varied dramatically based on study design (i.e. age of target demographic; decision to ascertain relatives of people with PD vs. just random individuals), I felt that the most comparable measure of n was the number of people with PD in each study.

study | n with PD | description
Piccini 1999 | 21 | Used PET imaging to trace dopaminergic function in MZ vs. DZ twins nominally discordant for PD, i.e. one twin had a diagnosis of PD and the other did not. The quantitative data from the PET scan proved more concordant than the binary yes/no diagnosis, and grew more so at followup, with 75% MZ concordance compared to 22% DZ concordance. This would seem to suggest a substantial (up to 100%) heritability.
Tanner 1999 (ft) | 193 | Compared concordance of a binary yes/no diagnosis in MZ and DZ twins in a cross-sectional study design. The MZ twins were nominally just a bit more concordant than DZ (.16 vs. .11) but this difference was not significant (the 95% CI on the relative risk for MZ twins as opposed to DZ ranged from 0.63 to 3.01) and the best estimate of heritability for PD at age > 50 was 6.8% (Table 2). Heritability for PD at age < 50 was estimated at 100% (Table 2).
Sveinbjornsdottir 2000 (ft) | 772 | Ascertained 378 patients diagnosed with PD in Iceland and then looked for concordance in all of their relatives. Relatives of a PD patient were at substantially increased risk for PD compared to the general population, with risk ratios as high as 6.3 for siblings.
Moilanen 2001 | >265 | Ascertained 265 patients with PD and then looked at relatives. Alone among these studies, it found higher heritability of late onset PD (45%) than early onset PD (17%).
Payami 2002 (ft) | >460 | Ascertained 117 early onset and 343 late onset PD patients, then looked at relatives. Risk ratio of 7.76 if you have a relative with early onset PD and 2.95 if you have a relative with late onset PD.
Rocca 2004 | 253 | Ascertained patients with PD and then looked for concordance in relatives. Significant risk ratio of 1.71.
Wirdefeldt 2004 | 247 | Ascertained 14,082 twin pairs (plus some triplets?) from the Swedish national registry and found 247 individuals with "Possible PD" (or 517 using looser criteria of "Broad Definition PD"). Of the 247, only two concordant pairs were found, both female DZ, suggesting no heritability; of the 517, concordance rates were slightly higher for MZ than DZ twins. The discussion states there is "almost no heritability", but Table 3 shows a variety of additive heritability estimates ranging from 13% to 40%. The estimate of 30% at p = .049 in a model constrained to have no gender differences ("Men = women") is the estimate that Do 2011 chose as the take-home message of this paper in Table 4.
Do 2011 | 3426 | The scientists at 23andMe took the self-reported PD status of 3,426 case and 29,624 control customers and used GCTA [Yang 2011] to come up with an estimate of 27% heritability (95% CI: 23% – 32%). They also compared estimates from these other studies – see discussion below.
Hamza & Payami 2010 | 592 | Ascertained 504 individuals with PD and then looked at siblings and parents. Estimate of 60% heritability overall, and 40% after excluding families with mutations in LRRK2, SNCA, SCA2 or PARK2 (then known as PRKN). The data were simply fed, black box-style, into a piece of software called SOLAR [Almasy & Blangero 1998]. The possible contribution of shared environment is not really discussed, nor are possible sources of ascertainment bias. This study also estimates heritability of age of onset.
Wirdefeldt 2011 | 542 | The original authors of the 2004 paper ascertained 23,218 Swedish twin pairs followed from 1961 to 2005. The paper is not totally clear but this appears to overlap with the 2004 dataset (neither a perfect subset nor superset). In the longitudinal study, concordance was 11% MZ vs. 4% DZ or 13% vs. 5% using a looser disease definition, and the study's best estimate of heritability was 40%.

Of note, the 23andMe study also includes in Table 4 a tabular comparison of several of the above studies [Do 2011]. Although the authors of the various studies in the above table disagreed quite a bit in how they phrased their conclusions, their actual best estimates of heritability are not as different as you’d expect: the range is 17% to 100%, with 10 of the 17 estimates falling within 20-40% and all of the 95% confidence intervals (where available) overlapping considerably. (Aside: the 23andMe study used a method from Wray 2010 to convert risk ratios to heritability estimates – this is something I’ve been seeking for a while now.)
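To give a feel for how a familial risk ratio can map onto a heritability figure, here is a rough sketch using Falconer's classic liability-threshold approximation. To be clear, this is a simpler stand-in and not the Wray 2010 method that Do 2011 used. The function name falconer_h2 is my own, and the inputs are illustrative assumptions: a 1% population risk, the 1.71 risk ratio from Rocca 2004 treated as applying to first-degree relatives (relatedness 0.5), and no shared environment.

```r
# Falconer-style approximation of heritability of liability from a
# relative risk ratio. A sketch under stated assumptions, NOT the
# Wray 2010 method used in Do 2011.
falconer_h2 <- function(prevalence, risk_ratio, relatedness = 0.5) {
  # Liability threshold in the general population
  t_pop <- qnorm(1 - prevalence)
  # Threshold implied by the (higher) risk among relatives of cases
  t_rel <- qnorm(1 - risk_ratio * prevalence)
  # Mean liability of affected individuals in the general population
  mean_liability <- dnorm(t_pop) / prevalence
  # Shift in relatives' mean liability, scaled by mean liability of cases
  # and by the assumed genetic relatedness
  (t_pop - t_rel) / (relatedness * mean_liability)
}

# Illustrative inputs only: 1% risk, risk ratio 1.71, first-degree relatives
h2_example <- falconer_h2(prevalence = 0.01, risk_ratio = 1.71)
h2_example
```

Because shared environment is ignored and the relatedness is assumed, any number this produces is best read as an upper-ish ballpark rather than an estimate to compare directly with the table.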

discussion

Based on the comparison of heritability studies listed above, the claim that Parkinson’s was long thought to have no heritability does not seem justified.  This claim is most probably based on two studies [Tanner 1999 (ft), Wirdefeldt 2004].  These were indeed the two largest twin studies of PD until Wirdefeldt’s longitudinal study seven years later [Wirdefeldt 2011], but contemporary family-based studies (see above table) gave higher heritability estimates.  Though the authors phrased their conclusions in a largely negative light, Tanner also gave a high (~100%) estimate of heritability of early-onset PD, and Wirdefeldt’s 2004 paper did contain an estimate of 30% heritability, of borderline statistical significance (p = .049).  Therefore to single out only these two studies, and to single out the conclusion of no heritability in particular, seems rather selective.

At the same time, there’s a fair question here: why didn’t these twin studies find more evidence for heritability, given that we now know that so many genes are indeed involved in PD?  This is especially surprising because these studies did not even exclude Mendelian forms of PD.

The obvious explanation is sample size.  This may seem ridiculous since Wirdefeldt 2004 used an exhaustive registry of all twins in Sweden and started from a list of 50,000 people.  But consider how these big numbers whittle down.  The study ended up ascertaining 33,780 people in 14,082 twin pairs (I know, confusing – not clear how they handled triplets).  The twins were apparently about 2/3 DZ and 1/3 MZ, though the exact numbers are never stated.  The study found 247 individuals with “Possible PD”.  This suggests a prevalence of PD, in this particular twin population, of 247/33780 = 0.73%.  Under the null hypothesis that Parkinson’s is purely random (sporadic, idiopathic) with no genetic component at all, what is the expected number of pairs of twins who both have PD?  This is technically drawing without replacement, but to an approximation, it can be modeled with a binomial distribution:

> dbinom(2,size=2,prob=247/33780)*14082
[1] 0.7529029

On expectation, under the null assumption of zero heritability, there should have been just 0.75 concordant PD twin pairs in the dataset. If we assume the dataset was ~2/3 DZ, then that 0.75 is divided into an expectation of 0.25 concordant MZ twin pairs and 0.50 DZ twin pairs. Instead, Wirdefeldt found 0 concordant MZ twin pairs and 2 concordant DZ twin pairs. The expected numbers are low enough that I suspect the true heritability would have to have been pretty large in order for this study to have been well-powered to detect it. I’m still puzzling over how exactly to model this.
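One crude way to put a number on this, as a sketch rather than a real power analysis, is to treat the count of concordant pairs as roughly Poisson with the null expectation as its mean. The 1/3 MZ and 2/3 DZ split below is the same assumption as above, since the paper does not state exact numbers.

```r
# Null expectation of concordant pairs, reusing the numbers above
prevalence   <- 247 / 33780   # "Possible PD" cases / individuals ascertained
n_pairs      <- 14082
expected_all <- dbinom(2, size = 2, prob = prevalence) * n_pairs

# Assumed zygosity split (~1/3 MZ, ~2/3 DZ); exact counts are not reported
expected_mz <- expected_all * (1 / 3)
expected_dz <- expected_all * (2 / 3)

# Poisson approximation: probability of seeing 2 or more concordant pairs
# in total by chance alone, given the null expectation
p_two_or_more <- ppois(1, lambda = expected_all, lower.tail = FALSE)

round(c(expected_mz = expected_mz,
        expected_dz = expected_dz,
        p_two_or_more = p_two_or_more), 3)
```

Under these assumptions, observing two concordant pairs is not at all unusual under the null, which is consistent with the study being unable to distinguish zero heritability from a modest one.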

The numbers from Tanner 1999 (ft) are not so different: 193 individuals with PD out of 19,842 individuals gives a prevalence of 0.97% and an expectation of 0.94 concordant twin pairs under the null hypothesis of no heritability:

> dbinom(2,size=2,prob=193/19842)*(19842/2)
[1] 0.9386403

Now consider the same authors’ later longitudinal study [Wirdefeldt 2011].  The study population appears to have been older, with an average age at last followup of 79.8.  The average age is not stated in the 2004 paper, but was likely younger since the 2004 study took twins born before 1950 and ascertained them in 1998 (thus minimum age: 48) while the 2011 study took twins born before 1952 and ascertained them in 2005 (thus minimum age: 53).  Moreover, the longitudinal design probably helped with ascertaining more individuals who had PD earlier and had already died by 2005.   For both of these reasons (age and longitudinal design), the prevalence is higher in the 2011 study: 542/49814 = 1.1%.   The higher prevalence and larger total sample size give an expectation of three (3) concordant twin pairs:

> dbinom(2,size=2,prob=542/49814)*(49814/2)
[1] 2.948609

In contrast, the authors found 16 concordant twin pairs: 9 MZ and 7 DZ.
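To see how far 16 is from the null expectation, the same Poisson approximation can be applied here. This is only a sketch: it pools MZ and DZ pairs, and the null is "no familial clustering at all", so a small tail probability says that twins cluster, not by itself that the clustering is genetic rather than shared environment.

```r
# Null expectation for the 2011 longitudinal study, as computed above
expected_2011 <- dbinom(2, size = 2, prob = 542 / 49814) * (49814 / 2)

# Observed concordant pairs reported by the authors (9 MZ + 7 DZ)
observed_2011 <- 9 + 7

# Poisson approximation: probability of 16 or more concordant pairs by chance
p_16_or_more <- ppois(observed_2011 - 1, lambda = expected_2011,
                      lower.tail = FALSE)

# Fold excess of observed over expected concordant pairs
fold_excess <- observed_2011 / expected_2011

c(expected = expected_2011, fold_excess = fold_excess, p = p_16_or_more)
```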

This is all a pretty crude back-of-the-envelope not accounting for drawing without replacement, and I don’t fully understand Wirdefeldt’s numbers, particularly how sibships of > 2 were handled in the analysis. But it at least suggests that, despite a starting sample of ~50,000 people, the 2004 analysis may just not have had a ton of power to detect heritability.
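For what it's worth, the with-replacement shortcut does not seem to matter much at these sample sizes. A quick check for the 2004 numbers, treating each pair as two individuals drawn at random from the ascertained population (still an approximation, and still ignoring how larger sibships were handled):

```r
n_people <- 33780   # individuals ascertained in Wirdefeldt 2004
n_cases  <- 247     # individuals with "Possible PD"
n_pairs  <- 14082

# With replacement (the binomial shortcut used above)
p_both_with <- (n_cases / n_people)^2

# Without replacement: the second twin is drawn from the remaining people
p_both_without <- (n_cases / n_people) * ((n_cases - 1) / (n_people - 1))

expected_with    <- p_both_with * n_pairs
expected_without <- p_both_without * n_pairs

c(with_replacement = expected_with, without_replacement = expected_without)
```

The two expectations differ only in the third decimal place, so the binomial shortcut is not what limits this back-of-the-envelope calculation.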

I’m not the only one to reach this conclusion. Farrer’s paper pointed me to Simon 2002 (ft), a nice editorial and power calculation showing it’s very difficult to detect heritability of a phenotype that has low penetrance. (As I’ve argued here, late onset diseases are never completely penetrant, and even the Mendelian forms of PD – to say nothing of the mild risk factors – are no exception.) An important point of Simon’s analysis is that a given level of disease prevalence – say, 1% – can be achieved a number of different ways. A SNP (call it “z”) with allele frequency 50% that causes 2% risk results in the same prevalence as a SNP (call it “a”) with allele frequency of 1% that causes 55% risk (I believe the asymmetry is due to how he handles homozygotes, though the full code is not available). For these two examples, “z” requires an outrageous 14,000 twin pairs to have 80% power to detect heritability at p = .05, while “a” requires “only” about 400 twin pairs with at least one affected member. Even that latter figure is still twice the sample size in Tanner or Wirdefeldt’s studies.
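The intuition for why “z” is so much harder than “a” can be sketched with a toy single-locus model. This is my own simplification, not Simon's model: I treat the quoted frequencies as carrier frequencies, assume zero risk in non-carriers, and assume MZ twins are independent given carrier status. Under that simplification “a” works out to a prevalence of 0.55% rather than 1%, so only the qualitative contrast should be taken from it. The helper name mz_enrichment is hypothetical.

```r
# Toy model (NOT Simon 2002's): how much more often are both MZ twins
# affected than two unrelated people, for a single risk factor?
mz_enrichment <- function(carrier_freq, penetrance) {
  # Prevalence if only carriers are ever affected (no background risk)
  prevalence <- carrier_freq * penetrance
  # MZ twins share carrier status; given that, each is affected independently
  p_both_mz <- carrier_freq * penetrance^2
  # Two unrelated individuals are affected independently
  p_both_unrelated <- prevalence^2
  c(prevalence = prevalence,
    p_both_mz  = p_both_mz,
    fold       = p_both_mz / p_both_unrelated)
}

z <- mz_enrichment(carrier_freq = 0.50, penetrance = 0.02)  # common, weak
a <- mz_enrichment(carrier_freq = 0.01, penetrance = 0.55)  # rare, strong

rbind(z = z, a = a)
```

In this toy setup the excess concordance in MZ twins depends only on how rare the risk factor is (the fold column equals 1 / carrier_freq), so a common, weakly penetrant variant barely moves twin concordance away from the chance expectation even though it produces plenty of cases.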

Suppose this argument is right – that sample sizes in [Tanner 1999 (ft), Wirdefeldt 2004] were simply too small to reliably detect heritability.  How, then, did we ever manage to find genetic risk factors for PD?

The answer: by not only studying twins.  Twin studies are awesome for heritability because they’re so controlled for shared environment effects.  But an entire nation may contain only a couple hundred people who have both Parkinson’s disease and a twin, and that’s not a huge n.  Consider the review of PD GWAS in [Lill 2012 - Table 1]. The earliest Parkinson’s GWAS to find a result which still stands up to meta-analysis today was Pankratz 2009, which had 857 individuals with PD – about quadruple the number in Tanner 1999 (ft) or Wirdefeldt 2004.

conclusion

An intuitive assumption is that if there are genetic risk factors or genetic modifiers, then there must be heritability as well.  Nothing in this brief history of Parkinson’s disease contradicts this bit of common sense.

Perhaps a more pertinent question is whether we should demand evidence of heritability in order to justify a search for genetic risk factors.  Certainly, we all seem to assume so: GWAS papers – and, no doubt, grant applications – usually begin with an introduction citing the evidence for heritability of the particular trait being studied.  Is it reasonable that we demand this?  Or would it be just as well to say that it’s hard to have power to detect heritability, and therefore we should just be looking for genetic modifiers of any trait, regardless of reported heritability?

Perhaps.  But I’d argue the stronger case is simply that we shouldn’t rely solely on cross-sectional twin concordance studies, and certainly not without a power calculation to show what they’re capable of.  By the time Parkinson’s GWAS began to be done in the mid-2000s, there were multiple known Mendelian forms of PD, and multiple family-based heritability studies suggesting substantial heritability.  Considering this, it’s not at all surprising that we’ve since found several genomic regions associated with Parkinson’s risk.