The kin-cohort method for estimating disease penetrance

I have heard people – especially lay people, but sometimes scientists too – speak of disease penetrance (the proportion of people with a genetic mutation who get a disease) as though it were a single numerical quantity. In fact, for any adult-onset disorder, penetrance is a function of age, and you can only get the disease if you avoid dying of something else first. As explored a bit in this post, the concepts of penetrance and age of onset are deeply intertwined. Too often, we study Mendelian diseases by ascertaining people with a disease phenotype, and then looking at their genotypes and, at best, those of a few family members. Ascertaining on phenotype in this way makes penetrance at any given age appear higher than it is. For very rare diseases, sequencing datasets are only just now beginning to approach the size where we could think of ascertaining on genotype and then asking questions about phenotype.

Recently a peer reviewer directed me to consider the concept of a kin-cohort study, pioneered by [Wacholder 1998 (ft)], and whether it could be used to assess age-dependent penetrance in genetic prion disease.

Wacholder wanted to obtain an estimate of penetrance as a function of age for breast and ovarian cancer associated with any of three BRCA1 and BRCA2 mutations prevalent in Ashkenazi Jews. Previous studies had provided estimates, but usually by studying families with a large number of cases – the very families in which linkage analysis had been performed to discover the mutations in the first place. These families are biased for high penetrance. To get a less biased estimate, Wacholder went out and ascertained 5,300 volunteers from the Jewish community in and around Washington, D.C. Each volunteer was asked to list their first-degree relatives (siblings, parents, offspring – people with 50% identity-by-descent) and give the age and cancer/vital status for each relative, living or deceased. Then each volunteer was genotyped for the three mutations in question.

At first, this study design sounded profoundly odd to me. You’ve only genotyped the volunteers, who by definition are at least still alive (though of course, a few might be cancer survivors), so how are you gaining any information about disease penetrance? The analytical strategy is brilliant: you just deal with genotypes probabilistically. For a mutation with known carrier frequency p, where p is small, the allele frequency is about p/2. A first-degree relative of a carrier (“carrier kin”) is then a carrier with probability of roughly 1/2 + p/4: 1/2 is the probability of sharing the known carrier’s mutant allele, and in the other half of cases the relative’s unshared allele still happens to be mutant with probability p/2. A first-degree relative of a non-carrier (“noncarrier kin”) is a carrier with probability of roughly p/2, since the allele shared with the non-carrier is known to be normal and only the unshared allele can be mutant. (These are the parent-offspring figures; siblings differ slightly in the small terms, but either way the numbers are about 1/2 versus nearly 0.)

Thus, you can compare the survival curves for how long the carrier kin and noncarrier kin live cancer-free. When you make the survival curves, of course you censor unaffected relatives at their present age (if alive and well) or their age at death of intercurrent illness, and you get to do this with confidence that you’ve ascertained 100% of the censored individuals, since after all, people do know their immediate relatives quite well. The details of the math are a bit more complicated than this, but conceptually, you just need to figure that the difference between the two survival curves is attributable to the roughly one-half difference in carrier frequency between the two groups, (1/2 + p/4) – p/2, and then you can back into the penetrance for carriers.
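To make the "back into the penetrance" step concrete, here is a minimal Python sketch of the mixture idea at a single age. It is only an illustration of the reasoning above, not Wacholder's actual estimator. It assumes Hardy-Weinberg proportions, considers only parent-offspring pairs (so the carrier fractions are computed exactly rather than by a small-p approximation), and uses made-up cumulative risks. The function names and all of the numbers are hypothetical.

```python
"""Toy kin-cohort calculation: mix, then un-mix, cumulative disease risk."""


def allele_frequency(carrier_freq):
    """Allele frequency q implied by carrier frequency p, assuming
    Hardy-Weinberg proportions: p = 1 - (1 - q)**2."""
    return 1.0 - (1.0 - carrier_freq) ** 0.5


def kin_carrier_probs(carrier_freq):
    """Probability that a parent or child of a genotyped volunteer is a
    carrier, given the volunteer is (a) a carrier or (b) a noncarrier."""
    q = allele_frequency(carrier_freq)
    # Among carriers, a small fraction q / (2 - q) are homozygous.
    frac_hom = q / (2.0 - q)
    # Heterozygous volunteer: the relative shares the mutant allele half the
    # time; otherwise the relative's unshared allele is mutant with chance q.
    via_het = 0.5 + 0.5 * q
    # Homozygous volunteer: the shared allele is always mutant.
    carrier_kin = (1.0 - frac_hom) * via_het + frac_hom * 1.0
    # Noncarrier volunteer: the shared allele is normal, so only the
    # unshared allele (mutant with chance q) can make the relative a carrier.
    noncarrier_kin = q
    return carrier_kin, noncarrier_kin


def unmix_risk(risk_carrier_kin, risk_noncarrier_kin, carrier_freq):
    """Solve the two-equation mixture for the cumulative risk (by some fixed
    age) in true carriers and true noncarriers."""
    a1, a0 = kin_carrier_probs(carrier_freq)
    # Each observed risk is a weighted mix: a * carriers + (1 - a) * noncarriers.
    excess = (risk_carrier_kin - risk_noncarrier_kin) / (a1 - a0)
    risk_noncarriers = risk_noncarrier_kin - a0 * excess
    risk_carriers = risk_noncarriers + excess
    return risk_carriers, risk_noncarriers


# Usage with invented numbers: choose "true" risks, mix them the way the
# relatives would be mixed, then check that un-mixing recovers them.
p = 0.02                                   # hypothetical carrier frequency
a1, a0 = kin_carrier_probs(p)
true_carrier, true_noncarrier = 0.50, 0.10  # hypothetical risks by some age
seen_in_carrier_kin = a1 * true_carrier + (1 - a1) * true_noncarrier
seen_in_noncarrier_kin = a0 * true_carrier + (1 - a0) * true_noncarrier
recovered = unmix_risk(seen_in_carrier_kin, seen_in_noncarrier_kin, p)
print("carrier fraction among kin:", a1, a0)
print("risk observed in kin:", seen_in_carrier_kin, seen_in_noncarrier_kin)
print("recovered carrier / noncarrier risk:", recovered)
```

In a real analysis the two observed risks would come from censored survival curves estimated at every age, and the carrier fractions would have to account for the mix of siblings, parents and offspring. The sketch only shows why a difference of about one half in carrier frequency is enough to recover the carrier curve.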

It’s such a cool approach. Could it work for prion disease? We certainly do need a new approach. The one really thorough life table analysis of age-dependent penetrance in genetic prion disease [Spudich 1995] suffered from exactly the problem Wacholder was seeking to fix: individuals (N = 57) were ascertained from families that were ascertained precisely for having multiple cases of Creutzfeldt-Jakob disease, inherently biasing the analysis towards the hardest-hit families.

Yet to my eye, the vanishing rarity of genetic prion disease precludes the kin-cohort approach from being used here. The BRCA mutations in question have a combined carrier frequency of 2-3% in Ashkenazi Jews, so by ascertaining 5,300 individuals you can expect to find over 100 carriers. PRNP E200K, by contrast, has never been seen in any public dataset (1000 Genomes, ESP, etc.) to date, so its allele frequency in those study populations is very unlikely to be more than 0.1%. And if the mutation is anywhere close to fully penetrant, then the incidence of E200K CJD demands that the allele frequency is far, far lower.*

*CJD has an incidence of 1 case per million population per year, and only a fraction of cases are genetic, and of those only a fraction are E200K (say 5% of all cases overall). So, back-of-the-envelope, integrating over a life expectancy of, say, 63 years [Schelzke 2012], the global allele frequency assuming full penetrance would have to be on the order of 1e-6 × 5% × 63 ≈ 3e-6. Even if the mutation is, say, 100x more common in founder populations (Slovaks, Italians, Libyan Jews) than it is globally, that still has us finding maybe one or two carriers in a study of Wacholder’s size.
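The footnote's arithmetic can be written out as a few lines of Python. This is just the same back-of-the-envelope estimate, with one added assumption: the resulting frequency is treated as a per-person probability of carrying the mutation, and the number of carriers found is treated as Poisson. The variable names are mine, and the inputs are the round numbers from the footnote.

```python
import math

# Inputs taken from the footnote (all rough, order-of-magnitude figures).
cjd_incidence = 1e-6        # CJD cases per person per year
fraction_e200k = 0.05       # share of all CJD cases that are E200K
life_expectancy = 63        # years over which the incidence is integrated
founder_enrichment = 100    # "100x more common in founder populations"
cohort_size = 5300          # volunteers in a study of Wacholder's size

# Lifetime risk of E200K CJD; under full penetrance this is roughly the
# frequency of the mutation in the population.
global_frequency = cjd_incidence * fraction_e200k * life_expectancy

# Assumption: treat the enriched frequency as a per-person carrier probability.
founder_frequency = global_frequency * founder_enrichment
expected_carriers = cohort_size * founder_frequency

# Assumption: the number of carriers found is approximately Poisson.
prob_at_least_one = 1.0 - math.exp(-expected_carriers)

print(f"global frequency:        {global_frequency:.2e}")
print(f"founder frequency:       {founder_frequency:.2e}")
print(f"expected carriers:       {expected_carriers:.2f}")
print(f"P(at least one carrier): {prob_at_least_one:.2f}")
```

Even with the generous 100x enrichment, the expected number of carriers stays in the low single digits, which is far too few to compare survival curves between carrier kin and noncarrier kin.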

If ascertaining random volunteers won’t do, then we are left with the possibility of ascertaining individuals from families known to carry the disease, meaning that we’re subject to the upward bias on our penetrance estimate that Wacholder was seeking to avoid. Indeed, the original paper recognized this as a limitation:

For many diseases, it could also be very difficult to recruit participants who do not have a family member with the disease, resulting in an over-estimate of risk due to volunteer effects.

Still, the idea that you can garner information on a genetic disease even from probabilistic genotypes rather than known genotypes is new to me, and I want to keep pondering whether this way of thinking could be useful in some way.