When we were both teenagers, my sister touched off an epistolary feud with Marilyn vos Savant by writing to her in Parade magazine to ask about the boy girl paradox.  The first of her letters is here; the issue led to a series of columns with letters from my sister and other readers – I’ll add scans of the newspaper clippings later if I can track them down.  update 2013-04-04: here they are! [PDF]

For several weeks in 1996, middle-aged men across America were looking my 17-year-old sister up in the White Pages and calling her at home, not because they were sketchballs but because they wanted to talk about math.  It was exciting times in the Minikel household.

For those who aren’t familiar, the boy girl paradox is as follows:

1. I have two children.  The older one is a boy.  What’s the probability they are both boys?
2. I have two children.  One is a boy.  What’s the probability they are both boys?

The correct answer to #1 is 1/2, as agreed by everyone who’s ever been asked.  The answer to #2 is the source of the apparent controversy, which Marilyn vos Savant never resolved to all of her readers’ satisfaction.  Is it 1/2 because, duh, the ‘other’ child’s gender doesn’t depend on the ‘one’ we hear about, or 1/3 because, out of the four possible two-child family configurations GG, BG, GB, and BB, only 3 are consistent with the ‘one is a boy’ statement and only 1 of those 3 contains two boys?

You know what I love?  The internet.  The collective genius of all of humanity editing the Wikipedia article on the boy girl paradox (along with some scholarly works, e.g. Bar-Hillel & Falk 1982, Grinstead & Snell 2006) has finally given a beautifully correct and thorough answer to this problem.

tl;dr: It depends on how you ascertain families for your ‘study’.  Imagine these two ways:

• Ascertain a randomized set of families with two children. Exclude the ones that don’t have at least one boy.  Ask how many of the remaining families have two boys.  Answer: 1/3.
• Ascertain boys.  Exclude those who don’t have exactly one sibling.  Ask what fraction have a brother.  Answer: 1/2.

The answers differ because you sample BB families at double the rate in your second study design.  These two ‘study designs’ can correspond to all sorts of real-life scenarios – did you see the boy?  Was the parent equally likely to have phrased the whole question in terms of girls if they had girls?  And so on.
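If you'd rather count than argue, here is a minimal Python sketch of the two study designs.  It assumes the four family configurations are equally likely, and the function names are my own, made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely two-child families, written (older, younger).
families = list(product("BG", repeat=2))

def design_families():
    """Sample families, keep those with at least one boy, ask what fraction are BB."""
    kept = [f for f in families if "B" in f]
    return Fraction(sum(f == ("B", "B") for f in kept), len(kept))

def design_boys():
    """Sample boys (a BB family contributes two of them), ask what fraction have a brother."""
    boys = [(f, i) for f in families for i, child in enumerate(f) if child == "B"]
    with_brother = sum(1 for f, i in boys if f[1 - i] == "B")
    return Fraction(with_brother, len(boys))

print(design_families())  # 1/3
print(design_boys())      # 1/2
```

The only difference between the two functions is the unit being sampled: families in the first, boys in the second, which is where the double counting of BB families comes from.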

All this controversy, then, really stems from an ambiguously phrased question.  Therefore we should all be able to agree on an answer to a similar question that is put unambiguously and corresponds to a real-life situation.

The human prion protein gene, PRNP, has only one really common polymorphism: codon 129 can be either methionine (M) or valine (V) (dbSNP: rs1799990).  In people of European descent, M alleles are about 67% and V alleles are about 33% [1000 Genomes browser].  Although dbSNP has this SNP annotated as ‘pathogenic’, it’s not a cause of familial disease.  However, genotype at this SNP is absolutely central to the genetics of prion diseases [reviewed in Mastrianni 2010 (ft)].  MV heterozygosity at this SNP is associated with reduced risk of sporadic Creutzfeldt-Jakob disease (sCJD) [Palmer 1991, Alperovitch 1999, Mead 2012], later onset and/or longer incubation times in some genetic prion diseases [Poulter 1992, Mead 2006, Webb 2009], and apparently absolute resistance to variant Creutzfeldt-Jakob disease (vCJD): 100% of people who died of vCJD during Britain’s mad cow epidemic were 129MM homozygotes [see Mead 2009].  Interesting aside: PRNP codon 129 heterozygosity has also been reported to be associated with increased risk for a different disease, primary progressive aphasia [Li 2005].

Of relevance to our discussion today is the fact that codon 129 determines the disease phenotype caused by the D178N mutation.  D178N in cis with 129M causes fatal familial insomnia (FFI); in cis with 129V, it causes familial Creutzfeldt-Jakob disease (fCJD) [Goldfarb 1992].  Actually, there’s a phenotypic spectrum between the two diseases [Zarranz 2005], but the codon 129 haplotype on which the mutation falls is still considered extremely meaningful, and an FFI allele is defined to be a PRNP allele with the D178N mutation and M at codon 129.

With that background, let’s get to our real-life boy girl paradox.  Kong 2003, still to my knowledge the most recent review of fatal familial insomnia case reports, counts the FFI patients featured in published case reports with known codon 129 genotype, age of onset and disease duration.  There are 57 of them: 40 MM homozygotes and 17 MV heterozygotes (70% and 30% respectively).

At first glance, you might reason as follows: the (European) population is ~45% MM, ~44% MV and ~11% VV, so MV individuals represent about 50% of the people who carry at least one M allele (the potential FFI patients), yet only 30% of reported FFI patients – looks like depletion and possibly evidence of incomplete penetrance.
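Those genotype percentages are just the allele frequencies squared and cross-multiplied.  Here is a small sketch of the arithmetic, assuming Hardy-Weinberg proportions and the roughly 67% M allele frequency quoted above; the function name is mine.

```python
def hardy_weinberg(p_m):
    """Genotype frequencies at codon 129 given the M allele frequency,
    assuming Hardy-Weinberg proportions (random mating, no selection)."""
    p_v = 1.0 - p_m
    return {"MM": p_m ** 2, "MV": 2 * p_m * p_v, "VV": p_v ** 2}

genotypes = hardy_weinberg(0.67)  # 0.67 is the approximate European M allele frequency
for name, freq in genotypes.items():
    print(f"{name}: {freq:.1%}")

# Share of MV among people with at least one M allele, i.e. the
# "at first glance" denominator of potential FFI patients.
mv_share = genotypes["MV"] / (genotypes["MM"] + genotypes["MV"])
print(f"MV share among M carriers: {mv_share:.1%}")
```

This reproduces the roughly 45% / 44% / 11% split and the "about 50%" figure used in the first-glance argument.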

Let’s be formal about this.  The question is: are these data consistent with the null hypothesis that mutant FFI alleles exhibit complete disease penetrance regardless of the codon 129 genotype in trans, i.e. on the normal allele?  Under that null hypothesis, what fraction of FFI patients are expected to be MM vs. MV?  In other words, if someone has one allele with 129M that has a D178N mutation, what is the probability that their other allele is also 129M?

If we make, for now, the very incorrect but dramatically simplifying assumption that all patients who die of FFI will be reported in the literature, then it is Mendel himself and not medical academia that’s doing the ‘ascertaining’ of these reported cases.  How does Mendel do that ascertaining?  Does he look at all people, throw out the 0.33² ≈ 11% of them that are VV, and then assign the FFI disease at random to the remainder, who are split almost perfectly half and half between MM and MV?  If that were the case, then the expected proportions of MM and MV genotypes among FFI patients would be 50/50, and the observed proportions would be quite depleted for MV patients, suggesting incomplete penetrance in heterozygotes.  But of course that isn’t how Mendel operates.  Instead, he assigns the D178N mutation to alleles, not people, thus double-sampling the MM homozygotes, bringing the MM and MV proportions among FFI patients back up to 2/3 and 1/3, which is wholly consistent with Kong’s meta-observation of 40 and 17 patients respectively.
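To put numbers on the two ways Mendel might do his ascertaining, here is a rough sketch.  It assumes Hardy-Weinberg genotype proportions and an M allele frequency of 0.67, and the two function names are hypothetical labels for the two designs, not anything from the literature.

```python
def mm_share_sampling_people(p_m):
    """Mutation assigned to a random person who has at least one M allele:
    the MM share is simply MM / (MM + MV)."""
    mm = p_m ** 2
    mv = 2 * p_m * (1 - p_m)
    return mm / (mm + mv)

def mm_share_sampling_alleles(p_m):
    """Mutation assigned to a random M allele: an MM person carries two
    M alleles and so is sampled at twice the rate of an MV person."""
    mm = p_m ** 2
    mv = 2 * p_m * (1 - p_m)
    return 2 * mm / (2 * mm + mv)

p_m = 0.67  # approximate M allele frequency in Europeans (assumption)
print(f"people-sampling:  MM share = {mm_share_sampling_people(p_m):.3f}")
print(f"allele-sampling:  MM share = {mm_share_sampling_alleles(p_m):.3f}")
print(f"observed in Kong: MM share = {40 / 57:.3f}")
```

Under allele sampling the expression simplifies algebraically to p_m itself: the other allele is just a random draw from the population, so it is M about two thirds of the time.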

Is the normal allele’s codon 129 genotype independent of the mutated allele’s codon 129 genotype?  I believe it is. Webb 2009 asserts, and I agree, that an affected child’s normal allele is independent of their affected parent’s normal allele, because the normal allele is inherited from the unaffected parent.  The normal allele could still be correlated with the mutant allele if there were assortative mating, but if you look at the 1000 Genomes browser, this genotype is distributed pretty exquisitely according to Hardy-Weinberg equilibrium.

That’s assuming familial transmission – what about de novo D178N mutations?  After all, all of the mutated haplotypes present in the FFI families had to start somewhere.  But here, too, your odds of having a de novo D178N mutation on a 129M haplotype are doubled if you’re a 129MM homozygote compared to a 129MV heterozygote, meaning that again we’ll expect 2/3 of de novo FFI cases to be in 129MM homozygotes.

I’ve also left out any mention of population stratification, which you could imagine being important because 129V alleles are virtually absent in 1000 Genomes’ (East) Asian populations.  However, all but two of Kong’s FFI families are of European descent.  We could also consider the fact that clearly not all FFI cases are reported in the literature.  Particularly as the genetics and pathology of the disease are now well-established, case reports might be increasingly likely to only include cases that are exceptional in some way.  But it isn’t clear what sort of bias that would introduce.  FFI (in contrast to other genetic prion diseases as mentioned above) has not exhibited any significant difference in age of onset between MM and MV genotypes [Kong 2003].

If all this sounds too convincing, you might be thinking “This is trivial.  It isn’t even a boy girl paradox.  This is more like question #1, where you tell me the older child is a boy, since you told me the FFI allele has the D178N mutation.”  But remember the “at first glance” way of thinking from above, which I’ll rephrase as: “You can’t have FFI without at least one 129M allele.  89% of Europeans have at least one 129M allele – 45% MM and 44% MV.  70% of reported FFI patients are MM and 30% are MV.  Therefore MV are depleted in FFI.”  This logic is wrong, because FFI doesn’t require “at least one 129M allele”, it specifically requires 129M in cis with D178N.  So yeah, maybe it is like the “older child is a boy” question.  But it’s always good to think through these things.

I conclude that the figure of 40 MM vs. 17 MV FFI patients reported by Kong is entirely consistent with the null hypothesis that FFI exhibits no difference in penetrance according to codon 129 genotype.
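For anyone who wants to check that conclusion numerically, here is a minimal sketch of an exact two-sided binomial test using only the standard library.  It treats the 57 patients as independent draws, which is an assumption (many of them are relatives), and the expected MM shares plugged in are the 0.67 allele-sampling figure and, for contrast, the 50/50 first-glance figure.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed one under success probability p."""
    observed = binom_pmf(k, n, p)
    tail = sum(
        pmf
        for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
        if pmf <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, tail)

n_patients, n_mm = 57, 40  # counts from Kong 2003 as quoted above

p_allele_model = binom_test_two_sided(n_mm, n_patients, 0.67)
p_people_model = binom_test_two_sided(n_mm, n_patients, 0.50)
print(f"p-value vs. 67% MM expected: {p_allele_model:.3f}")
print(f"p-value vs. 50% MM expected: {p_people_model:.3f}")
```

A large p-value against the 67% expectation and a small one against 50% is what the argument above predicts; treat the exact numbers as illustrative, given the independence assumption.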

Anyone disagree?