Tuesday, July 3, 2012


Is it just me, or is everyone completely unable to explain what a confidence interval is?

The Higgs boson is back in the news again; here's a Nature News article discussing CERN's latest discovery, which is that
The data contained "a significant excess" of collision events at a mass of around 125 gigaelectronvolts...
Physicists have maintained that they will not announce the discovery of the Higgs until the signal surpasses 5 sigma, meaning that it has just a 0.00006% chance of being wrong. The ATLAS and CMS experiments are each seeing signals between 4.5 and 5 sigma, just a whisker away from a solid discovery claim.
This is wrong.
I also found incorrect explanations by MIT, the Guardian, and the New York Times, the latter of which (discussing a different discovery) said
The odds that the Fermilab bump were due to chance were only one in 550.
Now I'm not a physicist, but my understanding of what Nature is trying to say is something like "if there were no particle of a mass around 125 GeV, then the chance of a 5-sigma event is 0.00006%."

Confidence for beginners

The difference sounds quite subtle, but we'll see with some examples why it matters.  Think of the physics experiment as a very accurate diagnostic test - if the particle doesn't exist, it's very unlikely your experiment will give you a false positive.  It seems then to follow that if you get a positive result, it must be very unlikely that the particle doesn't exist.  This is both 'intuitively' true and fatally flawed, because it really depends upon the base rate, or how likely you thought it was that the particle existed before you did the experiment.

Let's think of another example (I gave this one in a previous post).  You go to the doctor, and he gives you a diagnostic test for a disease.  The test is very accurate, let's say 99%; this means that if you have the disease, 99% of the time the test will say so, and if you don't have the disease, 99% of the time the test will come up negative.  The test comes up positive - do you have the disease? (or rather what is the probability that you do?)

The point is that it will depend upon how likely you were to have the disease in the first place, that is before you took the test.  Pregnancy tests are very accurate, but if I took one and it came up positive I'd be pretty darn sure it was a false positive (not that pregnancy is a disease, *cough*).  If the disease is rare (say 1 in 1000 people have it), and you had no particular reason to think you might have the disease, then it's still about 10 times more likely that the test was wrong than that you have the disease.  In other words, the probability you have the disease is only about 1 in 11.  So no need to panic.
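That 1-in-11 figure follows directly from Bayes' rule. Here's a quick sanity check using the numbers above (the function name is mine, chosen for illustration):

```python
# Bayes' rule for the diagnostic-test example in the post:
# 1-in-1000 prevalence, 99% sensitivity, 99% specificity.

def posterior_disease(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    p_pos_given_disease = sensitivity      # true positive rate
    p_pos_given_healthy = 1 - specificity  # false positive rate
    numerator = p_pos_given_disease * prior
    denominator = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / denominator

p = posterior_disease(prior=0.001, sensitivity=0.99, specificity=0.99)
print(round(p, 3))  # about 0.09, i.e. roughly 1 in 11
```

Even with a 99%-accurate test, the rarity of the disease means a positive result is still about ten times more likely to be a false alarm than a true diagnosis.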

Reasoning wrongly as above is known as the base rate fallacy, and the maths can be formalised with Bayes' formula.
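For the disease example, writing D for "has the disease" and + for "tests positive", Bayes' formula reads:

\[
P(D \mid +) \;=\; \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
\]

The prior P(D) in the numerator is exactly the base rate that the fallacy ignores.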

Other issues

Going back to the Higgs Boson example, things are a bit different.  Physicists actually had good reason to believe that Higgs exists before the Large Hadron Collider actually started work, so we might put the prior probability of its existence at, say, 10%.  (Perhaps one could do a study of how many Physicists' predictions turn out to be true.)  Then if we observed a 5-sigma event we could apply Bayes' rule and find the posterior probability of the particle's non-existence, which would be about 0.0005% (one zero fewer than before).
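The 0.0005% figure can be checked with the same Bayes' rule calculation. One extra assumption is needed that the post leaves implicit: if the particle exists, the experiment is almost certain to see the signal (power close to 1).

```python
# Posterior probability that the Higgs does NOT exist after a
# 5-sigma signal, under the post's illustrative 10% prior that
# it exists.  Power ~ 1 is my assumption, not from the post.

p_signal_given_no_higgs = 6e-7  # 5 sigma, i.e. 0.00006%
p_signal_given_higgs = 1.0      # assumed: signal near-certain if Higgs exists
prior_higgs = 0.10

p_no_higgs_given_signal = (
    p_signal_given_no_higgs * (1 - prior_higgs)
    / (p_signal_given_no_higgs * (1 - prior_higgs)
       + p_signal_given_higgs * prior_higgs)
)
print(p_no_higgs_given_signal)  # about 5.4e-06, i.e. ~0.0005%
```

Note the direction of the update: a generous prior makes the posterior probability of non-existence roughly ten times larger than the raw 0.00006%, not smaller.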

Big deal you might say, this still seems like pretty solid evidence.  There is a further problem, which is that these probabilities assume the experimenters have dealt with all possible other explanations for the evidence, critically including all sources of systematic error in their measurements.  I'm sure they've set the experiment up very carefully indeed, but the probability of an error of this kind is still likely to be much higher than 0.0005% (I'd guess between 1% and 10%), so we really shouldn't say that the probability of an error is 0.00006%.

More generally, medical studies are typically performed to a significance level of 5%.  This means that, for a positive result, roughly speaking: if the drug were useless (or worse), there would be less than a 5% chance of seeing results as good as (or better than) those we saw.  This isn't perfect, so it should be obvious that some positive results will later turn out to be wrong, and indeed they do.  You might think it would be precisely 5% of results, if you hadn't read the rest of my post.

In fact, most such studies are false positives.  This is partly because of experimenter bias, but it's also just because most drugs put forward for trial don't work.  Hence the relative rate of false positives to true positives is quite high.  Things are almost certainly much worse in the social sciences.
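A back-of-the-envelope calculation shows how this happens. The numbers below are my own illustrative assumptions, not figures from the post: suppose only 5% of candidate drugs truly work, trials have 80% power, and the significance level is 5%.

```python
# Why most positive trials can be false positives: an illustrative
# calculation.  All three rates below are assumptions for the sake
# of the example.

n_trials = 1000
frac_effective = 0.05  # assumed: 5% of candidate drugs actually work
power = 0.80           # assumed: chance a trial detects a working drug
alpha = 0.05           # significance level

true_positives = n_trials * frac_effective * power           # 40
false_positives = n_trials * (1 - frac_effective) * alpha    # 47.5

fdr = false_positives / (true_positives + false_positives)
print(round(fdr, 2))  # ~0.54: over half the positive results are false
```

Because the pool of ineffective drugs is so much larger than the pool of effective ones, even a small 5% false-positive rate applied to the former swamps the true positives from the latter.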
