Monday, October 29, 2012

Independence, independence, independence

Multiplying probabilities without any thought is a dangerous game.  Just look at this piece in Rolling Stone, referenced in this Guardian article.
...warmest May on record for the Northern Hemisphere – the 327th consecutive month in which the temperature of the entire globe exceeded the 20th-century average, the odds of which occurring by simple chance were $3.7 \times 10^{-99}$, a number considerably larger than the number of stars in the universe.

How does this number arise?  The chance of a random variable exceeding its median is one-half (that's the definition of a median).  If we treat the months as independent random variables, then the chance of 327 of them all exceeding their medians is $0.5 \times 0.5 \times\ldots$(327 times)$\ldots\times 0.5 = 0.5^{327}$, which is indeed $3.7 \times 10^{-99}$; very, very unlikely.
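The arithmetic is easy to check.  Here is a minimal Python sketch, assuming nothing beyond the independence and probability-one-half assumptions just described; the function name `prob_all_exceed` is my own invention for illustration.

```python
import math

def prob_all_exceed(n_months: int, p: float = 0.5) -> float:
    """Chance that n_months independent variables all exceed their medians.

    Assumes independence and that each exceeds its median with probability p.
    """
    return p ** n_months

p_independent = prob_all_exceed(327)

# Mantissa and exponent, to compare with the quoted figure.
print(f"{p_independent:.1e}")

# The same quantity on a log scale: 327 * log10(0.5).
print(327 * math.log10(0.5))
```

Working on the log scale is a handy habit with numbers this small, since it makes the order of magnitude explicit.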

Of course, we don't know what the medians of these random variables are; we simply use a historical average (from the 20th century) to estimate them.

Since the probability we got was astronomically small, we deduce that one of the assumptions we used to calculate it was false.  Rolling Stone and the Guardian conclude that the medians were not correctly estimated, because the world has got warmer over time.  This seems likely to be true, given all the scientific evidence we have for climate change.

But in reality the independence assumption is certainly false as well.  The temperatures in consecutive months will certainly be correlated with one another; for example, my limited understanding of El Nino is that it can lead to warmer weather.  So, in fact, the probability calculated above is completely meaningless.

Suppose instead, for example, that conditional on January being warmer than average, the chance that February is also warmer than average is $0.75$, and the same for other consecutive months.  The probability becomes $0.75^{327}$, which is a mere $1.4 \times 10^{-41}$.  This is still astronomically unlikely of course, but the two numbers differ by a factor of more than $10^{57}$, which is more astronomical still, and suggests that it wasn't sensible to present the probability in the first place.
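To see how sensitive the answer is to the dependence assumption, here is a rough Python sketch comparing the two calculations.  It assumes a simple first-order Markov chain (each month depends only on the one before), which is my own simplification and certainly not a real climate model; the helper `run_probability` and its parameter names are hypothetical.  Strictly, the first month has nothing to be conditional on, so the sketch also shows the variant where it gets probability one-half.

```python
import math

def run_probability(n_months: int, p_first: float, p_stay: float) -> float:
    """Chance that all n_months are above the median under a first-order
    Markov chain: the first month is above with probability p_first, and
    each later month is above, given the previous one was, with p_stay.
    """
    return p_first * p_stay ** (n_months - 1)

n = 327

# Independent months: every factor is one-half.
independent = run_probability(n, 0.5, 0.5)

# The calculation in the text: every factor is 0.75.
dependent = 0.75 ** n

# Variant: first month gets probability 0.5, the rest 0.75.
dependent_chain = run_probability(n, 0.5, 0.75)

# How many orders of magnitude separate the two answers?
orders_apart = math.log10(dependent / independent)

print(f"independent:      {independent:.1e}")
print(f"dependent:        {dependent:.1e}")
print(f"dependent (0.5 first): {dependent_chain:.1e}")
print(f"orders of magnitude apart: {orders_apart:.1f}")
```

Changing one made-up parameter from $0.5$ to $0.75$ moves the answer by dozens of orders of magnitude, and treating the first month differently barely registers by comparison.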

It's tempting to try to turn the very complicated but compelling evidence for something like climate change into a single number or graph in order to communicate a message.  That's a very worthy aim, but make sure the number you choose is at least vaguely correct.