<h2>It's A Stat Life</h2>
Thoughts on uses and abuses of statistics in the news, plus a few educational pieces. By Robin.<br />
<br />
<h3>B is for Bayesian Inference (9 December 2015)</h3>
Just a note to say that I guest blogged for the University of Oxford's <a href="https://www.maths.ox.ac.uk/about-us/engagement/oxford-mathematics-alphabet/b-bayesian-inference">Mathematics Alphabet</a> this week, on B is for Bayesian Inference. Do check it out!<br />
<br />
<h3>Ebola hunting (16 October 2014)</h3>
Here's a quick prediction for what happens now that <a href="http://www.bbc.co.uk/news/uk-29616724">the UK has started screening people for Ebola</a> at Heathrow (soon to be extended to other airports and terminals). They'll find a lot of people with minor stomach bugs of some sort.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwBhgzyhlCqPs3z6Jk7mOchEvfykcYHWmmEvbuUn4LAbU8rSFQVLbntAUQk56m5xFMU7y8yc9YsMMpDq1GM61iYNBI3uWyYsMOqDergf-4TNEIBz4itWqvYlXLcT4XQ2DUqzHdZ1plp64/s1600/Ebola_virus_virion.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="Ebola virus" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwBhgzyhlCqPs3z6Jk7mOchEvfykcYHWmmEvbuUn4LAbU8rSFQVLbntAUQk56m5xFMU7y8yc9YsMMpDq1GM61iYNBI3uWyYsMOqDergf-4TNEIBz4itWqvYlXLcT4XQ2DUqzHdZ1plp64/s1600/Ebola_virus_virion.jpg" height="183" title="" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Image: CDC</td></tr>
</tbody></table>
<br />
There are <a href="http://en.wikipedia.org/wiki/Ebola_virus_epidemic_in_West_Africa#Timeline_of_cases_and_deaths">about 150</a> new Ebola cases reported each day at the moment, so even if we're generous and assume the true figure is 300, and that symptoms last for 2 weeks, only about 4,000 people in total have it at any given time. The combined population of the worst affected areas (Guinea, Liberia and Sierra Leone) is about 20 million, so roughly 1 in every 5,000 people is affected. Fewer than this will be in a state fit to travel (I imagine they would have to be in the early stages of the disease).<br />
<br />
Gastroenteritis is endemic pretty much everywhere, and is worse in the developing world because of poor sanitation and sewage treatment. There are around <a href="http://en.wikipedia.org/wiki/Gastroenteritis#Epidemiology">3 to 5 billion cases</a> per year, so on the order of half the people in the world get it each year (in reality, young children are more vulnerable than adults). If it lasts for 4 days on average (it can be much worse, of course), that suggests that on a given day about 1 in 200 people would have it. <br />
<br />
From this a fairly generous estimate is that someone screening will see at least 25 people with gastroenteritis for every person who has Ebola. Since many of the symptoms are pretty similar, I predict that, if anything happens at all, a lot of people will get sent to hospital without much cause.<br />
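If you want to check the arithmetic, here is the whole back-of-the-envelope calculation as a short Python sketch; every input below is one of the rough guesses from the text above, not real surveillance data.<br />
<pre>
ebola_cases_per_day = 300      # generous doubling of the ~150 reported
ebola_duration_days = 14
population = 20_000_000        # Guinea, Liberia and Sierra Leone combined

ebola_prev = ebola_cases_per_day * ebola_duration_days / population

gastro_attack_rate = 0.5       # around half the world gets it each year
gastro_duration_days = 4
gastro_prev = gastro_attack_rate * gastro_duration_days / 365

print(f"Ebola:  about 1 in {round(1 / ebola_prev):,}")
print(f"Gastro: about 1 in {round(1 / gastro_prev):,}")
print(f"Gastro cases per Ebola case: about {gastro_prev / ebola_prev:.0f}")
</pre>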
<br />
Note that, in reality, international travellers are much richer than most residents of the west African countries affected; consequently they have better access to sanitation and healthcare when in those countries, and will have a substantially lower risk of infection with Ebola as a result (though this also applies to gastroenteritis).<br />
<br />
The moral is that <a href="http://www.badscience.net/2006/12/crystal-balls-and-positive-predictive-values/">hunting for rare things</a> is a pretty thankless task.<br />
<div>
<br /></div>
<h3>What price a preposition? (15 October 2014)</h3>
The headline to <a href="http://www.theguardian.com/politics/2014/oct/14/ukip-30-seats-2015-general-election">this article</a> made me aware of a pretty big ambiguity in its contents: there's a big difference between saying that "in 30 seats UKIP are likely to win", and "UKIP are likely to win 30 seats". <div>
<br /></div>
<div>
To see this, consider the statement: "Chelsea are likely to win every game they play this season". Interpreted one way, you're just saying that Chelsea are favourites in each game they play (plausible). Another meaning would be that Chelsea are likely to win <i>all</i> the games they play, which is exceptionally unlikely (no-one has ever come close).</div>
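<div>
To put an illustrative number on the second reading: if, hypothetically, Chelsea were 70% favourites in each of their 38 league games, and results were independent, then the chance of them winning every single one would be<br />
$$<br />
0.7^{38} \approx 1.3 \times 10^{-6},<br />
$$<br />
or about one in a million. Being the likely winner of each game is a world away from being likely to win them all.</div>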
<div>
<br /></div>
<div>
Of course the results in different seats in general elections are strongly correlated, much more so than football matches. But I'd still like to know what exactly the Guardian means.<br />
<br /></div>
<h3>Can a gamble ever be right or wrong? (12 October 2014)</h3>
A couple of weeks back I was feeling fairly smug, having put a bit of money on a 'No' vote in the Scottish referendum. I then placed a few quid on UKIP to win the <a href="http://en.wikipedia.org/wiki/Heywood_and_Middleton_by-election,_2014">Heywood and Middleton by-election</a> last Thursday, which they didn't. But the latter result was quite close, so I felt that somehow it had been a 'good bet' to have made. So what makes a person right or wrong to have placed a bet in the first place? (Mathematically, that is; leave your morals at your home page.)<br />
<br />
As a statistician I'd say that winning or losing a particular bet does not in itself indicate whether or not the bet was <i>good value</i>. A bet is good value if you expect (on average) to make money from it. For example, suppose someone offers to let you bet that England will win their next match at odds of 50% (2 in decimal odds, evens in traditional parlance); the bet is good value if <i>you</i> think the chance of that happening is greater than 50%. If the true odds are 75%, then you expect to make<br />
$$<br />
0.75 \times £1 \times \left(\frac{1}{0.5}-1 \right) - £1 \times 0.25 = £0.50<br />
$$<br />
for every £1 you stake. Sounds good! But after the match is over, you're left with either double your money or nothing. <br />
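If you prefer code to algebra, here's a minimal sketch of the same expected-value calculation (assuming decimal odds throughout):<br />
<pre>
def expected_profit(true_prob, decimal_odds, stake=1.0):
    # A winning back bet pays stake * (odds - 1); a losing one costs the stake.
    return true_prob * stake * (decimal_odds - 1) - (1 - true_prob) * stake

# The example from the text: true chance 75%, offered decimal odds of 2.
print(expected_profit(0.75, 2.0))   # 0.5, i.e. 50p per pound staked
</pre>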
<br />
I got odds of about 5.5 (18% or 9 to 2) on UKIP winning in Heywood. This seemed like a bit of a long-shot, so when I saw that the election had indeed been so close this seemed like evidence that the true chance of them winning <i>had been</i> higher than 18%. This is obviously slightly illogical, since by the time I had that information the chance of them winning had dropped to zero.<br />
<br />
So is there an objective way to say whether a particular bet was good value or not? For a one-off event like the Scottish referendum, or even the by-election, the answer is pretty clearly no. There are many expert <i>opinions</i> available about which outcomes were plausible (from both before and after the event) but these are ultimately subjective.<br />
<br />
Purely <a href="http://itsastatlife.blogspot.co.uk/2012/05/what-is-probability.html">subjective probabilities</a> are of limited practical use in this context: if I 'expect' to make money, but I'm also an idiot, then this doesn't help me much. Such probabilities are most useful when they have some sensible long-run interpretation: e.g. if I roll a die 6,000 times, it will show a six about 1,000 times, hence we'd say the probability is $\frac{1}{6}$. Referenda have no such interpretation without resorting to rather contorted abstract ideas, because there aren't any other events which are the same as (or usually even vaguely similar to) a particular plebiscite: you can try to compare it to other elections, but you'll never know whether this one was somehow 'different'. But surely probabilities for these events aren't totally meaningless, are they?<br />
<br />
For a probability forecaster like a weather service or a bookie, you can check whether they are 'calibrated': take a large number of predictions of (say) 60% rain, and check whether approximately 60% of them were followed by rain. The events don't even have to be related: you can throw in any events given a 60% probability and see if they're jointly calibrated. <br />
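Here's a toy simulation of how such a check might work; the forecaster below is constructed to understate every probability by ten percentage points, and binning the predictions by stated probability picks this up:<br />
<pre>
import numpy as np

rng = np.random.default_rng(1)

# The forecaster states a probability; reality follows a higher one.
stated = rng.choice([0.2, 0.4, 0.6, 0.8], size=20_000)
actual = np.clip(stated + 0.1, 0, 1)       # built-in miscalibration
happened = actual > rng.random(stated.size)

for p in (0.2, 0.4, 0.6, 0.8):
    obs = happened[stated == p].mean()
    print(f"stated {p:.0%}: observed {obs:.1%}")
</pre>
A well calibrated forecaster would have each observed frequency close to the stated one; here every bin comes out about ten points too high.<br />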
<br />
But for any individual prediction, there isn't much you can say about its quality from purely statistical evidence. What about <a href="http://www.bbc.co.uk/sport/0/football/28968148">a pundit</a>? Well, I'd say a pundit only adds value if you'll make money by betting on their predictions: i.e. they're telling you something that the bookie's odds don't. If they just tell you the most likely outcome, this isn't very useful: in a sporting fixture it's usually clear who the favourite is.<br />
<br />
For a gambler of course, the proof is in the winning: a good gambler will make money <i>eventually</i>, and a bad one will lose it: so far I'm up. My next wager is that the Conservatives will win the most seats at the next UK general election... I'd be quite happy to lose that bet.<br />
<br />
(If you're interested in my reasoning for the bets above: I bet on the Scottish referendum just after a YouGov opinion poll showed 'Yes' ahead, guessing that people would over-interpret this single poll: I got odds of 1.4 (71% or 2 to 5). I bet on the Heywood and Middleton by-election after reading that Labour were nervous about the result: the odds would have been much higher if I'd waited until nearer the day.)<br />
<br />
<h3>Frog kissing. Or, why do we never learn? (26 January 2014)</h3>
A friend pointed me to <a href="http://jezebel.com/expect-to-kiss-an-average-of-15-frogs-before-you-get-a-1493288810">this story</a> about the average number of men (or rather frogs) a woman has to date before finding her 'Prince Charming' (apparently it's 15). Let's leave aside for now how they came up with the number: I've no doubt the methodology conforms to the most rigorous standards that we've all come to expect from the internet. I thought that this paragraph might give some people hope:<br />
<blockquote class="tr_bq">
If you've made out with 11 men, great news! A new survey suggests you're only 4 away from Prince Charming. On average, anyway.</blockquote>
It's now my duty to crush that hope.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtJLMhW1pSuDI6sQ0EjrHvDD17U_FdeIf9VYswxOW_QvEh56ii01GNGu6ynJHaTBvtNoKQDPO-DtyDgTdwDfwxZ5ks3J6Rxc7Aisa2m8AgM_750LWXqS9sSQMjFsUpqSXLiGlH8MsrF9c/s1600/HRH_Prince_Charles_79.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="A prince" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtJLMhW1pSuDI6sQ0EjrHvDD17U_FdeIf9VYswxOW_QvEh56ii01GNGu6ynJHaTBvtNoKQDPO-DtyDgTdwDfwxZ5ks3J6Rxc7Aisa2m8AgM_750LWXqS9sSQMjFsUpqSXLiGlH8MsrF9c/s1600/HRH_Prince_Charles_79.jpg" height="320" title="" width="228" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Distinguishing princes and frogs can be <br />
difficult at first glance [Photo: Allen Warren]</td></tr>
</tbody></table>
<br />
The bad news is that (even if the rest of it is correct) the survey absolutely doesn't say that. If you've been in 11 disastrous (but possibly ultimately affirming) relationships, you've guaranteed that the total number you end up in is at least 11. But for other women it could have been fewer than that (it happens, apparently), so the possibility that you stayed with your childhood sweetheart is included in the average which arrives at 15. Once you exclude that possibility, the average can only increase.<br />
It's the same with life expectancy: every year you live, your <i>total</i> life expectancy will increase simply because you didn't die that year (way to go!). When you're young and healthy your chances of dying this year aren't high, so it's no great surprise when you survive; your life expectancy therefore remains roughly the same.<br />
<br />
If you're 25, male, and live in England and Wales, your remaining life expectancy is 54.71 years, as estimated by the ONS (<a href="http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-325699">data from 2010-12</a>), for a total of 79.71 glorious years. If you're 26, your remaining life expectancy is 53.74, giving a slightly increased total of 79.74. This becomes much more pronounced for older people: if you're 85 and female, you have a remaining life expectancy of 6.84 years, for a total of 91.84, but by the time you're 86, this has increased to 92.36, a gain of more than half a year, and 10 years more than at birth.<br />
<br />
In fact, it's quite possible for the <i>remaining</i> life expectancy to increase, let alone the total. Imagine a population in which half the people die as infants (under 5), but the remainder live to be 75. When born, your life expectancy is about $\frac{1}{2} \times 5 + \frac{1}{2} \times 75 = 40$: not great. But if you make it to age 5, you're guaranteed to live another 70 years. Sadly, high infant mortality means that in some parts of the world this is not an abstract concept.<br />
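That toy example is easy to verify directly; here's a tiny sketch with a simulated population of 1,000 (using the text's approximation that the infants die at age 5 exactly):<br />
<pre>
import numpy as np

# Toy population from the text: half die at (roughly) age 5, half at 75.
ages_at_death = np.array([5.0] * 500 + [75.0] * 500)

def remaining_life_expectancy(age):
    survivors = ages_at_death[ages_at_death > age]
    return survivors.mean() - age

print(remaining_life_expectancy(0))   # 40.0 at birth
print(remaining_life_expectancy(5))   # 70.0 once past infancy
</pre>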
<br />
<h4>
Kermit's Revenge</h4>
One particularly interesting case is when the distribution is <i>memoryless</i>. Let's get back to the dating: suppose your approach is to just meet people at random, learn nothing from each experience, and you don't become more or less picky over time. So we assume that, each time, you have a fixed probability $p = \frac{1}{15}$ of ending up in a relationship that works. <br />
<br />
The chance that it takes exactly $k$ relationships before you find a non-amphibian is $(1-p)^{k-1} p$, because the first $k-1$ have to all be unsuccessful, and the $k$th one a charm. If you can sum a geometric series (hopefully my first year students are reading this), then you can show that the total number of people you have to meet before settling down is, on average, $\frac{1}{p} = 15$. <br />
<br />
But if you've already dated 11 people? Well, we can use conditional probability: the chance of you having to date exactly $11+k$ frogs and princes is<br />
$$<br />
(1-p)^{11+k-1} p,<br />
$$<br />
and the chance that you date 11 people without finding a keeper is $(1-p)^{11}$. So the conditional probability of having to date exactly $k$ <i>more</i> guys, <i>given</i> that you've already put up with 11 toads, is<br />
$$<br />
\frac{P(X = k+11)}{P(X > 11)} = \frac{(1-p)^{11+k-1} p}{(1-p)^{11}} = (1-p)^{k-1} p.<br />
$$<br />
But this is just the chance of having to meet $k$ guys in the first place! This is the memoryless property. It doesn't matter how many times you've tried and failed, you're still at square one until the prince comes along.<br />
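You needn't take the algebra on trust: a quick simulation of the geometric distribution, with $p = 1/15$ as assumed above, makes the same point.<br />
<pre>
import numpy as np

rng = np.random.default_rng(0)
p = 1 / 15

# Number of relationships up to and including the first success.
fresh = rng.geometric(p, size=200_000)

# Condition on having already had 11 failures, and count what remains.
still_waiting = fresh[fresh > 11] - 11

print(fresh.mean())          # about 15
print(still_waiting.mean())  # also about 15: memorylessness in action
</pre>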
<br />
Of course, we might like to think that we learn from our mistakes, grow as people, and make better decisions as we get older, but the empirical evidence available to me doesn't bear this out.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-65585883957545195082013-11-22T09:00:00.000+00:002013-11-27T09:17:38.366+00:00Just Instrument It(With apologies to macro-economists.)<br />
<br />
The amount of time you spend in education predicts your earnings quite strongly, and it's generally agreed (<a href="http://www.telegraph.co.uk/culture/tvandradio/simon-cowell/10452341/Simon-Cowell-is-stupid-for-suggesting-children-dont-have-to-work-hard-at-school-Michael-Gove-has-said.html">Simon Cowell aside</a>) that if you want to do well in life, staying in education for longer is a good idea.<br />
<br />
But how much effect does it have? We could look at a survey of people's incomes and group them by education level, but this doesn't give a <i>causal effect</i>. It might tell us that people who have a masters degree earn more during their lifetime, on average, than those who don't. This could be because people from wealthy backgrounds can afford the tuition for a masters degree, and also have pals in the city who can help them get a big salary afterwards. Or perhaps people who do masters degrees work harder than the rest. We can't easily tell the difference: this problem is called confounding.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEia1QNq8tvhzqrlyCLUs9nHAL6X9mBlm67SaXfrVQn-WhD8hiFQfoER9_TmfEz4nfNmKtPgA7l8N3OUSFp96_SSYWU2JvzkSl0Z4styPwE9CjPckoNQyxO3zSDNYSxqCFOQv9c3gOeONVA/s1600/IV1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEia1QNq8tvhzqrlyCLUs9nHAL6X9mBlm67SaXfrVQn-WhD8hiFQfoER9_TmfEz4nfNmKtPgA7l8N3OUSFp96_SSYWU2JvzkSl0Z4styPwE9CjPckoNQyxO3zSDNYSxqCFOQv9c3gOeONVA/s400/IV1.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">We can't tell whether time spent in education <i>causes</i> earnings <br />
to increase, or whether there's a third factor which affects both.</td></tr>
</tbody></table>
<br />
Suppose Sarah, 16, is considering whether to drop out of school, or stay for another year. She doesn't care about whether other people who choose to stay in school turn out to be rich; what she wants to know is: if she stays, how much more will she earn than if she leaves (and will it cover the counselling?). If the apparent benefit of education is really down to having rich friends, then Sarah staying in school won't necessarily help at all. Knowing the best course specifically for Sarah is pretty difficult, but we might be able to answer this: if an <i>average</i> person stays in school an extra year, how much more will they earn than if they leave?<br />
<br />
To answer this we could simply do a randomized controlled trial: take 2,000 people, toss a coin for each, force those who get heads to stay in school until 17 and the rest to go straight into work, and see who fares best 25 years later. Regrettably, indentured labour (or education) of subjects is not an option open to researchers in social policy, so it is not possible to perform this experiment.<br />
<br />
<h4>
Instruments</h4>
<br />
So what can we do? Well, what we really want is to find a <i>third</i> quantity which affects whether or not people decide to stay in school, but isn't related to later income. This is a bit like outsourcing our coin tosses to a third party who doesn't suffer from our ethical qualms. We call this quantity an <b>instrument</b>.<br />
<br />
What sort of quantity would work? Two researchers from the National Bureau of Economic Research, Joshua Angrist and Alan Krueger, thought they'd found one: the time of the year in which you're born. In many US states it's a legal requirement to be educated until a certain age, say 16. If you're born in September, this means that at the very beginning of 10th grade, you can drop out; but if your birthday is in August, you have to wait right until the end of the grade to leave, so you get an extra year's schooling.<br />
<br />
Looking at US census data, <a href="http://qje.oxfordjournals.org/content/106/4/979.abstract">Angrist and Krueger</a> (1991) found a very weak ($R^2 \approx 0.0001$) but significant association between the time of year individuals were born, and the number of years education they received; the later in the school year you're born, the more education you get. But we can think of this as just a coin toss: the time of year in which you're born is not determined by your family background or other factors relevant to your earnings, it just happens.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNTMgu_f9l6U_DNARxBlyKWNjuBGG7PrGAd3DtTLlPvAGHQyIfwzchcLy4zWmJDBxFArwsynYDv-kx07sJ3eB_UBER0F_7o0nb_Gb4pHNHr-udK0LIe6ZYrUeHSOsdrXdi_NQftq6RHCc/s1600/IV2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNTMgu_f9l6U_DNARxBlyKWNjuBGG7PrGAd3DtTLlPvAGHQyIfwzchcLy4zWmJDBxFArwsynYDv-kx07sJ3eB_UBER0F_7o0nb_Gb4pHNHr-udK0LIe6ZYrUeHSOsdrXdi_NQftq6RHCc/s400/IV2.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">We assume that the time of your birth is <b>not</b> affected by your background, <br />
but it does have a small effect on how long you stay in school for. </td></tr>
</tbody></table>
<br />
So if we find that people born in September earn less than the group born in August, then the difference must be because of the extra year of education (some of them) received! <br />
<br />
There are four key <b>assumptions</b> we need to make this work:<br />
<ol>
<li>the time of birth (the <b>instrument</b>) is not confounded with education (the <b>intermediate variable</b>) or earnings (the <b>outcome</b>): in other words, the instrument is, for all intents and purposes, randomized and not affected by, for example, your background;</li>
<li>the time of birth doesn't <i>directly</i> affect earnings, except through the intermediate variable: so being born later might mean that you get an extra year of schooling, which in turn means you earn more, but the time of birth can't affect earnings in any other way;</li>
<li>the time of birth has an effect on education: in our case, being born later increases the amount of schooling you get;</li>
<li>the effects are <b>linear</b> (as in linear regression).</li>
</ol>
Then because we assume that the time of birth is randomized (or at least, not confounded), any correlation between birth and education (say $\alpha$) <i>is</i> causal (it's as if we did a randomized trial), and also the correlation between birth and earnings (say $\beta$) <i>is</i> causal.<br />
<br />
Furthermore, because we believe that all the effect of birth on earnings is <i>through</i> education, this means that the causal effect of birth on earnings ($\beta$) should just be the combination (i.e. the product) of the causal effect of birth on education ($\alpha$), and the causal effect of education on earnings (which is what we want to know, say $\rho$). That is: $\beta = \alpha \times \rho$, where we can measure $\beta$ and $\alpha$ from our <i>observational</i> experiment. Then we just use:<br />
$$<br />
\rho = \frac{\beta}{\alpha}<br />
$$<br />
and hey presto! $\rho$ is the causal effect we wanted to find (and will be different in general from the ordinary correlation between education and earnings). This is the method of <b>instrumental variables</b> (IV), and has been around <a href="http://faculty.arts.ubc.ca/ftrebbi/research/ST_JEP_2003.pdf">since the 1920s</a>.<br />
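To see the recipe in action, here's a sketch on simulated data; every number below, including the strength of the hidden confounder, is invented purely for illustration. The naive regression is badly biased, while the ratio $\beta/\alpha$ (sometimes called the Wald estimator) recovers the true effect.<br />
<pre>
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

U = rng.normal(size=n)                      # unobserved confounder (background)
Z = rng.binomial(1, 0.5, size=n)            # instrument: as-if randomized
educ = 0.2 * Z + U + rng.normal(size=n)     # alpha = 0.2: Z nudges education
earn = 0.5 * educ + U + rng.normal(size=n)  # true causal effect rho = 0.5

alpha = np.cov(Z, educ)[0, 1] / np.var(Z, ddof=1)  # effect of Z on education
beta = np.cov(Z, earn)[0, 1] / np.var(Z, ddof=1)   # effect of Z on earnings
naive = np.cov(educ, earn)[0, 1] / np.var(educ, ddof=1)

print(f"naive regression: {naive:.2f}   (biased upwards by U)")
print(f"IV estimate:      {beta / alpha:.2f}   (close to the true 0.5)")
</pre>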
<br />
Note the importance of assumption 3 here: if the instrument and the intermediate effect are not related then $\alpha = 0$, and <a href="http://math-fail.com/2010/06/never-divide-by-zero-ever.html">division by zero</a> leads to unpleasant stomach cramps and a skin rash.<br />
<br />
<h4>
"Dude, that is weak"</h4>
<br />
So Angrist and Krueger divided the effect of birth on earnings by the effect of birth on education, and obtained an estimate of the <i>causal</i> effect of an extra year's schooling on earnings. (I'm simplifying quite a lot here.) They find that additional compulsory years of schooling do lead to higher wages later in life, because of the additional education received. The exact estimates vary for different cohorts, but are of the order of an 8% increase in wages for each additional year of education. Which all seems plausible, and in fact it turns out to be similar to the observed correlation.<br />
<br />
Unfortunately, this approach carries a fatal flaw in this context, as shown in this rather nice 1995 <a href="http://typo3-8442.rrz.uni-koeln.de/fileadmin/wiso_fak/cmr/pdf/Jaeger_Publication_List/jasav90n430.pdf">JASA paper</a> by Bound, Jaeger and Baker. The problem is related to the dividing-by-zero warning I gave earlier. A <i>perfect</i> instrument will control the intermediate variable exactly, just like an actual randomized trial; in this case we don't need to use instrumental variables methods. If the instrument doesn't affect the intermediate <i>at all</i>, then we can't do anything, which is why we need assumption 3. <br />
<br />
But if the instrument only <i>very weakly</i> affects the intermediate (and therefore the outcome), it's almost as bad as not affecting it at all: we have to divide a tiny number ($\beta$) by another tiny number ($\alpha$), and hope we get a reasonable result. As every computer scientist knows, this is very bad, because it means that if we get our estimate for $\alpha$ or $\beta$ wrong by even a tiny amount, then the division will give completely the wrong answer.<br />
<br />
This means that if the model is mis-specified by even a tiny amount, then the instrumental variables estimator will be totally misleading. This is the problem of <b>weak instruments</b>. <br />
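Shrinking the instrument's effect in the simulated sketch above makes the problem vivid: with $\alpha$ close to zero, repeated samples of the same size give wildly different answers, sign included.<br />
<pre>
import numpy as np

# Same invented world as before, but with a very weak instrument.
estimates = []
for seed in range(20):
    rng = np.random.default_rng(seed)
    n = 10_000
    U = rng.normal(size=n)
    Z = rng.binomial(1, 0.5, size=n)
    educ = 0.002 * Z + U + rng.normal(size=n)   # alpha is now tiny
    earn = 0.5 * educ + U + rng.normal(size=n)
    a = np.cov(Z, educ)[0, 1] / np.var(Z, ddof=1)
    b = np.cov(Z, earn)[0, 1] / np.var(Z, ddof=1)
    estimates.append(b / a)

print(min(estimates), max(estimates))   # all over the place
</pre>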
<br />
We're assuming that the effects are linear (so each extra month's schooling leads to the same number of extra dollars earned, for example) and at best this is only an approximation to the truth. The world just isn't very linear in practice. Worse, assumptions 1 and 2 are essentially untestable in the IV set up, so (without other information) it's quite possible that our model <i>is</i> mis-specified.<br />
<br />
<h4>
Still Instrumental</h4>
Although one should be very careful about drawing strong conclusions from studies using instruments, they are still very useful, and a devilishly clever idea. Many questions in economics just can't be empirically tested by randomized trials: can you imagine asking Mark Carney to flip a coin before deciding whether or not to raise interest rates, just so we can see the causal effect over time? Instead we must resort to instruments or other <b>natural experiments</b>.<br />
<br />
Single studies using instruments are pretty unreliable; but they can still contribute to a wider 'body of proof' about something: we're pretty confident that staying in school increases your earnings, because it's been demonstrated in lots of studies in different countries and time periods. It is a robust finding.<br />
<br />
There is another rather promising application of instrumental variables, one which has gained quite a bit of attention recently. <b>Mendelian randomization</b> uses our genes as instruments, and has been used to prove, for example, that alcohol raises your blood pressure. But I'll save that delight for another time.<br />
<br />
<h4>
Reference</h4>
<a href="https://sites.google.com/site/andrewjohnstoneconomics/my-advice/instrumental-variables">This nice page</a> by Andrew Johnston was useful.<br />
<br />
<br />
<h3>But is it causal? Defining causality (14 November 2013)</h3>
As I alluded to in my last post, defining what it means for $X$ to <i>cause</i> $Y$ is no simple task. It is not an idea that can be defined in purely probabilistic terms, because it says something about the mechanisms underlying the system we are studying, and what will happen if we interfere with that system in some way.<br />
<br />
Consider the example given at the end of the <a href="http://itsastatlife.blogspot.co.uk/2013/11/if-correlation-isnt-causation-then-what.html">last post</a>. The <a href="http://www.dailymail.co.uk/health/article-2427724/How-short-nap-raise-risk-diabetes-high-blood-pressure-high-cholesterol.html">headline</a> was:<br />
<blockquote class="tr_bq">
<span style="font-size: large;">How a short nap can raise the risk of diabetes</span></blockquote>
The implication of this is that the risk of diabetes increases <i>because of</i> the nap. But what does this mean? <br />
Let's start with a more clear cut example. Suppose I drop a glass bottle onto a concrete floor and it breaks: we might say that the bottle broke <i>because</i> I dropped it. In doing so we pack in a couple of ideas:<br />
<br />
<ol>
<li>I dropped the bottle, and it broke; </li>
<li>if I hadn't dropped the bottle, then it wouldn't have broken.</li>
</ol>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNO0-18uv__HUe2EbToWt-o_6NCpUu5gwldGD10Qi4m8JMcZD8TNGYVZoObv04SQxdi8-ztkGbo1tXWugk7w6VDTk5GdS8YytE6bd-yH298ZWkdp54I-2LZdgqvjzafxajR9u_q1qysM4/s1600/494584099_507ff665ed.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="Oh cruel, cruel world" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNO0-18uv__HUe2EbToWt-o_6NCpUu5gwldGD10Qi4m8JMcZD8TNGYVZoObv04SQxdi8-ztkGbo1tXWugk7w6VDTk5GdS8YytE6bd-yH298ZWkdp54I-2LZdgqvjzafxajR9u_q1qysM4/s1600/494584099_507ff665ed.jpg" /></a></td></tr>
<tr align="right"><td class="tr-caption"><span style="font-size: xx-small;"><span style="font-size: x-small;">Oh cruel, cruel world.</span> (Photo by <a href="http://www.flickr.com/photos/hoanglongvn/">LongPHAM</a>, Creative Commons)</span></td></tr>
</tbody></table>
The only difference between the two scenarios is whether or not I dropped the bottle, and yet in one the bottle breaks and in the other it doesn't. Therefore the action of dropping the bottle directly caused it to break.<br />
<br />
<a href="http://en.wikipedia.org/wiki/David_Hume#Causation">Unless you're a philosopher</a> all this seems fairly uncontroversial, because we have a very good understanding of the actual mechanism by which the bottle smashes: gravity pulls the bottle down so that it hits the floor, and in doing so the bottle's kinetic energy is transferred to the brittle glass. We can be pretty confident when we say that <i>if I hadn't dropped the bottle</i>, it wouldn't have broken, even though this is an event we did not observe, because we can model the physics and have a lot of experience to show that bottles are pretty stable when left to their own devices. If they weren't then the Jesus College wine cellar would be an awful mess.<br />
<br />
Now a trickier example. Suppose that at the end of term I get a cold; I might say that "it's because I've been working hard and not sleeping enough, my immune system is a bit weaker". If we follow the previous example, then I must mean that "If I hadn't been working so hard, then I wouldn't have caught a cold." This is far from clear: people sometimes get colds even if they haven't been working hard. <br />
<br />
It might well be the case that working hard and not sleeping enough makes it <i>more likely</i> that one will get a cold (let's assume for the sake of argument that it does). But it's also possible that I would still have become ill even if I'd taken it easier: perhaps I caught it from my housemate, so less work means more time at home potentially exposed to the virus. On the other hand, I might have caught the cold from a student during a particular meeting, so if I'd had a different meeting at that time I might have worked just as hard and not got a cold. It therefore seems much less clear what we mean by the idea that me working hard and not sleeping has <i>caused</i> me to become ill.<br />
<br />
<h4>
Probabilistic Ideas of Causality</h4>
The important point in the previous example is the idea that working hard makes it more likely that one would get a cold. We could base a notion of causality on this: say that $A$<b> is a cause of </b>$B$ if the probability of $B$ occurring is greater when $A$ happens than when $A$ doesn't happen. <br />
<br />
In the case of the bottle, dropping it increased the probability of it breaking from essentially zero (no chance of breaking) to essentially one (will definitely break). In this sense it is an extreme example. On the other hand, working hard might increase my chances of getting a cold, but probably only by a very small amount (an increase from 0.1 to 0.15 would be a very high estimate in my view). In this case then, there's still a reasonable chance I would have caught a cold anyway, so it seems a very strong statement to have said that I became ill <i>because</i> I was working hard.<br />
<br />
In a large group of people though, the idea becomes clearer. Suppose I take 1000 people, and give them all a relaxing couple of months, with a sensible workload. I might expect approximately 100 of them to get colds; on the other hand, if I make them all work very hard for those two months, my estimate suggests that around 150 will get colds. This notion of causality says that more people will get colds if they work hard than if those same people take it easy. <br />
<br />
Applied to the Daily Mail's headline, the implication is that if we stopped people from taking naps, then fewer of them would get diabetes. This is a much stronger statement than simply noting the association between the two. If, as seems rather more plausible, having an underlying illness makes you more likely to feel tired and therefore to nap, then acting to stop people napping would not have the desired effect at all. People would just be more tired, and it would have no impact whatever on their chances of developing diabetes.<br />
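A small simulation makes this concrete. The numbers below are pure invention: napping is constructed to do nothing at all, yet an underlying illness drives both napping and diabetes, so the observational comparison finds a large 'effect' which randomized assignment does not.<br />
<pre>
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

ill = rng.binomial(1, 0.2, size=n)            # hidden underlying illness
naps = rng.binomial(1, 0.1 + 0.5 * ill)       # the ill nap much more
diabetes = rng.binomial(1, 0.05 + 0.2 * ill)  # illness raises diabetes risk

# Observational comparison: nappers look much worse off.
print(diabetes[naps == 1].mean(), diabetes[naps == 0].mean())

# Randomized 'trial': assign naps by coin toss, and the gap vanishes.
naps_rct = rng.binomial(1, 0.5, size=n)
print(diabetes[naps_rct == 1].mean(), diabetes[naps_rct == 0].mean())
</pre>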
<br />
The notion common to most ideas of probabilistic causality is that the likelihood of an effect, $Y$, is changed when I <i>intervene</i> to change the cause, $X$. By just passively observing, we might see some association between $X$ and $Y$; but if $X$<i> causes </i>$Y$, then when I perform an action which somehow changes the value of $X$, it will also change the probability of $Y$ occurring. <br />
<br />
Smoking <i>causes</i> lung cancer, in the sense that if everyone stopped smoking we would see less lung cancer. On the other hand, if I gave everyone a vaccine which stopped people from developing lung cancer, it wouldn't stop people from smoking (perhaps the reverse, in fact!).<br /><br />
<h4>Potential Outcomes</h4>
The above ideas all refer to probabilities and chances which occur at the population level: I can't say whether or not my specific cold was actually caused by working hard, only that more people have colds when they work hard in a general sense. So can we define a notion of causality at an individual level?<br />
<br />
Imagine that two worlds 'exist': the world we observed, in which I worked hard and then got a cold, and a second<b> counterfactual </b>world which was essentially the same, except that I took it easy. Let $Y_{\text{hard}}$ and $Y_{\text{easy}}$ be variables which denote whether or not I get sick under each scenario (1 for ill, 0 for not ill): we know $Y_{\text{hard}} = 1$ because I actually did work hard, and I did get a cold. On the other hand we don't know anything about $Y_{\text{easy}}$: this is an outcome associated with the counterfactual world in which I took it easy, so we can't observe it.<br />
<br />
We can consider the pair of values $(Y_{\text{hard}}, Y_{\text{easy}})$ as <b>potential outcomes</b> for an individual, just by assuming that the value of $Y_{\text{easy}}$ <i>exists</i> and is well defined (even though it is not observable, even in principle). We can use the values of this pair of numbers to divide the population into groups of people based on how they individually respond in the two different worlds. This idea (due to Donald Rubin) is both powerful and controversial, and deserves a future post of its own; I will say no more now for the sake of brevity, other than that the idea is mathematically very useful, whether you find it completely natural or philosophically challenging.<br />
<br />
<h4>Testing Causal Questions</h4>
Now that we've defined some ideas of probabilistic causality, the real question is this: how can we answer causal questions? As I've already suggested, the 'gold standard' method is to do a randomised trial. We take a large group of people, and randomly divide them into two groups. The first group are all treated with $X=1$ and the second with $X=0$, but otherwise they are treated the same; if the number of people with $Y=1$ differs between the two groups (in a statistically and scientifically significant way), then there is a causal effect of $X$ on $Y$ in the sense defined above.<br />
<br />
If we <i>can't</i> do this for some reason, whether practical or ethical, then life is rather harder. Next week I will discuss one popular approach to this, known as <b>instrumental variables</b>.
<br />
<h3>If correlation isn't causation, then what is? (8 November 2013)</h3>
I've started to get a little tired of writing entirely about media shenanigans, and in all likelihood so, dear readers, have you. So today I'm going to provide one of the 'educational pieces' alluded to in this blog's description; specifically I'm going to start talking about <i>causal inference</i>, which is the driving force behind the research I'm lucky enough to be paid to do.<br />
<br />
We hear a lot in statistics classes (if you're inclined to go to them) and places like this blog, that "<a href="http://xkcd.com/552/">correlation is not causation</a>." Packaged within this pithy but stern warning is some very sound advice: just because you observe a relationship between two things doesn't tell you anything about the mechanism (if any) which creates that relationship.
<br />
For example, you might not be surprised to learn that Guardian readers are, on average, <a href="http://www.guardian.co.uk/advertising/demographic-profile-of-guardian-readers">better educated</a> than the general UK population. You are unlikely to conclude from this that reading the Guardian makes you better educated, because this is absurd (unless you have a rather narrow view of what 'educated' means). Perhaps instead you imagine that better educated people are (generally) more interested in the things the Guardian has to say, or perhaps that people from a middle-class background are both more likely to be well educated and to read the Guardian. <br />
<br />
We often perform this sort of inference unconsciously; in the real world we are confronted with patterns and associations all the time and, usually being unable to experiment, we must infer the mechanism just from observing. In some respects we are remarkably good at this, but this is often because we have learned a lot about the world around us through experience.<br />
<br />
It can also trip us up: for centuries in the Western world it was believed that bad smells caused disease (think of malaria's etymology). This does not seem unreasonable, since there is a strong association between foul smelling material and illness, but the conclusion is quite incorrect. In particular, it suggests that the appropriate remedy is to cover up the bad smell with a good one, rather than washing one's hands more often. It wasn't until the late 19th century when Louis Pasteur and Robert Koch showed that the bad smell and the diseases were <i>both</i> caused by bacteria, that our modern understanding developed.<br />
<br />
To help us understand both the importance and the difficulty of causal inference, I will present a little piece on the history of understanding the relationship between smoking and lung cancer.<br />
<br />
<h4>
Fisher's Cancer Gene</h4>
<br />
Sir Ronald Aylmer <a href="http://en.wikipedia.org/wiki/Ronald_Fisher">Fisher</a> was a highly distinguished and brilliant statistician, and is responsible for the foundation of many modern statistical methods; if you've studied statistics you'll already know this, because he has <a href="http://en.wikipedia.org/wiki/Fisher_information">quite a few</a> <a href="http://en.wikipedia.org/wiki/Fisher%27s_principle">important things</a> <a href="http://en.wikipedia.org/wiki/Fisher%27s_exact_test">named after him</a>. He was also a difficult and stubborn man who entertained feuds with <a href="http://en.wikipedia.org/wiki/Jerzy_Neyman">other statisticians</a>, and who rejected one of the greatest triumphs of modern epidemiology: proving that smoking causes lung cancer.<br />
<br />
Fisher was himself a heavy smoker, and this may have influenced his views. By the 1950s, large observational data sets were available to show that people who smoked had a much higher risk of developing lung cancer. Given what we now know, and the seemingly obvious potential problems caused by putting foreign substances into the body, it seems difficult to understand how such seemingly strong evidence could be ignored (on the other hand, people used to sell <a href="http://en.wikipedia.org/wiki/Doramad_Radioactive_Toothpaste">radioactive toothpaste</a>).<br />
<br />
We will represent the idea that some quantity $X$ influences another $Y$ in a causal way using an arrow: $X \rightarrow Y$ (I will try to define this more precisely in another post, though doing so is not straightforward: just ask a philosopher). The modern consensus view, then, is represented by this diagram:<br />
<br />
$$\text{Smoking} \longrightarrow \text{Cancer}$$<br />
However, Fisher argued that it was not possible to rule out other explanations for the <i>association</i> between smoking and lung cancer. Was it not possible, for example, that people who are already in the process of developing lung cancer (though unaware of it) might be more likely to use cigarettes as a palliative treatment, in order to relieve some related irritation of the lungs? In such a case the proper conclusion would be that the cancer causes the smoking.<br />
$$\text{Smoking} \longleftarrow \text{Cancer}$$<br />
Or perhaps there is some other factor which we have not considered, and which both makes smoking and lung cancer more likely. Social factors are quite plausible in this context: perhaps people who live in the city are both more likely to smoke and also to inhale factory soot. However, such hypotheses can be checked by gathering additional data. Fisher himself proposed a genetic explanation: perhaps there is a gene which provides its carriers with a strong craving to smoke, and which also increases their risk of developing cancer. That is to say:<br />
$$\text{Smoking} \longleftarrow \text{Gene} \longrightarrow \text{Cancer}$$<br />
The key difference between the three hypotheses is this: if we forced or convinced everyone to stop smoking, in the first example it would reduce the number of deaths from lung cancer, whereas in the other two it would make no difference whatever because the cause is something else. (This gives a hint of how one might try to define causality more formally.)<br />
<br />
Logically Fisher is correct, an observed association can be explained in any of these ways. In order to be certain that smoking causes lung cancer, we would need to do a randomised controlled trial: simply take 1000 people, divide them into two groups at random. Force one group to do 40-a-day, ban the rest from smoking at all, and wait 20 years to see what happens. Since any other mechanism which might cause participants to smoke (or not) has been broken, any remaining association <i>must</i> be causal. Sadly though, this experiment didn't make it past the ethics committee.<br />
<br />
So how was Fisher proved wrong? Ultimately it came down to the sheer weight of evidence: Fisher's genetic explanation is highly implausible, because it would require two genetic effects (one on cancer and one on smoking) to be absurdly strong; genetic effects can also be controlled for (to some extent) by studying people who are related. The ethics committees of the past <i>did</i> allow animal experiments, which demonstrated that smoking in (for example) beagles certainly does cause lung cancer. And external circumstances may be used as a proxy for randomised trials: a higher cigarette tax reduces the amount people smoke, but shouldn't have any other effect on lung cancer (this is called an 'instrument', beloved of economists in search of causal conclusions).<br />
<br />
<h4>
The last great hope for causality</h4>
<br />
Causal inference is difficult, but it is not impossible (even without randomised trials). It is certainly true that any causal conclusions drawn from observational studies should be treated with great caution (here's a <a href="http://www.dailymail.co.uk/health/article-2427724/How-short-nap-raise-risk-diabetes-high-blood-pressure-high-cholesterol.html">particularly terrible example</a> from our old friend the DM, I couldn't resist). But the defeatist (or rather obstructive) attitude of Fisher is not the answer. Causal inference is a fast developing field which shows that it <i>is</i> possible to obtain evidence of causal effects even without ideal study designs, and there is a huge potential benefit for many fields in discovering how.<br />
<br />
I hope to share some of those methods with you in future posts.<br />
<br />
<br />
<br />
<b>Reference </b><br />
<br />
Stolley, P.D. (1991). <a href="http://aje.oxfordjournals.org/content/133/5/416">When Genius Errs: R. A. Fisher and the Lung Cancer Controversy</a>. <i>Am. J. Epidemiol.</i> 133(5): 416-425.<br />
<br />
<h3>The Monty Hall Problem (13 September 2013)</h3>
It's great to see pieces on familiar mathematics puzzles in the mass media, so I was pleased to see <a href="http://www.bbc.co.uk/news/magazine-24045598">this article</a> about the Monty Hall problem on the BBC News website. However, I'm moved to write a little piece about this, because I think it uses a rather careless analogy.<br />
<br />
A brief summary of the problem. You're in a game show, and there are three doors; behind one of the doors is a prize, behind the other two is nothing (or possibly worse, a goat). The procedure is as follows <i>every time the game is played</i>:<br />
<ol>
<li>you choose a door, say number 1 (but don't see what's behind it);</li>
<li>the game show host, <i>who knows where the prize is</i>, opens one of the other two doors, say number 2, and reveals nothing;</li>
<li>the host gives you the opportunity to either stick with your choice of door 1, or move to door 3;</li>
<li>your choice is revealed.</li>
</ol>
<br />
The question is - should you stick, or switch, or does it make no difference?<br />
<br />
The BBC video (with Marcus du Sautoy and Alan Davies) is very clear, and so is most of the article. However the beginning includes a reference to Deal or No Deal: the reason I don't like this is that in Deal or No Deal <i>the banker doesn't know where the money is</i>. As we will see, this point is absolutely critical to getting the right answer.<br />
<br />
<br />
I won't go into too much detail here (there are lots of ways to explain this, have a look at the article), but the answer is that you should switch: to see this, note that the probability you chose the prize door in step 1 is clearly 1/3; in step 2, at some level you learn nothing, because the same thing always happens (the host opens a door to reveal no prize). So the chance that you have the correct door is <b>still 1/3</b>, and therefore the chance that the remaining door has the prize must be 2/3. It's not easy to understand at first, but it's all to do with the inevitability of what happens in step 2.<br />
<br />
Suppose instead that the host <i>doesn't know</i> where the prize is, and just opens a door at random. There's a 1/3 chance you originally chose the prize, in which case the host will definitely reveal nothing. But there's a 2/3 chance you do not select the prize door, in which case there is a 50% chance the host will choose it instead. If he does this, one assumes that the game is over and you lose. <br />
<br />
Let's call the prize door A, and the other B and C, and write AB to mean that you choose door A, and the host chooses door B. If you and the host both work at random, there are 6 equally likely possibilities:<br />
<br />
AB, AC (should stick)<br />
BA, CA (already lost)<br />
BC, CB (should switch)<br />
<br />
There's a 1/3 chance you lose immediately, but <b>if</b> you get to step 3, then in half of the instances you should switch, and in half you should stick. In other words, it doesn't make any difference to your chances of winning: it's 50-50.<br />
<br />
In the original scenario though, the host <i>never picks door A</i>, so BA and CA are not allowed: if you choose door B, then the host will definitely pick C, so BC 'absorbs' all the probability from BA; similarly CB from CA. Now both BC and CB are twice as likely as each of AB and AC. What happens in step 2 is completely inevitable, whereas if the host is as ignorant as you, there's a chance you'll learn exactly where the prize is. <br />
<br />
A friend of mine found the following generalisation helpful for understanding the problem. Suppose there are 100 doors, and still only one prize. You pick a door, and the host then opens 98 of the remaining 99 doors to reveal nothing. Again, this outcome was inevitable, so there's still a 1% chance the door you chose hides the prize, and if you switch there's a 99% probability you find the prize.<br />
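If you're still unconvinced, simulation is the great arbiter of probability puzzles. Here's a minimal sketch covering both the 3-door and 100-door versions:<br />
<pre>
import random

def play(switch, n_doors=3):
    prize = random.randrange(n_doors)
    choice = random.randrange(n_doors)
    # The host opens all remaining doors but one, never revealing the prize:
    # the unopened door is the prize door unless you already hold it.
    other = prize if choice != prize else (choice + 1) % n_doors
    return (other if switch else choice) == prize

for n_doors in (3, 100):
    for name, switch in (("stick", False), ("switch", True)):
        wins = sum(play(switch, n_doors) for _ in range(100_000))
        print(n_doors, name, wins / 100_000)
</pre>
With 3 doors this gives roughly 0.33 for sticking and 0.67 for switching; with 100 doors, about 0.01 and 0.99.<br />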
<br />
(Thanks to Tim Cannings for suggesting this, and Aeron Buchanan for the generalisation.)<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com2tag:blogger.com,1999:blog-806057380317734287.post-23447943270852435102013-06-05T09:00:00.000+01:002013-06-05T09:07:00.167+01:00Twins, or In which newspapers mess up odds calculations (again)<a href="http://www.bbc.co.uk/news/uk-22765462">This BBC piece</a> interviews a couple who've been 'blessed' with three sets of twins (rather you than me). The caption states that<br />
<blockquote class="tr_bq">
Doctors told them the chances of having three sets of twins was 500,000-1.</blockquote>
<div>
Doctors that don't know much about genetics, perhaps. <br />
<br />
Our old friend the Daily Mail managed to imply that this happened for the first time in Britain last year:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGCmbnBvV7T439uQ33NphYnW9LtgyJJ-hXNf0_Pm_eV64Xwyi1wSOas769j-Ju8DhMFiM5HJPJV4ZJAsStfe2PY3QRDYAZk8z2LtF6Y6hCsOKLhjrmR_qKIfDbXp-RQIZVxFmjPYPlAPM/s1600/DM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGCmbnBvV7T439uQ33NphYnW9LtgyJJ-hXNf0_Pm_eV64Xwyi1wSOas769j-Ju8DhMFiM5HJPJV4ZJAsStfe2PY3QRDYAZk8z2LtF6Y6hCsOKLhjrmR_qKIfDbXp-RQIZVxFmjPYPlAPM/s1600/DM.png" /></a><br />
<br />
Obviously they don't read <a href="http://www.telegraph.co.uk/news/uknews/1359583/Mother-has-her-third-set-of-twins.html">the Telegraph</a>, which has an example from 2001. (The DM <a href="http://www.dailymail.co.uk/news/article-2211310/Mother-British-woman-birth-twins-THREE-times-shes-got-children-that.html">article has since been changed</a> but they forgot to change the title of the web-page, which is why it appears like this on Google; the correction is not acknowledged, naturally.) And if three sets of twins are unlucky, feel for <a href="http://www.bbc.co.uk/news/world-africa-12280109">this poor woman</a>, who has six!<br />
<br />
The Telegraph helpfully details the mathematics involved in their calculation. For natural births, in white European women, approximately 1 in 80 result in twins (or triplets, or more, if one is even luckier). Therefore if you take three births at random, there's about a one in $80 \times 80 \times 80 = 512,000$ chance they'll all be twins.<br />
<br />
Except, obviously, these births were not selected 'at random' because they're all from the same mother. Which is sort of the point. <br />
<br />
Like many things, there are genetic factors which predispose people to have twins, generally (as far as my research tells me) because the mother is prone to hyper-ovulate. So if you pick a mother who has <i>already</i> had twins, they're more likely to be someone prone to hyper-ovulation, and therefore it's more likely that they'll have a second set. If they've already had two sets, well you can be pretty darn sure they hyper-ovulate. <br />
<br />
Other factors are also important. If you're of Nigerian ancestry, then your chances of having twins in the first place are considerably higher. Older mothers are also more likely to have multiple births.<br />
<br />
I haven't been able to find anything terribly concrete quantifying the observed proportions of multiple sets of twins, but it seems to be about <a href="http://www.twinparents.com/articles/twin-pregnancy/twins-again-what-are-chances.html">four times as likely</a> that you'll have a second set of twins if you've already had one. This reduces our highly unlikely 1 in 500,000 to a mere 1 in $80 \times 20 \times 20 = 32,000$.<br />
<br />
Quite a lot of families are out there who have gone through three pregnancies. Third births within marriages and civil partnerships in England and Wales (outside marriage the birth order isn't recorded) represented <a href="http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-230704">about 7.5%</a> of the 720,000 births in 2010. That's about 50,000 or so per year, so we can expect one of these stories to come around about every... oh, nine months or so?<br />
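For what it's worth, here's the arithmetic behind that guess in one throwaway sketch (the factor of four is the rough figure quoted above):<br />
<pre>
naive = 80 * 80 * 80            # 512,000: treats the three births as unrelated
adjusted = 80 * 20 * 20         # 32,000: four times likelier after each set

third_births = 0.075 * 720_000  # roughly 54,000 a year in England and Wales
print(third_births / adjusted)  # about 1.7 such families a year
</pre>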
<br />
Actually that's probably not true, 'cos if people with two sets of twins already have any sense they'll start being a little bit more careful... <br />
<br /></div>
<h3>Extreme Comparisons (19 April 2013)</h3>
This is just a fairly short note on what has been covered in other cases elsewhere. The <a href="http://www.bbc.co.uk/news/health-22214065">BBC reported today</a> that Tameside in Greater Manchester is the "UK's heart disease capital". This is on the basis that the rate of deaths from heart disease between 2009 and 2011 was higher than anywhere else, at 132 per 100,000 people. The data come from the British Heart Foundation, and I couldn't immediately see how to get at them.<br />
<br />
The article makes a particular point of noting that this rate is three times higher than that of Kensington and Chelsea. This sounds very dramatic, but it really isn't, so don't go moving just yet (even if you can afford a house in Kensington).<br />
<br />
<br />
The problem with this approach is pretty simple. If you take two different groups of people, even if the people in these groups are actually essentially interchangeable, different things will happen to them just by chance. For example, some heavy smokers live to be 90, just because they're lucky (but <a href="http://understandinguncertainty.org/node/58">not many</a>).<br />
<br />
If you take a large collection of different groups (all the UK's <a href="http://en.wikipedia.org/wiki/Local_government_in_the_United_Kingdom">local authorities</a>, in this case) this effect becomes quite strong. Let's suppose that people in those groups are essentially the same, say some statistically literate but morally dubious government has decided to randomly assign people to live in particular areas. Just by chance, some groups will have more cases than others.<br />
<br />
I tried this with a little simulation - take 300 large-ish equally sized groups with an average of 150 deaths each; by chance the number of deaths in each group would typically range between 116 and 187, and variations as large as between 100 in the lowest group and 200 in the highest wouldn't be too surprising. The effect becomes more dramatic if some of the groups are much smaller than others, as is the case with local authorities.<br />
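<br />
Here's a minimal version of that simulation in R, assuming Poisson variation within each group (the seed is arbitrary):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">set.seed(1) # arbitrary seed, for reproducibility</span><br />
<span style="font-family: Courier New, Courier, monospace;">deaths = rpois(300, lambda = 150) # 300 identical groups averaging 150 deaths</span><br />
<span style="font-family: Courier New, Courier, monospace;">range(deaths) # typically something like 116 to 187</span><br />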
<br />
Exactly the same phenomenon is <a href="http://understandinguncertainty.org/three-fold-variation-uk-bowel-cancer-death-rates">seen here</a> for bowel cancer rates. It's explained very well with a funnel plot.<br />
<br />
Of course the groups of people in different local authorities <i>aren't</i> the same, even on average. They'll have different age profiles, for example, different lifestyles, income levels, access to healthcare and so on. Some of these should be controlled for (depending on the point you're trying to make), some perhaps shouldn't. But comparing extremes as the BBC (egged on by the BHF) does tells us very little indeed.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-37650834104992453262013-03-22T14:27:00.000+00:002013-03-22T14:27:09.512+00:00Smoking rates: even the 'good guys' shouldn't be trustedThe Guardian reports today that the number of children under 16 (more precisely aged 11-15) taking up smoking has <a href="http://www.guardian.co.uk/society/2013/mar/22/number-children-smoking-rises-year">risen by 50,000 in a single year</a>. This is a slightly irritating headline simply because the number is without context, but the sub-headline is more helpful: from 157,000 to 207,000, which is quite dramatic: a 32% increase.<br />
<br />
First obvious question - is this statistically significant? Well, let's have a look at the research quoted by the Guardian, which comes from <a href="http://www.cancerresearchuk.org/cancer-info/cancerstats/types/lung/smoking/#children">Cancer Research UK</a>. Their Figure 6.10 immediately arouses suspicion. There does appear to be an uptick in the figures from 2010 to 2011, but they remain below the 2009 figure! The overall trend in the numbers over the past 10 years is clearly downwards. So either we believe that the number of children taking up smoking fell and then rose dramatically in consecutive years, or else we might just be witnessing a bit of noise in our data. <br />
<br />
<a name='more'></a><br />
<h4>
Surveys</h4>
<br />
So first, what exactly is the number actually measuring? Well it's the number of students who become regular or occasional smokers from one year to the next. This is measured by comparing the number of such regular or occasional smokers (for example) aged 12 in 2010, with the number aged 13 in 2011.<br />
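<br />
As a made-up illustration of the method (these numbers are invented, not CRUK's):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">smokers_12_2010 = 8000 # smokers aged 12 in 2010 (invented)</span><br />
<span style="font-family: Courier New, Courier, monospace;">smokers_13_2011 = 13000 # the same cohort a year later, aged 13 (invented)</span><br />
<span style="font-family: Courier New, Courier, monospace;">smokers_13_2011 - smokers_12_2010 # 5,000 'new' smokers in that cohort</span><br />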
<br />
But the researchers cannot, in practice, go around asking every 11-15 year old whether or not they smoke. Instead they commission a survey, or find the results of an existing survey, in this case the snappily titled <i>Smoking, drinking and drug use among young people in England in 2011</i>, carried out by a branch of the NHS. You can see it all <a href="http://www.ic.nhs.uk/catalogue/PUB06921">here</a>.<br />
<br />
The survey gives an estimate of the <i>proportion</i> of children smoking in England, which the statisticians at Cancer Research UK multiplied by the ONS's mid-year population estimates <i>for the UK</i> to arrive at a total number. This is, in my view, a mistake, because we end up combining the statistical error in the original survey with the error in the ONS's population estimates (though this should be fairly small). Also: why multiply population estimates for the UK by proportions for just England? Simple: the UK has more people, so the numbers sound bigger that way.<br />
<br />
The original research says that the proportion of regular smokers amongst 11-15 year-olds in the past 5 years is 6%, 6%, 6%, 5% and 5%. Their Figure 3.2 backs this up. All in all, not very dramatic. No mention is made in the original report of any increase in smoking rates; in fact rates amongst school children have fallen dramatically in the past 20 years, with the proportion of regular smokers having been around 10% for boys and 15% for girls in the mid 1990s.<br />
<br />
The problem is that it's in Cancer Research UK's interest to make this all sound as bad as possible, so as to justify ever more action by the authorities. In fact they seem to have gone to quite a lot of trouble to construct a figure which will get them a headline.<br />
<br />
Their motives may be laudable, but they have an agenda just like everyone else, and it does not necessarily involve being totally honest. So keep your skeptical hats on at all times!<br />
<br />
<h4>
<b>Appendix on Batch Effects</b></h4>
<br />
Here we cover some detail about the surveying method. <br />
<br />
The survey covered 6,519 school pupils, clustered into 219 schools in England. Ideally the authors would have written down a list of every 11-15 year-old in the country, then picked several thousand of them from the list at random, and then given the lucky selection the survey. But such a list isn't readily available in the UK, and it's very difficult (and probably illegal) to create one.<br />
<br />
Much easier is to make a list of every high school in the country, and pick a few hundred at random. Then you can go to each school, and either survey every student, or use the school's list of pupils to sample some of them at random. This is much simpler, and if done correctly should (very approximately) mean that every student in the country is still equally likely to be selected for the survey.<br />
<br />
This means that the proportion of students in the survey who say they smoke should be an <i>unbiased</i> estimate of the proportion of <i>all</i> students who would answer the same way. Of course, if I ran the survey again and asked a different 6,519 students, I would get slightly different answers by chance; but because the design is unbiased, I should get the 'correct' answer <i>on average</i>.<br />
<br />
There's a disadvantage to this method though, which is that 6,519 students from 219 schools doesn't give you as much information as 6,519 students picked randomly from a list of all students. This is because students in the same school are more likely to be similar to one another: they have similar backgrounds and come from the same area - all from the inner city, or all from rural Cumbria, for example. This within-school correlation (a <i>batch effect</i>) can be estimated from the data and corrected for, which the authors of the original research do. More info in <a href="http://www.guardian.co.uk/science/2006/apr/01/badscience.uknews">this article</a>, on a similar example.<br />
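<br />
To get a feel for the size of this effect, here's a rough design-effect calculation; the within-school correlation of 0.02 is purely my own illustrative assumption:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">n = 6519; schools = 219</span><br />
<span style="font-family: Courier New, Courier, monospace;">m = n/schools # average cluster size, about 30 pupils per school</span><br />
<span style="font-family: Courier New, Courier, monospace;">icc = 0.02 # assumed within-school correlation (illustrative)</span><br />
<span style="font-family: Courier New, Courier, monospace;">deff = 1 + (m - 1)*icc # design effect, about 1.6</span><br />
<span style="font-family: Courier New, Courier, monospace;">n/deff # effective sample size, roughly 4,100 pupils</span><br />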
<br />
No attempt was made in the Guardian's article (or by CRUK, as far as I can tell) to assess whether the difference they observe might be due to sampling variation.<br />
<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com1tag:blogger.com,1999:blog-806057380317734287.post-38737083758341446062012-10-29T17:12:00.002+00:002012-10-29T17:39:04.971+00:00Independence, independence, independenceMultiplying probabilities without any thought is a dangerous game. Just look at <a href="http://www.rollingstone.com/politics/news/global-warmings-terrifying-new-math-20120719">this piece</a> in Rolling Stone, referenced in <a href="http://www.guardian.co.uk/commentisfree/2012/oct/28/tory-retreat-on-climate-change-senseless?CMP=twt_fd">this Guardian article</a>.<br />
<blockquote class="tr_bq">
<blockquote class="tr_bq">
...warmest May on record for the Northern Hemisphere – the 327th consecutive month in which the temperature of the entire globe exceeded the 20th-century average, the odds of which occurring by simple chance were $3.7 \times 10^{-99}$, a number considerably larger than the number of stars in the universe.</blockquote>
</blockquote>
<br />
<a name='more'></a>How does this number arise? The chance of a random variable exceeding its median is one-half (that's the definition of a median). If we treat each month as an independent random variable, then the chance of all 327 of them exceeding their medians is $0.5 \times 0.5 \times \ldots$ (327 times) $\ldots \times 0.5 = 0.5^{327}$, which is indeed $3.7 \times 10^{-99}$; very, very unlikely.<br />
<br />
Of course, we don't <i>know</i> what the median of these random variables is, we simply use a historical average (from the 20th Century) to estimate it.<br />
<br />
Since the probability we got was astronomically small, we deduce that one of the assumptions we used to calculate it was false; in Rolling Stone and the Guardian's case, they assume that medians were not correctly estimated, because the world has got warmer over time. This seems likely to be true, given all the scientific evidence we have for climate change.<br />
<br />
But in reality the independence assumption is certainly false <i>as well</i>. The temperatures in consecutive months will certainly be correlated with one another; for example, my limited understanding of <a href="http://en.wikipedia.org/wiki/El_Ni%C3%B1o%E2%80%93Southern_Oscillation">El Nino</a> is that it can lead to warmer weather. So, in fact, the probability calculated above is completely meaningless.<br />
<br />
Suppose instead, for example, that conditional on January being warmer than average, the chance that February is also warmer than average is $0.75$, and the same for other consecutive months. The probability becomes $0.5 \times 0.75^{326}$ (the first month still has its one-half chance), which is a mere $9.3 \times 10^{-42}$. This is still astronomically unlikely of course, but the difference in magnitude between the two numbers is more astronomical still, which suggests that it wasn't sensible to present the probability in the first place.<br />
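<br />
Both numbers are one-liners in R:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">0.5^327 # naive independence: 3.7e-99</span><br />
<span style="font-family: Courier New, Courier, monospace;">0.5*0.75^326 # first month at one-half, then correlated: about 9.3e-42</span><br />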
<br />
It's tempting to take the very complicated but compelling evidence for something like climate change and turn it into a single number or graph in order to communicate a message. That's a very worthy aim, but make sure the number you choose is even vaguely correct.<br />
<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-77567133055798828742012-10-16T08:20:00.002+01:002012-10-16T08:21:09.096+01:00Horse Racing and CoincidenceJust penned this to the BBC about why their <a href="http://www.bbc.co.uk/sport/0/horse-racing/19954827">piece on a jockey winning seven races</a> in a single day quotes the wrong odds. <br />
<blockquote class="tr_bq">
<blockquote class="tr_bq">
In your article on Richard Hughes winning seven races in a single day, you quote the odds of this event as being 10,168-1. Whilst undoubtedly a fantastic achievement, these odds are incorrect, since they ignore the fact that Mr Hughes raced in eight races that day. The chance of him winning 7 out of 8 races (or more) is about 1,257-1, which is a bit more modest. In particular, it seems fairly unlikely that someone would place a bet on the rider winning these particular seven races, and not the eighth.<br />
<a name='more'></a></blockquote>
<blockquote class="tr_bq">
If the piece's author or anyone else want to talk more about how these odds are calculated, or why this sort of thing is important, I'd be very happy to chat about it.</blockquote>
<blockquote class="tr_bq">
Best wishes,</blockquote>
<blockquote class="tr_bq">
Robin Evans</blockquote>
</blockquote>
I think this is the right approach to take with these mistakes, and I hope it doesn't come across as condescending. With a bit of luck they'll acknowledge the mistake and get in contact.<br />
<br />
While we're on the subject, calculating multiple odds can seem tricky at first. The quickest way to do it is this: if the odds of an event are 'a to b' (usually written a-b or a/b), then the probability of the event is b/(a + b). For example, 3/2 means 2/(3 + 2) = 0.4, so we'd expect 40% of events with these odds to actually occur (or slightly fewer if the bookies are taking a cut!).<br />
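<br />
As a tiny R helper (the function name is mine):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">odds_to_prob = function(a, b) b/(a + b) # odds 'a to b' as a probability</span><br />
<span style="font-family: Courier New, Courier, monospace;">odds_to_prob(3, 2) # odds of 3/2 mean probability 0.4</span><br />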
<br />
To work out the chances of several events happening, and assuming these are independent (reasonable in this case, since the odds would be updated after taking into account what had happened in previous races), we multiply the probabilities. In this case the seven races had odds of 13/8, 5/2, 7/1, 4/1, 5/2, 7/4 and 15/8, and multiplying these gives 0.000098, which as the BBC say is about 10,168/1. <br />
<br />
However there was an eighth race, in which Hughes was the 2/1 favourite, and in which he 'only' came third. So we need to consider the probabilities of him winning seven races <i>and</i> failing to win the eighth; of course there are eight ways in which he could do this, because we would have found it equally amazing had he won any seven of the eight races. And we throw in the chance of him winning all eight, since that would be seriously impressive.<br />
<br />
The sum of all these possibilities comes to about 1257/1 (R code working below). This is still impressive, but an order of magnitude different. There are a lot of race meetings every year, so I'm surprised that this event hasn't happened since 1996 - perhaps jockeys don't often race eight horses in one day. Or maybe the odds on their horses are usually a bit longer.<br />
<br />
There are <a href="http://www.parentdish.co.uk/2010/10/14/couples-three-children-share-the-same-birthday/">plenty</a> of <a href="http://www.dailymail.co.uk/news/article-1247872/Woman-finds-double-egg-yolks-box-beating-trillion-odds.html">other</a>, <a href="http://en.wikipedia.org/wiki/Sally_Clark">examples</a> where people straightforwardly fail to calculate odds correctly, making something seem strange, spooky or suspicious, when it's merely mundane or tragic. It would be great if the BBC could avoid contributing to this malaise!<br />
<br />
<h4>
R Code</h4>
<br />
<span style="font-family: Courier New, Courier, monospace;">p = 1/(c(13/8, 5/2, 7/1, 4/1, 5/2, 7/4, 15/8, 2/1)+1) # odds a/b become probabilities b/(a+b)</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">out = prod(p) # chance of winning all eight races</span><br />
<span style="font-family: Courier New, Courier, monospace;">for (i in 1:8) out = out + prod(p[-i])*(1-p[i]) # add the eight ways of winning exactly seven</span><br />
<span style="font-family: Courier New, Courier, monospace;"># 'out' is about 0.000795, i.e. odds of about 1257-1</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">out</span>Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com1tag:blogger.com,1999:blog-806057380317734287.post-25025979802041187882012-10-11T15:22:00.002+01:002012-10-11T15:22:58.714+01:00Fruit, Vegetables, Health and HappinessWhilst perusing the Daily Mail today (my excuse is that it's next to the espresso machine) I saw this interesting <a href="http://www.dailymail.co.uk/health/article-2215737/Forget-day-Now-scientists-say-youll-healthier-happier-eating-SEVEN-daily-portions-fruit-veg.html">health advice</a>:<br />
<blockquote class="tr_bq" style="font-family: inherit;">
Forget five a day: Now scientists say you'll be healthier and happier eating seven daily portions of fruit and veg.
</blockquote>
<br />
The phrase "scientists say" is always a red light. <br />
<a name='more'></a>The article says that there's been a study which shows that people who eat more fruit and vegetables are happier than everyone else. It runs through all the classically bad newspaper science reporting clichés. The old wisdom has been turned on its head, the government has been wrong all these years, those crazy scientists have gone and done it again! For example, it says:<br />
<blockquote class="tr_bq">
But now scientists claim if we upped it to seven, we’d also be far happier.</blockquote>
There is a clear <i>causal </i>implication: <i>if </i>we change our diet to increase our intake of fruit and veg, we will <i>become</i> happier.<br />
<br />
Here's a preprint of <a href="http://www.andrewoswald.com/docs/October2FruitAndVeg2012BlanchOswaldStewartBrown.pdf">the paper referred</a>, titled <i>Is Psychological Well-being Linked to the Consumption of Fruit and Vegetables? </i>The abstract contains the important sentence which is omitted from the newspaper article:<br />
<blockquote class="tr_bq">
Reverse causality and problems of confounding remain possible.</blockquote>
They remain very possible indeed! All the paper does is look at some associational studies, where they ask people about their fruit and veg intake and their happiness. No randomised studies were carried out.<br />
<br />
It's well documented that poorer people eat less fruit and fewer vegetables, and it also seems pretty likely that your income and wealth have a substantial effect on your happiness, however measured. Thus income is potentially a very strong confounder. Needless to say the authors do try controlling for variables like this (and smoking status, BMI, etc), and still find a correlation.<br />
<br />
Just as important is the problem of reverse causation: if I'm feeling bad about myself, do I reach for an apple? Well in my case, it's more likely to be the chocolate or the Pralines and Cream. So it may be that happiness has an effect on fruit and vegetable intake.<br />
<br />
Another difficulty is that these studies rely upon people remembering how much fruit and veg they've been eating, and their mood at the time of asking might affect their memory.<br />
<br />
In addition, the effect size is not that large. Taking all my estimates from Table 1 in the paper (based on Scottish data), people who eat 8 or more portions per day have a mean life satisfaction 0.27 higher than those who eat none at all; this is measured on a scale from 0 to 10. So it certainly isn't correct to say that scientists claim you'll be <i>far</i> happier if you eat a few more grapes.<br />
<br />
There is quite a lot of variation in the estimates, so for example eating 5-6 portions appears to result in a higher increase over eating none (0.23) than does eating 6-7 (0.17). It seems to me that a better idea would be not to treat the intake as a discrete variable lying in these different categories, but as something continuous, and to fit a single regression line; this would avoid these problems and allow the whole sample to do the work.<br />
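<br />
Here's a sketch of what I mean, with simulated data standing in for the survey (the effect size and noise are invented):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">set.seed(1) # simulated data, not the paper's</span><br />
<span style="font-family: Courier New, Courier, monospace;">portions = sample(0:8, 5000, replace = TRUE)</span><br />
<span style="font-family: Courier New, Courier, monospace;">satisfaction = 7 + 0.03*portions + rnorm(5000) # small assumed effect</span><br />
<span style="font-family: Courier New, Courier, monospace;">coef(lm(satisfaction ~ portions)) # one slope; the whole sample does the work</span><br />
<span style="font-family: Courier New, Courier, monospace;">coef(lm(satisfaction ~ factor(portions))) # separate category effects are noisier</span><br />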
<br />
Lastly, the DM questions whether the health advice given by the government should be changed on the basis of this paper. It mentions that various other countries suggest a higher intake of fruit and veg, though it omits to mention that the definitions of 'fruit and veg' and of a 'portion' are different in each case. <br />
<br />
Needless to say nothing should (or will) be changed because of this one paper, but it's worth pointing out that health advice is partly based upon its actual effect on the population - if the Ministry of Health started saying that you should eat ten portions of fruit a day, people might dismiss the advice as unrealistic, and ignore it altogether. There are ethical questions about how untruthful one can be in this case, but the presentation of such advice is certainly important.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com1tag:blogger.com,1999:blog-806057380317734287.post-80359576060823864372012-10-02T11:01:00.000+01:002012-10-02T11:01:12.718+01:00Trust your instinctsAs Obi Wan almost said: "Let go your conscious self and act on instinct. Your news articles, government agencies and other organisations can deceive you, don't trust them."<br />
<br />
My brother Andrew (<a href="https://twitter.com/andysstudy">@andysstudy</a>) has been studiously following this advice since 1977, and sent me this tweet:<br />
<blockquote class="tr_bq">
See below. There are 73m children (u15) in EU I find it hard to believe 1:73 goes missing every year? http://www.euronews.com/2012/10/01/has-anyone-seen-my-child/</blockquote>
<a name='more'></a>The link is to a Euronews article about the problem of children disappearing in the EU. The killer statistical sentences in the article are these:<br />
<blockquote class="tr_bq">
An estimated one million children go missing every year in the European Union. These include runaways, criminal abductions, those abducted by a parent, the lost or injured, as well as missing unaccompanied migrant children.</blockquote>
As Andrew says, this seems like an extraordinarily large number. In a typical large high school like the one I went to, it would mean about 4 of the 300 students in each year group going missing annually.<br />
<br />
So where does the number come from? Fortunately not much detective work was required on my part: <a href="http://ec.europa.eu/justice/fundamental-rights/files/conference_missing_children_final_report_en.doc.pdf">this document</a>, a summary report of a European conference on missing children from May 2012 states, as one of its conclusions, that<br />
<blockquote class="tr_bq">
5. The Commission will use a working figure of 1 per cent of children in the EU (one million children per year) who go missing, pending more reliable data becoming available.</blockquote>
In other words, the number is made up. It's really that simple - in the absence of any data on the subject, they created a 'working figure' instead. So Euronews' sentence should be "It is guessed that one million children go missing...". <br />
<br />
Children disappearing, whether runaways, abductees, or victims of violence, undoubtedly represent a terrible problem afflicting our societies, and any number would represent a great deal of untold misery for those children, their parents and families. Which, in my view, makes it ever more critical that such information is factually correct. I can only assume that the organisers of the conference have concluded that a headline-grabbing number is important enough to justify making one up.<br />
<br />
There is a seductive quality to choosing a small (and therefore plausibly conservative sounding) fraction, 1 percent, applying it to a very large number, 100 million, and getting a still large number, which can be used to shock people into action.<br />
<br />
A similar example comes from Norman Myers' 1979 book, <i>The Sinking Ark</i>:<br />
<br />
<blockquote class="tr_bq">
Let us suppose that, as a consequence of this man-handling of natural environments, the final one-quarter of this century witnesses the elimination of 1 million species--a far from unlikely prospect. This would work out, during the course of 25 years, at an average extinction rate of 40,000 species per year, or rather over 100 species per day.</blockquote>
So, again, one apparently plausible figure is invented to justify a shocking one: 100 species going extinct per day. Species extinction is a very real problem, but this sort of approach doesn't do anyone any credit, and only provides ammunition to those who spout guff about scientists (such as Prof. Myers) being involved in some sort of conspiracy to exaggerate environmental problems in order to get more grant money.<br />
<br />
We deserve better.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-25787387699647945462012-09-13T23:42:00.003+01:002012-09-13T23:53:35.460+01:00Road safety - how not to reason from dataMy apologies for going quiet over the summer. I've been in China, and they're not too big on blogs there (blogger is blocked).<br />
<br />
Just a short post here, on something fairly obvious. Here's an extract from post number 51 by Cambridge News' <a href="http://www.cambridge-news.co.uk/Home/Blogs/Perfect-cycling-weather.htm">Cycling Blog</a> on speed limits.
<br />
<blockquote class="tr_bq">
In Great Britain in 2011, 7 people were killed on a road with a 20 mph limit. 636 killed in a 30 mph limit. 289 people were seriously injured in a 20 mph limit, 13,168 in a 30 mph limit... Yes, these are large numbers. But it is the proportions that matter here... So that is proof then. Lower speed limits means fewer people killed.</blockquote>
[In case you're wondering, there's no irony in the last sentence. At all.] I assume you're all thinking the same as me by now.<br />
<a name='more'></a>
<ol>
<li>There are a lot more 30mph roads (by length) than 20mph ones.</li>
<li>30mph roads are often busier than 20mph roads, so there are more things to collide with each other.</li>
</ol>
<div>
After a cursory search for official confirmation of the first point I came up empty-handed, but I imagine you'll believe me.</div>
<div>
<br /></div>
<div>
There are a lot of good reasons to think that 20mph roads are safer than 30mph ones, at least if the speed limits are observed, just from simple physics. A car travelling at 30mph has 2.25 times as much kinetic energy as one travelling at 20mph. During the time a driver takes to react before braking, a car at 30mph travels 50% further; and once braking begins, the faster car takes disproportionately longer to stop. I am personally in favour of 20mph limits - they're a lot more pleasant to cycle on, because I can actually keep up with the traffic. They certainly feel safer, because drivers are less likely to try and overtake me inappropriately. But what's listed above isn't proof of anything.</div>
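<div>
For the record, here's the arithmetic, with a reaction time and braking deceleration that are my own round-number assumptions:</div>
<div>
<span style="font-family: Courier New, Courier, monospace;">v = c(20, 30)*0.447 # speeds in metres per second (1 mph is about 0.447 m/s)</span><br />
<span style="font-family: Courier New, Courier, monospace;">v^2/v[1]^2 # kinetic energy ratio: 1 vs 2.25</span><br />
<span style="font-family: Courier New, Courier, monospace;">reaction = 1.5; decel = 6.5 # assumed reaction time (s) and deceleration (m/s^2)</span><br />
<span style="font-family: Courier New, Courier, monospace;">reaction*v + v^2/(2*decel) # stopping distances: roughly 20m vs 34m</span><br />
</div>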
<div>
<br /></div>
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com3tag:blogger.com,1999:blog-806057380317734287.post-21880031905434500172012-07-20T09:38:00.001+01:002012-09-13T23:52:24.393+01:00Polygraphs and Sex OffendersJust a short post on the <a href="http://www.bbc.co.uk/news/uk-18916405">news story</a> that a pilot scheme which administers polygraph tests ('lie detector tests') to sex offenders on probation has been deemed successful, and the Ministry of Justice plans to roll it out nationwide.<br />
<br />
The <a href="http://www.justice.gov.uk/downloads/publications/research-and-analysis/moj-research/evaluation-of-mandatory-polygraph-pilot.pdf">MoJ's research</a> seems reasonable enough: the pilot scheme took place in the Midlands, and comparison groups of offenders were selected from other areas. The assignment was <i>not</i> randomised, which is unfortunately all too common, but the comparison group was similar to the treatment group on the basis of the most obvious covariates (age, original offence, risk of reoffending, criminal history, etc.). Based on a quick-ish reading of the paper it seems fairly solid.<br />
<a name='more'></a><br />
The main finding was that the treated group were far more likely to make <i>clinically significant disclosures</i> (CSDs) than the comparison group; CSDs include relevant information about recent activity, sex life, and sexual fantasies. The offenders tended to make these additional disclosures when taking the polygraph test, often just before the test. This suggests that the <i>perception</i> of being under scrutiny for deception caused the offenders to disclose; in particular the study design can give no indication of whether the polygraph tests actually detected deception in the offenders.<br />
<br />
My limited understanding of the research around polygraph testing (<strike>I couldn't find a good scientific review, but if anyone knows of one, let me know</strike> see below for a scientific review; here's an interesting <a href="http://www.tandfonline.com/doi/abs/10.1080/14789940412331337353">historical review</a> [£]) is that they are fairly effective at detecting deception, but that if the test taker has been trained to beat the test they become almost useless. My main concern (scientific, rather than ethical) about using testing is that it can lull operators into a false sense of security.<br />
<blockquote class="tr_bq">
<blockquote class="tr_bq">
Some offender managers stated that ... when polygraph results showed <span style="background-color: white;">no deception, ... it reassured them of the offender’s honesty ...</span></blockquote>
</blockquote>
This is particularly worrying as the offender managers (and indeed their fellow criminals) were wont to describe some sex offenders as 'devious', which suggests that they will be particularly likely to try to manipulate the test. Lastly I note that the study did not consider reoffending rates, which is presumably the outcome we would most like to see reduced.<br />
<br />
UPDATE: Tweeter <a href="http://twitter.com/hjnock">@hjnock</a> pointed me to <a href="http://www.bps.org.uk/sites/default/files/documents/polygraphic_deception_detection_-_a_review_of_the_current_scientific_status_and_fields_of_application.pdf">this review</a> by the British Psychological Society which gives a great summary of the evidence relating to polygraphs. I think it justifies my fairly vague comment above!Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com2tag:blogger.com,1999:blog-806057380317734287.post-71246449488496040072012-07-18T11:24:00.000+01:002012-07-18T11:27:09.664+01:00Reading too much into one numberThe BBC points out that deaths from road accidents in the UK <a href="http://www.bbc.co.uk/news/uk-politics-18881049">increased last year</a> for the first time in 10 years. In total 1,901 people were killed during 2011, which is 51 more than in 2010. This is certainly not good.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXxLUq5q9ZcTwByv1-tsX2Aj7Odu5kxJ7crMHAiCcFpGyr_0OALkgF27Vid4ggKLXO_TIJbMtzoI_5lbvbx3gO2J1Hioo3ShRnp7rDR4hofHQSx2Eq6E5Rxf_Igu5Pzic2V5YnB_LF6gk/s1600/Screen+shot+2012-07-18+at+11.04.43.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="290" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXxLUq5q9ZcTwByv1-tsX2Aj7Odu5kxJ7crMHAiCcFpGyr_0OALkgF27Vid4ggKLXO_TIJbMtzoI_5lbvbx3gO2J1Hioo3ShRnp7rDR4hofHQSx2Eq6E5Rxf_Igu5Pzic2V5YnB_LF6gk/s400/Screen+shot+2012-07-18+at+11.04.43.png" width="400" /></a></div>
<br />
<br />
But is it surprising, in a statistical sense? <br />
<a name='more'></a>Well if the roads were just as safe (or just as dangerous) in 2011 as in 2010, we still wouldn't expect the number of deaths to be exactly the same in the two years: other factors such as the weather will have an effect. The <a href="http://www.publications.parliament.uk/pa/cm201213/cmselect/cmtran/506/50602.htm">Transport Select Committee's report</a> concludes by stating:<br />
<blockquote class="tr_bq">
<blockquote class="tr_bq">
In the response to this report, we recommend that the <span style="background-color: white;">Government outlines why it thinks road deaths increased in 2011.</span></blockquote>
</blockquote>
A possible (though unlikely) response is: "we were just unlucky this year".<br />
<br />
<h3>
Poisson Distributions</h3>
A crude way to capture the variability is with a <a href="http://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a>. The essence of a Poisson is that it counts how many events of some kind take place (like road accidents) whilst assuming that each event is independent of all the others. In other words, the fact that one accident has occurred (or hasn't occurred) doesn't make the roads any more or less dangerous for everyone else during the rest of the year. You don't think "oh well, there was an accident in my town today, so that's our quota for the month - I'll be safe now" - this would obviously be ridiculous.<br />
<br />
The only other piece of information we need is the intensity of accidents - how many deaths we expect per year <i>on average</i>, possibly over many years of equally dangerous roads. Again, crudely, we can estimate this as the number of deaths in 2010, which was 1,850. So if in an average year we get 1,850 deaths, how often would we see 1,901 or more? The answer turns out to be about 12%, or about one in every eight years. In other words, this is not very surprising.<br />
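<br />
In R this is a one-liner:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">ppois(1900, lambda = 1850, lower.tail = FALSE) # P(1,901 deaths or more): about 0.12</span><br />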
<br />
What about the assumptions we made? In reality accident deaths are not independent, mainly because several deaths can occur all at once. Last year 7 people were killed in a particularly awful <a href="http://en.wikipedia.org/wiki/2011_M5_motorway_crash">accident on the M5</a>. This sort of clustering of events causes <i>overdispersion</i>, and means that the variability in the number of deaths will actually be greater than my simple model implied. Hence, 12% is likely to be an underestimate.<br />
<br />
Could it work the other way around? Perhaps after seeing the M5 crash, everyone drives a bit more carefully, and there are fewer accidents (this would lead to <i>underdispersion</i>, naturally). This is possible, but the effect is unlikely to be as strong as the effect of several people dying all at once. To be certain, we could test this idea with appropriate data.<br />
<br />
<h3>
Remarkable success</h3>
<div>
The reduction in the number of road deaths (particularly for vehicle occupants) in the last seven years has been remarkable (see graph); I can think of few other areas in which such a dramatic success can be demonstrated. The huge improvement in the safety standards of vehicles is the main reason, but better roads and higher petrol prices (so people drive more slowly) are also likely to have helped. </div>
<div>
<br /></div>
<div>
It's clear that the downward trend seen between 2006 and 2010 could not continue forever, so we shouldn't get too worked up about a small one year increase. If it rises again in the next couple of years, I'd be more concerned. I also can't understand why the committee find it "shocking" that road accidents are the biggest killer among young people. Something has to be the leading cause of death in this group; would they rather it was homicide? Or suicide? Or drug abuse?</div>
<div>
<br /></div>
<div>
Having said all this, the increase in the number of people killed or seriously injured looks much more significant (2% increase to 25,023), and might be harder to explain away as statistical noise. This post isn't intended to be an excuse for complacency in the Transport Department, just another cautionary tale about reading too much into small amounts of data.</div>
<div>
<br /></div>Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com1tag:blogger.com,1999:blog-806057380317734287.post-28623934671696901012012-07-03T15:50:00.000+01:002012-07-03T16:30:17.149+01:00Sigma-tisedIs it just me, or is everyone completely unable to explain what a confidence interval is? <br />
<br />
The Higgs Boson is back in the news again, <a href="http://www.nature.com/news/physicists-find-new-particle-but-is-it-the-higgs-1.10932">here's a Nature News article</a> discussing Cern's latest discovery, which is that<br />
<blockquote class="tr_bq">
<span style="background-color: white;">The data contained “a significant excess" of collision events at a mass of around 125 gigaelectronvolts...</span></blockquote>
<blockquote class="tr_bq">
<span style="background-color: white;">Physicists have maintained that they will not announce the discovery of the Higgs until the signal surpasses 5 sigma, meaning that it has just a 0.00006% chance of being wrong. The ATLAS and CMS experiments are each seeing signals between 4.5 and 5 sigma, just a whisker away from a solid discovery claim.</span></blockquote>
This is <b>wrong</b>.<br />
<a name='more'></a>I also found incorrect explanations by <a href="http://web.mit.edu/newsoffice/2012/explained-sigma-0209.html">MIT</a>, the <a href="http://www.guardian.co.uk/science/shortcuts/2012/jul/02/higgs-boson-found-definite-maybe">Guardian</a>, and the <a href="http://www.nytimes.com/2012/07/03/science/physicists-inch-closer-to-proof-that-higgs-boson-particle-exists.html?_r=2">New York Times</a>, the latter of which (discussing a different discovery) said<br />
<blockquote class="tr_bq">
The odds that the Fermilab bump were due to chance were only one in 550.</blockquote>
<span style="background-color: white;">Now I'm not a physicist, but my understanding of what Nature is </span><i style="background-color: white;">trying</i><span style="background-color: white;"> to say is something like "</span><b style="background-color: white;">if there were no particle</b><span style="background-color: white;"> of a mass around 125 GeV, then the chance of a 5-sigma event is</span><span style="background-color: white;"> </span><span style="background-color: white;">0.00006%."</span><br />
<span style="background-color: white;"><br /></span><br />
<h3>
Confidence for beginners</h3>
<span style="background-color: white;">The difference sounds quite subtle, but we'll see with some examples why it matters. Think of the physics experiment as a very accurate diagnostic test - <b>if</b> the particle doesn't exist, it's very unlikely your experiment will give you a false positive. It seems then to follow that <b>if</b> you get a positive result, it must be very unlikely that the particle doesn't exist. This is both 'intuitively' true and fatally flawed, because it really depends upon the base rate, or how likely you thought it was that the particle existed before you did the experiment.</span><br />
<br />
<span style="background-color: white;">Let's think of another example (I gave this one in a <a href="http://itsastatlife.blogspot.co.uk/2012/06/how-should-statistics-be-taught-some.html">previous post</a>). You go to the doctor, and he gives you a diagnostic test for a disease. The test is very accurate, let's say 99%; this means that <b>if you have the disease</b>, 99% of the time the test will say so, and <b>if you don't have the disease</b>, 99% of the time the test will come up negative. The test comes up positive - do you have the disease? (or rather what is the probability that you do?)</span><br />
<br />
The point is that it will depend upon how likely you were to have the disease in the first place, that is before you took the test. Pregnancy tests are very accurate, but if I took one and it came up positive I'd be pretty darn sure it was a false positive (not that pregnancy is a disease, *cough*). If the disease is rare (say 1 in 1000 people have it), and you had no particular reason to think you might have the disease, then it's still about 10 times more likely that the test was wrong than that you have the disease. In other words, the probability you have the disease is only about 1 in 11. So no need to panic.<br />
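<br />
Here's that calculation spelled out (it's just Bayes' formula):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">prior = 1/1000 # chance of having the disease before the test</span><br />
<span style="font-family: Courier New, Courier, monospace;">sens = 0.99; spec = 0.99 # the test's accuracy in each direction</span><br />
<span style="font-family: Courier New, Courier, monospace;">sens*prior/(sens*prior + (1 - spec)*(1 - prior)) # about 0.09, roughly 1 in 11</span><br />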
<br />
Reasoning wrongly as above is known as the base rate fallacy, and the maths can be formalised with <a href="http://en.wikipedia.org/wiki/Bayes'_theorem">Bayes' formula</a>.<br />
<br />
<h3>
Other issues</h3>
Going back to the Higgs Boson example, things are a bit different. Physicists had good reason to believe that the Higgs exists before the Large Hadron Collider even started work, so we might put the prior probability of its existence at, say, 10%. (Perhaps one could do a study of how many physicists' predictions turn out to be true.) Then if we observed a 5-sigma event we could apply Bayes' rule and find the posterior probability of the particle's non-existence, which would be about <span style="background-color: white;">0.0005% (one zero fewer than before).</span><br />
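<br />
The same calculation again, using my guessed 10% prior and assuming a real particle would certainly produce the signal:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">p_no = 0.9 # guessed prior that the particle doesn't exist</span><br />
<span style="font-family: Courier New, Courier, monospace;">p_sig_no = 6e-7 # chance of a 5-sigma fluke if it doesn't</span><br />
<span style="font-family: Courier New, Courier, monospace;">p_sig_yes = 1 # assume a real particle always gives the signal</span><br />
<span style="font-family: Courier New, Courier, monospace;">p_no*p_sig_no/(p_no*p_sig_no + (1 - p_no)*p_sig_yes) # about 5.4e-6, i.e. 0.0005%</span><br />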
<br />
<span style="background-color: white;">Big deal you might say, this still seems like pretty solid evidence. There is a further problem, which is that these probabilities assume the experimenters have dealt with all possible other explanations for the evidence, critically including all sources of systematic error in their measurements. I'm sure they've set the experiment up very carefully indeed, but the probability of an error of this kind is still likely to be much higher than </span><span style="background-color: white;">0.0005% (I'd guess between 1% and 10%), so we really shouldn't say that the probability of an error is </span><span style="background-color: white;">0.00006%.</span><br />
<br />
<span style="background-color: white;">More generally, medical studies are usually performed to a significance level of 5%. This means that for a positive result, roughly speaking, <b>if the drug is useless</b> (or worse) there is less than a 5% chance of seeing results as good as (or better than) the ones we did. This isn't perfect, so it should be obvious that some positive results will later turn out to be wrong, and so they are. You might think it would be precisely 5% of results, if you hadn't read the rest of my post.</span><br />
<br />
<span style="background-color: white;">In fact, <b>most</b> such studies <a href="http://theness.com/neurologicablog/index.php/are-most-medical-studies-wrong/">are false positives</a>. This is partly because of experimenter bias, but it's also just because most drugs put forward for trial don't work. Hence the relative rate of false positives to true positives is quite high. Things are almost certainly much worse in the social sciences.</span><br />
<span style="background-color: white;"><br /></span>Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-80483364012090475572012-07-02T13:27:00.002+01:002012-07-02T13:27:50.425+01:00Data Overload?Today I guest blogged about the government's open data plans on the Understanding Uncertainty website. <a href="http://understandinguncertainty.org/data-overload">Check it out here</a>.Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-11845651565974243182012-06-24T15:35:00.000+01:002012-06-29T22:41:59.732+01:00National Traits and the Ecological Fallacy<a href="http://www.dailymail.co.uk/sciencetech/article-2163771/Why-criminals-believe-heaven-Study-finds-crime-rates-vary-according-religious-beliefs.html">This article</a> caught my eye, headlined "Why criminals believe in heaven." [The Daily Mail's 'Science' and 'Health' pages are an unlimited resource for writing blog posts about bad statistics, but I promise to try not to <i>only</i> pick on them in future.]<br />
<br />
The <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0039048">original paper is here</a> (predictably the DM don't bother to link to it), and <a href="http://uonews.uoregon.edu/archive/news-release/2012/6/belief-hell-according-international-data-associated-reduced-crime">here is the university's press release</a> which has been pretty slavishly copied to create the article. The summary is, the researchers combined crime data from one source (the United Nations Office on Drugs and Crime), with survey data on people's attitudes to religion and other things (from World Values Surveys and European Value Surveys). They found that the belief in heaven and hell is a strong predictor of crime rates at a country-wide level, whilst rates of particular religious beliefs were not.<br />
<br />
<a name='more'></a><span style="background-color: white;">My main problem with the press release, and consequently the newspaper article, is this sentence:</span><br />
<blockquote class="tr_bq">
The finding surfaced from a comprehensive analysis of 26 years of data involving 143,197 people in 67 countries.</blockquote>
The implication of this seems to be that the research involves following 143,000 people for 26 years, and seeing both whether they are religious and whether they commit crimes. In fact the number refers to how many people were asked about their religion. This is a nice large sample size, so they probably have pretty accurate information about religiosity in the relevant countries.<br />
<br />
However the crime data is taken from a completely different source! The UN numbers are based on national statistics, which are, of course, collected in completely different ways in different places, and at varying levels of accuracy. I would be very wary of comparing rates of rape or burglary between countries based on this methodology, let alone with variables from other surveys.<br />
<br />
The second point which occurs to me is that the crime data effectively covers <i>all</i> the people in these countries, so one might as well say that we have "data involving [several billion] people in 67 countries." In fact there isn't really any data on individuals which answers the question being posed, so our sample size is, at best, 67. The sentence quoted above misleads us into thinking the evidence is much stronger than it really is.<br />
<br />
Anyway, if you believe the numbers, you might legitimately conclude that rates of belief in hell are correlated with crime rates amongst the countries surveyed. This is shown in the paper's graph, reproduced here:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRTfWGZ1p1DqW_zDPIWHU-IB1WgWheJaEkhrMaBDd7_eV4RVKJDgLfiLpw8b99e01DlyZiffQpAfFTZbMf2_N4JHnsK-BFQw9QwOKPhgEZAJi1FbJKu4SbZISRwMN5n7d0TaSq1BXXqR4/s1600/journal.pone.0039048.g001.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="323" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRTfWGZ1p1DqW_zDPIWHU-IB1WgWheJaEkhrMaBDd7_eV4RVKJDgLfiLpw8b99e01DlyZiffQpAfFTZbMf2_N4JHnsK-BFQw9QwOKPhgEZAJi1FbJKu4SbZISRwMN5n7d0TaSq1BXXqR4/s400/journal.pone.0039048.g001.png" width="400" /></a></div>
<br />
[Note to all those who present data: don't try to label all 67 points on a graph like this, it gets very confusing.]<br />
<br />
What we would like, but do not have, is information at the <i>individual level</i> about whether those people who commit crimes and those who don't believe in hell are the same people. It is quite possible that if you group individuals in some way (such as by country), and only look at the group averages, you get completely different relationships between variables. This is called the <i><a href="http://en.wikipedia.org/wiki/Ecological_fallacy">ecological fallacy</a></i>. <br />
<br />
To see this, imagine what would happen if the US was treated as 50 separate states, instead of one country, and that all the states occupied pretty much the same point on the graph (the US is the yellow dot at about (11%, 0.8)); this would increase the slope of the best fit line, making the relationship seem more important. Conversely, suppose we did this to Tanzania (the blue dot on the left), dividing it into its 26 regions; then the relationship would seem a lot weaker. Combining the Latin American countries would also make the relationship weaker, because many of them have both high crime rates and high belief in heaven relative to hell.<br />
<br />
Now it might seem very arbitrary and silly to divide Tanzania up in this way, or to combine Latin America, but the point is that countries are somewhat arbitrary units in the context of the relationship between religion and crime too. Some countries are very homogeneous, some very diverse; some groups of countries are very culturally similar, and at other times in history would have been considered a single country (e.g. <a href="http://en.wikipedia.org/wiki/Union_between_Sweden_and_Norway">Norway and Sweden</a>); some countries are much more populous than others (China vs. Guatemala).<br />
<br />
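A small simulation makes the danger concrete: below, the relationship between two variables is negative within every group, but strongly positive between the group averages (all numbers invented):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">set.seed(1) # invented data, purely for illustration</span><br />
<span style="font-family: Courier New, Courier, monospace;">g = rep(1:5, each = 100) # five 'countries' of 100 people</span><br />
<span style="font-family: Courier New, Courier, monospace;">x = 2*g + rnorm(500) # 'belief', higher in some countries</span><br />
<span style="font-family: Courier New, Courier, monospace;">y = 2*g - 0.5*(x - 2*g) + rnorm(500) # 'crime' falls with belief within a country</span><br />
<span style="font-family: Courier New, Courier, monospace;">cor(tapply(x, g, mean), tapply(y, g, mean)) # country averages: near +1</span><br />
<span style="font-family: Courier New, Courier, monospace;">mean(sapply(1:5, function(i) cor(x[g == i], y[g == i]))) # within countries: about -0.45</span><br />
<br />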
Even if we could see a relationship at the level of individuals, this would be much stronger evidence than the country-wide averages, but would still only be a correlation. It could be that people stop believing in hell if they see lots of crime, or if they see people committing crimes and getting away with it. It could be that other variables such as prosperity or unmeasured cultural similarities are important. The authors try to account for these possibilities, but there's only so much you can do.<br />
<br />
The original paper carefully mentions both these problems:<br />
<blockquote class="tr_bq">
[I]t will be important to examine these real-world effects at the level of the individual. The present findings tie rates of belief at the societal level to national crime rates... It is also possible, however, that an intervening variable or variables are at work at the societal level. ... The direct causal explanation is most closely in line with the experimental findings, but it could well be that both the direct and indirect mechanisms are at work. To assess individual-level effects simultaneously with societal-level effects, it will be necessary to collect data with both national crime rates and individual tendencies toward immoral behavior.</blockquote>
The press release, however, only mentions the causation problem and not the ecological one, and the newspaper articles mention neither. The Daily Mail's article largely avoids the trap of mentioning individual level phrases (apart from the headline). But because it contains no caveat that (i) country-wide averages may not reflect individual level traits, or (ii) correlation does not imply causation, the overall impression given is that if an individual believes in hell this causes him to be less likely to commit crimes.<br />
<br />
Other headlines I've seen include "People who believe in redemption commit more crimes" (Real Bollywood), "Belief In Hell Lowers Crime Rate, According To International Study" (Huffington Post) and "Belief in Hell Keeps Crime Rates Down, According to New Study" (Christian Post), all of which screw up on one or both of these points.<br />
<br />
To be clear, I don't have a problem with doing this kind of analysis, but it only constitutes <b>very weak evidence</b> of a direct relationship between crime and belief in hell. It's probably not worthy of a newspaper article at all, but the desire of universities for self-publicity is strong, and therefore the pressure on researchers to get press attention is strong too.<br />
<br />
<b>Reference:</b><br />
Shariff, A and Rhemtulla, M. - Divergent Effects of Beliefs in Heaven and Hell on National Crime Rates, <i>PLoS ONE</i> 7(6), 2012.Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-26015671067731514482012-06-23T22:42:00.000+01:002012-07-03T16:31:55.518+01:00Review: The Geek ManifestoThis week I've been reading Mark Henderson's book <i>The Geek Manifesto: Why Science Matters</i>, which discusses the role of science and its geeky proponents in public life. Henderson's principal thesis is that politicians and scientists have been wilfully ignorant of each other for too long, and that it is time for science, its methods, and its practitioners to take a more central place in politics and policy. This includes statistics, of course, which is why it seemed relevant to the blog.<br />
<span style="background-color: white;"></span><br />
<a name='more'></a>The central examples are chosen to illustrate the challenges science and scientists face in the public sphere: Simon Singh's <a href="http://en.wikipedia.org/wiki/BCA_v._Singh" style="background-color: white;">libel troubles</a><span style="background-color: white;"> for pointing out that chiropractic medicine has no useful effect for problems like asthma and colic, in spite of the way it is marketed by some practitioners; </span><a href="http://en.wikipedia.org/wiki/David_Nutt#Dismissal" style="background-color: white;">David Nutt's dismissal</a><span style="background-color: white;"> as head of the Advisory Council on the Misuse of Drugs, essentially for doing his job and daring to say that the drugs classification system does not reflect the relative harm of the substances being classified; and the coalition government's plan to reduce the science budget by as much as a third. </span><br />
<br />
But what is most important about these incidents is not so much the serious problems they reveal about the way our country is governed, but the impressive reaction they elicited from large numbers of people in our society who <i>do</i> recognise the importance of science and proper evidence. The campaign to support Singh contributed greatly to the current push for libel reform, which is working its way through parliament as I write this. The backlash against the sacking of Nutt took the government and then Home Secretary Alan Johnson by surprise, as they had assumed it would just be a routine sacking going largely unnoticed. And the planned budget cuts spawned a movement named <a href="http://scienceisvital.org.uk/" style="background-color: white;">Science is Vital</a> which passionately argued for the benefits of government support for research, and won a significant, though <a href="http://www.guardian.co.uk/science/2010/oct/19/spending-review-science-budget-spared">imperfect, reprieve</a> (the non-capital science spend was frozen for five years).<br />
<br />
The common theme according to Henderson is the rise of a sort of Geek Power movement, which he believes has the potential to transform the way that science is understood by the general public, and can help make truly evidence-based policy integral to all governments and parties, current and future. The book covers areas including health, the environment, justice and education. <span style="background-color: white;">This is an important work, and it would be wonderful to believe that some of the ideas Henderson recommends might bear fruit. There is some evidence that progress is being made: a copy is being sent to every MP. A good start would be for the government to act on </span><a href="http://www.cabinetoffice.gov.uk/resource-library/test-learn-adapt-developing-public-policy-randomised-controlled-trials" style="background-color: white;">this report</a><span style="background-color: white;"> on randomised trials.</span><br />
<br />
I'm very much of the view that as voters we get the politicians we deserve, and Henderson clearly agrees, urging geeks to make our views known to our elected representatives. It is too easy for an MP to oppose a mobile phone mast near a school or support a homeopathic hospital if only one side ever makes a fuss. If we expect our leaders to use evidence to make policy, then we have to make it costly for them not to do so. They may think that it shows weakness to admit that the evidence shows a policy has failed, but we need to convince them that this is a sign of strength. Something similar goes for newspapers, of course.
<br />
<br />
Henderson is very careful to make the distinction between scientific advice to government, which lays out the evidence and facts as best understood, and the legitimate political choice which then follows. What needs to be avoided at all costs is politicians being able to claim to be acting on (possibly non-existent) evidence, when in fact the motive is purely political. The upgrading of Cannabis from Class C to Class B is a good example of this.<br />
<br />
My only real criticism of the book is that science itself seems to be portrayed in a rather idealised way, a noble pursuit which sits above ordinary human endeavour. <span style="background-color: white;">This is best illustrated by an anecdote of Richard Dawkins' which is retold:</span><br />
<blockquote class="tr_bq">
'There was an elderly professor<span style="background-color: white;"> in my department who had been passionately keen on a particular theory for ... a number of years, and one day an American visiting researcher came and he completely and utterly disproved our old man's hypothesis. The old man strode to the front, shook his hand and said, "My dear fellow, I wish to thank you, I have been wrong these fifteen years."'</span></blockquote>
This describes perfectly how scientific research should be, but scientists are, in fact, human, and this story is not representative of how people usually behave when their pet theories are demolished (in my modest experience).<br />
<br />
I don't suppose that Henderson, who is not a scientist himself but a former science editor at <i>The Times</i>, is actually under any such illusion about researchers; rather, the style seems to be a product of his (very successful) efforts to persuade the reader. My own feeling is that this attempt to show science in the best possible light involves perpetuating a slightly unrealistic view of how it is really carried out. It's a bit like giving Robin Cook's resignation speech as an example of how politics is done.<br />
<br />
The book is well written, as you might expect, and I imagine it would be enjoyed by anyone who is interested in science, and possibly by those who have thought science rather dull until now (a high school chemistry class, say). The content is not just a series of examples aimed at those who already agree that science is great, but an inspiring call to arms for researchers and their allies to stand up and defend what's important to them. A worthy aim, and I have to say it's worked on me. See you on the barricades.<br />
<br />
<i>The Geek Manifesto: Why Science Matters</i> by Mark Henderson is available now in hardback and as an e-book, published by Bantam Press. Some free extracts can be found <a href="http://geekmanifesto.wordpress.com/2012/06/06/read-extracts-from-the-geek-manifesto-for-free/">here</a>.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-35359680850694852672012-06-17T19:39:00.000+01:002012-07-03T16:31:23.329+01:00How should statistics be taught? Some thoughts.Inspired by Timothy Gowers' recent post on <a href="http://gowers.wordpress.com/2012/06/08/how-should-mathematics-be-taught-to-non-mathematicians/">how mathematics should be taught to non-mathematicians</a>, I thought it might be prudent to ask how statistics should be taught. If you need to be motivated as to how important teaching statistics is, watch <a href="http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html">Arthur Benjamin's short TED talk</a>.<br />
<span style="background-color: white;"></span><br />
<a name='more'></a>My own experience of statistics up to GCSE was generally one of boredom: in the main it seemed to involve drawing histograms and line graphs, with a peculiar level of attention applied to getting the axes perfectly right, and choosing a suitable title. This is not to say that labelling axes is unimportant, but it's not exciting. More useful, and also more fun, is to explain why bad graphics are such a problem. Here is a pretty <a href="http://www.psdgraphics.com/psd-icons/3d-pie-chart/" style="background-color: white;">bog-standardly horrific example</a><span style="background-color: white;"> of a 3D pie chart. It wouldn't be hard to explain to school children why not to use this sort of presentation, and perhaps it would eventually filter up into the upper echelons of corporate management (I live in hope).</span><br />
<br />
School probability, perhaps understandably, involved simply counting events (there are 3 blue socks and 4 red socks in a drawer, I pick 2 at random, what's the probability they're both blue?), with the examples tending to be rather implausible and dry. This sort of thing is again important, but the examples can easily be jazzed up a bit; much like socks.<br />
<br />
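Even the sock problems can be livened up by checking the answer with brute force; here's a tiny Python sketch of my own (not from any syllabus) that simply enumerates every possible pair:<br />
<pre>
from itertools import combinations

socks = ['blue'] * 3 + ['red'] * 4

# All 21 ways of picking 2 socks from the 7 in the drawer.
pairs = list(combinations(socks, 2))
both_blue = [p for p in pairs if p == ('blue', 'blue')]

print(len(both_blue), '/', len(pairs))  # 3 / 21, i.e. 1/7
</pre>
<br />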
At A-level things perked up a bit: hypothesis tests actually made me feel that I could do useful things with statistics - I even did a bit of coursework on UK election results. Box and whisker plots and finding the median of a bunch of numbers were somewhat less thrilling.<br />
<br />
The view I've come to form recently is that applied statistics should, at all levels, teach two essential things, let's call them the <i>cornerstones</i>.<br />
<br />
<ol>
<li><b>'Basic' probabilistic and statistical intuition.</b> People just aren't innately very good at dealing with probability and risk, even those of us who are trained to do so. Many statistics courses do cover these issues, but the intuition has to be drilled into people again and again. Here I'm thinking of the base rate fallacy, the prosecutor's fallacy, regression to the mean, etc.; don't worry if you don't know what these terms mean yet, but see some of the examples below.</li>
<li><b>Construction of statistical methods.</b> The thing I don't like about most applied statistics courses (including ones I have led myself) is that they involve teaching a long list of hypothesis tests or other methods which are useful in certain scenarios, and then providing some contrived examples to which we can apply them. But what if the situation you actually find yourself in doesn't lend itself to a t-test, an <a href="http://en.wikipedia.org/wiki/ANOVA">ANOVA</a> test, or a generalised linear model?</li>
</ol>
<div>
For example, to my mind it isn't so important that a student remembers what <a href="http://en.wikipedia.org/wiki/Cook's_distance">Cook's distance</a> or leverage is, as long as they understand what statistical inference is, and therefore why it might not be a good idea for your estimate to be very unstable when a few data points are removed. After all, that lesson applies in almost any statistical model, whilst Cook's distances do not.</div>
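<br />
To make that concrete, here's a toy sketch in Python (the data are invented, purely for illustration): refit a least-squares line with each point left out in turn, and watch how far the slope moves.<br />
<pre>
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(10.0)
y = 2 * x + rng.normal(0, 1, size=10)
y[9] += 15  # one contaminated observation

def slope(xs, ys):
    # Least-squares slope of ys on xs.
    return np.polyfit(xs, ys, 1)[0]

full = slope(x, y)
for i in range(10):
    keep = np.arange(10) != i
    print(i, round(slope(x[keep], y[keep]) - full, 2))
# Deleting the contaminated point shifts the slope far more than
# deleting any other -- that is what 'influential' means.
</pre>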
<div>
<br /></div>
<h3>
A Case Study Approach</h3>
<div>
I think that an approach based on the study of real (or at least believable) examples could help to ensure that the cornerstones are the main message of the course, rather than the less important technical details. Of course, the suitable level of rigour in a statistics course need not be constrained in any way by the use of these sorts of examples, and will depend strongly upon the intended audience. Horses for courses, as they say.</div>
<div>
<br /></div>
<div>
Below are some interesting questions that could be answered with probability and statistics, and a short explanation of how they relate to the cornerstones. Please do let me know what you think!</div>
<div>
<br /></div>
<div>
<i>1. Are men taller on average than women?</i></div>
<div>
<br /></div>
<div>
Leaving aside questions of gender identity, the point here is that we can't measure every man and woman to get a definitive answer. We might therefore think about sampling some men and some women, in the hope that the average of the sample is related to the average of the population. If, for example, men are taller than women in our sample, we would also need to consider whether this could have happened by chance. All this motivates the law of large numbers and the central limit theorem; these are weighty theorems, and it can be hard to communicate both their content and importance. For less advanced students, the example can be used to give an intuitive idea of statistical significance. The key point is <b>not</b> to just teach students how to use a two-sample t-test.</div>
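<br />
As a sketch of the sort of thing I have in mind (with invented heights): rather than reaching straight for the t-test formula, shuffle the labels and see how often chance alone produces a gap as big as the one observed.<br />
<pre>
import numpy as np

rng = np.random.default_rng(0)
men = rng.normal(176, 7, size=50)     # invented samples, in cm
women = rng.normal(163, 7, size=50)

observed = men.mean() - women.mean()

# If sex made no difference, the labels would be arbitrary: shuffle
# them repeatedly and see how often a gap this big arises by chance.
pooled = np.concatenate([men, women])
extreme = 0
for _ in range(10000):
    rng.shuffle(pooled)
    diff = pooled[:50].mean() - pooled[50:].mean()
    if abs(diff) >= abs(observed):
        extreme += 1

print(round(observed, 1), extreme / 10000)  # big gap, tiny p-value
</pre>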
<div>
<br /></div>
<div>
<i>2. How would you go about determining the number of people in Cambridge who like to watch basketball? How about in China?</i></div>
<br />
This problem also uses the idea that sampling is somehow relevant to the population. A bigger question is how on Earth one could obtain a random sample of people from China, and what a polling company might do to try to compensate for this difficulty. The binomial distribution is obviously relevant here, but again we don't just want to do 'inference by formula'.<br />
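<br />
The core arithmetic is short enough to show directly; a rough sketch with an invented sample, the point being that the sample size controls the uncertainty, not whether the population is Cambridge-sized or China-sized:<br />
<pre>
import math

p_hat, n = 0.23, 1000   # invented: 230 of 1,000 people sampled say yes

# Standard error of a binomial proportion, and a rough 95% interval.
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat - 1.96 * se, 3), round(p_hat + 1.96 * se, 3))
# About 0.204 to 0.256 -- the hard part is obtaining a sample to
# which this formula can legitimately be applied.
</pre>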
<div>
<br /></div>
<div>
<i>3. A doctor gives you a test for a rare disease: approximately 1 in 1000 people have the disease. The test is very accurate; it is correct 99% of the time. If the test is positive, what is the probability you have the disease?</i></div>
<div>
<br /></div>
<div>
This is the simplest sort of base rate fallacy; the test is so accurate that it seems almost certain you have the disease. In reality, assuming that you have no particular reason to think you had the disease before the test (e.g. symptoms), it is still much more likely that the test is a false positive (probability about 91%) than that you actually have the disease (about 9%).</div>
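<br />
The calculation itself is a few lines of Bayes' theorem, and is worth doing explicitly, since the answer surprises almost everyone:<br />
<pre>
prior = 1 / 1000        # 1 in 1000 people have the disease
accuracy = 0.99         # the test is correct 99% of the time

true_pos = prior * accuracy                # ill and correctly flagged
false_pos = (1 - prior) * (1 - accuracy)   # healthy but flagged anyway

print(true_pos / (true_pos + false_pos))   # about 0.09
</pre>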
<div>
<br /></div>
<div>
<i>4. A council collates the number of road accidents on 500 similar stretches of road. It finds that 40 of these have had at least 7 accidents in the last 5 years, whilst the average is just 4. It decides to put a speed camera at these 40 locations. Five years later, the average number of accidents at the camera locations is just 4. Did the speed cameras do their job?</i></div>
<div>
<br /></div>
<div>
This is a dressed-up 'regression to the mean' example. Imagine that really there's no difference between the roads; some will have more accidents in the first 5 year period by chance, and then because they're not actually more dangerous, the number of accidents will probably be lower in the second period. So we can't tell whether the speed cameras work or not from this sort of data. The important thing is to consider what sort of experiment the council could use to answer the question. [I don't have anything against speed cameras, by the way, but it's a realistic sort of example.] </div>
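<br />
This is also a lovely one to simulate; I think a simulation makes the point better than any formula could. A minimal sketch, in which the roads are identical by construction:<br />
<pre>
import numpy as np

rng = np.random.default_rng(42)

# Two independent 5-year periods on 500 identical roads, each
# averaging 4 accidents per period.
before = rng.poisson(4, size=500)
after = rng.poisson(4, size=500)

worst = before >= 7   # where the council puts its cameras
print(worst.sum(), before[worst].mean(), after[worst].mean())
# The selected sites average well above 4 'before' and about 4
# 'after' -- with no cameras and no real blackspots at all.
</pre>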
<br />
<i>5. A PSA test is used to diagnose prostate cancer in men. The measured PSA level is generally higher if the volume of a tumour is higher. How should the test be used?</i><br />
<br />
This is much more complicated, and takes into account the relationship between PSA and tumour size (linear regression?), what PSA looks like in healthy patients, the relative problems of false positives and false negatives, etc.<br />
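<br />
One small corner of the problem can still be sketched simply: whatever else is done, someone has to pick a threshold, and every choice trades missed cancers against false alarms. The distributions below are entirely invented, purely to show the shape of that trade-off:<br />
<pre>
import numpy as np

rng = np.random.default_rng(7)

# Entirely invented PSA distributions, for illustration only.
healthy = rng.lognormal(0.5, 0.6, size=100000)
cancer = rng.lognormal(1.5, 0.6, size=100000)

for threshold in (2.0, 4.0, 6.0):
    sens = (cancer >= threshold).mean()        # cancers caught
    spec = 1 - (healthy >= threshold).mean()   # healthy left alone
    print(threshold, round(sens, 2), round(spec, 2))
# Raising the threshold misses more cancers but raises fewer false
# alarms; deciding where to sit is a medical question, not a formula.
</pre>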
<br />
The last two examples are taken from Professor Sir Timothy's post:<br />
<div>
<br /></div>
<i>6. In September 2009 <a href="http://www.telegraph.co.uk/news/newstopics/howaboutthat/6202593/Bulgarian-lottery-picks-same-numbers-in-straight-draws.html">the same six numbers were chosen in two consecutive draws of the Bulgarian State Lottery</a>. Was this conclusive evidence that the draws were manipulated?</i><br />
<br />
We see silly newspaper stories about this kind of 'coincidence' all the time, and it's not always obvious how big a coincidence it is. These problems are quite challenging (<a href="http://www.bbc.co.uk/news/magazine-16118149">see also this</a>).
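<br />
<br />
A back-of-the-envelope calculation shows why. I haven't checked the Bulgarian lottery's actual format, so take six numbers drawn from 49 as an illustrative assumption:<br />
<pre>
from math import comb

p = 1 / comb(49, 6)   # chance that a draw repeats the previous one
print(comb(49, 6))    # 13,983,816 -- very long odds for ONE lottery

# But suppose 500 lotteries worldwide draw twice a week for 30 years:
draws = 2 * 52 * 30
print(500 * (draws - 1) * p)      # expected consecutive repeats: ~0.1

# And a match between ANY two draws of one lottery would also make
# the papers -- a 'birthday problem' with many more chances:
print(500 * comb(draws, 2) * p)   # roughly 170 expected matches
</pre>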
<br />
<div>
<br /></div>
<div>
<i>7. In 1999 a solicitor named Sally Clark was convicted for the murder of her two sons... Roy Meadow, a paediatrician, argued for the prosecution as follows. The probability of a cot death is approximately 1 in 8500. So the probability of two cot deaths is roughly the square of this, or 1 in about 73 million. Therefore it was overwhelmingly likely that the deaths were not due to natural causes. Is this argument valid?</i></div>
<br />
A sad and infamous example of the prosecutor's fallacy, the Texas sharpshooter fallacy, and various other statistical crimes, with real and terrible consequences. Much has been said about this already, so I won't add more here.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com0tag:blogger.com,1999:blog-806057380317734287.post-39004872417151821402012-06-02T12:39:00.000+01:002012-06-29T22:43:21.547+01:00Newsflash: everyone being healthier reduces deathsPeople like me often complain about the accuracy of scientific journalism, and the selective nature of the reports we see in the media; indeed, there's plenty to complain about. However, it's perfectly possible to misrepresent a story simply by choosing to place a particular spin upon it, usually one which makes the story seem more political than the original research. It's easy to blame the newspapers for this, but often the problem is compounded by academic press releases, which are themselves designed to catch the eye of media outlets.<br />
<a name='more'></a><br />
<span style="background-color: white;">A good example of this comes from </span><a href="http://www.bbc.co.uk/news/health-18262608" style="background-color: white;">a story this week</a><span style="background-color: white;"> about alcohol intake, which reports that, essentially, if everyone drank only one half of one unit of alcohol per day (i.e. 5 ml of pure alcohol per day), this would save thousands of lives per year. Many news outlets picked this up, and it even found its way onto Radio 4's </span><i style="background-color: white;">The News Quiz</i><span style="background-color: white;">, where the idea that people should only drink about 125 ml of beer per day was thoroughly ridiculed. The Daily Mail's headline, for example, began "</span><a href="http://www.dailymail.co.uk/health/article-2152424/Dont-drink-quarter-pint-DAY-Oxford-study-claims-slashing-official-alcohol-limit-save-4-500-lives-year.html" style="background-color: white;">Don't drink more than a quarter of a pint a day</a><span style="background-color: white;">", followed by "Oxford study claims slashing the official alcohol limit would save 4,500 lives a year."</span><br />
<br />
You can get the original paper <a href="http://press.psprings.co.uk/Open/may/bmjopen957.pdf">here</a> (open access, happily), and I recommend a look to see the contrast with what you might have been exposed to in the news. If you're not a scientist, or imagine that scientific papers must be completely incomprehensible, then I think you'll be pleasantly surprised; it's perfectly readable, largely free from unnecessary jargon and technical terms, and, unlike many reports you might encounter in the worlds of business and politics, admirably concise (7 pages, including tables and figures).<br />
<br />
Notice the difference between the newspaper headlines and the title of the paper: "What is the optimal level of population alcohol consumption for chronic disease prevention in England?" Aside from being rather less exciting sounding, note that this makes no mention of strategies for <i>individuals</i>, nor about telling people what to do.<br />
<br />
The results summary tells us that "reducing the median consumption of alcohol to 5 ml per day would avert or delay 4,579 (2,544 to 6,590) deaths per year" [note to the authors, <a href="http://en.wikipedia.org/wiki/Significant_figure">rounding</a> is your friend]. There are three things about this sentence which strike me. First, unlike the news stories, there is a 95% credible interval associated with the estimate of the number of "lives saved"; it could be as low as 2,500, or as high as 6,600, which is a pretty wide margin. This doesn't need further statistical explanation here, suffice to say that there is a good deal of uncertainty about how much effect alcohol has upon different aspects of people's health. Second, we're talking about the median alcohol intake, not everyone's intake, which is not the same; under this scenario half the population drink 5 ml per day or more, and half drink 5 ml or less. Lastly, it mentions deaths "averted or delayed".<br />
<br />
Now you might wonder what averting our own death might mean, as opposed to delaying it, which seems more realistic. Sadly the paper doesn't explain precisely, but here's my best guess: the paper is based upon a series of simulations under different scenarios of how much people drink, using estimates of the effects of alcohol on various diseases. Then they count the number of deaths each year under the various scenarios; 1,000 deaths averted or delayed simply means that the number of deaths counted in the simulation was 1,000 fewer than under the status quo. Now, from an individual point of view, it could just be that you die the next year instead, or you could live another 20. The study doesn't consider this, because their interest is from a public health perspective.<br />
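<br />
A toy version of that guess, with completely invented rates, just to show what is (and isn't) being counted:<br />
<pre>
import numpy as np

rng = np.random.default_rng(3)
population = 1_000_000

# Invented annual death rates under the status quo and under a
# lower-drinking scenario -- illustrative numbers only.
deaths_now = rng.binomial(population, 0.0090)
deaths_scenario = rng.binomial(population, 0.0085)

print(deaths_now - deaths_scenario)
# 'Deaths averted or delayed' is just this difference in one year's
# count; it says nothing about how much longer any particular
# person gets to live.
</pre>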
<br />
<h3>
Risk and reward</h3>
This is perfectly reasonable from the government's point of view, and in a purely economic sense: if someone lives for a year longer before they get liver cirrhosis or some other unpleasant chronic condition, then that's one more year before the public purse has to shell out for their treatment. Standard economic theory applies a discount to a cost in the future as compared to something you have to pay for now. In particular, it only requires a pretty small proportion of people to live a bit longer for this to turn into a significant amount of cash saved. And as a bonus, your citizens live happier, longer lives, etc.<br />
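<br />
The discounting arithmetic is simple enough to show; a quick sketch with an illustrative 3.5% annual rate (my number for the example, not the paper's):<br />
<pre>
cost = 10000   # invented cost of a year of treatment, in pounds
rate = 0.035   # illustrative annual discount rate

# Present value of the same cost incurred t years from now.
for t in (0, 1, 5, 10):
    print(t, round(cost / (1 + rate) ** t))
# Push the cost ten years into the future and it is 'worth' about
# 30% less today -- delayed disease is cheaper disease.
</pre>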
<br />
From an <i>individual's</i> perspective, however, the chance of benefiting from this reduced risk of death may be extremely small, and when presented with the facts it would be perfectly rational to decide that you'd rather enjoy a few drinks (or a <a href="http://www.youtube.com/watch?v=a1PtQ67urG4">bacon sandwich</a>) than have a slightly longer life expectancy.<br />
<br />
In any case, the articles in the news imply that scientists are telling us all to drink less, rather than just reporting that drinking less might be good for us. In particular, the Daily Mail's article is pretty explicit in saying that "scientists say" that we should do such and such, and even claims that "changing the guidance" would save the 4,500 lives, rather than people actually drinking less. The paper itself says explicitly otherwise [my emphasis].<br />
<blockquote class="tr_bq">
The recommendations and public messages...that would be required to achieve this...level of consumption are <b>beyond the scope of this work</b>... Public health behavioural recommendations should ideally be based on the best available evidence for optimising population health outcomes.</blockquote>
<h3>
Who should we shout at?</h3>
Now to my favourite part: assigning blame. In this case, the BMJ's own <a href="http://group.bmj.com/group/media/latest-news/cutting-daily-alcohol-intake-to-half-a-unit-5-g-could-save-4600-lives-a-year">press release</a> is at least partly at fault, and reads pretty similarly to the BBC article. The credible interval has been removed, even though it is a very important measure of uncertainty inherent in the study, and, in this humble statistician's view, is perfectly capable of being understood by the non-scientific general public. That the level of uncertainty in scientific studies tends not to be reported contributes to the general impression that science always provides precise answers to questions; this causes problems when people read about studies with conflicting results, and leads to the contradictory impression that scientists don't know what they're talking about, and that one may as well ignore them. Naturally, neither of these things is true.<br />
<br />
As an example of how this translates to media-world, the Mail says [my emphasis again]<br />
<blockquote class="tr_bq">
The new advice <b>flies in the face of previous studies</b>, which have shown that drinking alcohol in moderation reduces the risk of dying from heart disease. </blockquote>
This is completely untrue. The paper is itself a meta-analysis of other results combined with a simulation, and does not involve a new study in the sense of following drinkers and seeing what they die from. The model used takes account of the fact that alcohol appears to be protective against heart disease, but finds this to be outweighed by other risks.<br />
<br />
I'm uncertain as to who writes these sorts of press releases. There's plenty of pressure on scientists to get their research noticed by the general public, and the lead scientist of this study is quoted in some detail by the Mail. Getting this kind of attention depends upon newspapers and other media outlets picking up these stories, which in turn depends upon those outlets persuading their customers that the story is worth reading about. Implying that it's telling them all to stop drinking is a good way to grab someone's attention.<br />
<br />
Some people take the view that it's better for scientists to hide behind a wall of solid fact, especially on issues such as climate change and evolution, because the idea that some scientists disagree on any aspect (however trivial in comparison with the might of the whole idea) might suggest that it's somehow equally valid to assume climate change is not down to carbon dioxide, or to believe in intelligent design. I think that this is likely to be highly counterproductive, and that it would be far better for scientists and the media to give a more realistic idea of what science, and being a scientist, is really like. The BMJ improving their press releases would be a good place to start.<br />
<br />Robinhttp://www.blogger.com/profile/05296526563838815366noreply@blogger.com2