Wednesday, July 18, 2012

Reading too much into one number

The BBC points out that deaths from road accidents in the UK increased last year for the first time in 10 years.  In total 1,901 people were killed during 2011, which is 51 more than in 2010.  This is certainly not good.

But is it surprising, in a statistical sense?
Well if the roads were just as safe (or just as dangerous) in 2011 as in 2010, we still wouldn't expect the number of deaths to be exactly the same in the two years: other factors such as the weather will have an effect.  The Transport Select Committee's report concludes by stating:
In the response to this report, we recommend that the Government outlines why it thinks road deaths increased in 2011.
A possible (though unlikely) response is: "we were just unlucky this year".

Poisson Distributions

A crude way to capture the variability is with a Poisson distribution.  The essence of a Poisson is that it counts how many events of some kind take place (like road accidents) whilst assuming that each event is independent of all the others.  In other words, the fact that one accident has occurred (or hasn't occurred) doesn't make the roads any more or less dangerous for everyone else during the rest of the year.  You don't think "oh well, there was an accident in my town today, so that's our quota for the month - I'll be safe now" - this would obviously be ridiculous.

The only other piece of information we need is the intensity of accidents - how many deaths we expect per year on average, say, averaged over many years of equally dangerous roads.  Again, crudely, we can estimate this as the number of deaths in 2010, which was 1,850.  So if an average year produces 1,850 deaths, how often would we see 1,901 or more?  The answer turns out to be about 12%, or roughly one year in every eight.  In other words, this is not very surprising.
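That 12% figure is easy to check directly.  Here's a minimal sketch in Python (the function name and the brute-force summation are my own; any Poisson CDF routine would give the same answer):

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), as one minus the CDF.
    Uses the log-pmf recurrence log p(i+1) = log p(i) + log(lam) - log(i+1),
    starting from log p(0) = -lam, to avoid overflowing the factorials."""
    log_p = -lam  # log of pmf at 0
    cdf = 0.0
    for i in range(k):  # sum the pmf over 0 .. k-1
        cdf += math.exp(log_p)
        log_p += math.log(lam) - math.log(i + 1)
    return 1.0 - cdf

# If an "average" year gives 1,850 deaths, how often do we see 1,901 or more?
print(f"{poisson_upper_tail(1901, 1850):.2f}")  # prints 0.12
```

The tiny terms far below the mean underflow to zero in the sum, but they contribute essentially nothing anyway, so the answer is unaffected.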

What about the assumptions we made?  In reality accident deaths are not independent, mainly because several deaths can occur all at once.  Last year 7 people were killed in a particularly awful accident on the M5.  This sort of clustering of events causes overdispersion, and means that the variability in the number of deaths will actually be greater than my simple model implied.  Hence, 12% is likely to be an underestimate.
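A toy simulation makes the overdispersion point concrete.  The numbers below are invented purely for illustration (tuned so the mean comes out near 1,850 deaths a year): each simulated year draws a Poisson number of fatal crashes, and each crash can kill more than one person, so deaths cluster within crashes.

```python
import random
import statistics

random.seed(42)

def sample_poisson(lam):
    # Count Exp(1) arrivals before time lam; valid for any lam,
    # if a little slow for large rates.
    t, k = 0.0, 0
    while True:
        t += random.expovariate(1.0)
        if t > lam:
            return k
        k += 1

def year_of_deaths(crash_rate, p_extra):
    """Total deaths in one simulated year: a Poisson number of fatal
    crashes, each killing at least one person, with probability p_extra
    of each additional victim in the same crash."""
    deaths = 0
    for _ in range(sample_poisson(crash_rate)):
        deaths += 1
        while random.random() < p_extra:  # extra victims cluster in one crash
            deaths += 1
    return deaths

# Illustrative parameters: ~1,762 crashes/year with a 5% chance of each
# additional victim gives roughly 1,850 deaths a year on average.
years = [year_of_deaths(1762, 0.05) for _ in range(2000)]
mean, var = statistics.mean(years), statistics.variance(years)
print(mean, var, var / mean)  # the variance/mean ratio comes out above 1
```

For a plain Poisson the variance equals the mean, so a ratio above one is exactly the overdispersion described above: the spread of plausible annual totals is wider than the simple model admits, and the 12% gets pushed up accordingly.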

Could it work the other way around?  Perhaps after seeing the M5 crash, everyone drives a bit more carefully, and there are fewer accidents (this would lead to underdispersion).  This is possible, but the effect is unlikely to be as strong as the effect of several people dying all at once.  To be certain, we could test this idea with appropriate data.

Remarkable success

The reduction in the number of road deaths (particularly for vehicle occupants) in the last seven years has been remarkable (see graph); I can think of few other areas in which such a dramatic success can be demonstrated.  The huge improvement in the safety standards of vehicles is probably the main reason, but better roads and higher petrol prices (so people drive more slowly) are also likely to have helped.

It's clear that the downward trend seen between 2006 and 2010 could not continue forever, so we shouldn't get too worked up about a small one year increase.  If it rises again in the next couple of years, I'd be more concerned.  I also can't understand why the committee find it "shocking" that road accidents are the biggest killer among young people.  Something has to be the leading cause of death in this group; would they rather it was homicide?  Or suicide?  Or drug abuse?

Having said all this, the increase in the number of people killed or seriously injured looks much more significant (2% increase to 25,023), and might be harder to explain away as statistical noise.  This post isn't intended to be an excuse for complacency in the Transport Department, just another cautionary tale about reading too much into small amounts of data.
