First obvious question - is this statistically significant? Well, let's have a look at the research quoted by the Guardian, which comes from Cancer Research UK. Their Figure 6.10 immediately arouses suspicion. There does appear to be an uptick in the figures from 2010 to 2011, but they remain below the 2009 figure! The overall trend in the numbers over the past 10 years is clearly downwards. So either we believe that the number of children taking up smoking fell and then rose dramatically in consecutive years, or else we might just be witnessing a bit of noise in our data.

#### Surveys

So first, what exactly is the number actually measuring? Well, it's the number of students who become regular or occasional smokers from one year to the next. This is measured by comparing, for example, the number of regular or occasional smokers aged 12 in 2010 with the number aged 13 in 2011.
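To make the cohort comparison concrete, here is a minimal sketch. The counts are made up purely for illustration, and summing the year-on-year rises across cohorts is my assumption about how a single headline figure would be put together, not something the report spells out.

```python
def new_smokers(smokers_prev_year, smokers_next_year):
    """Sum, over cohorts, the rise in smoker counts as each cohort ages one year.

    Both arguments map age -> estimated number of smokers at that age.
    """
    total = 0
    for age, count_prev in smokers_prev_year.items():
        if age + 1 in smokers_next_year:
            # Same cohort, one year older: e.g. aged 12 in 2010, aged 13 in 2011.
            total += smokers_next_year[age + 1] - count_prev
    return total

# Hypothetical counts, NOT taken from the survey or from CRUK.
smokers_2010 = {11: 1_000, 12: 3_000, 13: 8_000, 14: 15_000}
smokers_2011 = {12: 3_500, 13: 9_000, 14: 17_000, 15: 30_000}

uptake = new_smokers(smokers_2010, smokers_2011)
print(uptake)
```

Note that every input here is itself an estimate, so the result is a sum of differences between noisy numbers.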

But the researchers cannot, in practice, go around asking every 11-15 year old whether or not they smoke. Instead they commission a survey, or find the results of an existing survey, in this case the snappily titled *Smoking, drinking and drug use among young people in England in 2011*, carried out by a branch of the NHS. You can see it all here.

The survey gives an estimate of the *proportion* of children smoking in England, which the statisticians at Cancer Research UK multiplied by the ONS's mid-year population estimates *for the UK* to arrive at a total number. This is, in my view, a mistake, because we end up combining the statistical error in the original survey with the error in the ONS's population estimates (though this should be fairly small). Also: why multiply population estimates for the UK by proportions for just England? Simple: the UK has more people, so the numbers sound bigger that way.
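As a rough sketch of how the two sources of error combine: if the survey proportion and the population estimate are independent, the relative standard error of their product is approximately the square root of the sum of the squared relative errors (the usual first-order approximation). All the numbers below are hypothetical, chosen only to show the arithmetic.

```python
import math

def total_with_error(p, se_p, population, se_population):
    """Estimated total p * population, with an approximate standard error.

    Assumes the two estimates are independent and uses the first-order
    approximation: relative errors add in quadrature.
    """
    total = p * population
    relative_se = math.sqrt((se_p / p) ** 2 + (se_population / population) ** 2)
    return total, total * relative_se

# Hypothetical inputs: 5% +/- 0.5 points, 3 million +/- 30,000 children.
total, se_total = total_with_error(0.05, 0.005, 3_000_000, 30_000)
print(f"estimated total: {total:,.0f} +/- {se_total:,.0f}")
```

With these made-up inputs the survey error dominates and the population error adds very little, which matches the remark that the ONS contribution should be fairly small.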

The original research says that the proportion of regular smokers amongst 11-15 year-olds in the past 5 years is 6%, 6%, 6%, 5% and 5%. Their Figure 3.2 backs this up. All in all, not very dramatic. No mention is made in the original report of any increase in smoking rates; in fact rates amongst school children have fallen dramatically in the past 20 years, with the proportion of regular smokers having been around 10% for boys and 15% for girls in the mid 1990s.

The problem is that it's in Cancer Research UK's interest to make this all sound as bad as possible so that ever more action is warranted by the authorities. In fact they seem to have gone to quite a lot of trouble to construct a figure which will get them a headline.

Their motives may be laudable, but they have an agenda just like everyone else, and it does not necessarily involve being totally honest. So keep your skeptical hats on at all times!

#### Appendix on Batch Effects

Here we cover some detail about the surveying method.

The survey covered 6,519 school pupils, but clustered into 219 schools in England. Ideally the authors would have written down a list of every 11-15 year-old in the country, and then picked several thousand of them from the list at random, and then given the lucky selection the survey. But such a list isn't readily available in the UK, and it's very difficult (and probably illegal) to create one.

Much easier is to make a list of every high school in the country, and pick a few hundred at random. Then you can go to each school, and either survey every student, or use the school's list of pupils to sample some of them at random. This is much simpler, and if done correctly should (very approximately) mean that every student in the country is still equally likely to be selected for the survey.

This means that the proportion of students in the survey who say they smoke should be an *unbiased* estimate of the proportion of *all* students who would answer the same way. Of course, if I ran the survey again and asked a different 6,519 students, I would get slightly different answers by chance; but because the design is unbiased, I should get the 'correct' answer *on average*.

There's a disadvantage to this method though, which is that 6,519 students from 219 schools doesn't give you as much information as 6,519 students picked randomly from a list of all students. This is because students in the same school are more likely to be similar to one another: they have similar backgrounds and come from the same area, so they're all from the inner city, or all from rural Cumbria, for example. This within-school correlation (a *batch effect*) can be estimated from the data and corrected for, which the authors of the original research do. More info in this article, on a similar example.
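To get a feel for how much information clustering costs, here is a sketch using a common textbook approximation for the design effect, 1 + (m - 1) × ρ, where m is the average number of pupils per school and ρ is the within-school correlation. The value of ρ below is hypothetical, and this is not necessarily the correction the survey's authors applied.

```python
def design_effect(avg_cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (avg_cluster_size - 1) * icc

n_pupils = 6519   # from the survey
n_schools = 219   # from the survey
avg_cluster_size = n_pupils / n_schools  # about 30 pupils per school

icc = 0.05  # hypothetical within-school correlation, for illustration only

deff = design_effect(avg_cluster_size, icc)
n_effective = n_pupils / deff  # size of a simple random sample with the same precision

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_effective:.0f}")
```

Even a modest within-school correlation shrinks the effective sample size considerably, because each extra pupil from the same school tells you less than a pupil picked from somewhere else.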

No attempt was made by the Guardian's article (or by CRUK as far as I can tell) to assess whether the difference they observe might be due to sampling variation.
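For what it's worth, a back-of-the-envelope check is not hard to do. The sketch below computes a z-statistic for the difference between two survey proportions, optionally inflating the variance by a design effect. The inputs are illustrative: 6% and 5% are the rounded regular-smoker figures quoted earlier rather than the CRUK uptake figure, reusing the 2011 sample size for the other year is an assumption, and the design effect of 2.4 is hypothetical.

```python
import math

def z_for_difference(p1, n1, p2, n2, deff=1.0):
    """z-statistic for the difference between two independent survey proportions.

    deff is an assumed design effect that inflates each variance to allow
    for clustering; deff=1.0 treats both samples as simple random samples.
    """
    var1 = deff * p1 * (1 - p1) / n1
    var2 = deff * p2 * (1 - p2) / n2
    return (p1 - p2) / math.sqrt(var1 + var2)

n = 6519  # 2011 sample size; reusing it for the other year is an assumption

z_srs = z_for_difference(0.06, n, 0.05, n)
z_clustered = z_for_difference(0.06, n, 0.05, n, deff=2.4)  # hypothetical deff

print(f"z ignoring clustering: {z_srs:.2f}")
print(f"z with deff = 2.4:     {z_clustered:.2f}")
```

As a rule of thumb, |z| needs to exceed about 1.96 for a difference to count as significant at the 5% level. With these made-up inputs, a whole percentage point of difference clears that bar only if you ignore the clustering.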
