As Statistics Canada continues to roll out the results from the National Household Survey (NHS), I seem to become involved in arguments at least once a week as to the importance of sample selection in survey data. This week, my argument was with IPSOS CEO Darrell Bricker – someone who should know a lot about statistics. In particular, Mr. Bricker should know that you can’t solve a sample selection problem with an increased sample size, and I actually think he does. I think the issue is that he’s thinking about practical polling issues with respect to sampling, not about statistical issues with respect to selected samples. Statistics Canada differentiates between sampling error and non-sampling error, and I think that’s where our key difference lies. Let me see if I can explain this, and hopefully Mr. Bricker will respond and let me know if I am on the right track.

Almost all statistics relies on samples of the population. As long as your sample is representative, or appropriately weighted, your sample estimates will be unbiased or accurate, in that on average you’d get the same value in the sample as in the population as a whole. Your estimates may not be precise, however, as they will have large margins of error if your sample is small. This is why we can use samples to draw conclusions on the characteristics of the population as a whole.

Let me give you an example. If you are conducting a poll of Canadians, and you receive responses from 1000 people, you’ve covered 0.003% of the population with your survey. Even if the people you call are selected at random, and non-response is also random, you don’t necessarily have a representative sample. With a small sample, you will invariably over-sample some small groups and under-sample others. For example, the chance of being from Miramichi, NB is approximately 0.05% in the population as a whole. As such, in a representative panel of 1000 Canadians, you’d have 1/2 of a Miramichier. Of course, you can’t have 1/2 a Miramichier, so you either have 1 or more (over-represented) or 0 (under-represented) in your sample. The same goes for other small segments of the population, whether they are differentiated by geographic, demographic, religious, or other characteristics. You can solve this problem with a combination of larger sample sizes, over-sampling smaller groups, and appropriately weighting observations.
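To put rough numbers on the Miramichi example, here is a minimal sketch. It assumes each of the 1000 respondents is an independent draw with a 0.05% chance of being from Miramichi (a binomial approximation, and my assumption rather than anything a pollster would literally do); the function names are just for illustration.

```python
import math

def expected_count(sample_size, population_share):
    """Expected number of sampled people from a group with the given share."""
    return sample_size * population_share

def prob_count(sample_size, population_share, k):
    """Binomial probability of drawing exactly k people from the group."""
    return (math.comb(sample_size, k)
            * population_share ** k
            * (1 - population_share) ** (sample_size - k))

n = 1000          # respondents in the poll
share = 0.0005    # approximate 0.05% population share used in the text

expected = expected_count(n, share)   # the "1/2 of a Miramichier"
p_zero = prob_count(n, share, 0)      # chance the sample contains none
p_at_least_one = 1 - p_zero           # chance the sample contains one or more

print(f"expected Miramichiers: {expected:.2f}")
print(f"P(none in sample): {p_zero:.3f}")
print(f"P(one or more in sample): {p_at_least_one:.3f}")
```

Under these assumptions, a sample with no Miramichiers at all is the more likely outcome, which is the under-representation case described above.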

Luckily, we have the mandatory short form census and other institutional data, which allow you to weight each observation in a small sample according to the share of the total population made up by the type of person it represents. Weights also allow you to over-sample small regions: to continue the example above, rather than treating the opinion of one random Miramichier as representative of the population of Miramichi, you’d sample a greater share of the population in smaller regions and down-weight those observations in the overall results. Done correctly, with random non-response, you’ll get a good sense of the characteristics of the entire population with a relatively small sample of it.
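The weighting idea can be sketched in a few lines. This is only an illustration: the two groups, their responses, and the 10%/90% population shares are made-up numbers, and the function name is hypothetical. It weights each group's average by its known population share, one simple form of post-stratification.

```python
def poststratified_mean(responses, population_shares):
    """Weight each group's sample mean by that group's share of the population."""
    total = 0.0
    for group, values in responses.items():
        group_mean = sum(values) / len(values)
        total += population_shares[group] * group_mean
    return total

# Hypothetical poll answers (1 = yes, 0 = no). The small region is
# deliberately over-sampled: half the respondents, but 10% of the population.
responses = {
    "small_region": [1, 1, 1, 0],
    "rest_of_country": [0, 0, 1, 0],
}
# Hypothetical shares, standing in for what census data would provide.
population_shares = {"small_region": 0.1, "rest_of_country": 0.9}

all_values = [v for values in responses.values() for v in values]
unweighted = sum(all_values) / len(all_values)
weighted = poststratified_mean(responses, population_shares)

print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

The unweighted average lets the over-sampled small region count for half the result; the weighted one scales it back to its share of the population. Note that this only works because the shares are known from an outside source.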

Statistical weighting can only get you so far – you can’t use sample weights to correct for a sample which is non-representative with respect to variables not covered by the data you weight against, or for a sample which is not random due to non-response bias. This is the core problem with the NHS. Put simply, you can’t increase the weight applied to observations you don’t have. Statistics Canada states that:

In every self-administered voluntary survey, error due to non-response to the survey’s variables makes up a substantial portion of the non-sampling error. Non-response is likely to bias the estimates based on the survey, because non-respondents tend to have different characteristics from respondents. As a result, there is a risk that the results will not be representative of the actual population.

This could be solved with a larger sample if it were a sampling problem, or could be solved with weights if we had information on the underlying population in the short form census. The problem is that the entire point of the NHS was to ask questions which we don’t ask in the short form census, so we won’t know if we have low response rates in those areas because we have no reference point – for now, we can rely partially on previous iterations of the census, but those will quickly become obsolete. We can increase the sample size, but it might not help if non-response is inherent to the group itself. As Bricker pointed out on Twitter, this was true of the old long form as well, but errors in the old tool don’t imply reliability of the new one.

In response to this issue, Bricker said that, “this is about common sense research issues, not math formulae.” I disagree – this is entirely about math formulae and understanding the difference between sampling and non-sampling error and sample size issues. You can’t always correct for non-response bias by using a larger sample or by re-weighting observations. Worse yet, we won’t always know if our results are biased by non-response, and when they are, larger sample sizes actually exacerbate the problem.

How can a larger sample size exacerbate the problem? It’s simple – the precision of your estimates increases with your sample size. Imagine I take a survey which experiences non-response bias. Even though the true value in the population would be 1, I get a biased estimate of 0.8, +/- 0.2, 19 times out of 20, because people with higher values tend to not respond. The +/- 0.2 will decline as my sample size increases, even if the bias remains. With a larger sample, I might get an estimate of 0.8, +/- 0.05, 19 times out of 20 – a more precise, but still inaccurate, answer.
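Here is a small simulation of that thought experiment, as a sketch under assumptions I've chosen so the numbers line up with the example: everyone in the population has a value of 0 or 2 with equal odds (so the true mean is 1), people with a 0 respond 60% of the time, and people with a 2 respond only 40% of the time. The function names, response rates, and sample sizes are all hypothetical.

```python
import math
import random

def survey(n_contacted, rng):
    """Contact n people at random; return the values of those who respond."""
    responses = []
    for _ in range(n_contacted):
        value = rng.choice([0, 2])               # population mean is 1
        p_respond = 0.6 if value == 0 else 0.4   # high values respond less often
        if rng.random() < p_respond:
            responses.append(value)
    return responses

def estimate_with_margin(responses):
    """Sample mean and an approximate 95% (19 times out of 20) margin of error."""
    n = len(responses)
    mean = sum(responses) / n
    variance = sum((x - mean) ** 2 for x in responses) / (n - 1)
    margin = 1.96 * math.sqrt(variance / n)
    return mean, margin

rng = random.Random(0)   # fixed seed so the run is repeatable
results = {}
for n_contacted in (200, 3000, 100000):
    mean, margin = estimate_with_margin(survey(n_contacted, rng))
    results[n_contacted] = (mean, margin)
    print(f"contacted {n_contacted:>6}: estimate {mean:.3f} +/- {margin:.3f}")
```

With these response rates, the respondents' average settles near 0.8 no matter how many people are contacted, while the margin of error keeps shrinking. At the largest sample size the reported interval is narrow and does not contain the true value of 1.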

No survey is perfect, and this includes the census (old or new). That said, it’s important to understand the limits of the information we can glean from the NHS and to not suppose that we will always know whether the data are biased. That, I agree, is common sense.