Real-world sampling

If you pick up a classic market research text, you'll come away with the impression that sampling is an integral part of every survey. It's not.

The reasons sampling has historically been such a big deal are:

  • High per-respondent cost
  • Mass market issues
  • The sampling error calculation

Let's take them in order.

Per-respondent costs

If you're offering an incentive for every completion or using interviewers, this is a significant cost issue and sampling makes sense.

If you're using Web or paper surveys, the per-respondent cost is much smaller. While some Web survey hosting services charge per response, others are based on a monthly subscription. You can also get software to run surveys on your own servers, some of which allows unlimited responses.

Remember that even if you have no data entry costs because you're scanning or using the Web, you'll always have a data cleaning cost for any written answers.

Mass market surveys

If you're predicting cola market shares, you don't need a census to find the answer. With a well-constructed survey and a representative sample, you'll be 95% certain you're within +/- 2% with about 2,400 responses.

However, many surveys deal with smaller populations where it is feasible to reach out to the entire group. Remember you'll still end up with a "net sample" for any survey where you don't get every single person in the population to answer. For smaller groups, you just don't have to bother with sampling whom you invite.

Sampling errors

This can become a red herring in your survey projects. It's easy to fixate on the sampling error because it gives you a tidy +/- % figure for accuracy. The problem is that this error only covers the distortion from having just a portion of the population answer. It does not include errors introduced by poorly phrased questions, missing scale options, non-random sampling techniques, etc.
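As a rough sketch of where that tidy figure comes from: the usual margin-of-error calculation for a proportion depends only on the number of responses and the confidence level. The function name, the z-value of 1.96 for 95% confidence, and the worst-case 50/50 split are assumptions made for illustration.

```python
import math

def margin_of_error(responses, z=1.96, p=0.5):
    """Approximate +/- margin for a proportion from a simple random sample.

    Assumes a large population, z=1.96 for 95% confidence, and p=0.5 as the
    worst-case split. It knows nothing about question wording, missing
    scale options, or how the sample was actually drawn.
    """
    return z * math.sqrt(p * (1 - p) / responses)

print(f"{margin_of_error(2401):.3f}")  # 0.020, i.e. +/- 2%
print(f"{margin_of_error(385):.3f}")   # 0.050, i.e. +/- 5%
```

Nothing in the calculation looks at the questionnaire itself, which is why a small margin of error can sit on top of badly distorted answers.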

How to sample

If you do have a large population or significant per-respondent costs, here are some basics on how to sample. I also recommend doing a bit more reading.

The end goal is always a representative sample of respondents. If you're measuring call center satisfaction, intercepting callers for a couple of weeks will do the trick. However, if you want to measure overall customer satisfaction, you also need to reach out to people who didn't call recently.

You always want to select people randomly, so set up an automatic selection of every 5th, 20th, etc. caller rather than letting the service representative choose whom to offer the survey to.
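One way to automate that kind of selection is systematic sampling with a random starting point. This is only a minimal sketch: the function name, the list of caller IDs, and the random start are illustrative assumptions, not a description of any particular call center system.

```python
import random

def every_nth(callers, step, rng=None):
    """Select every `step`-th caller, starting at a random position.

    The random start means nobody knows in advance which callers
    will be offered the survey.
    """
    rng = rng or random.Random()
    start = rng.randrange(step)  # a position from 0 to step - 1
    return callers[start::step]

callers = [f"caller-{i}" for i in range(1, 101)]  # hypothetical call log
selected = every_nth(callers, step=20)
print(len(selected))  # 5 callers out of 100
```

The point of the design is that the rule, not a person, decides who gets asked.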

Here's a table for picking the size of your net sample (responses, not invitations) based on whether you want to be 95% or 99% certain of your results and whether you're comfortable with a +/- 5% or +/- 2% margin of error. If you have a very diverse population or plan to break the data down into smaller sub-groups, you'll want to get more responses.

                 95% Confidence      99% Confidence
Population        +/- 5%    +/- 2%    +/- 5%    +/- 2%
100                   80        97        88        98
500                  218       414       286       447
1,000                278       707       400       807
2,000                323     1,092       500     1,351
3,000                341     1,334       545     1,744
4,000                351     1,501       571     2,040
5,000                357     1,623       588     2,272
10,000               370     1,937       625     2,939
50,000               382     2,292       657     3,841
100,000              383     2,345       662     3,995
500,000              384     2,390       665     4,126
1,000,000            385     2,396       666     4,144
10,000,000           385     2,401       666     4,159
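
If your population falls between rows, numbers like these can be computed directly. The sketch below assumes the standard sample size formula for a proportion with a finite population correction, a worst-case 50/50 split, z-values of 1.96 (95%) and 2.58 (99%), and rounding up. Those choices are assumptions on my part rather than a documented source for the table, though they match the rows spot-checked against it.

```python
import math

def net_sample_size(population, margin, z, p=0.5):
    """Responses needed for a given population size and margin of error.

    Assumed inputs: margin as a fraction (0.05 for +/- 5%), z as the
    z-score for the confidence level (1.96 for 95%, 2.58 for 99%).
    """
    # Sample size for an effectively unlimited population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: smaller groups need fewer responses.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(net_sample_size(4000, 0.05, 1.96))  # 351
print(net_sample_size(2000, 0.02, 2.58))  # 1351
```

Remember that the result is a count of responses, not invitations.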

When never to sample

Do not sample employees! You can decide to run a survey for only one division, but do not sample within that division. Even if an employee will not bother to answer the survey, they still want to be asked.

Likewise, with employees be sure you've got all the stakeholders on an issue. If you're talking about internal management, you can just poll the R&D division. But if you're talking about new product development, it's a good idea to include Manufacturing, Marketing and Sales.



OK, so say my "universe" is the 4,000 members of a particular organization. Does that mean that to obtain 95% confidence at +/- 5%, I need to send it to only 351 people (no matter how many actually respond)? Or does it mean that if I send the survey to all 4,000, I need 351 *responses* to achieve that confidence?

Or something altogether different? Thank you!

The table is the number of responses you need.

So if you expect a 20% response rate, you'd need to send out 351/0.2 = 1,755 invitations.

If this is a low involvement customer survey, your response rate could easily be 5%, which would require 7,020 invitations.

Since that's more than your population, you can see why many surveys don't have to bother with sampling the invites.
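
The arithmetic above can be sketched as a small helper. The function name and the choice to pass the response rate as a whole-number percentage are illustrative assumptions.

```python
import math

def invitations_needed(responses, response_rate_pct):
    """Invitations to send in order to expect `responses` completed surveys.

    `response_rate_pct` is the expected response rate as a percentage,
    e.g. 20 for 20%.
    """
    return math.ceil(responses * 100 / response_rate_pct)

population = 4000
for rate in (20, 5):
    invites = invitations_needed(351, rate)
    advice = "invite everyone" if invites >= population else "sampling is possible"
    print(f"{rate}% response rate: {invites} invitations, {advice}")
```

At a 20% response rate this gives 1,755 invitations, and at 5% it gives 7,020, which is more than the 4,000 members available.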

Thanks so much Ann--I really appreciate it. Keep up the good work...
