By Ann Ray
On June 26, 2006
Updated October 9, 2012

Real world sampling

If you pick up a classic market research text, you'll come away with the impression that sampling is an integral part of every survey. It's not.

The reasons sampling has historically been such a big deal are:

High per-respondent cost
Mass market issues
The sampling error calculation

Let's take them in order.

Per-respondent costs

If you're offering an incentive for every completion or using interviewers, this is a significant cost issue and sampling makes sense.

If you're using Web or paper surveys, the per-respondent cost is much smaller. While some Web survey hosting services have a per-response model, others are based on a monthly subscription. You can also get software to run surveys on your own servers, some of which allows unlimited responses.

Remember that even if you have no data entry costs because you're scanning or using the Web, you'll always have a data cleaning cost for any written answers.

Mass market surveys

If you're predicting cola market shares, you don't need a census to find the answer. With a well-constructed survey and a representative sample, you can be 95% certain you're within +/- 2% with roughly 2,400 responses.
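
As a rough check on that kind of figure, the textbook margin-of-error formula for a proportion from a simple random sample can be sketched in Python. The function name, the z-value of 1.96 for 95% confidence, and the worst-case 50/50 split are assumptions for illustration; the article itself does not specify them.

```python
import math

def margin_of_error(responses, z=1.96, p=0.5):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a very large population, z=1.96 for 95% confidence, and
    p=0.5 as the worst-case split (all illustrative assumptions).
    """
    return z * math.sqrt(p * (1 - p) / responses)

# Print the margin, in percentage points, for a few response counts.
for n in (500, 1000, 2401):
    print(n, round(margin_of_error(n) * 100, 1), "percentage points")
```

Under these assumptions, about 2,400 responses works out to a margin of roughly two percentage points, and the margin shrinks only slowly as responses grow beyond that.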

However, many surveys deal with smaller populations where it is feasible to reach out to the entire group. Remember you'll still end up with a "net sample" for any survey where you don't get every single person in the population to answer. For smaller groups, you just don't have to bother with sampling whom you invite.

Sampling errors

Sampling error can become a red herring in your survey projects. It's easy to fixate on it because it gives you a tidy +/- percentage for accuracy. The problem is that this error only covers the distortion from having a portion of the population answer. It does not include errors introduced by poorly phrased questions, missing scale options, non-random sampling techniques, etc.

How to sample

If you do have a large population or significant per-respondent costs, here are some basics on how to sample. I also recommend doing a bit more reading.

The end goal is always a representative sample of respondents. If you're measuring call center satisfaction, intercepting callers for a couple of weeks will do the trick. However, if you want to measure overall customer satisfaction, you also need to reach out to people who didn't call recently.

You always want to select people randomly, so use an automatic selection of every 5th, 20th, etc. caller rather than letting the service representative choose whom to offer the survey.
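
Here is a minimal Python sketch of that kind of automatic every-nth selection. The function and variable names are hypothetical, and the random starting position is an added assumption (a common refinement so that the first caller in the list is not always picked), not something prescribed above.

```python
import random

def every_kth(callers, k, rng=None):
    """Pick every k-th caller, starting from a random position among the first k.

    `callers` is any sequence in arrival order; names are illustrative only.
    """
    rng = rng or random.Random()
    start = rng.randrange(k)  # random offset in 0..k-1 (assumed refinement)
    return callers[start::k]

# Hypothetical list of 100 callers in the order they called.
callers = [f"caller-{i}" for i in range(1, 101)]
selected = every_kth(callers, k=5, rng=random.Random(42))
print(len(selected), selected[:3])
```

The point of the design is that the rule, not the representative, decides who gets the survey offer.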

Here's a table for picking the size of your net sample (responses, not invitations) based on whether you want to be 95% or 99% certain of your results and whether you're comfortable with a +/- 5% or +/- 2% margin of error. If you have a very diverse population or plan to break the data down into smaller sub-groups, you'll want to get more responses.

Population size	95% confidence, +/- 5%	95% confidence, +/- 2%	99% confidence, +/- 5%	99% confidence, +/- 2%
100	80	97	88	98
500	218	414	286	447
1,000	278	707	400	807
2,000	323	1,092	500	1,351
3,000	341	1,334	545	1,744
4,000	351	1,501	571	2,040
5,000	357	1,623	588	2,272
10,000	370	1,937	625	2,939
50,000	382	2,292	657	3,841
100,000	383	2,345	662	3,995
500,000	384	2,390	665	4,126
1,000,000	385	2,396	666	4,144
10,000,000	385	2,401	666	4,159
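
If your population falls between the rows, a sketch like the following may help. It uses the standard sample-size formula for a proportion with a finite population correction. The article does not say how the table was computed, so the inputs here are assumptions: a worst-case 50/50 split, z=1.96 for 95% confidence or z=2.58 for 99%, and rounding up. With those inputs the results appear to line up with the table. The function names, and the second helper that converts responses into invitations from an expected response rate you supply, are hypothetical.

```python
import math

def responses_needed(population, z=1.96, margin=0.05, p=0.5):
    """Responses required, using the standard formula with finite population correction.

    Assumptions (not stated in the article): p=0.5 worst case,
    z=1.96 for 95% confidence or z=2.58 for 99%, result rounded up.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size for a very large population
    n = n0 / (1 + (n0 - 1) / population)      # shrink for a finite population
    return math.ceil(n)

def invitations_needed(responses, response_rate_pct):
    """Invitations to send to expect `responses` at a response rate given in percent."""
    return math.ceil(responses * 100 / response_rate_pct)

needed = responses_needed(4000)               # 95% confidence, +/- 5%
print(needed, invitations_needed(needed, 20))
```

For a population of 4,000 this gives 351 responses, and at an assumed 20% response rate that means 1,755 invitations. If the invitation count comes out larger than the population, simply invite everyone.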

When never to sample

Do not sample employees! You can decide to run a survey for only one division, but do not sample within that division. Even if an employee will not bother to answer the survey, they still want to be asked.

Likewise, when surveying employees, be sure you include all the stakeholders on an issue. If the topic is how the R&D division is managed internally, you can just poll R&D. But if the topic is new product development, it's a good idea to include Manufacturing, Marketing and Sales as well.

3 Comments

Bob | June 28, 2006 9:04 AM

OK--So say my "universe" is the 4,000 members of a particular organization. Does that mean that to obtain that 95% confidence, +/- 5%, I need to send it to only 351 people (no matter how many actually respond)? Or does it mean that if I send the survey to all 4,000, I need 351 *responses* to achieve that confidence?

Or something altogether different? Thank you!

Ann Ray | June 28, 2006 9:35 AM

The table gives the number of responses you need, not the number of invitations.

So if you expect a 20% response rate, you'd need to send out 351/0.2 = 1,755 invitations.

If this is a low involvement customer survey, your response rate could easily be 5%, which would require 7,020 invitations.

Since that's more than your population, you can see why many surveys don't have to bother with sampling the invites.

Bob | June 28, 2006 10:16 AM

Thanks so much Ann--I really appreciate it. Keep up the good work...