Survey sampling
In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.
A survey may refer to many different types or techniques of observation, but in the context of survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. The different ways of contacting members of a sample once they have been selected are the subject of survey data collection. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census.
Probability vs. Non-Probability Sampling
Survey samples can be broadly divided into two types: probability samples and non-probability samples. Only surveys based on a probability sample can be used to create mathematically sound statistical inferences about a larger target population, although inferences from probability-based surveys may still suffer from many types of bias. Surveys that are not based on probability sampling have no way of measuring their bias or sampling error. Surveys based on non-probability samples are not externally valid: they can only be said to be representative of the people who actually completed the survey.
Put another way, if a probability-based survey of the United States household population finds that 59% of its respondents support a piece of legislation, there is mathematical reason to believe that the proportion of all persons living in households in the United States who support this piece of legislation is close to 59% (within the margin of error). If a non-probability survey conducted in the United States finds that 59% of its respondents support a piece of legislation, that is the only conclusion that can be drawn; no statement about the target population can be made.
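As a rough illustration of what "within the margin of error" means for the 59% figure, the sketch below applies the usual normal-approximation formula for a proportion from a simple random sample. The sample size of 1,000 and the 95% confidence multiplier of 1.96 are assumptions made for the example; the text does not state either.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

p_hat = 0.59  # share of respondents in support (figure from the text)
n = 1000      # hypothetical sample size, not given in the text

moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%} -> interval ({low:.1%}, {high:.1%})")
```

Under these assumptions the margin of error is roughly three percentage points. Real surveys with stratification, clustering, or weighting generally need a design-based variance estimate rather than this simple formula.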
The main reason that non-probability samples are used is that probability samples cost much more to produce. Non-probability sample surveys are commonly used in market research, where cost can be more important than the projectability of findings to a larger population.
In academic and government survey research, probability sampling is often regarded as a standard procedure that must be employed regardless of the cost. The Office of Management and Budget's List of Standards for Statistical Surveys states that federally funded surveys must be performed,
selecting samples using generally accepted statistical methods (e.g., probabilistic methods that can provide estimates of sampling error). Any use of nonprobability sampling methods (e.g., cut-off or model-based samples) must be justified statistically and be able to measure estimation error.
Many statisticians disagree with these views. For example, Valliant, Dorfman and Royall explain,
To claim that, in general, probabilistic inferences are not valid when the randomization distribution is not available is simply wrong. This is not to deny that randomization is valuable, but only to deny that it represents the basis for all valid, rigorous, probabilistic inference.
The extreme position, that no inferences can be made unless the selection probabilities of the sample units are known would make it impossible to draw inferences from most samples. For example, most surveys have substantial amounts of nonresponse. Even though the units are initially chosen with known probabilities, the nonresponse mechanism is unknown and must be modeled, as in an observational study.
Probability sampling
In a probability sample (also called a "scientific" or "random" sample) each member of the target population has a known and non-zero probability of inclusion in the sample. A survey based on a probability sample can in theory produce statistical measurements of the target population that are:
- unbiased, meaning the expected value of the sample mean is equal to the population mean, E(ȳ) = μ, and
- have a measurable sampling error, which can be expressed as a confidence interval or margin of error.
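The unbiasedness property E(ȳ) = μ can be checked directly for a toy case. The sketch below assumes simple random sampling without replacement, under which every subset of a given size is equally likely, and enumerates all possible samples from a small made-up population; the values are purely illustrative.

```python
from itertools import combinations

population = [2, 4, 4, 7, 13]  # hypothetical values of y for a five-unit population
n = 2                          # sample size

# Every size-n subset is equally likely under simple random sampling, so
# E(y-bar) is the plain average of the means of all possible samples.
sample_means = [sum(s) / n for s in combinations(population, n)]
expected_sample_mean = sum(sample_means) / len(sample_means)
population_mean = sum(population) / len(population)

print(len(sample_means), expected_sample_mean, population_mean)
```

Individual samples can still miss the population mean by a wide margin; unbiasedness only says that the misses average out over all possible samples.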
A probability-based survey sample requires three components: a list of the target population, called the sample frame; a randomized process for selecting units from the sample frame, called a selection procedure; and a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode. For some target populations this process may be easy, for example, sampling the employees of a company by using a payroll list. However, in large, disorganized populations simply constructing a suitable sample frame is often a complex and expensive task.
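The payroll example can be sketched in a few lines. The frame below is a made-up list of employee IDs, and the selection procedure is a simple random draw without replacement; the function name, ID format, and seed are all hypothetical choices for illustration.

```python
import random

# Hypothetical sample frame: a payroll list of 500 employee IDs.
sample_frame = [f"EMP{i:04d}" for i in range(1, 501)]

def select_simple_random_sample(frame, n, seed=None):
    """Selection procedure: draw n distinct units, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

selected = select_simple_random_sample(sample_frame, n=50, seed=2024)

# Every employee on the frame has the same known inclusion probability.
inclusion_probability = len(selected) / len(sample_frame)
print(selected[:5], inclusion_probability)
```

The data collection mode (for instance, how the 50 selected employees are contacted) is outside the scope of this sketch.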
Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dial telephone sampling, and more recently Address-Based Sampling.
Within probability sampling there are specialized techniques, such as stratified sampling and cluster sampling, that improve the precision or efficiency of the sampling process without altering the fundamental principles of probability sampling.
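As one possible illustration of stratified sampling, the sketch below splits a hypothetical frame of households into urban and rural strata and draws the same sampling fraction independently within each (proportional allocation). The frame, the strata, and the 5% fraction are assumptions chosen for the example.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Draw the same sampling fraction independently within every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        units = strata[label]
        k = max(1, round(fraction * len(units)))  # at least one unit per stratum
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical frame of (household id, region) pairs: 800 urban, 200 rural.
frame = [(i, "urban") for i in range(800)] + [(i, "rural") for i in range(800, 1000)]
sample = stratified_sample(frame, stratum_of=lambda u: u[1], fraction=0.05, seed=7)

counts = {r: sum(1 for _, region in sample if region == r) for r in ("urban", "rural")}
print(counts)
```

Because selection within each stratum is still random with known probabilities, the result remains a probability sample; stratification only guarantees that each stratum is represented in a fixed proportion.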
Bias in Probability Sampling
Bias in surveys is undesirable, but often unavoidable. The major types of bias that may occur in the sampling process are:
- Non-response bias: When individuals or households selected in the survey sample cannot or will not complete the survey, there is the potential for bias to result from this non-response. Non-response bias occurs when the observed value deviates from the population parameter due to differences between respondents and non-respondents.
- Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and non-covered units. Telephone surveys suffer from a well-known source of coverage bias because they cannot include households without telephones.
- Selection bias: Selection bias occurs when some units have a differing probability of selection that is unaccounted for by the researcher. For example, some households have multiple phone numbers, making them more likely to be selected in a telephone survey than households with only one phone number. This selection bias would be corrected by applying a survey weight equal to 1/(number of phone numbers) to each household.
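The phone-number correction described under selection bias can be sketched as a weighted proportion. The respondent data below are invented for illustration; each household is weighted by the reciprocal of its number of phone numbers, as the text describes.

```python
# Hypothetical respondents: (phone numbers in household, supports the measure?)
respondents = [
    (1, True), (1, False), (1, True),
    (2, False), (2, False),
    (4, True),
]

def weighted_proportion(rows):
    """Estimate a proportion, weighting each household by 1 / (number of phone numbers)."""
    weights = [1.0 / phones for phones, _ in rows]
    supporters = sum(w for w, (_, yes) in zip(weights, rows) if yes)
    return supporters / sum(weights)

unweighted = sum(yes for _, yes in respondents) / len(respondents)
weighted = weighted_proportion(respondents)
print(f"unweighted={unweighted:.3f}, weighted={weighted:.3f}")
```

In this made-up data the household with four phone numbers counts for a quarter of a single-line household, so the weighted and unweighted estimates differ.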
Non-Probability Sampling
Many surveys are not based on a probability sample, but rather on finding a suitable collection of respondents to complete the survey. Some common examples of non-probability sampling are:
- Judgement Samples: A researcher decides which population members to include in the sample based on his or her judgement. The researcher may provide some alternative justification for the representativeness of the sample.
- Snowball Samples: Often used when a target population is rare, members of the target population recruit other members of the population for the survey.
- Quota Samples: The sample is designed to include a designated number of people with certain specified characteristics, for example, 100 coffee drinkers. This type of sampling is common in non-probability market research surveys.
- Convenience Samples: The sample is composed of whatever persons can be most easily accessed to fill out the survey.
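To make the contrast with probability sampling concrete, the sketch below shows one way a quota sample might be filled: people are accepted in whatever order they happen to arrive until each category reaches its target. The arrival list, category labels, and quota sizes are hypothetical.

```python
def fill_quotas(arrivals, quotas):
    """Accept people in arrival order until every category's quota is full."""
    remaining = dict(quotas)
    accepted = []
    for person, category in arrivals:
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas filled; later arrivals are never considered
    return accepted

# Hypothetical stream of volunteers and the group each belongs to.
arrivals = [
    ("p1", "coffee"), ("p2", "tea"), ("p3", "coffee"),
    ("p4", "coffee"), ("p5", "tea"), ("p6", "tea"),
]
quotas = {"coffee": 2, "tea": 2}

accepted = fill_quotas(arrivals, quotas)
print(accepted)
```

Nothing in this procedure assigns a known selection probability to any member of the target population, which is why no sampling error can be computed from such a sample.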
In non-probability samples the relationship between the target population and the survey sample is immeasurable and potential bias is unknowable. Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement, and examine the results for internally consistent relationships.