Sampling frame
In statistics, a sampling frame is the source material or device from which a sample is drawn. It is a list of all those within a population who can be sampled, and may include individuals, households or institutions.
The importance of the sampling frame is stressed by Jessen.
Sampling frame types and qualities
In the most straightforward case, such as when dealing with a batch of material from a production run, or using a census, it is possible to identify and measure every single item in the population and to include any one of them in our sample; this is known as direct element sampling. However, in many other cases this is not possible, either because it is cost-prohibitive (reaching every citizen of a country) or impossible (reaching all humans alive).
Having established the frame, there are a number of ways to organize it to improve efficiency and effectiveness. It is at this stage that the researcher should decide whether the sample is in fact to be the whole population, in which case it would be a census.
This list should also facilitate access to the selected sampling units. A frame may also provide additional 'auxiliary information' about its elements; when this information is related to variables or groups of interest, it may be used to improve survey design. While not necessary for simple sampling, a sampling frame used for more advanced sampling techniques, such as stratified sampling, may contain additional information (such as demographic information). For instance, an electoral register might include name and sex; this information can be used to ensure that a sample taken from that frame covers all demographic categories of interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone number may provide some information about location.)
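As an illustration of how auxiliary information can be used, the following minimal Python sketch draws a stratified sample from a small list frame. The frame records, the field names (`id`, `name`, `sex`) and the function `stratified_sample` are hypothetical and chosen only for this example; a real survey would typically allocate the sample across strata by a considered rule rather than taking a fixed number from each.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_key, n_per_stratum, seed=None):
    """Draw a simple random sample of up to n_per_stratum units from each stratum."""
    rng = random.Random(seed)

    # Group the frame's units by the auxiliary variable (the stratum label).
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[stratum_key]].append(unit)

    # Sample independently, without replacement, within each stratum.
    sample = []
    for label in sorted(strata):
        units = strata[label]
        k = min(n_per_stratum, len(units))
        sample.extend(rng.sample(units, k))
    return sample


# Hypothetical electoral-register frame: every record carries name and sex.
frame = [
    {"id": 1, "name": "A. Jones", "sex": "F"},
    {"id": 2, "name": "B. Smith", "sex": "M"},
    {"id": 3, "name": "C. Patel", "sex": "F"},
    {"id": 4, "name": "D. Brown", "sex": "M"},
    {"id": 5, "name": "E. Evans", "sex": "F"},
    {"id": 6, "name": "F. Green", "sex": "M"},
]

sample = stratified_sample(frame, "sex", n_per_stratum=2, seed=0)
```

Because each stratum is sampled separately, the resulting sample is guaranteed to contain units from every category recorded in the frame, which a simple random sample of the same size would not ensure.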
An ideal sampling frame will have the following qualities:
- all units have a logical, numerical identifier
- all units can be found - their contact information, map location or other relevant information is present
- the frame is organized in a logical, systematic fashion
- the frame has additional information about the units that allows the use of more advanced sampling techniques
- every element of the population of interest is present in the frame
- every element of the population is present only once in the frame
- no elements from outside the population of interest are present in the frame
The most straightforward type of frame is a list of elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register or a telephone directory. Other sampling frames can include employment records, school class lists, patient files in a hospital, organizations listed in a thematic database, and so on. On a more practical level, sampling frames have the form of computer files.
Not all frames explicitly list population elements; some list only 'clusters'. For example, a street map can be used as a frame for a door-to-door survey; although it doesn't show individual houses, we can select streets from the map and then select houses on those streets. This offers some advantages: such a frame would include people who have recently moved and are not yet on the list frames discussed above, and it may be easier to use because it doesn't require storing data for every unit in the population, only for a smaller number of clusters.
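The street-map procedure described above can be sketched as a two-stage selection. The Python example below is a minimal sketch under stated assumptions: the street names, the house counts and the helper `list_houses` (a stand-in for listing houses in the field) are all invented for illustration, and `two_stage_sample` is a hypothetical function name.

```python
import random


def two_stage_sample(clusters, list_units, n_clusters, n_units_per_cluster, seed=None):
    """Select clusters at random, then select units within each chosen cluster."""
    rng = random.Random(seed)

    # Stage 1: simple random sample of clusters from the cluster frame.
    chosen = rng.sample(sorted(clusters), n_clusters)

    # Stage 2: enumerate units only for the chosen clusters, then sample them.
    sample = []
    for cluster in chosen:
        units = list_units(cluster)
        k = min(n_units_per_cluster, len(units))
        sample.extend((cluster, unit) for unit in rng.sample(units, k))
    return sample


# Hypothetical cluster frame: the map lists streets, not individual houses.
streets = ["Elm St", "High St", "Mill Ln", "Oak Ave", "Park Rd"]


def list_houses(street):
    """Stand-in for field listing; the house counts are made up."""
    house_counts = {"Elm St": 12, "High St": 30, "Mill Ln": 5, "Oak Ave": 18, "Park Rd": 9}
    return list(range(1, house_counts[street] + 1))


sample = two_stage_sample(streets, list_houses, n_clusters=2, n_units_per_cluster=3, seed=1)
```

Only the selected streets are ever enumerated, which is the storage advantage noted above. In this simple design a house on a short street is more likely to be selected than a house on a long one, so estimates would generally need weighting to account for the unequal selection probabilities.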
Sampling frame problems
The sampling frame must be representative of the population; whether it is lies outside the scope of statistical theory and demands the judgement of experts in the particular subject matter being studied. All of the frames above omit some people who will vote at the next election and contain some people who will not; some frames will contain multiple records for the same person. People not in the frame have no prospect of being sampled.
Because a cluster-based frame contains less information about the population, it may place constraints on the sample design, possibly requiring the use of less efficient sampling methods and/or making it harder to interpret the resulting data.
Statistical theory tells us about the uncertainties in extrapolating from a sample to the frame. It should be expected that sampling frames will always contain some mistakes. In some cases, this may lead to sampling bias and, in extreme cases, to an unrepresentative sample. Such bias should be identified and minimized, although avoiding it completely in the real world is nearly impossible. Nor should one assume that sources which claim to be unbiased and representative are in fact so.
In defining the frame, practical, economic, ethical, and technical issues need to be addressed. The need to obtain timely results may prevent extending the frame far into the future. The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in forecasting, where inferences about the future are made from historical data. In fact, in 1703, when Jacob Bernoulli proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early death of a living man, Leibniz recognized the problem in his reply.
Kish posited four basic problems of sampling frames:
- Missing elements: Some members of the population are not included in the frame.
- Foreign elements: The non-members of the population are included in the frame.
- Duplicate entries: A member of the population appears more than once in the frame, and so may be surveyed more than once.
- Groups or clusters: The frame lists clusters instead of individuals.
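The first three of these problems can be illustrated with a small Python sketch that compares a frame against a reference list of the population. This is only an illustration: in practice a complete list of the population is rarely available (otherwise it could serve as the frame), and the identifiers and the function name `frame_problems` are hypothetical. The fourth problem concerns the structure of the frame rather than individual entries and is not checked here.

```python
from collections import Counter


def frame_problems(frame_ids, population_ids):
    """Report missing, foreign and duplicated identifiers in a frame."""
    frame_counts = Counter(frame_ids)
    listed = set(frame_counts)
    population = set(population_ids)
    return {
        # Population members that the frame does not list.
        "missing": sorted(population - listed),
        # Frame entries that do not belong to the population.
        "foreign": sorted(listed - population),
        # Identifiers that appear in the frame more than once.
        "duplicates": sorted(u for u, count in frame_counts.items() if count > 1),
    }


# Hypothetical identifiers, invented for this example.
population_ids = [1, 2, 3, 4, 5]
frame_ids = [1, 2, 2, 3, 6]

report = frame_problems(frame_ids, population_ids)
# report == {"missing": [4, 5], "foreign": [6], "duplicates": [2]}
```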