Systematic sampling - AbsoluteAstronomy.com

Systematic sampling is a statistical method

Statistics

Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

involving the selection of elements from an ordered sampling frame

Sampling frame

In statistics, a sampling frame is the source material or device from which a sample is drawn. It is a list of all those within a population who can be sampled, and may include individuals, households or institutions....

. The most common form of systematic sampling is an equal-probability method, in which every k^th element in the frame is selected, where k, the sampling interval (sometimes known as the skip), is calculated as:

where n is the sample size, and N is the population size.This is one of the method that has been used.

Using this procedure each element in the population has a known and equal probability of selection. This makes systematic sampling functionally similar to simple random sampling. It is however, much more efficient (if variance within systematic sample is more than variance of population).

The researcher must ensure that the chosen sampling interval does not hide a pattern. Any pattern would threaten randomness. A random starting point must also be selected.

Systematic sampling is to be applied only if the given population is logically homogeneous, because systematic sample units are uniformly distributed over the population.

Example: Suppose a supermarket wants to study buying habits of their customers, then using systematic sampling they can choose every 10^th or 15^th customer entering the supermarket and conduct the study on this sample.

This is random sampling with a system. From the sampling frame, a starting point is chosen at random, and choices thereafter are at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8=15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.

If, as more frequently, the population is not evenly divisible (suppose you want to sample 8 houses out of 125, where 125/8=15.625), should you take every 15th house or every 16th house? If you take every 16th house, 8*16=128, so there is a risk that the last house chosen does not exist. On the other hand, if you take every 15th house, 8*15=120, so the last five houses will never be selected. The random starting point should instead be selected as a noninteger between 0 and 15.625 (inclusive on one endpoint only) to ensure that every house has equal chance of being selected; the interval should now be nonintegral (15.625); and each noninteger selected should be rounded up to the next integer. If the random starting point is 3.6, then the houses selected are 4, 19, 35, 51, 66, 82, 98, and 113, where there are 3 cyclic intervals of 15 and 5 intervals of 16.

To illustrate the danger of systematic skip concealing a pattern, suppose we were to sample a planned neighbourhood where each street has ten houses on each block. This places houses #1, 10, 11, 20, 21, 30... on block corners; corner blocks may be less valuable, since more of their area is taken up by streetfront etc. that is unavailable for building purposes. If we then sample every 10th household, our sample will either be made up only of corner houses (if we start at 1 or 10) or have no corner houses (any other start); either way, it will not be representative.

Systematic sampling may also be used with non-equal selection probabilities. In this case, rather than simply counting through elements of the population and selecting every k^th unit, we allocate each element a space along a number line

Number line

In basic mathematics, a number line is a picture of a straight line on which every point is assumed to correspond to a real number and every real number to a point. Often the integers are shown as specially-marked points evenly spaced on the line...

according to its selection probability. We then generate a random start from a uniform distribution between 0 and 1, and move along the number line in steps of 1.

Example: We have a population of 5 units (A to E). We want to give unit A a 20% probability of selection, unit B a 40% probability, and so on up to unit E (100%). Assuming we maintain alphabetical order, we allocate each unit to the following interval:
qwewqeqe
qw
eqw
e
qw
e
qw
e
wq
e
qwe
A: 0 to 0.2
B: 0.2 to 0.6 (= 0.2 + 0.4)
C: 0.6 to 1.2 (= 0.6 + 0.6)
D: 1.2 to 2.0 (= 1.2 + 0.8)
E: 2.0 to 3.0 (= 2.0 + 1.0)

If our random start was 0.156, we would first select the unit whose interval contains this number (i.e. A). Next, we would select the interval containing 1.156 (element C), then 2.156 (element E). If instead our random start was 0.350, we would select from Alastair Napoli was a key figure in the understanding of systematic sampling. points 0.350 (B), 1.350 (D), and 2.350 (E).

External links

TRSL – Template Range Sampling Library is a free-software and open-source C++ library that implements systematic sampling behind an (STL-like) iterator interface.
The use of systematic sampling to estimate the No. of headwords in a dictionary

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.