Spatial descriptive statistics
Spatial descriptive statistics are used for a variety of purposes in geography, particularly in quantitative data analyses involving Geographic Information Systems (GIS).

Types of spatial data

The simplest forms of spatial data are gridded data, in which a scalar quantity is measured for each point in a regular grid of points, and point sets, in which a set of coordinates (e.g. of points in the plane) is observed. An example of gridded data would be a satellite image of forest density that has been digitized on a grid. An example of a point set would be the latitude/longitude coordinates of all elm trees in a particular plot of land. More complicated forms of data include marked point sets and spatial time series.
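As a rough illustration of how these data types might be held in memory, the following Python sketch uses plain lists and tuples. All names (forest_density, elm_trees, marked_elms) and all values are invented for illustration and do not come from any real data set.

```python
from typing import List, Tuple

# Gridded data: one scalar per cell of a regular grid (rows x columns).
# Hypothetical forest-density values in [0, 1], invented for illustration.
forest_density: List[List[float]] = [
    [0.10, 0.35, 0.40],
    [0.20, 0.55, 0.60],
]

# Point set: only coordinates are observed, here (latitude, longitude) pairs.
# Hypothetical elm-tree locations, invented for illustration.
elm_trees: List[Tuple[float, float]] = [
    (45.001, -93.201),
    (45.002, -93.203),
    (45.004, -93.202),
]

# Marked point set: each point carries an extra measurement (the "mark"),
# here a hypothetical trunk diameter in centimetres.
marked_elms: List[Tuple[Tuple[float, float], float]] = [
    (location, diameter)
    for location, diameter in zip(elm_trees, [31.0, 44.5, 28.2])
]

n_rows, n_cols = len(forest_density), len(forest_density[0])
```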

Measures of spatial central tendency

The coordinate-wise mean of a point set is the centroid, which solves the same variational problem in the plane (or higher-dimensional Euclidean space) that the familiar average solves on the real line: the centroid has the smallest possible average squared distance to all points in the set.
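The minimizing property of the centroid can be checked numerically. The sketch below is a minimal illustration for planar points; the function names and the five sample points are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty planar point set."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mean_squared_distance(points: List[Point], c: Point) -> float:
    """Average squared Euclidean distance from the points to a candidate centre c."""
    return sum((x - c[0]) ** 2 + (y - c[1]) ** 2 for x, y in points) / len(points)


# Illustrative point set (made up for this example).
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0), (2.0, 6.0)]
c = centroid(points)                                   # (2.0, 2.0)
at_centroid = mean_squared_distance(points, c)         # 8.0
shifted = mean_squared_distance(points, (2.5, 2.0))    # 8.25
```

Moving the candidate centre away from the centroid by a distance d raises the average squared distance by exactly d squared, which is why the shifted value exceeds the value at the centroid by 0.25.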

Measures of spatial dispersion

Dispersion captures the degree to which points in a point set are separated from each other. For most applications, spatial dispersion should be quantified in a way that is invariant to rotations and reflections. Several simple measures of spatial dispersion for a point set can be defined using the covariance matrix of the coordinates of the points. The trace, the determinant, and the largest eigenvalue of the covariance matrix can be used as measures of spatial dispersion.
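A small Python sketch of the three covariance-based measures for planar points follows. It assumes the covariance is computed by dividing by n (dividing by n - 1 is an equally common convention), uses the closed-form eigenvalue of a symmetric 2-by-2 matrix, and rotates the points to illustrate the invariance requirement. Function names and sample points are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def covariance_2d(points: List[Point]) -> Tuple[float, float, float]:
    """Return (s_xx, s_xy, s_yy), dividing by n (an assumed convention)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return sxx, sxy, syy


def dispersion_measures(points: List[Point]) -> Dict[str, float]:
    """Trace, determinant and largest eigenvalue of the 2x2 covariance matrix."""
    sxx, sxy, syy = covariance_2d(points)
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    half = trace / 2.0
    # Eigenvalues of a symmetric 2x2 matrix: trace/2 +/- sqrt((trace/2)^2 - det).
    largest = half + math.sqrt(max(half * half - det, 0.0))
    return {"trace": trace, "determinant": det, "largest_eigenvalue": largest}


def rotate(points: List[Point], angle: float) -> List[Point]:
    """Rotate every point about the origin by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Illustrative point set (made up for this example).
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0), (2.0, 6.0)]
original = dispersion_measures(points)
rotated = dispersion_measures(rotate(points, 0.7))
```

Up to floating-point rounding, the rotated and original measures agree. With the divide-by-n convention, the trace equals the average squared distance of the points from their centroid.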

A measure of spatial dispersion that is not based on the covariance matrix is the average distance between nearest neighbors.
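The average nearest-neighbor distance can be sketched with a brute-force search, which is adequate for small point sets (for large ones a spatial index would normally be used instead). The function name and the sample points are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mean_nearest_neighbor_distance(points: List[Point]) -> float:
    """Average, over all points, of the distance to the closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        # Brute force: compare p against every other point, O(n^2) overall.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


# Illustrative points: one isolated point and a tight pair.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
mean_nn = mean_nearest_neighbor_distance(points)  # (5 + 1 + 1) / 3
```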

Measures of spatial homogeneity

A homogeneous set of points in the plane is a set that is distributed such that approximately the same number of points occurs in any circular region of a given area. A set of points that lacks homogeneity is spatially clustered. A simple probability model for spatially homogeneous points is the Poisson process in the plane with constant intensity function.
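One way to get a feel for this model is to simulate it. The sketch below draws a homogeneous Poisson process on a rectangular window using only the Python standard library, then counts points in four non-overlapping discs of equal area. The window, intensity, seed, disc layout, and function names are all arbitrary choices made for illustration.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def simulate_poisson_points(intensity: float, width: float, height: float,
                            rng: random.Random) -> List[Point]:
    """Homogeneous Poisson process with constant intensity on [0,width] x [0,height]."""
    mean_count = intensity * width * height
    # Draw N ~ Poisson(mean_count) by counting unit-rate exponential
    # inter-arrival times that fit into an interval of length mean_count.
    n, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean_count:
        n += 1
        elapsed += rng.expovariate(1.0)
    # Given N, the points are independent and uniform on the window.
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


def count_in_disc(points: List[Point], centre: Point, radius: float) -> int:
    """Number of points within the given radius of the centre."""
    return sum(1 for p in points if math.dist(p, centre) <= radius)


rng = random.Random(0)  # fixed seed so the run is repeatable
points = simulate_poisson_points(intensity=500.0, width=1.0, height=1.0, rng=rng)

radius = 0.2
centres = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
counts = [count_in_disc(points, c, radius) for c in centres]
expected = 500.0 * math.pi * radius ** 2  # intensity times disc area
```

For homogeneous data the four counts should scatter around the expected value (intensity times disc area) with roughly Poisson variability; strongly unequal counts would hint at clustering.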

Ripley's K and L functions

Ripley's K and L functions are closely related descriptive statistics for detecting deviations from spatial homogeneity. The K function (technically its sample-based estimate) is defined as

$$\hat{K}(s) = \lambda^{-1} \sum_{i \neq j} \frac{I(d_{ij} < s)}{n},$$

where $d_{ij}$ is the Euclidean distance between the $i$th and $j$th points in a data set of $n$ points, $I$ is the indicator function (1 if its argument is true, 0 otherwise), and $\lambda$ is the average density of points, generally estimated as $n/A$, where $A$ is the area of the region containing all points. If the points are approximately homogeneous, $\hat{K}(s)$ should be approximately equal to $\pi s^2$.
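The sample K function can be sketched directly from its definition: count the ordered pairs of distinct points closer than s, then divide by the estimated density times n. The version below applies no edge correction, which practical implementations usually add; the function name, the sample points, and the assumed region area are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ripley_k(points: List[Point], s: float, area: float) -> float:
    """Naive sample K function at distance s, with no edge correction."""
    n = len(points)
    density = n / area  # estimate of the average density of points
    # Count ordered pairs (i, j), i != j, with distance strictly below s.
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) < s
    )
    return close_pairs / (density * n)


# Four corners of a unit square, assumed to lie in a region of area 4.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_small = ripley_k(corners, 0.5, area=4.0)   # no pair is closer than 0.5
k_side = ripley_k(corners, 1.1, area=4.0)    # 8 ordered pairs along the sides
k_all = ripley_k(corners, 1.5, area=4.0)     # all 12 ordered pairs
```

Because points near the boundary have part of their neighborhood outside the observed region, this uncorrected estimate tends to fall slightly below the homogeneous reference value at larger distances.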

For data analysis, the variance-stabilized Ripley K function, called the L function, is generally used. The sample version of the L function is defined as

$$\hat{L}(s) = \left( \frac{\hat{K}(s)}{\pi} \right)^{1/2}.$$

For approximately homogeneous data, the L function has expected value $s$ and its variance is approximately constant in $s$. A common plot is a graph of $s - \hat{L}(s)$ against $s$, which will approximately follow the horizontal zero-axis with constant dispersion if the data follow a homogeneous Poisson process.
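The quantity usually plotted, s minus the sample L function, can be computed as in the sketch below. It assumes the L function is the square root of the K estimate divided by pi, uses a naive K estimate without edge correction, and draws a fixed number of uniform points in the unit square as a stand-in for a homogeneous Poisson pattern. The names, seed, sample size, and distances are illustrative choices.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def ripley_k(points: List[Point], s: float, area: float) -> float:
    """Naive sample K function at distance s, with no edge correction."""
    n = len(points)
    # Count unordered pairs closer than s, then double for ordered pairs.
    unordered = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < s
    )
    return 2 * unordered / ((n / area) * n)


def ripley_l(points: List[Point], s: float, area: float) -> float:
    """Sample L function: square root of K divided by pi."""
    return math.sqrt(ripley_k(points, s, area) / math.pi)


rng = random.Random(1)  # fixed seed so the run is repeatable
points = [(rng.random(), rng.random()) for _ in range(300)]

distances = [0.02, 0.05, 0.10]
# Values to plot against s; near zero for approximately homogeneous data.
deviations = [s - ripley_l(points, s, area=1.0) for s in distances]
```

For clustered data the close-pair counts are inflated, so the deviations would drop clearly below zero at the clustering scale.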

See also

  • Geostatistics
  • Variogram
  • Correlogram
  • Kriging
  • Cuzick–Edwards test for clustering of sub-populations within clustered populations
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 