Cluster analysis (in marketing)
Cluster analysis is a class of statistical techniques that can be applied to data that exhibit “natural” groupings. Cluster analysis sorts through the raw data and groups them into clusters. A cluster is a group of relatively homogeneous cases or observations. Objects in a cluster are similar to each other. They are also dissimilar to objects outside the cluster, particularly objects in other clusters.
The diagram below illustrates the results of a survey that studied drinkers’ perceptions of spirits (alcohol). Each point represents the results from one respondent. The research indicates there are four clusters in this market.
Another example is the vacation travel market. Recent research has identified three clusters or market segments. They are: 1) the demanders - they want exceptional service and expect to be pampered; 2) the escapists - they want to get away and just relax; 3) the educationalists - they want to see new things, go to museums, go on a safari, or experience new cultures.
Cluster analysis, like factor analysis and multi-dimensional scaling, is an interdependence technique: it makes no distinction between dependent and independent variables. The entire set of interdependent relationships is examined. It is similar to multi-dimensional scaling in that both examine inter-object similarity by examining the complete set of interdependent relationships. The difference is that multi-dimensional scaling identifies underlying dimensions, while cluster analysis identifies clusters. Cluster analysis is the obverse of factor analysis. Whereas factor analysis reduces the number of variables by grouping them into a smaller set of factors, cluster analysis reduces the number of observations or cases by grouping them into a smaller set of clusters.
In marketing, cluster analysis is used for
- Segmenting the market and determining target markets
- Product positioning and new product development
- Selecting test markets (see: experimental techniques)
Basic procedure
- Formulate the problem - select the variables to which you wish to apply the clustering technique
- Select a distance measure - various ways of computing distance:
  - Squared Euclidean distance - the sum of the squared differences in value for each variable
  - Manhattan distance - the sum of the absolute differences in value for each variable
  - Chebyshev distance - the maximum absolute difference in values for any variable
  - Mahalanobis distance - this measure takes the correlations between the variables into account (through their covariance matrix) when measuring the distance between observations. This is an important measure since it is unit invariant (can figuratively compare apples to oranges)
- Select a clustering procedure (see below)
- Decide on the number of clusters
- Map and interpret clusters - draw conclusions - illustrative techniques like perceptual maps, icicle plots, and dendrograms are useful
- Assess reliability and validity - various methods:
  - repeat the analysis but use a different distance measure
  - repeat the analysis but use a different clustering technique
  - split the data randomly into two halves and analyze each part separately
  - repeat the analysis several times, deleting one variable each time
  - repeat the analysis several times, using a different ordering of the cases each time
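As a rough illustration of the simpler distance measures listed above, the following Python sketch computes the squared Euclidean, Manhattan, and Chebyshev distances between two observations. The function names and the two sample respondents are assumptions made up for this example; they do not come from any survey. Mahalanobis distance is left out because it also needs the inverse covariance matrix of the variables.

```python
def squared_euclidean(a, b):
    """Sum of the squared differences in value for each variable."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of the absolute differences in value for each variable."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Maximum absolute difference in value for any variable."""
    return max(abs(x - y) for x, y in zip(a, b))


# Two hypothetical respondents rated on three variables (made-up values).
respondent_1 = [1, 5, 3]
respondent_2 = [4, 1, 3]

print(squared_euclidean(respondent_1, respondent_2))  # 9 + 16 + 0 = 25
print(manhattan(respondent_1, respondent_2))          # 3 + 4 + 0 = 7
print(chebyshev(respondent_1, respondent_2))          # max(3, 4, 0) = 4
```

Note that all three measures depend on the units of the variables, so in practice the variables would usually be put on a comparable scale before clustering.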
Clustering procedures
There are several types of clustering methods:
- Non-Hierarchical clustering (also called k-means clustering)
  - first determine a cluster center, then group all objects that are within a certain distance
  - examples:
    - Sequential Threshold method - first determine a cluster center, then group all objects that are within a predetermined threshold from the center - one cluster is created at a time
    - Parallel Threshold method - several cluster centers are determined simultaneously, then objects that are within a predetermined threshold from the centers are grouped
    - Optimizing Partitioning method - first a non-hierarchical procedure is run, then objects are reassigned so as to optimize an overall criterion
- Hierarchical clustering
  - objects are organized into a hierarchical structure as part of the procedure
  - examples:
    - Divisive clustering - start by treating all objects as if they are part of a single large cluster, then divide the cluster into smaller and smaller clusters
    - Agglomerative clustering - start by treating each object as a separate cluster, then group them into bigger and bigger clusters
      - examples:
        - Centroid methods - clusters are merged based on the distance between their centers (a centroid is the mean value for all the objects in the cluster)
        - Variance methods - clusters are generated that minimize the within-cluster variance
          - example:
            - Ward’s Procedure - clusters are generated that minimize the squared Euclidean distance to the cluster means
        - Linkage methods - cluster objects based on the distance between them
          - examples:
            - Single Linkage method - cluster objects based on the minimum distance between them (also called the nearest neighbour rule)
            - Complete Linkage method - cluster objects based on the maximum distance between them (also called the furthest neighbour rule)
            - Average Linkage method - cluster objects based on the average distance between all pairs of objects, where each pair takes one object from each of the two clusters
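The agglomerative linkage methods described above can be sketched in a few lines of Python. This is a minimal illustration, not an efficient or definitive implementation: it assumes squared Euclidean distance, and the names `cluster_distance` and `agglomerate` and the sample points are hypothetical, chosen only for this example.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Sum of the squared differences in value for each variable."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters (lists of points) under a linkage rule."""
    # Every pair takes one object from each of the two clusters.
    pair_distances = [squared_euclidean(p, q) for p in c1 for q in c2]
    if linkage == "single":      # nearest neighbour rule
        return min(pair_distances)
    if linkage == "complete":    # furthest neighbour rule
        return max(pair_distances)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, n_clusters, linkage="single"):
    """Start with one cluster per object and merge until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the two clusters that are closest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], linkage
            ),
        )
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up observations on two variables, forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
result = agglomerate(points, n_clusters=2, linkage="average")
print(result)
# [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

On well-separated data like this, all three linkage rules give the same two clusters; they tend to differ when the groups are elongated or overlap, which is one reason the procedure above suggests repeating the analysis with a different clustering technique.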
See also
- marketing
- marketing research
- factor analysis
- multi-dimensional scaling
- quantitative marketing research
- positioning
- perceptual mapping