Fuzzy clustering
Fuzzy clustering is a class of algorithms for cluster analysis in which the allocation of data points to clusters is not "hard" (all-or-nothing) but "fuzzy" in the same sense as fuzzy logic.
Explanation of clustering
Data clustering is the process of dividing data elements into classes or clusters so that items in the same class are as similar as possible, and items in different classes are as dissimilar as possible. Depending on the nature of the data and the purpose for which clustering is being used, different measures of similarity may be used to place items into classes, where the similarity measure controls how the clusters are formed. Some examples of measures that can be used in clustering include distance, connectivity, and intensity.
In hard clustering, data is divided into distinct clusters, where each data element belongs to exactly one cluster. In fuzzy clustering (also referred to as soft clustering), data elements can belong to more than one cluster, and associated with each element is a set of membership levels. These indicate the strength of the association between that data element and a particular cluster. Fuzzy clustering is a process of assigning these membership levels, and then using them to assign data elements to one or more clusters.
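The sketch below makes the difference between the two kinds of assignment concrete. It starts from a hypothetical membership matrix with one row per data element and one column per cluster, where each row sums to 1. A hard assignment keeps only each element's highest membership. A fuzzy assignment keeps every cluster whose membership reaches a threshold. The matrix values, the function names and the threshold of 0.3 are illustrative assumptions, not part of any particular algorithm.

```python
from typing import List

# Hypothetical membership levels: rows are data elements, columns are clusters.
# Each row sums to 1.
memberships: List[List[float]] = [
    [0.90, 0.10],  # element 0: almost entirely in cluster 0
    [0.50, 0.50],  # element 1: equally associated with both clusters
    [0.20, 0.80],  # element 2: mostly in cluster 1
]


def hard_assignment(u: List[List[float]]) -> List[int]:
    """Assign each element to exactly one cluster: the one with the highest membership.

    Ties go to the lowest cluster index, because max() returns the first maximum.
    """
    return [max(range(len(row)), key=lambda j: row[j]) for row in u]


def fuzzy_assignment(u: List[List[float]], threshold: float = 0.3) -> List[List[int]]:
    """Assign each element to every cluster whose membership is at least `threshold`.

    The threshold is an illustrative choice; an element may end up in one or more clusters.
    """
    return [[j for j, level in enumerate(row) if level >= threshold] for row in u]


print(hard_assignment(memberships))   # [0, 0, 1]
print(fuzzy_assignment(memberships))  # [[0], [0, 1], [1]]
```

Element 1 shows what the hard assignment throws away. Its equal association with both clusters collapses to cluster 0, while the fuzzy assignment keeps it in both.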
One of the most widely used fuzzy clustering algorithms is the Fuzzy C-Means (FCM) algorithm (Bezdek 1981). The FCM algorithm attempts to partition a finite collection of n elements $X = \{x_1, \dots, x_n\}$ into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centres $C = \{c_1, \dots, c_c\}$ and a partition matrix $U = (u_{ij})$, with $u_{ij} \in [0, 1]$ and $\sum_{j=1}^{c} u_{ij} = 1$ for each $i$. Each element $u_{ij}$ tells the degree to which element $x_i$ belongs to cluster $c_j$. Like the k-means algorithm, FCM aims to minimize an objective function. The standard function is:

$$J(U, C) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2, \qquad u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}$$

This differs from the k-means objective function by the addition of the membership values $u_{ij}$ and the fuzzifier $m$. The fuzzifier $m$ determines the level of cluster fuzziness. A large $m$ results in smaller memberships $u_{ij}$ and hence fuzzier clusters. In the limit $m \to 1$, the memberships $u_{ij}$ converge to 0 or 1, which implies a crisp partitioning. In the absence of experimentation or domain knowledge, $m$ is commonly set to 2. The basic FCM algorithm takes three inputs: n data points $(x_1, \dots, x_n)$ to be clustered, a number c of clusters with centres $(c_1, \dots, c_c)$, and the level of cluster fuzziness $m$, with $m \in \mathbb{R}$ and $m > 1$.
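The objective function and the membership values $u_{ij}$ can be computed directly. The sketch below is pure Python and assumes Euclidean distance; the function names `memberships` and `objective` are hypothetical. It also shows the effect of the fuzzifier on a single point that lies twice as far from one centre as from the other.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def memberships(x: Point, centres: List[Point], m: float) -> List[float]:
    """Membership of point x in each cluster: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)).

    Assumes m > 1 and that x does not coincide with any centre (all distances > 0).
    """
    d = [math.dist(x, c) for c in centres]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def objective(xs: List[Point], centres: List[Point], u: List[List[float]], m: float) -> float:
    """J = sum_i sum_j u[i][j] ** m * ||x_i - c_j|| ** 2."""
    return sum(
        u[i][j] ** m * math.dist(x, c) ** 2
        for i, x in enumerate(xs)
        for j, c in enumerate(centres)
    )


centres = [(0.0,), (3.0,)]
x = (1.0,)  # distance 1 to the first centre, 2 to the second
for m in (1.1, 2.0, 10.0):
    u = memberships(x, centres, m)
    print(m, [round(v, 3) for v in u])
```

With these inputs, $m = 2$ gives memberships 0.8 and 0.2. A value close to 1 ($m = 1.1$) gives essentially 1 and 0, a nearly crisp split. A large value ($m = 10$) pulls both towards one half, at roughly 0.54 and 0.46.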
Fuzzy c-means clustering
In fuzzy clustering, each point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging completely to just one cluster. Thus, points on the edge of a cluster may be in the cluster to a lesser degree than points in the center of the cluster. An overview and comparison of different fuzzy clustering algorithms is available.
Any point x has a set of coefficients $w_k(x)$ giving its degree of being in the kth cluster. With fuzzy c-means, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster:

$$c_k = \frac{\sum_{x} w_k(x)^{m} \, x}{\sum_{x} w_k(x)^{m}}$$
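The weighted-mean centroid can be computed in a few lines. In this sketch, which is pure Python with a hypothetical function name `centroid`, each point's weight is its degree of belonging raised to the fuzzifier m, as in standard fuzzy c-means.

```python
from typing import List, Sequence


def centroid(points: List[Sequence[float]], w: List[float], m: float = 2.0) -> List[float]:
    """Centre of one cluster: sum_x w(x) ** m * x / sum_x w(x) ** m.

    `w[i]` is the degree to which points[i] belongs to this cluster.
    """
    weights = [wi ** m for wi in w]
    total = sum(weights)
    dim = len(points[0])
    return [sum(wt * p[d] for wt, p in zip(weights, points)) / total for d in range(dim)]


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
w = [0.9, 0.9, 0.1]  # the far point belongs only weakly to this cluster
print(centroid(points, w))
```

Here the centre lands at about x = 1.055, close to the two strongly belonging points. The plain mean of the three points would be x = 4. Raising the weights to the power m further shrinks the influence of weakly belonging points.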
The degree of belonging, $w_k(x)$, is related inversely to the distance from x to the cluster centre as calculated on the previous pass. It also depends on the parameter $m$, which controls how much weight is given to the closest centre:

$$w_k(x) = \frac{1}{\sum_{j=1}^{c} \left( \frac{\lVert x - c_k \rVert}{\lVert x - c_j \rVert} \right)^{2/(m-1)}}$$

The fuzzy c-means algorithm is very similar to the k-means algorithm:
- Choose a number of clusters.
- Randomly assign to each point coefficients for being in the clusters.
- Repeat until the algorithm has converged (that is, the coefficients' change between two iterations is no more than $\varepsilon$, the given sensitivity threshold):
  - Compute the centroid for each cluster, using the formula above.
  - For each point, compute its coefficients of being in the clusters, using the formula above.
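The steps above can be sketched end to end in pure Python. This is a minimal illustration, not a reference implementation. It makes the following assumptions:

- Euclidean distance and $m > 1$.
- A seeded random initialization.
- Convergence measured as the largest change in any coefficient between two iterations, compared with the threshold `eps`.

The names `fcm`, `eps`, `max_iter` and `seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fcm(points: List[Point], c: int, m: float = 2.0, eps: float = 1e-5,
        max_iter: int = 300, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Fuzzy c-means sketch. Returns (centres, u), where u[i][j] is the membership
    of points[i] in cluster j. Assumes Euclidean distance and m > 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Randomly assign coefficients, normalised so each point's memberships sum to 1.
    u: List[List[float]] = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    p = 2.0 / (m - 1.0)
    centres: List[List[float]] = []
    for _ in range(max_iter):
        # Centroids: mean of all points weighted by membership ** m.
        centres = []
        for j in range(c):
            wts = [u[i][j] ** m for i in range(n)]
            total = sum(wts)
            centres.append([sum(wts[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])

        # Coefficients: recompute memberships from distances to the new centroids.
        new_u: List[List[float]] = []
        for x in points:
            dist = [math.dist(x, cj) for cj in centres]
            if 0.0 in dist:
                # x coincides with a centroid: share full membership among the
                # coinciding clusters instead of dividing by zero.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dj / dk) ** p for dk in dist) for dj in dist])

        # Converged when no coefficient changed by more than eps.
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if change <= eps:
            break
    return centres, u


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, u = fcm(data, c=2)
for x, row in zip(data, u):
    print(x, [round(v, 2) for v in row])
```

With the two well-separated groups in `data`, each point should end up with a membership above 0.9 in one cluster. Which cluster index corresponds to which group depends on the random initialization. Because of this dependence on the starting coefficients, a common practice is to run the algorithm with several seeds and keep the result with the lowest objective value.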
The algorithm minimizes intra-cluster variance as well, but has the same problems as k-means; the minimum is a local minimum, and the results depend on the initial choice of weights.
The expectation-maximization algorithm is a more statistically formalized method that includes some of these ideas, such as partial membership in classes.
Fuzzy c-means has been an important tool in image processing for clustering objects in an image. In the 1970s, mathematicians introduced a spatial term into the FCM algorithm to improve the accuracy of clustering under noise.
See also
- FLAME clustering
- Cluster Analysis
- Expectation-maximization algorithm (a similar, but more statistically formalized method)