Cobweb (clustering)
Encyclopedia
COBWEB is an incremental system for hierarchical conceptual clustering
.
COBWEB incrementally organizes observations into a classification tree. Each node in a classification tree represents a class (concept) and is labeled by a probabilistic concept that summarizes the attribute-value distributions of objects classified under the node. This classification tree can be used to predict missing attributes or the class of a new object.
There are four basic operations COBWEB employs in building the classification tree. Which operation is selected depends on the category utility
of the classification achieved by applying it. The operations are:
Input: A COBWEB node root, an instance to insert record
if root has no children then
children := {copy(root)}
newcategory(record) \\ adds child with record’s feature values.
insert(record, root) \\ update root’s statistics
else
insert(record, root)
for child in root’s children do
calculate Category Utility for insert(record, child),
set best1, best2 children w. best CU.
end for
if newcategory(record) yields best CU then
newcategory(record)
else if merge(best1, best2) yields best CU then
merge(best1, best2)
COBWEB(root, record)
else if split(best1) yields best CU then
split(best1)
COBWEB(root, record)
else
COBWEB(best1, record)
end if
end
Conceptual clustering
Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating...
.
COBWEB incrementally organizes observations into a classification tree. Each node in a classification tree represents a class (concept) and is labeled by a probabilistic concept that summarizes the attribute-value distributions of objects classified under the node. This classification tree can be used to predict missing attributes or the class of a new object.
There are four basic operations COBWEB employs in building the classification tree. Which operation is selected depends on the category utility
Category utility
Category utility is a measure of "category goodness" defined in and . It was intended to supersede more limited measures of category goodness such as "cue validity" and "collocation index"...
of the classification achieved by applying it. The operations are:
- Merging Two Nodes
Merging two nodes means replacing them by a node whose children is the union of the original nodes' sets of children and which summarizes the attribute-value distributions of all objects classified under them.
- Splitting a node
A node is split by replacing it with its children.
- Inserting a new node
A node is created corresponding to the object being inserted into the tree.
- Passing an object down the hierarchy
Effectively calling the COBWEB algorithm on the object and the subtree rooted in the node.
The COBWEB Algorithm
COBWEB(root, record):Input: A COBWEB node root, an instance to insert record
if root has no children then
children := {copy(root)}
newcategory(record) \\ adds child with record’s feature values.
insert(record, root) \\ update root’s statistics
else
insert(record, root)
for child in root’s children do
calculate Category Utility for insert(record, child),
set best1, best2 children w. best CU.
end for
if newcategory(record) yields best CU then
newcategory(record)
else if merge(best1, best2) yields best CU then
merge(best1, best2)
COBWEB(root, record)
else if split(best1) yields best CU then
split(best1)
COBWEB(root, record)
else
COBWEB(best1, record)
end if
end