Part-based models
Encyclopedia
Part based models refers to a broad class of detection algorithms used on images, in which various parts of the image are used separately in order to determine if and where an object of interest exists. Amongst these methods a very popular one is the constellation model
which refers to those schemes which seek to detect a small number of features and their relative positions to then determine whether or not the object of interest is present.
These models build on the original idea of Fischler and Elschlager of using the relative position of a few template matches and evolve in complexity in the work of Perona and others. These models will be covered in the constellation models section. To get a better idea of what is meant by constellation model an example may be more illustrative. Say we are trying to detect faces. A constellation model would use smaller part detectors, for instance mouth, nose and eye detectors and make a judgment about whether an image has a face based on the relative positions in which the components fire.
Early efforts, such as those by Yuille, Hallinan and Cohen sought to detect facial features and fit deformable templates to them. These templates were mathematically defined outlines which sought to capture the position and shape of the feature. Yuille, Hallinan and Cohen’s algorithm does have trouble finding the global minimum
fit for a given model and so templates did occasionally become mismatched.
Later efforts such as those by Poggio and Brunelli focus on building specific detectors for each feature. They use successive detectors to estimate scale, position, etc. and narrow the search field to be used by the next detector. As such it is a part based model, however, they seek more to recognize specific faces rather than to detect the presence of a face. They do so by using each detector to build a 35 element vector of characteristics of a given face. These characteristic can then be compared to recognize specific faces, however cut-offs can also be used to detect whether a face is present at all.
Cootes, Lanitis and Taylor build on this work in constructing a 100 element representation of the primary features of a face. The model is more detailed and robust however, given the additional complexity (100 elements compared to 35) this might be expected. The model essentially computes deviations from a mean face in terms of shape, orientation and gray level. The model is matched by the minimization of an error function
. These three classes of algorithms naturally fall within the scope of template matching
Of the non-constellation perhaps the most successful is that of Leibe and Schiele. Their algorithm finds templates associated with positive examples and records both the template (an average of the feature in all positive examples where it’s present) and the position of the center of the item (a face for instance) relative to the template. The algorithm then takes a test image and runs an interest point locater (hopefully one of the scale invariant
variety). These interest points are then compared to each template and the probability of a match is computed. All templates then cast votes for the center of the detected object proportional to the probability of the match, and the probability the template predicts the center. These votes are all summed and if there are enough of them, well enough clustered, the presence of the object in question (i.e. a face or car) is predicted.
The algorithm is effective because it imposes much less constellational rigidity the way the constellation model does. Admittedly the constellation model can be modified to allow for occlusions and other large abnormalities but this model is naturally suited to it. Also it must be said that sometimes the more rigid structure of the constellation is desired.
Constellation model
The constellation model is a probabilistic, generative model for category-level object recognition in computer vision. Like other part-based models, the constellation model attempts to represent an object class by a set of N parts under mutual geometric constraints...
which refers to those schemes which seek to detect a small number of features and their relative positions to then determine whether or not the object of interest is present.
These models build on the original idea of Fischler and Elschlager of using the relative position of a few template matches and evolve in complexity in the work of Perona and others. These models will be covered in the constellation models section. To get a better idea of what is meant by constellation model an example may be more illustrative. Say we are trying to detect faces. A constellation model would use smaller part detectors, for instance mouth, nose and eye detectors and make a judgment about whether an image has a face based on the relative positions in which the components fire.
Non-constellation models
Many overlapping ideas are included under the title part-based models even after having excluded those models of the constellation variety. The uniting thread is the use of small parts to build up to an algorithm that can detect/recognize an item (face, car, etc.)Early efforts, such as those by Yuille, Hallinan and Cohen sought to detect facial features and fit deformable templates to them. These templates were mathematically defined outlines which sought to capture the position and shape of the feature. Yuille, Hallinan and Cohen’s algorithm does have trouble finding the global minimum
Maxima and minima
In mathematics, the maximum and minimum of a function, known collectively as extrema , are the largest and smallest value that the function takes at a point either within a given neighborhood or on the function domain in its entirety .More generally, the...
fit for a given model and so templates did occasionally become mismatched.
Later efforts such as those by Poggio and Brunelli focus on building specific detectors for each feature. They use successive detectors to estimate scale, position, etc. and narrow the search field to be used by the next detector. As such it is a part based model, however, they seek more to recognize specific faces rather than to detect the presence of a face. They do so by using each detector to build a 35 element vector of characteristics of a given face. These characteristic can then be compared to recognize specific faces, however cut-offs can also be used to detect whether a face is present at all.
Cootes, Lanitis and Taylor build on this work in constructing a 100 element representation of the primary features of a face. The model is more detailed and robust however, given the additional complexity (100 elements compared to 35) this might be expected. The model essentially computes deviations from a mean face in terms of shape, orientation and gray level. The model is matched by the minimization of an error function
Error function
In mathematics, the error function is a special function of sigmoid shape which occurs in probability, statistics and partial differential equations...
. These three classes of algorithms naturally fall within the scope of template matching
Template matching
Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used in manufacturing as a part of quality control, a way to navigate a mobile robot, or as a way to detect edges in images....
Of the non-constellation perhaps the most successful is that of Leibe and Schiele. Their algorithm finds templates associated with positive examples and records both the template (an average of the feature in all positive examples where it’s present) and the position of the center of the item (a face for instance) relative to the template. The algorithm then takes a test image and runs an interest point locater (hopefully one of the scale invariant
Scale invariance
In physics and mathematics, scale invariance is a feature of objects or laws that do not change if scales of length, energy, or other variables, are multiplied by a common factor...
variety). These interest points are then compared to each template and the probability of a match is computed. All templates then cast votes for the center of the detected object proportional to the probability of the match, and the probability the template predicts the center. These votes are all summed and if there are enough of them, well enough clustered, the presence of the object in question (i.e. a face or car) is predicted.
The algorithm is effective because it imposes much less constellational rigidity the way the constellation model does. Admittedly the constellation model can be modified to allow for occlusions and other large abnormalities but this model is naturally suited to it. Also it must be said that sometimes the more rigid structure of the constellation is desired.