Random multinomial logit
In statistics and machine learning, random multinomial logit (RMNL) is a technique for (multi-class) statistical classification using repeated multinomial logit analyses via Leo Breiman's random forests.
Rationale for the new method
Several learning algorithms have been proposed to handle multiclass classification. While some algorithms are extensions or combinations of intrinsically binary classification methods (e.g., multiclass classifiers built from one-versus-one or one-versus-all binary classifiers), others, such as multinomial logit (MNL), are specifically designed to map features to a multiclass output vector. MNL has a proven track record of stability in many disciplines, including transportation research and customer relationship management (CRM). Unfortunately, MNL cannot overcome the curse of dimensionality, which implicitly necessitates feature selection, i.e., the selection of a best subset of variables from the input feature set. In contrast to binary logit, software packages to date mostly lack any feature selection algorithm for MNL. This absence is a problem in several application areas.
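For reference, the mapping that MNL performs from a feature vector to class probabilities can be written as below. The notation is an assumption chosen for illustration and is not taken from the source: K is the number of classes, x is the feature vector with p entries plus a constant, and each beta_k is a class-specific coefficient vector, with the last class used as the reference category.

```latex
P(y = k \mid \mathbf{x})
  = \frac{\exp\!\left(\boldsymbol{\beta}_k^{\top}\mathbf{x}\right)}
         {\sum_{j=1}^{K} \exp\!\left(\boldsymbol{\beta}_j^{\top}\mathbf{x}\right)},
\qquad k = 1, \dots, K,
\qquad \boldsymbol{\beta}_K = \mathbf{0}.
```

Under this parameterization the model estimates (K - 1)(p + 1) coefficients, so the number of parameters grows with both the number of classes and the number of features, which is one way to see why large feature sets are a problem for MNL.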
Random forests, i.e., classifiers that combine a forest of decision trees grown on random input vectors and that split nodes on a random subset of features, have been introduced for the classification of binary and multiclass outputs. Feature selection is implicitly incorporated during the construction of each tree. RMNL, a random forest of multinomial logit models, attempts to overcome the feature selection difficulty of MNL.
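The idea of a "forest" of multinomial logit models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published RMNL procedure: each model is fitted on a bootstrap sample restricted to a random subset of features, and the ensemble prediction is the plain average of the per-model class probabilities. The function names (`fit_mnl`, `fit_rmnl`, `predict_rmnl`), the gradient-descent fitting, the averaging rule, and the toy data are all hypothetical choices made for this sketch.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_mnl(W, row):
    """Class probabilities of one MNL model; the last weight is the intercept."""
    xb = row + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])


def fit_mnl(X, y, n_classes, epochs=100, lr=0.2):
    """Fit one multinomial logit model by batch gradient descent on the log-loss."""
    width = len(X[0]) + 1
    W = [[0.0] * width for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * width for _ in range(n_classes)]
        for row, label in zip(X, y):
            probs = predict_mnl(W, row)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(row + [1.0]):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(width):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def fit_rmnl(X, y, n_classes, n_models=12, n_sub=3, seed=0):
    """Fit n_models MNLs, each on a bootstrap sample and n_sub random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    ensemble = []
    for _ in range(n_models):
        feats = sorted(rng.sample(range(p), n_sub))   # random feature subset
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        Xb = [[X[i][j] for j in feats] for i in idx]
        yb = [y[i] for i in idx]
        ensemble.append((feats, fit_mnl(Xb, yb, n_classes)))
    return ensemble


def predict_rmnl(ensemble, row, n_classes):
    """Assumption: aggregate by averaging probabilities over all models."""
    avg = [0.0] * n_classes
    for feats, W in ensemble:
        probs = predict_mnl(W, [row[j] for j in feats])
        for k in range(n_classes):
            avg[k] += probs[k] / len(ensemble)
    return avg


def make_toy_data(n_per_class, seed):
    """Three classes: two informative features plus three pure-noise features."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            informative = [rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
            X.append(informative + noise)
            y.append(label)
    return X, y


X_train, y_train = make_toy_data(20, seed=1)
X_test, y_test = make_toy_data(10, seed=2)
ensemble = fit_rmnl(X_train, y_train, n_classes=3)

hits = 0
for row, label in zip(X_test, y_test):
    probs = predict_rmnl(ensemble, row, 3)
    hits += int(probs.index(max(probs)) == label)
accuracy = hits / len(y_test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

Because every member model sees only `n_sub` of the features, no single MNL has to be estimated on the full feature set; the number of models and the subset size are tuning choices in this sketch rather than recommended values.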
Application
The developers of the RMNL technique (Prinzie & Van den Poel, 2008) show in their application paper the usefulness of the technique for cross-sell analysis in customer relationship management.