Predictive modelling
Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome. In many cases the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data: for example, given an email, determining how likely it is to be spam.
Models can use one or more classifiers to determine the probability that a set of data belongs to another set, for example spam or 'ham' (legitimate email).
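The text does not name a particular classifier. One common choice for spam filtering, used here purely as an illustration and not taken from the source, is multinomial naive Bayes. It estimates P(spam | words) from word counts in labelled examples. This is a minimal sketch that assumes whitespace tokenisation, add-one smoothing and a tiny hypothetical training set. The names `train` and `spam_probability` are illustrative only.

```python
import math
from collections import Counter

def train(docs):
    """Count words per class. docs is a list of (text, label) with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_probability(text, model):
    """Return the estimated P(spam | text) under a multinomial naive Bayes model."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        score = math.log(doc_counts[label] / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # add-one (Laplace) smoothing keeps unseen words from zeroing the product
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # normalise the two log scores into a probability without underflow
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

# Hypothetical training data, for illustration only
training = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(spam_probability("free cash prize", model))  # above 0.5 on this toy data
print(spam_probability("monday meeting", model))   # below 0.5 on this toy data
```

Working in log space avoids numerical underflow when messages are long. A production filter would also need a proper tokeniser and much more training data.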
Majority classifier
The majority classifier takes non-anomalous data and incorporates it within its calculations. This ensures that the results produced by the predictive modelling system are as valid as possible.
Logistic regression
Logistic regression is a technique in which unknown values of a discrete variable are predicted based on known values of one or more continuous and/or discrete variables. Logistic regression differs from ordinary least squares (OLS) regression in that the dependent variable is binary. This procedure has many applications. In biostatistics, a researcher may want to model the probability that a patient will be diagnosed with a certain type of cancer based on knowing, say, the incidence of that cancer in his or her family. In business, a marketer may want to model the probability that an individual will purchase a product based on the price of that product. Both of these are examples of a simple, binary logistic model. The model is "simple" in that each has only one independent, or predictor, variable. It is "binary" in that the dependent variable can take only one of two values: cancer or no cancer, and purchase or no purchase.
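The simple, binary purchase model described above can be sketched as follows. This minimal illustration assumes hypothetical price and purchase data. It fits the two coefficients of P(purchase | price) = 1 / (1 + exp(-(b0 + b1 · price))) using plain batch gradient ascent on the log-likelihood. This fitting method is chosen for readability; statistical packages typically use Newton-type solvers instead. The function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_simple_logistic(xs, ys, lr=0.1, epochs=20000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: price of a product and whether each individual purchased it (1) or not (0)
prices = [1, 2, 3, 4, 5, 6, 7, 8]
purchased = [1, 1, 1, 0, 1, 0, 0, 0]

b0, b1 = fit_simple_logistic(prices, purchased)
for price in (2, 5, 7):
    print(price, round(sigmoid(b0 + b1 * price), 3))
```

A negative `b1` means the estimated purchase probability falls as price rises. The toy data deliberately overlap (price 4 without a purchase, price 5 with one), so the maximum-likelihood coefficients are finite.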
Uplift modelling
Uplift modelling is a technique for modelling the change in probability caused by an action. Typically this is a marketing action such as an offer to buy a product, to use a product more, or to re-sign a contract. For example, in a retention campaign you wish to predict the change in the probability that a customer will remain a customer if they are contacted. A model of this change in probability allows the retention campaign to be targeted at those customers on whom the change will be beneficial. This lets the retention programme avoid triggering unnecessary churn or customer attrition, and avoid wasting money contacting people who would act anyway.
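The text does not say how the change in probability is estimated. One simple approach, assumed here, is sometimes called the two-model or difference-of-rates approach. Within each customer segment, it compares the retention rate of contacted customers with that of uncontacted customers, ideally using a randomised control group. The segments and counts below are hypothetical. In this data, segment B would stay anyway, and contacting segment C appears to trigger churn.

```python
from collections import defaultdict

def segment_uplift(records):
    """records: iterable of (segment, contacted, retained) with contacted/retained as bool.
    Returns {segment: P(retained | contacted) - P(retained | not contacted)}."""
    # counts[segment][contacted] = [number retained, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, contacted, retained in records:
        cell = counts[segment][contacted]
        cell[0] += int(retained)
        cell[1] += 1
    uplift = {}
    for segment, groups in counts.items():
        treated_kept, treated_n = groups[True]
        control_kept, control_n = groups[False]
        if treated_n == 0 or control_n == 0:
            continue  # a change cannot be estimated without both groups
        uplift[segment] = treated_kept / treated_n - control_kept / control_n
    return uplift

def make_group(segment, contacted, n, n_retained):
    """Build n hypothetical customer records, n_retained of whom stayed."""
    return [(segment, contacted, i < n_retained) for i in range(n)]

records = (
    make_group("A", True, 10, 8) + make_group("A", False, 10, 5)
    + make_group("B", True, 10, 9) + make_group("B", False, 10, 9)
    + make_group("C", True, 10, 4) + make_group("C", False, 10, 6)
)
uplift = segment_uplift(records)
targets = [s for s, u in uplift.items() if u > 0]  # contact only where contact appears to help
print(uplift, targets)
```

A standard churn model would rank customers by their probability of leaving. This sketch instead ranks segments by the estimated effect of contacting them, which is the quantity an uplift model targets.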
Archaeology
Predictive modelling in archaeology gets its foundations from Gordon Willey's mid-1950s work in the Virú Valley of Peru. Complete, intensive surveys were performed, and the covariability between cultural remains and natural features such as slope and vegetation was then determined. The development of quantitative methods and a greater availability of applicable data led to the growth of the discipline in the 1960s, and by the late 1980s substantial progress had been made by major land managers worldwide.
Generally, predictive modelling in archaeology consists of establishing statistically valid causal or covariable relationships between natural proxies and the presence of archaeological features. Such proxies include soil types, elevation, slope, vegetation, proximity to water, geology and geomorphology. By analysing these quantifiable attributes on land that has undergone archaeological survey, the “archaeological sensitivity” of unsurveyed areas can sometimes be anticipated from the natural proxies in those areas. Large land managers in the United States have successfully employed this strategy, including the Bureau of Land Management (BLM), the Department of Defense (DOD), and numerous highway and parks agencies. By using predictive modelling in their cultural resource management plans, they can make more informed decisions when planning activities that may require ground disturbance and so affect archaeological sites.
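This minimal sketch shows how relationships observed on surveyed land can be carried over to unsurveyed land. It assumes a single categorical proxy, a hypothetical distance-to-water band, together with hypothetical survey counts. Each unsurveyed land unit is scored by the observed share of surveyed units in the same class that contained a site. Real models combine several proxies and test statistical validity. This is not the method of any agency named above.

```python
from collections import defaultdict

def proxy_site_rates(surveyed):
    """surveyed: iterable of (proxy_class, site_present) for surveyed land units.
    Returns the observed share of units containing a site for each proxy class."""
    totals = defaultdict(lambda: [0, 0])  # proxy_class -> [units with sites, units surveyed]
    for proxy_class, site_present in surveyed:
        totals[proxy_class][0] += int(site_present)
        totals[proxy_class][1] += 1
    return {cls: sites / n for cls, (sites, n) in totals.items()}

def sensitivity(unsurveyed, rates, default=None):
    """Attach the surveyed-land site rate to each unsurveyed unit, keyed by its proxy class.
    Classes never seen in the survey get `default`, since no relationship was observed."""
    return {unit: rates.get(proxy_class, default) for unit, proxy_class in unsurveyed.items()}

# Hypothetical survey results: 10 units near water (4 with sites), 10 units far from water (1 with a site)
surveyed = [("near", i < 4) for i in range(10)] + [("far", i < 1) for i in range(10)]
rates = proxy_site_rates(surveyed)

# Hypothetical unsurveyed units and their proxy classes
unsurveyed = {"unit-17": "near", "unit-18": "far", "unit-19": "steep"}
scores = sensitivity(unsurveyed, rates)
print(rates, scores)
```

Unit 19 falls in a class that was never surveyed, so it receives no score rather than a guess. This mirrors the caveat that sensitivity can only "sometimes" be anticipated.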
Customer relationship management
Predictive modelling is used extensively in analytical customer relationship management and data mining to produce customer-level models that describe the likelihood that a customer will take a particular action. The actions are usually related to sales, marketing and customer retention.
For example, a large consumer organisation such as a mobile telecommunications operator will have a set of predictive models for product cross-sell, product deep-sell and churn. It is also now more common for such an organisation to have a model of savability using an uplift model. This predicts the likelihood that a customer can be saved at the end of a contract period (the change in churn probability). By contrast, the standard churn prediction model predicts the churn probability itself.
See also
- California Predictive Model
- Prediction interval
- Predictive analytics