CRISP-DM
CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes the approaches expert data miners commonly use to tackle problems. Polls conducted in 2002, 2004, and 2007 showed that it was the leading methodology used by data miners.
Major phases
CRISP-DM breaks the process of data mining into six major phases:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
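The phase sequence can be sketched as a small state model. This is a minimal, hypothetical Python sketch: the names `Phase`, `TRANSITIONS`, and `is_valid_path` are illustrative only. The backward moves (for example from Modeling back to Data Preparation) are an assumption taken from the feedback arrows in the CRISP-DM 1.0 process diagram, not from the list above.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in the order the methodology lists them."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumption: allowed moves follow the arrows of the CRISP-DM 1.0 diagram,
# where some phases can send the project back to an earlier phase.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),  # treated as terminal in this sketch
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # The straight run through all six phases is allowed.
    print(is_valid_path(list(Phase)))
    # Jumping from Business Understanding straight to Modeling is not.
    print(is_valid_path([Phase.BUSINESS_UNDERSTANDING, Phase.MODELING]))
```

Encoding the moves as a dictionary of sets keeps the iterative loops explicit, so a project log can be checked against the model one step at a time.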
History
CRISP-DM was conceived in 1996. In 1997 it got underway as a European Union project under the ESPRIT funding initiative. The project was led by four companies: SPSS, NCR Corporation, Daimler-Benz, and OHRA.
This core consortium brought different experiences to the project. ISL (Integral Solutions Ltd) was later acquired by and merged into SPSS Inc. The computer giant NCR Corporation produced the Teradata data warehouse and its own data mining software. Daimler-Benz had a significant data mining team. OHRA, an insurance company, was just starting to explore the potential use of data mining.
The first version of the methodology was released as CRISP-DM 1.0 in 1999.
CRISP-DM 2.0
In July 2006 the consortium announced that it would begin work on a second version of CRISP-DM. On 26 September 2006, the CRISP-DM Special Interest Group (SIG) met to discuss potential enhancements for CRISP-DM 2.0 and the subsequent roadmap. However, these efforts appear to have stalled. The SIG has not met, updated the CRISP website, or communicated anything to members since early 2007. As of June 22, 2011, the website redirects to an IBM page about SPSS.
Advantages
- Industry neutral
- Tool neutral
- Closely related to the Knowledge Discovery in Databases Process Model
- Anchors the data mining process
External links
- CRoss Industry Standard Process for Data Mining Blog
- Le site des dataminers: article published by Pascal BIZZARI, May 2009
- The Data Mining Group (DMG): The DMG is an independent, vendor-led group that develops data mining standards, such as the Predictive Model Markup Language (PMML)