Educational data mining
Educational data mining (EDM) is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. A key area of EDM is mining computer logs of student performance; another is mining enrollment data. Key uses of EDM include predicting student performance and studying learning in order to recommend improvements to current educational practice. EDM can be considered one of the learning sciences, as well as an area of data mining. A related field is learning analytics.
EDM methods
The types of EDM methods are related to those found in data mining in general, but with some differences based on the unique features of educational data.
Ryan Baker classifies the areas of EDM as follows:
- Prediction
  - Classification
  - Regression
  - Density estimation
- Clustering
- Relationship mining
  - Association rule mining
  - Correlation mining
  - Sequential pattern mining
  - Causal data mining
- Distillation of data for human judgment
- Discovery with models
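As a rough illustration of one relationship mining method from the list above, the sketch below computes the two standard measures used in association rule mining, support and confidence, for a single candidate rule. The data set, the skill names, and the helper functions `support` and `confidence` are hypothetical and serve only to show the arithmetic; they are not drawn from any EDM study.

```python
# Hypothetical data: each set lists the skills one student made errors on.
student_errors = [
    {"fractions", "decimals"},
    {"fractions", "decimals", "ratios"},
    {"fractions"},
    {"decimals", "ratios"},
    {"fractions", "decimals"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from co-occurrence counts."""
    n_antecedent = count_containing(antecedent, transactions)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs in the data")
    both = set(antecedent) | set(consequent)
    return count_containing(both, transactions) / n_antecedent


# Candidate rule: "errors on fractions => errors on decimals".
rule_support = support({"fractions", "decimals"}, student_errors)
rule_confidence = confidence({"fractions"}, {"decimals"}, student_errors)
print(rule_support, rule_confidence)
```

A full association rule miner would enumerate many candidate itemsets and keep only those rules above chosen support and confidence thresholds; this sketch evaluates one rule by hand.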
Baker and Kalina Yacef claim that discovery with models is particularly prominent in EDM, as compared to data mining in general. In discovery with models, a model of a phenomenon is developed through any process that can be validated in some fashion (most commonly, prediction or knowledge engineering), and this model is then used as a component in another analysis, such as prediction or relationship mining.
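The two-stage idea can be sketched in a few lines. In the example below, which is only an illustration under stated assumptions, the first stage is a hand-written (knowledge-engineered) model that labels fast, incorrect actions as rapid guessing, and the second stage uses that model's output as a variable in a simple correlation analysis against post-test scores. The log data, the two-second threshold, and the names `rapid_guess_rate` and `pearson` are all hypothetical, and the validation step that a real study would require for the first-stage model is omitted.

```python
from math import sqrt

# Hypothetical tutor log: per student, a list of (seconds_on_step, was_correct).
logs = {
    "s1": [(1.0, False), (0.8, False), (12.0, True), (9.0, True)],
    "s2": [(15.0, True), (11.0, True), (14.0, False), (10.0, True)],
    "s3": [(0.9, False), (1.2, False), (0.7, False), (8.0, True)],
    "s4": [(13.0, True), (1.1, False), (16.0, True), (12.0, True)],
}
# Hypothetical post-test scores (proportion correct) for the same students.
post_test = {"s1": 0.55, "s2": 0.90, "s3": 0.40, "s4": 0.80}


def rapid_guess_rate(actions, max_seconds=2.0):
    """Stage 1 model: share of actions that are both fast and incorrect.

    The threshold is an assumed value; a real model would be validated
    against human-labelled data before being reused.
    """
    flagged = sum(1 for secs, ok in actions if secs < max_seconds and not ok)
    return flagged / len(actions)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Stage 2: the model's output becomes a variable in a relationship analysis.
students = sorted(logs)
rates = [rapid_guess_rate(logs[s]) for s in students]
scores = [post_test[s] for s in students]
r = pearson(rates, scores)
print(rates, r)
```

In this toy data the correlation is strongly negative, but that reflects only how the example values were chosen, not any empirical finding.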
Applications
A list of the primary applications of EDM is provided by Cristobal Romero and Sebastian Ventura. In their taxonomy, the areas of EDM application are:
- Analysis and visualization of data
- Providing feedback for supporting instructors
- Recommendations for students
- Predicting student performance
- Student modeling
- Detecting undesirable student behaviors
- Grouping students
- Social network analysis
- Developing concept maps
- Constructing courseware
- Planning and scheduling
Publication Venues
Considerable amounts of EDM work are published at the peer-reviewed International Conference on Educational Data Mining, organized by the International Educational Data Mining Society.
- 1st International Conference on Educational Data Mining (2008) -- Montreal, Canada
- 2nd International Conference on Educational Data Mining (2009) -- Cordoba, Spain
- 3rd International Conference on Educational Data Mining (2010) -- Pittsburgh, USA
- 4th International Conference on Educational Data Mining (2011) -- Eindhoven, Netherlands
EDM papers are also published in the Journal of Educational Data Mining (JEDM).
Many EDM papers are also routinely published at related conferences, such as Artificial Intelligence in Education, Intelligent Tutoring Systems, and User Modeling, Adaptation, and Personalization.
The use of Educational Data Mining in the KDD Cup
In 2010, the Association for Computing Machinery's KDD Cup was held with a data set and topic in the domain of educational data mining. The data set was provided by the Pittsburgh Science of Learning Center DataShop, and consisted of over a million data points from students using Cognitive Tutor educational software. 600 teams competed for over $8,000 in prize money donated by Facebook. The winners used random forests, Bayesian networks, and feature generation techniques to accurately predict the performance of over half a million unseen student responses.