Apache Mahout
Encyclopedia
Apache Mahout is an Apache
project to produce free
implementations of distributed
or otherwise scalable
machine learning
algorithms on the Hadoop
platform. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but there are still various multivariate analytics algorithms missing.
While Mahout's core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm, it does not restrict contributions to Hadoop based implementations. Contributions that run on a single node or on a non-Hadoop cluster are also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project, and can run stand-alone without Hadoop. Integration with initiatives such as the Pregel-like Giraph are actively under discussion.
Apache Software Foundation
The Apache Software Foundation is a non-profit corporation to support Apache software projects, including the Apache HTTP Server. The ASF was formed from the Apache Group and incorporated in Delaware, U.S., in June 1999.The Apache Software Foundation is a decentralized community of developers...
project to produce free
Free software
Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...
implementations of distributed
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...
or otherwise scalable
Scalability
In electronics scalability is the ability of a system, network, or process, to handle growing amount of work in a graceful manner or its ability to be enlarged to accommodate that growth...
machine learning
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...
algorithms on the Hadoop
Hadoop
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data...
platform. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but there are still various multivariate analytics algorithms missing.
While Mahout's core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm, it does not restrict contributions to Hadoop based implementations. Contributions that run on a single node or on a non-Hadoop cluster are also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project, and can run stand-alone without Hadoop. Integration with initiatives such as the Pregel-like Giraph are actively under discussion.