Apache Mahout
Open source Machine Learning Java library
Dalla Palma Stefano
Part I: Overview
Search
Text mining
Information Retrieval
Collaborative filtering
Mahout Machine Learning main techniques and architecture overview
Recommender
[Figure: recommender architecture. Components: Data Store, Data Model, Recommender, Neighborhood, Correlation, Preference Inferrer. Data: User, Item, Preference.]
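The components named in the recommender diagram can be illustrated with a minimal user-based collaborative filtering sketch. This is plain Python, not Mahout's API: the function names `correlation`, `neighborhood`, and `recommend`, the choice of Pearson correlation, and the toy preference values are all assumptions made for illustration.

```python
from math import sqrt

# Data model: user -> {item: preference}. Toy values invented for illustration.
PREFS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def correlation(prefs, u, v):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(prefs[u]) & set(prefs[v]))
    if len(common) < 2:
        return 0.0
    xs = [prefs[u][i] for i in common]
    ys = [prefs[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def neighborhood(prefs, user, n):
    """The n users most correlated with `user`."""
    others = [v for v in prefs if v != user]
    others.sort(key=lambda v: correlation(prefs, user, v), reverse=True)
    return others[:n]

def recommend(prefs, user, n_neighbors=2, how_many=3):
    """Score unseen items by the correlation-weighted mean of neighbor preferences."""
    totals, weights = {}, {}
    for v in neighborhood(prefs, user, n_neighbors):
        w = correlation(prefs, user, v)
        if w <= 0:
            continue  # ignore neighbors that are uncorrelated or anti-correlated
        for item, pref in prefs[v].items():
            if item in prefs[user]:
                continue  # only recommend items the user has not rated
            totals[item] = totals.get(item, 0.0) + w * pref
            weights[item] = weights.get(item, 0.0) + w
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:how_many]
```

Calling `recommend(PREFS, "alice")` walks the same path as the diagram: the data model supplies preferences, the correlation feeds the neighborhood, and the recommender estimates preferences for items the user has not seen.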
Classification (Naïve Bayes)
[Figure: classification system. Training examples (predictors and target variables) feed the training algorithm, which produces a model. A copy of the model is applied to new examples (predictor variables only) to produce the estimated target variable, which drives decisions.]
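The train-then-classify flow in the figure can be sketched with a small multinomial Naïve Bayes classifier. This is an illustrative plain-Python version, not Mahout's implementation: the `train` and `classify` helpers, the Laplace smoothing parameter `alpha`, and the token-list input format are assumptions.

```python
from collections import Counter, defaultdict
from math import log

def train(examples, alpha=1.0):
    """Training algorithm: examples is a list of (tokens, label) pairs.

    Returns the model: per-label example counts, per-label token counts,
    the vocabulary, and the smoothing parameter.
    """
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "tokens": token_counts,
            "vocab": vocab, "alpha": alpha}

def classify(model, tokens):
    """Estimate the target variable for a new example (predictors only)."""
    total = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, n in model["labels"].items():
        counts = model["tokens"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = log(n / total)  # log prior
        for t in tokens:
            if t in model["vocab"]:  # tokens unseen in training are skipped
                score += log((counts[t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model trained on a handful of labeled token lists can then be applied to new token lists; log probabilities are summed rather than multiplied to avoid floating-point underflow on long inputs.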
Clustering (k-means)
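[Figure: k-means on MapReduce. The input is divided into Split 1 … Split n, processed by mappers M1 … Mn in the map phase. Each mapper reads the centroids file from the distributed cache. Map output = <centroid_id, data_point>. S = shuffle and sorting. Reducers R1 … Rk in the reduce phase write the new centroids file to the distributed cache.]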
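One k-means iteration in the map/shuffle/reduce shape of the figure can be sketched in a single process as follows. This is not Hadoop or Mahout code: the function names are hypothetical, the centroid list passed as an argument stands in for the centroids file on the distributed cache, and squared Euclidean distance is an assumed choice of metric.

```python
from collections import defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_phase(split, centroids):
    """Mapper: emit <centroid_id, data_point> for the nearest centroid."""
    for point in split:
        cid = min(range(len(centroids)),
                  key=lambda i: sq_dist(point, centroids[i]))
        yield cid, point

def shuffle(pairs):
    """Shuffle and sort: group data points by centroid id."""
    groups = defaultdict(list)
    for cid, point in pairs:
        groups[cid].append(point)
    return groups

def reduce_phase(cid, points):
    """Reducer: the new centroid is the mean of the points assigned to it."""
    n = len(points)
    return cid, tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(splits, centroids):
    """Run one map/shuffle/reduce pass and return the new centroids."""
    pairs = [pair for split in splits for pair in map_phase(split, centroids)]
    new_centroids = list(centroids)  # a centroid with no points stays unchanged
    for cid, points in shuffle(pairs).items():
        _, new_centroids[cid] = reduce_phase(cid, points)
    return new_centroids
```

Repeating `kmeans_iteration` until the centroids stop moving gives the full algorithm; in the distributed setting each repetition corresponds to one MapReduce job that reads the previous centroids and writes the new ones.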
Samsara
Apache Flink
Part II: Architecture
Part III: Conclusions
Most stable components
Most unstable components
Questions?

Apache Mahout Architecture Overview