2. About me
ASF member having fun with:
Lucene / Solr
Hama
UIMA
Stanbol
… some others
SW engineer @ Adobe R&D
3. Agenda
Apache Hama and BSP
Why machine learning on BSP
Some examples
Benchmarks
4. Apache Hama
Bulk Synchronous Parallel computing framework on top of HDFS for massive scientific computations
TLP since May 2012
0.6.0 release out soon
Growing community
5. BSP supersteps
A BSP algorithm is composed of a sequence of “supersteps”
6. BSP supersteps
Each task
Superstep 1
Do some computation
Communicate with other tasks
Synchronize
Superstep 2
Do some computation
Communicate with other tasks
Synchronize
…
…
…
Superstep N
Do some computation
Communicate with other tasks
Synchronize
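The compute / communicate / synchronize cycle above can be simulated on a single machine with plain Java threads and a barrier. This is a toy sketch, not the Hama API: the task count, superstep count, and message format are made up.

```java
import java.util.*;
import java.util.concurrent.*;

public class SuperstepDemo {
    static final int TASKS = 3, SUPERSTEPS = 2;

    public static List<String> run() {
        // one inbox per task: "communication" is just dropping messages here
        List<Queue<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < TASKS; i++) inboxes.add(new ConcurrentLinkedQueue<>());
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(TASKS); // the "synchronize" step
        ExecutorService pool = Executors.newFixedThreadPool(TASKS);
        for (int t = 0; t < TASKS; t++) {
            final int task = t;
            pool.submit(() -> {
                try {
                    for (int s = 1; s <= SUPERSTEPS; s++) {
                        int result = task * s;                // 1. do some computation
                        for (int p = 0; p < TASKS; p++)       // 2. communicate with other tasks
                            if (p != task) inboxes.get(p).add(task + "->" + result);
                        barrier.await();                      // 3. synchronize
                        log.add("task " + task + " done with superstep " + s);
                    }
                } catch (Exception e) { throw new RuntimeException(e); }
            });
        }
        pool.shutdown();
        try { pool.awaitTermination(10, TimeUnit.SECONDS); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return log;
    }
}
```

No task enters superstep s+1 until every task has finished superstep s, which is exactly the guarantee the barrier provides.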
7. Why BSP
Simple programming model
Superstep semantics are easy to grasp
Preserve data locality
Improve performance
Well suited for iterative algorithms
10. Apache Hama
Features
BSP API
M/R-like I/O API
Graph API
Job management / monitoring
Checkpoint recovery
Local & (Pseudo) Distributed run modes
Pluggable message transfer architecture
YARN supported
Running in Apache Whirr
11. Apache Hama BSP API
public abstract class BSP&lt;K1, V1, K2, V2, M extends Writable&gt; …
K1, V1 are key, values for inputs
K2, V2 are key, values for outputs
M is the type of the messages used for task communication
12. Apache Hama BSP API
public void bsp(BSPPeer&lt;K1, V1, K2, V2, M&gt; peer) throws ..
public void setup(BSPPeer&lt;K1, V1, K2, V2, M&gt; peer) throws ..
public void cleanup(BSPPeer&lt;K1, V1, K2, V2, M&gt; peer) throws ..
13. Machine learning on BSP
Lots (most?) of ML algorithms are
inherently iterative
Hama’s ML module currently includes
Collaborative filtering
Clustering
Gradient descent
15. Collaborative filtering
Given user preferences on movies
We want to find users “near” to a specific user
So that that user can “follow” them
And/or see what they like (which he/she might like too)
16. Collaborative filtering BSP
Given a specific user
Iteratively (for each task)
Superstep 1*i
Read a new user preference row
Find how near that user is to the current user
That is, find how near their preferences are
Since preferences are given as vectors, we can use vector distance measures such as Euclidean or cosine distance
Broadcast the measure output to other peers
Superstep 2*i
Aggregate measure outputs
Update most relevant users
Still to be committed (HAMA-612)
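The “nearness” computed in the first superstep boils down to a vector distance over preference rows. A minimal sketch of the two measures mentioned above (Euclidean and cosine distance); the class and method names are made up:

```java
public class Distances {
    // Euclidean distance: square root of the sum of squared component differences
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    // cosine distance: 1 minus the cosine similarity of the two vectors
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

Either measure works as the per-row computation; only the measure output needs to be broadcast to the other peers.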
17. Collaborative filtering BSP
Given user ratings about movies
"john" -> 0, 0, 0, 9.5, 4.5, 9.5, 8
"paula" -> 7, 3, 8, 2, 8.5, 0, 0
"jim" -> 4, 5, 0, 5, 8, 0, 1.5
"tom" -> 9, 4, 9, 1, 5, 0, 8
"timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0
We ask for 2 nearest users to “paula” and
we get “timothy” and “tom”
User recommendation
We can extract movies highly rated by “timothy” and “tom” that “paula” didn’t see
Item recommendation
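Running the numbers from the table above (assuming plain Euclidean distance over the rating vectors) does rank “timothy” and “tom” nearest to “paula”. A toy single-machine sketch; the class and method names are made up:

```java
import java.util.*;

public class NearestUsers {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // returns the k users nearest to "target", closest first
    static List<String> nearest(Map<String, double[]> ratings, String target, int k) {
        final double[] t = ratings.get(target);
        List<String> others = new ArrayList<>(ratings.keySet());
        others.remove(target);
        others.sort(Comparator.comparingDouble(u -> dist(ratings.get(u), t)));
        return others.subList(0, k);
    }

    // the user/rating table from the slide
    static Map<String, double[]> data() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("john",    new double[]{0, 0, 0, 9.5, 4.5, 9.5, 8});
        m.put("paula",   new double[]{7, 3, 8, 2, 8.5, 0, 0});
        m.put("jim",     new double[]{4, 5, 0, 5, 8, 0, 1.5});
        m.put("tom",     new double[]{9, 4, 9, 1, 5, 0, 8});
        m.put("timothy", new double[]{7, 3, 5.5, 0, 9.5, 6.5, 0});
        return m;
    }
}
```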
18. Benchmarks
Fairly simple algorithm
Highly iterative
Compared to Apache Mahout
Behaves better than ALS-WR
Behaves similarly to RecommenderJob and
ItemSimilarityJob
19. K-Means clustering
We have a bunch of data (e.g. documents)
We want to group those docs into k homogeneous clusters
Iteratively
Assign each doc to the cluster with the nearest center
Recalculate each cluster center as the mean of its assigned docs
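One iteration of that loop, sketched on made-up 1-D points: assign each point to its nearest center, then recompute each center as the mean of its points. The class and method names are assumptions.

```java
public class KMeansStep {
    // one k-means iteration: returns the updated centers
    static double[] step(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {
            // assignment: find the nearest center for this point
            int best = 0;
            for (int c = 1; c < centers.length; c++)
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) best = c;
            sum[best] += p;
            count[best]++;
        }
        // update: each center moves to the mean of its assigned points
        double[] updated = new double[centers.length];
        for (int c = 0; c < centers.length; c++)
            updated[c] = count[c] == 0 ? centers[c] : sum[c] / count[c];
        return updated;
    }
}
```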
21. K-Means clustering BSP
Iteratively
Superstep 1*i
Assignment phase
Read vector splits
Sum the assigned vectors into temporary per-center sums
Broadcast each sum and its ingested-vector count
Superstep 2*i
Update phase
Calculate the total sum over all received
messages and average
Replace old centers with new centers and check
for convergence
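The two supersteps map naturally onto partial sums: each task broadcasts a (sum, count) pair over its split, and the update phase averages all received pairs into the new center. A toy single-machine sketch for one 1-D center (the splits and names are made up):

```java
public class KMeansBsp {
    // superstep 1 ("assignment") result for one task: the partial sum and
    // count of the vectors in its split assigned to a center (here: all of them)
    static double[] partial(double[] split) {
        double sum = 0;
        for (double v : split) sum += v;
        return new double[]{sum, split.length};
    }

    // superstep 2 ("update"): total the received (sum, count) messages
    // and average them into the new center
    static double update(double[][] messages) {
        double sum = 0, count = 0;
        for (double[] m : messages) {
            sum += m[0];
            count += m[1];
        }
        return sum / count;
    }
}
```

Broadcasting (sum, count) instead of raw vectors is what keeps the communication per superstep small.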
22. Benchmarks
One-rack cluster (16 nodes, 256 cores)
10G network
On average faster than Mahout’s impl
23. Gradient descent
Optimization algorithm
Find a (local) minimum of some function
Used for
solving linear systems
solving non-linear systems
in machine learning tasks
linear regression
logistic regression
neural networks backpropagation
…
24. Gradient descent
Minimize a given (cost) function
Give the function a starting point (a set of parameters)
Iteratively change the parameters in order to minimize the function
Stop at the (local) minimum
There’s some math but intuitively:
evaluate derivatives at a given point in order to choose
where to “go” next
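The intuition as code: for the made-up function f(x) = (x − 3)², the derivative 2(x − 3) tells us which way to “go”, and repeatedly stepping against it converges to the minimum at x = 3. The learning rate and iteration count are arbitrary choices for the sketch.

```java
public class GradientDescent1D {
    static double minimize(double start, double learningRate, int iterations) {
        double x = start;
        for (int i = 0; i < iterations; i++) {
            double gradient = 2 * (x - 3);  // derivative of (x - 3)^2 at x
            x -= learningRate * gradient;   // step "downhill", against the gradient
        }
        return x;
    }
}
```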
25. Gradient descent BSP
Iteratively
Superstep 1*i
each task calculates and broadcasts portions of the
cost function with the current parameters
Superstep 2*i
aggregate and update cost function
check the aggregated cost and iterations count
cost should always decrease
Superstep 3*i
each task calculates and broadcasts portions of
(partial) derivatives
Superstep 4*i
aggregate and update parameters
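Those supersteps can be mimicked on one machine: each “task” computes partial cost and partial derivatives over its own data slice, and the aggregation steps simply sum the broadcasts before taking a parameter step. A toy sketch for linear regression with hypothesis θ0 + θ1·x; the data slices, names, and learning rate are made up:

```java
public class GdBsp {
    // one task's broadcast: [partial cost, partial d/dTheta0, partial d/dTheta1]
    // over its slice of (x, y) pairs
    static double[] partial(double[][] slice, double t0, double t1) {
        double cost = 0, g0 = 0, g1 = 0;
        for (double[] xy : slice) {
            double err = (t0 + t1 * xy[0]) - xy[1]; // prediction minus target
            cost += err * err;                      // squared-error cost
            g0 += 2 * err;                          // d(err^2)/dTheta0
            g1 += 2 * err * xy[0];                  // d(err^2)/dTheta1
        }
        return new double[]{cost, g0, g1};
    }

    // aggregate the broadcast partials and take one parameter step
    static double[] step(double[][][] slices, double t0, double t1, double rate) {
        double g0 = 0, g1 = 0;
        for (double[][] s : slices) {
            double[] p = partial(s, t0, t1);
            g0 += p[1];
            g1 += p[2];
        }
        return new double[]{t0 - rate * g0, t1 - rate * g1};
    }
}
```

Summing partial costs the same way gives the aggregated cost used for the “should always decrease” convergence check.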
26. Gradient descent BSP
Simplistic example
Linear regression
Given real estate market dataset
Estimate new houses’ prices given known houses’ sizes, geographic regions and prices
Expected output: actual parameters for the
(linear) prediction function
27. Gradient descent BSP
Generate a different model for each region
House item vectors
price -> size
150k -> 80
2-dimensional space
~1.3M-vector dataset
30. Gradient descent BSP
Classification
Logistic regression with gradient descent
Real estate market dataset
We want to find which estate listings belong to agencies
To avoid buying from them
Same algorithm
With different cost function and features
Existing items are tagged (or not) as “belonging to agency”
Create vectors from items’ text
Sample vector
1 -> 1 3 0 0 5 3 4 1
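Swapping the cost function here means swapping in the logistic (sigmoid) hypothesis and the cross-entropy cost, while the superstep structure stays the same. A minimal sketch of the per-item pieces; the class and method names are assumptions:

```java
public class Logistic {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // hypothesis: sigmoid of the dot product of parameters and features,
    // interpreted as the probability of the positive class ("agency")
    static double predict(double[] theta, double[] x) {
        double z = 0;
        for (int i = 0; i < x.length; i++) z += theta[i] * x[i];
        return sigmoid(z);
    }

    // logistic (cross-entropy) cost for one labelled item; label is 0 or 1
    static double cost(double[] theta, double[] x, double label) {
        double h = predict(theta, x);
        return -label * Math.log(h) - (1 - label) * Math.log(1 - h);
    }
}
```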
32. Benchmarks
Not directly comparable to Mahout’s
regression algorithms
Both SGD and CGD are inherently better than
plain GD
But Hama GD had on average the same performance as Mahout’s SGD / CGD
Next step is implementing SGD / CGD on top of
Hama
33. Wrap up
Even if
the ML module is still “young” / a work in progress
and tools like Apache Mahout have better “coverage”
Apache Hama can be particularly useful in
certain “highly iterative” use cases
Interesting benchmarks