Machine learning with Apache Hama



  1. Machine Learning with Apache Hama (Tommaso Teofili, tommaso [at] apache [dot] org)
  2. About me
     - ASF member having fun with: Lucene / Solr, Hama, UIMA, Stanbol, … some others
     - SW engineer @ Adobe R&D
  3. Agenda
     - Apache Hama and BSP
     - Why machine learning on BSP
     - Some examples
     - Benchmarks
  4. Apache Hama
     - Bulk Synchronous Parallel (BSP) computing framework on top of HDFS for massive scientific computations
     - Apache top-level project (TLP) since May 2012
     - 0.6.0 release out soon
     - Growing community
  5. BSP supersteps
     - A BSP algorithm is composed of a sequence of “supersteps”
  6. BSP supersteps
     Each task runs:
     - Superstep 1: do some computation, communicate with other tasks, synchronize
     - Superstep 2: do some computation, communicate with other tasks, synchronize
     - …
     - Superstep N: do some computation, communicate with other tasks, synchronize
  7. Why BSP
     - Simple programming model: the superstep semantics are easy to grasp
     - Preserves data locality
     - Improves performance
     - Well suited for iterative algorithms
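The superstep pattern above can be simulated in plain Java, without Hama: each task computes locally, publishes its result, and blocks on a barrier; only after the barrier do tasks read each other's contributions. This is a minimal sketch of the compute / communicate / synchronize cycle (class and method names are illustrative, not from Hama):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.LongAdder;

// Toy single-superstep BSP round: each "task" computes a partial sum
// (computation), adds it to a shared accumulator (communication), then
// waits on a barrier (synchronization) before reading the global result.
public class BspSuperstepSketch {
    public static long parallelSum(List<long[]> partitions) throws Exception {
        int tasks = partitions.size();
        LongAdder global = new LongAdder();
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] part : partitions) {
                results.add(pool.submit(() -> {
                    long local = 0;                 // 1) local computation
                    for (long v : part) local += v;
                    global.add(local);              // 2) communicate the partial result
                    barrier.await();                // 3) synchronize with all other tasks
                    return global.sum();            // after the barrier, all messages are visible
                }));
            }
            long sum = 0;
            for (Future<Long> f : results) sum = f.get();
            return sum;
        } finally {
            pool.shutdown();
        }
    }
}
```

`CyclicBarrier.await()` gives the same guarantee a BSP barrier does: everything a task did before the barrier is visible to every task after it.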
  8. Apache Hama architecture
     - BSP program execution flow (diagram)
  9. Apache Hama architecture (diagram)
  10. Apache Hama features
      - BSP API
      - M/R-like I/O API
      - Graph API
      - Job management / monitoring
      - Checkpoint recovery
      - Local & (pseudo-)distributed run modes
      - Pluggable message transfer architecture
      - YARN supported
      - Runs in Apache Whirr
  11. Apache Hama BSP API
      public abstract class BSP<K1, V1, K2, V2, M extends Writable> …
      - K1, V1 are the key/value types for inputs
      - K2, V2 are the key/value types for outputs
      - M is the type of the messages used for task communication
  12. Apache Hama BSP API
      public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws ..
      public void setup(BSPPeer<K1, V1, K2, V2, M> peer) throws ..
      public void cleanup(BSPPeer<K1, V1, K2, V2, M> peer) throws ..
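A sketch of how these hooks fit together, assuming the Hama API of that era (requires the Hama jars on the classpath, so it is illustrative rather than runnable here; `SumBSP` and `computeLocalPart` are hypothetical names):

```java
// Sketch only: each peer computes a local value, broadcasts it,
// syncs, then aggregates the messages from the previous superstep.
public class SumBSP extends BSP<LongWritable, Text, Text, DoubleWritable, DoubleWritable> {
  @Override
  public void bsp(BSPPeer<LongWritable, Text, Text, DoubleWritable, DoubleWritable> peer)
      throws IOException, SyncException, InterruptedException {
    double local = computeLocalPart(peer);           // superstep: local computation
    for (String other : peer.getAllPeerNames()) {
      peer.send(other, new DoubleWritable(local));   // communicate
    }
    peer.sync();                                     // synchronize: barrier
    double total = 0;
    DoubleWritable msg;
    while ((msg = peer.getCurrentMessage()) != null) {
      total += msg.get();                            // messages sent before the barrier
    }
    peer.write(new Text(peer.getPeerName()), new DoubleWritable(total));
  }

  private double computeLocalPart(BSPPeer<?, ?, ?, ?, ?> peer) {
    return 1.0; // placeholder for reading and processing this peer's input split
  }
}
```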
  13. Machine learning on BSP
      - Lots (most?) of ML algorithms are inherently iterative
      - The Hama ML module currently includes:
        - collaborative filtering
        - clustering
        - gradient descent
  14. Benchmarking architecture (diagram: a cluster of nodes running Hama, Mahout, Solr, Lucene, a DBMS, and HDFS)
  15. Collaborative filtering
      - Given user preferences on movies
      - We want to find users “near” to some specific user
      - So that that user can “follow” them
      - And/or see what they like (which he/she could like too)
  16. Collaborative filtering BSP
      Given a specific user, iteratively (for each task):
      - Superstep 1*i
        - read a new user preference row
        - find how near that user is to the current user, i.e. how near their preferences are
        - since preferences are given as vectors, we may use vector distance measures such as Euclidean or cosine distance
        - broadcast the measure output to the other peers
      - Superstep 2*i
        - aggregate the measure outputs
        - update the most relevant users
      - Still to be committed (HAMA-612)
  17. Collaborative filtering BSP
      Given user ratings about movies:
      "john"    -> 0, 0, 0, 9.5, 4.5, 9.5, 8
      "paula"   -> 7, 3, 8, 2, 8.5, 0, 0
      "jim"     -> 4, 5, 0, 5, 8, 0, 1.5
      "tom"     -> 9, 4, 9, 1, 5, 0, 8
      "timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0
      - We ask for the 2 nearest users to “paula” and get “timothy” and “tom” -> user recommendation
      - We can extract highly rated movies from “timothy” and “tom” that “paula” didn’t see -> item recommendation
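The nearest-neighbour lookup itself is easy to reproduce locally. A minimal sketch using Euclidean distance (one of the measures mentioned above) over the rating vectors from the slide, ignoring the BSP distribution:

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy user-to-user nearest-neighbour search over rating vectors.
public class NearestUsers {
    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Returns the k user names closest to `target`, nearest first.
    public static List<String> nearest(Map<String, double[]> ratings, double[] target, int k) {
        return ratings.entrySet().stream()
                .sorted(Comparator.comparingDouble(e -> euclidean(e.getValue(), target)))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, double[]> ratings = new HashMap<>();
        ratings.put("john",    new double[]{0, 0, 0, 9.5, 4.5, 9.5, 8});
        ratings.put("jim",     new double[]{4, 5, 0, 5, 8, 0, 1.5});
        ratings.put("tom",     new double[]{9, 4, 9, 1, 5, 0, 8});
        ratings.put("timothy", new double[]{7, 3, 5.5, 0, 9.5, 6.5, 0});
        double[] paula = {7, 3, 8, 2, 8.5, 0, 0};
        System.out.println(nearest(ratings, paula, 2)); // [timothy, tom]
    }
}
```

With this data, Euclidean distance reproduces the slide's result: “timothy” and “tom” are the two users nearest to “paula”.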
  18. Benchmarks
      - Fairly simple algorithm, highly iterative
      - Compared to Apache Mahout:
        - behaves better than ALS-WR
        - behaves similarly to RecommenderJob and ItemSimilarityJob
  19. K-Means clustering
      - We have a bunch of data (e.g. documents)
      - We want to group those docs into k homogeneous clusters
      - Iteratively, for each cluster:
        - calculate the new cluster center
        - add the docs nearest to the new center to the cluster
  20. K-Means clustering (diagram)
  21. K-Means clustering BSP
      Iteratively:
      - Superstep 1*i (assignment phase)
        - read the vector splits
        - sum up temporary centers with the assigned vectors
        - broadcast the sum and the count of ingested vectors
      - Superstep 2*i (update phase)
        - calculate the total sum over all received messages and average it
        - replace the old centers with the new centers and check for convergence
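The logic of one assignment-plus-update iteration can be sketched in plain Java. In Hama the per-peer sums and counts would be broadcast between tasks and aggregated after the sync; here everything runs in one process (names are illustrative):

```java
// One local k-means iteration: assign each vector to its nearest center,
// then average the assigned vectors to obtain the new centers.
public class KMeansStep {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static double[][] update(double[][] vectors, double[][] centers) {
        int k = centers.length, dim = centers[0].length;
        double[][] sums = new double[k][dim];   // per-center running sums
        int[] counts = new int[k];              // per-center vector counts
        for (double[] v : vectors) {            // assignment phase
            int best = 0;
            for (int c = 1; c < k; c++)
                if (dist2(v, centers[c]) < dist2(v, centers[best])) best = c;
            for (int d = 0; d < dim; d++) sums[best][d] += v[d];
            counts[best]++;
        }
        double[][] next = new double[k][dim];   // update phase: average the sums
        for (int c = 0; c < k; c++)
            for (int d = 0; d < dim; d++)
                next[c][d] = counts[c] == 0 ? centers[c][d] : sums[c][d] / counts[c];
        return next;
    }
}
```

The (sum, count) pair is exactly what each peer broadcasts in the BSP version: sums and counts from different peers can be added together before averaging.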
  22. Benchmarks
      - One-rack cluster (16 nodes, 256 cores), 10G network
      - On average faster than Mahout’s implementation
  23. Gradient descent
      - Optimization algorithm: find a (local) minimum of some function
      - Used for:
        - solving linear systems
        - solving non-linear systems
        - machine learning tasks: linear regression, logistic regression, neural network backpropagation, …
  24. Gradient descent
      - Minimize a given (cost) function
      - Give the function a starting point (a set of parameters)
      - Iteratively change the parameters in order to minimize the function
      - Stop at the (local) minimum
      - There’s some math, but intuitively: evaluate the derivatives at a given point in order to choose where to “go” next
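The whole idea fits in a few lines. A minimal sketch minimising f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3); the function and learning rate are illustrative, not from the talk:

```java
// Plain gradient descent on a one-dimensional cost function.
public class GradientDescent {
    public static double minimize(double start, double learningRate, int iterations) {
        double x = start;
        for (int i = 0; i < iterations; i++) {
            double gradient = 2 * (x - 3);  // derivative of (x - 3)^2 at the current point
            x -= learningRate * gradient;   // step "downhill", against the gradient
        }
        return x;                           // converges toward the minimum at x = 3
    }
}
```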
  25. Gradient descent BSP
      Iteratively:
      - Superstep 1*i: each task calculates and broadcasts portions of the cost function with the current parameters
      - Superstep 2*i: aggregate and update the cost function; check the aggregated cost and the iteration count (the cost should always decrease)
      - Superstep 3*i: each task calculates and broadcasts portions of the (partial) derivatives
      - Superstep 4*i: aggregate and update the parameters
  26. Gradient descent BSP
      Simplistic example: linear regression
      - Given a real estate market dataset
      - Estimate new houses’ prices given known houses’ sizes, geographic regions and prices
      - Expected output: the actual parameters for the (linear) prediction function
  27. Gradient descent BSP
      - Generate a different model for each region
      - House item vectors: price -> size (e.g. 150k -> 80)
      - 2-dimensional space, ~1.3M-vector dataset
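The batch-gradient updates that the supersteps above distribute can be sketched locally. This fits a linear model price = theta0 + theta1 * size by mean-squared-error gradient descent on synthetic data (not the real estate dataset; all names and values are illustrative):

```java
// Toy linear regression via batch gradient descent.
public class LinearRegressionGD {
    // Returns {theta0, theta1} after `iters` gradient steps.
    public static double[] fit(double[] sizes, double[] prices, double lr, int iters) {
        double t0 = 0, t1 = 0;
        int m = sizes.length;
        for (int it = 0; it < iters; it++) {
            double g0 = 0, g1 = 0;
            for (int i = 0; i < m; i++) {
                double err = t0 + t1 * sizes[i] - prices[i]; // prediction error per example
                g0 += err;                                   // partial derivative wrt theta0
                g1 += err * sizes[i];                        // partial derivative wrt theta1
            }
            t0 -= lr * g0 / m;                               // parameter update
            t1 -= lr * g1 / m;
        }
        return new double[]{t0, t1};
    }
}
```

The inner loop over examples is the part each Hama task computes on its own split; the partial g0/g1 sums are what gets broadcast and aggregated before the update.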
  28. Gradient descent BSP: dataset and model fit (plot)
  29. Gradient descent BSP: cost checking (plot)
  30. Gradient descent BSP: classification
      - Logistic regression with gradient descent
      - Real estate market dataset: we want to find which estate listings belong to agencies (to avoid buying from them)
      - Same algorithm, with a different cost function and different features
      - Existing items are tagged (or not) as “belonging to agency”
      - Create vectors from the items’ text
      - Sample vector: 1 -> 1 3 0 0 5 3 4 1
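"Same algorithm, different cost function" can be made concrete: logistic regression wraps the linear hypothesis in a sigmoid and uses log loss, but the gradient update keeps the same shape as in the regression case. A minimal sketch of one gradient step (names illustrative):

```java
// One batch gradient descent step for logistic regression; labels are 0/1.
public class LogisticStep {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    public static double[] step(double[][] x, double[] y, double[] theta, double lr) {
        int m = x.length, n = theta.length;
        double[] grad = new double[n];
        for (int i = 0; i < m; i++) {
            double z = 0;
            for (int j = 0; j < n; j++) z += theta[j] * x[i][j];
            double err = sigmoid(z) - y[i];            // same error shape as the linear case
            for (int j = 0; j < n; j++) grad[j] += err * x[i][j];
        }
        double[] next = new double[n];
        for (int j = 0; j < n; j++) next[j] = theta[j] - lr * grad[j] / m;
        return next;
    }
}
```

Repeating this step on tagged/untagged item vectors drives theta toward a separating boundary; classifying is then just checking whether sigmoid(theta . x) exceeds 0.5.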
  31. Gradient descent BSP: classification (plot)
  32. Benchmarks
      - Not directly comparable to Mahout’s regression algorithms
      - Both SGD and CGD are inherently better than plain GD
      - But Hama GD had on average the same performance as Mahout’s SGD / CGD
      - Next step: implementing SGD / CGD on top of Hama
  33. Wrap up
      - Even if the ML module is still “young” / work in progress, and tools like Apache Mahout have better “coverage”,
        Apache Hama can be particularly useful in certain “highly iterative” use cases
      - Interesting benchmarks
  34. Thanks!