HBase for Dealing with Large Matrices

My presentation at HBaseCon 2013, entitled "Using Apache HBase for Large Matrices". It describes an HBase-backed Mahout matrix, used as input/output for machine learning algorithms.


Transcript

  • 1. HBase for Dealing with Large Matrices
  • 2. Who am I? Leads the data team at Dilisim; researcher at Anadolu University.
  • 3. Machine Learning. Some big problems: classifying huge text collections; recommending to millions of users; predicting links in a social network.
  • 4. Recommender Systems. Recommenders take large sparse matrices as input. How would you input a millions-by-millions matrix?
  • 5. Recommender Systems. [Slide shows the input: an m users x n items matrix of ratings.]
  • 6. Recommender Systems. State-of-the-art recommender systems learn large models: one factor vector per user and per item, and one parameter vector (on side information) per user and per item.
  • 7. Recommender Systems. [Slide shows the input matrix factored into an m x k user model and an n x k item model.]
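To make the factored model concrete: the predicted rating for a user-item pair is the dot product of the corresponding k-dimensional factor vectors, one row from the m x k user model and one from the n x k item model. A minimal sketch (the class and method names are mine, not from the talk):

```java
// Hypothetical sketch: how a factor model scores a user-item pair.
// userFactors and itemFactors stand in for one row each of the
// m x k user model and n x k item model shown on the slide.
public class FactorModel {
    // Predicted rating = dot product of the two k-dimensional vectors.
    public static double predict(double[] userFactors, double[] itemFactors) {
        double score = 0.0;
        for (int f = 0; f < userFactors.length; f++) {
            score += userFactors[f] * itemFactors[f];
        }
        return score;
    }
}
```

This is why row-level random access matters: scoring one pair needs exactly two rows, not the whole model.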
  • 8. Learning Process. What does a machine learning algorithm need to do with that matrix?
  • 9. Machine Learning - Techniques. Batch learning: all parameters are updated once per iteration.
  • 10. Machine Learning - Techniques. Batch learning: updates can be calculated in parallel using MapReduce (a SequenceFile might be enough).
  • 11. Machine Learning - Techniques. Batch learning: the output model should provide random access to rows.
  • 12. Machine Learning - Techniques. Online learning: parameters are updated per training example.
  • 13. Machine Learning - Techniques. Online learning: each update modifies a row, so random access is needed while learning.
  • 14. Machine Learning - Techniques. Online learning: the output model should provide random access to rows.
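The online setting above can be sketched as a single stochastic-gradient step: one training example (user, item, rating) touches exactly one user row and one item row, which is why random access to rows matters during learning. The update rule and hyperparameter names below are a standard SGD matrix-factorization sketch of mine, not code from the talk:

```java
// Hedged sketch of one online (SGD) update for a factor model.
// Each training example changes only the two touched rows in place.
// learningRate and lambda (regularization) are illustrative names.
public class SgdUpdate {
    public static void update(double[] userRow, double[] itemRow,
                              double rating, double learningRate, double lambda) {
        // Prediction error for this single example.
        double err = rating;
        for (int f = 0; f < userRow.length; f++) err -= userRow[f] * itemRow[f];
        // Gradient step on both rows, using the pre-update values.
        for (int f = 0; f < userRow.length; f++) {
            double u = userRow[f], v = itemRow[f];
            userRow[f] += learningRate * (err * v - lambda * u);
            itemRow[f] += learningRate * (err * u - lambda * v);
        }
    }
}
```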
  • 15. Deployment Process. How do you decide to deploy a machine learning model in production?
  • 16. Machine Learning - Deployment. Usual process: experiment on a prototype; if it works well, deploy in production; if not, keep iterating on the prototype.
  • 17. Machine Learning - Deployment. How would you turn your prototype into production easily? A common matrix interface for the in-memory and persistent versions.
  • 18. HBase Backed Matrix. Implements the Mahout Matrix; dense or sparse.
  • 19. HBase Backed Matrix. Random access to cells; random access to rows; iteration over rows; lazy loading while iterating.
  • 20. HBase Backed Matrix. Common interface for prototype and production; easy to deploy (the model is already persisted).
  • 21. HBase Backed Matrix. Matrix operations via the existing mahout-math library.
  • 22. Logical Schema. Composite row keys: 12_0, 12_9, 12_22000, each storing a single cell as data:value = 0.41.
  • 23. Logical Schema. Composite row keys: row access by scan; cell access by get; atomic row updates must be handled in the application.
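The composite-key layout above can be sketched as "row_column" strings: each matrix cell gets its own HBase row under a fixed qualifier, and a whole matrix row is read with a prefix scan. The delimiter and string encoding here are my assumptions for illustration, not the actual implementation:

```java
// Illustrative sketch of the composite-key schema: one HBase row per
// matrix cell, keyed "row_column", value stored under data:value.
// A prefix scan over "row_" then yields every cell of one matrix row.
public class CompositeKey {
    // Key for a single cell, e.g. cellKey(12, 22000) -> "12_22000".
    public static String cellKey(int row, int col) {
        return row + "_" + col;
    }
    // Prefix used to scan all cells of one matrix row.
    public static String rowScanPrefix(int row) {
        return row + "_";
    }
}
```

In a real schema, fixed-width or binary-encoded indices would be needed so the prefix scan groups keys correctly under lexicographic byte ordering; the plain string encoding here is only for illustration.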
  • 24. Logical Schema. Row indices as row keys: row 12 stores data:0 = 0.41, data:9 = 0.41, data:22000 = 0.41.
  • 25. Logical Schema. Row indices as row keys: atomic updates are handled automatically.
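In this second layout, the matrix row index alone is the HBase row key and each column index becomes a qualifier in the data family, so writing a whole matrix row is one Put, and HBase makes single-row mutations atomic, which is what the slide means by automatic atomic updates. A minimal sketch of the qualifier-to-value map such a Put would carry (names are mine):

```java
import java.util.Map;
import java.util.TreeMap;

// Hedged sketch of the row-index schema: one HBase row per matrix row,
// with one qualifier per non-zero column in the "data" family.
public class RowKeySchema {
    // Builds the qualifier -> value map that a single Put would carry;
    // a single-row Put is atomic in HBase, so the whole row updates at once.
    public static Map<String, Double> rowCells(int[] cols, double[] values) {
        Map<String, Double> cells = new TreeMap<>();
        for (int i = 0; i < cols.length; i++) {
            cells.put(Integer.toString(cols[i]), values[i]);
        }
        return cells;
    }
}
```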
  • 26. Speed - Cell access/write. [Slide shows a GET/SET benchmark chart comparing the row-index and composite row key schemas.]
  • 27. Speed - Row access/write. [Slide shows a GET/SET benchmark chart comparing the row-index and composite row key schemas.]
  • 28. Code: github.com/gcapan/mahout/tree/hbase-matrix
  • 29. Future Work. MatrixInputFormat might replace SequenceFile-based MapReduce inputs.
  • 30. Future Work - A Little Digression. Recommender systems: calculating the score for a user-item pair is easy with HBaseMatrix.
  • 31. Future Work - A Little Digression. Recommender systems: what about top-N recommendation? Keep all candidate items for a user in the user's row as a nested entity (see Ian Varley's HBase Schema Design).
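The top-N problem above amounts to scoring every candidate item for a user and keeping the N best. A brute-force in-memory sketch (the nested-entity layout would supply the candidate set from the user's own row; here a plain array of item factor vectors stands in, and all names are mine):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative brute-force top-N recommendation: score each candidate
// item against one user's factor vector, sort descending, keep N.
public class TopN {
    public static List<Integer> recommend(double[] user, double[][] items, int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < items.length; i++) ids.add(i);
        // Sort item ids by dot-product score, highest first.
        ids.sort(Comparator.comparingDouble((Integer i) -> {
            double s = 0.0;
            for (int f = 0; f < user.length; f++) s += user[f] * items[i][f];
            return -s;
        }));
        return ids.subList(0, Math.min(n, ids.size()));
    }
}
```

Keeping the candidates nested in the user row would let this run off a single HBase row read instead of a full scan over all items.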
  • 32. Thank you!