HBaseCon 2013: Using Apache HBase for Large Matrices
6.
Recommender Systems
State-of-the-art recommender systems learn
large models
One factor vector per user and per item
One parameter vector (on side information)
per user and per item
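As a rough illustration of why such models get large, the sketch below keeps one factor vector per user and per item as rows of two small matrices and counts the stored parameters. The names `init_factors` and `parameter_count`, the rank of 3, and the random initialization are assumptions made for the example, not details from the talk.

```python
import random

def init_factors(ids, rank, seed=0):
    """Create one small random factor vector per id (one matrix row each)."""
    rng = random.Random(seed)
    return {key: [rng.gauss(0.0, 0.1) for _ in range(rank)] for key in ids}

def parameter_count(*matrices):
    """Total number of stored parameters across all model matrices."""
    return sum(len(row) for matrix in matrices for row in matrix.values())

user_factors = init_factors(["u1", "u2"], rank=3)
item_factors = init_factors(["i1", "i2", "i3"], rank=3, seed=1)

# The model grows as (users + items) * rank: here (2 + 3) * 3 parameters.
total = parameter_count(user_factors, item_factors)
```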
8.
Learning Process
What does a machine learning algorithm
need to do with that matrix?
9.
Machine Learning - Techniques
Batch Learning
All parameters are updated once per
iteration
10.
Machine Learning - Techniques
Batch Learning
Updates can be calculated in parallel
using MapReduce
(SequenceFile might be enough)
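A minimal single-process sketch of one batch iteration in map/reduce style, assuming a squared-error matrix factorization model (the slides do not name a specific model). The map step emits per-row gradient contributions, the reduce step sums them by row key, and every parameter is then updated once. All function names here are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def map_example(example, users, items):
    """Map step: emit (row key, gradient contribution) pairs for one rating."""
    u, i, rating = example
    err = dot(users[u], items[i]) - rating
    yield ("user", u), [err * q for q in items[i]]
    yield ("item", i), [err * p for p in users[u]]

def reduce_gradients(pairs):
    """Reduce step: sum the contributions that share a row key."""
    totals = {}
    for key, grad in pairs:
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], grad)]
        else:
            totals[key] = list(grad)
    return totals

def batch_iteration(ratings, users, items, lr=0.05):
    """All gradients are computed from the old parameters, then applied once."""
    pairs = [p for ex in ratings for p in map_example(ex, users, items)]
    for (kind, row_id), grad in reduce_gradients(pairs).items():
        table = users if kind == "user" else items
        table[row_id] = [w - lr * g for w, g in zip(table[row_id], grad)]

def squared_error(ratings, users, items):
    return sum((dot(users[u], items[i]) - r) ** 2 for u, i, r in ratings)
```

Because the map step only reads the training examples sequentially, a flat file of examples is sufficient as input, which is consistent with the slide's remark about SequenceFile.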
11.
Machine Learning - Techniques
Batch Learning
Output model should provide random
access to rows
12.
Machine Learning - Techniques
Online Learning
Parameters are updated per training
example
13.
Machine Learning - Techniques
Online Learning
Each training example results in updates
to a row
Needs random access while learning
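A sketch of one online (stochastic gradient) step, again assuming a squared-error factorization model that the slides do not specify. Here `store` is a plain dict standing in for any row store with random access; the point is that each training example reads and writes only the rows it touches.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(store, user_key, item_key, rating, lr=0.05):
    """Update the model from a single training example."""
    p = store[user_key]   # random read of the user row
    q = store[item_key]   # random read of the item row
    err = dot(p, q) - rating
    # Both new rows are computed from the old values, then written back.
    store[user_key] = [pw - lr * err * qw for pw, qw in zip(p, q)]
    store[item_key] = [qw - lr * err * pw for pw, qw in zip(p, q)]
    return err
```

With a remote store, each step costs two row reads and two row writes, which is why sequential-only storage is a poor fit for this style of learning.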
14.
Machine Learning - Techniques
Online Learning
Output model should provide random
access to rows
15.
Deployment Process
How do you decide to deploy a machine
learning model in production?
16.
Machine Learning - Deployment
Usual process
Experiment on prototype
Works well?
Y: Deploy in production
N: Back to experimenting on the prototype
17.
Machine Learning - Deployment
How can you easily turn your prototype into
a production system?
Common matrix interface for in-memory and
persistent versions
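The idea of a common interface can be sketched as follows. This is an illustration only: `RowMatrix`, `InMemoryMatrix`, and `SqliteMatrix` are hypothetical names, and SQLite is used purely as a stand-in for a persistent store so the example stays self-contained. It is not HBase and not the Mahout API.

```python
import json
import sqlite3
from abc import ABC, abstractmethod

class RowMatrix(ABC):
    """Hypothetical common interface shared by prototype and production code."""
    @abstractmethod
    def get_row(self, key): ...
    @abstractmethod
    def set_row(self, key, values): ...

    def get(self, key, col):
        return self.get_row(key)[col]

class InMemoryMatrix(RowMatrix):
    def __init__(self):
        self._rows = {}
    def get_row(self, key):
        return list(self._rows[key])
    def set_row(self, key, values):
        self._rows[key] = list(values)

class SqliteMatrix(RowMatrix):
    """Persistent stand-in: one table row per matrix row, values as JSON."""
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS matrix_rows "
            "(row_key TEXT PRIMARY KEY, vals TEXT)")
    def get_row(self, key):
        found = self._db.execute(
            "SELECT vals FROM matrix_rows WHERE row_key = ?", (key,)).fetchone()
        if found is None:
            raise KeyError(key)
        return json.loads(found[0])
    def set_row(self, key, values):
        self._db.execute(
            "INSERT OR REPLACE INTO matrix_rows (row_key, vals) VALUES (?, ?)",
            (key, json.dumps(list(values))))
        self._db.commit()

def scale_row(matrix, key, factor):
    """Algorithm code depends only on the interface, not on the backend."""
    matrix.set_row(key, [factor * v for v in matrix.get_row(key)])
```

Code such as `scale_row` runs unchanged against either backend, so moving from prototype to production means swapping the constructor rather than rewriting the algorithm.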
18.
HBase Backed Matrix
Implements the Mahout Matrix interface
Dense or sparse
19.
HBase Backed Matrix
Random access to cells
Random access to rows
Iteration over rows
Lazy loading while iterating
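Lazy loading during iteration can be sketched with a generator that fetches rows in small batches only when the consumer asks for them. `CountingStore` is a hypothetical in-process stand-in for a remote row store, and the batch size of 2 is arbitrary; neither comes from the talk.

```python
class CountingStore:
    """Hypothetical stand-in for a remote row store; counts round trips."""
    def __init__(self, rows):
        self._keys = sorted(rows)
        self._rows = dict(rows)
        self.round_trips = 0

    def scan(self, start_after, limit):
        """Return up to `limit` rows whose key sorts after `start_after`."""
        self.round_trips += 1
        keys = [k for k in self._keys if start_after is None or k > start_after]
        return [(k, self._rows[k]) for k in keys[:limit]]

def iter_rows(store, batch_size=2):
    """Yield (key, row) pairs in key order, fetching one batch at a time."""
    last_key = None
    while True:
        batch = store.scan(last_key, batch_size)
        if not batch:
            return
        for key, row in batch:
            yield key, row
        last_key = batch[-1][0]
```

Nothing is fetched until the first row is requested, and at most one batch is held in memory at a time.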
20.
HBase Backed Matrix
Common interface for prototype and
production
Easy to deploy (Model already persisted)
21.
HBase Backed Matrix
Matrix operations with the existing
mahout-math library
29.
Future Work
MatrixInputFormat
Might replace SequenceFile-based
MapReduce inputs
30.
Future Work – A little digression
Recommender Systems
Calculating the score for a user-item
pair is easy with HBaseMatrix
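A sketch of scoring one user-item pair, assuming the score is the dot product of the two factor vectors (a common choice for factorization models; the slides do not state the formula). The dicts below stand in for any matrix with random access to rows.

```python
def score(user_matrix, item_matrix, user_id, item_id):
    """Score a user-item pair with two random row reads and a dot product."""
    p = user_matrix[user_id]
    q = item_matrix[item_id]
    return sum(pw * qw for pw, qw in zip(p, q))

user_matrix = {"u1": [1.0, 2.0]}
item_matrix = {"i1": [3.0, 0.5]}
pair_score = score(user_matrix, item_matrix, "u1", "i1")  # 1.0*3.0 + 2.0*0.5
```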
31.
Future Work – A little digression
Recommender Systems
top-N recommendation?
All candidate items for a user in
the user row as a nested entity
(See Ian Varley's HBase Schema
Design)
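One way to read the nested-entity idea: if the candidate items are stored inside the user's row, a single row read returns everything needed to rank them. The row layout below (`factors` plus a `candidates` map of item id to factor vector) and the dot-product score are assumptions made for illustration, not the schema from the talk.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_n(user_row, n):
    """Rank the candidates nested in one user row and return the best n ids."""
    p = user_row["factors"]
    scored = ((dot(p, q), item_id)
              for item_id, q in user_row["candidates"].items())
    return [item_id for _, item_id in heapq.nlargest(n, scored)]

user_row = {
    "factors": [1.0, 0.0],
    "candidates": {"i1": [0.2, 9.0], "i2": [0.9, 0.0], "i3": [0.5, 1.0]},
}
best_two = top_n(user_row, 2)
```

The trade-off is wider user rows in exchange for avoiding one item-row lookup per candidate at recommendation time.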