Big Data LDN 2017: Serving Predictive Models with Redis

Home of Redis
Serving Predictive Models with Redis
Tague Griffith
Head of Developer Advocacy

3
Machine learning: teaching a computer, by example, an algorithm
that is too complex to program

4
Machine Learning Problems
Classification: Pick One of a Set
• Spam Detection
• Manufacturing defect detection
• Handwriting analysis
Algorithms: Decision Trees, Naïve Bayes, Logistic Regression
Regression: Score or Rank
• Recommendations
• Likelihood of Purchase
Algorithms: Linear Regression, SVM
Clustering: Group Similar
• Find Similar Items
• Customer segmentation
• Cohort detection
Algorithms: K-Means, K-Nearest Neighbors, Hierarchical Clustering

5
Supervised Learning – Training Spam Classifier
[Figure: training examples, each labeled Mail or Spam]

6
Deploying a Spam Classifier
[Figure: the trained classifier labels incoming messages as Mail or Spam]

7
How do we Build these Boxes?
¯\_(ツ)_/¯

8
Typical Spark Application Structure
• Spark Training: data is loaded into Spark
• File System: model is saved in files
• Custom Server: model is loaded to your custom app
• Serving Client: Client App

9
Building high-performance, reliable services is hard.
Isn't there something we can deploy?

11
REmote DIctionary Server
Strings, Hashes, Lists, Sets, Sorted Sets, Bitmaps, Bitfields, HyperLogLogs, Geospatial

12
A Quick Recap of Redis
Each key maps to one of these value types:
• Strings / Bitmaps / BitFields: "I'm a Plain Text String!"
• Hash Tables (objects!): { A: "foo", B: "bar", C: "baz" }
• Linked Lists: [ A → B → C → D → E ]
• Sets: { A , B , C , D , E }
• Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
• Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
• HyperLogLog: 00110101 11001110 10101010

13
Redis Modules
• Any C/C++ program can now run on Redis
• Use existing or add new data-structures
• Enjoy simplicity, infinite scalability and high availability while
keeping the native speed of Redis
• Can be created by anyone
New Capabilities
New Commands
New Data Types

14
Redis-ML: Predictive Model Serving Engine
• Predictive models as native Redis types
• Perform evaluation directly in Redis
• Store training output as “hot model”
• Spark Training (or any training platform): data loaded into Spark
• Redis-ML: model is saved in Redis-ML
• Serving Client: multiple Client Apps

15
Redis ML Module
Redis Module
Tree Ensembles
Linear Regression
Logistic Regression
Matrix + Vector Operations
More to come...

16
Random Forest Model
• A collection of decision trees
• Supports classification & regression
• Splitter Node can be:
◦ Categorical (e.g. day == “Sunday”)
◦ Numerical (e.g. age < 43)
• Decision is taken by the majority of decision trees
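The bullets above can be sketched in a few lines of Python. This is a minimal illustration and not the Redis-ML implementation: the node layout (`attr`, `kind`, `value`, `yes`, `no`, `leaf`) and the labels "A" and "B" are hypothetical names chosen for the example.

```python
from collections import Counter

# A node is either a leaf {"leaf": label} or a splitter (hypothetical layout):
#   {"attr": name, "kind": "numeric" | "categorical", "value": v,
#    "yes": subtree, "no": subtree}

def run_tree(node, features):
    """Walk one tree from the root to a leaf and return the leaf's label."""
    while "leaf" not in node:
        actual = features[node["attr"]]
        if node["kind"] == "numeric":
            matched = actual < node["value"]     # e.g. age < 43
        else:
            matched = actual == node["value"]    # e.g. day == "Sunday"
        node = node["yes"] if matched else node["no"]
    return node["leaf"]

def run_forest(trees, features):
    """Classify by majority vote across all trees in the forest."""
    votes = Counter(run_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]

# Three single-split trees, made up for illustration.
forest = [
    {"attr": "age", "kind": "numeric", "value": 43,
     "yes": {"leaf": "A"}, "no": {"leaf": "B"}},
    {"attr": "day", "kind": "categorical", "value": "Sunday",
     "yes": {"leaf": "A"}, "no": {"leaf": "B"}},
    {"attr": "age", "kind": "numeric", "value": 30,
     "yes": {"leaf": "A"}, "no": {"leaf": "B"}},
]

prediction = run_forest(forest, {"age": 35, "day": "Sunday"})
```

With an odd number of trees and two classes there is always a strict majority; a regression forest would average the leaf values instead of voting.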

17
Classic Tree Problem: Titanic Survival
[Figure: decision tree with split nodes "Sex = Male?", "Age < 9.5?" and "Sibps > 2.5?", YES/NO branches, and Survived/Died leaves]
• Passenger Data encoded as feature vectors
• ML Algorithm learns the tree rules
• ID3, CART (RPART), etc.
• Tree rules used to infer results

18
Titanic Survival: Random Forest
[Figure: a forest of three decision trees, each with YES/NO branches and Survived/Died leaves]
Tree #1: Sex = Male? / Age < 9.5? / Sibps > 2.5?
Tree #2: Country = US? / State = CA? / Height > 1.60m?
Tree #3: Weight < 80kg? / I.Q < 100? / Eye color = blue?

19
Who Would Survive the Titanic
John:
• Male, 34,
• Married w/ 2 kids (Sibps=3)
• New York, USA
• 1.78m, 78kg
• 110 iq
• Blue eyes
Mathew:
• Male, 6
• 3 Sisters (Sibps=3)
• New York, USA
• 1.06m, 22.7 kg
• 100 iq
• Brown eyes
Let's use our forest to find out

20
Redis: Forest Data Type
Add nodes to a tree in a forest:
ML.FOREST.ADD <forestId> <treeId> <path>
    [ [NUMERIC|CATEGORIC] <splitterAttr> <splitterVal> ] | [LEAF] <predVal>
Perform classification/regression of a feature vector:
ML.FOREST.RUN <forestId> <features> [CLASSIFICATION|REGRESSION]
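As a rough sketch of how a client might assemble an ML.FOREST.ADD call from an in-memory tree, the Python below flattens a nested tree into an argument list. The slide does not define the `<path>` encoding. The scheme used here ("." for the root, with "l" or "r" appended for the left or right child) and the passing of several nodes in one call are assumptions based on the Redis-ML module's README. The nested dict layout and the name `myforest` are hypothetical.

```python
def node_args(node, path="."):
    """Return the ML.FOREST.ADD arguments for `node` and all its descendants."""
    if "leaf" in node:
        return [path, "LEAF", str(node["leaf"])]
    args = [path, node["kind"], node["attr"], str(node["value"])]
    # Assumed path encoding: append "l" for the left child, "r" for the right.
    args += node_args(node["left"], path + "l")
    args += node_args(node["right"], path + "r")
    return args

def forest_add_command(forest_id, tree_id, root):
    """Build the full argument list for one tree of a forest."""
    return ["ML.FOREST.ADD", forest_id, str(tree_id)] + node_args(root)

# A one-split tree, made up for illustration.
tree = {"kind": "NUMERIC", "attr": "age", "value": 43,
        "left": {"leaf": 1}, "right": {"leaf": 0}}

command = forest_add_command("myforest", 0, tree)
```

The resulting list can be handed to any Redis client that accepts raw command arguments. Building the list separately from sending it keeps the encoding easy to inspect and test without a running server.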

21
Real World Challenge
• Ad serving company
• Need to serve 20,000 ads/sec @ 50 msec data-center latency
• Runs 1K campaigns → 1K random forests
• Each forest has 15K trees
• On average each tree has 7 levels (depth)
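A back-of-the-envelope calculation shows the scale of this workload. It assumes that each ad request evaluates exactly one forest and that each tree evaluation visits one node per level. Neither assumption is stated on the slide.

```python
ads_per_second = 20_000
trees_per_forest = 15_000
levels_per_tree = 7

# One forest evaluation walks every tree in that forest once.
tree_walks_per_second = ads_per_second * trees_per_forest

# Each walk makes roughly one split comparison per level.
comparisons_per_second = tree_walks_per_second * levels_per_tree

print(f"{tree_walks_per_second:,} tree walks/sec")
print(f"{comparisons_per_second:,} split comparisons/sec")
```

Under these assumptions the service performs about 300 million tree walks and about 2.1 billion split comparisons per second.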

22
Ad Serving costs: Homegrown v. Redis
Homegrown: 1,247 x c4.8xlarge
Redis: 35 x c4.8xlarge
Cut computing infrastructure by 97%
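The 97% figure follows directly from the two instance counts on the slide, as this short check shows:

```python
homegrown_instances = 1_247
redis_instances = 35

# Fraction of instances no longer needed after the move.
reduction = 1 - redis_instances / homegrown_instances
reduction_percent = round(reduction * 100, 1)

print(f"Instance count reduced by {reduction_percent}%")
```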

23
Redis ML with Spark ML
Random Forest; 1,000 forests @ 15,000 trees
Classification Time Over Spark
13x Faster

25
The Tools
Transform:
Train:
Classify:
+
Containers:

27
Step 1: Get The Data
Download and extract the MovieLens 100K Dataset
The data is organized in separate files:
• Ratings: user id | item id | rating (1-5) | timestamp
• Item (movie) info: movie id | genre info fields (1/0)
• User info: user id | age | gender | occupation
Our classifier should return the expected rating (from 1 to 5) a user would give the movie in question
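A minimal sketch of reading the rating and user records described above. The separators are assumptions not given on the slide: tab for the ratings file and "|" for the user file, as in the public MovieLens 100K distribution to the best of my knowledge. The sample rows are made up for illustration and are not taken from the dataset.

```python
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int      # 1-5
    timestamp: int

class User(NamedTuple):
    user_id: int
    age: int
    gender: str
    occupation: str

def parse_rating(line, sep="\t"):
    """Parse 'user id, item id, rating, timestamp' from one line."""
    user_id, item_id, rating, timestamp = line.strip().split(sep)[:4]
    return Rating(int(user_id), int(item_id), int(rating), int(timestamp))

def parse_user(line, sep="|"):
    """Parse 'user id, age, gender, occupation'; extra fields are ignored."""
    user_id, age, gender, occupation = line.strip().split(sep)[:4]
    return User(int(user_id), int(age), gender, occupation)

# Illustrative rows only.
sample_ratings = ["1\t10\t4\t880000000", "2\t10\t2\t880000100"]
sample_users = ["1|24|M|technician", "2|53|F|other"]

ratings = [parse_rating(line) for line in sample_ratings]
users = {user.user_id: user for user in map(parse_user, sample_users)}
```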

28
Step 2: Transform
The training data for each movie should contain 1 line per user:
• class (rating from 1 to 5 the user gave to this movie)
• user info (age, gender, occupation)
• user ratings of other movies (movie_id:rating ...)
• user genre rating averages (genre:avg_score ...)
Run gen_data.py to transform the files to the desired format
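The slide does not show the output format of gen_data.py, so the following is only a hypothetical sketch of how one training line could be assembled from the four parts listed above. The function name, the space-separated layout, and the choice to leave the target movie out of the genre averages (so the label does not leak into the features) are all assumptions.

```python
def training_line(target_movie, user, user_ratings, movie_genres):
    """Build one training line for `user` about `target_movie`.

    user_ratings: {movie_id: rating} for this user
    movie_genres: {movie_id: [genre, ...]}
    """
    label = user_ratings[target_movie]
    others = {m: r for m, r in user_ratings.items() if m != target_movie}

    # Sum and count this user's ratings per genre over the other movies.
    totals = {}
    for movie_id, rating in others.items():
        for genre in movie_genres.get(movie_id, []):
            total, count = totals.get(genre, (0, 0))
            totals[genre] = (total + rating, count + 1)

    parts = [str(label), str(user["age"]), user["gender"], user["occupation"]]
    parts += [f"{m}:{r}" for m, r in sorted(others.items())]
    parts += [f"{g}:{t / c:.2f}" for g, (t, c) in sorted(totals.items())]
    return " ".join(parts)

# Made-up inputs for illustration.
line = training_line(
    target_movie=10,
    user={"age": 24, "gender": "M", "occupation": "technician"},
    user_ratings={10: 4, 11: 5, 12: 3},
    movie_genres={11: ["Action"], 12: ["Action", "Comedy"]},
)
```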

29
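Step 3: Train and Load to Redis
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
// Forest is assumed to come from the Redis-ML Spark integration; its import is not shown on the slide

// Create a new forest instance
val rf = new RandomForestClassifier()
  .setFeatureSubsetStrategy("auto")
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setNumTrees(500)
// …
// Train the model
val model = pipeline.fit(trainingData)
// …
// The random forest is the pipeline stage at index 2
val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]
// Load the model into Redis as forest "movie-10" on the local server
val f = new Forest(rfModel.trees)
f.loadToRedis("movie-10", "127.0.0.1")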

30
Step 4: Execute inference in Redis
[Figure: Spark Training loads the model into Redis-ML; the Client App sends inference requests to Redis-ML]
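A client-side sketch of the inference call, assuming a Redis client object that exposes `execute_command(*args)` as redis-py does. The feature-vector encoding ("name:value" pairs joined by commas) and the feature names are assumptions, not taken from the slides. `RecordingClient` is a hypothetical stand-in, so the example runs without a server, and its reply is canned.

```python
def feature_string(features):
    """Encode {"name": value} pairs as "name:value,name:value" (assumed format)."""
    return ",".join(f"{name}:{value}" for name, value in features.items())

def classify(client, forest_id, features):
    """Ask Redis-ML to classify one feature vector with a stored forest."""
    return client.execute_command(
        "ML.FOREST.RUN", forest_id, feature_string(features), "CLASSIFICATION"
    )

class RecordingClient:
    """Stand-in for a Redis client: records commands instead of sending them."""
    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(args)
        return "3"  # canned reply for illustration only

client = RecordingClient()
reply = classify(client, "movie-10", {"age": 24, "gender": "M"})
```

Swapping `RecordingClient` for a real client connected to a Redis server with the Redis-ML module loaded should be the only change needed, provided the feature encoding matches how the forest was loaded.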

31
Summary
• Train with Spark, Serve with Redis
• 97% lower resource cost for serving
• Simplify ML lifecycle
• Redise (Cloud or Pack):
‒ Scaling, HA, Performance
‒ PAYG – cost optimized
‒ Ease of use
‒ Supported by the teams who created Spark and Redis
[Figure: Spark Training → Redis-ML → Serving Client → Client Apps]
