Formats for Exchanging Archival Data: An Introduction to EAD, EAC-CPF, and Ar...Michael Rush
Presented July 1, 2011, as part of the session "Standards, Information, and Data Exchange" at the 7th International Seminar of Iberian Tradition Archives, Rio de Janeiro, Brazil.
Real-Time Machine Learning with Redis-ML
Shay Nativ from Redis Labs presented on using Redis and Redis-ML for real-time machine learning model serving. Redis-ML allows training models with tools like Spark and then deploying them to Redis for low-latency serving. This simplifies the ML lifecycle and improves performance and scalability compared to custom model serving. Shay demonstrated building a movie recommendation system using Spark for training random forests on the MovieLens dataset and deploying the models to Redis-ML for real-time recommendations with 60x faster performance than Spark alone.
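The serving side described above can be sketched in plain Python: a trained random forest is just a set of decision trees plus a majority vote over their predictions. This is an illustration of the computation Redis-ML performs server-side, not the Redis-ML API itself; the tree shapes and the "rating" feature are invented for the example.

```python
# Minimal sketch of serving a trained random forest: each tree is a
# nested dict, and classification is a majority vote across trees.
from collections import Counter

def run_tree(node, features):
    """Walk one decision tree until a leaf value is reached."""
    while "leaf" not in node:
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def run_forest(trees, features):
    """Classify by majority vote across all trees."""
    votes = Counter(run_tree(t, features) for t in trees)
    return votes.most_common(1)[0][0]

# Two toy trees over a single, hypothetical "rating" feature.
trees = [
    {"feature": "rating", "threshold": 3.0,
     "left": {"leaf": "skip"}, "right": {"leaf": "recommend"}},
    {"feature": "rating", "threshold": 4.0,
     "left": {"leaf": "skip"}, "right": {"leaf": "recommend"}},
]
print(run_forest(trees, {"rating": 4.5}))  # both trees vote "recommend"
```

The speedup claimed in the talk comes from keeping exactly this traversal resident in memory next to the data, rather than spinning up a Spark job per request.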
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Databricks
Redis-ML is a Redis module for high performance, real-time serving of Spark-ML models. It allows users to train large complex models in Spark, and then store and query the models directly on Redis clusters. The high throughput and low latency of Redis-ML allows users to perform heavy classification operations in real time while using a minimal number of servers. This unique architecture enables significant savings in resources compared to current commonly used methods, without loss in precision or server performance.
This session will demonstrate how to build a production-level recommendation system from the ground up using Spark-ML and Redis-ML. It will also describe performance and accuracy benchmarks, comparing the results with current standard methods.
Deploying Real-Time Decision Services Using Redis with Tague GriffithDatabricks
Most of the energy and attention in machine learning has focused on the model-training side of the problem. Multiple frameworks, in every language, give developers access to a host of data-manipulation and training algorithms, but until recently developers had virtually no frameworks for building predictive engines from trained ML models. Most developers resorted to building custom applications, but building highly available, highly performant applications is difficult.
Redis in conjunction with the Redis-ML module provides a server framework for developers to build predictive engines with familiar, off-the-shelf components. Developers can take advantage of all the features of Redis to deliver faster and more reliable prediction engines with less custom development.
This talk is a technical session which examines how Redis can be used in conjunction with a Spark based training platform to deliver real-time predictive and decision making features as part of a larger system. To set the context for the session, we start with an introduction to the Redis data model and how features of Redis (namespace, replication) can be used to build fast predictive engines (at scale), that are more reliable, more feature rich and easier to manage than custom applications. From there, we look at the model serving capabilities of Redis-ML and how they can be integrated with a Spark-based ML pipeline to automate the entire model development process from training to deployment.
The session ends with a demonstration of a simple machine learning pipeline. Using Spark we train several example models, load them directly into Redis and demonstrate Redis as the predictive engine for making real-time recommendations. At the end of the session, developers should feel confident that they could use Redis as a server framework to build a predictive serving engine for a Spark-based ML pipeline.
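Automating the hand-off from training to deployment, as described above, amounts to flattening each trained tree into a load command. The sketch below builds an `ML.FOREST.ADD` command string from a tree; the command shape (root path `.`, `l`/`r` path suffixes, `NUMERIC`/`LEAF` node types) follows the redis-ml project documentation and should be verified against your module version, and the tree itself is a stand-in for what a Spark ML pipeline would produce. No Redis connection is made here.

```python
# Sketch: walk a trained decision tree and emit the redis-ml
# ML.FOREST.ADD command that would load it into a Redis forest key.

def tree_to_command(key, tree_id, node):
    """Flatten one tree into a single ML.FOREST.ADD command string."""
    args = ["ML.FOREST.ADD", key, str(tree_id)]

    def walk(node, path):
        if "leaf" in node:
            args.extend([path, "LEAF", str(node["leaf"])])
        else:
            args.extend([path, "NUMERIC", node["feature"], str(node["threshold"])])
            walk(node["left"], path + "l")   # left child path
            walk(node["right"], path + "r")  # right child path

    walk(node, ".")  # "." denotes the root node
    return " ".join(args)

tree = {"feature": "age", "threshold": 30.0,
        "left": {"leaf": 0}, "right": {"leaf": 1}}
cmd = tree_to_command("myforest", 0, tree)
print(cmd)
```

In a real pipeline this string would be sent once per tree via a Redis client after each Spark training run, which is the automation step the session describes.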
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
Getting Ready to Use Redis with Apache Spark is a technical tutorial designed to address integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities it provides. We cover the basic data types provided by Redis and the module system. Using an ad-serving use case, we look at how Redis can improve the performance and reduce the cost of using complex ML models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark, then load and serve it using Redis, as well as how to work with the Spark-Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed, focusing primarily on decision trees and regression (linear and logistic), with code examples to demonstrate how to use these features. At the end of the session, developers should feel confident building a prototype/proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex ML models with high performance.
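Beyond trees, the regression models mentioned above are even simpler to serve: scoring a linear model is a dot product, and logistic regression adds a sigmoid. This pure-Python sketch shows the math a serving layer evaluates per request; the weights are invented for illustration.

```python
# Scoring linear and logistic regression models, as a serving layer
# (such as redis-ml) would evaluate them per request.
import math

def linreg(weights, bias, features):
    """Linear regression score: bias + w . x"""
    return bias + sum(w * x for w, x in zip(weights, features))

def logreg(weights, bias, features):
    """Logistic regression: squash the linear score through a sigmoid."""
    return 1.0 / (1.0 + math.exp(-linreg(weights, bias, features)))

weights, bias = [0.8, -0.4], 0.1        # hypothetical trained parameters
score = logreg(weights, bias, [1.0, 2.0])  # probability of the positive class
print(round(score, 3))
```

Because each request touches only a handful of floats, this kind of scoring is dominated by network and framework overhead, which is exactly what an in-memory server minimizes.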
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...Lucas Jellema
Data has been and will remain the key ingredient of enterprise IT. What is changing is the nature, scope, and volume of data and its place in the IT architecture. Big data, unstructured data, and non-relational data stored on Hadoop, in NoSQL databases, and in Elasticsearch, caches, and message queues complement data in the enterprise RDBMS. Trends such as microservices that contain their own data, BASE, CQRS, and event sourcing have changed the way we store, share, and govern data. This session introduces patterns, technologies, and hypes around storing, processing, and retrieving data using products such as Oracle Database, Cassandra, MySQL, Neo4J, Kafka, Redis, Elasticsearch, and Hadoop/Spark, locally, in containers, and in the cloud. Key takeaway: what an application architect and a developer should know about the various types of data in enterprise IT and how to store, manage, query, and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation.
This document provides an agenda for a Microsoft Azure Virtual Training Day on data fundamentals. The training will cover core data concepts, relational and non-relational data services in Azure, and data analytics. It will include modules on relational data with SQL, non-relational data with Azure Storage and Cosmos DB, large-scale data warehousing, streaming analytics, and data visualization with Power BI. Demos will illustrate how to provision Azure database and storage services and visualize data. The goal is to describe fundamental data services and concepts for working with structured, semi-structured and unstructured data on Azure.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
Memory Analysis of the Dalvik (Android) Virtual MachineAndrew Case
The document summarizes research on analyzing the memory of the Dalvik virtual machine used in Android. It describes acquiring memory from Android devices, locating key data structures in memory like loaded classes and their fields, and analyzing specific Android applications to recover data like call histories, text messages, and location information. The goal is to develop forensics capabilities for investigating Android devices through memory analysis.
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Maurice Nsabimana
Volunteers around the world increasingly act as human sensors to collect millions of data points. A team from the World Bank trained deep learning models, using Apache Spark and BigDL, to confirm that photos gathered through a crowdsourced data collection pilot matched the goods for which observations were submitted.
In this talk, Maurice Nsabimana, a statistician at the World Bank, and Jiao Wang, a software engineer on the Big Data Technology team at Intel, demonstrate a collaborative project to design and train large-scale deep learning models using crowdsourced images from around the world. BigDL is a distributed deep learning library designed from the ground up to run natively on Apache Spark. It enables data engineers and scientists to write deep learning applications in Scala or Python as standard Spark programs, without having to explicitly manage distributed computations. Attendees of this session will learn how to get started with BigDL, which runs in any Apache Spark environment, whether on-premises or in the cloud.
The document provides an overview of the Informatica PowerCenter 7.1 product, describing its major components for ETL development, how to build basic mappings and workflows, and available options for loading target data. It also outlines the course objectives to understand PowerCenter architecture and components, build mappings and workflows, and troubleshoot common problems. Resources available from Informatica like documentation, support, and certification programs are also summarized.
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Databricks
Volunteers around the world increasingly act as human sensors to collect millions of data points. A team from the World Bank trained deep learning models, using Apache Spark and BigDL, to confirm that photos gathered through a crowdsourced data collection pilot matched the goods for which observations were submitted.
In this talk, Maurice Nsabimana, a statistician at the World Bank, will demonstrate a collaborative project to design and train large-scale deep learning models using crowdsourced images from around the world. BigDL is a distributed deep learning library designed from the ground up to run natively on Apache Spark. It enables data engineers and scientists to write deep learning applications in Scala or Python as standard Spark programs, without having to explicitly manage distributed computations. Attendees of this session will learn how to get started with BigDL, which runs in any Apache Spark environment, whether on-premises or in the cloud.
Attendees will also learn how to write a deep learning application that leverages Spark to train image recognition models at scale.
Gain Complete Visibility and Find Hidden Security IssuesElasticsearch
Even basic threats can be numerous and complex, and limited visibility into your security data simply is not enough. Whether you are running investigations or hunting threats, you need all the security-relevant context. Learn key practices for collecting and normalizing data, and see how you can use Elastic Security to triage, verify, and address issues quickly and accurately.
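The data-normalization practice mentioned above boils down to mapping each vendor's log fields onto one common schema before indexing, so that triage queries work across sources. In this sketch the target field names loosely follow the Elastic Common Schema (ECS); the input record and mapping table are invented for illustration.

```python
# Normalize a vendor-specific log record onto common-schema keys,
# dropping fields with no mapping.

def normalize(raw):
    """Rename vendor fields to common-schema keys; drop unknowns."""
    mapping = {
        "src": "source.ip",
        "dst": "destination.ip",
        "ts": "@timestamp",
        "act": "event.action",
    }
    return {mapping[k]: v for k, v in raw.items() if k in mapping}

event = normalize({"src": "10.0.0.5", "dst": "10.0.0.9",
                   "ts": "2021-06-01T12:00:00Z", "act": "denied",
                   "vendor_flag": "x"})  # vendor_flag has no mapping
print(event)
```

Once every source emits the same keys, a single query such as `source.ip: 10.0.0.5` spans firewalls, endpoints, and cloud logs alike.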
Member privacy is of paramount importance to LinkedIn. The company must protect the sensitive data users provide. On the other hand, our members join LinkedIn to find each other, necessitating the sharing of certain data. This privacy paradox can only be addressed by giving users control over where and how their data is used. While this approach is extremely important, it also presents scaling challenges.
In this talk, we will discuss the challenges behind enforcing compliance at scale as well as LinkedIn's solution. Our comprehensive record-level offline compliance framework includes schema metadata tracking, alternate read-time views of the same dataset, physical purging of data on HDFS, and features for users to define custom filtering rules using SQL, assigning such customizations to specific datasets, groups of datasets, or use cases. We achieve this using many open-source projects like Hadoop, Hive, Gobblin, and Wherehows, as well as a homegrown data access layer called Dali. We also show how the same Hadoop-powered framework can be used for enforcing compliance on other stores like Pinot, Salesforce, and Espresso.
While there is no one-size fits all solution to guaranteeing user data privacy, this talk will provide a blueprint and concrete example of how to enforce compliance at scale, which we hope proves useful to organizations working to improve their privacy commitments. ISSAC BUENROSTRO, Staff Software Engineer, LinkedIn and ANTHONY HSU, Staff Software Engineer, LinkedIn
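The record-level purging with user-defined SQL filtering rules described above reduces, at its core, to applying a rule to a dataset and dropping the matching rows. The sketch below uses Python's built-in sqlite3 as a stand-in for data on HDFS; the table, rows, and purge rule are hypothetical, and LinkedIn's actual framework (Gobblin, Dali, etc.) operates at far larger scale.

```python
# Apply a user-defined SQL filtering rule to purge records, the core
# operation of a record-level compliance framework.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (member_id INTEGER, action TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "view"), (2, "click"), (1, "click")])

# A custom filtering rule assigned to this dataset: purge all records
# for members who revoked consent (here, member 1).
purge_rule = "member_id IN (1)"
conn.execute(f"DELETE FROM events WHERE {purge_rule}")

remaining = conn.execute("SELECT member_id, action FROM events").fetchall()
print(remaining)  # only member 2's record survives
```

The framework's value is in tracking which rule applies to which dataset and proving the physical deletion happened, not in the deletion statement itself.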
Toward Easy Export of Imagery Products and Feature Classes as Training Data f...Dawn Wright
American Association of Geographers (AAG) 2018 Symposium on Artificial Intelligence and Deep Learning in Geospatial Research
Whether to train a Deep Learning (DL) model to find objects of interest such as cars or solar panels in satellite or aerial images, or to classify such images into different categories of land use, or other such tasks, a common starting point is always labeled ground truth or training data. From an industry perspective, an organization such as ESRI has a large user base of roughly 350,000 agencies, universities, non-profits, and other partners, with most of them maintaining and continually updating their own GIS data. But how can this treasure trove of data be used effectively and appropriately for training new DL models? This talk will provide an overview of new tools to export GIS data from multiple sources into popular DL formats such as KITTI or PASCAL_VOC. These can then be directly used as input to DL frameworks such as Microsoft CNTK or Google TensorFlow in order to train DL models. For example, NAIP images and building footprints of an entire county can be exported as a sequence of equally sized image chips plus one metadata file per image chip containing the bounding boxes around all buildings in KITTI format. From this data a DL model can be trained that detects buildings. The hope is that this new suite of tools will make it easier for DL researchers and students at all levels (from undergraduate to doctoral and beyond) to access existing GIS data and to use them for training new DL models.
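The export step described above comes down to writing, per image chip, one label line per object. A KITTI label line has 15 space-separated fields: class, truncation, occlusion, alpha, the 2D bounding box (left, top, right, bottom, in pixels), then 3D dimensions, location, and rotation, which 2D-only exports commonly fill with placeholder values. The bounding box below is invented for illustration.

```python
# Write one KITTI-format label line for a 2D bounding box.

def kitti_line(cls, left, top, right, bottom):
    """Build a 15-field KITTI label line for a 2D-only annotation."""
    fields = [cls, "0.00", "0", "0.00",          # class, truncation, occlusion, alpha
              f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",
              # 3D height/width/length, x/y/z, rotation_y: placeholders
              "-1", "-1", "-1", "-1000", "-1000", "-1000", "-10"]
    return " ".join(fields)

line = kitti_line("Building", 34.0, 120.5, 88.0, 190.0)
print(line)
```

An exporter would emit one such file per chip, with one line per building footprint that intersects the chip, alongside the chip image itself.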
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...Lucas Jellema
Data has been and will be the key ingredient to enterprise IT. What is changing is the nature, scope, and volume of data and its place in the IT architecture. Big data, unstructured data, and nonrelational data stored on Hadoop; NoSQL databases; and in Elasticsearch, caches, and message queues complements data in the enterprise RDBMS. Trends such as microservices that contain their own data, BASE, CQRS, and event sourcing have changed the way we store, share, and govern data. This session introduces patterns, technologies, and hypes for storing, processing, and retrieving data with products such as Oracle Database, Cassandra, MySQL, Neo4J, Kafka, Redis, Elasticsearch, Blockchain (Hyperledger) and Hadoop/Spark—locally, in containers, and in the cloud.
La big datacamp-2014-aws-dynamodb-overview-michael_limcacoData Con LA
This document discusses Amazon DynamoDB, a fully managed NoSQL database service from AWS. It provides three key points:
1. DynamoDB offers fast and predictable performance with single-digit millisecond latency, automatic scaling of storage and throughput capacity, and built-in security, backup and disaster recovery capabilities.
2. DynamoDB uses a flexible data model with key-value and document data structures, includes rich query capabilities, and provides SDKs/APIs for developers.
3. The document provides an example of modeling user data and files for a media catalog application using DynamoDB tables and secondary indexes to support various access patterns like searching by
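Modeling a media catalog's access patterns, as the example above describes, typically means a table keyed by user plus a secondary index for an alternate lookup. The sketch below is only the parameter dictionary one would pass to boto3's `create_table`; no AWS call is made, and the table, attribute, and index names are hypothetical.

```python
# A DynamoDB table definition (boto3 create_table parameters) for a
# media-catalog: items keyed by user, with a GSI for lookup by type.
table_spec = {
    "TableName": "MediaFiles",
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "file_id", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "file_id", "AttributeType": "S"},
        {"AttributeName": "file_type", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexes": [{
        "IndexName": "ByType",  # supports "all files of a given type"
        "KeySchema": [{"AttributeName": "file_type", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},
    }],
    "BillingMode": "PAY_PER_REQUEST",
}
print(table_spec["TableName"])
```

Each access pattern ("files for a user", "files of a type") gets its own key path, which is the essence of DynamoDB modeling: design the keys around the queries, not the entities.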
During this session for Teams Day Online 2021 I explained the concepts of eDiscovery and showed how information from Microsoft Teams can be discovered using core and advanced eDiscovery.
How in memory technology will impact machine deep learning services (redis la...Avner Algom
This document discusses how in-memory technology can impact machine and deep learning services using Redis Labs as a case study. It describes how Redis can provide a simple, extensible, and high performance platform for serving machine learning models. Serving complex models at scale is challenging due to their size, lack of standardization, and high costs. Redis-ML module allows predictive models to be stored and evaluated directly in Redis, reducing infrastructure needs by 97% for an ad serving use case compared to a homegrown solution. Co-locating streams, data, and machine learning engines in an in-memory database like Redis can reduce data movement, messages, and latency compared to traditional machine learning pipelines.
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Spark Summit
Spark 2.0 provided strong performance enhancements to the Spark core while advancing Spark ML usability by moving to DataFrames. But what happens when you run Spark 2.0 machine learning algorithms on a large cluster with a very large data set? Do you even get any benefit from using a very large data set? It depends. How do new hardware advances affect the topology of high-performance Spark clusters? In this talk we will explore Spark 2.0 machine learning at scale and share our findings with the community.
As our test platform we will be using a new cluster design, different from typical Hadoop clusters, with more cores, more RAM, latest-generation NVMe SSDs, and a 100GbE network, with a goal of more performance in a more space- and energy-efficient footprint.
This document provides an overview of deep learning and its applications. It discusses how deep learning can be used for image classification and how neural networks learn hierarchical representations from data. The document highlights some of the challenges of deep learning, such as the large amounts of data and computation required. It also covers how deep learning models can be deployed in production using services like Amazon Web Services to ensure low latency, high availability, and continuous learning.
LinkedIn is a large professional social network with 50 million users from around the world. It faces big data challenges at scale, such as caching a user's third-degree network of up to 20 million connections and performing searches across 50 million user profiles. LinkedIn uses Hadoop and other scalable architectures like distributed search engines and custom graph engines to solve these problems. Hadoop provides a scalable framework to process massive amounts of user data across thousands of nodes through its MapReduce programming model and HDFS distributed file system.
Domain Identification for Linked Open DataSarasi Sarangi
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.
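The core idea (inferring a dataset's topic domain from the domains of the knowledge-base entities it links to) can be sketched as a frequency count. The entity-to-domain lookup below stands in for Freebase and is invented; the actual approach involves entity resolution against the knowledge source rather than a literal dictionary.

```python
# Infer a dataset's topic domain by majority vote over the domains of
# the entities it references.
from collections import Counter

entity_domain = {          # hypothetical knowledge-base lookup
    "Aspirin": "medicine", "Ibuprofen": "medicine",
    "Berlin": "geography", "Insulin": "medicine",
}

def identify_domains(dataset_entities, top_n=1):
    """Return the top_n most frequent domains among known entities."""
    counts = Counter(entity_domain[e] for e in dataset_entities
                     if e in entity_domain)
    return [d for d, _ in counts.most_common(top_n)]

print(identify_domains(["Aspirin", "Insulin", "Berlin", "Ibuprofen"]))
```

A dataset dominated by drug entities would thus surface as a "medicine" dataset, which is exactly the signal used to match datasets to tasks.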
The document provides an overview of MongoDB administration including its data model, replication for high availability, sharding for scalability, deployment architectures, operations, security features, and resources for operations teams. The key topics covered are the flexible document data model, replication using replica sets for high availability, scaling out through sharding of data across multiple servers, and different deployment architectures including single/multi data center configurations.
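Sharding, as covered above, means the cluster routes each document to a shard based on its shard key; a hashed shard key spreads writes evenly across shards. This pure-Python sketch shows the routing idea only; MongoDB's actual hash function and chunk-migration mechanics differ, and the shard count and keys here are invented.

```python
# Illustrate hashed shard-key routing: hash the key, take it modulo
# the number of shards to pick a placement.
import hashlib

NUM_SHARDS = 3

def route(shard_key_value):
    """Deterministically map a shard-key value to a shard id."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

docs = ["user:1", "user:2", "user:3", "user:4"]
placement = {d: route(d) for d in docs}
print(placement)
```

The determinism matters: any router hashing the same key reaches the same shard, so reads that include the shard key can be sent to exactly one server.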
Data Modeling for Security, Privacy and Data ProtectionKaren Lopez
Karen Lopez's (@datchick, InfoAdvisors) 90-minute presentation on data security, data privacy, and compliance, and on how data modelers should discover, assess, and monitor these important data-management responsibilities.
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
Memory Analysis of the Dalvik (Android) Virtual MachineAndrew Case
The document summarizes research on analyzing the memory of the Dalvik virtual machine used in Android. It describes acquiring memory from Android devices, locating key data structures in memory like loaded classes and their fields, and analyzing specific Android applications to recover data like call histories, text messages, and location information. The goal is to develop forensics capabilities for investigating Android devices through memory analysis.
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Maurice Nsabimana
Volunteers around the world increasingly act as human sensors to collect millions of data points. A team from the World Bank trained deep learning models, using Apache Spark and BigDL, to confirm that photos gathered through a crowdsourced data collection pilot matched the goods for which observations were submitted.
In this talk, Maurice Nsabimana, a statistician at the World Bank, and Jiao Wang, a software engineer on the Big Data Technology team at Intel, demonstrate a collaborative project to design and train large-scale deep learning models using crowdsourced images from around the world. BigDL is a distributed deep learning library designed from the ground up to run natively on Apache Spark. It enables data engineers and scientists to write deep learning applications in Scala or Python as standard Spark programs, without having to explicitly manage distributed computations. Attendees of this session will learn how to get started with BigDL, which runs in any Apache Spark environment, whether on-premises or in the cloud.
The document provides an overview of the Informatica PowerCenter 7.1 product, describing its major components for ETL development, how to build basic mappings and workflows, and available options for loading target data. It also outlines the course objectives to understand PowerCenter architecture and components, build mappings and workflows, and troubleshoot common problems. Resources available from Informatica like documentation, support, and certification programs are also summarized.
Get complete visibility and find hidden security issuesElasticsearch
Even basic threats can be numerous and complex, and limited visibility into your security data simply is not enough. Whether you are running investigations or hunting threats, you need all the security-relevant context. Learn key practices for collecting and normalizing data, and see how you can use Elastic Security to triage, verify, and address issues quickly and accurately.
Member privacy is of paramount importance to LinkedIn. The company must protect the sensitive data users provide. On the other hand, our members join LinkedIn to find each other, necessitating the sharing of certain data. This privacy paradox can only be addressed by giving users control over where and how their data is used. While this approach is extremely important, it also presents scaling challenges.
In this talk, we will discuss the challenges behind enforcing compliance at scale as well as LinkedIn's solution. Our comprehensive record-level offline compliance framework includes schema metadata tracking, alternate read-time views of the same dataset, physical purging of data on HDFS, and features for users to define custom filtering rules using SQL, assigning such customizations to specific datasets, groups of datasets, or use cases. We achieve this using many open-source projects like Hadoop, Hive, Gobblin, and Wherehows, as well as a homegrown data access layer called Dali. We also show how the same Hadoop-powered framework can be used for enforcing compliance on other stores like Pinot, Salesforce, and Espresso.
While there is no one-size-fits-all solution for guaranteeing user data privacy, this talk will provide a blueprint and a concrete example of how to enforce compliance at scale, which we hope proves useful to organizations working to improve their privacy commitments. ISSAC BUENROSTRO, Staff Software Engineer, LinkedIn, and ANTHONY HSU, Staff Software Engineer, LinkedIn
50 Shades of Data – how, when and why Big,Relational,NoSQL,Elastic,Graph,Even...Lucas Jellema
Data has been and will be the key ingredient to enterprise IT. What is changing is the nature, scope and volume of data and the place of data in the IT architecture. Big data, unstructured data and non-relational data stored on Hadoop, in NoSQL databases and in Elasticsearch, caches and message queues complement data in the enterprise RDBMS. Trends such as microservices that contain their own data, BASE, CQRS and event sourcing have changed the way we store, share and govern data. This session introduces patterns, technologies and hypes around storing, processing and retrieving data using products such as Oracle Database, Cassandra, MySQL, Neo4J, Kafka, Redis, Elasticsearch and Hadoop/Spark, locally, in containers and in the cloud. Key takeaway: what an application architect and a developer should know about the various types of data in enterprise IT and how to store, manage, query and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation.
Toward Easy Export of Imagery Products and Feature Classes as Training Data f...Dawn Wright
American Association of Geographers (AAG) 2018 Symposium on Artificial Intelligence and Deep Learning in Geospatial Research
Whether to train a Deep Learning (DL) model to find objects of interest such as cars or solar panels in satellite or aerial images, or to classify such images into different categories of land-use, or other such tasks, a common starting point is always labeled ground truth or training data. From an industry perspective, an organization such as ESRI has a large user base of roughly 350,000 agencies, universities, non-profits, and other partners, most of which maintain and continually update their own GIS data. But how can this treasure trove of data be effectively and appropriately used for training new DL models? This talk will provide an overview of new tools to export GIS data from multiple sources into popular DL formats such as KITTI or PASCAL_VOC. These can then be directly used as input to DL frameworks such as Microsoft CNTK or Google TensorFlow in order to train DL models. For example, NAIP images and building footprints of an entire county can be exported as a sequence of equally sized image chips plus one metadata file per image chip containing the bounding boxes around all buildings in KITTI format. From this data a DL model can be trained that detects buildings. The hope is that this new suite of tools will make it easier for DL researchers and students at all levels (from undergraduate to doctoral and beyond) to access existing GIS data and to use it for training new DL models.
Big Data Camp LA 2014: AWS DynamoDB Overview (Michael Limcaco)Data Con LA
This document discusses Amazon DynamoDB, a fully managed NoSQL database service from AWS. It provides three key points:
1. DynamoDB offers fast and predictable performance with single-digit millisecond latency, automatic scaling of storage and throughput capacity, and built-in security, backup and disaster recovery capabilities.
2. DynamoDB uses a flexible data model supporting both key-value and document data structures. It also includes rich query capabilities and SDKs/APIs for developers.
3. The document provides an example of modeling user data and files for a media catalog application using DynamoDB tables and secondary indexes to support various access patterns like searching by
During this session for Teams Day Online 2021 I explained the concepts of eDiscovery and showed how information from Microsoft Teams can be discovered using core and advanced eDiscovery.
How in memory technology will impact machine deep learning services (redis la...Avner Algom
This document discusses how in-memory technology can impact machine and deep learning services using Redis Labs as a case study. It describes how Redis can provide a simple, extensible, and high performance platform for serving machine learning models. Serving complex models at scale is challenging due to their size, lack of standardization, and high costs. Redis-ML module allows predictive models to be stored and evaluated directly in Redis, reducing infrastructure needs by 97% for an ad serving use case compared to a homegrown solution. Co-locating streams, data, and machine learning engines in an in-memory database like Redis can reduce data movement, messages, and latency compared to traditional machine learning pipelines.
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Spark Summit
Spark 2.0 provided strong performance enhancements to the Spark core while advancing Spark ML usability to use DataFrames. But what happens when you run Spark 2.0 machine learning algorithms on a large cluster with a very large data set? Do you even get any benefit from using a very large data set? It depends. How do new hardware advances affect the topology of high performance Spark clusters? In this talk we will explore Spark 2.0 Machine Learning at scale and share our findings with the community.
As our test platform we will be using a new cluster design, different from typical Hadoop clusters, with more cores, more RAM and latest generation NVMe SSD’s and a 100GbE network with a goal of more performance, in a more space and energy efficient footprint.
This document provides an overview of deep learning and its applications. It discusses how deep learning can be used for image classification and how neural networks learn hierarchical representations from data. The document highlights some of the challenges of deep learning, such as the large amounts of data and computation required. It also covers how deep learning models can be deployed in production using services like Amazon Web Services to ensure low latency, high availability, and continuous learning.
LinkedIn is a large professional social network with 50 million users from around the world. It faces big data challenges at scale, such as caching a user's third degree network of up to 20 million connections and performing searches across 50 million user profiles. LinkedIn uses Hadoop and other scalable architectures like distributed search engines and custom graph engines to solve these problems. Hadoop provides a scalable framework to process massive amounts of user data across thousands of nodes through its MapReduce programming model and HDFS distributed file system.
Domain Identification for Linked Open DataSarasi Sarangi
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.
The document provides an overview of MongoDB administration including its data model, replication for high availability, sharding for scalability, deployment architectures, operations, security features, and resources for operations teams. The key topics covered are the flexible document data model, replication using replica sets for high availability, scaling out through sharding of data across multiple servers, and different deployment architectures including single/multi data center configurations.
Data Modeling for Security, Privacy and Data ProtectionKaren Lopez
Karen Lopez (@datchick/InfoAdvisors) 90-minute presentation on Data Security, Data Privacy, Compliance and how data modelers should discover, assess, and monitor these important data management responsibilities.
4. 4
Who I am
• Head of Developer Advocacy for Redis Labs
• Developer and architect turned Evangelist
• Infrastructure and Distributed Systems
• Large Scale Redis Systems
• Former: Apple, Netscape, Yahoo/Flickr, GoPro
• Focus on the Open Source Community
• Education and Support
• Nurture and grow the entire community
5. 5
Redis Labs – Home of Redis
Founded in 2011
HQ in Mountain View CA, R&D center in Tel-Aviv IL
The commercial company behind Open Source Redis
Provider of the Redis Enterprise (Redise) technology, platform and products
6. 6
Redis Labs Products
SERVICES
• Redise Cloud: fully managed Redise service on hosted servers within AWS, MS Azure, GCP, IBM Softlayer, Heroku, CF & OpenShift
• Redise Cloud Private: fully managed Redise service in VPCs within AWS, MS Azure, GCP & IBM Softlayer
SOFTWARE
• Redise Pack: downloadable Redise software for any enterprise datacenter or cloud environment
• Redise Pack Managed: fully managed Redise Pack in private data centers
16. 16
Typical Spark Application Structure
[Diagram] Training: data is loaded into Spark and the trained model is saved as files on a file system. Serving: a custom server loads the model files and serves requests from the client app.
19. 19
Redis Modules
• Any C/C++ program can now run on Redis
• Use existing or add new data-structures
• Enjoy simplicity, infinite scalability and high availability while keeping the native speed of Redis
• Can be created by anyone
[Diagram labels] New Capabilities • New Commands • New Data Types
20. 20
Redis-ML: Predictive Model Serving Engine
• Predictive models as native Redis types
• Perform evaluation directly in Redis
• Store training output as “hot model”
[Diagram] Training: data is loaded into Spark (or any training platform) and the trained model is saved into Redis-ML. Serving: client apps run evaluations directly against Redis-ML.
21. 21
Redis ML Module
• Tree ensembles
• Linear regression
• Logistic regression
• Matrix + vector operations
• More to come...
22. 22
Random Forest Model
• A collection of decision trees
• Supports classification & regression
• Splitter Node can be:
◦ Categorical (e.g. day == “Sunday”)
◦ Numerical (e.g. age < 43)
• The final decision is taken by a majority vote of the decision trees
23. 23
Classic Tree Problem: Titanic Survival
[Decision tree] Sex = Male? No → Survived. Yes → Age < 9.5? No → Died. Yes → Sibps > 2.5? Yes → Died; No → Survived.
• Passenger Data encoded as feature vectors
• ML Algorithm learns the tree rules
• ID3, CART (RPART), etc.
• Tree rules used to infer results
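The tree pictured above can be sketched as a plain function. This is a toy reconstruction of the slide's rules (the thresholds follow the well-known Titanic decision-tree example; the exact leaf outcomes are read off the garbled diagram, so treat them as illustrative):

```python
def titanic_tree(sex, age, sibsp):
    """Toy decision tree with rules like those learned by ID3/CART."""
    if sex != "male":
        return "Survived"   # most female passengers survived
    if age >= 9.5:
        return "Died"       # most adult male passengers died
    if sibsp > 2.5:
        return "Died"       # young boys travelling with many siblings
    return "Survived"       # young boys from small families

print(titanic_tree("female", 30, 0))  # Survived
print(titanic_tree("male", 6, 3))     # Died
```

Inference walks from the root splitter down to a leaf, answering one question per level.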
24. 24
Titanic Survival: Random Forest
[Diagram] Three trees, each voting Survived or Died:
• Tree #1: splits on Sex = Male?, Age < 9.5?, Sibps > 2.5?
• Tree #2: splits on Country = US?, State = CA?, Height > 1.60m?
• Tree #3: splits on Weight < 80kg?, I.Q < 100?, Eye color = blue?
25. 25
Who Would Survive the Titanic
John:
• Male, 34
• Married w/ 2 kids (Sibps=3)
• New York, USA
• 1.78m, 78kg
• IQ 110
• Blue eyes
Mathew:
• Male, 6
• 3 sisters (Sibps=3)
• New York, USA
• 1.06m, 22.7kg
• IQ 100
• Brown eyes
Let's use our forest to find out
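A toy sketch of the forest vote. Tree #1 follows the classic Titanic rules; the branch outcomes of trees #2 and #3 are illustrative guesses (the slide diagram does not survive transcription), so only the voting mechanism, not the specific leaves, should be taken literally:

```python
from collections import Counter

def tree1(p):  # classic Titanic rules
    if p["sex"] != "male":
        return "Survived"
    if p["age"] >= 9.5:
        return "Died"
    return "Died" if p["sibsp"] > 2.5 else "Survived"

def tree2(p):  # country / state / height splits; leaves are illustrative
    if p["country"] != "US":
        return "Died"
    if p["state"] == "CA":
        return "Survived"
    return "Survived" if p["height"] > 1.60 else "Died"

def tree3(p):  # weight / IQ / eye-color splits; leaves are illustrative
    if p["weight"] < 80:
        return "Survived"
    if p["iq"] < 100:
        return "Died"
    return "Survived" if p["eyes"] == "blue" else "Died"

def forest(p):
    # Random forest decision: majority vote over the trees.
    votes = Counter(t(p) for t in (tree1, tree2, tree3))
    return votes.most_common(1)[0][0]

john = dict(sex="male", age=34, sibsp=3, country="US", state="NY",
            height=1.78, weight=78, iq=110, eyes="blue")
mathew = dict(sex="male", age=6, sibsp=3, country="US", state="NY",
              height=1.06, weight=22.7, iq=100, eyes="brown")
print(forest(john), forest(mathew))
```

With these toy trees John gets two Survived votes against one Died, while Mathew gets two Died votes; the point is that no single tree decides alone.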
26. 26
Redis: Forest Data Type
Add nodes to a tree in a forest:
ML.FOREST.ADD <forestId> <treeId> <path>
  [[NUMERIC|CATEGORIC] <splitterAttr> <splitterVal>] | [LEAF] <predVal>
Perform classification/regression of a feature vector:
ML.FOREST.RUN <forestId> <features> [CLASSIFICATION|REGRESSION]
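A minimal in-memory sketch of what these two commands maintain. The real module is a native C data type inside Redis; the path encoding ("." for the root, then "l"/"r" per level) and the dict-based feature vector here are assumptions for illustration only:

```python
# Toy model of ML.FOREST.ADD / ML.FOREST.RUN semantics.
forests = {}

def forest_add(forest_id, tree_id, path, node):
    """Add a splitter or leaf node at the given path of a tree."""
    forests.setdefault(forest_id, {}).setdefault(tree_id, {})[path] = node

def run_tree(tree, features):
    """Walk from the root to a leaf, branching left/right at each splitter."""
    path = "."
    while True:
        node = tree[path]
        if node[0] == "LEAF":
            return node[1]
        kind, attr, val = node
        if kind == "NUMERIC":
            branch = "l" if features[attr] < val else "r"
        else:  # CATEGORIC
            branch = "l" if features[attr] == val else "r"
        path += branch

def forest_run(forest_id, features):
    """Classification: majority vote over all trees in the forest."""
    votes = [run_tree(t, features) for t in forests[forest_id].values()]
    return max(set(votes), key=votes.count)

# Single-tree Titanic example.
forest_add("titanic", 0, ".",   ("CATEGORIC", "sex", "male"))
forest_add("titanic", 0, ".r",  ("LEAF", "Survived"))   # not male
forest_add("titanic", 0, ".l",  ("NUMERIC", "age", 9.5))
forest_add("titanic", 0, ".ll", ("LEAF", "Survived"))   # male, age < 9.5
forest_add("titanic", 0, ".lr", ("LEAF", "Died"))       # male, age >= 9.5
print(forest_run("titanic", {"sex": "male", "age": 6}))  # Survived
```

Storing the model this way is what lets Redis evaluate it in place: the client sends one ML.FOREST.RUN command instead of pulling the whole model over the network.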
27. 27
Real World Challenge
• Ad serving company
• Need to serve 20,000 ads/sec @ 50msec data-center latency
• Runs 1K campaigns → 1K random forests
• Each forest has 15K trees
• On average each tree has 7 levels (depth)
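Back-of-the-envelope load implied by those numbers (my arithmetic, not from the slide):

```python
ads_per_sec = 20_000        # required serving rate
trees_per_forest = 15_000   # trees in each campaign's forest
avg_depth = 7               # average splitter nodes walked per tree

# Each ad classification walks every tree of one forest from root to leaf.
node_visits_per_ad = trees_per_forest * avg_depth        # 105,000
node_visits_per_sec = ads_per_sec * node_visits_per_ad
print(node_visits_per_sec)  # 2100000000
```

Roughly 2.1 billion node evaluations per second, which is why keeping the models next to the data in memory matters.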
28. 28
Ad Serving costs: Homegrown v. Redis
• Homegrown: 1,247 x c4.8xlarge
• Redis: 35 x c4.8xlarge
Cut computing infrastructure by 97%
29. 29
Redis ML with Spark ML
Random Forest; 1,000 forests @ 15,000 trees
Classification time vs. Spark: 13x faster
33. 33
Step 1: Get The Data
Download and extract the MovieLens 100K Dataset
The data is organized in separate files:
• Ratings: user id | item id | rating (1-5) | timestamp
• Item (movie) info: movie id | genre info fields (1/0)
• User info: user id | age | gender | occupation
Our classifier should return the expected rating (from 1 to 5) a user would give the movie in question
34. 34
Step 2: Transform
The training data for each movie should contain 1 line per user:
• class (rating from 1 to 5 the user gave to this movie)
• user info (age, gender, occupation)
• user ratings of other movies (movie_id:rating ...)
• user genre rating averages (genre:avg_score ...)
Run gen_data.py to transform the files to the desired format
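A hedged sketch of that transform (gen_data.py itself is not shown in the deck; the function name, arguments, and exact output layout here are assumptions):

```python
def training_line(target_movie, user, ratings, movie_genres):
    """Build one training line for one user and one target movie:
    class label, user info, other-movie ratings, per-genre averages."""
    label = ratings[target_movie]                 # rating 1-5 -> class
    info = [user["age"], user["gender"], user["occupation"]]
    others = [f"{m}:{r}" for m, r in sorted(ratings.items())
              if m != target_movie]
    # Average the user's ratings per genre over the other movies.
    sums, counts = {}, {}
    for m, r in ratings.items():
        if m == target_movie:
            continue
        for g in movie_genres[m]:
            sums[g] = sums.get(g, 0) + r
            counts[g] = counts.get(g, 0) + 1
    genres = [f"{g}:{sums[g] / counts[g]:.2f}" for g in sorted(sums)]
    return " ".join(map(str, [label] + info + others + genres))

user = {"age": 34, "gender": "M", "occupation": "engineer"}
ratings = {10: 4, 20: 5, 30: 3}
movie_genres = {10: ["drama"], 20: ["drama", "comedy"], 30: ["comedy"]}
print(training_line(10, user, ratings, movie_genres))
# 4 34 M engineer 20:5 30:3 comedy:4.00 drama:5.00
```

One such line per user, per movie, is what the random forest trains on.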
35. 35
Step 3: Train and Load to Redis
// Create a new forest instance
val rf = new RandomForestClassifier()
  .setFeatureSubsetStrategy("auto")
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setNumTrees(500)
...
// Train model
val model = pipeline.fit(trainingData)
...
val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]
// Load the model to Redis
val f = new Forest(rfModel.trees)
f.loadToRedis("movie-10", "127.0.0.1")
36. 36
Step 4: Execute inference in Redis
[Diagram] Spark handles training; the trained model is loaded into Redis-ML, and the client app runs inference against Redis.
37. 37
Summary
• Train with Spark, serve with Redis
• 97% lower serving infrastructure cost
• Simplified ML lifecycle
• Redise (Cloud or Pack):
‒ Scaling, HA, performance
‒ PAYG: cost optimized
‒ Ease of use
‒ Supported by the teams who created Spark and Redis
38. 38
Where to Find Me
@tague
https://github.com/tague
tague@redislabs.com