Home of Redis
Serving Predictive Models with Redis
Tague Griffith
Head of Developer Advocacy
Topics
• Introductions
• Why Machine Learning
• What is Apache Spark
• Redis-ML
Introductions
Who I am
• Head of Developer Advocacy for Redis Labs
• Developer and architect turned Evangelist
• Infrastructure and Distributed Systems
• Large Scale Redis Systems
• Former: Apple, Netscape, Yahoo/Flickr, GoPro
• Focus on the Open Source Community
• Education and Support
• Nurture and grow the entire community
Redis Labs – Home of Redis
Founded in 2011
HQ in Mountain View CA, R&D center in Tel-Aviv IL
The commercial company behind Open Source Redis
Provider of the Redis Enterprise (Redisᵉ) technology, platform and products
Redis Labs Products
SERVICES
• Redisᵉ Cloud: Fully managed Redisᵉ service on hosted servers within AWS, MS Azure, GCP, IBM Softlayer, Heroku, CF & OpenShift
• Redisᵉ Cloud Private: Fully managed Redisᵉ service in VPCs within AWS, MS Azure, GCP & IBM Softlayer
SOFTWARE
• Redisᵉ Pack: Downloadable Redisᵉ software for any enterprise datacenter or cloud environment
• Redisᵉ Pack Managed: Fully managed Redisᵉ Pack in private data centers
REmote DIctionary Server
• Strings
• Hashes
• Lists
• Sets
• Sorted Sets
• Bitmaps
• HyperLogLogs
• Geospatial
• Bitfield
A Quick Recap of Redis
Each key maps to a value of one of these types:
• Strings / Bitmaps / BitFields: "I'm a Plain Text String!"
• Hash Tables (objects!): { A: “foo”, B: “bar”, C: “baz” }
• Linked Lists: [ A → B → C → D → E ]
• Sets: { A , B , C , D , E }
• Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
• Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
• HyperLogLog: 00110101 11001110 10101010
Redis Main Differentiations
• Simplicity (through Data Structures)
• Extensibility (through Redis Modules)
• Performance
[Figure: the Redis data structures: Strings, Hashes, Lists, Sets, Sorted Sets, Bitmaps, Bit field, HyperLogLogs, Geospatial Indexes]
Why Machine Learning
Teaching a computer by example to learn an
algorithm that is too complex to program
Machine Learning Problems
Classification (Pick One of a Set)
• Problems: Spam Detection, Manufacturing defect detection, Handwriting analysis
• Algorithms: Decision Trees/Forests, Naïve Bayes
Regression (Score or Rank)
• Problems: Recommendations, Likelihood of Purchase
• Algorithms: Linear Regression, Logistic Regression
Clustering (Group Similar)
• Problems: Find Similar Items, Customer segmentation, Cohort detection
• Algorithms: K-Means, Hierarchical Clustering
Supervised Learning – Training Spam Classifier
[Figure: training examples, each labeled Mail or Spam, fed to the classifier]
Deploying a Spam Classifier
[Figure: the trained classifier labels incoming messages as Mail or Spam]
How do we Build these Boxes?
¯_(ツ)_/¯
Typical Spark Application Structure
• Spark Training: Data is loaded into Spark
• File System: Model is saved in files
• Custom Server: Model is loaded to your custom app
• Serving Client: Client App
Building high-performance, reliable services is hard. Isn't there something we can deploy?
Redis-ML
Redis Modules
• Any C/C++ program can now run on Redis
• Use existing or add new data-structures
• Enjoy simplicity, infinite scalability and high availability while keeping the native speed of Redis
• Can be created by anyone
• Modules add: New Capabilities, New Commands, New Data Types
Redis-ML: Predictive Model Serving Engine
• Predictive models as native Redis types
• Perform evaluation directly in Redis
• Store training output as “hot model”
[Figure: data is loaded into Spark (or any training platform) for training, the model is saved in Redis-ML, and client apps are served from Redis-ML]
Redis ML Module
A Redis Module providing:
• Tree Ensembles
• Linear Regression
• Logistic Regression
• Matrix + Vector Operations
• More to come...
Random Forest Model
• A collection of decision trees
• Supports classification & regression
• Splitter Node can be:
◦ Categorical (e.g. day == “Sunday”)
◦ Numerical (e.g. age < 43)
• Decision is taken by a majority vote of the decision trees
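The bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the Redis-ML implementation: the class names (`Leaf`, `Split`), the helper names (`run_tree`, `run_forest`), and the three example trees are all hypothetical, and only the two splitter kinds from the slide (numerical `<` and categorical `==`) are modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str  # class label returned when this leaf is reached


@dataclass
class Split:
    attr: str                 # feature name tested at this node
    kind: str                 # "numeric" (attr < value) or "categorical" (attr == value)
    value: Union[float, str]  # threshold or category to compare against
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def run_tree(node, features):
    """Walk one tree from the root until a leaf is reached."""
    while isinstance(node, Split):
        if node.kind == "numeric":
            passed = features[node.attr] < node.value
        else:
            passed = features[node.attr] == node.value
        node = node.if_true if passed else node.if_false
    return node.prediction


def run_forest(trees, features):
    """Classify by majority vote over all trees."""
    votes = Counter(run_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical single-split trees built from the slide's two splitter examples.
tree_a = Split("age", "numeric", 43, Leaf("yes"), Leaf("no"))
tree_b = Split("day", "categorical", "Sunday", Leaf("yes"), Leaf("no"))
tree_c = Split("age", "numeric", 30, Leaf("yes"), Leaf("no"))

sample = {"age": 35, "day": "Sunday"}
print(run_forest([tree_a, tree_b, tree_c], sample))
```

Two of the three trees vote "yes" for this sample, so the forest returns "yes". A regression forest would typically average the leaf values instead of counting votes.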
Classic Tree Problem: Titanic Survival
[Figure: decision tree with splitter nodes "Sex = Male?", "Age < 9.5?" and "Sibps > 2.5?", YES/NO branches, and "Survived" / "Died" leaves]
• Passenger Data encoded as feature vectors
• ML Algorithm learns the tree rules
◦ ID3, CART (RPART), etc.
• Tree rules used to infer results
Titanic Survival: Random Forest
[Figure: three decision trees, each with YES/NO branches and "Survived" / "Died" leaves]
• Tree #1 splitters: Sex = Male? / Age < 9.5? / *Sibps > 2.5?
• Tree #2 splitters: Country = US? / State = CA? / Height > 1.60m?
• Tree #3 splitters: Weight < 80kg? / I.Q < 100? / Eye color = blue?
Who Would Survive the Titanic
John:
• Male, 34
• Married w/ 2 kids (Sibps=3)
• New York, USA
• 1.78m, 78kg
• 110 IQ
• Blue eyes
Mathew:
• Male, 6
• 3 Sisters (Sibps=3)
• New York, USA
• 1.06m, 22.7kg
• 100 IQ
• Brown eyes
Let's use our forest to find out
Redis: Forest Data Type
Add nodes to a tree in a forest:
ML.FOREST.ADD <forestId> <treeId> <path> [ [NUMERIC|CATEGORIC] <splitterAttr> <splitterVal> ] | [LEAF] <predVal>
Perform classification/regression of a feature vector:
ML.FOREST.RUN <forestId> <features> [CLASSIFICATION|REGRESSION]
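To make the two command shapes concrete, here is a hedged Python sketch that only builds the argument lists and does not talk to a server. Several details are assumptions that the slide does not state and should be checked against the Redis-ML module documentation: the path notation (`.` for the root, `.l` and `.r` for children), the `attr:value,attr:value` feature string, and the leaf values `1` and `0`. The helper names `forest_add_commands` and `forest_run_command` are hypothetical.

```python
def forest_add_commands(forest_id, tree_id, nodes):
    """Build one ML.FOREST.ADD argument list per node.

    nodes: iterable of (path, "NUMERIC"|"CATEGORIC", attr, value)
           or (path, "LEAF", prediction) tuples.
    """
    commands = []
    for path, kind, *params in nodes:
        args = ["ML.FOREST.ADD", forest_id, str(tree_id), path, kind]
        if kind == "LEAF":
            (prediction,) = params
            args.append(str(prediction))
        elif kind in ("NUMERIC", "CATEGORIC"):
            attr, value = params
            args += [attr, str(value)]
        else:
            raise ValueError(f"unknown node kind: {kind}")
        commands.append(args)
    return commands


def forest_run_command(forest_id, features, mode="CLASSIFICATION"):
    """Build the ML.FOREST.RUN argument list for one feature vector."""
    # Assumed feature encoding: comma-separated attr:value pairs.
    feature_str = ",".join(f"{attr}:{value}" for attr, value in features.items())
    return ["ML.FOREST.RUN", forest_id, feature_str, mode]


# A hypothetical one-split tree using the "Age < 9.5?" splitter from the slides.
nodes = [
    (".", "NUMERIC", "age", 9.5),
    (".l", "LEAF", 1),
    (".r", "LEAF", 0),
]
for args in forest_add_commands("titanic", 0, nodes):
    print(" ".join(args))
print(" ".join(forest_run_command("titanic", {"age": 6})))
```

With a client such as redis-py, each argument list could then be sent with `execute_command(*args)`, provided the Redis-ML module is loaded on the server.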
Real World Challenge
• Ad serving company
• Need to serve 20,000 ads/sec @ 50msec data-center latency
• Runs 1K campaigns → 1K random forests
• Each forest has 15K trees
• On average each tree has 7 levels (depth)
Ad Serving Costs: Homegrown v. Redis
• Homegrown: 1,247 x c4.8xlarge
• Redis: 35 x c4.8xlarge
• Cut computing infrastructure by 97%
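The 97% figure follows directly from the two instance counts, and the workload numbers from the ad-serving challenge give a rough sense of scale. The short Python check below is only arithmetic on the slide's numbers. The per-second estimate rests on two assumptions that are not in the deck: each ad request evaluates exactly one forest, and each tree costs one comparison per level. The function names are hypothetical.

```python
def reduction_fraction(before, after):
    """Fraction of instances removed when going from `before` to `after`."""
    return 1 - after / before


def comparisons_per_second(requests_per_sec, trees_per_forest, levels_per_tree):
    """Rough upper bound on node comparisons per second.

    Assumes one forest evaluation per request and one comparison per tree level.
    """
    return requests_per_sec * trees_per_forest * levels_per_tree


# Instance counts from the cost slide: 1,247 homegrown vs. 35 with Redis.
saving = reduction_fraction(1247, 35)
print(f"infrastructure reduction: {saving:.1%}")

# Workload from the challenge slide: 20,000 ads/sec, 15K trees, depth 7.
print(f"comparisons per second: {comparisons_per_second(20_000, 15_000, 7):,}")
```

Under those assumptions a single forest evaluation touches about 105,000 nodes, which is why per-node evaluation cost dominates the serving bill.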
Redis ML with Spark ML
Random Forest; 1,000 forests @ 15,000 trees
Classification Time Over Spark
13x Faster
Movie Classification Demo
The Tools
[Figure: logos of the tools used for each stage: Transform, Train, Classify, Containers]
Inference Model
Step 1: Get The Data
Download and extract the MovieLens 100K Dataset
The data is organized in separate files:
• Ratings: user id | item id | rating (1-5) | timestamp
• Item (movie) info: movie id | genre info fields (1/0)
• User info: user id | age | gender | occupation
Our classifier should return the expected rating (from 1 to 5) a user would give the movie in question
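A minimal Python sketch of reading these records follows. The field order matches the slide. In the MovieLens 100K distribution the ratings file is tab-separated while the user and item files are pipe-separated, and user records carry a trailing zip code that is ignored here. The sample lines are made up for illustration, and the type and function names (`Rating`, `User`, `parse_rating`, `parse_user`) are hypothetical.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int      # 1 to 5
    timestamp: int


class User(NamedTuple):
    user_id: int
    age: int
    gender: str
    occupation: str


def parse_rating(line, sep="\t"):
    """Parse one 'user id, item id, rating, timestamp' record."""
    user_id, item_id, rating, timestamp = line.strip().split(sep)
    return Rating(int(user_id), int(item_id), int(rating), int(timestamp))


def parse_user(line, sep="|"):
    """Parse one 'user id | age | gender | occupation' record, ignoring extra fields."""
    user_id, age, gender, occupation = line.strip().split(sep)[:4]
    return User(int(user_id), int(age), gender, occupation)


# Made-up sample records in the expected layout.
print(parse_rating("7\t10\t4\t880000000\n"))
print(parse_user("7|29|F|engineer|00000\n"))
```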
Step 2: Transform
The training data for each movie should contain 1 line per user:
• class (rating from 1 to 5 the user gave to this movie)
• user info (age, gender, occupation)
• user ratings of other movies (movie_id:rating ...)
• user genre rating averages (genre:avg_score ...)
Run gen_data.py to transform the files to the desired format
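The deck does not show gen_data.py itself, so the following Python sketch is only a guess at what one transformed line could look like. The space-separated layout, the two-decimal averages, and the function names are all hypothetical. One deliberate design choice: the target movie is excluded from the "other movies" and genre-average features so the label does not leak into the inputs.

```python
def genre_averages(ratings, movie_genres):
    """Average rating per genre.

    ratings: {movie_id: rating}; movie_genres: {movie_id: [genre, ...]}.
    """
    totals, counts = {}, {}
    for movie_id, rating in ratings.items():
        for genre in movie_genres.get(movie_id, []):
            totals[genre] = totals.get(genre, 0) + rating
            counts[genre] = counts.get(genre, 0) + 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


def training_line(target_movie, user, ratings, movie_genres):
    """Build one line: class, user info, other ratings, genre averages."""
    others = {m: r for m, r in ratings.items() if m != target_movie}
    fields = [
        str(ratings[target_movie]),  # class: rating given to the target movie
        str(user["age"]),
        user["gender"],
        user["occupation"],
    ]
    fields += [f"{movie}:{rating}" for movie, rating in sorted(others.items())]
    averages = genre_averages(others, movie_genres)
    fields += [f"{genre}:{avg:.2f}" for genre, avg in sorted(averages.items())]
    return " ".join(fields)


# Made-up user, ratings and genres for illustration.
user = {"age": 29, "gender": "F", "occupation": "engineer"}
ratings = {10: 4, 20: 5, 30: 3}
movie_genres = {10: ["Comedy"], 20: ["Comedy", "Drama"], 30: ["Drama"]}
print(training_line(10, user, ratings, movie_genres))
```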
Step 3: Train and Load to Redis
// Create a new forest instance
val rf = new RandomForestClassifier()
  .setFeatureSubsetStrategy("auto")
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setNumTrees(500)
…
// Train model
val model = pipeline.fit(trainingData)
…
val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]
// Load the model to Redis
val f = new Forest(rfModel.trees)
f.loadToRedis("movie-10", "127.0.0.1")
Step 4: Execute inference in Redis
[Figure: Spark Training + Redis-ML, with the Client App running inference against Redis-ML]
Summary
• Train with Spark, Serve with Redis
• 97% lower resource cost for serving
• Simplify ML lifecycle
• Redisᵉ (Cloud or Pack):
‒Scaling, HA, Performance
‒PAYG – cost optimized
‒Ease of use
‒Supported by the teams who created Spark and Redis
Where to Find Me
@tague
https://github.com/tague
tague@redislabs.com
