Streaming Architecture including Rendezvous for Machine Learning

Ted Dunning, Software Engineer at MapR Technologies
© 2017 MapR Technologies 1
Why Stream? and Machine Learning Logistics
© 2017 MapR Technologies 2
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Committer, PMC member, board member, ASF
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
© 2017 MapR Technologies 3
Traditional Solution – Use a Profile Database
[Diagram labels: POS 1..n; Fraud detector; Last card use]
© 2017 MapR Technologies 4
What Happens as You Scale Up?
[Diagram labels: three copies of POS 1..n and Fraud detector; one Last card use]
© 2017 MapR Technologies 5
Shared Database Can Be A Problem
[Diagram labels: three copies of POS 1..n and Fraud detector; one Last card use]
Shared database causes problems
Big problem is disagreement about schema and indexing
© 2017 MapR Technologies 6
Alternative: Use a Stream to Isolate Services
[Diagram labels: POS 1..n; Fraud detector; card activity; Updater; Last card use]
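A minimal sketch of the isolation idea on this slide, assuming an in-memory append-only list stands in for the card activity stream. The names `CardEvent`, `Stream`, `Updater`, `check_and_publish` and the 60-second rule are hypothetical illustrations, not MapR APIs or the speaker's fraud logic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CardEvent:
    card_id: str
    location: str
    timestamp: float  # seconds


class Stream:
    """Append-only log; each consumer keeps its own read offset."""

    def __init__(self) -> None:
        self._events: List[CardEvent] = []

    def append(self, event: CardEvent) -> None:
        self._events.append(event)

    def read_from(self, offset: int) -> List[CardEvent]:
        return self._events[offset:]


class Updater:
    """Consumes card activity and maintains its own last-card-use table."""

    def __init__(self, stream: Stream) -> None:
        self.stream = stream
        self.offset = 0
        self.last_use: Dict[str, CardEvent] = {}

    def poll(self) -> None:
        for event in self.stream.read_from(self.offset):
            self.last_use[event.card_id] = event
            self.offset += 1


def check_and_publish(event: CardEvent, last_use: Dict[str, CardEvent],
                      stream: Stream, window_s: float = 60.0) -> bool:
    """Fraud detector: read the table, never write it; publish to the stream."""
    previous = last_use.get(event.card_id)
    suspicious = (previous is not None
                  and previous.location != event.location
                  and event.timestamp - previous.timestamp < window_s)
    stream.append(event)
    return suspicious
```

The detector only reads the table and only writes to the stream, so a second consumer with a different table layout can be attached without touching either the detector or the updater.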
© 2017 MapR Technologies 7
Add New Services via the Stream
[Diagram labels: POS 1..n; Fraud detector; card activity; Updater; Last card use; Card location history; Other]
© 2017 MapR Technologies 8
Changing Implementation Through Isolation
[Diagram labels: two copies each of POS 1..n, Last card use, Updater, and Fraud detector; one card activity]
© 2017 MapR Technologies 9
With MapR, Geo-Distributed Data Appears Local
[Diagram labels: Data source; stream; stream; Consumer; Global Data Center; Regional Data Center]
© 2017 MapR Technologies 13
Use Case: Telecommunications
[Diagram labels: Callers; Towers; CDR data]
© 2017 MapR Technologies 14
Streaming in Telecom
• Data collection & handling happens at different levels (tower, local data
center, central data center)
• Batch: Can take 30 minutes per level
• Streaming: Latency drops to seconds or sub-seconds per level
• Ability to respond as events occur
• MapR Streams enables stream replication with offsets across data
centers
© 2017 MapR Technologies 15
Unique to MapR: Manage Topics at Stream Level
• Many more topics on MapR cluster
• Topics are grouped together in Stream (different from Kafka)
• Policies set at the Stream level such as time-to-live, ACEs (controlled
access at this level is different than Kafka)
• Geo-distributed stream replication (different from Kafka)
[Diagram labels: Stream; Topic 1; Topic 2; Topic 3]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of O’Reilly book Streaming Architecture used with permission
© 2017 MapR Technologies 16
Use Case: Each pump has many sensors
[Diagram labels: pump data; topic = p1, p2, p3, p4, p5; C2; Dashboard]
© 2017 MapR Technologies 17
Use topics as an organizing principle
© 2017 MapR Technologies 18
Example
[Diagram labels: Cluster; Volume mount point; Directories; Files; Table; Streams]
© 2017 MapR Technologies 20
Streams should be integrated tightly into normal persistence
© 2017 MapR Technologies 21
Stream vs Database
• Streams can be better for flexibility and multi-tenancy
• Streams can be 50–100x faster than a database (no mutation)
• Faster means fewer arguments about performance optimization
• Operations are simpler, so sharing data works better
• Don’t have to commit to one type of database: push updates through a
stream and let each group use the database they want
© 2017 MapR Technologies 22
Collect Data
[Diagram labels: data center; web server (x2); Web-server Log (x2); log-stash (x2); log consolidator; log_events]
© 2017 MapR Technologies 23
And Transport to Global Analytics
[Diagram labels: data center with web server (x2), Web-server Log (x2), log-stash (x2), log consolidator, log_events; GHQ with log_events, Elaborate events (log-stash), events, Aggregate, Signal detection]
© 2017 MapR Technologies 24
With Many Sources
[Diagram labels, built up over three slides: one, then two, then three data centers, each with web server (x2), Web-server Log (x2), log-stash (x2), log consolidator, log_events; GHQ with log_events, Elaborate events (log-stash), events, Aggregate, Signal detection]
© 2017 MapR Technologies 27
Analytics Doesn’t Care About Location
[Diagram labels: GHQ; log_events; Elaborate events (log-stash); events; Aggregate; Signal detection]
Topic: data-center . machine . sensor
Topic: data-center . * . sensor
Topic: data-center . machine . *
Topic: * . * . sensor
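The wildcard topic names above can be read as patterns over a three-part name. The sketch below is one way to interpret them, assuming "*" stands for exactly one dot-separated segment. The helpers `pattern_to_regex` and `matching_topics` and the example topic names are hypothetical and do not use any MapR or Kafka API.

```python
import re
from typing import Iterable, List


def pattern_to_regex(pattern: str):
    """Compile 'dc . * . sensor' style patterns; '*' matches one segment."""
    parts = [p.strip() for p in pattern.split(".")]
    regex_parts = ["[^.]+" if p == "*" else re.escape(p) for p in parts]
    return re.compile(r"\.".join(regex_parts))


def matching_topics(pattern: str, topics: Iterable[str]) -> List[str]:
    """Return the topics whose full name matches the pattern, in input order."""
    regex = pattern_to_regex(pattern)
    return [t for t in topics if regex.fullmatch(t)]


# Hypothetical topic names following data-center.machine.sensor
TOPICS = ["dc1.m1.temp", "dc1.m2.temp", "dc2.m1.temp", "dc1.m1.pressure"]
one_sensor_everywhere = matching_topics("* . *. temp", TOPICS)
one_machine_all_sensors = matching_topics("dc1 . m1 . *", TOPICS)
```

With a naming scheme like this, an analytics consumer selects data by what it is (sensor, machine) rather than by where it was produced.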
© 2017 MapR Technologies 32
Act locally, learn globally
© 2017 MapR Technologies 33
Machine Learning Logistics
© 2017 MapR Technologies 34
Traditional View
© 2017 MapR Technologies 35
Traditional View: This isn’t the whole story
© 2017 MapR Technologies 36
90% of the effort in successful machine
learning isn’t in the training or model dev…
It’s the logistics
© 2017 MapR Technologies 37
Why?
• Just getting the training data is hard
– Which data? How to make it accessible? Multiple sources!
– New kinds of observations force restarts
– Requires a ton of domain knowledge
• The myth of the unitary model
– You can’t train just one
– You will have dozens of models, likely hundreds or more
– Handoff to new versions is tricky
– You need run time in production to be sure which model is better

© 2017 MapR Technologies 38
What Machine Learning Tool is Best?
• Most successful groups keep several “favorite” machine
learning tools at hand
– No single tool is best in every situation
• The most important tool is a platform that supports logistics well
– Don’t have to do everything at the application level
– Lots of what matters can be handled at the platform level
• A good design for the logistics can make a big difference
© 2017 MapR Technologies 39
Some Gotchas
• Ops-oriented people will not “get it” regarding modeling
subtleties
• Data scientists will not “get it” regarding operational realities
• Therefore, modelers have to deliver self-contained models
• And, ops has to provide pre-wired structure
© 2017 MapR Technologies 40
Rendezvous Architecture
[Diagram labels: request; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response]
© 2017 MapR Technologies 41
Rendezvous to the Rescue: Better ML Logistics
• Stream-first architecture is a powerful approach with surprisingly
widespread advantages
– Innovative technologies are emerging for streaming data
• Microservices approach provides flexibility
– Streaming supports microservices (if done right)
• Containers remove surprises
– Predictable environment for running models
© 2017 MapR Technologies 42
Rendezvous: Mainly for Decisioning Engines
• Decisioning models
– Looking for a “right answer”
– Simpler than reinforcement learning
• Examples include:
– Fraud detection
– Predictive analytics / market prediction
– Churn prediction (as in telecommunications)
– Yield optimization
– Deep learning in the form of speech or image recognition, in some cases
© 2017 MapR Technologies 43
What We Ultimately Want
[Diagram labels: request; Model; response]
© 2017 MapR Technologies 44
But This Isn’t The Answer
[Diagram labels: request; Load balancer; Model 1; Model 2; Model 3; response]
© 2017 MapR Technologies 45
First Try with Streams
[Diagram labels: request; Input; Model 1; Model 2; Model 3; ?; response]
© 2017 MapR Technologies 46
First Rendezvous
[Diagram labels: request; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response]
© 2017 MapR Technologies 47
Some Key Points
• Note that all models see identical inputs
• All models run in production setting
• All models send scores to same stream
• The rendezvous server decides which scores to ignore
• Roll forward, roll back, correlated comparison are all now trivial
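The key points above can be sketched in a few lines, assuming models are plain functions and a list stands in for the scores stream. The names `score_all` and `rendezvous` and the toy models are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[dict], float]


def score_all(request: dict, models: Dict[str, Model]) -> List[Tuple[str, float]]:
    """Every model sees the identical request; all scores land in one list."""
    return [(name, model(request)) for name, model in models.items()]


def rendezvous(scores: List[Tuple[str, float]], preferred: str) -> Tuple[str, float]:
    """Return the preferred model's score and ignore the others."""
    for name, score in scores:
        if name == preferred:
            return name, score
    raise LookupError(f"no score from {preferred!r}")


# Toy models, assumed for the example only.
MODELS: Dict[str, Model] = {
    "model_1": lambda r: r["amount"] / 1000,
    "model_2": lambda r: 0.5,
}
scores = score_all({"amount": 250}, MODELS)
current = rendezvous(scores, "model_1")
after_roll_forward = rendezvous(scores, "model_2")
```

Rolling forward or back amounts to changing the `preferred` name, and because `scores` holds every model's answer to the same request, correlated comparison needs no extra plumbing.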
© 2017 MapR Technologies 48
Reality Check, Injecting External State
[Diagram labels: The world; request; Raw; Add external data; Database; Input; Model 1; Model 2; Model 3]
© 2017 MapR Technologies 49
Recording Raw Data (as it really was)
[Diagram labels: Input; Decoy; Model 2; Model 3; Scores; Archive]
© 2017 MapR Technologies 50
Quality & Reproducibility of Input Data is Important!
• Recording raw-ish data is really a big deal
– Data as seen by a model is worth gold
– Data reconstructed later often has time-machine leaks
– Databases were made for updates, streams are safer
• Raw data is useful for non-ML cases as well (think flexibility)
• Decoy model records training data as seen by models under
development & evaluation
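A minimal sketch of a decoy, assuming requests are JSON-serializable dictionaries and an in-memory list stands in for the archive. The class `Decoy` and its methods are hypothetical names for illustration.

```python
import json
from typing import List


class Decoy:
    """Sits on the input stream like a model but only archives what it sees."""

    def __init__(self) -> None:
        self.archive: List[str] = []

    def score(self, request: dict) -> None:
        # Serialize at once so later changes to `request` cannot leak in.
        self.archive.append(json.dumps(request, sort_keys=True))

    def replay(self) -> List[dict]:
        """Return the archived inputs exactly as they were seen."""
        return [json.loads(line) for line in self.archive]


decoy = Decoy()
request = {"card": "c1", "amount": 12.5}
decoy.score(request)
request["amount"] = 99.0  # a later update, as a database row might receive
replayed = decoy.replay()
```

The replayed record still shows the amount the models saw at scoring time, which is the property that reconstruction from an updated database tends to lose.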
© 2017 MapR Technologies 51
Canary for Comparison
[Diagram labels: Input; Real model; Canary; Decoy; ∆; Result; Archive]
© 2017 MapR Technologies 52
What Does the Canary Do?
• The canary is a real model, but is very rarely updated
• The canary results are almost never used for decisioning
• The virtue of the canary is stability
• Comparing to the canary results gives insight into new models
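One way to picture the comparison is a per-request difference between a new model's scores and the canary's scores on the same inputs. The functions `score_deltas` and `mean_abs_delta` and the sample numbers are assumptions for illustration; the talk does not prescribe a particular statistic.

```python
from statistics import mean
from typing import List, Sequence


def score_deltas(new_scores: Sequence[float],
                 canary_scores: Sequence[float]) -> List[float]:
    """Per-request difference; both sequences must cover the same requests."""
    if len(new_scores) != len(canary_scores):
        raise ValueError("scores must be paired per request")
    return [n - c for n, c in zip(new_scores, canary_scores)]


def mean_abs_delta(new_scores: Sequence[float],
                   canary_scores: Sequence[float]) -> float:
    """Average size of the disagreement with the canary."""
    return mean(abs(d) for d in score_deltas(new_scores, canary_scores))


# Made-up scores for three requests.
canary = [0.1, 0.9, 0.5]
candidate = [0.1, 0.8, 0.7]
disagreement = mean_abs_delta(candidate, canary)
```

Because the canary rarely changes, a sudden jump in this number points at the new model or at the input data rather than at the reference.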
© 2017 MapR Technologies 53
Isolated Development With Stream Replication
[Diagram labels: Production: The world; request; Raw; Add external data; Input; Model 1; Model 2; Model 3; Internal 1; Internal 2; Internal 3. Development: Raw; New external data; Input; Model 4; Internal 4]
© 2017 MapR Technologies 54
A Quick Review
[Diagram labels: request; Proxy; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response]
• The Proxy Talks to the Outside World
• The Input Stream Feeds All Models Identically
• The Scores Stream Contains All Results
• The Rendezvous Picks A Result
• Results Return Via A Stream and Return Address
© 2017 MapR Technologies 60
Models in production live in the real world:
Conditions may (will) change
© 2017 MapR Technologies 61
Rendezvous Schedules
• The key idea of rendezvous schedules is to define the trade-off
of latency versus model priority
– At short delays, we want the best
– At moderate delays we will compromise a bit
– Near the deadline, we will take any answer at all
• Normally the same rendezvous schedules apply to all
transactions
– Overriding default schedule has bona fide uses
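A rendezvous schedule can be sketched as a list of tiers, each widening the set of acceptable models as time passes. The tier times, the model names, and the function `pick` are hypothetical; this is an illustration under those assumptions, not the rendezvous server's actual implementation.

```python
from typing import Dict, List, Optional, Tuple

# Each tier: (wait_until_ms, acceptable models in priority order).
Schedule = List[Tuple[float, List[str]]]

DEFAULT_SCHEDULE: Schedule = [
    (10.0, ["champion"]),                                # short delay: only the best
    (25.0, ["champion", "challenger"]),                  # moderate delay: compromise
    (50.0, ["champion", "challenger", "fast_default"]),  # near deadline: anything
]


def pick(schedule: Schedule, arrivals: Dict[str, float],
         now_ms: float) -> Optional[str]:
    """Return the model whose score should be released at `now_ms`, if any.

    `arrivals` maps a model name to the elapsed time at which its score arrived.
    """
    for wait_until_ms, acceptable in schedule:
        for name in acceptable:
            arrived = arrivals.get(name)
            if arrived is not None and arrived <= now_ms:
                return name
        if now_ms < wait_until_ms:
            return None  # still inside this tier: keep waiting for a better score
    return None  # deadline passed with no acceptable score
```

Under this sketch, overriding the default schedule for one transaction is just a matter of passing a different `schedule` value.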
© 2017 MapR Technologies 62
Rendezvous Overrides
• Incoming transaction can carry an overriding schedule
– This is great for QA, to see output from a specific model
– Overriding the default schedule is also good for systemic A/B tests
• Overrides should be unusual
© 2017 MapR Technologies 63
Scaling Up
• More kinds of model
– multiple rendezvous frameworks for different tasks
• More throughput
– Fast default models
– Partition input stream to allow parallel model evaluation
– Input batching
• Extreme volumes require extreme measures
– Cannibalize fancy models to run more fast/simple models
– Speed before beauty
© 2017 MapR Technologies 64
Faster Throughput Through Failure
• Suppose we have one model that can handle 10,000 t/s @ 2ms
– But this isn’t the most accurate model. Not bad, but not best
• And our champion model can handle 1000 t/s @ 10ms
• Then imagine a burst of 2000 t/s for several minutes
• Champion can only evaluate half of all requests
– Should skip to keep up
– Fast model will cover for champion
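The arithmetic above can be checked with a small simulation. It assumes a single-threaded model whose service time is the inverse of its throughput (so 1000 t/s becomes 1 ms per request), which ignores the 10 ms latency figure; the function names and the zero-lag policy are hypothetical.

```python
from typing import List


def champion_coverage(arrival_rate: float, champion_rate: float) -> float:
    """Fraction of requests the champion can score if it skips to keep up."""
    return min(1.0, champion_rate / arrival_rate)


def requests_scored(arrival_times_ms: List[float], service_ms: float,
                    max_lag_ms: float) -> List[int]:
    """Indices of requests scored by a model that drops stale requests."""
    scored: List[int] = []
    free_at = 0.0
    for i, arrived in enumerate(arrival_times_ms):
        start = max(free_at, arrived)
        if start - arrived > max_lag_ms:
            continue  # too stale: discard so the model can catch up
        scored.append(i)
        free_at = start + service_ms
    return scored


# Burst of 2000 t/s: one request every 0.5 ms, 1000 requests in total.
burst = [i * 0.5 for i in range(1000)]
champion_scored = requests_scored(burst, service_ms=1.0, max_lag_ms=0.0)
```

The champion ends up scoring every other request, and the fast model, which handles the full rate, supplies the answer for the rest.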
© 2017 MapR Technologies 65
[Diagram sequence over three slides, labels: Input; Model 1; Model 2; Model 3; Scores]
© 2017 MapR Technologies 68
Always have a default or fallback model
Models that fall behind should discard requests to catch up
© 2017 MapR Technologies 69
Limitations of Rendezvous
• 100% speculative execution can be expensive
– Can be mitigated by partial speculation
– Or it may just be too expensive
• Minimum Viable Products should be minimal
– You may not require zero downtime … be realistic
• Context may be too large
• Latency limits may be too stringent
© 2017 MapR Technologies 70
Ad Targeting Example
[Diagram labels: Proxy; Pre-select; Sharded Ad Scoring; Detailed scoring; User Profile; Ads; steps 1, 2, 3]
User profile and context used for rough-cut selection of ads
Roughly 1000 ads are scored in detail for p(click)
© 2017 MapR Technologies 71
Why Not Full Rendezvous?
• 1000’s of ads / second x 1000 candidates = 1M scores / second
– AKA “a lot”
• Scoring a single model is expensive
• Sharding and replication provide a form of failure tolerance
• Full speculative execution across several options is prohibitive
• Latency guarantees can be very short (10 ms)
© 2017 MapR Technologies 72
Rendezvous-lite Options
• We have some options
• We can allow selective speculation on marked requests
– If only 1% of ads run speculative execution, we can pack 10x more
shards per node and use 10x fewer nodes
– Selective speculation doesn’t give redundancy
• We can release results if >80% of shards reply
• Temporary speculation during hand-offs is useful
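The ">80% of shards reply" option can be sketched as a simple quorum check. The function `merge_if_enough`, the shard numbering, and the score values are hypothetical; a real system would also have to handle timeouts and ranking, which are left out here.

```python
from typing import Dict, List, Optional


def merge_if_enough(replies: Dict[int, List[float]], total_shards: int,
                    min_fraction: float = 0.8) -> Optional[List[float]]:
    """Return merged scores once more than `min_fraction` of shards replied."""
    if len(replies) / total_shards <= min_fraction:
        return None  # not enough shards yet: keep waiting
    merged: List[float] = []
    for shard_id in sorted(replies):
        merged.extend(replies[shard_id])
    return merged


# Ten shards, each returning one made-up p(click) score.
eight_replies = {shard: [0.01 * shard] for shard in range(8)}
nine_replies = {shard: [0.01 * shard] for shard in range(9)}
too_early = merge_if_enough(eight_replies, total_shards=10)
released = merge_if_enough(nine_replies, total_shards=10)
```

Releasing on a partial reply trades a small loss in candidate coverage for protection against one slow or failed shard.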
© 2017 MapR Technologies 73
Let’s Review
© 2017 MapR Technologies 74
Not Such Bad Ideas
• Keep models running “in the wings”
– Don’t wait until conditions change to start building the next model
– Keep new short-history models ready to roll, some graybeards as well
• Hot hand-off
– With rendezvous: just stop ignoring the new best model
• Deploy a canary server
– Keep an old model active as a reference
– If it was 90% correct, difference with any better model should be small
– Score distribution should be roughly constant
© 2017 MapR Technologies 81
New book: how to manage machine learning models
Download free pdf or read free online via @MapR:
https://mapr.com/ebook/machine-learning-logistics/
and
“Rendezvous Architecture” by Ted Dunning & Ellen Friedman, in
Encyclopedia of Big Data Technologies. Sherif Sakr and Albert
Zomaya, editors. Springer International Publishing, in press 2018.
© 2017 MapR Technologies 82
Q&A
@mapr
tdunning@mapr.com
ENGAGE WITH US
@Ted_Dunning
Black and White Modern Science Presentation.pptxmaryamkhalid2916
14 views21 slides

Recently uploaded(20)

AMAZON PRODUCT RESEARCH.pdf by JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta15 views
Spesifikasi Lengkap ASUS Vivobook Go 14 by Dot Semarang
Spesifikasi Lengkap ASUS Vivobook Go 14Spesifikasi Lengkap ASUS Vivobook Go 14
Spesifikasi Lengkap ASUS Vivobook Go 14
Dot Semarang35 views
Black and White Modern Science Presentation.pptx by maryamkhalid2916
Black and White Modern Science Presentation.pptxBlack and White Modern Science Presentation.pptx
Black and White Modern Science Presentation.pptx
maryamkhalid291614 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2216 views
The details of description: Techniques, tips, and tangents on alternative tex... by BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada121 views
Data-centric AI and the convergence of data and model engineering: opportunit... by Paolo Missier
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
Paolo Missier34 views
Case Study Copenhagen Energy and Business Central.pdf by Aitana
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdf
Aitana12 views
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
1st parposal presentation.pptx by i238212
1st parposal presentation.pptx1st parposal presentation.pptx
1st parposal presentation.pptx
i2382129 views
PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi120 views

Streaming Architecture including Rendezvous for Machine Learning

  • 1. © 2017 MapR Technologies 1 Why Stream? and Machine Learning Logistics
  • 2. Contact Information
      • Ted Dunning, PhD
      • Chief Application Architect, MapR Technologies
      • Committer, PMC member, board member, ASF
      • O’Reilly author
      • Email: tdunning@mapr.com, tdunning@apache.org
      • Twitter: @Ted_Dunning
  • 3. Traditional Solution – Use a Profile Database (diagram labels: POS 1..n; Fraud detector; Last card use)
  • 4. What Happens as You Scale Up? (diagram labels: POS 1..n and Fraud detector, three times; Last card use, once)
  • 5. Shared Database Can Be A Problem (same diagram as slide 4)
      • Shared database causes problems
      • Big problem is disagreement about schema and indexing
  • 6. Alternative: Use a Stream to Isolate Services (diagram labels: POS 1..n; Fraud detector; Last card use; Updater; card activity)
  • 7. Add New Services via the Stream (diagram labels: POS 1..n; Fraud detector; Last card use; Updater; Card location history; Other; card activity)
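The isolation idea on slides 6 and 7 can be sketched with a toy append-only log. This is a minimal illustration, assuming an in-memory list in place of a real stream; the `Stream` class, the consumer names, and the event fields are hypothetical and are not the MapR Streams or Kafka API.

```python
from typing import Dict, List


class Stream:
    """A toy append-only log; each consumer keeps its own read offset."""

    def __init__(self) -> None:
        self.log: List[dict] = []
        self.offsets: Dict[str, int] = {}

    def publish(self, event: dict) -> None:
        self.log.append(event)

    def poll(self, consumer: str) -> List[dict]:
        """Return events this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        events = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return events


card_activity = Stream()
card_activity.publish({"card": "A", "city": "Oslo"})
card_activity.publish({"card": "B", "city": "Lima"})

first_batch = card_activity.poll("updater")  # the existing service
card_activity.publish({"card": "A", "city": "Rome"})

# A service added later replays the whole history without touching the updater.
history_batch = card_activity.poll("location_history")
second_batch = card_activity.poll("updater")
```

Because each consumer owns its offset, adding the hypothetical `location_history` service requires no change to the producer or to the updater.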
  • 8. Changing Implementation Through Isolation (diagram labels: POS 1..n, Last card use, and Updater, twice; card activity; Fraud detector, twice)
  • 9. Changing Implementation Through Isolation (same labels as slide 8)
  • 10. With MapR, Geo-Distributed Data Appears Local (diagram labels: stream; Data source; Consumer)
  • 11. With MapR, Geo-Distributed Data Appears Local (diagram labels: stream, twice; Data source; Consumer)
  • 12. With MapR, Geo-Distributed Data Appears Local (diagram labels: stream, twice; Data source; Consumer; Global Data Center; Regional Data Center)
  • 13. Use Case: Telecommunications (diagram labels: Callers; Towers; cdr data)
  • 14. Streaming in Telecom
      • Data collection & handling happens at different levels (tower, local data center, central data center)
      • Batch: Can take 30 minutes per level
      • Streaming: Latency drops to seconds or sub-seconds per level
      • Ability to respond as events occur
      • MapR Streams enables stream replication with offsets across data centers
  • 15. Unique to MapR: Manage Topics at Stream Level
      • Many more topics on a MapR cluster
      • Topics are grouped together in a Stream (different from Kafka)
      • Policies such as time-to-live and ACEs are set at the Stream level (controlled access at this level is different from Kafka)
      • Geo-distributed stream replication (different from Kafka)
      • (diagram labels: Stream; Topic 1; Topic 2; Topic 3. Image © 2016 Ted Dunning & Ellen Friedman, from Chap 5 of the O’Reilly book Streaming Architecture, used with permission)
  • 16. Use Case: Each pump has many sensors (diagram labels: pump data; Dashboard; C2; topic = p1 p2 p3 p4 p5 p1 p1 p5)
  • 17. Use topics as an organizing principle
  • 18. Example (diagram labels: Files; Table; Streams; Directories; Cluster; Volume; mount point)
  • 19. (diagram labels: Cluster; Volume; mount point)
  • 20. Streams should be integrated tightly into normal persistence
  • 21. Stream vs Database
      • Can be better for flexibility and multi-tenancy
      • Streams can be 50 – 100x faster than db (no mutation)
      • Faster means fewer arguments about performance optimization
      • Operations are simpler, so sharing data works better
      • Don’t have to commit to one type of db: push updates through stream and let each group use the db they want
  • 22. Collect Data (diagram labels: data center; web server; Web-server Log; log-stash; log consolidator; log_events)
  • 23. And Transport to Global Analytics (slide 22 labels plus: GHQ; log_events; events; Elaborate events (log-stash); Aggregate; Signal detection)
  • 24. With Many Sources (same labels as slide 23)
  • 25. With Many Sources (slide 23 labels, with a second data center)
  • 26. With Many Sources (slide 23 labels, with a third data center)
  • 27. Analytics Doesn’t Care About Location (diagram labels: GHQ; log_events; events; Elaborate events (log-stash); Aggregate; Signal detection)
  • 28. Analytics Doesn’t Care About Location (same diagram as slide 27). Topic: data-center.machine.sensor
  • 29. Analytics Doesn’t Care About Location (same diagram as slide 27). Topic: data-center.*.sensor
  • 30. Analytics Doesn’t Care About Location (same diagram as slide 27). Topic: data-center.machine.*
  • 31. Analytics Doesn’t Care About Location (same diagram as slide 27). Topic: *.*.sensor
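The topic patterns on slides 28 to 31 can be illustrated with a small matcher. This is a minimal sketch, assuming dot-separated topic names of the form data-center.machine.sensor and a `*` that matches exactly one field; the function `topic_matches` and the topic names are hypothetical and do not represent any real client API.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """True if a dot-separated topic matches a pattern where '*' matches any one field."""
    p_fields = [f.strip() for f in pattern.split(".")]
    t_fields = [f.strip() for f in topic.split(".")]
    if len(p_fields) != len(t_fields):
        return False
    return all(p == "*" or p == t for p, t in zip(p_fields, t_fields))


# Hypothetical topics named data-center.machine.sensor
topics = ["dc1.m1.temp", "dc1.m2.temp", "dc2.m1.temp", "dc1.m1.pressure"]

# One data center, every machine, one sensor type (data-center.*.sensor)
one_dc_temp = [t for t in topics if topic_matches("dc1.*.temp", t)]

# Every data center and machine, one sensor type (*.*.sensor)
all_temp = [t for t in topics if topic_matches("*.*.temp", t)]
```

Encoding location in the topic name lets a global analytics job choose how much of the hierarchy it cares about purely by the pattern it subscribes with.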
  • 32. Act locally, learn globally
  • 33. Machine Learning Logistics
  • 34. Traditional View
  • 35. Traditional View: This isn’t the whole story
  • 36. 90% of the effort in successful machine learning isn’t in the training or model dev… It’s the logistics
  • 37. Why?
      • Just getting the training data is hard
          – Which data? How to make it accessible? Multiple sources!
          – New kinds of observations force restarts
          – Requires a ton of domain knowledge
      • The myth of the unitary model
          – You can’t train just one
          – You will have dozens of models, likely hundreds or more
          – Handoff to new versions is tricky
          – You need run-time results to be sure which model is better
  • 38. What Machine Learning Tool is Best?
      • Most successful groups keep several “favorite” machine learning tools at hand
          – No single tool is best in every situation
      • The most important tool is a platform that supports logistics well
          – Don’t have to do everything at the application level
          – Lots of what matters can be handled at the platform level
      • A good design for the logistics can make a big difference
  • 39. Some Gotchas
      • Ops-oriented people will not “get it” regarding modeling subtleties
      • Data scientists will not “get it” regarding operational realities
      • Therefore, modelers have to deliver self-contained models
      • And, ops has to provide pre-wired structure
  • 40. Rendezvous Architecture (diagram labels: request; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response)
  • 41. Rendezvous to the Rescue: Better ML Logistics
      • Stream-1st architecture is a powerful approach with surprisingly widespread advantages
          – Innovative technologies are emerging for streaming data
      • Microservices approach provides flexibility
          – Streaming supports microservices (if done right)
      • Containers remove surprises
          – Predictable environment for running models
  • 42. Rendezvous: Mainly for Decisioning Engines
      • Decisioning models
          – Looking for a “right answer”
          – Simpler than reinforcement learning
      • Examples include:
          – Fraud detection
          – Predictive analytics / market prediction
          – Churn prediction (as in telecommunications)
          – Yield optimization
          – Deep learning in the form of speech or image recognition, in some cases
  • 43. What We Ultimately Want (diagram labels: request; Model; response)
  • 44. But This Isn’t The Answer (diagram labels: request; Load balancer; Model 1; Model 2; Model 3; response)
  • 45. First Try with Streams (diagram labels: request; Input; Model 1; Model 2; Model 3; response; ?)
  • 46. First Rendezvous (diagram labels: request; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response)
  • 47. Some Key Points
      • Note that all models see identical inputs
      • All models run in production setting
      • All models send scores to the same stream
      • The rendezvous server decides which scores to ignore
      • Roll forward, roll back, correlated comparison are all now trivial
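The key points above can be sketched in a few lines. This is a minimal illustration, assuming plain Python lists in place of the input and scores streams; the model functions, the names `score_all` and `rendezvous`, and the record layout are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for models; each maps a request value to a score.
models: Dict[str, Callable[[float], float]] = {
    "model_1": lambda x: 0.5 * x,
    "model_2": lambda x: 0.25 * x,
    "model_3": lambda x: 0.125 * x,
}


def score_all(inputs: List[Tuple[int, float]]) -> List[Tuple[int, str, float]]:
    """Every model reads the same input records and writes to one scores stream."""
    scores = []
    for request_id, value in inputs:
        for name, model in models.items():
            scores.append((request_id, name, model(value)))
    return scores


def rendezvous(scores: List[Tuple[int, str, float]], preferred: str) -> Dict[int, float]:
    """Keep only the preferred model's score per request; ignore the rest."""
    return {rid: s for rid, name, s in scores if name == preferred}


input_stream = [(1, 10.0), (2, 20.0)]
scores_stream = score_all(input_stream)

results = rendezvous(scores_stream, preferred="model_1")
# Rolling forward (or back) is only a change of preference; nothing is redeployed.
rolled_forward = rendezvous(scores_stream, preferred="model_2")
```

Since every score for every request is already in `scores_stream`, a correlated comparison between two models is a lookup by request id rather than a new experiment.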
  • 48. Reality Check, Injecting External State (diagram labels: request; Raw; Add external data; Input; Model 1; Model 2; Model 3; Database; The world)
  • 49. Recording Raw Data (as it really was) (diagram labels: Input; Scores; Decoy; Model 2; Model 3; Archive)
  • 50. Quality & Reproducibility of Input Data is Important!
      • Recording raw-ish data is really a big deal
          – Data as seen by a model is worth gold
          – Data reconstructed later often has time-machine leaks
          – Databases were made for updates, streams are safer
      • Raw data is useful for non-ML cases as well (think flexibility)
      • Decoy model records training data as seen by models under development & evaluation
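A decoy can be sketched as something that looks like a model but only archives what it saw. This is a minimal illustration, assuming an in-memory list as the archive; the `Decoy` class and the record fields are hypothetical.

```python
import json
from typing import List


class Decoy:
    """Looks like a model to the rest of the system but only archives its input."""

    def __init__(self) -> None:
        self.archive: List[str] = []

    def score(self, record: dict) -> None:
        # Serialize immediately so later changes to the record cannot leak into the archive.
        self.archive.append(json.dumps(record, sort_keys=True))
        return None  # a decoy never contributes a score


decoy = Decoy()
record = {"card": "1234", "amount": 25.0, "last_use_city": "Oslo"}
decoy.score(record)

# The mutable profile changes after scoring, as a database row would.
record["last_use_city"] = "Lima"

# Training later from the archive sees the input exactly as the models saw it.
replayed = json.loads(decoy.archive[0])
```

Reconstructing the same record later from the updated profile would yield "Lima", which is the kind of time-machine leak the slide warns about.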
  • 51. Canary for Comparison (diagram labels: Input; Real model; Canary; Decoy; Archive; ∆; Result)
  • 52. What Does the Canary Do?
      • The canary is a real model, but is very rarely updated
      • The canary results are almost never used for decisioning
      • The virtue of the canary is stability
      • Comparing to the canary results gives insight into new models
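The comparison against the canary can be sketched as a per-request difference of scores. This is a minimal illustration; the score values, the `canary_deltas` helper, and the 0.25 threshold are all assumptions made for the example.

```python
from typing import List


def canary_deltas(model_scores: List[float], canary_scores: List[float]) -> List[float]:
    """Per-request difference between a model's score and the canary's score."""
    return [m - c for m, c in zip(model_scores, canary_scores)]


# Scores for the same four requests (hypothetical values).
canary = [0.10, 0.80, 0.30, 0.95]
candidate = [0.12, 0.78, 0.31, 0.20]  # the last request disagrees sharply

deltas = canary_deltas(candidate, canary)

THRESHOLD = 0.25  # assumed threshold for flagging a disagreement
suspicious = [i for i, d in enumerate(deltas) if abs(d) > THRESHOLD]
```

Because the canary rarely changes, a shift in the distribution of these deltas points at the new model or at the input data rather than at the reference.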
  • 53. Isolated Development With Stream Replication (diagram labels: Model 1; Model 2; Model 3; request; Raw; Add external data; Input; Internal 1; Internal 2; Internal 3; The world; Model 4; Raw; New external data; Input; Internal 4; Production; Development)
  • 54. A Quick Review (diagram labels: request; Proxy; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response)
  • 55. The Proxy Talks to the Outside World (same diagram as slide 54)
  • 56. The Input Stream Feeds All Models Identically (same diagram as slide 54)
  • 57. The Scores Stream Contains All Results (same diagram as slide 54)
  • 58. The Rendezvous Picks A Result (same diagram as slide 54)
  • 59. Results Return Via A Stream and Return Address (same diagram as slide 54)
  • 60. Models in production live in the real world: Conditions may (will) change
  • 61. Rendezvous Schedules
      • The key idea of rendezvous schedules is to define the trade-off of latency versus model priority
          – At short delays, we want the best
          – At moderate delays, we will compromise a bit
          – Near the deadline, we will take any answer at all
      • Normally the same rendezvous schedules apply to all transactions
          – Overriding the default schedule has bona fide uses
  • 62. Rendezvous Overrides
      • Incoming transaction can carry an overriding schedule
          – This is great for QA, to see output from a specific model
          – Overriding the default schedule is also good for systemic A/B tests
      • Overrides should be unusual
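A rendezvous schedule and an override can be sketched as data plus one selection function. This is a minimal illustration, assuming a schedule is a list of (elapsed milliseconds, acceptable models in priority order); the model names, the time thresholds, and the `pick` function are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Schedule = List[Tuple[int, List[str]]]

# From each elapsed-ms threshold on, these models are acceptable (hypothetical values).
DEFAULT_SCHEDULE: Schedule = [
    (0, ["champion"]),                                 # short delay: only the best
    (20, ["champion", "challenger"]),                  # moderate delay: compromise a bit
    (40, ["champion", "challenger", "fast_default"]),  # near the deadline: take anything
]


def pick(schedule: Schedule, arrived: Dict[str, float], elapsed_ms: int) -> Optional[Tuple[str, float]]:
    """Return (model, score) for the best acceptable model that has answered, else None."""
    acceptable: List[str] = []
    for start_ms, names in schedule:
        if elapsed_ms >= start_ms:
            acceptable = names
    for name in acceptable:  # names are listed in priority order
        if name in arrived:
            return name, arrived[name]
    return None


only_fast = {"fast_default": 0.4}
early = pick(DEFAULT_SCHEDULE, only_fast, elapsed_ms=5)   # keep waiting for the champion
late = pick(DEFAULT_SCHEDULE, only_fast, elapsed_ms=45)   # deadline is near, accept fallback

# A transaction may carry its own schedule, for example to see one model's output in QA.
QA_OVERRIDE: Schedule = [(0, ["challenger"])]
qa = pick(QA_OVERRIDE, {"champion": 0.9, "challenger": 0.7}, elapsed_ms=5)
```

Keeping the schedule as data means an override is just a different value passed with the request, which fits the advice that overrides should be unusual.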
  • 63. Scaling Up
      • More kinds of model
          – Multiple rendezvous frameworks for different tasks
      • More throughput
          – Fast default models
          – Partition input stream to allow parallel model evaluation
          – Input batching
      • Extreme volumes require extreme measures
          – Cannibalize fancy models to run more fast/simple models
          – Speed before beauty
  • 64. Faster Throughput Through Failure
      • Suppose we have one model that can handle 10,000 t/s @ 2ms
          – But this isn’t the most accurate model. Not bad, but not best
      • And our champion model can handle 1000 t/s @ 10ms
      • Then imagine a burst of 2000 t/s for several minutes
      • Champion can only evaluate half of all requests
          – Should skip to keep up
          – Fast model will cover for champion
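The arithmetic on this slide can be written out directly. This is a minimal sketch using the slide's numbers; the `coverage` function is a hypothetical helper, and it assumes the champion skips requests evenly to keep up.

```python
from typing import Tuple


def coverage(arrival_tps: int, champion_tps: int, fast_tps: int) -> Tuple[float, float]:
    """Share of requests answered by the champion, and share covered only by the fast model."""
    champion_share = min(1.0, champion_tps / arrival_tps)
    fast_share = min(1.0, fast_tps / arrival_tps)
    fallback_share = max(0.0, fast_share - champion_share)
    return champion_share, fallback_share


# Burst of 2000 t/s; champion handles 1000 t/s; fast model handles 10,000 t/s.
champion_share, fallback_share = coverage(2000, 1000, 10000)
```

With these numbers the champion answers half of the requests and the fast model covers the other half, so every request still gets a score during the burst.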
  • 65. (diagram labels: Input; Scores; Model 1; Model 2; Model 3)
  • 66. (diagram labels: Input; Scores; Model 1; Model 2; Model 3)
  • 67. (diagram labels: Input; Scores; Model 1; Model 2; Model 3)
  • 68. Always have a default or fallback model. Models that fall behind should discard requests to catch up.
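Discarding requests to catch up can be sketched as dropping anything older than a freshness limit before doing any work. This is a minimal illustration; the queue layout of (request id, arrival time in ms), the `next_fresh_request` function, and the 10 ms limit are assumptions.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Request = Tuple[str, int]  # (request id, arrival time in ms)


def next_fresh_request(queue: Deque[Request], now_ms: int, max_age_ms: int) -> Tuple[Optional[Request], int]:
    """Drop requests older than max_age_ms, then return the next fresh one and the skip count."""
    skipped = 0
    while queue and now_ms - queue[0][1] > max_age_ms:
        queue.popleft()  # too old to answer in time; the fallback model covers it
        skipped += 1
    nxt = queue.popleft() if queue else None
    return nxt, skipped


backlog: Deque[Request] = deque([("r1", 0), ("r2", 5), ("r3", 48)])
request, skipped = next_fresh_request(backlog, now_ms=50, max_age_ms=10)
```

Skipping is safe only because a default or fallback model is always scoring the same input, which is why the two statements on the slide belong together.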
  • 69. Limitations of Rendezvous
      • 100% speculative execution can be expensive
          – Can be mitigated by partial speculation
          – Or it may just be too expensive
      • Minimum Viable Products should be minimal
          – You may not require zero downtime … be realistic
      • Context may be too large
      • Latency limits may be too stringent
  • 70. Ad Targeting Example (diagram labels: Detailed scoring; Proxy; Pre-select; 1; 2; Sharded Ad Scoring; 3; User Profile; Ads)
      • User profile and context used for rough-cut selection of ads
      • Roughly 1000 ads are scored in detail for p(click)
  • 71. Why Not Full Rendezvous?
      • 1000’s of ads / second x 1000 candidates = 1M scores / second
          – AKA “a lot”
      • Scoring a single model is expensive
      • Sharding and replication provide a form of failure tolerance
      • Full speculative execution across several options is prohibitive
      • Latency guarantees can be very short (10 ms)
  • 72. Rendezvous-lite Options
      • We have some options
      • We can allow selective speculation on marked requests
          – If only 1% of ads run speculative execution, we can pack 10x more shards per node and use 10x fewer nodes
          – Selective speculation doesn’t give redundancy
      • We can release results if >80% of shards reply
      • Temporary speculation during hand-offs is useful
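Releasing results once more than 80% of shards have replied can be sketched as a quorum check before merging. This is a minimal illustration; the `merge_partial` function, the shard result layout of (ad id, score) pairs, and the example scores are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

ScoredAd = Tuple[str, float]  # (ad id, p(click) estimate)


def merge_partial(
    shard_results: Dict[int, List[ScoredAd]],
    total_shards: int,
    quorum: float = 0.8,
    top_n: int = 2,
) -> Optional[List[ScoredAd]]:
    """Return the top ads once more than `quorum` of shards have replied, else None."""
    if len(shard_results) / total_shards <= quorum:
        return None  # keep waiting; too few shards have answered
    all_scored = [pair for shard in shard_results.values() for pair in shard]
    return heapq.nlargest(top_n, all_scored, key=lambda pair: pair[1])


# Nine of ten shards have replied, each with one scored ad.
nine_replies = {i: [(f"ad{i}", i / 10)] for i in range(9)}
top = merge_partial(nine_replies, total_shards=10)

# Exactly 80% is not enough under a strict ">80%" rule.
eight_replies = {i: [(f"ad{i}", i / 10)] for i in range(8)}
waiting = merge_partial(eight_replies, total_shards=10)
```

The trade-off is that a slow shard can no longer hold up the response, at the cost of occasionally missing the ads that shard would have scored.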
  • 73. Let’s Review
  • 74. A Quick Review (diagram labels: request; Proxy; Input; Model 1; Model 2; Model 3; Scores; Rendezvous; Results; response)
  • 75. The Proxy Talks to the Outside World (same diagram as slide 74)
  • 76. The Input Stream Feeds All Models Identically (same diagram as slide 74)
  • 77. The Scores Stream Contains All Results (same diagram as slide 74)
  • 78. The Rendezvous Picks A Result (same diagram as slide 74)
  • 79. Results Return Via A Stream and Return Address (same diagram as slide 74)
  • 80. Not Such Bad Ideas
      • Keep models running “in the wings”
          – Don’t wait until conditions change to start building the next model
          – Keep new short-history models ready to roll, some graybeards as well
      • Hot hand-off
          – With rendezvous: just stop ignoring the new best model
      • Deploy a canary server
          – Keep an old model active as a reference
          – If it was 90% correct, difference with any better model should be small
          – Score distribution should be roughly constant
  • 81. New book: how to manage machine learning models
      • Download free pdf or read free online via @MapR: https://mapr.com/ebook/machine-learning-logistics/
      • and “Rendezvous Architecture” by Ted Dunning & Ellen Friedman, in Encyclopedia of Big Data Technologies. Sherif Sakr and Albert Zomaya, editors. Springer International Publishing, in press 2018.
  • 82. Contact Information
      • Ted Dunning, PhD
      • Chief Application Architect, MapR Technologies
      • Committer, PMC member, board member, ASF
      • O’Reilly author
      • Email: tdunning@mapr.com, tdunning@apache.org
      • Twitter: @Ted_Dunning
  • 83. Q&A. Engage with us: @mapr, @Ted_Dunning, tdunning@mapr.com