Productionizing Predictive Analytics using the Rendezvous Architecture - for Data Scientists

Delivery Excellence:
Love is in the Air –
an Architecture for Analytics
in Production
Leverkusen, July 31, 2019, Daniel Schulz

Love is in the Air
is an Architectural Battle Story

Time Schedule
Topics Covered
1. Brief Motivation
2. Recap: Lambda Architecture
3. Rendezvous Architecture
© 2019 Daniel Schulz. All rights reserved.

Challenge
Deliver Many Ever-Improving
Predictive Models to Production in a
Reproducible & Robust Manner

The Challenge Put Forward
• Greenfield project
• Modern tech stack: Kubernetes, Docker, Cloud-native, DSE Cassandra
• Reproducibility
• Resilient & robust models
• Deliver multiple models w/ increasing quality
• DevOps delivery
Requirements for both QA & Production

Competing, Famous
λ-Architecture
is a Different Approach – both
Might be Combined w/ One Another

The Lambda-Architecture Splits Data into Two Distinct Sets –
a Large Batch Set as a 360° View & Just Recent, Speedy Updates
Just Like the Data Set is Split, so are the Data Stores & Hence their Respective Serving Architectures
[Diagram: Data Sources write (PUT/POST) to the Speed Layer and the Batch Layer; each layer answers its own Queries (GET).]

In a more Unified Version, One Central Serving Architecture
Combines Both Batch & Speed Layer Data to Answer Queries
There is a Trade-Off to be Made between Completeness/Perfection in Batch & Recency in Speed Layer Data
[Diagram: Data Sources write (PUT/POST) to the Speed Layer and the Batch Layer; one Serving Layer reads from both (GET) and answers Queries (GET).]
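The unified serving layer can be illustrated with a minimal sketch. It assumes, purely for illustration, that both layers expose simple per-key counts as dictionaries; the names `serve_query`, `batch_view` and `speed_view` are hypothetical and not taken from the talk.

```python
from typing import Dict


def serve_query(key: str, batch_view: Dict[str, int], speed_view: Dict[str, int]) -> int:
    """Answer one query by combining the batch view with the speed-layer increments."""
    # Batch view: complete but stale. Speed view: only events since the last batch run.
    return batch_view.get(key, 0) + speed_view.get(key, 0)


# Hypothetical example data: event counts per user.
batch_view = {"user-1": 120, "user-2": 45}  # computed by the last batch run
speed_view = {"user-1": 3, "user-3": 1}     # events that arrived since then

print(serve_query("user-1", batch_view, speed_view))
```

The trade-off named on the slide shows up directly: the batch part of the answer is as complete as the last batch run, while the speed part is recent but only approximate until the next batch run replaces it.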

Lambda-Architecture is a Trade-Off Between Completeness &
Speediness of Prediction
• Commonalities
  • Use cases in Predictive Analytics
  • Deliver predictions very fast
• Differences
  • Lambda Architecture
    • Is criticized for its complexity:
      different data stores, code bases and serving technologies might be needed
    • Only applicable in append-only architectures such as Hadoop:
      a data change would trigger a completely new batch computation
    • Focused on data age
  • Rendezvous Architecture
    • Technically more complex to implement
    • Needs a message queue, which might duplicate messages and increase latency
    • Focused on predictive models
The Batch Layer Provides an almost Perfect 360° View of Everything – the Speed Layer Intends to Predict Fast

λ-Architecture ⊥
Rendezvous Architecture
Are Different Approaches – both
Might be Combined w/ One Another

Objectives on the Rendezvous Architecture
Managing a Myriad of Predictive Analytics Models in Production
▪ Manage multiple models in production and production-like environments
▪ Test-drive incumbent & challenging models against one another
→ enable Rapid App/Model Development (RAD) for DevOps projects
▪ Reproducibility & transparency of the models’ predictions

Client/Server
Architecture
Simple & Traditional

Starting with Simplest, Traditional Client/Server Architecture –
Discrete, Direct & Stateless Response
One Endpoint Answers Predictive Queries w/ all Information to the Model Encapsulated in Request
Source & image courtesy: [MLL]

• Shortcomings:
• Only one Predictive Model at a time
• Incumbent-only deployment – no challenging models for reference
• No way to compare multiple models’ accuracies against one another
• Continuous improvement in the Data Science or DataOps team is unlikely due to lack of feedback
• Redeployment could result in downtime if no Green/Blue Deployments are used
• Selected implementation ideas:
• TensorFlow Serving
• Flask
• Various Application Servers
This is the Simplest Architecture for AI Deployments
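A minimal sketch of such a stateless endpoint, reduced to a plain function so that it runs without a web framework. The request layout (`request_id`, `features`) and the fixed linear stand-in model are assumptions made for illustration only.

```python
import json


def predict(features: dict) -> float:
    """Hypothetical stand-in model: a fixed linear score over known features."""
    weights = {"age": 0.02, "income": 0.00001}
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def handle_request(body: str) -> str:
    """Stateless handler: everything the model needs arrives inside the request."""
    request = json.loads(body)
    score = predict(request["features"])
    return json.dumps({"request_id": request["request_id"], "score": score})


response = handle_request(json.dumps({"request_id": "r1", "features": {"age": 50}}))
print(response)
```

Because the handler keeps no state between calls, any number of identical replicas can serve it, but there is still exactly one model behind the endpoint.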

Annotations for All Future Architecture Iterations
• Annotation on HA:
the following is independent of High Availability – it does not matter whether there are one or many replicas to
take over in case of technical failure
• Annotation on Multi-Threading:
any Model might be a single process, a multi-threaded or multi-process program, or even an endpoint in a Microservice
Architecture (with more complex calls behind it); the only relevant aspect is its Client/Server nature, where all
information is enclosed in the stateless request and the model responds to it; external dependencies may or may not apply
• Annotation on Ensembles, etc.:
the model might be an Ensemble or not – the sophistication and complexity underneath is secondary
Applies to this Base Model & all Supporting, more Complex Versions of it

Load Balanced
Client/Server Architecture
Enable Fast Model Exchange

Adding Technical High Availability Helps Little –
a Load Balancer Forwards any One Request to Exactly One Model
The Reverse Proxy Enables Green/Blue Deployments Only – Swapping Models Just Happens Faster Now

• Shortcomings:
• Only one Predictive Model per Request at a time
• Hence, still no challenging models for reference
• Selected implementation ideas for Reverse Proxy:
• Kubernetes or Kubernetes w/ Istio (Microservice Architectures)
• Docker Swarm (Microservice Architectures)
• Nginx
• Flask
• Various Application Servers
This Architecture is Pretty Simple & a Standard Deployment Practice from Custom Solution Development
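The model swap behind a reverse proxy can be sketched in a few lines. This is an in-process illustration under assumptions, not a real proxy configuration; `ModelRouter`, `switch_to` and the two constant models are hypothetical names.

```python
import threading
from typing import Callable, Dict

Model = Callable[[dict], float]


class ModelRouter:
    """Forwards each request to exactly one active model; the active model can be swapped."""

    def __init__(self, models: Dict[str, Model], active: str) -> None:
        self._models = dict(models)
        self._active = active
        self._lock = threading.Lock()

    def switch_to(self, name: str) -> None:
        """Swap the active deployment, e.g. from 'blue' to 'green'."""
        if name not in self._models:
            raise KeyError(name)
        with self._lock:
            self._active = name

    def handle(self, request: dict) -> dict:
        with self._lock:
            name = self._active
        return {"model": name, "score": self._models[name](request)}


router = ModelRouter({"blue": lambda r: 0.1, "green": lambda r: 0.9}, active="blue")
before = router.handle({"x": 1})
router.switch_to("green")
after = router.handle({"x": 1})
print(before, after)
```

The swap is fast, yet each request still reaches only one model, which is the shortcoming listed above.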

Load Balanced
Parallelized Models
Enable Multiple Models at Once

Multiple Models in Parallel –
Adding a Message Queue Enables Concurrency of Models
But which Prediction to Choose in the End?

• Shortcomings:
• Challenging models are only somewhat available for reference – it is still unclear which prediction to return
• Need to add a return address (as a URI) and a boolean flag for whether to return anything at all to each queued message –
so that the consumer can respond to the original (HTTP) request, which might already have completed by then
• Persistent Message Queues tend to be slower than pure in-memory ones;
Spark Streaming, working in micro-batches, might add latency as well
• Selected implementation ideas for (persistent) Message Queues:
• Apache Kafka
• Apache Flume
• MapR Streams
• RabbitMQ, ActiveMQ, ZeroMQ, etc.
• Apache Spark Streaming or Apache Flink as producers or consumers of such Message Queues
But which Prediction to Choose in the End?
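The fan-out to several models can be sketched with in-memory queues from the Python standard library standing in for a persistent message queue. The message fields `reply_to` and `expects_reply` mirror the return address and boolean flag mentioned above; the URL, the field names and the two toy models are assumptions for illustration.

```python
import queue
import threading


def run_model(name, fn, inbox, scores):
    """Consume requests from this model's inbox and publish scored results."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: shut this worker down
            break
        scores.put({
            "request_id": msg["request_id"],
            "reply_to": msg["reply_to"],            # return address of the original request
            "expects_reply": msg["expects_reply"],  # whether anything should be returned
            "model": name,
            "score": fn(msg["features"]),
        })


# Hypothetical toy models; every model sees every request.
models = {"linear": lambda f: 2.0 * f["x"], "constant": lambda f: 0.5}
inboxes = {name: queue.Queue() for name in models}
scores = queue.Queue()
workers = [threading.Thread(target=run_model, args=(name, fn, inboxes[name], scores))
           for name, fn in models.items()]
for worker in workers:
    worker.start()

request = {"request_id": "r1", "reply_to": "http://client.example/callback/r1",
           "expects_reply": True, "features": {"x": 3.0}}
for inbox in inboxes.values():
    inbox.put(request)
    inbox.put(None)
for worker in workers:
    worker.join()

results = sorted((scores.get() for _ in models), key=lambda r: r["model"])
print(results)
```

Both models answer the same request concurrently, which leaves the question from the slide open: two scores arrive, and something still has to decide which one goes back to the client.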

Rendezvous
Architecture
Backstop Request Guarantees,
Parallelized, Comparable,
Rapid Model Updates,
QoS Guarantees,
etc.

The Model’s Rendezvous – a Scoring Stream Collects Various
Predictions & Returns them to the Original Request’s Client
Many Models – One Final Prediction
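A minimal sketch of the rendezvous step, assuming the scoring stream has already been collected into a list of per-model results for one request. The preference-order policy and the names `rendezvous`, `scores` and `preference` are illustrative assumptions; [MLL] describes the idea, not this code.

```python
from typing import List, Optional


def rendezvous(scores: List[dict], preference: List[str]) -> Optional[dict]:
    """Pick one final prediction: the first preferred model that has answered."""
    by_model = {score["model"]: score for score in scores}
    for name in preference:
        if name in by_model:
            return by_model[name]
    return None  # no preferred model has answered yet


# Hypothetical results collected from the scoring stream for request "r1".
scores = [
    {"request_id": "r1", "model": "challenger", "score": 0.81},
    {"request_id": "r1", "model": "baseline", "score": 0.50},
]

final = rendezvous(scores, preference=["incumbent", "baseline"])
print(final)
```

Here the incumbent has not answered, so the baseline's prediction is returned, while the challenger's score is still available for comparison.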

Stateless Models are Easier to Replicate –
but Data Augmentation Might be Helpful or Necessary
Any Additional Piece of Data or State Information Helpful to our Models Shall be Added in One Central Place

The Decoy Model –
Collect Production Data for Debugging, Optimizations and Reproducibility
The “Unit Tests” of AI Models in Production are Real-World Data
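A decoy can be sketched as something that is wired in like a model but only archives what it receives. The in-memory list used as storage and the names `DecoyModel` and `replay` are assumptions for illustration; a production decoy would write to durable storage.

```python
import json
from typing import List


class DecoyModel:
    """Looks like a model to the architecture but only archives the requests it sees."""

    def __init__(self) -> None:
        self.log: List[str] = []  # stand-in for durable storage

    def __call__(self, request: dict) -> None:
        # Store exactly what a real model would have received; return no prediction.
        self.log.append(json.dumps(request, sort_keys=True))

    def replay(self) -> List[dict]:
        """Return the archived requests, e.g. to debug or re-score them later."""
        return [json.loads(line) for line in self.log]


decoy = DecoyModel()
decoy({"request_id": "r1", "features": {"x": 1.0}})
decoy({"request_id": "r2", "features": {"x": 2.5}})
print(decoy.replay())
```

Replaying the archive against a new model version reproduces the inputs production models actually saw.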

Discussion when to Add External Information
No Augmentation
▪ Models only receive request information
▪ More resilient to technical failure
▪ More overhead when many models fetch the same data –
use caches here
All Augmentation in One Central Place
▪ Models receive all complete information
▪ Ideal case for reproducibility –
as the “Decoy Model” stores all information for
debugging and to explain predictions later-on
▪ Usually faster due to smaller overhead
→ Best Practice by Ellen Friedman & Ted Dunning,
Chapter 3, sub-section “Stateful Models” in [MLL]
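Central augmentation can be sketched as one enrichment step that runs before the request is fanned out, so every model (and the decoy) sees identical input. The profile store, the field names and the two toy models are hypothetical.

```python
from typing import Dict


def augment(request: dict, profiles: Dict[str, dict]) -> dict:
    """Add external state once, in one central place, before any model sees the request."""
    enriched = dict(request)  # leave the original request untouched
    enriched["profile"] = profiles.get(request["user_id"], {})
    return enriched


# Hypothetical external data and models.
profiles = {"u1": {"segment": "premium", "orders_last_30d": 4}}
models = {
    "incumbent": lambda r: 0.1 * r["profile"].get("orders_last_30d", 0),
    "challenger": lambda r: 1.0 if r["profile"].get("segment") == "premium" else 0.0,
}

enriched = augment({"request_id": "r1", "user_id": "u1"}, profiles)
predictions = {name: model(enriched) for name, model in models.items()}
print(enriched, predictions)
```

The external lookup happens once per request instead of once per model, and storing `enriched` is enough to explain any model's prediction later.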

Add Metrics & the Canary Model for Monitoring & Optimizations
Metrics are Crucial to Compare Many Models Against One Another & Hence Support Accuracy Improvements

Metrics
▪ Metrics monitor all models’ predictions and compare
their performance
▪ Hence, we are able to judge the challenging models’
performance against the incumbent model’s
▪ Metrics are helpful to detect outliers in predictions –
• e.g. Adversarial Images, where models might predict
obscure classifications
• e.g. detect drifted models, shifts in the input data’s
distribution, etc.
▪ Technical SLA, timing & AI metrics alike
• SLA metrics:
latency, throughput, etc.
• Timing metrics:
computation time in threads, time for requests by
source/endpoint, etc.
• AI metrics:
accuracy, error metrics, AUC, F-statistics, etc.
The Canary Model
▪ Also a Best Practice from [MLL] for finding anomalies
▪ Is a rather dated model that keeps predicting so that
newer models’ predictions can be compared with it – to help detect
how production-ready they really are
▪ The difference between the Canary Model and later ones is proof of
progress for the Data Scientists – or of the lack thereof
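One way to turn the canary comparison into a metric is sketched below. It assumes predictions are numeric scores keyed by request ID; the function names and the threshold of 0.08 are arbitrary illustrations, not values from the talk.

```python
from statistics import mean
from typing import Dict, List


def mean_abs_difference(canary: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Average absolute gap between canary and candidate on the requests both scored."""
    shared = canary.keys() & candidate.keys()
    if not shared:
        raise ValueError("no requests scored by both models")
    return mean(abs(canary[key] - candidate[key]) for key in shared)


def divergent_requests(canary: Dict[str, float], candidate: Dict[str, float],
                       threshold: float) -> List[str]:
    """Request IDs where the candidate deviates from the canary by more than the threshold."""
    shared = canary.keys() & candidate.keys()
    return sorted(key for key in shared if abs(canary[key] - candidate[key]) > threshold)


# Hypothetical scores per request ID.
canary = {"r1": 0.20, "r2": 0.80, "r3": 0.50}
challenger = {"r1": 0.25, "r2": 0.70, "r3": 0.50}

print(mean_abs_difference(canary, challenger))
print(divergent_requests(canary, challenger, threshold=0.08))
```

A sudden jump in the average gap, or a growing list of divergent requests, is a hint to inspect the newer model or the input distribution before trusting it in production.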

Rendezvous Architecture – a Mixture of Models in Harmony
• Advantages:
• Model “warm-up” in production-like environments
• Switch models in an instant – un-deploy & deploy AI models swiftly
• Introduce time guarantees: all models work in parallel like Cassandra queries
• Mix simple, technically robust (not failing) models along w/ more sophisticated ones, which might break suddenly
• Incumbent vs challenging: collect raw data and performance metrics for various models (e.g. XGBoost vs Random
Forests; e.g. Linear Model vs SVM; e.g. PCA vs t-SNE) and differing versions in model streams (version 0.1, 0.2, …)
• Backstop:
when a prediction takes too long, a simpler, more robust model may answer as a backstop for more complex, more sophisticated
models; the same applies when longer-term performance metrics indicate that another model would perform better
My Suggestion for Reliable AI Systems Due to…
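The backstop behaviour can be sketched with a thread pool and a deadline. The two models, their timings and the name `predict_with_backstop` are hypothetical; a real rendezvous server would read from the scoring stream rather than call the models directly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_sophisticated(features: dict) -> float:
    time.sleep(0.5)  # simulates an expensive model
    return 0.93


def fast_backstop(features: dict) -> float:
    return 0.5       # simple, robust default answer


def predict_with_backstop(features: dict, deadline_s: float, pool: ThreadPoolExecutor):
    """Run both models in parallel; fall back to the backstop if the deadline passes."""
    primary = pool.submit(slow_sophisticated, features)
    fallback = pool.submit(fast_backstop, features)
    try:
        return "sophisticated", primary.result(timeout=deadline_s)
    except FutureTimeout:
        return "backstop", fallback.result()


with ThreadPoolExecutor(max_workers=4) as pool:
    late = predict_with_backstop({"x": 1}, deadline_s=0.05, pool=pool)
    on_time = predict_with_backstop({"x": 1}, deadline_s=2.0, pool=pool)

print(late, on_time)
```

With a tight deadline the backstop answers, with a generous one the sophisticated model does, so the client gets a response within the agreed time in both cases.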

Source & Image Courtesy from Book “Machine Learning Logistics”
“Machine Learning Logistics”
▪ Authors: Ellen Friedman & Ted Dunning
▪ Publisher: O'Reilly Media, Inc.
▪ Release Date: October 2017
▪ ISBN: 9781491997611
▪ Picture source ID: MLL

Source from Book “Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition”
▪ Author: Aurélien Géron
▪ Publisher: O'Reilly Media, Inc.
▪ Release Date: September 2019
▪ ISBN: 9781492032649

Résumé
Resilient, Extendable &
Production-ready Architecture for
Predictive Analytics

Résumé on Rendezvous Architecture
Limitations
▪ Is no silver bullet – does not solve all logistical obstacles in AI projects
▪ Focuses on production-side architecture – development, test, QA and LTE environments might benefit from it as well
▪ Focuses on technological software architecture – does not cover ML Metrics, Hyperparameter Tuning, etc.
▪ Latency increases slightly due to the message-queue dependency

Major Advantages
▪ Manage multiple models in production and production-like
environments
▪ Test-drive incumbent & challenging models against one
another
Minor Benefits
▪ Collect real-world data for future development
▪ Establish baseline performance values for Predictive
Analytics
▪ Rapid development due to default, fallback models
▪ Latency guarantees for model predictions
resilient, robust Predictive models
in modern Agile & DevOps projects

Rendezvous Architecture
is the Modern Bedrock
of Robust Predictive models for
Today’s Agile & DevOps Projects

Thank You for Your Attention
Please Feel Free to Ask any Open Questions, Suggestions or Voice Your Opinion…

