SlideShare a Scribd company logo
Production-Ready
Machine Learning for the
Software Architect
Pooyan Jamshidi
CMU
Guest lecture
Architectures for
Software Systems,
April 2018
Learning Goals
Understand how to build a system that can
put the power of machine learning to use.
Understand how to incorporate ML-based
components into a larger system.
Understand the principles that govern
these systems, both as software and as
predictive systems.
Learning Goals
Understand challenges of building ML
systems.
Understand strategies of making ML
systems reactive.
I won’t teach you specific techniques
of machine learning, it simply does
not fit in one lecture course.
The topic of this lecture
belongs to here!
I also hang around here
Software
Arch.
Machine
Learning
Distributed
Systems
ML as a technique vs ML
as system in production
After few years, PyTorch
just became production
ready
Disclaimer
I borrowed materials
from:
Jeff Smith’s book ->
Uber blog
Spark Documents
MIT course Intro to
Deep Learning
TF document
Who am I?
Postdoc at CMU, Fall’18: Prof. At USC
My research: SE+Systems+ML
Software dev/eng for 7 years
3+ yrs as architect
More info: https://pooyanjamshidi.github.io
Please interrupt and ask
your questions at anytime!
What is the grand
difference?
Software systems are built from sets
of software components
ML(-based) systems are build from
sets of software components capable
of learning from data and making
predictions about the future
Case study
A startup that tries to build a
machine learning system from the
ground up and finds it very, very
hard!
Sniffable
Sniffable is the Instagram for dogs.
Dog owners post pictures of their
dogs,
and other dog owners (sniffers!)
like, share, and comment on those
pictures.
A New Feature for
Sniffable
Sniffers want to promote their
specific dog
Sniffers want their dogs to become a
celebrity!
So, they want a new tool to make
their pupdates, more viral.
Pooch Predictor
A new feature to figure out which
picture would get the biggest
response on Sniffable
A new feature to predict the number
of likes a given pupdate might get,
based on the hashtags used.
Why did the startup come
up with Pooch Predictor?
It would engage the dog owners,
help them create viral content,
and grow the Sniffable network as a
whole
Pooch Predictor 1.0
Architecture
What kind of architectural style is this?
What happened at the
organizational level
Data
Scientist
Software
Engineer
Prototype
Something seems
wrong!
Issue: Predictions weren’t changing
much over a period of time despite
adding more data to the system
Misunderstanding
between stakeholders
What is the
retraining What??
Data
Scientist
Software
Engineer
Pooch Predictor 1.1
Architecture
System issues
appeared
Pooch Predictor wasn’t very reliable
Change in the database -> the queries
would fail
High load on the server -> the modeling
job would fail
More and more as the size of the social
network increased
Failures were hard
to detect
Failures were hard to detect without
building up more sophisticated
monitoring infrastructure.
There wasn’t much that could be
done other than restarting the job.
Modeling issues
Architect realized that some of the
features weren’t being correctly
extracted from the raw data.
It was also really hard to
understand how a change to the
features would impact modeling
performance.
A major issue
happened
For a couple of weeks, their interaction rates
steadily trended down.
For users outside of the US, Pooch Predictor
would always predict a negative number of likes!!
It was a problem with the modeling system’s
location-based features.
The data scientist and engineer paired to fix the
issue 
Things went from
bad to worse!
Appeared when data scientist implementing
more feature-extraction functionality to
hopefully improve modeling quality.
The app slowed down dramatically.
New functionality caused exceptions to be
thrown on the server, which led to
pupdates being dropped. 
What was the
reason?
Sending the data from the app back
to server took way too long to
maintain reasonable responsiveness.
The server would throw an
exception any time the prediction
functionality saw any of the new
features.
So what they did??
After understanding where things
had gone wrong, the team quickly
rolled back all of the new
functionality and restored the app
to a normal operational state.
They held a retrospective to build a
better system…
Building a better
system (operation)
Sniffle must remain responsive, regardless
of any problems with the predictive system.
The predictive component needs to be less
tightly coupled to the rest of the systems.
The predictive system needs to behave
predictably regardless of high load or
errors in the system itself.
Building a better
system (development)
It should be easier for different
developers to make changes to the
predictive system.
The code needs to use different
programming idioms that ensure
better performance.
Building a better
system (ML)
The predictive system must measure its modeling
performance better.
The predictive system should support evolution
and change.
The predictive system should support online
experimentation.
Easy for developers to supervise the predictive
system and rapidly correct any rogue behavior.

Sniffable team missed
something big, right?
They built what initially looked like a useful ML
system that added value to their core product.
Production issues with their ML system
frequently pulled the team away from work on
improvements to the capability of the system.
Even though they had a bunch of smart people
to develop model to predict the dynamics of
dog-based social networking, their system
repeatedly failed at its mission.
Building ML
systems is hard!
The data scientist knew how to do ML.
Pooch Predictor totally worked on his
laptop.
But the data scientist was not thinking
of ML as a system - but as a technique!
Pooch Predictor was a failure both as a
predictive system and as a software.
To build it right:
Reactive ML
Reactive ML = ML as a system
ML Pipeline
Starting from the
beginning
A ML system must collect data from the
outside world.
In the Pooch Predictor example, the team
was trying to skip this concern by using the
data that their application already had.
Obviously, this approach was quick, but it
tightly couples the Sniffable application data
model to the Pooch Predictor data model.
Derived representations of
the raw data = instances
What is “feature”?
Features are meaningful data points
derived from raw data related to the
entity being predicted on.
A Sniffable example of a “feature”
would be ??
the number of friends a given dog
has. 
What is the feature
vector?
The feature values for a given
instance are collectively called
a feature vector.
What is a concept?
A concept is the thing that the system is
trying to predict.
In the context of Pooch Predictor, a
concept would be the number of likes a
given post receives.
When a concept is discrete (i.e. not
continuous), it can be called a class label.
Types of learning
Supervised: given data, predict label
Unsupervised: given data, learn about
that data
RL: given data, choose action to
maximize expected long-term reward
Lex Fridman:
fridman@mit.edu
January
2018
Course 6.S191:
Intro to Deep Learning
For the full updated list of references visit:
https://selfdrivingcars.mit.edu/references
Types of Deep Learning
[81, 165]
Supervised
Learning
Unsupervised
Learning
Semi-Supervised
Learning Reinforcement
Learning
What is a “model”?
A program that maps from features to
predicted concepts.
For the Pooch Predictor, a program
that takes as input the feature
representation of the hashtag data and
returns the predicted number of likes
that a given pupdate might receive.
What is model
publishing?
Model publishing means making the
model program available outside of
the context it was learned in,
so that it can make predictions on
data it hasn’t seen before.
making the model program
available outside of the
context it was learned in
Remember any issue
regarding model publishing
in sniffable?
Sniffable team largely skipped it in their
original implementation.
They didn’t even set up their system to
retrain the model on a regular basis.
Their next approach at implementing
model retraining also ran into difficulty,
causing their models to be out of sync
with their feature extractors.
ML systems are different
from transactional systems
Some of the pooch predictor system problems
stemmed from treating their predictive system
like a transaction business application.
An approach that relies upon strong
consistency guarantees just doesn’t work for
modern distributed systems,
Out of synch with the pervasive and intrinsic
uncertainty in a machine learning system.
ML apps are
dynamic systems
Other problems the Sniffable team
experienced had to do with not
thinking about their system in
dynamic terms.
ML systems must evolve, and they
must support parallel tracks for that
evolution through experimentation.
A Simplistic Machine
Learning System
A Reactive ML
System
ML Quality Attributes
for Reactive ML Systems
@jeffksmithjr @xdotaDevoxx
Reactive Systems
Responsive
Resilient Elastic
Message-Driven
[Most of these figures are taken from Jeff Smith’s slides and book]
Responsive
Responsive
Responsive
If a system doesn’t respond to its
users, then it’s useless.
Think of the Sniffable team causing
a massive slowdown in the Sniffable
app due to the responsiveness of
their ML system.
Resilient
Resilient
Resilient
Whether the cause is hardware, human
error, or design flaws, software always
breaks.
Providing some sort of acceptable
response even when things don’t go as
planned is a key part of ensuring that
users view a system as being responsive.
Elastic
Elastic
Elastic
Elastic systems should respond to
increases or decreases in load.
The Sniffable team saw this when
their traffic ramped up and the
Pooch Predictor system couldn’t
keep up with the load.
Message-driven
Message-Driven
Message-driven
A loosely coupled system organized
around message passing can make it easier
to detect failure or issues with load.
Moreover, a design with this trait helps
contain any of the effects of errors
isolated, rather than flaming production
issues that needed to be immediately
addressed, as they were in Pooch Predictor.
Wait, how do you build a
system that actually has
these qualities?
Tactics
Replication
Data, whether at rest or in motion, should
be redundantly stored or processed.
In the Sniffable example, there was a time
when the server that ran the model-
learning job failed, and no model was
learned.
two or more model-learning jobs
An Example:
Spark Architecture
Rather than requiring
you to always have
two pipelines
executing, Spark
gives you automatic,
fine-grained
replication so that
the system can
recover from failure.
Containment
Prevent the failure of any single component
of the system from affecting any other
component.
Docker or rkt? no, this strategy isn’t about any
one implementation. Containment can be
implemented using many different systems.
The point is to prevent the sort of cascading
failure we saw in Pooch Predictor, and to do
so at a structural level.
A Contained Model-
Serving Architecture
Remember the model and the
features were out of sync, resulting
in exceptions during model
serving??
Supervision
Identify the components that could
fail and some other component is
responsible for their lifecycles.
A Supervisory Architecture
The published models are observed by
the model supervisor. Should their
behavior ever deviate from acceptable
bounds, the supervisor would stop
sending them messages requesting
predictions. 
Model Supervisor
French
Bulldog
French
Bulldog
French
Bulldog
No one
likes
Replication
Isolation/
Containment
Supervision/
Delegation
Supervisory architecture
allows self-healing
The model supervisor could even
completely destroy a model it
knows to be bad, allowing the
system to be self-healing.
Why Turning ML
systems to reactive
Ultimately, a reactive ML system
gives you the ability to deliver value
through ever better predictions.
Strategies to make
ML reactive
Infinite Data
Ideally, with its new ML capabilities, Sniffable
is going to take off and see tons of traffic.
Some posts would have big dogs, others
small ones.
Some posts would use filters, and others
would be more natural.
Some would be rich in hashtags, and some
wouldn’t have any annotations.
Strategy 1: Laziness
Delay of execution, to separate the
composition of functions to execute
from their actual execution.
Conceive of the data flow in terms
of infinite streams instead of finite
batches.
Lazy Evaluation in
Spark
The execution will not start until an
action is triggered
Strategy 2: Pure
functions
Evaluating the function must not
result in some sort of side effect,
such as changing the state of some
variable or performing some I/O.
Functions can be passed to other
functions as arguments -> Higher
Order Functions
Uncertainty
Even before making a prediction, a ML
system must deal with the uncertainty of
the real world outside of the ML system.
For example, do sniffers using the
hashtag #adorabull mean the same thing
as sniffers using the hashtag #adorable,
or should those be viewed as different
features?
Strategy 3:
Immutable facts
Data that can simply be written once and never
modified
The use of immutable facts will allow us to
reason about uncertain views of the world at
specific points in time.
Having a complete record of all facts that
occurred over the lifetime of the system will also
enable important machine learning, like model
experimentation and automatic model validation.
Strategy 4: Possible
Worlds
Until now things
were fictional, lets
see how things are
done in
real-world At Uber
Production-ready
ML at Scale (Uber)
https://eng.uber.com/michelangelo/
Production-ready
ML at Scale (Uber)
An example of model
evaluation in practice
(feature importance)
Production-ready
ML at Scale (Uber)
Production-ready
ML at Scale (Uber)
Production-ready
ML at Scale (Uber)
Summary
ML offers a powerful toolkit for building
useful prediction systems.
Building an ML model is only part of a
larger system development process.
We went through practices in distributed
systems and software engineering in a
pragmatic approach to building real-world
ML systems.
Summary
ML can be viewed as an application, and
not just as a technique.
We learned how to incorporate ML-based
components into a larger complex system.
Reactive is a way to build sophisticated
systems, and some ML systems don’t need
to be sophisticated. 
Resources
Reading assignment: Reactive ML
book, Chapter 1

More Related Content

What's hot

Stack and its Applications : Data Structures ADT
Stack and its Applications : Data Structures ADTStack and its Applications : Data Structures ADT
Stack and its Applications : Data Structures ADT
Soumen Santra
 
Constructor and Destructors in C++
Constructor and Destructors in C++Constructor and Destructors in C++
Constructor and Destructors in C++
sandeep54552
 
Best,worst,average case .17581556 045
Best,worst,average case .17581556 045Best,worst,average case .17581556 045
Best,worst,average case .17581556 045
university of Gujrat, pakistan
 
9 subprograms
9 subprograms9 subprograms
9 subprograms
Munawar Ahmed
 
Queues
QueuesQueues
Standard Library Functions
Standard Library FunctionsStandard Library Functions
Standard Library Functions
Praveen M Jigajinni
 
Python Lab Manual
Python Lab ManualPython Lab Manual
Python Lab Manual
Bobby Murugesan
 
ER-Model-ER Diagram
ER-Model-ER DiagramER-Model-ER Diagram
ER-Model-ER Diagram
Saranya Natarajan
 
pointer-to-object-.pptx
pointer-to-object-.pptxpointer-to-object-.pptx
pointer-to-object-.pptx
ZaibunnisaMalik1
 
Breadth First Search & Depth First Search
Breadth First Search & Depth First SearchBreadth First Search & Depth First Search
Breadth First Search & Depth First Search
Kevin Jadiya
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in Python
Sujith Kumar
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
Hitesh Mohapatra
 
Stacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURESStacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURES
Sowmya Jyothi
 
Using the set operators
Using the set operatorsUsing the set operators
Using the set operators
Syed Zaid Irshad
 
Functions in C++
Functions in C++Functions in C++
Functions in C++
Mohammed Sikander
 
ArrayList in JAVA
ArrayList in JAVAArrayList in JAVA
ArrayList in JAVA
SAGARDAVE29
 
Stack Data Structure
Stack Data StructureStack Data Structure
Stack Data Structure
Afaq Mansoor Khan
 
Operator & Expression in c++
Operator & Expression in c++Operator & Expression in c++
Operator & Expression in c++bajiajugal
 
C# operators
C# operatorsC# operators

What's hot (20)

Stack and its Applications : Data Structures ADT
Stack and its Applications : Data Structures ADTStack and its Applications : Data Structures ADT
Stack and its Applications : Data Structures ADT
 
Constructor and Destructors in C++
Constructor and Destructors in C++Constructor and Destructors in C++
Constructor and Destructors in C++
 
Best,worst,average case .17581556 045
Best,worst,average case .17581556 045Best,worst,average case .17581556 045
Best,worst,average case .17581556 045
 
9 subprograms
9 subprograms9 subprograms
9 subprograms
 
Queues
QueuesQueues
Queues
 
Standard Library Functions
Standard Library FunctionsStandard Library Functions
Standard Library Functions
 
Python Lab Manual
Python Lab ManualPython Lab Manual
Python Lab Manual
 
ER-Model-ER Diagram
ER-Model-ER DiagramER-Model-ER Diagram
ER-Model-ER Diagram
 
pointer-to-object-.pptx
pointer-to-object-.pptxpointer-to-object-.pptx
pointer-to-object-.pptx
 
Breadth First Search & Depth First Search
Breadth First Search & Depth First SearchBreadth First Search & Depth First Search
Breadth First Search & Depth First Search
 
Cursors
CursorsCursors
Cursors
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in Python
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 
Stacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURESStacks IN DATA STRUCTURES
Stacks IN DATA STRUCTURES
 
Using the set operators
Using the set operatorsUsing the set operators
Using the set operators
 
Functions in C++
Functions in C++Functions in C++
Functions in C++
 
ArrayList in JAVA
ArrayList in JAVAArrayList in JAVA
ArrayList in JAVA
 
Stack Data Structure
Stack Data StructureStack Data Structure
Stack Data Structure
 
Operator & Expression in c++
Operator & Expression in c++Operator & Expression in c++
Operator & Expression in c++
 
C# operators
C# operatorsC# operators
C# operators
 

Similar to Production-Ready Machine Learning for the Software Architect

Creating a Use Case
Creating a Use Case                                               Creating a Use Case
Creating a Use Case
CruzIbarra161
 
Different Methodologies For Testing Web Application Testing
Different Methodologies For Testing Web Application TestingDifferent Methodologies For Testing Web Application Testing
Different Methodologies For Testing Web Application Testing
Rachel Davis
 
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
Agile Testing Alliance
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
Johnson Ubah
 
Software_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptxSoftware_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptx
ArifaMehreen1
 
Andrew NG machine learning
Andrew NG machine learningAndrew NG machine learning
Andrew NG machine learning
ShareDocView.com
 
Sdlc
SdlcSdlc
Sdlc
SdlcSdlc
Major_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptxMajor_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptx
LokeshKumarReddy8
 
502 Object Oriented Analysis and Design.pdf
502 Object Oriented Analysis and Design.pdf502 Object Oriented Analysis and Design.pdf
502 Object Oriented Analysis and Design.pdf
PradeepPandey506579
 
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
DataScienceConferenc1
 
1 (1)
1 (1)1 (1)
Slides from "Taking an Holistic Approach to Product Quality"
Slides from "Taking an Holistic Approach to Product Quality"Slides from "Taking an Holistic Approach to Product Quality"
Slides from "Taking an Holistic Approach to Product Quality"
Peter Marshall
 
IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015
IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015
IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015
IBM France Lab
 
Bt0081 software engineering
Bt0081 software engineeringBt0081 software engineering
Bt0081 software engineering
Techglyphs
 
WELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptxWELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptx
9D38SHIDHANTMITTAL
 
Machine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEMachine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEbutest
 
Building an Information System
Building an Information SystemBuilding an Information System
Building an Information System
Jo Balucanag - Bitonio
 
Teacher training material
Teacher training materialTeacher training material
Teacher training material
Vikram Parmar
 

Similar to Production-Ready Machine Learning for the Software Architect (20)

Creating a Use Case
Creating a Use Case                                               Creating a Use Case
Creating a Use Case
 
Different Methodologies For Testing Web Application Testing
Different Methodologies For Testing Web Application TestingDifferent Methodologies For Testing Web Application Testing
Different Methodologies For Testing Web Application Testing
 
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Software_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptxSoftware_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptx
 
Andrew NG machine learning
Andrew NG machine learningAndrew NG machine learning
Andrew NG machine learning
 
Sdlc
SdlcSdlc
Sdlc
 
Sdlc
SdlcSdlc
Sdlc
 
Major_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptxMajor_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptx
 
502 Object Oriented Analysis and Design.pdf
502 Object Oriented Analysis and Design.pdf502 Object Oriented Analysis and Design.pdf
502 Object Oriented Analysis and Design.pdf
 
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
 
1 (1)
1 (1)1 (1)
1 (1)
 
Slides from "Taking an Holistic Approach to Product Quality"
Slides from "Taking an Holistic Approach to Product Quality"Slides from "Taking an Holistic Approach to Product Quality"
Slides from "Taking an Holistic Approach to Product Quality"
 
IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015
IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015
IBM Paris Bluemix Meetup #12 - Ecole 42 - 9 décembre 2015
 
Bt0081 software engineering
Bt0081 software engineeringBt0081 software engineering
Bt0081 software engineering
 
WELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptxWELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptx
 
Machine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AEMachine Learning for automated diagnosis of distributed ...AE
Machine Learning for automated diagnosis of distributed ...AE
 
Building an Information System
Building an Information SystemBuilding an Information System
Building an Information System
 
Teacher training material
Teacher training materialTeacher training material
Teacher training material
 
FINAL_40058464
FINAL_40058464FINAL_40058464
FINAL_40058464
 

More from Pooyan Jamshidi

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Pooyan Jamshidi
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
Pooyan Jamshidi
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Pooyan Jamshidi
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Pooyan Jamshidi
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning Systems
Pooyan Jamshidi
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Pooyan Jamshidi
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
Pooyan Jamshidi
 
Learning to Sample
Learning to SampleLearning to Sample
Learning to Sample
Pooyan Jamshidi
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
Pooyan Jamshidi
 
Transfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwareTransfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable Software
Pooyan Jamshidi
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based Software
Pooyan Jamshidi
 
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory AnalysisTransfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
Pooyan Jamshidi
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
Pooyan Jamshidi
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
Pooyan Jamshidi
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic Software
Pooyan Jamshidi
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Pooyan Jamshidi
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic Systems
Pooyan Jamshidi
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
Pooyan Jamshidi
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
Pooyan Jamshidi
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization Tool
Pooyan Jamshidi
 

More from Pooyan Jamshidi (20)

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning Systems
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
Learning to Sample
Learning to SampleLearning to Sample
Learning to Sample
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Transfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwareTransfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable Software
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based Software
 
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory AnalysisTransfer Learning for Software Performance Analysis: An Exploratory Analysis
Transfer Learning for Software Performance Analysis: An Exploratory Analysis
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic Software
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic Systems
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization Tool
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Production-Ready Machine Learning for the Software Architect

  • 1. Production-Ready Machine Learning for the Software Architect Pooyan Jamshidi CMU Guest lecture Architectures for Software Systems, April 2018
  • 2. Learning Goals Understand how to build a system that can put the power of machine learning to use. Understand how to incorporate ML-based components into a larger system. Understand the principles that govern these systems, both as software and as predictive systems.
  • 3. Learning Goals Understand challenges of building ML systems. Understand strategies of making ML systems reactive. I won’t teach you specific techniques of machine learning, it simply does not fit in one lecture course.
  • 4. The topic of this lecture belongs to here! I also hang around here Software Arch. Machine Learning Distributed Systems
  • 5. ML as a technique vs ML as system in production
  • 6. After few years, PyTorch just became production ready
  • 7. Disclaimer I borrowed materials from: Jeff Smith’s book -> Uber blog Spark Documents MIT course Intro to Deep Learning TF document
  • 8. Who am I? Postdoc at CMU, Fall’18: Prof. At USC My research: SE+Systems+ML Software dev/eng for 7 years 3+ yrs as architect More info: https://pooyanjamshidi.github.io
  • 9. Please interrupt and ask your questions at anytime!
  • 10. What is the grand difference? Software systems are built from sets of software components ML(-based) systems are build from sets of software components capable of learning from data and making predictions about the future
  • 11. Case study A startup that tries to build a machine learning system from the ground up and finds it very, very hard!
  • 12. Sniffable Sniffable is the Instagram for dogs. Dog owners post pictures of their dogs, and other dog owners (sniffers!) like, share, and comment on those pictures.
  • 13. A New Feature for Sniffable Sniffers want to promote their specific dog Sniffers want their dogs to become a celebrity! So, they want a new tool to make their pupdates, more viral.
  • 14. Pooch Predictor A new feature to figure out which picture would get the biggest response on Sniffable A new feature to predict the number of likes a given pupdate might get, based on the hashtags used.
  • 15. Why did the startup come up with Pooch Predictor? It would engage the dog owners, help them create viral content, and grow the Sniffable network as a whole
  • 16. Pooch Predictor 1.0 Architecture What kind of architectural style is this?
  • 17. What happened at the organizational level Data Scientist Software Engineer Prototype
  • 18. Something seems wrong! Issue: Predictions weren’t changing much over a period of time despite adding more data to the system
  • 19. Misunderstanding between stakeholders What is the retraining What?? Data Scientist Software Engineer
  • 21. System issues appeared Pooch Predictor wasn’t very reliable Change in the database -> the queries would fail High load on the server -> the modeling job would fail More and more as the size of the social network increased
  • 22. Failures were hard to detect Failures were hard to detect without building up more sophisticated monitoring infrastructure. There wasn’t much that could be done other than restarting the job.
  • 23. Modeling issues Architect realized that some of the features weren’t being correctly extracted from the raw data. It was also really hard to understand how a change to the features would impact modeling performance.
  • 24. A major issue happened For a couple of weeks, their interaction rates steadily trended down. For users outside of the US, Pooch Predictor would always predict a negative number of likes!! It was a problem with the modeling system’s location-based features. The data scientist and engineer paired to fix the issue 
  • 25. Things went from bad to worse! Appeared when data scientist implementing more feature-extraction functionality to hopefully improve modeling quality. The app slowed down dramatically. New functionality caused exceptions to be thrown on the server, which led to pupdates being dropped. 
  • 26. What was the reason? Sending the data from the app back to server took way too long to maintain reasonable responsiveness. The server would throw an exception any time the prediction functionality saw any of the new features.
  • 27. So what they did?? After understanding where things had gone wrong, the team quickly rolled back all of the new functionality and restored the app to a normal operational state. They held a retrospective to build a better system…
  • 28. Building a better system (operation) Sniffle must remain responsive, regardless of any problems with the predictive system. The predictive component needs to be less tightly coupled to the rest of the systems. The predictive system needs to behave predictably regardless of high load or errors in the system itself.
  • 29. Building a better system (development) It should be easier for different developers to make changes to the predictive system. The code needs to use different programming idioms that ensure better performance.
  • 30. Building a better system (ML) The predictive system must measure its modeling performance better. The predictive system should support evolution and change. The predictive system should support online experimentation. Easy for developers to supervise the predictive system and rapidly correct any rogue behavior.

  • 31. Sniffable team missed something big, right? They built what initially looked like a useful ML system that added value to their core product. Production issues with their ML system frequently pulled the team away from work on improvements to the capability of the system. Even though they had a bunch of smart people to develop model to predict the dynamics of dog-based social networking, their system repeatedly failed at its mission.
  • 32. Building ML systems is hard! The data scientist knew how to do ML. Pooch Predictor totally worked on his laptop. But the data scientist was not thinking of ML as a system - but as a technique! Pooch Predictor was a failure both as a predictive system and as a software.
  • 33. To build it right: Reactive ML Reactive ML = ML as a system
  • 35. Starting from the beginning A ML system must collect data from the outside world. In the Pooch Predictor example, the team was trying to skip this concern by using the data that their application already had. Obviously, this approach was quick, but it tightly couples the Sniffable application data model to the Pooch Predictor data model.
  • 36. Derived representations of the raw data = instances
  • 37. What is “feature”? Features are meaningful data points derived from raw data related to the entity being predicted on. A Sniffable example of a “feature” would be ?? the number of friends a given dog has. 
  • 38. What is the feature vector? The feature values for a given instance are collectively called a feature vector.
  • 39. What is a concept? A concept is the thing that the system is trying to predict. In the context of Pooch Predictor, a concept would be the number of likes a given post receives. When a concept is discrete (i.e. not continuous), it can be called a class label.
  • 40. Types of learning Supervised: given data, predict label Unsupervised: given data, learn about that data RL: given data, choose action to maximize expected long-term reward Lex Fridman: fridman@mit.edu January 2018 Course 6.S191: Intro to Deep Learning For the full updated list of references visit: https://selfdrivingcars.mit.edu/references Types of Deep Learning [81, 165] Supervised Learning Unsupervised Learning Semi-Supervised Learning Reinforcement Learning
  • 41. What is a “model”? A program that maps from features to predicted concepts. For the Pooch Predictor, a program that takes as input the feature representation of the hashtag data and returns the predicted number of likes that a given pupdate might receive.
  • 42. What is model publishing? Model publishing means making the model program available outside of the context it was learned in, so that it can make predictions on data it hasn’t seen before.
  • 43. making the model program available outside of the context it was learned in
  • 44. Remember any issue regarding model publishing in sniffable? Sniffable team largely skipped it in their original implementation. They didn’t even set up their system to retrain the model on a regular basis. Their next approach at implementing model retraining also ran into difficulty, causing their models to be out of sync with their feature extractors.
  • 45. ML systems are different from transactional systems Some of the pooch predictor system problems stemmed from treating their predictive system like a transaction business application. An approach that relies upon strong consistency guarantees just doesn’t work for modern distributed systems, Out of synch with the pervasive and intrinsic uncertainty in a machine learning system.
  • 46. ML apps are dynamic systems Other problems the Sniffable team experienced had to do with not thinking about their system in dynamic terms. ML systems must evolve, and they must support parallel tracks for that evolution through experimentation.
  • 49.
  • 50. ML Quality Attributes for Reactive ML Systems @jeffksmithjr @xdotaDevoxx Reactive Systems Responsive Resilient Elastic Message-Driven [Most of these figures are taken from Jeff Smith’s slides and book]
  • 52. Responsive If a system doesn’t respond to its users, then it’s useless. Think of the Sniffable team causing a massive slowdown in the Sniffable app due to the responsiveness of their ML system.
  • 54. Resilient Whether the cause is hardware, human error, or design flaws, software always breaks. Providing some sort of acceptable response even when things don’t go as planned is a key part of ensuring that users view a system as being responsive.
  • 56. Elastic Elastic systems should respond to increases or decreases in load. The Sniffable team saw this when their traffic ramped up and the Pooch Predictor system couldn’t keep up with the load.
  • 58. Message-driven A loosely coupled system organized around message passing can make it easier to detect failure or issues with load. Moreover, a design with this trait helps contain any of the effects of errors isolated, rather than flaming production issues that needed to be immediately addressed, as they were in Pooch Predictor.
  • 59. Wait, how do you build a system that actually has these qualities? Tactics
  • 60. Replication Data, whether at rest or in motion, should be redundantly stored or processed. In the Sniffable example, there was a time when the server that ran the model- learning job failed, and no model was learned. two or more model-learning jobs
  • 61. An Example: Spark Architecture Rather than requiring you to always have two pipelines executing, Spark gives you automatic, fine-grained replication so that the system can recover from failure.
  • 62. Containment Prevent the failure of any single component of the system from affecting any other component. Docker or rkt? no, this strategy isn’t about any one implementation. Containment can be implemented using many different systems. The point is to prevent the sort of cascading failure we saw in Pooch Predictor, and to do so at a structural level.
  • 63. A Contained Model- Serving Architecture Remember the model and the features were out of sync, resulting in exceptions during model serving??
  • 64. Supervision Identify the components that could fail and some other component is responsible for their lifecycles.
  • 65. A Supervisory Architecture The published models are observed by the model supervisor. Should their behavior ever deviate from acceptable bounds, the supervisor would stop sending them messages requesting predictions. 
  • 67. Supervisory architecture allows self-healing The model supervisor could even completely destroy a model it knows to be bad, allowing the system to be self-healing.
  • 68. Why Turning ML systems to reactive Ultimately, a reactive ML system gives you the ability to deliver value through ever better predictions.
  • 70. Infinite Data Ideally, with its new ML capabilities, Sniffable is going to take off and see tons of traffic. Some posts would have big dogs, others small ones. Some posts would use filters, and others would be more natural. Some would be rich in hashtags, and some wouldn’t have any annotations.
  • 71. Strategy 1: Laziness Delay of execution, to separate the composition of functions to execute from their actual execution. Conceive of the data flow in terms of infinite streams instead of finite batches.
  • 72. Lazy Evaluation in Spark The execution will not start until an action is triggered
  • 73. Strategy 2: Pure functions Evaluating the function must not result in some sort of side effect, such as changing the state of some variable or performing some I/O. Functions can be passed to other functions as arguments -> Higher Order Functions
  • 74. Uncertainty Even before making a prediction, a ML system must deal with the uncertainty of the real world outside of the ML system. For example, do sniffers using the hashtag #adorabull mean the same thing as sniffers using the hashtag #adorable, or should those be viewed as different features?
  • 75. Strategy 3: Immutable facts Data that can simply be written once and never modified The use of immutable facts will allow us to reason about uncertain views of the world at specific points in time. Having a complete record of all facts that occurred over the lifetime of the system will also enable important machine learning, like model experimentation and automatic model validation.
  • 77. Until now things were fictional, lets see how things are done in real-world At Uber
  • 78. Production-ready ML at Scale (Uber) https://eng.uber.com/michelangelo/
  • 80. An example of model evaluation in practice (feature importance)
  • 84. Summary ML offers a powerful toolkit for building useful prediction systems. Building an ML model is only part of a larger system development process. We went through practices in distributed systems and software engineering in a pragmatic approach to building real-world ML systems.
  • 85. Summary ML can be viewed as an application, and not just as a technique. We learned how to incorporate ML-based components into a larger complex system. Reactive is a way to build sophisticated systems, and some ML systems don’t need to be sophisticated.