Machine Learning
Integrating Capabilities into your Team
WELCOME
Cameron Vetter
I have 20 years of experience using Microsoft tools and technologies to develop
software. I have experience in many roles including Development, Architecture,
Infrastructure, Management, and Leadership roles. I've worked for some of the largest
companies in the world and for small local companies getting a breadth of experience
in different Corporate Cultures. Currently, I work at SafeNet Consulting, where I get to
do what I love... Architect, Design, and Develop great software! I currently focus on
Microservices, SOA, Azure, Cognitive Toolkit, and Kubernetes.
Principal Cloud Consultant
A Partner to Advise and Support
About SafeNet
Consulting
SafeNet specializes in being partners in your
success. We currently focus on Custom Application
Development, Cloud Consulting Services,
Data & Analytics, and User Experience Strategy.
Introduction
Machine Learning Definition
Tooling and Standards
Integration and Deployment
Versioning
Bias and Ethics
My Recommendations
Question & Answer
Agenda
Machine Learning Definition
Wikipedia’s
Definition
Machine learning (ML) is the scientific study of algorithms and
statistical models that computer systems use to effectively perform
a specific task without using explicit instructions, relying on
patterns and inference instead. Machine learning algorithms build a
mathematical model of sample data, known as "training data", in
order to make predictions or decisions without being explicitly
programmed to perform the task
Patterns and Inference
Relies on Patterns and Inference not explicit
instructions or Algorithms
Model Based
Training produces a model, and that model is the core of
any Machine Learning system
Not Just Glorified Statistics
We borrow many terms from statistics, such as
biases, weights, models, and regressions, but that
does not make this a segment of statistics. ML is a
segment of computer algorithms
Training Data
Uses some form of training data to learn from to
make its decisions
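A minimal sketch of "patterns and inference, not explicit instructions", assuming a toy problem chosen only for illustration: the Celsius-to-Fahrenheit formula is never written into the code. A straight line is fitted to example pairs with ordinary least squares, and the two learned numbers are the model. The function names and sample data are made up for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: example (Celsius, Fahrenheit) pairs.
celsius = [-40.0, 0.0, 10.0, 25.0, 37.0]
fahrenheit = [-40.0, 32.0, 50.0, 77.0, 98.6]

# "Training" produces the model: here, just two learned numbers.
slope, intercept = fit_line(celsius, fahrenheit)


def predict(c):
    """Use the trained model on an input it has not seen."""
    return slope * c + intercept
```

Calling predict(100.0) uses only what was learned from the five examples; nothing in the code states the conversion rule.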
Tooling and Standards
Prebuilt Services
/ Machine Learning Based Web Services /
• Vision
• Speech
• Language
• Knowledge
• Search
24 Different Services available via REST APIs and SDKs
No Coding Services
/ Custom Machine Learning Services without the Code /
• Targets Data Scientists and Data
Engineers
• Poorly suited to Developers
• Creates REST Services
High Level Libraries
/ Neural Network Libraries /
• Targets Developers
• Most are Python Based
• Abstracts away complexity of ML
Low Level Libraries
/ Neural Network Libraries /
• Targets Machine Learning Devs
• Most are Python Based
• Tools provided to allow easy
implementation of algorithms
Neural Network Exchange Format (NNEF)
Open Neural Network Exchange Format (ONNX)
Standard
/ Not Much Competition /
Integration and Deployment
REST Services
Isolate your Machine Learning code from
other code creating appropriate
boundaries and allowing it to be
independently released.
Encapsulated
Consumers should not have to
understand the machine learning,
present them a simple RESTful interface
like any other service designed for
consumption.
Simple
Unless you can’t isolate the code. When
embedding in an IoT Edge device, your
Machine Learning cannot be isolated,
but it can still have logical boundaries,
such as Docker Containers.
Integrated
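A minimal sketch of the encapsulation idea above, using only the Python standard library. The `predict` function is a hypothetical stand-in for a real model, and the `/predict` path, port, and JSON shape are assumptions made for this example. Consumers see plain JSON in and JSON out, with no machine learning concepts exposed.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a real model: a hypothetical weighted sum."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def handle_predict(body: bytes):
    """Turn a JSON request body into (HTTP status, JSON-serializable reply)."""
    try:
        features = json.loads(body)["features"]
        score = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError):
        return 400, {"error": 'expected JSON like {"features": [1, 2, 3]}'}
    return 200, {"score": score}


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP layer; all model logic stays behind handle_predict."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_predict(self.rfile.read(length))
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080):
    """Run the service until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping `handle_predict` separate from the HTTP handler means the same boundary can be reused when the code has to be embedded on a device instead of hosted as a service.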
N-Tier
Add this service into your business logic tier. Can
also fit well into this architecture.
Others
Onion, Event Driven, CQRS, ESB, Spaghetti, etc…
Services / MicroServices
Fits well with my recommendation to wrap this in a
RESTful service; it is very easy to turn that into a
Service
Architecture
IAAS / PAAS
Platform As A Service is a great choice, but
Infrastructure As A Service can work well also.
Containers
Fits well in a container, consider keeping your model
in one container and your service in a separate
container.
Edge Devices
Embed your Machine Learning into IoT devices,
laptops, web pages, or any other devices on the
Edge.
Containers
Training = N/A
Deployment = Azure Kubernetes Service
PAAS
Training = Azure Notebooks
Deployment = Azure Web Apps
IAAS
Training = Azure Data Science VMs
Deployment = Azure Data Science VMs
Examples
Azure tools that can be applied to different deployment scenarios
Versioning
Design for Replacement
Your team will want to iterate often, especially in the beginning. Plan your versioning strategy as though you are
swapping out your ML training daily. Don’t tightly couple your ML Service to your ML Training!
A/B
Testing
Assume you will need A/B Testing. ML Models
that perform perfectly during testing can fall apart in the
real world!
Which Performs Better?
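One way A/B routing between two model versions could look, as a sketch. The two models are hypothetical stand-ins, and hashing the user id into 100 buckets is an assumption made here so that the same user always lands on the same variant; it is not the only way to split traffic.

```python
import hashlib


def model_a(features):
    """Current production model (hypothetical stand-in)."""
    return sum(features)


def model_b(features):
    """Candidate model under test (hypothetical stand-in)."""
    return 2 * sum(features)


def assign_variant(user_id: str, b_percent: int = 10) -> str:
    """Deterministically place a user in variant 'A' or 'B'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "B" if bucket < b_percent else "A"


def predict(user_id: str, features, b_percent: int = 10):
    """Route the request and report which variant answered it."""
    variant = assign_variant(user_id, b_percent)
    model = model_b if variant == "B" else model_a
    return {"variant": variant, "prediction": model(features)}
```

Returning the variant with every prediction lets you log it next to the real-world outcome, which is what answers "which performs better?". Setting `b_percent` to 0 or 100 doubles as an instant roll back or roll forward.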
What Do I need to Version?
Code: Standard Source Control procedures and labeling.
Data: Cloud Storage. Each Data Set Version should be
independently accessible.
Model: Serialize your model and check it into its own
Source Control Repo. Label Appropriately!
Training: Serialize your training and scalars and check them
into Source Control with the Model.
Interface
Compatibility
• Standardize on Model Inputs
and Outputs.
• Consider Model Interface
changes to be a breaking
change.
• Breaking changes will require
versioning the service.
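A sketch of what a standardized model interface might look like. The field names and the `INTERFACE_VERSION` constant are assumptions for illustration. The point is that any model can be swapped in behind the same request and response shapes, and changing those shapes is the event that bumps the service version.

```python
from dataclasses import asdict, dataclass

# Bump this only when the request or response shape changes (a breaking change).
INTERFACE_VERSION = "v1"


@dataclass(frozen=True)
class PredictionRequest:
    features: list


@dataclass(frozen=True)
class PredictionResponse:
    interface_version: str
    model_version: str
    score: float


def serve(request: PredictionRequest, model, model_version: str) -> dict:
    """Run any callable model behind the fixed interface."""
    score = float(model(request.features))
    return asdict(PredictionResponse(INTERFACE_VERSION, model_version, score))
```

Here `serve(PredictionRequest([1, 2]), sum, "model-1")` and `serve(PredictionRequest([1, 2]), max, "model-2")` return dictionaries with identical keys, so replacing the model is not a breaking change for consumers.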
Bias and Other Risks
Bias is an error from erroneous assumptions
in the learning algorithm. High bias can cause
an algorithm to miss the relevant relations
between features and target outputs. This is
referred to as underfitting.
Bias
Source: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
Variance is an error from sensitivity to
small fluctuations in the training set. High
variance can cause an algorithm to
model the random noise in the training
data, rather than the intended outputs.
This is referred to as overfitting.
Variance
Source: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
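A toy illustration of underfitting and overfitting, using hand-made data (a straight line plus alternating noise) that is an assumption of this sketch. A constant model ignores the input entirely (high bias), a nearest-neighbor "memorizer" reproduces every noisy training point (high variance), and a fitted line sits in between.

```python
def mse(model, xs, ys):
    """Mean squared error of a model over a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x plus alternating noise of +1 / -1.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]

# Held-out data: nearby inputs with the noise-free target.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

mean_y = sum(train_y) / len(train_y)
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def constant_model(x):
    """High bias: always predicts the training mean."""
    return mean_y


def memorizer(x):
    """High variance: returns the label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def linear_model(x):
    """Least-squares line fitted to the training data."""
    return slope * x + intercept


for name, model in [("constant", constant_model),
                    ("memorizer", memorizer),
                    ("linear", linear_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizer has zero training error but does worse than the line on the held-out points because it has learned the noise. The constant model is poor on both sets because it misses the relationship altogether.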
Real world data is messy. Expect to have too
little data, potentially causing Bias + Variance,
or missing data points in a data set. Data is
the hard part of machine learning.
Incomplete Data
My Recommendations
Don’t start by hiring a Data Scientist. There are
many tools that can help people with less expensive
skill sets. Developers, Data Analysts, and Data
Architects will be able to help you get started and
identify when you need a Data Scientist.
Data
Scientists
Start with off-the-shelf prebuilt
services. Select problems that fit
these solutions, and get some quick
wins.
Prebuilt ML Services
Use some Neural Networks available
on GitHub. Allow your team to get
used to the language / tool chain.
Prebuilt Neural Networks
Create a Proof of Concept within your
domain leveraging High Level
Libraries to build your own Neural
Network.
POC with High Level Libraries
Use these high level libraries to
create full production systems. Don’t
bother with the low level libraries,
unless a specific need forces you to.
Full Solutions
Start Small
/ Work smart and save money /
Source: https://bit.ly/netflixjupyter/
Prebuilt
Strategy
I recommend using the Façade Design Pattern to
ensure that you are loosely coupled to the prebuilt
service you are consuming. Don’t get tightly tied into a
platform, especially when you are experimenting!
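A sketch of the Façade idea applied to a prebuilt vision service. `VendorAVision` and `VendorBVision` are invented stand-ins with deliberately different method names and result shapes; they do not represent any real SDK. The application depends only on `ImageTagger`, so switching vendors means swapping one backend function.

```python
from typing import Callable, List


class VendorAVision:
    """Hypothetical stand-in for one vendor's SDK (not a real API)."""

    def analyze(self, image_bytes: bytes) -> dict:
        return {"tags": [{"name": "cat", "confidence": 0.93},
                         {"name": "sofa", "confidence": 0.40}]}


class VendorBVision:
    """Hypothetical stand-in for a second vendor (not a real API)."""

    def detect_labels(self, data: bytes) -> list:
        return [("cat", 93.0), ("sofa", 40.0)]


class ImageTagger:
    """Facade: the only vision interface the rest of the app sees."""

    def __init__(self, backend: Callable[[bytes], List[str]]):
        self._backend = backend

    def tags(self, image_bytes: bytes) -> List[str]:
        return self._backend(image_bytes)


def vendor_a_backend(image_bytes: bytes) -> List[str]:
    """Translate vendor A's result shape into plain tag names."""
    result = VendorAVision().analyze(image_bytes)
    return [t["name"] for t in result["tags"] if t["confidence"] >= 0.5]


def vendor_b_backend(image_bytes: bytes) -> List[str]:
    """Translate vendor B's result shape into plain tag names."""
    labels = VendorBVision().detect_labels(image_bytes)
    return [name for name, percent in labels if percent >= 50.0]
```

`ImageTagger(vendor_a_backend).tags(data)` and `ImageTagger(vendor_b_backend).tags(data)` are called the same way, which keeps experiments with different platforms cheap.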
• Pick a very popular library like Keras
• Integrate into a REST Service
• Plug the REST Service into your Current
Architecture and Current Infrastructure
High Level
Library
Strategy
ONNX is an open format to represent deep
learning models. With ONNX, AI
developers can more easily move models
between state-of-the-art tools and choose
the combination that is best for them.
ONNX is developed and supported by a
community of partners.
ONNX.AI
Be Flexible
• Don’t buy a ton of GPUs and Servers!
• Leverage the Cloud To Train
• Use Batch AI and Data Science VMs in
Azure
Training
• Reach consensus on your versioning
strategy up front.
• Use separate containers for your ML
Service and model
• Plan for A/B Testing
• Have the ability to instantly roll forward or
backward
Up Front
Versioning
Strategy
Use a uniform versioning convention that
includes time and date.
For Example:
(datetime)-(model name)-
(model version)- (training script id).json
Versioning
Scheme
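A small helper that builds and parses file names in the scheme above, as a sketch. The compact UTC timestamp format and the rule that no field may contain a hyphen are assumptions added here so that names sort chronologically and split back cleanly; the slide itself does not specify them.

```python
from datetime import datetime, timezone


def artifact_name(model_name, model_version, training_script_id, when=None):
    """Build '(datetime)-(model name)-(model version)-(training script id).json'."""
    for part in (model_name, model_version, training_script_id):
        if "-" in part:
            raise ValueError(f"hyphen not allowed in field: {part!r}")
    # Assumes 'when' is a UTC datetime; defaults to the current UTC time.
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{model_name}-{model_version}-{training_script_id}.json"


def parse_artifact_name(name):
    """Split a name produced by artifact_name back into its fields."""
    if not name.endswith(".json"):
        raise ValueError("expected a .json artifact name")
    parts = name[: -len(".json")].split("-")
    if len(parts) != 4:
        raise ValueError("expected four hyphen-separated fields")
    keys = ("datetime", "model_name", "model_version", "training_script_id")
    return dict(zip(keys, parts))
```

For example, a model named `churn` at version `v3` trained by script `train42` on 1 May 2019 at 13:30:00 UTC becomes `20190501T133000Z-churn-v3-train42.json`.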
• Tough problem to solve.
• Your data is much closer to code.
• Check out off the shelf tools: Data Version
Control, Pachyderm, CookieCutter Data
Science, Luigi
Data
Version
Control
Source: https://shuaiw.github.io/2017/07/30/versioning-data-science.html
www.cameronvetter.com
Any Questions?
@poshporcupine Linkedin.com/in/cameronvetter
