SlideShare a Scribd company logo
Models in Minutes not Months:
Data Science as Microservices
Sarah Aerni, PhD
saerni@salesforce.com
@itweetsarah
​Einstein Platform
LIVE DEMO
Agenda
​BUILDING AI APPS: Perspective Of A Data Scientist
• Journey to building your first model
• Barriers to production along the way
​DEPLOYING MODELS IN PRODUCTION: Built For Reuse
• Where engineering and applications meet AI
• DevOps in Data Science – monitoring, alerting and iterating
​AUTO MACHINE LEARNING: Machine Learning Pipelines as a Collection of Microservices
• Create reusable ML pipeline code for multiple applications customers
• Data Scientists focus on exploration, validation and adding new apps and models
ENABLING DATA SCIENCE
​A DATA SCIENTISTS VIEW OF BUILDING MODELS
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
A data scientist’s view of
the journey to building
models
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
A data scientist’s view of
the journey to building
models
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
#Leads
Created Date
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Engineer Features
Empty fields
One-hot encoding (pivoting)
Email domain of a user
Business titles of a user
Historical spend
Email-Company Name Similarity
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy​>>> from sklearn import svm
>>> from numpy import loadtxt as l, random as r
>>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",")
>>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False)
>>> X, y = pls[-testSet,:-1], pls[-testSet:,-1]
>>> clf = svm.SVC()
>>> clf.fit(X,y)
SVC(C=1.0, cache_size=200, class_weight=None,
coef0=0.0,decision_function_shape=None, degree=3,
gamma='auto', kernel='rbf', max_iter=-1,
tol=0.001, verbose=False)
>>> clf.score(pls[testSet,:-1],pls[testSet,-1])
0.88571428571428568
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Geographies
#Leads
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Fresh Data Input
Delivery of
Predictions
Bringing a Model to Production Requires a Team
​Applications deliver predictions for
customer consumption
​Predictions are produced by the
models live in production
​Pipelines deliver the data for modeling
and scoring at an appropriate latency
​Monitoring systems allow us to check
the health of the models, data,
pipelines and app
Source: Salesforce Customer Relationship Survey conducted 2014-2016 among 10,500+ customers randomly selected. Response sizes per question vary.
Data Engineers
Provide data access and management
capabilities for data scientists
Set up and monitor data pipelines
Improve performance of data
processing pipelines
Front-End Developers
Build customer-facing UI
Application instrumentation and
logging
​Data Scientists
​Continue evaluating models
​Monitor for anomalies and degradation
​Iteratively improve models in production
Bringing a Model to Production Requires a Team
Platform Engineers
Machine resource management
Alerting and monitoring
Product Managers
Gather requirements & feedback
Provide business context
Supporting a Model in Production is Complex
Only a small fraction of real-world ML systems is a composed of ML code, as
shown by the small black box in the middle. The required surrounding
infrastructure is fast and complex.
D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing
Systems (NIPS). 2015
MODELS IN PRODUDCTION
​WHAT IT TAKES TO DEPLOY AN AI-POWERED
APPLICATION
Supporting Models in Production is Mostly NOT AI
Only a small fraction of real-world ML systems
is a composed of ML code, as shown by the
small black box in the middle. The required
surrounding infrastructure is fast and
complex.
Adapted from D. Sculley, et al. Hidden technical debt in machine
learning systems. In Neural Information Processing Systems
(NIPS). 2015
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Data
Sources
…
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Data
Sources
…
Web UICLI
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Admin / Authentication Service
Data
Sources
…
Web UICLI
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor ServiceDatastore Service
Admin / Authentication Service
Data
Sources
…
Web UICLI
Why Data Services are Critical
…
DataConnector
Data Hub
Object Store in
Blob Storage
Catalog
AccessControl
Data Registry
Applications
App
Data
Prov
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Web UICLI
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Preparator
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Preparator
Data
Pusher
CI Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Monitoring your AI’s health like any other app
​Pipelines, Model Performance, Scores – Invest your time where it is needed!
Sample Dashboard on Simulated Data
​Model Performance at Evaluation​Distribution of Scores at Evaluation
​105,874
Scores Written Per
Hour(1 day moving
avg)
​0.86
Evaluation auROC
​Total Number of Scores Written Per Hour
​150,000
​
100,000
​
50,000
​
0
​Total Number of Scores Written Per Week
​10,000,000
​
100,000
​
100
​
0
​PercentofTotalPredictionsin
Bucket
​Prediction Probability Bucket
​TotalPredictionCount
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Provisioning Service
Web UICLI Exploration Tool
Why Data Services are Critical
…
DataConnector
Data Hub
Object Store in
Blob Storage
Catalog
AccessControl
Data Registry
Applications
App
Data
Prov
Applications
App 1
App 2
App 3
Prov
Prov
Prov
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
Scheduler
Model
Management
Service
Control System
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Provisioning Service
Web UICLI Exploration Tool
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
Scheduler
Model
Management
Service
Control System
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Provisioning Service
Web UICLI Exploration ToolMicroservice architecture
Customizable model-evaluation &
monitoring dashboards
Scheduling and workflow
management
In-platform secured
experimentation and exploration
Data Scientists focus their
efforts on modeling and
evaluating results
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Why Stop at Microservices for Supporting Your ML Code?
Why stop here?
Your ML code can also be just a
collection of microservices!
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
Auto Machine Learning
​Building reusable ML code
Leveraging Platform Services to Easily Deploy 1000s of Apps
Data Scientists on App #1
Leveraging Platform Services to Easily Deploy 1000s of Apps
Data Scientists on App #2Data Scientists on App #1
Let’s Add a Third App
Data Scientists on App #2 Data Scientists on App #3Data Scientists on App #1
How This Process Would Look in Salesforce
150,000 customers
​LeadIQ App ​Activity App ​Predictive
Journeys App
​App #5 ​App #6 ​App #7 ​App #8
​Opportunity
App
​App #9 ​App #10 ​App #11 ​App #12
​LeadIQ App ​Activity App ​Predictive
Journeys App
​App #5 ​App #6 ​App #7 ​App #8
​Opportunity
App
​App #9 ​App #10 ​App #11 ​App #12
​LeadIQ App ​Activity App ​Predictive
Journeys App
​App #5 ​App #6 ​App #7 ​App #8
​Opportunity
App
​App #9 ​App #10 ​App #11 ​App #12
Einstein’sNewApproachtoAI
Democratizing AI for Everyone
Classical
Approach
Data
Sampling
Feature
Selection
Model
Selection
Score
Calibration
Integrate to
Application
Artificial
Intelligence
Einstein
Auto-ML
Data already prepped
Models automatically built
Predictions delivered in context
AI for CRM
Discover
Predict
Recommend
Automate
Numerical BucketsCategorical Variables Text Fields
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
Categorical Variables
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
1 0 0
1 0 0
0 0 1
0 0 1
0 1 0
0 0 1
0 0 0
0 1 0
Text Fields
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
Word Count
Word Count (no
stop words)
Is English Sentiment
4 2 1 1
6 3 1 1
9 4 0 0
6 4 0 -1
7 3 1 0
5 1 1 0
7 3 1 0
Numerical Buckets
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
What Now? How autoML can choose your model
​>>> from sklearn import svm
>>> from numpy import loadtxt as l, random as r
>>> clf = svm.SVC()
>>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",")
>>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False)
>>> X, y = pls[-testSet,:-1], pls[-testSet:,-1]
>>> clf.fit(X,y)
SVC(C=1.0, cache_size=200, class_weight=None,
coef0=0.0,decision_function_shape=None, degree=3,
gamma='auto', kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
>>> clf.score(pls[testSet,:-1],pls[testSet,-1])
0.88571428571428568
Should we try other model forms?Should we try other model forms?
Features?
Should we try other model forms?
Features?
Kernels or hyperparameters?
Each use case will have its own
model and features to use. We
enable building separate models
and features with 1 code base
using OP
Model 1
83% accuracy
Model 3
73% accuracy
Model 4
89% accuracy
…
Model 1 Model 2
Model 3 Model 4
…
MODEL GENERATION MODEL TESTINGCUSTOMER A
Model 2
91% accuracy
Customer ID
Age
Age Group
Gender
Valid Address
1 2 3 4 5
22 30 45 23 60
A A B A C
M F F M M
Y N N Y Y
A tournament of models!
A tournament of models!
…
Model 1 Model 2
Model 3 Model 4
…
CUSTOMER B
Customer ID
Age
Age Group
Gender
Valid Address
1 2 3 4 5
34 22 66 58 41
B A C C B
M M F M F
Y Y N N Y
Model 1
Model 3
Model 4
Customer B
Model 2
Customer A
MODEL GENERATION MODEL TESTINGCUSTOMER A
Customer ID
Age
Age Group
Gender
Valid Address
1 2 3 4 5
22 30 45 23 60
A A B A C
M F F M M
Y N N Y Y
Deploy Monitors, Monitor, Repeat!
Sample Dashboard on Simulated Data
​134
Models in
Production
​215
Models Trained
(curr.month)
​98.51%
Models with Above
Chance Performance
​35,573,664
Predictions Written
Per Day (7 day avg)
​8
Experiments Run this
Week
Deploy Monitors, Monitor, Repeat!
​Pipelines, Model Performance, Scores – Invest your time where it is needed!
Sample Dashboard on Simulated Data
​Model Performance at Evaluation​Distribution of Scores at Evaluation
​105,874
Scores Written Per
Hour(1 day moving
avg)
​0.86
Evaluation auROC
​Total Number of Scores Written Per Hour
​150,000
​
100,000
​
50,000
​
0
​Total Number of Scores Written Per Week
​10,000,000
​
100,000
​
100
​
0
​PercentofTotalPredictionsin
Bucket
​Prediction Probability Bucket
​TotalPredictionCount
Deploy Monitors, Monitor, Repeat!
Sample Dashboard on Simulated Data
​134
Models in
Production
​215
Models Trained
(curr.month)
​98.51%
Models with Above
Chance Performance
​216
Models Trained
(curr.month)
​99.25%
Models with Above
Chance Performance
​35,573,664
Predictions Written
Per Day (7 day avg)
​12
Experiments Run this
Week
Key Takeaways
• Deploying machine learning in production is hard
• Platforms are critical for enabling data scientist productivity
• Plan for multiple apps… always
• To ensure enabling rapid identification of areas of improvement and efficacy of new approaches
provide
• Monitoring services
• Experimentation frameworks
• Identify opportunities for reusability in all aspects, even your machine learning pipelines
• Help simplify the process of experimenting, deploying, and iterating
Models in Minutes using AutoML

More Related Content

What's hot

Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
Stepan Pushkarev
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 

What's hot (20)

Automate your Machine Learning
Automate your Machine LearningAutomate your Machine Learning
Automate your Machine Learning
 
MLFlow as part of ML CI/CD at Avalara
MLFlow as part of ML CI/CD at AvalaraMLFlow as part of ML CI/CD at Avalara
MLFlow as part of ML CI/CD at Avalara
 
Facebook ML Infrastructure - 2018 slides
Facebook ML Infrastructure - 2018 slidesFacebook ML Infrastructure - 2018 slides
Facebook ML Infrastructure - 2018 slides
 
AzureML TechTalk
AzureML TechTalkAzureML TechTalk
AzureML TechTalk
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle ManagementMLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
 
Machine Learning Platform Life-Cycle Management
Machine Learning Platform Life-Cycle ManagementMachine Learning Platform Life-Cycle Management
Machine Learning Platform Life-Cycle Management
 
Rest microservice ml_deployment_ntalagala_ai_conf_2019
Rest microservice ml_deployment_ntalagala_ai_conf_2019Rest microservice ml_deployment_ntalagala_ai_conf_2019
Rest microservice ml_deployment_ntalagala_ai_conf_2019
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)
 
NextGenML
NextGenML NextGenML
NextGenML
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
Seamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflowSeamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflow
 
A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning
 
Managing your ML lifecycle with Azure Databricks and Azure ML
Managing your ML lifecycle with Azure Databricks and Azure MLManaging your ML lifecycle with Azure Databricks and Azure ML
Managing your ML lifecycle with Azure Databricks and Azure ML
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
TejasveeBolisetty
TejasveeBolisettyTejasveeBolisetty
TejasveeBolisetty
 

Similar to Models in Minutes using AutoML

Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
DataWorks Summit
 
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Le_GFII
 

Similar to Models in Minutes using AutoML (20)

DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Microsoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenMicrosoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDriven
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Key Imperatives for the CIO in Digital Age By Lalatendu Das Digital VP, Assoc...
Key Imperatives for the CIO in Digital Age By Lalatendu Das Digital VP, Assoc...Key Imperatives for the CIO in Digital Age By Lalatendu Das Digital VP, Assoc...
Key Imperatives for the CIO in Digital Age By Lalatendu Das Digital VP, Assoc...
 
Introducing MLOps.pdf
Introducing MLOps.pdfIntroducing MLOps.pdf
Introducing MLOps.pdf
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Machine learning
Machine learningMachine learning
Machine learning
 
Future.ready().watson dataplatform 01
Future.ready().watson dataplatform 01Future.ready().watson dataplatform 01
Future.ready().watson dataplatform 01
 
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014Machine Learning on dirty data - Dataiku - Forum du GFII 2014
Machine Learning on dirty data - Dataiku - Forum du GFII 2014
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
[DSC Adria 23] Antoni Ivanov Practical Kimball Data Patterns.pptx
[DSC Adria 23] Antoni Ivanov Practical Kimball Data Patterns.pptx[DSC Adria 23] Antoni Ivanov Practical Kimball Data Patterns.pptx
[DSC Adria 23] Antoni Ivanov Practical Kimball Data Patterns.pptx
 
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future VisionMLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital Markets
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at Starbucks
 
Introduction to Machine learning and Deep Learning
Introduction to Machine learning and Deep LearningIntroduction to Machine learning and Deep Learning
Introduction to Machine learning and Deep Learning
 
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeGDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
Building a mature foundation for life in the cloud
Building a mature foundation for life in the cloudBuilding a mature foundation for life in the cloud
Building a mature foundation for life in the cloud
 
Microsoft Azure BI Solutions in the Cloud
Microsoft Azure BI Solutions in the CloudMicrosoft Azure BI Solutions in the Cloud
Microsoft Azure BI Solutions in the Cloud
 

More from Bill Liu

More from Bill Liu (20)

Walk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectWalk Through a Real World ML Production Project
Walk Through a Real World ML Production Project
 
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...
 
Productizing Machine Learning at the Edge
Productizing Machine Learning at the EdgeProductizing Machine Learning at the Edge
Productizing Machine Learning at the Edge
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to Hero
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps Workflows
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
 
Practical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at ScalePractical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at Scale
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
 
Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
 
Weekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on MobileWeekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on Mobile
 
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningWeekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
 
AISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsAISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with Microeconomics
 
AISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldAISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First World
 
AISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeAISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the Edge
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 

Models in Minutes using AutoML

  • 1.
  • 2. Models in Minutes not Months: Data Science as Microservices Sarah Aerni, PhD saerni@salesforce.com @itweetsarah ​Einstein Platform
  • 4. Agenda ​BUILDING AI APPS: Perspective Of A Data Scientist • Journey to building your first model • Barriers to production along the way ​DEPLOYING MODELS IN PRODUCTION: Built For Reuse • Where engineering and applications meet AI • DevOps in Data Science – monitoring, alerting and iterating ​AUTO MACHINE LEARNING: Machine Learning Pipelines as a Collection of Microservices • Create reusable ML pipeline code for multiple applications customers • Data Scientists focus on exploration, validation and adding new apps and models
  • 5. ENABLING DATA SCIENCE ​A DATA SCIENTISTS VIEW OF BUILDING MODELS
  • 6. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy A data scientist’s view of the journey to building models
  • 7. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy A data scientist’s view of the journey to building models
  • 8. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy #Leads Created Date
  • 9. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Engineer Features Empty fields One-hot encoding (pivoting) Email domain of a user Business titles of a user Historical spend Email-Company Name Similarity
  • 10. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy​>>> from sklearn import svm >>> from numpy import loadtxt as l, random as r >>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",") >>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False) >>> X, y = pls[-testSet,:-1], pls[-testSet:,-1] >>> clf = svm.SVC() >>> clf.fit(X,y) SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,decision_function_shape=None, degree=3, gamma='auto', kernel='rbf', max_iter=-1, tol=0.001, verbose=False) >>> clf.score(pls[testSet,:-1],pls[testSet,-1]) 0.88571428571428568
  • 11. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Geographies #Leads
  • 12. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy
  • 13. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Fresh Data Input Delivery of Predictions
  • 14. Bringing a Model to Production Requires a Team ​Applications deliver predictions for customer consumption ​Predictions are produced by the models live in production ​Pipelines deliver the data for modeling and scoring at an appropriate latency ​Monitoring systems allow us to check the health of the models, data, pipelines and app Source: Salesforce Customer Relationship Survey conducted 2014-2016 among 10,500+ customers randomly selected. Response sizes per question vary.
  • 15. Data Engineers Provide data access and management capabilities for data scientists Set up and monitor data pipelines Improve performance of data processing pipelines Front-End Developers Build customer-facing UI Application instrumentation and logging ​Data Scientists ​Continue evaluating models ​Monitor for anomalies and degradation ​Iteratively improve models in production Bringing a Model to Production Requires a Team Platform Engineers Machine resource management Alerting and monitoring Product Managers Gather requirements & feedback Provide business context
  • 16. Supporting a Model in Production is Complex Only a small fraction of real-world ML systems is a composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is fast and complex. D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015
  • 17. MODELS IN PRODUDCTION ​WHAT IT TAKES TO DEPLOY AN AI-POWERED APPLICATION
  • 18. Supporting Models in Production is Mostly NOT AI Only a small fraction of real-world ML systems is a composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is fast and complex. Adapted from D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015 Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code
  • 19. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Data Sources …
  • 20. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Data Sources … Web UICLI
  • 21. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Admin / Authentication Service Data Sources … Web UICLI
  • 22. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor ServiceDatastore Service Admin / Authentication Service Data Sources … Web UICLI
  • 23. Why Data Services are Critical … DataConnector Data Hub Object Store in Blob Storage Catalog AccessControl Data Registry Applications App Data Prov
  • 24. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Web UICLI
  • 25. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Web UICLI Exploration Tool
  • 26. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 27. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Preparator Data Pusher Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 28. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Preparator Data Pusher CI Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 29. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 30. Monitoring your AI’s health like any other app ​Pipelines, Model Performance, Scores – Invest your time where it is needed! Sample Dashboard on Simulated Data ​Model Performance at Evaluation​Distribution of Scores at Evaluation ​105,874 Scores Written Per Hour(1 day moving avg) ​0.86 Evaluation auROC ​Total Number of Scores Written Per Hour ​150,000 ​ 100,000 ​ 50,000 ​ 0 ​Total Number of Scores Written Per Week ​10,000,000 ​ 100,000 ​ 100 ​ 0 ​PercentofTotalPredictionsin Bucket ​Prediction Probability Bucket ​TotalPredictionCount
  • 31. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UICLI Exploration Tool
  • 32. Why Data Services are Critical … DataConnector Data Hub Object Store in Blob Storage Catalog AccessControl Data Registry Applications App Data Prov Applications App 1 App 2 App 3 Prov Prov Prov
  • 33. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher Scheduler Model Management Service Control System CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UICLI Exploration Tool
  • 34. c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher Scheduler Model Management Service Control System CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UICLI Exploration ToolMicroservice architecture Customizable model-evaluation & monitoring dashboards Scheduling and workflow management In-platform secured experimentation and exploration Data Scientists focus their efforts on modeling and evaluating results How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location
  • 35. Why Stop at Microservices for Supporting Your ML Code? Why stop here? Your ML code can also be just a collection of microservices! Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code
  • 37. Leveraging Platform Services to Easily Deploy 1000s of Apps Data Scientists on App #1
  • 38. Leveraging Platform Services to Easily Deploy 1000s of Apps Data Scientists on App #2Data Scientists on App #1
  • 39. Let’s Add a Third App Data Scientists on App #2 Data Scientists on App #3Data Scientists on App #1
  • 40. How This Process Would Look in Salesforce 150,000 customers ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12 ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12 ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12
  • 41. Einstein’sNewApproachtoAI Democratizing AI for Everyone Classical Approach Data Sampling Feature Selection Model Selection Score Calibration Integrate to Application Artificial Intelligence Einstein Auto-ML Data already prepped Models automatically built Predictions delivered in context AI for CRM Discover Predict Recommend Automate
  • 42. Numerical BucketsCategorical Variables Text Fields ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines
  • 43. Categorical Variables ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0
  • 44. Text Fields ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines Word Count Word Count (no stop words) Is English Sentiment 4 2 1 1 6 3 1 1 9 4 0 0 6 4 0 -1 7 3 1 0 5 1 1 0 7 3 1 0
  • 45. Numerical Buckets ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines
  • 46. What Now? How autoML can choose your model ​>>> from sklearn import svm >>> from numpy import loadtxt as l, random as r >>> clf = svm.SVC() >>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",") >>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False) >>> X, y = pls[-testSet,:-1], pls[-testSet:,-1] >>> clf.fit(X,y) SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,decision_function_shape=None, degree=3, gamma='auto', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False) >>> clf.score(pls[testSet,:-1],pls[testSet,-1]) 0.88571428571428568 Should we try other model forms?Should we try other model forms? Features? Should we try other model forms? Features? Kernels or hyperparameters? Each use case will have its own model and features to use. We enable building separate models and features with 1 code base using OP
  • 47. Model 1 83% accuracy Model 3 73% accuracy Model 4 89% accuracy … Model 1 Model 2 Model 3 Model 4 … MODEL GENERATION MODEL TESTINGCUSTOMER A Model 2 91% accuracy Customer ID Age Age Group Gender Valid Address 1 2 3 4 5 22 30 45 23 60 A A B A C M F F M M Y N N Y Y A tournament of models!
  • 48. A tournament of models! … Model 1 Model 2 Model 3 Model 4 … CUSTOMER B Customer ID Age Age Group Gender Valid Address 1 2 3 4 5 34 22 66 58 41 B A C C B M M F M F Y Y N N Y Model 1 Model 3 Model 4 Customer B Model 2 Customer A MODEL GENERATION MODEL TESTINGCUSTOMER A Customer ID Age Age Group Gender Valid Address 1 2 3 4 5 22 30 45 23 60 A A B A C M F F M M Y N N Y Y
  • 49. Deploy Monitors, Monitor, Repeat! Sample Dashboard on Simulated Data ​134 Models in Production ​215 Models Trained (curr.month) ​98.51% Models with Above Chance Performance ​35,573,664 Predictions Written Per Day (7 day avg) ​8 Experiments Run this Week
  • 50. Deploy Monitors, Monitor, Repeat! ​Pipelines, Model Performance, Scores – Invest your time where it is needed! Sample Dashboard on Simulated Data ​Model Performance at Evaluation​Distribution of Scores at Evaluation ​105,874 Scores Written Per Hour(1 day moving avg) ​0.86 Evaluation auROC ​Total Number of Scores Written Per Hour ​150,000 ​ 100,000 ​ 50,000 ​ 0 ​Total Number of Scores Written Per Week ​10,000,000 ​ 100,000 ​ 100 ​ 0 ​PercentofTotalPredictionsin Bucket ​Prediction Probability Bucket ​TotalPredictionCount
  • 51. Deploy Monitors, Monitor, Repeat! Sample Dashboard on Simulated Data ​134 Models in Production ​215 Models Trained (curr.month) ​98.51% Models with Above Chance Performance ​216 Models Trained (curr.month) ​99.25% Models with Above Chance Performance ​35,573,664 Predictions Written Per Day (7 day avg) ​12 Experiments Run this Week
  • 52. Key Takeaways • Deploying machine learning in production is hard • Platforms are critical for enabling data scientist productivity • Plan for multiple apps… always • To ensure enabling rapid identification of areas of improvement and efficacy of new approaches provide • Monitoring services • Experimentation frameworks • Identify opportunities for reusability in all aspects, even your machine learning pipelines • Help simplify the process of experimenting, deploying, and iterating