July 4-6, 2022
2nd Edition
BigML, Inc #DutchMLSchool
The road to production
Automating and deploying Machine Learning projects
jao
CTO, BigML
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
• Accessibility
• Integrability
• Automation
• Ease of use
The means
• APIs: ML building blocks
• Abstraction layer over feature engineering
• Abstraction layer over algorithms
• Automation
2 ML as a RESTful cloudy service
RESTful-ish ML Services
RESTful done right: Whitebox resources
• Your data, your model
• Model reverse engineering becomes moot
• Maximizes reach (Web, CLI, desktop, IoT)
RESTful-ish ML Services
• Excellent abstraction layer
• Transparent data model
• Immutable resources and UUIDs: traceability
• Simple yet effective interaction model
• Easy access from any language (API bindings)
Algorithmic complexity and computing-resource management problems are mostly washed away
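The traceability point can be illustrated with a toy, in-memory model: each resource gets a fresh ID when it is created, is never modified afterwards, and records the ID of the resource it was built from, so any model can be traced back to its source. `ToyStore`, `Resource` and their methods are hypothetical names for this sketch, not BigML's API; the IDs only imitate the `kind/identifier` shape used elsewhere in this deck.

```python
import uuid
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Resource:
    """An immutable, uniquely identified resource (toy model, not BigML's)."""
    resource_id: str
    kind: str
    origin: Optional[str]         # ID of the resource this one was built from
    params: Mapping[str, object]  # read-only view of the creation parameters


class ToyStore:
    """Hypothetical in-memory store: resources are created once, never mutated."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def create(self, kind: str, origin: Optional[str] = None, **params: object) -> str:
        # A fresh UUID per call: re-running a step yields a new resource
        # instead of silently overwriting the old one.
        rid = f"{kind}/{uuid.uuid4().hex}"
        self._resources[rid] = Resource(rid, kind, origin, MappingProxyType(dict(params)))
        return rid

    def get(self, rid: str) -> Resource:
        return self._resources[rid]

    def lineage(self, rid: Optional[str]) -> List[str]:
        """Follow origin links back to the root resource."""
        chain: List[str] = []
        while rid is not None:
            chain.append(rid)
            rid = self._resources[rid].origin
        return chain


store = ToyStore()
source = store.create("source", file="diabetes.csv")
dataset = store.create("dataset", origin=source)
model = store.create("model", origin=dataset, objective="diabetes")
print([rid.split("/")[0] for rid in store.lineage(model)])  # ['model', 'dataset', 'source']
```

Because nothing is ever overwritten, the lineage of a model stays valid for as long as the resources exist, which is what makes audits possible.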
3 Machine Learning workflows
Textbook Machine Learning workflows
Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
ML workflows for real
Tumor detection using anomalies
Given data about a tumor:
• Extract the relevant features that characterize it (unsupervised learning)
• Classify the tumor as either benign or malignant, improving diagnosis and avoiding unnecessary surgery
Example: the University of Wisconsin Hospitals' breast cancer dataset
https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
Tumor detection using anomalies: workflow
Tumor detection using anomalies: Evaluation
Is the anomaly score a good predictor in real cases?
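One way to answer this question is to measure how well the score alone separates malignant from benign cases. The sketch below computes the ROC AUC of the anomaly score by pairwise comparison; `score_auc` is a hypothetical helper, and the scores and labels in the usage lines are toy values made up for illustration, not results from the Wisconsin dataset.

```python
from typing import Sequence


def score_auc(scores: Sequence[float], malignant: Sequence[bool]) -> float:
    """ROC AUC of the anomaly score used as a malignancy predictor.

    Equals the probability that a randomly chosen malignant case gets a higher
    score than a randomly chosen benign one (ties count as half).
    0.5 means no better than chance; 1.0 means perfect separation.
    """
    pos = [s for s, y in zip(scores, malignant) if y]
    neg = [s for s, y in zip(scores, malignant) if not y]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)): fine for a sketch, slow for big data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not results from the Wisconsin dataset)
scores = [0.72, 0.39, 0.41, 0.38, 0.55, 0.30]
malignant = [True, True, False, False, True, False]
print(score_auc(scores, malignant))  # 8 of 9 pairs correctly ordered: 0.888...
```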
Tumor detection using anomalies: Automation?
Web UI
(Non) automation via Web UI
Strengths of Web UI
Simple: just clicking around
Discoverable: exploration and experimentation
Abstract: transparent error handling and scalability
Problems of Web UI
Only simple: simple tasks are simple, hard tasks quickly get hard
No automation or batch operations: clicking humans don't scale well
4 Client-side automation
Abstracting over raw HTTP: bindings
Example workflow
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
dataset = api.create_dataset(source)
api.ok(dataset)
r, s = 0.8, "seed"
train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
test_dataset = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
api.ok(train_dataset)
model = api.create_model(train_dataset)
api.ok(model)
api.ok(test_dataset)
evaluation = api.create_evaluation(model, test_dataset)
api.ok(evaluation)
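Why do the train and test calls share the same rate and seed? Assuming `out_of_bag` selects exactly the rows that a sample with the same rate and seed leaves out (our reading of the option), the two calls yield disjoint datasets that together cover every row, and re-running them reproduces the same split. The local function below mimics that idea in plain Python; `sample` is a hypothetical helper, not part of the bindings, and BigML's own server-side sampling algorithm may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample(rows: Sequence[T], rate: float, seed: str, out_of_bag: bool = False) -> List[T]:
    """Seeded sampling in the spirit of the rate/seed/out_of_bag arguments above.

    Local illustration only: BigML samples server-side with its own algorithm.
    """
    rng = random.Random(seed)  # same seed -> same sequence of draws
    picked = []
    for row in rows:
        in_bag = rng.random() < rate
        # out_of_bag=False keeps the sampled rows, True keeps exactly the rest
        if in_bag != out_of_bag:
            picked.append(row)
    return picked


rows = list(range(10))
train = sample(rows, 0.8, "seed")
test = sample(rows, 0.8, "seed", out_of_bag=True)
print(sorted(train + test) == rows, set(train) & set(test))  # True set()
```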
Is this production code?
How do we generalize to, say, 100 datasets?
Example workflow: Python bindings
# Now do it 100 times, serially
model, evaluation = [], []
for i in range(0, 100):
    r, s = 0.8, i
    train = api.create_dataset(dataset, {"rate": r, "seed": s})
    test = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
    api.ok(train)
    model.append(api.create_model(train))
    api.ok(model[i])
    api.ok(test)
    evaluation.append(api.create_evaluation(model[i], test))
    api.ok(evaluation[i])
Example workflow: Python bindings
# More efficient if we parallelize, but at what level?
train, test, model, evaluation = [], [], [], []
for i in range(0, 100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
    # Do we wait here?
    api.ok(train[i])
    api.ok(test[i])
for i in range(0, 100):
    model.append(api.create_model(train[i]))
    api.ok(model[i])
for i in range(0, 100):
    evaluation.append(api.create_evaluation(model[i], test[i]))
    api.ok(evaluation[i])
Example workflow: Python bindings
# More efficient if we parallelize, but at what level?
train, test, model, evaluation = [], [], [], []
for i in range(0, 100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
for i in range(0, 100):
    # Or do we wait here?
    api.ok(train[i])
    model.append(api.create_model(train[i]))
for i in range(0, 100):
    # and here?
    api.ok(model[i])
    api.ok(test[i])
    evaluation.append(api.create_evaluation(model[i], test[i]))
    api.ok(evaluation[i])
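One possible answer to "at what level?" is to parallelize whole per-split pipelines: each worker thread runs one split end to end, so a slow model only delays its own evaluation. The sketch assumes a hypothetical `StubAPI` that reuses the method names from the slides so it runs offline; whether the real bindings can safely be shared across threads, and how many concurrent requests the service accepts, are assumptions you would have to check.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class StubAPI:
    """Hypothetical offline stand-in mimicking the method names used above.

    Every call succeeds at once and returns a fresh ID; it is NOT the BigML API.
    """

    def __init__(self) -> None:
        self._ids = itertools.count()
        self._lock = threading.Lock()

    def _new(self, kind: str) -> str:
        with self._lock:  # several worker threads create resources concurrently
            return f"{kind}/{next(self._ids)}"

    def create_dataset(self, origin: str, args: Optional[Dict] = None) -> str:
        return self._new("dataset")

    def create_model(self, dataset: str) -> str:
        return self._new("model")

    def create_evaluation(self, model: str, dataset: str) -> str:
        return self._new("evaluation")

    def ok(self, resource: str) -> bool:
        return True  # a real client would wait here until the resource is finished


def split_pipeline(api: StubAPI, dataset: str, seed: int, rate: float = 0.8) -> str:
    """One split end to end; it waits only on its own dependencies."""
    train = api.create_dataset(dataset, {"rate": rate, "seed": seed})
    test = api.create_dataset(dataset, {"rate": rate, "seed": seed, "out_of_bag": True})
    api.ok(train)
    model = api.create_model(train)
    api.ok(model)
    api.ok(test)
    evaluation = api.create_evaluation(model, test)
    api.ok(evaluation)
    return evaluation


def run_all(api: StubAPI, dataset: str, n: int = 100, workers: int = 8) -> List[str]:
    # pool.map preserves order: result i belongs to seed i
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: split_pipeline(api, dataset, i), range(n)))


evaluations = run_all(StubAPI(), "dataset/example")
print(len(evaluations))  # 100
```

Even in this friendly setting the client now owns thread pools, locks and ordering, which is the kind of incidental complexity the next slides point at.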
Example workflow: Python bindings
# More efficient if we parallelize, but how do we handle errors?
train, test, model, evaluation = [], [], [], []
for i in range(0, 100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
for i in range(0, 100):
    api.ok(train[i])
    model.append(api.create_model(train[i]))
for i in range(0, 100):
    try:
        api.ok(model[i])
        api.ok(test[i])
        evaluation.append(api.create_evaluation(model[i], test[i]))
        api.ok(evaluation[i])
    except Exception:
        # How to recover if test[i] has failed? New datasets? Abort?
        raise
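The slide leaves the recovery question open on purpose. One defensible policy, sketched here, is to rebuild a failed split a bounded number of times and then report it instead of aborting the whole batch. `create_split`, `evaluate` and `ResourceFailed` are hypothetical stand-ins for the binding calls and for however a failed resource is signalled; the slides do not show that mechanism.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ResourceFailed(Exception):
    """Hypothetical signal that a remote resource ended in an error state."""


def evaluate_split(create_split: Callable[[int], Tuple[str, str]],
                   evaluate: Callable[[str, str], str],
                   seed: int,
                   max_attempts: int = 3) -> Optional[str]:
    """Run one split, rebuilding its datasets after a failure.

    Returns the evaluation ID, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        try:
            train, test = create_split(seed)  # fresh datasets on every attempt
            return evaluate(train, test)
        except ResourceFailed:
            continue  # possibly transient: try again with new resources
    return None


def run_batch(create_split: Callable[[int], Tuple[str, str]],
              evaluate: Callable[[str, str], str],
              n: int = 100) -> Tuple[Dict[int, str], List[int]]:
    """Collect successes and report failed seeds instead of aborting everything."""
    done: Dict[int, str] = {}
    failed: List[int] = []
    for seed in range(n):
        result = evaluate_split(create_split, evaluate, seed)
        if result is None:
            failed.append(seed)
        else:
            done[seed] = result
    return done, failed


# Hypothetical fakes: seed 3 always fails, everything else succeeds
def fake_split(seed: int) -> Tuple[str, str]:
    if seed == 3:
        raise ResourceFailed(f"test dataset for seed {seed} failed")
    return f"dataset/train-{seed}", f"dataset/test-{seed}"


def fake_evaluate(train: str, test: str) -> str:
    return f"evaluation/{train.split('-')[1]}"


done, failed = run_batch(fake_split, fake_evaluate, n=5)
print(failed)  # [3]
```

Retrying with the same seed helps only with transient failures; a deterministic failure will recur, which is why the sketch reports the seed rather than looping forever.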
Client-side Machine Learning Automation
Problems of bindings-based, client-side solutions
Complexity: lots of details outside the problem domain
Reuse: no inter-language compatibility
Scalability: client-side workflows are hard to optimize
Reproducibility: noisy, complex and hard-to-audit development environment
Not enough abstraction
A partial solution: CLI declarative tools
# "1-click" dataset with parameterized fields
bigmler --train data/diabetes.csv \
        --no-model \
        --name "4-featured diabetes" \
        --dataset-fields \
            "plasma glucose,insulin,diabetes pedigree,diabetes" \
        --output-dir output/diabetes \
        --project "Certification Workshop"
# "1-click" ensemble
bigmler --train data/iris.csv \
        --number-of-models 500 \
        --sample-rate 0.85 \
        --output-dir output/iris-ensemble \
        --project "Certification Workshop"
Rich, parameterized workflows: cross-validation
# Parameterized input: the dataset ID saved by the previous command.
# --k-folds sets the number of folds used during validation.
bigmler analyze --cross-validation \
        --dataset $(cat output/diabetes/dataset) \
        --k-folds 3 \
        --output-dir output/diabetes-validation
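Conceptually, `--k-folds 3` splits the dataset into three folds and, in turn, trains on two of them and evaluates on the third, so every row is used for testing exactly once. The local sketch below shows only that partitioning step; BigMLer does the real work server-side, and the shuffling scheme and the `k_fold_splits` helper are our own assumptions, not BigMLer's implementation.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_rows: int, k: int, seed: str = "seed") -> List[Tuple[List[int], List[int]]]:
    """Return k (train_rows, test_rows) pairs; each row is tested exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)               # reproducible shuffle
    folds = [sorted(order[i::k]) for i in range(k)]  # k folds, sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        # train on the union of all the other folds
        train = sorted(row for j, fold in enumerate(folds) if j != i for row in fold)
        splits.append((train, test))
    return splits


for train, test in k_fold_splits(10, 3):
    print(len(train), len(test))
```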
Client-side Machine Learning automation
Problems of client-side solutions
Complex: too fine-grained, leaky abstractions
Cumbersome: error handling, network issues
Hard to reuse: tied to a single programming language
Hard to scale: parallelization is again a problem
Hard to generalize: declarative client tools hide complexity at the cost of flexibility
Hard to combine: black-box tools cannot easily be integrated as parts of bigger client-side workflows
Hard to audit: client-side development environments are complex and very hard to sandbox
Not enough automation, not enough abstraction
The algorithmic complexity and computing-resource management problems, mostly washed away by RESTful services, are back!
5 Server-side workflow automation
Machine Learning Automation
Solution (scalability, reuse): Back to the server
Server–side automation: Scriptify
Solution (complexity, reuse): Domain–specific languages
In a Nutshell
1. Workflows reified as server–side, RESTful resources
2. Domain–specific language for ML workflow automation
Workflows as RESTful Resources
Library: a reusable building block, i.e. a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: executable code that describes an actual workflow.
• Imports: list of libraries with code used by the script.
• Inputs: list of input values that parameterize the workflow.
• Outputs: list of values computed by the script and returned to the user.
Execution: given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
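To make these relationships concrete, here is a toy, purely local model of them in Python. The names `Script`, `execute` and `stats_library` are hypothetical, and this is not how BigML implements scripts or executions: a library is modelled as a dict of reusable functions, a script as declared inputs and outputs plus a body, and an execution as running a script on a complete set of inputs and returning only the declared outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Script:
    """Toy script: declared inputs and outputs plus the code that runs."""
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    body: Callable[..., Dict[str, object]]


def execute(script: Script, inputs: Dict[str, object]) -> Dict[str, object]:
    """Toy execution: needs a complete set of inputs, returns declared outputs."""
    missing = [name for name in script.inputs if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    unexpected = [name for name in inputs if name not in script.inputs]
    if unexpected:
        raise ValueError(f"unexpected inputs: {unexpected}")
    result = script.body(**inputs)
    # Only the declared outputs are returned to the user
    return {name: result[name] for name in script.outputs}


# A toy "library": reusable definitions that scripts can import
stats_library = {"mean": lambda xs: sum(xs) / len(xs)}

summary = Script(
    inputs=("values",),
    outputs=("average",),
    body=lambda values: {"average": stats_library["mean"](values), "debug": len(values)},
)
print(execute(summary, {"values": [1, 2, 3, 6]}))  # {'average': 3.0}
```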
Workflows as RESTful Resources: the bazaar
Workflows as RESTful Resources: metaprogramming
Resources that create resources that create resources that create resources that create
Different ways of executing WhizzML Scripts
Web UI / BigMLer / Bindings → Executions
Executing WhizzML scripts: bindings
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters
inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
# execute
api.ok(api.create_execution(script, inputs))
Creating and executing WhizzML scripts with BigMLer
bigmler execute --code "(+ 1 2)" --output-dir simple_exe
bigmler execute --script script/50a2bb64035d0706db000643
bigmler execute --script script/50a2bb64035d0706db000643 \
        --inputs my_inputs.json
bigmler execute --code '(define addition (+ a b))' \
        --declare-inputs my_inputs_dec.json \
        --declare-outputs my_outputs_dec.json \
        --no-execute
6 A first taste of WhizzML: abstraction is back
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
dataset = api.create_dataset(source)
api.ok(dataset)
r, s = 0.8, "seed"
train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
test_dataset = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
api.ok(train_dataset)
model = api.create_model(train_dataset)
api.ok(model)
api.ok(test_dataset)
evaluation = api.create_evaluation(model, test_dataset)
api.ok(evaluation)
Syntactic Abstraction in WhizzML: Simple workflow
;; ML artifacts are first-class citizens,
;; we only need to talk about our domain
(let ([train-id test-id] (create-dataset-split id 0.8)
model-id (create-model train-id))
(create-evaluation test-id
model-id
{"name" "Evaluation 80/20"
"missing_strategy" 0}))
Ready for production!
Domain Specificity and Scalability: Trivial parallelization
;; Workflow for 1 resource
(let ([train-id test-id] (create-dataset-split id 0.8)
model-id (create-model train-id))
(create-evaluation test-id model-id))
Domain Specificity and Scalability: Trivial parallelization
;; Workflow for arbitrary number of resources
(let (splits (for (id input-datasets)
(create-dataset-split id 0.8)))
(for (split splits)
(create-evaluation (create-model (split 0)) (split 1))))
Ready for production!
Syntactic Abstraction in WhizzML: Simple workflow
(let (score (create-anomalyscore anomaly-id input))
(if (> score threshold)
(raise "Input is too weird to predict")
(create-prediction model-id input)))
Ready for production!
Domain Specificity and Scalability: Trivial parallelization
(for (input inputs)
(when (< (create-anomalyscore anomaly-id input) threshold)
(create-prediction model-id input)))
Ready for production!
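For comparison, the same anomaly guard can be sketched in plain Python. `toy_score` and `toy_model` are hypothetical functions standing in for a BigML anomaly detector and model, and nothing here calls BigML; the sketch follows the batch version's "score below threshold" test and keeps the rejected rows instead of silently dropping them.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, float]


def gated_predictions(rows: Iterable[Row],
                      anomaly_score: Callable[[Row], float],
                      predict: Callable[[Row], str],
                      threshold: float) -> Tuple[List[str], List[Row]]:
    """Predict only for rows whose anomaly score is below the threshold.

    Rows scoring at or above it are set aside rather than predicted.
    """
    predictions: List[str] = []
    rejected: List[Row] = []
    for row in rows:
        if anomaly_score(row) < threshold:
            predictions.append(predict(row))
        else:
            rejected.append(row)
    return predictions, rejected


# Hypothetical stand-ins for an anomaly detector and a trained model
def toy_score(row: Row) -> float:
    return abs(row["size"] - 5.0) / 10.0


def toy_model(row: Row) -> str:
    return "malignant" if row["size"] > 6.0 else "benign"


preds, rejected = gated_predictions(
    [{"size": 4.0}, {"size": 7.0}, {"size": 15.0}], toy_score, toy_model, threshold=0.5)
print(preds, rejected)  # ['benign', 'malignant'] [{'size': 15.0}]
```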
7 And back to the (distributed) client: BigMLOps
Package and deploy BigML workflows in a few clicks
1 Create an Application
2 Connect to BigML and add workflows and models
3 Package everything in a container
4 Deploy and monitor your application

Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 

Recently uploaded (20)

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 

DutchMLSchool 2022 - Automation

  • 1. July 4-6, 2022, 2nd Edition
  • 2. BigML, Inc #DutchMLSchool The road to production Automating and deploying Machine Learning projects 2 jao CTO, BigML
  • 3. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 3 / 61
  • 4. Machine Learning as a System Service The goal Machine Learning as a system level service • Accessibility • Integrability • Automation • Ease of use 4 / 61
  • 5. Machine Learning as a System Service 5 / 61
  • 6. Machine Learning as a System Service The goal Machine Learning as a system level service The means • APIs: ML building blocks • Abstraction layer over feature engineering • Abstraction layer over algorithms • Automation 6 / 61
  • 7. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 7 / 61
  • 10. RESTful done right: Whitebox resources • Your data, your model • Model reverse engineering becomes moot • Maximizes reach (Web, CLI, desktop, IoT) 10 / 61
  • 11. RESTful-ish ML Services • Excellent abstraction layer • Transparent data model • Immutable resources and UUIDs: traceability • Simple yet effective interaction model • Easy access from any language (API bindings) Algorithmic complexity and computing resources management problems mostly washed away 11 / 61
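The interaction model listed above (every operation creates a new, immutable resource with its own ID, derived from an origin resource) can be illustrated without any network calls. The following is a minimal in-memory sketch; `Resource`, `ResourceStore`, `create`, and `lineage` are hypothetical names chosen for illustration, not part of BigML's API. It shows how immutable, ID-addressed resources give traceability: any model can be walked back to the source it came from.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """Immutable record: a resource is never modified after creation."""
    resource_id: str                         # e.g. "dataset/<hex uuid>"
    kind: str                                # "source", "dataset", "model", ...
    origin: Optional[str]                    # ID of the parent resource, if any
    params: Tuple[Tuple[str, object], ...]   # frozen creation arguments


class ResourceStore:
    """Hypothetical in-memory stand-in for a RESTful ML service."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def create(self, kind: str, origin: Optional[str] = None, **params: object) -> str:
        """Always create a new resource with a fresh ID; nothing is updated in place."""
        if origin is not None and origin not in self._resources:
            raise KeyError(f"unknown origin: {origin}")
        resource_id = f"{kind}/{uuid.uuid4().hex}"
        frozen = tuple(sorted(params.items()))
        self._resources[resource_id] = Resource(resource_id, kind, origin, frozen)
        return resource_id

    def get(self, resource_id: str) -> Resource:
        return self._resources[resource_id]

    def lineage(self, resource_id: str) -> List[str]:
        """Follow origin links back to the root resource (traceability)."""
        chain: List[str] = []
        current: Optional[str] = resource_id
        while current is not None:
            chain.append(current)
            current = self._resources[current].origin
        return chain


store = ResourceStore()
source_id = store.create("source", file="diabetes.csv")
dataset_id = store.create("dataset", origin=source_id)
model_id = store.create("model", origin=dataset_id, objective="diabetes")
print([rid.split("/")[0] for rid in store.lineage(model_id)])  # ['model', 'dataset', 'source']
```

Because a resource can only be created, never edited, its ID is enough to reproduce or audit how it was built.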
  • 14. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 14 / 61
  • 15. Textbook Machine Learning workflows Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/) 15 / 61
  • 16. ML workflows for real 16 / 61
  • 20. Tumor detection using anomalies Given data about a tumor: • Extract the relevant features that characterize it (unsupervised learning) • Classify the tumor as either benign or malignant, improving diagnosis and avoiding unnecessary surgery Example: University of Wisconsin Hospital’s Cancer dataset https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/ 19 / 61
  • 21. Tumor detection using anomalies: workflow 20 / 61
  • 25. Tumor detection using anomalies: Evaluation Is the anomaly score a good predictor in real cases? 21 / 61
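One way to answer that question, assuming a labelled set of tumors is available, is to treat the anomaly score as a ranking and measure how well it separates malignant from benign cases. The sketch below computes ROC AUC in its rank (Mann-Whitney) form, plus confusion counts at a chosen threshold. The function names and the toy scores are illustrative assumptions, not results from the Wisconsin dataset or BigML's own evaluation code.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive (label 1) scores above a random negative.

    Ties count as half, which is the Mann-Whitney formulation of ROC AUC.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """(tp, fp, fn, tn) when 'score >= threshold' is read as 'malignant'."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Toy numbers for illustration only: 1 = malignant, 0 = benign.
scores = [0.72, 0.65, 0.40, 0.55, 0.58, 0.35]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))   # 0.889
print(confusion_at(scores, labels, 0.6))   # (2, 0, 1, 3)
```

An AUC near 0.5 would mean the anomaly score carries little diagnostic signal; values close to 1 suggest it ranks malignant cases above benign ones.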
  • 26. Tumor detection using anomalies: Automation? 22 / 61
  • 29. (Non) automation via Web UI
      Strengths of Web UI
      • Simple: just clicking around
      • Discoverable: exploration and experimenting
      • Abstract: transparent error handling and scalability
      Problems of Web UI
      • Only simple: simple tasks are simple, hard tasks quickly get hard
      • No automation or batch operations: clicking humans don't scale well
    24 / 61
  • 30. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 25 / 61
  • 31. Abstracting over raw HTTP: bindings 26 / 61
  • 33. Example workflow: Python bindings
      from bigml.api import BigML
      api = BigML()
      source = 'source/5643d345f43a234ff2310a3e'
      dataset = api.create_dataset(source)
      api.ok(dataset)
      # same rate and seed: out_of_bag returns exactly the rows left out of the sample
      r, s = 0.8, "seed"
      train_dataset = api.create_dataset(dataset, {"sample_rate": r, "seed": s})
      test_dataset = api.create_dataset(dataset, {"sample_rate": r, "seed": s,
                                                  "out_of_bag": True})
      api.ok(train_dataset)
      model = api.create_model(train_dataset)
      api.ok(model)
      api.ok(test_dataset)
      evaluation = api.create_evaluation(model, test_dataset)
      api.ok(evaluation)
    28 / 61
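The two `create_dataset` calls above use the same rate and seed, so the `out_of_bag` request returns exactly the rows the first sample left out, giving disjoint train and test sets. The local sketch below illustrates that idea with a deterministic, hash-based per-row decision; the hashing scheme is an assumption made for illustration and not BigML's actual sampling algorithm.

```python
import hashlib
from typing import List, Sequence


def in_sample(row_index: int, rate: float, seed: str) -> bool:
    """Deterministic per-row coin flip: same (index, seed) always gives the same answer."""
    digest = hashlib.sha256(f"{seed}:{row_index}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64   # value in [0, 1)
    return u < rate


def sample(rows: Sequence, rate: float, seed: str, out_of_bag: bool = False) -> List:
    """Rows in the sample, or exactly the rows left out of it when out_of_bag=True."""
    return [row for i, row in enumerate(rows) if in_sample(i, rate, seed) != out_of_bag]


rows = list(range(1000))
train = sample(rows, 0.8, "seed")
test = sample(rows, 0.8, "seed", out_of_bag=True)
print(len(train) + len(test), set(train).isdisjoint(test))   # 1000 True
```

Changing the seed produces a different but equally reproducible split, which is what the 100-seed loops on the following slides rely on.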
  • 34. Is this production code? How do we generalize to, say, 100 datasets? 29 / 61
  • 35. Example workflow: Python bindings
      # Now do it 100 times, serially
      models, evaluations = [], []
      for i in range(0, 100):
          r, s = 0.8, str(i)
          train = api.create_dataset(dataset, {"sample_rate": r, "seed": s})
          test = api.create_dataset(dataset, {"sample_rate": r, "seed": s,
                                              "out_of_bag": True})
          api.ok(train)
          models.append(api.create_model(train))
          api.ok(models[i])
          api.ok(test)
          evaluations.append(api.create_evaluation(models[i], test))
          api.ok(evaluations[i])
    30 / 61
  • 36. Example workflow: Python bindings
      # More efficient if we parallelize, but at what level?
      train, test, models, evaluations = [], [], [], []
      for i in range(0, 100):
          r, s = 0.8, str(i)
          train.append(api.create_dataset(dataset, {"sample_rate": r, "seed": s}))
          test.append(api.create_dataset(dataset, {"sample_rate": r, "seed": s,
                                                   "out_of_bag": True}))
          # Do we wait here?
          api.ok(train[i])
          api.ok(test[i])
      for i in range(0, 100):
          models.append(api.create_model(train[i]))
          api.ok(models[i])
      for i in range(0, 100):
          evaluations.append(api.create_evaluation(models[i], test[i]))
          api.ok(evaluations[i])
    31 / 61
  • 37. Example workflow: Python bindings
      # More efficient if we parallelize, but at what level?
      train, test, models, evaluations = [], [], [], []
      for i in range(0, 100):
          r, s = 0.8, str(i)
          train.append(api.create_dataset(dataset, {"sample_rate": r, "seed": s}))
          test.append(api.create_dataset(dataset, {"sample_rate": r, "seed": s,
                                                   "out_of_bag": True}))
      for i in range(0, 100):
          # Or do we wait here?
          api.ok(train[i])
          models.append(api.create_model(train[i]))
      for i in range(0, 100):
          # and here?
          api.ok(models[i])
          api.ok(test[i])
          evaluations.append(api.create_evaluation(models[i], test[i]))
          api.ok(evaluations[i])
    32 / 61
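One possible answer to "at what level?" is to parallelize per pipeline: within one seed, the model needs the train split and the evaluation needs both, so that chain runs in order, while the 100 independent chains run concurrently. The sketch below uses `concurrent.futures` with a hypothetical `FakeClient` whose `create` call only sleeps to simulate a remote create-and-wait; it is not the BigML bindings, and the worker count of 16 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class FakeClient:
    """Hypothetical stand-in for a remote ML API; each call takes `latency` seconds."""

    def __init__(self, latency: float = 0.05) -> None:
        self.latency = latency

    def create(self, kind: str, *origins: str) -> str:
        time.sleep(self.latency)              # simulates create + wait until finished
        return f"{kind}/" + "+".join(origins)


def pipeline(client: FakeClient, dataset: str, seed: int) -> Dict[str, str]:
    """One seed's chain: the model needs the train split, the evaluation needs both."""
    train = client.create("train", dataset, str(seed))
    test = client.create("test", dataset, str(seed))
    model = client.create("model", train)
    evaluation = client.create("evaluation", model, test)
    return {"seed": str(seed), "evaluation": evaluation}


def run_all(client: FakeClient, dataset: str, n: int, workers: int = 16) -> List[Dict[str, str]]:
    """Independent chains run concurrently; results are returned in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda seed: pipeline(client, dataset, seed), range(n)))


client = FakeClient(latency=0.01)
start = time.perf_counter()
results = run_all(client, "dataset/abc", 100)
elapsed = time.perf_counter() - start
print(len(results), results[5]["evaluation"])
```

With `latency=0.01`, a serial run of the 100 four-step chains would sleep about 4 seconds; with 16 worker threads it finishes in a fraction of that. The train and test splits inside one chain are independent too and could also be submitted concurrently, at the cost of more bookkeeping.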
  • 38. Example workflow: Python bindings
      # More efficient if we parallelize, but how do we handle errors?
      train, test, models, evaluations = [], [], [], []
      for i in range(0, 100):
          r, s = 0.8, str(i)
          train.append(api.create_dataset(dataset, {"sample_rate": r, "seed": s}))
          test.append(api.create_dataset(dataset, {"sample_rate": r, "seed": s,
                                                   "out_of_bag": True}))
      for i in range(0, 100):
          api.ok(train[i])
          models.append(api.create_model(train[i]))
      for i in range(0, 100):
          try:
              api.ok(models[i])
              api.ok(test[i])
              evaluation = api.create_evaluation(models[i], test[i])
              api.ok(evaluation)
              evaluations.append(evaluation)
          except Exception:
              # How to recover if test[i] has failed? New datasets? Abort?
              pass
    33 / 61
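A common answer to the recovery question is to retry transient failures a bounded number of times and, if an item still fails, record the failure for that item instead of aborting the whole batch. The sketch below assumes a hypothetical `TransientError` to tell retryable failures (timeouts, an overloaded server) apart from permanent ones; the three-attempt, exponential-backoff policy is also an assumption, and `flaky_step` is a toy stand-in for a real create-and-wait call.

```python
import time
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, overloaded server)."""


def with_retries(action: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Run `action`, retrying only TransientError, with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return action()
        except TransientError:
            if attempt == attempts - 1:
                raise                                   # out of retries
            time.sleep(base_delay * 2 ** attempt)       # 0.01 s, 0.02 s, ...
    raise AssertionError("unreachable")


def run_batch(items: List[int], step: Callable[[int], str]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Run `step` on every item; a failure is recorded for that item, not fatal to the batch."""
    done: Dict[int, str] = {}
    failed: Dict[int, str] = {}
    for item in items:
        try:
            done[item] = with_retries(lambda: step(item))
        except Exception as exc:
            failed[item] = repr(exc)
    return done, failed


calls: Dict[int, int] = {}


def flaky_step(i: int) -> str:
    """Toy step: item 3 always fails; even items fail once, then succeed."""
    calls[i] = calls.get(i, 0) + 1
    if i == 3:
        raise ValueError("bad input dataset")
    if i % 2 == 0 and calls[i] < 2:
        raise TransientError("timeout")
    return f"evaluation/{i}"


done, failed = run_batch(list(range(5)), flaky_step)
print(sorted(done), failed)   # [0, 1, 2, 4] {3: "ValueError('bad input dataset')"}
```

Permanent errors such as a bad input dataset land in `failed` after a single attempt, leaving the decision to rebuild datasets or abort to the caller.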
  • 39. Client-side Machine Learning Automation
      Problems of bindings-based, client solutions
      • Complexity: lots of details outside the problem domain
      • Reuse: no inter-language compatibility
      • Scalability: client-side workflows are hard to optimize
      • Reproducibility: noisy, complex and hard-to-audit development environment
      Not enough abstraction
    34 / 61
  • 40. A partial solution: CLI declarative tools
      # "1-click" dataset with parameterized fields
      bigmler --train data/diabetes.csv --no-model \
              --name "4-featured diabetes" \
              --dataset-fields "plasma glucose,insulin,diabetes pedigree,diabetes" \
              --output-dir output/diabetes \
              --project "Certification Workshop"
      # "1-click" ensemble
      bigmler --train data/iris.csv --number-of-models 500 --sample-rate 0.85 \
              --output-dir output/iris-ensemble \
              --project "Certification Workshop"
    35 / 61
  • 41. Rich, parameterized workflows: cross-validation
      # --dataset: parameterized input; --k-folds: number of folds during validation
      bigmler analyze --cross-validation \
              --dataset $(cat output/diabetes/dataset) \
              --k-folds 3 \
              --output-dir output/diabetes-validation
    36 / 61
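With `--k-folds 3`, every row is used for evaluation exactly once across three train/test rounds. The sketch below shows that partitioning for arbitrary `k`; it is a generic illustration of k-fold cross-validation, and BigMLer's own fold assignment may differ.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)           # seeded, so folds are reproducible
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])   # [4, 3, 3]
```

A model is trained on each `train` list and evaluated on the matching `test` list; averaging the k evaluations gives the cross-validated estimate.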
  • 44. Client-side Machine Learning automation
      Problems of client-side solutions
      • Complex: too fine-grained, leaky abstractions
      • Cumbersome: error handling, network issues
      • Hard to reuse: tied to a single programming language
      • Hard to scale: parallelization again a problem
      • Hard to generalize: declarative client tools hide complexity at the cost of flexibility
      • Hard to combine: black-box tools cannot be easily integrated as parts of bigger client-side workflows
      • Hard to audit: client-side development environments are complex and very hard to sandbox
      Algorithmic complexity and computing resources management problems, once "mostly washed away", are back!
    37 / 61
  • 45. Client-side Machine Learning automation Algorithmic complexity and computing resources management problems are back! 38 / 61
  • 46. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 39 / 61
  • 48. Solution (scalability, reuse): Back to the server 41 / 61
  • 51. Solution (complexity, reuse): Domain–specific languages 43 / 61
  • 52. In a Nutshell 1. Workflows reified as server–side, RESTful resources 2. Domain–specific language for ML workflow automation 44 / 61
  • 53. Workflows as RESTful Resources
      • Library: reusable building block; a collection of WhizzML definitions that can be imported by other libraries or scripts.
      • Script: executable code that describes an actual workflow.
          • Imports: list of libraries with code used by the script.
          • Inputs: list of input values that parameterize the workflow.
          • Outputs: list of values computed by the script and returned to the user.
      • Execution: given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
    45 / 61
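The resource types above can be modeled as plain data to make their relationships explicit. In the minimal sketch below, `Library`, `Script`, `Execution`, and `execute` are hypothetical names, and a Python callable stands in for WhizzML source code; the one rule taken directly from the slide is that only a complete set of inputs can be executed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Library:
    """Reusable definitions that scripts (or other libraries) can import."""
    name: str
    definitions: Dict[str, Callable[..., Any]]


@dataclass(frozen=True)
class Script:
    """A workflow: imported libraries, declared inputs and outputs, and a body."""
    imports: List[Library]
    inputs: List[str]
    outputs: List[str]
    body: Callable[..., Dict[str, Any]]   # stand-in for WhizzML source code


@dataclass(frozen=True)
class Execution:
    """The result of running a script with a complete set of inputs."""
    script: Script
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


def execute(script: Script, inputs: Dict[str, Any]) -> Execution:
    """Run a script; refuse to start unless every declared input has a value."""
    missing = [name for name in script.inputs if name not in inputs]
    if missing:
        raise ValueError(f"incomplete inputs, missing: {missing}")
    env: Dict[str, Callable[..., Any]] = {}
    for library in script.imports:                 # make imported definitions visible
        env.update(library.definitions)
    args = {name: inputs[name] for name in script.inputs}
    result = script.body(env, **args)
    # Only the declared outputs are returned to the user.
    outputs = {name: result[name] for name in script.outputs}
    return Execution(script, args, outputs)


arith = Library("arith", {"add": lambda a, b: a + b})
adder = Script(imports=[arith], inputs=["a", "b"], outputs=["addition"],
               body=lambda env, a, b: {"addition": env["add"](a, b)})
print(execute(adder, {"a": 1, "b": 2}).outputs)   # {'addition': 3}
```

The `adder` script loosely mirrors the `(define addition (+ a b))` example used with BigMLer later in the deck, with `a` and `b` as declared inputs and `addition` as the declared output.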
  • 54. Workflows as RESTful Resources: the bazaar 46 / 61
  • 55. Workflows as RESTful Resources: metaprogramming Resources that create resources that create resources that create resources that create 47 / 61
  • 56. Different ways of executing WhizzML scripts: Web UI, BigMLer, bindings → executions 48 / 61
  • 57. Executing WhizzML scripts: bindings
      from bigml.api import BigML
      api = BigML()
      # choose workflow
      script = 'script/567b4b5be3f2a123a690ff56'
      # define parameters as [name, value] pairs
      inputs = [['source', 'source/5643d345f43a234ff2310a3e']]
      # execute and wait for the outputs
      execution = api.create_execution(script, {'inputs': inputs})
      api.ok(execution)
    49 / 61
  • 58. Creating and executing WhizzML scripts with BigMLer
      bigmler execute --code "(+ 1 2)" --output-dir simple_exe
      bigmler execute --script script/50a2bb64035d0706db000643
      bigmler execute --script script/50a2bb64035d0706db000643 --inputs my_inputs.json
      bigmler execute --code '(define addition (+ a b))' \
              --declare-inputs my_inputs_dec.json \
              --declare-outputs my_outputs_dec.json --no-execute
    50 / 61
  • 59. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 51 / 61
  • 60. Example workflow: Python bindings
      from bigml.api import BigML
      api = BigML()
      source = 'source/5643d345f43a234ff2310a3e'
      dataset = api.create_dataset(source)
      api.ok(dataset)
      r, s = 0.8, "seed"
      train_dataset = api.create_dataset(dataset, {"sample_rate": r, "seed": s})
      test_dataset = api.create_dataset(dataset, {"sample_rate": r, "seed": s,
                                                  "out_of_bag": True})
      api.ok(train_dataset)
      model = api.create_model(train_dataset)
      api.ok(model)
      api.ok(test_dataset)
      evaluation = api.create_evaluation(model, test_dataset)
      api.ok(evaluation)
    52 / 61
  • 62. Syntactic Abstraction in WhizzML: Simple workflow
      ;; ML artifacts are first-class citizens,
      ;; we only need to talk about our domain
      (let ([train-id test-id] (create-dataset-split id 0.8)
            model-id (create-model train-id))
        (create-evaluation test-id model-id
                           {"name" "Evaluation 80/20"
                            "missing_strategy" 0}))
    Ready for production!
    53 / 61
  • 63. Domain Specificity and Scalability: Trivial parallelization
      ;; Workflow for 1 resource
      (let ([train-id test-id] (create-dataset-split id 0.8)
            model-id (create-model train-id))
        (create-evaluation test-id model-id))
    54 / 61
  • 65. Domain Specificity and Scalability: Trivial parallelization
      ;; Workflow for arbitrary number of resources
      (let (splits (for (id input-datasets)
                     (create-dataset-split id 0.8)))
        (for (split splits)
          (create-evaluation (create-model (split 0)) (split 1))))
    Ready for production!
    55 / 61
  • 66. ML workflows for real 56 / 61
  • 67. Syntactic Abstraction in WhizzML: Simple workflow
      (let (score (create-anomalyscore anomaly-id input))
        (if (> score threshold)
            (raise "Input is too weird to predict")
            (create-prediction model-id input)))
    Ready for production!
    57 / 61
  • 68. Domain Specificity and Scalability: Trivial parallelization
      (for (input inputs)
        (when (< (create-anomalyscore anomaly-id input) threshold)
          (create-prediction model-id input)))
    Ready for production!
    58 / 61
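For readers more at home outside WhizzML, the gating logic of the two anomaly-score snippets can be written in plain Python. In this sketch `anomaly_score` and `predict` are hypothetical callables standing in for `create-anomalyscore` and `create-prediction`, and the threshold is whatever value the caller picks. As in the WhizzML versions, the single-input function raises on a weird input, while the batch function silently skips it.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Row = Dict[str, float]
P = TypeVar("P")


class TooWeirdError(ValueError):
    """Raised when an input is too anomalous for its prediction to be trusted."""


def guarded_predict(row: Row, anomaly_score: Callable[[Row], float],
                    predict: Callable[[Row], P], threshold: float) -> P:
    """Single input: refuse to predict when the anomaly score exceeds the threshold."""
    if anomaly_score(row) > threshold:
        raise TooWeirdError("Input is too weird to predict")
    return predict(row)


def predict_normal(rows: Sequence[Row], anomaly_score: Callable[[Row], float],
                   predict: Callable[[Row], P], threshold: float) -> List[P]:
    """Batch: predict only the inputs whose anomaly score is below the threshold."""
    return [predict(row) for row in rows if anomaly_score(row) < threshold]


# Toy stand-ins for the anomaly detector and the model.
def score(row: Row) -> float:
    return abs(row["x"]) / 10


def classify(row: Row) -> str:
    return "malignant" if row["x"] > 0 else "benign"


rows = [{"x": 3.0}, {"x": -2.0}, {"x": 9.0}]
print(predict_normal(rows, score, classify, threshold=0.6))   # ['malignant', 'benign']
```

This Python loop runs the inputs one after another on the client; in the WhizzML version the same `for` is executed server-side, which is where the slide's "trivial parallelization" comes from.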
  • 69. Outline 1 ML as a system service 2 ML as a RESTful cloudy service 3 Machine Learning workflows 4 Client–side automation 5 Server–side workflow automation 6 A first taste of WhizzML: abstraction is back 7 And back to the (distributed) client: BigMLOps 59 / 61
  • 70. Package and deploy BigML workflows in a few clicks
      1. Create an Application (Ops)
      2. Connect to BigML and add workflows and models
      3. Package everything in a container
      4. Deploy and monitor your application
    60 / 61