July 4-6, 2022
2nd Edition
BigML, Inc #DutchMLSchool
The road to production
Automating and deploying Machine Learning projects
2
jao
CTO, BigML
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
3 / 61
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
• Accessibility
• Integrability
• Automation
• Ease of use
4 / 61
Machine Learning as a System Service
5 / 61
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
• APIs: ML building blocks
• Abstraction layer over feature
engineering
• Abstraction layer over algorithms
• Automation
6 / 61
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
7 / 61
RESTful-ish ML Services
8 / 61
RESTful-ish ML Services
9 / 61
RESTful done right: Whitebox resources
• Your data, your model
• Model reverse engineering becomes moot
• Maximizes reach (Web, CLI, desktop, IoT)
10 / 61
RESTful-ish ML Services
• Excellent abstraction layer
• Transparent data model
• Immutable resources and UUIDs: traceability
• Simple yet effective interaction model
• Easy access from any language (API bindings)
Algorithmic complexity and computing resources management
problems mostly washed away
11 / 61
RESTful-ish ML Services
12 / 61
RESTful-ish ML Services
13 / 61
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
14 / 61
Textbook Machine Learning workflows
Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
15 / 61
ML workflows for real
16 / 61
ML workflows for real
17 / 61
ML workflows for real
18 / 61
Tumor detection using anomalies
Given data about a tumor:
• Extract the relevant features that
characterize it (unsupervised
learning)
• Classify the tumor as either benign
or malignant, improving diagnosis
and avoiding unnecessary surgery
Example: University of Wisconsin Hospital’s Cancer dataset
https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
19 / 61
Tumor detection using anomalies: workflow
20 / 61
Tumor detection using anomalies: Evaluation
Is the anomaly score a good predictor in real cases?
21 / 61
Tumor detection using anomalies: Automation?
22 / 61
Web UI
23 / 61
(Non) automation via Web UI
Strengths of Web UI
Simple Just clicking around
Discoverable Exploration and experimenting
Abstract Transparent error handling and scalability
Problems of Web UI
Only simple Simple tasks are simple, hard tasks quickly get hard
No automation or batch operations Clicking humans don’t scale well
24 / 61
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
25 / 61
Abstracting over raw HTTP: bindings
26 / 61
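What the bindings hide: every resource lives behind an authenticated HTTP endpoint and is plain JSON. A rough sketch of the raw interaction with requests (the username, API key and resource id are placeholders; the bindings do this polling for you):

import time
import requests

AUTH = "username=alfred;api_key=0123456789abcdef0123456789abcdef01234567"  # placeholder
BASE = "https://bigml.io"

# Create a dataset from an existing source: a single POST of a JSON body
dataset = requests.post(BASE + "/dataset?" + AUTH,
                        json={"source": "source/5643d345f43a234ff2310a3e"}).json()

# Poll the immutable, UUID-addressed resource until it reaches FINISHED (code 5)
while dataset["status"]["code"] != 5:
    time.sleep(2)
    dataset = requests.get(BASE + "/" + dataset["resource"] + "?" + AUTH).json()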
Example workflow
27 / 61
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()

# source to dataset
source = 'source/5643d345f43a234ff2310a3e'
dataset = api.create_dataset(source)
api.ok(dataset)

# 80/20 train/test split, fixed seed for reproducibility
r, s = 0.8, "seed"
train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
test_dataset = api.create_dataset(dataset,
                                  {"rate": r, "seed": s, "out_of_bag": True})
api.ok(train_dataset)

# model on the training split, evaluation on the held-out split
model = api.create_model(train_dataset)
api.ok(model)
api.ok(test_dataset)
evaluation = api.create_evaluation(model, test_dataset)
api.ok(evaluation)
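(Here api.ok blocks, polling each resource until it reaches its FINISHED state, so every step waits for the previous one to complete.)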
28 / 61
Is this production code?
How do we generalize to, say, 100 datasets?
29 / 61
Example workflow: Python bindings
# Now do it 100 times, serially
model, evaluation = [], []
for i in range(0, 100):
    r, s = 0.8, i
    train = api.create_dataset(dataset, {"rate": r, "seed": s})
    test = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
    api.ok(train)
    model.append(api.create_model(train))
    api.ok(model[i])
    api.ok(test)
    evaluation.append(api.create_evaluation(model[i], test))
    api.ok(evaluation[i])
30 / 61
Example workflow: Python bindings
# More efficient if we parallelize, but at what level?
train, test, model, evaluation = [], [], [], []
for i in range(0, 100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
    # Do we wait here?
    api.ok(train[i])
    api.ok(test[i])
for i in range(0, 100):
    model.append(api.create_model(train[i]))
    api.ok(model[i])
for i in range(0, 100):
    evaluation.append(api.create_evaluation(model[i], test[i]))
    api.ok(evaluation[i])
31 / 61
Example workflow: Python bindings
# More efficient if we parallelize, but at what level?
train, test, model, evaluation = [], [], [], []
for i in range(0, 100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
for i in range(0, 100):
    # Or do we wait here?
    api.ok(train[i])
    model.append(api.create_model(train[i]))
for i in range(0, 100):
    # and here?
    api.ok(model[i])
    api.ok(test[i])
    evaluation.append(api.create_evaluation(model[i], test[i]))
    api.ok(evaluation[i])
32 / 61
Example workflow: Python bindings
# More efficient if we parallelize, but how do we handle errors??
train, test, model, evaluation = [], [], [], []
for i in range(0, 100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
for i in range(0, 100):
    api.ok(train[i])
    model.append(api.create_model(train[i]))
for i in range(0, 100):
    try:
        api.ok(model[i])
        api.ok(test[i])
        evaluation.append(api.create_evaluation(model[i], test[i]))
        api.ok(evaluation[i])
    except Exception:
        # How to recover if test[i] failed? New datasets? Abort?
        pass
33 / 61
Client-side Machine Learning Automation
Problems of bindings-based client-side solutions
Complexity Lots of details outside the problem domain
Reuse No inter-language compatibility
Scalability Client-side workflows are hard to optimize
Reproducibility Noisy, complex and hard to audit development environment
Not enough abstraction
34 / 61
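To make the scalability point concrete: a hypothetical sketch of what a robust client-side parallelization of the 100-evaluation loop needs (thread pool, per-resource waiting, explicit error collection), none of which is Machine Learning. The dataset id is a placeholder:

from concurrent.futures import ThreadPoolExecutor, as_completed
from bigml.api import BigML

api = BigML()
dataset = "dataset/5643d345f43a234ff2310a3f"  # placeholder id

def evaluate_split(seed):
    # One complete train/test/model/evaluation pipeline for a single split
    args = {"rate": 0.8, "seed": seed}
    train = api.create_dataset(dataset, args)
    test = api.create_dataset(dataset, dict(args, out_of_bag=True))
    api.ok(train)
    model = api.create_model(train)
    api.ok(model)
    api.ok(test)
    evaluation = api.create_evaluation(model, test)
    api.ok(evaluation)
    return evaluation

evaluations, failures = [], []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(evaluate_split, i): i for i in range(100)}
    for future in as_completed(futures):
        try:
            evaluations.append(future.result())
        except Exception as err:
            failures.append((futures[future], err))  # retry? new split? abort?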
A partial solution: CLI declarative tools
# "1-click" dataset with parameterized fields
bigmler --train data/diabetes.csv 
--no-model 
--name "4-featured diabetes" 
--dataset-fields 
"plasma glucose,insulin,diabetes pedigree,diabetes" 
--output-dir output/diabetes 
--project "Certification Workshop"
# "1-click" ensemble
bigmler --train data/iris.csv 
--number-of-models 500 
--sample-rate 0.85 
--output-dir output/iris-ensemble 
--project "Certification Workshop"
35 / 61
Rich, parameterized workflows: cross-validation
# cross-validation with a parameterized input dataset and 3 folds
bigmler analyze --cross-validation \
                --dataset $(cat output/diabetes/dataset) \
                --k-folds 3 \
                --output-dir output/diabetes-validation
36 / 61
Client-side Machine Learning automation
Problems of client-side solutions
Hard to generalize Declarative client tools hide complexity at the cost of flexibility
Hard to combine Black–box tools cannot be easily integrated as parts of bigger
client–side workflows
Hard to audit Client–side development environments are complex and very hard
to sandbox
Not enough automation
37 / 61
Client-side Machine Learning automation
Problems of client-side solutions
Complex Too fine-grained, leaky abstractions
Cumbersome Error handling, network issues
Hard to reuse Tied to a single programming language
Hard to scale Parallelization again a problem
Hard to generalize Declarative client tools hide complexity at the cost of flexibility
Hard to combine Black–box tools cannot be easily integrated as parts of bigger
client–side workflows
Hard to audit Client–side development environments are complex and very hard
to sandbox
Not enough abstraction
37 / 61
Client-side Machine Learning automation
Problems of client-side solutions
Complex Too fine-grained, leaky abstractions
Cumbersome Error handling, network issues
Hard to reuse Tied to a single programming language
Hard to scale Parallelization again a problem
Hard to generalize Declarative client tools hide complexity at the cost of flexibility
Hard to combine Black–box tools cannot be easily integrated as parts of bigger
client–side workflows
Hard to audit Client–side development environments are complex and very hard
to sandbox
Algorithmic complexity and computing resources management problems, mostly washed away before, are back!
37 / 61
Client-side Machine Learning automation
Algorithmic complexity and computing resources management problems are back!
38 / 61
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
39 / 61
Machine Learning Automation
40 / 61
Solution (scalability, reuse): Back to the server
41 / 61
Server–side automation: Scriptify
42 / 61
Solution (complexity, reuse): Domain–specific languages
43 / 61
In a Nutshell
1. Workflows reified as server–side, RESTful resources
2. Domain–specific language for ML workflow automation
44 / 61
Workflows as RESTful Resources
Library Reusable building-block: a collection of WhizzML
definitions that can be imported by other libraries or
scripts.
Script Executable code that describes an actual workflow.
• Imports List of libraries with code used by the script.
• Inputs List of input values that parameterize the
workflow.
• Outputs List of values computed by the script and
returned to the user.
Execution Given a script and a complete set of inputs, the workflow
can be executed and its outputs generated.
45 / 61
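For instance, with the Python bindings a script is created like any other resource; a sketch (the WhizzML source and the input/output declarations here are illustrative, not a canonical example):

from bigml.api import BigML
api = BigML()

# A script is just one more RESTful resource, built from WhizzML source code
script = api.create_script(
    "(define evaluation (create-evaluation model data))",
    {"inputs": [{"name": "model", "type": "model-id"},
                {"name": "data", "type": "dataset-id"}],
     "outputs": [{"name": "evaluation", "type": "evaluation-id"}]})
api.ok(script)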
Workflows as RESTful Resources: the bazaar
46 / 61
Workflows as RESTful Resources: metaprogramming
Resources that create
resources that create
resources that create
resources that create...
47 / 61
Different ways of executing WhizzML Scripts
Web UI, BigMLer, Bindings −→ Executions
48 / 61
Executing WhizzML scripts: bindings
from bigml.api import BigML
api = BigML()

# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters, as [name, value] pairs
inputs = {'inputs': [['source', 'source/5643d345f43a234ff2310a3e']]}
# execute
api.ok(api.create_execution(script, inputs))
49 / 61
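Once the execution finishes, its results travel back inside the resource itself; a sketch of reading them (the exact nesting of the output fields is an assumption):

execution = api.create_execution(script, inputs)
api.ok(execution)

# Refresh and read the outputs, a list of [name, value, type] triples
execution = api.get_execution(execution)
outputs = execution["object"]["execution"]["outputs"]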
Creating and executing WhizzML scripts with BigMLer
bigmler execute --code "(+ 1 2)" --output-dir simple_exe
bigmler execute --script script/50a2bb64035d0706db000643
bigmler execute --script script/50a2bb64035d0706db000643 \
                --inputs my_inputs.json
bigmler execute --code '(define addition (+ a b))' \
                --declare-inputs my_inputs_dec.json \
                --declare-outputs my_outputs_dec.json \
                --no-execute
50 / 61
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
51 / 61
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()

# source to dataset
source = 'source/5643d345f43a234ff2310a3e'
dataset = api.create_dataset(source)
api.ok(dataset)

# 80/20 train/test split, fixed seed for reproducibility
r, s = 0.8, "seed"
train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
test_dataset = api.create_dataset(dataset,
                                  {"rate": r, "seed": s, "out_of_bag": True})
api.ok(train_dataset)

# model on the training split, evaluation on the held-out split
model = api.create_model(train_dataset)
api.ok(model)
api.ok(test_dataset)
evaluation = api.create_evaluation(model, test_dataset)
api.ok(evaluation)
52 / 61
Syntactic Abstraction in WhizzML: Simple workflow
;; ML artifacts are first-class citizens,
;; we only need to talk about our domain
(let ([train-id test-id] (create-dataset-split id 0.8)
      model-id (create-model train-id))
  (create-evaluation test-id
                     model-id
                     {"name" "Evaluation 80/20"
                      "missing_strategy" 0}))
Ready for production!
53 / 61
Domain Specificity and Scalability: Trivial parallelization
;; Workflow for 1 resource
(let ([train-id test-id] (create-dataset-split id 0.8)
      model-id (create-model train-id))
  (create-evaluation test-id model-id))
54 / 61
Domain Specificity and Scalability: Trivial parallelization
;; Workflow for arbitrary number of resources
(let (splits (for (id input-datasets)
               (create-dataset-split id 0.8)))
  (for (split splits)
    (create-evaluation (create-model (split 0)) (split 1))))
Ready for production!
55 / 61
ML workflows for real
56 / 61
Syntactic Abstraction in WhizzML: Simple workflow
(let (score (create-anomalyscore anomaly-id input))
  (if (> score threshold)
      (raise "Input is too weird to predict")
      (create-prediction model-id input)))
Ready for production!
57 / 61
Domain Specificity and Scalability: Trivial parallelization
(for (input inputs)
  (when (< (create-anomalyscore anomaly-id input) threshold)
    (create-prediction model-id input)))
Ready for production!
58 / 61
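For comparison, the same guard client-side with the Python bindings (a sketch; THRESHOLD and the resource ids are placeholders):

from bigml.api import BigML

api = BigML()
THRESHOLD = 0.6  # placeholder anomaly-score cutoff

def guarded_prediction(anomaly_id, model_id, input_data):
    # Predict only when the input does not look anomalous
    score = api.create_anomaly_score(anomaly_id, input_data)
    api.ok(score)
    if score["object"]["score"] > THRESHOLD:
        raise ValueError("Input is too weird to predict")
    prediction = api.create_prediction(model_id, input_data)
    api.ok(prediction)
    return prediction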
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
59 / 61
Package and deploy BigML workflows in a few clicks
1 Create an Application
2 Connect to BigML and add workflows and models
3 Package everything in a container
4 Deploy and monitor your application
60 / 61
61 / 61
