Automating Machine Learning
API, bindings, BigMLer and Basic Workflows
#VSSML16
September 2016
#VSSML16 Automating Machine Learning September 2016 1 / 43
Outline
1 Machine Learning workflows
2 Client-side workflows: REST API and bindings
3 Client-side workflows: Bigmler
4 Server-side workflows: WhizzML
5 Example Workflow Walk-throughs
1 Machine Learning workflows
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
• APIs: ML building blocks
• Abstraction layer over feature engineering
• Abstraction layer over algorithms
• Automation
Machine Learning workflows
Machine Learning workflows, for real
Higher-level Machine Learning
2 Client-side workflows: REST API and bindings
Example workflow: Batch Centroid
Objective: Label each row in a Dataset with its associated centroid.
We need to...
• Create Dataset
• Create Cluster
• Create BatchCentroid from Cluster and Dataset
• Save BatchCentroid as new Dataset
Example workflow: building blocks
# AUTH is assumed to hold "username=...;api_key=..."
curl -X POST "https://bigml.io/dataset?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"source": "source/56fbbfea200d5a3403000db7"}'
curl -X POST "https://bigml.io/cluster?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65",
          "cluster": "cluster/33e2e231a34fff333000b65"}'
curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
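The same building blocks can be sketched in Python without touching the network, just to show how the three creation requests chain together. This is a minimal sketch: the helper names `build_request` and `batch_centroid_requests` are hypothetical, and the URL shape (resource type in the path, credentials in the query string) is an assumption.

```python
import json

BASE_URL = "https://bigml.io"  # base URL as shown on the slide


def build_request(resource_type, payload, auth):
    """Return (url, body) for a POST that creates one resource.

    Assumes the resource type goes in the path and the credentials
    in the query string.
    """
    url = f"{BASE_URL}/{resource_type}?{auth}"
    return url, json.dumps(payload)


def batch_centroid_requests(source_id, dataset_id, cluster_id, auth):
    """List the three creation requests of the workflow, in order."""
    return [
        build_request("dataset", {"source": source_id}, auth),
        build_request("cluster", {"dataset": dataset_id}, auth),
        build_request("batchcentroid",
                      {"dataset": dataset_id, "cluster": cluster_id}, auth),
    ]
```

Each request needs the id returned by the previous one, which is why a client has to wait between steps.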
Example workflow: Web UI
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
# create dataset and cluster, waiting for both
dataset = api.create_dataset(source)
api.ok(dataset)
cluster = api.create_cluster(dataset)
api.ok(cluster)
# create new dataset with centroid
new_dataset = api.create_batch_centroid(cluster, dataset,
                                        {'output_dataset': True,
                                         'all_fields': True})
# wait again, via polling, until the job is finished
api.ok(new_dataset)
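What "waiting via polling" amounts to can be sketched with a stand-in status function instead of real requests. This is not the bindings' implementation; the status codes (5 for finished, -1 for faulty) and the helper name `wait_until_finished` are assumptions made for illustration.

```python
import time

FINISHED = 5   # assumed status code for a finished resource
FAULTY = -1    # assumed status code for a failed resource


def wait_until_finished(get_status, max_polls=10, delay=0.0):
    """Poll get_status() until the resource finishes, fails, or we give up."""
    for _ in range(max_polls):
        code = get_status()
        if code == FINISHED:
            return True
        if code == FAULTY:
            return False
        time.sleep(delay)  # back off before asking again
    return False


# Stand-in for repeated GET requests: queued, in progress twice, finished.
fake_statuses = iter([1, 3, 3, 5])
done = wait_until_finished(lambda: next(fake_statuses))
```

Every poll is a client-server round trip, which is one of the costs that server-side workflows avoid.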
3 Client-side workflows: Bigmler
Simple workflow in a one-liner
# 1-click cluster
bigmler cluster \
    --output-dir output/job \
    --train data/iris.csv \
    --test-datasets output/job/dataset \
    --remote \
    --to-dataset
# the created dataset id:
cat output/job/batch_centroid_dataset
Simple automation: “1-click” tasks
# "1-click" ensemble
bigmler --train data/iris.csv \
    --number-of-models 500 \
    --sample-rate 0.85 \
    --output-dir output/iris-ensemble \
    --project "vssml tutorial"
# "1-click" dataset with parameterized fields
bigmler --train data/diabetes.csv \
    --no-model \
    --name "4-featured diabetes" \
    --dataset-fields \
    "plasma glucose,insulin,diabetes pedigree,diabetes" \
    --output-dir output/diabetes \
    --project vssml_tutorial
Rich, parameterized workflows: cross-validation
# --cross-validation: parameterized input
# --k-folds: number of folds during validation
bigmler analyze --cross-validation \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 3 \
    --output-dir output/diabetes-validation
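For readers new to k-fold cross-validation, the idea behind the number of folds can be sketched in plain Python. This is an illustration only: how Bigmler actually assigns rows to folds is not stated in the slides, so the interleaved assignment below and the function names are assumptions.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_rows))
    for fold in range(k):
        test = indices[fold::k]          # every k-th row, starting at `fold`
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def cross_validate(n_rows, k, evaluate):
    """Average the score returned by evaluate(train, test) over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_rows, k)]
    return sum(scores) / len(scores)
```

With 3 folds, each row is used for testing exactly once and for training twice.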
Rich, parameterized workflows: feature selection
# --features: parameterized input
# --k-folds: number of folds during validation
# --staleness: stopping criterion
# --optimize: optimization metric
# --penalty: algorithm parameter
bigmler analyze --features \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 2 \
    --staleness 2 \
    --optimize precision \
    --penalty 1 \
    --output-dir output/diabetes-features-selection
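How a staleness limit and a per-feature penalty can steer a feature search is easier to see in a toy version. The sketch below is a generic greedy forward selection, not Bigmler's actual algorithm; the function name, the way the penalty is applied, and the toy scoring function are all assumptions.

```python
def select_features(features, score, staleness=2, penalty=0.0):
    """Greedy forward selection.

    score(subset) returns the metric to maximize (for example precision).
    Each selected feature costs `penalty`; the search stops after
    `staleness` consecutive rounds that do not improve the best value.
    """
    selected, best_subset = [], []
    best_value = float("-inf")
    stale = 0
    remaining = list(features)
    while remaining and stale < staleness:
        # Try adding each remaining feature and keep the best candidate.
        candidates = [(score(selected + [f]) - penalty * (len(selected) + 1), f)
                      for f in remaining]
        value, feature = max(candidates)
        selected.append(feature)
        remaining.remove(feature)
        if value > best_value:
            best_value, best_subset, stale = value, list(selected), 0
        else:
            stale += 1
    return best_subset, best_value


# Toy metric: only two features carry signal.
USEFUL = {"plasma glucose": 0.5, "insulin": 0.25}


def toy_score(subset):
    return sum(USEFUL.get(f, 0.0) for f in subset)
```

In a real run, `score` would be the cross-validated metric, so every candidate subset costs a full round of model building and evaluation.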
4 Server-side workflows: WhizzML
Client-side Machine Learning Automation
Problems of client-side solutions
• Complexity: lots of details outside the problem domain
• Reuse: no inter-language compatibility
• Scalability: client-side workflows are hard to optimize
• Extensibility: Bigmler hides complexity at the cost of flexibility
Not enough abstraction
Server-side Machine Learning
WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  Sophisticated server-side optimization
  Out-of-the-box scalability
  Client-server brittleness removed
  Infrastructure for creating and sharing ML scripts and libraries
WhizzML REST Resources
Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
• Imports: List of libraries with code used by the script.
• Inputs: List of input values that parameterize the workflow.
• Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
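The requirement of "a complete set of inputs" can be illustrated with a small client-side check. This is a hypothetical helper, not part of any BigML library; the `default` key used to mark optional inputs is an assumption, and the input names are borrowed from the script metadata shown later in the deck.

```python
def missing_inputs(declared, provided):
    """Names of declared inputs with no provided value and no default."""
    return [spec["name"] for spec in declared
            if spec["name"] not in provided and "default" not in spec]


declared = [
    {"name": "input-dataset", "type": "dataset-id"},
    {"name": "field", "type": "string"},
]
provided = {"input-dataset": "dataset/43ffe231a34fff333000b65"}
```

Here `missing_inputs(declared, provided)` reports that `field` is still needed before the script could be executed.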
Different ways to create WhizzML Scripts/Libraries
Github
Script editor
Gallery
Other scripts
Scriptify
Basic workflow in WhizzML
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))
Basic workflow in WhizzML: Usable by any binding
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters as [name, value] pairs
inputs = {'inputs': [['source', 'source/5643d345f43a234ff2310a3e']]}
# execute
api.ok(api.create_execution(script, inputs))
Basic workflow in WhizzML: Trivial parallelization
;; Workflow for 1 resource
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))
;; Workflow for any number of resources
(let (datasets (map create-dataset sources)
      clusters (map create-cluster datasets)
      params {"output_dataset" true "all_fields" true})
  (map (lambda (d c) (create-batchcentroid d c params))
       datasets
       clusters))
Basic workflows in WhizzML: automatic generation
Standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  Creation: create-source, create-and-wait-dataset, etc.
  Retrieval: fetch, list-anomalies, etc.
  Update: update
  Deletion: delete
• Machine Learning Algorithms (SMACDown, Boosting, etc.)
5 Example Workflow Walk-throughs
Model or Ensemble?
• Split a dataset in test and training parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
Model or Ensemble?
;; Functions for creating the two dataset parts
;; Sample a dataset taking a fraction of its rows (rate) and
;; keeping either that fraction (out-of-bag? false) or its
;; complement (out-of-bag? true)
(define (sample-dataset origin-id rate out-of-bag?)
  (create-dataset {"origin_dataset" origin-id
                   "sample_rate" rate
                   "out_of_bag" out-of-bag?
                   "seed" "example-seed-0001"}))
;; Create in parallel two complementary parts of a dataset using
;; the sample function twice. Return a list of the two
;; new dataset ids.
(define (split-dataset origin-id rate)
  (list (sample-dataset origin-id rate false)
        (sample-dataset origin-id rate true)))
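Why the same seed must be used for both calls can be shown with a plain-Python analogue. This is a sketch of the idea only: the platform's sampling procedure is not described in the slides, and the per-row random draw and function names below are assumptions.

```python
import random


def sample_rows(n_rows, rate, seed, out_of_bag=False):
    """Return the row indices kept by a seeded sample, or its complement."""
    rng = random.Random(seed)                  # same seed, same decisions
    picked = [rng.random() < rate for _ in range(n_rows)]
    return [i for i, keep in enumerate(picked) if keep != out_of_bag]


def split_rows(n_rows, rate, seed="example-seed-0001"):
    """Split row indices into (train, test) by using the same seed twice."""
    return (sample_rows(n_rows, rate, seed, out_of_bag=False),
            sample_rows(n_rows, rate, seed, out_of_bag=True))
```

Because both calls replay the same random decisions, the two parts are disjoint and together cover every row; with different seeds they could overlap.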
Model or Ensemble?
;; Functions to create an ensemble and to extract the f-measure from
;; an evaluation, given its id.
(define (make-ensemble ds-id size)
  (create-ensemble ds-id {"number_of_models" size}))
(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))
Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset {"source" src-id})
        [train-id test-id] (split-dataset ds-id 0.8)
        m-id (create-model train-id)
        e-id (make-ensemble train-id 15)
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))
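For reference, the quantity being compared can be written out for a single class. The sketch below computes the standard F1 score from confusion counts and mirrors the final selection rule; the evaluation in the script reports an average over classes, which is not reproduced here, and the function names are hypothetical.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """F1 score: harmonic mean of precision and recall for one class."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def pick_better(model_id, model_f, ensemble_id, ensemble_f):
    """Mirror of (if (> m-f e-f) m-id e-id): ties go to the ensemble."""
    return model_id if model_f > ensemble_f else ensemble_id
```

Note that the strict comparison means the ensemble is returned whenever the two scores are equal.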
Transforming item counts to features
basket            milk  eggs  flour  salt  chocolate  caviar
milk,eggs         Y     Y     N      N     N          N
milk,flour        Y     N     Y      N     N          N
milk,flour,eggs   Y     Y     Y      N     N          N
chocolate         N     N     N      N     Y          N
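The transformation in the table can be sketched in a few lines of Python. This is only an illustration of the target output, computed locally; the function name and the comma-separated basket format are assumptions based on the table.

```python
ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]


def basket_features(basket, items=ITEMS, yes="Y", no="N"):
    """Map a comma-separated basket to one Y/N value per known item."""
    present = {item.strip() for item in basket.split(",")}
    return [yes if item in present else no for item in items]
```

For example, `basket_features("milk,flour,eggs")` reproduces the third row of the table.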
Item counts to features with Flatline
(if (contains-items? "basket" "milk") "Y" "N")
(if (contains-items? "basket" "eggs") "Y" "N")
(if (contains-items? "basket" "flour") "Y" "N")
(if (contains-items? "basket" "salt") "Y" "N")
(if (contains-items? "basket" "chocolate") "Y" "N")
(if (contains-items? "basket" "caviar") "Y" "N")
Parameterized code generation
• Field name
• Item values
• Y/N category names
Flatline code generation with WhizzML
;; Target Flatline expression:
;; (if (contains-items? "basket" "milk") "Y" "N")
(let (field "basket"
      item "milk"
      yes "Y"
      no "N")
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (item-fields field items yes no)
  (for (item items)
    {"field" (field-flatline field item yes no)}))
(define (dataset-item-fields ds-id field)
  (let (ds (fetch ds-id)
        item-dist (ds ["fields" field "summary" "items"])
        items (map head item-dist))
    (item-fields field items "Y" "N")))
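The same parameterized code generation can be mimicked in Python to make the string templating explicit. This is a sketch, not a replacement for the WhizzML `flatline` helper: the function names are hypothetical, and using `json.dumps` to produce Flatline string literals assumes the two quoting rules agree for simple values.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    """Build the Flatline expression for one item as a string."""
    # json.dumps adds the double quotes around each value (assumed to
    # match Flatline string literals for simple values).
    return "(if (contains-items? {} {}) {} {})".format(
        *(json.dumps(value) for value in (field, item, yes, no)))


def item_fields(field, items, yes="Y", no="N"):
    """One new-field definition per item, as in the WhizzML version."""
    return [{"field": field_flatline(field, item, yes, no)} for item in items]
```

Calling `field_flatline("basket", "milk")` yields the expression written by hand on the earlier slide.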
(define output-dataset
  (let (fs {"new_fields" (dataset-item-fields input-dataset
                                              field)})
    (create-dataset input-dataset fs)))
{"inputs": [{"name": "input-dataset",
             "type": "dataset-id",
             "description": "The input dataset"},
            {"name": "field",
             "type": "string",
             "description": "Id of the items field"}],
 "outputs": [{"name": "output-dataset",
              "type": "dataset-id",
              "description": "The id of the generated dataset"}]}
More information
Resources
• Home: https://bigml.com/whizzml
• Documentation: https://bigml.com/whizzml#documentation
• Examples: https://github.com/whizzml/examples