Brazilian Summer School in Machine Learning 2016
Day 2 - Lecture 4: Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking
Lecturer: Dr. José Antonio Ortega - jao (BigML)
BSSML16 L8. REST API, Bindings, and Basic Workflows - BigML, Inc
Brazilian Summer School in Machine Learning 2016
Day 2 - Lecture 3: REST API, Bindings, and Basic Workflows
Lecturer: Dr. José Antonio Ortega - jao (BigML)
Logistic Regression is one of the most popular Machine Learning methods for solving classification problems. With Logistic Regressions in your Dashboard and in the BigML API, you will be able to easily create and download models to your environment for fast local predictions.
VSSML16 L7. REST API, Bindings, and Basic Workflows - BigML, Inc
Valencian Summer School in Machine Learning 2016
Day 2 VSSML16
Lecture 7
REST API, Bindings, and Basic Workflows
jao -- Jose A. Ortega (BigML)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2016
A developer's overview of the world of predictive APIs - Louis Dorard
Predictive APIs are making it easier to integrate Machine Learning in your apps and to add predictive features to them. Starting with some basics we'll see what the different types of APIs are and we'll give some examples of proprietary predictive APIs. We'll go over some ways of exposing your own predictive models as APIs served by 3rd party platforms, and open source frameworks for creating and serving your own APIs on your infrastructure of choice. We'll give some remarks on recent (and missing) tools to make it easier to use and compare all these APIs. Finally, we'll give some pointers to a Virtual Machine to help you get started with these technologies...
Slides from my talk at the Valencian Summer School on Machine Learning (#VSSML15)
Learn all you need to know about BigML's implementation of Latent Dirichlet Allocation (LDA), one of the most popular probabilistic methods for topic modeling. Topic Models, BigML's latest resource, helps you find relevant terms thematically related in your unstructured text data. With the BigML Topic Models in your Dashboard and in the BigML API, you will be able to discover the hidden topics in your text fields and use them as final output for information retrieval tasks, collaborative filtering, or for assessing document similarity, among others. You can also use the topics discovered as input features to train other models.
VSSML17 L5. Basic Data Transformations and Feature Engineering - BigML, Inc
Valencian Summer School in Machine Learning 2017 - Day 2
Lecture 5: Basic Data Transformations and Feature Engineering. By Poul Petersen (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
Valencian Summer School in Machine Learning 2017 - Day 2
Lecture Review: Summary Day 2 Sessions. By Mercè Martín Prats (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
Our Summer 2017 release presents Deepnets, a highly effective supervised learning method that solves classification and regression problems in a way that can match or exceed human performance, especially in domains where effective feature engineering is difficult. BigML Deepnets bring two unique parameter optimization options: Automatic Network Search and Structure Suggestion. These options avoid the difficult and time-consuming work of hand-tuning the algorithm and ensure the best network among all possible networks to solve your problem. This new resource is available from the BigML Dashboard, API, as well as from WhizzML for its automation. Deepnets are state-of-the-art in many important supervised learning applications.
Automating your own Machine Learning Projects - Workshop: Working with the Masters.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Faunus is a graph analytics engine built atop the Hadoop distributed computing platform. The graph representation is a distributed adjacency list, whereby a vertex and its incident edges are co-located on the same machine. Querying a Faunus graph is possible with a MapReduce-variant of the Gremlin graph traversal language. A Gremlin expression compiles down to a series of MapReduce-steps that are sequence optimized and then executed by Hadoop. Results are stored as transformations to the input graph (graph derivations) or computational side-effects such as aggregates (graph statistics). Beyond querying, a collection of input/output formats are supported which enable Faunus to load/store graphs in the distributed graph database Titan, various graph formats stored in HDFS, and via arbitrary user-defined functions. This presentation will focus primarily on Faunus, but will also review the satellite technologies that enable it.
VSSML16 L5. Basic Data Transformations
Valencian Summer School in Machine Learning 2016
Day 2 VSSML16
Lecture 5
Basic Data Transformations
Poul Petersen (BigML)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2016
VSSML16 LR1. Summary Day 1
Valencian Summer School in Machine Learning 2016
Day 1
Summary Day 1
Mercè Martin (BigML)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2016
VSSML16 LR2. Summary Day 2
Valencian Summer School in Machine Learning 2016
Day 2 VSSML16
Summary Day 2
Mercè Martin (BigML)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2016
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc... - Ian Lumb
Watch On Demand Anytime via http://www.univa.com/resources/webinar-machine-learning.php
Armed with nothing more than an Apache Spark toting laptop, you have all the trappings required to prototype the application of Machine Learning against your data-science needs. From programmability in Scala, Java or Python, to built-in support for Machine Learning via MLlib, Spark is an exceedingly effective enabler that allows you to rapidly produce results.
Of course, as soon as your prototyping proves successful, you'll want to scale out to embrace the volume, variety and velocity that characterizes today's Big Data demands... in production. Because Spark is as comfortable on an isolated laptop as it is in a distributed-computing environment, addressing Big Data requirements in production boils down to effectively and efficiently embracing containers and clusters for Big Data Analytics.
And this is where offerings from Univa shine - i.e., in making the transition from prototype to production completely seamless. For some use cases, it makes sense to scale-in Spark based applications within Docker containers via Univa Grid Engine Container Edition or Navops by Univa; whereas in others, Spark is interfaced (as a Mesos-compliant framework) with Univa Universal Resource Broker, to permit scaling out on a cluster. In both scenarios, your production Spark applications are scheduled alongside other classes of workload - without a need for dedicated resources.
Agenda:
• Overview of Apache Spark as a platform for Deep Learning - from Python-based Jupyter Notebooks to Spark's Machine Learning library MLlib
• Overview of prototyping Machine Learning via Apache Spark on a laptop - without and within Docker containers
• Introductions to Univa Grid Engine Container Edition and Univa Universal Resource Broker plus Navops by Univa
• Overview of production Big Data Analytics platforms for Machine Learning
• Docker-containerized Apache Spark and Univa Grid Engine Container Edition
• Docker-containerized Apache Spark and Navops by Univa
• Apache Spark plus Univa Universal Resource Broker
• Introducing support for GPUs without and within Docker containers
• Use case example - using Machine Learning to classify data from Twitter without and within Docker containers
• Summary and next steps
WhizzML is a domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML offers out-of-the-box scalability, abstracts away the complexity of the underlying infrastructure, and helps analysts, developers, and scientists reduce the burden of repetitive and time-consuming analytics tasks.
VSSML17 L7. REST API, Bindings, and Basic Workflows - BigML, Inc
Valencian Summer School in Machine Learning 2017 - Day 2
Lecture 7: REST API, Bindings, and Basic Workflows. By jao - Jose A. Ortega - (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
▪ Developed a recursive-descent parser to generate an intermediate representation for subsequent optimizations in Java
▪ Implemented common subexpression elimination and copy propagation on the control flow graph
▪ Deployed a code generator for the source language that yields optimized native programs
An introduction to Augustus, an open source scoring engine for statistical and data mining models based on the Predictive Model Markup Language (PMML). Augustus is able to produce and consume models with 10,000s of segments. Developed by Open Data Group, written in Python, PMML 4.0 compliant and freely available.
Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.
This tutorial introduces the basic idea of machine learning with a very simple example. Machine learning teaches machines (and me too) to carry out tasks and learn concepts by themselves. It is that simple, so here is an overview:
http://www.softwareschule.ch/examples/machinelearning.jpg
Headache from using mathematical software - PVS-Studio
It so happened that over a certain period of time I was involved in several seemingly unrelated discussions on the Internet: free alternatives to Matlab for universities and students, and finding errors in algorithms with the help of static code analysis. What brought all these discussions together was the terrible quality of the code of modern programs, in particular software written for mathematicians and scientists. This immediately raises the question of how far we can trust the calculations and studies performed with such programs. We will try to reflect on this topic and look for the errors.
Quite often, software developers have absolutely no idea of the cost of an error. It is very important that errors be found at the earliest possible stage.
Machine Learning ("statistical learning" or "predictive analytics") is moving out of research labs and specialist circles and is increasingly being used inside companies, and not only startups. Witness the rise of the open-source toolkit Scikit-learn, which quickly spread internationally as one of the new standards for this new way of building software, as well as the availability since July 2014 of Azure ML, the Machine Learning service of Microsoft Azure. In this session we offer an overview of developing statistical learning software in Python with Scikit-learn. We invite one of the main contributors to this toolkit, Olivier Grisel, research engineer in the Inria PARIETAL team at Saclay, to present an overview in an interactive session based on numerous examples and demos. To learn more: http://scikit-learn.org https://team.inria.fr/parietal/ https://twitter.com/ogrisel
Similar to BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking
Digital Transformation and Process Optimization in Manufacturing - BigML, Inc
Keyanoush Razavidinani, Digital Services Consultant at A1 Digital, a BigML Partner, highlights why it is important to identify and reduce human bottlenecks in order to optimize processes and let you focus on important activities. Additionally, Guillem Vidal, Machine Learning Engineer at BigML, completes the session by showcasing how Machine Learning is put to use in the manufacturing industry, with a use case on detecting factory failures.
The Road to Production: Automating your Anomaly Detectors - by jao (Jose A. Ortega), Co-Founder and Chief Technology Officer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML for AML Compliance - BigML, Inc
Machine Learning for Anti Money Laundering Compliance, by Kevin Nagel, Consultant and Data Scientist at INFORM.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Multi Perspective Anomalies - BigML, Inc
Multi Perspective Anomalies, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - My First Anomaly Detector - BigML, Inc
My First Anomaly Detector: Practical Workshop, by Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - History and Developments in ML - BigML, Inc
History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of computer science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
Introduction to End-to-End Machine Learning: Classification and Regression - Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - A Data-Driven Company - BigML, Inc
A Data-Driven Company: 21 Lessons for Large Organizations to Create Value from AI, by Richard Benjamins, Chief AI and Data Strategist at Telefónica.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML in the Legal Sector - BigML, Inc
How Machine Learning Transforms and Automates Legal Services, by Arnoud Engelfriet, Co-Founder at Lynn Legal.
*Machine Learning School in The Netherlands 2022.
Machine Learning for Public Safety: Reducing Violence and Discrimination in Stadiums.
Speakers: Ramon van Ingen, Co-Founder at Siip, Entrepreneur, Researcher, and Pablo González, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants - BigML, Inc
Process Optimization in Manufacturing Plants, by Keyanoush Razavidinani, Digital Business Consultant at A1 Digital.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Anomaly Detection at Scale - BigML, Inc
Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Citizen Development in AI - BigML, Inc
Citizen Development in AI, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
This new feature is a continuation of and improvement on our previous Image Processing release. Now, Object Detection lets you go a step further with your image data and allows you to locate objects and annotate regions in your images. Once your image regions are defined, you can train and evaluate Object Detection models, make predictions with them, and automate end-to-end Machine Learning workflows on a single platform. To make that possible, BigML enables Object Detection by introducing the regions optype.
As with any other BigML feature, Object Detection is available from the BigML Dashboard, API, and WhizzML for automation. Object Detection is extremely helpful to tackle a wide range of computer vision use cases such as medical image analysis, quality control in manufacturing, license plate recognition in transportation, people detection in security surveillance, among many others.
This new release brings Image Processing to the BigML platform, a feature that enhances our offering to solve image data-driven business problems with remarkable ease of use. Because BigML treats images as any other data type, this unique implementation allows you to easily use image data alongside text, categorical, numeric, date-time, and items data types as input to create any Machine Learning model available in our platform, both supervised and unsupervised.
Now, it is easier than ever to solve a wide variety of computer vision and image classification use cases in a single platform: label your image data, train and evaluate your models, make predictions, and automate your end-to-end Machine Learning workflows. As with any other BigML feature, Image Processing is available from the BigML Dashboard, API, and WhizzML, and it can be applied to solve use cases such as medical image analysis, visual product search, security surveillance, and vehicle damage detection, among others.
Machine Learning in Retail: Know Your Customers' Customer. See Your Future - BigML, Inc
This session presents a quite common situation for those working in food and beverage retail (FnB) and highlights interesting insights for reducing waste.
Speaker: Stephen Kinns, CEO and Co-Founder at catsAi.
*ML in Retail 2021: Webinar.
Machine Learning in Retail: ML in the Retail Sector - BigML, Inc
This is an introductory session about the role that Machine Learning is playing in the retail sector and how it is being deployed across the different areas of this industry.
Speaker: Atakan Cetinsoy, VP of Predictive Applications at BigML.
*ML in Retail 2021: Webinar.
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot - BigML, Inc
This presentation analyzes the role that Machine Learning plays in legal automation with a real-world Machine Learning application.
Speaker: Arnoud Engelfriet, Co-Founder at Lynn Legal.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac... - BigML, Inc
This is a real-life Machine Learning use case about integrated risk.
Speakers: Thomas Rengersen, Product Owner of the Governance Risk and Compliance Tool for Rabobank, and Thomas Alderse Baas, Co-Founder and Director of The Bowmen Group.
*ML in GRC 2021: Virtual Conference.
Global Situational Awareness of A.I. and where it's headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
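To make the first technique above concrete (skipping computation on vertices that have already converged), here is a minimal Python sketch. It is not the STICD algorithm itself, just a plain power-iteration PageRank, assuming a graph with no dangling nodes, where a vertex is dropped from further updates once its rank change falls below a tolerance:

```python
def pagerank_skip(out_edges, d=0.85, tol=1e-9, max_iter=200):
    """Power-iteration PageRank that stops updating vertices whose rank
    has already converged (assumes every vertex has at least one out-edge)."""
    nodes = list(out_edges)
    n = len(nodes)
    in_edges = {v: [] for v in nodes}          # reverse adjacency
    for u, outs in out_edges.items():
        for v in outs:
            in_edges[v].append(u)
    rank = {v: 1.0 / n for v in nodes}
    converged = set()
    for _ in range(max_iter):
        changed = False
        new_rank = dict(rank)
        for v in nodes:
            if v in converged:
                continue                        # skip stabilized vertices
            r = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            new_rank[v] = (1 - d) / n + d * r
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
            else:
                changed = True
        rank = new_rank
        if not changed:
            break
    return rank

# A 3-cycle is perfectly symmetric, so every vertex ends up with rank 1/3.
ranks = pagerank_skip({"a": ["b"], "b": ["c"], "c": ["a"]})
```

Note the trade-off the abstract alludes to: a skipped vertex keeps its frozen rank even if its in-neighbors later drift slightly, so this saves iteration time at a small accuracy cost.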
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Compressed Sparse Row (CSR) is an adjacency-list based graph representation commonly used by graph algorithms like PageRank.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
2. Outline
1 Server-side workflows: WhizzML
2 Basic Workflow: Model or ensemble?
3 Case study: Using Flatline in Whizzml
4 Advanced Workflows
5 Case Study: Stacked Generalization in WhizzML
#BSSML16 Automating Machine Learning December 2016 2 / 46
4. Client-side Machine Learning Automation
Problems of client-side solutions
Complexity: lots of details outside the problem domain
Reuse: no inter-language compatibility
Scalability: client-side workflows are hard to optimize
Extensibility: BigMLer hides complexity at the cost of flexibility
Not enough abstraction
5. Machine Learning Automation for real
Solution (complexity, reuse): Domain-specific languages
6. Machine Learning Automation for real
Solution (scalability, reuse): Back to the server
8. WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
Sophisticated server-side optimization
Out-of-the-box scalability
Client-server brittleness removed
Infrastructure for creating and sharing ML scripts and libraries
9. WhizzML REST Resources
Library: a reusable building block, a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: executable code that describes an actual workflow.
• Imports: list of libraries whose code is used by the script.
• Inputs: list of input values that parameterize the workflow.
• Outputs: list of values computed by the script and returned to the user.
Execution: given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
10. Different ways to create WhizzML Scripts/Libraries
GitHub
Script editor
Gallery
Other scripts
Scriptify
17. Outline
1 Server-side workflows: WhizzML
2 Basic Workflow: Model or ensemble?
3 Case study: Using Flatline in Whizzml
4 Advanced Workflows
5 Case Study: Stacked Generalization in WhizzML
18. Model or Ensemble?
• Split a dataset into training and test parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with the better evaluation (F-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
19. Model or Ensemble?
;; Functions for creating the two dataset parts
;; Sample a dataset taking a fraction of its rows (rate) and
;; keeping either that fraction (out-of-bag? false) or its
;; complement (out-of-bag? true)
(define (sample-dataset origin-id rate out-of-bag?)
(create-dataset {"origin_dataset" origin-id
"sample_rate" rate
"out_of_bag" out-of-bag?
"seed" "example-seed-0001"}))
;; Create in parallel the two complementary samples of a dataset
;; using the sample function twice. Return a list of the two
;; new dataset ids.
(define (split-dataset origin-id rate)
(list (sample-dataset origin-id rate false)
(sample-dataset origin-id rate true)))
20. Model or Ensemble?
;; Functions to create an ensemble and extract the f-measure from
;; evaluation, given its id.
(define (make-ensemble ds-id size)
(create-ensemble ds-id {"number_of_models" size}))
(define (f-measure ev-id)
(let (ev-id (wait ev-id) ;; because fetch doesn't wait
evaluation (fetch ev-id))
(evaluation ["result" "model" "average_f_measure"])))
21. Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
(let (ds-id (create-dataset {"source" src-id})
[train-id test-id] (split-dataset ds-id 0.8)
m-id (create-model train-id)
e-id (make-ensemble train-id 15)
m-f (f-measure (create-evaluation m-id test-id))
e-f (f-measure (create-evaluation e-id test-id)))
(log-info "model f " m-f " / ensemble f " e-f)
(if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))
22. Outline
1 Server-side workflows: WhizzML
2 Basic Workflow: Model or ensemble?
3 Case study: Using Flatline in Whizzml
4 Advanced Workflows
5 Case Study: Stacked Generalization in WhizzML
23. Transforming item counts to features
basket            milk  eggs  flour  salt  chocolate  caviar
milk,eggs         Y     Y     N      N     N          N
milk,flour        Y     N     Y      N     N          N
milk,flour,eggs   Y     Y     Y      N     N          N
chocolate         N     N     N      N     Y          N
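The slides that follow generate this transformation with Flatline; as a purely illustrative, language-agnostic sketch (the function and variable names here are ours, not part of BigML's API), the same expansion of an items field into Y/N columns looks like this in Python:

```python
def expand_items(rows, items):
    """For each row (a comma-separated basket string), emit a dict
    mapping each known item to "Y" if present, else "N"."""
    out = []
    for row in rows:
        basket = set(row.split(","))
        out.append({item: ("Y" if item in basket else "N") for item in items})
    return out

items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]
features = expand_items(baskets, items)
# features[0] == {"milk": "Y", "eggs": "Y", "flour": "N", "salt": "N",
#                 "chocolate": "N", "caviar": "N"}
```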
28. Flatline code generation with WhizzML
(define (field-flatline field item yes no)
(flatline "(if (contains-items? {{field}} {{item}})"
"{{yes}}"
"{{no}})"))
(define (item-fields field items yes no)
(for (item items)
{"field" (field-flatline field item yes no)}))
(define (dataset-item-fields ds-id field)
(let (ds (fetch ds-id)
item-dist (ds ["fields" field "summary" "items"])
items (map head item-dist))
(item-fields field items "Y" "N")))
29. Flatline code generation with WhizzML
(define output-dataset
(let (fs {"new_fields" (dataset-item-fields input-dataset
field)})
(create-dataset input-dataset fs)))
{"inputs": [{"name": "input-dataset",
"type": "dataset-id",
"description": "The input dataset"},
{"name": "field",
"type": "string",
"description": "Id of the items field"}],
"outputs": [{"name": "output-dataset",
"type": "dataset-id",
"description": "The id of the generated dataset"}]}
30. Outline
1 Server-side workflows: WhizzML
2 Basic Workflow: Model or ensemble?
3 Case study: Using Flatline in Whizzml
4 Advanced Workflows
5 Case Study: Stacked Generalization in WhizzML
31. What Do We Know About WhizzML?
• It’s a complete programming language
• Machine learning “operations” are first-class
• Those operations are performed in BigML’s backend
One line of code to perform API requests
We get scale “for free”
• Everything is Composable
Functions
Libraries
The Web Interface
32. What Can We Do With It?
• Non-trivial Model Selection
n-fold cross validation
Comparison of model types (tree, ensemble, logistic)
• Automation of Drudgery
One-click retraining/validation
Standardized dataset transformations / cleaning
• Sure, but what else?
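For a sense of what n-fold cross-validation involves as a workflow, here is a minimal sketch in plain Python of the index bookkeeping (WhizzML runs the equivalent dataset splits server-side; the helper name is ours):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint folds; each fold in turn
    serves as the test set while the remaining folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

# 5-fold CV over 10 rows: 5 (train, test) pairs, each test fold disjoint
splits = k_fold_indices(10, 5)
```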
33. Algorithms as Workflows
• Many ML algorithms can be thought of as workflows
• In these algorithms, machine learning operations are the
primitives
Make a model
Make a prediction
Evaluate a model
• Many such algorithms can be implemented in WhizzML
Reap the advantages of BigML’s infrastructure
Once implemented, it is language-agnostic
37. Examples: SMACdown
Objective: Find the best set of parameters even more quickly!
• Do:
Generate several random sets of parameters for an ML algorithm
Do 10-fold cross-validation with those parameters
Learn a predictive model to predict performance from parameter
values
Use the model to help you select the next set of parameters to
evaluate
• Until you get a set of parameters that performs “well” or you get
bored
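As a toy illustration of the loop described above (not BigML's implementation: the surrogate here is a crude 1-nearest-neighbour predictor over a single numeric parameter, whereas SMAC-style methods fit a learned regression model over full parameter sets):

```python
import random

def smac_sketch(evaluate, sample_config, rounds=10, candidates=20, seed=0):
    """Model-based search: keep a history of (config, score) pairs; each
    round, score random candidates with a nearest-neighbour surrogate and
    evaluate only the most promising one for real."""
    rng = random.Random(seed)
    history = [(c, evaluate(c)) for c in (sample_config(rng) for _ in range(3))]

    def surrogate(c):
        # Predict c's score as the score of the nearest evaluated config.
        return min(history, key=lambda h: abs(h[0] - c))[1]

    for _ in range(rounds):
        cand = [sample_config(rng) for _ in range(candidates)]
        best = max(cand, key=surrogate)  # maximize predicted performance
        history.append((best, evaluate(best)))
    return max(history, key=lambda h: h[1])

# Toy objective: find x in [0, 4] maximizing -(x - 2)^2
best_x, best_score = smac_sketch(lambda x: -(x - 2) ** 2,
                                 lambda rng: rng.uniform(0, 4))
```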
38. Examples: Boosting
• General idea: Iteratively model the dataset
Each iteration is trained on the mistakes of previous iterations
Said another way, the objective changes each iteration
The final model is a summation of all iterations
• Lots of variations on this theme
Adaboost
Logitboost
Martingale Boosting
Gradient Boosting
• Let’s take a look at a WhizzML implementation of the latter
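The gradient-boosting loop sketched above can be illustrated in plain Python for squared loss, where the residuals are the negative gradient (decision stumps stand in for BigML models; all names here are ours):

```python
def stump_fit(xs, ys):
    """Fit a depth-1 regression tree (a stump) minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all xs identical: predict the mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, rounds=20, lr=0.5):
    """Each stump is trained on the current residuals (the mistakes of the
    previous iterations); the final model is the sum of all iterations."""
    f0 = sum(ys) / len(ys)
    stumps = []
    pred = [f0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        s = stump_fit(xs, residuals)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: f0 + lr * sum(s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 1, 3, 3]
model = gradient_boost(xs, ys)
```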
40. Outline
1 Server-side workflows: WhizzML
2 Basic Workflow: Model or ensemble?
3 Case study: Using Flatline in Whizzml
4 Advanced Workflows
5 Case Study: Stacked Generalization in WhizzML
42. Stacked generalization
Objective: Improve predictions by modeling the output scores of
multiple trained models.
• Create a training and a holdout set
• Create n different models on the training set (with some difference
among them; e.g., single-tree vs. ensemble vs. logistic regression)
• Make predictions from those models on the holdout set
• Train a model to predict the class based on the other models’
predictions
43. A Stacked generalization library: creating the stack
;; Splits the given dataset, using half of it to create
;; a heterogeneous collection of models and the other
;; half to train a tree that predicts based on those other
;; models' predictions. Returns a map with the collection
;; of models (under the key "models") and the metamodel
;; as the value of the key "metamodel". The key "result"
;; holds a boolean flag indicating whether the
;; process was successful.
(define (make-stack dataset-id)
(let ([train-id hold-id] (create-random-dataset-split dataset-id 0.5)
models (create-stack-models train-id)
id (create-stack-predictions models hold-id)
orig-fields (model-inputs (head models))
obj-id (dataset-get-objective-id train-id)
meta-id (create-model {"dataset" id
"excluded_fields" orig-fields
"objective_field" obj-id})
success? (resource-done? (fetch (wait meta-id))))
{"models" models "metamodel" meta-id "result" success?}))
44. A Stacked generalization library: using the stack
;; Use the models and metamodel computed by make-stack
;; to make a prediction on the input-data map. Returns
;; the identifier of the prediction object.
(define (make-stack-prediction models meta-model input-data)
(let (preds (map (lambda (m) (create-prediction {"model" m
"input_data" input-data}))
models)
preds (map (lambda (p)
(head (values ((fetch p) "prediction"))))
preds)
meta-input (make-map (model-inputs meta-model) preds))
(create-prediction {"model" meta-model "input_data" meta-input})))
45. A Stacked generalization library: auxiliary functions
;; Extract from a batchprediction its associated dataset of results
(define (batch-dataset id)
(wait ((fetch id) "output_dataset_resource")))
;; Create a batchprediction for the given model and dataset,
;; with a map of additional options and using defaults appropriate
;; for model stacking
(define (make-batch ds-id mod-id)
(let (name (resource-type mod-id))
(create-batchprediction ds-id mod-id {"all_fields" true
"output_dataset" true
"prediction_name" name})))
;; Auxiliary function extracting the input_fields of a model
(define (model-inputs mod-id)
((fetch mod-id) "input_fields"))
46. A Stacked generalization library: auxiliary functions
;; Auxiliary function to create the set of stack models
(define (create-stack-models train-id)
[(create-model {"dataset" train-id})
(create-ensemble {"dataset" train-id
"number_of_models" 20
"randomize" false})
(create-ensemble {"dataset" train-id
"number_of_models" 20
"randomize" true})
(create-logisticregression {"dataset" train-id})])
;; Auxiliary function to successively create batchpredictions using the
;; given models over the initial dataset ds-id. Returns the final
;; dataset id.
(define (create-stack-predictions models ds-id)
(reduce (lambda (did mid)
(batch-dataset (make-batch did mid)))
ds-id
models))