Automating Machine Learning
Advanced Workflows and WhizzML
December 2016
#BSSML16 Automating Machine Learning December 2016 1 / 46
1 Server-side workflows: WhizzML
2 Basic Workflow: Model or ensemble?
3 Case study: Using Flatline in WhizzML
4 Advanced Workflows
5 Case Study: Stacked Generalization in WhizzML
1 Server-side workflows: WhizzML
Client-side Machine Learning Automation
Problems of client-side solutions
Complexity: Lots of details outside the problem domain
Reuse: No inter-language compatibility
Scalability: Client-side workflows hard to optimize
Extensibility: BigMLer hides complexity at the cost of flexibility
Not enough abstraction
Machine Learning Automation for real
Solution (complexity, reuse): Domain-specific languages
Machine Learning Automation for real
Solution (scalability, reuse): Back to the server
WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  • High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  • Sophisticated server-side optimization
  • Out-of-the-box scalability
  • Client-server brittleness removed
• Infrastructure for creating and sharing ML scripts and libraries
WhizzML REST Resources
Library: Reusable building block, a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
  • Imports: List of libraries with code used by the script.
  • Inputs: List of input values that parameterize the workflow.
  • Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
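The requirement that an execution receives "a complete set of inputs" can be illustrated with a minimal Python sketch. This is not BigML's validation logic: the `SCRIPT_INPUTS` list, the `default` key, and the `resolve_inputs` helper are hypothetical names introduced only for illustration.

```python
# Hypothetical, simplified stand-in for the inputs a script declares.
SCRIPT_INPUTS = [
    {"name": "input-source-id", "type": "source-id"},
    {"name": "ensemble-size", "type": "number", "default": 15},  # "default" is an assumption
]


def resolve_inputs(declared, provided):
    """Return the complete input map, or raise if a required input is missing."""
    resolved = {}
    missing = []
    for spec in declared:
        name = spec["name"]
        if name in provided:
            resolved[name] = provided[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        else:
            missing.append(name)
    if missing:
        raise ValueError("missing inputs: " + ", ".join(missing))
    return resolved


inputs = resolve_inputs(
    SCRIPT_INPUTS, {"input-source-id": "source/5643d345f43a234ff2310a3e"}
)
```

Only when every declared input has a value (given or defaulted) does the sketch return a map that an execution could use.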
Different ways to create WhizzML Scripts/Libraries
Script editor
Other scripts
Basic workflow in WhizzML
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))
Basic workflow in WhizzML: Usable by any binding
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters
inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
# execute
api.ok(api.create_execution(script, inputs))
Basic workflow in WhizzML: Trivial parallelization
;; Workflow for 1 resource
(let (dataset (create-dataset source)
      cluster (create-cluster dataset))
  (create-batchcentroid dataset
                        cluster
                        {"output_dataset" true
                         "all_fields" true}))
;; Workflow for any number of resources
(let (datasets (map create-dataset sources)
      clusters (map create-cluster datasets)
      params {"output_dataset" true "all_fields" true})
  (map (lambda (d c) (create-batchcentroid d c params))
       datasets
       clusters))
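For readers more familiar with Python, the same "map over several lists" pattern can be sketched as follows. The three `create_*` functions are hypothetical local stubs that only build strings; they do not call BigML and merely stand in for the server-side operations.

```python
# Hypothetical stubs standing in for server-side resource creation.
def create_dataset(source):
    return f"dataset-of-{source}"


def create_cluster(dataset):
    return f"cluster-of-{dataset}"


def create_batchcentroid(dataset, cluster, params):
    return f"batchcentroid({dataset}, {cluster})"


sources = ["source/a", "source/b"]  # made-up identifiers
datasets = list(map(create_dataset, sources))
clusters = list(map(create_cluster, datasets))
params = {"output_dataset": True, "all_fields": True}

# map with a two-argument function walks both lists in lockstep,
# pairing each dataset with the cluster built from it.
centroids = list(
    map(lambda d, c: create_batchcentroid(d, c, params), datasets, clusters)
)
```

The single-resource workflow becomes a many-resource workflow by replacing each call with a `map` over the corresponding list.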
Basic workflows in WhizzML: automatic generation
Standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  • Creation: create-source, create-and-wait-dataset, etc.
  • Retrieval: fetch, list-anomalies, etc.
  • Update: update
  • Deletion: delete
• Machine Learning Algorithms (SMACdown, Boosting, etc.)
2 Basic Workflow: Model or ensemble?
Model or Ensemble?
• Split a dataset into test and training parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with better evaluation (f-measure)
;; Functions for creating the two dataset parts
;; Sample a dataset taking a fraction of its rows (rate) and
;; keeping either that fraction (out-of-bag? false) or its
;; complement (out-of-bag? true)
(define (sample-dataset origin-id rate out-of-bag?)
  (create-dataset {"origin_dataset" origin-id
                   "sample_rate" rate
                   "out_of_bag" out-of-bag?
                   "seed" "example-seed-0001"}))
;; Create in parallel two complementary parts of a dataset using
;; the sample function twice. Return a list of the two
;; new dataset ids.
(define (split-dataset origin-id rate)
  (list (sample-dataset origin-id rate false)
        (sample-dataset origin-id rate true)))
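Why sampling twice with the same seed yields two complementary parts can be shown with a small local Python sketch. It is not BigML's sampling algorithm; `sample_rows` and `split_rows` are hypothetical helpers that only mimic the idea of a seeded sample and its out-of-bag complement.

```python
import random


def sample_rows(rows, rate, seed, out_of_bag=False):
    """Keep the seeded sample, or its complement when out_of_bag is True."""
    rng = random.Random(seed)  # same seed -> same sequence of draws
    picked = []
    for row in rows:
        in_sample = rng.random() < rate
        if in_sample != out_of_bag:
            picked.append(row)
    return picked


def split_rows(rows, rate, seed="example-seed-0001"):
    """Return (sample, complement) built from two passes with one seed."""
    return (
        sample_rows(rows, rate, seed),
        sample_rows(rows, rate, seed, out_of_bag=True),
    )


rows = list(range(100))
train, test = split_rows(rows, 0.8)
```

Because both passes replay the same random draws, every row lands in exactly one of the two parts.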
;; Functions to create an ensemble and to extract the f-measure
;; from an evaluation, given its id.
(define (make-ensemble ds-id size)
  (create-ensemble ds-id {"number_of_models" size}))
(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))
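As a reminder of what the extracted number means, the F-measure is the harmonic mean of precision and recall. The Python sketch below computes it from confusion counts; treating the average as an unweighted mean over classes is an assumption made here, since the slides do not say how `average_f_measure` is aggregated.

```python
def f_measure(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_class_counts):
    """Unweighted mean of per-class F1 scores (assumed aggregation)."""
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)


def pick_better(model_f, ensemble_f):
    """Mirror of the selection rule: keep the model only if it scores higher."""
    return "model" if model_f > ensemble_f else "ensemble"
```

With equal scores the rule falls back to the ensemble, which matches an `(if (> m-f e-f) m-id e-id)` style comparison.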
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset {"source" src-id})
        [train-id test-id] (split-dataset ds-id 0.8)
        m-id (create-model train-id)
        e-id (make-ensemble train-id 15)
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))
3 Case study: Using Flatline in WhizzML
Transforming item counts to features
basket           milk  eggs  flour  salt  chocolate  caviar
milk,eggs        Y     Y     N      N     N          N
milk,flour       Y     N     Y      N     N          N
milk,flour,eggs  Y     Y     Y      N     N          N
chocolate        N     N     N      N     Y          N
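The transformation in the table above can be reproduced locally with a few lines of Python. This is only an illustration of the target encoding, not how BigML computes it; `ITEMS` and `basket_to_flags` are names invented for the sketch.

```python
ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]


def basket_to_flags(basket, items=ITEMS, yes="Y", no="N", sep=","):
    """Turn a comma-separated basket into one Y/N flag per known item."""
    present = set(basket.split(sep))
    return [yes if item in present else no for item in items]


baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]
table = [basket_to_flags(b) for b in baskets]
```

Each row of `table` corresponds to one row of the slide's table, in the same column order as `ITEMS`.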
Item counts to features with Flatline
(if (contains-items? "basket" "milk") "Y" "N")
(if (contains-items? "basket" "eggs") "Y" "N")
(if (contains-items? "basket" "flour") "Y" "N")
(if (contains-items? "basket" "salt") "Y" "N")
(if (contains-items? "basket" "chocolate") "Y" "N")
(if (contains-items? "basket" "caviar") "Y" "N")
Parameterized code generation
Field name
Item values
Y/N category names
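The parameterization listed above (field name, item values, Y/N names) amounts to string templating. A minimal Python sketch follows; using `json.dumps` to produce the double-quoted literals is an assumption of this sketch, and the helper names are illustrative rather than part of any BigML binding.

```python
import json


def field_flatline(field, item, yes, no):
    """Build one Flatline expression string for a single item."""
    q = json.dumps  # wraps a Python string in double quotes
    return f"(if (contains-items? {q(field)} {q(item)}) {q(yes)} {q(no)})"


def item_fields(field, items, yes="Y", no="N"):
    """One new-field description per item, each carrying its expression."""
    return [{"field": field_flatline(field, item, yes, no)} for item in items]


expressions = item_fields("basket", ["milk", "eggs"])
```

Generating the six expressions of the slide is then a matter of passing the six item names.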
Flatline code generation with WhizzML
"(if (contains-items? "basket" "milk") "Y" "N")"
(let (field "basket"
      item "milk"
      yes "Y"
      no "N")
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (field-flatline field item yes no)
  (flatline "(if (contains-items? {{field}} {{item}})"
            "{{yes}}"
            "{{no}})"))
(define (item-fields field items yes no)
  (for (item items)
    {"field" (field-flatline field item yes no)}))
(define (dataset-item-fields ds-id field)
  (let (ds (fetch ds-id)
        item-dist (ds ["fields" field "summary" "items"])
        items (map head item-dist))
    (item-fields field items "Y" "N")))
(define output-dataset
  (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
    (create-dataset input-dataset fs)))
{"inputs": [{"name": "input-dataset",
             "type": "dataset-id",
             "description": "The input dataset"},
            {"name": "field",
             "type": "string",
             "description": "Id of the items field"}],
 "outputs": [{"name": "output-dataset",
              "type": "dataset-id",
              "description": "The id of the generated dataset"}]}
4 Advanced Workflows
What Do We Know About WhizzML?
• It’s a complete programming language
• Machine learning “operations” are first-class
• Those operations are performed in BigML’s backend
  • One line of code to perform API requests
  • We get scale “for free”
• Everything is Composable
  • The Web Interface
What Can We Do With It?
• Non-trivial Model Selection
  • n-fold cross validation
  • Comparison of model types (tree, ensemble, logistic)
• Automation of Drudgery
  • One-click retraining/validation
  • Standardized dataset transformations / cleaning
• Sure, but what else?
Algorithms as Workflows
• Many ML algorithms can be thought of as workflows
• In these algorithms, machine learning operations are the primitives
  • Make a model
  • Make a prediction
  • Evaluate a model
• Many such algorithms can be implemented in WhizzML
  • Reap the advantages of BigML’s infrastructure
  • Once implemented, it is language-agnostic
Examples: Stacked Generalization
Examples: Randomized Parameter Optimization
Examples: SMACdown
Objective: Find the best set of parameters even more quickly!
• Do:
  • Generate several random sets of parameters for an ML algorithm
  • Do 10-fold cross-validation with those parameters
  • Learn a predictive model to predict performance from parameter values
  • Use the model to help you select the next set of parameters to evaluate
• Until you get a set of parameters that performs “well” or you get
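The loop described above can be sketched in pure Python. This is a toy under several assumptions, not BigML's SMACdown: `cross_validated_score` is a made-up objective standing in for 10-fold cross-validation, the surrogate is a simple k-nearest-neighbours average rather than a learned BigML model, and all names and default values are illustrative.

```python
import random


def cross_validated_score(params):
    """Hypothetical stand-in for cross-validation; higher is better."""
    x, y = params
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)


def random_params(rng):
    return (rng.random(), rng.random())


def predict_score(history, params, k=3):
    """Surrogate: mean score of the k evaluated parameter sets nearest to params."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, params))
    nearest = sorted(history, key=lambda h: dist(h[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def smac_like_search(n_initial=5, n_rounds=20, n_candidates=50,
                     good_enough=-0.001, seed=0):
    rng = random.Random(seed)
    # Start from several random parameter sets and score each one.
    history = []
    for _ in range(n_initial):
        p = random_params(rng)
        history.append((p, cross_validated_score(p)))
    for _ in range(n_rounds):
        if max(score for _, score in history) >= good_enough:
            break  # performs "well" enough
        # Let the surrogate choose the most promising candidate to evaluate next.
        candidates = [random_params(rng) for _ in range(n_candidates)]
        chosen = max(candidates, key=lambda p: predict_score(history, p))
        history.append((chosen, cross_validated_score(chosen)))
    return max(history, key=lambda h: h[1]), history


best, history = smac_like_search()
```

The expensive step (scoring a parameter set) runs once per round, while the cheap surrogate screens many candidates.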
Examples: Boosting
• General idea: Iteratively model the dataset
  • Each iteration is trained on the mistakes of previous iterations
  • Said another way, the objective changes each iteration
  • The final model is a summation of all iterations
• Lots of variations on this theme
  • Martingale Boosting
  • Gradient Boosting
• Let’s take a look at a WhizzML implementation of the latter
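The general idea can be made concrete with a small, self-contained Python sketch of gradient boosting for squared error on one-dimensional data. It is independent of the WhizzML implementation the slides refer to; the decision-stump learner, the learning rate, and the toy data are choices made for this illustration.

```python
def fit_stump(xs, residuals):
    """Least-squares single-threshold regressor for 1-D inputs."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - (lmean if x <= t else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_iterations=20, learning_rate=0.5):
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        # The final model is the base value plus the sum of all iterations.
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_iterations):
        # Each iteration is trained on the current mistakes (residuals).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
baseline = sum(ys) / len(ys)
model = gradient_boost(xs, ys)
```

For squared error the residuals are the negative gradient of the loss, which is why fitting them iteration after iteration is called gradient boosting.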
5 Case Study: Stacked Generalization in WhizzML
Examples: Stacked Generalization
Stacked generalization
Objective: Improve predictions by modeling the output scores of
multiple trained models.
• Create a training and a holdout set
• Create n different models on the training set (with some difference
among them; e.g., single-tree vs. ensemble vs. logistic regression)
• Make predictions from those models on the holdout set
• Train a model to predict the class based on the other models’ predictions
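The steps above can be sketched locally in pure Python. Everything here is a simplification chosen for illustration: the base models are one-feature threshold rules for a two-class problem, the metamodel is a lookup table of the most common holdout class per combination of base predictions, and the split is a fixed half/half split instead of a random one.

```python
from collections import Counter, defaultdict


def train_threshold_model(rows, feature):
    """Base model: cut one feature at the midpoint of the two class means."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features[feature])
    (lo_label, lo_mean), (hi_label, hi_mean) = sorted(
        ((label, sum(v) / len(v)) for label, v in by_class.items()),
        key=lambda pair: pair[1])
    cut = (lo_mean + hi_mean) / 2
    return lambda features: lo_label if features[feature] <= cut else hi_label


def train_metamodel(base_models, holdout):
    """Meta model: most common true class for each tuple of base predictions."""
    votes = defaultdict(Counter)
    for features, label in holdout:
        votes[tuple(m(features) for m in base_models)][label] += 1
    overall = Counter(label for _, label in holdout).most_common(1)[0][0]
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return lambda preds: table.get(tuple(preds), overall)


def make_stack(rows, n_features):
    half = len(rows) // 2
    train, holdout = rows[:half], rows[half:]
    models = [train_threshold_model(train, f) for f in range(n_features)]
    return models, train_metamodel(models, holdout)


def stack_predict(models, metamodel, features):
    return metamodel([m(features) for m in models])


rows = [((1.0, 5.0), "a"), ((9.0, 6.0), "b"), ((2.0, 4.0), "a"), ((8.0, 7.0), "b"),
        ((1.5, 5.5), "a"), ((8.5, 6.5), "b"), ((2.5, 4.5), "a"), ((9.5, 7.5), "b")]
models, metamodel = make_stack(rows, n_features=2)
```

The metamodel never sees the original features, only what the base models predicted on data they were not trained on.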
A Stacked generalization library: creating the stack
;; Splits the given dataset, using half of it to create
;; a heterogeneous collection of models and the other
;; half to train a tree that predicts based on those other
;; models' predictions. Returns a map with the collection
;; of models (under the key "models") and the metamodel
;; as the value of the key "metamodel". The key "result"
;; has as value a boolean flag indicating whether the
;; process was successful.
(define (make-stack dataset-id)
  (let ([train-id hold-id] (create-random-dataset-split dataset-id 0.5)
        models (create-stack-models train-id)
        id (create-stack-predictions models hold-id)
        orig-fields (model-inputs (head models))
        obj-id (dataset-get-objective-id train-id)
        meta-id (create-model {"dataset" id
                               "excluded_fields" orig-fields
                               "objective_field" obj-id})
        success? (resource-done? (fetch (wait meta-id))))
    {"models" models "metamodel" meta-id "result" success?}))
A Stacked generalization library: using the stack
;; Use the models and metamodel computed by make-stack
;; to make a prediction on the input-data map. Returns
;; the identifier of the prediction object.
(define (make-stack-prediction models meta-model input-data)
  (let (preds (map (lambda (m) (create-prediction {"model" m
                                                   "input_data" input-data}))
                   models)
        preds (map (lambda (p)
                     (head (values ((fetch p) "prediction"))))
                   preds)
        meta-input (make-map (model-inputs meta-model) preds))
    (create-prediction {"model" meta-model "input_data" meta-input})))
A Stacked generalization library: auxiliary functions
;; Extract from a batchprediction its associated dataset of results
(define (batch-dataset id)
  (wait ((fetch id) "output_dataset_resource")))
;; Create a batchprediction for the given model and dataset,
;; using defaults appropriate for model stacking
(define (make-batch ds-id mod-id)
  (let (name (resource-type mod-id))
    (create-batchprediction ds-id mod-id {"all_fields" true
                                          "output_dataset" true
                                          "prediction_name" name})))
;; Auxiliary function extracting the input fields of a model
(define (model-inputs mod-id)
  ((fetch mod-id) "input_fields"))
;; Auxiliary function to create the set of stack models
(define (create-stack-models train-id)
  [(create-model {"dataset" train-id})
   (create-ensemble {"dataset" train-id
                     "number_of_models" 20
                     "randomize" false})
   (create-ensemble {"dataset" train-id
                     "number_of_models" 20
                     "randomize" true})
   (create-logisticregression {"dataset" train-id})])
;; Auxiliary function to successively create batchpredictions using the
;; given models over the initial dataset ds-id. Returns the final
;; dataset id.
(define (create-stack-predictions models ds-id)
  (reduce (lambda (did mid)
            (batch-dataset (make-batch did mid)))
          ds-id
          models))
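The "successively create batchpredictions" step is a fold: each model takes the dataset produced by the previous one and appends its own prediction column. A Python analogue with `functools.reduce` is sketched below; datasets are plain lists of dicts and models are hypothetical `(name, function)` pairs, so nothing here touches BigML.

```python
from functools import reduce


def make_batch(dataset, model):
    """Return a new dataset with one extra column holding the model's output."""
    name, predict = model
    return [dict(row, **{name: predict(row)}) for row in dataset]


def create_stack_predictions(models, dataset):
    # reduce threads the growing dataset through every model in turn.
    return reduce(make_batch, models, dataset)


# Hypothetical models: a column name plus a prediction function.
models = [
    ("model", lambda row: row["x"] > 1),
    ("ensemble", lambda row: row["x"] > 2),
]
dataset = [{"x": 1}, {"x": 3}]
stacked = create_stack_predictions(models, dataset)
```

After the fold, every row carries the original fields plus one column per model, which is the shape the metamodel is trained on.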
Library-based scripts
Script for creating the models
(define stack (make-stack dataset-id))
Script for predictions using the stack
(define (make-prediction exec-id input-data)
  (let (exec (fetch exec-id)
        stack (nth (head (get-in exec ["execution" "outputs"])) 1)
        models (get stack "models")
        metamodel (get stack "metamodel"))
    (when (get stack "result")
      (try (make-stack-prediction models metamodel input-data)
           (catch e (log-info "Error: " e) false)))))
(define prediction-id (make-prediction exec-id input-data))
(define prediction (when prediction-id (fetch prediction-id)))
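How the prediction script digs the stack out of a finished execution can be mimicked in Python with a small `get_in` helper. The sample `execution` dictionary is invented for illustration; its layout (each output being a list whose second element is the value) is only inferred from the `nth` and `head` calls above, and the resource ids are placeholders.

```python
def get_in(data, path, default=None):
    """Follow a path of keys/indices through nested dicts and lists."""
    for key in path:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            return default
    return data


# Made-up execution resource; structure assumed, ids are placeholders.
execution = {
    "execution": {
        "outputs": [
            ["stack",
             {"models": ["model/aaa", "ensemble/bbb"],
              "metamodel": "model/ccc",
              "result": True},
             "map"],
        ]
    }
}

first_output = get_in(execution, ["execution", "outputs", 0])
stack = first_output[1]  # same position that (nth ... 1) selects
models = stack["models"]
metamodel = stack["metamodel"]
```

Returning a default instead of raising keeps the lookup safe when an execution has not produced outputs yet.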
#BSSML16 Automating Machine Learning December 2016 45 / 46
#BSSML16 Automating Machine Learning December 2016 46 / 46

More Related Content

What's hot

VSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsVSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic Workflows
BigML, Inc
A developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsA developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIs
Louis Dorard
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzML
BigML, Inc
BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
BigML, Inc
BigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML Fall 2016 Release
BigML Fall 2016 Release
BigML, Inc
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringVSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature Engineering
BigML, Inc
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
BigML, Inc
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
BigML, Inc
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
BigML, Inc
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 Release
BigML, Inc
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
BigML, Inc
BSSML17 - Ensembles
BSSML17 - EnsemblesBSSML17 - Ensembles
BSSML17 - Ensembles
BigML, Inc
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
Itzhak Kameli
VSSML18. Introduction to WhizzML
VSSML18. Introduction to WhizzMLVSSML18. Introduction to WhizzML
VSSML18. Introduction to WhizzML
BigML, Inc
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
Turi, Inc.
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create
Turi, Inc.
DutchMLSchool. ML Automation
DutchMLSchool. ML AutomationDutchMLSchool. ML Automation
DutchMLSchool. ML Automation
BigML, Inc
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics Engine
Marko Rodriguez
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, Deepnets
BigML, Inc

What's hot (20)

VSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsVSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic Workflows
A developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsA developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIs
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzML
BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
BigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML Fall 2016 Release
BigML Fall 2016 Release
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringVSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 Release
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
BSSML17 - Ensembles
BSSML17 - EnsemblesBSSML17 - Ensembles
BSSML17 - Ensembles
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
VSSML18. Introduction to WhizzML
VSSML18. Introduction to WhizzMLVSSML18. Introduction to WhizzML
VSSML18. Introduction to WhizzML
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create Conference 2014: Rajat Arya - Deployment with GraphLab Create
Conference 2014: Rajat Arya - Deployment with GraphLab Create
DutchMLSchool. ML Automation
DutchMLSchool. ML AutomationDutchMLSchool. ML Automation
DutchMLSchool. ML Automation
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics Engine
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, Deepnets

Viewers also liked

VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data Transformations
BigML, Inc
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
BigML, Inc
VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2
BigML, Inc
BSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic ModelingBSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic Modeling
BigML, Inc
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Ian Lumb
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
BigML, Inc
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
BigML, Inc
VSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic RegressionVSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic Regression
BigML, Inc
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro ServiceDrilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Ian Lumb
Google TensorFlow Tutorial
Google TensorFlow TutorialGoogle TensorFlow Tutorial
Google TensorFlow Tutorial

Viewers also liked (10)

VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data Transformations
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2
BSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic ModelingBSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic Modeling
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
VSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic RegressionVSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic Regression
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro ServiceDrilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Google TensorFlow Tutorial
Google TensorFlow TutorialGoogle TensorFlow Tutorial
Google TensorFlow Tutorial

Similar to BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

Basic WhizzML Workflows
Basic WhizzML WorkflowsBasic WhizzML Workflows
Basic WhizzML Workflows
BigML, Inc
VSSML17 L7. REST API, Bindings, and Basic Workflows
VSSML17 L7. REST API, Bindings, and Basic WorkflowsVSSML17 L7. REST API, Bindings, and Basic Workflows
VSSML17 L7. REST API, Bindings, and Basic Workflows
BigML, Inc
Advanced WhizzML Workflows
Advanced WhizzML WorkflowsAdvanced WhizzML Workflows
Advanced WhizzML Workflows
BigML, Inc
Intermediate WhizzML Workflows
Intermediate WhizzML WorkflowsIntermediate WhizzML Workflows
Intermediate WhizzML Workflows
BigML, Inc
Desing pattern prototype-Factory Method, Prototype and Builder
Desing pattern prototype-Factory Method, Prototype and Builder Desing pattern prototype-Factory Method, Prototype and Builder
Desing pattern prototype-Factory Method, Prototype and Builder
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor
Soham Kulkarni
Ch01 basic-java-programs
Ch01 basic-java-programsCh01 basic-java-programs
Ch01 basic-java-programs
James Brotsos
Intake 37 linq2
Intake 37 linq2Intake 37 linq2
Intake 37 linq2
Mahmoud Ouf
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
Knoldus Inc.
Augustus Overview Open Source Analytics
Augustus Overview  Open Source AnalyticsAugustus Overview  Open Source Analytics
Augustus Overview Open Source Analytics
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streaming
Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?
Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?
Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?
Martin Loetzsch
AI Library - An Open Source Machine Learning Framework
AI Library - An Open Source Machine Learning FrameworkAI Library - An Open Source Machine Learning Framework
AI Library - An Open Source Machine Learning Framework
Reducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code AnalysisReducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code Analysis
Sebastiano Panichella
maxbox starter60 machine learning
maxbox starter60 machine learningmaxbox starter60 machine learning
maxbox starter60 machine learning
Max Kleiner
Headache from using mathematical software
Headache from using mathematical softwareHeadache from using mathematical software
Headache from using mathematical software
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
Chetan Khatri
Price of an Error
Price of an ErrorPrice of an Error
Price of an Error
Andrey Karpov
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python

BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent, and Stacking

  • 1. Automating Machine Learning Advanced Workflows and WhizzML #BSSML16 December 2016 #BSSML16 Automating Machine Learning December 2016 1 / 46
  • 2. Outline
      1 Server-side workflows: WhizzML
      2 Basic Workflow: Model or ensemble?
      3 Case study: Using Flatline in WhizzML
      4 Advanced Workflows
      5 Case Study: Stacked Generalization in WhizzML
  • 4. Client-side Machine Learning Automation
      Problems of client-side solutions:
      Complexity: Lots of details outside the problem domain
      Reuse: No inter-language compatibility
      Scalability: Client-side workflows hard to optimize
      Extensibility: Bigmler hides complexity at the cost of flexibility
      Not enough abstraction
  • 5. Machine Learning Automation for real
      Solution (complexity, reuse): Domain-specific languages
  • 6. Machine Learning Automation for real
      Solution (scalability, reuse): Back to the server
  • 8. WhizzML in a Nutshell
      • Domain-specific language for ML workflow automation
          High-level problem and solution specification
      • Framework for scalable, remote execution of ML workflows
          Sophisticated server-side optimization
          Out-of-the-box scalability
          Client-server brittleness removed
          Infrastructure for creating and sharing ML scripts and libraries
  • 9. WhizzML REST Resources
      Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
      Script: Executable code that describes an actual workflow.
        • Imports: List of libraries with code used by the script.
        • Inputs: List of input values that parameterize the workflow.
        • Outputs: List of values computed by the script and returned to the user.
      Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
  • 10. Different ways to create WhizzML Scripts/Libraries
      GitHub, Script editor, Gallery, Other scripts, Scriptify
  • 11. Basic workflow in WhizzML
      (let (dataset (create-dataset source)
            cluster (create-cluster dataset))
        (create-batchcentroid dataset
                              cluster
                              {"output_dataset" true
                               "all_fields" true}))
  • 12. Basic workflow in WhizzML: Usable by any binding
      from bigml.api import BigML
      api = BigML()
      # choose workflow
      script = 'script/567b4b5be3f2a123a690ff56'
      # define parameters
      inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
      # execute
      api.ok(api.create_execution(script, inputs))
  • 13. Basic workflow in WhizzML: Trivial parallelization
      ;; Workflow for 1 resource
      (let (dataset (create-dataset source)
            cluster (create-cluster dataset))
        (create-batchcentroid dataset
                              cluster
                              {"output_dataset" true
                               "all_fields" true}))
  • 14. Basic workflow in WhizzML: Trivial parallelization
      ;; Workflow for any number of resources
      (let (datasets (map create-dataset sources)
            clusters (map create-cluster datasets)
            params {"output_dataset" true "all_fields" true})
        (map (lambda (d c) (create-batchcentroid d c params))
             datasets
             clusters))
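The multi-list map on this slide pairs the i-th dataset with the i-th cluster. A rough Python analogue is sketched below. The three `create_*` functions are hypothetical stand-ins that only build id-like strings, so the control flow can be run without a BigML account; they are not calls from the BigML bindings.

```python
# Hypothetical stand-ins for resource creation: they only rewrite id strings.
def create_dataset(source):
    return source.replace("source/", "dataset/")


def create_cluster(dataset):
    return dataset.replace("dataset/", "cluster/")


def create_batchcentroid(dataset, cluster, params):
    return {"dataset": dataset, "cluster": cluster, **params}


def batch_centroids(sources):
    datasets = list(map(create_dataset, sources))
    clusters = list(map(create_cluster, datasets))
    params = {"output_dataset": True, "all_fields": True}
    # map over two lists at once: the i-th dataset meets the i-th cluster
    return list(map(lambda d, c: create_batchcentroid(d, c, params),
                    datasets, clusters))


results = batch_centroids(["source/a", "source/b"])
```

Because each step is an element-wise map with no dependency between list positions, the per-source chains are independent, which is what makes the parallelization trivial.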
  • 15. Basic workflows in WhizzML: automatic generation
  • 16. Standard functions
      • Numeric and relational operators (+, *, <, =, ...)
      • Mathematical functions (cos, sinh, floor, ...)
      • Strings and regular expressions (str, matches?, replace, ...)
      • Flatline generation
      • Collections: list traversal, sorting, map manipulation
      • BigML resources manipulation
          Creation: create-source, create-and-wait-dataset, etc.
          Retrieval: fetch, list-anomalies, etc.
          Update: update
          Deletion: delete
      • Machine Learning Algorithms (SMACdown, Boosting, etc.)
  • 18. Model or Ensemble?
      • Split a dataset into training and test parts
      • Create a model and an ensemble with the training dataset
      • Evaluate both with the test dataset
      • Choose the one with the better evaluation (f-measure)
  • 19. Model or Ensemble?
      ;; Functions for creating the two dataset parts
      ;; Sample a dataset taking a fraction of its rows (rate) and
      ;; keeping either that fraction (out-of-bag? false) or its
      ;; complement (out-of-bag? true)
      (define (sample-dataset origin-id rate out-of-bag?)
        (create-dataset {"origin_dataset" origin-id
                         "sample_rate" rate
                         "out_of_bag" out-of-bag?
                         "seed" "example-seed-0001"}))
      ;; Create in parallel the two complementary parts of a dataset
      ;; using the sample function twice. Return a list of the two
      ;; new dataset ids.
      (define (split-dataset origin-id rate)
        (list (sample-dataset origin-id rate false)
              (sample-dataset origin-id rate true)))
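The split above only yields complementary parts because both samples share the same seed. The sketch below illustrates that idea in plain Python on a list of rows. It assumes a simple per-row random draw; this is an illustration of the principle, not a description of how BigML samples internally, and `sample_rows` and `split_rows` are hypothetical names.

```python
import random


def sample_rows(rows, rate, out_of_bag, seed):
    """Return the sampled fraction of rows, or its complement."""
    rng = random.Random(seed)  # same seed gives the same per-row decisions
    picked = [rng.random() < rate for _ in rows]
    # out_of_bag=False keeps the picked rows, out_of_bag=True keeps the rest
    return [r for r, p in zip(rows, picked) if p != out_of_bag]


def split_rows(rows, rate, seed="example-seed-0001"):
    return (sample_rows(rows, rate, False, seed),
            sample_rows(rows, rate, True, seed))


train, test = split_rows(list(range(100)), 0.8)
```

With different seeds in the two calls, some rows could land in both parts or in neither, and the test set would no longer be independent of the training set.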
  • 20. Model or Ensemble?
      ;; Functions to create an ensemble and to extract the f-measure
      ;; from an evaluation, given its id.
      (define (make-ensemble ds-id size)
        (create-ensemble ds-id {"number_of_models" size}))
      (define (f-measure ev-id)
        (let (ev-id (wait ev-id) ;; because fetch doesn't wait
              evaluation (fetch ev-id))
          (evaluation ["result" "model" "average_f_measure"])))
  • 21. Model or Ensemble?
      ;; Function encapsulating the full workflow
      (define (model-or-ensemble src-id)
        (let (ds-id (create-dataset {"source" src-id})
              [train-id test-id] (split-dataset ds-id 0.8)
              m-id (create-model train-id)
              e-id (make-ensemble train-id 15)
              m-f (f-measure (create-evaluation m-id test-id))
              e-f (f-measure (create-evaluation e-id test-id)))
          (log-info "model f " m-f " / ensemble f " e-f)
          (if (> m-f e-f) m-id e-id)))
      ;; Compute the result of the script execution
      ;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
      ;; - Outputs: [{"name": "result", "type": "resource-id"}]
      (define result (model-or-ensemble input-source-id))
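For readers unfamiliar with the metric being compared, the sketch below computes an f-measure from confusion counts and applies the same selection rule as the workflow. It assumes that the average f-measure is the unweighted mean of the per-class scores; the function names are hypothetical and nothing here calls BigML.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_class_counts):
    """Assumed here: unweighted mean over (tp, fp, fn) triples, one per class."""
    scores = [f_measure(*counts) for counts in per_class_counts]
    return sum(scores) / len(scores)


def pick_better(model_id, model_f, ensemble_id, ensemble_f):
    # Mirrors (if (> m-f e-f) m-id e-id): a tie goes to the ensemble
    return model_id if model_f > ensemble_f else ensemble_id


choice = pick_better("model/1", average_f_measure([(8, 2, 2), (6, 4, 4)]),
                     "ensemble/1", average_f_measure([(9, 1, 1), (7, 3, 3)]))
```

Note that the strict comparison means the ensemble is returned whenever the two scores are equal.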
  • 23. Transforming item counts to features
      basket            milk  eggs  flour  salt  chocolate  caviar
      milk,eggs         Y     Y     N      N     N          N
      milk,flour        Y     N     Y      N     N          N
      milk,flour,eggs   Y     Y     Y      N     N          N
      chocolate         N     N     N      N     Y          N
  • 24. Item counts to features with Flatline
      (if (contains-items? "basket" "milk") "Y" "N")
      (if (contains-items? "basket" "eggs") "Y" "N")
      (if (contains-items? "basket" "flour") "Y" "N")
      (if (contains-items? "basket" "salt") "Y" "N")
      (if (contains-items? "basket" "chocolate") "Y" "N")
      (if (contains-items? "basket" "caviar") "Y" "N")
      Parameterized code generation:
        Field name
        Item values
        Y/N category names
  • 27. Flatline code generation with WhizzML
      "(if (contains-items? "basket" "milk") "Y" "N")"
      (let (field "basket"
            item "milk"
            yes "Y"
            no "N")
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
      (define (field-flatline field item yes no)
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
  • 28. Flatline code generation with WhizzML
      (define (field-flatline field item yes no)
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
      (define (item-fields field items yes no)
        (for (item items)
          {"field" (field-flatline field item yes no)}))
      (define (dataset-item-fields ds-id field)
        (let (ds (fetch ds-id)
              item-dist (ds ["fields" field "summary" "items"])
              items (map head item-dist))
          (item-fields field items "Y" "N")))
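The same parameterized generation can be sketched on the client side in Python, which makes it easy to inspect the strings that end up in "new_fields". This is an illustrative analogue only: it assumes that quoting each value as a JSON string matches what the {{...}} interpolation produces for plain strings, and the function names simply mirror the WhizzML ones.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    # json.dumps wraps each value in double quotes and escapes inner quotes
    # (assumed equivalent to the {{...}} interpolation for simple strings)
    return "(if (contains-items? {} {}) {} {})".format(
        json.dumps(field), json.dumps(item), json.dumps(yes), json.dumps(no))


def item_fields(field, items, yes="Y", no="N"):
    # One new-field descriptor per item, as in the WhizzML item-fields
    return [{"field": field_flatline(field, item, yes, no)} for item in items]


new_fields = item_fields("basket", ["milk", "eggs", "flour"])
```

Here `new_fields[0]["field"]` is the string `(if (contains-items? "basket" "milk") "Y" "N")`, the first expression of the hand-written list.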
  • 29. Flatline code generation with WhizzML
      (define output-dataset
        (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
          (create-dataset input-dataset fs)))
      {"inputs": [{"name": "input-dataset",
                   "type": "dataset-id",
                   "description": "The input dataset"},
                  {"name": "field",
                   "type": "string",
                   "description": "Id of the items field"}],
       "outputs": [{"name": "output-dataset",
                    "type": "dataset-id",
                    "description": "The id of the generated dataset"}]}
  • 31. What Do We Know About WhizzML?
      • It’s a complete programming language
      • Machine learning “operations” are first-class
      • Those operations are performed in BigML’s backend
          One line of code to perform API requests
          We get scale “for free”
      • Everything is composable
          Functions
          Libraries
          The Web Interface
  • 32. What Can We Do With It?
      • Non-trivial Model Selection
          n-fold cross-validation
          Comparison of model types (tree, ensemble, logistic)
      • Automation of Drudgery
          One-click retraining/validation
          Standardized dataset transformations / cleaning
      • Sure, but what else?
  • 33. Algorithms as Workflows
      • Many ML algorithms can be thought of as workflows
      • In these algorithms, machine learning operations are the primitives
          Make a model
          Make a prediction
          Evaluate a model
      • Many such algorithms can be implemented in WhizzML
          Reap the advantages of BigML’s infrastructure
          Once implemented, it is language-agnostic
  • 34. Examples: Stacked Generalization
  • 35. Examples: Randomized Parameter Optimization
  • 36. Examples: SMACdown
  • 37. Examples: SMACdown
      Objective: Find the best set of parameters even more quickly!
      • Do:
          Generate several random sets of parameters for an ML algorithm
          Do 10-fold cross-validation with those parameters
          Learn a predictive model to predict performance from parameter values
          Use the model to help you select the next set of parameters to evaluate
      • Until you get a set of parameters that performs “well” or you get bored
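The loop above can be sketched in a few lines of plain Python. Everything in this sketch is an assumption made for illustration: `score` is a made-up stand-in for the 10-fold cross-validation result, the two parameters are hypothetical, and the surrogate is a deliberately crude nearest-neighbour lookup rather than the model SMACdown actually learns.

```python
import random


def score(params):
    # Stand-in for "10-fold cross-validation with those parameters":
    # an invented function whose best value is at depth=6, rate=0.3
    depth, rate = params
    return -((depth - 6) ** 2) / 25.0 - (rate - 0.3) ** 2


def random_params(rng):
    return (rng.randint(1, 12), rng.random())


def predict(history, params):
    # Surrogate model: reuse the score of the nearest evaluated point
    def dist(p):
        return ((p[0] - params[0]) / 12.0) ** 2 + (p[1] - params[1]) ** 2
    return min(history, key=lambda h: dist(h[0]))[1]


def search(n_initial=5, n_rounds=20, n_candidates=50, seed=0):
    rng = random.Random(seed)
    # Start from a few random parameter sets, each fully evaluated
    history = [(p, score(p))
               for p in (random_params(rng) for _ in range(n_initial))]
    for _ in range(n_rounds):
        # Let the surrogate choose which candidate deserves a real evaluation
        candidates = [random_params(rng) for _ in range(n_candidates)]
        promising = max(candidates, key=lambda p: predict(history, p))
        history.append((promising, score(promising)))
    return history


history = search()
best_params, best_score = max(history, key=lambda h: h[1])
```

The point of the surrogate is cost: only one candidate per round pays for a full evaluation, while the other candidates are screened cheaply. This sketch always exploits the surrogate's best guess and stops after a fixed number of rounds, which stands in for "until you get bored".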
  • 38. Examples: Boosting
      • General idea: Iteratively model the dataset
          Each iteration is trained on the mistakes of previous iterations
          Said another way, the objective changes each iteration
          The final model is a summation of all iterations
      • Lots of variations on this theme
          AdaBoost
          LogitBoost
          Martingale Boosting
          Gradient Boosting
      • Let’s take a look at a WhizzML implementation of the latter
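As a minimal illustration of "each iteration is trained on the mistakes of previous iterations", the sketch below runs gradient boosting for squared error on a single numeric feature, using one-split regression stumps as the weak learner. For squared error the negative gradient is simply the residual, so each round fits a stump to the current residuals. It is a plain-Python toy under those assumptions, not the WhizzML implementation discussed in the talk.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature, by squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=50, learning_rate=0.5):
    base = sum(ys) / len(ys)            # iteration 0: predict the mean
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # The "mistakes" of the model built so far
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    # The final model is a summation of all iterations
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


xs = list(range(10))
ys = [x * x for x in xs]
model = boost(xs, ys)
```

The training target changes every round (the residuals shrink), which is the sense in which "the objective changes each iteration".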
  • 39. Examples: Boosting
  • 41. Examples: Stacked Generalization
  • 42. Stacked generalization
      Objective: Improve predictions by modeling the output scores of multiple trained models.
      • Create a training and a holdout set
      • Create n different models on the training set (with some difference among them; e.g., single-tree vs. ensemble vs. logistic regression)
      • Make predictions from those models on the holdout set
      • Train a model to predict the class based on the other models’ predictions
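The four steps can be traced end to end on toy data. In the sketch below the base models are single-feature threshold rules and the meta-model is a majority-vote lookup table over combinations of base predictions; both are simplifications chosen to keep the example short, not the trees, ensembles and logistic regressions used in the library, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def train_threshold(rows, feature):
    """Base model: predict 1 when the feature exceeds its training mean."""
    mean = sum(r[feature] for r, _ in rows) / len(rows)
    return lambda r: int(r[feature] > mean)


def train_meta(base_models, holdout):
    """Meta-model: majority class for each combination of base predictions."""
    votes = defaultdict(Counter)
    for r, label in holdout:
        votes[tuple(m(r) for m in base_models)][label] += 1
    default = Counter(label for _, label in holdout).most_common(1)[0][0]
    table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    return lambda preds: table.get(tuple(preds), default)


def make_stack(rows):
    # Step 1: training and holdout sets (alternate rows)
    train, holdout = rows[::2], rows[1::2]
    # Step 2: different base models on the training set
    base_models = [train_threshold(train, f) for f in ("x", "y")]
    # Steps 3 and 4: base predictions on the holdout train the meta-model
    meta = train_meta(base_models, holdout)
    return lambda r: meta([m(r) for m in base_models])


# Toy data: the class is 1 only when both features are large
rows = [({"x": x, "y": y}, int(x >= 5 and y >= 5))
        for x in range(10) for y in range(10)]
stack = make_stack(rows)
```

Neither threshold rule can express "both features large" alone, but the meta-model learns from the holdout set that only the combination (1, 1) of base predictions maps to class 1.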
  • 43. A Stacked generalization library: creating the stack
      ;; Splits the given dataset, using half of it to create
      ;; a heterogeneous collection of models and the other
      ;; half to train a tree that predicts based on those other
      ;; models' predictions. Returns a map with the collection
      ;; of models (under the key "models") and the metamodel
      ;; as the value of the key "metamodel". The key "result"
      ;; has as value a boolean flag indicating whether the
      ;; process was successful.
      (define (make-stack dataset-id)
        (let ([train-id hold-id] (create-random-dataset-split dataset-id 0.5)
              models (create-stack-models train-id)
              id (create-stack-predictions models hold-id)
              orig-fields (model-inputs (head models))
              obj-id (dataset-get-objective-id train-id)
              meta-id (create-model {"dataset" id
                                     "excluded_fields" orig-fields
                                     "objective_field" obj-id})
              success? (resource-done? (fetch (wait meta-id))))
          {"models" models
           "metamodel" meta-id
           "result" success?}))
  • 44. A Stacked generalization library: using the stack
      ;; Use the models and metamodel computed by make-stack
      ;; to make a prediction on the input-data map. Returns
      ;; the identifier of the prediction object.
      (define (make-stack-prediction models meta-model input-data)
        (let (preds (map (lambda (m)
                           (create-prediction {"model" m
                                               "input_data" input-data}))
                         models)
              preds (map (lambda (p)
                           (head (values ((fetch p) "prediction"))))
                         preds)
              meta-input (make-map (model-inputs meta-model) preds))
          (create-prediction {"model" meta-model
                              "input_data" meta-input})))
  • 45. A Stacked generalization library: auxiliary functions
      ;; Extract the dataset of results associated with a batchprediction
      (define (batch-dataset id)
        (wait ((fetch id) "output_dataset_resource")))

      ;; Create a batchprediction for the given model and dataset,
      ;; using options appropriate for model stacking
      (define (make-batch ds-id mod-id)
        (let (name (resource-type mod-id))
          (create-batchprediction ds-id mod-id {"all_fields" true
                                                "output_dataset" true
                                                "prediction_name" name})))

      ;; Auxiliary function extracting the input fields of a model
      (define (model-inputs mod-id)
        ((fetch mod-id) "input_fields"))
  • 46. A Stacked generalization library: auxiliary functions
      ;; Auxiliary function to create the set of stack models
      (define (create-stack-models train-id)
        [(create-model {"dataset" train-id})
         (create-ensemble {"dataset" train-id "number_of_models" 20 "randomize" false})
         (create-ensemble {"dataset" train-id "number_of_models" 20 "randomize" true})
         (create-logisticregression {"dataset" train-id})])

      ;; Auxiliary function to successively create batchpredictions using the
      ;; given models over the initial dataset ds-id. Returns the final
      ;; dataset id.
      (define (create-stack-predictions models ds-id)
        (reduce (lambda (did mid) (batch-dataset (make-batch did mid)))
                ds-id
                models))
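The reduce in create-stack-predictions threads one dataset through all the models: each batch prediction keeps the existing fields and adds one prediction column, and its output dataset becomes the input for the next model. A small Python sketch of the same pattern, with datasets as lists of dicts and models as plain callables (all names here are hypothetical, not BigML calls):

```python
from functools import reduce


def add_prediction_column(rows, named_model):
    """Return a new dataset with one extra column holding this model's predictions."""
    name, model = named_model
    return [{**row, name: model(row)} for row in rows]


def create_stack_predictions(models, rows):
    # Python's reduce takes (function, iterable, initial); the WhizzML call
    # in the slide takes the initial value before the list.
    return reduce(add_prediction_column, models, rows)


# Hypothetical holdout rows and two stand-in models.
holdout = [{"x": 1}, {"x": 5}]
models = [("model", lambda row: row["x"] > 2),
          ("ensemble", lambda row: row["x"] > 0)]

stacked = create_stack_predictions(models, holdout)
print(stacked)
```

The final dataset holds the original fields plus one column per model, which is why make-stack excludes the original input fields when it trains the meta-model.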
  • 48. Library-based scripts
    Script for creating the models:
      (define stack (make-stack dataset-id))
    Script for predictions using the stack:
      (define (make-prediction exec-id input-data)
        (let (exec (fetch exec-id)
              stack (nth (head (get-in exec ["execution" "outputs"])) 1)
              models (get stack "models")
              metamodel (get stack "metamodel"))
          (when (get stack "result")
            (try (make-stack-prediction models metamodel input-data)
                 (catch e (log-info "Error: " e) false)))))
      (define prediction-id (make-prediction exec-id input-data))
      (define prediction (when prediction-id (fetch prediction-id)))
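The prediction script recovers the stack from the outputs of the execution that built it, and only predicts when the stack's "result" flag is true. The Python sketch below mirrors that lookup. It assumes, as the `(nth (head ...) 1)` expression suggests, that each output is a sequence whose second element is the value; the helper names and the sample resource identifiers are hypothetical.

```python
def get_in(data, path, default=None):
    """Follow a list of keys through nested dicts, returning default on a miss."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


def stack_from_execution(execution):
    """Return the stack map stored in the first output, or None if unusable."""
    outputs = get_in(execution, ["execution", "outputs"], [])
    if not outputs:
        return None
    stack = outputs[0][1]  # assumed layout: second element is the output value
    return stack if stack.get("result") else None


# Hypothetical execution resource; the identifiers are placeholders.
execution = {"execution": {"outputs": [
    ["stack", {"models": ["model/1", "ensemble/2"],
               "metamodel": "model/3",
               "result": True}, "map"]]}}

stack = stack_from_execution(execution)
print(stack["metamodel"])
```

Returning None for a missing or failed stack plays the same role as the `when` guard in the slide: the caller can skip the prediction instead of failing on an incomplete stack.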
  • 49. Questions?