SlideShare a Scribd company logo
1 of 34
Download to read offline
Introducing
Boosted Trees
BigML Winter 2017 Release
BigML, Inc 2Winter Release Webinar - March 2017
Winter 2017 Release
POUL PETERSEN (CIO)
Enter questions into chat box – we’ll
answer some via chat; others at the end of
the session
https://bigml.com/releases
ATAKAN CETINSOY, (VP Predictive Applications)
Resources
Moderator
Speaker
Contact info@bigml.com
Twitter @bigmlcom
Questions
BigML, Inc 3Winter Release Webinar - March 2017
Boosted Trees
• Ensemble method
• Supervised Learning
• Classification & Regression
• Now "BigML Easy"!
NUMBER OF MODELS
1
NUMBER OF MODELS
10
NUMBER OF MODELS
20
BigML, Inc 4Winter Release Webinar - March 2017
Questions…
• Novice: Wait, what is Supervised Learning
again?
• Intermediate: When are ensemble methods
useful and how are Boosted Trees different
from the ensemble methods already
available in BigML?
• Power User: How does Boosting work and
what do all these knobs do?
BigML, Inc 5Winter Release Webinar - March 2017
Instances
Supervised Learning
4113 NW
Peppertree
4 2 1876 7841 1977 44,598141 -123,29689 366000
2450 NW 13th 3 2 1502 9148 1964 44,593322 -123,267435 265000
3545 NE
Canterbury
3 1,5 1172 10019 1972 44,603042 -123,236332 245000
1245 NW
Kainui
4 2 1864 49658 1974 44,651826 -123,250615 341500
Features
ADDRESS BEDS BATHS SQFT LOT
SIZE
YEAR
BUILT
LATITUDE LONGITUDE LAST SALE
PRICE
1522 NW
Jonquil
4 3 2424 5227 1991 44,594828 -123,269328 360000
7360 NW
Valley Vw
3 2 1785 25700 1979 44,643876 -123,238189 307500
4748 NW
Veronica
5 3,5 4135 6098 2004 44,5929659 -123,306916 600000
411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 435350
2975 NW
Stewart
3 1,5 1327 9148 1973 44,597319 -123,25452 245000
LAST SALE
PRICE
360000
307500
600000
435350
245000
366000
265000
245000
341500
L a b
Supervised Learning: 

There is a column of data, the label, we want to predict.
BigML, Inc 6Winter Release Webinar - March 2017
Classification & Regression
animal state … proximity action
tiger hungry … close run
elephant happy … far take picture
… … … … …
Classification
animal state … proximity min_kmh
tiger hungry … close 70
hippo angry … far 10
… …. … … …
Regression
label
BigML, Inc 7Winter Release Webinar - March 2017
Evaluation
Instances Features
ADDRESS BEDS BATHS SQFT LOT
SIZE
YEAR
BUILT
LATITUDE LONGITUDE LAST SALE
PRICE
1522 NW
Jonquil
4 3 2424 5227 1991 44,594828 -123,269328 360000
7360 NW
Valley Vw
3 2 1785 25700 1979 44,643876 -123,238189 307500
4748 NW
Veronica
5 3,5 4135 6098 2004 44,5929659 -123,306916 600000
411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 435350
2975 NW
Stewart
3 1,5 1327 9148 1973 44,597319 -123,25452 245000
LAST SALE
PRICE
360000
307500
600000
435350
245000
L a b
4113 NW
Peppertree
4 2 1876 7841 1977 44,598141 -123,29689 366000
2450 NW 13th 3 2 1502 9148 1964 44,593322 -123,267435 265000
3545 NE
Canterbury
3 1,5 1172 10019 1972 44,603042 -123,236332 245000
1245 NW
Kainui
4 2 1864 49658 1974 44,651826 -123,250615 341500
366000
265000
245000
341500
MODEL
EVALUATION
BigML, Inc 8Winter Release Webinar - March 2017
Demo #1
BigML, Inc 9Winter Release Webinar - March 2017
Why Ensembles?
MODEL PREDICTIONDATASET
• Model can "overfit" - especially if there are
outliers. Consider: mansions
• Model might "underfit" - especially if there is
not enough training data to discover a
single good representation. Consider: No
mid-size homes
• The data may be too "complicated" for the
model to represent at all. Consider: Trying to
fit a line to exponential pricing
• Models might be "unlucky" - algorithms
typically rely on stochastic processes. This
is more subtle…
Problems: Ensemble Intuition:
• Build many models, each with only some of
the data and outliers will be averaged out
• Built many models instead of relying on one
guess.
• Built many models, allowing each to focus
on part of the data thus increasing
complexity of the solution.
• Built many models to avoid unlucky initial
conditions.
BigML, Inc 10Winter Release Webinar - March 2017
Decision Forest
MODEL 1
DATASET
SAMPLE 1
SAMPLE 2
SAMPLE 3
SAMPLE 4
MODEL 2
MODEL 3
MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
PREDICTION
VOTE
aka "Bagging"
BigML, Inc 11Winter Release Webinar - March 2017
Demo #2
BigML, Inc 12Winter Release Webinar - March 2017
Random Decision Forest
MODEL 1
DATASET
SAMPLE 1
SAMPLE 2
SAMPLE 3
SAMPLE 4
MODEL 2
MODEL 3
MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
SAMPLE 1
PREDICTION
VOTE
BigML, Inc 13Winter Release Webinar - March 2017
Demo #3
BigML, Inc 14Winter Release Webinar - March 2017
Boosting
Question:
• Instead of sampling tricks… can we incrementally improve a single
model?
MODEL
BATCH
PREDICTION
Intuition:
• The training data has the correct answer, so we know the errors
exactly. Can we model the errors?
DATASET
label
DATASET
label
prediction
DATASET
label
prediction
error
BigML, Inc 15Winter Release Webinar - March 2017
Boosting
ADDRESS BEDS BATHS SQFT
LOT
SIZE
YEAR
BUILT
LATITUDE LONGITUDE
LAST SALE
PRICE
1522 NW
Jonquil
4 3 2424 5227 1991 44,594828 -123,269328 360000
7360 NW
Valley Vw
3 2 1785 25700 1979 44,643876 -123,238189 307500
4748 NW
Veronica
5 3,5 4135 6098 2004 44,5929659 -123,306916 600000
411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 435350
MODEL 1
PREDICTED
SALE PRICE
360750
306875
587500
435350
ERROR
750
-625
-12500
0
ADDRESS BEDS BATHS SQFT
LOT
SIZE
YEAR
BUILT
LATITUDE LONGITUDE ERROR
1522 NW
Jonquil
4 3 2424 5227 1991 44,594828 -123,269328 750
7360 NW
Valley Vw
3 2 1785 25700 1979 44,643876 -123,238189 625
4748 NW
Veronica
5 3,5 4135 6098 2004 44,5929659 -123,306916 12500
411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 0
MODEL 2
PREDICTED
ERROR
750
625
12393,83333
6879,67857
Why stop at one iteration?
• "Hey Model 1, what do you predict is the sale price of this home?"
• "Hey Model 2, how much error do you predict Model 1 just made?"
BigML, Inc 16Winter Release Webinar - March 2017
Boosting
DATASET MODEL 1
DATASET 2 MODEL 2
DATASET 3 MODEL 3
DATASET 4 MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
PREDICTION
SUM
Iteration 1
Iteration 2
Iteration 3
Iteration 4
etc…
BigML, Inc 17Winter Release Webinar - March 2017
Boosting
DATASET MODEL 1
DATASET 2 MODEL 2
DATASET 3 MODEL 3
DATASET 4 MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
PREDICTION
SUM
Iteration 1
Iteration 2
Iteration 3
Iteration 4
etc…
BigML, Inc 18Winter Release Webinar - March 2017
Demo #4
BigML, Inc 19Winter Release Webinar - March 2017
Boosting Parameters
• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
• Early out of bag: tests with the out-of-bag samples
BigML, Inc 20Winter Release Webinar - March 2017
“OUT OF BAG”
SAMPLES
Boosting
DATASET MODEL 1
DATASET 2 MODEL 2
DATASET 3 MODEL 3
DATASET 4 MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
PREDICTION
SUM
Iteration 1
Iteration 2
Iteration 3
Iteration 4
etc…
BigML, Inc 21Winter Release Webinar - March 2017
Boosting Parameters
• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
• Early out of bag: tests with the out-of-bag samples
• Early holdout: tests with a portion of the dataset
• None: performs all iterations. Note: In general, it is better to use a
high number of iterations and let the early stopping work.
BigML, Inc 22Winter Release Webinar - March 2017
Iterations
Boosted Ensemble #1
1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50
Early Stop # Iterations
1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50
Boosted Ensemble #2
Early Stop# Iterations
This is OK because the early stop means the iterative improvement is small

and we have "converged" before being forcibly stopped by the # iterations
This is NOT OK because the hard limit on iterations stopped improving the quality of the
boosting long before there was enough iterations to have achieved the best quality.
BigML, Inc 23Winter Release Webinar - March 2017
Boosting Parameters
• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
• Early out of bag: tests with the out-of-bag samples
• Early holdout: tests with a portion of the dataset
• None: performs all iterations. Note: In general, it is better to use a
high number of iterations and let the early stopping work.
• Learning Rate: Controls how aggressively boosting will fit the data:
• Larger values ~ maybe quicker fit, but risk of overfitting
• You can combine sampling with Boosting!
• Samples with Replacement
• Add Randomize
BigML, Inc 24Winter Release Webinar - March 2017
Boosting
DATASET MODEL 1
DATASET 2 MODEL 2
DATASET 3 MODEL 3
DATASET 4 MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
PREDICTION
SUM
Iteration 1
Iteration 2
Iteration 3
Iteration 4
etc…
BigML, Inc 25Winter Release Webinar - March 2017
Boosting
DATASET MODEL 1
DATASET 2 MODEL 2
DATASET 3 MODEL 3
DATASET 4 MODEL 4
PREDICTION 1
PREDICTION 2
PREDICTION 3
PREDICTION 4
PREDICTION
SUM
Iteration 1
Iteration 2
Iteration 3
Iteration 4
etc…
BigML, Inc 26Winter Release Webinar - March 2017
Boosting Parameters
• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
• Early out of bag: tests with the out-of-bag samples
• Early holdout: tests with a portion of the dataset
• None: performs all iterations. Note: In general, it is better to use a
high number of iterations and let the early stopping work.
• Learning Rate: Controls how aggressively boosting will fit the data:
• Larger values ~ maybe quicker fit, but risk of overfitting
• You can combine sampling with Boosting!
• Samples with Replacement
• Add Randomize
• All other tree based options are the same:
• weights: for unbalanced classes
• missing splits: to learn from missing data
• node threshold: increase to allow "deeper" trees
BigML, Inc 27Winter Release Webinar - March 2017
Wait a second….
pregnancies
plasma
glucose
blood
pressure
triceps skin
thickness
insulin bmi
diabetes
pedigree
age diabetes
6 148 72 35 0 33,6 0,627 50 TRUE
1 85 66 29 0 26,6 0,351 31 FALSE
8 183 64 0 0 23,3 0,672 32 TRUE
1 89 66 23 94 28,1 0,167 21 FALSE
MODEL 1
predicted
diabetes
TRUE
TRUE
FALSE
FALSE
ERROR
?
?
?
?
… what about classification?
pregnancies
plasma
glucose
blood
pressure
triceps skin
thickness
insulin bmi
diabetes
pedigree
age diabetes
6 148 72 35 0 33,6 0,627 50 1
1 85 66 29 0 26,6 0,351 31 0
8 183 64 0 0 23,3 0,672 32 1
1 89 66 23 94 28,1 0,167 21 0
MODEL 1
predicted
diabetes
1
1
0
0
ERROR
0
-1
1
0
… we could try
BigML, Inc 28Winter Release Webinar - March 2017
Wait a second….
pregnancies
plasma
glucose
blood
pressure
triceps skin
thickness
insulin bmi
diabetes
pedigree
age
favorite
color
6 148 72 35 0 33,6 0,627 50 RED
1 85 66 29 0 26,6 0,351 31 GREEN
8 183 64 0 0 23,3 0,672 32 BLUE
1 89 66 23 94 28,1 0,167 21 RED
MODEL 1
predicted
favorite color
BLUE
GREEN
RED
GREEN
ERROR
?
?
?
?
… but then what about multiple classes?
BigML, Inc 29Winter Release Webinar - March 2017
Boosting Classification
pregnancies
plasma
glucose
blood
pressure
triceps skin
thickness
insulin bmi
diabetes
pedigree
age
favorite
color
6 148 72 35 0 33,6 0,627 50 RED
1 85 66 29 0 26,6 0,351 31 GREEN
8 183 64 0 0 23,3 0,672 32 BLUE
1 89 66 23 94 28,1 0,167 21 RED
MODEL 1
RED/NOT RED
Class RED
Probability
0,9
0,7
0,46
0,12
Class RED
ERROR
0,1
-0,7
0,54
-0,12
pregnancies
plasma
glucose
blood
pressure
triceps skin
thickness
insulin bmi
diabetes
pedigree
age ERROR
6 148 72 35 0 33,6 0,627 50 0,1
1 85 66 29 0 26,6 0,351 31 -0,7
8 183 64 0 0 23,3 0,672 32 0,54
1 89 66 23 94 28,1 0,167 21 -0,12
MODEL 2
RED/NOT RED ERR
PREDICTED
ERROR
0,05
-0,54
0,32
-0,22
MODEL 1
BLUE/NOT BLUE
Class BLUE
Probability
0,1
0,3
0,54
0,88
Class BLUE
ERROR
-0,1
0,7
-0,54
0,12
…and repeat for each
class at each iteration
…and repeat for each
class at each iteration
Iteration 1
Iteration 2
BigML, Inc 30Winter Release Webinar - March 2017
Boosting Classification
DATASET
MODELS 1
per class
DATASETS 2
per class
MODELS 2
per class
PREDICTIONS 1
per class
PREDICTIONS 2
per class
PREDICTIONS 3
per class
PREDICTIONS 4
per class
Comb
PROBABILITY
per class
MODELS 3
per class
MODELS 4
per class
DATASETS 3
per class
DATASETS 4
per class
Iteration 1
Iteration 2
Iteration 3
Iteration 4
etc…
BigML, Inc 31Winter Release Webinar - March 2017
Demo #5
BigML, Inc 32Winter Release Webinar - March 2017
Programmable
API First!
Bindings!
WhizzML!
BigML, Inc 33Winter Release Webinar - March 2017
What to use?
• The one that works best!
• Ok, but seriously. Did you evaluate?
• For "large" / "complex" datasets
• Use DF/RDF with deeper node threshold
• Even better, use Boosting with more iterations
• For "noisy" data
• Boosting may overfit
• RDF preferred
• For "wide" data
• Randomize features (RDF) will be quicker
• For "easy" data
• A single model may be fine
• Bonus: also has the best interpretability!
• For classification with "large" number of classes
• Boosting will be slower
• For "general" data
• DF/RDF likely better than a single model or Boosting.
• Boosting will be slower since the models are processed serially
Questions?
Twitter: @bigmlcom
Mail: info@bigml.com
Docs: https://bigml.com/releases

More Related Content

Similar to BigML Winter 2017 Release

VSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and DeepnetsVSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and DeepnetsBigML, Inc
 
U Unit 6 [MT355] Page 1 of 3 Unit 6 Assignment.docx
U Unit 6 [MT355]  Page 1 of 3  Unit 6 Assignment.docxU Unit 6 [MT355]  Page 1 of 3  Unit 6 Assignment.docx
U Unit 6 [MT355] Page 1 of 3 Unit 6 Assignment.docxouldparis
 
Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...
Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...
Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...Patrick Guimonet
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBigML, Inc
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsScott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsSigOpt
 
VSSML18. Ensembles and Logistic Regressions
VSSML18. Ensembles and Logistic RegressionsVSSML18. Ensembles and Logistic Regressions
VSSML18. Ensembles and Logistic RegressionsBigML, Inc
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
Better Functional Design through TDD
Better Functional Design through TDDBetter Functional Design through TDD
Better Functional Design through TDDPhil Calçado
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ FyberDaniel Hen
 
Optimization Direct: Introduction and recent case studies
Optimization Direct: Introduction and recent case studiesOptimization Direct: Introduction and recent case studies
Optimization Direct: Introduction and recent case studiesAlkis Vazacopoulos
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveHyderabad Scalability Meetup
 
Forecasting QuestionsStudent NameUniversity Affiliate.docx
Forecasting QuestionsStudent NameUniversity Affiliate.docxForecasting QuestionsStudent NameUniversity Affiliate.docx
Forecasting QuestionsStudent NameUniversity Affiliate.docxalisoncarleen
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudSigOpt
 

Similar to BigML Winter 2017 Release (20)

VSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and DeepnetsVSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and Deepnets
 
Training Module
Training ModuleTraining Module
Training Module
 
U Unit 6 [MT355] Page 1 of 3 Unit 6 Assignment.docx
U Unit 6 [MT355]  Page 1 of 3  Unit 6 Assignment.docxU Unit 6 [MT355]  Page 1 of 3  Unit 6 Assignment.docx
U Unit 6 [MT355] Page 1 of 3 Unit 6 Assignment.docx
 
Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...
Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...
Surfing Microsoft 365 waves: a Microsoft 365 roadmap analysis with Power BI -...
 
ppt2.pptx
ppt2.pptxppt2.pptx
ppt2.pptx
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
VSSML18. Ensembles and Logistic Regressions
VSSML18. Ensembles and Logistic RegressionsVSSML18. Ensembles and Logistic Regressions
VSSML18. Ensembles and Logistic Regressions
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
Better Functional Design through TDD
Better Functional Design through TDDBetter Functional Design through TDD
Better Functional Design through TDD
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
 
Optimization Direct: Introduction and recent case studies
Optimization Direct: Introduction and recent case studiesOptimization Direct: Introduction and recent case studies
Optimization Direct: Introduction and recent case studies
 
Demystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep DiveDemystify Big Data, Data Science & Signal Extraction Deep Dive
Demystify Big Data, Data Science & Signal Extraction Deep Dive
 
Forecasting QuestionsStudent NameUniversity Affiliate.docx
Forecasting QuestionsStudent NameUniversity Affiliate.docxForecasting QuestionsStudent NameUniversity Affiliate.docx
Forecasting QuestionsStudent NameUniversity Affiliate.docx
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
 

More from BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceBigML, Inc
 
Intelligent Mobility: Machine Learning in the Mobility Industry
Intelligent Mobility: Machine Learning in the Mobility IndustryIntelligent Mobility: Machine Learning in the Mobility Industry
Intelligent Mobility: Machine Learning in the Mobility IndustryBigML, Inc
 

More from BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
 
Intelligent Mobility: Machine Learning in the Mobility Industry
Intelligent Mobility: Machine Learning in the Mobility IndustryIntelligent Mobility: Machine Learning in the Mobility Industry
Intelligent Mobility: Machine Learning in the Mobility Industry
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 

BigML Winter 2017 Release

  • 2. BigML, Inc 2Winter Release Webinar - March 2017 Winter 2017 Release POUL PETERSEN (CIO) Enter questions into chat box – we’ll answer some via chat; others at the end of the session https://bigml.com/releases ATAKAN CETINSOY, (VP Predictive Applications) Resources Moderator Speaker Contact info@bigml.com Twitter @bigmlcom Questions
  • 3. BigML, Inc 3Winter Release Webinar - March 2017 Boosted Trees • Ensemble method • Supervised Learning • Classification & Regression • Now "BigML Easy"! NUMBER OF MODELS 1 NUMBER OF MODELS 10 NUMBER OF MODELS 20
  • 4. BigML, Inc 4Winter Release Webinar - March 2017 Questions… • Novice: Wait, what is Supervised Learning again? • Intermediate: When are ensemble methods useful and how are Boosted Trees different from the ensemble methods already available in BigML? • Power User: How does Boosting work and what do all these knobs do?
  • 5. BigML, Inc 5Winter Release Webinar - March 2017 Instances Supervised Learning 4113 NW Peppertree 4 2 1876 7841 1977 44,598141 -123,29689 366000 2450 NW 13th 3 2 1502 9148 1964 44,593322 -123,267435 265000 3545 NE Canterbury 3 1,5 1172 10019 1972 44,603042 -123,236332 245000 1245 NW Kainui 4 2 1864 49658 1974 44,651826 -123,250615 341500 Features ADDRESS BEDS BATHS SQFT LOT SIZE YEAR BUILT LATITUDE LONGITUDE LAST SALE PRICE 1522 NW Jonquil 4 3 2424 5227 1991 44,594828 -123,269328 360000 7360 NW Valley Vw 3 2 1785 25700 1979 44,643876 -123,238189 307500 4748 NW Veronica 5 3,5 4135 6098 2004 44,5929659 -123,306916 600000 411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 435350 2975 NW Stewart 3 1,5 1327 9148 1973 44,597319 -123,25452 245000 LAST SALE PRICE 360000 307500 600000 435350 245000 366000 265000 245000 341500 L a b Supervised Learning: There is a column of data, the label, we want to predict.
  • 6. BigML, Inc 6Winter Release Webinar - March 2017 Classification & Regression animal state … proximity action tiger hungry … close run elephant happy … far take picture … … … … … Classification animal state … proximity min_kmh tiger hungry … close 70 hippo angry … far 10 … …. … … … Regression label
  • 7. BigML, Inc 7Winter Release Webinar - March 2017 Evaluation Instances Features ADDRESS BEDS BATHS SQFT LOT SIZE YEAR BUILT LATITUDE LONGITUDE LAST SALE PRICE 1522 NW Jonquil 4 3 2424 5227 1991 44,594828 -123,269328 360000 7360 NW Valley Vw 3 2 1785 25700 1979 44,643876 -123,238189 307500 4748 NW Veronica 5 3,5 4135 6098 2004 44,5929659 -123,306916 600000 411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 435350 2975 NW Stewart 3 1,5 1327 9148 1973 44,597319 -123,25452 245000 LAST SALE PRICE 360000 307500 600000 435350 245000 L a b 4113 NW Peppertree 4 2 1876 7841 1977 44,598141 -123,29689 366000 2450 NW 13th 3 2 1502 9148 1964 44,593322 -123,267435 265000 3545 NE Canterbury 3 1,5 1172 10019 1972 44,603042 -123,236332 245000 1245 NW Kainui 4 2 1864 49658 1974 44,651826 -123,250615 341500 366000 265000 245000 341500 MODEL EVALUATION
  • 8. BigML, Inc 8Winter Release Webinar - March 2017 Demo #1
  • 9. BigML, Inc 9Winter Release Webinar - March 2017 Why Ensembles? MODEL PREDICTIONDATASET • Model can "overfit" - especially if there are outliers. Consider: mansions • Model might "underfit" - especially if there is not enough training data to discover a single good representation. Consider: No mid-size homes • The data may be too "complicated" for the model to represent at all. Consider: Trying to fit a line to exponential pricing • Models might be "unlucky" - algorithms typically rely on stochastic processes. This is more subtle… Problems: Ensemble Intuition: • Build many models, each with only some of the data and outliers will be averaged out • Built many models instead of relying on one guess. • Built many models, allowing each to focus on part of the data thus increasing complexity of the solution. • Built many models to avoid unlucky initial conditions.
  • 10. BigML, Inc 10Winter Release Webinar - March 2017 Decision Forest MODEL 1 DATASET SAMPLE 1 SAMPLE 2 SAMPLE 3 SAMPLE 4 MODEL 2 MODEL 3 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION VOTE aka "Bagging"
  • 11. BigML, Inc 11Winter Release Webinar - March 2017 Demo #2
  • 12. BigML, Inc 12Winter Release Webinar - March 2017 Random Decision Forest MODEL 1 DATASET SAMPLE 1 SAMPLE 2 SAMPLE 3 SAMPLE 4 MODEL 2 MODEL 3 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 SAMPLE 1 PREDICTION VOTE
  • 13. BigML, Inc 13Winter Release Webinar - March 2017 Demo #3
  • 14. BigML, Inc 14Winter Release Webinar - March 2017 Boosting Question: • Instead of sampling tricks… can we incrementally improve a single model? MODEL BATCH PREDICTION Intuition: • The training data has the correct answer, so we know the errors exactly. Can we model the errors? DATASET label DATASET label prediction DATASET label prediction error
  • 15. BigML, Inc 15Winter Release Webinar - March 2017 Boosting ADDRESS BEDS BATHS SQFT LOT SIZE YEAR BUILT LATITUDE LONGITUDE LAST SALE PRICE 1522 NW Jonquil 4 3 2424 5227 1991 44,594828 -123,269328 360000 7360 NW Valley Vw 3 2 1785 25700 1979 44,643876 -123,238189 307500 4748 NW Veronica 5 3,5 4135 6098 2004 44,5929659 -123,306916 600000 411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 435350 MODEL 1 PREDICTED SALE PRICE 360750 306875 587500 435350 ERROR 750 -625 -12500 0 ADDRESS BEDS BATHS SQFT LOT SIZE YEAR BUILT LATITUDE LONGITUDE ERROR 1522 NW Jonquil 4 3 2424 5227 1991 44,594828 -123,269328 750 7360 NW Valley Vw 3 2 1785 25700 1979 44,643876 -123,238189 625 4748 NW Veronica 5 3,5 4135 6098 2004 44,5929659 -123,306916 12500 411 NW 16th 3 2825 4792 1938 44,570883 -123,272113 0 MODEL 2 PREDICTED ERROR 750 625 12393,83333 6879,67857 Why stop at one iteration? • "Hey Model 1, what do you predict is the sale price of this home?" • "Hey Model 2, how much error do you predict Model 1 just made?"
  • 16. BigML, Inc 16Winter Release Webinar - March 2017 Boosting DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration 1 Iteration 2 Iteration 3 Iteration 4 etc…
  • 17. BigML, Inc 17Winter Release Webinar - March 2017 Boosting DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration 1 Iteration 2 Iteration 3 Iteration 4 etc…
  • 18. BigML, Inc 18Winter Release Webinar - March 2017 Demo #4
  • 19. BigML, Inc 19Winter Release Webinar - March 2017 Boosting Parameters • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples
  • 20. BigML, Inc 20Winter Release Webinar - March 2017 “OUT OF BAG” SAMPLES Boosting DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration 1 Iteration 2 Iteration 3 Iteration 4 etc…
  • 21. BigML, Inc 21Winter Release Webinar - March 2017 Boosting Parameters • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples • Early holdout: tests with a portion of the dataset • None: performs all iterations. Note: In general, it is better to use a high number of iterations and let the early stopping work.
  • 22. BigML, Inc 22Winter Release Webinar - March 2017 Iterations Boosted Ensemble #1 1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50 Early Stop # Iterations 1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50 Boosted Ensemble #2 Early Stop# Iterations This is OK because the early stop means the iterative improvement is small and we have "converged" before being forcibly stopped by the # iterations This is NOT OK because the hard limit on iterations stopped improving the quality of the boosting long before there was enough iterations to have achieved the best quality.
  • 23. BigML, Inc 23Winter Release Webinar - March 2017 Boosting Parameters • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples • Early holdout: tests with a portion of the dataset • None: performs all iterations. Note: In general, it is better to use a high number of iterations and let the early stopping work. • Learning Rate: Controls how aggressively boosting will fit the data: • Larger values ~ maybe quicker fit, but risk of overfitting • You can combine sampling with Boosting! • Samples with Replacement • Add Randomize
  • 24. BigML, Inc 24Winter Release Webinar - March 2017 Boosting DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration 1 Iteration 2 Iteration 3 Iteration 4 etc…
  • 25. BigML, Inc 25Winter Release Webinar - March 2017 Boosting DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration 1 Iteration 2 Iteration 3 Iteration 4 etc…
  • 26. BigML, Inc 26Winter Release Webinar - March 2017 Boosting Parameters • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples • Early holdout: tests with a portion of the dataset • None: performs all iterations. Note: In general, it is better to use a high number of iterations and let the early stopping work. • Learning Rate: Controls how aggressively boosting will fit the data: • Larger values ~ maybe quicker fit, but risk of overfitting • You can combine sampling with Boosting! • Samples with Replacement • Add Randomize • All other tree based options are the same: • weights: for unbalanced classes • missing splits: to learn from missing data • node threshold: increase to allow "deeper" trees
  • 27. BigML, Inc 27Winter Release Webinar - March 2017 Wait a second…. pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age diabetes 6 148 72 35 0 33,6 0,627 50 TRUE 1 85 66 29 0 26,6 0,351 31 FALSE 8 183 64 0 0 23,3 0,672 32 TRUE 1 89 66 23 94 28,1 0,167 21 FALSE MODEL 1 predicted diabetes TRUE TRUE FALSE FALSE ERROR ? ? ? ? … what about classification? pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age diabetes 6 148 72 35 0 33,6 0,627 50 1 1 85 66 29 0 26,6 0,351 31 0 8 183 64 0 0 23,3 0,672 32 1 1 89 66 23 94 28,1 0,167 21 0 MODEL 1 predicted diabetes 1 1 0 0 ERROR 0 -1 1 0 … we could try
  • 28. BigML, Inc 28Winter Release Webinar - March 2017 Wait a second…. pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age favorite color 6 148 72 35 0 33,6 0,627 50 RED 1 85 66 29 0 26,6 0,351 31 GREEN 8 183 64 0 0 23,3 0,672 32 BLUE 1 89 66 23 94 28,1 0,167 21 RED MODEL 1 predicted favorite color BLUE GREEN RED GREEN ERROR ? ? ? ? … but then what about multiple classes?
  • 29. BigML, Inc 29Winter Release Webinar - March 2017 Boosting Classification pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age favorite color 6 148 72 35 0 33,6 0,627 50 RED 1 85 66 29 0 26,6 0,351 31 GREEN 8 183 64 0 0 23,3 0,672 32 BLUE 1 89 66 23 94 28,1 0,167 21 RED MODEL 1 RED/NOT RED Class RED Probability 0,9 0,7 0,46 0,12 Class RED ERROR 0,1 -0,7 0,54 -0,12 pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age ERROR 6 148 72 35 0 33,6 0,627 50 0,1 1 85 66 29 0 26,6 0,351 31 -0,7 8 183 64 0 0 23,3 0,672 32 0,54 1 89 66 23 94 28,1 0,167 21 -0,12 MODEL 2 RED/NOT RED ERR PREDICTED ERROR 0,05 -0,54 0,32 -0,22 MODEL 1 BLUE/NOT BLUE Class BLUE Probability 0,1 0,3 0,54 0,88 Class BLUE ERROR -0,1 0,7 -0,54 0,12 …and repeat for each class at each iteration …and repeat for each class at each iteration Iteration 1 Iteration 2
  • 30. BigML, Inc 30Winter Release Webinar - March 2017 Boosting Classification DATASET MODELS 1 per class DATASETS 2 per class MODELS 2 per class PREDICTIONS 1 per class PREDICTIONS 2 per class PREDICTIONS 3 per class PREDICTIONS 4 per class Comb PROBABILITY per class MODELS 3 per class MODELS 4 per class DATASETS 3 per class DATASETS 4 per class Iteration 1 Iteration 2 Iteration 3 Iteration 4 etc…
  • 31. BigML, Inc 31Winter Release Webinar - March 2017 Demo #5
  • 32. BigML, Inc 32Winter Release Webinar - March 2017 Programmable API First! Bindings! WhizzML!
  • 33. BigML, Inc 33Winter Release Webinar - March 2017 What to use? • The one that works best! • Ok, but seriously. Did you evaluate? • For "large" / "complex" datasets • Use DF/RDF with deeper node threshold • Even better, use Boosting with more iterations • For "noisy" data • Boosting may overfit • RDF preferred • For "wide" data • Randomize features (RDF) will be quicker • For "easy" data • A single model may be fine • Bonus: also has the best interpretability! • For classification with "large" number of classes • Boosting will be slower • For "general" data • DF/RDF likely better than a single model or Boosting. • Boosting will be slower since the models are processed serially