1st edition | July 8-11, 2019
BigML, Inc #DutchMLSchool 2
Feature Engineering
Creating Features that Make Machine Learning Work
Poul Petersen
CIO, BigML, Inc
Gaming the ML Performance
• Use ML to improve performance automatically
• OptiML
• Unsupervised Feature Engineering (PCA, Topic Models,
Clustering, Anomaly Detection, etc)
• Automated feature selection
• Use domain knowledge to improve performance manually
• Bespoke features (requires expertise)
• Fusions of models
• Manual feature selection
A Tale of Two Strategies…
What is Feature Engineering?
Feature Engineering: applying domain knowledge of
the data to create new features that allow ML
algorithms to work better, or to work at all.
• This is really, really important - more than algorithm selection!
• In fact, so important that BigML often does it
automatically
• ML Algorithms have no deeper understanding of data
• Numerical: have a natural order, can be scaled, etc
• Categorical: have discrete values, etc.
• The "magic" is the ability to find patterns quickly and efficiently
• ML Algorithms only know what you tell/show them with data
• Medical: Kg and M, but BMI = Kg/M² is better
• Lending: Debt and Income, but DTI is better
• Intuition can be risky: remember to prove it with an evaluation!
Built-in Transformations
2013-09-25 10:02
Date-Time Fields
… year month day hour minute …
… 2013 Sep 25 10 2 …
… … … … … … …
NUM NUMCAT NUM NUM
• Date-Time fields have a lot of information "packed" into them
• Splitting out the time components allows ML algorithms to
discover time-based patterns.
DATE-TIME
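The date-time split above can be sketched in Python. This is only an illustration of the idea, not how BigML implements it; the function name `split_datetime` and the month abbreviation table are assumptions made for the example.

```python
from datetime import datetime

# Assumed month labels, chosen to match the "Sep" shown on the slide.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def split_datetime(value: str) -> dict:
    """Split a 'YYYY-MM-DD HH:MM' string into separate features."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M")
    return {
        "year": dt.year,
        "month": MONTHS[dt.month - 1],
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
    }

features = split_datetime("2013-09-25 10:02")
```

Each component becomes its own column, so a model can pick up patterns such as "hour of day" without having to decode the timestamp.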
Built-in Transformations
Categorical Fields for Clustering/LR
… alchemy_category …
… business …
… recreation …
… health …
… … …
CAT
business health recreation …
… 1 0 0 …
… 0 0 1 …
… 0 1 0 …
… … … … …
NUM NUM NUM
• Clustering and Logistic Regression require numeric fields for
inputs
• Categorical values are transformed to numeric vectors
automatically*
• *Note: In BigML, clustering uses k-prototypes and the encoding used for LR can be configured.
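A minimal sketch of the categorical-to-numeric expansion shown in the table, written in plain Python. The helper `one_hot` is hypothetical and only mirrors the 0/1 columns on the slide; as the note says, BigML's actual encodings for clustering and LR differ or are configurable.

```python
def one_hot(values):
    """Return the sorted category names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

categories, rows = one_hot(["business", "recreation", "health"])
```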
Built-in Transformations
Be not afraid of greatness:
some are born great, some achieve
greatness, and some have greatness
thrust upon ‘em.
TEXT
Text Fields
… great afraid born achieve …
… 4 1 1 1 …
… … … … … …
NUM NUM NUM NUM
• Unstructured text contains a lot of potentially interesting
patterns
• Bag-of-words analysis happens automatically and extracts
the "interesting" tokens in the text
• Another option is Topic Modeling to extract thematic meaning
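The bag-of-words counts above can be approximated with a short Python sketch. The crude suffix rule that maps "greatness" to "great" is an assumption made so the counts line up with the slide; it is not BigML's tokenizer or stemmer.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase word tokens after a very naive 'ness' stemming step."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [t[:-4] if t.endswith("ness") and len(t) > 4 else t for t in tokens]
    return Counter(stems)

quote = ("Be not afraid of greatness: some are born great, some achieve "
         "greatness, and some have greatness thrust upon 'em.")
counts = bag_of_words(quote)
```

A real pipeline would usually also drop stop words such as "some" and "of", which is why only the "interesting" tokens appear in the table.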
Help ML to Work Better
{
"url":"cbsnews",
"title":"Breaking News Headlines Business Entertainment World News",
"body":"news covering all the latest breaking national and world news headlines, including politics, sports, entertainment, business and more."
}
TEXT
title body
Breaking News… news covering…
… …
TEXT TEXT
When text is not actually unstructured
• In this case, the text field has structure (key/value pairs)
• Extracting the structure as new features may allow the ML
algorithm to work better
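As a rough illustration of extracting that structure, the following Python sketch parses the JSON text and pulls out `title` and `body` as separate fields. The helper name `extract_fields` is hypothetical; the demo itself does this inside BigML, not in Python.

```python
import json

raw = (
    '{"url": "cbsnews", '
    '"title": "Breaking News Headlines Business Entertainment World News", '
    '"body": "news covering all the latest breaking national and world news '
    'headlines, including politics, sports, entertainment, business and more."}'
)

def extract_fields(raw_text, keys=("title", "body")):
    """Parse a JSON string and return the requested keys as new text features."""
    record = json.loads(raw_text)
    return {key: record.get(key, "") for key in keys}

new_features = extract_fields(raw)
```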
FE Demo #1
Help ML to Work at all
When the pattern does not exist
Highway Number   Direction     Is Long
2                East-West     FALSE
4                East-West     FALSE
5                North-South   TRUE
8                East-West     FALSE
10               East-West     TRUE
…                …             …
Goal: Predict principal direction from highway number
( = (mod (field "Highway Number") 2) 0)
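The same parity feature, sketched in Python for the five rows shown. The name `is_even` is just an illustrative label for the new column.

```python
def is_even(highway_number: int) -> bool:
    """Python equivalent of (= (mod (field "Highway Number") 2) 0)."""
    return highway_number % 2 == 0

rows = [(2, "East-West"), (4, "East-West"), (5, "North-South"),
        (8, "East-West"), (10, "East-West")]
with_feature = [(number, direction, is_even(number)) for number, direction in rows]
```

In these rows the new column lines up exactly with the direction, a pattern a model cannot easily learn from the raw highway number.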
FE Demo #2
Feature Engineering
Discretization
Total Spend
7.342,99
304,12
4,56
345,87
8.546,32
NUM
“Predict will spend
$3,521 with error
$1,232”
Spend Category
Top 33%
Bottom 33%
Bottom 33%
Middle 33%
Top 33%
CAT
“Predict customer
will be Top 33% in
spending”
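A sketch of tercile discretization in Python. In practice the cut points would come from the full dataset; here, as an assumption for the example, each value's mid-rank percentile is computed over the five rows shown, which happens to reproduce the slide's labels.

```python
def spend_category(values):
    """Label each value Bottom/Middle/Top 33% by its mid-rank percentile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        percentile = (rank + 0.5) / n  # assumed mid-rank convention
        if percentile < 1 / 3:
            labels[i] = "Bottom 33%"
        elif percentile < 2 / 3:
            labels[i] = "Middle 33%"
        else:
            labels[i] = "Top 33%"
    return labels

total_spend = [7342.99, 304.12, 4.56, 345.87, 8546.32]
categories = spend_category(total_spend)
```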
FE Demo #3
Built-ins for FE
• Discretize: Converts a numeric value to categorical
• Replace missing values: fixed/max/mean/median/etc
• Normalize: Adjust a numeric value to a specific range of
values while preserving the distribution
• Math: Exponentiation, Logarithms, Squares, Roots, etc
• Types: Force a field value to categorical, integer, or real
• Random: Create random values for introducing noise
• Statistics: Mean, Population
• Refresh Fields:
• Types: recomputes field types. Ex: #classes > 1000
• Preferred: recomputes preferred status
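Two of these built-ins, sketched in plain Python to show what they compute. The function names are hypothetical, `None` stands in for a missing value, and min-max scaling is only one possible reading of "normalize to a specific range".

```python
def replace_missing_with_mean(values):
    """Fill None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def normalize(values, low=0.0, high=1.0):
    """Linearly rescale values into [low, high]; the shape of the distribution is kept."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]

filled = replace_missing_with_mean([1.0, None, 3.0])
scaled = normalize([0.0, 5.0, 10.0])
```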
Flatline Add Fields
Computing with Existing Features
Debt Income
10.134 100.000
85.234 134.000
8.112 21.500
0 45.900
17.534 52.000
NUM NUM
(/ (field "Debt") (field "Income"))
Debt
Income
Debt to Income Ratio
0,10
0,64
0,38
0
0,34
NUM
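The ratio column can be checked with a few lines of Python. The amounts are read as whole numbers (treating "." as a thousands separator, which is an assumption about the slide's number format), and the guard for zero income is an addition for the sketch.

```python
def debt_to_income(debt, income):
    """Python equivalent of (/ (field "Debt") (field "Income"))."""
    return debt / income if income else None

rows = [(10134, 100000), (85234, 134000), (8112, 21500), (0, 45900), (17534, 52000)]
ratios = [round(debt_to_income(debt, income), 2) for debt, income in rows]
```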
FE Demo #4
What is Flatline?
• DSL:
• Invented by BigML - Programmatic / Optimized for
speed
• Transforms datasets into new datasets
• Adding new fields / Filtering
• Transformations are written in lisp-style syntax
• Feature Engineering
• Computing new fields: (/ (field "Debt") (field "Income"))
• Programmatic Filtering:
• Filtering datasets according to functions that evaluate
to true/false using the row of data as an input.
Flatline: a domain specific language for feature
engineering and programmatic filtering
Flatline
• Lisp style syntax: Operators come first
• Correct: (+ 1 2) => NOT Correct: (1 + 2)
• Dataset Fields are first-class citizens
• (field "diabetes pedigree")
• Limited programming language structures
• let, cond, if, map, list operators, */+-, etc.
• Built-in transformations
• statistics, strings, timestamps, windows
Flatline s-expressions
(= 0 (+ (abs ( f "Month - 3" ) ) (abs ( f "Month - 2")) (abs ( f "Month - 1") ) ))
Name Month - 3 Month - 2 Month - 1
Joe Schmo 123,23 0 0
Jane Plain 0 0 0
Mary Happy 0 55,22 243,33
Tom Thumb 12,34 8,34 14,56
Un-Labelled Data
Labelled data
Name Month - 3 Month - 2 Month - 1 Default
Joe Schmo 123,23 0 0 FALSE
Jane Plain 0 0 0 TRUE
Mary Happy 0 55,22 243,33 FALSE
Tom Thumb 12,34 8,34 14,56 FALSE
Adding Simple Labels to Data
Define "default" as
missing three payments
in a row
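The labelling rule can be mirrored in Python as a quick check against the table. The function name `is_default` is hypothetical; the logic follows the s-expression, which sums the absolute payments and compares the total with zero.

```python
def is_default(month_3, month_2, month_1):
    """True when all three monthly payments are zero (three missed payments)."""
    return abs(month_3) + abs(month_2) + abs(month_1) == 0

payments = {
    "Joe Schmo": (123.23, 0, 0),
    "Jane Plain": (0, 0, 0),
    "Mary Happy": (0, 55.22, 243.33),
    "Tom Thumb": (12.34, 8.34, 14.56),
}
labels = {name: is_default(*amounts) for name, amounts in payments.items()}
```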
FE Demo #5
Flatline s-expressions
date volume price
1 34353 314
2 44455 315
3 22333 315
4 52322 321
5 28000 320
6 31254 319
7 56544 323
8 44331 324
9 81111 287
10 65422 294
11 59999 300
12 45556 302
13 19899 301
(Current - (4-day avg)) / std dev
Shock: Deviations from a Trend
day-4 day-3 day-2 day-1 4davg
-
314 -
314 315 -
314 315 315 -
314 315 315 321 316,25
315 315 321 320 317,75
315 321 320 319 318,75
Flatline s-expressions
Shock: Deviations from a Trend
(Current - (4-day avg)) / std dev
Current: (field "price")
4-day avg: (avg-window "price" -4 -1)
std dev: (standard-deviation "price")
(/ (- (f "price") (avg-window "price" -4 -1)) (standard-deviation "price"))
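A Python sketch of the same "shock" feature, using the price column from the earlier table. Two assumptions are made here: rows without four previous days return `None` (the slide shows "-" for them), and the population standard deviation is used, since the slide does not say which variant Flatline's `standard-deviation` computes.

```python
import statistics

prices = [314, 315, 315, 321, 320, 319, 323, 324, 287, 294, 300, 302, 301]

def previous_average(series, i, window=4):
    """Mean of the `window` values before position i, or None if unavailable."""
    if i < window:
        return None
    return sum(series[i - window:i]) / window

def shock(series, i, window=4):
    """(current - previous-window average) / standard deviation of the series."""
    average = previous_average(series, i, window)
    if average is None:
        return None
    return (series[i] - average) / statistics.pstdev(series)

shocks = [shock(prices, i) for i in range(len(prices))]
```

The large negative value on day 9 (price 287) is the kind of deviation from trend this feature is meant to expose.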
FE Demo #6
Advanced s-expressions
( = (mod (field "Highway Number")
2) 0)
Highway isEven?
Advanced s-expressions
( /
( mod
( -
( /
( epoch ( field "date-field" ))
1000
)
621300
)
2551443
)
2551442
)
Moon Phase%
https://gist.github.com/petersen-poul/0cf5022ed1768837fe13af72b2488329
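For readers less used to nested s-expressions, here is the same arithmetic in Python. The constants are copied from the slide as-is; that the input is an epoch value in milliseconds is inferred from the division by 1000 and should be treated as an assumption. Python's `%` may also behave differently from Flatline's `mod` for dates before the reference point.

```python
def moon_phase_fraction(epoch_ms):
    """Fraction of the lunar cycle elapsed, following the slide's expression."""
    seconds = epoch_ms / 1000
    return ((seconds - 621300) % 2551443) / 2551442

at_reference = moon_phase_fraction(621300 * 1000)
half_cycle_later = moon_phase_fraction((621300 + 1275721) * 1000)
```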
Home Price Feature
Worth More
Worth Less
Home Price Feature
LATITUDE    LONGITUDE     REFERENCE LATITUDE   REFERENCE LONGITUDE   Distance (m)
44,583      -123,296775   44,5638              -123,2794             700
44,604414   -123,296129   44,5638              -123,2794             30,4
44,600108   -123,29707    44,5638              -123,2794             19,38
44,603077   -123,295004   44,5638              -123,2794             37,8
44,589587   -123,301154   44,5638              -123,2794             23,39
Haversine Formula
https://en.wikipedia.org/wiki/Haversine_formula
Advanced s-expressions
(let (R 6371000
      latA (to-radians {lat-ref})
      latB (to-radians (field "LATITUDE"))
      latD (- latB latA)
      longD (to-radians (- (field "LONGITUDE") {long-ref}))
      a (+ (square (sin (/ latD 2)))
           (* (cos latA)
              (cos latB)
              (square (sin (/ longD 2)))))
      c (* 2 (asin (min (list 1 (sqrt a))))))
  (* R c))
Distance Lat/Long <=> Ref (Haversine)
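The Haversine expression translates almost line by line into Python, which can help when checking the Flatline version. The function name and argument order below are choices made for this sketch; the Earth radius of 6371000 m is the value used on the slide.

```python
import math

def haversine_m(lat, lon, lat_ref, lon_ref, radius_m=6371000):
    """Great-circle distance in meters between (lat, lon) and a reference point."""
    lat_a = math.radians(lat_ref)
    lat_b = math.radians(lat)
    lat_d = lat_b - lat_a
    lon_d = math.radians(lon - lon_ref)
    a = (math.sin(lat_d / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin(lon_d / 2) ** 2)
    # min(1, ...) guards against rounding pushing sqrt(a) slightly above 1
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return radius_m * c

same_point = haversine_m(44.5638, -123.2794, 44.5638, -123.2794)
one_degree_on_equator = haversine_m(0.0, 1.0, 0.0, 0.0)
```

One degree of longitude on the equator should come out near 111.2 km, which is a convenient sanity check for any port of the formula.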
WhizzML + Flatline
HAVERSINE
FLATLINE
OUTPUT
DATASET
INPUT
DATASET
LONG Ref
LAT Ref
WHIZZML SCRIPT
https://bigml.com/gallery/scripts
Advanced s-expressions
JSON Parser???
• Remember, Flatline is not a full programming language
• No loops
• No accumulated values
• Code executes on one row at a time and has a limited
view into other rows
https://gist.github.com/petersen-poul/504c62ceaace76227cc6d8e0c5f1704b
Feature Engineering
Fix Missing Values in a “Meaningful” Way
[Workflow diagram with the steps and datasets: Original Dataset, Filter Zeros, Clean Dataset, Model insulin, Predict insulin, Amended Dataset, Select insulin, Fixed Dataset]
( if ( = (field "insulin") 0) (field "predicted insulin") (field "insulin"))
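A toy Python version of this workflow, to make the data flow concrete. The three rows are made-up values, and the "model" is replaced by a stand-in (the mean of the non-zero insulin values); in the demo a real BigML model trained on the clean dataset plays that role.

```python
def fix_zero_insulin(rows, predict):
    """Add a predicted value to every row, then keep it only where insulin is 0."""
    fixed = []
    for row in rows:
        amended = dict(row)
        amended["predicted insulin"] = predict(row)
        # Same choice as the Flatline `if` expression above
        if row["insulin"] == 0:
            amended["insulin"] = amended["predicted insulin"]
        fixed.append(amended)
    return fixed

rows = [{"insulin": 0}, {"insulin": 94}, {"insulin": 168}]   # hypothetical data
clean = [row for row in rows if row["insulin"] != 0]         # "Filter Zeros"
mean_insulin = sum(row["insulin"] for row in clean) / len(clean)  # stand-in model
fixed = fix_zero_insulin(rows, lambda row: mean_insulin)
```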
FE Demo #7
Feature Selection
• Model Summary
• Field Importance
• Algorithmic
• Best-First Feature Selection
• Boruta
• Leakage
• Tight Correlations (AD, Plot, Correlations)
• Test Data
• Perfect future knowledge
Care must be taken when creating features!
Feature Selection
Leakage
• sales pipeline where step n-1 has no other outcome than step n.
• stock close predicts stock open
• churn retention: the worst rep is actually the best (correlation != causation)
• cancer prediction where one input is a doctor-ordered test for the condition
• account ID predicts fraud (because only new accounts are fraudsters)
Summary
• Feature Engineering: what is it / why it is important
• Automatic transformations: date-time, text, etc
• Built-in functions: filtering and feature engineering
• Discretization / Normalization / etc.
• Flatline: programmatic feature engineering / filtering
• Structure
• Examples: Adding fields / filtering
• When building features it is important to watch for leakage
OptiML and Fusions
Automating Machine Learning
Poul Petersen
CIO, BigML, Inc
[Diagram. Axes: Decreasing Interpretability / Better Representation / Longer Training; Increasing Data Size / Complexity. Stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance). Models shown: Single Tree Model, Logistic Regression, Random Decision Forest, Decision Forest, Boosted Trees, Deepnets. One region is labelled TOO HARD.]
BigML Deepnets
• The success of a Deepnet is dependent on getting the right
network structure for the dataset
• But, there are too many parameters:
• Nodes, layers, activation function, learning rate, etc…
• And setting them takes significant expert knowledge
• Solution:
• Metalearning (a good initial guess)
• Network search (try a bunch)
Remember this?
OptiML
• Each resource has several parameters that impact quality
• Number of trees, missing splits, nodes, weight
• Rather than trial and error, we can use ML to find ideal
parameters
• Why not make the model type, Decision Tree, Boosted Tree,
etc, a parameter as well?
• Similar to Deepnet network search, but finds the optimum
machine learning algorithm and parameters for your data
automatically
Key Insight: We can solve any parameter selection
problem in a similar way.
The Challenge…
• We will start with a dataset from StumbleUpon
• Train/Test split with seed “bigml”
• Build and Evaluate:
• 1-click Model, LR, Ensemble, Deepnet
• Top model from OptiML output
• Compare the results using the phi coefficient
• Explore other ideas for improving performance further
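Since every result in this section is reported as a phi coefficient, a short Python sketch of the standard formula may help. It takes the four cells of a binary confusion matrix; returning 0.0 when a row or column is empty is a convention chosen for this sketch.

```python
import math

def phi_coefficient(tp, fp, fn, tn):
    """Phi (Matthews correlation) coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

perfect = phi_coefficient(tp=50, fp=0, fn=0, tn=50)
random_like = phi_coefficient(tp=25, fp=25, fn=25, tn=25)
inverted = phi_coefficient(tp=0, fp=50, fn=50, tn=0)
```

Phi ranges from -1 to 1, with 0 meaning no better than chance, so it is a stricter yardstick than accuracy on unbalanced data.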
OptiML Demo
Results…
All scores are phi, evaluated against a holdout
• 1-Click Decision Tree: 0.36
• 1-Click LR: 0.47
• 1-Click Ensemble: 0.58
• Best OptiML Model (LR): 0.66
• 1-Click Deepnet: 0.67
What else can we try?
Fusions Inside
• Fuse any set of models into a new “fusion”
• Must have the same objective type
• Inputs and feature space can differ
• Weights can be added
• Give more importance to individual models
• Fusions can be fused as well
• Especially useful for fusing OptiML models
Key Insight: ML algorithms each have unique
strengths and weaknesses
Performance thru Diversity
[Diagram: Dataset; Optimized Deepnet; Optimized Ensemble; Optimized Logistic Regression; Better?]
Fusion Demo #1
Results…
All scores are phi, evaluated against a holdout
• 1-Click Decision Tree: 0.36
• 1-Click LR: 0.47
• 1-Click Ensemble: 0.58
• Best OptiML Model (LR): 0.66
• 1-Click Deepnet: 0.67
• Fusion of top Model Types: 0.68
Fusions: Under the Hood
Classification
Model      Prediction   Probability   Weight
Ensemble   TRUE         56%           1
Deepnet    FALSE        67%           1
Model      TRUE         78%           2
Fusion     TRUE         61%
P(TRUE) = [56 + (100 - 67) + 2*78] / 4

Regression
Model      Prediction   Error    Weight
Ensemble   156,78       12,56    1
Deepnet    139,55       9,88     1
Model      172,10       23,76    2
Fusion     160,13       17,49
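The weighted averages in these two tables can be reproduced with a small Python sketch. It only restates the arithmetic on the slide for a binary objective; the function names are hypothetical and this is not taken from BigML's implementation.

```python
def fuse_classification(votes, positive="TRUE"):
    """Weighted average of P(positive) in percent.

    votes: (predicted_class, probability_percent, weight) per model.
    A model that predicts the other class contributes 100 - probability.
    """
    total_weight = sum(weight for _, _, weight in votes)
    weighted = sum(weight * (prob if cls == positive else 100 - prob)
                   for cls, prob, weight in votes)
    return weighted / total_weight

def fuse_regression(values):
    """Weighted average of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in values)
    return sum(value * weight for value, weight in values) / total_weight

p_true = fuse_classification([("TRUE", 56, 1), ("FALSE", 67, 1), ("TRUE", 78, 2)])
prediction = fuse_regression([(156.78, 1), (139.55, 1), (172.10, 2)])
error = fuse_regression([(12.56, 1), (9.88, 1), (23.76, 2)])
```

The classification result, 61.25, is shown on the slide as 61%.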
Fusions: Like any BigML Model
• Fully accessible thru API and WhizzML
• Bindings have support for local predictions
Decision Boundary Smoothness
Single Tree:
• Outcome changes abruptly near decision boundary
• And not at all parallel to the boundary
• This can be “surprising”
Single Tree + Deepnet:
• Keep the interpretability of the tree
• But with a more nuanced decision boundary
Feature Stability
Feature Importance: Different subsets of features may have similar modeling
performance
Fusing models gives better resilience against missing values as well as
ensuring that all relevant features are utilized.
Weighting over Time
Data significance over time:
• Some data may change significance at different times
• Short-term user behavior versus long-term
• Weights can be set to account for significance of time
1 Day / 1 Week / 1 Month
w=8 / w=4 / w=2
Improved Class Separation
Consider a 3-class objective
• Really only care about “yes” versus “not yes”
• A single model may struggle to separate the two negative classes
Yes No Maybe
yes/no/maybe
yes/no
yes/maybe
Feature Space Optimization
Model Skills: Some ML algorithms “generally” do better on some feature types:
• RDF for sparse text vectors
• LR/Deepnets for numeric features
• Trees for categorical features
Full
Numeric
Text
Fusions Demo #2
Results…
All scores are phi, evaluated against a holdout
• 1-Click Decision Tree: 0.36
• 1-Click LR: 0.47
• 1-Click Ensemble: 0.58
• Best OptiML Model (LR): 0.66
• 1-Click Deepnet: 0.67
• Fusion of top Model Types: 0.68
• Custom Feature Fusion: 0.70
PCA
Principal Component Analysis
Poul Petersen
CIO, BigML
Issues with High Dimensionality
• Implicitly increases model complexity, prone to overfitting
• Requires more observations in order to generalize well
• Contains correlated or useless variables
• Data is difficult to visualize
• Takes a longer time to train models or make predictions
Principal Component Analysis
addresses all of these issues
Other Approaches
MODEL: Pruning, Node threshold
ENSEMBLE: Bagging, Randomization
LOGISTIC REGRESSION: L1 and L2 penalties
DEEPNET: Dropout
Dimensionality Reduction
Feature Selection
• Preserves the original variables and selects a subset
• Often uses recursive methods or statistical thresholds
• Examples: RFE, Chi-Squared Test, Boruta
Feature Extraction
• Transforms original variables into variables better suited for modeling
• Examples: word vectors, clustering
• PCA falls into this category
Manual Approach
When to use PCA
1. You want to reduce the number of variables in your model, but
it is not clear which should be eliminated
2. You want to generate variables that are not correlated
3. You are okay with sacrificing some amount of interpretability
for potential downstream performance gains
How Does PCA Work?
Each PC is a linear combination of the original variables, with its own set of weights
PC_1 = w_{1,1} F_1 + w_{1,2} F_2 + w_{1,3} F_3 + … + w_{1,N} F_N
PC_2 = w_{2,1} F_1 + w_{2,2} F_2 + w_{2,3} F_3 + … + w_{2,N} F_N
…
PC_N = w_{N,1} F_1 + w_{N,2} F_2 + w_{N,3} F_3 + … + w_{N,N} F_N
PCA Output
These principal components are not correlated
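To make "linear combination" and "not correlated" concrete, here is a pure-Python sketch of PCA for two variables only, where the principal axes follow from a closed-form rotation angle. The sample points are made up, the variables are centered but not standardized, and real PCA on many fields would use a linear-algebra library; none of this describes BigML's implementation.

```python
import math

def pca_2d(points):
    """Return the weights of PC1 and PC2 and the projected scores for 2-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Rotation angle that diagonalizes the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    w1 = (math.cos(theta), math.sin(theta))    # weights of PC1
    w2 = (-math.sin(theta), math.cos(theta))   # weights of PC2
    scores = [(w1[0] * x + w1[1] * y, w2[0] * x + w2[1] * y) for x, y in centered]
    return w1, w2, scores

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
w1, w2, scores = pca_2d(points)
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
```

Each score is a weighted sum of the two centered inputs, the covariance between `pc1` and `pc2` is zero up to rounding, and `pc1` carries at least as much variance as `pc2`.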
PCA Workflow
SOURCE DATASET
TRAIN
TEST
PCA Workflow
PCA
SOURCE DATASET
TRAIN
TEST
PCA Workflow
BATCH
PROJECTION
BATCH
PROJECTION
SOURCE DATASET
TRAIN
TEST
PCA
PCA Workflow
NEW TRAIN
FEATURES
NEW TEST
FEATURES
BATCH
PROJECTION
BATCH
PROJECTION
SOURCE DATASET
TRAIN
TEST
PCA
PCA Demo
BigML PCA
• Standard PCA only applies to numerical data
• BigML uses three different data transformation methods in order to
handle different data types
• Numeric data: Principal Component Analysis (PCA)
• Categorical data: Multiple Correspondence Analysis (MCA)
• Mixed data: Factorial Analysis of Mixed Data (FAMD)
• BigML will automatically handle numeric, text, items, and categorical
data without needing user input
Co-organized by: Sponsor:
Business Partners:

More Related Content

What's hot

DutchMLSchool. Models, Evaluations, and Ensembles
DutchMLSchool. Models, Evaluations, and EnsemblesDutchMLSchool. Models, Evaluations, and Ensembles
DutchMLSchool. Models, Evaluations, and Ensembles
BigML, Inc
 
DutchMLSchool. Opening Remarks
DutchMLSchool. Opening RemarksDutchMLSchool. Opening Remarks
DutchMLSchool. Opening Remarks
BigML, Inc
 
DutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic ModelsDutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic Models
BigML, Inc
 
DutchMLSchool. ML for Logistics
DutchMLSchool. ML for LogisticsDutchMLSchool. ML for Logistics
DutchMLSchool. ML for Logistics
BigML, Inc
 
MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles
BigML, Inc
 
VSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised LearningVSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised Learning
BigML, Inc
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
MLconf
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
Hakka Labs
 
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLBuilding Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemML
sparktc
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
Paco Nathan
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
Eng Teong Cheah
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
Charles Vestur
 
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIMEUnified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Databricks
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
Daniel Tunkelang
 
AI
AIAI
Machine Learning for Sales & Marketing
Machine Learning for Sales & MarketingMachine Learning for Sales & Marketing
Machine Learning for Sales & Marketing
Piyush Saggi
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
Paco Nathan
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
ryanorban
 
DIY Max-Diff webinar slides
DIY Max-Diff webinar slidesDIY Max-Diff webinar slides
DIY Max-Diff webinar slides
Displayr
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
Paco Nathan
 

What's hot (20)

DutchMLSchool. Models, Evaluations, and Ensembles
DutchMLSchool. Models, Evaluations, and EnsemblesDutchMLSchool. Models, Evaluations, and Ensembles
DutchMLSchool. Models, Evaluations, and Ensembles
 
DutchMLSchool. Opening Remarks
DutchMLSchool. Opening RemarksDutchMLSchool. Opening Remarks
DutchMLSchool. Opening Remarks
 
DutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic ModelsDutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic Models
 
DutchMLSchool. ML for Logistics
DutchMLSchool. ML for LogisticsDutchMLSchool. ML for Logistics
DutchMLSchool. ML for Logistics
 
MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles
 
VSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised LearningVSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised Learning
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
 
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLBuilding Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemML
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
 
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIMEUnified Approach to Interpret Machine Learning Model: SHAP + LIME
Unified Approach to Interpret Machine Learning Model: SHAP + LIME
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
AI
AIAI
AI
 
Machine Learning for Sales & Marketing
Machine Learning for Sales & MarketingMachine Learning for Sales & Marketing
Machine Learning for Sales & Marketing
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
DIY Max-Diff webinar slides
DIY Max-Diff webinar slidesDIY Max-Diff webinar slides
DIY Max-Diff webinar slides
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 

Similar to DutchMLSchool. Automating Decision Making

BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
BigML, Inc
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision Making
BigML, Inc
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
BigML, Inc
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
BigML, Inc
 
BigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with FlatlineBigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with Flatline
BigML, Inc
 
MLSD18. Real-World Use Case I
MLSD18. Real-World Use Case IMLSD18. Real-World Use Case I
MLSD18. Real-World Use Case I
BigML, Inc
 
BSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBSSML17 - Basic Data Transformations
BSSML17 - Basic Data Transformations
BigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
BigML, Inc
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
BigML, Inc
 
VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data Transformations
BigML, Inc
 
VSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet AllocationVSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet Allocation
BigML, Inc
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
BigML, Inc
 
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringVSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature Engineering
BigML, Inc
 
Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!
Louis Dorard
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
BigML, Inc
 
Predictive apps for startups
Predictive apps for startupsPredictive apps for startups
Predictive apps for startups
Louis Dorard
 
Pandas application
Pandas applicationPandas application
Pandas application
SohamChakraborty44
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
Databricks
 
MLSD18. Basic Transformations - BigML
MLSD18. Basic Transformations - BigMLMLSD18. Basic Transformations - BigML
MLSD18. Basic Transformations - BigML
BigML, Inc
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
BigML, Inc
 

Similar to DutchMLSchool. Automating Decision Making (20)

BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision Making
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
 
BigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with FlatlineBigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with Flatline
 
MLSD18. Real-World Use Case I
MLSD18. Real-World Use Case IMLSD18. Real-World Use Case I
MLSD18. Real-World Use Case I
 
BSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBSSML17 - Basic Data Transformations
BSSML17 - Basic Data Transformations
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
 
VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data Transformations
 
VSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet AllocationVSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet Allocation
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
 
VSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature EngineeringVSSML17 L5. Basic Data Transformations and Feature Engineering
VSSML17 L5. Basic Data Transformations and Feature Engineering
 
Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!Machine Learning: je m'y mets demain!
Machine Learning: je m'y mets demain!
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
 
Predictive apps for startups
Predictive apps for startupsPredictive apps for startups
Predictive apps for startups
 
Pandas application
Pandas applicationPandas application
Pandas application
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
MLSD18. Basic Transformations - BigML
MLSD18. Basic Transformations - BigMLMLSD18. Basic Transformations - BigML
MLSD18. Basic Transformations - BigML
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
 

DutchMLSchool. Automating Decision Making

  • 1. 1st edition | July 8-11, 2019
  • 2. BigML, Inc #DutchMLSchool 2 Feature Engineering Creating Features that Make Machine Learning Work Poul Petersen CIO, BigML, Inc
  • 3. BigML, Inc #DutchMLSchool Gaming the ML Performance 3 • Use ML to improve performance automatically • OptiML • Unsupervised Feature Engineering (PCA, Topic Models, Clustering, Anomaly Detection, etc) • Automated feature selection • Use domain knowledge to improve performance manually • Bespoke features (requires expertise) • Fusions of models • Manual feature selection A Tale of Two Strategies…
  • 4. BigML, Inc #DutchMLSchool What is Feature Engineering 4 Feature Engineering: applying domain knowledge of the data to create new features that allow ML algorithms to work better, or to work at all. • This is really, really important - more than algorithm selection! • In fact, so important that BigML often does it automatically • ML algorithms have no deeper understanding of data • Numerical: have a natural order, can be scaled, etc. • Categorical: have discrete values, etc. • The "magic" is the ability to find patterns quickly and efficiently • ML algorithms only know what you tell or show them with data • Medical: kg and m, but BMI = kg/m² is better • Lending: Debt and Income, but DTI (debt-to-income ratio) is better • Intuition can be risky: remember to prove it with an evaluation!
  • 5. BigML, Inc #DutchMLSchool Built-in Transformations 5 2013-09-25 10:02 Date-Time Fields … year month day hour minute … … 2013 Sep 25 10 2 … … … … … … … … NUM NUMCAT NUM NUM • Date-Time fields have a lot of information "packed" into them • Splitting out the time components allows ML algorithms to discover time-based patterns. DATE-TIME
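As a rough illustration of the date-time expansion on slide 5, the sketch below splits the example timestamp into separate fields using only the Python standard library. The function name `split_datetime`, the output keys, and the month abbreviations are illustrative assumptions, not BigML's implementation.

```python
from datetime import datetime

# Fixed month abbreviations so the result does not depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def split_datetime(value: str) -> dict:
    """Expand a 'YYYY-MM-DD HH:MM' string into separate features."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M")
    return {
        "year": dt.year,                 # numeric
        "month": MONTHS[dt.month - 1],   # categorical
        "day": dt.day,                   # numeric
        "hour": dt.hour,                 # numeric
        "minute": dt.minute,             # numeric
    }

features = split_datetime("2013-09-25 10:02")
```

Each component becomes its own column, so a model can pick up patterns such as "hour of day" that are hidden inside the packed timestamp.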
  • 6. BigML, Inc #DutchMLSchool Built-in Transformations 6 Categorical Fields for Clustering/LR … alchemy_category … … business … … recreation … … health … … … … CAT business health recreation … … 1 0 0 … … 0 0 1 … … 0 1 0 … … … … … … NUM NUM NUM • Clustering and Logistic Regression require numeric fields for inputs • Categorical values are transformed to numeric vectors automatically* • *Note: In BigML, clustering uses k-prototypes and the encoding used for LR can be configured.
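The categorical-to-numeric table on slide 6 corresponds to what is commonly called one-hot encoding. A minimal sketch, assuming a hypothetical helper `one_hot` and alphabetical column order (BigML's own encoding options may differ):

```python
def one_hot(values):
    """Return the sorted category names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# The three alchemy_category values shown on the slide.
categories, rows = one_hot(["business", "recreation", "health"])
```

Here `categories` holds the new column names and each entry of `rows` has exactly one 1, marking the category of that instance.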
  • 7. BigML, Inc #DutchMLSchool Built-in Transformations 7 Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon ‘em. TEXT Text Fields … great afraid born achieve … … 4 1 1 1 … … … … … … … NUM NUM NUM NUM • Unstructured text contains a lot of potentially interesting patterns • Bag-of-words analysis happens automatically and extracts the "interesting" tokens in the text • Another option is Topic Modeling to extract thematic meaning
  • 8. BigML, Inc #DutchMLSchool Help ML to Work Better 8 { "url":"cbsnews", "title":"Breaking News Headlines Business Entertainment World News ", "body":" news covering all the latest breaking national and world news headlines, including politics, sports, entertainment, business and more." } TEXT title body Breaking News… news covering… … … TEXT TEXT When text is not actually unstructured • In this case, the text field has structure (key/value pairs) • Extracting the structure as new features may allow the ML algorithm to work better
  • 10. BigML, Inc #DutchMLSchool Help ML to Work at all 10 When the pattern does not exist Highway Number Direction Is Long 2 East-West FALSE 4 East-West FALSE 5 North-South TRUE 8 East-West FALSE 10 East-West TRUE … … … Goal: Predict principal direction from highway number ( = (mod (field "Highway Number") 2) 0)
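The Flatline expression on slide 10 tests whether the highway number is even. A Python equivalent of that one feature, using the five rows shown on the slide (the variable names are illustrative):

```python
# (highway number, direction) pairs from the slide.
highways = [(2, "East-West"), (4, "East-West"), (5, "North-South"),
            (8, "East-West"), (10, "East-West")]

# Same test as ( = (mod (field "Highway Number") 2) 0).
is_even = [number % 2 == 0 for number, _ in highways]

# In this sample, parity alone separates the two directions.
matches = [even == (direction == "East-West")
           for even, (_, direction) in zip(is_even, highways)]
```

A tree working on the raw number would need one split per highway, whereas the parity feature exposes the pattern in a single split.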
  • 12. BigML, Inc #DutchMLSchool Feature Engineering 12 Discretization Total Spend 7.342,99 304,12 4,56 345,87 8.546,32 NUM “Predict will spend $3,521 with error $1,232” Spend Category Top 33% Bottom 33% Bottom 33% Middle 33% Top 33% CAT “Predict customer will be Top 33% in spending”
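One way to sketch the discretization on slide 12 is to rank each value and assign it to a third of the distribution. The helper `tercile_labels` and the midpoint percentile rule are assumptions made for this example; on a real dataset the cut points would come from the full column, not from five rows.

```python
def tercile_labels(values):
    """Label each value by the third of the ranked data it falls into."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, index in enumerate(order):
        percentile = (rank + 0.5) / len(values)  # midpoint percentile rank
        if percentile < 1 / 3:
            labels[index] = "Bottom 33%"
        elif percentile < 2 / 3:
            labels[index] = "Middle 33%"
        else:
            labels[index] = "Top 33%"
    return labels

# "Total Spend" values from the slide, written with decimal points.
total_spend = [7342.99, 304.12, 4.56, 345.87, 8546.32]
spend_category = tercile_labels(total_spend)
```

The numeric regression target becomes a three-class objective, which turns the question into "which spending tier" rather than "how many dollars".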
  • 14. BigML, Inc #DutchMLSchool Built-ins for FE 14 • Discretize: Converts a numeric value to categorical • Replace missing values: fixed/max/mean/median/etc • Normalize: Adjust a numeric value to a specific range of values while preserving the distribution • Math: Exponentiation, Logarithms, Squares, Roots, etc • Types: Force a field value to categorical, integer, or real • Random: Create random values for introducing noise • Statistics: Mean, Population • Refresh Fields: • Types: recomputes field types. Ex: #classes > 1000 • Preferred: recomputes preferred status
  • 15. BigML, Inc #DutchMLSchool Flatline Add Fields 15 Computing with Existing Features Debt Income 10.134 100.000 85.234 134.000 8.112 21.500 0 45.900 17.534 52.000 NUM NUM (/ (field "Debt") (field "Income")) Debt Income Debt to Income Ratio 0,10 0,64 0,38 0 0,34 NUM
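The debt-to-income feature on slide 15 is a simple per-row division. A Python sketch of the same computation on the slide's rows (the zero-income guard is an added assumption, since the slide does not say how that case is handled):

```python
# (debt, income) rows from the slide, written without thousands separators.
rows = [(10134, 100000), (85234, 134000), (8112, 21500),
        (0, 45900), (17534, 52000)]

def debt_to_income(debt, income):
    """Equivalent of (/ (field "Debt") (field "Income"))."""
    if income == 0:
        return None  # assumption: treat an undefined ratio as missing
    return debt / income

ratios = [round(debt_to_income(debt, income), 2) for debt, income in rows]
```

Rounded to two decimals, the ratios agree with the "Debt to Income Ratio" column on the slide.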
  • 17. BigML, Inc #DutchMLSchool What is Flatline? 17 • DSL: • Invented by BigML - Programmatic / Optimized for speed • Transforms datasets into new datasets • Adding new fields / Filtering • Transformations are written in Lisp-style syntax • Feature Engineering • Computing new fields: (/ (field "Debt") (field "Income")) • Programmatic Filtering: • Filtering datasets according to functions that evaluate to true/false using the row of data as an input. Flatline: a domain specific language for feature engineering and programmatic filtering
  • 18. BigML, Inc #DutchMLSchool Flatline 18 • Lisp-style syntax: Operators come first • Correct: (+ 1 2) => NOT Correct: (1 + 2) • Dataset Fields are first-class citizens • (field "diabetes pedigree") • Limited programming language structures • let, cond, if, map, list operators, */+-, etc. • Built-in transformations • statistics, strings, timestamps, windows
  • 19. BigML, Inc #DutchMLSchool Flatline s-expressions 19 (= 0 (+ (abs ( f "Month - 3" ) ) (abs ( f "Month - 2")) (abs ( f "Month - 1") ) )) Name Month - 3 Month - 2 Month - 1 Joe Schmo 123,23 0 0 Jane Plain 0 0 0 Mary Happy 0 55,22 243,33 Tom Thumb 12,34 8,34 14,56 Un-Labelled Data Labelled data Name Month - 3 Month - 2 Month - 1 Default Joe Schmo 123,23 0 0 FALSE Jane Plain 0 0 0 TRUE Mary Happy 0 55,22 243,33 FALSE Tom Thumb 12,34 8,34 14,56 FALSE Adding Simple Labels to Data Define "default" as missing three payments in a row
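The labelling rule on slide 19 marks a customer as a default when all three monthly payments are zero. A Python rendering of the same expression over the slide's rows (names are illustrative):

```python
# (name, month-3, month-2, month-1) rows from the slide.
payments = [
    ("Joe Schmo", 123.23, 0, 0),
    ("Jane Plain", 0, 0, 0),
    ("Mary Happy", 0, 55.22, 243.33),
    ("Tom Thumb", 12.34, 8.34, 14.56),
]

def is_default(m3, m2, m1):
    """Equivalent of (= 0 (+ (abs m3) (abs m2) (abs m1)))."""
    return abs(m3) + abs(m2) + abs(m1) == 0

labels = {name: is_default(m3, m2, m1) for name, m3, m2, m1 in payments}
```

Summing absolute values means a positive and a negative entry cannot cancel out, so only three genuine zeros produce TRUE.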
  • 21. BigML, Inc #DutchMLSchool Flatline s-expressions 21 date volume price 1 34353 314 2 44455 315 3 22333 315 4 52322 321 5 28000 320 6 31254 319 7 56544 323 8 44331 324 9 81111 287 10 65422 294 11 59999 300 12 45556 302 13 19899 301 Current - (4-day avg) std dev Shock: Deviations from a Trend day-4 day-3 day-2 day-1 4davg - 314 - 314 315 - 314 315 315 - 314 315 315 321 316,25 315 315 321 320 317,75 315 321 320 319 318,75
  • 22. BigML, Inc #DutchMLSchool Flatline s-expressions 22 Shock: Deviations from a Trend Shock = (Current - (4-day avg)) / std dev Current: (field "price") 4-day avg: (avg-window "price" -4 -1) std dev: (standard-deviation "price") (/ (- ( f "price") (avg-window "price" -4 -1)) (standard-deviation "price"))
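The shock feature on slides 21 and 22 can be sketched in Python as follows. Two assumptions are made here: the standard deviation is taken over the whole price column as a population standard deviation, and rows with fewer than four previous prices get no value. Flatline's own window and statistics semantics may differ.

```python
from statistics import pstdev

# Daily prices from slide 21.
prices = [314, 315, 315, 321, 320, 319, 323, 324, 287, 294, 300, 302, 301]

def four_day_avg(series, i):
    """Mean of the four rows before row i, or None if unavailable."""
    if i < 4:
        return None
    return sum(series[i - 4:i]) / 4

price_std = pstdev(prices)  # assumption: population standard deviation

shocks = []
for i, price in enumerate(prices):
    avg = four_day_avg(prices, i)
    shocks.append(None if avg is None else (price - avg) / price_std)

# Row with the largest absolute deviation from its recent trend.
largest = max((i for i, s in enumerate(shocks) if s is not None),
              key=lambda i: abs(shocks[i]))
```

With this data the largest shock falls on the ninth row, where the price drops to 287 after four days above 319.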
  • 24. BigML, Inc #DutchMLSchool Advanced s-expressions 24 ( = (mod (field "Highway Number") 2) 0) Highway isEven?
  • 25. BigML, Inc #DutchMLSchool Advanced s-expressions 25 ( / ( mod ( - ( / ( epoch ( field "date-field" )) 1000 ) 621300 ) 2551443 ) 2551442 ) Moon Phase% https://gist.github.com/petersen-poul/0cf5022ed1768837fe13af72b2488329
  • 26. BigML, Inc #DutchMLSchool Home Price Feature 26 Worth More Worth Less
  • 27. BigML, Inc #DutchMLSchool Home Price Feature 27 LATITUDE LONGITUDE REFERENCE LATITUDE REFERENCE LONGITUDE 44,583 -123,296775 44,5638 -123,2794 44,604414 -123,296129 44,5638 -123,2794 44,600108 -123,29707 44,5638 -123,2794 44,603077 -123,295004 44,5638 -123,2794 44,589587 -123,301154 44,5638 -123,2794 Distance (m) 700 30,4 19,38 37,8 23,39
  • 28. BigML, Inc #DutchMLSchool Haversine Formula 28 https://en.wikipedia.org/wiki/Haversine_formula
  • 29. BigML, Inc #DutchMLSchool Advanced s-expressions 29 ( let ( R 6371000 latA (to-radians {lat-ref}) latB (to-radians ( field "LATITUDE" ) ) latD ( - latB latA ) longD ( to-radians ( - ( field "LONGITUDE" ) {long-ref} ) ) a ( + ( square ( sin ( / latD 2 ) ) ) ( * (cos latA) (cos latB) (square ( sin ( / longD 2))) ) ) c ( * 2 ( asin ( min (list 1 (sqrt a))))) ) ( * R c ) ) Distance Lat/Long <=> Ref (Haversine)
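For readers more comfortable with Python, the Flatline Haversine expression on slide 29 translates roughly as below. The function name and argument order are assumptions; the `{lat-ref}` and `{long-ref}` placeholders from the slide become the `lat_ref` and `lon_ref` arguments.

```python
import math

EARTH_RADIUS_M = 6371000  # R in the Flatline expression, in meters

def haversine_m(lat, lon, lat_ref, lon_ref):
    """Great-circle distance in meters between a point and a reference."""
    lat_a = math.radians(lat_ref)
    lat_b = math.radians(lat)
    lat_d = lat_b - lat_a
    lon_d = math.radians(lon - lon_ref)
    a = (math.sin(lat_d / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin(lon_d / 2) ** 2)
    # min(1, ...) protects asin from rounding slightly above 1.
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return EARTH_RADIUS_M * c

# Distance from the first row on slide 27 to the reference point.
distance = haversine_m(44.583, -123.296775, 44.5638, -123.2794)
```

As in the Flatline version, the result is a single numeric feature per row that a model can use as "distance to the reference location".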
  • 30. BigML, Inc #DutchMLSchool WhizzML + Flatline 30 HAVERSINE FLATLINE OUTPUT DATASET INPUT DATASET LONG Ref LAT Ref WHIZZML SCRIPT https://bigml.com/gallery/scripts
  • 31. BigML, Inc #DutchMLSchool Advanced s-expressions 31 JSON Parser??? • Remember, Flatline is not a full programming language • No loops • No accumulated values • Code executes on one row at a time and has a limited view into other rows https://gist.github.com/petersen-poul/504c62ceaace76227cc6d8e0c5f1704b
  • 32. BigML, Inc #DutchMLSchool Feature Engineering 32 Fix Missing Values in a "Meaningful" Way [Workflow diagram; labels: Original Dataset, Filter Zeros, Clean Dataset, Model insulin, Predict insulin, Amended Dataset, Select insulin, Fixed Dataset] ( if ( = (field "insulin") 0) (field "predicted insulin") (field "insulin"))
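The final selection step on slide 32 keeps the measured insulin value unless it is zero, in which case the model's prediction is used. A Python sketch of that rule; the rows below are made-up values for illustration only.

```python
def fix_insulin(row):
    """Equivalent of (if (= insulin 0) predicted-insulin insulin)."""
    if row["insulin"] == 0:
        return row["predicted insulin"]
    return row["insulin"]

# Hypothetical rows: a zero stands for a missing measurement.
amended = [
    {"insulin": 94, "predicted insulin": 101.5},
    {"insulin": 0, "predicted insulin": 130.2},
]
fixed = [fix_insulin(row) for row in amended]
```

The predicted value only fills the gaps, so rows with a real measurement are left untouched.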
  • 35. BigML, Inc #DutchMLSchool Feature Selection 35 • Model Summary • Field Importance • Algorithmic • Best-First Feature Selection • Boruta • Leakage • Tight Correlations (AD, Plot, Correlations) • Test Data • Perfect future knowledge Care must be taken when creating features!
  • 36. BigML, Inc #DutchMLSchool Feature Selection 36 Leakage • sales pipeline where step n-1 has no other outcome than step n. • stock close predicts stock open • churn retention: the worst rep is actually the best (correlation != causation) • cancer prediction where one input is a doctor-ordered test for the condition • account ID predicts fraud (because only new accounts are fraudsters)
  • 37. BigML, Inc #DutchMLSchool Summary 37 • Feature Engineering: what is it / why it is important • Automatic transformations: date-time, text, etc • Built-in functions: filtering and feature engineering • Discretization / Normalization / etc. • Flatline: programmatic feature engineering / filtering • Structure • Examples: Adding fields / filtering • When building features it is important to watch for leakage
  • 38. BigML, Inc #DutchMLSchool 38 OptiML and Fusions Automating Machine Learning Poul Petersen CIO, BigML, Inc
  • 39. BigML, Inc #DutchMLSchool Title 39 [Chart of model types by project stage. Axis labels: Decreasing Interpretability / Better Representation / Longer Training; Increasing Data Size / Complexity. Stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance). Models: Single Tree Model, Logistic Regression, Boosted Trees, Random Decision Forest, Decision Forest, Deepnets. Region label: TOO HARD]
  • 40. BigML, Inc #DutchMLSchool BigML Deepnets 40 • The success of a Deepnet is dependent on getting the right network structure for the dataset • But, there are too many parameters: • Nodes, layers, activation function, learning rate, etc… • And setting them takes significant expert knowledge • Solution: • Metalearning (a good initial guess) • Network search (try a bunch) Remember this?
  • 41. BigML, Inc #DutchMLSchool OptiML 41 • Each resource has several parameters that impact quality • Number of trees, missing splits, nodes, weight • Rather than trial and error, we can use ML to find ideal parameters • Why not make the model type, Decision Tree, Boosted Tree, etc, a parameter as well? • Similar to Deepnet network search, but finds the optimum machine learning algorithm and parameters for your data automatically Key Insight: We can solve any parameter selection problem in a similar way.
  • 42. BigML, Inc #DutchMLSchool The Challenge… 42 • We will start with a dataset from StumbleUpon • Train/Test split with seed “bigml” • Build and Evaluate: • 1-click Model, LR, Ensemble, Deepnet • Top model from OptiML output • Compare the results using the phi coefficient • Explore other ideas for improving performance further
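The phi coefficient used to compare the models is computed from the four cells of a binary confusion matrix. A minimal sketch, assuming a hypothetical helper `phi` and a return value of 0.0 when the denominator is zero (a common convention, not something the slides specify):

```python
import math

def phi(tp, tn, fp, fn):
    """Phi coefficient (Matthews correlation) for a binary classifier."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumption: undefined cases are scored as 0
    return (tp * tn - fp * fn) / denominator

perfect = phi(tp=50, tn=50, fp=0, fn=0)
inverted = phi(tp=0, tn=0, fp=50, fn=50)
chance = phi(tp=25, tn=25, fp=25, fn=25)
```

Phi ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it a convenient single number for ranking models.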
  • 44. BigML, Inc #DutchMLSchool Results… 44 All scores are phi, evaluated against a holdout • 1-Click Decision Tree: 0.36 • 1-Click LR: 0.47 • 1-Click Ensemble: 0.58 • Best OptiML Model (LR): 0.66 • 1-Click Deepnet: 0.67 • What else can we try?
  • 45. BigML, Inc #DutchMLSchool Fusions Inside 45 • Fuse any set of models into a new “fusion” • Must have the same objective type • Inputs and feature space can differ • Weights can be added • Give more importance to individual models • Fusions can be fused as well • Especially useful for fusing OptiML models Key Insight: ML algorithms each have unique strengths and weaknesses
  • 46. BigML, Inc #DutchMLSchool Performance thru Diversity 46 Dataset Optimized Deepnet Optimized Ensemble Optimized Logistic Regression Better?
  • 48. BigML, Inc #DutchMLSchool Results… 48 All scores are phi, evaluated against a holdout • 1-Click Decision Tree: 0.36 • 1-Click LR: 0.47 • 1-Click Ensemble: 0.58 • Best OptiML Model (LR): 0.66 • 1-Click Deepnet: 0.67 • Fusion of top Model Types: 0.68
  • 49. BigML, Inc #DutchMLSchool Fusions: Under the Hood 49 Classification (Model / Prediction / Probability / Weight): Ensemble TRUE 56% 1; Deepnet FALSE 67% 1; Model TRUE 78% 2; Fusion TRUE 61%. P(TRUE) = [56+(100-67)+2*78] / 4 Regression (Model / Prediction / Error / Weight): Ensemble 156,78 12,56 1; Deepnet 139,55 9,88 1; Model 172,10 23,76 2; Fusion 160,13 17,49
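The arithmetic on slide 49 is a weighted average over the component models. The sketch below reproduces the slide's numbers; the function names are illustrative, and it only mirrors the slide's calculation rather than BigML's internal code.

```python
def fuse_classification(votes):
    """votes: (predicts_true, probability, weight) -> fused P(TRUE)."""
    total_weight = sum(weight for _, _, weight in votes)
    weighted = sum(weight * (prob if predicts_true else 1 - prob)
                   for predicts_true, prob, weight in votes)
    return weighted / total_weight

def fuse_regression(values):
    """values: (value, weight) pairs -> weighted mean."""
    total_weight = sum(weight for _, weight in values)
    return sum(value * weight for value, weight in values) / total_weight

# Ensemble, Deepnet, Model rows from the slide.
p_true = fuse_classification([(True, 0.56, 1), (False, 0.67, 1), (True, 0.78, 2)])
prediction = fuse_regression([(156.78, 1), (139.55, 1), (172.10, 2)])
error = fuse_regression([(12.56, 1), (9.88, 1), (23.76, 2)])
```

A FALSE prediction with probability 67% contributes 33% toward TRUE, which is why the slide writes (100-67) in the numerator.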
  • 50. BigML, Inc #DutchMLSchool Fusions: Like any BigML Model 50 • Fully accessible thru API and WhizzML • Bindings have support for local predictions
  • 51. BigML, Inc #DutchMLSchool Decision Boundary Smoothness 51 Single Tree: • Outcome changes abruptly near decision boundary • And not at all parallel to the boundary • This can be “surprising” Single Tree + Deepnet: • Keep the interpretability of the tree • But with a more nuanced decision boundary
  • 52. BigML, Inc #DutchMLSchool Feature Stability 52 Feature Importance: Different subsets of features may have similar modeling performance Fusing models gives better resilience against missing values as well as ensuring that all relevant features are utilized.
  • 53. BigML, Inc #DutchMLSchool Weighting over Time 53 Data significance over time: • Some data may change in significance at different times • Short-term user behavior versus long-term • Weights can be set to account for the significance of time 1 Day 1 Week 1 Month w=8 w=4 w=2
  • 54. BigML, Inc #DutchMLSchool Improved Class Separation 54 Consider a 3-class objective • Really only care about “yes” versus “not yes” • A single model may struggle to separate the two negative classes Yes No Maybe yes/no/maybe yes/no yes/maybe
  • 55. BigML, Inc #DutchMLSchool Feature Space Optimization 55 Model Skills: Some ML algorithms “generally” do better on some feature types: • RDF for sparse text vectors • LR/Deepnets for numeric features • Trees for categorical features Full Numeric Text
  • 57. BigML, Inc #DutchMLSchool Results… 57 All scores are phi, evaluated against a holdout • 1-Click Decision Tree: 0.36 • 1-Click LR: 0.47 • 1-Click Ensemble: 0.58 • Best OptiML Model (LR): 0.66 • 1-Click Deepnet: 0.67 • Fusion of top Model Types: 0.68 • Custom Feature Fusion: 0.70
  • 58. BigML, Inc #DutchMLSchool PCA Principal Component Analysis Poul Petersen CIO, BigML 58
  • 59. BigML, Inc #DutchMLSchool Issues with High Dimensionality 59 • Implicitly increases model complexity, prone to overfitting • Requires more observations in order to generalize well • Contains correlated or useless variables • Data is difficult to visualize • Takes a longer time to train models or make predictions Principal Component Analysis addresses all of these issues
  • 60. BigML, Inc #DutchMLSchool Other Approaches 60 MODEL Pruning, Node threshold ENSEMBLE Bagging, Randomization LOGISTIC REGRESSION L1 and L2 penalties DEEPNET Dropout
  • 61. BigML, Inc #DutchMLSchool Dimensionality Reduction 61 Feature Selection • Preserves the original variables and selects a subset • Often uses recursive methods or statistical thresholds • Examples: RFE, Chi-Squared Test, Boruta Feature Extraction • Transforms original variables into variables better suited for modeling • Examples: word vectors, clustering • PCA falls into this category Manual Approach
  • 62. BigML, Inc #DutchMLSchool When to use PCA 62 1. You want to reduce the number of variables in your model, but it is not clear which should be eliminated 2. You want to generate variables that are not correlated 3. You are okay with sacrificing some amount of interpretability for potential downstream performance gains
  • 63. BigML, Inc #DutchMLSchool How Does PCA Work? 63 Each PC is a linear combination of original variables PC1 = w1F1 + w2F2 + w3F3 + … + wNFN PC2 = w1F1 + w2F2 + w3F3 + … + wNFN PCN = w1F1 + w2F2 + w3F3 + … + wNFN …
  • 64. BigML, Inc #DutchMLSchool PCA Output 64 These principal components are not correlated
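To make the two previous slides concrete, here is a small two-variable PCA written with the standard library only. It uses the closed-form rotation angle that diagonalizes a 2x2 covariance matrix; the function name and the sample points are made up for illustration, and this is not how BigML computes PCA on real datasets.

```python
import math

def pca_2d(points):
    """Return the two weight vectors and the projected points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    cyy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Rotation angle of the direction with the largest variance.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    w1 = (math.cos(theta), math.sin(theta))    # weights of PC1
    w2 = (-math.sin(theta), math.cos(theta))   # weights of PC2
    projected = [((x - mean_x) * w1[0] + (y - mean_y) * w1[1],
                  (x - mean_x) * w2[0] + (y - mean_y) * w2[1])
                 for x, y in points]
    return (w1, w2), projected

def covariance(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)

# Hypothetical, strongly correlated sample data.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
weights, projected = pca_2d(points)
pc1 = [p[0] for p in projected]
pc2 = [p[1] for p in projected]
```

Each principal component is a weighted sum of the centered original variables, each component has its own weight vector, and the covariance between `pc1` and `pc2` is zero up to rounding error.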
  • 65. BigML, Inc #DutchMLSchool PCA Workflow 65 SOURCE DATASET TRAIN TEST
  • 66. BigML, Inc #DutchMLSchool PCA Workflow 66 PCA SOURCE DATASET TRAIN TEST
  • 67. BigML, Inc #DutchMLSchool PCA Workflow 67 BATCH PROJECTION BATCH PROJECTION SOURCE DATASET TRAIN TEST PCA
  • 68. BigML, Inc #DutchMLSchool PCA Workflow 68 NEW TRAIN FEATURES NEW TEST FEATURES BATCH PROJECTION BATCH PROJECTION SOURCE DATASET TRAIN TEST PCA
  • 70. BigML, Inc #DutchMLSchool BigML PCA 70 • Standard PCA only applies to numerical data • BigML uses three different data transformation methods in order to handle different data types • Numeric data: Principal Component Analysis (PCA) • Categorical data: Multiple Correspondence Analysis (MCA) • Mixed data: Factorial Analysis of Mixed Data (FAMD) • BigML will automatically handle numeric, text, items, and categorical data without needing user input