Feature Engineering Hands-On by Dmitry Larko

Feature Engineering
Hands-on
by Dmitry Larko
Sr. Data Scientist @ H2O.ai

6x
"Coming up with features is difficult, time-consuming, requires expert knowledge. “Applied machine
learning” is basically feature engineering." ~ Andrew Ng
Feature Engineering
“… some machine learning projects succeed and some fail. What makes the difference? Easily the
most important factor is the features used.” ~ Pedro Domingos
“Good data preparation and feature engineering is integral to better prediction” ~ Marios Michailidis
(KazAnova), Kaggle GrandMaster, Kaggle #2, former #1
”you have to turn your inputs into things the algorithm can understand” ~ Shayne Miel, answer to
“What is the intuitive explanation of feature engineering in machine learning?”

6x
Your data
6x
Target Categorical CatNum

Follow this link
6x
https://www.kaggle.com/c/amazon-employee-access-challenge/kernels
http://bit.ly/SoCal-h2o
We are going to use these 4 kernels:

6x
Categorical features – missing values
6x
• Encode as an unique category
• “Unknown”, “Missing”, ….
• Use the most frequent category level (mode)
Feature 1 Encoded Feature 1 Encoded Feature 1
A A A
A A A
NaN MISSING A
A A A
B B B
B B B
NaN MISSING A
C C C
C C C

6x
Categorical Encoding
6x
• Turn categorical features into numeric features to
provide more fine-grained information
• Help explicitly capture non-linear relationships and
interactions between the values of features
• Most of machine learning tools only accept numbers as
their input
• xgboost, gbm, glmnet, libsvm, liblinear, etc.

6x
• Label Encoding
• Randomly assign 1 or N integer to each level
• Ok for tree-based methods
Feature 1 Encoded Feature 1 Encoded Feature 2
A 0 0
A 0 0
A 0 0
A 0 0
B 1 2
B 1 2
B 1 2
C 2 1
C 2 1
A 0 0
B 1 2
C 2 1

6x
6x
• One Hot Encoding
• Transform categories into individual binary (0 or 1) features
• Python scikit-learn: DictVectorizer, OneHotEncoder
• Ok for K-means, Linear, NNs, etc
Feature Feature = A Feature = B Feature = C
A 1 0 0
A 1 0 0
A 1 0 0
A 1 0 0
B 0 1 0
B 0 1 0
B 0 1 0
C 0 0 1
C 0 0 1
A 1 0 0
B 0 1 0
C 0 0 1

6x
• SVD encoding
Feature1 Feature2
A C
A D
A C
A C
B C
B D
B D
C D
C C
C D
A 3 1
B 1 2
C 1 1
C D
A 0.94 0.31
B 0.44 0.89
C 0.70 0.70
A 0.91
B 0.93
C 0.99
Count IDF SVD

6x
• Frequency Encoding
• Encoding of categorical levels of feature to values
between 0 and 1 based on their relative frequency
Feature Encoded Feature
A 0.44
A 0.44
A 0.44
A 0.44
B 0.33
B 0.33
B 0.33
C 0.22
C 0.22
A 0.44 (4 out of 9)
B 0.33 (3 out of 9)
C 0.22 (2 out of 9)

Categorical Encoding - Target mean encoding
6x
• Instead of dummy encoding of categorical variables and increasing the
number of features we can encode each level as the mean of the response.
Feature Outcome MeanEncode
A 1 0.75
A 0 0.75
A 1 0.75
A 1 0.75
B 1 0.66
B 1 0.66
B 0 0.66
C 1 1.00
C 1 1.00
A 0.75 (3 out of 4)
B 0.66 (2 out of 3)
C 1.00 (2 out of 2)

Categorical Encoding - Target mean encoding, smoothing
6x
• Also it is better to calculate weighted average of the overall mean of the
training set and the mean of the level:
𝑓 𝑛 ∗ 𝑚𝑒𝑎𝑛 𝑙𝑒𝑣𝑒𝑙 + 1 − 𝑓 𝑛 ∗ 𝑚𝑒𝑎𝑛 𝑑𝑎𝑡𝑎𝑠𝑒𝑡
Where:
𝑓 𝑛 - weight function (monotonic)
𝑚𝑒𝑎𝑛 𝑙𝑒𝑣𝑒𝑙 - calculated mean for category level
𝑚𝑒𝑎𝑛 𝑑𝑎𝑡𝑎𝑠𝑒𝑡 - calculated mean for a dataset
• The weights are based on the frequency of the levels i.e. if a category
only appears a few times in the dataset then its encoded value will be
close to the overall mean instead of the mean of that level.

Categorical Encoding – Target mean encoding 𝑓 𝑛 example
6x
x = frequency
k = inflection
point
f = steepness
https://www.researchgate.net/publication/220520258_A_Preprocessing_Scheme_for_High-Cardinality_Categorical_Attributes_in_Classification_and_Prediction_Problems

Categorical Encoding - Target mean encoding - Smoothing
6x
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
x level dataset 𝜆
A 4 0.75 0.77 0.99 0.99*0.75 + 0.01*0.77 = 0.7502
B 3 0.66 0.77 0.98 0.98*0.66 + 0.02*0.77 = 0.6622
C 2 1.00 0.77 0.5 0.5*1.0 + 0.5*0.77 = 0.885
𝜆 =
1
1 + exp (−
𝑥 − 2
0.25
)
x level dataset 𝜆
A 4 0.75 0.77 0.98 0.98*0.75 + 0.01*0.77 = 0.7427
B 3 0.66 0.77 0.5 0.5*0.66 + 0.5*0.77 = 0.715
C 2 1.00 0.77 0.017 0.017*1.0 + 0.983*0.77 = 0.773
𝜆 =
1
1 + exp (−
𝑥 − 3
0.25
)

Categorical Encoding – Target mean encoding, adding noise
6x
• Using 2-fold CV
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
A 1
B 0
C 1
C 1
A 1
B 1
A 1
B 1
A 0
A 0.5
B 1.0
A 1.0
B 0.0
C 1.0
A 0.5
B 1.0
C 0.77
C 0.77
A 0.5
B 0.0
A 1.0
B 0.0
A 1.0
Feature Encode
B 0.0
A 1.0
B 0.0
A 1.0
A 0.5
B 1.0
C 0.77
C 0.77
A 0.5

Categorical Encoding – Target mean encoding, adding noise
6x
Feature Outcome MeanEncode
A 1 0.75
A 0 0.75
A 1 0.75
A 1 0.75
B 1 0.66
B 1 0.66
B 0 0.66
C 1 1.00
C 1 1.00
A 0.75 (3 out of 4)
B 0.66 (2 out of 3)
C 1.00 (2 out of 2)
• Using Expanding mean approach

Categorical Encoding - Target mean encoding, adding noise
6x
• Expanding mean
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
Shuffle data

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00
1.00

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00
1.00
0.50

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00
1.00
0.50
1.00

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00
1.00
0.50
1.00
0.77

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00
1.00
0.50
1.00
0.77
1.00

• Expanding mean
Feature Outcome
B 1
A 1
B 1
A 0
A 1
B 0
C 1
C 1
A 1
ExpMeanEncode
0.77
0.77
1.00
1.00
0.50
1.00
0.77
1.00
0.66

Must read:
• Feature Engineering and Selection: A Practical Approach for Predictive Models
www.feat.engineering
• O’Reilly Feature Engineering for Machine Learning
http://shop.oreilly.com/product/0636920049081.do
• GitHub for different categorical encoding techniques (14!!! techniques):
https://github.com/scikit-learn-contrib/categorical-encoding
• Mean (likelihood) encoding for categorical variables with high cardinality and feature
interactions: a comprehensive study with Python: https://www.kaggle.com/vprokopev/mean-
likelihood-encodings-a-comprehensive-study
• Nice ensembling techniques overview https://mlwave.com/kaggle-ensembling-guide/

Feature Engineering Hands-On by Dmitry Larko

Recommended

Recommended

More Related Content

Similar to Feature Engineering Hands-On by Dmitry Larko

Similar to Feature Engineering Hands-On by Dmitry Larko (20)

More from Sri Ambati

More from Sri Ambati (20)

Recently uploaded

Recently uploaded (20)

Feature Engineering Hands-On by Dmitry Larko