Deep Feature Synthesis (DFS) is a technique that automatically generates new features from existing ones in a dataset, based on the relationships within the data. It applies mathematical functions across features, and stacks those functions, to create "deep" derived features. The resulting increase in dimensionality is handled by reducing the feature space with SVD. The authors then tune hyperparameters with a model-based approach: sample parameters randomly, assess each model through cross-validation, and fit a Gaussian copula model to predict the most promising neighborhood of parameters to sample from next.
3. FEATURE ENGINEERING IS:
the act of transforming your data
into a format that better represents
the underlying problem
4. More data beats clever algorithms,
but better data beats more data.
- Peter Norvig, Director of Research, Google
5. Deep Feature Synthesis…
Generates new features based on relationships
within the data
Applies mathematical functions across the feature
space, depending on data types
Can stack functions to create ‘deep’ features
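The stacking idea can be sketched in a few lines of pandas (a minimal illustration with toy data, not the paper's implementation): count items per order, then aggregate that count again per customer to get a depth-2 feature.

```python
import pandas as pd

# Toy data mirroring the talk's schema (values are illustrative)
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [1, 1, 2]})
items = pd.DataFrame({"order_id": [1, 1, 2, 3], "quantity": [1, 12, 3, 5]})

# Depth 1: aggregate items up to orders -> COUNT(items) per order
items_per_order = items.groupby("order_id").size().rename("n_items")

# Depth 2: stack another function on top -> MEAN(COUNT(items)) per customer
deep = (orders.join(items_per_order, on="order_id")
              .groupby("customer_id")["n_items"].mean()
              .rename("mean_items_per_order"))
print(deep)
```

Each extra level of stacking is what makes the synthesized feature "deeper".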
6. Nominal Data…
Labels without any quantitative value
• Gender: male / female
• Customer type: active / churned
• Blood type: A, B, AB, O
Quantitative operations are not meaningful;
we are limited to
• Frequency counts
• Mode
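For nominal labels, the valid summaries reduce to counting. A quick pandas illustration (sample values are made up):

```python
import pandas as pd

# Nominal labels: only counting-style summaries are meaningful
blood = pd.Series(["A", "O", "O", "AB", "B", "O"])
print(blood.value_counts())   # frequency counts per label
print(blood.mode())           # most frequent label
```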
7. Ordinal Data…
Nominal data with a natural order
• Exam grades: A, B, C, D, E, F
• Likert scale: 1-10*
• Customer ratings: 1-5 stars*
Inherits all the properties of nominal data,
plus ordering gives us
• Medians
• Quantiles
*It may look numeric, but it isn’t
8. Interval Data…
•Numeric data where the distance between
values is meaningful (but no ‘true’ zero)
• Temperature (Celsius, Fahrenheit)
• Time on a clock
•Quantitative data, meaning we have many
more options
• Add / Subtract
• Mean
• Standard Deviation
9. Ratio Data…
• Numeric data with a ‘true’ zero
  • Height
  • Number of orders
  • Revenue
• Quantitative data, meaning we have many
more options
  • Multiply / divide
  • Ratios
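The interval-vs-ratio distinction is easy to demonstrate numerically (a small numpy sketch with made-up values): means work for both, but ratios are only meaningful when there is a true zero.

```python
import numpy as np

# Interval data (Celsius): differences and means are meaningful...
temps_c = np.array([10.0, 20.0, 30.0])
print(temps_c.mean())           # fine: an average temperature

# ...but ratios are not: 20°C is not "twice as hot" as 10°C
# (the same readings in Fahrenheit give a different ratio)
temps_f = temps_c * 9 / 5 + 32
print(temps_f[1] / temps_f[0])  # != temps_c[1] / temps_c[0]

# Ratio data (revenue) has a true zero, so ratios ARE meaningful
revenue = np.array([100.0, 200.0])
print(revenue[1] / revenue[0])  # "twice the revenue" is a real statement
```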
10. An Example Dataset…

Customers:
customer_id | gender | age
1           | m      | 28
2           | f      | 45
…           | …      | …

Orders:
customer_id | order_id | date
1           | 1        | 01/05/2018
4           | 2        | 12/05/2018
…           | …        | …

Products:
product_id | price  | colour
1          | 100.00 | red
2          | 49.99  | white
…          | …      | …

Items Ordered:
product_id | order_id | quantity
1          | 1        | 1
5          | 1        | 12
…          | …        | …
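The relationships between these tables are ordinary key joins, which is what DFS traverses. A minimal pandas sketch using the first two tables (only the rows shown on the slide):

```python
import pandas as pd

# Two of the slide's tables, truncated rows omitted
customers = pd.DataFrame({"customer_id": [1, 2], "gender": ["m", "f"],
                          "age": [28, 45]})
orders = pd.DataFrame({"customer_id": [1, 4], "order_id": [1, 2],
                       "date": ["01/05/2018", "12/05/2018"]})

# Orders -> Customers is a simple foreign-key join
merged = orders.merge(customers, on="customer_id", how="left")
print(merged)
```

Order 2 belongs to customer 4, who is not in the truncated customers table, so its customer columns come back as NaN.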
11. Feature Abstraction…
• Entity Features (EFEAT)
  • Function applied element-wise to existing features, e.g. convert a
date to day of week, normalize a feature’s scale to 0-1
• Relational Features (RFEAT)
  • Function applied to a group of values via backward relationships,
e.g. min, max, average, count
• Direct Features (DFEAT)
  • Features transferred directly via forward relationships, e.g. copy a
customer’s attributes onto each of their orders
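The three feature types can be sketched with pandas on the example schema (a toy illustration, not the paper's code):

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "gender": ["m", "f"]})
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [1, 1, 2],
                       "date": pd.to_datetime(["2018-05-01", "2018-05-12",
                                               "2018-05-20"])})

# EFEAT: element-wise transform of an existing feature
orders["day_of_week"] = orders["date"].dt.dayofweek

# RFEAT: aggregate over a backward relationship (orders -> customers)
customers["n_orders"] = customers["customer_id"].map(
    orders.groupby("customer_id").size())

# DFEAT: carry a feature along a forward relationship (customers -> orders)
orders = orders.merge(customers[["customer_id", "gender"]], on="customer_id")
print(orders)
```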
14. Handling the Increased Dimensionality…
• The process creates a lot of features
  • Slows model training
  • More expensive hardware required
  • Increased risk of overfitting
  • Reduced model performance, e.g. for clustering
15. Dimension Reduction…
•Authors use SVD to reduce dimensionality
• Create new features called components, which contain
linear combinations of original features
• Compresses data into a smaller feature space
• Select top n% features based on feature importance
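A minimal sketch of the compression step with numpy's SVD (random data stands in for the wide DFS feature matrix; the component count is arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))       # stand-in for a wide DFS feature matrix
Xc = X - X.mean(axis=0)              # center before decomposing

# Truncated SVD: keep the top-k components
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
X_small = U[:, :k] * s[:k]           # each column is a linear combination
                                     # of all 50 original features
print(X_small.shape)                 # (100, 10)
```

The 50-column matrix is compressed to 10 components; a further importance-based selection step would then keep only the top n% of those.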
17. Tuning Hyperparameters…
6 × 490 × 90 × 10 × 450 × 20 × 100 = 2,381,400,000,000
(roughly 2.4 trillion combinations)

Parameter               Range
Clusters                [1-6]
SVD Components          [10-500]
% Components Selected   [10-100]
Oversampling Ratio      [1-10]
Trees in Random Forest  [50-500]
Decision Tree Depth     [1-20]
% Features in Tree      [1-100]
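The size of the grid is just the product of the range sizes, which makes exhaustive search hopeless:

```python
import math

# Number of discrete settings in each hyperparameter range from the slide
sizes = [6, 490, 90, 10, 450, 20, 100]
total = math.prod(sizes)
print(f"{total:,}")   # 2,381,400,000,000 -> ~2.4 trillion grid points
```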
18. Using a Model to Tune a Model…
• Tune parameters using a Gaussian copula
  • Sample hyperparameters randomly
  • Assess each model using cross-validation
  • Model the non-linear relationship between parameters and
performance using a Gaussian copula
  • Predict the neighborhood of parameters to sample from next
  • Repeat…
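The loop above can be sketched as a generic model-based search. This is a toy illustration only: a one-dimensional made-up objective stands in for the cross-validated score, and scikit-learn's Gaussian process regressor stands in for the paper's Gaussian copula model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy stand-in for "CV score as a function of one hyperparameter"
def cv_score(x):
    return -(x - 0.3) ** 2   # pretend the optimum is at x = 0.3

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 1))           # 1. sample parameters randomly
y = np.array([cv_score(x[0]) for x in X])    # 2. assess via "cross-validation"

for _ in range(10):
    # 3. model score vs. parameters (GP regressor as a copula stand-in)
    gp = GaussianProcessRegressor().fit(X, y)

    # 4. predict a promising neighborhood and pick the next sample from it
    cand = rng.uniform(0, 1, size=(50, 1))
    x_next = cand[np.argmax(gp.predict(cand))]

    # 5. evaluate and repeat
    X = np.vstack([X, x_next])
    y = np.append(y, cv_score(x_next[0]))

print(X[np.argmax(y), 0])   # should land near the optimum at 0.3
```

The surrogate model concentrates later samples where the predicted score is high, so far fewer evaluations are needed than with random or grid search.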
19. Conclusions…
• DFS automatically synthesizes new features
based on relationships in the data
• Use SVD to control the size of the feature
space and keep it manageable
• Optimize parameter tuning by modeling the
parameter space