Deep Feature Synthesis (DFS) is an approach to automating feature engineering for relational and human behavioral data. For each entity it generates features from three sources: the entity's own values (entity features), values transferred directly from related entities (direct features), and aggregates computed over related entities (relational features). This overcomes the iterative, time-consuming nature of traditional feature engineering. The approach is demonstrated on three predictive tasks, where the automatically generated features achieve substantially better results than other teams.
5. Overcoming Feature Engineering:
Feature Generation
Why not use Deep Learning?!
• Deep learning has brought significant automation to feature engineering for data types such as images, text, and signals.
• For relational and human behavioral data, however, feature engineering remains iterative, human-intuition driven, and challenging, and therefore time consuming.
6. Just a relational Database ;)
data is structured and relational
data captures aspects of human interactions
Overcoming Feature Engineering:
Any Restriction?!
7. Overcoming Feature Engineering:
Feature Generation- Example
“how often does this customer make a purchase?”
“how long has it been since this customer’s last purchase?”
“how much does the total order price vary for this customer?”
“does this customer typically buy luxurious or economical products?”
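The customer-level questions above can be sketched as simple aggregations over a hypothetical orders table (the table and its column names are illustrative, not from the paper):

```python
import pandas as pd

# Hypothetical orders table: one row per purchase by a customer
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_time": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01",
        "2024-01-20", "2024-03-15",
    ]),
    "total_price": [20.0, 250.0, 35.0, 15.0, 18.0],
})

now = pd.Timestamp("2024-04-01")
per_customer = orders.groupby("customer_id").agg(
    purchase_count=("total_price", "count"),                      # how often?
    days_since_last=("order_time", lambda t: (now - t.max()).days),  # how long since last purchase?
    price_std=("total_price", "std"),                             # how much does the price vary?
    mean_price=("total_price", "mean"),                           # luxurious vs. economical proxy
)
print(per_customer)
```

Each of these is a relational feature in DFS terms: a value of the Customer entity computed by aggregating over its related Order instances.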
9. Deep Feature Synthesis
Entity Features- Efeat
feature => a value of another type; entity features derive new values from an entity's existing values:
• a categorical string is mapped to a pre-decided unique numeric value, or a numerical value is rounded
• a timestamp is expanded into 4 distinct features: weekday (1-7), day of the month (1-30/31), month of the year (1-12), and hour of the day (1-24)
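The timestamp conversion above can be sketched with Python's standard library (the function name is illustrative; ranges follow the slide, so the 0-23 hour is shifted to 1-24):

```python
from datetime import datetime

def timestamp_efeat(ts: datetime) -> dict:
    """Expand a timestamp into the 4 entity features listed above."""
    return {
        "weekday": ts.isoweekday(),  # 1 (Monday) .. 7 (Sunday)
        "day_of_month": ts.day,      # 1 .. 30/31
        "month": ts.month,           # 1 .. 12
        "hour": ts.hour + 1,         # shift 0-23 to the slide's 1-24 range
    }

print(timestamp_efeat(datetime(2014, 7, 15, 9, 30)))
```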
10. Deep Feature Synthesis
Direct Features- Dfeat
Direct features are applied over forward relationships: features of the related instance i ∈ E_K are transferred directly as features of the instance m ∈ E_L.
(diagram: E_L → E_K, the forward relationship from E_L to E_K)
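A minimal sketch of this transfer with pandas, assuming an illustrative Orders → Customers forward relationship (each order points to exactly one customer):

```python
import pandas as pd

# E_K: the customer entity, with features already computed on it
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "age": [34, 51],
    "region": ["north", "south"],
})
# E_L: the order entity; each order points forward to one customer
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
})

# Dfeat: transfer the related customer's features directly onto each order
orders_with_dfeat = orders.merge(customers, on="customer_id", how="left")
print(orders_with_dfeat)
```

Because the relationship is forward (many orders to one customer), the join is unambiguous and no aggregation is needed, unlike relational features.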
13. PREDICTIVE MACHINE LEARNING PATHWAY
Choosing the target value (feature) & predictors
If a predictor is computed from the same base data as the target value, or if it relies on data that does not yet exist at the time the target value occurs, it is filtered out as invalid.
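The validity filter above can be sketched as follows, assuming hypothetical per-feature metadata (the feature names and metadata fields are illustrative, not from the paper):

```python
from datetime import datetime

# Hypothetical metadata for each candidate predictor: the latest timestamp of
# the base data it was computed from, and whether it shares base data with
# the target value.
features = {
    "purchase_count_before_march": {"last_data_time": datetime(2024, 2, 29), "uses_target_base": False},
    "total_spend_full_year":       {"last_data_time": datetime(2024, 12, 31), "uses_target_base": False},
    "target_value_copy":           {"last_data_time": datetime(2024, 2, 1),  "uses_target_base": True},
}

target_time = datetime(2024, 3, 1)  # when the target value occurs

# Keep only predictors that are independent of the target's base data and
# knowable at the time the target occurs.
valid = [
    name for name, meta in features.items()
    if not meta["uses_target_base"] and meta["last_data_time"] <= target_time
]
print(valid)
```

Here `total_spend_full_year` is dropped for leaking future data and `target_value_copy` for sharing the target's base data.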
14. Reusable machine learning pathways
Data preprocessing: removing null values, converting categorical variables with one-hot encoding, and normalizing the features.
Feature selection and dimensionality reduction: a Truncated SVD transformation.
Modeling: a random forest built from n decision trees.
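The three pathway stages can be sketched as a scikit-learn pipeline; this is an illustrative reconstruction, not the paper's code, and the data and component sizes are placeholders:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier

# Toy matrix standing in for the synthesized (already one-hot-encoded,
# null-free) feature table.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 2, size=200)

pathway = Pipeline([
    ("normalize", StandardScaler()),             # preprocessing
    ("svd", TruncatedSVD(n_components=10)),      # dimensionality reduction
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pathway.fit(X, y)
print(pathway.score(X, y))  # training accuracy
```

Wrapping the stages in one `Pipeline` is what makes the pathway reusable: the same object can be refit on features synthesized for a different prediction problem.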
15. BAYESIAN PARAMETER OPTIMIZATION USING GAUSSIAN COPULA PROCESSES
A Gaussian Copula Process (GCP) is used to model the relationship f between parameter choices and the performance of the whole pathway (model).
A naive grid search would have to explore a space of 6 ∗ 490 ∗ 90 ∗ 10 ∗ 450 ∗ 20 ∗ 100 = 2,381,400,000,000 (two trillion, three hundred eighty-one billion, four hundred million) possibilities.
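The idea can be sketched with a plain Gaussian Process in place of the paper's GCP (a deliberate simplification), a one-dimensional toy parameter, and a simple upper-confidence-bound acquisition rule; the objective function and all constants here are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy stand-in for "pathway performance as a function of one parameter";
# the optimizer treats it as a black box. True optimum is at x = 3.
def pathway_score(x):
    return -(x - 3.0) ** 2

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(5, 1))   # a few initial random evaluations
y = pathway_score(X).ravel()

candidates = np.linspace(0, 6, 200).reshape(-1, 1)
for _ in range(15):
    # Model f: parameter choice -> pathway performance
    gp = GaussianProcessRegressor().fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    # Evaluate next where the model is optimistic (UCB acquisition)
    x_next = candidates[np.argmax(mean + 1.5 * std)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, pathway_score(x_next[0]))

best = X[np.argmax(y), 0]
print(best)
```

The point of the model-based search is visible even in this toy: 20 evaluations locate a good parameter region that a grid of comparable resolution would need far more evaluations to cover.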
16. Feature Engineering Result
KDD cup 2014 - Project Excitement:
IJCAI - Repeat Buyer Prediction:
KDD cup 2015 - Student Dropout: