Deep Feature Synthesis
Data Science Machine - Towards Automating Data Science Endeavors
The Data Scientist's Work
What is Feature Engineering?!
 Feature Generation or Feature Construction
 Feature Selection
 Dimension Reduction
Then What?!
Overcoming Feature Engineering:
Feature Generation
 Why not use Deep Learning?!
• Deep learning has brought significant automation to feature engineering for data types such as images, text, and signals.
• Feature engineering for relational and human-behavioral data remains iterative, human-intuition driven, and challenging, and is thus time consuming.
 Just a relational Database ;)
data is structured and relational
data captures aspects of human interactions
Overcoming Feature Engineering:
Any Restriction?!
Overcoming Feature Engineering:
Feature Generation - Example
“how often does this customer make a purchase?”
“how long has it been since this customer’s last purchase?”
“how much does the total order price vary for this customer?”
“does this customer typically buy luxurious or economical products?”
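As a concrete illustration, the sketch below computes all four of these features with pandas over a small hypothetical orders table; the column names, data, and reference date are invented for the example.

```python
import pandas as pd

# Hypothetical orders table: one row per purchase.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_time": pd.to_datetime(
        ["2014-01-05", "2014-02-10", "2014-03-01", "2014-01-20", "2014-04-02"]),
    "total_price": [20.0, 35.5, 18.0, 120.0, 99.0],
})

now = pd.Timestamp("2014-05-01")  # invented reference date
per_customer = orders.groupby("customer_id").agg(
    purchase_count=("order_time", "count"),  # how often does this customer buy?
    last_purchase=("order_time", "max"),
    price_std=("total_price", "std"),        # how much does the order price vary?
    mean_price=("total_price", "mean"),      # proxy for luxurious vs. economical
)
# how long since this customer's last purchase?
per_customer["days_since_last"] = (now - per_customer["last_purchase"]).dt.days
```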
Deep Feature Synthesis
Deep Feature Synthesis
Entity Features - Efeat
 An entity feature converts one feature value into another type of value:
• a categorical string is mapped to a pre-decided unique numeric value, or a numeric value is rounded
• a timestamp is split into 4 distinct features: weekday (1-7), day of the month (1-30/31), month of the year (1-12), and hour of the day (1-24)
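A minimal pandas sketch of these efeat transforms, over a hypothetical table with one categorical column and one timestamp column:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["books", "toys", "books"],  # hypothetical categorical column
    "ts": pd.to_datetime(
        ["2014-03-03 14:05", "2014-03-04 09:30", "2014-12-25 23:59"]),
})

# Categorical string -> a pre-decided unique numeric value
df["category_id"] = df["category"].astype("category").cat.codes

# Timestamp -> 4 distinct features
df["weekday"] = df["ts"].dt.dayofweek + 1  # 1-7
df["day"]     = df["ts"].dt.day            # 1-30/31
df["month"]   = df["ts"].dt.month          # 1-12
df["hour"]    = df["ts"].dt.hour + 1       # 1-24
```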
Deep Feature Synthesis
Direct Features - Dfeat
 Direct features are applied over the forward relationships.
 Features of a related instance i ∈ E^K are directly transferred as features for m ∈ E^L.
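As an illustration (not the DSM implementation itself), a dfeat over a forward relationship amounts to a join: here a hypothetical Orders table (E^L) points forward at Customers (E^K), so each order inherits its customer's features.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "signup_year": [2012, 2014]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 1, 2]})

# Each order (m in E^L) directly receives the features of its customer (i in E^K).
orders_with_dfeat = orders.merge(customers, on="customer_id", how="left")
```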
Deep Feature Synthesis
Relational Features - Rfeat
 Relational features are applied over the backward relationships: all instances of a related entity that point to an instance m are aggregated into a single feature for m (e.g., MIN, MAX, AVG, COUNT).
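A corresponding sketch of rfeat as a groupby-and-aggregate, again over hypothetical tables: every order that points back at a customer is collapsed into per-customer features.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "total_price": [20.0, 35.5, 120.0],
})

# Aggregate all orders pointing at each customer (backward relationship).
rfeat = orders.groupby("customer_id")["total_price"].agg(
    ["min", "max", "mean", "count", "std"])
```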
Deep Feature Synthesis
- Example -
PREDICTIVE MACHINE LEARNING PATHWAY
 Choosing the target value (feature) & predictors
 If predictors are computed from the same base data as the target value, or if they rely on data that did not yet exist at the time the target value occurred, they are filtered out as invalid.
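One way this filter could look, assuming each candidate feature carries metadata about its base columns and the latest timestamp of the data it was computed from (both fields are hypothetical):

```python
def filter_valid_predictors(features, target_time, target_base_columns):
    """Drop features that share base data with the target, or that rely on
    data that did not yet exist when the target value occurred."""
    valid = []
    for f in features:
        shares_base = bool(set(f["base_columns"]) & set(target_base_columns))
        uses_future_data = f["latest_data_time"] > target_time
        if not (shares_base or uses_future_data):
            valid.append(f)
    return valid
```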
Reusable machine learning pathways
 Data preprocessing: removing null values, converting categorical variables with one-hot encoding, and normalizing the features.
 Feature selection and dimensionality reduction: a Truncated SVD transformation.
 Modeling: a random forest built from n decision trees.
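These three steps map naturally onto a scikit-learn pipeline; the sketch below is an illustrative rendering, not the DSM's actual code, and the column names and hyperparameter values are placeholders.

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier

numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("onehot", OneHotEncoder(handle_unknown="ignore"))])

pathway = Pipeline([
    ("preprocess", ColumnTransformer([
        ("num", numeric, ["num_col_a", "num_col_b"]),  # placeholder columns
        ("cat", categorical, ["cat_col"]),
    ])),
    ("svd", TruncatedSVD(n_components=50)),               # dimensionality reduction
    ("model", RandomForestClassifier(n_estimators=200)),  # n decision trees
])
# pathway.fit(X_train, y_train); pathway.predict(X_test)
```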
BAYESIAN PARAMETER OPTIMIZATION USING
GAUSSIAN COPULA PROCESSES
 A GCP is used to model the relationship f between parameter choices and the performance of the whole pathway (model).
 A naive grid search would have to explore a space of 6 × 490 × 90 × 10 × 450 × 20 × 100 = 2,381,400,000,000 (two trillion, three hundred eighty-one billion, four hundred million) possibilities.
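The paper models f with a Gaussian Copula Process; as a simplified stand-in, the sketch below runs the same optimize-by-modeling loop with an ordinary Gaussian Process and an upper-confidence-bound acquisition over a tiny invented parameter space, instead of enumerating the full grid.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def evaluate_pathway(params):
    # Stand-in for training the whole pathway and returning its score.
    return -float(np.sum((params - 0.3) ** 2))  # toy objective

rng = np.random.RandomState(0)
candidates = rng.rand(500, 3)      # invented candidate parameter vectors
X = candidates[:5]                 # a few initial evaluations
y = [evaluate_pathway(p) for p in X]

for _ in range(20):
    # Model f: parameters -> pathway performance, then pick the most
    # promising untried point by upper confidence bound (UCB).
    gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = candidates[np.argmax(mu + 1.96 * sigma)]
    X = np.vstack([X, best])
    y.append(evaluate_pathway(best))

best_params = X[int(np.argmax(y))]  # best parameters found so far
```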
Feature Engineering Results
 KDD Cup 2014 - Project Excitement
 IJCAI - Repeat Buyer Prediction
 KDD Cup 2015 - Student Dropout
KDD Cup 2014 - Project Excitement
Entities:
 Projects
 Teacher
 Donors
 Outcomes
 Essays
 Resources
KDD Cup 2014 - Project Excitement
Entities Relation:
KDD Cup 2014 - Project Excitement
Entities Features:
KDD Cup 2014 - Project Excitement
Result: 70% of teams scored worse
Submission score: 86.5% of the best submission's score
IJCAI - Repeat Buyer Prediction
Entities:
 Behavior
 Action Type
 Brand
 Category
 Merchant
 Item
IJCAI - Repeat Buyer Prediction
Entities Relation:
IJCAI - Repeat Buyer Prediction
Entities Features:
IJCAI - Repeat Buyer Prediction
Result: 32.3% of teams scored worse
Submission score: 93.7% of the best submission's score
KDD Cup 2015 - Student Dropout
Entities:
 Enrollment Date
 Course Object
 Course Students
 Event Type
 Event Date
KDD Cup 2015 - Student Dropout
Entities Relation:
KDD Cup 2015 - Student Dropout
Entities Features:
KDD Cup 2015 - Student Dropout
Result: 85.7% of teams scored worse
Submission score: 95.2% of the best submission's score
Thank You
Resources:
http://www.jmaxkanter.com/static/papers/DSAA_DSM_2015.pdf
