Deep Feature Synthesis (DFS) is an approach to automating feature engineering for relational and human behavioral data. For each entity it generates features from three sources: the entity's own values (entity features), values transferred directly from related entities (direct features), and aggregates computed over related entities (relational features). This overcomes the iterative, time-consuming nature of traditional feature engineering. The approach is demonstrated on three predictive tasks, where the automatically generated features achieve substantially better results than other teams.
5. Overcoming Feature Engineering:
Feature Generation
Why not use Deep Learning?!
• Deep learning has brought significant automation to feature engineering for data types such as images, text, and signals.
• For relational and human behavioral data, however, feature engineering remains iterative, human-intuition driven, and challenging, and therefore time consuming.
6. Just a relational Database ;)
data is structured and relational
data captures aspects of human interactions
Overcoming Feature Engineering:
Any Restriction?!
7. Overcoming Feature Engineering:
Feature Generation- Example
“how often does this customer make a purchase?”
“how long has it been since this customer’s last purchase?”
“how much does the total order price vary for this customer?”
“does this customer typically buy luxurious or economical products?”
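The customer-level questions above can be sketched as simple aggregations over a hypothetical orders table (the table and its column names are illustrative, not from the paper):

```python
import pandas as pd

# Hypothetical orders table: one row per purchase by a customer
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_time": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01",
        "2024-01-20", "2024-03-15",
    ]),
    "total_price": [20.0, 250.0, 35.0, 15.0, 18.0],
})

now = pd.Timestamp("2024-04-01")
per_customer = orders.groupby("customer_id").agg(
    purchase_count=("total_price", "count"),                      # how often?
    days_since_last=("order_time", lambda t: (now - t.max()).days),  # how long since last purchase?
    price_std=("total_price", "std"),                             # how much does the price vary?
    mean_price=("total_price", "mean"),                           # luxurious vs. economical proxy
)
print(per_customer)
```

Each of these is a relational feature in DFS terms: a value of the Customer entity computed by aggregating over its related Order instances.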
9. Deep Feature Synthesis
Entity Features- Efeat
feature => a value of another type; entity features derive new values from an entity's existing values:
• a categorical string is mapped to a pre-decided unique numeric value, or a numerical value is rounded
• a timestamp is expanded into 4 distinct features: weekday (1-7), day of the month (1-30/31), month of the year (1-12), and hour of the day (1-24)
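The timestamp conversion above can be sketched with Python's standard library (the function name is illustrative; ranges follow the slide, so the 0-23 hour is shifted to 1-24):

```python
from datetime import datetime

def timestamp_efeat(ts: datetime) -> dict:
    """Expand a timestamp into the 4 entity features listed above."""
    return {
        "weekday": ts.isoweekday(),  # 1 (Monday) .. 7 (Sunday)
        "day_of_month": ts.day,      # 1 .. 30/31
        "month": ts.month,           # 1 .. 12
        "hour": ts.hour + 1,         # shift 0-23 to the slide's 1-24 range
    }

print(timestamp_efeat(datetime(2014, 7, 15, 9, 30)))
```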
10. Deep Feature Synthesis
Direct Features- Dfeat
Direct features are applied over forward relationships: features of the related instance i ∈ E_K are transferred directly as features of the instance m ∈ E_L.
(diagram: E_L → E_K, the forward relationship from E_L to E_K)
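A minimal sketch of this transfer with pandas, assuming an illustrative Orders → Customers forward relationship (each order points to exactly one customer):

```python
import pandas as pd

# E_K: the customer entity, with features already computed on it
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "age": [34, 51],
    "region": ["north", "south"],
})
# E_L: the order entity; each order points forward to one customer
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
})

# Dfeat: transfer the related customer's features directly onto each order
orders_with_dfeat = orders.merge(customers, on="customer_id", how="left")
print(orders_with_dfeat)
```

Because the relationship is forward (many orders to one customer), the join is unambiguous and no aggregation is needed, unlike relational features.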
13. PREDICTIVE MACHINE LEARNING PATHWAY
Choosing the target value (feature) & predictors
If a predictor is computed from the same base data as the target value, or if it relies on data that does not yet exist at the time the target value occurs, it is filtered out as invalid.
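The validity filter above can be sketched as follows, assuming hypothetical per-feature metadata (the feature names and metadata fields are illustrative, not from the paper):

```python
from datetime import datetime

# Hypothetical metadata for each candidate predictor: the latest timestamp of
# the base data it was computed from, and whether it shares base data with
# the target value.
features = {
    "purchase_count_before_march": {"last_data_time": datetime(2024, 2, 29), "uses_target_base": False},
    "total_spend_full_year":       {"last_data_time": datetime(2024, 12, 31), "uses_target_base": False},
    "target_value_copy":           {"last_data_time": datetime(2024, 2, 1),  "uses_target_base": True},
}

target_time = datetime(2024, 3, 1)  # when the target value occurs

# Keep only predictors that are independent of the target's base data and
# knowable at the time the target occurs.
valid = [
    name for name, meta in features.items()
    if not meta["uses_target_base"] and meta["last_data_time"] <= target_time
]
print(valid)
```

Here `total_spend_full_year` is dropped for leaking future data and `target_value_copy` for sharing the target's base data.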
14. Reusable machine learning pathways
Data preprocessing: removing null values, converting categorical variables with one-hot encoding, and normalizing the features.
Feature selection and dimensionality reduction: a Truncated SVD transformation.
Modeling: a random forest built from n decision trees.
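The three pathway stages can be sketched as a scikit-learn pipeline; this is an illustrative reconstruction, not the paper's code, and the data and component sizes are placeholders:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier

# Toy matrix standing in for the synthesized (already one-hot-encoded,
# null-free) feature table.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 2, size=200)

pathway = Pipeline([
    ("normalize", StandardScaler()),             # preprocessing
    ("svd", TruncatedSVD(n_components=10)),      # dimensionality reduction
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pathway.fit(X, y)
print(pathway.score(X, y))  # training accuracy
```

Wrapping the stages in one `Pipeline` is what makes the pathway reusable: the same object can be refit on features synthesized for a different prediction problem.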
15. BAYESIAN PARAMETER OPTIMIZATION USING GAUSSIAN COPULA PROCESSES
A Gaussian Copula Process (GCP) is used to model the relationship f between parameter choices and the performance of the whole pathway (model).
A naive grid search would have to explore a space of 6 ∗ 490 ∗ 90 ∗ 10 ∗ 450 ∗ 20 ∗ 100 = 2,381,400,000,000 (two trillion, three hundred eighty-one billion, four hundred million) possibilities.
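The idea can be sketched with a plain Gaussian Process in place of the paper's GCP (a deliberate simplification), a one-dimensional toy parameter, and a simple upper-confidence-bound acquisition rule; the objective function and all constants here are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy stand-in for "pathway performance as a function of one parameter";
# the optimizer treats it as a black box. True optimum is at x = 3.
def pathway_score(x):
    return -(x - 3.0) ** 2

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(5, 1))   # a few initial random evaluations
y = pathway_score(X).ravel()

candidates = np.linspace(0, 6, 200).reshape(-1, 1)
for _ in range(15):
    # Model f: parameter choice -> pathway performance
    gp = GaussianProcessRegressor().fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    # Evaluate next where the model is optimistic (UCB acquisition)
    x_next = candidates[np.argmax(mean + 1.5 * std)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, pathway_score(x_next[0]))

best = X[np.argmax(y), 0]
print(best)
```

The point of the model-based search is visible even in this toy: 20 evaluations locate a good parameter region that a grid of comparable resolution would need far more evaluations to cover.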
16. Feature Engineering Result
KDD cup 2014 - Project Excitement:
IJCAI - Repeat Buyer Prediction:
KDD cup 2015 - Student Dropout: