SlideShare a Scribd company logo
1 of 20
Download to read offline
Deep Feature
Synthesis
Martin Eastwood
FEATURE ENGINEERING IS:
the act of transforming your data
into a format that better represents
the underling problem
- Peter Norvig, Director of Research, Google
More data beats clever algorithms,
but better data beats more data.
Deep Feature Synthesis…
Generate new features based on relationships
within the data
Applies mathematical functions across feature
space depending on data types
Can stack functions to create ‘deep’ features
Nominal Data…
Labels without any quantitative value
• Gender: male / female
• Customer type: active / churned
• Blood type : A, O, AB
Cannot perform quantitative operations
• Frequency counts
• Mode
Ordinal Data…
Nominal data with a natural order
• Exam Grades: A, B, C, D, E, F
• Likert Scale: 1-10*
• Customer ratings: 1-5 Stars*
Inherits all the properties of nominal data,
plus, we have ordering
• Medians
• Quantiles
*It may look numeric but it’s not
Interval Data…
•Numeric data where the distance between
values is meaningful (but no ‘true’ zero)
• Temperature (Celsius, Fahrenheit)
• Time on a clock
•Quantitative data, meaning we have many
more options
• Add / Subtract
• Mean
• Standard Deviation
Ratio Data…
•Numeric data with a ‘true’ zero
• Height
• Number of orders
• Revenue
•Quantitative data, meaning we have many
more options
• Multiply / divide
• Ratio
An Example Dataset…
Customers Orders
customer_id gender age customer_id order_id date
1 m 28 1 1 01/05/2018
2 f 45 4 2 12/05/2018
3 … … … … …
4 … … … … …
Products Items Ordered
product_id price colour product_id order_id quantity
1 100.00 red 1 1 1
2 49.99 white 5 1 12
3 … … … … …
4 … … … … …
Feature Abstraction..
•Entity Features (EFEAT)
• Function applied element-wise to existing features, e.g. convert
date to day of week, normalize feature’s scale to 0-1
•Relational Features (RFEAT)
• Function applied to group of values via backward relationships,
e.g. Min, Max, AVG, Count
•Direct Features (DFEAT)
• Function applied to group of values via forward relationships
Example Features...
Customers RFEAT Orders RFEAT EFEAT
customer_id gender age COUNT(order_id) customer_id order_id date AVG(product price) Month of year
1 m 28 2 1 1 01/05/2018 2 5
2 f 45 5 4 2 12/05/2018 5 5
3 … … … … … … … …
4 … … … … … … … …
Products Items Ordered DFEAT
product_id price colour product_id order_id quantity product_price
1 100 red 1 1 1 100
2 50 white 5 1 5 25
3 … … … … … …
4 … … … … … …
Stack Derived Features…
EFEATRFEAT DFEAT
Handling the Increased Dimensionality…
•Process creates a lot of features
• Slows model training
• More expensive hardware required
• Increased risk of overfitting
• Reduced model performance, e.g. for clustering
Dimension Reduction…
•Authors use SVD to reduce dimensionality
• Create new features called components, which contain
linear combinations of original features
• Compresses data into a smaller feature space
• Select top n% features based on feature importance
Machine Learning Pipeline…
Cluster
(optional)
Train /
Tune
Predict Evaluate
Tuning Hyper Parameters…
6 * 490 * 90 * 10 * 450 * 20 * 100 = two trillion three hundred eighty-
one billion four hundred million combinations
Parameter Range
Clusters [1-6]
SVD Components [10-500]
% Components Selected [10-100]
Oversampling Ratio [1-10]
Trees in Random Forest [50-500]
Decision Tree Depth [1-20]
% Features in Tree [1-100]
Using a Model to Tune a Model…
•Tune parameters using Gaussian Copula
• Sample hyperparameters randomly
• Assess model using cross-validation
• Model non-linear functions between parameters using
Gaussian copula
• Predict neighborhood of parameters to sample from next
• Repeat…
Conclusions…
•DFS automatically synthesizes new features
based on relationships in the data
•Use SVD to control the size of the feature
space and keep it manageable
•Optimize parameter tuning by modeling
parameter space
Questions
Paper available at https://bit.ly/2s2krRG

More Related Content

What's hot

Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AINing Jiang
 
Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIWanjin Yu
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Aprendizado de Máquina
Aprendizado de MáquinaAprendizado de Máquina
Aprendizado de Máquinabutest
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine LearningYuriy Guts
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowDatabricks
 
Feature engineering mean encodings
Feature engineering   mean encodingsFeature engineering   mean encodings
Feature engineering mean encodingsChode Amarnath
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningShubhmay Potdar
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonMicrosoft
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learningBabu Priyavrat
 
Multiclass classification of imbalanced data
Multiclass classification of imbalanced dataMulticlass classification of imbalanced data
Multiclass classification of imbalanced dataSaurabhWani6
 
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...GEO Analytics Canada
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning pyingkodi maran
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 

What's hot (20)

Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
 
Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks III
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Aprendizado de Máquina
Aprendizado de MáquinaAprendizado de Máquina
Aprendizado de Máquina
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
 
Feature engineering mean encodings
Feature engineering   mean encodingsFeature engineering   mean encodings
Feature engineering mean encodings
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
 
Multiclass classification of imbalanced data
Multiclass classification of imbalanced dataMulticlass classification of imbalanced data
Multiclass classification of imbalanced data
 
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 

Similar to Deep Feature Synthesis

Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learningIvo Andreev
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at PipedriveAndré Karpištšenko
 
Machine Learning with Azure
Machine Learning with AzureMachine Learning with Azure
Machine Learning with AzureBarbara Fusinska
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Enabling real interactive BI on Hadoop
Enabling real interactive BI on HadoopEnabling real interactive BI on Hadoop
Enabling real interactive BI on HadoopDataWorks Summit
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à ZAlexia Audevart
 
Common Data Model - A Business Database!
Common Data Model - A Business Database!Common Data Model - A Business Database!
Common Data Model - A Business Database!Pedro Azevedo
 
Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Venturesmicrosoftventures
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...PAPIs.io
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Lucidworks
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine LearningMostafa
 
Deep feature synthesis
Deep feature synthesisDeep feature synthesis
Deep feature synthesisDurra Sahtout
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Topic5 - IT Implementation & Challenges.pptx
Topic5 - IT Implementation & Challenges.pptxTopic5 - IT Implementation & Challenges.pptx
Topic5 - IT Implementation & Challenges.pptxCallplanetsDeveloper
 
Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Eduardo Castro
 

Similar to Deep Feature Synthesis (20)

Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at Pipedrive
 
Machine Learning with Azure
Machine Learning with AzureMachine Learning with Azure
Machine Learning with Azure
 
Preprocessing_new.ppt
Preprocessing_new.pptPreprocessing_new.ppt
Preprocessing_new.ppt
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Enabling real interactive BI on Hadoop
Enabling real interactive BI on HadoopEnabling real interactive BI on Hadoop
Enabling real interactive BI on Hadoop
 
Data preprocess
Data preprocessData preprocess
Data preprocess
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
 
Common Data Model - A Business Database!
Common Data Model - A Business Database!Common Data Model - A Business Database!
Common Data Model - A Business Database!
 
kdd2015
kdd2015kdd2015
kdd2015
 
Machine learning
Machine learningMachine learning
Machine learning
 
Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Ventures
 
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
A business level introduction to Artificial Intelligence - Louis Dorard @ PAP...
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
 
5.Developing IT Solution.pptx
5.Developing IT Solution.pptx5.Developing IT Solution.pptx
5.Developing IT Solution.pptx
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Deep feature synthesis
Deep feature synthesisDeep feature synthesis
Deep feature synthesis
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Topic5 - IT Implementation & Challenges.pptx
Topic5 - IT Implementation & Challenges.pptxTopic5 - IT Implementation & Challenges.pptx
Topic5 - IT Implementation & Challenges.pptx
 
Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Deep Feature Synthesis

  • 2.
  • 3. FEATURE ENGINEERING IS: the act of transforming your data into a format that better represents the underling problem
  • 4. - Peter Norvig, Director of Research, Google More data beats clever algorithms, but better data beats more data.
  • 5. Deep Feature Synthesis… Generate new features based on relationships within the data Applies mathematical functions across feature space depending on data types Can stack functions to create ‘deep’ features
  • 6. Nominal Data… Labels without any quantitative value • Gender: male / female • Customer type: active / churned • Blood type : A, O, AB Cannot perform quantitative operations • Frequency counts • Mode
  • 7. Ordinal Data… Nominal data with a natural order • Exam Grades: A, B, C, D, E, F • Likert Scale: 1-10* • Customer ratings: 1-5 Stars* Inherits all the properties of nominal data, plus, we have ordering • Medians • Quantiles *It may look numeric but it’s not
  • 8. Interval Data… •Numeric data where the distance between values is meaningful (but no ‘true’ zero) • Temperature (Celsius, Fahrenheit) • Time on a clock •Quantitative data, meaning we have many more options • Add / Subtract • Mean • Standard Deviation
  • 9. Ratio Data… •Numeric data with a ‘true’ zero • Height • Number of orders • Revenue •Quantitative data, meaning we have many more options • Multiply / divide • Ratio
  • 10. An Example Dataset… Customers Orders customer_id gender age customer_id order_id date 1 m 28 1 1 01/05/2018 2 f 45 4 2 12/05/2018 3 … … … … … 4 … … … … … Products Items Ordered product_id price colour product_id order_id quantity 1 100.00 red 1 1 1 2 49.99 white 5 1 12 3 … … … … … 4 … … … … …
  • 11. Feature Abstraction.. •Entity Features (EFEAT) • Function applied element-wise to existing features, e.g. convert date to day of week, normalize feature’s scale to 0-1 •Relational Features (RFEAT) • Function applied to group of values via backward relationships, e.g. Min, Max, AVG, Count •Direct Features (DFEAT) • Function applied to group of values via forward relationships
  • 12. Example Features... Customers RFEAT Orders RFEAT EFEAT customer_id gender age COUNT(order_id) customer_id order_id date AVG(product price) Month of year 1 m 28 2 1 1 01/05/2018 2 5 2 f 45 5 4 2 12/05/2018 5 5 3 … … … … … … … … 4 … … … … … … … … Products Items Ordered DFEAT product_id price colour product_id order_id quantity product_price 1 100 red 1 1 1 100 2 50 white 5 1 5 25 3 … … … … … … 4 … … … … … …
  • 14. Handling the Increased Dimensionality… •Process creates a lot of features • Slows model training • More expensive hardware required • Increased risk of overfitting • Reduced model performance, e.g. for clustering
  • 15. Dimension Reduction… •Authors use SVD to reduce dimensionality • Create new features called components, which contain linear combinations of original features • Compresses data into a smaller feature space • Select top n% features based on feature importance
  • 17. Tuning Hyper Parameters… 6 * 490 * 90 * 10 * 450 * 20 * 100 = two trillion three hundred eighty- one billion four hundred million combinations Parameter Range Clusters [1-6] SVD Components [10-500] % Components Selected [10-100] Oversampling Ratio [1-10] Trees in Random Forest [50-500] Decision Tree Depth [1-20] % Features in Tree [1-100]
  • 18. Using a Model to Tune a Model… •Tune parameters using Gaussian Copula • Sample hyperparameters randomly • Assess model using cross-validation • Model non-linear functions between parameters using Gaussian copula • Predict neighborhood of parameters to sample from next • Repeat…
  • 19. Conclusions… •DFS automatically synthesizes new features based on relationships in the data •Use SVD to control the size of the feature space and keep it manageable •Optimize parameter tuning by modeling parameter space
  • 20. Questions Paper available at https://bit.ly/2s2krRG