SlideShare a Scribd company logo
1 of 22
Download to read offline
FeatureByte
Feature Ideation
9th May 2023
Singapore
FeatureByte
Feature Ideation
2
Colin Priest, Chief Evangelist
Data is a finite resource, and you want to get the most out of it. So
feature engineering becomes vital for machine learning success. Yet
we often lose important signals in our data when our data extracts
are limited to the same old boring COUNT and SUM aggregations.
We propose a new approach to feature engineering ideation, based
upon a structure using data semantics and signal types, yet with the
freedom to apply domain knowledge.
FeatureByte
Agenda
3
● Why is Feature Engineering Important?
● Signal Types
● Entities
FeatureByte
Why is Feature
Engineering Important?
FeatureByte 5
Why is Feature Engineering Important?
Clean Data
Feature engineering tells an
algorithm how to handle
data.
Improved Model
Performance
Better features lead to more
accurate models, which
leads to better predictive
power and ultimately, better
decisions.
Explainability
Feature engineering is a
creative process that
requires a deep
understanding of data to
create signals that
intuitively link to real world
characteristics and actions.
FeatureByte
Signal Types
FeatureByte 7
Clumpiness Signals
Do events occur randomly across time,
or are there long gaps between intense
bursts of activity?
Metrics
● Coefficient of variation of
inter-event times
● Entropy of inter-event times
Examples
● Which customers are binge-watching TV
shows?
● Do equipment failures occur in cascades?
● After a long break without purchasing, does
a customer return to a store a few times
over several days to purchase items?
FeatureByte 8
Inventory Signals
What are the counts or amounts of
events or available resources,
cross-categorized by an item label?
Metrics
● Counts of item types
● Sum of values of item types
● Max values of item types
● Latest attribute of item types
Examples
● What are the counts of each item type in a
shopping basket?
● What is the total value of item type for each
customer’s shipping orders over 30 days?
● What is the maximum weight of each type of
item in a shipping order?
FeatureByte 9
Similarity Signals
How similar is one entity to a group of
others?
Metrics
● Ratio of one entity's numeric
value to the average or
maximum of values over a
group of entities.
● Cosine similarity of two
entities’ cross-aggregations
Examples
● How similar is one customer's purchases to
customers of the same age and gender?
● What is the ratio of a network antenna's
maximum temperature to that of all other
antennas that are the same model, or
located in the same geography?
FeatureByte 10
Stability Signals
Are an entity’s recent events similar to
the past for the same entity?
Metrics
● Ratio of latest numeric value to
the average or max of values
over a historical time window.
● Cosine similarity of two
cross-aggregations from
different time periods
Examples
● Are the contents of the latest shopping
basket similar to past purchases?
● Is the count or sum of numeric event data
over the most recent period similar to the
past?
● Is a recent data value anomalous?
FeatureByte 11
Change Signals
Have usually static attributes changed?
By how much?
Metrics
● Did an attribute change?
● How many times did an
attribute change?
● By how much did it change?
Examples
● Did a customer change address over the past
two years? How far?
● Has a password been reset in the past 48
hours?
● Was network equipment replaced with a
different make or model?
FeatureByte 12
Diversity Signals
How variable are the data values?
Metrics
● Coefficient of variation of
amounts
● Entropy of labels
Examples
● How variable are the monthly invoice
amounts over the past 6 months?
● How stable is a patient's blood pressure?
● Does a customer always purchase similar
products?
FeatureByte 13
Location Signals
The static or dynamic location of an
event or entity.
Metrics
● The latitude and longitude of an
event.
● Distance between locations, or
distance moved within a time
period.
Examples
● What is the distance between the shop and
the customer address?
● Is the shipping address in a remote location?
What is the population density in that state?
● How far has a vehicle moved in the past
hour?
FeatureByte 14
Regularity Signals
Any seasonality in the timing of events.
Metrics
● Entropy of day of the week of
events.
● Cross-aggregation of events by
hour or month
Examples
● Does a customer always shop on the same
days of the week?
● Do equipment failures tend to occur at
different times of the day or times of year?
FeatureByte 15
Recency Signals
Attributes of the latest event to occur.
Metrics
● Time since last event.
● Label of last event.
● Magnitude of last event.
Examples
● How much did a customer spend on their
last purchase?
● How long since the last equipment failure?
● Was a patient’s last visit for an emergency?
FeatureByte 16
Frequency Signals
How often do events occur?
Metrics
● Number of events occurring
over a time window.
Examples
● How many hospital admissions has the
patient had in the past 12 months?
● How many international phone calls has a
customer made in the past month?
FeatureByte 17
Monetary Signals
What are the monetary amounts of
events over a time window?
Metrics
● Total cost of events occurring
over a time window.
● Average cost of events
occurring over a time window
Examples
● How much did a customer spend over the
past 12 months?
● What was the average discount applied to a
customer’s purchases over the past month?
FeatureByte
Entities
FeatureByte 19
Entities
Parent Entities
● A group that an entity
belongs to e.g. gender, or
country.
● Can be joined to the unit of
analysis as a robust signal
source.
Child Entities
● More granular than your
unit of analysis.
● Needs to be aggregated.
Use a Variety of Entities
● An entity is a real-world object
or concept that is represented
by fields in the source tables.
● All features must relate to an
entity (or entities) as their
primary unit of analysis.
● Table joins add valuable
signals.
FeatureByte
Conclusion
FeatureByte 21
Conclusion
Use signal types and entities for
inspiration and information
Inspiration
● Lists of signal types and entities
can prompt and remind you to
engineer missing features
Information
● Features with different signal types tend to
be less correlated
● Business SMEs can intuitively understand
signal types
FeatureByte
Thank You!

More Related Content

Similar to Feature Ideation

TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14TechSoup
 
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsThe New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsJohn Blossom
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Rittman Analytics
 
Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015Richard Veryard
 
Financial analysis for product managers
Financial analysis for product managersFinancial analysis for product managers
Financial analysis for product managersMike Claiborne
 
Data Wrangling Questionnaire
Data Wrangling QuestionnaireData Wrangling Questionnaire
Data Wrangling QuestionnaireMimi Brown
 
Data warehouse,data mining & Big Data
Data warehouse,data mining & Big DataData warehouse,data mining & Big Data
Data warehouse,data mining & Big DataRavinder Kamboj
 
Government and Software as-a-service
Government and Software as-a-serviceGovernment and Software as-a-service
Government and Software as-a-serviceProudCity
 
Anatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarAnatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarNaresh Jain
 
A multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner EriksenA multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner EriksenAlan Quayle
 
Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018Utah Digital Marketing Collective
 
Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)Luciano Pesci, PhD
 
22 non statistical questions for a statistician v2
22 non statistical questions for a statistician v222 non statistical questions for a statistician v2
22 non statistical questions for a statistician v2Derick Jose
 
How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...fluturadsa
 
Multiquip Goes AI with Involve
Multiquip Goes AI with InvolveMultiquip Goes AI with Involve
Multiquip Goes AI with InvolveRazaSiddiqui9
 
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese,  Big Data as core engine to support the Wind Tre datadriven journeyM. Savarese,  Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journeyIstituto nazionale di statistica
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolLouis Cialdella
 
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...Search Engine Journal
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 

Similar to Feature Ideation (20)

TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
 
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsThe New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015
 
Financial analysis for product managers
Financial analysis for product managersFinancial analysis for product managers
Financial analysis for product managers
 
Data Wrangling Questionnaire
Data Wrangling QuestionnaireData Wrangling Questionnaire
Data Wrangling Questionnaire
 
Data warehouse,data mining & Big Data
Data warehouse,data mining & Big DataData warehouse,data mining & Big Data
Data warehouse,data mining & Big Data
 
Data Science and Future of Retail: Beacon analytics
Data Science and Future of Retail: Beacon analyticsData Science and Future of Retail: Beacon analytics
Data Science and Future of Retail: Beacon analytics
 
Government and Software as-a-service
Government and Software as-a-serviceGovernment and Software as-a-service
Government and Software as-a-service
 
Anatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarAnatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur Datar
 
A multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner EriksenA multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner Eriksen
 
Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018
 
Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)
 
22 non statistical questions for a statistician v2
22 non statistical questions for a statistician v222 non statistical questions for a statistician v2
22 non statistical questions for a statistician v2
 
How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...
 
Multiquip Goes AI with Involve
Multiquip Goes AI with InvolveMultiquip Goes AI with Involve
Multiquip Goes AI with Involve
 
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese,  Big Data as core engine to support the Wind Tre datadriven journeyM. Savarese,  Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journey
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product School
 
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 

Recently uploaded

Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...Amil baba
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives23050636
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"John Sobanski
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxStephen266013
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...BabaJohn3
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 

Recently uploaded (20)

Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 

Feature Ideation

  • 2. FeatureByte Feature Ideation 2 Colin Priest, Chief Evangelist Data is a finite resource, and you want to get the most out of it. So feature engineering becomes vital for machine learning success. Yet we often lose important signals in our data when our data extracts are limited to the same old boring COUNT and SUM aggregations. We propose a new approach to feature engineering ideation, based upon a structure using data semantics and signal types, yet with the freedom to apply domain knowledge.
  • 3. FeatureByte Agenda 3 ● Why is Feature Engineering Important? ● Signal Types ● Entities
  • 5. FeatureByte 5 Why is Feature Engineering Important? Clean Data Feature engineering tells an algorithm how to handle data. Improved Model Performance Better features lead to more accurate models, which leads to better predictive power and ultimately, better decisions. Explainability Feature engineering is a creative process that requires a deep understanding of data to create signals that intuitively link to real world characteristics and actions.
  • 7. FeatureByte 7 Clumpiness Signals Do events occur randomly across time, or are there long gaps between intense bursts of activity? Metrics ● Coefficient of variation of inter-event times ● Entropy of inter-event times Examples ● Which customers are binge-watching TV shows? ● Do equipment failures occur in cascades? ● After a long break without purchasing, does a customer return to a store a few times over several days to purchase items?
  • 8. FeatureByte 8 Inventory Signals What are the counts or amounts of events or available resources, cross-categorized by an item label? Metrics ● Counts of item types ● Sum of values of item types ● Max values of item types ● Latest attribute of item types Examples ● What are the counts of each item type in a shopping basket? ● What is the total value of item type for each customer’s shipping orders over 30 days? ● What is the maximum weight of each type of item in a shipping order?
  • 9. FeatureByte 9 Similarity Signals How similar is one entity to a group of others? Metrics ● Ratio of one entity's numeric value to the average or maximum of values over a group of entities. ● Cosine similarity of two entities’ cross-aggregations Examples ● How similar is one customer's purchases to customers of the same age and gender? ● What is the ratio of a network antenna's maximum temperature to that of all other antennas that are the same model, or located in the same geography?
  • 10. FeatureByte 10 Stability Signals Are an entity’s recent events similar to the past for the same entity? Metrics ● Ratio of latest numeric value to the average or max of values over a historical time window. ● Cosine similarity of two cross-aggregations from different time periods Examples ● Are the contents of the latest shopping basket similar to past purchases? ● Is the count or sum of numeric event data over the most recent period similar to the past? ● Is a recent data value anomalous?
  • 11. FeatureByte 11 Change Signals Have usually static attributes changed? By how much? Metrics ● Did an attribute change? ● How many times did an attribute change? ● By how much did it change? Examples ● Did a customer change address over the past two years? How far? ● Has a password been reset in the past 48 hours? ● Was network equipment replaced with a different make or model?
  • 12. FeatureByte 12 Diversity Signals How variable are the data values? Metrics ● Coefficient of variation of amounts ● Entropy of labels Examples ● How variable are the monthly invoice amounts over the past 6 months? ● How stable is a patient's blood pressure? ● Does a customer always purchase similar products?
  • 13. FeatureByte 13 Location Signals The static or dynamic location of an event or entity. Metrics ● The latitude and longitude of an event. ● Distance between locations, or distance moved within a time period. Examples ● What is the distance between the shop and the customer address? ● Is the shipping address in a remote location? What is the population density in that state? ● How far has a vehicle moved in the past hour?
  • 14. FeatureByte 14 Regularity Signals Any seasonality in the timing of events. Metrics ● Entropy of day of the week of events. ● Cross-aggregation of events by hour or month Examples ● Does a customer always shop on the same days of the week? ● Do equipment failures tend to occur at different times of the day or times of year?
  • 15. FeatureByte 15 Recency Signals Attributes of the latest event to occur. Metrics ● Time since last event. ● Label of last event. ● Magnitude of last event. Examples ● How much did a customer spend on their last purchase? ● How long since the last equipment failure? ● Was a patient’s last visit for an emergency?
  • 16. FeatureByte 16 Frequency Signals How often do events occur? Metrics ● Number of events occurring over a time window. Examples ● How many hospital admissions has the patient had in the past 12 months? ● How many international phone calls has a customer made in the past month?
  • 17. FeatureByte 17 Monetary Signals What are the monetary amounts of events over a time window? Metrics ● Total cost of events occurring over a time window. ● Average cost of events occurring over a time window Examples ● How much did a customer spend over the past 12 months? ● What was the average discount applied to a customer’s purchases over the past month?
  • 19. FeatureByte 19 Entities Parent Entities ● A group that an entity belongs to e.g. gender, or country. ● Can be joined to the unit of analysis as a robust signal source. Child Entities ● More granular than your unit of analysis. ● Needs to be aggregated. Use a Variety of Entities ● An entity is a real-world object or concept that is represented by fields in the source tables. ● All features must relate to an entity (or entities) as their primary unit of analysis. ● Table joins add valuable signals.
  • 21. FeatureByte 21 Conclusion Use signal types and entities for inspiration and information Inspiration ● Lists of signal types and entities can prompt and remind you to engineer missing features Information ● Features with different signal types tend to be less correlated ● Business SMEs can intuitively understand signal types