SlideShare a Scribd company logo
1 of 22
Download to read offline
FeatureByte
Feature Ideation
9th May 2023
Singapore
FeatureByte
Feature Ideation
2
Colin Priest, Chief Evangelist
Data is a finite resource, and you want to get the most out of it. So
feature engineering becomes vital for machine learning success. Yet
we often lose important signals in our data when our data extracts
are limited to the same old boring COUNT and SUM aggregations.
We propose a new approach to feature engineering ideation, based
upon a structure using data semantics and signal types, yet with the
freedom to apply domain knowledge.
FeatureByte
Agenda
3
● Why is Feature Engineering Important?
● Signal Types
● Entities
FeatureByte
Why is Feature
Engineering Important?
FeatureByte 5
Why is Feature Engineering Important?
Clean Data
Feature engineering tells an
algorithm how to handle
data.
Improved Model
Performance
Better features lead to more
accurate models, which
leads to better predictive
power and ultimately, better
decisions.
Explainability
Feature engineering is a
creative process that
requires a deep
understanding of data to
create signals that
intuitively link to real world
characteristics and actions.
FeatureByte
Signal Types
FeatureByte 7
Clumpiness Signals
Do events occur randomly across time,
or are there long gaps between intense
bursts of activity?
Metrics
● Coefficient of variation of
inter-event times
● Entropy of inter-event times
Examples
● Which customers are binge-watching TV
shows?
● Do equipment failures occur in cascades?
● After a long break without purchasing, does
a customer return to a store a few times
over several days to purchase items?
FeatureByte 8
Inventory Signals
What are the counts or amounts of
events or available resources,
cross-categorized by an item label?
Metrics
● Counts of item types
● Sum of values of item types
● Max values of item types
● Latest attribute of item types
Examples
● What are the counts of each item type in a
shopping basket?
● What is the total value of item type for each
customer’s shipping orders over 30 days?
● What is the maximum weight of each type of
item in a shipping order?
FeatureByte 9
Similarity Signals
How similar is one entity to a group of
others?
Metrics
● Ratio of one entity's numeric
value to the average or
maximum of values over a
group of entities.
● Cosine similarity of two
entities’ cross-aggregations
Examples
● How similar is one customer's purchases to
customers of the same age and gender?
● What is the ratio of a network antenna's
maximum temperature to that of all other
antennas that are the same model, or
located in the same geography?
FeatureByte 10
Stability Signals
Are an entity’s recent events similar to
the past for the same entity?
Metrics
● Ratio of latest numeric value to
the average or max of values
over a historical time window.
● Cosine similarity of two
cross-aggregations from
different time periods
Examples
● Are the contents of the latest shopping
basket similar to past purchases?
● Is the count or sum of numeric event data
over the most recent period similar to the
past?
● Is a recent data value anomalous?
FeatureByte 11
Change Signals
Have usually static attributes changed?
By how much?
Metrics
● Did an attribute change?
● How many times did an
attribute change?
● By how much did it change?
Examples
● Did a customer change address over the past
two years? How far?
● Has a password been reset in the past 48
hours?
● Was network equipment replaced with a
different make or model?
FeatureByte 12
Diversity Signals
How variable are the data values?
Metrics
● Coefficient of variation of
amounts
● Entropy of labels
Examples
● How variable are the monthly invoice
amounts over the past 6 months?
● How stable is a patient's blood pressure?
● Does a customer always purchase similar
products?
FeatureByte 13
Location Signals
The static or dynamic location of an
event or entity.
Metrics
● The latitude and longitude of an
event.
● Distance between locations, or
distance moved within a time
period.
Examples
● What is the distance between the shop and
the customer address?
● Is the shipping address in a remote location?
What is the population density in that state?
● How far has a vehicle moved in the past
hour?
FeatureByte 14
Regularity Signals
Any seasonality in the timing of events.
Metrics
● Entropy of day of the week of
events.
● Cross-aggregation of events by
hour or month
Examples
● Does a customer always shop on the same
days of the week?
● Do equipment failures tend to occur at
different times of the day or times of year?
FeatureByte 15
Recency Signals
Attributes of the latest event to occur.
Metrics
● Time since last event.
● Label of last event.
● Magnitude of last event.
Examples
● How much did a customer spend on their
last purchase?
● How long since the last equipment failure?
● Was a patient’s last visit for an emergency?
FeatureByte 16
Frequency Signals
How often do events occur?
Metrics
● Number of events occurring
over a time window.
Examples
● How many hospital admissions has the
patient had in the past 12 months?
● How many international phone calls has a
customer made in the past month?
FeatureByte 17
Monetary Signals
What are the monetary amounts of
events over a time window?
Metrics
● Total cost of events occurring
over a time window.
● Average cost of events
occurring over a time window
Examples
● How much did a customer spend over the
past 12 months?
● What was the average discount applied to a
customer’s purchases over the past month?
FeatureByte
Entities
FeatureByte 19
Entities
Parent Entities
● A group that an entity
belongs to e.g. gender, or
country.
● Can be joined to the unit of
analysis as a robust signal
source.
Child Entities
● More granular than your
unit of analysis.
● Needs to be aggregated.
Use a Variety of Entities
● An entity is a real-world object
or concept that is represented
by fields in the source tables.
● All features must relate to an
entity (or entities) as their
primary unit of analysis.
● Table joins add valuable
signals.
FeatureByte
Conclusion
FeatureByte 21
Conclusion
Use signal types and entities for
inspiration and information
Inspiration
● Lists of signal types and entities
can prompt and remind you to
engineer missing features
Information
● Features with different signal types tend to
be less correlated
● Business SMEs can intuitively understand
signal types
FeatureByte
Thank You!

More Related Content

Similar to Feature Ideation

TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14TechSoup
 
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsThe New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsJohn Blossom
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Rittman Analytics
 
Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015Richard Veryard
 
Financial analysis for product managers
Financial analysis for product managersFinancial analysis for product managers
Financial analysis for product managersMike Claiborne
 
Data Wrangling Questionnaire
Data Wrangling QuestionnaireData Wrangling Questionnaire
Data Wrangling QuestionnaireMimi Brown
 
Data warehouse,data mining & Big Data
Data warehouse,data mining & Big DataData warehouse,data mining & Big Data
Data warehouse,data mining & Big DataRavinder Kamboj
 
Government and Software as-a-service
Government and Software as-a-serviceGovernment and Software as-a-service
Government and Software as-a-serviceProudCity
 
Anatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarAnatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarNaresh Jain
 
A multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner EriksenA multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner EriksenAlan Quayle
 
Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)Luciano Pesci, PhD
 
Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018Utah Digital Marketing Collective
 
22 non statistical questions for a statistician v2
22 non statistical questions for a statistician v222 non statistical questions for a statistician v2
22 non statistical questions for a statistician v2Derick Jose
 
How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...fluturadsa
 
Multiquip Goes AI with Involve
Multiquip Goes AI with InvolveMultiquip Goes AI with Involve
Multiquip Goes AI with InvolveRazaSiddiqui9
 
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese,  Big Data as core engine to support the Wind Tre datadriven journeyM. Savarese,  Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journeyIstituto nazionale di statistica
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolLouis Cialdella
 
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...Search Engine Journal
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 

Similar to Feature Ideation (20)

TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
TechSoup Global CEO Rebecca Masisak "Big Ideas" session at NTC14
 
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and SemanticsThe New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
The New Role of Epertise: Open Science in a Web of Sensors, Senses and Semantics
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015Customer Engagement Open Group Oct 2015
Customer Engagement Open Group Oct 2015
 
Financial analysis for product managers
Financial analysis for product managersFinancial analysis for product managers
Financial analysis for product managers
 
Data Wrangling Questionnaire
Data Wrangling QuestionnaireData Wrangling Questionnaire
Data Wrangling Questionnaire
 
Data warehouse,data mining & Big Data
Data warehouse,data mining & Big DataData warehouse,data mining & Big Data
Data warehouse,data mining & Big Data
 
Data Science and Future of Retail: Beacon analytics
Data Science and Future of Retail: Beacon analyticsData Science and Future of Retail: Beacon analytics
Data Science and Future of Retail: Beacon analytics
 
Government and Software as-a-service
Government and Software as-a-serviceGovernment and Software as-a-service
Government and Software as-a-service
 
Anatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur DatarAnatomy of an eCommerce Search Engine by Mayur Datar
Anatomy of an eCommerce Search Engine by Mayur Datar
 
A multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner EriksenA multi sided marketplace platform for telco enabled products Werner Eriksen
A multi sided marketplace platform for telco enabled products Werner Eriksen
 
Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)Lifetime Value - The Only Metric That Matters (DMC September 2018)
Lifetime Value - The Only Metric That Matters (DMC September 2018)
 
Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018Lifetime Value (the only metric that matters) Utah DMC September 2018
Lifetime Value (the only metric that matters) Utah DMC September 2018
 
22 non statistical questions for a statistician v2
22 non statistical questions for a statistician v222 non statistical questions for a statistician v2
22 non statistical questions for a statistician v2
 
How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...How to bridge the gap between Statisticans and Business folks ? Flutura outli...
How to bridge the gap between Statisticans and Business folks ? Flutura outli...
 
Multiquip Goes AI with Involve
Multiquip Goes AI with InvolveMultiquip Goes AI with Involve
Multiquip Goes AI with Involve
 
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese,  Big Data as core engine to support the Wind Tre datadriven journeyM. Savarese,  Big Data as core engine to support the Wind Tre datadriven journey
M. Savarese, Big Data as core engine to support the Wind Tre datadriven journey
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product School
 
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
Unleash Your Campaign's Potential: Elevate Your Performance Marketing Game_Fi...
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 

Recently uploaded

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...boychatmate1
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 

Recently uploaded (20)

Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
Introduction to Mongo DB-open-­‐source, high-­‐performance, document-­‐orient...
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 

Feature Ideation

  • 2. FeatureByte Feature Ideation 2 Colin Priest, Chief Evangelist Data is a finite resource, and you want to get the most out of it. So feature engineering becomes vital for machine learning success. Yet we often lose important signals in our data when our data extracts are limited to the same old boring COUNT and SUM aggregations. We propose a new approach to feature engineering ideation, based upon a structure using data semantics and signal types, yet with the freedom to apply domain knowledge.
  • 3. FeatureByte Agenda 3 ● Why is Feature Engineering Important? ● Signal Types ● Entities
  • 5. FeatureByte 5 Why is Feature Engineering Important? Clean Data Feature engineering tells an algorithm how to handle data. Improved Model Performance Better features lead to more accurate models, which leads to better predictive power and ultimately, better decisions. Explainability Feature engineering is a creative process that requires a deep understanding of data to create signals that intuitively link to real world characteristics and actions.
  • 7. FeatureByte 7 Clumpiness Signals Do events occur randomly across time, or are there long gaps between intense bursts of activity? Metrics ● Coefficient of variation of inter-event times ● Entropy of inter-event times Examples ● Which customers are binge-watching TV shows? ● Do equipment failures occur in cascades? ● After a long break without purchasing, does a customer return to a store a few times over several days to purchase items?
  • 8. FeatureByte 8 Inventory Signals What are the counts or amounts of events or available resources, cross-categorized by an item label? Metrics ● Counts of item types ● Sum of values of item types ● Max values of item types ● Latest attribute of item types Examples ● What are the counts of each item type in a shopping basket? ● What is the total value of item type for each customer’s shipping orders over 30 days? ● What is the maximum weight of each type of item in a shipping order?
  • 9. FeatureByte 9 Similarity Signals How similar is one entity to a group of others? Metrics ● Ratio of one entity's numeric value to the average or maximum of values over a group of entities. ● Cosine similarity of two entities’ cross-aggregations Examples ● How similar is one customer's purchases to customers of the same age and gender? ● What is the ratio of a network antenna's maximum temperature to that of all other antennas that are the same model, or located in the same geography?
  • 10. FeatureByte 10 Stability Signals Are an entity’s recent events similar to the past for the same entity? Metrics ● Ratio of latest numeric value to the average or max of values over a historical time window. ● Cosine similarity of two cross-aggregations from different time periods Examples ● Are the contents of the latest shopping basket similar to past purchases? ● Is the count or sum of numeric event data over the most recent period similar to the past? ● Is a recent data value anomalous?
  • 11. FeatureByte 11 Change Signals Have usually static attributes changed? By how much? Metrics ● Did an attribute change? ● How many times did an attribute change? ● By how much did it change? Examples ● Did a customer change address over the past two years? How far? ● Has a password been reset in the past 48 hours? ● Was network equipment replaced with a different make or model?
  • 12. FeatureByte 12 Diversity Signals How variable are the data values? Metrics ● Coefficient of variation of amounts ● Entropy of labels Examples ● How variable are the monthly invoice amounts over the past 6 months? ● How stable is a patient's blood pressure? ● Does a customer always purchase similar products?
  • 13. FeatureByte 13 Location Signals The static or dynamic location of an event or entity. Metrics ● The latitude and longitude of an event. ● Distance between locations, or distance moved within a time period. Examples ● What is the distance between the shop and the customer address? ● Is the shipping address in a remote location? What is the population density in that state? ● How far has a vehicle moved in the past hour?
  • 14. FeatureByte 14 Regularity Signals Any seasonality in the timing of events. Metrics ● Entropy of day of the week of events. ● Cross-aggregation of events by hour or month Examples ● Does a customer always shop on the same days of the week? ● Do equipment failures tend to occur at different times of the day or times of year?
  • 15. FeatureByte 15 Recency Signals Attributes of the latest event to occur. Metrics ● Time since last event. ● Label of last event. ● Magnitude of last event. Examples ● How much did a customer spend on their last purchase? ● How long since the last equipment failure? ● Was a patient’s last visit for an emergency?
  • 16. FeatureByte 16 Frequency Signals How often do events occur? Metrics ● Number of events occurring over a time window. Examples ● How many hospital admissions has the patient had in the past 12 months? ● How many international phone calls has a customer made in the past month?
  • 17. FeatureByte 17 Monetary Signals What are the monetary amounts of events over a time window? Metrics ● Total cost of events occurring over a time window. ● Average cost of events occurring over a time window Examples ● How much did a customer spend over the past 12 months? ● What was the average discount applied to a customer’s purchases over the past month?
  • 19. FeatureByte 19 Entities Parent Entities ● A group that an entity belongs to e.g. gender, or country. ● Can be joined to the unit of analysis as a robust signal source. Child Entities ● More granular than your unit of analysis. ● Needs to be aggregated. Use a Variety of Entities ● An entity is a real-world object or concept that is represented by fields in the source tables. ● All features must relate to an entity (or entities) as their primary unit of analysis. ● Table joins add valuable signals.
  • 21. FeatureByte 21 Conclusion Use signal types and entities for inspiration and information Inspiration ● Lists of signal types and entities can prompt and remind you to engineer missing features Information ● Features with different signal types tend to be less correlated ● Business SMEs can intuitively understand signal types