FeatureByte
Feature Ideation
9th May 2023
Singapore
Colin Priest, Chief Evangelist
Data is a finite resource, and you want to get the most out of it, so
feature engineering is vital for machine learning success. Yet
we often lose important signals in our data when our data extracts
are limited to the same old boring COUNT and SUM aggregations.
We propose a new approach to feature engineering ideation, based
on a structure of data semantics and signal types, yet with the
freedom to apply domain knowledge.
Agenda
● Why is Feature Engineering Important?
● Signal Types
● Entities
Why is Feature Engineering Important?
Clean Data
Feature engineering tells an
algorithm how to handle
data.
Improved Model
Performance
Better features lead to more
accurate models with greater
predictive power and,
ultimately, better decisions.
Explainability
Feature engineering is a
creative process that
requires a deep
understanding of the data to
create signals that
intuitively link to real-world
characteristics and actions.
Signal Types
Clumpiness Signals
Do events occur randomly across time,
or are there long gaps between intense
bursts of activity?
Metrics
● Coefficient of variation of
inter-event times
● Entropy of inter-event times
Examples
● Which customers are binge-watching TV
shows?
● Do equipment failures occur in cascades?
● After a long break without purchasing, does
a customer return to a store a few times
over several days to purchase items?
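As a rough illustration of the two clumpiness metrics, the sketch below computes them for a list of event times. The function names, the sample data, and the choice to take entropy over each gap's share of total elapsed time are assumptions made for this example, not definitions taken from the slides.

```python
import math
from statistics import mean, pstdev


def inter_event_times(timestamps):
    """Gaps between consecutive event times, after sorting ascending."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else float("nan")


def gap_entropy(gaps):
    """Shannon entropy (in nats) of each gap's share of total elapsed time.

    This normalisation is one possible choice, assumed for illustration.
    """
    total = sum(gaps)
    if total == 0:
        return 0.0
    shares = [g / total for g in gaps if g > 0]
    return -sum(p * math.log(p) for p in shares)


# Hypothetical event times, in hours since the first event.
regular = [0, 10, 20, 30, 40]    # evenly spaced events
bursty = [0, 1, 2, 38, 39, 40]   # two bursts separated by a long gap

cv_regular = coefficient_of_variation(inter_event_times(regular))
cv_bursty = coefficient_of_variation(inter_event_times(bursty))
entropy_regular = gap_entropy(inter_event_times(regular))
entropy_bursty = gap_entropy(inter_event_times(bursty))
```

Under these assumptions, evenly spaced events give a coefficient of variation of zero and the highest entropy, while bursty events give a high coefficient of variation and a lower entropy.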
Inventory Signals
What are the counts or amounts of
events or available resources,
cross-categorized by an item label?
Metrics
● Counts of item types
● Sum of values of item types
● Max values of item types
● Latest attribute of item types
Examples
● What are the counts of each item type in a
shopping basket?
● What is the total value of each item type across a
customer’s shipping orders over the past 30 days?
● What is the maximum weight of each type of
item in a shipping order?
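A cross-aggregation of this kind can be sketched with plain dictionaries keyed by item type. The basket rows, field order, and function name below are hypothetical and only show the shape of the output.

```python
from collections import defaultdict

# Hypothetical line items: (item_type, value, weight_kg)
basket = [
    ("fruit", 3.0, 0.5),
    ("dairy", 2.5, 1.0),
    ("fruit", 4.0, 1.2),
]


def cross_aggregate(rows):
    """Count, sum of values, and max weight, each keyed by item type."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    max_weight = {}
    for item_type, value, weight in rows:
        counts[item_type] += 1
        sums[item_type] += value
        max_weight[item_type] = max(weight, max_weight.get(item_type, weight))
    return dict(counts), dict(sums), max_weight


counts, sums, max_weight = cross_aggregate(basket)
```

Each result is a small dictionary per entity rather than a single number, which is what allows such features to be compared across entities or across time periods.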
Similarity Signals
How similar is one entity to a group of
others?
Metrics
● Ratio of one entity's numeric
value to the average or
maximum of values over a
group of entities.
● Cosine similarity of two
entities’ cross-aggregations
Examples
● How similar are one customer's purchases to those of
customers of the same age and gender?
● What is the ratio of a network antenna's
maximum temperature to that of all other
antennas that are the same model, or
located in the same geography?
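The cosine-similarity metric can be sketched as follows, treating each cross-aggregation as a dictionary from category to amount. The category names and numbers are made up for the example, and returning 0.0 for an empty vector is an assumed convention.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention when either vector is empty
    return dot / (norm_a * norm_b)


# Hypothetical spend per category for one customer and for a peer group.
customer = {"grocery": 3, "fuel": 4}
peer_group = {"grocery": 6, "fuel": 8}
unrelated = {"travel": 5}

same_mix = cosine_similarity(customer, peer_group)
no_overlap = cosine_similarity(customer, unrelated)
```

Because cosine similarity ignores scale, a customer who spends half as much as the peer group but in the same proportions still scores 1.0.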
Stability Signals
Are an entity’s recent events similar to
its own past events?
Metrics
● Ratio of latest numeric value to
the average or max of values
over a historical time window.
● Cosine similarity of two
cross-aggregations from
different time periods
Examples
● Are the contents of the latest shopping
basket similar to past purchases?
● Is the count or sum of numeric event data
over the most recent period similar to the
past?
● Is a recent data value anomalous?
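The ratio metric can be sketched in a few lines. The weekly spend figures and the function name are hypothetical, and what counts as "anomalous" is left to the modeller.

```python
from statistics import mean


def stability_ratio(history, latest):
    """Ratio of the latest value to the average over a historical window."""
    baseline = mean(history)
    return latest / baseline if baseline else float("nan")


# Hypothetical weekly spend over a four-week historical window.
weekly_spend = [100.0, 120.0, 80.0, 100.0]

ratio_typical = stability_ratio(weekly_spend, 105.0)
ratio_spike = stability_ratio(weekly_spend, 400.0)
```

A ratio near 1 suggests the latest value is in line with the entity's own history, while a ratio far from 1 flags a possible change in behaviour.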
Change Signals
Have usually static attributes changed?
By how much?
Metrics
● Did an attribute change?
● How many times did an
attribute change?
● By how much did it change?
Examples
● Did a customer change address over the past
two years? How far?
● Has a password been reset in the past 48
hours?
● Was network equipment replaced with a
different make or model?
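The first two change metrics reduce to comparing consecutive values of an attribute. The sketch below assumes a chronologically ordered history of a single attribute, and the addresses are invented.

```python
def count_changes(values):
    """Number of times consecutive values differ in an ordered history."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


# Hypothetical address history for one customer, oldest first.
address_history = [
    "12 Oak St",
    "12 Oak St",
    "7 Elm Rd",
    "7 Elm Rd",
    "3 Pine Ave",
]

n_changes = count_changes(address_history)
changed = n_changes > 0
```

Measuring "by how much" depends on the attribute type, for example a numeric difference for amounts or a geographic distance for addresses.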
Diversity Signals
How variable are the data values?
Metrics
● Coefficient of variation of
amounts
● Entropy of labels
Examples
● How variable are the monthly invoice
amounts over the past 6 months?
● How stable is a patient's blood pressure?
● Does a customer always purchase similar
products?
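Entropy of labels can be sketched as below, using the natural logarithm. The product labels are illustrative only.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical product categories purchased by two customers.
loyal = ["coffee", "coffee", "coffee", "coffee"]
varied = ["coffee", "tea", "juice", "water"]

entropy_loyal = label_entropy(loyal)
entropy_varied = label_entropy(varied)
```

A customer who always buys the same category has zero entropy, and entropy grows as purchases spread more evenly over more categories.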
Location Signals
The static or dynamic location of an
event or entity.
Metrics
● The latitude and longitude of an
event.
● Distance between locations, or
distance moved within a time
period.
Examples
● What is the distance between the shop and
the customer address?
● Is the shipping address in a remote location?
What is the population density in that state?
● How far has a vehicle moved in the past
hour?
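One common way to turn two latitude/longitude pairs into a distance is the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km, and the slides do not prescribe a particular distance measure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```

The same function covers both a static feature (shop to customer address) and a dynamic one (sum of distances between successive vehicle positions in a time window).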
Regularity Signals
Any seasonality in the timing of events.
Metrics
● Entropy of day of the week of
events.
● Cross-aggregation of events by
hour or month
Examples
● Does a customer always shop on the same
days of the week?
● Do equipment failures tend to occur at
different times of the day or times of year?
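Entropy of the day of the week can be sketched by counting events per weekday and taking the entropy of that distribution. The dates below are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def day_of_week_entropy(dates):
    """Shannon entropy (in nats) of the weekday distribution of events."""
    counts = Counter(d.weekday() for d in dates)
    n = len(dates)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical shopping dates for two customers.
every_monday = [date(2023, 5, 1), date(2023, 5, 8), date(2023, 5, 15)]
mixed_days = [date(2023, 5, 1), date(2023, 5, 2), date(2023, 5, 3)]

entropy_regular = day_of_week_entropy(every_monday)
entropy_mixed = day_of_week_entropy(mixed_days)
```

Low entropy indicates a customer who shops on the same weekday, and the same idea applies to hour of day or month of year by swapping the grouping key.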
Recency Signals
Attributes of the latest event to occur.
Metrics
● Time since last event.
● Label of last event.
● Magnitude of last event.
Examples
● How much did a customer spend on their
last purchase?
● How long since the last equipment failure?
● Was a patient’s last visit for an emergency?
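The three recency metrics can be read off the most recent event at or before an observation time. The event tuples, field names, and the `as_of` parameter below are assumptions for this sketch.

```python
from datetime import datetime

# Hypothetical events: (timestamp, label, amount), in no particular order.
events = [
    (datetime(2023, 5, 1, 9, 0), "routine", 40.0),
    (datetime(2023, 5, 7, 18, 30), "emergency", 250.0),
    (datetime(2023, 4, 20, 14, 0), "routine", 35.0),
]


def recency_features(events, as_of):
    """Time since, label of, and magnitude of the last event up to as_of."""
    past = [e for e in events if e[0] <= as_of]
    if not past:
        return None
    ts, label, amount = max(past, key=lambda e: e[0])
    return {
        "hours_since_last": (as_of - ts).total_seconds() / 3600,
        "last_label": label,
        "last_amount": amount,
    }


features = recency_features(events, as_of=datetime(2023, 5, 9, 18, 30))
```

Filtering on `as_of` keeps events that occur after the observation time out of the feature, which matters when features are built for historical training data.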
Frequency Signals
How often do events occur?
Metrics
● Number of events occurring
over a time window.
Examples
● How many hospital admissions has the
patient had in the past 12 months?
● How many international phone calls has a
customer made in the past month?
Monetary Signals
What are the monetary amounts of
events over a time window?
Metrics
● Total cost of events occurring
over a time window.
● Average cost of events
occurring over a time window
Examples
● How much did a customer spend over the
past 12 months?
● What was the average discount applied to a
customer’s purchases over the past month?
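Frequency and monetary signals share the same windowing step: select the events inside a time window, then count, sum, or average them. The sketch below uses invented purchase data and an assumed half-open window that excludes the start and includes the observation time.

```python
from datetime import datetime, timedelta

# Hypothetical purchases: (timestamp, amount)
purchases = [
    (datetime(2023, 1, 15), 60.0),
    (datetime(2023, 4, 20), 40.0),
    (datetime(2023, 5, 5), 20.0),
]


def window_features(events, as_of, window):
    """Count, total, and average amount of events in (as_of - window, as_of]."""
    start = as_of - window
    amounts = [amt for ts, amt in events if start < ts <= as_of]
    count = len(amounts)
    total = sum(amounts)
    average = total / count if count else None
    return {"count": count, "total": total, "average": average}


as_of = datetime(2023, 5, 9)
last_30_days = window_features(purchases, as_of, timedelta(days=30))
last_365_days = window_features(purchases, as_of, timedelta(days=365))
```

Computing the same metrics over several window lengths is a simple way to generate a family of related features from one event table.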
Entities
Parent Entities
● A group that an entity
belongs to, e.g. gender or
country.
● Can be joined to the unit of
analysis as a robust signal
source.
Child Entities
● More granular than your
unit of analysis.
● Need to be aggregated.
Use a Variety of Entities
● An entity is a real-world object
or concept that is represented
by fields in the source tables.
● All features must relate to an
entity (or entities) as their
primary unit of analysis.
● Table joins add valuable
signals.
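The parent and child relationships can be sketched with customer as the unit of analysis, orders as a child entity, and country as a parent entity. All table contents and feature names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical source tables.
customers = {"c1": "SG", "c2": "SG", "c3": "MY"}   # customer_id -> country
orders = [("c1", 20.0), ("c1", 30.0), ("c2", 10.0), ("c3", 40.0)]

# Child entity (order) is more granular, so aggregate it up to the customer.
total_spend = defaultdict(float)
for customer_id, amount in orders:
    total_spend[customer_id] += amount

# Parent entity (country): aggregate customers, then join back to each one.
spend_by_country = defaultdict(list)
for customer_id, country in customers.items():
    spend_by_country[country].append(total_spend[customer_id])
country_avg = {c: sum(v) / len(v) for c, v in spend_by_country.items()}

features = {
    customer_id: {
        "total_spend": total_spend[customer_id],
        "country_avg_spend": country_avg[country],
    }
    for customer_id, country in customers.items()
}
```

Each customer row ends up with one feature derived from a child entity and one derived from a parent entity, and the ratio of the two would be a similarity signal.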
Conclusion
Use signal types and entities for
inspiration and information
Inspiration
● Lists of signal types and entities
can prompt and remind you to
engineer missing features
Information
● Features with different signal types tend to
be less correlated
● Business SMEs can intuitively understand
signal types
Thank You!
