It is a core demand of marketing & sales to segment their customer base. Join this session to learn to identify and prepare the data to perform this segmentation with machine learning.
2. AGENDA
07.02.2019 Fußzeile ändern unter: Einfügen > Kopf- und FußzeileSeite 2
• DATA SCIENCE – CROSS-INDUSTRY STANDARD PROCESS FOR DATA MINING (CRISP-DM)
• BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
• DATA UNDERSTANDING – REALITY AND DATA MODEL
• DATA PREPARATION – ENGINEERING FEATURES
• MODELING – APPLYING USEFUL ALGORITHM
• EVALUATION – DOES IT MAKE SENSE?
• DEPLOYMENT – UTILIZING THE RESULTS
• OUTLOOK
3. DATA SCIENCE – CROSS-INDUSTRY STANDARD PROCESS FOR DATA MINING (CRISP-DM)
4. BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
Customers are different and have different needs and expectations. It is therefore a
core demand of marketing & sales to differentiate their customer base well. This
enables marketing & sales to tailor advertising, campaigns, and sales activities to
their specific customer groups.
We want to identify useful clusters in our data. Since we don't have labels to
learn from, we need to use an unsupervised approach.
5. BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
Criteria given and agreed by business:
› Revenue and employee information is available for at least 2017
› The companies considered have a minimum of 200 employees
› "Company" means a consolidated group ("Konzern")
› About 14k observations meet these criteria
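The selection criteria above can be read as a simple filter. The following is only a sketch: the record layout, the field names, and the choice to apply the 200-employee threshold to the 2017 figure are assumptions made for illustration, not the layout of the actual data.

```python
# Hypothetical group-level records; the field names are assumptions.
groups = [
    {"name": "G1", "employees": {2017: 250}, "revenue": {2017: 40.0}},
    {"name": "G2", "employees": {2017: 150}, "revenue": {2017: 12.0}},
    {"name": "G3", "employees": {2016: 900}, "revenue": {2016: 300.0}},
]


def meets_criteria(group, year=2017, min_employees=200):
    """Return True if revenue and employees are known for `year`
    and the group has at least `min_employees` employees."""
    employees = group["employees"].get(year)
    revenue = group["revenue"].get(year)
    if employees is None or revenue is None:
        return False  # no information for the required year
    return employees >= min_employees


selected = [g["name"] for g in groups if meets_criteria(g)]
```

Here G2 is too small and G3 has no 2017 figures, so only G1 remains in `selected`.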
6. DATA UNDERSTANDING – REALITY AND DATA MODEL
• Companies & "Konzerne" (corporate groups) are represented by customer master data -> Golden
Record from SAP & Bedirect
• Customer master data include hierarchical information representing "Konzern"
structures
• Employee and revenue information can be aggregated within the hierarchy
• There are corresponding data on contracts and billing from SAP. They
represent the business relationship with Haufe.
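Aggregating within the hierarchy can be sketched as rolling every company up to the top of its group. The record layout below (an id mapped to a parent id, employees, and revenue) is a hypothetical stand-in for the master data, not the actual SAP structure.

```python
from collections import defaultdict

# Hypothetical master data: company id -> (parent id or None, employees, revenue)
companies = {
    "K1": (None, 50, 10.0),
    "A": ("K1", 120, 25.0),
    "B": ("A", 80, 5.0),
    "K2": (None, 300, 90.0),
}


def root_of(company_id, records):
    """Follow parent links up to the top-level company of the group."""
    seen = set()
    while records[company_id][0] is not None:
        if company_id in seen:
            raise ValueError("cycle in hierarchy at " + company_id)
        seen.add(company_id)
        company_id = records[company_id][0]
    return company_id


def aggregate_by_group(records):
    """Sum employees and revenue of all companies under each top-level company."""
    totals = defaultdict(lambda: {"employees": 0, "revenue": 0.0})
    for company_id, (_, employees, revenue) in records.items():
        root = root_of(company_id, records)
        totals[root]["employees"] += employees
        totals[root]["revenue"] += revenue
    return dict(totals)


group_totals = aggregate_by_group(companies)
```

A plain sum is used here; depending on the data, revenue between companies of the same group may need to be consolidated differently.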
7. DATA PREPARATION – ENGINEERING FEATURES
General information on data preparation
› Imputation -> "filling gaps"
› Aggregation level
› Aggregation method (mean, standard deviation, sum, min, max, …)
› Treating outliers -> boxplots
› Standardization (z-transformation, min-max scaling)
› Creating patterns (e.g. binary)
› Transforming categorical variables into numerical dummy variables
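A few of the preparation steps listed above, sketched with the Python standard library only. Mean imputation is just one possible way of filling gaps, and the function names and sample values are illustrative assumptions; outlier treatment is not covered here.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Fill gaps (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def z_transform(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dummy_encode(values):
    """Turn a categorical variable into 0/1 dummy variables."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


employees = [200, None, 400, 600]          # hypothetical sample with one gap
filled = impute_mean(employees)            # gap replaced by the mean, 400
scaled = min_max(filled)
standardized = z_transform(filled)
legal_form = dummy_encode(["AG", "GmbH", "AG"])
```

Standardization matters for distance-based methods, because otherwise variables with large value ranges dominate the distance.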
8. DATA PREPARATION – ENGINEERING FEATURES
Required mathematical expression (in case of unsupervised learning)
9. DATA PREPARATION – ENGINEERING FEATURES
Creating a pattern variable representing the revenue development over the last 3
years.
› Applying a ternary (base-3) system, we can create a vector that shows for
each year whether the revenue dropped, stayed the same, or rose.
› Converting this vector to a decimal number summarizes the years in one "handy"
variable representing revenue development.
› We do the same for the number of employees and
billing information
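The base-3 pattern described above can be sketched as follows. The digit assignment (0 = dropped, 1 = unchanged, 2 = rose), the digit order (oldest change first), and the sample figures are assumptions made for illustration; the slides do not specify them.

```python
def trend_digits(values, tolerance=0.0):
    """One base-3 digit per year-over-year change:
    0 = dropped, 1 = stayed the same, 2 = rose (assumed coding)."""
    digits = []
    for previous, current in zip(values, values[1:]):
        if current < previous - tolerance:
            digits.append(0)
        elif current > previous + tolerance:
            digits.append(2)
        else:
            digits.append(1)
    return digits


def ternary_to_decimal(digits):
    """Read the digits as a base-3 number, oldest change first."""
    number = 0
    for digit in digits:
        number = number * 3 + digit
    return number


# Hypothetical revenue for four consecutive years, giving three changes.
revenue = [100.0, 90.0, 90.0, 120.0]
digits = trend_digits(revenue)            # dropped, unchanged, rose
pattern = ternary_to_decimal(digits)      # 0*9 + 1*3 + 2
```

With three changes the variable takes values from 0 (dropped every year) to 26 (rose every year), so each value identifies exactly one development pattern.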
10. DATA PREPARATION – ENGINEERING FEATURES
Creating pattern variables representing the value of the subscription portfolios
› We apply a binary system based on hierarchical product information from
business.
› We summarize as decimal
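A minimal sketch of such a binary portfolio pattern. The product groups and their order are hypothetical; in practice they would come from the product hierarchy agreed with business.

```python
# Hypothetical product groups, ordered from lowest to highest value.
PRODUCT_GROUPS = ["basic", "standard", "premium"]


def portfolio_code(subscribed):
    """Set one bit per subscribed product group and return the decimal value.
    Higher-value groups occupy higher bits, so they dominate the result."""
    code = 0
    for position, group in enumerate(PRODUCT_GROUPS):
        if group in subscribed:
            code += 2 ** position
    return code


code = portfolio_code({"basic", "premium"})   # bits 101 -> 1 + 4
```

Because higher-value groups sit on higher bits, a customer holding only the top group still gets a larger code than one holding all lower groups.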
11. MODELING – APPLYING USEFUL ALGORITHM
Since we don't have any given labels to learn from, we need to perform
unsupervised learning.
We choose k-means as the algorithm
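For reference, the standard k-means objective is the within-cluster sum of squared distances. The symbols are notation introduced here: x_i is an observation, C_j a cluster, and mu_j its centroid.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

The algorithm alternates between assigning each observation to its nearest centroid and recomputing the centroids, which never increases J.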
12. MODELING – APPLYING USEFUL ALGORITHM
We choose k = 4 starting points
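A minimal sketch of k-means (Lloyd's algorithm) with four starting points, written in plain Python for illustration. The sample points, the random choice of starting points, and the fixed seed are assumptions; this is not necessarily the tooling used for the actual segmentation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, labels) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # k observations as starting points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break                          # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:                    # keep old centroid if cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical, already standardized two-feature observations.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (0, 11), (10, 10), (10, 11)]
centroids, labels = k_means(points, k=4)
```

The result depends on the starting points, so it is common to run the algorithm several times with different seeds and keep the solution with the smallest within-cluster sum of squares.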
14. EVALUATION – DOES IT MAKE SENSE?
Describing the calculated clusters
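One simple way to describe clusters is to compare their size and per-feature means. The sketch below assumes rows of numeric features and one cluster label per row; the feature names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def describe_clusters(rows, labels, feature_names):
    """Return the size and per-feature mean of every cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    summary = {}
    for label, members in sorted(grouped.items()):
        summary[label] = {"size": len(members)}
        for index, name in enumerate(feature_names):
            summary[label]["mean_" + name] = mean([m[index] for m in members])
    return summary


# Hypothetical observations (employees, revenue) and their cluster labels.
rows = [(250, 40.0), (300, 60.0), (5000, 900.0)]
cluster_labels = [0, 0, 1]
profile = describe_clusters(rows, cluster_labels, ["employees", "revenue"])
```

Profiles computed on the original, unscaled features are usually easier to discuss with business than profiles on standardized values.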
15. EVALUATION – DOES IT MAKE SENSE?
Naming the calculated clusters
16. DEPLOYMENT – UTILIZING THE RESULTS
Sales lists -> Already tested successfully at channel sales (Michel Lason)
FKS (Firmen-Konzern-Sicht)
Chordiant
Other campaign tools
Customer Service
17. OUTLOOK
Using calculated clusters as target variables for discriminant analysis and/or
decision tree.
> Predict clusters
> "Handy" formula
> Better understanding of clusters (significance, exogenous variables,
separation points etc.)
Using cosine distance for identifying similarities between vectors
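Cosine distance compares the direction of two vectors rather than their length. A minimal sketch, with illustrative function names and sample vectors:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """1 minus the cosine similarity: 0 for identical directions."""
    return 1.0 - cosine_similarity(a, b)


same_direction = cosine_distance((1, 2, 3), (2, 4, 6))   # close to 0
orthogonal = cosine_distance((1, 0), (0, 1))             # 1: no similarity
```

Two customers whose feature vectors differ only by scale, for example one being twice as large in every feature, have a cosine distance close to zero.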