Identifying customer potentials through unsupervised learning

IDENTIFYING CUSTOMER POTENTIALS – PERFORMING UNSUPERVISED LEARNING
Conrad Kleinn 05.02.2019

AGENDA
• DATA SCIENCE – CROSS INDUSTRY STANDARD PROCESS (CRISP)
• BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
• DATA UNDERSTANDING – REALITY AND DATA MODEL
• DATA PREPARATION – ENGINEERING FEATURES
• MODELING – APPLYING A USEFUL ALGORITHM
• EVALUATION – DOES IT MAKE SENSE?
• DEPLOYMENT – UTILIZING THE RESULTS
• OUTLOOK

DATA SCIENCE – CROSS INDUSTRY STANDARD PROCESS (CRISP)

BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
Customers are different and have different needs and expectations. It is therefore a core task of marketing & sales to segment their customer base in a meaningful way. This enables marketing & sales to tailor advertising, campaigns and sales activities to specific customer groups.
We want to identify useful clusters in our data. Since we don't have labels to learn from, we need to use an unsupervised approach.

Criteria defined and agreed with the business:
› Revenue and employee figures are available at least for 2017
› The companies considered have at least 200 employees
› "Company" means a corporate group, i.e. all entities within one scope of consolidation (German: „Konzern“)
› About 14k observations meet these criteria
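The selection criteria above can be expressed as a simple filter. This is a minimal sketch that assumes a hypothetical record layout (one dict per „Konzern“ with yearly `revenue` and `employees` figures) and assumes the 200-employee threshold is checked against the 2017 figure; the slides specify neither detail.

```python
# Hypothetical record layout: one dict per Konzern with figures keyed by year.
MIN_EMPLOYEES = 200
REFERENCE_YEAR = 2017  # assumption: the threshold is checked against 2017


def meets_criteria(record):
    """Return True if the record satisfies the agreed business criteria."""
    revenue = record.get("revenue", {}).get(REFERENCE_YEAR)
    employees = record.get("employees", {}).get(REFERENCE_YEAR)
    # Revenue and employee figures must both exist for the reference year.
    if revenue is None or employees is None:
        return False
    # Only companies with at least 200 employees are considered.
    return employees >= MIN_EMPLOYEES


companies = [
    {"id": "A", "revenue": {2017: 5.0e7}, "employees": {2017: 350}},
    {"id": "B", "revenue": {2017: 1.2e7}, "employees": {2017: 120}},
    {"id": "C", "revenue": {}, "employees": {2017: 900}},
]
selected = [c["id"] for c in companies if meets_criteria(c)]
print(selected)  # ['A']
```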

DATA UNDERSTANDING – REALITY AND DATA MODEL
• Companies and „Konzerne“ are represented by customer master data (Golden Record from SAP & Bedirect)
• Customer master data include hierarchical information representing „Konzern“ structures
• Employee and revenue figures can be aggregated within the hierarchy
• Corresponding contract and billing data from SAP represent the business relationship with Haufe
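Aggregating employee and revenue figures within the hierarchy amounts to a roll-up: every entity passes its own figure to all of its ancestors. The sketch below assumes the hierarchy is available as a hypothetical `parent_of` mapping (entity id to parent id, `None` at the top of a „Konzern“) without cycles, and uses summation as the aggregation method.

```python
def roll_up(parent_of, values):
    """Add each entity's own figure to itself and to all of its ancestors.

    parent_of: entity id -> parent id (None at the top of a Konzern)
    values:    entity id -> the entity's own figure (e.g. employees)
    Assumes the hierarchy is a tree (no cycles).
    """
    totals = {}
    for entity, value in values.items():
        node = entity
        # Walk up the hierarchy so every ancestor receives this value.
        while node is not None:
            totals[node] = totals.get(node, 0) + value
            node = parent_of.get(node)
    return totals


# Hypothetical example: a holding with two subsidiaries, one of which
# owns a further subsidiary.
parent_of = {"holding": None, "sub1": "holding", "sub2": "holding", "sub1a": "sub1"}
employees = {"holding": 50, "sub1": 120, "sub2": 80, "sub1a": 40}

totals = roll_up(parent_of, employees)
print(totals["holding"])  # 290
print(totals["sub1"])     # 160
```

The value at the top node is the Konzern-level figure; other aggregation methods (mean, min, max) would need the member values collected per node instead of a running sum.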

DATA PREPARATION – ENGINEERING FEATURES
General information on data preparation
› Imputation (filling gaps)
› Aggregation level
› Aggregation method (mean, standard deviation, sum, min, max…)
› Treating outliers (identified with boxplots)
› Standardization (z-transformation, min-max scaling)
› Creating patterns (e.g. binary)
› Categorical variables are transformed into numerical dummy variables
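A minimal sketch of the preparation steps listed above, using only the Python standard library. Median imputation, clipping values to the boxplot whiskers (Q1 - 1.5·IQR, Q3 + 1.5·IQR) instead of removing outliers, and alphabetical ordering of dummy columns are assumptions for illustration; the slides name the steps but not the exact rules. Quartile conventions differ between tools, so the whisker limits may differ slightly from a particular boxplot implementation.

```python
import statistics


def impute_median(values):
    """Fill gaps (None) with the median of the observed values."""
    median = statistics.median([v for v in values if v is not None])
    return [median if v is None else v for v in values]


def clip_outliers_iqr(values, factor=1.5):
    """Clip values to the boxplot whiskers Q1 - factor*IQR and Q3 + factor*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, low), high) for v in values]


def z_transform(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumes the values are not all equal
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


def dummies(categories):
    """One-hot encode a categorical variable: one 0/1 column per level."""
    levels = sorted(set(categories))
    return {level: [1 if c == level else 0 for c in categories] for level in levels}


# Hypothetical employee counts with one gap and one extreme value.
employees = impute_median([250, 300, None, 350, 380, 400, 420, 10000])
clipped = clip_outliers_iqr(employees)
z = z_transform(clipped)
industry = dummies(["retail", "industry", "retail"])
print(clipped)
print(industry)
```

In practice, pandas (`get_dummies`) and scikit-learn (`StandardScaler`, `MinMaxScaler`) cover the same steps for whole data frames.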

Required: a numerical (mathematical) representation of every feature (in the case of unsupervised learning)

Creating a pattern variable representing revenue development over the last 3 years:
› Using a ternary (base-3) system, we create a vector that shows for each year whether revenue dropped, remained stable or rose.
› Converting this vector to a decimal number combines the years into a single, "handy" variable representing revenue development.
› We do the same for the number of employees and for billing information.
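The ternary pattern can be sketched as follows. Assumptions not stated in the slides: four yearly values yield three year-over-year changes, a change within ±2 % of the previous year counts as "remained", the digits are 0 = dropped, 1 = remained, 2 = rose, and the oldest change is the most significant digit. Three changes give 3³ = 27 possible patterns, encoded as 0 to 26.

```python
def trend_digit(previous, current, tolerance=0.02):
    """Ternary digit for one year-over-year change.

    0 = dropped, 1 = remained (within +/- tolerance of the previous year), 2 = rose.
    """
    band = tolerance * abs(previous)
    if current > previous + band:
        return 2
    if current < previous - band:
        return 0
    return 1


def trend_pattern(series, tolerance=0.02):
    """Return the ternary digits and their decimal value for a yearly series."""
    digits = [trend_digit(a, b, tolerance) for a, b in zip(series, series[1:])]
    value = 0
    for d in digits:
        value = value * 3 + d  # base 3 -> decimal, oldest change first
    return digits, value


# Hypothetical revenue for four consecutive years (three changes).
print(trend_pattern([100, 110, 111, 90]))  # ([2, 1, 0], 21)
```

For this example the digits [2, 1, 0] give the decimal value 2·9 + 1·3 + 0 = 21.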

Creating pattern variables representing the value of the subscription portfolio:
› We apply a binary system based on hierarchical product information provided by the business.
› We combine the binary digits into a single decimal number.
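The binary portfolio pattern works the same way in base 2. The category names and their order below are hypothetical placeholders for the business's product hierarchy, and the sketch assumes the most valuable category is the most significant bit.

```python
# Hypothetical product categories, ordered from most to least valuable.
# The real hierarchy is provided by the business and not shown in the slides.
CATEGORIES = ["premium_suite", "standard_package", "single_product"]


def portfolio_pattern(subscribed):
    """Return one bit per category (1 = subscribed) and their decimal value."""
    bits = [1 if category in subscribed else 0 for category in CATEGORIES]
    value = 0
    for bit in bits:
        value = value * 2 + bit  # base 2 -> decimal, most valuable category first
    return bits, value


print(portfolio_pattern({"premium_suite", "single_product"}))  # ([1, 0, 1], 5)
```

With this ordering, every portfolio that contains the most valuable category scores higher than every portfolio that does not, so the decimal value preserves the ranking of the top category.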

MODELING – APPLYING A USEFUL ALGORITHM
Since we don't have any labels to learn from, we need to use unsupervised learning.
We choose k-means as the algorithm.

We choose k = 4 starting points (initial cluster centers).
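A minimal k-means (Lloyd's algorithm) sketch with k = 4. The slides do not say how the four starting points are chosen; this sketch assumes farthest-point initialization (the first observation, then repeatedly the observation farthest from those already chosen) so that it stays deterministic. scikit-learn's `KMeans(n_clusters=4)` uses k-means++ initialization by default and would be the usual choice in practice. The features are assumed to be standardized beforehand, because k-means relies on Euclidean distances.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Pick k starting points: the first observation, then repeatedly the
    observation farthest from all starting points chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k=4, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy two-dimensional points standing in for standardized features.
points = [
    (0, 0), (0, 1), (1, 0),        # group around (0, 0)
    (10, 10), (10, 11), (11, 10),  # group around (10, 10)
    (0, 10), (0, 11), (1, 10),     # group around (0, 10)
    (10, 0), (10, 1), (11, 0),     # group around (10, 0)
]
labels, centroids = kmeans(points, k=4)
print(labels)
```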

EVALUATION – DOES IT MAKE SENSE?
Describing the calculated clusters
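A simple way to describe the clusters is a profile of each one: its size and the mean of every feature. The sketch assumes the profiles are computed on the unstandardized values so that they stay interpretable; the feature names and figures are hypothetical.

```python
def profile_clusters(rows, labels, feature_names):
    """Return size and per-feature mean for every cluster label."""
    profiles = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == cluster]
        profile = {"size": len(members)}
        for i, name in enumerate(feature_names):
            profile[name] = sum(row[i] for row in members) / len(members)
        profiles[cluster] = profile
    return profiles


# Hypothetical unstandardized features and cluster labels.
rows = [[250, 12.0], [300, 18.0], [5000, 900.0], [7000, 1100.0]]
labels = [0, 0, 1, 1]
profiles = profile_clusters(rows, labels, ["employees", "revenue_m_eur"])
for cluster, profile in profiles.items():
    print(cluster, profile)
```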

Naming the calculated clusters

DEPLOYMENT – UTILIZING THE RESULTS
Sales lists: already tested successfully in channel sales (Michel Lason)
FKS (Firmen-Konzern-Sicht, i.e. company/group view)
Chordiant
Other campaign tools
Customer Service

OUTLOOK
Using the calculated clusters as target variables for discriminant analysis and/or decision trees:
> Predict clusters
> A "handy" formula
> Better understanding of the clusters (significance, exogenous variables, split points, etc.)
Using cosine distance to identify similarities between vectors
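Cosine similarity compares the direction of two vectors rather than their length: similarity = a·b / (‖a‖·‖b‖), and cosine distance = 1 - similarity. A minimal sketch; it assumes non-zero vectors, since the measure is undefined for a zero vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_distance([2, 0], [1, 0]))  # 0.0
print(cosine_distance([1, 0], [0, 1]))  # 1.0
```

Because only direction matters, a company whose feature values are all twice those of another company has a cosine distance of 0 to it.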
