Identifying customer potentials through unsupervised learning

It is a core demand of marketing & sales to segment their customer base. Join this session to learn to identify and prepare the data to perform this segmentation with machine learning.

Published in: Data & Analytics

  1. IDENTIFYING CUSTOMER POTENTIALS – PERFORMING UNSUPERVISED LEARNING. Conrad Kleinn, 05.02.2019
  2. AGENDA (07.02.2019)
     • DATA SCIENCE – CROSS INDUSTRY STANDARD PROCESS (CRISP)
     • BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
     • DATA UNDERSTANDING – REALITY AND DATA MODEL
     • DATA PREPARATION – ENGINEERING FEATURES
     • MODELING – APPLYING USEFUL ALGORITHM
     • EVALUATION – DOES IT MAKE SENSE?
     • DEPLOYMENT – UTILIZING THE RESULTS
     • OUTLOOK
  3. DATA SCIENCE – CROSS INDUSTRY STANDARD PROCESS (CRISP)
  4. BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
     Customers are different and have different needs and expectations. It is therefore a core demand of marketing & sales to find a good differentiation of their customer base. This enables marketing & sales to fit advertising, campaigns and sales activities to their specific customer groups. We want to identify useful clusters in our data. Since we don't have labels to learn from, we need to use an unsupervised approach.
  5. BUSINESS UNDERSTANDING – CONTEXT AND OBJECTIVES
     Criteria given and agreed by the business:
     › There is information about revenue and employees for at least 2017
     › The companies considered have a minimum of 200 employees
     › "Company" means a consolidated corporate group ("Konzern")
     › About 14k observations meet these criteria
  6. DATA UNDERSTANDING – REALITY AND DATA MODEL
     • Companies and "Konzerne" (corporate groups) are represented by customer master data -> Golden Record from SAP & Bedirect
     • Customer master data include hierarchical information representing "Konzern" structures
     • Employee and revenue information can be aggregated within the hierarchy
     • Corresponding data on contracts and billing are available from SAP. They represent the business relationship with Haufe.
  7. DATA PREPARATION – ENGINEERING FEATURES
     General information on data preparation
     › Imputation -> "filling gaps"
     › Aggregation level
     › Aggregation method (mean, stddev, sum, min, max…)
     › Treating outliers -> boxplots
     › Standardization (z-transformation, min-max)
     › Creating patterns (e.g. binary)
     › Categorical variables are transformed into numerical dummy variables
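Several of the preparation steps listed on slide 7 can be illustrated with a minimal sketch in plain Python. The slides do not show how these steps were implemented, so the function names, the choice of median imputation and the sample values below are assumptions made purely for illustration.

```python
from statistics import mean, median, pstdev

def impute_median(values):
    """Fill gaps (None) with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def z_transform(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def dummy_encode(categories):
    """Turn a categorical column into 0/1 dummy variables, one per level."""
    levels = sorted(set(categories))
    return [{level: int(c == level) for level in levels} for c in categories]

# Hypothetical employee counts with one gap
employees = [200, None, 1000, 400]
filled = impute_median(employees)
scaled = min_max(filled)

# Hypothetical categorical column
dummies = dummy_encode(["retail", "finance", "retail"])
```

Standardization matters for distance-based methods such as k-means, because a feature with a large numeric range (revenue, for example) would otherwise dominate the distance calculation.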
  8. DATA PREPARATION – ENGINEERING FEATURES
     Required mathematical expression (in case of unsupervised learning)
  9. DATA PREPARATION – ENGINEERING FEATURES
     Creating a pattern variable representing the revenue development over the last 3 years.
     › Applying a ternary (three-valued) system yields a vector that shows for each year whether the revenue dropped, stayed the same or rose.
     › Converting this vector to a decimal number summarizes the years into one "handy" variable representing revenue development.
     › We do the same for the number of employees and for billing information.
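The ternary pattern variable from slide 9 could be computed roughly as follows. The slide does not specify the digit assignment or the digit order, so this sketch assumes 0 = dropped, 1 = unchanged, 2 = rose, with the oldest comparison as the most significant digit. The function names, the `tolerance` parameter and the revenue figures are hypothetical.

```python
def trend_digits(values, tolerance=0.0):
    """Compare each year with the previous one and return one ternary digit per step.

    Assumed encoding: 0 = dropped, 1 = unchanged (within tolerance), 2 = rose.
    """
    digits = []
    for prev, curr in zip(values, values[1:]):
        change = curr - prev
        if change > tolerance:
            digits.append(2)
        elif change < -tolerance:
            digits.append(0)
        else:
            digits.append(1)
    return digits

def ternary_to_decimal(digits):
    """Read the digits as a base-3 number, first digit most significant."""
    value = 0
    for d in digits:
        value = value * 3 + d
    return value

# Hypothetical revenue for four consecutive years gives three comparisons
revenue = [100, 120, 120, 90]
pattern = trend_digits(revenue)            # rose, unchanged, dropped
revenue_trend = ternary_to_decimal(pattern)
```

With three comparisons the variable takes values from 0 (dropped every year) to 26 (rose every year), so each distinct development pattern maps to exactly one number. Note that the resulting number is a pattern code rather than a true magnitude, which is worth keeping in mind when it is fed into a distance-based algorithm.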
  10. DATA PREPARATION – ENGINEERING FEATURES
     Creating pattern variables representing the value of the subscription portfolios
     › We apply a binary system based on hierarchical product information from the business.
     › We summarize the result as a decimal number.
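The binary portfolio pattern from slide 10 might look like the sketch below. The actual product hierarchy is not shown in the slides, so the group names and their order are placeholders. The sketch assumes that the first group in the list is the most valuable one and therefore receives the highest bit.

```python
# Hypothetical product groups, ordered from most to least valuable (assumption)
PRODUCT_GROUPS = ["group_a", "group_b", "group_c", "group_d"]

def portfolio_code(subscribed):
    """Encode a set of subscribed product groups as one decimal number.

    Each group contributes one bit: 1 if the customer subscribes, else 0.
    The first entry of PRODUCT_GROUPS becomes the most significant bit.
    """
    code = 0
    for group in PRODUCT_GROUPS:
        code = (code << 1) | int(group in subscribed)
    return code

# Customer subscribing to the first and third group: bits 1010
code = portfolio_code({"group_a", "group_c"})
```

Because the most valuable group carries the highest bit, a customer holding that group always gets a larger code than one holding any combination of lower-ranked groups, which is one way a single number can reflect portfolio value.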
  11. MODELING – APPLYING USEFUL ALGORITHM
     Since we don't have any given labels to learn from, we need to perform unsupervised learning. We choose k-means as the algorithm.
  12. MODELING – APPLYING USEFUL ALGORITHM
     We choose k = 4 starting points (initial centroids).
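For readers unfamiliar with k-means, the following is a minimal sketch of the standard iteration (Lloyd's algorithm) with k = 4. It is not the implementation used for the slides, which is not shown. The random choice of starting points, the seed, the function name and the toy data are assumptions for illustration.

```python
import random
from math import dist

def k_means(points, k=4, iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k observations serve as starting points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy data: two standardized features per observation (hypothetical values)
points = [
    (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9),
    (0.1, 5.0), (0.0, 5.2), (5.1, 0.0), (4.9, 0.2),
]
labels, centroids = k_means(points, k=4)
```

The result depends on the starting points, so in practice the algorithm is usually run several times with different seeds and the run with the lowest within-cluster sum of squares is kept.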
  13. MODELING – APPLYING USEFUL ALGORITHM
  14. EVALUATION – DOES IT MAKE SENSE?
     Describing the calculated clusters
  15. EVALUATION – DOES IT MAKE SENSE?
     Naming the calculated clusters
  16. DEPLOYMENT – UTILIZING THE RESULTS
     • Sales lists -> already tested successfully at channel sales (Michel Lason)
     • FKS (Firmen-Konzern-Sicht)
     • Chordiant
     • Other campaign tools
     • Customer Service
  17. OUTLOOK
     Using the calculated clusters as target variables for discriminant analysis and/or a decision tree.
     > Predict clusters
     > "Handy" formula
     > Better understanding of clusters (significance, exogenous variables, separation points etc.)
     Using cosine distance to identify similarities between vectors
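The cosine distance mentioned in the outlook can be sketched in a few lines. The function names and example vectors are illustrative only. Cosine distance is taken here as 1 minus cosine similarity, which is a common convention.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(a, b)

# Two hypothetical customer feature vectors pointing in the same direction
same_profile = cosine_distance((1, 2, 3), (2, 4, 6))
# Two vectors with nothing in common
unrelated = cosine_distance((1, 0), (0, 1))
```

Unlike Euclidean distance, cosine distance ignores vector length, so two customers with the same profile shape but different overall size count as similar.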
