Optimizing Segmentation
Insight Research Group
2




Why is Segmentation Difficult?
• Infinite number of possible solutions

• Hundreds of possible variables to use

• Clearly defined clusters are rarely present in real-life data sets
3




Technical Challenges

• Challenge: Incorporating fundamentally
  categorical variables
    - Ethnicity, Religion, Political Party, etc.


  - Standard methods assume continuous data (ideal case) and
    at minimum require interval-level data (e.g. rating scales)
      - Correlation, linear regression, k-means clustering
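A minimal illustration of why categorical variables break distance-based methods such as k-means, assuming a hypothetical three-level `LEVELS` variable standing in for something like political party. If the categories are stored as arbitrary integer codes, the code order silently becomes a distance; indicator (one-hot) coding, the representation MCA is built on, puts every pair of distinct categories the same distance apart.

```python
import math

# Hypothetical unordered categorical variable (e.g. political party)
LEVELS = ["Party A", "Party B", "Party C"]

def integer_code(level):
    """Arbitrary codes 1, 2, 3: imposes an order and spacing the categories don't have."""
    return [float(LEVELS.index(level) + 1)]

def one_hot(level):
    """Indicator coding: one 0/1 column per level."""
    return [1.0 if level == lvl else 0.0 for lvl in LEVELS]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Integer codes: A looks twice as far from C as from B, purely because of the coding
print(euclidean(integer_code("Party A"), integer_code("Party B")))  # 1.0
print(euclidean(integer_code("Party A"), integer_code("Party C")))  # 2.0
# One-hot: every pair of distinct levels is the same distance (sqrt 2) apart
print(euclidean(one_hot("Party A"), one_hot("Party B")))
print(euclidean(one_hot("Party A"), one_hot("Party C")))
```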
4




Technical Solutions
• Challenge: Incorporating fundamentally categorical
  variables

  - Multiple Correspondence Analysis (factor analysis for categorical
    data)

  - Pro: handles both demographic (categorical) and rating-scale variables
      - Would allow treating sets of variables separately (e.g. demographic,
        behavioral, psychological); these sets could be used as inputs to a
        clustering method


  - Con: segmentation would be based on the extracted components rather than on the original variables
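A compact sketch of MCA computed as correspondence analysis of the indicator (one-hot) matrix, written in dependency-free Python so the mechanics are visible. All function names are hypothetical, the eigen-decomposition uses simple power iteration with deflation, and the example records are made up; in practice an established implementation (for example `MCA()` in R's FactoMineR package, or `mjca()` in R's ca package) is the safer choice. The returned respondent coordinates are the "extracted components" a clustering step would consume.

```python
import math
import random

def indicator_matrix(rows):
    """Complete disjunctive (one-hot) coding of a list of categorical records."""
    n_vars = len(rows[0])
    # Sorted levels per variable so the column order is deterministic
    levels = [sorted({r[j] for r in rows}) for j in range(n_vars)]
    columns = [(j, lvl) for j in range(n_vars) for lvl in levels[j]]
    Z = [[1.0 if r[j] == lvl else 0.0 for (j, lvl) in columns] for r in rows]
    return Z, columns

def standardized_residuals(Z):
    """S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), where P = Z / grand total."""
    total = sum(map(sum, Z))
    P = [[z / total for z in row] for row in Z]
    r = [sum(row) for row in P]        # row masses (all equal to 1/n in MCA)
    c = [sum(col) for col in zip(*P)]  # column masses (> 0: every level occurs)
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
         for i in range(len(r))]
    return S, r

def top_eigenpairs(G, k, iters=1000, seed=0):
    """Leading eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    m = len(G)
    G = [row[:] for row in G]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(G[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * G[a][b] * v[b] for a in range(m) for b in range(m))
        pairs.append((lam, v))
        # Deflate: subtract this component before looking for the next one
        G = [[G[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return pairs

def mca(rows, n_dims=2):
    """Respondent coordinates F = D_r^(-1/2) S V and principal inertias (eigenvalues)."""
    Z, columns = indicator_matrix(rows)
    S, r = standardized_residuals(Z)
    J = len(columns)
    G = [[sum(S[i][a] * S[i][b] for i in range(len(S))) for b in range(J)]
         for a in range(J)]
    pairs = top_eigenpairs(G, n_dims)
    F = [[sum(S[i][j] * v[j] for j in range(J)) / math.sqrt(r[i]) for _, v in pairs]
         for i in range(len(S))]
    return F, [lam for lam, _ in pairs]

# Hypothetical respondents: (region, owns smartphone, party)
rows = [("North", "yes", "A"), ("North", "no", "B"), ("South", "yes", "B"),
        ("South", "no", "A"), ("East", "yes", "A"), ("North", "no", "B")]
coords, inertias = mca(rows)
print(inertias)
```

A useful sanity check: for an indicator matrix with Q variables and J categories in total, the total inertia is always (J - Q) / Q.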
5




Technical Challenges

• Determining the number of clusters/segments in the
  data

  - Standard methods require the user to specify the number of clusters
    to extract

  - Our standard practice results in fewer clusters than input variables
      - e.g. AMC segmentation solutions required ~12 variables to find ~5
        segments
      - This ratio of features to segments will ‘water down’ the effect of the
        individual variables (segments do not differ significantly on most items)
6

Technical Solution 1
• Challenge: Determining the number of clusters/segments
  in the data

• Solution: fit a probabilistic mixture model and compute a
  complexity-penalized likelihood (AIC / BIC scores)
  - The model with the best AIC / BIC score is our best guess for the
    number of natural clusters in the data

  - Gaussian mixture models for continuous data
  - Latent class models for categorical data
      - Latent class models can handle both categorical and continuous data if the
        continuous data are binned.


  - Both of the above return BIC scores to determine the number of
    clusters
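A minimal sketch of BIC-based selection of the number of clusters, assuming one-dimensional continuous data and a hand-rolled EM fit so it runs without dependencies; the function names and the simulated four-source data are hypothetical. Here BIC = -2 log L + p ln n, so lower is better, the same convention as scikit-learn's `GaussianMixture.bic`. R's mclust reports BIC with the opposite sign, where higher is better, so check the convention of whatever tool produces the scores. EM can land in local optima, so the selected k is a best guess, not a guarantee.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(xi, weights, means, variances):
    """log(weight_j * N(xi | mean_j, var_j)) for every component j."""
    return [math.log(w) + log_normal_pdf(xi, m, v)
            for w, m, v in zip(weights, means, variances)]

def fit_gmm_1d(x, k, iters=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with k components by EM; return its log-likelihood."""
    n = len(x)
    xs = sorted(x)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # start at spread-out quantiles
    mu = sum(x) / n
    variances = [max(sum((xi - mu) ** 2 for xi in x) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow
        resp = []
        for xi in x:
            logs = component_logs(xi, weights, means, variances)
            total = log_sum_exp(logs)
            resp.append([math.exp(lp - total) for lp in logs])
        # M-step: re-estimate each component from its weighted share of the data
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            variances[j] = max(sum(r[j] * (xi - means[j]) ** 2
                                   for r, xi in zip(resp, x)) / nj, min_var)
    return sum(log_sum_exp(component_logs(xi, weights, means, variances)) for xi in x)

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n):
    return -2.0 * loglik + n_params * math.log(n)

def choose_k(x, candidates=range(1, 6)):
    """Return the k with the lowest BIC, plus all scores (3k - 1 free parameters in 1-D)."""
    scores = {k: bic(fit_gmm_1d(x, k), 3 * k - 1, len(x)) for k in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical data: 50 draws from each of 4 well-separated sources
rng = random.Random(1)
data = [rng.gauss(center, 1.0) for center in (0.0, 8.0, 16.0, 24.0) for _ in range(50)]
best_k, scores = choose_k(data)
print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

The parameter count 3k - 1 is k means, k variances, and k - 1 free mixing weights; multivariate or latent class models need their own counts, which libraries such as mclust or R's poLCA compute for you.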
7


Technical Solution 1
• How many clusters do you see? (4 sources generated the data; easy to see here)
8


Technical Solution 1
• The BIC infers 4 clusters (the 4-cluster solution had the best BIC score)
9


Technical Solution 1
• How many clusters do you see? (4 sources generated the data; not so obvious here)
10


Technical Solution 1
• The BIC says 4! (the 4-cluster solution had the best BIC score)
11


Technical Solution 2
• Challenge: Determining the number of clusters/segments
  in the data
  - Solution: ensure there are fewer input variables than extracted
    clusters

      2(+) segments can be obtained from a single variable.

      That is a 2:1 ratio of segments to variables.

      For AMC & MTV we got 5 segments from ~12 variables, a ratio of
      about 0.4:1, i.e. less than 1 segment for every two variables.



  - Also see: Van Buuren & Heiser (1989); Vichi & Kiers (2001); Hwang, Dillon, &
    Takane (2006).
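The ratio arithmetic on this slide, written out:

```latex
\frac{\text{segments}}{\text{variables}} = \frac{5}{12} \approx 0.42,
\qquad
2 \times \frac{5}{12} = \frac{5}{6} < 1 \ \text{segment per two variables},
\qquad
\text{vs. } \frac{2}{1} = 2 \ \text{for one variable split into two segments.}
```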
12



Technical Challenges
• Respondents vary in their use of
  rating scales

• Some respondents use only part
  of the scale,
    - either the top or the bottom of the range


• The segmentation method will find the
  high/low scale-use respondents
  and define segments for them
    - See the AMC segments
13




Psychographic
banner for AMC
segments.

These items were not
used to define the cluster
solution.
14




Technical Solution 1
• Challenge: Respondents vary in their use of rating
  scales

  - Calibrate respondents to equate rating-scale use across the sample
      - Overcoming Scale Use Heterogeneity (2003), Peter E. Rossi


  - Pro: improves the accuracy and validity of standard methods
      - e.g. correlation, regression, clustering


  - Con: requires complex and computationally expensive models
      - i.e. hierarchical Bayesian models (available as an R package)
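To make the scale-use problem concrete, here is a deliberately crude stand-in: within-respondent standardization (ipsatization), which rescales each person's ratings by their own mean and standard deviation. This is a different and much simpler technique than the hierarchical Bayesian calibration cited on this slide; the function name and the two respondents are hypothetical. It shows how two people with identical relative preferences, one using only the top of the scale and one only the bottom, become indistinguishable after correction.

```python
import math

def standardize_within_respondent(ratings):
    """Rescale one respondent's ratings by that respondent's own mean and SD.

    A simple ipsative correction, NOT the hierarchical Bayesian model cited above.
    """
    n = len(ratings)
    mean = sum(ratings) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in ratings) / n)
    if sd == 0.0:
        # Straight-liner: no within-respondent variation, so return all zeros
        return [0.0] * n
    return [(x - mean) / sd for x in ratings]

# Two hypothetical respondents with the same relative preferences on three items:
# one uses only the top of a 7-point scale, the other only the bottom.
top_of_scale = [5, 6, 7]
bottom_of_scale = [1, 2, 3]
print(standardize_within_respondent(top_of_scale))
print(standardize_within_respondent(bottom_of_scale))  # same values as the line above
```

The trade-off is that this also erases genuine differences in overall level (someone who truly likes everything looks the same as someone who merely rates high), which is part of why model-based calibration is preferred when it is affordable.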
15




Technical Solution 2
• Challenge: Respondents vary in their use of rating
  scales

  - Abandon rating scales; use simple Agree/Disagree variables
      - Focus on methods for categorical variables

      - Multiple Correspondence Analysis (factor analysis for categorical data)

      - Pro: handles both demographic (categorical) and rating variables
          - Would allow treating sets of variables separately (e.g. demographic, behavioral,
            psychological); these sets could be used as inputs to clustering methods
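If a study has already been fielded with rating scales, a recode to Agree/Disagree is a quick way to move it onto categorical methods. A minimal sketch, assuming a 1-5 scale where 4 and 5 count as Agree and the midpoint counts as Disagree; that cutoff is our assumption, and asking the items as Agree/Disagree in the first place, as the slide suggests, avoids the choice entirely.

```python
def to_agree_disagree(rating, agree_cutoff=4):
    """Collapse a rating into Agree/Disagree; the default cutoff assumes a 1-5 scale."""
    return "Agree" if rating >= agree_cutoff else "Disagree"

# Hypothetical 1-5 ratings from one respondent on four attitude items
ratings = [1, 3, 4, 5]
print([to_agree_disagree(r) for r in ratings])  # ['Disagree', 'Disagree', 'Agree', 'Agree']
```

The resulting labels are ordinary categorical variables and can be one-hot coded for MCA alongside demographics.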
16




What Slows Us Down?
• Each segmentation iteration consumes resources

• Producing a new segmentation variable for each respondent
    - 0.5 man-hours

• Producing new banners
    - Generating tables: 0.25 hours
    - Formatting and printing: 1+ man-hours

• Analyzing the full banner for the new segmentation
    - Requires the entire research team, 6+ man-hours
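Adding up the slide's own figures shows where the time goes. This sketch takes "1+" and "6+" at their lower bounds, so the total (at least 7.75 hours per iteration) is a minimum, and the team analysis alone is roughly three quarters of it.

```python
# Per-iteration effort from the slide; "1+" and "6+" are taken at their lower bounds,
# so the total is a minimum.
effort_hours = {
    "new segmentation variable": 0.5,
    "generate tables": 0.25,
    "format and print banners": 1.0,
    "analyze full banner (team)": 6.0,
}

total = sum(effort_hours.values())
for task, hours in sorted(effort_hours.items(), key=lambda kv: -kv[1]):
    print(f"{task:30s} {hours:5.2f} h  {hours / total:6.1%}")
print(f"{'total (lower bound)':30s} {total:5.2f} h")
```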
17



How to Speed It Up
• Producing a new segmentation variable for each respondent
    - 0.5 man-hours: not the bottleneck


• Producing new banners
    - Generating tables, 0.25 hours: not the bottleneck
    - Formatting and printing, 1+ man-hours: potential for automation


• Analyzing the full banner for the new segmentation
    - Requires the entire research team, 6+ man-hours: the workflow bottleneck
    - Ideas / brainstorm
        - Criteria of success are often vague
        - When the goal is well defined, quant methods can increase efficiency
            - If you can formalize it, you can solve it
        - Time invested in the planning phase will pay off in productivity during analysis
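Formatting and printing banners is the step flagged for automation. A minimal sketch, assuming answers are already coded Agree/Disagree and that a banner here means a table of percent-agree by segment; the function names, segment labels, and items are hypothetical, and the output is plain text rather than a client-ready layout.

```python
from collections import Counter, defaultdict

def banner_table(segments, responses, items):
    """Percent answering "Agree" on each item, by segment.

    segments:  one segment label per respondent
    responses: one dict per respondent mapping item -> "Agree" / "Disagree"
    """
    seg_sizes = Counter(segments)
    agree = defaultdict(Counter)  # item -> segment -> number of "Agree" answers
    for seg, resp in zip(segments, responses):
        for item in items:
            if resp.get(item) == "Agree":
                agree[item][seg] += 1
    cols = sorted(seg_sizes)
    table = {item: [100.0 * agree[item][s] / seg_sizes[s] for s in cols] for item in items}
    return cols, table

def format_banner(cols, table):
    """Plain-text layout: one row per item, one column per segment."""
    width = max(len(item) for item in table)
    lines = [" " * width + "".join(f"{c:>10}" for c in cols)]
    for item, pcts in table.items():
        lines.append(f"{item:<{width}}" + "".join(f"{p:>9.0f}%" for p in pcts))
    return "\n".join(lines)

segments = ["Enthusiast", "Enthusiast", "Utility", "Utility", "Utility"]
responses = [
    {"Buys gadgets early": "Agree", "Price matters most": "Disagree"},
    {"Buys gadgets early": "Agree", "Price matters most": "Agree"},
    {"Buys gadgets early": "Disagree", "Price matters most": "Agree"},
    {"Buys gadgets early": "Disagree", "Price matters most": "Agree"},
    {"Buys gadgets early": "Agree", "Price matters most": "Agree"},
]
print(format_banner(*banner_table(segments, responses, ["Buys gadgets early", "Price matters most"])))
```

Because each new segmentation is just a new `segments` list, regenerating every banner for a new solution becomes one function call.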
18


Hypothetical Case Study
• Goals brainstorm:
  - Client and previous research say:
      - “Segmentation should differentiate enthusiasts (early adopters) and utility
        consumers (late adopters).”
      - “Also, segmentation should include demographics that are known to influence
        technology adoption:
          - Age, Gender, Income, Education.”



  - Quant answers:
      - “OK, let’s write a battery of questions addressing consumers’ perceptions of and
        relationship to technology products; this will be distilled into a single ‘tech
        enthusiasm’ measure.”
      - “Also, all relevant demographic information can be reduced to one (or more)
        demographic factors.”
      - “Segments will be defined from a ‘reduced dimensionality’ representation of the
        data (MCA).”
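One common way to check that a battery can reasonably be distilled into a single measure is Cronbach's alpha, α = k/(k-1) · (1 - Σ item variances / variance of the total score). This check and the simple averaged score below are our assumptions, not steps stated in the case study (which proposes MCA for the reduction); the items and ratings are hypothetical. Values around 0.7 or higher are often treated as acceptable.

```python
def variance(values):
    """Population variance; alpha is unchanged if sample variance is used throughout."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(items):
    """items: one list of scores per battery item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(item) for item in items) / variance(totals))

def enthusiasm_score(items):
    """Hypothetical single 'tech enthusiasm' measure: each respondent's mean item score."""
    k = len(items)
    return [sum(scores) / k for scores in zip(*items)]

# Hypothetical 1-5 ratings on three tech-attitude items for five respondents
battery = [
    [5, 4, 2, 1, 4],  # "I like to try new gadgets first"
    [4, 5, 2, 2, 4],  # "I follow technology news"
    [5, 4, 1, 1, 3],  # "Friends ask me for tech advice"
]
print(round(cronbach_alpha(battery), 3))
print(enthusiasm_score(battery))
```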
19
Hypothetical Case Study
20
Hypothetical Case Study
          Categories graph
21
Hypothetical Case Study
          Combined graph
22
Hypothetical Case Study
23



MCA for Segmentation
• (2006). An extension of multiple correspondence analysis for identifying
  heterogeneous subgroups of respondents

• (2010). Traveler segmentation strategy with nominal variables through
  correspondence analysis

• (2010). Fuzzy cluster multiple correspondence analysis

• (2010). Simultaneous two-way clustering of multiple correspondence
  analysis

• (2005). A simultaneous approach to constrained multiple correspondence
  analysis and cluster analysis for market segmentation

• (2002). Analysis of categorical marketing data by generalized constrained
  multiple correspondence analysis
24




Further Directions
• Extensions to Multiple Correspondence Analysis
  - Methods that let us combine nominal, numeric, and ordinal
    variables
  - Methods that let us group variables into sets
      - e.g. could ensure that psychographic, behavioral, and demographic
        variables have an equal influence on the final solution.


• Methods that simultaneously perform dimensionality
  reduction and cluster discovery
  - Optimize the entire analysis to discover the most distinctive
    clusters
  - Very promising approach
      - Con: I have not found an implementation of these methods.
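For contrast, the "tandem" baseline that simultaneous methods aim to improve on is two separate steps: reduce dimensions first (e.g. MCA), then cluster the component scores. A minimal k-means sketch for that second step, assuming the input rows are respondents' component coordinates; the coordinates are hypothetical. It optimizes only the clustering and treats the dimension reduction as fixed, which is exactly what the simultaneous approaches on this slide avoid.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next starting center: the point farthest from every center chosen so far
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical 2-D MCA coordinates for six respondents
coords = [(-1.0, 0.2), (-0.9, 0.1), (-1.1, 0.3), (0.8, -0.2), (1.0, -0.1), (0.9, 0.0)]
labels, centers = kmeans(coords, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```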

Optimizing Market Segmentation

  • 2. 2 Why is Segmentation Difficult?  Infinite number of possible solutions  Hundreds of possible variables for to use  Clearly defined clusters are rarely present in real life data- sets
  • 3. 3 Technical Challenges  Challenge: Incorporating fundamentally categorical variables  Ethnicity, Religion, Political Party, Etc.  Standard methods assume continuous data (ideal case) and require interval level data as worst case (e.g. ratings scales)  Correlation, linear regression, k-means clustering
  • 4. Technical Solutions
    - Challenge: incorporating fundamentally categorical variables
    - Multiple Correspondence Analysis (MCA): factor analysis for categorical data
      - Pro: handles both demographic (categorical) and ratings variables
      - Pro: allows treating sets of variables separately (i.e. demographic, behavioral, psychological); these sets could be used as inputs to the clustering method
      - Con: segmentation would be based on the extracted components
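A minimal NumPy sketch of MCA, computed as a correspondence analysis of the respondents-by-categories indicator (one-hot) matrix. The function names and the toy survey answers are hypothetical, and full implementations (e.g. R's FactoMineR) add refinements this sketch omits. The respondent coordinates it returns are the "extracted components" a clustering method would then work on.

```python
import numpy as np

def indicator_matrix(columns):
    """One 0/1 column per (variable, category) pair: the complete disjunctive table."""
    blocks = []
    for col in columns:
        categories = sorted(set(col))
        blocks.append(np.array([[1.0 if v == c else 0.0 for c in categories] for v in col]))
    return np.hstack(blocks)

def mca(columns, n_components=2):
    """Correspondence analysis of the indicator matrix; returns respondent coordinates."""
    Z = indicator_matrix(columns)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row (respondent) masses
    c = P.sum(axis=0)                        # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals D_r^-1/2 (P - rc^T) D_c^-1/2
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = U * sigma / np.sqrt(r)[:, None]   # principal coordinates of respondents
    return row_coords[:, :n_components], sigma[:n_components] ** 2

# Hypothetical categorical answers for six respondents.
region = ["north", "south", "north", "east", "south", "west"]
party = ["A", "B", "A", "C", "A", "C"]
owns_home = ["yes", "no", "yes", "no", "no", "yes"]
coords, inertias = mca([region, party, owns_home], n_components=2)
print(coords.round(3))
print(inertias.round(3))
```

Respondents who gave identical answers (here the first and third) land on the same point, so any clustering of `coords` treats them as indistinguishable.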
  • 5. Technical Challenges
    - Challenge: determining the number of clusters/segments in the data
    - Standard methods require the user to specify the number of clusters to extract
    - Our standard practice results in fewer clusters than input variables
      - e.g. the AMC segmentation solutions required ~12 variables to find ~5 segments
      - This ratio of features to segments will "water down" the effect of the individual variables (segments do not differ significantly on most items)
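One way to make "segments do not differ on most items" measurable is an effect size per banner item. The sketch below computes Cramér's V between the segment label and one categorical item using only the standard library; the function name and data are hypothetical. A V near 0 means the item barely separates the segments, and a V of 1 means the item is perfectly tied to segment membership.

```python
from collections import Counter
from math import sqrt

def cramers_v(segments, item):
    """Cramér's V for the segment-by-item contingency table (0 = no association, 1 = perfect)."""
    n = len(segments)
    seg_counts = Counter(segments)
    item_counts = Counter(item)
    cell_counts = Counter(zip(segments, item))
    chi2 = 0.0
    for s, ns in seg_counts.items():
        for v, nv in item_counts.items():
            expected = ns * nv / n                  # count expected under independence
            observed = cell_counts.get((s, v), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(seg_counts), len(item_counts))
    if k < 2:
        return 0.0                                  # a single segment or answer carries no association
    return sqrt(chi2 / (n * (k - 1)))

print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0
```

Running this over every banner item and counting how many items clear some agreed threshold would give one concrete score for comparing candidate segmentations.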
  • 6. Technical Solution 1
    - Challenge: determining the number of clusters/segments in the data
    - Solution: fit a probabilistic mixture model and compute a complexity-penalized likelihood (AIC / BIC scores)
      - The model with the best AIC / BIC score is our best guess for the number of natural clusters in the data
      - Gaussian mixture models for continuous data
      - Latent class models for categorical data
      - Latent class models can handle both categorical and continuous data if the continuous data are binned
      - Both of the above return BIC scores that can be used to determine the number of clusters
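A minimal NumPy sketch of BIC-based selection, assuming one-dimensional continuous data: fit Gaussian mixtures with k = 1 to 6 components by EM and keep the k with the lowest BIC, defined here as -2 log L + p ln n with p = 3k - 1 free parameters. The function names, the quantile-based starting values, the variance floor, and the simulated data are illustrative assumptions. Some packages, e.g. R's mclust, report BIC with the opposite sign, so that larger is better; for multivariate data, scikit-learn's `GaussianMixture.bic` uses the lower-is-better convention.

```python
import numpy as np

def log_joint(x, weights, means, variances):
    """log(weight_j * Normal(x_i | mean_j, variance_j)) for every point i and component j."""
    return (np.log(weights)
            - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)

def fit_gmm_1d(x, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM and return its log-likelihood."""
    n = len(x)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)    # deterministic, spread-out start
    variances = np.full(k, x.var() / k ** 2 + var_floor)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        lj = log_joint(x, weights, means, variances)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    return np.logaddexp.reduce(log_joint(x, weights, means, variances), axis=1).sum()

def bic(log_lik, k, n):
    """BIC = -2 log L + p ln n, with p = k means + k variances + (k - 1) free weights."""
    return -2.0 * log_lik + (3 * k - 1) * np.log(n)

# Simulated data from 4 sources, loosely mirroring slides 7 to 10 (the numbers are invented).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(mu, 1.0, 250) for mu in (0.0, 10.0, 20.0, 30.0)])
scores = {k: bic(fit_gmm_1d(x, k), k, len(x)) for k in range(1, 7)}
best_k = min(scores, key=scores.get)
print(best_k, {k: round(float(s), 1) for k, s in scores.items()})
```

AIC follows the same pattern with the penalty `(3 * k - 1) * np.log(n)` replaced by `2 * (3 * k - 1)`, which penalizes extra components less for large samples.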
  • 7. Technical Solution 1
    - How many clusters do you see? (4 sources generated the data, duh)
  • 8. Technical Solution 1
    - The BIC infers 4 clusters (the 4-cluster solution had the best BIC score)
  • 9. Technical Solution 1
    - How many clusters do you see? (4 sources generated the data; not so obvious)
  • 10. Technical Solution 1
    - The BIC says 4! (the 4-cluster solution had the best BIC score; thanks, BIC!)
  • 11. Technical Solution 2
    - Challenge: determining the number of clusters/segments in the data
    - Solution: ensure there are fewer input variables than extracted clusters
      - 2 (or more) segments can be obtained from a single variable, i.e. a 2:1 ratio of segments to variables
      - For AMC & MTV we got 5 segments from ~12 variables, a ratio of about 0.4:1, i.e. less than 1 segment for every two variables
    - Also see: Van Buuren & Heiser (1989); Vichi & Kiers (2001); Hwang, Dillon, & Takane (2006)
  • 12. Technical Challenges
    - Respondents vary in their use of rating scales
      - Some respondents use only part of the scale, either the top or the bottom of the range
      - The segmentation method will find the high/low scale-use respondents and define segments for them
    - See the AMC segments
  • 13. Psychographic banner for the AMC segments. These items were not used to define the cluster solution.
  • 14. Technical Solution 1
    - Challenge: respondents vary in their use of rating scales
    - Calibrate respondents to equate the rating scale across the sample
      - See: Peter E. Rossi, "Overcoming Scale Use Heterogeneity" (2003)
      - Pro: improves the accuracy and validity of standard methods, e.g. correlation, regression, clustering
      - Con: requires complex and computationally expensive models, i.e. hierarchical Bayesian models (available as an R package)
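The hierarchical Bayesian calibration cited above is the rigorous option. A much cruder stand-in, swapped in here and not Rossi's method, is to standardize each respondent's ratings against that respondent's own mean and spread before clustering. A NumPy sketch with a hypothetical function name:

```python
import numpy as np

def standardize_within_respondent(ratings, eps=1e-9):
    """Center and scale each row (respondent) by its own mean and standard deviation.

    Removes a respondent's overall tendency to rate high or low and how much of the
    scale he or she uses; rows with no variation are only centered.
    """
    ratings = np.asarray(ratings, dtype=float)
    mean = ratings.mean(axis=1, keepdims=True)
    std = ratings.std(axis=1, keepdims=True)
    std = np.where(std < eps, 1.0, std)   # avoid dividing by zero for straight-liners
    return (ratings - mean) / std

# A bottom-of-scale user, a top-of-scale user with the same pattern, and a straight-liner.
ratings = [[1, 2, 3], [3, 4, 5], [2, 2, 2]]
print(standardize_within_respondent(ratings).round(3))
```

The trade-off is that genuine differences in overall level are discarded along with the scale-use bias; separating the two is exactly what the hierarchical model is designed to do.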
  • 15. Technical Solution 2
    - Challenge: respondents vary in their use of rating scales
    - Abandon rating scales and use simple Agree/Disagree variables
    - Focus on methods for categorical variables
      - Multiple Correspondence Analysis (factor analysis for categorical data)
      - Pro: handles both demographic (categorical) and ratings variables
      - Pro: allows treating sets of variables separately (i.e. demographic, behavioral, psychological); these sets could be used as inputs to clustering methods
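Where ratings have already been collected, they can at least be recoded to Agree/Disagree so that categorical methods such as MCA apply. The sketch assumes a 1 to 5 scale with 4 and 5 counted as Agree (a "top-two-box" cut); both the threshold and the function name are illustrative assumptions.

```python
def to_agree_disagree(ratings, agree_threshold=4):
    """Recode numeric ratings to 'Agree'/'Disagree'; None (missing) stays None."""
    recoded = []
    for r in ratings:
        if r is None:
            recoded.append(None)            # keep missing answers missing
        elif r >= agree_threshold:
            recoded.append("Agree")
        else:
            recoded.append("Disagree")
    return recoded

print(to_agree_disagree([5, 4, 3, 1, None]))
```

Recoding is weaker than asking an Agree/Disagree question in the first place: a respondent who only uses the top of the scale will still come out as "Agree" on nearly every item.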
  • 16. What Slows Us Down?
    - Each segmentation iteration consumes resources
      - Producing a new segmentation variable for each respondent: 0.5 man-hours
      - Producing new banners: generating tables, 0.25 hours; formatting and printing, 1+ man-hours
      - Analyzing the full banner for the new segmentation: requires the entire research team, 6+ man-hours
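Adding up the per-iteration costs listed above gives a lower bound on each iteration and shows how much of it the analysis step accounts for (a back-of-the-envelope calculation using only the slide's own figures, treating the 0.25 hours of table generation as man-hours):

$$
0.5 + 0.25 + 1 + 6 = 7.75 \ \text{man-hours per iteration (at least)}, \qquad \frac{6}{7.75} \approx 0.77
$$

So roughly three quarters of each iteration's minimum cost is the team's analysis of the banner.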
  • 17. How to Speed It Up
    - Producing a new segmentation variable for each respondent: 0.5 man-hours, not the bottleneck
    - Producing new banners
      - Generating tables: 0.25 hours, not the bottleneck
      - Formatting and printing: 1+ man-hours, potential for automation
    - Analyzing the full banner for the new segmentation: requires the entire research team, 6+ man-hours; this is the workflow bottleneck
    - Ideas / brainstorm
      - Success criteria are often vague
      - When the goal is well defined, quant methods can increase efficiency
      - If you can formalize it, you can solve it
      - Time invested in the planning phase will reap productivity gains during analysis
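Formatting banners is flagged above as the step with potential for automation. A minimal standard-library sketch of one banner table: rows are the answers to a single item, columns are segments, and cells are column percentages. The function names and the plain-text layout are hypothetical; a production version would loop over every item and write to whatever format the team prints.

```python
from collections import Counter

def banner_table(segments, item):
    """Column percentages: share of each segment giving each answer to one item."""
    seg_totals = Counter(segments)
    cells = Counter(zip(item, segments))
    seg_names = sorted(seg_totals)
    table = {}
    for answer in sorted(set(item)):
        table[answer] = {s: 100.0 * cells[(answer, s)] / seg_totals[s] for s in seg_names}
    return seg_names, table

def format_banner(title, segments, item):
    """Render one banner table as fixed-width text."""
    seg_names, table = banner_table(segments, item)
    width = max(len(str(a)) for a in table) + 2
    lines = [title, " " * width + "".join(f"{s:>10}" for s in seg_names)]
    for answer, row in table.items():
        lines.append(f"{str(answer):<{width}}" + "".join(f"{row[s]:>9.1f}%" for s in seg_names))
    return "\n".join(lines)

# Hypothetical segment labels and answers for four respondents.
print(format_banner("Q1. Owns a smartphone", ["A", "A", "B", "B"], ["Yes", "No", "Yes", "Yes"]))
```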
  • 18. Hypothetical Case Study
    - Goals brainstorm
    - Client and previous research say:
      - "The segmentation should differentiate enthusiasts (early adopters) and utility consumers (late adopters)."
      - "Also, the segmentation should include demographics that are known to influence technology adoption: age, gender, income, education."
    - Quant answers:
      - "OK, let's write a battery of questions addressing consumers' perceptions of and relation to technology products; this will be distilled into a single 'tech enthusiasm' measure."
      - "Also, all relevant demographic information can be reduced to one (or more) demographic factors."
      - "Segments will be defined from a 'reduced dimensionality' representation of the data (MCA)."
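Distilling a question battery into a single "tech enthusiasm" measure could be done several ways. One common choice, assumed here rather than taken from the case study, is the first principal component of the standardized items. The function name and the sample battery are hypothetical, and the sign convention assumes every item is coded so that a higher answer means more enthusiasm.

```python
import numpy as np

def first_component_score(items):
    """Score respondents on the first principal component of standardized items.

    items: (n_respondents, n_items) numeric answers. Returns (scores, loadings).
    """
    X = np.asarray(items, dtype=float)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)          # constant items contribute nothing
    Z = (X - X.mean(axis=0)) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    if loadings.sum() < 0:                      # SVD sign is arbitrary: orient so higher = more enthusiastic
        loadings = -loadings
    return Z @ loadings, loadings

# Hypothetical 3-item battery on a 1-5 agreement scale, one row per respondent.
battery = [[5, 4, 5], [4, 4, 4], [2, 1, 2], [1, 2, 1], [3, 3, 3]]
tech_enthusiasm, loadings = first_component_score(battery)
print(tech_enthusiasm.round(2), loadings.round(2))
```

Roughly equal loadings suggest the items measure one underlying attitude; an item with a near-zero loading is a candidate to drop from the battery.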
  • 20. Hypothetical Case Study
    - [Figure: categories graph]
  • 21. Hypothetical Case Study
    - [Figure: combined graph]
  • 23. MCA for Segmentation
    - (2006). An extension of multiple correspondence analysis for identifying heterogeneous subgroups of respondents.
    - (2010). Traveler segmentation strategy with nominal variables through correspondence analysis.
    - (2010). Fuzzy cluster multiple correspondence analysis.
    - (2010). Simultaneous two-way clustering of multiple correspondence analysis.
    - (2005). A simultaneous approach to constrained multiple correspondence analysis and cluster analysis for market segmentation.
    - (2002). Analysis of categorical marketing data by generalized constrained multiple correspondence analysis.
  • 24. Further Directions
    - Extensions to Multiple Correspondence Analysis
      - Methods that let us combine nominal, numeric, and ordinal variables
      - Methods that let us group variables into sets, e.g. to ensure that psychographic, behavioral, and demographic variables have an equal influence on the final solution
      - Methods that simultaneously perform dimensionality reduction and cluster discovery
        - Optimize the entire analysis to discover the most distinctive clusters
        - Very promising approach
        - Con: I have not found an implementation of these methods.
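To give a feel for what "simultaneous" means, here is a NumPy sketch of reduced k-means, a method for numeric data in the same family as the factorial k-means of Vichi & Kiers (2001) cited on slide 11. It is a swapped-in illustration, not one of the MCA-based methods listed on slide 23. It minimizes ||X - U M A^T||^2 by alternating a k-means update of the clusters in the projection X A with a re-estimate of the orthonormal p x q projection A. The function names, the farthest-point start, and the stopping rule are assumptions.

```python
import numpy as np

def cluster_means(Y, labels, k):
    """Row j holds the mean of the points in cluster j (zeros if the cluster is empty)."""
    return np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                     else np.zeros(Y.shape[1]) for j in range(k)])

def farthest_point_labels(Y, k):
    """Deterministic start: pick k mutually distant points as centers, assign to nearest."""
    centers = [Y[np.argmax((Y ** 2).sum(axis=1))]]
    while len(centers) < k:
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    C = np.array(centers)
    return ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def reduced_kmeans(X, k, q, n_iter=100):
    """Alternate cluster and projection updates to reduce ||X - U M A^T||^2 (use q < k)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                               # work with centered data
    A = np.linalg.svd(X, full_matrices=False)[2][:q].T   # start from the top-q principal axes
    labels = farthest_point_labels(X @ A, k)
    for _ in range(n_iter):
        # A fixed: one k-means (Lloyd) pass on the projected data X @ A.
        Y = X @ A
        centroids = cluster_means(Y, labels, k)
        new_labels = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Clusters fixed: A = top-q right singular vectors of P_U X,
        # i.e. the data with every row replaced by its cluster mean.
        Xbar = cluster_means(X, new_labels, k)[new_labels]
        A = np.linalg.svd(Xbar, full_matrices=False)[2][:q].T
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A

# Hypothetical data: two groups that differ only on the first two of six variables.
rng = np.random.default_rng(1)
shift = np.array([6.0, 6.0, 0.0, 0.0, 0.0, 0.0])
X = np.vstack([rng.normal(0.0, 1.0, (40, 6)), rng.normal(0.0, 1.0, (40, 6)) + shift])
labels, A = reduced_kmeans(X, k=2, q=1)
print(labels, A.round(2).ravel())
```

Because the projection is re-estimated from the clusters, variables that do not separate the segments receive small weights in A instead of diluting the solution.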