Probabilistic Structural Equations - Bayesian Networks for the Analysis of a Perfume Market

After a brief introduction to Bayesian Belief Networks, we describe how Probabilistic Structural Equations (PSE) can be induced by BayesiaLab to analyze a specific Perfume Market. We also describe the Multi-Quadrant Analysis (opportunity plots), a new analysis tool that takes into account the competitive position of each product's drivers when computing the optimal policies.

Presentation Transcript

  • Probabilistic Structural Equations: Application to the Analysis of a Perfume Market. Dr. Lionel JOUFFE, August 2009. ©2009 Bayesia SA All rights reserved. Forbidden reproduction in whole or part without the Bayesia’s express written permission
  • BayesiaLab’s Probabilistic Structural Equations for Perfume Market Analysis. Plan: Introduction; Bayesian Networks; Application
  • INTRODUCTION
  • Bayesian Networks: A Computational Tool to Model Uncertainty
    - Based on both graph theory and probability theory
    - Manual modeling through brainstorming: probabilistic expert systems
    - Induction by automatic learning: data analysis, data mining
  • Bayesian Networks
    - 1763: Bayes’ Theorem, P(A|B) = P(B|A)P(A)/P(B)
    - 1988: Judea Pearl, “Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference”
    - 1996: “Microsoft's competitive advantage is its expertise in Bayesian networks”, Bill Gates
    - 2004: Bayesian Machine Learning ranked 4th among the 10 Emerging Technologies That Will Change Your World
  • Example of Probabilistic Reasoning: Letter from the analysis laboratory
    “You recently went to our laboratory for a screening test. The targeted rare disease has a prevalence of one person out of ten thousand. We regret to inform you that this test, which has a symmetric efficiency of 99%, is positive.”
    What is your feeling after reading this letter? Do you think that the probability that you are affected is 1%, 50% or 99%?
  • Example of Probabilistic Reasoning: Letter from the analysis laboratory
    - One person out of 10,000 is affected. He will receive “0.99 letter” with a positive test result
    - Among the 9,999 other persons, “99.99 persons” will receive a letter with a positive test result
  • Example of Probabilistic Reasoning: Letter from the analysis laboratory
    - There is then a total of 0.99 + 99.99 = 100.98 letters with a positive test result
    - Probability of being affected when one receives such a letter: 0.99/(0.99+99.99) = 0.98%
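The letter example can be checked with a few lines of Python. This is a minimal sketch that reads the "symmetric efficiency of 99%" as 99% sensitivity and 99% specificity; the function and parameter names are illustrative and do not come from the slides.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(affected | positive test), computed with Bayes' theorem."""
    true_pos = prevalence * sensitivity                    # affected and tested positive
    false_pos = (1.0 - prevalence) * (1.0 - specificity)   # healthy but tested positive
    return true_pos / (true_pos + false_pos)

# Prevalence of 1 in 10,000 and a test that is right 99% of the time either way.
p = posterior_positive(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"{p:.2%}")
```

The false positives among the 9,999 healthy persons outnumber the single true positive by roughly a hundred to one, which is why the posterior stays below 1%.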
  • Example of Probabilistic Reasoning: Letter from the analysis laboratory [figure]
  • BAYESIAN BELIEF NETWORKS
  • ... are made of Two Distinct Parts
    - Structure: a Directed Acyclic Graph (DAG), i.e. no directed loop. Nodes represent the domain’s variables. Arcs represent the direct probabilistic influences between the variables (possibly causal)
    - Parameters: probability distributions are associated with each node, usually by using tables
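The two parts can be sketched in plain Python: a parent map for the structure and one conditional probability table per node for the parameters. The two-node network below (Disease and Test, reusing the numbers of the letter example) and all names are illustrative assumptions, not BayesiaLab's internal representation.

```python
# Structure: each node lists its parents (hypothetical Disease -> Test network).
parents = {"Disease": [], "Test": ["Disease"]}

# Parameters: one table per node, keyed by the tuple of parent states.
cpt = {
    "Disease": {(): {"yes": 0.0001, "no": 0.9999}},
    "Test": {
        ("yes",): {"pos": 0.99, "neg": 0.01},
        ("no",): {"pos": 0.01, "neg": 0.99},
    },
}

def is_acyclic(parent_map):
    """Return True when the parent map contains no directed loop."""
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # we came back to a node on the current path
        state[node] = "visiting"
        ok = all(visit(p) for p in parent_map[node])
        state[node] = "done"
        return ok

    return all(visit(n) for n in parent_map)

def joint(assignment):
    """P(assignment) as the product of each node's conditional probability."""
    prob = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[p] for p in pars)
        prob *= cpt[node][key][assignment[node]]
    return prob

print(is_acyclic(parents), joint({"Disease": "yes", "Test": "pos"}))
```

The product in `joint` is the factorized representation of the joint probability distribution that the graph encodes.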
  • ... are Powerful Inference Engines
    We get some evidence on the states of a subset of variables:
    - Hard positive evidence
    - Hard negative evidence
    - Likelihoods
    - Probability distributions (fixed or not)
    - Mean values (fixed or not)
    We then want to take these findings into account in a rigorous way to update our belief on the states of the other variables (probability distributions on their values).
    Multi-Directional Inference (Simulation and/or Diagnosis)
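As a rough illustration of the first three kinds of evidence, each can be written as a likelihood vector over the states of the observed variable, and the updated belief then propagated to a child node. The variable, its states and all probabilities below are hypothetical, and the encoding (1 for the observed state, 0 for an excluded state) is an assumption about how such findings are usually represented.

```python
def update(prior, likelihood):
    """Posterior proportional to prior * likelihood, renormalised to sum to 1."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("evidence is incompatible with the prior")
    return {s: v / total for s, v in unnorm.items()}

def propagate(belief, child_cpt):
    """P(child) = sum over parent states of P(parent) * P(child | parent)."""
    out = {}
    for parent_state, p in belief.items():
        for child_state, q in child_cpt[parent_state].items():
            out[child_state] = out.get(child_state, 0.0) + p * q
    return out

prior = {"low": 0.2, "medium": 0.5, "high": 0.3}
child_cpt = {
    "low": {"no": 0.9, "yes": 0.1},
    "medium": {"no": 0.6, "yes": 0.4},
    "high": {"no": 0.3, "yes": 0.7},
}

hard_positive = update(prior, {"low": 0.0, "medium": 0.0, "high": 1.0})  # state observed
hard_negative = update(prior, {"low": 0.0, "medium": 1.0, "high": 1.0})  # state ruled out
soft = update(prior, {"low": 0.1, "medium": 0.5, "high": 1.0})           # likelihoods

for belief in (hard_positive, hard_negative, soft):
    print(propagate(belief, child_cpt))
```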
  • How to Build a Bayesian Network?
    Modeling by Brainstorming
    - Productive exchange between experts that can ease the consensus
    - An Expert System with powerful computational and analytical abilities
    - Modeling of rare or never occurred cases
    Automatic Modeling by Data Mining
    - Probability estimation/updating of a network
    - Structural learning and probability estimation
    - Missing values
    - Filtered/censored states
    - Initial network proposed by experts
    - Discovering of all the direct probabilistic relations
    - Target node characterization - Supervised learning
    - Data clustering
    - Variable clustering
    - Probabilistic Structural Equations
  • PROBABILISTIC STRUCTURAL EQUATIONS*: Perfume Market Analysis
    * See “Probabilistic Structural Equations and Path Analysis - Part I” (http://www.bayesia.com/en/products/bayesialab/resources/tutorials/probabilistic-structural-equations-I.php) for a detailed BayesiaLab tutorial describing the complete workflow to get Probabilistic Structural Equations
  • Perfume Market Analysis: Questionnaire’s characteristics
    To get insight into the market (11 products), 1,300 monadic tests have been carried out (each woman has only evaluated one perfume).
    - 1 target variable, the Purchase Intent: 6 numerical states
    - 27 questions relative to the perfume: 10 numerical levels considered as continuous values and discretized into 5 numerical states (equal distances)
    - 19 questions relative to the woman wearing the perfume: 10 numerical levels considered as continuous values and discretized into 5 numerical states (equal distances)
    - 1 Just About Right (JAR) question for the fragrance Intensity: 5 numerical states
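The equal-distance discretization of the rating questions can be sketched as follows. The slides only say "10 numerical levels", so the 1 to 10 scale is an assumption, as are the function and parameter names.

```python
def equal_width_bin(value, low=1.0, high=10.0, n_bins=5):
    """Return the 0-based index of the equal-width bin that contains value."""
    if not low <= value <= high:
        raise ValueError("value outside the rating scale")
    width = (high - low) / n_bins
    # The top of the scale would fall just past the last bin, so clamp it.
    return min(int((value - low) / width), n_bins - 1)

# Map each of the ten assumed rating levels to one of the five states.
states = [equal_width_bin(level) for level in range(1, 11)]
print(states)
```

With these assumptions every state covers two adjacent rating levels.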
  • Step 1: Unsupervised learning on the Manifest variables only [figure]
  • Analysis of the arcs’ strength: here is the Kullback-Leibler Divergence associated with the arc, and its relative weight in the factorized representation of the Joint Probability distribution
  • Step 2: Variables’ Clustering to find the concepts. Based on those Kullback-Leibler measures, 15 clusters are automatically proposed by BayesiaLab’s variable clustering algorithm
  • Step 2: Variables’ Clustering [figure]
  • Step 3: Multiple Data Clustering
    By using BayesiaLab’s Multiple-Clustering algorithm, we carry out data clustering on the implied subset of variables, for each cluster of variables.
    - Factor 0 is a new random variable summarizing these 5 manifest variables
    - Factor 1 is a new random variable that summarizes these 5 manifest variables
    - Factor 2 is a new random variable that summarizes these 4 manifest variables
    - .....
  • Analysis of the Induced Factors: Factor 0
    Based on the associated variables, we name this Factor “IS SELF-CONFIDENT”. 5 states have been automatically created by BayesiaLab’s Data Clustering algorithm. Here is the Marginal Distribution over those 5 states.
  • Analysis of the Induced Factors: Quality measurement of Factor 0
    The state’s Purity is the mean of its posterior probabilities (given the manifest variables), over all the points that have been associated with that state by the maximum likelihood rule. When the purity is not 100%, the remaining probabilities are used to define the probabilistic neighborhood.
    The 2-dimensional representation of Factor 0: the bubble size is proportional to the prior probability, the darkness of the blue represents the state purity, and the bubble proximity is based on the probabilistic vicinity.
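The Purity definition above translates fairly directly into code. In this sketch each respondent is assigned to the most probable state (one reading of the "maximum likelihood rule"), and the posterior values are made up for illustration.

```python
def state_purity(posteriors):
    """posteriors: one dict per respondent, state -> P(state | manifest values).
    Returns state -> mean posterior over the respondents assigned to that state."""
    sums, counts = {}, {}
    for post in posteriors:
        assigned = max(post, key=post.get)  # most probable state for this respondent
        sums[assigned] = sums.get(assigned, 0.0) + post[assigned]
        counts[assigned] = counts.get(assigned, 0) + 1
    return {state: sums[state] / counts[state] for state in sums}

# Hypothetical posteriors for three respondents over two factor states.
example = [
    {"C1": 0.9, "C2": 0.1},
    {"C1": 0.7, "C2": 0.3},
    {"C1": 0.2, "C2": 0.8},
]
purity = state_purity(example)
print(purity)
```

A purity of 100% would mean that every respondent assigned to the state belongs to it with certainty.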
  • Analysis of the Induced Factors: Quality measurement of Factor 0
    The 5 states of Factor 0 summarize the Joint Probability Distribution over its 5 associated manifest variables. This Joint is a 5-dimensional hypercube, with 5 states per dimension, i.e. 5^5 cells = 3,125 probabilities. This probability density function is based on the database’s log-Likelihood returned by Factor 0’s network.
    The Contingency Table Fit measures the representation quality of the Joint Probability Distribution. 100% corresponds to the perfect representation with the fully connected network (no independence hypothesis), 0% corresponds to the representation with the fully unconnected network (no dependence hypothesis).
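One way to write a measure with these two endpoints is a linear rescaling of the database log-likelihood. The formula below is an assumption consistent with the description on the slide, not a definition quoted from it; the symbols $\ell_B$, $\ell_u$ and $\ell_f$ (log-likelihood of the database under the evaluated network, the fully unconnected network and the fully connected network) are introduced here for illustration.

```latex
\mathrm{CTF}(B) \;=\; 100 \times \frac{\ell_B - \ell_u}{\ell_f - \ell_u},
\qquad
\mathrm{CTF}(B) = 100 \text{ when } \ell_B = \ell_f,
\quad
\mathrm{CTF}(B) = 0 \text{ when } \ell_B = \ell_u .
```

For Factor 0, the fully connected reference would have to represent all $5^5 = 3{,}125$ cells of the joint.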
  • Analysis of the Induced Factors: Quality measurement of Factor 0
    In the specific case of a Factor’s analysis, the dimension represented by that factor is not taken into account in the Joint. The Contingency Table Fit then measures the quality of the Joint’s summary realized by the Factor’s states.
    The slide reports two values, Contingency Table Fit: 78.39% and Contingency Table Fit: 85.04%. The representation of the Joint (defined over the 5 manifest variables) with the 5-state latent variable Factor 0 is more precise than the one obtained with an unsupervised learning representing the direct probabilistic relations between the manifest variables.
  • Analysis of the Induced Factors: Semantic analysis of Factor 0
    The numerical value associated with each state corresponds to the mean value over the manifest variables when this latent state is observed (weighted by the relative significance of the manifest variables wrt that state). These values give a quick insight into the meaning of the state. For example, C3 corresponds to the lowest evaluations, whereas C5 corresponds to the highest ones.
  • Analysis of the Induced Factors: here is a table describing the Multiple Clustering key measures obtained during the data clustering of the 15 manifest variables’ clusters [table shown as figure]
  • Final Step: Unsupervised Learning on Manifest, Latent, and Target variables
    The “Probabilistic Structural Equation” has been obtained under some constraints:
    - no arc from Manifests toward Factors
    - no direct relation between Manifests
    - no direct relation between the Target and Manifests
  • Path Analysis: Focussing on Factor variables only
    The Path can be highlighted just by hiding the Manifest variables. As we can see, the Purchase Intent is only directly connected to one Latent variable, the “ADEQUACY”.
  • Path Analysis: Focussing on Factor variables only
    Factors’ Hierarchization by using the Standardized Total Effects (STE). Graphical representation of each Factor’s influence on the Purchase Intent.
  • Path Analysis: Focussing on Factor variables only
    Our Quadrant Analysis gives a concise view of the Factors’ hierarchy wrt the Purchase Intent. The Y-axis is based on the Standardized Total Effect (STE), whereas the X-axis corresponds to the Factors’ mean value. [Figure: reference lines labeled “Mean of the Mean Values” and “Mean of the STEs”]
  • Driver Analysis: Focussing on Manifest variables only
    The Bayesian network representing the Probabilistic Structural Equation (PSE) has been learnt by using the Perfume Total Market (11 products). It is:
    - useful for understanding the Total Market
    - inappropriate for finding the levers that can be used to improve a given product
    To be able to analyze the products’ drivers, we define the Product variable as a BayesiaLab Breakout variable:
    - the PSE’s structure remains the same for all the products
    - the PSE’s parameters (conditional probability tables) are estimated, for each perfume, on its corresponding subset of lines
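The Breakout idea (one shared structure, one set of tables per product) can be sketched for a single parent-child arc. This is only a counting-based illustration with made-up rows and column names; it does not reproduce how BayesiaLab estimates its tables.

```python
from collections import Counter, defaultdict

def estimate_cpts_by_product(rows, parent, child):
    """Estimate P(child | parent) separately on each product's subset of rows.
    Returns product -> parent state -> {child state: probability}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        counts[row["product"]][row[parent]][row[child]] += 1
    cpts = {}
    for product, by_parent in counts.items():
        cpts[product] = {
            p_state: {c: n / sum(c_counts.values()) for c, n in c_counts.items()}
            for p_state, c_counts in by_parent.items()
        }
    return cpts

# Hypothetical survey lines: the same arc (flowery -> intent) for two products.
rows = [
    {"product": 10, "flowery": "high", "intent": "yes"},
    {"product": 10, "flowery": "high", "intent": "no"},
    {"product": 10, "flowery": "low", "intent": "no"},
    {"product": 5, "flowery": "high", "intent": "yes"},
]
tables = estimate_cpts_by_product(rows, parent="flowery", child="intent")
print(tables[10]["high"], tables[5]["high"])
```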
  • Driver Analysis: Focussing on Manifest variables only
    Only a subset of Manifest variables can be used as Drivers. The PSE below masks the non-actionable variables. [figure]
  • Driver Analysis for Product 10
    Due to non-linearity, the Standardized Total Effect (STE) does not reflect the importance of Intensity. This graph highlights the non-linear influence of Intensity (JAR variable) on Purchase Intent. [figure]
  • Driver Analysis for Product 10
    Note that STE is only proposed in BayesiaLab for some analysis tools. This is not a measure used for learning Bayesian networks (BN). As the states are discrete, the learning algorithms are not sensitive to linearity.
    The analysis below ranks the Drivers wrt the Mutual Information criterion. As we can see, Intensity is now in the 4th position. [figure]
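The contrast between a linear effect measure and Mutual Information can be reproduced on a toy Just About Right variable. The joint distribution below is invented so that the relation is an inverted U (the middle state is best), and covariance stands in for a linear effect; it is not BayesiaLab's STE computation.

```python
import math

def mutual_information(joint):
    """joint: dict (x, y) -> P(x, y). Returns I(X;Y) in bits."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def covariance(joint):
    """Covariance of X and Y when both carry numerical values."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

# Hypothetical JAR variable: 5 equally likely states, purchase most likely in the middle.
p_purchase = {1: 0.2, 2: 0.5, 3: 0.8, 4: 0.5, 5: 0.2}
joint = {}
for intensity, p_yes in p_purchase.items():
    joint[(intensity, 1)] = 0.2 * p_yes
    joint[(intensity, 0)] = 0.2 * (1.0 - p_yes)

print(covariance(joint), mutual_information(joint))
```

The covariance is essentially zero because the increasing and decreasing halves cancel out, while the Mutual Information stays clearly positive.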
  • Driver Analysis for Product 10
    To be able to use STE properly, we can use BayesiaLab to linearize Intensity. It will then associate numerical values with the states in order to get a positive linear relation (sorting of the states wrt their relation to Purchase Intent).
    Intensity is now in the 4th position with STE and with the Slopes in the Graphical representation. [figure]
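One simple way to obtain such a positive linear relation is to replace each state's value by its conditional mean of the target, rescaled to the original range. The sketch below does exactly that; it is an assumed procedure for illustration, since the slides do not spell out how BayesiaLab chooses the new values, and the conditional means are invented.

```python
def linearize(state_values, cond_mean_target):
    """Give each state a new value proportional to E[target | state],
    rescaled to the range of the original state values."""
    lo, hi = min(state_values.values()), max(state_values.values())
    t_lo, t_hi = min(cond_mean_target.values()), max(cond_mean_target.values())
    if t_hi == t_lo:
        raise ValueError("the target mean does not vary across states")
    scale = (hi - lo) / (t_hi - t_lo)
    return {s: lo + (m - t_lo) * scale for s, m in cond_mean_target.items()}

# Hypothetical JAR Intensity: states 1..5, Purchase Intent peaks at the middle state.
original = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
intent_mean = {1: 3.0, 2: 3.6, 3: 4.2, 4: 3.6, 5: 3.0}
new_values = linearize(original, intent_mean)
print(new_values)
```

After this recoding the "just right" state carries the highest value and both extremes the lowest, so a linear measure can pick up the effect.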
  • Driver Analysis for Product 10
    Quadrant based on the potential Drivers [figure: quadrants numbered 1 and 2 on the top row, 4 and 3 on the bottom row].
    Usually this kind of quadrant can be used to quickly see which Drivers to prioritize:
    - 1: Concentrate here
    - 2: Keep up the good work
    - 3: Possible overkill
    - 4: Low priority
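The classic reading of the quadrant can be expressed as a small classifier. It assumes the layout used earlier in the deck (STE on the Y-axis, mean value on the X-axis, boundaries at the two means) and that quadrant 1 is the top-left cell; the driver figures in the example are made up.

```python
LABELS = {
    1: "Concentrate here",
    2: "Keep up the good work",
    3: "Possible overkill",
    4: "Low priority",
}

def quadrant(mean_value, ste, mean_of_means, mean_of_stes):
    """Return the quadrant number (1 top-left, 2 top-right, 3 bottom-right, 4 bottom-left)."""
    important = ste >= mean_of_stes            # upper half of the plot
    well_rated = mean_value >= mean_of_means   # right half of the plot
    if important and not well_rated:
        return 1
    if important and well_rated:
        return 2
    if well_rated:
        return 3
    return 4

# Hypothetical drivers: name -> (mean value, STE).
drivers = {"A": (6.0, 0.40), "B": (8.0, 0.35), "C": (8.5, 0.05), "D": (5.5, 0.10)}
mean_of_means = sum(v for v, _ in drivers.values()) / len(drivers)
mean_of_stes = sum(s for _, s in drivers.values()) / len(drivers)
for name, (value, ste) in drivers.items():
    q = quadrant(value, ste, mean_of_means, mean_of_stes)
    print(name, q, LABELS[q])
```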
  • Driver Analysis for Product 10
    However, this kind of interpretation is not appropriate here. Indeed, quadrants are defined with the means (STEs and Mean Values) of the studied product. Even if a variable is located in Quadrants 1 or 4, its value can be the highest of the Total Market. Conversely, variables belonging to Quadrants 2 and 3 can also have low values compared with the other products.
    Thanks to the scales associated with each variable, this new BayesiaLab Quadrant quickly gives an insight into how the variables are ranked wrt the other products. Product 10 has the best Intensity value, but a poor Flowery value (lower than the mean value over the products).
  • Driver Analysis for Product 10
    By hovering over the point, it is possible to have a specific view of the variable values for all the products. The best ranked product on Flowery is then Product 11, the worst one being Product 1.
    This Multiple-Quadrant tool can export the variation percentage needed to reach the best market value, for each product and each variable. For Product 10, we need to apply a 10.02% increase on the Flowery mean to reach Product 11’s level.
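The exported variation can be read as the relative gap between a product's mean and the best mean on the market. The sketch below computes it for one variable; the product names and means are hypothetical and are not the survey's values.

```python
def variation_to_best(means_by_product):
    """Percentage increase each product needs to reach the best mean on the market."""
    best = max(means_by_product.values())
    return {product: 100.0 * (best - mean) / mean
            for product, mean in means_by_product.items()}

# Hypothetical means of one driver for three products.
flowery_means = {"Product A": 5.0, "Product B": 5.5, "Product C": 4.4}
gaps = variation_to_best(flowery_means)
print(gaps)
```

The best product gets 0%, and every other product gets the increase that would bring it level with the best one.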
  • Driver Analysis for Product 10
    We use our Target Dynamic Profile tool to estimate the most realistic action policy. Here are the optimization parameters:
    - maximize the Purchase Intent Mean value
    - take into account the Joint Probability of the actions
    - take the costs into account (1 per action consisting in reaching the max authorized value)
    - “Soft Increase” of the drivers’ mean by taking into account the exported variation values
    The induced policy is then to work on Flowery, then Feminine, ...., and Fruity, to increase the Purchase Intent Value from 3.65 to 3.92. The Joint is 50.35%, which means that half of those product evaluations correspond to this setting. The column “Value/Mean at T” indicates the impact of each action on the other drivers. As we see, those impacts reduce the cost of the actions. [figure]
  • Driver Analysis for Product 10
    Here is the complete policy over all the drivers. BayesiaLab’s Soft Increase reaches a targeted mean value by using the closest probability distribution to the initial one. It then means that the corresponding action should be the easiest one, as it is close to the current state. [figure]
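A possible reading of "the closest probability distribution to the initial one" is the distribution that reaches the target mean while minimizing the Kullback-Leibler divergence from the current one. That choice of distance is an assumption, not something the slides state. Under it, the solution is an exponential tilting of the current distribution, and the tilt can be found by bisection, as in this sketch with invented numbers.

```python
import math

def soft_increase(values, probs, target_mean, tol=1e-10):
    """Return the minimum-KL distribution over `values` whose mean is target_mean.
    Assumes every probability is positive and the target lies strictly inside the range."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie strictly inside the value range")

    def tilted(lam):
        exps = [lam * v for v in values]
        shift = max(exps)  # subtract the max exponent to avoid overflow
        weights = [p * math.exp(e - shift) for p, e in zip(probs, exps)]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(dist):
        return sum(v * q for v, q in zip(values, dist))

    lo, hi = -50.0, 50.0  # search interval for the tilt; the mean grows with it
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean(tilted(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted((lo + hi) / 2.0)

# Hypothetical driver with 5 states and a current mean of 3.0, pushed to 3.3.
values = [1, 2, 3, 4, 5]
current = [0.1, 0.2, 0.4, 0.2, 0.1]
shifted = soft_increase(values, current, target_mean=3.3)
print([round(q, 3) for q in shifted])
```

Probability mass moves gradually toward the higher states instead of being forced onto a single one, which matches the idea of an action that stays close to the current situation.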
  • Driver Analysis for Product 10 [figure]
  • Driver Analysis for Product 5
    Let’s compute the same Driver Analysis for Product 5. [figure]
  • Driver Analysis for Product 5 [figure]
  • Contact
    Address: BAYESIA SA, 6 rue Léonard de Vinci, BP0119, 53001 LAVAL Cedex, France
    Contact: Dr. Lionel JOUFFE, Managing Director / Cofounder. Tel.: +33(0)243 49 75 58, Mobile: +33(0)607 25 70 05, Fax: +33(0)243 49 75 83