Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

DIY Driver Analysis Webinar slides

2,782 views

Published on

Slides from Driver Analysis Webinar presented by Tim Bock on 29 March, 2017.

Published in: Data & Analytics
  • 7 Sacred "Sign Posts" From The Universe Revealed. Discover the "secret language" the Universes uses to send us guided messages and watch as your greatest desires manifest before your eyes. Claim your free report. ♥♥♥ https://tinyurl.com/y6pnne55
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • 8 Ways to Communicate with Your Guardian Angels... ▲▲▲ http://ishbv.com/manifmagic/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Download The Complete Lean Belly Breakthrough Program with Special Discount. ♥♥♥ http://scamcb.com/bkfitness3/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Doctor's 2-Minute Ritual For Shocking Daily Belly Fat Loss! Watch This Video ♣♣♣ https://tinyurl.com/bkfitness4u
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • This was an excellent presentation
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

DIY Driver Analysis Webinar slides

  1. 1. TIM BOCK PRESENTS Examples in Q5 (beta) If you have any questions, enter them into the Questions field. Questions will be answered at the end. If we do not have time to get to your question, we will email you. We will email you a link to the slides and data. Get a free one-month trial of Q from www.q-researchsoftware.com. DIY Driver Analysis
  2. 2. Overview • Objectives of (key) driver analysis • Overview of techniques • 13 assumptions that need to be checked when doing QA for driver analysis (there are workarounds for most of these) 1: There are 15 or fewer predictors (if using Shapley) 2: The outcome variable is monotonically increasing 3: The outcome variable is numeric (if using Shapley) 4: The predictor variables are numeric or binary 5: People do not differ in their needs/wants (segmentation) 6: The causal model is plausible 7: There is no multicollinearity/correlations between predictors (if using GLMs) 8: There are no unexpected correlations between the predictors and the outcome variable 9: The signs of the importance scores are correct 10: The predictor variables have no missing values 11: There are no outliers/influential data points 12: There is no serial correlation (aka autocorrelation) 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 2
  3. 3. The basic objective of (key) driver analysis The basic objective: work out the relative importance of a series of predictor variables in predicting an outcome variable. For example: • NPS: comfort vs customer service vs price. • Customer satisfaction: wait time vs staff friendliness vs comfort. • Brand preference: modernity vs friendliness vs youthfulness. What driver analysis is not: predictive analysis (e.g., predicting sales, customer churn). Although, you can use driver analysis to make strategic predictions (e.g., if I improve, say, fun, then preference will increase.) 3
  4. 4. Likelihood to recommend This brand is fun This brand is exciting This brand is youthful 6 1 1 1 9 0 1 0 7 0 0 0 6 1 1 1 9 0 1 0 7 0 0 1 7 0 0 0 What the data looks like This data shows 7 observations 1 outcome variable 4 Predictor variables (Typically there will be more than 3.)
  5. 5. Case study 1: Technology Outcome variable(s) Predictor variable(s) Likelihood to recommend: • Apple • Microsoft • IBM • Google • Intel • Hewlett-Packard • Sony • Dell • Yahoo • Nokia • Samsung • LG • Panasonic Brand associations: • Fun • Worth what you pay for • Innovative • Good customer service • Stylish • Easy-to-use • High quality • High performance • Low prices 5
  6. 6. ID Apple Microsoft IBM Apple Microsoft IBM Apple Microsoft IBM 1 6 9 7 1 0 0 1 1 0 2 8 7 7 1 0 0 1 0 0 3 0 9 8 0 1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 This brand is fun This brand is exciting Likelihood to recommend ID Brand Likelihood to recommend This brand is fun This brand is exciting 1 Apple 6 1 1 1 Microsoft 9 0 1 1 IBM 7 0 0 2 Apple 6 1 1 2 Microsoft 9 0 1 2 IBM 7 0 0 3 Apple 6 1 1 3 Microsoft 9 0 1 3 IBM 7 0 0 4 Apple 6 1 1 4 Microsoft 9 0 1 4 IBM 7 0 0 The data (stacked) From: one row per respondent To: one row per brand per respondent 6
  7. 7. Tips for stacking • Get an SPSS .SAV data file. If you do not have an SPSS file: • Import your data the usual way • Tools > Save Data as SPSS/CSV and Save as type: SPSS • Re-import • Tools > Stack SPSS .sav Data File • Set the labels for the stacking variable (in Q: observation) in Value Attributes • Delete any None of these data (e.g., brand associations where respondents were able to select None of these 7
  8. 8. Case study 2: Cola brand attitude Outcome variable(s) 34 Predictor variable(s) If the brand was a person, what would its personality be? Hate/Dislike/Neither/ Like/Love/Don’t know: • Coke Zero • Coke • Diet Coke • Diet Pepsi • Pepsi Max • Pepsi Brand associations: • Beautiful • Carefree • Charming • Confident • Down-to-earth • Feminine • Fun • Health-conscious • Hip • Honest • Humorous • Imaginative • Individualistic • Innocent • Intelligent • Masculine • Older • Open to new experiences • Outdoorsy • Rebellious • Reckless • Reliable • Sexy • Sleepy • Tough • Traditional • Trying to be cool • Unconventional • Up-to-date • Upper-class • Urban • Weight-conscious • Wholesome • Youthful 8
  9. 9. 9 Example output: Importance scores Key drivers of cola preference
  10. 10. 10 Example output: Performance- Importance Chart (aka Quad Chart)
  11. 11. Example output: Correspondence Analysis with Importance 11
  12. 12. Overview • Objectives of (key) driver analysis • Overview of techniques • 13 assumptions that need to be checked when doing QA for driver analysis (there are workarounds for most of these) 1: There are 15 or fewer predictors (if using Shapley) 2: The outcome variable is monotonically increasing 3: The outcome variable is numeric (if using Shapley) 4: The predictor variables are numeric or binary 5: People do not differ in their needs/wants (segmentation) 6: The causal model is plausible 7: There is no multicollinearity/correlations between predictors (if using GLMs) 8: There are no unexpected correlations between the predictors and the outcome variable 9: The signs of the importance scores are correct 10: The predictor variables have no missing values 11: There are no outliers/influential data points 12: There is no serial correlation (aka autocorrelation) 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 12
  13. 13. Standard “best practice” recommendation for driver analysis: LMG Lindeman, Merenda, Gold (1980) = Kruskal Kruskal (1987) = Dominance Analysis Budescu (1993) = Shapley / Shapley Value Lipovetsky and Conklin(2001) The average improvement in R² that a predictor makes across all possible models (aka “Shapley”)
  14. 14. 14 Best practice: Bespoke models (e.g., Bayesian multilevel model) Bivariate metrics E.g., Correlations, Jaccard Coefficients Shapley, Relative Importance Analysis Much too hard Too hard Too Soft Just Right GLMs (e.g., linear regression)
  15. 15. What makes bespoke models and GLMs too hard? To estimate an OK bespoke model, you need to have a few week, and know lots of things, including: • Joint interpretation of parameter estimates, the predictor covariance matrix, and the parameter covariance matrix • Conditional effects • Multicollinearity • Confounding (e.g., suppressor effects) • Estimation (ML, Bayesian) • Specification of informative priors • Specification of random effects 15 To understand importance in a GLM (e.g., linear regression), you need to know quite a lot about: • Joint interpretation of parameter estimates, the predictor covariance matrix, and the parameter covariance matrix • Conditional effects • Multicollinearity • Confounding (e.g., suppressor effects) Shapley and similar methods allow us to be less careful when interpreting results
  16. 16. 16 Bespoke models & GLMs Proportional Marginal Variance Decomposition Shapley With coefficient adjustment Lipovetsky and Conklin(2001) Random Forest (for importance analysis) Kruskal’s Squared partial correlation Called Kruskal in Q Relative Importance Analysis AKA Relative Weight: Johnson (2000) Shapley
  17. 17. Shapley case study • Open Initial.Q. This already contains the cola data. • File > Data Sets > Add to Project > From File > Stacked Technology • Create > Regression > Driver (Importance) Analysis > Shapley • Dependent variable: Q3. Likelihood to recommend [Stacked Technology] • Dependent variable: Q4 variables from Stacked Technology • No when asked about confidence intervals (clicking Yes is OK as well) • Note that High Quality is the most important, with a score of 18.2 • Right-click: Reference name: shapley Everything I demonstrate in this webinar is described on a slide like this. The rest of them are hidden in this deck, but you can get them if you download the slides. So, there is no need to take detailed notes. 17 Instructions for the case studies
  18. 18. Overview • Objectives of (key) driver analysis • Overview of techniques • 13 assumptions that need to be checked when doing QA for driver analysis (there are workarounds for most of these) 1: There are 15 or fewer predictors (if using Shapley) 2: The outcome variable is monotonically increasing 3: The outcome variable is numeric (if using Shapley) 4: The predictor variables are numeric or binary 5: People do not differ in their needs/wants (segmentation) 6: The causal model is plausible 7: There is no multicollinearity/correlations between predictors (if using GLMs) 8: There are no unexpected correlations between the predictors and the outcome variable 9: The signs of the importance scores are correct 10: The predictor variables have no missing values 11: There are no outliers/influential data points 12: There is no serial correlation (aka autocorrelation) 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable)18
  19. 19. 1: There are 15 or fewer predictors (if using Shapley) 19 Options (ranked from best to worst) Comments Relative Importance Analysis Tends to give almost identical results to Shapley regression, and much, much, faster to compute. Shapley using only a subset of models (e.g., all predictors, leave 1 out predictor, 2 predictors, 1 predictor) There is no evidence that this method produces accurate estimates of Shapley importance. This is likely to be less similar to a proper Shapley than using Relative Importance Analysis. Dimension reduction (e.g., PCA) Hard to interpret. Issue Shapley Regression cannot be computed with more 15 predictors in Q (it reverts to Relative Importance Analysis) If you run Shapley Regression in the R package relimpo with this many variables you get a crash.
  20. 20. 1: There are 15 or fewer predictors (if using Shapley) • With the cola study, we have 34 variables, and that will take an infinite amount of time to compute, so using Shapley is not an option and we have to use Relative Importance Analysis. • We can use the technology data set, which only has 9 predictors, to explore how similar the techniques are. • Create > Regression > Linear Regression • Reference name: relative.importance • Select variables • Output: Relative importance analysis • Check Automatic Note that High Quality is again most important • Right-click: Add R Output: comparison = cbind(shapley = shapley[-10], "Relative Importance" = relative.importance$relative.importance$importance) • Calculate • Change shapley to shapley[-10] • Calculate • Right-click: Add R Output: correlation = cor(comparison) • Increase number of decimal places. Note the correlation is 0.999 • Rename output: Correlation • Insert > Charts > Visualization > Labeled Scatterplot, • Table: comparison • Automatic 20 Instructions for the case studies
  21. 21. 2: The outcome variable is monotonically increasing 21 Options (not mutually exclusive) Comments Set Don’t Knows to missing Merge categories • Do this when there are categories that have ambiguous orderings (e.g., OK and Good). • The more categories you merge, the less significant the results will be. Recode the data in some meaningful way (e.g., reverse the scale, Likelihood to recommend, recoded as NPS) The specific values tend to make little difference, so using a recoding that is easy to explain to stakeholders, such as NPS, is often desirable. Issue All the standard driver analysis algorithms assume that the outcome variable contains categories ordered from lowest to highest, and which are believed to be associated with greater levels of preference. Test This is usually best checked by creating a summary table.
  22. 22. 2: The outcome variable is monotonically increasing • Right-click: Add Table • Blue drop-down: Likelihood to recommend [Stacked Technology.sav] Other than the NET, which is ignored by analyses, the order is correct. • Click on table of outcome variables in Technology. Note that there is no problem. • Click on Brand preference Note that here we have a Don’t Know. • Right-click on Don’t Know and press Remove. This sets it as missing values. • Click on two grey outputs (just setting up a later analysis) 22 Instructions for the case studies
  23. 23. 3: The outcome variable is numeric (if using Shapley) 23 Options (ranked from best to worst) Comments Use limited dependent variable versions of Relative Importance Analysis (e.g., Ordered Logit) • The less numeric the variable, the better this option is. • This approach is also preferable because it can take non-linear relationships into account automatically. Ignore the problem and use Shapley. Where the variable is close to being numeric, there is probably little lost by this approach. Issue Shapley assumes that the outcome variable is numeric (theoretically, it can deal with non-numeric outcome variables, but for more than about 10 or so variables, it is impractical).
  24. 24. 3: Numeric outcome • Select relative.importance • Type: Ordered logit • Select chart We can see that the correlation between the models is smaller than before. This is because correctly modeling the data type has a bigger impact on our results than the difference between Shapley and Relative importance analysis. • Note that the changes are pretty trivial. 24 Instructions for the case studies
  25. 25. 4: The predictor variables are numeric or binary 25 Options (not mutually exclusive) Comments Set Don’t Knows to missing This can be problematic as the variables as the missing values may not be missing at random. This is discussed later. Merge categories • Do this when there are categories that have ambiguous orderings (e.g., OK and Good). • The more categories you merge, the less significant the results will be. Recode the data in some meaningful way (midpoint recoding) Use a bespoke or Generalized Linear Model (GLM), with dummy variables and/or splines, computing importance as the difference between the lowest and largest effect sizes for each variable. In theory this is the best approach to dealing with non-numeric data, but it requires quite a lot to get right and, when interpreting the data, the sampling error of the categorical and spline effects will make them hard to compare. Issue Both Shapley and Relative Importance Analysis assume that the predictor variables are numeric or binary.
  26. 26. 5: People do not differ in their needs/wants (segmentation) 26 Options (not mutually exclusive) Comments Estimate an appropriate bespoke model (e.g., latent class analysis) and then estimate the driver analysis models within each segment In Q: In a non-stacked data file, set up the data as an Experiment, and use Create > Segment > Latent Class Analysis Form segments by judgment, and estimate separate relative importance analyses for each segment. Ignore the problem, interpreting results as “average” effects Rightly-or-wrongly, this is how 99.9%* of all modelling is done. * Made-up number Issue Traditional driver analysis techniques assume that people have the same needs/wants, and apply these consistently from situation to situation. How to test • Compare by brand • Compare by other data • Latent class analysis
  27. 27. 5: People do not differ in their needs/wants (segmentation) • Select the Shapley Importance for Q3… table and press Duplicate • Brown drop-down: Brand [Stacked Technology.sav] • Note that this table is showing difference in important scores by segment. It is not showing difference in perceptions. • The colors and arrows are telling us that there are differences by brand. Look at Fun. This is significantly lower for Intel, HP, and Dell. But, really important in the evaluation of Google. • What do we do now? Let’s return to the previous slide and look at the options. 27 Instructions for the case studies
  28. 28. 6: The causal model is plausible 28 Options (not mutually exclusive) Comments Build a bespoke model This is usually too hard Include all the relevant (non-outcome) variables and cross your fingers (if you have not collected the data, you cannot magic it into existence) Rightly-or-wrongly, this is how 99.9%* of all modelling is done * Made-up number Issue All driver analysis techniques assume that the analysis is a plausible explanation of the causal relationship between the predictor variables and the outcome variable. This assumption is never true. How to test Common sense. Four common examples are shown on the next slides.
  29. 29. Example causality problem: Omitted variable bias If we fail to include a relevant predictor variable, and that variable is correlated with the predictor variables that we do include, the estimates of importance will be wrong. If your R-square is less than 0.9, you may have this problem (a typical R-square is closer to 0.2 than 0.9). Predictor 1 Predictor 2 Predictor 3 Predictor 4 E.g., price Outcome 1 Assumed predictor variables Arrows denote the true causal relationship 29
  30. 30. Example causality problem: Outcome variable included as a predictor 30 Predictor 1 Predictor 2 Predictor 3 Outcome 2 E.g., Satisfaction Outcome 1 E.g., NPS Assumed predictor variables If we include a predictor variable that is really an outcome variable, the estimates of importance will be wrong. Arrows denote the true causal relationship
  31. 31. 6: Outcome variable a predictor • Click on relative.importance • Worth what paid for is clearly an example of this problem, so should be deleted. • Remove Worth what paid for from the model • Click on Shapley Importance for Q3. Likelihood to recommend • Click on Comparisons. Note • The warning • The R-square is back in the model. • Change in the R code -10 to -9 31 Instructions for the case studies
  32. 32. Example causality problem: Backdoor path 32 Predictor 1 E.g., price perception Predictor 2 E.g., quality Predictor 3 E,g., packaging Variable E.g., Attitude Outcome 1 E.g., NPS Assumed predictor variables If backdoor path exists from the predictors to the outcome variable, the estimates of importance will be wrong (spurious). Arrows denote the true causal relationship
  33. 33. Example causality problem: Functional form 33 Assumed functional form If we have the wrong functional form (i.e., assumed equation), the estimates of importance will be wrong. Arrows denote the true causal relationship Outcome = Predictor 1 + Predictor 2 + Predictor 3 Outcome = Predictor 1 × Predictor 2 + Predictor 3 True functional form
  34. 34. 7: There is no multicollinearity/correlations between predictors (if using GLMs, e.g., linear regression) 34 Options (ranked from best to worst) Comments Take all the relevant theory into account when interpreting the results. This requires a strong technical and intuitive understanding of the underlying maths. Even if you possess that understanding, it is really difficult to explain to clients (particularly if it is a tracking study and they are seeing results fluctuate from period-to-period) Use Shapley or Relative Importance Analysis. These techniques are designed to address this problem. They are not perfect, but they are easier to interpret than linear regression and other GLMs when predictor variables are correlated. Issue The bigger the correlations between predictors, the more difficult it is to accurately interpret estimates from traditional GLMs (e.g., linear regression) Test 1. Inspect the Variance Inflation Factors (VIF) or Generalized Variance Inflation Factors (GVIF). Q automatically computes these and warns you if they are high. 2. Inspect the coefficients. Do they make sense? 3. Look at the correlations.
  35. 35. 7: There is no multicollinearity • Click on relative.importance • Uncheck Automatic • Change it back to Summary. Now it is showing an order logit regression. • Type to Linear. • Note that we have a negative effect for Stylish. That is, the model says that if we hold everything else constant, an improvement in Stylish will make a reduction in likelihood to recommend. This doesn’t seem to make sense. • Create > Correlation > Correlation Matrix. Select • Q3. Likelihood to recommend • Q4. • Note that there is a moderate correlation between Stylish and Likelihood to recommend. And, Stylish is correlated with everything out. • My intuition is that this is all some weird and uninteresting quirk in the data. But, in truth I don’t know. • Conclusion: we should be using Shapley or Relative Importance Analysis. It is designed to remove such headaches. • Output: Relative Importance Analysis 35 Instructions for the case studies
  36. 36. 8: There are no unexpected correlations between the predictors and the outcome variable 36 Options (ranked from best to worst) Investigate the data to make sense of the unexpected relationships. Remove problematic variables from the analysis. Issue When people interpret importance scores, they assume that higher means better. This is assumption is not always right. Test Correlate each predictor variable with the outcome variable
  37. 37. 8: There are no unexpected correlations • Click back on the correlation matrix. As we discussed earlier, these all look fine. • Now, let’s look at the cola correlations. I computed them before, but they will need to re-run because we removed the Don’t Know category from the Outcome Variable. • Click on cola.correlations • You can see we have some negative correlations in the first column, which shows the correlations with the outcome variable. 37 Instructions for the case studies
  38. 38. 9: The signs of the importance scores are correct 38 Recommendation If all the effects should be positive, select the Absolute importance scores option. Otherwise, manually change the results when reporting. Issue The underlying Shapley and Relative Importance Analysis algorithms always compute a positive importance scores. However, the true effect of a predictor can be negative, resulting in people misinterpreting the results. Test Compute a GLM (e.g., linear regression). Any negative coefficients warrant investigation. For this reason, Q automatically does this and puts the signs of the multiple regression coefficients onto the driver analysis outputs (both Shapley and Relative Importance Analysis). If the correlation is also negative, it means that the effect is negative. If positive, it suggests that the multiple regression is picking up a non-interesting artefact.
  39. 39. 9: The signs of the importance scores are correct • Click on relative.importance • Stylish is negative. As discussed, it is because of the multiple regression. This is really just a reminder that we need to be careful. We have already looked at this, and we know that the correlation is positive, so we can ignore this. • Check Absolute importance scores • Go to comparison, and replace shapley[-9] with abs(shapley[-9]) 39 Instructions for the case studies
  40. 40. 10: The predictor variables have no missing values 40 Options (ranked from best to worst) Comments Create a bespoke model that appropriately models the process(es) that cause the values to be missing. This is really hard! Multiple imputation of missing values If using Relative Importance Analysis, set Missing Data to Multiple Imputation Leave out observations with missing values from the analysis (i.e., complete case analysis) This implicitly assumes that the data is Missing Completely At Random (MCAR; i.e., other than that some variables have more missing values than others, there is no pattern of any kind in the missing data). Test this assumption using Automate > Browse Online Library > Missing Data > Little’s MCAR Test Issue There are missing values of predictor variables (e.g., some attributes were not collected for some respondents, or there were “don’t know” response)
  41. 41. 11: There are no outliers/unusual data points 41 Options (ranked from best to worst) Comments Inspect each unusual observation, and understand if it is an error or not Difficult/time consuming Filter out all the unusual observations, and check to see if the model has changed. If it has changed, and the number of unusual observations is small, use the new model. Ignore the problem This is, by far, the most common approach. Issue A few outliers/unusual observations can skew the results of importance analysis. Test • Hat/influence scores • Standardized residuals • Cook’s distance
  42. 42. 11: There are no outliers/unusual data points • Click on 3 more (warnings) • Note that we have a warning about Unusual Observations. Let’s explore this further. • Create > Regression > Diagnostic > Plot > Influence Index • Uncheck Automatic (so that it does not update when we later change the model) • You can see the various unusual observations have been marked: 379, 382, 295, and so on • Go to Notes tab and copy code • Variables and Questions – Stacked Technology.sav • Right-click on a row: Insert > Variable(s) > R Variable: • !(1:4056 %in% c(295, 1744, 1749, 2764, 3420, 246, 2364, 2829, 2830, 4029, 295, 2380, 2764, 3049, 3050) • Name: Unusual observations removed • Check the F in the Tags column. Make the variable available as a filter. • Go back to Output: relative.importance • Apply the filter to the table 42 Instructions for the case studies
  43. 43. 12: There is no serial correlation (aka autocorrelation) 43 Options (ranked from best to worst) Comments Create a bespoke model that addresses the serial correlation (e.g., a random effects model if the serial correlation is due to repeated measures, or a time series model if it is measures over time) This is a lot of work. Don’t report statistical test results (i.e., p- values). The importance scores will be OK. The significance tests will be misleading to an unknown extent. Issue The standard tests for the significance of a predictor assume that there is no serial correlation/autocorrelation (a particular type of pattern in the residuals). Whenever you stack data you are highly likely to have this problem. Test Regression > Diagnostic > Serial Correlation (Durbin-Watson)
  44. 44. 12: There is no serial correlation (aka auto…) • Regression > Diagnostic > Serial Correlation (Durbin-Watson) • These are significant. As mentioned, this is always the case. 44 Instructions for the case studies
  45. 45. 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 45 Options (ranked from best to worst) Comments Use a more appropriate model (e.g., ordered logit) This is not possible with Shapley. This models make other, hopefully less problematic, assumptions (beyond the scope of this webinar) Use robust standard errors This is not possible with Shapley. In Q: check Robust standard error Issue The standard tests for the significance of a predictor in a linear model assume that the variance of the residuals is constant. This is rarely the case in driver analysis, as usually the data is from a bounded scale (e.g., if it is a rating out of 10, it is impossible for a value to be observed that is greater than 10). Test Displayr automatically performs the Breusch-Pagen Test Type = Linear
  46. 46. 13: The residuals have constant variance • Click on relative.importance. Note that there is a warning. • Change the Type to Ordered Logit 46 Instructions for the case studies
  47. 47. Creating the donut chart (earlier in the presentation) • The first step is to create a new table containing the cola importance scores. Add a new R Output, with code: ColaImportance = cola.importance$relative.importance$importance # Removing "Performance: " from the labels names(ColaImportance) = flipFormat::TidyLabels(names(ColaImportance)) ColaImportance • Then, sort the cola importance scores, in a descending order: • Add a new R Output, with code: SortedColaImportance = sort(ColaImportance, decreasing = TRUE) • Create > Charts > Visualization > Donut Chart • Select SortedColaImportance as the Table • Data value suffix: % • Data label decimal places: 1 • Change the name in the report tree to Donut • Check Automatic (at the top) 47 Instructions for the case studies
  48. 48. Creating the performance-importance chart (earlier in the presentation) • Right-click on donut in the Report tree and select Add Table • In the blue drop-down menu, select Performance • Click on Brand preference, at the top of the Report tree • Add another table and select Brand [Stacked Cola Brand Associations.sav] in the blue drop-down menu • Right-click on the 17% next to Diet Coke and select Create Filter • Make sure that Apply filter to the current table is not selected, and press OK • Click the Performance table, and apply the Diet Coke filter (using the Filter drop-down, at the bottom-left of the screen) • Right-click on Performance and select Reference Name and change this to performance • Right-click on the Performance table in the Report tree and select Add R Output, entering the code PerformanceImportance = cbind("Importance (%)" = ColaImportance, "Diet Coke Brand Associations (%)" = table.Performance[- nrow(table.Performance)]) • Create > Charts > Visualization > Labeled Scatterplot • Set Table to PerformanceImportance • Rename to Perfomance Importance Chart • Press Calculate 48 Instructions for the case studies
  49. 49. Creating the correspondence analysis chart (earlier in the presentation) • Right-click on Performance Importance Chartin the Report tree and select Add Table • In the blue drop-down menu, select Performance • In the brown drop-down select Brand [Stacked Cola Brand Associations.sav] • Right-click on None of these and select Remove • Create > Dimension Reduction > Correspondence Analysis of a Table • Table: Performance by Brand • Output: Bubble Chart • Bubble sizes: ColaImportance • Legend title: Importance 49 Instructions for the case studies
  50. 50. TIM BOCK PRESENTS Q&A Session Type questions into the Questions fields in GoToWebinar. If we do not get to your question during the webinar, we will write back via email. We will email you a link to the slides and data. Get a free one-month trial of Q from www.q-researchsoftware.com.

×