Abbott - A More Transparent Interpretation of Health Club Surveys
Salford Analytics and Data Mining Conference
                     2012
                 San Diego, CA
                 May 24, 2012
                        Dean Abbott
                    Abbott Analytics, Inc.
          URL: http://www.abbottanalytics.com
          Blog: http://abbottanalytics.blogspot.com
          Twitter: @deanabb
            Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved.
About Seer Analytics

                    Seer Analytics, LLC
                         – Founded in 2001, based in Tampa, FL
                         – Produce actionable intelligence to help clients make
                           smarter decisions and drive business performance.
                         – Reports embed sophisticated analytics yet are designed
                           to be accessible and meaningful to a non-technical
                           audience.
                    Bill Lazarus, Founder, President and CEO
                         – BA from University of Wisconsin
                         – MA from University of Toronto
                         – SM and PhD from Massachusetts Institute of Technology



About Abbott Analytics
                    Abbott Analytics
                        – Founded in 1999, based in San Diego, CA
                        – Dedicated to data mining consulting and training
                    Principal: Dean Abbott
                        – Applied Data Mining for 22+ years in
                                 Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud
                                  Detection, Predictive Toxicology, Biological Risk Assessment
                        – Course Instruction
                                 Public 1-, 2-, and 3-day Data Mining Courses
                                 Conference Tutorials and Workshops (next: ACM Data Mining
                                  Bootcamp, November 13th in San Francisco)
                        – Customized Training and Knowledge Transfer
                                 Data mining methodology (CRISP-DM)
                                 Training services for software products, including CART, Clementine,
                                  Affinium Model, Insightful Miner

Talk Outline
                     Health Club Survey Analysis Problem
                      Description
                       – Overview of Survey Analysis and Approaches
                     Solution 1: traditional approach
                       – Statistical approach, fancy visualization
                     Solution 2: solution aligning models to
                      business objectives
                     Results and conclusions

Problem Setup: Member Survey

                     Question:
                       – What are the characteristics of members who indicated
                         the highest overall satisfaction with their Y?
                     Data:
                       – 32,811 records containing survey answers
                       – No demographic data except what was on survey
                         (marital status, children, age, gender)
                     Approach:
                       – Create supervised learning models with target variable
                         “overall_satisfaction = 1”


Some Notes
                     It is very unusual to have so many records
                       – The 31K responses were for one year
                       – Responses are collected from across the country
                     Seer tracks survey responses longitudinally
                      as well (not discussed in this talk)
                       – Began collecting survey responses and storing in
                         a database in 2001 => 10 years of data
                       – Seer has moved beyond modeling satisfaction to
                         include a more complete view of the YMCA
                         member experience

Data Preparation
                     Begin with 57 candidate inputs to model
                       – All survey questions are multiple choice
                                  Treated as categories, not numbers
                                  Typically 6 categories per question (responses 1-5, plus unknown)
                                  Unknown initially coded as “0”
                       – No text comments fields included as inputs to model
                     Create new column for target variable
                       – If overall_satisfaction = 1, variable value = 1,
                         otherwise, variable value = 0
                     Data very clean with respect to NULLs
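
The target recode described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each survey record is a dict keyed by question name; the record layout and the helper name `add_target` are hypothetical and not taken from the talk.

```python
def add_target(records, question="overall_satisfaction", top_value=1):
    """Return copies of the records with a 0/1 'target' column added.

    target = 1 when the answer equals the top rating (1), otherwise 0.
    Unknown answers (coded 0 on the survey) therefore map to target = 0.
    """
    return [dict(r, target=1 if r.get(question) == top_value else 0)
            for r in records]


# Tiny made-up sample: answers 1-5, with 0 meaning "unknown".
sample = [
    {"overall_satisfaction": 1},
    {"overall_satisfaction": 3},
    {"overall_satisfaction": 0},
]
flagged = add_target(sample)
```

The original records are left unmodified, so the raw categorical answers stay available as model inputs.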


Member Survey Question Categories




Sampling and Target Populations
                     Begin with 32,811 responses
                     Set aside about half for validation (not used during modeling):
                      16,379 records
                       – These records will be used to provide final summaries of the
                         segments
                     Q1 - Satisfaction = 1: 31%
                       – 86% have Recommend to friends = 1
                     Q48 - Recommend to Friend = 1: 54%
                       – 49% have Overall Satisfaction = 1
                       – 26.0% have both Overall Satisfaction and Recommend to Friend
                         equal to 1
                     Q32 - Likelihood to Renew = 1: 46%
                     Implications
                       – All three are interesting, but Recommend is so high already, not
                         much room for growth
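
A holdout of this kind, and a quick consistency check on the percentages, might look like the sketch below. The talk does not say how the validation records were selected; a seeded random split is assumed here, and the function name `split_holdout` is illustrative only.

```python
import random


def split_holdout(n_records, n_validation, seed=0):
    """Randomly partition record indices into modeling and validation sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    validation = sorted(indices[:n_validation])
    modeling = sorted(indices[n_validation:])
    return modeling, validation


# Counts taken from the slide: 32,811 responses, 16,379 held out.
modeling_idx, validation_idx = split_holdout(32811, 16379)

# Cross-check of the slide's percentages: the share of members with both
# answers equal to 1 can be reached from either question.
both_via_q1 = 0.31 * 0.86    # P(Q1 = 1) * P(Q48 = 1 | Q1 = 1)
both_via_q48 = 0.54 * 0.49   # P(Q48 = 1) * P(Q1 = 1 | Q48 = 1)
```

Both products land within a point of the reported 26%, which is what rounded inputs would be expected to give.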

Objective and Data Challenges

                      Project Objective
                        – Interpret results of survey for YMCA
                      Challenges
                        – Missing data (some questions either N/A or blank)
                                   Solution: Impute values that least affect the information
                                    communicated by the question (not a mean or median!)
                        – Question responses highly correlated with one another
                                   Multi-collinearity and interpretation of results problematic
                                   Must reduce dimensionality without losing interpretation of
                                    results
                                   Solution: Factor analysis


Objective and Data Challenges
                      Challenges, cont'd
                        – Target variable
                                   Three questions pointed to the important actionable
                                    information (related to how satisfied members were)
                                   No one question fully characterized the value of a member
                                   Solution: combine all three into a new “index of excellence”
                                    (IOE)
                                        – IOE = additive weighted
                                          sum of Q1, Q32, Q48
                                        – Reverse scale so higher
                                          IOE is better
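
One way to read "additive weighted sum, reverse scale" is sketched below. The actual weights used for the IOE are not given in the talk, so equal weights are a placeholder, and a 1-5 answer scale with 1 as the best rating is assumed.

```python
def index_of_excellence(q1, q32, q48, weights=(1.0, 1.0, 1.0), scale_max=5):
    """Weighted additive index over three 1..scale_max answers (1 = best).

    Each answer is reversed (scale_max + 1 - answer) so that a higher
    index means a better result. The default weights are placeholders,
    not the weights used in the original analysis.
    """
    answers = (q1, q32, q48)
    if any(not 1 <= a <= scale_max for a in answers):
        raise ValueError("answers must be on the 1..scale_max scale")
    return sum(w * (scale_max + 1 - a) for w, a in zip(weights, answers))


best = index_of_excellence(1, 1, 1)    # all three questions at top rating
worst = index_of_excellence(5, 5, 5)   # all three questions at bottom rating
```

With equal weights the index runs from 3.0 (worst) to 15.0 (best); unknown answers (coded 0) are rejected here and would need imputing first.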




Why Factor Analysis?
                     “Traditional approach to survey analysis involves
                      the use of frequency counts, t-test, correlation,
                      and measures of central tendency.”
                     “Factor analysis is a variable-reduction statistical
                      technique capable of probing underlying
                      relationships in variables”
                       – Santos, J.R.A., Clegg, M.D. (1999), "Factor analysis adds
                         new dimension to extension surveys", Journal of Extension,
                         http://www.joe.org/joe/1999october/rb6.php
                     Our use of Factor Analysis
                       – Traditional view: there is an underlying “truth” that exists,
                         and the survey is a redundant measure of that truth.
                       – Our view: a factor is just a derived variable that reduces
                         dimensionality

Factor Analysis: Key Factors
                     [Figure: two bar charts of top question loadings (y-axis: loading value).
                      Factor 1: Q2 through Q12. Factor 2: Q12 through Q20, and Q23.]




Member Survey Factor Analysis
Factor     Description
Factor1    Staff Cares
Factor2    Facilities clean/safe
Factor3    Equipment
Factor4    Registration
Factor5    Condition of Specific Equipment
Factor6    Friendly / Competent Staff
Factor7    Financial Assistance
Factor8    Parking

Loadings
Question   Factor1   Factor2   Factor3   Factor4   Factor5   Factor6   Factor7   Factor8
Q2           0.295     0.238     0.115     0.458     0.054     0.380   (0.016)     0.095
Q3           0.217     0.143     0.093     0.708     0.094     0.077     0.033     0.048
Q4           0.298     0.174     0.106     0.601     0.068     0.266     0.002     0.062
Q5           0.442     0.198     0.087     0.173     0.025     0.613   (0.021)     0.053
Q6           0.417     0.254     0.142     0.318     0.044     0.584   (0.008)     0.058
Q7           0.406     0.277     0.167     0.252     0.045     0.461     0.003     0.092
Q8           0.774     0.058     0.041     0.093     0.052     0.113     0.036     0.061
Q9           0.733     0.175     0.108     0.145     0.052     0.260     0.024     0.052
Q10          0.786     0.139     0.079     0.110     0.060     0.218     0.029     0.046
Q11          0.765     0.120     0.101     0.132     0.089     0.015     0.038     0.047
Q12          0.776     0.090     0.049     0.087     0.041     0.014     0.042     0.053
Q13          0.145     0.728     0.174     0.112     0.106     0.110     0.006     0.018
Q14          0.191     0.683     0.163     0.151     0.053     0.124     0.013     0.089
Q15          0.102     0.598     0.141     0.090     0.162     0.070     0.029     0.152
Q16          0.100     0.370     0.133     0.082     0.028     0.035     0.009     0.843
Q17          0.128     0.567     0.229     0.102     0.116     0.080     0.018     0.224
Q18          0.148     0.449     0.562     0.116     0.132     0.114     0.010     0.042
Q19          0.129     0.315     0.811     0.101     0.102     0.103     0.002     0.063
Q20          0.171     0.250     0.702     0.086     0.145     0.078     0.016     0.149
Q23          0.271     0.220     0.188     0.316     0.121     0.046     0.069     0.019
Q24          0.363     0.165     0.128     0.140     0.080     0.095     0.035     0.076
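
Picking a representative question for each factor from a loading table like the one above can be sketched as follows. The loadings are copied from the table, with parenthesized values read as negatives (an assumption based on the accounting convention). The rule "largest absolute loading" and the helper name `top_loading_question` are illustrative; only the rows shown on the slide (Q2 through Q24) are included, so results for factors whose strongest questions are not shown are only the best among these rows.

```python
FACTORS = ["Factor1", "Factor2", "Factor3", "Factor4",
           "Factor5", "Factor6", "Factor7", "Factor8"]

# One row per question; columns follow the order of FACTORS.
LOADINGS = {
    "Q2":  [0.295, 0.238, 0.115, 0.458, 0.054, 0.380, -0.016, 0.095],
    "Q3":  [0.217, 0.143, 0.093, 0.708, 0.094, 0.077, 0.033, 0.048],
    "Q4":  [0.298, 0.174, 0.106, 0.601, 0.068, 0.266, 0.002, 0.062],
    "Q5":  [0.442, 0.198, 0.087, 0.173, 0.025, 0.613, -0.021, 0.053],
    "Q6":  [0.417, 0.254, 0.142, 0.318, 0.044, 0.584, -0.008, 0.058],
    "Q7":  [0.406, 0.277, 0.167, 0.252, 0.045, 0.461, 0.003, 0.092],
    "Q8":  [0.774, 0.058, 0.041, 0.093, 0.052, 0.113, 0.036, 0.061],
    "Q9":  [0.733, 0.175, 0.108, 0.145, 0.052, 0.260, 0.024, 0.052],
    "Q10": [0.786, 0.139, 0.079, 0.110, 0.060, 0.218, 0.029, 0.046],
    "Q11": [0.765, 0.120, 0.101, 0.132, 0.089, 0.015, 0.038, 0.047],
    "Q12": [0.776, 0.090, 0.049, 0.087, 0.041, 0.014, 0.042, 0.053],
    "Q13": [0.145, 0.728, 0.174, 0.112, 0.106, 0.110, 0.006, 0.018],
    "Q14": [0.191, 0.683, 0.163, 0.151, 0.053, 0.124, 0.013, 0.089],
    "Q15": [0.102, 0.598, 0.141, 0.090, 0.162, 0.070, 0.029, 0.152],
    "Q16": [0.100, 0.370, 0.133, 0.082, 0.028, 0.035, 0.009, 0.843],
    "Q17": [0.128, 0.567, 0.229, 0.102, 0.116, 0.080, 0.018, 0.224],
    "Q18": [0.148, 0.449, 0.562, 0.116, 0.132, 0.114, 0.010, 0.042],
    "Q19": [0.129, 0.315, 0.811, 0.101, 0.102, 0.103, 0.002, 0.063],
    "Q20": [0.171, 0.250, 0.702, 0.086, 0.145, 0.078, 0.016, 0.149],
    "Q23": [0.271, 0.220, 0.188, 0.316, 0.121, 0.046, 0.069, 0.019],
    "Q24": [0.363, 0.165, 0.128, 0.140, 0.080, 0.095, 0.035, 0.076],
}


def top_loading_question(loadings, factor_index):
    """Return (question, loading) with the largest absolute loading on a factor."""
    return max(((q, row[factor_index]) for q, row in loadings.items()),
               key=lambda pair: abs(pair[1]))


top_by_factor = {name: top_loading_question(LOADINGS, i)
                 for i, name in enumerate(FACTORS)}
```

On these rows, Factor1 (Staff Cares) is best represented by Q10 and Factor8 (Parking) by Q16; the weak maxima for Factor5 and Factor7 suggest their defining questions lie outside the rows shown.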

Reduce Variables using Regression
                     Already beginning with only 13 variables
                     Question: how many of these are useful predictors?
                     Decided to retain 5 factors for final model
                     [Figure: bar chart "Regression Rankings of Questions/Factors";
                      x-axis: Question/Factor; y-axis: Regression Coefficient (0 to 0.6);
                      individual bar labels not recoverable]
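
Ranking candidate inputs by regression coefficient can be sketched as below. The talk does not state the regression type or whether inputs were standardized; this sketch assumes ordinary least squares on standardized inputs so that coefficient sizes are comparable. The data and the input names are synthetic, not survey results.

```python
import numpy as np


def rank_by_coefficient(X, y, names):
    """Fit least squares on standardized inputs; rank names by |coefficient|."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(Xz)), Xz])   # intercept + inputs
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    weights = dict(zip(names, coefs[1:]))              # drop the intercept
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic example: the first input drives the target most, the third not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 0.6 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=500)
ranking = rank_by_coefficient(X, y, ["factor_a", "factor_b", "factor_c"])
```

Inputs near the bottom of such a ranking are the natural candidates to drop when cutting a 13-variable list down to a smaller final model.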




Predictive Modeling Approach
                     50+ Survey Questions
                       – Identify Key Questions: 3 questions with high association
                         with target
                       – Factor Analysis: 10 factors, or variables that loaded highest
                         on each factor
                     3 key questions and 10 factors
                       – Regression Model: Find Significant Variables (13 fields down to 7)
                       – Regression Model: Find Significant Variables (variable ranks)


One Further Note on Final Regression Models
                     Empirical comparison: Factors as inputs vs. Top-loading question in
                      factor as input
                       – Top-loading or most interesting question on factor as representative
                         of that factor produced slightly better models
                       – Use of top-loading question makes final model more easily
                         understood
                       – This flies in the face of traditional theory, but worked better
                         operationally
                     Final regression model contained these fields:




Key: Explaining Results
                   Visualization shows key variables in survey associated with
                    “excellence”, and performance metrics for each Y
                     – How well did this Y do?
                     – What is the change over last year's result?
                     – This is a 45-dimensional visualization (don't ask
                       me to name them all!)
                   Shows which attributes the Y needs to improve to raise
                    customer satisfaction.
                   [Figure: "Drivers of Satisfaction" chart, current year vs. last year,
                    with a prior-year marker; visible labels: Staff 1, Staff 2,
                    relationships, goals, equipment, value, facility]

Interpreting Index/Drivers of Excellence Analysis
                        All factors listed are important

                        Position on 'x' axis indicates relative importance of
                         factors in driving IOE

                        Above the line = “better than peers”; below the line =
                         “worse than peers”

                        Small dot indicates position last year

                        R-Y-G indicates magnitude of change from previous year

                        Size of bubble indicates magnitude of score on factor
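
The visual encodings listed above can be written down as a small mapping. This is only a sketch: the field names, the `tolerance` threshold for "same", and the color assignment (green for better, yellow for same, red for worse) are assumptions for illustration and are not specified in the talk.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    importance: float     # x position: relative importance in driving IOE
    score: float          # this Y's score on the factor
    peer_average: float   # peer average score on the same factor
    prior_score: float    # this Y's score last year


def describe(driver, tolerance=0.05):
    """Translate one driver into the chart's visual encodings."""
    change = driver.score - driver.prior_score
    if change > tolerance:
        color = "green"    # better than last year
    elif change < -tolerance:
        color = "red"      # worse than last year
    else:
        color = "yellow"   # about the same
    side = "above" if driver.score > driver.peer_average else "at or below"
    return {
        "x": driver.importance,
        "side_of_peer_line": side,
        "color": color,
        "bubble_size": abs(driver.score),
    }


# Made-up numbers for a single driver.
example = describe(Driver("Staff cares", importance=0.8, score=4.2,
                          peer_average=4.0, prior_score=4.0))
```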

[Figure: "Drivers of Excellence" bubble chart, Gardena YMCA, May 2003 (2003 vs. 2002).
 x-axis: Importance; y-axis: Relative Performance, with a Peer Average reference line.
 Legend: Better / Same / Worse; prior year 2002.
 Drivers shown: Staff cares, Meet fitness goals, Staff competence, Feel welcome, Value,
 Equipment, Facilities.
 © 2003 Seer Analytics, LLC, Tampa, FL 33602]

[Figure: "Drivers of Excellence" bubble chart, Montebello YMCA, May 2003 (2003 vs. 2002).
 x-axis: Importance; y-axis: Relative Performance, with a Peer Average reference line.
 Legend: Better / Same / Worse; prior year 2002.
 Drivers shown: Equipment, Staff cares, Facilities, Feel welcome, Staff competence,
 Meet fitness goals, Value.
 © 2003 Seer Analytics, LLC, Tampa, FL 33602]

[Figure: "Drivers of Excellence" bubble chart, Torrance South YMCA, May 2003 (2003 vs. 2002).
 x-axis: Importance; y-axis: Relative Performance, with a Peer Average reference line.
 Legend: Better / Same / Worse; prior year 2002.
 Drivers shown: Equipment, Staff competence, Feel welcome, Value, Staff cares, Facilities,
 Meet fitness goals.
 © 2003 Seer Analytics, LLC, Tampa, FL 33602]

Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved.                                                                       22
Drivers of excellence: Culver YMCA, May 2003
I love this visualization!
[Figure: relative performance, 2003 vs. 2002 (better / same / worse than prior year), plotted against importance, with the peer average marked. Labeled drivers: Feel welcome, Staff cares, Value, Facilities, Staff competence, Meet fitness goals, Equipment. © 2003 Seer Analytics, LLC, Tampa, FL 33602]
What’s the Problem with That?
                     Customer was not interested in “techno” solutions
                     Customer was interested in what actions could be
                      taken as a result of the data mining models
                       – Which characteristics are most correlated with best
                         customers?
                                  What do they like and dislike about the Y?
                                  Is it equipment? relationships? facility? staff?
                       – Show key contributors, how each Y compared with other
                         Y locations, and if Y is improving




So What’s the Problem with That? (cont’d)
                     Regression, Neural Networks are “global”
                      estimators
                       – They operate over the entire data space
                       – Descriptors of Regression represent average influence
                       – Neither technique provides explicit localized
                         characteristics
                     Customer would like actionable analytics
                       – Clear characteristics of subgroups
                       – Different strategies for subgroups
                     Conclusion: In Round 2, use another approach


Who cares about “satisfaction”?
                      Issue: The YMCA is a cause-driven charity
                              It’s not about running “satisfactory” gyms
                              It’s about improving lives and building communities
                      Question: How can the member survey
                       data help Ys achieve mission goals?
                      Answer: Develop a tool that is:
                                 Grounded in solid social science
                                 Accessible/understandable
                                 Diagnostic/predictive
                                 A driver of performance and change


Satisfaction Model Performance

Misclassification for Learn Data

Class   N Cases   N Misclassed   Pct Error   Cost
0        33,220          8,178       24.62   0.25
1        14,845          2,622       17.66   0.18

[Figure: gains chart, % Class (0 to 100) vs. % Population (0 to 100)]

Node   Cases        % of Node    %            Cum %        Cum %     %        Cases     Cum     Lift
       Tgt. Class   Tgt. Class   Tgt. Class   Tgt. Class   Pop       Pop      in Node   Lift    Pop
  1    7,289        72.788       49.101       49.101       20.834    20.834   10,014    2.357   2.357
  9      904        51.984        6.09        55.19        24.452     3.618    1,739    2.257   1.683
  2    2,317        50.612       15.608       70.798       33.977     9.525    4,578    2.084   1.639
  3      471        46.45         3.173       73.971       36.087     2.11     1,014    2.05    1.504
 10      431        43.186        2.903       76.874       38.163     2.076      998    2.014   1.398
  4      349        40.819        2.351       79.225       39.942     1.779      855    1.984   1.322
 12      462        38.404        3.112       82.337       42.445     2.503    1,203    1.94    1.243
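The lift columns in the gains table can be reproduced from the per-node counts. The sketch below is an illustration rather than CART® output. It assumes lift is the node's share of the target class divided by its share of the population, and it takes the totals from the misclassification table (14,845 highly satisfied out of 33,220 + 14,845 surveys). The function and field names are made up for this example.

```python
# (node, highly satisfied cases in node, all cases in node), copied from the gains table
NODES = [(1, 7289, 10014), (9, 904, 1739), (2, 2317, 4578), (3, 471, 1014),
         (10, 431, 998), (4, 349, 855), (12, 462, 1203)]
TOTAL_TARGET = 14845          # class 1 cases in the misclassification table
TOTAL_CASES = 33220 + 14845   # class 0 + class 1 cases


def gains_rows(nodes, total_target, total_cases):
    """Return per-node and cumulative target share, population share, and lift."""
    rows = []
    cum_target = cum_cases = 0
    for node, target, cases in nodes:
        cum_target += target
        cum_cases += cases
        pct_target = 100.0 * target / total_target
        pct_pop = 100.0 * cases / total_cases
        cum_pct_target = 100.0 * cum_target / total_target
        cum_pct_pop = 100.0 * cum_cases / total_cases
        rows.append({
            "node": node,
            "pct_target": pct_target,
            "pct_pop": pct_pop,
            "cum_pct_target": cum_pct_target,
            "cum_pct_pop": cum_pct_pop,
            "lift": pct_target / pct_pop,              # node-level lift
            "cum_lift": cum_pct_target / cum_pct_pop,  # lift of all nodes so far
        })
    return rows


rows = gains_rows(NODES, TOTAL_TARGET, TOTAL_CASES)
for r in rows:
    print(f"node {r['node']:>2}: lift {r['lift']:.3f}, cum lift {r['cum_lift']:.3f}, "
          f"cum % target {r['cum_pct_target']:.1f}")
```

Read this way, the first three nodes in the table hold roughly 34% of the surveys but just over 70% of the highly satisfied members.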
Satisfaction Model from CART®
[Figure: CART tree diagram. Root split on Q25; lower splits on Q13, Q22, Q6, Q2, Q31, Q19, Q36, Q34 ($); labeled terminal nodes 1, 2, 3, 8, 9, 10, 15]
     Q25: Feel Welcome
       – Surrogate: Q24 (can relate to other members)
     Q13: Facilities are clean
       – Surrogate: Q14 (facilities safe and secure)
     Q22: Value for Money
       – Surrogates: Q21 (convenient schedule) and Q23 (quality classes/programs)
     Q6: Staff Competent
       – Surrogates: Q5 (friendly staff) and Q7 (enough staff)
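For readers unfamiliar with how a tree chooses a splitter such as Q25, the sketch below scores candidate questions by the decrease in Gini impurity, a standard CART-style criterion. It is a generic illustration, not Salford's implementation. The toy responses are hypothetical, and the coding 1 = strongly agree is an assumption.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, question, target="satisfied"):
    """Impurity decrease from splitting on 'strongly agree' (== 1) vs. everything else."""
    left = [r[target] for r in rows if r[question] == 1]
    right = [r[target] for r in rows if r[question] != 1]
    parent = [r[target] for r in rows]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical toy surveys (not YMCA data); 1 = strongly agree is an assumed coding.
toy = [
    {"Q25": 1, "Q13": 1, "satisfied": 1},
    {"Q25": 1, "Q13": 1, "satisfied": 1},
    {"Q25": 1, "Q13": 1, "satisfied": 1},
    {"Q25": 1, "Q13": 2, "satisfied": 1},
    {"Q25": 2, "Q13": 1, "satisfied": 0},
    {"Q25": 2, "Q13": 2, "satisfied": 0},
    {"Q25": 3, "Q13": 2, "satisfied": 0},
    {"Q25": 2, "Q13": 3, "satisfied": 0},
]

gains = {q: split_gain(toy, q) for q in ("Q25", "Q13")}
best = max(gains, key=gains.get)
print(gains, "-> best splitter:", best)
```

The question with the largest impurity decrease becomes the split. Surrogates are alternative questions whose split most closely mimics the chosen one.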
Member Satisfaction Model: Key Rules

Terminal Node 1
• 10,014 surveys (20.8%),
• 7,289 highly satisfied (72.8%),
• 49% of all highly satisfied
RULE: If strongly agree that facilities are clean and strongly agree
that member feels welcome, then highly satisfied

Terminal Node 2
• 4,578 surveys (9.5%),
• 2,317 highly satisfied (50.6%),
• 15.6% of all highly satisfied
RULE: If strongly agree that feel welcome and strongly agree Y is value
for money, even if don’t strongly agree facilities are clean, then
highly satisfied

Terminal Node 3
• 1,014 surveys (2.1%),
• 471 highly satisfied (46.4%),
• 3.2% of all highly satisfied
RULE: If strongly agree that Y has the right equipment and strongly
agree that feel welcome, and somewhat agree that facilities are clean,
even though don’t strongly feel Y is good value for the money, then
highly satisfied
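Because each terminal node is a conjunction of survey answers, the key rules translate directly into code. The sketch below covers only the three rules on this slide, not the full tree. The field names (`welcome`, `clean`, `value`, `equipment`) and the coding 1 = strongly agree, 2 = somewhat agree are assumptions made for illustration.

```python
STRONGLY_AGREE = 1   # assumed response coding
SOMEWHAT_AGREE = 2   # assumed response coding


def satisfaction_node(response):
    """Return the key terminal node (1, 2, or 3) matching a survey, or None."""
    welcome = response["welcome"] == STRONGLY_AGREE
    clean = response["clean"] == STRONGLY_AGREE
    value = response["value"] == STRONGLY_AGREE
    equipment = response["equipment"] == STRONGLY_AGREE

    if welcome and clean:
        return 1  # clean facilities and feels welcome
    if welcome and value:
        return 2  # reached only when "clean" is not strongly agreed
    if welcome and equipment and response["clean"] == SOMEWHAT_AGREE:
        return 3  # reached only when "value" is not strongly agreed
    return None   # handled by other branches of the tree


print(satisfaction_node({"welcome": 1, "clean": 2, "value": 1, "equipment": 3}))
```

Each branch carries its "even if" condition implicitly: a survey reaches the second test only after failing the first.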
Member Satisfaction Model: Other Rules

Terminal Node 10
• 998 surveys (2.1%),
• 431 highly satisfied (43.2%),
• 2.9% of all highly satisfied
RULE (weakest of top 5): If strongly agree that loyal to Y and strongly
agree that facilities are clean, even though don’t strongly agree that
feel welcome nor strongly agree that staff is competent, then highly
satisfied

Terminal Node 9
• 1,739 surveys (3.6%),
• 904 highly satisfied (52.0%),
• 6.1% of all highly satisfied
RULE: If strongly agree that facilities are clean, and strongly agree
that staff is competent, even if don’t strongly agree feel welcome,
then highly satisfied
Member Satisfaction Model: Unsatisfied Rules

Terminal Node 15
• 19,323 surveys (40.2%),
• 1,231 highly satisfied (6.4%),
• 8.3% of all highly satisfied
• 58.2% of all not highly satisfied
RULE: If don’t strongly agree that staff is efficient and don’t
strongly agree that feel welcome, and don’t strongly agree that the
facilities are clean, then member isn’t highly satisfied

Terminal Node 8
• 1,364 surveys (2.8%),
• 141 highly satisfied (10.3%),
• 1.0% of all highly satisfied
RULE: If don’t strongly agree that facilities are clean and don’t
strongly agree that the Y is good value for the money, even though
strongly agree that feel welcome, member isn’t highly satisfied
Recommend to Friend Model from CART®
[Figure: CART tree diagram. Root split on Q31; lower splits on Q25, Q22, Q44; labeled terminal nodes 1, 2, 4, 5, 7]
     Q31: Loyal
       – Surrogates: Q25, Q44, Q22, Q24 (can relate to other members)
     Q25: Feel Welcome
       – Surrogates: Q24, Q5 (friendly staff)
     Q22: Value for Money
       – Surrogates: Q23 (quality classes/programs)
     Q44: Helps meet fitness goals
Recommend to Friend Model: Key Rules

Terminal Node 1
• 13,678 surveys (28.5%),
• 12,122 recommend (88.6%),
• 47.0% of all strong recommends
RULE: If strongly agree that loyal to Y and strongly agree that feel
welcome, then strongly agree that will recommend to friend

Terminal Node 2
• 6,637 surveys (13.8%),
• 4,744 recommend (71.5%),
• 18.4% of all strong recommends
RULE: If strongly agree that loyal to Y and agree that Y is a good
value for the money, even though don’t strongly agree feel welcome,
strongly agree will recommend to friend

Terminal Node 4
• 2,628 surveys (5.5%),
• 1,932 recommend (73.5%),
• 6.1% of all strong recommends
RULE: If strongly agree that Y is a good value for the money and
strongly agree that feel welcome, even though not strongly loyal to Y,
strongly agree will recommend to friend
Recommend to Friend Model: Other Rules

Terminal Node 7
• 21,865 surveys (45.5%),
• 5,461 highly recommend (25.0%),
• 21.2% of all highly recommend
RULE: If don’t strongly agree that loyal to Y and don’t strongly agree
that Y is value for the money, then will not highly recommend to a
friend

Terminal Node 5
• 814 surveys (1.7%),
• 509 highly recommend (62.5%),
• 2.0% of all highly recommend
RULE: If strongly agree that Y is good value for the money, and
strongly agree that Y helps meet fitness goals, even though not
strongly loyal to the Y and don’t strongly feel welcome, will highly
recommend to a friend
Intend to Renew Model from CART®
[Figure: CART tree diagram. Root split on Q25; lower splits on Q44, Q22, Q47, Q27; labeled terminal nodes 1, 2, 3, 5, 7, 8]
     Q25: Feel Welcome
       – Surrogate: Q24 (can relate to other members)
     Q44: Helps meet fitness goals
       – Surrogate: Q51 (visit frequency)
     Q22: Value for Money
       – Left split surrogates: Q21 (convenient schedule) and Q23 (quality classes/programs)
       – Right split surrogates: Q25 (feel welcome=2 or 3)
     Q47: Would be donor
       – Surrogate: Q45A (have been donor)
     Q27: Feel sense of belonging
       – Surrogates: Q25, Q24
Intend to Renew Model: Key Rules

Terminal Node 1
• 13,397 surveys (27.9%),
• 9,903 renew (73.9%),
• 48.4% of all intend to renew
RULE: If strongly agree that feel welcome and strongly agree that Y
helps meet fitness goals, then strongly agree that intend to renew

Terminal Node 2
• 3,051 surveys (6.3%),
• 1,823 renew (59.8%),
• 8.9% of all intend to renew
RULE: If strongly agree Y is good value for the money and strongly
agree that feel welcome, even if don’t strongly agree that Y helps meet
fitness goals, then strongly agree that intend to renew

Terminal Node 5
• 5,704 surveys (11.9%),
• 3,201 renew (56.1%),
• 15.6% of all intend to renew
RULE: If strongly agree that feel sense of belonging, and agree that Y
is value for the money, and strongly agree that Y helps meet fitness
goals, even if don’t feel welcome, then strongly agree intend to renew
Intend to Renew Model: Other Rules

Terminal Node 8
• 18,547 surveys (38.6%),
• 3,130 strongly intend to renew (16.9%),
• 15.3% of all strongly intend to renew
RULE: If don’t strongly agree that feel welcome and don’t strongly
agree that Y helps meet fitness goals, then don’t strongly agree that
intend to renew

Terminal Node 7
• 2,178 surveys (4.5%),
• 578 strongly intend to renew (26.5%),
• 2.8% of all strongly intend to renew
RULE: If don’t strongly agree that Y is good value for money and don’t
strongly agree that feel welcome, even if strongly agree Y helps meet
fitness goals, don’t strongly agree that intend to renew
Summary of Key Questions in Models
                    Feel Welcome was root splitter (or surrogate) for
                     each model
                    Satisfaction is different than Recommend and Renew
                     in other respects
                      – Helps meet fitness goals was in Recommend and
                        Renew models, but not satisfaction
                      – Facilities clean only in satisfaction model



Key Differences Between Targets, Put Another Way
                     Satisfaction
                       – Feel Welcome
                       – Clean Facility
                     Renewal
                       – Feel Welcome
                       – Y Helps Meet Fitness Goals, Value for $$
                     Recommend to Friend
                       – Feel Welcome
                       – Loyal to Y, Value for $$


Top Terminal Nodes Comprise More than 70% of Hits
Subsequent Results

                      Measure                 2002    2009    Percent Improvement
                      Satisfaction             31%     41%    32%
                      Recommend to Friend      54%     57%     6%

                   Rules from Models are still in use today
                   Trees and Factors can help reduce # questions in
                    survey
                     – Employee ruleset (using same methodology) resulted
                       in a new “short-form” survey using only questions in
                       the splits
                     – Not yet implemented in Member survey
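The Percent Improvement column in the Subsequent Results table is a relative change from the 2002 value, not a difference in percentage points. A minimal check of the two rows (the function name is made up for this example):

```python
def percent_improvement(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return 100.0 * (after - before) / before


# (2002 value, 2009 value) in percent, from the Subsequent Results table
results = {"Satisfaction": (31, 41), "Recommend to Friend": (54, 57)}
improvement = {name: round(percent_improvement(b, a)) for name, (b, a) in results.items()}
print(improvement)
```

Satisfaction rose 10 percentage points, which is about 32% relative to the 2002 baseline of 31%.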
Index Construction and Scaling

    Begin with Factor Analysis
    Cluster attribute groupings to be managerially meaningful
    Z-normalize the variables, cast all in units of standard deviation
    Run tests for deviation from Standard Normal by variable and
     factor
    Create z-index for each factor
    Re-scale to nation-wide percentile
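The last three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the z-index of a factor is the mean of its z-normalized variables and that the percentile is the share of branches scoring at or below a given branch. The branch scores and variable names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-normalize one variable across branches (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def factor_z_index(columns):
    """Average the z-normalized variables belonging to one factor, per branch."""
    normalized = [z_scores(col) for col in columns]
    return [mean(branch) for branch in zip(*normalized)]


def percentile_ranks(index):
    """Percent of branches scoring at or below each branch."""
    n = len(index)
    return [100.0 * sum(other <= x for other in index) / n for x in index]


# Hypothetical mean scores for four branches on two "Facilities" questions
clean = [3.1, 3.8, 4.2, 4.6]
safe = [3.0, 3.9, 4.1, 4.8]

index = factor_z_index([clean, safe])
pct = percentile_ranks(index)
print([round(z, 2) for z in index], pct)
```

Working in z-scores puts questions with different spreads on a common scale before they are combined. The percentile step then makes the index readable for a non-technical audience.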
Analysis of Hierarchy Claim

Pearson Correlations, Existing Order (n=425)
              Facilities  Support  Value  Engagement  Impact  Involvement
Facilities          1.00     0.53   0.65        0.37    0.25         0.04
Support                      1.00   0.72        0.78    0.46         0.16
Value                               1.00        0.65    0.55         0.25
Engagement                                      1.00    0.61         0.53
Impact                                                  1.00         0.53
Involvement                                                          1.00

Pearson Correlations, Reverse Value and Support
              Facilities  Value  Support  Engagement  Impact  Involvement
Facilities          1.00   0.65     0.53        0.37    0.25         0.04
Value                      1.00     0.72        0.65    0.55         0.25
Support                             1.00        0.78    0.46         0.16
Engagement                                      1.00    0.61         0.53
Impact                                                  1.00         0.53
Involvement                                                          1.00
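One way to compare the two orderings, assuming the hierarchy claim implies that factors closer together in the ordering should correlate more strongly (this reading is an assumption; the slides do not spell it out), is to count how often a correlation increases when moving right along a row of the upper triangle. The correlations are copied from the tables; the function name is made up for this example.

```python
# Pairwise Pearson correlations from the tables (order-independent keys)
R = {
    frozenset(pair): value
    for pair, value in {
        ("Facilities", "Support"): 0.53, ("Facilities", "Value"): 0.65,
        ("Facilities", "Engagement"): 0.37, ("Facilities", "Impact"): 0.25,
        ("Facilities", "Involvement"): 0.04, ("Support", "Value"): 0.72,
        ("Support", "Engagement"): 0.78, ("Support", "Impact"): 0.46,
        ("Support", "Involvement"): 0.16, ("Value", "Engagement"): 0.65,
        ("Value", "Impact"): 0.55, ("Value", "Involvement"): 0.25,
        ("Engagement", "Impact"): 0.61, ("Engagement", "Involvement"): 0.53,
        ("Impact", "Involvement"): 0.53,
    }.items()
}


def row_violations(order, r=R):
    """Count increases along each upper-triangle row for a given factor ordering."""
    count = 0
    for i, a in enumerate(order):
        row = [r[frozenset((a, b))] for b in order[i + 1:]]
        count += sum(1 for x, y in zip(row, row[1:]) if y > x)
    return count


existing = ["Facilities", "Support", "Value", "Engagement", "Impact", "Involvement"]
reversed_vs = ["Facilities", "Value", "Support", "Engagement", "Impact", "Involvement"]

existing_violations = row_violations(existing)
reversed_violations = row_violations(reversed_vs)
print(existing_violations, reversed_violations)
```

Under this row-wise check, the existing order has two increases (in the Facilities and Support rows), while the order with Value and Support swapped has none.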
Summary from “Power of Habit”

                               In 2000, for instance, two statisticians were hired by the
                               YMCA—one of the nation’s largest nonprofit organizations—
                               to use the powers of data-driven fortune-telling to make
                               the world a healthier place. The YMCA has more than 2,600
                               branches in the United States, most of them gyms and
                               community centers. About a decade ago, the organization’s
                               leaders began worrying about how to stay competitive.
                               They asked a social scientist and a mathematician—Bill
                               Lazarus and Dean Abbott—for help.

                               The two men gathered data from more than 150,000 YMCA
                               member satisfaction surveys that had been collected over
                               the years and started looking for patterns. At that point,
                               the accepted wisdom among YMCA executives was that
                               people wanted fancy exercise equipment and sparkling,
                               modern facilities. The YMCA had spent millions of dollars .
Summary from “Power of Habit”: YMCA Satisfaction
                Retention, the data said, was driven by emotional factors, such as
                whether employees knew members’ names or said hello when
                they walked in. People, it turns out, often go to the gym looking
                for a human connection, not a treadmill. If a member made a
                friend at the YMCA, they were much more likely to show up for
                workout sessions. In other words, people who join the YMCA have
                certain social habits. If the YMCA satisfied them, members were
                happy. So if the YMCA wanted to encourage people to exercise, it
                needed to take advantage of patterns that already existed, and
                teach employees to remember visitors’ names. It’s a variation of
                the lesson learned by Target and radio DJs: to sell a new habit—in
                this case exercise—wrap it in something that people already know
                and like, such as the instinct to go places where it’s easy to make
                friends.

                “We’re cracking the code on how to keep people at the gym,”
                Lazarus told me. “People want to visit places that satisfy their
Conclusions
                    • The “best” solutions are not always “good”
                      solutions
                      – There is often more than one way to approach a solution
                      – It is often unclear, even to the end customer, which solution
                        is best until the solution exists on paper
                    • Interactions are the key (or why trees improve
                      regression models)
                      – Main effects are interesting, but deeper insights are gained
                        from subgroups
                    • Don’t give up
                      – Matching data to decisions is a difficult business
                      – Get feedback; make sure the story the model tells is
                        understood by decision-makers
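
The point about interactions can be illustrated with a small sketch. It is not the analysis from the talk: the two binary inputs (`knows_name`, `made_friend`) and the satisfaction rates below are invented for illustration. The sketch compares a main-effects-only least-squares fit with the subgroup means that a two-level tree would report when satisfaction is high only if both conditions hold.

```python
# Hypothetical satisfaction rates for each (knows_name, made_friend) subgroup.
# These numbers are made up; they are not taken from the YMCA survey.
rates = {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.9}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def marginal_gap(rates, axis):
    """Difference in mean rate between input = 1 and input = 0.

    For a balanced 2x2 design with one value per cell, this equals the
    least-squares slope of the additive (main-effects-only) model.
    """
    on = mean(r for cell, r in rates.items() if cell[axis] == 1)
    off = mean(r for cell, r in rates.items() if cell[axis] == 0)
    return on - off


# Main-effects model: rate ~ b0 + b1 * knows_name + b2 * made_friend
b1 = marginal_gap(rates, 0)
b2 = marginal_gap(rates, 1)
b0 = mean(rates.values()) - 0.5 * b1 - 0.5 * b2
additive = {cell: b0 + b1 * cell[0] + b2 * cell[1] for cell in rates}

# Subgroup model: a tree that splits on both inputs predicts each leaf's mean.
subgroup = dict(rates)


def sse(pred):
    """Sum of squared errors of a prediction table against the true rates."""
    return sum((pred[cell] - rates[cell]) ** 2 for cell in rates)


for cell in sorted(rates):
    print(cell, "additive:", round(additive[cell], 2), "subgroup:", subgroup[cell])
print("SSE additive:", round(sse(additive), 3))
print("SSE subgroups:", round(sse(subgroup), 3))
```

Under these assumed rates, the additive fit spreads the effect evenly across both inputs: it underestimates the subgroup where both conditions hold and produces a negative rate where neither holds. The subgroup means reproduce every cell exactly, which is the kind of structure a tree exposes and a main-effects regression hides.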


A More Transparent Interpretation of Health Club Surveys (YMCA)

  • 1. Abbott - A More Transparent Interpretation of Health Club Surveys Salford Analytics and Data Mining Conference 2012 San Diego, CA May 24, 2012 Dean Abbott Abbott Analytics, Inc. URL: http://www.abbottanalytics.com Blog: http://abbottanalytics.blogspot.com Twitter: @deanabb Copyright © 2004-2012, Abbott Analytics, Inc. 1 and Seer Analytics, Inc. All rights reserved.
  • 2. About Seer Analytics  Seer Analytics, LLC – Founded in 2001, based in Tampa, FL – Produce actionable intelligence to help clients make smarter decision and drive business performance. – Reports embed sophisticated analytics yet are designed to be accessible and meaningful to a non-technical audience.  Bill Lazarus, Founder, President and CEO – BA from University of Wisconsin – MA from University of Toronto, – SM and PhD from Massachusetts Institute of Technology. Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 2
  • 3. About Abbott Analytics  Abbott Analytics – Founded in 1999, based in San Diego, CA – Dedicated to data mining consulting and training  Principal: Dean Abbott – Applied Data Mining for 22+ years in  Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud Detection, Predictive Toxicology, Biological Risk Assessment – Course Instruction  Public 1-, 2-, and 3-day Data Mining Courses  Conference Tutorials and Workshops (next: ACM Data Mining Bootcamp, November 13th in San Francisco) – Customized Training and Knowledge Transfer  Data mining methodology (CRISP-DM)  Training services for software products, including CART, Clementine, Affinium Model, Insightful Miner Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 3
  • 4. Talk Outline  Health Club Survey Analysis Problem Description – Overview of Survey Analysis and Approaches  Solution 1: traditional approach – Statistical approach, fancy visualization  Solution 2: solution aligning models to business objectives  Results and conclusions Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 4
  • 5. Problem Setup: Member Survey  Question: – What are the characteristics of members who indicated the highest overall satisfaction with their Y?  Data: – 32,811 records containing survey answers – No demographic data except what was on survey (marital status, children, age, gender)  Approach: – Create supervised learning models with target variable “overall_satisfaction = 1” Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 5
  • 6. Some Notes  It is very unusual to have so many records – The 31K responses were for one year – Responses are collected from across the country  Seer tracks survey responses longitudinally as well (not discussed in this talk) – Began collecting survey responses and storing in a database in 2001 => 10 years of data – Seer has moved beyond modeling satisfaction to include a more complete view of the YMCA member experience Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 6
  • 7. Data Preparation  Begin with 57 candidate inputs to model – All survey questions are multiple choice  Treated as categories, not numbers  Typically 6 categories per question (1-5)  Unknown initially coded as “0” – No text comments fields included as inputs to model  Create new column for target variable – If overall_satisfaction = 1, variable value = 1, otherwise, variable value = 0  Data very clean with respect to NULLs Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 7
  • 8. Member Survey Question Categories Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 8
  • 9. Sampling and Target Populations  Begin with 32,811 responses  Set aside about half for validation (not used during modeling): 16,379 records – These records will be used to provide final summaries of the segments  Q1 - Satisfaction = 1: 31% – 86% have Recommend to friends = 1  Q48 - Recommend to Friend = 1: 54% – 49% have Overall Satisfaction = 1 – 26.0% have both overall satisfaction and recommend to friends both equal to 1  Q32 - Likelihood to Renew = 1: 46%  Implications – All three are interesting, but Recommend is so high already, not much room for growth Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 9
  • 10. Objective and Data Challenges  Project Objective – Interpret results of survey for YMCA  Challenges – Missing data (some questions either N/A or blank)  Solution: Impute values that least effect information communicated by question (not a mean or median!) – Question responses highly correlated with one another  Multi-collinearity and interpretation of results problematic  Must reduce dimensionality without losing interpretation of results  Solution: Factor analysis Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 10
  • 11. Objective and Data Challenges  Challenges, cont‟d – Target variable  Three questions pointed to the important actionable information (related to how satisfied members were)  No one question fully characterized the value of a member  Solution: combine all three into a new “index of excellence” (IOE) – IOE = additive weighted sum of Q1, Q32, Q48 – Reverse scale so higher IOE is better Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 11
  • 12. Why Factor Analysis?  “Traditional approach to survey analysis involves the use of frequency counts, t-test, correlation, and measures of central tendency. “  “Factor analysis is a variable-reduction statistical technique capable of probing underlying relationships in variables” – Santos, J.R.A., Clegg, M.D. (1999), "Factor analysis adds new dimension to extension surveys", Journal of Extension, http://www.joe.org/joe/1999october/rb6.php  Our use of Factor Analysis – Traditional view: there is an underlying “truth” that exist, and the survey is a redundant measure of that truth. – Just a derived variable that reduces dimensionality Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved. 12
  • 13. Factor Analysis: Key Factors
      [Bar chart: Factor 1, top question loadings for Q2 through Q12; y-axis: loading value, 0.00 to 1.00]
      [Bar chart: Factor 2, top question loadings for Q12 through Q20 and Q23; y-axis: loading value, 0.00 to 0.80]
  • 14. Member Survey Factor Analysis Loadings
      Factor descriptions: Factor1 Staff Cares; Factor2 Facilities clean/safe; Factor3 Equipment; Factor4 Registration; Factor5 Equipment; Factor6 Friendly/Competent Staff; Factor7 Financial Assistance; Factor8 Parking
      (The header also carries the qualifiers "Condition of" and "Specific" for the Equipment factors; their exact column placement is not recoverable here.)
            Factor1   Factor2   Factor3   Factor4   Factor5   Factor6   Factor7   Factor8
      Q2    0.295     0.238     0.115     0.458     0.054     0.380     (0.016)   0.095
      Q3    0.217     0.143     0.093     0.708     0.094     0.077     0.033     0.048
      Q4    0.298     0.174     0.106     0.601     0.068     0.266     0.002     0.062
      Q5    0.442     0.198     0.087     0.173     0.025     0.613     (0.021)   0.053
      Q6    0.417     0.254     0.142     0.318     0.044     0.584     (0.008)   0.058
      Q7    0.406     0.277     0.167     0.252     0.045     0.461     0.003     0.092
      Q8    0.774     0.058     0.041     0.093     0.052     0.113     0.036     0.061
      Q9    0.733     0.175     0.108     0.145     0.052     0.260     0.024     0.052
      Q10   0.786     0.139     0.079     0.110     0.060     0.218     0.029     0.046
      Q11   0.765     0.120     0.101     0.132     0.089     0.015     0.038     0.047
      Q12   0.776     0.090     0.049     0.087     0.041     0.014     0.042     0.053
      Q13   0.145     0.728     0.174     0.112     0.106     0.110     0.006     0.018
      Q14   0.191     0.683     0.163     0.151     0.053     0.124     0.013     0.089
      Q15   0.102     0.598     0.141     0.090     0.162     0.070     0.029     0.152
      Q16   0.100     0.370     0.133     0.082     0.028     0.035     0.009     0.843
      Q17   0.128     0.567     0.229     0.102     0.116     0.080     0.018     0.224
      Q18   0.148     0.449     0.562     0.116     0.132     0.114     0.010     0.042
      Q19   0.129     0.315     0.811     0.101     0.102     0.103     0.002     0.063
      Q20   0.171     0.250     0.702     0.086     0.145     0.078     0.016     0.149
      Q23   0.271     0.220     0.188     0.316     0.121     0.046     0.069     0.019
      Q24   0.363     0.165     0.128     0.140     0.080     0.095     0.035     0.076
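One way to read a loadings table like this is to assign each question to the factor on which it loads most heavily. The sketch below does that for a handful of rows copied from the slide; it is an illustration only, the helper name `dominant_factor` is hypothetical, and parenthesized entries are assumed to denote negative loadings.

```python
# A few rows copied from the loadings table; columns are Factor1..Factor8.
# Parenthesized entries in the table are entered as negatives (an assumption).
LOADINGS = {
    "Q3":  [0.217, 0.143, 0.093, 0.708, 0.094, 0.077, 0.033, 0.048],
    "Q5":  [0.442, 0.198, 0.087, 0.173, 0.025, 0.613, -0.021, 0.053],
    "Q10": [0.786, 0.139, 0.079, 0.110, 0.060, 0.218, 0.029, 0.046],
    "Q13": [0.145, 0.728, 0.174, 0.112, 0.106, 0.110, 0.006, 0.018],
    "Q16": [0.100, 0.370, 0.133, 0.082, 0.028, 0.035, 0.009, 0.843],
    "Q19": [0.129, 0.315, 0.811, 0.101, 0.102, 0.103, 0.002, 0.063],
}


def dominant_factor(loadings):
    """Return (factor_name, loading) for the largest absolute loading in a row."""
    index = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    return f"Factor{index + 1}", loadings[index]


assignment = {question: dominant_factor(row) for question, row in LOADINGS.items()}
for question, (factor, loading) in assignment.items():
    print(f"{question}: {factor} ({loading:.3f})")
```

Because the slide shows only part of the survey, this assigns questions to factors; it does not establish which question is the top loader for every factor.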
  • 15. Reduce Variables using Regression
      • Already beginning with only 13 variables
      • Question: how many of these are useful predictors?
      • Decided to retain 5 factors for final model
      [Bar chart: Regression Rankings of Questions/Factors; y-axis: Regression Coefficient, 0 to 0.6; x-axis: Question/Factor (ten factors plus Q22, Q25, and Q44)]
  • 16. Predictive Modeling Approach
      [Flow diagram; box text as extracted:]
      • 50+ Survey Questions
      • Identify Key Questions: 3 questions with high association with target
      • Factor Analysis: down to 10 factors, or 10 variables that loaded highest on each factor
      • Regression Model: Find Significant Variables (appears twice)
      • Connector labels: 13 fields; 10 factors; 7; 3 key questions; Variable ranks
  • 17. One Further Note on Final Regression Models
      • Empirical comparison: factors as inputs vs. top-loading question in factor as input
          – Using the top-loading or most interesting question on a factor as the representative of that factor produced slightly better models
          – Use of top-loading question makes final model more easily understood
          – This flies in the face of traditional theory, but worked better operationally
      • Final regression model contained these fields:
  • 18. Key: Explaining Results
      • Visualization shows key variables in survey associated with “excellence”, and performance metrics for each Y
          – How well did this Y do?
          – What is the change over last year’s result?
          – This is a 45-dimensional visualization (don’t ask me to name them all!)
      • Shows which attributes the Y needs to improve in order to improve customer satisfaction.
      [Diagram: Drivers of Satisfaction, Current Year vs. Last Year. Labels: Staff 1, Staff 2, relationships, goals, equipment, value, facility, Prior year]
  • 19. Interpreting Index/Drivers of Excellence Analysis
      • All factors listed are important
      • Position on ‘x’ axis indicates relative importance of factors in driving IOE
      • Above the line = “better than peers”; below the line = “worse than peers”
      • Small dot indicates position last year
      • R-Y-G (red-yellow-green) indicates magnitude of change from previous year
      • Size of bubble indicates magnitude of score on factor
  • 20. Drivers of Excellence, May 2003: Gardena YMCA
      [Bubble chart. Axis labels: Importance, Relative Performance. Reference line: Peer Average. Legend: 2003 vs. 2002 (Better, Same, Worse); Prior year (2002). Bubbles: Staff cares, Meet fitness goals, Staff competence, Feel welcome, Value, Equipment, Facilities. © 2003 Seer Analytics, LLC, Tampa, FL 33602]
  • 21. Drivers of Excellence, May 2003: Montebello YMCA
      [Bubble chart. Axis labels: Importance, Relative Performance. Reference line: Peer Average. Legend: 2003 vs. 2002 (Better, Same, Worse); Prior year (2002). Bubbles: Equipment, Staff cares, Facilities, Feel welcome, Staff competence, Meet fitness goals, Value]
  • 22. Drivers of Excellence, May 2003: Torrance South YMCA
      [Bubble chart. Axis labels: Importance, Relative Performance. Reference line: Peer Average. Legend: 2003 vs. 2002 (Better, Same, Worse); Prior year (2002). Bubbles: Equipment, Staff competence, Feel welcome, Value, Staff cares, Facilities, Meet fitness goals]
  • 23. Drivers of Excellence, May 2003: Culver YMCA
      • I love this visualization!
      [Bubble chart. Axis labels: Importance, Relative Performance. Reference line: Peer Average. Legend: 2003 vs. 2002 (Better, Same, Worse); Prior year (2002). Bubbles: Feel welcome, Staff cares, Value, Facilities, Staff competence, Meet fitness goals, Equipment]
  • 24. What’s the Problem with That?
      • Customer was not interested in “techno” solutions
      • Customer was interested in what actions could be taken as a result of the data mining models
          – Which characteristics are most correlated with best customers?
              • What do they like and dislike about the Y?
              • Is it equipment? relationships? facility? staff?
          – Show key contributors, how each Y compared with other Y locations, and whether the Y is improving
  • 25. So What’s The Problem with That? (cont’d)
      • Regression and neural networks are “global” estimators
          – They operate over the entire data space
          – Descriptors of regression represent average influence
          – Neither technique provides explicit localized characteristics
      • Customer would like actionable analytics
          – Clear characteristics of subgroups
          – Different strategies for subgroups
      • Conclusion: In Round 2, use another approach
  • 26. Who cares about “satisfaction”?
      • Issue: The YMCA is a cause-driven charity
          – It’s not about running “satisfactory” gyms
          – It’s about improving lives and building communities
      • Question: How can the member survey data help Ys achieve mission goals?
      • Answer: Develop a tool that is:
          – Grounded in solid social science
          – Accessible/understandable
          – Diagnostic/predictive
          – A driver of performance and change
  • 27. Satisfaction Model Performance
      [Gains chart: % Class (0 to 100) vs. % Population (0 to 100)]
      Misclassification for Learn Data
      Class   N Cases   N Misclassed   Pct Error   Cost
      0       33,220    8,178          24.62       0.25
      1       14,845    2,622          17.66       0.18
      Node   Cases        % of Node    %            Cum %        Cum %    %       Cases     Cum     Lift
             Tgt. Class   Tgt. Class   Tgt. Class   Tgt. Class   Pop      Pop     in Node   Lift    Pop
      1      7,289        72.788       49.101       49.101       20.834   20.834  10,014    2.357   2.357
      9      904          51.984       6.09         55.19        24.452   3.618   1,739     2.257   1.683
      2      2,317        50.612       15.608       70.798       33.977   9.525   4,578     2.084   1.639
      3      471          46.45        3.173        73.971       36.087   2.11    1,014     2.05    1.504
      10     431          43.186       2.903        76.874       38.163   2.076   998       2.014   1.398
      4      349          40.819       2.351        79.225       39.942   1.779   855       1.984   1.322
      12     462          38.404       3.112        82.337       42.445   2.503   1,203     1.94    1.243
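The lift columns in the gains table follow from simple ratios, which a short sketch can make concrete. The code below recomputes lift and cumulative lift for the first three nodes from the counts on the slide. It assumes class 1 is the target class and that the learn-data totals are the two class counts from the misclassification table; the function name `gains` is hypothetical.

```python
# (node, target-class cases, cases in node), in the order listed in the gains table.
NODES = [(1, 7289, 10014), (9, 904, 1739), (2, 2317, 4578)]

# Learn-data totals taken from the misclassification table (assumed: class 1 = target).
TOTAL_CASES = 33220 + 14845
TOTAL_TARGET = 14845


def gains(nodes, total_target, total_cases):
    """Return a list of (node, lift, cumulative_lift) tuples."""
    base_rate = total_target / total_cases
    rows, cum_target, cum_cases = [], 0, 0
    for node, target, cases in nodes:
        cum_target += target
        cum_cases += cases
        # Lift: target rate inside the node relative to the overall target rate.
        lift = (target / cases) / base_rate
        # Cumulative lift: same ratio over all nodes listed so far.
        cum_lift = (cum_target / cum_cases) / base_rate
        rows.append((node, lift, cum_lift))
    return rows


rows = gains(NODES, TOTAL_TARGET, TOTAL_CASES)
for node, lift, cum_lift in rows:
    print(f"node {node}: lift {lift:.3f}, cumulative lift {cum_lift:.3f}")
```

A lift above 1 means the node holds a higher share of highly satisfied members than the survey population as a whole.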
  • 28. Satisfaction Model from CART®
      [Tree diagram: root split on Q25; further splits on Q13, Q22, Q6, Q2, Q31, Q19, Q36, and Q34; terminal nodes shown include 1, 2, 3, 8, 9, 10, and 15]
      • Q25: Feel Welcome
          – Surrogate: Q24 (can relate to other members)
      • Q13: Facilities are clean
          – Surrogate: Q14 (facilities safe and secure)
      • Q22: Value for Money
          – Surrogates: Q21 (convenient schedule) and Q23 (quality classes/programs)
      • Q6: Staff Competent
          – Surrogates: Q5 (friendly staff) and Q7 (enough staff)
  • 29. Member Satisfaction Model: Key Rules
      • Terminal Node 1
          – 10,014 surveys (20.8%)
          – 7,289 highly satisfied (72.8%)
          – 49% of all highly satisfied
          – RULE: If strongly agree that facilities are clean and strongly agree that member feels welcome, then highly satisfied
      • Terminal Node 2
          – 4,578 surveys (9.5%)
          – 2,317 highly satisfied (50.6%)
          – 15.6% of all highly satisfied
          – RULE: If strongly agree that feel welcome and strongly agree Y is value for money, even if don’t strongly agree facilities are clean, then highly satisfied
      • Terminal Node 3
          – 1,739 surveys (3.6%)
          – 904 highly satisfied (52.0%)
          – 6.1% of all highly satisfied
          – RULE: If strongly agree that Y has the right equipment and strongly agree that feel welcome, and somewhat agree that facilities are clean, even though don’t strongly feel Y is good value for the money, then highly satisfied
  • 30. Member Satisfaction Model: Other Rules
      • Terminal Node 10
          – 998 surveys (2.1%)
          – 431 highly satisfied (43.2%)
          – 2.9% of all highly satisfied
          – RULE (weakest of top 5): If strongly agree that loyal to Y and strongly agree that facilities are clean, even though don’t strongly agree that feel welcome nor strongly agree that staff is competent, then highly satisfied
      • Terminal Node 9
          – 1,739 surveys (3.6%)
          – 904 highly satisfied (52.0%)
          – 6.1% of all highly satisfied
          – RULE: If strongly agree that facilities are clean, and strongly agree that staff is competent, even if don’t strongly agree feel welcome, then highly satisfied
  • 31. Member Satisfaction Model: Unsatisfied Rules
      • Terminal Node 15
          – 19,323 surveys (40.2%)
          – 1,231 highly satisfied (6.4%)
          – 8.3% of highly satisfied
          – 58.2% of all not highly satisfied
          – RULE: If don’t strongly agree that staff is efficient and don’t strongly agree that feel welcome, and don’t strongly agree that the facilities are clean, then member isn’t highly satisfied
      • Terminal Node 8
          – 1,364 surveys (2.8%)
          – 141 highly satisfied (10.3%)
          – 1.0% of all highly satisfied
          – RULE: If don’t strongly agree that facilities are clean and don’t strongly agree that the Y is good value for the money, even though strongly agree that feel welcome, member isn’t highly satisfied.
  • 32. Recommend to Friend Model from CART®
      [Tree diagram: root split on Q31; further splits on Q25, Q22, and Q44; terminal nodes shown include 1, 2, 4, 5, and 7]
      • Q31: Loyal
          – Surrogates: Q25, Q44, Q22, Q24 (can relate to other members)
      • Q25: Feel Welcome
          – Surrogates: Q24, Q5 (friendly staff)
      • Q22: Value for Money
          – Surrogates: Q23 (quality classes/programs)
      • Q44: Helps meet fitness goals
  • 33. Recommend to Friend Model: Key Rules
      • Terminal Node 1
          – 13,678 surveys (28.5%)
          – 12,122 recommend (88.6%)
          – 47.0% of all strong recommends
          – RULE: If strongly agree that loyal to Y and strongly agree that feel welcome, then strongly agree that will recommend to friend
      • Terminal Node 2
          – 6,637 surveys (13.8%)
          – 4,744 recommend (71.5%)
          – 18.4% of all strong recommends
          – RULE: If strongly agree that loyal to Y and agree that Y is a good value for the money, even though don’t strongly agree feel welcome, strongly agree will recommend to friend.
      • Terminal Node 4
          – 2,628 surveys (5.5%)
          – 1,932 recommend (73.5%)
          – 6.1% of all strong recommends
          – RULE: If strongly agree that Y is a good value for the money and strongly agree that feel welcome, even though not strongly loyal to Y, strongly agree will recommend to friend
  • 34. Recommend to Friend Model: Other Rules
      • Terminal Node 7
          – 21,865 surveys (45.5%)
          – 5,461 highly recommend (25.0%)
          – 21.2% of all highly recommend
          – RULE: If don’t strongly agree that loyal to Y and don’t strongly agree that Y is value for the money, then will not highly recommend to a friend
      • Terminal Node 5
          – 814 surveys (1.7%)
          – 509 highly recommend (62.5%)
          – 2.0% of all highly recommend
          – RULE: If strongly agree that Y is good value for the money, and strongly agree that Y helps meet fitness goals, even though not strongly loyal to the Y and don’t strongly feel welcome, will highly recommend to a friend
  • 35. Intend to Renew Model from CART®
      [Tree diagram: root split on Q25; further splits on Q44, Q22, Q47, and Q27; terminal nodes shown include 1, 2, 3, 5, 7, and 8]
      • Q25: Feel Welcome
          – Surrogate: Q24 (can relate to other members)
      • Q44: Helps meet fitness goals
          – Surrogate: Q51 (visit frequency)
      • Q22: Value for Money
          – Left split surrogates: Q21 (convenient schedule) and Q23 (quality classes/programs)
          – Right split surrogates: Q25 (feel welcome = 2 or 3)
      • Q47: Would be donor
          – Surrogate: Q45A (have been donor)
      • Q27: Feel sense of belonging
          – Surrogates: Q25, Q24
  • 36. Intend to Renew Model: Key Rules
      • Terminal Node 1
          – 13,397 surveys (27.9%)
          – 9,903 renew (73.9%)
          – 48.4% of all intend to renew
          – RULE: If strongly agree that feel welcome and strongly agree that Y helps meet fitness goals, then strongly agree that intend to renew
      • Terminal Node 2
          – 3,051 surveys (6.3%)
          – 1,823 renew (59.8%)
          – 8.9% of all intend to renew
          – RULE: If strongly agree Y is good value for the money and strongly agree that feel welcome, even if don’t strongly agree that Y helps meet fitness goals, then strongly agree that intend to renew
      • Terminal Node 5
          – 5,704 surveys (11.9%)
          – 3,201 renew (56.1%)
          – 15.6% of all intend to renew
          – RULE: If strongly agree that feel sense of belonging, and agree that Y is value for the money, and strongly agree that Y helps meet fitness goals, even if don’t feel welcome, then strongly agree intend to renew.
  • 37. Intend to Renew Model: Other Rules
      • Terminal Node 8
          – 18,547 surveys (38.6%)
          – 3,130 strongly intend to renew (16.9%)
          – 15.3% of all strongly intend to renew
          – RULE: If don’t strongly agree that feel welcome and don’t strongly agree that Y helps meet fitness goals, then don’t strongly agree that intend to renew
      • Terminal Node 7
          – 2,178 surveys (4.5%)
          – 578 strongly intend to renew (26.5%)
          – 2.8% of all strongly intend to renew
          – RULE: If don’t strongly agree that Y is good value for money and don’t strongly agree that feel welcome, even if strongly agree Y helps meet fitness goals, don’t strongly agree that intend to renew.
  • 38. Summary of Key Questions in Models
      • Feel Welcome was root splitter (or surrogate) for each model
      • Satisfaction is different from Recommend and Renew in other respects
          – Helps meet fitness goals was in Recommend and Renew models, but not Satisfaction
          – Facilities clean only in Satisfaction model
  • 39. Key Differences Between Targets, Put Another Way
      • Satisfaction
          – Feel Welcome
          – Clean Facility
      • Renewal
          – Feel Welcome
          – Y Helps Meet Fitness Goals, Value for $$
      • Recommend to Friend
          – Feel Welcome
          – Loyal to Y, Value for $$
  • 40. Top Terminal Nodes Comprise More than 70% of Hits
  • 41. Subsequent Results
      Measure               2002   2009   Percent Improvement
      Satisfaction          31%    41%    32%
      Recommend to Friend   54%    57%    6%
      • Rules from models are still in use today
      • Trees and factors can help reduce the number of questions in the survey
          – Employee ruleset (using same methodology) resulted in a new “short-form” survey using only questions in the splits
          – Not yet implemented in Member survey
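The "Percent Improvement" figures on this slide are consistent with a relative change rather than a difference in percentage points. A minimal check, using a hypothetical helper `percent_improvement`:

```python
def percent_improvement(before, after):
    """Relative change from before to after, expressed in percent."""
    return 100.0 * (after - before) / before


satisfaction = percent_improvement(31, 41)   # Satisfaction: 31% in 2002, 41% in 2009
recommend = percent_improvement(54, 57)      # Recommend to Friend: 54% to 57%
print(round(satisfaction), round(recommend))
```

Rounded to whole percents, these reproduce the 32% and 6% shown in the table.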
  • 42. Index Construction and Scaling
      • Begin with Factor Analysis
      • Cluster attribute groupings to be managerially meaningful
      • Z-normalize the variables, casting all to unit variance
      • Run tests for deviation from Standard Normal by variable and factor
      • Create z-index for each factor
      • Re-scale to nation-wide percentile
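The scaling steps on this slide can be sketched with the Python standard library. This is an illustration under assumptions, not the authors' procedure: the data are made up, the z-index is assumed to be the mean of a factor's z-scored variables, the percentile is a simple "percent at or below" rank, and the normality tests are omitted.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Center to mean 0 and scale to unit variance (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def percentile_rank(values):
    """Percent of values less than or equal to each value."""
    n = len(values)
    return [100.0 * sum(other <= v for other in values) / n for v in values]


# Hypothetical branch-level scores for two questions grouped into one factor.
question_a = [3.1, 3.8, 4.2, 4.6]
question_b = [2.9, 3.5, 4.4, 4.0]

z_a, z_b = z_normalize(question_a), z_normalize(question_b)
# Assumed z-index for the factor: mean of its questions' z-scores, per branch.
z_index = [mean(pair) for pair in zip(z_a, z_b)]
percentiles = percentile_rank(z_index)
```

Z-normalizing first puts every question on the same scale, so no single question dominates a factor's index simply because its raw responses vary more.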
  • 43. Analysis of Hierarchy Claim
      Pearson Correlations: Existing Order
                    Facilities  Support  Value  Engagement  Impact  Involvement
      Facilities    1.00        0.53     0.65   0.37        0.25    0.04
      Support                   1.00     0.72   0.78        0.46    0.16
      Value                              1.00   0.65        0.55    0.25
      Engagement                                1.00        0.61    0.53
      Impact                                                1.00    0.53
      Involvement                                                   1.00
      n=425
      Pearson Correlations: Reverse Value and Support
                    Facilities  Value  Support  Engagement  Impact  Involvement
      Facilities    1.00        0.65   0.53     0.37        0.25    0.04
      Value                     1.00   0.72     0.65        0.55    0.25
      Support                          1.00     0.78        0.46    0.16
      Engagement                                1.00        0.61    0.53
      Impact                                                1.00    0.53
      Involvement                                                   1.00
  • 44. Summary from “Power of Habit” In 2000, for instance, two statisticians were hired by the YMCA—one of the nation’s largest nonprofit organizations— to use the powers of data-driven fortune-telling to make the world a healthier place. The YMCA has more than 2,600 branches in the United States, most of them gyms and community centers. About a decade ago, the organization’s leaders began worrying about how to stay competitive. They asked a social scientist and a mathematician—Bill Lazarus and Dean Abbott—for help. The two men gathered data from more than 150,000 YMCA member satisfaction surveys that had been collected over the years and started looking for patterns. At that point, the accepted wisdom among YMCA executives was that people wanted fancy exercise equipment and sparkling, modern facilities. The YMCA had spent millions of dollars .
  • 45. Summary from “Power of Habit”: YMCA Satisfaction Retention, the data said, was driven by emotional factors, such as whether employees knew members’ names or said hello when they walked in. People, it turns out, often go to the gym looking for a human connection, not a treadmill. If a member made a friend at the YMCA, they were much more likely to show up for workout sessions. In other words, people who join the YMCA have certain social habits. If the YMCA satisfied them, members were happy. So if the YMCA wanted to encourage people to exercise, it needed to take advantage of patterns that already existed, and teach employees to remember visitors’ names. It’s a variation of the lesson learned by Target and radio DJs: to sell a new habit—in this case exercise—wrap it in something that people already know and like, such as the instinct to go places where it’s easy to make friends. “We’re cracking the code on how to keep people at the gym,” Lazarus told me. “People want to visit places that satisfy their
  • 46. Conclusions
      • The “best” solutions are not always “good” solutions
          – There is often more than one way to approach a solution
          – It is often unclear even to the end customer what solution is best until the solution exists on paper
      • Interactions are the key (or why trees improve regression models)
          – Main effects are interesting, but deeper insights are gained from subgroups
      • Don’t give up
          – Matching data to decisions is difficult business
          – Get feedback; make sure the story the model tells is understood by decision-makers