Abbott a more transparent interpretation of health club surveys

Abbott - A More Transparent Interpretation of
Health Club Surveys
Salford Analytics and Data Mining Conference
2012
San Diego, CA
May 24, 2012
Dean Abbott
Abbott Analytics, Inc.
URL: http://www.abbottanalytics.com
Blog: http://abbottanalytics.blogspot.com
Twitter: @deanabb
Copyright © 2004-2012, Abbott Analytics, Inc. and Seer Analytics, Inc. All rights reserved.

About Seer Analytics

  Seer Analytics, LLC
– Founded in 2001, based in Tampa, FL
– Produce actionable intelligence to help clients make
smarter decisions and drive business performance.
– Reports embed sophisticated analytics yet are designed
to be accessible and meaningful to a non-technical
audience.
  Bill Lazarus, Founder, President and CEO
– BA from University of Wisconsin
– MA from University of Toronto,
– SM and PhD from Massachusetts Institute of Technology.


About Abbott Analytics
  Abbott Analytics
–  Founded in 1999, based in San Diego, CA
–  Dedicated to data mining consulting and training
  Principal: Dean Abbott
–  Applied Data Mining for 22+ years in
  Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud
Detection, Predictive Toxicology, Biological Risk Assessment
–  Course Instruction
  Public 1-, 2-, and 3-day Data Mining Courses
  Conference Tutorials and Workshops (next: ACM Data Mining
Bootcamp, November 13th in San Francisco)
–  Customized Training and Knowledge Transfer
  Data mining methodology (CRISP-DM)
  Training services for software products, including CART, Clementine,
Affinium Model, Insightful Miner


Talk Outline
  Health Club Survey Analysis Problem
Description
–  Overview of Survey Analysis and Approaches
  Solution 1: traditional approach
–  Statistical approach, fancy visualization
  Solution 2: solution aligning models to
business objectives
  Results and conclusions


Problem Setup: Member Survey

  Question:
–  What are the characteristics of members who indicated
the highest overall satisfaction with their Y?
  Data:
–  32,811 records containing survey answers
–  No demographic data except what was on survey
(marital status, children, age, gender)
  Approach:
–  Create supervised learning models with target variable
overall_satisfaction = 1


Some Notes
  It is very unusual to have so many records
–  The 31K responses were for one year
–  Responses are collected from across the country
  Seer tracks survey responses longitudinally
as well (not discussed in this talk)
–  Began collecting survey responses and storing in
a database in 2001 => 10 years of data
–  Seer has moved beyond modeling satisfaction to
include a more complete view of the YMCA
member experience


Data Preparation
  Begin with 57 candidate inputs to model
–  All survey questions are multiple choice
  Treated as categories, not numbers
  Typically 6 categories per question (answers 1-5, plus 0 for unknown)
  Unknown initially coded as 0
–  No text comments fields included as inputs to model
  Create new column for target variable
–  If overall_satisfaction = 1, variable value = 1,
otherwise, variable value = 0
  Data very clean with respect to NULLs
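
A minimal sketch of the target construction described on this slide. The answer list is made up for illustration; only the rule (top answer becomes 1, everything else 0, with 0 meaning unknown) comes from the talk.

```python
# Hypothetical answers to the overall satisfaction question.
# 1 is the top answer; 0 means unknown (as initially coded in the data).
overall_satisfaction = [1, 2, 5, 0, 1, 3]

# Target is 1 only when the member gave the top answer, otherwise 0.
target = [1 if answer == 1 else 0 for answer in overall_satisfaction]

# Share of respondents in the target class.
target_rate = sum(target) / len(target)
```

Unknown answers (0) fall into the non-target class here, which is one reading of "otherwise, variable value = 0".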


Member Survey Question Categories


Sampling and Target
Populations
  Begin with 32,811 responses
  Set aside about half for validation (not used during modeling):
16,379 records
–  These records will be used to provide final summaries of the
segments
  Q1 - Satisfaction = 1: 31%
–  86% have Recommend to friends = 1
  Q48 - Recommend to Friend = 1: 54%
–  49% have Overall Satisfaction = 1
–  26.0% have both Overall Satisfaction and Recommend to Friend
equal to 1
  Q32 - Likelihood to Renew = 1: 46%
  Implications
–  All three are interesting, but Recommend is so high already, not
much room for growth


Objective and Data Challenges

  Project Objective
–  Interpret results of survey for YMCA
  Challenges
–  Missing data (some questions either N/A or blank)
  Solution: Impute values that least affect information
communicated by question (not a mean or median!)
–  Question responses highly correlated with one another
  Multi-collinearity and interpretation of results problematic
  Must reduce dimensionality without losing interpretation of
results
  Solution: Factor analysis


Objective and Data Challenges
  Challenges, cont'd
–  Target variable
  Three questions pointed to the important actionable
information (related to how satisfied members were)
  No one question fully characterized the value of a member
  Solution: combine all three into a new index of
excellence (IOE)
–  IOE = additive weighted
sum of Q1, Q32, Q48
–  Reverse scale so higher
IOE is better
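
A rough sketch of how such an index could be computed. The talk does not give the weights, so the equal weights below are an assumption, as is the 1 (best) to 5 (worst) answer scale used for the reversal.

```python
# Hypothetical weights: the talk does not state the actual values.
WEIGHTS = {"Q1": 1.0, "Q32": 1.0, "Q48": 1.0}
SCALE_MAX = 5  # assumed scale: answers run from 1 (best) to 5 (worst)


def index_of_excellence(answers):
    """Weighted sum of reversed answers, so a higher IOE is better."""
    return sum(w * (SCALE_MAX + 1 - answers[q]) for q, w in WEIGHTS.items())


best = index_of_excellence({"Q1": 1, "Q32": 1, "Q48": 1})
worst = index_of_excellence({"Q1": 5, "Q32": 5, "Q48": 5})
```

With these assumed weights, a member who gives the top answer on all three questions gets the highest possible index.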


Why Factor Analysis?

  Traditional approach to survey analysis involves
the use of frequency counts, t-test, correlation,
and measures of central tendency.
  Factor analysis is a variable-reduction statistical
technique capable of probing underlying
relationships in variables
–  Santos, J.R.A., Clegg, M.D. (1999), "Factor analysis adds
new dimension to extension surveys", Journal of Extension,
http://www.joe.org/joe/1999october/rb6.php
  Our use of Factor Analysis
–  Traditional view: there is an underlying truth that exists,
and the survey is a redundant measure of that truth.
–  Our view: a factor is just a derived variable that reduces dimensionality


Factor Analysis: Key Factors

[Figure: two bar charts of top question loadings. Factor 1: loading values (0.00 to 1.00) for Q2 through Q12. Factor 2: loading values (0.00 to 0.80) for Q12 through Q20 and Q23.]


Member Survey Factor Analysis Loadings

Factor descriptions
  Factor 1: Staff Cares
  Factor 2: Facilities clean/safe
  Factor 3: Equipment
  Factor 4: Registration
  Factor 5: Condition of Specific Equipment
  Factor 6: Friendly / Competent Staff
  Factor 7: Financial Assistance
  Factor 8: Parking

Question  Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Factor 6  Factor 7  Factor 8
Q2        0.295     0.238     0.115     0.458     0.054     0.380     (0.016)   0.095
Q3        0.217     0.143     0.093     0.708     0.094     0.077     0.033     0.048
Q4        0.298     0.174     0.106     0.601     0.068     0.266     0.002     0.062
Q5        0.442     0.198     0.087     0.173     0.025     0.613     (0.021)   0.053
Q6        0.417     0.254     0.142     0.318     0.044     0.584     (0.008)   0.058
Q7        0.406     0.277     0.167     0.252     0.045     0.461     0.003     0.092
Q8        0.774     0.058     0.041     0.093     0.052     0.113     0.036     0.061
Q9        0.733     0.175     0.108     0.145     0.052     0.260     0.024     0.052
Q10       0.786     0.139     0.079     0.110     0.060     0.218     0.029     0.046
Q11       0.765     0.120     0.101     0.132     0.089     0.015     0.038     0.047
Q12       0.776     0.090     0.049     0.087     0.041     0.014     0.042     0.053
Q13       0.145     0.728     0.174     0.112     0.106     0.110     0.006     0.018
Q14       0.191     0.683     0.163     0.151     0.053     0.124     0.013     0.089
Q15       0.102     0.598     0.141     0.090     0.162     0.070     0.029     0.152
Q16       0.100     0.370     0.133     0.082     0.028     0.035     0.009     0.843
Q17       0.128     0.567     0.229     0.102     0.116     0.080     0.018     0.224
Q18       0.148     0.449     0.562     0.116     0.132     0.114     0.010     0.042
Q19       0.129     0.315     0.811     0.101     0.102     0.103     0.002     0.063
Q20       0.171     0.250     0.702     0.086     0.145     0.078     0.016     0.149
Q23       0.271     0.220     0.188     0.316     0.121     0.046     0.069     0.019
Q24       0.363     0.165     0.128     0.140     0.080     0.095     0.035     0.076


Reduce Variables using Regression
  Already beginning with only 13 variables
  Question: how many of these are useful predictors?
  Decided to retain 5 factors for final model

[Figure: Regression Rankings of Questions/Factors. x-axis: Question/Factor; y-axis: Regression Coefficient (0 to 0.6).]

Predictive Modeling Approach

[Flow diagram]
  50+ Survey Questions
–  Identify Key Questions: 3 questions with high association with target => 3 key questions
–  Factor Analysis: 10 factors => 10 factors, or 10 variables that loaded highest on each factor
  13 fields (3 key questions plus 10 factors or variables)
–  Regression Model: Find Significant Variables => down to 7
–  Regression Model: Find Significant Variables => variable ranks


One Further Note on Final
Regression Models
  Empirical comparison: Factors as inputs vs. Top-loading question in
factor as input
–  Top-loading or most interesting question on factor as representative
of that factor produced slightly better models
–  Use of top-loading question makes final model more easily
understood
–  This flies in the face of traditional theory, but worked better
operationally
  Final regression model contained these fields:


Key: Explaining Results
  Visualization shows key variables in survey associated with
"excellence", and performance metrics for each Y
–  How well did this Y do?
–  What is the change over last year's result?
–  This is a 45-dimensional visualization (don't ask
me to name them all!)
  Shows which attributes the Y needs to improve
to improve customer satisfaction.

[Figure: Drivers of Satisfaction chart. Labels: Current Year vs. Last Year, Prior year, Staff 1, Staff 2, relationships, goals, equipment, value, facility.]


Interpreting Index/Drivers of
Excellence Analysis

  All factors listed are important

  Position on x axis indicates relative importance of
factors in driving IOE

  Above the line = better than peers; below the line =
worse than peers

  Small dot indicates position last year

  R-Y-G indicates magnitude of change from previous year

  Size of bubble indicates magnitude of score on factor


[Figure: Drivers of Excellence, May 2003, Gardena YMCA. 2003 vs. 2002 relative performance (Better / Same / Worse) against the peer average, with prior year (2002) positions; x-axis: Importance. Factors shown: Staff cares, Meet fitness goals, Staff competence, Feel welcome, Value, Equipment, Facilities. © 2003 Seer Analytics, LLC, Tampa, FL 33602]


[Figure: Drivers of Excellence, May 2003, Montebello YMCA. 2003 vs. 2002 relative performance (Better / Same / Worse) against the peer average, with prior year (2002) positions; x-axis: Importance. Factors shown: Equipment, Staff cares, Facilities, Feel welcome, Staff competence, Meet fitness goals, Value. © 2003 Seer Analytics, LLC, Tampa, FL 33602]


[Figure: Drivers of Excellence, May 2003, Torrance South YMCA. 2003 vs. 2002 relative performance (Better / Same / Worse) against the peer average, with prior year (2002) positions; x-axis: Importance. Factors shown: Equipment, Staff competence, Feel welcome, Value, Staff cares, Facilities, Meet fitness goals. © 2003 Seer Analytics, LLC, Tampa, FL 33602]


I love this visualization!

[Figure: Drivers of Excellence, May 2003, Culver YMCA. 2003 vs. 2002 relative performance (Better / Same / Worse) against the peer average, with prior year (2002) positions; x-axis: Importance. Factors shown: Feel welcome, Staff cares, Value, Facilities, Staff competence, Meet fitness goals, Equipment. © 2003 Seer Analytics, LLC, Tampa, FL 33602]


What's the Problem with That?
  Customer was not interested in techno
solutions
  Customer was interested in what actions could be
taken as a result of the data mining models
–  Which characteristics are most correlated with best
customers?
  What do they like and dislike about the Y?
  Is it equipment? relationships? facility? staff?
–  Show key contributors, how each Y compared with other
Y locations, and if Y is improving


So What's the Problem with
That? (cont'd)
  Regression, Neural Networks are global
estimators
–  They operate over the entire data space
–  Descriptors of Regression represent average influence
–  Neither technique provides explicit localized
characteristics
  Customer would like actionable analytics
–  Clear characteristics of subgroups
–  Different strategies for subgroups
  Conclusion: In Round 2, use another approach


Who cares
about satisfaction?
  Issue: The YMCA is a cause-driven charity
  It's not about running satisfactory gyms
  It's about improving lives and building communities
  Question: How can the member survey
data help Ys achieve mission goals?
  Answer: Develop a tool that is:
  Grounded in solid social science
  Accessible/understandable
  Diagnostic/predictive
  A driver of performance and change


Satisfaction Model Performance

[Figure: gains chart. x-axis: % Population (0 to 100); y-axis: % Class (0 to 100).]

Misclassification for Learn Data

Class  N Cases  N Mis-Classed  Pct Error  Cost
0      33,220   8,178          24.62      0.25
1      14,845   2,622          17.66      0.18

Node  Cases       % of Node   % Tgt.  Cum %       Cum %   % Pop   Cases    Cum    Lift
      Tgt. Class  Tgt. Class  Class   Tgt. Class  Pop             in Node  lift   Pop
1     7,289       72.788      49.101  49.101      20.834  20.834  10,014   2.357  2.357
9     904         51.984      6.09    55.19       24.452  3.618   1,739    2.257  1.683
2     2,317       50.612      15.608  70.798      33.977  9.525   4,578    2.084  1.639
3     471         46.45       3.173   73.971      36.087  2.11    1,014    2.05   1.504
10    431         43.186      2.903   76.874      38.163  2.076   998      2.014  1.398
4     349         40.819      2.351   79.225      39.942  1.779   855      1.984  1.322
12    462         38.404      3.112   82.337      42.445  2.503   1,203    1.94   1.243
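
The lift columns can be checked by hand. The sketch below recomputes lift and cumulative lift for the first three nodes from the case counts in the tables above; the variable names are illustrative and not part of the CART output.

```python
# Class counts from the misclassification table (learn data).
total_cases = 33_220 + 14_845          # class 0 + class 1
base_rate = 14_845 / total_cases       # overall share of highly satisfied

# (node, target-class cases, cases in node) for the first three rows.
nodes = [(1, 7_289, 10_014), (9, 904, 1_739), (2, 2_317, 4_578)]

cum_target = 0
cum_cases = 0
rows = []
for node, target_cases, node_cases in nodes:
    cum_target += target_cases
    cum_cases += node_cases
    # Lift: target rate within the node relative to the overall base rate.
    lift = (target_cases / node_cases) / base_rate
    # Cumulative lift: target rate across all nodes so far, relative to base rate.
    cum_lift = (cum_target / cum_cases) / base_rate
    rows.append((node, lift, cum_lift))
```

Lift for a node is its share of highly satisfied members divided by the overall share, so node 1 (72.8% against roughly 30.9% overall) has a lift of about 2.36.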

Satisfaction Model from CART®

[Figure: CART tree with root split on Q25. Split variables shown: Q25, Q13, Q22, Q6, Q2, Q31, Q19, Q36, Q34. Terminal nodes labeled: 1, 2, 3, 8, 9, 10, 15.]

•  Q25: Feel Welcome
–  Surrogate: Q24 (can relate to other members)
–  Q13: Facilities are clean
–  Surrogate: Q14 (Facilities safe and secure)
•  Q22: Value for Money
–  Surrogates: Q21 (convenient schedule) and Q23 (quality classes/programs)
–  Q6: Staff Competent
–  Surrogates: Q5 (friendly staff) and Q7 (enough staff)


Member Satisfaction Model:
Key Rules

Terminal Node 1
•  10,014 surveys (20.8%),
•  7,289 highly satisfied (72.8%),
•  49% of all highly satisfied
RULE: If strongly agree that facilities are clean and strongly agree that member feels welcome, then highly satisfied

Terminal Node 2
•  4,578 surveys (9.5%),
•  2,317 highly satisfied (50.6%),
•  15.6% of all highly satisfied
RULE: If strongly agree that feel welcome and strongly agree Y is value for money, even if don't strongly agree facilities are clean, then highly satisfied

Terminal Node 3
•  1,739 surveys (3.6%),
•  904 highly satisfied (52.0%),
•  6.1% of all highly satisfied
RULE: If strongly agree that Y has the right equipment and strongly agree that feel welcome, and somewhat agree that facilities are clean, even though don't strongly feel Y is good value for the money, then highly satisfied
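
Rules like these translate directly into simple predicates. The sketch below encodes the first two rules, using the question numbers from the CART slide (Q13: facilities clean, Q22: value for money, Q25: feel welcome). The coding of "strongly agree" as 1 and the sample surveys are assumptions for illustration.

```python
STRONGLY_AGREE = 1  # assumed coding: 1 = strongly agree


def terminal_node_1(survey):
    """Facilities clean (Q13) and feel welcome (Q25), both strongly agreed."""
    return survey["Q13"] == STRONGLY_AGREE and survey["Q25"] == STRONGLY_AGREE


def terminal_node_2(survey):
    """Feel welcome (Q25) and value for money (Q22), but Q13 not strongly agreed."""
    return (survey["Q25"] == STRONGLY_AGREE
            and survey["Q22"] == STRONGLY_AGREE
            and survey["Q13"] != STRONGLY_AGREE)


# Hypothetical survey records.
surveys = [
    {"Q13": 1, "Q22": 3, "Q25": 1},  # clean and welcome
    {"Q13": 2, "Q22": 1, "Q25": 1},  # welcome and good value, not top-rated on clean
    {"Q13": 2, "Q22": 2, "Q25": 3},  # neither rule applies
]
matches = [(terminal_node_1(s), terminal_node_2(s)) for s in surveys]
```

Because the rules come from one tree, each survey lands in exactly one terminal node, which is what makes the segments easy to act on separately.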

Other Rules

Terminal Node 10
•  998 surveys (2.1%),
•  431 highly satisfied (43.2%),
•  2.9% of all highly satisfied
RULE (weakest of top 5): If strongly agree that loyal to Y and strongly agree that facilities are clean, even though don't strongly agree that feel welcome nor strongly agree that staff is competent, then highly satisfied

Terminal Node 9
•  1,739 surveys (3.6%),
•  904 highly satisfied (52.0%),
•  6.1% of all highly satisfied
RULE: If strongly agree that facilities are clean, and strongly agree that staff is competent, even if don't strongly agree feel welcome, then highly satisfied


Unsatisfied Rules

•  19,323 surveys (40.2%),
•  1,231 highly satisfied (6.4%),
•  8.3% of highly satisfied
•  58.2% of all not highly satisfied
RULE: If don't strongly agree that staff is efficient and don't strongly agree that feel welcome, and don't strongly agree that the facilities are clean, then member isn't highly satisfied

•  1,364 surveys (2.8%),
•  141 highly satisfied (10.3%),
•  1.0% of all highly satisfied
RULE: If don't strongly agree that facilities are clean and don't strongly agree that the Y is good value for the money, even though strongly agree that feel welcome, member isn't highly satisfied.

Recommend to Friend Model
from CART®

[Figure: CART tree with root split on Q31. Split variables shown: Q31, Q25, Q22, Q44. Terminal nodes labeled: 1, 2, 4, 5, 7.]

  Q31: Loyal
–  Surrogates: Q25, Q44, Q22, Q24 (can relate to other members)
  Q25: Feel Welcome
–  Surrogates: Q24, Q5 (friendly staff)
  Q22: Value for Money
–  Surrogates: Q23 (quality classes/programs)
  Q44: Helps meet fitness goals

Recommend to Friend Model:
Key Rules

•  13,678 surveys (28.5%),
•  12,122 recommend (88.6%),
•  47.0% of all strong recommends
RULE: If strongly agree that loyal to Y and strongly agree that feel welcome, then strongly agree that will recommend to friend

•  6,637 surveys (13.8%),
•  4,744 recommend (71.5%),
•  18.4% of all strong recommends
RULE: If strongly agree that loyal to Y and agree that Y is a good value for the money, even though don't strongly agree feel welcome, strongly agree will recommend to friend.

•  2,628 surveys (5.5%),
•  1,932 recommend (73.5%),
•  6.1% of all strong recommends
RULE: If strongly agree that Y is a good value for the money and strongly agree that feel welcome, even though not strongly loyal to Y, strongly agree will recommend to friend


Recommend to Friend Model:
Other Rules

•  21,865 surveys (45.5%),
•  5,461 highly recommend (25.0%),
•  21.2% of all highly recommend
RULE: If don't strongly agree that loyal to Y and don't strongly agree that Y is value for the money, then will not highly recommend to a friend

•  814 surveys (1.7%),
•  509 highly recommend (62.5%),
•  2.0% of all highly recommend
RULE: If strongly agree that Y is good value for the money, and strongly agree that Y helps meet fitness goals, even though not strongly loyal to the Y and don't strongly feel welcome, will highly recommend to a friend

Intend to Renew Model from
CART®

[Figure: CART tree with root split on Q25. Split variables shown: Q25, Q44, Q22, Q47, Q27. Terminal nodes labeled: 1, 2, 3, 5, 7, 8.]

•  Q25: Feel Welcome
–  Surrogate: Q24 (can relate to other members)
–  Q44: Helps meet fitness goals
–  Surrogate: Q51 (visit frequency)
•  Q22: Value for Money
–  Left split surrogates: Q21 (convenient schedule) and Q23 (quality classes/programs)
–  Right split surrogates: Q25 (feel welcome=2 or 3)
–  Q47: Would be donor
–  Surrogate: Q45A (have been donor)
–  Q27: Feel sense of belonging
–  Surrogates: Q25, Q24

Intend to Renew Model:
Key Rules

•  13,397 surveys (27.9%),
•  9,903 renew (73.9%),
•  48.4% of all intend to renew
RULE: If strongly agree that feel welcome and strongly agree that Y helps meet fitness goals, then strongly agree that intend to renew

•  3,051 surveys (6.3%),
•  1,823 renew (59.8%),
•  8.9% of all intend to renew
RULE: If strongly agree Y is good value for the money and strongly agree that feel welcome, even if don't strongly agree that Y helps meet fitness goals, then strongly agree that intend to renew

•  5,704 surveys (11.9%),
•  3,201 renew (56.1%),
•  15.6% of all intend to renew
RULE: If strongly agree that feel sense of belonging, and agree that Y is value for the money, and strongly agree that Y helps meet fitness goals, even if don't feel welcome, then strongly agree intend to renew.

Intend to Renew Model:
Other Rules

•  18,547 surveys (38.6%),
•  3,130 strongly intend to renew (16.9%),
•  15.3% of all strongly intend to renew
RULE: If don't strongly agree that feel welcome and don't strongly agree that Y helps meet fitness goals, then don't strongly agree that intend to renew

•  2,178 surveys (4.5%),
•  578 strongly intend to renew (26.5%),
•  2.8% of all strongly intend to renew
RULE: If don't strongly agree that Y is good value for money and don't strongly agree that feel welcome, even if strongly agree Y helps meet fitness goals, don't strongly agree that intend to renew.


Summary of Key Questions in
Models
  Feel Welcome was root splitter (or surrogate) for
each model
  Satisfaction is different than Recommend and Renew
in other respects
–  Helps meet fitness goals was in Recommend and
Renew models, but not satisfaction
–  Facilities clean only in satisfaction model



Key Differences Between
Targets, Put Another Way
  Satisfaction
–  Feel Welcome
–  Clean Facility
  Renewal
–  Feel Welcome
–  Y Helps Meet Fitness Goals, Value for $$
  Recommend to Friend
–  Feel Welcome
–  Loyal to Y, Value for $$


Top Terminal Nodes Comprise
More than 70% of Hits


Subsequent Results

Measure              2002  2009  Percent Improvement
Satisfaction         31%   41%   32%
Recommend to Friend  54%   57%   6%
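
The improvement column is a relative change, not a difference in percentage points. A quick check of the arithmetic, with an illustrative helper name:

```python
def relative_improvement(before, after):
    """Percent change relative to the starting value."""
    return 100.0 * (after - before) / before


# (2002, 2009) top-box percentages from the table.
results = {"Satisfaction": (31, 41), "Recommend to Friend": (54, 57)}

improvement = {
    name: round(relative_improvement(before, after))
    for name, (before, after) in results.items()
}
```

Satisfaction rose 10 points on a base of 31, which is the 32% shown; Recommend rose 3 points on a base of 54, which rounds to 6%.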

  Rules from Models are still in use today
  Trees and Factors can help reduce # questions in
survey
–  Employee ruleset (using same methodology) resulted
in a new short-form survey using only questions in
the splits
–  Not yet implemented in Member survey

Index Construction and Scaling

  Begin with Factor Analysis
  Cluster attribute groupings to be managerially meaningful
  Z-normalize the variables, cast all in units of variance
  Run tests for deviation from Standard Normal by variable and
factor
  Create z-index for each factor
  Re-scale to nation-wide percentile
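
A simplified sketch of the scaling steps, assuming branch-level mean scores for two questions already grouped into one factor. The data are invented, the factor z-index is taken here as the plain average of the question z-scores (the talk does not say how it is combined), and the clustering and normality tests are omitted.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-normalize: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def percentile_ranks(values):
    """Percent of branches scoring at or below each value."""
    n = len(values)
    return [100.0 * sum(other <= v for other in values) / n for v in values]


# Hypothetical mean scores for four branches on two questions in one factor.
q_a = [4.1, 3.8, 4.5, 3.2]
q_b = [3.9, 3.5, 4.4, 3.0]

# Factor z-index: average of the z-normalized questions (an assumption).
factor_z = [mean(pair) for pair in zip(z_scores(q_a), z_scores(q_b))]

# Re-scale to a percentile across all branches.
factor_pct = percentile_ranks(factor_z)
```

Putting every factor on a percentile scale lets a branch manager read each score the same way: where this Y stands relative to all others.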

Analysis of Hierarchy Claim

Pearson Correlations: Existing Order (n=425)

             Facilities  Support  Value  Engagement  Impact  Involvement
Facilities   1.00        0.53     0.65   0.37        0.25    0.04
Support                  1.00     0.72   0.78        0.46    0.16
Value                             1.00   0.65        0.55    0.25
Engagement                               1.00        0.61    0.53
Impact                                               1.00    0.53
Involvement                                                  1.00

Pearson Correlations: Reverse Value and Support

             Facilities  Value  Support  Engagement  Impact  Involvement
Facilities   1.00        0.65   0.53     0.37        0.25    0.04
Value                    1.00   0.72     0.65        0.55    0.25
Support                         1.00     0.78        0.46    0.16
Engagement                               1.00        0.61    0.53
Impact                                               1.00    0.53
Involvement                                                  1.00

Summary from “Power of Habit”

In 2000, for instance, two statisticians were hired by the
YMCA—one of the nation’s largest nonprofit organizations—
to use the powers of data-driven fortune-telling to make
the world a healthier place. The YMCA has more than 2,600
branches in the United States, most of them gyms and
community centers. About a decade ago, the organization’s
leaders began worrying about how to stay competitive.
They asked a social scientist and a mathematician—Bill
Lazarus and Dean Abbott—for help.

The two men gathered data from more than 150,000 YMCA
member satisfaction surveys that had been collected over
the years and started looking for patterns. At that point,
the accepted wisdom among YMCA executives was that
people wanted fancy exercise equipment and sparkling,
modern facilities. The YMCA had spent millions of dollars .

Summary from “Power of
Habit”: YMCA Satisfaction
Retention, the data said, was driven by emotional factors, such as
whether employees knew members’ names or said hello when
they walked in. People, it turns out, often go to the gym looking
for a human connection, not a treadmill. If a member made a
friend at the YMCA, they were much more likely to show up for
workout sessions. In other words, people who join the YMCA have
certain social habits. If the YMCA satisfied them, members were
happy. So if the YMCA wanted to encourage people to exercise, it
needed to take advantage of patterns that already existed, and
teach employees to remember visitors’ names. It’s a variation of
the lesson learned by Target and radio DJs: to sell a new habit—in
this case exercise—wrap it in something that people already know
and like, such as the instinct to go places where it’s easy to make
friends.

“We’re cracking the code on how to keep people at the gym,”
Lazarus told me. “People want to visit places that satisfy their

Conclusions
  The best solutions are not always good
solutions
–  There is often more than one way to approach a solution
–  It is often unclear even to the end customer what solution
is best until the solution exists on paper
  Interactions are the Key (or why trees improve
regression models)
–  Main effects are interesting, but deeper insights are gained
from subgroups
  Don't give up
–  Matching data to decisions is difficult business
–  Get feedback; make sure the story the model tells is
understood by decision-makers

