Emetrics - Oct 19 2011 - New York - X channel optimisation
Text Mining Summit 2009 V4 (For General Presentations)
1. Harris Interactive Text Analytics
Strategic Insight and Text Analytics
2. Harris Interactive History
• Harris Poll – longest running US proprietary
poll (since 1963)
• Leading pioneer for online panels and online
surveys in 1990s
• 2005 – ranked 13th largest market research
firm worldwide (ESOMAR)
3. Qualitative and Quantitative Streams
• Qualitative: actionable prescription needs rich detail
• Quantitative: projectable generalizations require methodical validation and significance testing.
4. Why Text Analytics is becoming Critical
• Provides a much needed validation of quantitative analysis, and can
be used to help understand what respondents mean when they
provide a scaled response
• Offers a more contextualized analysis where emerging themes are
less driven by a priori research categories and concepts
• Classification can be more sophisticated and a less taxing method of
building consistent thematic structures
• Catalogued text can be easily organized in hierarchies and
interlocking relationships, which more accurately reflect how
respondents organize their conceptual world
• Sentiment can be associated with thematically organized data
and provide insight into dispositions that influence stated or
observed behavior
5. Scalability and Replication
• Text analytics software offers lower costs for
handling large volumes of content-rich data
• Natural language processing and domain-specific
training sets give reliable replication
and updating of analysis
• Iterative testing of thematic structures ensures
systematic validation of themes and their
analytical relevance.
6. Text Analytics Output
• Classification:
– Thematic Structure
– Links to detailed comments
• Volume:
– Count for Documents, Sentences and Authors
• Sentiment:
– Positive and Negative Content
7. Case Study
B2B Survey with Top Global Accounts
8. Context
• Data Source
– ~5,000 B2B interviews annually (multiple interviews per account)
– ~120 structured questions & ~5 open comments per interview
• Prior analysis:
– Main exploration and analysis dependent on structured variables
– Statistical approach limited to simple correlations and
comparison of average scores
– Use comments as anecdotes to support a priori hypotheses
about account issues
9. Key Issues
• Structured variables not providing actionable
insights
• Unstructured comments too difficult to theme
effectively
10. Issues with Structured Variables: Coefficient Graph
[Figure: Comparing Multivariate and Bivariate Analysis of Drivers of Loyalty. Scatter plot of Coefficients for 2007 (x-axis) against Coefficients for 2008 (y-axis), both scaled from -0.3 to 0.8. Series: ZeroOrder CORR; CATREG Optimal Scale; CR Scaled Zero Order Corr.]
11. Issue of Thematic Complexity
• No consistent story emerged in original analysis, despite having fewer
themes
• Comments were presented anecdotally and loosely mapped to a priori
groups of structured questions
                                              From Original Reports   Harris Classification
Number of themes identified                            44                      65
Number of insignificant or redundant themes            16                       0
Number of unique themes                                28                      65
12. Overcoming Classification Redundancy
Matching Unique and Redundant Themes from Original Analysis
Text Analytics Child Categories | Matching themes from the original report (Sections 1-4)
Account Mgmt - Communication
Account Mgmt - Knowledge & Expertise | A/BR: Account Teams Viewed As Trusted Advisors Adding Strategic Value | AM: Acct Mgr Adds Strategic Value
Account Mgmt - Local
Account Mgmt - Responsiveness
Account Mgmt - Quality
Account Mgmt - Support | RV: Empower Acct Mgr / Acct Team Support
Account Mgmt - Problem Resolution
Consulting Srvc - Value
Consulting Srvc - Offerings
Consulting Srvc - Knowledge & Expertise
Contracts - Post-Sale
Contracts - Commitment
Contracts - Quality
Contracts - Value
Fulfillment - Commitment
Fulfillment - Speed
Hardware - Computers / Laptops
Hardware - Improvements
Hardware - Innovation
Hardware - Printers
Hardware - Quality & Reliability | PQR: Hardware Quality / Reliability
Hardware - Server
Hardware - Value
Hardware - OS
Hardware - Virtualization
Relationship Value - Executive Engagement | RV: Add Value / More Executive Engagement | ABR: Effective Executive Engagement
Relationship Value - Local
Relationship Value - Partnering
Relationship Value - Communicate
Relationship Value - Long-term View
Service & Technical Support - Flexibility & Responsiveness | S&S: Efficiency & Accessibility | S&S: Prompt
Service & Technical Support - Quality
Service & Technical Support - Proactive
Service & Technical Support - Local
Service & Technical Support - Price
Software - Capabilities
Software - Quality & Reliability
Software - Improvements
Software - Value
Solutions - Quality & Reliability | PQR: Software Quality / Solutions
Solutions - Value
Solutions - Comprehensive
Solutions - Innovation
Solutions - Adaptive
Solutions - Price
Gen - Pricing | Improve TCO / Reduce Prices | Helping Manage TCO
Gen - Value | PCV: Solutions Offer Compelling ROI | PCV: Services Deliver Measurable Value
Integrated Solutions - Hardware
Integrated Solutions - Software
Integrated Solutions - Hardware & Software
Integrated Solutions - General | IS: Integrated Solutions Reliability | IS: Deliver Promised Benefits | IS: More Integrated Solutions | IS: Seamless Integration of Solutions
Ease of Doing Business - Fewer Silos / Less Bureaucracy | EODB: Fewer Silos / Less Bureaucracy | EODB: Act Cohesively | EODB: Global Consistency
Ease of Doing Business - Ownership / Follow-through | EODB: Ownership / Follow-through | RV: Proactively Propose Solutions
Ease of Doing Business - Flexibility & Responsiveness | EODB: Flexibility and Responsiveness
Ease of Doing Business - Simplify Processes (Pricing, Quoting, Ordering, Invoicing, Contracts, Terms) | EODB: Simplify Processes (Pricing, Quoting, Ordering, Invoicing) | EODB: T&Cs; Quote TAT; Invoice Clarity | EODB: Contract Process
Understanding Business Needs | UBN: Business Understanding | UBN: Ability to Understand Critical Success Factors / Priorities
Customer Communications & Education - Provide Roadmaps | Communications: Roadmaps Enable Planning
Customer Communications & Education - Provide Training, Seminars, Education
Customer Communications & Education - General | Communications: Effective Communications
Depth & Breadth of Technology Portfolio | TPK: Breadth & Depth of Technology Portfolio
Technology Experience & Expertise | TPK: Knowledge, Expertise & Experience
Brand, Presence & Credibility | TPP: Brand, Presence and Credibility
Global Coverage - Account Mgmt
Global Coverage - Solutions | GC: Address Global Needs | GC: Global Capabilities
Global Coverage - Support & Technical Service
13. Analysis Plan to Address Issues
• Big Picture
– Use Heat Maps
– Identify Leading Themes
• Linkages between Themes
– Co-occurrence
– Narrative Structure of Themes
• Statistical Validation
– CHAID
14. Step One: Effective Classification
Creating a thematic structure from case study data is the first step
15. Key to Success: Detailed Classification
Classification Tree: Case Study Example
• Integrated Solutions
• Understand Specific Business Needs
• Ease of Doing Business
• Breadth of Technology Portfolio
• Experience and Expertise
• Brand Credibility
See Training Set Example
18. Step Two: Heat Maps on Volume and Sentiment
Building a Market Story from a Thematic Structure
19. Why Heat Maps?
• Heat maps provide a simple method of identifying actions that are directly related to the thematic structure of customer discourse.
• Text analytics provides a method of measuring both the current state of sentiment and volume of mentions.
• The thematic integrity of the classification scheme permits year over year analysis of changes in sentiment and volume.
• Combining Current (e.g., 2008) and Emerging (i.e., year over year changes) patterns suggests ways to address marketing, brand and product positioning.
[Figure: two heat maps, Current and Emerging, each plotting Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi). Note: the Emerging map shows year over year change.]
20. How to interpret current state Heat Maps
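[Figure: current state heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), with four labelled regions]
• Latent Themes (Hi sentiment, Low volume): requires top of mind uplift
• Brand/Position Core Attractors (Hi sentiment, Hi volume): Leverage – continue to build to maintain market position
• Brand Appendages (Low sentiment, Low volume): Requires top of mind and sentiment uplift
• Brand/Position Detractors (Low sentiment, Hi volume): need to address negative spin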
21. Heat Map Criteria
[Figure: two 3 x 3 heat map grids, "Allocation of Themes" and "Box Names", with Sentiment on the rows and Volume on the columns]
Sentiment            Volume Low (V<.15)   Volume Med (V ~.15-.85)   Volume Hi (V>.85)
Hi (S>.85)                   1                       2                       3
Med (S ~.15-.85)             4                       5                       6
Low (S<.15)                  7                       8                       9
Criteria for theme assignment use the 15th and 85th percentiles as parameters.
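The box criteria on this slide can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation: the function names, the percentile-rank definition (share of other themes scoring strictly lower) and the sample themes T0 to T9 are all assumptions. Only the .15 and .85 cut-offs and the 1-9 box numbering come from the slide.

```python
from bisect import bisect_left

LOW_CUT, HIGH_CUT = 0.15, 0.85  # 15th and 85th percentiles, as stated on the slide


def percentile_ranks(values):
    """Rank each value in [0, 1] among all themes (assumed definition of percentile)."""
    ordered = sorted(values)
    top = max(len(ordered) - 1, 1)
    # Share of the other themes that score strictly lower; ties share a rank.
    return [bisect_left(ordered, v) / top for v in values]


def band(rank):
    """Map a percentile rank to the Low / Med / Hi band."""
    if rank < LOW_CUT:
        return "Low"
    if rank > HIGH_CUT:
        return "Hi"
    return "Med"


def box_number(sentiment_band, volume_band):
    """Box 1-9: rows run Hi -> Low sentiment, columns run Low -> Hi volume."""
    row = ["Hi", "Med", "Low"].index(sentiment_band)
    col = ["Low", "Med", "Hi"].index(volume_band)
    return row * 3 + col + 1


def assign_boxes(themes):
    """themes: dict of name -> (average sentiment, volume). Returns name -> box."""
    names = list(themes)
    s_ranks = percentile_ranks([themes[n][0] for n in names])
    v_ranks = percentile_ranks([themes[n][1] for n in names])
    return {n: box_number(band(s), band(v)) for n, s, v in zip(names, s_ranks, v_ranks)}


# Hypothetical themes: sentiment and volume both rise from T0 to T9.
sample_themes = {f"T{i}": (i / 10, i * 5) for i in range(10)}
boxes = assign_boxes(sample_themes)
print(boxes)
```

With ten themes, only the two lowest and two highest on each dimension fall outside the Med band, which is why most themes land in box 5 under these cut-offs.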
22. Interpretation: Emerging Problems
Emerging Problems
• Emerging Pattern E
– Sentiment (Low) means large downward change in average
– Volume (Hi) means large increase in number of mentions
• Current State C
– Sentiment (Med) means mix of positive and negative dispositions
– Volume (Med) average number of mentions
• Interpretation
– What are currently moderate issues may become critical issues by next year if trend continues
– There is a frequent top of mind mention that requires remedial action not messaging
[Figure: example heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), marking the current state C and the emerging pattern E]
23. Interpretation: New Positioning
Develop New Positioning
• Emerging Pattern E
– Sentiment (Hi) means large upward change in average
– Volume (Hi) means large increase in number of mentions
• Current State C
– Sentiment (Med-Hi) means mix of positive and negative dispositions
– Volume (Med-Hi) above average number of mentions
• Interpretation
– Upward trends in cells 1, 5 & 6 mean new positive themes are emerging that tactical messaging can reinforce and connect to current brand anchors
[Figure: example heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), marking the current state C and the emerging pattern E, with cells 1, 5 and 6 labelled]
24. Interpretation: Diminishing Relevance
Diminishing Relevance
• Emerging Pattern E
– Sentiment (Lo) means large downward change in average
– Volume (Lo) means large decrease in number of mentions
• Current State C
– Sentiment (Hi) highly positive dispositions
– Volume (Med) average number of mentions
• Interpretation
– Downward trend in both average sentiment and number of mentions, which means that while the positive disposition is eroding, the relevance of the theme may become less significant over time
– Shoring up this characteristic may have a low ROI, look for positive changes in themes from 1, 4, 5 & 6.
[Figure: example heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), marking the current state C and the emerging pattern E, with cells 1, 4, 5 and 6 labelled]
26. Total Market: Hyperlink Heat Map
Big Picture
• Safe and reliable but stodgy and not effectively improving products and services
• Has effective account management which represents value but can be seen as maintaining the status quo
• Strategic partnering is hampered by a service orientation that does not effectively communicate an improvement-consultative model
[Figure: Current State 2008 heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), cells numbered 1-9]
27. Current Overall Client Experience
Strengths
• Brand – broad portfolio & consistent credibility
• Strong account management
– Responsive, brings value & expertise
• Delivering quality and reliability
– Hardware and software
– Solutions
– Service and Technical Support
Threats
• Weak value proposition
– Software, Consulting & Contracts
• Failure to improve through innovation and product evolution
– Lagging developments in software and hardware
– Weak business follow-through
– Problems communicating visionary roadmaps and functionality
28. Understanding Interleafing Strengths
Quality and Reliability
• Co-occurrence Analysis
– Quality for solutions, support, SW and HW are integrally linked
– Product quality is reinforced by strong account management
– Product quality has concrete underpinnings e.g., HW quality shows links to OS issues, which in turn are tied to servers and emerging virtualization issues
Account Management
• Co-occurrence Analysis
– Account responsiveness is a function of communication and problem resolution
– Account management quality is tied to support and responsiveness
– Account capabilities have connections to local account support and service support
29. Buffered by Account Management
[Figure: Current State 2008 heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), cells numbered 1-9, plotting the Account Mgt Themes: Responsiveness, Quality, Knowledge & Expertise, Local Relationship Value, Support, Communication, Local, Problem Resolution]
30. Understanding Interleafing Threats
Building a Sense of Value
• Co-occurrences
– Value disassociated from quality and reliability
– More narrowly focused or linked to specific products and services
– HW value co-occurs with SW value suggesting mutual reinforcement
– HW value aligns with discussion of servers and HW improvements
– Positive server position does not offset poor position of HW improvement, which undermines HW value
Ongoing Improvement
• Co-occurrences
– Disconnect between innovation and improvement in SW
– Co-occurrence with integrated solutions and SW capabilities is not improving view of SW improvement
– Both improvement and innovation dimensions are poorly positioned
– HW improvement is linked to a highly commoditized technology (i.e., printers) limiting uplift
31. Stunted Improvement/Innovation Issues
[Figure: Current State 2008 heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), cells numbered 1-9, plotting HW Innovation, Solution Innovation, SW Improvement and HW Improvement]
33. Repurchase: Top box and Bottom Box
New Insights
• High volume positive themes of top box clients (similar to general market)
– Very positive reaction to account management: responsiveness, support, quality, expertise
– Very positive perceptions of quality and reliability
• High volume negative themes of bottom box clients (very different pain points)
– Lack simplified processes and follow-through (EOB)
– Reactive service model
– Lack executive engagement and consultative value
– Fail to emphasize improvement
34. Hyperlink Heat Map: Repurchase Top Box
[Figure: Current State 2008 heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
35. Hyperlink Heat Map: Repurchase Bottom Box
[Figure: Current State 2008 heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
37. Themes with Positive Uplift in Volume and Sentiment
• Large increases in volume and positive sentiment
– Brand presence and global solutions
– SW quality
– HW virtualization
• Large increases in volume, but large decreases in positive sentiment
– Integrated Solutions - Software
– Ease of Doing Business - Simplify Processes (Pricing, Quoting,
Ordering, Invoicing, Contracts, Terms)
[Figure: callouts "Increasing Chatter", "Increasingly Positive" and "Increasingly Negative"]
38. Hyperlink Heat Map: Total Market Year Over Year Changes
[Figure: Emerging Trends heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
40. Enhancement and Positioning Opportunities
Themes: Integrated Solutions – HW; S&T Support - Proactive; HW - Virtualization; Acct Mgt - Local
• Local account management has latent potential on small scale
• Positive development for core strength in S&T support
• Look to key market drivers like virtualization to strengthen HW and reinforce positive shifts in integrated solutions
[Figure: Trend Diagnostics heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), cells numbered 1-9, with markers A and H&V]
41. Immediate Remedial Action Required
Themes: HW – Innovation (1); HW – Improvements (2); Integrated Solutions – SW (1); SW – Improvement (2); EDB – Simplify Processes
• SW (improvement) is not showing signs of emerging from the red zone
• EDB (simplify) and Integrated Solutions (SW) show potential entrenchment in the red zone
• HW (innovation and improvement) see erosion in their moderate, trending to red zone
[Figure: Trend Diagnostics heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi), cells numbered 1-9, with markers E, H1, H2, S1 and S2]
42. Heat Maps: 2008 Account Type
Few Clients AND Many Clients
43. Hyperlink Heat Map: 2008 Account Type:
Linking Themes to Structured Variables
One to Few
[Figure: heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
44. Hyperlink Heat Map: 2008 Account Type:
Linking Themes to Structured Variables
One to Many
[Figure: heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
46. Hyperlink Heat Map for Relationship 2008:
Linking Themes to Structured Variables
Strategic Partner
[Figure: heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
47. Hyperlink Heat Map for Relationship 2008:
Linking Themes to Structured Variables
Solution Provider
[Figure: heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
48. Hyperlink Heat Map for Relationship 2008:
Linking Themes to Structured Variables
Transactional
[Figure: heat map, Sentiment (Low/Med/Hi) against Volume (Low/Med/Hi)]
49. Predictive Trees
Statistical validation of core product and service thematic
relationships to sentiment for total market
50. CHAID Tree for Sentiment
Validating thematic patterns and sentiment
[Figure: CHAID tree, listed node by node. Percentages after counts are shares within the node; the percentage after each node total is its share of the root total.]
Root: Low (4377) 22.8%; Mid (5142) 26.8%; High (9651) 50.3%; Total 19170

Root, split by CATG15 - 15 Software (P=0.000000; CHI=187.835417; DF=2)
– 0: Low (4121) 22.1%; Mid (5014) 26.9%; High (9485) 50.9%; Total 18620 (97.1%)
– 1: Low (256) 46.5%; Mid (128) 23.3%; High (166) 30.2%; Total 550 (2.9%)

Software = 1, split by CATEGORY (P=0.000000; CHI=166.054857; DF=6)
– Software - Capabilities (117): Low (34) 29.1%; Mid (43) 36.8%; High (40) 34.2%; Total 117 (0.6%)
– Software - Improvements (145): Low (119) 82.1%; Mid (15) 10.3%; High (11) 7.6%; Total 145 (0.8%)
– Software - Quality & Reliability (117): Low (17) 14.5%; Mid (25) 21.4%; High (75) 64.1%; Total 117 (0.6%)
– Software - Value (171): Low (86) 50.3%; Mid (45) 26.3%; High (40) 23.4%; Total 171 (0.9%)

Software = 0, split by CATG10 - 10 Hardware (P=0.000476; CHI=19.905137; DF=2)
– 0: Low (3636) 22.0%; Mid (4379) 26.5%; High (8509) 51.5%; Total 16524 (86.2%)
– 1: Low (485) 23.1%; Mid (635) 30.3%; High (976) 46.6%; Total 2096 (10.9%)

Hardware = 1, split by CATEGORY (P=0.000000; CHI=608.919718; DF=6)
– Hardware - Computers/Laptops (216), Hardware - OS (27), Hardware - Virtualization (94): Low (74) 22.0%; Mid (153) 45.4%; High (110) 32.6%; Total 337 (1.8%)
– Hardware - Improvements (249): Low (188) 75.5%; Mid (37) 14.9%; High (24) 9.6%; Total 249 (1.3%)
– Hardware - Innovation (30), Hardware - Printers (70), Hardware - Server (559), Hardware - Value (333): Low (180) 18.1%; Mid (354) 35.7%; High (458) 46.2%; Total 992 (5.2%)
– Hardware - Quality & Reliability (518): Low (43) 8.3%; Mid (91) 17.6%; High (384) 74.1%; Total 518 (2.7%)

Hardware = 0, split by CATG16 - 16 Solutions (P=0.000000; CHI=238.887402; DF=2)
– 0: Low (3029) 22.8%; Mid (3801) 28.6%; High (6466) 48.6%; Total 13296 (69.4%)
– 1: Low (607) 18.8%; Mid (578) 17.9%; High (2043) 63.3%; Total 3228 (16.8%)

Solutions = 1, split by CATEGORY (P=0.000000; CHI=266.880808; DF=4)
– Solutions - Adaptive (179), Solutions - Price (619), Solutions - Value (281): Low (336) 31.1%; Mid (236) 21.9%; High (507) 47.0%; Total 1079 (5.6%)
– Solutions - Comprehensive (325), Solutions - Innovation (134): Low (110) 24.0%; Mid (76) 16.6%; High (273) 59.5%; Total 459 (2.4%)
– Solutions - Quality & Reliability (1690): Low (161) 9.5%; Mid (266) 15.7%; High (1263) 74.7%; Total 1690 (8.8%)

Solutions = 0, split by CATG6 - 6 Depth & Breadth of Technology Portfolio (P=0.000000; CHI=122.248266; DF=2)
– 0: Low (3014) 23.1%; Mid (3770) 28.9%; High (6251) 48.0%; Total 13035 (68.0%)
– 1: Low (15) 5.7%; Mid (31) 11.9%; High (215) 82.4%; Total 261 (1.4%)

Depth & Breadth = 0, split by CATG14 - 14 Service & Technical Support (P=0.000000; CHI=64.536308; DF=2)
– 0: Low (2565) 23.9%; Mid (3198) 29.8%; High (4975) 46.3%; Total 10738 (56.0%)
– 1: Low (449) 19.5%; Mid (572) 24.9%; High (1276) 55.6%; Total 2297 (12.0%)

Service & Technical Support = 1, split by CATEGORY (P=0.000000; CHI=147.782652; DF=4)
– Service & Technical Support - Flexibility & Responsiveness (648), Service & Technical Support - Proactive (66): Low (202) 28.3%; Mid (214) 30.0%; High (298) 41.7%; Total 714 (3.7%)
– Service & Technical Support - Local (92), Service & Technical Support - Price (392): Low (98) 20.2%; Mid (157) 32.4%; High (229) 47.3%; Total 484 (2.5%)
– Service & Technical Support - Quality (1099): Low (149) 13.6%; Mid (201) 18.3%; High (749) 68.2%; Total 1099 (5.7%)

Service & Technical Support = 0, split by CATG4 - 4 Cost/Price/Value - General (P=0.000000; CHI=155.188571; DF=2)
– 0: Low (2377) 25.4%; Mid (2611) 27.9%; High (4354) 46.6%; Total 9342 (48.7%)
– 1: Low (188) 13.5%; Mid (587) 42.0%; High (621) 44.5%; Total 1396 (7.3%)

Cost/Price/Value - General = 0, split by CATG3 - 3 Contracts (P=0.000000; CHI=122.920354; DF=2)
– 0: Low (2175) 24.4%; Mid (2515) 28.2%; High (4233) 47.4%; Total 8923 (46.5%)
– 1: Low (202) 48.2%; Mid (96) 22.9%; High (121) 28.9%; Total 419 (2.2%)

Contracts = 1, split by CATBIG14 - 14 Contracts Value (P=0.005479; CHI=15.018692; DF=2)
– 0: Low (62) 39.0%; Mid (34) 21.4%; High (63) 39.6%; Total 159 (0.8%)
– 1: Low (140) 53.8%; Mid (62) 23.8%; High (58) 22.3%; Total 260 (1.4%)

Contracts = 0, split by CATG9 - 9 Global Coverage (P=0.000286; CHI=20.921843; DF=2)
– 0: Low (2097) 24.8%; Mid (2350) 27.8%; High (3999) 47.3%; Total 8446 (44.1%)
– 1: Low (78) 16.4%; Mid (165) 34.6%; High (234) 49.1%; Total 477 (2.5%)

Global Coverage = 0, split by CATG5 - 5 Customer Communications & Education (P=0.000000; CHI=44.619311; DF=2)
– 0: Low (1889) 24.2%; Mid (2152) 27.5%; High (3780) 48.3%; Total 7821 (40.8%)
– 1: Low (208) 33.3%; Mid (198) 31.7%; High (219) 35.0%; Total 625 (3.3%)

Customer Communications & Education = 1, split by CATEGORY (P=0.000127; CHI=40.295372; DF=2)
– Customer Communications & Education - General (353): Low (83) 23.5%; Mid (140) 39.7%; High (130) 36.8%; Total 353 (1.8%)
– Customer Communications & Education - Provide Roadmaps (135), Customer Communications & Education - Provide Training, Seminars, Education (137): Low (125) 46.0%; Mid (58) 21.3%; High (89) 32.7%; Total 272 (1.4%)

Customer Communications & Education = 0, split by CATG1 - 1 Account Mgmt (P=0.000000; CHI=196.377570; DF=2)
– 0: Low (1769) 25.4%; Mid (2023) 29.0%; High (3173) 45.6%; Total 6965 (36.3%)
– 1: Low (120) 14.0%; Mid (129) 15.1%; High (607) 70.9%; Total 856 (4.5%)

Account Mgmt = 0, split by CATG2 - 2 Consulting Services (P=0.000000; CHI=77.011450; DF=2)
– 0: Low (1681) 24.7%; Mid (2003) 29.4%; High (3120) 45.9%; Total 6804 (35.5%)
– 1: Low (88) 54.7%; Mid (20) 12.4%; High (53) 32.9%; Total 161 (0.8%)
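CHAID (Chi-squared Automatic Interaction Detection) chooses its splits with chi-square tests of independence. As a rough check on how the CHI values in the tree relate to the node counts, the sketch below computes a plain Pearson chi-square for the first split (Software 0 vs. 1 against Low/Mid/High sentiment), using the counts shown on this slide. The function name is hypothetical, and the category merging and significance adjustments that a full CHAID run performs are not shown.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the split and sentiment were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi, df


# Counts from the slide: rows are Software = 0 and Software = 1,
# columns are Low, Mid and High sentiment.
software_split = [
    [4121, 5014, 9485],
    [256, 128, 166],
]
chi, df = pearson_chi_square(software_split)
print(f"chi-square = {chi:.3f}, df = {df}")
```

The statistic comes out at about 187.8 with 2 degrees of freedom, in line with the CHI=187.835417; DF=2 reported for the Software split.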
Editor's Notes
Modified version of the presentation at the Text Analytics Summit, June 2009, Boston, MA. Original presented by R. Scott Evans, PhD, VP, Harris Interactive.
Text analytics provides a much needed bridge between the richness and prescriptive qualities of qualitative methodologies and the more rigorous validation and predictive extrapolations derived from quantitative methodologies. By its very nature, the thematic structure generated by text analytics permits the integration and synthesis of insights from disparate data sources.
Text analytics provides a much needed validation of quantitative analysis, and can be used to help understand what respondents mean when they provide a scaled response. Additionally, it provides a more contextualized analysis, where emerging themes are less driven by a priori research categories and concepts. In this context, classification can be more sophisticated and a less taxing method of building consistent thematic structures than traditional qualitative approaches. Catalogued text can be easily organized in hierarchies and interlocking relationships, which more accurately reflect how respondents organize their conceptual world. Sentiment can be associated with thematically organized data and provide insight into dispositions that influence stated or observed behavior.
Successful application of text analytics is not “plug and play”. It does require front loaded commitment to designing thematic structures that address business issues, while ensuring that themes are consistent with what is found in the data. This requires a combination of rules-based learning and linguistically based machine learning. The subsequent repurposing of the ontologies derived from building a suitable thematic framework and associated training set means that replicating or updating the analysis with new data is both efficient and results in the consistent treatment of data.
The three general types of measures resulting from text analytics are themes, counts within themes, and average sentiment scores within themes. The key to successful analysis is to develop a thematic structure at the bottom nodes that is sufficiently rich to provide a detailed understanding of market discourse, while having a hierarchical structure that permits natural roll ups at higher levels. In other words, having parent, children and grandchildren nodes simplifies generalization. The children and grandchildren nodes provide a rich description of the nodes above them, so reporting can take place for parent nodes and the analysis can incorporate the insights derived from the sub-nodes.
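A roll up of child nodes into parent nodes might look like the sketch below. It assumes child nodes are named "Parent - Child", as in the classification list, and that a parent's average sentiment is the count-weighted mean of its children; both the function name and the sample figures are hypothetical.

```python
from collections import defaultdict


def roll_up(child_stats, separator=" - "):
    """Aggregate child nodes into parents.

    child_stats: dict of "Parent - Child" -> (sentence count, average sentiment).
    Returns dict of parent -> (total count, count-weighted average sentiment).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for node, (count, average) in child_stats.items():
        parent = node.split(separator, 1)[0]
        totals[parent][0] += count
        totals[parent][1] += count * average
    return {
        parent: (count, weighted_sum / count)
        for parent, (count, weighted_sum) in totals.items()
        if count
    }


# Hypothetical child-node measures, for illustration only.
children = {
    "Hardware - Server": (10, 0.5),
    "Hardware - Printers": (30, -0.5),
    "Software - Value": (5, 0.2),
}
parents = roll_up(children)
print(parents)
```

Weighting by count keeps a small but very positive child node from dominating the parent's score, while the child figures stay available for drill-down.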
The original case study was a mix of online and telephone interviews. The structured questions were extensive and the original design relied on quantitative measures for understanding account relationships. The open comments were to provide an anecdotal backdrop to support or add color to the quantitative findings. The original battery of performance questions forms a single factor in Principal Components analysis. What this means is that distinguishing the importance of one question over another is extremely difficult. There is little variation in how respondents answer individual performance questions, with all questions skewed to the high end of the scale. Structured Data Problem: The singularity of the data makes it very difficult to identify salient performance issues and evaluate impact on likelihood to repurchase or recommend. Text Analytics: Provides an alternative way of assessing relative salience of performance by aligning themes with structured data and examining volume of comments and sentiment associated with themes that align with performance questions. This method also provides an independent alternative to the quantitative analysis by looking at sentence volume and average sentiment associated with themes derived independently from the textual data.
These were the issues that prompted focus on this particular data source.
Validation Needed: The reason why secondary validation of quantitative findings is so important is demonstrated in the coefficient graph above. Here we see some of the dangers of simplistic quantitative analysis, along with the potential risks of using more advanced techniques. The tight clustering and size of the simple correlation coefficients in blue suggest strong relationships across all variables. They have the added quality of being relatively consistent year over year. Conversely, simple correlations using optimal scaled data (to reduce multicollinearity and potential spuriousness), and to a greater extent categorical regression using optimal scaled data, show greater differentiation in the relative importance of each variable. Even though the scaled correlations seem to show a select few variables that are both stable and high value, their bivariate nature makes it impossible to rule out spuriousness in such a multicollinear environment. While the scaled categorical regression handles the problem of spuriousness, the increased volatility in year over year comparisons renders its conclusion somewhat suspect. The results of the quantitative analysis create an important interpretive dilemma. On the one hand, the more advanced techniques offer a more effective method of identifying the most important factors contributing to account development. On the other, the inconsistency in the year over year results suggests that a more advanced model may be highly volatile and may not be accurately assessing the relative importance of each variable influencing customer loyalty. While such volatility could be a characteristic of the population, it is unlikely given that the accounts this study represents involve key client relationships where instability should be the exception.
A careful comparison of the thematic structure that was created using text analytics with previous attempts at analysis demonstrates the more methodical and exhaustive approach offered by a text analytics platform. The identification of non-significant or redundant themes in the prior analysis was based on a theme having either few assigned comments or having few distinguishing characteristics with another theme.
This slide is a graphical depiction of the redundancy and gaps in prior attempts at building a basic thematic structure, compared with the 65 nodes created by the text analytics methodology. The yellow cells represent themes from prior analysis, where redundancies are captured in the repeated yellow cells in columns to the right.
With 65 nodes there is a need to synthesize the essential story for business leaders. The pyramid depicts the method whereby the more general themes are distilled and presented, followed by subsequent inclusion of more detail.
While the rules incorporated in this example suggest inordinate complexity, the platform used in this case study makes building these rules relatively simple. The tool interface offers immediate feedback on word frequencies, different ways of interrogating word patterns and context, and drag and drop functionality, so that repetitive typing of chosen word groups is not required.
Heat maps offer a straightforward way to summarize the results of the text analytics. The dimensions of volume and sentiment can be easily organized into cells that suggest particular tactical or strategic considerations.
View in slide show mode. This will enable the hyperlinks embedded in the heat map.
Co-occurrences are instances where a statement includes more than one theme. This pattern indicates some fundamental relationship between the themes. The relationship can be as simple as concomitant “top of mind” or a more sophisticated interweaving of multiple themes that express together something more than what is possible with a single theme. Co-occurrences are critical to understanding broad concepts like value and quality. The extent to which a theme is intermingled indicates the complexity and richness of the concept and the role that it plays in the articulation of customer experience. Identifying the linkages between themes is instrumental for the formulation of effective messaging and positioning. Such linkages are the points of resonance that can amplify or reinforce the relationship between a product position and the way in which the audience experiences the problem or context in which that product is relevant. In this study, value is used in an exclusive or discrete manner. Conversely, quality is more inclusive and is often intermingled with themes that cross domains. For example, when value is coterminous with software or pricing there is minimal overlap with other themes. Even when there are overlaps, they seem to enhance specificity. For consulting value the linkage is with offerings. For hardware value it is with servers. For contract value it is with “ease of doing business – flexibility and responsiveness”. Continued (1 of 2)
Continued (2 of 2) Unlike the concept of value, quality plays a more holistic integrative role. Account management quality co-occurs with solution quality and service and technical support quality. Hardware and software quality demonstrate strong interconnectedness and additional connections with solution quality and support quality. The exceptions to this tendency towards quality inclusivity are where software quality shows co-occurrence with software improvement and where hardware quality co-occurs with servers. A marketing and positioning strategy can use these insights by building a word web and thematic bridges around its messaging or sales track. The reporting environment within the text analytics platform makes such a task both feasible and effective. Drilling down into themes and examining the actual way in which participants articulate quality or value becomes a fecund environment for messaging content. Moreover, understanding the inclusivity or exclusivity of a concept is important when determining the most appropriate symbols to integrate into messaging content. For example, if quality is the primary focus of the message then broad inclusive symbols that encapsulate the corporate portfolio may be more effective than focusing on a single area. Conversely, focusing on value would be better served with crisp messages that target a particular product or service.
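Counting co-occurrences, as defined in these notes (a statement that includes more than one theme), can be sketched as follows. The function name and the sample statements are hypothetical, and the platform used in the case study may count or weight co-occurrences differently.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(statement_themes):
    """Count how often each pair of themes appears in the same statement.

    statement_themes: iterable of theme collections, one per statement.
    Returns a Counter keyed by alphabetically ordered theme pairs.
    """
    pair_counts = Counter()
    for themes in statement_themes:
        # Each unordered pair is counted once per statement.
        for pair in combinations(sorted(set(themes)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical statements, each tagged with the themes it mentions.
statements = [
    {"Hardware - Value", "Hardware - Server"},
    {"Hardware - Value", "Hardware - Server", "Software - Value"},
    {"Account Mgmt - Quality"},
]
pairs = co_occurrences(statements)
for pair, count in pairs.most_common():
    print(pair, count)
```

A theme whose mentions are spread over many different partners reads as inclusive in the sense used above, while a theme with few or no partners reads as exclusive.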
The themes associated with text can be linked to structured data. In the following examples there are structured variables for which it is important to view specific heat maps for each category or for particular parts of a scaled response. The next two heat maps cover participants in the study who responded at the extreme ends of a scale asking how likely they would be to repurchase the products or services offered by the vendor.
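Linking themes to a structured variable amounts to joining each classified sentence to its respondent's answer and aggregating within segments. The sketch below does this for top box and bottom box repurchase scores. The record layout, the function name and the score values that define each box are assumptions, since the notes do not state the scale used.

```python
from collections import defaultdict


def segment_theme_stats(sentences, scores, top_box, bottom_box):
    """Volume and average sentiment per theme for top and bottom box respondents.

    sentences: iterable of (respondent id, theme, sentiment score).
    scores: dict of respondent id -> repurchase score.
    top_box, bottom_box: sets of score values defining each segment (assumed).
    """
    sums = {"top": defaultdict(lambda: [0, 0.0]), "bottom": defaultdict(lambda: [0, 0.0])}
    for respondent, theme, sentiment in sentences:
        score = scores.get(respondent)
        if score in top_box:
            segment = "top"
        elif score in bottom_box:
            segment = "bottom"
        else:
            continue  # middle of the scale: not part of either heat map
        cell = sums[segment][theme]
        cell[0] += 1
        cell[1] += sentiment
    return {
        segment: {theme: (n, total / n) for theme, (n, total) in themes.items()}
        for segment, themes in sums.items()
    }


# Hypothetical data on an assumed 1-10 repurchase scale.
sample_sentences = [
    ("r1", "Account Mgmt - Quality", 1.0),
    ("r1", "Account Mgmt - Quality", 0.5),
    ("r2", "Contracts - Value", -1.0),
    ("r3", "Contracts - Value", 0.0),
]
sample_scores = {"r1": 10, "r2": 1, "r3": 5}
stats = segment_theme_stats(sample_sentences, sample_scores, top_box={9, 10}, bottom_box={1, 2})
print(stats)
```

Each segment's volumes and average sentiments can then be placed on its own heat map.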
Placement in a cell is based on the rate of change from year one to year two.
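A minimal sketch of the year over year calculation, assuming "rate of change" means the change relative to the year one value; the function names and sample figures are hypothetical. The resulting changes could then be banded into Low, Med and Hi in the same way as the current-state measures.

```python
def rate_of_change(previous, current):
    """Relative change from year one to year two; None when year one is zero."""
    if previous == 0:
        return None
    # abs() keeps the sign meaningful when the year one sentiment is negative.
    return (current - previous) / abs(previous)


def year_over_year(year_one, year_two):
    """Per-theme change in volume and average sentiment.

    year_one, year_two: dict of theme -> (volume, average sentiment).
    Only themes present in both years are compared.
    """
    changes = {}
    for theme in year_one.keys() & year_two.keys():
        volume_1, sentiment_1 = year_one[theme]
        volume_2, sentiment_2 = year_two[theme]
        changes[theme] = (
            rate_of_change(volume_1, volume_2),
            rate_of_change(sentiment_1, sentiment_2),
        )
    return changes


# Hypothetical measures for two survey years.
changes = year_over_year(
    {"Theme A": (40, 0.2), "Theme B": (10, 0.5)},
    {"Theme A": (60, 0.3)},
)
print(changes)
```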