Measuring customer care talk in 
Twitter 
Dr Ruth Page, Professor Jeremy Levesley 
University of Leicester 
rep22@le.ac.uk, @ruthtweetpage 
jl1@le.ac.uk 
www.le.ac.uk
Overview 
• Linguistic methods for quantitative analysis 
– Semantic Differential 
– Corpus linguistics 
– Discourse Analysis 
• Why might this be useful? 
– Identifying distinctive patterns in communication 
– Customer care training 
• How metrics can be turned into indices
Data sets 
Page data 
• Data – 177,735 tweets 
• 100 publicly available 
accounts 
– 40 companies 
– 30 celebrities 
– 30 ‘ordinary’ accounts 
• Gathered in 2010 and 2012 
– Hashtags (Page 2012) 
– Apologies (Page 2014) 
Precise data 
• BT Care 
– 4014 tweets 
– 69,976 words 
• HSBC UK Help 
– 3882 tweets 
– 78,375 words
Methods: Scraping 
• Data capture (Page) 
– Bespoke python code that worked with the 
Twitter API to scrape all public posts from named 
accounts 
– Automatically sorted tweets 
• Updates 
• Addressed messages (starting with @username) 
• Retweets 
• Converted files to plain text
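The original scraping code is not shown on the slides. As a minimal sketch of the sorting step only, the three categories could be separated with simple prefix rules. The function name `classify_tweet` and the "RT @" rule for retweets are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def classify_tweet(text: str) -> str:
    """Assign a tweet to one of the three categories named on the slide."""
    stripped = text.lstrip()
    # Assumption: retweets are recognised by the conventional "RT @user" prefix.
    if stripped.startswith("RT @"):
        return "retweet"
    # Addressed messages start with @username.
    if stripped.startswith("@"):
        return "addressed"
    # Everything else is treated as a one-to-many update.
    return "update"


# Invented sample tweets, for illustration only.
sample = [
    "@customer Sorry to hear that, can you DM us?",
    "RT @someone: Great service today",
    "Our new store opens on Monday",
]
counts = Counter(classify_tweet(t) for t in sample)
```

Counting the labels per account group gives the tweet-type distributions discussed later in the deck.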
Methods: Sampling 
• travel: 
– @bluejet, @luxorlv, @southwestair, @british_airways, @londonmidland, 
@connectbyhertz, @carnivalcruise 
• entertainment: 
– @directv, @marvel, @travelchannel, @tvguide 
• food: 
– @sainsburys, @waitrose, @tastidlite, @popeyeschicken, @starbucks, 
@dunkindonuts, @wholefoods, @uktesco 
• technology: 
– @emccorp, @itunesmusic, @dellcares, @costcomcares 
• finance: 
– @hoover, @hrblock, @zappos, @wachovia, @intuit 
• sport: 
– @chargers, @chicagobulls 
• retail: 
– @selfridges, @americanapparel, @karenmillen, @reiss, @marksandspencer, 
@rubbermaid, @johnlewisretail.
Frequency distribution 
[Figure: frequency, cumulative frequency, and end-to-begin cumulative frequency]
Frequency distribution
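The three curves named on the frequency slide can be sketched as follows. This assumes they are computed over word counts ranked from most to least frequent, with the end-to-begin curve accumulating from the rarest word backwards; the slides do not state this, and `frequency_curves` is a hypothetical helper name.

```python
from collections import Counter
from itertools import accumulate


def frequency_curves(tokens):
    """Return ranked frequencies plus forward and backward cumulative sums."""
    counts = Counter(tokens)
    # Frequencies ordered from the most to the least frequent word.
    freqs = sorted(counts.values(), reverse=True)
    # Cumulative frequency: running total starting at the most frequent word.
    cumulative = list(accumulate(freqs))
    # End-to-begin cumulative frequency: running total starting at the rarest word.
    end_to_begin = list(accumulate(reversed(freqs)))[::-1]
    return freqs, cumulative, end_to_begin


freqs, cumulative, end_to_begin = frequency_curves("a a a b b c".split())
```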
Semantic differential 
• The semantic differential consists of three values: 
– Evaluation (Good – Bad) 
– Activity (Active – Passive) 
– Potency (Strong – Weak) 
• These three values are measured for each of the 1,000 most 
frequently used words. 
• Each value lies in the interval [-4.6, 4.6]
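One way to use such ratings, sketched under assumptions: average the Evaluation, Activity and Potency values of the rated words in a tweet. The lexicon below is invented purely for illustration (the real ratings are not given on the slides), and averaging is only one possible aggregation.

```python
from statistics import mean

# Hypothetical EPA ratings (Evaluation, Activity, Potency), invented for
# illustration; real values would lie in [-4.6, 4.6] as stated on the slide.
EPA = {
    "sorry": (-1.0, -0.5, -0.8),
    "help": (2.5, 1.2, 1.9),
    "thanks": (2.9, 0.8, 1.4),
}


def tweet_epa(text, lexicon=EPA):
    """Mean (E, A, P) over the rated words in a tweet, or None if none are rated."""
    rated = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not rated:
        return None
    # zip(*rated) regroups the triples into three per-dimension sequences.
    return tuple(mean(dim) for dim in zip(*rated))


profile = tweet_epa("Sorry help")
```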
Visualization 
• The figure on the right shows the 
distribution of tweets in the 
space of the first three 
principal components, 
calculated for the first 150 
words 
• We can see a dense cone and a 
small cluster outside the 
cone
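A minimal sketch of this kind of projection, assuming each tweet is represented by its counts of the selected words and that the principal components are obtained from a singular value decomposition of the centred matrix. The slides do not say how the tweets were encoded, so the helper names and the encoding here are assumptions.

```python
import numpy as np


def tweet_word_matrix(tweets, vocabulary):
    """Rows are tweets, columns are vocabulary words, cells are word counts."""
    index = {w: j for j, w in enumerate(vocabulary)}
    X = np.zeros((len(tweets), len(vocabulary)))
    for i, tweet in enumerate(tweets):
        for token in tweet.lower().split():
            j = index.get(token)
            if j is not None:
                X[i, j] += 1
    return X


def project_pca(X, n_components=3):
    """Project rows of X onto the first n_components principal axes."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Invented toy data; the study used the 150 most frequent words.
tweets = ["hi sorry delay", "hi thanks", "sorry delay", "thanks hi hi"]
vocab = ["hi", "sorry", "delay", "thanks"]
coords = project_pca(tweet_word_matrix(tweets, vocab))
```

Each row of `coords` is one tweet's position in the three-dimensional space that would be plotted.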
Visualization 
• We calculate two subsets of the first 150 
words which realize an 80% covering of 
tweets. 
• We calculate three subsets of the first 150 
words which realize a 70% covering of tweets.
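The slides do not define "covering". One possible reading, assumed here, is a set of words such that the stated share of tweets contains at least one of them. Under that assumption a greedy selection gives a simple sketch; `greedy_cover` is a hypothetical name and this is not necessarily the procedure the authors used.

```python
def greedy_cover(tweets, vocabulary, target=0.8):
    """Pick words greedily until `target` of the tweets contain a chosen word."""
    token_sets = [set(t.lower().split()) for t in tweets]
    if not token_sets:
        return [], 0.0
    covered = set()          # indices of tweets already covered
    chosen = []              # words selected so far
    remaining = list(vocabulary)

    def gain(word):
        # Tweets that this word would newly cover.
        return {i for i, s in enumerate(token_sets) if i not in covered and word in s}

    while len(covered) < target * len(token_sets) and remaining:
        best = max(remaining, key=lambda w: len(gain(w)))
        newly = gain(best)
        if not newly:
            break            # no remaining word covers anything new
        chosen.append(best)
        covered |= newly
        remaining.remove(best)
    return chosen, len(covered) / len(token_sets)


# Invented toy data for illustration.
toy_tweets = ["hi there", "hi sorry", "sorry about that", "thanks", "hello"]
words, coverage = greedy_cover(toy_tweets, ["hi", "sorry", "thanks", "hello"])
```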
Visualization 
• The figures on the right show the 
distribution of tweets in 
the space of the first 
three principal 
components, calculated 
for the first 80% covering 
• We can see three clusters
Layered structure 
• The figures on the right show the 
distribution of tweets in 
the space of the first 
three principal 
components, calculated 
for the first 70% covering 
• We can see a layered 
structure
Visualization 
• The figures on the right show the 
distribution of tweets in the 
space of the first three 
principal components, 
calculated for the second 
70% covering 
• We can see a layered structure
Question 1 
• What kinds of messages do different groups of 
Twitter members post to their accounts? 
• Methods 
– Quantifying the number of each type of post
INSIGHT: Distribution of tweet types (2010) 
All groups favoured updates, with celebrities most of all 
Twitter is an environment for ‘broadcasting’ one-to-many messages 
‘Conversational’ one-to-one messages were less frequent
INSIGHT: Distribution of tweet types (2012) 
Corporate tweeting behaviour changed and became more 
‘conversational’ 
What’s distinctive about the corporate addressed messages?
Using Corpus-based methods 
• Corpus – a definition 
– Collection of representative texts 
– Machine readable form 
• Concordancing tools 
– AntConc (Anthony 2014) - Freeware 
– WordSmith Tools, Wmatrix – Proprietary 
• Search and sort lexical strings 
• Compare with other corpora
Corpus linguistics: Basic steps 
Start by examining… 
1. Frequency of words 
2. Key Word in Context (KWIC) 
3. Collocations: clusters of words that 
repeatedly occur together
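Concordancing tools do these steps for you, but the second and third can be sketched in a few lines. This is an illustration only: the window size, the use of adjacent word pairs as "clusters", and the function names are assumptions, not the behaviour of any particular tool.

```python
from collections import Counter


def kwic(tokens, node, window=4):
    """Return (left context, node word, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def bigram_clusters(tokens, min_count=2):
    """Adjacent word pairs that occur at least min_count times."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


# Invented text for illustration.
tokens = "we are sorry to hear that we are sorry for the delay".split()
hits = kwic(tokens, "sorry", window=2)
clusters = bigram_clusters(tokens)
```

Sorting the KWIC lines by their left or right context is what makes recurring patterns around a word such as "sorry" visible.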
Is the frequency pattern specific to 
this dataset? 
Keyness list 
• ‘Keyness’ 
– Statistical over-use of words 
– I compared the corporate 
addressed messages with all 
tweets in my dataset 
• INSIGHT: 
– The items in the keyness list 
cluster together and are 
typically found in apologies
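The slide does not say which statistic was used for keyness. Log-likelihood is a common choice in corpus tools, so the sketch below assumes it; the function name and the counts in the example are invented for illustration.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of a word in a target corpus versus a reference corpus."""
    total = freq_target + freq_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    # Terms with a zero observed count contribute nothing.
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Invented counts: a word seen 50 times in 1,000 target words
# and 10 times in 1,000 reference words.
score = log_likelihood(50, 1000, 10, 1000)
```

A word with the same relative frequency in both corpora scores 0; larger scores indicate a bigger difference, and the word is "key" in the target corpus when its relative frequency there is the higher one.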
HSBC, BT Care and ‘Page’ data 
• INSIGHT 
• HSBC UK Help and BT 
Care apologise even 
more than the 
companies in my 
dataset!
Why are apologies so important? 
• Twitter is a public 
environment where 
customers can complain 
• Damage to the 
company’s reputation 
• Apologies need to 
rebuild reputation and 
re-establish rapport 
between company and 
customer
Example: KWIC list for HSBC’s sorry
Methods: Discourse Analysis 
• Manually extracted all 
examples of apologies 
from data (c. 1,200 examples) 
• Coded manually in Excel 
• Features identified by 
other researchers 
interested in apologies 
• Other communicative 
features 
• Formulae which indicate the apology 
• Problem restated in the apology 
• Explanation or account 
• Offer of repair 
• Greeting 
• Naming 
• Additional questions or instructions 
• Emoticons and conversational 
features (discourse markers)
Do companies repeat the 
problem or not? 
Companies tend to avoid repeating the 
problem in their apology. 
This enables them to preserve their 
reputation, but it can appear 
impersonal. 
BT Care typically uses ‘vague language’ 
to avoid restating the problem 
HSBC UK Help typically restates the 
problem
Do companies explain 
why the problem 
occurred? 
Companies do not often explain why a 
problem occurred. But when they do, 
it typically downplays their role in the 
offence that prompted the complaint. 
The effect can be to mitigate damage 
to reputation
Do companies make an 
‘offer of repair’? 
Offering recompense to the customer 
can be a way to rebuild reputation and 
re-establish rapport with the 
customer. 
It’s not always appropriate though, 
and depends on the company in 
question.
Personalising the message 
• Signatures 
– 37% of apologies by 
companies 
– 0 of apologies by 
ordinary accounts 
– 100% by HSBC UK Help 
– 0 by BT Care 
• Name of customer 
– 19% of apologies by 
companies 
– 11% of apologies by 
ordinary accounts 
– 69% by HSBC UK Help 
– 0 by BT Care
‘Rapport’ talk: Greetings and Emoticons 

Greetings         HSBC    BT Care 
Hi                2399    2815 
Hello              230      14 
Good afternoon     344       0 
Good evening       306       0 
Good morning       357      14 
Total             3636    2844 

INSIGHT 
The style of an apology can be more or 
less formal. More conversational 
features like greetings and signals of 
emotional response like emoticons can 
be used to project rapport with the 
customer. 
In my data, 19% of companies and 
none of the ordinary accounts used 
greetings. Six percent of the 
companies and 25% of the ordinary 
accounts used emoticons. 
HSBC uses ‘rapport’ features more 
often than BT Care (figures given per 
million words) 

Emoticons         HSBC    BT Care 
:)                2118     343 
:(                  77      29 
;-)                 89       0 
:-)                115      57 
Total             2399     429
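The figures above are normalised per million words so that the two accounts can be compared despite their different sizes. A minimal sketch of the conversion, assuming the denominators are the word counts given on the data slide (78,375 words for HSBC UK Help and 69,976 for BT Care); the raw counts in the example are back-calculated for illustration and are not reported on the slides.

```python
def per_million(raw_count, corpus_words):
    """Normalise a raw count to a rate per million words."""
    return raw_count * 1_000_000 / corpus_words


def raw_from_per_million(rate, corpus_words):
    """Approximate raw count implied by a per-million rate."""
    return rate * corpus_words / 1_000_000


HSBC_WORDS = 78_375   # from the data slide
BT_WORDS = 69_976     # from the data slide

# A single occurrence in the BT Care data is about 14 per million words.
one_hit_bt = per_million(1, BT_WORDS)
# A rate of 2399 per million in the HSBC data implies roughly 188 raw occurrences.
implied_hsbc_hi = raw_from_per_million(2399, HSBC_WORDS)
```

With corpora this small, low per-million figures such as 14 may correspond to a single occurrence, so small differences should be read with caution.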
Does the customer need 
to respond again? 
Companies may not be able to 
respond completely to the complaint 
in Twitter. They may need more 
information or ask a third party to 
respond. 
This is risky as it means that the 
communication chain can break down 
leading to greater customer 
dissatisfaction.
Comparison of questions on the 
HSBC UK Help and BT Care accounts 
HSBC UK Help 
• 31% of all company tweets 
contained a punctuated 
question 
• 15% of the questions 
checked if the customer had 
been in touch 
• 10% asked if further help 
was needed 
BT Care 
• 44% of all company tweets 
contained a punctuated 
question 
• No questions asked if the 
customer had been in touch 
or needed further help
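The share of tweets containing a punctuated question can be estimated by looking for a question mark. This is a rough sketch under that assumption; it would also count question marks inside URLs, and the function name is hypothetical. Classifying what each question is for (checking contact, offering further help) was a manual coding step and is not attempted here.

```python
def question_share(tweets):
    """Fraction of tweets that contain at least one question mark."""
    if not tweets:
        return 0.0
    return sum("?" in t for t in tweets) / len(tweets)


# Invented sample tweets for illustration.
share = question_share([
    "Can you DM us your account number?",
    "Sorry about that",
    "Hi",
    "Is it fixed now?",
])
```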
Summary of BT Care and HSBC 
HSBC UK Help 
• Risk reputation by restating 
the problem 
• Build rapport by 
– Personalising their tweets by 
always signing off and 
frequently using the customer’s 
name 
– Using greetings and emoticons 
• Use follow up questions to 
check customers’ needs 
BT Care 
• Protect reputation by rarely 
restating the problem and 
using explanations to deflect 
blame 
• Limited rapport 
– Rarely if ever sign off and 
never use the customer’s 
name 
– Rarely use conversational 
features 
• Never use follow up 
questions to close the 
apology
Application for clients 
• What makes an apology good PR? 
– Different factors 
– Not all linguistic (e.g. timeliness) 
– Different people will value different aspects 
• Successful customer care is not mechanistic 
• But analysis can identify areas of need and 
then training can be developed to improve 
practice


Editor's Notes

  • #16 In 2010, the aggregated distribution of tweet types shows that for all three groups the preferred type of tweet is the update, or the one-to-many broadcast. However, this preference is most marked for the celebrities and least marked for ordinary Twitter users. This difference is perhaps understandable: it is more economical for a celebrity to publish a single update to their sizable fan base than to write individually addressed messages to all of their followers.
  • #17 In 2012, this communicative pattern continues for the Celebrity and Ordinary accounts, but not for the corporations. The pattern reverses here and Addressed messages account for a greater proportion of the posts that are published. This is quite striking, especially if we compare it with the modified retweets that are found in the updates.