TELECOM DATA ANALYSIS USING SOCIAL MEDIA FEED
CONTENT
Introduction
Data Extraction
Data Pre-Processing
Classification
Word Cloud
Frequent words and association
Clustering
Business Value
Cross-sell/Up-sell
Customer Churn and Retention
Customer Genomics
Future Scope
Data Extraction
Hootsuite (uberVU) uses a web crawler to extract the data from different social media sources.
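A minimal sketch of loading such an export into R, assuming the crawled feed has been saved as a CSV file (the file name social_feed.csv and the Content column are illustrative, not from the original slides):
# read the exported social media feed into a data frame
mydata <- read.csv("social_feed.csv", stringsAsFactors = FALSE)
str(mydata)   # inspect the columns, including the post text in Content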
DATA PRE-PROCESSING
• Part of the KDD process
• Removed missing values
• Reduced the data from 10,000 rows to 1,009 rows in Excel
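The missing-value removal above was done in Excel; an equivalent step in R might look like the following sketch, assuming the post text sits in a Content column:
# drop rows whose Content field is missing or empty
mydata <- mydata[!is.na(mydata$Content) & mydata$Content != "", ]
nrow(mydata)   # reports the reduced row count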
TEXT STEMMING AND CLEANING
# remove @mentions
mydata$Content = gsub("@\\w+", "", mydata$Content)
# remove punctuation
mydata$Content = gsub("[[:punct:]]", "", mydata$Content)
# remove numbers
mydata$Content = gsub("[[:digit:]]", "", mydata$Content)
# remove html links
mydata$Content = gsub("http\\w+", "", mydata$Content)
# collapse runs of spaces and tabs into a single space
mydata$Content = gsub("[ \t]{2,}", " ", mydata$Content)
# trim leading and trailing whitespace
mydata$Content = gsub("^\\s+|\\s+$", "", mydata$Content)
TEXT CLASSIFICATION
Implemented the Naïve Bayes algorithm and a simple voter algorithm to determine the sentiment of customer feedback.
Classify polarity – the classify_polarity function classifies a piece of text as positive, negative, or neutral.
Classify emotion – the classify_emotion function analyses a piece of text and assigns it one of six emotions: anger, disgust, fear, joy, sadness, or surprise.
library(sentiment)   # provides classify_emotion() and classify_polarity()

# classify emotion with the Naive Bayes algorithm
class_emo = classify_emotion(mydata$Content, algorithm="bayes", prior=1.0)
# get emotion best fit
emotion = class_emo[,7]

# classify emotion with the simple voter algorithm
class_emo = classify_emotion(mydata$Content, algorithm="voter", prior=1.0)
# get emotion best fit
emotion = class_emo[,7]

# classify polarity with the Naive Bayes algorithm
class_pol = classify_polarity(mydata$Content, algorithm="bayes")
# get polarity best fit
polarity = class_pol[,4]

# classify polarity with the simple voter algorithm
class_pol = classify_polarity(mydata$Content, algorithm="voter")
# get polarity best fit
polarity = class_pol[,4]
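The word cloud step below refers to a data frame sent_df that is never constructed in the original slides; a plausible bridging step, following the common sentiment-package workflow (an assumption), is:
# combine each post with its best-fit emotion and polarity
sent_df = data.frame(text = mydata$Content, emotion = emotion,
                     polarity = polarity, stringsAsFactors = FALSE)
# posts with no clear emotion are labelled "unknown"
sent_df$emotion[is.na(sent_df$emotion)] = "unknown"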
Emotion Analysis (Brand vs Emotion)
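The original slide shows this analysis as a chart; a sketch of how such a chart could be produced with ggplot2, assuming the feed also carries a Brand column (an assumption, not shown in the extract above):
library(ggplot2)
# bar chart of emotion counts per brand
sent_df$brand = mydata$Brand
ggplot(sent_df, aes(x = emotion, fill = brand)) +
  geom_bar(position = "dodge") +
  labs(title = "Emotion Analysis (Brand vs Emotion)",
       x = "Emotion", y = "Number of posts")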
WORD CLOUD
A word cloud is an image composed of words used in a particular text or subject, in which the size of each word indicates its frequency or importance.
library(tm)
library(wordcloud)      # comparison.cloud()
library(RColorBrewer)   # brewer.pal()

# separate text by emotion: collapse all posts for each emotion into one document
emos = levels(factor(sent_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in 1:nemo) {
  emo.docs[i] = paste(sent_df$text[sent_df$emotion == emos[i]], collapse = " ")
}
# remove stopwords
emo.docs = removeWords(emo.docs, stopwords("english"))
# create corpus and term-document matrix, one column per emotion
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos
# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"),
                 scale = c(3, .5), random.order = FALSE, title.size = 1.5)
Frequent Words and Association
# build a document-term matrix from the corpus above and drop sparse terms
dtm <- DocumentTermMatrix(corpus)
dtms <- removeSparseTerms(dtm, 0.1)
inspect(dtms)
<<DocumentTermMatrix (documents: 5, terms: 10)>>
Non-/sparse entries: 50/0
Sparsity           : 0%
Maximal term length: 12
Weighting          : term frequency (tf)
     Terms
Docs  #centurylink at&t can centurylink dear internet never service the will
   1            39   99  25         177    4       51    10      55  40   38

# terms correlated with "service" (correlation >= 0.98)
findAssocs(dtms, c("service"), corlimit=0.98)
$service
        will #centurylink         at&t  centurylink        never
        1.00         0.99         0.99         0.99         0.99

# terms correlated with "at&t" (correlation >= 0.98)
findAssocs(dtms, c("at&t"), corlimit=0.98)
$`at&t`
 centurylink     internet      service         will #centurylink          the
        1.00         0.99         0.99         0.99         0.98         0.98
TERM SIMILARITY BY CLUSTERING
dtmss <- removeSparseTerms(dtm, 0.1)   # keep a matrix that is at most 10% empty space
inspect(dtmss)
# cluster terms by the Euclidean distance between their document-frequency profiles
d <- dist(t(dtmss), method="euclidean")
fit <- hclust(d=d, method="ward.D")
fit

Call: hclust(d = d, method = "ward.D")
Cluster method   : ward.D
Distance         : euclidean
Number of objects: 10
HIERARCHICAL CLUSTERING
Remove uninteresting or infrequent words before clustering so that the dendrogram shows only meaningful terms.
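The dendrogram itself is not reproduced in this extract; a minimal sketch of how it would be drawn from the fit object above (the choice of three clusters is illustrative):
# draw the dendrogram of term clusters
plot(fit, hang = -1, main = "Term similarity by hierarchical clustering")
# cut the tree into k groups and outline them on the plot (k = 3 is illustrative)
groups <- cutree(fit, k = 3)
rect.hclust(fit, k = 3, border = "red")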
CUSTOMER SEGMENTATION
Based on spends and sentiments
Low value: $30 to $50
Medium value: $51 to $80
High value: $81 to $120
BASED ON CUSTOMER SPENDS AND SENTIMENT
We classify customers for cross-sell/up-sell campaigns and also for customer retention campaigns; a short segmentation sketch follows below.
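A minimal sketch of this segmentation in R, assuming a monthly_spend column (illustrative, not in the extracted feed) and the polarity vector computed earlier:
# assign each customer to a value tier from monthly spend
mydata$value_segment <- cut(mydata$monthly_spend,
                            breaks = c(30, 50, 80, 120),
                            labels = c("Low", "Medium", "High"),
                            include.lowest = TRUE)
# cross-tabulate value tier against sentiment polarity to pick campaign targets
table(mydata$value_segment, polarity)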
CROSS-SELL/UP-SELL
FUTURE SCOPE
• Sarcasm detection in unstructured data using Natural Language Processing.
• Increase the efficiency of sentiment analysis.
• Sarcasm detection methods:
1. Lexical analysis
2. Prediction using likes and dislikes
3. Fact negation
4. Temporal knowledge extraction
CUSTOMER GENOMICS
• Every customer is represented by a unique model built from their specific transactions.
• Predictive models assess over 200 dimensions for each person and assign labels across all dimensions, such as what they buy, what factors influence their purchase decisions, how they engage, and potential life events.
• The model learns from every customer transaction via social media, loyalty, self-stated survey data, panel data, and other third-party appended information.
• It automatically learns from every new transaction about customer behaviour and updates every probability associated with the customer.
• It avoids over-fitting and counter-intuitive decisions by supervising the automation process to ensure that results are intuitive, accurate and relevant.
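One way to read "updates every probability associated with the customer" is a simple Bayesian update of a propensity score after each observed transaction; the sketch below is purely illustrative and not Fractal Analytics' actual model:
# illustrative Beta-Binomial update of one customer propensity
# (e.g. the probability of responding to an offer)
update_propensity <- function(alpha, beta, responded) {
  # each observed transaction shifts the Beta(alpha, beta) belief
  if (responded) alpha <- alpha + 1 else beta <- beta + 1
  list(alpha = alpha, beta = beta, propensity = alpha / (alpha + beta))
}
state <- update_propensity(alpha = 2, beta = 8, responded = TRUE)
state$propensity   # updated probability after one more observed response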
References
• http://www.fractalanalytics.com/products-and-solutions/customer-genomics
• http://www.slideshare.net/rdatamining/text-mining-with-r-an-analysis-of-twitter-data
• https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment