1.
Cork Institute of Technology - Candidate for Master of Science Degree 1
Using Big Data Analytics in a Social Domain
Master’s in Cloud Computing 2013/2014
Ahmed Abdel-Aziz
May 2015
EMCCAe, CISSP, PMP
2.
Objectives
1) Social Media, Analytics and the Marketing Campaign
2) Sentiment Analysis – Methodology & Techniques
3) The Need for Case-Study & an Analytics Prototype
4) Learning Outcomes & Future Work
3.
Social Media & Social Analytics
• Social media sites are offspring of the Web 2.0 movement and are based on the cloud computing model (Software-as-a-Service)
• 88% of companies use social media for marketing
Section 1 of 4
4.
Marketing Campaign Lifecycle
Section 1 of 4
• The campaign lifecycle consists of 5 phases
• Social analytics answers social questions for each phase, e.g., "What is the sentiment trend?"
5.
Sentiment Analysis Methodology & Techniques
Section 2 of 4
• Social analytics projects based on sentiment analysis benefit from a well-thought-out methodology
6.
Section 2 of 4
Sentiment Analysis Methodology & Techniques
• Social sentiment analysis starts with social listening
– Social listening can be performed with a variety of freely available tools such as PostgreSQL, R, Wordle, and Circos, as well as commercial tools such as Attensity 360 and Attensity Analyze.
• Social data comes from 3 main categories of sources
– Social user's account: analytic capability is limited by the social media provider (FB, Twitter, LinkedIn)
– Social APIs: the social media provider offers an API to tap into social data, allowing the development of unique analytic programs
– 3rd-party tools: provide very fast results but do not offer the same level of analytic capability as a custom program
7.
Section 2 of 4
Sentiment Analysis Methodology & Techniques
• Sentiment analysis techniques are grouped into two main categories:
• Supervised machine learning methods
• Unsupervised methods
• Supervised learning methods learn, from labeled examples, the features/words that correlate with positive/negative sentiment, and can then identify the sentiment of new text
• Unsupervised methods use a lexicon of words pre-scored for polarity; the sum of the scores of a text's words gives its sentiment
• Both techniques are widely used and offer comparable results
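The unsupervised (lexicon-based) technique described above fits in a few lines of base R. This is a minimal illustration, not the project's code: the eight-word lexicon and the function names score_sentiment() and label_sentiment() are hypothetical, and real lexicons contain thousands of pre-scored words.

```r
# Lexicon-based (unsupervised) sentiment scoring: sum the pre-scored
# polarity of every word in a text.
# Hypothetical toy lexicon; real lexicons hold thousands of words.
lexicon <- c(good = 1, great = 2, love = 2, happy = 1,
             bad = -1, poor = -1, hate = -2, awful = -2)

score_sentiment <- function(texts, lexicon) {
  vapply(texts, function(txt) {
    # Lower-case, then split on anything that is not a letter or apostrophe
    words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    # Words absent from the lexicon come back as NA and contribute 0
    sum(lexicon[words], na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
}

# Map a numeric score to a polarity label
label_sentiment <- function(score) {
  ifelse(score > 0, "positive", ifelse(score < 0, "negative", "neutral"))
}

tweets <- c("I love this phone, it is great",
            "Awful battery, I hate it",
            "Just bought one")
scores <- score_sentiment(tweets, lexicon)   # c(4, -4, 0)
label_sentiment(scores)                      # "positive" "negative" "neutral"
```

A supervised method would replace the fixed lexicon with word weights learned from labeled tweets; the summing-and-thresholding step stays conceptually similar.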
8.
Need for Case-Study/Analytics Prototype
Section 3 of 4
• A company has launched a new product to market; the marketing campaign was launched long ago and is now in the Account Performance phase
• The marketing team needs to measure upticks in the sentiment trend for the new product in order to take appropriate action
• The data science team believes continuous user surveys are ineffective and that a computational approach is necessary, giving better results while being much less intrusive
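What "measuring an uptick" could mean computationally is sketched below, assuming per-round tweet counts like those the prototype's main loop accumulates. The function name find_upticks() and the rule "count rose by at least min_increase since the previous round" are hypothetical choices for illustration.

```r
# Flag polling rounds whose count rose by at least `min_increase`
# over the previous round (a hypothetical definition of an "uptick").
find_upticks <- function(counts, min_increase = 1) {
  change <- diff(counts)                 # round-to-round change, length(counts) - 1
  which(change >= min_increase) + 1L     # shift back to round indices
}

find_upticks(c(2, 2, 5, 3, 6))           # rounds 3 and 5
```

Applied to the trend vectors built by the main loop, the initial 0 seed would normally be dropped first, e.g. find_upticks(possentiments[-1]).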
9.
Need for Case-Study/Analytics Prototype
• A decision was made to build a prototype tool that measures the sentiment trend, starting with Twitter specifically
• Twitter was found to be the social network of choice for brand and product sentiment topics, hence the choice of Twitter
• Key objectives of the data science team:
– Produce useful results quickly and cost-efficiently
Cloud Computing Value Proposition!?
– Get buy-in from marketing management to build the full app
Section 3 of 4
10.
• Applying the Analytic Project Lifecycle to the Prototype
– Data Preparation
Input Data: raw tweets
Output Data: clean tweet text ready for sentiment analysis
– Analytic Model Planning & Building
Input Data: clean tweet text and learned Naïve Bayes model
Output Data: sentiment of analyzed tweets
– Communicate Results
Input Data: analyzed tweets and their sentiment
Output Data: sentiment trend graph for both positive and negative sentiments
Section 3 of 4
Need for Case-Study/Analytics Prototype
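The main loop on slide 12 calls a clean.text() helper whose body is not shown. A minimal base-R sketch of such a Data Preparation step follows, assuming typical tweet noise (retweet markers, URLs, @mentions, '#', punctuation, digits); the name clean_text and the exact cleaning rules are hypothetical choices, not the prototype's code.

```r
# Hypothetical stand-in for the prototype's clean.text() helper:
# strips common tweet noise so only plain words remain.
clean_text <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")  # drop emoji and other non-ASCII
  x <- gsub("\\bRT\\b", " ", x, perl = TRUE)      # retweet marker
  x <- gsub("https?://\\S+", " ", x, perl = TRUE) # URLs
  x <- gsub("@\\w+", " ", x, perl = TRUE)         # @mentions
  x <- gsub("#", " ", x, fixed = TRUE)            # keep hashtag words, drop '#'
  x <- gsub("[[:punct:]]", " ", x)                # punctuation
  x <- gsub("[[:digit:]]+", " ", x)               # numbers
  x <- tolower(x)
  x <- gsub("[[:space:]]+", " ", x)               # collapse runs of whitespace
  trimws(x)  # empty results are kept so the output stays aligned with the input
}

clean_text("RT @apple: Loving the new #iPhone 6!!! http://t.co/abc123")
# "loving the new iphone"
```

Keeping empty strings instead of dropping them is a deliberate choice here: it keeps each cleaned text at the same index as its original tweet.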
11.
Need for Case-Study/Analytics Prototype
Section 3 of 4
• Technology decisions made by the data science team
– R programming language for social listening
– Twitter Social APIs as the source of social data
– Leverage ready-made R packages to shorten build time
– R programming for data preparation
– Leverage cloud analytics services such as Datumbox, which offers a supervised machine learning method based on Naïve Bayes
– R programming to build the main body of the prototype analytics application
– Use R plotting capabilities to present easy-to-understand results to non-technical members of the marketing team
– Settle on the technologies for building the full-blown application that handles much larger data sets: GPText/Pivotal HD
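Datumbox is described above as a supervised Naïve Bayes method, but its internals are not part of this work. To illustrate the technique itself, a minimal multinomial Naïve Bayes classifier with Laplace smoothing in base R is sketched; the function names are hypothetical and the four training sentences are a toy example, not project data.

```r
# Minimal multinomial Naive Bayes sentiment classifier with Laplace
# (add-one) smoothing, in base R. Illustrative only.
tokenize <- function(txt) {
  words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

train_nb <- function(docs, labels) {
  labels <- as.character(labels)
  classes <- sort(unique(labels))
  tokens <- lapply(docs, tokenize)
  vocab <- sort(unique(unlist(tokens)))
  # Word counts per class: one row per class, one column per vocabulary word
  counts <- do.call(rbind, lapply(classes, function(cl) {
    words <- unlist(tokens[labels == cl])
    as.numeric(table(factor(words, levels = vocab)))
  }))
  dimnames(counts) <- list(classes, vocab)
  # Log prior P(class) from class frequencies in the training set
  prior <- log(as.numeric(table(factor(labels, levels = classes))) / length(labels))
  # Laplace-smoothed log P(word | class)
  loglik <- log((counts + 1) / (rowSums(counts) + length(vocab)))
  list(classes = classes, vocab = vocab, prior = prior, loglik = loglik)
}

predict_nb <- function(model, txt) {
  words <- tokenize(txt)
  words <- words[words %in% model$vocab]   # ignore words never seen in training
  # Log prior plus the log-likelihood of every (possibly repeated) word
  scores <- model$prior + rowSums(model$loglik[, words, drop = FALSE])
  model$classes[which.max(scores)]
}

# Toy training set (illustrative sentences, not project data)
docs <- c("love this phone great camera", "great battery love it",
          "awful screen hate it", "bad battery awful phone")
labels <- c("positive", "positive", "negative", "negative")
model <- train_nb(docs, labels)
predict_nb(model, "I love the camera")          # "positive"
predict_nb(model, "awful battery, I hate it")   # "negative"
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and add-one smoothing keeps a single unseen-in-class word from zeroing out a class.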
12.
Need for Case-Study/Analytics Prototype
Section 3 of 4
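• Snippet of R code for the analytics application: main loop
library(twitteR)  # searchTwitter(); assumes Twitter OAuth was set up beforehand (e.g. setup_twitter_oauth())
# clean.text() and getSentiment() are prototype helpers not shown on this slide;
# the second getSentiment() argument appears to be a redacted API key
possentiments <- 0
negsentiments <- 0
for (i in 1:5) {                                # 5 polling rounds
  tweets <- searchTwitter("iPhone", n = 5, lang = "en")
  tweet_txt <- sapply(tweets, function(x) x$getText())
  tweet_clean <- clean.text(tweet_txt)
  tweet_num <- length(tweet_clean)
  # ……..
  for (j in seq_len(tweet_num)) {               # j, so the outer index i is not reused
    tmp <- getSentiment(tweet_clean[j], "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
    tweet_df$sentiment[j] <- tmp$sentiment
    # ………..
  }
  # Append this round's positive/negative counts to the running trend vectors
  possentiments <- c(possentiments, sum(tweet_df$sentiment == "positive"))
  negsentiments <- c(negsentiments, sum(tweet_df$sentiment == "negative"))
  Sys.sleep(5)                                  # pause between polling rounds
}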
13.
Need for Case-Study/Analytics Prototype
• Plotting the trend of both positive and negative sentiments
Section 3 of 4
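The slide's graph is not reproduced in this text. A minimal base-R sketch of how the possentiments and negsentiments vectors built by the main loop could be plotted follows; the helper name plot_sentiment_trend(), the colours, and the choice to drop the initial 0 seed are assumptions for illustration.

```r
# Plot per-round positive and negative tweet counts as two trend lines.
# pos, neg: vectors accumulated by the main loop; their first element is
# the initial 0 seed, which this sketch drops.
plot_sentiment_trend <- function(pos, neg) {
  stopifnot(length(pos) == length(neg), length(pos) > 1)
  pos <- pos[-1]
  neg <- neg[-1]
  rounds <- seq_along(pos)
  plot(rounds, pos, type = "b", pch = 19, col = "darkgreen",
       ylim = c(0, max(pos, neg, 1)),
       xlab = "Polling round", ylab = "Number of tweets",
       main = "Sentiment trend")
  lines(rounds, neg, type = "b", pch = 17, col = "red")
  legend("topleft", legend = c("positive", "negative"),
         col = c("darkgreen", "red"), pch = c(19, 17), lty = 1)
  # Return the plotted data invisibly so it can also be inspected or saved
  invisible(data.frame(round = rounds, positive = pos, negative = neg))
}
```

After the main loop finishes, plot_sentiment_trend(possentiments, negsentiments) draws the chart; wrapping the call in png("trend.png") and dev.off() would save it as an image for the marketing team.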
14.
Learning Outcomes
• Initial State
– Good foundation in cloud computing and data analytics
– Very little knowledge of the social domain; not even a Facebook account :)
– Last coding experience was in Java, 13 years earlier
• Initial research project stages
– Social media university
– Addictive analytics workshop: introduction to the marketing domain
– Pivotal workshop to learn data analytics in the social domain; relevant Pivotal data analytics platforms: GPText and Pivotal HD
• Later research project stages (practical)
– Learning enough about R to build a small-scale analytics application
– How to leverage the Datumbox analytics-as-a-service offering
Section 4 of 4
15.
Summary
• The synergy of cloud, social, and data analytics serves marketing
• "Is there an uptick in positive/negative sentiment toward my product?" is a strategically important question in the Account Performance phase of a marketing campaign
• The research answered this question with a computational approach based on a cloud-based supervised learning method for sentiment analysis
• The data source and data analytics are in the cloud, while data preparation and data presentation run on-premises in R. Future work: optimize and tune for large datasets, after which the whole solution can run in the cloud