Twitter Data Analytics using R
By:
Santoshi Kumari
RUAS
Twitter Data Analysis Using R
• Create a Twitter developer app account
• Get access credentials
• Install required packages in R
• Connect R to Twitter
• Extract tweets in real time
• Create corpus
• Data preprocessing and text mining
• Word cloud
• Frequent term mining
• Sentiment analysis using a lexicon
R Packages
• Twitter data extraction: twitteR
• Text cleaning and mining: tm
• Word cloud: wordcloud
• Topic modelling: topicmodels, lda
• Sentiment analysis: sentiment, syuzhet
• Social network analysis: igraph, sna
• Visualisation: wordcloud, Rgraphviz, ggplot2
Text Cleaning Functions
• Convert to lower case: tolower
• Remove punctuation: removePunctuation
• Remove numbers: removeNumbers
• Remove stop words (like 'a', 'the', 'in'): removeWords, stopwords
• Remove extra white space: stripWhitespace
Text Mining (Package tm)
• Remove numbers, punctuation, words or extra whitespace: removeNumbers, removePunctuation, removeWords, stripWhitespace
• Remove sparse terms from a term-document matrix: removeSparseTerms
• Various kinds of stop words: stopwords
• Stem words and complete stems: stemDocument, stemCompletion
• Build a term-document matrix or a document-term matrix: TermDocumentMatrix, DocumentTermMatrix
• Generate a term frequency vector: termFreq
• Find frequent terms or associations of terms: findFreqTerms, findAssocs
• Various ways to weight a term-document matrix: weightBin, weightTf, weightTfIdf, weightSMART, WeightFunction
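To make the term-document matrix idea concrete, here is a minimal base-R sketch that does not use tm. The three sample documents and all variable names are made up for illustration, and the weighting shown is a plain, unnormalised tf-idf, so its numbers may differ from what weightTfIdf returns.

```r
# Hypothetical mini corpus: three already-cleaned documents.
docs <- c(d1 = "r is great for text mining",
          d2 = "text mining with r",
          d3 = "twitter data analysis")

# Split each document into words and collect the vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens, use.names = FALSE)))

# Term-document matrix: one row per term, one column per document.
tdm <- vapply(tokens,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
dimnames(tdm) <- list(terms, names(docs))

# Term frequency vector and terms occurring at least twice overall
# (roughly what findFreqTerms reports for a lower bound of 2).
term.freq <- rowSums(tdm)
frequent.terms <- names(term.freq[term.freq >= 2])

# Simple tf-idf: raw count times log2(number of documents / document frequency).
idf <- log2(ncol(tdm) / rowSums(tdm > 0))
tfidf <- tdm * idf

print(tdm)
print(frequent.terms)
```

A term that appears in only one document, such as "twitter" here, gets the largest idf, while terms shared by several documents are weighted down.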
Prerequisites
• You have already installed R version 3.4.3 and are using RStudio.
• In order to extract tweets, you will need a Twitter application and hence a
Twitter account.
• If you don’t have a Twitter account, please sign up.
• Use your Twitter login ID and password to sign in at Twitter Developers.
• https://apps.twitter.com/
New App Form
Fill out the new app form. Names
should be unique, i.e., no one
else should have used this name
for their Twitter app.
Give a brief description of the
app. You can change this later on
if needed. Enter your website or
blog address. Callback URL can be
left blank.
Once you’ve done this, make sure
you’ve read the “Developer Rules
Of The Road” blurb, check the
“Yes, I agree” box, fill in the
CAPTCHA and click the “Create
Your Twitter Application” button.
Create My Access Token
Scroll down and click the “Create my
access token” button.
Note the values of the consumer key and
consumer secret and keep them handy for
future use. Keep these values secret: if
anyone else were to get these keys, they could
effectively access your Twitter account.
Save Access Credentials
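One way to keep the credentials out of your scripts is to store them in environment variables and read them at run time. The sketch below is only an illustration: the environment variable names and the two helper functions are hypothetical, and it assumes you also noted the access token and access token secret shown on the app page. setup_twitter_oauth is the twitteR function that performs the actual authentication.

```r
# Hypothetical environment variable names; choose your own.
credential.vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Read all four values and fail early if any is missing or empty.
read_twitter_credentials <- function(vars = credential.vars) {
  vals <- Sys.getenv(vars, unset = NA)
  missing <- vars[is.na(vals) | !nzchar(vals)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  names(vals) <- names(vars)
  as.list(vals)
}

# Authenticate with Twitter (requires the twitteR package and network access).
connect_to_twitter <- function(creds) {
  twitteR::setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
                               creds$access_token, creds$access_secret)
}

# Usage, once the variables are set (for example in ~/.Renviron):
# creds <- read_twitter_credentials()
# connect_to_twitter(creds)
```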
Install And Load R Packages
• R comes with a standard set of packages. A number of other packages are available
for download and installation.
• We will need the following packages:
– ROAuth: Provides an interface to the OAuth 1.0 specification, allowing users to authenticate via
OAuth to the server of their choice.
– twitteR: Provides an interface to the Twitter web API.
• Let’s start by installing and loading all the required packages.
install.packages("twitteR")
install.packages("ROAuth")
library("twitteR")
library("ROAuth")
Extract Tweets
• Use searchTwitter to search Twitter based on the supplied search string and return a list. The “lang”
parameter is used below to restrict tweets to the “English” language.
# search.string and no.of.tweets are placeholders for your query and the number of tweets
tweets <- searchTwitter(search.string, n=no.of.tweets, cainfo="cacert.pem", lang="en")
tweets
searchTwitter(searchString, n=25, lang=NULL, since=NULL, until=NULL, locale=NULL,
              geocode=NULL, sinceID=NULL, maxID=NULL, resultType=NULL, retryOnRateLimit=120, ...)
Rtweets(n=25, lang=NULL, since=NULL, ...)
Examples
# searchTwitter("RUAS", n=100)
# Rtweets(n=37)
### Search between two dates
# searchTwitter('NarendraModi', since='2015-03-01', until='2018-03-02')
### Geocoded results
# searchTwitter('patriots', geocode='42.375,-71.1061111,10mi')
### Using resultType
# searchTwitter('world cup+brazil', resultType="popular", n=15)
# searchTwitter('from:hadleywickham', resultType="recent", n=10)
Clean Up Text
At this point we have authenticated and retrieved the tweets; the code below assumes their text is stored in a character vector
tweets.text (for example, via sapply(tweets, function(x) x$getText())). The first step in creating a word cloud is to clean
up the text by converting it to lowercase and removing punctuation, usernames, links, etc. We use the function gsub to replace unwanted
text; gsub replaces all occurrences of a given pattern. Although other packages can perform this operation, we
have chosen gsub because of its simplicity and readability.
# Convert all text to lower case
tweets.text <- tolower(tweets.text)
# Remove the retweet marker "rt" (whole word only, so words like "party" stay intact)
tweets.text <- gsub("\\brt\\b", "", tweets.text)
# Remove @UserName mentions
tweets.text <- gsub("@\\w+", "", tweets.text)
# Remove punctuation
tweets.text <- gsub("[[:punct:]]", "", tweets.text)
# Remove links (punctuation is already gone, so each URL is now a single "http..." token)
tweets.text <- gsub("http\\w+", "", tweets.text)
# Collapse runs of spaces and tabs into a single space
tweets.text <- gsub("[ \t]{2,}", " ", tweets.text)
# Remove blank spaces at the beginning
tweets.text <- gsub("^ ", "", tweets.text)
# Remove blank spaces at the end
tweets.text <- gsub(" $", "", tweets.text)
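To see what these substitutions do without connecting to Twitter, the same steps can be wrapped in a function and tried on a couple of made-up tweets. This is a sketch only: the function name clean_tweets, the sample strings, and the user names in them are hypothetical.

```r
# Hypothetical sample standing in for text extracted from real tweets.
sample.tweets <- c(
  "RT @ruas_official: Great workshop on R!   http://t.co/abc123",
  "Learning   text mining with #rstats @someuser"
)

clean_tweets <- function(x) {
  x <- tolower(x)                      # lower case
  x <- gsub("\\brt\\b", "", x)         # retweet marker, whole word only
  x <- gsub("@\\w+", "", x)            # @UserName mentions
  x <- gsub("[[:punct:]]", "", x)      # punctuation
  x <- gsub("http\\w+", "", x)         # links (punctuation already stripped)
  x <- gsub("[ \t]{2,}", " ", x)       # collapse runs of spaces and tabs
  x <- gsub("^ +| +$", "", x)          # trim leading and trailing spaces
  x
}

cleaned <- clean_tweets(sample.tweets)
print(cleaned)
# [1] "great workshop on r"              "learning text mining with rstats"
```

The order matters: links are removed after punctuation, so by then each URL has collapsed into one token starting with "http".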
Remove Stop Words
• In the next step we will use the text mining package tm to remove stop words. A stop word is a commonly
used word such as “the”.
• If tm is not already installed you will need to install it (available from the Comprehensive R Archive
Network).
# Install tm if not already installed
install.packages("tm")
library(tm)
# Create corpus
tweets.text.corpus <- Corpus(VectorSource(tweets.text))
# Clean up by removing English stop words
tweets.text.corpus <- tm_map(tweets.text.corpus, removeWords, stopwords("english"))
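The effect of stop-word removal can be illustrated in base R with a whole-word regular expression. This is a rough sketch of the idea, not tm's implementation; the function remove_words and the five-word stop list are hypothetical stand-ins for removeWords and stopwords.

```r
# Hypothetical mini stop-word list; tm's stopwords() returns a much longer one.
stop.words <- c("a", "the", "in", "is", "for")

remove_words <- function(x, words) {
  # Match any listed word, but only as a whole word.
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  # Removing words leaves gaps, so tidy up the whitespace afterwards.
  x <- gsub("[ \t]+", " ", x)
  gsub("^ | $", "", x)
}

result <- remove_words("the word cloud is in a plot for the class", stop.words)
print(result)
# [1] "word cloud plot class"
```

Because the match is restricted to whole words, the "a" inside "class" is left alone. The whitespace clean-up plays the role that stripWhitespace has in tm.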
Generate Word Cloud
• Generate the word cloud using the wordcloud package.
• In this example we plot at most 150 words that occur at least twice, with randomly assigned colors;
random.order = FALSE plots the words in decreasing order of frequency.
# Install wordcloud if not already installed
install.packages("wordcloud")
library(wordcloud)
# Generate word cloud
wordcloud(tweets.text.corpus, min.freq = 2, scale = c(7, 0.5), colors = brewer.pal(8, "Dark2"),
          random.color = TRUE, random.order = FALSE, max.words = 150)
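The min.freq and max.words arguments amount to a filter on word counts. A base-R sketch of that selection, using made-up cleaned text, might look as follows; it only mimics the selection and does no plotting.

```r
# Hypothetical cleaned tweets.
cleaned <- c("r text mining", "text mining twitter", "twitter data r", "word cloud r")

# Count how often each word occurs across all tweets.
words <- unlist(strsplit(cleaned, " +"))
freq <- sort(table(words), decreasing = TRUE)

# Keep words occurring at least twice (min.freq = 2),
# then at most the 150 most frequent of those (max.words = 150).
keep <- head(freq[freq >= 2], 150)
print(keep)
```

Here "r" occurs three times and "text", "mining" and "twitter" twice each, so those four words would be drawn, with "r" the largest.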
Reference
• http://www.rdatamining.com/docs
• https://apps.twitter.com
Thank You
