Twitter Data Analytics using R
By:
Santoshi Kumari
RUAS
Outline
• Big data and Social Media data
• Social media analytics
• Why Social media Analytics
• Real time twitter data analytics
• Text Mining for Twitter Data
• Basics of Twitter data analytics using R
• Summary
Introduction
• The volume of data produced online grows from second to second
• It is humanly impossible to keep up with the rate of data generation on Twitter and Facebook
• Devise advanced analytical models that combine machine learning, data mining and NLP algorithms to make informed decisions about time-sensitive events
Big Data Analytics
• Volume – petabytes, exabytes (goal: faster processing)
• Velocity – batch, near real time, real time (goal: improved performance)
• Variety – structured, semi-structured, unstructured (goal: increased accuracy)
Social Media Sentiment Analytics
• Positive, Neutral, Negative
Social media: a new way to predict the future by understanding the present and making informed decisions
SOCIAL MEDIA REVOLUTION
Social Media Analytics
Social media has given people a new way to communicate, sharing their opinions, interests and sentiments with the world.
A huge amount of unstructured data is generated on social media platforms such as Facebook, Twitter and LinkedIn.
Social Media Analytics deals with the development and evaluation of tools and frameworks to collect, monitor, analyze, summarize, and visualize social media data.
It extracts useful patterns and information from that data.
Why Social Media Analytics?
• Social media – an integral part of daily routine, changing the way people communicate across the globe
• Mass opinion matters – to political parties, government policies, movies, products and services, individuals and organizations
• Trending topics can reveal people’s intentions and interests and, importantly, current events
Applications of Social Media Analytics
Retail companies: to build brand awareness, improve services, refine advertising/marketing strategies and identify influencers
Finance: to gauge market sentiment and use news data for trading
Government and public officials
• Monitoring public perception of political candidates, election campaigns and announcements
• Predicting national-level indicators such as happiness and unemployment
• Social media job loss index: econprediction.eecs.umich.edu
• An article on real-world applications
• Detecting sudden changes in behavior
Real time analytics
• The pulse of society can be found on social media in real time.
• Analyzing social media content in real time helps social scientists predict the future and take timely, relevant action.
What is trending on social media right now may not have been popular an hour ago.
Why Twitter?
• Twitter is a microblogging platform for short text messages (originally limited to 140 characters; the limit was raised to 280 in 2017)
• About 500 million tweets are posted every day (http://www.internetlivestats.com/twitter-statistics/)
• Users often discuss current affairs and share personal views on various subjects
• They share views and sentiments on every subject, from newly launched products, favorite movies and music to political decisions
• Twitter’s audience ranges from ordinary people to celebrities
• Most tweets are public and hence accessible to researchers, unlike the content of most social network sites
• Tweets are reliably time-stamped, so they can be analyzed from a temporal perspective
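Because tweets carry reliable timestamps, a trend can be tracked by counting hashtags per time window. A minimal Python sketch, assuming tweets have already been collected as (ISO-8601 timestamp, text) pairs; that input shape, the function names `hashtags_per_hour` and `top_tag`, and the sample tweets are hypothetical and not part of any Twitter API.

```python
from collections import Counter, defaultdict
from datetime import datetime
import re

HASHTAG = re.compile(r"#\w+")


def hashtags_per_hour(tweets):
    """Count hashtags (lowercased) in each one-hour window.

    `tweets` is a list of (ISO-8601 timestamp, text) pairs: a hypothetical
    input shape chosen for illustration.
    """
    buckets = defaultdict(Counter)
    for timestamp, text in tweets:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0)
        buckets[hour].update(tag.lower() for tag in HASHTAG.findall(text))
    return dict(buckets)


def top_tag(counts):
    """Most frequent hashtag in one window, or None if it has no hashtags."""
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    sample = [  # made-up tweets for illustration
        ("2016-05-01T09:10:00", "Morning traffic #commute"),
        ("2016-05-01T09:40:00", "Stuck again #commute #rain"),
        ("2016-05-01T10:05:00", "Big match tonight #final"),
        ("2016-05-01T10:20:00", "Who wins? #final"),
        ("2016-05-01T10:50:00", "#final tickets sold out #commute"),
    ]
    for hour, counts in sorted(hashtags_per_hour(sample).items()):
        print(hour.strftime("%H:00"), top_tag(counts))
```

On this made-up sample the top hashtag moves from #commute in the 09:00 window to #final in the 10:00 window, which is the kind of shift real-time analytics looks for.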
Facts
• The male vs. female ratio of social media users is as follows:
• Facebook – 60% female/40% male;
• Twitter – 60% female/40% male;
• Pinterest – 79% female/21% male;
• Google Plus – 29% female/71% male;
• LinkedIn – 55% female/45% male.
• YouTube has over 1 billion unique visitors per month
• 91% of mobile Internet access is for social activities with 73% of smartphone owners accessing social networks
through apps at least once per day.
• There are 684,478 pieces of content shared on Facebook; 3,600 new photos on Instagram; 2,083 check-ins on
Foursquare during every minute of every day.
• LinkedIn has over 3 million company pages
• According to this study, mothers with children under the age of 5 are the most active on social media.
Challenges
• Tweets are highly unstructured and often ungrammatical
• Out-of-vocabulary words
• Lexical variation
• Extensive use of acronyms such as asap, lol, afaik
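One common first step against acronyms and out-of-vocabulary words is dictionary-based normalization. A minimal Python sketch, assuming a small hand-made acronym table and vocabulary; both tables and the function name `normalize` are hypothetical and far smaller than anything usable in practice.

```python
import re

# Hypothetical lookup tables for illustration only.
ACRONYMS = {
    "asap": "as soon as possible",
    "lol": "laughing out loud",
    "afaik": "as far as i know",
}
VOCABULARY = {
    "as", "soon", "possible", "laughing", "out", "loud", "far", "i", "know",
    "reply", "the", "match", "was", "great",
}
TOKEN = re.compile(r"[a-z0-9']+")


def normalize(tweet):
    """Lowercase, expand known acronyms, and list out-of-vocabulary tokens."""
    expanded = []
    for token in TOKEN.findall(tweet.lower()):
        # Replace an acronym by its expansion; keep other tokens unchanged.
        expanded.extend(ACRONYMS.get(token, token).split())
    oov = [token for token in expanded if token not in VOCABULARY]
    return " ".join(expanded), oov


if __name__ == "__main__":
    text, unknown = normalize("Pls reply ASAP, the match was gr8 lol")
    print(text)     # pls reply as soon as possible the match was gr8 laughing out loud
    print(unknown)  # ['pls', 'gr8']
```

The leftover tokens (“pls”, “gr8”) are exactly the lexical variants a larger dictionary would need to cover.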
Text Analysis
• Text analysis: extracting or classifying information from text such as tweets, emails, chats and documents.
• Some popular examples are:
• Spam filtering: one of the best-known text classification applications (assigning a category to a text). Spam filters learn to classify an email or message as spam based on its content and subject.
• Sentiment analysis: another text classification application, in which an algorithm learns to classify an opinion as positive, neutral or negative depending on the mood expressed by the writer.
• Information extraction: learning to extract a particular piece of information from a text, for example addresses, entities or keywords.
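As an illustration of sentiment classification, the Python sketch below uses a simple lexicon-based scorer rather than a learned classifier: it counts words from a positive and a negative word list and labels the text by the difference. The two word lists and the function name `sentiment` are hypothetical placeholders.

```python
import re

# Hypothetical opinion word lists; a real study would use a published
# lexicon or a trained classifier.
POSITIVE = {"good", "great", "love", "excellent", "happy", "support"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad", "against"}


def sentiment(text):
    """Return (score, label) where score = #positive words - #negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"


if __name__ == "__main__":
    for tweet in ["I love this phone, great camera",
                  "Terrible service, I hate waiting",
                  "The new policy starts on Monday"]:
        print(sentiment(tweet), tweet)
```

Such a scorer ignores negation and sarcasm (“not good” still scores as positive), which is one reason learned classifiers are preferred in practice.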
Why is Sentiment Analysis Important?
• 93% of marketers use social media; however, only 9% of marketing companies have full-time bloggers.
• Around 46% of web users turn to social media when making a purchase.
• A government or political party may want to know whether people support its programs.
• Before investing in a company, one can use public sentiment about the company to find out where it stands.
• A company might want to analyze reviews of its products, such as those on Amazon.
• Economics: predicting financial markets; corporations use sentiment to monitor stock markets.
• Elections:
1. Analyze election-related chatter
2. Find sentiment for each party/person
3. Find what people like and dislike about a party/person
4. Find the major reasons behind success or failure
5. Find major trends in the election
6. Analyze the impact of non-political movements linked to politics (movements such as those of Anna and Ramdev)
Text Mining
https://manoharswamynathan.wordpress.com/2015/03/01/text-mining-101/
Data Processing Steps
• Explore corpus – Understand the types of variables, their functions, permissible values, and so on.
• Some formats, including HTML and XML, contain tags and other data structures that provide additional metadata.
• Convert text to lowercase – This avoids distinguishing between words simply by case.
• Remove numbers (if required) – Numbers may or may not be relevant to our analyses.
• Remove English stop words – Stop words are common words found in a language.
• Words such as “for”, “of” and “are” are common stop words.
• Remove own stop words (if required) – In addition to, or instead of, the English stop words, we can remove our own list of stop words.
• Strip white space – Eliminate extra white spaces.
• Stemming – Reduces words to their root form (stem).
• Stemming uses an algorithm that removes common word endings for English words, such as “es”, “ed” and “’s”.
• For example, “computer” and “computers” both become “comput”.
• Lemmatisation – Transforms words to their dictionary base form, e.g. “produce” and “produced” both become “produce”.
• Sparse terms – We are often not interested in infrequent terms in our documents. Such “sparse” terms should be
removed from the document term matrix.
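The preprocessing steps above can be sketched as a single Python function. This is an illustration under stated assumptions: the stop-word list is a small hypothetical subset, and `crude_stem` is a toy suffix stripper for the endings mentioned above, not the Porter stemmer (so it maps “computers” to “computer”, not “comput”). Lemmatisation is not shown, since it needs a dictionary.

```python
import re

# Illustrative subset of English stop words (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Strip one common ending ('s, es, ed, s); a toy, not the Porter stemmer."""
    for suffix in ("'s", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, extra_stop_words=(), remove_numbers=True):
    """Apply the cleaning steps in order and return a list of stemmed tokens."""
    text = text.lower()                          # convert to lowercase
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)         # remove numbers (if required)
    text = re.sub(r"[^\w\s']", " ", text)        # remove punctuation, keep apostrophes
    stop = STOP_WORDS | set(extra_stop_words)    # English + own stop words
    tokens = [t for t in text.split() if t not in stop]  # split() also strips white space
    return [crude_stem(t) for t in tokens]       # stemming


if __name__ == "__main__":
    print(preprocess("Computers and 2 laptops crashed in the lab!!"))
    # ['computer', 'laptop', 'crash', 'lab']
    print(preprocess("RT the match", extra_stop_words={"rt"}))
    # ['match']
```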
Document Term Matrix
• Document term matrix – A document term matrix is a matrix with documents as the rows, terms as the columns, and the frequency of each term in each document as the cells.
• Calculate term weight – TF-IDF
• How frequently does a term appear?
Term Frequency: TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
• How important is a term?
DF: Document Frequency = d (number of documents containing a given term) / D (the size of the collection of documents)
• To normalize, IDF inverts DF and takes its logarithm, which compresses the scale of values so that very large and very small quantities can be compared smoothly
• IDF: Inverse Document Frequency
IDF(t) = log(Total number of documents / Number of documents with term t in it)
Example:
Consider a document containing 100 words in which the word CAR appears 3 times.
TF(CAR) = 3 / 100 = 0.03
Now assume we have 10 million documents and the word CAR appears in one thousand of them.
IDF(CAR) = log10(10,000,000 / 1,000) = log10(10,000) = 4
The TF-IDF weight is the product of these quantities: 0.03 × 4 = 0.12
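A minimal Python sketch of the document term matrix, sparse-term removal and the TF-IDF weighting defined above. Assumptions: documents arrive as token lists, a term counts as sparse when it occurs in fewer than `min_docs` documents (an illustrative rule), and IDF uses a base-10 logarithm to match the CAR example. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """Rows = documents, columns = sorted terms, cells = raw term counts."""
    terms = sorted({t for doc in docs for t in doc})
    counts = [Counter(doc) for doc in docs]
    return terms, [[c[t] for t in terms] for c in counts]


def drop_sparse_terms(terms, matrix, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    keep = [j for j in range(len(terms))
            if sum(1 for row in matrix if row[j] > 0) >= min_docs]
    return [terms[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def tf_idf(matrix):
    """TF(t) = count / terms in document; IDF(t) = log10(N / docs containing t)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weights = []
    for row in matrix:
        total = sum(row)
        # Absent terms get weight 0 (this also avoids dividing by a zero total).
        weights.append([(row[j] / total) * math.log10(n_docs / df[j]) if row[j] else 0.0
                        for j in range(n_terms)])
    return weights


def tf_idf_single(term_count, doc_length, n_docs, docs_with_term):
    """TF-IDF for one term from the four counts used in the CAR example."""
    return (term_count / doc_length) * math.log10(n_docs / docs_with_term)


if __name__ == "__main__":
    print(round(tf_idf_single(3, 100, 10_000_000, 1_000), 4))  # 0.12
    terms, dtm = document_term_matrix([["car", "engine", "car"], ["engine", "oil"], ["bike"]])
    print(terms)                           # ['bike', 'car', 'engine', 'oil']
    print(drop_sparse_terms(terms, dtm))   # (['engine'], [[1], [1], [0]])
```

Here `tf_idf` is meant for the full matrix: dropping sparse columns first would change each document's total term count and therefore its TF values.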
Similarity Distance Measure (Cosine)
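• Why cosine?
• In practice, cosine similarity often works better than Euclidean distance for text data: it compares the direction of term-frequency vectors rather than their magnitude, so differences in document length matter less.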
Calculate Cosine similarity
• Example:
• Text 1: statistics skills and programming skills are equally important for analytics
• Text 2: statistics skills and domain knowledge are important for analytics
• Text 3 : I like reading books and travelling
• Representing the three texts as rows of a document term matrix over their 16 distinct words gives three vectors:
• T1 = (1,2,1,1,0,1,1,1,1,1,0,0,0,0,0,0)
• T2 = (1,1,1,0,1,1,0,1,1,1,1,0,0,0,0,0)
• T3 = (0,0,1,0,0,0,0,0,0,0,0,1,1,1,1,1)
• Degree of Similarity (T1 & T2) = (T1 %*% T2) / (sqrt(sum(T1^2)) * sqrt(sum(T2^2))) = 77%
• Degree of Similarity (T1 & T3) = (T1 %*% T3) / (sqrt(sum(T1^2)) * sqrt(sum(T3^2))) = 12%
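The two similarity values can be checked with a short Python sketch that applies the same formula, (T1 · T2) / (|T1| × |T2|), to the vectors listed above. The function name `cosine_similarity` is our own.

```python
import math


def cosine_similarity(a, b):
    """(a . b) / (|a| * |b|) for two equal-length term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)


T1 = (1, 2, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
T2 = (1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
T3 = (0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

if __name__ == "__main__":
    print(f"T1 & T2: {cosine_similarity(T1, T2):.0%}")  # T1 & T2: 77%
    print(f"T1 & T3: {cosine_similarity(T1, T3):.0%}")  # T1 & T3: 12%
```

Doubling every count in T1 leaves its cosine similarity with T1 at 1 (up to rounding), while the Euclidean distance between the two grows; this length invariance is why cosine suits documents of different lengths.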
Reference
• http://www.rdatamining.com/docs
Thank You
Final Thought ;-)
The word is mightier than the sword
The tweet is mightier than the sword

More Related Content

What's hot

Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
Diana Maynard
 
Twitter Analytics
Twitter AnalyticsTwitter Analytics
Twitter Analytics
Stephen Dann
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
Seth Grimes
 
Grounded theory meets big data: One way to marry ethnography and digital methods
Grounded theory meets big data: One way to marry ethnography and digital methodsGrounded theory meets big data: One way to marry ethnography and digital methods
Grounded theory meets big data: One way to marry ethnography and digital methods
Citizens in the Making
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
Diana Maynard
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
Minha Hwang
 
Note-Taking Techniques
Note-Taking TechniquesNote-Taking Techniques
Note-Taking Techniques
Janice Orcutt
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining Processing
Ontotext
 
Sentiment Analysis with NVivo 11 Plus
Sentiment Analysis with NVivo 11 PlusSentiment Analysis with NVivo 11 Plus
Sentiment Analysis with NVivo 11 Plus
Shalin Hai-Jew
 
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Knowledge Media Institute - The Open University
 
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and TweetsSentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
🧑‍💻 Manuel Coppotelli
 
Online Research SRC
Online Research SRCOnline Research SRC
Online Research SRCEmily Litle
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Prateek Singh
 
Introduction to Library Research Skills
Introduction to Library Research Skills Introduction to Library Research Skills
Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasets
aneeshabakharia
 
BD-ACA week4a
BD-ACA week4aBD-ACA week4a
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
Todd Rutherford
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
Michael Wilde
 

What's hot (20)

Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Twitter Analytics
Twitter AnalyticsTwitter Analytics
Twitter Analytics
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
 
Grounded theory meets big data: One way to marry ethnography and digital methods
Grounded theory meets big data: One way to marry ethnography and digital methodsGrounded theory meets big data: One way to marry ethnography and digital methods
Grounded theory meets big data: One way to marry ethnography and digital methods
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Note-Taking Techniques
Note-Taking TechniquesNote-Taking Techniques
Note-Taking Techniques
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining Processing
 
Sentiment Analysis with NVivo 11 Plus
Sentiment Analysis with NVivo 11 PlusSentiment Analysis with NVivo 11 Plus
Sentiment Analysis with NVivo 11 Plus
 
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
 
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and TweetsSentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
 
Online Research SRC
Online Research SRCOnline Research SRC
Online Research SRC
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
 
Introduction to Library Research Skills
Introduction to Library Research Skills Introduction to Library Research Skills
Introduction to Library Research Skills
 
Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasets
 
BD-ACA week4a
BD-ACA week4aBD-ACA week4a
BD-ACA week4a
 
BA FS session 2
BA FS session 2BA FS session 2
BA FS session 2
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
 

Similar to Twitter data analysis using R

A Gentle Introduction to Text Analysis I
A Gentle Introduction to Text Analysis IA Gentle Introduction to Text Analysis I
A Gentle Introduction to Text Analysis I
UNCResearchHub
 
LIB300 Week 9 finding, analyzing, and documenting information
LIB300 Week 9 finding, analyzing, and documenting informationLIB300 Week 9 finding, analyzing, and documenting information
LIB300 Week 9 finding, analyzing, and documenting information
Dr. Russell Rodrigo
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Timo Wandhoefer
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
hajinouha0
 
Sentiment tool Project presentaion
Sentiment tool Project presentaionSentiment tool Project presentaion
Sentiment tool Project presentaion
Ravindra Chaudhary
 
Planning to Evaluate Earned, Social/Digital Media Campaigns
Planning to Evaluate Earned, Social/Digital Media CampaignsPlanning to Evaluate Earned, Social/Digital Media Campaigns
Planning to Evaluate Earned, Social/Digital Media Campaigns
Eman Aly
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
Diana Maynard
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)
UNCResearchHub
 
Text analytics in social media
Text analytics in social mediaText analytics in social media
Text analytics in social media
Jeremiah Fadugba
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social media
Diana Maynard
 
Twitter analysis - Data as factor for designing the right communication star...
Twitter analysis  - Data as factor for designing the right communication star...Twitter analysis  - Data as factor for designing the right communication star...
Twitter analysis - Data as factor for designing the right communication star...
Pere Claver Llimona
 
analyzing public sentiments using twitter feeds
 analyzing public sentiments using twitter feeds analyzing public sentiments using twitter feeds
analyzing public sentiments using twitter feeds
Orakzay
 
When to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning CloudWhen to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning Cloud
MeaningCloud
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
RajkiranVeluri
 
A picture is worth a thousand words
A picture is worth a thousand wordsA picture is worth a thousand words
A picture is worth a thousand words
Masum Billah
 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
enterprisesearchmeetup
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
EngrAliSarfrazSiddiq
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
alessio_ferrari
 
How Anonymous Can Someone be on Twitter?
How Anonymous Can Someone be on Twitter?How Anonymous Can Someone be on Twitter?
How Anonymous Can Someone be on Twitter?
George Sam
 
Digital data
Digital dataDigital data
Digital data
ShivanandaVSeeri
 

Similar to Twitter data analysis using R (20)

A Gentle Introduction to Text Analysis I
A Gentle Introduction to Text Analysis IA Gentle Introduction to Text Analysis I
A Gentle Introduction to Text Analysis I
 
LIB300 Week 9 finding, analyzing, and documenting information
LIB300 Week 9 finding, analyzing, and documenting informationLIB300 Week 9 finding, analyzing, and documenting information
LIB300 Week 9 finding, analyzing, and documenting information
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
 
Sentiment tool Project presentaion
Sentiment tool Project presentaionSentiment tool Project presentaion
Sentiment tool Project presentaion
 
Planning to Evaluate Earned, Social/Digital Media Campaigns
Planning to Evaluate Earned, Social/Digital Media CampaignsPlanning to Evaluate Earned, Social/Digital Media Campaigns
Planning to Evaluate Earned, Social/Digital Media Campaigns
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)
 
Text analytics in social media
Text analytics in social mediaText analytics in social media
Text analytics in social media
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social media
 
Twitter analysis - Data as factor for designing the right communication star...
Twitter analysis  - Data as factor for designing the right communication star...Twitter analysis  - Data as factor for designing the right communication star...
Twitter analysis - Data as factor for designing the right communication star...
 
analyzing public sentiments using twitter feeds
 analyzing public sentiments using twitter feeds analyzing public sentiments using twitter feeds
analyzing public sentiments using twitter feeds
 
When to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning CloudWhen to use the different text analytics tools - Meaning Cloud
When to use the different text analytics tools - Meaning Cloud
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
A picture is worth a thousand words
A picture is worth a thousand wordsA picture is worth a thousand words
A picture is worth a thousand words
 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
 
How Anonymous Can Someone be on Twitter?
How Anonymous Can Someone be on Twitter?How Anonymous Can Someone be on Twitter?
How Anonymous Can Someone be on Twitter?
 
Digital data
Digital dataDigital data
Digital data
 

Recently uploaded

一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
dxobcob
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
ssuser7dcef0
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
manasideore6
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
Kamal Acharya
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
yokeleetan1
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABSDESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
itech2017
 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
Kamal Acharya
 

Recently uploaded (20)

一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 
Building Electrical System Design & Installation

Twitter data analysis using R

  • 1. Twitter Data Analytics using R By: Santoshi Kumari RUAS
  • 2. Outline • Big data and social media data • Social media analytics • Why social media analytics • Real-time Twitter data analytics • Text mining for Twitter data • Basics of Twitter data analytics using R • Summary
  • 3. Introduction • Enormous amounts of data are produced online, growing by the second. • It is difficult for humans to keep up with the rate of data generation on Twitter and Facebook. • Devise advanced analytical models by combining and implementing machine learning, data mining and NLP algorithms to make cognizant decisions for time-sensitive events. • Big Data Analytics: Volume (petabytes, exabytes; faster processing) • Velocity (batch, near real time, real time; improved performance) • Variety (structured, semi-structured, unstructured; increased accuracy) • Social media sentiment analytics: positive, neutral :-/, negative
  • 4. Social media: a new way to predict the future by understanding the present and taking cognizant decisions. SOCIAL MEDIA REVOLUTION
  • 7. Social Media Analytics • Social media has given people a new communication technology for sharing their opinions, interests and sentiments with the world. • Huge amounts of unstructured data are generated on social media such as Facebook, Twitter and LinkedIn. • Social media analytics deals with the development and evaluation of tools and frameworks to collect, monitor, analyze, summarize and visualize social media data. • It extracts useful patterns and information.
  • 8. Why Social Media Analytics? • Social media is an integral part of daily routine and is changing the way people communicate across the globe. • The opinion of the masses matters to: political parties; government policies; movies; products and services; individuals; organizations. • Trending topics can reveal people’s intentions and interests and, importantly, current happenings.
  • 9. Applications of Social Media Analytics • Retail companies: to harness brand awareness, improve service, refine advertising/marketing strategies and identify influencers • Finance: to determine market sentiment and use news data for trading • Government and public officials: • Monitoring public perception of political candidates, election campaigns and announcements • Prediction at the national level of happiness, unemployment, etc. • Social media job loss index: econprediction.eecs.umich.edu • An article on real-world applications • Sudden changes in behavior
  • 10. Real-time analytics • The pulse of society can be found on social media in real time. • Analyzing social media content in real time helps social scientists predict the future and take relevant action in time. • What is trending on social media right now may not have been popular an hour ago.
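The point that "what is trending now may not have been popular an hour ago" is usually implemented with a time window that forgets old events. The following is a minimal Python sketch (the deck itself uses R) of a sliding one-hour hashtag counter; the class name `TrendingWindow` is hypothetical, and it assumes tweets arrive in timestamp order.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class TrendingWindow:
    """Hypothetical helper: counts hashtags seen within the last `window` of time."""

    def __init__(self, window: timedelta = timedelta(hours=1)):
        self.window = window
        self.events = deque()    # (timestamp, hashtag) pairs in arrival order
        self.counts = Counter()  # hashtag -> count inside the current window

    def add(self, ts: datetime, hashtag: str) -> None:
        tag = hashtag.lower()    # treat #Election and #election as one topic
        self.events.append((ts, tag))
        self.counts[tag] += 1
        self._evict(ts)

    def _evict(self, now: datetime) -> None:
        # Assumes events arrive in timestamp order: drop everything older than
        # the window so stale topics stop "trending".
        while self.events and now - self.events[0][0] > self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n: int = 3):
        return self.counts.most_common(n)


t0 = datetime(2024, 1, 1, 12, 0)
w = TrendingWindow()
w.add(t0, "#election")
w.add(t0 + timedelta(minutes=5), "#Election")
w.add(t0 + timedelta(minutes=90), "#budget")  # the 12:00 and 12:05 events fall out
print(w.top())
```

A deque keeps eviction cheap because the oldest events always sit at the left end; with out-of-order input a priority queue would be needed instead.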
  • 11. Why Twitter? • Twitter is a social microblogging platform for short text messages (originally limited to 140 characters; the limit was raised to 280 in 2017). • 500 million tweets are generated every day (http://www.internetlivestats.com/twitter-statistics/). • Users often discuss current affairs and share personal views on various subjects, • from newly launched products and favorite movies and music to political decisions. • The Twitter audience ranges from ordinary people to celebrities. • Unlike on most social network sites, tweets are public and hence accessible to researchers. • Tweets are reliably time-stamped, so they can be analyzed from a temporal perspective.
  • 12. Facts • The female vs. male ratio of social media users is as follows: • Facebook: 60% female / 40% male • Twitter: 60% female / 40% male • Pinterest: 79% female / 21% male • Google Plus: 29% female / 71% male • LinkedIn: 55% female / 45% male • YouTube has over 1 billion unique visitors per month. • 91% of mobile Internet access is for social activities, with 73% of smartphone owners accessing social networks through apps at least once per day. • During every minute of every day, 684,478 pieces of content are shared on Facebook, 3,600 new photos are posted on Instagram and 2,083 check-ins are made on Foursquare. • LinkedIn has over 3 million company pages. • According to one study, mothers with children under the age of 5 are the most active on social media.
  • 13. Challenges • Tweets are highly unstructured and often ungrammatical. • Out-of-vocabulary words • Lexical variation • Extensive use of acronyms such as asap, lol, afaik
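Each of these challenges is commonly tackled with a normalization pass before any analysis. The following minimal Python sketch (not the deck's R code) lowercases a tweet, removes URLs and @mentions (typical out-of-vocabulary tokens), shortens elongated words such as "sooooo" (one form of lexical variation), and expands acronyms. The acronym dictionary is a tiny hypothetical example, not a standard resource.

```python
import re

# Hypothetical, deliberately tiny acronym dictionary for illustration only.
ACRONYMS = {
    "asap": "as soon as possible",
    "lol": "laughing out loud",
    "afaik": "as far as i know",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATION_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated 3+ times in a row


def normalize_tweet(text: str) -> str:
    text = text.lower()
    text = URL_RE.sub(" ", text)             # links carry little lexical meaning
    text = MENTION_RE.sub(" ", text)         # @user handles are out-of-vocabulary
    text = ELONGATION_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    words = re.findall(r"[a-z']+", text)     # keep letters and apostrophes only
    return " ".join(ACRONYMS.get(w, w) for w in words)


print(normalize_tweet("LOL this is sooooo good @bob https://t.co/x ASAP"))
```

Collapsing elongations to two letters rather than one is a deliberate compromise: it keeps legitimate double letters ("good", "soon") intact while still merging the many spellings of an elongated word.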
  • 14. Text Analysis • Text analysis: extracting or classifying information from text such as tweets, emails, chats and documents. • Some popular examples are: • Spam filtering: one of the best-known and most widely used text classification applications (assigning a category to a text). Spam filters learn to classify an email or message as spam depending on its content and subject. • Sentiment analysis: another text classification application, in which an algorithm must learn to classify an opinion as positive, neutral or negative depending on the mood expressed by the writer. • Information extraction: learning to extract a particular piece of information or data from a text, for example addresses, entities or keywords.
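The slide describes sentiment analysis as a learned classifier. A simpler, common baseline that needs no training data is lexicon-based scoring, shown here as a minimal Python sketch and swapped in purely for illustration. The two word lists are hypothetical miniature lexicons; real sentiment lexicons are far larger.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "sad", "terrible"}


def classify_sentiment(text: str) -> str:
    """Label text positive / neutral / negative by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I love this phone, great battery"))
print(classify_sentiment("Terrible service, I hate waiting"))
print(classify_sentiment("The train leaves at noon"))
```

Such a scorer ignores negation ("not good") and sarcasm, which is one reason the learned classifiers described on the slide usually perform better on tweets.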
  • 15. Why is Sentiment Analysis Important? • 93% of marketers use social media; however, only 9% of marketing companies have full-time bloggers. • Around 46% of web users look to social media when making a purchase. • A government or political party may want to know whether people support its program. • Before investing in a company, one can use public sentiment about the company to find out where it stands. • A company such as Amazon might want to find out what reviewers say about its products. • Economics: predicting financial markets; used by corporates to monitor stock markets. • Elections: 1. Analyze election-related chatter 2. Find party-wise or person-wise sentiment 3. Find what people like and dislike about a party/person 4. Find the major reasons behind success or failure 5. Find major trends in the election 6. Analyze the impact of non-political movements that link to politics (such as the Anna and Ramdev movements)
  • 17. Data Processing Steps • Explore the corpus: understand the types of variables, their functions, permissible values and so on. • Some formats, including HTML and XML, contain tags and other data structures that provide more metadata. • Convert text to lowercase, so that words are not distinguished merely by case. • Remove numbers (if required): numbers may or may not be relevant to the analysis. • Remove English stop words: stop words are common words found in a language, such as for, of and are. • Remove own stop words (if required): instead of, or in addition to, the English stop words, we can remove our own stop words. • Strip white space: eliminate extra white space. • Stemming: transform words to their root. • Stemming uses an algorithm that removes common English word endings such as “es”, “ed” and “’s”. • For example, “computer” and “computers” both become “comput”. • Lemmatisation: transform words to their dictionary base form, e.g., “produce” and “produced” both become “produce”. • Sparse terms: we are often not interested in infrequent terms in our documents. Such “sparse” terms should be removed from the document term matrix.
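The steps above can be chained into one pipeline. Below is a minimal Python sketch of that chain; in R the same steps are usually done with `tm_map` and functions from the `tm` package such as `removeNumbers`, `removeWords`, `stripWhitespace`, `stemDocument` and `removeSparseTerms`. The stop-word list here is a small illustrative assumption, and `crude_stem` is a hypothetical suffix stripper, not the Porter algorithm, so it will not reproduce "comput".

```python
import re
from collections import Counter

# A small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "and", "or", "for", "of", "are", "is", "to", "in"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripper for illustration; NOT the Porter algorithm."""
    for suffix in ("'s", "es", "ed", "s"):
        # Keep at least 3 characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(doc: str, own_stop_words=frozenset(), remove_numbers=True):
    text = doc.lower()                          # convert to lowercase
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)        # remove numbers (optional)
    text = re.sub(r"[^a-z'\s]", " ", text)      # drop punctuation
    tokens = text.split()                       # split() also strips extra white space
    stops = STOP_WORDS | set(own_stop_words)
    tokens = [t for t in tokens if t not in stops]  # English + own stop words
    return [crude_stem(t) for t in tokens]      # stemming


def drop_sparse_terms(token_lists, min_docs=2):
    """Keep only terms that appear in at least `min_docs` documents."""
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return [[t for t in tokens if doc_freq[t] >= min_docs] for tokens in token_lists]


print(preprocess("The 2 computers ARE crashed!!"))
print(preprocess("RT great game", own_stop_words={"rt"}))
```

Removing sparse terms is done across the whole corpus, which is why it is a separate function operating on all token lists at once rather than part of `preprocess`.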
  • 18. Document Term Matrix • Document term matrix: a matrix with documents as rows, terms as columns, and the frequency of each term in each document as the cells. • Calculate term weights with TF-IDF. • How frequently does a term appear? Term frequency: TF(t) = (number of times term t appears in a document) / (total number of terms in the document) • How important is a term? Document frequency: DF(t) = d / D, where d is the number of documents containing the term and D is the size of the document collection. • To normalize, a logarithm is applied: it compresses the scale of values so that very large and very small quantities can be compared smoothly. • Inverse document frequency: IDF(t) = log10(total number of documents / number of documents containing term t) • Example: consider a document containing 100 words in which the word CAR appears 3 times: TF(CAR) = 3 / 100 = 0.03. Now assume we have 10 million documents and the word CAR appears in one thousand of them: IDF(CAR) = log10(10,000,000 / 1,000) = log10(10,000) = 4. The TF-IDF weight is the product of these quantities: 0.03 * 4 = 0.12.
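The definitions above translate directly into code. This minimal Python sketch builds a document term matrix from already-tokenized documents and weights it with TF-IDF, using a base-10 logarithm because that is what yields IDF(CAR) = 4 in the slide's example; the function names are illustrative, and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """Rows = documents, columns = terms (sorted), cells = raw term counts."""
    terms = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    return terms, [[c[t] for t in terms] for c in counts]


def tf(count_in_doc, total_terms_in_doc):
    return count_in_doc / total_terms_in_doc


def idf(n_docs, n_docs_with_term):
    # Base-10 log, matching IDF(CAR) = log10(10,000,000 / 1,000) = 4.
    return math.log10(n_docs / n_docs_with_term)


def tf_idf_matrix(docs):
    terms, dtm = document_term_matrix(docs)
    n_docs = len(docs)
    # Document frequency: in how many rows each term column is non-zero.
    df = [sum(1 for row in dtm if row[j] > 0) for j in range(len(terms))]
    weights = []
    for row in dtm:
        total = sum(row)  # total number of terms in this document
        weights.append([tf(row[j], total) * idf(n_docs, df[j]) for j in range(len(terms))])
    return terms, weights


# The slide's CAR example: 3 occurrences in a 100-word document,
# CAR present in 1,000 of 10,000,000 documents.
print(tf(3, 100), idf(10_000_000, 1_000), tf(3, 100) * idf(10_000_000, 1_000))
```

Note that a term occurring in every document gets IDF = log10(1) = 0, so its TF-IDF weight vanishes no matter how often it appears; this is how TF-IDF down-weights ubiquitous words.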
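  • 19. Similarity Distance Measure (Cosine) • Why cosine? • The general observation is that cosine similarity works better than Euclidean distance for text data, because it compares only the direction of the term vectors and ignores their length, so a long and a short document about the same topic can still score as similar.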
  • 20. Calculate Cosine Similarity • Example: • Text 1: statistics skills and programming skills are equally important for analytics • Text 2: statistics skills and domain knowledge are important for analytics • Text 3: I like reading books and travelling • The document term matrix for these three texts gives the following three vectors: • T1 = (1,2,1,1,0,1,1,1,1,1,0,0,0,0,0,0) • T2 = (1,1,1,0,1,1,0,1,1,1,1,0,0,0,0,0) • T3 = (0,0,1,0,0,0,0,0,0,0,0,1,1,1,1,1) • Degree of similarity (T1 & T2) = (T1 %*% T2) / (sqrt(sum(T1^2)) * sqrt(sum(T2^2))) = 8 / (sqrt(12) * 3) ≈ 77% • Degree of similarity (T1 & T3) = (T1 %*% T3) / (sqrt(sum(T1^2)) * sqrt(sum(T3^2))) = 1 / (sqrt(12) * sqrt(6)) ≈ 12%
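The two percentages can be reproduced with a few lines of code. This minimal Python sketch mirrors the slide's R expression (dot product divided by the product of the vector lengths) on the same T1, T2 and T3 vectors, and adds a small check of the length-invariance argument for cosine over Euclidean distance; the doubled vector is an illustrative assumption, not part of the original example.

```python
import math

# Term-count vectors exactly as given on the slide (16 term columns each).
T1 = [1, 2, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
T2 = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
T3 = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def cosine_similarity(a, b):
    """Dot product divided by the product of the vectors' Euclidean lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(f"T1 vs T2: {cosine_similarity(T1, T2):.0%}")  # 8 / (sqrt(12) * 3)
print(f"T1 vs T3: {cosine_similarity(T1, T3):.0%}")  # 1 / (sqrt(12) * sqrt(6))

# Length invariance: a "document" that repeats Text 1 twice doubles every count.
# Cosine similarity stays at (essentially) 1.0, while the Euclidean distance
# grows to sqrt(12) purely because the document got longer.
T1_doubled = [2 * x for x in T1]
print(cosine_similarity(T1, T1_doubled), euclidean_distance(T1, T1_doubled))
```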
  • 22. Thank You • Final thought ;-) The word is mightier than the sword; the tweet is mightier than the sword.