Is a Twitter user’s location correlated with their opinion on #COVID-19?
Rong-Ching Chang
Chun-Ming Lai, Assist Prof
Information Security Lab X Social Computing And Information Security Lab (SCIS)
Tunghai University
Twitter@AnnCC12
Agenda
• General introduction: how to get data from Twitter
• Data and data pre-processing
• Methodology
• What English-language users in the top 5 countries tweeted about from February to May
• How is it different from getting data from Facebook?
How do you get data from Twitter?
• Twitter API
• Hydration
• Web scraping tools
Twitter API
Twitter account -> Twitter developer account -> Create an application
Open Data, COVID-19 & Hydration
Dataset citation: Umair Qazi, Muhammad Imran, Ferda Ofli. "GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information." ACM SIGSPATIAL Special, May 2020. doi: https://doi.org/10.1145/3404820.3404823
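Hydration turns the tweet IDs published in an open dataset such as GeoCoV19 back into full tweets by requesting them from the Twitter API. The sketch below covers only the batching logic. It assumes the lookup endpoint accepts at most 100 IDs per request, which was the documented limit for the v1.1 `statuses/lookup` and v2 `GET /2/tweets` endpoints at the time of writing. `fetch_batch` is a hypothetical callable that you would implement with your own credentials, for example through a client library such as tweepy. An offline fake stands in for it so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed per-request limit of the tweet lookup endpoint.
MAX_IDS_PER_REQUEST = 100


def chunk_ids(tweet_ids: Iterable[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover partial batch
        yield batch


def hydrate(tweet_ids: Iterable[str],
            fetch_batch: Callable[[List[str]], List[Dict]]) -> List[Dict]:
    """Turn tweet IDs into tweet objects, one lookup request per batch.

    `fetch_batch` is a hypothetical callable wrapping the real API request.
    Deleted or protected tweets are typically missing from the response,
    so the result can be shorter than the input.
    """
    unique_ids = list(dict.fromkeys(tweet_ids))  # de-duplicate, keep order
    tweets: List[Dict] = []
    for batch in chunk_ids(unique_ids):
        tweets.extend(fetch_batch(batch))
    return tweets


if __name__ == "__main__":
    # Offline stand-in for the API: pretend every ID ending in "3" was deleted.
    def fake_fetch(ids: List[str]) -> List[Dict]:
        return [{"id": i} for i in ids if not i.endswith("3")]

    ids = [str(n) for n in range(250)] + ["7"]  # 251 IDs, one duplicate
    print(len(list(chunk_ids(ids))))            # 3
    print(len(hydrate(ids, fake_fetch)))        # 225
```

Removing duplicates before batching avoids paying for the same tweet twice against the API's rate limits.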
Basic Data Pre-processing
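• Lowercase: A -> a
• Email addresses (@ mail), hashtags (#), URLs (https://), and @mentions
• Spelling correction: Dooing -> doing
• Lemmatization: Doing -> do
• Punctuation and special characters: :$%#@
• Missing values: n, nan
• Stopwords
• Filter by language
• Filter by geography
• Remove the top 20 and bottom 20 words by frequency (optional)
• Tokenization
• Word frequency
• Word cloud, trigrams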
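The text-cleaning part of the steps above could be done roughly like this in plain Python. This is a minimal sketch, not the authors' code. The regular expressions, the small `STOPWORDS` set, and the helper names are illustrative assumptions. Spelling correction and lemmatization are left out because they need extra libraries (NLTK's `WordNetLemmatizer` is one option). The language and geography filters are also left out, since they operate on tweet metadata rather than on the text.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list; a real run would use a full list
# (e.g. NLTK's or spaCy's) plus any extra words you want to drop.
STOPWORDS = {"a", "an", "and", "are", "be", "for", "in", "is", "it", "of", "on", "the", "to"}

EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")      # keep the word, drop the '#'
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # punctuation, digits, symbols like :$%#@


def clean_tweet(text) -> List[str]:
    """Lowercase, strip emails/URLs/mentions/symbols, tokenize, drop stopwords."""
    if not isinstance(text, str) or text.strip().lower() in {"", "n", "nan"}:
        return []  # treat missing values (None, NaN, "nan") as empty
    text = text.lower()
    text = EMAIL_RE.sub(" ", text)  # before mentions, so 'me@x.com' is removed whole
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def word_frequency(docs: List[List[str]]) -> Counter:
    """Count every token across all tokenized tweets."""
    return Counter(tok for doc in docs for tok in doc)


def drop_extreme_words(docs: List[List[str]], top_n: int = 20, tail_n: int = 20) -> List[List[str]]:
    """Optionally drop the top_n most and tail_n least frequent words.

    Ties are ordered by first appearance, so the cut-off among equally
    frequent words is somewhat arbitrary.
    """
    ranked = [word for word, _ in word_frequency(docs).most_common()]
    drop = set(ranked[:top_n]) | (set(ranked[-tail_n:]) if tail_n > 0 else set())
    return [[tok for tok in doc if tok not in drop] for doc in docs]


print(clean_tweet("The #COVID19 test https://t.co/abc @WHO"))  # ['covid', 'test']
```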
Methodology
• Word cloud
• Sentiment analysis: polarity, subjectivity
• N-gram
Word Cloud
Types: word frequency, categorization, mixed
A word cloud is a kind of weighted list used to visualize language or text data.
• Word frequency: the font size represents how many times a keyword appears in the collection.
• Categorization: the font size indicates the number of subcategories of a collection.
Cite: Yuping Jin, "Development of Word Cloud Generator Software Based on Python", GCMM 2016
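In the word-frequency type, a word's font size grows with its count. One simple way to express this, assumed here for illustration, is to scale counts linearly between a minimum and a maximum font size. Word-cloud libraries apply their own rules (the `wordcloud` package, for instance, exposes a `relative_scaling` parameter), so the sketch shows the idea rather than any particular library's behavior. `font_sizes` and its default size bounds are hypothetical.

```python
from collections import Counter
from typing import Dict


def font_sizes(freqs: Dict[str, int], min_size: int = 10, max_size: int = 60) -> Dict[str, int]:
    """Map each word's frequency linearly onto [min_size, max_size]."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    if lo == hi:  # all words equally frequent: give them the same size
        return {word: max_size for word in freqs}
    scale = (max_size - min_size) / (hi - lo)
    return {word: round(min_size + (count - lo) * scale) for word, count in freqs.items()}


sizes = font_sizes(Counter("mask mask mask test virus virus".split()))
print(sizes)  # {'mask': 60, 'test': 10, 'virus': 35}
```

Linear scaling lets one very frequent word (such as "covid" in this corpus) shrink everything else. A log or square-root transform of the counts is a common alternative.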
N-Gram
• 1-gram (unigram): 'soon'
• 2-gram (bigram): 'soon', 'Comprehensive'
• 3-gram (trigram): 'soon', 'Comprehensive', 'testing'
N-Gram Examples
• 'released' (1-gram); 'released', 'melbourne' (2-gram); 'released', 'melbourne', 'immigration' (3-gram)
• 'charles' (1-gram); 'charles', 'test' (2-gram); 'charles', 'test', 'positive' (3-gram)
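An n-gram is every run of n consecutive tokens. The short sketch below uses the tokens from the 'charles' example above. `ngrams` and `top_ngrams` are illustrative helpers; NLTK provides an equivalent `nltk.util.ngrams`. Counting per tweet, so that no n-gram spans two tweets, is a design assumption.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs: List[List[str]], n: int, k: int = 10) -> List[Tuple[Tuple[str, ...], int]]:
    """Most common n-grams across tokenized tweets (n-grams never span two tweets)."""
    counts = Counter(gram for doc in docs for gram in ngrams(doc, n))
    return counts.most_common(k)


tokens = ["charles", "test", "positive"]
print(ngrams(tokens, 1))  # [('charles',), ('test',), ('positive',)]
print(ngrams(tokens, 2))  # [('charles', 'test'), ('test', 'positive')]
print(ngrams(tokens, 3))  # [('charles', 'test', 'positive')]
```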
Sentiment Analysis: Polarity
Negative = -1.0, Neutral = 0.0, Positive = 1.0
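Sentiment tools such as TextBlob report polarity as a float in [-1.0, 1.0]. Mapping that score to the three labels on this slide requires a cut-off. The neutral band in this sketch is an assumption, not a value from the talk, and `polarity_label` is a hypothetical helper.

```python
def polarity_label(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a polarity score in [-1.0, 1.0] to negative / neutral / positive.

    Scores within +/- neutral_band of 0.0 count as neutral; the 0.05 default
    is an arbitrary choice, and neutral_band=0.0 gives a strict sign test.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


for score in (-0.6, 0.0, 0.02, 0.4):
    print(score, polarity_label(score))
```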
Sentiment Analysis: Subjectivity
Objective = 0.0, Subjective = 1.0
Sentiment Analysis: Polarity & Subjectivity
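Combining the two scores places each tweet on a polarity-by-subjectivity plane. This minimal sketch counts tweets in each cell of that plane. The 0.5 subjectivity cut-off and the helper names are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def sentiment_quadrant(polarity: float, subjectivity: float,
                       subj_threshold: float = 0.5) -> Tuple[str, str]:
    """Place one tweet on the polarity x subjectivity plane."""
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    kind = "subjective" if subjectivity >= subj_threshold else "objective"
    return tone, kind


def quadrant_counts(scores: Iterable[Tuple[float, float]]) -> Counter:
    """Count tweets per (tone, kind) cell from (polarity, subjectivity) pairs."""
    return Counter(sentiment_quadrant(p, s) for p, s in scores)


print(quadrant_counts([(0.5, 0.9), (-0.2, 0.1), (0.0, 0.0), (0.3, 0.6)]))
```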
Top 5 Countries (ISO 3166-1 alpha-2)
• 'us' United States of America
• 'in' India
• 'gb' United Kingdom
• 'cn' China
• 'au' Australia
Dates: 2020/02/01, 2020/03/16, 2020/03/25, 2020/04/25
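Selecting the top 5 countries and the four snapshot dates comes down to a count followed by a filter. This sketch assumes each tweet is a dict with hypothetical `country_code` (lowercase ISO 3166-1 alpha-2), `lang`, and `date` keys. The actual GeoCoV19 location fields may be named and structured differently, so adapt the lookups to your data. Writing dates in ISO format is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Snapshot dates from the slides, written here in ISO format (an assumption).
SNAPSHOT_DATES = {"2020-02-01", "2020-03-16", "2020-03-25", "2020-04-25"}


def top_countries(tweets: Iterable[Dict], k: int = 5, lang: str = "en") -> List[str]:
    """Return the k country codes with the most tweets in the given language."""
    counts = Counter(
        t["country_code"] for t in tweets
        if t.get("lang") == lang and t.get("country_code")
    )
    return [code for code, _ in counts.most_common(k)]


def select_snapshots(tweets: Iterable[Dict], countries: Iterable[str],
                     dates=SNAPSHOT_DATES) -> List[Dict]:
    """Keep only tweets from the chosen countries on the snapshot dates."""
    wanted = set(countries)
    return [t for t in tweets if t.get("country_code") in wanted and t.get("date") in dates]
```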
Top 5 Countries Total Tweets
Top 5 Countries: Polarity & Subjectivity
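The per-country tweet totals and the polarity and subjectivity chart can both come from a single group-by over scored tweets. This sketch assumes each tweet dict already carries hypothetical `country_code`, `date`, `polarity`, and `subjectivity` keys, with scores produced by a tool such as TextBlob. Taking the mean is one reasonable summary; the median would be less sensitive to outliers.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def daily_sentiment(tweets: Iterable[Dict]) -> Dict[Tuple[str, str], Tuple[int, float, float]]:
    """Group scored tweets by (country, date).

    Returns {(country_code, date): (tweet_count, mean_polarity, mean_subjectivity)}.
    """
    groups = defaultdict(list)
    for t in tweets:
        groups[(t["country_code"], t["date"])].append((t["polarity"], t["subjectivity"]))
    return {
        key: (len(scores), mean(p for p, _ in scores), mean(s for _, s in scores))
        for key, scores in sorted(groups.items())
    }
```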
United States: 2020/02/01, 2020/03/16, 2020/03/25, 2020/04/25
Annotations on the United States panels: Baltimore protests (Freddie Gray's death); Odion Jude Ighalo, a footballer who played in China
India: 2020/03/16, 2020/03/25, 2020/04/25
United Kingdom: 2020/03/16, 2020/03/25, 2020/04/25
China: 2020/03/16, 2020/03/25, 2020/04/25
Australia: 2020/03/16, 2020/03/25, 2020/04/25
How do you get data from Facebook?
CrowdTangle: a tool from Facebook that helps you follow, analyze, and report on what's happening across social media.
How can you access it?
• Chrome extension
• University researchers
CrowdTangle Chrome Extension
Thank you

PyData Taipei 2020

Editor's Notes

  • #2 Explain why these four dates were chosen; show the confirmed case counts.
  • #7 Misspelling correction, lemmatization, optional common-word removal. You can also add your own stopwords or additional words.
  • #9 In the categorization type, the font size indicates the number of subcategories of a collection.
  • #10 To capture stylometric features, the authors based their approach on trigrams, arguing that trigrams capture stylometric features well and extend better to unseen text than a bag-of-words approach when the training set is small. Daniel Ricardo Jaimes Moreno et al., "Prediction of Personality Traits in Twitter Users with Latent Features", IEEE, 2019.
  • #11 To capture stylometric features, the authors based their approach on trigrams, arguing that trigrams capture stylometric features well and extend better to unseen text than a bag-of-words approach when the training set is small. Daniel Ricardo Jaimes Moreno et al., "Prediction of Personality Traits in Twitter Users with Latent Features", IEEE, 2019.
  • #12 The polarity score is a float within the range [-1.0, 1.0]. Sentiment polarity can be classified as positive, negative, or neutral.
  • #13 Subjective vs. objective: the subjectivity score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
  • #15 'us', 'in', 'gb', 'cn', 'au'. Explain why these four dates were chosen; show the confirmed case counts.
  • #17 'us', 'in', 'gb', 'cn', 'au'
  • #18 At the end of March, positive sentiment is clearly visible. By April 25, COVID-19 is clearly being tweeted alongside political messages. Add the number of tweets.