Conducting Twitter Research
An #ASIST webinar about conducting Twitter research: data collection, filtering, analysis and visualization.
1. ASIST Webinar 12/2013: Conducting Twitter Research
Kim Holmberg, PhD
Statistical Cybermetrics Research Group, University of Wolverhampton, UK
Email: kim.holmberg@abo.fi
Web: http://kimholmberg.fi
2. Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research
Indiana University, Bloomington, USA; University of Wolverhampton, UK; Université de Montréal, Canada
3. Cascades, Islands, or Streams?
• Integrate several datasets representing a broad range of scholarly activities
• Use methodological and data triangulation to explore the lifecycle of topics within and across a range of scholarly activities
• Develop transparent tools and techniques to enable future predictive analyses
4. I’m preparing slides for an #ASIST #webinar
5. DATA COLLECTION
Webometric Analyst: data collection via Twitter’s API, data cleaning and analysis
http://lexiurl.wlv.ac.uk/
For detailed instructions, visit http://lexiurl.wlv.ac.uk/searcher/twitter.htm
6. DATA COLLECTION: other data collection tools
• Twitter Archiving Google Spreadsheet (TAGS): http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/
• HootSuite: http://hootsuite.com/
• Or write your own script: https://dev.twitter.com/ and http://140dev.com/free-twitter-api-source-code-library/twitter-database-server/
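If you write your own script, the next step after collection is reading the stored tweets back in for cleaning. The minimal sketch below does not call the API. It assumes the collector saved one Twitter API v1.1-style JSON object per line, using the fields id_str, user.screen_name, created_at and text. The names Tweet and load_tweets are hypothetical and are not part of any tool listed above.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Tweet(NamedTuple):
    tweet_id: str
    user: str
    created: datetime
    text: str


# Timestamp layout used by the v1.1 API, e.g. "Wed Dec 11 14:05:00 +0000 2013"
V11_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def load_tweets(lines: Iterable[str]) -> Iterator[Tweet]:
    """Parse tweets stored as one v1.1-style JSON object per line (JSON Lines)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        obj = json.loads(line)
        yield Tweet(
            tweet_id=obj["id_str"],
            user=obj["user"]["screen_name"],
            created=datetime.strptime(obj["created_at"], V11_TIME_FORMAT),
            text=obj["text"],
        )
```

Usage, with a hypothetical file name: `with open("tweets.jsonl", encoding="utf-8") as f: tweets = list(load_tweets(f))`.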
7. DATA COLLECTION
Research interests: information dissemination; influence, popularity; networks, communities; content, trends; time series, sentiment
Tweet elements: tweet; retweet (RT); @username; #hashtag; tweeters
8. DATA EXTRACTION
Use Webometric Analyst to sort the data and, depending on your research goals, to extract URLs, hashtags or usernames, or to remove stopwords from the tweets.
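For readers who prefer a script to Webometric Analyst, here is a rough equivalent of these extraction steps. It is a sketch under assumptions. The regular expressions are simplified and are not Twitter's own entity rules, and STOPWORDS is a tiny illustrative list. Both function names are hypothetical.

```python
import re

# Simplified patterns; these are assumptions, not Twitter's own entity rules
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")  # lookbehind skips e-mail addresses

# A tiny illustrative stopword list ("rt" marks old-style retweets)
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "rt", "the", "to"}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Pull URLs, hashtags and mentioned usernames out of one tweet."""
    without_urls = URL_RE.sub(" ", text)  # so URL fragments are not misread
    return {
        "urls": URL_RE.findall(text),
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(without_urls)],
        "mentions": [m.lower() for m in MENTION_RE.findall(without_urls)],
    }


def remove_stopwords(text: str) -> list[str]:
    """Lower-case, drop URLs, tokenize, and remove stopwords."""
    text = URL_RE.sub(" ", text.lower())
    tokens = re.findall(r"[#@]?\w+(?:'\w+)?", text)
    return [t for t in tokens if t not in STOPWORDS]
```

Hashtags and usernames are lower-cased so that #ASIST and #asist are counted as the same tag.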
9. ETHICS
Data collected from social media sites is openly available on the web; hence, it is already fully public and does not raise any ethical concerns (Wilkinson & Thelwall, 2011). However, in some cases the content of the tweets, blog entries or comments collected may contain identifiable, sensitive information. Although such information is already public, publicizing it by discussing it in an academic article could have unwanted side effects. Hence, one should consider anonymising all data and treating it confidentially.
Wilkinson, D. & Thelwall, M. (2011). Researching personal information on the public Web: Methods and ethics. Social Science Computer Review, 29(4), 387-401.
10. What can we research?
   1. Networks (users, words, topics, …)
   2. Content (tweets, RTs, hashtags, …)
11. FIRST STEPS
Step 1. What do you want to research?
Step 2. Collect tweets that are relevant to your research questions.
Step 3. Sort and clean the tweets (e.g. separate tweets from retweets, remove tweets in other languages, remove spam, remove false positives, ...).
Step 4. Extract the data that you need (e.g. tweeters, usernames mentioned, hashtags, URLs, ...).
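Part of Step 3, separating retweets from original tweets and dropping exact duplicates, can be scripted. The sketch below assumes that retweets are recognisable by the "RT @username" prefix at the start of the text. Language filtering, spam removal and false-positive checks are not covered and usually need manual inspection. split_retweets is a hypothetical name.

```python
import re

# Old-style retweets begin with "RT @username"; native retweets returned by the
# v1.1 API typically show the same prefix in their text
RT_RE = re.compile(r"^RT @\w+", re.IGNORECASE)


def split_retweets(tweets):
    """Separate original tweets from retweets and drop exact duplicates.

    tweets: iterable of (user, text) pairs. Returns (originals, retweets).
    """
    originals, retweets, seen = [], [], set()
    for user, text in tweets:
        key = (user.lower(), text.strip())
        if key in seen:
            continue  # e.g. the same tweet returned by two overlapping searches
        seen.add(key)
        if RT_RE.match(text.strip()):
            retweets.append((user, text))
        else:
            originals.append((user, text))
    return originals, retweets
```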
12. (1) NETWORK ANALYSIS
Possible research questions:
• How are different communities related to A connected to each other?
• Who is most central or influential (has the most connections) in a certain network of tweeters?
• How is information disseminated in the network?
• Who are the actors involved in a certain network?
• What kinds of local communities are there in a certain network, and what do those communities represent?
• And many more...
13. TWITTER NETWORK DATA
[Figure: a Twitter profile showing 1,248 tweets, 111 following and 290 followers, with callouts numbered 1-3]
14. CREATE THE NETWORK: ALTERNATIVE 1
This creates a network file (.net) based on the connections between tweeters and the users they mention (@username) in their tweets. Detailed instructions on how to create and analyze conversational networks on Twitter are available at: http://lexiurl.wlv.ac.uk/searcher/twitterConversationNetworks.html
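The idea behind a conversational network can be sketched in a few lines. Each tweeter is linked to every user they mention. This is not how Webometric Analyst implements it. It is an assumed, simplified version in which each mention counts once per tweet and self-mentions are ignored. mention_edges is a hypothetical name.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def mention_edges(tweets):
    """Count directed tweeter -> mentioned-user connections.

    tweets: iterable of (tweeter, text) pairs. Returns a Counter keyed by
    (source, target) with the number of tweets containing that mention.
    """
    edges = Counter()
    for tweeter, text in tweets:
        source = tweeter.lower()
        # set(): count each mentioned user only once per tweet
        for target in {m.lower() for m in MENTION_RE.findall(text)}:
            if target != source:  # ignore self-mentions
                edges[(source, target)] += 1
    return edges
```

The resulting (source, target) pairs form the same kind of edge list as the Source/Target table in Alternative 2.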
15. CREATE THE NETWORK: ALTERNATIVE 2
Sort the data, then convert the data into a network file:
Source      Target
Username1   Username2
Username1   Username3
Username2   Username3
Username3   Username1
Username3   Username2
Username3   Username4
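Converting a Source/Target list like the one above into a Pajek .net file needs only a small script. The sketch assumes a directed network and merges repeated pairs into one weighted arc. It numbers vertices in order of first appearance, because Pajek numbers vertices from 1. The function name is hypothetical. Pajek can open the result, and as far as I know Gephi can import .net files too.

```python
from collections import Counter


def to_pajek(edges) -> str:
    """Write directed (source, target) pairs as a Pajek .net network.

    Repeated pairs become one arc weighted by the number of repeats. Vertex
    names containing double quotes would need escaping (not handled here).
    """
    edges = list(edges)  # iterated twice below
    ids = {}  # name -> vertex number; Pajek numbers vertices from 1
    for source, target in edges:
        for name in (source, target):
            ids.setdefault(name, len(ids) + 1)
    lines = [f"*Vertices {len(ids)}"]
    lines += [f'{number} "{name}"' for name, number in ids.items()]
    lines.append("*Arcs")
    lines += [f"{ids[s]} {ids[t]} {w}" for (s, t), w in Counter(edges).items()]
    return "\n".join(lines) + "\n"
```

For the six example pairs, this produces a file with four vertices and six arcs, each of weight 1.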
16. OBJECTS OF ANALYSIS
   1. An actor’s (person, group, organisation, word, etc.) position in the network
   2. The structure of the network (in relation to other networks) or of its subnetworks (clusters)
17. AN ACTOR’S POSITION
Degree centrality: used to locate actors with influence in the network, or those in a position to spread information in the network. It can be divided into indegree and outdegree. How many other actors can this actor reach directly?
Other frequently used centrality measures: closeness, betweenness, eigenvector.
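Indegree and outdegree are easy to compute from an edge list. The sketch below assumes directed, unweighted edges and counts duplicate pairs once. Outdegree then answers "how many other actors can this actor reach directly?" and indegree counts how many actors reach this one. degree_centrality is a hypothetical name. Tools such as Pajek or Ucinet also compute these measures, along with the normalised variants.

```python
from collections import Counter


def degree_centrality(edges):
    """Return (indegree, outdegree) per actor for directed, unweighted edges.

    Duplicate (source, target) pairs are counted once, so each degree is the
    number of distinct actors an actor is directly connected to.
    """
    unique = set(edges)
    indegree = Counter(target for _, target in unique)
    outdegree = Counter(source for source, _ in unique)
    actors = {actor for edge in unique for actor in edge}
    return ({a: indegree[a] for a in actors}, {a: outdegree[a] for a in actors})
```

On the Source/Target example from Alternative 2, Username3 has the highest outdegree (3). To compare networks of different sizes, divide each degree by n - 1, where n is the number of actors.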
18. NETWORK STRUCTURE
Communities in the network tell something about the structure of the network and about how the different actors are spread out and connected to each other in the network.
19. NETWORK ANALYSIS: tools of the trade
• Gephi (for network visualization): http://gephi.org/
• Ucinet (for network analysis and visualization): https://sites.google.com/site/ucinetsoftware/
• Pajek (for network analysis and visualization): http://pajek.imfm.si/doku.php
20. Analyzing astrophysicists’ conversational connections on Twitter
Holmberg, Haustein, Bowman & Peters (work in progress)
[Figure: communities detected based on the conversational connections in astrophysicists’ tweets]
21. Analyzing astrophysicists’ conversational connections on Twitter
Holmberg, Haustein, Bowman & Peters (work in progress)
[Figure: percentage of people with different roles in the 7 communities: Mod0 (n=68), Mod1 (n=88), Mod2 (n=40), Mod3 (n=180), Mod4 (n=30), Mod5 (n=109), Mod6 (n=3). Roles: other astrophysicists, other researchers, researcher, students, amateur astronomer, teacher or educator, science communicator, organization or association, corporative, other, unknown.]
22. Climate change on Twitter: topics, communities and conversations about the IPCC
Pearce, Holmberg, Hellsten & Nerlich (under review)
Three groups coded based on their stance on climate change:
• Convinced
• Skeptic
• Neutral
23. (1) NETWORK ANALYSIS: Summary
Step 4. Extract the data that you need (e.g. tweeters and the usernames they mentioned, following or followers lists, ...).
Step 5. Convert your data into a network file.
Step 6. Visualize and analyse the network.
In addition, you may want to run some social network analysis on the network (e.g. centrality) or code the actors with suitable labels (e.g. work roles, opinion about something, etc.).
24. (2) CONTENT ANALYSIS
Possible research questions:
• How is topic A discussed on Twitter?
• How do certain activities on Twitter correlate with offline activities?
• How popular is A compared with B, based on visibility on Twitter?
• What is the public opinion (of tweeters) about A?
• What are tweeters saying about A?
• And many more...
25. 15,672
26. Quantitative / Qualitative
27. CONTENT ANALYSIS: manual coding
Example coding schemes:
• Positive / Neutral / Negative
• Scientific / Not scientific / Not clear
• Skeptic / Convinced / Neutral
• Personal / Work related
• Astrophysics / Biochemistry / Cheminformatics ...
• Pro something / Against something
...and many more, depending on your research goals.
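The coding itself is done by people, but the coded results are usually summarised as percentages per group. The charts on slides 21 and 28 are examples: roles per community, and scientific content per discipline. The sketch below assumes the coders' output is a list of (group, code) pairs. code_percentages is a hypothetical name, and the group and code labels are whatever your coding scheme uses.

```python
from collections import Counter, defaultdict


def code_percentages(coded):
    """Percentage of coded items falling into each code, per group.

    coded: iterable of (group, code) pairs from manual coding, e.g.
    ("Astrophysics", "Scientific"). Returns {group: {code: percent}}.
    """
    counts = defaultdict(Counter)
    for group, code in coded:
        counts[group][code] += 1
    result = {}
    for group, code_counts in counts.items():
        total = sum(code_counts.values())
        result[group] = {code: 100 * n / total for code, n in code_counts.items()}
    return result
```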
28. Holmberg, K. & Thelwall, M. (2013). Disciplinary differences in Twitter scholarly communication. In Proceedings of the 14th International Society for Scientometrics and Informetrics Conference, Vienna, Austria. Available at: http://issi2013.org/proceedings.html
[Figure: scientific content of the tweets by communication type (retweets, conversations, links, other) for astrophysics, biochemistry, digital humanities, economics and history of science]
29. CONTENT ANALYSIS: tools of the trade
• VOSviewer (to extract noun phrases from tweets): http://www.vosviewer.com/
• BibExcel (for co-word analysis): http://www8.umu.se/inforsk/Bibexcel/
• Notepad++ (to search and replace in your data): http://notepad-plus-plus.org/
• Screaming Frog SEO Spider (to decode short URLs): http://www.screamingfrog.co.uk/seo-spider/
30. Noun phrases from one of the communities
Analyzing astrophysicists’ conversational connections on Twitter
Holmberg, Haustein, Bowman & Peters (work in progress)
31. TIME SERIES: tools of the trade
Mozdeh (Persian for "good news"): visit http://mozdeh.wlv.ac.uk/index.html for a free download and instructions
32. TIME SERIES
Pearce, Holmberg, Hellsten & Nerlich (under review). Climate change on Twitter: topics, communities and conversations about the IPCC.
33. The Next Pope?
699,337 tweets collected between February 12, 2013 and March 11, 2013.
34. Pope Francis (Jorge Mario Bergoglio) was mentioned in 9 tweets...
35. ONLINE/OFFLINE CORRELATIONS
Comparison of Twitter and publication activity and impact:
• publications and tweets per day: ρ = −0.339*
• citation rate and tweets per day: ρ = −0.457**
Haustein, Bowman, Holmberg, Larivière & Peters (under review). Astrophysicists on Twitter: An in-depth analysis of tweeting and scientific publication behavior.
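The ρ values above are Spearman rank correlations. A minimal sketch of the coefficient follows: Pearson's r computed on the ranks, with tied values given their average rank. It returns only ρ. The significance levels marked by asterisks on the slide would need a separate test. The function names are hypothetical, and a statistics package would normally be used in practice.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A negative ρ, as on the slide, means that higher values of one variable tend to go with lower values of the other.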
36. ONLINE/OFFLINE CORRELATIONS
Overall similarity between abstracts and tweets is low:
• cosine = 0.081
• 4.1% of 50,854 tweet noun phrases (NPs) occur in abstracts
• 16.0% of 12,970 abstract NPs occur in tweets
Haustein, Bowman, Holmberg, Larivière & Peters (under review). Astrophysicists on Twitter: An in-depth analysis of tweeting and scientific publication behavior.
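Both kinds of figure on this slide can be computed from noun-phrase lists. The sketch below shows the cosine similarity between two term-frequency vectors and the share of one list's phrases that also appear in the other. The slide does not say whether the counts refer to phrase occurrences or distinct phrases, so share_found simply works on whatever list it is given. Both function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0 = no overlap)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def share_found(phrases, other) -> float:
    """Percentage of items in `phrases` that also occur anywhere in `other`."""
    if not phrases:
        return 0.0
    other_set = set(other)
    return 100 * sum(p in other_set for p in phrases) / len(phrases)
```

For example, cosine(Counter(tweet_nps), Counter(abstract_nps)) compares the two corpora as wholes. Here tweet_nps and abstract_nps are assumed lists of extracted noun phrases, such as those exported from VOSviewer.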
37. (2) CONTENT ANALYSIS: Summary
Step 4. Extract the data that you need (e.g. hashtags, usernames, original tweets, ...).
Then, depending on your research goals:
Step 5A. Analyze frequencies (e.g. most used hashtags).
Step 5B. Classify the tweets manually.
Step 5C. Extract the noun phrases and create a co-mention network of them with VOSviewer.
Step 5D. Analyze time series of occurrences of certain words or hashtags.
Step 5E. Run sentiment analysis on the tweets.
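Steps 5A and 5D can be approximated with a short script, as a rough alternative to a dedicated tool such as Mozdeh. The sketch assumes tweets are available as (date, text) pairs. It uses plain case-insensitive substring matching, which means a term like "pope" would also match "popes". The function names are hypothetical.

```python
import re
from collections import Counter
from collections.abc import Iterable
from datetime import date

HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def top_hashtags(texts: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Step 5A: the n most frequent hashtags, ignoring case."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)


def daily_counts(tweets: Iterable[tuple[date, str]], term: str) -> dict[date, int]:
    """Step 5D: number of tweets per day whose text contains `term`, ignoring case.

    Days without a match are absent from the result, so fill them with zeros
    before plotting a continuous time series.
    """
    term = term.lower()
    per_day = Counter(day for day, text in tweets if term in text.lower())
    return dict(sorted(per_day.items()))
```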
38. During this hour, over 20,820,000 tweets were sent.
39. Thank you for your attention
Kim Holmberg
Statistical Cybermetrics Research Group, University of Wolverhampton, UK
kim.holmberg@abo.fi
http://kimholmberg.fi
@kholmber
Acknowledgements: This presentation is based upon work supported by the international funding initiative Digging into Data. Specifically, funding comes from the National Science Foundation in the United States (Grant No. 1208804), JISC in the United Kingdom, and the Social Sciences and Humanities Research Council of Canada.