What Do We Get From Twitter – And What Not?
An Introduction to (Popular) Twitter Research in the Social Sciences
World Social Science Forum 2013
Montréal, Canada

Dr. Katrin Weller
katrin.weller@gesis.org, @kwelle, http://kwelle.wordpress.org
Work in progress
• First results
• Broader objective
Why study Twitter?
Exemplary Twitter research areas

• Brand Communication & Marketing
• Crises & Natural Disasters
• Elections & Politics
• Popular Culture
• Education & Scholarly Communication
• Health Care & Diseases
Visions
• New source of research data
• Availability, Popularity, Metadata
• Insights into what people do and think
• Big data
• Predictions

→ But: there are limitations and challenges.
Growing interest
[Chart: number of publications per year, 2007–2013; Scopus total vs. social sciences only]
SCOPUS search: TITLE(twitter) AND PUBYEAR > 2005, 22.08.2013.
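The publication count above can be reproduced programmatically. A minimal sketch, assuming a valid Scopus API key and access to the Scopus Search API at api.elsevier.com; the endpoint, parameters and response fields are assumptions that may differ by API version:

```python
# Minimal sketch: re-running the Scopus title search behind the growth curve.
# Endpoint, parameters and response fields are assumptions (may vary by API version).
import requests

API_KEY = "YOUR-SCOPUS-API-KEY"  # placeholder, not a real key
BASE_URL = "https://api.elsevier.com/content/search/scopus"

def count_hits(query: str) -> int:
    """Return the number of Scopus records matching an advanced-search query."""
    response = requests.get(
        BASE_URL,
        params={"query": query, "count": 1},
        headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return int(response.json()["search-results"]["opensearch:totalResults"])

# Total count as used on the slide:
print(count_hits("TITLE(twitter) AND PUBYEAR > 2005"))

# Per-year counts for the growth curve (assumes PUBYEAR supports equality):
for year in range(2007, 2014):
    print(year, count_hits(f"TITLE(twitter) AND PUBYEAR = {year}"))
```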
How to study Twitter?
Disciplines
Immunology and Microbiology
Dentistry
Veterinary
Chemical Engineering
Multidisciplinary
Energy
Neuroscience
Materials Science
Chemistry
Pharmacology, Toxicology and Pharmaceutics
Physics and Astronomy
Environmental Science
Earth and Planetary Sciences
Health Professions
Biochemistry, Genetics and Molecular Biology
Nursing
Economics, Econometrics and Finance
Agricultural and Biological Sciences
Psychology
Decision Sciences
Arts and Humanities
Business, Management and Accounting
Mathematics
Medicine
Engineering
Social Sciences
Computer Science

[Bar chart: number of publications per Scopus subject area (scale 0–700); 298 publications from social sciences]
Methods? Objectives? Data?
• Current approach: close look at the most popular (= highly cited) publications.
• Next steps:
– Bibliometric analyses including more sources
– Qualitative interviews with Twitter researchers
Top 17 (more than 50% of all citations), plus top 3 from 2012.

[1] Huberman, B. A., Romero, D. M., & Wu, F. (2009). Social networks that matter: Twitter under the microscope. First Monday, 14(1). Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/2317/2063 (155 citations)
[2] Marwick, A. E., & boyd, d. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13(1), 114–133. doi:10.1177/1461444810365313 (77 citations)
[3] Junco, R., Heiberger, G., & Loken, E. (2011). The effect of Twitter on college student engagement and grades. Journal of Computer Assisted Learning, 27(2), 119–132. doi:10.1111/j.1365-2729.2010.00387.x (55 citations)
[4] Yardi, S., Romero, D., Schoenebeck, G., & boyd, d. (2010). Detecting spam in a Twitter network. First Monday, 15(1). Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/2793/2431 (28 citations)
[5] Ritter, A., Cherry, C., & Dolan, B. (2010). Unsupervised modeling of Twitter conversations. In HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 172–180). Stroudsburg, PA: Association for Computational Linguistics (ACL). Retrieved from http://dl.acm.org/citation.cfm?id=1858019 (27 citations)
[6] Petrovic, S., Osborne, M., & Lavrenko, V. (2010). Streaming first story detection with application to Twitter. In HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 181–189). Stroudsburg, PA: Association for Computational Linguistics (ACL). Retrieved from http://dl.acm.org/citation.cfm?id=1858020 (26 citations)
[7] Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent Twitter sentiment classification. In HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Short papers - Volume 2 (pp. 151–160). Retrieved from http://dl.acm.org/citation.cfm?id=2002492 (22 citations)
[8] Han, B., & Baldwin, T. (2011). Lexical normalisation of short text messages: makn sens a #twitter. In HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Short papers - Volume 2 (pp. 368–378). Retrieved from http://dl.acm.org/citation.cfm?id=2002520 (22 citations)
[9] Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilmann, M., … (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. In HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Short papers - Volume 2 (pp. 42–47). Retrieved from http://dl.acm.org/citation.cfm?id=2002747 (21 citations)
[10] Schultz, F., Utz, S., & Göritz, A. (2011). Is the medium the message? Perceptions of and reactions to crisis communication via twitter, blogs and traditional media. Public Relations Review, 37(1), 20–27. doi:10.1016/j.pubrev.2010.12.001 (19 citations)
[11] Barbosa, L., & Feng, J. (2010). Robust sentiment detection on twitter from biased and noisy data. In COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics (pp. 36–44). (19 citations)
[12] Davidov, D., Tsur, O., & Rappoport, A. (2010). Enhanced sentiment learning using Twitter hashtags and smileys. In COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics (pp. 241–249). Retrieved from http://dl.acm.org/citation.cfm?id=1944566.1944594 (19 citations)
[13] Hargittai, E., & Litt, E. (2011). The tweet smell of celebrity success: Explaining variation in Twitter adoption among a diverse group of young adults. New Media & Society, 13(5), 824–842. doi:10.1177/1461444811405805 (18 citations)
[14] Zhou, X., Lee, W.-C., Peng, W.-C., Xie, X., Lee, R., & Sumiya, K. Measuring geographical regularities of crowd behaviors for Twitter-based geosocial event detection, 1. doi:10.1145/1867699.1867701 (18 citations)
[15] Gruzd, A., Wellman, B., & Takhteyev, Y. (2011). Imagining Twitter as an Imagined Community. American Behavioral Scientist, 55(10), 1294–1318. doi:10.1177/0002764211409378 (17 citations)
[16] Johnson, K. A. (2011). The effect of Twitter posts on students’ perceptions of instructor credibility. Learning, Media and Technology, 36(1). (16 citations)

Results
Results I
Applied methods include:
• interviews with Twitter users,
• experimental settings of using Twitter in certain environments,
• quantitative analysis of tweets and their characteristics,
• network analysis of (following) users (see the sketch below),
• linguistic analyses, e.g. word clustering, event detection, sentiment analysis,
• analysis of tweets.
→ Almost no combination of methods
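As referenced in the list above, a minimal sketch of one of the named methods, network analysis of (following) users, assuming follow relations have already been collected as (follower, followed) pairs; the example data and the chosen networkx measures are illustrative and not taken from the reviewed studies:

```python
# Minimal sketch of a network analysis of (following) users, assuming follow
# relations are already available as (follower, followed) pairs.
# The example edges are purely illustrative.
import networkx as nx

follow_edges = [
    ("alice", "bob"),    # alice follows bob
    ("carol", "bob"),
    ("carol", "alice"),
    ("bob", "alice"),
]

graph = nx.DiGraph()
graph.add_edges_from(follow_edges)

# Basic descriptive measures of the kind such analyses typically report:
print("users:", graph.number_of_nodes())
print("follow relations:", graph.number_of_edges())
print("network density:", nx.density(graph))
most_followed = sorted(graph.in_degree(), key=lambda pair: pair[1], reverse=True)
print("most-followed users:", most_followed[:3])
```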
Results II
Data
• collections of tweets (retrieved randomly or based on specific search criteria; sketched below)
• user profiles / user networks
• data from experiments, surveys, interviews.
Different datasets, different sizes.
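A minimal sketch of assembling the two kinds of tweet collections named in the first bullet, assuming tweets are stored locally with one JSON object per line and a "text" field; the file name, field name, and example hashtag are assumptions:

```python
# Minimal sketch: building a randomly drawn and a criteria-based tweet collection
# from a local file with one tweet JSON object per line.
# File name and the "text" field are assumptions, not a fixed schema.
import json
import random

with open("tweets.jsonl", encoding="utf-8") as handle:
    tweets = [json.loads(line) for line in handle]

# (a) random collection of fixed size
random_collection = random.sample(tweets, k=min(1000, len(tweets)))

# (b) collection based on a specific search criterion, e.g. a hashtag
keyword_collection = [t for t in tweets if "#election" in t.get("text", "").lower()]

print(len(random_collection), len(keyword_collection))
```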
Results III
Datasets - Examples
• 309,740 Twitter users (with followers and tweets)
• Experiment with 125 students
• 17,803 tweets from 8,616 users + 1st degree network (3,048,360 directed edges, 631,416 unique followers, and 715,198 unique friends)
• 1.3 million Twitter conversations, with each conversation containing between 2 and 243 posts
• 20,000 tweets
• 1,827 annotated tweets
• Experiment with 1,677 participants
• Survey with 505 young American adults
• 21,623,947 geo-tagged tweets
• One person’s Twitter network (652 followers, 114 followings)
• none
• 99,832 tweets
Results IV
Challenges and Limitations
- Technical challenges (API limitations, data quality)
- Ethical issues (?)
- Co-existing approaches for the same problem (e.g. sentiment analysis)
- Sampling (size of dataset, random sample, subsample, biases, representativeness)
- User information and geo information often unavailable (see the sketch below).
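As referenced in the last item, a minimal sketch for quantifying missing geo information in a tweet collection and for drawing a subsample; the "coordinates" field follows the classic Twitter JSON layout but is an assumption here, as are the file name, sample size, and seed:

```python
# Minimal sketch: quantifying missing geo information and drawing a subsample.
# The "coordinates" field name follows the classic Twitter JSON layout and is an
# assumption; sample size and seed are arbitrary choices.
import json
import random

with open("tweets.jsonl", encoding="utf-8") as handle:
    tweets = [json.loads(line) for line in handle]

geo_tagged = [t for t in tweets if t.get("coordinates")]
share = len(geo_tagged) / len(tweets) if tweets else 0.0
print(f"geo-tagged: {len(geo_tagged)} of {len(tweets)} ({share:.1%})")

# Random subsample for manual coding or closer inspection.
random.seed(42)
subsample = random.sample(tweets, k=min(500, len(tweets)))
print("subsample size:", len(subsample))
```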
Outlook
Thank you for your attention!
Coming soon:
Weller, K., Bruns, A., Burgess, J., Mahrt, M., & Puschmann, C. (Eds.) (2013). Twitter and Society. New York: Peter Lang.
