Chances and Challenges of Studying Social Media Data
Dr. Katrin Weller
GESIS – Leibniz-Institute for the Social Sciences
Data Archive for the Social Sciences / Computational Social Science
Cologne, Germany
Digital Studies Fellow at the John W. Kluge Center
Library of Congress
Washington D.C.
E-Mail: katrin.weller@gesis.org ●Twitter: @kwelle ● Web: www.katrinweller.net
2. My Background
• PhD in Information Science (University of Düsseldorf, until
2012)
• Interests: Web Science, Social Media (focus on Twitter),
Knowledge representation + Semantic Web, informetrics +
altmetrics, scholarly communication
• Since 2013: GESIS, Social Web Data: New data types for social
science research; research methods and data archiving.
• Jan-May 2015: Digital Studies Fellowship at the Library of
Congress
3. Recent and Current Work
• Co-editor of "Twitter & Society" (Peter Lang, 2014).
• With Katharina Kinder-Kurlanda: "The hidden data of social
media research"
• #FAIL! Things that didn't work out in social media research –
and what we can learn from them (#fail2015a at Web Science
Conference, Oxford; #fail2015b at Internet Research 16,
Phoenix). https://failworkshops.wordpress.com
• Pilot project for archiving social media datasets in an election
study (http://arxiv.org/abs/1312.4476v2).
7. Social media research 2000-2013
[Figure: Number of publications per year in Scopus, 2000-2013]
Scopus search, conducted March 2014: (TITLE-ABS-KEY("social media") OR TITLE-ABS-KEY("social web") OR TITLE-ABS-KEY("social software") OR TITLE-ABS-KEY("web 2.0")) AND PUBYEAR > 1999
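Because the results of a keyword search depend on the exact query string, it can help to generate and store that string programmatically rather than retyping it. The sketch below is a hypothetical helper (`scopus_query` is not part of any Scopus tool); it only assembles the Boolean string shown above and does not contact Scopus.

```python
from typing import Sequence


def scopus_query(phrases: Sequence[str], after_year: int) -> str:
    """Assemble a Scopus advanced-search string (hypothetical helper).

    Each phrase is searched in title, abstract and keywords via TITLE-ABS-KEY,
    the clauses are OR-ed together, and the result is restricted with
    PUBYEAR > after_year.
    """
    clauses = " OR ".join(f'TITLE-ABS-KEY("{p}")' for p in phrases)
    return f"({clauses}) AND PUBYEAR > {after_year}"


query = scopus_query(["social media", "social web", "social software", "web 2.0"], 1999)
print(query)
```

Keeping the generated string together with the search date makes it easier to rerun the search later or compare it with another team's query.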
8. Scopus: 2000-2013 by country
[Figure: Number of publications by country, Scopus 2000-2013. Countries shown: United States, United Kingdom, Germany, Australia, China, Spain, Canada, Italy, France, Taiwan, Netherlands, South Korea, Finland, Austria, Japan, Greece, India, Singapore, Switzerland, Hong Kong, Ireland]
9. Scopus: 2000-2013 by subject area
[Figure: Publications by subject area, Scopus 2000-2013. Subject areas shown: Computer Science; Social Sciences; Engineering; Medicine; Business, Management and Accounting; Mathematics; Arts and Humanities; Decision Sciences; Psychology; Nursing; Economics, Econometrics and Finance; Biochemistry, Genetics and Molecular Biology; Health Professions; Environmental Science; Earth and Planetary Sciences; Agricultural and Biological Sciences; Pharmacology, Toxicology and Pharmaceutics; Physics and Astronomy; Materials Science; Multidisciplinary; Neuroscience; Immunology and Microbiology; Chemical Engineering; Veterinary; Dentistry; Chemistry; Energy. Values labelled on the chart: 10,650 (36%), 5,542 (19%), 2,384, 2,288, 2,151, 1,535, 773, 772, 65]
11. Challenge
• Interdisciplinarity!
• "Social media research" is not a coherent research field.
• Influences from many different disciplines.
• Some disciplines are still isolated, and not all are equally
advanced in technical tasks.
• Challenge of keeping track of what is going on across
disciplines.
12. Example: Twitter research in social sciences
Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the
social sciences. Knowledge Organization, 41(3), 238-248.
13. Challenge vs. Chance
• Lots of room for exploration and innovation
• Few or no standards
15. Different methods even in social science Twitter research
Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences.
Knowledge Organization, 41(3), 238-248.
16. Example
[Figure: Publications on "Twitter and elections" per year, 2008-2013 (Scopus and Web of Science)]
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets [Twitter and elections: Between 140 characters and billions of tweets]. In: R. Reichert (Ed.), Big Data:
Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
17. Elections studied in Twitter-and-elections publications
Year | Election | Country/region | No. of papers (2013) | Date of election
2008 | 40th Canadian General Election | Canada | 1 | 14.10.2008
2009 | European Parliament election, 2009 | Europe | 1 | 07.06.2009
2009 | German federal election, 2009 | Germany | 2 | 27.09.2009
2010 | 2010 UK general election | United Kingdom | 4 | 06.05.2010
2010 | South Korean local elections, 2010 | South Korea | 1 | 02.06.2010
2010 | Dutch general election, 2010 | Netherlands | 2 | 09.06.2010
2010 | Australian federal election, 2010 | Australia | 1 | 21.08.2010
2010 | Swedish general election, 2010 | Sweden | 1 | 19.09.2010
2010 | Midterm elections / United States House of Representatives elections, 2010 | USA | 4 | 02.11.2010
2010 | Gubernatorial elections: Georgia | USA | 1 | 02.11.2010
2010 | Gubernatorial elections: Ohio | USA | 1 | 02.11.2010
2010 | Gubernatorial elections: Rhode Island | USA | 1 | 02.11.2010
2010 | Gubernatorial elections: Vermont | USA | 1 | 02.11.2010
2010 | 2010 superintendent elections | South Korea | 1 | 17.12.2010
2011 | Baden-Württemberg state election, 2011 | Germany | 1 | 27.03.2011
2011 | Rhineland-Palatinate state election, 2011 | Germany | 1 | 27.03.2011
2011 | Scottish parliament election 2011 | Scotland | 1 | 05.05.2011
2011 | Singapore's 16th parliamentary General Election | Singapore | 1 | 07.05.2011
2011 | Norwegian local elections, 2011 | Norway | 2 | 12.09.2011
2011 | 2011 Danish parliamentary election | Denmark | 2 | 15.09.2011
2011 | Berlin state election, 2011 | Germany | 2 | 18.09.2011
2011 | Gubernatorial elections: West Virginia | USA | 1 | 04.10.2011
2011 | Gubernatorial elections: Louisiana | USA | 1 | 22.10.2011
2011 | Swiss federal election, 2011 | Switzerland | 1 | 23.10.2011
2011 | 2011 Seoul mayoral elections | South Korea | 1 | 26.10.2011
2011 | Gubernatorial elections: Kentucky | USA | 1 | 08.11.2011
2011 | Gubernatorial elections: Mississippi | USA | 1 | 08.11.2011
2011 | Spanish national election 2011 | Spain | 1 | 20.11.2011
2012 | Queensland State election | Australia | 1 | 24.03.2012
2012 | South Korean legislative election, 2012 | South Korea | 1 | 11.04.2012
2012 | French presidential election, 2012 | France | 2 | 22.04.2012
2012 | Mexican general election, 2012 | Mexico | 1 | 01.07.2012
2012 | United States presidential election, 2012 / United States House of Representatives elections, 2012 | USA | 17 | 06.11.2012
2012 | South Korean presidential election, 2012 | South Korea | 2 | 19.12.2012
2013 | Ecuadorian general election, 2013 | Ecuador | 1 | 17.02.2013
2013 | Venezuelan presidential election, 2013 | Venezuela | 1 | 14.04.2013
2013 | Paraguayan general election, 2013 | Paraguay | 1 | 21.04.2013
19. Big DATA?
Twitter and elections: publications from 2013
Dataset size (no. of tweets) | No. of publications (2013)
0-500 | 3
501-1,000 | 4
1,001-5,000 | 1
5,001-10,000 | 1
10,001-50,000 | 7
50,001-100,000 | 4
100,001-500,000 | 5
500,001-1,000,000 | 3
1,000,001-5,000,000 | 3
More than 5,000,000 | 3
More than 100,000,000 | 1
More than 1,000,000,000 | 1
No or insufficient information | 13
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big
Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
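Size categories like these are easy to apply inconsistently when coding publications by hand. A minimal sketch of the binning, assuming that the open-ended "more than" rows are mutually exclusive (each dataset is assigned to the highest threshold it exceeds) and that an unreported size is recorded as `None`; the function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, Optional

# Inclusive upper bounds of the closed size bins, and their labels.
UPPER_BOUNDS = [500, 1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000, 5_000_000]
LABELS = [
    "0-500", "501-1,000", "1,001-5,000", "5,001-10,000", "10,001-50,000",
    "50,001-100,000", "100,001-500,000", "500,001-1,000,000", "1,000,001-5,000,000",
]


def size_bin(n_tweets: Optional[int]) -> str:
    """Map a reported dataset size to one of the size categories (hypothetical helper)."""
    if n_tweets is None:
        return "No or insufficient information"
    # Assumption: open-ended categories are mutually exclusive; check largest first.
    if n_tweets > 1_000_000_000:
        return "More than 1,000,000,000"
    if n_tweets > 100_000_000:
        return "More than 100,000,000"
    if n_tweets > 5_000_000:
        return "More than 5,000,000"
    # bisect_left returns the index of the first upper bound >= n_tweets.
    return LABELS[bisect_left(UPPER_BOUNDS, n_tweets)]


def tally(sizes: Iterable[Optional[int]]) -> Counter:
    """Count how many datasets fall into each size category."""
    return Counter(size_bin(n) for n in sizes)


# Hypothetical dataset sizes, for illustration only.
print(tally([300, 12_000, None, 2_500_000_000]))
```

Using `bisect_left` on inclusive upper bounds keeps boundary values such as 500 or 5,000,000 in the lower category, matching how the ranges are written.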
20. Comparability
Twitter and elections: data collection methods
Data source | No. of publications
No information | 11
Collected manually from Twitter website (copy-paste / screenshot) | 6
Twitter API (no further information) | 8
Twitter Search API | 3
Twitter Streaming API | 1
Twitter REST API | 1
Twitter API user timeline | 1
Own program for accessing Twitter APIs | 4
Twitter Gardenhose | 1
Official reseller (Gnip, DataSift) | 3
YourTwapperKeeper | 3
Other tools (e.g. Topsy) | 6
Received from colleagues | 1
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big
Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
21. Comparability
Twitter and elections: data collection periods
Collection period | No. of publications (2013)
0-10 hours | 1
1-2 days | 6
3-7 days | 3
8-14 days | 5
2-4 weeks | 7
1-2 months | 13
2-6 months | 5
7-12 months | 3
More than 12 months | 0
No or insufficient information | 6
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big
Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
22. What is being studied?
[Figure: Number of publications per year, 2001-2013, which mention the respective social media platform's name in their title (Twitter, Facebook, YouTube, blogs, wikis, Foursquare, LinkedIn, MySpace). Scopus title search. For details: http://kwelle.wordpress.com/2014/04/07/bibliometric-analysis-of-social-media-research/]
23. Challenges
• Quickly changing landscape of social media
platforms
• Twitter as a model organism of social media
research?
24. Scopus: 2000-2013 popular keywords
Social networks (897), Social network (657)
User interfaces (1,007)
Social networking (online) (2,291)
Facebook (847)
Knowledge management (860)
Web services (869)
Information systems (810)
Twitter (667)
Semantics (765), Semantic Web (669)
Communication (650)
Information technology (639)
E-learning (623)
Students (579)
Education (520)
Teaching (504)
Scopus search, conducted March 2014: (TITLE-ABS-KEY("social media") OR TITLE-ABS-KEY("social web") OR TITLE-ABS-KEY("social software") OR TITLE-ABS-KEY("web 2.0")) AND PUBYEAR > 1999
25. Social Media Research: Topics
• Political communication / elections
• Activism
• Popular culture, memes
• Brand communication, marketing
• Journalism (incl. agenda setting, citizen journalism, TV
backchannel)
• Crisis communication, disaster response
• Scholarly communication
• Language
• And many more
28. • Weller, K., & Kinder-Kurlanda, K. E. (2014). "I love thinking about ethics!"
Perspectives on ethics in social media research. Paper presented at Internet Research 15 (IR15),
Daegu, South Korea, 22-24 October 2014. To be published in Selected Papers of
Internet Research (preprint available).
• Kinder-Kurlanda, K. E., & Weller, K. (2014). "I always feel it must be great to be a
Hacker!": The role of interdisciplinary work in social media research. In
Proceedings of the 2014 ACM Web Science Conference (WebSci'14), June 23-26,
2014, Bloomington, IN, USA (pp. 91-98).
doi:10.1145/2615569.2615685 (preprint available).
• Weller, K., & Kinder-Kurlanda, K. (in press). Uncovering the Challenges in Collection,
Sharing and Documentation: The Hidden Data of Social Media Research? To
appear in Workshop on Standards and Practices in Large-Scale Social Media
Research, ICWSM, Oxford, May 2015.
http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10657/10552
29. CHANCES
• Researchers value social media as a new type of data.
• Previously "ephemeral data" become visible
• Immediate: quick reaction to events
• Structured
• "Natural" data
"What I find really interesting is that structure becomes manifest in
internet communication. So it's the first time in history actually that
we can, that social structures between people become manifest
within a technology. (...) They become visible, they become
crawlable, they become analyzable."
30. Some of the CHALLENGES
Preliminary results, more detailed analysis to follow.
- Interdisciplinarity
- Ethics
- Standards
- Data access & infrastructure
- …
31. Unregulated and developing field
• Researchers showed a high awareness of the unregulated
and developing character of social media research
methods.
But, I think that (…) in like a couple of years, maybe
five – it depends a lot, because the subject of the
research is changing every day, (…) but I think that
we’re going to have, (…) more or less shared
qualitative approaches with a lot of good practices.
32. Data Sharing
But you can’t make your data available for others to look
at, which means both your study can’t really be replicated
and it can’t be tested for review. But also it just means your
data can’t be made available for other people to say, Ah
you have done this with it, I’ll see what I can do with it, (…)
There is no open data.
33. Data Sharing
“I think probably a couple times we’ve asked around if anyone
else happened to have a particular dataset. […] but not so much,
because they probably have tracked in a different data format,
and then merging the two together actually becomes quite
difficult as well.”
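The merging problem described in this quote can be illustrated with two hypothetical export formats for the same kind of data. The field names and timestamp layouts below are assumptions, loosely modelled on typical CSV and JSON tweet exports. Each format needs its own converter into a shared schema before records can be deduplicated by tweet ID:

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, NamedTuple


class Tweet(NamedTuple):
    id: str
    created_at: datetime  # always stored in UTC
    text: str


def from_format_a(row: Dict[str, str]) -> Tweet:
    """Hypothetical format A: CSV-style rows with ISO 8601 timestamps incl. UTC offset."""
    created = datetime.fromisoformat(row["created"]).astimezone(timezone.utc)
    return Tweet(row["tweet_id"], created, row["text"])


def from_format_b(obj: Dict[str, str]) -> Tweet:
    """Hypothetical format B: JSON-style records with timestamps like
    'Sun Sep 22 18:00:00 +0000 2013' (parsed assuming an English/C locale)."""
    created = datetime.strptime(obj["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return Tweet(obj["id_str"], created.astimezone(timezone.utc), obj["text"])


def merge(rows_a: Iterable[Dict[str, str]], rows_b: Iterable[Dict[str, str]]) -> List[Tweet]:
    """Convert both sources, keep the first record seen per ID, sort by time."""
    by_id: Dict[str, Tweet] = {}
    for tweet in [from_format_a(r) for r in rows_a] + [from_format_b(o) for o in rows_b]:
        by_id.setdefault(tweet.id, tweet)
    return sorted(by_id.values(), key=lambda t: t.created_at)


# Hypothetical records: tweet "1" appears in both collections.
a = [{"tweet_id": "1", "created": "2013-09-22T18:00:00+00:00", "text": "x"}]
b = [
    {"id_str": "1", "created_at": "Sun Sep 22 18:00:00 +0000 2013", "text": "x"},
    {"id_str": "2", "created_at": "Sun Sep 22 17:00:00 +0000 2013", "text": "y"},
]
merged = merge(a, b)
print([t.id for t in merged])  # ['2', '1']
```

Real exports can differ in more than field names (for example in time zones or in how retweets were handled), and each such difference needs its own explicit conversion rule.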
34. Ethics / privacy
“I will not quote tweets.”
“if somebody plays a really
important role in a particular event
then maybe they deserve the credit
of being accredited as well.”
36. Representativeness
Blank, G. (2014). Who uses Twitter? Representativeness of Twitter Users. Presentation at General Online Research GOR 14.
Retrieved from: http://conftool.gor.de/conftool14/index.php?page=downloadPaper&filename=Blank-Who_uses_Twitter_Representativeness-119.pptx&form_id=119&form_version=final
[Figure 6: Political Activities of Twitter Users. Twitter users vs. non-users, scale 0-100. Categories: interest in politics; political activities (send political message, contact MP online, re-post political news, political comment on SNS, find political facts, sign online petition). Data: OxIS current users, 2013, N=1,613]
37. Data Quality
• E.g. comparison of Twitter API data and reseller
data.
Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the Sample Good Enough? Comparing Data from Twitter’s
Streaming API with Twitter’s Firehose. Retrieved from http://arxiv.org/abs/1306.5204
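One simple way to compare a sample with a fuller data stream is to check how far their most frequent hashtags overlap. The sketch below computes the Jaccard overlap of the top-k hashtags; it is an illustrative check under simplifying assumptions (whitespace tokenization, case-folding), not necessarily the measure used in the cited study, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def top_hashtags(texts: Iterable[str], k: int) -> Set[str]:
    """Return the k most frequent hashtags (case-folded) in a collection of tweet texts.

    Simplifying assumption: hashtags are whitespace-separated tokens starting
    with '#', so trailing punctuation (e.g. '#vote!') is not stripped.
    """
    counts = Counter(
        token.lower()
        for text in texts
        for token in text.split()
        if token.startswith("#") and len(token) > 1
    )
    return {tag for tag, _ in counts.most_common(k)}


def top_k_overlap(sample: Iterable[str], reference: Iterable[str], k: int = 10) -> float:
    """Jaccard overlap (0 to 1) between the top-k hashtag sets of two collections."""
    a, b = top_hashtags(sample, k), top_hashtags(reference, k)
    if not a and not b:
        return 1.0  # neither collection contains any hashtags
    return len(a & b) / len(a | b)


# Hypothetical tweet texts, for illustration only.
sample = ["#vote now", "#vote #tv"]
full = ["#vote", "#vote #tv", "#tv #news", "#vote"]
print(top_k_overlap(sample, full, k=2))  # 1.0
print(top_k_overlap(sample, full, k=3))
```

`Counter.most_common` breaks ties by first occurrence, so hashtags with equal counts right at the cutoff k can make the score sensitive to input order; a rank correlation over a fixed list of hashtags would be a less brittle alternative.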
38. Inequality in data access possibilities
boyd, d., and Crawford, K. 2012. Critical questions for Big Data: Provocations for a cultural, technological, and scholarly
phenomenon. Information, Communication & Society 15(5):662–679. DOI: 10.1080/1369118X.2012.678878
• Data haves and data have-nots
– Financial reasons
– Connections to companies
– Different skills
– …
39. Top 5 Challenges in Twitter research
• Representativeness and validity
• Cross-platform studies
• Comparisons (e.g. different countries, points in time)
• Multi-method approaches
• Context and meaning
Bruns, A., & Weller, K. (2014). Twitter data analytics – or: the pleasures and perils of studying Twitter (guest editorial
for special issue). Aslib Journal of Information Management, 66(3), 246-249.
40. Summary
Three sources of challenges in social media research:
• The variety of user interactions that count as social media
and their ever-changing nature, which makes social media a
moving target.
• The diversity of the research community, which challenges
knowledge transfer and the development of standards.
• The dependency on commercial companies to open up
access to their data.
Researchers themselves have only limited means to change these sources of challenges.
Weller, K. (2015). Accepting the challenges of social media research. Online Information Review 39(3).
41. Summary
Currently addressed challenges
• Research infrastructure, including data collection and
sharing facilities, and training in new methods and
technologies.
• The call for more thoughtfulness in research ethics.
• Critical consideration of big data and data quality,
including reflection on the power of algorithms and on
misrepresentations through big data approaches.
• Requests for broader scopes by facilitating multi-method and
multi-platform studies, as well as longitudinal studies.
42. Outlook
• Long-term preservation of social media, i.e. archiving
of data as well as of social media platforms' look
and feel.
• Documentation of applied research methods, which
should enable comparative studies across individual
use cases, quality control and verification of results.
• Assessing social media users' expectations of privacy
in order to respond to them through ethical standards.
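As an illustration of what such documentation could capture, the sketch below defines a hypothetical record covering details that many of the Twitter-and-elections papers left unreported (data source, collection period, dataset size). The class and field names are assumptions, not an established metadata standard.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class CollectionRecord:
    """Hypothetical minimal documentation record for one collected dataset."""
    platform: str
    data_source: Optional[str] = None   # e.g. "Twitter Streaming API" or "official reseller"
    query_terms: List[str] = field(default_factory=list)
    start: Optional[date] = None        # first day of collection
    end: Optional[date] = None          # last day of collection
    n_posts: Optional[int] = None       # size of the final dataset

    def missing(self) -> List[str]:
        """Names of fields left undocumented (None or an empty list)."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, [])]

    def period_days(self) -> Optional[int]:
        """Length of the collection period in days, counting both end points."""
        if self.start is None or self.end is None:
            return None
        return (self.end - self.start).days + 1


# Hypothetical example: everything documented except the dataset size.
rec = CollectionRecord(
    platform="Twitter",
    data_source="Twitter Search API",
    query_terms=["#example"],
    start=date(2013, 9, 1),
    end=date(2013, 9, 22),
)
print(rec.missing())      # ['n_posts']
print(rec.period_days())  # 22
```

Making missing fields explicit, rather than silently omitting them, is what would allow "no information" cases like those in the comparability tables to be counted and followed up.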