#FAIL!
THINGS THAT DIDN'T WORK OUT IN SOCIAL MEDIA RESEARCH
– AND WHAT WE CAN LEARN FROM THEM
Workshop at Internet Research 16, Phoenix, October 21st, 2015.
• Workshop hashtag: #fail2015b
• Conference hashtag: #ir16
• Workshop website:
http://failworkshops.wordpress.com
• Etherpad: https://pad.okfn.org/p/fail
WELCOME
Luca Rossi
@LR
Karine Nahon
@karineb
Katrin Weller
@kwelle
ABOUT #FAIL! WORKSHOPS
• A traveling workshop series: the first edition was held at WebSci15 (June 2015)
• Aim: collect examples of things that can go wrong in social media research and share them across communities
– learn from experiences
– connect different research communities
WHAT WE'VE LEARNED SO FAR
[Chart: number of publications per year, 2001-2013, that mention a social media platform's name in the title: Twitter, Facebook, YouTube, blogs, wikis, Foursquare, LinkedIn, MySpace. Source: Scopus title search. For details: http://kwelle.wordpress.com/2014/04/07/bibliometric-analysis-of-social-media-research/]
SOCIAL MEDIA RESEARCH
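For readers who want to reproduce this kind of count, a minimal sketch (not part of the original slides): tallying publications per platform and year from a hypothetical CSV export of a Scopus title search. The file name and the "Title"/"Year" column names are assumptions about the export format.

```python
import csv
from collections import Counter

# Hypothetical Scopus export; "Title" and "Year" columns are assumed.
PLATFORMS = ["twitter", "facebook", "youtube", "myspace"]

counts = Counter()  # (year, platform) -> number of publications
with open("scopus_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        title, year = row["Title"].lower(), row["Year"]
        for platform in PLATFORMS:
            if platform in title:
                counts[(year, platform)] += 1

for (year, platform), n in sorted(counts.items()):
    print(f"{year}  {platform:<10} {n}")
```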
2008-2013 papers on Twitter and elections: data sources
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.

Data source                                                      Number
No information                                                       11
Collected manually from Twitter website (copy-paste / screenshot)     6
Twitter API (no further information)                                  8
Twitter Search API                                                    3
Twitter Streaming API                                                 1
Twitter REST API                                                      1
Twitter API user timeline                                             1
Own program for accessing Twitter APIs                                4
Twitter Gardenhose                                                    1
Official reseller (Gnip, DataSift)                                    3
YourTwapperKeeper                                                     3
Other tools (e.g. Topsy)                                              6
Received from colleagues                                              1
SOCIAL MEDIA RESEARCH
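To illustrate the two most common routes in the table, a minimal sketch using the tweepy library's pre-4.0 interface, which was current around the time of this workshop; the credentials and the tracked hashtag are placeholders, and both endpoints have since changed. Note the asymmetry that makes the papers above hard to compare: the Search API looks backwards over a limited, sampled window, while the Streaming API only captures tweets while the collector happens to be running.

```python
import tweepy  # pre-4.0 interface, contemporary with this workshop

CONSUMER_KEY = CONSUMER_SECRET = ACCESS_TOKEN = ACCESS_SECRET = "..."  # placeholders

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

# Search API: retrospective, but only a sampled window of recent tweets.
api = tweepy.API(auth, wait_on_rate_limit=True)
for tweet in tweepy.Cursor(api.search, q="#ir16").items(100):
    print(tweet.id_str, tweet.created_at)

# Streaming API: prospective; captures tweets only while this runs.
class HashtagListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.id_str, status.text)

tweepy.Stream(auth, HashtagListener()).filter(track=["#ir16"])
```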
What we discussed at the first workshop…
CURRENT PROBLEMS
Challenge 1: users
• How to involve social media users in the
research process?
• Presentation by Elodie Crespel: “Extending
data collection with web browser extension”
– Participants may be creative in their use of
technology – flexibility is needed.
Challenge 2: methods
• Data analytics: which approaches should be
chosen?
• Taha Yasseri: “The double-edged sword of
statistical significance”
– Questions the p-value as a standard for data
analytics.
– “too much of attention and reliance on specific
measures or methods without being aware of the
logic behind them, can be misleading”
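A concrete illustration of that warning (our sketch, not Yasseri's analysis): with social-media-sized samples, a negligible difference between two groups becomes "statistically significant" even though the effect size stays tiny. The numbers below are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups whose true means differ by a trivial 0.1 (sd = 15).
for n in (100, 10_000, 1_000_000):
    a = rng.normal(100.0, 15, n)
    b = rng.normal(100.1, 15, n)
    _, p = stats.ttest_ind(a, b)
    # Cohen's d: mean difference relative to the pooled spread.
    d = (b.mean() - a.mean()) / np.sqrt((a.var() + b.var()) / 2)
    print(f"n={n:>9}: p={p:.4f}, d={d:.4f}")
# p collapses as n grows, while the effect size d stays near zero.
```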
Challenge 3: tools
• Many researchers use third party tools for
data collection or analysis – which may not
always work as expected.
• Presentation by Michael Bossetta and
Anamaria Dutceac Segesten: “Tracing
Eurosceptic Party Networks via Hyperlink
Network Analysis and #FAIL!ng: Can Web
Crawlers Keep up with Web Design?”
– Exemplary case: issuecrawler.
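The underlying problem generalizes beyond Issuecrawler: a hyperlink crawler only sees links present in the static HTML it fetches. A minimal sketch (ours, not the presenters' tool) using only the Python standard library; any link that JavaScript injects after page load is simply invisible to it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags in static HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def outlinks(url):
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(url, link) for link in parser.links]

print(outlinks("https://example.org"))  # misses JavaScript-added links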
Challenge 4: content
• Content analysis is heavily affected by the dynamic nature of social media.
• Presentation by Marie Van Cranenbroeck:
“Managing and Using Unstable Data in a Social
Science Research about Museums and
Audiences on Social Media”
– Data collection and storage challenges
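One defensive pattern against unstable data, sketched here as an illustration rather than Van Cranenbroeck's actual setup: store every collected record verbatim, stamped with collection time and a content hash, so that later deletions and format changes can at least be detected.

```python
import hashlib
import json
import time
from pathlib import Path

ARCHIVE = Path("snapshots")
ARCHIVE.mkdir(exist_ok=True)

def snapshot(record: dict) -> Path:
    """Store the raw record verbatim, stamped with collection time."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(raw).hexdigest()[:16]  # stable dedup key
    path = ARCHIVE / f"{int(time.time())}_{digest}.json"
    if not path.exists():  # identical snapshots are stored only once
        path.write_bytes(raw)
    return path

snapshot({"id": "42", "text": "a post that may be gone tomorrow"})
```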
Specific details and additions
• Researchers and users may have different ideas
about the definition of social media / social
networks
• Lack of evaluation standards
• Availability of data (also: not enough data)
• Data may be corrupt (e.g. missing data)
• Social media as a moving target (Karpf, D. (2012). Social science research methods in Internet time. Information, Communication & Society, 15(5), 639-661)
Meta discussion
• Social media research takes many forms, with many different disciplines involved.
• Best practices and pitfalls in social media research are mainly discussed informally; there are few venues for sharing unsuccessful approaches.
WHAT WE'D LIKE TO LEARN TODAY
• Towards a categorization of challenges for
social media research: what can go wrong?
• Collection of more experiences
• Structuring them into different categories
WHAT WE'D LIKE TO LEARN TODAY
Today:
- 4 presentations
- Think about your own experiences!
- … in connection to each presentation
- … in general
9:00 Introduction: “What we’ve learnt from the first workshop and
what we’d like to learn today”
9:15 Shawn Walker: “Complexity of collecting social media data
in ephemeral contexts”
9:40 Cornelius Puschmann: “Why LIWC sucks (or: saner options
for social media content analysis)”
10:05 Break
10:20 Luca Rossi: “The fourth deadly sin of social media researchers
(or: scientific research and unstable socio-technical platforms)”
10:45 Marco Toledo Bastos: "Individual Behavior from Aggregate Social Media Data"
11:10 Discussion & Conclusions
12:00 End
PROGRAM
• Other experiences? Share your thoughts!
• Main categories of #fail cases?
• Top 3 take-away messages for the next workshop?
DISCUSSION
WHERE TO GO FROM HERE?
• Next steps – lessons learnt for future
workshop organisation
• Which additional conferences?
• Publication? Guidebook?
• Archiving:
– URLs may vanish (question: is the rate of decay linear?) – see the link-check sketch after this list
– Images go missing
– Platforms change (a moving target!) – not just about the interface!
• Visualization of results
– Word cloud (compare histograms)
• Tools
– sentiment140, Internet Archive, Gnip
• Methods
– Content analysis:
• Replicability? Validation?
• Context for social media content (e.g. surrounding tweets)
• LIWC, General Inquirer – see the dictionary-count sketch after this list
– Predictions
– "Data Science"
• Lack of theory
• Data quality:
– Can we still cite/use data and research published in 2007/2008?
– Baseline? (How to define one for a moving target?)
• Theory:
– Can we only do descriptive work for single platforms?
– Look for the theory instead of for the data?
• Meta
– A systematic review of the existing literature is needed
• Documentation
– Timeframe generalization
– Document time, cultures?
– How long will my results be valid?
– Have a general base for comparison
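On the archiving point above: the rate of URL decay can only be estimated if dead links are actually measured. A rough liveness check, sketched with the standard library (a HEAD request is a heuristic; some live servers reject it):

```python
import urllib.error
import urllib.request

def is_alive(url, timeout=10):
    """Rough liveness check: HEAD request, HTTP status below 400."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False

urls = ["http://failworkshops.wordpress.com", "http://example.org/gone"]
print(f"{sum(map(is_alive, urls)) / len(urls):.0%} of sampled URLs resolve")
```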
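And on LIWC / General Inquirer: both are dictionary-count methods at heart. The toy sketch below (categories and word lists invented for illustration) shows the core mechanism and why it misfires: plain token counting has no notion of negation, irony, or context.

```python
import re

# Invented toy dictionary in the spirit of LIWC / General Inquirer.
CATEGORIES = {
    "posemo": {"good", "great", "love", "happy"},
    "negemo": {"bad", "awful", "hate", "sad"},
}

def category_counts(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return {cat: sum(tok in words for tok in tokens)
            for cat, words in CATEGORIES.items()}

print(category_counts("I do not love this, and it is not great at all."))
# -> {'posemo': 2, 'negemo': 0}: a clearly negative sentence counted
#    as positive, because negation is ignored.
```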


Editor's Notes

• #4 Social media platforms and users: a moving target… Best practices and pitfalls in social media research are mainly discussed informally; there are few possibilities to share unsuccessful approaches. Researchers from many different disciplinary backgrounds enter the field, bringing different areas of expertise but little interdisciplinary exchange of approaches. Possibilities for data sharing, validation, and reproduction of results are limited.