
Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links


Wikipedia is a central source of information as 450
million people consult the online encyclopaedia every
month to satisfy their information needs. Some of these
users also refer to Wikipedia within their tweets. In
this paper, we analyse links within tweets referring to
a Wikipedia of a language different from the tweet’s
language. To this end, we investigate causes for the
usage of such inter-language links by comparing the
tweeted article and its counterpart in the tweet’s
language (if there is any) in terms of article quality.
We find that the main cause for inter-language links is
the non-existence of the article in the tweet’s
language. Furthermore, we observe that the quality of
the tweeted articles is consistently higher in comparison
to their counterparts, suggesting that users choose the
article of higher quality even when tweeting in another
language. Moreover, we find that English is the most
dominant target for inter-language links.

Published in: Science

  1. 1. Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links HICSS 49, January 8th, 2016 Eva Zangerle, Georg Schmidhammer, Günther Specht University of Innsbruck, Austria
  2. 2. Motivation: Why this work matters
     • Wikipedia is a central source of information
     • 450 million users per month, 277 editions
     • Research has focused on intrinsic factors: community, content, quality
  3. 3. Motivation: Why this work matters (continued)
     • Research has focused on intrinsic factors: community, content, quality
     • What about extrinsic factors?
  4. 4. Our Vision: Extrinsic Quality Measures
  5. 5. Our Vision: Extrinsic Quality Measures
     • Inter-language link analysis
  6. 6. Previous Research
     Eva Zangerle, Georg Schmidhammer and Günther Specht. #Wikipedia on Twitter: Analyzing Tweets About Wikipedia. In Proceedings of the 11th International Symposium on Open Collaboration, OpenSym ’15, pages 14:1–14:8, New York, NY, USA, 2015. ACM.
     • Extrinsic view on Wikipedia via Twitter
     • 20% of all tweets link to a Wikipedia in a language other than the tweet’s language (except for English and Japanese)
  7. 7. Research Questions
     • RQ1: How are inter-language links distributed among the different Wikipedias?
     • RQ2: What causes users to link to a Wikipedia other than the one in their own language?
  8. 8. Crawl Twitter • Crawl Wikipedia • Clean Data • Quality Analyses • Extract Links
  9. 9. Crawling
     • Twitter API
     • Search for keyword "wikipedia"
     • 2014/10/20 – 2015/04/28
     • 6,415,762 tweets in total
     • Extraction of links from tweets
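
The link-extraction step on the Crawling slide can be illustrated with a minimal sketch. The slides do not describe the actual extraction code, so the function name, the regular expression, and the assumption that tweets already contain expanded (not shortened) URLs are all hypothetical.

```python
import re
from urllib.parse import unquote, urlparse

# Hypothetical pattern: any whitespace-delimited http(s) URL in the tweet text.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_wikipedia_links(tweet_text):
    """Return (language_code, article_title) pairs for Wikipedia URLs in a tweet."""
    links = []
    for url in URL_PATTERN.findall(tweet_text):
        # Drop sentence punctuation that may trail the URL.
        parsed = urlparse(url.rstrip(".,;!?"))
        parts = (parsed.hostname or "").lower().split(".")
        # Accept hosts such as en.wikipedia.org or es.m.wikipedia.org.
        if len(parts) < 3 or parts[-2:] != ["wikipedia", "org"]:
            continue
        if parts[0] == "www" or not parsed.path.startswith("/wiki/"):
            continue
        title = unquote(parsed.path[len("/wiki/"):])
        links.append((parts[0], title))
    return links
```

Comparing the extracted language code with the language of the tweet is then enough to decide whether a link is an inter-language link.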
  10. 10. Cleaning Data
      • Remove tweets that contain no Wikipedia URL
      • The dataset contains bots
        • Accounts in the 99th percentile of tweet counts (>130 tweets)
        • BotOrNot detection service applied to 1,083 accounts
        • Bot users and their tweets deleted from the dataset
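
The 99th-percentile cut-off used to select accounts for the bot check might look like the following sketch. It assumes a nearest-rank percentile over tweets per user; the slides do not state which percentile definition was used, and the function names are hypothetical. The subsequent BotOrNot check is not reproduced here.

```python
import math
from collections import Counter


def percentile_threshold(tweets_per_user, percentile=99):
    """Nearest-rank percentile of the per-user tweet counts (assumed definition)."""
    counts = sorted(tweets_per_user.values())
    rank = max(1, math.ceil(percentile * len(counts) / 100))
    return counts[rank - 1]


def bot_candidates(tweet_authors, percentile=99):
    """Return the users whose tweet count exceeds the percentile threshold."""
    tweets_per_user = Counter(tweet_authors)
    threshold = percentile_threshold(tweets_per_user, percentile)
    return {user for user, n in tweets_per_user.items() if n > threshold}
```

Here `tweet_authors` is one user identifier per tweet; the returned set corresponds to the accounts that would be passed on to a bot-detection service.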
  11. 11. Cleaning Data
      Feature             Raw          Cleaned
      Tweets              6,415,762    2,844,399
      Retweets            2,040,816      855,959
      Distinct Users      2,287,430    1,092,732
      Mentions            4,673,284    2,437,092
      Distinct Hashtags     213,574      127,958
      Hashtag Usages      2,283,535      788,210
      Distinct URLs       1,976,479    1,179,288
      URL Usages          4,825,230    3,130,420
  12. 12. Crawling Wikipedia
      • MediaWiki API
      • Resolution of the revision ID for the time the tweet was sent
      • Crawling of article, headings, wikilinks, references, images
      • Last 500 edits
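
Resolving the revision that was live when a tweet was sent can be sketched as a MediaWiki Action API query. The parameters below (`prop=revisions` with `rvstart`, `rvdir=older`, `rvlimit=1`) are standard API parameters, but whether the authors issued exactly this request is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def revision_at_tweet_time_url(language, title, tweeted_at):
    """Build an API URL for the latest revision at or before `tweeted_at`.

    `tweeted_at` should be a timezone-aware datetime.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvlimit": "1",        # only the single closest revision
        "rvdir": "older",      # enumerate backwards in time from rvstart
        "rvstart": tweeted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rvprop": "ids|timestamp",
    }
    return f"https://{language}.wikipedia.org/w/api.php?" + urlencode(params)
```

The sketch only builds the request URL; fetching it and reading the revision ID from the JSON response is left out.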
  13. 13. Quality Measures
      1. Article length
      2. Number of references (absolute)
      3. Number of references (relative)
      4. Diversity
      5. Number of headings (absolute)
      6. Number of headings (relative)
      7. Informativeness
      8. Number of images (relative)
      9. Number of wikilinks (relative)
      10. Currency
      11. HasInfoBox
      12. Complexity (Flesch-Kincaid)
      Reference: Warncke-Wang, M., Cosley, D., and Riedl, J. "Tell Me More: An Actionable Quality Model for Wikipedia", in Proceedings of WikiSym 2013.
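
Two of the measure types listed above can be made concrete with a small sketch. It assumes that "relative" means a count normalised by article length, and that "Complexity" refers to the Flesch-Kincaid grade level rather than the Flesch reading-ease score; neither is stated on the slide. Word, sentence, and syllable counts are taken as given inputs.

```python
def relative_measure(count, article_length):
    """Count per unit of article length (assumed meaning of 'relative')."""
    if article_length <= 0:
        raise ValueError("article_length must be positive")
    return count / article_length


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from pre-computed text statistics."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

For example, a text with 100 words, 5 sentences, and 150 syllables has a grade level of about 9.9.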
  14. 14. Results
  15. 15. RQ1: Distribution of (Inter-language) Links
      Top 3 inter-language targets:
      • English: 62.68%
      • Japanese: 6.26%
      • Spanish: 5.76%
  16. 16. RQ2: Causes for Inter-language Links
      • 85% do not have a counterpart in the tweet’s language (out of 691,424 inter-language links)
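
Checking whether a tweeted article has a counterpart in the tweet's language could be done via the MediaWiki API's `prop=langlinks` query (optionally restricted with `lllang`). The sketch below only parses such a response and assumes the legacy JSON layout (`formatversion=1`), where each language link stores its title under the `"*"` key; the function name is hypothetical and the slides do not say how the lookup was implemented.

```python
def counterpart_title(api_response, target_language):
    """Return the counterpart article title in `target_language`, or None."""
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without any language links carry no "langlinks" key at all.
        for link in page.get("langlinks", []):
            if link.get("lang") == target_language:
                return link.get("*")
    return None
```

A return value of `None` corresponds to the "no counterpart" case reported on this slide.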
  17. 17. RQ2: Causes for Inter-language Links
      Remaining 15%: Could article quality be an issue?
      • Originally posted: https://en.wikipedia.org/wiki/Black_Monday_(1987)
      • Counterpart: https://es.wikipedia.org/wiki/Lunes_negro_(1987)
  20. 20. RQ2: Causes for Inter-language Links
      • Remaining 99,776 articles: apply the 12 quality measures to all originally posted articles and their counterparts
      • Group articles into language pairs (original language and counterpart language)
      • For each article in a language pair, count the number of measures on which the original article performs better than its counterpart, and vice versa (result: two vectors)
      • Wilcoxon signed-rank test for each language pair
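
The per-language-pair comparison can be sketched as follows. This is a self-contained illustration, not the authors' code: it assumes that a higher value is better for every measure, drops zero differences, and uses the normal approximation with tie correction (no continuity correction) for a two-sided p-value. A statistics library would normally be used instead.

```python
import math


def wins_per_article(original, counterpart):
    """Count the measures on which each article scores higher (ties ignored)."""
    pairs = list(zip(original, counterpart))
    return sum(o > c for o, c in pairs), sum(c > o for o, c in pairs)


def wilcoxon_signed_rank(xs, ys):
    """Two-sided Wilcoxon signed-rank test; returns (W_plus, p_value)."""
    diffs = sorted((x - y for x, y in zip(xs, ys) if x != y), key=abs)
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        # Tied absolute differences share the average of ranks i+1 .. j.
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))
```

For one language pair, `xs` would hold the per-article win counts of the originally posted articles and `ys` those of the counterparts, as produced by `wins_per_article`.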
  21. 21. RQ2: Causes for Inter-language Links
      • For 58% of all language combinations, the tweeted language is of significantly better quality (p < 0.05)
  22. 22. Dominating Languages
      Target language (count): languages it is better than (p < 0.05)
      • English (28): Spanish, Japanese, French, Korean, Italian, German, Arabic, Indonesian, Portuguese, Dutch, Turkish, Swedish, Thai, Polish, Romanian, Finnish, Danish, Norwegian, Farsi, Welsh, Hindi, Bulgarian, Latvian, Bosnian, Slovak, Hungarian, Slovenian, Lithuanian
      • French (3): English, Japanese, Spanish
      • Spanish (2): English, Italian
      • Catalan (2): English, Portuguese
      • German (1): English
      • Japanese (1): German
      • Portuguese (1): Spanish
      • Turkish (1): English
  23. 23. Dominating Languages
      • Most dominating target languages are English, Spanish, Japanese
        • most extensive Wikipedias
        • most active Wikipedias
      → more elaborate, mature articles than in the user’s language
  24. 24. Quality Measures
      • 66% of all articles tweeted feature a significantly higher quality for all twelve quality measures (p < 0.001)
  25. 25. Quality Measures
      • 97% of all articles tweeted feature a significantly higher quality for more than six quality measures (p < 0.001)
  26. 26. Conclusion
      • 85% of all inter-language links: no counterpart available
      • Articles tweeted are of significantly higher quality (with English, Japanese and German dominating)
      • Users deliberately tweet the article of higher quality
  27. 27. Questions?
      Contact (any coffee break): @eva_zangerle • eva.zangerle@uibk.ac.at • http://www.evazangerle.at • http://dbis-informatik.uibk.ac.at • https://www.facebook.com/dbisibk
