Wikipedia for Researchers


Wikipedia for Researchers talk, as given at the British Library.

The first part covers Wikipedia as a resource for researchers, looking at how it works, how to judge the reliability of content, and how to use Wikipedia as a starting point to access other resources.

The second part looks at how Wikipedia is used by researchers as a subject or a corpus, and gives an overview of the kinds of research being done on Wikipedia.

Published in: Technology

  1. Wikipedia for Researchers
     Andrew Gray – Wikipedian in Residence / @generalising
  2. About Wikipedia & Wikimedia
     • Wikimedia
        • Movement and charitable body
        • 80,000 contributors in 280 languages and eleven core projects
        • Image repository, dictionary, news site…
        • …read by 7% of the world!
     • Wikipedia
        • 19,000,000 articles, 4,000,000 in English
        • 6,500 articles and 235,000 edits per day
     (…and ten years ago, this was all fields…)
  3. …so what is Wikipedia?
     • …an encyclopedia
     • …written neutrally and verifiably
     • …using previously published information
     • …free to use, distribute, or reuse
     • …a collaborative community
     • …with no firm rules
  4. Internal processes
     • All edits are visible through watchlists and page histories
        • About 7% are vandalism or malicious; processes to detect these
        • Median time to correction < 2 minutes… but some stay much longer
     • Individual discussion pages for all articles – “talk”
     • Quality review and assessment process
     • Specialised “wikiproject” working groups and central noticeboards
        • e.g. content topics; style; dispute resolution; copyright; etc.
  5. Quality of Wikipedia
     • On average… it’s not bad
        • In 2005 four errors per article, versus three in Britannica
        • In 2011, in English, Spanish & Arabic: “…the Wikipedia articles in this sample scored higher overall than the comparison articles with respect to accuracy, references, style/ readability and overall judgment…”
     • Millions of articles – so many are, individually, problematic
        • Various ways of identifying “signs” of quality
        • Markers for quality are both obvious and subtle
     • Very effective “springboard” tool
  6. Looking for quality
     • Corner icons
        • [icon] - article locked down in some way
        • [icon] - featured or “good” quality
     • Problem tags
     • Article talk pages and histories
     • Style
        • Badly written or formatted articles = often neglected
  7. Accessing other content
     • Structured categories and navigational templates
     • “What links here”
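
The “What links here” tool on slide 7 has a programmatic counterpart in the MediaWiki Action API (`list=backlinks`). The slides do not mention the API, so the following is only a minimal sketch of how such a query URL might be built. The function name `backlinks_query_url` and the example title are illustrative assumptions, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def backlinks_query_url(title, limit=50):
    """Build an Action API URL listing pages that link to `title`.

    `backlinks_query_url` is a hypothetical helper name; the query
    parameters (action, list, bltitle, bllimit, format) are standard
    Action API parameters.
    """
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "bllimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Example: the URL for pages linking to the "British Library" article.
print(backlinks_query_url("British Library", limit=10))
```

Fetching the URL returns JSON; results beyond the limit are paged through the API's continuation mechanism, which this sketch leaves out.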
  8. Moving on to other content
     • Other languages – not translations, and may have more content
     • Mousing over footnote markers
     • Within the references:
        • Links through DOIs and other identifiers
        • ISBNs go to a special landing page
        • …and then out to libraries, booksellers, etc
        • ISSNs go to WorldCat
        • If an author, look for authority control links:
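
The identifier links on slide 8 follow predictable URL patterns. As a rough sketch, and not something taken from the talk itself, the snippet below turns a DOI into a resolver link and an ISBN into the English Wikipedia book-sources landing page. The helper names `doi_url` and `isbn_landing_page` are made up for illustration, and the ISBN shown is a format-only placeholder, not a real book.

```python
from urllib.parse import quote

# Helper names below are hypothetical; only the URL patterns are real.


def doi_url(doi):
    """Return the doi.org resolver link for a DOI string."""
    # Keep "/" unescaped so the prefix/suffix structure stays readable.
    return "https://doi.org/" + quote(doi, safe="/")


def isbn_landing_page(isbn):
    """Return the English Wikipedia Special:BookSources page for an ISBN."""
    # Strip hyphens and spaces so only the digits are passed along.
    digits = isbn.replace("-", "").replace(" ", "")
    return "https://en.wikipedia.org/wiki/Special:BookSources/" + digits


print(doi_url("10.1000/182"))
print(isbn_landing_page("978-0-00-000000-2"))  # placeholder ISBN
```

The landing page then offers onward links to libraries and booksellers, as the slide describes.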
  9. Preferences
     • Available to logged-in users
     • Two particularly useful options:
        • New window for external links (Gadgets > Browsing)
        • Quality assessment in headers (Gadgets > Appearance)
        • Many others - mostly editor-oriented tools
  10. Looking for sets of material
      • Some tools available –
         • Complex to use, but rewarding
      • CatScan: look for intersection of categories
         • “all physicists born in 1912” – 51 in English, 34 in German
      • Full dumps of all data available –
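
The category intersection on slide 10 is, at its core, a set intersection over category membership lists. The sketch below illustrates that idea only; it is not how CatScan itself is implemented. The function name `intersect_categories` and the sample data are invented placeholders, and the code assumes the membership lists have already been obtained (for example from a dump). It also ignores subcategories, which a real tool would need to walk.

```python
def intersect_categories(members_by_category, categories):
    """Return the sorted titles that appear in every named category.

    `members_by_category` maps a category name to an iterable of
    article titles. Subcategories are not followed in this sketch.
    """
    member_sets = [set(members_by_category[name]) for name in categories]
    if not member_sets:
        return []
    return sorted(set.intersection(*member_sets))


# Placeholder data, invented for illustration only.
sample = {
    "Physicists": ["Person A", "Person B", "Person C"],
    "1912 births": ["Person B", "Person C", "Person D"],
}

matches = intersect_categories(sample, ["Physicists", "1912 births"])
print(matches)
```

Running the same intersection against each language edition's category data is what would produce per-language counts like those quoted on the slide.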
  11. Research about Wikipedia
      • Thriving research around Wikipedia community & content
         • by mid-2011, 2100 peer-reviewed articles and 38 PhD theses
         • Active research committee and WMF support
      • Regular report -
         • also @wikiresearch
      • Major themes include:
         • Community and content creation
         • Reading and researching by users
         • Quality of content
         • Technical research
  12. Research on communities
      • Research on the Wikipedia communities:
         • Dynamics of community conflict, discussions, collaboration, voting, contribution, mentoring…
         • Demographics, motivation and specialisms of contributors
         • Patterns of growth and content creation/deletion
         • Effect of central programs on volunteer activity
         • Cross-cultural interaction
  13. Research on users
      • Research on usage of Wikipedia:
         • Specific searching behaviour
         • Patterns of usage (yearly, daily)
         • Tracking external events (e.g. swine flu) through Wikipedia
         • Search engine rankings
         • Change in usage by students
         • Effect of Wikipedia publication on wider literature
  14. Research on content
      • Research on the content of Wikipedia:
         • Evolution of content
         • Accuracy, coverage and quality
         • Biases – geographic, cultural, gender
         • Linguistic analysis
         • Visualisations of content
         • Effect of external publications on Wikipedia
  15. Research on technical aspects
      • Research on the technical side of Wikipedia:
         • Extensive work on scaling open-content services
         • Tools for detecting and handling vandalism
         • Algorithmic detection and identification of bias, spam
         • Practical research on uses of wikis
  16. Research example – visualising art history
  17. Research example – visualising editing patterns
  18. Research example – editor activity