Wikipedia as a data set
& analytical device
Erik Borra, June 2013
DMI research with Wikipedia
repurposing Wikipedia as a ...
reference work
bureaucracy
scandal machine
vigilant community
Medium-specific outlook
“follow the medium”
(Rogers 2009)
The Anatomy of a Wikipedia page
DMI research with Wikipedia
repurposing Wikipedia as a ...
reference work <=> cultural reference
bureaucracy
scandal machine
vigilant community
www.nature.com/nature/journal/v438/n7070/extref/438900a-s1.doc
Burial of 465 identified Bosniaks, Potočari, 2007.
Map of the Srebrenica military operations, made by the U.S. Central Intelligence Agency, with green arrow showing the route of the Bosnian forces.
Map of the location of Srebrenica, the Republika Srpska, Bosnia-Herzegovina.
Srebrenica-Potočari Memorial and Cemetery, Bosnia-Herzegovina.
Grave of a 13-year-old Bosniak boy.
Ratko Mladic.
An exhumed body with blindfold and hands tied behind his back. As of September 2012, the photo has been removed from the Wikipedia article.
Exhumed grave of victims, 2007.
Podrinje Identification Project's facility for storing, processing, and handling exhumed remains.
"UN left 8,000 to die in Bosnia." Headline in The Independent, 30 October 1995.
Satellite photo of Nova Kasaba mass grave.
International Criminal Tribunal for the Former Yugoslavia, Den Haag, the Netherlands.
DUTCH ENGLISH BOSNIAN CROATIAN SERBIAN SERBO-CROATIAN
Tool: Wikipedia Cross-Lingual Image Analysis
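The idea behind a cross-lingual image comparison can be sketched in a few lines of Python. This is a minimal sketch, not the DMI tool itself: it assumes the image lists come from the MediaWiki API (`action=query`, `prop=images`), and the function names `fetch_image_titles`, `normalize` and `compare_image_sets` are hypothetical. The file namespace prefix is localized per language version (for example "File:" or "Bestand:"), so it is stripped before comparing.

```python
import json
import urllib.parse
import urllib.request
from itertools import combinations


def fetch_image_titles(lang, title):
    """Return file titles used on one article (single request, no continuation handling)."""
    params = urllib.parse.urlencode({
        "action": "query", "prop": "images", "titles": title,
        "imlimit": "max", "format": "json",
    })
    request = urllib.request.Request(
        f"https://{lang}.wikipedia.org/w/api.php?{params}",
        headers={"User-Agent": "cross-lingual-image-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    titles = set()
    for page in data.get("query", {}).get("pages", {}).values():
        for image in page.get("images", []):
            titles.add(image["title"])
    return titles


def normalize(file_title):
    """Drop the localized namespace prefix so names match across languages."""
    return file_title.split(":", 1)[-1].replace("_", " ").strip()


def compare_image_sets(images_by_lang):
    """Return (images shared by all versions, images unique to each, pairwise Jaccard)."""
    names = {lang: {normalize(t) for t in titles}
             for lang, titles in images_by_lang.items()}
    shared = set.intersection(*names.values()) if names else set()
    unique = {}
    for lang, own in names.items():
        others = set().union(*(n for l, n in names.items() if l != lang))
        unique[lang] = own - others
    jaccard = {}
    for a, b in combinations(sorted(names), 2):
        union = names[a] | names[b]
        jaccard[(a, b)] = len(names[a] & names[b]) / len(union) if union else 0.0
    return shared, unique, jaccard
```

Article titles differ per language version, so each call to `fetch_image_titles` needs the local title. Matching by file name only catches images that share a file across versions; visually identical images uploaded under different names would need manual comparison.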
Tool: triangulate
National Point of View
Neutral Point of View
Linguistic Point of View
manypedia.com - comparing linguistic points of view (LPOV)
omnipedia.northwestern.edu - making Wikipedia articles in different languages comparable
DMI research with Wikipedia
repurposing Wikipedia as a ...
reference work
bureaucracy
scandal machine
vigilant community <=> socio-technical device
Wikipedia has been described in terms of open source intelligence (Stalder and Hirsh, 2002), wisdom of crowds (Surowiecki, 2004; Kittur and Kraut, 2008), many minds (Sunstein, 2006), collaborative knowledge (Poe, 2006; McKenzie Wark, 2007), an army of volunteers (Jenkins, 2006), mass collaboration (Tapscott, 2007), distributed collaboration (Shirky, 2008), produsage (Bruns, 2008), crowdsourcing (Economist, 2008), and mentioned in the context of free labour (Deuze, 2006), and the cult of the amateur (Keen, 2007).
Tool: Wikipedia Edits Scraper and IP Localizer
+ screen recording software
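The first step of an edits scraper with IP localization is separating anonymous edits, which are attributed to an IP address, from edits by registered users. A minimal sketch, assuming revision records shaped like dicts with a `user` field (as they could be built from a page's revision history); the function names are hypothetical, and the actual geolocation step would need an external IP database that is not shown here.

```python
import ipaddress
from collections import Counter


def is_anonymous(user):
    """True if the user name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def split_edits(revisions):
    """Separate revision records into anonymous (IP) and registered edits."""
    anonymous = [r for r in revisions if is_anonymous(r["user"])]
    registered = [r for r in revisions if not is_anonymous(r["user"])]
    return anonymous, registered


def edits_per_ip(revisions):
    """Count edits per IP address, the input for a later localization step."""
    anonymous, _ = split_edits(revisions)
    return Counter(r["user"] for r in anonymous)
```

The resulting per-IP counts are what a localizer would then map to places or organizations.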
Wikipedia editing by bots and users: overall percentage of bot activity of all edit activity
[Figure: percentage of bot edits as a share of all Wikipedia edits]
Analysis by Zachary Devereaux, Sabine Niederer, Richard Rogers, Bram Nijhof and Michael Stevenson.
Visualization: Rosa Menkman.
Govcom.org, 2008. http://govcom.org/
© 2008
Human, Bot and Software Assisted Activity on Top Twenty EN Wikipedia Articles
Human Edits: 251378
Software Assisted Edits: 4644
Bot Edits: 3067
Analysis by Zachary Devereaux, Sabine Niederer, Richard Rogers, Bram Nijhof and Michael Stevenson.
Maps by: Rosa Menkman, Auke Touwslager and Marieke van Dijk
Digital Methods Initiative, 2008. www.digitalmethods.net
© 2008
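The three edit counts on this slide can be turned into shares of total activity. The sketch below reads the chart labels as 251378 human edits, 4644 software-assisted edits and 3067 bot edits, which is an assumption about how the doubled digits in the slide export should be read; the `shares` helper is hypothetical.

```python
# Counts as read from the slide labels (assumed reading of the export).
counts = {"human": 251378, "software_assisted": 4644, "bot": 3067}


def shares(counts):
    """Return each category's percentage of the total number of edits."""
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


for kind, pct in shares(counts).items():
    print(f"{kind}: {pct:.2f}%")
```

Under that reading, human edits make up roughly 97% of the activity on these articles, with software-assisted and bot edits together accounting for about 3%.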
Wikipedia Bot Activity in Endangered and Revived Languages
What is the level of bot activity per language, looking at endangered and revived languages?
stats.wikimedia.org + bubble lines
[Figure: percentage of bot activity per language; languages scaled by Wikipedia size]
Endangered Languages: Cornish, Võro, Manx, Ladino, Friulian, Romansh, Aromanian, Sanskrit, Upper Sorbian, Corsican, Scottish Gaelic, Samogitian, Chuvash, Aragonese, Occitan, Breton
Revived Languages: Hawaiian, Cornish, Manx, West Frisian, Belarusian, Basque, Galician, Estonian, Hebrew, Czech, Catalan
Analysis by Zachary Devereaux, Sabine Niederer, Richard Rogers, Bram Nijhof and Michael Stevenson
Visualization by Auke Touwslager
Govcomorg Jubilee, 2008. www.govcom.org
© 2008
Wikipedia Bot Activity in Most-Used Languages Worldwide
What is the level of bot activity per language, looking at the top 20 most-used languages?
stats.wikimedia.org + Dorling
[Figure: percentage of bot activity per language; languages scaled by Wikipedia size]
Languages: Javanese, Urdu, Tamil, Hindi, Marathi, Bengali, Telugu, Korean, Vietnamese, Arabic, Chinese, Russian, Spanish, Japanese, Italian, Portuguese, German, French, English
Analysis by Zachary Devereaux, Sabine Niederer, Richard Rogers, Bram Nijhof and Michael Stevenson
Visualization by Auke Touwslager
Govcomorg Jubilee, 2008. www.govcom.org
© 2008
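The per-language bot percentage behind such a chart can be sketched as follows. This assumes a CSV export with the columns `language`, `total_edits` and `bot_edits`; those column names are hypothetical and would need to be mapped onto whatever the stats.wikimedia.org export actually provides.

```python
import csv
import io


def bot_percentages(csv_text):
    """Return {language: bot share in %}, highest share first.

    Expects the hypothetical columns language, total_edits, bot_edits.
    Languages with zero recorded edits are skipped.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row["total_edits"])
        bots = int(row["bot_edits"])
        if total > 0:
            result[row["language"]] = 100 * bots / total
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))
```

Scaling each language by Wikipedia size, as in the chart, would be a separate step using the article count per language version.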
DMI research with Wikipedia
repurposing Wikipedia as a ...
reference work
bureaucracy
scandal machine <=> places of edits
vigilant community
DMI research with Wikipedia
repurposing Wikipedia as a ...
reference work
bureaucracy <=> controversy diagnostics machine
scandal machine
vigilant community
Administrative apparatus:
Wikipedia’s procedural policies, guidelines and essays
... with the purpose of reaching consensus
Core content policies
Wikipedia collaboration and conflict studies:
Cooperation and conflict (Viegas, 2004)
Conflict and coordination (Kittur, 2007)
Revert Graph (Suh et al., 2007)
Mutual reverts (Brandes, 2007)
Argument identification (Rad & Barbosa, 2011)
Edit wars (Sumi, 2011)
Evolution of discussions (Kaltenbrunner & Laniado, 2012)
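Several of these studies build on reverts as a signal of conflict. One common way to operationalize a revert is the identity revert: a revision whose content hash equals that of an earlier revision, which undoes everything in between. The sketch below illustrates that idea only and is not the method of any of the cited papers. It assumes a chronological list of revision dicts with `user` and `sha1` fields (the MediaWiki API can return both via `prop=revisions` with `rvprop=user|sha1`); the function names are hypothetical.

```python
def find_identity_reverts(revisions):
    """Detect revisions that restore the exact content of an earlier revision.

    revisions: chronological list of dicts with 'user' and 'sha1' keys.
    """
    last_seen = {}  # content hash -> index of the latest revision with that content
    reverts = []
    for i, rev in enumerate(revisions):
        sha = rev["sha1"]
        # Require at least one intervening revision, so repeated identical
        # saves are not counted as reverts.
        if sha in last_seen and last_seen[sha] < i - 1:
            j = last_seen[sha]
            reverts.append({
                "reverter": rev["user"],
                "restored_index": j,
                "reverted_users": [r["user"] for r in revisions[j + 1:i]],
            })
        last_seen[sha] = i
    return reverts


def mutual_revert_pairs(reverts):
    """Return pairs of users who have each reverted the other at least once."""
    directed = set()
    for r in reverts:
        for user in r["reverted_users"]:
            if user != r["reverter"]:
                directed.add((r["reverter"], user))
    return {tuple(sorted(pair)) for pair in directed
            if (pair[1], pair[0]) in directed}
```

Identity reverts miss partial reverts, where an editor removes only part of a contribution, so counts from this approach are a lower bound on disagreement.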
medium-specific approach:
repurpose how consensus principles (as method
of the medium) act on the objects of the medium
Wikipedia objects: templates, links, images, references, interlanguage links, timestamps, comments, author
contropedia.net
DMI research with Wikipedia
repurposing Wikipedia as a ...
reference work <=> cultural reference
bureaucracy <=> controversy diagnostics machine
scandal machine <=> places of edits
vigilant community <=> socio-technical system
DMI projects mentioned
R. Rogers, Digital Methods, Cambridge, MA: MIT Press, 2013. Chapter 5.
https://wiki.digitalmethods.net/Dmi/DebottingWikipedia
https://www.digitalmethods.net/Digitalmethods/TheNetworkedContent
Contropedia, contropedia.net
S. Niederer and J. van Dijck, "Wisdom of the Crowd or Technicity of Content? Wikipedia as Socio-technical System," New Media & Society, 12, 8, 2010, 1368-1387.
tools.digitalmethods.net
Tools native to Wikipedia
Page information
Search revision history
Contributors per article, ordered by number of edits
Page view statistics
stats.wikimedia.org - data is published in an exportable, computable format, e.g. CSV. The stats cover all wikis hosted by the foundation, the largest of which are the Wikipedias.
User edit search, for all the edits of a specific user
Number of watchers
Other useful tools
manypedia.com - puts two language versions of an article side by side / computes concept similarity
omnipedia.northwestern.edu - making Wikipedia articles in different languages comparable
wikiscanner (defunct)
history flow (defunct)
http://vs.aka-online.de/cgi-bin/wppagehiststat.pl - builds an edit history overview page for the article
http://sonetlab.fbk.eu/wikitrip - to see geographical statistics for anonymous edits and gender
Academic literature on Wikipedia
http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia
Okoli et al. (2012). The people's encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia
www.digitalmethods.net
sabine@digitalmethods.net
thank you.
erik@digitalmethods.net

Repurposing Wikipedia: Wikipedia as data set and analytical device
