Slides from my talk at Big Dive 2015
http://www.bigdive.eu/


A new research agenda for Wikimedia – Big Dive 2015

  1. The sum of all human knowledge in the age of machines. A new research agenda for Wikimedia. Dario Taraborelli • Wikimedia Foundation. Big Dive, 16 June 2015
  2. Non-profit running Wikipedia and its sister projects. Mission: support the creation and dissemination of collaboratively produced free knowledge. 250+ employees, mostly based in San Francisco. 6th most popular web property on the planet by traffic.
  3. 35M articles in 288 languages, 26M media files, 60M triples
  4. A conversation
  5. Academic research on Wikipedia: rise and decline of the editor population; gender gap and content biases; contributor motivation; asymmetries in content and provenance of contributions; socio-technical systems governing quality control.
  6. Wikipedia’s rise and decline https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline
  7. Human curated knowledge in the age of machines
  8. the long-form encyclopedia
  9. Outline
      1. sourcing information
      2. consuming information
      3. distributing content
      A new research agenda
      Distributed innovation: how we work
  10. 1. Sourcing information
  11. Goats
  12. https://en.wikipedia.org/wiki/Goat#Life_expectancy
  13. https://www.wikidata.org/wiki/Q42
  14. https://tools.wmflabs.org/wikidata-todo/stats.php 85%
  15. 1. Sourcing information
      ● What’s the role of humans in sourcing and verifying information when answers to most questions are readily available from search engines?
      ● Should Wikipedia start integrating algorithmically extracted sources in its contents?
      ● Should Wikipedia further invest in supporting human generated citations?
  16. 2. Consuming information
  17. O. Keyes (2015) The Mobile Singularity is already here. Wikipedia and the Mobile Web
  18. Bite-sized consumption
  19. Structured contributions
  20. Manipulating fragments
  21. Decoupling the article [diagram contrasting the long-form article with the decoupled article; labels: media, structured data, references, long-form text, fragments, geocoordinates]
  22. 2. Consuming information
      ● Can we transform Wikipedia contents to make them suitable to bite-sized consumption?
      ● How to accelerate extraction of structured data from Wikipedia and its use in Wikidata?
      ● How to design effective lightweight contribution funnels around structured data and content fragments?
      ● How to support programmatic manipulation of content fragments?
  23. 3. Distributing content
  24. The paradox of reuse
  25. Routing attention. Women in Science. Wikipedia needs your help: the English Wikipedia article Women in Science needs contributors from a more global perspective. Help expand it!
  26. Routing attention
  27. Routing attention
  28. 3. Distributing content
      ● How can we design content distribution systems that do not intermediate Wikipedia?
      ● How do we leverage content syndication to route (expert) attention to the source?
  29. A new research agenda. Designing and evaluating systems to:
      1. preserve and increase transparent sourcing of information
      2. break down long-form articles into their constituents
      3. optimize content consumption, as a function of access
      4. enable lightweight contribution/manipulation of structured data / fragments
      5. leverage content distributed / syndicated by 3rd parties
      6. prioritize work and route contributors to the site, as a function of demand
  30. Distributed innovation: how we work
  31. Open knowledge curation ecosystem: Humans, Cyborgs, Machines
  32. Wikimedia Research as a platform: Wikimedia Research & Data team; Edit/article quality classifiers; Automated link recommendations; Article creation recommendations; Fundraiser testing and optimization
  33. Scaling Wikimedia Research. 1:100,000,000: approximate ratio of full-time data scientists at WMF to monthly unique visitors
  34. Formal collaborations: Stanford University; GroupLens, University of Minnesota; Oxford Internet Institute; Los Alamos National Laboratory. https://wikimediafoundation.org/wiki/Open_access_policy
  35. Open data https://meta.wikimedia.org/wiki/Research:Data
  36. Open data: pageviews http://www.wikipediatrends.com
  37. Open data: clickstream. Wulczyn, E; Taraborelli, D (2015): Wikipedia Clickstream. http://dx.doi.org/10.6084/m9.figshare.1305770
  38. Open data: tuples https://www.wikidata.org/wiki/Wikidata:Data_access http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html
  39. Open data: real-time changes https://wikitech.wikimedia.org/wiki/RCStream
  40. Conclusions
  41. Questions? dario@wikimedia.org @readermeter @wikiresearch
  42. Image credits
      Election Night Crowd, Wellington, 1931 https://www.flickr.com/photos/nationallibrarynz_commons/3326203787 CC0
      King Billy of Dalkey Island https://www.flickr.com/photos/paulodonnell/5937678226 CC BY
      Secretary at typewriter, 1912 https://www.flickr.com/photos/muohio_digital_collections/3192197470 CC0
      "Getting em up" at U.S.Naval Training Camp, Seattle, Washington. ca. 1917 - ca. 1918 https://www.flickr.com/photos/usnationalarchives/5505933145 CC0
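
The clickstream dataset listed under "Open data: clickstream" (slide 37) pairs referrers with the articles they lead to, which ties into the agenda item on routing contributors as a function of demand. Below is a minimal Python sketch of how such data might be explored. The three-column layout (`prev`, `curr`, `n`), the sample rows, and the function name `top_referrers` are all assumptions made for illustration, not the published schema or any Wikimedia tooling.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. The (prev, curr, n) column layout is an assumption
# for illustration; check the dataset's own documentation for the real schema.
SAMPLE_TSV = (
    "prev\tcurr\tn\n"
    "other-google\tGoat\t120\n"
    "Domestic_sheep\tGoat\t30\n"
    "other-google\tWomen_in_science\t45\n"
    "Goat\tDomestic_sheep\t12\n"
)


def top_referrers(tsv_text, article, limit=3):
    """Return the (referrer, count) pairs sending the most traffic to `article`."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Keep only rows whose destination is the article of interest.
        if row["curr"] == article:
            counts[row["prev"]] += int(row["n"])
    return counts.most_common(limit)


print(top_referrers(SAMPLE_TSV, "Goat"))
# → [('other-google', 120), ('Domestic_sheep', 30)]
```

With a real dump, the same aggregation would run over a file handle in place of the in-memory string, after adjusting the column names to whatever the release actually uses.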
