Wikimedia presentation data mining meetup pub


Presentation given at SF Data Mining meetup in November 2010

Published in: Technology
  • Intro: who I am. Supposed to talk about how data is used at the Foundation to support our cause
  • What’s our cause? Not just an encyclopedia
  • The way I’m going to talk about it is using this dichotomy. 2011 fundraising: 29.5M. 2011 budget: 24M, 17M of which came from the online fundraiser
  • 40M words, 160k pages; 50-60x larger than Britannica
  • About half of all edits are made by 1,400 people
  • Removed article, wikipedia, information, page
  • What are we not doing with data that could support your research/work?

    1. data and
    2. Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s our commitment.
        Jimmy Wales, Founder of Wikipedia
    3. is:
        • Bigger than you think
        • Smaller than you think
    4. 477,000,000 readers every month
    5. 272 Wikipedia language versions
    6. The English Wikipedia: 10 years of data (as of September 2011)
        • 3,754,533 articles
        • 3,806,293 people have edited
        • 293,893,801 total edits
        • 2,337,355,406 words (estimated) = 9+ million pages!
    7. User Funnel: English Wikipedia, per month
        • 200-300M Readers
        • 35,000 Active Editors
        • 3,500 Very Active Editors (~80% of edits)
        Editor demographics: 91% male; college educated; average age: 32; predominantly from North America, Western Europe
    8.
    9. Most Edited Wikipedia Article? George W. Bush
    10. Most Edited Pages

        Total Edits   Total Unique Editors   Article
        43,648        13,783                 George W. Bush
        33,534        4,306                  Barack Obama (discussion)
        30,567        3,817                  List of World Wrestling Entertainment employees
        27,433        8,242                  United States
        25,308        2,609                  Global warming (discussion)
        25,224        1,821                  Sarah Palin (discussion)
        23,241        5,672                  Michael Jackson
        21,768        5,933                  Jesus
        21,501        4,647                  George W. Bush (discussion)
        21,343        753                    Gaza War (discussion)

        In the month surrounding the release of Inconvenient Truth:
            • 116 people edited >1
            • 32 people edited >5
    11.
    12.
    13. Why do editors leave Wikipedia?
    14. 70% of new users receive their first message from a bot
    15. How we use data
        • Past: Descriptive analysis
            • Why do people edit?
            • Why do they stop?
            • How can we make them stay longer?
            • What types of social interactions correlate with longevity?
        • Present: Experimentation
            • How can we create on-ramps into editing?
            • How can we improve interactions between new and experienced editors?
            • How can we acculturate new editors more effectively?
        • Future: Predictive modeling
            • How can we predict whether someone will be an active editor?
            • How can we predict when an editor is going to leave?
    16. Get Involved!
        • Our data is open:
            • (excel)
            • (queries)
            • (xml dumps - advanced)
        • Research hub:
        • Survey:
        • Work with the Foundation!
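
The "xml dumps" option on the Get Involved slide is marked advanced, so a small illustration may help. The sketch below is not from the talk. It assumes a dump laid out as `page` elements that contain a `title` and one `revision` per edit, each with a `contributor` holding a `username` or `ip`. The inline sample, its page titles, and the helper names `local_name` and `page_stats` are all hypothetical. It counts total edits and unique editors per page, the same two figures shown in the Most Edited Pages table.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature dump, invented for illustration only.
# Real dumps are far larger and their schema version may differ.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example A</title>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Bob</username></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
  <page>
    <title>Example B</title>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code ignores the schema version."""
    return tag.rsplit("}", 1)[-1]


def page_stats(stream):
    """Return {title: (total_edits, unique_editors)} for each page in the stream."""
    stats = {}
    # iterparse streams the file, so a large dump is never held in memory at once.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        edits = 0
        editors = set()
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                edits += 1
            elif name in ("username", "ip"):
                editors.add(child.text)
        stats[title] = (edits, len(editors))
        elem.clear()  # free the finished page before reading the next one
    return stats


if __name__ == "__main__":
    result = page_stats(io.BytesIO(SAMPLE_DUMP.encode("utf-8")))
    for title, (edits, editors) in result.items():
        print(f"{title}: {edits} edits by {editors} unique editors")
```

Streaming with `iterparse` and clearing each finished page is one reasonable choice for very large files. A real analysis would also need to handle compressed input and revisions with no contributor recorded.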