Wikimedia presentation data mining meetup pub
Presentation given at SF Data Mining meetup in November 2010

Published in: Technology
Slide notes
  • Intro: who I am. I am here to talk about how data is used at the Foundation to support our cause.
  • What’s our cause? Not just an encyclopedia.
  • I’m going to frame the talk around this dichotomy. 2011 fundraising: 29.5M. 2011 budget: 24M, 17M of which came from the online fundraiser.
  • 40M words, 160k pages; 50-60x larger than Britannica.
  • Sources: http://commons.wikimedia.org/wiki/File:Editor_Survey_Report_-_April_2011.pdf and http://stats.wikimedia.org/EN/TablesWikipediaEN.htm. About half of all edits are made by 1,400 people.
  • http://toolserver.org/~daniel/WikiSense/Contributors.php?wikilang=en&wikifam=.wikipedia.org&grouped=on&page=Global_warming http://en.wikipedia.org/wiki/Wikipedia:Database_reports/Pages_with_the_most_revisions
  • Removed article, wikipedia, information, page
  • What are we not doing with data that could support your research or work?
  • Transcript

    • 1. data and
    • 2. “Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s our commitment.”
      • Jimmy Wales, Founder of Wikipedia
    • 3. is:
      • Bigger than you think
      • Smaller than you think
    • 4. 477,000,000 readers every month
    • 5. 272 Wikipedia language versions
    • 6. The English Wikipedia: 10 years of data (as of September 2011)
      • 3,754,533 articles
      • 3,806,293 people have edited
      • 293,893,801 total edits
      • 2,337,355,406 words (estimated) = 9+ million pages!
    • 7. User Funnel: English Wikipedia, per month
      • 200-300M readers
      • 35,000 active editors
      • 3,500 very active editors (~80% of edits)
      • 91% male, college educated, average age 32, predominantly from North America and Western Europe
    • 8.
    • 9. Most Edited Wikipedia Article? George W. Bush
    • 10. Most Edited Pages
      Total Edits   Total Unique Editors   Article
      43,648        13,783                 George W. Bush
      33,534        4,306                  Barack Obama (discussion)
      30,567        3,817                  List of World Wrestling Entertainment employees
      27,433        8,242                  United States
      25,308        2,609                  Global warming (discussion)
      25,224        1,821                  Sarah Palin (discussion)
      23,241        5,672                  Michael Jackson
      21,768        5,933                  Jesus
      21,501        4,647                  George W. Bush (discussion)
      21,343        753                    Gaza War (discussion)
      • In the month surrounding the release of Inconvenient Truth:
        • 116 people edited >1
        • 32 people edited >5
    • 11.
    • 12.
    • 13. Why do editors leave Wikipedia?
    • 14. 70% of new users receive their first message from a bot
    • 15. How we use data
      • Past: descriptive analysis
        • Why do people edit?
        • Why do they stop?
        • How can we make them stay longer?
        • What types of social interactions correlate with longevity?
      • Present: experimentation
        • How can we create on-ramps into editing?
        • How can we improve interactions between new and experienced editors?
        • How can we acculturate new editors more effectively?
      • Future: predictive modeling
        • How can we predict whether someone will be an active editor?
        • How can we predict when an editor is going to leave?
    • 16. Get Involved!
      • Our data is open:
        • http://stats.wikimedia.org/ (Excel)
        • http://toolserver.org/ (queries)
        • http://dumps.wikimedia.org/ (XML dumps, advanced)
          • https://github.com/whym/wikihadoop
      • Research hub: http://meta.wikimedia.org/wiki/Research
      • Survey: http://bit.ly/WikimediaData
      • Work with the Foundation!
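
The per-page figures on slide 10 (total edits, total unique editors) are the kind of numbers that could, in principle, be derived from the XML dumps listed on slide 16. The following is a minimal Python sketch of that idea, not the method used to build the slide. It assumes the general shape of a MediaWiki XML export (`page`, `title`, `revision`, `contributor`, `username`, `ip` elements). The sample data, the namespace version string, and the helper names `local_name` and `edit_stats` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the general shape of a MediaWiki XML export.
# Titles, usernames, and the namespace version are illustrative assumptions.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example A</title>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Bob</username></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
  <page>
    <title>Example B</title>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
  </page>
</mediawiki>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def edit_stats(stream):
    """Yield (title, total_edits, unique_editors) for each <page> in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        edits = 0
        editors = set()
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                edits += 1
                # A contributor is identified by a username or, if anonymous, an IP.
                for node in child.iter():
                    if local_name(node.tag) in ("username", "ip"):
                        editors.add(node.text)
        yield title, edits, len(editors)
        elem.clear()  # drop the finished page's children to limit memory use


if __name__ == "__main__":
    rows = edit_stats(io.BytesIO(SAMPLE_DUMP.encode("utf-8")))
    # Most-edited pages first, as on slide 10.
    for title, edits, editors in sorted(rows, key=lambda r: r[1], reverse=True):
        print(f"{edits:>6} {editors:>6}  {title}")
```

Streaming with `iterparse` rather than loading the whole file is a deliberate choice here, since full-history dumps are large. A real dump would be opened as a (possibly compressed) file instead of the in-memory sample.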
