Wikimedia presentation data mining meetup pub
 


Presentation given at SF Data Mining meetup in November 2010

  • Intro: who I am. Supposed to talk about how data is used at the Foundation to support our cause.
  • What’s our cause? Not just an encyclopedia
  • The way I’m going to talk about it is using this dichotomy. 2011 fundraising: 29.5M. 2011 budget: 24M, 17M of which came from the online fundraiser.
  • 40M words, 160k pages (Britannica); English Wikipedia is 50-60x larger
  • About half of all edits are made by 1,400 people. See http://commons.wikimedia.org/wiki/File:Editor_Survey_Report_-_April_2011.pdf and http://stats.wikimedia.org/EN/TablesWikipediaEN.htm
  • http://toolserver.org/~daniel/WikiSense/Contributors.php?wikilang=en&wikifam=.wikipedia.org&grouped=on&page=Global_warming http://en.wikipedia.org/wiki/Wikipedia:Database_reports/Pages_with_the_most_revisions
  • Removed the words “article”, “wikipedia”, “information”, “page”
  • What are we not doing with data that could support your research/work?

Presentation Transcript

  • data and
    • Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s our commitment.
    • Jimmy Wales, Founder of Wikipedia
    • is: Bigger than you think
    • Smaller than you think
    • 477,000,000
    • Readers every month
    • 272
    • Number of Wikipedia Language Versions
  • The English Wikipedia: 10 years of data (as of September 2011)
      • 3,754,533 articles
      • 3,806,293 people have edited
      • 293,893,801 total edits
      • 2,337,355,406 words (estimated) = 9+ million pages!
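As a rough check on the page estimate, the sketch below assumes that the "40M words, 160k pages" figure in the speaker notes describes Britannica, which implies a density of 250 words per page. That density is an assumption drawn from the notes, not a number stated on the slide, and the variable names are illustrative only.

```python
# Figure from the slide (English Wikipedia, as of September 2011).
wikipedia_words = 2_337_355_406

# Assumption: the speaker notes' "40M words, 160k pages" describes Britannica.
britannica_words = 40_000_000
britannica_pages = 160_000

# Words per printed page implied by the Britannica figures.
words_per_page = britannica_words / britannica_pages

# Convert Wikipedia's word count into equivalent printed pages,
# and compare the two word counts directly.
wikipedia_pages = wikipedia_words / words_per_page
size_ratio = wikipedia_words / britannica_words

print(f"words per page:  {words_per_page:.0f}")
print(f"estimated pages: {wikipedia_pages:,.0f}")
print(f"size ratio:      {size_ratio:.1f}x")
```

Under these assumptions the estimate lands a little above 9 million pages and the ratio falls inside the 50-60x range quoted in the notes.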
  • User Funnel: English Wikipedia, per month
    • 200-300M Readers
    • 35,000 Active Editors
    • 3,500 Very Active Editors (~80% of edits)
    • 91% male
    • College educated
    • Average age: 32
    • Predominantly from North America, Western Europe
  • Most Edited Wikipedia Article? George W. Bush
  • Most Edited Pages
      Total Edits   Total Unique Editors   Article
      43,648        13,783                 George W. Bush
      33,534        4,306                  Barack Obama (discussion)
      30,567        3,817                  List of World Wrestling Entertainment employees
      27,433        8,242                  United States
      25,308        2,609                  Global warming (discussion)
      25,224        1,821                  Sarah Palin (discussion)
      23,241        5,672                  Michael Jackson
      21,768        5,933                  Jesus
      21,501        4,647                  George W. Bush (discussion)
      21,343        753                    Gaza War (discussion)
    • In the month surrounding the release of An Inconvenient Truth:
        • 116 people made more than 1 edit
        • 32 people made more than 5 edits
  • Why do editors leave Wikipedia?
  • 70% of new users receive their first message from a bot
  • How we use data
    • Past: Descriptive analysis
      • Why do people edit?
      • Why do they stop?
      • How can we make them stay longer?
      • What types of social interactions correlate with longevity?
    • Present: Experimentation
      • How can we create on-ramps into editing?
      • How can we improve interactions between new and experienced editors?
      • How can we acculturate new editors more effectively?
    • Future: Predictive modeling
      • How can we predict whether someone will be an active editor?
      • How can we predict when an editor is going to leave?
  • Get Involved!
    • Our data is open:
    • http://stats.wikimedia.org/ (Excel)
    • http://toolserver.org/ (queries)
    • http://dumps.wikimedia.org/ (XML dumps, advanced)
      • https://github.com/whym/wikihadoop
    • Research hub: http://meta.wikimedia.org/wiki/Research
    • Survey: http://bit.ly/WikimediaData
    • Work with the Foundation!
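For readers who want to try the XML dumps, here is a minimal sketch of counting edits per contributor, the kind of tally behind the "most edited" and "very active editor" figures. It runs on a tiny hand-written sample shaped like a MediaWiki export (`page`, `revision`, `contributor`, `username`, `ip`). The sample data and the helper names `local_name` and `edits_per_contributor` are hypothetical, and this is not presented as the Foundation's own pipeline.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the shape of a MediaWiki XML export (not real data).
# Real dumps declare an XML namespace on <mediawiki>; local_name() strips it,
# so the same function handles both cases.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example article</title>
    <revision>
      <id>1</id>
      <contributor><username>Alice</username></contributor>
    </revision>
    <revision>
      <id>2</id>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
    <revision>
      <id>3</id>
      <contributor><username>Alice</username></contributor>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def edits_per_contributor(stream):
    """Count revisions per contributor (username, or IP for anonymous edits)."""
    counts = Counter()
    # Stream the file element by element instead of loading it whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        for child in elem.iter():
            if local_name(child.tag) in ("username", "ip") and child.text:
                counts[child.text] += 1
                break
        elem.clear()  # drop the revision's children to limit memory use
    return counts


counts = edits_per_contributor(io.BytesIO(SAMPLE_DUMP.encode("utf-8")))
for name, n in counts.most_common():
    print(name, n)
```

The function takes any binary file object, so a downloaded dump could be passed in place of the sample, for example via `bz2.open(path)` if the file is bzip2-compressed. Full-history dumps are very large, which is presumably why the slide marks them as advanced and points to wikihadoop.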