Wiki[mp]edia data sources & the MediaWiki API

Wiki[mp]edia
  data sources &
the MediaWiki API
   Brianna Laugher
        for #melhack
       November 2009

Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!

  4. {{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}} See stable.toolserver.org/geohack/ and wiki.toolserver.org/view/GeoHack
  5. {{Infobox Company
     |name = Lonely Planet
     |logo =
     |type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
     |genre = [[Guide book|Travel guides]]
     |foundation = 1972
     |founder = Tony Wheeler<br /> Maureen Wheeler
     |location_city = [[Footscray, Victoria]]
     |location_country = [[Australia]]
     |location =
     |origins =
     |key_people = Matt Goldberg <small>(Global [[CEO]])</small>
     |area_served = Worldwide
     |industry = [[Multi media]]
     |products = Travel [[guidebook, digital applications, online travel community]]
     |services =
  6. Wikimedia Commons (commons.wikimedia.org): multilingual; 5M+ files; sources are “self-created”, PD (public domain) and Flickr; predominantly photographs, but also diagrams, maps, flags
  7. Wiktionary: 5M+ entries; 170+ languages; 13 languages with > 100K entries; French biggest at 1.5M (English second at 1.4M)
  8. JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html See also http://en.wiktionary.org/wiki/Wiktionary:Parsing
  9. MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories
  10. MediaWiki markup: the only thing that completely understands it is MediaWiki :(
  11. Database dumps: XML, from download.wikimedia.org or Amazon Public Data Sets. See meta.wikimedia.org/wiki/Data_dumps
  12. DBpedia: community project extracting structured data from Wikipedia and making it available. You can download the data sets or query them online. Ontology++. E.g. dbpedia.org/page/Lonely_Planet
  13. MediaWiki API: documented at mediawiki.org/wiki/API; endpoint for English Wikipedia is en.wikipedia.org/w/api.php. Client libraries!
  14. mwclient: Python library for accessing MediaWiki APIs
  15. Toolserver (toolserver.org): server for community-developed plugins, add-ons, extensions, stats and hacks (“tools”). Tools often explicitly implement implicit editing-community standards (a “community API”)
  16. TemplateTiger (toolserver.org/~kolossos/templatetiger/): for a few dozen Wikipedia languages and Wikimedia Commons. Lets you query templates very much like SQL
  17. Thanks! identi.ca/pfctdayelise, blaugher@wikimedia.org.au. Logos and screenshots may be copyright their respective owners. Slides are otherwise © Brianna Laugher
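
Re slide 4: the {{coord}} call packs degrees, minutes, seconds and a hemisphere letter for latitude, then the same for longitude. A minimal Python sketch of turning that into decimal degrees follows. The function name parse_coord is made up for illustration, and it assumes only the d|m|s|hemisphere form shown on the slide; the real template accepts other forms that this does not handle.

```python
def parse_coord(template):
    """Return (lat, lon) in decimal degrees from a {{coord|d|m|s|H|d|m|s|H|...}} call.

    Hypothetical helper: only the degrees|minutes|seconds|hemisphere form is handled.
    """
    call = template.strip()
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("not a template call")
    parts = [p.strip() for p in call[2:-2].split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("not a d|m|s|hemisphere coord call")

    def to_decimal(deg, minutes, seconds, hemisphere):
        value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
        # South and west are negative by convention.
        return -value if hemisphere.upper() in ("S", "W") else value

    lat = to_decimal(*parts[1:5])
    lon = to_decimal(*parts[5:9])
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))
```

Anything after the eighth positional value (type:, display= and so on) is ignored here.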
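
Re slide 11: the XML dumps are far too big to load whole, so they are usually read as a stream. A rough Python sketch using the standard library's iterparse follows. The helper names and the tiny SAMPLE document are invented for illustration; the sample mimics the dump layout (page, title, revision, text) and its namespace version is only an example, so check a real dump before relying on it.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> element, one page at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # free the finished page's children to keep memory low


# Illustrative stand-in for a dump file; real dumps carry many more fields.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company |name = Lonely Planet}}</text></revision>
  </page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(SAMPLE)):
    print(title, len(text))
```

For a real dump, pass an open (and decompressed) file object instead of the BytesIO sample.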
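
Re slide 13: a request to api.php is just a URL with query parameters. The Python sketch below builds one that asks for the current wikitext of a page. build_query_url and fetch_json are made-up names, the https scheme and the User-Agent string are assumptions, and the parameters used (action=query, prop=revisions, rvprop=content, titles, format=json) are standard API parameters; see mediawiki.org/wiki/API for the rest.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint from the slide; the https scheme is an assumption.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, endpoint=API_ENDPOINT):
    """Build a URL asking for the latest revision content of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def fetch_json(url):
    """Fetch and decode the JSON response (needs network access)."""
    # Placeholder User-Agent; replace with something that identifies your tool.
    request = Request(url, headers={"User-Agent": "melhack-demo/0.1"})
    with urlopen(request) as response:
        return json.load(response)


print(build_query_url("Lonely Planet"))
```

Only the URL is built when this runs; call fetch_json(build_query_url(...)) to actually hit the API.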
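
Re slides 9 and 14: a client library such as mwclient hides the URL building and exposes pages and backlinks as Python objects. The sketch below lists a few pages that link to a given page. backlink_titles and demo are hypothetical names; the mwclient calls used (Site, site.pages[...], Page.backlinks(), Page.name) are from its documented interface as far as I know, but check them against the version you install. The user agent string is a placeholder.

```python
from itertools import islice


def backlink_titles(site, title, max_items=5):
    """Return the names of up to max_items pages linking to `title`.

    `site` is expected to behave like an mwclient.Site object.
    """
    page = site.pages[title]
    return [linking_page.name for linking_page in islice(page.backlinks(), max_items)]


def demo():
    """Live example; needs network access and `pip install mwclient`."""
    import mwclient  # third-party library, imported here so the rest runs without it

    site = mwclient.Site("en.wikipedia.org", clients_useragent="melhack-demo/0.1")
    print(backlink_titles(site, "Lonely Planet"))
```

Calling demo() does the real lookup; backlink_titles itself works with any object that offers the same pages/backlinks/name shape.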
