Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
for #melhack
November 2009
...
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}

stable.toolserver.org/geohack/
wiki.tools...
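
The {{coord}} call on this slide carries degrees, minutes, seconds and a hemisphere letter for latitude and then longitude. A minimal sketch of turning that into decimal degrees follows. It assumes exactly this degrees/minutes/seconds form (the real template accepts other forms too), and `coord_to_decimal` is a hypothetical helper name, not part of any Wikimedia tool.

```python
def coord_to_decimal(template):
    """Convert '{{coord|d|m|s|N/S|d|m|s|E/W|...}}' to (lat, lon) in decimal degrees."""
    # Drop the surrounding braces, then split the call on pipes.
    parts = [p.strip() for p in template.strip().strip("{}").split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("not a degrees/minutes/seconds coord template")

    def dms(degrees, minutes, seconds, hemisphere):
        value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
        # South and West are negative by convention.
        return -value if hemisphere in ("S", "W") else value

    latitude = dms(*parts[1:5])
    longitude = dms(*parts[5:9])
    return latitude, longitude
```

Trailing parameters such as `type:city_region:AU-VIC` and `display=inline,title` are ignored by this sketch.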
{{Infobox Company
|name           = Lonely Planet
|logo         =
|type         = [[United Kingdom|British]] [[Government-...
Wikimedia Commons
   commons.wikimedia.org
   Multilingual
   5M+ files
   “Self-created”, PD, Flickr

   Predominantly ph...
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries

French biggest at 1.5M
(English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-...
MediaWiki structure
    Users
    Logs

    Pages, subpages, talk pages

    Links, backlinks

    Templates

    Ca...
MediaWiki markup

The only thing that completely
understands it is MediaWiki :(
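
In that spirit, the following is a deliberately naive sketch of pulling named parameters out of a single template call such as an infobox. It only tracks `[[...]]` and `{{...}}` nesting so that pipes inside links are not treated as separators; it does not handle comments, `<nowiki>`, parser functions or the many other cases that MediaWiki itself handles. `split_template` is a hypothetical name.

```python
def split_template(text):
    """Return (template_name, {param: value}) for one '{{Name|k=v|...}}' call."""
    inner = text.strip()[2:-2]  # drop the outer '{{' and '}}'
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            # A top-level pipe ends the current parameter.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))

    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        # Positional (unnamed) parameters are skipped in this sketch.
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params
```

For anything beyond quick experiments, a dedicated wikitext parser or the rendered output from MediaWiki itself is likely a safer choice.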
Database dumps
XML

download.wikimedia.org
OR Amazon Public Data Sets

meta.wikimedia.org/wiki/Data_dumps
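
Because the dumps are large XML files, streaming is usually more practical than loading a whole file. The sketch below uses the standard library's `iterparse` to yield page titles and wikitext one page at a time. It assumes the usual `<page>`, `<title>` and `<text>` elements of a MediaWiki XML export and ignores the XML namespace, since the namespace version varies between dumps. `iter_pages` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the page's children once it has been processed
```

`source` can be a file path or any binary file object, for example one opened with `bz2.open` on a compressed dump.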
DBpedia
Community project extracting
structured data from Wikipedia and
making it available

Can download data sets or que...
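
As a rough illustration of querying DBpedia online, the sketch below only builds a SPARQL request URL and leaves the HTTP call to the reader. The endpoint address, the `format` parameter and the `http://dbpedia.org/resource/` URI prefix are assumptions about the public DBpedia service and should be checked against its documentation. `build_sparql_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the public SPARQL endpoint lives at this address.
ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_url(resource, limit=20):
    """Build a GET URL listing property/value pairs for one DBpedia resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        "<http://dbpedia.org/resource/" + resource + "> ?property ?value "
        "} LIMIT " + str(int(limit))
    )
    params = {
        "query": query,
        # Assumption: the endpoint accepts this result-format hint.
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)
```

Calling `build_sparql_url("Lonely_Planet")` gives a URL that can be opened in a browser or fetched with any HTTP client.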
MediaWiki API

mediawiki.org/wiki/API

en.wikipedia.org/w/api.php

Client libraries!
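
To give a feel for the API without a client library, here is a minimal sketch that builds a `action=query` request for the current wikitext of a page and picks the text out of the JSON reply. It assumes the `formatversion=2` response layout with the `main` revision slot. The function names are hypothetical, and a real client should also send a descriptive User-Agent header.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_content_query(title):
    """Build a URL asking for the latest revision content of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def extract_wikitext(response):
    """Return the wikitext from a decoded JSON reply, or None if the page is missing."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]
```

The URL from `build_content_query("Lonely Planet")` can be fetched with `urllib.request` or any HTTP client, and the decoded JSON passed to `extract_wikitext`.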
mwclient

Python library for
accessing MediaWiki
APIs
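
A short sketch of what reading a page through mwclient might look like. It assumes a reasonably recent mwclient release, where `Site(host)` connects over HTTPS and `page.text()` returns the wikitext; older releases differed, so check the version you have installed. `fetch_wikitext` and `open_english_wikipedia` are hypothetical helper names.

```python
def fetch_wikitext(site, title):
    """Return the current wikitext of `title` from an mwclient-style site object."""
    page = site.pages[title]
    return page.text()


def open_english_wikipedia():
    """Connect to English Wikipedia (needs network access and `pip install mwclient`)."""
    import mwclient  # third-party; imported lazily so the helper above has no dependency
    return mwclient.Site("en.wikipedia.org")
```

With both in place, `fetch_wikitext(open_english_wikipedia(), "Lonely Planet")` would return the article source that contains the infobox shown earlier.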
Toolserver
toolserver.org

Server for community-developed plugins,
addons, extensions, stats and hacks –
tools

Tools ofte...
TemplateTiger
toolserver.org/~kolossos/templatetiger/

For a few dozen Wikipedia languages, &
Wikimedia Commons

Lets you ...
Thanks!
identi.ca/pfctdayelise
blaugher@wikimedia.org.au

                        Logos and screenshots
                  ...
For #melhack - http://lplabs.com/melbournehack/pmwiki/pmwiki.php/Main/HomePage

Transcript of "Wiki[mp]edia data sources & the MediaWiki API"

  1. Wiki[mp]edia data sources & the MediaWiki API. Brianna Laugher, for #melhack, November 2009
  2. ...
  3. Wikipedia: 13M articles total; 3M+ articles in English; 240+ languages; Simple English!
  4. {{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
  5. {{Infobox Company |name = Lonely Planet |logo = |type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
  6. Wikimedia Commons: commons.wikimedia.org; Multilingual; 5M+ files; “Self-created”, PD, Flickr; Predominantly photographs, but also diagrams, maps, flags
  7. Wiktionary: 5M+ entries; 170+ languages; 13 languages > 100K entries; French biggest at 1.5M (English second at 1.4M)
  8. JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
  9. MediaWiki structure: Users; Logs; Pages, subpages, talk pages; Links, backlinks; Templates; Categories
  10. MediaWiki markup: The only thing that completely understands it is MediaWiki :(
  11. Database dumps: XML; download.wikimedia.org OR Amazon Public Data Sets; meta.wikimedia.org/wiki/Data_dumps
  12. DBpedia: Community project extracting structured data from Wikipedia and making it available; Can download data sets or query them online; Ontology++; e.g. dbpedia.org/page/Lonely_Planet
  13. MediaWiki API: mediawiki.org/wiki/API; en.wikipedia.org/w/api.php; Client libraries!
  14. mwclient: Python library for accessing MediaWiki APIs
  15. Toolserver: toolserver.org; Server for community-developed plugins, addons, extensions, stats and hacks – tools; Tools often explicitly implement implicit editing community standards (“community API”)
  16. TemplateTiger: toolserver.org/~kolossos/templatetiger/ ; For a few dozen Wikipedia languages, & Wikimedia Commons; Lets you query templates very much like SQL
  17. Thanks! identi.ca/pfctdayelise blaugher@wikimedia.org.au Logos and screenshots may be copyright their respective owners; Slides are otherwise © Brianna Laugher