Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
for #melhack
November 2009
For #melhack - http://lplabs.com/melbournehack/pmwiki/pmwiki.php/Main/HomePage


Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!

{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
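The `{{coord}}` call above stores latitude and longitude as degree, minute and second fields, each followed by a hemisphere letter, and then underscore-separated `key:value` coordinate parameters such as `type:city` and `region:AU-VIC`. A minimal Python sketch of turning that DMS form into signed decimal degrees; the `parse_coord` helper is hypothetical and assumes the DMS layout only (the real template also accepts decimal and degree-minute forms):

```python
def _dms_to_decimal(parts, hemisphere):
    """Combine [degrees, minutes, seconds] strings into signed decimal degrees."""
    value = sum(float(p) / 60 ** i for i, p in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def parse_coord(wikitext):
    """Parse {{coord|d|m|s|N/S|d|m|s|E/W|params|name=value}} (DMS form only).

    Hypothetical helper. Returns (latitude, longitude, coord_params, named_args).
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    name, *args = [a.strip() for a in text[2:-2].split("|")]
    if name.lower() != "coord":
        raise ValueError("not a {{coord}} template")

    positional, named = [], {}
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(arg)

    def take_until(start, letters):
        # Collect the numeric fields up to the next hemisphere letter.
        for i in range(start, len(positional)):
            if positional[i].upper() in letters:
                return positional[start:i], positional[i].upper(), i + 1
        raise ValueError("missing hemisphere letter")

    lat_parts, ns, nxt = take_until(0, ("N", "S"))
    lon_parts, ew, nxt = take_until(nxt, ("E", "W"))
    lat = _dms_to_decimal(lat_parts, ns)
    lon = _dms_to_decimal(lon_parts, ew)

    # Leftover positional text, e.g. "type:city_region:AU-VIC".
    params = {}
    for item in "_".join(positional[nxt:]).split("_"):
        if ":" in item:
            key, value = item.split(":", 1)
            params[key] = value
    return lat, lon, params, named
```

For the Melbourne call on this slide it gives roughly -37.8136, 144.9631, with the coordinate parameters split into `type` and `region`.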

{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
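Pulling fields out of an infobox like this mostly means splitting on `|`, except that the pipes inside `[[United Kingdom|British]]`-style links and nested `{{...}}` calls must be skipped. A minimal sketch under that assumption; `split_top_level` and `parse_template` are hypothetical helpers that ignore HTML comments, `<nowiki>` and the other markup a full parser would have to handle:

```python
def split_top_level(body):
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        else:
            if body[i] == "|" and depth == 0:
                parts.append("".join(current))
                current = []
            else:
                current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(wikitext):
    """Return (template_name, {param: value}) for a single template call."""
    text = wikitext.strip()
    if not text.startswith("{{"):
        raise ValueError("not a template call")
    text = text[2:]
    if text.endswith("}}"):
        text = text[:-2]  # tolerate a missing close, as on a truncated slide
    name, *args = split_top_level(text)
    params, unnamed = {}, 0
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            params[key.strip()] = value.strip()  # named values are trimmed
        else:
            unnamed += 1
            params[str(unnamed)] = arg  # unnamed values keep their whitespace
    return name.strip(), params
```

Unnamed arguments are numbered from 1 in the order they appear; everything else about the markup is left as raw wikitext.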

Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD (public domain), Flickr
Predominantly photographs, but also diagrams, maps, flags

Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M
(English second at 1.4M)

JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
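A lookup tool like this first has to find the right language section, since each Wiktionary entry groups its content under level-2 headings such as `==English==` or `==French==`. A minimal sketch, assuming each heading sits alone on its own line; `language_sections` is a hypothetical helper and ignores the finer conventions documented on Wiktionary:Parsing:

```python
import re

# A level-2 heading: exactly two '=' on each side, e.g. "==English==".
LEVEL2 = re.compile(r"^==(?!=)\s*(.+?)\s*==\s*$", re.MULTILINE)


def language_sections(wikitext):
    """Map each level-2 (language) heading to the wikitext beneath it."""
    matches = list(LEVEL2.finditer(wikitext))
    sections = {}
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(wikitext)
        sections[m.group(1)] = wikitext[m.end():end].strip()
    return sections
```

Level-3 headings such as `===Noun===` are left inside their language's section for later processing.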

MediaWiki structure
    Users
    Logs
    Pages, subpages, talk pages
    Links, backlinks
    Templates
    Categories
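Much of this structure is visible in page titles: a namespace prefix such as `User:` or `Template:`, an optional `/` subpage path, and a talk page whose namespace number is the subject namespace's number plus one. A minimal sketch using a hard-coded subset of English Wikipedia's canonical namespace names; the tables and helper names are assumptions, and a real tool should read the namespace list from the wiki itself (for example with the API's `meta=siteinfo&siprop=namespaces`):

```python
# Subset of canonical namespace names and numbers (hard-coded assumption).
NAMESPACES = {
    "Talk": 1,
    "User": 2, "User talk": 3,
    "File": 6, "File talk": 7,
    "Template": 10, "Template talk": 11,
    "Category": 14, "Category talk": 15,
}
NAMES = {number: name for name, number in NAMESPACES.items()}
# Namespaces where "/" marks a subpage (assumed; this is a per-wiki setting).
SUBPAGE_NAMESPACES = {1, 2, 3, 7, 10, 11, 15}


def strip_namespace(title):
    """Return (namespace_number, name_without_prefix)."""
    prefix, sep, rest = title.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix], rest
    return 0, title  # main (article) namespace


def split_title(title):
    """Return (namespace_number, base_page, subpage_parts)."""
    number, name = strip_namespace(title)
    if number in SUBPAGE_NAMESPACES:
        base, *subpages = name.split("/")
        return number, base, subpages
    return number, name, []  # e.g. "AC/DC" is an article, not a subpage


def talk_page(title):
    """Return the talk page title for a page (odd numbers are talk spaces)."""
    number, name = strip_namespace(title)
    if number % 2 == 1:
        return title  # already a talk page
    return f"{NAMES[number + 1]}:{name}"
```

Title normalisation (first-letter capitalisation, underscores versus spaces, namespace aliases) is deliberately left out.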

MediaWiki markup
The only thing that completely understands it is MediaWiki :(
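Because only MediaWiki fully understands its markup, quick tools usually settle for approximate extraction from raw wikitext. A minimal sketch that pulls out plain wikilinks and category tags with a regular expression; `links_and_categories` is hypothetical and misses anything generated by templates, which only the rendered page or the API (`prop=links`, `prop=categories`) would report:

```python
import re

# [[target]] or [[target|label]]; the target may not contain [ ] | { }.
WIKILINK = re.compile(r"\[\[([^\[\]|{}]+)(?:\|([^\[\]]*))?\]\]")


def links_and_categories(wikitext):
    """Split [[...]] links in raw wikitext into (article_links, categories)."""
    links, categories = [], []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            categories.append(target.split(":", 1)[1].strip())
        else:
            page = target.split("#", 1)[0]  # drop section anchors
            if page:
                links.append(page)
    return links, categories
```

Links such as `[[:Category:Foo]]` (a link to a category rather than a category tag) land in the first list, which matches how MediaWiki treats the leading colon.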

Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
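The XML dumps are far too large to load into memory at once, so the usual approach is to stream them. A minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`, assuming the `<page>`, `<title>`, `<revision>` and `<text>` layout of the page dumps; the export schema's XML namespace version changes between dumps, so the code compares local tag names only, and `SAMPLE` is an abbreviated, made-up fragment:

```python
import io
import xml.etree.ElementTree as ET

# Abbreviated, made-up dump fragment for trying the parser offline.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Lonely Planet</title>
    <ns>0</ns>
    <revision><text>'''Lonely Planet''' is a travel guide publisher.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>; source is a path or binary file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""  # keeps the last revision's text
        yield title, text
        elem.clear()  # free the page's subtree before moving on


if __name__ == "__main__":
    for title, text in iter_pages(io.BytesIO(SAMPLE)):
        print(title, len(text))
```

For a real download, pass a file object such as `bz2.open(path)` for a `.xml.bz2` dump. Clearing each `<page>` keeps memory use low, although the emptied page elements still accumulate under the root on very large dumps.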

DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
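Querying DBpedia online goes through a SPARQL endpoint. A minimal sketch that builds a GET request listing the properties of the Lonely Planet resource, assuming the public endpoint at `https://dbpedia.org/sparql` and its `query` and `format` parameters; `dbpedia.org/page/Lonely_Planet` is the HTML view of the resource URI `http://dbpedia.org/resource/Lonely_Planet` used in the query, and the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint


def sparql_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urllib.parse.urlencode(params)


def properties_query(resource, limit=20):
    """SPARQL query listing predicate/object pairs for one DBpedia resource."""
    return (
        "SELECT ?p ?o WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?p ?o "
        f"}} LIMIT {int(limit)}"
    )


def run(query):
    """Fetch results over the network and return (predicate, object) pairs."""
    with urllib.request.urlopen(sparql_url(query)) as response:
        data = json.load(response)
    return [
        (row["p"]["value"], row["o"]["value"])
        for row in data["results"]["bindings"]
    ]
```

Calling `run(properties_query("Lonely_Planet"))` needs network access; the JSON it parses follows the W3C SPARQL results format (`results.bindings`).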

MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
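`api.php` is driven by plain HTTP parameters, starting with `action`. A minimal sketch that asks for a page's current wikitext via `action=query&prop=revisions`, assuming the `rvslots=main` and `formatversion=2` parameters of current MediaWiki releases (older wikis return a different JSON shape); `SAMPLE` is an abbreviated, hand-written response, and the `User-Agent` value is a placeholder, since Wikimedia asks API clients to identify themselves:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "melhack-example/0.1 (contact: you@example.org)"  # placeholder


def wikitext_request(title, api=API):
    """Build a GET request for the current wikitext of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    url = api + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def extract_wikitext(response):
    """Pull the wikitext out of a decoded formatversion=2 response."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title):
    """Network call: request the page and decode the JSON reply."""
    with urllib.request.urlopen(wikitext_request(title)) as resp:
        return extract_wikitext(json.load(resp))


# Abbreviated, hand-written example of the response shape.
SAMPLE = {
    "batchcomplete": True,
    "query": {"pages": [{
        "ns": 0,
        "title": "Lonely Planet",
        "revisions": [{"slots": {"main": {
            "contentmodel": "wikitext",
            "content": "{{Infobox Company\n|name = Lonely Planet\n}}",
        }}}],
    }]},
}
```

Keeping request building and response parsing separate from the network call makes both easy to check offline.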

mwclient
Python library for accessing MediaWiki APIs
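With mwclient the same kind of lookup becomes a few attribute and method calls. A minimal sketch, assuming the interface of recent mwclient releases (`mwclient.Site`, `site.pages[...]`, `page.exists`, `page.text()`, `page.categories()`); older versions differed, so check against the installed one. The hypothetical `page_summary` helper takes the site object as a parameter, so it can also be exercised offline with a stand-in object:

```python
def page_summary(site, title, max_chars=200):
    """Return (title, opening wikitext, category titles), or None if missing.

    `site` is assumed to behave like mwclient.Site: site.pages[title] gives a
    page object with .exists, .text() and .categories(), and each category
    object has a .name attribute.
    """
    page = site.pages[title]
    if not page.exists:
        return None
    categories = [category.name for category in page.categories()]
    return title, page.text()[:max_chars], categories


# Real use (needs network access and `pip install mwclient`):
#   import mwclient
#   site = mwclient.Site("en.wikipedia.org")
#   print(page_summary(site, "Lonely Planet"))
```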

Toolserver
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks – tools
Tools often explicitly implement implicit editing-community standards (“community API”)

TemplateTiger
toolserver.org/~kolossos/templatetiger/
Covers a few dozen Wikipedia languages, plus Wikimedia Commons
Lets you query templates much like an SQL database
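The idea behind querying templates like SQL can be illustrated by flattening template calls into (page, template, parameter, value) rows. A minimal sketch with Python's built-in `sqlite3`; the table layout, the helper names and the `Example Co` row are hypothetical and are not TemplateTiger's actual schema or data:

```python
import sqlite3

# Hypothetical flattened rows: (page, template, parameter, value).
ROWS = [
    ("Lonely Planet", "Infobox Company", "foundation", "1972"),
    ("Lonely Planet", "Infobox Company", "location_country", "[[Australia]]"),
    ("Example Co", "Infobox Company", "foundation", "1995"),
]


def build_db(rows):
    """Load template parameter rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template_params "
        "(page TEXT, template TEXT, param TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", rows)
    return conn


def founded_before(conn, year):
    """Pages whose Infobox Company 'foundation' value is an earlier year."""
    cur = conn.execute(
        """
        SELECT page FROM template_params
        WHERE template = 'Infobox Company'
          AND param = 'foundation'
          AND CAST(value AS INTEGER) < ?
        ORDER BY page
        """,
        (year,),
    )
    return [page for (page,) in cur.fetchall()]
```

In practice the rows would come from parsing each page's wikitext; storing every value as text keeps the table generic, at the cost of casting when comparing numbers.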

Thanks!
identi.ca/pfctdayelise
blaugher@wikimedia.org.au
Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher