Introduction to Wikidata


"Introduction to Wikidata" presentation given 26th April 2013, at the British Library

Published in: Technology, Education

  1. Introduction to Wikidata
     British Library, 26/4/13
     Andrew | @generalising
  2. Wikidata summary
     ● Central data repository for Wikimedia projects
     ● Human- and machine-readable
     ● Human- and machine-editable
     ● Fully multilingual
     ● Supports semantic
  3. Overall plan
     ● Phase I
       – Centralise cross-language relationships
     ● Phase II
       – Centralise core structured data
     ● Phase III
       – Dynamic generation of list content
  4. Phase I
     ● Centralising all “interwiki” cross-language links
       – Historically, a major maintenance headache!
     ● Single conceptual entity => many articles
       – ...some unexpected oddities arise; not all 1:1
     ● Almost all entities now listed
     ● Inclusion standards currently restricted
  5. Phase I
  6. Phase I – oddities
  7. Phase II
     ● Building structured data on these entities
     ● “Phase 2.1” - harvesting data from Wikipedia
       – and supplemented from other sources
     ● “Phase 2.2” - displaying data on Wikipedia
       – autogenerated information templates
  8. Phase II
  9. Phase III
     ● Automatic creation of lists and charts
     ● Expected for late 2013...
  10. Wikidata entities
      ● Single entity corresponding to one or more Wikipedia articles
        – Name (in various languages) + WP links
        – Contains various Phase II properties
        – Properties can include sources/qualifiers
      ● No support (yet!) for entities not existing in WP
  11. Phase II – planned model
  12. Phase II – initial properties
      ● Limited properties – gradual roll-out
      ● Single “main type”, but no restrictions on use
        – “the capital of Julius Caesar”
      ● Relational properties implemented
        – but no automatic reciprocity yet
      ● String datatypes created for identifiers
      ● 130 properties currently in use
  13. Phase II – future properties
      ● Properties created by community discussion
      ● Several awaiting datatypes:
        – time
        – geocoordinate
        – number (and dimension)
      ● Qualifiers yet to be added
  14. Data reuse
      ● Permanent numeric identifier for all items
      ● API available (JSON)
        – but still being developed!
      ● Regular XML dumps
        – all item/property data licensed as CC-0
  15. Identifiers & authorities
      ● GND, ISNI, LCCN, ULAN, VIAF, BNF, SUDOC, CALIS, CiNii, NDL, ICCU, NLA, MusicBrainz, IMDB
      ● ISBN, ISSN, OCLC, DOI, NOR
      ● OpenStreetMap IDs
      ● Corporate, administrative, monument, chemical, gene identifiers, language codes
      ● ...and pigeon breed registries
  16. Tools
      ● Examples of toolsets:
        – GeneaWiki (visualise relations)
        – Reasonator (display interface)
        – Query API (experimental, alternative)
        – Tree of Life (static dump)
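
The "Wikidata entities" and "Data reuse" slides describe an item as a permanent identifier plus names in various languages and links to Wikipedia articles, retrievable as JSON through the API. The sketch below is one possible way to read such a record in Python, not part of the original talk. It assumes the `wbgetentities` action of the MediaWiki API on www.wikidata.org and the general shape of its JSON response; the `Item` class, the helper names and the sample payload are hypothetical and hand-written for illustration, and the API was still under development when the slides were given, so details may differ.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlencode

# Assumed endpoint: the standard MediaWiki API entry point on wikidata.org.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


@dataclass
class Item:
    """Illustrative in-memory view of one item (not an official data model)."""
    item_id: str                                   # permanent identifier, e.g. "Q42"
    labels: dict = field(default_factory=dict)     # language code -> name
    sitelinks: dict = field(default_factory=dict)  # site id -> article title


def entity_url(item_id: str) -> str:
    """Build a wbgetentities request URL for one item, asking for JSON."""
    query = urlencode({"action": "wbgetentities", "ids": item_id, "format": "json"})
    return f"{API_ENDPOINT}?{query}"


def parse_item(payload: str, item_id: str) -> Item:
    """Extract labels and Wikipedia links for one item from a JSON response."""
    raw = json.loads(payload)["entities"][item_id]
    labels = {lang: entry["value"] for lang, entry in raw.get("labels", {}).items()}
    sitelinks = {site: entry["title"] for site, entry in raw.get("sitelinks", {}).items()}
    return Item(item_id=raw["id"], labels=labels, sitelinks=sitelinks)


# Hand-written sample in the assumed response shape; not fetched from the live API.
SAMPLE_RESPONSE = json.dumps({
    "entities": {
        "Q42": {
            "id": "Q42",
            "labels": {
                "en": {"language": "en", "value": "Douglas Adams"},
                "fr": {"language": "fr", "value": "Douglas Adams"},
            },
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
            },
        }
    }
})

if __name__ == "__main__":
    item = parse_item(SAMPLE_RESPONSE, "Q42")
    print(entity_url(item.item_id))
    print(item.labels["en"], sorted(item.sitelinks))
```

Keeping the URL builder separate from the parser means the parsing step can be tried against a saved response or a dump extract without any network access.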