Introduction to Wikidata

"Introduction to Wikidata" presentation given 26th April 2013, at the British Library

Published in: Technology, Education


  1. Introduction to Wikidata
     British Library, 26/4/13
     Andrew Gray
     andrew.gray@bl.uk | @generalising
  2. Wikidata summary
     ● Central data repository for Wikimedia projects
     ● Human- and machine-readable
     ● Human- and machine-editable
     ● Fully multilingual
     ● Supports semantic relationships
     www.wikidata.org
  3. Overall plan
     ● Phase I
       – Centralise cross-language relationships
     ● Phase II
       – Centralise core structured data
     ● Phase III
       – Dynamic generation of list content
  4. Phase I
     ● Centralising all “interwiki” cross-language links
       – Historically, a major maintenance headache!
     ● Single conceptual entity => many articles
       – ...some unexpected oddities arise; not all 1:1
     ● Almost all entities now listed
     ● Inclusion standards currently restricted
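
The maintenance saving behind Phase I can be illustrated with a minimal sketch. In the old model every article carried its own links to every other language version; in the centralised model a single item holds one link per language, and each article derives its links from that item. The function names and the sample sitelink table are illustrative assumptions, not part of the Wikidata software.

```python
from typing import Dict


def links_stored_per_article(n_languages: int) -> int:
    """Old model: each of n articles lists the other n - 1 versions."""
    return n_languages * (n_languages - 1)


def links_stored_centrally(n_languages: int) -> int:
    """Centralised model: one item stores one sitelink per language."""
    return n_languages


# Illustrative item: one conceptual entity, many articles.
item_sitelinks: Dict[str, str] = {
    "enwiki": "London",
    "frwiki": "Londres",
    "dewiki": "London",
}


def interwiki_links_for(site: str, sitelinks: Dict[str, str]) -> Dict[str, str]:
    """Derive the links shown on one article from the central item."""
    return {s: title for s, title in sitelinks.items() if s != site}


print(links_stored_per_article(3), links_stored_centrally(3))
print(interwiki_links_for("enwiki", item_sitelinks))
```

With three languages the old model stores six links and the central model three; the gap grows quadratically as languages are added.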
  5. Phase I
  6. Phase I – oddities
  7. Phase II
     ● Building structured data on these entities
     ● “Phase 2.1” - harvesting data from Wikipedia
       – and supplemented from other sources
     ● “Phase 2.2” - displaying data on Wikipedia
       – autogenerated information templates
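
As a rough illustration of the “Phase 2.2” idea, the sketch below fills a plain-text information box from a dictionary of structured values. The `render_infobox` helper, the field names, and the output layout are all hypothetical; this is not how Wikipedia templates are implemented, only a way to picture one data source feeding many displays.

```python
from typing import Dict, List


def render_infobox(title: str, data: Dict[str, str], fields: List[str]) -> str:
    """Render the requested fields, silently skipping any that have no data."""
    lines = [f"== {title} =="]
    for name in fields:
        if name in data:
            lines.append(f"{name}: {data[name]}")
    return "\n".join(lines)


# Hypothetical structured data for one entity.
example_data = {"country": "Exampleland", "population": "100"}

print(render_infobox("Example", example_data, ["country", "capital", "population"]))
```

Fields with no stored value (here `capital`) are simply omitted, so the same template can be reused while the data is still being filled in.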
  8. Phase II
  9. Phase III
     ● Automatic creation of lists and charts
     ● Expected for late 2013...
  10. Wikidata entities
     ● Single entity corresponding to one or more Wikipedia articles
       – Name (in various languages) + WP links
       – Contains various Phase II properties
       – Properties can include sources/qualifiers
     ● No support (yet!) for entities not existing in WP
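
The entity structure outlined on this slide can be pictured with a small data model. This is a sketch under assumptions: the class names, field names, and the placeholder IDs `Q0` and `P0` are invented for illustration and do not reproduce Wikidata's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    """One Phase II property value, with optional qualifiers and sources."""
    property_id: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    sources: List[str] = field(default_factory=list)


@dataclass
class Item:
    """A single entity: names per language, Wikipedia links, statements."""
    item_id: str
    labels: Dict[str, str] = field(default_factory=dict)
    sitelinks: Dict[str, str] = field(default_factory=dict)
    statements: List[Statement] = field(default_factory=list)

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label in `lang`, else the fallback language, else the ID."""
        return self.labels.get(lang) or self.labels.get(fallback, self.item_id)


# Placeholder item; IDs and values are hypothetical.
item = Item(
    item_id="Q0",
    labels={"en": "Example", "fr": "Exemple"},
    sitelinks={"enwiki": "Example", "frwiki": "Exemple"},
    statements=[Statement("P0", "some value", sources=["a cited source"])],
)

print(item.label("fr"), item.label("de"))
```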
  11. Phase II – planned model
  12. Phase II – initial properties
     ● Limited properties – gradual roll-out
     Standard
     ● Single “main type”, but no restrictions on use
       – “the capital of Julius Caesar”
     ● Relational properties implemented
       – but no automatic reciprocity yet
     ● String datatypes created for identifiers
     ● 130 properties currently in use
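
Because relational properties have no automatic reciprocity, a statement in one direction does not create the reverse statement. The sketch below shows one way such gaps could be detected in a list of statements. The `INVERSE` table, the property names, and the item IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (subject item, property, object item)
Triple = Tuple[str, str, str]

# Hypothetical inverse pairs; real property definitions would differ.
INVERSE: Dict[str, str] = {"child": "parent", "parent": "child"}


def missing_reciprocals(triples: List[Triple]) -> List[Triple]:
    """List the reverse statements that are implied but not present."""
    existing = set(triples)
    missing: List[Triple] = []
    for subject, prop, obj in triples:
        inverse = INVERSE.get(prop)
        if inverse is None:
            continue  # property has no known inverse
        wanted = (obj, inverse, subject)
        if wanted not in existing and wanted not in missing:
            missing.append(wanted)
    return missing


statements = [
    ("Q_A", "child", "Q_B"),
    ("Q_B", "parent", "Q_A"),  # reciprocal already present
    ("Q_A", "child", "Q_C"),   # reciprocal missing
]

print(missing_reciprocals(statements))
```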
  13. Phase II – future properties
     ● Properties created by community discussion
     ● Several awaiting datatypes:
       – time
       – geocoordinate
       – number (and dimension)
     ● Qualifiers yet to be added
  14. Data reuse
     ● Permanent numeric identifier for all items
     ● API available (JSON)
       – but still being developed!
     ● Regular XML dumps – dumps.wikimedia.org
       – all item/property data licensed as CC-0
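
A minimal sketch of reading item data as JSON follows. It builds a request URL for the `wbgetentities` API action and extracts an English label from a response. The sample response is hand-written and heavily trimmed, and its shape is an assumption; since the API is described as still being developed, the real output may differ. No network request is made here.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_url(item_id: str) -> str:
    """Build a URL asking for one item's data in JSON."""
    params = {"action": "wbgetentities", "ids": item_id, "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def english_label(response_text: str, item_id: str) -> Optional[str]:
    """Pull the English label out of a response, or None if absent."""
    data = json.loads(response_text)
    entity = data.get("entities", {}).get(item_id, {})
    return entity.get("labels", {}).get("en", {}).get("value")


# Hand-written, trimmed sample; the response shape is an assumption.
SAMPLE_RESPONSE = json.dumps({
    "entities": {
        "Q42": {
            "id": "Q42",
            "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
        }
    }
})

print(entity_url("Q42"))
print(english_label(SAMPLE_RESPONSE, "Q42"))
```

Fetching the URL with any HTTP client and passing the body to `english_label` would be the obvious next step; missing items or labels yield `None` rather than an exception.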
  15. Identifiers & authorities
     ● GND, ISNI, LCCN, ULAN, VIAF, BNF, SUDOC, CALIS, CiNii, NDL, ICCU, NLA, MusicBrainz, IMDB
     ● ISBN, ISSN, OCLC, DOI, NOR
     ● OpenStreetMap IDs
     ● Corporate, administrative, monument, chemical, gene identifiers, language codes
     ● ...and pigeon breed registries
  16. Tools
     ● Examples of toolsets:
       – GeneaWiki (visualise relations)
       – Reasonator (display interface)
       – Query API (experimental, alternative)
       – Tree of Life (static dump)
