Up and running with Wikidata

An introduction to Wikidata presented on December 14, 2014 to Wikimedia New York City at the Brooklyn Law Incubator & Policy (BLIP) facility.

Contains minor edits and corrections from presentation.

Released under CC0.

Published in: Data & Analytics

  1. 1. Up and running with Wikidata Emw New York City Wikidata workshop 2014-12-14
  2. 2. Wikidata is a free linked database that can be read and edited by humans and machines.
  3. 3. Wikidata's goals
      ● Centralize interwiki links
      ● Centralize infoboxes
      ● Provide an interface for rich queries
      ● Structure the sum of all human knowledge
  4. 4. What you'll learn from this talk
      ● How to edit Wikidata
      ● How to classify
      ● Ideas for small projects
      ● Wikidata vocabulary
      ● Where to find things
      ● Awesome tools
  5. 5. Elements of a Wikidata statement
  6. 6. Example: New York City (Q60)
  7. 7. Items and properties
      ● Each item and property has its own page
      ● Items
        – Represent subjects: Douglas Adams, Challenger disaster
        – Have identifiers like Q42, Q921090
        – 12,906,291 items
      ● Properties
        – Represent attribute names: occupation, has cause
        – Have identifiers like P106, P828
        – 1,329 properties
  8. 8. Statements and claims
      ● Claims
        – Claims are “triplets”
          ● Formally: subject, predicate, object
          ● In Wikidata: item, property, value
          ● Example: Douglas Adams, occupation, author
      ● Statements
        – A claim is only part of a statement
        – Statements also include:
          ● References
          ● Ranks
  9. 9. Qualifiers, ranks, references
      ● Qualifiers
        – Qualifiers are properties used on claims rather than items
        – “Yonkers population 12,733 at time (P585) 1860”
      ● Ranks
        – Preferred, normal, deprecated
        – Useful to mark outdated claims
      ● References
        – Source of claim; provenance
        – “... stated in (P248) 1860 United States Census”
  10. 10. More on Wikidata vocabulary https://www.wikidata.org/wiki/Wikidata:Glossary
  11. 11. Finding Wikidata items Wikipedia articles have a Wikidata item link in the left navigation panel.
  12. 12. Finding Wikidata items Wikidata search is quick and effective. Instant search suggests items that have labels or aliases matching your keyword.
  13. 13. Search by label
  14. 14. Search by alias: “flu” -> influenza
  15. 15. Finding properties
      ● Is there a property for “number of windows”?
      ● What was the ID of that property, again?
      ● Search
        – In main site search box, prefix search term with “P:”
        – “P:number of”, “P:occupation”
        – Instant search doesn't work for properties, only items
      ● Browse
        – https://www.wikidata.org/wiki/Wikidata:List_of_properties
          ^ bookmark this!
  16. 16. Let's edit Wikidata.
  17. 17. Walking through edits for: Yonkers, New York https://www.wikidata.org/wiki/Q128114
  18. 18. Yonkers TODO
      https://www.wikidata.org/wiki/Q128114
      ● population (P1082) claims for historical table in https://en.wikipedia.org/wiki/Yonkers,_New_York#Demographics
      ● Include references! “1860 United States Census”, etc.
      ● To add to item from infobox:
        – head of government (P6), office held by head of government (P1313)
        – date of foundation or creation (P571)
        – ZIP code (P281)
  19. 19. Area? Population density?
      ● Properties with units (km^2, people/km^2, $) are not yet possible
      ● “Units” datatype in development
      ● https://phabricator.wikimedia.org/T65722
  20. 20. Tools
      – Querying: Autolist, by Magnus Manske
        ● http://tools.wmflabs.org/autolist/autolist1.html
      – Batch editing: Widar, by Magnus Manske
        ● https://tools.wmflabs.org/autolist/
      – Software framework: Wikidata Toolkit, by Markus Kroetzsch et al.
        ● https://www.mediawiki.org/wiki/Wikidata_Toolkit
        ● https://github.com/Wikidata/Wikidata-Toolkit
  21. 21. Querying in Wikidata
      List of politicians who died of a heart attack
      Pseudo-query: occupation: politician AND cause of death: heart attack
        occupation: P106
        politician: Q82955
        cause of death: P509
        heart attack: Q12152
      Wikidata query in Autolist: claim[106:82955] AND claim[509:12152]
  22. 22. http://tools.wmflabs.org/autolist/autolist1.html?q=claim[106:82955]%20AND%20claim[509:12152]
  23. 23. Classification on Wikidata
      ● Taxonomy of knowledge
      ● Enables powerful inference, novel applications
      ● Interesting philosophical, design, and engineering issues
  24. 24. Tree of Porphyry User:VoiceOfTheCommons, CC-BY-SA 3.0
  25. 25. Classes and instances
      ● Plato is a human is an animal
      ● Plato instance of human subclass of animal
      ● Instance: concrete object, individual
      ● Class: abstract object
  26. 26. Classification on Wikidata
      ● instance of (P31)
        – rdf:type in RDF and OWL
        – 11,930,243 usages
        – Most popular Wikidata property
      ● subclass of (P279)
        – “all instances of A are also instances of B”
        – rdfs:subClassOf in RDF and OWL
        – 170,571 usages
  27. 27. Examples
      ● USS Nimitz instance of Nimitz-class aircraft carrier
        Nimitz-class aircraft carrier subclass of aircraft carrier
      ● 2012 Cannes Film Festival instance of Cannes Film Festival
        Cannes Film Festival subclass of film festival
      ● an individual charm quark instance of charm quark
        charm quark subclass of quark
        ^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances. (There are no items about individual quarks on Wikidata!)
      https://www.wikidata.org/wiki/Help:Basic_membership_properties
  28. 28. Bad smells
      Item has many instance of or subclass of claims
      Items typically satisfy a huge number of instance of claims:
      ● Fido instance of dog
      ● Fido instance of English Pointer
      ● Fido instance of faithful animal
      ● …
      Solution: use one class for instance of, put other class knowledge into normal properties
      ● Fido instance of dog
      ● Fido breed: English Pointer
      ● Fido known for: faithfulness
      ● ...
  29. 29. Bad smells
      subclass of claim that is nonsensical when interpreted as “All instances of A are also instances of B”
      Example: dog subclass of pet
      But not all dogs are pets!
        feral dog subclass of dog: true
        feral dog subclass of pet: false
        Therefore dog subclass of pet: false
      Solution: put “pet” knowledge about dogs into claim that does not apply to all instances of dog. E.g. “dog has role pet”. (Has role would not be transitive. Also needed: some/all quantifier.)
  30. 30. Classification on Wikidata
      ● Last but not least: part of (P361)
        – Third basic membership property
        – Top-level “part-whole” relation
      ● Subclass of and part of are transitive; instance of is not, although an instance of A is also an instance of every class that A is a subclass of
      ● Transitive relation:
        A subclass of B
        B subclass of C
        Therefore A subclass of C
      https://www.wikidata.org/wiki/Help:Basic_membership_properties
  31. 31. Ideas for small projects
      ● Add data about towns and cities
        – population (P1082)
        – head of government (P6)
      ● Add medical knowledge about historical figures
        – medical condition (P1050)
        – cause of death (P509)
        – manner of death (P1196)
      ● Add cultural knowledge about works of art
        – instance of (P31)
        – creator (P170)
        – material used (P186)
        – collection (P195)
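
The classification slides (25 to 30) describe inference over instance of (P31) and subclass of (P279) claims. A minimal sketch of that inference in Python, assuming a toy in-memory list of (item, property, value) triples that uses labels instead of real Q-identifiers, might look like the following. The function names are hypothetical and not part of any Wikidata tool, and the "aircraft carrier subclass of ship" claim is an illustrative assumption added only so that the chain has two subclass of links.

```python
from collections import defaultdict

# Toy claims as (item, property, value) triples, with labels standing in
# for Q-identifiers. P31 = instance of, P279 = subclass of.
CLAIMS = [
    ("USS Nimitz", "P31", "Nimitz-class aircraft carrier"),
    ("Nimitz-class aircraft carrier", "P279", "aircraft carrier"),
    ("aircraft carrier", "P279", "ship"),  # assumption, not from the slides
    ("Plato", "P31", "human"),
    ("human", "P279", "animal"),
]


def index_claims(claims, prop):
    """Map each item to the set of values it has for one property."""
    index = defaultdict(set)
    for item, claim_prop, value in claims:
        if claim_prop == prop:
            index[item].add(value)
    return index


def superclasses(cls, subclass_of):
    """Every class reachable from cls by following subclass of links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_classes(item, instance_of, subclass_of):
    """Direct instance of classes of an item plus all their superclasses."""
    classes = set()
    for cls in instance_of.get(item, ()):
        classes.add(cls)
        classes |= superclasses(cls, subclass_of)
    return classes


INSTANCE_OF = index_claims(CLAIMS, "P31")
SUBCLASS_OF = index_claims(CLAIMS, "P279")

if __name__ == "__main__":
    print(sorted(inferred_classes("USS Nimitz", INSTANCE_OF, SUBCLASS_OF)))
    print(sorted(inferred_classes("Plato", INSTANCE_OF, SUBCLASS_OF)))
```

Only subclass of links are followed transitively here; an item picks up extra classes solely through the classes it is a direct instance of, which matches the "all instances of A are also instances of B" reading of subclass of on slide 26.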
