An introduction to Wikidata presented on December 14, 2014 to Wikimedia New York City at the Brooklyn Law Incubator & Policy (BLIP) facility.
Contains minor edits and corrections from presentation.
Released under CC0.
1. Up and running with Wikidata
Emw
New York City Wikidata workshop
2014-12-14
2. Wikidata is a free linked database that can be read and edited by humans and machines.
3. Wikidata's goals
● Centralize interwiki links
● Centralize infoboxes
● Provide an interface for rich queries
● Structure the sum of all human knowledge
4. What you'll learn from this talk
● How to edit Wikidata
● How to classify
● Ideas for small projects
● Wikidata vocabulary
● Where to find things
● Awesome tools
7. Items and properties
● Each item and property has its own page
● Items
– Represent subjects: Douglas Adams, Challenger disaster
– Have identifiers like Q42, Q921090
– 12,906,291 items
● Properties
– Represent attribute names: occupation, has cause
– Have identifiers like P106, P828
– 1,329 properties
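Because each item and property has its own page, the page address can be derived from the identifier alone. A minimal sketch, assuming the URL layout used on wikidata.org (items directly under /wiki/, properties in the "Property:" namespace); the helper name entity_url is made up for this example.

```python
import re

WIKIDATA_BASE = "https://www.wikidata.org/wiki/"


def entity_url(entity_id: str) -> str:
    """Return the wikidata.org page URL for an item (Q...) or property (P...) ID."""
    if re.fullmatch(r"Q[1-9]\d*", entity_id):
        return WIKIDATA_BASE + entity_id
    if re.fullmatch(r"P[1-9]\d*", entity_id):
        # Assumption: properties live in their own "Property:" namespace.
        return WIKIDATA_BASE + "Property:" + entity_id
    raise ValueError(f"not a Wikidata item or property ID: {entity_id!r}")


print(entity_url("Q42"))   # page for the item Douglas Adams
print(entity_url("P106"))  # page for the property occupation
```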
8. Statements and claims
● Claims
– Claims are “triplets”
● Formally: subject, predicate, object
● In Wikidata: item, property, value
● Example: Douglas Adams, occupation, author
● Statements
– A claim is only part of a statement
– Statements also include:
● References
● Ranks
9. Qualifiers, ranks, references
● Qualifiers
– Qualifiers are properties used on claims rather than items
– “Yonkers population 12,733 at time (P585) 1860”
● Ranks
– Preferred, normal, deprecated
– Useful to mark outdated claims
● References
– Source of claim; provenance
– “... stated in (P248) 1860 United States Census”
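To make the vocabulary concrete, here is one way the Yonkers example could be modeled in Python. This is only an illustrative sketch: the class and field names (Claim, Statement, Rank) are invented for this example and are not Wikidata's actual data format. The identifiers are the ones used in these slides (Q128114 for Yonkers, P1082 for population, P585, P248).

```python
from dataclasses import dataclass, field
from enum import Enum


class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"


@dataclass
class Claim:
    item: str    # subject, e.g. "Q128114" (Yonkers)
    prop: str    # predicate, e.g. "P1082" (population)
    value: object
    # Qualifiers are properties attached to the claim rather than to the item.
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Statement:
    claim: Claim
    references: list = field(default_factory=list)  # provenance of the claim
    rank: Rank = Rank.NORMAL


yonkers_1860 = Statement(
    claim=Claim(
        item="Q128114",
        prop="P1082",
        value=12733,
        qualifiers={"P585": "1860"},
    ),
    references=[{"P248": "1860 United States Census"}],
)

print(yonkers_1860.claim.value, yonkers_1860.rank.value)
```

A later census figure could be added as a second Statement with rank Rank.PREFERRED, leaving the 1860 figure at normal rank.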
10. More on Wikidata vocabulary
https://www.wikidata.org/wiki/Wikidata:Glossary
11. Finding Wikidata items
Wikipedia articles have a Wikidata item link in the
left navigation panel.
13. Finding Wikidata items
Wikidata search is quick and effective.
Instant search suggests items that have labels or
aliases matching your keyword.
16. Finding properties
● Is there a property for “number of windows”?
● What was the ID of that property, again?
● Search
– In main site search box, prefix search term with “P:”
– “P:number of”, “P:occupation”
– Instant search doesn't work for properties, only items
● Browse
– https://www.wikidata.org/wiki/Wikidata:List_of_properties
^ bookmark this!
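Property search can also be done programmatically. The sketch below only builds a request URL for the wbsearchentities module of the Wikidata web API with type set to property; it does not send the request. The parameter names are given from memory and should be checked against the API documentation, and the function name property_search_url is made up for this example.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def property_search_url(term: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities URL that searches properties instead of items."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "property",  # restrict results to properties
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(property_search_url("number of"))
print(property_search_url("occupation"))
```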
19. Yonkers TODO
https://www.wikidata.org/wiki/Q128114
● population (P1082) claims for historical table in
https://en.wikipedia.org/wiki/Yonkers,_New_York#Demographics
● Include references! “1860 United States Census”, etc.
● To add to item from infobox:
– head of government (P6), office held by head of government (P1313)
– date of foundation or creation (P571)
– ZIP code (P281)
20. Area? Population density?
● Properties with units (km^2, people/km^2, $)
are not yet possible
● “Units” datatype in development
● https://phabricator.wikimedia.org/T65722
21. Tools
– Querying: Autolist, by Magnus Manske
● http://tools.wmflabs.org/autolist/autolist1.html
– Batch editing: Widar, by Magnus Manske
● https://tools.wmflabs.org/autolist/
– Software framework: Wikidata Toolkit, by Markus Kroetzsch et al.
● https://www.mediawiki.org/wiki/Wikidata_Toolkit
● https://github.com/Wikidata/Wikidata-Toolkit
22. Querying in Wikidata
List of politicians who died of a heart attack
Pseudo-query:
occupation: politician AND cause of death: heart attack
occupation: P106
politician: Q82955
cause of death: P509
heart attack: Q12152
Wikidata query in Autolist:
claim[106:82955] AND claim[509:12152]
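The translation from the pseudo-query to the Autolist syntax is mechanical: drop the "P" and "Q" prefixes and join the clauses with AND. A small sketch of that step; the helper name autolist_query is made up for this example, and it covers only the claim[...] AND claim[...] form shown on this slide.

```python
def autolist_query(*pairs: tuple[str, str]) -> str:
    """Join (property ID, item ID) pairs into a claim[...] AND claim[...] query."""
    clauses = []
    for prop, item in pairs:
        # The query syntax uses bare numbers, so strip the leading "P" and "Q".
        clauses.append(f"claim[{prop[1:]}:{item[1:]}]")
    return " AND ".join(clauses)


OCCUPATION, POLITICIAN = "P106", "Q82955"
CAUSE_OF_DEATH, HEART_ATTACK = "P509", "Q12152"

query = autolist_query((OCCUPATION, POLITICIAN), (CAUSE_OF_DEATH, HEART_ATTACK))
print(query)  # claim[106:82955] AND claim[509:12152]
```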
27. Classes and instances
● Plato is a human is an animal
● Plato instance of human subclass of animal
● Instance: concrete object, individual
● Class: abstract object
28. Classification on Wikidata
● instance of (P31)
– rdf:type in RDF and OWL
– 11,930,243 usages
– Most popular Wikidata property
● subclass of (P279)
– “all instances of A are also instances of B”
– rdfs:subClassOf in RDF and OWL
– 170,571 usages
29. Examples
● USS Nimitz instance of Nimitz-class aircraft carrier
Nimitz-class aircraft carrier subclass of aircraft carrier
● 2012 Cannes Film Festival instance of Cannes Film Festival
Cannes Film Festival subclass of film festival
● an individual charm quark instance of charm quark
charm quark subclass of quark
^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances.
(There are no items about individual quarks on Wikidata!)
https://www.wikidata.org/wiki/Help:Basic_membership_properties
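The examples above can be read as a small graph: one instance of link from an individual to its class, followed by a chain of subclass of links. A toy sketch using only the labels from this slide; the dictionaries are hand-written rather than fetched from Wikidata, and each class is assumed to have at most one parent, which is a simplification.

```python
# instance of: individual -> its most specific class
INSTANCE_OF = {
    "USS Nimitz": "Nimitz-class aircraft carrier",
    "2012 Cannes Film Festival": "Cannes Film Festival",
}

# subclass of: class -> broader class (one parent per class in this toy data)
SUBCLASS_OF = {
    "Nimitz-class aircraft carrier": "aircraft carrier",
    "Cannes Film Festival": "film festival",
    "charm quark": "quark",
}


def is_instance_of(item: str, cls: str) -> bool:
    """True if item is an instance of cls, directly or via subclass of links."""
    current = INSTANCE_OF.get(item)
    seen = set()  # guards against accidental cycles in the data
    while current is not None and current not in seen:
        if current == cls:
            return True
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return False


print(is_instance_of("USS Nimitz", "aircraft carrier"))
print(is_instance_of("charm quark", "quark"))
```

The second call returns False because charm quark is a class (a subclass of quark), not an instance.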
30. Bad smells
Item has many instance of or subclass of claims
Items typically satisfy a huge number of instance of claims:
● Fido instance of dog
● Fido instance of English Pointer
● Fido instance of faithful animal
● …
Solution: use one class for instance of, put other class
knowledge into normal properties
● Fido instance of dog
● Fido breed: English Pointer
● Fido known for: faithfulness
● ...
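This smell is easy to check for mechanically. The sketch below flags items carrying more than one instance of (P31) claim, using the Fido example before and after the fix. The threshold of one and the function name are assumptions made for illustration, and "breed" and "known for" are written as plain labels because the slide gives no property IDs for them.

```python
from collections import defaultdict


def items_with_many_classes(claims, prop="P31", max_allowed=1):
    """Map each item with more than max_allowed claims for prop to those values."""
    values_by_item = defaultdict(list)
    for item, claim_prop, value in claims:
        if claim_prop == prop:
            values_by_item[item].append(value)
    return {
        item: values
        for item, values in values_by_item.items()
        if len(values) > max_allowed
    }


before = [
    ("Fido", "P31", "dog"),
    ("Fido", "P31", "English Pointer"),
    ("Fido", "P31", "faithful animal"),
]
after = [
    ("Fido", "P31", "dog"),
    ("Fido", "breed", "English Pointer"),
    ("Fido", "known for", "faithfulness"),
]

print(items_with_many_classes(before))
print(items_with_many_classes(after))
```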
31. Bad smells
subclass of claim that is nonsensical when interpreted as “All
instances of A are also instances of B”
Example:
dog subclass of pet
But not all dogs are pets!
feral dog subclass of dog: true
feral dog subclass of pet: false
:. dog subclass of pet: false
Solution: put “pet” knowledge about dogs into claim that does not
apply to all instances of dog. E.g. “dog has role pet”. (Has role
would not be transitive. Also needed: some/all quantifier.)
32. Classification on Wikidata
● Last but not least: part of (P361)
– Third basic membership property
– Top-level “part-whole” relation
● Subclass of and part of are transitive; instance of is not (Plato instance of human, human instance of species, but Plato is not a species)
● Transitive relation:
A subclass of B
B subclass of C
:. A subclass of C
https://www.wikidata.org/wiki/Help:Basic_membership_properties
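Transitivity means that all the broader classes of a class can be found by following subclass of links repeatedly. A minimal sketch over a hand-written mapping; the function name all_superclasses is made up for this example, and classes may have several parents.

```python
def all_superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subclass of links."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:  # also prevents looping on cyclic data
                found.add(parent)
                stack.append(parent)
    return found


# A subclass of B, B subclass of C
SUBCLASS_OF = {"A": {"B"}, "B": {"C"}}

print(sorted(all_superclasses("A", SUBCLASS_OF)))  # ['B', 'C']
```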
33. Ideas for small projects
● Add data about towns and cities
– population (P1082)
– head of government (P6)
● Add medical knowledge about historical figures
– medical condition (P1050)
– cause of death (P509)
– manner of death (P1196)
● Add cultural knowledge about works of art
– instance of (P31)
– creator (P170)
– material used (P186)
– collection (P195)