Up and running with Wikidata

Emw 
New York City Wikidata workshop 
2014-12-14
Wikidata is a free linked database that can be read and edited by humans and machines.
Wikidata's goals 
● Centralize interwiki links 
● Centralize infoboxes 
● Provide an interface for rich queries 
● Structure the sum of all human knowledge
What you'll learn from this talk 
● How to edit Wikidata 
● How to classify 
● Ideas for small projects 
● Wikidata vocabulary 
● Where to find things 
● Awesome tools
Elements of a Wikidata statement
Example: New York City (Q60)
Items and properties 
● Each item and property has its own page 
● Items 
– Represent subjects: Douglas Adams, Challenger disaster 
– Have identifiers like Q42, Q921090 
– 12,906,291 items 
● Properties 
– Represent attribute names: occupation, has cause 
– Have identifiers like P106, P828 
– 1,329 properties
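A minimal sketch of how the two identifier kinds map to pages, assuming the usual wikidata.org URL layout (items at /wiki/Q…, properties in the "Property:" namespace). The function name entity_url is hypothetical, chosen for illustration only.

```python
import re

# "Q" (item) or "P" (property), followed by a positive integer.
_ID_PATTERN = re.compile(r"^([QP])([1-9][0-9]*)$")


def entity_url(entity_id: str) -> str:
    """Return the wikidata.org page URL for an item or property ID."""
    match = _ID_PATTERN.match(entity_id)
    if match is None:
        raise ValueError(f"not a Wikidata item or property ID: {entity_id!r}")
    if match.group(1) == "Q":
        return f"https://www.wikidata.org/wiki/{entity_id}"
    # Property pages live in the "Property:" namespace.
    return f"https://www.wikidata.org/wiki/Property:{entity_id}"


print(entity_url("Q42"))   # Douglas Adams
print(entity_url("P106"))  # occupation
```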
Statements and claims 
● Claims 
– Claims are “triplets” 
● Formally: subject, predicate, object 
● In Wikidata: item, property, value 
● Example: Douglas Adams, occupation, author 
● Statements 
– A claim is only part of a statement 
– Statements also include: 
● References 
● Ranks
Qualifiers, ranks, references 
● Qualifiers 
– Qualifiers are properties used on claims rather than items 
– “Yonkers population 12,733 at time (P585) 1860” 
● Ranks 
– Preferred, normal, deprecated 
– Useful to mark outdated claims 
● References 
– Source of claim; provenance 
– “... stated in (P248) 1860 United States Census”
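The vocabulary above can be sketched as a small data structure. This is an illustrative model only, not Wikidata's actual storage format: the class and field names are assumptions, values are simplified to strings, and the reference is stored by label because the slides do not give an item ID for the census.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

RANKS = ("preferred", "normal", "deprecated")


@dataclass
class Claim:
    """An item/property/value triple, optionally refined by qualifiers."""
    item: str
    prop: str
    value: str
    # Each qualifier is a (property, value) pair attached to the claim.
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Statement:
    """A claim plus its rank and its references."""
    claim: Claim
    rank: str = "normal"
    references: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank!r}")


# "Yonkers population 12,733 at time (P585) 1860,
#  stated in (P248) 1860 United States Census"
yonkers_1860 = Statement(
    claim=Claim(
        item="Q128114",            # Yonkers, New York
        prop="P1082",              # population
        value="12733",
        qualifiers=[("P585", "1860")],
    ),
    references=[("P248", "1860 United States Census")],
)
print(yonkers_1860.rank, yonkers_1860.claim.qualifiers)
```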
More on Wikidata vocabulary 
https://www.wikidata.org/wiki/Wikidata:Glossary
Finding Wikidata items 
Wikipedia articles have a Wikidata item link in the 
left navigation panel.
Finding Wikidata items 
Wikidata search is quick and effective. 
Instant search suggests items that have labels or 
aliases matching your keyword.
Search by label
Search by alias: “flu” -> influenza
Finding properties 
● Is there a property for “number of windows”? 
● What was the ID of that property, again? 
● Search 
– In main site search box, prefix search term with “P:” 
– “P:number of”, “P:occupation” 
– Instant search doesn't work for properties, only items 
● Browse 
– https://www.wikidata.org/wiki/Wikidata:List_of_properties 
^ bookmark this!
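Machines can search too. The sketch below only builds a request URL for the MediaWiki API module wbsearchentities, which matches labels and aliases and accepts a type of "item" or "property". It makes no network call; the function name search_url is hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def search_url(term: str, entity_type: str = "item", language: str = "en") -> str:
    """Build a wbsearchentities request URL for items or properties."""
    if entity_type not in ("item", "property"):
        raise ValueError(f"unsupported entity type: {entity_type!r}")
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": entity_type,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(search_url("flu"))                      # item search by label or alias
print(search_url("number of", "property"))    # property search
```

Fetching the URL returns JSON whose matches carry the Q or P identifier, which answers "What was the ID of that property, again?" without leaving a script.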
Let's edit Wikidata.
Walking through edits for: 
Yonkers, New York 
https://www.wikidata.org/wiki/Q128114
Yonkers TODO 
https://www.wikidata.org/wiki/Q128114 
● population (P1082) claims for historical table in 
https://en.wikipedia.org/wiki/Yonkers,_New_York#Demographics 
● Include references! “1860 United States Census”, etc. 
● To add to item from infobox: 
– head of government (P6), office held by head of government (P1313) 
– date of foundation or creation (P571) 
– ZIP code (P281)
Area? Population density? 
● Properties with units (km^2, people/km^2, $) 
are not yet possible 
● “Units” datatype in development 
● https://phabricator.wikimedia.org/T65722
Tools 
– Querying: Autolist, by Magnus Manske 
● http://tools.wmflabs.org/autolist/autolist1.html 
– Batch editing: Widar, by Magnus Manske 
● https://tools.wmflabs.org/autolist/ 
– Software framework: Wikidata Toolkit, by Markus Kroetzsch et al. 
● https://www.mediawiki.org/wiki/Wikidata_Toolkit 
● https://github.com/Wikidata/Wikidata-Toolkit
Querying in Wikidata 
List of politicians who died of a heart attack 
Pseudo-query: 
occupation: politician AND cause of death: heart attack 
occupation: P106 
politician: Q82955 
cause of death: P509 
heart attack: Q12152 
Wikidata query in Autolist: 
claim[106:82955] AND claim[509:12152]
http://tools.wmflabs.org/autolist/autolist1.html?q=claim[106:82955]%20AND%20claim[509:12152]
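The translation from property and item IDs to an Autolist query can be sketched in a few lines. The helper names claim_query and autolist_link are hypothetical; the query syntax and base URL are the ones shown on the slide.

```python
from urllib.parse import quote

AUTOLIST_URL = "http://tools.wmflabs.org/autolist/autolist1.html"


def claim_query(pairs):
    """Join (property ID, item ID) pairs into a claim[...] AND claim[...] query."""
    parts = []
    for prop, item in pairs:
        # Autolist drops the "P" and "Q" prefixes inside claim[...].
        parts.append(f"claim[{prop[1:]}:{item[1:]}]")
    return " AND ".join(parts)


def autolist_link(query):
    """Percent-encode the query (spaces become %20) and append it to the tool URL."""
    return AUTOLIST_URL + "?q=" + quote(query, safe="[]:")


query = claim_query([
    ("P106", "Q82955"),  # occupation: politician
    ("P509", "Q12152"),  # cause of death: heart attack
])
print(query)
print(autolist_link(query))
```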
Classification on Wikidata 
● Taxonomy of knowledge 
● Enables powerful inference, novel applications 
● Interesting philosophical, design, and engineering issues
Tree of Porphyry 
User:VoiceOfTheCommons, CC-BY-SA 3.0
Classes and instances 
● Plato is a human is an animal 
● Plato instance of human subclass of animal 
● Instance: concrete object, individual 
● Class: abstract object
Classification on Wikidata 
● instance of (P31) 
– rdf:type in RDF and OWL 
– 11,930,243 usages 
– Most popular Wikidata property 
● subclass of (P279) 
– “all instances of A are also instances of B” 
– rdfs:subClassOf in RDF and OWL 
– 170,571 usages
Examples 
● USS Nimitz instance of Nimitz-class aircraft carrier 
Nimitz-class aircraft carrier subclass of aircraft carrier 
● 2012 Cannes Film Festival instance of Cannes Film Festival 
Cannes Film Festival subclass of film festival 
● an individual charm quark instance of charm quark 
charm quark subclass of quark 
^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances. 
(There are no items about individual quarks on Wikidata!) 
https://www.wikidata.org/wiki/Help:Basic_membership_properties
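A rough sketch of the inference these two properties enable: start from an item's instance of claims, then follow subclass of upward. The dictionaries hold only the examples from this slide, keyed by label rather than Q-ID for readability; the function name classes_of is hypothetical.

```python
# instance of (P31) claims from the examples
INSTANCE_OF = {
    "USS Nimitz": ["Nimitz-class aircraft carrier"],
    "2012 Cannes Film Festival": ["Cannes Film Festival"],
}

# subclass of (P279) claims from the examples
SUBCLASS_OF = {
    "Nimitz-class aircraft carrier": ["aircraft carrier"],
    "Cannes Film Festival": ["film festival"],
    "charm quark": ["quark"],
}


def classes_of(item):
    """Every class the item belongs to, directly or through subclass of."""
    found = []
    queue = list(INSTANCE_OF.get(item, []))
    while queue:
        cls = queue.pop(0)
        if cls in found:
            continue
        found.append(cls)
        queue.extend(SUBCLASS_OF.get(cls, []))
    return found


print(classes_of("USS Nimitz"))
# ['Nimitz-class aircraft carrier', 'aircraft carrier']
print(classes_of("charm quark"))
# [] because charm quark is a class (a leaf node), not an instance
```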
Bad smells 
Item has many instance of or subclass of claims 
Items typically satisfy a huge number of instance of claims: 
● Fido instance of dog 
● Fido instance of English Pointer 
● Fido instance of faithful animal 
● … 
Solution: use one class for instance of, put other class 
knowledge into normal properties 
● Fido instance of dog 
● Fido breed: English Pointer 
● Fido known for: faithfulness 
● ...
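This smell is easy to check for mechanically. The sketch below counts instance of (P31) claims per item and flags items above a threshold; the threshold of one, the function name, and the tuple layout are all assumptions for illustration, and "breed" and "known for" are written as plain labels because the slide gives no property IDs for them.

```python
def too_many_instance_of(claims, limit=1):
    """Return items whose number of instance of (P31) claims exceeds limit."""
    counts = {}
    for item, prop, _value in claims:
        if prop == "P31":
            counts[item] = counts.get(item, 0) + 1
    return sorted(item for item, n in counts.items() if n > limit)


smelly = [
    ("Fido", "P31", "dog"),
    ("Fido", "P31", "English Pointer"),
    ("Fido", "P31", "faithful animal"),
]
cleaned = [
    ("Fido", "P31", "dog"),
    ("Fido", "breed", "English Pointer"),
    ("Fido", "known for", "faithfulness"),
]
print(too_many_instance_of(smelly))   # ['Fido']
print(too_many_instance_of(cleaned))  # []
```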
Bad smells 
subclass of claim that is nonsensical when interpreted as “All 
instances of A are also instances of B” 
Example: 
dog subclass of pet 
But not all dogs are pets! 
feral dog subclass of dog true 
feral dog subclass of pet false 
:. dog subclass of pet false 
Solution: put “pet” knowledge about dogs into claim that does not 
apply to all instances of dog. E.g. “dog has role pet”. (Has role 
would not be transitive. Also needed: some/all quantifier.)
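The argument above can be replayed in code: because subclass of is transitive, adding "dog subclass of pet" silently makes every subclass of dog a subclass of pet. This is a sketch over a toy graph containing only the classes named on this slide; is_subclass is a hypothetical helper.

```python
def is_subclass(subclass_of, a, b):
    """True if class a reaches class b by following subclass of claims."""
    seen = set()
    stack = [a]
    while stack:
        cls = stack.pop()
        for parent in subclass_of.get(cls, []):
            if parent == b:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


without_claim = {"feral dog": ["dog"]}
with_claim = {"feral dog": ["dog"], "dog": ["pet"]}

print(is_subclass(without_claim, "feral dog", "pet"))  # False
print(is_subclass(with_claim, "feral dog", "pet"))     # True, which is wrong
```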
Classification on Wikidata 
● Last but not least: part of (P361) 
– Third basic membership property 
– Top-level “part-whole” relation 
● Subclass of and part of are transitive; instance of is not, although an instance of A is also an instance of every class that A is a subclass of 
● Transitive relation: 
A subclass of B 
B subclass of C 
:. A subclass of C 
https://www.wikidata.org/wiki/Help:Basic_membership_properties
Ideas for small projects 
● Add data about towns and cities 
– population (P1082) 
– head of government (P6) 
● Add medical knowledge about historical figures 
– medical condition (P1050) 
– cause of death (P509) 
– manner of death (P1196) 
● Add cultural knowledge about works of art 
– instance of (P31) 
– creator (P170) 
– material used (P186) 
– collection (P195)