Cooking the big soup
https://commons.wikimedia.org/wiki/File:Wikidata-logo-en.svg
Sebastian Burgstaller-Muehlbacher
Introduction
● Single-value edits are simple, thanks to the Wikidata web interface.
● How to easily mass import data into Wikidata?
● Answer: Use Bots!
● Combine Wikidata API and query endpoints.
● Python as preferred language.
PBB_core
● Data silo
● Resource specific code
– Get data from silo
– Clean data
– Make silo to Wikidata mapping
● PBB_core
– Take mapped data
– Look up in WD whether the item already exists
– Throw exception if inconsistencies occur
– Construct or modify a WD item JSON object
● Auxiliary classes
– Provide logging capabilities
– Provide WD login infrastructure
– Provide settings
1. Get data and map to WD
2. Log in to WD
3. Provide PBB_core with data
4. Request write to WD
What does an item look like, really?
https://goo.gl/Ndbcd4
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q423111
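To make the item structure concrete, here is a minimal sketch that builds the same `wbgetentities` request and summarizes the returned JSON. The helper names (`entity_url`, `fetch_entity`, `summarize`), the `User-Agent` value, and the hand-written `sample` entity are assumptions for illustration; only the API URL and the `action`/`ids` parameters come from the slide.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build the wbgetentities URL for one item, asking for JSON output."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return API_URL + "?" + urlencode(params)


def fetch_entity(qid):
    """Download the item JSON (needs network access; not called below)."""
    # The User-Agent string is a placeholder, not a registered bot name.
    request = Request(entity_url(qid), headers={"User-Agent": "example-bot/0.1"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["entities"][qid]


def summarize(entity, lang="en"):
    """Return the label in `lang` and the number of statements per property."""
    label = entity.get("labels", {}).get(lang, {}).get("value")
    counts = {prop: len(stmts) for prop, stmts in entity.get("claims", {}).items()}
    return label, counts


# Tiny hand-written stand-in with the same shape as one entity in a response.
sample = {
    "id": "Q1",
    "labels": {"en": {"language": "en", "value": "example item"}},
    "claims": {"P31": [{"mainsnak": {"snaktype": "value"}}]},
}
print(summarize(sample))
```

With network access, `summarize(fetch_entity("Q423111"))` would apply the same summary to the item linked above.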
A Minimal Bot
A Minimal Bot for Mass Data Import
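As a rough illustration of the four-step workflow (get and map data, log in, hand over the data, request a write), the sketch below uses hypothetical stand-in classes. `FakeWikidata`, `map_record`, and `run_bot` are invented for this example and are not PBB_core's actual API; the "Wikidata" here is just an in-memory dictionary.

```python
class FakeWikidata:
    """In-memory stand-in for Wikidata, keyed by an external identifier."""

    def __init__(self):
        self.items = {}          # external ID -> item dict
        self.logged_in = False

    def login(self, user, password):
        # Step 2: a real bot would authenticate against the API here.
        self.logged_in = True

    def write(self, external_id, item):
        # Step 4: refuse to write without a login.
        if not self.logged_in:
            raise RuntimeError("login required before writing")
        self.items[external_id] = item


def map_record(record):
    """Step 1: clean a silo record and map its fields to item fields."""
    return {"label": record["name"].strip(), "external_id": record["id"].strip()}


def run_bot(silo, wd, user, password):
    """Run the workflow for every record; return (created, updated) counts."""
    wd.login(user, password)
    created = updated = 0
    for record in silo:
        mapped = map_record(record)
        ext_id = mapped["external_id"]
        exists = ext_id in wd.items      # lookup avoids creating duplicates
        wd.write(ext_id, mapped)         # steps 3 and 4
        if exists:
            updated += 1
        else:
            created += 1
    return created, updated


silo = [
    {"id": "A1 ", "name": " alpha"},
    {"id": "A1", "name": "alpha v2"},
    {"id": "B2", "name": "beta"},
]
wd = FakeWikidata()
print(run_bot(silo, wd, "BotUser", "secret"))
```

The second record has the same identifier as the first after cleaning, so it updates the existing entry instead of creating a duplicate item.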
Advantages of PBB_core
● One interface to Wikidata for (your) bots!
● Fast development and deployment of new bots.
● Integrates Wikidata querying and writing.
● Prevents creation of duplicate items.
● Searches for duplicate use of identifiers.
● Compatible with Python 2 and Python 3.
● Execute queries with SPARQL or WDQ.
● Minimizes HTTP traffic, increases throughput.
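One way to search for duplicate use of an identifier is a SPARQL aggregate query. The sketch below only builds the query text and the request URL for the public Wikidata query service; it is not how PBB_core implements the check. The function names are assumptions, and `P352` is used purely as an example property ID.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def duplicate_identifier_query(prop):
    """Return SPARQL listing values of `prop` that occur on more than one item."""
    if not (prop.startswith("P") and prop[1:].isdigit()):
        raise ValueError("expected a property ID such as 'P352'")
    return (
        "SELECT ?value (COUNT(?item) AS ?uses) WHERE { "
        "?item wdt:%s ?value . "
        "} GROUP BY ?value HAVING (COUNT(?item) > 1)" % prop
    )


def query_url(sparql):
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


query = duplicate_identifier_query("P352")
print(query)
print(query_url(query))
```

Each result row would pair an identifier value with the number of items carrying it, so any row at all signals a duplicate worth inspecting before writing.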
All Wikidata data types
● All current Wikidata data types have been implemented.
– PBB_core.WDString
– PBB_core.WDItemID
– PBB_core.WDMonolingualText
– PBB_core.WDProperty
– PBB_core.WDQuantity
– PBB_core.WDTime
– PBB_core.WDUrl
– PBB_core.WDGlobeCoordinate
– PBB_core.WDCommonsMedia
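Each data type ends up as a differently shaped value in the item JSON. The sketch below builds that JSON by hand for two of the types, a string and an item reference, following the Wikibase JSON layout. The helper functions are hypothetical stand-ins and do not reproduce the `PBB_core.WDString` or `PBB_core.WDItemID` constructors.

```python
def string_snak(prop, value):
    """Main snak for a plain string value."""
    return {
        "snaktype": "value",
        "property": prop,
        "datatype": "string",
        "datavalue": {"value": value, "type": "string"},
    }


def item_snak(prop, qid):
    """Main snak whose value points to another item, e.g. 'Q5'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected an item ID such as 'Q5'")
    return {
        "snaktype": "value",
        "property": prop,
        "datatype": "wikibase-item",
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:])},
            "type": "wikibase-entityid",
        },
    }


def statement(snak):
    """Wrap a main snak into a statement with normal rank."""
    return {"mainsnak": snak, "type": "statement", "rank": "normal"}


# Property and item IDs below are arbitrary examples.
claims = {
    "P352": [statement(string_snak("P352", "ABC123"))],
    "P31": [statement(item_snak("P31", "Q5"))],
}
print(claims["P31"][0]["mainsnak"]["datavalue"])
```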
Conclusions
● Mass data imports require scripts, a.k.a. bots.
● Our solution: PBB_core
– Python framework for reading from and writing to Wikidata
– Implements all Wikidata data types
– Implements consistency checks of data to be written
– Get it from: https://bitbucket.org/sulab/wikidatabots/src
Let's hack Wikidata!!
culturedigitally.org

SWAT4LS Wikidata tutorial cambridge dec 2015
