LD4 Wikidata Affinity Group - Shorthouse

Prepared and presented for the LD4 Wikidata Affinity Group (https://www.wikidata.org/wiki/Wikidata:WikiProject_LD4_Wikidata_Affinity_Group), September 21, 2021.

  1. Keeping ‘N Sync …with Wikidata …and ORCID …and GBIF
      David P. Shorthouse
      Image: Carl Lacy, CC-BY-NC, https://www.flickr.com/photos/clacey2/6892673576
  2. https://bionomia.net
  3. Who Collected & Identified?
  4. Why? https://doi.org/10.1016/j.tplants.2021.07.013
  5. What Do We Need? https://tdwg.org
  6. https://gbif.org
  7. Darwin Core Standard “Agent Strings”: recordedBy, identifiedBy
  8. What Do We Need for (Authoritative) Lists of People?
      Comprehensive, disambiguated, up-to-date
      Transparent processes
      Metadata of interest (e.g., birth and death dates, other demographics)
      Shared, unique identifiers in wide use
      Application layer: robust machine-to-machine access through APIs; authentication through OAuth2 for self-declaration
  9. #Roundtripping
      160M specimen records, refreshed twice a month
      2M+ “agent strings” (collectors/determiners)
      Strings to things & back home
  10. What Questions Should I Ask? (…as an organization that might want to use Wikidata at arm’s length)
      What are the core data objects in my system?
      Can I reduce my maintenance burden for peripheral data objects? Replace them with shared identifiers and pull metadata when I need it.
      Will that help me find a new community and/or bolster my existing relationships?
  11. DINA Collections Management System (Agriculture & Agri-Food Canada); Symbiota (Arizona State University)
  12. What Properties Should I Watch?

        PEOPLE_PROPERTIES = {
          "IPNI": "P586",
          "Harvard Index of Botanists": "P6264",
          "Entomologists of the World": "P5370",
          "ZooBank Author ID": "P2006",
          "BHL Creator ID": "P4081",
          "Stuttgart Database of Scientific Illustrators ID": "P2349"
        }
  13. What Changed Since Last I Checked?

        require "time" # provides Time#iso8601

        yesterday = Time.now - 86400
        # Humans (P31 = Q5) with a date of death (P570) and at least one watched ID,
        # whose Wikidata item was modified in the last 24 hours
        %Q(
          SELECT (REPLACE(STR(?item),".*Q","Q") AS ?qid)
          WHERE {
            ?item wdt:P31 wd:Q5 .
            ?item wdt:P570 ?date_of_death .
            ?item wdt:P586|wdt:P6264|wdt:P5370|wdt:P2006|wdt:P4081|wdt:P2349 ?id .
            ?item schema:dateModified ?change .
            SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
            FILTER(?change > "#{yesterday.iso8601}"^^xsd:dateTime)
          }
        )
  14. User-Triggered Manual Refresh
  15. What’s New Since Last I Checked?

        yesterday = Time.now - 86400
        # MINUS drops items that already carry a Bionomia ID (P6944)
        %Q(
          SELECT DISTINCT ?item ?itemLabel
          WHERE {
            ?item wdt:P586|wdt:P6264|wdt:P5370|wdt:P2006|wdt:P4081|wdt:P2349 ?id .
            ?item wdt:P570 ?date_of_death .
            ?item schema:dateModified ?change .
            MINUS { ?item wdt:P6944 ?bionomia . }
            SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
            FILTER(?change > "#{yesterday.iso8601}"^^xsd:dateTime)
          }
        )
  16. Wikidata Client Libraries: WikidataR, WikidataQueryServiceR (R packages); Wikidata (Python, PyPI); Wikidata Toolkit (Java); wikibase-sdk (JavaScript, Node.js); sparql-client, wikidata-client (Ruby)
  17. wikidata-client (Ruby)

        gem 'wikidata-client'

        wiki_user = Wikidata::Item.find("Q1234567")
        # …blah, blah code…
        # extract properties, update local store of data & search
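  18. What Was Merged Since Last I Checked?

        require "time" # provides Time#iso8601

        week_ago = Time.now - 604800
        # One property path over every watched ID, as in slides 13 and 15
        property_path = PEOPLE_PROPERTIES.values.map { |pid| "wdt:#{pid}" }.join("|")
        # ?item owl:sameAs ?redirect: ?item was merged and now redirects to ?redirect
        %Q(
          SELECT (REPLACE(STR(?item),".*Q","Q") AS ?qid) (REPLACE(STR(?redirect),".*Q","Q") AS ?redirect_toqid)
          WHERE {
            ?redirect wdt:P31 wd:Q5 .
            ?redirect #{property_path} ?id .
            ?redirect wdt:P570 ?date_of_death .
            ?item owl:sameAs ?redirect .
            ?item schema:dateModified ?change .
            SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
            FILTER(?change > "#{week_ago.iso8601}"^^xsd:dateTime)
          }
        )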
  19. User.merge_wikidata(qid, dest_qid) …blah, blah code…: mirror Wikidata’s 301 redirect, update data
  20. What Might be at Risk? https://www.wikidata.org/wiki/Wikidata:Requests_for_deletions
      “This item in wikidata needs work and is flagged for deletion.”

        require "nokogiri"
        require "open-uri"

        url = "https://www.wikidata.org/wiki/Wikidata:Requests_for_deletions"
        doc = Nokogiri::HTML(URI.open(url))
        ids = doc.to_s.scan(/Q\d+/).uniq # every QID mentioned on the page
        sm = Bionomia::SendMail.new({ subject: "ALERT! A wikidata page is flagged for deletion." })
  21. Playing Nice: What Makes a Good Identifier? https://www.wikidata.org/wiki/User:Salgo60/ExternalIdentifiers (Magnus Sälgö)
        1. Have persistent unique IDs
        2. Link your data to other data to provide context
        3. Have version history and support merges by supporting redirects
        4. Have a SPARQL endpoint and/or JSON access
        5. Have timestamps for created and changed
        6. Link back to Wikipedia pages/Wikidata
        7. Make deleted items easy to find
        8. Support more languages
        9. Describe your data with a schema
        10. Create GitHub repositories with code examples showing how to access your data
  22. “Very long tail of unknown people”
  23. What if No Authority Has the Data You Need? Go All-in! Possible solution: use the Wikidata Q number (or ORCID ID) as your identifier.
  24. Authoritativeness is (Mostly) a Myth (…especially if the authority does not meet Magnus’ 10-point checklist…)
  25. Simplifies Cognitive Load & Users are Happy
      Links made October 2018 – today: 158 attributors; 18 each made > 100,000; 4 each made > 1,000,000
      11,300 people linked to at least one specimen record; 8,500 profiles made public
      English, Français, Español, Português, Deutsch
      Countless new Wikidata entries AND new ORCID IDs with content
  26. Keeping ‘N Sync Summary
        What Properties Should I Watch?
        What Changed Since Last I Checked?
        What’s New Since Last I Checked?
        What Was Merged Since Last I Checked?
        What Might be at Risk? Requests for deletion is a pain point.
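
Slide 9's “strings to things” step begins by breaking a raw recordedBy or identifiedBy value into candidate names. The sketch below is a deliberately naive illustration, assuming that semicolons, pipes, ampersands, “and”, and “with” separate people while commas only separate a surname from initials. `split_agent_string` and `AGENT_SEPARATORS` are hypothetical names, not Bionomia's parser, and real agent strings are far messier.

```ruby
# Hypothetical, naive separators between people in a Darwin Core agent string.
# Assumption: commas are NOT separators ("Smith, J." is one person).
AGENT_SEPARATORS = /\s*(?:;|\||&|\band\b|\bwith\b)\s*/i

# Split a recordedBy/identifiedBy value into de-duplicated candidate names.
def split_agent_string(raw)
  return [] if raw.nil?

  raw.split(AGENT_SEPARATORS)
     .map(&:strip)
     .reject(&:empty?)
     .uniq
end

p split_agent_string("Smith, J.; Doe, A. & K. Brown")
# → ["Smith, J.", "Doe, A.", "K. Brown"]
```

Each candidate name would still need to be matched to a person (a Wikidata QID or an ORCID iD) before it becomes a “thing”.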
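
The watched properties on slide 12 and the `wdt:P586|wdt:P6264|…` path typed into the queries on slides 13 and 15 can drift apart when both are maintained by hand. One option, sketched here with a hypothetical `property_path` helper, is to generate the SPARQL property-path alternation from the hash itself.

```ruby
# The hash from slide 12 ("name": syntax gives symbol keys, string values).
PEOPLE_PROPERTIES = {
  "IPNI": "P586",
  "Harvard Index of Botanists": "P6264",
  "Entomologists of the World": "P5370",
  "ZooBank Author ID": "P2006",
  "BHL Creator ID": "P4081",
  "Stuttgart Database of Scientific Illustrators ID": "P2349"
}.freeze

# Hypothetical helper: build "wdt:P586|wdt:P6264|..." in hash order, so adding
# a property to the hash automatically widens every query that uses the path.
def property_path(properties)
  properties.values.map { |pid| "wdt:#{pid}" }.join("|")
end

puts property_path(PEOPLE_PROPERTIES)
# → wdt:P586|wdt:P6264|wdt:P5370|wdt:P2006|wdt:P4081|wdt:P2349
```

The result can be interpolated into a query as `?item #{property_path(PEOPLE_PROPERTIES)} ?id .`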
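
When asked for JSON, the Wikidata Query Service answers the queries on slides 13 and 15 in the standard SPARQL 1.1 JSON results format. A minimal sketch of turning such a response into a de-duplicated list of QIDs, assuming the query binds a variable named `qid` as slide 13's does; `qids_from_results` and the sample QIDs are illustrative only.

```ruby
require "json"

# Collect the values bound to one variable (default ?qid) from a SPARQL JSON
# results document: {"results": {"bindings": [{"qid": {"value": ...}}, ...]}}.
def qids_from_results(json_text, var = "qid")
  JSON.parse(json_text)
      .dig("results", "bindings")
      .to_a # tolerate a missing "results" section
      .filter_map { |binding| binding.dig(var, "value") }
      .uniq
end

# Illustrative response, not real query output.
sample = <<~JSON
  {"head": {"vars": ["qid"]},
   "results": {"bindings": [
     {"qid": {"type": "literal", "value": "Q42"}},
     {"qid": {"type": "literal", "value": "Q1035"}},
     {"qid": {"type": "literal", "value": "Q42"}}
   ]}}
JSON

p qids_from_results(sample) # → ["Q42", "Q1035"]
```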
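
Slides 13 and 15 use a fixed 24-hour window (`Time.now - 86400`), so a skipped or delayed run can miss changes. One alternative, sketched here under the assumption that the time of the previous run is kept in a small state file, is to read the last run time, build the `FILTER` clause from it, and record the new run time only after the sync succeeds. The file and the helpers `last_checked`, `mark_checked`, and `since_filter` are hypothetical.

```ruby
require "time" # Time#iso8601 and Time.iso8601

# Hypothetical: when did we last poll? Falls back to 24 hours ago, matching
# the slides, if no state file exists yet.
def last_checked(state_file, fallback: Time.now - 86_400)
  File.exist?(state_file) ? Time.iso8601(File.read(state_file).strip) : fallback
end

# Hypothetical: record a successful poll, stored in UTC to avoid offset surprises.
def mark_checked(state_file, at: Time.now)
  File.write(state_file, at.utc.iso8601)
end

# Build the FILTER clause used in the slide queries from an arbitrary cutoff.
def since_filter(time)
  %Q(FILTER(?change > "#{time.utc.iso8601}"^^xsd:dateTime))
end
```

In use, a sync job would call `cutoff = last_checked(path)`, run the query containing `since_filter(cutoff)`, process the results, and only then call `mark_checked(path)`, so a failed run is retried from the same cutoff next time.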
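
Any of the client libraries on slide 16 will do, but the query service can also be called with Ruby's standard library alone. The sketch below assumes the public endpoint `https://query.wikidata.org/sparql`, requests SPARQL JSON results through the `Accept` header, and sends a descriptive `User-Agent` as Wikimedia's User-Agent policy asks. The agent string and the helper names `build_sparql_request` and `run_sparql` are placeholders.

```ruby
require "net/http"
require "uri"
require "json"

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Build (but do not send) a GET request carrying the SPARQL query.
# The default User-Agent is a placeholder; replace it with your own contact details.
def build_sparql_request(query, user_agent: "ExampleSync/0.1 (mailto:you@example.org)")
  uri = URI(WDQS_ENDPOINT)
  uri.query = URI.encode_www_form(query: query)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/sparql-results+json"
  request["User-Agent"] = user_agent
  request
end

# Send the request over HTTPS and return the parsed JSON body.
def run_sparql(query)
  request = build_sparql_request(query)
  Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
    response = http.request(request)
    response.value # raises an exception for any non-2xx response
    JSON.parse(response.body)
  end
end
```

Keeping request construction separate from sending makes the query and headers easy to inspect in tests without touching the network.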
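
Slide 17 elides the step that extracts properties after an item is fetched. As a sketch of that step without the gem, assuming the entity JSON served by `Special:EntityData/<QID>.json`: external identifiers sit under `claims`, keyed by property ID, with each value at `mainsnak.datavalue.value`. `external_ids` is a hypothetical helper and the sample record is made up.

```ruby
require "json"

# Hypothetical helper: return { "P586" => ["..."], ... } for the watched
# properties present on one entity; properties with no values are omitted.
def external_ids(entity_json, qid, property_ids)
  claims = JSON.parse(entity_json).dig("entities", qid, "claims") || {}
  property_ids.each_with_object({}) do |pid, ids|
    values = Array(claims[pid]).filter_map { |claim| claim.dig("mainsnak", "datavalue", "value") }
    ids[pid] = values unless values.empty?
  end
end

# Made-up entity document in the Special:EntityData shape.
entity = <<~JSON
  {"entities": {"Q1234567": {"id": "Q1234567", "claims": {
    "P586": [{"mainsnak": {"snaktype": "value", "property": "P586",
              "datavalue": {"value": "12345-1", "type": "string"}}}]
  }}}}
JSON

p external_ids(entity, "Q1234567", ["P586", "P2006"])
```

The resulting hash is what a local store and search index would be updated from.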
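
Slide 19 only names `User.merge_wikidata(qid, dest_qid)`. The in-memory class below is a hypothetical stand-in for a local store (not Bionomia's implementation), showing one way to mirror a merge: move the old QID's links onto the target, record the redirect so old URLs can answer with a 301, and follow redirect chains when resolving a QID.

```ruby
# Hypothetical in-memory stand-in for a local people store.
class LocalPeopleStore
  attr_reader :redirects

  def initialize
    @links = Hash.new { |hash, qid| hash[qid] = [] } # qid => specimen record IDs
    @redirects = {}                                   # merged qid => target qid
  end

  # Attach a specimen record to a person, following any known redirect.
  def link(qid, specimen_id)
    @links[resolve(qid)] << specimen_id
  end

  def links_for(qid)
    @links.fetch(resolve(qid), [])
  end

  # Follow redirect chains (Q1 -> Q2 -> Q3) to the current QID; the `seen`
  # hash guards against an accidental cycle.
  def resolve(qid)
    seen = {}
    while (target = @redirects[qid]) && !seen[qid]
      seen[qid] = true
      qid = target
    end
    qid
  end

  # Mirror a Wikidata merge: move links from qid onto dest_qid and remember
  # the redirect so requests for the old QID can be answered with a 301.
  def merge_wikidata(qid, dest_qid)
    dest = resolve(dest_qid)
    return if resolve(qid) == dest # already merged

    @links[dest] |= @links.delete(qid) || []
    @redirects[qid] = dest
  end
end

store = LocalPeopleStore.new
store.link("Q1", "specimen-1")
store.link("Q2", "specimen-2")
store.merge_wikidata("Q1", "Q2")
p store.links_for("Q1") # → ["specimen-2", "specimen-1"]
```

A web layer could check `store.redirects[qid]` first and respond with a 301 to the target QID's page, mirroring what Wikidata does for merged items.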
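
The scraper on slide 20 collects every QID mentioned on the Requests for deletions page, most of which will be unrelated to any one project. A sketch of narrowing the alert to QIDs the local system actually uses, assuming the page HTML and the list of local QIDs are already in hand. `flagged_local_qids` is a hypothetical helper, and the `\b` anchors are a small tightening of the slide's `/Q\d+/` pattern so that, for example, `XQ12` is not picked up.

```ruby
require "set"

# Hypothetical: which of our QIDs appear on the deletion-requests page?
# page_html stands in for the fetched page; local_qids for the QIDs we store.
def flagged_local_qids(page_html, local_qids)
  flagged = page_html.scan(/\bQ\d+\b/).to_set
  local_qids.select { |qid| flagged.include?(qid) }.uniq
end

html = '<a href="/wiki/Q42">Q42</a> proposed for deletion; see also <a href="/wiki/Q99">Q99</a>'
p flagged_local_qids(html, ["Q42", "Q1035"]) # → ["Q42"]
```

Only a non-empty result would then trigger the alert e-mail.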
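
If a Wikidata Q number or an ORCID iD becomes the primary identifier, as slide 23 suggests, it helps to reject malformed values before storing them. A sketch, assuming QIDs are `Q` followed by a positive integer and ORCID iDs use the hyphenated 16-character form; the check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents. The helper names are hypothetical.

```ruby
# Assumed formats: "Q" + positive integer; ORCID as 0000-0000-0000-000X.
QID_PATTERN = /\AQ[1-9]\d*\z/
ORCID_PATTERN = /\A\d{4}-\d{4}-\d{4}-\d{3}[\dX]\z/

def valid_qid?(id)
  QID_PATTERN.match?(id)
end

# ISO 7064 MOD 11-2 check digit over the first 15 digits ("X" stands for 10).
def valid_orcid?(id)
  return false unless ORCID_PATTERN.match?(id)

  digits = id.delete("-")
  total = digits[0, 15].each_char.reduce(0) { |sum, ch| (sum + ch.to_i) * 2 }
  result = (12 - total % 11) % 11
  expected = result == 10 ? "X" : result.to_s
  digits[-1] == expected
end

p valid_orcid?("0000-0002-1825-0097") # → true
p valid_qid?("Q1234567")              # → true
```

A format check cannot tell whether the identifier exists; that still needs a lookup against Wikidata or ORCID.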
