
Make your Web resources more discoverable with Bioschemas markup – Bioschemas Tutorial June 2019


This tutorial introduces Bioschemas and shows how to include schema.org markup to make your resource(s) more discoverable on the Web. By the end of the session, attendees will:
1) Understand what Bioschemas is and how to use it
2) Have seen examples of deployments using Bioschemas


  1. Improving discoverability for Life Sciences resources. Alasdair J.G. Gray, Bioschemas Leadership Team Chair, Heriot-Watt University/Elixir-UK. Bioschemas ELIXIR All Hands Tutorial, Lisbon, Portugal, 19 June 2019
  2. Google Search. http://bioschemas.org
  3. Google Search
  4. Google Dataset Search (Sept 2018). https://toolbox.google.com/datasetsearch https://www.blog.google/products/search/making-it-easier-discover-datasets/
  5. Schema.org: Semantic Markup for the Web. Picture: Carole Goble, Turing Lecture 2018
  6. Structured data → descriptors
     ● Types (614): what we are talking about
     ● Properties (905): what we can say about those things
  7. Bioschemas
     • Community initiative built on top of schema.org
     • Aim
       • Improve data discoverability and interoperability in Life Sciences
     • Approach
       • Add Life Science types to schema.org
       • Provide usage guidelines and examples
       • 6 minimal properties
       • Link to domain ontologies
       • Support software
     Diagram labels: Profile over schema.org; Layer of constraints + documentation + extensions; Specification; Data model; Minimum information; Controlled vocabularies; Cardinality; Documentation; Examples; New (properties | types)
  8. FAIR: Findable, Accessible, Interoperable, Reusable
     Findable: ★ Globally unique identifiers ★ Community defined enriched metadata ★ Indexable by search engines
     Accessible: ★ Retrievable ★ HTTP
     Interoperable: ★ JSON-LD/RDFa ★ Link to controlled vocabularies ★ Links to other resources
     Reusable: ★ License ★ Provenance
  9. Schema.org for Datasets. Schema definition:
     ● Dataset: A body of structured information describing some topic(s) of interest. http://schema.org/Dataset
     ● 91 properties including:
       ○ name
       ○ description
       ○ isFamilyFriendly
  10. Google Dataset Profile
     • 2 required properties
     • Used for Google Dataset Search
     • 10 recommended properties
     • Link to DataCatalog
     • Link to DataDownload
     Other profiles: Events, Jobs, ... https://developers.google.com/search/docs/data-types/dataset
  11. Bioschemas Dataset Profile. Compliant with Google Dataset Profile
     • 5 minimal properties
     • 8 recommended properties
     • Link to DataCatalog
     • Link to DataDownload
     http://bioschemas.org/specifications/Dataset/
  12. Extending Schema.org for the Life Sciences. 7 release candidates. Submission in progress!
  13. More types in development
  14. Current Bioschemas Profiles
     Profile           Version           Group         Live deploys  Status notes
     DataCatalog       0.2 (Jun 2019)    Data Repos    20            0.2 fixes minor issues
     Dataset           0.3 (Jun 2019)    Datasets      23            0.3 fixes minor issues
     Event             0.1 (July 2018)   Events        7             Used by TeSS: undergoing revision due to addition of CourseInstance
     Sample            0.2 (Nov 2018)    Samples       1
     Taxon             0.3 (Nov 2018)    Biodiversity  0
     Tool              0.1 (Mar 2018)    Tools         5             0.3-DRAFT based on bio.tools profile, needs review
     TrainingMaterial  0.2 (July 2018)   Training      0             Used by TeSS: 0.5-DRAFT incorporating changes from Course
  15. Draft Bioschemas Profiles
     ● Beacon: 0.2-DRAFT 2018-04-23
     ● BioSample: 0.1-DRAFT
     ● ChemicalSubstance: 0.2-DRAFT 2019-06-11
     ● Course: 0.6-DRAFT 2019-06-06
     ● CourseInstance: 0.6-DRAFT 2019-06-06
     ● DNA: 0.1-DRAFT 2018-11-13
     ● DataRecord: 0.2-DRAFT 2019-06-14
     ● Gene: 0.5-DRAFT 2019-06-14
     ● Journal: 0.1-DRAFT 2019-02-08
     ● LabProtocol: 0.3-DRAFT 2019-06-14
     ● MolecularEntity: 0.2-DRAFT 2019-11-15
     ● Organization: 0.1-DRAFT 2018-03-13
     ● Person: 0.1-DRAFT 2018-03-14
     ● Phenotype: 0.1-DRAFT 2018-11-15
     ● Protein: 0.8-DRAFT 2019-05-08
     ● ProteinAnnotation: 0.4-DRAFT 2018-02-25
     ● ProteinStructure: 0.5-DRAFT 2018-08-15
     ● PublicationIssue: 0.1-DRAFT 2019-02-08
     ● PublicationVolume: 0.1-DRAFT 2019-02-08
     ● ScholarlyArticle: 0.1-DRAFT 2019-02-08
     ● SemanticAnnotation: 0.1-DRAFT 2019-02-08
     ● Standard: 0.1-DRAFT 2018-01-01
     ● Study: 0.1-DRAFT 2018-11-15
     ● Tool: 0.3-DRAFT 2018-11-21
     ● TrainingMaterial: 0.6-DRAFT 2019-06-06
     ● Workflow: 0.1-DRAFT 2019-02-08
  16. Profile Creation Process. Diagram labels: Mapping; Profile; Use cases; Mockup; Adoption; Testing; Application
  17. Bioschemas Software (slide dated 29 November 2018). Bioschemas Generator
     ● Supports all profiles
       ○ Current and draft
     ● Validates input
     ● Form generated from YAML description
     ● Examples extracted from profile
  18. Exploiting Bioschemas Markup
  19. TeSS: Specialised Search. Bioschemas Event:
     • contact
     • description
     • endDate
     • eventType
     • hostInstitution
     • location
     • name
     • startDate
     • …
  20. Automated Data Curation. Bioschemas DataCatalog:
     • description
     • keywords
     • name
     • provider
     • url
     • alternateName
     • citation
     • dateCreated
     • license
     • …
  21. Data Exchange: Without an API. MarRef → BioSamples. https://github.com/EBIBioSamples/bioschemas_marref_demo/blob/master/Summary.md
  22. BKG Explorer. Built over Bioschemas markup crawled from 30 live deployments (20,000 pages)
  23. Bioschemas
     What?
     • Exploiting schema.org to make Life Sciences resources more discoverable
     • Search engines will index and understand markup
     How?
     • Extending schema.org vocabulary for life sciences
       • 7 release candidate types
     • Provide guidelines on how to mark up resources
  24. Bioschemas Community: 200+ people; 7 tutorials (2018); 17 types; 6 publications (2018); 30 profiles; 62 sites; 11M+ pages. http://bioschemas.org/liveDeploys
  25. Acknowledgements. http://bioschemas.org/people
  26. http://bioschemas.org/ @bioschemas https://github.com/bioschemas/ Join Bioschemas: http://bioschemas.org/howtojoin/
  27. Creating and Deploying Bioschemas Markup. Kenneth McLeod. Material from: Justin Clark-Casey. License: Attribution 4.0 International (CC BY 4.0)
  28. Creating Bioschemas markup
     ● Markup is in a format called JSON-LD
     ● Embedded directly into webpages
     ● Let’s look at an example of the DataCatalog schema as used by Bioschemas
       ○ This comes from schema.org but Bioschemas adds
         ■ Mandatory/recommended/optional properties
         ■ Cardinality constraints
  29. Markup can be placed in either the head or the body.
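To make the embedding step concrete, here is a minimal Python sketch that builds a DataCatalog description and wraps it in the script element that goes into the page's head or body. The property names are taken from the DataCatalog slide; every value, the example.org URLs, and the helper name `to_script_tag` are hypothetical and only there for illustration.

```python
import json

# Hypothetical example values; only the property names come from the slides.
data_catalog = {
    "@context": "https://schema.org",
    "@type": "DataCatalog",
    "@id": "https://example.org/catalog",
    "name": "Example Catalog",
    "description": "An illustrative catalogue of life science datasets.",
    "keywords": "samples, example",
    "provider": {"@type": "Organization", "name": "Example Institute"},
    "url": "https://example.org/catalog",
}


def to_script_tag(markup: dict) -> str:
    """Serialise a JSON-LD object into a script element for embedding in HTML."""
    body = json.dumps(markup, indent=2)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(data_catalog))
```

The printed element can be pasted into a static page template, which keeps the markup visible to crawlers that do not execute scripts.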
  30. Let’s look at this in Google’s Structured Data Testing Tool
  31. @context is overwritten by Google. Technically any prefixes can be defined here, e.g.,
      "@context":["https://schema.org", {"OBI":"http://purl.obolibrary.org/obo/OBI_" ...}], "@type":["Sample","OBI:0000747"] …
      BUT Google will overwrite this with the basic "@context": "http://schema.org"
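The effect of losing a prefix definition can be illustrated with a deliberately simplified Python sketch. The function `expand_term` is hypothetical and covers only the two context shapes shown on the slide (a vocabulary string and a prefix dictionary); a real JSON-LD processor applies many more rules.

```python
def expand_term(term: str, context: list) -> str:
    """Expand a plain term or compact IRI against a simplified @context.

    Assumption: the context is a list holding a vocabulary URL string and,
    optionally, a dict of prefix definitions, as on the slide.
    """
    vocab = ""
    prefixes = {}
    for entry in context:
        if isinstance(entry, str):
            vocab = entry.rstrip("/") + "/"
        elif isinstance(entry, dict):
            prefixes.update(entry)

    prefix, sep, local = term.partition(":")
    if sep:
        # A defined prefix is replaced by its IRI; an undefined one leaves
        # the term as written, so the link to the ontology is lost.
        return prefixes[prefix] + local if prefix in prefixes else term
    return vocab + term


full_context = ["https://schema.org", {"OBI": "http://purl.obolibrary.org/obo/OBI_"}]
types = ["Sample", "OBI:0000747"]

print([expand_term(t, full_context) for t in types])
# With only the basic context, the OBI prefix is no longer defined:
print([expand_term(t, ["http://schema.org"]) for t in types])
```

With the full context, `OBI:0000747` resolves to an OBO IRI; with only the basic context it stays an unresolved string, which is why custom prefixes do not survive the overwrite.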
  32. @id: gives a node a URL. Without @id, nodes get auto-generated identifiers (blank nodes), e.g.,
      <script type="application/ld+json">{ "@context" : "https://schema.org", "@type" : "DataCatalog", ...
      becomes:
      _:genid2d4335ed7c72694275bea5b6a86ad9f82b2db0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
      Bad for Linked Data, as no one can reference this.
  33. @id: gives a node a URL. With an @id, you choose the URL for nodes, e.g.,
      <script type="application/ld+json">{ "@context" : "https://schema.org", "@type" : "DataCatalog", "@id" : "https://www.ebi.ac.uk/biosamples" …
      becomes:
      <https://www.ebi.ac.uk/biosamples> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
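The difference between the two cases can be reproduced with a small Python sketch that emits only the rdf:type triple for a node. The function name `type_triple` and the `_:b0`-style blank node labels are illustrative assumptions; real converters generate their own labels, such as the `_:genid...` value shown above.

```python
from itertools import count

# Counter used to label blank nodes; the labels are arbitrary and local.
_blank_ids = count()

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def type_triple(node: dict, vocab: str = "https://schema.org/") -> str:
    """Return the rdf:type triple for a JSON-LD node in N-Triples syntax."""
    if "@id" in node:
        # The publisher chose a stable URL that others can link to.
        subject = f"<{node['@id']}>"
    else:
        # No @id: the node becomes a blank node that nobody can reference.
        subject = f"_:b{next(_blank_ids)}"
    return f"{subject} {RDF_TYPE} <{vocab}{node['@type']}> ."


print(type_triple({"@type": "DataCatalog"}))
print(type_triple({"@type": "DataCatalog", "@id": "https://www.ebi.ac.uk/biosamples"}))
```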
  34. Warning! Don’t use the same @id for everything. In the example, DataCatalog and Dataset are defined separately but, because they share an @id, are combined into a single entity.
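One way to catch this before deployment is to scan the nodes on a page for an @id that is used with more than one type. The following Python sketch assumes the nodes have already been parsed into dictionaries; `find_shared_ids` and the example.org URL are hypothetical.

```python
from collections import defaultdict


def find_shared_ids(nodes: list) -> dict:
    """Map each @id used with more than one distinct @type to those types."""
    types_by_id = defaultdict(set)
    for node in nodes:
        if "@id" not in node:
            continue
        node_types = node.get("@type", [])
        if isinstance(node_types, str):
            node_types = [node_types]
        types_by_id[node["@id"]].update(node_types)
    return {
        node_id: sorted(found)
        for node_id, found in types_by_id.items()
        if len(found) > 1
    }


nodes = [
    {"@type": "DataCatalog", "@id": "https://example.org/", "name": "Catalog"},
    {"@type": "Dataset", "@id": "https://example.org/", "name": "Dataset"},
]
print(find_shared_ids(nodes))
```

A node that intentionally carries several types would also be reported, so the result is a prompt for review rather than a definite error.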
  35. GSDTT (Google Structured Data Testing Tool) common errors. If you don’t meet Google’s desired property specification for a given type, you see errors. Annotations on the example errors:
     ● If the Bioschemas spec says this is OK, you can ignore the error (FYI: it is a real error)
     ● Not minimal properties in Bioschemas; do what you want
     ● This error is caused by the incorrect target type of location
     ● description is a minimal property for Bioschemas (i.e. mandatory)
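Since Google's tool checks against Google's own expectations, it can help to check the Bioschemas minimal properties separately. The Python sketch below is a hypothetical helper, not part of any Bioschemas tool; the list of minimal properties passed in must be taken from the relevant profile, and the one used here is an assumption apart from `description`, which the slide names as mandatory.

```python
def missing_properties(markup: dict, minimal: list) -> list:
    """Return the minimal properties that are absent or empty in the markup."""
    empty_values = (None, "", [], {})
    return [prop for prop in minimal if markup.get(prop) in empty_values]


# Assumed minimal list for illustration; consult the profile for the real one.
assumed_minimal = ["name", "description"]

event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example workshop",
}
print(missing_properties(event, assumed_minimal))
```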
  36. Bioschemas types not yet accepted by schema.org: ignore these errors
  37. Markup Generator
  38. Example: https://bio.tools/blast
  39. https://blast.ncbi.nlm.nih.gov/Blast.cgi
  40. https://bioschemas.org/devSpecs/Tool/
  41. Evolving Best Practices
     ● At the moment we largely create markup by hand, with validation through Google’s testing tool
       ○ More validators and tools on the way, see bioschemas.org/tools
     ● Make pages with markup reachable from your sitemap.xml
       ○ This will make it easier for some applications to find it.
     ● Avoid adding Bioschemas markup to the page dynamically (e.g. through JavaScript)
       ○ Applications trying to find your data may not have the resources to render pages.
     ● Specify an @id
     ● Evolving guidance at https://github.com/BioSchemas/specifications/wiki/Technical
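The advice about dynamic markup can be seen from the consumer's side. The Python sketch below uses only the standard library to pull JSON-LD out of raw HTML without running any scripts, so markup injected by JavaScript would simply not be found. The class and function names are hypothetical, and the sketch assumes each JSON-LD script element contains valid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD blocks from static HTML without executing scripts."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return the parsed JSON-LD blocks found in the static HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def nodes_without_id(blocks: list) -> list:
    """Return top-level nodes that do not specify an @id."""
    return [b for b in blocks if isinstance(b, dict) and "@id" not in b]
```

Calling `extract_jsonld` on a downloaded page and then `nodes_without_id` on the result gives a quick check of two of the practices above: the markup is present in the static HTML, and each top-level node has an @id.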
  42. Questions?
     ● bioschemas.org
     ● bioschemas.org/groups/Technical
     ● https://bioschemas.org/software/
     ● Google Structured Data Testing Tool
     ● kcm1@hw.ac.uk
