Best practices for generating
linked data
Tutorial @ ICBO 2013

Best practices for generating Bio2RDF linked data

Slides summarizing best practices for generating Bio2RDF Linked Data

Published in: Technology, Education

  1. Best practices for generating linked data
     Tutorial @ ICBO 2013
  2. Tutorial Roadmap
  3. Bio2RDF Best Practices
     1. Assign a URI for all things
     2. Assign labels and identifiers
     3. Declare and assign types
     4. Provide dataset provenance
  4. Best practice 1: Assign URIs for all things
     ● The base Bio2RDF URI pattern:
       http://bio2rdf.org/namespace:identifier
     ● Data provider record identifiers are maintained from source
     ● Linked Data = no blank nodes!
  5. Best practice 1: Assign URIs for all things
     ● Data provider record identifiers are maintained from source
       ○ e.g. the resource IRI for DrugBank's Leucovorin record:
         http://bio2rdf.org/drugbank:DB00650
  6. Best practice 1: Assign URIs for all things
     ● Vocabulary namespaces are used for dataset-specific types and predicates
       http://bio2rdf.org/drugbank_vocabulary:Drug
     ● Resource namespaces are used to assign an identifier when one isn't provided by the source
       ○ build a unique identifier from a UUID, hash, counter, concatenated strings, etc.
       http://bio2rdf.org/drugbank_resource:DB00440_DB00650
  7. Best practice 1: Assign URIs for all things
     ● All valid namespaces are listed in the Bio2RDF Life Sciences Registry
       ○ ensures that URIs are consistent across all Bio2RDF datasets
       ○ registry is publicly available at http://tinyurl.com/dataregistry
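The three URI patterns above can be sketched in a few lines of PHP. This is only an illustration: the function names `recordUri`, `vocabularyUri`, and `resourceUri` are hypothetical and are not part of the Bio2RDF PHP API. The resource identifier is built by concatenating existing identifiers, one of the options the slide lists.

```php
<?php
// Base for all Bio2RDF URIs, taken from the pattern on the slides.
const BIO2RDF_BASE = 'http://bio2rdf.org/';

// Hypothetical helper: URI for a record whose identifier comes from the source.
function recordUri(string $namespace, string $identifier): string
{
    return BIO2RDF_BASE . $namespace . ':' . $identifier;
}

// Hypothetical helper: URI for a dataset-specific type or predicate.
function vocabularyUri(string $namespace, string $term): string
{
    return BIO2RDF_BASE . $namespace . '_vocabulary:' . $term;
}

// Hypothetical helper: URI for an entity the source does not identify.
// The identifier is built by joining existing identifiers with "_".
function resourceUri(string $namespace, string ...$parts): string
{
    return BIO2RDF_BASE . $namespace . '_resource:' . implode('_', $parts);
}

echo recordUri('drugbank', 'DB00650'), PHP_EOL;
echo vocabularyUri('drugbank', 'Drug'), PHP_EOL;
echo resourceUri('drugbank', 'DB00440', 'DB00650'), PHP_EOL;
```

Run from the command line, the script prints the three example URIs shown on slides 5 and 6.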
  8. Best practice 2: Assign labels and identifiers
     ● Use rdfs:label to assign a language-tagged label to every resource
       ○ can be a source-provided title, a script-generated phrase, or a phrase provided in a third-party dataset
       ○ Pattern: rdfs:label "label [ns:id]"@lang
     ● Use Dublin Core predicates for source-provided labels and identifiers
       ○ Pattern: dc:title "label"@lang (assign a language tag only when one is provided)
       ○ Pattern: dc:identifier "ns:id"^^xsd:string
  9. Best practice 2: Assign labels and identifiers
     ● Use Bio2RDF predicates to assign the Bio2RDF namespace and Bio2RDF identifier:
       ○ Pattern: bio2rdf_vocabulary:namespace "ns"^^xsd:string
       ○ Pattern: bio2rdf_vocabulary:identifier "id"^^xsd:string
  10. Best practice 2: Assign labels and identifiers
      Example: DrugBank entry for Nitrazepam
        drugbank:DB0159
          rdfs:label "Nitrazepam [drugbank:DB0159]"@en ;
          dc:title "Nitrazepam"@en ;
          dc:identifier "drugbank:DB0159"^^xsd:string ;
          bio2rdf_vocabulary:namespace "drugbank"^^xsd:string ;
          bio2rdf_vocabulary:identifier "DB0159"^^xsd:string .
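As a rough sketch, the label and identifier patterns above could be produced by one small function. The name `describeLabels` is hypothetical, prefix declarations are left out, and the title is assumed to contain no characters that need escaping in a Turtle string literal.

```php
<?php
// Hypothetical helper: returns Turtle statements that follow the label and
// identifier patterns on slides 8 and 9. Assumes $title needs no escaping.
function describeLabels(string $ns, string $id, string $title, string $lang = 'en'): string
{
    $qname = $ns . ':' . $id;
    $lines = [
        $qname,
        sprintf('    rdfs:label "%s [%s]"@%s ;', $title, $qname, $lang),
        sprintf('    dc:title "%s"@%s ;', $title, $lang),
        sprintf('    dc:identifier "%s"^^xsd:string ;', $qname),
        sprintf('    bio2rdf_vocabulary:namespace "%s"^^xsd:string ;', $ns),
        sprintf('    bio2rdf_vocabulary:identifier "%s"^^xsd:string .', $id),
    ];
    return implode("\n", $lines) . "\n";
}

// Reproduces the Nitrazepam example from slide 10.
echo describeLabels('drugbank', 'DB0159', 'Nitrazepam');
```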
  11. Best practice 3: Declare and assign types
      ● All resources should be typed as resources of the dataset
        ○ Pattern: rdf:type namespace_vocabulary:Resource
      ● Instances of a dataset vocabulary type should also be typed as owl:NamedIndividual
        ○ Pattern: rdf:type namespace_vocabulary:Type
        ○ Pattern: rdf:type owl:NamedIndividual
      ● Classes should be typed as owl:Class
        ○ Pattern: rdf:type owl:Class
        ○ If the superclass has been described using the namespace_vocabulary pattern, link the class to it with rdfs:subClassOf
  12. Best practice 3: Declare and assign types
      ● Object properties and datatype properties should also be typed
        ○ Pattern: rdf:type owl:ObjectProperty
        ○ Pattern: rdf:type owl:DatatypeProperty
      ● Examples:
          drugbank:DB0159
            rdf:type drugbank_vocabulary:Resource ;
            rdf:type owl:Class ;
            rdfs:subClassOf drugbank_vocabulary:Drug .
          drugbank_vocabulary:ddi-interactor-in
            rdf:type owl:ObjectProperty .
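The typing patterns can be sketched as two small functions, one for instances and one for classes. Both names (`individualTypes`, `classTypes`) are hypothetical and chosen only for this illustration. Each function returns one statement per line in prefixed form.

```php
<?php
// Hypothetical helper: type statements for an instance of a dataset type.
function individualTypes(string $ns, string $id, string $type): array
{
    $subject = $ns . ':' . $id;
    return [
        sprintf('%s rdf:type %s_vocabulary:Resource .', $subject, $ns),
        sprintf('%s rdf:type %s_vocabulary:%s .', $subject, $ns, $type),
        sprintf('%s rdf:type owl:NamedIndividual .', $subject),
    ];
}

// Hypothetical helper: type statements for a class with a known superclass
// in the dataset vocabulary.
function classTypes(string $ns, string $id, string $superclass): array
{
    $subject = $ns . ':' . $id;
    return [
        sprintf('%s rdf:type %s_vocabulary:Resource .', $subject, $ns),
        sprintf('%s rdf:type owl:Class .', $subject),
        sprintf('%s rdfs:subClassOf %s_vocabulary:%s .', $subject, $ns, $superclass),
    ];
}

// Mirrors the class example on slide 12.
echo implode(PHP_EOL, classTypes('drugbank', 'DB0159', 'Drug')), PHP_EOL;
```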
  13. Best practice 4: Provide dataset provenance
      Diagram: data item void:inDataset Bio2RDF dataset; Bio2RDF dataset prov:wasDerivedFrom Source dataset
      Features:
      - Entity-dataset link
      - Creator
      - Publisher
      - Date created
      - License & rights
      - Source
      - Availability
        - SPARQL endpoint
        - Data dump
      Vocabularies:
      - VoID
      - Dublin Core
      - W3C Provenance
      - Bio2RDF vocabulary
  14. Best practice 4: Provide dataset provenance
      ● Link every resource to the versioned/dated Bio2RDF dataset in which it is described
        ○ Pattern: void:inDataset <http://bio2rdf.org/dataset:namespace-dd-mm-yyyy.rdf>
        ○ Example: drugbank:DB0159 void:inDataset <http://bio2rdf.org/dataset:drugbank-03-07-2013> .
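A minimal sketch of the entity-dataset link, assuming the dataset URI is built from the namespace and a dd-mm-yyyy date as in the example on slide 14 (which omits the `.rdf` suffix shown in the pattern). The function name `inDatasetStatement` is hypothetical.

```php
<?php
// Hypothetical helper: links a resource to the dated Bio2RDF dataset that
// describes it. The dataset URI follows the slide 14 example (no ".rdf").
function inDatasetStatement(string $ns, string $id, DateTimeInterface $date): string
{
    $dataset = sprintf('http://bio2rdf.org/dataset:%s-%s', $ns, $date->format('d-m-Y'));
    return sprintf('%s:%s void:inDataset <%s> .', $ns, $id, $dataset);
}

// 3 July 2013 formats as 03-07-2013, matching the slide.
echo inDatasetStatement('drugbank', 'DB0159', new DateTimeImmutable('2013-07-03')), PHP_EOL;
```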
  15. A crash course in PHP
  16. PHP: Hypertext Preprocessor
      ● A general-purpose open source scripting language
        ○ homepage: http://php.net
      ● PHP scripts can be executed from the command line or embedded in HTML documents
      ● Syntactically similar to C/C++/Java, but not strongly typed
  17. A hello world PHP script
      ● All PHP scripts are surrounded by the <?php and ?> tags
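The code from this slide did not survive in the transcript. A minimal hello world script, written here as a stand-in and not copied from the original slide, might look like this:

```php
<?php
// Store the greeting in a variable, then print it followed by a newline.
$greeting = "Hello, world!";
echo $greeting, "\n";
?>
```

Saved as `hello.php` (a file name chosen for this example), it can be run from the command line with `php hello.php`.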
  18. Declaring and instantiating classes
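The code from this slide is also missing from the transcript. The sketch below shows the general shape of declaring a class and creating an object with `new`. The class name `Dataset` and its members are invented for this example.

```php
<?php
// A small class with one property, a constructor, and one method.
class Dataset
{
    // Properties need no declared type, which is what the previous slide
    // means by "not strongly typed".
    private $namespace;

    // The constructor runs when an object is created with "new".
    public function __construct($namespace)
    {
        $this->namespace = $namespace;
    }

    // Methods reach the object's own properties through $this.
    public function getUri($identifier)
    {
        return 'http://bio2rdf.org/' . $this->namespace . ':' . $identifier;
    }
}

// Instantiate the class and call a method on the object.
$drugbank = new Dataset('drugbank');
echo $drugbank->getUri('DB00650'), PHP_EOL;
```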
  19. Using the Bio2RDF PHP API to create an RDFizer
      ● Basic structure of a Bio2RDFizer script:
        ○ Initialize script parameters: input file(s), default dataset namespace, etc.
        ○ Define a Run() function that handles downloading and iterating over input files, as well as function calls to parse and convert input data to RDF
        ○ Define function(s) to convert input data to RDF using Bio2RDF API helper functions
  20. Using the Bio2RDF PHP API to create an RDFizer
      ● The Bio2RDF PHP API defines helper functions that implement Bio2RDF best practices:
        ○ getNamespace()
        ○ getVoc()
        ○ getRes()
        ○ triplify($subject, $predicate, $object) // object is an RDF resource
        ○ triplifyString($subject, $predicate, "string") // object is a literal
        ○ describeIndividual($uri, $label, $type, $title, $description, $language)
        ○ describeClass( ... )
        ○ describeProperty( ... )
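To make the script structure concrete, here is a self-contained sketch that does not use the Bio2RDF PHP API. The class `ExampleRdfizer` is a stand-in: its method names echo the helper list above, but the method bodies, return formats, and the in-memory tab-separated input are assumptions made for illustration. The real API may behave differently.

```php
<?php
// Stand-in for illustration only: method names echo the helper list above,
// but bodies and return formats are assumptions, not the Bio2RDF PHP API.
class ExampleRdfizer
{
    private $namespace;

    // Initialize script parameters (here only the dataset namespace).
    public function __construct($namespace)
    {
        $this->namespace = $namespace;
    }

    public function getNamespace() { return $this->namespace . ':'; }
    public function getVoc()       { return $this->namespace . '_vocabulary:'; }
    public function getRes()       { return $this->namespace . '_resource:'; }

    // Object is an RDF resource.
    public function triplify($subject, $predicate, $object)
    {
        return sprintf("%s %s %s .\n", $subject, $predicate, $object);
    }

    // Object is a literal; double quotes and backslashes are escaped.
    public function triplifyString($subject, $predicate, $string)
    {
        return sprintf("%s %s \"%s\" .\n", $subject, $predicate, addcslashes($string, "\"\\"));
    }

    // Iterate over the input records and convert each one to RDF.
    // A real script would download and read files here.
    public function run(array $lines)
    {
        $rdf = '';
        foreach ($lines as $line) {
            $rdf .= $this->convert(explode("\t", $line));
        }
        return $rdf;
    }

    // Convert one record (identifier, name) using the helper functions.
    private function convert(array $fields)
    {
        list($id, $name) = $fields;
        $uri = $this->getNamespace() . $id;
        return $this->triplify($uri, 'rdf:type', $this->getVoc() . 'Resource')
             . $this->triplifyString($uri, 'rdfs:label', sprintf('%s [%s]', $name, $uri));
    }
}

$rdfizer = new ExampleRdfizer('drugbank');
echo $rdfizer->run(["DB00650\tLeucovorin"]);
```

Keeping URI construction inside the helper methods means the conversion function never assembles namespace strings by hand, which is the point of routing everything through the API helpers.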
  21. Example: The Comparative Toxicogenomics Database
      The CTD Bio2RDFizer script is available on GitHub
  22. Using and contributing to the Bio2RDF project on GitHub
  23. Using and contributing to the Bio2RDF project on GitHub
      1. Fork the bio2rdf-scripts and php-lib repositories on GitHub
         https://help.github.com/articles/fork-a-repo
      2. Write some code!
      3. Commit code to your fork
      4. Make a pull request to the bio2rdf-scripts repo
