Linking Linked Data CSHALS2013
Transcript

  • 1. Linking Linked Data: Linked Data to Integrated Data. Expert Bioinformatics from Bioinformatics Experts
  • 2. Put your data on the web; make a pretty web site later.
  • 3.
  • 4. Now we can ask questions like this... What members of a target pathway are already targeted in other diseases?
      (Diagram: ChEMBL, UniProt, Reactome and OMIM linked through Compound, Protein Target, Pathway and Disease.)
  • 5. Because we have lots of data exposed as RDF: Uniprot:Protein, BioPAX:Protein, Mim:Phenotype
  • 6. What do you do when you have to add data...
  • 7. Or connect SPARQL endpoints? RDF != Linked Data
  • 8. Is your data 5-star? Linked data is essential to actually connecting the semantic web. It is quite easy to do with a little thought, and it becomes second nature. Various common-sense considerations determine when to make a link and when not to.
  • 9. Example: openflydata to BioCyc. What genes are differentially expressed in the hindgut, and are there any pathways associated with those genes?
      ● Use FlyAtlas at openflydata.org for tissue-specific expression profiles.
      ● Use FlyCyc from BioCyc.
      ● Then SPARQL.
  • 10. Problem: node URIs
      <http://openflydata.org/id/flyatlas/affyid/1616608_a_at> <http://purl.org/NET/flyatlas/schema#gene> <http://openflydata.org/id/flybase/feature/FBgn0001128> .
      <http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#xref> <http://biocyc.org/biopax/biopax-level3#Protein202210> .
      <http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#db> "FlyCyc" .
      <http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#id> "FBGN0001128" .
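The following is a minimal sketch of why slide 10's triples do not join on their own. The slides use SPARQL. So that this runs without a triple store, the sketch models triples as plain Python tuples. The small `IRI` and `Literal` wrapper classes are hypothetical stand-ins for RDF term types. An equality join on the shared gene finds nothing, because FlyAtlas holds an IRI while BioCyc holds an upper-cased string literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for an RDF IRI node."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF string literal."""
    value: str


BP = "http://www.biopax.org/release/biopax-level3.owl#"
XREF = IRI("http://biocyc.org/biopax/biopax-level3#UnificationXref202209")

# FlyAtlas (openflydata.org) points the probe at a FlyBase gene *IRI* ...
flyatlas = [
    (IRI("http://openflydata.org/id/flyatlas/affyid/1616608_a_at"),
     IRI("http://purl.org/NET/flyatlas/schema#gene"),
     IRI("http://openflydata.org/id/flybase/feature/FBgn0001128")),
]
# ... while BioCyc records the same gene as an upper-cased *literal* on an xref.
biocyc = [
    (XREF, IRI(BP + "db"), Literal("FlyCyc")),
    (XREF, IRI(BP + "id"), Literal("FBGN0001128")),
]


def naive_join(left, right):
    """Pair subjects from both graphs whose objects are the very same RDF node."""
    return [(s1, s2) for s1, _, o1 in left for s2, _, o2 in right if o1 == o2]


print(naive_join(flyatlas, biocyc))  # [] : an IRI never equals a literal

# A human can still see that both name one gene, once case is ignored.
local_name = flyatlas[0][2].value.rsplit("/", 1)[1]  # "FBgn0001128"
print(local_name.upper() == biocyc[1][2].value)      # True
```

The frozen dataclasses compare equal only to instances of the same class. An `IRI` and a `Literal` with the same text therefore stay distinct, just as RDF terms of different kinds do.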
  • 11. Integration Level 1: use Identifiers.org
      CONSTRUCT { ?x rdfs:seeAlso `bif:sprintf_iri ("http://identifiers.org/flybase/%s", ?id)` }
      WHERE { ?x BP:unificationxref ?xref . ?xref BP:id ?id . ?xref BP:db "FlyCyc"^^xsd:string }
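Slide 11's CONSTRUCT uses `bif:sprintf_iri`, which is a Virtuoso built-in. In standard SPARQL 1.1 the same IRI could be minted with `BIND(IRI(CONCAT(...)) AS ?iri)`. This plain-Python sketch mirrors the rule:

- For every xref whose `db` is "FlyCyc", read its `id`.
- Mint an identifiers.org FlyBase IRI from that `id`.
- Attach the IRI with `rdfs:seeAlso`.

The sketch makes two hypothetical assumptions:

- The protein points at its xref through BioPAX `xref`.
- The helper `normalize_flybase_id` restores FlyBase's `FBgn` prefix, so the minted IRI matches the identifiers FlyAtlas uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


BP = "http://www.biopax.org/release/biopax-level3.owl#"
BIOCYC = "http://biocyc.org/biopax/biopax-level3#"
RDFS_SEE_ALSO = IRI("http://www.w3.org/2000/01/rdf-schema#seeAlso")

PROTEIN = IRI(BIOCYC + "Protein202210")
XREF = IRI(BIOCYC + "UnificationXref202209")
graph = {
    (PROTEIN, IRI(BP + "xref"), XREF),  # assumed direction: entity -> xref
    (XREF, IRI(BP + "db"), Literal("FlyCyc")),
    (XREF, IRI(BP + "id"), Literal("FBGN0001128")),
}


def normalize_flybase_id(raw):
    """Hypothetical helper: BioCyc upper-cases ids, FlyBase writes 'FBgn'."""
    return "FBgn" + raw[4:] if raw.upper().startswith("FBGN") else raw


def link_flycyc_xrefs(graph):
    """Level 1 link rule: ?x rdfs:seeAlso <identifiers.org/flybase/ID> per FlyCyc xref."""
    links = set()
    for x, p, xref in graph:
        if p != IRI(BP + "xref"):
            continue
        # The db test must constrain the *same* xref node that carries the id.
        if (xref, IRI(BP + "db"), Literal("FlyCyc")) not in graph:
            continue
        for s, p2, ident in graph:
            if s == xref and p2 == IRI(BP + "id"):
                flybase_id = normalize_flybase_id(ident.value)
                links.add((x, RDFS_SEE_ALSO,
                           IRI("http://identifiers.org/flybase/" + flybase_id)))
    return links


for triple in link_flycyc_xrefs(graph):
    print(triple)
```

Checking `db` on the same xref node as `id` matters. If the `db` pattern were left unconnected, any graph containing a single FlyCyc xref would turn every `id` into a FlyBase link.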
  • 12. Integration Level 2: adding property characteristics
      BP = <http://www.biopax.org/release/biopax-level3.owl#>
      BP:Protein BP:controls BP:Catalysis
      BP:Catalysis BP:controls BP:BiochemicalReaction
      BP:Protein BP:controls BP:BiochemicalReaction
      CONSTRUCT { ?y GB:controlledBy ?x }
      WHERE { ?x BP:controls ?catalysis . ?catalysis BP:controls ?y }
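Slide 12's rule is a two-step property chain. A protein controls a catalysis, and the catalysis controls a reaction, so the protein controls the reaction.

This plain-Python sketch applies the same rule. It emits the shortcut in the direction slide 14 queries (`?reaction GB:controlledBy ?protein`). The `ex:` node names are hypothetical, and terms are written as prefixed strings for brevity.

```python
from collections import defaultdict

# Hypothetical instance data shaped like the slide's BioPAX pattern.
triples = {
    ("ex:Protein1", "rdf:type", "BP:Protein"),
    ("ex:Catalysis1", "rdf:type", "BP:Catalysis"),
    ("ex:Reaction1", "rdf:type", "BP:BiochemicalReaction"),
    ("ex:Protein1", "BP:controls", "ex:Catalysis1"),
    ("ex:Catalysis1", "BP:controls", "ex:Reaction1"),
}


def controlled_by(triples):
    """?x BP:controls ?c . ?c BP:controls ?y  =>  ?y GB:controlledBy ?x"""
    controls = defaultdict(set)
    for s, p, o in triples:
        if p == "BP:controls":
            controls[s].add(o)
    inferred = set()
    for x, middles in controls.items():
        for c in middles:
            # .get avoids adding empty entries to the defaultdict mid-iteration
            for y in controls.get(c, ()):
                inferred.add((y, "GB:controlledBy", x))
    return inferred


print(controlled_by(triples))
# {('ex:Reaction1', 'GB:controlledBy', 'ex:Protein1')}
```

Like the CONSTRUCT, this covers exactly two hops. Longer control chains would need the rule repeated until no new triples appear.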
  • 13. Integration Level 3: class subsumption
      flyatlas = <http://purl.org/NET/flyatlas/schema#>
      (FlyWeb) flyatlas:1616608_a_at a flyatlas:ProbeData
      BP = <http://www.biopax.org/release/biopax-level3.owl#>
      flyatlas:ProbeData rdfs:subClassOf BP:DNARegion
      CONSTRUCT { ?x a BP:DNARegion } WHERE { ?x a flyatlas:ProbeData }
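Slide 13's CONSTRUCT turns one `rdfs:subClassOf` axiom into explicit type triples. This plain-Python sketch performs the same materialization, with prefixed strings standing in for IRIs.

Unlike the single CONSTRUCT, the sketch repeats until nothing new appears, so chains of subclass axioms also resolve. That repetition is a design choice of this sketch, not something the slide states.

```python
def materialize_subclasses(triples):
    """Add (x rdf:type Super) for each (x rdf:type Sub) and (Sub rdfs:subClassOf Super)."""
    triples = set(triples)  # work on a copy; the input is left untouched
    while True:
        axioms = {(sub, sup) for sub, p, sup in triples if p == "rdfs:subClassOf"}
        new = {
            (x, "rdf:type", sup)
            for x, p, cls in triples if p == "rdf:type"
            for sub, sup in axioms if sub == cls
        }
        if new <= triples:  # fixpoint: no new type triples
            return triples
        triples |= new


data = {
    ("flyatlas:1616608_a_at", "rdf:type", "flyatlas:ProbeData"),
    ("flyatlas:ProbeData", "rdfs:subClassOf", "BP:DNARegion"),
}
result = materialize_subclasses(data)
print(("flyatlas:1616608_a_at", "rdf:type", "BP:DNARegion") in result)  # True
```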
  • 14. Connect BiochemicalReactions to expression values
      SELECT ?name ?id ?mean
      WHERE {
        ?reaction a BP:BiochemicalReaction .
        ?reaction BP:standardName ?name .
        ?reaction GB:controlledBy ?protein .
        ?protein a BP:Protein .
        ?protein BP:xref ?id .
        ?probe a BP:DNARegion .
        ?probe BP:xref ?id .
        ?probe flyatlas:l_fatbody ?blank .
        ?blank flyatlas:mean ?mean
      }
      LIMIT 5
      No reasoner: just a few SPARQL CONSTRUCTs.
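Once the three CONSTRUCTs have run, slide 14's query is an ordinary join. To make that join explicit, here is a tiny basic-graph-pattern matcher in plain Python. It is a sketch, not how a SPARQL engine is actually implemented. Terms starting with `?` are variables, and each pattern extends the bindings found so far.

All instance data is made up for illustration: the `ex:` names, the reaction name, the `flybase:` identifier node and the expression value. The slides do not show how FlyAtlas probes acquire a `BP:xref`, so the data simply assumes one.

```python
from itertools import islice


def match(patterns, triples, binding=None):
    """Yield each variable binding that satisfies every (s, p, o) pattern in turn."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break      # constant does not match
        else:
            yield from match(rest, triples, candidate)


# Hypothetical data, as it might look after the Level 1-3 CONSTRUCTs.
triples = [
    ("ex:Reaction1", "rdf:type", "BP:BiochemicalReaction"),
    ("ex:Reaction1", "BP:standardName", "example reaction"),
    ("ex:Reaction1", "GB:controlledBy", "ex:Protein1"),
    ("ex:Protein1", "rdf:type", "BP:Protein"),
    ("ex:Protein1", "BP:xref", "flybase:FBgn0001128"),
    ("flyatlas:1616608_a_at", "rdf:type", "BP:DNARegion"),
    ("flyatlas:1616608_a_at", "BP:xref", "flybase:FBgn0001128"),
    ("flyatlas:1616608_a_at", "flyatlas:l_fatbody", "_:b0"),
    ("_:b0", "flyatlas:mean", 42.0),  # made-up expression value
]
query = [
    ("?reaction", "rdf:type", "BP:BiochemicalReaction"),
    ("?reaction", "BP:standardName", "?name"),
    ("?reaction", "GB:controlledBy", "?protein"),
    ("?protein", "rdf:type", "BP:Protein"),
    ("?protein", "BP:xref", "?id"),
    ("?probe", "rdf:type", "BP:DNARegion"),
    ("?probe", "BP:xref", "?id"),
    ("?probe", "flyatlas:l_fatbody", "?blank"),
    ("?blank", "flyatlas:mean", "?mean"),
]
rows = [(b["?name"], b["?id"], b["?mean"]) for b in islice(match(query, triples), 5)]  # LIMIT 5
print(rows)  # [('example reaction', 'flybase:FBgn0001128', 42.0)]
```

The shared `?id` variable is what links a reaction's controlling protein to a probe's expression value. Without the earlier link rules there would be no common node for it to bind to.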
  • 15.
  • 16. Client Architecture
  • 17. Vocabularies in Linked Data: What does the linked data cloud know about drugs...
      SELECT DISTINCT ?class WHERE { ?s a ?class . ?s ?p ?o }
      More than 100 classes, including: chembl:Activity, chembl:Assay, chembl:AssayCategory, chembl:AssayTargetLink, chembl:ChemicalCompound, chembl:DrugTarget, chembl:LiteratureCitation, dailymed:drugs, drugbank:Drug, drugbank:DrugInteraction, drugbank:EnzymeLink, drugbank:ExternalIdentifier, drugbank:ExternalLink, drugbank:LiteratureCitation, drugbank:Molecule, drugbank:OrganismSpecies, drugbank:Patent, drugbank:ProteinSequence, drugbank:TargetLink, entrez:EnsemblReference, entrez:Gene, pdb:Molecule, pdb:Structure, pubmed:Chemical, pubmed:Citation, pubmed:DatabankReference
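Slide 17's query lists every class that has instances. This plain-Python sketch builds the same inventory and groups it by prefix. It shows the problem the next slides address: each source describes drugs with its own class. The instance names are hypothetical; the class names come from the slide's result list.

```python
from collections import Counter

# Hypothetical instances typed with classes from the slide's result list.
triples = [
    ("ex:d1", "rdf:type", "drugbank:Drug"),
    ("ex:d2", "rdf:type", "drugbank:Drug"),
    ("ex:c1", "rdf:type", "chembl:ChemicalCompound"),
    ("ex:m1", "rdf:type", "dailymed:drugs"),
    ("ex:g1", "rdf:type", "entrez:Gene"),
]


def distinct_classes(triples):
    """SELECT DISTINCT ?class WHERE { ?s a ?class }.

    The slide's extra ?s ?p ?o pattern does not change the result, because
    every typed subject already has at least its own rdf:type triple.
    """
    return sorted({cls for _, p, cls in triples if p == "rdf:type"})


def classes_per_source(classes):
    """Count classes by their namespace prefix, e.g. 'drugbank'."""
    return Counter(cls.split(":", 1)[0] for cls in classes)


classes = distinct_classes(triples)
print(classes)
# ['chembl:ChemicalCompound', 'dailymed:drugs', 'drugbank:Drug', 'entrez:Gene']
print(classes_per_source(classes))
```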
  • 18. Create a tighter, more unified “view” under one schema
  • 19. Unified Vocabulary: What does the linked data cloud know about drugs...
  • 20. Map classes and properties into a single instantiated view
  • 21. Before query:
      SELECT *
      WHERE {
        ?s drugb:calculatedInChIKey ?inchiD .
        ?s a drugb:Drug .
        ?c a chembl:ChemicalCompound .
        ?c chembl:standardInChIKey ?inchiC .
        FILTER regex(?inchiD, ?inchiC)
      }
  • 22. After query:
      SELECT *
      WHERE {
        ?s a GB:Drug .
        ?s GB:inchiKey ?inchi .
      }
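The step between slides 21 and 22 is a mapping from source classes and properties into one vocabulary. This is a minimal plain-Python sketch of that mapping. It assumes a hand-written mapping table that:

- sends DrugBank drugs and ChEMBL compounds to `GB:Drug`;
- sends both InChIKey properties to `GB:inchiKey`.

The table is hypothetical, since the slides do not show the actual GB mappings. The `ex:` subjects are also hypothetical. The key used is aspirin's InChIKey.

```python
# Hypothetical mapping tables into the unified GB vocabulary.
CLASS_MAP = {"drugb:Drug": "GB:Drug", "chembl:ChemicalCompound": "GB:Drug"}
PROPERTY_MAP = {
    "drugb:calculatedInChIKey": "GB:inchiKey",
    "chembl:standardInChIKey": "GB:inchiKey",
}

ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
sources = [
    ("ex:drugbankDrug1", "rdf:type", "drugb:Drug"),
    ("ex:drugbankDrug1", "drugb:calculatedInChIKey", ASPIRIN_KEY),
    ("ex:chemblCompound1", "rdf:type", "chembl:ChemicalCompound"),
    ("ex:chemblCompound1", "chembl:standardInChIKey", ASPIRIN_KEY),
]


def build_view(triples):
    """Rewrite mapped classes and properties into GB terms; drop everything else."""
    view = set()
    for s, p, o in triples:
        if p == "rdf:type" and o in CLASS_MAP:
            view.add((s, "rdf:type", CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            view.add((s, PROPERTY_MAP[p], o))
    return view


def drugs_with_inchikey(view):
    """The 'after' query: ?s a GB:Drug . ?s GB:inchiKey ?inchi"""
    drugs = {s for s, p, o in view if p == "rdf:type" and o == "GB:Drug"}
    return sorted((s, o) for s, p, o in view if p == "GB:inchiKey" and s in drugs)


for s, inchi in drugs_with_inchikey(build_view(sources)):
    print(s, inchi)
# ex:chemblCompound1 BSYNRYMUTXBXSQ-UHFFFAOYSA-N
# ex:drugbankDrug1 BSYNRYMUTXBXSQ-UHFFFAOYSA-N
```

The per-source knowledge (prefixes, class names, property names) now lives in two small tables rather than in every query. That is what lets the "after" query shrink to two patterns.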
  • 23. Linked Data Architecture
  • 24. Creating fixed “views” of Linked Data
      When the use of integrated data is fixed, e.g. an API or application, Linked Data can be expensive:
        – Changes to data require significant recoding
        – Multiple schemas make queries long and inefficient
      • A view, or middle layer of data, is used by the API; changes to data are managed by the view, and the API is minimally disturbed:
        – Views are easier to query
        – Views are faster to query
      • The client gets the best of both worlds: a tight view of data for API queries, while still having all the advantages of a linked data strategy.
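This is a minimal plain-Python sketch of the middle layer described on slide 24. All names are hypothetical: `DrugView`, `refresh`, `lookup`, and the renamed source property. API code only calls `lookup`. When a source changes its schema, only the view's mapping changes, and the call site stays the same.

```python
class DrugView:
    """Hypothetical middle layer: API code queries this view, never the sources."""

    def __init__(self, property_map):
        self.property_map = dict(property_map)  # source property -> unified property
        self.by_key = {}

    def refresh(self, source_triples):
        """Rebuild the view; only this method knows the source property names."""
        self.by_key = {}
        for s, p, o in source_triples:
            if self.property_map.get(p) == "GB:inchiKey":
                self.by_key.setdefault(o, set()).add(s)

    def lookup(self, inchikey):
        """Stable API call: sorted records that share this InChIKey."""
        return sorted(self.by_key.get(inchikey, ()))


KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # aspirin's InChIKey
view = DrugView({
    "drugb:calculatedInChIKey": "GB:inchiKey",
    "chembl:standardInChIKey": "GB:inchiKey",
})
view.refresh([
    ("ex:drugbankDrug1", "drugb:calculatedInChIKey", KEY),
    ("ex:chemblCompound1", "chembl:standardInChIKey", KEY),
])
before = view.lookup(KEY)

# Suppose ChEMBL renamed its property (hypothetical name): update the mapping only.
view.property_map["chembl:inchiKeyV2"] = "GB:inchiKey"
view.refresh([
    ("ex:drugbankDrug1", "drugb:calculatedInChIKey", KEY),
    ("ex:chemblCompound1", "chembl:inchiKeyV2", KEY),
])
after = view.lookup(KEY)
print(before == after, after)  # True ['ex:chemblCompound1', 'ex:drugbankDrug1']
```

The trade-off is the one the slide names. The view is tighter and easier to query, but it has to be refreshed when the underlying linked data changes.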
  • 25. Summary
      ● Exposing data as RDF does not equal Linked Data
      ● Making data linked is not hard:
        – Node IRIs
        – Unifying classes
        – Transitive closure of properties
      ● A little semantics goes a long way (no reasoner required)
      ● Creating “views” from one schema to another is not hard
        – But it should be easier
  • 26. www.generalbioinformatics.com/science.html