
Linked Data and Ontology Tutorial (for RD-Connect)


In this tutorial we explain the basics of a 'Linked Data and Ontology' approach for combining data, in particular for the study of rare diseases. The approach is motivated by a case study provided by health care researcher Ulrike Braisch. The main take-home lesson is that this approach can substantially lower the effort for data integration and thereby shorten the path to new treatments for (rare) diseases.

The presentation is based on a tutorial given at the RD-Connect/Neuromics/Euronomics plenary meeting in Heidelberg, Germany, February 26, 2014. It was made possible by RD-Connect, a European project to support Rare Disease research (http://www.rd-connect.eu).

  • Ulrike’s problem: undefined sameness
  • Transcript of "Linked Data and Ontology Tutorial (for RD-Connect)"

    1. LINKED DATA AND ONTOLOGY TUTORIAL. RD-Connect tutorial, Heidelberg 2014. Marco Roos, Pedro Lopes, Mark Thompson, Rajaram Kaliyaperumal. Acknowledgements: Ulrike Braisch (ULM), Paul Groth and Frank van Harmelen (VU Amsterdam), BioSemantics group LUMC, RD-Connect Linked Data & Ontology Task Force, 2013-2014.
    2. Agenda. Basic introduction to Linked Data: (1) The problem, (2) Linked Data Approach, (3) Linked Data Architecture, (4) Nanopublication.
    3. Introduction to Linked Data. Marco Roos (1), Pedro Lopes (2), Mark Thompson (1), Rajaram Kaliyaperumal (1). (1) BioSemantics Group, Human Genetics Department, Leiden University Medical Center, The Netherlands – http://biosemantics.org ; (2) Bioinformatics & Computational Biology Group, University of Aveiro, Portugal – http://bioinformatics.ua.pt . Acknowledgements: Ulrike Braisch (ULM), Paul Groth (VU Amsterdam), BioSemantics group EMC/LUMC, RD-Connect Linked Data & Ontology Task Force.
    4. Ulrike Braisch’s Problem. I wish to correlate patient characteristics. Registries: C (USA), R2 (EU), R3 (EU). Education level: C_EDUC, 7 levels (C); Edlevel, 9 levels (R2); Isced, 7 levels (R3). Marital status: C_MARSTAT: never, now, separated, divorced, divorced (C); Maristat: single, married, partnership, divorced, widowed (R2); Maristat: single, married, partnership, divorced, widowed (R3). Age/date of birth: age at baseline in years (C); exact age at visit (R2); exact age at visit (R3).
    5. Ulrike Braisch’s Problem (same table as slide 4). Ulrike’s problem: the data in the fields pertain to very similar things, but not exactly the same. How similar, she does not know a priori.
    6. Ulrike Braisch’s Problem. I wish to correlate patient characteristics. Registry 1 has fields A, B, C; Registry 2 has A’, B’, C’; Registry 3 has A’’, B’’, C’’. A ≠ A’ ≠ A’’, B ≠ B’ ≠ B’’, C ≠ C’ ≠ C’’. Can I rely on what I think the headers mean? How to align the data?
    7. Solution 1: Ulrike solves the problem. [Diagram: Registry 1 (A, B, C); Registry 2 (A’, B’, C’); My ‘Registry’ (A’’’, B’’’, C’’’)] Ulrike has to do the alignment herself. She has to do the heavy lifting for data integration.
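"Solution 1" can be pictured as a private lookup table that Ulrike writes by hand. The sketch below is only an illustration: the registry and field names come from the slides, but the individual correspondences are invented, and each one is a personal judgment call that nobody else can reuse or check.

```python
# Hypothetical hand-made alignment of marital status codes.
# The correspondences are assumptions for illustration, not taken from the registries.
MARITAL_TO_COMMON = {
    "C": {"never": "single", "now": "married", "divorced": "divorced"},
    "R2": {"single": "single", "married": "married", "divorced": "divorced"},
}


def align(registry, value):
    """Translate a registry-specific code into Ulrike's own coding, or None if undecided."""
    return MARITAL_TO_COMMON.get(registry, {}).get(value)


records = [("C", "never"), ("R2", "single"), ("C", "separated")]
aligned = [align(registry, value) for registry, value in records]
print(aligned)  # ['single', 'single', None]
```

The `None` for "separated" marks a value for which no alignment has been decided yet; with this approach every researcher has to repeat that work.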
    8. Not just Ulrike’s problem. I wish to... • correlate patient characteristics with CAG repeat length (Ulrike) • correlate clinical data with genome data (Bob) • compare Huntington data with Alzheimer data (Alice) • study social aspects of clinical surveys (Christian) • compute the commonalities between all diseases (Don)
    9. Not just Ulrike’s problem (same wishes as slide 8). The data are valuable for many people; they all face the same problem.
    10. Solution 1: Bob, Alice, Ulrike, Christian, Don solve the problem. [Diagram: Registry 1 (A, B, C); Registry 2 (A’, B’, C’)] They all do the heavy lifting.
    11. Can computers help? – NO! [Diagram: Registry 1 (A, B, C); Registry 2 (A’, B’, C’)] Computers cannot help; not for alignment.
    12. Effort for data integration. [Diagram: Experiment → Data generation → Data Integration → Analysis → Application, from Data to Knowledge; label: Gain] The (simplified) steps of data integration. How is the pain for data integration distributed?
    13. Effort for data integration. [Diagram: the pipeline of slide 12, annotated with several Pain labels and one Gain label]
    14. Effort for data integration (same diagram as slide 13). Data are not explicitly prepared for data integration (apart from storing them in tables/files/databases). The pain of data integration is with Ulrike. Computers cannot help her with that.
    15. Linked Data = Redistribution of pain to enable computers to help us. [Diagram: Experiment → Data generation → Integration → Analysis → Application, from Data to Knowledge, with Pain and Gain labels] “Linked Data” moves the pain and enables computers.
    16. Linked Data = Redistribution of pain to enable computers to help us (same diagram as slide 15). The goal of “Linked Data”. Take-home message: “Linked Data” does not take the pain of data integration away; alignment remains necessary. But it moves the pain to data experts, making the overall workflow more efficient. And it enables computers to help. Next we explain how…
    17. Linked Data and Ontology approach. • The three layers of data “harmonization” • The key role of “Uniform Resource Identifiers” • Saying things with Linked Data • Linked Data Infrastructure
    18. Disentangling harmonization. “Harmonization” is commonly used to refer to aligning what samples and data are collected within a consortium.
    19. Disentangling harmonization. It is useful to distinguish three aspects of “Harmonization” … and avoid conflating them.
    20. Disentangling harmonization. • Harmonize what is measured and how • Harmonize classification and relations (meaning) • Harmonize how we make it computable
    21. Disentangling harmonization. (1) Harmonize what is measured and how: Consensus. (2) Harmonize classification and relations (meaning): Ontologies. (3) Harmonize how we make it computable: Linked Data. (1) is about agreement between people, (2) is about what to call things in our data, (3) is about enabling computers to help.
    22. Disentangling harmonization. • Harmonize what is measured and how (Consensus: agreement) • Harmonize classification and relations, i.e. meaning (Ontologies: semantics) • Harmonize how we make it computable (Linked Data: syntax). Ontologies have 2 roles: (i) enforce compliance with the consensus, (ii) convey meaning to computers; they have a human-readable and a computer-readable representation.
    23. Use of ontologies, but not Linked Data. The table of slide 4 gains an Ontology column: Education level, Onto:1234; Marital status, Onto:2345; Age/date of birth, Onto:3456. Perhaps confusing, but ontology identifiers (like GO or HPO IDs) are often not readily readable for computers...
    24. Use of ontologies, but not Linked Data (same table as slide 23). For a computer they are but a string of symbols; adding these IDs to a table is good, but it is not Linked Data yet.
    25. Uniform Resource Identifier. Linked Data: unique computer-readable identifiers. [Diagram: a table in which every cell is a <URI>] This is more like it for computers!
    26. Uniform Resource Identifier (same diagram as slide 25). ‘Uniform Resource Identifiers’ are identifiers for computers. The URI is an international standard (IETF RFC 3986), adopted by the World Wide Web Consortium (W3C).
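The step from a bare ontology ID to a URI can be sketched as a prefix expansion. This is a minimal illustration, assuming the OBO PURL pattern for GO and HPO identifiers; the `Onto` prefix on the slides is a placeholder, so it is deliberately left without a namespace here.

```python
# Prefix-to-namespace table. The OBO PURL pattern below is an assumption of this sketch.
PREFIXES = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand(compact_id):
    """Turn a compact ID such as 'GO:0008150' into a full URI."""
    prefix, _, local_id = compact_id.partition(":")
    if prefix not in PREFIXES or not local_id:
        # Without a known namespace the ID stays "but a string of symbols".
        raise ValueError(f"cannot expand {compact_id!r}")
    return PREFIXES[prefix] + local_id


uri = expand("GO:0008150")
print(uri)  # http://purl.obolibrary.org/obo/GO_0008150
```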
    27. Uniform Resource Identifier. An example URI: http://rdf.biosemantics.org/owl/BioSemanticsConcepts#c3877... Why are they so useful?...
    28. Uniform Resource Identifier. http://rdf.biosemantics.org/owl/BioSemanticsConcepts#c3877... A Uniform Resource Identifier (URI) is… • A unique identifier for data or concept • A unique reference for data or concept • Computer-readable. URIs are three things at once.
    29. Uniform Resource Identifier. http://rdf.biosemantics.org/owl/BioSemanticsConcepts#c3877... And they look familiar…
    30. Reuse of technology: world wide web hyperlinks. <a href="http://www.ncbi.nlm.nih.gov/pubmed/18927111">
    31. Reuse of technology: world wide web hyperlinks. <a href="http://www.ncbi.nlm.nih.gov/pubmed/18927111"> For Linked Data we simply reuse what made the World Wide Web such a success: the hyperlink… What is different?...
    32. Documents for human consumption. [Diagram: Document 1 links to Document 2 via http://www.ncbi.nlm.nih.gov/pubmed/18927111] Hyperlinks (URIs) link documents. The Web as we know it links documents for humans.
    33. Data for computer consumption. [Diagram: data items linked via http://www.ncbi.nlm.nih.gov/pubmed/18927111] Hyperlinks (URIs) can link data. ‘Linked Data’ links data for computers (enabling them to support us).
    34. Uniform Resource Identifier (URI). http://rdf.biosemantics.org/owl/BioSemanticsConcepts#c3877... 100% unique! A computer-readable reference for data, made of: a protocol for exchange by computers, an “address”, and a data item. URIs function through three main elements: [protocol][address][ID].
    35. Uniform Resource Identifier (URI) (same diagram as slide 34). A URI can represent many things: a gene, a person, a value, but also a relation, such as ‘causes’.
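The [protocol][address][ID] reading of a URI can be tried out with Python's standard library. The sketch uses the complete UniProt URI from later slides, because the BioSemantics example is truncated; treating the last path segment as the ID is a simplification made here.

```python
from urllib.parse import urlsplit

uri = "http://purl.uniprot.org/uniprot/Q13547"
parts = urlsplit(uri)

protocol = parts.scheme                     # how computers exchange the data
address = parts.netloc                      # where the data item can be found
identifier = parts.path.rsplit("/", 1)[-1]  # the local ID (simplified: last path segment)

print(protocol, address, identifier)  # http purl.uniprot.org Q13547
```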
    36. Can we say things with URIs? Subject, Predicate, Object: <HDAC1> <interacts with> <ParvB>; <malaria> <is transmitted by> <mosquitos>; <mutation X> <has frequency> <0.25%>. A ‘triple’ of URIs can form a computer-readable statement.
    37. Can we say things with URIs? (same triples as slide 36) Subject, Predicate, and Object are each URIs. URIs are not for humans, but they are often supplied with a web page for humans…
    38. Computer-readable => human readable. <HDAC1> is http://purl.uniprot.org/uniprot/Q13547 ; <interacts with> is http://conceptwiki.org/index.php/Concept:e6559... ; <PARVB> is http://bio2rdf.org/geneid:29780 . Simply copy a URI to your browser. NB This may not always give you a human readable web page.
    39. Back to our triple. Linked data = computer readable knowledge: http://purl.uniprot.org/uniprot/Q13547 http://conceptwiki.org/index.php/Concept:e6559... http://bio2rdf.org/geneid:29780 says “HDAC1 interacts with Parvb”. Note that we ‘said’ something meaningful! Triples allow us to say things that computers can understand.
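A triple can be sketched in plain Python as three URI strings. The subject and object URIs are the ones shown on the slides; the predicate URI is hypothetical, because the ConceptWiki URI on the slide is truncated. A real application would more likely use an RDF library than tuples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


HDAC1 = "http://purl.uniprot.org/uniprot/Q13547"
PARVB = "http://bio2rdf.org/geneid:29780"
# Hypothetical stand-in for the truncated "interacts with" URI on the slide.
INTERACTS_WITH = "http://example.org/vocab/interactsWith"

statement = Triple(HDAC1, INTERACTS_WITH, PARVB)


def to_ntriples(triple):
    """Serialise a triple of URIs as one line in N-Triples style."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


print(to_ntriples(statement))
```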
    40. Linked data = computer readable knowledge (same triple as slide 39: “HDAC1 interacts with Parvb”). URIs in one triple can point to different locations. Think of the implication for Data Integration.
    41. Linked data = computer readable knowledge (same triple as slide 39). Is this all we said? Remember that URIs are also references; they may refer to more information…
    42. http://purl.uniprot.org/uniprot/Q13547.rdf “HDAC1”. The UniProt Linked Data representation of HDAC1: many more triples!
    43. Things we can say. http://purl.uniprot.org/uniprot/Q13547 : we said all that by just this reference. URIs are references. No need to download a whole ontology or all of UniProt into your own knowledge base. What kind of things can we say?
    44. Things we can say: relation. Pattern: http://purl.uniprot.org/uniprot/Q13547 <URI for a type of relation> <URI for object of relation>. Example: http://purl.uniprot.org/uniprot/Q13547 (“HDAC1”) http://conceptwiki.org/index.php/Concept:e6559... http://bio2rdf.org/geneid:29780 . We already saw the (biological) relation.
    45. Things we can say: human readable labels. http://purl.uniprot.org/uniprot/Q13547 <URI for “label”> “HDAC1”. Here we add a label for humans. Software engineers use this in the User Interface of their tools. URIs are used ‘under the hood’.
    46. Things we can say: classify. http://purl.uniprot.org/uniprot/Q13547 (“HDAC1”) <URI for “is of type”> <URI for class Protein>. Here we say what type of thing a URI represents: we classify a URI.
    47. Things we can say: classify + human readable labels. http://purl.uniprot.org/uniprot/Q13547 (“HDAC1”) <URI for “is of type”> <URI for class Protein>; <URI for class Protein> <URI for “has label”> “Protein”. …and we add a label for this class.
    48. Things we can say: classify + human readable labels (same triples as slide 47). Classification is special: here is where Linked Data and Ontologies meet.
    49. Things we can say: human readable labels. http://purl.uniprot.org/uniprot/Q13547 <URI for “is of type”> <URI for class Protein>; <URI for class Protein> <URI for “label”> “Protein”. This is from an ontology! Good ontologies have a “URI” representation (format: OWL/RDF).
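Labels and classification can be sketched with the standard RDF and RDFS property URIs. The slides only say <URI for “label”> and <URI for “is of type”>, so choosing `rdfs:label` and `rdf:type` is an assumption of this sketch, as is the UniProt class URI for Protein.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HDAC1 = "http://purl.uniprot.org/uniprot/Q13547"
PROTEIN = "http://purl.uniprot.org/core/Protein"  # assumed class URI

triples = [
    (HDAC1, RDFS_LABEL, "HDAC1"),      # label for humans
    (HDAC1, RDF_TYPE, PROTEIN),        # classification
    (PROTEIN, RDFS_LABEL, "Protein"),  # label for the class
]


def label_of(uri):
    """Return the human-readable label of a URI, or the URI itself if none is known."""
    for s, p, o in triples:
        if s == uri and p == RDFS_LABEL:
            return o
    return uri


def types_of(uri):
    return [o for s, p, o in triples if s == uri and p == RDF_TYPE]


# A user interface shows labels; the URIs stay 'under the hood'.
display = f"{label_of(HDAC1)} is a {label_of(types_of(HDAC1)[0])}"
print(display)  # HDAC1 is a Protein
```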
    50. Knowledge and data represented by graphs. [Graph: nodes “parvb”, “HDAC1”, “genome location <…>”, “Homo Sapiens”, “Species”, “Genome Location”, “Protein”, “Gene”, “Biological Entity”; edges “Interacts with”, “has genome location”, “in species”, “encodes”, “instance of”, “subclass of”] With Linked Data we build knowledge graphs. NB we decide what to include per application.
    51. Knowledge and data represented by graphs (same graph as slide 50).
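One thing a computer can do with such a graph is follow “instance of” and then “subclass of” edges to find every class an item belongs to. The sketch uses labels instead of URIs for readability; which node is an instance of which class is this sketch's reading of the flattened diagram, not something the transcript states.

```python
# Assumed edges, read from the diagram labels (the exact wiring is a guess).
INSTANCE_OF = {"HDAC1": "Protein", "parvb": "Gene"}
SUBCLASS_OF = {"Protein": "Biological Entity", "Gene": "Biological Entity"}


def all_classes(item):
    """Return the direct class of an item followed by all its superclasses."""
    classes = []
    current = INSTANCE_OF.get(item)
    while current is not None:
        classes.append(current)
        current = SUBCLASS_OF.get(current)
    return classes


print(all_classes("HDAC1"))  # ['Protein', 'Biological Entity']
```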
    52. Things we can say: it was me! Example: acknowledgement by Nanopublication. What we say is not limited to biology… Biology: http://purl.uniprot.org/uniprot/Q13547 http://conceptwiki.org/index.php/Concept:e6559... http://bio2rdf.org/geneid:29780 (“HDAC1 interacts with Parvb”). Credit: http://nanopub.org/nschema/hasPublicationInfo http://nanopub.org/4214adf1... http://swan.mindinformatics.org/.../pav/Author http://orcid.org/0000-0002-8691-772X (“nanopublication authored by me!”).
    53. Knowledge and data represented by graphs (graph of slide 50, now marked myNanopub:myAssertion). Our name is on this now.
    54. Things we can say: mappings. http://purl.uniprot.org/uniprot/Q13547 <URI for “is same as”> <URI in other resource>. Back to Ulrike. One other type of relation: the mapping. We state what is what between resources.
    55. Things we can say: mappings. http://purl.uniprot.org/uniprot/Q13547 <URI for “also referred to as”> <URI in other resource>. We can do that in a precise and subtle way. Vocabularies exist for sophisticated mapping (also as URIs).
    56. Ulrike Braisch’s Problem, by using these URIs. If Ulrike’s table were Linked Data… Registries: <URI for C>, <URI for R2>, <URI for R3>. <URI for Education level>: <URI for C_EDUC>: <URIs for 7 levels>; <URI for Edlevel>: <URIs for 9 levels>; <URI for Isced>: <URIs for 7 levels>. <URI for Marital status>: <URI for C_MARSTAT>: <URIs for never, now, separated, divorced, divorced>; <URI for Maristat>: <URIs for single, married, partnership, divorced, widowed>; <URI for Maristat>: <URIs for single, married, partnership, divorced, widowed>. <URI for Age/date of birth>: <URI for Age at baseline in years>; <URI for Exact age at visit>; <URI for Exact age at visit>. I wish to correlate patient characteristics with CAG repeat length.
    57. Linked Data for Ulrike. We also say… • <URI for C>, <URI for R2>, <URI for R3> <URI for “is of type”> <URI for RD resource> • <URI for Edlevel level 3> <URI for “is narrower than”> <URI for C_EDUC level 2> • <URI for Isced level 3> <URI for “is same as”> <URI for C_EDUC level 2> • <URI for C_MARSTAT:divorced> <URI for “is same as”> <URI for Maristat:divorced> • <URI for C_MARSTAT:never> <URI for “is related to”> <URI for Maristat:single> • <URI for C_MARSTAT>, <URI for Maristat> <URI for “subclass of”> <URI for Marital status>. Remember: URI = ID + Reference + Computable.
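Once mappings are stated as triples, a program can use them to translate values between registries. The sketch below encodes three of the mappings from slide 57; all URIs, including the predicate URIs, are hypothetical `example.org` placeholders (a real dataset might use an existing mapping vocabulary such as SKOS instead).

```python
EX = "http://example.org/registry/"  # hypothetical namespace
SAME_AS = EX + "isSameAs"
NARROWER_THAN = EX + "isNarrowerThan"

mappings = [
    (EX + "Edlevel/3", NARROWER_THAN, EX + "C_EDUC/2"),
    (EX + "Isced/3", SAME_AS, EX + "C_EDUC/2"),
    (EX + "C_MARSTAT/divorced", SAME_AS, EX + "Maristat/divorced"),
]


def harmonize(uri, target_prefix):
    """Return the URIs under target_prefix that the given value may be counted as."""
    if uri.startswith(target_prefix):
        return {uri}
    found = set()
    for s, p, o in mappings:
        # A value may be counted as anything it is the same as or narrower than.
        if s == uri and p in (SAME_AS, NARROWER_THAN) and o.startswith(target_prefix):
            found.add(o)
        # "is same as" also holds in reverse; "is narrower than" does not.
        if o == uri and p == SAME_AS and s.startswith(target_prefix):
            found.add(s)
    return found


result = harmonize(EX + "Edlevel/3", EX + "C_EDUC/")
print(result)
```

Note that the broader `C_EDUC` level 2 is not translated back to `Edlevel` level 3: the direction of “is narrower than” is exactly the kind of subtlety a plain lookup table tends to lose.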
    58. Conclusions (1/2). Linked Data is not • Painless data integration and computer reasoning • Harmonization moved up to early data management • More efficient, modelling effort is reused • Pain: semantic model for new data • Early days for reasoning: we need your Linked Data first!
    59. Conclusions (2/2). Linked Data is • A way to enable computers to help harmonize • Everything has a unique reference • Ontologies say what data means • Mappings specify the relation between datasets • Data integration (almost) trivial • Enable computing with knowledge
    60. Linked Data Architecture. In the next few slides we show (simplified) how Linked Data systems work. 25 April 2014
    61. Most common use: common reference. [Diagram: a Gene Expression Database and a Clinical Registry exchange Linked Data, both referring to Smoker, Heavy smoker, Light smoker]
    62. Most common use: common reference (same diagram as slide 61). Ontologies in Linked Data provide a reference for systems, whatever internal structures they use.
    63. Most common use: common reference (same diagram as slide 61). Systems do not have to agree on one fixed schema. One common link suffices to connect resources.
    64. Typical Linked Data architecture for data integration applications. [Diagram: Source 1, Source 2 and Source 3, each exposed as Linked Data, feed a Linked Data Cache (e.g. running COEUS), which serves a user-dependent Case Study Interface]
    65. Typical Linked Data architecture for data integration applications (same diagram as slide 64). Linked Data can be integrated in a cache. Integration is trivial when sources are well-formed Linked Data: when the same URIs were used for the same things, integration is instant.
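Why integration becomes (almost) trivial can be sketched with sets of triples: when two sources use the same URI for the same thing, merging them is a set union and queries can cross the source boundary. All URIs below are hypothetical; the sketch stands in for a cache and is not how COEUS or a triple store is implemented.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SMOKER = "http://example.org/onto/Smoker"           # hypothetical shared class
TAKEN_FROM = "http://example.org/vocab/takenFrom"   # hypothetical relation
PATIENT_17 = "http://example.org/registry/patient/17"
SAMPLE_A3 = "http://example.org/expression/sample/A3"

# Two sources with different internal structures, exposed as triples.
clinical_registry = {(PATIENT_17, RDF_TYPE, SMOKER)}
expression_database = {(SAMPLE_A3, TAKEN_FROM, PATIENT_17)}

# The 'cache': integration is a plain union because PATIENT_17 is the same URI in both.
cache = clinical_registry | expression_database


def samples_from_smokers(graph):
    smokers = {s for s, p, o in graph if p == RDF_TYPE and o == SMOKER}
    return sorted(s for s, p, o in graph if p == TAKEN_FROM and o in smokers)


print(samples_from_smokers(cache))
```

Neither source can answer the question on its own; the one common link (the patient URI) is what connects them.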
    66. OpenPHACTS uses Linked Data for drug discovery. [Architecture diagram: Data Import of Public Content, Commercial content, Public Ontologies and User Annotations (as Db, Nanopub and VoID); Core Platform with Data Cache (Virtuoso Triple Store), Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C and Domain Specific Services; Linked Data API (RDF/XML, TTL, JSON); Applications. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”]
    67. Nanopublication. Claim your findings as Nanopublications. Mark Thompson, Rajaram Kaliyaperumal. “It was me, me, me!” Finally, a word about Nanopublication, because in our opinion your data contributions should be acknowledged.
    68. Nanopublication. • What do you say with a Nanopublication? • Minimal statement for which you deserve credit • How you came to say it (provenance) • Who should be cited • Preferred Format: Linked Data!
    69. Nanopublication (same bullets as slide 68). [Diagram labels: Science Good Science Acknowledged Good Science Digital]
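The three things a Nanopublication says (the statement, how you came to say it, who should be cited) can be sketched as three small groups of triples. The ORCID and the HDAC1 and PARVB URIs appear on earlier slides; every other URI, and the class itself, is hypothetical and does not follow the official nanopublication schema.

```python
from dataclasses import dataclass, field

EX = "http://example.org/vocab/"  # hypothetical vocabulary
AUTHORED_BY = EX + "authoredBy"
DERIVED_FROM = EX + "derivedFrom"


@dataclass
class Nanopublication:
    uri: str
    assertion: list = field(default_factory=list)         # the minimal statement
    provenance: list = field(default_factory=list)        # how you came to say it
    publication_info: list = field(default_factory=list)  # who should be cited

    def authors(self):
        return [o for s, p, o in self.publication_info
                if s == self.uri and p == AUTHORED_BY]


np_uri = "http://example.org/nanopub/1"  # hypothetical
nanopub = Nanopublication(
    uri=np_uri,
    assertion=[("http://purl.uniprot.org/uniprot/Q13547",
                EX + "interactsWith",
                "http://bio2rdf.org/geneid:29780")],
    provenance=[(np_uri + "#assertion", DERIVED_FROM,
                 "http://example.org/experiment/42")],
    publication_info=[(np_uri, AUTHORED_BY,
                       "http://orcid.org/0000-0002-8691-772X")],
)

print(nanopub.authors())  # ['http://orcid.org/0000-0002-8691-772X']
```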
    70. Fame and glory (and reproducibility): Nanopublication! [Diagram: Experiment → Data generation → Integration → Analysis → Application, from Data to Knowledge, with Pain and Gain labels and two added “Gain: Nanopublications” labels]
    71. Fame and glory (and reproducibility): Nanopublication! (same diagram as slide 70) A new type of gain is the credit you can get for data publication.
    72. Acknowledgements. Ulrike Braisch (University of ULM, Germany); RD-Connect (EU-FP7); Leiden University Medical Center; Dutch Tech Centre for Life Sciences. RD-Connect Linked Data and Ontology Task Force, in particular: Pedro Lopes, Rachel Thompson, David Salgado, Peter Robinson, Manual Posada, Estrella Lopez Martin, Mark Thompson, Michael Orth, David van Enckevort. BioSemantics team LUMC: Kristina Hettne, Eleni Mina, Tareq Malas, Herman van Haagen, Peter-Bram ‘t Hoen, Rajaram Kaliyaperumal, Zuotian Tatum, Eelke van der Horst, Mark Thompson, Barend Mons. These slides are partly based on input and inspiration from Frank van Harmelen, Paul Groth, Scott Marshall, Andrew Gibson, Katy Wolstencroft, Jun Zhao, Robert Stevens, Carole Goble, W3C Health Care and Life Science Interest Group. Thank you for your attention…
