Kidney and Urinary Pathways Knowledge Base
(part of e-LICO)
Simon Jupp
University of Manchester
Bio-ontologies, Boston
July 9 2010
Kidney and Urinary Knowledge Base and Ontology
KUP KB (RDF store)
• Specialised repository of KUP-related data
• KUP ontology for integration, query and inference
• Background knowledge for data mining experiments
• Collaborative update by the community

Chronic Renal Disease
Obstructive nephropathy: the leading cause of end-stage renal disease in children.
Dialysis or transplantation: $8000/patient
A plumbing problem: kidney, ureter, bladder, urine

Collecting data
[Figure: urine and tissue samples are profiled at the genome, proteome and metabolome levels using CE-MS, LC-MS/MS, antibody arrays, miRNA arrays and mRNA arrays. An example mass spectrum (intensity vs. m/z, 600 to 1600) shows annotated b- and y-ion peaks.]

Genome OR Proteome OR Metabolome
Identification of pathways instead of molecules
Genome AND Proteome AND Metabolome
Identification of pathways instead of molecules
Identification of nodes in the pathophysiology of obstruction

e-LICO
e-LICO FP7 EU project: e-Laboratory for Interdisciplinary Collaborative research in data-mining and data-intensive sciences.
http://www.e-lico.eu
[Diagram labels: Expression data; Text-mining / Image mining; KUP KB (RDF store); New models and hypotheses; Further wet lab experiments]
Use Semantic Web technologies (RDF/OWL) for the KUP KB part of our infrastructure

KUP KB requirements
• Need a low-cost platform for data integration
• Flexible data model
– Community extensions
• Use of controlled vocabularies
– Ontologies for query and inferencing

Kidney and Urinary Pathway Knowledge Base
1. Background knowledge for data-mining experiments
2. Repository of KUP experiments
http://www.e-lico.eu/kupkb
Contents: -omics data; experimental data

KUP KB prototype
• Currently contains a set of example queries that use the KUP ontology to query the data:
– Which human genes have evidence for upregulation in the glomerulus?
– In which tissue is "PLA2G4A" expressed, and in which biological processes does it participate?
– Which proteins participate in TGF-beta signaling pathways, and where are they upregulated in the kidney?

Querying the graph
[Diagram: three linked graphs]
KUPO ontology: kupo:002444 (rdfs:label "PT epithelial cell") ro:part_of MA:00345 (Proximal tubule); kupo:004672 (rdfs:label "DT epithelial cell") ro:part_of MA:00456 (Distal tubule)
Entrez Gene: Gene X and Gene Y, annotated via go:biological_process with GO terms (e.g. GO:0054426)
Higgins dataset: Gene X and Gene Y, each linked via kupo:expressed_in to MA:00345 or MA:00456
Query: Which genes involved in protein transport are expressed in proximal tubule epithelial cells?

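The query on this slide can be read as a chain of triple-pattern matches. The following is a minimal sketch in plain Python rather than SPARQL against the real KUP KB. The gene names, the GO placeholder terms and the choice of which gene is expressed where are assumptions made up for illustration; only the kupo:/MA: identifiers and predicate names are taken from the slide.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# "GeneX", "GeneY", "GO:protein_transport" and "GO:other_process" are
# hypothetical placeholders; which gene is expressed where is also assumed.
TRIPLES = {
    ("kupo:002444", "rdfs:label", "PT epithelial cell"),
    ("kupo:002444", "ro:part_of", "MA:00345"),
    ("kupo:004672", "rdfs:label", "DT epithelial cell"),
    ("kupo:004672", "ro:part_of", "MA:00456"),
    ("GeneX", "go:biological_process", "GO:protein_transport"),
    ("GeneY", "go:biological_process", "GO:other_process"),
    ("GeneX", "kupo:expressed_in", "MA:00345"),
    ("GeneY", "kupo:expressed_in", "MA:00456"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def genes_for(process, cell_label):
    """Genes annotated with `process` and expressed where `cell_label` cells sit."""
    cells = [s for s, _, _ in match(p="rdfs:label", o=cell_label)]
    places = {o for c in cells for _, _, o in match(s=c, p="ro:part_of")}
    expressed = {s for pl in places
                 for s, _, _ in match(p="kupo:expressed_in", o=pl)}
    annotated = {s for s, _, _ in match(p="go:biological_process", o=process)}
    return sorted(expressed & annotated)


print(genes_for("GO:protein_transport", "PT epithelial cell"))
```

Each step (label to cell, cell to anatomy, anatomy to gene, gene to process) corresponds to one triple pattern that a SPARQL query would join.
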
KUP KB: KUP ontology (alpha)
Built on: Anatomy (MAO), Cells (CTO), Gene biological processes (GO)
Relations: subClassOf, part-of, participates-in (the diagram distinguishes asserted from inferred links)
[Diagram]
Anatomy: Renal proximal tubule, with Proximal straight tubule and Proximal convoluted tubule; Kidney cortex (linked by part-of)
Cells: Proximal tubule epithelial cell, with Proximal straight tubule epithelial cell and Proximal convoluted tubule epithelial cell (linked by subClassOf; cells are part-of anatomy)
Processes: Renal sodium absorption, Renal sodium ion absorption (cells participate in processes)
Each kidney cell is currently described by its localisation and function

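The distinction between assertion and inference can be illustrated with a small sketch: a few links are stated, and the rest follow from treating subClassOf and part-of as transitive. Class names come from the slide, but which edges are asserted in the real KUPO is an assumption here, and a reasoner does considerably more than this.

```python
# Asserted edges as (child, parent) pairs. The choice of asserted edges is an
# assumption for illustration, not a dump of the actual ontology.
SUBCLASS_OF = {
    ("Proximal straight tubule", "Renal proximal tubule"),
    ("Proximal convoluted tubule", "Renal proximal tubule"),
}
PART_OF = {
    ("Renal proximal tubule", "Kidney cortex"),
    ("Proximal straight tubule epithelial cell", "Proximal straight tubule"),
}


def superclasses(cls):
    """All ancestors of cls under subClassOf (transitive), cls included."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for child, parent in SUBCLASS_OF:
            if child == current and parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def inferred_part_of(entity):
    """Everything entity is part of, following part-of and subClassOf links."""
    found, todo = set(), [entity]
    while todo:
        current = todo.pop()
        for cls in superclasses(current):
            for part, whole in PART_OF:
                if part == cls:
                    # Being part of a class implies being part of its superclasses.
                    for w in superclasses(whole):
                        if w not in found:
                            found.add(w)
                            todo.append(w)
    return found


print(sorted(inferred_part_of("Proximal straight tubule epithelial cell")))
```

Only one part-of link is stated for the cell; the links to the renal proximal tubule and the kidney cortex are derived.
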
The KUPO development process
[Diagram stages: Collaborative spreadsheet; Individual spreadsheet; Issue tracker; OPPL script formulation; Generate OWL; Reasoned ontology; View ontology]

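The spreadsheet-to-ontology step can be pictured as turning each row into a handful of statements. The sketch below is not OPPL and does not produce OWL; it only shows the row-to-statement idea. The column names and the pairing of cells with processes are hypothetical.

```python
import csv
import io

# Hypothetical spreadsheet layout: one row per cell type. The column names and
# the cell/process pairings are made up for illustration.
SHEET = """cell,located_in,participates_in
Proximal straight tubule epithelial cell,Proximal straight tubule,Renal sodium ion absorption
Proximal convoluted tubule epithelial cell,Proximal convoluted tubule,Renal sodium absorption
"""


def rows_to_statements(text):
    """Turn each spreadsheet row into (subject, relation, object) statements."""
    statements = []
    for row in csv.DictReader(io.StringIO(text)):
        statements.append((row["cell"], "part-of", row["located_in"]))
        statements.append((row["cell"], "participates-in", row["participates_in"]))
    return statements


for s, p, o in rows_to_statements(SHEET):
    print(f"{s} --{p}--> {o}")
```

Domain experts only ever touch the spreadsheet; the script owns the mapping from columns to relations.
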
KUP KB: -omics data
We can represent -omics data as a graph
[Diagram: asserted relationships]
geneid:17638 type Entrez Gene ID
geneid:17638 symbol "Fasl"
geneid:17638 encodes AC18765
AC18765 type UNIPROT ID
AC18765 symbol "Fas-ligand"
AC18765 participates-in has:00527
has:00527 type KEGG pathway ID
has:00527 symbol "Apoptosis"

KUP KB: experimental data
We can represent experimental data as a graph
[Diagram: asserted relationships]
GEO:028364 (type: GEO Experiment ID; contributor: Higgins et al) has an observation, "Differentially expressed genes", whose sample is KUPO: Proximal straight tubule and which contains geneid:17638

Connecting the graphs
[Diagram: the three graphs joined on their shared nodes]
Experimental data: GEO:028364 (contributor: Higgins et al), observation "Differentially expressed genes", contains geneid:17638
-omics data: geneid:17638 (symbol "Fasl"), AC18765 (symbol "Fas-ligand") participates-in has:00527 (symbol "Apoptosis")
KUP ontology: Renal proximal tubule, Proximal straight tubule, Proximal convoluted tubule and their epithelial cells (subClassOf, part-of); Renal sodium absorption, Renal sodium ion absorption (participates-in)

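Because both graphs mention geneid:17638, connecting them amounts to a set union followed by a walk across the shared node. The sketch below assumes edge directions that the diagram does not fully settle, and "obs1" is a hypothetical name for the observation node.

```python
# -omics graph, using the identifiers shown on the slides.
OMICS = {
    ("geneid:17638", "symbol", "Fasl"),
    ("geneid:17638", "encodes", "AC18765"),
    ("AC18765", "symbol", "Fas-ligand"),
    ("AC18765", "participates-in", "has:00527"),
    ("has:00527", "symbol", "Apoptosis"),
}
# Experimental graph; "obs1" and the edge directions are assumptions.
EXPERIMENT = {
    ("GEO:028364", "contributor", "Higgins et al"),
    ("GEO:028364", "observation", "obs1"),
    ("obs1", "sample", "KUPO:Proximal straight tubule"),
    ("obs1", "contains", "geneid:17638"),
}
# Merging RDF-style graphs is a plain union of their triples.
GRAPH = OMICS | EXPERIMENT


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the merged graph."""
    return {o for s, p, o in GRAPH if s == subject and p == predicate}


def pathways_for_experiment(experiment):
    """Follow experiment -> observation -> gene -> protein -> pathway -> label."""
    labels = set()
    for obs in objects(experiment, "observation"):
        for gene in objects(obs, "contains"):
            for protein in objects(gene, "encodes"):
                for pathway in objects(protein, "participates-in"):
                    labels |= objects(pathway, "symbol")
    return labels


print(pathways_for_experiment("GEO:028364"))
```

Neither source graph alone can answer "which pathways does this experiment touch?"; the merged graph can, with no schema changes.
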
Bio2RDF
• Best practices from the W3C Health Care and Life Science Working group
• Bio2RDF ontology as a schema
KUP KB (RDF store)

So why RDF over RDBMS?
• Having a standard representation simply makes my life easier
• Lots of heterogeneous KUP data to be integrated
• RDF allows me to simply pile more data in
• Natural support for ontologies
– Although limited
• RDF alone isn’t enough
– Next step: intelligent agents and crawlers…
– How do we harness all this connected data?

Challenges
• Bad modelling (?)
– Conflation of instances and classes
– "Cell bears some function (that is realised in some process)" vs "Cell participates in some process"
• False statements and vague semantics
– Trying to accommodate the biologists’ queries
– Mapping natural language to semantic relationships
– Experiments, expression data, gene lists, etc. It’s hard
• Plus a whole list of general Semantic Web related issues

Data mining
• Data mining experiments have just started
• SPARQL queries generate tables of background knowledge for data mining tools
• Mine results for associations, clusters and predictive models
• Build user-friendly tools to hide the underlying technology
• Results expected in Y2 (later this year…)

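Turning query results into a table for a data mining tool might look like the following sketch. It assumes the SPARQL SELECT has already run and returned one binding per gene/pathway pair. The second gene and "Example pathway" are invented, and the one-column-per-pathway layout is just one possible encoding.

```python
import csv
import io

# Rows as a SPARQL SELECT might return them. Only geneid:17638 / Fasl /
# Apoptosis come from the slides; the other values are made up.
BINDINGS = [
    {"gene": "geneid:17638", "symbol": "Fasl", "pathway": "Apoptosis"},
    {"gene": "geneid:17638", "symbol": "Fasl", "pathway": "Example pathway"},
    {"gene": "geneid:0000", "symbol": "GeneZ", "pathway": "Example pathway"},
]


def to_feature_table(bindings):
    """Build one row per gene with a 0/1 column for each pathway."""
    pathways = sorted({b["pathway"] for b in bindings})
    genes = {}
    for b in bindings:
        genes.setdefault(b["gene"], set()).add(b["pathway"])
    header = ["gene"] + pathways
    rows = [[gene] + [int(p in members) for p in pathways]
            for gene, members in sorted(genes.items())]
    return header, rows


def to_csv(header, rows):
    """Serialise the table as CSV text, a format most mining tools can read."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(*to_feature_table(BINDINGS)))
```
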
Summary
• Rapid and low-cost data integration
– Thanks to existing community efforts!
• Single SPARQL endpoint provides flexible queries
– Especially useful for our data-mining queries
• Rapid ontology development
– Spreadsheets to engage domain experts

KUP Knowledge Base in e-LICO
http://www.e-lico.eu/kupkb
[Architecture diagram labels: KUP KB (RDF store); Bio2RDF; Linked Open Data / Semantic Web / Bio ontologies; E-LICO workflows; Use case data; Raw data; E-LICO DB; E-LICO data analysis web interface; Query; Results; Shared meta-data]

Acknowledgements
• Julie Klein, Joost Schanstra
– Inserm, France
• Robert Stevens
– University of Manchester
• EuroKUP members who have already contributed to the ontology

Challenges
• KUP KB implemented as a triple store (Sesame)
– Scalable
– Limited inference (RDFS)
• Experiments with OWL
– Classification possible (FaCT++)
– DL query language lacks desirable features
  • Joins, unions, filters, etc.

Challenges 2
• Re-use existing RDF datasets
– Bio2RDF could be improved
– URI guidelines unclear
  • PURLs or OBO URIs?
• BioPortal, OBO Foundry, Bio2RDF…
– RDF endpoint to BioPortal is great!

Challenges 4
• Warehoused data
– I don’t want to maintain other people’s data
• Linked data and query federation
– What is possible now?
– SADI framework


Editor's Notes

  • #14 We initially chose a KUP portion of the FMA, but domain experts found that there was too much detail in some sections and not enough in others. In addition, too many ontological distinctions were made within the portion of the FMA and the consequent dispersal of information made it hard to use. In time, we could have refined the FMA to do the job required, but we found that the MAO had all the detail for our needs. Although the connecting tubule is absent in mouse and present in humans, the MAO has this entity. Therefore the MAO can act as a substitute for the human anatomy.
  • #16, #17, #18 They should have seen this before, including at the KUP day in Manchester in November. Year 2: add urinary pathway, diseases.