Kidney and Urinary Pathways Knowledge Base (part of e-LICO)

Kidney and Urinary Pathways Knowledge Base
(part of e-LICO)
Simon Jupp
University of Manchester
Bio-ontologies, Boston
July 9 2010

Kidney and Urinary Knowledge Base and Ontology
KUP KB
(RDF store)
• Specialised repository of KUP-related data
• KUP ontology for integration, query and inference
• Background knowledge for data mining experiments
• Collaborative update by the community

Chronic Renal Disease
Obstructive nephropathy
- first cause of end-stage renal disease in children
Dialysis or transplantation
- 8000$/patient
A plumbing problem
Kidney
Ureter
Bladder
Urine

Collecting data
Proteome, Metabolome, Genome
Samples: urine, tissue
Methods: CE-MS, antibody array, LC-MS/MS, miRNA array, mRNA array
[Figure: MS/MS spectrum, intensity (0 to 100) against m/z (600 to 1600), with labelled b and y fragment ion peaks]

Genome OR Proteome OR Metabolome
Identification of pathways instead of molecules

Genome AND Proteome AND Metabolome
Identification of pathways instead of molecules
Identification of nodes in the pathophysiology of obstruction

e-LICO
Expression data
KUP KB
(RDF store)
Text-mining / Image mining
New models and hypotheses
Further wet lab experiments
e-LICO FP7 EU project.
e-Laboratory for Interdisciplinary Collaborative research in
data-mining and data-intensive sciences.
http://www.e-lico.eu

KUP KB
(RDF store)
Use Semantic Web technologies (RDF/OWL)
for this part of our infrastructure

KUP KB requirements
• Need a low-cost platform for data integration
• Flexible data model
– Community extensions
• Use of controlled vocabularies
– Ontologies for query and inferencing

Kidney and Urinary Pathway Knowledge Base
1. Background knowledge for data-mining experiments
2. Repository of KUP experiments
http://www.e-lico.eu/kupkb
-omics data
Experimental data

KUP KB prototype
• Currently contains a set of example queries that use the KUP ontology to query the data:
– Which human genes have evidence for upregulation in the glomerulus?
– In which tissue is "PLA2G4A" expressed and in which biological processes does it participate?
– Which proteins participate in TGF-beta signaling pathways and where are they upregulated in the kidney?

Querying the graph
[Figure: three linked RDF graphs]
KUPO ontology: kupo:002444 (rdfs:label "PT epithelial cell") ro:part_of MA:00345 (Proximal tubule); kupo:004672 (rdfs:label "DT epithelial cell") ro:part_of MA:00456 (Distal tubule)
Entrez Gene: Gene X go:biological_process GO:0054426; Gene Y
Higgins dataset: Gene X and Gene Y linked by kupo:expressed_in to MA:000345 and MA:00456
Query: Which genes involved in protein transport are expressed in proximal tubule epithelial cells?
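
The query on this slide can be sketched as a walk over a set of triples. This is a minimal illustration in plain Python, not the SPARQL used against the KUP KB. It assumes that Gene X is the gene expressed in the proximal tubule, that GO:0054426 stands for protein transport, and that MA:00345 and MA:000345 on the slide are the same node. The helper names are hypothetical.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# Which gene is expressed where is an assumption made for this sketch.
TRIPLES = {
    ("kupo:002444", "rdfs:label", "PT epithelial cell"),
    ("kupo:002444", "ro:part_of", "MA:00345"),
    ("kupo:004672", "rdfs:label", "DT epithelial cell"),
    ("kupo:004672", "ro:part_of", "MA:00456"),
    ("GeneX", "go:biological_process", "GO:0054426"),
    ("GeneX", "kupo:expressed_in", "MA:00345"),
    ("GeneY", "kupo:expressed_in", "MA:00456"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def genes_for(cell_label, go_term):
    """Genes annotated with go_term and expressed where the labelled cell sits."""
    genes = set()
    for cell in subjects("rdfs:label", cell_label):
        for site in objects(cell, "ro:part_of"):
            for gene in subjects("kupo:expressed_in", site):
                if go_term in objects(gene, "go:biological_process"):
                    genes.add(gene)
    return genes


print(genes_for("PT epithelial cell", "GO:0054426"))
```

The cell label is resolved through the ontology (label, then part_of), and only then joined to the expression data, which is the point of querying through KUPO rather than against the dataset directly.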

KUP KB: KUP ontology (alpha)
[Figure: the KUP ontology combines Anatomy (MAO), Cells (CTO) and Gene biological processes (GO)]
Anatomy: Proximal straight tubule and Proximal convoluted tubule are each subClassOf Renal proximal tubule; part-of links lead to Kidney cortex
Cells: Proximal straight tubule epithelial cell and Proximal convoluted tubule epithelial cell are each subClassOf Proximal tubule epithelial cell
Processes: Renal sodium absorption, Renal sodium ion absorption
Cells are linked to anatomy by part-of and to processes by participates-in
Legend: Assertion, Inference
Each kidney cell is currently described by its localisation and function
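
The difference between asserted and inferred links can be sketched with a small fixed-point computation. This is a hand-rolled illustration, not the reasoner used for the KUP ontology. The asserted part-of links below (cell to tubule, tubule to cortex) are assumptions read off the diagram, and the function name is hypothetical.

```python
# Asserted axioms, written as (child, parent) pairs.
SUBCLASS_OF = {
    ("Proximal straight tubule", "Renal proximal tubule"),
    ("Proximal convoluted tubule", "Renal proximal tubule"),
    ("Proximal straight tubule epithelial cell", "Proximal tubule epithelial cell"),
    ("Proximal convoluted tubule epithelial cell", "Proximal tubule epithelial cell"),
}
PART_OF = {
    ("Proximal tubule epithelial cell", "Renal proximal tubule"),  # assumed
    ("Renal proximal tubule", "Kidney cortex"),                    # assumed
}


def infer_part_of(subclass_of, part_of):
    """Return part-of links that follow from the asserted ones.

    Rules applied until nothing new appears:
      subClassOf is transitive; part-of is transitive;
      A subClassOf B and B part-of C  gives  A part-of C;
      A part-of B and B subClassOf C  gives  A part-of C.
    """
    sub = set(subclass_of)
    part = set(part_of)
    changed = True
    while changed:
        new_sub = {(a, c) for a, b in sub for b2, c in sub if b == b2}
        new_part = (
            {(a, c) for a, b in sub for b2, c in part if b == b2}
            | {(a, c) for a, b in part for b2, c in sub if b == b2}
            | {(a, c) for a, b in part for b2, c in part if b == b2}
        )
        changed = not (new_sub <= sub and new_part <= part)
        sub |= new_sub
        part |= new_part
    return part - set(part_of)


inferred = infer_part_of(SUBCLASS_OF, PART_OF)
for child, parent in sorted(inferred):
    print(f"{child} part-of {parent}")
```

Only two part-of links are asserted here; the rest, such as the straight tubule epithelial cell being part of the kidney cortex, come out of the rules.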

The KUPO development process
[Figure: workflow with the stages Collaborative spreadsheet, Individual spreadsheet, OPPL script formulation, Generate OWL, Reasoned ontology and View ontology, alongside an Issue tracker]

KUP KB: -omics data
We can represent -omics data as a graph
geneid:17638 type Entrez Gene ID
geneid:17638 symbol Fasl
geneid:17638 encodes AC18765
AC18765 type UNIPROT ID
AC18765 symbol Fas-ligand
AC18765 participates-in has:00527
has:00527 type KEGG pathway ID
has:00527 symbol Apoptosis
Legend: Asserted relationship

KUP KB: experimental data
We can represent experimental data as a graph
[Figure: graph with the nodes GEO:028364 (type GEO Experiment ID), Higgins et al, Differentially expressed genes, KUPO: Proximal straight tubule and Geneid:17638, joined by the edges contributor, sample, observation and contains]
Legend: Asserted relationship

Connecting the graphs
[Figure: the experimental-data graph (GEO:028364, Higgins et al, Differentially expressed genes), the -omics graph (geneid:17638, Fasl, AC18765, Fas-ligand, has:00527, Apoptosis) and the KUP ontology (Renal proximal tubule, its straight and convoluted subclasses, the matching epithelial cell classes, Renal sodium absorption and Renal sodium ion absorption) joined into one graph through shared nodes, with the edges sample, observation, contains, contributor, symbol, participates-in, part-of and subClassOf]
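
Connecting the graphs amounts to taking the union of the triple sets: nodes with the same identifier merge automatically. A minimal sketch in plain Python follows. The direction of the experiment edges and the node id obs:1 are assumptions made for illustration, and the function names are hypothetical.

```python
OMICS = {
    ("geneid:17638", "type", "Entrez Gene ID"),
    ("geneid:17638", "symbol", "Fasl"),
    ("geneid:17638", "encodes", "AC18765"),
    ("AC18765", "type", "UNIPROT ID"),
    ("AC18765", "symbol", "Fas-ligand"),
    ("AC18765", "participates-in", "has:00527"),
    ("has:00527", "type", "KEGG pathway ID"),
    ("has:00527", "symbol", "Apoptosis"),
}

# Edge directions and the node id "obs:1" are assumed for this sketch.
EXPERIMENT = {
    ("GEO:028364", "type", "GEO Experiment ID"),
    ("GEO:028364", "contributor", "Higgins et al"),
    ("GEO:028364", "sample", "KUPO:Proximal straight tubule"),
    ("GEO:028364", "observation", "obs:1"),
    ("obs:1", "type", "Differentially expressed genes"),
    ("obs:1", "contains", "geneid:17638"),
}


def follow(graph, starts, predicate):
    """Objects reachable from any node in starts over one predicate edge."""
    return {o for s, p, o in graph if s in starts and p == predicate}


def pathways_observed_in(graph, tissue):
    """Pathway names for genes differentially expressed in the given tissue."""
    nodes = {s for s, p, o in graph if p == "sample" and o == tissue}
    for predicate in ("observation", "contains", "encodes",
                      "participates-in", "symbol"):
        nodes = follow(graph, nodes, predicate)
    return nodes


merged = OMICS | EXPERIMENT  # geneid:17638 is the shared node
print(pathways_observed_in(merged, "KUPO:Proximal straight tubule"))
```

Neither graph can answer the question alone: the experiment graph stops at the gene, and the -omics graph knows nothing about tissues.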

Bio2RDF
• Best practices from the W3C Health Care and Life Sciences Interest Group
• Bio2RDF ontology as a schema
KUP KB
(RDF store)

So why RDF over RDBMS?
• Having a standard representation simply makes my life easier
• Lots of heterogeneous KUP data to be integrated
• RDF allows me to simply pile more data in
• Natural support for ontologies
– Although limited
• RDF alone isn’t enough
• Next step: intelligent agents and crawlers…
• How do we harness all this connected data?

Challenges
• Bad modelling (?)
– Conflation of instances and classes
– "Cell bears some function (that is realised in some process)" vs "Cell participates in some process"
• False statements and vague semantics
– Trying to accommodate the biologists’ queries
– Mapping natural language to semantic relationships
– Experiments, expression data, gene lists etc. It’s hard
• Plus a whole list of general Semantic Web related issues

Data mining
• Data mining experiments have just started
• SPARQL queries generate tables of background knowledge for data mining tools
• Mine results for associations, clusters and predictive models
• Build user-friendly tools to hide the underlying technology
• Results expected in Y2 (later this year…)
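
Turning query results into a table for a data mining tool can be sketched as a pivot from triples to one row per gene with binary feature columns. This is an illustration only: the triples, predicate names and function names are hypothetical, and the real tables come from SPARQL queries against the KUP KB.

```python
import csv
import io

# Hypothetical query results as (gene, predicate, value) triples.
TRIPLES = [
    ("GeneX", "expressed_in", "Proximal tubule"),
    ("GeneX", "participates_in", "Protein transport"),
    ("GeneY", "expressed_in", "Distal tubule"),
]


def to_feature_table(triples):
    """Pivot triples into {gene: {feature: 0 or 1}} plus the ordered column list."""
    columns = sorted({f"{p}={o}" for _, p, o in triples})
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, dict.fromkeys(columns, 0))[f"{p}={o}"] = 1
    return columns, rows


def to_csv(columns, rows):
    """Serialise the feature table as CSV text, one gene per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene"] + columns)
    for gene in sorted(rows):
        writer.writerow([gene] + [rows[gene][c] for c in columns])
    return buffer.getvalue()


columns, rows = to_feature_table(TRIPLES)
print(to_csv(columns, rows))
```

Each predicate and value pair becomes one column, so adding a new kind of background knowledge to the graph adds columns without changing the code.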

Summary
• Rapid and low-cost data integration
– Thanks to existing community efforts!
• Single SPARQL endpoint provides flexible queries
– Especially useful for our data-mining queries
• Rapid ontology development
– Spreadsheets to engage domain experts

KUP Knowledge Base in e-LICO
[Figure: architecture diagram. Components shown: KUP KB (RDF store, http://www.e-lico.eu/kupkb), Bio2RDF, Linked Open Data / Semantic Web / Bio ontologies, e-LICO workflows, use case data, raw data, e-LICO DB and the e-LICO data analysis web interface. Arrows are labelled Query, Results and Shared meta-data.]

Acknowledgements
• Julie Klein, Joost Schanstra
– Inserm, France
• Robert Stevens
– University of Manchester
• EuroKUP members who already contributed to the ontology

Challenges
• KUP KB implemented as a triple store (Sesame)
– Scalable
– Limited inference (RDFS)
• Experiments with OWL
– Classification possible (FaCT++)
– DL query language lacks desirable features
  • Joins, unions, filters etc.

Challenges 2
• Re-use existing RDF datasets
– Bio2RDF could be improved
– URI guidelines unclear
  • PURLs or OBO URIs?
• BioPortal, OBO Foundry, Bio2RDF…
– RDF endpoint to BioPortal is great!

Challenges 4
• Warehoused data
– I don’t want to maintain other people’s data
• Linked data and query federation
– What is possible now?
– SADI framework
