Two Reason-able Views to the Web of Linked Data - Presentation Transcript
Two Reason-able Views to the Web
of Linked Data
Atanas Kiryakov, Vassil Momtchev
June, 2009
Presentation Outline
• Ontotext
• OWLIM semantic repository: introduction and latest news
• Reason-able views to the web of linked data
• LDSR: Linked Data Semantic Repository
• PIKB: Pathway and Interaction Knowledge Base
Two Reason-able Views to the Web of Linked Data June, 2009 #2
What is Ontotext?
• Ontotext is a semantic technology provider
• Established in year 2000 as part of Sirma Group
– Sirma is a top-3 software house in Bulgaria, est. 1992, ~300 persons
• Staff: 40 employees in Sofia and Varna
– Multiple affiliates and contractors in Western Europe
• Over 150 person-years invested in product development
• Investment acquired in July 2008
– A financial investor obtained minority share in a deal for 2.5 MEURO
• Ontotext is involved in two joint ventures:
– Innovantage: online recruitment intelligence provider in UK
– Namerimi: national search engine in Bulgaria
Ontotext Positioning
• Unique coverage of research/technology areas, including:
– Semantic Databases: high-performance RDF DBMS, scalable reasoning
– Semantic Search: text-mining (IE), Information Retrieval (IR)
– Semantic Web Services and BPM: WS annotation, discovery, etc.
– Web Mining: focused crawling, wrapping
– Knowledge fusion: identity resolution, record linkage
• Core business: development of semantic engines
– Mostly product development and sales
– Complemented by professional services
– Joint ventures for vertical applications
Application Domains
• Ontotext technologies are used for various applications:
– Data Integration (consolidation of multiple databases)
– Knowledge & Content Management (enterprise search)
– Business Intelligence
– Enterprise Application Integration & Business Process Management
– Web-mining/Web-intelligence
• Major industries/markets
– Life sciences
– Telecommunications
– Media Archives, Media Research
– Online recruitment
– IP/Patent Research
– Web search, Web 2.0 and Semantic Web start-ups
Leading Semantic Web Technology Developer
Ontotext develops several outstanding products:
• KIM semantic annotation & search platform
– Google 1st place for “semantic annotation”
• OWLIM semantic database
– Google 1st place for “semantic repository”
• wsmo4j & WSMO Studio – semantic web service annotation platform
– lead developer of popular open-source projects
Ontotext is a major contributor to open-source projects:
• GATE – the most popular NLP and text-mining platform
• Sesame – one of the most popular frameworks for RDF repositories
Extensive Involvement in Research Projects
• Ontotext has participated in 20+ EC research projects
• > 100 MEuro is the budget of the projects Ontotext is part of
– This is above 10% of the EC projects related to semantics
• Ontotext is the most successful Bulgarian company in FP6
• Ontotext is part of four FP7 projects, running until 2011:
– LARKC: web-scale reasoning
– soa4all: SOA for the masses through Semantic Web technology
– NoTube: semantics for personalized TV guides
– Insemtives: Incentive models and framework for semantic metadata
Semantic Repository for RDFS and OWL
• OWLIM is a scalable semantic repository which allows
– Management, integration, and analysis of heterogeneous data
– Combined with light-weight reasoning capabilities
• It can replace an RDBMS in many applications
– Suitable for analytical tasks and Business Intelligence (OLAP)
– Inappropriate for highly dynamic OLTP environments
• OWLIM is an RDF database with high-performance reasoning:
– The inference is based on logical rule-entailment
– Full RDFS and limited OWL Lite and Horst are supported
– Custom semantics defined via rules and axiomatic triples
Lightweight Semantics
• Still not sure what is the added value of ontologies at the end?
• Hard to comprehend what is “satisfiability”?
• Tough to predict, manage, tune, and scale?
• Probably someone is trying to sell you the ideas from his PhD
thesis and not an industrial data management technology
• Ontotext is productizing lightweight semantics that is easy to
understand, deploy, and manage
• For instance, think of ontologies as database schema with
simple rules. Plenty of obvious, but useful, inferences are just
around the corner
Rule-Based Inference
<C1,rdfs:subClassOf,C2>
<C2,rdfs:subClassOf,C3>
⇒ <C1,rdfs:subClassOf,C3>

<I,rdf:type,C1>
<C1,rdfs:subClassOf,C2>
⇒ <I,rdf:type,C2>

<I1,P1,I2>
<P1,rdfs:range,C2>
⇒ <I2,rdf:type,C2>

<P1,owl:inverseOf,P2>
<I1,P1,I2>
⇒ <I2,P2,I1>

<P1,rdf:type,owl:SymmetricProperty>
⇒ <P1,owl:inverseOf,P1>

[Figure: example graph illustrating the rules. Nodes: owl:SymmetricProperty, ptop:Agent, ptop:Person, ptop:Woman, myData:Ivan, myData:Maria, and the properties ptop:parentOf, ptop:childOf, ptop:relativeOf. Edge labels: rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:range, owl:inverseOf; inferred edges are marked separately.]
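The five entailment rules on this slide can be sketched as a naive forward-chaining loop. This is only an illustration in Python, not how OWLIM or TRREE is implemented; the function name `materialize` and the sample triples (loosely modelled on the slide's example graph) are assumptions made for the example.

```python
SUBCLASS, TYPE, RANGE = "rdfs:subClassOf", "rdf:type", "rdfs:range"
INVERSE, SYMMETRIC = "owl:inverseOf", "owl:SymmetricProperty"


def materialize(triples):
    """Apply the five rules repeatedly until no new triple can be derived."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # <C1,subClassOf,C2> <C2,subClassOf,C3> => <C1,subClassOf,C3>
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    derived.add((s, SUBCLASS, o2))
                # <I,type,C1> <C1,subClassOf,C2> => <I,type,C2>
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    derived.add((s, TYPE, o2))
                # <I1,P1,I2> <P1,range,C2> => <I2,type,C2>
                if p2 == RANGE and p == s2:
                    derived.add((o, TYPE, o2))
                # <P1,inverseOf,P2> <I1,P1,I2> => <I2,P2,I1>
                if p2 == INVERSE and p == s2:
                    derived.add((o, o2, s))
            # <P1,type,SymmetricProperty> => <P1,inverseOf,P1>
            if p == TYPE and o == SYMMETRIC:
                derived.add((s, INVERSE, s))
        if derived <= closure:
            return closure
        closure |= derived


# Hypothetical explicit statements, chosen only to exercise every rule.
explicit = {
    ("ptop:Woman", SUBCLASS, "ptop:Person"),
    ("ptop:Person", SUBCLASS, "ptop:Agent"),
    ("ptop:parentOf", INVERSE, "ptop:childOf"),
    ("ptop:parentOf", RANGE, "ptop:Person"),
    ("ptop:relativeOf", TYPE, SYMMETRIC),
    ("myData:Maria", TYPE, "ptop:Woman"),
    ("myData:Maria", "ptop:parentOf", "myData:Ivan"),
    ("myData:Ivan", "ptop:relativeOf", "myData:Maria"),
}

inferred = materialize(explicit) - explicit
for triple in sorted(inferred):
    print(triple)
```

The nested loop is quadratic per pass and is meant for reading, not for scale; a real engine would index triples by predicate before joining.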
Using OWLIM
• OWLIM is implemented as storage and inference layer (SAIL) for Sesame
• OWLIM is based on TRREE
– TRREE = Triple Reasoning and Rule Entailment Engine
– TRREE takes care of storage, indexing, inference, and query evaluation
– TRREE has different flavors, mapping to different OWLIM species
• OWLIM can be used and accessed in different ways:
– By end users: through the web UI routines of Sesame
• Through ontology editors integrated with Sesame, e.g. TopBraid Composer
– By applications: through the APIs of Sesame
• embed it as a library or
• access it as standalone server
Sesame, TRREE, ORDI, and OWLIM
[Diagram: architecture stack. Labels, top to bottom: Sesame Web UI and User Application or Ontology Editor; Sesame; SAIL API; OWLIM; TRREE Engine and ORDI.]
SwiftOWLIM and BigOWLIM
• Two OWLIM species: SwiftOWLIM and BigOWLIM
– Share the same inference and semantics (rule-compiler, etc.)
– They are identical in terms of usage and integration
• The same APIs, syntaxes, query languages (thanks to Sesame)
• Only the configuration parameters for performance tuning differ
• SwiftOWLIM is good for experiments and medium-sized data
– Extremely fast loading of data (incl. inference, storage, etc.)
• BigOWLIM is designed to handle huge volumes of data and
massive querying loads
– Query optimizations ensure faster query evaluation on large datasets
– Scales much better, having lower memory requirements
SwiftOWLIM and BigOWLIM (II)
Feature | SwiftOWLIM | BigOWLIM
Scale (mill. of explicit statem.) | 10 MSt, using 1.6 GB RAM; 100 MSt, using 16 GB RAM | 130 MSt, using 1.6 GB; 1068 MSt, using 8 GB
Processing speed (load+infer+store) | 30 KSt/s on notebook; 200 KSt/s on server | 5 KSt/s on notebook; 80 KSt/s on server
Query optimization | No | Yes
Persistence | Back-up in N-Triples | Binary data files and indices
Licence and availability | Open-source under LGPL; uses SwiftTRREE that is free, but not open-source | Commercial; research and evaluation copies provided for free
Named Graphs, SPARQL, Sesame 2.0
• Named graphs (NG) represent an extension of the RDF model
– Quadruples <s,p,o,ng> are used to define RDF multi-graph
– Allow for handling provenance when multiple RDF graphs are
integrated
• SPARQL is the most popular RDF query language
– Comprehensive support for SPARQL requires NG support
• Sesame is the most efficient RDF framework
– Versions 2.x and later support NG (under the name “contexts”)
– It also supports SPARQL
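As a rough illustration of how quadruples carry provenance, the sketch below stores <s,p,o,ng> tuples in plain Python and reports which named graph each value came from. The `ex:` identifiers, the sample values, and the helper names `match` and `provenance` are hypothetical; this is not the Sesame API.

```python
from collections import defaultdict

# Hypothetical quadruples <s, p, o, ng>: two sources disagree on one value.
quads = [
    ("ex:Sofia", "ex:population", "1200000", "ex:graphA"),
    ("ex:Sofia", "ex:population", "1250000", "ex:graphB"),
    ("ex:Sofia", "ex:country", "ex:Bulgaria", "ex:graphA"),
]


def match(quads, s=None, p=None, o=None, ng=None):
    """Return the quads matching a pattern; None acts as a wildcard."""
    pattern = (s, p, o, ng)
    return [
        q for q in quads
        if all(want is None or want == got for want, got in zip(pattern, q))
    ]


def provenance(quads, s, p):
    """Map each object value of (s, p) to the named graphs that assert it."""
    sources = defaultdict(set)
    for _, _, o, ng in match(quads, s=s, p=p):
        sources[o].add(ng)
    return dict(sources)


print(provenance(quads, "ex:Sofia", "ex:population"))
print(len(match(quads, ng="ex:graphA")))
```

With plain triples the two population values would be indistinguishable; the fourth element is what lets an integrated repository answer "who said this?".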
OWLIM Versions and Features Map
Version | Sesame version | SPARQL | Instant initializ. | owl:sameAs optimization | Comment
SwiftOWLIM 2.9.x | 1.2.x | - | - | - | The fastest OWL database. Multi-threaded inference, with transitive inference optimiz.
BigOWLIM 2.x | 1.2.x | - | + | + | Optimal performance and scalability. The fastest query evaluation. Successor of 0.9.x
SwiftOWLIM 3.x | 2.x | + | + | - | The fastest RDF machine with NG and SPARQL support
BigOWLIM 3.x | 2.x | + | + | + | Ultimate scalability and fast SPARQL evaluation
Outstanding Performance
• SwiftOWLIM is the fastest OWL engine!
– It scales to 10 million statements on a desktop PC (32-bit)
– It loads LUBM(50) in 42 seconds at average speed 161 KSt./sec.
• BigOWLIM is the most scalable OWL engine!
– It can load and reason with 8 bill. statements on a $10,000 server
• LUBM(64k) loaded in about 9 days with inference and materialization
– Loads the 14 billion statements of LUBM(64k) after materialization
– Loads the 1 bill. statements of LUBM(8k) in 14 hours and answers the
queries in 1 hour on a $2000 workstation
• “Full-cycle” loading, inference, and query evaluation in 15.2 hours
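The figures on this slide imply some rough sizes and rates, worked out below. The inputs are taken from the slide; reading "8 bill. statements ... in about 9 days" as explicit statements per wall-clock second is an assumption, and the variable names are illustrative only.

```python
def statements_per_second(statements, seconds):
    """Average throughput for a load that took the given number of seconds."""
    return statements / seconds


# SwiftOWLIM: LUBM(50) in 42 seconds at an average of 161 KSt/s.
lubm50_statements = 42 * 161_000

# BigOWLIM: 1 bill. statements of LUBM(8k) in 14 hours.
lubm8k_rate = statements_per_second(1_000_000_000, 14 * 3600)

# BigOWLIM: 8 bill. statements of LUBM(64k) in about 9 days (assumed explicit).
lubm64k_rate = statements_per_second(8_000_000_000, 9 * 24 * 3600)

print(f"LUBM(50) size implied by the slide: {lubm50_statements:,} statements")
print(f"LUBM(8k) average load rate: {lubm8k_rate:,.0f} st/s")
print(f"LUBM(64k) average load rate: {lubm64k_rate:,.0f} st/s")
```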
Scalable Inference Map: Introduction
• The map on the next slide presents the loading speed of a few
of the most scalable repositories in relation to the size of the
dataset and the complexity of the loading
– Most recently published evaluation results have been used
– The map includes runs of the LUBM and loading of Uniprot and LDSR
– For OWLIM, ORACLE and DAML DB, loading includes forward-chaining and
materialization
• The complexity of the reasoning reflects the language used
and the specificity of the instance data
– For instance, UNIPROT is much more heavily interconnected than LUBM
Scalable Reasoning Map (up to 1.5B)
Scalable Reasoning Map (the big picture)
[Figure: bubble chart. X axis: dataset size (bill. explicit statements), 0 to 14. Y axis: loading speed, 0 to 140 (unit label garbled in extraction). Bubble size indicates loading complexity (bigger is better). Hardware annotations: cluster of 14 8-core blades; sub-$2000 4-core desktop; sub-$10,000 8-core server. Series: BigOWLIM, AllegroGraph, Virtuoso, Jena TDB, BigData.]
Naïve OWL Fragments Map
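[Diagram: OWL fragments arranged by complexity between two families, Rules/LP and DL. Labels as extracted, top to bottom: OWL Full; SWRL; OWL DL; OWL Lite; OWL/WSML Flight; Datalog; OWLIM / OWL2 RL; OWL Horst / Tiny; OWL Lite- / DHL; OWL DLP; RDFS.]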
Semantics Supported by OWLIM
• The ruleset parameter allows for switching between 4
predefined inference modes:
– owl-max – the most expressive set (see the next slides);
– owl-horst – a set similar to the one defined in [Horst05]:
• It is sufficient to pass the LUBM benchmark correctly;
• Similar to the OWL fragment supported by ORACLE
– rdfs – the standard RDF(S) semantics;
– empty – as an RDF store without any inference.
• The partialRDFS parameter allows switching on/off an
optimization in the RDFS and OWL support
• Custom rule-sets can be defined (rules + axiomatic triples)
– This way, one can specify semantics which best fits the concrete application
in terms of expressivity and performance
LUBM(50,0): Rule-set and Inference Mode
[Bar chart: LUBM(50) load and inference time (sec.) for the rule-sets owl-max; owl-max, p*; horst; horst, p*; rdfs; rdfs, p*; empty. Bar values as extracted: 240, 181, 131, 105, 86, 85, 73. The percentage labels are garbled in extraction.]
• p* above means that the partialRDFS optimizations are switched on
• Since SwiftOWLIM 2.9.1, there are a few “optimizations” which partialRDFS
triggers in the OWL support in rule-sets owl-horst and owl-max
LUBM(50,0): Multi-threaded Inference
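[Bar chart: LUBM(50) load and inference time (sec.) per configuration, with time relative to the fastest run on the same machine]
Configuration | Load and inference (sec.) | Relative
4cOpt12g, lin64 - 1 | 141 | 134%
4cOpt12g, lin64 - 2 | 113 | 108%
4cOpt12g, lin64 - 3 | 107 | 102%
4cOpt12g, lin64 - 4 | 105 | 100%
4cOpt12g, lin64 - 5 | 110 | 105%
Piv0.9g, win32 - 1 | 344 | 124%
Piv0.9g, win32 - 2 | 278 | 100%
Piv0.9g, win32 - 3 | 299 | 108%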
Refer to OWLIM’s system documentation for analysis and comments
Query Performance
• The Berlin SPARQL Benchmark (BSBM) evaluates the performance of query
engines in an e-commerce use case: searching products and
navigating through related information
• Randomized “query mixes” (of 25 queries each) are evaluated
continuously against datasets of different sizes
• Multiple-client load is simulated as well
• The diagram compares the performance of a few of the most popular engines
[Chart: BSBM 25M Query Evaluation. Y axis: query mixes per hour, 0 to 10,000. X axis: 1 client, 4 clients, 8 clients. Series: BigOWLIM 3.1, Sesame 2.2.4, Jena TDB 0.72, Virtuoso TS 5.0.]
OWLIM in Use
• BigOWLIM is used for data-integration in life sciences
– Large scale protein-interaction related data in LifeSKIM platform
• SwiftOWLIM is bundled as ontology service in GATE 4.0/5.0
– GATE is the most popular text-mining platform
• OWLIM is used as a semantic repository in KIM
– KIM is a semantic annotation and search platform of Ontotext
• TopBraid Composer bundles OWLIM as a reasoner
• OWLIM is used in more than 10 European research projects
• OWLIM is used by a top-5 US defense contractor
– But also many 5-person startups
TopBraid Composer v.2.3 Announcement
“Our initial tests with OWLIM indicate that OWLIM may become
a serious alternative to the better-known engines such as Pellet
and Jena. My colleague Dean Allemang is using OWLIM to
classify ontologies that contain tons of individuals. His models
are essentially impossible to handle with Pellet…”
Holger Knublauch, TopQuadrant’s VP Product Development, in
his blog announcement of the new version of TBC
Benchmarks of BBN (the DAML-DB developers)
“We feel that the triple-stores that offered the best all-around
performance for operations with a large dataset were Sesame +
DAML DB, Jena + DAML DB, and Sesame + BigOWLIM. Each of
these triple-stores has their own relative merits. Most
importantly, all three of them provide adequate query response
time performance for various queries, but no one triple-store is
clearly better than the other triple-stores in all cases under the
conditions evaluated in this study. For instance, Sesame +
BigOWLIM provides better response time than the other triple-
stores when responding to complex queries. …”
Rohloff, K; Dean, M; Emmons, I; Ryder, D; Sumner, J. (2007) An
Evaluation of Triple-Store Technologies for Large Data Stores. In
Proc. of Scalable Semantic Systems Workshop (SSSW 2007).
Benchmarks of the KAON2 Developers
“OWLIM performed very well, while still being able to process
OWL DLP, and hence should be the choice for ABox reasoning
with lightweight ontologies.”
J. Bock, P. Haase, Q. Ji, R. Volz: Benchmarking OWL Reasoners. In
ARea2008 - Workshop on Advancing Reasoning on the Web
Linking Open Data
• Linking Open Data (LOD) W3C SWEO Community project
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
• Initiative for publishing “linked data” – a set of principles,
which allows browsing of RDF data, spread across different
servers, in the way HTML is browsed
Reason-able views to the LOD
• Classical sound and complete reasoning is infeasible for a
web of linked data.
• The major obstacles:
– Most of the popular reasoning setups count on the “closed world
assumption” (which is irrelevant in a web context)
– The complexity of reasoning even with the simplest DL (say
OWL Lite) is prohibitively high
– Some of the datasets of LOD (or some parts of them) are not
suitable for reasoning. It seems that some data publishers use the
OWL and RDFS vocabulary without accounting for its formal semantics
– Although reasoning with data distributed across different web
servers is possible, it is much slower than reasoning with local data.
The fundamental reason is the so-called "remote join"
problem known from distributed DBMSs
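The "remote join" cost can be illustrated with a toy simulation that counts round trips when a join spans two servers. This is a sketch under simplifying assumptions: `RemoteStore`, the `ex:` data, and the one-lookup-per-binding strategy are all hypothetical, and real distributed query engines use more refined plans.

```python
class RemoteStore:
    """Stands in for a remote endpoint; every call counts as one round trip."""

    def __init__(self, triples):
        self.triples = set(triples)
        self.requests = 0

    def pairs(self, predicate):
        self.requests += 1
        return sorted((s, o) for s, p, o in self.triples if p == predicate)

    def objects(self, subject, predicate):
        self.requests += 1
        return sorted(
            o for s, p, o in self.triples if s == subject and p == predicate
        )


def remote_join(people, places):
    """Join birthplaces (server 1) with countries (server 2), one lookup per binding."""
    results = []
    for person, place in people.pairs("ex:birthPlace"):
        for country in places.objects(place, "ex:inCountry"):
            results.append((person, country))
    return results


people = RemoteStore({
    ("ex:Ann", "ex:birthPlace", "ex:Miami"),
    ("ex:Bob", "ex:birthPlace", "ex:Tampa"),
    ("ex:Cid", "ex:birthPlace", "ex:Sofia"),
})
places = RemoteStore({
    ("ex:Miami", "ex:inCountry", "ex:USA"),
    ("ex:Tampa", "ex:inCountry", "ex:USA"),
    ("ex:Sofia", "ex:inCountry", "ex:Bulgaria"),
})

joined = remote_join(people, places)
print(joined)
print("round trips:", people.requests + places.requests)
```

The number of requests to the second server grows with the number of intermediate bindings, whereas the same join over data held in one local repository needs no network round trips at all.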
Reason-able views to the LOD (2)
• Reason-able views represent an approach for reasoning
with the web of linked data
• Key ideas:
– Inference with respect to tractable OWL dialects
– group selected datasets and ontologies in a reason-able view
– load all ontologies and data in a single semantic repository
• Selection Criteria:
– the dataset (or a part of it that is easy to define and isolate) allows
inference, which delivers meaningful results under the semantics
determined for the view;
– the dataset is more or less static, i.e. not a wrapper for a database
or service
Two reason-able views to the web of linked data
Ontotext presents:
• Linked Data Semantic Repository
– Some of the central LOD datasets
– General-purpose information
– 358M explicit and 512M inferred triples
– http://www.ontotext.com/ldsr/
• Linked Life Data - PIKB (in yellow)
– Several popular life-science datasets
– Complemented by gluing ontologies
– 1.47B explicit and 842M inferred triples
– http://www.linkedlifedata.com
Linking Open Data Datasets
How Linked Data Help Enterprises
• You can interlink proprietary data with linked data
• So, what?
• There is a chance that it is easier to integrate two proprietary
databases by linking them to the linked data cloud than by
trying to link them directly to one another
• And you put your data “in context”
• Which allows you to make more interesting queries
• And get more interesting answers
Linked Data Semantic Repository
• Datasets: DBPedia, Geonames, UMBEL, Wordnet, CIA
World Factbook, Lingvoj
• Ontologies: Dublin Core, SKOS*, RSS
• Inference: materialization with respect to owl-max
– One of the richest tractable fragments of OWL
– Seems to completely cover the semantics of the data
– owl:sameAs optimisation in BigOWLIM, allows reduction of the
indices, without loss of semantics or performance
• Publicly available at http://ldsr.ontotext.com
– Query and explore through Openrdf’s Workbench (web UI)
– SPARQL end-point
– Explorator interface
LDSR Statistics
Dataset | Explicit triples | Inferred after import | RDF nodes after import
Umbel | 3,167,205 | 56,833 | 1,230,550
DBpedia (sameAs) | 145,120 | 278,139 | 1,414,157
Geonames | 72,747,880 | 428,696,785 | 34,813,153
DBpedia 3.2 core | 280,697,077 | 38,922,702 | 100,131,770
lingvoj | 19,692 | 848,978 | 100,141,681
Wordnet | 1,946,838 | 8,575,920 | 100,769,150
CIA Factbook | 35,956 | 291,877 | 101,005,679
Total | 357,844,134 | 511,522,747 | 101,005,679
• Total statements in the repository indices: 869M
• Number of retrievable statements (considering
owl:sameAs expansion): above 1.1B
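The gap between the 869M indexed statements and the 1.1B retrievable ones comes from owl:sameAs expansion. The sketch below shows one way such an optimisation could work: statements are stored once per equivalence class and expanded at query time. It is an illustration only, not BigOWLIM's actual implementation; the class `SameAsIndex` and the `ex:`, `geo:`, and `dbpedia:` sample identifiers are hypothetical.

```python
from itertools import product

# Totals from the statistics table: explicit plus inferred triples.
total_indexed = 357_844_134 + 511_522_747


class SameAsIndex:
    """Stores each statement once, over one representative per sameAs class."""

    def __init__(self):
        self.parent = {}
        self.statements = set()

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def same_as(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a
            # Re-point already stored statements at the merged representative.
            self.statements = {
                (self.find(s), p, self.find(o)) for s, p, o in self.statements
            }

    def add(self, s, p, o):
        self.statements.add((self.find(s), p, self.find(o)))

    def members(self, node):
        root = self.find(node)
        return {n for n in self.parent if self.find(n) == root}

    def expanded(self):
        """All statements retrievable once every equivalent node is substituted."""
        result = set()
        for s, p, o in self.statements:
            for s2, o2 in product(self.members(s), self.members(o)):
                result.add((s2, p, o2))
        return result


index = SameAsIndex()
index.same_as("dbpedia:Berlin", "geo:Berlin")
index.same_as("dbpedia:Germany", "geo:Germany")
index.add("geo:Berlin", "ex:locatedIn", "geo:Germany")

print(f"{total_indexed:,}")
print(len(index.statements), "stored,", len(index.expanded()), "retrievable")
```

One stored statement answers four query variants here, which is the same effect, at toy scale, as the indexed and retrievable counts reported on the slide.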
Reasoning and Querying Across Datasets
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia4: <http://dbpedia.org/ontology/>
PREFIX dbpedia3: <http://dbpedia.org/resource/>
PREFIX opencyc: <http://sw.opencyc.org/2008/06/10/concept/en/>
PREFIX ontology: <http://www.geonames.org/ontology#>
SELECT *
WHERE { ?Person dbpedia4:birthplace ?BirthPlace .
        ?BirthPlace ontology:parentFeature dbpedia3:Florida .
        ?Person rdf:type opencyc:Entertainer }
• This query involves data from DBPedia, Geonames, and UMBEL
• It involves inference over types, sub-classes, and transitive
relationships
LDSR in LarKC
• LDSR was set up as a testbed for selection and ranking
components for RDF
• PageRankRDF performance on LDSR:
– it takes only 10 seconds to perform one iteration of PageRank
– 3 minutes to compute the ranks of the 100 million nodes in LDSR
• DualRDF (an RDF priming component) performance on LDSR:
– The performance of the spreading activation tasks varies considerably
depending on the parameters of the process
– As a reference point use the following result: it takes 7 seconds to activate
about 7 thousand nodes after spreading of activation from resource
http://dbpedia.org/resource/Berlin with decay factor 0.25.
– Queries on the “primed” or “selected” part of a dataset run up to 20 times
faster and return only focussed results
• More details are presented in deliverable D2.4.1
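Spreading activation with a decay factor can be sketched as follows. The slide does not describe DualRDF's algorithm, so this is a generic textbook-style version under assumed parameters: the `threshold` cut-off, the undirected treatment of edges, and the small graph around `dbpedia:Berlin` are all hypothetical.

```python
from collections import defaultdict


def spread_activation(edges, source, decay=0.25, threshold=0.05, max_steps=10):
    """Propagate activation from source; each hop multiplies it by decay."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    activation = {source: 1.0}
    frontier = {source}
    for _ in range(max_steps):
        next_frontier = set()
        for node in frontier:
            passed = activation[node] * decay
            if passed < threshold:
                continue  # too weak to activate anything further
            for neighbour in neighbours[node]:
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier.add(neighbour)
        if not next_frontier:
            break
        frontier = next_frontier
    return activation


# Hypothetical chain of resources starting from Berlin.
edges = [
    ("dbpedia:Berlin", "dbpedia:Germany"),
    ("dbpedia:Germany", "dbpedia:Europe"),
    ("dbpedia:Europe", "dbpedia:Earth"),
]
activated = spread_activation(edges, "dbpedia:Berlin", decay=0.25)
print(activated)
```

A smaller decay factor or a higher threshold shrinks the activated ("primed") part of the graph, which is why the slide notes that performance depends heavily on the process parameters.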
Time to Guess It?
The problem!
• The data is supported by different organizations
• The information is highly distributed and redundant
• There are tons of flat file formats with special
semantics
• The knowledge is locked in vast data silos
• There are many isolated communities which could not
reach cross-domain understanding
Linked Data Dataset Growth
That’s How LinkedLifeData is Born?
• Reason-able view to the web of data to describe the life
science and health care domain
• Allow straightforward updates of the information
• Support incremental extension of the knowledge base with
highly heterogeneous data sets
• Analyze unstructured text information
• Assessed with clinical expertise of AstraZeneca
• Currently operates over a semantic repository
A Pharmaceutical Industry Researcher
• Hard to find information
• Hard to use data due to lack of context information
• Hard to collaborate across domains due to information silos
• No easy way to interpret the information (most of the time is
lost preparing and transforming data)
LinkedLifeData is a Platform to Help the Drug
Development Process
The Different Levels of Information Systems
LinkedLifeData Vision
http://en.wikipedia.org/wiki/AstraZeneca
References to 52 drugs
(the list is claimed to be incomplete)
http://dbpedia.org/resource/AstraZeneca
[Diagram: DBpedia graph around dbpedia:AstraZeneca. The drug resources dbpedia:Budesonide, dbpedia:Entocort, dbpedia:Losec, dbpedia:Omeprazole, dbpedia:Esomeprazole, and dbpedia:Nexium are connected through dbpedia:wikilink, dbpedia:redirect, and skos:subject edges.]
References to 6 drugs
Other Data Sources
datasource:organization/AstraZeneca
datasource:organization/AstraZeneca_LP
datasource:organization/AstraZeneca_Pharmaceuticals%2C_LP
datasource:organization/AstraZeneca_Pharmaceuticals_LP
datasource:organization/AstraZeneca_Pharmaeuticals_LP
datasource:organization/AstraZeneca_Pharnaceuticals_LP
[Diagram: six linking patterns: namespace mapping, reference node, mismatched identifiers, value dereference, transitive link, and literal extraction. Node labels in the diagram include ns-x:id, ns-y:id, db:id, db:accession, accession, term, and name.]
Pathway and Interaction Knowledge Base Dataset
• Linked Life Data statistics:
– gene – proteins – pathways – targets – disease – drugs – patient
• Number of statements: 2,187,294,998
• Prototype to test scalability and performance of
Ontotext’s Linked Data infrastructure
Database | Size | Schema | Description
Uniprot | 1,146,084,021 | Original by the provider | Protein sequences and annotations
Entrez-Gene | 107,193,308 | Custom RDF schema | Genes and annotation
Gene Ontology | 9,656,074 | Schema by the provider | Gene and gene product annotation thesaurus
BioGRID | 1,892,897 | BioPAX 2.0 (custom generated) | Protein interactions extracted from the literature
NCI - Pathway Interaction Database | 333,415 | BioPAX 2.0 (original by the provider) | Human pathway interaction database
The Cancer Cell Map | 173,914 | BioPAX 2.0 (original by the provider) | Cancer pathways database
Reactome | 2,538,793 | BioPAX 2.0 (original by the provider) | Human pathways and interactions
INOH | 432,456 | BioPAX 2.0 (original by the provider) | Pathway database
KEGG | 18,128,735 | BioPAX 1.0 (original by the provider) | Molecular Interaction
PubMed * | 900,861,385 | Custom RDF schema | Biomedical citations
UMLS * | 79,88,309 | Public OWL semantic network + custom RDF schema | Biomedical terms
Total | 2,187,294,998 | |
LinkedLifeData 0.2 Dataset
• Linked Life Data statistics:
– gene – proteins – pathways – targets – disease – drugs – patient
• Number of statements: over 3 billion
• Data sources:
– Uniprot, Entrez-Gene, PubMed, UMLS (MeSH, Taxonomy,
GeneOntology), BioGRID, NCI, Reactome, BioCarta, KEGG,
BioCyc, DBPedia, LODD, Bio2RDF
Common Questions We Would Like to Ask
• Find drugs together with the diseases they are aimed at and the targets they use
"Which are the targets for drugs used in the treatment of
endocrine diseases?"
• Find potential new targets (part of a pathway) for the development
of new drugs for a specific disease
“Company X has a very profitable drug; what are new potential
targets?”
LifeSKIM Application
• A platform offering software infrastructure for:
– automatic semantic annotation of text
– ontology population
• Store the extracted facts and reason on top of them
• Semantic indexing and retrieval of content
• Query and navigation involving structured knowledge
• Based on Information Extraction (i.e. text-mining) technology
How Does LifeSKIM Search Better?
• LifeSKIM can match a query
Documents about interleukin 6 (interferon, beta 2) where it is
connected to apoptosis of neutrophils.
• With a document containing
…. the same effect was not observed for IFNB2, IL-8 and TNF-
alpha…….. …. is induced neutrophil programmed cell death by
apoptosis ……
How Does LifeSKIM Search Better? (II)
Classical IR could not match:
• interleukin 6 with HGF; HSF; BSF2; IL-6; IFNB2
Interleukin 6 is an entity in Entrez-Gene with GeneID: 3569,
and HGF; HSF; BSF2; IL-6; IFNB2 are aliases for the same gene
entity.
• apoptosis of neutrophils with neutrophil apoptosis;
programmed cell death of neutrophils by apoptosis;
programmed cell death, neutrophils; neutrophil programmed
cell death by apoptosis
The GeneOntology thesaurus lists the above terms as synonyms of the
apoptosis of neutrophils term.
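The difference between classical keyword matching and alias-aware matching can be sketched with the aliases listed on this slide. This is a toy illustration, not LifeSKIM's pipeline: the function names are hypothetical, and naive substring matching stands in for real information extraction (short aliases such as HSF would need token-level matching in practice).

```python
# Aliases and synonyms as listed on the slide (Entrez-Gene and GeneOntology).
ALIASES = {
    "interleukin 6": ["interleukin 6", "HGF", "HSF", "BSF2", "IL-6", "IFNB2"],
    "apoptosis of neutrophils": [
        "apoptosis of neutrophils",
        "neutrophil apoptosis",
        "programmed cell death of neutrophils by apoptosis",
        "programmed cell death, neutrophils",
        "neutrophil programmed cell death by apoptosis",
    ],
}


def concepts_in(text, aliases=ALIASES):
    """Return the concepts whose name or any alias occurs in the text."""
    lowered = text.lower()
    return {
        concept
        for concept, names in aliases.items()
        if any(name.lower() in lowered for name in names)
    }


def semantic_match(query_concepts, document):
    """True if every queried concept is mentioned under some alias."""
    return set(query_concepts) <= concepts_in(document)


def keyword_match(query_terms, document):
    """Classical IR baseline: every query term must occur literally."""
    lowered = document.lower()
    return all(term.lower() in lowered for term in query_terms)


document = (
    "the same effect was not observed for IFNB2, IL-8 and TNF-alpha; "
    "is induced neutrophil programmed cell death by apoptosis"
)
query = ["interleukin 6", "apoptosis of neutrophils"]

print("keyword match:", keyword_match(query, document))
print("semantic match:", semantic_match(query, document))
```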
Semantic Annotation Example
Thanks
AstraZeneca
• Bosse Andersson
• The researchers
Ontotext
• Deyan Peychev
• Georgi Georgiev
• OWLIM team
• KIM team
The development of PIKB and LinkedLifeData is
partially funded by FP7 215535 LarKC