Two Reason-able Views to the Web of Linked Data

Presentation Transcript

  • Two Reason-able Views to the Web of Linked Data Atanas Kiryakov, Vassil Momtchev June, 2009
  • Presentation Outline • Ontotext • OWLIM semantic repository: introduction and latest news • Reason-able views to the web of linked data • LDSR: Linked Data Semantic Repository • PIKB: Pathway and Interaction Knowledge Base Two Reason-able Views to the Web of Linked Data June, 2009 #2
  • What is Ontotext? • Ontotext is a semantic technology provider • Established in 2000 as part of Sirma Group – Sirma is a top-3 software house in Bulgaria, est. 1992, ~300 persons • Staff: 40 employees in Sofia and Varna – Multiple affiliates and contractors in Western Europe • Over 150 person-years invested in product development • Investment acquired in July 2008 – A financial investor obtained a minority share in a deal for 2.5 MEuro • Ontotext is involved in two joint ventures: – Innovantage: online recruitment intelligence provider in the UK – Namerimi: national search engine in Bulgaria
  • Ontotext Positioning • Unique coverage of research/technology areas, including: – Semantic Databases: high-performance RDF DBMS, scalable reasoning – Semantic Search: text-mining (IE), Information Retrieval (IR) – Semantic Web Services and BPM: WS annotation, discovery, etc. – Web Mining: focused crawling, wrapping – Knowledge fusion: identity resolution, record linkage • Core business: development of semantic engines – Mostly product development and sales – Complemented by professional services – Joint ventures for vertical applications
  • Application Domains • Ontotext technologies are used for various applications: – Data Integration (consolidation of multiple databases) – Knowledge & Content Management (enterprise search) – Business Intelligence – Enterprise Application Integration & Business Process Management – Web-mining/Web-intelligence • Major industries/markets – Life sciences – Telecommunications – Media Archives, Media Research – Online recruitment – IP/Patent Research – Web search, Web 2.0 and Semantic Web start-ups
  • Leading Semantic Web Technology Developer • Ontotext develops several outstanding products: • KIM semantic annotation & search platform – Google 1st place for “semantic annotation” • OWLIM semantic database – Google 1st place for “semantic repository” • wsmo4j & WSMO Studio – semantic web service annotation platform – lead developer of popular open-source projects • Ontotext is a major contributor to open-source projects: • GATE – the most popular NLP and text-mining platform • Sesame – one of the most popular frameworks for RDF repositories
  • Extensive Involvement in Research Projects • Ontotext has participated in 20+ EC research projects • The combined budget of the projects Ontotext is part of exceeds 100 MEuro – This is above 10% of the EC projects related to semantics • Ontotext is the most successful Bulgarian company in FP6 • Ontotext is part of four FP7 projects, running until 2011: – LarKC: web-scale reasoning – SOA4All: SOA for the masses through Semantic Web technology – NoTube: semantics for personalized TV guides – Insemtives: incentive models and framework for semantic metadata
  • Presentation Outline • Ontotext • OWLIM semantic repository: introduction and latest news • Reason-able views to the web of linked data • LDSR: Linked Data Semantic Repository • PIKB: Pathway and Interaction Knowledge Base
  • Semantic Repository for RDFS and OWL • OWLIM is a scalable semantic repository which allows – Management, integration, and analysis of heterogeneous data – Combined with light-weight reasoning capabilities • It can replace an RDBMS in many applications – Suitable for analytical tasks and Business Intelligence (OLAP) – Inappropriate for highly dynamic OLTP environments • OWLIM is an RDF database with high-performance reasoning: – The inference is based on logical rule-entailment – Full RDFS and limited OWL Lite and OWL Horst are supported – Custom semantics defined via rules and axiomatic triples
  • Lightweight Semantics • Still not sure what the added value of ontologies is in the end? • Hard to comprehend what “satisfiability” is? • Tough to predict, manage, tune, and scale? • Probably someone is trying to sell you the ideas from his PhD thesis and not an industrial data management technology • Ontotext is productizing lightweight semantics that is easy to understand, deploy, and manage • For instance, think of ontologies as database schema with simple rules. Plenty of obvious, but useful, inferences are just around the corner
  • Rule-Based Inference
      <C1,rdfs:subClassOf,C2> <C2,rdfs:subClassOf,C3> ⇒ <C1,rdfs:subClassOf,C3>
      <I,rdf:type,C1> <C1,rdfs:subClassOf,C2> ⇒ <I,rdf:type,C2>
      <I1,P1,I2> <P1,rdfs:range,C2> ⇒ <I2,rdf:type,C2>
      <P1,owl:inverseOf,P2> <I1,P1,I2> ⇒ <I2,P2,I1>
      <P1,rdf:type,owl:SymmetricProperty> ⇒ <P1,owl:inverseOf,P1>
      [Figure: example graph over myData:Ivan, myData:Maria, ptop:Agent, ptop:Person, ptop:Woman, ptop:childOf, ptop:parentOf, and ptop:relativeOf, linked by rdfs:subClassOf, rdfs:subPropertyOf, rdfs:range, rdf:type, owl:inverseOf, and owl:SymmetricProperty; inferred statements are marked]
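The five rules above can be tried out with a small forward-chaining loop. The following is only a minimal sketch in plain Python, not OWLIM or TRREE code: triples are tuples, the fixpoint loop is deliberately naive, and the sample statements (including the choice of which property is declared inverse, ranged, or symmetric) are assumptions made up for illustration, loosely following the names in the figure.

```python
# Toy forward-chaining materializer for the five rules listed on the slide.
# Triples are plain (subject, predicate, object) tuples; not OWLIM code.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"
RANGE = "rdfs:range"
INVERSE = "owl:inverseOf"
SYMMETRIC = "owl:SymmetricProperty"


def materialize(explicit):
    """Return the closure of `explicit` under the five rules (naive fixpoint)."""
    closure = set(explicit)
    while True:
        new = set()
        for (s, p, o) in closure:
            if p == SUBCLASS:
                for (s2, p2, o2) in closure:
                    # <C1,subClassOf,C2> <C2,subClassOf,C3> => <C1,subClassOf,C3>
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))
                    # <I,type,C1> <C1,subClassOf,C2> => <I,type,C2>
                    if p2 == TYPE and o2 == s:
                        new.add((s2, TYPE, o))
            elif p == RANGE:
                # <I1,P1,I2> <P1,range,C2> => <I2,type,C2>
                for (s2, p2, o2) in closure:
                    if p2 == s:
                        new.add((o2, TYPE, o))
            elif p == INVERSE:
                # <P1,inverseOf,P2> <I1,P1,I2> => <I2,P2,I1>
                for (s2, p2, o2) in closure:
                    if p2 == s:
                        new.add((o2, o, s2))
            elif p == TYPE and o == SYMMETRIC:
                # <P1,type,SymmetricProperty> => <P1,inverseOf,P1>
                new.add((s, INVERSE, s))
        if new <= closure:
            return closure
        closure |= new


# Hypothetical explicit statements, loosely modelled on the figure.
explicit = {
    ("ptop:Woman", SUBCLASS, "ptop:Person"),
    ("ptop:Person", SUBCLASS, "ptop:Agent"),
    ("ptop:childOf", INVERSE, "ptop:parentOf"),
    ("ptop:parentOf", RANGE, "ptop:Person"),
    ("ptop:relativeOf", TYPE, SYMMETRIC),
    ("myData:Maria", TYPE, "ptop:Woman"),
    ("myData:Ivan", "ptop:childOf", "myData:Maria"),
}

inferred = materialize(explicit)
for triple in sorted(inferred - explicit):
    print(triple)
```

A production engine would index the statements and apply rules incrementally rather than rescanning the whole set on each pass; the sketch only shows what "materialization" means.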
  • Using OWLIM • OWLIM is implemented as a storage and inference layer (SAIL) for Sesame • OWLIM is based on TRREE – TRREE = Triple Reasoning and Rule Entailment Engine – TRREE takes care of storage, indexing, inference, and query evaluation – TRREE has different flavors, mapping to different OWLIM species • OWLIM can be used and accessed in different ways: – By end users: through the web UI routines of Sesame • Through ontology editors integrated with Sesame, e.g. TopBraid Composer – By applications: through the APIs of Sesame • embed it as a library or • access it as a standalone server
  • Sesame, TRREE, ORDI, and OWLIM [Architecture diagram. Labels: Sesame Web UI or Ontology Editor; User Application; Sesame; SAIL API; OWLIM; TRREE Engine; ORDI]
  • SwiftOWLIM and BigOWLIM • Two OWLIM species: SwiftOWLIM and BigOWLIM – Share the same inference and semantics (rule-compiler, etc.) – They are identical in terms of usage and integration • The same APIs, syntaxes, query languages (thanks to Sesame) • Only the configuration parameters for performance tuning differ • SwiftOWLIM is good for experiments and medium-sized data – Extremely fast loading of data (incl. inference, storage, etc.) • BigOWLIM is designed to handle huge volumes of data and massive querying loads – Query optimizations ensure faster query evaluation on large datasets – Scales much better, having lower memory requirements
  • SwiftOWLIM and BigOWLIM (II)
      Feature | SwiftOWLIM | BigOWLIM
      Scale (mill. of explicit statem.) | 10 MSt, using 1.6 GB RAM; 100 MSt, using 16 GB RAM | 130 MSt, using 1.6 GB; 1068 MSt, using 8 GB
      Processing speed (load+infer+store) | 30 KSt/s on notebook; 200 KSt/s on server | 5 KSt/s on notebook; 80 KSt/s on server
      Query optimization | No | Yes
      Persistence | Back-up in N-Triples | Binary data files and indices
      Licence and Availability | Open-source under LGPL; uses SwiftTRREE that is free, but not open-source | Commercial. Research and evaluation copies provided for free
  • Named Graphs, SPARQL, Sesame 2.0 • Named graphs (NG) represent an extension of the RDF model – Quadruples <s,p,o,ng> are used to define an RDF multi-graph – Allow for handling provenance when multiple RDF graphs are integrated • SPARQL is the most popular RDF query language – Comprehensive support for SPARQL requires NG support • Sesame is the most efficient RDF framework – Versions 2.x and later support NG (under the name “contexts”) – It also supports SPARQL
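To make the quadruple idea concrete, here is a minimal sketch of how the fourth element carries provenance once several graphs are merged. It uses plain Python tuples rather than the Sesame API, and every identifier in it (the ex: and graph: names) is hypothetical.

```python
from collections import defaultdict

# Each statement is a quadruple (subject, predicate, object, named_graph).
# All identifiers below are made up for illustration.
quads = [
    ("ex:Berlin", "rdf:type", "ex:City", "graph:dbpedia"),
    ("ex:Berlin", "ex:country", "ex:Germany", "graph:dbpedia"),
    ("ex:Berlin", "rdf:type", "ex:City", "graph:geonames"),
]


def provenance(quads):
    """Map each triple to the set of named graphs that assert it."""
    sources = defaultdict(set)
    for s, p, o, ng in quads:
        sources[(s, p, o)].add(ng)
    return dict(sources)


def graph(quads, ng):
    """Return the triples that belong to a single named graph."""
    return {(s, p, o) for s, p, o, g in quads if g == ng}


for triple, graphs in provenance(quads).items():
    print(triple, "asserted in", sorted(graphs))
```

The same triple can appear in more than one graph, which is exactly the information that is lost when data sets are merged into plain triples.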
  • OWLIM Versions and Features Map
      Version | Sesame version | SPARQL | Instant initializ. | owl:sameAs optimization | Comment
      SwiftOWLIM 2.9.x | 1.2.x | - | - | - | The fastest OWL database. Multi-threaded inference, with transitive inference optimiz.
      BigOWLIM 2.x | 1.2.x | - | + | + | Optimal performance and scalability. The fastest query evaluation. Successor of 0.9.x
      SwiftOWLIM 3.x | 2.x | + | + | - | The fastest RDF machine with NG and SPARQL support
      BigOWLIM 3.x | 2.x | + | + | + | Ultimate scalability and fast SPARQL evaluation
  • Outstanding Performance • SwiftOWLIM is the fastest OWL engine! – It scales to 10 million statements on a desktop PC (32-bit) – It loads LUBM(50) in 42 seconds at an average speed of 161 KSt/s • BigOWLIM is the most scalable OWL engine! – It can load and reason with 8 bill. statements on a $10,000 server • LUBM(64k) loaded in about 9 days with inference and materialization – Loads the 14 billion statements of LUBM(64k) after materialization – Loads the 1 bill. statements of LUBM(8k) in 14 hours and answers the queries in 1 hour on a $2000 workstation • “Full-cycle” loading, inference, and query evaluation in 15.2 hours
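The throughput figures on this slide can be cross-checked with simple arithmetic. The sketch below only restates the numbers quoted above; the variable names are illustrative, and the results are approximate because the slide rounds its inputs.

```python
# Figures quoted on the slide.
lubm50_seconds = 42
lubm50_rate = 161_000              # statements per second (161 KSt/s)
lubm8k_statements = 1_000_000_000  # "1 bill. statements"
lubm8k_load_hours = 14
lubm8k_query_hours = 1

# Statements processed while loading LUBM(50) at the quoted average speed.
lubm50_statements = lubm50_seconds * lubm50_rate

# Average loading rate implied by the LUBM(8k) figures.
lubm8k_rate = lubm8k_statements / (lubm8k_load_hours * 3600)

print(f"LUBM(50): about {lubm50_statements / 1e6:.1f} million statements")
print(f"LUBM(8k): about {lubm8k_rate / 1e3:.1f} KSt/s on average")
print(f"LUBM(8k) load + query: {lubm8k_load_hours + lubm8k_query_hours} hours")
```

The 15 hours obtained from the rounded inputs is in line with the 15.2 hours quoted for the full cycle.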
  • Scalable Inference Map: Introduction • The map on the next slide presents the loading speed of a few of the most scalable repositories in relation to the size of the dataset and the complexity of the loading – The most recently published evaluation results have been used – The map includes runs of the LUBM and loading of Uniprot and LDSR – For OWLIM, ORACLE and DAML DB, loading includes forward-chaining and materialization • The complexity of the reasoning reflects the language used and the specificity of the instance data – For instance, UNIPROT is much more heavily interconnected than LUBM
  • Scalable Reasoning Map (up to 1.5B)
  • Scalable Reasoning Map (the big picture) [Bubble chart: loading speed (axis 0 to 140) against dataset size (bill. explicit statements, axis 0 to 14); bubble size indicates loading complexity (bigger is better). Hardware annotations: cluster of 14 8-core blades; sub-$2000 4-core desktop; sub-$10,000 8-core server. Systems: BigOWLIM, AllegroGraph, Virtuoso, Jena TDB, BigData]
  • Naïve OWL Fragments Map [Figure: OWL fragments arranged by complexity* between “Rules, LP” and “DL”. Labels: OWL Full; SWRL; OWL DL; OWL Lite; OWL/WSML Flight; Datalog; OWLIM / OWL2 RL; OWL Horst / Tiny OWL; OWL Lite- / DHL; OWL DLP; RDFS]
  • Semantics Supported by OWLIM • The ruleset parameter allows for switching between 4 predefined inference modes: – owl-max – the most expressive set (see the next slides); – owl-horst – a set similar to the one defined in [Horst05]: • It is sufficient to pass the LUBM benchmark correctly; • Similar to the OWL fragment supported by ORACLE – rdfs – the standard RDF(S) semantics; – empty – as an RDF store without any inference. • The partialRDFS parameter allows switching on/off an optimization in the RDFS and OWL support • Custom rule-sets can be defined (rules + axiomatic triples) – This way, one can specify semantics which best fits the concrete application in terms of expressivity and performance
  • LUBM(50,0): Rule-set and Inference Mode [Bar chart: LUBM(50) load and inference time (sec.) for the rule-sets owl-max; owl-max, p*; horst; horst, p*; rdfs; rdfs, p*; empty. Numeric labels on the chart: 240, 181, 131, 105, 86, 85, 73] • p* above means that the partialRDFS optimizations are switched on • Since SwiftOWLIM 2.9.1, there are a few “optimizations” which partialRDFS triggers in the OWL support in rule-sets owl-horst and owl-max
  • LUBM(50,0): Multi-threaded Inference
      Configuration | LUBM(50) load and inference (sec.) | Relative to the best run on the same machine
      4cOpt12g, lin64 - 1 | 141 | 134%
      4cOpt12g, lin64 - 2 | 113 | 108%
      4cOpt12g, lin64 - 3 | 107 | 102%
      4cOpt12g, lin64 - 4 | 105 | 100%
      4cOpt12g, lin64 - 5 | 110 | 105%
      Piv0.9g, win32 - 1 | 344 | 124%
      Piv0.9g, win32 - 2 | 278 | 100%
      Piv0.9g, win32 - 3 | 299 | 108%
      Refer to OWLIM’s system documentation for analysis and comments
  • Query Performance • The Berlin SPARQL Benchmark (BSBM) evaluates the performance of query engines in an e-commerce use case: searching products and navigating through related information • Randomized “query mixes” (of 25 queries each) are evaluated continuously against datasets of different sizes • Multiple-client load is simulated as well • The diagram compares the performance of a few of the most popular engines [Bar chart “BSBM 25M Query Evaluation”: query mixes per hour (axis 0 to 10,000) for BigOWLIM 3.1, Sesame 2.2.4, Jena TDB 0.72, and Virtuoso TS 5.0 with 1, 4, and 8 clients]
  • OWLIM in Use • BigOWLIM is used for data-integration in life sciences – Large scale protein-interaction related data in LifeSKIM platform • SwiftOWLIM is bundled as ontology service in GATE 4.0/5.0 – GATE is the most popular text-mining platform • OWLIM is used as a semantic repository in KIM – KIM is a semantic annotation and search platform of Ontotext • TopBraid Composer bundles OWLIM as a reasoner • OWLIM is used in more than 10 European research projects • OWLIM is used by a top-5 US defense contractor – But also by many 5-person startups
  • TopBraid Composer v.2.3 Announcement “Our initial tests with OWLIM indicate that OWLIM may become a serious alternative to the better-known engines such as Pellet and Jena. My colleague Dean Allemang is using OWLIM to classify ontologies that contain tons of individuals. His models are essentially impossible to handle with Pellet…” Holger Knublauch, TopQuadrant’s VP Product Development, in his blog announcement of the new version of TBC
  • Benchmarks of BBN (the DAML-DB developers) “We feel that the triple-stores that offered the best all-around performance for operations with a large dataset were Sesame + DAML DB, Jena + DAML DB, and Sesame + BigOWLIM. Each of these triple-stores has their own relative merits. Most importantly, all three of them provide adequate query response time performance for various queries, but no one triple-store is clearly better than the other triple-stores in all cases under the conditions evaluated in this study. For instance, Sesame + BigOWLIM provides better response time than the other triple-stores when responding to complex queries. …” Rohloff, K; Dean, M; Emmons, I; Ryder, D; Sumner, J. (2007) An Evaluation of Triple-Store Technologies for Large Data Stores. In Proc. of Scalable Semantic Systems Workshop (SSSW 2007).
  • Benchmarks of the KAON2 Developers “OWLIM performed very well, while still being able to process OWL DLP, and hence should be the choice for ABox reasoning with lightweight ontologies.” J. Bock, P. Haase, Q. Ji, R. Volz: Benchmarking OWL Reasoners. In ARea2008 - Workshop on Advancing Reasoning on the Web
  • Presentation Outline • Ontotext • OWLIM semantic repository: introduction and latest news • Reason-able views to the web of linked data • LDSR: Linked Data Semantic Repository • PIKB: Pathway and Interaction Knowledge Base
  • Linking Open Data • Linking Open Data (LOD) W3C SWEO Community project http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData • Initiative for publishing “linked data” – a set of principles, which allows browsing of RDF data, spread across different servers, in the way HTML is browsed
  • Reason-able views to the LOD • Classical sound and complete reasoning is infeasible for a web of linked data. • The major obstacles: – Most of the popular reasoning setups count on the “closed world assumption” (which is irrelevant in a web context) – The complexity of reasoning even with the simplest DL (say OWL Lite) is prohibitively high – Some of the datasets of LOD (or some parts of them) are not suitable for reasoning. It seems that some data publishers use the OWL and RDFS vocabulary without accounting for its formal semantics – Although reasoning with data distributed across different web servers is possible, it is much slower than reasoning with local data. The fundamental reason is the so-called "remote join" problem known from distributed DBMS
  • Reason-able views to the LOD (2) • Reason-able views represent an approach for reasoning with the web of linked data • Key ideas: – Inference with respect to tractable OWL dialects – group selected datasets and ontologies in a reason-able view – load all ontologies and data in a single semantic repository • Selection Criteria: – the dataset (or a part of it that is easy to define and isolate) allows inference, which delivers meaningful results under the semantics determined for the view; – the dataset is more or less static, i.e. not a wrapper for a database or service
  • Two reason-able views to the web of linked data • Ontotext presents: • Linked Data Semantic Repository – Some of the central LOD datasets – General-purpose information – 358M explicit and 512M inferred triples – http://www.ontotext.com/ldsr/ • Linked Life Data - PIKB (in yellow) – Several popular life-science datasets – Complemented by gluing ontologies – 1.47B explicit and 842M inferred triples – http://www.linkedlifedata.com
  • Linking Open Data Datasets
  • How Linked Data Help Enterprises • You can interlink proprietary data with linked data • So, what? • There is a chance that it is easier to integrate two proprietary databases by linking them to the linked data cloud than by trying to link them directly to one another • And you put your data “in context” • Which allows you to make more interesting queries • And get more interesting answers
  • Presentation Outline • Ontotext • OWLIM semantic repository: introduction and latest news • Reason-able views to the web of linked data • LDSR: Linked Data Semantic Repository • PIKB: Pathway and Interaction Knowledge Base
  • Linked Data Semantic Repository • Datasets: DBpedia, Geonames, UMBEL, WordNet, CIA World Factbook, Lingvoj • Ontologies: Dublin Core, SKOS*, RSS • Inference: materialization with respect to owl-max – One of the richest tractable fragments of OWL – Seems to completely cover the semantics of the data – The owl:sameAs optimisation in BigOWLIM allows reduction of the indices without loss of semantics or performance • Publicly available at http://ldsr.ontotext.com – Query and explore through OpenRDF’s Workbench (web UI) – SPARQL end-point – Explorator interface
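The slide does not say how BigOWLIM implements its owl:sameAs optimisation. One way such an optimisation could shrink the indices, shown here purely as an assumption, is to store statements only for one representative of each group of equivalent URIs and to expand them again at query time. The sketch below uses a union-find structure; all class, function, and URI names are hypothetical.

```python
class SameAsIndex:
    """Union-find over URIs linked by owl:sameAs (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Pick the lexicographically smaller URI as representative.
            self.parent[max(ra, rb)] = min(ra, rb)


def compress(triples):
    """Store each non-sameAs statement once, on class representatives."""
    index = SameAsIndex()
    for s, p, o in triples:
        if p == "owl:sameAs":
            index.union(s, o)
    stored = {
        (index.find(s), p, index.find(o))
        for s, p, o in triples
        if p != "owl:sameAs"
    }
    return index, stored


def expand(index, stored):
    """Re-create every retrievable statement from the compact form."""
    members = {}
    for node in list(index.parent):
        members.setdefault(index.find(node), set()).add(node)
    return {
        (s2, p, o2)
        for s, p, o in stored
        for s2 in members.get(s, {s})
        for o2 in members.get(o, {o})
    }


# Hypothetical sample data.
triples = [
    ("dbpedia:Berlin", "owl:sameAs", "geo:Berlin"),
    ("dbpedia:Berlin", "ex:country", "dbpedia:Germany"),
    ("geo:Berlin", "ex:name", "Berlin"),
]
index, stored = compress(triples)
print(len(stored), "stored;", len(expand(index, stored)), "retrievable")
```

In this toy run two stored statements expand to four retrievable ones, which mirrors the distinction the deck draws between statements in the indices and retrievable statements.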
  • LDSR Statistics
      Dataset | Explicit triples | Inferred after import | RDF nodes after import
      Umbel | 3,167,205 | 56,833 | 1,230,550
      DBpedia (sameAs) | 145,120 | 278,139 | 1,414,157
      Geonames | 72,747,880 | 428,696,785 | 34,813,153
      DBpedia 3.2 core | 280,697,077 | 38,922,702 | 100,131,770
      lingvoj | 19,692 | 848,978 | 100,141,681
      Wordnet | 1,946,838 | 8,575,920 | 100,769,150
      CIA Factbook | 35,956 | 291,877 | 101,005,679
      Total | 357,844,134 | 511,522,747 | 101,005,679
    • Total statements in the repository indices: 869M
    • Number of retrievable statements (considering owl:sameAs expansion): above 1.1B
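The 869M figure follows directly from the totals row. A short check, using only the two totals quoted on the slide (the variable names are illustrative):

```python
# Totals quoted on the slide.
explicit_total = 357_844_134
inferred_total = 511_522_747

# Statements held in the repository indices: explicit plus inferred.
indexed_total = explicit_total + inferred_total
print(f"{indexed_total:,}")

# Inferred statements per explicit statement.
ratio = inferred_total / explicit_total
print(f"{ratio:.2f}")
```

The sum rounds to the 869M stated for the repository indices, and materialization adds roughly 1.4 inferred statements for every explicit one.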
  • Reasoning and Querying Across Datasets
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX dbpedia4: <http://dbpedia.org/ontology/>
      PREFIX dbpedia3: <http://dbpedia.org/resource/>
      PREFIX opencyc: <http://sw.opencyc.org/2008/06/10/concept/en/>
      PREFIX ontology: <http://www.geonames.org/ontology#>
      SELECT * WHERE {
        ?Person dbpedia4:birthplace ?BirthPlace .
        ?BirthPlace ontology:parentFeature dbpedia3:Florida .
        ?Person rdf:type opencyc:Entertainer
      }
    • This query involves data from DBpedia, Geonames, and UMBEL
    • It involves inference over types, sub-classes, and transitive relationships
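To show why this query needs inference, the following sketch evaluates the same pattern over a handful of in-memory facts. It is a toy stand-in, not a SPARQL engine: the people, places, and the ex:Actor class are invented, and only the dbpedia3:Florida and opencyc:Entertainer names come from the query.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    closure = set(pairs)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


# Hypothetical sample data standing in for DBpedia, Geonames, and UMBEL.
birthplace = {"ex:PersonA": "ex:Miami", "ex:PersonB": "ex:Sofia"}
parent_feature = {
    ("ex:Miami", "ex:MiamiDadeCounty"),
    ("ex:MiamiDadeCounty", "dbpedia3:Florida"),
    ("ex:Sofia", "ex:Bulgaria"),
}
types = {"ex:PersonA": "ex:Actor", "ex:PersonB": "ex:Actor"}
subclass = {("ex:Actor", "opencyc:Entertainer")}


def entertainers_born_in(region):
    """People typed (directly or via sub-classes) as Entertainer, born in region."""
    within = transitive_closure(parent_feature)   # transitive parentFeature
    ancestors = transitive_closure(subclass)      # sub-class hierarchy
    results = []
    for person, place in sorted(birthplace.items()):
        cls = types[person]
        is_entertainer = (
            cls == "opencyc:Entertainer"
            or (cls, "opencyc:Entertainer") in ancestors
        )
        if is_entertainer and (place, region) in within:
            results.append((person, place))
    return results


print(entertainers_born_in("dbpedia3:Florida"))
```

Without the two closures, neither the type test nor the location test would succeed for ex:PersonA, because no explicit statement links the person to opencyc:Entertainer or the city directly to dbpedia3:Florida.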
  • LDSR in LarKC • LDSR was set up as a testbed for selection and ranking components for RDF • PageRankRDF performance on LDSR: – it takes only 10 seconds to perform one iteration of PageRank – 3 minutes to compute the ranks of the 100 million nodes in LDSR • DualRDF (an RDF priming component) performance on LDSR: – The performance of the spreading activation tasks varies considerably depending on the parameters of the process – As a reference point use the following result: it takes 7 seconds to activate about 7 thousand nodes after spreading of activation from resource http://dbpedia.org/resource/Berlin with decay factor 0.25. – Queries on the “primed” or “selected” part of a dataset run up to 20 times faster and return only focussed results • More details are presented in deliverable D2.4.1
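For readers unfamiliar with iterative ranking, here is a minimal PageRank power iteration on a three-node graph. It is not the PageRankRDF component: the damping factor of 0.85 is the conventional default rather than a value from the slides, and the default of 18 iterations is only what 3 minutes at 10 seconds per iteration would correspond to if the whole time were spent iterating.

```python
def pagerank(links, damping=0.85, iterations=18):
    """Plain power-iteration PageRank over a dict: node -> list of targets."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    nxt[t] += damping * rank[n] / len(nodes)
        rank = nxt
    return rank


# Hypothetical toy graph.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Each iteration touches every edge once, which is why the cost per iteration grows with the size of the graph rather than with the number of queries.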
  • Presentation Outline • Ontotext • OWLIM semantic repository: introduction and latest news • Reason-able views to the web of linked data • LDSR: Linked Data Semantic Repository • PIKB: Pathway and Interaction Knowledge Base
  • Time to Guess It?
  • The problem! • The data is supported by different organizations • The information is highly distributed and redundant • There are tons of flat file formats with special semantics • The knowledge is locked in vast data silos • There are many isolated communities which could not reach cross-domain understanding
  • Linked Data Dataset Growth
  • That’s How LinkedLifeData Was Born • Reason-able view to the web of data to describe the life science and health care domain • Allow straightforward updates of the information • Support incremental extension of the knowledge base with highly heterogeneous data sets • Analyze unstructured text information • Assessed with the clinical expertise of AstraZeneca • Currently operates over a semantic repository
  • A Pharmaceutical Industry Researcher • Hard to find information • Hard to use data because context information is lacking • Hard to collaborate across domains due to information silos • No easy way to interpret the information (most of the time is spent preparing and transforming data)
  • LinkedLifeData is a Platform to Help the Drug Development Process
  • The Different Levels of Information Systems
  • LinkedLifeData Vision
  • http://en.wikipedia.org/wiki/AstraZeneca References to 52 drugs (the list is claimed to be incomplete)
  • http://dbpedia.org/resource/AstraZeneca References to 6 drugs [Graph: dbpedia:AstraZeneca connected through skos:subject, dbpedia:wikilink, and dbpedia:redirect links to dbpedia:Budesonide, dbpedia:Entocort, dbpedia:Losec, dbpedia:Omeprazole, dbpedia:Esomeprazole, and dbpedia:Nexium]
  • Another Data Source • datasource:organization/AstraZeneca • datasource:organization/AstraZeneca_LP • datasource:organization/AstraZeneca_Pharmaceuticals%2C_LP • datasource:organization/AstraZeneca_Pharmaceuticals_LP • datasource:organization/AstraZeneca_Pharmaeuticals_LP • datasource:organization/AstraZeneca_Pharnaceuticals_LP
  • [Diagram of identifier-linking patterns. Labels: Namespace mapping; Reference node; Mismatched identifiers; Value dereference; Transitive link; Literal extraction. Example nodes: ns-x: id; ns-y: id; db: id; db: accession; accession term; text; name]
  • Pathway and Interaction Knowledge Base Dataset • Linked Life Data statistics: – gene – proteins – pathways – targets – disease – drugs – patient • Number of statements: 2,187,294,998 • Prototype to test scalability and performance of Ontotext’s Linked Data infrastructure
  • PIKB data sources
      Database | Size | Schema | Description
      Uniprot | 1,146,084,021 | Original by the provider | Protein sequences and annotations
      Entrez-Gene | 107,193,308 | Custom RDF schema | Genes and annotation
      Gene Ontology | 9,656,074 | Schema by the provider | Gene and gene product annotation thesaurus
      BioGRID | 1,892,897 | BioPAX 2.0 (custom generated) | Protein interactions extracted from the literature
      NCI - Pathway Interaction Database | 333,415 | BioPAX 2.0 (original by the provider) | Human pathway interaction database
      The Cancer Cell Map | 173,914 | BioPAX 2.0 (original by the provider) | Cancer pathways database
      Reactome | 2,538,793 | BioPAX 2.0 (original by the provider) | Human pathways and interactions
      INOH | 432,456 | BioPAX 2.0 (original by the provider) | Pathway database
      KEGG | 18,128,735 | BioPAX 1.0 (original by the provider) | Molecular Interaction
      PubMed * | 900,861,385 | Custom RDF schema | Biomedical citations
      UMLS * | 79,88,309 | Public OWL semantic network + custom RDF schema | Biomedical terms
      Total | 2,187,294,998 | |
  • LinkedLifeData 0.2 Dataset • Linked Life Data statistics: – gene – proteins – pathways – targets – disease – drugs – patient • Number of statements: over 3 billion • Data sources: Uniprot, Entrez-Gene, PubMed, UMLS (MeSH, Taxonomy, GeneOntology), BioGRID, NCI, Reactome, BioCarta, KEGG, BioCyc, DBpedia, LODD, Bio2RDF
  • Common Questions We Would Like To Ask • Find drugs, the diseases they are aimed at, and the targets they use: "Which are the targets for drugs used in the treatment of Endocrine diseases?" • Find potential new targets (part of a pathway) for the development of new drugs for a specific disease: “Company X has a very profitable drug. Give me new potential targets.”
  • LifeSKIM Application • A platform offering software infrastructure for: – automatic semantic annotation of text – ontology population • Store the extracted facts and reason on top of them • Semantic indexing and retrieval of content • Query and navigation involving structured knowledge • Based on Information Extraction (i.e. text-mining) technology
  • How LifeSKIM Searches Better • LifeSKIM can match a query: “Documents about interleukin 6 (interferon, beta 2) where it is connected to apoptosis of neutrophils.” • With a document containing: “… the same effect was not observed for IFNB2, IL-8 and TNF-alpha …” “… is induced neutrophil programmed cell death by apoptosis …”
  • How LifeSKIM Searches Better • The classical IR could not match: • interleukin 6 with HGF; HSF; BSF2; IL-6; IFNB2. Interleukin 6 is an entity in Entrez-Gene with GeneID: 3569, and HGF; HSF; BSF2; IL-6; IFNB2 are aliases for the same gene entity. • apoptosis of neutrophils with neutrophil apoptosis; programmed cell death of neutrophils by apoptosis; programmed cell death, neutrophils; neutrophil programmed cell death by apoptosis. The GeneOntology thesaurus lists the above terms as part of the apoptosis of neutrophils term.
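The effect of aliases and thesaurus terms on matching can be illustrated with a few lines of code. This is only a sketch of synonym expansion using regular expressions, not how LifeSKIM performs semantic annotation; the function name and the dictionary layout are assumptions, while the aliases, synonyms, and document fragments are the ones quoted on the slides.

```python
import re

# Aliases and synonyms quoted on the slides.
SYNONYMS = {
    "interleukin 6": ["HGF", "HSF", "BSF2", "IL-6", "IFNB2"],
    "apoptosis of neutrophils": [
        "neutrophil apoptosis",
        "programmed cell death of neutrophils by apoptosis",
        "programmed cell death, neutrophils",
        "neutrophil programmed cell death by apoptosis",
    ],
}


def mentions(concept, text):
    """True if `text` contains the concept name or one of its aliases."""
    for term in [concept] + SYNONYMS.get(concept, []):
        # Match the term as a whole token, ignoring case.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


document = (
    "... the same effect was not observed for IFNB2, IL-8 and TNF-alpha ... "
    "... is induced neutrophil programmed cell death by apoptosis ..."
)
query = ["interleukin 6", "apoptosis of neutrophils"]

keyword_match = all(concept in document.lower() for concept in query)
expanded_match = all(mentions(concept, document) for concept in query)
print("plain keywords:", keyword_match)
print("with aliases:  ", expanded_match)
```

Plain keyword search fails because neither query phrase occurs literally in the document, while the alias-aware check finds IFNB2 and the GeneOntology synonym.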
  • Semantic Annotation Example
  • Thanks • AstraZeneca: Bosse Andersson; the researchers • Ontotext: Deyan Peychev; Georgi Georgiev; OWLIM team; KIM team • The development of PIKB and LinkedLifeData is partially funded by FP7 215535 LarKC