
Expanding the Pathway and Interaction Knowledge in Linked Life Data

ISWC'2009

Notes
  • Traditional thinking is that you need systems that describe the sequence: genes, proteins, and so on. What we need is a holistic view: disease in the context of the patient and the environmental factors, with the information linked together.
  • (Structure view) The Linked Life Data approach. This slide is targeted at bioinformaticians and presents the types of data sources that are integrated and can be queried. Genes and Proteins: custom RDF schema; Interactions and Pathways: BioPAX schema; Drugs: schema designed by Linked Open Drug Data; Ontologies and Thesauri: SKOS; Documents: a lightweight schema.
  • (Behaviour view)
  • It is still possible to get these answers in real time: select all human genes that code for proteins with known molecular interactions (as part of a physiological process such as inflammation) and that have been analyzed with molecular techniques such as 'Transfection'; then restrict the results to genes or proteins that are known drug targets for a specific disease (such as 'Asthma').
  • Faceted browsing of the results.
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX entrezgene: <http://linkedlifedata.com/resource/EntrezGene/>
      PREFIX uniprot: <http://purl.uniprot.org/core/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX gene: <http://linkedlifedata.com/resource/EntrezGene/>
      PREFIX core: <http://purl.uniprot.org/core/>
      PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
      PREFIX lifeskim: <http://linkedlifedata.com/resource/lifeskim/>
      PREFIX umls: <http://linkedlifedata.com/resource/umls/>
      PREFIX pubmed: <http://linkedlifedata.com/resource/pubmed/>
      PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
      SELECT DISTINCT ?geneName ?uniprotName ?targetFunction ?drugName ?molecularTechnique
      WHERE {
        # Human (taxonomy 9606) proteins taking part in curated BioPAX interactions
        ?interaction rdf:type biopax2:interaction .
        ?interaction biopax2:PARTICIPANTS ?p .
        ?p biopax2:PHYSICAL-ENTITY ?protein .
        ?protein skos:exactMatch ?uniprotaccession .
        ?uniprotaccession uniprot:mnemonic ?uniprotName .
        ?uniprotaccession core:organism <http://purl.uniprot.org/taxonomy/9606> .
        # Genes coding for those proteins and the PubMed articles linked to them
        ?geneid gene:hasDescription ?genedescription .
        ?geneid gene:uniprotAccession ?uniprotaccession .
        ?geneid gene:pubmed ?pmid .
        ?geneid entrezgene:hasOfficialSymbol ?geneName .
        # Molecular techniques (UMLS semantic type T063) mentioned in those articles
        ?pmid lifeskim:mentions ?umlsid .
        ?umlsid rdf:type <http://linkedlifedata.com/resource/semanticnetwork/id/T063> .
        ?umlsid skos:prefLabel ?molecularTechnique .
        # DrugBank targets matching the genes, and the drugs acting on them
        ?target skos:closeMatch ?geneid .
        ?target drugbank:specificFunction ?targetFunction .
        ?drug drugbank:target ?target .
        ?drug drugbank:genericName ?drugName .
      }
      LIMIT 10000
  • Link redundant or related information with the SKOS mapping predicates skos:exactMatch, skos:closeMatch, skos:broadMatch and skos:narrowMatch.
  • The types of links we have found in 20 different sources.
  • Semantic annotation is the most powerful linking technique.
  • Semantic annotation results; more are planned.
  • Quantitative distribution of the data sources: sequence databases (genes/proteins) are the biggest slice of the pie, 4 times the size of DBpedia; ontologies and thesauri (UMLS, OBO, etc.) are too small to be visible in the chart.
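The note on SKOS linking and the "Instance Alignment" slide say that redundancy is removed by hand-crafted declarative rules, but no rule is shown. The Python sketch below illustrates one plausible rule of that kind, under stated assumptions: triples are plain string tuples using the compact names from the slide, and the rule itself (a BioPAX physical entity whose biopax-2:XREF has biopax-2:DB "UNIPROT" and biopax-2:ID X gets skos:exactMatch uniprot:X) is an illustration, not the rule set actually used in Linked Life Data. The function name `align_by_xref` and the toy triples are hypothetical.

```python
from collections.abc import Iterable

# A triple is (subject, predicate, object), using the compact names from the slide.
Triple = tuple[str, str, str]


def align_by_xref(triples: Iterable[Triple]) -> set[Triple]:
    """Hypothetical declarative rule: derive skos:exactMatch links to UniProt.

    Rule (assumed, for illustration only):
      ?entity biopax-2:XREF ?x . ?x biopax-2:DB "UNIPROT" . ?x biopax-2:ID ?acc
      => ?entity skos:exactMatch uniprot:?acc
    """
    triples = list(triples)
    # Index the DB and ID values attached to each cross-reference node.
    db_of: dict[str, str] = {}
    id_of: dict[str, str] = {}
    for s, p, o in triples:
        if p == "biopax-2:DB":
            db_of[s] = o
        elif p == "biopax-2:ID":
            id_of[s] = o
    links: set[Triple] = set()
    for s, p, o in triples:
        # Fire the rule only for cross-references into UniProt that carry an ID.
        if p == "biopax-2:XREF" and db_of.get(o, "").upper() == "UNIPROT" and o in id_of:
            links.add((s, "skos:exactMatch", "uniprot:" + id_of[o]))
    return links


# Toy triples loosely modeled on the Instance Alignment slide (wiring is hypothetical).
data = [
    ("cpath:CPATH-94138", "biopax-2:XREF", "_:x1"),
    ("_:x1", "biopax-2:DB", "UNIPROT"),
    ("_:x1", "biopax-2:ID", "P29965"),
    ("cpath:CPATH-LOCAL-8467065", "biopax-2:XREF", "_:x2"),
    ("_:x2", "biopax-2:DB", "UNIPROT"),
    ("_:x2", "biopax-2:ID", "P29965"),
]
for link in sorted(align_by_xref(data)):
    print(link)  # prints two exactMatch links, both to uniprot:P29965
```

Matching on the cross-reference rather than on names such as CD40L_HUMAN or TNF5_HUMAN avoids depending on mnemonics, which can differ between sources and releases.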

    1. Ontotext AD
       Expanding the Pathway and Interaction Knowledge in Linked Life Data
       10/23/2009
    2. Holistic View of the Scientific Problems
       • Link data between different silo applications
       • Put the information into context
       • Analyze the knowledge locked in unstructured data
       • Environmental Factors
       Semantic Web Challenge 2009, 10/23/2009
    3. The Challenge of the Holistic View
       • Extreme amounts of data
       • Data is maintained by different organizations
       • Information is highly distributed and redundant
       • Many flat-file formats with special semantics
       • Knowledge is locked in vast data silos
       • Isolated communities that cannot reach cross-domain understanding
    4. Linked Life Data: RDF Warehouse with 5 billion RDF statements!
       • Ontologies and Thesauri: Unified Medical Language System®, The Open Biomedical Ontologies
       • Drugs: Linked Open Drug Data
       • Interactions and Pathways: Biological Pathway Exchange
       • Genes and Proteins
       • Documents: Text-Analysis, Semantic Annotations
    5. LLD Integration Process
       • Data Source Identification
       • Source formats: Flat files, OBO files, XML, RDBMS, RDF
       • Converters: Specially tailored transformer, OBO to SKOS converter, Custom XSLT, RDBMS to RDF formatter
       • RDF warehouse: Reasoner, Instance Mappings, Semantic Annotations
    6. Complex Cross-Domain Queries
       [Diagram: a query graph connecting Gene, Protein, Molecular Interaction, Physiological process, Molecular Technique, Disease and Drugs, with edges labeled "participate in", "cause", "analyzed by", "express protein", "curated interaction", "treated with" and "target", plus a "filter human genes" step]
    7. [Screenshot]
    8. Instance Alignment
       • Redundancy is removed by human-crafted declarative rules
       [Diagram: cPath physical entities (cpath:CPATH-94138, cpath:CPATH-LOCAL-8467065, cpath:CPATH-LOCAL-8749236) and the UniProt record uniprot:P29965, connected through biopax-2:PHYSICAL-ENTITY, biopax-2:SHORT-NAME, biopax-2:XREF (biopax-2:DB "UNIPROT", biopax-2:ID "P29965") and uniprot:mnemonic; the names shown are CD40L_HUMAN, TNF5_HUMAN, TNFL5_HUMAN and CD4L_HUMAN]
    9. [Diagram: link patterns between two resources X and Y, based on shared identifiers (db, ns-x:id, ns-y:id, db:id), accession numbers (db:accession), terms, and names or free-text descriptions ("text to describe name")]
    10. Semantic Annotations
       [Diagram: the PubMed abstract pmid:17714090 (journal: Clinical and experimental pharmacology …; author: Ian A Yang) is linked by "mentions" to the UMLS concepts Asthma (umls:C000496) and COPD, which are connected through "broader" and "broaderTransitive" to Chronic Obstructive Airway Diseases, Bronchial Diseases (umls:C0006261) and Respiration Disorders (umls:C0035204)]
       Annotated abstract: "Asthma and chronic obstructive pulmonary disease (COPD) are chronic airway diseases characterized by airflow obstruction. The beta(2)-adrenoceptor mediates bronchodilatation in response to exogenous and endogenous beta-adrenoceptor agonists. Single nucleotide polymorphisms in the beta(2)-adrenoceptor gene (ADRB2) cause amino acid changes (e.g. Arg16Gly, Gln27Glu) that potentially alter receptor function. Recently, a large cohort study found no association between asthma susceptibility and beta(2)-adrenoceptor polymorphisms. In contrast, asthma phenotypes, such as asthma severity and bronchial hyperresponsiveness, have been associated with beta(2)-adrenoceptor polymorphisms."
    11. Semantic Annotations
       • Multiple information extraction views
       • High-recall annotations: lifeskim:mentions (705,338,334 statements)
       • High-precision annotations: lifeskim:mentionsStrict (263,323,164 statements)
       [Diagram: lifeskim:mentions and lifeskim:mentionsStrict related by rdfs:subPropertyOf]
    12. Distribution of RDF Triples
       [Chart: distribution of RDF triples by data source]
    13. Conclusion
       • Linked Life Data is a public and free service
       • There are many more interesting queries
       • Multi-paradigm search: concept co-occurrence; combining text with structured search
       • Easily integrates information from the public linked data cloud
       • Why not give it a try?
    14. Acknowledgement
       • AstraZeneca: Bosse Andersson
       • LODD, BioRDF, HCLSIG
       • Ontotext: Deyan Peychev, Georgi Georgiev, Todor Primov, OWLIM team
       The development of PIKB and Linked Life Data is partially funded by FP7 215535.
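Slide 5 lists an "OBO to SKOS converter" among the integration steps without describing its mapping. As a hedged illustration, the Python sketch below assumes a minimal mapping (OBO `name` to skos:prefLabel, `is_a` to skos:broader, `synonym` to skos:altLabel) and a hypothetical namespace http://example.org/obo/; the actual LLD converter and its URI scheme may well differ. It converts only `[Term]` stanzas and emits N-Triples lines. The helper names `parse_terms`, `obo_to_skos`, `concept_uri` and `literal` are hypothetical.

```python
import re

# Hypothetical namespace for generated concepts; the real LLD URIs may differ.
BASE = "http://example.org/obo/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concept_uri(obo_id: str) -> str:
    """Map an OBO id such as GO:0006954 to a bracketed IRI (assumed scheme)."""
    return f"<{BASE}{obo_id.replace(':', '_')}>"


def literal(text: str) -> str:
    """Serialize text as an N-Triples literal (backslashes and quotes escaped)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def parse_terms(obo_text: str) -> list[list[tuple[str, str]]]:
    """Collect (tag, value) pairs for every [Term] stanza; other stanzas are skipped."""
    terms: list[list[tuple[str, str]]] = []
    current = None  # tag/value list of the [Term] being read, or None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are converted.
            current = [] if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.append((tag.strip(), value.strip()))
    return terms


def obo_to_skos(obo_text: str) -> list[str]:
    """Convert OBO [Term] stanzas to SKOS N-Triples lines (minimal assumed mapping)."""
    lines: list[str] = []
    for tags in parse_terms(obo_text):
        ids = [value for tag, value in tags if tag == "id"]
        if not ids:
            continue  # a stanza without an id cannot become a concept
        subject = concept_uri(ids[0])
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        for tag, value in tags:
            if tag == "name":
                lines.append(f"{subject} <{SKOS}prefLabel> {literal(value)} .")
            elif tag == "is_a":
                parent = value.split("!")[0].strip()  # drop the "! label" comment
                lines.append(f"{subject} <{SKOS}broader> {concept_uri(parent)} .")
            elif tag == "synonym":
                quoted = re.match(r'"([^"]*)"', value)  # synonym text is quoted
                if quoted:
                    lines.append(f"{subject} <{SKOS}altLabel> {literal(quoted.group(1))} .")
    return lines


sample = """format-version: 1.2

[Term]
id: GO:0006954
name: inflammatory response
is_a: GO:0006952 ! defense response
"""
for triple in obo_to_skos(sample):
    print(triple)
```

Mapping is_a to skos:broader keeps the OBO hierarchy navigable with the same SKOS vocabulary used for the other thesauri, at the cost of dropping OBO's stricter subclass semantics.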

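Slides 10 and 11 rely on two pieces of inference: a document that mentions Asthma should be found when searching for a broader disease concept (via skos:broaderTransitive), and high-precision annotations should also count as high-recall ones (via rdfs:subPropertyOf). The stdlib-only Python sketch below shows both on toy data. It assumes lifeskim:mentionsStrict is a subproperty of lifeskim:mentions, which is consistent with the smaller strict count but is not spelled out on the slide; the hierarchy edges are illustrative rather than the actual UMLS relations, and the document doc:hypothetical-1 is invented for the example.

```python
# Toy data loosely modeled on slides 10 and 11. The hierarchy edges, the
# document "doc:hypothetical-1" and the subproperty direction are assumptions.
BROADER: dict[str, set[str]] = {  # concept -> directly broader concepts (skos:broader)
    "Asthma": {"Bronchial Diseases"},
    "COPD": {"Chronic Obstructive Airway Diseases"},
    "Bronchial Diseases": {"Respiration Disorders"},
    "Chronic Obstructive Airway Diseases": {"Respiration Disorders"},
}
# Assumed: every high-precision annotation is also a high-recall one.
SUBPROPERTY_OF = {"lifeskim:mentionsStrict": "lifeskim:mentions"}


def broader_transitive(concept: str) -> set[str]:
    """Return all ancestors of concept, i.e. its skos:broaderTransitive closure."""
    ancestors: set[str] = set()
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), set()):
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)
    return ancestors


def infer(annotations: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add the triples entailed by rdfs:subPropertyOf (following chains of parents)."""
    result = set(annotations)
    for doc, prop, concept in annotations:
        parent = SUBPROPERTY_OF.get(prop)
        while parent is not None:
            result.add((doc, parent, concept))
            parent = SUBPROPERTY_OF.get(parent)
    return result


def documents_about(concept: str, annotations: set[tuple[str, str, str]],
                    prop: str = "lifeskim:mentions") -> set[str]:
    """Documents linked by prop to concept itself or to any narrower concept."""
    return {
        doc
        for doc, p, c in infer(annotations)
        if p == prop and (c == concept or concept in broader_transitive(c))
    }


annotations = {
    ("pmid:17714090", "lifeskim:mentionsStrict", "Asthma"),
    ("pmid:17714090", "lifeskim:mentions", "COPD"),
    ("doc:hypothetical-1", "lifeskim:mentions", "Bronchial Diseases"),
}
print(sorted(documents_about("Respiration Disorders", annotations)))
# ['doc:hypothetical-1', 'pmid:17714090']
print(sorted(documents_about("Respiration Disorders", annotations, "lifeskim:mentionsStrict")))
# ['pmid:17714090']
```

Querying lifeskim:mentionsStrict returns only the strict subset, which reflects the recall/precision trade-off between the two annotation views on slide 11.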