Expanding the Pathway and Interaction Knowledge in Linked Life Data
 


ISWC'2009


  • Traditional thinking is that you need systems to describe the sequence: gene, proteins, .... What we need is a holistic view – the disease in the context of the patient and the environmental factors – with the information linked together.
  • (Structure view) The Linked Life Data approach. This slide is targeted at bioinformaticians and presents the types of data sources that are integrated and can be queried:
    Genes and Proteins – custom RDF schema
    Interactions and Pathways – BioPAX schema
    Drugs – schema designed by Linked Open Drug Data
    Ontologies and Thesauri – SKOS
    Documents – a lightweight schema
  • (Behaviour view)
  • It is still possible to get these answers in real time: select all human genes that code for proteins with known molecular interactions (as part of a physiological process, like inflammation) and are analyzed with molecular techniques like ‘Transfection’; then restrict the results to genes or proteins that are known drug targets for a specific disease (like ‘Asthma’).
  • Faceted browsing of the results.
    PREFIX rdf:
    PREFIX entrezgene:
    PREFIX uniprot:
    PREFIX rdfs:
    PREFIX skos:
    PREFIX gene:
    PREFIX core:
    PREFIX biopax2:
    PREFIX lifeskim:
    PREFIX umls:
    PREFIX pubmed:
    PREFIX drugbank:
    SELECT distinct ?geneName ?uniprotName ?targetFunction ?drugName ?molecularTechnique
    WHERE {
      ?interaction rdf:type biopax2:interaction .
      ?interaction biopax2:PARTICIPANTS ?p .
      ?p biopax2:PHYSICAL-ENTITY ?protein .
      ?protein skos:exactMatch ?uniprotaccession .
      ?uniprotaccession uniprot:mnemonic ?uniprotName .
      ?uniprotaccession core:organism .
      ?geneid gene:hasDescription ?genedescription .
      ?geneid gene:uniprotAccession ?uniprotaccession .
      ?geneid gene:pubmed ?pmid .
      ?geneid entrezgene:hasOfficialSymbol ?geneName .
      ?pmid lifeskim:mentions ?umlsid .
      ?umlsid rdf:type .
      ?umlsid skos:prefLabel ?molecularTechnique .
      ?target skos:closeMatch ?geneid .
      ?target drugbank:specificFunction ?targetFunction .
      ?drug drugbank:target ?target .
      ?drug drugbank:genericName ?drugName .
    }
    LIMIT 10000
  • Link the redundant/related information with SKOS predicates skos:exactMatch, skos:closeMatch, skos:broaderMatch, skos:narrowMatch
  • The types of links we found in 20 different sources
  • Semantic annotation is the most powerful linking technique
  • Semantic annotation results; more are planned
  • Quantitative distribution of the data sources: sequence databases (genes/proteins) are the biggest slice of the pie, 4 times the size of DBpedia; ontologies/thesauri (UMLS, OBO, etc.) are not visible in the picture

Presentation Transcript

  • Ontotext AD
    Expanding the Pathway and Interaction Knowledge in Linked Life Data
    10/23/2009
  • Holistic View of the Scientific Problems
    Link data between different siloed applications
    Put the information into context
    Analyze the knowledge locked into unstructured data
    Environmental Factors
    Semantic Web Challenge 2009
  • The Challenge of the Holistic View
    Extreme amount of data
    Data is supported by different organizations
    Information is highly distributed and redundant
    Tons of flat file formats with special semantics
    Knowledge is locked in vast data silos
    Isolated communities that cannot reach cross-domain understanding
  • Linked Life Data: RDF Warehouse
    5 billion RDF statements!
    Ontologies and Thesauri: Unified Medical Language System®, The Open Biomedical Ontologies
    Drugs: Linked Open Drug Data
    Interactions and Pathways: Biological Pathway Exchange
    Genes and Proteins
    Documents: Text-Analysis, Semantic Annotations
  • LLD Integration Process
    Data Source Identification
    Flat files → Special tailored transformer
    OBO files → OBO to SKOS converter
    XML → Custom XSLT
    RDBMS → RDBMS to RDF formatter
    RDF
    RDF warehouse
    Reasoner
    Instance Mappings
    Semantic Annotations
  • Complex Cross-Domain Queries
    [Figure: query graph. Nodes: Gene (filter human genes), Protein, Molecular Interaction, Physiological process, Molecular Technique, Disease, Drugs. Edge labels: express protein, participate in, curated interaction, analyzed by, cause, treated with, target.]
  • TODO Put a nice screenshot
  • Instance Alignment
    Redundancy is removed by human-crafted declarative rules
    [Figure: RDF graph aligning BioPAX entities with UniProt. Resources shown: cpath:CPATH-94138, cpath:CPATH-LOCAL-8467065, cpath:CPATH-LOCAL-8749236, uniprot:P29965. Edge labels: biopax-2:SHORT-NAME, biopax-2:XREF, biopax-2:DB, biopax-2:ID, biopax-2:PHYSICAL-ENTITY, uniprot:mnemonic. Literals shown: CD40L_HUMAN, TNF5_HUMAN, TNFL5_HUMAN, CD4L_HUMAN, UNIPROT, P29965.]
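A declarative alignment rule of the kind this slide describes might be sketched as follows. This is only an illustration, not the rule set used in Linked Life Data: the function name `align_uniprot_xrefs`, the tuple representation of triples, and the way the example nodes are wired together are all assumptions; only the predicate names and identifiers come from the slide and the speaker notes.

```python
# Triples are plain (subject, predicate, object) tuples; literals are strings.
SKOS_EXACT_MATCH = "skos:exactMatch"


def align_uniprot_xrefs(triples):
    """Emit skos:exactMatch links from BioPAX entities to UniProt resources.

    Hypothetical rule: if an entity has a biopax-2:XREF node whose
    biopax-2:DB is "UNIPROT" and which carries a biopax-2:ID accession,
    link the entity to uniprot:<accession>.
    """
    db_of = {s: o for s, p, o in triples if p == "biopax-2:DB"}
    id_of = {s: o for s, p, o in triples if p == "biopax-2:ID"}
    links = set()
    for entity, predicate, xref in triples:
        if predicate != "biopax-2:XREF":
            continue
        if db_of.get(xref) == "UNIPROT" and xref in id_of:
            links.add((entity, SKOS_EXACT_MATCH, "uniprot:" + id_of[xref]))
    return links


# Assumed wiring of the nodes named on the slide (the real graph may differ).
example = [
    ("cpath:CPATH-94138", "biopax-2:SHORT-NAME", "CD40L_HUMAN"),
    ("cpath:CPATH-94138", "biopax-2:XREF", "cpath:CPATH-LOCAL-8467065"),
    ("cpath:CPATH-LOCAL-8467065", "biopax-2:DB", "UNIPROT"),
    ("cpath:CPATH-LOCAL-8467065", "biopax-2:ID", "P29965"),
]
aligned = align_uniprot_xrefs(example)
```

Matching on the database name plus accession, rather than on the short name, sidesteps the differing mnemonic spellings that appear in the figure.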
  • [Figure: link patterns between a resource X and a resource Y. Labels shown: ns-x: id and ns-y: id; id; db: id; db: accession; accession; term; name; text to describe name.]
  • Semantic Annotations
    [Figure: a PubMed abstract, pmid:17714090 (author: Ian A Yang; journal: Clinical and experimental pharmacology …), linked by "mentions" edges to UMLS concepts. Concepts shown: Respiration Disorders, Chronic Obstructive Airway Diseases, Bronchial Diseases, COPD, Asthma. Identifiers shown: umls:C0035204, umls:C0006261, umls:C000496. Concept relations: broader, broaderTransitive.]
    Abstract: Asthma and chronic obstructive pulmonary disease (COPD) are chronic airway diseases characterized by airflow obstruction. The beta(2)-adrenoceptor mediates bronchodilatation in response to exogenous and endogenous beta-adrenoceptor agonists. Single nucleotide polymorphisms in the beta(2)-adrenoceptor gene (ADRB2) cause amino acid changes (e.g. Arg16Gly, Gln27Glu) that potentially alter receptor function. Recently, a large cohort study found no association between asthma susceptibility and beta(2)-adrenoceptor polymorphisms. In contrast, asthma phenotypes, such as asthma severity and bronchial hyperresponsiveness, have been associated with beta(2)-adrenoceptor polymorphisms.
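The interplay of "mentions" and "broaderTransitive" on this slide can be illustrated with a small sketch: a document annotated with a narrow concept also becomes findable under every broader concept. The hierarchy below is an assumed shape built from the concept labels on the slide, and the function names are hypothetical.

```python
def broader_transitive(broader):
    """Map each concept to all ancestors reachable through broader links."""

    def ancestors(concept, seen):
        for parent in broader.get(concept, ()):
            if parent not in seen:  # guards against cycles
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    return {concept: ancestors(concept, set()) for concept in broader}


def documents_about(concept, mentions, closure):
    """Return documents mentioning the concept or any narrower concept."""
    return {
        doc
        for doc, concepts in mentions.items()
        if any(c == concept or concept in closure.get(c, set()) for c in concepts)
    }


# Assumed hierarchy; only the labels are taken from the slide.
broader = {
    "Asthma": ["Bronchial Diseases"],
    "COPD": ["Chronic Obstructive Airway Diseases"],
    "Bronchial Diseases": ["Respiration Disorders"],
    "Chronic Obstructive Airway Diseases": ["Respiration Disorders"],
}
mentions = {"pmid:17714090": {"Asthma", "COPD"}}

closure = broader_transitive(broader)
hits = documents_about("Respiration Disorders", mentions, closure)
```

In an RDF store this closure would normally be produced by the reasoner rather than computed by hand; the sketch only shows why a query for a general disease class can reach an abstract that names a specific one.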
  • Semantic Annotations
    Multiple information extraction views
    High-recall annotations
    lifeskim:mentions
    705,338,334 statements
    High-precision annotations
    lifeskim:mentionsStrict
    263,323,164 statements
    lifeskim:mentionsStrict
    rdfs:subPropertyOf
    lifeskim:mentions
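The effect of the rdfs:subPropertyOf link between the two annotation properties can be sketched as below. The sketch assumes that lifeskim:mentionsStrict is the sub-property of lifeskim:mentions, which matches the smaller statement count for the high-precision view; the concept identifiers umls:A and umls:B and the document pmid:1 are placeholders, not real data.

```python
def infer_super_properties(triples, sub_property_of):
    """Add (s, super, o) for every triple whose predicate has a super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}
        parent = sub_property_of.get(p)
        while parent is not None and parent not in seen:  # stop on cycles
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = sub_property_of.get(parent)
    return inferred


# Assumed direction: the strict property is the sub-property.
sub_property_of = {"lifeskim:mentionsStrict": "lifeskim:mentions"}

asserted = {
    ("pmid:17714090", "lifeskim:mentionsStrict", "umls:A"),  # placeholder concept
    ("pmid:1", "lifeskim:mentions", "umls:B"),  # placeholder document and concept
}
closed = infer_super_properties(asserted, sub_property_of)

high_recall = {t for t in closed if t[1] == "lifeskim:mentions"}
high_precision = {t for t in closed if t[1] == "lifeskim:mentionsStrict"}
```

Under this assumption a query on lifeskim:mentions returns both annotations, while a query on lifeskim:mentionsStrict returns only the high-precision one.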
  • Distribution of RDF Triples
  • Conclusion
    Linked Life Data is a public and free service
    There are many more interesting queries
    Multi-paradigm search
    Concept co-occurrence
    Combine text with structured search
    Easily integrates information from the public linked data cloud
    Why not give it a try?
  • Acknowledgement
    AstraZeneca
    Bosse Andersson
    LODD
    BioRDF
    HCLSIG
    Ontotext
    Deyan Peychev
    Georgi Georgiev
    Todor Primov
    OWLIM team
    The development of PIKB and Linked Life Data is partially funded by FP7 215535