A Reason-Able View to the Web of Pathway Data
 


Usage Rights

© All Rights Reserved

    Presentation Transcript

    • A Reason-Able View to the Web of Pathway Data. Vassil Momtchev, Group Leader, Semantic Life Science Applications. 2009/10/08
    • Objectives WITH NO 2009/10/08 Bio-IT World, Hannover
    • A Typical Question? Select drugs related to asthma that are linked to a curated molecular interaction in the literature where the protein is known to cause an inflammatory response…
    • A Typical Answer!
      • Find all drugs related to asthma
      • Extract all proteins in a text file
      • Compose a long query that joins the proteins with OR
      • Get a filtered list of genes
      • For each gene, send a query to a molecular interaction database
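Step three of this manual answer is where the effort grows with every protein. A minimal Python sketch of that step, assuming a hypothetical `gene_symbol:` field syntax (no particular database is implied), shows the kind of query that has to be rebuilt by hand:

```python
# Hypothetical sketch of step 3 of the manual workflow: turning a text file
# of protein/gene names into one long boolean OR query. The "gene_symbol:"
# field syntax is an assumption; each real database has its own syntax.


def compose_or_query(names, field="gene_symbol"):
    """Join names into a single OR query, dropping blanks and duplicates."""
    # dict.fromkeys keeps the first occurrence of each name, in order.
    unique = list(dict.fromkeys(n.strip() for n in names if n.strip()))
    return " OR ".join(f'{field}:"{n}"' for n in unique)


if __name__ == "__main__":
    # Illustrative gene symbols as they might be read from the text file.
    extracted = ["ADRB2", "CYSLTR1", "ADRB2"]
    print(compose_or_query(extracted))
    # gene_symbol:"ADRB2" OR gene_symbol:"CYSLTR1"
```

Each protein adds one clause, and the result still has to be re-sent separately to every database involved.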
    • A Typical Question? Select all human genes that code for proteins with known molecular interactions and that have been analyzed with molecular techniques such as ‘Transfection’; restrict the results to genes or proteins that are known drug targets…
    • Wrap-up: Typical Problems
      • Link data between different silo applications
        • Hard to collaborate across domains due to information silos
        • Combine public and private information
        • An expensive process that cannot be done on the fly
      • Put the information into context
        • Query cross-domain information
        • Make more interesting queries
        • No easy way to interpret the information
      • Analyze the knowledge locked in unstructured data
        • Database information is often outdated
        • Relationships alone are not enough to capture the case-study questions
        • Hard to find information
    • Agenda
      • Ontotext
      • Put your data into context
      • Linked Life Data platform and Pathway and Interaction dataset
      • LifeSKIM: an application for finding existing information
    • Who Are We?
      • Ontotext is a semantic technology provider
      • Established in 2000 as part of Sirma Group
        • Sirma is a top-3 software house in Bulgaria, est. 1992, ~300 people
        • A separate company since September 2008
      • Staff: 40 employees in Bulgaria
        • Multiple affiliates and contractors around the world
      • Over 150 person-years invested in product development
    • What is Our Position?
      • Unique coverage of technology areas:
        • Semantic Databases: high-performance RDF DBMS, scalable reasoning
        • Semantic Search: text mining (IE), Information Retrieval (IR)
        • Semantic Web Services and BPM: WS annotation, discovery, etc.
        • Web Mining: focused crawling, wrapping
        • Knowledge fusion: identity resolution, record linkage
      • Core business: development of semantic engines
        • Mostly product development and sales
        • Complemented by professional services
        • Joint ventures for vertical applications
    • Our Application Domains
      • Ontotext technologies are used for various applications:
        • Data Integration (consolidation of multiple databases)
        • Knowledge & Content Management (enterprise search)
        • Business Intelligence
        • Web-mining/Web-intelligence
      • Major industries/markets
        • Life sciences and health care
        • Telecommunications
        • Media Archives, Media Research
        • Online recruitment
        • IP/Patent Research
        • Web search, Web 2.0 and Semantic Web start-ups
    • Linked Life Data Pathway and Interaction Knowledge Base
      • Linked Life Data
        • RDF warehouse solution
        • Powered by OWLIM semantic database
      • Pathway and Interaction Knowledge Base
        • Dataset that integrates molecular data
        • Part of the Linked Data cloud
      • Tested with case studies developed in collaboration with AstraZeneca
    • Select drugs related to asthma that are linked to a curated molecular interaction in the literature where the protein is known to cause an inflammatory response…
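Once the sources share one graph, the question above can be expressed as a single set of joined triple patterns instead of a multi-step pipeline. The Python sketch below is a toy, not the Linked Life Data schema or the OWLIM engine: every identifier and predicate (`ex:indication`, `ex:target`, `ex:participant`, `ex:curated`, `ex:process`) is a hypothetical placeholder, and the naive nested-loop join only illustrates what a SPARQL engine does when it evaluates a basic graph pattern.

```python
# Toy sketch: one basic graph pattern (BGP) spanning several "sources".
# Identifiers and predicates (ex:indication, ex:target, ...) are hypothetical
# placeholders, not the actual Linked Life Data vocabulary.

TRIPLES = {
    ("drug:D1", "ex:indication", "disease:asthma"),       # drug data
    ("drug:D1", "ex:target", "protein:P1"),
    ("drug:D2", "ex:indication", "disease:asthma"),
    ("drug:D2", "ex:target", "protein:P3"),
    ("interaction:I1", "ex:participant", "protein:P1"),   # interaction data
    ("interaction:I1", "ex:curated", "true"),
    ("protein:P1", "ex:process", "go:inflammatory_response"),  # annotations
}

QUERY = [
    ("?drug", "ex:indication", "disease:asthma"),
    ("?drug", "ex:target", "?protein"),
    ("?i", "ex:participant", "?protein"),
    ("?i", "ex:curated", "true"),
    ("?protein", "ex:process", "go:inflammatory_response"),
]


def match(pattern, triples, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or check it against its binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every position matched
            yield new


def evaluate(patterns, triples):
    """Naive left-to-right join of the triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


if __name__ == "__main__":
    for row in evaluate(QUERY, TRIPLES):
        print(row["?drug"], row["?protein"])  # drug:D1 protein:P1
```

Here drug:D2 drops out because its target has no curated interaction. A production engine would use indexes and reorder patterns by selectivity rather than scanning every triple per pattern.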
    • Select all human genes that code for proteins with known molecular interactions and that have been analyzed with molecular techniques such as ‘Transfection’; restrict the results to genes or proteins that are known drug targets…
    • Holistic View of the Scientific Problems
      • Use data from more than 20 sources
      • Interlink the information
      • Put it into new contexts
      • Switch between different perspectives
      Figure label: Environmental Factors
    • The Challenge of the Holistic View
      • Extremely large amounts of data
      • Data is maintained by different organizations
      • Information is highly distributed and redundant
      • Many flat-file formats with special semantics
      • Knowledge is locked in vast data silos
      • Isolated communities that cannot reach a cross-domain understanding
      • Increase the data abstraction level
    • RDF Technology
      Figure: an RDF graph around the gene ERBB2 (Entrez Gene 2064; labels ERBB2, HER2, CD340), showing a database cross-reference to ENSG00000141736, hasProtein/hasGene links to the proteins Q4H1F1 and Q4H1F2, Gene Ontology terms GO:0005023, GO:0004872 and GO:0005006 (labels include "EGF receptor activity", "receptor activity" and "peroxisome receptor") connected by is_a, and type and label edges to the classes Gene, Protein and Gene Ontology Term.
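The slide's point is that every fact is a subject-predicate-object triple, so the same graph can be walked from a gene to its proteins or back. The following Python sketch stores a few triples modelled loosely on the ERBB2 figure; the predicate names (`ex:hasProtein`, `ex:xref`) and the edge directions are assumptions for illustration, not the vocabulary used in the talk.

```python
# Toy in-memory graph loosely modelled on the ERBB2 figure. The predicate names
# (ex:hasProtein, ex:xref) and edge directions are assumptions for illustration.
from collections import defaultdict

TRIPLES = [
    ("entrez:2064", "rdf:type", "ex:Gene"),
    ("entrez:2064", "rdfs:label", "ERBB2"),
    ("entrez:2064", "rdfs:label", "HER2"),
    ("entrez:2064", "rdfs:label", "CD340"),
    ("entrez:2064", "ex:xref", "ensembl:ENSG00000141736"),
    ("entrez:2064", "ex:hasProtein", "uniprot:Q4H1F1"),
    ("entrez:2064", "ex:hasProtein", "uniprot:Q4H1F2"),
    ("uniprot:Q4H1F1", "rdf:type", "ex:Protein"),
    ("uniprot:Q4H1F2", "rdf:type", "ex:Protein"),
]

# Two indexes: forward (subject, predicate) -> objects, and
# backward (object, predicate) -> subjects, so edges can be walked both ways.
forward = defaultdict(list)
backward = defaultdict(list)
for s, p, o in TRIPLES:
    forward[(s, p)].append(o)
    backward[(o, p)].append(s)


def objects(subject, predicate):
    """All objects reachable from `subject` along `predicate`."""
    return list(forward.get((subject, predicate), []))


def subjects(predicate, obj):
    """All subjects that point at `obj` along `predicate`."""
    return list(backward.get((obj, predicate), []))


if __name__ == "__main__":
    print(objects("entrez:2064", "rdfs:label"))          # ['ERBB2', 'HER2', 'CD340']
    print(subjects("ex:hasProtein", "uniprot:Q4H1F1"))  # ['entrez:2064']
```

Real RDF stores keep several such indexes (for different orderings of subject, predicate and object) for the same reason: any position of a triple may be the one a query starts from.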
    • Linking Open Data Community Project
      • Use URIs to identify the information
      • Expose the information via dereferenceable URIs
      • Allow browsing of data spread across different servers, the way HTML pages are browsed
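"Dereferenceable" means the URI that names a resource can also be fetched over HTTP, with content negotiation deciding whether a browser gets HTML or a program gets RDF. A minimal Python sketch using only the standard library follows; the example URI is a hypothetical placeholder, and no specific server behaviour is assumed beyond standard HTTP.

```python
# Sketch of dereferencing a Linked Data URI: the same URI that names a
# resource is fetched over HTTP, asking for RDF via content negotiation.
# The example URI is a hypothetical placeholder.
import urllib.request

RDF_XML = "application/rdf+xml"


def build_dereference_request(uri, accept=RDF_XML):
    """Prepare a GET request whose Accept header asks for RDF, not HTML."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri, accept=RDF_XML, timeout=10):
    """Fetch the description of `uri` (network I/O). urlopen follows
    redirects such as 303 See Other, which many Linked Data servers use."""
    request = build_dereference_request(uri, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()


if __name__ == "__main__":
    req = build_dereference_request("http://example.org/resource/gene/2064")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A browser requesting the same URI would send `Accept: text/html` and receive a human-readable page, which is what makes the data browsable "the way HTML is browsed".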
    • LLD Integration Process
      • Data source identification: flat files, OBO files, XML, RDBMS, RDF
      • Transformation: specially tailored transformer, OBO-to-SKOS converter, custom XSLT, RDBMS-to-RDF formatter
      • RDF warehouse, with a reasoner, instance mappings and semantic annotations
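One of the transformers named above is an OBO-to-SKOS converter. The Python sketch below is not that converter; it assumes a simple, plausible mapping ([Term] to skos:Concept, `name` to skos:prefLabel, `is_a` to skos:broader) and uses made-up `EX:` identifiers to show the shape of such a conversion.

```python
# Sketch of an OBO-to-SKOS conversion, assuming a simple mapping:
# [Term] -> skos:Concept, name -> skos:prefLabel, is_a -> skos:broader.
# Only [Term] stanzas are converted; other stanzas (e.g. [Typedef]) are skipped.


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to its values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
            current.setdefault(tag.strip(), []).append(value)
    return terms


def obo_to_skos(text):
    """Convert OBO terms into (subject, predicate, object) SKOS triples."""
    triples = []
    for term in parse_obo_terms(text):
        concept = term["id"][0]
        triples.append((concept, "rdf:type", "skos:Concept"))
        for name in term.get("name", []):
            triples.append((concept, "skos:prefLabel", name))
        for parent in term.get("is_a", []):
            triples.append((concept, "skos:broader", parent))
    return triples


# Made-up sample; the EX: identifiers are hypothetical.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000002
name: respiration disorder

[Term]
id: EX:0000001
name: bronchial disease
is_a: EX:0000002 ! respiration disorder

[Typedef]
id: part_of
name: part of
"""

if __name__ == "__main__":
    for triple in obo_to_skos(SAMPLE):
        print(triple)
```

Mapping is_a to skos:broader is what lets the schema-reasoning step later derive skos:broaderTransitive links from ontology hierarchies.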
    • Over 20 Different Sources
      • Number of statements: 4,792,035,475
      • Number of explicit statements: 2,218,239,691
      • Number of entities: 370,230,951
      Data source | Description | RDF statements
      Disease Ontology | Disease Ontology is a controlled… | 446,066
      Human Phenotype Ontology | The human phenotype ontology (HPO) intends… | 70,911
      Symptom Ontology | The symptom ontology was designed around… | 4,163
      DrugBank | The DrugBank database is a unique bioinformatics… | 493,794
      Diseasome | The diseasome website is a disease relationships… | 69,546
      DailyMed | DailyMed provides high quality information about… | 116,992
      SIDER | SIDER contains information on marketed medicines | 96,272
      BioGRID | The Biological General Repository for Interaction Datasets | 1,892,897
      INOH | INOH (Integrating Network Objects with Hierarchies) | 432,456
      CellMap | The Cancer Cell Map contains selected… | 173,914
      HPRD | The Human Protein Reference Database | 18,05,651
      HumanCyc | HumanCyc is a bioinformatics database that describes… | 341,225
      IMID | General Repository for Interaction Datasets. | 154,408
      IntAct | IntAct provides a freely available, open source database… | 11,005,555
      Reactome | Reactome is a free, online, open-source, curated resource… | 2,538,793
      NCI-Nature | Nature pathway interaction database. | 333,415
      KEGG | KEGG PATHWAY is a collection of manually drawn… | 18,128,735
      Entrez-Gene | Entrez Gene is a searchable database of genes… | 107,193,308
      PubMed | PubMed is a service of the U.S. National Library of Medicine | 807,851,455
      UniProt | Major resource for protein sequences | 1,252,667,885
      UMLS Metathesaurus | Database that contains information about biomedical… | 12,420,882
      UMLS Semantic network | Semantic categorization of terminology | 1,368
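The three headline counts say something about how much the reasoner contributes. The short Python calculation below assumes, as OWLIM reports its statistics, that the total statement count is explicit plus inferred statements; under that assumption reasoning adds more statements than were loaded.

```python
# Figures derived from the statement counts on this slide, assuming (as
# OWLIM reports them) that total statements = explicit + inferred.
TOTAL_STATEMENTS = 4_792_035_475
EXPLICIT_STATEMENTS = 2_218_239_691
ENTITIES = 370_230_951

inferred = TOTAL_STATEMENTS - EXPLICIT_STATEMENTS
explicit_share = EXPLICIT_STATEMENTS / TOTAL_STATEMENTS
statements_per_entity = TOTAL_STATEMENTS / ENTITIES

print(f"inferred statements:   {inferred:,}")                 # 2,573,795,784
print(f"explicit share:        {explicit_share:.1%}")         # 46.3%
print(f"statements per entity: {statements_per_entity:.1f}")  # 12.9
```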
    • Schema Reasoning
      • Rule: <premise 1>, <premise 2> ⇒ <conclusion>
        • <C1, broader, C2>
        • <C2, broader, C3>
        • ⇒ <C1, broaderTransitive, C3>
      Figure: COPD (Chronic Obstructive Airway Diseases, umls:C0024117) broader Bronchial Diseases (umls:C0006261) broader Respiration Disorders (umls:C0035204); the reasoner adds an inferred broaderTransitive link from COPD to Respiration Disorders. Each concept is typed with rdf:type.
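The rule above can be applied by forward chaining until nothing new is derived. The Python sketch below follows SKOS semantics (every broader link is also broaderTransitive, and broaderTransitive is transitive), so it computes the full transitive closure, of which the two-premise rule is one step; it is a naive illustration, not OWLIM's rule engine, and uses the concept labels from the figure as identifiers.

```python
# Forward-chaining sketch of the rule on this slide:
#   <C1, broader, C2> and <C2, broader, C3>  =>  <C1, broaderTransitive, C3>
# Following SKOS, broader also implies broaderTransitive and broaderTransitive
# is transitive, so we iterate to a fixpoint (full transitive closure).


def infer_broader_transitive(broader_pairs):
    """Return every (narrower, broader) pair implied by the broader links."""
    closure = set(broader_pairs)  # each broader link is also broaderTransitive
    while True:
        # Join the closure with itself: (a, b) and (b, d) give (a, d).
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:  # fixpoint reached: no rule application adds anything
            return closure
        closure |= new


broader = [
    ("COPD", "Bronchial Diseases"),
    ("Bronchial Diseases", "Respiration Disorders"),
]

if __name__ == "__main__":
    inferred = infer_broader_transitive(broader)
    print(sorted(inferred - set(broader)))  # [('COPD', 'Respiration Disorders')]
```

The quadratic self-join is fine for a sketch; engines that materialize inferences at load time, as OWLIM does, use indexed, incremental rule evaluation instead.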
    • Instance Mapping
      Figure: cPath BioPAX resources (cpath:CPATH-94138, cpath:CPATH-LOCAL-8467065, cpath:CPATH-LOCAL-8749236) of type biopax-2:PHYSICAL-ENTITY, with biopax-2:SHORT-NAME CD40L_HUMAN and a biopax-2:XREF whose biopax-2:DB is UNIPROT and biopax-2:ID is P29965, are mapped to the UniProt resource uniprot:P29965, which carries the uniprot:mnemonic values CD40L_HUMAN, TNF5_HUMAN, TNFL5_HUMAN and CD4L_HUMAN.
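In this mapping, a pathway record does not name the UniProt resource directly; it points to a cross-reference that holds a database name and an identifier. A minimal Python sketch of turning such xrefs into links follows; the entity names, the prefix table and the `ex:mappedTo` predicate are hypothetical, and the real mapping may well use a different linking property.

```python
# Sketch of instance mapping through a cross-reference node: an entity's xref
# carries (db, id), and the pair is turned into a link to the canonical
# resource. The entity names, the prefix table and the "ex:mappedTo"
# predicate are hypothetical.

DB_PREFIXES = {"UNIPROT": "uniprot:"}  # normalised db name -> namespace


def map_instances(xrefs, prefixes=DB_PREFIXES):
    """xrefs: iterable of (entity, db, accession); returns mapping triples."""
    links = []
    for entity, db, accession in xrefs:
        # Normalise the database name so "UniProt" and "UNIPROT" agree.
        prefix = prefixes.get(db.strip().upper())
        if prefix is None:
            continue  # unknown database: leave the entity unmapped
        links.append((entity, "ex:mappedTo", prefix + accession.strip()))
    return links


if __name__ == "__main__":
    xrefs = [
        ("ex:entityA", "UNIPROT", "P29965"),
        ("ex:entityB", "UniProt", " P29965"),
        ("ex:entityC", "SOMEDB", "42"),
    ]
    for link in map_instances(xrefs):
        print(link)
```

Because both entities end up pointing at uniprot:P29965, a query can reach the UniProt data, such as its mnemonics, from either pathway record.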
    • Linking patterns between two resources X and Y (figure)
      • Namespace mapping
      • Reference node
      • Mismatched identifiers
      • Value dereference
      • Transitive link
      • Semantic Annotations
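The simplest of these patterns is namespace mapping: two sources use the same local identifier under different namespaces (ns-x:id versus ns-y:id in the figure). A small Python sketch follows, using the figure's placeholder prefixes; any real prefix table would be an assumption.

```python
# Sketch of the "namespace mapping" pattern: one source writes ns-x:id and
# another ns-y:id for the same identifier, so rewriting the prefix makes the
# two resources coincide. The prefixes are the placeholders from the figure.

NAMESPACE_MAP = {"ns-x:": "ns-y:"}


def map_namespace(uri, mapping=NAMESPACE_MAP):
    """Rewrite the first matching source prefix; leave other URIs unchanged."""
    for source, target in mapping.items():
        if uri.startswith(source):
            return target + uri[len(source):]
    return uri


def map_triples(triples, mapping=NAMESPACE_MAP):
    """Apply the mapping to subjects and objects; predicates are left as-is."""
    return [(map_namespace(s, mapping), p, map_namespace(o, mapping))
            for s, p, o in triples]


if __name__ == "__main__":
    print(map_namespace("ns-x:2064"))  # ns-y:2064
```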
    • Semantic Annotations
      Figure: natural language processing links a document (hasDocumentText: "This is an example text of a document that mentions COPD disease") to the concept COPD (Chronic Obstructive Airway Diseases) through a mentions relation; COPD is connected by broader and broaderTransitive to Bronchial Diseases (umls:C0006261) and Respiration Disorders (umls:C0035204).
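The figure combines two things: a mentions link produced from text, and the concept hierarchy produced by reasoning, so a document about COPD can be found by asking for Respiration Disorders. The Python sketch below uses a plain dictionary (gazetteer) lookup as a stand-in for the NLP pipeline actually used; the concept IDs and the `ex:mentions` predicate are hypothetical.

```python
# Toy dictionary ("gazetteer") annotator plus concept expansion. This is a
# simple stand-in, not the NLP pipeline used for Linked Life Data; concept IDs
# and the "ex:mentions" predicate are hypothetical.
import re

LABELS = {
    "COPD": "concept:COPD",
    "chronic obstructive airway disease": "concept:COPD",
    "bronchial diseases": "concept:BronchialDiseases",
}

# broaderTransitive pairs (narrower, broader), as a reasoner would infer them.
BROADER_TRANSITIVE = {
    ("concept:COPD", "concept:BronchialDiseases"),
    ("concept:COPD", "concept:RespirationDisorders"),
    ("concept:BronchialDiseases", "concept:RespirationDisorders"),
}


def annotate(doc_id, text, labels=LABELS):
    """Emit (doc, ex:mentions, concept) for each label found as a whole word."""
    found = set()
    for label, concept in labels.items():
        if re.search(r"\b" + re.escape(label) + r"\b", text, flags=re.IGNORECASE):
            found.add((doc_id, "ex:mentions", concept))
    return found


def documents_about(concept, annotations, broader_transitive=BROADER_TRANSITIVE):
    """Documents mentioning `concept` or any concept narrower than it."""
    wanted = {concept} | {n for (n, b) in broader_transitive if b == concept}
    return {doc for (doc, _, c) in annotations if c in wanted}


if __name__ == "__main__":
    text = "This is an example text of a document that mentions COPD disease"
    notes = annotate("doc:1", text)
    print(notes)  # {('doc:1', 'ex:mentions', 'concept:COPD')}
    print(documents_about("concept:RespirationDisorders", notes))  # {'doc:1'}
```

The document never contains the words "Respiration Disorders"; it is retrieved only because the annotation and the inferred hierarchy live in the same graph.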
    • Semantic Annotations #2
      • Executed over selected textual fields
      • Powered by standard and open source NLP components
      • Very efficient parallelization techniques
      • For UMLS and PubMed, the annotation process created:
        • Over 705 million high-recall semantic annotations
        • Over 263 million high-precision semantic annotations
    • Linked Life Data Service
      • Pathway and Interaction Knowledge Base is a free resource
      • LLD is a free public service available at http://linkedlifedata.com
      • The OWLIM engine has been shown experimentally to scale up to:
        • 20 billion RDF statements (15 billion explicit)
        • On a computer that costs less than $10,000
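A public RDF service like this is typically queried over HTTP using the SPARQL 1.1 Protocol, where the query text travels as the `query` parameter of a GET request. The Python sketch below builds such a request with the standard library; `ENDPOINT` is a placeholder assumption, not the service's documented address, and the example query simply asks for a few broaderTransitive links.

```python
# Sketch of sending a SPARQL query over HTTP following the SPARQL 1.1 Protocol
# (query passed as the "query" parameter of a GET request). ENDPOINT is a
# placeholder assumption; check the service for its actual endpoint.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # hypothetical
RESULTS_JSON = "application/sparql-results+json"


def build_sparql_request(query, endpoint=ENDPOINT):
    """URL-encode the query and ask for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": RESULTS_JSON})


def run_sparql(query, endpoint=ENDPOINT, timeout=30):
    """Execute the query (network I/O) and return the list of result bindings."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?narrower ?broader
WHERE { ?narrower skos:broaderTransitive ?broader }
LIMIT 10"""

if __name__ == "__main__":
    print(build_sparql_request(QUERY).full_url)
```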
    • Conclusion
      • Ontotext is a company that provides efficient software for managing semantic information
      • Link data between different silo applications
      • Easily and incrementally extensible
      • Put the information into context
      • Ask more interesting queries
      • Manage knowledge derived from text mining
    • Acknowledgement
      • AstraZeneca
        • Bosse Andersson
      • LODD
      • BioRDF
      • HCLSIG
      • Ontotext
        • Deyan Peychev
        • Georgi Georgiev
        • Todor Primov
        • OWLIM team
      The development of PIKB and Linked Life Data is partially funded by FP7 215535
    • Questions
      • ?