Traditional thinking is that you need systems to describe the sequence: genes, proteins, ... What we need is a holistic view: disease in the context of the patient and the environmental factors, and a way to link the information.
(Structure view) The Linked Life Data approach: the slide is targeted at bioinformaticians and presents the types of data sources that are integrated and can be queried. Genes and Proteins: custom RDF schema; Interactions and Pathways: BioPAX schema; Drugs: schema designed by Linked Open Drug Data; Ontologies and Thesauri: SKOS; Documents: a lightweight schema.
(Behaviour view)
It is still possible to get answers to questions like these in real time: select all human genes that code for proteins with known molecular interactions (as part of a physiological process, such as inflammation) and that are analyzed with molecular techniques such as 'Transfection'; then restrict the results to genes or proteins that are known drug targets for a specific disease (such as 'Asthma').
Link redundant or related information with the SKOS mapping predicates skos:exactMatch, skos:closeMatch, skos:broadMatch and skos:narrowMatch.
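To make the mapping step concrete, here is a minimal Python sketch that derives SKOS mapping links between records from two sources. The matching rules (a shared accession gives skos:exactMatch, a shared name alone gives skos:closeMatch), the record layout and the function names are assumptions for illustration, not the rules actually used in Linked Life Data.

```python
# Minimal sketch: derive SKOS mapping links between records of two sources.
# ASSUMPTIONS (hypothetical, not the Linked Life Data rules): each record is a
# dict with "uri", "accession" and "name" keys; a shared accession yields
# skos:exactMatch, and a shared name alone (case-insensitive) yields
# skos:closeMatch.
SKOS = "http://www.w3.org/2004/02/skos/core#"


def mapping_links(source_a, source_b):
    """Return a sorted list of (subject, predicate, object) URI triples."""
    links = set()
    for a in source_a:
        for b in source_b:
            if a.get("accession") and a["accession"] == b.get("accession"):
                links.add((a["uri"], SKOS + "exactMatch", b["uri"]))
            elif a.get("name") and a["name"].lower() == (b.get("name") or "").lower():
                links.add((a["uri"], SKOS + "closeMatch", b["uri"]))
    return sorted(links)


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    a = [{"uri": "http://example.org/a/1", "accession": "P29965", "name": "CD40L"}]
    b = [
        {"uri": "http://example.org/b/9", "accession": "P29965", "name": "CD40 ligand"},
        {"uri": "http://example.org/b/7", "accession": None, "name": "cd40l"},
    ]
    print(to_ntriples(mapping_links(a, b)))
```

Keeping exactMatch for identifier agreement and closeMatch for weaker evidence mirrors the intent of the SKOS vocabulary: exactMatch is transitive and safe to merge on, closeMatch is not.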
The types of links we have found in 20 different sources
Semantic annotation: the most powerful linking technique
Semantic annotation results; more are planned
Quantitative distribution of the data sources: the sequence databases (genes/proteins) are the biggest slice of the pie, four times the size of DBpedia; the ontologies and thesauri (UMLS, OBO, etc.) are too small to be visible in the chart.
Expanding the Pathway and Interaction Knowledge in Linked Life Data - Presentation Transcript
Ontotext AD: Expanding the Pathway and Interaction Knowledge in Linked Life Data (10/23/2009)
Holistic View of the Scientific Problems: link data between different silo applications; put the information into context; analyze the knowledge locked in unstructured data; environmental factors. Semantic Web Challenge 2009, 10/23/2009
The Challenge of the Holistic View: an extreme amount of data; data is supported by different organizations; information is highly distributed and redundant; tons of flat-file formats with special semantics; knowledge is locked in vast data silos; isolated communities that cannot reach cross-domain understanding.
Linked Life Data RDF Warehouse: 5 billion RDF statements! Figure: the warehouse integrates Ontologies and Thesauri (Unified Medical Language System®, The Open Biomedical Ontologies), Drugs (Linked Open Drug Data), Interactions and Pathways (Biological Pathway Exchange), Genes and Proteins, and Documents (Semantic Annotations produced by Text-Analysis).
LLD Integration Process: data source identification; transformation of the sources into RDF (flat files with a specially tailored transformer, OBO files with an OBO-to-SKOS converter, XML with custom XSLT, RDBMS with an RDBMS-to-RDF formatter, plus native RDF); loading into the RDF warehouse with its reasoner; instance mappings; semantic annotations.
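The OBO-to-SKOS conversion step can be illustrated with a small Python sketch. It is not the converter used for Linked Life Data: it handles only [Term] stanzas and the id, name, synonym and is_a tags, maps them to skos:Concept, skos:prefLabel, skos:altLabel and skos:broader, and mints term URIs under a hypothetical BASE namespace.

```python
# Minimal sketch of an OBO-to-SKOS conversion (hypothetical; not the LLD
# converter). Only [Term] stanzas and the id, name, synonym and is_a tags are
# handled. BASE is an assumed namespace for the generated term URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/obo/"  # hypothetical namespace


def parse_obo_terms(text):
    """Yield one dict (tag -> list of values) per [Term] stanza."""
    term = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            if term is not None:
                yield term
            # Other stanza types such as [Typedef] are skipped.
            term = {} if line == "[Term]" else None
        elif term is not None and ":" in line:
            tag, value = line.split(":", 1)
            term.setdefault(tag.strip(), []).append(value.strip())
    if term is not None:
        yield term


def uri(obo_id):
    return BASE + obo_id.replace(":", "_")


def literal(text):
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def obo_to_skos(text):
    """Return N-Triples lines describing each OBO term as a skos:Concept."""
    lines = []
    for term in parse_obo_terms(text):
        if "id" not in term:
            continue
        s = f"<{uri(term['id'][0])}>"
        lines.append(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        for name in term.get("name", []):
            lines.append(f"{s} <{SKOS}prefLabel> {literal(name)} .")
        for syn in term.get("synonym", []):
            # OBO synonyms look like: "text" EXACT [] ; keep the quoted text.
            label = syn.split('"')[1] if syn.count('"') >= 2 else syn
            lines.append(f"{s} <{SKOS}altLabel> {literal(label)} .")
        for parent in term.get("is_a", []):
            # Drop the optional trailing "! comment" after the parent id.
            parent_id = parent.split("!")[0].strip()
            lines.append(f"{s} <{SKOS}broader> <{uri(parent_id)}> .")
    return lines
```

Mapping is_a to skos:broader (rather than rdfs:subClassOf) is the design choice that lets ontologies and thesauri share one SKOS layer, as the structure view describes; it deliberately gives up OWL class semantics.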
Complex Cross-Domain Queries. Figure: a query graph with the nodes Gene (filter: human genes), Protein, Molecular Interaction, Physiological process, Molecular Technique, Disease and Drugs, connected by the relations express protein, curated interaction, participate in, analyzed by, cause, treated with and target.
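A query like the one on this slide (human genes whose proteins take part in inflammation-related interactions, analyzed by transfection, and targeted by asthma drugs) is a conjunction of triple patterns joined on shared variables. In Linked Life Data it would be posed as SPARQL against the warehouse; the Python sketch below swaps in a tiny hand-rolled pattern matcher to show the join semantics. The ex: predicates and the toy triples are hypothetical.

```python
# Minimal sketch: evaluate a cross-domain query as a basic graph pattern over
# in-memory triples. ASSUMPTIONS: the ex: predicates and toy data below are
# hypothetical stand-ins for the real schemas (BioPAX, LODD, UMLS, ...).
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value across all patterns.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(triples, patterns):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


PATTERNS = [
    ("?gene", "ex:organism", "ex:human"),
    ("?gene", "ex:expresses", "?protein"),
    ("?protein", "ex:participatesIn", "?interaction"),
    ("?interaction", "ex:partOf", "ex:inflammation"),
    ("?gene", "ex:analyzedBy", "ex:transfection"),
    ("?drug", "ex:targets", "?protein"),
    ("?drug", "ex:treats", "ex:asthma"),
]

TRIPLES = {
    ("ex:geneA", "ex:organism", "ex:human"),
    ("ex:geneA", "ex:expresses", "ex:proteinA"),
    ("ex:proteinA", "ex:participatesIn", "ex:interaction1"),
    ("ex:interaction1", "ex:partOf", "ex:inflammation"),
    ("ex:geneA", "ex:analyzedBy", "ex:transfection"),
    ("ex:drugX", "ex:targets", "ex:proteinA"),
    ("ex:drugX", "ex:treats", "ex:asthma"),
    # geneB has no curated interaction, so it is filtered out.
    ("ex:geneB", "ex:organism", "ex:human"),
    ("ex:geneB", "ex:expresses", "ex:proteinB"),
}

if __name__ == "__main__":
    for row in query(TRIPLES, PATTERNS):
        print(row["?gene"], row["?protein"], row["?drug"])
```

The last two patterns implement the "restrict to known drug targets for a specific disease" step: they simply add more joins instead of post-filtering the result set.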
Instance Alignment: redundancy is removed by human-crafted declarative rules. Figure: an RDF graph aligning the BioPAX physical entities (biopax-2:PHYSICAL-ENTITY) cpath:CPATH-94138, cpath:CPATH-LOCAL-8467065 and cpath:CPATH-LOCAL-8749236 with the UniProt record uniprot:P29965; the labels include biopax-2:SHORT-NAME CD40L_HUMAN, a biopax-2:XREF with biopax-2:DB UNIPROT and biopax-2:ID P29965, and the uniprot:mnemonic values CD40L_HUMAN, TNF5_HUMAN, TNFL5_HUMAN and CD4L_HUMAN.
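One such declarative rule, modeled on the CD40L example in this slide, can be sketched in Python: a BioPAX entity whose xref has DB UNIPROT and ID X is declared the same as uniprot:X, and the resulting links are grouped into equivalence classes to expose redundant copies. The owl:sameAs output, the triple encoding and the function names are assumptions; the slides do not show the actual rule syntax used in LLD.

```python
# Minimal sketch of one hand-written alignment rule (hypothetical; not the LLD
# rule language):
#   ?e biopax-2:XREF ?x . ?x biopax-2:DB "UNIPROT" . ?x biopax-2:ID ?id
#   => ?e owl:sameAs uniprot:?id
def align_by_uniprot_xref(triples):
    """Apply the rule once and return the inferred owl:sameAs triples."""
    xrefs_of, db, acc = {}, {}, {}
    for s, p, o in triples:
        if p == "biopax-2:XREF":
            xrefs_of.setdefault(s, []).append(o)
        elif p == "biopax-2:DB":
            db[s] = o
        elif p == "biopax-2:ID":
            acc[s] = o
    inferred = set()
    for entity, xrefs in xrefs_of.items():
        for x in xrefs:
            if db.get(x) == "UNIPROT" and x in acc:
                inferred.add((entity, "owl:sameAs", "uniprot:" + acc[x]))
    return inferred


def equivalence_classes(same_as):
    """Group resources linked by owl:sameAs using union-find."""
    parent = {}

    def find(r):
        parent.setdefault(r, r)
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for s, _, o in same_as:
        parent[find(s)] = find(o)
    groups = {}
    for r in list(parent):
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())
```

Each equivalence class can then be collapsed to one representative, which is how redundant records from different sources stop appearing as separate answers.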
Figure: link patterns between resources X and Y from different namespaces (ns-x:, ns-y:, db:), based on shared ids, accessions, terms, names and descriptive text.
Semantic Annotations
Asthma and chronic obstructive pulmonary disease (COPD) are chronic airway diseases characterized by airflow obstruction. The beta(2)-adrenoceptor mediates bronchodilatation in response to exogenous and endogenous beta-adrenoceptor agonists. Single nucleotide polymorphisms in the beta(2)-adrenoceptor gene (ADRB2) cause amino acid changes (e.g. Arg16Gly, Gln27Glu) that potentially alter receptor function. Recently, a large cohort study found no association between asthma susceptibility and beta(2)-adrenoceptor polymorphisms. In contrast, asthma phenotypes, such as asthma severity and bronchial hyperresponsiveness, have been associated with beta(2)-adrenoceptor polymorphisms.
Figure: the abstract (pmid:17714090; journal: Clinical and experimental pharmacology …; author: Ian A Yang) mentions Asthma (umls:C000496) and COPD; these concepts are linked by broader and broaderTransitive relations to Chronic Obstructive Airway Diseases, Bronchial Diseases (umls:C0006261) and Respiration Disorders (umls:C0035204).
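The annotation in this slide combines two steps: spotting concept mentions in text and expanding each mention through the concept hierarchy (Asthma up to Respiration Disorders). The Python sketch below shows both under simple assumptions: a plain dictionary lookup stands in for the real text-analysis pipeline, and the ex: concept ids and hierarchy are illustrative placeholders rather than actual UMLS data.

```python
# Minimal sketch of dictionary-based semantic annotation plus a transitive
# closure over "broader" links (hypothetical; LLD's text-analysis pipeline is
# not described in the slides). Concept ids used with it are placeholders.
import re


def annotate(text, dictionary):
    """Return concept ids whose surface forms occur as whole words in text."""
    found = set()
    for surface, concept in dictionary.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.add(concept)
    return found


def broader_transitive(concept, broader):
    """Follow broader links (concept -> list of parents) to all ancestors."""
    seen, stack = set(), list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader.get(parent, []))
    return seen
```

Indexing each document under the ancestors of the concepts it mentions is what lets a search for a general term such as Bronchial Diseases retrieve an abstract that only says "asthma".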
Distribution of RDF Triples
Conclusion: Linked Life Data is a public and free service. There are many more interesting queries: multi-paradigm search, concept co-occurrence, and combining text with structured search. It easily integrates information from the public linked data cloud. Why not give it a try?
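Concept co-occurrence, one of the search paradigms listed above, can be sketched as counting how many annotated documents mention each pair of concepts. The slides do not describe how Linked Life Data computes it; the input format (document id mapped to the concept ids found by semantic annotation) and the names below are assumptions.

```python
# Minimal sketch of concept co-occurrence counting (hypothetical; the slides
# name the idea but not its implementation). Input: document id -> concept ids
# produced by semantic annotation.
from collections import Counter
from itertools import combinations


def co_occurrence(doc_concepts):
    """Count, for each unordered concept pair, the documents mentioning both."""
    counts = Counter()
    for concepts in doc_concepts.values():
        # set() ignores repeated mentions; sorted() makes each pair canonical.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    docs = {
        "doc1": ["ex:Asthma", "ex:ADRB2"],
        "doc2": ["ex:ADRB2", "ex:Asthma", "ex:COPD"],
        "doc3": ["ex:COPD"],
    }
    for pair, n in co_occurrence(docs).most_common():
        print(pair, n)
```

Pairs with high counts are candidate associations that can then be checked against the structured data, which is one way to combine text with structured search.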
Acknowledgements: AstraZeneca (Bosse Andersson); LODD, BioRDF, HCLSIG; Ontotext (Deyan Peychev, Georgi Georgiev, Todor Primov, OWLIM team). The development of PIKB and Linked Life Data is partially funded by FP7 215535.