2009 09 Lod London


Presentation about Linking Open Drug Data at the Linked Data Gathering in London 2009.

  • TODO: statistics about the number of triples from Anja’s doc
  • TODO: check the updated figure from Anja

    1. 1. Linked Data for Health Care and Life Science Research. Jun Zhao, University of Oxford
    2. 2. Linking Open Drug Data (LODD)
        • A task force of the W3C Health Care and Life Sciences Interest Group (HCLS IG), started in October 2008
        • Enrich the Web of Data by publishing drug-related data as Linked Data
        • Investigate the benefits of LODD for drug discovery and biomedical research
        • ~12 active participants, including researchers and pharmaceutical companies
        • Anja Jentzsch, Bosse Anderssen, Chris Bizer, Eric Prud'hommeaux, Don Doherty, Matthias Samwald, Oktie Hassanzadeh, Scott Marshall, Susie Stephens
    3. 3. Dataset | Content | Publishing tool | Triples
        LinkedCT | Derived from ClinicalTrials.gov; more than 60,000 trials conducted in the US and other countries | D2R Server | 7,036,000
        DrugBank | Nearly 5,000 FDA-approved small molecule and biotech drugs | D2R Server | 767,000
        DailyMed | Published by the National Library of Medicine (NLM); high-quality packaging information on 4,300 marketed drugs | D2R Server | 164,300
        RDF-TCM | 850 herbs, herb-gene and herb-disease associations | Pubby | 117,600
        Diseasome | A network of disorders and disorder genes, obtained from Online Mendelian Inheritance in Man (OMIM) | D2R Server | 91,200
        SIDER | Information on 930 marketed drugs and 1,700 related side effects | D2R Server | 192,500
        Total | | | 8,400,000
    4. 4. Dataset | Outgoing links
        LinkedCT | 220,569
        DrugBank | 59,661
        DailyMed | 38,220
        RDF-TCM | 3,438
        Diseasome | 31,065
        SIDER | 19,281
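
The figures on the two dataset slides can be cross-checked with a few lines of arithmetic. The sketch below copies the per-dataset counts from the slides; the "links per 1,000 triples" ratio is an illustrative measure added here, not something the slides report.

```python
# Triple counts per dataset, copied from the dataset slide.
triples = {
    "LinkedCT": 7_036_000,
    "DrugBank": 767_000,
    "DailyMed": 164_300,
    "RDF-TCM": 117_600,
    "Diseasome": 91_200,
    "SIDER": 192_500,
}

# Outgoing link counts per dataset, copied from the links slide.
outgoing_links = {
    "LinkedCT": 220_569,
    "DrugBank": 59_661,
    "DailyMed": 38_220,
    "RDF-TCM": 3_438,
    "Diseasome": 31_065,
    "SIDER": 19_281,
}

total_triples = sum(triples.values())
total_links = sum(outgoing_links.values())


def links_per_1000_triples(name):
    """Illustrative link density: outgoing links per 1,000 triples."""
    return 1000 * outgoing_links[name] / triples[name]


print("total triples:", total_triples)
print("total outgoing links:", total_links)
for name in triples:
    print(name, round(links_per_1000_triples(name), 1))
```

The per-dataset triple counts sum to 8,368,600, which matches the slide's total of 8,400,000 when rounded to the nearest hundred thousand.
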
    5. 5. Create linked data
        • Heterogeneous source data
            • Relational database dumps, tab-delimited data, …
        • Most data are open access
        • The toolkits are maturing
            • D2R Server and OpenLink Virtuoso
        • The difficulties
            • Understanding the semantics of the source data
            • Heterogeneous semantics across source data
        • We got a long way without data integration or consensus on the semantics
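
As a rough illustration of what turning tab-delimited source data into Linked Data involves, here is a minimal sketch that emits N-Triples from a small table. It is not how D2R Server or Virtuoso work internally; the `http://example.org/` namespaces, the column names, and the sample row are all hypothetical.

```python
import csv
import io

BASE = "http://example.org/drug/"     # hypothetical resource namespace
VOCAB = "http://example.org/vocab/"   # hypothetical property namespace


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def rows_to_ntriples(tsv_text, id_column):
    """Turn each non-empty cell into one triple: <row> <column> "value" ."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id becomes the subject URI; skip empty cells
            lines.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return lines


# Hypothetical tab-delimited input with a header row.
sample = "id\tname\tindication\nD001\tExampleDrug\theadache\n"
for line in rows_to_ntriples(sample, "id"):
    print(line)
```

A mechanical mapping like this captures none of the source semantics, which is the difficulty the slide points to: deciding what each column actually means still has to be done by hand.
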
    6. 6. Create links between data
        • Challenge: create links on a large scale
        • Silk
            • Mapping data by querying their SPARQL endpoints
            • Silk-LSL: combining mapping rules, mapping algorithms and matching thresholds
        • LinQuer
            • Semantic link discovery over relational data
            • LinQL: specify linkage requirements, which are rewritten into SQL queries
        • Sacrifice recall for precision
        • Maintain updates of links
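
The idea of a matching threshold, and of trading recall for precision, can be sketched with plain string similarity. This is not Silk or LinQuer: it uses Python's standard `difflib` as a stand-in similarity measure, and the `ex:` identifiers and labels are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold):
    """Link each source entity to its best-scoring target, if above threshold."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, "owl:sameAs", best_uri, round(best_score, 2)))
    return links


# Hypothetical entity labels from two datasets.
source = {"ex:a/aspirin": "Aspirin", "ex:a/ibuprofen": "Ibuprofen"}
target = {"ex:b/1": "aspirin", "ex:b/2": "Ibuprofene", "ex:b/3": "Insulin"}

strict = discover_links(source, target, threshold=0.95)
loose = discover_links(source, target, threshold=0.90)
print(len(strict), "links at 0.95;", len(loose), "links at 0.90")
```

With the stricter threshold only the exact label match is linked, and the near match with the spelling variant is dropped. That is the precision-over-recall trade-off in miniature: fewer links, but fewer wrong ones.
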
    7. 7. Use case: connect medical knowledge
        • Despite the growth of studies in alternative medicine, it is not yet included in standard medical care in Western countries
        • A lot of knowledge about alternative medicine is not available in English
        • Use DBpedia to link together information
        • Create a Web of Data connecting alternative medicine studies with Western biomedical research
    8. 8. Applications
        • Patients
            • Search for alternative medicine for a disease
            • Search for clinical trial information about a herb
            • Search for side-effect information about a herb
            • Search for alternative herbs for a Western drug, e.g. "What is the alternative medicine for aspirin?"
        • Researchers
            • Confirm that these genes are also associated with the disease in biomedical research
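
The searches listed above rely on following links between datasets through a shared hub such as DBpedia. The sketch below shows that idea on an in-memory list of triples. Every identifier and property in it (`tcm:HerbX`, `ex:sideEffect`, and so on) is a placeholder, not real data from RDF-TCM, SIDER, or LinkedCT, and a real application would query the datasets' SPARQL endpoints instead.

```python
from collections import defaultdict

# Placeholder triples: three datasets each link their entity to one hub URI.
TRIPLES = [
    ("tcm:HerbX", "owl:sameAs", "dbpedia:HerbX"),
    ("sider:Drug42", "owl:sameAs", "dbpedia:HerbX"),
    ("linkedct:Intervention7", "owl:sameAs", "dbpedia:HerbX"),
    ("tcm:HerbX", "ex:treats", "ex:DiseaseA"),
    ("sider:Drug42", "ex:sideEffect", "ex:SideEffectB"),
    ("linkedct:Intervention7", "ex:studiedIn", "ex:TrialC"),
    ("sider:Drug99", "ex:sideEffect", "ex:SideEffectZ"),  # unrelated entity
]


def same_as_closure(start, triples):
    """Collect every identifier reachable from start via owl:sameAs links."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:sameAs":
            neighbours[s].add(o)
            neighbours[o].add(s)  # treat sameAs as symmetric
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in neighbours[node] - seen:
            seen.add(other)
            stack.append(other)
    return seen


def lookup(start, predicate, triples):
    """Find values of predicate for start or any of its sameAs aliases."""
    aliases = same_as_closure(start, triples)
    return sorted(o for s, p, o in triples if s in aliases and p == predicate)


print(lookup("tcm:HerbX", "ex:sideEffect", TRIPLES))
print(lookup("tcm:HerbX", "ex:studiedIn", TRIPLES))
```

Starting from the herb identifier alone, the lookup reaches the side-effect and trial records held in the other datasets, which is what the patient-facing searches need.
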
    9. 11. true
    10. 13. Are there any Raccoons in India?
    11. 14. The top pharma questions
        • What patents exist for this pathway/target?
        • What side effects are there for this drug, especially those not on the label?
        • Has a similar compound to ours been approved previously, and what were the side effects?
        • http://esw.w3.org/topic/HCLSIG/LODD/Questions
    12. 15. Future issues
        • Link update notification
        • Annotate the data with the Translational Medicine Ontology, also from the W3C HCLS IG
        • More applications to support scientific questions
        • Work with more datasets, such as gene expression data and protein interaction network data
    13. 16. http://esw.w3.org/topic/HCLSIG/LODD/ Thank you!