    1. 1. Linked Data for Health Care and Life Science Research Jun Zhao University of Oxford
    2. 2. Outline <ul><li>What is Linked Data? </li></ul><ul><li>What do you need to make Linked Data? </li></ul><ul><li>What can you do with Linked Data? </li></ul>
    4. 4. What are the differences? <ul><li>These are not data warehouses </li></ul><ul><ul><li>Individual stores, individual SPARQL access points </li></ul></ul><ul><ul><li>Easier to maintain and to update </li></ul></ul><ul><li>They are taking advantage of the Web </li></ul><ul><ul><li>Using the web as the platform </li></ul></ul><ul><ul><li>Using URIs to identify and link entities </li></ul></ul><ul><ul><li>Building a Web-scale knowledge base </li></ul></ul>
    5. 5. How to make linked data? <ul><li>Publish data as RDF </li></ul><ul><li>Assign unique identifiers to data entities </li></ul><ul><li>Use HTTP URIs so that people can look up those names </li></ul><ul><li>Include links to other data resources so that they can discover more things </li></ul><ul><li>Provide SPARQL endpoints so that data can be accessed and queried </li></ul>
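The publishing steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used by any of the datasets in this talk: the `example.org` and `example.com` URIs and the helper names (`uri`, `literal`, `triple`, `describe_drug`) are assumptions made up for the example. It serialises one entity as N-Triples, gives it an HTTP URI, and adds an `owl:sameAs` link to another resource.

```python
# Hypothetical base URI; a real dataset mints identifiers under its own domain.
BASE = "http://example.org/drugs/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Serialise a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def describe_drug(local_id, label, same_as):
    """Return N-Triples lines naming a drug and linking it to other resources."""
    subject = uri(BASE + local_id)
    lines = [triple(subject, uri(RDFS_LABEL), literal(label))]
    for target in same_as:
        lines.append(triple(subject, uri(OWL_SAME_AS), uri(target)))
    return lines


# The link target is a made-up URI standing in for another dataset's entity.
ntriples = describe_drug(
    "varenicline",
    "Varenicline",
    ["http://example.com/other-dataset/drug/42"],
)
print("\n".join(ntriples))
```

Serving these statements when the subject URI is looked up over HTTP, and loading them into a store with a SPARQL endpoint, covers the remaining two bullets; tools such as D2R Server handle both steps for relational sources.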
    6. 6. How to make linked data? (cont.) <ul><li>Linked data publication tools </li></ul><ul><ul><li>D2R Server </li></ul></ul><ul><ul><li>Triplify </li></ul></ul><ul><ul><li>Pubby </li></ul></ul><ul><ul><li>Virtuoso Sponger </li></ul></ul><ul><li>Transformation scripts are widely shared and openly accessible </li></ul><ul><li>Automatic link creation tools </li></ul><ul><ul><li>Silk, see presentation on Thursday 2 pm </li></ul></ul>
    7. 7. Linked Open Drug Data <ul><li>A task force of the W3C Health Care and Life Sciences Interest Group, started in October 2008 </li></ul><ul><li>Enrich the Web of Data by publishing drug-related data as Linked Data </li></ul><ul><li>Investigate the benefits of LODD for drug discovery and biomedical research </li></ul><ul><li>~12 active participants, including researchers and pharmaceutical companies </li></ul>
    8. 8.
    | Dataset   | Outgoing links |
    | LinkedCT  | 220,569        |
    | DrugBank  | 59,661         |
    | DailyMed  | 38,220         |
    | RDF-TCM   | 3,438          |
    | Diseasome | 31,065         |
    | SIDER     | 19,281         |
    9. 9.
    | Dataset   | Content | Publishing tool | Triples |
    | LinkedCT  | Derived from [...]; more than 60,000 trials conducted in the US and other countries | D2R Server | 7,036,000 |
    | DrugBank  | Nearly 5,000 FDA-approved small molecule and biotech drugs | D2R Server | 767,000 |
    | DailyMed  | Published by the National Library of Medicine (NLM); high-quality packaging information on 4,300 marketed drugs | D2R Server | 164,300 |
    | RDF-TCM   | 850 herbs, herb-gene and herb-disease associations | Pubby | 117,600 |
    | Diseasome | A network of disorders and disorder genes, obtained from Online Mendelian Inheritance in Man (OMIM) | D2R Server | 91,200 |
    | SIDER     | Information on 930 marketed drugs and 1,700 related side effects | D2R Server | 192,500 |
    | Total (rounded) | | | 8,400,000 |
    10. 10. Create linked data <ul><li>Heterogeneous source data </li></ul><ul><ul><li>Relational database dumps, tab-delimited data … </li></ul></ul><ul><ul><li>Used D2R Server and OpenLink Virtuoso to publish linked data </li></ul></ul><ul><ul><li>Used Silk and LinQuer to create links </li></ul></ul><ul><li>We got a long way without data integration or consensus on the semantics </li></ul><ul><li>The difficulties </li></ul><ul><ul><li>Understanding the semantics of the source data </li></ul></ul><ul><ul><li>Heterogeneous semantics between source data </li></ul></ul>
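The link creation step can be illustrated with a deliberately simple Python sketch. It is not how Silk or LinQuer work internally (both support richer, configurable similarity measures); it only matches entities whose normalised labels are identical and emits `owl:sameAs` links. The dataset URIs, labels, and function names below are assumptions invented for the example.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Lower-case a label and collapse punctuation and whitespace runs."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def link_by_label(source, target):
    """Link entities from two {uri: label} maps whose normalised labels match."""
    # Index the target dataset by normalised label for constant-time lookup.
    index = {}
    for target_uri, target_label in target.items():
        index.setdefault(normalize(target_label), []).append(target_uri)

    links = []
    for source_uri, source_label in source.items():
        for target_uri in index.get(normalize(source_label), []):
            links.append((source_uri, OWL_SAME_AS, target_uri))
    return links


# Toy data: two hypothetical datasets describing diseases.
dataset_a = {
    "http://example.org/a/disease/1": "Epilepsy",
    "http://example.org/a/disease/2": "Type-2 Diabetes",
}
dataset_b = {
    "http://example.org/b/disease/x": "epilepsy ",
    "http://example.org/b/disease/y": "Asthma",
}

links = link_by_label(dataset_a, dataset_b)
for link in links:
    print(link)
```

Exact label matching shows why the second difficulty on the slide matters: it links "Epilepsy" to "epilepsy " but cannot tell whether two sources that share a label also mean the same thing by it.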
    11. 14. Which alternative medicines to Varenicline are used for treating Epilepsy?
    12. 15. SELECT DISTINCT ?diseaseLabel ?altMedicineLabel
    WHERE {
      <> drugbank:possibleDiseaseTarget ?disease .
      ?disease owl:sameAs ?sameDisease .
      ?altMedicine tcm:treatment ?sameDisease .
      ?altMedicine rdf:type tcm:Medicine .
      ?sameDisease rdfs:label ?diseaseLabel .
      ?altMedicine rdfs:label ?altMedicineLabel .
    }
    ------------------------------------------
    | diseaseLabel | altMedicineLabel        |
    ==========================================
    | "Epilepsy"   | "Ginkgo biloba"         |
    | "Epilepsy"   | "Cynanchum otophyllum"  |
    | "Epilepsy"   | "Piper longum"          |
    | "Epilepsy"   | "Datura stramonium"     |
    | "Epilepsy"   | "Uncaria rhynchophylla" |
    | "Epilepsy"   | "Cannabis sativa"       |
    | "Epilepsy"   | "Gastrodia elata"       |
    ------------------------------------------
    Query 6 datasets as if they are one
    Thanks to Olaf Hartig
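To make the join path of the query above easier to follow, here is a small Python sketch that evaluates the same pattern over an in-memory set of triples. It is only an illustration under stated assumptions: the triples are toy data with made-up identifiers (only the "Epilepsy" and "Ginkgo biloba" labels are taken from the result table), and the helper names are hypothetical. A real run sends the SPARQL query to the published datasets instead.

```python
# Toy triples standing in for statements spread across several datasets.
# All identifiers are hypothetical and written as prefixed names for brevity.
DRUG = "ex:drug/varenicline"
TRIPLES = {
    (DRUG, "drugbank:possibleDiseaseTarget", "db:disease/epilepsy"),
    ("db:disease/epilepsy", "owl:sameAs", "tcm:disease/Epilepsy"),
    ("tcm:medicine/Ginkgo_biloba", "tcm:treatment", "tcm:disease/Epilepsy"),
    ("tcm:medicine/Ginkgo_biloba", "rdf:type", "tcm:Medicine"),
    ("tcm:disease/Epilepsy", "rdfs:label", "Epilepsy"),
    ("tcm:medicine/Ginkgo_biloba", "rdfs:label", "Ginkgo biloba"),
    # An unrelated medicine that must not appear in the result.
    ("tcm:medicine/Example_herb", "tcm:treatment", "tcm:disease/Example_condition"),
    ("tcm:medicine/Example_herb", "rdf:type", "tcm:Medicine"),
    ("tcm:medicine/Example_herb", "rdfs:label", "Example herb"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def alternative_medicines(drug):
    """Follow drug -> disease -> sameAs disease <- treatment <- medicine."""
    rows = set()
    for disease in objects(drug, "drugbank:possibleDiseaseTarget"):
        for same_disease in objects(disease, "owl:sameAs"):
            for medicine in subjects("tcm:treatment", same_disease):
                if "tcm:Medicine" not in objects(medicine, "rdf:type"):
                    continue
                for disease_label in objects(same_disease, "rdfs:label"):
                    for medicine_label in objects(medicine, "rdfs:label"):
                        rows.add((disease_label, medicine_label))
    return sorted(rows)  # the set gives DISTINCT; sorting gives a stable order


rows = alternative_medicines(DRUG)
print(rows)
```

The `owl:sameAs` hop is what lets a statement from one dataset reach statements in another, which is the sense in which several datasets are queried as if they were one.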
    15. 18. Thank you!