Linked Data and Cochrane Reviews



A talk I gave at the Cochrane Colloquium in Madrid in October 2011
http://www.cochrane.org/multimedia/multimedia-cochrane-colloquia-and-meetings/colloquium-madrid-2011#5

Speaker notes
  • The technologies that underpin this can be used at the level of the web itself AND within and across individual datasets. We are investigating both. The current web is a web of documents: data sits in documents (mostly HTML/XML, etc.) and is not structured. HTML is good for telling computers how to display information, but not good for embedding meaning and the relationships between bits of information. Linked Data technologies allow for structuring data so that both humans and computers can understand it. For example, Cochrane Review XML is highly structured, but the relationships are not explicit. If they were, we could query across the dataset and link other datasets for complex queries. The web turns into a giant database (the vision, anyway). Search results display can be improved, content can be enriched with external data and our data can be re-packaged, among many other possibilities. It’s all about structured data.
  • Every presentation about the web and technology nowadays has to include a cloud image!!
  • So, what does this mean for Cochrane data?
  • Cochrane Reviews are in XML that is transformed into HTML for display on web pages (Cochrane Library, cochrane.org, etc.). The document is highly structured, but the relationships between its various elements are not explicit. If they were, we could link data across our own dataset, across Reviews.
  • The CRS is an exciting new project that could facilitate better linking of data. (Refer to something Gordon said in his talk, if possible/applicable).
  • Henceforth in this presentation, when you see this image, all that semantic technology “stuff” is at work behind the scenes...
  • Triples in a triple store are understandable by both humans and machines. We think in triples. So, imagine having a whole bunch (millions) of these statements about aspects of Cochrane Reviews in one bin and querying it. You could infer all sorts of new knowledge. Then, imagine chucking in and linking to other datasets such as the CRS or external ones...
  • And, we’re going to need those gears where we’re going! So, this is what the “Star Trek” stream of work has done so far.
  • Explain Star Trek joke and general framework here...
  • This is the initial pass at creating an ontology, a structured representation of the concepts behind Cochrane Reviews. I know you can’t read it all here, but the basic idea is to express all the relationships between the various parts of the Review. In case anyone asks: - What the heck is an “ontology”? A definition: “An ontology is a formal specification of a shared conceptualization” – Tom Gruber - Ontology = Database schema (more or less)
  • So, here you have the concept of a Review.
  • Reviews have included studies.
  • Reviews include comparisons. Comparisons include outcomes that are compared. Etc. This is by no means complete but was meant to assist in a “proof of concept” exercise we called “interrogating the XML” of Cochrane Reviews.
  • For example, in the area of findings, the ontology is not yet fleshed out. Lorne Becker thus began a basic finding ontology to move us forward in that area.
  • Well, we’ll need the gears for sure! From this initial ontology, we were able to use the gears (OWL, RDF, SPARQL) to answer some initial queries. We started by asking: “What have you always wanted to know from Cochrane Reviews?” The aim was to tease out queries that cut across the dataset and answer questions that the current Review structure is not set up to answer. For example... ***Special note: Archie and/or the CRS might already be able to do some of these queries, either via the front-end interface or behind the scenes. The difference with the linked data approach is that this markup enables linking to outside datasets as well as “stitching together” data across Cochrane datasets. In addition, there are advantages and disadvantages to this approach that Paul will cover.***
  • These questions build on each other in increasing complexity...
  • Just so you can visualize the potential application of this: in this PubMed search, each Cochrane box has pointers to the Cochrane Review(s) that included that trial.
  • Just showing this once: the actual gears at work!
  • What sorts of bias are most prevalent in this particular body of research/clinical question of interest?
  • So, in just these first 2 sample questions, one can see how looking across the data and querying the dataset in ways that aren’t currently possible (without a load of manual work, of course) allows us to ask new questions of our data. These are just a few “proof of concept” examples. There are countless other questions we could have asked...
  • How can we achieve this? Linked data sets like LinkedLifeData.com contain multiple datasets, all linked together via...
  • the gears! So, with our data using the gears, and these datasets using these same gears, we can enrich our content, improve search, etc.
  • For example, one dataset in LinkedLifeData is DrugBank. DrugBank contains all the variant names, worldwide, for a given drug. But...
  • Our Reviews contain many inconsistencies in how they refer to drugs, including within data fields. One of the lessons learned from the Star Trek process so far is that our data is not always clean and consistent. Thankfully, there are things we can do to address this without changing Archie/RevMan now. Perhaps mention Semantic BioMedical Tagger??
  • But, we might want to look at improvements to RevMan, Archie and our processes that are further “upstream” in the Review production process, which could improve the quality of the data that comes out.
  • Obviously a hand-cranked example, but you get the idea. This seems like “Super Star Trek”, but we’re in contact with folks at the BBC who are interested in doing a project such as this.
  • Looking to the future again...
  • We could look at answering these kinds of questions which involve external datasets and mashups...
  • GeoNames is a linked data set with geographical information and WWARN, though not yet available in linked data markup, contains info about Malaria drug resistance worldwide.
  • Photoshopped visualization of the answer to this question.
  • Some of these developments require us, as an organisation, to think differently about our content. The one-size-fits-all “container” of the Cochrane Review will need to be flexible so that we can meet new user demands: content that travels freely (any device, any platform, any context) and retains its context and meaning, so that people know they can trust it, and that allows us to create new “products” to meet these demands.
  • Quote from Martin Hepp: The “value proposition” of Cochrane Review data is what we have to say about health care, about the evidence behind certain interventions for certain conditions, about the trials that are conducted, etc. Structured and linked data allows us to spread our message wider, to disseminate more effectively the valuable things we have to say about how health care is administered worldwide.
  • BUT, we shouldn’t sell the farm or throw out what we have and start from scratch at all! What’s great about these technologies is that they can sit alongside, enrich and enhance our content without overturning current processes and infrastructure. So, no worries!
  • How could partners use it? What would it look like on a news site? Or in PubMed? Or anywhere else, for that matter?
  • Obviously, this is a Photoshopped image. We’d need to work out a deal with PubMed! But, it represents the basic idea of thinking of our content in “nimble” terms.
  • Cochrane could take the lead, model the entire knowledge space of Evidence-based Health Care (EbHC) in these semantic standards and create a giant triple store with our data. Others in the EbHC community would then use our ontologies and refer to our data, and thus we would drive “the conversation” around the data.
  • Then, once we start throwing in other datasets, the triple store becomes even more powerful (note: not sure I drew all lines between all datasets, but you get the idea)…
  • For example, Volkswagen have done this for the car industry. They modeled the domain of car options with a car options ontology and are now positioned with “first mover” advantage in the car industry in leveraging semantic technologies.
  • This crazy image is the Linked Data cloud which shows all the various datasets currently in the web of data. The pink area is the life sciences area and includes PubMed, DrugBank and others.

Transcript

  • 1. Linked Data and Cochrane Reviews: A report from the “Star Trek” crew. Chris Mavergames, Web Operations Manager/Information Architect, Cochrane Collaboration Web Team
  • 2. Structure of this talk
      ◦ Intro to linked data and what it means for Cochrane
      ◦ “Star Trek” stream of work so far
      ◦ What’s possible now and in the future
    * Acknowledgements to Lorne Becker and the entire Star Trek crew. Their input was invaluable in the preparation of this talk.
  • 3. Cochrane Reviews are fantastic BUT…
    There are problems that limit their use by some people:
      ◦ Difficult to wade through all of the text
      ◦ Difficult to understand the figures, terminology, and other bits of the Review
      ◦ Hard to compare interventions without reading multiple Reviews
      ◦ Can be difficult to find the Review you seek
  • 4. Searching The Cochrane Library
      ◦ Search for “Prozac” – no reviews
      ◦ Search for “fluoxetine” – 25 reviews
  • 5. Ideally we’d restructure our content for different users
    Beginning to do this now:
      ◦ Summaries.Cochrane.org for consumers
      ◦ Cochrane Clinical for clinicians
    BUT it takes a lot of work to reformulate reviews, and authors, CRGs, etc. are busy.
    Wouldn’t it be nice if we could automate, or partially automate, this?
  • 6. How did Bing read 3 different weather sites & bring me the data I need?
  • 7. Could we do similar magic with our Cochrane reviews? If so, what might we be able to accomplish?
  • 8. Linked data
  • 9. What is linked data? The Semantic Web is made up of Linked Data & the Web of Data, which together comprise Web 3.0
  • 10. Current web = Web of documents
  • 11. Docs are linked, not the data in docs
  • 12. Machines aren’t good at reading web pages
      ◦ Data on the web is meant for human consumption
      ◦ Machines need the data to be structured
      ◦ Once structured, information can be more easily shared within datasets and across web pages
  • 13. Cochrane Reviews and Linked data
  • 14. Cochrane Reviews: XML
    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?> <COCHRANE_REVIEW DESCRIPTION="For publication" DOI="10.1002/14651858.CD008440" GROUP_ID="HIV" ID="589309120202025823" MERGED_FROM="" MODIFIED="2011-05-06 12:29:46 +0100" MODIFIED_BY="Rachel Marshall" REVIEW_NO="" REVMAN_SUB_VERSION="5.1.1" REVMAN_VERSION="5" SPLIT_FROM="" STAGE="R" STATUS="A" TYPE="INTERVENTION" VERSION_NO="2.0">........
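As a minimal sketch of the first step (getting triples out of the Review XML), the Python below reads the root `COCHRANE_REVIEW` element with the standard-library `xml.etree.ElementTree` and emits subject-property-object tuples. It assumes a hypothetical, cut-down document that keeps only a few of the root attributes shown above (the real export contains the whole Review, elided on the slide as "........"), and the `cr:` property names are invented for illustration, not part of any published Cochrane vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down Review: only the root element and a few of its
# attributes; the real RevMan export contains the whole Review body.
REVIEW_XML = """<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<COCHRANE_REVIEW DOI="10.1002/14651858.CD008440" GROUP_ID="HIV"
                 STAGE="R" STATUS="A" TYPE="INTERVENTION" VERSION_NO="2.0">
</COCHRANE_REVIEW>"""

Triple = tuple[str, str, str]


def review_triples(xml_bytes: bytes) -> list[Triple]:
    """Turn selected root attributes into (subject, property, object) triples."""
    root = ET.fromstring(xml_bytes)
    subject = "doi:" + root.attrib["DOI"]  # the DOI serves as the Review's identifier
    triples = [(subject, "rdf:type", "cr:Review")]  # "cr:" is a made-up prefix
    for attr in ("GROUP_ID", "TYPE", "STAGE", "VERSION_NO"):
        if attr in root.attrib:
            triples.append((subject, "cr:" + attr.lower(), root.attrib[attr]))
    return triples


# Encode to bytes so the parser honours the ISO-8859-1 declaration.
for t in review_triples(REVIEW_XML.encode("iso-8859-1")):
    print(t)
```

In a real pipeline these tuples would be serialized as RDF and loaded into a triple store; plain tuples keep the sketch dependency-free.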
  • 15. Fortunately, Cochrane Reviews are structured – but we still need to teach the machines how to read them, where to find data within them and how the data is related.
  • 16. Cochrane Reviews [diagram: a Review as a collection of individual data points]
  • 17. Cochrane Register of Studies
  • 18. Links to the CRS
      ◦ Lack of unique study IDs is a real problem
      ◦ The CRS solves this by providing a unique ID for all studies that can be referenced
      ◦ Better linking of data about trials, and the possibility of linking to external sources such as PubMed (example later)
  • 19. Linked data technologies
      ◦ OWL (Web Ontology Language)
      ◦ RDF (Resource Description Framework)
      ◦ SPARQL (RDF query language)
    Model Cochrane Reviews in OWL, transform them into RDF and add them to a triple store, then query them with SPARQL. OR, simply...
  • 20. Use the gears!
  • 21. Triple store = Way we think!
    Subject -> Property -> Object
    <Gerd Antes> has-role <Director German Ctr>
    <Director German Ctr> works-in <Freiburg, Germany>
    <Gerd Antes> works-in <Freiburg, Germany>
  • 22. Using “the gears”: Standard tools have been developed to facilitate this process. All Reviews in Archie -> A copy of the XML -> A model of the Review -> A machine readable “Triple Data Store”
  • 23. Using “the gears”: A Question -> A Machine Readable “Triple Store” -> A Machine Generated Answer
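The inference on slide 21 (the third triple follows from the first two) can be sketched as one hand-written forward-chaining rule over a set of triples. This is an illustrative assumption, not how the Star Trek work implemented it: in OWL 2 the same conclusion could come from a property chain axiom evaluated by a reasoner, whereas here the rule is hard-coded in Python.

```python
Triple = tuple[str, str, str]


def infer_works_in(triples: set[Triple]) -> set[Triple]:
    """Apply one rule: if X has-role R and R works-in P, then X works-in P."""
    inferred = set(triples)
    for subject, prop, role in triples:
        if prop != "has-role":
            continue
        for s2, p2, place in triples:
            if s2 == role and p2 == "works-in":
                inferred.add((subject, "works-in", place))
    return inferred


facts = {
    ("Gerd Antes", "has-role", "Director German Ctr"),
    ("Director German Ctr", "works-in", "Freiburg, Germany"),
}
for triple in sorted(infer_works_in(facts) - facts):
    print(triple)  # only the newly inferred triple(s)
```

This makes a single pass; a real reasoner keeps applying its rules until no new triples appear.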
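The question -> triple store -> answer loop on slide 23 is what a SPARQL engine does when it evaluates a basic graph pattern. A toy version of that matching step, assuming triples are plain Python tuples and that terms starting with `?` are variables (mirroring SPARQL syntax), might look like the following. It is a sketch for intuition, not a substitute for a real SPARQL engine, and the `review:`/`study:` identifiers and property names are hypothetical.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


def match(patterns: list[Triple], triples: list[Triple],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every variable binding that satisfies all patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must bind to the same value across all patterns.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched: go on to the remaining patterns.
            yield from match(rest, triples, candidate)


store = [
    ("review:A", "includes", "study:1"),
    ("review:A", "includes", "study:2"),
    ("review:B", "includes", "study:2"),
    ("study:2", "riskOfBias", "low"),
]
# Roughly: SELECT ?r ?s ?rob WHERE { ?r includes ?s . ?s riskOfBias ?rob }
query = [("?r", "includes", "?s"), ("?s", "riskOfBias", "?rob")]
for row in match(query, store):
    print(row["?r"], row["?s"], row["?rob"])
```

Sharing the variable `?s` between the two patterns is what performs the join across Reviews and studies.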
  • 24. Star Trek
  • 25. Insert witty Star Trek reference here!
  • 26. Cochrane Review ontology
  • 27. Cochrane Review ontology: Lots of work is still needed from people with a deep understanding of Cochrane content in order to get the data model and ontology right
  • 28. Cochrane Review ontology
  • 29. Cochrane Review ontology
  • 30. Cochrane Review ontology
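The relationships described in the speaker notes for the ontology slides (Reviews include studies and comparisons; comparisons include outcomes) can be written down as domain/range constraints. The sketch below uses hypothetical property names and checks data against them in a closed-world way. That is closer to schema validation (the job of W3C SHACL) than to OWL, where `rdfs:domain` and `rdfs:range` are used to infer types rather than to reject data.

```python
Triple = tuple[str, str, str]

# Hypothetical property names; only the relationships themselves come from the talk.
SCHEMA: dict[str, tuple[str, str]] = {
    "includesStudy": ("Review", "Study"),
    "hasComparison": ("Review", "Comparison"),
    "hasOutcome": ("Comparison", "Outcome"),
}


def violations(triples: list[Triple], types: dict[str, str]) -> list[Triple]:
    """Return triples whose property is unknown or whose subject/object type does not fit."""
    bad = []
    for s, p, o in triples:
        expected = SCHEMA.get(p)
        if expected is None or (types.get(s), types.get(o)) != expected:
            bad.append((s, p, o))
    return bad


types = {"CD1": "Review", "S1": "Study", "C1": "Comparison", "O1": "Outcome"}
data = [
    ("CD1", "includesStudy", "S1"),
    ("CD1", "hasComparison", "C1"),
    ("C1", "hasOutcome", "O1"),
    ("S1", "hasOutcome", "O1"),  # a study is not a comparison, so this is flagged
]
print(violations(data, types))
```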
  • 31. Findings ontology from Lorne
  • 32. What sorts of things could we do with this? A Question -> A Machine Readable “Triple Store” -> A Machine Generated Answer
  • 33. Gears!
  • 34. We can…
      ◦ Ask questions that use data from several different reviews
      ◦ Enhance the experience of our users by including data from the triple stores of others
      ◦ Improve search
      ◦ Make it easier for people to find Cochrane Reviews
  • 35. Enhancing the User Experience: Ask questions that use data from several different reviews
  • 36. A question using multiple reviews: I’ve done a search for trials on a particular intervention for dementia. I want to know which of the trials have been included in a Cochrane Review.
  • 37. Finding the answer the old way
      ◦ Search for the relevant Reviews
      ◦ Read the reference lists to find included trials
      ◦ Compare with my trial search
      ◦ Eliminate the new references that are additional publications from trials already included in a Review
    OR…
  • 38. The “Star Trek” Way: My list of trials -> A “studified” list from the CRS -> The Cochrane Review “Triple Store” -> A machine generated list of trials not yet included in a Cochrane review
  • 39. Links to the relevant Review for those trials that were included
  • 40. Question 1: SPARQL query and partial list of results [INSERT IMAGE FOR QUESTION 1 HERE]
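Question 1 boils down to two steps: “studify” my references (map each publication to a CRS study ID, so that several publications of one trial collapse into a single study), then compare those studies with the studies included in Reviews. A plain-Python sketch follows, assuming a hypothetical citation-to-CRS-ID lookup and made-up identifiers; in the talk this was done with a SPARQL query over the triple store.

```python
def compare_with_reviews(
    my_references: list[str],
    crs_study_id: dict[str, str],
    review_includes: dict[str, set[str]],
) -> tuple[dict[str, set[str]], set[str]]:
    """Return (study -> Reviews that include it, studies not yet in any Review).

    References missing from the (hypothetical) CRS lookup are skipped.
    """
    # "Studify": additional publications of the same trial map to one study ID.
    studies = {crs_study_id[ref] for ref in my_references if ref in crs_study_id}
    included: dict[str, set[str]] = {}
    for review, review_studies in review_includes.items():
        for study in studies & review_studies:
            included.setdefault(study, set()).add(review)
    return included, studies - set(included)


refs = ["pub-1", "pub-2", "pub-3"]
crs = {"pub-1": "CRS-1", "pub-2": "CRS-1", "pub-3": "CRS-2"}  # pub-1 and pub-2: same trial
reviews = {"CD-X": {"CRS-1", "CRS-9"}}
included, not_yet = compare_with_reviews(refs, crs, reviews)
print(included)
print(not_yet)
```

The first result supplies the links to the relevant Review; the second is the list of trials not yet included in any Cochrane review.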
  • 41. Another question using multiple Reviews: What are the risks of bias for the entire set of trials assessing the effectiveness of a particular intervention?
  • 42. Finding the answer the old way
      ◦ Search for the relevant reviews (there may be more than one)
      ◦ Read the tables of included studies to find risk of bias assessments for each trial
      ◦ Combine them*
    * In some cases review authors may have done this for all of the trials in a single review.
  • 43. The “Star Trek” Way: The Cochrane Review “Triple Store” -> A machine generated summary of the Risk of Bias assessments for the relevant trials
  • 44. Question 2 visualized: RoB summary for Cochrane Reviews on dementia. These figures summarize Risks of Bias from the trials included in the reviews in your search.
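The aggregation behind Question 2 can be sketched as counting risk-of-bias judgements per domain across all trials pulled from the relevant Reviews. The sketch assumes each assessment arrives as a hypothetical `(trial_id, domain, judgement)` tuple and, as a simplifying choice, keeps only the first judgement when the same trial and domain appear in more than one Review. How to reconcile Reviews that disagree is a methodological question this does not settle.

```python
from collections import Counter, defaultdict


def summarise_rob(assessments):
    """Count judgements per risk-of-bias domain, counting each trial once per domain."""
    summary = defaultdict(Counter)
    seen = set()
    for trial, domain, judgement in assessments:
        if (trial, domain) in seen:  # same trial already counted via another Review
            continue
        seen.add((trial, domain))
        summary[domain][judgement] += 1
    return summary


data = [
    ("trial-1", "Allocation concealment", "low"),
    ("trial-2", "Allocation concealment", "unclear"),
    ("trial-1", "Allocation concealment", "high"),  # same trial, second Review: skipped
    ("trial-2", "Blinding", "high"),
]
for domain, counts in summarise_rob(data).items():
    print(domain, dict(counts))
```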
  • 45. Cochrane Reviews: XML (same excerpt as slide 14)
  • 46. Enhancing the User Experience: Make search work better
  • 47. You Say “Paracetamol”, I Say “Acetaminophen”
    Or, one could say any of these: Abenol (CA), Acephen, Anadin Paracetamol (UK), Apo-Acetaminophen (CA), Aspirin Free Anacin, Atasol (CA), Calpol (UK), Cetaphen, Children’s Tylenol Soft Chews, Disprol (UK), Exdol (CA), Feverall, Galpamol (UK), Genapap, Genebs, Infants Pain Reliever, Mandanol (UK), Nortemp, Pain Eze, Panadol (UK), Robigesic (CA), Silapap, Tycolene, Tylenol 8 Hour, Tylenol, Tylenol Arthritis, Uni-Ace, Valorin
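One way to get the “Prozac” search from slide 4 to find the fluoxetine Reviews is query expansion: look the term up in a synonym table and search for every known name. The table below is a hypothetical, hand-written stand-in for what a dataset like DrugBank could supply, using a few names from this slide, and the substring matching is deliberately naive.

```python
# Hypothetical synonym table; a linked dataset such as DrugBank would supply this.
SYNONYMS: dict[str, set[str]] = {
    "paracetamol": {"acetaminophen", "calpol", "panadol", "tylenol"},
    "fluoxetine": {"prozac"},
}


def expand_query(term: str) -> set[str]:
    """Return the term plus all known synonyms (lower-cased)."""
    t = term.lower()
    for canonical, names in SYNONYMS.items():
        if t == canonical or t in names:
            return {canonical} | names
    return {t}


def search(term: str, documents: dict[str, str]) -> list[str]:
    """Return IDs of documents whose text mentions any name for the term."""
    names = expand_query(term)
    return [doc_id for doc_id, text in documents.items()
            if any(name in text.lower() for name in names)]


docs = {
    "review-1": "Fluoxetine versus placebo for depression",
    "review-2": "Exercise for depression",
}
print(search("Prozac", docs))
```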
  • 48. LinkedLifeData.com
  • 49. LinkedLifeData.com
  • 50. DrugBank
  • 51. Cochrane Reviews: XML (same excerpt as slide 14)
  • 52. Cochrane Reviews: XML (same excerpt as slide 14)
  • 53. Enhancing the User Experience: Make it easier for people to find Cochrane Reviews
  • 54. Enhancing news content
  • 55. Enhancing news content
      ◦ Cochrane Reviews marked up in semantic markup can be linked to news publishers
      ◦ For example, BBC Health writers could be shown related Cochrane evidence for a particular story they are writing
      ◦ And, they could include a link to primary source material such as a Cochrane Review
      ◦ Thus driving traffic to our Reviews
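If both a news story and Cochrane Reviews were tagged with concepts from a shared vocabulary (an assumption; the tags and Review IDs below are made up), suggesting related evidence to a writer could start as simply as ranking Reviews by tag overlap. This sketch uses Jaccard similarity, one common choice among many.

```python
def related_reviews(story_tags: set[str], review_tags: dict[str, set[str]],
                    top_n: int = 3) -> list[str]:
    """Rank Reviews by Jaccard overlap (shared tags / all tags) with the story."""
    scored = []
    for review_id, tags in review_tags.items():
        union = story_tags | tags
        if not union:
            continue
        score = len(story_tags & tags) / len(union)
        if score > 0:
            scored.append((score, review_id))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, ties by ID
    return [review_id for _, review_id in scored[:top_n]]


reviews = {
    "CD-A": {"dementia", "exercise"},
    "CD-B": {"dementia", "drug therapy"},
    "CD-C": {"asthma"},
}
print(related_reviews({"dementia", "exercise"}, reviews))
```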
  • 56. Super Star Trek
  • 57. Super Star Trek: How applicable is this Review in my part of the world?
  • 58. Geographical relevance: A list of the drugs compared in malaria Reviews and the geographic extent of their effectiveness
  • 59. Map of Artemisinin Resistance
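A sketch of the join that slides 58 and 59 imply, assuming Review comparisons have been reduced to sets of drug names and that resistance reports (for example from WWARN, if it were published as linked data) arrive as `(drug, country)` pairs keyed by shared identifiers such as GeoNames could provide. All names below are placeholders, not real resistance data.

```python
def resistance_by_review(
    review_drugs: dict[str, set[str]],
    resistance_reports: set[tuple[str, str]],
    country: str,
) -> dict[str, list[str]]:
    """For each Review, list compared drugs with reported resistance in `country`."""
    return {
        review: sorted(drug for drug in drugs if (drug, country) in resistance_reports)
        for review, drugs in review_drugs.items()
    }


# Placeholder identifiers only; no real drugs, countries or findings.
review_drugs = {"review-M1": {"drug-A", "drug-B"}, "review-M2": {"drug-C"}}
reports = {("drug-A", "country-X"), ("drug-C", "country-Y")}
print(resistance_by_review(review_drugs, reports, "country-X"))
```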
  • 60. The future
  • 61. Making our content nimble
    Structured and linked data can help make our content “nimble”. Nimble content can:
      ◦ Travel freely
      ◦ Retain context and meaning
      ◦ Create new products
    - R. Lovinger, Razorfish
  • 62. Structured data: “Structured data allows you to preserve your value proposition over a longer distance to a much wider audience.” - Martin Hepp, creator of the GoodRelations ontology
  • 63. Incremental development
    Implementing semantic and linked data technologies should be:
      ◦ Non-invasive
      ◦ Agile
      ◦ Low impact (on staff – hopefully, high impact on users!)
  • 64. Looking to the future: What would Cochrane data “look like” outside of its container, the Review?
  • 65. Risk of Bias in PubMed: For example, someone who is looking at a study in PubMed might be interested in seeing Cochrane’s Risk of Bias assessment of this study, regardless of whether they are interested in the overall Cochrane Review that includes that study.
  • 66. RoB assessment in PubMed
  • 67. Summary
      ◦ Linked Data, or Web 3.0, is here
      ◦ How can we leverage these tools to further our mission?
      ◦ It requires that we think differently about the “container” of the Review
      ◦ Our data needs to become “nimble” to meet future user needs
      ◦ We should proceed slowly, incrementally
      ◦ What are the “quick wins”? Links to CRS? Across-Review queries? Links to external datasets?
  • 68. EbHC Semantic Platform: CRS/CDSR, CENTRAL, HTAs, DARE, CMR
  • 69. EbHC Semantic Platform: CRS/CDSR, CENTRAL, HTAs, DARE, CMR, UMLS, DrugBank, Diseasome, Symptom Ontology, BBC Health Ontology* (* Not yet created)
  • 70. Cochrane and EbHC ontology?
  • 71. Will Cochrane have a bubble here someday?
  • 72. Muchas gracias! (Thank you very much!)