Linked data in pharma R&D


"Linked Data, an opportunity to mitigate complexity pharmaceutical research and development". A poster accepted for the first international workshop on linked web data management in Uppsala, 25 March 2011.

Published in: Technology, Education


  1. Linked Data, an opportunity to mitigate complexity pharmaceutical research and development

     Bo Andersson, AstraZeneca R&D, 221 87 LUND, +46 46 337621
     Kerstin Forsberg, AstraZeneca R&D, 431 83 MÖLNDAL, +46 31 7065318, kerstin.l.forsberg@astrazeneca.com

     Copyright is held by the author/owner(s). LWDM 2011, March 25, 2011, Uppsala, Sweden. Copyright 2011 ACM 978-1-4503-0608-9/11/03 ...$10.00.

     ABSTRACT
     In the paper we present opportunities and challenges in applying linked data principles to improve the research utility of data to mitigate complexity in pharmaceutical research and development. Association and interpretation of the data needed for research and development in a pharmaceutical company has become too complex a task for individual scientists, or even teams, to handle. Scientists need data to be organized for associations, prepared for not yet defined use and ready for automation where computers can work alongside the scientists.

     Keywords
     linked data, semantic web, pharmaceutical research

     1. INTRODUCTION
     During the WWW2007 conference a breakthrough of the Linked Data idea happened in a session where web experts demonstrated the power of a new generation of the web, a web of data. For us attending the session it was hard to imagine the full potential of what this idea would mean for individual scientists and for a pharmaceutical company.

     In the Large Knowledge Collider (LarKC) project, scientists working in early clinical development have identified [1] how successful research and development relies on scientists having access to data and the ability to utilize it, to handle uncertainty and to prepare for the unexpected. To meet these challenges we have identified three key enablers: linked data principles, semantic web standards and open ontologies.

     One activity in the W3C interest group for semantic web in Health Care and Life Science (HCLS) has been to link publicly available datasets [2] such as Drug Bank and [...]. These datasets are now part of the growing Linked Open Data cloud. An important learning from the collaborative work across health care organizations, pharmaceutical companies and academic groups is that we all need the same datasets, although for different decisions and different types of applications.

     2. LINKED DATA FOR PHARMA
     A federated knowledge repository of data, built with Semantic Web standards, is a new generation of knowledge base beyond separated document repositories. Gaining value from the federation of knowledge requires data to be available using a standard model of so-called RDF triples.

     Table 1. Semantic Web standards
     • Resource Description Framework (RDF) to model statements as so-called RDF triples of subject/predicate/object, e.g. clinical study/active ingredient/omeprazole, often encoded using the RDF/XML format.
     • Uniform Resource Identifiers (URIs) to globally identify entities, such as proteins, organ systems and clinical studies, e.g. http//[...], and also data content such as clinical observations.
     • Web Ontology Language (OWL) to express vocabularies of terms representing identified entities and relationships in portions of reality, e.g. the terms 'active ingredient' and 'pharmaceutical product' for two of the entities identified in the Translational Medicine Ontology (TMO).
     • Simple Knowledge Organization System (SKOS) to express taxonomies and controlled vocabularies, e.g. the terms 'tauopathies' and 'dementia' as broader terms than 'alzheimer disease' in the Medical Subject Headings (MeSH).

     Furthermore, access to data in a standard model is not enough; relationships among data must also be made available as RDF triples to create a web of data. This collection of associated datasets can also be referred to as Linked Data.

     A typical case of a large linked dataset is DBpedia, which makes the structured content of Wikipedia available as RDF triples. The beauty of this is not only that Wikipedia data becomes available as linked data, but also that it connects to other publicly available datasets such as Drug Bank. For example, the entity identified with the URI [...] is linked to the descriptions of the same entity in Drug Bank, also made available as linked data.

     This means that facts available in the two different datasets can be combined. For example, the name Losec, and the 70+ other brand names for Omeprazole, from Drug Bank can be integrated by different applications and can also be directly queried, e.g. when searching for brand names for drugs categorized in Wikipedia as essential medicines by WHO.

     3. OPPORTUNITIES FOR PHARMA
     Scientists routinely utilize publicly available documents as a complement to data in internal and licensed sources. Most scientists have preferred sources that they utilize to gain knowledge from and to associate with the data relevant for their decisions and questions. The knowledge mining is often done for individual tasks, and in many cases done manually by scientists together with informatics and library experts.

     Imagine an environment where internal data easily connects with external datasets from different sources. The associations made,
  2. do make use of relationships developed by experts, in terms of so-called ontologies, and thereby help scientists find answers not obvious today. While broadening the scientific spectrum where individual scientists find their answers, this would also stimulate more cross-scientific sharing and establish a foundation for improved computational support in research and development.

     We therefore propose the prospective use of linked data principles as a component of the information infrastructure for pharmaceutical research and development, to make sure internal data connects to external data. This is a step towards an information infrastructure where data is prepared for association, for not yet defined use, and for facilitating computers to function alongside scientists to mitigate the complexity.

     4. CHALLENGES FOR PHARMA
     From internal semantic web projects, and by participation in LarKC (i) and in W3C HCLS (ii) since 2006, we have built experience in linking data. Although integration of disparate datasets today is easily done by experts, there is still more to do. We need to reach a maturity level where datasets are routinely published in a way that automatically associates them with other datasets.

     The Linking Open Drug Data (LODD) team in W3C HCLS revealed various challenges during the work to connect publicly available datasets such as DrugBank and [...]. Many of them relate to technology, but a key aspect is people's willingness to constructively face the tension of opposing needs to improve the research utility of shared datasets.

     Relationships are a significant challenge for shared datasets in pharmaceutical research and development. We often have a strong prevalence of terminology conflicts, synonyms, and homonyms. For instance, the word 'drug' can refer to the whole pharmaceutical product, or just the active ingredient.

     Biomedical ontologies represent what entities exist in portions of the biological and clinical reality, and how they relate to each other. One such ontology, for translational medicine (TMO) [3], is developed by a W3C HCLS team with representatives from pharmaceutical companies, health care providers, the National Centre for Biomedical Ontology (NCBO) and academic groups. TMO structures the different meanings of 'drug' by identifying different types of entities and making relationships between them explicit:
     • 'Molecular entity' for single molecule kinds.
     • 'Active ingredient' as a role played by a chemical substance which has the disposition to treat a certain disease and is part of a 'pharmaceutical formulation' for biologically active chemicals.
     • A 'Pharmaceutical product' is a formulated pharmaceutical that has been approved.

     The difficulty in using precise and consistently defined relationships in ontologies, such as the TMO, for improved research utility should be balanced with utilization of more mundane and relaxed relationships, for example from DBpedia, a source humans are comfortable using.

     The research utility of shared datasets will also require explicit provenance (iii) information to make it possible for scientists and computers to make trust judgments about the datasets.

     Linked datasets also need to be described with a vocabulary that allows for automation of discovery, selection and association. The Vocabulary of Interlinked Datasets (VoID) (iv) standard allows formally describing shared datasets and thereby improving the research utility.

     A pragmatic approach for linked data management is needed that motivates people to constructively face the tension between different needs. This must be an iterative process where we continuously improve the research utility of shared datasets by applying three key enablers:
     • Linked Data principles, including establishing global namespaces of URIs
     • Semantic web standards, also for provenance data and dataset descriptions
     • Open ontologies, with the required level of precision and consistency

     It must also be a process where computers are used to work alongside scientists to automatically:
     • Identify entities and assign URIs
     • Structure data and capture provenance data
     • Mine for causal relationships and inconsistency

     REFERENCES
     [1] B. Andersson, V. Momtchev, Requirements summary and data repository, Large Knowledge Collider (LarKC), 2008. [...]11_requirements-summary-and-data-repository_m6.pdf
     [2] A. Jentzsch, Enabling Tailored Therapeutics with Linked Data. In Proceedings of the WWW2009 workshop on Linked Data on the Web (LDOW2009), 2009. [...].pdf
     [3] M. Dumontier, The Translational Medicine Ontology: Driving personalized medicine by bridging the gap from bedside to bench. In Proceedings of the 13th Annual Bio-Ontologies Meeting, Boston, USA (Bio-Ontologies 2010), 2010.

     (i)