Low Hanging Fruit Breakout Discussion #2


Four projects (compound risk dossier, text mining, screening data management, and support for cloud collaboration) were outlined during a breakout discussion led by Paul Bradley and Barry Hardy at the Pistoia Alliance Information Ecosystem Workshop in October 2011.

Published in: Technology, Education
  1. Compound Risk Dossier

     Objectives
     Improved toxicological prediction demands the best integrated view of current and historic data, both proprietary and public domain. The objective of the compound risk dossier (CRD) would be to create a service that is able to gather and integrate risk/safety-related information for a compound (including consideration of similar structures, key moieties, metabolites, toxicology MoA, etc.). The harvested information would then be integrated and presented to the user in the form of a "safety profile".

     Business Case
     It is envisaged that the CRD could bring the following business benefits:
       • The system would enable an efficient "background check" for NCEs based on structural or biological similarity, or possibly shared pharmacology, toxicology MoAs or adverse event effects, i.e. what is known about molecules similar to my candidate?
       • Creation of a safety profile, in which safety categories are normalised and can be grouped according to public ontologies, provides a powerful method of aligning data and enables intelligent analysis.
       • Pharma companies duplicate effort in aligning internal, vendor and public data. This work is currently costly, time-consuming, tedious and error-prone; a CRD service could reduce the time spent on activities common across organisations to almost zero.

     Open Standards
       • Open vocabularies and ontologies, e.g. PubChem, ChemIDplus, WHO INN, OBO, OpenTox, ChEBI, …
       • Safety data sources: AERS, drug labels, regulatory documents, etc.
       • Open source methods (QSAR, CDK, Weka, R, OpenTox, …)
       • Open APIs (e.g. extend and test OpenTox API 1.2, http://www.opentox.org/dev/apis/api-1.2, for data integration into a common RDF resource)

     Implementation
     It is suggested that a limited set of public domain data sources is selected in the first instance, to allow a proof of concept within 12 months.
       • Identify vocabulary and ontology sources for compounds, pathologies, etc. (see Toxicology Ontology Roadmap, Hardy, B. et al., from the OpenTox-EBI Industry Forum workshop, in press).
       • Identify data sources from which to harvest risk-related information. Opt for a handful of structured sources rather than free text (NDAs, etc.) in the first instance?
       • Compound safety data sources, both public and private, are mined for risk-related content, which is harmonised and organised using public domain ontologies (and held as an RDF triple store?).
       • Text mining and other semantic technologies will be necessary at this stage.
       • This data store can be called on by APIs or provide information that can be consumed by analysis tools, ELNs, etc.
       • Decide on quality metrics: on-the-fly profiles vs. curated, pre-canned data; accuracy vs. recall.
       • Other things to consider include provenance, governance, security, legal, etc.

     Pistoia Alliance Role
       • Definition of the use case
       • Guidance on the best safety-related data sources
       • Guidance on which open standards to use, and the extensions they need
       • Provide partners willing to integrate public, vendor and proprietary data
       • Funding of early phase POCs
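The harmonisation step described above (mining risk-related content from several sources and grouping it under normalised safety categories) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `TERM_TO_CATEGORY` table stands in for a public ontology, the compound identifiers and records are invented, and a plain list of tuples stands in for the RDF triple store the slide suggests.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific terms to a normalised safety category.
# A real service would take this from a public ontology, not a hand-written dict.
TERM_TO_CATEGORY = {
    "hepatotoxicity": "liver",
    "liver injury": "liver",
    "qt prolongation": "cardiac",
    "arrhythmia": "cardiac",
}


def harmonise(records):
    """Map (compound, source, raw_term) records to (compound, category, source) facts."""
    facts = []
    for compound, source, raw_term in records:
        category = TERM_TO_CATEGORY.get(raw_term.strip().lower())
        if category is None:
            # Unmapped terms are dropped here; a real system would queue them for curation.
            continue
        facts.append((compound, category, source))
    return facts


def safety_profile(facts, compound):
    """Group the sources reporting each safety category for one compound."""
    profile = defaultdict(set)
    for subject, category, source in facts:
        if subject == compound:
            profile[category].add(source)
    return {category: sorted(sources) for category, sources in sorted(profile.items())}


# Invented example records: (compound id, data source, term as written in the source).
records = [
    ("cmpd-1", "AERS", "Hepatotoxicity"),
    ("cmpd-1", "drug label", "liver injury"),
    ("cmpd-1", "AERS", "QT prolongation"),
    ("cmpd-2", "AERS", "arrhythmia"),
    ("cmpd-1", "AERS", "headache"),  # not in the mapping, so it is skipped
]
profile = safety_profile(harmonise(records), "cmpd-1")
```

Keeping the source alongside each category is one simple way to retain provenance, which the slide lists as a point to consider.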
  2. Text Mining/Metadata Mark-up of Unstructured Text

     Objectives
     Unstructured text sources, both public and proprietary, are rich in information, but several features limit their use in analysis, such as:
       • No mark-up of key concepts: important terms such as drug and target names are buried within free text with no simple mechanism to surface this information.
       • Linguistic diversity: widespread use of synonyms and ad hoc identifiers makes it difficult to carry out semantic searching of free text sources.
     The objective is to carry out text mining and concept tagging of unstructured text to provide a metadata layer over documents. By linking the metadata to public ontologies, a semantically consistent set of tags will be produced, allowing document sources to be queried and clustered according to recognised standards. This resource could then be made available using a cloud model to deliver value and standard search capabilities to pharma and academia alike, with appropriate consumption models.

     Business Case
     The mark-up and mapping of key terms from unstructured text would bring the following benefits:
       • Enhanced search and document retrieval over free text sources
       • Linking of in-house structured data sources to unstructured information, in-house and in the public domain
       • Repurposing of unstructured text to produce actionable intelligence, for example by creating assertional metadata
       • A drive towards a common standard for searching, or at least a common "honest broker" for search across different resources

     Open Standards
     It is suggested that, in order to achieve a working implementation within a 12-month time frame, a limited set of open standards is applied in the first instance. This could be discussed more widely within the Pistoia Alliance, but the following areas are worthy of consideration:
       • Limiting by domain, e.g. protein targets, drug terms, gene names, pathology
       • Limiting to a single standard that covers multiple domains, e.g. SNOMED-CT, ICD-9-CM

     Implementation
       • Select a public domain free text source, e.g. Medline
       • Identify public ontologies and vocabulary sources
       • Use text mining/concept recognition tools to identify key concepts and map them to standards: Autonomy, Metawise (BioWisdom), Helium (Ceiba), etc.
       • Platform for search/display: Lucene, other open source

     Pistoia Alliance Role
       • Collaborate to define the use case
       • Agree on document sources
       • Agree on the open standards to use and the extensions needed
       • Advise on best practice for document mark-up, search, analysis, governance, security, etc.
       • Funding of early phase POCs to aid the development of the tools and a drive towards standards
       • Support for a free/reduced-cost academic access mechanism to encourage common methods of tagging and naming in the academic environment
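As an illustration of the concept tagging described above, the sketch below builds a metadata layer by matching synonyms in free text and mapping each match to one identifier per concept. It is a dictionary lookup only, not one of the tools named on the slide, and the `SYNONYMS` table and its `DRUG:`/`TARGET:` identifiers are invented placeholders for entries that would come from a public ontology.

```python
import re

# Hypothetical synonym table: surface form -> (concept type, placeholder identifier).
# Synonyms of the same concept share one identifier, which is what makes search consistent.
SYNONYMS = {
    "aspirin": ("drug", "DRUG:0001"),
    "acetylsalicylic acid": ("drug", "DRUG:0001"),
    "cox-1": ("target", "TARGET:0001"),
    "cyclooxygenase 1": ("target", "TARGET:0001"),
}

# Longest synonyms first, so multi-word names win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def tag_concepts(text):
    """Return stand-off annotations (character offsets plus concept id) for a text."""
    tags = []
    for match in _PATTERN.finditer(text):
        concept_type, concept_id = SYNONYMS[match.group(1).lower()]
        tags.append(
            {
                "start": match.start(),
                "end": match.end(),
                "text": match.group(1),
                "type": concept_type,
                "id": concept_id,
            }
        )
    return tags


text = "Acetylsalicylic acid (aspirin) inhibits COX-1."
tags = tag_concepts(text)
```

Because the annotations are stored as offsets rather than written into the text, the source document stays unchanged and the same tags can feed a search index such as Lucene.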
  3. Improved Collaboration: Management of Screening Data

     Objectives
       • To integrate screening data from multiple sources
       • To create a standard for the expression of screening data, to allow easier integration

     Business Case
       • Definition of a standard for reporting compound screening data allows easier integration, with cost and time savings
       • Facilitates easier sharing of data and collaboration

     Open Standards
       • MIABE, MIAME
       • ISA-TAB
       • Define a standard for dose response for HTS and HCS, including vocabulary and units; support multiple plate formats and standardised statistical analysis
       • Define how to deal with incomplete data sets, null values, etc.

     Implementation
       • Create the standard, learning from existing standards such as MIAME
       • Apply the standard in a working project
       • Iterate and refine

     Pistoia Alliance Role
       • Guidance on definition of the standard
       • Survey what has already been done in the area
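To make the points about vocabulary, units, plate formats and null values concrete, here is a minimal sketch of what one record in such a standard might look like. The field names, the allowed units and the list of plate formats are assumptions chosen for illustration; they are not taken from MIABE, MIAME or ISA-TAB.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed controlled vocabularies for this sketch.
UNIT_TO_MOLAR = {"nM": 1e-9, "uM": 1e-6, "mM": 1e-3}
ASSAY_TYPES = {"HTS", "HCS"}
PLATE_FORMATS = {96, 384, 1536}  # wells per plate


@dataclass(frozen=True)
class DoseResponsePoint:
    compound_id: str
    assay_type: str
    plate_format: int
    concentration: float
    concentration_unit: str
    response_percent: Optional[float]  # None marks a missing measurement explicitly

    def __post_init__(self):
        # Reject values outside the controlled vocabularies at creation time.
        if self.assay_type not in ASSAY_TYPES:
            raise ValueError(f"unknown assay type: {self.assay_type!r}")
        if self.plate_format not in PLATE_FORMATS:
            raise ValueError(f"unsupported plate format: {self.plate_format!r}")
        if self.concentration_unit not in UNIT_TO_MOLAR:
            raise ValueError(f"unknown unit: {self.concentration_unit!r}")

    def concentration_molar(self):
        """Concentration converted to mol/L so data from different sources compare."""
        return self.concentration * UNIT_TO_MOLAR[self.concentration_unit]


def mean_response(points):
    """Mean response over points with a measurement; None if every value is missing."""
    values = [p.response_percent for p in points if p.response_percent is not None]
    return sum(values) / len(values) if values else None


points = [
    DoseResponsePoint("cmpd-1", "HTS", 384, 10.0, "uM", 40.0),
    DoseResponsePoint("cmpd-1", "HTS", 384, 10.0, "uM", 60.0),
    DoseResponsePoint("cmpd-1", "HTS", 384, 10.0, "uM", None),
]
average = mean_response(points)
```

Representing a missing value as `None`, rather than as zero or an empty string, keeps incomplete data sets from silently skewing summary statistics.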
  4. Enabling Better Collaboration in the Cloud, Applied to Monitoring of NGS Data

     Objectives
       • To provide scientific, business and legal processes outlining best practices for organisations collaborating in the cloud
       • To apply these best practices in a system for monitoring the progress of NGS projects

     Business Case
       • Time and cost savings in deciding whether a collaborative project should be carried out in the cloud
       • Streamlined implementation of cloud-based collaborations through clear guidelines
       • Reduced delays in handovers
       • Greater visibility of distributed project statuses across different organisations
       • Early visibility and alerting of important events, allowing timely interventions

     Open Standards
       • Clear APIs and communication standards
       • Define web services and service discovery mechanisms
       • UDDI (Universal Description, Discovery and Integration)
       • MIAME?

     Implementation
       • Outline best practice rules for working in the cloud
       • What is the use case, e.g. an alternative to an internally hosted system, a method of distributing large queries, etc.?
       • What are the requirements for flexibility, such as how long the service is required for and whether capacity requirements will change over time? What is the tie-in period?
       • Need clear APIs and communication standards
       • Location: does data need to be held within certain boundaries, e.g. within the EU?
       • What level of encryption is required?
       • Create a standard format for NGS data, consumable by analysis software, e.g. Spotfire

     Pistoia Alliance Role
       • Signposting best practice in the cloud
       • Advise on standard representation of NGS data
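The monitoring goal above (visibility of project status across organisations, with early alerting) could be sketched as follows. This is only an assumption about how such a system might represent progress: the `ProjectStep` fields, the step names and the organisations are hypothetical, and a real system would read this information through the agreed APIs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProjectStep:
    project: str
    organisation: str  # who is responsible for this step
    step: str
    due: date
    completed: Optional[date] = None  # None while the step is still open


def overdue_steps(steps, today):
    """Return the open steps whose due date has passed, as candidates for an alert."""
    return [s for s in steps if s.completed is None and s.due < today]


# Invented example: one NGS project with steps handed over between two organisations.
steps = [
    ProjectStep("ngs-001", "Org A", "sample preparation", date(2011, 10, 3), date(2011, 10, 2)),
    ProjectStep("ngs-001", "Org B", "sequencing run", date(2011, 10, 10)),
    ProjectStep("ngs-001", "Org A", "variant calling", date(2011, 10, 24)),
]
alerts = overdue_steps(steps, today=date(2011, 10, 14))
```

Recording the responsible organisation on each step is what lets an alert be routed to the party holding up a handover.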