2012-10-08 Practical Semantics In The Pharmaceutical Industry - The Open PHACTS Project

Keynote presentation given by Lee Harland at EKAW 2012

http://rd.springer.com/chapter/10.1007/978-3-642-33876-2_1

Published in: Health & Medicine

  1. http://openphacts.org, pmu@openphacts.org, @Open_PHACTS
  2. Source: Nature Reviews Drug Discovery 11, 191-200 (March 2012) | doi:10.1038/nrd3681. Jack W. Scannell, Alex Blanckley, Helen Boldon & Brian Warrington
  3. harmful useless harmful. Source: Nature Reviews Drug Discovery 3, 711-716 (August 2004) | doi:10.1038/nrd1470. Ismail Kola & John Landis
  4. Derek Lowe: http://www.medicalprogresstoday.com/spotlight/spotlight_indarchive.php?id=1039
  5. http://www.ebi.ac.uk/Information/Brochures/pdf/EMBL-EBI%20Annual%20Report%202011.pdf
  6. 297,650. http://www.forbes.com/sites/matthewherper/2011/04/13/a-decade-in-drug-industry-layoffs/
  7. Information Tombs… ¤ Built to primary use-case ¤ Tailored indexes ¤ Tailored GUIs ¤ Unique language & metadata ¤ Poor interoperability/integration. In vivo, Portfolio, Literature, HR, Synthesis, SAR, Docs, Safety, Etc
  8. The Outside World
  9. Precompetitive Informatics. Public Domain Drug Discovery Data: Pharma are accessing, processing, storing & re-processing. Literature, Genbank, Patents, PubChem, Databases, Downloads; Firewalled Databases; Data Integration; Data Analysis. Repeat @ each company. Source: Lowering industry firewalls: pre-competitive informatics in drug discovery, Nature Reviews Drug Discovery (2009) 8, 701-708, doi:10.1038/nrd2944
  10. The Innovative Medicines Initiative: • EC funded public-private partnership for pharmaceutical research • Focus on key problems – Efficacy, Safety, Education & Training, Knowledge Management. The Open PHACTS Project: • Create a semantic integration hub (“Open Pharmacological Space”)… • Runs 2011-2014 • Deliver services to support on-going drug discovery programs in pharma and public domain • Leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements • 23 academic partners, 8 pharmaceutical companies, 3 software SMEs • Work split into clusters: • Technical Build • Scientific Drive • Community & Sustainability
  11. Pathways; Interactions; Proteins; Genes; Pharmacological Activities; Transcripts; Clinical Applications; Drug; Biological Processes; Diseases; Pathological Processes; Indications; Drugs; Chemicals; Compounds
  12. Optimised To Business Questions (columns: Number | sum | Nr of 1 | Question)
      15 | 12 | 9 | All oxidoreductase inhibitors active <100nM in both human and mouse
      18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
      24 | 13 | 8 | Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
      32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.
      37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
      38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
      41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature.
      44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data
      46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
      59 | 14 | 8 | Identify all known protein-protein interaction inhibitors
  13. Goals: Platform, GUI, Apps, API, Standards
  14. A Precompetitive Knowledge Management Framework. Diagram labels: Community / Governance; KD Innovation; Pharma Needs; Sustainability; Data Mining; Stability; Services/Algorithms; Security; Mapping & Integration; Interfaces; Architecture; Populating & Services; Vocabularies; Content; Inputs & Identifiers (URIs); Structured & Unstructured
  15. Open PHACTS Explorer, 1st Gen Apps, Partner Apps (Oct. 2012). Linked Data API (RDF/XML, TTL, JSON). Identity Resolution Service (ConceptWiki): “Adenosine receptor 2a”. Domain Specific Services. Identifier Management Service (BridgeDb+): P12374, EC2.43.4, CS4532. Chemistry Normalisation & Q/C (ChemSpider). Data Cache (Virtuoso Triple Store). Data Import: Public Ontologies, User Annotations, Public Content, Commercial
  16. GB:29384 P12047 X31045
  17. Issues ¤ Provenance ¤ Conflicting Authorities ¤ Management ¤ Transitivity
  18. What's “equal” anyway? Gleevec® = Imatinib Mesylate; Imatinib Mesylate; YLMAHDNUQAMNNX-UHFFFAOYSA-N
  19. Search “Gleevec”: Imatinib Mesylate. ChemSpider, Drugbank, PubChem
  20. Consequences…..
  21. Ignore Salts? NCX-911, Viagra®
  22. The 18th International Conference on Knowledge Engineering and Knowledge Management is concerned with all aspects of eliciting, acquiring, modeling and managing knowledge, and its role in the construction of knowledge-intensive systems and services for the semantic web, knowledge management, e-business, natural language processing, intelligent information integration, etc. The focus of the 18th edition of EKAW will be on "Knowledge Engineering and Knowledge Management that matters".
  23. Dynamic Equality: Strict (Analysing) vs Relaxed (Browsing) § Tuneable (same data, different questions) § Domain specific § User driven § Traceable
  24. LinkSet#1 { chemspider:gleevec hasParent imatinib ... drugbank:gleevec exactMatch imatinib ... }
  25. Profile P1 “Broad”, Profile P2 “Parents”, Profile P2 “Strict”. linkSet1 { chemspider:aspirin exactMatch chembl:aspirin …. } linkSet2 { imatinib_mesylate hasParent imatinib …. } linkSet3 { (+)Staurosporine enantiomer (-)Staurosporine …. } linkSet4 { vanillaEssence hasPart Vanillin …. }
  26. The Identifier Mapping Service. For each line of SPARQL: parse, recognise, expand, transform. Q, P1 go into the Query Expander Service, which returns Q’; the Identity Mapping Service (BridgeDB) supplies Mappings and Profiles. Example: context GRAPH <http://rdf.chemspider.com> { … }; parse cw:979b545d-f9a9 cheminf:logd ?logd; recognise cw:979b545d-f9a9; expand [cs:2157, chembl:1280, db:db00945]; transform ?iri cheminf:logd ?logd . FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945 || …) … }
  27. Shouldn’t an integration system be able to tell you exactly what it’s integrating? Based on ve2 editor http://lab.linkeddata.deri.ie/ve2/
  28. ## your dataset description: myDS rdf:type void:Dataset ; foaf:homepage <http://example.org/> ; dcterms:title "Example Dataset"^^xsd:string ; dcterms:description """A simple dataset in RDF."""^^xsd:string ; pav:license <http://creativecommons.org/licenses/by-sa/3.0/> ; void:uriSpace "http://example.org/"^^xsd:string ; pav:retrievedFrom <http://exampledownload.com> ; pav:retrievedOn "2012-09-19"^^xsd:date ; pav:retrievedBy <http://some_web_id> ; pav:version "15.5"^^xsd:string ;
  29. Provenance Everywhere: <inDataset href="http://rdf.chemspider.com/void.rdf#chemSpiderDataset" />
  30. Nanopublications
  31. Credit For Curation
  32. Quality Assertions: ChemSpider Validation & Standardization Platform http://bit.ly/NZF5VB
  33. QUDT (http://www.qudt.org/)
      STANDARD_TYPE     UNIT_COUNT
      ----------------  ----------
      AC50                       7
      Activity                 421
      EC50                      39
      IC50                      46
      ID50                      42
      Ki                        23
      Log IC50                   4
      Log Ki                     7
      Potency                   11
      log IC50                   0
      >5000 types

      STANDARD_TYPE  STANDARD_UNITS    COUNT(*)
      -------------  ----------------  --------
      IC50           nM                  829448
      IC50           ug.mL-1              41000
      IC50           (blank)              38521
      IC50           ug/ml                 2038
      IC50           ug ml-1                509
      IC50           mg kg-1                295
      IC50           molar ratio            178
      IC50           ug                     117
      IC50           %                      113
      IC50           uM well-1               52
      IC50           p.p.m.                  51
      IC50           ppm                     36
      IC50           uM-1                    25
      IC50           nM kg-1                 25
      IC50           milliequivalent         22
      IC50           kJ m-2                  20
      ~ 100 units
  34. Licencing
  35. Linked Closed Data
  36. Kick-Starting Sustainability: Apps, API • Chem-Bio Navigator • Target Dossier • Polypharmacology Browser • Utopia Documents • Disease Maps • … more
  37. Conclusions ¤ Project designed for the new drug discovery environment ¤ Timing with RDF/SW is good ¤ Companies eager to see whether it can really make a difference ¤ Challenge: Got to be better than state of the art (in 3 years!) ¤ Funding challenges are formidable
  38. Acknowledgements ¤ Many members of the consortium who have contributed to data, use cases, funding, support, documentation, management ¤ EBI: John Overington, Anna Gaulton, Mark Davies ¤ Lundbeck: Sune Askjær ¤ Maastricht: Chris Evelo, Andra Waagmeester, Egon Willighagen ¤ Manchester: Carole Goble, Alasdair Gray, Christian Brenninkmeijer, Steve Pettifer, Ian Dunlop, Rishi Ramgolam, James Eales ¤ NBIC: Barend Mons, Kees Burger ¤ RSC: Antony Williams, Valery Tkachenko ¤ SIB: Christine Chichester ¤ VU: Frank van Harmelen, Paul Groth, Antonis Loizou ¤ OpenLink: Orri Erling, Yrjana Rankka, Hugh Williams ¤ Chem2Bio2RDF: David Wild, Bin Chen
  39. More Info: pmu@openphacts.org, http://openphacts.org, @Open_PHACTS, lee@connecteddiscovery.com, @Scibitely
  40. backup
  41. Find me the off-target activities of known cancer drugs whose primary target is a cell cycle regulatory kinase. ChEMBL, DrugBank, Gene Ontology, Wikipathways, ChEBI, Uniprot, UMLS, ConceptWiki, ChemSpider: Connected Using Semantic Technology
  42. Are these Interleukin 1A? Human Interleukin 1A Protein: http://bio2rdf.org/uniprot:P01583; Human Interleukin 1A Protein: http://identifiers.org/uniprot/P01583; Human Interleukin 1A Gene: Entrez Gene: 3552, Ensembl: ENSG00000115008; Affymetrix probes hIL1A: 1076_at, 210118_s_at, 208200_at, 208200_at; Mouse Interleukin 1A: Uniprot:P01582; IL1A PDB Structures: 1ITA (3D), 2ILA (3D), 2KKI (3D), 2L5X (3D) ….etc
  43. “There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar? No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.” Sir Tim Berners-Lee
  44. Are These Vanilla?
  45. Multiple Namespaces. Uniprot database ID: P26838. http://identifiers.org/uniprot/P26838, http://bio2rdf.org/uniprot:P26838, http://uniprot.bio2rdf.org/uniprot:P26838, http://chem2bio2rdf.org/uniprot/resource/P26838, http://purl.uniprot.org/uniprot/P26838 ……
  46. What’s this?
  47. /Viagra http://www.drugbank.ca/drugs/DB00203
  48. Data sets
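
Slides 23 to 26 describe "dynamic equality": the same query is expanded with a different set of equivalent identifiers depending on the chosen profile. The following Python is a minimal sketch of that idea, not the Open PHACTS implementation. The identifiers `cw:979b545d-f9a9`, `cs:2157`, `chembl:1280` and `db:db00945` come from slide 26; the `cs:imatinib_mesylate` and `cs:imatinib` names, the contents of `LINKSETS`, the predicate sets in `PROFILES`, and both function names are assumptions made for illustration. Lookups follow a single hop only, which sidesteps the transitivity issue raised on slide 17 rather than solving it.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (source IRI, link predicate, target IRI)

# Hypothetical link sets, loosely modelled on slides 24 and 25.
LINKSETS: Dict[str, List[Link]] = {
    "linkSet1": [
        ("cw:979b545d-f9a9", "exactMatch", "cs:2157"),
        ("cw:979b545d-f9a9", "exactMatch", "chembl:1280"),
        ("cw:979b545d-f9a9", "exactMatch", "db:db00945"),
    ],
    "linkSet2": [
        ("cs:imatinib_mesylate", "hasParent", "cs:imatinib"),
    ],
}

# Assumed profiles: which link predicates count as "equal" for a given question.
PROFILES: Dict[str, Set[str]] = {
    "strict": {"exactMatch"},
    "parents": {"exactMatch", "hasParent"},
}


def equivalents(iri: str, profile: str) -> List[str]:
    """Return the IRI plus every IRI linked to it (one hop) under the profile."""
    allowed = PROFILES[profile]
    found = [iri]
    for links in LINKSETS.values():
        for source, predicate, target in links:
            if predicate not in allowed:
                continue
            # Links are followed in both directions.
            if source == iri:
                other = target
            elif target == iri:
                other = source
            else:
                continue
            if other not in found:
                found.append(other)
    return found


def expand_pattern(iri: str, predicate: str, variable: str, profile: str) -> str:
    """Rewrite one triple pattern into the FILTER form shown on slide 26."""
    alternatives = " || ".join(f"?iri = {e}" for e in equivalents(iri, profile))
    return f"?iri {predicate} {variable} . FILTER ({alternatives})"


if __name__ == "__main__":
    print(expand_pattern("cw:979b545d-f9a9", "cheminf:logd", "?logd", "strict"))
```

Switching the profile argument from `"strict"` to `"parents"` is the only change needed to make a salt form and its parent compound count as equal, which is the "tuneable, same data, different questions" property listed on slide 23.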

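
Slide 33 shows the same IC50 unit recorded under several spellings (`ug.mL-1`, `ug/ml`, `ug ml-1`; `p.p.m.`, `ppm`). The sketch below illustrates, under stated assumptions, how such spellings could be collapsed before counting. A hand-written lookup table stands in for a units vocabulary such as QUDT; the table `_SPELLINGS`, the canonical labels, and the function names are hypothetical, while the counts are copied from the slide.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed mapping from lower-cased, whitespace-normalised spellings to one label.
_SPELLINGS: Dict[str, str] = {
    "nm": "nM",
    "ug.ml-1": "ug/mL",
    "ug/ml": "ug/mL",
    "ug ml-1": "ug/mL",
    "p.p.m.": "ppm",
    "ppm": "ppm",
}


def canonical_unit(raw: str) -> str:
    """Map a raw unit string to its canonical label; unknown units pass through."""
    key = " ".join(raw.lower().split())
    return _SPELLINGS.get(key, raw.strip())


def merge_counts(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum record counts per canonical unit."""
    merged: Counter = Counter()
    for unit, count in rows:
        merged[canonical_unit(unit)] += count
    return dict(merged)


# A subset of the IC50 rows from slide 33: (STANDARD_UNITS, COUNT(*)).
SLIDE_ROWS = [
    ("nM", 829448),
    ("ug.mL-1", 41000),
    ("ug/ml", 2038),
    ("ug ml-1", 509),
    ("mg kg-1", 295),
    ("p.p.m.", 51),
    ("ppm", 36),
]

if __name__ == "__main__":
    for unit, total in merge_counts(SLIDE_ROWS).items():
        print(f"{unit}\t{total}")
```

This only merges spellings of the same unit. Converting between genuinely different units (for example ug/mL to nM) needs molecular weights and is outside the scope of this sketch.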
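
Slide 45 lists five URIs from different namespaces that all denote UniProt entry P26838. As a rough illustration of why a mapping layer is needed, the sketch below reduces such URIs to one accession and one preferred URI. The regular expression is a simplified six-character accession pattern and the choice of `identifiers.org` as the canonical namespace is an assumption; neither is taken from the slides.

```python
import re
from typing import Optional

# Simplified accession pattern (assumption): a letter, a digit, three
# alphanumerics and a digit, at the end of the URI after "/" or ":".
_ACCESSION = re.compile(r"[/:]([A-Z][0-9][A-Z0-9]{3}[0-9])$")


def uniprot_accession(uri: str) -> Optional[str]:
    """Extract the accession from a URI that mentions uniprot, else None."""
    cleaned = uri.strip()
    if "uniprot" not in cleaned.lower():
        return None
    match = _ACCESSION.search(cleaned)
    return match.group(1) if match else None


def canonical_uri(uri: str) -> Optional[str]:
    """Rewrite any recognised UniProt URI into the assumed preferred namespace."""
    accession = uniprot_accession(uri)
    if accession is None:
        return None
    return f"http://identifiers.org/uniprot/{accession}"


# The URIs shown on slide 45.
SLIDE_URIS = [
    "http://identifiers.org/uniprot/P26838",
    "http://bio2rdf.org/uniprot:P26838",
    "http://uniprot.bio2rdf.org/uniprot:P26838",
    "http://chem2bio2rdf.org/uniprot/resource/P26838",
    "http://purl.uniprot.org/uniprot/P26838",
]

if __name__ == "__main__":
    for uri in SLIDE_URIS:
        print(uri, "->", canonical_uri(uri))
```

String rewriting of this kind only works when the namespaces embed the same accession; cross-database equivalences such as those on slide 42 still need explicit link sets.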