Chemistry Connect

AstraZeneca’s cheminformatics platform for large-scale integration of structure and bioactivity data

  • 1. Chemistry Connect: AstraZeneca’s cheminformatics platform for large-scale integration of structure and bioactivity data
      ICIC 2012, 14-17 October, Berlin
      Sorel Muresan, AstraZeneca R&D Mölndal, Chemistry Innovation Centre, Discovery Sciences
  • 2. Chemistry Connect – a team effort Discovery Sciences | CIC
  • 3. Driver – explosion in SAR data
      • Chemical information landscape changing fast
      • Make every SAR point count, access all available chemistry
      • Internal & external data sources
      [Figure labels: 2006, 2008]
      Reference: Southan, C.; Varkonyi, P.; Muresan, S., J. Cheminfo. 2009
  • 4. SAR key entities and relationships
      Unstructured Data from Documents → Expert Extraction or Text Mining → Structured Entries in Relational Databases
      Reference: Southan, C.; Boppana, K.; Jagarlapudi, S.; Muresan, S. J. Cheminfo. 2011
  • 5. Manually extracted SAR data (commercial)
      • GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a comprehensive database that captures explicit relationships between the three entities of publications, compounds and targets.
  • 6. SAR data (public)
      • PubChem: the NCBI public informatics backbone for the NIH Molecular Libraries Initiative, focused on small molecules as systems biology probes and potential therapeutic agents.
      • ChEMBL: includes drugs, small molecules from the medicinal chemistry or biochemical literature and their targets.
  • 7. Extracting chemical entities from text
      Collaboration with IBM Research Almaden to apply text analytics technology to analyze intellectual property and scientific literature
      - 10 million full text patents
      - 11 million structures
      - 17% out of 58M parent structures in Chemistry Connect
  • 8. Chemical Named Entity Recognition (NER)
      7-CHLORO-1,3-DIHYDRO-1-METHYL-5-PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE → Name-to-Structure software → CN1c2ccc(cc2C(=NCC1=O)c3ccccc3)Cl
  • 9. Extracting chemical entities from text
      The biggest cause of missing compounds when extracting chemical entities from text is typographical errors: human errors, OCR failures, hyphenation and multiple-line issues, etc.
      • Automated spelling correction with CaffeineFix from NextMove Software
      • CaffeineFix significantly improves extraction rates (22% increase from D=0 to D=1)
      • Name-to-structure (n2s) tools are complementary (40% of the structures come from single n2s contributions)
      Reference: Sayle, R.; Xie, P.; Muresan, S., JCIM 2012
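
The D=0 and D=1 figures on slide 9 presumably refer to the maximum edit distance allowed between a token in the text and a dictionary entry; that reading is an assumption. The slides do not show how CaffeineFix works, and the brute-force Levenshtein lookup below is only a simplified stand-in to illustrate the idea. The function names and the three-entry dictionary are hypothetical.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct(token: str, dictionary: Iterable[str], max_d: int = 1) -> Optional[str]:
    """Return the closest dictionary term within max_d edits, or None.

    Dictionary terms are assumed to be lowercase already.
    """
    token = token.lower()
    best, best_d = None, max_d + 1
    for term in sorted(dictionary):  # sorted for deterministic tie-breaking
        d = edit_distance(token, term)
        if d < best_d:
            best, best_d = term, d
    return best


# Hypothetical mini-dictionary; a real one would hold millions of terms.
NAMES = {"acetaminophen", "valdecoxib", "diazepam"}
```

With `max_d=0` only exact matches are accepted, so a misspelling such as "acetominophen" is lost; with `max_d=1` it resolves to "acetaminophen". A linear scan like this would not scale to a large dictionary, which is one reason production tools use other data structures.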
  • 10. Chemistry Connect [Diagram labels: Compound, Document, Test & Result]
  • 11. Chemistry Connect [Diagram labels: Compound, Document, Test & Result, Target]
  • 12. Chemistry Connect
  • 13. Chemistry Connect
  • 14. Chemistry Connect
  • 15. Exact match source comparisons
      [Figure labels: sources that include known drugs; predominantly patent-derived compounds]
  • 16. Finding a common language
      Acetaminophen: >1000 synonyms..
      [Word cloud of names and identifiers, duplicates removed:] Acetaminophen; [3H]Acetaminophen; Acetaminophen (4-hydroxyacetanilide); Acetaminophen glucuronide(55%); acetaminophen sulfate; Acetaminophen sulfate(30%); acetaminophen sulphate; Acetaminophen Uniserts; acetaminophene; Acetaminofen; Acetamidophenol; ACETOMINOPHEN; ACETAMIDE, N-(4-HYDROXYPHENYL)-; ACETAMIDE, N-(P-HYDROXYPHENYL)-; ACETANILIDE, 4-HYDROXY-; 3-(glutathion-S-yl)acetaminophen; 3-hydroxyacetaminophen; 4-(Acetylamino)phenol; 4-ACETAMIDOPHENOL; 4-Acetaminophenol; 4-ACETYLAMINOPHENOL; 4-Hydroxyacetanilide; 4-HYDROXYACETANILIDE; 4-HYDROXYANILID KYSELINY OCTOVE; 4-hydroxyphenolacetamide; 4-13-00-01091; 103-90-2; 10066-90-7; 16110-10-4; 37519-14-5; 64889-81-2; 77097-85-9; 222 AF; 222-AF; 644/4046; 644/7502; 659/9501; 840-416-00; 872-667-00; 878-022-04; 878-022-09; 878-022-14; 878-022-19; 882-720-04; 882-720-07; 882-720-10; 882-720-13; 882-720-16; 882-720-20; 1047-607-00; 1169-894-12; A F ANACIN; A PER; A.F. ANACIN; AAP; aa-sulfate; AA-sulphate; Abenol; Abensanil; ABROL; ABROLET; AC112578; AC112579; Acamol; Accu-Tap; Acenol; Acenol (pharmaceutical); Acephen; Acertol; Aceta; Aceta Elixir; Aceta Tablets; Acetaco; Acetagesic; Acetalgin; Acetamol; Acetavance; Acetofen; Actamin; Actamin Extra; Actamin Super; Actifed Plus; Actimol; Actimol Chewable Tablets; Actimol Childrens Suspension; Actimol Infants Suspension; Actimol Junior Strength Caplets; Actron; Afebrin; Afebryl; Aferadol; AG10223; AG12029; AG124687; AG12800; AG12948; Amadil; Aminofen; Aminofen Max; Anacin; Anacin-3; Anacin-3 Extra Strength; Anadin dla dzieci; Anaflon; Analter; Anapap; Andox; Anelix; Anexsia; Anexsia 10/660; Anexsia 5/325; Anexsia 7.5/325; Anexsia 7.5/650; Anhiba; Anoquan; Anti-Algos; Antidol; Apacet; Apacet Capsules
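
Slide 16 shows how many different strings can point at one compound. A minimal sketch of the cross-mapping idea, assuming a simple normalize-then-look-up dictionary, follows. The slides do not describe how the Chemical Dictionary is actually built, so the identifier "CPD-0001" and all function names are made up for illustration; the five terms are taken from the slide.

```python
import re
from typing import Dict, Iterable, Optional


def normalize(term: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", term.strip()).casefold()


def build_index(synonyms: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Invert {compound_id: [terms]} into {normalized term: compound_id}."""
    index: Dict[str, str] = {}
    for compound_id, terms in synonyms.items():
        for term in terms:
            index[normalize(term)] = compound_id
    return index


def lookup(term: str, index: Dict[str, str]) -> Optional[str]:
    """Return the compound ID for a term, or None if it is unknown."""
    return index.get(normalize(term))


# "CPD-0001" is a hypothetical identifier, not a real AZ number.
SYNONYMS = {
    "CPD-0001": ["Acetaminophen", "4-ACETAMIDOPHENOL", "4-Hydroxyacetanilide",
                 "103-90-2", "Acamol"],
}
INDEX = build_index(SYNONYMS)
```

Calling `lookup("4-hydroxyacetanilide", INDEX)` and `lookup("  ACETAMINOPHEN ", INDEX)` both resolve to the same ID. Metabolite names on the slide, such as acetaminophen sulfate, denote different structures and would need their own entries.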
  • 17. Word of the Day: Crowdsourcing
  • 18. Chemistry Connect
  • 19. Technical Overview - ETL
      Stages: Data Sources → Extraction → Transformation → Loading
      [Diagram labels: Text Files; Oracle DB; Web Service; Python Scripts; Oracle (ext tables); Structure Normalization; Property calc; (chemistry); PL/SQL; Pipeline Pilot; (biological results)]
  • 20. Technical Overview - Application
      [Diagram labels: HTML; Java; Oracle 11g; WebLogic Server; Direct 7; REST (and SOAP) services; .Net; Pipeline Pilot; Knime; Excel]
  • 21. Chemistry Connect Apps: Canvas, SARConnect, Plato, Key compounds
  • 22. Canvas is…
      • …a Rosetta stone for compounds: It automatically translates AZ numbers, SNs, chemical names, structures, SMILES, development IDs, reagent IDs, trade names, legacy Astra & Zeneca IDs… [Image labels: 196 B.C., 2012 A.D.]
      • …really easy to use: Copy a compound name or structure to the clipboard and let Canvas do the rest of the work
      • …a portal to information: It acts as a springboard to let you access Chemistry Connect, ISAC, IBEX, IBIS data, Compound View, ELN data, AZ Patent Db, Integrity...
      • …a compound design tool: It quickly calculates C-lab properties, chemical names, molecular weights, checks novelty…
      • …and in 2011, 1750 AZ scientists did: Safety assessment, Biologists, Med chem, Synthetic chem, Patent attorneys, Crystallographers, DMPK, Comp chem & many others…
      Credit: Jon Winter, Oncology iMed
  • 23. Utopia Documents
  • 24. Key compound prediction from patents
      From WO1996025405, the earliest patent which claims it, can you work out the structure of Bextra (Valdecoxib), the Pfizer NSAID?
      74 exemplified compounds
      Reference: Tyrchan, C. et al JCIM 2012
  • 25. WO1996025405 - Bextra
      Source     #compounds   Bextra exists   Bextra ranked
      GVKBIO     74           Y               1 (broad core), 1 (narrow core)
      SureChem   501          Y               1 (broad core), 1 (narrow core)
  • 26. EP268956 - Aciphex
      Source     #compounds   Aciphex exists   Aciphex ranked
      GVKBIO     27           Y                2 (core1), 1 (core2)
      SureChem   168          Y                1 (core1), 1 (core2)
  • 27. PLATO for Safety/Tox – General Concept
      [Diagram labels: Input Molecule; Job; Results Summary; Predictive Secondary Pharmacology; Expert Systems; QSAR Models; BioSim; Pharma Connect; Additional Services]
      All services in Plato are complementary to find an overall answer to your problem!
      Credits: Scott Boyer, Catrin Hasselgren, Lars Carlsson, Tobias Noeske
  • 28. Predictive Secondary Pharmacology strategy
      Input Molecule → Similarity or substructure search → Chemistry Connect → Compound - Target associations
      Chemistry Connect provides:
      • Compound information: name & structure
      • Target information: bioactivity data
      Questions addressed:
      • Potential off-target effects?
      • Part of a safety risk assessment
      Similarity concept: Similar compounds bind to similar targets.
      References: M Johnson et al., Prog Clin Biol Res (1989), 291:167; P Willett, Drug Discov Today (2006), 11:1046
      Credits: Scott Boyer, Catrin Hasselgren, Lars Carlsson, Tobias Noeske
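
The similarity concept on slide 28 can be illustrated with a small nearest-neighbour sketch: score a query compound against reference compounds and carry over the targets of those that are similar enough. The slides do not say which fingerprints, metric or cut-off Plato uses, so the Tanimoto coefficient, the 0.7 threshold, the toy bit sets and the target names below are all assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Fingerprint = FrozenSet[int]  # set of "on" bit positions (hypothetical encoding)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(
    query: Fingerprint,
    reference: Iterable[Tuple[Fingerprint, Set[str]]],
    threshold: float = 0.7,  # arbitrary cut-off, not taken from the slides
) -> List[Tuple[str, float]]:
    """Collect targets of reference compounds at least `threshold` similar.

    Each target is reported with the best similarity that supports it,
    sorted from most to least similar.
    """
    hits: Dict[str, float] = {}
    for fp, targets in reference:
        sim = tanimoto(query, fp)
        if sim >= threshold:
            for target in targets:
                hits[target] = max(hits.get(target, 0.0), sim)
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy reference set: (fingerprint, known targets). All values are made up.
REFERENCE = [
    (frozenset({1, 2, 3, 4}), {"TARGET_A"}),
    (frozenset({1, 2, 3, 9}), {"TARGET_B"}),
    (frozenset({7, 8}), {"TARGET_C"}),
]
QUERY = frozenset({1, 2, 3, 4, 5})
```

Here `predict_targets(QUERY, REFERENCE)` keeps only TARGET_A (similarity 4/5), because the second reference compound scores 3/6 and the third shares no bits with the query. Lowering the threshold trades precision for more candidate off-targets.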
  • 29. SARConnect – navigate SAR landscape
      [Diagram labels: Compound hierarchy; Test & Results; Target hierarchy]
      Reference: Eriksson, M. et al Molecular Informatics 2012
  • 30. SARConnect – structure classification
      • Level 1: Compound structure
      • Level 2: Molecular Framework
      • Level 3: Topological Framework
      • Level 4: Terminal Rings & Bonds
  • 31. SARConnect – target classification
      • Level 1 (Broad target class): Enzyme, NHR, GPCR, Ion Channel, Other
      • Level 2 (Swiss-Prot family class): GPCR, Signal, Transmembrane, Transmembrane
      • Level 3 (Sub-families): Class A, Class B, Class C, Frizzled
      • Family Symbol: CALCR, CALCLR, CRHR2, GCCR, GIPR, GLP1R, PTH2R, VIPR1, VIPR2
  • 32. SARConnect – navigate SAR landscape
  • 33. Take-home messages
      • Chemistry Connect is enabling AstraZeneca to intensify its exploitation of synergies between internal and external SAR estate and to shorten the time between hypothesis generation during DMTA cycles
      • Our Chemical Dictionary of 120 million chemical terms has become a crucial cross-mapping resource between chemistry and the scientific literature
      • We cannot wave a magic wand over data quality, provenance issues, drug name space, and the inherent challenges of chemistry representation, but Chemistry Connect gives us a unique overview and amelioration options for each source
  • 34. A Democracy of Ideas (Acknowledgements)
      • Plamen Petrov
      • Chris Southan
      • Paul Xie
      • Peter Varkonyi
      • Thierry Kogej
      • Christian Tyrchan
      • Magnus Kjellberg
      • Håkan Nilsson
      • Mats Eriksson
      • Jonas Ekengren
      • Ithipol Suriyawongkul
      • Niklas Blomberg
      • Jon Winter
      • John Cumming
      • Scott Boyer
      • Catrin Hasselgren
      • Lars Carlsson
      • Tobias Noeske
      • and many others…
  • 35. Thank you!