SlideShare a Scribd company logo
1 of 35
Download to read offline
Chemistry Connect
AstraZeneca’s cheminformatics platform for
large-scale integration of structure and
bioactivity data

ICIC 2012 14-17 October Berlin


Sorel Muresan
AstraZeneca R&D Mölndal
Chemistry Innovation Centre, Discovery Sciences
Chemistry Connect – a team effort




                                                 Discovery Sciences | CIC
http://dx.doi.org/10.1016/j.drudis.2011.10.005
Driver – explosion in SAR data

       • Chemical information landscape changing fast

       • Make every SAR point count, access all available chemistry

       • Internal & external datasources


                      2006                                  2008




                                                            Discovery Sciences | CIC
Southan, C.; Varkonyi, P.; Muresan, S., J. Cheminfo. 2009
SAR key entities and relationships




     Unstructured Data                                           Structured Entries in
      from Documents                                             Relational Databases
                                   Expert Extraction
                                          or
                                     Text Mining

                                                                      Discovery Sciences | CIC
Southan, C.; Boppana, K.; Jagarlapudi, S.; Muresan, S. J. Cheminfo. 2011
Manually extracted SAR data (commercial)

• GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a
  comprehensive database that captures explicit relationships between the three
  entities of publications, compounds and targets.




                                                             Discovery Sciences | CIC
SAR data (public)


• PubChem
  • the NCBI public informatics backbone for the NIH Molecular
    Libraries Initiative focused on small molecules as systems
    biology probes and potential therapeutic agents.


• ChEMBL
  • includes drugs, small molecules from the medicinal chemistry
    or biochemical literature and their targets.




                                                Discovery Sciences | CIC
Extracting chemical entities from text


Collaboration with IBM Research Almaden to apply
 text analytics technology to analyze intellectual
 property and scientific literature

 - 10 million full text patents
 - 11 million structures

 - 17% out of 58M parent structures in Chemistry Connect




                                            Discovery Sciences | CIC
Chemical Named Entity Recognition (NER)

                    7-CHLORO-1,3-DIHYDRO-1-METHYL-5-
                    PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE


                                        Name-to-Structure
                                        software



                     CN1c2ccc(cc2C(=NCC1=O)c3ccccc3)Cl




                                         Discovery Sciences | CIC
Extracting chemical entities from text


  The biggest cause of missing compounds when extracting
   chemical entities from text is the presence of typographical
   errors: human errors, OCR failures, hyphenation and
   multiple line issues, etc.

  • Automated spelling correction with CaffeineFix from
   NextMove Software

       • CaffeineFix significantly improves extraction rates (22%
         increase from D=0 to D=1)

       • name2structure software are complementary (40% of the
         structures come from single n2s contributions)


                                                      Discovery Sciences | CIC
Sayle, R.; Xie, P.; Muresan, S., JCIM 2012
Chemistry Connect



                      Compound

                      Document

                     Test & Result



          Chemistry Connect

                                 Discovery Sciences | CIC
Chemistry Connect



                      Compound

                      Document

                     Test & Result

                        Target

          Chemistry Connect

                                 Discovery Sciences | CIC
Chemistry Connect




                    Discovery Sciences | CIC
Chemistry Connect




                    Discovery Sciences | CIC
Chemistry Connect




                    Discovery Sciences | CIC
Exact match source comparisons




    sources that include   predominantly patent-
    known drugs            derived compounds


                                    Discovery Sciences | CIC
Finding a common language
                                                                                         Acetaminophen
                   [3H]Acetaminophen                  882-720-13                         Acetaminophen (4-hydroxyacetanilide)
                   10066-90-7                         882-720-16                         Acetaminophen glucuronide(55%)
                                                                                         acetaminophen sulfate
                   103-90-2                           882-720-20                         Acetaminophen sulfate(30%)
                   1047-607-00                        A F ANACIN                         acetaminophen sulphate
                                                                                         Acetaminophen Uniserts
                   1169-894-12                        A PER                              acetaminophene
                                                      A.F. ANACIN                        Acetamol
                   16110-10-4                                                            ACETANILIDE, 4'-HYDROXY-
                                                      AAP                                Acetavance
                   222 AF                             aa-sulfate                         Acetofen
                   222-AF                             AA-sulphate
                                                                                         ACETOMINOPHEN
                                                                                         Actamin
                   3-(glutathion-S-yl)acetaminophen   Abenol                             Actamin Extra
                                                                                         Actamin Super
                   37519-14-5                         Abensanil                          Actifed Plus
                   3-hydroxyacetaminophen             ABROL                              Actimol
                                                                                         Actimol Chewable Tablets
                   4-(Acetylamino)phenol              ABROLET                            Actimol Children's Suspension
                   4-13-00-01091                      AC112578                           Actimol Infants' Suspension
                                                                                         Actimol Junior Strength Caplets
                   4-ACETAMIDOPHENOL                  AC112579                           Actron
                                                      Acamol                             Afebrin
                   4-Acetaminophenol                                                     Afebryl
                                                      Accu-Tap                           Aferadol
                   4-ACETYLAMINOPHENOL                Acenol                             AG10223
                   4'-Hydroxyacetanilide              Acenol (pharmaceutical)
                                                                                         AG12029
                                                                                         AG124687
                   4-HYDROXYACETANILIDE               Acephen                            AG12800
                                                                                         AG12948
                   4-HYDROXYANILID KYSELINY OCTOVE    Acertol                          Amadil
                   4-hydroxyphenolacetamide           Aceta                            Aminofen
                   644/4046                           Aceta Elixir                     Aminofen Max
                                                                                       Anacin
                   644/7502                           Aceta Tablets                    Anacin-3
                   64889-81-2                         Acetaco                          Anacin-3 Extra Strength
                                                      Acetagesic                       Anadin dla dzieci
                   659/9501                                                            Anaflon
                                                      Acetalgin
                   77097-85-9
Acetaminophen:
                                                                                       Analter
                                                      ACETAMIDE, N-(4-                 Anapap
                   840-416-00                         HYDROXYPHENYL)-                  Andox

>1000 synonyms..   872-667-00
                   878-022-04
                                                      ACETAMIDE, N-(P-
                                                      HYDROXYPHENYL)-
                                                                                       Anelix
                                                                                       Anexsia
                                                                                       Anexsia 10/660
                   878-022-09                         Acetamidophenol                  Anexsia 5/325
                   878-022-14                         Acetaminofen                     Anexsia 7.5/325
                                                      Acetaminophen                    Anexsia 7.5/650
                   878-022-19                                                          Anhiba
                   882-720-04                         Acetaminophen (4-                Anoquan
                                                      hydroxyacetanilide)              Anti-Algos
                   882-720-07                                                          Antidol
                                                      Acetaminophen
                   882-720-10                         glucuronide(55%)
                                                                                       Apacet
                                                                                Discovery Sciences | CIC
                                                                                       Apacet Capsules
                                                      acetaminophen sulfate
Word of the Day : Crowdsourcing




                                  Discovery Sciences | CIC
Chemistry Connect




                    Discovery Sciences | CIC
Technical Overview - ETL

   Data Sources   Extraction           Transformation              Loading




    Text Files      Python               Structure
                    Scripts            Normalization
                  (chemistry)          Property calc

                                                                Oracle PL/SQL
   Oracle DB                                                     (ext tables)

                            Pipeline Pilot
                         (biological results)
      Web
     Service

                                                       Discovery Sciences | CIC
Technical Overview - Application


                                                     HTML



                                                      Java

    Oracle 11g       WebLogic Server
     Direct 7    REST (and SOAP) services
                                                      .Net


                                                 PipelinePilot
                                                    Knime
                                                     Excel




                                            Discovery Sciences | CIC
Chemistry Connect Apps


                        Canvas




         SARConnect
                      Chemistry    Plato
                       Connect




                          Key
                       compounds


                                   Discovery Sciences | CIC
Canvas
                      is…


 …a Rosetta stone for
 compounds
   It automatically translates AZ numbers,            196 B.C.                  2012 A.D.
   SNs, chemical names, structures, SMILEs,
   development IDs, reagent IDs, trade
   names, legacy Astra & Zeneca IDs…                             …really easy to use
                                                                   Copy a compound name or
                                                                   structure to the clipboard and let
 …a portal to information                                          Canvas do the rest of the work
   It acts as a springboard to let you access
   Chemistry Connect, ISAC, IBEX, IBIS data,
   Compound View, ELN data, AZ Patent Db, IBEX,
   Integrity...                                                  …and in 2011, 1750 AZ scientists did
                                                                 Safety assessment
                                                                                                          & many others…
                                                                                                          Biologists
                                                                 Med chem                                 Synthetic chem
 …a compound design tool                                                                                  Patent attorneys
   It quickly calculates C-lab properties, chemical                                                       Crystallographers
   names, molecular weights, checks novelty…
                                                                        DMPK
                                                                      Comp chem            Discovery Sciences | CIC
Jon Winter, Oncology iMed
Utopia Documents




                                 Discovery Sciences | CIC
http://getutopia.com/index.php
Key compound prediction from patents
   From WO1996025405 the earliest patent which claims it, can you
   work out the structure of Bextra (Valdecoxib), the Pfizer NSAID?




                                              74 exemplified cmpds


                                                               Discovery Sciences | CIC
Tyrchan, C. et al JCIM 2012
WO1996025405 - Bextra




 Source     #compounds   Bextra   Bextra
                         exists   ranked
 GVKBIO     74           Y        1 (broad core)
                                  1 (narrow core)
 SureChem   501          Y        1 (broad core)
                                  1 (narrow core)


                                  Discovery Sciences | CIC
EP268956 - Aciphex




  Source     #compounds   Aciphex   Aciphex
                          exists    ranked
  GVKBIO     27           Y         2 (core1)
                                    1 (core2)
  SureChem   168          Y         1 (core1)
                                    1 (core2)


                                    Discovery Sciences | CIC
PLATO for Safety/Tox – General Concept
                         Predictive Secondary
                         Pharmacology



                         Expert
                         Systems




                         QSAR
                         Models
           Job

 Input                            Results
Molecule                 BioSim
                                                           Summary



                         Pharma
                         Connect




                         Additional
                         Services



Scott Boyer
Catrin Hasselgren
Lars Carlsson       All services in Plato are complementary to
Tobias Noeske        find an overall answer to your problem!
                                            Discovery Sciences | CIC
Predictive Secondary Pharmacology strategy


                                                        Chemistry Connect
                Similarity or
                                                        Compound information: name & structure
             substructure search
                                                        Target information: bioactivity data
  Input
 Molecule




                                    Compound - Target associations
                                    • Potential off-target effects?
                                    • Part of a safety risk assessment




Scott Boyer                         Similarity concept: Similar compounds
Catrin Hasselgren                            bind to similar targets.
Lars Carlsson                      M Johnson et al., Prog Clin Biol Res (1989), 291:167
                                                                     Discovery Sciences | CIC
Tobias Noeske                      P Willet, Drug Discov Today (2006), 11:1046
SARConnect – navigate SAR landscape
     Compound hierarchy




                                          Test &
                                          Results




                          Target hierarchy
                                                    Discovery Sciences | CIC
Eriksson, M. et al Molecular Informatics 2012
SARConnect – structure classification




Compound structure   Molecular Framework   Topological Framework     Terminal Rings & Bonds

     Level 1               Level2                 Level 3                      Level 4




                                                              Discovery Sciences | CIC
SARConnect – target classification

        Level 1 (Broad target class)
 Enzyme NHR GPCR Ion Channel Other


       Level 2 (Swiss-Prot family class)
GPCR    Signal Transmembrane Transmembrane


            Level 3 (Sub-families)
  Class A Class B Class C Frizzeled Family


                   Symbol
CALCR      CALCLR CRHR2 GCCR GIPR GLP1R
             PTH2R VIPR1 VIPR2


                                             Discovery Sciences | CIC
SARConnect – navigate SAR landscape




                                Discovery Sciences | CIC
Take-home messages

• Chemistry Connect is enabling AstraZeneca to intensify its
  exploitation of synergies between internal and external SAR estate
  and to shorten the time between hypothesis generation during DMTA
  cycles


• Our Chemical Dictionary of 120 million chemical terms has become a
  crucial cross-mapping resource between chemistry and the scientific
  literature


• We cannot wave a magic wand over data quality, provenance issues,
  drug name space, and the inherent challenges of chemistry
  representation but Chemistry Connect gives us a unique overview and
  amelioration options for each source

                                                     Discovery Sciences | CIC
A Democracy of Ideas (Acknowledgements)


• Plamen Petrov           • Niklas Blomberg
• Chris Southan           • Jon Winter
• Paul Xie                • John Cumming
• Peter Varkonyi          • Scott Boyer
• Thierry Kogej           • Catrin Hasselgren
• Christian Tyrchan       • Lars Carlsson
• Magnus Kjellberg        • Tobias Noeske
• Håkan Nilsson           • and many others…
• Mats Eriksson
• Jonas Ekengren
• Ithipol Suriyawongkul




                                                Discovery Sciences | CIC
Thank you!




             Discovery Sciences | CIC

More Related Content

Viewers also liked

DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...
DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...
DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...Josh Curtis
 
Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...
Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...
Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...LinkedIn Talent Solutions
 
Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...
Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...
Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...Cristyuli Yamille Celis Gómez
 
Business Administration Studies at Brooklyn College
Business Administration Studies at Brooklyn College Business Administration Studies at Brooklyn College
Business Administration Studies at Brooklyn College Craig Raucher New York
 
Alfresco tech talk live share extensibility metadata and actions for 4.1
Alfresco tech talk live share extensibility metadata and actions for 4.1Alfresco tech talk live share extensibility metadata and actions for 4.1
Alfresco tech talk live share extensibility metadata and actions for 4.1Alfresco Software
 
Loan Point- A Reliable Hub of Money Lending
Loan Point- A Reliable Hub of Money LendingLoan Point- A Reliable Hub of Money Lending
Loan Point- A Reliable Hub of Money LendingJessica Rodz
 
Mapping Human-Centric Product Vision (ProductCamp Boston 2016)
Mapping Human-Centric Product Vision (ProductCamp Boston 2016)Mapping Human-Centric Product Vision (ProductCamp Boston 2016)
Mapping Human-Centric Product Vision (ProductCamp Boston 2016)ProductCamp Boston
 
CDC International Logistics - Company and Service Introduction
CDC International Logistics - Company and Service IntroductionCDC International Logistics - Company and Service Introduction
CDC International Logistics - Company and Service IntroductionVegard Synnes
 
University of Hertfordshire - Resume of slides from Library Teach Meet June 2014
University of Hertfordshire - Resume of slides from Library Teach Meet June 2014University of Hertfordshire - Resume of slides from Library Teach Meet June 2014
University of Hertfordshire - Resume of slides from Library Teach Meet June 2014LisaKFlint
 
Stacy's secret sauce: how to recruit like a boss | Talent Connect Anaheim
Stacy's secret sauce: how to recruit like a boss  | Talent Connect AnaheimStacy's secret sauce: how to recruit like a boss  | Talent Connect Anaheim
Stacy's secret sauce: how to recruit like a boss | Talent Connect AnaheimLinkedIn Talent Solutions
 
Technik.hotelarstwa 341[04] z3.04_u
Technik.hotelarstwa 341[04] z3.04_uTechnik.hotelarstwa 341[04] z3.04_u
Technik.hotelarstwa 341[04] z3.04_uPusiu99
 
Empowering Customer Success through Product Feedback
Empowering Customer Success through Product FeedbackEmpowering Customer Success through Product Feedback
Empowering Customer Success through Product FeedbackGainsight
 
Anatomy of an Intranet (Triangle SharePoint User Group) October 2016
Anatomy of an Intranet (Triangle SharePoint User Group) October 2016Anatomy of an Intranet (Triangle SharePoint User Group) October 2016
Anatomy of an Intranet (Triangle SharePoint User Group) October 2016Michael Greene
 

Viewers also liked (17)

DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...
DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...
DASH - Josh Curtis - Dominating Influencer Marketing for Mobile Games - Chart...
 
Mkt content Mercado
Mkt content MercadoMkt content Mercado
Mkt content Mercado
 
Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...
Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...
Power of LinkedIn Lookup: finding hidden talent in your organization | Talent...
 
Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...
Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...
Analisis arquitectonico y estudio de medio geografico del sitio arqueologico ...
 
Business Administration Studies at Brooklyn College
Business Administration Studies at Brooklyn College Business Administration Studies at Brooklyn College
Business Administration Studies at Brooklyn College
 
Bianco toro
Bianco toroBianco toro
Bianco toro
 
Alfresco tech talk live share extensibility metadata and actions for 4.1
Alfresco tech talk live share extensibility metadata and actions for 4.1Alfresco tech talk live share extensibility metadata and actions for 4.1
Alfresco tech talk live share extensibility metadata and actions for 4.1
 
Loan Point- A Reliable Hub of Money Lending
Loan Point- A Reliable Hub of Money LendingLoan Point- A Reliable Hub of Money Lending
Loan Point- A Reliable Hub of Money Lending
 
Mapping Human-Centric Product Vision (ProductCamp Boston 2016)
Mapping Human-Centric Product Vision (ProductCamp Boston 2016)Mapping Human-Centric Product Vision (ProductCamp Boston 2016)
Mapping Human-Centric Product Vision (ProductCamp Boston 2016)
 
Juliao
JuliaoJuliao
Juliao
 
3
33
3
 
CDC International Logistics - Company and Service Introduction
CDC International Logistics - Company and Service IntroductionCDC International Logistics - Company and Service Introduction
CDC International Logistics - Company and Service Introduction
 
University of Hertfordshire - Resume of slides from Library Teach Meet June 2014
University of Hertfordshire - Resume of slides from Library Teach Meet June 2014University of Hertfordshire - Resume of slides from Library Teach Meet June 2014
University of Hertfordshire - Resume of slides from Library Teach Meet June 2014
 
Stacy's secret sauce: how to recruit like a boss | Talent Connect Anaheim
Stacy's secret sauce: how to recruit like a boss  | Talent Connect AnaheimStacy's secret sauce: how to recruit like a boss  | Talent Connect Anaheim
Stacy's secret sauce: how to recruit like a boss | Talent Connect Anaheim
 
Technik.hotelarstwa 341[04] z3.04_u
Technik.hotelarstwa 341[04] z3.04_uTechnik.hotelarstwa 341[04] z3.04_u
Technik.hotelarstwa 341[04] z3.04_u
 
Empowering Customer Success through Product Feedback
Empowering Customer Success through Product FeedbackEmpowering Customer Success through Product Feedback
Empowering Customer Success through Product Feedback
 
Anatomy of an Intranet (Triangle SharePoint User Group) October 2016
Anatomy of an Intranet (Triangle SharePoint User Group) October 2016Anatomy of an Intranet (Triangle SharePoint User Group) October 2016
Anatomy of an Intranet (Triangle SharePoint User Group) October 2016
 

More from Sorel Muresan

Alcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymerAlcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymerSorel Muresan
 
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...Sorel Muresan
 
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINALAkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINALSorel Muresan
 
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...Sorel Muresan
 
Berol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaningBerol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaningSorel Muresan
 
Multifunctional hydrotropes
Multifunctional hydrotropesMultifunctional hydrotropes
Multifunctional hydrotropesSorel Muresan
 
Thickening with cationic surfactants
Thickening with cationic surfactantsThickening with cationic surfactants
Thickening with cationic surfactants Sorel Muresan
 
Challenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobelChallenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobelSorel Muresan
 
Berol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delightBerol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delightSorel Muresan
 
Patent Cheminformatics: Identification of key compounds in patents
Patent Cheminformatics: Identification of key compounds in patentsPatent Cheminformatics: Identification of key compounds in patents
Patent Cheminformatics: Identification of key compounds in patentsSorel Muresan
 
Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsSorel Muresan
 
Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...Sorel Muresan
 

More from Sorel Muresan (12)

Alcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymerAlcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymer
 
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
 
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINALAkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
 
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
 
Berol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaningBerol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaning
 
Multifunctional hydrotropes
Multifunctional hydrotropesMultifunctional hydrotropes
Multifunctional hydrotropes
 
Thickening with cationic surfactants
Thickening with cationic surfactantsThickening with cationic surfactants
Thickening with cationic surfactants
 
Challenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobelChallenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobel
 
Berol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delightBerol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delight
 
Patent Cheminformatics: Identification of key compounds in patents
Patent Cheminformatics: Identification of key compounds in patentsPatent Cheminformatics: Identification of key compounds in patents
Patent Cheminformatics: Identification of key compounds in patents
 
Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dots
 
Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...
 

Recently uploaded

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Silpa
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptxArvind Kumar
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLkantirani197
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.Silpa
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 

Recently uploaded (20)

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 

Chemistry Connect

  • 1. Chemistry Connect AstraZeneca’s cheminformatics platform for large-scale integration of structure and bioactivity data ICIC 2012 14-17 October Berlin Sorel Muresan AstraZeneca R&D Mölndal Chemistry Innovation Centre, Discovery Sciences
  • 2. Chemistry Connect – a team effort Discovery Sciences | CIC http://dx.doi.org/10.1016/j.drudis.2011.10.005
  • 3. Driver – explosion in SAR data • Chemical information landscape changing fast • Make every SAR point count, access all available chemistry • Internal & external datasources 2006 2008 Discovery Sciences | CIC Southan, C.; Varkonyi, P.; Muresan, S., J. Cheminfo. 2009
  • 4. SAR key entities and relationships Unstructured Data Structured Entries in from Documents Relational Databases Expert Extraction or Text Mining Discovery Sciences | CIC Southan, C.; Boppana, K.; Jagarlapudi, S.; Muresan, S. J. Cheminfo. 2011
  • 5. Manually extracted SAR data (commercial) • GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a comprehensive database that captures explicit relationships between the three entities of publications, compounds and targets. Discovery Sciences | CIC
  • 6. SAR data (public) • PubChem • the NCBI public informatics backbone for the NIH Molecular Libraries Initiative focused on small molecules as systems biology probes and potential therapeutic agents. • ChEMBL • includes drugs, small molecules from the medicinal chemistry or biochemical literature and their targets. Discovery Sciences | CIC
  • 7. Extracting chemical entities from text Collaboration with IBM Research Almaden to apply text analytics technology to analyze intellectual property and scientific literature - 10 million full text patents - 11 million structures - 17% out of 58M parent structures in Chemistry Connect Discovery Sciences | CIC
  • 8. Chemical Named Entity Recognition (NER) 7-CHLORO-1,3-DIHYDRO-1-METHYL-5- PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE Name-to-Structure software CN1c2ccc(cc2C(=NCC1=O)c3ccccc3)Cl Discovery Sciences | CIC
  • 9. Extracting chemical entities from text The biggest cause of missing compounds when extracting chemical entities from text is the presence of typographical errors: human errors, OCR failures, hyphenation and multiple line issues, etc. • Automated spelling correction with CaffeineFix from NextMove Software • CaffeineFix significantly improves extraction rates (22% increase from D=0 to D=1) • name2structure software are complementary (40% of the structures come from single n2s contributions) Discovery Sciences | CIC Sayle, R.; Xie, P.; Muresan, S., JCIM 2012
  • 10. Chemistry Connect Compound Document Test & Result Chemistry Connect Discovery Sciences | CIC
  • 11. Chemistry Connect Compound Document Test & Result Target Chemistry Connect Discovery Sciences | CIC
  • 12. Chemistry Connect Discovery Sciences | CIC
  • 13. Chemistry Connect Discovery Sciences | CIC
  • 14. Chemistry Connect Discovery Sciences | CIC
  • 15. Exact match source comparisons sources that include predominantly patent- known drugs derived compounds Discovery Sciences | CIC
  • 16. Finding a common language Acetaminophen [3H]Acetaminophen 882-720-13 Acetaminophen (4-hydroxyacetanilide) 10066-90-7 882-720-16 Acetaminophen glucuronide(55%) acetaminophen sulfate 103-90-2 882-720-20 Acetaminophen sulfate(30%) 1047-607-00 A F ANACIN acetaminophen sulphate Acetaminophen Uniserts 1169-894-12 A PER acetaminophene A.F. ANACIN Acetamol 16110-10-4 ACETANILIDE, 4'-HYDROXY- AAP Acetavance 222 AF aa-sulfate Acetofen 222-AF AA-sulphate ACETOMINOPHEN Actamin 3-(glutathion-S-yl)acetaminophen Abenol Actamin Extra Actamin Super 37519-14-5 Abensanil Actifed Plus 3-hydroxyacetaminophen ABROL Actimol Actimol Chewable Tablets 4-(Acetylamino)phenol ABROLET Actimol Children's Suspension 4-13-00-01091 AC112578 Actimol Infants' Suspension Actimol Junior Strength Caplets 4-ACETAMIDOPHENOL AC112579 Actron Acamol Afebrin 4-Acetaminophenol Afebryl Accu-Tap Aferadol 4-ACETYLAMINOPHENOL Acenol AG10223 4'-Hydroxyacetanilide Acenol (pharmaceutical) AG12029 AG124687 4-HYDROXYACETANILIDE Acephen AG12800 AG12948 4-HYDROXYANILID KYSELINY OCTOVE Acertol Amadil 4-hydroxyphenolacetamide Aceta Aminofen 644/4046 Aceta Elixir Aminofen Max Anacin 644/7502 Aceta Tablets Anacin-3 64889-81-2 Acetaco Anacin-3 Extra Strength Acetagesic Anadin dla dzieci 659/9501 Anaflon Acetalgin 77097-85-9 Acetaminophen: Analter ACETAMIDE, N-(4- Anapap 840-416-00 HYDROXYPHENYL)- Andox >1000 synonyms.. 872-667-00 878-022-04 ACETAMIDE, N-(P- HYDROXYPHENYL)- Anelix Anexsia Anexsia 10/660 878-022-09 Acetamidophenol Anexsia 5/325 878-022-14 Acetaminofen Anexsia 7.5/325 Acetaminophen Anexsia 7.5/650 878-022-19 Anhiba 882-720-04 Acetaminophen (4- Anoquan hydroxyacetanilide) Anti-Algos 882-720-07 Antidol Acetaminophen 882-720-10 glucuronide(55%) Apacet Discovery Sciences | CIC Apacet Capsules acetaminophen sulfate
  • 17. Word of the Day : Crowdsourcing Discovery Sciences | CIC
  • 18. Chemistry Connect Discovery Sciences | CIC
  • 19. Technical Overview - ETL Data Sources Extraction Transformation Loading Text Files Python Structure Scripts Normalization (chemistry) Property calc Oracle PL/SQL Oracle DB (ext tables) Pipeline Pilot (biological results) Web Service Discovery Sciences | CIC
  • 20. Technical Overview - Application HTML Java Oracle 11g WebLogic Server Direct 7 REST (and SOAP) services .Net PipelinePilot Knime Excel Discovery Sciences | CIC
  • 21. Chemistry Connect Apps Canvas SARConnect Chemistry Plato Connect Key compounds Discovery Sciences | CIC
  • 22. Canvas is… …a Rosetta stone for compounds It automatically translates AZ numbers, 196 B.C. 2012 A.D. SNs, chemical names, structures, SMILEs, development IDs, reagent IDs, trade names, legacy Astra & Zeneca IDs… …really easy to use Copy a compound name or structure to the clipboard and let …a portal to information Canvas do the rest of the work It acts as a springboard to let you access Chemistry Connect, ISAC, IBEX, IBIS data, Compound View, ELN data, AZ Patent Db, IBEX, Integrity... …and in 2011, 1750 AZ scientists did Safety assessment & many others… Biologists Med chem Synthetic chem …a compound design tool Patent attorneys It quickly calculates C-lab properties, chemical Crystallographers names, molecular weights, checks novelty… DMPK Comp chem Discovery Sciences | CIC Jon Winter, Oncology iMed
  • 23. Utopia Documents Discovery Sciences | CIC http://getutopia.com/index.php
  • 24. Key compound prediction from patents From WO1996025405 the earliest patent which claims it, can you work out the structure of Bextra (Valdecoxib), the Pfizer NSAID? 74 exemplified cmpds Discovery Sciences | CIC Tyrchan, C. et al JCIM 2012
  • 25. WO1996025405 - Bextra Source #compounds Bextra Bextra exists ranked GVKBIO 74 Y 1 (broad core) 1 (narrow core) SureChem 501 Y 1 (broad core) 1 (narrow core) Discovery Sciences | CIC
  • 26. EP268956 - Aciphex Source #compounds Aciphex Aciphex exists ranked GVKBIO 27 Y 2 (core1) 1 (core2) SureChem 168 Y 1 (core1) 1 (core2) Discovery Sciences | CIC
  • 27. PLATO for Safety/Tox – General Concept Predictive Secondary Pharmacology Expert Systems QSAR Models Job Input Results Molecule BioSim Summary Pharma Connect Additional Services Scott Boyer Catrin Hasselgren Lars Carlsson All services in Plato are complementary to Tobias Noeske find an overall answer to your problem! Discovery Sciences | CIC
  • 28. Predictive Secondary Pharmacology strategy Chemistry Connect Similarity or Compound information: name & structure substructure search Target information: bioactivity data Input Molecule Compound - Target associations • Potential off-target effects? • Part of a safety risk assessment Scott Boyer Similarity concept: Similar compounds Catrin Hasselgren bind to similar targets. Lars Carlsson M Johnson et al., Prog Clin Biol Res (1989), 291:167 Discovery Sciences | CIC Tobias Noeske P Willet, Drug Discov Today (2006), 11:1046
  • 29. SARConnect – navigate SAR landscape Compound hierarchy Test & Results Target hierarchy Discovery Sciences | CIC Eriksson, M. et al Molecular Informatics 2012
  • 30. SARConnect – structure classification Compound structure Molecular Framework Topological Framework Terminal Rings & Bonds Level 1 Level2 Level 3 Level 4 Discovery Sciences | CIC
  • 31. SARConnect – target classification Level 1 (Broad target class) Enzyme NHR GPCR Ion Channel Other Level 2 (Swiss-Prot family class) GPCR Signal Transmembrane Transmembrane Level 3 (Sub-families) Class A Class B Class C Frizzeled Family Symbol CALCR CALCLR CRHR2 GCCR GIPR GLP1R PTH2R VIPR1 VIPR2 Discovery Sciences | CIC
  • 32. SARConnect – navigate SAR landscape Discovery Sciences | CIC
  • 33. Take-home messages • Chemistry Connect is enabling AstraZeneca to intensify its exploitation of synergies between internal and external SAR estate and to shorten the time between hypothesis generation during DMTA cycles • Our Chemical Dictionary of 120 million chemical terms has become a crucial cross-mapping resource between chemistry and the scientific literature • We cannot wave a magic wand over data quality, provenance issues, drug name space, and the inherent challenges of chemistry representation but Chemistry Connect gives us a unique overview and amelioration options for each source Discovery Sciences | CIC
  • 34. A Democracy of Ideas (Acknowledgements) • Plamen Petrov • Niklas Blomberg • Chris Southan • Jon Winter • Paul Xie • John Cumming • Peter Varkonyi • Scott Boyer • Thierry Kogej • Catrin Hasselgren • Christian Tyrchan • Lars Carlsson • Magnus Kjellberg • Tobias Noeske • Håkan Nilsson • and many others… • Mats Eriksson • Jonas Ekengren • Ithipol Suriyawongkul Discovery Sciences | CIC
  • 35. Thank you! Discovery Sciences | CIC