
Open PHACTS Webinar Series - Chemistry Platform



Open PHACTS - Chemistry Platform Update and learnings. An open source platform for normalisation and standardisation

Published in: Science


  1. Open PHACTS - Chemistry Platform Update and Learnings. Antony Williams and Valery Tkachenko. ORCID iD: 0000-0002-2668-4821
  2. @gray_alasdair, Big Data Integration: OpenPHACTS and CRS diagram
  3. The Chemical Registration Service. Chemistry processing: • Validation • Standardization • Properties generation • Properties retrieval. Export: • RDF • SDF. API: • Domain-specific searches • Chemical visualization • Properties • Conversions
  4. Subsystems • “CVSP” (frontend, backend, database) • Compounds (frontend, database) • OpenPHACTS API (frontend, database) • Datasources registry (frontend, database) • Processing farm (optional)
  5. Structure-Based Database Linking • Open PHACTS, and many other projects that require linking of structure databases, depend on mappings • Different databases use different processes for standardization prior to deposition • Examples: PubChem, EBI databases, ChemSpider, etc.
  6. DrugBank • ~60 records can’t be dearomatized unambiguously • ~40 records where InChIs did not match the structure • 2 records where SMILES, InChI and name did not match the structure • 7 records with 2 stereo bonds at chiral atoms • Examples shown: DB04283, DB04462
  7. Standardizers • EBI Standardizer: / • PubChem Standardizer: https:// • NCGC Standardizer: p=61 • The CVSP Standardizer work in Open PHACTS
  8. Standardization Rules • Available from: • Use the SRS as guidance for standardization • Adjust as necessary to our needs
  9. Nitro groups
  10. Salt and Ionic Bonds
  11. The CVSP System
  12. Supports various file formats
  13. CompTox Chemistry Dashboard: prior to deposition, check a deposition…
  14. >3450 compounds in one SDF
  15. 98 Errors, 1571 Warnings
  16. Review Errors
  17. Validation Rule Set
  18. Various Rule Sets Available
  19. CVSP: my own custom rules
  20. ChEMBL Validation Review (of 1.3 million records) • 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973 • 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine • 6,177 records where the direction of a bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
  21. Chemical Validation First, Standardization Second • Chemical validation detects errors; standardization FIXES them according to rules • SMIRKS transformations are based on both InChI normalization and FDA SRS rules
  22. Standardization SMIRKS
      Examples of InChI normalization:
      [*;H+:1]>>[*;H:1]
      [O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3]
      [N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2]
      Examples of FDA SRS rules:
      [n:1]=[O:2]>>[n+:1][O-:2]
      [*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]
      [N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]
      Thiopurine:
      [H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10]([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10]([H,*:12])[N:9]2)[C:3]1=[S:2]
  23. Examples of Standardization • Double bond with adjacent wiggly single bond • Collapse hydrogen atoms with no stereo bonds
  24. Examples of Standardization • Remove symmetric stereocenters • Turn off chiral flag if no up or down bonds
  25. Defining a Community Rule Set • There are multiple standardizers, each with its own rule set • Can we decide on a default community rule set, like Standard InChI, that could be used by ALL standardizers? • A joint meeting between the Research Data Alliance (RDA), IUPAC and the ACS Division of Chemical Information discussed the value and possibilities of this approach (July 2016)
  26. EPA is investigating CVSP • EPA is investigating CVSP as a validation and standardization platform • Considering the API aspects of CVSP for integration with our registration system • CVSP is a reference implementation and “starting point” for a community rule set
  27. CVSP code is now Open Source • Open Source CVSP code now released • Code is hosted on the Open PHACTS GitHub • Valery Tkachenko will offer future support • Hoping for additional community engagement and support • Some details of availability…
  28. Virtual Machines • OPS_FRONT (all websites and API) • OPS_BACK (all heavy lifting) • OPS_DB (databases) • VMs are VMware images • Can be converted to other hypervisors
  29. Thank you Emails: and SLIDES:
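
The "validation first, standardization second" workflow described in slides 20 to 22 can be illustrated with a small sketch. This is not CVSP code and does not use SMIRKS: it is a toy model, assuming a hypothetical `Molecule` type with explicit heavy atoms only, a hypothetical valence table, and a single hand-written rule that rewrites a pentavalent nitro group into its charge-separated form, in the spirit of the charge-separation rules listed among the FDA SRS examples. All names (`Atom`, `Molecule`, `validate`, `standardize_nitro`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    charge: int = 0


@dataclass
class Molecule:
    atoms: list  # list of Atom
    bonds: list  # list of [atom_index_a, atom_index_b, bond_order]


# Hypothetical table of maximum valences for neutral atoms (toy subset).
MAX_NEUTRAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def bond_order_sum(mol, idx):
    """Sum of bond orders over all explicit bonds touching atom idx."""
    return sum(order for a, b, order in mol.bonds if idx in (a, b))


def validate(mol):
    """Validation step: report neutral atoms whose bond order sum is too high."""
    issues = []
    for idx, atom in enumerate(mol.atoms):
        limit = MAX_NEUTRAL_VALENCE.get(atom.element)
        if limit is None or atom.charge != 0:
            continue  # unknown element or charged atom: not covered by this toy rule
        total = bond_order_sum(mol, idx)
        if total > limit:
            issues.append(
                (idx, f"{atom.element}: bond order sum {total} exceeds neutral valence {limit}")
            )
    return issues


def standardize_nitro(mol):
    """Standardization step: rewrite pentavalent N(=O)=O as [N+](=O)[O-].

    Works on a copy, so the input molecule is left unchanged.
    """
    out = Molecule(
        [Atom(a.element, a.charge) for a in mol.atoms],
        [list(b) for b in mol.bonds],
    )
    for bond in out.bonds:
        a, b, order = bond
        if order != 2:
            continue
        for n_idx, o_idx in ((a, b), (b, a)):
            n, o = out.atoms[n_idx], out.atoms[o_idx]
            if (
                n.element == "N"
                and o.element == "O"
                and n.charge == 0
                and o.charge == 0
                and bond_order_sum(out, n_idx) == 5
            ):
                bond[2] = 1  # N=O becomes N-O
                n.charge, o.charge = 1, -1
                break
    return out


# Nitromethane drawn with a pentavalent nitrogen: C-N(=O)=O
nitro = Molecule(
    [Atom("C"), Atom("N"), Atom("O"), Atom("O")],
    [[0, 1, 1], [1, 2, 2], [1, 3, 2]],
)
print(validate(nitro))                     # one issue, on the nitrogen
print(validate(standardize_nitro(nitro)))  # no issues after standardization
```

Keeping the two steps as separate functions mirrors the split on the slides: the validator only reports problems, and a rule-driven standardizer decides how to rewrite the structure. A real system would express the rewrite as SMIRKS and apply it with a cheminformatics toolkit rather than hand-coding each rule.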