ACS Meeting New Orleans 2013 (CINF)

My presentation during the CINF Public Databases session at the ACS Meeting in New Orleans.

Slide notes:
  • CIR usage hovered around a few hundred thousand requests per month for roughly the first two and a half years after its unofficial announcement in 2009. It took off about a year ago and crossed 10 million requests per month early this year [and has been at or above 2 million per month in every month for a year now].
  • This slide shows some of the top users by request count. There are some very well-known names and some expected users, and maybe also some more surprising ones. And yes, Merck is also among the users from the pharma sector. [40 sec]
  • As it was designed to be, CIR is used ... as well as in educational tools such as the CheMagic educational web site, put together by Otis Rothenberger at Illinois State University [based on the Jmol Virtual Molecular Model Kit], which centrally depends on our CIR.
  • … but you can also do things independently of InChI; this is the general scheme. Almost every identifier or representation can be converted to any other representation.

    1. NCI/CADD Chemical Structure Web Services
       Markus Sitzmann
       Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS
    2. http://cactus.nci.nih.gov
    3. Chemical Structure Web API
       [Diagram. Components shown: external web services, Chemical Identifier Resolver, NCI/CADD web service (two boxes), http, Chemical Structure Web API, CACTVS, OPSIN, other software packages, NCI/CADD Chemical Structure DataBase (CSDB)]
    4. Chemical Structures
       [Diagram: a chemical structure surrounded by its identifiers and representations]
       SYBYL Line Notation, SMILES, CAS Registry Number, chemical names, GIF image, ChemNavigator SID, SD File, CML, FDA UNII, NCI/CADD Identifiers, NSC number, MRV, InChI/InChIKey, PubChem SID/CID, ChemSpider ID, ChEBI ID, Chemical Formula, PDB Ligand ID
    5. Chemical Identifier Resolver (CIR)
       CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier.
       http://cactus.nci.nih.gov/chemical/structure
    6. Chemical Identifier Resolver (CIR)
       • officially released in June 2009
       • since then four beta versions (for testing, learning, and gaining experience)
       • one larger database update in March 2010
       • since early 2012: major internal rewrite (which will allow us to add new services and API functionality without breaking the existing API)
       • major database update and new services planned for 2013
       http://cactus.nci.nih.gov/chemical/structure
    7. CIR Usage Statistics
       [Chart: requests per month since June 2009; vertical axis from 0 to 12,000,000]
       Typical number of unique IP addresses per month: 4,000 – 8,000
    8. Top Users (US)
       Academic/Hospitals: St. Olaf College, Carnegie Mellon, Drexel University, Princeton, Mayo
       Pharma/Chemical Industry: Eli Lilly, Dow Chemical, Intermune, Procter & Gamble, Vertex
       U.S. Government: EPA, NIH (NIEHS, NCI, NLM...), Lawrence Livermore Natl. Lab., CDC, DoD
       Other: Google, Amazon, HP, Agilent, Symyx
    9. External web services and applications
       • CIR node for KNIME, by Talete s.r.l.
       • Lab Helper app for Windows Phone
       • Avogadro molecule editor
       • Jmol/JSmol open-source viewer for chemical structures in 3D
       • GChem for Google Spreadsheet
       • Bioclipse (CIR plugin)
       • Macs in Chemistry
       • Accelrys Draw
       ...and educational tools/sites such as:
       • Jmol/JSmol Virtual Molecular Model Kit
       • ISU CheMagic
       • Caltech Library
    10. Examples using CIR
    11. Chemical Identifier Resolver (CIR)
        [Screenshot: structure drawn in the ChemWriter Editor, returned as an SD file]
        C7H6O2
        APtclcactv03051222202D 0 0.00000 0.00000

         15 15  0  0  0  0  0  0  0  0999 V2000
            2.8660   -2.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            3.7321   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            3.7321   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            2.8660   -0.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            2.0000   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            2.0000   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            2.8660    0.9400    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            3.7321    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
            2.0000    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
            2.8660   -2.6800    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
            4.2690   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
            4.2690   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
            1.4631   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
            1.4631   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
            3.7321    2.0600    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
          1  2  2  0  0  0  0
          2  3  1  0  0  0  0
          3  4  2  0  0  0  0
          4  5  1  0  0  0  0
          5  6  2  0  0  0  0
          1  6  1  0  0  0  0
          4  7  1  0  0  0  0
          7  8  1  0  0  0  0
          7  9  2  0  0  0  0
          1 10  1  0  0  0  0
          2 11  1  0  0  0  0
          3 12  1  0  0  0  0
          5 13  1  0  0  0  0
          6 14  1  0  0  0  0
          8 15  1  0  0  0  0
        M  END
        $$$$
    12. Chemical Identifier Resolver (CIR)
        [Screenshot: structure drawn in the ChemWriter Editor, returned as names]
        benzoic acid; 65-85-0; WLN: QVR; Unisept BZA; AIDS018010; Salvo liquid; Benzoic acid-ring-UL-14C; ST5213864; Benzoesaeure; CHEBI:30746; NSC 149; benzenecarboxylic acid; phenylformic acid; Benzoic acid (JP15/USP); Benzoic acid (TN); 18102_RIEDEL; Aromatic hydroxy acid; Benzoic acid (7CI,8CI,9CI); Benzoic acid [USAN:JAN]; W213128_ALDRICH; 47849_SUPELCO; Acide benzoique [French]; Acido benzoico [Italian]; Benzoate (VAN); Benzoesaeure [German]; Benzoic acid (natural); Acide benzoique; Benzeneformic acid; Benzenemethanoic acid; Benzoesaeure GK; Benzoesaeure GV; Benzoic acid, tech.; Carboxybenzene; Kyselina benzoova; Phenylcarboxylic acid
    13. Chemical Identifier Resolver (CIR)
        [Screenshot: structure drawn in the ChemWriter Editor, returned as InChIKey, InChI, and SMILES]
        InChIKey: InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
        InChI: InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
        SMILES: C1=CC=C(C=C1)C(O)=O
    14. Chemical Identifier Resolver (CIR)
        programmatic URL API:
        http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
        if a request is not successful: HTTP 404 status message
    15. Chemical Identifier Resolver (CIR)
        • access by programming libraries/languages (e.g. Python):
            from urllib2 import urlopen, HTTPError

            url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
            try:
                # urlopen() raises HTTPError (e.g. 404) if the request fails
                response = urlopen(url).read()
            except HTTPError:
                raise  # replace with your own error handling
            print response

            204255-11-8
        • access from Unix shell level (e.g., via wget):
            shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
            204255-11-8
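The slide's snippet targets Python 2 (`urllib2`). A minimal Python 3 sketch of the same "identifier"/"representation" scheme might look like the following. The helper names `cir_url` and `resolve` are made up for illustration, and percent-encoding the identifier is an assumption about what the server accepts rather than documented behaviour.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a request URL of the form BASE_URL/identifier/representation."""
    # Percent-encode the identifier so that names containing spaces or
    # reserved characters remain a single path segment (assumption).
    return "{}/{}/{}".format(BASE_URL, quote(identifier, safe=""), representation)


def resolve(identifier, representation, timeout=10):
    """Return the response body as text, or None on an HTTP 404 reply."""
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as reply:
            return reply.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:
            return None  # the slides state that failed requests yield HTTP 404
        raise


if __name__ == "__main__":
    # Only builds the URL; call resolve("tamiflu", "cas") to query the service.
    print(cir_url("tamiflu", "cas"))
```

Returning `None` on a 404 keeps "not resolvable" separate from network failures, which still raise.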
    16. Chemical Identifier Resolver (CIR)
        examples:
        http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas
            204255-11-8 (MIME type: text/plain)
        http://cactus.nci.nih.gov/chemical/structure/tamiflu/image
            (MIME type: image/gif)
    17. Chemical Identifier Resolver (CIR)
        http://cactus.nci.nih.gov/chemical/structure
        "identifier" (input to CIR): chemical names, IUPAC names (OPSIN), CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ZINC Code, ChemSpider ID, ChemNavigator SID, eMolecule VID, UNII
        "representation" (output of CIR): /smiles, /names, /iupac_name, /cas, /inchi, /stdinchi, /inchikey, /stdinchikey, /ficts, /ficus, /uuuuu, /image, /file, /sdf, /mw, /monoisotopic_mass, /formula, /twirl, /urls, /chemspider_id, /pubchem_sid, /chemnavigator_sid
    18. (Partial) InChIKey Lookup
        • resolve Standard InChIKey into full structure representation (example: Ethanol):
        http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
            CCO
        http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles
            CCO
            CC[OH2+]
        http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles
            C(C(O)([2H])[2H])[2H]
            CC(O)([2H])[2H]
            C(CO)([2H])([2H])[2H]
            CC[17OH]
            C(CO)[2H]
            [14CH3]CO
            CCO
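The three ethanol lookups above use the full key, the key without its final character, and the first block alone. As a rough sketch, the shortened forms can be derived from a full Standard InChIKey as shown below. The function name `partial_inchikeys` is hypothetical, and the pattern assumes the usual 14-10-1 letter layout of an InChIKey.

```python
import re

# A Standard InChIKey has three hyphen-separated blocks of 14, 10 and 1
# uppercase letters; an optional "InChIKey=" prefix is tolerated here.
INCHIKEY_PATTERN = re.compile(r"^(?:InChIKey=)?([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def partial_inchikeys(inchikey):
    """Return the full key followed by its two progressively shorter forms."""
    match = INCHIKEY_PATTERN.match(inchikey)
    if match is None:
        raise ValueError("not a well-formed InChIKey: {!r}".format(inchikey))
    first, second, last = match.groups()
    return [
        "{}-{}-{}".format(first, second, last),  # exact lookup
        "{}-{}".format(first, second),           # drops the final character
        first,                                   # first block only
    ]


if __name__ == "__main__":
    for key in partial_inchikeys("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"):
        print(key)
```

In the slide's example, each shorter form matches more structures: the two-block key also returns the protonated form, and the first block alone also returns isotopically labelled variants.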
    19. Chemical File Representation
        • available file format representations:
        http://cactus.nci.nih.gov/chemical/structure/Aspirin/file?format=sdf
            alc          Alchemy format
            cdxml        CambridgeSoft ChemDraw XML format
            cerius       MSI Cerius II format
            charmm       Chemistry at HARvard Macromolecular Mechanics file format
            cif          Crystallographic Information File
            cml          Chemical Markup Language
            gjf          Gaussian input data file
            gromacs      GROMACS file format
            hyperchem    HyperChem file format
            jme          Java Molecule Editor format
            maestro      Schroedinger MacroModel structure file format
            mol          Symyx molecule file
            sybyl2/mol2  Tripos Sybyl MOL2 format
            mrv          ChemAxon MRV format
            pdb          Protein Data Bank
            sdf          Symyx Structure Data Format
            sdf3000      Symyx Structure Data Format 3000
            sln          SYBYL Line Notation
            smiles       SMILES
            xyz          xyz file format
    20. Chemical Structure Images (GIF, PNG)
        Buckyball:
        http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white
        Aspirin:
        http://cactus.nci.nih.gov/chemical/structure/Aspirin/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
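The image URLs above combine the usual identifier path with query parameters. A small sketch of assembling such a URL is given below. The helper name `image_url` is illustrative, and only the parameter names that appear on the slide (`height`, `width`, `bgcolor`, `bondcolor`, `symbolfontsize`, `footer`) are known to exist.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def image_url(identifier, **options):
    """Build an /image URL; keyword arguments become query parameters."""
    url = "{}/{}/image".format(BASE_URL, quote(identifier, safe=""))
    if options:
        # urlencode() keeps the keyword order and escapes the values.
        url += "?" + urlencode(options)
    return url


if __name__ == "__main__":
    print(image_url("XMWRBQBLMFGWIX-UHFFFAOYSA-N",
                    height=300, width=300, bgcolor="black", bondcolor="white"))
```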
    21. Chemical Properties
        • request molecular weight (example: Aspirin):
        http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight
            180.1598 (MIME type: text/plain)
        available properties:
            /mw                         molecular weight
            /formula                    formula
            /monoisotopic_mass          monoisotopic mass
            /h_bond_donor_count         H bond donor count
            /h_bond_acceptor_count      H bond acceptor count
            /h_bond_center_count        H bond center count
            /rotor_count                number of rotatable bonds
            /effective_rotor_count      number of effectively rotatable bonds
            /rule_of_5_violation_count  number of Rule-of-5 violations
            /xlogp2                     octanol−water partition coefficient XLOGP2
            /aromatic                   compound is aromatic
            /macrocyclic                compound is macrocyclic
            /heteroatom_count           heteroatom count
            /hydrogen_atom_count        H atom count
            /heavy_atom_count           heavy atom count
            /deprotonable_group_count   number of deprotonable groups
            /protonable_group_count     number of protonable groups
            /ring_count                 number of rings
            /ringsys_count              number of ringsystems
    22. Chemical Name Lookup
        • request (alternative) names:
        http://cactus.nci.nih.gov/chemical/structure/Aspirin/names/xml
            <?xml version="1.0" encoding="UTF-8" ?>
            <request string="Aspirin" representation="names">
              <data id="1" resolver="name" string_class="Name">
                <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
                <item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
                <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
                <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
                <item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
                <item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
                <item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
                <item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
                <item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
                <item id="10" classification="pubchem_substance_synonym">SBB015069</item>
                <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
                <item id="12" classification="pubchem_substance_synonym">D00109</item>
                […]
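To illustrate how such an XML reply could be consumed, the following sketch groups the returned names by their `classification` attribute using the standard library. The sample document is an abridged copy of the slide's output; its closing tags are assumed because the slide truncates the reply, and the function name `names_by_classification` is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged from the slide; the closing </data> and </request> tags are assumed.
SAMPLE_REPLY = """<request string="Aspirin" representation="names">
  <data id="1" resolver="name" string_class="Name">
    <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
    <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
    <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
    <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
  </data>
</request>"""


def names_by_classification(xml_text):
    """Map each classification value to the list of names carrying it."""
    grouped = defaultdict(list)
    root = ET.fromstring(xml_text)
    # iter("item") walks every <item> element regardless of its <data> parent.
    for item in root.iter("item"):
        grouped[item.get("classification")].append(item.text)
    return dict(grouped)


if __name__ == "__main__":
    for classification, names in names_by_classification(SAMPLE_REPLY).items():
        print(classification, names)
```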
    23. Chemical Name Pattern Search
        • Google-like searches on CIR's name index (approx. 70 million names)
        example: all chemical names that contain the words "morphine" and "methyl" (name pattern: '+morphine +methyl'):
        http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern
        based on the open source full text search server Sphinx (http://sphinxsearch.com)
    24. Search name pattern '+morphine +methyl': 7 matching names
            <request string="+morphine +methyl" representation="stdinchikey">
              <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
                <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
              </data>
              <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
                <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
              </data>
              <data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-">
                <item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item>
              </data>
              <data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER">
                <item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item>
              </data>
              <data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether">
                <item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item>
              </data>
              <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
                <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
              </data>
              <data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-">
                <item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item>
              </data>
            </request>
    25. Chemical Name Pattern Search
        example: chemical names that contain the words "morphine" and "methyl" but not "hydroxyl" (name pattern: '+morphine +methyl -hydroxyl'):
        http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern
            6 matching names
        example: chemical names that contain the substring "morphine" somewhere in the name (name pattern: '*morphine*'):
        http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern
            45 matching names
        example: chemical names that contain a single character "m" and the word "benzene" in a maximum distance of 3 words (finds meta-substituted aromatic compounds, name pattern: '"m benzene"~3'):
        http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern
            22 matching names
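The pattern URLs above contain spaces, `+`, and `*`, which many HTTP clients will not send unescaped. One possible way to build them programmatically is sketched below. The function name `name_pattern_url` is hypothetical, and it is an assumption (not confirmed by the slides) that the server accepts the percent-encoded form of the pattern.

```python
from urllib.parse import quote

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern, representation="stdinchikey"):
    """Build an XML name-pattern search URL for the given pattern."""
    # safe="" escapes every reserved character, including '+', '*' and ' ',
    # so the pattern reaches the server as one path segment (assumption).
    encoded = quote(pattern, safe="")
    return "{}/{}/{}/xml?resolver=name_pattern".format(BASE_URL, encoded, representation)


if __name__ == "__main__":
    for pattern in ("+morphine +methyl", "+morphine +methyl -hydroxyl", "*morphine*"):
        print(name_pattern_url(pattern))
```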
    26. NCI/CADD Chemical Structure DataBase
        CSDB 2010
    27. Chemical Structure Normalization/Identifier
        • stepwise process:
        original structure record → structure normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier
        [Diagram labels for input formats: Molfile, SDF, SMILES, ChemDraw cdx, PDB]
        original structure records, parent structures and identifiers are stored in the database
    28. Chemical Structure Normalization/Identifier
        • calculation of a set of parent structures with different sensitivity to chemical features:
        original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers (FICTS, FICuS, uuuuu)
        all steps are performed using CACTVS
    29. NCI/CADD Identifiers (FICTS, FICuS, uuuuu)
        • based on CACTVS hashcodes (HASHISY)
        • 16-digit hexadecimal number (64-bit unsigned), e.g. 9850FD9F9E2B4E25
        structure normalization - histidine: [structure drawings of five histidine variants]
            variant      FICTS                   FICuS                   uuuuu
            tautomer 1   9850FD9F9E2B4E25-FICTS  9850FD9F9E2B4E25-FICuS  9850FD9F9E2B4E25-uuuuu
            tautomer 2   6C16DE2351F9FF50-FICTS  9850FD9F9E2B4E25-FICuS  9850FD9F9E2B4E25-uuuuu
            salt         E5F83F10C5DB080A-FICTS  E5F83F10C5DB080A-FICuS  9850FD9F9E2B4E25-uuuuu
            R            E92E4BA2869F3611-FICTS  E92E4BA2869F3611-FICuS  9850FD9F9E2B4E25-uuuuu
            S            8A7AD1EB498CC76A-FICTS  8A7AD1EB498CC76A-FICuS  9850FD9F9E2B4E25-uuuuu
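The histidine example shows how the three identifier types group structure variants with different strictness. The sketch below parses identifiers of the form shown on the slide and counts the distinct groups per type. The names `parse_identifier` and `group_by_identifier` are illustrative, and the data are copied from the slide rather than computed with CACTVS.

```python
import re
from collections import defaultdict

# Sixteen hexadecimal digits, a hyphen, and one of the three type suffixes.
IDENTIFIER_PATTERN = re.compile(r"^([0-9A-F]{16})-(FICTS|FICuS|uuuuu)$")

# Hashcodes per histidine variant, copied from the slide.
HISTIDINE = {
    "tautomer 1": {"FICTS": "9850FD9F9E2B4E25", "FICuS": "9850FD9F9E2B4E25", "uuuuu": "9850FD9F9E2B4E25"},
    "tautomer 2": {"FICTS": "6C16DE2351F9FF50", "FICuS": "9850FD9F9E2B4E25", "uuuuu": "9850FD9F9E2B4E25"},
    "salt":       {"FICTS": "E5F83F10C5DB080A", "FICuS": "E5F83F10C5DB080A", "uuuuu": "9850FD9F9E2B4E25"},
    "R":          {"FICTS": "E92E4BA2869F3611", "FICuS": "E92E4BA2869F3611", "uuuuu": "9850FD9F9E2B4E25"},
    "S":          {"FICTS": "8A7AD1EB498CC76A", "FICuS": "8A7AD1EB498CC76A", "uuuuu": "9850FD9F9E2B4E25"},
}


def parse_identifier(text):
    """Split an identifier into its 64-bit integer hashcode and its type."""
    match = IDENTIFIER_PATTERN.match(text)
    if match is None:
        raise ValueError("not an NCI/CADD identifier: {!r}".format(text))
    hashcode, kind = match.groups()
    return int(hashcode, 16), kind


def group_by_identifier(kind, records):
    """Group variant labels that share the same hashcode for one type."""
    groups = defaultdict(list)
    for label, hashcodes in records.items():
        groups[hashcodes[kind]].append(label)
    return dict(groups)


if __name__ == "__main__":
    for kind in ("FICTS", "FICuS", "uuuuu"):
        print(kind, len(group_by_identifier(kind, HISTIDINE)), "distinct structures")
```

On the slide's data, FICTS separates all five variants, FICuS merges the two tautomers, and uuuuu maps all five variants to a single hashcode.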
    30. Chemical Structure Normalization/Identifier
        • calculation of Standard InChIKey from the union set of parent structures
        original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier
        union set of parent structures → Standard InChIKey (InChI 1.03)
    31. Chemical Structure Database (CSDB)
        • ChemNavigator iResearch Library: compilation of commercially available screening compounds from ~300 international chemistry suppliers
        • PubChem Substance Database: including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …
        • Commercial Sources / others: Asinex, Comgenex, eMolecules, …
        [Pie chart with segments PubChem, ChemNav. iResearch Lib., and others; segment labels ~38%, ~56%, ~6%]
        current status (released March 2010):
        • 140 chemical structure databases
        • 120 million structure records
        • 84.6 million unique structures by FICuS
        • 110 million Standard InChIKeys for lookup
    32. NCI/CADD Chemical Structure DataBase
        CSDB 2013
    33. Chemical Structure Database 2013
        • >270 small-molecule databases
        • >600 database releases (full, incremental, "historic versions")
        • 385 million original database records
        unique structure count:
            FICTS      ~125.0 million
            FICuS      ~121.4 million
            uuuuu      ~109.0 million
            union set: 141.7 million unique structures
    34. Chemical Structure Database 2013
        InChI/InChIKey (Version 1.04) calculated with four InChI flag sets:
        • Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
        • Set 3: addition of hydrogen atoms by the InChI library
        Standard: Add H (CACTVS) → Standard InChIKey
        [Flag table: for each of Set 1, Set 2, and Set 3 the slide lists the options Add H, DONOTADDH, W0, FIXEDH, RECMET, NEWPS, SPXYZ, SAsXYZ, Fb, Fnud, KET, 15T; which options are active in each set cannot be read from this transcript]
    35. Chemical Structure Database 2013
        • calculation of Standard InChIKey
        original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier
        union set of parent structures → InChIKey (InChI 1.04) with flag sets Standard, Set 1, Set 2, Set 3
    36. Chemical Structure Database 2013
        • database schema is entirely implemented in python/
        • supports many different database engines: Oracle, PostgreSQL, MySQL
        • SQLAlchemy provides:
          • the communication layer with the database engine
          • an object-oriented data model representation of the database on the Python side
        • table relationships:
          • either defined by Foreign Key relationships in the database or specified on the Python level
          • SQLAlchemy creates table joins on the SQL level
    37. Chemical Structure Database 2013
        • SQLAlchemy table definition
            from sqlalchemy import Table, Column, Integer, CHAR, Text
            from sqlalchemy.orm import mapper, relationship, backref

            # metadata, schema, Name, name_table, TableRepr and TableInit
            # are defined elsewhere in the CSDB code base
            structure_table = Table('structure', metadata,
                Column('id', Integer, primary_key=True, autoincrement=True),
                Column('hash', CHAR(16), unique=True),
                Column('smiles', Text()),
                schema=schema)

            class Structure(TableRepr, TableInit):
                __table__ = structure_table

            mapper(Structure, structure_table, properties={
                'name': relationship(Name, backref=backref('structure',
                    primaryjoin=structure_table.c.id == name_table.c.structure_id))})
    38. Chemical Structure Database 2013
        • Query the database
            > s = db.session.query(Structure).filter(Structure.id==1234).one()
            <object "Structure">
            > s.smiles
            CCO
        • if the object-oriented data model representation creates too much overhead, SQLAlchemy supports writing "almost bare" SQL while still following the Python paradigms
            > q = select([structure_table.c.smiles,]).where(structure_table.c.id==1234)
            > s = q.execute().fetchone()
            (CCO,)
    39. Chemical Structure Database
        • Goals
          • index any chemical structures that can be referenced in some way or have a known source
          • may also include virtual chemistry or generic structure collections
          • collect public datasets/databases/structure collections
          • normalize them to our standards
          • make them available in our public web interfaces and APIs (if we are allowed to)
          • no refusal/deletion of structures; curation is performed by "keep the bad and tag it as bad"
        track chemical space
    40. NCI/CADD Chemical Web Apps
    41. NCI/CADD Chemical Web Apps
        • implemented with jQuery Mobile (1.3.0)
          • HTML5
          • supports web browsers on major mobile platforms: iOS, Android, BlackBerry, WindowsPhone, Windows 8, Palm, Symbian
          • supports major desktop web browsers: Google Chrome, Firefox, IE9/10
          • WAI-ARIA compliant (W3C specification draft describing accessibility standards of dynamic Web content for people with disabilities)
        • services will be optimized for usage on tablet-sized touch-screen devices, but not (yet) for smartphone-sized devices (current development is done on an iPad3)
        • all services work on a common platform
    42. Chemical Activity Predictor - GUSAR
        chemical structure → prediction of physicochemical properties and activities
    43. Chemical Activity Predictor - GUSAR
        GUSAR Software characteristics:
        • chemical structures are represented by QNA descriptors and MNA descriptors
        • mathematical algorithm: a unique algorithm of self-consistent regression allows selection of the best set of descriptors for a robust and reliable QSAR model
        main developer: Alexey Zakharov
    44. Chemical Activity Predictor - GUSAR
        GUSAR Software
        [Bar chart: accuracy (R2 test), axis from 0.00 to 1.00, for CoMFA, CoMSIA, HQSAR, EVA, 2D Cerius2, 3D Cerius2, GOLPE, GUSAR]
        comparison was performed on the following data sets:
        • ligand–enzyme interactions
        • ligand–receptor interactions
        • acute toxicity
        • interaction with drug-metabolism enzymes
    45. Chemical Activity Predictor - GUSAR
        • QSAR-based models created by GUSAR can be used separately from the application
        • broad spectra of chemical/biological activity and property prediction models for small molecules in development:
          • physicochemical properties
          • assessment of toxicity, metabolism and antineoplastic activities
          • HIV-1-related models
        • will be available as Web App and programmatic URL API:
        http://cactus.nci.nih.gov/chemical/activity/CCOCC/boiling_point
            {in_applicability_domain: True, datatype: 'float', value: 42.660}
    46. Chemical Activities
        Category: Physicochemical Properties; Models: Physicochemical Models; Endpoints: Boiling point, Density, Flash point, Melting point, Surface tension, Thermal conductivity, Vapor pressure, Viscosity, Water solubility
        Category: Biological Activities; Models: HIV-Models; Endpoints: HIV-1 Integrase (Strand Transfer) Inhibitor, HIV-1 Reverse Transcriptase Inhibitor
    47. Activity Endpoints
    48. Activity Endpoints
    49. Activity Endpoints
    50. Activity Endpoints
    51. Prediction Results
        GUSAR
        • value
        • unit
        • in applicability domain
        • quantitative and qualitative models
    52. Chemical Activity Predictor – GUSAR beta
        http://cactus.nci.nih.gov/chemical/apps
    53. Chemical Activity Predictor – GUSAR beta
        http://cactus.nci.nih.gov/chemical/apps
    54. Chemical Structure Lookup Service (CSLS)
        • first version was released in 2006, development stalled in 2008
        • new version will be based on CSDB
        • new release planned for 2013
        • allows easy lookup of chemical structures within the constituent databases of CSDB
    55. InChI/InChIKey Resolver
    56. InChI/InChIKey Resolver
        • "loose coupling" of InChI resolvers provided by different organizations
        • central list of resolvers
        • each resolver must provide a specific protocol
    57. InChI/InChIKey Resolver
        • Evan Bolton (NCBI, NLM, NIH)
        • Valery Tkachenko (RSC/ChemSpider)
        • Marc Nicklaus (CADD Group, NCI, NIH)
        • Steven Bachrach (Trinity University)
        • Antony Williams (RSC/ChemSpider)
        • Markus Sitzmann (CADD Group, NCI, NIH)
    58. Chemical Structure Web API
        [Diagram. Components shown: external web services, Chemical Identifier Resolver, NCI/CADD web service (two boxes), http, Chemical Structure Web API, CACTVS, OPSIN, other software packages, NCI/CADD Chemical Structure DataBase (CSDB)]
    59. Chemical Structure Web API
        [Diagram. Components shown: external web services, Chemical Identifier Resolver, NCI/CADD web service (two boxes), http, Chemical Structure Web API, GUSAR, CACTVS, OPSIN, other software packages, NCI/CADD Chemical Structure DataBase (CSDB)]
    60. http://cactus.nci.nih.gov/blog
    61. Acknowledgements
        NCI/CADD Team: Alexey Zakharov, Laura Guasch Pàmies, Megan Peach, Marc Nicklaus
        ChemNavigator: Scott Hutton, Tad Hurst
        Pubchem
        All other database providers
        Xemistry GmbH, Germany: Wolf-Dietrich Ihlenfeldt
        InChI Team
    62. Acknowledgments - Software
        CACTVS; Python Web Framework; ChemWriter; Python SQL Library; Peter Ertl (Novartis); Javascript library; Fulltext Search Engine
    63. http://cactus.nci.nih.gov
