“Online chemical database
  with modeling environment”
a summer school course


Sergii Novotarskyi
Iurii Sushko
Chemoinformatics – overview of online resources
Chemical databases

1. PubChem — a database that provides information on the biological
activities of small molecules

2. ChemSpider — a free-access service providing a structure-centric
community for chemists

3. ChemIDplus — a tool that provides chemical structure, property, and
toxicity searching

4. ChemBank — a database of chemical structures and assays

5. ChemDB — a set of chemoinformatics tools
Literature databases

6. PubMed — a service that includes over 19 million citations from
MEDLINE and other life science journals for biomedical articles dating back to
1948

7. Toxicology Literature Online (TOXLINE) — references from toxicology
literature

8. ScienceDirect — a full-text scientific database offering articles/chapters
from more than 2,500 peer-reviewed journals and more than 10,000
books

9. ACS Publications — a worldwide scientific community with a collection
of the most cited peer-reviewed journals in the chemical and related
sciences.
PubChem – start page




URL: http://pubchem.ncbi.nlm.nih.gov/ or search Google for «PubChem»
PubChem – search results
PubChem – compound details
PubChem – bioassay search results
ChemSpider – start page




URL: http://www.chemspider.com/ or search Google for «ChemSpider»
ChemSpider – search results
ChemIDplus – main page




        URL: http://chem.sis.nlm.nih.gov/chemidplus/
                 or search Google for «ChemIDplus»
ChemIDplus – search results
ChemBank – main page




URL: http://chembank.broadinstitute.org/ or search Google for «ChemBank»
ChemBank – search results
ChemDB – main page




URL: http://cdb.ics.uci.edu/ or search Google for «ChemDB»
ChemDB – search results
PubMed – main page




URL: http://www.ncbi.nlm.nih.gov/pubmed/ or search Google for «PubMed»
Online chemical database with modeling environment
The subject of development



      The web-based service
         The database of physical, chemical and biological properties
             Accumulating experimentally verified data
             Providing user-friendly web-based access to this data

         The QSPR modeling environment
             Providing web-based tools for QSPR modeling
             Storing and “publishing” created models
Motivation




      Our motivation
         The importance of QSPR modeling

         The importance of web-based tools for QSPR modeling

         The need for one more service in this field
Motivation - QSPR

      Structure-property relationship hypothesis:
            “Similar structures - similar properties”

      [Figure: two structurally similar molecules, both with log (IC50) = 1.87 log(µM)]

      QSPR modeling:
            Predicting properties based on available data for structurally similar molecules.
            Structures are represented by a set of descriptors (atom count, molecular weight).

      [Figure: a molecule with log (IC50) = 0.64 log(µM) next to a molecule whose log (IC50) is to be predicted (?)]
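The QSPR idea above can be illustrated with a k-nearest-neighbours predictor, one of the simplest QSPR methods and not necessarily the one used in OCHEM. This is a minimal Python sketch, assuming two descriptors (atom count, molecular weight) with invented descriptor values; only the log (IC50) values 1.87 and 0.64 come from the slide. Descriptors are min-max scaled on the training ranges first, so that molecular weight does not dominate the distance.

```python
import math

# Hypothetical training set: (atom count, molecular weight) -> log(IC50).
# Descriptor values are invented; the log(IC50) values 1.87 and 0.64 are from the slide.
training_set = [
    ((21, 300.0), 1.87),
    ((22, 314.0), 1.87),
    ((15, 210.0), 0.64),
]


def fit_ranges(vectors):
    """Per-descriptor minimum and maximum over the training vectors."""
    columns = list(zip(*vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(vector, lows, highs):
    """Min-max scale one vector so that no single descriptor dominates the distance."""
    return tuple((x - lo) / (hi - lo) if hi > lo else 0.0
                 for x, lo, hi in zip(vector, lows, highs))


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_knn(query, data, k=2):
    """Predict a property as the mean value of the k nearest training molecules."""
    lows, highs = fit_ranges([descriptors for descriptors, _ in data])
    scaled_query = scale(query, lows, highs)
    ranked = sorted(
        data, key=lambda item: euclidean(scaled_query, scale(item[0], lows, highs))
    )
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)
```

For example, `predict_knn((16, 215.0), training_set, k=1)` returns 0.64, the value of the closest training molecule.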
QSPR – Similarity in descriptor space

     [Figure: molecules in descriptor space; axis: number of specific fragments in a molecule]
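Similarity in a fragment-count descriptor space is often measured with a count-based Tanimoto (MinMax) coefficient. This is a minimal sketch, assuming each molecule has already been reduced to a vector of fragment counts; the fragment list and the counts are hypothetical.

```python
# Hypothetical fragment set; each position in a count vector refers to one fragment.
FRAGMENTS = ("C=O", "O-H", "benzene ring", "N")


def minmax_similarity(a, b):
    """Count-based Tanimoto (MinMax) similarity of two fragment-count vectors.

    Returns 1.0 for identical vectors and 0.0 when the molecules share no fragment.
    """
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two molecules with no fragments at all are treated as identical.
    return numerator / denominator if denominator else 1.0


def most_similar(query, library):
    """Return the name of the library molecule most similar to the query vector."""
    return max(library, key=lambda name: minmax_similarity(query, library[name]))


# Hypothetical molecules described by counts of the fragments in FRAGMENTS.
library = {
    "molecule A": (1, 2, 1, 0),
    "molecule B": (0, 0, 0, 3),
}
```

With these counts, `most_similar((1, 1, 1, 0), library)` picks "molecule A" (similarity 0.75 versus 0.0).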
Motivation - web-based tools for modeling




      Main benefits of web-based tools:
          Availability and accessibility
          only a computer with Internet access and a modern web browser is required
          to start working; work materials can be shared among several
          locations; works on any platform (Linux, Windows, Mac)
          Communication and collaboration
          possibility to work on common topics, publish one's own results and use new
          results of other people
Motivation - one more web-based tool




      Reasons to build one more service:
          Different approach to data modification
          a completely open database: any user can add, delete and edit data
          (constrained only by a set of simple rules)

          Different approach to data organization
          data in the database is organized in a way suitable for QSPR modeling

          Integration of the database with modeling tools
          data from the database can be used for model creation and property
          prediction
Distinctive features

       The features that make our service different:
           “Wiki” approach to data handling
           users can add, modify and delete data
           Mandatory reference to an article
           every record in the database must contain a reference to the article where
           the data were published
           Storing additional information
           we store measurement conditions to increase data quality
           Several tools to support decision making
           integration with other web services (validation of molecule names against
           the PubChem database, automatic fetching of article information from
           PubMed), duplicate records management
           Aimed at model building
           training sets are easy to build from the data: filter by property and article, then
           export the data to the internal modeling tools or download them as an Excel file
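Building a training set as described above (filter by property and article, skip duplicate records, export) can be sketched in Python. This is a hypothetical illustration, not OCHEM code: the record fields and the duplicate key are assumptions, and CSV (which Excel can open) stands in for the Excel export. The compounds, PubMed IDs and values are taken from the practical-course data table; the "Other property" record is a placeholder.

```python
import csv
import io

# Hypothetical records; field names are illustrative, not the OCHEM schema.
records = [
    {"name": "3H-1,2-dithiole-3-thione", "property": "CYP450 Modulation", "article": "19126641", "value": "Inhibitor"},
    {"name": "Voriconazole", "property": "CYP450 Modulation", "article": "19029318", "value": "Noninhibitor"},
    {"name": "Mexiletine", "property": "CYP450 Modulation", "article": "9690950", "value": "Inhibitor"},
    # The same measurement entered twice, e.g. by two different users.
    {"name": "Mexiletine", "property": "CYP450 Modulation", "article": "9690950", "value": "Inhibitor"},
    # Placeholder record for a different property; it should be filtered out.
    {"name": "Mexiletine", "property": "Other property", "article": "9690950", "value": "n/a"},
]


def build_training_set(records, prop, articles=None):
    """Keep records of one property (optionally from given articles), skipping exact duplicates."""
    seen = set()
    selected = []
    for rec in records:
        if rec["property"] != prop:
            continue
        if articles is not None and rec["article"] not in articles:
            continue
        # Assumed duplicate key: same molecule, property, article and value.
        key = (rec["name"], rec["property"], rec["article"], rec["value"])
        if key in seen:
            continue
        seen.add(key)
        selected.append(rec)
    return selected


def to_csv(rows):
    """Serialize the selected records as CSV text (the property column is dropped)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "article", "value"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Here `build_training_set(records, "CYP450 Modulation")` keeps three records: the repeated Mexiletine entry and the placeholder record are dropped.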
Data structure
Simplified data structure

      [Diagram: simplified data structure linking the entities Records, Properties, Conditions, Molecules, Users, Units, Articles and Journals]
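The diagram's entities could be modeled roughly as follows. This is a hypothetical Python sketch using dataclasses: the entity names come from the diagram, but the fields and the links between entities are assumptions (the diagram's connections did not survive), and conditions are simplified to a name-to-value mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Journal:
    title: str


@dataclass
class Article:
    title: str
    journal: Journal
    pubmed_id: Optional[int] = None


@dataclass
class Unit:
    name: str


@dataclass
class Property:
    name: str
    units: List[Unit] = field(default_factory=list)  # allowed measurement units


@dataclass
class User:
    login: str


@dataclass
class Molecule:
    name: Optional[str] = None
    smiles: Optional[str] = None


@dataclass
class Record:
    """One experimental value for one molecule, always tied to a source article."""
    molecule: Molecule
    property: Property
    value: str
    article: Article  # mandatory reference to the source article
    introducer: User  # the user who entered the record
    unit: Optional[Unit] = None
    # Measurement conditions, e.g. {"CYP450 Type": "CYP1A2"}
    conditions: Dict[str, str] = field(default_factory=dict)
```

A record for the mexiletine entry of the practical course would link a `Molecule`, the "CYP450 Modulation" `Property`, the source `Article` and the condition `{"CYP450 Type": "CYP1A2"}`.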
User interface conventions

Browser-based interface
User interface conventions

Icons
        Edit current record (item, article, unit, etc.)

        Delete current record

        Open a record-specific submenu (in most places) or view a profile (in some places)

        Open a wiki page with additional explanations

        Send a message to the user

        Download data in XLS format

        Select item
Summary

    The database currently contains:

          More than 50000 records

          Around 285 properties

          More than 2700 articles
Thank you
Practical course - outline

•      Collection of data from original literature

•      Use of publicly available tools for literature and chemical structure
       lookup

•      Entering data into OCHEM (single record)

•      Collection of data from benchmark literature

•      Entering data into OCHEM (batch upload)
Practical course – collection of data – before we start

      Article name       PubMedID     Compound name       Value
  1
  2
  3
  4
  5
Practical course – collection of data

The goal: collect data on CYP450 1A2 (CYP1A2) inhibitors and noninhibitors

Cytochrome P450 (abbreviated CYP, P450, CYP450) is a very large and diverse
      superfamily of hemoproteins found in all domains of life. © Wikipedia

PubMed search terms: CYP1A2 inhibition
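The same PubMed search can also be run programmatically through NCBI's E-utilities `esearch` endpoint. This is a minimal sketch using only the Python standard library; `retmax` limits how many PubMed IDs are returned. NCBI limits request rates (higher limits require an API key), which this sketch does not handle.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an esearch query URL for PubMed that asks for a JSON response."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def parse_pmids(payload):
    """Extract the list of PubMed IDs (as strings) from a decoded esearch JSON response."""
    return payload["esearchresult"]["idlist"]


def search_pubmed(term, retmax=20):
    """Run the search over the network and return matching PubMed IDs."""
    with urlopen(esearch_url(term, retmax)) as response:
        return parse_pmids(json.load(response))
```

`search_pubmed("CYP1A2 inhibition")` returns PubMed IDs as strings; the result list changes as PubMed grows, so it will not exactly match the articles chosen for the course.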
Practical course – data collection

      PubMedID   Compound name                                    CYP Modulation   Article name
  1   19126641   3H-1,2-dithiole-3-thione                         Inhibitor        Chemical genomics of cancer chemopreventive dithiolethiones
      19126641   4-methyl-5-pyrazinyl-3H-1,2-dithiole-3-thione    Noninhibitor
      19126641   5-tert-butyl-3H-1,2-dithiole-3-thione            Noninhibitor
  2   19029318   Voriconazole                                     Noninhibitor     Comprehensive in vitro analysis of voriconazole inhibition of eight cytochrome P450 (CYP) enzymes: major effect on CYPs 2B6, 2C9, 2C19, and 3A
  3   9690950    Mexiletine                                       Inhibitor        Involvement of CYP1A2 in mexiletine metabolism
  4   9278209    Indinavir                                        Noninhibitor     Differential inhibition of cytochrome P450 isoforms by the protease inhibitors, ritonavir, saquinavir and indinavir
  5   16669850   Clorgyline                                       Inhibitor        An evaluation of potential mechanism-based inactivation of human drug metabolizing cytochromes P450 by monoamine oxidase inhibitors, including isoniazid
Practical course – data entry – cheat sheet

A good chemistry lookup engine: PubChem (find the URL via Google.com)

We search by name and want to get the structure

A convenient structure representation: SMILES

Property: CYP450 Modulation

Condition: CYP450 Type = CYP1A2
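The name-to-structure lookup can be automated with PubChem's PUG REST interface. This is a minimal sketch: the URL pattern `compound/name/<name>/property/<property>/JSON` follows PUG REST, but PubChem has revised its SMILES property names over time, so the requested property name is an assumption to check against the current documentation, and the parser accepts any returned field whose name ends in `SMILES`. Names are percent-encoded before going into the URL; names with unusual characters may be better sent by POST (see the PUG REST documentation).

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_url(name, prop="CanonicalSMILES"):
    """Build a PUG REST URL that looks up a compound by name and requests a SMILES property.

    The default property name is an assumption; check the current PUG REST docs.
    """
    return f"{PUG}/compound/name/{quote(name, safe='')}/property/{prop}/JSON"


def parse_smiles(payload):
    """Return the first SMILES-like field of the first compound in a PUG REST property table."""
    properties = payload["PropertyTable"]["Properties"][0]
    for key, value in properties.items():
        if key.endswith("SMILES"):
            return value
    raise KeyError("no SMILES field in PubChem response")


def name_to_smiles(name):
    """Look up a compound name on PubChem (network access required) and return its SMILES."""
    with urlopen(smiles_url(name)) as response:
        return parse_smiles(json.load(response))
```

For instance, `name_to_smiles("Mexiletine")` would return the SMILES string that PubChem stores for the first compound matching that name.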
Practical course – batch data entry – template

•   CASRN — CAS Registry Number
•   SMILES — SMILES string
•   NAME — molecule name
•   ARTICLEID — article identifier (PubMed or OCHEM)
•   PAGE — article page
•   TABLE — article table
•   LINE — article line
•   COMMENT — text comment
•   REFERENCE — record reference
•   CYP450 Modulation — value of the property
•   Unit — measurement unit of the property
•   Accuracy — measurement accuracy
•   Interval — measurement interval
•   CYP450 Type — record condition
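Rows from the data-collection table can be arranged in the batch-upload column layout programmatically. This is a hypothetical sketch: the column names come from the template list above, but writing CSV (rather than filling the downloadable template file) and putting plain PubMed IDs into ARTICLEID are assumptions. SMILES is left empty so it can be filled in from a PubChem lookup.

```python
import csv
import io

# Column order follows the batch upload template.
COLUMNS = ["CASRN", "SMILES", "NAME", "ARTICLEID", "PAGE", "TABLE", "LINE",
           "COMMENT", "REFERENCE", "CYP450 Modulation", "Unit", "Accuracy",
           "Interval", "CYP450 Type"]

# (compound name, PubMed ID, CYP450 Modulation) from the data-collection table.
collected = [
    ("3H-1,2-dithiole-3-thione", "19126641", "Inhibitor"),
    ("4-methyl-5-pyrazinyl-3H-1,2-dithiole-3-thione", "19126641", "Noninhibitor"),
    ("5-tert-butyl-3H-1,2-dithiole-3-thione", "19126641", "Noninhibitor"),
    ("Voriconazole", "19029318", "Noninhibitor"),
    ("Mexiletine", "9690950", "Inhibitor"),
    ("Indinavir", "9278209", "Noninhibitor"),
    ("Clorgyline", "16669850", "Inhibitor"),
]


def batch_rows(entries):
    """Yield one template row per compound; unused columns stay empty."""
    for name, pubmed_id, modulation in entries:
        row = dict.fromkeys(COLUMNS, "")
        row.update({"NAME": name, "ARTICLEID": pubmed_id,
                    "CYP450 Modulation": modulation, "CYP450 Type": "CYP1A2"})
        yield row


def write_batch(entries):
    """Return the batch as CSV text; names containing commas are quoted automatically."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(batch_rows(entries))
    return buffer.getvalue()
```

Every row carries the condition CYP450 Type = CYP1A2, as the cheat sheet requires.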
Practical course – batch data entry – cheat sheet
   •   Article URL: http://tinyurl.com/rendic
   •   Article title: «Summary of information on human CYP enzymes:
       human P450 metabolism data»
   •   A good chemistry lookup engine: PubChem (find the URL via Google.com)
   •   We search by name and want to get the structure
   •   A convenient structure representation: SMILES
   •   Property: CYP450 Modulation
   •   Condition: CYP450 Type = CYP1A2
   •   Reference = 1
   •   ArticleID = Q1592
   •   Batch upload template URL: http://tinyurl.com/bu-template
Thank you (once more)
