Towards a comprehensive computational platform for next-generation drug development: A Russian-German joint venture

Edgar Wingender, CEO

geneXplain GmbH, Am Exer 10b, Wolfenbüttel
http://www.genexplain.com
We aim to provide a comprehensive platform of bioinformatics, systems biology and cheminformatics
tools for personalized medicine and pharmacogenomics.
[Figure: genes, proteins, networks, compounds]
Some facts about geneXplain:

  Founded in April 2010; active business since July 2010
  International (German-Russian) shareholder structure
  Managing directors: E. Wingender (CEO), A. Kel (CSO)
  Product portfolio in bioinformatics, systems biology and cheminformatics
  Development close to science and research
  Participation in international and national research consortia:
    - SYSCOL (EU FP7)
    - GERONTOSHIELDS (BMBF)
    - TEMPUS (EU)
The idea:


  Providing a platform of methods for biomedical research
  Focus: drug development
  Complete pipeline from high-throughput data to a lead structure
  High-throughput data:
       Genomics
       Transcriptomics
       Proteomics
  Public-private partnership
GeneXplain™ Platform: A Workflow for Drug Discovery

The geneXplain platform™ is a new product integrating bio- and cheminformatics tools for pharmacogenomics. It provides a drug discovery workflow that leads from the statistical analysis of biological high-throughput data to a panel of potential lead compounds for further validation.

Within the geneXplain platform™, the identification of drug target protein molecules by bioinformatics and systems biology methods is complemented by the prediction of biological activities and adverse effects of chemical compounds, based on multilevel neighborhoods of atoms (MNA) descriptors.

The workflow

The incorporated statistical analyses help to identify relevant genes or proteins in the raw data, e.g. those that are differentially expressed. The Bioinformatics block reveals potential regulation of genes by transcription factors or miRNAs. Systems biology approaches analyze networks of molecular events and suggest promising drug target molecules and their mechanisms of action. The integrated PASS tool helps to direct compound screening by pre-selecting chemicals with desirable effects and without adverse or toxic effects.

  Statistics
     Input: High-throughput data from patients (genomics, transcriptomics, ChIP-seq, proteomics, etc.)
     Output: List of relevant genes or proteins

  Bioinformatics
     Alternative input: Any pre-processed list of genes or proteins from own experiments, literature or databases
     Search for regulatory modules in any genomic region
     Output: List of transcription factors potentially responsible for the observed (co-)regulation of genes
     Result: Hypotheses about gene regulators essential for the studied process

  Systems Biology
     Alternative input: Any list of transcription factors; any list of genes or proteins from own experiments, literature or databases, to be mapped onto known pathways
     Topological analysis of the networks upstream of transcription factors, simulation of the network behavior, patient stratification
     Output: List of potential master regulators
     Result: Hypotheses about target molecules and their role in the studied process

  Cheminformatics
     Prediction of biological activities of the compounds; selection of compounds with the required effects and without adverse or toxic effects
     Output: List of potential lead structures for validation
     Result: Hypotheses for validation and clinical trials

  Overall: Systematic generation of statistically significant hypotheses
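The Statistics block reduces raw measurements to a list of relevant genes. The slides do not say which tests the platform uses, so the following is only a minimal Python sketch under the assumption that per-gene p-values (for example from Welch's t-test) and log2 fold changes have already been computed. It applies the Benjamini-Hochberg false discovery rate correction and a fold-change cutoff; the function names, gene IDs and cutoffs are illustrative and not part of geneXplain.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # q-values are capped at 1 and must not increase with rank
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(
    genes: Sequence[str],
    pvalues: Sequence[float],
    log2_fc: Sequence[float],
    fdr: float = 0.05,
    min_abs_log2_fc: float = 1.0,
) -> List[str]:
    """Genes passing both an FDR cutoff and a minimum absolute log2 fold change."""
    qvalues = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, qvalues, log2_fc)
        if q <= fdr and abs(lfc) >= min_abs_log2_fc
    ]


if __name__ == "__main__":
    # Toy values for four hypothetical genes.
    print(differentially_expressed(
        ["G1", "G2", "G3", "G4"],
        [0.001, 0.02, 0.04, 0.5],
        [2.0, -1.5, 3.0, 0.1],
    ))  # ['G1', 'G2']
```

G3 has a large fold change but drops out because its adjusted p-value (0.04 × 4 / 3 ≈ 0.053) exceeds the 5% FDR; combining both criteria is a common design choice, not one the slides prescribe.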
Proof of concept: Net2Drug consortium
(EU FP6, Coordinator: A. Kel)

  Transcriptomics of a breast cancer cell line
  Statistical evaluation
  Integrated bioinformatic analysis (promoter & pathway analysis)
  Systems biological simulation
  Cheminformatic identification of candidate drugs

  Results:
  Out of 24 million compounds, 16 substances proved suitable for experimental testing.
  For 2 compounds, highly specific activities were found.
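The cheminformatic step narrows a very large compound library to a short list. PASS reports, for each compound and activity, a probability of being active (Pa) and of being inactive (Pi). Below is a minimal sketch of a PharmaExpert-style selection over such predictions, assuming they are already available as an in-memory table. The table layout, function names, activity names, Pa cutoff and toy values are hypothetical; this is neither the PharmaExpert interface nor the selection procedure used in the Net2Drug study.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: compound ID -> activity name -> (Pa, Pi)
Predictions = Dict[str, Dict[str, Tuple[float, float]]]


def predicted_active(pa: float, pi: float, pa_cutoff: float) -> bool:
    """Treat an activity as predicted when Pa > Pi and Pa reaches the cutoff."""
    return pa > pi and pa >= pa_cutoff


def select_candidates(
    predictions: Predictions,
    required: Sequence[str],
    adverse: Sequence[str],
    pa_cutoff: float = 0.5,
) -> List[str]:
    """Keep compounds predicted to have every required activity and no adverse one."""
    selected = []
    for compound, activities in predictions.items():
        has_required = all(
            name in activities and predicted_active(*activities[name], pa_cutoff)
            for name in required
        )
        # Adverse effects are flagged conservatively: any Pa > Pi counts.
        has_adverse = any(
            name in activities and predicted_active(*activities[name], 0.0)
            for name in adverse
        )
        if has_required and not has_adverse:
            selected.append(compound)
    return sorted(selected)


if __name__ == "__main__":
    # Toy predictions, not real PASS output.
    preds = {
        "cmpd_A": {"Antineoplastic": (0.82, 0.01), "Hepatotoxic": (0.10, 0.40)},
        "cmpd_B": {"Antineoplastic": (0.75, 0.02), "Hepatotoxic": (0.55, 0.05)},
        "cmpd_C": {"Antineoplastic": (0.30, 0.20)},
    }
    print(select_candidates(preds, ["Antineoplastic"], ["Hepatotoxic"]))  # ['cmpd_A']
```

Using a stricter cutoff for wanted activities than for unwanted ones reflects the asymmetry of screening: a missed side effect is usually costlier than a missed candidate.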
The cheminformatics portfolio:

  PASS
   predicts biological activities of chemical compounds from their structural formulae; assigns
   probability values to each activity and identifies the parts of the molecule that are responsible
   for that activity

  PharmaExpert
   mines large amounts of predictions generated by PASS to select the compounds that
   optimally fit user-defined requirements

  GUSAR
   generates quantitative structure-activity relationship (QSAR) models
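The activity predictions rest on multilevel neighborhoods of atoms (MNA) descriptors. The sketch below illustrates only the recursive idea, assuming a molecule given as a hydrogen-suppressed graph of element labels: the level-0 descriptor of an atom is its label, and the level-k descriptor is the label followed, in parentheses, by the sorted level-(k-1) descriptors of its neighbors. Real PASS descriptors encode more atom information than the bare element symbol, so this is a simplified illustration, not a reimplementation; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def mna_like_descriptors(
    atoms: Sequence[str], bonds: Sequence[Tuple[int, int]], level: int
) -> List[str]:
    """Simplified MNA-style descriptors of a molecular graph at the given level.

    Level 0 is the atom label; level k is the label followed, in parentheses,
    by the sorted level-(k-1) descriptors of the atom's neighbors.
    """
    neighbors = [[] for _ in atoms]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    current = list(atoms)  # level-0 descriptors
    for _ in range(level):
        # Each new descriptor is built only from the previous level's descriptors.
        current = [
            atoms[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(atoms))
        ]
    # The molecule is represented by its set of distinct atom descriptors.
    return sorted(set(current))


if __name__ == "__main__":
    # Heavy-atom graph of ethanol: C-C-O
    print(mna_like_descriptors(["C", "C", "O"], [(0, 1), (1, 2)], 1))
    # ['C(C)', 'C(CO)', 'O(C)']
```

Sorting the neighbor descriptors makes the result independent of atom numbering, which is what allows the same substructure to produce the same descriptor in different molecules.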
How to get there:
                      GeneXplain™ Platform: A Workflow for Drug Discovery
The way:

             The geneXplain platform

  Integrated collection of bioinformatic and systems biological program modules ("Bricks")
     Based on proven BioUML technology
     Statistical analysis of high-throughput data
     Integrated bioinformatic promoter and network analysis
     Systems biological simulation
  Unified look-and-feel
  Workflow management system
  Pre-defined standard workflows
  Easy integration of users' own tools and scripts
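A workflow of "Bricks" is essentially a dependency graph of modules, each consuming the outputs of others. The following minimal Python sketch shows that idea under stated assumptions: a brick is any callable plus the names of its inputs, and a runner executes bricks in dependency order using the standard library's `graphlib.TopologicalSorter`. The `Brick` class and `run_workflow` function are hypothetical and do not represent the BioUML or geneXplain API.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Sequence


class Brick:
    """Hypothetical analysis module: a callable plus the names of its inputs."""

    def __init__(self, name: str, func: Callable[..., Any], inputs: Sequence[str] = ()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)


def run_workflow(bricks: List[Brick], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Run bricks in dependency order; 'initial' supplies data not produced by a brick."""
    by_name = {brick.name: brick for brick in bricks}
    graph = {brick.name: set(brick.inputs) for brick in bricks}
    results = dict(initial)
    # static_order() yields every node after all of its predecessors (inputs).
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # already provided, e.g. raw measurements
            continue
        brick = by_name[name]  # KeyError here means an input was never supplied
        results[name] = brick.func(*(results[i] for i in brick.inputs))
    return results


if __name__ == "__main__":
    # A user-supplied function becomes a brick simply by naming its inputs.
    bricks = [
        Brick("n_selected", len, ["selected_genes"]),
        Brick("selected_genes",
              lambda fc: [g for g, v in fc.items() if abs(v) >= 1.0],
              ["fold_changes"]),
    ]
    out = run_workflow(bricks, {"fold_changes": {"G1": 2.0, "G2": 0.1, "G3": -1.5}})
    print(out["selected_genes"], out["n_selected"])  # ['G1', 'G3'] 2
```

Because execution order is derived from declared inputs rather than list position, a pre-defined workflow can be extended with an own script without rewriting the rest of the chain.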
Upstream analysis of causes

[Figure: network diagram with a node labeled "Key node"]
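The upstream analysis searches the network above the transcription factors for key nodes that could control many of them at once. The slides do not describe the platform's scoring, so the following is a minimal sketch of one simple way to rank candidates, assuming the network is a list of directed (regulator, target) edges: a breadth-first search against edge direction from each transcription factor, counting how many factors each upstream node reaches within a depth limit. The function name, node names and depth limit are illustrative assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def upstream_key_nodes(
    edges: Iterable[Tuple[str, str]], tfs: Iterable[str], max_depth: int = 3
) -> List[Tuple[str, int]]:
    """Rank nodes by how many of the given TFs they reach within max_depth steps."""
    regulators: Dict[str, List[str]] = defaultdict(list)  # target -> its regulators
    for source, target in edges:
        regulators[target].append(source)
    reached: Dict[str, Set[str]] = defaultdict(set)  # upstream node -> TFs it reaches
    for tf in tfs:
        # Breadth-first search against edge direction, starting at the TF.
        seen = {tf}
        queue = deque([(tf, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for regulator in regulators[node]:
                if regulator not in seen:
                    seen.add(regulator)
                    reached[regulator].add(tf)
                    queue.append((regulator, depth + 1))
    # Most TFs reached first; ties broken alphabetically.
    return sorted(((n, len(t)) for n, t in reached.items()), key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    toy_edges = [("K1", "TF1"), ("K1", "TF2"), ("K2", "TF2"),
                 ("R", "K1"), ("R", "K2"), ("X", "TF3")]
    print(upstream_key_nodes(toy_edges, ["TF1", "TF2", "TF3"], max_depth=2))
    # [('K1', 2), ('R', 2), ('K2', 1), ('X', 1)]
```

A pure reachability count ignores edge signs and network size; a production scoring would typically correct for nodes that reach many genes simply because they are highly connected.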
The geneXplain platform

            Public Private Partnership
 Clash of cultures:
      Cheminformatics: commercial approaches accepted
      Bioinformatics: public domain prevalent (Internet culture)
 Advantages of public-domain services:
      Latest state of the art
      Visibility ("marketing" through publications, conference talks, etc.)
      High acceptance
 Disadvantages of public-domain services:
      No unified look-and-feel
      Low user-friendliness
      Poor support
      Uncertainty among users without expertise
      Uncertain long-term perspective
 The disadvantages of public-domain services are the advantages of a commercial offering
 Optimal: a combination of free and commercial tools
 Business model:
      Platform with integrated free and proprietary offerings
      Paid access
      Paid support
 Advantages for the user
      Standardized interface
      Integrated workflows
      Default parameterizations by experts
      Selection of free modules by experts in the field
      Selection of proprietary, usually low-price modules by the user
      Full cost-control by the user
www.genexplain.com


Contact:

Edgar Wingender            edgar.wingender@genexplain.com

German-Russian Workshop 2011, geneXplain

  • 1.
    Towards a comprehensive computationalplatform for next generation drug development – A Russian‐German joint venture Edgar Wingender CEO Wolfenbüttel, Am Exer 10b GmbH http://www.genexplain.com
  • 2.
    We aim toprovide a comprehensive platform of bioinformatics, systems biological and cheminformatics tools for a personalized medicine and pharmacogenomics
  • 3.
    Some facts aboutgeneXplain:  Founded in April 2010, starting active business July 2010  International (German-Russian) shareholder structure  Managing directors: E. Wingender (CEO), A. Kel (CSO)  Product portfolio in bioinformatics, systems biology, cheminformatics  Development close to science and research  Participation in international and national research consortia - SYSCOL (EU FP7) - GERONTOSHIELDS (BMBF)
  • 4.
    proteins compounds networks genes
  • 5.
    Some facts aboutgeneXplain:  Founded in April 2010, starting active business July 2010  International (German-Russian) shareholder structure  Managing directors: E. Wingender (CEO), A. Kel (CSO)  Product portfolio in bioinformatics, systems biology, cheminformatics  Close to science and research  Participation in international and national research consortia - SYSCOL (EU FP7) - GERONTOSHIELDS (BMBF) - TEMPUS (EU)
  • 6.
    The idea: Providing a platform of methods for  Biomedical research  Focus: drug development  Complete pipeline from high-throughput data to a lead structure  High-throughput data:  Genomics  Transkriptomics  Proteomics  Public private partnership
  • 7.
    GeneXplainTM Platform: AWorkflow for Drug Discovery The geneXplain platformTM is a new product integrating bio- and cheminformatics tools for pharmacogenomics. It provides a drug discovery workflow that guides from the statistical analysis of biological high-throughput data to a panel of potential lead compounds for further validation. Statistics Input: High-throughput data from patients (genomics, Within the geneXplain platformTM, transcriptomics, ChIP-seq, identification of drug target protein proteomics, etc.) molecules by bioinformatics and Output: List of relevant genes or systems biology methods, is proteins complemented by prediction of Any pre-processed list of biological activities and adverse genes or proteins from Bioinformatics effects for chemical compounds, own experiments, from Search for regulatory modules in any literature or databases based on multilevel neighborhoods genomic regions of atoms (MNA) descriptors. Output: List of transcription factors potentially responsible for the observed (co-)regulation of genes Any list of transcription factors; any list of genes or proteins from own experiments, from literature Systems Biology The workflow or databases to be mapped on Topological analysis of the networks known pathways The incorporated statistical upstream of transcription factors, analyses help to identify relevant simulation of the network behavior, genes or proteins in the raw patient stratification data, e.g. those that are Hypotheses about gene regulators Output: List of potential master regulators differentially expressed. essential for the The Bioinformatics block allows studied process to reveal potential regulation of genes by transcription factors or miRNAs. Systems biology approaches analyze networks of molecular Cheminformatics events and suggest promising Prediction of biological activities of the Hypotheses compounds, selection of compounds with drug target molecules and their about target mechanisms of action. 
required effects and without adverse or molecules and The integrated PASS tool enables their role in the toxic effects. to direct compound screening by studied process Output: List of potential lead structures pre-selection of chemicals with Hypotheses for for validation desirable and without adverse or validations and clinical toxic effects. trials Systematic generation of statistically significant hypotheses
  • 8.
    Proof of concept: Net2Drug consortium EU FP6, Coordinator: A. Kel Transcriptomics breast cancer cell line Statistical evaluation Integrated bioinformatic analysis (promoter & pathway analysis) Systems biological simulation Cheminformatic identification of candidate drugs
  • 9.
    Proof of concept: Net2Drug consortium EU FP6, Coordinator: A. Kel Transcriptomics breast cancer cell line Results: Statistical evaluation Out of 24 million compounds, 16 substances turned out to be feasibleIntegrated bioinformatic analysis for experimental testing. (promoter & pathway analysis) For 2 compounds, highly specific activities were found. Systems biological simulation Cheminformatic identification of candidate drugs
  • 10.
    GeneXplainTM Platform: AWorkflow for Drug Discovery The geneXplain platformTM is a new product integrating bio- and cheminformatics tools for pharmacogenomics. It provides a drug discovery workflow that guides from the statistical analysis of biological high-throughput data to a panel of potential lead compounds for further validation. Statistics Input: High-throughput data from patients (genomics, Within the geneXplain platformTM, transcriptomics, ChIP-seq, identification of drug target protein proteomics, etc.) molecules by bioinformatics and Output: List of relevant genes or systems biology methods, is proteins complemented by prediction of Any pre-processed list of biological activities and adverse genes or proteins from Bioinformatics effects for chemical compounds, own experiments, from Search for regulatory modules in any literature or databases based on multilevel neighborhoods genomic regions of atoms (MNA) descriptors. Output: List of transcription factors potentially responsible for the observed (co-)regulation of genes Any list of transcription factors; any list of genes or proteins from own experiments, from literature Systems Biology The workflow or databases to be mapped on Topological analysis of the networks known pathways The incorporated statistical upstream of transcription factors, analyses help to identify relevant simulation of the network behavior, genes or proteins in the raw patient stratification data, e.g. those that are Hypotheses about gene regulators Output: List of potential master regulators differentially expressed. essential for the The Bioinformatics block allows studied process to reveal potential regulation of genes by transcription factors or miRNAs. Systems biology approaches analyze networks of molecular Cheminformatics events and suggest promising Prediction of biological activities of the Hypotheses compounds, selection of compounds with drug target molecules and their about target mechanisms of action. 
required effects and without adverse or molecules and The integrated PASS tool enables their role in the toxic effects. to direct compound screening by studied process Output: List of potential lead structures pre-selection of chemicals with Hypotheses for for validation desirable and without adverse or validations and clinical toxic effects. trials Systematic generation of statistically significant hypotheses
  • 11.
    The cheminformatics portfolio:  PASS predicts biological activities of chemical compounds from their structural formulae; assigns probability values to each activity and identifies those parts of the molecule that are responsible for this activitiy  PharmaExpert mines large amounts of predictions generated by PASS to filter out those compounds that optimaly fit user-defined requirements  GUSAR generates quantitative structure-activity relationship (QSAR) models
  • 12.
    How to get there: GeneXplain™ Platform: A Workflow for Drug Discovery

    The geneXplain platform™ is a new product integrating bio- and cheminformatics tools for pharmacogenomics. It provides a drug discovery workflow that guides the user from the statistical analysis of biological high-throughput data to a panel of potential lead compounds for further validation. Within the geneXplain platform™, the identification of drug target molecules by bioinformatics and systems biology methods is complemented by the prediction of biological activities and adverse effects of chemical compounds, based on multilevel neighborhoods of atoms (MNA) descriptors.

    The workflow: The incorporated statistical analyses help to identify relevant genes or proteins in the raw data, e.g. those that are differentially expressed. The Bioinformatics block reveals potential regulation of genes by transcription factors or miRNAs. Systems biology approaches analyze networks of molecular events and suggest promising drug target molecules and their mechanisms of action. The integrated PASS tool directs compound screening by pre-selecting chemicals with desirable effects and without adverse or toxic effects.

    Statistics
    - Input: high-throughput data from patients (genomics, transcriptomics, ChIP-seq, proteomics, etc.)
    - Output: list of relevant genes or proteins

    Bioinformatics
    - Input: any pre-processed list of genes or proteins from own experiments, from literature or databases
    - Search for regulatory modules in any genomic regions
    - Output: list of transcription factors potentially responsible for the observed (co-)regulation of genes
    - Hypotheses about gene regulators

    Systems Biology
    - Input: any list of transcription factors; any list of genes or proteins from own experiments, from literature or databases, to be mapped on known pathways
    - Topological analysis of the networks upstream of transcription factors, simulation of the network behavior, patient stratification
    - Output: list of potential master regulators essential for the studied process
    - Hypotheses about target molecules and their role in the studied process

    Cheminformatics
    - Prediction of biological activities of the compounds; selection of compounds with the required effects and without adverse or toxic effects
    - Output: list of potential lead structures for validation
    - Hypotheses for validations and clinical trials

    Systematic generation of statistically significant hypotheses
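MNA descriptors describe each atom by its label followed by the sorted descriptors of its bonded neighbours, built up recursively to a chosen level. The recursion is shown in the simplified sketch below. It assumes a hydrogen-suppressed molecular graph given as a list of element labels plus bonds as index pairs. The function name is hypothetical, and the descriptors actually used by PASS follow additional atom-labelling conventions that are not reproduced here.

```python
from collections import defaultdict


def mna_descriptors(atoms, bonds, level):
    """Return a simplified MNA-style descriptor of the given level for every atom.

    atoms: list of atom labels, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs of bonded atoms
    """
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Level 0: each atom is described by its label alone
    current = list(atoms)
    for _ in range(level):
        # Level k: label + sorted level-(k-1) descriptors of the neighbouring atoms
        current = [
            atoms[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(atoms))
        ]
    return current


# Ethanol without hydrogens: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
print(mna_descriptors(atoms, bonds, 1))  # → ['C(C)', 'C(CO)', 'O(C)']
print(mna_descriptors(atoms, bonds, 2))  # → ['C(C(CO))', 'C(C(C)O(C))', 'O(C(CO))']
```

The set of distinct descriptors over all atoms then serves as the structural representation of the molecule from which activities are predicted.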
  • 13.
    The way: The geneXplain platform
    - Integrated collection of bioinformatic and systems biological program modules ("Bricks")
    - Based on proven BioUML technology
    - Statistical analysis of high-throughput data
    - Integrated bioinformatic promoter and network analysis
    - Systems biological simulation
    - Unified look-and-feel
    - Workflow management system
    - Pre-defined standard workflows
    - Easy integration of own tools and scripts
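The idea of chaining modules ("Bricks") into a workflow, with the user's own scripts plugged in alongside standard steps, can be pictured with the minimal sketch below. It assumes each brick is a function that reads named inputs from a shared state and returns new named outputs. The `Brick` and `Workflow` classes, the example bricks and the gene values are hypothetical illustrations and do not represent the BioUML or geneXplain API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]


@dataclass
class Brick:
    name: str
    run: Callable[[Data], Data]  # reads inputs from the state, returns new outputs
    requires: List[str]          # names of inputs that must exist before running


class Workflow:
    def __init__(self, bricks: List[Brick]):
        self.bricks = list(bricks)

    def execute(self, data: Data) -> Data:
        state = dict(data)
        for brick in self.bricks:
            missing = [k for k in brick.requires if k not in state]
            if missing:
                raise KeyError(f"{brick.name} is missing inputs: {missing}")
            state.update(brick.run(state))  # outputs become inputs for later bricks
        return state


def select_regulated(state: Data) -> Data:
    # Keep genes whose absolute log fold change is at least 1
    return {"genes": sorted(g for g, lfc in state["log_fold_change"].items() if abs(lfc) >= 1.0)}


def count_genes(state: Data) -> Data:
    # Stands in for a user's own script added to the workflow
    return {"n_genes": len(state["genes"])}


wf = Workflow([
    Brick("select_regulated", select_regulated, ["log_fold_change"]),
    Brick("count_genes", count_genes, ["genes"]),
])
result = wf.execute({"log_fold_change": {"TP53": 1.8, "MYC": -1.2, "ACTB": 0.1}})
print(result["genes"], result["n_genes"])  # → ['MYC', 'TP53'] 2
```

Declaring each brick's required inputs lets the workflow fail early with a clear message when a step is placed before the step that produces its data.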
  • 14.
    Upstream analysis of causes
    [Figure: key node]
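The principle of an upstream search for key nodes can be shown with the simplified sketch below. Starting from the transcription factors found in the bioinformatics step, it walks backwards through a directed regulatory network and ranks upstream nodes by how many of those factors they can reach within a limited number of steps. The network, node names and depth limit are hypothetical, and the platform's own key-node scoring is not reproduced here.

```python
from collections import defaultdict, deque


def upstream_within(graph_rev, start, max_depth):
    """All nodes from which `start` can be reached in at most max_depth steps."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # do not search further upstream than allowed
        for up in graph_rev.get(node, ()):
            if up not in seen:
                seen[up] = seen[node] + 1
                queue.append(up)
    del seen[start]
    return set(seen)


def rank_key_nodes(edges, tfs, max_depth=3):
    """Rank upstream nodes by the number of transcription factors they reach."""
    graph_rev = defaultdict(list)  # target -> list of its regulators
    for regulator, target in edges:
        graph_rev[target].append(regulator)
    coverage = defaultdict(set)
    for tf in tfs:
        for node in upstream_within(graph_rev, tf, max_depth):
            coverage[node].add(tf)
    # Most transcription factors covered first; ties broken by name
    return sorted(((n, len(c)) for n, c in coverage.items()), key=lambda x: (-x[1], x[0]))


# Hypothetical regulatory edges (regulator, target)
edges = [("KinaseA", "TF1"), ("KinaseA", "TF2"), ("ReceptorR", "KinaseA"), ("KinaseB", "TF3")]
tfs = ["TF1", "TF2", "TF3"]
print(rank_key_nodes(edges, tfs, max_depth=2))
# → [('KinaseA', 2), ('ReceptorR', 2), ('KinaseB', 1)]
```

A breadth-first search is used so that the depth limit counts regulatory steps exactly; nodes reaching many of the observed transcription factors are candidates for master regulators.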
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
    The geneXplain platform: Public Private Partnership
    - Clash of cultures:
      - Cheminformatics: commercial approaches accepted
      - Bioinformatics: public domain prevalent (Internet culture)
    - Advantages of public-domain services:
      - Latest state of the art
      - Visibility ("marketing" through publications, conference talks, etc.)
      - High acceptance
    - Disadvantages of public-domain services:
      - No unified look-and-feel
      - Low user-friendliness
      - Poor support
      - Uncertainty for users without expertise
      - Uncertain long-term perspective
  • 20.
    The geneXplain platform: Public Private Partnership
    - The disadvantages of public-domain services are the advantages of a commercial offering
    - Optimal: a combination of free and commercial tools
    - Business model:
      - Platform with integrated free and proprietary offerings
      - Paid access
      - Paid support
  • 21.
    The geneXplain platform: Public Private Partnership
    - Advantages for the user:
      - Standardized interface
      - Integrated workflows
      - Default parametrizations by experts
      - Selection of free modules by experts in the field
      - Selection of proprietary, usually low-price modules by the user
      - Full cost control by the user
  • 22.
    www.genexplain.com
    Contact: Edgar Wingender, edgar.wingender@genexplain.com