Discovering a novel drug is an optimization challenge across an array of chemical and biological attributes, with the goal of reaching the desired efficacy and safety profile. The immense complexity of the human body, combined with the astronomically large druggable chemical space, hinders the selection of molecules with such a balanced profile. The medicinal chemistry toolbox therefore embraces any computational technique with predictive power that can focus the chemical space on the most promising candidates for synthesis and testing. These techniques include data analysis tools, physics-based simulations, and approaches driven by the biological target structure or based on the ligand structure [1-3], while the size of the compound collections to process varies from a couple of close analogues up to billions of virtual compounds [4]. This presentation highlights general concepts and techniques applied in computer-aided drug design, focusing on data- and ligand-based computational chemistry approaches, and showcases solutions developed by ChemAxon.
[1] Gisbert Schneider, David E Clark, Angew Chem Int Ed Engl, 2019, 58(32):10792-10803.
[2] John G Cumming, Andrew M Davis, Sorel Muresan, Markus Haeberlein, Hongming Chen, Nat Rev Drug Discov, 2013, 12(12):948-62.
[3] Yu-Chen Lo, Stefano E Rensi, Wen Torng, Russ B Altman, Drug Discov Today, 2018, 23(8):1538-1546.
[4] Torsten Hoffmann, Marcus Gastreich, Drug Discov Today, 2019, 24(5):1148-1156.
9. ~20k non-modified (canonical) human proteins
1 target to engage with
Target, the William Tell’s challenge
https://doi.org/10.1038/nrd892
10. Rubik’s cube for medicinal chemists
- Medicinal chemistry optimization is a multi-dimensional problem
- Each chemical modification corresponds to a series of biological activity changes
- Rubik’s cube has 43 252 003 274 489 856 000 (~10^19) different configurations
- All configurations can be solved in ~20 steps
- Druggable chemical space is ~10^60
https://doi.org/10.1016/j.drudis.2011.05.005
18. Scale
Ki or IC50 or EC50
M [mol/dm3]          ΔG [kJ/mol]   ΔG [kcal/mol]   Affinity
0.1 (100 mM)         -5.7          -1.4            Weak
0.01 (10 mM)         -11.4         -2.7
0.001 (1 mM)         -17.1         -4.1
0.0001 (100 µM)      -22.8         -5.5
0.00001 (10 µM)      -28.5         -6.8            Medium
1.00E-06 (1 µM)      -34.2         -8.2
1.00E-07 (100 nM)    -39.9         -9.5            Strong
1.00E-08 (10 nM)     -45.6         -10.9
1.00E-09 (1 nM)      -51.3         -12.3
1.00E-10 (100 pM)    -57.0         -13.6           Very strong
1.00E-11 (10 pM)     -62.8         -15.0
1.00E-12 (1 pM)      -68.5         -16.4
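The ΔG columns of the scale follow from the relation ΔG = RT ln K. The sketch below recomputes them; the gas constant R = 8.314 J/(mol·K) and the temperature T = 298 K are assumptions (the slide states neither), chosen because they appear to reproduce the tabulated figures. The function names are illustrative only.

```python
import math

R_KJ = 8.314e-3        # gas constant in kJ/(mol*K); assumed, not given on the slide
T_KELVIN = 298.0       # assumed temperature in K; not given on the slide
KJ_PER_KCAL = 4.184    # thermochemical calorie

def delta_g_kj(k_molar, temperature=T_KELVIN):
    """Binding free energy in kJ/mol from Ki (or IC50/EC50) in mol/dm3: dG = RT ln K."""
    return R_KJ * temperature * math.log(k_molar)

def delta_g_kcal(k_molar, temperature=T_KELVIN):
    """Same quantity converted to kcal/mol."""
    return delta_g_kj(k_molar, temperature) / KJ_PER_KCAL

if __name__ == "__main__":
    # One row per decade, from 100 mM down to 1 pM.
    for exponent in range(1, 13):
        k = 10.0 ** -exponent
        print(f"{k:8.0e} M  {delta_g_kj(k):7.1f} kJ/mol  {delta_g_kcal(k):6.1f} kcal/mol")
```

Each tenfold gain in potency corresponds to roughly 5.7 kJ/mol (about 1.4 kcal/mol) under these assumptions.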
19. Ki or IC50 or EC50
Scale
https://doi.org/10.1021/jm100112j
20. Ki of 1 nM.
Replacing the isopropyl group (marked in red) by hydrogen reduces the affinity to 39 μM.
https://doi.org/10.1021/jm100112j
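As a rough check of what this change means on the energy scale, the two Ki values can be converted into a free-energy difference. This is a sketch assuming ΔG = RT ln Ki with R = 8.314 J/(mol·K) and T = 298 K; the temperature is an assumption, not a value from the slide, and the subscripts iPr and H simply label the isopropyl and hydrogen analogues.

```latex
\Delta\Delta G = RT \ln\frac{K_{i,\mathrm{H}}}{K_{i,\mathrm{iPr}}}
= 2.478\ \mathrm{kJ/mol} \times \ln\frac{39 \times 10^{-6}}{1 \times 10^{-9}}
\approx 2.478 \times 10.57\ \mathrm{kJ/mol}
\approx 26.2\ \mathrm{kJ/mol} \approx 6.3\ \mathrm{kcal/mol}
```

Under these assumptions, removing a single isopropyl group costs more than four orders of magnitude in affinity.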
34. “The fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.”
(Paul Dirac, 1929)
47. Alchemical transformation
FEP+
- Hamiltonian replica exchange method
- region surrounding the protein binding pocket is “heated up”
- the rest of the system stays “cold”
- GPU calculation involving ~6000 atoms requires ~6h (4/day)
https://doi.org/10.1007/978-1-4939-9608-7
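The slide does not give the underlying equation. As background, free energy perturbation in its textbook form rests on the Zwanzig exponential-averaging identity, ΔG = -kT ln⟨exp(-ΔU/kT)⟩. The toy sketch below applies that identity to synthetic Gaussian energy differences; it is not the FEP+ implementation, it involves no replica exchange, and the sample data, kT value and function names are assumptions made for illustration.

```python
import math
import random

KT = 2.478  # kJ/mol, i.e. R*T at an assumed ~298 K

def zwanzig_delta_g(delta_u_samples, kt=KT):
    """Free energy difference A -> B from energy differences U_B - U_A sampled in
    state A, using dG = -kT * ln < exp(-dU / kT) >_A."""
    n = len(delta_u_samples)
    # Shift by the minimum before exponentiating for numerical stability.
    u_min = min(delta_u_samples)
    acc = sum(math.exp(-(du - u_min) / kt) for du in delta_u_samples)
    return u_min - kt * math.log(acc / n)

def synthetic_samples(mean, sigma, n, seed=0):
    """Hypothetical stand-in for energy differences taken from a simulation."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]

if __name__ == "__main__":
    samples = synthetic_samples(mean=2.0, sigma=1.0, n=100_000)
    estimate = zwanzig_delta_g(samples)
    analytic = 2.0 - 1.0 ** 2 / (2 * KT)  # closed form for Gaussian-distributed dU
    print(f"estimate {estimate:.3f} kJ/mol, analytic {analytic:.3f} kJ/mol")
```

In practice the transformation is split into many intermediate states because the estimator converges poorly when the two end states overlap little.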
49. 1. Availability of at least one high-quality crystal structure with
co-crystallized series ligand.
2. A reasonable expectation of a conserved binding mode across the
series.
3. Minimal tautomeric, ionization state, and stereochemistry uncertainties
across the series.
4. High reliability experimental binding data from the same assay for all
compounds.
5. Assay data and crystal structures are for the same protein construct.
Constraints
https://doi.org/10.1007/978-1-4939-9608-7
51. In God we trust, all others bring data.
William Edwards Deming
Trevor Hastie, Robert Tibshirani, Jerome Friedman
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
65. Z = (z1, z2, ..., zN), where zi = (xi, yi), is the training set
Draw N samples from Z with replacement; repeat this B times, producing B bootstrap datasets
S(Z) is any quantity computed from the data Z
Bootstrap methods
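The resampling scheme above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the toy dataset, the choice of statistic (mean of y) and the function names are assumptions, not part of the presentation.

```python
import random
import statistics

def bootstrap(data, statistic, n_boot=1000, seed=0):
    """Return S(Z*) for n_boot datasets, each drawn from data with replacement
    and each the same size as the original data."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return replicates

def mean_y(z):
    """Example S(Z): the mean response over (x, y) pairs."""
    return statistics.fmean([y for _, y in z])

if __name__ == "__main__":
    Z = [(x, 2.0 * x + 1.0) for x in range(10)]  # toy (x, y) pairs
    reps = bootstrap(Z, mean_y, n_boot=500)
    # The spread of the replicates estimates the uncertainty of S(Z).
    print(statistics.fmean(reps), statistics.stdev(reps))
```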
71. P(A|B) = P(B|A) × P(A) / P(B)
99% sensitivity
99% specificity
0.5% positive cases
P(TP|+) = 0.99 × 0.005 / [0.99 × 0.005 + 0.01 × 0.995] = 33.2%
Even if the test is positive, there is only a 33.2% chance that it is a true positive.
1000 cases: 995 negative, 5 positive
995 × 0.01 ≈ 10 false positives
5 × 0.99 ≈ 5 true positives
15 positives in total, of which 5 are true positives (33%)
Bayes theorem
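The calculation on this slide can be reproduced directly. The sketch below is a plain restatement of Bayes' theorem for a binary test; the function name is illustrative.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(truly positive | test positive) from Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(+ | positive) * P(positive)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(+ | negative) * P(negative)
    return true_pos / (true_pos + false_pos)

if __name__ == "__main__":
    ppv = positive_predictive_value(0.99, 0.99, 0.005)
    print(f"P(true positive | +) = {ppv:.1%}")  # prints 33.2%
```

The same effect applies to virtual screening: when actives are rare, even an accurate classifier returns hit lists dominated by false positives.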
87. - Linear regression (PLS, LASSO)
- Decision tree (CART) and Random forest
- Support Vector Machine
- Neural Network (Deep, Convolutional Neural Network)
Model building
106. Workflow overview
Training data (SDF, with labelled data)
Training module
Build
- Descriptor generation
- Model building
- Validation
Model management
- Persistence
- Execution
New model
Icon by Aficons from Noun Project
https://disco.chemaxon.com/calculators/trainer-engine/
107. - Feature engineering
ChemAxon descriptors
User defined descriptors
- Model building
Type: regression, classification
Models: RF, SVR, GB, GC
Hyperparameter optimization: pre-optimized preset, optimizer
Precise automatic models: under the hood
Icon by modgekar from Noun Project
108. - Validation statistics and report
Training/test set split
Retrospective accuracy
- Reliability
- Applicability domain: most similar structures
- Prediction error: Conformal prediction
- Overfitting
- Scramble Y
Quality assessment
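The "Scramble Y" check can be illustrated with a toy model: if a model fitted to randomly permuted activities scores almost as well as the real one, the original fit is likely overfitted or due to chance. The sketch below uses a one-descriptor least-squares fit as a stand-in for a real model; the data and function names are assumptions, and this is not the Trainer Engine implementation.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a one-descriptor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_scramble(x, y, n_rounds=100, seed=0):
    """R^2 values obtained after randomly permuting the response vector."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return scores

if __name__ == "__main__":
    rng = random.Random(1)
    x = [i / 10 for i in range(50)]                  # toy descriptor values
    y = [2.0 * a + rng.gauss(0.0, 0.5) for a in x]   # toy activities
    scrambled = y_scramble(x, y)
    print("real R^2:", r_squared(x, y))
    print("mean scrambled R^2:", sum(scrambled) / len(scrambled))
```

A large gap between the real and the scrambled scores is the expected outcome for a meaningful model.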
109. Application Study on ChEMBL
Dataset: Journal of Cheminformatics volume 9, Article number: 45 (2017)
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0232-0
- 163 ChEMBL targets
- Data points in range: 500-4703 per target
- 10%/90% test/training set split
- Target pAct
- Pearson, RMSE
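The evaluation protocol listed above (a random 10%/90% test/training split, scored by Pearson correlation and RMSE on pAct) can be sketched as follows. The helper names and the example values are hypothetical; the study's actual tooling is not shown in the slides.

```python
import math
import random

def split_indices(n, test_fraction=0.1, seed=0):
    """Shuffle 0..n-1 and return (training indices, test indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n * test_fraction))
    return indices[n_test:], indices[:n_test]

def pearson(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

if __name__ == "__main__":
    train, test = split_indices(500)
    print(len(train), len(test))
    observed = [5.1, 6.3, 7.0, 8.2]    # hypothetical pAct values
    predicted = [5.4, 6.0, 7.3, 7.9]   # hypothetical model output
    print(pearson(observed, predicted), rmse(observed, predicted))
```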
118. Specs:
- Using MadFast dev version
- machine: a single Amazon EC2 x1.16xlarge instance (976 GiB RAM, 64 cores, 2T SSD, $6.7 / h on-demand)
- dataset: Enamine REAL 2019q34, 1.2B molecules
- fingerprint: CFP7, 512 bit
Importing:
- importing time was 6h 16m (ran concurrently with an ECFP import, using half of the cores)
- result binary blob: 167 GiB
Server startup:
- 448 s (~7.5 min) to read 167 GiB to memory (~380 MB/s throughput)
- Of which 169 s for mols, 74 s for ids, 203 s for fingerprints
Fast similarity search
Dissim limit   Hit count limit   Runs   Avg search time
0.4            1                 500    0.45 s
0.4            9                 500    0.96 s
0.4            81                500    1.16 s
0.4            729               500    1.25 s
0.4            2187              500    1.26 s
0.4            6561              500    1.42 s
0.4            15000             500    1.89 s
1.0            1                 50     0.61 s
1.0            9                 50     0.98 s
1.0            81                50     1.29 s
1.0            729               50     1.39 s
1.0            2187              50     1.65 s
1.0            6561              50     3.22 s
1.0            15000             50     9.67 s
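The two search parameters in the table, a dissimilarity limit and a hit count limit, can be illustrated with a brute-force Tanimoto search over binary fingerprints. This sketch stores each fingerprint as a Python integer and scans the library linearly; it only shows the idea and is not how MadFast is implemented. The metric choice (Tanimoto) and all names are assumptions.

```python
def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - |A and B| / |A or B| for fingerprints stored as integer bit sets."""
    common = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - common / union

def similarity_search(query_fp, library, dissim_limit=0.4, hit_limit=10):
    """Return up to hit_limit (dissimilarity, id) pairs within dissim_limit, closest first."""
    hits = []
    for mol_id, fp in library.items():
        d = tanimoto_dissimilarity(query_fp, fp)
        if d <= dissim_limit:
            hits.append((d, mol_id))
    hits.sort()
    return hits[:hit_limit]

if __name__ == "__main__":
    library = {"a": 0b1111, "b": 0b1110, "c": 0b0001}  # toy 4-bit fingerprints
    print(similarity_search(0b1111, library, dissim_limit=0.4))
```

A dissimilarity limit of 1.0 disables the cutoff entirely, which is consistent with the longer search times in the lower half of the table.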
119. - Pre-screen
Fingerprint match: all query bits must be present in the target
Descriptor screen (Mw, counts)
- Graph isomorphism check
A query graph S matches a target graph G if S is isomorphic to a subgraph of G (Ullmann, VF2, VF2+)
Substructure search
Tutorials in Chemoinformatics, 395-448 John Wiley & Sons Ltd, Chichester, UK, 2017; https://doi.org/10.1186/1758-2946-4-13
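The two stages above can be sketched as a cheap bitwise pre-screen followed by an exact graph match. The matcher below is a naive backtracking search, not Ullmann or VF2, and it compares element labels only (bond orders, aromaticity and charges are ignored). The molecule encoding and function names are assumptions made for illustration.

```python
def passes_prescreen(query_fp, target_fp):
    """Every bit set in the query fingerprint must also be set in the target."""
    return (query_fp & target_fp) == query_fp

def _adjacency(n_atoms, bonds):
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def is_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """True if query atoms can be mapped onto distinct target atoms so that
    atom labels agree and every query bond lands on a target bond."""
    q_adj = _adjacency(len(q_atoms), q_bonds)
    t_adj = _adjacency(len(t_atoms), t_bonds)
    mapping = {}  # query atom index -> target atom index

    def extend(q):
        if q == len(q_atoms):
            return True  # all query atoms placed
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Already-mapped query neighbours must map to neighbours of t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend(0)

if __name__ == "__main__":
    ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
    c_o = (["C", "O"], [(0, 1)])
    print(is_substructure(*c_o, *ethanol))
```

The pre-screen can only reject candidates; anything that passes still needs the exact graph check.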
120. ● Data set: The Enamine library containing 1.2 billion structures was imported in the database cluster.
● Hardware:
○ Citus cluster was set up in AWS to use a distributed PostgreSQL database.
○ The cluster included one coordinator node and 20 worker nodes.
■ Coordinator node was installed on a t2.xlarge type EC2 instance (4 cores, 16 GiB memory)
■ Worker nodes were installed on c5a.4xlarge type instances (16 cores and 32 GiB memory per instance)
● Data upload and chemical indexing:
○ Upload of the data took ~12h;
○ Chemical index creation with JChem PostgreSQL Cartridge: 19.3h
● Search types:
○ Full structure, substructure and similarity search, as well as different combined queries were used with one, two or three additional properties.
○ The number of records returned by the queries was limited to return only the top 100 results.
JChem PostgreSQL cartridge test runs
128. ● Complexity of the human body
○ Single target to interact
○ Multiple targets to avoid
● Complexity of the binding interactions
○ Influence of small structural changes
○ Balancing speed and accuracy
○ Need for structural information
● Pitfalls of machine learning
○ Validation strategy
○ Overfitting
● Size of chemical space
○ Searching in the (multi)billion chemical space
● Accessibility
○ Connecting all the models with designers and medicinal chemists
Challenges
129. “Essentially, all models are wrong, but some are useful.”
Box, G. E. P., and Draper, N. R. (1987), Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, NY.