Presentation by Dr Ed Griffen of MedChemica Ltd, at The IBSA Conference "How Artificial Intelligence Can Change the Pharmaceutical Landscape" - LUGANO, October 9th 2019.
MedChemica Levinthal Lecture at Openeye CUP XX 2020 - Ed Griffen
This document summarizes a lecture on improving medicinal and computational medicinal chemistry. It discusses defining clear target product profiles through collaboration between medicinal chemists and other experts. Navigating medicinal chemistry projects requires estimating the predicted therapeutic dose of compounds. The document outlines tactics for exploring a compound's structure-activity relationship, including introducing and modifying chiral centers. It also describes how mining past medicinal chemistry data can provide rules for modifying compounds to improve properties like solubility while maintaining potency.
Accelerating lead optimisation with active learning by exploiting MMPA based ... - Ed Griffen
Presented at the 15th GCC - German Conference on Cheminformatics November 2019
We combine regression forest machine learning with our MMPA based generative methods to deliver an active learning system to accelerate lead optimisation. In the process we identify permutative MMPA as a method to leverage SAR information from small data sets.
Published by MedChemica Ltd
Presented at Artificial Intelligence and Machine Learning for Advanced Drug Discovery & Development 2019 on 28th May 2019 by Dr Ed Griffen of MedChemica Ltd
MedChemica Active Learning - Combining MMPA and ML - Al Dossetter
Describes MedChemica research on combining Matched Molecular Pair Analysis (MMPA) and Machine Learning (ML) into a closed loop to find and optimize new hits for drug discovery. The talk describes the MMPA and Regression Forest models, how they were combined, and some early conclusions. Of these, permutative MMPA is the clear winner (Free Wilson ++).
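The description does not define permutative MMPA. As a rough illustration only, the sketch below takes one plausible reading: substituent swaps observed on shared cores within a small data set are averaged and then applied to other compounds in the same set, in an additive, Free-Wilson-like way. The (core, substituent) representation, the toy pIC50 values and the function names are all hypothetical and are not MedChemica's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (core, substituent) -> measured pIC50.
measured = {
    ("coreA", "H"): 6.0,
    ("coreA", "F"): 6.5,
    ("coreB", "H"): 5.0,
    ("coreB", "F"): 5.7,
    ("coreC", "H"): 7.0,
}


def pair_deltas(data):
    """Collect the activity change for every substituent swap seen on a shared core."""
    deltas = defaultdict(list)
    for (core1, r1), y1 in data.items():
        for (core2, r2), y2 in data.items():
            if core1 == core2 and r1 != r2:
                deltas[(r1, r2)].append(y2 - y1)
    return deltas


def predict_unmeasured(data):
    """Apply each swap's mean delta to compounds whose swapped product is unmeasured."""
    rules = {swap: mean(vals) for swap, vals in pair_deltas(data).items()}
    predictions = {}
    for (core, r), y in data.items():
        for (r_from, r_to), delta in rules.items():
            if r == r_from and (core, r_to) not in data:
                # Additive assumption: the swap has the same effect on every core.
                predictions[(core, r_to)] = y + delta
    return predictions


predictions = predict_unmeasured(measured)
```

Here the only unmeasured product is the fluoro analogue of coreC, predicted from the mean H to F change seen on coreA and coreB. A real system would work on chemical structures rather than labels and would need to handle conflicting predictions for the same product.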
How we Built a Large Scale Matched Pair Analysis Engine (MCPairs) using OpenE... - Al Dossetter
MCPairs performed Matched Molecular Pair Analysis on a large scale to build databases of exploitable knowledge, accessible to Drug Discovery to accelerate research projects. The talk describes how we did this and some of the challenges.
MedChemica BigData What Is That All About? - Al Dossetter
A light look at the world of BigData for the lay person - a look at a couple of examples and what we do in MedChemica to speed up drug discovery. First presented at Macclesfield SciBar, and then Knutsford SciBar.
MedChemica - Automated Extraction of Actionable Knowledge from Large Scale in... - Al Dossetter
This document discusses the importance of stereochemistry in medicinal chemistry research and drug development. It notes that many approved drugs are chiral molecules where the specific stereochemistry is important. Better rules for medicinal chemistry could help reduce high drug development costs by improving predictions of properties like absorption, distribution, metabolism, and excretion. The document advocates mining large datasets of in vitro pharmacology data to extract actionable knowledge about stereochemistry and its effects on important drug properties and clinical outcomes. This could help medicinal chemists design safer and more effective compounds with lower attrition rates in development.
The talk describes the science and results of a consortium of multiple pharmaceutical companies extracting medicinal chemistry knowledge from research data and the application to real drug design projects. A new technique for automating pharmacophore / toxophore finding from public data is disclosed.
The document discusses random forest algorithms and their use in machine learning applications such as predicting breast cancer risk. It provides an overview of random forest methods, important hyperparameters that can improve predictive ability or increase speed, and advantages such as being robust, parallelizable, and handling unbalanced data. It also discusses some drawbacks like lower interpretability. The document then describes applying random forests to a dataset from the PLCO study to predict breast cancer risk, including data preprocessing steps and comparing models using different input variables.
Artificial intelligence (AI) has the potential to transform drug discovery through the use of semantic networks to represent biomedical knowledge as facts and infer new insights, probabilistic rules to evaluate beliefs about candidate drugs, and optimization and simulation techniques to iteratively improve outcomes. By continuously learning from diverse sources of data, AI systems could automate and enhance the processes of target identification, compound screening and testing in ways that accelerate research and development compared to traditional analytical tools.
The document summarizes recent developments in using machine learning techniques for computational drug docking. It finds that machine learning methods, such as random forests, can more accurately predict binding affinity between proteins and ligands compared to traditional scoring functions. Specifically, the best random forest model achieved a correlation of 0.803 between predicted and experimental binding affinity, compared to 0.644 for classical scoring functions. Machine learning also more accurately ranks ligands and identifies the top binding pose. The document concludes that machine learning is better able to utilize relevant molecular features for computational drug docking compared to traditional methods.
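The figures above compare predicted and experimental binding affinities through a correlation value. The summary does not say which coefficient was used; assuming it is Pearson's r, a minimal sketch of the calculation looks like this (the function name and sample numbers are illustrative, not taken from the document).

```python
from math import sqrt


def pearson_r(predicted, experimental):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Sum of cross-deviations and of squared deviations from each mean.
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_e)


# Illustrative affinities only (e.g. pKd units); not data from the document.
r = pearson_r([5.1, 6.3, 7.0, 8.2], [5.0, 6.0, 7.4, 8.0])
```

A value near 1 means the scoring function orders and spaces ligands much as experiment does; it says nothing about systematic offset, so it is usually reported alongside an error measure.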
I am Omkar B. Tipugade, M-Pharm Sem II, Department of Pharmaceutics. This presentation on Artificial Intelligence covers the definition of AI and its importance in the pharmaceutical field. It also gives brief information about neural networks and fuzzy logic, with diagrams, and about the application of AI in product formulation. Important words are highlighted.
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb... - IRJET Journal
This document proposes using a combination of K-nearest neighbors (KNN) and genetic algorithms to classify chemical medicine or drug data with improved accuracy. KNN is described as a simple and effective classification algorithm that stores training data instances. Genetic algorithms are presented as evolutionary algorithms useful for optimization problems. The proposed system applies genetic search to rank attribute importance, selects high-ranked attributes, and then applies both KNN and genetic algorithms to classify the drug data, aiming to improve classification accuracy over using either technique alone. The combination of KNN and genetic algorithms is expected to better optimize classification of complex medical data compared to other algorithms.
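The KNN step described above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and majority voting; the `selected` argument stands in for the attributes chosen by the genetic ranking step, which is not reproduced here, and the feature values and labels are made up.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3, selected=None):
    """Classify `query` by majority vote among its k nearest training rows.

    `selected` lists the attribute indexes to use (hypothetically, those a
    prior attribute-ranking step kept); by default all attributes are used.
    """
    keep = list(range(len(query))) if selected is None else list(selected)

    def project(row):
        return [row[i] for i in keep]

    q = project(query)
    # KNN stores the training instances and ranks them by distance at query time.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(project(pair[0]), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up data: the third attribute is noise that attribute selection drops.
train_X = [[1.0, 1.0, 9.0], [1.2, 0.8, 0.0], [5.0, 5.0, 9.0], [5.2, 4.8, 0.0]]
train_y = ["inactive", "inactive", "active", "active"]
label = knn_predict(train_X, train_y, [5.1, 5.0, 4.0], k=3, selected=[0, 1])
```

Dropping uninformative attributes matters for KNN in particular, because every retained attribute contributes to the distance whether or not it carries signal.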
Harnessing The Proteome With ProteoIQ Quantitative Proteomics Software - jatwood3
The document summarizes ProteoIQ Quantitative Proteomics Software. It provides a centralized software package for all proteomic studies that enables faster and more accurate data analysis compared to using multiple platforms. ProteoIQ offers robust data integration, experimental design modeling, industry-leading data visualization, qualitative comparisons, and spectral counting, isobaric tag, isotopic label, and label-free quantification. Its goal is to help users get to biological insights more quickly.
This document discusses structure-based and ligand-based drug design approaches. Structure-based design uses the 3D structure of biological targets to dock potential drug molecules. Ligand-based design analyzes similar molecules that bind to the target to derive pharmacophore models or quantitative structure-activity relationships (QSAR) to predict new candidates. Specific structure-based methods covered include docking tools like AutoDock and CDOCKER, and accounting for protein and complex flexibility. Ligand-based methods discussed are QSAR techniques like Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA). In conclusion, computational approaches like these are valuable for drug discovery by facilitating the identification and testing of new ligand
Driver Analysis and Product Optimization with Bayesian Networks - Bayesia USA
Market driver analysis and product optimization are among the central tasks in Product Marketing and thus relevant to virtually all types of businesses. BayesiaLab provides a unified software platform, which can, based on consumer data,
1. provide deep understanding of the market preference structure
2. directly generate recommendations for prioritized product actions.
The proposed approach utilizes Probabilistic Structural Equation Models (PSEM), based on machine-learned Bayesian networks. PSEMs provide an efficient alternative to Structural Equation Models (SEM), which have been used traditionally in market research.
This paper presents a set of methods that use a genetic algorithm for automatic test-data generation in software testing. Over several years, researchers have proposed various methods for generating test data, each with different drawbacks. In this paper, we present various Genetic Algorithm (GA) based test methods, with different parameters, to automate structure-oriented test data generation on the basis of internal program structure. The factors discovered are used in evaluating the fitness function of the genetic algorithm to select the best possible test method. These methods take the test populations as input and then evaluate the test cases for that program. This integration will help improve the overall performance of the genetic algorithm in search space exploration and exploitation, with a better convergence rate.
Structure based drug design- kiranmayi - KiranmayiKnv
This presentation gives a detailed introduction to structure-based drug design. It covers the types of structure-based drug design, with a detailed study of docking and de novo drug design.
IEEE transactions on 2018 knowledge and data engineering topics with abstract - tsysglobalsolutions
This document contains abstracts from 8 different academic papers. The abstracts discuss various topics related to knowledge and data engineering including traditional Chinese medicine prescriptions, multi-label learning, viral marketing in online social networks, location recommendation, nonnegative matrix factorization, distributed processing of top-k dominating queries, and supervised topic modeling for e-commerce applications. The contact information for TSYS Academic Projects is also provided at the top.
This document describes a research project aimed at creating an application that can predict whether chemical compounds will be able to pass through the blood-brain barrier. It discusses using a hybrid approach of artificial neural networks and genetic algorithms. The neural networks are trained on a dataset of chemical properties paired with blood-brain barrier permeability labels. The genetic algorithm is used to optimize parameters of the neural networks like the number of hidden nodes, learning rate, and momentum. The results section indicates the performance of the basic neural network and hybrid genetic algorithm neural network approaches were evaluated using statistical measures of sensitivity and specificity based on their predictions on the test dataset.
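Sensitivity and specificity, the measures named above, follow directly from the confusion-matrix counts. The sketch below assumes binary labels where 1 means "crosses the blood-brain barrier" and 0 means it does not; that encoding and the example labels are assumptions, not details from the project.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # permeable, predicted permeable
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # permeable, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-permeable, correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-permeable, false alarm
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up test-set labels and predictions.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Reporting the two together guards against a model that looks accurate only because one class dominates the test set.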
De novo drug design is a computer-assisted process that uses the 3D structure of a receptor target to design new drug molecules. It involves determining the structure of existing drug-target complexes, designing modifications to existing lead compounds, and generating new chemical classes of compounds. The goal is to design drugs with the correct shape and functional groups to properly fit and interact within the target's binding site in order to produce the desired pharmacological effect. However, de novo design is challenging and rarely produces ideal compounds on its own. Software tools aim to automate and speed up the process by fitting and linking together molecular fragments to fill interaction sites in the binding pocket.
Are Evolutionary Algorithms Required to Solve Sudoku Problems - csandit
1. Evolutionary algorithms aim to iteratively improve solutions through random mutation and fitness evaluation, but can become trapped in local optima. The author developed a "greedy random" mutation approach that preferentially adds rather than removes values.
2. Experiments showed this greedy random mutation was sometimes more effective at solving harder Sudoku puzzles than traditional evolutionary algorithms. This implies the quality of random mutation can significantly impact evolutionary algorithm performance with Sudoku.
3. The greedy random mutation was integrated into the evolutionary algorithm lifecycle to balance exploration and exploitation. Candidates were assessed after a removal to concentrate entropy around boundary solutions.
Qsar studies on gallic acid derivatives and molecular docking studies of bace... - bioejjournal
Alzheimer's disease is reported to be linked with hypertension, type 2 diabetes and hypercholesterolemia. The underlying genetic causes relating these diseases are not well studied clinically, but it is widely accepted that beta secretase (BACE1) is the main culprit in causing Alzheimer's disease. This enzyme belongs to the peptidase A1 family. In the present work, ligand-based and structure-based drug design are reported. QSAR studies were done using a dataset of 21 gallic acid derivatives to develop a good predictive model of biological activity, and certain descriptors were reported to further enhance the analgesic activity of gallic acid derivatives. Molecular docking studies were performed for structure-based drug design. Two natural gallic acid derivatives have been reported as potent inhibitors of the beta secretase enzyme.
1) The document discusses strategies for designing targeted arrays to screen nuclear receptor ligands, including defining nuclear receptor chemical space and using x-ray crystallography data to assess potential ligands.
2) Virtual arrays are generated and analyzed using shape and pharmacophore matching tools like ROCS to prioritize arrays based on similarity to known receptor ligands.
3) Limitations of the current approach include using a single protein structure, not accounting for flexibility, and limitations of computational docking. Advancing the methods could improve array design for orphan receptors.
IRJET - A Framework for Predicting Drug Effectiveness in Human Body - IRJET Journal
This document proposes a machine learning framework for predicting drug-target interactions. It extracts features from drug molecules using FP2 fingerprints and from protein sequences using PsePSSM. It then uses Lasso dimensionality reduction to select important features before balancing the data using SMOTE. Finally, it trains an SVM classifier on the processed data to predict drug-target interactions. The framework achieved better performance than traditional methods by leveraging machine learning techniques for efficient and effective prediction of interactions without costly experiments.
This document discusses descriptive versus mechanistic modeling approaches in drug discovery. It provides examples of descriptive modeling, which aims to describe data patterns without understanding the underlying mechanisms, and mechanistic modeling, which works with domain experts to translate scientific knowledge into mathematical representations of the data-generating processes. The document presents tumor growth curve analysis as an example where mechanistic models like Richards and Gompertz curves can incorporate understandings of competing catabolic and anabolic processes to better capture the fundamental characteristics of growth.
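The two growth curves named above can be written compactly. The summary gives no formulas, so the sketch below uses one common parameterization of each, with an initial volume v0, a plateau (carrying capacity) k, a rate constant a and, for the Richards curve, a shape parameter nu; these symbols and the example values are assumptions rather than the document's notation.

```python
from math import exp, log


def gompertz(t, v0, k, a):
    """Gompertz growth: starts at v0 and saturates at k with rate constant a."""
    return k * exp(log(v0 / k) * exp(-a * t))


def richards(t, v0, k, a, nu):
    """Richards (generalized logistic) growth; nu = 1 gives the logistic curve."""
    q = (k / v0) ** nu - 1.0  # chosen so that the curve passes through v0 at t = 0
    return k / (1.0 + q * exp(-a * t)) ** (1.0 / nu)


# Illustrative tumor volumes (arbitrary units) over 60 days.
days = range(0, 61, 10)
gompertz_curve = [gompertz(t, v0=10.0, k=1000.0, a=0.1) for t in days]
richards_curve = [richards(t, v0=10.0, k=1000.0, a=0.1, nu=0.5) for t in days]
```

Both curves rise quickly and then flatten as they approach k, which is how such models encode the idea that growth-promoting and growth-limiting processes compete as the tumor gets larger.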
Meta-docking is a Bayesian mixture model for improving protein-ligand interaction predictions from multiple docking scores. It accounts for differences in score distributions between active and decoy ligands arising from docking program and ligand effects. Compared to standard consensus docking, meta-docking shows small but consistent improvements in ranking ligands, detecting more active ligands among top ranks. Future work includes investigating inter-ligand features from multiple programs to further improve ligand ranking.
Practical Drug Discovery using Explainable Artificial Intelligence - Al Dossetter
How to build AI systems to enable the drug hunting medicinal chemist in their day-to-day work. Levels of AI are described, along with the meaning and context of Explainable AI for medicinal chemists. Six medicinal chemistry projects are described, as well as Matched Molecular Pair Analysis (MMPA), Machine Learning and Permutative MMPA. In each case the talk shows how a system can be built to drill back to chemical sub-structures so that effective decisions can be made.
Maximize Your Understanding of Operational Realities in Manufacturing with Pr... - Bigfinite
Maximize Your Understanding of Operational Realities in Manufacturing with Predictive Insights using Big Data, Artificial Intelligence, and Pharma 4.0
by Toni Manzano, PhD, Co-founder and CSO, Bigfinite
PDA Annual Meeting 2020
Meta-docking is a Bayesian mixture model for improving protein-ligand interaction predictions from multiple docking scores. It accounts for differences in score distributions between active and decoy ligands arising from docking program and ligand effects. Compared to standard consensus docking, meta-docking shows small but consistent improvements in ranking ligands, detecting more active ligands among top ranks. Future work includes investigating inter-ligand features from multiple programs to further improve ligand ranking.
Practical Drug Discovery using Explainable Artificial IntelligenceAl Dossetter
How to build AI systems to enable the drug hunting medicinal chemist in their day-to-day work. Levels are AI are described and the meaning and context Explainable AI to medicinal chemists. Six medicinal chemist projects are described, as well as Matched Molecular Pair Analysis (MMPA), Machine Learning and Permutative MMPA. In each case how a system can be built to drill back to chemical sub-structures so effective decisions can be made.
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Bigfinite
Maximize Your Understanding of Operational Realities in Manufacturing with Predictive Insights using Big Data, Artificial Intelligence, and Pharma 4.0
by Toni Manzano, PhD, Co-founder and CSO, Bigfinite
PDA Annual Meeting 2020
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
This module addresses critical business aspects related to launching a predictive analytics project. How to establish the relationship with business KPIs is discussed. A notion of data hunt, for planning & acquiring external data for better predictions is introduced. Model quality and it's role for ROI of data and prediction tasks are explained. The module is concluded with a glimpse on how collaborative data challenges can improve predictive model quality in no time.
SMi Group's AI in Drug Discovery 2020 conferenceDale Butler
This document provides information about an upcoming conference on AI in Drug Discovery taking place from March 16-17, 2020 in London, UK. It includes details about the program, speakers, registration fees and deadlines. The conference will explore how machine learning and AI are being applied across the drug discovery process, from compound design and virtual screening to toxicity prediction. There will be case studies presented from major pharmaceutical companies on their experience integrating AI. The document also advertises sponsorship opportunities for the event.
The document discusses applications of artificial intelligence in drug discovery and development. It begins with an introduction stating the overarching question of what AI applications exist and if it can make the process more efficient. It then outlines the structure of the document in answering sub-questions about what AI is, current approaches, breakthroughs, and challenges. Case studies are provided on startups like Atomwise that use AI for drug candidate generation and Phenomics AI for disease mechanism understanding. Challenges of AI include ethical concerns, regulatory hurdles, and technical obstacles but opportunities exist to make the process more efficient.
Insights from Building the Future of Drug Discovery with Apache Spark with Lu...Databricks
Human genetics holds the key to understanding pathogenesis of many devastating diseases like type 2 diabetes and Alzheimer’s disease. The discovery, development, and commercialization of new classes of drugs can take 10-15 years and greater than $5 billion in R&D investment only to see less than 5% of the drugs make it to market. Committed to creating therapeutic innovations, Regeneron has built one of the world’s most comprehensive genetics databases to supplement our state-of-the-art drug development pipeline. While these massive volumes of data provide an unprecedented opportunity to gain novel therapeutic insights, Regeneron has encountered a number of challenges on the road to delivering on the promises of big data and genomics in drug discovery. For example, how do you enable fast and accurate query from >80B data points? And how do you expedite novel statistical tests on TB-scale data?
This presentation will share Regeneron’s vision for building a scalable and performant informatics infrastructure to accelerate genetics-driven drug development. Specifically, we highlight key challenges in establishing the world’s largest clinical genetics databases, provide an overview of how Regeneron leverages Databricks’ Unified Analytics Platform and Apache Spark, and discuss in detail key engineering innovations that have already come out of this collaborative effort.
The document discusses exploiting medicinal chemistry knowledge to accelerate drug discovery projects through in silico drug design techniques. It provides an agenda for a presentation covering problem statements around long development times, sources for design ideas like literature and patents, techniques for 2D and 3D molecular design including QSAR and docking models, and examples of applying these methods to specific drug targets. The presentation aims to explain how to analyze data rigorously and refine compound designs to find drug candidates faster.
Artificial intelligence in Drug discovery and delivery.pptxManjusha Bandi
This document summarizes a seminar on integrating artificial intelligence in drug discovery and delivery. It begins with an introduction to AI, defining it as using machine learning to emulate human cognitive tasks. It then reviews literature on using AI in various pharmaceutical applications and discusses types of AI like deep learning and machine learning. The document outlines several uses of AI in drug discovery for tasks like target identification, toxicity prediction, and drug design. It also discusses using AI to model drug delivery systems like solid dispersions and emulsions. Finally, it acknowledges challenges of AI integration like data quality but emphasizes the benefits of combining AI and human expertise to enhance the drug development process.
This process comprises obtaining data, developing efficient systems for the uses of obtained data, illustrating definite or approximate conclusions and self-corrections / adjustments.
In general, AI is used for analyzing the machine learning to imitate thecognitive tasks of individual.
AI technology is exercised to perform more accurate analyses as well as to attain useful interpretation.
It can handle large volumes of data with enhanced automation.
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
The challenge of accurately characterizing bioassays is a real pain point for many drug discovery organizations. Research has shown that some organizations have legacy assay collections exceeding 20,000 protocols, the great majority of which are not accurately characterized. This problem is compounded by the fact that many new protocol registrations are still not following FAIR (Findability, Accessibility, Interoperability, and Reusability) Data principles.
BioAssay Express is a tool focused on transforming the traditional protocol description from an unstructured free form text into a well-curated data store based upon FAIR Data principles. By using well-defined annotations for assays, the tool enables precise ontology based searches without having to resort to imprecise keyword searches.
This talk explores a number of new important features designed to help scientists accelerate the drug discovery process. Some example use-cases include: enabling drug repositioning projects; improving SAR models; identifying appropriate machine learning data sets; fine-tuning integrative-omic pathways;
An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional. One of the many possibilities is to take the initial prospective ELN entry for a bioassay protocol and feed it directly to an automated instrument. While there are many challenges involved in creating the ELN-to-robot loop, we will provide some insights into our collaborations with UCSF automation experts.
In summary, the ability to quickly and accurately search or analyze bioassay data (public or internal) is a rate limiting problem in drug discovery. We will present the latest developments toward removing this bottleneck.
https://plan.core-apps.com/acs_sd2019/abstract/6f58993d-a716-49ad-9b09-609edde5a3f4
SCI What can Big Data do for Chemistry 2017 MedChemicaEd Griffen
This document discusses how advanced analytics and big data techniques can be applied in the chemistry industry. It provides examples of how matched molecular pair analysis has been used to extract statistically valid structure-activity relationships from large datasets and summarize them in the form of transformation rules. These rules have helped suggest new molecules, explore structure-activity relationships, identify exceptional structure-property relationships, and enable the rapid optimization of drug candidates. The document argues that combining data from multiple sources yields more comprehensive rules and that interfaces must be designed with the intended users in mind.
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...ChemAxon
Boehringer Ingelheim's Nils Weskamp discusses eDesign: a computational platform for molecule design and optimization. This presentation explaing how to combine data, algorithms and user experience to impact compound design, and gives a glimpse into the agile and interdisciplinary teamwork as facilitated by Design Hub as a success factor for the development of digital tools.
Bio-Debug UG is a one-stop solution for all bio-data analysis needs that uses a multidisciplinary team approach. It comprises experts in bioinformatics, genomics, computer science, and more to provide custom solutions for clients' unique projects. Some of the services offered include analysis of next generation sequencing data, genome wide association studies, modeling expression, and more for sectors like research and development, healthcare, pharma, and biotechnology. The company aims to bridge the gap between bioinformatics and biology through close collaboration.
Artificial intelligence (AI), robotics, and computational fluid dynamics (CFD) have various applications in the pharmaceutical industry. AI can be used for disease identification, personalized treatment, drug discovery, clinical trial research, and improving healthcare. It has the potential to reduce drug development costs and time. Robotics is being used for tasks like packing drugs, labeling, filling, and capping vials to increase automation. Both AI and robotics face challenges like high costs but have promising futures in areas like personalized medicine and improving drug development.
IRJET - Machine Learning for Diagnosis of DiabetesIRJET Journal
This document describes a study that uses machine learning models to predict whether a person has diabetes based on patient data. The researchers created several classification models using algorithms like logistic regression and support vector machines on a diabetes dataset. The models with the highest accuracy at predicting diabetes were random forest and gradient boosting. An Android app was also developed to input patient data, run the predictions from the trained models, and display the results to help diagnose diabetes. The goal is to help reduce diabetes rates and healthcare costs by improving diagnosis.
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFERiQHub
Dr. Moritz von Stosch presented on using hybrid modeling and transfer learning to accelerate vaccine process development. Transfer learning allows knowledge gained from previous processes and products to be applied to new processes, reducing the number of experiments needed. DataHow uses techniques like entity embedding and Gaussian processes to transfer learning between cell lines and products. This was demonstrated on a practical example where incorporating transfer learning between two production campaigns improved model predictions and understanding of a new complex biologic with limited experimental data.
Artificial intelligence robotics and computational fluid dynamics Chandrakant Kharude
The document discusses applications of artificial intelligence, robotics, and computational fluid dynamics in the pharmaceutical industry. It provides introductions and definitions for each technology, as well as their current and potential applications. Some key applications discussed include using AI for disease identification, personalized treatment, drug discovery/manufacturing, and clinical trials. Applications of robotics mentioned include use in research and development, packaging, sterile syringe filling, and laboratory automation. Current challenges and future directions are also addressed.
2020.04.07 automated molecular design and the bradshaw platform webinarPistoia Alliance
This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what is reasonable to expect “AI” approaches might achieve, and what is best left with a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from Bradshaw, GSK’s experimental automated design environment.
The document discusses strategic consulting services from PEPR Consulting to help life science companies increase productivity and efficiency in their lab operations through a seven-step program. This includes an analysis of current processes and systems, a Multi Moment Analysis to identify inefficient tasks, development of optimized future processes and systems requirements, creation of a business case, and support in selecting and deploying new lab informatics solutions. The consulting aims to realize improvements of up to 30% for clients.
Similar to Emerging Challenges for Artificial Intelligence in Medicinal Chemistry (20)
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
1. October 2019
Exploiting medicinal chemistry knowledge to accelerate projects
Emerging Challenges for Artificial Intelligence in
Medicinal Chemistry
Dr Ed Griffen
IBSA Lugano October 2019
2. Exploiting medicinal chemistry knowledge to accelerate projects
• Founded in 2012 by experienced large Pharma
medicinal/computational chemists to accelerate drug
hunting by exploiting data driven knowledge
• Domain leaders in SAR knowledge extraction and
knowledge based design
• > 10 years experience of building AI systems that suggest
actions to chemists (7 years as MedChemica)
• Creators of largest ever documented database of
medicinal chemistry ADMET knowledge
3. Exploiting medicinal chemistry knowledge to accelerate projects
…7 Years of working with pharma
companies
“Our median number of compounds per LO project is 3000 - this is
unsustainable… [it should be] 300”
– Director of Chemistry (large pharma)
“Can we define the textbook of medicinal chemistry?”
– Director of Comp Chem (large pharma)
“We are aiming at 300 compounds per project – currently we are
about 400, we will get better”
– Exscientia scientist at SCI ‘What can BigData do for chemistry’ –
London Oct 2017
MedChemica uses knowledge extraction techniques to build “expert
systems” to suggest actions to chemists and reduce the time and cost
to critical compounds and candidate drugs.
4. Exploiting medicinal chemistry knowledge to accelerate projects
Explainable AI
The future of AI lies in enabling people to collaborate with machines to solve complex
problems.
Like any efficient collaboration, this requires good communication, trust, clarity and
understanding.
- Freddy Lecue, Explainable AI Research Lead, Accenture Labs
https://www.accenture.com/gb-en/insights/technology/explainable-ai-human-machine
Black box machine learning models are currently being used for high-stakes decision
making throughout society, causing problems in healthcare, criminal justice and other
domains. Some people hope that creating methods for explaining these black box
models will alleviate some of the problems, but trying to explain black box models,
rather than creating models that are interpretable in the first place, is likely to
perpetuate bad practice and can potentially cause great harm to society. The way
forward is to design models that are inherently interpretable.
- Cynthia Rudin, Nature Machine Intelligence 1 (2019), 206–215.
5. Exploiting medicinal chemistry knowledge to accelerate projects
Use the right Machine Learning tool for the right problem
Where is Medicinal Chemistry?
Interpretable:
• Failure cost high
• Immature science
• Highly skilled, critical users
• Business-2-Business
• Transparent and auditable
Black Box:
• Failure cost is low
• Real time response critical
• Interactive = self correcting
• Business-2-consumer
• User agnostic of process
6. Exploiting medicinal chemistry knowledge to accelerate projects
Help the HiPPOs – or they’ll crush you
1. McAfee & Brynjolfsson “Big Data: The Management Revolution”,
Harvard Business Review October 2012
“Companies often make most of their
important decisions by relying on
“HiPPO”—the highest-paid person’s
opinion.”1
Chemistry HiPPs:
• experts in pattern recognition
• judged on their ability to make the best decisions with partial data
• highly trained
• time poor
• delivery focused
• gatekeepers to the adoption of new approaches
7. Exploiting medicinal chemistry knowledge to accelerate projects
[Diagram: Explainable AI for Medicinal Chemistry Design. Components: Data Warehouse; Automated loader; Clean Structures & Data; MMPA; rule finder; Exploitable Knowledge; Molecule problem solving; Explainable QSAR; Property Prediction; Idea ranking; Instant SAR analysis; MCPairs REST API & GUI]
8. Exploiting medicinal chemistry knowledge to accelerate projects
Molecule Problem Solving
Compounds from Rules
• Exploitable Knowledge is a rule database derived from MMPA
• User puts in a problem molecule with a property they wish to
improve, e.g. solubility, metabolism, hERG
• System generates potential improved molecules based on data
[Diagram: Compounds from Rules. Problem molecule + property to improve → MC Expert Enumerator System + Exploitable Knowledge → Solution molecules]
https://www.youtube.com/watch?v=lITAT6_-i1E&list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL&index=3
10. Exploiting medicinal chemistry knowledge to accelerate projects
MMPA Enables knowledge sharing
[Diagram: Multiple Pharma ADMET data → MMPA (×3) → Combine and Extract Rules → >437000 rules → Better project decisions; Increased medicinal chemistry learning]
Kramer, Robb, Ting, Zheng, Griffen, et al. J. Med. Chem. 2018, 61(8), 3277-3292
http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935
Our MMPA technology enabled knowledge sharing between multiple
organisations (AstraZeneca, Hoffmann-La Roche and Genentech)
11. Exploiting medicinal chemistry knowledge to accelerate projects
Griffen, E. et al. J. Med. Chem. 2011, 54(22), pp.7739-7750.
Fully Automated Matched Molecular Pair Analysis (MMPA)
Knowledge Extraction that’s understandable by chemists
[Figure: matched molecular pairs A → B and the Δ Data (A-B) recorded for each pair]
• Matched Molecular Pairs – molecules that differ only by a particular, well-defined structural transformation
• Capture the change and environment – MMPs can be recorded as transformations from A → B
• Statistical analysis to define “medicinal chemistry rules” – defined transformations with high probability of improving properties of molecules
• Store in a high performance database and provide an intuitive user interface
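The bullets above describe recording each matched pair as a transformation A → B and then analysing the property changes statistically. A minimal sketch of that bookkeeping follows, using only the Python standard library. It is not the MCPairs implementation: the function name `summarise_pairs`, the tuple layout, the sign convention (value of B minus value of A) and the example numbers are all assumptions made for illustration, and the SMIRKS-style strings are treated as opaque labels rather than parsed.

```python
from collections import defaultdict
from math import sqrt
from statistics import median, stdev


def summarise_pairs(pairs):
    """Group matched pairs by transformation and summarise the property change.

    `pairs` is an iterable of (transformation, value_a, value_b) tuples, where
    `transformation` is any hashable label for the A -> B change (for example
    a SMIRKS string) and the values are the measured property of A and of B.
    """
    deltas = defaultdict(list)
    for transformation, value_a, value_b in pairs:
        # Assumed sign convention: positive delta means B is higher than A.
        deltas[transformation].append(value_b - value_a)

    summary = {}
    for transformation, d in deltas.items():
        n = len(d)
        sd = stdev(d) if n > 1 else 0.0  # stdev needs at least two values
        summary[transformation] = {
            "n": n,
            "median": median(d),
            "sd": sd,
            "se": sd / sqrt(n),
            "n_up": sum(1 for x in d if x > 0),
            "n_down": sum(1 for x in d if x < 0),
        }
    return summary


# Invented numbers, for illustration only (not data from the talk).
example_pairs = [
    ("[*:1][H]>>[*:1]F", 1.0, 1.5),
    ("[*:1][H]>>[*:1]F", 2.0, 2.25),
    ("[*:1][H]>>[*:1]F", 0.5, 0.25),
    ("[*:1]C>>[*:1]CC", 3.0, 3.5),
]
stats = summarise_pairs(example_pairs)
```

Keying on the transformation label keeps the result traceable to a specific structural change, which is what lets a chemist inspect the pairs behind any rule.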
12. Exploiting medicinal chemistry knowledge to accelerate projects
Rule selection
• Identify and group matching SMIRKS
• Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down)
• Is n ≥ 6?
– No: not enough data, ignore transformation
• Is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3?
– Yes: transformation is classified as ‘neutral’
– No: perform two-tailed binomial test on the transformation to determine the significance of the up/down frequency
• Binomial test
– Pass: transformation classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing
– Fail: transformation classified as ‘NED’ (No Effect Determined)
[Figure: rule classes along the median data difference axis (-ve, 0, +ve): Decrease, Neutral, Increase, NED]
• No assumption of normal distribution
• Manages ‘censored’ (qualified / out-of-range) data
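The rule selection flow can be sketched in a few lines of Python. This is a minimal sketch, not the MCPairs implementation: the function names are hypothetical, the significance level `alpha=0.05` is an assumption (the slide does not state one), zero differences are left out of the up/down counts by assumption, and handling of censored data is omitted.

```python
from math import comb
from statistics import median, quantiles


def binomial_two_tailed_p(n_up, n_down):
    """Exact two-tailed binomial p-value against an up/down probability of 0.5."""
    n = n_up + n_down
    if n == 0:
        return 1.0
    k = max(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify(deltas, min_n=6, alpha=0.05):
    """Classify one transformation from its list of property differences.

    alpha is an assumed significance level; the thresholds 6, 0.05 and 0.3
    are the ones shown on the slide.
    """
    if len(deltas) < min_n:
        return "ignore"  # not enough data

    cuts = quantiles(deltas, n=10)   # 10th ... 90th percentile cut points
    spread = cuts[-1] - cuts[0]      # 10-90% intercentile range
    if abs(median(deltas)) <= 0.05 and spread <= 0.3:
        return "neutral"

    n_up = sum(d > 0 for d in deltas)
    n_down = sum(d < 0 for d in deltas)
    if binomial_two_tailed_p(n_up, n_down) < alpha:
        return "increase" if n_up > n_down else "decrease"
    return "NED"  # No Effect Determined
```

Using medians, percentiles and a binomial test on the up/down counts keeps the procedure free of any normality assumption, which matches the first bullet above.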
13. Exploiting medicinal chemistry knowledge to accelerate projects
Base of Success Story from Genentech
Objective: improve metabolic stability
Enumeration (193 compounds enumerated) → Calculated Property → Docking → 8 compounds synthesized
100 cmpds x ($2K make + $1K test) = $ 300 000
8 cmpds x ($2K make + $1K test) = $ 24 000
It is not just money, it is actually time
100 cmpds make & test ~ 15 – 25 weeks
8 cmpds make & test ~ 2 – 4 weeks
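The cost comparison above is simple arithmetic; the snippet below just restates the slide's figures so the saving is explicit. The per-compound costs are the ones quoted on the slide, and the function name is illustrative.

```python
MAKE_COST = 2_000  # USD per compound to make (from the slide)
TEST_COST = 1_000  # USD per compound to test (from the slide)


def campaign_cost(n_compounds):
    """Total make-and-test cost for a set of compounds."""
    return n_compounds * (MAKE_COST + TEST_COST)


broad = campaign_cost(100)   # 300000
focused = campaign_cost(8)   # 24000
saving = broad - focused     # 276000
```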
14. Exploiting medicinal chemistry knowledge to accelerate projects
tBu metabolism issue
Benchmark
compound
Predicted to offer most improvement in microsomal stability (in at least 1 species / assay)
[Table: Clint by R1/R2 substituent (tBu, Me, Et, iPr). Values as listed: 99, 392, 16, 64, 78, 410, 53, 550, 99, 288, 78, 515, 41, 35, 98, 327, 92, 372, 24, 247, 35, 128, 24, 62, 60, 395, 39, 445, 3, 21, 20, 27, 57, 89, 54, 89]
• Data shown are Clint for HLM and MLM (top and bottom, respectively)
Roger Butlin
Rebecca Newton
Allan Jordan
15. Exploiting medicinal chemistry knowledge to accelerate projects
Tubulin Polymerization Inhibitors
16. Exploiting medicinal chemistry knowledge to accelerate projects
Indole-3-glyoxylamide Based Series of Tubulin Polymerization Inhibitors
– Increase potency, solubility and reduce metabolism
– Enable in-vivo xenograft studies
Thompson, M. et al. J. Med. Chem. 2015, 58 (23), pp 9309–9333
MMPA solubility & QSAR calcs
Indibulin D-24851
LC50 0.032
XlogP 3.35
~ potent
In-vivo activity
poor solubility (~ 1uM)
LC50 0.027
XlogP 2.02
LC50 0.055
XlogP 2.91
solubility (~10-80uM)
LC50 0.031
XlogP 2.57
solubility (~10-80uM)
59
17. Exploiting medicinal chemistry knowledge to accelerate projects
Idea Ranking
SpotDesign
• Use the knowledge database to estimate how good an idea is
compared to a benchmark molecule
• System generates assessment based on data
[Diagram: Idea molecule + benchmark molecule + property → SpotDesign, drawing on Exploitable Knowledge → Assessment of idea molecule compared to benchmark]
SpotDesign
https://www.youtube.com/watch?v=JMhQvNdBOFs&index=2&list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL
19. Exploiting medicinal chemistry knowledge to accelerate projects
Property Prediction
Automated Explainable QSAR
• Chemists get predictions with the substructures highlighted that are
driving prediction and the molecules used to support that part of the
model – transparent / explainable AI.
[Diagram: Molecule structure + property to predict → Explainable QSAR (Clean Structures & Data, Property Prediction) → Prediction + clear drivers of prediction]
20. Exploiting medicinal chemistry knowledge to accelerate projects
Feature: Definition
Basic Group: atom or group most likely protonated at pH 7.4
Acidic Group: atom or group most likely deprotonated at pH 7.4, includes N and C acids
Acceptor: definitions derived from Taylor & Cosgrove
Donor: definitions derived from Taylor & Cosgrove
Hydrophobic: C4 or greater cyclic or acyclic alkyl group
Aromatic Attachment: connection of any group to an aromatic atom, excluding connections within rings
Aliphatic Attachment: connection of any atom to an aliphatic group not in a ring
Halo: F, Cl, Br, I
Reference for donor/acceptor feature definitions:
Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. J. Comput. Aided Mol. Des. 2012, 26 (4), 451–472.
Acid and base definitions are MedChemica SMARTS definitions: acids include C, N and heteroaromatic acids; bases include amidines and guanidines and exclude weak aniline bases.
MedChemica Advanced Pharmacophore Pairs
Gobbi, A.; Poppinger, D. Biotechnology and Bioengineering 1998, 61 (1), 47–54.
Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Mol. Inf. 2013, 32 (2), 133–138.
21. Exploiting medicinal chemistry knowledge to accelerate projects
Pay attention to your descriptors
• Chemistry must make sense
[Figure: simple vs. precise assignment of H-bond acceptor, base and acid features, illustrated on Diclofenac (1973), Sulfadiazine (1941) and DMAP]
22. Exploiting medicinal chemistry knowledge to accelerate projects
Regression Forest & Pharmacophore understanding
• hERG – auditable models
• Identify important chemical features driving potency
• Predict hERG potency from RF model [10 fold CV]
Pharmacophore fp length 280
10 fold CV
Compounds in training 5968
RMSE 0.16
Pearson R2 0.27
23. Exploiting medicinal chemistry knowledge to accelerate projects
• hERG – auditable models
• Predict hERG potency from RF model [10 fold CV]
• Example CHEMBL12713 sertindole
• Colour structure by feature importance weighted sum of pharmacophore pair fingerprints – show the chemists where the hotspots are.
• Drill deeper to show the most important positive and negative features.
RF prediction pIC50 7.7
median_with: 5.1
median_without: 4.7
median_diff: 0.4
n_examples_with: 4585
n_examples_without: 1383

median_with: 5.1
median_without: 5.3
median_diff: -0.2
n_examples_with: 3106
n_examples_without: 2862
Regression Forest & Pharmacophore understanding
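The per-feature statistics shown for sertindole (median_with, median_without, median_diff and the two example counts) can be reproduced for any binary feature with a simple split of the training set. The sketch below is an assumption about how such numbers could be computed, not the MedChemica code; the feature names and pIC50 values are made up for illustration.

```python
from statistics import median


def feature_effect(records, feature):
    """Compare pIC50 for training compounds with and without a feature.

    records: list of (set of feature names, pIC50) tuples (hypothetical layout).
    """
    with_f = [p for feats, p in records if feature in feats]
    without_f = [p for feats, p in records if feature not in feats]
    if not with_f or not without_f:
        raise ValueError("feature must be present in some compounds and absent in others")

    m_with, m_without = median(with_f), median(without_f)
    return {
        "median_with": m_with,
        "median_without": m_without,
        "median_diff": m_with - m_without,
        "n_examples_with": len(with_f),
        "n_examples_without": len(without_f),
    }


# Toy training set: feature names and activities are placeholders.
toy = [
    ({"basic-aromatic"}, 6.0),
    ({"basic-aromatic", "halo-aromatic"}, 5.0),
    ({"halo-aromatic"}, 4.5),
    (set(), 4.0),
    ({"basic-aromatic"}, 5.5),
]
effect = feature_effect(toy, "basic-aromatic")
```

Reporting the counts alongside the medians lets a chemist judge how much data stands behind each highlighted feature.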
24. Exploiting medicinal chemistry knowledge to accelerate projects
kNN – Understanding from neighbouring structures
• hERG – auditable models
• Predict hERG potency from kNN model [10 fold CV]
• Example CHEMBL12713 sertindole
• Identify the closest neighbours by Tanimoto distance on ECFP4 fingerprints
• Show chemists structures
kNN prediction pIC50 8.2
distance 0.17 0.2 0.23
pIC50 7.7 4.1 8.2
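A minimal sketch of the neighbour lookup behind this kind of display is shown below. Fingerprints are represented as plain Python sets of "on" bit indices rather than real ECFP4 fingerprints, and the compound names, bits and pIC50 values are invented for illustration. How the neighbours' activities are combined into the kNN prediction is not stated on the slide, so the sketch stops at retrieving the neighbours.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def nearest_neighbours(query_fp, training, k=3):
    """Return the k closest training compounds as (distance, name, pIC50)."""
    scored = [
        (tanimoto_distance(query_fp, fp), name, pic50)
        for name, fp, pic50 in training
    ]
    return sorted(scored)[:k]


# Toy data: bit sets and activities are placeholders, not real compounds.
query = {1, 2, 3, 4}
training_set = [
    ("cmpd_a", {1, 2, 3, 4, 5}, 6.5),
    ("cmpd_b", {1, 2, 3}, 5.0),
    ("cmpd_c", {1, 2, 7, 8}, 7.0),
    ("cmpd_d", {9, 10}, 4.0),
]
neighbours = nearest_neighbours(query, training_set, k=3)
```

Showing the distance next to each neighbour's measured pIC50 gives the chemist a direct view of how close, and how consistent, the supporting evidence is.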
25. Exploiting medicinal chemistry knowledge to accelerate projects
• ML models built for 20 critical seizure related CNS targets
• Communicate to chemists activity prediction & if model out of domain
• Show close structures and/or toxophores
Seizure prediction by Composite Machine Learning
CHEMBL12713 sertindole: seizure activity observed clinically
Predictions in line with measured data
Legend: more potent than 1 µM; less potent than 1 µM; out of domain (no prediction possible)
27. Exploiting medicinal chemistry knowledge to accelerate projects
Pair & Rule
Database
Compounds
from Rules
API server
RESTful
API
Compound
to Pairs
MCRules
Corporate structures and measurements
from DB
Structure and
data clean up
Spot Design
Pair
finding
Web GUI
MedChemica
In-House
Design tools
CLI
MedChemica
Clean Structures
& Data
Explainable
QSAR
Engineering and Automation
28. Exploiting medicinal chemistry knowledge to accelerate projects
Overcoming the Barriers to Implementing AI
✓ Data integrity and curation
✓ Knowledge extraction algorithms
✓ Engineering, automation and interfaces
✓ Interpretability
Knowledge Database, MCPairs, MC GUI
30. Exploiting medicinal chemistry knowledge to accelerate projects
A Less Simple Example
Increase logD and gain solubility
Question: What is the effect on lipophilicity and solubility?
Available Statistics:
Property          Number of Observations   Direction   Mean Change   Probability
logD              8                        Increase    1.2           100%
Log(Solubility)   14                       Increase    1.4           92%
Roche Example:
logD = 2.65, Kinetic solubility = 84 µg/ml, IC50 SST5 = 0.8 µM
logD = 3.63, Kinetic solubility = >452 µg/ml, IC50 SST5 = 0.19 µM
Roche data is inconclusive! (2 pairs for logD, 1 pair for solubility)
31. Exploiting medicinal chemistry knowledge to accelerate projects
Instant SAR Analysis
Compound to Pairs
• Chemists can instantly see the pairs to a compound and explore
property changes
[Diagram: Molecule of interest → Compound to Pairs, drawing on Exploitable Knowledge → All the matched pairs of that molecule]
Compound to Pairs
https://www.youtube.com/watch?v=OFhZJulxsAw&t=0s&list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL&index=2
33. Exploiting medicinal chemistry knowledge to accelerate projects
3 possible input streams
[Diagram: inside YOUR FIREWALL, data from Your DB (e.g. assay1) reaches the MCPairs Server and Rule Database by crontab + MCPCLI, REST API or an ETL custom plugin; exploitation is via the REST API]
• Extract Transform Load (ETL)
• Custom plugin scripted by MedChemica
• Usually 3–4 weeks' work
• On-site work and team interaction required
• Export flat files of data
• MCPCLI reads in the files and deletes them
• Direct read access to DB
• SQL searches for compounds / measurements
• https requests for compounds / measurements
• Most robust option
10 years' experience building automated systems
34. Exploiting medicinal chemistry knowledge to accelerate projects
Example Current Pharma install
[Diagram: inside the PHARMA FIREWALL, D360 feeds the MCPairs Server and Rule Database via crontab + MCPCLI, REST API or an ETL custom plugin]
3 ways of exploitation: in-house design tools and workflows (via the REST API), MedChemica web tool, MedChemica CLI
• Every 2 days…
• Latest compound structures pulled from D360 and loaded
• Latest measurements from assays pulled and loaded
• Custom plugin handles data input streaming
• Update the matched pairs and update rules