This document discusses issues with combinatorial chemistry techniques used in drug discovery and alternatives to increase molecular diversity. Specifically, it addresses the problems of screening large libraries of compounds produced by combinatorial synthesis. It proposes using improved virtual docking software that incorporates flexibility of drug targets to more accurately model binding and identify potentially active compounds. The document also reviews literature on fixing problems associated with large combinatorial libraries, such as using analytical techniques and docking simulations to search libraries and determine compound diversity.
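The library-diversity idea mentioned above can be sketched with a greedy MaxMin picker over fingerprint distances. This is a generic illustration, not the method from the document: the set-based fingerprints and function names are invented for the example.

```python
# Hedged sketch: greedy MaxMin diversity selection over a compound library.
# Fingerprints are represented as Python sets of "on" bit indices; the
# distance used is 1 - Tanimoto similarity.

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints (sets of bit indices)."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def maxmin_pick(fps, n_pick, seed_idx=0):
    """Greedily pick n_pick diverse compounds: at each step take the
    compound whose nearest already-picked neighbour is most distant."""
    picked = [seed_idx]
    remaining = set(range(len(fps))) - {seed_idx}
    while len(picked) < n_pick and remaining:
        best, best_dist = None, -1.0
        for i in remaining:
            # distance to the closest compound already in the picked set
            d = min(1.0 - tanimoto(fps[i], fps[j]) for j in picked)
            if d > best_dist:
                best, best_dist = i, d
        picked.append(best)
        remaining.discard(best)
    return picked
```

Real workflows would compute fingerprints (e.g. Morgan/ECFP) with a cheminformatics toolkit rather than hand-written sets; the greedy selection logic is the same.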
USUGM 2014 - Gregory Landrum (Novartis): What else can you do with the Marku...ChemAxon
In collaboration with ChemAxon we have developed a web-based interface for searching, browsing and managing chemical information. The system was designed to capture the information that users stored in various local documents (PDFs, PowerPoint slides, images, etc.). These bits of information were not centrally available, and when people moved on, this data was lost.
ChemAxon’s JChem Cartridge, with its Markush extensions and Document to Database tool, enabled us to collect this data, and it serves as a good basis for future developments. When developing this new interface, we focused on ease of use, maintainability, and flexibility.
PubChem for drug discovery and chemical biologyChris Southan
This document provides an overview of the PubChem database for academic drug discovery and chemical biology. It describes PubChem's large content of over 97 million compounds and 3.4 million with bioactivity results. It highlights drug-related resources in PubChem like ChEMBL and the Guide to Pharmacology. It also demonstrates several use cases, including searching structures extracted from patents, linking between papers and chemistry, and getting probes mapped into PubChem.
The document discusses computer-aided drug design (CADD) and its role in drug discovery. CADD uses computer software and modeling to aid the drug design process and identify new drug candidates. It reduces the time and cost of drug discovery compared to traditional methods. Some of the earliest approved drugs discovered using CADD include Dorzolamide, Captopril, and drugs for HIV. CADD approaches include structure-based design using protein structure data and ligand-based design using information about known active/inactive ligands. Key steps involve target identification, obtaining protein structures, ligand docking simulations, and lead optimization.
Computer Assisted Drug Design By Rauf Pathan and Patel Mo ShaffanPathan Rauf Khan
CADD is a modern drug design technique; its use reduces drug screening time and helps discover new drugs with specific therapeutic activity.
Will the correct drugs please stand up?Chris Southan
This document summarizes a study comparing different databases of approved drug structures mapped to PubChem identifiers (CIDs). The study found significant discordances between sources, with little consensus on total numbers of approved drugs or their structures. Only 183 structures were common to all 8 sources compared. The sources exhibited extensive structural multiplexing, with the same structure represented by multiple CIDs. This multiplexing extends beyond approved drugs and poses challenges for tasks like QSAR. Improved curation and direct submission of structures from drug developers could help resolve inconsistencies.
The IUPHAR/MMV Guide to Malaria Pharmacology Chris Southan
This document summarizes the creation of the IUPHAR/MMV Guide to Malaria Pharmacology (GtoMPdb) database by the authors. It captures antimalarial compounds, targets, and their relationships by curating data from publications. The database has adapted the Guide to Pharmacology data model and has begun capturing data on 28 antimalarial ligands. Future plans include expanding the curation, developing an online portal, and submitting data to PubChem to link compounds to publications and make the data more accessible.
Collaborative sharing of molecules and data in the mobile ageSean Ekins
The document discusses collaborative drug discovery and the use of mobile applications in chemistry. It describes how the Collaborative Drug Discovery (CDD) platform allows researchers to securely share molecules and data. Examples are provided of collaborations between academic labs and pharmaceutical companies using the CDD vault to work on projects related to tuberculosis drug development. The rise of mobile devices is creating new opportunities for chemistry applications to enable collaborative workflows involving tasks like structure drawing, database searching, and data sharing from any location.
1. Collaboration and data sharing in science is essential but requires technological and cultural changes to allow for analysis and insights.
2. Improving data sharing across organizations is challenging due to heterogeneous systems and social barriers.
3. The Medicines Discovery Catapult is supporting the biopharma sector by developing solutions to improve data interpretation, collaborative data sharing between organizations, and by matching industry challenges to technology providers.
Assessing GtoPdb ligand content in PubChemChris Southan
The document discusses the content of ligands from the IUPHAR/BPS Guide to PHARMACOLOGY database (GtoPdb) that is contained within PubChem. It finds that GtoPdb ligands have extensive overlap with several other sources within PubChem, including patents, DrugBank, vendor structures, bioassays, and ChEMBL. This overlap allows users to find additional information on GtoPdb ligands from these complementary sources within PubChem.
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...Andrew McEachran
There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.
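The formula-matching step described above can be illustrated with a minimal exact-mass lookup under a parts-per-million tolerance. The tolerance value, candidate list and masses below are illustrative assumptions, not values from the study.

```python
# Hedged sketch of exact-mass formula matching: compare an observed
# monoisotopic mass against candidate formula masses within a ppm tolerance.

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match_formulas(observed_mass, candidates, tol_ppm=5.0):
    """Return (formula, ppm_error) pairs whose theoretical mass lies within
    tol_ppm of the observed feature mass, best match first."""
    hits = []
    for formula, mass in candidates.items():
        err = ppm_error(observed_mass, mass)
        if abs(err) <= tol_ppm:
            hits.append((formula, err))
    return sorted(hits, key=lambda h: abs(h[1]))

# Example: caffeine's monoisotopic mass is ~194.0804 Da
candidates = {"C8H10N4O2": 194.0804, "C9H12N3O2": 194.0924}
print(match_formulas(194.0806, candidates))
```

A production workflow would also score isotope distribution and spacing, as the abstract describes; exact mass alone rarely yields a unique formula at higher masses.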
Computer aided drug design uses computational approaches to aid in the drug discovery process. There are several key approaches including ligand based approaches which identify characteristics of known active ligands, target based approaches which use information about the biological target, and structure based drug design which utilizes 3D structural information. The main steps in drug design include target identification and validation, lead identification and optimization, and preclinical and clinical trials. Computational tools are used throughout the process for tasks like molecular docking, ADMET prediction, and structure activity relationship analysis.
Virtual Screening and Hit PrioritizationPuneet Kacker
This document discusses virtual ligand screening (VLS) as an alternative to high-throughput screening for identifying potential drug candidates. It describes the VLS process, which involves selecting a target and compound library, preparing the target and ligands, running a docking simulation to analyze ligand-target binding, and prioritizing hits. The document outlines advantages of computational methods like VLS compared to experimental screening, as well as some limitations. It also provides examples of free and commercial docking engines that can be used and highlights challenges in VLS like accounting for receptor flexibility.
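The final hit-prioritization step of the VLS process above can be sketched as a simple re-ranking of docked hits. The field names, the more-negative-is-better score convention, and the ligand-efficiency option are assumptions for illustration, not the document's actual protocol.

```python
# Hedged sketch of hit prioritization after docking: rank ligands by score,
# optionally normalising by heavy-atom count (ligand efficiency) so small,
# efficient binders are not drowned out by large ones.

from dataclasses import dataclass

@dataclass
class DockedHit:
    name: str
    score: float       # docking score; more negative = better binding
    heavy_atoms: int   # non-hydrogen atom count

def prioritize(hits, by_efficiency=False):
    """Return hits ranked best-first, by raw score or by score per heavy atom."""
    if by_efficiency:
        key = lambda h: h.score / h.heavy_atoms
    else:
        key = lambda h: h.score
    return sorted(hits, key=key)

hits = [DockedHit("frag-1", -6.0, 12),
        DockedHit("lead-7", -8.5, 34),
        DockedHit("lead-2", -8.0, 20)]
print([h.name for h in prioritize(hits)])                      # by raw score
print([h.name for h in prioritize(hits, by_efficiency=True)])  # per heavy atom
```

Note how the two orderings differ: the small fragment ranks last on raw score but first on efficiency, which is one reason prioritization schemes matter.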
Multiplexing analysis of 1000 approved drugs in PubChemChris Southan
This document analyzes structural representations of 1000 approved drugs in PubChem. It finds that on average, each drug structure was submitted 81 times under different IDs. Each drug was also represented in 44 mixtures on average. The increased availability of patent data is contributing to issues like increased representations of unspecified chirality and increased "virtual" structures rather than real compounds. As chemical data in PubChem grows towards 100 million entries, structural multiplexing could make it harder to identify correct drug representations and is a problem for inexperienced users of large chemical databases.
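The "multiplexing" problem above, with many record IDs resolving to one structure, can be illustrated by grouping records on a canonical key such as the first, connectivity-only block of an InChIKey. The record IDs below are invented for the example; the InChIKeys are the real ones for aspirin and caffeine.

```python
# Hedged illustration: collapse duplicate database records by the
# 14-character connectivity block of their InChIKeys, so stereo/isotope
# variants of one parent structure land in one bucket.

from collections import defaultdict

records = [
    ("CID-1", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),   # aspirin, standard form
    ("CID-2", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),   # same structure, second ID
    ("CID-3", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),   # caffeine
]

def collapse_by_skeleton(records):
    """Group record IDs by the first InChIKey block (molecular skeleton)."""
    groups = defaultdict(list)
    for rec_id, inchikey in records:
        skeleton = inchikey.split("-")[0]
        groups[skeleton].append(rec_id)
    return dict(groups)

groups = collapse_by_skeleton(records)
# structures represented by more than one record ID are the multiplexed ones
multiplexed = {k: v for k, v in groups.items() if len(v) > 1}
print(multiplexed)
```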
Computer aided drug design uses computational methods to aid in the drug discovery process. It can be used for both structure based drug design where the biological target structure is known, as well as ligand based drug design where the target structure is unknown. Structure based drug design techniques include molecular docking to study ligand binding poses and interactions. Ligand based techniques include pharmacophore modeling to identify chemical features important for activity and quantitative structure-activity relationships to correlate chemical structure to biological activity. These computational methods allow for more rapid and cost-effective discovery and optimization of drug candidates compared to traditional experimental methods alone.
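The QSAR idea mentioned above, correlating chemical structure with biological activity, can be sketched as a least-squares fit of activity against a single descriptor. The compounds, descriptor values and activities below are invented for illustration; real QSAR models use many descriptors and proper validation.

```python
# Hedged sketch of a minimal QSAR model: ordinary least squares for
# activity = a + b * descriptor, in pure Python.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

logp  = [1.2, 2.5, 0.8, 3.1]   # descriptor: lipophilicity (invented values)
pic50 = [5.1, 6.4, 4.6, 7.2]   # activity to correlate against (invented)

a, b = fit_line(logp, pic50)
predict = lambda x: a + b * x
print(f"pIC50 = {a:.3f} + {b:.3f} * logP")
```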
Computer aided drug design (CADD) uses computational methods to design and optimize drug molecules. There are three main approaches: ligand-based design, which creates new drugs based on their similarity to known active ligands; structure-based design, which uses the 3D structure of the receptor to design ligands that bind to it; and de novo design, which builds ligands without prior ligand information by identifying interaction points on the receptor and connecting groups of atoms to those points. The goal is to design ligands that selectively bind to disease-related biological targets and inhibit or enhance their function.
Computer-aided drug design (CADD) is a widely used technology that applies computational tools and resources to the storage, management, analysis and modeling of compounds. It relies on digital repositories to design compounds with desired physicochemical characteristics and to predict whether a given molecule will bind to a target, and if so how strongly. Computer-based methods can help us find new hits in drug discovery, screen out many irrelevant compounds early, and study the structure-activity relationships of drug molecules.
This document describes RxnFinder, a database workflow tool for organic synthesis researchers. It contains over 2 million chemical reactions and 1.7 million substances. Researchers can use it to search for synthetic routes, reaction conditions, catalysts, and more. Key features include its focus on novel methods, inclusion of failed reactions, and display of full reaction schemes with scope and limitations. The document provides examples of different types of searches researchers can perform, including transformational, substructure, and property searches. It also describes how search results in RxnFinder can link to detailed article information on Wiley Online Library.
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore's law that it now has its own law named after it: Eroom's Law (Moore's, reversed). It is estimated that the attrition rate of drug candidates is up to 96%, and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of drug failures. Even after a drug is approved and on the market, it can be withdrawn due to safety problems. Therefore, evaluating drug safety extensively and as early as possible becomes all the more important to accelerate drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, deep learning and ML based techniques has had the most gains. Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use, which can generally be divided into 3 main categories: 1. Classification 2. Regression 3. Read-across. The talk will also cover how a hierarchical classification methodology can simplify the problem of assessing the toxicity of any given chemical compound. We will also address recent progress in predictive models and techniques built for various toxicities, and cover some publicly available databases, tools and platforms that make them easy to leverage. We will also compare and contrast various modeling techniques, including deep learning techniques, and their accuracy using recent research. Finally, the talk will also address some of the remaining challenges and limitations yet to be addressed in the area of drug safety assessment.
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...Frederik van den Broek
Slides from my talk at the ACS CINF Symposium on Chemical Nomenclature & Representation on 26 August 2019 in San Diego.
Abstract:
The first edition of the Beilstein Handbook of Organic Chemistry was published nearly 140 years ago. Electronic laboratory notebooks have been in use in chemistry for almost 20 years. And the life science industry still doesn't have a well-defined way of capturing and exchanging information about chemical reactions and relies on imprecise or vendor-specific data formats. Without a common language and structure to describe experiments, data integration is unnecessarily expensive and a significant part of published data has not been readily available for processing or analysis.
The Unified Data Model (UDM) project team aims to improve the situation. UDM is a collective effort of vendors and life science organizations to create an open, extendable and freely available reference model and data format for the exchange of experimental information about compound synthesis and testing. Run under the umbrella of the Pistoia Alliance, the project team has published two releases of the UDM data format, and the model is expected to continue to improve as demand dictates, in concert with the Pistoia Alliance's work on FAIR data implementation by the industry community.
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore’s law that it now has its own law named after it: Eroom’s Law (Moore’s, reversed). It is estimated that the attrition rate of drug candidates is up to 96% and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of the failures.
Even after a drug is approved and on the market, it can be withdrawn due to safety problems. Therefore, evaluating drug safety extensively and as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, deep learning and ML based techniques has had the most gains.
Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use, which can generally be divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address the recent progress in predictive models and techniques built for various toxicities. It will also cover some publicly available databases, tools and platforms available to easily leverage them.
We will also compare and contrast various modeling techniques including deep learning techniques and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
Developing tools for high resolution mass spectrometry-based screening via th...Andrew McEachran
Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized how chemicals are detected in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemistry Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow including visualization and access via the Chemistry Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a freely available software tool to support structure identification and non-targeted analysis. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
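The consensus-ranking step mentioned above, combining chemical metadata to re-rank candidate structures, can be sketched as a weighted sum of normalised metadata counts. The metric names, weights and candidates below are illustrative assumptions, not the Dashboard's actual scheme.

```python
# Hedged sketch of metadata-based consensus ranking: combine several
# metadata counts (e.g. number of data sources, literature references)
# into one weighted score per candidate structure.

def consensus_rank(candidates, weights):
    """candidates: {name: {metric: value}}. Returns names best-first by the
    weighted sum of min-max-normalised metrics."""
    metrics = list(weights)
    # min-max normalise each metric across the candidate set
    lo = {m: min(c[m] for c in candidates.values()) for m in metrics}
    hi = {m: max(c[m] for c in candidates.values()) for m in metrics}

    def score(c):
        s = 0.0
        for m in metrics:
            span = hi[m] - lo[m]
            s += weights[m] * ((c[m] - lo[m]) / span if span else 0.0)
        return s

    return sorted(candidates, key=lambda n: score(candidates[n]), reverse=True)

candidates = {
    "cand-A": {"sources": 40, "pubmed_refs": 120},
    "cand-B": {"sources": 55, "pubmed_refs": 10},
    "cand-C": {"sources": 5,  "pubmed_refs": 3},
}
print(consensus_rank(candidates, weights={"sources": 0.5, "pubmed_refs": 0.5}))
```

The intuition is that a candidate well represented across several independent metadata axes is more likely to be the correct identification than one that scores highly on a single axis.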
Computer aided drug design (CADD) uses computational methods to simulate drug-receptor interactions. CADD is heavily dependent on bioinformatics tools like homology modeling, similarity searching, and physicochemical modeling. These "gears" can be greased through easier collaboration between researchers, using Web 2.0 tools such as Google Docs, mind maps, and slide sharing to integrate them. Gaming consoles like the PS3 are also being explored as affordable clusters for CADD applications.
Molecular modelling can help reduce the time and risks of drug development. It is applied to target structural characterization, developing focused libraries for hit discovery, and lead development and optimization. Fragment-based drug design is an important advance, where drug candidates are built inside the target's binding site from small molecule fragments, improving affinity from millimolar to micromolar to nanomolar levels. Molecular modelling supports medicinal chemistry decisions by providing structural insights into how drug candidates interact with their targets.
Slicing and dicing curated protein targets: Analysing the drugged, druggable ...Guide to PHARMACOLOGY
Presented by team member Chris Southan in April 2015 at the BPS Focused meeting in Edinburgh: Exploiting the new pharmacology and application to drug discovery.
drug delivery and formulation sciences in the most intelligent way. This should be attained to fulfill the ultimate goal for all scientists: to leave their experimental results over the years as footsteps for followers to walk on.
New Drug Design & Discovery discusses the process of drug discovery and design. It begins with an introduction to how drugs work in the human body to modulate functions. The drug discovery process is then described as a long, expensive endeavor involving chemical synthesis, clinical development, and formulation. Computer-aided drug design uses molecular modeling and structure-based approaches to predict ligand-receptor binding and identify biological targets in silico. Combinatorial chemistry and high-throughput screening allow for the rapid synthesis and testing of large libraries of compounds. The goal is to develop more potent and safer drugs through these computational and high-throughput methods.
Collaboraive sharing of molecules and data in the mobile ageSean Ekins
The document discusses collaborative drug discovery and the use of mobile applications in chemistry. It describes how the Collaborative Drug Discovery (CDD) platform allows researchers to securely share molecules and data. Examples are provided of collaborations between academic labs and pharmaceutical companies using the CDD vault to work on projects related to tuberculosis drug development. The rise of mobile devices is creating new opportunities for chemistry applications to enable collaborative workflows involving tasks like structure drawing, database searching, and data sharing from any location.
1. Collaboration and data sharing in science is essential but requires technological and cultural changes to allow for analysis and insights.
2. Improving data sharing across organizations is challenging due to heterogeneous systems and social barriers.
3. The Medicines Discovery Catapult is supporting the biopharma sector by developing solutions to improve data interpretation, collaborative data sharing between organizations, and by matching industry challenges to technology providers.
Assessing GtoPdb ligand content in PubChemChris Southan
The document discusses the content of ligands from the IUPHAR/BPS Guide to PHARMACOLOGY database (GtoPdb) that is contained within PubChem. It finds that GtoPdb ligands have extensive overlap with several other sources within PubChem, including patents, DrugBank, vendor structures, bioassays, and ChEMBL. This overlap allows users to find additional information on GtoPdb ligands from these complementary sources within PubChem.
The EPA iCSS Chemistry Dashboard to Support Compound Identification Using Hig...Andrew McEachran
There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.
Computer aided drug design uses computational approaches to aid in the drug discovery process. There are several key approaches including ligand based approaches which identify characteristics of known active ligands, target based approaches which use information about the biological target, and structure based drug design which utilizes 3D structural information. The main steps in drug design include target identification and validation, lead identification and optimization, and preclinical and clinical trials. Computational tools are used throughout the process for tasks like molecular docking, ADMET prediction, and structure activity relationship analysis.
Virtual Screening and Hit PrioritizationPuneet Kacker
This document discusses virtual ligand screening (VLS) as an alternative to high-throughput screening for identifying potential drug candidates. It describes the VLS process, which involves selecting a target and compound library, preparing the target and ligands, running a docking simulation to analyze ligand-target binding, and prioritizing hits. The document outlines advantages of computational methods like VLS compared to experimental screening, as well as some limitations. It also provides examples of free and commercial docking engines that can be used and highlights challenges in VLS like accounting for receptor flexibility.
Multiplexing analysis of 1000 approved drugs in PubChemChris Southan
This document analyzes structural representations of 1000 approved drugs in PubChem. It finds that on average, each drug structure was submitted 81 times under different IDs. Each drug was also represented in 44 mixtures on average. The increased availability of patent data is contributing to issues like increased representations of unspecified chirality and increased "virtual" structures rather than real compounds. As chemical data in PubChem grows towards 100 million entries, structural multiplexing could make it harder to identify correct drug representations and is a problem for inexperienced users of large chemical databases.
Computer aided drug design uses computational methods to aid in the drug discovery process. It can be used for both structure based drug design where the biological target structure is known, as well as ligand based drug design where the target structure is unknown. Structure based drug design techniques include molecular docking to study ligand binding poses and interactions. Ligand based techniques include pharmacophore modeling to identify chemical features important for activity and quantitative structure-activity relationships to correlate chemical structure to biological activity. These computational methods allow for more rapid and cost-effective discovery and optimization of drug candidates compared to traditional experimental methods alone.
There is a growing need for rapid chemical screening and prioritization to inform regulatory decision-making on thousands of chemicals in the environment. We have previously used high-resolution mass spectrometry to examine household vacuum dust samples using liquid chromatography time-of-flight mass spectrometry (LC-TOF/MS). Using a combination of exact mass, isotope distribution, and isotope spacing, molecular features were matched with a list of chemical formulas from the EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database. This has further developed our understanding of how openly available chemical databases, together with the appropriate searches, could be used for the purpose of compound identification. We report here on the utility of the EPA’s iCSS Chemistry Dashboard for the purpose of compound identification using searches against a database of over 720,000 chemicals. We also examine the benefits of QSAR prediction for the purpose of retention time prediction to allow for alignment of both chromatographic and mass spectral properties. This abstract does not reflect U.S. EPA policy.
Computer aided drug design (CADD) uses computational methods to design and optimize drug molecules. There are three main approaches: ligand-based designs new drugs based on similarities to known active ligands; structure-based uses the 3D structure of the receptor to design ligands that bind to it; and de novo design custom designs ligands without prior information by identifying interaction points on the receptor and connecting ligand groups of atoms to those points. The goal is to design ligands that selectively bind and interfere or enhance biological targets related to diseases.
Computer-aided drug design (CADD) is a widely used technology employing computational tools and resources for the storage, management, analysis and modeling of compounds. It relies on digital repositories to study the design of compounds with desired physicochemical characteristics and to predict whether a given molecule will bind to the target, and if so, how strongly. Computer-based methods can help us search for new hits in drug discovery, screen out many irrelevant compounds at once, and study the structure-activity relationships of drug molecules.
This document describes RxnFinder, a database workflow tool for organic synthesis researchers. It contains over 2 million chemical reactions and 1.7 million substances. Researchers can use it to search for synthetic routes, reaction conditions, catalysts, and more. Key features include its focus on novel methods, inclusion of failed reactions, and display of full reaction schemes with scope and limitations. The document provides examples of different types of searches researchers can perform, including transformational, substructure, and property searches. It also describes how search results in RxnFinder can link to detailed article information on Wiley Online Library.
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore's law that the opposite trend now has its own name: Eroom's Law ("Moore" reversed). It is estimated that the attrition rate of drug candidates is up to 96%, and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of drug failures. Even after a drug is approved and on the market, it can be withdrawn due to safety problems. Therefore, evaluating drug safety extensively and as early as possible becomes all the more important for accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, deep learning and ML based techniques has had the most gains. Specifically, it covers a variety of drug safety related AI and ML techniques currently in use, which can generally be divided into 3 main categories: 1. Classification, 2. Regression, 3. Read-across. The talk also covers how a hierarchical classification methodology can simplify the problem of assessing the toxicity of any given chemical compound. We will address recent progress in predictive models and techniques built for various toxicities, cover some publicly available databases, tools and platforms that make them easy to leverage, and compare and contrast various modeling techniques, including deep learning, and their accuracy using recent research. Finally, the talk addresses some of the remaining challenges and limitations yet to be addressed in the area of drug safety assessment.
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform... — Frederik van den Broek
Slides from my talk at the ACS CINF Symposium on Chemical Nomenclature & Representation on 26 August 2019 in San Diego.
Abstract:
The first edition of the Beilstein Handbook of Organic Chemistry was published nearly 140 years ago. Electronic laboratory notebooks have been in use in chemistry for almost 20 years. And the life science industry still doesn't have a well-defined way of capturing and exchanging information about chemical reactions and relies on imprecise or vendor-specific data formats. Without a common language and structure to describe experiments, data integration is unnecessarily expensive and a significant part of published data has not been readily available for processing or analysis.
The Unified Data Model (UDM) project team aims to improve the situation. UDM is a collective effort of vendors and life science organizations to create an open, extendable and freely available reference model and data format for the exchange of experimental information about compound synthesis and testing. Run under the umbrella of the Pistoia Alliance, the project team has published two releases of the UDM data format, and the model is expected to continue to improve as demand dictates, working alongside the Pistoia Alliance's FAIR data implementation efforts in the industry community.
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore's law that the opposite trend now has its own name: Eroom's Law ("Moore" reversed). It is estimated that the attrition rate of drug candidates is up to 96%, and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of the failures.
Even after a drug is approved and on the market, it can be withdrawn due to safety problems. Therefore, evaluating drug safety extensively and as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, deep learning and ML based techniques has had the most gains.
Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use, which can generally be divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address recent progress in predictive models and techniques built for various toxicities, and cover some publicly available databases, tools and platforms that make them easy to leverage.
We will also compare and contrast various modeling techniques, including deep learning, and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
Developing tools for high resolution mass spectrometry-based screening via th... — Andrew McEachran
Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized how chemicals are detected in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemistry Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow including visualization and access via the Chemistry Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a freely available software tool to support structure identification and non-targeted analysis. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Computer-aided drug design (CADD) uses computational methods to simulate drug-receptor interactions. CADD is heavily dependent on bioinformatics tools like homology modeling, similarity searching, and physicochemical modeling. These "gears" can be greased through easier collaboration between researchers, using Web 2.0 tools such as Google Docs, mind maps, and slide sharing to integrate them. Gaming consoles like the PS3 are also being explored as affordable clusters for CADD applications.
Molecular modelling can help reduce the time and risks of drug development. It is applied to target structural characterization, developing focused libraries for hit discovery, and lead development and optimization. Fragment-based drug design is an important advance, where drug candidates are built inside the target's binding site from small molecule fragments, improving affinity from millimolar to micromolar to nanomolar levels. Molecular modelling supports medicinal chemistry decisions by providing structural insights into how drug candidates interact with their targets.
Slicing and dicing curated protein targets: Analysing the drugged, druggable ... — Guide to PHARMACOLOGY
Presented by team member Chris Southan in April 2015 at the BPS Focused meeting in Edinburgh: Exploiting the new pharmacology and application to drug discovery.
drug delivery and formulation sciences in the most intelligent way. This should be attained to fulfill the ultimate goal for all scientists: to leave their experimental results, accumulated over the years, as footsteps for followers to walk on.
New Drug Design & Discovery discusses the process of drug discovery and design. It begins with an introduction to how drugs work in the human body to modulate functions. The drug discovery process is then described as a long, expensive endeavor involving chemical synthesis, clinical development, and formulation. Computer-aided drug design uses molecular modeling and structure-based approaches to predict ligand-receptor binding and identify biological targets in silico. Combinatorial chemistry and high-throughput screening allow for the rapid synthesis and testing of large libraries of compounds. The goal is to develop more potent and safer drugs through these computational and high-throughput methods.
Combinatorial chemistry is a technique used to rapidly produce large numbers of similar drug molecules to screen for new drugs more efficiently. It uses solid-phase synthesis with resins, monomers, and linkers under the same reaction conditions to generate compound libraries. This allows companies to increase their chances of finding novel therapeutic compounds and save time and money compared to traditional drug discovery methods. While efficiency depends on factors like compound size and solubility, combinatorial chemistry is expected to continue significantly impacting drug manufacturing.
This document discusses combinatorial chemistry and high-throughput screening methods. It provides an introduction to combinatorial chemistry, explaining that it allows for rapid synthesis of large numbers of structurally similar molecules. High-throughput screening is then discussed as a process for efficiently testing large libraries of potential drug compounds. Some key advantages of combinatorial chemistry are rapid synthesis, ability to produce large numbers of compounds, and increased likelihood of success in drug discovery. Potential challenges are difficulty in characterizing unexpected products and substrate selection. Current applications discussed include drug development areas like histamine receptor antagonists and dihydrofolate reductase inhibitors.
Computers play several important roles in the drug discovery process:
1) They analyze thousands of molecular structures and properties to identify candidate molecules that may bind to disease targets. This virtual screening allows faster evaluation of large libraries.
2) Databases organize data on chemical structures to facilitate computer-aided searches for promising drug candidates.
3) Software allows scientists to visualize and model molecular interactions, guiding the design of molecules that optimally bind to targets.
The document discusses the applications of bioinformatics in drug discovery. It describes how bioinformatics supports computer-aided drug design through computational methods to simulate drug-receptor interactions. It also discusses how virtual high-throughput screening can identify compounds that strongly bind to protein targets. The document outlines the key steps in drug design, including identifying the disease target, studying lead compounds, rational drug design techniques, and testing drugs. It emphasizes that bioinformatics can predict important drug characteristics like absorption and toxicity to save costs during development.
Pharmaceutical companies use computers in many aspects of the drug discovery process. Computers allow researchers to analyze thousands of molecular structures and rapidly search databases to identify promising drug candidates that can bind to disease targets. They use computational modeling and simulations to predict how well a molecule will bind to and affect its target. This helps streamline the process of discovering and developing new drugs compared to traditional trial-and-error methods. Computers play a key role in expediting tasks from target identification to lead optimization and preclinical testing.
This document discusses combinatorial chemistry, which is a technique used to rapidly generate large libraries of compounds for screening and drug discovery. It defines combinatorial chemistry as producing large numbers of similar molecules using the same reaction conditions. The key principles are that it allows preparation of thousands of compounds per month using parallel synthesis techniques like solid and solution phase chemistry. This increases the chances of identifying hit compounds for pharmaceutical development compared to traditional synthetic methods. Applications of combinatorial chemistry include drug discovery, agrochemical and biotechnology research by creating molecular diversity libraries for high-throughput screening.
Computer aided drug design (CADD) uses computer modeling to help design and discover new drug molecules. It involves designing molecules that are complementary in shape and charge to bind to a biomolecular target like a protein. This can help drugs activate or inhibit the target to produce therapeutic effects. CADD is not a direct route to new drugs but provides information to guide and coordinate drug discovery experiments in a more efficient manner. It is hoped CADD can help save time and money in the drug development process.
Bioinformatics plays an important role in drug discovery and development by enabling target identification, rational drug design, compound refinement, and other processes. Key applications of bioinformatics include virtual screening of large compound libraries to identify potential drug leads, homology modeling of protein structures to inform drug design, and similarity searches to find analogs of existing drug molecules. The overall drug development process involves studying the disease, identifying drug targets, designing compounds, testing and refining candidates, and conducting clinical trials. Computational techniques expedite many steps but experimental validation is still needed.
In silico Drug Design: Prospective for Drug Lead Discovery — inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, including new teaching methods, assessment, validation and the impact of new technologies, and it continues to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
The drug discovery process involves several steps:
1) Hits from high-throughput screening are identified, which may have many potential scaffolds.
2) Hit-to-lead involves synthesizing many compounds to determine structure-activity relationships and improve properties.
3) Lead optimization aims to increase potency, selectivity, and in vivo efficacy while maintaining favorable properties. Efficient synthesis and parallel chemistry methods are important throughout the process.
Bioinformatics role in Pharmaceutical industries — Muzna Kashaf
Bioinformatics plays a key role in the pharmaceutical industry by enabling target identification of diseases, rational drug design, compound refinement, and other processes. It facilitates identifying target diseases and compounds, detecting molecular bases of diseases, designing drugs, refining compounds, and testing drug solubility and effects. Bioinformatics supports various stages of drug development including formulation, crystallization determination, polymer modeling, and testing before human use. Its integration into the pharmaceutical industry supports drug discovery, healthcare advances, and realizing the promises of projects like the Human Genome Project.
Rapid lead compounds discovery through high-throughput screening — rita martin
High-throughput screening is used today by most of the drug discovery industry; it helps pharmaceutical researchers make the drug discovery process faster and also increases the quality and quantity of drug production. In combination with robotics, data processing and control software, liquid handling devices, and sensitive detectors, it allows a researcher to quickly conduct millions of chemical, genetic or pharmacological tests.
1. Bioinformatics uses computer science and information technology to analyze biological data and assist with drug discovery. It helps identify drug targets and design drug candidates.
2. The drug design process involves identifying a disease target, studying compounds of interest, detecting molecular disease bases, rational drug design, refinement, and testing. Bioinformatics tools assist with each step.
3. CADD uses computational methods to simulate drug-receptor interactions and is heavily dependent on bioinformatics tools and databases. It supports techniques like virtual screening, sequence analysis, homology modeling, and physicochemical modeling to aid drug development.
LEAD IDENTIFICATION BY SUHAS PATIL (S.K.) — suhaspatil114
This document provides an overview of lead identification in drug discovery. It discusses various methods for identifying lead compounds, including combinatorial chemistry, high-throughput screening, and in silico lead discovery techniques. Combinatorial chemistry allows for the rapid production and screening of large compound libraries. High-throughput screening assays test large numbers of compounds against biological targets using automated technologies. In silico methods like molecular docking use computer simulations to predict how compounds may bind and interact with targets. The goal is to find initial "hit" compounds that can then be optimized into drug candidates.
This document provides information about Anthony Crasto, a Glenmark scientist based in Navi Mumbai, India. It summarizes that he runs several free websites that provide drug and pharmaceutical information which have received millions of hits on Google. These websites help track new drugs worldwide and provide free advertising to help millions. Despite facing personal challenges with his son's health issues, Crasto's vast readership from academia and industry motivates him to continue his work through these websites.
Slides for burroughs wellcome foundation ajw100611 sefinal — Sean Ekins
- The document discusses open science and how greater collaboration and data sharing between pharmaceutical companies, academia, and other groups could help accelerate drug discovery. It argues that pre-competitive data and models being made freely accessible in public domains could deliver high value.
- It provides examples of existing open databases and tools but notes that more coordinated efforts are needed to clean up chemistry data on the internet and support those promoting open science. Integrating proprietary and public data through open interfaces in collaborative platforms could benefit all parties.
This document discusses the progress and challenges of structure-based drug design. It notes that there has been explosive growth in this field in recent years due to improvements in molecular biology techniques, protein purification methods, overexpression systems, data collection methods, and computing power. These advances have led to a rapid increase in the number of protein structures solved. However, it also notes that moving from discovering a tightly-binding inhibitor to developing an actual drug faces many challenges, as a compound still needs to be safe, stable, and able to reach its target in the body without adverse effects. The final verdict on how successful structure-based drug design will be long-term will take decades to determine as more cases are studied.
1. Abhishek Pai 1319400
1 | P a g e
SCHOOL OF CHEMISTRY
CHM3B2 – Literature Project
Literature Review
Fixing Problems and Finding Alternatives to Combinatorial Chemistry to
increase molecular diversity
Abhishek Pai
1319400
April 2016
PROJECT SUPERVISOR: Dr John Wilkie
Abstract
Drug discovery and development is a very difficult area of research and consumes a large amount of resources to produce effective results. Researchers have developed a technique for synthesising large numbers of molecules, called combinatorial synthesis. The use of this method meant that libraries increased dramatically in size, from hundreds of thousands to millions of compounds. The production of these compounds saves the pharmaceutical industry many of the resources that would normally be spent discovering new compounds through fieldwork. But the technique brought a new problem: it produces large libraries, yet adds the burden of having to find the compounds within them that are therapeutically active or diverse in structure and property. The large numbers became more of a hindrance than a solution; the best analogy for this problem is a "needle in a haystack". The diversity aspect of drug discovery is also very important, as it means that novel drug treatments can be found. This literature review focuses on two aspects: firstly, solving the issues related to finding compounds within combinatorial libraries; secondly, finding alternative routes of synthesis that may produce a larger range of molecular diversity for drug discovery and development.
Introduction
Currently, within the pharmaceutical industry, the most popular method of increasing the number of available compounds is combinatorial chemistry. The classical method of discovering new molecules required fieldwork to obtain new and interesting compounds that chemists could then manipulate and test.
An example of such fieldwork is the exploration of remote biomes such as rainforests, deserts and tundras. The rainforest is an incredible breeding ground for compounds that have never been seen in labs; there is enormous potential for discovery due to their isolation from humans. Sloths are an example of creatures with great potential to produce new research material: their fur is believed to be an excellent breeding ground for new species of bacteria and fungi, and researchers are hoping to exploit this to obtain novel antibacterial and antifungal compounds. The serendipitous discovery of new drug compounds is what researchers are looking for, since fungi and bacteria are able to evolve new ways of combating problems such as antibiotic resistance, which is very prevalent around the globe. But this technique is very time-consuming and cost-inefficient: it requires the investment of large sums of money, and the returns may never be as fruitful [i].
This is where combinatorial chemistry comes in, as it is very efficient at producing large numbers of diverse compounds of the kind that may occur naturally in the wild, and it does not require the manpower and resources that fieldwork does. Combinatorial chemistry speeds up the creation of a large library of compounds by avoiding the conventional route of chemical synthesis. The conventional method involves reacting compounds A and B to form the product AB; this process is very slow and only produces one compound. Combinatorial chemistry instead uses several derivatives or analogues of compounds A and B to form combinations of products: for example, combining A1, A2, A3 … An and B1, B2, B3 … Bn to form A1B1, A1B2, A2B1 … AnBn. Figure 1 shows how the process of combinatorial synthesis works. The diversity of the compounds produced is directly related to the variation in the initial compounds available for the reaction. This enables pharmaceutical companies to build large libraries of compounds over short periods of time due to the rapid nature of the process. It is hypothesised that increasing the number of compounds in a combinatorial library also makes it more likely that therapeutically active compounds are discovered [ii]. It is important to note that not all the compounds produced using this method will be useful in medicinal chemistry; a compound may have uses in other fields such as food, cleaning or petrochemicals, or it may be useless. This factor is explored later in the literature review.
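The A-times-B enumeration described above can be sketched in a few lines of Python; the building-block labels are placeholders, not real reagents:

```python
from itertools import product

# Illustrative building blocks: derivatives A1..A3 of compound A and
# B1..B3 of compound B.
a_blocks = ["A1", "A2", "A3"]
b_blocks = ["B1", "B2", "B3"]

# Combinatorial synthesis couples every A derivative with every B
# derivative, so the library grows multiplicatively: n_A * n_B products
# from only n_A + n_B starting materials.
library = [a + b for a, b in product(a_blocks, b_blocks)]

print(len(library))  # 9 products from 6 building blocks
print(library[:3])   # ['A1B1', 'A1B2', 'A1B3']
```

With, say, 100 derivatives of each starting material the same loop yields 10,000 products, which is exactly how library sizes reach the hundreds of thousands the abstract mentions.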
Means of testing the large numbers of compounds produced by combinatorial synthesis for further development and commercial viability were also developed, called High Throughput Screening (HTS) and High Throughput Docking (HTD). HTS uses live cell cultures, with the aid of robotics and computerised techniques, to filter large combinatorial libraries from millions of compounds down to hundreds. Since not every compound within the library has a pharmacological effect, those without one are eliminated from the testing sample. The libraries are tested on live cell cultures to detect any response that may be produced that shows promising signs of therapeutic use.

On the other hand, HTD is a computational simulation that analyses the 3D structures of the target and the compound to find the ideal conformation for binding. This method is sometimes considered too rigid, as it can only use the data points inputted into the simulation to determine the binding ability of a compound. Since research on compounds and targets is ongoing to determine their binding points, new information is constantly being discovered about the way a compound binds to a target. This lack of information and flexibility means that the binding capability of some compounds may be completely ignored, whereas HTS would have detected it due to the flexibility and fluidity of the live cell cultures. But HTD has the ability to use ranking systems to list the compounds most likely to bind and produce a response in cell cultures. As a computational method of screening potential drug candidates it is very cheap and efficient at producing results: increasing the speed of testing is simply a matter of increasing the computing power, which has become cheaper over the years due to developments in computing technology [iii].
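The ranking idea behind HTD can be sketched as below; the compound names and binding energies are illustrative placeholders, not output from a real docking engine:

```python
# A minimal sketch of HTD ranking: each compound in the library receives a
# predicted binding energy (kcal/mol; more negative = stronger binding),
# and the library is sorted so only the top candidates go on to HTS.
# Scores below are invented for illustration.
docking_scores = {
    "compound_001": -9.2,
    "compound_002": -5.1,
    "compound_003": -7.8,
    "compound_004": -10.4,
}

# Rank by ascending energy (most favourable binding first).
ranked = sorted(docking_scores.items(), key=lambda kv: kv[1])

top_hits = [name for name, score in ranked[:2]]
print(top_hits)  # ['compound_004', 'compound_001']
```

Because the sort is trivially parallelisable across compounds, throughput really does scale with available computing power, as the paragraph above notes.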
Review of Literature
Fixing Problems associated with Combinatorial Libraries
The large size of combinatorial libraries has become something of a hindrance for drug discovery and development: it has become very difficult to assess the viability of all the compounds and determine their value to researchers. So, over the years, analytical techniques have been developed with the aid of computer software to search through libraries and find useful compounds, but also to determine relative molecular diversity and even to aid in the acquisition of company libraries. The techniques discussed below are currently being used to fix the issues that researchers and companies face when dealing with combinatorial libraries.
Virtual Docking
Innovative software techniques are being developed to enable more accurate and efficient docking of compounds to targets [iv]. Testing every single compound in a library using HTS is very time-consuming and incredibly expensive for researchers, and this is where HTD has been used as an alternative; open-source software such as AutoDock 4 has been developed as a means of achieving these goals. The open-source nature of the software allows anyone around the world to tweak and improve it to their needs, and to submit improvements for everyone to use and share [v]. Current forms of docking software were mainly used to input a large array of compounds into the simulation to bind to one specific target. One great leap that needs to be taken is the use of flexibility in target receptors: as biological components they do not have a rigid structure but rather a certain amount of fluidity. Modelling this increases the chances of a conformational match for novel compounds that would never have been considered as potential candidates for future drug development. It also allows researchers to find and eliminate the possibility of side effects and adverse reactions when testing the drug. This can be a major issue, as compounds can have multiple sites of action that may be unknown to the researchers and hard to detect during the early phases of a drug trial; further testing of such compounds can become a waste of resources for pharmaceutical companies. This technique increases the overall success rate of further drug development trials [vi].
Compounds have several different spatial conformations, and this can be exploited as it increases the diversity of molecular shapes available to interact with and bind to receptors. But this diversity in conformation is not exclusive to compounds; it also applies to receptors, which are likewise fluid and flexible, and this is believed to be the next breakthrough in improving the results obtained by High Throughput Docking. Like drug compounds, receptors have bonds that rotate and bend to produce different conformations and locations for drug compounds to interact with and bind to. It is this relaxed nature of proteins that needs to be incorporated into molecular docking software to increase the possibility of docking unconventional and novel compounds. Research into receptor and protein fluidity is still ongoing and there is much to be discovered, but the inclusion of the limited data available is a good start towards reducing unwanted interactions and reactions in the body. Researchers have used data collected from multiple crystallographic receptor conformations and incorporated it into HTD. The reasoning behind this approach is to use HTD software to test unconventional binding locations for a diverse range of compounds. The inclusion of protein flexibility enables the docking software to predict possible interactions and binding that have not previously been observed in conventional HTD methods. This technique increases the number of possible active compounds discovered that would usually be considered non-viable, and increases the diversity of compounds found to be therapeutically active for further drug development [vii].
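A minimal sketch of this ensemble idea: score each ligand against several crystallographic receptor conformations and keep the best (lowest) energy, so a binder that only fits a relaxed conformation is not discarded. All names and values are illustrative placeholders:

```python
# Ensemble-docking sketch: ligand -> predicted binding energy (kcal/mol)
# against each of several receptor conformations. A single rigid receptor
# (conf_1 alone) would mis-rank ligand_A; the ensemble minimum does not.
scores = {
    "ligand_A": {"conf_1": -6.0, "conf_2": -9.1, "conf_3": -5.5},
    "ligand_B": {"conf_1": -7.2, "conf_2": -7.0, "conf_3": -7.4},
}

# Keep the most favourable energy across the conformational ensemble.
best = {lig: min(by_conf.values()) for lig, by_conf in scores.items()}

print(best)  # {'ligand_A': -9.1, 'ligand_B': -7.4}
```

Against conf_1 alone, ligand_A (-6.0) would rank below ligand_B (-7.2); including the other conformations reveals it as the stronger binder, which is exactly the rescue effect the paragraph describes.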
Compounds that appear very similar in structure can have vastly different interaction and binding properties. This can greatly hinder researchers, since HTD programs that analyse only the structure of a compound may flag it as therapeutically active and viable for further development when, in reality, it cannot bind effectively to its target and produce an effective response; such compounds only slow the progress of drug development. A newer approach uses bioassays to measure the IC50 of each compound against a panel of proteins; this data is used to create a bioactivity profile called an affinity fingerprint. This information, in conjunction with the structural data already available in combinatorial libraries, can aid HTD immensely. It eliminates compounds that have similar structures but little or no biological activity, while widening the scope of HTD to find compounds with a diverse range of structures that would never otherwise have been considered for further development [viii].
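One way to see how an affinity fingerprint separates look-alike compounds is to compare profiles numerically. The sketch below assumes each fingerprint is simply a vector of pIC50 values measured against the same protein panel, and uses Euclidean distance as one plausible comparison metric — an assumption for illustration, not the published method.

```python
# Sketch of comparing two affinity fingerprints, assuming each fingerprint is a
# vector of pIC50 values (-log10 IC50) measured against the same protein panel.
# The distance metric below is one plausible choice, not the published method.

import math

def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two affinity fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must cover the same protein panel")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

# Two structurally similar compounds can still have very different profiles:
cmpd_1 = [7.2, 5.1, 4.0, 6.8]   # potent against proteins 1 and 4
cmpd_2 = [7.1, 5.0, 4.2, 6.9]   # near-identical bioactivity profile
cmpd_3 = [4.0, 4.1, 4.0, 3.9]   # similar structure, but essentially inactive

print(fingerprint_distance(cmpd_1, cmpd_2))  # small -> redundant pair
print(fingerprint_distance(cmpd_1, cmpd_3))  # large -> biologically distinct
```

A small distance flags a redundant pair that structure-only screening would count twice; a large distance flags the structurally similar but inactive compound the text describes.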
Inverse Docking
Inverse docking is another technique: it searches databases of three-dimensional protein targets to find cavities where a given ligand can bind successfully. It is used to help predict unwanted binding interactions that may occur. When incorporated into a ranking system, this process increases the accuracy of predicted protein binding once the work moves to the experimental phase. Docking is conducted by testing each ligand conformation against the protein's binding sites; the cavity within the protein is tested to find the pose requiring the least energy for the ligand and protein to form a complex. The docking software calculates these energy values to enable better ranking of candidates for drug development. The term "inverse" refers to the software finding proteins that bind a ligand other than its primary target. Using inverse docking to test a ligand against all available receptor variants will improve drug development, increasing efficiency and reducing costs by eliminating ligands that are not viable [ix].
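The core loop of inverse docking — one ligand, many protein targets, ranked by predicted binding energy — can be sketched as below. This is a hedged illustration: `binding_energy`, the cutoff value and all energies are invented stand-ins for a real docking calculation.

```python
# Minimal sketch of inverse docking: one ligand is docked against a panel of
# protein targets and the proteins are ranked by predicted binding energy.
# `binding_energy` is a hypothetical stand-in for a real docking calculation.

def inverse_dock(ligand, proteins, binding_energy, cutoff=-6.0):
    """Rank proteins by binding energy; keep those below an affinity cutoff
    as potential (possibly unwanted) binding partners."""
    ranked = sorted(proteins, key=lambda p: binding_energy(ligand, p))
    return [(p, binding_energy(ligand, p)) for p in ranked
            if binding_energy(ligand, p) <= cutoff]

# Toy energies (kcal/mol): the intended target binds best, but one
# off-target also clears the cutoff and would be flagged for follow-up.
energies = {"intended_target": -9.2, "off_target_1": -7.5, "off_target_2": -3.1}
hits = inverse_dock("ligand_X", list(energies), lambda l, p: energies[p])
print(hits)  # [('intended_target', -9.2), ('off_target_1', -7.5)]
```

The flagged off-target is exactly the kind of unwanted interaction the ranking is meant to surface before the experimental phase.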
Molecular Descriptors
Molecular descriptors are a vital part of pharmaceutical chemistry: they take a chemical structure, and all the information that can be derived from it, and convert it into numerical data. The symbolic representation of the structure is translated, with the help of bioassays, into quantities such as affinity, efficacy, polarizability, hydrophilicity and lipophilicity; this numerical data is more useful to researchers. When searching for new compounds, it is more effective to find candidates with differing properties and structures, since too much similarity in molecular descriptors wastes a pharmaceutical company's time and resources, while differences in certain descriptors yield compounds diverse enough in structure and physical properties to aid the discovery of novel treatments. An "activity island" is a region of the molecular descriptor plot containing molecules likely to be active compounds; the information gained from such a plot can be used to modify drugs and produce new, improved versions. Another parameter, the "neighbourhood region", is a boundary drawn around the properties of an existing compound; this region is to be avoided, as any compound produced within it would have very similar structural and physical properties. Figure 2
shows the difference between typical and ideal compound libraries and the distribution of compounds plotted against two molecular descriptors. It also shows the neighbourhood regions and activity islands that can be explored to produce more diverse drugs.
Researchers choose specific molecular descriptors and plot the data from large combinatorial libraries to determine the activity islands and neighbourhood regions. They can then focus on avoiding the neighbourhood regions around existing compounds and on creating molecules that fall within an activity island. Avoiding neighbourhood regions reduces the risk of producing compounds with similar properties and structures, thus increasing molecular diversity in the process [x].
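The neighbourhood-avoidance rule above amounts to a distance filter in descriptor space. The sketch below assumes each compound is reduced to two descriptor values; the radius, coordinates and descriptors are illustrative choices, not published values.

```python
# Sketch of a neighbourhood-region filter in a 2-D descriptor space, assuming
# each compound is reduced to two descriptor values (e.g. logP and molecular
# weight, rescaled to comparable units). A candidate is rejected if it falls
# inside the neighbourhood radius of any compound already in the library.
# The radius and descriptor choices are illustrative, not published values.

import math

def outside_neighbourhoods(candidate, library, radius=1.0):
    """True if the candidate adds diversity (no library compound is nearby)."""
    cx, cy = candidate
    return all(math.hypot(cx - x, cy - y) > radius for x, y in library)

library = [(2.0, 3.0), (5.0, 1.0)]        # existing compounds
candidates = [(2.2, 3.1), (8.0, 4.0)]     # proposed new compounds

diverse = [c for c in candidates if outside_neighbourhoods(c, library)]
print(diverse)  # [(8.0, 4.0)] -- the near-duplicate (2.2, 3.1) is filtered out
```

Rejecting the near-duplicate is precisely the "avoid the neighbourhood region" step; whether the survivor lands on an activity island is then a separate question.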
A study was conducted on the molecular diversity of chemical databases, using molecular descriptors to compare five different databases: CMC and MDDR, which contain medicinal compounds; ACD and SPECS, which contain commercially available chemicals; and the Wellcome Registry, a database of potential biochemical compounds. Using the descriptors, the researchers were able to identify the super-population of compounds with very similar properties and structures to one another. This enabled two discoveries: first, commercially available compounds that were also medicinally active; and second, outliers with some potential to be developed into therapeutic drugs. This was achieved by superimposing the five databases using specific molecular descriptors and producing a single metric that could be compared between databases. Using this technique, diverse compounds were discovered for further development [xi].
Combining Libraries
The information collected from molecular descriptors, activity islands and neighbourhood
regions can be used very effectively when combining large combinatorial libraries. Since the advent of combinatorial chemistry, the pharmaceutical industry has been producing new compounds and steadily increasing the size of its combinatorial libraries. Because corporations have a vested interest in discovering new drug compounds, they make calculated decisions to combine their libraries, increasing their size and, in turn, their chances of making new discoveries.

[Figure 2]

Companies can continue to use combinatorial synthesis and expand their libraries, but an alternative is
combining libraries to fill chemical spaces or activity islands that already exist within their own libraries. The paper "Rendezvous in chemical space? Comparing the small molecule compound libraries of Bayer and Schering" discusses the merging of the large combinatorial libraries of two companies and assesses the advantages and disadvantages of combining these two particular libraries [xii][xiii].
The technique used to compare such libraries is called LASSOO, which stands for Library Acquisition with Simultaneous Scoring to Optimize Ordering; it is used to determine whether an external and an internal library are worth combining. It does this by plotting the molecular descriptors of the compounds and using a scoring method to determine whether the external library contains compounds with desirable characteristics that the internal library lacks. The external library needs to be diverse enough to fill the "chemical spaces" that exist within the pharmaceutical company's internal library.
Figure 3 shows a two-dimensional representation of two molecular descriptors for an external ("target") and an internal ("current") library. The composite is the combination of the two libraries, showing the areas where merging would be favourable and unfavourable: the light groupings indicate compounds that would be favourable additions to the internal library, while the dark groupings indicate unfavourable ones. This scoring method helps researchers decide whether a target library is worth acquiring because its compounds differ from those in the current library. Using LASSOO, the companies can determine the relative distribution and diversity of molecular descriptors across the two libraries, and thus the benefits of combining them. The figure uses only two molecular descriptors to produce a two-dimensional plot, but more descriptors can be included to give three-, four- or higher-dimensional representations; these are difficult to visualise, but the software can still use its scoring method to assess the viability of a library merger [xiv].
[Figure 3]
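The gap-filling idea behind this kind of acquisition scoring can be sketched as follows. This is a hedged illustration only: a compound is reduced to two descriptor values and scored by its distance to the nearest internal-library compound; the real LASSOO method is considerably more elaborate, and all names and numbers here are invented.

```python
# Hedged sketch of the idea behind an acquisition score: external compounds
# are valued by how far they sit from their nearest neighbour in the internal
# library, in a simple 2-D descriptor space. This only illustrates the
# principle "reward compounds that fill gaps in your chemical space".

import math

def acquisition_score(compound, internal):
    """Distance to the nearest internal compound: higher = fills a bigger gap."""
    x, y = compound
    return min(math.hypot(x - ix, y - iy) for ix, iy in internal)

internal = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8)]   # tightly clustered library
external = [(1.4, 1.1), (6.0, 5.0)]                # candidate acquisitions

scored = sorted(external, key=lambda c: acquisition_score(c, internal),
                reverse=True)
print(scored)  # [(6.0, 5.0), (1.4, 1.1)] -- the gap-filler ranks first
```

The compound that sits inside the existing cluster scores near zero, which is the software's way of saying the acquisition adds nothing new.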
The authors of the paper analysed the compound collections of Bayer and Schering. The analysis found that structural identity between the libraries was very low while their physico-chemical profiles were very similar, and the overlap in chemical space was significant, meaning the companies' libraries were largely complementary. The decision was taken to keep the libraries separate and in-house, and instead to exchange "hit" lists periodically; this was seen as the better option, since combining two extremely large libraries would require substantial resources and would also make screening more cumbersome. Screening the compounds in each library independently offered a better chance of finding hits for lead compounds in drug development.
Analysing Drug and Non-Drug compounds within Libraries
Another problem that arises frequently in combinatorial chemistry is distinguishing between "drug" and "nondrug" compounds. Combinatorial synthesis starts from biologically active compounds and produces large numbers of iterations and derivatives of them, but the products will not necessarily be biologically or therapeutically active. The large size of the libraries makes testing every compound by HTS resource-intensive. Discriminating between biologically active and inactive compounds is therefore vital for reducing redundancy in the library: by identifying and eliminating redundant chemicals, researchers can speed up the screening of viable compounds for drug development.
A software scoring method has been developed to overcome this problem by automatically and rapidly classifying compounds into "drug" and "nondrug" categories. The researchers developing this technique validated it on two databases: 169,331 molecules from the Available Chemicals Directory (ACD) and 38,416 molecules from the World Drug Index (WDI). They used these publicly available libraries because they were easily accessible and the molecules had already been validated by the pharmaceutical community and government regulators. This software scoring method was able to classify
83% of the ACD compounds as nondrugs and 77% of the WDI compounds as drugs. It is important to highlight that the WDI contains only biologically active drugs, so the scoring scheme should have classified nearly all of its compounds as drugs even after allowing for false negatives and positives; there is therefore room for improvement in the scoring method. Its potential uses include testing the large numbers of diverse molecules in libraries to determine whether they could serve as potential drugs. It can also be used when combining two large libraries, to check that the libraries being acquired contain more potential drug compounds than non-drug compounds, enabling companies to selectively purchase or test compounds from large libraries [xv].
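The published scheme used a trained scoring function; as a simpler stand-in, the sketch below shows the general idea of rule-based drug/nondrug filtering using classic rule-of-five property cutoffs. The property values are assumed to be precomputed for each compound, and the example compounds are invented.

```python
# Rule-based drug/nondrug filter using Lipinski-like property cutoffs.
# This is an illustrative stand-in for the published (trained) scoring
# scheme; property values are assumed precomputed for each compound.

def classify(compound):
    """Label a compound 'drug-like' or 'nondrug-like' from simple properties."""
    violations = 0
    if compound["mol_weight"] > 500:
        violations += 1
    if compound["logp"] > 5:
        violations += 1
    if compound["h_donors"] > 5:
        violations += 1
    if compound["h_acceptors"] > 10:
        violations += 1
    # The classic rule of five tolerates a single violation.
    return "drug-like" if violations <= 1 else "nondrug-like"

aspirin_like = {"mol_weight": 180, "logp": 1.2, "h_donors": 1, "h_acceptors": 4}
greasy_giant = {"mol_weight": 950, "logp": 8.3, "h_donors": 6, "h_acceptors": 14}

print(classify(aspirin_like))   # drug-like
print(classify(greasy_giant))   # nondrug-like
```

Running such a filter over a whole library is cheap, which is why it can precede the much more expensive HTS step described in the text.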
Analysing Molecular Diversity
The main focus of research has been on searching vast combinatorial libraries of compounds using HTD or HTS, with little regard for molecular similarity or diversity. The techniques discussed above all use computer software to increase the probability of finding a compound for drug development by analysing and/or combining libraries. Because of the lack of data on molecular diversity, researchers have turned to mass spectrometry as an analytical technique that can produce quantitative data of value to drug development. Mass spectrometry works by breaking large compounds down into small ionised fragments. The breakdown pattern of a compound is highly characteristic and produces a fingerprint on the spectrum; this unique fingerprint helps researchers identify which compound is being tested. These fingerprints have been used to create a fragment dictionary indexed by fragmentation pattern. Since two very similar compound structures produce very similar fragments, the spectral information can be used to estimate the relative diversity of compounds: two compounds with very different structures will produce fragments that are also very different. The data created from this technique take the form of structural keys and hashed fingerprints, and comparing them gives a quantitative method for analysing the relative diversity of compounds within a library [xvi].
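The hashed-fingerprint idea can be sketched as follows, assuming each compound has already been reduced to a list of fragment labels (e.g. from a fragment dictionary). Hashing each fragment into a fixed-width bit set and comparing bit overlap is the general principle; the 64-bit width, the MD5 hash and the toy fragment labels are illustrative assumptions, not the published scheme.

```python
# Sketch of hashed fingerprints and a Tanimoto-style comparison. Each
# fragment label sets one bit in a fixed-width bit set; similar fragment
# lists therefore produce overlapping bit sets. The 64-bit width and the
# fragment labels are illustrative assumptions.

import hashlib

def hashed_fingerprint(fragments, n_bits=64):
    """Set one bit per fragment, at a position chosen by hashing its label."""
    bits = set()
    for frag in fragments:
        digest = hashlib.md5(frag.encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits

def tanimoto(bits_a, bits_b):
    """Shared bits over total bits: 1.0 = identical, 0.0 = no overlap."""
    return len(bits_a & bits_b) / len(bits_a | bits_b)

fp1 = hashed_fingerprint(["C6H5", "COOH", "OH"])
fp2 = hashed_fingerprint(["C6H5", "COOH", "NH2"])  # shares two fragment labels
fp3 = hashed_fingerprint(["C5H9", "PO4", "SH"])    # shares no fragment labels

# The pair sharing fragment labels is expected to score higher (barring the
# occasional hash collision in so small a bit space):
print(tanimoto(fp1, fp2), tanimoto(fp1, fp3))
```

High similarity flags redundant compounds; low similarity across the board is the quantitative signature of a diverse library.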
Alternatives to Combinatorial Synthesis to Increase Molecular Diversity
The main problem with all the techniques discussed above is that they analyse compounds synthesised by combinatorial synthesis: they all focus on creating larger libraries and searching them for a "hit", then taking that compound forward as a lead for development. Diversity is very important when it comes to discovering a novel treatment for a particular disease, but combinatorial synthesis does not seem able to provide it. Combinatorial synthesis is very good at producing large numbers of new compounds, but most start from a pool of compounds that are very similar in structure, and this similarity is carried over from the starting materials to the final products, resulting in a lack of diversity. To address this issue, I examine methods of synthesis, different from combinatorial synthesis, that can produce compounds with greater structural diversity.
Structure-based synthesis
Structure-based synthesis, also known as target-oriented synthesis (TOS), is based on designing a drug with a specific structure rather than discovering it in nature or in a large library of compounds. Data on the 3-D shape of the target molecule is collected using X-ray crystallography or NMR spectroscopy, and this information is used to design drugs that fit the target site. Since the target site is already known, drug candidates can be developed with higher affinity and selectivity; this can also reduce the number of drugs binding to secondary sites and the adverse side effects that result. The main issue with combinatorial chemistry is the sheer number of compounds it produces, most of which may never be useful as therapeutic drugs.
By combining the structural information of target sites with the rapid output of combinatorial chemistry, we may see a large step forward in the discovery of molecularly diverse yet target-specific drugs. This technique uses a base molecule that fits readily into the target and then produces large numbers of derivatives, each with a much higher probability of working at the target site. Figure 4 shows the diversity of compounds produced by combinatorial synthesis: although very diverse, they are neither selective nor specific for any target site. Integrating the two methods involves finding an initial candidate that fits the target site; for example, Figure 5 shows the steroid backbone shared by all the derivatives of a compound. This backbone enables the compounds to bind the target site with greater selectivity while varying in affinity, efficacy and other properties, improving the therapeutic indexes of new drug treatments [xvii].
Research conducted at UC Berkeley and UC San Francisco used this integrative concept of structure-based drug design and combinatorial chemistry. The researchers identified a protein, cathepsin D, for which no potent inhibitors were available on the market.

[Figure 4]
[Figure 5]

They found the smallest molecule that inhibits this protein and created a second-generation library, in which combinatorial synthesis was used to produce new compounds based on a common scaffold.
second generation library can then be searched and filtered appropriately to find compounds
that have a specific range of properties. The researchers used the property of potency to
search through libraries with a diverse range of compounds and receptor targeted libraries. It
was found that the compounds discovered in the second generation receptor targeted
combinatorial library were 5 to 6 fold more potent than the compounds discovered in the
diverse library. The success of this study shows the potential of targeted structure-based
combinatorial chemistry and the great benefits that it holds in producing compounds which
are viable for therapeutic usexviii
.
Fragment-based synthesis
Fragment-based drug discovery (FBDD) is a technique based on discovering small molecules that bind weakly to the target site, then building them up into large, complex, diverse molecules with improved properties such as higher affinity, efficacy and selectivity. The process employs HTS to test small compounds against a variety of target sites in vitro. HTS is carried out on compounds smaller than 500 Da drawn from large combinatorial libraries that may contain hundreds of thousands to millions of compounds; the weight limit significantly reduces the number of molecules that need to be tested, and with it the time taken to find a suitable candidate for FBDD. The strategy of this technique is to build large, complex molecules from small, simple ones [xix].
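The weight pre-filter described above is simple to express in code. The 500 Da cutoff is the figure given in the text; the compound names and weights below are purely illustrative.

```python
# Sketch of the fragment pre-filter described above: keep only compounds
# under a molecular-weight cutoff before screening. The 500 Da cutoff comes
# from the text; the library entries are illustrative.

FRAGMENT_MW_CUTOFF = 500.0  # daltons

def fragment_candidates(library, cutoff=FRAGMENT_MW_CUTOFF):
    """Filter a (name, mol_weight) library down to fragment-sized compounds."""
    return [name for name, mw in library if mw < cutoff]

library = [("frag_A", 182.4), ("frag_B", 498.9), ("lead_C", 731.2)]
print(fragment_candidates(library))  # ['frag_A', 'frag_B']
```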
Research has been conducted on combining computational chemistry with fragment-based drug discovery to make the processes of discovering a fragment and progressing from fragment to lead (F2L) much more efficient. Computational chemistry is used to narrow large libraries of compounds down to smaller, focussed libraries of fragments suitable for FBDD; creating such a focussed fragment library increases the efficiency of discovering new potential drug compounds. With the aid of bioassays, NMR and X-ray crystallography, the structures of compounds and ligands can be determined. This information is
then fed into HTD to determine the relative compatibility of each compound, known as the hit conformation.

[Figure 6]

The compounds are then tested in vitro to determine their structure-activity relationship (SAR); this information can be used to further develop a compound, changing its potency, affinity and so on. Figure 6 shows the process of FBDD and how it filters
compounds down the flow chart; they are then built up into larger molecules, resulting in increased molecular diversity. In the pharmaceutical research market, patent filings for intellectual property are very crowded, but because FBDD starts from small compounds and builds up larger ones, the potential for conflict is very low: the compounds produced via FBDD can vary dramatically even when starting from the same initial fragment. This increases molecular diversity considerably and results in the production of novel drug compounds for therapeutic use [xx].
Diversity-oriented synthesis
Combinatorial chemistry has been very useful for producing large numbers of compounds and hence very large libraries, but the compounds produced lack diversity: they are too similar in structure because they all derive from the same set of starting compounds, yielding iterative derivatives with very similar structures. Figure 7 shows a compound before and after the process of combinatorial synthesis, illustrating how similar the starting and finishing structures are; the only change is the addition or removal of functional groups. Target-oriented synthesis (TOS), also known as structure-based synthesis, has been used to analyse a target, find a compound that fits the structure and then produce large numbers of derivatives, but these compounds also lack diversity, since TOS yields products with very specific chemical descriptors. The answer to these issues is diversity-oriented synthesis (DOS): a process that intentionally and efficiently produces more than one set of compounds, diverse in both structure and properties, to solve complex drug-targeting problems [xxi][xxii].
DOS was developed to use combinatorial synthesis to produce large libraries of compounds while fixing the diversity problem in the process. DOS aims to produce compounds spanning a broad range of values across the chosen chemical descriptors, making the chances of producing very diverse compounds much higher: changing the properties of compounds may involve changing many functional groups, stereogenic sites and so on, which can alter the overall structure of the compound and thus increase diversity.

[Figure 7]

Figure 8 shows how all three routes of synthesis create compounds and the relative diversity of the products. This representation shows how much more effective DOS is than conventional combinatorial synthesis and TOS at producing compounds with very high molecular diversity [xxiii].
There are four requirements for increasing molecular diversity in drug synthesis: appendage diversity, stereochemical diversity, functional-group diversity and skeletal diversity. The last is the hardest to achieve, and two strategies have been developed to help improve the skeletal diversity of novel compounds: folding and branching pathways. Figure 9 shows a basic schematic of how these techniques work. The folding pathway uses one common reagent, which folds each structure in a specific way, applied to a variety of different starting materials to produce a large number of diverse compounds. The branching pathway uses one starting compound and a range of different reagents to transform it into several structurally different compounds. At the end of either process, a large and diverse range of compounds is produced.

[Figure 8]
[Figure 9]
[Figure 10]
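The contrast between the two pathways can be sketched as simple enumeration: folding pairs many starting materials with one common reagent, while branching pairs one starting material with many reagents. All names here are illustrative placeholders, not real reagents or reactions.

```python
# Sketch of the two skeletal-diversity strategies described above, reduced to
# simple pairing of inputs. "S+R" stands in for "product of starting material
# S with reagent R"; real chemistry is obviously not string concatenation.

def folding_pathway(starting_materials, common_reagent):
    """Many starts x one reagent -> one product per starting material."""
    return [f"{s}+{common_reagent}" for s in starting_materials]

def branching_pathway(start, reagents):
    """One start x many reagents -> one product per reagent."""
    return [f"{start}+{r}" for r in reagents]

folded = folding_pathway(["S1", "S2", "S3"], "R")
branched = branching_pathway("S", ["R1", "R2", "R3"])
print(folded)    # ['S1+R', 'S2+R', 'S3+R']
print(branched)  # ['S+R1', 'S+R2', 'S+R3']
```

Either way, three inputs yield three structurally distinct products; the strategies differ only in whether diversity comes from the starting materials or from the reagents.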
Whichever technique is used, the diversity of the molecules produced increases dramatically. Figure 10 shows a plot of two molecular descriptors used to visualise the distribution of compounds. The MDDR library contains compounds known to be drugs; comparing this database with a DOS library (red) and a focussed library (blue), it is clear that the DOS library is far more widely distributed than the focussed library, showing the level of diversity achieved when DOS is used to produce new compounds [xxiv].
Conclusion
This literature review focused on two themes: fixing problems associated with combinatorial synthesis, and finding alternative routes of synthesis that improve molecular diversity. The first part covered improved techniques for analysing compounds in libraries and for combining libraries, with the aid of computational techniques such as virtual docking, inverse docking, LASSOO, molecular descriptors, flexible-receptor docking and the discrimination of drug from non-drug compounds. These techniques have been used to analyse libraries containing hundreds of thousands to millions of compounds and to determine their relative diversity and their viability for further drug development.

The second part focused on alternative routes of synthesis to combinatorial synthesis that improve molecular diversity. Techniques such as structure-based synthesis, fragment-based synthesis and diversity-oriented synthesis can be used to synthesise new, highly diverse compounds, and all three have been modified and improved so that diverse drug compounds can progress from discovery to lead with greater success rates.

The techniques discussed in this literature review highlight the improvements being made to the process of drug discovery. As these areas of research progress, the relative ease with which more diverse compounds can be discovered and synthesised will continue to rise.
i. Higginbotham, S., Wong, W., Linington, R., Spadafora, C., Iturrado, L. and Arnold, A. (2014). Sloth Hair as a Novel Source of Fungi with Potent Anti-Parasitic, Anti-Cancer and Anti-Bacterial Bioactivity. PLoS ONE, 9(1), p.e84549.
ii. Combinatorial Chemistry Review. (2016). [online] Combichemistry.com. Available at: http://www.combichemistry.com.
iii. Klon, A., Glick, M., Thoma, M., Acklin, P. and Davies, J. (2004). Finding More Needles in the Haystack: A Simple and Efficient Method for Improving High-Throughput Docking Results. J. Med. Chem., 47(11), pp.2743-2749.
iv. Shoichet, B. (2004). Virtual screening of chemical libraries. Nature, 432(7019), pp.862-865.
v. Ellingson, S., Dakshanamurthy, S., Brown, M., Smith, J. and Baudry, J. (2013). Accelerating virtual high-throughput ligand docking: current technology and case study on a petascale supercomputer. Concurrency and Computation: Practice and Experience, 26(6), pp.1268-1277.
vi. Hou, X., Li, K., Yu, X., Sun, J. and Fang, H. (2015). Protein Flexibility in Docking-Based Virtual Screening: Discovery of Novel Lymphoid-Specific Tyrosine Phosphatase Inhibitors Using Multiple Crystal Structures. Journal of Chemical Information and Modeling, 55(9), pp.1973-1983.
vii. Bottegoni, G., Rocchia, W., Rueda, M., Abagyan, R. and Cavalli, A. (2011). Systematic Exploitation of Multiple Receptor Conformations for Virtual Ligand Screening. PLoS ONE, 6(5), p.e18845.
viii. Dixon, S. and Villar, H. (1998). Bioactive Diversity and Screening Library Selection via Affinity Fingerprinting. Journal of Chemical Information and Modeling, 38(6), pp.1192-1203.
ix. Chen, Y. and Ung, C. (2001). Prediction of potential toxicity and side effect protein targets of a small molecule by a ligand-protein inverse docking approach. Journal of Molecular Graphics and Modelling, 20(3), pp.199-218.
x. Patterson, D., Cramer, R., Ferguson, A., Clark, R. and Weinberger, L. (1996). Neighborhood Behavior: A Useful Concept for Validation of "Molecular Diversity" Descriptors. J. Med. Chem., 39(16), pp.3049-3059.
xi. Cummins, D., Andrews, C., Bentley, J. and Cory, M. (1996). Molecular Diversity in Chemical Databases: Comparison of Medicinal Chemistry Knowledge Bases and Databases of Commercially Available Compounds. Journal of Chemical Information and Modeling, 36(4), pp.750-763.
xii. Schamberger, J., Grimm, M., Steinmeyer, A. and Hillisch, A. (2011). Rendezvous in chemical space? Comparing the small molecule compound libraries of Bayer and Schering. Drug Discovery Today, 16(13-14), pp.636-641.
xiii. Hassan, M., Bielawski, J., Hempel, J. and Waldman, M. (1996). Optimization and visualization of molecular diversity of combinatorial libraries. Molecular Diversity, 2(1-2), pp.64-74.
xiv. Koehler, R., Dixon, S. and Villar, H. (1999). LASSOO: A Generalized Directed Diversity Approach to the Design and Enrichment of Chemical Libraries. J. Med. Chem., 42(22), pp.4695-4704.
xv. Sadowski, J. and Kubinyi, H. (1998). A Scoring Scheme for Discriminating between Drugs and Nondrugs. J. Med. Chem., 41(18), pp.3325-3329.
xvi. Schoonjans, V., Questier, F., Borosy, A., Walczak, B., Massart, D. and Hudson, B. (2000). Use of mass spectrometry for assessing similarity/diversity of natural products with unknown chemical structures. Journal of Pharmaceutical and Biomedical Analysis, 21(6), pp.1197-1214.
xvii. Li, J., Murray, C., Waszkowycz, B. and Young, S. (1998). Targeted molecular diversity in drug discovery: Integration of structure-based design and combinatorial chemistry. Drug Discovery Today, 3(3), pp.105-112.
xviii. Kick, E., Roe, D., Skillman, A.G., Liu, G., Ewing, T., Sun, Y., Kuntz, I. and Ellman, J. (1997). Structure-based design and combinatorial chemistry yield low nanomolar inhibitors of cathepsin D. Chemistry & Biology, 4(4), pp.297-307.
xix. Murray, C. and Rees, D. (2015). Opportunity Knocks: Organic Chemistry for Fragment-Based Drug Discovery (FBDD). Angewandte Chemie International Edition, 55(2), pp.488-492.
xx. Law, R., Barker, O., Barker, J., Hesterkamp, T., Godemann, R., Andersen, O., Fryatt, T., Courtney, S., Hallett, D. and Whittaker, M. (2009). The multiple roles of computational chemistry in fragment-based drug design. Journal of Computer-Aided Molecular Design, 23(8), pp.459-473.
xxi. Spring, D. (2003). Diversity-oriented synthesis; a challenge for synthetic chemists. Organic & Biomolecular Chemistry, 1(22), p.3867.
xxii. Ma, D.L. (2013). Future Frontiers in Diversity-Oriented Synthesis. Organic Chem Curr Res, 03(01).
xxiii. Fergus, S., Bender, A. and Spring, D. (2005). Assessment of structural diversity in combinatorial synthesis. Current Opinion in Chemical Biology, 9(3), pp.304-309.
xxiv. Spandl, R., Díaz-Gavilán, M., O'Connell, K., Thomas, G. and Spring, D. (2008). Diversity-oriented synthesis. Chem. Record, 8(3), pp.129-142.