Creating novel drugs is an extraordinarily hard and complex problem.
One of the many challenges in drug design is the sheer size of the search space for novel chemical compounds. Scientists need to find molecules that are active toward a biological target or pathway and at the same time have acceptable ADMET properties.
There is now considerable research going on using various AI and ML approaches to tackle these challenges.
Our distinguished speakers, Drs. Alex Tropsha and Ola Engkvist, will discuss their recent work in Drug Design involving Deep Reinforcement Learning and Neural Networks, and will answer questions from the audience on the current state of the research in the field.
Speakers:
Prof Alex Tropsha, Professor at University of North Carolina at Chapel Hill, USA
Dr. Ola Engkvist, Associate Director at AstraZeneca R&D, Gothenburg, Sweden
Tutorial delivered at ECML-PKDD 2021.
TL;DR: This tutorial reviews recent developments on drug discovery using machine learning methods.
Powered by neural networks, modern machine learning has enjoyed great success in data-intensive domains such as computer vision and language, where humans naturally perform well. Machine learning equipped with reasoning is now accelerating fields that traditionally require deep expertise, such as physics, chemistry and biomedicine. This tutorial provides an overview of how machine learning and reasoning are speeding up and lowering the cost of drug discovery. This includes how machine learning can help in a wide range of areas such as novel molecule identification, protein representation, drug-target binding, drug re-purposing, generative drug design, chemical reactions, retrosynthesis planning, drug-drug interaction, and safety assessment. We will also discuss relevant machine learning models for graph classification, molecular graph transformation, drug generation using deep generative models and reinforcement learning, and chemical reasoning.
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i... (Chanin Nantasenamat)
In this lecture, I provide an overview of how computers can be instrumental in drug discovery efforts. Topics covered include: big data as a result of omics efforts; bioinformatics; cheminformatics; biological space; chemical space; and how computers, particularly machine learning (and data science), can be applied in the context of drug discovery.
A video of this lecture is also provided on the "Data Professor" YouTube channel available at http://bit.ly/dataprofessor
If you are fascinated by data science, it would mean the world to me if you would consider subscribing to this channel (by clicking the link below):
http://bit.ly/dataprofessor
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore’s law that it now has its own law, Eroom’s Law (Moore’s spelled backwards). It is estimated that the attrition rate of drug candidates is up to 96% and that the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of the failures.
Even if a drug is approved for market, it can be withdrawn due to safety problems. Therefore, evaluating drug safety extensively as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, deep learning and ML-based techniques has had the most gains.
Specifically, this talk covers a variety of drug-safety-related AI and ML-based techniques currently in use, which can generally be divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address recent progress in predictive models and techniques built for various toxicities. We will also cover some publicly available databases, tools and platforms that make these models easy to leverage.
We will also compare and contrast various modeling techniques including deep learning techniques and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
Everything you want to know about the role of artificial intelligence in drug discovery.
Artificial intelligence in health care and pharmacy, drug discovery, tensorflow, python,
deep neural network, GANs
AI in drug discovery and development
AI in clinical trials
Bioinformatics is the science of extracting knowledge from biological data, the complexity and amount of which have increased significantly over the past decades. To meet the challenges ahead, more sophisticated algorithms and assets should be adopted. Thus, Machine Learning has become an everyday tool in Bioinformatics that helps to solve important biological riddles. In this presentation I discussed examples of how, using well-known Machine Learning methods, bioinformaticians and computer scientists help doctors and biologists diagnose and treat deadly diseases.
Presentation at Advanced Intelligent Systems for Sustainable Development (AISSD 2021), 20-22 August 2021, organized by the scientific research group in Egypt in collaboration with the Faculty of Computers and AI, Cairo University, and the Chinese University in Egypt
Drug Repurposing using Deep Learning on Knowledge Graphs (Databricks)
Discovering new drugs is a lengthy and expensive process. This means that finding new uses for existing drugs can help create new treatments in less time and at less expense. The difficulty is in finding these potential new uses.
How do we find these undiscovered uses for existing drugs?
We can unify the available structured and unstructured data sets into a knowledge graph. This is done by fusing the structured data sets, and performing named entity extraction on the unstructured data sets. Once this is done, we can use deep learning techniques to predict latent relationships.
In this talk we will cover:
Building the knowledge graph
Predicting latent relationships
Using the latent relationships to repurpose existing drugs
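The three steps above can be sketched on a toy graph. This is only an illustration under loud assumptions: the triples, entity names and relation names are hypothetical, and the deep learning model from the talk is swapped for a plain shared-gene path count, which is far simpler than what the speakers describe but shows the same data flow (fuse triples into a graph, score unobserved drug-disease pairs, rank candidates).

```python
from collections import defaultdict

# Hypothetical triples (head, relation, tail); all names are illustrative only.
TRIPLES = [
    ("drug_A", "targets", "gene_1"),
    ("drug_A", "targets", "gene_2"),
    ("drug_B", "targets", "gene_1"),
    ("drug_B", "targets", "gene_2"),
    ("drug_B", "targets", "gene_3"),
    ("gene_1", "associated_with", "disease_X"),
    ("gene_2", "associated_with", "disease_X"),
    ("gene_3", "associated_with", "disease_Y"),
    ("drug_A", "treats", "disease_X"),
]


def build_graph(triples):
    """Step 1: index triples as graph[head][relation] -> set of tails."""
    graph = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        graph[head][relation].add(tail)
    return graph


def repurposing_candidates(graph, drugs):
    """Steps 2 and 3: score drug-disease pairs that are NOT already linked by
    'treats', using the number of drug -> gene -> disease paths as the score.
    This heuristic stands in for a learned link-prediction model."""
    scores = {}
    for drug in drugs:
        known = graph[drug]["treats"]
        for gene in graph[drug]["targets"]:
            for disease in graph[gene]["associated_with"]:
                if disease not in known:
                    pair = (drug, disease)
                    scores[pair] = scores.get(pair, 0) + 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


graph = build_graph(TRIPLES)
candidates = repurposing_candidates(graph, ["drug_A", "drug_B"])
for (drug, disease), score in candidates:
    print(drug, disease, score)
```

In this toy data drug_B shares two targets with the genes linked to disease_X, so that pair ranks first. A learned model would replace the path count with a score computed from trained embeddings.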
Protein docking is used to check the structure, position and orientation of a protein when it interacts with small molecules such as ligands. Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and key mechanism. There are both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. The first thing needed to start a docking search is the sequence of the protein of interest (Halperin et al., 2002).
Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid; the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement (Smith and Sternberg, 2002).
The process of drug development has revolved around a screening approach, as nobody knows in advance which compound or approach could serve as a drug or therapy. Such an almost blind screening approach is very time-consuming and laborious. The goal of structure-based drug design is to find chemical structures that fit in the binding pocket of the receptor. Based on the three-dimensional structure of the target protein, the method can automatically build ligand molecules within the binding pocket and subsequently screen them (Weil et al., 2004).
A homology model of the housefly voltage-gated sodium channel was developed to predict the location of binding sites for the insecticides fenvalerate, a synthetic pyrethroid, and DDT, an early-generation organochlorine. The model successfully addresses the state-dependent affinity of pyrethroid insecticides (O’Reilly et al., 2006).
Discovering a new drug takes years to decades and is very costly
Computer modeling is an effort to cut down the research timeline and cost by reducing wet-lab experiments
Others have done the work. Some have used the work. I have spoken only on their behalf.
Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years.
Pasteur Institute User Story - Cheminfo Stories 2020 Day 5 (ChemAxon)
Here, we present an updated version of iPPI-DB, our manually curated database of PPI modulators. In this release, the data model, the graphical interface and the tools to query the database have been completely redesigned. We used Chemaxon MarvinJS and JChem library to support this development. We added new PPI modulators, new PPI targets, and extended our focus to stabilizers of PPIs as well. Finally, we introduce a web application relying on crowdsourcing for the maintenance of the database. This application can be used outside of our group to collaboratively maintain iPPI-DB within a community of curators.
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore's law that it now has its own law, Eroom's Law (Moore spelled backwards). It is estimated that the attrition rate of drug candidates is up to 96% and that the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of drug failures. Even if a drug is approved for market, it can be withdrawn due to safety problems. Therefore, evaluating drug safety extensively as early as possible becomes all the more important to accelerate drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, deep learning and ML-based techniques has had the most gains. Specifically, this talk covers a variety of drug-safety-related AI and ML-based techniques currently in use, which can generally be divided into 3 main categories: 1. Classification 2. Regression 3. Read-across. The talk will also cover how a hierarchical classification methodology can simplify the problem of assessing the toxicity of any given chemical compound. We will also address recent progress in predictive models and techniques built for various toxicities, and cover some publicly available databases, tools and platforms that make these models easy to leverage. We will also compare and contrast various modeling techniques, including deep learning techniques, and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug safety assessment.
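Of the three categories named above, read-across is perhaps the easiest to show in a few lines: the toxicity of an untested compound is estimated from its most similar tested analogues. The sketch below is a minimal illustration, not the speaker's method. It assumes compounds are represented as sets of fingerprint bits, uses Tanimoto similarity, and predicts a similarity-weighted mean; the fingerprints and toxicity values are made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(query, analogues, k=2):
    """Estimate the query's toxicity value as the similarity-weighted mean
    of its k most similar analogues. Returns None if nothing is similar."""
    ranked = sorted(
        analogues,
        key=lambda item: tanimoto(query, item[0]),
        reverse=True,
    )[:k]
    weights = [tanimoto(query, bits) for bits, _ in ranked]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * value for w, (_, value) in zip(weights, ranked)) / total


# Hypothetical data: (fingerprint bits, measured toxicity score in [0, 1]).
analogues = [
    ({1, 2, 3, 4}, 1.0),  # similarity to the query: 3/4
    ({1, 2, 5}, 0.0),     # similarity to the query: 2/4
    ({7, 8}, 1.0),        # similarity to the query: 0
]
query = {1, 2, 3}
prediction = read_across(query, analogues, k=2)
print(round(prediction, 3))
```

With these made-up values the two nearest analogues have similarities 0.75 and 0.5, so the estimate is 0.75 / 1.25 = 0.6. Thresholding such a score would turn the same idea into a classifier.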
My poster on using pairwise learning for annotating, engineering and designing biological molecules. Mostly an overview of the types of things we are working on at the lab.
Presented by Richard Kidd at "The Future Information Needs of Pharmaceutical & Medicinal Chemistry", Monday 28 November 2011 at The Linnean Society, Burlington Square, London run by the RSC CICAG group.
Being Reproducible: SSBSS Summer School 2017 (Carole Goble)
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is a R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Tools and approaches for data deposition into nanomaterial databases (Valery Tkachenko)
Sustainable research progress in many scientific disciplines critically depends on the existence of robust specialized databases that integrate and structure all available experimental information in the respective fields. The need for such reference databases is especially critical for nanoscience and nanomaterial research, given the significant diversity of shapes, sizes, and properties of engineered nanomaterials and the difficulty of synthesizing engineered nanoparticles with controlled properties. The acquisition of data from public sources is inefficient, time-consuming and limited in scope. Moreover, it is not clear where the resources will come from to support this activity on a perpetual basis. The NIH has recently posted its intention to provide special funds toward data deposition by the experimental investigators through the ‘data sharing plan’ for each proposal. However, this points to a current weakness, which is that all laboratories use different data collection approaches, each of which requires interpretation by staff hosting the database. It would be far more efficient and useful if there were a template with key terms that could be modified to add new or important additional data or parameters for each investigator. We will discuss tools and approaches to facilitate collection and direct deposition of experimental data into Nanomaterial Registry (https://www.nanomaterialregistry.org/), a versatile, semantically enriched, template-based platform for registering diverse data pertaining to nanomaterials research.
This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley regarding "Pre-competitive Collaboration: Sharing Data to Increase Predictability". This presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.
Development of machine learning-based prediction models for chemical modulato... (Sunghwan Kim)
Presented at the 2018 Research Festival at the National Institutes of Health (NIH) in Bethesda, MD (September 13, 2018).
==== Abstract ====
The retinoid X receptor (RXR) is a nuclear hormone receptor that functions as a transcription factor with roles in development, cell differentiation, metabolism, and cell death. Chemicals that interfere with the RXR signaling pathway may cause adverse effects on human health. In this study, public-domain bioactivity data available in PubChem (https://pubchem.ncbi.nlm.nih.gov) were used to develop machine learning-based prediction models for chemical modulators of RXR-alpha, which is a subtype of RXR that plays a role in metabolic signaling pathways, dermal cysts, cardiac development, insulin sensitization, etc. The models were constructed from quantitative high-throughput screening (qHTS) data from the Tox21 project, using popular supervised machine learning methods (including support vector machine, random forest, neural network, k-nearest neighbors, decision tree, and naïve Bayes). The general applicability of the developed models was evaluated with external data sets from ChEMBL and the NCATS Chemical Genomics Center (NCGC). This study showcases how open data in the public domain can be used to develop prediction models for bioactivity of small molecules.
Ensuring Chemical Structure, Biological Data and Computational Model Quality
A talk given at SLAS 2016 on Monday, January 25th in San Diego.
Covers published work and recent forays with BIA 10-2474.
Building an informatics solution to sustain AI-guided cell profiling with hig... (Ola Spjuth)
Presentation at SLAS Europe 2019 in Barcelona on 28 June 2019.
High-content microscopy in automated laboratories presents many challenges for storing and processing data and for building AI models to aid decision making. We have established an informatics system to serve a robotized cell profiling setup with incubators, liquid handling and high-content microscopy for microplates. The informatics system consists of computational infrastructure (CPUs, GPUs, storage), middleware (Kubernetes), imaging database and software (OMERO), and a workflow system (Pachyderm) to perform online prioritization of new data and automate the process from acquired images to continuously updated and deployed AI models. The AI methodologies include Deep Learning models trained on image data, and conventional machine learning models trained on data from Cell Painting experiments. The microservice architecture makes the system scalable and expandable, and a key objective is improving screening and toxicity assessment using AI-aided intelligent experimental design.
Similar to AI & ML in Drug Design: Pistoia Alliance CoE (20)
Fairification experience clarifying the semantics of data matrices (Pistoia Alliance)
This webinar presents the Statistics Ontology, STATO, which is a semantic framework to support the creation of standardized analysis reports to help with review of results in the form of data matrices. STATO includes a hierarchy of classes and a vocabulary for annotating statistical methods used in life, natural and biomedical sciences investigations, text mining and statistical analyses.
Innovative applications of microphysiological systems (MPS) have been growing over the past decade, especially with respect to the use of complex human tissues for assessing the safety of drug candidates, but broad industry adoption of MPS methods has not yet become a reality.
This webinar addresses some recent advances in MPS development and begins to explore the barriers to increased incorporation of MPS to improve drug safety assessment and to provide safer, more effective drugs into the clinical pipeline.
Federated Learning (FL) is a learning paradigm that enables collaborative learning without centralizing datasets. In this webinar, NVIDIA presents the concept of FL and discusses how it can help overcome some of the barriers seen in the development of AI-based solutions for pharma, genomics and healthcare. Following the presentation, the panel debates other elements that could drive the adoption of digital approaches more widely and help answer currently intractable science and business questions.
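To make "collaborative learning without centralizing datasets" concrete, here is a minimal sketch of federated averaging (FedAvg), one common FL algorithm; the webinar blurb does not say which algorithm was presented, so this choice is an assumption. The model is a single weight w fitted to y = w * x, and the two sites and their data are hypothetical. Only weights, never raw data points, leave a site.

```python
def local_update(w, data, lr=0.1):
    """One gradient-descent step on a site's private data for the
    least-squares model y ~ w * x. Only the updated weight is returned."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_average(global_w, sites, rounds=50):
    """Each round: every site updates the current global weight locally,
    then the server averages the results, weighted by site sample count."""
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w


# Hypothetical private datasets held by two sites; both follow y = 2 * x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],  # site 1: two samples
    [(3.0, 6.0)],              # site 2: one sample
]
w = federated_average(0.0, sites)
print(round(w, 6))
```

The server only ever sees the per-site weights and sample counts, and the averaged weight still converges to the slope shared by both datasets. Real deployments add secure aggregation, multiple local steps, and far larger models.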
It seems that AI is also becoming a buzzword, like design thinking. Everyone is talking about AI or wants to have AI, and sees all the ideas and benefits. That’s fine, but how do you get started? And what’s different now? Three innovations have finally put AI on the fast track: Big Data, with the internet and sensors everywhere; massive computing power, especially through the Cloud; and the development of breakthrough algorithms, so computers can be trained to accomplish more sophisticated tasks on their own with deep learning. If you use new technology, you need to explore and know what’s possible. Design thinking helps to outline the steps and define the ways in which you’re going to create the solution, starting with mapping the customer journey and defining who will be using the service enhanced with intelligent technology, or who will benefit and gain value from it. We discuss how these two worlds are coming together, and how you get started to transform your venture with Artificial Intelligence using Design Thinking.
Speaker: Claudio Mirti, Principal Solution Specialist – Data & AI, Microsoft
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
Knowledge graphs ilaria maresi the hyve 23apr2020 (Pistoia Alliance)
Data for drug discovery and healthcare is often trapped in silos which hampers effective interpretation and reuse. To remedy this, such data needs to be linked both internally and to external sources to make a FAIR data landscape which can power semantic models and knowledge graphs.
2020.04.07 automated molecular design and the bradshaw platform webinar (Pistoia Alliance)
This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what is reasonable to expect “AI” approaches might achieve, and what is best left with a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from Bradshaw, GSK’s experimental automated design environment.
This presentation reviewed the challenges in identifying, acquiring and utilizing research data in relation to an evolving data market. Strategic solutions were examined in which the FAIR principles play a key role in the future of data management.
Dr. Dennis Wang discusses possible ways to enable ML methods to be more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients quicker, at lowered costs, and at scale.
The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.
With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR”: findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do an absolutely terrible job creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate those datasets with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology in driving open science. It also demonstrates a means for simplifying access to scientific datasets and enhancing the reuse of the data to drive new discoveries.
Open interoperability standards, tools and services at EMBL-EBI (Pistoia Alliance)
In this webinar Dr Henriette Harmse from EMBL-EBI presents how they are using their ontology services at EMBL-EBI to scale up the annotation of data and deliver added value through ontologies and semantics to their users.
FAIR webinar, Ted Slater: progress towards commercial FAIR data products and ... (Pistoia Alliance)
Elsevier is a global information analytics business that helps institutions and professionals advance healthcare and open science to improve performance for the benefit of humanity.
In this webinar, we discuss how Elsevier is increasingly leveraging the FAIR Guiding Principles to improve its products and services to better serve the scientific community.
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources (Pistoia Alliance)
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Implementing Blockchain applications in healthcare (Pistoia Alliance)
Blockchain technology can revolutionise the way information is exchanged between parties by bringing an unprecedented level of security and trust to these transactions. The technology is finding its way into multiple use cases but we are yet to see full adoption and real-world business implementation in the Healthcare industry.
In this webinar we will explore the main challenges and considerations for the implementation of Blockchain technology in Healthcare use cases. This is the third webinar in our Blockchain Education series.
Building trust and accountability - the role User Experience design can play ... (Pistoia Alliance)
In this webinar our panel of UX specialists give a brief introduction to User Experience before presenting the design opportunities UX can bring to AI. We all know that AI has great potential, but it has some significant hurdles to overcome, not least the human aspects of trust and ethical considerations when designing for the life sciences.
In the late Fall and Winter of 2018, the Pistoia Alliance, in cooperation with Elsevier and the charitable organizations Cures within Reach and Mission: Cure, ran a datathon aiming to find drugs suitable for the treatment of childhood chronic pancreatitis, a rare disease that causes extreme suffering. The datathon resulted in the identification of four candidate compounds in a short time frame of just under three months. In this webinar our speakers discuss the technologies that made this leap possible.
PA webinar on benefits & costs of FAIR implementation in life sciences Pistoia Alliance
The slides from the Pistoia Alliance Debates Webinar, in which a panel of experts from technology support providers and the biopharma industry were invited to share their views on the "Benefits and costs of FAIR Implementation for life science industry".
The slides from a webinar that formed a continuing part of the Pistoia Alliance's drive to improve education and communication around new technologies for life science professionals. The webinar explored how blockchain/DLT and IoT could come together to add even more trust to the GxP domain. If you want to know more about how these new technologies could help enhance GxP compliance, then this webinar will give you much food for thought.
3. Poll Question 1:
Are you or your organisation using AI /
ML in Drug Design?
A. Yes, already
B. Plan to do in next 12 months
C. Plan in next 12-24 months
D. No plans
10. QSAR Modeling Workflow: the importance of rigorous validation
[Workflow diagram, courtesy of L. Zhang. Elements: Datasets; 5-fold External Validation splitting the data into a Modeling set and an External set; Combi-QSAR modeling combining Descriptors (Dragon, MOE) with Modeling methods (K-Nearest Neighbors (kNN), Random Forest (RF), Support Vector Machines (SVM)); Internal validation; Model selection; An ensemble of QSAR Models; Evaluation of external performance; Virtual screening (with AD threshold); Experimental confirmation]
Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inf., 2010, 29, 476 – 488
Fully implemented on CHEMBENCH.MML.UNC.EDU
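The 5-fold external validation step in the workflow above can be sketched as follows. This is a minimal illustration only, not the Chembench implementation: the function name `five_fold_external_splits`, the shuffling seed, and the toy dataset size are assumptions made for the example. Each compound lands in the external set exactly once and is never used to build the models that are evaluated on it.

```python
import random


def five_fold_external_splits(n_items, n_folds=5, seed=0):
    """Yield (modeling_idx, external_idx) pairs; every item is external exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed for repeatability
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        external = sorted(folds[k])
        modeling = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield modeling, external


# Toy usage: 10 compounds split into 5 modeling/external pairs.
splits = list(five_fold_external_splits(10))
for modeling, external in splits:
    print("modeling:", modeling, "external:", external)
```

Internal validation and model selection would then happen only inside each modeling set, with the external set held back for the final performance estimate.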
13. ReLeaSE* design principles: learning
and exploiting structural linguistics of
SMILES notation
• SMILES notations reflect rules of Chemistry
• SMILES notation may embed linguistic rules
• Neural nets could learn both of the above types of rules
• This knowledge can be transformed into the generation of
new SMILES corresponding to novel chemically feasible
molecules (generative model)
• One can build QSAR models based solely on SMILES
notation (predictive model)
• QSAR models can be used as a reward function for
reinforcement learning to bias the design of novel libraries
*Popova, M., Isayev, O., and Tropsha, A. "Deep reinforcement learning for de-novo drug design."
Science Advances, 2018 Jul 25;4(7):eaap7885.
14. NLP/Text mining: directly learn
low-dimensional word vectors
∙ In deep learning models, a word is represented as a dense vector
∙ Word vectors form the basis for deep learning methods
∙ Objective: predict a word based on its context
Mikolov T. et al. Distributed representations of words and phrases and their compositionality
// Advances in neural information processing systems. – 2013. – pp. 3111-3119.
15. Design of the ReLeaSE* method
(Reinforcement Learning for Structural Evolution)
Elements of the
thought cycle
(molecules->models->molecules):
• Generate chemically
feasible SMILES
• Develop SMILES-
based QSAR model
• Employ QSAR model
to bias library
generation
• Produce new
SMILES
*Popova, Mariya, Olexandr Isayev, and Alexander Tropsha. "Deep reinforcement learning for de-novo drug design."
arXiv preprint arXiv:1711.10907 (2017).
16. ReLeaSE:* Disruptive Innovation of Conventional Computational Drug Discovery Pipeline
Traditional Workflow: Learn from target-specific data (300-500 molecules) -> Target-specific models -> Virtual screening of internal/public databases -> Selection and testing of known molecules
ReLeaSE Workflow: Learn from all data (2M molecules) -> Target-specific and property models / Reinforcement learning -> Generation of novel molecules -> Selection and testing of novel molecules
Outcome: Hits with desired properties
*Popova, M., Isayev, O., and Tropsha, A. "Deep reinforcement learning for de-novo drug design."
Science Advances, 2018 Jul 25;4(7):eaap7885.
17. Disruptive innovation in QSAR: Can we avoid descriptor generation altogether and besides, predict new structures?
[Figure: training loop of the generative model on 1.5M molecules from ChEMBL. Starting from <START>, the network emits one SMILES character at a time, e.g. <START>c1ccc(O)cc1<END> for c1ccc(O)cc1; a softmax loss is added at each character, and the loop repeats until the check "Did the training converge?" changes from NO to YES]
*Popova, M., Isayev, O., and Tropsha, A. "Deep reinforcement learning for de-novo drug design."
Science Advances, 2018 Jul 25;4(7):eaap7885.
18. Are we making legitimate SMILES?
[Diagram: SMILES strings -> AI learning system -> SMILES strings; 95% valid, chemically feasible molecules]
21. QSAR modeling using SMILES strings only*
[Figure: the SMILES string CN2C(=O)N(C)C(=O)C1=C2N=CN1C is fed to a Neural Network for Property prediction; plots of Predicted LogP vs. Observed LogP for a 5CV RF model with DRAGON7 Descriptors and a 5CV NN model with SMILES directly]
RMSE: 0.57 0.53
MAE: 0.37 0.35
R2 ext: 0.90 0.91
*LogP data for ~16K molecules from PHYSPROP (srcinc.com), Toxcast Dashboard
(https://comptox.epa.gov/dashboard), and others.
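For readers unfamiliar with the three statistics quoted on this slide, the sketch below shows one common way to compute them from observed and predicted values. It assumes that "R2 ext" is the ordinary coefficient of determination evaluated on held-out data; the slides do not state the exact formula, and the numbers used here are made up for illustration.

```python
import math


def regression_metrics(observed, predicted):
    """Return (RMSE, MAE, R2) for paired observed/predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up LogP values, for illustration only.
observed_logp = [1.0, 2.0, 3.0, 4.0]
predicted_logp = [1.5, 2.0, 2.5, 4.0]
rmse, mae, r2 = regression_metrics(observed_logp, predicted_logp)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```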
32. Results: Synthetic accessibility
score* of the designed libraries
*Ertl, Peter, and Ansgar Schuffenhauer. "Estimation of synthetic accessibility score of drug-like molecules based on molecular
complexity and fragment contributions." Journal of cheminformatics 1.1 (2009): 8.
34. Predicted pIC50 for JAK2 kinase
CAS 236-084-2
(buffer reagent)
ZINC37859566
New molecule
SIMILAR SCAFFOLDS
NEW CHEMOTYPE
JAK2 Kinase inhibition
Untrained data distribution
Maximized property distribution
Minimized property distribution
35. Target predictions for generated
compounds using SEA*
*Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand
chemistry. Nat Biotech 25 (2), 197-206 (2007).
36. Target predictions for generated
compounds using SEA*
*Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand
chemistry. Nat Biotech 25 (2), 197-206 (2007).
37. Practical implementation workflow
• Select a target
• Train ReLeaSE to generate new target-specific
molecules; collect computational hits
• Identify a fraction of hits available in commercial
libraries; purchase and test selected hits
• Following successful validation, order NCE synthesis
and testing in vitro and in vivo and, if successful, file
for IP protection
38. Summary
• We propose an innovative de novo drug discovery
technology termed Reinforcement Learning for
Structural Evolution (ReLeaSE)*
• ReLeaSE is a product of convergence of fields as
disparate as cheminformatics and text mining united
by AI
• Unlike most of the current technologies, ReLeaSE
enables the discovery of new chemical entities with the
desired bioactivity and drug-like properties
Patent application filed (application # 62/535069, filed by UNC, 07/2018)
39. General Summary
• Accumulation of Big Data in all areas of research creates
previously unachievable opportunities for using ML and AI
approaches
– However, primary data must be handled with extreme care (curation,
reproducibility)
• Exciting developments in computational chemistry
– Critical shift from discovery to design and AI-driven robotics
• Rapid progression from the use of computational modeling
for decision support to using models to guide experimental
research
– Critical importance of rigorous and comprehensive model validation
using truly external data
• Natural progression toward automated chemical labs driven
by AI
40. Principal Investigator
Alexander Tropsha
Research Professors
Alexander Golbraikh
Olexander Isayev
Eugene Muratov
Graduate students
Sherif Faraq
Kyle Bowers
Maria Popova
Andrew Thieme
Dan Korn
Phil Gusev
Postdoctoral Fellows
Vinicius Alves
Joyce Borba
MAJOR FUNDING
NIH
- 1U01CA207160
- R01-GM114015
- 5U54CA198999
- 1OT3TR002020
ONR
- N00014-16-1-2311
Acknowledgements
41. Poll Question 2:
What are the biggest barriers to machine
learning adoption in Drug Design? (multi
select)
A. Lack of access to AI/ML Skills
B. Access to Data
C. Quality of Data
D. Access to ML & AI Tools
E. Other
42. Artificial Intelligence in Drug Design
Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden
February 26 2019, PISTOIA Webinar
43. Drug Design
What to make next? How to make it?
De novo design
Multi-parameter scoring function
Retrosynthesis
44. What is different now?
Augmented
design
Autonomous
design
Automatic
design
de novo molecular
design
Synthesis prediction
Automation
Data generation
45. It takes two to tango
Artificial Intelligence and Chemistry Automation
47. Neural Networks & Deep Learning
• Neural Networks known for decades
• Inputs, Hidden Layers, Outputs
• Single layer NNs have been used in QSAR
modelling for years
• Recent Applications use more complex
networks such as
• Multi-layer Feed-Forward NNs
• Convolutional NNs
• biological image processing
• Auto-encoder NNs
• Adversarial NNs
• Recurrent NNs
48. Why? Generation of Novel Compounds in the 10^60 Chemical Space!
Where's the impact?
• Use for de novo Molecular Design
• Scaffold Hopping
• Novelty
• Virtual Screening
• Library Design
10^60 10^10-10^12
49. Natural language generation and molecular structure generation
• Can we borrow concepts from natural language processing and
apply to SMILES description of molecular structures to generate
molecules?
• Conditional probability distributions given context
• P(green | is, grass, The)
• P(O | =, C, C)
The grass is ?
C C = ?
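The idea of a conditional distribution over the next token given its context can be illustrated with simple counting. The sketch below is a count-based n-gram stand-in, not the recurrent network used in the work presented here; the toy corpus, the function names, and the context length of three characters are assumptions for the example. Contexts are written in reading order (C, C, =), whereas the slide lists the most recent token first.

```python
from collections import Counter, defaultdict


def fit_ngram(sequences, context_size=3):
    """Count which token follows each context of `context_size` tokens."""
    counts = defaultdict(Counter)
    for tokens in sequences:
        padded = ["<START>"] * context_size + list(tokens) + ["<END>"]
        for i in range(context_size, len(padded)):
            context = tuple(padded[i - context_size:i])
            counts[context][padded[i]] += 1
    return counts


def next_token_probability(counts, context, token):
    """Estimate P(token | context) as a relative frequency (0.0 if context unseen)."""
    seen = counts.get(tuple(context))
    if not seen:
        return 0.0
    return seen[token] / sum(seen.values())


# Hypothetical toy corpus of character-level SMILES strings.
toy_corpus = ["CC=O", "CC=C", "CC=O", "CCO"]
model = fit_ngram(toy_corpus, context_size=3)
p_oxygen = next_token_probability(model, ["C", "C", "="], "O")
print(p_oxygen)  # two of the three "CC=" contexts are followed by "O"
```

A neural language model replaces the lookup table with a learned function, which lets it generalise to contexts it has never seen.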
50. Tokenization of SMILES
• Tokenize combinations of characters like “Cl” or “[nH]”
• Represent the characters as one-hot vectors
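The two steps above can be sketched in a few lines. This is a simplified illustration, assuming a regular expression that treats bracket atoms such as "[nH]" and the two-letter halogens "Cl" and "Br" as single tokens and everything else as single characters; real tokenizers usually cover more cases (for example two-digit ring closures written with "%"). The names `TOKEN_PATTERN`, `tokenize` and `one_hot_encode` are made up for this example.

```python
import re

# Bracket atoms first, then two-letter halogens, then any single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into tokens, keeping 'Cl', 'Br' and '[...]' intact."""
    return TOKEN_PATTERN.findall(smiles)


def one_hot_encode(tokens, vocabulary):
    """Return one vector per token with a single 1 at the token's vocabulary index."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    return [[1 if index[tok] == i else 0 for i in range(len(vocabulary))]
            for tok in tokens]


tokens = tokenize("Clc1ccc[nH]1")
vocabulary = sorted(set(tokens))
vectors = one_hot_encode(tokens, vocabulary)
print(tokens)
print(vocabulary)
print(vectors[0])
```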
52. Reinforcement learning
Learning from doing
Action -> Reward -> Update behaviour
Design molecule
Active?
Good DMPK?
Synthetically accessible?
Make more like this?
Make something else instead?
Agent
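The action, reward, update loop can be illustrated with a deliberately tiny policy-gradient (REINFORCE-style) agent. This is a toy sketch, not the generative model discussed in the talk: the agent chooses among three abstract "design options" rather than building molecules, and the reward values, batch size, and learning rate are hypothetical. In a real setting the reward would come from predicted activity, DMPK, and synthetic accessibility.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_agent(rewards, n_updates=200, batch_size=64, learning_rate=0.5, seed=0):
    """One-step REINFORCE: sample actions, score them, shift the policy toward reward."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(n_updates):
        probs = softmax(logits)
        actions = rng.choices(range(len(rewards)), weights=probs, k=batch_size)
        baseline = sum(rewards[a] for a in actions) / batch_size  # batch-mean baseline
        grad = [0.0] * len(rewards)
        for a in actions:
            advantage = rewards[a] - baseline
            for i in range(len(rewards)):
                indicator = 1.0 if i == a else 0.0
                # d log p(a) / d logit_i = indicator - p_i for a softmax policy
                grad[i] += advantage * (indicator - probs[i]) / batch_size
        logits = [l + learning_rate * g for l, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical rewards for three design options; the third is the most desirable.
HYPOTHETICAL_REWARDS = [0.1, 0.4, 0.9]
final_probs = train_agent(HYPOTHETICAL_REWARDS)
print([round(p, 3) for p in final_probs])
```

After training, most of the probability mass sits on the highest-reward option, which is the "make more like this" behaviour described on the slide.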
53. AI live: Create Structures Similar to Celecoxib
• Key Message
• RNN generates
structures similar
to Celecoxib
• Rapid sampling!
• Average score
describes how
many learning
steps are required
to reach similar
compounds
54. Some misconceptions about de novo RNN generated molecules
“The molecules are not diverse”
“The molecules are not synthetically feasible”
Answer: The generated molecules follow the properties of the dataset used as prior
Segler et al ACS Central Sci. 2018, 4, 120-131; Ertl et al arXiv:1712.07449
[Figure panels: Diversity; Synthetic feasibility]
55. “Cambrian explosion” of different DL-based molecular de novo generation methods
PyTorch + RDKit + ChEMBL => anyone with a computer can contribute =>
Benchmarking is urgently needed
56. Which benchmarks? What are the relevant questions?
• Does the same algorithm work best for both scaffold hopping and lead series optimization?
• Which algorithm samples the underlying chemical space most completely?
• Which algorithm zooms most efficiently to the most interesting regions of chemical space?
• Which is the best way to describe molecules, strings or graphs?
57. Benchmarks published by the scientific community
• MOSES, Polykovskiy et al
• https://arxiv.org/abs/1811.12823
• Diversity and quality of generated molecules
• Arus-Pous et al
• https://chemrxiv.org/articles/Exploring_the_GDB13_Chemical_Space_Using_Deep_Generative_Models/7172849
• Complete sampling of the relevant chemical space
• Klambauer et al
• J. Chem. Inf. Mod. 2018, 58, 1736
• Distribution between generated and real molecules
• GuacaMol, Brown et al
• https://arxiv.org/abs/1811.09621
• Efficient optimisation of a specific property
58. Artificial Intelligence Guided Drug Design Platform
[Diagram: AI Design Platform driving a Fully Automated DMTA Cycle (Design, Make, Test, Analyse) in the iLAB. Elements: Generation of Novel Chemical Space; Reaction & Synthesis Prediction; Desirability function (Σ IC50, LogP, Novelty etc.); Profiling; Iterations]
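A desirability function of the kind named on this slide combines several properties into one score. The sketch below is one simple way to do this, a weighted average of per-property scores clipped to the range 0 to 1; it is not the function used in the platform. The property names, thresholds, and weights in `HYPOTHETICAL_PROFILE` are invented for illustration, and potency is expressed as pIC50 here rather than IC50.

```python
def linear_desirability(value, worst, best):
    """Map a raw value onto [0, 1]; `best` may be lower than `worst` (e.g. for LogP)."""
    fraction = (value - worst) / (best - worst)
    return max(0.0, min(1.0, fraction))


def desirability_score(properties, profile):
    """Weighted average of per-property desirabilities."""
    total_weight = sum(spec["weight"] for spec in profile.values())
    weighted = sum(
        spec["weight"] * linear_desirability(properties[name], spec["worst"], spec["best"])
        for name, spec in profile.items()
    )
    return weighted / total_weight


# Hypothetical target profile: all thresholds and weights are illustrative only.
HYPOTHETICAL_PROFILE = {
    "pIC50": {"worst": 5.0, "best": 8.0, "weight": 2.0},
    "logP": {"worst": 5.0, "best": 3.0, "weight": 1.0},
    "novelty": {"worst": 0.0, "best": 1.0, "weight": 1.0},
}

candidate = {"pIC50": 6.5, "logP": 4.0, "novelty": 0.5}
score = desirability_score(candidate, HYPOTHETICAL_PROFILE)
print(score)
```

A score like this can serve as the reward signal for the design step of each iteration, with the weights expressing project priorities.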
59. 2018 Proof-of-Principle Pilot Study
1st iteration: Novelty
2nd iteration: Novelty
3rd iteration: Expansion
4th iteration: Chemistry Automation library
Timeline: ~2 months, ~2 months, ~2 months
Constant re-learning and training
1
• Novelty key goal
• Crowded IP space
• Lots of available data
• Selectivity
• New promising series identified
2
• Selectivity key goal
• Novelty
• Several promising series identified
3
• Optimising HI series
• Tool compound
• Optimization successful
60. Lessons from pilot study
• It works!
• Novel scaffolds were identified in crowded chemical space
• Compound series could be efficiently optimised
• Affinity and ADME predictions are still bottlenecks
• Too many ideas might make prioritization for synthesis challenging
• Chemistry resources need to be frontloaded
• Optimisation under constraints might lead to molecules that are difficult to synthesize
61. How can we improve affinity prediction?
• Synergize with automation
• Better Machine Learning Models
• Access to more data (for instance IMI2 Call 14 Topic 3)
• Experimental descriptors
• Graph convolution, include protein based information
• Multi-task modelling
• Matrix factorization with side information
• Free energy calculations
• Progress in speed
• Combine with machine learning
• Confidence estimation
• Conformal prediction
• Bayesian methods
• Benchmarking
• Public Chemogenomics set available (ChEMBL, Excape-DB, Pidgin)
• Blind competitions (SAMPL, D3R)
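As an illustration of the confidence estimation item, the sketch below shows split (inductive) conformal prediction for a regression model: absolute errors on a held-out calibration set define a half-width that is added around each new point prediction. This is a generic textbook formulation, not a method described in the slides, and the calibration values and the 80% confidence level are made up for the example.

```python
import math


def conformal_half_width(calibration_observed, calibration_predicted, confidence=0.8):
    """Half-width of the prediction interval from calibration-set absolute errors."""
    scores = sorted(abs(o - p) for o, p in zip(calibration_observed, calibration_predicted))
    n = len(scores)
    rank = math.ceil((n + 1) * confidence)  # finite-sample corrected quantile rank
    if rank > n:
        return float("inf")  # too few calibration points for this confidence level
    return scores[rank - 1]


def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a new point prediction."""
    return point_prediction - half_width, point_prediction + half_width


# Made-up calibration data (e.g. observed vs. predicted pIC50), illustration only.
observed = [6.1, 7.0, 5.5, 8.2, 6.8, 7.4, 5.9, 6.3, 7.7]
predicted = [6.0, 7.3, 5.9, 8.0, 6.2, 7.5, 6.4, 6.3, 7.0]

half_width = conformal_half_width(observed, predicted, confidence=0.8)
low, high = prediction_interval(7.0, half_width)
print(round(half_width, 3), round(low, 3), round(high, 3))
```

Under the usual exchangeability assumption, intervals built this way contain the true value at least at the chosen confidence level on average, which is what makes the approach attractive for prioritising compounds.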
62. Will ML/AI revolutionize drug design?
My personal opinion(s)
• Only time will tell….
• The last commonly agreed revolution was the introduction of DMPK
departments in the 90s, so the bar is high
• ML/AI, like other promising technologies (for instance PROTACs), warrants
further investments
• More data, automation and ability to learn make ML/AI bound to have
larger impact on drug design in the future
• During my 19 years in industry it has never been as exciting to work with in
silico drug design
63. Acknowledgements
Discovery Sciences CompChem ML/AI Team
Thierry Kogej
Hongming Chen
Isabella Feierberg
Atanas Patronov
Esben Jannik Bjerrum
Preeti Iyer
Jiangming Sun (Postdoc 2015-2017)
Noe Sturm (Postdoc 2017-2018)
Philipp Buerger (Postdoc 2017-2020)
Jiazhen He (Postdoc 2019-2022)
Rocio Mercado (Postdoc 2018-2021)
Thomas Blaschke (PhD student 2017-2018)
Josep Arus Pous (PhD student 2018-2019)
Michael Withnall (PhD student 2018-2019)
Oliver Laufkötter (PhD student 2018-2019)
Laurent David (PhD student 2018-2019)
Ave Kuusk (PhD student 2016-2019)
Marcus Olivecrona (AZ GradProgram 2017)
Alexander Aivazidis (AZ GradProgram 2018)
Dhanushka Weerakoon (AZ GradProgram 2018-2019)
Panagiotis-Christos Kotsias (AZ AI GradProgram 2018-2019)
Edvard Lindelöf (Master Thesis Student 2018-2019)
Simon Johansson (Master Thesis Student 2019)
Oleksii Prykhodko (Master Thesis Student 2019)
Academic Collaborators
Marwin Segler (Munster)
Juergen Bajorath (Bonn)
Jean-Louis Reymond (Bern)
Andreas Bender (Cambridge)
Sepp Hochreiter (Linz)
Gunther Klambauer (Linz)
Sami Kaski (Helsinki)
Discovery Sciences
Garry Pairaudeau
Clive Green
Lars Carlsson
Nidhal Selmi
DSM AI Team
Ernst Ahlberg
Suzanne Winiwarter
Ioana Oprisiu
Ruben Buendia (Postdoc 2018)
PharmSci
Per-Ola Norrby
2018 PoP Pilot Study
Werngard Czechtizky
Ina Terstiege
Christian Tyrchan
Anders Johansson
Jonas Boström
Kun Song
Alex Hird
Neil Grimster
Richard Ward
Jeff Johannes
64. Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove
it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the
contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,
Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com
65. Utilize the GDB-13 database (975 Million compounds)
If we train with 1 million compounds and sample 2 billion, what will we get?
Josep Arus
https://chemrxiv.org/articles/Exploring_the_GDB-13_Chemical_Space_Using_Deep_Generative_Models/7172849
66. Utilize the GDB-13 database
80% of the 2B sampled molecules are within GDB-13
70% of GDB-13 sampled
67. Utilize the GDB-13 database
Long tail distribution, 99.5% of molecules sampled at least once
Molecules with uncommon substrings sampled less often
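To put the sampling figures in context, it helps to ask what coverage a perfect sampler could reach. The sketch below assumes an idealised generator that draws uniformly, with replacement, from GDB-13 only; this assumption is introduced here for illustration and is not taken from the slides. Under it, the expected fraction of the database seen after n draws from a space of size N is 1 - (1 - 1/N)^n.

```python
import math


def expected_coverage(n_samples, space_size):
    """Expected fraction of distinct items seen when sampling uniformly with replacement."""
    # (1 - 1/N)^n computed via log1p for numerical stability with large N.
    return 1.0 - math.exp(n_samples * math.log1p(-1.0 / space_size))


GDB13_SIZE = 975_000_000          # database size quoted on the slide
N_SAMPLED = 2_000_000_000         # number of molecules sampled in the experiment

ideal = expected_coverage(N_SAMPLED, GDB13_SIZE)
print(f"idealised uniform sampler: {ideal:.1%} of GDB-13")
```

Even this idealised sampler would not reach full coverage with 2 billion draws, so the reported coverage is best read against that ceiling rather than against 100%.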