We have created structure-based chemical ontologies that classify chemical compounds automatically. These classifications have been used successfully in semantic search engines to find all representatives of a chemical class. In the present paper we demonstrate use cases in which these chemical classes serve as features in typical machine learning approaches.
To this end, we have used the co-occurrence of chemical compounds with biological and physico-chemical properties in scientific articles to train models that predict the properties of novel compounds absent from the training sets. Examples include the prediction of hepatotoxicity and of bioavailability. In principle, any property found in the textual vicinity of compounds can be used to build such predictive models. Criteria are presented that allow one to judge the quality and predictive power of such models.
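The co-occurrence idea above can be sketched in a few lines. This is a minimal illustration, not OntoChem's actual pipeline: the ontology class names and property terms are invented, and the scoring is a naive co-occurrence rate rather than a trained model.

```python
from collections import defaultdict

# Toy corpus: each entry pairs the ontology classes assigned to a mentioned
# compound with property terms found in the same sentence. All class and
# property names here are illustrative.
sentences = [
    ({"nitroarene", "aromatic"}, {"hepatotoxicity"}),
    ({"nitroarene"}, {"hepatotoxicity"}),
    ({"sugar", "polyol"}, {"bioavailability"}),
    ({"aromatic"}, {"bioavailability"}),
]

# Count how often each chemical class co-occurs with each property term.
cooc = defaultdict(lambda: defaultdict(int))
class_count = defaultdict(int)
for classes, props in sentences:
    for c in classes:
        class_count[c] += 1
        for p in props:
            cooc[c][p] += 1

def property_score(compound_classes, prop):
    """Average co-occurrence rate of the compound's classes with `prop`."""
    rates = [cooc[c][prop] / class_count[c]
             for c in compound_classes if class_count[c]]
    return sum(rates) / len(rates) if rates else 0.0

# A novel compound never seen in the corpus is scored via its classes alone.
score = property_score({"nitroarene"}, "hepatotoxicity")
```

A real system would add the model-quality criteria mentioned in the abstract (e.g. held-out validation) on top of such counts.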
This document presents the ligand-binders ontology developed to standardize information about protein binders and their targets. The ontology defines concepts like binders, targets, experiments, and their properties and relationships. It represents key concepts through classes and subclasses, and defines class properties to describe concepts. The ontology aims to provide a standardized vocabulary and enable reasoning about protein binder technologies. It was developed based on domain expertise and minimal information needed to describe binder-target pairs.
Molecular recognition is the specific interaction between two or more complementary molecules through noncovalent bonding such as hydrogen bonding, metal coordination, hydrophobic forces, etc. This process is crucial in biological systems and modern chemical research. Molecular recognition can be static, involving a 1:1 complex between a host and guest molecule, or dynamic, where binding of the first guest induces a conformational change affecting binding of a second guest. Molecular recognition is important in fields like supramolecular chemistry, self-assembly, and host-guest chemistry. It has applications in areas like sensing, molecular motors, and enzyme mimicry.
Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity whether it be at molecular level or some part thereof (e.g. residues, collection of residues, atoms, collection of atoms, functional groups) such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site are involved in a chemical reaction including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of bio-curation are substantially more accurate.
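The idea of curator- and database-independent identifiers can be illustrated with a toy scheme that derives an identifier purely from structure. This is a sketch only: real schemes such as InChI additionally canonicalize the atom numbering via graph canonicalization, which this example assumes is already done.

```python
import hashlib

def structure_id(atoms, bonds):
    """Derive an identifier from structure alone.

    `atoms` maps atom index -> element symbol; `bonds` is a set of
    (i, j, order) tuples. This toy version sorts a normalized textual
    form and hashes it; it assumes a fixed canonical atom numbering,
    which real schemes (e.g. InChI) compute themselves.
    """
    atom_part = ",".join(f"{i}:{el}" for i, el in sorted(atoms.items()))
    norm = sorted((min(i, j), max(i, j), o) for i, j, o in bonds)
    bond_part = ",".join(f"{i}-{j}:{o}" for i, j, o in norm)
    digest = hashlib.sha256(f"{atom_part}|{bond_part}".encode())
    return digest.hexdigest()[:16]

# The same structure yields the same identifier regardless of which
# curator or database supplies it; bond direction is irrelevant.
water = structure_id({0: "O", 1: "H", 2: "H"}, {(0, 1, 1), (0, 2, 1)})
same = structure_id({0: "O", 1: "H", 2: "H"}, {(1, 0, 1), (2, 0, 1)})
```

The same hashing approach extends naturally to parts of structures (residues, atom collections, functional groups) by hashing only the relevant sub-graph.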
The EcoCyc database is a freely accessible, comprehensive database that combines information about the genome and metabolism of Escherichia coli K-12. It describes the known genes of E. coli, the enzymes encoded by these genes, and how these enzymes catalyze reactions organized into metabolic pathways. The EcoCyc database is jointly developed and curated by researchers at SRI International and the Marine Biological Laboratory based on experimental literature. It provides graphical tools for visualizing and exploring genomic and biochemical data through its user interface, facilitating analysis of high-throughput data and metabolic modeling.
Several approaches have been proposed to solve global numerical optimization problems. Most researchers have demonstrated the robustness of their algorithms on minimization problems. In this paper, we focus on maximization problems using several hybrid chemical reaction optimization algorithms that have proven successful in minimization: orthogonal chemical reaction optimization (OCRO), a hybrid of particle swarm and chemical reaction optimization (HP-CRO), real-coded chemical reaction optimization (RCCRO), and the hybrid mutation chemical reaction optimization algorithm (MCRO). The aim of this paper is to demonstrate that approaches inspired by chemical reaction optimization are not limited to minimization but are also suitable for maximization. An experimental comparison with other maximization algorithms is presented and discussed.
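The standard bridge between the two settings is that any minimizer maximizes f by minimizing -f. The sketch below shows that trick with a simple random-search minimizer standing in for the CRO variants (the actual CRO algorithms are considerably more involved).

```python
import random

def minimize(f, lo, hi, iters=2000, seed=0):
    """Toy random-search minimizer, a stand-in for CRO-style algorithms."""
    rng = random.Random(seed)
    best_x = rng.uniform(lo, hi)
    best = f(best_x)
    for _ in range(iters):
        x = rng.uniform(lo, hi)
        fx = f(x)
        if fx < best:
            best, best_x = fx, x
    return best_x, best

def maximize(f, lo, hi, **kw):
    """Any minimizer maximizes f by minimizing -f."""
    x, neg = minimize(lambda x: -f(x), lo, hi, **kw)
    return x, -neg

# Maximize a concave function with a known peak of 5 at x = 2.
x, val = maximize(lambda x: -(x - 2) ** 2 + 5, -10, 10)
```

The same wrapper applies unchanged to OCRO, HP-CRO, RCCRO, or MCRO, which is essentially why minimization-tested algorithms transfer to maximization.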
This document summarizes a framework for automatically extracting human protein-protein interaction data from biomedical literature. It describes benchmarking interaction datasets based on shared functional annotations and known physical interactions. It also outlines a method using a conditional random field tagger to identify protein names in text and two approaches for extracting interactions: co-citation analysis and learning interaction extractors from annotated sentences. Evaluation shows the extracted interactions have accuracy comparable to manually curated databases.
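Of the two extraction approaches, co-citation analysis is the simpler to sketch: proteins repeatedly mentioned together are candidate interactors. The example below is illustrative only (invented abstracts, no frequency normalization, and the upstream CRF tagger is assumed to have already produced the name sets).

```python
from collections import Counter
from itertools import combinations

# Each abstract reduced to the set of protein names a tagger (e.g. a
# conditional random field model) found in it. Names are illustrative.
abstracts = [
    {"TP53", "MDM2"},
    {"TP53", "MDM2", "EGFR"},
    {"EGFR", "GRB2"},
    {"TP53", "MDM2"},
]

# Co-citation analysis: count pairwise co-mentions across abstracts.
# A real system would normalize against each protein's overall frequency.
pair_counts = Counter()
for proteins in abstracts:
    for a, b in combinations(sorted(proteins), 2):
        pair_counts[(a, b)] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
```

High-count pairs would then be benchmarked against shared functional annotations and known physical interactions, as the framework describes.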
II-SDV 2017: The "International Chemical Ontology Network" (Dr. Haxel Consult)
The "International Chemical Ontology Network" has been established by major chemical companies to elaborate and understand the connections within the Big Data collections of chemistry and pharmacology. These collections originate in different natural sciences and human sciences and can only be explained, and their value realized, through their overlapping structures and connections. In this way, new innovative approaches can be defined, new knowledge generated, and products developed and marketed. The whole value-creation process can thus be simulated and specified in the initial phase of a project, making it possible to counteract unfavorable developments at a very early stage. A clear structure and the incorporation of all relevant common rules will improve the understanding of overlapping structures in research, development, and production processes, leading to considerable savings in costs and resources and unlocking the full innovation potential of the Industry 4.0 approach.
This document discusses cheminformatics and various computer representations of molecules. It describes cheminformatics as encompassing concepts and techniques like molecular similarity, QSAR, substructure search, and molecular depiction/encoding. Common file formats like MOL and SMILES are introduced for representing molecular structures on computers. Fingerprints and other methods of measuring molecular similarity are also summarized.
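Fingerprint-based similarity, mentioned at the end of this summary, reduces to simple set arithmetic once structures are encoded as bit vectors. The sketch below uses invented structural-key fingerprints; production code would generate them with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints represented as sets
    of on-bit positions: |A ∩ B| / |A ∪ B|."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy structural-key fingerprints; each bit position stands for a
# substructure feature (e.g. "has aromatic ring", "has carbonyl").
# The molecules and bit assignments are purely illustrative.
aspirin_like = {1, 4, 7, 9}
salicylate_like = {1, 4, 9}
alkane_like = {2, 3}

sim_close = tanimoto(aspirin_like, salicylate_like)  # shares 3 of 4 bits
sim_far = tanimoto(aspirin_like, alkane_like)        # shares no bits
```

This is the similarity measure most commonly paired with MOL/SMILES-derived fingerprints for substructure-aware searching.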
Computer-aided prediction of xenobiotics toxicity (Mobiliuz)
The document discusses computer-aided prediction of chemical toxicity. It defines key terms like toxicology, toxicity, and xenobiotics. It then describes using quantitative structure-activity relationship (QSAR) modeling and molecular descriptors to predict toxicity without animal testing. This allows prioritizing chemicals for safety assessment and identifies effects not seen in animal studies. It lists several software programs that use these methods for predictive toxicology.
Humans are potentially exposed to tens of thousands of man-made chemicals in the environment. It is well known that some environmental chemicals mimic natural hormones and thus have the potential to be endocrine disruptors. Most of these environmental chemicals have never been tested for their ability to disrupt the endocrine system, in particular, their ability to interact with the estrogen receptor. EPA needs tools to prioritize thousands of chemicals, for instance in the Endocrine Disruptor Screening Program (EDSP). This project was intended to demonstrate the use of predictive computational models on HTS data, including ToxCast and Tox21 assays, to prioritize a large chemical universe of 32,464 unique structures for one specific molecular target: the estrogen receptor. CERAPP combined multiple computational models for prediction of estrogen receptor activity and used the predicted results to build a unique consensus model. Models were developed in collaboration among 17 groups in the U.S. and Europe and applied to predict the common set of chemicals. Structure-based techniques such as docking and several QSAR modeling approaches were employed, mostly using a common training set of 1,677 compounds provided by the U.S. EPA, to build a total of 42 classification models and 8 regression models for binding, agonist, and antagonist activity. All predictions were evaluated on ToxCast data and on an external validation set collected from the literature. To overcome the limitations of single models, a consensus was built by weighting models according to their prediction accuracy scores (including sensitivity and specificity against training and external sets). Individual model scores ranged from 0.69 to 0.85, indicating high prediction reliability. The final consensus predicted 4,001 chemicals as actives to be considered high priority for further testing, and 6,742 as suspicious chemicals. This abstract does not necessarily reflect U.S. EPA policy.
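The accuracy-weighted consensus step can be sketched as a weighted vote. This is a simplified illustration, not the actual CERAPP procedure: the model names, weights, and threshold below are invented.

```python
def consensus(predictions, weights, threshold=0.5):
    """Accuracy-weighted consensus over binary model predictions.

    `predictions` maps model name -> 0/1 activity call for one chemical;
    `weights` maps model name -> an accuracy-derived score (e.g. combining
    sensitivity and specificity against training and external sets).
    All names and numbers here are hypothetical.
    """
    total = sum(weights.values())
    score = sum(weights[m] * p for m, p in predictions.items()) / total
    return ("active" if score >= threshold else "inactive", round(score, 3))

# Two QSAR models call the chemical active; a docking model disagrees.
calls = {"qsar_1": 1, "qsar_2": 1, "docking": 0}
scores = {"qsar_1": 0.85, "qsar_2": 0.78, "docking": 0.69}
label, score = consensus(calls, scores)
```

Weighting by validated accuracy lets stronger models dominate without discarding weaker ones, which is the stated rationale for the consensus in the abstract.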
Medicinal chemistry involves the discovery and design of new therapeutic chemicals and their development into medicines and drugs. It is an interdisciplinary field combining chemistry and biology. Medicinal chemists work to design new drug compounds, determine their biological effects, optimize their structures for desired effects and minimal side effects, and study how the body processes drugs. The physicochemical properties of drugs, like solubility, acidity, and reactivity influence their biological actions and interactions with targets in the body. Understanding these properties helps predict drugs' behaviors and design new candidates. A drug's solubility is key to its formulation and absorption in the body. Both lipophilic and hydrophilic structural features impact a molecule's solubility profile. Acidity and
Presented by: Harshpal Singh Wahi, Shikha D. Popali
Useful for pharmacy students, academics, and industry professionals: molecule development, modeling, drug discovery, computational tools, molecular docking (its types, influencing factors, and stages), and QSAR (advantages and need).
Hazard Estimation and Method Comparison with OWL-Encoded Toxicity Decision Trees (Michel Dumontier)
This document describes research on representing toxicity decision trees as OWL ontologies to enable automated reasoning about chemical hazards. Decision trees were trained to predict drug-likeness and carcinogenicity using chemical data. The trees were then formalized as OWL classes and subclasses representing each node and outcome. This allows chemicals to be classified by automatically reasoning over the ontology. Representing decision trees as ontologies provides explanations for classifications and enables comparison of different trees.
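The tree-to-ontology mapping can be illustrated by flattening each root-to-leaf path into a class definition. This sketch renders Manchester-like text for readability; the tree, features, and thresholds are invented, and a real encoding would emit proper OWL datatype restrictions rather than strings.

```python
# A trained decision tree as nested tuples:
# (feature, threshold, subtree_if_below, subtree_if_at_or_above), or a
# leaf label. Features and cutoffs are hypothetical.
tree = ("logP", 5.0,
        ("mol_weight", 500.0, "drug-like", "non-drug-like"),
        "non-drug-like")

def tree_to_classes(node, path=()):
    """Flatten each root-to-leaf path into an OWL-style class definition,
    so a reasoner can classify chemicals by their feature values."""
    if isinstance(node, str):
        conds = " and ".join(path) or "Thing"
        return [f"Class: {node} EquivalentTo: {conds}"]
    feat, thr, below, above = node
    lo = path + (f"({feat} some xsd:double[< {thr}])",)
    hi = path + (f"({feat} some xsd:double[>= {thr}])",)
    return tree_to_classes(below, lo) + tree_to_classes(above, hi)

axioms = tree_to_classes(tree)
```

Because each outcome class carries its full path of conditions, a classification comes with its explanation for free, which is the advantage the summary highlights.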
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i... (Chanin Nantasenamat)
In this lecture, I provide an overview of how computers can be instrumental in drug discovery efforts. Topics covered include: big data as a result of omics efforts; bioinformatics; cheminformatics; biological space; chemical space; and how computers, particularly machine learning (and data science), can be applied in the context of drug discovery.
A video of this lecture is also provided on the "Data Professor" YouTube channel available at http://bit.ly/dataprofessor
If you are fascinated by data science, it would mean the world to me if you would consider subscribing to this channel (via the link below):
http://bit.ly/dataprofessor
We provide an overview of the use we make of ontologies at the Royal Society of Chemistry. Our engagement with the ontology community began in 2006 with preparations for Project Prospect, which used ChEBI and other Open Biomedical Ontologies to mark up journal articles. Project Prospect has since evolved into DERA (Digitally Enhancing the RSC Archive), and we have developed further ontologies for text markup covering analytical methods and name reactions. Most recently we have been contributing to CHEMINF, an open-source cheminformatics ontology, as part of our work on disseminating calculated physicochemical properties of molecules via Open PHACTS. We show how we represent these properties and how this representation can serve as a template for disseminating other sorts of chemical information.
The document discusses ontologies and their role in representing knowledge for artificial intelligence systems and the semantic web. It provides examples of biomedical ontologies like Gene Ontology and anatomy ontologies. It explains how ontologies use formal logic and description logics to precisely define concepts, attributes, and relationships. The document also describes how ontologies were crucial for the initial design of the Matrix and how they continue to be important for representing real-world concepts in semantic web languages.
Evolutionary Symbolic Discovery for Bioinformatics, Systems and Synthetic Bi... (Natalio Krasnogor)
The document discusses using evolutionary symbolic discovery methods to synthesize effective energy functions for protein structure prediction and systems/synthetic biology models. It describes using genetic programming techniques to explore large combinatorial spaces of modular components and parameters to construct stochastic P systems that model cellular systems. The goal is to find structures and optimize parameters in P systems to match target models through comparing different evolutionary algorithms on test cases of increasing difficulty and dimension.
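The parameter-optimization half of this approach can be sketched with a minimal evolutionary loop. This is a stand-in illustration with an invented fitness target: the actual work evolves the structure of stochastic P systems as well as their parameters, and uses genetic programming rather than the simple mutate-and-select loop shown here.

```python
import random

def evolve(fitness, n_params, pop=30, gens=60, seed=1):
    """Minimal elitist evolutionary loop: keep the fitter half of the
    population and refill with Gaussian-mutated copies of parents."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)          # lower fitness = better match
        parents = population[: pop // 2]
        children = [[g + rng.gauss(0, 0.1) for g in rng.choice(parents)]
                    for _ in range(pop - len(parents))]
        population = parents + children
    return min(population, key=fitness)

# Toy "target model": recover the parameter vector (0.3, -0.7); fitness is
# the squared distance of a candidate's parameters from the target, playing
# the role of the mismatch between a candidate P system and a target model.
target = (0.3, -0.7)
best = evolve(lambda p: sum((a - b) ** 2 for a, b in zip(p, target)), 2)
```

Comparing such loops on test cases of increasing dimension is what the summarized work does, only over far larger combinatorial spaces of modular components.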
Toxicology is the field that studies the adverse effects of chemicals on living organisms. Key figures in the history of toxicology include Paracelsus, Rachel Carson, and Orfila. Important chemicals include mercury, DDT, and alcohol. Risk assessment involves hazard identification, hazard characterization, exposure assessment, and risk characterization. Toxicokinetics describes what happens to a compound in the body, including absorption, distribution, metabolism and excretion, while toxicodynamics describes how the compound causes toxicity. Biotransformation can activate or deactivate compounds through phase I and phase II reactions.
Pharmacophore Modeling and Docking Techniques.ppt (DrVivekChauhan1)
Pharmacophore modeling and molecular docking techniques are important computational methods used in drug design and discovery. Pharmacophore models identify the essential molecular features responsible for biological activity. Molecular docking predicts how drug molecules bind to protein targets. The document discusses key concepts like pharmacophores, bioisosterism, and molecular docking workflows. It also covers common docking software and factors that influence docking results like intramolecular forces and target preparation. Overall, the document provides an overview of pharmacophore modeling and molecular docking techniques that are widely applied in rational drug design.
2011-10-11 Open PHACTS at BioIT World Europe (open_phacts)
The document discusses the Innovative Medicines Initiative's Open PHACTS project, which aims to develop robust standards and apply them in a semantic integration platform ("Open Pharmacological Space") to integrate drug discovery data from various public and private sources. The project brings together partners from industry, academia, and non-profits to build an open infrastructure for linking drug discovery knowledge and supporting ongoing research. It outlines the technical approach, priorities, and initial progress on developing exemplar applications and a prototype "lash up" system.
The document discusses various topics related to drug design and discovery including structure-based drug design, quantitative structure-activity relationships (QSAR), molecular docking, and de novo drug design. It provides details on the drug discovery process, strategies for structure-based design including pharmacophore identification and docking simulations, factors that govern drug design such as physicochemical properties, and methods for QSAR model development, validation, and applications in drug design.
A Biclustering Method for Rationalizing Chemical Biology Mechanisms of Action (Gerald Lushington)
This document discusses using a chemical interaction matrix and biclustering approach to analyze multi-dimensional biochemical and chemical biology data to better understand mechanisms of action. The procedure involves assembling a matrix of compounds versus activity and descriptive features, normalizing the data, folding activity measurements into the features, and then using biclustering to identify contiguous regions within the matrix that may indicate groups of compounds sharing a mechanism of action or features correlated with activity. An example application to a dataset of 773 molecules and 298 molecular descriptors relating to oral bioavailability achieves promising results, with biclustering condensing the data into 18 high-quality clusters that cover most of the training space and should provide a strong basis for target identification when compared
Drug discovery is a lengthy and expensive process, requiring extensive testing over 10-15 years. Computer-aided drug design techniques like molecular modeling, virtual screening, and quantitative structure-activity relationships (QSAR) are helping to improve the drug discovery process. Molecular docking uses computer models to predict how drug molecules bind to their protein targets. Key steps in docking include target and ligand preparation, docking simulations, and analysis of results. Factors like intermolecular forces, flexibility, and binding site selection influence docking accuracy. QSAR analyses seek mathematical correlations between compound structures and their biological activities to enable prediction of new candidates.
This document discusses molecular docking and different models used to describe molecular recognition between biomolecules. It begins by defining molecular recognition and docking, and describes early models like the lock-and-key and induced-fit models. It then discusses computational docking methods, including representing molecules, scoring docked poses, and search algorithms to generate poses. Flexibility is an important consideration, and methods to incorporate flexibility of both small molecule ligands and protein receptors are described.
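The "scoring docked poses" step can be illustrated with a deliberately crude geometric score. This is a toy, not any docking program's actual function: real scoring functions model hydrogen bonds, electrostatics, desolvation, and entropy, and the coordinates and cutoffs below are invented.

```python
from math import dist

def contact_score(ligand_atoms, receptor_atoms, good=(3.0, 4.0), clash=2.0):
    """Toy pose score: reward ligand/receptor atom pairs at a favorable
    distance, heavily penalize steric clashes. Atoms are (x, y, z) tuples
    in angstroms; cutoffs are illustrative."""
    score = 0
    for la in ligand_atoms:
        for ra in receptor_atoms:
            d = dist(la, ra)
            if d < clash:
                score -= 10   # steric clash: atoms overlap
            elif good[0] <= d <= good[1]:
                score += 1    # favorable non-covalent contact distance
    return score

# Two receptor atoms and two candidate single-atom ligand poses.
receptor = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
pose_good = [(0.0, 3.5, 0.0)]    # sits at contact distance from one atom
pose_clash = [(0.5, 0.0, 0.0)]   # overlaps a receptor atom
```

A search algorithm (the other half of the docking problem) would generate many such poses and keep the best-scoring ones.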
This document provides an introduction to the process of drug design given by Subhasis Banerjee. It discusses key areas of drug design including target identification and validation, lead finding and optimization, and ligand-based and structure-based drug design. It also describes the application of cheminformatics in drug design for data mining large databases of small molecules and proteins. The goal of drug design is to progress from initial hits identified through screening to optimized lead compounds through iterative chemical modifications and testing.
AI-SDV 2022: Henry Chang, Patent Intelligence and Engineering Management (Dr. Haxel Consult)
This document describes a metadata list of patent holders in various countries and regions compiled by Muchiu (Henry) Chang. The list is sorted geographically and includes patent holder names from Canada, China, Hong Kong, Macao, Taiwan, the Middle East, and Europe between 2009-2022. Key features include Chinese-English compatibility and use of open source intelligence. The list has previously been utilized by the Region of Peel in Ontario, Canada.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal... (Dr. Haxel Consult)
Knowledge Graphs are an increasingly relevant approach for storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a life-sciences project with our partner Fraunhofer SCAI, in which a knowledge graph organizing detailed facts about psychiatric diseases was computed.
Information on cause-effect relations between proteins, genes, drugs, and diseases has been encoded in BEL (the Biological Expression Language) and imported into a graph database to build an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to simply rerunning the analysis on newly published literature.
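The core data shape here is a set of cause-effect triples loaded into a graph. The sketch below is illustrative: the entities and relations are invented, and a production system would use a graph database and real BEL statements (e.g. `p(HGNC:X) increases p(HGNC:Y)`) rather than plain tuples.

```python
from collections import defaultdict

# Cause-effect statements as they might be distilled from BEL. All entity
# names here are hypothetical placeholders.
statements = [
    ("DrugA", "decreases", "ProteinX"),
    ("ProteinX", "increases", "DiseaseY"),
    ("GeneZ", "increases", "ProteinX"),
]

# Load into an adjacency-style graph. "Updating" the knowledge graph is
# just rerunning extraction on new literature and appending statements.
graph = defaultdict(list)
for subj, rel, obj in statements:
    graph[subj].append((rel, obj))

def downstream(node):
    """All entities reachable from `node` via cause-effect edges."""
    seen, stack = set(), [node]
    while stack:
        for _, nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

Queries like `downstream("DrugA")` are the kind of indication-wide traversal the graph database is meant to support.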
Similar to AI-SDV 2020: Using chemical ontologies to create molecular prediction systems for any molecular property, by Claudia Bobach (OntoChem, Germany)
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
The document discusses the Innovative Medicines Initiative's Open PHACTS project, which aims to develop robust standards and apply them in a semantic integration platform ("Open Pharmacological Space") to integrate drug discovery data from various public and private sources. The project brings together partners from industry, academia, and non-profits to build an open infrastructure for linking drug discovery knowledge and supporting ongoing research. It outlines the technical approach, priorities, and initial progress on developing exemplar applications and a prototype "lash up" system.
The document discusses various topics related to drug design and discovery including structure-based drug design, quantitative structure-activity relationships (QSAR), molecular docking, and de novo drug design. It provides details on the drug discovery process, strategies for structure-based design including pharmacophore identification and docking simulations, factors that govern drug design such as physicochemical properties, and methods for QSAR model development, validation, and applications in drug design.
A Biclustering Method for Rationalizing Chemical Biology Mechanisms of ActionGerald Lushington
This document discusses using a chemical interaction matrix and biclustering approach to analyze multi-dimensional biochemical and chemical biology data to better understand mechanisms of action. The procedure involves assembling a matrix of compounds versus activity and descriptive features, normalizing the data, folding activity measurements into the features, and then using biclustering to identify contiguous regions within the matrix that may indicate groups of compounds sharing a mechanism of action or features correlated with activity. An example application to a dataset of 773 molecules and 298 molecular descriptors relating to oral bioavailability achieves promising results, with biclustering condensing the data into 18 high-quality clusters that cover most of the training space and should provide a strong basis for target identification when compared
Drug discovery is an expensive and lengthy process involving high costs and extensive testing over 10-15 years. Computer-aided drug design techniques like molecular modeling, virtual screening, and quantitative structure-activity relationships (QSAR) are helping to improve the drug discovery process. Molecular docking uses computer models to predict how drug molecules bind to their protein targets. Key steps in docking include target and ligand preparation, docking simulations, and analysis of results. Factors like intermolecular forces, flexibility, and binding site selection influence docking accuracy. QSAR analyses seek mathematical correlations between compound structures and their biological activities to enable prediction of new candidates.
This document discusses molecular docking and different models used to describe molecular recognition between biomolecules. It begins by defining molecular recognition and docking, and describes early models like the lock-and-key and induced-fit models. It then discusses computational docking methods, including representing molecules, scoring docked poses, and search algorithms to generate poses. Flexibility is an important consideration, and methods to incorporate flexibility of both small molecule ligands and protein receptors are described.
This document provides an introduction to the process of drug design given by Subhasis Banerjee. It discusses key areas of drug design including target identification and validation, lead finding and optimization, and ligand-based and structure-based drug design. It also describes the application of cheminformatics in drug design for data mining large databases of small molecules and proteins. The goal of drug design is to progress from initial hits identified through screening to optimized lead compounds through iterative chemical modifications and testing.
Similar to AI-SDV 2020: Using chemical ontologies to create molecular prediction systems for any molecular property Claudia Bobach (OntoChem, Germany) (20)
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
This document describes a metadata list of patent holders in various countries and regions compiled by Muchiu (Henry) Chang. The list is sorted geographically and includes patent holder names from Canada, China, Hong Kong, Macao, Taiwan, the Middle East, and Europe between 2009-2022. Key features include Chinese-English compatibility and use of open source intelligence. The list has previously been utilized by the Region of Peel in Ontario, Canada.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach to store detailed knowledge in many domains. Recent advances in NLP allow to enrich Knowledge Graphs through automated analysis of large volumes of literature, reducing a lot the efforts in traditional manual information capturing. In our presentation we report the approach taken in a project with partner Fraunhofer SCAI in the life sciences where a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information of cause-effect relations between proteins, genes, drugs and diseases has been encoded in the BEL (Biological Expression Language) and imported into a Graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on the newly published literature.
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
to help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports have been published looking at green technologies, and analysis of how the UK’s Green Channel scheme for accelerated processing of green patent applications has been conducted. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness to market factor that patent data does not, and complementary trade mark analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
This document profiles Linda Andersson, the CEO of Artificial Researcher. It provides details on her background, awards, research fields, and academic merits. It then discusses how domain knowledge enables artificial intelligence systems to be smarter by allowing them to understand language and text in particular domains at a deeper level. Finally, it provides an overview of Artificial Researcher's natural language processing and text mining technologies and services for tasks like passage retrieval, ontology generation, and semantic search.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
In 2013 we witnessed an evolutionary change in the NLP field evolved thanks to the introduction of space embeddings that, with the use of deep learning architectures, achieved human-level performances in many NLP tasks. With the introduction of the Attention mechanism in 2017 the results were further improved and, as result, embeddings are quickly becoming the de facto standards in solving many NLP problems. In this presentation, you will learn how generate and use space embedding for search purposes and provide comparison metrics to more traditional relevance-based search engines. Moreover, I will provide some initial results on a paper currently under review that provides an insight on hyperparameter tuning during the generation of embeddings.
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
10 years in the making. How real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web-crawling and delivery of targeted intelligence that helps R&D; intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches for concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVM’s, Topic Modeling, etc..). Sensing the competition, Natural Language Processing responded with deployment of new emerging techniques (sematic networks, finite state automata, etc..). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement to make Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
This document discusses using natural language processing on trademark text data to gain insights. It presents research on how trademark activity changed during COVID-19, detecting emerging trends in trademarks over time, and classifying trademarks by industry. The research uses techniques like topic modeling and deep learning classifiers to analyze trademarks and identify patterns. The analysis of trademarks can provide economic indicators and reveal where businesses are focusing their innovation and market presence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
In our customer projects involving automated document processing, we often encounter document types providing crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables as they do not capture the non-sequential relations inside them (e.g. interpret the content of a table cell relative to its column title, interpret line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products in the market for a few highly specific types of tabular documents, there is no general approach out there. The main cause for this is the fact that table structures can be encoded by a heterogenous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
Most scientific journals request, that the complete set of research data is published simultaneously with the peer-reviewed paper. The publication of the research data usually is carried out as so-called "Supplementary Material", attached to the original paper, or on a "Research Data Repository". Both forms have in common, that the data is published usually unstructured and not in an uniform machine processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented, in which the data is digitally recorded, following the principle of FAIR data, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable for preprints, peer-review papers, diploma and doctoral theses and is particularly suitable for open access publications. Moreover, the presentation highlights correspondent activities, which were released in scientific publications recently.
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
This document discusses using AI tools to improve patent search and analysis. It provides metrics on how well an AI system called IPscreener can retrieve patent citations compared to examiners. The metrics show recall rates increase with longer input text and when users provide additional context. Machine translation negatively impacts performance, but the AI can help users navigate patents by selecting relevant text segments. The goal is for AI to boost innovation by improving how users search for and understand prior art.
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video either physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey considerably more information that is very difficult to communicate via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and the solutions that were implemented by a large, multinational chemical company. A suite of content discovery technologies was used including a video to text to tagging system connected to their documents database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To build the system and support the manuscript and video submission there is a metadata extraction program which pulls and inserts the metadata into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides over 160 countries coverage for patents, over 200 authorities for trademarks and over 90 authorities for designs. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in the research of patent, technical and business intelligence, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
The everyday use of AI-driven algorithms for data search, analysis and synthesis comes with important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are inevitable to assess and manage the challenges of neuronal network based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
Meet up Milano 14 _ Axpo Italia_ Migration from Mule3 (On-prem) to.pdfFlorence Consulting
Quattordicesimo Meetup di Milano, tenutosi a Milano il 23 Maggio 2024 dalle ore 17:00 alle ore 18:30 in presenza e da remoto.
Abbiamo parlato di come Axpo Italia S.p.A. ha ridotto il technical debt migrando le proprie APIs da Mule 3.9 a Mule 4.4 passando anche da on-premises a CloudHub 1.0.
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
Discover the benefits of outsourcing SEO to Indiadavidjhones387
"Discover the benefits of outsourcing SEO to India! From cost-effective services and expert professionals to round-the-clock work advantages, learn how your business can achieve digital success with Indian SEO solutions.
Gen Z and the marketplaces - let's translate their needsLaura Szabó
The product workshop focused on exploring the requirements of Generation Z in relation to marketplace dynamics. We delved into their specific needs, examined the specifics in their shopping preferences, and analyzed their preferred methods for accessing information and making purchases within a marketplace. Through the study of real-life cases , we tried to gain valuable insights into enhancing the marketplace experience for Generation Z.
The workshop was held on the DMA Conference in Vienna June 2024.
Instagram has become one of the most popular social media platforms, allowing people to share photos, videos, and stories with their followers. Sometimes, though, you might want to view someone's story without them knowing.
Ready to Unlock the Power of Blockchain!Toptal Tech
Imagine a world where data flows freely, yet remains secure. A world where trust is built into the fabric of every transaction. This is the promise of blockchain, a revolutionary technology poised to reshape our digital landscape.
Toptal Tech is at the forefront of this innovation, connecting you with the brightest minds in blockchain development. Together, we can unlock the potential of this transformative technology, building a future of transparency, security, and endless possibilities.
AI-SDV 2020: Using chemical ontologies to create molecular prediction systems for any molecular property Claudia Bobach (OntoChem, Germany)
1. Claudia Bobach, K. Kruse, S. Boyer, T. Böhme, L. Weber
Using chemical ontologies to create molecular prediction systems for any molecular property
Nice, 06.10.2020 11:45
3. Who we are ...
OntoChem was started in 2005 in Halle / Saale (Germany).
We are approx. 15 people with diverse scientific and technical backgrounds: chemists, biologists, computational linguists, chemo- and bioinformaticians, and computer scientists.
4. How do we support our customers?
1 Data Analytics & Predictions: indexing your enterprise data; combining internal & external sources, documents, and DBs; linked data; document classification; alerts; compound reports.
2 Smart Data: compound properties from text, images & tables; mixture compositions; relations (disease X is caused by protein Y which is inhibited by Z), etc.
3 Semantic Software Solutions: standalone customizable annotation pipelines with or without ontologies and source documents (installable or as SaaS).
4 Ontologies: we understand pharma, chemistry, life and material sciences, and more …!
5 Data Streams: normalized patents, scientific articles (open source and behind paywalls), clinical trials, news & more.
5. Abstract
We have created structure-based chemical ontologies that are used to classify chemical compounds automatically. These classifications can be used with success in semantic search engines to find all representatives of a chemical class. In the present paper we demonstrate use cases that utilize these chemical classes as features in typical machine learning approaches.
Thus, we have used the co-occurrence of chemical compounds with biological properties in scientific articles and data sets to train models that predict properties of novel compounds that did not occur in those training sets. One example is the prediction of toxicity as well as antimicrobial activity. In principle, one can use any property that is found in the textual vicinity of compounds to build such predictive models.
6. Introduction
Goal
● Prediction of any molecular property, e.g. toxicity, by using ontologies and text mining
● Application example: likelihood of drug approval based on toxicity
Inputs to predict any property?
● ACS 2018: compound classifications (“Boyer molecular labels”) - chemistry compound classes (OntoChem & ClassyFire) and ChEMBL bioactivities
● Now: millions of full text scientific documents; co-occurrences between compound and property (e.g. toxicities); scoring of ontology classes; prediction of compound properties
7. The Workflow
[Workflow diagram: co-occurrences extracted from documents [1] are combined with the chemical ontology and the property ontology [2]; each chemical class receives a score for the property (e.g. secondary amines 1.2052, benzenes 1.1388, propoxybenzenes 0.8142, ...); molecular assignment/labeling then yields a compound property score (likeliness to have the property).]
8. Chemical Ontology Classes & Hierarchy
“An ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts”
(https://en.wikipedia.org/wiki/Ontology_(information_science)).
[Hierarchy diagram (cf. http://www.cse.buffalo.edu/~rapaport/676/F01/icecreamontology.jpg): the class of all amine compounds contains the class of all secondary amine compounds.]
9. Property Ontology, e.g. cytotoxicity as a subclass of toxicity
[Hierarchy diagram: the class toxicity is the parent of the class cytotoxicity.]
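The parent/child relation above can be sketched as a simple parent-link table with an ancestor lookup. This is a toy model, not OntoChem's implementation; the "hepatotoxicity" entry is a hypothetical extra subclass added for illustration:

```python
# Toy property ontology as child -> parent links; None marks the root.
PARENT = {
    "cytotoxicity": "toxicity",    # cytotoxicity is a subclass of toxicity
    "hepatotoxicity": "toxicity",  # hypothetical additional subclass
    "toxicity": None,              # root of this small property ontology
}

def ancestors(concept: str) -> list[str]:
    """Return all ancestors of a concept, nearest parent first."""
    chain = []
    parent = PARENT.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain

print(ancestors("cytotoxicity"))  # ['toxicity']
```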
10. Ontologies
● Ontology to represent chemical features:
  ● chemistry ontology including chemical classes & chemical compounds
● Ontologies to represent properties:
  ● toxicology ontology: toxic properties
  ● effects & processes:
    ● pharmacologic properties
    ● antimicrobial, antibacterial, antiviral
    ● anti-infective, anti-depressant, anti-Alzheimer
  ● protein interactions:
    ● protein binding
    ● enzyme inhibition
13. Co-Occurrence
The occurrence of two named entities (words) in a defined range of text (usually a sentence). [3]
For example PMC3663206 [4]:
“The PM genotype has furthermore been associated with an increased risk of toxicity after metoprolol administration [22, 58] …”
=> gives one co-occurrence between metoprolol and toxicity.
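Sentence-level co-occurrence counting as described above can be sketched as follows; the tiny term sets and the naive sentence splitter are illustrative stand-ins for the real annotation pipeline and full ontologies:

```python
import re
from collections import Counter

# Illustrative term dictionaries; the real system matches full ontologies.
COMPOUNDS = {"metoprolol"}
PROPERTIES = {"toxicity"}

def cooccurrences(text: str) -> Counter:
    """Count sentence-level co-occurrences of compound and property terms."""
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for compound in words & COMPOUNDS:
            for prop in words & PROPERTIES:
                counts[(compound, prop)] += 1
    return counts

example = ("The PM genotype has furthermore been associated with an "
           "increased risk of toxicity after metoprolol administration.")
print(cooccurrences(example))  # Counter({('metoprolol', 'toxicity'): 1})
```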
14. Compound - Toxicity Co-Occurrences in Documents [1][2]
● 2,817,902 PMC documents
● 1,333,362 unique compounds
● 6,385 compound classes
● 83,511,588 compound annotations
● 12,842,016 compound class annotations
● 9,784,556 toxicity annotations
● 816 toxicity concepts
● 3,311 unique compound classes co-occur with a toxicity concept
● 1,801,709 co-occurrences between compounds/classes and toxicity
15. Chemical Ontology
● 6,500 chemical classes
● Millions of chemical compounds are assigned to their structural chemical classes by SODIAC.
● Each compound is characterized regarding its chemical features by the corresponding chemical classes.
16. Assignment of Compounds to Ontology Classes
molecular assignment/labeling
● given a molecule's structure (e.g. SMILES)
● given a chemical class ontology
● assign molecule substructures to chemical classes
● SODIAC (Structure based Ontology Development and Individual Assignment Center)
[Diagram: an example molecule with substructures assigned to the classes secondary amines, benzenes, propoxybenzenes, and alkylarylether.]
17. Compound Class - Property Co-occurrence Score
Aggregated molecular information:
1. Establish co-occurrence between a chemical compound and a property (e.g. toxicity).
2. Assign ontological information: substructure assignments = chemical compound classes.
3. Generalize: co-occurrences are generalized (inherited) to all ancestors of the respective chemical class.
Hypothesis
The more often a chemical class co-occurs with a property, the more likely this class is related to this property.
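The generalization step (3) can be sketched as crediting each compound-property co-occurrence to the compound's classes and all of their ancestors, counting each class only once per compound. The tiny hierarchy and the co-occurrence count below are hypothetical:

```python
from collections import Counter

# Hypothetical chemical class hierarchy: child -> parent.
PARENT = {
    "propoxybenzenes": "benzenes",
    "benzenes": "compounds",
    "secondary amines": "amines",
    "amines": "compounds",
}

# Classes assigned to a compound by structure-based labeling (hypothetical).
CLASSES = {"metoprolol": ["secondary amines", "benzenes", "propoxybenzenes"]}

def generalize(compound: str, n_cooc: int) -> Counter:
    """Credit a compound's co-occurrence count to its classes and all ancestors."""
    credited = set()
    for cls in CLASSES[compound]:
        while cls is not None:
            credited.add(cls)  # each class is counted once per compound
            cls = PARENT.get(cls)
    return Counter({cls: n_cooc for cls in credited})

counts = generalize("metoprolol", 3)
print(counts["benzenes"], counts["compounds"])  # 3 3
```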
18. Cooc-Property-Score - Definition
Let:
cooc_ij := the number of co-occurrences between class i and property j, plus the number of unique co-occurrences of any child of class i with property j.
Example: cooc_ij = [all secondary amines]_i - [toxicity]_j co-occurrences = 115,245
N_i := the total number of occurrences of class i, plus the total number of unique occurrences of any child of class i.
Example: N_i = [all secondary amines]_i total occurrences = 4,581,235
Then the property score pscr_ij of a class i for property j is:
pscr_ij = cooc_ij / N_i * 100
Example: pscr_ij = 115,245 / 4,581,235 * 100 = 2.5155
This defines a score for each of the ~6,000 compound classes.
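A minimal sketch of the score definition, using the slide's numbers (the fraction rounds to 2.5156; the slide reports the truncated 2.5155):

```python
def property_score(cooc_ij: int, n_i: int) -> float:
    """pscr_ij: co-occurrences of class i with property j per 100 occurrences of class i."""
    return cooc_ij / n_i * 100

# Slide example: class "all secondary amines" vs. property "toxicity".
score = property_score(115_245, 4_581_235)
print(round(score, 4))  # 2.5156
```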
19. Cooc-Property-Score - Normalization
To get a normalized cooc-property score, where a score > 1 corresponds to classes that enforce the property and a score < 1 weakens the property, we normalize all scores by the root compound class “compounds”, because the score of this class describes the average property likeliness of all compounds.
Let c := the root “compounds” class. Then the normalized score npscr_ij is:
npscr_ij = pscr_ij / pscr_cj
Example: npscr_ij = 2.5155 / 2.1017 = 1.9968
20. For a given compound k, the compound property score for a property j is simply the average of the compound class property scores, normalized to the root compound class, over all assigned classes l ∈ A_k:
score_kj = mean over l ∈ A_k of npscr_lj
Aggregated Compound - Property Score
Example: Metoprolol

Chemical classes of k (Metoprolol)   npscr_lj (Tox Score for j)
secondary amines                     1.9968
benzenes                             1.1388
propoxybenzenes                      0.8142
...                                  ...
score_kj = Mean                      1.1751
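The aggregation can be reproduced with the three class scores shown above; note that the slide's final score of 1.1751 averages over all of Metoprolol's assigned classes, not only these three, so the subset mean differs:

```python
# Sketch of the aggregated compound-property score: the mean of the
# normalized class scores npscr_lj over the classes assigned to the
# compound. Only the three classes shown on the slide are used here.

npscr = {"secondary amines": 1.9968,
         "benzenes": 1.1388,
         "propoxybenzenes": 0.8142}

def compound_score(class_scores):
    """Mean npscr over all assigned classes."""
    return sum(class_scores.values()) / len(class_scores)

print(round(compound_score(npscr), 4))  # mean of the three shown classes
```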
21. [Figure: co-occurrence network around the source compound Metoprolol. Chemical compounds, some with classes (fragments) in common, are linked via the N chemical classes i shared with the source compound to the desired property j; cooc_ij = number of co-occurrences between class i and property j. The resulting property (toxicity) score for the compound Metoprolol is score_kj = Mean = 1.1751.]
Legend:
● chemical class (fragment) present in the source molecule
● chemical class (fragment) not in the source molecule
● chemical compound that has a chemical class (fragment) in common with the source molecule but no association with the desired property
● chemical compound that has a chemical class (fragment) in common with the source molecule and an association with the desired property
● the desired property, e.g. toxicity
● other properties, e.g. adhesion, LD50, etc.
● line thickness indicates strong or weak association
Scope of the data: all 6,385 chemical classes (fragments), all properties (in this case 816 toxicity classes), and all 1,333,362 compounds.
28. ● Similar to the ToxScore calculation.
● Used the "anti-microbial" concept from the "effect or process" ontology as the property.
● Counted co-occurrences between "anti-microbial" child concepts and all chemical classes and their respective compounds.
● Five example compounds and their antimicrobial scores:
Antimicrobial Property

Compound                        Antimicrobial Score
Halicin                         1.44
Trimethoprim                    2.34
(+)-(4R)-Limonene               1.84
3-ethenyl-3,4-dihydrodithiine   0.55
2-Vinyl-4H-1,3-dithiine         1.28
30. To evaluate the toxicity score, we used an FDA drug approval and clinical toxicity dataset (ClinTox):
Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, et al. (2017) MoleculeNet: a benchmark for molecular machine learning. arXiv preprint arXiv:1703.00564. http://moleculenet.ai/
1,478 molecules/SMILES were assigned to classes.
For each compound, two outcomes were given:
1. Clinical trial toxicity (CT_TOX): 112 toxic, 1,366 not toxic
2. FDA approval (FDA_APPROVED): 1,384 approved, 94 not approved
ToxScore Evaluation - FDA ClinTox and FDA Drug Approval
31. Using our toxicity score (ToxScore), we tried to predict the ClinTox CT_TOX outcome.
[Figures: ROC curve for the ToxScore vs. the CT_TOX outcome; ToxScore distributions of not-toxic vs. toxic ClinTox compounds.]
ToxScore Evaluation - FDA Clinical Trial Tox Prediction
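The ROC evaluation used here can be computed without any plotting library. A minimal AUC via the Mann-Whitney formulation, with made-up scores and labels standing in for ToxScores and CT_TOX outcomes:

```python
# Minimal ROC-AUC: the AUC equals the probability that a randomly chosen
# positive example scores higher than a randomly chosen negative one
# (ties count half). Scores and labels below are illustrative only.

def roc_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [2.1, 1.6, 1.2, 0.9, 0.7]  # e.g. ToxScores
labels = [1,   1,   0,   1,   0]    # e.g. CT_TOX outcomes
print(roc_auc(scores, labels))      # 5 of 6 pos/neg pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking, 1.0 to perfect separation, matching the enrichment scale quoted for the Tox21 results below.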
32. Using our toxicity score (ToxScore), we tried to predict the ClinTox FDA approval outcome.
[Figures: ROC curve for the ToxScore vs. the FDA approval outcome; ToxScore distributions of not-approved vs. approved ClinTox compounds.]
ToxScore Evaluation - FDA Approval Prediction
33. Aim: evaluate whether chemical class assignment provides suitable features for property prediction.
The Tox21 Challenge [5] provided more than 10k compounds with outcomes for 12 different pathway assays, where 1 means the compound affects the pathway and 0 means it does not.
We assigned these >10k compounds to chemical classes and used the assigned classes as features to train a Support Vector Machine (SVM) on the training data to predict the outcomes.
The final classifiers were used to predict the outcomes of the remaining test set compounds.
Results: see next slide.
Chemical Feature Evaluation - Tox21 Challenge
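The feature construction described above can be sketched as follows; the class names and assignments are invented, and the resulting binary matrix would be passed, together with the assay outcomes, to an SVM classifier (e.g. scikit-learn's SVC) per assay:

```python
# Sketch of the Tox21 feature construction: each compound becomes a
# binary vector over the ontology's chemical classes (1 = class
# assigned, 0 = not). Compound ids and assignments are made up.

ALL_CLASSES = ["secondary amines", "benzenes", "propoxybenzenes", "alkylarylether"]

assignments = {
    "cpd_1": {"secondary amines", "benzenes"},
    "cpd_2": {"benzenes", "alkylarylether"},
}

def feature_vector(assigned, all_classes=ALL_CLASSES):
    """One-hot membership vector over the full class list."""
    return [1 if c in assigned else 0 for c in all_classes]

X = [feature_vector(assignments[k]) for k in sorted(assignments)]
print(X)  # one row per compound, one column per chemical class
```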
34. Tox21 test set results:
AUC = area under the ROC curve (0.5 = no enrichment; 1.0 = 100% enrichment)
NR.AR: androgen receptor (NR = nuclear receptor effect)
SR.MMP: mitochondrial membrane potential (SR = stress response effect)
Chemical Feature Evaluation - Tox21 Data Set
35. Tox21 test set results:
Different prediction performance for the three example assays:
● NR.AR
● NR.AR.LBD
● NR.AhR
Chemical Feature Evaluation - Tox21 Data Set
36. ● Structural chemistry ontology classes can be used as a feature space for machine learning.
● Data from Tox21 reveals the most relevant chemical ontology classes.
● Scientific publications annotated with toxicity terms and small-molecule compounds can be used to automatically generate a toxicity prediction system based solely on chemical structure, even for unpublished compounds.
● Besides toxicity, any other annotated compound property can be used to predict that same property, e.g. as shown for antimicrobial activity.
Summary
37. [1] http://madan.org.il/en/news/languages-still-major-barrier-global-science
[2] https://idslabblog.wordpress.com/research-topic-ontology/
[3] https://commons.wikimedia.org/wiki/File:Hazard_T.svg
[4] Samer CF, Lorenzini KI, Rollason V, Daali Y, Desmeules JA (2013) Applications of CYP450 testing in the clinical setting. Mol Diagn Ther 17(3):165-184. doi: 10.1007/s40291-013-0028-5. PMID: 23588782; PMCID: PMC3663206.
[5] Andersen ME, Krewski D (2009) Toxicity testing in the 21st century: bringing the vision to life. Toxicol Sci 107:324-330. doi: 10.1093/toxsci/kfn255
[6] Bobach C, Böhme T, Laube U, Püschel A, Weber L (2012) Automated compound classification using a chemical ontology. J Cheminformatics 4:40
References