Joint contrastive learning with infinite possibilities - taeseon ryu
Contrastive learning is a machine learning technique that learns features without any labels, based on whether two images are similar or dissimilar. It differs somewhat from conventional supervised learning: supervised learning incurs labeling cost, and because it is task-specific its generalizability can suffer. Contrastive learning proceeds without labels, so there is no labeling cost and generalizability can be better. This paper proposes Joint Contrastive Learning, aimed at making contrastive learning more useful. https://youtu.be/0NLq-ikBP1I
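For orientation, the label-free objective described above is typically implemented as an InfoNCE-style loss over two augmented views of each image; the PyTorch sketch below is a generic illustration of that idea under standard assumptions, not the specific Joint Contrastive Learning formulation from the paper.

```python
# Generic InfoNCE-style contrastive loss: pulls two augmented views of the
# same image together and pushes apart all other images in the batch.
# A minimal sketch of contrastive learning in general, not the JCL method.
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same image batch."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)
```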
This document provides an overview of linear regression analysis. It defines key terms like dependent and independent variables. It describes simple linear regression, which involves predicting a dependent variable based on a single independent variable. It covers techniques for linear regression including least squares estimation to calculate the slope and intercept of the regression line, the coefficient of determination (R²) to evaluate the model fit, and assumptions like independence and homoscedasticity of residuals. Hypothesis testing methods for the slope and correlation coefficient using the t-test and F-test are also summarized.
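Since the summary names the specific quantities involved, a minimal NumPy sketch of the least-squares slope, intercept, and R² computation may help; this is textbook arithmetic, not code from the document.

```python
import numpy as np

def simple_linear_regression(x, y):
    """Least-squares fit y ~ b0 + b1*x, plus the coefficient of determination R²."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2
```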
This document provides an overview of computer-aided drug design (CADD). It discusses how CADD uses computer modeling techniques to accelerate the drug design and development process. The key aspects covered include the modern drug design lifecycle, types of drug design approaches like ligand-based and structure-based methods, and how techniques like docking and scoring are used. The document also notes the significance of CADD in filtering large compound libraries, optimizing lead compounds, and designing novel drug candidates in a more efficient manner compared to traditional drug discovery approaches.
Physicochemical properties play an important role in determining a drug molecule's biological activity, absorption, distribution, and elimination. Key properties include lipophilicity, measured by logP, which influences solubility; acidity and basicity, determined by pKa, which influences ionization; and steric factors like size and shape, which influence receptor binding. Understanding these physicochemical properties allows for rational drug design and prediction of a molecule's behavior in the body.
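As a worked illustration of how pKa and pH determine ionization, the Henderson-Hasselbalch relation can be expressed in a few lines; the aspirin-like example value below is an assumption for illustration, not taken from the document.

```python
def fraction_ionized(pKa, pH, acid=True):
    """Henderson-Hasselbalch: fraction of molecules ionized at a given pH.
    Acid: ionized fraction = 1 / (1 + 10**(pKa - pH));
    base: ionized fraction = 1 / (1 + 10**(pH - pKa))."""
    exponent = (pKa - pH) if acid else (pH - pKa)
    return 1.0 / (1.0 + 10.0 ** exponent)

# e.g. an aspirin-like acid (pKa ~ 3.5) in plasma (pH 7.4) is almost fully ionized:
print(round(fraction_ionized(3.5, 7.4), 4))  # ~ 0.9999
```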
This document provides an agenda for a workshop on structural equation modeling using SmartPLS 2.0. The morning session will cover basic concepts, components of structural and measurement models, and the PLS algorithm. After lunch, the analysis method for reflective models will be discussed. The afternoon will conclude with examining the analysis method for formative models. Hugo Watanuki is the PhD student leading the workshop at Napier University.
Introduction to principal component analysis (PCA) - Mohammed Musah
This document provides an introduction to principal component analysis (PCA), outlining its purpose for data reduction and structural detection. It defines PCA as a linear combination of weighted observed variables. The procedure section discusses assumptions like normality, homoscedasticity, and linearity that are evaluated prior to PCA. Requirements for performing PCA include the variables being at the metric or nominal level, sufficient sample size and variable ratios, and adequate correlations between variables.
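A minimal NumPy sketch of the core PCA computation (components as eigenvectors of the covariance matrix of centered data); this is a generic illustration, not the procedure from the slides.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix of centered data."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # sort descending by explained variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, eigvals[order]   # scores and explained variances
```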
Molecular descriptors are numerical values that characterize molecular properties and structures. They can represent physicochemical properties or values derived from algorithmic techniques applied to molecular structures. Descriptors vary in complexity and computational requirements. Some are based on experimental data while others are algorithmic constructs. Two-dimensional (2D) descriptors are calculated from 2D structures and include counts, physicochemical properties, and topological indices. Three-dimensional (3D) descriptors encode spatial relationships and include fragment screens and pharmacophore keys.
- The document proposes a multi-view stacking ensemble method for drug-target interaction (DTI) prediction that combines predictions from multiple machine learning models trained on different drug and target feature view combinations.
- It generates 126 view combination datasets from 14 drug views and 9 target views, then trains extra trees, random forest, and XGBoost classifiers on each view combination. Predictions from these base models are then combined using a stacking ensemble with an extra trees meta-learner (a minimal sketch of this arrangement follows the list below).
- The method is shown to outperform single models and voting ensembles, and calibration of the meta-learner and use of local imbalance measures provide further improvements to predictive performance on DTI prediction tasks.
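A minimal scikit-learn sketch of that stacking arrangement, assuming the xgboost package is installed; the single feature matrix and hyperparameters here are placeholders, since the paper trains one base model per drug/target view combination.

```python
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from xgboost import XGBClassifier  # assumes the xgboost package is available

# In the paper, one base model is trained per view combination; here a single
# feature matrix stands in for all drug/target views.
base_models = [
    ("extra_trees", ExtraTreesClassifier(n_estimators=200)),
    ("random_forest", RandomForestClassifier(n_estimators=200)),
    ("xgboost", XGBClassifier(n_estimators=200, eval_metric="logloss")),
]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=ExtraTreesClassifier(n_estimators=200),
                           stack_method="predict_proba")
# Usage: stack.fit(X_train, y_train); stack.predict_proba(X_test)
```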
Bioassay is defined as measuring the biological response of living tissues to determine the potency or concentration of an active principle in a preparation. There are various types of bioassays including quantal assays, graded assays, and multiple point assays. Bioassays can be performed on intact animals, isolated tissues, specific cells, or organisms and are useful for standardizing drugs obtained from natural sources and for measuring the activity of new or undefined substances. While powerful, bioassays can be time-consuming and expensive compared to physico-chemical methods.
The document discusses various topics related to drug discovery through bioinformatics and computational approaches. It begins by discussing comparative genomics and using knowledge about model organisms to identify similar biological areas and pathways in other species. It also discusses topics like high-throughput screening of large libraries, the definitions of targets, hits and leads in drug discovery, and approaches like using RNAi and phenotypic screening in model organisms. Finally, it discusses computational methods that can be used throughout the drug discovery process, including for target identification and validation, virtual screening, assessing drug-likeness of compounds, and describing compounds using structural and physicochemical descriptors.
Network analysis of cancer metabolism: A novel route to precision medicine - Varshit Dusad
This document discusses using network analysis and mass flow graphs to analyze cancer cell metabolism. It assesses different published genome-scale metabolic models of cancer and determines that PRIME models are best suited for applying mass flow graph analysis. Constraint-based analysis is performed on PRIME models to simulate metabolic conditions and genetic perturbations. Centrality analysis using PageRank reveals changes in network structure under different conditions but does not fully support the centrality-lethality hypothesis regarding essential reactions. Future work is needed to better integrate omics data and identify centrality measures that correlate with biological importance.
This document discusses various topics related to drug discovery through bioinformatics. It begins by describing how genome-wide RNAi screening in the nematode C. elegans can be used to identify genes involved in biological pathways related to diseases like type-2 diabetes. It then discusses topics like structural genomics, target identification and validation, high-throughput screening approaches and facilities, sources for screening libraries, criteria for hit and lead compounds, and computational methods used in hit identification and optimization like pharmacophore modeling and evaluating compounds against the "rule of five". Descriptors that can be used for characterizing compounds are also listed.
Sorting bioactive wheat from database chaff: Challenges of discerning correct... - Guide to PHARMACOLOGY
Since 2009 the Guide to PHARMACOLOGY database (GtoPdb) team has curated 7586 ligands from papers, including approved drugs, clinical candidates, research compounds, peptides and clinical antibodies (PMID 24234439). As PubChem pushes towards 70 million compound identifiers (CIDs), we have noticed the problem of “multiplexing” during the curation of 5713 small molecules as CIDs: we encountered many representations (i.e. different CIDs) of the same pharmacological entities. Three types of variation dominate: stereochemistry, mixtures and isotopic analogues. These are known constitutive issues for chemical databases, but in recent years we have observed this multiplexing reaching problematic proportions (i.e. more chaff), especially for clinically used drugs (i.e. proportionally less wheat).
Sorting bioactive wheat from database chaff - Chris Southan
Abstract
Databases of bioactive compounds are crucial for pharmacology, drug discovery and chemical genomics as public sources approach ~ 100 million records. However, in recent years this famine-to-feast presents difficulties for searching chemical structures and linked activity data, particularly for those unfamiliar with the constitutive challenges of molecular representation in silico (PMID 25415348). A key problem is entries of structural variants of “the same thing” as pharmacological entities (i.e. representational multiplexing). For example, a 2009 comparison of three database subsets of ~1200 approved drugs recorded only 807 structures in common (PMID 20298516). In addition, published counts of approved drugs vary widely. These issues have been continually encountered by the Guide to PHARMACOLOGY database (GtoPdb) team that, since 2009, has achieved the curation of ~5500 small molecules (including approved drugs) from papers. Concomitantly, we have noticed an increase in multiplexing as PubChem pushes towards 65 million compound identifiers (CIDs). Since one of our key objectives is to affinity-map ligands to their targets, we decided to assess this multiplexing problem in order to optimise our curation rules. The results have implications for the entire bioactivity information space. We began by compiling CID sets for seven different sources within PubChem encompassing approved drugs. Initially a 7x7 pairwise comparison matrix indicated low overlap between these sources. A Venn diagram was then made from the approved drug CIDs mapped by DrugBank, Therapeutic Target Database and ChEMBL. At 749, the three-way intersect was less than 35% of the union of all CIDs covered by the sets. Strikingly, this looks worse than the 2009 study (although the sources and comparison methods were different). We will present further analyses that go some way towards explaining these results. One of these is determining “same connectivity” statistics inside PubChem as a measure of multiplexing. For DrugBank, each approved drug was related to, on average, 19 different CIDs as structural variants. Analysis of multiplexing confirmed trends we had observed during individual drug curation. This included ~ 30% stereoisomer enumerations but, surprisingly, ~70% isotopic derivatives, dominated by patent-derived virtual deuteration. We also established the ratio of submissions (SIDs) to CIDs was 78. The paradox was that, despite this high “majority vote” support for approved drug CIDs curated by DrugBank, only 55% were in the 3-way consensus (figures for the other two curated sources were similar). Analysing by year in PubChem indicated how the recent expansion of vendor and patent-extraction structures contributes to both multiplexing and the SID:CID ratio. While approved drugs are strongly impacted, associated problems, such as split activity data and deciding the “correct” structures, affect essentially all public drug discovery chemistry.
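One way to reproduce a "same connectivity" grouping like the one described is to bucket structures by the first (skeleton) block of their InChIKeys; a hedged RDKit sketch, assuming RDKit is installed, and not GtoPdb's actual curation tooling.

```python
from collections import defaultdict
from rdkit import Chem

def group_by_connectivity(smiles_list):
    """Group structural variants (stereoisomers, isotopologues, etc.) that share
    the same skeleton, using the first (connectivity) block of the InChIKey."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable structures
        skeleton = Chem.MolToInchiKey(mol).split("-")[0]
        groups[skeleton].append(smi)
    return groups

# e.g. L- and D-alanine share a connectivity block and land in one group:
print(group_by_connectivity(["C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"]))
```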
Humans are potentially exposed to tens of thousands of man-made chemicals in the environment. It is well known that some environmental chemicals mimic natural hormones and thus have the potential to be endocrine disruptors. Most of these environmental chemicals have never been tested for their ability to disrupt the endocrine system, in particular, their ability to interact with the estrogen receptor. EPA needs tools to prioritize thousands of chemicals, for instance in the Endocrine Disruptor Screening Program (EDSP). This project was intended to be a demonstration of the use of predictive computational models on HTS data including ToxCast and Tox21 assays to prioritize a large chemical universe of 32464 unique structures for one specific molecular target – the estrogen receptor. CERAPP combined multiple computational models for prediction of estrogen receptor activity, and used the predicted results to build a unique consensus model. Models were developed in collaboration between 17 groups in the U.S. and Europe and applied to predict the common set of chemicals. Structure-based techniques such as docking and several QSAR modeling approaches were employed, mostly using a common training set of 1677 compounds provided by U.S. EPA, to build a total of 42 classification models and 8 regression models for binding, agonist and antagonist activity. All predictions were evaluated on ToxCast data and on an external validation set collected from the literature. In order to overcome the limitations of single models, a consensus was built weighting models based on their prediction accuracy scores (including sensitivity and specificity against training and external sets). Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. The final consensus predicted 4001 chemicals as actives to be considered as high priority for further testing and 6742 as suspicious chemicals. This abstract does not necessarily reflect U.S. EPA policy.
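The consensus idea, weighting each model's vote by its accuracy score, reduces to a weighted average; the sketch below is a generic illustration with placeholder threshold and inputs, not CERAPP's published weighting scheme.

```python
import numpy as np

def weighted_consensus(predictions, scores, threshold=0.5):
    """predictions: (n_models, n_chemicals) array of 0/1 active calls;
    scores: per-model accuracy weights. Returns consensus active calls."""
    w = np.asarray(scores, float)
    consensus = (np.asarray(predictions, float).T @ w) / w.sum()
    return consensus >= threshold
```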
The document discusses various topics related to drug discovery including target identification and validation, high-throughput screening, hit and lead identification, computational approaches like docking and de novo design, and clinical trial phases. It provides definitions for key terms like target, screening, hit, and lead. It also discusses sources for screening libraries and describes factors to consider for an optimal drug target.
The document outlines the schedule and content for a bioinformatics course. It includes 10 lessons covering topics like biological databases, sequence alignments, database searching, phylogenetics, and protein structure. It also mentions that the final exam will include randomly generated images from a set of 713 images.
USFDA guidelines for bioanalytical method validation - bhatiaji123
The document discusses guidelines for bioanalytical method validation from the USFDA. It describes key parameters that must be validated for a bioanalytical method, including selectivity, accuracy, precision, recovery, calibration curves, sensitivity, reproducibility and stability. Accuracy and precision are determined by analyzing quality control samples in replicates across multiple runs. Recovery experiments compare extracted samples to unextracted standards. A calibration curve consisting of multiple concentrations over the expected range must be precise and reproducible.
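As an arithmetic illustration of those accuracy and precision criteria, here is a sketch computing % bias and % CV from replicate QC measurements; the numbers are made up, and acceptance limits such as ±15% are commonly cited readings of the guidance rather than text reproduced from it.

```python
import numpy as np

def qc_accuracy_precision(measured, nominal):
    """Accuracy as % bias from nominal, precision as % coefficient of variation,
    for replicate QC sample measurements at one concentration level."""
    m = np.asarray(measured, float)
    accuracy_bias = 100.0 * (m.mean() - nominal) / nominal
    precision_cv = 100.0 * m.std(ddof=1) / m.mean()
    return accuracy_bias, precision_cv

print(qc_accuracy_precision([9.6, 10.2, 9.9, 10.4, 9.8], nominal=10.0))
```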
Scoring and ranking of metabolic trees to computationally prioritize chemical... - Kamel Mansouri
The aim of this work was to design an in silico and in vitro approach to prioritize compounds and perform a quantitative safety assessment. To this end, we pursue a tiered approach taking into account bioactivity and bioavailability of chemicals and their metabolites using a human uterine epithelial cell (Ishikawa)-based assay. This biologically relevant fit-for-purpose assay was designed to quantitatively recapitulate in vivo human response and establish a margin of safety.
This document discusses various topics related to drug discovery through bioinformatics and computational approaches. It covers target identification and validation, high-throughput screening, developing hits into leads, evaluating drug-likeness of compounds using rules like Lipinski's Rule of Five, and using computational descriptors for virtual screening. The goal is to discuss how computational tools can help streamline the drug discovery process by aiding in target selection and validation, compound screening and optimization of leads.
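A short RDKit sketch of the Lipinski Rule-of-Five check mentioned above, assuming RDKit is available and using the conventional thresholds (MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, acceptors ≤ 10); illustrative, not code from the document.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_rule_of_five(smiles, max_violations=1):
    """Classic Lipinski filter: allow at most one violation of the four rules."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    violations = sum([
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Lipinski.NumHDonors(mol) > 5,
        Lipinski.NumHAcceptors(mol) > 10,
    ])
    return violations <= max_violations

print(passes_rule_of_five("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> True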
The document summarizes Andy Pope's presentation on lead discovery and hit identification approaches at GSK. It discusses current hit identification methods like high-throughput screening (HTS) and encoded library technologies. HTS involves screening large libraries of up to 2 million compounds but has limitations due to costs and inability to screen under different conditions. Encoded library technologies address these limitations by allowing synthesis and screening of much larger virtual libraries of over 1 billion compounds under various conditions using DNA-encoded small molecules. The document emphasizes the importance of compound quality and hit qualification methods to identify true hits from screening assays.
Bringing bioassay protocols to the world of informatics, using semantic annot... - Alex Clark
This document discusses bringing bioassay protocols into the world of informatics by using semantic annotations. It describes how measurements from bioassays contain many details that are usually only available as text, and outlines an approach using ontologies, natural language processing, and machine learning to extract this information and make it accessible for searching, comparing datasets, and identifying trends. The goal is to make all bioassay protocol data machine readable by developing common templates and annotation standards that can be applied to existing and new assay data sources.
The document describes building an in-silico phenotypic screening approach using Reaxys. Phenotypic screening identifies compounds that produce a biological response in cells and has proven successful in drug discovery. The approach develops a target fingerprint for disease-specific cell lines to identify pharmacological targets and molecular mechanisms of action. As a scenario, the target fingerprint is generated for melanoma A375 cells based on 996 compounds with IC50 < 1μM, identifying targets like BRAF, ERK2 and melanocortin receptors. Literature also links the melanocortin receptor A3R to reduced proliferation in A375 cells.
Autonomous model building with a preponderance of well annotated assay protocols - Alex Clark
Combining large amounts of publicly available structure-activity data with assays that have carefully curated annotations opens the door to a number of ways to analyze the data behind the scenes. Combining fully machine readable input for a diverse variety of projects with modelling techniques that can be used without fussy parametrization allows models to be created and updated whenever new data arrives. Predictions from these models can be integrated into normal searching and visualization workflows, without any need for the user to opt-in or make extra decisions. This approach is novel and different from the way structure-activity models are normally deployed: useful predictions can be presented ubiquitously with literally zero additional work on behalf of the user. We will present our efforts to date regarding ways to both passively and actively draw attention to important drug discovery trends while exploring compounds and assays.
Protein remote homology detection refers to identifying homologous proteins that are structurally and functionally similar but share low sequence identity. It is important for proteomics, biomedical science, and predicting protein structure and function. The number of protein sequences is growing rapidly due to sequencing techniques, but the number of known protein structures is growing much slower. Computational approaches provide an alternative to traditional expensive techniques for remote homology detection. Discriminative methods treat homology detection as a supervised classification problem using positive and negative samples to predict new samples and reduce false positives compared to alignment methods. The current state-of-the-art uses a BLSTM model that achieves high accuracy on benchmark datasets like SCOP.
This document describes the Reaxys Medicinal Chemistry database and how it can be used for drug discovery tasks. The database contains over 5 million substances and 25 million biological data points. It organizes and normalizes data to calculate pX values for comparing biological results. A heatmap allows comparing pX values by filtering and adjusting axes. The document demonstrates searches for in vitro activity data, cytochrome P450 inhibition data, PK data, and toxicity data. It also shows how to find substances active against a target but not others.
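The pX normalization mentioned here is, in essence, the negative base-10 logarithm of a molar activity value; a minimal conversion sketch (the unit handling is illustrative, not Reaxys' internal scheme).

```python
import math

def px_from_activity(value, unit="nM"):
    """pX = -log10(activity in mol/L); e.g. an IC50 of 100 nM gives pX = 7.0."""
    to_molar = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}
    return -math.log10(value * to_molar[unit])

print(px_from_activity(100, "nM"))  # 7.0
```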
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA) - Kamel Mansouri
The aim of this work was to design an in silico and in vitro approach to prioritize compounds and perform a quantitative safety assessment. To this end, we pursue a tiered approach taking into account bioactivity and bioavailability of chemicals and their metabolites using a human uterine epithelial cell (Ishikawa)-based assay. This biologically relevant fit-for-purpose assay was designed to quantitatively recapitulate in vivo human response and establish a margin of safety.
Mixtures QSAR: modelling collections of chemicals - Alex Clark
This document discusses representing and modeling chemical mixtures. It proposes a new data format called Mixfile or MInChI to hierarchically define mixtures and their components, including concentrations. This format aims to support cheminformatics applications like property prediction. Examples are given modeling theophylline solubility and gas absorption using mixture data. The document also describes applying similar methods to model polymer entropy of mixing using a spreadsheet dataset converted to the mixtures format. It concludes that defining mixtures in digital formats will enable greater analysis, modeling and use of mixture data.
Mixtures InChI: a story of how standards drive upstream products - Alex Clark
This document discusses the development of Mixtures InChI (MInChI), a standard for representing chemical mixtures in a machine-readable format. MInChI was developed to address the lack of standards for mixture informatics and interoperability. The document outlines the development of open source tools to generate and edit MInChI notation, as well as efforts to build a community and integrate MInChI into commercial products and databases to enable widespread use and generation of mixture data. Future work discussed includes finalizing the MInChI specification, extending it to additional chemical entities, developing associated properties and metadata, and implementing MInChI at large scale.
More Related Content
Similar to CDD BioAssay Express: Expanding the target dimension: How to visualize a lot of SAR
Mixtures as first class citizens in the realm of informatics - Alex Clark
Presented at Cambridge (UK) cheminformatics meeting, February 2021. Mixtures of chemicals are underutilised from an informatics point of view, and this presentation shows some of the work done by Collaborative Drug Discovery, IUPAC and InChI Trust to remedy this.
See recording: https://www.youtube.com/watch?v=0ILc0owuEzQ&list=PLfj_gc4RCduuwv9p8lh2xS1EhQ3p_Nd9S&index=1 ... my part starts at 1:05:00
Mixtures: informatics for formulations and consumer products - Alex Clark
The document proposes standards for representing mixtures in a machine-readable format. It introduces Mixfile and MInChI (Mixtures InChI) as hierarchical and concise formats for describing mixtures. Examples of formulations are provided to demonstrate how components, concentrations, and metadata can be encoded. Potential applications of the standards are discussed, such as enabling sophisticated searches of mixture data from publications and vendors to facilitate properties prediction and hazards assessment. Adoption of the standards could help ensure the longevity and sharing of mixture data.
Chemical mixtures: File format, open source tools, example data, and mixtures... - Alex Clark
This document discusses representing chemical mixtures using an open format called Mixfile. It proposes Mixfile as a standard format for mixtures, analogous to Molfile for individual molecules. Tools were created to edit and manipulate Mixfiles. Over 5,600 real-world mixture examples were extracted from text and represented in the Mixfile format. A MInChI notation was also defined as a condensed representation of mixtures. Future work is proposed to integrate mixture definitions and lookups into electronic lab notebooks and improve automated extraction of mixture information from text.
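To illustrate the hierarchical idea only, a mixture can be sketched as nested components with concentrations; the field names below are hypothetical and do not follow the actual Mixfile specification.

```python
# Illustrative nested-dictionary view of a hierarchical mixture; field names
# are hypothetical and do NOT reproduce the real Mixfile schema.
saline_buffer = {
    "name": "phosphate-buffered saline",
    "contents": [
        {"name": "water", "inchi": "InChI=1S/H2O/h1H2", "role": "solvent"},
        {"name": "sodium chloride", "quantity": 137, "units": "mM"},
        {"name": "phosphate buffer",  # components can themselves be mixtures
         "contents": [
             {"name": "Na2HPO4", "quantity": 10, "units": "mM"},
             {"name": "KH2PO4", "quantity": 1.8, "units": "mM"},
         ]},
    ],
}
```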
Representing molecules with minimalism: A solution to the entropy of informatics - Alex Clark
Cheminformatics as we know it is possible because so many molecular structures can be represented with datastructures and rules that are at first glance quite trivial. This first impression is highly misleading, since even within supposedly well behaved domains, edge cases arising from issues such as resonance, tautomerization, symmetry and stereochemistry - to name but a few - quickly add up. To supplement these genuine challenges, there is a whole additional class of problems caused by the mismatch between chemists' understanding of molecules and the datatypes that are necessary to capture a structure for informatics purposes. This line is blurred by the convenience of representing structures in a form that is very closely related to the diagram styles that have been in use since the dawn of chemistry. There are currently four major approaches to structure representation: connection tables (e.g. MDL Molfile), sketches (e.g. ChemDraw), canonical strings (e.g. SMILES and InChI) and atomic models (numerous 3D formats). Not only do all of these approaches have valid use cases, but they are deceptively incompatible with each other, even when addressing identical needs. Almost without exception, format conversions are not commutative, and every translation involves losing some amount of data. Given that recording chemical structures in machine readable form has become such a critical part of scientific research, it is essential to define a fundamental representation that captures the key structural definition asserted by the experimental chemist, for a broad and useful range of molecules, and ideally in a way that is closely related to visual drawing mnemonics. The number of data concepts needed to satisfy these conditions is quite small, and is mostly satisfied by the most commonly used subset of the venerable MDL Molfile format. This presentation will discuss how this subset, with a few minor corrections and clarifications, can and should be used as the reference standard for molecules, and how the informatics community can benefit from having well defined standards.
Presentation to the EPA (August 2016) about the BioAssay Express project, from Collaborative Drug Discovery. Describes the history and potential of the project, with the intention of opening a dialog about incorporating EPA toxicity data.
SLAS2016: Why have one model when you could have thousands? - Alex Clark
Society for Laboratory Automation & Screening, San Diego, January 2016. Presented by Dr. Alex M. Clark. Describes the use of open data resources (ChEMBL) to build target-activity models for drug discovery and toxicity prediction, on a massive scale, using a fully automated process. Concludes with a demo of the PolyPharma app, which shows how these models can be used for prospective drug discovery.
The anatomy of a chemical reaction: Dissection by machine learning algorithms - Alex Clark
This document discusses using machine learning algorithms to analyze chemical reaction data. It describes how current reaction reporting formats are not well-suited for computational analysis. A more structured reporting format is proposed to fully describe reactions in a digitally friendly way, including specifying reactants, products, quantities, yields, and metrics like atom efficiency. This structured data would allow modeling of reaction substitutability and enable large-scale machine learning of chemical transformations.
Compact models for compact devices: Visualisation of SAR using mobile apps - Alex Clark
Presented at American Chemical Society meeting, Boston, 2015. Describes how cheminformatics algorithms and visualisation interfaces have advanced on mobile apps to cover a diverse variety of functionality, increasingly calculated on the device itself rather than deferring to a web service. Culminates in a demo of the PolyPharma app prototype (see http://cheminf20.org/2015/08/06/the-polypharma-app-a-mash-up-of-ideas-and-technology)
Green chemistry in chemical reactions: informatics by design - Alex Clark
Chemical informatics technology can be of assistance to chemists for describing reactions in numerous ways, including calculating green chemistry metrics such as process mass intensity, E-factor and atom economy. To facilitate this, chemical reactions have to be described in more precise detail than is the norm for most chemists. There are also numerous practical ways to add more green chemistry functionality to lab notebooks, such as enumerating searchable reaction transforms for environmentally favourable reactions, automatically looking up toxicity and hazard information, and others which are mentioned in the slides.
This presentation was given at the Green Chemistry & Engineering conference in 2015 (American Chemical Society Green Chemistry Institute).
Green chemistry is an important subject that needs to be a part of every chemist's education, as well as a part of the daily routine of the professional synthetic chemist. This talk describes how a new app can be used to bring green chemistry metrics to reaction descriptions, once they are captured in a proper cheminformatics format. It also describes some of the additional data resources that can be incorporated into the user experience, and how this helps both students and professionals.
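The metrics named in this talk reduce to simple mass ratios; a minimal sketch of the standard definitions, illustrative rather than the app's implementation.

```python
def atom_economy(product_mw, reactant_mws):
    """Atom economy: % of reactant formula weight retained in the product."""
    return 100.0 * product_mw / sum(reactant_mws)

def e_factor(total_waste_mass, product_mass):
    """E-factor: mass of waste generated per mass of product."""
    return total_waste_mass / product_mass

def process_mass_intensity(total_input_mass, product_mass):
    """PMI: total mass of all inputs (reagents, solvents, water) per mass of product."""
    return total_input_mass / product_mass
```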
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014) - Alex Clark
Mobile apps for cheminformatics are quite powerful on their own, but can be significantly boosted by connecting them with cloud-hosted functionality. This talk explores the range of functionality that can be covered simply by making use of apps with stateless webservices, i.e. anonymous access without persistent data.
Building a mobile reaction lab notebook (ACS Dallas 2014) - Alex Clark
This document discusses building a mobile electronic lab notebook focused on chemical reactions called the Green Lab Notebook. It would allow users to draw chemical structures, balance reactions, and calculate quantities, yields, and green metrics. Key features include digitally capturing reaction data, prioritizing computer-friendly data structures and intuitive workflows, and linking to external databases for solvent data, sustainable feedstocks, and curated green reaction transforms. The goal is to facilitate recording, analyzing, and promoting the reuse of experimental reaction data in a sustainable chemistry context.
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Presented at the German Chemoinformatics Conference in Fulda, 2013: entitled "Putting together the pieces: building a reaction-centric electronic lab notebook for mobile devices".
Cheminformatics workflows using the mobile + cloud platform - Alex Clark
Presentation by Dr. Alex M. Clark of Molecular Materials Informatics at the NETTAB 2013 meeting in Venice, Italy. The presentation introduces the significance of mobile apps in science, and the scope of their capabilities in chemical structure informatics. The bulk of the talk describes an account of a preliminary workflow using open science data to search for viable leads for a cure for tuberculosis. The workflow described makes use of a combination of mobile, cloud and conventional desktop-based technology, all stitched together by facile communication, sharing and collaboration features.
Open Drug Discovery Teams @ Hacking Health Montreal - Alex Clark
Alex Clark of Molecular Materials Informatics (http://molmatinf.com) presents the Open Drug Discovery Teams project to the Hacking Health Montreal audience, April 2013.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx - MAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Phenomics assisted breeding in crop improvement - IshaGoswami9
The global population is increasing and is expected to reach about 9 billion by 2050; climate change makes it even harder to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data linkable to genomics information at all growth stages have become as important as genotyping; thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
The debris of the 'last major merger' is dynamically young - Sérgio Sacani
The Milky Way's (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the 'last major merger.' Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different from the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the 'last major merger' did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
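The abstract's "simple phase-mixing model" is not spelled out above, so the sketch below is only a back-of-envelope stand-in: it assumes debris accumulates roughly two caustics per radial period, with T_r ≈ 0.4 Gyr taken as a typical inner-halo radial period. Both numbers are illustrative assumptions, not the paper's.

```python
# Toy phase-mixing estimate (NOT the paper's model): if tidal debris wraps
# once per radial period T_r, and each wrap contributes a fold on each side
# of the orbit, the caustic count grows roughly linearly with time.

T_R_GYR = 0.4          # assumed typical radial period for inner-halo orbits (Gyr)
CAUSTICS_PER_WRAP = 2  # assumed: one fold per side of the orbit per wrap

def caustics_after(t_gyr):
    """Approximate number of caustics a time t_gyr after the collision."""
    return CAUSTICS_PER_WRAP * t_gyr / T_R_GYR

for t in (1.0, 2.0, 3.0, 8.0):
    print(f"t = {t:>4.1f} Gyr -> ~{caustics_after(t):.0f} caustics")

# Under these assumptions an 8-11 Gyr-old merger predicts several times more
# folds than a 1-2 Gyr-old one, which is the qualitative logic behind dating
# the debris from the handful of caustics seen in Gaia DR3.
```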
This MS Word-generated PowerPoint presentation covers the major details of the micronucleus test: its significance and the assays used to conduct it. The test is used to detect micronucleus formation inside the cells of nearly every multicellular organism; micronuclei form during chromosome separation at anaphase.
Immersive Learning That Works: Research Grounding and Paths Forward - Leonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model, operationalized by the 'Immersive Learning Brain' and 'Immersion Cube' frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlights research frontiers along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
ESR spectroscopy in liquid food and beverages - PRIYANKA PATEL
With an increasing population, people need to rely on packaged foodstuffs, and packaging of food materials requires the food to be preserved. There are various methods of treating food to preserve it, and irradiation is one of them. It is among the most common and most harmless methods of food preservation, as it does not alter the essential micronutrients of the food. Although irradiated food does not harm human health, quality assessment is still required to provide consumers with the necessary information about the food. ESR spectroscopy is a highly sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin-trapping technique is useful for detecting highly unstable radicals in food, and the antioxidant capability of liquid foods and beverages is mainly assessed by spin trapping.
The binding of cosmological structures by massless topological defects - Sérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
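For orientation, the flat-rotation condition the abstract invokes can be written down in one line in the weak-field limit. This is the standard textbook balance, not the paper's defect solution itself.

```latex
% Circular orbit at radius r with speed v(r): centripetal balance gives
%   v(r)^2 / r = g(r).
% A flat rotation curve, v(r) = v_0 for all r, therefore requires
% g(r) = v_0^2 / r, i.e. an enclosed "effective mass" growing linearly
% with r -- which the paper supplies via topological defects rather
% than dark matter.
\[
  \frac{v_0^2}{r} = \frac{G\,M(r)}{r^2}
  \quad\Longrightarrow\quad
  M(r) = \frac{v_0^2\,r}{G}.
\]
```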
The Thematic Apperception Test (TAT) is a projective psychological assessment tool used to measure an individual's interpretation of, and the themes they project onto, ambiguous material. The test helps evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
THEMATIC APPERCEPTION TEST (TAT)
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot of SAR
1. Expanding the target dimension: How to visualize a lot of SAR
Alex M. Clark
http://www.bioassayexpress.com
COLLABORATIVE DRUG DISCOVERY
2. Protocols & Datasets
◈ Structure-activity data usually exists in isolation:
▷ one target, one protocol
▷ many compounds & measurements
◈ Datasets often related:
▷ same target?
▷ same protocol?
◈ How would we know? (see the toy illustration below)
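A toy illustration of the slide's problem, using hypothetical records: because targets and protocols live in free text, string comparison alone cannot tell that two datasets concern the same target.

```python
# Hypothetical dataset records (illustrative, not BioAssay Express data):
# each dataset describes its target and protocol in free text.

dataset_a = {"target": "M. tuberculosis H37Rv",
             "protocol": "whole-cell growth inhibition, 7H9 medium"}
dataset_b = {"target": "Mycobacterium tuberculosis",
             "protocol": "Mtb growth inhibition assay (broth)"}

# Naive equality says the datasets are unrelated, even though a chemist
# would recognise the same organism and closely related protocols:
print(dataset_a["target"] == dataset_b["target"])  # False
```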
3. Prerequisite
◈ Protocols normally described using scientific English
◈ Alternatively, many ontologies are available:
▷ BAO (BioAssay Ontology)
▷ DTO (Drug Target Ontology)
▷ CLO (Cell Line Ontology)
▷ GO (Gene Ontology)
▷ ... and many others
◈ Each term has a URI: global meaning, hierarchical (see the sketch below)
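A minimal sketch of what "each term has a URI" buys: annotations keyed to globally meaningful term URIs rather than prose. The GO URI below is a real term (GO_0008150, biological_process); the BAO and DTO identifiers are placeholders for illustration, not verified term IDs.

```python
# Assay annotations as ontology term URIs (illustrative). Two assays tagged
# with the same URI match by definition, and the ontology hierarchy lets a
# query for "enzyme" find assays annotated with the narrower term "kinase".

annotations = {
    "assay format":   "http://www.bioassayontology.org/bao#BAO_0000000",       # placeholder BAO term
    "target process": "http://purl.obolibrary.org/obo/GO_0008150",             # GO: biological_process
    "target protein": "http://www.drugtargetontology.org/dto/DTO_00000000",    # placeholder DTO term
}

for category, uri in annotations.items():
    print(f"{category:>15}: {uri}")
```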
4. Common Assay Template
◈ Pulls together ontology categories
◈ ~22 assignments
◈ Captures key summary information (a sketch of such a template follows)
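A guessed-at minimal shape for such a template; the real schema lives at http://github.com/cdd/bioassay-template, and the field names here are assumptions for illustration only.

```python
# Sketch of a "common assay template": a list of ~22 assignments, each
# naming a category and the ontology branch its values must come from.

template = [
    {"assignment": "bioassay type",      "values_from": "BAO"},
    {"assignment": "assay format",       "values_from": "BAO"},
    {"assignment": "biological process", "values_from": "GO"},
    {"assignment": "target",             "values_from": "DTO"},
    {"assignment": "cell line",          "values_from": "CLO"},
    # ... ~17 more assignments covering the key summary information
]
```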
6. Protocol Similarity
Assay IDs: 504854, 449764

A Cell Based Secondary Assay To Explore Vero Cell Cytotoxicity of Purified and Synthesized Compounds that Inhibit Mycobacterium Tuberculosis (4)
This functional assay was developed for detection of compounds inhibiting Vero E6 cell viability, as a secondary screen to the beta-lactam-sensitizing M. tuberculosis bactericidal assay. In this assay, we treated Vero E6 cells with compounds selected as hits in the M. tuberculosis assay for 72 hours over a 10-point 2-fold dilution series, ranging from 0.195 uM to 100 uM. Following 72 hours of treatment, relative viable cell number was determined using CellTiter-Glo from Promega. Each plate contained 64 replicates of vehicle-treated cells, which served as negative controls.
Outcome: Compounds that showed <70% cell viability for at least one concentration were defined as "Active". If the % viability at all doses was >70%, the compound was defined as "Inactive".
...

A High Throughput Confirmatory Assay used to Identify Novel Compounds that Inhibit Mycobacterium Tuberculosis in the absence of Glycerol
Outcome: Compounds that showed >30% inhibition for at least one concentration in the Mtb dose response were defined as "Active". If the inhibition at all doses was <30% in the Mtb assay, the compound was defined as "Inactive". In the primary screen, a compound was deemed "Inactive" if it had a Percent Inhibition <70.31%. Compounds with a Percent Inhibition >70.31% that were not selected for follow-up dose response were labeled "Inconclusive."
The following tiered system has been implemented at Southern Research Institute for use with the PubChem Score. Compounds in the primary screen are scored on a scale of 0-40 based on inhibitory activity, where a score of 40 corresponds to 100% inhibition. In the confirmatory dose-response screen, active compounds were scored on a scale of 41-80 based on the IC50 result in the Mtb assay, while compounds that did not confirm as actives were given the score 0.
...

69% similar (?) - an illustrative computation follows below
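The question mark next to "69% similar" is the point: a naive bag-of-words cosine similarity, like the sketch below, produces a number without telling you whether the protocols measure the same thing. This is an illustrative computation, not BioAssay Express's metric.

```python
# Naive bag-of-words cosine similarity between two protocol texts, roughly
# the kind of number the slide's "69% similar (?)" questions. High word
# overlap (both texts mention Mtb, doses, Active/Inactive) does not mean
# the protocols measure the same thing.
import math
import re
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two texts."""
    va, vb = (Counter(re.findall(r"[a-z]+", s.lower())) for s in (a, b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

text_a = "Vero cell cytotoxicity secondary assay, compounds active below 70% viability"
text_b = "Mtb confirmatory assay, compounds active above 30% inhibition"
print(f"{cosine(text_a, text_b):.0%}")  # a plausible-looking but shallow score
```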
34. Conclusion
◈ Cheminformatics & bioinformatics are well established, but assay informatics is new
◈ Data curation has begun: much yet remains (>1M)
◈ Content curation is the main bottleneck
▷ BioAssay Express is key to solving this problem
◈ Content consumption is at the proof-of-concept stage
▷ ... and this is the fun part.
35. Future Work
◈ Crowd curation of public data: you should want your assays to be machine readable
◈ Model building: create from structure-activity-assay data, apply all existing knowledge to prospective discovery
◈ ELN context: use semantic annotations instead of text, describe assays comprehensively
▷ convert annotations to text, for communication (a toy rendering follows below)
▷ translate protocols to import formats for robots
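A toy rendering of the "convert annotations to text" idea. The labels and phrasing are invented for illustration and certainly differ from the real BioAssay Express output.

```python
# Hypothetical sketch: semantic assignments rendered back into a readable
# sentence for human communication.

annotations = [
    ("assay format", "cell-based"),
    ("organism",     "Mycobacterium tuberculosis"),
    ("detection",    "luminescence"),
]

sentence = (f"This is a {annotations[0][1]} assay against "
            f"{annotations[1][1]}, read out by {annotations[2][1]}.")
print(sentence)
# -> This is a cell-based assay against Mycobacterium tuberculosis,
#    read out by luminescence.
```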
36. Assay Data
◈ All features, and curated data, available at:
◦ http://www.bioassayexpress.com
◦ http://beta.bioassayexpress.com
◈ Ready for crowd curation - free to use in public
◈ Commercial product: private installations, with custom settings, are being deployed
37. Acknowledgments
◈ Biologists
▷ Janice Kranz
▷ Haifa Ghandour
▷ Karen Featherstone
◈ Collaborative Drug Discovery
⇨ Barry Bunin
⇨ Hande Kücük
⇨ and the rest of the team
◈ More information
http://github.com/cdd/bioassay-template
http://www.bioassayexpress.com
http://collaborativedrug.com
PeerJ Comp Sci 2:e61 (2016)