Biological, chemical, and physical properties of molecules are encoded in their molecular structure. The challenge lies in discovering the relationships between the structure of the molecular graphs and the measured activity. In this presentation, we introduce Chemaxon’s new product, the Trainer Engine. It is designed to streamline the workflow from input data containing measured activities to validated models that are available to a wide audience.
In addition to summarizing our results obtained with various machine learning model training scenarios, our goal is to highlight the model inference aspects. Accordingly, we present an integration use case with Chemaxon’s Design Hub. Connecting these applications widens the range of information resources available for decision-making on compound series to enhance drug discovery pipelines.
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search? (ChemAxon)
Ákos' study aims to provide an overview of ChemAxon's different search engines and web services. A benchmark will be presented along with statistics on the performance of the different JChem engines.
Biological, chemical and physical properties of molecules are encoded in their molecular structure. The challenge lies in discovering the relationships between the molecular graphs and the measured activity. Where data is measured, collected and curated for a series of compounds there is an opportunity to find the hidden relationships.
Chemical structures come in various shapes and sizes, depending on the scientists or even algorithms that create them. Though the variations may sometimes seem subtle to a trained chemist’s eye, they can introduce inconsistencies that impair chemical search algorithms or model building. Structure normalization is a key component of any cheminformatics workflow, and its significance is often underestimated. Finding relationships between chemical structures and their measured properties primarily relies on the representation of the chemical matter. Variability of the calculated features and descriptors for these representations can influence data analysis and the accuracy of the predictions. During the first part of the presentation we will present the effect of chemical normalization on investigating correlations and building predictive models.
The second part of the talk will incorporate the results of model building on 163 ChEMBL targets extracted from the bioactivity benchmark set [1]. Results with different descriptor generation methods, including ECFP fingerprints, MACCS keys, structural properties, geometry properties and phys-chem properties, will be discussed in detail. This part focuses on summarizing the results of more than 3000 Random Forest models.
Finally, model development for ADMET targets will be highlighted, including hERG cardiotoxicity prediction, permeability and blood-brain barrier penetration. We will describe how these models can be built, analyzed, optimized and deployed using our new machine learning platform.
Efficient biomolecular structural data handling and analysis - Webinar with D... (ChemAxon)
In this joint event our experts are coming together to elaborate on the technical and scientific opportunities coming from this partnership. If you are working on the discovery of next generation drugs and using a structure-based approach in your research, join us to learn how to leverage vast biomolecular structural data with innovative technologies.
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release (ChemAxon)
The next generation of ChemAxon’s chemical editor, Marvin Pro, was created to excel in handling large synthesis schemes and creating publication quality figures. The first version, planned for release in December 2021, elegantly combines the intuitiveness of the clean web-based user interface and the chemical smartness of ChemAxon’s well-known previous editors.
Intellectual property (IP) intelligence solutions designed for the way resear... (ChemAxon)
Leveraging IP intelligence through the researcher workflow requires the curation of chemistry patents including many thousands of molecules. This complex task is time-consuming and error-prone when done manually, whereas using ChemAxon’s ChemCurator to analyze and extract chemical information in patents and other documents means the process can be done accurately in a fraction of the time.
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem... (ChemAxon)
Boehringer Ingelheim's Nils Weskamp discusses eDesign: a computational platform for molecule design and optimization. This presentation explains how to combine data, algorithms and user experience to impact compound design, and gives a glimpse into the agile and interdisciplinary teamwork, facilitated by Design Hub, as a success factor for the development of digital tools.
Patent Data for Artificial Intelligence based Drug Discovery (ChemAxon)
Han-Jo Kim from Standigm presents on using ChemAxon's ChemCurator to process structures and relevant data from patents sourced from Google Patents, PDF and text formats.
The Synergy platform is ChemAxon’s approach to SaaS solutions for chemistry related R&D data management. It provides an integrated system that cleans, organizes and links together pre-clinical research data and a collaborative workspace where people from multiple sites can work with each other, as well as with CROs and partners. Besides giving a summary of the fundamental platform components, the presentation guides the audience through the process of capturing chemical data in our Compound Registration tool, uploading and standardizing assay results and visualizing, as well as analyzing combined chemical data and biological results.
This webinar will guide you through the Design Hub platform for scientific design and discovery project management using an antiviral compound optimization example. The workflow starts from capturing the first observation and corresponding hypothesis, then showcases compound design relying on a vast amount of information sources and predictive models, including the new hERG toxicity prediction and docking using RDock. It will highlight how the interaction of Design Hub and the Compound Registration system smoothly tracks the fate of compound ideas, from draft virtual compounds through synthesis targets to registered samples.
JChem Microservices expose ChemAxon functionality as small, separate modules covering areas such as chemical dataset searching and conversion between chemical file formats.
Chemicalize Pro - Cheminfo Stories 2020 Day 5 (ChemAxon)
Chemicalize is an online SaaS product providing UI based chemical calculations, drawing and searching, as well as API based endpoints for integrators and embeddable web components for website owners. In the presentation we introduce the service and show the essence of the embeddable web components through a real-life use case focusing on compliance questions.
Pasteur Institute User Story - Cheminfo Stories 2020 Day 5 (ChemAxon)
Here, we present an updated version of iPPI-DB, our manually curated database of PPI modulators. In this release, the data model, the graphical interface and the tools to query the database have been completely redesigned. We used Chemaxon MarvinJS and JChem library to support this development. We added new PPI modulators, new PPI targets, and extended our focus to stabilizers of PPIs as well. Finally, we introduce a web application relying on crowdsourcing for the maintenance of the database. This application can be used outside of our group to collaboratively maintain iPPI-DB within a community of curators.
ChemAxon ChemLocator - Cheminfo Stories Day 5 (ChemAxon)
ChemLocator is a useful tool for extracting chemical and biological insights from documents. The number of articles, patents and journals published each year is increasing rapidly. In this presentation we show how the tool will save you time and take some load off your shoulders when your job needs document searching. We show how to use it through its built-in web-based user interface and also introduce API level integration into 3rd party applications.
Search Engine Improvements - Cheminfo Stories 2020 Day 1 (ChemAxon)
Chemaxon's second generation search engine is being improved in order to serve distributed searches. The talk gives a short overview of the roadmap and other improvements.
An application of ChemAxon's platform for education (ChemAxon)
Online tools for chemical education have been widely used in the last decade. Several state of the art homework and test systems are available for chemistry online learning. Most of these applications are based on chemistry textbooks and use well-curated questions created by professionals to help students master a certain division of chemistry.
Our ultimate goal is to support online learning and help students practice and master chemistry and biochemistry. ChemAxon’s learning platform has a friendly and intuitive interface to easily create and share online learning materials. Tools like Marvin JS, BioEddie and JChemBase are used to automatically evaluate and grade students’ assignments.
ChemAxon provides an open cloud-based learning hub to enhance classroom collaboration and increase the effectiveness of learning. This online learning platform is a powerful tool to help teachers in coaching students based on progress tracking.
Chemical intelligence that makes hidden knowledge effortlessly reachable (ChemAxon)
The knowledge that is being produced and stored in the form of reports, patents and scientific journal articles is expanding exponentially. However, the unstructured nature of such content imposes constraints on seamless information access and scientific decision support. Chemistry is a unique field in this regard, for two reasons. First, the nomenclature is verbose in the sense that a chemical structure can be represented with various synonyms, for example a traditional name, an IUPAC name, a wide range of brand names, or chemical formats (SMILES). Second, navigating the knowledge base with queries related to the encapsulated chemical space calls for specialized search methods like similarity-based or substructure searches.
Our study highlights computational approaches to make the chemistry related knowledge stored in all the open access articles easily accessible. We present our results obtained on this large corpus through the following workflow: i) large-scale conversion of text content to chemical objects, ii) automated preparation of databases to store and organize relevant data, and iii) analysis of the collected chemistry space.
Chemical objects were extracted with the ChemLocator application from nearly 1.9M articles that span the chemical space of open access scientific literature. The chemical space was analysed by calculating a fingerprint-based chemical similarity matrix and clustering with MadFast Similarity Search. In order to explore the scaffold diversity of this exclusive chemical space, the obtained set was fragmented to yield rings and ring systems. Hidden relationships were explored by combining text and chemical information in a graph data model and related visualization.
In summary, our use-case highlights the potential of novel technologies to pre-process, search and explore the information network enfolded in large document sets in the field of chemistry.
Deep analysis of chemical patents and Markush claims (ChemAxon)
Finding the relevant prior art document is only a first step in FTO analysis or novelty check. The precise understanding of the chemical scope of complex Markush claims is the critical aspect of these workflows. It is difficult and tedious to determine if a large set of structures is covered when you have a complex Markush claim over multiple pages. Computer-assisted analysis of chemical patents simplifies such challenges, helping you understand the claims faster and better and avoid mistakes. The main technical building blocks are the advanced Markush visualization techniques and automatic checking of your compounds against the claimed Markush structures. Crucially, these analysis steps can be performed in your secure in-house environment.
Bridging the gap between small molecule and biologics editing (ChemAxon)
An increasing number of new FDA approved drugs are biologics; in 2015 alone, 19 out of the 51 approved drugs were biological entities. Increasingly, the development of these complex drugs requires chemists and biologists to collaborate closely from ideation to product maturity. During this process candidate molecules undergo iterative changes which need to be communicated precisely and unambiguously to all researchers involved in the project. Although the cheminformatics world is well covered in terms of software to draw, store, search, report and manage small molecules, there is currently no efficient way to handle biological entities in the same manner.
ChemAxon, a well-known cheminformatics software provider, recognized and bridged this information gap between biology and chemistry by the development and integration of Biomolecule Toolkit and the biological editor, BioEddie. We provide unambiguous representation for biologics: peptides, oligonucleotides, proteins, antibodies, antibody drug conjugates etc., including those containing unnatural and chemically-modified components with the ability to define ambiguous structural elements. The standardized representation, paired with the ability to round-trip between standard chemical and biological file formats (MDL MOL to HELM conversion and vice versa), allows researchers to keep a single data store of molecular assets (Biomolecule Toolkit), in which they can query based on sequences, chemical structure or metadata. Relevant molecules can be exported for further processing in other computational tools. In this poster, we will demonstrate the novelty of our approach and present a couple of case studies: one for CHEMBL v21 peptides dataset and one for antibody registration.
EUGM15 - Zoltán Simon (Printnet): Drug Profile Matching - Drug Discovery by P... (ChemAxon)
Most drugs exert their effects via multi-target interactions, as hypothesized by polypharmacology. Here we introduce Drug Profile Matching (DPM), which is able to relate complex drug-protein interaction profiles with effect and target profiles. Structural data and registered effect profiles of all small-molecule drugs were collected, and interactions to a series of non-target protein binding sites of each drug were calculated. Statistical analyses confirmed close relationships between the studied 177 effect and 77 target categories and the in silico generated interaction profiles of ca. 1,200 FDA-approved small-molecule drugs. Receiver Operating Characteristic analysis and 10-fold cross-validation were performed to assess the accuracy and robustness of the method. Based on the relationships found, the effect and target profiles of drugs can be revealed in their entirety, and hitherto uncovered effects and targets can be predicted in a systematic manner.
In order to investigate the predictive power of DPM, four effect categories (PPAR agonist, angiotensin-converting enzyme inhibitor, cyclooxygenase inhibitor and dopamine agent) were selected and predictions in the set of the FDA-approved small-molecule drugs were verified by literature analysis and experimental tests.
Moreover, a large set consisting of 600,000 druglike molecules was selected from a database of 50 million compounds and their interaction profiles were generated. Based on these profiles and chemical similarity considerations, predictions were calculated and tested experimentally to find new candidates that are chemically dissimilar to the reference drugs.
6. Effect of standardization
- Simple descriptors (Mw, fsp3, HBDA, etc.)
- Phys-chem (logD, pKa)
- Molecular graph, Fingerprints
Examples: Salts, solvates (Imipramine pamoate); Tautomerism (Furan-2-ol)
“Overall and despite our efforts to use open software wherever possible, we find that ChemAxon Tautomers node outperforms the other approaches we tested.”
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00606-7
7. Small molecule retention time (SMRT) dataset: Tautomerization
https://www.nature.com/articles/s41467-019-13680-7
10. Activity dataset: the ‘ChEMBL bioactivity benchmark set’
Data source: Journal of Cheminformatics, 9, 45 (2017) by Eelke B. Lenselink, Niels ten Dijke, Brandon Bongers, George Papadatos, Herman W. T. van Vlijmen, Wojtek Kowalczyk, Adriaan P. IJzerman, Gerard J. P. van Westen
- ChEMBL database (version 20)
- Activities were selected that met the following criteria:
- at least 30 compounds tested per protein and from at least 2 separate publications
- assay confidence score of 9
- ‘single protein’ target type
- assigned pCHEMBL value
- no flags on potential duplicate or data validity comment
- originating from scientific literature
- data points with activity comments ‘not active’, ‘inactive’, ‘inconclusive’, and ‘undetermined’ were removed
- MED value was chosen
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0232-0
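The selection criteria on this slide can be read as a per-record filter followed by a per-target threshold. The following is a minimal Python sketch of that reading, not the benchmark authors' actual code. All field names (`confidence_score`, `target_type`, `pchembl_value`, `potential_duplicate`, `data_validity_comment`, `source`, `activity_comment`, `target_id`, `compound_id`, `document_id`) are hypothetical and would need to be mapped to the real ChEMBL export.

```python
from collections import defaultdict

# Activity comments that disqualify a data point (from the slide).
EXCLUDED_COMMENTS = {"not active", "inactive", "inconclusive", "undetermined"}


def passes_record_filters(rec):
    """Per-record criteria; every field name here is a hypothetical placeholder."""
    comment = (rec.get("activity_comment") or "").strip().lower()
    return (
        rec.get("confidence_score") == 9
        and rec.get("target_type") == "single protein"
        and rec.get("pchembl_value") is not None
        and not rec.get("potential_duplicate")
        and not rec.get("data_validity_comment")
        and rec.get("source") == "scientific literature"
        and comment not in EXCLUDED_COMMENTS
    )


def select_targets(records, min_compounds=30, min_publications=2):
    """Keep targets with enough distinct compounds and separate publications."""
    by_target = defaultdict(list)
    for rec in records:
        if passes_record_filters(rec):
            by_target[rec["target_id"]].append(rec)
    return {
        target: recs
        for target, recs in by_target.items()
        if len({r["compound_id"] for r in recs}) >= min_compounds
        and len({r["document_id"] for r in recs}) >= min_publications
    }
```

The sketch applies the record-level filters first, so the 30-compound and 2-publication thresholds are counted only over data points that survive them. Whether the original work counted in that order is an assumption.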
11. Application Study on ChEMBL
- Data points in range: 500-4703 (med:776)
- 163 ChEMBL targets, pAct
- Sorted by Document Year, last 30 points reserved as External set: Ext
[Diagram: last 30 points form Ext]
12. Application Study on ChEMBL
- Data points in range: 500-4703 (med:776)
- 163 ChEMBL targets, pAct
- Sorted by Document Year, last 30 points reserved as External set: Ext
- 10-90% test-training set split: Test
- ~160k total training size
- ~18k total test size
[Diagram: random 90% form Train, random 10% form Test, last 30 points form Ext]
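The split described on this slide (time-ordered external set, then a random 10/90 test/training split of the remainder) can be sketched per target as below. This is an illustrative sketch, not the Trainer Engine implementation. The `document_year` key, the function name, and the fixed seed are assumptions made for the example.

```python
import random


def split_target(points, n_external=30, test_fraction=0.1, seed=0):
    """Split one target's data points into train, test and external sets.

    points: list of dicts with a hypothetical 'document_year' key.
    The newest n_external points (by document year) become the external set;
    the remainder is shuffled and divided by test_fraction.
    """
    if not 0 < n_external < len(points):
        raise ValueError("n_external must be positive and smaller than the data set")
    ordered = sorted(points, key=lambda p: p["document_year"])
    external = ordered[-n_external:]
    remainder = ordered[:-n_external]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    rng.shuffle(remainder)
    n_test = round(len(remainder) * test_fraction)
    return remainder[n_test:], remainder[:n_test], external
```

With 163 targets and 30 external points each, this gives 163 × 30 = 4890 external points in total, which matches the external set size reported later in the deck.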
...
16. Conformal prediction
[Diagram labels: Training Set; Proper Training Set; Calibration set; Model; Error model; Error Prediction; P(80%) calibration factor (ɑ)]
https://www.jmlr.org/papers/volume9/shafer08a/shafer08a.pdf
https://pubs.acs.org/doi/10.1021/ci5001168
17. Conformal prediction
[Diagram labels: Training Set; Proper Training Set; Calibration set; Model; Error model; Error Prediction; P(80%) calibration factor (ɑ)]
Test: 14233 / 17661 (80.6%) within the error bound
Ext: 3344 / 4890 (68.4%) within the error bound
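The slide does not spell out how the calibration factor is computed. One common formulation, assumed here, is normalized inductive conformal regression: the nonconformity score is the absolute error divided by the error model's prediction, ɑ is taken as the 80% quantile of these scores on the calibration set, and the error bound for a new compound is ɑ times its predicted error. The function names below are hypothetical, and predicted errors are assumed to be strictly positive.

```python
import math


def calibration_factor(y_true, y_pred, err_pred, confidence=0.8):
    """Quantile of normalized absolute errors on the calibration set (the factor alpha)."""
    scores = sorted(abs(y - p) / e for y, p, e in zip(y_true, y_pred, err_pred))
    # Conformal rank ceil((n + 1) * confidence), capped at n for small sets.
    rank = min(len(scores), math.ceil((len(scores) + 1) * confidence))
    return scores[rank - 1]


def error_bounds(err_pred, alpha):
    """Scale each predicted error by the calibration factor."""
    return [alpha * e for e in err_pred]


def coverage(y_true, y_pred, bounds):
    """Fraction of points whose absolute error falls within its bound."""
    hits = sum(abs(y - p) <= b for y, p, b in zip(y_true, y_pred, bounds))
    return hits / len(y_true)
```

The reported figures follow directly from the hit counts: 14233 / 17661 ≈ 0.806 on the test set, close to the 80% target, and 3344 / 4890 ≈ 0.684 on the time-split external set.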
18. Feature engineering is the process of using domain knowledge to extract features from raw data.
36. Discovery teams
Fill the gap
[Diagram labels: Production Models; Design Hub; Services; Series; Trainer GUI; Training / Analysis; Comp. Chem; Trainer Engine; H1 H2 H3 H4; REST API]
37. Discovery teams
Multi parameter optimization
[Diagram labels: Production Models; Design Hub; Services; Series; Trainer GUI; Training / Analysis; Comp. Chem; Trainer Engine; H1 H2 H3 H4]
38. Translate data to reliable models
Centralize model management
Connect project team members and resources
Track and manage discovery
Design Hub
Lower the barrier to adopt AI models in design
Trainer Engine
https://chemaxon.com/products/trainer-engine
https://chemaxon.com/products/design-hub