The document describes the NCI/CADD Chemical Identifier Resolver, which works as a resolver for different chemical structure identifiers. It allows the conversion of a given structure identifier into another representation or identifier. The resolver indexes over 150 chemical structure databases containing over 120 million structures. It is accessible via a web API that detects the identifier type and performs the requested conversion via calculation or database lookup.
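The identifier-type detection mentioned above can be illustrated with a small sketch. This is not the resolver's actual logic, which the summary does not describe; it is a hypothetical classifier that relies only on the published formats of InChI strings (the `InChI=` prefix), InChIKeys (14-10-1 uppercase letters) and CAS Registry Numbers (with their check digit). The function names and the fallback label `name_or_smiles` are assumptions made for illustration.

```python
import re

# InChIKey: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """Return True if `cas` has CAS RN shape and a valid check digit."""
    m = CAS_RE.match(cas)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight digits 1, 2, 3, ... from right to left; sum mod 10 is the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def guess_identifier_type(text: str) -> str:
    """Hypothetical classifier; a real resolver would try many more types."""
    text = text.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if cas_checksum_ok(text):
        return "cas"
    # Names and SMILES cannot be told apart reliably by pattern alone.
    return "name_or_smiles"
```

Anything that falls through to `name_or_smiles` would need a real parser or a database lookup to classify further, which is presumably where the resolver's calculation and lookup steps come in.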
The document describes several web services and resources provided by the National Cancer Institute's (NCI) Computer-Aided Drug Design (CADD) Group for accessing chemical structure and compound information. The key services and databases mentioned include the Chemical Identifier Resolver, which allows conversion between various chemical structure representations and identifiers, and the NCI/CADD Chemical Structure Database, which contains over 100 million chemical structures and structure identifiers.
The Chemical Identifier Resolver (CIR) works as a resolver for different chemical structure identifiers and representations. It allows one to convert a given structure identifier into another representation or structure identifier. The CIR uses a programmatic URL API and can be accessed by various programming libraries and languages. It works by detecting the identifier type, looking up the structure in its underlying Chemical Structure Database, calculating the requested representation, and returning the result along with a MIME type in the HTTP response.
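A minimal sketch of calling such a URL API from Python follows. The base URL and the `identifier/representation` path layout are not given in the summary above; they are assumptions based on the publicly documented service and should be checked against the current documentation before use. The helper names `build_cir_url` and `resolve` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public CIR service (not stated in the text above).
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def build_cir_url(identifier: str, representation: str) -> str:
    """Build a request URL of the assumed form <base>/<identifier>/<representation>."""
    # Percent-encode everything so SMILES characters such as '#', '/' or '='
    # cannot be mistaken for URL syntax.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{quote(representation, safe='')}"


def resolve(identifier: str, representation: str, timeout: float = 10.0):
    """Fetch one representation; returns (MIME type, response body as text).

    urlopen raises urllib.error.HTTPError when the service answers with an
    error status, for example when the identifier cannot be resolved.
    """
    with urlopen(build_cir_url(identifier, representation), timeout=timeout) as resp:
        mime_type = resp.headers.get_content_type()
        body = resp.read().decode("utf-8", errors="replace")
    return mime_type, body
```

A call such as `resolve("aspirin", "stdinchikey")` would issue one HTTP GET and return the MIME type from the response header together with the text body; the representation name `stdinchikey` is likewise an assumption here.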
The document provides a 10 step tutorial for creating animations on the online tool Go!Animate. It explains how to sign up, choose characters and backgrounds, add text boxes and other elements, create and edit scenes, add effects, preview and save the animation. The tutorial aims to introduce users to basic animation techniques using Go!Animate's free online program.
Photography has the power to both capture fleeting moments and create works of art. Good photographs are limited only by the photographer's own perspective and ability to find beauty in any subject. While there are no strict rules, the most impactful photographs are often the result of a photographer mastering their craft through extensive experience and practice.
This document discusses using Universal Design for Learning (UDL) principles to make study tools accessible and beneficial for all learners. UDL provides flexible instructional approaches that can be customized for individual needs rather than a one-size-fits-all solution. Assistive technology promotes independence by enhancing or changing how users interact with needed technology. When choosing Web 2.0 learning tools, considerations include closed captioning, adjustable sound, vocabulary support, on-screen tools, feedback, and options for navigation, keys, steps, and time commitment. Tools exist to support a wide range of subjects like math, science, vocabulary, history and language skills.
The ARCS model of motivation focuses on gaining and sustaining students' attention, showing the relevance of topics, building confidence in learning, and satisfying students with appropriate feedback and acknowledgment. It recommends using varied methods like examples, humor, interaction and addressing students' needs and questions to engage them throughout the learning process and help them see the value and impact of instruction.
The document discusses different definitions and perspectives of the term "family". The Merriam Webster Dictionary defines a family as a group of individuals living under one roof and usually under one head, or a group of persons of common ancestry. In contrast, the Urban Dictionary provides a more tongue-in-cheek definition, describing a family as "a bunch of people who hate each other and eat dinner together". The document goes on to ask questions about what comes to mind when thinking of family and what attributes make a family strong. It concludes by providing instructions for creating a family tree layout.
The document describes the NCI/CADD Chemical Identifier Resolver, a web-based tool that converts between different chemical structure identifiers and representations. It indexes over 150 million structures from public databases and assigns unique identifiers to represent chemical structures and related forms in a standardized way. This allows disambiguation of structures and tracking of chemical space.
ACS Salt Lake City 2009 CINF Talk (InChI Symposium) - Markus Sitzmann
The document compares NCI/CADD structure identifiers to InChI/InChIKey identifiers. NCI/CADD identifiers aim to uniquely and consistently represent chemical structures, accounting for tautomers, isotopes, charges, and stereochemistry. InChI/InChIKey identifiers provide an open standard for structure representation.
This is a presentation given to a group of students at the UNC Eshelman School of Pharmacy.
As chemists, many of us want to find information that is high quality, accurate and relevant to our query. With the increasing proliferation of online chemistry resources, it is very common for us to turn to these resources to source data. However, are resources such as Wikipedia, PubChem and the plethora of databases delivering information for metabolism, medicinal chemistry and synthetic chemistry trustworthy? Which of these resources, if any, should be treated as authorities? What is the most integrated approach to finding chemistry-related data online? What approaches can be taken to validate the data that is available, and how can individual scientists participate in helping to improve the content and quality of chemistry-related data on the web?
Antony Williams is ChemSpiderman. He started the ChemSpider database (www.chemspider.com) as a hobby to deliver a free platform for the community to source chemistry-related data. Within three years the system was acquired by the Royal Society of Chemistry and now serves up close to 25 million chemical structures linked to over 400 data sources across the internet. It offers individual scientists the opportunity to host and share their data with the community and to participate in data curation and annotation. Tony will share his experiences of building this chemistry database with a focus on data validation and curation and sourcing high-quality data. During the presentation he will discuss ways to check chemical structure representations before submission to public systems for searching and provide an overview of chemical identifiers such as SMILES strings and the International Chemical Identifier (InChI), which allow for the interlinking of resources. Attendees can expect to leave the session with a deeper understanding of using the internet to find chemistry-related data.
ICIC 2016: Mind the Gap: The novel benefits of human-curated substance locat... - Dr. Haxel Consult
Identifying and locating chemical substances, which can be disclosed in patents by names, structures, variable tables, etc., presents a time-intensive challenge to chemical patent analysis. Though emerging technology can help, recently published research demonstrates that algorithmic identification of chemical substances alone successfully identifies only ~60% of the disclosed compounds, compared to intellectual compound identification. PatentPak™ addresses this gap by extending the efforts of CAS scientists, who have intellectually analyzed the global patent literature for claimed and exemplified compounds for more than 100 years, to also elucidate the location of the substances in the patent text. This presentation will explore a number of examples, including a case study on vitamin D metabolites, to demonstrate the significant time savings and enhanced comprehensiveness of this approach.
The document summarizes the backend systems and processes that power the new EBI search engine EB-eye. It describes the large amounts and various formats of data being indexed, the parsing and indexing of different data formats using various tools, and the distributed indexing approach across multiple servers that allows indexing to be completed in under 18 hours. It also provides an overview of the web frontend and load balancing, as well as future plans for automatic updates and verifications.
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o... - NextMove Software
The Cahn-Ingold-Prelog (CIP) priority rules have been the cornerstone of written communication of stereochemical configuration for more than half a century. The rules rank ligands around a stereocentre, allowing a stereo-descriptor that is invariant to atom order and layout to be assigned, for example R (right) or S (left) for tetrahedral atoms. Despite their widespread daily use, many chemists may be surprised to find that beyond trivial cases, different software may assign different labels to the same structure diagram.
There have been several attempts to either replace or amend the CIP rules. This talk will highlight the more challenging aspects of the ranking and present a comparison of software packages that provide CIP labels, showing where they disagree. Providing an IUPAC-verified, free and open-source CIP implementation would allow software maintainers and vendors to validate and improve their implementations. Ultimately this would improve the accuracy of written chemical information exchange for all.
Building support for the semantic web for chemistry at the Royal Society of C... - Ken Karapetyan
The Royal Society of Chemistry provides a variety of databases and services covering multiple domains of Chemistry. That includes our electronic publishing platform, ChemSpider and its related databases, the National Chemistry Database and digital access to the RSC archive that spans over 170 years. In order to support the rising tide of semantic web technologies we are now working on exposing our data to conform with the linked data paradigm. This presentation will provide an overview of our work to introduce semantic structure to all RSC electronic resources as well as outlining ways to access this information using standard formats and various APIs.
Chemistry Resource is a database for finding chemical compounds in Derwent World Patents Index records. It provides simple access to the Derwent bibliographic database back to 1981. Chemistry Resource assigns unique identifiers called numbers to chemical compounds, linking them to the bibliographic indexing in Derwent WPI. Users can search Chemistry Resource by fields like systematic name, preferred name, and molecular weight. It displays compound information such as names, structures, and associated Derwent records.
The document summarizes the Chemical Validation and Standardization Platform (CVSP) used by Open PHACTS to validate and standardize chemical structure data from various sources. CVSP performs validation of chemical structures, generates standardized representations, and establishes parent-child relationships between structures. It has validated over 1.3 million records from ChEMBL and over 6,500 from DrugBank, identifying various issues. Standardized structures and relationships are exported in RDF/turtle format to integrate with the Open PHACTS semantic web platform.
There is an increasing availability of free and open access resources for chemists to use on the internet. Coupled with the increasing availability of Open Source software tools, this means we are in the middle of a revolution in data availability and in the tools to manipulate these data. ChemSpider is a free access website for chemists built with the intention of providing a structure-centric community for chemists. It was developed to aggregate and index available sources of chemical structures and their associated information into a single searchable repository and to make it available to everybody, at no charge.
There are tens if not hundreds of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data etc., and no single way to search across them. Although a large number of databases containing chemical compounds and data were available online, their inherent quality, accuracy and completeness were lacking in many regards. The intention with ChemSpider was to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data, experimental properties and links to other valuable resources. It has grown into a resource containing over 21 million unique chemical structures from over 200 data sources.
ChemSpider has enabled real-time curation of the data, association of analytical data with chemical structures, real-time deposition of single or batch chemical structures (including with activity data) and transaction-based predictions of physicochemical data. The social community aspects of the system demonstrate the potential of this approach. Curation of the data continues daily, and thousands of edits and depositions by members of the community have dramatically improved the quality of the data relative to other public resources for chemistry.
This presentation will provide an overview of the history of ChemSpider, the present capabilities of the platform and how it can become one of the primary foundations of the semantic web for chemistry. It will also discuss some of the present projects underway since the acquisition of ChemSpider by the Royal Society of Chemistry.
Chemical Text Mining for Current Awareness of Pharmaceutical Patents - NextMove Software
This document summarizes a presentation given at the ACS National Meeting in Philadelphia on August 19th, 2012 about chemical text mining of pharmaceutical patents. The presentation discussed trends in US patent applications for pharmaceuticals from 2002-2012, workflows for extracting and analyzing information from patent texts, and tools like LeadMine and PatFetch that can recognize chemical entities and access patent texts programmatically.
The document summarizes the vision and challenges of ChemSpider, a free online database for chemists. Key points:
- ChemSpider aims to connect chemistry online by allowing searches by chemical structure and linking to related data across the web.
- It was built as a "hobby project" on limited resources but has grown significantly.
- Ensuring data quality is a major challenge due to errors inherited from other databases. Extensive curation is needed.
- Name searching is problematic; structure and substructure searching is preferred.
- Future work includes continued curation, improved search capabilities, and collaborative data cleaning across databases.
The document summarizes the vision and challenges of ChemSpider, a free online database for chemists. Key points:
- ChemSpider aims to connect chemistry online by allowing structure and substructure searching across databases and literature.
- It was built as a "hobby project" on limited resources but has grown significantly.
- Ensuring data quality is challenging due to errors inherited from other databases that propagate.
- Crowdsourcing curation and developing identification standards like InChI can help address data quality issues.
- Future work includes expanding search capabilities, curating more data sources, and developing collaborative curation.
Increasingly Accurate Representation of Biochemistry (v2) - Michel Dumontier
Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity whether it be at molecular level or some part thereof (e.g. residues, collection of residues, atoms, collection of atoms, functional groups) such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site are involved in a chemical reaction including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of bio-curation are substantially more accurate.
1. George Papadatos presented on mining compounds, targets, and indications from the patent corpus using SureChEMBL and Open PHACTS.
2. He outlined how SureChEMBL annotates chemicals in patents and links them to biological data for Open PHACTS, and how relevance scoring helps prioritize important entities.
3. Examples were given of using the Open PHACTS API and patent data in SureChEMBL for use cases like identifying key compounds, analyzing chemical spaces, and integrating external data.
The internet continues to offer increased access to chemistry data that may be of value to scientists interested in populating systems containing reference toxicology data as well as to provide data for the development of predictive models. This presentation will give an overview of some of the various sources of data available via the internet, provide an overview of some of the challenges associated with gathering high-quality data and discuss methods by which to mesh together disparate data sources.
This document provides information on how to find physical and thermodynamic property data for chemical substances from electronic and print resources. It discusses the importance of this type of data for chemists and engineers. It describes various types of data compilations that contain property data extracted from journal articles. Examples of print and electronic resources are provided, including SciFinder, NIST Chemistry WebBook, CRC Handbook of Chemistry and Physics, and ChemSpider. Methods for searching these resources to find specific physical property values are also outlined.
This document discusses ChIP-Atlas, a database of chromatin immunoprecipitation sequencing (ChIP-seq) data. It contains over 1,000 TB of ChIP-seq data from thousands of experiments profiling transcription factors and histone modifications across various cell types. The database can be used to identify transcription factors enriched at tissue-specific genes and provides tools to analyze ChIP-seq data, including a peak browser and enrichment analysis. It aims to facilitate understanding of gene regulation networks in different cell types.
Introduction of Cybersecurity with OSS at Code Europe 2024 - Hiroshi SHIBATA
I develop the Ruby programming language as well as RubyGems and Bundler, the package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Similar to 5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
This presentation will provide an overview of the history of ChemSpider, the present capabilities of the platform and how it can become one of the primary foundations of the semantic web for chemistry. It will also discuss some of the present projects underway since the acquisition of ChemSpider by the Royal Society of Chemistry.
Chemical Text Mining for Current Awareness of Pharmaceutical Patents (NextMove Software)
This document summarizes a presentation given at the ACS National Meeting in Philadelphia on August 19th, 2012 about chemical text mining of pharmaceutical patents. The presentation discussed trends in US patent applications for pharmaceuticals from 2002-2012, workflows for extracting and analyzing information from patent texts, and tools like LeadMine and PatFetch that can recognize chemical entities and access patent texts programmatically.
The document summarizes the vision and challenges of ChemSpider, a free online database for chemists. Key points:
- ChemSpider aims to connect chemistry online by allowing searches by chemical structure and linking to related data across the web.
- It was built as a "hobby project" on limited resources but has grown significantly.
- Ensuring data quality is a major challenge due to errors inherited from other databases. Extensive curation is needed.
- Name searching is problematic, structure and substructure searching is preferred.
- Future work includes continued curation, improved search capabilities, and collaborative data cleaning across databases.
The document summarizes the vision and challenges of ChemSpider, a free online database for chemists. Key points:
- ChemSpider aims to connect chemistry online by allowing structure and substructure searching across databases and literature.
- It was built as a "hobby project" on limited resources but has grown significantly.
- Ensuring data quality is challenging due to errors inherited from other databases that propagate.
- Crowdsourcing curation and developing identification standards like InChI can help address data quality issues.
- Future work includes expanding search capabilities, curating more data sources, and developing collaborative curation.
Increasingly Accurate Representation of Biochemistry (v2) (Michel Dumontier)
Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity whether it be at molecular level or some part thereof (e.g. residues, collection of residues, atoms, collection of atoms, functional groups) such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site are involved in a chemical reaction including the fact that a key nitrogen atom must first be de-protonated. Thus, our current representation of biochemical knowledge may improve such that manual and automatic methods of bio-curation are substantially more accurate.
1. George Papadatos presented on mining compounds, targets, and indications from the patent corpus using SureChEMBL and Open PHACTS.
2. He outlined how SureChEMBL annotates chemicals in patents and links them to biological data for Open PHACTS, and how relevance scoring helps prioritize important entities.
3. Examples were given of using the Open PHACTS API and patent data in SureChEMBL for use cases like identifying key compounds, analyzing chemical spaces, and integrating external data.
The internet continues to offer increased access to chemistry data that may be of value to scientists interested in populating systems containing reference toxicology data as well as to provide data for the development of predictive models. This presentation will give an overview of some of the various sources of data available via the internet, provide an overview of some of the challenges associated with gathering high-quality data and discuss methods by which to mesh together disparate data sources.
This document provides information on how to find physical and thermodynamic property data for chemical substances in electronic and print resources. It discusses the importance of this type of data for chemists and engineers and describes various types of data compilations that contain property data extracted from journal articles. Examples of print and electronic resources are provided, including SciFinder, the NIST Chemistry WebBook, the CRC Handbook of Chemistry and Physics, and ChemSpider. Methods for searching these resources to find specific physical property values are also outlined.
This document discusses ChIP-Atlas, a database of chromatin immunoprecipitation sequencing (ChIP-seq) data. It contains over 1,000 TB of ChIP-seq data from thousands of experiments profiling transcription factor and histone modifications across various cell types. The database can be used to identify transcription factors enriched at tissue-specific genes and provides tools to analyze ChIP-seq data, including a peak browser and enrichment analysis. It aims to facilitate understanding of gene regulation networks in different cell types.
5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
1. NCI/CADD Chemical Identifier Resolver: Indexing and Analysis of Available Chemistry Space. Markus Sitzmann [1], Wolf-Dietrich Ihlenfeldt [2], and Marc C. Nicklaus [1]. [1] Computer-Aided Drug Design Group, Chemical Biology Laboratory, NCI-Frederick, NIH, DHHS. [2] Xemistry GmbH, Auf den Stieden 8, D-35094 Lahntal, Germany.
3. Chemical Identifier Resolver: links a chemical structure to NCI/CADD Identifiers, InChI/InChIKey, ChemSpider ID, PubChem SID/CID, chemical names, CAS Registry Number, NSC number, FDA UNII, ChemNavigator SID, SMILES, SD File, Chemical Formula, ChEBI ID, PDB Ligand ID, MRV, CML, SYBYL Line Notation, and GIF image.
4. Chemical Identifier Resolver (NCI/CADD Web Resources), http://cactus.nci.nih.gov/chemical/structure. Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier. First beta release: July 2009. Current release (beta 4): April 2011.
6. Chemical Identifier Resolver (NCI/CADD Public Web Resources), http://cactus.nci.nih.gov/chemical/structure. Accepted as "identifier": chemical names, IUPAC names (by OPSIN), CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ChemSpider ID, ChemNavigator SID, FDA UNII. Available as "representation": /smiles; /names, /iupac_name; /cas; /inchi, /stdinchi; /inchikey, /stdinchikey; /ficts, /ficus, /uuuuu; /image; /file, /sdf; /mw, /monoisotopic_mass; /formula; /twirl, /3d; /urls; /chemspider_id; /pubchem_sid; /chemnavigator_sid.
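The identifier and representation names on slide 6 suggest a simple URL scheme. The following is a minimal sketch, assuming the path layout `<base URL>/<identifier>/<representation>`; the function name `resolver_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import quote

# Base URL as given on the slide.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str) -> str:
    """Build a request URL for one identifier/representation pair.

    Assumption: the service expects <base>/<identifier>/<representation>.
    """
    # Percent-encode the identifier completely, because SMILES and InChI
    # strings can contain characters such as "/" and "#".
    encoded = quote(identifier, safe="")
    # Accept the representation with or without the leading slash used on the slide.
    return f"{BASE_URL}/{encoded}/{representation.strip('/')}"


print(resolver_url("aspirin", "smiles"))
print(resolver_url("C#N", "/stdinchikey"))
```

The resulting string can be passed to any HTTP client; the response body would then hold the requested representation.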
7. Chemical Identifier Resolver (NCI/CADD Web Resources), request flow. The http request carries the identifier (e.g. CAS number, chemical name) and the requested representation. The resolver first detects the identifier type. If the identifier is a full structure representation (e.g. SMILES, InChI), the requested structure representation is calculated. If the identifier is a hashed structure representation (e.g. InChIKey), a trivial name, etc., the structure is obtained by database lookup. The http response returns the representation (e.g. InChI, GIF image) together with a MIME type. The diagram names CACTVS and the NCI/CADD Chemical Structure Database (CSDB) as the underlying components.
8. Repeats the diagram of slide 7.
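The branching in slides 7 and 8 can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the regular expressions below are my own approximations of the identifier formats, not the resolver's actual detection logic, and the names `detect` and `route` are hypothetical.

```python
import re

# Approximate format patterns (assumptions, not the resolver's real rules).
PATTERNS = [
    ("InChI", re.compile(r"^InChI=1S?/")),
    ("InChIKey", re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")),
    ("NCI/CADD Identifier",
     re.compile(r"^[0-9A-F]{16}-(FICTS|FICuS|uuuuu)(-\d{2}-[0-9A-F]{2})?$")),
    ("CAS Registry Number", re.compile(r"^\d{2,7}-\d{2}-\d$")),
]

# Identifier types that carry the full structure and can be converted directly.
FULL_STRUCTURE = {"InChI"}


def detect(identifier):
    """Return the first matching identifier type, or None if nothing matches."""
    for name, pattern in PATTERNS:
        if pattern.match(identifier):
            return name
    return None


def route(identifier):
    """Choose between calculation and database lookup, as in the slide diagram."""
    kind = detect(identifier)
    if kind is None:
        return "undetermined"  # e.g. names or SMILES, not covered by this sketch
    return "calculation" if kind in FULL_STRUCTURE else "database lookup"


for example in ("InChI=1S/CH4/h1H4", "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "4551-69-1"):
    print(example, "->", detect(example), "->", route(example))
```

SMILES strings and chemical names are deliberately left undetermined here, because telling them apart reliably needs a structure parser rather than a pattern.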
13. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier.
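15. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Three identifier types (FICTS, FICuS, uuuuu) differ in whether they are sensitive or not sensitive to Fragments, Isotopes, Charges, Tautomers, and Stereo. Each position of the tag stands for one of these features, and a "u" marks a feature the identifier is not sensitive to: FICTS is sensitive to all five, FICuS is not sensitive to tautomers, and uuuuu is not sensitive to any of them. Format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>. Example (structure drawn with a Na+ counter-ion): 4A122D094098B50D-FICTS-01-1D, 0E26B623DF7FAD30-FICuS-01-70, 9850FD9F9E2B4E25-uuuuu-01-27. [Structure drawing not recoverable from the transcript.]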
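The identifier format on slide 15 can be taken apart with a few lines of code. The sketch below only splits the string and reads the tag; it does not verify the checksum, because the slide does not say how the checksum is computed. The names `NciCaddId`, `parse`, and `sensitive_features` are hypothetical.

```python
from typing import NamedTuple

# Feature order encoded by the five tag letters F, I, C, T, S.
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereo")


class NciCaddId(NamedTuple):
    hashcode: str
    tag: str
    version: str
    checksum: str


def parse(identifier: str) -> NciCaddId:
    """Split <hashcode>-<tag>-<version>-<checksum> into its four parts."""
    parts = identifier.split("-")
    if len(parts) != 4 or len(parts[0]) != 16 or len(parts[1]) != 5:
        raise ValueError(f"not an NCI/CADD identifier: {identifier!r}")
    return NciCaddId(*parts)


def sensitive_features(tag: str) -> list:
    """List the features a tag is sensitive to; 'u' marks an insensitive position."""
    return [name for letter, name in zip(tag, FEATURES) if letter != "u"]


for text in ("4A122D094098B50D-FICTS-01-1D",
             "0E26B623DF7FAD30-FICuS-01-70",
             "9850FD9F9E2B4E25-uuuuu-01-27"):
    parsed = parse(text)
    print(parsed.hashcode, parsed.tag, sensitive_features(parsed.tag))
```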
16. Histidine as drawn in databases: the reference structure and variants of it (charged form, tautomer, isotope-labeled form with 15N, sodium salt, stereoisomers, and "errors"). [Structure drawings not recoverable from the transcript.]
17. FICTS identifiers of the nine histidine drawings (charged form, tautomer, isotope, salt, stereoisomers, "errors", histidine), in slide order: A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS. [Structure drawings not recoverable from the transcript.]
18. FICuS identifiers of the same nine histidine drawings, in slide order: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS. [Structure drawings not recoverable from the transcript.]
19. uuuuu identifiers of the same nine histidine drawings: all nine carry the hashcode 9850FD9F9E2B4E25 (eight are labeled 9850FD9F9E2B4E25-uuuuu; one label on the slide reads 9850FD9F9E2B4E25-FICuS). [Structure drawings not recoverable from the transcript.]
20. Std. InChIKeys of the same nine histidine drawings, in slide order: HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M, HNDVDQJCIGZPNO-UHFFFAOYSA-N. [Structure drawings not recoverable from the transcript.]
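How strongly each identifier type merges the nine histidine drawings of slides 17 to 20 can be checked by counting distinct values. The sketch below copies the hashcodes and InChIKeys from those slides in transcript order; which value belongs to which drawing is not recoverable, so only the counts are compared.

```python
# Hashcodes copied from slides 17-19 (tags omitted) and InChIKeys from slide 20.
FICTS = ["A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
         "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
         "8A7AD1EB498CC76A", "6C16DE2351F9FF50", "9850FD9F9E2B4E25"]
FICUS = ["A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
         "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
         "8A7AD1EB498CC76A", "9850FD9F9E2B4E25", "9850FD9F9E2B4E25"]
UUUUU = ["9850FD9F9E2B4E25"] * 9
INCHIKEY = ["HNDVDQJCIGZPNO-UHFFFAOYSA-N", "HNDVDQJCIGZPNO-CDYZYAPPSA-N",
            "HNDVDQJCIGZPNO-RXMQYKEDSA-N", "HNDVDQJCIGZPNO-YFKPBYRVSA-N",
            "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "HNDVDQJCIGZPNO-UHFFFAOYSA-N",
            "UHPNKBYGGMJTIM-UHFFFAOYSA-M", "UHPNKBYGGMJTIM-UHFFFAOYSA-M",
            "HNDVDQJCIGZPNO-UHFFFAOYSA-N"]


def distinct(values):
    """Number of different identifiers assigned to the nine drawings."""
    return len(set(values))


summary = {
    "FICTS": distinct(FICTS),
    "FICuS": distinct(FICUS),
    "uuuuu": distinct(UUUUU),
    "Std. InChIKey": distinct(INCHIKEY),
    # The first InChIKey block alone, ignoring the second and third blocks.
    "Std. InChIKey, first block": distinct(k.split("-")[0] for k in INCHIKEY),
}
for name, count in summary.items():
    print(f"{name}: {count} distinct value(s) for 9 drawings")
```

The counts fall from FICTS to FICuS to uuuuu, which is the progressive merging the slides illustrate.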
21. NCI/CADD Chemical Structure Database, Structure Normalization: 119.8 million original structure records in CSDB.
22. The 119.8 million original structure records map to 83.1 million FICTS parent structures.
23. The FICTS parent structures map to 81.6 million FICuS parent structures.
24. The FICuS parent structures map to 76.2 million uuuuu parent structures.
25. Same diagram (119.8 million original structure records, 83.1 million FICTS, 81.6 million FICuS, and 76.2 million uuuuu parent structures), with the label "tautomer-invariant" added.
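The counts on slides 21 to 25 imply how much each normalization level condenses the database. A short worked calculation, using only the numbers from the slides (in millions):

```python
# Counts from the slides, in millions.
ORIGINAL_RECORDS = 119.8
PARENTS = {"FICTS": 83.1, "FICuS": 81.6, "uuuuu": 76.2}


def share_of_original(count):
    """Parent structures as a percentage of original records, one decimal."""
    return round(100 * count / ORIGINAL_RECORDS, 1)


for level, count in PARENTS.items():
    print(f"{level}: {count} million = {share_of_original(count)}% of original records")

# Structures that merge when tautomers are no longer distinguished.
merged_by_tautomers = round(PARENTS["FICTS"] - PARENTS["FICuS"], 1)
print(f"FICTS -> FICuS merges {merged_by_tautomers} million parent structures")
```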
26. Tautomer Analysis: How much "chemical space" is "just generated" by drawing tautomers?
29. NCI/CADD Chemical Structure Database, Tautomer Analysis (2009 DB version): 70.6 million FICuS parent structures. Starting from the set of FICuS parent structures, we systematically generated all tautomers based on the set of 21 SMIRKS rules available in CACTVS. This generated 680 million tautomers. For 1.7% of the FICuS parent structures the enumeration was not exhaustive.
30. NCI/CADD Chemical Structure Database, Tautomer Analysis. [Histogram: tautomeric overlap within each individual database release (%), axis 0.0 to 2.0, against the number of database releases (frequency), axis 0 to 90.] Average: ~0.3% of original structure records.
31. NCI/CADD Chemical Structure Database, Tautomer Analysis. [Histogram: tautomeric overlap within each individual database release (%), against the number of database releases (frequency).] Average: ~0.3% of original structure records. Database names annotated on the histogram: Asinex, ChemBridge, ComGenex, ChemNavigator, Columbia University Molecular Screening Center, EPA DSSTox, Specs, Ambinter, BIND, BindingDB, ChemNavigator, KEGG, NCI Open Database, NIST WebBook, NLM ChemIDplus, NMRShiftDB, Thomson Pharma, Wombat, NCI/DTP PASS Training Set, SGC-Ox, ChemDB, ZINC, ChEBI, ChemSpider.
32. NCI/CADD Chemical Structure Database, Tautomer Analysis. [Histogram: occurrence of "tautomerism-critical" molecules within each individual database release (%), axis 0.5 to 24.5, against the number of database releases (frequency), axis 0 to 30.] The plotted quantity is the percentage of FICuS parent structures in each database release that occur somewhere in CSDB with a conflict. Average: ~9.5% of FICuS parent structures.
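One way to read "tautomeric overlap within a database release" is the share of records whose FICuS identifier is shared by at least two different FICTS identifiers in the same release. The slides do not spell out the exact definition, so the sketch below is an assumption, and the toy records and the function name `tautomeric_overlap` are invented for illustration.

```python
from collections import defaultdict


def tautomeric_overlap(records):
    """Percentage of records that have a tautomeric partner in the same release.

    records: iterable of (ficts_hash, ficus_hash) pairs, one per original record.
    Assumed definition: a record overlaps if its FICuS hash is shared by at
    least two different FICTS hashes.
    """
    records = list(records)
    if not records:
        return 0.0
    ficts_per_ficus = defaultdict(set)
    for ficts, ficus in records:
        ficts_per_ficus[ficus].add(ficts)
    overlapping = sum(1 for _, ficus in records if len(ficts_per_ficus[ficus]) > 1)
    return 100.0 * overlapping / len(records)


# Hypothetical release: A1 and A2 are two tautomers of the same FICuS parent X.
toy_release = [("A1", "X"), ("A2", "X"), ("B1", "Y"), ("C1", "Z")]
print(f"{tautomeric_overlap(toy_release):.1f}% of records overlap")
```

Applying the same grouping across all releases instead of within one would give the cross-database "conflict" count that slide 32 refers to.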
34. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5). CACTVS generates 7 tautomers, one of which is the canonical tautomer by CACTVS. 5 tautomers have a potential stereo center on atoms or bonds (marked R/S or E/Z). [Structure drawings not recoverable from the transcript.]
35. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5). 3 tautomers have CAS Registry Numbers assigned: 4551-69-1, 33064-14-1, and 127117-31-1, listed with 859 references, 49 references, and 3 references. Two of the structures are annotated "(no stereo)" and "(Z)". [Structure drawings not recoverable from the transcript.]
36. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5), occurrences in databases indexed in CSDB. Counts annotated on the tautomers: 6 databases; 16 databases (no stereo); 3 databases (R); 2 databases (S); 12 databases; 1 database (no stereo). [Structure drawings not recoverable from the transcript.]
37. Example for a Tautomer "Conflict": HPMBP (1-Phenyl-3-methyl-4-benzoyl-pyrazolone-5), occurrences in databases with database names. 12 databases: ACD 3D, Ambinter, BindingDB, ChemBank, ChemDB, ChemSpider, ChemNavigator, MLSMR, NIAID, Scripps Screening Center, Thomson Pharma, ZINC. 1 database (no stereo): ChemDB. 16 databases (no stereo): ACD 3D, ACX, Ambinter, BioByte QSAR, ChemBank, ChemBridge, ChemDB, ChemSpider, DiscoveryGate, EPA GCES, MLSMR, NCI Open Database, NIST MS-Lib, NLM ChemIDplus, Sigma-Aldrich, Thomson Pharma. 6 databases: Ambinter, ChemDB, ChemSpider, DiscoveryGate, ChemNavigator, Thomson Pharma. 2 databases (S): ChemSpider, ZINC. 3 databases (R): ChemSpider, ECOTOX, ZINC. [Structure drawings not recoverable from the transcript; the grouping of names is inferred from the counts.]
39. NCI/CADD Chemical Structure Database, Scaffold Analysis. Scaffold types: molecular scaffold tree (Schuffenhauer et al. J. Chem. Inf. Model. 2007, 47, 47-58), archetype scaffold (Bemis et al. J. Med. Chem. 1996, 39, 2887-2893), and simple scaffold (Bemis et al. J. Med. Chem. 1996, 39, 2887-2893). The example shows level 1 and level 2 of a scaffold tree. [Structure drawings not recoverable from the transcript.]
41. NCI/CADD Chemical Structure Database, Scaffold Analysis of the CSDB uuuuu compound set (76.2 million). Scaffold types: molecular scaffold tree, archetype scaffold, simple scaffold. Scaffold counts on the slide: 8.1 million scaffolds, 6.8 million scaffolds, 0.8 million scaffolds. [Structure drawings not recoverable from the transcript.]
42. NCI/CADD Chemical Structure Database, Scaffold Analysis of the CSDB uuuuu compound set (76.2 million), molecular scaffold tree: 8.1 million scaffolds. [Chart: number of unique scaffolds per hierarchy level; x-axis: hierarchy level, 1 to 10; left axis: number of unique scaffolds (in millions), 0 to 8.0; right axis: number of unique structures (in millions), 0 to 80.0.]
44. Multilevel Neighborhoods of Atoms (MNA), NCI/CADD Chemical Structure Database. Filimonov D., Poroikov V., Borodina Yu., Gloriozova T. J. Chem. Inf. Comput. Sci., 1999, 39 (4), 666-670. Example molecule with its descriptors. MNA level 1: HC, HO, CHCC, CHCN, CCCC, CCOO, NCC, OHC, OC. MNA level 2: C(C(CC-H)C(CC-C)-H(C)), C(C(CC-H)C(CN-H)-H(C)), C(C(CC-H)C(CN-H)-C(C-O-O)), C(C(CC-H)N(CC)-H(C)), C(C(CC-C)N(CC)-H(C)), N(C(CN-H)C(CN-H)), -H(C(CC-H)), -H(C(CN-H)), -H(-O(-H-C)), -C(C(CC-C)-O(-H-C)-O(-C)), -O(-H(-O)-C(C-O-O)), -O(-C(C-O-O)). [Structure drawing not recoverable from the transcript.]
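A level 1 descriptor of this kind can be read as the atom symbol followed by the symbols of its direct neighbors. The sketch below is a simplified illustration, not the published MNA algorithm: the neighbor ordering (hydrogen first, then alphabetical) is chosen only to mimic the strings on the slide, ring markers are ignored, and the hand-written molecule (nicotinic acid) is my assumption about the example, chosen because it reproduces the nine level 1 strings listed on the slide.

```python
from collections import defaultdict

# Assumed example molecule: nicotinic acid, written as atom and bond lists.
ATOMS = ["N", "C", "C", "C", "C", "C",   # ring atoms 0-5
         "C", "O", "O",                  # carboxyl carbon 6, O 7, hydroxyl O 8
         "H", "H", "H", "H", "H"]        # H 9-12 on ring carbons, H 13 on O 8
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),   # pyridine ring
         (2, 6), (6, 7), (6, 8),                           # carboxyl group on C 2
         (1, 9), (3, 10), (4, 11), (5, 12), (8, 13)]       # hydrogens


def mna_level1(atoms, bonds):
    """Return one simplified level 1 neighborhood string per atom."""
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(atoms[b])
        neighbors[b].append(atoms[a])

    def order(symbol):
        # Assumed ordering: hydrogen first, then alphabetical.
        return (symbol != "H", symbol)

    return ["".join([atoms[i]] + sorted(neighbors[i], key=order))
            for i in range(len(atoms))]


descriptors = mna_level1(ATOMS, BONDS)
print(sorted(set(descriptors)))
```

Level 2 descriptors would repeat the same step with the level 1 strings of the neighbors in place of plain atom symbols.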
45. Multilevel Neighborhoods of Atoms (MNA), NCI/CADD Chemical Structure Database: computed for the CSDB uuuuu compound set (76.2 million).
46. Unique MNAs in the CSDB uuuuu compound set (76.2 million): 13,426 at level 1 and 918,516 at level 2.
47. In addition to the unique MNA counts (13,426 at level 1, 918,516 at level 2), the slide gives relationship counts for the 76.2 million uuuuu parent structures: 1.3 billion relationships (~17 MNAs per uuuuu parent structure) and 2.3 billion relationships (~30 MNAs per uuuuu parent structure).
48. Surprising: 424,784 of the 918,516 level 2 MNAs are exclusive to a set of 1.3 million structures in ChemSpider.
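The per-structure averages and the ChemSpider share on slides 47 and 48 follow directly from the totals. A short worked check, using only numbers from the slides:

```python
UUUUU_STRUCTURES = 76.2e6         # uuuuu parent structures in CSDB
RELATIONSHIPS = (1.3e9, 2.3e9)    # structure-to-MNA relationships on slide 47
LEVEL2_UNIQUE = 918_516           # unique level 2 MNAs
LEVEL2_CHEMSPIDER_ONLY = 424_784  # level 2 MNAs exclusive to the ChemSpider subset

# Average number of MNAs per parent structure for each relationship total.
averages = [round(total / UUUUU_STRUCTURES, 1) for total in RELATIONSHIPS]
for total, avg in zip(RELATIONSHIPS, averages):
    print(f"{total / 1e9:.1f} billion relationships -> about {avg} MNAs per structure")

# Share of all unique level 2 MNAs found only in the ChemSpider subset.
exclusive_share = round(100 * LEVEL2_CHEMSPIDER_ONLY / LEVEL2_UNIQUE, 1)
print(f"{exclusive_share}% of unique level 2 MNAs are exclusive to the ChemSpider subset")
```

The results agree with the rounded values of ~17 and ~30 MNAs per structure stated on the slide.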
49. Chemical Structure Web Services (NCI/CADD Web Resources). Components shown: the Chemical Identifier Resolver, reached over http; NCI/CADD web services; the NCI/CADD Chemical Structure Database (CSDB); CACTVS; external (web) services; and other software packages, e.g. OPSIN.
50. Chemical Identifier Resolver (NCI/CADD Web Resources), sites and tools shown: IUPHAR DATABASE (http://www.iuphar-db.org); http://www.akosgmbh.eu/globalsearch/index.htm; CACTVS (http://www.xemistry.com); gChem Virtual Molecular Model Kit (http://chemagic.com/web_molecules/script_page_large.aspx); Symyx Draw Resolver (http://www.symyx.com/); webel.py, a Cinfony module (http://baoilleach.blogspot.com/2009/11/introducing-webel-cheminformatics.html); avogadro.openmolecules.net/.
53. Acknowledgments. ChemNavigator: Scott Hutton, Tad Hurst. University of Cambridge: Daniel Lowe, Peter Murray-Rust. Noel O'Boyle (University College Cork, Ireland). Richard Apodaca (Metamolecular). Hans-Juergen Himmler. CADD Group, CBL, NCI: Igor Filippov. ChemSpider: Antony Williams, Valery Tkachenko. Thanks to all database providers! Our web site: http://cactus.nci.nih.gov