Presented at the 256th American Chemical Society (ACS) National Meeting in Boston, MA (August 19, 2018).
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource, containing more than 242 million chemical substance descriptions, 94 million unique compounds, and 234 million bioactivities determined from 1.25 million assay experiments. Importantly, data contributions from multiple sources, including IBM, SureChEMBL, ScripDB, NextMove, and BindingDB, allow PubChem to provide links to patent documents that mention chemicals. Currently, PubChem offers links between about 6.7 million patent documents and more than 20 million unique chemical structures, with over 137 million compound-patent links, covering primarily U.S. patents, with some European, World Intellectual Property Organization (WIPO), and Japanese patent documents. This presentation will provide an overview of the patent information in PubChem as well as best practices for using it.
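As a rough illustration of how such compound-patent links might be read programmatically, the sketch below builds a PUG-REST cross-reference request for one compound and tallies the returned patent identifiers by their two-letter office prefix. The function names are hypothetical, the response layout (`InformationList` / `Information` / `PatentID`) is an assumption about the JSON shape, and nothing here is taken from the presentation itself.

```python
from collections import Counter

# Base address of PubChem's PUG-REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def patent_xref_url(cid):
    """Build a PUG-REST URL listing patent identifiers cross-referenced to one compound."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/xrefs/PatentID/JSON"


def extract_patent_ids(payload):
    """Collect PatentID strings from a parsed JSON response (assumed layout)."""
    records = payload.get("InformationList", {}).get("Information", [])
    return [pid for record in records for pid in record.get("PatentID", [])]


def count_by_office(patent_ids):
    """Tally identifiers by their two-letter prefix (for example US, EP, WO, JP)."""
    return Counter(pid[:2].upper() for pid in patent_ids if len(pid) >= 2)


if __name__ == "__main__":
    # CID 2244 is aspirin; fetching this URL requires network access.
    print(patent_xref_url(2244))
```

The URL can be fetched with any HTTP client and the parsed JSON passed to `extract_patent_ids`; the prefix tally is only a quick way to see how the links are distributed across patent offices.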
PubChem and its application for cheminformatics education (Sunghwan Kim)
Presented at the American Chemical Society Middle Atlantic Regional Meeting (MARM) 2021 (June 9, 2021).
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a chemical information resource, developed and maintained by the U.S. National Institutes of Health. It contains a large corpus of publicly available chemical data collected from more than 700 data sources. Visited by millions of users every month, it serves a wide range of audiences, from scientific communities to the general public. Considering that many PubChem users are undergraduate and graduate students at academic institutions, it has great potential as a cheminformatics education resource. In this presentation, we will give a brief overview of PubChem’s data content, tools, and services. Important aspects of PubChem as a cheminformatics education resource will be discussed, including data quality and accuracy, data provenance and governance, and structure standardization. We will also discuss PubChem’s education and outreach efforts, including the PubChem Laboratory Chemical Safety Summary (LCSS) and the Cheminformatics On-Line Chemistry Course (OLCC).
PubChem and Its Applications for Drug Discovery (Sunghwan Kim)
Presentation delivered at Lehigh University (Bethlehem, PA) on Friday, April 26, 2019.
This presentation provides a brief introduction to PubChem and discusses how to use PubChem for drug discovery. More detailed information on this topic can be found in the following paper:
Getting the most out of PubChem for virtual screening.
Expert Opin Drug Discov. 2016 Aug 5; 11(9):843-55.
https://doi.org/10.1080/17460441.2016.1216967
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5045798/
Covers the legal aspects of patenting in India. Explains the difference between patents, trademarks, and copyright. Differentiates between patentable and non-patentable inventions and explains the process of obtaining a patent, with case studies and examples.
This presentation teaches how to search for patents using various patent databases, such as free patent databases, the patent databases of national authorities, and paid patent databases. It also covers the general parts of a patent and why patenting is needed. The presentation was delivered to M.Pharm. students by Mr. Pratik Vora to support them in their dissertation topic search. Hope you find it helpful, too.
Compulsory licensing is when a government allows someone else to produce a patented product or process without the consent of the patent owner or plans to use the patent-protected invention itself.
This presentation provides basic knowledge of prior art searching for those new to the field of IPR. It covers the basics of prior art, file wrapper analysis, not list preparation, and one of the important laws of prior art.
IPR in Life Sciences: Unlock & Harness Your Innovative Potential (Sabuj Kumar Chaudhuri)
Invited lecture on "IPR in Life Sciences: Unlock & Harness Your Innovative Potential", delivered on 9th January 2017 in the Refresher Course in Life Sciences of the UGC-HRDC (University of Calcutta) (thrust area: Challenges and options in Life Science Research in the developing world today) for college and university teachers, held Dec. 23 - Jan. 13, 2017 at the Department of Zoology, University of Calcutta, 35, Ballygunge Circular Road, Kolkata-700019.
Creativity is an enigmatic issue. It is influenced and governed by so many determinants that it has yet to be properly defined. It has both philosophical and functional perspectives. This presentation deals only with its functional side, which is manifested in tangible forms. IPR and the life sciences have a very complex relationship, which has become more complex with emerging biotechnology and the priorities of industry. The patenting of a life science invention, from the ideation stage to the grant of a patent, is demonstrated in this presentation.
Advanced Bioinks for 3D Printing: A Materials Science Perspective
Topics: the recent emergence of 3D printing technology in tissue engineering; design parameters for advanced bioink development; multimaterial bioinks for 3D printing.
Cheminformatics Education with PubChem (Sunghwan Kim)
Presented on November 13, 2020, as part of the "Integrating Bioinformatics Education Series" (https://ualr.edu/bioinformatics/education-series/), organized by the Arkansas IDeA Network of Biomedical Research Excellence (Arkansas INBRE) (https://inbre.uams.edu/).
Sunghwan Kim
National Library of Medicine, National Institutes of Health, Rockville, Maryland, United States
Exploiting PubChem for drug discovery based on natural products (Sunghwan Kim)
Presented at the 256th American Chemical Society (ACS) National Meeting in Boston, MA (August 19, 2018).
==== Abstract ====
PubChem is one of the largest sources of publicly available chemical information, with more than 242.3 million depositor-provided substance descriptions, 94.7 million unique chemical structures, and 234.8 million bioactivity outcomes from 1.25 million assays covering around ten thousand unique protein target sequences. This presentation provides an overview of PubChem’s data, tools, and services useful for drug discovery based on natural products.
PubChem contains a large amount of bioactivity data, most of which are generated from high-throughput screening (HTS). However, these data also include a substantial amount of bioactivity information extracted from scientific articles published in journals in the chemical biology, medicinal chemistry, and natural product domains, thanks to data contribution by other databases like ChEMBL, Guide to Pharmacology, BindingDB, and PDBbind. In addition, through data integration with other databases such as DrugBank, HSDB, and HMDB, PubChem contains a wide range of annotations useful for drug discovery, including pharmacology, toxicology, drug target, metabolism, chemical vendors, scientific articles, patents, and many others.
PubChem supports various types of chemical structure searches, including identity search, 2-D and 3-D similarity searches, substructure and superstructure searches, and molecular formula search. It also provides multiple programmatic access routes, including E-Utilities, Power User Gateway (PUG), PUG-SOAP, PUG-REST, and PUG-View, allowing one to build an automated workflow that takes advantage of information contained in PubChem. In addition, through PubChemRDF, users can integrate PubChem’s data with their own in-house data on a local computing machine.
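To make the idea of an automated workflow more concrete, here is a minimal sketch of one PUG-REST request: looking up computed properties for a compound by name. The helper names are hypothetical, the `PropertyTable` / `Properties` layout is an assumption about the JSON response, and the abstract itself does not prescribe any particular client code.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG-REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties):
    """Build a PUG-REST URL requesting properties of a compound identified by name."""
    encoded_name = quote(name, safe="")  # percent-encode spaces and other special characters
    return f"{PUG_REST}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


def extract_properties(payload):
    """Return the per-compound property records from a parsed response (assumed layout)."""
    return payload.get("PropertyTable", {}).get("Properties", [])


def fetch_properties(name, properties, timeout=30):
    """Download and parse the property table; this call needs network access."""
    with urlopen(property_url(name, properties), timeout=timeout) as response:
        return extract_properties(json.load(response))


if __name__ == "__main__":
    print(property_url("aspirin", ["MolecularFormula", "MolecularWeight"]))
```

Keeping URL construction separate from the network call makes the request easy to inspect and to reuse in a larger script, for example when looping over a list of compound names with a pause between requests.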
Presented online at KSEA - Virginia Washington Metro Regional Conference 2020 (VWMRC 2020) (May 9, 2020)
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource, visited by millions of unique users per month. It contains chemical data from more than 700 data sources and disseminates these data to the public free of charge. Arguably, it is the largest source of publicly available chemical information, containing more than 250 million depositor-provided substance descriptions, 100 million unique chemical structures, and 260 million bioactivity outcomes from one million assays covering around ten thousand unique protein target sequences. This presentation provides an overview of PubChem’s data, tools, and services useful for drug discovery.
The immense quantity of bioactivity data in PubChem can be used to develop computational models to predict bioactivities of small molecules. While these data are primarily generated from high-throughput screening (HTS), they also include a substantial amount of bioactivity information extracted from peer-reviewed journal articles. In addition, through data integration with other databases, PubChem has a wide range of annotations useful for drug discovery, including pharmacology, toxicology, drug target, metabolism, chemical vendors, scientific articles, patents, and many others.
PubChem supports various types of chemical structure searches, including identity, 2-D and 3-D similarity, substructure, superstructure, and molecular formula. It also provides multiple programmatic access routes, including E-Utilities, Power User Gateway (PUG), PUG-SOAP, PUG-REST, and PUG-View, allowing one to build an automated workflow that takes advantage of information contained in PubChem. In addition, through PubChemRDF, users can integrate PubChem data with their own.
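The structure search types listed above can also be requested through PUG-REST. The sketch below only assembles the request URL for a search that uses an existing compound (by CID) as the query; the function names are hypothetical, and the `IdentifierList` / `CID` response layout is an assumption about the JSON returned.

```python
from urllib.parse import urlencode

# Base address of PubChem's PUG-REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Synchronous structure-search operations exposed by PUG-REST.
SEARCH_TYPES = {
    "fastidentity",
    "fastsimilarity_2d",
    "fastsimilarity_3d",
    "fastsubstructure",
    "fastsuperstructure",
}


def structure_search_url(search_type, cid, **options):
    """Build a URL that searches with compound `cid` as the query and returns matching CIDs."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search type: {search_type}")
    url = f"{PUG_REST}/compound/{search_type}/cid/{int(cid)}/cids/JSON"
    # Options such as Threshold (similarity cutoff) or MaxRecords go in the query string.
    return f"{url}?{urlencode(options)}" if options else url


def extract_cids(payload):
    """Return the list of hit CIDs from a parsed response (assumed layout)."""
    return payload.get("IdentifierList", {}).get("CID", [])


if __name__ == "__main__":
    # 2-D similarity search around aspirin (CID 2244) with a 90% cutoff.
    print(structure_search_url("fastsimilarity_2d", 2244, Threshold=90, MaxRecords=10))
```

Queries given as SMILES strings need extra care with URL encoding, which is why this sketch restricts itself to CID input.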
PubChem as a resource for chemical information education (Sunghwan Kim)
Presented at the Fall 2020 American Chemical Society (ACS) National Meeting (Virtual) on August 20, 2020.
Sunghwan Kim & Evan Bolton
National Library of Medicine, National Institutes of Health, Rockville, Maryland, United States
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource that contains one of the largest corpora of publicly available chemical information. It is one of the top five most visited chemistry web sites in the world, with more than four million unique users per month (as of April 2020). Considering that many PubChem users are undergraduate students at academic institutions, PubChem has great potential as an online resource for chemical education. However, it also has some important issues with data accuracy, data provenance, structure standardization, terminologies, and so on, because PubChem is essentially a data aggregator that collects heterogeneous data from 700+ data sources in various domains. This presentation will discuss various aspects of PubChem as a chemical information education resource. In particular, it will focus on how to help students develop the ability to critically assess chemical information available in PubChem and other public databases.
PubChem: a public chemical information resource for big data chemistry (Sunghwan Kim)
Presented at the Joint Statistical Meetings (JSM) 2020 (virtual) on August 3, 2020.
==== Abstract ====
The idea of “big data” has recently been drawing much attention from the scientific community as well as the general public. An example of big data in chemistry is the data contained in PubChem, which is a public database of chemical substance descriptions and their biological activities at the National Institutes of Health. PubChem is a sizeable system with 235 million depositor-provided substance descriptions, 96 million unique chemical structures, 1.1 million biological assays, and 268 million biological activity result outcomes. It also contains significant amounts of scientific research data and the inter-relationships between chemicals, proteins, genes, scientific literature, patents, and more. PubChem resources have been used in many studies for developing bioactivity and toxicity prediction models, discovering multi-target ligands, and identifying new macromolecule targets of compounds (for drug-repurposing or off-target side effect prediction). This presentation provides an overview of how PubChem’s data, tools, and services can be used for bioassay data analysis and virtual screening (VS) and discusses important aspects of exploiting PubChem for drug discovery.
Presented at the Bioinformatics Seminar at the University of Arkansas, Little Rock on November 5, 2021.
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical database at the National Library of Medicine, National Institutes of Health. Arguably, PubChem is one of the largest chemical information resources in the public domain, with 111 million unique chemical structures, 1.39 million biological assays, and 292 million biological activity result outcomes. It also contains significant amounts of scientific research data and the inter-relationships between chemicals, proteins, genes, scientific literature, patents, and more. PubChem is a key resource for big data in chemistry and has been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecule targets of compounds (for drug-repurposing or off-target side effect prediction). It has also been used for cheminformatics education as well as chemical health and safety training. This presentation provides a high-level overview of PubChem’s data, tools, and services.
PubChem: A Public Chemical Information Resource for Big Data Chemistry (Sunghwan Kim)
A web-seminar jointly organized by KWSE (Korean Woman Scientists & Engineers) and KWiSE (Korean-American Women in Science and Engineering). Presented on July 27, 2021.
PubChem for chemical information literacy training (Sunghwan Kim)
Presented at the American Chemical Society Fall 2021 National Meeting (August 23, 2021; virtual).
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource that collects chemical information from 780+ data sources. It is visited by millions of users every month, many of whom are undergraduate or graduate students at academic institutions. While PubChem has great potential as an online resource for chemical education, it also has important issues that are not familiar to students and educators, including data accuracy, data provenance, structure standardization, terminologies, etc. In this presentation, various aspects of PubChem as a chemical education resource will be discussed, with a special emphasis on how to help students develop chemical information literacy skills.
PubChem as a resource for chemical information training (Sunghwan Kim)
Presented at the 257th American Chemical Society (ACS) National Meeting in Orlando, FL (March 31, 2019). [CINF 13]
==== Abstract ====
Libraries at many large academic institutions provide chemical information training programs for students. However, these programs are based on commercial chemical information resources, which come with non-trivial subscription fees. These fees are often too expensive for small organizations, including many primarily undergraduate institutions (PUIs) and community colleges (CCs). This leads to disparity among students in access to chemical information as well as in learning opportunities. This issue may be addressed at least in part by developing free online training programs based on public chemical databases, such as PubChem (https://pubchem.ncbi.nlm.nih.gov). PubChem has great potential as an online resource for chemical education, but it also has important issues that students and teachers should keep in mind, such as data accuracy, data provenance, structure standardization, terminologies, and so on. In this presentation, we will discuss various aspects of PubChem as a resource for chemical information training.
Searching for chemical information using PubChem (Sunghwan Kim)
Presented at the 257th American Chemical Society (ACS) National Meeting in Orlando, FL (April 1, 2019). [CHED 303]
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical database, which provides information on a broad range of chemical entities, including small molecules, lipids, carbohydrates, and (chemically-modified) amino acid and nucleic acid sequences (including siRNA and miRNA). With three million unique users per month at peak, PubChem is ranked as one of the most visited chemistry websites in the world. A substantial number of PubChem users are between ages 18 and 24 and are likely to be undergraduate or graduate students at academic institutions. Therefore, PubChem has great potential as an online resource for chemical education. In this talk, we will present “PubChem Search”, a new web interface that allows users to quickly find desired chemical information. This interface supports chemical name search as well as various types of chemical structure search, including identity/similarity search, superstructure/substructure search, and molecular formula search. Using PubChem Search, it is also possible to search for journal articles or patent documents that mention a given chemical. The hits returned from a search can be downloaded to local machines or further refined or analyzed in conjunction with other PubChem tools and services. In this presentation, we will demonstrate how the PubChem Search interface can be used to search beyond Google for chemical information of interest.
Non-targeted analysis (NTA) uses high-resolution mass spectrometry to better understand the identity of a wide variety of chemicals present in environmental samples (and other matrices). However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. Analysis of the resultant mass spectrometry information relies on cheminformatics to identify and rank chemicals and the US EPA has developed functionality within the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) to address challenges related to this analysis. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will review how the CompTox Chemicals Dashboard via its flexible search capabilities, rich data for ~900,000 chemical substances, and visualization approaches within this open chemistry resource provides a freely available software tool to support structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Researchers at EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The goal of this research program is to quickly evaluate thousands of chemicals, but at a much reduced cost and shorter time frame relative to traditional approaches. The data generated by the Center includes characterization of thousands of chemicals across hundreds of high-throughput screening assays, consumer use and production information, pharmacokinetic properties, literature data, physical-chemical properties as well as the predictive computational modeling of toxicity and exposure. We have developed a number of databases and applications to deliver the data to the public, academic community, industry stakeholders, and regulators. This presentation will provide an overview of our work to develop an architecture that integrates diverse large-scale data from the chemical and biological domains, our approaches to disseminate these data, and the delivery of models supporting predictive computational toxicology. In particular, this presentation will review our new CompTox Chemistry Dashboard and the developing architecture to support real-time property and toxicity endpoint prediction. This abstract does not reflect U.S. EPA policy.
This presentation was given at the University of North Carolina at Chapel Hill on 9/20/21. It gave a general introduction to cheminformatics before showing how to navigate the Dashboard.
• An introduction to the dashboard
• Substances vs structures
• Structure formats for data exchange and connectivity (SMILES, InChIs, molfiles)
• Identifiers – CASRN, chemical names, systematic names
• Data curation approaches: substance-structure ambiguity
• ChemReg: substance registration
• Data gathering for systematic reviews
• Curated lists
• Properties/Fate and Transport
• Access to Exposure Data
• Hazard data in the dashboard – ToxVal data (sourced from >40 databases, >50,000 chemicals, >900,000 data points)
• The Executive Summary of data
• Single chemical searches vs Batch searches
Presentation given at the Federal Environment Symposium on March 28th 2022.
As part of its mission, the Center for Computational Toxicology and Exposure (CCTE) delivers access to chemical-related data via online Dashboards. The CompTox Chemicals Dashboard (available at https://comptox.epa.gov/dashboard) provides access to >900,000 chemicals and associated data, including experimental and predicted property data, in vivo hazard data, in vitro bioactivity data, exposure data, and various other data types. The application provides a set of flexible searches allowing for search, visualization, and download of the data to the desktop for further interrogation. This presentation will provide an overview of the Dashboard and other proof-of-concept applications. For example, the Hazard Comparison Dashboard has a module which allows profiling of chemicals based on toxicity types (https://doi.org/10.1007/s10098-019-01795-w). This presentation will also introduce a number of proof-of-concept modules in development. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are many tens of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data, etc., and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy, and completeness are lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online, and expanding the information available to include data such as reaction syntheses, analytical data, and experimental properties. ChemSpider has now grown into a database of over 20 million chemical substances integrated with over 300 disparate data sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry, and to provide access to a set of online tools and services to support access to these data. I will also discuss how ChemSpider is being used to enhance Semantic Publishing in Chemistry at RSC.
The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program utilizes computational and data-driven approaches that integrate chemistry, exposure and biological data to help characterize potential risks from chemical exposure. The National Center for Computational Toxicology (NCCT) has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences, including high-throughput in vitro screening data, in vivo and functional use data, exposure models and chemical databases with associated properties. The CompTox Chemicals Dashboard website provides access to data associated with ~900,000 chemical substances. New data are added on an ongoing basis, including the registration of new and emerging chemicals, data extracted from the literature, chemicals studied in our labs, and data of interest to specific research projects at the EPA. Hazard and exposure data have been assembled from a large number of public databases and as a result the dashboard surfaces hundreds of thousands of data points. Other data includes experimental and predicted physicochemical property data, in vitro bioassay data for over 4000 chemicals and ~1500 assays, and millions of chemical identifiers (names and CAS Registry Numbers) to facilitate searching. Other integrated modules include an interactive read-across module, real-time physicochemical and toxicity endpoint prediction and an integrated search to PubMed. This presentation will provide an overview of the CompTox Chemicals Dashboard and how it has developed into an integrated data hub for environmental data. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
PubChem for drug discovery in the age of big data and artificial intelligence (Sunghwan Kim)
Presented at the American Chemical Society Middle Atlantic Regional Meeting (MARM) 2021 (June 10, 2021).
==== Abstract ====
With the emergence of the age of big data and artificial intelligence, biomedical research communities have a great interest in exploiting the massive amount of chemical and biological data available in the public domain. PubChem (https://pubchem.ncbi.nlm.nih.gov) is one of the largest sources of publicly available chemical information, with +270 million substance descriptions, +110 million unique compounds, +285 million bioactivity outcomes from more than one million biological assay experiments. PubChem provides a wide range of chemical information, including structure, pharmacology, toxicology, drug target, metabolism, chemical vendors, patents, regulations, clinical trials, and many others. These contents can be accessed interactively through web browsers as well as programmatically using computer scripts. They can also be downloaded in bulk through the PubChem File Transfer Protocol (FTP) site. PubChem data has been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecule targets of compounds (for drug-repurposing or off-target side effect prediction). This presentation provides an overview of PubChem data, tools, and services useful for drug discovery.
Cheminformatics Online Chemistry Course (OLCC): A Community Effort to Introdu...
Sunghwan Kim
Presented at the American Chemical Society (ACS) Spring 2021 National Meeting (Virtual, April 16, 2021).
==== Abstract ====
Computer and informatics skills to handle an ever-increasing amount of chemical information are considered important for students pursuing STEM careers in the age of big data. However, many schools do not offer a cheminformatics course or alternative training opportunities. The Cheminformatics Online Chemistry Course (OLCC) is a community effort to introduce cheminformatics content into the undergraduate chemistry curriculum. It is a highly collaborative teaching project involving instructors at multiple schools as well as external cheminformatics experts recruited across sectors, including academia, government, and industry. Three Cheminformatics OLCCs were offered in the Fall 2015, Spring 2017, and Fall 2019 semesters. In each OLCC, the instructors at participating schools would meet face-to-face with the students, while external cheminformatics experts engaged through online discussions across campuses with both the instructors and students. All the material created in the course has been made available at the open education repositories of LibreTexts and CCCE websites for other institutions to adapt to their future needs. This presentation describes the instructional approaches of the Cheminformatics OLCC project and the lessons learned from this community effort. We also discuss future directions for this project as well as cheminformatics education in general, including pedagogy, resources, and course content.
PubChem as an Emerging Toxicological Information Resource
Sunghwan Kim
Presented on October 20, 2020 at the 9th American Society for Cellular and Computational Toxicology (ASCCT) National Meeting.
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource at the U.S. National Institutes of Health. It collects chemical information from 750+ data sources and disseminates it to the public free of charge. Arguably, PubChem contains the largest amount of chemical information available in the public domain, with more than 265 million depositor-provided substance descriptions, 100 million unique chemical structures, and 270 million bioactivity outcomes from one million assays covering around twenty thousand unique protein target sequences.
Included in the many types of content in PubChem is toxicological information about chemicals, e.g., human and animal toxicity, ecotoxicity, exposure limits, exposure symptoms, and antidote & emergency treatment. Notably, a substantial amount of toxicological information from resources formerly offered by the TOXicology data NETwork (TOXNET) is now integrated into PubChem, e.g., the Hazardous Substances Data Bank (HSDB), Genetic Toxicology Data Bank (Gene-Tox), Chemical Carcinogenesis Research Information System (CCRIS), LactMed, and LiverTox. In addition, PubChem contains a large amount of bioactivity and toxicity screening data that can be used to build toxicity prediction models based on statistical and machine-learning approaches. This presentation provides an overview of PubChem’s toxicological information and describes how open data in PubChem can be used to develop prediction models for chemical toxicity.
Presented at the Fall 2020 American Chemical Society (ACS) National Meeting (Virtual) on August 20, 2020.
Sunghwan Kim, Jian Zhang, Paul Thiessen, Asta Gindulyte, Pertti J. Hakkinen & Evan Bolton
National Library of Medicine, National Institutes of Health, Rockville, Maryland, United States
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource at the U.S. National Institutes of Health. It collects chemical information from 700+ data sources and disseminates the collected data to the public free of charge. Arguably, PubChem contains the largest amount of chemical information available in the public domain, with more than 250 million depositor-provided substance descriptions, 100 million unique chemical structures, and 265 million bioactivity outcomes from one million assays covering around twenty thousand unique protein target sequences.
Included in the many types of content in PubChem is toxicological information about chemicals, e.g., human and animal toxicity, ecotoxicity, exposure limits, exposure symptoms, and antidote & emergency treatment. Notably, a substantial amount of toxicological information from resources formerly offered by the TOXicology data NETwork (TOXNET) is now integrated into PubChem, e.g., the Hazardous Substances Data Bank (HSDB), LactMed, and LiverTox. In addition, PubChem contains a large amount of bioactivity and toxicity screening data that can be used to build toxicity prediction models based on statistical and machine-learning approaches. This presentation provides an overview of PubChem’s toxicological information as well as tools and services that help users exploit this information. It also describes how open data in PubChem can be used to develop prediction models for chemical toxicity.
Chemical Health and Safety Information in PubChem
Sunghwan Kim
Presented at the 258th American Chemical Society (ACS) National Meeting in San Diego, CA (August 26, 2019).
Risk assessment in laboratories requires ready access to health and safety (H&S) information for many different chemicals used in laboratory work. Because chemical H&S data in the public domain are scattered across many websites, it is essential to create a centralized data repository that collects, organizes, and disseminates these data. An example is PubChem (https://pubchem.ncbi.nlm.nih.gov), developed and maintained by the U.S. National Institutes of Health.
PubChem contains a substantial corpus of H&S information of chemicals collected from authoritative government agencies and international organizations. PubChem’s H&S data include flammability, toxicity, exposure limits, exposure symptoms, first aid, handling, clean-up procedure, GHS symbols, and more. In addition, for 100,000+ compounds, PubChem provides a tailored data view called the Laboratory Chemical Safety Summary (LCSS), which presents pertinent H&S data for a given compound. The complete list of chemicals with an LCSS can be accessed through the PubChem LCSS project webpage (https://pubchemdocs.ncbi.nlm.nih.gov/lcss/) or the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72). If desired, LCSS data can be downloaded from the LCSS page for each compound, or in bulk from the PubChem LCSS project webpage, enabling local annotation of the data to support specific procedures in place at an institution. The LCSS page can be readily accessed from a mobile device using a chemical QR code.
Chemical Structure Standardization and Synonym Filtering in PubChem
Sunghwan Kim
Presented at the 258th American Chemical Society (ACS) National Meeting in San Diego, CA (August 26, 2019).
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical data repository that provides information on various chemical entities, including small molecules, siRNA, miRNA, peptides, lipids, carbohydrates, chemically modified biologics, etc. One of the most commonly requested tasks in PubChem is to search for a compound by chemical name (also commonly called “chemical synonym”). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors to PubChem. These name-structure associations are used to create links between chemicals and Medical Subject Headings (MeSH) terms, which in turn are used to generate associations between chemicals and PubMed articles. The accuracy of these depositor-provided synonym-structure associations is dependent upon two important quality control methods used in PubChem: (1) chemical structure standardization and (2) synonym filtering based on crowd voting. In this presentation, we will discuss the two quality control methods and their effects on the chemical synonym-structure associations.
Presentation delivered at Lehigh University (Bethlehem, PA) on Friday, April 26, 2019.
This presentation begins by discussing the history of the cheminformatics field. It then addresses the question "What makes cheminformatics different from bioinformatics?" by comparing the ways in which molecules are described and compared in the two fields.
Development of machine learning-based prediction models for chemical modulato...
Sunghwan Kim
Presented at the 2018 Research Festival at the National Institutes of Health (NIH) in Bethesda, MD (September 13, 2018).
==== Abstract ====
The retinoid X receptor (RXR) is a nuclear hormone receptor that functions as a transcription factor with roles in development, cell differentiation, metabolism, and cell death. Chemicals that interfere with the RXR signaling pathway may cause adverse effects on human health. In this study, public-domain bioactivity data available in PubChem (https://pubchem.ncbi.nlm.nih.gov) were used to develop machine learning-based prediction models for chemical modulators of RXR-alpha, which is a subtype of RXR that plays a role in metabolic signaling pathways, dermal cysts, cardiac development, insulin sensitization, etc. The models were constructed from quantitative high-throughput screening (qHTS) data from the Tox21 project, using popular supervised machine learning methods (including support vector machine, random forest, neural network, k-nearest neighbors, decision tree, and naïve Bayes). The general applicability of the developed models was evaluated with external data sets from ChEMBL and the NCATS Chemical Genomics Center (NCGC). This study showcases how open data in the public domain can be used to develop prediction models for bioactivity of small molecules.
Using open bioactivity data for developing machine-learning prediction models...
Sunghwan Kim
Presented at the 256th American Chemical Society (ACS) National Meeting in Boston, MA (August 22, 2018).
==== Abstract ====
The retinoid X receptor (RXR) is a nuclear hormone receptor that functions as a transcription factor with roles in development, cell differentiation, metabolism, and cell death. Chemicals that interfere with the RXR signaling pathway may cause adverse effects on human health. In this study, open bioactivity data available at PubChem (https://pubchem.ncbi.nlm.nih.gov) were used to develop prediction models for chemical modulators of RXR-alpha, which is a subtype of RXR that plays a role in metabolic signaling pathways, dermal cysts, cardiac development, insulin sensitization, etc. The models were constructed from quantitative high-throughput screening (qHTS) data from the Tox21 project, using various supervised machine learning methods (including support vector machine, random forest, neural network, k-nearest neighbors, decision tree, and naïve Bayes). The performance of the models was evaluated with an external data set containing bioactivity data submitted by ChEMBL and the NCATS Chemical Genomics Center (NCGC). This study showcases how open data in the public domain can be used to develop prediction models for chemical toxicity.
NCBI Minute: Integrating PubChem into Your Chemistry Teaching
Sunghwan Kim
NCBI Webinar delivered online on May 9, 2018.
PubChem is one of the most visited chemistry web sites in the world, with more than 2.9 million unique users per month. This NCBI Minute shows how you can integrate PubChem into your chemistry teaching as a cheminformatics education resource. In addition to learning about tools and services for search, analysis, and download of chemical information, you will see how PubChem has been incorporated into the Cheminformatics OLCC (On-Line Chemistry Course), an intercollegiate hybrid course.
How can you access PubChem programmatically?
Sunghwan Kim
Presented at the 255th American Chemical Society (ACS) National Meeting in New Orleans, LA (March 19, 2018).
Building automated workflows that exploit the vast amount of data contained in PubChem requires programmatic access to the data through application programming interfaces (APIs). PubChem provides several programmatic access routes to its data, including Entrez Utilities (E-Utilities or E-Utils), PubChem Power User Gateway (PUG), PUG-SOAP, PUG-REST, PUG-View, and a REST-ful interface to PubChemRDF. This presentation provides an overview of these programmatic access tools, including recent updates, limitations, usage policies, and best practices.
*References*
(1) PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem, Nucleic Acids Research, 2015, 43(W1):W605–W611. https://doi.org/10.1093/nar/gkv396
(2) An update on PUG-REST: RESTful interface for programmatic access to PubChem, Nucleic Acids Research, 2018, 46(W1):gky294. https://doi.org/10.1093/nar/gky294
Searching for patent information in PubChem
1. Searching for Patent Information in PubChem
Sunghwan Kim (sunghwan.kim@nih.gov),
Paul Thiessen, Asta Gindulyte, Evan Bolton
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
ACS Fall 2018 National Meeting in Boston, MA
Sunday, August 19, 2018
2. 2
What is PubChem?
NIH’s chemical information resource.
Collects public-domain chemical data from >620 data sources.
Disseminates it back to the public free of charge.
Data collection from:
• Government agencies
• University labs
• Publishers
• Pharma companies
• Chemical vendors
• Public databases
Data dissemination (free of charge) to:
• The public
3. 3
Data organization in PubChem
Data contributors make two kinds of deposition:
• Substance deposition: depositor-provided substance descriptions. Unique chemical structures are extracted from them through standardization.
• Assay deposition: depositor-provided bioactivity test results. The activity of tested “substances” is used to derive the activity of the associated “compounds”.
4. 4
Data organization in PubChem
• Substance ID (SID): identifies a depositor-provided substance description (substance deposition).
• Compound ID (CID): identifies a unique chemical structure extracted through standardization.
• Assay ID (AID): identifies a depositor-provided assay (assay deposition).
The activity of tested “substances” is carried over to the “compounds” derived from the associated “substances”.
5. 5
PubChem (https://pubchem.ncbi.nlm.nih.gov)
PubChem contains:
• >247.2 million substance descriptions,
• >96.4 million unique chemical structures,
• >236.7 million biological activity test results
• >1.25 million biological assays, covering >10,000 unique protein sequence targets.
The largest collection of publicly available chemical information from >620 data sources.
(as of August 15, 2018)
11. 11
How to access PubChem patent information
1. How to find patent information for a given chemical.
2. How to find chemicals mentioned in a given patent document.
3. How to retrieve all chemicals with patent information.
4. How to search for chemicals with patent information through:
• Identity/similarity search
• Substructure/superstructure search
5. How to retrieve chemicals associated with a patent classification.
6. How to access patent information programmatically.
13. 13
Compound Summary page
Provides an aggregated view of all information available in PubChem for a given chemical.
Can be accessed:
• from various search/analysis tools
• via a simple URL ending with the CID or common chemical name
(ex) aspirin (CID 2244)
https://pubchem.ncbi.nlm.nih.gov/compound/2244
https://pubchem.ncbi.nlm.nih.gov/compound/aspirin
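The URL pattern above can be sketched in a few lines of Python. This is only an illustration: the helper name `compound_summary_url` is hypothetical, and percent-encoding the identifier is an assumption made so that names containing spaces remain a single path segment.

```python
from urllib.parse import quote

# Base address taken from the two example URLs on this slide.
COMPOUND_BASE = "https://pubchem.ncbi.nlm.nih.gov/compound"


def compound_summary_url(identifier):
    """Return the Compound Summary URL for a CID (int) or a chemical name (str).

    Hypothetical helper for illustration; not part of any PubChem library.
    """
    # Assumption: percent-encode the identifier so it stays one path segment.
    return f"{COMPOUND_BASE}/{quote(str(identifier), safe='')}"


print(compound_summary_url(2244))       # https://pubchem.ncbi.nlm.nih.gov/compound/2244
print(compound_summary_url("aspirin"))  # https://pubchem.ncbi.nlm.nih.gov/compound/aspirin
```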
14. 14
Compound Summary page
Includes patent information on a given chemical.
• Drug patents from FDA Orange Book and DrugBank
• Depositor-provided patents that mention the chemical
• WIPO International Patent Classification
• Related records with patent information
25. 25
Patent View
PubChem generates the Patent View page for a patent document available in PubChem.
The Patent View provides:
• Patent title and abstract
• Inventor and applicant
• Application and publication dates
• List of chemicals mentioned
• Patent classification information based on the WIPO International Patent Classification (IPC).
26. 26
Patent View
Accessible via a simple web address containing the patent number at the end.
(ex) The Patent View page for EP0521471:
https://pubchem.ncbi.nlm.nih.gov/patent/EP0521471
It can also be accessed through several PubChem tools and services such as:
• Compound Summary
• PubChem Search
• Classification Browser
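As a minimal sketch of the web address described on this slide, the snippet below builds a Patent View link from a patent number. The function name `patent_view_url` is hypothetical, and trimming whitespace and percent-encoding the number are assumptions rather than documented PubChem behavior.

```python
from urllib.parse import quote

# Base address taken from the EP0521471 example on this slide.
PATENT_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/patent"


def patent_view_url(patent_id):
    """Return the Patent View URL for a patent number such as 'EP0521471'.

    Hypothetical helper for illustration; not part of any PubChem library.
    """
    # Assumption: strip stray whitespace and keep the ID as one path segment.
    return f"{PATENT_VIEW_BASE}/{quote(patent_id.strip(), safe='')}"


print(patent_view_url("EP0521471"))  # https://pubchem.ncbi.nlm.nih.gov/patent/EP0521471
```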
58. 58
Classification Browser
(https://pubchem.ncbi.nlm.nih.gov/classification)
Browse PubChem data using a classification of interest.
Search for records annotated with the desired classification/term.
Available ontologies/classifications:
• MeSH
• ChEBI
• FDA Pharm Classes
• KEGG
• LIPID MAPS classification system for lipids
• PubChem Compound Table of Contents
• PubChem BioAssay Classification
• WHO ATC Code (Anatomical Therapeutic Chemical Classification System)
• WIPO International Patent Classification
• ……
67. 67
PUG-REST
Representational State Transfer (REST)-style interface.
Simplified access route without the overhead of XML or SOAP envelopes.
Access to data that are not accessible through other PUG services.
Intended to handle short, synchronous requests (<30 seconds).
68. 68
URL construction for a PUG-REST request
https://pubchem.ncbi.nlm.nih.gov/rest/pug/<INPUT>/<OPERATION>/<OUTPUT>[?OPTIONS]
• Prolog (https://pubchem.ncbi.nlm.nih.gov/rest/pug): common to all PUG-REST requests.
• <INPUT>: specifies identifiers of interest (by identifiers, by chemical name, by chemical structure search, by cross reference, by listkey, ......).
• <OPERATION>: specifies what to do with the input (get full records, get molecular properties, get synonyms or images, get cross references, many other operations).
• <OUTPUT>: specifies the desired output format (XML, JSON, JSONP, ASNB, ASNT, PNG, SDF, CSV, TXT).
• [?OPTIONS]: options specific to some operations.
The three parts are (mostly) independent of each other.
Many different requests can be constructed by combining them.
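The URL template on this slide can be expressed as a small Python helper. This is a sketch under stated assumptions: `pug_rest_url` is a hypothetical name, and the property names in the example (`MolecularFormula`, `MolecularWeight`) come from the PUG-REST documentation rather than from these slides.

```python
from urllib.parse import quote, urlencode

# Prolog shared by all PUG-REST requests, as shown in the template above.
PROLOG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(input_parts, operation_parts, output, options=None):
    """Assemble <PROLOG>/<INPUT>/<OPERATION>/<OUTPUT>[?OPTIONS].

    Hypothetical helper: input_parts and operation_parts are sequences of
    path segments, output is a format such as "TXT", "CSV" or "JSON".
    """
    # Percent-encode each segment; commas are kept because they separate
    # list values (for example, several property names).
    segments = [quote(str(p), safe=",") for p in [*input_parts, *operation_parts]]
    url = "/".join([PROLOG, *segments, output])
    if options:
        url += "?" + urlencode(options)
    return url


# Input: compound by CID. Operation: get molecular properties. Output: CSV.
print(pug_rest_url(["compound", "cid", 2244],
                   ["property", "MolecularFormula,MolecularWeight"],
                   "CSV"))
```

Keeping the three parts as separate arguments mirrors the slide's point that they are mostly independent: the same input can be paired with a different operation or output format without touching the rest of the call.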
69. 69
https://pubchem.ncbi.nlm.nih.gov/rest/pug/<INPUT>/<OPERATION>/<OUTPUT>[?OPTIONS]
Prolog
(common to all PUG REST requests)
Options specific to
some operations
<INPUT>
Specifies identifiers of interest,
by identifiers
by chemical name
by chemical structure search
by cross reference
by listkey, ......
<OPERATION>
Specifies what to do with input
get full records
get molecular properties
get synonyms or images
get cross references
many other operations
<OUTPUT>
Specifies desired output format
XML PNG
JSON SDF
JSONP CSV
ASNB TXT
ASNT
URL construction for a PUG-REST request
Retrieve all Patent IDs associated with CID 2244.
https://......../rest/pug/compound/cid/2244/xrefs/PatentID/TXT
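A request like this one could be issued from Python roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the host is taken from the prolog shown on the URL-construction slide, and the TXT response is assumed to contain one Patent ID per line. The 30-second timeout reflects the note that PUG-REST is intended for short, synchronous requests.

```python
from urllib.request import urlopen

# Prolog from the URL-construction slide.
PROLOG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def patent_ids_url(cid):
    """Build the xrefs/PatentID request URL for one compound ID (CID)."""
    return f"{PROLOG}/compound/cid/{int(cid)}/xrefs/PatentID/TXT"


def parse_txt_list(text):
    """Split a TXT response into non-empty, stripped lines.

    Assumes one identifier per line in the response body.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_patent_ids(cid, timeout=30):
    """Download and parse the Patent IDs linked to a CID (needs network access)."""
    with urlopen(patent_ids_url(cid), timeout=timeout) as response:
        return parse_txt_list(response.read().decode("utf-8"))
```

Calling `fetch_patent_ids(2244)` would return a list of Patent ID strings. The list can be long for heavily patented compounds, so it may be worth writing it to a file rather than printing it.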
70. 70
URL construction for a PUG-REST request
Retrieve all compounds associated with Patent US20050159403A1.
https://....../rest/pug/compound/xref/PatentID/US20050159403A1/cids/TXT
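The reverse lookup, from a patent to its compounds, might be scripted as below. As before, this is only a sketch: the function names are hypothetical, the host comes from the prolog on the URL-construction slide, and the TXT response is assumed to list one CID per line.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Prolog from the URL-construction slide.
PROLOG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_for_patent_url(patent_id):
    """Build the cross-reference request URL for one Patent ID."""
    return f"{PROLOG}/compound/xref/PatentID/{quote(patent_id, safe='')}/cids/TXT"


def parse_cids(text):
    """Turn a whitespace-separated list of CIDs into sorted, unique integers."""
    return sorted({int(token) for token in text.split()})


def fetch_cids_for_patent(patent_id, timeout=30):
    """Download and parse the CIDs linked to a patent (needs network access)."""
    with urlopen(cids_for_patent_url(patent_id), timeout=timeout) as response:
        return parse_cids(response.read().decode("utf-8"))
```

For instance, `fetch_cids_for_patent("US20050159403A1")` would return the CIDs linked to that patent document. Converting them to integers makes it straightforward to pass them on to further CID-based requests.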
72. 72
Limitations
1. PubChem does not directly extract information from
patents. Instead, it relies on voluntary contributions from
data sources.
• Lag time between PubChem and original data sources.
• If the data sources are wrong, so is PubChem.
• PubChem does not cover all patent documents.
2. Not all patents worldwide are considered.
• Primary focus on USPTO
• EPO, WIPO, JPO
73. 73
Limitations
3. Multiple patent documents about a single invention (e.g.,
with different kind codes) are aggregated into a single
patent view page.
• It is not possible to tell which of the aggregated
documents mentions a given chemical.
4. Only WIPO IPC is available.
• Cooperative Patent Classification (CPC) information is
not available at this time.
75. 75
21 M unique compounds associated with 6.9 M
patents from five data sources, including:
• SureChEMBL
• IBM
• SCRIPDB
• NextMove
• BindingDB
On the Summary page for each compound
• Patent IDs
• Patent Classifications
• FDA Orange Book patents
• Structurally similar compounds with patent
information
76. 76
Various search types for chemicals with patent
information are supported.
• Text (chemical name) search
• Substructure/superstructure search
• Identity/similarity search
Classification browser to retrieve compounds with a
given patent classification
Programmatic access to patent information through
PUG-REST
77. 77
Acknowledgements
Evan Bolton
Jie Chen
Tiejun Cheng
Asta Gindulyte
Jia He
Siqian He
Qingliang Li
Benjamin Shoemaker
Paul Thiessen
Bo Yu
Leonid Zaslavsky
Jian Zhang
The PubChem Team
PubChem depositors, users, and collaborators
Funded by the National Library of Medicine