PubChem: A Public Chemical Information
Resource for Big Data Chemistry
Sunghwan Kim, Ph.D., M.Sc.
KWSE-KWiSE Joint Web-Seminar
July 27, 2021
2
Outline
1. What Is PubChem?
2. What Does PubChem Have?
3. Exploring Chemical Information in PubChem
4. Programmatic Access to PubChem
5. PubChem for Cheminformatics Education
6. Summary
3
1. What Is PubChem?
4
PubChem Is a Public Chemical Information Resource
▪ https://pubchem.ncbi.nlm.nih.gov
▪ Public chemical database at NIH.
▪ Contains information on various chemical entities:
• (Drug-like) small molecules
• siRNAs & miRNAs
• Carbohydrates
• Lipids
• Peptides
• Chemically modified macromolecules
• ……
5
PubChem Is a Data Aggregator
PubChem Sources: https://pubchem.ncbi.nlm.nih.gov/sources
800+ data sources:
o Gov’t agencies
o Academic institutions
o Publishers
o Pharma companies
o Chemical vendors
o Scientific databases
Users:
o Biomedical researchers
• Chemical biology
• Medicinal chemistry
• Drug design & discovery
• Cheminformatics
o Data scientists
o Patent agents/examiners
o Chemical safety officers
o Educators/librarians
o Students
8
History of PubChem
➢ NIH Molecular Libraries Program (MLP)
▪ Common Fund project.
▪ Aimed to provide academic researchers with high-throughput
screening (HTS) resources for drug discovery.
▪ Had three components:
• Large, shared compound library
• HTS centers at academic institutions
• Central data repository (PubChem)
10
History of PubChem
➢ PubChem was launched in 2004 as a component of MLP.
➢ All Common Fund projects are supported only up to 10 years.
➢ PubChem evolved to play a dual role:
▪ As a data archive
▪ As a knowledgebase
11
Monthly Usage Statistics
(Unique Interactive Users Only)
[Chart: unique monthly users (millions) over time. Source: Google Analytics]
▪ 5 million unique interactive users per month at peak (Oct. 2020)
▪ Programmatic requests are not included.
▪ These statistics are therefore a lower bound.
12
Yearly Interactive Users by Country
(July 2020 – June 2021)
[Bar chart: interactive users per country, in millions; each bar is labeled with a percentage]
United States     29.0%
India             16.4%
China              5.0%
United Kingdom     3.7%
Canada             3.5%
Philippines        3.5%
Germany            2.2%
Japan              2.2%
Australia          1.7%
Brazil             1.5%
South Korea        1.4%
Indonesia          1.2%
France             1.2%
Pakistan           1.1%
Italy              1.1%
13
~75% of PubChem Users Come through Search Engines
14
2. What Does PubChem Have?
15
PubChem Data Content
▪ Structures and properties
▪ Spectra
▪ Chemical health & safety
▪ Bioactivity
▪ Chemical vendors & synthesis
▪ Drugs
▪ Clinical trials
▪ Patents
▪ Scientific articles
24
3. Exploring Chemical Information in PubChem
25
Text Query
▪ Chemical name
▪ Gene/protein name
▪ Pathway name
▪ Patent ID
▪ CAS registry number
▪ PubChem record ID
(CID, SID, AID)
26
Multiple collections are searched simultaneously.
https://pubchem.ncbi.nlm.nih.gov/#query=%22salicylic%20acid%22
27
Compound Summary for salicylic acid (CID 338)
https://pubchem.ncbi.nlm.nih.gov/compound/338
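The two URLs above follow a simple pattern, which can be sketched in Python as below. This is only an illustration of how the links on these slides are formed: the helper names `search_url` and `compound_summary_url` are made up for this example and are not part of any PubChem library.

```python
from urllib.parse import quote

PUBCHEM = "https://pubchem.ncbi.nlm.nih.gov"


def search_url(term: str, exact: bool = True) -> str:
    """Build the web search URL for a text query (hypothetical helper)."""
    # Wrapping the term in double quotes asks for the exact phrase,
    # as in the salicylic acid example on the slide.
    query = f'"{term}"' if exact else term
    # safe="" percent-encodes quotes, spaces, and other reserved characters.
    return f"{PUBCHEM}/#query={quote(query, safe='')}"


def compound_summary_url(cid: int) -> str:
    """Build the Compound Summary page URL for a PubChem Compound ID (CID)."""
    return f"{PUBCHEM}/compound/{int(cid)}"


print(search_url("salicylic acid"))
print(compound_summary_url(338))
```

Both functions only build strings, so they can be used to generate links in bulk (for example, in a spreadsheet export) without sending any request to PubChem.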
28
Chemical Structure Query
▪ SMILES
▪ InChI/InChIKey
29
Multiple types of chemical structure search:
▪ Identity
▪ 2-D similarity
▪ 3-D similarity
▪ Substructure
▪ Superstructure
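Most of these search types can also be requested through PUG-REST, which is introduced later in this talk. The sketch below builds request URLs for the `fastidentity`, `fastsimilarity_2d`, `fastsubstructure`, and `fastsuperstructure` operations with a SMILES query. The function name and the `SEARCH_OPERATIONS` mapping are hypothetical, 3-D similarity is left out, and SMILES strings containing `/` (stereo bonds) may need to be sent differently, so treat this as a starting point rather than a complete client.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Slide wording -> PUG-REST operation name (mapping assumed for this sketch).
SEARCH_OPERATIONS = {
    "identity": "fastidentity",
    "similarity_2d": "fastsimilarity_2d",
    "substructure": "fastsubstructure",
    "superstructure": "fastsuperstructure",
}


def structure_search_url(kind: str, smiles: str, threshold=None) -> str:
    """Build a PUG-REST URL that returns the CIDs matching a structure query."""
    if kind not in SEARCH_OPERATIONS:
        raise ValueError(f"unsupported search type: {kind}")
    # Percent-encode the SMILES so characters such as '=' and '(' survive in the path.
    encoded = quote(smiles, safe="")
    url = f"{PUG_REST}/compound/{SEARCH_OPERATIONS[kind]}/smiles/{encoded}/cids/JSON"
    if kind == "similarity_2d" and threshold is not None:
        # Threshold is the minimum similarity score, in percent.
        url += f"?Threshold={int(threshold)}"
    return url


# Salicylic acid as a SMILES string.
salicylic_acid = "C1=CC=C(C(=C1)C(=O)O)O"
for kind in SEARCH_OPERATIONS:
    print(structure_search_url(kind, salicylic_acid, threshold=95))
```

The URLs can be opened in a browser or fetched from a script; any script should stay within the request volume limits described in the programmatic access section.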
30
Gene/Protein/Pathway Summary
➢ Suppose that you want to:
o Retrieve ALL active compounds
against a given protein/gene/pathway target
(e.g., HMGCR=3-hydroxy-3-methylglutaryl-CoA reductase).
• To identify common chemical scaffolds responsible for bioactivity.
• To build a quantitative structure-activity relationship (QSAR) model.
→Gene/Protein/Pathway Summary
• Provides a target-centric view of PubChem data.
• Organizes all data available in PubChem for a given
gene/protein/pathway.
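The same target-centric question can be approached programmatically in two steps: list the assays (AIDs) that target a gene, then list the compounds (CIDs) tested active in each assay. The sketch below only builds the corresponding PUG-REST URLs. The helper names are invented for this example, and the exact shape of the JSON responses is not shown here, so inspect a response before writing a parser.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assays_for_gene_url(gene_symbol: str) -> str:
    """URL listing the assay IDs (AIDs) whose target is the given gene symbol."""
    return f"{PUG_REST}/assay/target/genesymbol/{quote(gene_symbol, safe='')}/aids/JSON"


def active_compounds_url(aid: int) -> str:
    """URL listing the compound IDs (CIDs) tested active in one assay."""
    return f"{PUG_REST}/assay/aid/{int(aid)}/cids/JSON?cids_type=active"


# Step 1: find assays against HMGCR.
print(assays_for_gene_url("HMGCR"))
# Step 2: for each AID returned by step 1, request its active compounds.
# The AID below is a placeholder, not a real HMGCR assay.
print(active_compounds_url(1000))
```

Collecting the active CIDs across all assays for a target gives a starting set for scaffold analysis or QSAR modeling, as described above.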
31
32
Patent Summary
➢ Suppose that you want to:
o Retrieve ALL chemicals mentioned in a given patent document.
→Patent Summary page
• Provides a list of chemicals “mentioned” in the patent application/grant.
• No information on why they are mentioned
(e.g., as the claimed subject matter or as prior art?).
• Other information, including:
- Title, abstract, date, inventor, …
- International patent classification (IPC) codes
33
34
Classification Browser
▪ https://pubchem.ncbi.nlm.nih.gov/classification
▪ Browse PubChem data using a classification of interest.
▪ Search for records annotated with the desired classification/term.
▪ A few examples of supported ontologies/classifications:
• MeSH (Medical Subject Headings)
• ChEBI (Chemical Entities of Biological Interest)
• FDA Pharm Classes
• PubChem Compound Table of Contents
• PubChem BioAssay Classification
• WHO ATC (Anatomical Therapeutic Chemical Classification System) Code
• WIPO International Patent Classification
35
Classification
Browser
36
37
Other Tools & Services
▪ Identifier Exchange Service
https://pubchemdocs.ncbi.nlm.nih.gov/identifier-exchange-service
▪ Score Matrix Service
https://pubchemdocs.ncbi.nlm.nih.gov/score-matrix-service
▪ Standardization Service
https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi
▪ PubChem Data Sources (https://pubchem.ncbi.nlm.nih.gov/sources)
▪ PubChem Widgets (https://pubchemdocs.ncbi.nlm.nih.gov/widgets)
▪ PubChem Upload (https://pubchem.ncbi.nlm.nih.gov/upload/)
▪ PubChem Blog (https://pubchemblog.ncbi.nlm.nih.gov)
▪ PubChemDocs (https://pubchemdocs.ncbi.nlm.nih.gov)
38
4. Programmatic Access to PubChem
41
➢ PubChem users have very diverse
backgrounds/interests.
➢ PubChem’s web interfaces are optimized
to perform commonly requested tasks
interactively.
➢ Everything you can do with PubChem
through the web browser can be
automated through PubChem’s
programmatic interfaces.
➢ Programmatic access enables one to do
much more complicated tasks that cannot
be done through the web browser.
42
➢ Multiple programmatic access routes:
o Entrez Utilities (E-Utils)
o Power User Gateway (PUG)
o PUG-SOAP
o PUG-REST
o PubChem RDF REST
o PUG-View
➢ Two major programmatic access methods
o PUG-REST (primarily for computed properties).
https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
o PUG-View (primarily for text information).
https://pubchemdocs.ncbi.nlm.nih.gov/pug-view
➢ Request volume limitation:
o No more than 5 requests per second
(See more at: https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access$_RequestVolumeLimitations)
o Violators/abusers may be blocked for a certain period of time.
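As a rough illustration of how a script might respect the 5-requests-per-second limit while calling PUG-REST and PUG-View, the sketch below uses only the Python standard library. The `Throttle` class and the helper names are assumptions made for this example, not part of PubChem's tooling, and the fixed spacing between calls is just one simple way to stay under the limit.

```python
import json
import time
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PUG_VIEW = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


class Throttle:
    """Space out calls so that at most `max_per_second` are made per second."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def property_url(cid, properties):
    """PUG-REST URL for computed properties of one compound."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{','.join(properties)}/JSON"


def annotation_url(cid):
    """PUG-View URL for the annotated (text) record of one compound."""
    return f"{PUG_VIEW}/data/compound/{int(cid)}/JSON"


def fetch_json(url, throttle):
    """Fetch a URL and parse the JSON body, waiting first if calls are too frequent."""
    throttle.wait()
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


print(property_url(338, ["MolecularFormula", "MolecularWeight"]))
print(annotation_url(338))
```

A typical use would create one `Throttle()` and pass it to every `fetch_json` call in the script, for example `fetch_json(property_url(338, ["MolecularFormula"]), throttle)`, so that all requests share the same limit.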
43
➢ Bulk Download
o PubChem FTP Site
ftp://ftp.ncbi.nlm.nih.gov/pubchem
o PubChem RDF (Resource Description Framework)
https://pubchemdocs.ncbi.nlm.nih.gov/rdf
44
5. PubChem for Cheminformatics Education
45
User Demographics
(June 2020 through May 2021)
[Bar chart: number of users (millions) by age group. Chart annotation: 34.64% of total users]
Age group   Share
18-24       36.5%
25-34       27.4%
35-44       13.5%
45-54       10.4%
55-64        6.7%
65+          5.4%
~40% of PubChem users are aged between 18 and 24.
(likely to be college students)
47
PubChem has strong potential as an educational resource,
especially for small institutions such as:
• primarily undergraduate institutions (PUIs)
• community colleges (CCs)
▪ Popularity: many young people are already using PubChem.
▪ Sustainability: it is 17 years old and not going away soon.
▪ Zero-cost (to students): U.S. taxpayers have already paid for it.
48
➢ How about R1 universities with large endowments?
▪ Likely to have access to proprietary databases.
• Primarily used for research.
• Inconvenient off-campus access.
• Students will lose access when they graduate.
▪ Most students will eventually rely on public resources.
→ Need for training/education opportunities while in school.
49
Cheminformatics Online Chemistry Course (OLCC)
▪ Collaborative teaching
▪ Instructors at multiple schools
▪ Cheminformatics experts from outside
▪ Open Education Resource (OER) development
S. Kim et al., J. Chem. Educ. 2021, 98(2), 416–425.
50
Instructional Approach
▪ Cheminformatics experts prepare online reading materials &
homework problem sets (course website).
▪ Course instructors run the course using the course materials
at multiple schools (face-to-face meetings with students).
▪ Online discussion among experts, instructors & students.
53
Enrollment Statistics
Semester      # Schools   # Students   Participating schools [a]
Fall 2015     4           36           UALR (6), Centre (4), WVU (5), UNF (21)
Spring 2017   9           47           UALR (12), Centre (0 [b]), UHSP (3), IQS (6 [c]), SDSU (4), Potsdam (3), UIS (6), Campbell (12), Rutgers (1)
Fall 2019     5           23           UALR (2), Centre (3), UHSP (4), IQS (9 [c]), Otterbein (5)
[a] Numbers in parentheses are the numbers of enrolled students at individual schools.
[b] The course was taken by three faculty and staff members who participated in a faculty learning circle. No students enrolled.
[c] Not formally enrolled as the course was offered as a non-credit seminar.
54
Course Websites
OLCC course websites:
2015 • http://olcc.ccce.divched.org/Fall2015OLCC
• https://chem.libretexts.org/link?50598
2017 • http://olcc.ccce.divched.org/Spring2017OLCC
• https://chem.libretexts.org/link?83678
2019 • https://chem.libretexts.org/link?143689
➢ All course materials are freely available for reuse at:
▪ Committee on Computers in Chemical Education (CCCE) website
(http://olcc.ccce.divched.org)
▪ LibreTexts (https://libretexts.org)
55
PubChem-Related Topics
▪ Critical assessment of chemical information
▪ Chemical representations (e.g., InChI and SMILES)
▪ Search by chemical name
▪ Search by chemical structure
o Identity search
o 2-D/3-D similarity search
o Substructure/superstructure search
▪ Structure clustering
▪ Structure-activity relationship analysis
▪ Automation of chemical data retrieval
57
Python/R Programming (Fall 2019)
▪ Python Jupyter Notebooks
(with sample code and assignments)
o Programmatic access to PubChem data
o Cheminformatics tasks using open-source
software packages
(e.g., RDKit, scikit-learn, and Mordred).
o Bioactivity prediction using machine learning
and PubChem data.
▪ Similar materials for the R language
(using JupyterLab and RStudio)
58
▪ Students had two options to run the notebooks:
o Download and run them on their own computers.
o Run the notebooks on JupyterHub available through LibreTexts.
59
Summary
• PubChem is the largest source of publicly available chemical
information, collected from hundreds of data sources.
• All contents are provided to the public free of charge.
• PubChem contains a wide range of chemical information needed
for drug discovery.
• It is used by more than five million users per month at peak.
• PubChem’s web interface allows typical users to readily perform
commonly requested, simple tasks.
• PubChem supports programmatic access to its data, allowing
advanced users to perform much more complex tasks that are not
supported by the web interfaces.
• PubChem can be downloaded in bulk.
• PubChem data, tools, and services can be used to teach
cheminformatics to college students.
• Please reach out to us for collaboration.
61
Acknowledgements
▪ The PubChem Team
Evan Bolton, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li,
Ben Shoemaker, Zhi Sun, Paul Thiessen, Bo Yu, Leonid Zaslavsky, Jian Zhang
▪ Funding
Intramural Research Program of the National Library of Medicine
62
Thank you for your attention.
Questions?
Sunghwan Kim, Ph.D., M.Sc.
Email: kimsungh@ncbi.nlm.nih.gov
LinkedIn: https://www.linkedin.com/in/sunghwan-kim/
63
❑ References
▪ Getting the most out of PubChem for virtual screening
S. Kim, Expert Opin. Drug Discov. 2016, 11(9), 843–855.
▪ PubChem in 2021: new data content and improved web interfaces
S. Kim et al., Nucleic Acids Res. 2021, 49(D1), D1388–D1395.
▪ An update on PUG-REST: RESTful interface for programmatic access to PubChem
S. Kim et al., Nucleic Acids Res. 2018, 46(W1), W563–W570.
▪ PUG-View: programmatic access to chemical annotations integrated in PubChem
S. Kim et al., J. Cheminform. 2019, 11, 56.
▪ Teaching Cheminformatics through a Collaborative Intercollegiate Online Chemistry Course (OLCC)
S. Kim et al., J. Chem. Educ. 2021, 98(2), 416–425.

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 

Recently uploaded (20)

Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 

PubChem: A Public Chemical Information Resource for Big Data Chemistry

  • 1. PubChem: A Public Chemical Information Resource for Big Data Chemistry Sunghwan Kim, Ph.D., M.Sc. KWSE-KWiSE Joint Web-Seminar July 27, 2021
  • 2. 2 Outline 1. What Is PubChem? 2. What Does PubChem Have? 3. Exploring Chemical Information in PubChem 4. Programmatic Access to PubChem 5. PubChem for Cheminformatics Education 6. Summary
  • 3. 3 1. What Is PubChem?
  • 4. 4 ▪ https://pubchem.ncbi.nlm.nih.gov ▪ Public chemical database at NIH. ▪ Contains information on various chemical entities: • (Drug-like) small molecules • siRNAs & miRNAs • Carbohydrates • Lipids • Peptides • Chemically modified macromolecules • …… PubChem Is a Public Chemical Information Resource
  • 5. 5 PubChem Is a Data Aggregator PubChem Sources: https://pubchem.ncbi.nlm.nih.gov/sources Gov’t agencies Academic institutions Publishers Pharma companies Chemical vendors Scientific databases 800+ data sources Users o Biomedical Researchers • Chemical biology • Medicinal chemistry • Drug design & discovery • Cheminformatics o Data scientists o Patent agents/examiners o Chemical safety officers o Educators/librarians o Students
  • 6. 6 History of PubChem ➢ NIH Molecular Libraries Program (MLP) ▪ Common Fund project.
  • 7. 7 History of PubChem ➢ NIH Molecular Libraries Program (MLP) ▪ Common Fund project. ▪ Aimed to provide academic researchers with high-throughput screening (HTS) resources for drug discovery.
  • 8. 8 History of PubChem ➢ NIH Molecular Libraries Program (MLP) ▪ Common Fund project. ▪ Aimed to provide academic researchers with high-throughput screening (HTS) resources for drug discovery. ▪ Had three components: Large, shared compound library HTS centers at academic institutions Central data repository (PubChem)
  • 9. 9 History of PubChem ➢ PubChem was launched in 2004 as a component of MLP. ➢ All Common Fund projects are supported only up to 10 years. Large, shared compound library HTS centers at academic institutions Central data repository (PubChem)
  • 10. 10 History of PubChem ➢ PubChem was launched in 2004 as a component of MLP. ➢ All Common Fund projects are supported only up to 10 years. ➢ PubChem evolved to play a dual role: ▪ As a data archive ▪ As a knowledgebase Large, shared compound library HTS centers at academic institutions Central data repository (PubChem)
  • 11. 11 Monthly Usage Statistics (Unique Interactive Users Only) [Chart: unique monthly users (millions, 0 to 6) over time. Source: Google Analytics] ▪ 5 million unique interactive users per month at peak (Oct. 2020) ▪ Programmatic requests are not included. ▪ These statistics are a lower bound.
  • 12. 12 Yearly Interactive Users by Country (July 2020 – June 2021) [Bar chart: users in millions (0 to 15), labeled with each country's share] United States 29.0%, India 16.4%, China 5.0%, United Kingdom 3.7%, Canada 3.5%, Philippines 3.5%, Germany 2.2%, Japan 2.2%, Australia 1.7%, Brazil 1.5%, South Korea 1.4%, Indonesia 1.2%, France 1.2%, Pakistan 1.1%, Italy 1.1%
  • 13. 13 ~75% of PubChem Users Come through Search Engines
  • 14. 14 2. What Does PubChem Have?
  • 17. 17 Spectra Chemical health & safety PubChem Data Content Structures and properties
  • 18. 18 Spectra Chemical health & safety Bioactivity PubChem Data Content Structures and properties
  • 19. 19 Spectra Chemical health & safety Bioactivity Chemical vendors & synthesis PubChem Data Content Structures and properties
  • 24. 24 3. Exploring Chemical Information in PubChem
  • 25. 25 Text Query ▪ Chemical name ▪ Gene/protein name ▪ Pathway name ▪ Patent ID ▪ CAS registry number ▪ PubChem record ID (CID, SID, AID)
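A text query by chemical name can also be expressed as a PUG-REST request. The following is a minimal sketch, not the presenter's code: the helper names `name_to_cids_url` and `parse_cids` are hypothetical, and the sample response string is hand-written for illustration rather than fetched from PubChem. It assumes the PUG-REST URL pattern `compound/name/<name>/cids/JSON` and a JSON reply of the form `{"IdentifierList": {"CID": [...]}}`.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name):
    """Build a PUG-REST URL that looks up CIDs for a chemical name (hypothetical helper)."""
    # Percent-encode the name so spaces and slashes are safe inside the URL path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(response_text):
    """Extract the CID list from a JSON reply shaped like {"IdentifierList": {"CID": [...]}}."""
    data = json.loads(response_text)
    return data.get("IdentifierList", {}).get("CID", [])


url = name_to_cids_url("salicylic acid")

# Illustrative response text only (assumption); a real reply would come from an HTTP GET on `url`.
sample_response = '{"IdentifierList": {"CID": [338]}}'
cids = parse_cids(sample_response)
```

Fetching `url` with any HTTP client and passing the body to `parse_cids` would complete the lookup; the network call is left out so the sketch runs offline.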
  • 27. 27 Compound Summary for salicylic acid (CID 338) https://pubchem.ncbi.nlm.nih.gov/ compound/338
  • 29. 29 Multiple types of chemical structure search ▪ Identity ▪ 2-D similarity ▪ 3-D similarity ▪ Substructure ▪ Superstructure
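The five search types listed on this slide have counterparts in PUG-REST. The sketch below only builds request URLs; it assumes the "fast" search operation names (`fastidentity`, `fastsimilarity_2d`, `fastsimilarity_3d`, `fastsubstructure`, `fastsuperstructure`) and the `Threshold` option, which should be checked against the current PUG-REST documentation. The function name `structure_search_url` and the mapping keys are hypothetical.

```python
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Slide search type -> assumed PUG-REST operation name.
SEARCH_OPERATIONS = {
    "identity": "fastidentity",
    "similarity_2d": "fastsimilarity_2d",
    "similarity_3d": "fastsimilarity_3d",
    "substructure": "fastsubstructure",
    "superstructure": "fastsuperstructure",
}


def structure_search_url(kind, cid, **options):
    """Build a URL that searches by structure, using an existing CID as the query."""
    operation = SEARCH_OPERATIONS[kind]  # raises KeyError for an unknown search type
    url = f"{PUG_REST}/compound/{operation}/cid/{int(cid)}/cids/JSON"
    if options:
        # Extra options (for example a similarity threshold) go in the query string.
        url += "?" + urlencode(options)
    return url


# Compounds at least 90% similar (2-D) to salicylic acid (CID 338).
similar_url = structure_search_url("similarity_2d", 338, Threshold=90)
# Compounds that contain salicylic acid as a substructure.
substructure_url = structure_search_url("substructure", 338)
```

Using a CID as the query avoids having to percent-encode a SMILES string; a SMILES- or InChI-based query would need its own encoding step.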
  • 30. 30 Gene/Protein/Pathway Summary ➢ Suppose that you want to: o Retrieve ALL active compounds against a given protein/gene/pathway target (e.g., HMGCR=3-hydroxy-3-methylglutaryl-CoA reductase). • To identify common chemical scaffolds responsible for bioactivity. • To build a quantitative structure-activity relationship (QSAR) model. →Gene/Protein/Pathway Summary • Provides a target-centric view of PubChem data. • Organizes all data available in PubChem for a given gene/protein/pathway.
  • 31. 31
  • 32. 32 Patent Summary ➢ Suppose that you want to: o Retrieve ALL chemicals mentioned in a given patent document. →Patent Summary page • Provides a list of chemicals “mentioned” in the patent application/grant. • No information on why they are mentioned. (e.g., as a subject matter or as a prior art?) • Other information, including: - Title, abstract, date, inventor, … - International patent classification (IPC) codes
  • 33. 33
  • 34. 34 ▪ https://pubchem.ncbi.nlm.nih.gov/classification ▪ Browse PubChem data using a classification of interest. ▪ Search for records annotated with the desired classification/term. ▪ A few examples of supported ontologies/classifications. • MeSH (Medical Subject Headings) • ChEBI (Chemical Entities of Biological Interest) • FDA Pharm Classes • PubChem Compound Table of Contents • PubChem BioAssay Classification • WHO ATC (Anatomical Therapeutic Chemical Classification System) Code • WIPO International Patent Classification Classification Browser
  • 36. 36
  • 37. 37 ▪ Identifier Exchange Service https://pubchemdocs.ncbi.nlm.nih.gov/identifier-exchange-service ▪ Score Matrix Service https://pubchemdocs.ncbi.nlm.nih.gov/identifier-exchange-service ▪ Standardization Service https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi ▪ PubChem Data Sources (https://pubchem.ncbi.nlm.nih.gov/sources) ▪ PubChem Widgets (https://pubchemdocs.ncbi.nlm.nih.gov/widgets) ▪ PubChem Upload (https://pubchem.ncbi.nlm.nih.gov/upload/) ▪ PubChem Blog (https://pubchemblog.ncbi.nlm.nih.gov) ▪ PubChemDocs (https://pubchemdocs.ncbi.nlm.nih.gov) Other Tools & Services
  • 39. 39 ➢ PubChem users have very diverse backgrounds/interests. ➢ PubChem’s web interfaces are optimized to perform commonly requested tasks interactively.
  • 40. 40 ➢ PubChem users have very diverse backgrounds/interests. ➢ PubChem’s web interfaces are optimized to perform commonly requested tasks interactively. ➢ Everything you can do with PubChem through the web browser can be automated through PubChem’s programmatic interfaces.
  • 41. 41 ➢ PubChem users have very diverse backgrounds/interests. ➢ PubChem’s web interfaces are optimized to perform commonly requested tasks interactively. ➢ Everything you can do with PubChem through the web browser can be automated through PubChem’s programmatic interfaces. ➢ Programmatic access enables one to do much more complicated tasks that cannot be done through the web browser.
  • 42. 42 ➢ Multiple programmatic access routes: Entrez Utilities (E-Utils), Power User Gateway (PUG), PUG-SOAP, PUG-REST, PubChem RDF REST, PUG-View ➢ Two major programmatic access methods o PUG-REST (primarily for computed properties). https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest o PUG-View (primarily for text information). https://pubchemdocs.ncbi.nlm.nih.gov/pug-view ➢ Request volume limitation: o No more than 5 requests per second (See more at: https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access$_RequestVolumeLimitations) o Violators/abusers may be blocked for a certain period of time.
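To make the two access methods and the request limit concrete, here is a minimal sketch under stated assumptions. It builds one PUG-REST URL (computed properties) and one PUG-View URL (annotation text) for a CID, and spaces out calls so the average rate stays at or below 5 per second. The class `Throttle` and both helper functions are hypothetical names, the property names are examples, and the spacing approach is one simple way to respect the limit, not PubChem's prescribed method.

```python
import time

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PUG_VIEW = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def pug_rest_property_url(cid, properties, fmt="JSON"):
    """PUG-REST: computed properties for one compound."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{','.join(properties)}/{fmt}"


def pug_view_compound_url(cid, fmt="JSON"):
    """PUG-View: annotation (text) data for one compound."""
    return f"{PUG_VIEW}/data/compound/{int(cid)}/{fmt}"


class Throttle:
    """Enforce a minimum gap between calls (1 / max_per_second seconds)."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock  # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(max_per_second=5)
urls = []
for cid in (338, 2244):
    throttle.wait()  # call this immediately before each HTTP request
    urls.append(pug_rest_property_url(cid, ["MolecularFormula", "MolecularWeight"]))
```

In a real script, each `throttle.wait()` would be followed by an HTTP GET on the URL just built; the request itself is omitted so the sketch runs without network access.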
  • 43. 43 ➢ Bulk Download o PubChem FTP Site ftp://ftp.ncbi.nlm.nih.gov/pubchem o PubChem RDF (Resource Description Framework) https://pubchemdocs.ncbi.nlm.nih.gov/rdf
  • 44. 44 5. PubChem for Cheminformatics Education
  • 45. 45 User Demographics (June 2020 through May 2021) [Bar chart: number of users (millions, 0 to 6) by age group; chart annotation: 34.64% of total users] 18-24: 36.5%, 25-34: 27.4%, 35-44: 13.5%, 45-54: 10.4%, 55-64: 6.7%, 65+: 5.4% ▪ ~40% of PubChem users are aged between 18 and 24 (likely to be college students).
  • 46. 46 ▪ Popularity: many young people are already using PubChem. ▪ Sustainability: it is 17 years old and not going away soon. ▪ Zero-cost (to students): U.S. taxpayers have already paid for it.
  • 47. 47 PubChem presents a strong potential as an education resource especially for small organizations like: • primarily undergraduate institutions (PUIs) • community colleges (CCs) ▪ Popularity: many young people are already using PubChem. ▪ Sustainability: it is 17 years old and not going away soon. ▪ Zero-cost (to students): U.S. taxpayers have already paid for it.
  • 48. 48 ➢ How about R1 universities with large endowments? ▪ Likely to have access to proprietary databases. • Primarily used for research. • Inconvenient off-campus access. • Students will lose access when they graduate. ▪ Most students will eventually rely on public resources. → Need for training/education opportunities while in school.
  • 49. 49 Cheminformatics Online Chemistry Course (OLCC) Collaborative Teaching Instructors at multiple schools Open Education Resource (OER) Development Cheminformatics experts from outside S. Kim et al., J. Chem. Educ. 2021, 98(2), 416–425.
  • 50. 50 Cheminformatics experts Prepare online reading materials & homework problem sets Instructional Approach Course website
  • 51. 51 Cheminformatics experts Prepare online reading materials & homework problem sets Run the course using the course materials at multiple schools Course website Course Instructor Students Instructional Approach
  • 52. 52 Course website Cheminformatics experts Prepare online reading materials & homework problem sets Course Instructor Students Run the course using the course materials at multiple schools Face-to-face meeting Online discussion among experts, instructors & students Instructional Approach
  • 53. 53 Enrollment Statistics (semester: # schools, # students, participating schools [a]) ▪ Fall 2015: 4 schools, 36 students: UALR (6), Centre (4), WVU (5), UNF (21) ▪ Spring 2017: 9 schools, 47 students: UALR (12), Centre (0) [b], UHSP (3), IQS (6) [c], SDSU (4), Potsdam (3), UIS (6), Campbell (12), Rutgers (1) ▪ Fall 2019: 5 schools, 23 students: UALR (2), Centre (3), UHSP (4), IQS (9) [c], Otterbein (5) [a] Numbers in parentheses are the numbers of enrolled students at individual schools. [b] The course was taken by three faculty and staff members who participated in a faculty learning circle. No students enrolled. [c] Not formally enrolled as the course was offered as a non-credit seminar.
  • 54. 54 OLCC Course websites 2015 • http://olcc.ccce.divched.org/Fall2015OLCC • https://chem.libretexts.org/link?50598 2017 • http://olcc.ccce.divched.org/Spring2017OLCC • https://chem.libretexts.org/link?83678 2019 • https://chem.libretexts.org/link?143689 Course Websites ➢ All course materials are freely available for reuse at: ▪ Committee on Computers in Chemical Education (CCCE) website (http://olcc.ccce.divched.org) ▪ LibreTexts (https://libretexts.org)
  • 55. 55 ▪ Critical assessment of chemical information ▪ Chemical representations (e.g., InChI and SMILES) ▪ Search by chemical name ▪ Search by chemical structure o Identity search o 2-D/3-D similarity search o Substructure/superstructure search ▪ Structure clustering ▪ Structure-activity relationship analysis ▪ Automation of chemical data retrieval PubChem-Related Topics
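As a small illustration of the 2-D similarity and clustering topics, the sketch below scores structures with the Tanimoto coefficient, a common choice for comparing molecular fingerprints. The slides do not say which measure the course used, so treat this as an assumption. The fingerprints are made-up sets of "on" bit positions, not real PubChem fingerprints, and the molecule names are placeholders.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either fingerprint."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Toy fingerprints (assumption): each set holds the positions of bits that are set.
query = {1, 4, 7, 9}
candidates = {
    "mol_A": {1, 4, 7, 9},  # identical to the query
    "mol_B": {1, 4, 8},     # shares two bits with the query
    "mol_C": {2, 3},        # shares none
}

# Rank candidates from most to least similar to the query.
ranked = sorted(candidates, key=lambda name: tanimoto(query, candidates[name]), reverse=True)
```

A similarity search keeps the candidates whose score meets a chosen threshold, and the same pairwise scores can serve as input to a clustering step.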
  • 56. 56 ▪ Python Jupyter Notebooks (with sample codes and assignments) o Programmatic access to PubChem data o Cheminformatics tasks using open-source software packages (e.g., RDKit, scikit-learn, and Mordred). o Bioactivity prediction using machine learning and PubChem data. Python/R Programming (Fall 2019)
  • 57. 57 ▪ Python Jupyter Notebooks (with sample codes and assignments) o Programmatic access to PubChem data o Cheminformatics tasks using open-source software packages (e.g., RDKit, scikit-learn, and Mordred). o Bioactivity prediction using machine learning and PubChem data. ▪ Similar materials for the R language (using JupyterLab and R-Studio) Python/R Programming (Fall 2019)
  • 58. 58 ▪ Students had two options to run the notebooks: o Download and run them on their own computers. o Run the notebooks on JupyterHub available through LibreTexts. Python/R Programming (Fall 2019)
  • 59. 59 • PubChem is the largest source of publicly available chemical information, collected from hundreds of data sources. • All contents are provided to the public free of charge. • PubChem contains a wide range of chemical information, necessary for drug discovery. • It is used by more than five million users per month at peak. Summary
  • 60. 60 • PubChem’s web interfaces allow average users to readily perform commonly requested, simple tasks. • PubChem supports programmatic access to its data, allowing advanced users to perform much more complex tasks that are not supported by the web interfaces. • PubChem data can be downloaded in bulk. • PubChem data, tools, and services can be used to teach cheminformatics to college students. • Please reach out to us for collaboration. Summary
  • 61. 61 Acknowledgements ▪ The PubChem Team ▪ Funding Evan Bolton Jia He Thiessen Paul Zhi Sun Jie Chen Siqian He Bo Yu Tiejun Chung Qingliang Li Leonid Zaslavsky Asta Gindulyte Ben Shoemaker Jian Zhang Intramural Research Program of the National Library of Medicine
  • 62. 62 Thank you for your attention. Questions? Sunghwan Kim, Ph.D., M.Sc. Email: kimsungh@ncbi.nlm.nih.gov LinkedIn: https://www.linkedin.com/in/sunghwan-kim/
  • 63. 63 ❑ References ▪ Getting the most out of PubChem for virtual screening S. Kim, Expert Opin. Drug Discov. 2016, 11(9), 843-855. ▪ PubChem in 2021: new data content and improved web interfaces S. Kim et al., Nucleic Acids Res. 2021, 49(D1):D1388–D1395. ▪ An update on PUG-REST: RESTful interface for programmatic access to PubChem S. Kim et al., Nucleic Acids Res. 2018, 46(W1):W563-W570. ▪ PUG-View: programmatic access to chemical annotations integrated in PubChem S. Kim et al., J. Cheminform. 2019, 11:56. ▪ Teaching Cheminformatics through a Collaborative Intercollegiate Online Chemistry Course (OLCC) S. Kim et al., J. Chem. Educ. 2021, 98(2), 416–425.