Presented at the Fall 2020 American Chemical Society (ACS) National Meeting (Virtual) on August 20, 2020.
Sunghwan Kim & Evan Bolton
National Library of Medicine, National Institutes of Health, Rockville, Maryland, United States
==== Abstract ====
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource that contains one of the largest corpora of publicly available chemical information. It is one of the top five most visited chemistry web sites in the world, with more than four million unique users per month (as of April 2020). Considering that many PubChem users are undergraduate students in academic institutions, PubChem has great potential as an online resource for chemical education. However, it also has some important issues with data accuracy, data provenance, structure standardization, terminology, and so on, because PubChem is essentially a data aggregator that collects heterogeneous data from 700+ data sources in various domains. This presentation will discuss various aspects of PubChem as a chemical information education resource. In particular, it will focus on how to help students develop the ability to critically assess chemical information available in PubChem and other public databases.
PubChem as a resource for chemical information education
1. PubChem as a Resource for
Chemical Information Education
ACS Fall 2020 Virtual Meeting
August 20, 2020
Sunghwan Kim, Ph.D., M.Sc.
2. 2
PubChem (https://pubchem.ncbi.nlm.nih.gov)
Public chemical database.
Developed and maintained by
the U.S. National Institutes of Health.
Contains various chemical entities:
• Small molecules
• siRNAs & miRNAs
• Carbohydrates
• Lipids
• Peptides
• Chemically modified
macromolecules
• ……
3. 3
PubChem (https://pubchem.ncbi.nlm.nih.gov)
Collects chemical information from 750+ data sources
and disseminates it to the public free of charge.
103 million unique chemical structures.
Crosslinks to many other databases.
Search, analysis, download and visualization tools.
A key resource in many areas:
• Cheminformatics
• Chemical biology
• Medicinal chemistry
• Drug discovery
5. 5
Top 5 Chemistry Websites
1. acs.org
2. rsc.org
3. sigmaaldrich.com
4. pubchem.ncbi.nlm.nih.gov
5. cas.org
Source: https://www.alexa.com/topsites/category/Top/Science/Chemistry
PubChem is the only public website among them.
PubChem Usage Statistics
6. 6
~36% of PubChem users are between 18 and 24 years old
(likely to be college students).
[Figure: Users by Age (April 2020). Bar chart; y-axis: % number of users (0% to 40%); x-axis: age group (18-24, 25-34, 35-44, 45-54, 55-64, 65+).]
PubChem Usage Statistics
7. 7
Popularity:
Many young people are already using PubChem.
Sustainability:
It is sixteen years old and not going away soon.
Zero-cost (to students):
U.S. taxpayers have already paid for it.
Strong potential as an educational resource,
especially for small organizations like:
• primarily undergraduate institutions (PUIs)
• community colleges (CCs)
9. 9
What about R1 universities with large endowments?
Likely to have access to proprietary databases.
• Primarily used for research.
• Inconvenient off-campus access.
• Students will lose access when they graduate.
Most students will eventually rely on public resources.
Need for training/education opportunities while in
school.
10. 10
Exploring Chemical Information in PubChem
1. Search by chemical name
2. Search by chemical structure
3. Search by gene/protein name
4. PubChem Periodic Table and Element pages
5. Programmatic access
34. 34
Kim et al., Chem. Teacher International, 2020. doi:10.1515/cti-2020-0006
39. 39
[Figure: Ionization energy (eV, 0 to 30) versus atomic number (0 to 100). Labeled elements: He, Ne, Ar, Kr, Xe, Rn and Li, Na, K, Rb, Cs, Fr.]
Kim et al., Chem. Teacher International, 2020. doi:10.1515/cti-2020-0006
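The periodic trend shown in the ionization-energy plot on this slide can be explored with a few lines of Python. The sketch below is an illustration only and is not taken from the presentation: the first ionization energies for elements 1 to 20 are typed in by hand from standard reference tables (not downloaded from PubChem), the function name `sharp_drops` is hypothetical, and the 5 eV threshold is an arbitrary choice for this example.

```python
# First ionization energies (eV) for elements 1-20, rounded to three decimals.
# Assumption: values entered by hand from standard reference tables.
IONIZATION_ENERGY_EV = [
    ("H", 13.598), ("He", 24.587), ("Li", 5.392), ("Be", 9.323), ("B", 8.298),
    ("C", 11.260), ("N", 14.534), ("O", 13.618), ("F", 17.423), ("Ne", 21.565),
    ("Na", 5.139), ("Mg", 7.646), ("Al", 5.986), ("Si", 8.152), ("P", 10.487),
    ("S", 10.360), ("Cl", 12.968), ("Ar", 15.760), ("K", 4.341), ("Ca", 6.113),
]


def sharp_drops(series, min_drop_ev=5.0):
    """Return (element, next_element) pairs where the ionization energy
    falls by at least min_drop_ev between consecutive atomic numbers."""
    drops = []
    for (sym_a, ie_a), (sym_b, ie_b) in zip(series, series[1:]):
        if ie_a - ie_b >= min_drop_ev:
            drops.append((sym_a, sym_b))
    return drops


# Each large drop marks the step from a noble gas to the following alkali metal.
period_boundaries = sharp_drops(IONIZATION_ENERGY_EV)
print(period_boundaries)
```

For these twenty elements, the only large drops occur between a noble gas and the alkali metal that follows it, which matches the peaks and valleys labeled in the plot.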
41. 41
Why should students learn programmatic
access?
PubChem users have very diverse
backgrounds/interests.
PubChem’s web interfaces are optimized to perform
commonly requested tasks interactively.
Everything you can do with PubChem through the web
browser can be automated through PubChem’s
programmatic interfaces.
Programmatic access enables one to do much more
complicated and specialized tasks that cannot be
done through the web browser.
42. 42
Why should students learn programmatic
access?
Programming skills are essential for:
• automating routine tasks and
• processing/analyzing a large data set
Important skills for students pursuing STEM careers in
the age of big data.
43. 43
Programmatic Access to PubChem
Multiple programmatic access routes.
Two major programmatic access methods.
• PUG-REST (primarily for computed properties).
Kim et al., Nucleic Acids Res. 2018, 46(W1):W563-570.
https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
• PUG-View (primarily for text information).
Kim et al., J. Cheminform. 2019, 11:56.
https://pubchemdocs.ncbi.nlm.nih.gov/pug-view
Jupyter notebooks containing sample code (in
Python/R) are freely available at LibreTexts:
https://chem.libretexts.org/link?143689
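As a hedged illustration of the two access routes named on this slide, the following Python sketch builds request URLs for PUG-REST and PUG-View. The helper names (`pug_rest_property_url`, `pug_view_url`, `fetch_json`) and the example compound (aspirin, CID 2244) are choices made for this illustration and do not come from the presentation. The URL patterns follow the public PUG-REST and PUG-View documentation, which should be consulted for current syntax and usage limits.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def pug_rest_property_url(name, properties):
    """Build a PUG-REST URL requesting computed properties for a compound name."""
    prop_list = ",".join(properties)
    encoded_name = quote(name, safe="")  # percent-encode spaces and symbols
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{prop_list}/JSON"


def pug_view_url(cid):
    """Build a PUG-View URL for the annotation record of one compound (by CID)."""
    return f"{PUG_VIEW_BASE}/data/compound/{int(cid)}/JSON"


def fetch_json(url, timeout=30):
    """Download a URL and parse the body as JSON (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only the URLs are built here; no request is sent until fetch_json() is called.
rest_url = pug_rest_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
view_url = pug_view_url(2244)
print(rest_url)
print(view_url)
```

With network access, `fetch_json(rest_url)` would retrieve the computed properties and `fetch_json(view_url)` the text annotations. Keeping URL construction separate from the download step makes the request logic easy to inspect and to reuse in a loop over many compounds.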
45. 45
Cheminformatics OLCC
Unique challenges to teaching cheminformatics
Cheminformatics is not an established chemistry field.
Chemistry + Informatics + Computer Science
+ Library Science + Pharmaceutical Science + ……
Few faculty members have cheminformatics
expertise.
No textbook suitable for undergraduate chemistry students.
The Cheminformatics OLCC addresses these issues!
49. 49
Course website
Cheminformatics
experts
Prepare online reading materials &
homework problem sets
Course
Instructor
Students
Run the course
using the course materials
at multiple schools
Face-to-face
meeting
Online discussion among
experts, instructors, & students
through the website
Cheminformatics OLCC
50. 50
It was offered three times:
Fall 2015: 36 students from 4 schools
Spring 2017: 47 students from 9 schools
Fall 2019: 23 students from 5 schools
All course materials are available at:
CCCE website (http://olcc.ccce.divched.org)
LibreTexts (https://libretexts.org)
(free online textbook site)
Many of the course materials cover PubChem data, tools
and services.
Cheminformatics OLCC
51. 51
PubChem-related topics in Cheminformatics OLCC
Critical assessment of chemical information
Chemical representations (e.g., InChI and SMILES)
• As alternatives to chemical name queries
• For chemical data exchange/integration/sharing
Search by chemical name
Search by chemical structure
• Identity search
• 2-D/3-D similarity search
• Substructure/superstructure search
• Molecular formula search
Structure clustering and structure-activity relationship analysis
Automation of chemical data retrieval through computer code
Cheminformatics OLCC
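Several of the structure-search types listed on this slide map onto PUG-REST requests that take a SMILES string or a molecular formula as the query. The Python sketch below builds such request URLs. It is an illustration under stated assumptions, not course material: the function names and the `SEARCH_OPERATIONS` table are hypothetical, the operation keywords (`fastidentity`, `fastsimilarity_2d`, `fastsubstructure`, `fastsuperstructure`, `fastformula`) and the `Threshold` option are taken from the PUG-REST documentation as the author of this sketch understands it, and percent-encoding the SMILES in the URL path is an assumption (the documentation describes alternatives for SMILES with special characters).

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Search type -> PUG-REST operation keyword (see the hedges in the lead-in).
SEARCH_OPERATIONS = {
    "identity": "fastidentity",
    "similarity_2d": "fastsimilarity_2d",
    "substructure": "fastsubstructure",
    "superstructure": "fastsuperstructure",
}


def structure_search_url(search_type, smiles, threshold=None):
    """Build a URL that asks for the CIDs matching a SMILES-based structure search."""
    operation = SEARCH_OPERATIONS[search_type]
    encoded = quote(smiles, safe="")  # assumption: percent-encoded SMILES in the path
    url = f"{PUG_REST_BASE}/compound/{operation}/smiles/{encoded}/cids/JSON"
    if threshold is not None:
        # Similarity cutoff in percent; only meaningful for similarity searches.
        url += f"?Threshold={int(threshold)}"
    return url


def formula_search_url(formula):
    """Build a URL that asks for the CIDs matching a molecular formula."""
    return f"{PUG_REST_BASE}/compound/fastformula/{quote(formula, safe='')}/cids/JSON"


substructure_url = structure_search_url("substructure", "c1ccccc1")
similarity_url = structure_search_url("similarity_2d", "CCO", threshold=90)
formula_url = formula_search_url("C6H6")
print(substructure_url)
print(similarity_url)
print(formula_url)
```

Using a line notation such as SMILES as the query, rather than a chemical name, avoids the ambiguity of synonyms and is what makes these searches straightforward to automate.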
52. 52
Many PubChem users are likely to be college students.
Summary
PubChem has strong potential as a resource for
chemical information training because of its:
• popularity
• sustainability
• low cost
53. 53
Summary
PubChem supports various use cases beyond simple
chemical name search.
• Search by chemical structure
• Search by gene/protein name
• PubChem Periodic Table and Element pages
• Programmatic access
54. 54
Summary
PubChem works with the chemical education community
to provide chemical information training for students.
Please reach out to us for collaboration if you are
interested.
55. 55
Acknowledgements
Evan Bolton
Jie Chen
Tiejun Cheng
Asta Gindulyte
Jia He
Siqian He
Qingliang (Leon) Li
Benjamin Shoemaker
Paul Thiessen
Olga Pujolras
Bo Yu
Leonid Zaslavsky
Jian (Jeff) Zhang
Zhi (Leon) Sun
The PubChem Team
PubChem users, depositors, and collaborators
Funded by the National Library of Medicine