Biologics information in PubChem
1. Biologics information in PubChem
Jian Zhang*, Paul Thiessen, Tiejun Cheng, Ben Shoemaker, Evan Bolton,
Noel O'Boyle, Roger Sayle
2019 Fall ACS National Meeting and Expo, San Diego
2. Biologics: definitions
Biologics (biological products):
• a wide range of products such as vaccines, blood and blood components, tissues ...
• can be composed of sugars, proteins, lipids, or nucleic acids, or complex combinations of these substances
• isolated from a variety of natural sources (human, animal, or microorganism) ...
• produced by biotechnology methods and other cutting-edge technologies ...
• used to treat a variety of medical conditions for which no other treatments are available ...
3. Biologics: large to small
• Biologics – large molecules, can be composed of sugars, proteins, or
nucleic acids or complex combinations of these substances.
• Information on sugars, proteins, nucleic acids, and peptides is important for
biologic studies.
Example: vaccination, in which a viral or bacterial antigen stimulates the body to
produce antibodies
4. Biologics: large to small
• Biologics extension – small biopolymers (oligomers, or repeat units) play
an important role in biologic studies.
• Small molecules – fewer than 1000 atoms … biopolymers: components of proteins,
glycans, nucleotides…
• Extended (PubChem) definition:
The structure contains recognized biopolymer monomers (glycan, lipid,
amino acid, nucleotide…)
5. Outline
• PubChem brief
• Biologic information in PubChem - line notations from Sugar &
Splice
• Data access and retrieval
• NCBI Glycans
• Summary
6. PubChem brief
• An open chemistry database
• A public chemical information repository
• A chemical information hub
Contents: chemical structures, depictions and
notations, properties, drug information, food
additives, safety, toxicity, targets, pathways,
bioactivities, literature, patents, and more
7. PubChem brief
• PubChem keeps growing
Data Collection     Live Item Count
Compounds           95,753,185
Substances          234,916,398
BioAssays           1,340,534
Bioactivities       265,373,498
Gene Targets        58,029
Protein Targets     17,847
Taxonomy Targets    3,746
Literature          29,876,654
Patents             3,142,716
12. Biologics in PubChem – structure can be very complex
E.g. Teriparatide
(Teriparatide is a recombinant human parathyroid hormone analogue that is
used to treat osteoporosis in women or men with a high risk for bone fracture)
https://pubchem.ncbi.nlm.nih.gov/compound/Teriparatide
13. Sugar & Splice – Generate line notations for biologics
Teriparatide
14. Biologics in PubChem – saccharides example
G(M1)-Oligosaccharide:
https://pubchem.ncbi.nlm.nih.gov/compound/G(M1)-Oligosaccharide
15. Biologics in PubChem – peptide lipids example
(2S,3S)-3-methyl-2-[[(2S)-2-[[2-[[2-(tetradecanoylamino)acetyl]amino]acetyl]amino]propanoyl]amino]pentanoic acid
CID 138810998
https://pubchem.ncbi.nlm.nih.gov/compound/138810998
Line notations from “Sugar and Splice” simplify complex structure
information into a form that is readable by both humans and
computers.
16. Biologics in PubChem – browse and download
PubChem classification browser: Compound TOC tree
https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72
17. Biologics in PubChem – browse and download
1) Start from the PubChem homepage
2) Click “Browse Data” to launch the PubChem classification
browser
3) In the dropdown menu, choose “PubChem”, then “PubChem
Compound TOC”
18. Biologics in PubChem – data accessing
• Website:
1. Text search (Google, PubChem ..)
2. PubChem structure search
• Programmatic: PUG View API
https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/6918011/XML?heading=Biologic%20Description
URL parts: CID (6918011), format (XML), heading (Biologic Description)
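The PUG View URL above has three variable parts: the CID, the output format, and the heading. A minimal Python sketch of how such a URL could be assembled is given here. The helper name `pug_view_url` is hypothetical (it is not part of any PubChem client library), and the sketch assumes only the URL pattern shown on the slide.

```python
from urllib.parse import quote

# Base path taken from the example URL on the slide.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound"


def pug_view_url(cid, heading, fmt="XML"):
    """Build a PUG View URL for one compound and one heading (hypothetical helper)."""
    # quote() percent-encodes the space in the heading as %20.
    return f"{PUG_VIEW_BASE}/{int(cid)}/{fmt}?heading={quote(heading)}"


# Reproduces the URL for CID 6918011 (Lanreotide) used in the slides.
print(pug_view_url(6918011, "Biologic Description"))
```

Keeping the CID, format, and heading as separate arguments makes it easy to request the same heading for other compounds.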
19. Biologics in PubChem – data accessing
Example: CID 6918011 - Lanreotide
https://pubchem.ncbi.nlm.nih.gov/compound/6918011#section=Biologic-Description
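The web link for this example points directly at one section of the compound page. As a rough sketch, such a link could be built in Python as follows. The helper name `compound_section_url` is hypothetical, and the rule that spaces in the heading become hyphens is an assumption generalized from this single example.

```python
def compound_section_url(cid, heading):
    """Link to one section of a PubChem compound page (hypothetical helper)."""
    # Assumption: the section anchor is the heading with spaces replaced by
    # hyphens, as in "Biologic Description" -> "Biologic-Description".
    anchor = heading.replace(" ", "-")
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{int(cid)}#section={anchor}"


# Reproduces the Lanreotide link from the slide.
print(compound_section_url(6918011, "Biologic Description"))
```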
20. Biologics in PubChem – data accessing
PUG View API for the same compound:
https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/6918011/XML?heading=Biologic%20Description
URL parts: CID (6918011), format (XML), heading (Biologic Description)
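Once the PUG View record has been downloaded, the text under a heading can be pulled out by walking the nested sections. The Python sketch below assumes the JSON output format (requested with `JSON` in place of `XML` in the URL) and assumes the usual nested layout of `Record`, `Section`, `TOCHeading`, `Information`, `Value`, and `StringWithMarkup` keys. The function names and the sample record are made up for illustration; the sample is not real PubChem data.

```python
import json
from urllib.request import urlopen


def fetch_record(url):
    """Download a PUG View JSON document (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def collect_strings(node, heading, inside=False):
    """Recursively gather text values found under sections titled `heading`."""
    found = []
    if isinstance(node, dict):
        # Once a matching heading is seen, everything nested below it counts.
        inside = inside or node.get("TOCHeading") == heading
        if inside:
            for info in node.get("Information", []):
                markup = info.get("Value", {}).get("StringWithMarkup", [])
                found.extend(item["String"] for item in markup if "String" in item)
        for child in node.get("Section", []):
            found.extend(collect_strings(child, heading, inside))
        if "Record" in node:
            found.extend(collect_strings(node["Record"], heading, inside))
    return found


# Made-up sample with the assumed layout; the values are placeholders.
sample = {
    "Record": {
        "RecordType": "CID",
        "RecordNumber": 6918011,
        "Section": [{
            "TOCHeading": "Some parent heading",
            "Section": [{
                "TOCHeading": "Biologic Description",
                "Information": [
                    {"Value": {"StringWithMarkup": [{"String": "PLACEHOLDER-NOTATION"}]}}
                ],
            }],
        }],
    }
}
print(collect_strings(sample, "Biologic Description"))
```

With network access, `collect_strings(fetch_record(url), "Biologic Description")` could be applied to a live record in the same way, where `url` is the PUG View address with `JSON` as the format.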
21. NCBI Glycans
The NCBI Glycans website was created in 2016 as a joint project between
PubChem and the Glycan Informatics Advisory Group (an international group).
• Glycan information resource summary
• Definitions of the carbohydrate monomer depictions
• Links to various resources and tools
22. NCBI Glycans – https://www.ncbi.nlm.nih.gov/glycans/
The main page contains a brief introduction, links to other pages
and external resources
Symbol Nomenclature for Glycans (SNFG)
23. NCBI Glycans – https://www.ncbi.nlm.nih.gov/glycans/snfg.html
• The SNFG (symbol nomenclature for glycans) page provides
carbohydrate monomer depictions, useful resource links, and
SNFG examples.
26. Summary
• PubChem provides biologic information for more than 1.5
million compounds.
• The line notations created using “Sugar and Splice” simplify
complex structure information into a form that is readable by both
humans and computers.
• The biologic information in PubChem can be accessed and
retrieved both through the website and programmatically.
• The NCBI Glycans website provides a great resource for glycan
studies.
27. Thank you
This research was supported by the Intramural Research
Program of the NIH, National Library of Medicine.
Evan Bolton
Asta Gindulyte
Ben Shoemaker
Paul Thiessen
Siqian He
Bo Yu
Jie Chen
Tiejun Cheng
Jane He
Sunghwan Kim
Leon Li
Leonid Zaslavsky
Collaborators
Noel O'Boyle, NextMove Software
Roger Sayle, NextMove Software
The Glycan Informatics Advisory Group (GlyAG)