Chemical Structure Standardization and
Synonym Filtering in PubChem
Sunghwan Kim, Ph.D., M.Sc.
ACS National Meeting in San Diego, CA
(August 26, 2019)
PubChem
(https://pubchem.ncbi.nlm.nih.gov)
PubChem
▪ Public chemical information resource
▪ Collects data from more than 690 sources
▪ Disseminates data back to the public free of charge
▪ Contains the largest amount of publicly available chemical information
▪ Faces unique challenges in dealing with many big data issues on a daily basis:
• Chemical structure standardization
• Name-structure association clean up
Data Organization in PubChem
[Diagram: 690+ data contributors make substance depositions and assay depositions.
Substance ID (SID): depositor-provided substance descriptions.
Assay ID (AID): depositor-provided bioactivity test results, i.e., the activity of tested “substances”.
Compound ID (CID): unique chemical structures, extracted from substances through standardization; the activity of “compounds” is derived from the associated “substances”.]
▪ Individual data depositors provide PubChem with:
• Chemical structures
• Chemical names (synonyms)
▪ They need to be organized/cleaned up through:
• Structure standardization
• Synonym filtering
Common Issues with
Chemical Structure Representations in
PubChem
Drawing conventions
Drawing conventions are often ignored in
structures deposited by original data sources.
Aromatic Compounds
[Figure: the same aromatic ring drawn three ways, labeled “Kekulé 1”, “Kekulé 2”, and “aromatic”]
Many Kekulé structures for aromatic compounds
Which one should be used as a standard?
Different Forms of the Same Molecule
[Figure: forms of the same molecule related by tautomerism, ionization, and mesomerism]
Different tautomers, resonance forms, protonation states!
Choose the most stable one?
[Figure: one form labeled “Most stable in vacuum”, another labeled “Most stable in water”]
The stability depends upon the context.
PubChem
Chemical Structure Standardization
PubChem Standardization
Detect components
• Isolate covalent units
• Neutralize (by ±H+ or e-)
• Reprocess
• Detect unique components
Normalize representation
• Tautomer invariance
• Aromaticity detection
• Stereochemistry
• Explicit hydrogen
Validate chemical contents
• Atoms defined/real
• Implicit hydrogen
• Functional group
• Atom valence
Calculate
• Coordinates
• Properties
• Descriptors
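The "Detect components" step can be pictured with a small string-level sketch. This is not PubChem's implementation: it only assumes that the input is a SMILES string in which "." separates disconnected components and no ring-closure bond spans a dot, and the function names and the example salt are hypothetical.

```python
from collections import Counter

def split_components(smiles: str) -> list[str]:
    """Split a SMILES string into its dot-separated components.

    Assumption: no ring-closure digits pair up across a '.', so each
    part really is a separate covalent unit.
    """
    return [part for part in smiles.split(".") if part]

def unique_components(smiles: str) -> Counter:
    """Count each distinct component.

    Assumption: components are compared as raw strings. A real
    standardizer would compare canonical structures instead.
    """
    return Counter(split_components(smiles))

# Disodium succinate written as three disconnected units.
salt = "[Na+].[Na+].[O-]C(=O)CCC([O-])=O"
print(split_components(salt))
print(unique_components(salt))
```

Neutralization, reprocessing, and the other steps on the slide are not modeled here.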
J. Cheminform. (2018) 10:36
Standardization Statistics
• ~90% of the substances are subject to standardization.
• Mostly organic compounds.
• Standardization success rate: 99.64%
• Modification rate: 44.43%
J. Cheminform. (2018) 10:36
Standardized Structures
[Figure: three forms of the same molecule, labeled “Most stable in vacuum”, “Most stable in water”, and “Standardized by PubChem”]
It is not necessarily what one may expect.
▪ In most cases, tautomeric forms of a molecule are standardized into a single form.
▪ There are a few exceptions.
[Figure: CID 18630 and CID 31261, related by tautomerization]
Standardization and Structure Identity Search
▪ You can search PubChem using a structure as a query.
▪ The input structure may be provided:
• using a line notation (e.g., SMILES, InChI)
• using the PubChem Sketcher.
▪ The input structure for an identity search is standardized before the search is performed.
▪ Therefore, hits from an identity search may have structures that differ from the original input structure.
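The effect of standardizing the query first can be shown with a toy sketch. Everything here is hypothetical: the lookup table stands in for a real standardizer, the index holds a single compound, and the SMILES strings are hand-written forms of uracil and two of its tautomers. Only the association of uracil with CID 1174 comes from the slides.

```python
# Hypothetical stand-in for a standardizer: maps several drawn forms
# of one molecule to a single standardized SMILES string.
STANDARD_FORM = {
    "C1=CNC(=O)NC1=O": "C1=CNC(=O)NC1=O",   # uracil (diketo form)
    "Oc1ccnc(O)n1": "C1=CNC(=O)NC1=O",      # 2,4-dihydroxypyrimidine
    "OC1=NC(=O)C=CN1": "C1=CNC(=O)NC1=O",   # 2-hydroxy-4(1H)-pyrimidinone
}

# Hypothetical one-entry compound index: standardized SMILES -> CID.
COMPOUND_INDEX = {"C1=CNC(=O)NC1=O": 1174}

def standardize(smiles: str) -> str:
    """Return the standardized form, or the input unchanged if unknown."""
    return STANDARD_FORM.get(smiles, smiles)

def identity_search(smiles: str):
    """Standardize the query, then look it up in the compound index."""
    return COMPOUND_INDEX.get(standardize(smiles))

# The query is drawn as a dihydroxy tautomer, but the hit is uracil.
print(identity_search("Oc1ccnc(O)n1"))
```

The hit's stored structure differs from the query string, which is the behavior the last bullet describes.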
Standardization and Structure Identity Search
[Figure: identity search linking Uracil (CID 1174) with 2,4-Dihydroxypyrimidine (SID 377954591) and 2-hydroxy-4(1h)-pyrimidinone (SID 341255477)]
Depositor-supplied synonyms &
MeSH Entry Terms
Two kinds of chemical names in PubChem
MeSH Entry Terms
▪ A set of “terms” related to ibuprofen.
▪ Used to index PubMed articles to help find articles about ibuprofen.
Depositor-Supplied Synonyms
▪ Synonyms provided for “substance” records by depositors.
▪ “Filtered” synonyms are provided on the “Compound” Summary page.
Examples: raw (unfiltered) depositor-provided synonyms associated with the largest number of CIDs

Unfiltered Depositor-provided synonyms (page 1/3)
Synonym | # SIDs | # CIDs
N/A | 6,869 | 6,368
SPIRO COMPOUNDS WITH POLYCYCLIC COMPONENTS ARE NOT SUPPORTED IN CURRENT VERSION | 4,903 | 4,902
NULL | 4,610 | 4,599
ASSEMBLIES OF CYCLIC SYSTEMS ARE NOT SUPPORTED IN CURRENT VERSION | 2,554 | 2,554
NOT AVAILABLE | 1,867 | 1,816
LECITHIN | 1,157 | 1,142
DIACYLGLYCEROL | 847 | 842
DIGLYCERIDE | 841 | 841
MULTIPLICATIVE NOMENCLATURE IS NOT SUPPORTED IN CURRENT VERSION! | 797 | 794
VITASMLAB | 461 | 461
MIXTURE NAME | 419 | 413
CLA | 770 | 394
CHLOROPHYLL A | 749 | 393
NA | 7,081 | 371
Annotations on this table:
• Various forms of “Not Available”
• Great reduction in the structure count after structure standardization → SIDs are standardized to Na (sodium)
• Error messages from name generation software
• Names of chemical classes
Unfiltered Depositor-provided synonyms (page 2/3)
Synonym | # SIDs | # CIDs
(1-(5-CARBOXYPENTYL)-3,3-DIMETHYL-3H-INDOL-1-IUM-2-YL)METHANIDE HYDROBROMIDE | 405 | 345
ETHANONE,1- - | 328 | 328
CANNOT MAKE CHOICE: LIGANDS ARE COMPARED UP TO 10 SPHERES | 304 | 304
COMPLEX BRIDGED FUSED SYSTEMS ARE NOT SUPPORTED IN CURRENT VERSION! | 302 | 302
TRIACYLGLYCEROL | 286 | 285
TRIGLYCERIDE | 286 | 285
QUINOLONE DER. | 280 | 279
UNABLE TO GENERATE VALUE | 274 | 264
UNL | 656 | 255
UNKNOWN LIGAND | 615 | 235
HEPT DERIV. | 213 | 211
MULTIPARENT NAMES FOR FUSED SYSTEMS ARE NOT SUPPORTED IN CURRENT VERSION! | 208 | 208
ACHIRAL CENTER(S) | 187 | 187
Annotation on this table:
• “Derivative” of a chemical
Unfiltered Depositor-provided synonyms (page 3/3)
Synonym | # SIDs | # CIDs
C9H11NO2 | 179 | 174
HEM | 4,645 | 165
BCR | 290 | 160
C10H13NO2 | 161 | 154
BETA-CAROTENE | 298 | 147
C8H10N2O2 | 149 | 144
C10H10N2O2 | 149 | 143
-ACETICACID | 141 | 141
C9H8N2O2 | 143 | 141
PROTOPORPHYRIN IX CONTAINING FE | 3,690 | 140
C8H9NO2 | 144 | 139
NAG | 9,599 | 130
METHANOL | 247 | 128
C8H9NO3 | 129 | 127
C10H9NO2 | 133 | 126
PYRIDINONE DERIV. | 130 | 126
N. A. | 128 | 125
Annotations on this table:
• Molecular formula
• Abbreviation for chemical names
• Description
• “Not available”
Unfiltered Depositor-provided synonyms
▪ Depositor-provided synonyms include:
• Real chemical names
• Abbreviations for chemical names
• “Derivatives” of some chemicals
• Names of chemical classes
• Molecular formula
• N/A, NULL, Not Available, NA, N.A., etc.
• Error messages or comments
▪ Not feasible to manually clean up.
▪ PubChem uses crowd-voting-based synonym filtering.
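To make the categories above concrete, here is a small pattern-based sketch that sorts a few of the raw synonyms from the tables. This is explicitly not how PubChem filters synonyms (it uses crowd voting); the rules, the category labels, and the `classify` function are hypothetical and only recognize the cases that have an obvious textual signature.

```python
import re

# Assumed placeholder spellings, compared after removing punctuation.
PLACEHOLDERS = {"NA", "NULL", "NOTAVAILABLE"}
# Assumed fragments of name-generation error messages seen in the tables.
ERROR_MARKERS = (
    "NOT SUPPORTED IN CURRENT VERSION",
    "UNABLE TO GENERATE",
    "CANNOT MAKE CHOICE",
)
DERIVATIVE = re.compile(r"\bDER(?:IV)?\.")
FORMULA = re.compile(r"(?:[A-Z]\d*)+")

def classify(synonym: str) -> str:
    """Assign a raw synonym to a rough category by pattern matching."""
    text = synonym.strip().upper()
    squeezed = re.sub(r"[^A-Z0-9]", "", text)
    if squeezed in PLACEHOLDERS:
        # Note: "NA" is ambiguous; it may also be meant as sodium (Na).
        return "placeholder"
    if any(marker in text for marker in ERROR_MARKERS):
        return "error message"
    if DERIVATIVE.search(text):
        return "derivative"
    # Heuristic: letters and digits only, with at least one digit.
    if FORMULA.fullmatch(text) and any(ch.isdigit() for ch in text):
        return "molecular formula"
    # Real names, abbreviations, and class names cannot be told apart
    # by pattern alone, so they all fall through to "other".
    return "other"

for name in ["N/A", "C9H11NO2", "QUINOLONE DER.", "METHANOL"]:
    print(name, "->", classify(name))
```

The fall-through "other" bucket shows why simple rules are not enough: names, abbreviations, and class names cannot be separated this way.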
PubChem Synonym Filtering
PubChem Synonym filtering
▪ Crowd-voting approach
▪ Check for a consensus on the name-structure association between depositors.
▪ Consensus threshold: >60% of the total votes
▪ When a consensus is reached, the synonym is added to the “filtered” synonym list of the corresponding compound (standardized structure).
Synonyms that occur only “once”
[Diagram: Depositor 1 provides Synonym A for SID 1, which corresponds to CID 1]
▪ No disagreement in the name-structure association
▪ Synonym A is taken to mean CID 1 (although it may not be correct)
Synonyms occurring multiple times
[Diagram: Synonym A is attached to SIDs 1 to 10, provided by Depositors 1 to 4, and these SIDs correspond to three different compounds: CID 1, CID 2, and CID 3]
Which one is the best choice?
Synonym filtering using crowd voting
▪ Two potential approaches
• Multiple-votes-per-depositor
• Single-vote-per-depositor
Multiple-Votes-per-Depositor Strategy
[Diagram: Synonym A on SIDs 1 to 10 from Depositors 1 to 4; every SID counts as one vote for its CID]
# votes: CID 1: 3 (30%), CID 2: 5 (50%), CID 3: 2 (20%)
Consensus Threshold = 60%
No CID exceeds the threshold (the highest share is 50%), so no consensus is reached.
Single-Vote-per-Depositor Strategy
[Diagram: Synonym A on SIDs 1 to 10 from Depositors 1 to 4; each depositor casts at most one vote]
# votes: CID 1: 1 (33%), CID 2: 2 (67%), CID 3: 0 (0%)
Consensus Threshold = 60%
Consensus has been reached!
Synonym A = CID 2
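The two strategies can be compared in a short sketch. Two things here are assumptions, not taken from the slides: the SID-to-CID assignment in `RECORDS` (chosen only so that the tallies match the vote counts shown), and the rule that a depositor votes for the CID most of its own SIDs point to and abstains on a tie. The >60% threshold is from the slides.

```python
from collections import Counter, defaultdict

THRESHOLD = 0.60  # consensus requires strictly more than 60% of votes

# Hypothetical (depositor, SID, CID) records for Synonym A.
RECORDS = [
    ("Depositor 1", 1, 1),
    ("Depositor 2", 2, 2), ("Depositor 2", 3, 2),
    ("Depositor 2", 4, 2), ("Depositor 2", 5, 1),
    ("Depositor 3", 6, 2), ("Depositor 3", 7, 2), ("Depositor 3", 8, 3),
    ("Depositor 4", 9, 1), ("Depositor 4", 10, 3),
]

def multiple_votes(records) -> Counter:
    """Every SID casts one vote for its CID."""
    return Counter(cid for _, _, cid in records)

def single_vote(records) -> Counter:
    """Each depositor casts one vote for its own most frequent CID.

    Assumption: a depositor whose top CIDs are tied casts no vote.
    """
    per_depositor = defaultdict(Counter)
    for depositor, _, cid in records:
        per_depositor[depositor][cid] += 1
    votes = Counter()
    for counts in per_depositor.values():
        ranked = counts.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            votes[ranked[0][0]] += 1
    return votes

def consensus(votes: Counter):
    """Return the winning CID if its share exceeds the threshold, else None."""
    total = sum(votes.values())
    if total == 0:
        return None
    cid, count = votes.most_common(1)[0]
    return cid if count / total > THRESHOLD else None

print(multiple_votes(RECORDS), consensus(multiple_votes(RECORDS)))
print(single_vote(RECORDS), consensus(single_vote(RECORDS)))
```

With this data, counting every SID gives CID 2 only 5 of 10 votes, while counting one vote per depositor gives it 2 of 3, which crosses the threshold.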
Additional consideration:
Different contexts of chemical sameness
CID 6305
(L-Tryptophan)
CID 1148
(Tryptophan)
CID 9060
(D-Tryptophan)
CID 12209747 CID 58478580
Additional consideration:
Different contexts of chemical sameness
Abbr. | CACTVS hash code used | Description
CID | CID hash code | Connectivity + isotopes + stereochemistry
STE | CID stereo hash code | Connectivity + stereochemistry
CON | CID connectivity hash code | Connectivity
PCID | Parent CID hash code | CID of the parent compound
PSTE | Parent CID stereo hash code | STE of the parent compound
PCON | Parent CID connectivity hash code | CON of the parent compound
In practice, synonym filtering uses CACTVS hash codes (instead of CID) to determine whether a consensus is reached or not.
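One way to read the table is that the same votes can be tallied under keys of different strictness. The sketch below illustrates that idea only. The key strings are hypothetical placeholders (not real CACTVS hash values), the vote list is invented, and the slides do not say in which order the levels are applied.

```python
from collections import Counter

THRESHOLD = 0.60  # same >60% rule as in the voting slides

# Hypothetical keys for the three tryptophan records on the earlier slide.
# "CID" is the strictest key; "CON" ignores stereochemistry and isotopes.
KEYS = {
    6305: {"CID": "key-L-trp", "CON": "key-trp-connectivity"},   # L-Tryptophan
    9060: {"CID": "key-D-trp", "CON": "key-trp-connectivity"},   # D-Tryptophan
    1148: {"CID": "key-trp", "CON": "key-trp-connectivity"},     # Tryptophan
}

# Invented votes for one synonym: which CID each vote points to.
VOTES = [6305, 6305, 1148, 1148, 9060]

def tally(votes, level: str) -> Counter:
    """Count votes after mapping each CID to its key at the given level."""
    return Counter(KEYS[cid][level] for cid in votes)

def consensus(counts: Counter):
    """Return the winning key if its share exceeds the threshold, else None."""
    total = sum(counts.values())
    if total == 0:
        return None
    key, count = counts.most_common(1)[0]
    return key if count / total > THRESHOLD else None

print(consensus(tally(VOTES, "CID")))  # votes split across stereo forms
print(consensus(tally(VOTES, "CON")))  # votes agree on connectivity
```

At the strictest level the votes are split three ways, but at the connectivity level they all agree, which is why the choice of sameness context matters for filtering.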
Filtered Depositor-provided synonyms with the largest number of CIDs
Synonym | # SIDs (before clustering) | # CIDs (before clustering) | # SIDs (after clustering) | # CIDs (after clustering)
124-07-2 (PARENT) | 27 | 25 | 27 | 25
VITAMIN B12 | 38 | 23 | 37 | 22
159351-69-6 | 50 | 23 | 48 | 21
64-18-6 (PARENT) | 25 | 23 | 22 | 20
1397-89-3 | 57 | 24 | 51 | 18
RIFAPENTINE | 59 | 18 | 59 | 18
7681-93-8 | 44 | 19 | 43 | 18
NYSTATIN | 61 | 28 | 34 | 17
50-14-6 | 61 | 17 | 61 | 17
104376-79-6 | 33 | 17 | 33 | 17
AMPHOTERICIN B | 67 | 21 | 63 | 17
68-19-9 | 37 | 21 | 33 | 17
ACONITINE | 47 | 19 | 45 | 17
QUININE SULFATE | 38 | 17 | 38 | 17
Annotations on this table:
• CAS numbers
• CAS numbers for parent compounds
Limitations of Synonym Filtering
1. Synonym filtering focuses on consistency, not correctness.
• It resolves the discrepancies in name-structure associations within and between depositors.
• It does not mean that filtered synonyms are correct.
Example: Fentin acetate (CID 16682804). Its filtered synonyms include:
• m-Nitrobenzaldehyde 3-thio-4-o-tolylsemicarbazone
• Benzaldehyde, m-nitro-, 3-thio-4-o-tolylsemicarbazone
• Data sources integrate synonym data from other sources that are regarded as authoritative (e.g., government resources).
• Erroneous data in one source propagate into other sources.
• This practice helps incorrect name-chemical associations get more votes than they should during the synonym filtering process.
2. More than 90% of depositor-provided synonyms occur only once.
• They are automatically assigned to the structures represented by their corresponding CIDs.
3. Different tautomers are merged into one standardized tautomeric structure.
→ Their names are also merged with those of the standardized tautomer.
[Figure: Uracil (CID 1174), with 2,4-Dihydroxypyrimidine (SID 377954591) and 2-hydroxy-4(1h)-pyrimidinone (SID 341255477)]
Summary
▪ PubChem contains a large amount of chemical information provided by 690+ data sources.
▪ Through the chemical structure standardization process, PubChem standardizes depositor-provided chemical structures and extracts unique structures.
▪ PubChem uses crowd-voting-based synonym filtering to clean up name-structure associations provided by depositors.
Acknowledgements
Evan Bolton
Jie Chen
Tiejun Cheng
Asta Gindulyte
Jia He
Siqian He
Qingliang Li
Benjamin Shoemaker
Paul Thiessen
Bo Yu
Leonid Zaslavsky
Jian Zhang
▪ The PubChem Team
▪ PubChem depositors, users, and collaborators
▪ Funded by the National Library of Medicine

Searching for chemical information using PubChem
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information training
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural products
 

Recently uploaded

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 

Recently uploaded (20)

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 

Chemical Structure Standardization and Synonym Filtering in PubChem

  • 1. Chemical Structure Standardization and Synonym Filtering in PubChem Sunghwan Kim, Ph.D., M.Sc. ACS National Meeting in San Diego, CA (August 26, 2019)
  • 3. PubChem: Public chemical information resource. Collects data from 690+ sources. Disseminates data back to the public free of charge. Contains the largest amount of publicly available chemical information. Faces unique challenges in dealing with many big data issues on a daily basis: chemical structure standardization; name-structure association clean-up.
  • 4. Data Organization in PubChem: 690+ data contributors make substance depositions and assay depositions. Substance ID (SID): depositor-provided substance descriptions. Assay ID (AID): depositor-provided bioactivity test results (activity of tested “substances”). Compound ID (CID): unique chemical structures, obtained by unique chemical structure extraction through standardization (activity of “compounds” derived from associated “substances”).
  • 6. Data Organization in PubChem (substance side): 690+ data contributors deposit substance descriptions (Substance ID, SID); unique chemical structures (Compound ID, CID) are extracted through standardization. Individual data depositors provide PubChem with chemical structures and chemical names (synonyms). These need to be organized and cleaned up through structure standardization and synonym filtering.
  • 7. Common Issues with Chemical Structure Representations in PubChem
  • 8. Drawing conventions: Drawing conventions are often ignored in structures deposited by original data sources.
  • 9. Aromatic Compounds (Kekulé 1, Kekulé 2, aromatic): There are many Kekulé structures for aromatic compounds. Which one should be used as a standard?
  • 10. Different Forms of the Same Molecule (tautomerism, ionization, mesomerism): Different tautomers, resonance forms, protonation states! Choose the most stable one?
  • 11. Different Forms of the Same Molecule: The stability depends upon the context (most stable in vacuum vs. most stable in water).
  • 13. PubChem Standardization. Detect components: isolate covalent units, neutralize (by H+ or e-), reprocess, detect unique components. Normalize representation: tautomer invariance, aromaticity detection, stereochemistry, explicit hydrogen. Validate chemical contents: atoms defined/real, implicit hydrogen, functional group, atom valence. Calculate: coordinates, properties, descriptors.
  • 15. Standardization Statistics: ~90% of the substances are subject to standardization, mostly organic compounds. Standardization success rate: 99.64%. Modification rate: 44.43%. Source: J. Cheminform. (2018) 10:36.
  • 16. Standardized Structures: The structure standardized by PubChem is not necessarily what one may expect (compare: most stable in vacuum, most stable in water).
  • 17. Standardized Structures: In most cases, tautomeric forms of a molecule are standardized into a single form. There are a few exceptions (CID 18630 and CID 31261; tautomerization).
  • 18. Standardization and Structure Identity Search: You can search PubChem using a structure as a query. The input structure may be provided using a line notation (e.g., SMILES, InChI) or by using the PubChem Sketcher. The input structure for an identity search is standardized before the search is performed. Therefore, hits from an identity search may have structures that differ from the original input structure.
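As a rough illustration of an identity search from a line notation, the sketch below only builds a query URL for PubChem's PUG REST interface and does not send it. It assumes the `fastidentity` operation with a SMILES input and the `identity_type` option as described in the PUG REST documentation; the helper name `identity_search_url` and the example SMILES (aspirin) are illustrative choices, not taken from the slides. SMILES containing slashes may need to be passed differently and are not handled here.

```python
from urllib.parse import quote, urlencode

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def identity_search_url(smiles, identity_type="same_stereo_isotope"):
    """Build (but do not send) an identity-search URL for a SMILES query.

    Hypothetical helper: the endpoint layout is assumed from the PUG REST
    documentation and should be checked before real use.
    """
    encoded = quote(smiles, safe="")  # percent-encode '(', ')', '=', etc.
    query = urlencode({"identity_type": identity_type})
    return f"{PUG_REST}/compound/fastidentity/smiles/{encoded}/cids/JSON?{query}"


# Example query structure: aspirin, written as a SMILES string.
url = identity_search_url("CC(=O)Oc1ccccc1C(=O)O")
print(url)
```

Because the query is standardized on the server side before matching, the CIDs returned for such a request may depict a different tautomer or Kekulé form than the SMILES that was submitted.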
  • 21. Two kinds of chemical names in PubChem
  • 22. MeSH Entry Terms: A set of “terms” related to ibuprofen. Used to index PubMed articles to help find articles about ibuprofen.
  • 23. Depositor-Supplied Synonyms: Synonyms provided for “substance” records by depositors. “Filtered” synonyms are provided on the “Compound” Summary.
  • 24. Examples: Raw (unfiltered) depositor-provided synonyms associated with the largest number of CIDs
  • 25. Unfiltered Depositor-provided synonyms (page 1/3)
      Synonym | # SIDs | # CIDs
      N/A | 6,869 | 6,368
      SPIRO COMPOUNDS WITH POLYCYCLIC COMPONENTS ARE NOT SUPPORTED IN CURRENT VERSION | 4,903 | 4,902
      NULL | 4,610 | 4,599
      ASSEMBLIES OF CYCLIC SYSTEMS ARE NOT SUPPORTED IN CURRENT VERSION | 2,554 | 2,554
      NOT AVAILABLE | 1,867 | 1,816
      LECITHIN | 1,157 | 1,142
      DIACYLGLYCEROL | 847 | 842
      DIGLYCERIDE | 841 | 841
      MULTIPLICATIVE NOMENCLATURE IS NOT SUPPORTED IN CURRENT VERSION! | 797 | 794
      VITASMLAB | 461 | 461
      MIXTURE NAME | 419 | 413
      CLA | 770 | 394
      CHLOROPHYLL A | 749 | 393
      NA | 7,081 | 371
  • 26. Unfiltered Depositor-provided synonyms (page 1/3, same table as slide 25): Various forms of “Not Available” (e.g., N/A, NULL, NOT AVAILABLE, NA).
  • 28. Unfiltered Depositor-provided synonyms (page 1/3, same table as slide 25): Various forms of “Not Available”. Great reduction in the structure count after structure standardization (NA: 7,081 SIDs, 371 CIDs): SIDs are standardized to Na (sodium).
  • 29. Unfiltered Depositor-provided synonyms (page 1/3, same table as slide 25): Error messages from name generation software (e.g., “... NOT SUPPORTED IN CURRENT VERSION”).
  • 30. Unfiltered Depositor-provided synonyms (page 1/3, same table as slide 25): Names of chemical classes (e.g., DIACYLGLYCEROL, DIGLYCERIDE).
  • 31. Unfiltered Depositor-provided synonyms (page 2/3)
      Synonym | # SIDs | # CIDs
      (1-(5-CARBOXYPENTYL)-3,3-DIMETHYL-3H-INDOL-1-IUM-2-YL)METHANIDE HYDROBROMIDE | 405 | 345
      ETHANONE,1- - | 328 | 328
      CANNOT MAKE CHOICE: LIGANDS ARE COMPARED UP TO 10 SPHERES | 304 | 304
      COMPLEX BRIDGED FUSED SYSTEMS ARE NOT SUPPORTED IN CURRENT VERSION! | 302 | 302
      TRIACYLGLYCEROL | 286 | 285
      TRIGLYCERIDE | 286 | 285
      QUINOLONE DER. | 280 | 279
      UNABLE TO GENERATE VALUE | 274 | 264
      UNL | 656 | 255
      UNKNOWN LIGAND | 615 | 235
      HEPT DERIV. | 213 | 211
      MULTIPARENT NAMES FOR FUSED SYSTEMS ARE NOT SUPPORTED IN CURRENT VERSION! | 208 | 208
      ACHIRAL CENTER(S) | 187 | 187
  • 32. Unfiltered Depositor-provided synonyms (page 2/3, same table as slide 31): “Derivative” of a chemical (e.g., QUINOLONE DER., HEPT DERIV.).
  • 33. Unfiltered Depositor-provided synonyms (page 3/3)
      Synonym | # SIDs | # CIDs
      C9H11NO2 | 179 | 174
      HEM | 4,645 | 165
      BCR | 290 | 160
      C10H13NO2 | 161 | 154
      BETA-CAROTENE | 298 | 147
      C8H10N2O2 | 149 | 144
      C10H10N2O2 | 149 | 143
      -ACETICACID | 141 | 141
      C9H8N2O2 | 143 | 141
      PROTOPORPHYRIN IX CONTAINING FE | 3,690 | 140
      C8H9NO2 | 144 | 139
      NAG | 9,599 | 130
      METHANOL | 247 | 128
      C8H9NO3 | 129 | 127
      C10H9NO2 | 133 | 126
      PYRIDINONE DERIV. | 130 | 126
      N. A. | 128 | 125
  • 34. Unfiltered Depositor-provided synonyms (page 3/3, same table as slide 33): Molecular formula.
  • 35. Unfiltered Depositor-provided synonyms (page 3/3, same table as slide 33): Abbreviation for chemical names.
  • 36. Unfiltered Depositor-provided synonyms (page 3/3, same table as slide 33): Abbreviation for chemical names; Description.
  • 37. Unfiltered Depositor-provided synonyms (page 3/3, same table as slide 33): Abbreviation for chemical names; Description; “Not available”.
  • 38. Unfiltered Depositor-provided synonyms: Depositor-provided synonyms include: real chemical names; abbreviations for chemical names; “derivatives” of some chemicals; names of chemical classes; molecular formulas; N/A, NULL, Not Available, NA, N.A., etc.; error messages or comments. Not feasible to clean up manually. PubChem uses crowd-voting-based synonym filtering.
  • 40. PubChem Synonym filtering: Crowd-voting approach. Check for a consensus on the name-structure association between depositors. Consensus threshold: >60% of the total votes. When a consensus is reached, the synonym is added to the “filtered” synonym list of the corresponding compound (standardized structure).
  • 41. Synonyms that occur only “once” (Depositor 1: SID 1, Synonym A, CID 1): No disagreement in the name-structure association. Synonym A is considered to mean CID 1 (although this may not be correct).
  • 42. Synonyms occurring multiple times: Synonym A is attached to SIDs 1-10 from Depositors 1-4, and these SIDs are associated with CID 1, CID 2, and CID 3. Which one is the best choice?
  • 43. Synonym filtering using crowd voting: Two potential approaches: multiple-votes-per-depositor; single-vote-per-depositor.
  • 44. Multiple-Votes-per-Depositor Strategy (Consensus Threshold = 60%): Synonym A, SIDs 1-10 from Depositors 1-4. # votes: CID 1: 3 (30%); CID 2: 5 (50%); CID 3: 2 (20%).
  • 45. Single-Vote-per-Depositor Strategy (Consensus Threshold = 60%): the same example (Synonym A, SIDs 1-10 from Depositors 1-4, CIDs 1-3), with votes counted per depositor.
  • 49. Single-Vote-per-Depositor Strategy (Consensus Threshold = 60%): # votes: CID 1: 1 (33%); CID 2: 2 (67%); CID 3: 0 (0%). Consensus has been reached! Synonym A = CID 2.
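The two voting strategies can be sketched in a few lines of Python. This is a minimal illustration, not PubChem's implementation. The SID-to-CID assignments below are invented (the arrows of the slide diagram are not recoverable from the transcript) and were chosen only so that the totals match the slides: 3/5/2 votes under multiple-votes-per-depositor and 1/2/0 under single-vote-per-depositor. The rule that a depositor votes for the CID most of its SIDs point to, and abstains on a tie, is also an assumption.

```python
from collections import Counter

# Hypothetical deposits of "Synonym A" as (depositor, SID, CID) triples.
# Assignments are invented to reproduce the vote totals on the slides.
DEPOSITS = [
    ("Depositor 1", 1, 1),
    ("Depositor 2", 2, 1), ("Depositor 2", 3, 2),
    ("Depositor 2", 4, 2), ("Depositor 2", 5, 2),
    ("Depositor 3", 6, 2), ("Depositor 3", 7, 2), ("Depositor 3", 8, 3),
    ("Depositor 4", 9, 1), ("Depositor 4", 10, 3),
]
THRESHOLD = 0.60  # consensus requires MORE than 60% of the total votes


def multiple_votes(deposits):
    """Every SID counts as one vote for its CID."""
    return Counter(cid for _, _, cid in deposits)


def single_votes(deposits):
    """Each depositor casts at most one vote (assumed rule: majority CID,
    abstain when the depositor's own SIDs are tied)."""
    per_depositor = {}
    for depositor, _, cid in deposits:
        per_depositor.setdefault(depositor, Counter())[cid] += 1
    votes = Counter()
    for counts in per_depositor.values():
        ranked = counts.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            votes[ranked[0][0]] += 1
    return votes


def consensus(votes, threshold=THRESHOLD):
    """Return the winning CID, or None if no CID exceeds the threshold."""
    total = sum(votes.values())
    if total == 0:
        return None
    cid, top = votes.most_common(1)[0]
    return cid if top / total > threshold else None


multi = multiple_votes(DEPOSITS)
single = single_votes(DEPOSITS)
print("multiple-votes:", dict(multi), "->", consensus(multi))
print("single-vote:   ", dict(single), "->", consensus(single))
```

With these invented assignments, the multiple-votes tally tops out at 50% for CID 2 and yields no consensus, while the single-vote tally gives CID 2 two of three votes (about 67%) and passes the 60% threshold. Counting per depositor keeps one contributor with many SIDs from dominating the outcome.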
  • 50. Additional consideration: Different contexts of chemical sameness. CID 6305 (L-Tryptophan); CID 1148 (Tryptophan); CID 9060 (D-Tryptophan); CID 12209747; CID 58478580.
  • 51. Additional consideration: Different contexts of chemical sameness
      Abbr. | CACTVS hash code used | Description
      CID | CID hash code | Connectivity + isotopes + stereochemistry
      STE | CID stereo hash code | Connectivity + stereochemistry
      CON | CID connectivity hash code | Connectivity
      PCID | Parent CID hash code | CID of the parent compound
      PSTE | Parent CID stereo hash code | STE of the parent compound
      PCON | Parent CID connectivity hash code | CON of the parent compound
      In practice, synonym filtering uses CACTVS hash codes (instead of CID) to determine whether a consensus is reached or not.
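To make the idea of different sameness contexts concrete, the sketch below tallies the same set of votes under different hash-code keys. It is an illustration under stated assumptions only: the hash strings are invented stand-ins (real CACTVS hash codes look nothing like them), the vote list for the synonym is hypothetical, and the function `consensus_key` is not a PubChem API. The only chemistry relied on is that L-, D-, and stereo-unspecified tryptophan share the same connectivity.

```python
from collections import Counter

# Invented stand-ins for the hash codes of three tryptophan records.
# Only the pattern matters: all three share one connectivity (CON) key,
# but differ at the stereo (STE) and full (CID) levels.
COMPOUNDS = {
    6305: {"CID": "h-L", "STE": "s-L", "CON": "c-Trp"},  # L-Tryptophan
    1148: {"CID": "h-0", "STE": "s-0", "CON": "c-Trp"},  # Tryptophan
    9060: {"CID": "h-D", "STE": "s-D", "CON": "c-Trp"},  # D-Tryptophan
}

# Hypothetical votes for one synonym: the CID each depositor points to.
VOTES = [6305, 6305, 1148, 1148, 9060]


def consensus_key(votes, context, threshold=0.60):
    """Tally votes by the hash code for `context`; return the winning
    key if it holds more than `threshold` of the votes, else None."""
    tally = Counter(COMPOUNDS[cid][context] for cid in votes)
    key, top = tally.most_common(1)[0]
    return key if top / len(votes) > threshold else None


for context in ("CID", "STE", "CON"):
    print(context, "->", consensus_key(VOTES, context))
```

In this toy case no consensus exists at the CID or STE level, yet every vote agrees at the connectivity level. How a connectivity-level consensus is then mapped back to specific compound records is not described on the slides and is left out here.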
  • 52. Filtered Depositor-provided synonyms with the largest number of CIDs
      Synonym | # SIDs (before clustering) | # CIDs (before clustering) | # SIDs (after clustering) | # CIDs (after clustering)
      124-07-2 (PARENT) | 27 | 25 | 27 | 25
      VITAMIN B12 | 38 | 23 | 37 | 22
      159351-69-6 | 50 | 23 | 48 | 21
      64-18-6 (PARENT) | 25 | 23 | 22 | 20
      1397-89-3 | 57 | 24 | 51 | 18
      RIFAPENTINE | 59 | 18 | 59 | 18
      7681-93-8 | 44 | 19 | 43 | 18
      NYSTATIN | 61 | 28 | 34 | 17
      50-14-6 | 61 | 17 | 61 | 17
      104376-79-6 | 33 | 17 | 33 | 17
      AMPHOTERICIN B | 67 | 21 | 63 | 17
      68-19-9 | 37 | 21 | 33 | 17
      ACONITINE | 47 | 19 | 45 | 17
      QUININE SULFATE | 38 | 17 | 38 | 17
  • 53. Filtered Depositor-provided synonyms with the largest number of CIDs (same table as slide 52): CAS numbers (e.g., 159351-69-6).
  • 54. Filtered Depositor-provided synonyms with the largest number of CIDs (same table as slide 52): CAS numbers for parent compounds (e.g., 124-07-2 (PARENT)).
  • 55. Limitations of Synonym Filtering: 1. Synonym filtering focuses on consistency, not correctness. It resolves the discrepancies in name-structure associations within and between depositors. It does not mean that filtered synonyms are correct. Example: Fentin acetate (CID 16682804). Its filtered synonyms include: m-Nitrobenzaldehyde 3-thio-4-o-tolylsemicarbazone; Benzaldehyde, m-nitro-, 3-thio-4-o-tolylsemicarbazone.
  • 58. Limitations of Synonym Filtering: 1. Synonym filtering focuses on consistency, not correctness. Data sources integrate synonym data from other sources that are regarded as authoritative (e.g., government resources). Erroneous data in one source propagate into other sources. This practice helps incorrect name-chemical associations get more votes than they should during the synonym filtering process.
  • 59. Limitations of Synonym Filtering: 2. More than 90% of depositor-provided synonyms occur only once. These are automatically assigned to the structures represented by their corresponding CIDs.
  • 60. Limitations of Synonym Filtering: 3. Different tautomers are merged into one standardized tautomeric structure. Their names are also merged with those of the standardized tautomer. Example: Uracil (CID 1174); 2,4-Dihydroxypyrimidine (SID 377954591); 2-hydroxy-4(1h)-pyrimidinone (SID 341255477).
  • 63. Summary: PubChem contains a large amount of chemical information provided by 690+ data sources. Through the chemical structure standardization process, PubChem standardizes depositor-provided chemical structures and extracts unique structures. PubChem uses crowd-voting-based synonym filtering to clean up name-structure associations provided by depositors.
  • 64. Acknowledgements: Evan Bolton, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin Shoemaker, Paul Thiessen, Bo Yu, Leonid Zaslavsky, Jian Zhang. The PubChem Team. PubChem depositors, users, and collaborators. Funded by the National Library of Medicine.