1
OPSIN
Taming the jungle of IUPAC chemical nomenclature
Daniel Lowe, Peter Murray-Rust, Robert C Glen
8th September 2013
Indianapolis, ACS
4-[(19S,21R,26R,27S)-19,21-dihydroxy-27-methoxy-26-methylnonacosyl]phenyl 3,6-di-O-methyl-α-D-glucopyranosyl-(1→4)-2,3-di-O-methyl-α-L-rhamnopyranosyl-(1→2)-3-O-methyl-α-L-rhamnopyranoside
2
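What is chemical name to structure?
(2S)-2-Aminobutan-1-ol
[Figure: the name split into its components, "(2S)-" (stereochemistry), "2-" (locant), "Amino" (substituent), "but" (alk), "an" (unsaturation), "-1-" (locant) and "ol" (suffix), and the resulting structure with its carbons numbered 1 to 4 and NH2 on C2]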
3
Uses of chemical name to structure
• Identify documents by their chemical structures
• Assist with structure viewing
• Identify incorrect chemical names
• Extract reagent structures, hence allowing reactions to be reconstructed from text
4
5
Parsing
• Over 4000 discrete morphemes form the program’s vocabulary (a morpheme is the smallest meaningful unit of a word)
• These are grouped into 140 classes, e.g.
  • unsaturator (‘ene’)
  • aminoAcidEndsInIne (‘tyros’)
  • simpleSubstituent (‘amino’)
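To make the morpheme idea concrete, here is a toy Java tokenizer that splits a name into morphemes by greedy longest match against a tiny vocabulary, with locants and R/S stereodescriptors recognised by regular expressions. It is a sketch under stated assumptions, not OPSIN's tokenizer (which is grammar-driven and far larger): the class `MorphemeTokenizer`, its seven vocabulary entries and the class labels `alkaneStem`, `suffix`, `locant` and `stereochemistry` are hypothetical; only `unsaturator` and `simpleSubstituent` come from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MorphemeTokenizer {
    // Hypothetical toy vocabulary (morpheme -> class); OPSIN's real one has over 4000 morphemes.
    private static final Map<String, String> VOCAB = Map.of(
            "amino", "simpleSubstituent",
            "chloro", "simpleSubstituent",
            "prop", "alkaneStem",
            "but", "alkaneStem",
            "an", "unsaturator",
            "ene", "unsaturator",
            "ol", "suffix");

    private static final Pattern STEREO = Pattern.compile("\\(\\d+[RS](,\\d+[RS])*\\)-");
    private static final Pattern LOCANT = Pattern.compile("-?\\d+(,\\d+)*-");

    /** Splits a name into "text[class]" tokens, or returns null if any part is unrecognised. */
    static List<String> tokenize(String name) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < name.length()) {
            Matcher stereo = STEREO.matcher(name).region(i, name.length());
            Matcher locant = LOCANT.matcher(name).region(i, name.length());
            if (stereo.lookingAt()) {
                tokens.add(stereo.group() + "[stereochemistry]");
                i = stereo.end();
            } else if (locant.lookingAt()) {
                tokens.add(locant.group() + "[locant]");
                i = locant.end();
            } else {
                // Greedy longest match against the vocabulary, ignoring case.
                String best = null;
                for (String morpheme : VOCAB.keySet()) {
                    boolean matches = name.regionMatches(true, i, morpheme, 0, morpheme.length());
                    if (matches && (best == null || morpheme.length() > best.length())) {
                        best = morpheme;
                    }
                }
                if (best == null) {
                    return null; // no morpheme fits at this position: the name is not understood
                }
                tokens.add(best + "[" + VOCAB.get(best) + "]");
                i += best.length();
            }
        }
        return tokens;
    }
}
```

With this vocabulary, `tokenize("(2S)-2-aminobutan-1-ol")` yields seven tokens, from `(2S)-[stereochemistry]` to `ol[suffix]`, while a name containing an unknown morpheme such as `2-fluorobutane` returns `null`. Greedy longest match is the simplest possible strategy; a practical tokenizer may need to consider several segmentations when an early match blocks a valid reading.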
6
Word Rule Example
acetal Propanal dimethyl acetal
additionCompound Carbon tetrachloride
acidHalideOrPseudoHalide Cyanic chloride
amide Nitrous amide
anhydride Acetic anhydride
biochemicalEster Adenosine 5'-triphosphate
carbonylDerivative Propanone oxime
divalentFunctionalGroup Diethyl ether
ester Ethyl ethanoate
functionalClassEster Acetic acid ethyl ester
functionGroupAsGroup Cyanide
glycol Ethylene glycol
glycolEther Ethylene glycol monomethyl ether
hydrazide Phosphoric hydrazide
monovalentFunctionalGroup Ethyl alcohol
multiEster Ethyl propyl methylphosphonate
oxide Thiophene 1,1-dioxide
polymer Poly(ethylene)
simple Ethylbenzene
substituent Chloro
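As a rough illustration of how a word rule might be selected, the Java sketch below tries a list of regular expressions over the whole name, in order, and returns the name of the first rule that matches. The rule names come from the table above, but the patterns, their ordering and the class `WordRuleSketch` are assumptions for illustration only; they cover a handful of rows and are not how OPSIN encodes its rules.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class WordRuleSketch {
    // Hypothetical, much-simplified rules named after rows of the table. They are tried in
    // insertion order, so more specific rules must precede more general ones.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("glycolEther", Pattern.compile("\\S+ glycol \\S+ ether"));
        RULES.put("glycol", Pattern.compile("\\S+ glycol"));
        RULES.put("anhydride", Pattern.compile("\\S+ anhydride"));
        RULES.put("functionalClassEster", Pattern.compile("\\S+ acid \\S+yl ester"));
        RULES.put("ester", Pattern.compile("\\S+yl \\S+ate"));
        RULES.put("divalentFunctionalGroup", Pattern.compile("di\\S+yl ether"));
        RULES.put("monovalentFunctionalGroup", Pattern.compile("\\S+yl alcohol"));
        RULES.put("simple", Pattern.compile("\\S+"));
    }

    /** Returns the first rule whose pattern matches the whole lower-cased name, or null. */
    static String classify(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(lower).matches()) {
                return rule.getKey();
            }
        }
        return null;
    }
}
```

For example, `classify("Acetic acid ethyl ester")` returns `functionalClassEster` and `classify("Diethyl ether")` returns `divalentFunctionalGroup`, whereas `Propanal dimethyl acetal` matches none of these few patterns and returns `null`.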
7
Supported chain nomenclature
• Alkanes: dodectetractkiliane
• Heteroatom hydrides: pentaphosphane
• Heterogeneous heteroatom hydrides: disilazane
• Trivial acids: butyric acid
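Alkane chain names follow a regular numerical scheme, which is what allows them to be handled systematically rather than as a fixed list. As an illustration, the hypothetical helper below runs in the opposite direction to parsing and generates the stem for 1 to 99 carbons only: simple stems up to dec, then a units prefix followed by a tens term, with icos losing its initial i after a vowel. It is a sketch of the IUPAC numbering scheme, not OPSIN code; for instance it reproduces the nonacos (C29) stem in the title-slide name.

```java
class AlkaneNames {
    private static final String[] SIMPLE = {"", "meth", "eth", "prop", "but", "pent", "hex", "hept", "oct", "non", "dec"};
    private static final String[] UNITS = {"", "hen", "do", "tri", "tetra", "penta", "hexa", "hepta", "octa", "nona"};
    private static final String[] TENS = {"", "dec", "icos", "triacont", "tetracont", "pentacont",
            "hexacont", "heptacont", "octacont", "nonacont"};

    /** Stem for an unbranched chain of n carbons (1 to 99). */
    static String alkaneStem(int n) {
        if (n < 1 || n > 99) {
            throw new IllegalArgumentException("this sketch only handles 1 to 99 carbons");
        }
        if (n <= 10) {
            return SIMPLE[n];
        }
        if (n == 11) {
            return "undec"; // 11 uses "un" rather than "hen"
        }
        String units = UNITS[n % 10];
        String tens = TENS[n / 10];
        // "icos" drops its initial "i" after a vowel: docos, tricos, ... but henicos.
        if (tens.equals("icos") && !units.isEmpty()
                && "aeiou".indexOf(units.charAt(units.length() - 1)) >= 0) {
            tens = "cos";
        }
        return units + tens;
    }

    static String alkaneName(int n) {
        return alkaneStem(n) + "ane";
    }
}
```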
8
Supported ring nomenclature
• Monocyclic spiro: dispiro[4.2.4.2]tetradecane
• Hantzsch-Widman: 1,3,5-triazine
• Fused ring: furo[3,2-b]thieno[2,3-e]pyridine
• Ring assembly: 2,2':6',2''-terpyridyl
• Von Baeyer: tricyclo[2.2.1.1²,⁵]octane
• Polycyclic spiro: spiro[piperidine-4,9'-xanthene]
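For von Baeyer names the bracketed descriptor alone fixes the number of ring atoms: the two main bridgeheads plus the length of every bridge, with the number of rings one less than the number of bridges. A minimal Java sketch, assuming secondary-bridge superscript locants are written after `^` (so tricyclo[2.2.1.1²,⁵]octane becomes `2.2.1.1^2,5`); the class `VonBaeyer` is hypothetical.

```java
class VonBaeyer {
    /** Ring atoms implied by a descriptor such as "2.2.1" or "2.2.1.1^2,5". */
    static int atomCount(String descriptor) {
        int total = 2; // the two main bridgehead atoms
        for (String bridge : descriptor.split("\\.")) {
            int caret = bridge.indexOf('^'); // secondary bridges carry their attachment locants after '^'
            total += Integer.parseInt(caret >= 0 ? bridge.substring(0, caret) : bridge);
        }
        return total;
    }

    /** Number of rings: one less than the number of bridges (bicyclo = 2, tricyclo = 3, ...). */
    static int ringCount(String descriptor) {
        return descriptor.split("\\.").length - 1;
    }
}
```

For `2.2.1.1^2,5` this gives 8 atoms and 3 rings, consistent with the "tricyclo" prefix and the "octane" stem; a parser could use such a check to reject inconsistent names.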
9
Structural assembly nomenclature
• Conjunctive nomenclature: benzeneethanol
• Substitutive nomenclature: 2,4,6-trinitrotoluene
• Additive nomenclature: methylsulfonyl
• Multiplicative nomenclature: 4,4'-methylenedioxydibenzoic acid
• Functional class nomenclature: ethyl alcohol
10
Structural modifications
• Heteroatom replacement: 1-thia-4-aza-2,6-disilacyclohexane
• Unsaturation: hexa-1,3-dien-5-yne
• Hydro, dehydro, indicated hydrogen and added hydrogen: 2,7-dihydro-1H-azepine
• Functional replacement; suffixes including infixed suffixes: methanedithioic acid, 1-chloro-2,4-diimidotricarbonic acid
• Lambda convention: 2λ⁶-trisulfane
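Unsaturation locants such as those in hexa-1,3-dien-5-yne raise the order of specific chain bonds, and each carbon's implicit hydrogens then follow from its valency of 4. The Java sketch below, a hypothetical helper restricted to unbranched carbon chains, applies ene/yne locants to the bond list, derives the hydrogens per carbon, and rejects names that would exceed the valency.

```java
import java.util.Arrays;

class ChainUnsaturation {
    /** Bond orders of an unbranched n-carbon chain; element k joins atoms k+1 and k+2 (1-based locants). */
    static int[] bondOrders(int n, int[] eneLocants, int[] yneLocants) {
        int[] orders = new int[n - 1];
        Arrays.fill(orders, 1);
        applyLocants(orders, eneLocants, 2);
        applyLocants(orders, yneLocants, 3);
        return orders;
    }

    private static void applyLocants(int[] orders, int[] locants, int order) {
        for (int loc : locants) {
            if (loc < 1 || loc > orders.length) {
                throw new IllegalArgumentException("locant out of range: " + loc);
            }
            if (orders[loc - 1] != 1) {
                throw new IllegalArgumentException("bond " + loc + " is already unsaturated");
            }
            orders[loc - 1] = order;
        }
    }

    /** Implicit hydrogens on each carbon, assuming a carbon valency of 4. */
    static int[] implicitHydrogens(int[] orders) {
        int n = orders.length + 1;
        int[] hydrogens = new int[n];
        for (int atom = 0; atom < n; atom++) {
            int used = (atom > 0 ? orders[atom - 1] : 0) + (atom < n - 1 ? orders[atom] : 0);
            if (used > 4) {
                throw new IllegalArgumentException("carbon " + (atom + 1) + " exceeds valency 4");
            }
            hydrogens[atom] = 4 - used;
        }
        return hydrogens;
    }
}
```

For hexa-1,3-dien-5-yne (6 carbons, double bonds at 1 and 3, triple bond at 5) the bond orders are 2, 1, 2, 1, 3 and the hydrogen counts 2, 1, 1, 1, 0, 1, i.e. C6H6. A name such as buta-1,2-diyne is rejected because C2 would need five bonds.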
11
Bridges and stereochemistry
• Bridges: 4a,8a-propanoquinoline
• E/Z stereochemistry: (Z)-2-chloro-but-2-ene
• Relative cis/trans stereochemistry: trans-2,6-dimethyl-2,6-dihydronaphthalene
• R/S stereochemistry: (1R,3S)-3-amino-3-methylcyclohexanol
12
Miscellaneous nomenclature
• Groups with indeterminately positioned structural features: 1,3-xylene
• Charge and oxidation numbers: methylmercury(1+) or methylmercury(II)
• “per-nomenclature”: perhydroanthracene, perchlorobenzene
• Subtractive nomenclature: 2-deoxy-ᴅ-ribose
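The two names for methylmercury carry different information: (1+) is a charge number giving the overall charge of the species (Ewens-Bassett notation), while (II) is a Stock number giving the oxidation state of mercury in Roman numerals. A small hypothetical Java parser for both notations, limited to the numerals I, V and X, shows the difference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChargeAndOxidation {
    private static final Pattern CHARGE = Pattern.compile("\\((\\d+)([+-])\\)");
    private static final Pattern OXIDATION = Pattern.compile("\\((-?)([IVX]+|0)\\)");

    /** Charge number such as "(1+)" or "(2-)": the overall charge of the species. */
    static int parseCharge(String s) {
        Matcher m = CHARGE.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a charge number: " + s);
        }
        int value = Integer.parseInt(m.group(1));
        return m.group(2).equals("+") ? value : -value;
    }

    /** Oxidation number such as "(II)", "(-I)" or "(0)". */
    static int parseOxidationNumber(String s) {
        Matcher m = OXIDATION.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an oxidation number: " + s);
        }
        String numeral = m.group(2);
        if (numeral.equals("0")) {
            return 0;
        }
        int total = 0;
        int previous = 0;
        // Read right to left: a smaller numeral before a larger one is subtracted (IV = 4, IX = 9).
        for (int i = numeral.length() - 1; i >= 0; i--) {
            char c = numeral.charAt(i);
            int value = c == 'I' ? 1 : c == 'V' ? 5 : 10; // the pattern only admits I, V and X
            total += value < previous ? -value : value;
            previous = value;
        }
        return m.group(1).isEmpty() ? total : -total;
    }
}
```

Applied to the slide's examples, `parseCharge("(1+)")` gives +1 for the CH3Hg cation and `parseOxidationNumber("(II)")` gives +2 for the mercury atom in it.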
13
Polymer nomenclature
• Structure-based polymer nomenclature: poly[(benzo[1,2-d:4,5-d']bis[1,3]thiazole-2,6-diyl)-1,4-phenyleneoxy-1,3-phenylene(1,3,5,7-tetraoxo-1,2,3,5,6,7-hexahydrobenzo[1,2-c:4,5-c']dipyrrole-2,6-diyl)-1,3-phenyleneoxy-1,4-phenylene]
14
Domain specific nomenclature
• Steroid nomenclature: 17β-Hydroxy-8α,9β,10α-androst-4-en-3-one
• Amino acid: ʟ-leucinamide
• Oligopeptide: ʟ-arginyl-O-phosphono-ʟ-seryl-ʟ-alanyl-ʟ-proline
• Cyclic peptide: cyclo(ᴅ-alanyl-ʟ-phenylalanyl)
• Nucleotide nomenclature: guanylyl(3'-5')uridine 3'-monophosphate
15
Carbohydrate nomenclature (acyclic)
ᴅ-gluco-hexose or ᴅ-glucose (preferred)
ʟ-ribo-ᴅ-manno-nonose
• Carbohydrates are defined using configurational prefixes that each specify the stereochemistry for between 1 and 4 stereocentres
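How configurational prefixes combine can be shown with a short sketch. Assumptions: aldoses only, ASCII `D`/`L` in place of small capitals, and the Fischer-projection OH orientation of each stereocentre written as `r` (right) or `l` (left), from C2 upward. Each prefix covers a block of one to four centres, an L prefix is the mirror image of its D form, and, following the IUPAC carbohydrate recommendations, the prefix cited first describes the centres farthest from C1. The class `CarbohydratePrefixes` is hypothetical, not OPSIN code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class CarbohydratePrefixes {
    // D-series patterns, lowest-numbered stereocentre first: 'r' = OH on the right, 'l' = OH on the left.
    private static final Map<String, String> D_PATTERNS = Map.ofEntries(
            Map.entry("glycero", "r"),
            Map.entry("erythro", "rr"), Map.entry("threo", "lr"),
            Map.entry("ribo", "rrr"), Map.entry("arabino", "lrr"),
            Map.entry("xylo", "rlr"), Map.entry("lyxo", "llr"),
            Map.entry("allo", "rrrr"), Map.entry("altro", "lrrr"),
            Map.entry("gluco", "rlrr"), Map.entry("manno", "llrr"),
            Map.entry("gulo", "rrlr"), Map.entry("ido", "lrlr"),
            Map.entry("galacto", "rllr"), Map.entry("talo", "lllr"));

    /** Expands e.g. "L-ribo-D-manno" for an aldose of chainLength carbons into the layout of C2..C(n-1). */
    static String fischer(String prefixes, int chainLength) {
        String[] parts = prefixes.split("-");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("expected D/L + prefix pairs: " + prefixes);
        }
        List<String> blocks = new ArrayList<>();
        for (int i = 0; i < parts.length; i += 2) {
            String pattern = D_PATTERNS.get(parts[i + 1]);
            boolean isD = parts[i].equals("D");
            if (pattern == null || !(isD || parts[i].equals("L"))) {
                throw new IllegalArgumentException("unknown prefix: " + parts[i] + "-" + parts[i + 1]);
            }
            // An L prefix is the mirror image of the D prefix: swap every left/right.
            blocks.add(isD ? pattern : pattern.replace('r', 'x').replace('l', 'r').replace('x', 'l'));
        }
        // The prefix cited first covers the centres farthest from C1, so lay the blocks out in reverse.
        Collections.reverse(blocks);
        String layout = String.join("", blocks);
        if (layout.length() != chainLength - 2) {
            throw new IllegalArgumentException("prefixes cover " + layout.length()
                    + " stereocentres but the aldose has " + (chainLength - 2));
        }
        return layout;
    }
}
```

For ʟ-ribo-ᴅ-manno-nonose, `fischer("L-ribo-D-manno", 9)` returns `llrrlll`: manno fixes C2 to C5, and the mirrored ribo pattern fixes C6 to C8.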
16
Carbohydrate derivatives
• These carbohydrate chains can then be algorithmically modified by suffixes
ᴅ-glucose
ᴅ-glucitol
ᴅ-glucaric acid
ᴅ-gluconic acid
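The suffix changes on this slide amount to replacing the terminal groups of the open-chain aldose, while the stereocentres set by the configurational prefix are unchanged. A minimal Java lookup, covering only the four suffixes shown and assuming an aldose parent; `AldoseSuffixes` is a hypothetical helper:

```java
import java.util.Map;

class AldoseSuffixes {
    // End groups {C1, last carbon} of the open-chain form for the suffixes on the slide.
    private static final Map<String, String[]> END_GROUPS = Map.of(
            "ose", new String[] {"CHO", "CH2OH"},         // aldose, e.g. D-glucose
            "itol", new String[] {"CH2OH", "CH2OH"},      // alditol, e.g. D-glucitol
            "onic acid", new String[] {"COOH", "CH2OH"},  // aldonic acid, e.g. D-gluconic acid
            "aric acid", new String[] {"COOH", "COOH"});  // aldaric acid, e.g. D-glucaric acid

    static String[] endGroups(String name) {
        for (Map.Entry<String, String[]> entry : END_GROUPS.entrySet()) {
            if (name.endsWith(entry.getKey())) {
                return entry.getValue().clone(); // copy so callers cannot alter the table
            }
        }
        throw new IllegalArgumentException("no recognised suffix: " + name);
    }
}
```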
17
Carbohydrate nomenclature (cyclic)
α-ᴅ-glucopyranose
2,7-anhydro-D-glycero-β-D-galacto-oct-2-ulopyranosonic acid
ᴅ-glucose
18
Carbohydrate nomenclature (oligosaccharides)
β-ᴅ-Fructofuranosyl α-ᴅ-glucopyranoside
β-ᴅ-glucopyranosyl-(1→3)-β-ᴅ-glucopyranosyl-(1→3)-ᴅ-glucopyranose
19
Fused ring nomenclature
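• All fused ring nomenclature is processed algorithmically, e.g. even benzofuran is constructed from benzene and furan rather than being looked up as a trivial name
• For example: benzo[b]cycloocta[jk]fluorene
[Figure: the component rings of benzo[b]cycloocta[jk]fluorene labelled by ring size: one 8-membered, three 6-membered and one 5-membered ring]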
20
Fused ring nomenclature
(numbering)
• Transform to an idealised grid aligned along the longest row of rings
• Apply quadrant rules, e.g. favour most rings in the upper right quadrant
[Figure: alternative orientations of the ring system on the grid, each ring labelled by its size]
21
Fused ring nomenclature
(numbering)
• Atoms are numbered in ascending order from the upper rightmost ring
[Figure: the chosen orientation of the ring system, each ring labelled by its size]
Peripheral numbering rules are used to choose the grid layout that gives the best numbering
22
Beyond IUPAC: CAS index name un-inversion
CAS index name → IUPAC name
benzene, ethyl- → ethylbenzene
Disulfide, bis(2-chloroethyl) → Bis(2-chloroethyl) disulfide
Benzoic acid, 4,4'-methylenebis[2-chloro- → 4,4'-Methylenebis[2-chlorobenzoic acid]
Phosphoric acid, ethyl dimethyl ester → ethyl dimethyl phosphate
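The un-inversions on this slide can be approximated by three cases, depending on how the inverted part ends: an ester (`... ester`, turning `-ic acid` into `-ate`), substituent prefixes (ending in a hyphen, attached directly to the parent with any open brackets closed), or a functional class term (appended as a separate word). The Java sketch below is a hypothetical `CasUninverter` handling just these cases, with a tiny exception table for anions such as phosphate; it is not OPSIN's implementation, ignores the many other shapes CAS index names take, and lower-cases the parent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Map;

class CasUninverter {
    // Ester anions that do not follow the plain "-ic acid" -> "-ate" pattern.
    private static final Map<String, String> IRREGULAR_ANIONS =
            Map.of("phosphoric", "phosphate", "sulfuric", "sulfate");

    static String uninvert(String casName) {
        int comma = casName.indexOf(", ");
        if (comma < 0) {
            return casName; // not an inverted name
        }
        String parent = casName.substring(0, comma).toLowerCase(Locale.ROOT);
        String rest = casName.substring(comma + 2);

        // Case 1: "Acetic acid, ethyl ester" -> "ethyl acetate"
        if (rest.endsWith(" ester") && parent.endsWith("ic acid")) {
            String acid = parent.substring(0, parent.length() - " acid".length());
            String anion = IRREGULAR_ANIONS.getOrDefault(acid, acid.substring(0, acid.length() - 2) + "ate");
            return rest.substring(0, rest.length() - " ester".length()) + " " + anion;
        }
        // Case 2: substituent prefixes end with a hyphen: attach the parent, then close open brackets.
        if (rest.endsWith("-")) {
            StringBuilder result = new StringBuilder(rest.substring(0, rest.length() - 1)).append(parent);
            Deque<Character> open = new ArrayDeque<>();
            for (char c : rest.toCharArray()) {
                if (c == '(' || c == '[' || c == '{') {
                    open.push(c);
                } else if ((c == ')' || c == ']' || c == '}') && !open.isEmpty()) {
                    open.pop();
                }
            }
            while (!open.isEmpty()) {
                char c = open.pop();
                result.append(c == '(' ? ')' : c == '[' ? ']' : '}');
            }
            return result.toString();
        }
        // Case 3: a functional class term such as "Disulfide, bis(2-chloroethyl)".
        return rest + " " + parent;
    }
}
```

Capitalisation aside, this reproduces all four rows of the table, e.g. `Benzoic acid, 4,4'-methylenebis[2-chloro-` becomes `4,4'-methylenebis[2-chlorobenzoic acid]`.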
23
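Beyond IUPAC: Correcting missing spaces
• tert-butylacetate → tert-butyl acetate
• tert-butyl-4-vinylperbenzoate: read as one word, tert-butyl has no locant, yet perbenzoate has more than one non-degenerate hydrogen, so the name must mean tert-butyl 4-vinylperbenzoate
• diethylcarbonate: carbonate has no substitutable hydrogen, so the name must mean diethyl carbonate
• Ethylacetate: as a non-ester, it would have been named butanoate or butyrate!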
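The candidate-generation half of this correction can be sketched simply: break the name after each `yl` (replacing a hyphen there with a space) and keep the first reading that a parser accepts. The chemical checks described on the slide (missing locants, substitutable hydrogens) are not implemented here; the `parses` predicate stands in for them, and `MissingSpaceFixer` is a hypothetical helper rather than OPSIN's logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class MissingSpaceFixer {
    /** Two-word readings formed by breaking after each "yl"; a hyphen at the break becomes the space. */
    static List<String> candidates(String name) {
        List<String> result = new ArrayList<>();
        for (int i = name.indexOf("yl"); i >= 0; i = name.indexOf("yl", i + 1)) {
            int end = i + 2;
            if (end >= name.length()) {
                continue; // "yl" at the very end: nothing to split off
            }
            char next = name.charAt(end);
            if (next == '-') {
                result.add(name.substring(0, end) + " " + name.substring(end + 1));
            } else if (Character.isLetter(next)) {
                result.add(name.substring(0, end) + " " + name.substring(end));
            }
        }
        return result;
    }

    /** Keeps the name if it already parses; otherwise returns the first candidate the parser accepts. */
    static Optional<String> fix(String name, Predicate<String> parses) {
        if (parses.test(name)) {
            return Optional.of(name);
        }
        return candidates(name).stream().filter(parses).findFirst();
    }
}
```

For `tert-butyl-4-vinylperbenzoate` the candidates are `tert-butyl 4-vinylperbenzoate` and `tert-butyl-4-vinyl perbenzoate`, in that order; choosing between them is exactly where the slide's chemical reasoning is needed.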
24
Performance on machine-generated names
[Chart: for names machine-generated by ACD/Name 12.02, ChemBioDraw 13, Lexichem 2.1 and Marvin 6.0.2, the percentage of names that were correctly interpreted, gave a stereochemical discrepancy, gave a constitutional discrepancy, or gave no result]
30,000 structures randomly selected from PubChem were used as input to machine-generate names.
25
Performance on unique names from US patent headings
26
What’s not supported
• Parsing of generic chemical names
  • e.g. 2- or 3- alkylsubstitutedbenzofurans
• Advanced inorganic nomenclature, e.g. coordinate bonding
• Some natural product nomenclature
• Advanced stereochemistry, e.g. pseudoasymmetric stereocentres and axial stereochemistry
27
Usage
• Java API:
import uk.ac.cam.ch.wwmm.opsin.NameToStructure;
NameToStructure nts = NameToStructure.getInstance(); // shared, reusable instance
String chemicalName = "acetonitrile";
String smiles = nts.parseToSmiles(chemicalName); // null if the name could not be interpreted
• Batch conversion on the command line:
java -jar opsin-1.5.0-jar-with-dependencies.jar -osmi input.txt output.smi
• RESTful web service (opsin.ch.cam.ac.uk)
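To call the RESTful web service from Java, the name has to be percent-encoded into the URL path. The sketch below assumes the path layout `https://opsin.ch.cam.ac.uk/opsin/<name>.<format>` with `smi` selecting SMILES; both that layout and the helper `OpsinUrl.restUrl` are assumptions to check against the service's own documentation. It only builds the URL, so it can be paired with any HTTP client.

```java
import java.net.URI;
import java.net.URISyntaxException;

class OpsinUrl {
    // Assumed layout: https://opsin.ch.cam.ac.uk/opsin/<name>.<format>
    static String restUrl(String chemicalName, String format) {
        try {
            // The multi-argument URI constructor quotes characters that are illegal in a path
            // (spaces, square brackets, '%', ...); toASCIIString() also escapes non-ASCII
            // characters such as α or →.
            URI uri = new URI("https", "opsin.ch.cam.ac.uk", "/opsin/" + chemicalName + "." + format, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build a URL for: " + chemicalName, e);
        }
    }
}
```

The resulting URL could then be fetched with, for example, `java.net.http.HttpClient` (Java 11 and later).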
28
Who is using OPSIN?
• Commercial software
• Cinfony (interface to Python)
• Many text mining efforts
• Workflows
• Web services
29
Conclusions
• OPSIN combines high recall, high precision and fast execution
• Recent work has significantly improved coverage of biochemical nomenclature
Visit opsin.ch.cam.ac.uk to try it out and download!
30
OPSIN: Taming the jungle of IUPAC chemical nomenclature
daniel@nextmovesoftware.com
For more information see:
• Chemical Name to Structure: OPSIN, an Open Source Solution. J. Chem. Inf. Model., 2011, 51 (3), pp 739–753
• Extraction of chemical structures and reactions from the literature (https://www.repository.cam.ac.uk/handle/1810/244727)
Acknowledgements
Albina Asadulina
Rich Apodaca
Peter Corbett
Roger Sayle
Funding
