The document discusses representing non-standard peptides and biopolymers using systematic amino acid and monomer naming conventions. It proposes using semi-systematic naming rules to represent peptides and biopolymers containing modified or non-standard subunits in a way that is comprehensible to scientists. The rules include using D-/L- prefixes to represent stereoconfiguration, retaining widely used 3-letter codes, using modifying prefixes to specify substitutions, representing substitutions through line formulae, and specifying default or optional substitution locants. Standardizing the naming conventions could help ensure unambiguous representation of peptides in databases and registration systems.
Representation and display of non-standard peptides using semi-systematic amino acid monomer naming
1. Representation and display of non-standard peptides using semi-systematic amino acid monomer naming
Roger Sayle
NextMove Software, Cambridge, UK
247th ACS National Meeting, Dallas, TX, Wednesday 19th March 2014
2. The primordial soup
• The vast majority of chemical entities stored in
chemical databases and pharmaceutical registration
systems are classical “small molecules”.
• For these molecules, the universal cheminformatics
“all-heavy-atom” representations work well,
providing duplicate checking, substructure search,
normalization, depiction, etc.
• Solutions include InChI, SMILES and V2000 mol file.
3. Biopolymers of life
• The low Shannon entropy of a small,
but scientifically significant, fraction
of compounds, the biopolymers, allows
them to be represented more succinctly.
• For these, frequent subunits (or
monomers) can be used instead of
atoms in connection tables.
• Representations with small monomer
sets are easier for scientists to
comprehend.
4. Peptide representation
The all-atom depiction (left) is more complicated to
interpret than the 3-letter code and 1-letter code
forms (right).
Recognizing that this is [Asp5]oxytocin is perhaps only
possible at the “sequence” level.
5. Protein representation
All atom representation vs. 3-letter and 1-letter sequence forms of crambin.
7. Sequence bioinformatics
• Traditional bioinformatics represents sequences as
one-letter codes, often in FASTA or UniProt format.
• Four characters for nucleic acids:
– A,C,G and T for DNA and A,C,G and U for RNA.
• 20 (or 22) characters for proteinogenic amino acids:
– A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y and V (plus O and U).
• The challenge arises when more than the primary
sequence needs to be represented.
8. 3-letter codes to the rescue?
• The use of 3-letter codes to represent monomers
allows for significantly more possible monomers.
• Conveniently, the common amino acids have standard codes:
– Ala, Arg, Asn, Asp, Cys, Glu, Gln, Gly, His, Ile, Leu, Lys, Met,
Phe, Pro, Ser, Thr, Trp, Tyr and Val, plus Sec and Pyl.
• These assignments are mandated by IUPAC and IUB
recommendations known as 3AA (dated 1983).
• Unfortunately, the prevalence of standard amino
acids and 3-letter codes has had adverse effects.
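As a minimal sketch of how the one-letter and 3-letter alphabets relate, the mapping below pairs the two lists given on these slides. The function name `to_three_letter` and the hyphen-separated output are illustrative choices, not part of any standard tool.

```python
# One-letter to 3-letter codes, paired from the two lists on the slides
# (20 proteinogenic amino acids plus U = Sec and O = Pyl).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "U": "Sec", "O": "Pyl",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence into a hyphen-separated 3-letter form."""
    try:
        return "-".join(ONE_TO_THREE[ch] for ch in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unknown one-letter code: {err.args[0]}") from None


# Linear sequence of oxytocin (disulfide and C-terminal amide not shown).
print(to_three_letter("CYIQNCPLG"))
# prints "Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly"
```

The one-letter form has no room for anything beyond these 22 symbols, which is why the remaining slides build on the 3-letter form.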
9. PDB 3-letter residue codes
• wwPDB’s components.cif currently lists 17,764
3-letter residue names (digits and capitals).
• These contain not only amino acids, but also
nucleic acids, carbohydrates, ligands, salts,
solvents and co-factors.
• Completely different codes are assigned to the
L- and D- forms of amino acids.
10. Monomer database solutions
• Biopolymer systems based on monomer databases
and strict 3-letter codes break down when faced with
the huge number of potential chemical and
post-translational modifications.
• At best, a human being can remember 50-100 codes
before having to consult a “dictionary” to look up
solutions.
• Worse, once beyond the commonly encountered
amino acids, biochemists disagree on the meaning of
3-letter codes…
15. Bitter Sweet Conclusion
• In practice, a glycoinformatics registration system has
to handle over 232 monosaccharide monomers.
• This number is clearly too large to assign each its
own 3-letter code or to use a monomer database.
• For example, monosaccharidedb.org curates a
database of 771 monosaccharides, whereas 936 of
the simple monosaccharides on the previous page
can be found in PubChem.
16. The aa naming solution
• The solution to this problem is the systematic
naming of amino acid derivatives.
• This approach is well known to peptide chemists,
with these names routinely used in scientific articles,
patent applications and supplier catalogues, and
easily recognizable by chemists.
• Alas, this “de facto” nomenclature, which has been
formalized by neither IUPAC nor CAS, has been overlooked
by the informatics community.
18. monosaccharide examples
• In fact, constructing systematic names is the same
solution as used by the glycoinformatics community:
– b-D-GlcpNAc, a-D-Neup5Ac, b-D-GlcpA, D-GlcNAc-ol, a-D-
GlcpN, a-L-4-en-4-deoxy-thrHexpA, b-D-Galp3Me, b-D-6-
deoxy-Glcp, b-D-2,6-deoxy-ribHexp, b-D-GlcpA6Me, a-D-
Rhap4N-formyl, b-D-Glcp6Ac, a-D-Neup5Ac9Ac…
• And in RNA informatics
– P-rGuo, P-Gua-Rib2Me, P-m22Gua-Rib, P-m2Gua-Rib, P-
Cyt-Rib2Me, P-m5Cyt-Rib, P-m5Ura-Rib, P-m1Ade-Rib…
19. Rule #1: use of D- and DL- prefixes
• Rather than have separate code forms for D- form
(and DL-) amino acids, the widely understood and
accepted stereo configuration prefixes “D-”, “L-”
(default) and “DL-” should be used.
– D-Arg
– DL-Ala
– D-Ser
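Rule #1 can be sketched as a small prefix splitter. The function name `split_stereo_prefix` and the returned tuple layout are hypothetical; the only behaviour taken from the slide is that "D-", "L-" and "DL-" are recognized and that "L-" is the default when no prefix is written.

```python
def split_stereo_prefix(code: str) -> tuple[str, str]:
    """Split a residue code into (configuration, base code).

    "DL-" is tested before "D-" so that the longer prefix wins.
    Codes without a prefix are reported as "L", the default on the slide
    (for an achiral residue such as Gly the label carries no meaning).
    """
    for prefix in ("DL-", "D-", "L-"):
        if code.startswith(prefix):
            return prefix[:-1], code[len(prefix):]
    return "L", code


print(split_stereo_prefix("D-Arg"))   # ('D', 'Arg')
print(split_stereo_prefix("DL-Ala"))  # ('DL', 'Ala')
print(split_stereo_prefix("Ser"))     # ('L', 'Ser')
```

Because the prefix is separate from the base code, one dictionary entry per amino acid is enough, rather than one per stereoisomer.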
20. Rule #2: retain widely used 3LC’s.
• Universally accepted code/definitions should be used
– Abu (2-aminobutyric acid), Aib (2-aminoisobutyric acid),
aIle (allo-isoleucine), aThr (allo-threonine),
bAla (beta-alanine), Cha (beta-cyclohexylalanine),
Chg (alpha-cyclohexylglycine), Cit (citrulline),
Dab (2,4-diaminobutyric acid), Dap (2,3-diaminopropionic acid),
hArg (homoarginine), Hcy (homocysteine), Hse (homoserine),
hPhe (homophenylalanine), Nle (norleucine), Nva
(norvaline), Orn (ornithine), Pen (penicillamine), Phg
(phenylglycine), Sar (sarcosine), Sec (selenocysteine), 2Thi
(thien-2-ylalanine), 3Thi (thien-3-ylalanine),
xiIle (xi-isoleucine), xiThr (xi-threonine), and many more.
21. Rule #3: modifying prefixes
• “N(Me)” and friends are used to specify N-modified
variants. “N(Me)Gly” is equivalent to “Sar”.
• The “O” prefix is used to specify depsi-peptides that
connect via ester linkages, e.g. “Arg-OAla-Ser”.
• The “aMe” prefix is used to specify alpha-methyl
variants of amino acids. “aMeAla” is the same as “Aib”.
22. Rule #4: line formulae substituents
• Substitutions are indicated by line formulae
– Ph phenyl
– Me methyl
– Ac acetyl
– Cl chloro
– CN cyano
– Bn benzyl
– NH2 amino
– NO2 nitro
– tBu tert-butyl
– CF3 trifluoromethyl
– Tos tosyl
– Trt trityl
– Tf triflyl
– SO3H sulfonic acid
– OPfp perfluorophenoxy
– iPr isopropyl
– For formyl
– … and many more
23. Rule #5: default substitution
• Many amino acids have default substitution locants
– Abu ABA.CG
– Ala ALA.CB
– Arg ARG.NH2
– Cys CYS.SG
– Dab DAB.ND
– Dap DPP.NG
– Gly GLY.CA
– Lys LYS.NZ
– Ser SER.OG
– Thr THR.OG1
– Tyr TYR.OH
24. Rule #6: specified substitution
• Amino acids may be substituted at a particular locant
using the following syntax:
– Phe(4-Cl)
– Phg(4-CH2NH2)
– Tyr(2,6-diCl-Bn)
• Substitutions may occur at more than one locant:
– His(2,5-diI)
– Tyr(2,6-diF)
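The parenthesized syntax above can be split apart with one regular expression. This is only a textual sketch: the function name `parse_substituted` is hypothetical, multiplying prefixes such as "di" are left inside the substituent string, and no attempt is made to decide which atom the locants refer to.

```python
import re

# residue code, optional "locant(s)-" prefix, substituent text
_SUBSTITUTED = re.compile(r"^([A-Za-z0-9]+)\((?:(\d+(?:,\d+)*)-)?([^()]+)\)$")


def parse_substituted(code: str) -> tuple[str, list[int], str]:
    """Split e.g. 'Phe(4-Cl)' into ('Phe', [4], 'Cl')."""
    match = _SUBSTITUTED.match(code)
    if match is None:
        raise ValueError(f"not a substituted residue code: {code!r}")
    residue, locants, substituent = match.groups()
    positions = [int(x) for x in locants.split(",")] if locants else []
    return residue, positions, substituent


print(parse_substituted("Phe(4-Cl)"))      # ('Phe', [4], 'Cl')
print(parse_substituted("Phg(4-CH2NH2)"))  # ('Phg', [4], 'CH2NH2')
print(parse_substituted("His(2,5-diI)"))   # ('His', [2, 5], 'diI')
print(parse_substituted("Ala(Cl)"))        # ('Ala', [], 'Cl')
```

An empty locant list signals that the default substitution site of the residue should be used.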
25. Rule #7: implicit leaving groups
• For most amino acids, parenthesized substitution
implies replacement of a hydrogen.
– Ala(Cl)
• Carboxylic acids assume an OH leaving group,
requiring an explicit “oxy” to form esters.
– Asp(NH2)
– Asp(OMe)
• Cysteine (and penicillamine) substitute on the sulfur
– Cys(tBu)
– Cys(StBu)
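The implicit leaving groups of Rule #7 can be written down as a lookup with a hydrogen default. The table below is a sketch: treating Glu like Asp is an assumption based on the slide's phrase "carboxylic acids", and the site descriptions are free-text labels invented for this example.

```python
# residue -> (substitution site, implicit leaving group)
LEAVING_GROUP = {
    "Asp": ("side-chain carboxyl", "OH"),
    "Glu": ("side-chain carboxyl", "OH"),  # assumed to behave like Asp
    "Cys": ("sulfur", "H"),
    "Pen": ("sulfur", "H"),
}


def implicit_leaving_group(residue: str) -> tuple[str, str]:
    """Return (site, leaving group); most residues simply lose a hydrogen."""
    return LEAVING_GROUP.get(residue, ("default locant", "H"))


print(implicit_leaving_group("Ala"))  # ('default locant', 'H')
print(implicit_leaving_group("Asp"))  # ('side-chain carboxyl', 'OH')
print(implicit_leaving_group("Cys"))  # ('sulfur', 'H')
```

Under this reading, Asp(NH2) replaces the OH with NH2 to give the amide, Asp(OMe) needs the explicit "O" to give the methyl ester, Cys(tBu) places tert-butyl directly on the sulfur, and Cys(StBu) adds a second sulfur to give a disulfide.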
26. The need for standardization #1
• For peptide registration, we need a canonical name.
• Preferred forms are most visible in line notations.
– For carboxy, do we prefer “COOH” or “CO2H”?
– For acetyl, “Ac” or “COCH3”?
– For ethylamine, “EtNH2” or “CH2CH2NH2”?
– For phosphate, “PO3H2” or “P”?
• However, decomposition also affects preferred forms:
– Gly(Me) is the same as Ala
– Asp(NH2) is the same as Asn
– Phe(4-Ph) is the same as 4-Bip
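One way to obtain a canonical name for registration is to rewrite every residue to a single preferred spelling before comparing names. The sketch below uses the equivalences stated on the slides; choosing the shorter code as the preferred form is an assumption made for this example, as are the names `PREFERRED` and `canonical_residues`.

```python
# Equivalent spellings taken from the slides, mapped to one preferred form.
# Preferring the shorter code is an assumption of this sketch.
PREFERRED = {
    "Gly(Me)": "Ala",
    "Asp(NH2)": "Asn",
    "Phe(4-Ph)": "4-Bip",
    "N(Me)Gly": "Sar",
    "aMeAla": "Aib",
}


def canonical_residues(tokens: list[str]) -> list[str]:
    """Replace each residue token by its preferred spelling, if one is listed."""
    return [PREFERRED.get(token, token) for token in tokens]


a = canonical_residues(["Gly(Me)", "Asp(NH2)", "Phe"])
b = canonical_residues(["Ala", "Asn", "Phe"])
print(a)       # ['Ala', 'Asn', 'Phe']
print(a == b)  # True
```

With such a table, two names that differ only in how a residue is decomposed compare equal, which is what duplicate checking requires.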
27. The need for standardization #2
• More seriously we need to avoid ambiguous names.
• It is widely acknowledged by peptide chemists that
“Ac” means acetyl, rather than actinium, and
connects to the parent at the carbonyl carbon.
• In supplier catalogues for a German vendor, the
name “Cys(AcOH)” implies “*CC(=O)O” instead of
“*C(=O)CO”.
• Which names should be used, “COCH2OH” etc.?
28. Example peptide names (ChEMBL)
• The following names are machine generated
• H-Cys-Pro-Trp-His-Leu-Leu-Pro-Phe-Cys-OH CHEMBL501567
• H-Tyr-Pro-Phe-Phe-OtBu CHEMBL500195
• cyclo[Ala-Tyr-Val-Orn-Leu-D-Phe-Pro-Phe-D-Phe-Asn] CHEMBL438006
• H-Nle(Et)-Tyr-Pro-Trp-Phe-NH2 CHEMBL500704
• H-DL-hPhe-Val-Met-Tyr(PO3H2)-Asn-Leu-Gly-Glu-OH CHEMBL439086
• cyclo[Phe-D-Trp-Tyr(Me)-D-Pro] CHEMBL507127
• H-D-Pyr-D-Leu-pyrrolidide CHEMBL1181307
• Ac-DL-Phe-aThr-Leu-Asp-Ala-Asp-DL-Phe(4-Cl)-OH CHEMBL1791047
• H-D-Cys(1)-D-Asp-Gly-Tyr(3-NO2)-Gly-Hyp-Asp-D-Cys(1)-NH2 CHEMBL583516
• Boc-Tyr-Tyr(3-Br)-OMe CHEMBL1976073
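Names of this kind cannot be split on every hyphen, because hyphens also occur inside parentheses and after stereo prefixes. The tokenizer below is a minimal sketch of one way to handle that; it is not the algorithm used to generate the names above, and the function name `tokenize` and its return layout are hypothetical. Salts, multi-component names and terminal-group interpretation are out of scope.

```python
def tokenize(name: str) -> tuple[bool, list[str]]:
    """Split a condensed peptide name into (is_cyclic, tokens)."""
    cyclic = name.startswith("cyclo[") and name.endswith("]")
    body = name[len("cyclo["):-1] if cyclic else name

    # Split on hyphens that are not nested inside parentheses.
    parts, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "-" and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)

    # Re-attach stereo prefixes (D, L, DL) to the residue that follows.
    tokens, pending = [], ""
    for part in parts:
        if part in ("D", "L", "DL"):
            pending = part + "-"
        else:
            tokens.append(pending + part)
            pending = ""
    return cyclic, tokens


print(tokenize("cyclo[Phe-D-Trp-Tyr(Me)-D-Pro]"))
# (True, ['Phe', 'D-Trp', 'Tyr(Me)', 'D-Pro'])
print(tokenize("H-Tyr-Pro-Phe-Phe-OtBu"))
# (False, ['H', 'Tyr', 'Pro', 'Phe', 'Phe', 'OtBu'])
```

For linear names the first and last tokens are the terminal groups (for example "H" and "OtBu"); the tokens in between are residue codes that can be passed to residue-level parsing.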
29. Example peptide names (PubChem)
• The following names are existing supplier synonyms
• Boc-Asp(OtBu)-Pro-OH CID57383532
• Tyr-D-Ala-Gly-NMePhe CID45483974
• Tyr-D-Pro-Gly-Trp-NMeNle-Asp-Phe-NH2 CID11693641
• Ac-Cys-Ile-Tyr-Lys-Phe(4-Cl)-Tyr CID44412001
• Z-D-Val-Lys(Z)-OH CID5978?
• deamino-Cys-D-Tyr(Et)-Ile-Thr-Asn-Cys-Pro-Orn-Gly-NH2 CID68613
• H-Arg(NO2)-OMe.HCl CID135193
• Unlike HELM or PLN, these IUPAC condensed line
notations (3AA) are already used in publications.
30. The big win: display of peptides
• One of the largest end-user benefits of systematic
naming is in the improved display of compounds:
Structure of Degarelix from Wikipedia: http://en.wikipedia.org/wiki/Degarelix
31. The big win: Display of peptides
• The image of Degarelix becomes optimal (IMHO)
Image credit: WO2011066386A1
32. A real example: CID10033747
SMILES:
CC(C)[C@H]1C(=O)N[C@H](C(=O)N[C@H](C(=O)N2CCC[C@H]2C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)CC3=CC=C(C=C3)OP(=O)(O)O)CSSC[C@@H](C(=O)O)NC(=O)C)C(C)C)CC(=O)N
IUPAC condensed:
cyclo[Asn-Val-Pro-Cys(1)-Tyr(PO3H2)-Val].Ac-Cys(1)-OH
HELM:
PEPTIDE1{[ac].C}|PEPTIDE2{N.V.P.C}|
CHEM1{*N[C@@H](Cc1ccc(cc1)OP(=O)(O)O)C(=O)*|$_R1;;;;;;;;;;;;;;;;;_R2$|}|
PEPTIDE3{V}$
PEPTIDE1,PEPTIDE2,2:R3-4:R3|
PEPTIDE2,CHEM1,4:R2-1:R1|
CHEM1,PEPTIDE3,1:R2-1:R1|
PEPTIDE3,PEPTIDE2,1:R2-1:R1$$$
IUPAC depiction: [figure]
33. conclusions
• The use of 3-letter codes to represent non-standard
amino acids doesn’t scale to peptide registration
systems.
• Adopting the systematic rules already frequently
used by peptide chemists looks to solve a number of
amino acid nomenclature challenges faced by PDB,
HELM and the pharmaceutical industry.
• This presentation/proposal formalizes some of the
syntax/rules for constructing systematic codes for
non-standard amino acids.
34. acknowledgements
• Lisa Sach-Peltason, Hoffmann-La Roche, Basel.
• Evan Bolton, NCBI PubChem project, Bethesda, MD.
• Noel O’Boyle, NextMove Software, Cambridge, UK.
• Daniel Lowe, NextMove Software, Cambridge, UK.