Self-contained sequence representation (SCSR): A hybrid approach for representing biomolecules
1. Self-contained sequence representation (SCSR): Bridging the gap between bioinformatics and cheminformatics. K. T. Taylor, W. L. Chen, B. D. Christie, Joseph L. Durant, D. L. Grier, B. A. Leland, J. G. Nourse. Recent Progress in Chemical Structure Representation, ACS Boston 2010, CINF Division of Chemical Information, August 23, 2010
4. Residue-based Representation: Alphabet of Life. Jamey D. Marth, "A Unified Vision of the Building Blocks of Life", Nature Cell Biology, Vol. 10, pp. 1015-1016, 2008
68 molecules identified in the "alphabet of life"
Additions to prochiral centers: most of the chain atoms in amino acids are prochiral, and many can undergo OH addition reactions. Hydroxyproline was one of the modifications found in the recently sequenced T. rex collagen fragments.
Extract the azolinone core from Green Fluorescent Protein to use as a query
The attachment point contains both the atom and optionally the bond to the leaving group.
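As a rough illustration of the statement above, an attachment point can be modeled as an atom plus an optional bond to a leaving group. This is a minimal sketch, not the SCSR file format: the class name, field names, labels, and atom numbering are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AttachmentPoint:
    """One attachment point on a residue template (hypothetical layout)."""
    label: str                                # hypothetical name, e.g. "C-term"
    atom_index: int                           # template atom that forms the new bond
    leaving_atom_index: Optional[int] = None  # leaving-group atom, if one is recorded

    def leaving_bond(self) -> Optional[Tuple[int, int]]:
        """Return the (atom, leaving atom) bond, or None if none is stored."""
        if self.leaving_atom_index is None:
            return None
        return (self.atom_index, self.leaving_atom_index)


# Illustrative values only; the atom numbering is invented for this sketch.
c_term = AttachmentPoint(label="C-term", atom_index=3, leaving_atom_index=5)
n_term = AttachmentPoint(label="N-term", atom_index=1)

print(c_term.leaving_bond())  # (3, 5)
print(n_term.leaving_bond())  # None
```

Making the leaving-group bond optional mirrors the "optionally" in the slide text: a template can name the bonding atom without saying what is lost when the bond forms.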
Anantin: ANF (atrial natriuretic factor) antagonist. Sequence numbering is supported. Cyclization is by an isoaspartyl glycine isopeptide linkage (Gly-Asn, β). Attachment point information is stored, so isopeptide linkages can be disambiguated. Compression factor 8x (135 heavy atoms reduced to 17 template atoms). Residue codes: D - Aspartate, E - Glutamate, F - Phenylalanine, G - Glycine, H - Histidine, I - Isoleucine, K - Lysine, L - Leucine, N - Asparagine, S - Serine, W - Tryptophan, Y - Tyrosine
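One way to see why stored attachment points disambiguate isopeptide linkages: if each inter-residue link records which attachment point it uses on each residue, a backbone peptide bond and a side-chain amide bond are no longer the same record. The sketch below assumes hypothetical attachment-point labels of the form "backbone_C" or "sidechain_N" and made-up sequence numbers; it is not the SCSR encoding itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A bond between two residues, identified by sequence number and
    by the attachment point used on each side (labels are hypothetical)."""
    from_residue: int
    from_point: str
    to_residue: int
    to_point: str


def classify(link: Link) -> str:
    """Classify a link from its attachment-point labels alone."""
    points = {link.from_point, link.to_point}
    if points == {"backbone_C", "backbone_N"}:
        return "peptide"
    # Element suffix of each label, e.g. "sidechain_C" -> "C".
    elements = {p.split("_")[1] for p in points}
    if elements == {"C", "N"}:
        # An amide bond that involves at least one side-chain attachment point.
        return "isopeptide"
    return "other"


# Sequence numbers here are illustrative, not taken from a real peptide.
chain_link = Link(2, "backbone_C", 3, "backbone_N")
ring_link = Link(9, "sidechain_C", 1, "backbone_N")

print(classify(chain_link))  # peptide
print(classify(ring_link))   # isopeptide
```

Without the attachment-point labels, both links would reduce to "residue A is bonded to residue B", which is exactly the ambiguity the slide says is avoided.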
Bacitracin A: explicit chemistry shown in blue. Modifications: 1) loss of methylimidamide group from arginine; 2) thiazole formation between cysteine and isoleucine. 100 heavy atoms, 29 in compressed form. Note that attachment point information allows determination that isoleucine is bonded to the omega-N in lysine. Attachment point information also allows identification of the amino acid N and C termini in the ring. Residue codes: D - Aspartate, E - Glutamate, F - Phenylalanine, H - Histidine, I - Isoleucine, K - Lysine, L - Leucine, N - Asparagine
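The compression figures quoted on the anantin and bacitracin A slides can be checked with simple arithmetic, assuming "compression factor" means the heavy-atom count of the full structure divided by the atom count of the compressed form. That definition is an assumption; the slides give only the counts and the rounded 8x figure.

```python
def compression_factor(heavy_atoms: int, compressed_atoms: int) -> float:
    """Ratio of full heavy-atom count to compressed atom count (assumed definition)."""
    if compressed_atoms <= 0:
        raise ValueError("compressed_atoms must be positive")
    return heavy_atoms / compressed_atoms


# Atom counts as quoted on the slides.
examples = {
    "Anantin": (135, 17),
    "Bacitracin A": (100, 29),
}

for name, (heavy, compressed) in examples.items():
    print(f"{name}: {compression_factor(heavy, compressed):.1f}x")
# Anantin: 7.9x
# Bacitracin A: 3.4x
```

Under this reading, the more explicit chemistry a structure needs (as in bacitracin A, where the modified residues are drawn atom by atom), the lower the compression it achieves.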
Again, we have suppressed the amino acid sequences on either side of the custom amino acids found in our search
We have used this to generate a number of databases, including one containing ~7500 human proteins