Co-opting the Genetic Code
DNA Binary Encoding Protocol
Amit Snyderman, ITP Design Frontiers, 2010
Genetic Code
The genetic code is the set of rules by which
information encoded in genetic material is translated
into proteins by living cells.
DNA
Deoxyribonucleic acid (DNA) contains the genetic
instructions used in the development and functioning
of all known living organisms. The main role of DNA
molecules is the long-term storage of information.
Units of the Code
Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)
Codons
Mapping between nucleotide triplets and amino acids
4^3 combinations = 64 possible codons

http://upload.wikimedia.org/wikipedia/en/d/d6/GeneticCode21-version-2.svg
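The codon count can be checked by enumerating every ordered triplet of the four bases. This is a minimal sketch; the alphabetical base order A, C, G, T is an assumption chosen for illustration.

```python
from itertools import product

# Assumed base ordering (alphabetical); any fixed order gives the same count.
BASES = "ACGT"

# Every ordered triplet of the four bases: 4 ** 3 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:5])    # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```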
Amino Acids
20 amino acids

Just as the letters of the alphabet can be combined to
form an almost endless variety of words, amino acids
can be linked together in varying sequences to form a
vast variety of proteins.

http://upload.wikimedia.org/wikipedia/commons/3/37/Aa.svg
Proteins
Every protein is chemically defined by its unique
sequence of amino acid residues, which in turn define
the three-dimensional structure of the protein.
MIME Base64
Encoding scheme that encodes binary data by treating
it numerically and translating it into a base 64
representation.
TEXT              M         a         n
ASCII             77        97        110
BIT PATTERN       010011010110000101101110
INDEX             19      22      5       46
BASE64-ENCODED    T       W       F       u
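The "Man" example can be reproduced by hand: concatenate the 8-bit patterns, regroup them into 6-bit values, and look each value up in the Base64 alphabet. The sketch below assumes an input whose length is a multiple of 3 bytes, so no padding is needed; the helper name base64_indices is hypothetical.

```python
import base64

# Standard Base64 alphabet: the position of a character is its 6-bit value.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def base64_indices(data):
    """Split bytes into 6-bit values (input length must be a multiple of 3)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

indices = base64_indices(b"Man")
encoded = "".join(B64_ALPHABET[i] for i in indices)

print(indices)  # [19, 22, 5, 46]
print(encoded)  # TWFu

# Cross-check against the standard library encoder.
assert encoded == base64.b64encode(b"Man").decode("ascii")
```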
Remap
Rather than mapping to a character, map to a codon.
VALUE  CHARACTER  CODON  AMINO ACID          VALUE  CHARACTER  CODON  AMINO ACID          VALUE  CHARACTER  CODON  AMINO ACID
0      A          AAA    Lysine (K)          22     W          CCG    Proline (P)         43     r          GGT    Glycine (G)
1      B          AAC    Asparagine (N)      23     X          CCT    Proline (P)         44     s          GTA    Valine (V)
2      C          AAG    Lysine (K)          24     Y          CGA    Arginine (R)        45     t          GTC    Valine (V)
3      D          AAT    Asparagine (N)      25     Z          CGC    Arginine (R)        46     u          GTG    Valine (V)
4      E          ACA    Threonine (T)       26     a          CGG    Arginine (R)        47     v          GTT    Valine (V)
5      F          ACC    Threonine (T)       27     b          CGT    Arginine (R)        48     w          TAA    STOP
6      G          ACG    Threonine (T)       28     c          CTA    Leucine (L)         49     x          TAC    Tyrosine (Y)
7      H          ACT    Threonine (T)       29     d          CTC    Leucine (L)         50     y          TAG    STOP
8      I          AGA    Arginine (R)        30     e          CTG    Leucine (L)         51     z          TAT    Tyrosine (Y)
9      J          AGC    Serine (S)          31     f          CTT    Leucine (L)         52     0          TCA    Serine (S)
10     K          AGG    Arginine (R)        32     g          GAA    Glutamate (E)       53     1          TCC    Serine (S)
11     L          AGT    Serine (S)          33     h          GAC    Aspartate (D)       54     2          TCG    Serine (S)
12     M          ATA    Isoleucine (I)      34     i          GAG    Glutamate (E)       55     3          TCT    Serine (S)
13     N          ATC    Isoleucine (I)      35     j          GAT    Aspartate (D)       56     4          TGA    STOP
14     O          ATG    Methionine (M)      36     k          GCA    Alanine (A)         57     5          TGC    Cysteine (C)
15     P          ATT    Isoleucine (I)      37     l          GCC    Alanine (A)         58     6          TGG    Tryptophan (W)
16     Q          CAA    Glutamine (Q)       38     m          GCG    Alanine (A)         59     7          TGT    Cysteine (C)
17     R          CAC    Histidine (H)       39     n          GCT    Alanine (A)         60     8          TTA    Leucine (L)
18     S          CAG    Glutamine (Q)       40     o          GGA    Glycine (G)         61     9          TTC    Phenylalanine (F)
19     T          CAT    Histidine (H)       41     p          GGC    Glycine (G)         62     +          TTG    Leucine (L)
20     U          CCA    Proline (P)         42     q          GGG    Glycine (G)         63     /          TTT    Phenylalanine (F)
21     V          CCC    Proline (P)
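The table follows a regular pattern: Base64 value n maps to the n-th codon when codons are listed alphabetically over A, C, G, T. A minimal sketch of rebuilding it is shown below. The names CHAR_TO_CODON, CODON_TO_AMINO and AMINO are hypothetical, and the AMINO string is simply the one-letter column of the table above written in the same order, with "*" standing for STOP.

```python
from itertools import product

B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

# Codons in alphabetical order: AAA, AAC, AAG, AAT, ACA, ..., TTT.
CODONS = ["".join(t) for t in product("ACGT", repeat=3)]

# One-letter amino acid per codon, same order as CODONS ("*" = STOP).
AMINO = (
    "KNKNTTTTRSRSIIMI"   # AAA..ATT
    "QHQHPPPPRRRRLLLL"   # CAA..CTT
    "EDEDAAAAGGGGVVVV"   # GAA..GTT
    "*Y*YSSSS*CWCLFLF"   # TAA..TTT
)

CHAR_TO_CODON = dict(zip(B64_ALPHABET, CODONS))
CODON_TO_AMINO = dict(zip(CODONS, AMINO))

# Spot-check a few rows against the table.
for value in (0, 22, 43, 63):
    char = B64_ALPHABET[value]
    codon = CODONS[value]
    print(value, char, codon, CODON_TO_AMINO[codon])
```

Because 64 codons map onto only 20 amino acids plus STOP, the character-to-codon step is reversible but the codon-to-protein step is not: the DNA string carries the data, while the protein string is a lossy by-product.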
Example: Hello, world!
BASE64          SGVsbG8sIHdvcmxkIQo=
BASE64 INDEX    18 6 21 44 27 6 60 44 8 7 29 47 28 38 49 36 8 16 40
DNA             CAGACGCCCGTACGTACGTTAGTAAGAACTCTCGTTCTAGCGTACGCAAGACAAGGA
PROTEIN         QTPVRTLVRTLVLAYARQG
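The example can be reproduced end to end with a short sketch. It is not the author's encoder.php; the function names are hypothetical and two assumptions are inferred from the slide: the input carries a trailing newline (the Base64 string ends in "Qo="), and the "=" padding character is dropped before mapping, since the index row lists 19 values for a 20-character string.

```python
import base64
from itertools import product

B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
CODONS = ["".join(t) for t in product("ACGT", repeat=3)]
# One-letter amino acid per codon in the same order ("*" = STOP).
AMINO = ("KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLL"
         "EDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF")

def encode_dna(data):
    """bytes -> Base64 text (padding dropped, an assumption) -> codons."""
    text = base64.b64encode(data).decode("ascii").rstrip("=")
    return "".join(CODONS[B64_ALPHABET.index(ch)] for ch in text)

def decode_dna(dna):
    """codons -> Base64 text (padding restored) -> bytes."""
    text = "".join(
        B64_ALPHABET[CODONS.index(dna[i:i + 3])]
        for i in range(0, len(dna), 3)
    )
    text += "=" * (-len(text) % 4)
    return base64.b64decode(text)

def translate(dna):
    """codons -> one-letter amino acid string (lossy, for display only)."""
    return "".join(
        AMINO[CODONS.index(dna[i:i + 3])] for i in range(0, len(dna), 3)
    )

message = b"Hello, world!\n"   # trailing newline assumed from "Qo="
dna = encode_dna(message)
protein = translate(dna)

print(dna)      # CAGACGCCCGTACGTACGTTAGTAAGAACTCTCGTTCTAGCGTACGCAAGACAAGGA
print(protein)  # QTPVRTLVRTLVLAYARQG
assert decode_dna(dna) == message
```

Decoding works from the DNA string alone. The protein string cannot be decoded back, because several codons share one amino acid.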


Any binary data can be represented: text (Unicode),
bitmap, audio, video, etc.
Try It: http://amitsnyderman.com/school/designfrontiers/encoder.php
Disk is cheap.
Who cares?
Non-Digital Library
Via recombinant DNA technologies, craft a portable,
reproducible, time-resistant library. Embed, grow and
spread in bacteria. Package as a pill. Organic time
capsule.
Spime
"The key to the Spime is identity. A Spime is, by
definition, the protagonist of a documented process. It
is an historical entity with an accessible, precise
trajectory through space and time."
                           –Bruce Sterling, Shaping Things
Spime Ingredients
Unique ID code
History of ownership
Geographical position
Customization details
Public discourse
Etc.
Human Identification
Tools, artifacts, archeology
Bone structure, teeth, dental records, fingerprints, DNA
Yellow Pages, résumé, Facebook/LinkedIn/etc., Google
Narrative
Embedded histories. Family trees and lineage. Stories.
Junk DNA
Non-coding DNA consists of sequences that do not
encode proteins. Much of this DNA has no
known biological function and is sometimes referred to
as "junk DNA".

More than 98% of the human genome is non-coding.
Human Spime
Recycle junk DNA by recombining encoded messages
into non-coding DNA regions.
In-vitro manipulation. Gene therapy.
Hereditary storytelling.
