UNIT V - COMPUTING WITH NEW NATURAL MATERIALS: DNA Computing
Motivation, DNA Molecule, Adleman's experiment, Universal
DNA Computers, PAM Model, Splicing Systems, Lipton's
Solution to SAT Problem, Scope of DNA Computing, From
Classical to DNA Computing
INTRODUCTION
• The processes that transform matter and energy in living systems do so under the
direction of a set of symbolically encoded instructions.
• The "machine language" that describes the objects and processes of living systems
contains four letters {A,C,T,G}, and the "text" that describes a person has a great
number of characters.
• These form the basis of the DNA molecules that contain the genetic information of
all living beings. DNA is also the main component of DNA computing.
INTRODUCTION
• DNA computing is one particular component of a broader field called molecular
computing that can be broadly defined as the use of (bio)molecules and
biomolecular operations to solve problems and to perform computation.
• It is a unique combination of computer science and molecular
biology. DNA computing was introduced by L. Adleman in 1994, when he solved an
NP-complete problem using DNA molecules and biomolecular techniques for
manipulating DNA.
• The basic idea is that it is possible to apply operations to a set of (bio)molecules
and obtain interesting and practical results.
• To date, most molecular computing techniques are based on a brute force strategy in
which the operations are simultaneously (in parallel) applied to all molecules being
used.
INTRODUCTION
• DNA computing can be viewed as a novel approach for solving complex problems,
or as a completely new computing paradigm that may eventually complement or
supplement the current silicon-based computers. In order for such goals to be met,
there are several questions concerning DNA computing that have to be answered.
Two important ones are related to the potential of this new paradigm and to the
feasibility of physically implementing it.
1. Can any algorithm be simulated by means of DNA computing? In other words, is
DNA computing computationally complete? Is there a universal DNA system in
the same sense as there is a universal Turing machine: given a computable function,
can it simulate the actions of that function for any argument?
2. Is it possible to design a programmable molecular computer? In other words, how
feasible is it to construct a molecular computer? What difficulties should be
overcome for the construction of a real (physical) molecular computer?
INTRODUCTION
• Several models of DNA computing have been
proposed to answer these and other questions, and
some of these models will be reviewed here.
• It is possible to divide the DNA computing models into
two major classes:
⚬ The first class, commonly referred to as filtering models, includes
models based on operations that are successfully implemented in the
laboratory.
⚬ The second class is composed of the so-called formal models, such as the
splicing systems and the sticker systems, whose properties are much
easier to study, but for which only the first steps have been taken toward
practical implementation.
INTRODUCTION
• In brief, DNA computing is based on the use of DNA molecules as data structures,
and the application of DNA manipulation techniques to compute with DNA.
• Based upon standard biological operations to manipulate DNA, it has been possible
to introduce what can be called DNA programming languages, with specific
operators and data structures.
• Two of these languages are discussed, namely, the test tube programming
language and DNA Pascal.
Motivation
The main advantages of DNA computing are its high speed, energy efficiency, and
economical information storage. There is, of course, a possibility of error, and there are
difficulties in implementing a real DNA computer. When compared with the currently
known silicon-based computers, DNA computing offers some unique features:
• It uses DNA as data structures. Thus, data is stored using strings built out of a
quaternary alphabet, {A,C,G,T}, instead of a binary alphabet {0,1}. The structure of
DNA and how it is manipulated are completely different from the data structures and
operations used in today's computers.
• DNA molecules, and thus DNA computers, can work in a massively parallel fashion.
• Computation can be performed at the molecular level, a size limit that may
never be reached by the semiconductor industry.
• DNA computers can potentially achieve very high energy efficiency and very
economical information storage.
• DNA computers are well suited to brute-force search on NP-complete problems; that is,
their massive parallelism may make it possible to solve instances that cannot be
(practically) solved using standard computers.
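As a rough illustration of the quaternary alphabet mentioned above, the sketch below packs binary data into a DNA string at two bits per base. The mapping 00→A, 01→C, 10→G, 11→T and the names `encode` and `decode` are assumptions made for this example; they are not taken from any particular DNA storage scheme.

```python
# Hypothetical 2-bits-per-base mapping (an assumption for illustration only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Inverse of encode: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(encode(b"\x1b"))  # byte 00011011 -> prints ACGT
```

Each base carries two bits, so a strand needs half as many symbols as the equivalent binary string.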
BASIC CONCEPTS FROM MOLECULAR BIOLOGY
• DNA computing is based on two aspects of molecular biology: the use of DNA
molecules as a type of data structure to perform computation, and the use of DNA
manipulation techniques to compute with DNA molecules.
The DNA Molecule
• The genetic material is contained in the cell nucleus, where it is complexed with proteins
and organized into linear structures called chromosomes. Chromosomes are
composed of genes, which are themselves segments of a helical molecule called
deoxyribonucleic acid, or DNA for short.
The DNA Molecule
• All the genetic information in cellular organisms is stored in DNA, which consists of
polymer chains, commonly referred to as DNA strands, of four simple nucleic acid
units, called deoxyribonucleotides or simply nucleotides.
• There are four nucleotides found in DNA. Each nucleotide consists of three parts:
one base molecule, a sugar (deoxyribose in DNA), and a phosphate group.
• The four bases are adenine (A), guanine (G), cytosine (C), and thymine (T). As
the nucleotides differ only by their bases, they are often called bases.
The DNA Molecule
• Numbers from 1′ to 5′ are used to denote the five carbon atoms of the sugar part of
the nucleotide. The phosphate group is attached to the 5′ carbon, the base is
attached to the 1′ carbon, and a hydroxyl (OH) group is attached to the 3′ carbon.
Each strand has, according to chemical convention, a 5′ and a 3′ end; thus any single
strand has a natural orientation. This orientation, and the notation used here, is due to
the fact that one end of the single strand has a free (i.e., unattached to another
nucleotide) 5′ phosphate group, and the other has a free 3′ hydroxyl group.
The DNA Molecule
• Figure 9.2(a) shows a schematic representation of a nucleotide and Figure 9.2(b)
depicts the chemical structure of a nucleotide with a thymine base. In Figure 9.2(a), the
carbons of the sugar are numbered from 1′ to 5′. Note that the phosphate group
is attached to the 5′ carbon, the hydroxyl is attached to the 3′ carbon, and the base is
attached to the 1′ carbon, as discussed above.
The DNA Molecule
Nucleotides can link together in two different ways:
• The 5′ phosphate group of one nucleotide is joined with the 3′ hydroxyl group of the
other forming a covalent bond.
• The base of one nucleotide interacts with the base of the other to form a hydrogen
bond, which is a bond weaker than the covalent bond.
The DNA Molecule
• Another important feature of the nucleotide bonding is that any two nucleotides can
link together to form a sequence in the same way as several symbols form a string.
• As for symbols on a string, there is no restriction on nucleotides in the sequence.
• However, the bonds between the bases can only occur by the pairwise attraction of
the following bases: A binds with T, and G binds with C.
⚬ This is called Watson-Crick complementarity, after J. D. Watson and F. H. C. Crick, who
discovered the double helix structure of DNA.
• Since DNA consists of two complementary strands bonded together, these units are
often called base pairs. The length of a DNA sequence is often measured in
thousands of bases, abbreviated kb. Nucleotides are generally abbreviated by their
first letter and appended into sequences, written, e.g., GTACAGTT. The nucleotides
are linked to each other in the polymer by phosphodiester bonds. It can be observed
from Figure 9.3 that the bond is directional; a strand of DNA has a head (the 5′ end)
and a tail (the 3′ end).
The DNA Molecule
• One well-known fact about DNA is that it forms a double helix; that is, two helical
(spiral-shaped) polynucleotide strands, running in opposite directions, held
together by hydrogen bonds.
The DNA Molecule
• The single stranded sequences of nucleotides have a directionality given by the
carbons used by the covalent bonds. The hydrogen bonds, based on complementarity,
can bring together single stranded sequences of nucleotides only if they are of
opposite directionality. Although the sequence in one strand of DNA is completely
unrestricted, because of the bonding restrictions the sequence in the complementary
strand is completely determined. It is this feature that makes it possible to produce
high fidelity copies of the information stored in the DNA. Despite the many
schematic representations of DNA molecules, such as the ones depicted in Figure 9.3
and Figure 9.4, in the DNA computing context, most DNA molecules are simply
represented as linear strings of characters, such as the one depicted
The DNA Molecule
• Note the directionality and complementarity in this representation.
• Note also that the two strands are drawn one over the other, with the upper strand
oriented from left to right in 5′−3′ direction, and the lower strand oriented in the
opposite direction, 3′−5′.
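The string representation with directionality and complementarity can be sketched in a few lines. This is only an illustration: the function names are assumptions, and the example sequence GTACAGTT is the one quoted earlier in the text.

```python
# Watson-Crick pairing: A binds with T, and G binds with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base partner of a 5'->3' strand; the result reads 3'->5'."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """The same partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]

upper = "GTACAGTT"            # upper strand, left to right in 5'->3' direction
lower = complement(upper)     # lower strand, left to right in 3'->5' direction
print("5'-" + upper + "-3'")
print("3'-" + lower + "-5'")
```

Because the lower strand is fully determined by the upper one, storing a single string is enough to describe a complete double strand.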
The DNA Molecule
• DNA molecules do not necessarily have the same number of bases in the
upper and lower strands. In some cases, there may be "free" bases on the left or right
end of the molecule. In this case we say that the molecule has sticky ends, as
illustrated in Figure 9.6. These ends are called sticky because any sequence
complementary to the free end can bind with it.
Manipulating DNA
• All DNA computing techniques apply a specific set of biological operations to a set
of strands.
• These operations are all commonly used by molecular biologists, and the most
important of them, in the context of DNA computing, are discussed.
• Many operations with DNA can be mediated by enzymes, which are proteins that
catalyze some chemical reactions where DNA is involved.
Manipulating DNA
• Denaturation (separates DNA strands): by heating double-stranded DNA, it
becomes possible to separate the two strands into single strands. It is possible to
separate the two strands without breaking the single strands because the hydrogen
bonds between complementary nucleotides are much weaker than the covalent bonds
between adjacent nucleotides within each strand. Denaturation is also called melting.
• Annealing (fuses DNA strands): annealing is the reverse of melting, whereby a
solution of single strands is cooled down so as to allow complementary strands to
bind together. When the single strands in the solution are not complementary in their
entirety, the result of annealing is DNA molecules with sticky ends. Annealing is also
called renaturation.
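Melting and annealing can be mimicked on strings. The following is a minimal sketch under simplifying assumptions: both strands are written 5′→3′, a duplex is a pair of strings, and only fully complementary strands anneal (partial annealing with sticky ends is not modelled). The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def denature(duplex):
    """Melting: split an (upper, lower) duplex into its two single strands."""
    upper, lower = duplex
    return [upper, lower]

def anneal(s1, s2):
    """Return a duplex if the two strands are fully complementary, else None."""
    if s2 == reverse_complement(s1):
        return (s1, s2)
    return None

duplex = anneal("GTACAGTT", "AACTGTAC")
print(duplex)
print(denature(duplex))
```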
Manipulating DNA
• Polymerase extension (fills in incomplete strands): a class of enzymes
called polymerases is able to add nucleotides to an incomplete DNA molecule, such
as the one shown in Figure 9.7(b). Thus, the molecule can be completed to a double
strand without sticky ends. The polymerases add nucleotides in the 5′−3′
direction, pairing each nucleotide of the template with its Watson-Crick complement. This
process requires an existing single strand that acts as a template prescribing the
chain of nucleotides to be added (by Watson-Crick complementarity), and an existing
sequence, called a primer, which is bonded to a part of the template with its 3′ end
available for extension in the 5′−3′ direction (Figure 9.8(a)).
• Nuclease degradation (shortens DNA molecules): exonucleases shorten DNA by
removing nucleotides from its ends. Some exonucleases remove nucleotides from the 5′ end,
others remove them from the 3′ end, and others remove nucleotides from single strands.
Figure 9.8 illustrates the polymerase extension and nuclease degradation of DNA molecules.
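Polymerase extension can be sketched as string manipulation. The example assumes that both template and primer are written 5′→3′ and that the primer anneals at the 3′ end of the template; the function name and the sequences are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def polymerase_extend(template: str, primer: str) -> str:
    """Extend `primer` along `template`; both are written 5'->3'."""
    template_3to5 = template[::-1]     # read the template starting at its 3' end
    # The primer must pair, base by base, with the 3' end of the template.
    for p, t in zip(primer, template_3to5):
        if COMPLEMENT[t] != p:
            raise ValueError("primer does not anneal to the template")
    new_strand = primer
    # Add one nucleotide at a time to the 3' end of the growing strand.
    for t in template_3to5[len(primer):]:
        new_strand += COMPLEMENT[t]
    return new_strand

print(polymerase_extend("AACTGTAC", "GTA"))  # prints GTACAGTT
```

The new strand grows only at its 3′ end, which is why a primer with a free 3′ end is required.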
Manipulating DNA
• Endonucleases (cut DNA molecules): endonucleases are able to cut DNA
molecules by destroying the covalent bonds between adjacent nucleotides. They can
be specific as to what, where, and how they cut. For instance, endonucleases called
restriction enzymes can cut DNA molecules at specific sites. The cut can be blunt,
i.e., straight through both strands, or staggered, i.e., leaving sticky ends (Figure 9.9).
There are also endonucleases that cut only single strands, or only within the single stranded
pieces of a mixed DNA molecule containing single stranded and double stranded
pieces.
• Ligation (links DNA molecules): a class of enzymes called ligases can be used to link
together, or ligate, fragments of DNA molecules. Ligation is often performed after an
annealing operation in order to concatenate DNA strands. The difference between
annealing and ligation is that the former allows for the bonding of complementary
bases, while the latter performs the phosphodiester bonding between two consecutive
nucleotides. The hydrogen bonds keep complementary sticky ends together, but a gap
(called a nick) remains in each of the strands. A ligase is thus used to establish this
bond (see Figure 9.10). Although it is possible to use some ligases to concatenate free
floating double stranded DNA molecules with blunt ends, it is much easier to allow
single strands with sticky ends to bind together.
Manipulating DNA
• Modifying nucleotides (inserts or deletes short subsequences): it is
possible to add, substitute, or delete certain subsequences from DNA molecules. The
enzymes used in these operations are called modifying enzymes.
• Amplification (multiplies DNA molecules): given some DNA molecules, a
technique called polymerase chain reaction (PCR) can be used to make multiple
copies of a subset of the strands present.
• PCR requires a start and an end subsequence, called primers, that are used to
identify the sequence, called template, to be replicated.
• The PCR was devised in 1985 by Kary Mullis and revolutionized molecular biology.
• It is very sensitive, simple and efficient, consisting basically of three steps:
denaturation, annealing, and polymerase extension.
Manipulating DNA
• Step 0: To start with, prepare a solution containing the molecule m to be copied and
the primers.
• Step 1: (Denaturation) Then, heat the solution so that the hydrogen bonds between
the two strands are destroyed, and the molecule m denatures into two strands.
• Step 2: (Annealing) Now, cool down the solution so that the primers will anneal to
their complementary subsequences.
• Step 3: (Polymerase extension) Finally, apply the polymerase extension technique
to fill in the sticky ends so as to form double strands.
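The effect of repeated PCR cycles can be approximated with a very idealized model. The sketch below assumes a single-string template, primers given as the start and end subsequences on that string, and perfect doubling in every cycle; the function name `pcr` and the sequences are hypothetical, and real reactions are less efficient.

```python
def pcr(template: str, start: str, end: str, cycles: int):
    """Return (amplicon, copies) for the region delimited by start ... end."""
    i = template.find(start)
    j = template.find(end, i + len(start)) if i != -1 else -1
    if i == -1 or j == -1:
        return None, 0                  # the primers do not anneal: nothing is copied
    amplicon = template[i:j + len(end)]
    return amplicon, 2 ** cycles        # idealized doubling in every cycle

amplicon, copies = pcr("TTGTACAGTTCCGGAATT", "GTAC", "GGAA", cycles=10)
print(amplicon, copies)  # prints GTACAGTTCCGGAA 1024
```

Only strands that contain both primer sites are amplified, which is what makes PCR usable as a selection step.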
Manipulating DNA
• Gel electrophoresis (measures the length of DNA molecules and
separates them by length): the length of a single stranded molecule is the
number of nucleotides composing the molecule. For example, if a molecule has 12
nucleotides, it is a 12 mer; that is, a polymer consisting of 12 monomers.
• The length of a double stranded molecule is the number of base pairs. For example, a
double stranded molecule with 12 nucleotides in each strand has a length of 12 base pairs (12 bp).
• Gel electrophoresis is a technique that can be used for two main purposes:
1. to measure the length of a DNA molecule, and
2. to sort (separate) DNA strands by length.
• Gel electrophoresis (separation by length) is one of the ways often used to read out
the results of DNA computing, with the advantage of being quite precise.
Manipulating DNA
• In the gel electrophoresis process, a gel is prepared and will function as a support for
the separation of DNA fragments. Holes are created in the gel and will serve as
reservoirs to hold the solution containing DNA.
• An electric field is then applied to the setup. DNA molecules are negatively charged,
so they migrate through the gel toward the positive electrode; since smaller molecules
travel faster through the gel, larger molecules lag behind.
• As the separation process continues, the distance between larger and smaller
molecules becomes more apparent.
• After the gel electrophoresis has been run, it is necessary to visualize the results.
⚬ To do so, the DNA is stained with a dye and the gel is viewed under ultraviolet light. The gel is
then photographed.
⚬ DNA fragments of the same length cluster together, producing visible horizontal bands.
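In a simulation, separation by length reduces to grouping strands by their length. The sketch below is an assumption-laden model of a gel: each band is a (length, strands) pair, listed from the fastest (shortest) molecules to the slowest (longest).

```python
from collections import defaultdict

def gel_electrophoresis(strands):
    """Group strands into bands by length, shortest (fastest) band first."""
    bands = defaultdict(list)
    for strand in strands:
        bands[len(strand)].append(strand)
    # Lengths are unique keys, so sorting the items orders the bands by length.
    return sorted(bands.items())

for length, band in gel_electrophoresis(["ACGT", "AC", "GGCC", "A"]):
    print(length, band)
```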
Manipulating DNA
• Filtering (separates or extracts specific molecules): there are a few
techniques that can be used to separate some particular strands from a solution. One
of the simplest methods is as follows.
• If a single stranded molecule of type o is to be separated from those of other types in
a given solution S, one can attach its complement (ō) to a filter and pour the solution
S through the filter.
• Then, o molecules will bind to ō molecules while the others will just flow through
the filter.
• The annealing between o and ō results in a collection of double-stranded molecules
fixed to the filter, and a solution S* results from S by removing the o molecules.
• Finally, the filter is transferred to a container where the double stranded DNA is
denatured, resulting only in the molecules to be separated.
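The filtering procedure above can be sketched as follows. It assumes that a strand "of type o" is any strand containing the subsequence o, that all strands are written 5′→3′, and that the filter carries the reverse complement of o; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def filter_solution(solution, probe):
    """`probe` is the strand fixed to the filter (the complement of o).

    Returns (captured, s_star): the strands that anneal to the probe, and
    the solution S* that flows through the filter.
    """
    target = reverse_complement(probe)       # the sequence o that the probe binds
    captured = [s for s in solution if target in s]
    s_star = [s for s in solution if target not in s]
    return captured, s_star

probe = reverse_complement("GATTC")
captured, s_star = filter_solution(["TTGATTCAA", "CCCCCC", "GATTC"], probe)
print(captured, s_star)
```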
Manipulating DNA
• Synthesis (creates DNA molecules): it has already been discussed how to
synthesize double strands from single strands via polymerase extension. However, the
question remains as to how single stranded molecules can be synthesized. It is
possible to chemically synthesize single stranded molecules, called oligonucleotides
(or simply oligos), using a particular machine. This synthesizer is supplied with the
four nucleotide bases in solution and adds nucleotide by nucleotide following a
prescribed sequence entered by the user.
Manipulating DNA
• Sequencing (reads out the sequence of a DNA molecule): determining
the exact sequence of nucleotides comprising a given DNA molecule is essential for
the interpretation of the results obtained in most DNA computing approaches. The
most popular sequencing technique is based on the extension of a primed single
stranded template. For our present purposes, all we have to know is that there are
techniques able to read a DNA molecule nucleotide by nucleotide. Further details are
left as an exercise for the reader.
FILTERING MODELS
• In all filtering models, a computation consists of a sequence of operations on finite
multi-sets of strings that usually starts and ends with a single multi-set.
• By initializing a multi-set and applying specific operations to it, new or modified
multi-sets are generated.
• The computation then proceeds by filtering out strings that cannot be a solution.
1. Adleman’s Experiment
2. Lipton’s Solution to the SAT problem
Adleman’s Experiment
• Speculations about the possibility of using DNA molecules to perform computation date
back to the early 1970s and maybe even earlier.
• However, none of these insights was followed by practical attempts at real world
implementations.
• The first successful experiment involving the use of DNA molecules and DNA manipulation
techniques for computing was reported by L. M. Adleman in 1994 (Adleman, 1994).
• In that paper, Adleman solved a small instance of the Hamiltonian path problem (HPP) in a
directed graph using purely biochemical means.
• In general, the HPP consists of deciding whether or not an arbitrarily given graph has a
Hamiltonian path.
• HPP can be solved by an exhaustive search and various algorithms have been proposed to
solve it.
• Although these algorithms are successful for some special classes of graphs, they all have
an exponential worst-case complexity for general directed graphs.
• The Hamiltonian path problem has been shown to be an NP-complete problem.
• With Adleman's DNA computing solution to the HPP, the number of laboratory steps
was linear in terms of the size of the graph (number of vertices), although the problem itself
is known to be NP-complete.
Adleman’s Experiment
• The Hamiltonian path problem can be explained as follows. A directed graph G with
designated vertices v_in and v_out is said to have a Hamiltonian path if and only if
there is a sequence of compatible directed edges e_1, e_2, …, e_z (i.e., a path) that
begins at v_in, ends at v_out, and passes through each vertex exactly once. Figure 9.13
illustrates the instance of the Hamiltonian path problem used by Adleman (1994) in
his pioneering implementation of DNA computing. In this graph, v_in = 0 and v_out = 6,
and a Hamiltonian path is given by the following sequence of edges: 0 → 1 → 2
→ 3 → 4 → 5 → 6. This can be easily verified by inspection.
Adleman’s Experiment
• To solve this problem Adleman used the following non-deterministic algorithm:
• Step 1: generate random paths through the graph.
• Step 2: keep only those paths that begin with v_in and end with v_out.
• Step 3: if the graph has n vertices, then keep only those paths that enter exactly n
vertices.
• Step 4: keep only those paths that enter all the vertices of the graph at least once.
• Step 5: if any path remains, say YES; else, say NO.
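The five steps can be tried out in software before looking at their molecular implementation. The sketch below is an in silico illustration only: the edge list is a hypothetical seven-vertex graph (not necessarily Adleman's exact instance), random walks stand in for the random paths formed in the test tube, and the function name, sample count, and seed are assumptions.

```python
import random

def adleman(edges, n, v_in, v_out, samples=10_000, seed=0):
    """Return the set of surviving paths; a non-empty set means YES."""
    rng = random.Random(seed)
    successors = {v: [b for a, b in edges if a == v] for v in range(n)}

    # Step 1: generate random paths (random walks of at most n vertices).
    paths = []
    for _ in range(samples):
        path = [rng.randrange(n)]
        while successors[path[-1]] and len(path) < n:
            path.append(rng.choice(successors[path[-1]]))
        paths.append(tuple(path))

    # Step 2: keep only paths that begin with v_in and end with v_out.
    paths = [p for p in paths if p[0] == v_in and p[-1] == v_out]
    # Step 3: keep only paths that enter exactly n vertices.
    paths = [p for p in paths if len(p) == n]
    # Step 4: one filtering pass per vertex, keeping paths that contain it.
    for v in range(n):
        paths = [p for p in paths if v in p]
    # Step 5: any survivor means YES.
    return set(paths)

# Hypothetical edge list used only for this illustration.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (0, 3), (3, 2)]
survivors = adleman(EDGES, n=7, v_in=0, v_out=6)
print("YES" if survivors else "NO", survivors)
```

A path such as 0, 3, 2, 3, 4, 5, 6 passes Steps 2 and 3 in this graph but is removed in Step 4 because it never visits vertex 1.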
Adleman’s Experiment
• Adleman translated this algorithm step by step into molecular biology. Before
generating the random paths through the graph, it is necessary to decide how these
are going to be encoded using DNA molecules.
• Adleman chose to encode each vertex of the graph by a single stranded sequence of
nucleotides of length 20 (a 20mer). The codes were constructed at random, and the
length 20 was chosen so as to ensure different codes.
• A large number of these oligonucleotides were generated by PCR and placed in a test
tube.
• The edges are encoded as follows: if there is an edge from vertex i to vertex j, and the
codes of these vertices are v_i = a_i b_i and v_j = a_j b_j, where a_i, b_i, a_j, b_j are sequences
of length 10, then the edge i→j is encoded by the Watson-Crick complement of the
sequence b_i a_j, as illustrated in Figure 9.14.
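The encoding of vertices and edges can be written down directly. In this sketch the vertex codes are random 20mers drawn with a seeded generator (an assumption; the actual sequences used in the experiment are not given here), and the edge code is the base-by-base complement of the second half of the source vertex followed by the first half of the target vertex, so it reads 3′→5′ when the vertex codes read 5′→3′.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def random_code(rng, length=20):
    """A random single stranded code of the given length (a 20mer by default)."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def encode_edge(v_i: str, v_j: str) -> str:
    """Watson-Crick complement of b_i a_j for vertex codes v_i = a_i b_i, v_j = a_j b_j."""
    b_i, a_j = v_i[10:], v_j[:10]
    return "".join(COMPLEMENT[base] for base in b_i + a_j)

rng = random.Random(1)                       # the seed is arbitrary
codes = {v: random_code(rng) for v in range(7)}
edge_0_1 = encode_edge(codes[0], codes[1])
print(codes[0], codes[1], edge_0_1)
```

Because the edge strand overlaps half of each vertex code, it can hold two vertex strands side by side so that they can be joined into a path.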
Adleman’s Experiment
• In order to link the edges to form paths, oligonucleotides Ōi complementary to those
representing the edges (Oi) had to be synthesized. An enzymatic ligation reaction was
then carried out, linking the 20mer strands encoding the edges so as to form random
paths through the graph, corresponding to Step 1 of the algorithm.
• To implement Step 2, the product of Step 1 was amplified by polymerase chain
reaction (PCR) using as primers Ō0 and Ō6. Therefore, only those molecules
encoding paths that begin with vertex 0 and end with vertex 6 were amplified. A
filtering operation can then be used to separate the strands that start with vertex 0 and
end with vertex 6.
Adleman’s Experiment
• To implement Step 3, the DNA molecules were separated according to their length by gel
electrophoresis. The band on the gel which, by comparison with a molecular weight
marker, was identified as consisting of strands with 140 bp (7 vertices of 20 bases each)
was cut out of the gel and the DNA was extracted. Repeated cycles of PCR and
electrophoresis were used to purify the product further. At the end of this step, there
is a set of molecules that start with 0, end with 6, and pass through 7 vertices. Note,
however, that this does not ensure that such a path is Hamiltonian; for example, the path
0, 3, 2, 3, 4, 5, 6 satisfies all these conditions but is not Hamiltonian.
• To implement Step 4 of the algorithm, single stranded DNA was probed with
complementary oligonucleotides attached to magnetic beads. Thus, with one step for
every vertex, the molecules containing the required sequence could be literally pulled
out of the solution. This process can also be explained as follows: for each vertex i,
melt the result of Step 3, add the complement Ōi of the code of vertex i (i = 1, 2, …,
5), and let it anneal; then remove all molecules that do not anneal. Finally, to obtain
the YES/NO answer of Step 5, one still has to amplify the product of Step 4 by PCR
and analyze it by gel electrophoresis.
Lipton’s solution to SAT problem
• Lipton constructed a series of test tubes, where the first one, t0, is the tube
containing all two-bit binary strings. He proposed, among other operations, an extract test
tube operation E(t,i,a) that extracts all sequences in test tube t whose i-th bit is equal
to a, a ∈ {0,1}. (Remember that the binary strings are encoded using a word
composed of nucleotides, but it is simpler to explain the procedure by looking at
binary strings instead of a quaternary alphabet with the letters A, C, T, G.) Then, he
operated as follows:
F = (e1∨e2) ∧ (ē1∨ē2)
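The individual test tube steps are not reproduced in the text above. As a minimal sketch of how the extract operation E(t,i,a) can be used on this formula, the code below processes one clause at a time: for each literal it extracts the strings that satisfy it from what is left of the current tube, and the union of the extracted strings becomes the next tube. The representation of clauses as (bit index, required value) pairs and the function names are assumptions made for this illustration.

```python
from itertools import product

def extract(tube, i, a):
    """E(t, i, a): all strings in tube t whose i-th bit (1-based) equals a."""
    return {s for s in tube if s[i - 1] == a}

def solve_sat(clauses, n):
    """Each clause is a list of (bit index, required value) literals."""
    tube = {"".join(bits) for bits in product("01", repeat=n)}   # t0: all n-bit strings
    for clause in clauses:
        satisfied, remainder = set(), tube
        for i, a in clause:
            hit = extract(remainder, i, a)   # strings satisfying this literal
            satisfied |= hit
            remainder = remainder - hit      # try the next literal on the rest
        tube = satisfied                     # strings satisfying the whole clause
    return tube

# F = (e1 OR e2) AND (NOT e1 OR NOT e2)
F = [[(1, "1"), (2, "1")], [(1, "0"), (2, "0")]]
print(sorted(solve_sat(F, n=2)))  # prints ['01', '10']
```

The formula is satisfiable exactly when the final tube is non-empty; here the surviving strings 01 and 10 are the two satisfying assignments.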
Lipton’s solution to SAT
problem
Unit5_Topic1.pptx_20240923_213122_0000.pptx

Unit5_Topic1.pptx_20240923_213122_0000.pptx

  • 1.
    UNIT V - COMPUTINGWITH NEW NATURAL MATERIALS: DNA Computing Motivation, DNA Molecule , Adleman's experiment , Universal DNA Computers , PAM Model , Splicing Systems , Lipton's Solution to SAT Problem , Scope of DNA Computing , From Classical to DNA Computing
  • 2.
    INTRODUCTION • The processesthat transform matter and energy in living systems do so under the direction of a set of symbolically encoded instructions. • The „machine language that describes the objects and processes of living systems ‟ contains four letters {A,C,T,G}, and the „text that describes a person has a great ‟ number of characters. • These form the basis of the DNA molecules that contain the genetic information of all living beings. DNA is also the main component of DNA computing.
  • 3.
    INTRODUCTION • DNA computingis one particular component of a broader field called molecular computing that can be broadly defined as the use of (bio)molecules and biomolecular operations to solve problems and to perform computation. • It constitutes a unique combination between computer science and molecular biology. DNA computing was introduced by L. Adleman in 1994 when he solved an NP complete problem using DNA molecules and biomolecular techniques for manipulating DNA. • The basic idea is that it is possible to apply operations to a set of (bio)molecules, resulting in interesting and practical performances. • To date, most molecular computing techniques are based on a brute force strategy in which the operations are simultaneously (in parallel) applied to all molecules being used.
  • 4.
    INTRODUCTION • DNA computingcan be viewed as a novel approach for solving complex problems, or as a completely new computing paradigm that may eventually complement or supplement the current silicon-based computers. In order for such goals to be met, there are several questions concerning DNA computing that have to be answered. Two important ones are related to the potential of this new paradigm and to the feasibility of physically implementing it. 1. Can any algorithm be simulated by means of DNA computing? In other words, is the DNA computing computationally complete? Is there a universal DNA system in the same sense as there is a universal Turing Machine: given a computable function, can it simulate the actions of that function for any argument? 2. Is it possible to design a programmable molecular computer? In other words, how feasible is it to construct a molecular computer? What difficulties should be overcome for the construction of a real (physical) molecular computer?
  • 5.
    INTRODUCTION • Several modelsof DNA computing have been proposed to answer these and other questions, and some of these models will be reviewed here. • It is possible to divide the DNA computing models in two major classes: ⚬first class, commonly referred to as filtering models, which includes models based on operations that are successfully implemented in the laboratory ⚬second class composed of the so-called formal models, such as the splicing systems and the sticker systems, whose properties are much easier to study, but only the first steps have been taken toward their practical implementation
  • 6.
    INTRODUCTION • In brief,DNA computing is based on the use of DNA molecules as data structure, and the application of DNA manipulation techniques to compute with DNA. • Based upon standard biological operations to manipulate DNA, it has been possible to introduce what can be called DNA programming languages, with specific operators and data structures. • Two of these languages are discussed, namely, the test tube programming language, and the DNA Pascal.
  • 7.
    Motivation The main advantagesof DNA computing are its high speed, energy efficiency, and economical information storage. There are, of course, a possibility for error and difficulties in implementing a real DNA computer. When compared with the currently known silicon-based computers, DNA computing offers some unique features • It uses DNA as data structures. Thus, data is stored using strings built out of a quaternary alphabet, {A,C,G,T}, instead of a binary alphabet {0,1}. The structure of DNA and how it is manipulated is completely different from the data structure and operations used in today s computers. ‟ • DNA molecules, and thus computers, can work in a massively parallel fashion. • Computation can be performed at a molecular level, potentially a size limit that may never be reached by the semiconductor industry. • DNA computers can potentially work in an extraordinary way of high energy efficiency and economical information storage. • DNA computers are highly effective in solving NP-complete problems; that is, DNA computers can be used to solve problems that cannot be (practically) solved using standard computers.
  • 8.
    BASIC CONCEPTS FROMMOLECULAR BIOLOGY • DNA computing is based on two aspects of molecular biology: the use of DNA molecules as a type of data structure to perform computation, and the use of DNA manipulation techniques to compute with DNA molecules.
  • 9.
    The DNA Molecule •the genetic material is contained in the cell nucleus and is complexed with proteins and organized into linear structures called chromosomes. Chromosomes are composed of genes, which, by themselves, are segments of a helix molecule called deoxyribonucleic acid, or DNA for short
  • 10.
    The DNA Molecule •All the genetic information in cellular organisms is stored in DNA, which consists of polymer chains, commonly referred to as DNA strands, of four simple nucleic acid units, called deoxyribonucleotides or simply nucleotides. • There are four nucleotides found in DNA. Each nucleotide consists of three parts: one base molecule, a sugar (deoxyribose in DNA), and a phosphate group. • The four bases are adenine (A), guanine (G), cytosine (C), and thymine (T). As the nucleotides differ only by their bases, they are often called bases.
  • 11.
    The DNA Molecule •Numbers from 1′ to 5′ are used to denote the five carbon atoms of the sugar part of the nucleotide. The phosphate group is attached to the 5′ carbon, and the base is attached to the 1′ carbon. To the 3′ carbon there is attached a hydroxyl (OH) group. Each strand has, according to chemical convention, a 5′ and a 3′ end, thus any single strand has a natural orientation. This orientation, and the notation used here, is due to the fact that one end of the single strand has a free (i.e., unattached to another nucleotide) 5′ phosphate group, and the other has a free 3′ hydroxyl group.
  • 12.
    The DNA Molecule •Figure 9.2(a) brings a schematic representation of a nucleotide and Figure 9.2(b) depicts the chemical structure of a nucleotide with thymine base. In Figure 9.2(a), the carbons of the sugar base are enumerated from 1′ to 5′. Note that the phosphate group is attached to the 5′ carbon, the hydroxyl is attached to the 3′ carbon, and the base is attached to the 1′ carbon, as discussed above.
  • 13.
    The DNA Molecule Nucleotidescan link together in two different ways • The 5′ phosphate group of one nucleotide is joined with the 3′ hydroxyl group of the other forming a covalent bond. • The base of one nucleotide interacts with the base of the other to form a hydrogen bond, which is a bond weaker than the covalent bond.
  • 14.
    The DNA Molecule • Another important feature of nucleotide bonding is that any two nucleotides can link together to form a sequence, in the same way as symbols form a string. • As with symbols in a string, there is no restriction on the order of nucleotides in the sequence. • However, bonds between bases can only occur through the pairwise attraction of the following bases: A binds with T, and G binds with C. ⚬ This is called Watson-Crick complementarity, after J. D. Watson and F. H. C. Crick, who discovered the double helix structure of DNA. • Since DNA consists of two complementary strands bonded together, these units are often called base pairs. The length of a DNA sequence is often measured in thousands of bases, abbreviated kb. Nucleotides are generally abbreviated by the first letter of their base and concatenated into sequences, written, e.g., GTACAGTT. The nucleotides are linked to each other in the polymer by phosphodiester bonds. As can be observed in Figure 9.3, this bond is directional: a strand of DNA has a head (the 5′ end) and a tail (the 3′ end).
  • 15.
    The DNA Molecule • One well-known fact about DNA is that it forms a double helix; that is, two helical (spiral-shaped) polynucleotide strands, running in opposite directions, held together by hydrogen bonds.
  • 16.
    The DNA Molecule • The single stranded sequences of nucleotides have a directionality given by the carbons used by the covalent bonds. The hydrogen bonds, based on complementarity, can bring together single stranded sequences of nucleotides only if they are of opposite directionality. Although the sequence in one strand of DNA is completely unrestricted, because of the bonding restrictions the sequence in the complementary strand is completely determined. It is this feature that makes it possible to produce high fidelity copies of the information stored in the DNA. Despite the many schematic representations of DNA molecules, such as the ones depicted in Figure 9.3 and Figure 9.4, in the DNA computing context, most DNA molecules are simply represented as linear strings of characters, such as the one depicted
  • 17.
    The DNA Molecule • Note the directionality and complementarity in this representation. • Note also that the two strands are drawn one over the other, with the upper strand oriented from left to right in the 5′−3′ direction, and the lower strand oriented in the opposite direction, 3′−5′.
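The string representation and the pairing rules (A with T, G with C) can be tried out directly. The following is a minimal sketch, not part of the original slides; the function names `complement` and `reverse_complement` are chosen here for illustration only.

```python
# Watson-Crick pairing: A<->T, G<->C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement, written in the same left-to-right order.

    If `strand` is the upper strand read 5'->3', the result is the lower
    strand as it is drawn underneath it, i.e. read 3'->5'.
    """
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """The same complementary strand, but written in its own 5'->3' direction."""
    return complement(strand)[::-1]

upper = "GTACAGTT"                 # the example sequence used in the text
print("5'-" + upper + "-3'")
print("3'-" + complement(upper) + "-5'")
```

Because the complementary strand is fully determined by the upper strand, applying `reverse_complement` twice returns the original sequence.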
  • 18.
    The DNA Molecule • DNA molecules do not necessarily have the same number of bases in the upper and lower strands. In some cases, there may be “free” bases at the left or right end of the molecule. In this case we say that the molecule has sticky ends, as illustrated in Figure 9.6. These ends are called sticky because any sequence complementary to the free end can bind with it.
  • 19.
    Manipulating DNA • All DNA computing techniques apply a specific set of biological operations to a set of strands. • These operations are all commonly used by molecular biologists, and the most important of them, in the context of DNA computing, are discussed below. • Many operations with DNA can be mediated by enzymes, which are proteins that catalyze chemical reactions in which DNA is involved.
  • 20.
    Manipulating DNA • Denaturation (separates DNA strands): by heating double-stranded DNA, it becomes possible to separate the two strands into single strands. The two strands can be separated without breaking the single strands because the hydrogen bonds between complementary nucleotides are much weaker than the covalent bonds between adjacent nucleotides within each strand. Denaturation is also called melting. • Annealing (fuses DNA strands): annealing is the reverse of melting, whereby a solution of single strands is cooled down so as to allow complementary strands to bind together. When the single strands in the solution are not complementary in their entirety, the result of annealing is DNA molecules with sticky ends. Annealing is also called renaturation.
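Annealing of a short strand onto a longer one, and the sticky ends it leaves, can be illustrated on strings. This is a simplified sketch under stated assumptions: both strands are written 5′→3′, the short strand must pair over its whole length, and only the first matching position is reported. The function name `anneal` is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Complementary strand written in its own 5'->3' direction."""
    return "".join(PAIR[base] for base in reversed(strand))

def anneal(long_strand: str, short_strand: str):
    """Try to anneal short_strand onto long_strand (both given 5'->3').

    Returns (left_overhang, paired_region, right_overhang) of long_strand,
    or None when no fully complementary region exists. The overhangs are
    the single stranded "sticky ends" left after annealing.
    """
    target = reverse_complement(short_strand)
    pos = long_strand.find(target)
    if pos < 0:
        return None
    end = pos + len(target)
    return long_strand[:pos], long_strand[pos:end], long_strand[end:]

print(anneal("GTACAGTTCC", "ACTG"))   # pairs with the CAGT region
print(anneal("GTACAGTTCC", "AAAA"))   # no complementary region
```

Denaturation is the inverse step in this picture: the paired region is simply split back into the two single strands.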
  • 22.
    Manipulating DNA • Polymerase extension (fills in incomplete strands): a class of enzymes called polymerases is able to add nucleotides to an incomplete DNA molecule, such as the one shown in Figure 9.7(b). Thus, the molecule can be completed to a double strand without sticky ends. The polymerases add nucleotides in the 5′−3′ direction until each nucleotide of the template is paired with its Watson-Crick complement. This process requires an existing single strand that acts as a template prescribing the chain of nucleotides to be added (by Watson-Crick complementarity), and an existing sequence, called a primer, which is bonded to a part of the template with its 3′ end available for extension in the 5′−3′ direction (Figure 9.8(a)). • Nuclease degradation (shortens DNA molecules): exonucleases remove nucleotides from the ends of DNA molecules. Some exonucleases remove nucleotides from the 5′ end, others remove them from the 3′ end, and others remove nucleotides only from single strands. Figure 9.8 illustrates the polymerase extension and nuclease degradation of DNA molecules.
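Polymerase extension can be sketched as completing a primer along a template. The sketch below assumes the template is written 3′→5′ (as the lower strand is usually drawn) and that the primer is annealed at its left end; `polymerase_extend` is a name introduced here for illustration.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def polymerase_extend(template_3to5: str, primer_5to3: str) -> str:
    """Extend the primer in the 5'->3' direction along the template.

    The primer must already be paired with the first bases of the template.
    Nucleotides complementary to the remaining template bases are appended
    to the primer's 3' end until the template is fully paired.
    """
    n = len(primer_5to3)
    if n > len(template_3to5):
        raise ValueError("primer is longer than the template")
    if any(PAIR[t] != p for t, p in zip(template_3to5, primer_5to3)):
        raise ValueError("primer is not complementary to the template")
    extension = "".join(PAIR[t] for t in template_3to5[n:])
    return primer_5to3 + extension

# Template 3'-CATGTCAA-5' with primer 5'-GTA-3'
print(polymerase_extend("CATGTCAA", "GTA"))
```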
  • 23.
  • 24.
    Manipulating DNA • Endonucleases (cut DNA molecules): endonucleases are able to cut DNA molecules by destroying the covalent bonds between adjacent nucleotides. They can be specific as to what, where, and how they cut. For instance, endonucleases called restriction enzymes can cut DNA molecules at specific sites. The cut can be blunt, i.e., straight through both strands, or staggered, i.e., leaving sticky ends (Figure 9.9). There are also endonucleases that cut only single strands, or only within the single stranded pieces of a mixed DNA molecule containing single stranded and double stranded pieces.
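As an illustration of a site-specific cut, the sketch below models the upper strand only and uses the restriction enzyme EcoRI, which recognizes GAATTC and cuts between G and A, leaving staggered (sticky) ends. The function name `digest` and its `cut_offset` parameter are assumptions of this sketch, not a real library API.

```python
def digest(upper_5to3: str, site: str = "GAATTC", cut_offset: int = 1):
    """Cut the upper strand inside every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the left
    fragment (1 for EcoRI: G^AATTC). Only the upper strand is modelled; on
    the real double strand the lower strand is cut at a shifted position,
    which is what produces the sticky ends.
    """
    fragments, start = [], 0
    pos = upper_5to3.find(site)
    while pos != -1:
        fragments.append(upper_5to3[start:pos + cut_offset])
        start = pos + cut_offset
        pos = upper_5to3.find(site, pos + 1)
    fragments.append(upper_5to3[start:])
    return fragments

print(digest("TTGAATTCAAGGGAATTCCC"))
```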
  • 25.
    • Ligation (links DNA molecules): a class of enzymes can be used to link together, or ligate, fragments of DNA molecules. Ligation is often performed after annealing in order to concatenate DNA strands. The difference between annealing and ligation is that the former allows for the bonding of complementary bases, while the latter performs the phosphodiester bonding between two consecutive nucleotides. The hydrogen bonds keep complementary sticky ends together, but a gap (called a nick) remains in each of the strands. A ligase is, thus, used to close this gap by establishing the missing bond (see Figure 9.10). Although it is possible to use some ligases to concatenate free floating double stranded DNA molecules with blunt ends, it is much easier to allow strands with sticky ends to bind together.
  • 26.
    Manipulating DNA • Modifying nucleotides (inserts or deletes short subsequences): it is possible to add, substitute, or delete certain subsequences from DNA molecules. The enzymes used in these operations are called modifying enzymes. • Amplification (multiplies DNA molecules): given some DNA molecules, a technique called polymerase chain reaction (PCR) can be used to make multiple copies of a subset of the strands present. • PCR requires a start and an end subsequence, called primers, that are used to identify the sequence, called the template, to be replicated. • PCR was devised in 1985 by Kary Mullis and revolutionized molecular biology. • It is very sensitive, simple and efficient, consisting basically of three steps: denaturation, annealing, and polymerase extension.
  • 27.
    Manipulating DNA • Step 0: To start with, prepare a solution containing the molecule m to be copied and the primers. • Step 1: (Denaturation) Then, heat the solution so that the hydrogen bonds between the two strands are destroyed, and the molecule m denatures into two strands. • Step 2: (Annealing) Now, cool down the solution so that the primers will anneal to their complementary subsequences. • Step 3: (Polymerase extension) Finally, apply the polymerase extension technique to fill in the sticky ends so as to form double strands. • Repeating Steps 1 to 3 ideally doubles the number of copies of m in each cycle.
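The selective nature of PCR can be sketched on a pool of strings. This is an idealized model under stated assumptions: each molecule is represented by its upper strand only, a strand is amplified only if it starts with the `start` primer sequence and ends with the `end` primer sequence, and every cycle exactly doubles the matching strands. The function `pcr` is hypothetical.

```python
from collections import Counter

def pcr(pool, start: str, end: str, cycles: int) -> Counter:
    """Idealized PCR on a pool mapping strand -> number of copies.

    Strands delimited by the two primer sequences double in every cycle
    (denaturation, annealing, extension); all other strands are left as is.
    """
    result = Counter(pool)              # work on a copy of the pool
    for _ in range(cycles):
        for strand in list(result):
            if strand.startswith(start) and strand.endswith(end):
                result[strand] *= 2
    return result

pool = Counter({"AAGGTT": 1, "CCGGTT": 1})
print(pcr(pool, start="AA", end="TT", cycles=10))
```

After 10 cycles the matching strand is present in 2^10 = 1024 copies, while the non-matching strand is still a single copy; real reactions fall short of perfect doubling.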
  • 29.
    Manipulating DNA • Gel electrophoresis (measures the length of DNA molecules and separates them by length): the length of a single stranded molecule is the number of nucleotides composing the molecule. For example, if a molecule has 12 nucleotides, it is a 12-mer; that is, a polymer consisting of 12 monomers. • The length of a double stranded molecule is the number of base pairs. For example, a double stranded molecule with 12 bases in each strand has a length of 12 base pairs (12 bp). • Gel electrophoresis is a technique that can be used for two main purposes: (1) to measure the length of a DNA molecule, and (2) to sort (separate) DNA strands by length. Gel electrophoresis (separation by length) is one of the ways often used to read out the results of DNA computing, with the advantage of being quite precise.
  • 30.
    Manipulating DNA • In the gel electrophoresis process, a gel is prepared that functions as a support for the separation of DNA fragments. Holes (wells) are created in the gel to serve as reservoirs holding the solution containing DNA. • An electric field is then applied to the setup. Because DNA molecules are negatively charged, they migrate through the gel toward the positive electrode; smaller molecules travel faster, so larger molecules lag behind. • As the separation process continues, the distance between larger and smaller molecules becomes more apparent. • After the gel electrophoresis has been run, it is necessary to visualize the results. ⚬ To do so, the DNA is stained with a dye and the gel is viewed under ultraviolet light. The gel is then photographed. ⚬ DNA fragments of the same length cluster together, producing visible horizontal bands.
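From a computational point of view, gel electrophoresis groups strands by length. A minimal sketch, assuming strands are plain strings and ignoring all physical detail; `run_gel` is a name chosen for this illustration.

```python
from collections import defaultdict

def run_gel(strands):
    """Group strands into bands by length.

    Returns a list of (length, strands) pairs ordered from the shortest
    band (which travels farthest through the gel) to the longest.
    """
    bands = defaultdict(list)
    for strand in strands:
        bands[len(strand)].append(strand)
    return sorted(bands.items())

for length, band in run_gel(["ACGT", "AC", "GGCC", "A"]):
    print(length, band)
```

Extracting a single band, e.g. all strands of a given length, then amounts to picking one entry of the result.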
  • 31.
  • 32.
  • 33.
    Manipulating DNA • Filtering (separates or extracts specific molecules): there are a few techniques that can be used to separate some particular strands from a solution. One of the simplest methods is as follows. • If a single stranded molecule of type o is to be separated from those of other types in a given solution S, one can attach its complement (ō) to a filter and pour the solution S through the filter. • Then, o molecules will bind to ō molecules while the others will just flow through the filter. • The annealing between o and ō results in a collection of double-stranded molecules fixed to the filter, and a solution S* results from S by removing the o molecules. • Finally, the filter is transferred to a container where the double stranded DNA is denatured, leaving only the molecules to be separated.
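The filtering procedure splits a solution S into the captured molecules and the remainder S*. In the sketch below, a strand counts as “of type o” when it contains the sequence o as a substring; this is a modelling assumption, and the name `filter_solution` is hypothetical.

```python
def filter_solution(solution, o: str):
    """Split a solution (list of strands) using a probe for subsequence o.

    Returns (captured, remaining): strands containing o, which would anneal
    to the complement attached to the filter, and the solution S* that
    flows through.
    """
    captured = [strand for strand in solution if o in strand]
    remaining = [strand for strand in solution if o not in strand]
    return captured, remaining

S = ["AACCGG", "TTTTTT", "GGCCGA"]
captured, S_star = filter_solution(S, "CCG")
print(captured, S_star)
```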
  • 34.
    Manipulating DNA • Synthesis (creates DNA molecules): it has already been discussed how to synthesize double strands from single strands via polymerase extension. However, the question remains of how to synthesize single stranded molecules. It is possible to chemically synthesize single stranded molecules, called oligonucleotides (or simply oligos), using a particular machine. This synthesizer is supplied with the four nucleotide bases in solution and adds nucleotide by nucleotide following a prescribed sequence entered by the user.
  • 35.
    Manipulating DNA • Sequencing (reads out the sequence of a DNA molecule): determining the exact sequence of nucleotides comprising a given DNA molecule is essential for the interpretation of the results obtained in most DNA computing approaches. The most popular sequencing technique is based on the extension of a primed single stranded template. For our present purposes, all we have to know is that there are techniques able to read a DNA molecule nucleotide by nucleotide. Further details are left as an exercise for the reader.
  • 36.
    FILTERING MODELS • In all filtering models, a computation consists of a sequence of operations on finite multi-sets of strings that usually starts and ends with a single multi-set. • By initializing a multi-set and applying specific operations to it, new or modified multi-sets are generated. • The computation then proceeds by filtering out strings that cannot be a solution. • Two filtering models are presented: 1. Adleman’s Experiment 2. Lipton’s Solution to the SAT problem
  • 37.
    Adleman’s Experiment • Speculations about the possibility of using DNA molecules to perform computation date back to the early 1970s and maybe even earlier. • However, none of these insights was followed by practical attempts at real world implementations. • The first successful experiment involving the use of DNA molecules and DNA manipulation techniques for computing was reported by L. M. Adleman in 1994 (Adleman, 1994). • In that paper, Adleman solved a small instance of the Hamiltonian path problem (HPP) in a directed graph using purely biochemical means. • In general, the HPP consists of deciding whether or not an arbitrarily given graph has a Hamiltonian path. • HPP can be solved by an exhaustive search and various algorithms have been proposed to solve it. • Although these algorithms are successful for some special classes of graphs, they all have an exponential worst-case complexity for general directed graphs. • The Hamiltonian path problem has been shown to be an NP-complete problem. • With Adleman’s DNA computing solution to the HPP, the number of laboratory steps was linear in terms of the size of the graph (number of vertices), although the problem itself is known to be NP-complete.
  • 38.
    Adleman’s Experiment • The Hamiltonian path problem can be explained as follows. A directed graph G with designated vertices vin and vout is said to have a Hamiltonian path if and only if there is a sequence of compatible directed edges e1, e2, …, ez (i.e., a path) that begins at vin, ends at vout, and passes through each vertex exactly once. Figure 9.13 illustrates the instance of the Hamiltonian path problem used by Adleman (1994) in his pioneering implementation of DNA computing. In this graph, vin = 0 and vout = 6, and a Hamiltonian path is given by the following sequence of edges: 0 → 1 → 2 → 3 → 4 → 5 → 6. This can be easily verified by inspection.
  • 39.
    Adleman’s Experiment • To solve this problem Adleman used the following non-deterministic algorithm: • Step 1: generate random paths through the graph. • Step 2: keep only those paths that begin with vin and end with vout. • Step 3: if the graph has n vertices, then keep only those paths that enter exactly n vertices. • Step 4: keep only those paths that enter all the vertices of the graph at least once. • Step 5: if any path remains, say YES; else, say NO.
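The five steps can be sketched in software as a sequence of filters. Two assumptions are made here: the edge list `EDGES` is an illustrative 7-vertex directed graph (it is not claimed to reproduce Figure 9.13 edge for edge), and Step 1 enumerates all paths of up to n vertices, standing in for the huge random sample produced in the test tube. The function names are hypothetical.

```python
# Illustrative directed graph on vertices 0..6 (an assumption of this sketch).
EDGES = [(0, 1), (0, 3), (0, 6), (1, 2), (1, 3), (2, 1), (2, 3),
         (3, 2), (3, 4), (4, 1), (4, 5), (5, 1), (5, 2), (5, 6)]

def all_paths(edges, max_vertices):
    """All directed paths (vertices may repeat) with at most max_vertices vertices."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
    paths = []
    frontier = [(v,) for v in sorted({x for edge in edges for x in edge})]
    while frontier:
        paths.extend(frontier)
        frontier = [p + (b,) for p in frontier if len(p) < max_vertices
                    for b in successors.get(p[-1], [])]
    return paths

def adleman(edges, n, v_in, v_out):
    paths = all_paths(edges, n)                                      # Step 1
    paths = [p for p in paths if p[0] == v_in and p[-1] == v_out]    # Step 2
    step3 = [p for p in paths if len(p) == n]                        # Step 3
    step4 = [p for p in step3 if all(v in p for v in range(n))]      # Step 4
    return step3, step4

step3, step4 = adleman(EDGES, n=7, v_in=0, v_out=6)
print("YES" if step4 else "NO", step4)                               # Step 5
```

In this graph, the path 0, 3, 2, 3, 4, 5, 6 survives Step 3 but is removed in Step 4, which shows why both filters are needed.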
  • 40.
    Adleman’s Experiment • Adleman translated this algorithm step by step into molecular biology. Before generating the random paths through the graph, it is necessary to decide how these are going to be encoded using DNA molecules. • Adleman chose to encode each vertex of the graph by a single stranded sequence of nucleotides of length 20 (a 20-mer). The codes were constructed at random, and the length 20 was chosen so as to ensure that the codes are different from one another. • A large number of these oligonucleotides were generated by PCR and placed in a test tube. • The edges are encoded as follows: if there is an edge from vertex i to vertex j, and the codes of these vertices are vi = aibi and vj = ajbj, where ai, bi, aj, bj are sequences of length 10, then the edge i→j is encoded by the Watson-Crick complement of the sequence biaj, as illustrated in Figure 9.14.
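The edge encoding described above (the complement of b_i a_j) can be written down directly. In this sketch the vertex codes are random 20-mers drawn with a seeded generator, and the complement is written base by base in the same left-to-right order as the vertex codes; `random_20mer` and `encode_edge` are names introduced here for illustration.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement."""
    return "".join(PAIR[base] for base in strand)

def random_20mer(rng: random.Random) -> str:
    """A random vertex code of length 20."""
    return "".join(rng.choice("ACGT") for _ in range(20))

def encode_edge(v_i: str, v_j: str) -> str:
    """Edge i->j: complement of b_i a_j (second half of v_i, first half of v_j)."""
    return complement(v_i[10:] + v_j[:10])

rng = random.Random(0)
v0, v1 = random_20mer(rng), random_20mer(rng)
edge_01 = encode_edge(v0, v1)
# The edge strand pairs with the end of v0 and the start of v1, so it can
# hold the two vertex strands next to each other while they are ligated.
print(v0, v1, edge_01)
```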
  • 41.
  • 42.
    Adleman’s Experiment • In order to link the edges to form paths, oligonucleotides Ōi complementary to those representing the edges (Oi) had to be synthesized. An enzymatic ligation reaction was then carried out, linking the 20-mer strands encoding the edges so as to form random paths through the graph, corresponding to Step 1 of the algorithm. • To implement Step 2, the product of Step 1 was amplified by polymerase chain reaction (PCR) using Ō0 and Ō6 as primers. Therefore, only those molecules encoding paths that begin with vertex 0 and end with vertex 6 were amplified. A filtering operation can then be used to separate the strands that start with vertex 0 and end with vertex 6.
  • 43.
    Adleman’s Experiment • Then, to implement Step 3, the DNA molecules were separated according to their length by gel electrophoresis. The band on the gel which, by comparison with a molecular weight marker, was identified as consisting of strands with 140 bp (7 vertices of 20 bases each) was cut out of the gel and the DNA was extracted. Repeated cycles of PCR and electrophoresis were used to purify the product further. At the end of this step, there is a set of molecules that start with 0, end with 6, and pass through 7 vertices. Note, however, that this does not ensure that such a path is Hamiltonian; for example, the path 0, 3, 2, 3, 4, 5, 6 satisfies all these conditions but is not Hamiltonian. • To implement Step 4 of the algorithm, single stranded DNA was probed with complementary oligonucleotides attached to magnetic beads. Thus, with one step for every vertex, the molecules containing the required sequence could be literally pulled out of the solution. This process can also be explained as follows: for each vertex i (i = 1, 2, …, 5), melt the result of the previous step, add the complement Ōi of the code of vertex i, and let it anneal; then remove all molecules that do not anneal. Finally, to obtain the YES/NO answer of Step 5, one still has to amplify the product of Step 4 by PCR and analyze it by gel electrophoresis.
  • 44.
  • 45.
  • 46.
    Lipton’s solution to the SAT problem • Lipton constructed a series of test tubes, where the first one, t0, is the tube containing all two-bit (binary) strings. He proposed, among others, an extract test tube operation E(t,i,a) that extracts all sequences in test tube t whose i-th bit is equal to a, a ∈ {0,1}. (Remember that the binary strings are encoded using a word composed of nucleotides, but it is simpler to explain the procedure by looking at binary strings instead of a quaternary alphabet with the letters A, C, T, G.) Then, for the formula F = (e1∨e2) ∧ (ē1∨ē2), he operated as follows:
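One way the extract operation E(t,i,a) can be chained to decide F is sketched below on binary strings. The clause-by-clause scheme (keep the union of the extracts for each literal of a clause, then continue with the next clause) is an assumption about how the steps fit together, and the names `extract` and `solve_sat` are hypothetical.

```python
from itertools import product

def extract(tube, i: int, a: str):
    """E(t, i, a): all strings in the tube whose i-th bit (1-based) equals a."""
    return {s for s in tube if s[i - 1] == a}

def solve_sat(clauses, n: int):
    """Filter the tube of all n-bit strings through every clause.

    Each clause is a list of (variable index, required bit) pairs, e.g.
    (1, "1") for e1 and (1, "0") for its negation. A string survives a
    clause if it is extracted for at least one of the clause's literals.
    """
    tube = {"".join(bits) for bits in product("01", repeat=n)}   # t0
    for clause in clauses:
        kept = set()
        for i, a in clause:
            kept |= extract(tube, i, a)
        tube = kept
    return tube

# F = (e1 OR e2) AND (NOT e1 OR NOT e2)
F = [[(1, "1"), (2, "1")], [(1, "0"), (2, "0")]]
print(sorted(solve_sat(F, n=2)))
```

F is satisfiable exactly when the final tube is non-empty; here the surviving strings are the assignments in which exactly one of the two variables is true.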
  • 47.

Editor's Notes