take it apart
Georg Dionysius Ehret's illustration of Linnaeus's
sexual system of plant classification, 1736
• Understand basic concepts of molecular
• Understand and apply fundamental
models, algorithms, data structures, and
computational techniques to answer
• Wide range of topics, but special focus on
biological sequences and their evolutionary
• Wed 13-14 (CAB G52), Fri 13-15 (ML F34)
• Prof. Gonnet will hold the lectures
• Thu 14-16 (CAB H56), starting this week
• If you do not have a nethz account, ask
Stefan Zoller as soon as possible.
• Stefan Zoller
• Nives Skunca
Date            Topic                                       Lecturer
Sept. 19/21     Course Introduction; Basic Molecular
Sept. 26/28     Markov models/String Alignment I            GHG
Oct. 3/5        String Alignment II (indels, estimating
Oct. 10/12      Substitution Matrices                       GHG
Oct. 17/19      Approximate Alignment Methods;
                Statistics of Pairwise Alignments
Oct. 24/26      Phylogeny I                                 GHG
Oct. 31/Nov. 2  Phylogeny II                                GHG
Nov. 7/9        Phylogeny III                               GHG
Nov. 14/16      Multiple Sequence Alignments                AS
Nov. 21/23      Synthetic Evolution; Evaluation of
Nov. 28/30      Current research; Mass profiling            Guests/
Dec. 5/7        Orthology/Lateral Gene Transfer             NS
Dec. 12/14      Codon bias                                  SZ
Dec. 19/21      Genome Rearrangements                       GHG
Course Grade & Credits
• Participation in the exercises is strongly
encouraged, but not mandatory
• Written Exam
• During winter session
• 3 hours
• The only permitted aids are 2 A4 pages
(4 sides), personally handwritten.
• Interpreted language based on Maple
• Environment for bioinformatics, can do
sequence management, mathematics,
alignments, trees, drawing, etc.
• Available for download for Mac and Linux
• A collection of real
problems with coded
solutions in the
• Darwin input in green
• Darwin output in red
• Slides can be downloaded from the
• Additional notes and references will be
made available as well.
Slides of this part are largely
based on material from
Dr. Gina Cannarozzi
• Universality of life on earth: water,
carbon-based biochemistry; genetic
material; genetic code (largely) universal.
→ common origin!
• Life is compartmentalized: cells are
fundamental units of structure,
• Capable of Darwinian evolution
Encyclopedia of Life
So what is life?
• What about endospores? viruses? mules? priests?
prions? computer viruses?
• In biology, there are exceptions to almost every rule.
“Living organisms undergo metabolism,
maintain homeostasis, possess a capacity
to grow, respond to stimuli, reproduce
and, through natural selection, adapt to
their environment in successive
generations.”
Inside a Cell
Figure scale labels: 10-30 µm, ~2 µm
• Ribosomes translate mRNA into proteins.
• Mitochondria (eukaryotes) have their own
DNA and are a result of early inclusion of α-
proteobacteria into a eukaryotic cell.
• Chloroplasts (plants, protists) have their own
DNA as a result of early inclusion of
cyanobacteria into a eukaryotic cell.
• Plasmids (bacteria) are short pieces of circular
DNA in multiple copies; nonessential; get
transferred between bacteria.
• Genome: all the genetic
material of an organism.
• The genome consists of
genes and non-coding
• Genes consist of
Escherichia coli
• 1 circular chromosome
• 1 plasmid (multiple copies)
• ~4.6 million base pairs
• coding bases (85%)
• 4132 protein-coding genes
• 172 RNA genes (tRNA, rRNA, etc.)
Homo sapiens
• 23 chromosome pairs
• ~3 billion base pairs
• ~50 million coding bases (1.5%)
• ~21,000 protein-coding genes
• ~60,000 different transcripts
• ~4,800 RNA genes
• ~2,900 RNA pseudogenes
• Double helix
• Backbones: phosphate and
deoxyribose, directed
(5’ → 3’), antiparallel
• Connection: 4 bases: adenine,
thymine, cytosine, guanine.
• A-T and C-G are paired by
hydrogen bonds (relatively weak)
C ···· G: 3 H-bonds
A ···· T: 2 H-bonds
Image source: Wikipedia
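The pairing rules above (A-T, C-G, antiparallel strands) can be turned into a short sketch. This is an illustration only: the function names and the example sequence are made up here, and the input is assumed to contain only the letters A, C, G, T.

```python
# Watson-Crick pairing rules from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds per base pair as listed above: A-T has 2, C-G has 3.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    The strands are antiparallel, so the partner is read in the
    opposite direction: complement each base and reverse the order.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def count_h_bonds(strand: str) -> int:
    """Total number of hydrogen bonds between this strand and its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    seq = "ATGCCG"  # hypothetical example sequence, written 5'->3'
    print(reverse_complement(seq))  # CGGCAT
    print(count_h_bonds(seq))       # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.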
• X-H ···· Y where X and Y are electronegative
atoms (typically N, O, F)
• Responsible for high
boiling point of water
(each H2O can have
up to 4 H-bonds)
“Central dogma of molecular biology”
Polymerase can only add bases in the 5’ → 3’ direction
(the DNA template is read 3’ → 5’)
• Single stranded (can form structure)
• Uracil instead of Thymine
• mRNA: messenger RNA, for translation
• rRNA: component of the ribosome
• tRNA: specific for one amino acid,
selectively binds to its codon at the ribosome.
• microRNA: short RNA molecules (~22 nt)
which regulate gene expression
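As a rough illustration of how codons on an mRNA map to amino acids, here is a minimal sketch using the standard genetic code. The compact table encoding, the function name and the example sequences are choices made for this sketch; it assumes the reading frame starts at position 0 and that the input contains only A, C, G, U.

```python
BASES = "UCAG"

# Standard genetic code, one letter per codon, in the order
# UUU, UUC, UUA, UUG, UCU, ... ; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon -> amino acid lookup from the compact string.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') codon by codon until a stop codon."""
    mrna = mrna.upper()
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCCUAA"))  # MA
```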
• Transcription factors bind to promoter sites at
the 5’ regulatory region.
• RNA polymerase binds to the complex.
• Working together, they open the DNA double
helix.
• Genes can be on either strand, but direction of
growing mRNA sequence is always 5’ → 3’
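The direction rule above can be made concrete with a small sketch. It assumes the template strand is given written 5’ → 3’ and contains only A, C, G, T; the function names and the example are hypothetical.

```python
# DNA template base -> RNA base added opposite it (U replaces T in RNA).
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template: str) -> str:
    """mRNA (5'->3') produced from a template strand written 5'->3'.

    The polymerase reads the template 3'->5', so we walk the string
    backwards and add the complementary RNA base at each step.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template.upper()))


def transcribe_coding(coding: str) -> str:
    """mRNA from the coding (non-template) strand: same sequence, T -> U."""
    return coding.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCC"    # hypothetical coding strand, 5'->3'
    template = "GGCCAT"  # its reverse complement, 5'->3'
    print(transcribe_template(template))  # AUGGCC
    print(transcribe_coding(coding))      # AUGGCC
```

Both routes give the same mRNA, which is why the coding strand is usually the one written down for a gene.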
Nobel Prize in Chemistry 2006
The chain shown in grey is RNA polymerase,
with the portion that clamps on the DNA
shaded in yellow. The DNA helix being
unwound and transcribed by RNA
polymerase is shown in green and blue, and
the growing RNA strand is shown in red.
• 5’ Cap
• Poly-A tail
• Splicing (removal of introns)
Research questions: Where are the introns? Where are the
coding sequences? Where does transcription start and
stop? Where are the binding sites for the transcription
factors that control when transcription takes place?
• Humans: >50% of genes have splice variants.
• Dscam gene in D. melanogaster: 95 alternative
exons can express 38,016 different mRNAs through
alternative splicing.
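The Dscam numbers can be checked with a few lines of arithmetic. The per-cluster counts below (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17) are commonly cited values that are not given on the slide, and the sketch assumes exactly one alternative is chosen per cluster, independently of the others.

```python
from math import prod

# Alternatives per variable exon cluster in Dscam.
# Assumption: commonly cited counts, not stated on the slide.
ALTERNATIVES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# Total number of alternative exons across the four clusters.
total_alternative_exons = sum(ALTERNATIVES.values())

# One alternative is picked per cluster, so the counts multiply.
possible_mrnas = prod(ALTERNATIVES.values())

if __name__ == "__main__":
    print(total_alternative_exons)  # 95
    print(possible_mrnas)           # 38016
```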
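[Figure: protein identification by mass spectrometry (MS). Steps legible in the original: protein extracted from gel; split into fragments of 5-10 amino acids; analysis using MS; amino acid sequence determined and compared with sequence data.]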
Jiang Long, Science Creative
Quarterly Image Bank
Growth of sequence databases
[Figure: database growth, 2000-2012; one series labeled Protein Data Bank]
• Start from an initial population
• reproduce and “mutate” randomly
• natural selection: fittest individuals
survive and have descendants
→ selects “good” mutations
• sometimes: a “branching” occurs (e.g.
Not only the “good”
• Genetic drift (random sampling)
• Population bottleneck
• Founder effect
• Genetic hitchhiking (neutral or mildly
deleterious alleles linked to positively
selected alleles)
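Genetic drift as random sampling can be sketched with a simple Wright-Fisher style simulation. This model is a standard textbook choice brought in here for illustration, not something the slides specify; the function name, parameters and fixed seed are assumptions of the sketch.

```python
import random


def wright_fisher(pop_size: int, start_freq: float,
                  generations: int, seed: int = 0) -> list[float]:
    """Allele frequency over time under pure drift (no selection).

    Each generation, every one of the pop_size offspring copies the
    allele with probability equal to its current frequency.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    count = round(pop_size * start_freq)
    freqs = [count / pop_size]
    for _generation in range(generations):
        p = count / pop_size
        count = sum(1 for _offspring in range(pop_size) if rng.random() < p)
        freqs.append(count / pop_size)
    return freqs


if __name__ == "__main__":
    # A small population (a bottleneck or founder group) drifts faster
    # than a large one started at the same frequency.
    print(wright_fisher(pop_size=10, start_freq=0.5, generations=20))
    print(wright_fisher(pop_size=1000, start_freq=0.5, generations=20))
```

Once the frequency reaches 0 or 1 it stays there, since no mutation is modeled: the allele is lost or fixed.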
• Speciation: the
evolutionary process by
which new species arise
• Can occur from
geographic isolation or
barriers, new niche
Diane Dodd’s fruit fly experiment
Krzywinski et al. Circos: an information aesthetic for comparative genomics. Genome Research (2009) vol. 19 (9) pp. 1639-45
e.g. Human vs. Dog
among E. coli strains
Mau et al. Genome Biology 2006 7:R44
• Time since divergence
• Number of common traits.
• Edit distance (minimum # of elementary
operations to transform one object into the
other)
• Desirable properties
• distance estimable without knowing history
• metric properties (e.g. triangle inequality)
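For strings, one common edit distance counts single-character insertions, deletions and substitutions (Levenshtein distance). The sketch below uses the usual dynamic programming recurrence; picking this particular set of operations with unit costs is an assumption, since the slide leaves the elementary operations open.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (unit cost each)."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))        # 1 (delete C)
    print(edit_distance("kitten", "sitting"))  # 3
```

This distance needs only the two sequences, not their history, and it satisfies the triangle inequality, so it has both desirable properties listed above.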
How can we quantify the amount of evolution
between two subjects?
Markov model: every site evolves independently;
the probability of mutation depends only on the present
state (no memory); the mutation probabilities are
given by a transition matrix.
A C G T
A 0.900 0.033 0.033 0.033
C 0.033 0.900 0.033 0.033
G 0.033 0.033 0.900 0.033
T 0.033 0.033 0.033 0.900
After “one unit” of evolution, the
probability that an A mutates into a
C is given by the corresponding
entry in the matrix:
p(A→C | d=1) = M1[A→C] = 0.033
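Because the model has no memory, probabilities after d units of evolution follow from multiplying the one-unit matrix with itself d times. The sketch below works this out in plain Python. It assumes the 0.033 entries are a rounded 0.1/3 (so that each row sums to 1); the function names are made up for the example.

```python
STATES = "ACGT"

# One-unit transition matrix from the slide. Assumption: the off-diagonal
# 0.033 is a rounded 0.1/3, which makes every row sum to exactly 1.
OFF_DIAGONAL = 0.1 / 3
M1 = [[0.9 if i == j else OFF_DIAGONAL for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, d):
    """d-th power of matrix m, starting from the identity (d = 0)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _step in range(d):
        result = mat_mul(result, m)
    return result


def p_mutation(src: str, dst: str, d: int) -> float:
    """Probability that base src is observed as dst after d units."""
    return mat_pow(M1, d)[STATES.index(src)][STATES.index(dst)]


if __name__ == "__main__":
    print(round(p_mutation("A", "C", 1), 3))  # 0.033
    print(round(p_mutation("A", "C", 2), 4))  # 0.0622
```

After two units, A → C can happen directly in either step or via G or T, which is why the value is a little more than twice the one-unit probability minus nothing: 0.9·0.0333 + 0.0333·0.9 + 2·0.0333² ≈ 0.0622.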
You are here.