4. What is Bioinformatics ?
• Application of information technology to the
storage, management and analysis of biological
information (Facilitated by the use of
computers)
– Sequence analysis?
– Molecular modeling (HTX) ?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image Analysis ?
– Statistics ? AI ?
– Power engineering or electronics (heavy current or light current) ?
5. Promises of genomics and bioinformatics
• Medicine (Pharma)
– Genome analysis allows the targeting of genetic
diseases
– The effect of a disease or of a therapeutic on RNA and
protein levels can be elucidated
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring
of medical treatment to the individual’s genetic make-up
• The same techniques can be applied to crop (Agro) and
livestock improvement (Animal Health)
6. Bioinformatics: What’s in a name ?
• Early 1990s
• “Bio-informatics”:
[Figure: computing power and GenBank size (log scale) plotted against time (years)]
7. Bioinformatics: What’s in a name ?
• Early 1990s
• “Bio-informatics”:
– convergence of the explosive growth in
biotechnology, paralleled by the explosive growth
in information technology
• Not new: people have been using
“computers” in biology for more than 30 years
• In silico biology, database biology, ...
11. PCR + dye termination
Suddenly, a flash of insight caused him to pull the car
off the road and stop. He awakened his friend
dozing in the passenger seat and excitedly
explained to her that he had hit upon a solution -
not to his original problem, but to one of even
greater significance. Kary Mullis had just conceived
of a simple method for producing virtually unlimited
copies of a specific DNA sequence in a test tube -
the polymerase chain reaction (PCR).
12. Bioinformatics, a scientific discipline …
[Figure: diagram placing Bioinformatics among Math, Computer Science, Informatics, (Molecular) Biology, Theoretical Biology and Computational Biology]
15. Aim of the course
• More than an introduction to ...: the course
is meant to provide an understanding of what
lies behind the various techniques.
• Beyond the use of recipes, which can be
found in parts of the syllabus, an
understanding of
– how databases work
– and the underlying algorithms
• makes it possible
– to apply changing interfaces to new
problems.
18. Exam
• Theory
– One part on a publication of your own choice that is
related to the course
• E.g. Bioinformatics or Computational Biology
– Three insight questions about the course (inclusive !!)
• Practical (“open-book”)
– About four exercises, most of which require writing a
program
• Marks are split 50/50
19. Course
• 25 Euro
– Syllabus
– Hand-outs of Lesson/Practical 1
– V|Podcasts
– Weblems – Screencasts
– Flash Drive
• Image to be installed
29. Genome Size
E. coli = 4.2 x 10^6
Yeast = 18 x 10^6
Arabidopsis = 80 x 10^6
C. elegans = 100 x 10^6
Drosophila = 180 x 10^6
Human/Rat/Mouse = 3000 x 10^6
Lily = 300 000 x 10^6
With ... : 99.9 %
To primates: 99%
DOGS: Database Of Genome Sizes
32. And this is just the beginning ….
Next Generation Sequencing is here
33. Basics of the “old” technology
• Clone the DNA.
• Generate a ladder of labeled (colored) molecules
that differ in length by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
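To put the throughput figure above in perspective, the arithmetic can be sketched as follows. This is a rough estimate only: it takes 57,000 nucleotides/run from this slide and the 3000 x 10^6 human genome size from the Genome Size slide. The function name and the `coverage` parameter are illustrative assumptions, not part of the original slides.

```python
import math

NUCLEOTIDES_PER_RUN = 57_000        # figure quoted on this slide
HUMAN_GENOME_SIZE = 3000 * 10**6    # figure quoted on the Genome Size slide


def runs_needed(genome_size, per_run=NUCLEOTIDES_PER_RUN, coverage=1):
    """Number of runs needed to read `genome_size` letters `coverage` times.

    `coverage` is a hypothetical parameter: real projects read every
    position several times, but the slides do not state a value.
    """
    return math.ceil(genome_size * coverage / per_run)


print(runs_needed(HUMAN_GENOME_SIZE))   # 52632 runs at 1x coverage
```

Even at a single pass, one machine would need more than fifty thousand runs for a human-sized genome, which is the motivation for the higher-throughput technology on the next slides.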
34. Basics of the “new” technology
• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color
scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2
GB/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
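The last step, mapping short strings to a genome, can be illustrated with a minimal sketch. It assumes exact matches on the forward strand only and uses a simple k-mer lookup table; production read mappers use more elaborate indexes and tolerate mismatches. The names `build_index` and `map_read` and the toy reference sequence are made up for illustration.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-letter substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k):
    """Return all 0-based positions where `read` matches the reference exactly.

    The first k letters of the read are looked up in the index, then each
    candidate position is verified against the full read. Reads shorter
    than k find no seed and return an empty list.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference.startswith(read, pos):
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"          # toy "genome", hypothetical
index = build_index(reference, 3)
print(map_read("ACGT", reference, index, 3))    # [0, 4]
print(map_read("ACGTT", reference, index, 3))   # [4]
```

The shorter read lands in two places while the longer one is placed unambiguously, which is the trade-off the short read lengths above bring with them.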
35. Next Generation Technologies
• 454
– Emulsion PCR
– Polymerase
– Natural Nucleotides
• 20-100 Mb for 5-15k
– 1% error rate
– Homopolymers
42. Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with uniquely assignable location (y-axis, 0% to 100%) versus length of k-mer reads (x-axis, 8 to 20 bp), with one series for E. coli and one for human]
Jay Shendure
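The quantity on the vertical axis can be approximated with a small sketch. As a simplification it counts single (not paired) k-mers on the forward strand only, so it is an illustration of the idea rather than a reproduction of the figure. The function name and the toy sequence are assumptions.

```python
from collections import Counter


def unique_kmer_fraction(sequence, k):
    """Fraction of k-mer positions in `sequence` whose k-mer occurs only once."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0          # sequence shorter than k
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


toy = "ACGTACGTTAGC"        # hypothetical sequence with one short repeat
for k in (2, 4, 5):
    print(k, unique_kmer_fraction(toy, k))
```

On the toy sequence the fraction rises from 3/11 at k = 2 to 7/9 at k = 4 and reaches 1.0 at k = 5: once a read is longer than the repeats in the sequence, extra length adds little.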
56. Paired End Reads are Important!
[Figure: Read 1 and Read 2 separated by a known distance, spanning repetitive DNA and unique DNA]
• Paired read maps uniquely
• Single read maps to multiple positions
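The point of this slide can be sketched in a few lines. The example assumes exact matches, both reads on the forward strand, and a distance measured from the start of read 1 to the start of read 2; real paired-end libraries have reads in opposite orientations and a distribution of insert sizes. All names and sequences here are hypothetical.

```python
def find_all(read, reference):
    """Return every 0-based position where `read` occurs in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def map_pair(read1, read2, reference, distance, tolerance=0):
    """Keep only placements where read 2 starts `distance` letters after read 1."""
    pairs = []
    for p1 in find_all(read1, reference):
        for p2 in find_all(read2, reference):
            if abs((p2 - p1) - distance) <= tolerance:
                pairs.append((p1, p2))
    return pairs


# Toy reference: the repeat GATTACA occurs twice, AGCTAGC occurs once.
reference = "GATTACACCCCGATTACATTGGAGCTAGC"

print(find_all("GATTACA", reference))                          # [0, 11]
print(map_pair("GATTACA", "AGCTAGC", reference, distance=11))  # [(11, 22)]
```

On its own the repetitive read has two candidate positions; the unique mate plus the known distance leaves a single consistent placement.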
57. Single Molecule Sequencing
Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
[Figure: single DNA molecules with primer on a microscope slide, labeled dNTP-Cy3 (*), imaged with a super-cooled TIRF microscope]
Helicos Biosciences Corp.
69. Weblems
• What ?
– Web-based problems (on the current lesson
and/or in preparation for the next lesson)
• When ?
– At the end of every lesson
• How ?
– Solutions online via screencasts
– Practical
– Preparation for the practical exam ...
Not all problems necessarily require
program code ...
70. Weblems
W1.1: To which phyla do the following species belong (a)
starfish (b) ginkgo tree (c) scorpion
W1.2: What are the common names for the following
species (a) Orycteropus afer (b) Beta vulgaris (c)
Macrocystis pyrifera
W1.3: What species has the smallest known genome ? And
is genome size related to number of genes ?
W1.4: What are the 5 latest genomes published ? How
complete is “coverage” ?
W1.5: For approximately 10% of Europeans, the painkiller
codeine is ineffective because the patients lack the
enzyme that converts codeine into the active molecule,
morphine. What is the most common mutation that
causes this condition ?