This document provides an overview of bioinformatics and discusses key topics in the field. It begins by defining bioinformatics as the application of information technology to the analysis and management of biological data, facilitated by the use of computers. It then lists some common applications of bioinformatics like sequence analysis, molecular modeling, phylogeny analysis, and medical informatics. The document also discusses some of the promises of genomics and bioinformatics for applications in medicine, agriculture, and other fields. It provides a brief history of the emergence of bioinformatics as a field in the 1990s. Finally, it outlines some of the main topics that will be covered in the bioinformatics course, including databases, algorithms, interface design, and computational methods.
4. What is Bioinformatics?
• Application of information technology to the storage, management and analysis of biological information (facilitated by the use of computers)
– Sequence analysis?
– Molecular modeling (HTX)?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image analysis?
– Statistics? AI?
– Heavy-current or light-current engineering?
5. Promises of genomics and bioinformatics
• Medicine (Pharma)
– Genome analysis allows the targeting of genetic diseases
– The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
• The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health)
6. Bioinformatics: What’s in a name?
• Early 1990s
• “Bio-informatics”:
[Figure: computing power and GenBank size on a log scale, plotted against time in years]
7. Bioinformatics: What’s in a name?
• Early 1990s
• “Bio-informatics”:
– convergence of the explosive growth in biotechnology, paralleled by the explosive growth in information technology
• Not new: people have been using “computers” in biology for more than 30 years
• In silico biology, database biology, ...
11. PCR + dye termination
Suddenly, a flash of insight caused him to pull the car off the road and stop. He awakened his friend dozing in the passenger seat and excitedly explained to her that he had hit upon a solution - not to his original problem, but to one of even greater significance. Kary Mullis had just conceived of a simple method for producing virtually unlimited copies of a specific DNA sequence in a test tube - the polymerase chain reaction (PCR).
12. Bioinformatics, a scientific discipline …
[Diagram: Bioinformatics shown in relation to Math, Informatics, Theoretical Biology, Computational Biology, (Molecular) Biology and Computer Science]
15. Aim of the course
• More than an introduction to ... the aim of the course is to provide insight into what underlies the various techniques.
• Beyond the use of recipes, which can be found in parts of the syllabus, an understanding of
– how databases work
– and the underlying algorithms
• makes it possible
– to apply changing interfaces to new problems.
18. Exam
• Theory
– A part on a publication of your own choice that is related to the course
• E.g. from Bioinformatics or Computational Biology
– Three insight questions about the course (inclusive !!)
• Practical (“open-book”)
– About four exercises, most of which involve writing a program
• Marks are split 50/50
19. Course
• Syllabus 25 Euro
– Syllabus
• V|Podcasts
• Weblems – Screencasts
27. Genome Size
E. coli = 4.2 x 10^6
Yeast = 18 x 10^6
Arabidopsis = 80 x 10^6
C. elegans = 100 x 10^6
Drosophila = 180 x 10^6
Human/Rat/Mouse = 3000 x 10^6
Lily = 300 000 x 10^6
With ... : 99.9 %
To primates: 99%
DOGS: Database Of Genome Sizes
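To make the spread of genome sizes on this slide concrete, here is a minimal Python sketch that expresses each genome as a multiple of the human genome. The values are copied from the slide and treated as base pairs; the names `GENOME_SIZES_BP` and `fold_difference` are hypothetical, chosen only for this illustration.

```python
# Genome sizes in base pairs, copied from the slide (approximate values).
GENOME_SIZES_BP = {
    "E. coli": 4.2e6,
    "Yeast": 18e6,
    "Arabidopsis": 80e6,
    "C. elegans": 100e6,
    "Drosophila": 180e6,
    "Human/Rat/Mouse": 3000e6,
    "Lily": 300_000e6,
}


def fold_difference(organism, reference="Human/Rat/Mouse"):
    """Return the size of `organism`'s genome as a multiple of `reference`'s."""
    return GENOME_SIZES_BP[organism] / GENOME_SIZES_BP[reference]


if __name__ == "__main__":
    # List genomes from smallest to largest with their ratio to human.
    for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda kv: kv[1]):
        print(f"{name:<16} {size:>10.3g} bp  {fold_difference(name):>10.4g} x human")
```

With these figures the lily genome comes out 100 times larger than the human one, a reminder that genome size alone says little about the complexity of an organism.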
30. And this is just the beginning ….
Next Generation Sequencing is here
31. Basics of the “old” technology
• Clone the DNA.
• Generate a ladder of labeled (colored) molecules that differ in length by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
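The ladder idea in the steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real instrument: the dye colors in `DYE` are made up, the fragments are assumed to be perfectly separated by length, and `strand` stands for the newly synthesized strand rather than the template.

```python
import random

# Hypothetical dye colors, one per terminating base (illustrative only).
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE.items()}


def make_ladder(strand):
    """One labeled fragment per position: (fragment length, dye color)."""
    return [(i + 1, DYE[base]) for i, base in enumerate(strand)]


def read_ladder(fragments):
    """Separate fragments by length, then call one base per detected color."""
    return "".join(COLOR_TO_BASE[color] for _, color in sorted(fragments))


if __name__ == "__main__":
    fragments = make_ladder("GATTACA")
    random.Random(0).shuffle(fragments)  # fragments are mixed in the tube
    print(read_ladder(fragments))        # prints GATTACA
```

Sorting by fragment length plays the role of the separation matrix, and the color lookup plays the role of the laser detection step.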
32. Basics of the “new” technology
• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2 Gb/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
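A rough back-of-the-envelope comparison of the two technologies, using only the throughput figures quoted on these slides and the human genome size from the genome-size slide. The helper `units_needed` and its `coverage` parameter are assumptions added for illustration; real projects sequence each base many times over.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000   # 3000 x 10^6, from the genome-size slide
OLD_BP_PER_RUN = 57_000           # "old" technology: nucleotides per machine run
NEW_BP_PER_DAY = 1_200_000_000    # "new" technology: letters per day


def units_needed(genome_bp, throughput_bp, coverage=1):
    """Runs (or days) needed to read `genome_bp` bases `coverage` times over."""
    return math.ceil(genome_bp * coverage / throughput_bp)


if __name__ == "__main__":
    print("old technology, runs:", units_needed(HUMAN_GENOME_BP, OLD_BP_PER_RUN))
    print("new technology, days:", units_needed(HUMAN_GENOME_BP, NEW_BP_PER_DAY))
```

At single coverage this gives 52,632 runs on one "old" machine against 3 days on a "new" one, which is the scale of change the slides are pointing at.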
33. Next Generation Technologies
• 454
– Emulsion PCR
– Polymerase
– Natural nucleotides
• 20-100 Mb for 5-15k
– 1% error rate
– Homopolymers
40. Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with a uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), for E. coli and human. Credit: Jay Shendure]
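The quantity plotted on this slide can be approximated with a short Python sketch. It is a simplification under stated assumptions: it counts single k-mers on one strand instead of paired k-mers, and the demo runs on a random toy sequence, not on the E. coli or human genome. The function name `unique_kmer_fraction` is hypothetical.

```python
import random
from collections import Counter


def unique_kmer_fraction(sequence, k):
    """Fraction of k-mer positions whose k-mer occurs exactly once in `sequence`."""
    n = len(sequence) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(sequence[i:i + k] for i in range(n))
    unique = sum(1 for i in range(n) if counts[sequence[i:i + k]] == 1)
    return unique / n


if __name__ == "__main__":
    rng = random.Random(42)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    # Longer k-mers are more likely to have a single possible location.
    for k in range(8, 21, 2):
        print(k, f"{unique_kmer_fraction(toy_genome, k):.1%}")
```

On a real genome, repeats keep the curve below 100% for longer than on random sequence, which is why the human curve rises more slowly than the E. coli one.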
54. Paired End Reads are Important!
[Diagram: Read 1 and Read 2 lie a known distance apart. A single read in repetitive DNA maps to multiple positions; the paired read, with its mate in unique DNA, maps uniquely.]
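The point of this slide can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: reads match the reference exactly, both reads are taken from the same strand, and the "known distance" is measured from the start of read 1 to the start of read 2. The functions `find_all` and `map_pair` and the toy reference are hypothetical.

```python
def find_all(reference, read):
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def map_pair(reference, read1, read2, distance, tolerance=0):
    """Return (pos1, pos2) placements whose start-to-start spacing fits `distance`."""
    return [
        (p1, p2)
        for p1 in find_all(reference, read1)
        for p2 in find_all(reference, read2)
        if abs((p2 - p1) - distance) <= tolerance
    ]


if __name__ == "__main__":
    # Toy reference: the repeat GATTACA occurs twice, CCGGA only once.
    reference = "GATTACA" + "TTTT" + "CCGGA" + "GATTACA" + "AAAA"
    print(find_all(reference, "GATTACA"))              # prints [0, 16]
    print(map_pair(reference, "GATTACA", "CCGGA", 11)) # prints [(0, 11)]
```

The single read from the repeat has two candidate locations, while the pair has only one placement consistent with the known distance.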
55. Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
Single Molecule Sequencing (Helicos Biosciences Corp.)
[Diagram: a single DNA molecule with primer on a microscope slide, extended with Cy3-labeled dNTPs (dNTP-Cy3) and imaged with a super-cooled TIRF microscope]
66. Weblems
• What?
– Web-based problems (on the current lesson and/or in preparation for the next lesson)
• When?
– At the end of each lesson
• How?
– Solutions online via screencasts
– Practical
– Preparation for the practical exam ... Not all problems necessarily require program code ...
67. Weblems
W1.1: To which phyla do the following species belong: (a) starfish, (b) ginkgo tree, (c) scorpion?
W1.2: What are the common names for the following species: (a) Orycteropus afer, (b) Beta vulgaris, (c) Macrocystis pyrifera?
W1.3: What species has the smallest known genome? And is genome size related to the number of genes?
W1.4: What are the 5 most recently published genomes? How complete is the “coverage”?
W1.5: For approximately 10% of Europeans, the painkiller codeine is ineffective because the patients lack the enzyme that converts codeine into the active molecule, morphine. What is the most common mutation that causes this condition?