Variation Within and Between Species

Case study: Are Neanderthals still among us?

Published in: Education

Transcript of "Variation Within and Between Species"

  1. Introduction to Bioinformatics
  2. Lecture 5: Variation within and between species. Chapter 5: Are Neanderthals among us?
  3. Neandertal, Germany, 1856. Initial interpretations:
     * bear skull
     * pathological idiot
     * Old Dutchman ...
  7. Lecture 5: Inter- and intraspecies variation. 5.1 Variation in DNA sequences
     * Even closely related individuals differ in genetic sequences
     * (Point) mutations: copy errors at certain locations
     * Sexual reproduction: diploid genome
  8. 5.1 Variation in DNA sequences. Diploid chromosomes
  9. 5.1 Variation in DNA sequences. Mitosis: diploid reproduction
  10. 5.1 Variation in DNA sequences. Meiosis: diploid (= double) → haploid (= single)
  11. 5.1 Variation in DNA sequences
      * Typing error rate of a very good typist: 1 error per 1,000 typed letters
      * All our diploid cells constantly reproduce 7 billion letters
      * Typical cell copying error rate: ~1 error per 1 Gbp
  12. 5.1 Variation in DNA sequences. Germ line
      Reverse time and follow your cells:
      * Now you count ~$10^{13}$ cells
      * One generation ago you had 2 cells 'somewhere' in your parents' bodies
      * A small number T of generations ago you had ($2^T$ minus multiple ancestors) cells
      * A large number T of generations ago you counted #(fertile ancestors) cells
      * Congratulations: you are 3.4 billion years old!
      Fast-forward time and follow your cells:
      * Only a few cells in your reproductive organs have a chance to live on in the next generations
      * The rest (including you) will die …
  13. 5.1 Variation in DNA sequences. Germ line mutations
      This potentially immortal lineage of (germ) cells is called the germ line. All mutations that we have accumulated are en route on the germ line.
  14. 5.1 Variation in DNA sequences
      * Polymorphism: multiple possibilities for a nucleotide (alleles)
      * Single nucleotide polymorphism, SNP ("snip"): a point mutation, for example AAATAAA vs AAACAAA
      * Humans: SNP = 1/1500 bases = 0.067%
      * STR = short tandem repeats (microsatellites), for example CACACACACACACACACA …
      * Transition vs transversion
  15. 5.1 Variation in DNA sequences. Purines and pyrimidines
  16. 5.1 Variation in DNA sequences. Transitions and transversions
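The purine/pyrimidine distinction behind slides 15 and 16 can be made concrete with a small sketch. It assumes the standard classification (A and G are purines, C and T are pyrimidines); the function name `classify_substitution` is illustrative and not taken from the lecture.

```python
# Purines and pyrimidines; a transition stays within one class,
# a transversion switches class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Return 'identical', 'transition' or 'transversion' for two bases."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError(f"not a DNA base: {a!r} or {b!r}")
    if a == b:
        return "identical"
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


# The SNP example from slide 14 (AAATAAA vs AAACAAA) replaces T by C.
print(classify_substitution("T", "C"))
print(classify_substitution("A", "T"))
```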
  17. Lecture 5: Inter- and intraspecies variation. 5.2 Mitochondrial DNA
      * Mitochondria are inherited only via the maternal line
      * Very suitable for comparing evolution: not reshuffled
  18. 5.2 Mitochondrial DNA. H. sapiens mitochondrion
  19. 5.2 Mitochondrial DNA. EM photograph of H. sapiens mtDNA
  20. 5.2 Mitochondrial DNA
  21. Lecture 5: Inter- and intraspecies variation. 5.3 Variation between species
      * Genetic variation accounts for morphological, physiological and behavioral variation
      * Genetic variation (i.e. distance) relates to phylogenetic relationship
      * Necessity to measure distances between sequences: a metric
  22. 5.3 Variation between species. Substitution rate
      * Mutations originate in single individuals
      * Mutations can become fixed in a population
      * Mutation rate: rate at which new mutations arise
      * Substitution rate: rate at which a species fixes new mutations
      * For neutral mutations the two rates are equal (slide 23)
  23. 5.3 Variation between species. Substitution rate and mutation rate, for neutral mutations:
      * $\rho = 2N\mu \cdot \frac{1}{2N} = \mu$ (a diploid population of size $N$ produces $2N\mu$ new mutations per generation, and each is fixed with probability $1/(2N)$)
      * $\rho = K/(2T)$ (two lineages that diverged a time $T$ ago accumulate the distance $K$ together)
  24. Lecture 5: Inter- and intraspecies variation. 5.4 Estimating genetic distance
      * Substitutions are independent (?)
      * Substitutions are random
      * Multiple substitutions may occur
      * Back-mutations mutate a nucleotide back to an earlier value
  25. 5.4 Estimating genetic distance. Multiple substitutions and back-mutations conceal the real genetic distance:
        GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
        TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
        TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
        GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
      observed: 2 (= d), actual: 4 (= K)
  26. 5.4 Estimating genetic distance
      * Saturation: on average one substitution per site
      * Two random sequences of equal length will match at approximately ¼ of their sites
      * At saturation the proportional genetic distance is therefore ¾
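The saturation argument of slide 26 can be checked with a quick simulation. This is a minimal sketch that assumes all four nucleotides are equally likely and sites are independent; `match_fraction` and its parameters are illustrative names.

```python
import random


def match_fraction(n: int, seed: int = 0) -> float:
    """Fraction of matching sites between two random sequences of length n."""
    rng = random.Random(seed)
    a = [rng.choice("ACGT") for _ in range(n)]
    b = [rng.choice("ACGT") for _ in range(n)]
    return sum(x == y for x, y in zip(a, b)) / n


frac = match_fraction(100_000)
# Matching sites should be close to 1/4, so differing sites close to 3/4.
print(f"matching: {frac:.3f}  differing: {1 - frac:.3f}")
```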
  27. 5.4 Estimating genetic distance
      * True genetic distance (proportion): K
      * Observed proportion of differences: d
      * Due to multiple substitutions and back-mutations, K ≥ d
  28. 5.4 Estimating genetic distance. Sequence evolution is a Markov process: a sequence at generation (= time) t depends only on the sequence at generation t−1
  29. 5.4 Estimating genetic distance. The Jukes-Cantor model
      * Correction for multiple substitutions
      * Substitution probability per site per time step (generation) is α
      * Substitution means there are 3 possible replacements (e.g. C → {A, G, T})
      * Non-substitution means there is 1 possibility (e.g. C → C)
  30. 5.4 The Jukes-Cantor model. Therefore, the one-step Markov process has the following transition matrix (rows and columns ordered A, C, G, T):
      $$M_{JC} = \begin{pmatrix}
        1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
        \alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
        \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
        \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
      \end{pmatrix}$$
  31. 5.4 The Jukes-Cantor model. After t generations the substitution probability is $M(t) = M_{JC}^t$. Eigenvalues and orthonormal eigenvectors of $M_{JC}$ (the eigenvalues of $M(t)$ are $\lambda_i^t$):
      * $\lambda_1 = 1$ (multiplicity 1): $v_1 = \tfrac{1}{2}(1, 1, 1, 1)^T$
      * $\lambda_{2..4} = 1 - 4\alpha/3$ (multiplicity 3):
        $v_2 = \tfrac{1}{2}(-1, -1, 1, 1)^T$
        $v_3 = \tfrac{1}{2}(1, -1, -1, 1)^T$
        $v_4 = \tfrac{1}{2}(1, -1, 1, -1)^T$
  32. 5.4 The Jukes-Cantor model. Spectral decomposition of M(t):
      $$M_{JC}^t = \sum_i \lambda_i^t \, v_i v_i^T$$
      Write M(t) as:
      $$M_{JC}^t = \begin{pmatrix}
        r(t) & s(t) & s(t) & s(t) \\
        s(t) & r(t) & s(t) & s(t) \\
        s(t) & s(t) & r(t) & s(t) \\
        s(t) & s(t) & s(t) & r(t)
      \end{pmatrix}$$
      Therefore, the probability s(t) of each specific substitution per site after t generations is:
      $$s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4\alpha}{3}\right)^t$$
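The closed form for s(t) can be compared numerically with the off-diagonal entries of the t-th power of the Jukes-Cantor matrix. The sketch below uses plain Python lists rather than a linear-algebra library; the helper names and the values α = 0.01 and t = 50 are arbitrary choices for illustration.

```python
def jc_matrix(alpha: float) -> list[list[float]]:
    """One-step Jukes-Cantor transition matrix (order A, C, G, T)."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)]
            for i in range(4)]


def matmul(x, y):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def matrix_power(m, t: int):
    """m raised to the integer power t by repeated multiplication."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(t):
        result = matmul(result, m)
    return result


def s_closed_form(alpha: float, t: int) -> float:
    """Off-diagonal entry of M_JC^t according to the spectral decomposition."""
    return 0.25 - 0.25 * (1 - 4 * alpha / 3) ** t


alpha, t = 0.01, 50
m_t = matrix_power(jc_matrix(alpha), t)
print("numeric s(t):    ", m_t[0][1])
print("closed-form s(t):", s_closed_form(alpha, t))
print("row sum:         ", sum(m_t[0]))
```

The two values of s(t) should agree up to floating-point error, and each row of M(t) should still sum to 1.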
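  33. 5.4 The Jukes-Cantor model. Probability s(t) of each specific substitution per site after t generations:
      $$s(t) = \tfrac{1}{4} - \tfrac{1}{4}\left(1 - \tfrac{4\alpha}{3}\right)^t$$
      A site can end up as any of 3 other nucleotides, so the observed genetic distance d after t generations is ≈ 3 s(t):
      $$d = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4\alpha}{3}\right)^t$$
      For small α, using $\ln(1 - 4\alpha/3) \approx -4\alpha/3$:
      $$t \approx -\frac{3}{4\alpha} \ln\left(1 - \tfrac{4}{3} d\right)$$
  34. 5.4 The Jukes-Cantor model. For small α the number of generations follows from the observed genetic distance d:
      $$t \approx -\frac{3}{4\alpha} \ln\left(1 - \tfrac{4}{3} d\right)$$
      The actual genetic distance is (of course) $K = \alpha t$. So:
      $$K \approx -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} d\right)$$
      This is the Jukes-Cantor formula: independent of α and t.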
  35. 5.4 The Jukes-Cantor model. The Jukes-Cantor formula:
      $$K \approx -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} d\right)$$
      * For small d, using ln(1+x) ≈ x: K ≈ d, so actual distance ≈ observed distance
      * For saturation, d ↑ ¾: K → ∞, so if the observed distance corresponds to the random-sequence distance, the actual distance becomes indeterminate
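A short sketch of the Jukes-Cantor correction shows both limits from slide 35: K ≈ d for small d, and K growing without bound as d approaches ¾. The function name `jukes_cantor_distance` and the sample values of d are illustrative.

```python
import math


def jukes_cantor_distance(d: float) -> float:
    """Jukes-Cantor estimate K of the true distance from the observed proportion d."""
    if not 0 <= d < 0.75:
        raise ValueError("d must lie in [0, 0.75); at d >= 3/4 the estimate diverges")
    return -0.75 * math.log(1 - 4 * d / 3)


# Small d gives K close to d; d near 3/4 gives a very large K.
for d in (0.01, 0.10, 0.30, 0.60, 0.74):
    print(f"d = {d:.2f}  K = {jukes_cantor_distance(d):.4f}")
```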
  36. Jukes-Cantor
  37. 5.4 The Jukes-Cantor model. Variance in K
      If $K = f(d)$ then:
      $$\delta K = \frac{\partial K}{\partial d}\,\delta d \;\Rightarrow\; \delta K^2 = \left(\frac{\partial K}{\partial d}\right)^2 \delta d^2$$
      So:
      $$\mathrm{Var}(K) = \left(\frac{\partial K}{\partial d}\right)^2 \mathrm{Var}(d)$$
      Generation of a sequence of length n with substitution rate d is a binomial process:
      $$\mathrm{Prob}(k) = \binom{n}{k} d^k (1-d)^{n-k}$$
      and therefore the variance is $\mathrm{Var}(d) = d(1-d)/n$.
      Because of the Jukes-Cantor formula:
      $$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3} d}$$
  38. 5.4 The Jukes-Cantor model. Variance in K
      Variance: $\mathrm{Var}(d) = d(1-d)/n$
      Jukes-Cantor: $\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3} d}$
      So:
      $$\mathrm{Var}(K) \approx \frac{d(1-d)}{n\left(1 - \tfrac{4}{3} d\right)^2}$$
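The variance formula can be evaluated directly to see how the uncertainty in K grows as d approaches saturation. A minimal sketch, with `jc_variance` as an illustrative name and n = 1000 chosen to match the example on slide 40:

```python
import math


def jc_variance(d: float, n: int) -> float:
    """Approximate variance of the Jukes-Cantor estimate K for sequence length n."""
    if not 0 <= d < 0.75:
        raise ValueError("d must lie in [0, 0.75)")
    return d * (1 - d) / (n * (1 - 4 * d / 3) ** 2)


# Standard deviation of K for increasing observed distance d.
for d in (0.05, 0.20, 0.50, 0.70):
    print(f"d = {d:.2f}  sd(K) = {math.sqrt(jc_variance(d, 1000)):.4f}")
```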
  39. Var(K)
  40. 5.4 The Jukes-Cantor model. Example 5.4 on page 90
      * Create artificial data with n = 1000: generate K* mutations
      * Count d
      * With the Jukes-Cantor relation, reconstruct the estimate K(d)
      * Plot K(d) − K*
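The steps of Example 5.4 can be sketched as a small simulation. This is not the code behind the lecture's figures; it assumes that each of the K* mutations hits a uniformly random site and replaces the base with one of the three others, and it reports distances as proportions (K*/n) instead of plotting them. The name `simulate` and the chosen K* values are illustrative.

```python
import math
import random


def simulate(n: int, k_star: int, seed: int = 0) -> tuple[float, float]:
    """Apply k_star random substitutions to a random sequence of length n.

    Returns the observed proportion d and the Jukes-Cantor estimate K(d).
    """
    rng = random.Random(seed)
    original = [rng.choice("ACGT") for _ in range(n)]
    mutated = list(original)
    for _ in range(k_star):
        site = rng.randrange(n)
        # A substitution always changes the base to one of the 3 others.
        mutated[site] = rng.choice([b for b in "ACGT" if b != mutated[site]])
    d = sum(x != y for x, y in zip(original, mutated)) / n
    k_est = -0.75 * math.log(1 - 4 * d / 3) if d < 0.75 else float("inf")
    return d, k_est


n = 1000
for k_star in (50, 200, 500, 1000):
    d, k_est = simulate(n, k_star)
    print(f"K*/n = {k_star / n:.2f}  d = {d:.3f}  "
          f"K(d) = {k_est:.3f}  K(d) - K*/n = {k_est - k_star / n:+.3f}")
```

Repeating the run with different seeds gives a feel for the spread of K(d) − K*/n, which widens as K* grows.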
  41. 5.4 Example 5.4 on page 90
  42. 5.4 Example 5.4 on page 90
  43. 5.4 Example 5.4 on page 90
  44. 5.4 Example 5.4 on page 90 (= Fig. 5.3)
  45. 5.4 Estimating genetic distance. The Kimura 2-parameter model
      * Include substitution bias in the correction factor
      * Transition probability (G↔A and T↔C) per site per time step (generation) is α
      * Probability of each transversion (G↔T, G↔C, A↔T, and A↔C) per site per time step (generation) is β
  46. 5.4 The Kimura 2-parameter model. The one-step Markov process substitution matrix now becomes (rows and columns ordered A, C, G, T; the diagonal is chosen so that each row sums to 1):
      $$M_{K2P} = \begin{pmatrix}
        1-\alpha-2\beta & \beta & \alpha & \beta \\
        \beta & 1-\alpha-2\beta & \beta & \alpha \\
        \alpha & \beta & 1-\alpha-2\beta & \beta \\
        \beta & \alpha & \beta & 1-\alpha-2\beta
      \end{pmatrix}$$
  47. 5.4 The Kimura 2-parameter model. After t generations the substitution probability is $M(t) = M_{K2P}^t$. Determine the eigenvalues $\{\lambda_i\}$ and eigenvectors $\{v_i\}$ of $M_{K2P}$.
  48. 5.4 The Kimura 2-parameter model. Spectral decomposition of M(t):
      $$M_{K2P}^t = \sum_i \lambda_i^t \, v_i v_i^T$$
      * Determine the fraction of transitions per site after t generations: P(t)
      * Determine the fraction of transversions per site after t generations: Q(t)
      * Genetic distance: $K \approx -\tfrac{1}{2} \ln(1 - 2P - Q) - \tfrac{1}{4} \ln(1 - 2Q)$
      * Fraction of substitutions: d = P + Q (→ Jukes-Cantor)
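The Kimura distance can be sketched in the same way as the Jukes-Cantor one. The code below counts P and Q for two aligned sequences and applies the formula from slide 48; the function names are illustrative. The last lines check one way to read the "→ Jukes-Cantor" remark: when substitutions show no bias (P = d/3, Q = 2d/3), the Kimura formula gives the same value as the Jukes-Cantor formula.

```python
import math

PURINES = {"A", "G"}


def transition_transversion_fractions(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Return (P, Q): fractions of sites showing a transition / a transversion."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq_a)
    return transitions / n, transversions / n


def kimura_distance(p: float, q: float) -> float:
    """Kimura 2-parameter distance from transition (p) and transversion (q) fractions."""
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the correction to be defined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


p, q = transition_transversion_fractions("AAATAAA", "AAACAAA")
print(f"P = {p:.3f}  Q = {q:.3f}  K = {kimura_distance(p, q):.3f}")

# Without bias (P = d/3, Q = 2d/3) the result equals the Jukes-Cantor value.
d = 0.15
print(kimura_distance(d / 3, 2 * d / 3), -0.75 * math.log(1 - 4 * d / 3))
```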
  49. 5.4 Estimating genetic distance. Other models for nucleotide evolution
      * Different types of transitions/transversions
      * Pairwise substitutions: GTR (= General Time Reversible) model
      * Amino-acid substitution matrices
      * …
  50. 5.4 Estimating genetic distance. Other models for nucleotide evolution
      * Deficit: all the above models assume symmetric substitution probabilities, prob(A→T) = prob(T→A)
      * There is now strong evidence that this assumption is not true
      * Challenge: incorporate this in a self-consistent model
  51. Lecture 5: Inter- and intraspecies variation. 5.5 Case study: Neanderthals
      * mtDNA of 206 H. sapiens from different regions
      * Fragments of mtDNA of 2 H. neanderthalensis, including the original 1856 specimen
      * All 208 samples from GenBank
      * A homologous sequence of 800 bp of the HVR (hypervariable region) could be found in all 208 specimens
  52. 5.5 Case study: Neanderthals
      * Pairwise genetic difference, corrected with the Jukes-Cantor formula
      * d(i,j) is the JC-corrected genetic difference between pair (i,j)
      * The distance table is symmetric: $d^T = d$
      * MDS (multidimensional scaling): translate the distance table d to an n-dimensional map X, here a 2D map
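The distance table of slide 52 can be sketched as follows. The four short sequences are made-up toy data, not the GenBank samples used in the case study, and the function names are illustrative. The resulting symmetric matrix is the kind of input that an MDS routine would take; the MDS step itself is not shown here.

```python
import math


def jc_corrected(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    d = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    if d >= 0.75:
        return float("inf")  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * d / 3)


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric table d(i, j) of JC-corrected pairwise distances."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = jc_corrected(seqs[i], seqs[j])
    return dist


# Hypothetical toy sequences, for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT", "TCGAACCTAT"]
for row in distance_matrix(toy):
    print("  ".join(f"{value:.3f}" for value in row))
```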
  53. 5.5 Case study: Neanderthals. Distance map d(i,j)
  54. 5.5 Case study: Neanderthals. MDS: H. neanderthalensis and H. sapiens are well separated
  55. 5.5 Case study: Neanderthals. Phylogenetic tree
  56. End of Lecture 5