Lec02 phylogeny (3-23-2017)


Microbial Diversity course, UMass Amherst. Covers Brown Ch. 4, 5 & 6

Published in: Science

  1. Lecture 14: Phylogeny. Microbiology 480, University of Massachusetts Amherst, Spring 2017
  2. What are the different kinds of diversity?
     • Taxonomic diversity
     • Phylogenetic diversity
     • Genetic diversity
     • Functional diversity
     • Morphological diversity
     • Structural diversity
     • Metabolic diversity
     • Ecological diversity
     • Behavioral diversity
     …there are many ways to measure diversity!
  3. Objectives
     • Define phylogeny
     • What is a phylogenetic tree, and what does it tell you?
     • How do you read a tree? What are the different parts?
     • How do you construct a tree?
     • How do you root the tree of life?
     • What are some alternatives to SSU rRNA analysis?
  4. Quiz
     1. What does a phylogenetic tree tell you?
        a. Evolutionary relationships among a group of organisms
        b. Shared functions within clades
        c. Major extinction events
     2. What is wrong with the picture at right (Brown, Fig 2.2)?
  5. Phylogeny
     • Inferred evolutionary relationships among species, based on similarities and differences in their physical or genetic characteristics
     • Phylogeny may also refer to a phylogenetic tree, the illustration of these relationships
  6. Phylogeny
     • Last time we looked at some “wrong” trees, including Haeckel’s 3 kingdom tree and Whittaker’s 5 kingdom tree. Why are these problematic? Subjective & qualitative.
     • PCR and sequencing make it possible to understand how organisms are related in an objective and quantitative way.
  7. Descent with modification
  8. Descent with modification
  9. Descent with modification = evolution
     • Definition of evolution = descent with modification from a common ancestor
     • Charles Darwin first presented this idea in his book On the Origin of Species.
     • Individual species ‘split’ into two or more daughter species
       – concept of vertical inheritance
       – common ancestor at basal nodes
       – molecular clock
     • Evolution only occurs when there is a change in gene frequency within a population over time
  10. Reading a tree…
  11. Reading a tree
      A. Elements of a tree: root, nodes, branches and tips
      B. Same tree as in A, but rotated 90 degrees, so that evolutionary time progresses from left to right.
  12. Reading a tree
      • To “read” a tree, you need to know the parts.
        – The tips are the extant organisms whose relationship you are trying to discern.
        – There are nodes connecting the tips, each of which represents a hypothetical common ancestor of the organisms in that clade.
        – Branch lengths correspond to the number of differences between sequences, which is also an expression of time if you assume that the rate of evolution is the same across the tree. This is a mostly good assumption.
        – The separation of tips has no meaning.
        – The tree’s branches can rotate freely around the axes.
  13. Unequal rates of evolution
  14. Unequal rates of evolution
      Similarity between organisms is not necessarily equal to evolutionary relationship.
      • Which one evolved faster?
        – ‘3’ evolved faster than ‘2’
      • Which is most similar to 2? Why?
        – ‘2’ is more ‘similar’ to ‘1’ than to ‘3’
      • However, ‘2’ and ‘3’ share a common ancestor ‘B’
  15. Derived vs Ancestral Trait
  16. Derived vs Ancestral Trait
      • A derived trait is one that was NOT present in the common ancestor.
      • Ancestral (or primitive) traits are characters that WERE present in a common ancestor.
      • These terms are relative because they depend on which common ancestor you are referring to; every node is the last common ancestor of all descendants in that group.
  17. Phylogenetic groups
      • Monophyletic: a group derived from the same common ancestor that contains all of that ancestor’s descendants.
      • Paraphyletic: a group that has evolved from a single ancestral species but does not contain all the descendants of that ancestor.
      • Polyphyletic: a taxonomic group having its origin in several different lines of descent; think of the prefix “poly-” as suggesting many common ancestors.
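The group definitions above can be checked mechanically on a small tree. The following is a minimal sketch, assuming a rooted tree written as nested tuples with tip names as strings. The tip names and the topology (bacteria as sister to archaea plus eukaryotes) are hypothetical and chosen only for illustration. A group counts as monophyletic here if some node's complete set of tips equals the group.

```python
def tips(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Yield the tip set of this node and of every node below it."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips under some node."""
    return frozenset(group) in set(clades(tree))


# Hypothetical toy tree: root between bacteria and (archaea + eukaryotes).
toy_tree = (
    ("bacterium_1", "bacterium_2"),
    (("archaeon_1", "archaeon_2"), ("eukaryote_1", "eukaryote_2")),
)

prokaryotes = {"bacterium_1", "bacterium_2", "archaeon_1", "archaeon_2"}
archaea_and_eukaryotes = {"archaeon_1", "archaeon_2", "eukaryote_1", "eukaryote_2"}

print(is_monophyletic(toy_tree, archaea_and_eukaryotes))
print(is_monophyletic(toy_tree, prokaryotes))
```

On this assumed topology the "prokaryote" tips share an ancestor (the root) but leave out some of its descendants, so the check fails for them.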
  18. What about “prokaryotes”?
  19. What about “prokaryotes”?
      • The term prokaryote defines organisms by what they do not have (nuclei)
      • Are nuclei an ancestral trait? NO.
      • Prokaryotes are not a monophyletic group.
      • Norm Pace says, “I believe it is critical to shake loose from the prokaryote/eukaryote concept. It is outdated, a guesswork solution to an articulation of biological diversity and an incorrect model for the course of evolution.”
  20. Constructing a phylogenetic tree
      • Assume you have chosen which species to analyze
      • (1) Decide which gene to use …
        – SSU ribosomal RNA gene is very popular. Why?
  21. Constructing a phylogenetic tree
      • Assume you have chosen which species to analyze
      • (1) Decide which gene to use (SSU rRNA gene)
      • SSU ribosomal RNA gene
        + Short, only about 1500 base pairs
        + Information-dense because it is a non-coding, structural RNA
        + Essential for life, so probably not horizontally transferred
        – Multiple copies per genome
        – Cannot resolve close relationships
  22. Phylogeny with any gene
      • Other RNAs
        + LSU and ITS (aka rRNA spacer) are more popular for fungi; better fine-scale resolution
        – Sequence length is variable, unlike SSU
      • Protein genes or sequences
      • Concatenated genes: a collection of ~100 single-copy “housekeeping” genes
  23. Phylogeny with other markers
      • Fatty acid methyl ester (FAME) analysis measures membrane fatty acids
        + Specific to some monophyletic groups
        – Includes paraphyletic groupings
        – Can vary with stress conditions
  24. Constructing a phylogenetic tree
      • (2) Determine the gene sequences
  25. Constructing a phylogenetic tree
      • (3) Use sequence alignment to identify homologous residues, measure sequence similarity and make a distance matrix
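Step (3) can be sketched in a few lines once the sequences are already aligned. This is a minimal illustration, assuming gaps are written as "-" and simply skipped. The taxon names and sequences are made up, and the distance used is the plain fraction of differing positions (1 minus similarity), before any correction is applied.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps (a simplifying assumption)
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(alignment):
    """Build a symmetric dict-of-dicts distance matrix from an alignment."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        matrix[x][y] = matrix[y][x] = p_distance(alignment[x], alignment[y])
    return matrix


# Made-up aligned sequences, for illustration only.
toy_alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTTC",
    "taxon3": "ACGAAC-TTC",
}
matrix = distance_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```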
  26. Jukes & Cantor method
      • Relates sequence similarity to evolutionary distance
        - If all sequences are the same, distance is zero
        - Distance increases as sequence similarity decreases, slowly at first and then steeply, which means that a difference of one or two bases does not change the distance much
        - The lowest sequence similarity is about 0.25 because all sequences are about 25% similar by chance; there are 4 bases in the genetic code, so the chance that one base will match another is 1 in 4
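The three properties listed for the Jukes & Cantor method follow from its standard formula, d = -(3/4) ln(1 - (4/3) p), where p is the fraction of differing sites (1 minus similarity). A minimal sketch follows; the function name is hypothetical, and returning infinity at or below 25% similarity is a choice made here for illustration.

```python
import math


def jukes_cantor_distance(similarity):
    """Convert fractional sequence similarity (0..1) to Jukes-Cantor distance."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    p = 1.0 - similarity  # observed fraction of differing sites
    if p <= 0.0:
        return 0.0  # identical sequences: distance is zero
    if p >= 0.75:
        return math.inf  # at or below chance-level similarity the distance is undefined
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for s in (1.0, 0.99, 0.9, 0.7, 0.5, 0.3):
    print(f"similarity {s:.2f} -> distance {jukes_cantor_distance(s):.3f}")
```

Near 100% similarity the corrected distance is almost the same as the raw fraction of differences; it grows much faster as similarity approaches 0.25.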
  27. Constructing a phylogenetic tree
      • (4) Perform phylogenetic analysis, which usually means constructing a tree
  28. Neighbor-joining method for calculating branch lengths
      (Figure: a corrected version of the picture in Brown, page 46.)
  29. Constructing a phylogenetic tree
      • How can you tell what the branch lengths are?
      • In other words, you need to place the node u
        – You know how far apart a & b are from each other
        – You know how far a is from something else, say c, so measure b from c as well and you can estimate where node u should be
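Placing node u from the three pairwise distances is simple arithmetic: if u is the point where the paths to a, b and c meet, each branch length is half of (the two distances involving that tip minus the third). The sketch below assumes exactly three tips and made-up distances. Full neighbor-joining replaces c with an average over all other taxa, which is not shown here.

```python
def place_node(d_ab, d_ac, d_bc):
    """Return branch lengths (a-u, b-u, c-u) for the node u joining a, b and c."""
    a_to_u = (d_ab + d_ac - d_bc) / 2
    b_to_u = (d_ab + d_bc - d_ac) / 2
    c_to_u = (d_ac + d_bc - d_ab) / 2
    return a_to_u, b_to_u, c_to_u


# Made-up distances for illustration.
a_u, b_u, c_u = place_node(d_ab=0.3, d_ac=0.5, d_bc=0.6)
print(round(a_u, 3), round(b_u, 3), round(c_u, 3))
```

The branch lengths add back up to the input distances (a-u plus b-u equals the a-b distance, and so on), which is a quick sanity check on the placement.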
  30. Tree Construction Complexities
      1. Choice of substitution model
      2. GC bias
      3. Long-branch attraction
      4. Tree algorithms (besides Neighbor Joining)
      5. Bootstrapping
  31. Substitution models
      • Two-parameter models only care about whether a substitution is a transition or a transversion
  32. Substitution models
      • Transition: purine to purine, or pyrimidine to pyrimidine
      • Transversion: purine to pyrimidine, or vice versa
      • Transitions are much more common than transversions, so these are weighted differently in deciding what distance to assign to a mismatch
      • Six-parameter models consider different types of transitions and transversions, weighting each change differently
      • Gaps are also tricky… for example, adjacent gaps are not unrelated
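To make the transition/transversion distinction concrete, the sketch below classifies each mismatch and then applies one well-known two-parameter model, Kimura's, whose distance is d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), with P the fraction of transitions and Q the fraction of transversions. The slides do not say which two-parameter model is meant, so the choice of Kimura's model is an assumption, as are the function names and the example sequences.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify(a, b):
    """Label a pair of aligned bases as match, transition or transversion."""
    if a == b:
        return "match"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def kimura_2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance; gaps and ambiguous bases are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"match": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            counts[classify(a, b)] += 1
    sites = sum(counts.values())
    if sites == 0:
        raise ValueError("no comparable positions")
    p = counts["transition"] / sites    # fraction of transitions
    q = counts["transversion"] / sites  # fraction of transversions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Made-up sequences: one transition (A/G) and one transversion (A/C) in 10 sites.
d = kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(round(d, 4))
```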
  33. GC bias
      • Thermophiles tend to prefer GC over AT
      • To solve, ignore transitions and base the tree only on transversions
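A transversion-only distance is easy to compute by hand or in code: count only purine/pyrimidine swaps and ignore everything else. The sketch below also reports GC content so the bias itself is visible. The function names and sequences are hypothetical, and gaps or ambiguous bases are simply skipped.

```python
PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def gc_content(seq):
    """Fraction of unambiguous bases that are G or C."""
    bases = [b for b in seq.upper() if b in BASES]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(b in {"G", "C"} for b in bases) / len(bases)


def transversion_distance(seq_a, seq_b):
    """Fraction of compared sites that differ by a transversion only."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous bases
        compared += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        transversions += (a in PURINES) != (b in PURINES)
    if compared == 0:
        raise ValueError("no comparable positions")
    return transversions / compared


# Made-up sequences for illustration.
print(round(gc_content("GGCCAT"), 3))
print(transversion_distance("AAGG", "GACG"))
```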
  34. Long-branch attraction
      • Artificial clustering of long branches together, or of short branches
      • Different rates of evolution for different tips
      • Very long branches are usually because of bad sequence or poor alignment
  35. Tree algorithms
      • Neighbor-joining starts with a radial tree and joins neighbors
      • Fitch starts with two sequences and adds the next closest relatives
      • Parsimony makes a bunch of trees and finds the one that is the most simple, usually based on the fewest mutations
      • Maximum likelihood trees are the best & most computationally intensive, based on probability
      • Bayesian inference starts with a random tree structure & random parameters, then iterates until an “optimal” tree is found
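For neighbor-joining, the "joins neighbors" step does not simply pick the closest pair. It picks the pair that minimizes Q(i, j) = (n - 2) d(i, j) - (sum of distances from i) - (sum of distances from j). The sketch below shows only that selection step, not the full algorithm. The function name and the five-taxon distances are illustrative values, not data from the lecture.

```python
def nj_pick_pair(names, dist):
    """Return (q, i, j) for the pair that neighbor-joining would join first."""
    n = len(names)
    totals = {i: sum(dist[i][j] for j in names if j != i) for i in names}
    best = None
    for index, i in enumerate(names):
        for j in names[index + 1:]:
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:  # ties keep the first pair found
                best = (q, i, j)
    return best


# Illustrative pairwise distances for five taxa.
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
names = ["a", "b", "c", "d", "e"]
dist = {name: {} for name in names}
for (i, j), d in pairs.items():
    dist[i][j] = dist[j][i] = d

q, first, second = nj_pick_pair(names, dist)
print(q, first, second)
```

In this example d and e are the closest pair by raw distance, yet a and b have the lowest Q and are joined first; correcting for each taxon's overall divergence is what lets neighbor-joining tolerate unequal rates of evolution.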
  36. Bootstrapping
      • A measure of confidence in the branches of your tree, given your sequence alignment
      • Numbers are from 0-100, with 100 being perfect confidence
      • Alignment columns are randomly sampled with replacement to create new trees
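The resampling step can be sketched as follows. Columns of the alignment are drawn at random with replacement to make a pseudo-alignment of the same length, a grouping is inferred from each replicate, and the support value is the percentage of replicates that recover the grouping of interest. To keep the example self-contained, "closest pair by mismatch count" stands in for a real tree-building method. That stand-in, the function names and the sequences are all assumptions for illustration.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement; same taxa, same length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the two taxa with the fewest mismatches."""
    def mismatches(x, y):
        return sum(a != b for a, b in zip(alignment[x], alignment[y]))
    pairs = combinations(sorted(alignment), 2)
    return frozenset(min(pairs, key=lambda pair: mismatches(*pair)))


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == frozenset(pair)
        for _ in range(replicates)
    )
    return 100 * hits / replicates


# Made-up alignment: t1 and t2 are identical, t3 differs at every site.
toy = {"t1": "ACGTACGT", "t2": "ACGTACGT", "t3": "TGCATGCA"}
print(bootstrap_support(toy, ("t1", "t2")))
```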
  37. Rooting the tree of life
  38. Rooting the tree of life
      • How do you root a tree? You need an outgroup. This is an organism that you know is only distantly related to the organisms in your tree.
      • What if you want to root the tree of life? In other words, what if you want to put all the organisms of life in the tree?
  39. Sequence homology
      • Homologous genes have a shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs).
  40. Root the tree of life using paralogs
      • Duplication of elongation factors occurred prior to divergence (paralogs)
      • One gene tree can be rooted with the other gene
      • Both trees yield the same relationship and are rooted in the same location.
  41. Root the tree of life using paralogs
      • The genes for the protein synthesis elongation factors Tu (EF-Tu) and G (EF-G) are the products of an ancient gene duplication, which appears to predate the divergence of all extant organismal lineages.
      • Thus, it should be possible to root a universal phylogeny based on either protein using the second protein as an outgroup.
      • This approach was also used with the regulatory and catalytic subunits of the proton ATPases.
      • All phylogenetic methods used strongly place the root of the universal tree between two highly distinct groups, the archaeons/eukaryotes and the eubacteria.
      • A combined data set of EF-Tu and EF-G sequences favors placement of the eukaryotes within the Archaea, as the sister group to the Crenarchaeota.
      • h[p://>cles/PMC38819/
  42. Protein-based models of evolution
  43. Protein-based models of evolution
      • Searched a database of proteins from 420 modern organisms, looking for structures that were common to all.
      • 5 to 11 per cent were universal, i.e. conserved enough to have originated in LUCA
      • LUCA had enzymes to break down and extract energy from nutrients, and some protein-making equipment
      • LUCA lacked the enzymes for making and reading DNA molecules
      Kim and Caetano-Anollés, BMC Evolutionary Biology, 2011
  44. The root moves depending on whether you use nucleic acids or protein!
  45. The root moves depending on whether you use nucleic acids or protein!
      • DNA sequence-based rooting of the tree of life puts the root within the Bacteria.
        – usually derived from analyses of the sequences of ancient gene paralogs (e.g., ATPases, aaRSs, elongation factors)
      • Proteomic analyses of many proteins put the root of the tree of life within the Archaea.
        – Archaeal rooting has been observed for phylogenetic analyses of tRNA, 5S, RNase P…
  46. Last universal common ancestor
      …also known as LUCA
  47. Last universal common ancestor
      • One cannot rely on nucleotide gene sequences because these would have mutated beyond recognition
      • Amino acid sequences change more slowly because silent (synonymous) mutations leave the amino acid sequence unchanged
      • The tertiary folded structure of a protein is even more strongly conserved than the secondary structure
  48. Objectives
      • Define phylogeny
      • What is a phylogenetic tree, and what does it tell you?
      • How do you read a tree? What are the different parts?
      • How do you construct a tree?
      • How do you root the tree of life?
      • What are some alternatives to SSU rRNA analysis?