MIB200A at UCDavis Module: Microbial Phylogeny; Class 2

Slides for discussion of Baldauf and Eisen papers

Published in: Science

  1. 1. Class 2: MIB200 Biology of Organisms without Nuclei. Class #2: Phylogeny. UC Davis, Fall 2019. Instructor: Jonathan Eisen
  2. 2. Hugenholtz et al. 1998
  3. 3. Woese 1987
  4. 4. Some Questions • What is a phylogenetic tree? • What can be shown in a phylogenetic tree? • How does one infer a phylogenetic tree? • How does one know if a tree is correct? • How can one use phylogenetic trees? • What is the difference between a gene tree and a species tree?
  5. 5. Raff J. How to Read and Understand a Scientific Article (https://violentmetaphors.files.wordpress.com/2018/01/how-to-read-and-understand-a-scientific-article.pdf) 1. Begin by reading the introduction, not the abstract. 2. Identify the big question. 3. Summarize the background in five sentences or less. 4. Identify the specific question(s). 5. Identify the approach. 6. Read the methods section. 7. Read the results section. 8. Determine whether the results answer the specific question(s). 9. Read the conclusion/discussion/interpretation section. 10. Go back to the beginning and read the abstract. 11. Find out what other researchers say about the paper.
  7. 7. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees: Step 1. Assembling a dataset; Step 2. Multiple sequence alignment – the heart of the matter; Step 3. Trees – methods, models and madness; Step 4. Tests – telling the forest from the trees; Step 5. Data presentation
  9. 9. A phylogenetic tree is composed of branches (edges) and nodes. Branches connect nodes; a node is the point at which two (or more) branches diverge. Branches and nodes can be internal or external (terminal). An internal node corresponds to the hypothetical last common ancestor (LCA) of everything arising from it. Terminal nodes correspond to the sequences from which the tree was derived (also referred to as operational taxonomic units or ‘OTUs’).
  10. 10. [Figure: Parts of a phylogenetic tree. Labels: root (root node); terminal (or tip) taxa a to h; internal nodes t to z; internal branches; terminal branches. Internal nodes represent hypothetical ancestral taxa.]
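As a minimal sketch of the terminology above (not taken from the slides), a rooted tree can be written as nested Python tuples, where every tuple is an internal node and every string is a terminal node (OTU). The tree shape and the helper names `terminals` and `count_internal` are illustrative assumptions.

```python
# A rooted tree as nested tuples: a tuple is an internal node (a hypothetical
# ancestor), a string is a terminal node (an OTU). The shape is made up.
tree = ((("a", "b"), "c"), ("d", ("e", "f")))


def terminals(node):
    """Return the terminal nodes (OTUs) below a node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(terminals(child))
    return result


def count_internal(node):
    """Count internal nodes at or below a node (the root counts as one)."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal(child) for child in node)


print(terminals(tree))
print(count_internal(tree))
```

Each internal node here stands for the last common ancestor of the terminals nested inside it, which is why a fully resolved rooted tree with n terminals has n - 1 internal nodes.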
  12. 12. Groups
  14. 14. Types of Trees
  16. 16. Tree Roots
  17. 17. Tree Roots: At the base of a phylogenetic tree is its ‘root’. This is the oldest point in the tree, and it, in turn, implies the order of branching in the rest of the tree; that is, who shares a more recent common ancestor with whom. The only way to root a tree is with an ‘outgroup’, an external point of reference. An outgroup is anything that is not a natural member of the group of interest (i.e. the ‘ingroup’).
  18. 18. Rooting
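To illustrate how the choice of outgroup fixes the branching order, here is a small sketch, assuming the unrooted tree is stored as an adjacency list. The taxa `A`, `B`, `C`, `O`, the internal node names `n1` and `n2`, and the function names are all hypothetical; the output is a Newick-style string.

```python
# Unrooted tree as an adjacency list. "n1" and "n2" are unnamed internal
# nodes; A, B, C and O are terminal taxa. All names are made up.
unrooted = {
    "A": ["n1"],
    "B": ["n1"],
    "n1": ["A", "B", "n2"],
    "n2": ["n1", "C", "O"],
    "C": ["n2"],
    "O": ["n2"],
}


def subtree(graph, node, parent):
    """Newick text for everything reachable from node without crossing parent."""
    children = [n for n in graph[node] if n != parent]
    if not children:
        return node
    return "(" + ",".join(subtree(graph, c, node) for c in children) + ")"


def root_with_outgroup(graph, outgroup):
    """Place the root on the branch joining the outgroup to the rest of the tree."""
    neighbor = graph[outgroup][0]  # a terminal node has exactly one neighbor
    return "(" + outgroup + "," + subtree(graph, neighbor, outgroup) + ");"


print(root_with_outgroup(unrooted, "O"))
print(root_with_outgroup(unrooted, "A"))
```

The same unrooted tree gives different nested groupings depending on which taxon is treated as the outgroup, which is the sense in which the root implies who shares a more recent common ancestor with whom.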
  19. 19. Woese 1987
  20. 20. [Figure: Unrooted Tree of Life from Woese; label: ROOT] (Slides by Jonathan Eisen for BIS2C at UC Davis Spring 2016)
  21. 21. [Figure: Unrooted Tree of Life from Woese; labels: ROOT, MAJOR DEBATE/AMBIGUITIES]
  22. 22. [Figure: Alternative Position of Eukaryote Branch; label: ROOT]
  24. 24. Orthology vs. Paralogy
  25. 25. Orthology vs. Paralogy Evolution is about homology; that is, the similarity due to common ancestry.
  29. 29. Sequence Alignment
  30. 30. Refining Alignment
  32. 32. The methods for calculating phylogenetic trees fall into two general categories. These are distance-matrix methods, also known as clustering or algorithmic methods (e.g. UPGMA, neighbour-joining, Fitch–Margoliash), and discrete data methods, also known as tree searching methods (e.g. parsimony, maximum likelihood, Bayesian methods).
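As one concrete instance of a distance-matrix (clustering) method, the sketch below implements UPGMA on a toy matrix. The taxon names, the distances, and the function name `upgma` are assumptions for illustration; branch lengths are omitted and only the topology is returned as nested tuples.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist:  nested dict with dist[a][b] defined for every a listed before b.
    """
    # cluster id -> (subtree, number of taxa in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): dist[names[i]][names[j]]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Toy distances (made up): A and B are closest, then C, then D.
toy = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"C": 6, "D": 10},
    "C": {"D": 10},
}
print(upgma(["A", "B", "C", "D"], toy))
```

UPGMA assumes a constant rate of change across lineages, so it is shown here only because it is the simplest of the clustering methods listed above, not as a recommended method.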
  34. 34. Bootstrapping
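Bootstrapping resamples the columns of the multiple sequence alignment with replacement to build new data sets of the same length. A minimal sketch of that resampling step follows; the toy alignment, the fixed seed, and the function name `bootstrap_alignment` are assumptions, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng:       a random.Random instance, passed in so runs are repeatable.
    """
    n_cols = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are applied to every taxon, so each resampled
    # column is an intact column of the original alignment.
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Toy alignment (made up).
alignment = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "ATGTTC"}
rng = random.Random(0)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate would then be used to infer its own tree, and the bootstrap percentage for a node is the share of replicate trees in which the same group of taxa appears.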
  35. 35. Long branch attraction
  36. 36. Phylogenomics
  37. 37. Eisen 1998 Major Topics • Sequence Similarity, Homology, and Functional Predictions • Identification of Homologs • Alignment and Masking • Phylogenetic Trees • Functional Predictions
  39. 39. [...] evolutionary information can be used to improve functional predictions. Below, I present an outline of one such phylogenomic method (see Fig. 1), and I compare this method to nonevolutionary functional prediction methods. This method is based on a relatively simple assumption: because gene functions change as a result of evolution, reconstructing the evolutionary history of genes should help predict the functions of uncharacterized genes. The first step is the generation of a phylogenetic tree representing the evolutionary history of the gene of interest and its homologs. Such trees are distinct from clusters and other means of characterizing sequence similarity because they are inferred by special techniques that help convert patterns of similarity into evolutionary relationships (see Swofford et al. 1996). After the gene tree is inferred, biologically determined functions of the various homologs are overlaid onto the tree. Finally, the structure of the tree and the relative phylogenetic positions of genes of different functions are used to trace the history of functional changes, which is then used to predict functions of uncharacterized genes. More detail of this method is provided below.
      Identification of Homologs: The first step in studying the evolution of a particular gene is the identification of homologs. As with similarity-based functional prediction methods, likely homologs of a particular gene are iden[...] [the remaining column text is cropped on the slide]
      Table 1. Methods of Predicting Gene Function When Homologs Have Multiple Functions
      Highest Hit: The uncharacterized gene is assigned the function (or frequently, the annotated function) of the gene that is identified as the highest hit by a similarity search program (e.g., Tomb et al. 1997).
      Top Hits: Identify top 10+ hits for the uncharacterized gene. Depending on the degree of consensus of the functions of the top hits, the query sequence is assigned a specific function, a general activity with unknown specificity, or no function (e.g., Blattner et al. 1997).
      Clusters of Orthologous Groups: Genes are divided into groups of orthologs based on a cluster analysis of pairwise similarity scores between genes from different species. Uncharacterized genes are assigned the function of characterized orthologs (Tatusov et al. 1997).
      Phylogenomics: Known functions are overlaid onto an evolutionary tree of all homologs. Functions of uncharacterized genes are predicted by their phylogenetic position relative to characterized genes (e.g., Eisen et al. 1995, 1997).
      Insight/Outlook
  41. 41. [...] greatly from more data, it is useful to augment this initial list by using identified homologs as queries for further [...] [com]monly used: parsimony, distance, and maximum likelihood (Table 3), and each has its advantages and disadvantages. [...]
      Table 2. Types of Molecular Homology
      Homolog: Genes that are descended from a common ancestor (e.g., all globins)
      Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human β- and chimp β-globin)
      Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., β- and γ-globin)
      Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)
      Positional homology: Common ancestry of specific amino acid or nucleotide positions in different genes
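The ortholog and paralog definitions in Table 2 can be sketched in code, assuming a gene tree whose internal nodes have already been labelled as speciation or duplication events. The tree, the gene names, and the functions `tips` and `relationship` are hypothetical, and the sketch ignores lateral gene transfer (xenologs).

```python
# Gene tree: an internal node is (event, left, right); a tip is a gene name.
# The event labels are assumed to be known already; the tree is made up.
gene_tree = (
    "duplication",
    ("speciation", "human_beta", "chimp_beta"),
    ("speciation", "human_gamma", "chimp_gamma"),
)


def tips(node):
    """Return the gene names at or below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return tips(left) + tips(right)


def relationship(node, gene1, gene2):
    """Classify two different genes by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        # If both genes sit under one child, their common ancestor is deeper.
        if not isinstance(child, str) and {gene1, gene2} <= set(tips(child)):
            return relationship(child, gene1, gene2)
    return "orthologs" if event == "speciation" else "paralogs"


print(relationship(gene_tree, "human_beta", "chimp_beta"))
print(relationship(gene_tree, "human_beta", "human_gamma"))
```

Note that under this definition a human β-globin and a chimp γ-globin are also paralogs, because the lineages leading to them separated at the duplication event rather than at a speciation event.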
  44. 44. [...] al. 1989). However, examination of the percent similarity between mycoplasmal genes and their homologs in bacteria does not clearly show this relationship. This is because mycoplasmas have undergone an accelerated rate of molecular evolution relative to other bacteria. Thus, a BLAST search with a gene from Bacillus subtilis (a low GC Gram-positive species) will result in a list in which the mycoplasma homologs (if they exist) score lower than genes from many spe[cies] [...]
      Table 3. Molecular Phylogenetic Methods
      Parsimony: Possible trees are compared and each is given a score that is a reflection of the minimum number of character state changes (e.g., amino acid substitutions) that would be required over evolutionary time to fit the sequences into that tree. The optimal tree is considered to be the one requiring the fewest changes (the most parsimonious tree).
      Distance: The optimal tree is generated by first calculating the estimated evolutionary distance between all pairs of sequences. Then these distances are used to generate a tree in which the branch patterns and lengths best represent the distance matrix.
      Maximum likelihood: Maximum likelihood is similar to parsimony methods in that possible trees are compared and given a score. The score is based on how likely the given sequences are to have evolved in a particular tree given a model of amino acid or nucleotide substitution probabilities. The optimal tree is considered to be the one that has the highest probability.
      Bootstrapping: Alignment positions within the original multiple sequence alignment are resampled and new data sets are made. Each bootstrapped data set is used to generate a separate phylogenetic tree and the trees are compared. Each node of the tree can be given a bootstrap percentage indicating how frequently those species joined by that node group together in different trees. Bootstrap percentage does not correspond directly to a confidence limit.
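The parsimony score described in Table 3 can be sketched as follows. This uses Fitch's small-parsimony algorithm, which the slides do not name, to count the minimum number of state changes a given tree requires; the toy alignment, the two candidate trees, and the function names are assumptions.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one column.

    node:   binary tree as nested tuples with taxon names at the tips.
    states: dict mapping taxon name -> character state at this column.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_cols = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[c] for taxon, seq in alignment.items()})[1]
        for c in range(n_cols)
    )


# Toy alignment and two candidate trees (made up).
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment), parsimony_score(tree2, alignment))
```

Here `tree1` needs fewer changes than `tree2`, so it would be preferred under parsimony; a real analysis would search over many trees rather than compare two by hand.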