Powerpoint slides: Presentation Transcript

  • CS 177 Phylogenetics II
      • Tree building methods: some examples
      • Assessing phylogenetic data
      • Popular phylogenetic software packages
  • Phylogenetics II: Disclaimers. Before describing any theoretical or practical aspects of phylogenetics, it is necessary to give some disclaimers. This area of computational biology is an intellectual minefield! Neither the theory nor the practical application of any algorithm is universally accepted throughout the scientific community. Applying different software packages to the same data set is very likely to give different answers, and minor changes to a data set are also likely to change the result profoundly.
  • Phylogenetics II: Are there correct trees? Despite all of these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets. Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid? Unfortunately, it is impossible ever to state conclusively what the "true" tree is for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.
  • Phenetics versus cladistics
      • Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity
      • Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters)
  • Tree building methods
      • Data type: genetic distance / character state
      • Computational method: optimality criterion / clustering algorithm
  • Tree building (distance based): UPGMA
    • The simplest of the distance methods is UPGMA (Unweighted Pair Group Method using Arithmetic averages)
    • Many multiple alignment programs, such as PILEUP, use a variant of UPGMA to create a dendrogram of DNA sequences, which is then used to guide the multiple alignment algorithm
  • UPGMA: pairwise distance matrix for taxa A to G
            A     B     C     D     E     F     G
      A     -
      B     63    -
      C     94    79    -
      D     111   96    47    -
      E     67    16    83    100   -
      F     23    58    89    106   62    -
      G     107   92    43    20    96    102   -
  • UPGMA: matrix after joining D and G into cluster DG
            A     B     C     DG    E     F
      A     -
      B     63    -
      C     94    79    -
      DG    94    84    35    -
      E     67    16    83    88    -
      F     23    58    89    94    62    -
  • UPGMA: matrix after joining C and DG into cluster CDG
            A     B     E     F     CDG
      A     -
      B     63    -
      E     67    16    -
      F     23    58    62    -
      CDG   61    64    61    74    -
  • UPGMA: matrix after joining A and F into cluster AF
            AF    B     E     CDG
      AF    -
      B     98    -
      E     106   16    -
      CDG   112   64    61    -
  • UPGMA: matrix after joining B and E into cluster BE
            AF    BE    CDG
      AF    -
      BE    188   -
      CDG   112   108   -
    [Figure: resulting tree with the root marked]
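The clustering loop behind UPGMA can be sketched in a few lines of Python. This is a minimal sketch of the textbook procedure (join the closest pair, then average distances weighted by cluster size), run on a small hypothetical four-taxon matrix. The function name `upgma` and the nested-tuple tree representation are assumptions made for illustration, and the sketch is not expected to reproduce the intermediate matrices on the slides, which may have been derived with a different update rule.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with textbook UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    Returns (tree, height): a nested tuple of labels and the height of the root.
    """
    # Each active cluster maps to (number of leaves, node height).
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, _ = clusters.pop(a)
        size_b, _ = clusters.pop(b)
        merged = (a, b)
        height = d[frozenset((a, b))] / 2.0
        # Distance to every remaining cluster: average weighted by cluster size.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
    tree, (_, height) = next(iter(clusters.items()))
    return tree, height


# Hypothetical distances, chosen only to illustrate the procedure.
example = {
    frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0,
    frozenset("AD"): 10.0, frozenset("BD"): 10.0, frozenset("CD"): 10.0,
}
tree, height = upgma(["A", "B", "C", "D"], example)
```

On this toy matrix A and B are joined first, then C, then D. Because UPGMA places every leaf at the same distance from the root, it implicitly assumes a constant rate of change across lineages.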
  • Maximum Parsimony (MP)
    • Parsimony involves evaluating all possible trees for each column of the alignment (nucleotide position)
    • Only informative sites are considered
    • Each tree is given a score based on the number of evolutionary changes needed to explain the observed data
    • Finally, the trees that require the smallest number of changes overall, summed over all sequence positions (the shortest trees), are identified
    • [Figure: candidate trees I, II, III]
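The scoring step can be illustrated with Fitch's algorithm, which counts the minimum number of changes one column requires on a fixed tree. The following is a minimal sketch, not the procedure of any particular package: the function names, the nested-tuple tree encoding and the four-taxon alignment are all hypothetical, and the three trees stand in for the three possible unrooted topologies for four taxa.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one alignment column on a fixed tree.

    tree:   nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping taxon name -> character state at this column.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: count one change
        return left | right

    visit(tree)
    return changes


def is_informative(column):
    """A site is parsimony-informative if at least two states each occur
    in at least two taxa."""
    counts = {}
    for state in column:
        counts[state] = counts.get(state, 0) + 1
    return sum(1 for n in counts.values() if n >= 2) >= 2


def tree_length(tree, taxa, alignment):
    """Sum of Fitch scores over the informative columns of an alignment."""
    total = 0
    for i in range(len(alignment[taxa[0]])):
        column = [alignment[t][i] for t in taxa]
        if is_informative(column):
            total += fitch_score(tree, dict(zip(taxa, column)))
    return total


# Hypothetical data: the last column is uninformative and is skipped.
taxa = ["A", "B", "C", "D"]
alignment = {"A": "AAGT", "B": "AGGA", "C": "GAAT", "D": "GGAC"}
trees = {
    "I": (("A", "B"), ("C", "D")),
    "II": (("A", "C"), ("B", "D")),
    "III": (("A", "D"), ("B", "C")),
}
lengths = {name: tree_length(t, taxa, alignment) for name, t in trees.items()}
```

Here tree I needs the fewest changes, so it would be reported as the most parsimonious of the three.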
  • Maximum Likelihood (ML)
    • Maximum Likelihood uses probability calculations based on a specific model of sequence evolution to find the tree that best accounts for the variation in a set of sequences
    • All possible trees for each nucleotide position are considered
    • The fewer mutations needed to fit a tree to the data, the more likely the tree
    • ML resembles MP in that the tree with the fewest changes will be the most likely
    • However, ML evaluates trees using explicit evolutionary models
    • Thus, the method can be used to explore relationships among more diverse taxa
    • [Figure: candidate trees I, II, III]
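To make "probability calculations based on a specific model" concrete, here is a minimal sketch that evaluates the likelihood of an alignment on one fixed tree under the Jukes-Cantor model, using Felsenstein's pruning recursion. It assumes fixed branch lengths and equal base frequencies; real ML programs also optimise branch lengths and model parameters, which is omitted here. The tree encoding and all names are hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """Jukes-Cantor probability that base i is replaced by base j along a
    branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node, states):
    """Return {base: P(data below node | node has that base)}.

    node is a taxon name (leaf) or a tuple of (child, branch_length) pairs;
    states maps taxon name -> observed base at this site.
    """
    if isinstance(node, str):
        return {b: 1.0 if b == states[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        below = partial_likelihoods(child, states)
        for b in BASES:
            result[b] *= sum(jc69_prob(b, c, t) * below[c] for c in BASES)
    return result


def site_likelihood(tree, states):
    """Likelihood of one column, with equal base frequencies at the root."""
    root = partial_likelihoods(tree, states)
    return sum(0.25 * root[b] for b in BASES)


def log_likelihood(tree, taxa, alignment):
    """Sum of per-site log-likelihoods (sites are treated as independent)."""
    total = 0.0
    for i in range(len(alignment[taxa[0]])):
        states = {t: alignment[t][i] for t in taxa}
        total += math.log(site_likelihood(tree, states))
    return total
```

Comparing `log_likelihood` across candidate topologies, with branch lengths held at some hypothetical value, gives the ranking an ML search would be trying to optimise.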
  • Computational methods for finding optimal trees: possible evolutionary trees
    Number of unrooted trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)
      Taxa (n)   Unrooted trees
      2          1
      3          1
      4          3
      5          15
      6          105
      7          945
      8          10,395
      9          135,135
      10         2,027,025
      ...        ...
      30         8.69 x 10^36
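The counts in this table follow directly from the formula, which is equivalent to the product of the odd numbers 1 x 3 x 5 x ... x (2n-5). A short sketch for checking it (the function name is an assumption for illustration):

```python
def unrooted_tree_count(n):
    """Number of distinct unrooted, strictly bifurcating trees for n >= 3 taxa:
    (2n - 5)! / (2**(n - 3) * (n - 3)!), computed as 1 * 3 * 5 * ... * (2n - 5)."""
    if n < 3:
        raise ValueError("the formula applies for n >= 3")
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count
```

The growth is fast enough that evaluating every tree quickly becomes impractical as taxa are added, which motivates the search strategies that follow.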
  • Computational methods for finding optimal trees
    • Exact algorithms
      • "Guarantee" to find the optimal or "best" tree for the method of choice
      • Two types are used in tree building:
      • Exhaustive search: evaluates all possible unrooted trees, choosing the one with the best score for the method
      • Branch-and-bound search: eliminates parts of the search tree that contain only suboptimal solutions
    • Heuristic algorithms
      • Approximate or "quick-and-dirty" methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so
      • Often operate by "hill-climbing" methods
  • Heuristic algorithms (from NHGRI lecture, C.-B. Stewart)
    • Heuristic search algorithms are input-order dependent and can get stuck in local minima or maxima
    • Rerunning heuristic searches using different input orders of taxa can help find global minima or maxima
    • [Figure: two search landscapes, "Search for global minimum" and "Search for global maximum", with global and local optima labelled]
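The effect of the starting point on a hill-climbing search can be shown with a toy example. In this minimal sketch a list of hypothetical scores stands in for tree space and the start positions stand in for different input orders of taxa; it illustrates the idea only and is not how any particular package searches.

```python
def hill_climb(scores, start):
    """Greedy ascent on a 1-D landscape: step to the better neighbour
    until no neighbour is better."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(scores)]
        best = max(neighbours, key=lambda p: scores[p])
        if scores[best] <= scores[pos]:
            return pos                     # a local (possibly global) maximum
        pos = best


def best_of_restarts(scores, starts):
    """Rerun the search from several starting points and keep the best end point."""
    return max((hill_climb(scores, s) for s in starts), key=lambda p: scores[p])


# Hypothetical landscape with a local maximum at index 2 and the global one at index 6.
landscape = [1, 3, 5, 4, 2, 6, 9, 7]
stuck = hill_climb(landscape, 0)
found = best_of_restarts(landscape, [0, 4])
```

Starting at index 0 the search stops at the local maximum, while adding a second start lets it reach the global one. There is still no guarantee in general, only a better chance with more restarts.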
  • Assessing Phylogenetic Data
    • Most data include potentially misleading evidence of relationships
    • One should not only construct phylogenetic hypotheses but also assess what 'confidence' can be placed in them
    • Questions: How much support is there for a particular clade? Is there signal in the data?
  • Assessing Phylogenetic Data: How much support is there for a particular clade?
    • Bootstrapping/jackknifing: many randomized data sets are produced by sampling the real data with replacement (or, in jackknifing, by removing some random proportion of the data); the frequencies with which groups occur are a measure of support for those groups
    • Problems: bootstrap proportions aren't easily interpretable; they give no indication of how good the data are, only of how well the tree fits the data
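The resampling step can be sketched as follows. This is a minimal illustration in which alignment columns are the units being resampled; `build_clades` is a hypothetical caller-supplied function (for example a wrapper around a tree-building method) that returns the clades found for an alignment, and the other names are likewise assumptions.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: resample columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def jackknife_alignment(alignment, rng, keep=0.5):
    """One jackknife replicate: keep a random proportion of the columns."""
    length = len(next(iter(alignment.values())))
    cols = sorted(rng.sample(range(length), int(length * keep)))
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def clade_support(alignment, build_clades, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each clade appears.

    build_clades(alignment) must return an iterable of clades, each a
    frozenset of taxon names (hypothetical interface).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(set(build_clades(bootstrap_alignment(alignment, rng))))
    return {clade: n / replicates for clade, n in counts.items()}
```

The returned proportions are the bootstrap values usually drawn on the branches of a tree, subject to the interpretation problems noted above.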
  • Assessing Phylogenetic Data: Is there signal in the data?
    • Possible approach: random permutations
    • Random permutation reduces any correlation among characters to that expected by chance alone
    • It preserves the number of taxa, characters and character states in each character (and the theoretical maximum and minimum tree lengths)
    • [Figure: original structured data with strong correlations among characters, beside randomly permuted data in which any correlation among characters is due to chance]
  • Assessing Phylogenetic Data: Matrix Randomization Tests
    • Compare some measure of data quality/hierarchical structure for the real data and for many randomly permuted data sets
    • This allows us to define a test statistic for the null hypothesis that the real data are no better structured than randomly permuted, phylogenetically uninformative data
    • [Figure: original data matrix beside a randomly permuted data matrix]
  • PTP (permutation tail probability) test
    • Null hypothesis: the length of the shortest tree is what you would see given random data
    • How it works: reject the null if the real data have a shorter tree (the real data are more internally consistent than random data)
    • Comments: even a little bit of signal can lead you to reject the null; rejecting it does not by itself mean there is phylogenetic signal
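The permutation step and the resulting tail probability can be sketched as below. This is a minimal illustration under stated assumptions: `shortest_tree_length` is a hypothetical caller-supplied function (for example a wrapper around a parsimony search) that returns the length of the shortest tree for an alignment, and the other names are also invented for the example.

```python
import random


def permute_columns(alignment, rng):
    """Shuffle the states of each character (column) independently among taxa.

    Each column keeps exactly the same states, so the number of taxa,
    characters and states per character is preserved.
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    new_rows = {t: [] for t in taxa}
    for i in range(length):
        column = [alignment[t][i] for t in taxa]
        rng.shuffle(column)
        for t, state in zip(taxa, column):
            new_rows[t].append(state)
    return {t: "".join(chars) for t, chars in new_rows.items()}


def ptp(alignment, shortest_tree_length, permutations=99, seed=0):
    """Proportion of data sets (permuted plus original) whose shortest tree
    is as short as, or shorter than, that of the original data."""
    rng = random.Random(seed)
    observed = shortest_tree_length(alignment)
    as_short = sum(
        1
        for _ in range(permutations)
        if shortest_tree_length(permute_columns(alignment, rng)) <= observed
    )
    # The original data set counts as one member of the reference set.
    return (as_short + 1) / (permutations + 1)
```

A small value means few permuted data sets yield a tree as short as the real one, which is the condition under which the null hypothesis is rejected.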
  • Popular phylogenetic software packages. Review available at: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: PHYLIP version 3.6, Joe Felsenstein. It is available free, from its Web site, in C source code, or as executables for pre-386 DOS, 386/486/Pentium DOS, Windows 3.1, Windows95/98/NT, 68k Macintosh, or PowerMac. The C source code is easily compiled on Unix systems, and VMS compilation support is also available in the package. It includes programs to carry out parsimony, distance matrix methods, maximum likelihood, and other methods on a variety of types of data, including DNA and RNA sequences, protein sequences, restriction sites, 0/1 discrete characters data, gene frequencies, continuous characters and distance matrices. It is the most widely-distributed phylogeny package, with over 7,000 registered users, some of them satisfied. It competes with PAUP* to be the program responsible for the most published trees. It has been distributed since October, 1980. PHYLIP is distributed at the PHYLIP web site at http://evolution.genetics.washington.edu, or by anonymous ftp from evolution.genetics.washington.edu in directory pub/phylip. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: PAUP* (Phylogenetic Analysis Using Parsimony and other Methods) version 4.0beta, David Swofford. PAUP* has been released as a provisional version by Sinauer Associates, of Sunderland, Massachusetts. It has Macintosh, PowerMac, Windows, and Unix/OpenVMS versions. PAUP* is the most sophisticated parsimony program, with many options and close compatibility with MacClade. It has become much broader with the inclusion of more methods. It includes parsimony, distance matrix, invariants, and maximum likelihood methods and many indices and statistical tests. It is described in a web page at http://www.sinauer.com/Titles/frswofford.htm, and in more detail at its web site at the LMS at http://www.lms.si.edu/PAUP/about.html. The price is $100 US for the Macintosh and PowerMac executable versions, $85 for the Windows executable version, and $150 for the Unix source code version, plus $20 for shipment. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: MrBayes: Bayesian Inference of Phylogeny. MrBayes is a program for Bayesian inference of phylogeny using Markov chain Monte Carlo methods. Available for Mac, PC, and Unix.
  • Popular phylogenetic software packages: MacClade, Wayne Maddison and David Maddison. MacClade is described on its Web page, at http://phylogeny.arizona.edu/macclade/macclade.html. A demonstration version of MacClade 3 is also available there. MacClade enables you to use the mouse-window interface to specify and rearrange phylogenies by hand, and watch the number of character steps and the distribution of states of a given character on the tree change as you do so. Available for Macintosh only. All distribution is by Sinauer Associates, 23 Plumtree Road, Sunderland, Massachusetts 01375-0407, USA. A disk with program, help file, and example data files, plus book (which has about 100 pages of intro to phylogenetic theory, and 250 pages of program instructions), is $100 U.S. ($40 for the book alone). All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: RASA, version 2.5, James Lyons-Weiler. Software for Macintoshes that will perform "Relative Apparent Synapomorphy Analysis", a test for the presence of phylogenetic signal in any type of discrete character data matrix (morphological or molecular). The RASA program carries out the test and plots the results. RASA is menu-driven. The test compares the observed and null rates of increase in cladistic similarity among pairs of taxa predicted by an increase in the phenetic similarity among taxon pairs. The programs are available as Macintosh executables from their web page at http://bio.uml.edu/LW/RASA.html. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: TCS version 1.06, Mark Clement and David Posada. A program for estimating gene genealogies within a population. It does so by using the method introduced in the paper: Templeton, A. R., K. A. Crandall and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132: 619-633. This is a method that connects existing haplotypes in a minimum spanning tree, which is essentially a parsimony method. It can also infer networks with loops in them. TCS is written in Java and has a graphic user interface for the display of the resulting networks. It may be run on any system that has the Java runtime environment. The program is described in the paper: Clement M., D. Posada, and K. Crandall. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657-1660. TCS is available as Java executables, with documentation, at its web site at: http://bioag.byu.edu/zoology/crandall_lab/tcs.htm. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: BioEdit, version 4.8.4, Tom Hall. This is a sequence editor with many kinds of general molecular biology functions available (alignment, BLAST searches, plasmid drawing, restriction mapping, sequence machine trace viewing, etc.). For our purposes the feature worth mentioning is that it comes with a number of existing phylogeny programs which can be automatically run from within BioEdit. These are: TreeView, fastDNAml, and six DNA and protein programs from PHYLIP. BioEdit is available as Windows95/98/NT executables from its web site at http://www.mbio.ncsu.edu/RNaseP/info/programs/BIOEDIT/bioedit.html. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: TreeView, Rod Page. A program for displaying trees on Apple Macs and Windows PCs. It can draw rooted and unrooted trees, display bootstrap values, and supports the native font and graphics file formats of both Macs and PCs. The program reads NEXUS, PHYLIP, and Hennig86 style tree files (including files produced by fastDNAml and CLUSTALW), and can save trees in the same formats so that it can convert trees among these formats. TreeView can read up to 100 trees with up to 500 taxa. The program is free, and can be obtained by World Wide Web from http://taxonomy.zoology.gla.ac.uk/rod/treeview.html. It comes in 68K Mac, PowerMac, and Windows 95/NT executable versions (and in a Windows 3.1 executable for version 1.4). There is also online help including an online manual. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: DnaSP version 3.53, Julio Rozas and Ricardo Rozas. A software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters. It can also carry out several tests of neutrality. Additionally, it can estimate the confidence intervals of some test statistics by the coalescent. The results of the analyses are displayed in tabular and graphic form. For the purposes of this web site, the relevant features are the calculation of measures of population divergence, which include the Jukes-Cantor method, which can be used as a distance in phylogeny reconstruction. It is distributed as a Windows95/98/NT executable from its web site at http://www.bio.ub.es/~julio/DnaSP.html. All information from: http://evolution.genetics.washington.edu/phylip/software.html
  • Popular phylogenetic software packages: Arlequin version 2.0, Laurent Excoffier. A program for population genetics analysis. It can perform many kinds of population genetic tasks, including estimation of gene frequencies, testing of linkage disequilibrium, and analysis of diversity between populations. For the purposes of this list, the relevant feature is its ability to compute a variety of genetic distance measures, including the Jukes and Cantor distance, the Kimura 2-parameter distance, and the Tamura-Nei distance, each of these with or without correction for gamma-distributed rates of evolution. It can also compute a Minimum Spanning Tree network. Arlequin has its interactive "front end" written in Java, and requires the Java Runtime Environment (which is available from the Arlequin site for those who do not already have it). The core routines are available as binaries for Windows95/98/NT/2000, for MacOS for the PowerPC processor, and for Linux for Intel-compatible x86 processors. The binaries, Java code, Java Runtime Environment, and a PDF documentation file are available at its web site at http://acasun1.unige.ch/arlequin/. All information from: http://evolution.genetics.washington.edu/phylip/software.html
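Two of the distance measures named above have simple closed forms, sketched here for a pair of aligned DNA sequences. This is a minimal illustration, not code from Arlequin or DnaSP: it assumes equal-length sequences containing only A, C, G and T (no gaps or ambiguity codes), omits the gamma correction, and uses function names invented for the example.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (proportion of transitions, proportion of transversions)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3), p = proportion of differing sites."""
    p_ts, p_tv = count_differences(seq1, seq2)
    p = p_ts + p_tv
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(seq1, seq2):
    """Kimura 2-parameter distance: d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions."""
    p_ts, p_tv = count_differences(seq1, seq2)
    return -0.5 * math.log((1.0 - 2.0 * p_ts - p_tv) * math.sqrt(1.0 - 2.0 * p_tv))
```

Both corrections return values larger than the raw proportion of differences, because they estimate substitutions hidden by multiple changes at the same site. A matrix of such distances is the usual input to a distance-based tree method.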