Crunching Huge Phylogenies. A. Stamatakis
Presentation Transcript

  • Crunching Huge Phylogenies: A Rapid Bootstrap Algorithm and Massive Parallelism on the IBM BlueGene
      • Alexandros Stamatakis
      • Swiss Federal Institute of Technology Lausanne (EPFL), School of Computer & Communication Sciences, Laboratory for Computational Biology and Bioinformatics, Lausanne, Switzerland & Swiss Institute of Bioinformatics
      • Alexandros.Stamatakis@epfl.ch
      • icwww.epfl.ch/~stamatak
  • The Missing Part: Data Assembly, Inference (?), Tree Analysis. Alexandros Stamatakis, October 2007
  • IBM BlueGene/L supercomputer
  • Rapid Bootstrapping; Bootstopping Criterion
  • The Big Hardware Problem (chart, 1980 to 2007)
      • CPU speed: 40% p.a.
      • Memory speed: 9% p.a.
  • ... and why this concerns Bioinformatics (same chart, 1980 to 2007, with sequence data added)
      • Sequence data
      • CPU speed: 40% p.a.
      • Memory speed: 9% p.a.
      • Application of HPC techniques will become much more important
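To get a feel for the gap that these two growth rates open up, here is a minimal sketch. It assumes, purely for illustration, that both rates compound uniformly over the whole 1980 to 2007 span shown on the chart; the function name `growth_factor` is made up for this example.

```python
def growth_factor(rate_per_year: float, years: int) -> float:
    """Compound growth factor after `years` at a fixed annual rate."""
    return (1.0 + rate_per_year) ** years


years = 2007 - 1980                  # span shown on the slide
cpu = growth_factor(0.40, years)     # CPU speed, 40% per year
mem = growth_factor(0.09, years)     # memory speed, 9% per year
gap = cpu / mem                      # how far memory falls behind the CPU

print(f"CPU x{cpu:.0f}, memory x{mem:.1f}, gap x{gap:.0f}")
```

Under that assumption the CPU outpaces memory by several hundred times, which is the motivation for the cache-aware and memory-aware programming the talk returns to later.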
  • Cache Hierarchy
  • Outline
      • Introduction
      • Computation of Phylogenies
      • Maximum Likelihood
      • Web & Grid Services
      • Three Steps Towards the Tree of Life
      • Parallelism on IBM BlueGene/L
      • Rapid Bootstrapping
      • A Bootstopping criterion
      • Related Projects
      • Outlook
  • Phylogenetics
      • Input: “good” multiple alignment
      • Output: unrooted binary tree
      • Various methods for phylogenetic inference:
          • Neighbour Joining (fast & simple)
          • Maximum Parsimony (relatively fast & simple)
          • Maximum Likelihood (complex & slow)
          • Bayesian Methods (complex & slower)
      • ML & Bayesian: explicit choice of model
      • Complex methods & models are required to reconstruct large & complicated trees! Focus of this talk is on Maximum Likelihood!
      • The real reason for working on ML: ......
  • Challenges for Phyloinformatics
      • Holy grail: “Tree of Life”
      • What is a good alignment in a phylogenetic context?
      • Simultaneous alignment and tree building
      • Improve/extend models ... but thereby size of computable trees decreases!
      • More HPC awareness
      • Exploit multi-core architectures
      • Amount of available data grows at a higher rate than algorithms are getting faster
  • The algorithmic problem
  • The number of trees explodes! BANG!
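The explosion can be made concrete. For n taxa, the number of distinct unrooted binary tree topologies is (2n-5)!! = 1 · 3 · 5 · ... · (2n-5). This is a standard combinatorial result that the slides themselves do not spell out, and the helper name below is hypothetical.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted binary topologies on n_taxa labelled leaves: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted binary tree")
    count = 1
    # Adding the k-th taxon can be done on any of the 2k-5 existing branches.
    for k in range(3, n_taxa + 1):
        count *= 2 * k - 5
    return count


for n in (4, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
# 4 taxa give 3 topologies, 10 taxa already give 2027025
```

Python integers are arbitrary precision, so the count stays exact even for thousands of taxa, where it has far too many digits for exhaustive search to be an option.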
  • Maximum Likelihood
      • Alignment of length m: Seq1, Seq2, Seq3, Seq4
      • Substitution model over the states A, C, G, T
      • Prior probabilities: empirical base frequencies πA, πC, πG, πT
      • Tree on Seq 1, Seq 2, Seq 3, Seq 4 with branch lengths b1, b2, b3, b4, b5
      • Virtual root: vr
      • Likelihood vector at vr: one entry P(A), P(C), P(G), P(T) for each of the m alignment columns
      • Lots of floating point operations!
      • Optimize branch lengths
      • Optimize model parameters
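The likelihood computation sketched on these slides can be illustrated with Felsenstein's pruning algorithm on a four-sequence tree. The following is a minimal sketch under stated assumptions, not RAxML's implementation: it swaps in the Jukes-Cantor (JC69) model for the unspecified substitution model, uses uniform base frequencies of 0.25 instead of empirical ones, assumes the topology ((Seq1,Seq2),(Seq3,Seq4)) with b5 as the inner branch, and places the virtual root at the inner node joining Seq1 and Seq2. All function names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"


def jc69(t):
    """4x4 transition probability matrix under Jukes-Cantor for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip_vector(base):
    """Likelihood vector of a leaf: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]


def combine(left, t_left, right, t_right):
    """Likelihood vector of an inner node from its two children (pruning step)."""
    pl, pr = jc69(t_left), jc69(t_right)
    return [
        sum(pl[x][y] * left[y] for y in range(4))
        * sum(pr[x][z] * right[z] for z in range(4))
        for x in range(4)
    ]


def site_likelihood(column, b):
    """Likelihood of one alignment column; b = [b1, b2, b3, b4, b5]."""
    s1, s2, s3, s4 = (tip_vector(c) for c in column)
    q = combine(s1, b[0], s2, b[1])   # inner node joining Seq1 and Seq2
    r = combine(s3, b[2], s4, b[3])   # inner node joining Seq3 and Seq4
    p5 = jc69(b[4])                   # inner branch b5 between the two nodes
    # Virtual root at node q; 0.25 is the assumed uniform base frequency.
    return sum(
        0.25 * q[x] * sum(p5[x][y] * r[y] for y in range(4)) for x in range(4)
    )


def log_likelihood(alignment, b):
    """Sum of per-column log likelihoods (columns are treated as independent)."""
    m = len(alignment[0])
    return sum(
        math.log(site_likelihood([seq[i] for seq in alignment], b))
        for i in range(m)
    )


def total_probability(b):
    """Sanity check: likelihoods of all 4^4 possible columns must sum to 1."""
    return sum(site_likelihood(col, b) for col in itertools.product(BASES, repeat=4))


branches = [0.1, 0.2, 0.3, 0.4, 0.05]
print(log_likelihood(["ACGT", "ACGA", "ACTT", "AGGT"], branches))
```

Every column repeats the same nested sums over four states, which is where the "lots of floating point operations" come from; optimizing branch lengths or model parameters means re-evaluating `log_likelihood` many times.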
  • Maximum Likelihood
      • Goal: obtain topology with maximum likelihood value
      • Problem I: number of possible topologies is exponential in n
      • Problem II: computation of likelihood function is expensive
      • Problem III: probably high score accuracy required
      • Problem IV: high memory consumption
      • Solution: new algorithms, new models, High Performance Computing
      • RAxML: Randomized Axelerated Maximum Likelihood
  • Web & Grid Services
      • RAxML Web-Server at San Diego Supercomputer Center via www.phylo.org (CIPRES project)
      • Web-Server at Vital-IT unit of Swiss Institute of Bioinformatics: phylobench.vital-it.ch/raxml-bb/
          • Includes novel search algorithm with 1 order of magnitude run-time improvement
          • Since Sept 3, about 700 jobs from 130 IPs
          • Extension to SwissGrid planned
          • Novel algorithm with Bootstopping to be integrated into CIPRES portal soon
      • RAxML integration into Distributed European Infrastructure for Supercomputing Applications (www.deisa.org) started 10 days ago
      • Integration into Debian medical distribution
  • RAxML Black Box: Why are Black Boxes useful?
  • Levels of Parallelism
      • Embarrassing Parallelism: MPI, CORBA, Grid Technologies
      • Inference Parallelism: MPI, algorithm-dependent
      • Loop-Level Parallelism: OpenMP, GPUs, IBM CELL (Playstation), IBM BlueGene, clusters with fast interconnect
  • Coarse-Grained Parallelism: MPI Version of RAxML
      • Diagram: PC cluster with a master process and worker processes (labels B-0 to B-4) linked by an interconnection network
  • Loop Level Parallelism
      • Likelihood vectors P, Q, R below the virtual root: P[i] = f(Q[i], R[i])
      • This operation uses ≥ 90% of total execution time! → simple fine-grained parallelization
      • The real reason for assuming independent evolution among sites: ......
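Because each entry P[i] depends only on Q[i] and R[i], the loop over alignment columns can be cut into independent chunks. The sketch below shows that decomposition with a thread pool. The kernel `f` is a stand-in (an elementwise product), not the actual likelihood function, and all names are hypothetical. Python threads will not speed up this pure-Python kernel; the point is only how the work is split.

```python
from concurrent.futures import ThreadPoolExecutor


def f(q, r):
    """Placeholder per-site kernel: elementwise product of two 4-entry vectors."""
    return [a * b for a, b in zip(q, r)]


def compute_chunk(Q, R, start, stop):
    """Compute P[i] = f(Q[i], R[i]) for the columns start..stop-1."""
    return [f(Q[i], R[i]) for i in range(start, stop)]


def parallel_update(Q, R, n_workers=4):
    """Split the m columns into contiguous chunks, one chunk per worker."""
    m = len(Q)
    bounds = [(w * m // n_workers, (w + 1) * m // n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda se: compute_chunk(Q, R, se[0], se[1]), bounds)
        # Chunks come back in submission order, so concatenation restores P.
        return [p for part in parts for p in part]


Q = [[0.1 * i, 0.2, 0.3, 0.4] for i in range(10)]
R = [[0.5, 0.1 * i, 0.2, 0.1] for i in range(10)]
P = parallel_update(Q, R, n_workers=3)
print(len(P))
```

No worker ever needs another worker's columns, so the only communication is collecting the results (or, for a full likelihood, summing per-chunk log likelihoods).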
  • Fine-Grained Parallelism: OpenMP version of RAxML
  • HPC for ML (Bayesian)
      • Proof of Concept & Programming Techniques:
          • RAxML on a Graphics Processing Unit
          • RAxML on the IBM CELL & Playstation (a good excuse to buy one)
      • Production Level Implementations:
          • RAxML with OpenMP
          • RAxML with MPI
          • RAxML on BlueGene
          • Multi-Core Architectures
  • RAxML-BlueGene
      • Many slow processors: 1024 in one rack
      • 512 MB or 1 GB of main memory per node
      • But: high performance network
      • Challenges:
          • Distribute tree data structure among CPUs
          • Exploit fast collective communication network
      • For optimal efficiency: loop-level + embarrassing parallelism → hybrid parallelism with MPI
      • Test & Production Run Data:
          • With Olaf Bininda-Emonds, Jena: 2,182 mammalian sequences x 51,000 base pairs
          • With Dan Janies, Ohio State: 270 Human Haplotype Map sequences x 500,000 base pairs
      • Largest ML analysis to date in terms of memory footprint
      • To be presented at IEEE/ACM 2007 Supercomputing Conference
  • Loop-Level Parallelism on BlueGene
  • 50 Seqs x 23,385 bp: Superlinear Speedup
  • 250 Seqs x 403,581 bp
  • Embarrassing Parallelism (diagram: 4 master processes, M, and 12 worker processes, W)
  • Confidence Values
      • Tree without node confidence values is mostly useless
      • Problem:
          • Confidence value calculation is major computational obstacle
          • We can compute large trees but not analyse them: compute ≠ analyse!
      • Current Slow Methods:
          • Sampling with Bayesian methods
          • Non-parametric Bootstrapping
  • A Tree with Confidence Values (joint work with Marc Gottschling, Charite Hospital, Berlin)
  • Bootstrapping
      • Original alignment → perturbation → compute tree (once per perturbed alignment)
      • This needs to be done 100-1000 times
      • Embarrassingly Parallel!
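The perturbation step of the non-parametric bootstrap draws alignment columns with replacement, which is equivalent to giving every column an integer weight that says how often it was drawn. A minimal sketch of that idea follows; the function names are hypothetical and the tiny alignment is made up for illustration.

```python
import random


def bootstrap_weights(n_sites, rng):
    """Draw n_sites columns with replacement; return how often each was drawn."""
    weights = [0] * n_sites
    for _ in range(n_sites):
        weights[rng.randrange(n_sites)] += 1
    return weights


def resample_alignment(alignment, weights):
    """Materialise a replicate: repeat column i of every sequence weights[i] times."""
    return ["".join(ch * w for ch, w in zip(seq, weights)) for seq in alignment]


rng = random.Random(42)                       # fixed seed, only for repeatability
alignment = ["ACGTACGTACGTAC", "ACGTACGAACGTAC", "ACTTACGTACGTAC"]
weights = bootstrap_weights(len(alignment[0]), rng)
print("".join(str(w) for w in weights))       # one digit per column
print(resample_alignment(alignment, weights))
```

A weight vector of all ones reproduces the original alignment, and the weights of any replicate always sum to the alignment length. Since each replicate is analysed independently, the 100-1000 tree searches can be farmed out to separate processes.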
  • Two Questions
      • How to compute Bootstraps faster?
      • How many Bootstrap replicates do we need?
  • Current Work: Rapid Bootstrapping Algorithm
      • Tested on 22 diverse (mammals, bacteria, archaea, grasses, fishes, plants, viral) real-world DNA/AA single-/multi-gene datasets containing 125-7,764 sequences
      • Pearson correlation on best-scoring ML trees between RBS (Rapid BS) & SBS (Standard BS) support values: 0.95-0.99 (except one dataset at 0.91), average 0.97
      • Weighted topological distance < 6%, average 4%
      • Program acceleration: 8-20, average ≈ 15
      • Acceleration by one order of magnitude
      • Full ML analysis (100 BS + ML search) of datasets of up to 5,000 sequences within less than 5 days on your desktop!
      • Allows for a sufficiently large number of Bootstrap replicates
  • Quick & Dirty Bootstrap: iterate between Modify Algorithm and Computational Experiments
  • Rapid Bootstrap
      • Column weight vectors: 11111111111111, 01102211111111, 10111102220111, 11111110112021
      • Compute starting tree
      • Optimize model params & branch lengths
      • Use starting tree & model params to compute RELL scores: 01102211111111 → -110, 10111102220111 → -105, 11111110112021 → -100
      • Sort by RELL: 11111110112021 (-100), 10111102220111 (-105), 01102211111111 (-110)
      • T0: Thorough Search
      • T1: Quick Search on T0
      • T2: Quick Search on T1
      • Sequential dependency is bad for parallelism
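One reading of these slides, sketched below under stated assumptions: the per-column log likelihoods of the starting tree are computed once, each replicate is scored by re-weighting them with its column weights (the RELL idea, resampling of estimated log likelihoods), and the replicates are then processed in descending score order, each search starting from the tree found for the previous replicate. The search functions are placeholders passed in by the caller, and all names are hypothetical; this is not taken from the RAxML source.

```python
def rell_score(site_lnl, weights):
    """Replicate score: per-column log likelihoods re-weighted by column counts."""
    return sum(w * lnl for w, lnl in zip(weights, site_lnl))


def order_replicates(site_lnl, replicates):
    """Sort replicate weight vectors by RELL score, best (largest) first."""
    return sorted(replicates, key=lambda w: rell_score(site_lnl, w), reverse=True)


def chain_searches(ordered, thorough_search, quick_search):
    """T0 from a thorough search, then T(k) from a quick search started on T(k-1)."""
    trees = []
    current = None
    for k, weights in enumerate(ordered):
        if k == 0:
            current = thorough_search(weights)
        else:
            current = quick_search(weights, current)   # depends on previous tree
        trees.append(current)
    return trees


# Toy data: three columns with made-up log likelihoods, three replicates.
site_lnl = [-1.0, -2.0, -3.0]
replicates = [[3, 0, 0], [0, 0, 3], [1, 1, 1]]
ordered = order_replicates(site_lnl, replicates)
trees = chain_searches(
    ordered,
    thorough_search=lambda w: "T0",
    quick_search=lambda w, prev: prev + "'",
)
print(ordered, trees)
```

The loop in `chain_searches` makes the slide's last remark visible: replicate k cannot start before replicate k-1 has finished, so this part of the procedure does not parallelize the way independent bootstrap searches do.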
  • Scalability of Rapid Bootstrap: Some datasets are harder than others
  • ML-Scores: Garli, RAxML, PHYML (715 sequences)
  • Correlation, 125 Taxa: 0.91
  • Support Value Distribution
  • Bootstrap Likelihood Values, 125 x 19,436: 10,000 replicates, only 195 non-trivial bipartitions
  • 3,491 rbcL sequences, Rapid versus Standard BS: correlation 0.98
  • 7,764 DNA: Best Tree
  • 7,764 DNA: All Bipartitions
  • 775 x 3,838 AA
  • New Opportunities
      • Assess impact of alignment method on tree and support values
      • Test Bootstrap of the Bootstrap (double Bootstrap) procedures
      • Devise and empirically verify Bootstopping criteria
  • Bootstrap of the Bootstrap: 140 AA (Efron et al., PNAS 1996)
  • Bootstrap of the Bootstrap: 3,491 rbcL
  • Bootstopping
      • Rapid Bootstrapping makes it possible to assess Bootstopping criteria as follows:
          1. Compute a high number of BS replicates (10,000)
          2. Devise topology-based bootstopping criterion and apply it to these 10,000 replicates
          3. Compare support values induced by bootstopped trees (say 300 replicates) with 10,000 replicates
      • We have 10,000 replicates for 18 datasets containing 125 to 2,554 sequences
  • Bootstopping Criterion
      • Every 50, 100, 150, ... replicates do a test:
          • Say we have N BS trees
          • Do the following 100 times:
              • Randomly split up this set of N trees into 2 equal sets S1, S2 of size N/2
              • Compute the bipartition support vectors for S1 and S2
              • Compute Pearson correlation of the support vectors
          • Return average of the 100 Pearson correlations
          • If average > 0.99: stop
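The test described on this slide can be sketched as follows. The sketch assumes each bootstrap tree is represented simply as a set of bipartition labels (strings here); how bipartitions are extracted from real trees is outside its scope. The handling of a zero-variance support vector is also an assumption, since the slide does not cover that case, and all function names are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: constant vectors count as agreeing only if identical.
        return 1.0 if x == y else 0.0
    return sxy / math.sqrt(sxx * syy)


def support_vector(trees, bipartitions):
    """Fraction of trees that contain each bipartition."""
    return [sum(1 for t in trees if b in t) / len(trees) for b in bipartitions]


def bootstop_test(trees, n_splits=100, threshold=0.99, rng=None):
    """Return (stop?, average correlation) for the current set of BS trees."""
    rng = rng or random.Random()
    bipartitions = sorted(set().union(*trees))
    half = len(trees) // 2
    correlations = []
    for _ in range(n_splits):
        shuffled = list(trees)
        rng.shuffle(shuffled)                       # random split into S1, S2
        s1, s2 = shuffled[:half], shuffled[half:2 * half]
        correlations.append(
            pearson(support_vector(s1, bipartitions), support_vector(s2, bipartitions))
        )
    average = mean(correlations)
    return average > threshold, average


# Toy input: bipartition "a" in every tree, "b" in half, "c" in a tenth of them.
toy_trees = [
    frozenset(["a"] + (["b"] if i % 2 == 0 else []) + (["c"] if i % 10 == 0 else []))
    for i in range(2000)
]
stop, avg = bootstop_test(toy_trees, rng=random.Random(0))
print(stop, round(avg, 4))
```

In a run following the slide, this function would be called after every 50 additional replicates, and the bootstrap would end the first time it returns `True`.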
  • Result Overview
      • Bootstopped between 100-400 replicates (avg 213)
      • Correlation on best tree, bootstopped versus 10,000 replicates: > 0.99 (avg 0.995)
      • Correlation of all bipartitions: > 0.995 (avg 0.997)
  • Bootstopping Best: 140 AA
  • Bootstopping Best: 404 DNA (Multi-Gene)
  • Bootstopping Best: 994 DNA
  • Bootstopping All: 994 DNA
  • Bootstopping Best: 1,908 DNA
  • Bootstopping Best: 2,554 DNA
  • Putting the Pieces together
      • BlueGene: can handle huge datasets
      • Use CAT approximation on BlueGene
          • Further speedup of factor 3.5
          • Memory footprint reduction factor 4
  • 8,864 Bacteria under GTR+Γ and GTR+CAT (plot: log likelihood score under Γ versus execution time, with time marks at 7 days and 14 days)
  • Putting the Pieces together (continued)
      • Integrate rapid Bootstrap into BlueGene version
          • Additional speedup ≈ 15
      • Mechanisms available to accelerate BlueGene version by factor 50-60
      • Integrate Bootstopping into BlueGene
      • Conclusion: We will soon be able to compute a small tree of life with 10,000 organisms and data from multiple genes!
  • Host-Parasite Co-Evolution
      • Parasites (e.g. lice) and hosts (e.g. mammals)
      • Co-evolution hypothesis: 0/1 adjacency matrix between 8 parasites and 6 hosts
      • Statistical test
  • What can HPC do for Bioinformatics? Axelerated Parafit
      • “Parafit: statistical test of co-evolution”, Pierre Legendre, Syst. Biol. 2003
      • AxParafit (Axelerated Parafit):
          • Statistical test of hypotheses of host-parasite co-evolution
          • C porting, optimization, BLAS integration
          • Speedup up to factor 67
          • Master-Worker MPI-parallelization
      • Largest co-phylogenetic study to date conducted within 8 minutes instead of 4 weeks
      • Open-Source Code: http://icwww.epfl.ch/~stamatak/AxParafit.html
      • SwissGrid-based Web-Server planned
  • AxParafit: Sequential Performance
  • AxParafit: Parallel Performance
  • The ML Benchmark: A Current Community Project
      • Standardized way required to test ML search programs
      • Web-Server with real-world alignments and performance data at Swiss Institute of Bioinformatics
      • Many developers of popular ML programs involved:
          • Stephane Guindon (PHYML), Montpellier
          • Simon Wheelan (LeaPhy), Manchester
          • Bui Quang Minh (IQPNNI), Vienna
          • Derrick Zwickl (GARLI), Virginia
          • Thomas Keane (dprML), Cambridge
      • Byproduct: SPEC-like CPU benchmark for phylogenetics
      • Follow-up: (planned) ML competition at major conference with industrial sponsor
  • A Current Problem: Handling Multi-Gene Alignments
      • Diagram: alignment of Gene 1 and Gene 2 for Sequence 1 to Sequence 5
      • Missing Data ≠ Gap Data
  • A Multi-Gene Model
      • LogLH(T) = LogLH(T|Red) + LogLH(T|Yellow)
      • Challenge: devise efficient data structures for this
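Written for an arbitrary number of genes, the additive form on this slide generalises as shown below. The symbols G (number of gene partitions), D_g (the alignment columns of gene g) and θ_g (per-gene model parameters and branch lengths) are notation introduced here, not taken from the slides, and the sum relies on the usual assumption that the partitions evolve independently given the tree T.

```latex
\log L(T) \;=\; \sum_{g=1}^{G} \log L\left(T \mid D_g, \theta_g\right)
```

With G = 2 and the two partitions named Red and Yellow, this reduces to the expression on the slide.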
  • Why are Individual Branches per Gene a Challenge?
  • Outlook
      • Tree of Life
      • What is a good alignment in a phylogenetic context?
      • Simultaneous alignment and tree building
      • More HPC & memory-aware programming
      • Multi-core architectures
      • Models for “gappy” multi-gene alignments
  • Acknowledgements
      • BlueGene Project:
          • Michael Ott, TUM
          • Srinivas Aluru, Jaroslaw Zola, Iowa State
          • Dan Janies, Andrew Johnson, Ohio State
      • IBM CELL & Playstation:
          • Filip Blagojevic, Dimitris Nikolopoulos, Virginia Tech
          • Christos Antonopoulos, Univ. of Thessaly
      • Bootstopping:
          • Bernard Moret, Masoud Alipour, EPFL
          • Olaf Bininda-Emonds, Univ. Jena
      • RAxML Web-Server:
          • Jacques Rougemont, SIB
          • Terri Liebowitz, SDSC
      • AxParafit/AxPcoords:
          • Markus Goeker, Alexander Auch, Jan Meier-Kolthoff, University of Tuebingen
      • Datasets for Studies:
          • Jun Inoue (Florida), Nicolas Salamin (Lausanne), Marc Gottschling (Berlin), Guido Grimm (Tuebingen), Nikos Poulakakis (Yale), Usman Roshan (NJIT)
  • Thank you for your Attention! (Lake Geneva, Switzerland)