TIGRTIGR
Phylogenomics:
Combining Evolutionary
Reconstructions and Genome
Analysis into a Single
Composite Approach
TIGRTIGR
Topics of Discussion
• Introduction to phylogenomics
• Phylogenomics Examples
– Functional prediction
– Not makin...
TIGRTIGRTIGRTIGR
“Nothing in biology makes sense
except in the light of evolution.”
T. H. Dobzhansky (1973)
TIGRTIGR
TIGRTIGR
Uses of Evolutionary Analysis in
Molecular Biology
• Identification of mutation patterns (e.g., ts/tv ratio)
• Am...
TIGRTIGR
Evolutionary Studies Improve
Most Aspects of Genome Analysis
• Phylogeny of species places comparative data in pe...
TIGRTIGR
Genome Information and Analysis
Improves Studies of Evolution
• Complete genome information particularly useful
•...
TIGRTIGR
Phylogenomic Analysis
• There is a feedback loop between evolutionary and genome
analysis such that for many studi...
TIGRTIGR
Outline of Phylogenomics
Gene Evolution EventsPhenotype PredictionsDatabaseSpecies treePresence/AbsenceGene trees...
TIGRTIGR
TIGRTIGR
Uses of Phylogenomics I:
Functional Predictions
TIGRTIGR
Predicting Function
• Identification of motifs
• Homology/similarity based methods
– Highest hit
– Top hits
– Clu...
TIGRTIGR
Types of Molecular Homology
• Homologs: genes that are descended from a common
ancestor (e.g., all globins)
• Ort...
TIGRTIGR
Phylogenomic Analysis of the
MutS Family of Proteins
• Published analysis
– Eisen JA et al. 1997. Nature Medicine...
TIGRTIGR
TIGRTIGR
Blast Search of H. pylori “MutS”
Score E
Sequences producing significant alignments: (bits) Value
sp|P73625|MUTS_...
TIGRTIGR
H. pylori and MutS
• Prior to this genome, all species that
encoded a MutS homolog also encoded a
MutL homolog
• ...
TIGRTIGR
Phylogenetic Tree of MutS Family
AquaeTrepaFlyXenlaRatMouseHumanYeastNeucrArathBorbuStrpyBacsuSynspEcoliNeigoThem...
TIGRTIGR
MutS SubfamiliesAquaeTrepaFlyXenlaRatMouseHumanYeastNeucrArathBorbuStrpyBacsuSynspEcoliNeigoThemaTheaqDeiraChltrS...
TIGRTIGR
MutS Subfamilies
• MutS1 Bacterial MMR
• MSH1 Euk - mitochondrial MMR
• MSH2 Euk - all MMR in nucleus
• MSH3 Euk ...
TIGRTIGR
Overlaying Functions onto Tree
AquaeTrepaRatFlyXenlaMouseHumanYeastNeucrArathBorbuSynspNeigoThemaStrpyBacsuEcoliT...
TIGRTIGR
Functional Prediction Using Tree
AquaeTrepaFlyXenlaRatMouseHumanYeastNeucrArathBorbuStrpyBacsuSynspEcoliNeigoThem...
TIGRTIGR
Table 3. Presence of MutS Homologs in Complete Genomes Sequences
Species # of MutS
Homologs
Which
Subfamilies?
Mu...
TIGRTIGR
Why was the MutS2 Family Missed?
Blast Search of Syn. sp. MutS#2
Sequences producing significant alignments: (bit...
TIGRTIGR
Problems with Similarity Based
Functional Prediction
• Prone to database error propagation.
• Cannot identify ort...
TIGRTIGR
Evolutionary Rate Variation
[Tree figure; taxa 1 to 6]
TIGRTIGR
Rate Variation and Duplication
[Tree figure; labels: Species 1, Species 2, Species 3; genes 1A, 2A, 3A, 1B, 2B, 3B; Duplication]
TIGRTIGR
Evolutionary
Method
PHYLOGENETIC PREDICTION OF GENE FUNCTION
IDENTIFY HOMOLOGS
OVERLAY KNOWN FUNCTIONS ONTO TREE
...
TIGRTIGR
MutS.Aquaeorf.TrepaSPE1.DromeMSH2.XenlaMSH2.RatMSH2.MouseMSH2.HumanMSH2.YeastMSH2.NeucratMSH2.ArathMutS.Borbuorf....
TIGRTIGR
ETL1_M.mYA19_S.cCHD1_M.mSYGP4_S.cMOT1_S.cERCC6_H.sRAD26_S.cNUCP_H.sNUCP_M.mYB53_S.cRAD54_S.cDNRPPX_S.pRAD5_S.cRAD...
TIGRTIGR
4 F17L22 170 Arabidopsis thali
4455279 Arabidopsis thaliana
1049068 Lycopersicon esculentu
Homo sapiens
5514652 D...
TIGRTIGR
Novel Large Subunit Rubisco in
Chlorobium tepidumAgathis.gi3982533
Agathis.gi3982549
Araucaria.gi3982517
Agathis....
TIGRTIGR
Uses of Phylogenomics II:
Knowing when to Not Predict
Functions
TIGRTIGR
Deinococcus radiodurans
TIGRTIGR
DNA Repair Genes in D.
radiodurans Complete Genome
Process Genes in D. radiodurans
Nucleotide Excision Repair Uvr...
TIGRTIGR
Recombination Genes in Genomes
Pathway |------------------------------Bacteria---------------------------| |---Ar...
TIGRTIGR
Unusual Features of D. radiodurans
DNA Repair Genes
Process Genes
Nucleotide excision repair Two UvrAs
Base excis...
TIGRTIGR
Problem:
List of DNA repair gene homologs
in D. radiodurans genome is not
significantly different from other
bact...
TIGRTIGR
-Ogt
-RecFRQN
-RuvC
-Dut
-SMS
-PhrI
-AlkA
-Nfo
-Vsr
-SbcCD
-LexA
-UmuC
-PhrI
-PhrII
-AlkA
-Fpg
-Nfo
-MutLS
-RecFO...
TIGRTIGR
Repair Studies in Different Species
(determined by Medline searches as of 1998)
Humans 7028
E. coli 3926
S. cerev...
TIGRTIGR
Uses of Phylogenomics III:
Gene Duplication
TIGRTIGR
Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Aids functional predictio...
TIGRTIGR
Recent Duplications
TIGRTIGR
MutY-NthDEIRA ORF00829DEIRA ORF02784DEIRAAQUAEMETJAMETTHTHEMACHLTRHAEINMCYTUTHEMAMETTHPYRHOAQUAEMETJAARCFUCELEGVI...
TIGRTIGR
Expansion of MCP Family in V. choleraeE.coli gi1787690B.subtilis gi2633766Synechocystis sp. gi1001299Synechocysti...
TIGRTIGR
Phosphate Transporters
ARCFUSYNSPTHEMAAQUAEMETJAMCYTUMCYTUVIBCHECOLIDEIRA_ORF00198DEIRA_ORFA00139DEIRA_ORF00510
TIGRTIGR
Levels of Paralogy Within A Genome
• All
– All members of a gene family are linked together
• Top matches
– Only ...
TIGRTIGR
C. pneumoniae Paralogs - All
[Dot plot; axis: Subject Orf Position, 0 to 1250000 ...]
TIGRTIGR
C. pneumoniae Paralogs - Top
[Dot plot; axis: Subject Orf Position, 0 to 1250000 ...]
TIGRTIGR
C. pneumoniae Paralogs – Recent
[Dot plot; axis: Subject Orf Position, 0 to 1250000 ...]
TIGRTIGR
Uses of Phylogenomics IV:
Genetic Exchange within Genomes
TIGRTIGR
Circular Maps
TIGRTIGR
TIGRTIGR
Uses of Phylogenomics V:
Gene Loss
TIGRTIGR
Why Gene Loss is Useful to Identify
• Indicates that gene is not absolutely required for
survival
• Helps disting...
TIGRTIGR
EuksArchBacteriaLossEvolutionary Origin of GeneMTMJSCHSAADRTABSMGMPBBTPHPHIECSSMTPresence ( ) or Absence of GeneS...
TIGRTIGR
51234
E. coliH. influenzaeN. gonorrhoeaeH. pyloriSyn. spB. subtilisS. pyogenesM. pneumoniaeM. genitaliumA. aeolic...
TIGRTIGR
Loss of MMR
• Lost in many pathogen species
• Mechanism of loss
– gene deletion (e.g., M. tuberculosis, H. pylori...
TIGRTIGR
Need for Phylogenomics Example:
Gene Duplication and Loss
• Genome analysis required to determine number of
homol...
TIGRTIGR
Uses of Phylogenomics VI:
Specialization
TIGRTIGR
Circular Maps
TIGRTIGR
Species Distribution of Homologs of
D. radiodurans Genes
[Histogram panels]
Number of Species...
TIGRTIGR
Specialized Genetic Elements
(Chromosome II and Megaplasmid)
• Many two component systems
• Nitrogen metabolism
•...
TIGRTIGR
Uses of Phylogenomics VII:
Genome Rearrangements
TIGRTIGR
V. cholerae vs. E. coli All Hits
[Dot plot; axis: E. coli Coordinates, 0 to 5000000 ...]
TIGRTIGR
V. cholerae vs. E. coli Top Hits
[Dot plot; axis: E. coli Coordinates, 0 to 5000000 ...]
TIGRTIGR
V. cholerae vs. E. coli
Only if EC-Orf is Closest in All Genomes
[Dot plot; axis: E. coli Coordinates, 0 to 5000000 ...]
TIGRTIGR
V. cholerae vs. E. coli Proteins
[Dot plot of top hits; axis: V. cholerae ORF Coordinates, 0 to 4000000]
TIGRTIGR
S. pneumoniae vs. S. pyogenes DNA F+R
[Dot plot; axis ticks 0 to 2000000; label: BSP vs Spyo]
TIGRTIGR
M. tuberculosis vs. M. leprae DNA
[Dot plot; axis ticks 0 to 4000000; label: M1]
TIGRTIGR
Duplication and Gene Loss Model
[Diagram; gene sets A to F and duplicated copies A’ to F’ ...]
TIGRTIGR
V. cholerae vs. E. coli Proteins
[Dot plot of top hits; axis: V. cholerae ORF Coordinates, 0 to 4000000]
TIGRTIGR C. trachomatis MoPn
C.pneumoniaeAR39
Origin
Termination
C. trachomatis vs C. pneumoniae Dot Plot
TIGRTIGR
[Diagram; genomes A1 to A3 and B1 to B3 with numbered gene positions ...]
TIGRTIGR
Uses of Phylogenomics VIII:
Horizontal Gene Transfer and
Species Evolution
TIGRTIGR
Vertical Inheritance
TIGRTIGR
Examples of Horizontal Transfers
• Antibiotic resistance genes on plasmids
• Insertion sequences
• Pathogenicity ...
TIGRTIGR
Why Gene Transfers Are Useful to Identify
• Laterally transferred genes frequently involved in
environmental adap...
TIGRTIGR
Steps in Lateral Gene Transfer
1
2
3-5
6
A B C D
TIGRTIGR
How to Infer Gene Transfers
• Unusual distribution patterns
• Unusual nucleotide composition
• High sequence simi...
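One way to screen for the "unusual nucleotide composition" criterion listed above is to compare each gene's GC content with the rest of the genome. The following is a minimal sketch, not the method used in the talk: the function names, the z-score cutoff of 2.0, and the toy sequences are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a DNA sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (seq.count("G") + seq.count("C")) / acgt


def flag_atypical_genes(genes, z_cutoff=2.0):
    """Return names of genes whose GC content deviates strongly from the set.

    genes maps a gene name to its DNA sequence. The cutoff is an
    illustrative assumption, not a published threshold.
    """
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every gene has the same composition
    return sorted(name for name, v in gc.items() if abs(v - mu) / sigma > z_cutoff)


# Toy data (hypothetical): five genes at 50% GC and one at 100% GC.
toy_genes = {
    "g1": "ATGC", "g2": "ATGC", "g3": "AGCT",
    "g4": "TACG", "g5": "CATG", "g6": "GGCC",
}
atypical = flag_atypical_genes(toy_genes)
```

Composition alone is weak evidence; a flagged gene is a candidate to check against the other criteria in the list, not a confirmed transfer.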
TIGRTIGR
E. coli and S. typhimurium Transfer
E. coliS. typhimuriumOld ModelE. coliS. typhimuriumNew Model
TIGRTIGR
Archaeal genes in bacterial genomes**
Bacterial species Best ...
TIGRTIGR
Evidence for lateral gene transfer in Thermotoga
1. 81 archaeal-li...
TIGRTIGR
[Gene map; ORF labels 0987, 0990, 0989; legend: Thermotoga ORF, Archaea homolog, Bacterial homolog, Eukary...]
TIGRTIGR
[Plot; axis label: Orfs in Target Genome; ticks 0 to 700 and 500 to 4500; legend: Best Matches ...]
TIGRTIGR
A. thaliana T1E2.8 is a
Chloroplast Derived HSP60ARATH -T1E2.8**********ECOLHAEINVIBCHVIBCHRICPRYEASTCHLPNCHLTRAQ...
TIGRTIGR
Organellar HSP60s
DROMECG12101DROMECG7235DROMECG2830DROMECG16954ARATH At2g33210ARATH F14O13.19ARATH MCP4.7YEAST S...
TIGRTIGR
ParA Phylogeny
pOMB25.Bor
BBl32.Borb
Borbu3
Borbu.2
BBM32.Borb
CP32-6.Bor
BBA20.Borb
Cp18.Borbu
pOMB10.Bor
pLp7E....
TIGRTIGR
Horizontal Gene Transfer II
TIGRTIGR
Reconciling a Tree of Life in the
Context of Lateral Gene Transfer
TIGRTIGR
rRNA Tree of Complete Genomes
Mycobacterium tuberculosisBacillus subtilisSynechocystis sp.Caenorhabditis elegansD...
TIGRTIGR
Whole Genome Phylogeny
TIGRTIGR
rRNA vs. Whole Genome Trees
Mycobacterium tuberculosisBacillus subtilisSynechocystis sp.Caenorhabditis elegansDro...
TIGRTIGR
Outline of Phylogenomics
Gene Evolution EventsPhenotype PredictionsDatabaseSpecies treePresence/AbsenceGene trees...
TIGRTIGR
Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution (ds/dn, correl...
TIGRTIGR
Evolutionary Diversity Still Poorly
Represented in Complete Genomes
Tmf-pendenR-rubrum3Azs-brasi2Rm-vannielRhb-le...
TIGRTIGR
True Phylogenetic Methods
Work Best
MutS2.SynsMutS2.BacsMutS2.HelpMutS2.DeirMutsl.MettMSH4.CelegMSH4.YeastMSH4.hu...
TIGRTIGR
Acknowledgements
• Genome duplications: S. Salzberg, J. Heidelberg, O. White,
A. Stoltzfus, J. Peterson
• Genome ...
TIGRTIGR
Evolutionary Diversity Still Poorly
Represented in Complete Genomes
Tmf-pendenR-rubrum3Azs-brasi2Rm-vannielRhb-le...
TIGRTIGR
TIGRTIGR
TIGR
Other people
Mom and Dad
S. Karlin
M. Feldman
A. M. Campbell ...
TIGRTIGR
Uses of Phylogenomics IX:
Evolution Within Species
TIGRTIGR
M. tuberculosis strain phylogeny (Indels)
TIGRTIGR
Musser-Type Evolution (Indel Phylogeny)
[Tree figure; leaf labels: 98a, 107a, 43a, 73a, 105a, 133a, 114a, 169a, 218a, 290a, 160a, 159a, 13a, 18a, 26a, 30a ...]
TIGRTIGR
Consistency Indices (Indel Phylogeny)
Calculated over stored trees
[Plot; axis: CI, ticks from 0.00 in steps of 0.05 ...]
TIGRTIGR
TIGRTIGR
Phylogenomics I:
Presence/Absence of Homologs
• Important to have complete genomes
• Similarity searches with hig...
TIGRTIGR
Phylogenomics II:
Phylogenetic Analysis of Homologs
• Multiple sequence alignment
• Mask alignment (exclude certa...
TIGRTIGR
Phylogenomics III:
Inferring Evolutionary Events
• Infer evolutionary distribution patterns (overlay
presence/abs...
TIGRTIGR
Phylogenomics IV:
Functional Predictions and Evolution
• Overlay experimentally determined functions
onto gene tr...
TIGRTIGR
Phylogenomics V:
Pathway Analysis
• Correlated presence/absence of all genes in pathway in different
species?
– I...
TIGRTIGR
Steps in Phylogenomic Analysis
• Create database of genes of interest
• Presence/absence of homologs in complete ...
TIGRTIGR
Evolution as a Screening
Method
• Gene duplications
• Gene loss
• Lateral gene transfers
• Organellar genes
• Str...
TIGRTIGR
Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution
– (ds/dn)
– St...
TIGRTIGR
Genome Sequences Allow
“Hypothesisless Research”
• DNA microarrays
• Proteomics
• GC skew and other nucleotide co...
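GC skew, mentioned in the list above, is commonly defined as (G - C) / (G + C) over a window of sequence. The sketch below computes it in non-overlapping windows and reports where the cumulative skew is lowest, which in many bacterial genomes tends to fall near the replication origin. The function names and window handling are assumptions for illustration, not taken from the talk.

```python
def gc_skew(seq):
    """(G - C) / (G + C) for one stretch of sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window):
    """GC skew of consecutive non-overlapping windows (a trailing partial window is dropped)."""
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def cumulative_skew_minimum(seq, window):
    """Start coordinate of the window at which the cumulative skew is lowest."""
    total = 0.0
    best_value = float("inf")
    best_start = 0
    for index, skew in enumerate(windowed_gc_skew(seq, window)):
        total += skew
        if total < best_value:
            best_value = total
            best_start = index * window
    return best_start


# Toy sequence (hypothetical): G-rich, then C-rich, then G-rich again.
toy = "GGGGCCCCCCCCGGGG"
skews = windowed_gc_skew(toy, 4)
minimum_at = cumulative_skew_minimum(toy, 4)
```

On a real genome the window would be thousands of bases wide, and the result is a hint to be checked against other evidence rather than a definitive origin call.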

"Phylogenomics: Combining Evolutionary Reconstructions and Genome Analysis into a Single Composite Approach" talk in 12/2000 by J. Eisen


Talk by Jonathan Eisen given in December 2000 as a guest seminar at the University of Maryland. Title: "Phylogenomics: Combining Evolutionary Reconstructions and Genome Analysis into a Single Composite Approach"

Published in: Health & Medicine, Technology

    1. Phylogenomics: Combining Evolutionary Reconstructions and Genome Analysis into a Single Composite Approach
       [Title-slide figure residue. Recoverable panel titles and axis names: paralog dot plot (Query Orf Position vs. Subject Orf Position); whole-genome species tree (Archaea, Bacteria, Eukarya; scale bar 0.05 changes); A. rRNA tree of Bacterial and Archaeal Major Groups, B. Groups with Completed Genomes Highlighted; duplication and gene loss model (E. coli, V. cholerae); Figure 4, Common Ancestor of A and B, with Inversion Around Terminus (*) and Inversion Around Origin (*); Three Photolyase Homologs in V. cholerae (Class I CPD Photolyases, 6-4 Photolyases, Blue Light Receptors); Duplication in UvrA family (A. ABC Transporters, B. UvrA Subfamily); histograms of Frequency vs. Number of Species With High Hits]
    2. Topics of Discussion
       • Introduction to phylogenomics
       • Phylogenomics Examples
         – Functional prediction
         – Not making functional predictions
         – Gene duplication
         – Genetic exchange within genomes
         – Gene loss
         – Specialization
         – Horizontal gene transfer
    3. “Nothing in biology makes sense except in the light of evolution.” T. H. Dobzhansky (1973)
    4. 4. TIGRTIGR
    5. Uses of Evolutionary Analysis in Molecular Biology
       • Identification of mutation patterns (e.g., ts/tv ratio)
       • Amino-acid/nucleotide substitution patterns useful in structural studies (e.g., rRNA)
       • Sequence searching matrices (e.g., PAM, BLOSUM)
       • Motif analysis (e.g., Blocks)
       • Functional predictions
       • Classifying multigene families
       • Evolutionary history puts other information into perspective (e.g., duplications, gene loss)
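The ts/tv ratio named in the first bullet can be illustrated with a simple count over two aligned sequences. This is a minimal sketch under stated assumptions: the sequences are already aligned, multiple substitutions at a site are ignored, and the function names are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip identical sites, gaps, and ambiguity codes
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


def ts_tv_ratio(seq_a, seq_b):
    """Observed ts/tv ratio, or None when there are no transversions."""
    ts, tv = ts_tv_counts(seq_a, seq_b)
    if tv == 0:
        return None
    return ts / tv


# Toy alignment (hypothetical): A->G and G->A are transitions, A->C is a transversion.
counts = ts_tv_counts("ACGTACGT", "GCGTCCAT")
ratio = ts_tv_ratio("ACGTACGT", "GCGTCCAT")
```

A raw count like this underestimates change between divergent sequences; model-based estimates correct for multiple hits.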
    6. Evolutionary Studies Improve Most Aspects of Genome Analysis
       • Phylogeny of species places comparative data in perspective
       • Evolution of genes and gene families
         – Functional predictions
         – Identification of orthologs and paralogs
         – Species specific mutation patterns
       • Evolution of pathways
         – Convergence
         – Prediction of function
       • Evolution of gene order/genome rearrangements
       • Phylogenetic distribution patterns
       • Identification of novel features
    7. Genome Information and Analysis Improves Studies of Evolution
       • Complete genome information particularly useful
       • Unbiased sampling
       • More sequences of genes
       • Presence/absence information needed to infer certain events (e.g., gene loss, duplication)
       • Genome wide mutation and substitution patterns (e.g., strand bias)
       • Diversification and duplication
    8. Phylogenomic Analysis
       • There is a feedback loop between evolutionary and genome analysis, such that for many studies genome and evolutionary analyses are interdependent.
       • Therefore, I have proposed that they be combined into a single composite approach that I refer to as phylogenomics.
       • Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.
    9. Outline of Phylogenomics
       [Flow diagram; box labels: Gene Evolution Events, Phenotype Predictions, Database, Species tree, Presence/Absence, Gene trees, Congruence, Evol. Distribution, F(x) Predictions, Pathway Evolution]
    10. 10. TIGRTIGR
    11. Uses of Phylogenomics I: Functional Predictions
    12. Predicting Function
        • Identification of motifs
        • Homology/similarity based methods
          – Highest hit
          – Top hits
          – Clusters of orthologous groups
          – HMM models
          – Structural threading and modeling
          – Evolutionary reconstructions
    13. Types of Molecular Homology
        • Homologs: genes that are descended from a common ancestor (e.g., all globins)
        • Orthologs: homologs that have diverged after speciation events (e.g., human and chimp β-globins)
        • Paralogs: homologs that have diverged after gene duplication events (e.g., α and β globin)
        • Xenologs: homologs that have diverged after lateral transfer events
        • Positional homology: common ancestry of specific amino acid or nucleotide positions in different genes
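The ortholog/paralog distinction above depends on the event at the node where two genes last shared an ancestor. The sketch below makes that explicit on a small gene tree whose internal nodes are already labeled as speciation or duplication. The tree encoding, the function names, and the globin topology are illustrative assumptions; in practice the labels come from reconciling a gene tree with a species tree.

```python
def leaves(node):
    """Set of leaf names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def classify_pair(tree, gene_a, gene_b):
    """Return 'orthologs' or 'paralogs' from the event at the last common ancestor.

    Internal nodes are tuples: (event, left_subtree, right_subtree), where
    event is 'speciation' or 'duplication'.
    """
    node = tree
    while not isinstance(node, str):
        event, left, right = node
        in_left = leaves(left)
        in_right = leaves(right)
        if gene_a in in_left and gene_b in in_left:
            node = left
        elif gene_a in in_right and gene_b in in_right:
            node = right
        elif {gene_a, gene_b} <= (in_left | in_right):
            # The two genes separate here, so this node is their last common ancestor.
            return "orthologs" if event == "speciation" else "paralogs"
        else:
            raise ValueError("both genes must be in the tree")
    raise ValueError("need two different genes")


# Hypothetical globin tree: one ancient duplication, then a speciation in each copy.
GLOBINS = (
    "duplication",
    ("speciation", "human_alpha", "chimp_alpha"),
    ("speciation", "human_beta", "chimp_beta"),
)
beta_pair = classify_pair(GLOBINS, "human_beta", "chimp_beta")
alpha_beta = classify_pair(GLOBINS, "human_alpha", "human_beta")
```

Note that human α-globin and chimp β-globin also come out as paralogs, even though they sit in different species, because their last common ancestor is the duplication.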
    14. Phylogenomic Analysis of the MutS Family of Proteins
        • Published analysis
          – Eisen JA et al. 1997. Nature Medicine 3(10):1076-1078.
          – Eisen JA. 1998. Nucleic Acids Research 26(18):4291-4300.
    15. 15. TIGRTIGR
    16. Blast Search of H. pylori “MutS”
        Sequences producing significant alignments:         Score (bits)  E Value
        sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN    117           3e-25
        sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN    69            1e-10
        sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN    64            3e-09
        sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN    62            2e-08
        sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN    57            4e-07
        sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN    57            4e-07
        • Blast search pulls up Syn. sp. MutS#2 with a much more significant E value (3e-25) than the other MutS homologs
    17. H. pylori and MutS
        • Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog
        • Experimental studies have shown MutS and MutL always work together in mismatch repair
        • Problem: what do we conclude about H. pylori mismatch repair?
    18. Phylogenetic Tree of MutS Family
        [Tree figure; leaves labeled by species abbreviation, e.g., Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy]
    19. MutS Subfamilies
        [Same tree with subfamily labels: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]
    20. MutS Subfamilies
        • MutS1  Bacterial MMR
        • MSH1   Euk - mitochondrial MMR
        • MSH2   Euk - all MMR in nucleus
        • MSH3   Euk - loop MMR in nucleus
        • MSH6   Euk - base:base MMR in nucleus
        • MutS2  Bacterial - function unknown
        • MSH4   Euk - meiotic crossing-over
        • MSH5   Euk - meiotic crossing-over
    21. Overlaying Functions onto Tree
        [MutS family tree with subfamily labels: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]
    22. Functional Prediction Using Tree
        [MutS family tree with predicted functions per subfamily]
        • MutS1  Repair of Loops and Mismatches
        • MSH1   Repair in Mitochondria
        • MSH2   Repair of Loops and Mismatches in Nucleus
        • MSH3   Repair of Loops in Nucleus
        • MSH6   Repair of Mismatches in Nucleus
        • MSH4   Meiotic Crossing-Over
        • MSH5   Meiotic Crossing-Over
        • MutS2  Unknown Functions
    23. Table 3. Presence of MutS Homologs in Complete Genome Sequences

        Species                                  # of MutS Homologs  Which Subfamilies?  MutL Homologs
        Bacteria
        Escherichia coli K12                     1                   MutS1               1
        Haemophilus influenzae Rd KW20           1                   MutS1               1
        Neisseria gonorrhoeae                    1                   MutS1               1
        Helicobacter pylori 26695                1                   MutS2               -
        Mycoplasma genitalium G-37               -                   -                   -
        Mycoplasma pneumoniae M129               -                   -                   -
        Bacillus subtilis 168                    2                   MutS1, MutS2        1
        Streptococcus pyogenes                   2                   MutS1, MutS2        1
        Mycobacterium tuberculosis               -                   -                   -
        Synechocystis sp. PCC6803                2                   MutS1, MutS2        1
        Treponema pallidum Nichols               1                   MutS1               1
        Borrelia burgdorferi B31                 2                   MutS1, MutS2        1
        Aquifex aeolicus                         2                   MutS1, MutS2        1
        Deinococcus radiodurans R1               2                   MutS1, MutS2        1
        Archaea
        Archaeoglobus fulgidus VC-16, DSM4304    -                   -                   -
        Methanococcus jannaschii DSM 2661        -                   -                   -
        Methanobacterium thermoautotrophicum ΔH  1                   MutS2               -
        Eukaryotes
        Saccharomyces cerevisiae                 6                   MSH1-6              3+
        Homo sapiens                             5                   MSH2-6              3+
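The pattern in Table 3 can be checked mechanically: MutL is present exactly where a MutS1-subfamily gene is present, and the genomes with a MutS homolog but no MutL carry only MutS2. The sketch below encodes the bacterial and archaeal rows of the table (strain designations dropped, "-" stored as zero or an empty list) and tests both statements. The data structure and function names are assumptions for illustration.

```python
# species -> (MutS subfamilies present, number of MutL homologs), from Table 3
GENOMES = {
    "Escherichia coli": (["MutS1"], 1),
    "Haemophilus influenzae": (["MutS1"], 1),
    "Neisseria gonorrhoeae": (["MutS1"], 1),
    "Helicobacter pylori": (["MutS2"], 0),
    "Mycoplasma genitalium": ([], 0),
    "Mycoplasma pneumoniae": ([], 0),
    "Bacillus subtilis": (["MutS1", "MutS2"], 1),
    "Streptococcus pyogenes": (["MutS1", "MutS2"], 1),
    "Mycobacterium tuberculosis": ([], 0),
    "Synechocystis sp.": (["MutS1", "MutS2"], 1),
    "Treponema pallidum": (["MutS1"], 1),
    "Borrelia burgdorferi": (["MutS1", "MutS2"], 1),
    "Aquifex aeolicus": (["MutS1", "MutS2"], 1),
    "Deinococcus radiodurans": (["MutS1", "MutS2"], 1),
    "Archaeoglobus fulgidus": ([], 0),
    "Methanococcus jannaschii": ([], 0),
    "Methanobacterium thermoautotrophicum": (["MutS2"], 0),
}


def muts_without_mutl(genomes):
    """Species that encode at least one MutS homolog but no MutL homolog."""
    return sorted(
        name for name, (subfamilies, mutl) in genomes.items()
        if subfamilies and mutl == 0
    )


def mutl_tracks_muts1(genomes):
    """True if MutL is present in exactly the genomes that carry MutS1."""
    return all(
        ("MutS1" in subfamilies) == (mutl > 0)
        for subfamilies, mutl in genomes.values()
    )


exceptions = muts_without_mutl(GENOMES)
consistent = mutl_tracks_muts1(GENOMES)
```

Counting "MutS homologs" without subfamily assignment makes H. pylori look like an exception; counting MutS1 specifically removes it.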
    24. Why was the MutS2 Family Missed?
        Blast Search of Syn. sp. MutS#2
        Sequences producing significant alignments:            Score (bits)  E Value
        sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT   91            3e-17
        sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN     87            4e-16
        sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS  83            1e-14
        sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS  81            3e-14
        sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS  81            4e-14
        sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA  80            5e-14
        • Blast search pulls up standard MutS genes but with only a moderate p value (10^-17)
    25. Problems with Similarity Based Functional Prediction
        • Prone to database error propagation.
        • Cannot identify orthologous groups reliably.
        • Perform poorly in cases of evolutionary rate variation and non-hierarchical trees (similarity will not reflect evolutionary relationships in these cases).
        • May be misled by modular proteins or large insertion/deletion events.
        • Are not set up to deal with expanding data sets.
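The rate-variation problem in the list above can be shown with a toy tree. In the sketch below, gene B is the true sister of query A but sits on a long branch, so a similarity ranking (smallest distance) picks C instead, while a tree-based criterion (the four-point condition on pairwise distances) still recovers the A+B grouping. The tree, branch lengths, and function names are all hypothetical.

```python
# Hypothetical unrooted tree: A and B join at X, C and D join at Y.
# B has evolved ten times faster than its sister A.
EDGES = {
    ("A", "X"): 1.0,
    ("B", "X"): 10.0,
    ("X", "Y"): 1.0,
    ("C", "Y"): 1.0,
    ("D", "Y"): 1.0,
}


def path_length(edges, start, goal):
    """Sum of branch lengths on the path between two nodes of a tree."""
    adjacency = {}
    for (u, v), length in edges.items():
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, length in adjacency[node]:
            if neighbor != parent:  # never walk back up the edge we came from
                stack.append((neighbor, node, dist + length))
    raise ValueError("goal not reachable from start")


def top_hit(edges, query, candidates):
    """Candidate with the smallest distance to the query (a stand-in for 'highest hit')."""
    return min(candidates, key=lambda name: path_length(edges, query, name))


def four_point_split(edges, a, b, c, d):
    """Pairing of four taxa with the smallest summed distance (four-point condition)."""
    def dist(x, y):
        return path_length(edges, x, y)

    sums = {
        ((a, b), (c, d)): dist(a, b) + dist(c, d),
        ((a, c), (b, d)): dist(a, c) + dist(b, d),
        ((a, d), (b, c)): dist(a, d) + dist(b, c),
    }
    return min(sums, key=sums.get)


best_match = top_hit(EDGES, "A", ["B", "C", "D"])
split = four_point_split(EDGES, "A", "B", "C", "D")
```

Here the distances are exact tree distances, which is the most favorable case for the tree-based criterion; with real sequence data the distances are estimates and long branches cause their own artifacts.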
    26. Evolutionary Rate Variation
        [Tree figure; taxa 1 to 6]
    27. Rate Variation and Duplication
        [Tree figure; labels: Species 1, Species 2, Species 3; genes 1A, 2A, 3A, 1B, 2B, 3B; Duplication]
    28. Evolutionary Method
        PHYLOGENETIC PREDICTION OF GENE FUNCTION
        [Flow diagram; step labels:]
        • CHOOSE GENE(S) OF INTEREST
        • IDENTIFY HOMOLOGS
        • ALIGN SEQUENCES
        • CALCULATE GENE TREE
        • OVERLAY KNOWN FUNCTIONS ONTO TREE
        • INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
        [Other panel labels: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN); METHOD; EXAMPLE A; EXAMPLE B; Duplication; Duplication?; Ambiguous; genes 1 to 6 and 1A to 3B in Species 1 to 3]
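The last two steps of the method above (overlay known functions onto the tree, then infer the function of the gene of interest) can be sketched as a walk up the gene tree from the query. The rule used here is an assumption chosen for simplicity: take the smallest clade around the query that contains any annotated gene, predict only if those annotations agree, and otherwise report ambiguity. The tree topology and gene names are hypothetical and do not reproduce the MutS tree from the talk.

```python
def leaves(node):
    """Set of leaf names under a node; a leaf is a string, a clade is a tuple of children."""
    if isinstance(node, str):
        return {node}
    names = set()
    for child in node:
        names |= leaves(child)
    return names


def predict_function(tree, query, known):
    """Predict a function for query from the nearest annotated clade in the tree.

    known maps gene names to experimentally determined functions.
    Returns a function name, 'ambiguous', or 'unknown'.
    """
    path = []  # clades containing the query, from the root downward
    node = tree
    while not isinstance(node, str):
        path.append(node)
        for child in node:
            if query in leaves(child):
                node = child
                break
        else:
            raise ValueError("query gene is not in the tree")
    if node != query:
        raise ValueError("query gene is not in the tree")
    for clade in reversed(path):  # smallest clade first
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Hypothetical tree: three clades joined at an unresolved root.
TREE = (
    (("MutS_Ecoli", "MutS_Bacsu"), "orf_Neigo"),
    ("MutS2_Synsp", "orf_Helpy"),
    ("MSH4_Yeast", "MSH5_Yeast"),
)
KNOWN = {
    "MutS_Ecoli": "mismatch repair",
    "MutS_Bacsu": "mismatch repair",
    "MSH4_Yeast": "meiotic crossing-over",
    "MSH5_Yeast": "meiotic crossing-over",
}
neigo = predict_function(TREE, "orf_Neigo", KNOWN)
helpy = predict_function(TREE, "orf_Helpy", KNOWN)
```

The second query lands in a clade with no characterized members, so the sketch declines to choose between the functions of the neighboring clades, in line with the deck's point that some functions should not be predicted.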
    29. [Four-panel MutS family tree figure. A: gene tree with sequence names (e.g., MutS.Aquae, MSH2.Human, MSH1.Yeast, MutS.Helpy). B: subfamilies labeled MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2. C: known functions overlaid. D: predicted functions per subfamily:]
        • MutS1  All MMR (Bacteria)
        • MSH1   MMR in Mitochondria
        • MSH2   All MMR in Nucleus
        • MSH3   MMR of Large Loops in Nucleus
        • MSH6   MMR of Mismatches and Small Loops in Nucleus
        • MSH4   Segregation & Crossover
        • MSH5   Segregation & Crossover
        • MutS2  (no function assigned)
    30. Evolution of the SNF2 Family of Proteins
        [Tree figure; subfamily labels: SNF2, SNF2L, CHD1, ETL1, CSB, RAD54, RAD16, LODE]
    31. 31. TIGRTIGR 4 F17L22 170 Arabidopsis thali 4455279 Arabidopsis thaliana 1049068 Lycopersicon esculentu Homo sapiens 5514652 Drosophila melanogaste Drosophila melanogaster2 123725 Caenorhabditis elegans 6606113 Capronia mansonii RpoII.Yeast.YOR151C 107346 Schizosaccharomyces pom 151348 Euplotes octocarinatus 265427 Euplotes octocarinatus 3845258 Plasmodium falciparum RpoIII.Drome RpoIII.Drome.7303535 EGAD 114464 Caenorhabditis ele RpoIII.Yeast.172383 EGAD 145012 Schizosaccharomyce RpoIII.Neucr.7800864 ARATH5 K18C1 1 Aeropyrum pernix EGAD 8025 Sulfolobus acidocald 5458046 Pyrococcus abyssi PH1546 Pyrococcus horikoshii Thermococcus celer EGAD 14667 Methanococcus vanni MJ1040 Methanococcus jannaschi AF1886 Archaeoglobus fulgidus Halobacterium halobium Thermoplasma acidophilum RPB2 Methanobacterium thermoau atmystery.BAB02021 ARATH3 MRC8.7 ARATH3 MYM9.12 6723961 Schizosaccharomyces po RpoI.Yeast.YPR010C RpoI.Neucr.3668171 RPA2 Rattus norvegicus Mus musculus RpoI.Drome.7296211 Caenorhabditis elegans 92131 Euplotes octocarinatus ARATH1 T1P2.15 ARATH1 F1N18.2 1492072Molluscum contagiosum v 439046 Variola major virus 1143635 Variola virus 2772787 Vaccinia virus 323395 Cowpox virus 6578643 Rabbit fibroma virus 6523969 Myxoma virus 6682809 Yaba monkey tumor viru 7271687 Fowlpox virus 4049822 Melanoplus sanguinipes 2887 Kluyveromyces lactis EGAD 151364 Sacch kluyveri 1369760 Borrelia burgdorferi BB0389 Borrelia burgdorferi TP0241 Treponema pallidum 6652714 Rickettsia massiliae 6652723 Rickettsia sp. 
Bar29 6652720 Rickettsia conorii RP140 Rickettsia prowazekii 6960339 Salmonella typhimurium EGAD 1084 Salmonella choleraes EC3987 Escherichia coli EGAD 23892 Buchnera aphidicola HI0515 Haemophilus influenzae EGAD 6020 Pseudomonas putida RPOB Coxiella burnetii 3549149 Legionella pneumophila RPOB Neisseria meningitidis HP1198 Helicobacter pylori 6967949 Campylobacter jejuni AA1339 Aquifex aeolicus BS0107 Bacillus subtilis 4512396 Bacillus halodurans 6002201 Listeria monocytogenes EGAD 32012 Staphylococcus aure EGAD 32011 Spiroplasma citri MG341 Mycoplasma genitalium MP326 Mycoplasma pneumoniae 6899151 Ureaplasma urealyticum Rv0667 Mycobacterium tuberculo Mycobacterium leprae 7144498 Mycobacterium smegmati EGAD 39063 Mycobacterium smegm GP 7331268 Amycolatopsis medit 7248348 Streptomyces coelicolo 7573273 Thermus aquaticus DR0912 Deinococcus radiodurans TM0458 Thermotoga maritima EGAD 74970 80693 Heterosigma c EGAD Odontella sinensis EGAD 60306 Spinacia oleracea EGAD Nicotiana tabacum 6723742 Oenothera elata 5457427 Sinapis alba 5881686 Arabidopsis thaliana 4958867 Triticum aestivum EGAD 76270 Zea mays RPOB Oryza sativa EGAD Pinus thunbergii EGAD Marchantia polymorpha 7259525 Mesostigma viride 5880717 Nephroselmis olivacea RPOB Guillardia theta sll1787 Synechocystis PCC6803 EGAD 75526 Porphyra purpurea 6466433 Cyanidium caldarium EGAD 76712 Cyanophora paradoxa RPOB Chlorella vulgaris EGAD 76424 Euglena gracilis 5231258 Toxoplasma gondii 6492294 Neospora caninum EGAD 83446 Plasmodium falcipar 100 78 100 85 93 83 100 79 100 100 100 100 100 100 94100 100 74 99 100 99 100 100 99 9480 100 100 100 100 59 100 100 99 56100 100 100 100 58 95 100 97 63 95 100 100 100 81 100 100 100 59 60 99 100 100 94 100 100 69 100 77 100 97 100 71 100 99 58 83 100100 100 99 100 98 100 100 61 99 75 100 73 100 100 59 100 100 72 72 98 52 98 59 100 100 a Novel RNA Polymerase in A. thaliana Archaeal IV II III I Viral Bacterial - RpoB Plastid- RpoBs
    32. 32. TIGRTIGR Novel Large Subunit Rubisco in Chlorobium tepidum [Figure: Rubisco large subunit phylogeny with bootstrap values; clades labeled Type I and Type X; the C. tepidum sequence is Chlte.ORF02314]
    33. 33. TIGRTIGR Uses of Phylogenomics II: Knowing when to Not Predict Functions
    34. 34. TIGRTIGR Deinococcus radiodurans
    35. 35. TIGRTIGR DNA Repair Genes in D. radiodurans Complete Genome (process: genes in D. radiodurans) • Nucleotide excision repair: UvrABCD, UvrA2 • Base excision repair: AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG • AP endonuclease: Xth • Mismatch excision repair: MutS, MutL • Recombination – Initiation: RecFJNRQ, SbcCD, RecD – Recombinase: RecA – Migration and resolution: RuvABC, RecG • Replication: PolA, PolC, PolX, phage Pol • Ligation: DnlJ • dNTP pools, cleanup: MutTs, RRase • Other: LexA, RadA, HepA, UVDE, MutS2
    36. 36. TIGRTIGR Recombination Genes in Genomes (rows: pathway and protein name(s); columns: genomes grouped as Bacteria | Archaea | Euks) • Initiation, RecBCD pathway – RecB: + + - - - - - - + + - + - - - - - - - - – RecC: + + - - - - - - + ±+ - ± - - - - - - - - – RecD: + + - - ± - - - + ±+ - ++ - ± ±+ - - - - - • Initiation, RecF pathway – RecF: + + + - + - - + + - + ± - - + - - ± ± ± – RecJ: + + + + + - - + - + + + + + + - - - - - – RecO: + + - - + - - + + - - - - - ± - - - - - – RecR: + + + ±+ + - - + + - + + - + + - - - - - – RecN: + + + + + - - + + - + - ± + + - - ± ± - – RecQ: + + - - + - - + - - + - - - + - - - - + ++ • Initiation, RecE pathway – RecE/ExoVIII: + - - - - - - - - - - - - - - - - - - - – RecT: + - - - + - - - - - - - - - - - - - - - • Initiation, SbcBCD pathway – SbcB/ExoI: + + - - - - - - - - - - - - - - - - - - – SbcC: + - - - + - - + - + + - + + + ± ± ± ± ± ± – SbcD: + - - - + - - + - + + - + + + ± ± ± ± ± ± • Initiation, AddAB pathway – AddA/RexA: - - + - + - - - - - + + - ± - - - - - - – AddB/RexB: - - - - + - - - - - - - - - - - - - - - • Initiation, Rad52 pathway – Rad52, Rad59: - - - - - - - - - - - - - - - - - - - ++ + – Mre11/Rad32: ± - - - ± - - ± - ± ± - ± ± ± + + + + + + – Rad50: ± - - - ± - - ± - ± ± - ± ± ± + + + ± + + • Recombinase – RecA, Rad51: + + + + + + + + + + + + + + + + + + + ++ ++ • Branch migration – RuvA: + + + + + + + + + + + + + - + - - - - - – RuvB: + + + + + + + + + + + + + - + - - - - - – RecG: + + + + + - - + + + + - + + + - - - - - • Resolvases – RuvC: + + + + - - - + + - + + + - + - - - - - – RecG: + + + + + - - + + + + - + + + - - - - - – Rus: + - - - - - - - ±+ - - - - ±+ - - - - - - – CCE1: - - - - - - - - - - - - - - - - - - - + • Other recombination proteins – Rad54: - - - - - - - - - - - - - - - - - - - + + – Rad55: - - - - - - - - - - - - - - - - - - - + + – Rad57: - - - - - - - - - - - - - - - - - - - + + – Xrs2: - - - - - - - - - - - - - - - - - - - +
    37. 37. TIGRTIGR Unusual Features of D. radiodurans DNA Repair Genes (process: genes) • Nucleotide excision repair: two UvrAs • Base excision repair: four MutY-Nths • Recombination: RecD but not RecBC • Replication: four Pol genes • dNTP pools: many MutTs, two RRases • Other: UVDE
    38. 38. TIGRTIGR Problem: The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from the lists for other bacterial genomes of similar size
    39. 39. TIGRTIGR Gain and Loss of Repair Genes [Figure: DNA repair gene events, marked with the prefixes +, -, d, and t, mapped onto a species tree of Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp, Metjn, Arcfu, Metth, Human, and Yeast, grouped as Bacteria, Archaea, and Eukaryotes; one set of events is labeled "from mitochondria"]
    40. 40. TIGRTIGR Repair Studies in Different Species (number of studies, determined by Medline searches as of 1998) • Humans: 7028 • E. coli: 3926 • S. cerevisiae: 988 • Drosophila: 387 • B. subtilis: 284 • S. pombe: 116 • Xenopus: 56 • C. elegans: 25 • A. thaliana: 20 • Methanogens: 16 • Haloferax: 5 • Giardia: 0
    41. 41. TIGRTIGR Uses of Phylogenomics III: Gene Duplication
    42. 42. TIGRTIGR Why Duplications Are Useful to Identify • Allows division into orthologs and paralogs • Aids functional predictions • Recent duplications may be indicative of species-specific adaptations • Helps identify mechanisms of duplication • Can be used to study mutation processes in different parts of the genome
    43. 43. TIGRTIGR Recent Duplications
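    44. 44. TIGRTIGR MutY-Nth [Figure: phylogenetic tree of MutY-Nth homologs; leaf labels: DEIRA ORF00829, DEIRA ORF02784, DEIRA, AQUAE, METJA, METTH, THEMA, CHLTR, HAEIN, MCYTU, THEMA, METTH, PYRHO, AQUAE, METJA, ARCFU, CELEG, VIBCH, ECOLI, HAEIN, TREPA, RICPR, AQUAE, BACSU, CAMJE, HELPY, MCYTU, SYNSP, CHLPN, CHLTR, BBUR]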
    45. 45. TIGRTIGR Expansion of MCP Family in V. cholerae [Figure: tree (labeled NJ) of MCP homologs from E. coli, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis, and V. cholerae; the V. cholerae sequences (VC and VCA loci) form by far the largest set]
    46. 46. TIGRTIGR Phosphate Transporters [Figure: phylogenetic tree; leaf labels: ARCFU, SYNSP, THEMA, AQUAE, METJA, MCYTU, MCYTU, VIBCH, ECOLI, DEIRA_ORF00198, DEIRA_ORFA00139, DEIRA_ORF00510]
    47. 47. TIGRTIGR Levels of Paralogy Within a Genome • All – All members of a gene family are linked together • Top matches – Only top matching pairs are linked together. Therefore, in a large gene family, only the pair from the most recent duplication event is included • Recent – Operational definition based on comparison to other species. Only pairs that are more similar to each other than to their homologs in selected other species are included.
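The three levels of paralogy above can be sketched in code. This is only an illustration, not the pipeline behind the slides: the input format (lists of `(query, subject, score)` similarity hits), the function names, and the toy scores are all assumptions, and "top matches" is read here as "each gene linked to its best-scoring paralog".

```python
from collections import defaultdict

def pair_scores(within_hits):
    """Best score seen for each unordered pair of genes in one genome."""
    scores = {}
    for query, subject, score in within_hits:
        if query != subject:
            pair = frozenset((query, subject))
            scores[pair] = max(score, scores.get(pair, 0.0))
    return scores

def all_pairs(within_hits):
    """'All': every pair of family members that hit each other."""
    return set(pair_scores(within_hits))

def top_pairs(within_hits):
    """'Top matches': link each gene only to its best-scoring paralog."""
    best = {}
    for query, subject, score in within_hits:
        if query != subject and (query not in best or score > best[query][1]):
            best[query] = (subject, score)
    return {frozenset((q, s)) for q, (s, _) in best.items()}

def recent_pairs(within_hits, between_hits):
    """'Recent': top pairs that score higher than either gene's best hit
    in the selected comparison species."""
    best_other = defaultdict(float)
    for query, _, score in between_hits:
        best_other[query] = max(best_other[query], score)
    scores = pair_scores(within_hits)
    return {pair for pair in top_pairs(within_hits)
            if all(scores[pair] > best_other[gene] for gene in pair)}

# Toy data (hypothetical): g1 and g2 duplicated recently, g3 is older.
within = [("g1", "g2", 900), ("g2", "g1", 900), ("g1", "g3", 300),
          ("g3", "g1", 300), ("g2", "g3", 280), ("g3", "g2", 280)]
between = [("g1", "x1", 500), ("g2", "x2", 450), ("g3", "x3", 600)]

print(len(all_pairs(within)), len(top_pairs(within)))
print(recent_pairs(within, between))
```

Each level is a subset of the one before it, which is why the "Recent" plot is the sparsest of the three paralog plots.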
    48. 48. TIGRTIGR C. pneumoniae Paralogs - All [Dot plot: query ORF position vs. subject ORF position, both axes 0 to 1250000]
    49. 49. TIGRTIGR C. pneumoniae Paralogs - Top [Dot plot: query ORF position vs. subject ORF position, both axes 0 to 1250000]
    50. 50. TIGRTIGR C. pneumoniae Paralogs - Recent [Dot plot: query ORF position vs. subject ORF position, both axes 0 to 1250000]
    51. 51. TIGRTIGR Uses of Phylogenomics IV: Genetic Exchange within Genomes
    52. 52. TIGRTIGR Circular Maps
    53. 53. TIGRTIGR
    54. 54. TIGRTIGR Uses of Phylogenomics V: Gene Loss
    55. 55. TIGRTIGR Why Gene Loss is Useful to Identify • Indicates that gene is not absolutely required for survival • Helps distinguish likelihood of gene transfers • Correlated loss of same gene in different species may indicate selective advantage of loss of that gene • Correlated loss of genes in a pathway indicates a conserved association among those genes
    56. 56. TIGRTIGR Example of Tracing Gene Loss [Figure: species tree spanning Euks, Arch, and Bacteria, annotated with kingdom, species abbreviation, presence or absence of the gene, the evolutionary origin of the gene, and inferred losses]
    57. 57. TIGRTIGR Ancient Duplication in MutS Family [Figure: panels A and B showing a gene duplication that separates the MutS1 and MutS2 lineages; species shown: E. coli, H. influenzae, N. gonorrhoeae, H. pylori, Syn. sp, B. subtilis, S. pyogenes, M. pneumoniae, M. genitalium, A. aeolicus, D. radiodurans, T. pallidum, B. burgdorferi]
    58. 58. TIGRTIGR Loss of MMR • Lost in many pathogen species • Mechanism of loss – gene deletion (e.g., M. tuberculosis, H. pylori) – frameshifts (e.g., N. meningitidis, S. pneumoniae) – some species have evolved systems to turn MMR on and off depending on conditions (e.g., E. coli)
    59. 59. TIGRTIGR Need for Phylogenomics Example: Gene Duplication and Loss • Genome analysis required to determine number of homologs in different species • Evolutionary analysis required to divide into orthology groups and identify gene duplications • Genome analysis is then required to determine presence and absence of orthologs • Then loss of orthologs can be traced onto evolutionary tree of species
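The last step above, tracing loss of orthologs onto the species tree, can be sketched as follows. This is a simplified illustration under stated assumptions: the gene is assumed to have arisen once and never been transferred, the species tree is a nested tuple, and the species names (`spA` to `spF`), topology, and presence set are made up.

```python
def leaves(tree):
    """Set of species names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def origin_clade(tree, present):
    """Smallest clade that contains every species carrying the gene."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaves(child):
            return origin_clade(child, present)
    return tree

def losses(tree, present):
    """Clades inside the origin clade in which no species has the gene.
    Each such clade is counted as one inferred loss event."""
    if isinstance(tree, str):
        return []
    found = []
    for child in tree:
        tips = leaves(child)
        if not (tips & present):
            found.append(sorted(tips))
        else:
            found.extend(losses(child, present))
    return found

# Hypothetical species tree and ortholog presence data.
species_tree = ((("spA", "spB"), "spC"), ("spD", ("spE", "spF")))
has_ortholog = {"spA", "spB", "spD"}

clade = origin_clade(species_tree, has_ortholog)
print(losses(clade, has_ortholog))
```

Absence outside the origin clade is not counted as loss, because under the single-origin assumption those species never had the gene.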
    60. 60. TIGRTIGR Uses of Phylogenomics VI: Specialization
    61. 61. TIGRTIGR Circular Maps
    62. 62. TIGRTIGR Species Distribution of Homologs of D. radiodurans Genes [Figure: histograms of frequency vs. number of species with high hits; panels labeled Papa Bear, Mama Bear, Baby Bear, and E. coli]
    63. 63. TIGRTIGR Specialized Genetic Elements (Chromosome II and Megaplasmid) • Many two component systems • Nitrogen metabolism • LexA • Ribonucleotide reductase • UvrA2 • Many transcription factors (e.g., HepA) • Iron metabolism
    64. 64. TIGRTIGR Uses of Phylogenomics VII: Genome Rearrangements
    65. 65. TIGRTIGR V. cholerae vs. E. coli All Hits [Dot plot: E. coli coordinates (0 to 5000000) vs. V. cholerae coordinates (0 to 3000000)]
    66. 66. TIGRTIGR V. cholerae vs. E. coli Top Hits [Dot plot: E. coli coordinates (0 to 5000000) vs. V. cholerae coordinates (0 to 3000000)]
    67. 67. TIGRTIGR V. cholerae vs. E. coli Only if EC-Orf is Closest in All Genomes [Dot plot: E. coli coordinates (0 to 5000000) vs. V. cholerae coordinates (0 to 3000000)]
    68. 68. TIGRTIGR V. cholerae vs. E. coli Proteins Top [Dot plot: V. cholerae ORF coordinates, 0 to 4000000]
    69. 69. TIGRTIGR S. pneumoniae vs. S. pyogenes DNA F+R [Dot plot: axis 0 to 2000000; plot label "BSP vs Spyo"]
    70. 70. TIGRTIGR M. tuberculosis vs. M. leprae DNA [Dot plot: axis 0 to 4000000; plot label "M1"]
    71. 71. TIGRTIGR Duplication and Gene Loss Model [Figure: an ancestral gene order A B C D E F is duplicated to give A B C D E F and A’ B’ C’ D’ E’ F’; different subsets of the two copies are then retained in E. coli and V. cholerae]
    72. 72. TIGRTIGR V. cholerae vs. E. coli Proteins Top [Dot plot: V. cholerae ORF coordinates, 0 to 4000000]
    73. 73. TIGRTIGR C. trachomatis vs. C. pneumoniae Dot Plot [Dot plot: C. trachomatis MoPn vs. C. pneumoniae AR39, with the origin and termination of replication marked]
    74. 74. TIGRTIGR [Figure 4: model of genome rearrangement in which the common ancestor of lineages A and B, with genes numbered 1 to 32, gives rise to genomes A1, A2, A3 and B1, B2, B3 through successive inversions around the origin and around the terminus of replication (inversions marked *)]
    75. 75. TIGRTIGR Uses of Phylogenomics VIII: Horizontal Gene Transfer and Species Evolution
    76. 76. TIGRTIGR Vertical Inheritance
    77. 77. TIGRTIGR Examples of Horizontal Transfers • Antibiotic resistance genes on plasmids • Insertion sequences • Pathogenicity islands • Toxin resistance genes on plasmids • Agrobacterium Ti plasmid • Viruses and viroids • Organelle to nucleus transfers
    78. 78. TIGRTIGR Why Gene Transfers Are Useful to Identify • Laterally transferred genes frequently involved in environmental adaptations and/or pathogenicity • Helps identify transposons, integrons, and other vectors of gene transfer • Helps identify species associations in the environment
    79. 79. TIGRTIGR Steps in Lateral Gene Transfer [Figure: steps numbered 1, 2, 3-5, and 6, drawn across lineages A, B, C, and D]
    80. 80. TIGRTIGR How to Infer Gene Transfers • Unusual distribution patterns • Unusual nucleotide composition • High sequence similarity to supposedly distantly related species • Unusual gene trees • Observe transfer events
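One of the signals above, unusual nucleotide composition, can be screened for with a sliding window. The sketch below is an illustration only: the window size, step, and z-score cutoff are arbitrary assumptions, GC content stands in for "composition" in general, and the test genome is synthetic.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def atypical_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc) for windows whose GC content deviates from the
    mean over all windows by more than z_cutoff standard deviations."""
    starts = range(0, len(genome) - window + 1, step)
    values = [(s, gc_fraction(genome[s:s + window])) for s in starts]
    gcs = [gc for _, gc in values]
    mu, sd = mean(gcs), pstdev(gcs)
    if sd == 0:
        return []
    return [(s, gc) for s, gc in values if abs(gc - mu) / sd > z_cutoff]

# Synthetic genome: an AT-rich background with a 1 kb GC-rich insert.
genome = "AT" * 5000 + "GC" * 500 + "AT" * 5000
flagged_starts = [start for start, _ in atypical_windows(genome)]
print(flagged_starts)
```

A flagged window is only a candidate: composition can also differ for reasons unrelated to transfer, so this kind of screen is best combined with the other lines of evidence in the list, such as gene trees and distribution patterns.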
    81. 81. TIGRTIGR E. coli and S. typhimurium Transfer [Figure: two panels, Old Model and New Model, each relating E. coli and S. typhimurium]
    82. 82. TIGRTIGR Archaeal genes in bacterial genomes* (bacterial species: best hits to Archaea) • Thermotoga maritima: 451 (24%) • Aquifex aeolicus: 246 (16%) • Synechocystis sp.: 126 (4%) • Borrelia burgdorferi: 45 (3.6%) • Escherichia coli: 99 (2.3%) * 10^-5 over 60% of sequence
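The footnote suggests a filter of the following kind. This is a hedged sketch, not the procedure used for the table: it assumes the 10^-5 figure is an E-value-style cutoff, that "over 60% of sequence" means alignment coverage of the query, and that hits arrive as `(gene, domain_of_subject, evalue, coverage)` tuples; all names and values are hypothetical.

```python
def best_hit_domains(hits, max_evalue=1e-5, min_coverage=0.60):
    """For each gene, report the domain of its best qualifying hit.
    Hits that fail either threshold are ignored entirely."""
    best = {}
    for gene, domain, evalue, coverage in hits:
        if evalue > max_evalue or coverage < min_coverage:
            continue
        if gene not in best or evalue < best[gene][1]:
            best[gene] = (domain, evalue)
    return {gene: domain for gene, (domain, _) in best.items()}

# Hypothetical search results for three genes of one bacterial genome.
hits = [
    ("g1", "Archaea", 1e-40, 0.90),
    ("g1", "Bacteria", 1e-20, 0.95),
    ("g2", "Archaea", 1e-50, 0.30),   # strong but covers too little
    ("g2", "Bacteria", 1e-10, 0.80),
    ("g3", "Archaea", 1e-3, 0.90),    # fails the significance cutoff
]

domains = best_hit_domains(hits)
archaeal_like = [g for g, d in domains.items() if d == "Archaea"]
print(domains, len(archaeal_like))
```

The coverage requirement keeps a single shared domain in a multi-domain protein from being counted as an archaeal-like gene.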
    83. 83. TIGRTIGR Evidence for lateral gene transfer in Thermotoga 1. 81 archaeal-like genes are clustered in 15 regions which range in size from ~4 to 20 kb; many share conserved gene order with their archaeal counterparts. 2. Many of the archaeal-like genes correspond to regions with a significantly different base composition than the rest of the chromosome. 3. Some of these regions are associated with a 30 bp repeat structure found only in thermophiles. 4. Initial phylogenetic analyses of some of these genes lend support to lateral gene transfer.
    84. 84. TIGRTIGR Region TM00987 - TM1003 (21 kb Archaea-like stretch) [Figure: map of Thermotoga ORFs 0987 to 1003, each shown with its archaeal, bacterial, or eukaryote homolog and a percentage label (48% to 79%); a transposon and a regulatory protein are marked]
    85. 85. TIGRTIGR Best Matches to Prokaryotes [Plot: axes "Orfs in Target Genome" (500 to 4500) and "Best Matches" (0 to 700); series: CAUCR, BACSU, ECOLI, MYCTU, SYNSP]
    86. 86. TIGRTIGR A. thaliana T1E2.8 is a Chloroplast Derived HSP60 [Figure: HSP60 phylogenetic tree with groups labeled Eukarya, Archaea, Bacteria, and Cyano/Cpst; the ARATH T1E2.8 sequence is starred]
    87. 87. TIGRTIGR Organellar HSP60s [Figure: HSP60 phylogenetic tree of sequences from DROME, ARATH, YEAST, and bacteria, with groups labeled Mitochondrial Forms, α-Proteo, Cyanobacteria, and Plastid Forms]
    88. 88. TIGRTIGR Evolution of Chromosome Partitioning Proteins (ParA) [Figure: ParA phylogeny with groups labeled Chromosomal, Plasmid and Phage, Chlamydial, Inc, Borrelia Plasmids, Archaea, and Misc; the V. cholerae sequences 2603.Vibch and A00900.Vib are starred]
    89. 89. TIGRTIGR Horizontal Gene Transfer II
    90. 90. TIGRTIGR Reconciling a Tree of Life in the Context of Lateral Gene Transfer
    91. 91. TIGRTIGR rRNA Tree of Complete Genomes [Figure: rRNA tree, scale bar 0.05 changes, with domains labeled Archaea, Bacteria, and Eukarya; taxa: Mycobacterium tuberculosis, Bacillus subtilis, Synechocystis sp., Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Pyrococcus horikoshii, Methanococcus jannaschii, Aeropyrum pernix, Aquifex aeolicus, Thermotoga maritima, Deinococcus radiodurans, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Campylobacter jejuni, Neisseria meningitidis, Escherichia coli, Vibrio cholerae, Haemophilus influenzae, Rickettsia prowazekii, Mycoplasma pneumoniae, Mycoplasma genitalium, Chlamydia trachomatis, Chlamydia pneumoniae]
    92. 92. TIGRTIGR Whole Genome Phylogeny
    93. 93. TIGRTIGR rRNA vs. Whole Genome Trees [Figure: tree of the same 27 completely sequenced species shown in the rRNA tree of complete genomes, scale bar 0.05 changes, with domains labeled Archaea, Bacteria, and Eukarya]
    94. 94. TIGRTIGR Outline of Phylogenomics [Diagram linking: Database, Species tree, Presence/Absence, Gene trees, Congruence, Evol. Distribution, Gene Evolution Events, F(x) Predictions, Phenotype Predictions, Pathway Evolution]
    95. 95. TIGRTIGR Evolutionary Genome Scanning • Distribution patterns/phylogenetic profiles • Patterns of evolution (ds/dn, correlations, constraints) • Lateral gene transfers (organellar genes, Pathogenicity islands) • Subdividing gene families • Functional predictions (gene trees, PG profiles) • Gene duplications • Gene loss • Specialization • Comparing close relatives • Species evolution
    96. 96. TIGRTIGR Evolutionary Diversity Still Poorly Represented in Complete Genomes [Figure: A. rRNA tree of Bacterial and Archaeal major groups; B. the same tree with groups with completed genomes highlighted]
    97. 97. TIGRTIGR True Phylogenetic Methods Work Best [Figure: two trees of the same MutS family sequences (MutS, MutS2, MSH1 to MSH6, GTBP, REP1, and related proteins), one built with UPGMA and one with Neighbor-Joining]
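A small worked example shows one well-known way that a clustering method such as UPGMA can differ from Neighbor-Joining when evolutionary rates are unequal. It is a toy illustration, not a reanalysis of the MutS data: the four-taxon tree, its branch lengths, and the function names are assumptions, and only the first joining step of each method is computed.

```python
from itertools import combinations

# Hypothetical true tree ((A,B),(C,D)); B and D evolve five times faster
# than A and C. Distances are exact path lengths on that tree.
tip_length = {"A": 1, "B": 5, "C": 1, "D": 5}
side = {"A": "left", "B": "left", "C": "right", "D": "right"}
internal_length = 1

taxa = sorted(tip_length)
dist = {}
for a, b in combinations(taxa, 2):
    crossing = internal_length if side[a] != side[b] else 0
    dist[frozenset((a, b))] = tip_length[a] + tip_length[b] + crossing

def upgma_first_join(dist, taxa):
    """UPGMA joins the pair with the smallest raw distance."""
    return min(combinations(taxa, 2), key=lambda pair: dist[frozenset(pair)])

def nj_first_join(dist, taxa):
    """Neighbor-Joining joins the pair minimizing
    Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j), where r is a row sum."""
    n = len(taxa)
    row_sum = {t: sum(dist[frozenset((t, u))] for u in taxa if u != t)
               for t in taxa}

    def q(pair):
        a, b = pair
        return (n - 2) * dist[frozenset(pair)] - row_sum[a] - row_sum[b]

    return min(combinations(taxa, 2), key=q)

print("UPGMA joins:", upgma_first_join(dist, taxa))
print("NJ joins:   ", nj_first_join(dist, taxa))
```

With these distances UPGMA groups the two slowly evolving taxa, A and C, which are not sisters on the true tree, while the rate correction in the Q criterion recovers a true sister pair.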
    98. 98. TIGRTIGR Acknowledgements • Genome duplications: S. Salzberg, J. Heidelberg, O. White, A. Stoltzfus, J. Peterson • Genome sequences and analysis: J. Heidelberg, T. Read, H. Tettelin, K. Nelson, J. Peterson, R. Fleischmann, D. Bryant • Horizontal transfers: K. Nelson, W. F. Doolittle • TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul, Seqcore • $$$: DOE, NSF, NIH, ONR
    99. 99. TIGRTIGR Evolutionary Diversity Still Poorly Represented in Complete Genomes [Figure: A. rRNA tree of Bacterial and Archaeal major groups; B. the same tree with groups with completed genomes highlighted]
    100. 100. TIGRTIGR
    101. 101. TIGRTIGR TIGR: J. Heidelberg, T. Read, S. Kaul, M-I Benito, J. C. Venter, C. Fraser, S. Salzberg, O. White, K. Nelson, H. Tettelin • Other people: Mom and Dad, S. Karlin, M. Feldman, A. M. Campbell, R. Fernald, R. Shafer, D. Ackerly, D. Goldstein, M. Eisen, J. Courcelle, R. Myers, C. M. Cavanaugh, P. Hanawalt • $$$: NSF, ONR, DOE, NIH
    102. 102. TIGRTIGR Uses of Phylogenomics IX: Evolution Within Species
    103. 103. TIGRTIGR M. tuberculosis strain phylogeny (Indels)
    104. 104. TIGRTIGR Musser-Type Evolution (Indel Phylogeny) [Figure: indel-based phylogeny of numbered M. tuberculosis isolates, including CDC1551 and H37Rv]
    105. 105. TIGRTIGR Consistency Indices (Indel Phylogeny) [Plot: consistency index (CI, 0.00 to 1.00), calculated over stored trees, for characters 1 to 20; series: maximum, average, minimum]
    106. 106. TIGRTIGR
    107. 107. TIGRTIGR Phylogenomics I: Presence/Absence of Homologs • Important to have complete genomes • Similarity searches with high “homology threshold” (to prevent false positives) • Iterative searches (to prevent false negatives) • Multiple sequence alignments to confirm assignment of homology and to divide up multi-domain proteins
    108. 108. TIGRTIGR Phylogenomics II: Phylogenetic Analysis of Homologs • Multiple sequence alignment • Mask alignment (exclude certain regions) – ambiguous regions of alignment – hypervariable regions and regions with large gaps • Phylogenetic tree with method of choice • Robustness checks – bootstrapping – compare trees with different alignments – compare trees with different tree-building methods
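The masking and bootstrapping steps above can be sketched as follows. This is a minimal illustration under assumptions: the alignment is a dict of equal-length strings, masking uses a simple gap-fraction rule (real masks also drop ambiguous or hypervariable columns), and the tree-building step that would be run on each replicate is left out.

```python
import random

def mask_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns in which more than max_gap_fraction of the
    sequences have a gap character."""
    n = len(alignment)
    length = len(next(iter(alignment.values())))
    keep = [i for i in range(length)
            if sum(seq[i] == "-" for seq in alignment.values()) / n
            <= max_gap_fraction]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}

def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement to build one pseudo-alignment of
    the same length; every sequence gets the same sampled columns."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment; column 2 is all gaps and gets masked.
alignment = {"a": "AC-GT", "b": "AC-GA", "c": "A--GT"}
masked = mask_columns(alignment)
rng = random.Random(0)  # fixed seed so the example is repeatable
replicates = [bootstrap_replicate(masked, rng) for _ in range(100)]
print(masked)
```

Bootstrap support for a clade is then the fraction of replicate trees that contain it, computed with whichever tree-building method was chosen for the original alignment.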
    109. 109. TIGRTIGR Phylogenomics III: Inferring Evolutionary Events • Infer evolutionary distribution patterns (overlay presence/absence onto species tree) • Compare gene tree vs. species tree • Compare gene tree vs. evolutionary distribution • Infer gene duplication and transfer events • Combine gene transfer and duplication information with evolutionary distribution analysis to infer gene loss, gene origin, and timing of gene duplications and transfers
    110. 110. TIGRTIGR Phylogenomics IV: Functional Predictions and Evolution • Overlay experimentally determined functions onto gene tree • Infer changes in function – many changes suggest that caution should be used in making new predictions • Predict functions based on position in tree relative to genes with known functions and based on orthology groups
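The prediction rule above can be sketched in a deliberately conservative form: a function is transferred only within a subfamily whose characterized members agree, and no prediction is made otherwise. The data structures, the function name, and the uncharacterized genes `newA` and `newB` are hypothetical; the subfamilies are assumed to have been read off a gene tree beforehand.

```python
def predict_function(query, subfamilies, known_functions):
    """Return (subfamily, predicted_function). The prediction is None when
    the query's subfamily has no characterized member or its characterized
    members disagree."""
    for name, members in subfamilies.items():
        if query in members:
            observed = {known_functions[m] for m in members
                        if m != query and m in known_functions}
            if len(observed) == 1:
                return name, observed.pop()
            return name, None
    return None, None

# Illustrative subfamily assignments taken from a hypothetical gene tree.
subfamilies = {
    "MutS1": {"MutS.Ecoli", "MutS.Bacsu", "newA"},
    "MutS2": {"MutS2.Bacsu", "MutS2.Synsp", "newB"},
}
# Only experimentally characterized genes belong in this table.
known_functions = {"MutS.Ecoli": "mismatch repair"}

print(predict_function("newA", subfamilies, known_functions))
print(predict_function("newB", subfamilies, known_functions))
```

Declining to predict for `newB` is the point of the design: a top database hit to a characterized MutS would suggest mismatch repair, but the gene sits in a subfamily with no characterized member.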
    111. 111. TIGRTIGR Phylogenomics V: Pathway Analysis • Correlated presence/absence of all genes in pathway in different species? – If not, maybe non-orthologous gene displacement – Alternatively, pathway may be different between species • Correlated evolutionary events for genes in pathway – loss of all genes at once – correlated duplications? • Compare evolution of function between pathways – The number of times an activity has evolved helps in making predictions of function/phenotype
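The first question above, whether all genes in a pathway are present or absent together, can be checked directly from presence/absence profiles. The sketch below uses made-up gene and species names and assumes profiles are stored as a mapping from each gene to the set of species that encode an ortholog.

```python
def pathway_status(profiles, species):
    """Classify each species as 'complete', 'partial', or 'absent' for a
    pathway, given {gene: set of species with an ortholog}."""
    status = {}
    for sp in species:
        present = [gene for gene, where in profiles.items() if sp in where]
        if len(present) == len(profiles):
            status[sp] = "complete"
        elif present:
            status[sp] = "partial"
        else:
            status[sp] = "absent"
    return status

def missing_genes(profiles, sp):
    """Pathway genes with no ortholog in the given species."""
    return sorted(gene for gene, where in profiles.items() if sp not in where)

# Hypothetical three-gene pathway across four species.
profiles = {
    "geneX": {"sp1", "sp2", "sp3"},
    "geneY": {"sp1", "sp2"},
    "geneZ": {"sp1", "sp2"},
}
species = ["sp1", "sp2", "sp3", "sp4"]

status = pathway_status(profiles, species)
print(status)
print(missing_genes(profiles, "sp3"))
```

Species flagged as "partial" are the ones worth a closer look, since they are the candidates for non-orthologous gene displacement or for a pathway that differs between species.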
    112. 112. TIGRTIGR Steps in Phylogenomic Analysis • Create database of genes of interest • Presence/absence of homologs in complete genomes • Phylogenetic trees of each gene family • Infer evolutionary events (gene origin, duplication, loss and transfer) • Refine presence/absence (orthologs, paralogs, subfamilies) • Functional predictions and functional evolution • Analysis of pathways
    113. 113. TIGRTIGR Evolution as a Screening Method • Gene duplications • Gene loss • Lateral gene transfers • Organellar genes • Structurally constrained genes • Correlated evolutionary changes
    114. 114. TIGRTIGR Evolutionary Genome Scanning • Distribution patterns/phylogenetic profiles • Patterns of evolution – (ds/dn) – Structurally constrained genes – Correlated evolutionary changes • Lateral gene transfers – Organellar genes – Pathogenicity islands • Subdividing gene families – Orthologs vs paralogs – Functional predictions – Subfamilies – Motif identification • Gene duplications • Gene loss
    115. 115. TIGRTIGR Genome Sequences Allow “Hypothesisless Research” • DNA microarrays • Proteomics • GC skew and other nucleotide composition analyses • Parallel genome wide genetic experiments • Evolutionary genome scanning • Phylogenetic profiles