Evolutionary Genome Scanning - talk by J. Eisen in 2000 at MBL Molecular Evolution Course

Evolutionary Genome Scanning - talk by J. Eisen in 2000 at MBL Molecular Evolution Course.

Published in: Science

  1. Topics of Discussion
     • Introduction to phylogenomics
     • Phylogenomics examples
       – Functional prediction
       – Identifying “unusual” genes in genomes
       – Gene duplication
       – Genetic exchange within genomes
       – Gene loss
       – Horizontal gene transfer
       – Specialization
       – Comparing close relatives
       – Species evolution
  2. “Nothing in biology makes sense except in the light of evolution.” T. Dobzhansky (1973)
  3. [Figure only; no slide text]
  4. Uses of Evolutionary Analysis in Molecular Biology
     • Identification of mutation patterns (e.g., ts/tv ratio)
     • Amino-acid/nucleotide substitution patterns useful in structural studies (e.g., rRNA)
     • Sequence searching matrices (e.g., PAM, BLOSUM)
     • Motif analysis (e.g., Blocks)
     • Functional predictions
     • Classifying multigene families
     • Evolutionary history puts other information into perspective (e.g., duplications, gene loss)
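The ts/tv ratio mentioned on this slide can be illustrated with a minimal sketch. It assumes two DNA sequences that are already aligned to the same length; the function names and the toy sequences are hypothetical and do not come from the talk.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def ts_tv_counts(seq1, seq2):
    """Count transitions (ts) and transversions (tv) between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip identical sites, gaps, and ambiguity codes
        # A<->G and C<->T stay within one chemical class: transitions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv

def ts_tv_ratio(seq1, seq2):
    """Return ts/tv, or infinity when no transversions are observed."""
    ts, tv = ts_tv_counts(seq1, seq2)
    return ts / tv if tv else float("inf")

# Hypothetical toy alignment
s1 = "ACGTACGTAC"
s2 = "GCATACGTTT"
print(ts_tv_counts(s1, s2), ts_tv_ratio(s1, s2))
```

This counts raw pairwise differences only; it makes no correction for multiple substitutions at the same site.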
  5. Phylogenomic Analysis
     Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.
  6. Why use Phylogenomics?
     • Evolutionary information improves genome analysis
       – Classification of multigene families
       – Predicting functions
       – Origins of genes and pathways
     • Genomic information improves evolutionary reconstructions
       – More sequences of genes
       – Unbiased sampling
       – Presence/absence needed to infer certain events
     • Feedback loop between the two types of analysis
  7. [Figure only; no slide text]
  8. Uses of Phylogenomics I: Functional Predictions
  9. Predicting Function
     • Identification of motifs
     • Homology/similarity based methods
       – Highest hit
       – Top hits
       – Clusters of orthologous groups
       – HMMs
       – Structural threading and modeling
       – Evolutionary reconstructions
  10. Types of Molecular Homology
      • Homologs: genes that are descended from a common ancestor (e.g., all globins)
      • Orthologs: homologs that have diverged after speciation events (e.g., human and chimp β-globins)
      • Paralogs: homologs that have diverged after gene duplication events (e.g., α and β globin)
      • Xenologs: homologs that have diverged after lateral transfer events
      • Positional homology: common ancestry of specific amino acid or nucleotide positions in different genes
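The definitions above can be read as a rule on a gene tree: the relationship between two homologs follows from the event at their most recent common ancestor. The sketch below assumes a rooted gene tree whose internal nodes are already labeled with an event; the tuple encoding and the gene names are hypothetical.

```python
# A node is either a leaf name (str) or a tuple (event, left_subtree, right_subtree),
# where event is "speciation", "duplication", or "transfer".
RELATION = {
    "speciation": "orthologs",
    "duplication": "paralogs",
    "transfer": "xenologs",
}

def leaves(node):
    """Return the leaf names under a node."""
    if isinstance(node, str):
        return [node]
    _event, left, right = node
    return leaves(left) + leaves(right)

def relationship(tree, gene_a, gene_b):
    """Classify two distinct leaves by the event at their last common ancestor."""
    if isinstance(tree, str):
        raise ValueError("both genes must be distinct leaves of the tree")
    event, left, right = tree
    for child in (left, right):
        names = leaves(child)
        if gene_a in names and gene_b in names:
            return relationship(child, gene_a, gene_b)  # ancestor lies deeper
    names = leaves(left) + leaves(right)
    if gene_a not in names or gene_b not in names:
        raise ValueError("gene not found in tree")
    return RELATION[event]

# Toy globin tree following the slide's examples (labels are illustrative)
globins = (
    "duplication",
    ("speciation", "alpha_human", "alpha_chimp"),
    ("speciation", "beta_human", "beta_chimp"),
)
print(relationship(globins, "beta_human", "beta_chimp"))   # orthologs
print(relationship(globins, "alpha_human", "beta_human"))  # paralogs
```

In practice the event labels are themselves inferred, for example by reconciling the gene tree with a species tree; here they are simply given.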
  11. [Figure only; no slide text]
  12. BLAST Search of H. pylori “MutS”
      Sequences producing significant alignments:            Score (bits)   E value
      sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN       117            3e-25
      sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN       69             1e-10
      sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN       64             3e-09
      sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN       62             2e-08
      sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN       57             4e-07
      sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN       57             4e-07
      • BLAST search pulls up Synechocystis sp. MutS#2 with a much more significant score (E value 3e-25) than the other MutS homologs (1e-10 or weaker)
  13. H. pylori and MutS
      • Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog
      • Experimental studies have shown MutS and MutL always work together in mismatch repair
      • Problem: what do we conclude about H. pylori mismatch repair?
  14. Table 3. Presence of MutS Homologs in Complete Genome Sequences
      Species                                        # of MutS homologs
      Bacteria
        Escherichia coli K12                         1
        Haemophilus influenzae Rd KW20               1
        Neisseria gonorrhoeae                        1
        Helicobacter pylori 26695                    1
        Mycoplasma genitalium G-37                   0
        Mycoplasma pneumoniae M129                   0
        Bacillus subtilis 168                        2
        Streptococcus pyogenes                       2
        Synechocystis sp. PCC6803                    2
        Treponema pallidum Nichols                   1
        Borrelia burgdorferi B31                     2
        Aquifex aeolicus                             2
        Deinococcus radiodurans R1                   2
      Archaea
        Archaeoglobus fulgidus VC-16, DSM4304        0
        Methanococcus jannaschii DSM 2661            0
        Methanobacterium thermoautotrophicum ΔH      1
      Eukaryotes
        Saccharomyces cerevisiae                     6
        Homo sapiens                                 5
  15. MutS Alignment
      [Figure: multiple sequence alignment of conserved regions I–IV of MutS family proteins. Sequences shown: MSH6 (yeast, mouse), MSH3 (human, yeast), MutS (Aquae, Bacsu, Synsp), MSH1 (S. pombe, yeast), MSH2 (human, yeast), MutS2 (Synsp, Aquae, Bacsu), MSH4 (human, yeast), MSH5 (human, yeast)]
  16. Phylogenetic Tree of MutS Family
      [Figure: tree of MutS homologs; tips are labeled by species abbreviation: Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco]
  17. MutS Subfamilies
      [Figure: the same tree with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]
  18. MutS Subfamilies
      • MutS1: Bacterial MMR
      • MSH1: Euk - mitochondrial MMR
      • MSH2: Euk - all MMR in nucleus
      • MSH3: Euk - loop MMR in nucleus
      • MSH6: Euk - base:base MMR in nucleus
      • MutS2: Bacterial - function unknown
      • MSH4: Euk - meiotic crossing-over
      • MSH5: Euk - meiotic crossing-over
  19. Table 3. Presence of MutS Homologs in Complete Genome Sequences
      Species                                        # of MutS homologs   Which subfamilies?
      Bacteria
        Escherichia coli K12                         1                    MutS1
        Haemophilus influenzae Rd KW20               1                    MutS1
        Neisseria gonorrhoeae                        1                    MutS1
        Helicobacter pylori 26695                    1                    MutS2
        Mycoplasma genitalium G-37                   0                    -
        Mycoplasma pneumoniae M129                   0                    -
        Bacillus subtilis 168                        2                    MutS1, MutS2
        Streptococcus pyogenes                       2                    MutS1, MutS2
        Synechocystis sp. PCC6803                    2                    MutS1, MutS2
        Treponema pallidum Nichols                   1                    MutS1
        Borrelia burgdorferi B31                     2                    MutS1, MutS2
        Aquifex aeolicus                             2                    MutS1, MutS2
        Deinococcus radiodurans R1                   2                    MutS1, MutS2
      Archaea
        Archaeoglobus fulgidus VC-16, DSM4304        0                    -
        Methanococcus jannaschii DSM 2661            0                    -
        Methanobacterium thermoautotrophicum ΔH      1                    MutS2
      Eukaryotes
        Saccharomyces cerevisiae                     6                    MSH1-6
        Homo sapiens                                 5                    MSH2-6
  20. Overlaying Functions onto Tree
      [Figure: the MutS family tree with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]
  21. Functional Prediction Using Tree
      [Figure: the MutS family tree with a function assigned to each subfamily]
      • MSH1: Repair in Mitochondria
      • MSH2: Repair of Loops and Mismatches in Nucleus
      • MSH3: Repair of Loops in Nucleus
      • MSH6: Repair of Mismatches in Nucleus
      • MutS1: Repair of Loops and Mismatches
      • MSH4: Meiotic Crossing-Over
      • MSH5: Meiotic Crossing-Over
      • MutS2: Unknown Functions
  22. Table 3. Presence of MutS Homologs in Complete Genome Sequences
      Species                                        # of MutS homologs   Which subfamilies?   MutL homologs
      Bacteria
        Escherichia coli K12                         1                    MutS1                1
        Haemophilus influenzae Rd KW20               1                    MutS1                1
        Neisseria gonorrhoeae                        1                    MutS1                1
        Helicobacter pylori 26695                    1                    MutS2                -
        Mycoplasma genitalium G-37                   -                    -                    -
        Mycoplasma pneumoniae M129                   -                    -                    -
        Bacillus subtilis 168                        2                    MutS1, MutS2         1
        Streptococcus pyogenes                       2                    MutS1, MutS2         1
        Mycobacterium tuberculosis                   -                    -                    -
        Synechocystis sp. PCC6803                    2                    MutS1, MutS2         1
        Treponema pallidum Nichols                   1                    MutS1                1
        Borrelia burgdorferi B31                     2                    MutS1, MutS2         1
        Aquifex aeolicus                             2                    MutS1, MutS2         1
        Deinococcus radiodurans R1                   2                    MutS1, MutS2         1
      Archaea
        Archaeoglobus fulgidus VC-16, DSM4304        -                    -                    -
        Methanococcus jannaschii DSM 2661            -                    -                    -
        Methanobacterium thermoautotrophicum ΔH      1                    MutS2                -
      Eukaryotes
        Saccharomyces cerevisiae                     6                    MSH1-6               3+
        Homo sapiens                                 5                    MSH2-6               3+
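Once the MutS homologs are split into subfamilies, the pattern in this table can be checked mechanically. The sketch below encodes the bacterial and archaeal rows by hand (strain designations dropped) and tests whether MutS1 and MutL always occur together; the data structure and function names are hypothetical.

```python
# species -> (MutS subfamilies present, MutL homolog present), transcribed from Table 3
TABLE = {
    "Escherichia coli": (("MutS1",), True),
    "Haemophilus influenzae": (("MutS1",), True),
    "Neisseria gonorrhoeae": (("MutS1",), True),
    "Helicobacter pylori": (("MutS2",), False),
    "Mycoplasma genitalium": ((), False),
    "Mycoplasma pneumoniae": ((), False),
    "Bacillus subtilis": (("MutS1", "MutS2"), True),
    "Streptococcus pyogenes": (("MutS1", "MutS2"), True),
    "Mycobacterium tuberculosis": ((), False),
    "Synechocystis sp.": (("MutS1", "MutS2"), True),
    "Treponema pallidum": (("MutS1",), True),
    "Borrelia burgdorferi": (("MutS1", "MutS2"), True),
    "Aquifex aeolicus": (("MutS1", "MutS2"), True),
    "Deinococcus radiodurans": (("MutS1", "MutS2"), True),
    "Archaeoglobus fulgidus": ((), False),
    "Methanococcus jannaschii": ((), False),
    "Methanobacterium thermoautotrophicum": (("MutS2",), False),
}

def muts1_tracks_mutl(table):
    """True if every genome has MutS1 exactly when it has MutL."""
    return all(("MutS1" in subs) == has_mutl for subs, has_mutl in table.values())

def muts2_only(table):
    """Genomes with a MutS homolog but no MutS1."""
    return sorted(name for name, (subs, _) in table.items()
                  if subs and "MutS1" not in subs)

print(muts1_tracks_mutl(TABLE))  # True for the rows above
print(muts2_only(TABLE))
```

On these rows, the two genomes with a MutS homolog but no MutL are exactly the two that encode only MutS2.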
  23. Why was the MutS2 Family Missed?
      BLAST Search of Synechocystis sp. MutS#2
      Sequences producing significant alignments:              Score (bits)   E value
      sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT     91             3e-17
      sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN       87             4e-16
      sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS    83             1e-14
      sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS    81             3e-14
      sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS    81             4e-14
      sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA    80             5e-14
      • BLAST search pulls up standard MutS genes, but with only moderate significance (E value about 10^-17)
  24. Problems with Similarity-Based Functional Prediction
      Similarity-based methods:
      • Are prone to database error propagation
      • Cannot identify orthologous groups reliably
      • Perform poorly in cases of evolutionary rate variation and non-hierarchical trees (similarity will not reflect evolutionary relationships in these cases)
      • May be misled by modular proteins or large insertion/deletion events
      • Are not set up to deal with expanding data sets
  25. Non-hierarchical Tree
      [Figure: tree of six sequences, labeled 1–6]
  26. Evolutionary Rate Variation
      [Figure: tree of six sequences, labeled 1–6]
  27. Rate Variation and Duplication
      [Figure: gene tree for Species 1, 2, and 3 in which a duplication produces two paralog groups, A and B (genes 1A, 2A, 3A and 1B, 2B, 3B)]
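A small numerical sketch shows how rate variation after a duplication can make the highest-scoring hit a paralog. The tree shape follows the slide (paralog groups A and B in three species), but every branch length is a hypothetical value chosen so that genes 2A and 3A evolve quickly; tree path length stands in for sequence dissimilarity.

```python
# child -> (parent, branch length); all lengths are hypothetical
TREE = {
    "1A": ("A12", 1.0), "2A": ("A12", 6.0), "A12": ("A", 1.0), "3A": ("A", 6.0),
    "1B": ("B12", 1.0), "2B": ("B12", 1.0), "B12": ("B", 1.0), "3B": ("B", 1.0),
    "A": ("dup", 1.0), "B": ("dup", 1.0),   # "dup" is the duplication node (root)
}
GENES = ["1A", "2A", "3A", "1B", "2B", "3B"]

def distances_to_ancestors(node):
    """Map each ancestor of node (including node itself) to its path length from node."""
    result, total = {node: 0.0}, 0.0
    while node in TREE:
        parent, length = TREE[node]
        total += length
        result[parent] = total
        node = parent
    return result

def patristic(a, b):
    """Path length between two tips through their last common ancestor."""
    da, db = distances_to_ancestors(a), distances_to_ancestors(b)
    return min(da[x] + db[x] for x in da if x in db)

def top_hit(query):
    """Most similar other gene, using path length as a proxy for dissimilarity."""
    return min((g for g in GENES if g != query), key=lambda g: patristic(query, g))

for gene in GENES[1:]:
    print(gene, patristic("1A", gene))
print("top hit for 1A:", top_hit("1A"))
```

With these lengths the closest sequence to 1A is 3B (distance 5), while its orthologs 2A and 3A sit at distances 7 and 8. A highest-hit rule would therefore assign 1A to group B, whereas the tree places it in group A.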
  28. Alkylation Repair Genes
      [Figure: domain architecture of alkylation repair proteins]
      • Domains: AlkA domain (O6-Me-G glycosylase); Ogt domain (O6-Me-G alkyltransferase); Ada domain (transcriptional regulator)
      • Proteins shown: Ada (E. coli, H. influenzae); Ogt (E. coli, H. influenzae, Gram+, D. radiodurans); AlkA (Gram+, E. coli); MGMT (eukaryotes)
  29. Evolutionary Method: Phylogenetic Prediction of Gene Function
      [Figure: flowchart of the method, worked through for Example A and Example B]
      • Choose gene(s) of interest
      • Identify homologs
      • Align sequences
      • Calculate gene tree
      • Overlay known functions onto tree
      • Infer likely function of gene(s) of interest
      The figure also shows the actual evolution (assumed to be unknown), marks inferred events as “Duplication?”, and marks one prediction as “Ambiguous”.
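The last two steps of the method (overlay known functions, then infer the function of the gene of interest) can be sketched with a simple parsimony rule. This is an illustrative stand-in, not necessarily the procedure used in the talk: it applies a Fitch-style bottom-up pass followed by a simplified top-down pass on a rooted binary gene tree. The tree encoding, gene names, and function labels are hypothetical.

```python
# A node is a leaf name (str) or a pair (left_subtree, right_subtree).

def down_pass(node, known, all_states, sets):
    """Bottom-up pass: candidate function set for every node."""
    if isinstance(node, str):
        # genes without a known function may take any observed function
        s = {known[node]} if node in known else set(all_states)
    else:
        left = down_pass(node[0], known, all_states, sets)
        right = down_pass(node[1], known, all_states, sets)
        s = (left & right) or (left | right)  # intersection if non-empty, else union
    sets[node] = s
    return s

def up_pass(node, parent_final, sets, final):
    """Top-down pass (simplified): narrow each set using the parent's final set."""
    s = sets[node]
    f = s if parent_final is None else ((s & parent_final) or s)
    final[node] = f
    if not isinstance(node, str):
        for child in node:
            up_pass(child, f, sets, final)

def predict_function(tree, known, query):
    """Return the set of functions parsimony allows for the query gene."""
    sets, final = {}, {}
    down_pass(tree, known, set(known.values()), sets)
    up_pass(tree, None, sets, final)
    return final[query]

# Example: query 2A falls inside a clade whose characterized members share function X
tree_a = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_a = {"1A": "X", "3A": "X", "1B": "Y", "2B": "Y", "3B": "Y"}
print(predict_function(tree_a, known_a, "2A"))   # {'X'}

# Example: query Q branches off before the split between the X and Y clades
tree_b = ("Q", tree_a)
known_b = dict(known_a, **{"2A": "X"})
print(predict_function(tree_b, known_b, "Q"))    # both X and Y: ambiguous
```

A result with more than one function corresponds to the “Ambiguous” outcome in the figure: the tree alone does not decide, and no prediction should be made.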
  30. [Figure: MutS family tree in four panels]
      A. Gene tree with full gene names (e.g., MutS.Aquae, MSH2.Human, MSH6.Yeast, MutS.Helpy)
      B. and C. Trees labeled by species, with subfamilies marked: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2
      D. Functions overlaid on the tree:
         • MSH4: Segregation & Crossover
         • MSH5: Segregation & Crossover
         • MutS1: All MMR (Bacteria)
         • MSH1: MMR in Mitochondria
         • MSH2: All MMR in Nucleus
         • MSH3: MMR of Large Loops in Nucleus
         • MSH6: MMR of Mismatches and Small Loops in Nucleus
  31. Clustering vs. Neighbor-joining
      [Figure: two trees of the same MutS family sequences (MutS, MutS2, MSH1–MSH6 and related genes), one built by UPGMA clustering and one by neighbor-joining]
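The difference between the two methods already shows in their first step. UPGMA joins the pair with the smallest distance, whereas neighbor-joining picks the pair that minimizes a rate-corrected criterion, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the sum of a taxon's distances to all others. The sketch below compares only that first join; the four taxa and their distances are hypothetical, computed from a tree ((A,B),(C,D)) in which A and C have long branches.

```python
from itertools import combinations

TAXA = ["A", "B", "C", "D"]
# Path distances on the unrooted tree ((A:5, B:1):1, (C:5, D:1)); hypothetical values
DIST = {
    frozenset("AB"): 6, frozenset("AC"): 11, frozenset("AD"): 7,
    frozenset("BC"): 7, frozenset("BD"): 3, frozenset("CD"): 6,
}

def first_join_upgma(dist, taxa):
    """UPGMA's first step: join the closest pair."""
    return min(combinations(taxa, 2), key=lambda p: dist[frozenset(p)])

def first_join_nj(dist, taxa):
    """Neighbor-joining's first step: join the pair with the smallest Q value."""
    n = len(taxa)
    r = {t: sum(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - r[i] - r[j]

    return min(combinations(taxa, 2), key=q)

print("UPGMA joins:", first_join_upgma(DIST, TAXA))
print("NJ joins:   ", first_join_nj(DIST, TAXA))
```

Here UPGMA pairs the two slowly evolving sequences B and D, which are not neighbors on the generating tree, while neighbor-joining selects a true neighbor pair.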
  32. Deinococcus radiodurans
  33. UvrA Gene Family
      • UvrA has a conserved role in nucleotide excision repair in bacteria (part of the UvrABCD complex)
      • UvrA homologs are found in all complete bacterial genomes
      • Some UvrA homologs have been found to be involved in resistance to DNA-damaging antibiotics
      • UvrA accumulates at the membrane after DNA damage
      • All UvrAs are members of the ABC transporter family
      • Possible role in DNA damage export?
  34. UvrAs in D. radiodurans
      • UvrA homolog in D. radiodurans shown to be part of UV endonuclease α complex
      • D. radiodurans genome sequence reveals a second UvrA gene, on the large megaplasmid
      • D. radiodurans is known to export DNA repair products (e.g., damaged bases) out of the cell after damage
      • Export may be important for radiation resistance (Battista 1997)
  35. UvrA Evolution
      • Originated by gene duplication of an ABC transporter
      • Subsequently, there was a tandem duplication of the ABC transporter motif within UvrA
      • Ancient duplication into UvrA1 and UvrA2 subfamilies
      • UvrA1s: conserved role in NER
      • UvrA2s: transport of DNA damage?
      • UvrA2 in D. radiodurans may be from lateral transfer
  36. Evolution of UvrA Family
      [Figure: two trees]
      A. ABC Transporters: UvrA1, UvrA2, OppDF, UUP, NodI, LivF, XylG, NrtDC, PstB, MDR, HlyB, TAP1, CFTR, SUR
      B. UvrA Subfamily, with the duplication in the UvrA family marked
         • UvrA2: UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans
         • UvrA1: UvrA from H. influenzae, E. coli, N. gonorrhoeae, R. prowazekii, S. mutans, S. pyogenes, S. pneumoniae, B. subtilis, M. luteus, M. tuberculosis, M. thermoautotrophicum, H. pylori, C. jejuni, P. gingivalis, C. tepidum, T. thermophilus, T. pallidum, B. burgdorferi, T. maritima, A. aeolicus, Synechocystis sp., and UvrA1 from D. radiodurans
  37. UvrA Evolution
      [Figure: diversification of the ABC family. A tandem duplication of the ABC domain (ABC1, ABC2) gives UvrA with N- and C-terminal halves (UvrAN, UvrAC); a later gene duplication gives UvrA1 (UvrA1N, UvrA1C) and UvrA2 (UvrA2N, UvrA2C)]
  38. Three Photolyase Homologs in V. cholerae
      [Figure: tree of the photolyase gene family; the three V. cholerae homologs (ORFA00965, ORF02295.Vibch, ORF01792.Vibch) are marked with asterisks]
      Group labels (in slide order): MTHF type, Class I CPD Photolyases, 6-4 Photolyases, Blue Light Receptors, 8-HDF type, CPD Photolyases
  39. MFS phylogenetic tree
      [Figure: tree of major facilitator superfamily (MFS) transporters, e.g., Bmr Bsu, TetB Eco, AraE Eco, LacY Eco, UhpT Eco]
      Family labels: OFA, OHS, NHS, OPA, PHS, SHS, MHS, SP, ACS, NNP, DHA14, DHA12, UMF, FGHS, MCP
  40. Uses of Phylogenomics II: Knowing When Not to Predict Functions
  41. DNA Repair Genes in D. radiodurans Complete Genome
      Process                        Genes in D. radiodurans
      Nucleotide excision repair     UvrABCD, UvrA2
      Base excision repair           AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
      AP endonuclease                Xth
      Mismatch excision repair       MutS, MutL
      Recombination
        Initiation                   RecFJNRQ, SbcCD, RecD
        Recombinase                  RecA
        Migration and resolution     RuvABC, RecG
      Replication                    PolA, PolC, PolX, phage Pol
      Ligation                       DnlJ
      dNTP pools, cleanup            MutTs, RRase
      Other                          LexA, RadA, HepA, UVDE, MutS2
  42. Recombination Genes in Genomes
      Columns are genomes, grouped as Bacteria, Archaea, and Euks; the individual species labels are not preserved in the transcript.
      Pathway / Protein name(s)
      Initiation
        RecBCD pathway
          RecB            + + - - - - - - + + - + - - - - - - - -
          RecC            + + - - - - - - + ±+ - ± - - - - - - - -
          RecD            + + - - ± - - - + ±+ - ++ - ± ±+ - - - - -
        RecF pathway
          RecF            + + + - + - - + + - + ± - - + - - ± ± ±
          RecJ            + + + + + - - + - + + + + + + - - - - -
          RecO            + + - - + - - + + - - - - - ± - - - - -
          RecR            + + + ±+ + - - + + - + + - + + - - - - -
          RecN            + + + + + - - + + - + - ± + + - - ± ± -
          RecQ            + + - - + - - + - - + - - - + - - - - + ++
        RecE pathway
          RecE/ExoVIII    + - - - - - - - - - - - - - - - - - - -
          RecT            + - - - + - - - - - - - - - - - - - - -
        SbcBCD pathway
          SbcB/ExoI       + + - - - - - - - - - - - - - - - - - -
          SbcC            + - - - + - - + - + + - + + + ± ± ± ± ± ±
          SbcD            + - - - + - - + - + + - + + + ± ± ± ± ± ±
        AddAB pathway
          AddA/RexA       - - + - + - - - - - + + - ± - - - - - -
          AddB/RexB       - - - - + - - - - - - - - - - - - - - -
        Rad52 pathway
          Rad52, Rad59    - - - - - - - - - - - - - - - - - - - ++ +
          Mre11/Rad32     ± - - - ± - - ± - ± ± - ± ± ± + + + + + +
          Rad50           ± - - - ± - - ± - ± ± - ± ± ± + + + ± + +
      Recombinase
          RecA, Rad51     + + + + + + + + + + + + + + + + + + + ++ ++
      Branch migration
          RuvA            + + + + + + + + + + + + + - + - - - - -
          RuvB            + + + + + + + + + + + + + - + - - - - -
          RecG            + + + + + - - + + + + - + + + - - - - -
      Resolvases
          RuvC            + + + + - - - + + - + + + - + - - - - -
          RecG            + + + + + - - + + + + - + + + - - - - -
          Rus             + - - - - - - - ±+ - - - - ±+ - - - - - -
          CCE1            - - - - - - - - - - - - - - - - - - - +
      Other recombination proteins
          Rad54           - - - - - - - - - - - - - - - - - - - + +
          Rad55           - - - - - - - - - - - - - - - - - - - + +
          Rad57           - - - - - - - - - - - - - - - - - - - + +
          Xrs2            - - - - - - - - - - - - - - - - - - - +
  43. Unusual Features of D. radiodurans DNA Repair Genes
      Process                        Genes
      Nucleotide excision repair     Two UvrAs
      Base excision repair           Four MutY-Nths
      Recombination                  RecD but not RecBC
      Replication                    Four Pol genes
      dNTP pools                     Many MutTs, two RRases
      Other                          UVDE
  44. Problem: the list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from that of other bacterial genomes of similar size
  45. Repair Studies in Different Species (determined by Medline searches as of 1998)
      Humans          7028
      E. coli         3926
      S. cerevisiae   988
      Drosophila      387
      B. subtilis     284
      S. pombe        116
      Xenopus         56
      C. elegans      25
      A. thaliana     20
      Methanogens     16
      Haloferax       5
      Giardia         0
  46. Gain and Loss of Repair Genes
      [Figure: species tree spanning BACTERIA (Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp), ARCHAEA (Metjn, Arcfu, Metth), and EUKARYOTES (Human, Yeast), with a branch labeled “from mitochondria”. Events are marked on branches with the prefixes +, -, d, and t, e.g., +UvrABCD, -MutLS, dMutS, tUvrABCD; entries followed by “?” are uncertain]
  47. Evolution of Uracil Glycosylase
      • Ung activity has evolved many times (many non-homologous proteins have uracil-DNA glycosylase activity)
      • Therefore, absence of homologs of these genes should not be used to infer likely absence of activity
      • However, presence of homologs of Ung and MUG genes can be used to indicate presence of activity because all homologs of these genes have this activity
  48. Evolution of Photoreactivation
      • All known enzymes that perform photoreactivation are part of a single large photolyase gene family
      • Some members of the family do not function as photolyases, but instead work as blue-light receptors
      • If a species does not encode a member of the photolyase gene family, it likely does not have photoreactivation capability
      • If a species encodes a photolyase homolog, one cannot conclude it has photolyase activity
      • Position of photolyase homologs within the photolyase tree helps predict what activities they have
  49. Evolution of Alkyltransferases
      • All known alkyltransferases share a conserved, homologous alkyltransferase domain
      • Therefore, if a species does not encode any protein with this domain, it likely does not have alkyltransferase activity
      • If a species does encode a member of this gene family, it likely has alkyltransferase activity
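The three cases above differ in which direction of inference is safe. A compact way to hold that is a lookup table with one flag per direction; the sketch below transcribes the three slides into such a table. The dictionary keys, flag names, and return strings are hypothetical.

```python
# For each gene family: can we infer from absence of homologs? from presence?
RULES = {
    "uracil glycosylase": {
        "absence_informative": False,   # activity evolved many times independently
        "presence_informative": True,   # all Ung/MUG homologs have the activity
    },
    "photolyase": {
        "absence_informative": True,    # all known photoreactivation enzymes are in this family
        "presence_informative": False,  # some members are blue-light receptors instead
    },
    "alkyltransferase": {
        "absence_informative": True,    # all known alkyltransferases share the domain
        "presence_informative": True,
    },
}

def infer_activity(family, homolog_present):
    """Return 'likely present', 'likely absent', or 'uncertain' for a genome."""
    rule = RULES[family]
    if homolog_present:
        return "likely present" if rule["presence_informative"] else "uncertain"
    return "likely absent" if rule["absence_informative"] else "uncertain"

for family in RULES:
    print(family, "| homolog found:", infer_activity(family, True),
          "| no homolog:", infer_activity(family, False))
```

For photolyases, the “uncertain” case is where the position of the homolog within the family tree is needed to go further.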
  50. Uses of Phylogenomics III: Gene Duplication
  51. Why Duplications Are Useful to Identify
      • Allows division into orthologs and paralogs
      • Aids functional predictions
      • Recent duplications may be indicative of species-specific adaptations
      • Helps identify mechanisms of duplication
      • Can be used to study mutation processes in different parts of the genome
  52. MurA Homologs in A. thaliana
      [Figure: tree of MurA homologs in A. thaliana; tips are labeled by chromosome (I, II, IV, V) and clone-based gene ID, e.g., ARATH I F5F19.7, ARATH II T13E11.20, ARATH IV T3F12.3, ARATH V T19K24.12]
  53. MurA Homologs in A. thaliana colored by chromosome
      [Figure: the same tree with tip labels colored by chromosome (I, II, IV, V)]
  54. Recent Duplications
      • Gene duplication is frequently accompanied by functional divergence
      • Evolutionary analysis can identify recent duplications with no bias towards type of gene
      • Location of duplicates can help identify mechanisms of duplication
  55. 55. TIGRTIGR MutY-Nth DEIRAORF00829 DEIRAORF02784 DEIRA AQUAE METJA METTH THEMA CHLTR HAEIN MCYTU THEMA METTH PYRHO AQUAE METJA ARCFU CELEG VIBCH ECOLI HAEIN TREPA RICPR AQUAE BACSU CAMJE HELPY MCYTU SYNSP CHLPN CHLTR BBUR
  56. 56. TIGRTIGR Expansion of MCP Family in V. cholerae [NJ tree; leaf labels in figure order:] E. coli gi1787690; B. subtilis gi2633766; Synechocystis sp. gi1001299; Synechocystis sp. gi1001300; Synechocystis sp. gi1652276; Synechocystis sp. gi1652103; H. pylori gi2313716; H. pylori99 gi4155097; C. jejuni Cj1190c; C. jejuni Cj1110c; A. fulgidus gi2649560; A. fulgidus gi2649548; B. subtilis gi2634254; B. subtilis gi2632630; B. subtilis gi2635607; B. subtilis gi2635608; B. subtilis gi2635609; B. subtilis gi2635610; B. subtilis gi2635882; E. coli gi1788195; E. coli gi2367378; E. coli gi1788194; E. coli gi1789453; C. jejuni Cj0144; C. jejuni Cj0262c; H. pylori gi2313186; H. pylori99 gi4154603; C. jejuni Cj1564; C. jejuni Cj1506c; H. pylori gi2313163; H. pylori99 gi4154575; H. pylori gi2313179; H. pylori99 gi4154599; C. jejuni Cj0019c; C. jejuni Cj0951c; C. jejuni Cj0246c; B. subtilis gi2633374; T. maritima TM0014; T. pallidum gi3322777; T. pallidum gi3322939; T. pallidum gi3322938; B. burgdorferi gi2688522; T. pallidum gi3322296; B. burgdorferi gi2688521; T. maritima TM0429; T. maritima TM0918; T. maritima TM0023; T. maritima TM1428; T. maritima TM1143; T. maritima TM1146; P. abyssi PAB1308; P. horikoshii gi3256846; P. abyssi PAB1336; P. horikoshii gi3256896; P. abyssi PAB2066; P. horikoshii gi3258290; P. abyssi PAB1026; P. horikoshii gi3256884; D. radiodurans DR A00354; D. radiodurans DRA0353; D. radiodurans DRA0352; P. abyssi PAB1189; P. horikoshii gi3258414; B. burgdorferi gi2688621; M. tuberculosis gi1666149; V. cholerae VC0512, VCA1034, VCA0974, VCA0068, VC0825, VC0282, VCA0906, VCA0979, VCA1056, VC1643, VC2161, VCA0923, VC0514, VC1868, VCA0773, VC1313, VC1859, VC1413, VCA0268, VCA0658, VC1405, VC1298, VC1248, VCA0864, VCA0176, VCA0220, VC1289, VCA1069, VC2439, VC1967, VCA0031, VC1898, VCA0663, VCA0988, VC0216, VC0449, VCA0008, VC1406, VC1535, VC0840, VC0098, VCA1092, VC1403, VCA1088, VC1394, VC0622
  57. 57. TIGRTIGR Phosphate Transporters ARCFU SYNSP THEMA AQUAE METJA MCYTU MCYTU VIBCH ECOLI DEIRA_ORF00198 DEIRA_ORFA00139 DEIRA_ORF00510
  58. 58. TIGRTIGR Levels of Paralogy Within A Genome • All – All members of a gene family are linked together • Top matches – Only top matching pairs are linked together. Therefore, in a large gene family, only the pair from the most recent duplication event is included • Recent – Operational definition based on comparison to other species. Only pairs which are more similar to each other than to selected other species are included.
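The three levels of paralogy can be sketched in code. The following is a minimal Python illustration, not the procedure used to make the plots in this talk. It assumes within-genome similarity hits are available as (query, subject, score) tuples and that each gene's best score against the selected comparison species is already known. The function names, the tuple layout, and the use of a single raw score as the similarity measure are all assumptions made for the example.

```python
def all_pairs(hits):
    """'All': every within-genome pair of homologs is linked."""
    return {frozenset((q, s)) for q, s, _ in hits if q != s}


def top_pairs(hits):
    """'Top': each gene is linked only to its best-scoring within-genome match."""
    best = {}
    for q, s, score in hits:
        if q != s and (q not in best or score > best[q][1]):
            best[q] = (s, score)
    return {frozenset((q, s)) for q, (s, _) in best.items()}


def recent_pairs(hits, best_outside):
    """'Recent': keep pairs that are more similar to each other than either
    member is to its best match in the comparison species.

    best_outside maps gene -> best score against the other species
    (a gene missing from the dict is treated as having no outside hit).
    """
    pair_score = {}
    for q, s, score in hits:
        if q != s:
            key = frozenset((q, s))
            pair_score[key] = max(score, pair_score.get(key, score))
    return {
        pair
        for pair, score in pair_score.items()
        if all(score > best_outside.get(gene, 0) for gene in pair)
    }


# Hypothetical toy data: three paralogs a, b, c in one genome.
hits = [
    ("a", "b", 90), ("b", "a", 90),
    ("a", "c", 50), ("c", "a", 50),
    ("b", "c", 40), ("c", "b", 40),
]
best_outside = {"a": 70, "b": 60, "c": 80}

print(len(all_pairs(hits)))               # 3 pairs
print(len(top_pairs(hits)))               # 2 pairs
print(recent_pairs(hits, best_outside))   # only the a-b pair
```

In this toy case the a-b pair survives the "Recent" filter because its score (90) exceeds both genes' best outside scores, which is the operational sense of a duplication that postdates the split from the comparison species.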
  59. 59. TIGRTIGR C. pneumoniae Paralogs - All [Dot plot. Axes: Query ORF Position and Subject ORF Position, each 0 to 1,250,000]
  60. 60. TIGRTIGR C. pneumoniae Paralogs - Top [Dot plot. Axes: Query ORF Position and Subject ORF Position, each 0 to 1,250,000]
  61. 61. TIGRTIGR C. pneumoniae Paralogs – Recent [Dot plot. Axes: Query ORF Position and Subject ORF Position, each 0 to 1,250,000]
  62. 62. TIGRTIGR E. coli Paralogs - All [Dot plot. Axes: Query Coordinates and Subject Coordinates, each 0 to 4,500,000]
  63. 63. TIGRTIGR E. coli Paralogs - Top [Dot plot. Axes: Query Coordinates and Subject Coordinates, each 0 to 4,500,000]
  64. 64. TIGRTIGR E. coli Paralogs - Recent [Dot plot. Axes: Query Coordinates and Subject Coordinates, each 0 to 4,500,000]
  65. 65. TIGRTIGR Recent Duplications in N. meningitidis [Dot plot. Axes: Chromosome Position of Query and Chromosome Position of Recent Duplicate, each 0 to 4000]
  66. 66. TIGRTIGR [Two dot plots. A: C. pneumoniae AR39 (0 to 1,500,000). B: C. trachomatis MoPn (0 to 1,000,000). Axes: Query ORF Chromosome Position and Best Match Chromosome Position]
  67. 67. TIGRTIGR Uses of Phylogenomics IV: Genetic Exchange within Genomes
  68. 68. TIGRTIGR Circular Maps
  69. 69. TIGRTIGR D. radiodurans Transposase Family DEIRA_ORF01427_transposase__ps DEIRA_ORF01431_transposase_{Sy DEIRA_ORF03257_transposase_{Sy DEIRA_ORFB01001_transposase__p DEIRA_ORFB01020_transposase_{S DEIRA_ORFB01025_transposase_{S DEIRA_ORFB01012_transposase_{S DEIRA_ORFB01035_transposase_{S DEIRA_ORFC0021_transposase_{Sy DEIRA_ORFC0025_hypothetical_pr DEIRA_ORFC0018_transposase__ps ORFB ORF0 ORFC
  70. 70. TIGRTIGR
  71. 71. TIGRTIGR Uses of Phylogenomics V: Gene Loss
  72. 72. TIGRTIGR Why Gene Loss is Useful to Identify • Indicates that gene is not absolutely required for survival • Helps distinguish likelihood of gene transfers • Correlated loss of same gene in different species may indicate selective advantage of loss of that gene • Correlated loss of genes in a pathway indicates a conserved association among those genes
  73. 73. TIGRTIGR Example of Tracing Gene Loss [Figure: species tree with Presence or Absence of Gene marked for each species. Kingdoms: Euks, Arch, Bacteria. Species abbreviations: MT MJ SC HS AA DR TA BS MG MP BB TP HP HI EC SS MT. Annotations on the tree: Evolutionary Origin of Gene; Loss]
  74. 74. TIGRTIGR 5 1 2 3 4 E.coli H.influenzae N.gonorrhoeae H.pylori Syn.sp B.subtilis S.pyogenes M.pneumoniae M.genitalium A.aeolicus D.radiodurans T.pallidum B.burgdorferi A.aeolicus Spyogenes B.subtilis Syn.sp D.radiodurans B.burgdorferi Syn.sp B.subtilis S.pyogenes A.aeolicus D.radiodurans B.burgdorferi MutS2 MutS1 A. B. Gene Duplication Gene Duplication Ancient Duplication in MutS Family
  75. 75. TIGRTIGR Need for Phylogenomics Example: Gene Duplication and Loss • Genome analysis required to determine number of homologs in different species • Evolutionary analysis required to divide into orthology groups and identify gene duplications • Genome analysis is then required to determine presence and absence of orthologs • Then loss of orthologs can be traced onto evolutionary tree of species
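The last step above, tracing loss of orthologs onto the species tree, can be sketched as follows. This is a simplified illustration under strong assumptions: the gene is taken to have been present in the common ancestor of all species in the tree, and it is never regained by transfer. The species tree topology below is hypothetical (it only reuses species abbreviations of the kind shown in the gene loss example slide), and the function names are invented for the example.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def infer_losses(tree, present):
    """List the clades on whose stem branch a loss is inferred.

    Assumes the gene was present at the root and cannot be regained.
    A loss is placed on the deepest branch whose descendants all lack
    the gene, so one event explains absence in a whole clade.
    """
    clade = leaves(tree)
    if not clade & present:
        return [clade]          # every descendant lacks the gene: one loss
    if isinstance(tree, str):
        return []               # a leaf that still has the gene
    losses = []
    for child in tree:
        losses.extend(infer_losses(child, present))
    return losses


# Hypothetical species tree and presence data for one orthology group.
species_tree = ((("EC", "HI"), "HP"), ("BS", ("MG", "MP")))
has_ortholog = {"EC", "HP", "BS"}

for clade in infer_losses(species_tree, has_ortholog):
    print(sorted(clade))
# ['HI']
# ['MG', 'MP']
```

Absence in both MG and MP is explained by a single loss on their shared branch rather than two independent losses, which is why presence/absence has to be read against the species tree rather than species by species.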
  76. 76. TIGRTIGR Uses of Phylogenomics VII: Specialization
  77. 77. TIGRTIGR Circular Maps
  78. 78. TIGRTIGR Species Distribution of Homologs of D. radiodurans Genes [Histograms. Axes: Number of Species With High Hits (0 to 20) and Frequency. Panels: PapaBear, MamaBear, BabyBear, E. coli]
  79. 79. TIGRTIGR Megaplasmid I: Iron Utilization/Iron Transport • ORFB040 Na+/H+ antiporter • ORFB042 iron ABC transporter, ATP-binding protein • ORFB044 iron ABC transporter, permease protein • ORFB045 iron ABC transporter, permease protein • ORFB046 iron-chelator utilization protein • ORFB047 iron ABC transporter, periplasmic substrate bp • ORFB067 putative metal binding protein • ORFB141 iron-chelator utilization protein • ORFB074 hemin ABC transporter, periplasmic hemin bp • ORFB075 hemin ABC transporter, permease protein • ORFB076 hemin ABC transporter, ATP-binding protein
  80. 80. TIGRTIGR Specialized Genetic Elements (Chromosome II and Megaplasmid) • Many two component systems • Nitrogen metabolism • LexA • Ribonucleotide reductase • UvrA2 • Many transcription factors (e.g., HepA) • Iron metabolism
  81. 81. TIGRTIGR Uses of Phylogenomics VIII: Comparison of Closely Related Genomes
  82. 82. TIGRTIGR V. cholerae vs. E. coli All Hits [Dot plot. Axes: V. cholerae Coordinates (0 to 3,000,000) and E. coli Coordinates (0 to 5,000,000)]
  83. 83. TIGRTIGR V. cholerae vs. E. coli Top Hits [Dot plot. Axes: V. cholerae Coordinates (0 to 3,000,000) and E. coli Coordinates (0 to 5,000,000)]
  84. 84. TIGRTIGR V. cholerae vs. E. coli Only if EC-Orf is Closest in All Genomes [Dot plot. Axes: V. cholerae Coordinates (0 to 3,000,000) and E. coli Coordinates (0 to 5,000,000)]
  85. 85. TIGRTIGR V. cholerae vs. E. coli Proteins Top [Plot. Axis: V. cholerae ORF Coordinates, 0 to 4,000,000]
  86. 86. TIGRTIGR V. cholerae vs. E. coli F+R [Plot. Axis ticks 0 to 5,000,000. Series labels: Bert Ecoli, R Ecoli]
  87. 87. TIGRTIGR S. pneumoniae vs. S. pyogenes DNA F+R [Plot. Axis ticks 0 to 2,000,000. Label: BSP vs Spyo]
  88. 88. TIGRTIGR M. tuberculosis vs. M. leprae DNA [Plot. Axis ticks 0 to 4,000,000. Label: M1]
  89. 89. TIGRTIGR C. trachomatis vs C. pneumoniae Dot Plot [Axes: C. trachomatis MoPn and C. pneumoniae AR39. Origin and Termination are marked]
  90. 90. TIGRTIGR Duplication and Gene Loss Model A B CD E F A B CD E F A B C D E F A B C D E F A ’ B’ C’ D’ E’ F ’ A B C D E F A ’ B’ C’ D’ E’ F’ A C D F A ’ B’ E’ E. coli E. coli B C D F A ’ B’ D’ E’ V. cholerae A B C D E F A ’ B’ C’ D’ E’ F’
  91. 91. TIGRTIGR [Figure 4: circular gene-order maps (genes 1 to 32) for the Common Ancestor of A and B and for stages A1, A2, A3 and B1, B2, B3. Steps between stages are labeled Inversion Around Terminus (*) and Inversion Around Origin (*)]
  92. 92. TIGRTIGR M. tuberculosis strain phylogeny (Indels)
  93. 93. TIGRTIGR Musser-Type Evolution (Indel Phylogeny) 98a 107a 43a 73a 105a 133a 114a 169a 218a 290a 160a 159a 13a 18a 26a 30a 32a 53a 58a 70a 96a 97a 100a 124a 204a 208a 236a 239a 249a 286a 99a 279a 205a 304a 54a 155a 165a CDC1551a 223a 110a 122a 245a 313a 36a 40a 71a 79a 168a 254a 283a 312a 4a 12a 41a 42a 52a 77a 187a 214a 81a 129a 274a 220a 64a 48a 55a 60a 72a 80a 83a 85a 89a 91a 95a 111a 170a 171a 182a 212a 219a 225a 244a 278a 301a 195a 2a 123a 207a 306a 69a 94a 101a 102a 112a 113a 121a 132a 211a 222a 235a 250a 284a 285a N1a 87a 117a 120a 136a 191a 237a 261a 37a 131a 269a 240a 63a 197a 206a 75a 108a 263a 128a 172a 162a 86a 38a 109a 119a 248a 6a 65a 68a 189a 66a 106a 227a 31a 78a 202a 213a 62a 163a 224a 256a 276a 287a 173a 291a 252a 281a 295a 310a 251a 151a 188a 292a 140a 141a 103a 174a 229a 259a H37Rv 88a 44a 74a 76a 126a 282a 166a 210a 84a
  94. 94. TIGRTIGR Consistency Indices (Indel Phylogeny) [Plot: CI (0.00 to 1.00) for each Character (1 to 20), calculated over stored trees. Series: maximum, average, minimum]
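For reference, the consistency index (CI) of a character is commonly defined as the minimum number of state changes the character could require divided by the number of changes it actually requires on a given tree, so CI = 1 means no homoplasy. The sketch below shows that calculation for one character on one tree. It assumes a rooted binary tree written as nested tuples and unordered states counted with Fitch parsimony. The strain names and indel states are made up, and this is not the software used to produce the plot.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, number of changes) for a subtree.

    tree   -- a leaf name, or a 2-tuple of subtrees (rooted binary tree)
    states -- dict mapping leaf name -> observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No state in common: one change is needed on this pair of branches.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """CI = minimum possible changes / changes required on this tree."""
    observed = fitch_steps(tree, states)[1]
    minimum = len(set(states.values())) - 1
    if observed == 0:
        return 1.0  # constant character; treated as fully consistent here
    return minimum / observed


# Hypothetical four-strain tree and two indel characters (1 = insertion present).
tree = (("s1", "s2"), ("s3", "s4"))
clean = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
homoplastic = {"s1": 0, "s2": 1, "s3": 0, "s4": 1}

print(consistency_index(tree, clean))        # 1.0
print(consistency_index(tree, homoplastic))  # 0.5
```

The maximum, average, and minimum in a plot like this one would then be taken per character across the set of stored trees.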
  95. 95. TIGRTIGR M. tuberculosis strain phylogeny (Indels/SNPs)
  96. 96. TIGRTIGR Musser-Type Evolution (Combined Phylogeny)
  97. 97. TIGRTIGR Consistency Indices (Combined Phylogeny) [Plot: CI (0.00 to 1.00) for each Character, calculated over stored trees. Series: maximum, average, minimum. Character groups labeled MUSSER and Smear Site]
  98. 98. TIGRTIGR Uses of Phylogenomics VI: Horizontal Gene Transfer and Species Evolution
  99. 99. TIGRTIGR Vertical Inheritance
  100. 100. TIGRTIGR Examples of Horizontal Transfers • Antibiotic resistance genes on plasmids • Insertion sequences • Pathogenicity islands • Toxin resistance genes on plasmids • Agrobacterium Ti plasmid • Viruses and viroids • Organelle to nucleus transfers
  101. 101. TIGRTIGR Why Gene Transfers Are Useful to Identify • Laterally transferred genes frequently involved in environmental adaptations and/or pathogenicity • Helps identify transposons, integrons, and other vectors of gene transfer • Helps identify species associations in the environment
  102. 102. TIGRTIGR Steps in Lateral Gene Transfer 1 2 3-5 6 A B C D
  103. 103. TIGRTIGR How to Infer Gene Transfers • Unusual distribution patterns • Unusual nucleotide composition • High sequence similarity to supposedly distantly related species • Unusual gene trees • Observe transfer events
  104. 104. TIGRTIGR Inferring Lateral Transfers (Observation / Other Causes / Always Occurs) • Unusual Distribution / Sampling bias / Not if recipient already has gene. • Unusual GC/Codons / Selection / Not if donor/recipient similar. Not if it occurred long ago. • High hit to "distant" species / Selection, Rate variation, Gene loss / Usually. • Incongruent trees / Bad trees, Missed paralogs / Usually. • Correlation of above with neighbors / Selection / Only if genes keep order after transfer.
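As an illustration of the "Unusual GC/Codons" observation, the sketch below flags genes whose GC content deviates strongly from the genome-wide average. It is a minimal example under stated assumptions: the z-score cutoff of 2.0 is arbitrary, the gene names and sequences are invented, and real analyses usually also look at codon usage and sequence windows. As the slide notes, a flagged gene may reflect selection rather than transfer, and old transfers or transfers between similar genomes will not be flagged at all.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_genes(genes, z_cutoff=2.0):
    """Return names of genes whose GC content is an outlier in this genome.

    genes    -- dict mapping gene name -> nucleotide sequence
    z_cutoff -- assumed threshold on |GC - mean| / standard deviation
    """
    gc = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sd = pstdev(gc.values())
    if sd == 0:
        return []  # every gene has identical GC content
    return sorted(name for name, value in gc.items()
                  if abs(value - mu) / sd > z_cutoff)


# Hypothetical genome: nine genes at 50% GC and one at 100% GC.
genes = {f"gene{i}": "ATGC" * 5 for i in range(9)}
genes["island1"] = "GGCC" * 5

print(atypical_gc_genes(genes))  # ['island1']
```

A candidate found this way would still need the other lines of evidence in the table above (distribution, similarity, gene trees, neighbors) before calling it a transfer.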
  105. 105. TIGRTIGR E. coli and S. typhimurium Transfer E. coli S. typhimurium Old Model E. coli S. typhimurium New Model
  106. 106. TIGRTIGR PGK Neighbor-joining; bootstrap; 50% majority rule consensus; outgroup = Archaea T. maritima M. genitalium M. pneumoniae A. aeolicus B. burgdorferi T. pallidum B. subtilis Synechocystis E. coli H. influenzae H. pylori M. tuberculosis S. cerevisiae A. fulgidus M. jannaschii M. thermoauto P. horikoshii 89 57 100 59 58 58 100 83 100 B. subtilis S. cerevisiae T. maritima H. pylori M. pneumoniae M. genitalium Synechocystis B. burgdorferi T. pallidum P. horikoshii M. jannaschii M. thermoauto A. fulgidus A. aeolicus H. influenzae E. coli M. tuberculosis
  107. 107. TIGRTIGR Archaeal genes in bacterial genomes* (Bacterial species: Best hits to Archaeal) • Thermotoga maritima: 451 (24%) • Aquifex aeolicus: 246 (16%) • Synechocystis sp.: 126 (4%) • Borrelia burgdorferi: 45 (3.6%) • Escherichia coli: 99 (2.3%) * 10^-5 over 60% of sequence
  108. 108. TIGRTIGR Evidence for lateral gene transfer in Thermotoga 1. 81 archaeal-like genes are clustered in 15 regions which range in size from ~4 to 20 kb; many share conserved gene order with their archaeal counterparts. 2. Many of the archaeal-like genes correspond to regions with a significantly different base composition than the rest of the chromosome. 3. Some of these regions are associated with a 30 bp repeat structure found only in thermophiles. 4. Initial phylogenetic analyses of some of these genes lend support to lateral gene transfer.
  109. 109. TIGRTIGR Region TM0987 - TM1003 (21 kb Archaea-like stretch) [Gene map of ORFs 0987 to 1003. Legend: Thermotoga ORF, Archaea homolog, Bacterial homolog, Eukaryote homolog. Labeled features: Transposon, Regulatory protein. Percent values shown: 79% 69% 69% 72% 72% 69% 65% 61% 78% 72% and 54% 48% 68% 51% 73% 73%]
  110. 110. TIGRTIGR Species Distribution of Top Hits: A. thaliana Chr II [Bar chart. Axis: Top Hits, 0 to 1000. Categories: E, A, B, Syn. sp]
  111. 111. TIGRTIGR A. thaliana T1E2.8 is a Chloroplast Derived HSP60 ARATH-T1E2.8 ********** ECOL HAEIN VIBCH VIBCH RICPR YEAST CHLPN CHLTR AQUAE CAMJE HELPY BBUR TREPA THEMA BACSU DEIRA MCYTU MCYTU SYNSP SYNSP ODONT CPST MYCGE MYCPN CHLPN CHLTR CHLPN CHLTR ARCFU ARCFU METJA PYRHO METTH METTH YEAST YEAST YEAST YEAST CELEG YEAST YEAST YEAST CELEG YEAST YEAST CELEG YEAST CELEG CELEG Eukarya Archaea Bacteria Cyano/Cpst
  112. 112. TIGRTIGR ParA Phylogeny pOMB25.Bor BBl32.Borb Borbu3 Borbu.2 BBM32.Borb CP32-6.Bor BBA20.Borb Cp18.Borbu pOMB10.Bor pLp7E.Borb BBE19.Borb BBB12.Borb BBN32.Borb BBF13.Borb BBH28.Borb BBK21.Borb BBU05.Borb BBJ17.Borb BBQ08.Borb BBF24.Borb OrfC.Borbu BBG08.Borb Pyrab Pyrho YZ24 METJA IncC1.Enta IncC2.Enta INC1 ECOLI INC2 ECOLI Orf.pRK2 IncC.pRK2 pM3.ParA ORF3.Pseae ORFB.Psepu 2603.Vibch***** ParA.Strco Strco2 Strco3 Myctu4 Mycle3 Deira.Chro Soj.Trepa SOJ BACSU Ricpr YGI1 PSEPU ParA.Caucr pAG1.Corgl Mycle Mycle2 Rv1708.Myc Strco Rv3213.Myc Helpy99 Helpy26695 A00900.Vib***** ParB.pR27. ParA.pMT1. parA.pMT1 parA.phage ParA phage ORFA00900 SOPA ECOLI F-Plasmid PhageN13 pCD1.Yerpe pCD1#2.Yer pYVe227.Ye pNL1.Sphar pQPH1.Coxb p42d.Rhile p42d.Rhiet REPA AGRRA pRiA4b.Agr pTiB6S3.Ag pTi-SAKURA pRL8JI.Rhi Y4CK Plasm ParA.Raleu pL6.5.Psef Chr2.Deira MP1#2.Deir MP1.Deira PX02.Bacan ORF298.Clo SojC.Halsp Borbu4 sojD.Halsp plasmid.St SojB.Halsp ParA.Rhoer SOJ MYCPN SOJ MYCGE MinD2.Pyra Pyrho2 pK214.Lacl PatA.synsp Deira.ParA pCHL1.Chlt2 GP5D CHLTR pCHL1.Chlt Chltr Chlps Chlps2 Chlpn Chltr2 Chlpn2 Chromosomal Plasmid and Phage BBQ08.Borb Chlamydial Inc Borrelia Plasmids Archaea Misc Evolution of Chromosome Partitioning Proteins (ParA)
  113. 113. TIGRTIGR Best Matches by Genetic Element (D. radiodurans) [Chart. Axis ticks 0 to 0.4. Categories: C, B, A, 0]
  114. 114. TIGRTIGR N. meningitidis hits vs. genome size [Scatter plot: Proteome Comparison of N. meningitidis to other Complete Genomes. Axes: Number of N. meningitidis ORFs that have a significant hit (0 to 1250) and Total ORFs in Genome (0 to 5000). Labeled points: Archaea, H. influenzae, V. cholerae, E. coli]
  115. 115. TIGRTIGR Horizontal Gene Transfer II
  116. 116. TIGRTIGR Reconciling a Tree of Life in the Context of Lateral Gene Transfer
  117. 117. TIGRTIGR rRNA Tree of Complete Genomes Mycobacterium tuberculosis Bacillus subtilis Synechocystis sp. Caenorhabditis elegans Drosophila melanogaster Saccharomyces cerevisiae Methanobacterium thermoautotrophicum Archaeoglobus fulgidus Pyrococcus horikoshii Methanococcus jannaschii Aeropyrum pernix Aquifex aeolicus Thermotoga maritima Deinococcus radiodurans Treponema pallidum Borrelia burgdorferi Helicobacter pylori Campylobacter jejuni Neisseria meningitidis Escherichia coli Vibrio cholerae Haemophilus influenzae Rickettsia prowazekii Mycoplasma pneumoniae Mycoplasma genitalium Chlamydia trachomatis Chlamydia pneumoniae 0.05 changes Archaea Bacteria Eukarya
  118. 118. TIGRTIGR Whole Genome Phylogeny
  119. 119. TIGRTIGR rRNA vs. Whole Genome Trees Mycobacterium tuberculosis Bacillus subtilis Synechocystis sp. Caenorhabditis elegans Drosophila melanogaster Saccharomyces cerevisiae Methanobacterium thermoautotrophicum Archaeoglobus fulgidus Pyrococcus horikoshii Methanococcus jannaschii Aeropyrum pernix Aquifex aeolicus Thermotoga maritima Deinococcus radiodurans Treponema pallidum Borrelia burgdorferi Helicobacter pylori Campylobacter jejuni Neisseria meningitidis Escherichia coli Vibrio cholerae Haemophilus influenzae Rickettsia prowazekii Mycoplasma pneumoniae Mycoplasma genitalium Chlamydia trachomatis Chlamydia pneumoniae 0.05 changes Archaea Bacteria Eukarya
  120. 120. TIGRTIGR Top Hits of D. radiodurans Genes [Chart categories: Gram Positives, Archaea, Eukaryotes, Syn. sp, Aquifex, Thermotoga, Proteobacteria]
  121. 121. TIGRTIGR rRNA Suggested Deinococcus-Thermus Relationship From Embley et al. Syst. Appl. Microbiol. 16: 25-29 1993
  122. 122. TIGRTIGR Serratia marcescens Proteus mirabilis Proteus vulgaris Escherichia coli Erwinia carotovora Yersinia pestis Enterobacter agglomerans Vibrio anguillarum Vibrio cholerae Haemophilus influenzae Pseudomonas fluorescens Pseudomonas putida Pseudomonas aeruginosa Azotobacter vinelandii Acinetobacter calcoaceticus Methylophilus methylotrophus Methylomonas clara Methylobacillus flagellatum Burkholderia cepacia Bordetella pertussis Xanthomonas oryzae Legionella pneumophila Acidiphilium facilis Thiobacillus ferrooxidans Neisseria gonorrhoeae Rhizobium viciae Myxococcus xanthus1 Myxococcus xanthus2 Campylobacter jejuni Streptomyces violaceus Streptomyces lividans Streptomyces ambofaciens Mycobacterium leprae Mycobacterium tuberculosis Corynebacterium glutamicum Arabidopsis thaliana CPST Synechococcus sp. PCC7002 Synechococcus sp. PCC7942 Anabaena variabilis Thermotoga maritima Lactococcus lactis Streptococcus pneumoniae Staphylococcus aureus Bacillus subtilis Acholeplasma laidlawii Borrelia burgdorferi Mycoplasma pulmonis Mycoplasma mycoides Bacteroides fragilis Chlamydia trachomatis Thermus thermophilus Thermus aquaticus Deinococcus radiodurans Aquifex pyrophilus 0.10 α γ1 γ2 β Gram '+' High GC Cyanobacteria Gram '+' Low GC D/T Magnetospirillum magnetotacticum Helicobacter pylori ε δ 95 98 79 100 100 100 90 63 100 94 84 100 95 100 88 93 91 75 100 100 100 100 100 83 98 100 100 100 Rhizobium phaseoli Agrobacterium tumefaciens Rhizobium meliloti Brucella abortus Rhodobacter sphaeroides Rhodobacter capsulatus Rickettsia prowazekii Acetobacter polyoxogenes 72 97 78 100 71 100 100 77 88 100 61 55 54 48 49 42 48 46 50 63 46 100 40
  123. 123. TIGRTIGR Deinococcus-Thermus Comparison • Took all available T. thermophilus proteins • Searched against database of all available complete genomes (including D. radiodurans) • Identified gene with highest fasta p value • Phylogenetic analysis of all genes with >4 homologs
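The top-hit step of this comparison can be sketched as a simple tally: for each query protein, keep the single best database hit and count which species it came from. The example below is only an illustration. It assumes the search results have already been parsed into (query, species, E-value) tuples and that a lower E-value means a better hit. The function name and the toy values are hypothetical, and no actual FASTA output is parsed here.

```python
from collections import Counter


def top_hit_species(hits):
    """Count, per species, how many queries have their best hit in it.

    hits -- iterable of (query, species, evalue); lower evalue = better hit
    """
    best = {}
    for query, species, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (species, evalue)
    return Counter(species for species, _ in best.values())


# Hypothetical parsed search results for three query proteins.
hits = [
    ("prot1", "D. radiodurans", 1e-50),
    ("prot1", "E. coli", 1e-20),
    ("prot2", "A. fulgidus", 1e-10),
    ("prot2", "D. radiodurans", 1e-30),
    ("prot3", "B. subtilis", 1e-5),
]

print(top_hit_species(hits).most_common())
# [('D. radiodurans', 2), ('B. subtilis', 1)]
```

Top hits are only a screen. As the last bullet of the slide indicates, families with enough homologs were followed up with phylogenetic analysis, since the most similar sequence is not always the closest relative.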
  124. 124. TIGRTIGR Top Hits of T. thermophilus Proteins [Chart categories: Other Bacteria, Archaea, D. radiodurans]
  125. 125. TIGRTIGR Significance of Deinococcus- Thermus Relationship • Mechanisms of extreme heat, radiation, and desiccation resistance may be similar • Complete genome of Thermus will be very useful in identifying novel genes in Deinococcus • Shows utility of incomplete genome sequences.
  126. 126. TIGRTIGR Outline of Phylogenomics [Flow diagram. Nodes: Database; Presence/Absence; Gene trees; Species tree; Congruence; Evol. Distribution; Gene Evolution Events; F(x) Predictions; Pathway Evolution; Phenotype Predictions]
  127. 127. TIGRTIGR Steps in Phylogenomic Analysis • Create database of genes of interest • Presence/absence of homologs in complete genomes • Phylogenetic trees of each gene family • Infer evolutionary events (gene origin, duplication, loss and transfer) • Refine presence/absence (orthologs, paralogs, subfamilies) • Functional predictions and functional evolution • Analysis of pathways
  128. 128. TIGRTIGR Phylogenomics I: Presence/Absence of Homologs • Important to have complete genomes • Similarity searches with high “homology threshold” (to prevent false positives) • Iterative searches (to prevent false negatives) • Multiple sequence alignments to confirm assignment of homology and to divide up multi-domain proteins
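The combination of a strict threshold with iterative searching can be sketched as follows. Each accepted hit is used as a new query, so distant family members are reached through intermediates (fewer false negatives) while every single step still has to pass the strict cutoff (fewer false positives). This is a minimal sketch: the `search` callable stands in for whatever similarity search program is used, and its interface, the E-value cutoff, and the toy results are all assumptions made for the example.

```python
def iterative_homolog_search(seed, search, max_evalue=1e-10, max_rounds=10):
    """Collect homologs of `seed` by repeated searching.

    seed       -- identifier of the starting sequence
    search     -- hypothetical callable: query id -> list of (subject id, evalue)
    max_evalue -- strict per-step cutoff (assumed value)
    max_rounds -- safety limit on the number of search rounds
    """
    found = {seed}
    frontier = {seed}
    for _ in range(max_rounds):
        new = set()
        for query in frontier:
            for subject, evalue in search(query):
                if evalue <= max_evalue and subject not in found:
                    new.add(subject)
        if not new:
            break               # converged: no new family members
        found |= new
        frontier = new          # only search with newly added sequences
    return found


# Hypothetical precomputed search results standing in for a real program.
toy_results = {
    "a": [("b", 1e-20), ("d", 1e-2)],   # d is too weak to accept
    "b": [("a", 1e-20), ("c", 1e-15)],  # c is only reachable through b
    "c": [("b", 1e-15)],
    "d": [],
}

print(sorted(iterative_homolog_search("a", lambda q: toy_results.get(q, []))))
# ['a', 'b', 'c']
```

Iteration can also drag in unrelated sequences through shared domains in multi-domain proteins, which is one reason the slide pairs it with multiple sequence alignment as a check.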
  129. 129. TIGRTIGR Phylogenomics II: Phylogenetic Analysis of Homologs • Multiple sequence alignment • Mask alignment (exclude certain regions) – ambiguous regions of alignment – hypervariable regions and regions with large gaps • Phylogenetic tree with method of choice • Robustness checks – bootstrapping – compare trees with different alignments – compare trees with different tree-building methods
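One simple form of alignment masking is to drop columns that are mostly gaps before building the tree. The sketch below shows only that case, with an assumed gap-fraction cutoff of 0.5 and a made-up alignment. Masking ambiguous or hypervariable regions, as the slide also lists, generally needs manual inspection or other criteria not shown here.

```python
def mask_alignment(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds the cutoff.

    alignment        -- dict mapping sequence name -> aligned sequence
                        (all the same length, '-' for gaps)
    max_gap_fraction -- assumed cutoff; columns above it are excluded
    """
    if not alignment:
        return {}
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    keep = [
        i for i in range(length)
        if sum(row[i] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical three-sequence alignment; column 3 is a gap in 2 of 3 rows.
aln = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTGT"}
print(mask_alignment(aln))
# {'s1': 'ACGT', 's2': 'A-GT', 's3': 'ACGT'}
```

Comparing trees built with and without the mask is one of the robustness checks the slide recommends.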
  130. 130. TIGRTIGR Phylogenomics III: Inferring Evolutionary Events • Infer evolutionary distribution patterns (overlay presence/absence onto species tree) • Compare gene tree vs. species tree • Compare gene tree vs. evolutionary distribution • Infer gene duplication and transfer events • Combine gene transfer and duplication information with evolutionary distribution analysis to infer gene loss, gene origin, and timing of gene duplications and transfers
  131. 131. TIGRTIGR Phylogenomics IV: Functional Predictions and Evolution • Overlay experimentally determined functions onto gene tree • Infer changes in function – many changes suggest that caution should be used in making new predictions • Predict functions based on position in tree relative to genes with known functions and based on orthology groups
  132. 132. TIGRTIGR Phylogenomics V: Pathway Analysis • Correlated presence/absence of all genes in pathway in different species? – If not, maybe non-orthologous gene displacement – Alternatively, pathway may be different between species • Correlated evolutionary events for genes in pathway – loss of all genes at once – correlated duplications? • Compare evolution of function between pathways – The number of times an activity has evolved helps in making predictions of function/phenotype
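The first question on this slide, whether all genes of a pathway are present or absent together, can be checked directly from presence/absence data. The sketch below classifies each species as having the complete pathway, none of it, or only part of it. Species with a partial pathway are the ones worth inspecting for non-orthologous gene displacement or a variant pathway. The gene names, species codes, and profiles are hypothetical.

```python
def pathway_status(profiles, species):
    """Classify each species by how much of the pathway it carries.

    profiles -- dict mapping pathway gene -> set of species with an ortholog
    species  -- iterable of species to classify
    """
    status = {}
    for sp in species:
        n_present = sum(sp in present for present in profiles.values())
        if n_present == len(profiles):
            status[sp] = "complete"
        elif n_present == 0:
            status[sp] = "absent"
        else:
            status[sp] = "partial"
    return status


def missing_genes(profiles, sp):
    """List pathway genes with no detected ortholog in one species."""
    return sorted(gene for gene, present in profiles.items()
                  if sp not in present)


# Hypothetical three-gene pathway across four species.
profiles = {
    "geneA": {"EC", "HI", "BS"},
    "geneB": {"EC", "HI"},
    "geneC": {"EC", "HI", "BS"},
}

print(pathway_status(profiles, ["EC", "HI", "BS", "MG"]))
# {'EC': 'complete', 'HI': 'complete', 'BS': 'partial', 'MG': 'absent'}
print(missing_genes(profiles, "BS"))
# ['geneB']
```

In this toy case BS lacks only geneB, so it is a candidate for displacement of that step by an unrelated gene, whereas MG looks like a loss of the whole pathway.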
  133. 133. TIGRTIGR Evolution as a Screening Method • Gene duplications • Gene loss • Lateral gene transfers • Organellar genes • Structurally constrained genes • Correlated evolutionary changes
  134. 134. TIGRTIGR Evolutionary Genome Scanning • Distribution patterns/phylogenetic profiles • Patterns of evolution – (ds/dn) – Structurally constrained genes – Correlated evolutionary changes • Lateral gene transfers – Organellar genes – Pathogenicity islands • Subdividing gene families – Orthologs vs paralogs – Functional predictions – Subfamilies – Motif identification • Gene duplications • Gene loss
  135. 135. TIGRTIGR Genome Sequences Allow “Hypothesisless Research” • DNA microarrays • Proteomics • GC skew and other nucleotide composition analyses • Parallel genome wide genetic experiments • Evolutionary genome scanning • Phylogenetic profiles
  136. 136. TIGRTIGR Evolutionary Diversity Still Poorly Represented in Complete Genomes [Two-panel figure. A. rRNA tree of Bacterial and Archaeal Major Groups. B. Groups with Completed Genomes Highlighted. Domains labeled: Bacteria, Archaea. Both panels carry the same leaf labels:] Tmf-penden R-rubrum3 Azs-brasi2 Rm-vanniel Rhb-legum8 Bdr-japoni Spg-capsul Ric-prowaz Ste-maltop Spr-voluta Rub-gelat2 Rcy-purpur Nis-gonor1 Hrh-halch2 Alm-vinosm Ps-aerugi3 E-coli Myx-xanthu Bde-stolpi Dsv-desulf Dsb-postga C-leptum C-butyric4 C-pasteuri Eub-barker C-quercico Hel-chlor2 Acp-laidla M-capricol C-ramosum B-stearoth Eco-faecal Lis-monoc3 B-cereus4 B-subtilis Stc-therm3 L-delbruck L-casei Fus-nuclea Glb-violac Olst-lut_C ZeamaysC Nost-muscr Syn-6301 Tnm-lapsum Flx-litora Cy-lytica Emb-brevi2 Bac-fragil Prv-rumcol Prb-difflu Cy-hutchin Flx-canada Sap-grandi Chl-limico Wln-succi2 Hlb-pylor6 Cam-jejun5 Stm-ambofa Arb-globif Cor-xerosi Bif-bifidu Cfx-aurant Tmc-roseum Aqu-pyroph env-SBAR12 env-SBAR16 Msr-barker Tpl-acidop Msp-hungat Hf-volcani Mb-formici Mt-fervid1 Tc-celer Arg-fulgid Mpy-kandl1 Mc-vanniel Mc-jannasc env-pJP27 Sul-acalda Thp-tenax env-pJP89 Tt-maritim Fer-island Mei-ruber4 D-radiodur Chd-psitta Acbt-capsl env-MC18 Pir-staley Lpn-illini Lps-interK Spi-stenos Trp-pallid Bor-burgdo Spi-haloph Brs-hyodys Fib-sucS85
  137. 137. TIGRTIGR Acknowledgements • Genome duplications: S. Salzberg, J. Heidelberg, O. White, A. Stoltzfus, J. Peterson • Genome sequences and analysis: J. Heidelberg, T. Read, H. Tettelin, K. Nelson, J. Peterson, R. Fleischmann • Horizontal transfers: K. Nelson, W. F. Doolittle • TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul, Seqcore • $$$: DOE, NSF, NIH, ONR
  138. 138. TIGRTIGR • Other people: Mom and Dad, S. Karlin, M. Feldman, A. M. Campbell, R. Fernald, R. Shafer, D. Ackerly, D. Goldstein, M. Eisen, J. Courcelle, R. Myers, C. M. Cavanaugh, P. Hanawalt • TIGR: J. Heidelberg, T. Read, S. Kaul, M-I Benito, J. C. Venter, C. Fraser, S. Salzberg, O. White, K. Nelson, H. Tettelin • $$$: NSF, ONR, DOE, NIH
  139. 139. TIGRTIGR Using Evolutionary Analysis To Help Identify Novel Features in D. radiodurans
  140. 140. TIGRTIGR Origins of Extreme Resistance • Functional divergence • Evolution of novel genes/processes • Acquire genes from other species • Gene duplication and functional divergence • Enhanced catalytic efficiency and/or coordination
  141. 141. TIGRTIGR
  142. 142. TIGRTIGR SNF2 Family of Proteins (1995) • SNF2 family defined by presence of conserved DNA-dependent ATPase domain • 100s of proteins • Diversity of functions: – transcriptional activation (SNF2) – transcriptional repression (MOT1) – recombination (RAD54) – transcription-coupled repair (CSB) – post-replication repair (RAD5) – chromosome segregation (lodestar) – Many with unknown functions • Some species have 15+ representatives
  143. 143. TIGRTIGR How to Sort Out Diversity in SNF2 Family • Presence of additional motifs – RING fingers – Bromodomains – Chromodomains • Interactions with other proteins • Evolutionary relationships – Orthology and paralogy – Subfamilies – Relationships among subfamilies
  144. 144. TIGRTIGR SNF2 Alignment BRM hBRM hBRG1 mBRG1 STH1 SNF2 YB95 F37A4 ISWI SNF2L CHD1 SYGP ETL1 FUN30 MOT1 ERCC6 RAD26 YB53 DNRPPX hNUCP mNUCP RAD5 spRAD8 HIP116 RAD16 LODE NPH42 HepA B.cereus ORF I Ia Ib II III V VI C C R R R R Br CHD1 SNF2 SNF2L ETL1 RAD16 ERCC6 RAD54 RAD54 Br Br Br Br Br Protein Sub- Family SCALE (aa) 0 500 Helicase Motif s -- MOT1 IV
  145. 145. TIGRTIGR SNF2 Subfamilies (Subfamily: Conserved Function) • SNF2: Transcription activation (Swi/Snf complex) • SNF2L: Transcription activation (NURF complex) • CHD1: Chromatin remodelling • ETL1: Unknown • MOT1: Transcription repression • CSB: Transcription-coupled repair • Rad54: Recombinational repair • Rad16: Chromatin access for DNA repair • HepA: Bacterial RNA polymerase subunit • HepA2: Unknown
  146. 146. TIGRTIGR What Evolutionary Analysis Reveals About the SNF2 Family • Ancient duplication into two lineages may distinguish genes by type of activity • Multiple subfamilies with distinct sequences and functions. • Presence of particular orthologs can be predicted in species for which they have not been cloned. • Predict functions of uncharacterized members by orthology. • Addition of motifs to SNF2 domain occurred early in eukaryotic evolution. • Many duplications within eukaryotes. • Classification into subfamilies helps search for functional motifs
  147. 147. TIGRTIGR Evolution of the SNF2 Family of Proteins [Tree leaf labels:] ETL1_M.m YA19_S.c CHD1_M.m SYGP4_S.c MOT1_S.c ERCC6_H.s RAD26_S.c NUCP_H.s NUCP_M.m YB53_S.c RAD54_S.c DNRPPX_S.p RAD5_S.c RAD8_S.p HIP116A_H.s RAD16_S.c LODE_D.m NPHCG_42 HEPA_E.c YB95_S.c F37A4_C.e ISWI_D.m SNF2L_H.s BRM_D.m BRM_H.s BRG1_H.s BRG1_M.m STH1_S.c SNF2_S.c [Subfamily labels:] SNF2 SNF2L CHD1 ETL1 CSB RAD54 RAD16 LODE
  148. 148. TIGRTIGR
