2. Topics of Discussion
• DNA Repair
• Why study evolution of repair?
• Evolution of specific pathways with examples
from recent genome projects (e.g., A. thaliana,
Vibrio cholerae, Shewanella putrefaciens,
Buchnera aphidicola symbiont)
• Big picture – evolutionary origins of repair
TIGR
5. General Mechanisms of Resistance to
Cellular Damaging Agents
• Damage protection/prevention
• Damage tolerance
• Repair and recovery
6. Classes of DNA Repair
• Direct repair
– Photoreactivation
– Alkylation transfer
– DNA ligation/non-homologous end joining
• Excision repair
– Base excision repair
– Mismatch excision repair
– Nucleotide excision repair
• Recombinational repair
7. Excision Repair Outline
[Figure: steps of excision repair in two parallel pathways]
• Nucleotide and mismatch excision: damage recognition -> endonuclease
-> exonuclease, helicase, polymerase -> ligase
• Base excision: damage recognition (N-glycosylase) -> AP endo
-> exonuclease, polymerase -> ligase
8. Recombination Outline
[Figure: stages of recombination and the proteins involved]
• Generation of single-strand overhang: RecBCD; RecE,T; RecQ,J;
Rad50, Mre11, XRS2; RE
• Initiation, alignment: RecA; RecFOR; Rad52
• Strand invasion, DNA synthesis: RecA; Rad51,55,57
• Branch migration and resolution: RuvABC; RecG, RUS?; Rad54?
9. “Nothing in biology makes sense
except in the light of evolution.”
Theodosius Dobzhansky (1973)
10. Why Study Evolution and Repair?
• Repair variation leads to differences in evolutionary
patterns within and between species.
• Evolutionary analysis can identify mutation/repair biases.
• Evolutionary studies can improve our understanding of
repair proteins and pathways.
• Comparisons of repair genes can be used to infer
evolutionary history.
• Information on mutation processes improves sequence and
phylogenetic analysis.
• Evolutionary analysis is required to infer the origins and
history of repair processes.
11. Steps in Phylogenomic Analysis
• Create database of genes of interest
• Presence/absence of homologs in complete genomes
• Phylogenetic trees of each gene family
• Infer evolutionary events (gene origin, duplication, loss and
transfer)
• Refine presence/absence (orthologs, paralogs, subfamilies)
• Functional predictions and functional evolution
• Analysis of pathways
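The presence/absence step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the talk: the genome names and hit sets below are hypothetical placeholders, and a real analysis would fill them from similarity searches against complete genomes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy input: gene families with a detectable homolog per genome.
# Real input would come from similarity searches against complete genomes.
homolog_hits: Dict[str, Set[str]] = {
    "genome_A": {"RecA", "UvrA", "MutS", "MutL"},
    "genome_B": {"RecA", "UvrA"},
    "genome_C": {"RecA", "MutS", "MutL"},
}


def presence_absence(hits: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the sorted gene list and one 0/1 row per genome."""
    genes = sorted(set().union(*hits.values()))
    matrix = {
        genome: [int(gene in found) for gene in genes]
        for genome, found in hits.items()
    }
    return genes, matrix


def universal_genes(hits: Dict[str, Set[str]]) -> Set[str]:
    """Gene families with a homolog in every genome of the set."""
    return set.intersection(*hits.values())


genes, matrix = presence_absence(homolog_hits)
print(genes)
for genome, row in matrix.items():
    print(genome, row)
print("present in all:", universal_genes(homolog_hits))
```

The matrix is only the starting point: the later steps (trees, orthologs versus paralogs) refine each 1 into a statement about which subfamily is actually present.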
15. Photoreactivation and Photolyases
• All photoreactivation is carried out by enzymes in the photolyase
family
• Two main classes of photolyases – class I and class II – are
distantly related to each other and likely the result of an ancient
duplication
• PhrI and PhrII are missing from most species for which complete
genomes are available.
• Many cases of functional change (e.g., CPD -> 6-4), and some family
members are not involved in DNA repair at all
• Many of the eukaryotic proteins appear to be of organellar
ancestry
16. Uses of Evolution: Photoreactivation
• All known enzymes that perform photoreactivation are part of
a single large photolyase gene family
• Some members of the family do not function as photolyases,
but instead work as blue-light receptors
• If a species does not encode a member of the photolyase gene
family, it likely does not have photoreactivation capability
• If a species encodes a photolyase homolog, one cannot conclude
from that alone that it has photolyase activity
• Position of photolyase homologs within photolyase tree helps
predict what activities they have
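The reasoning on this slide amounts to a small decision rule, sketched below in Python. The group labels are taken from the photolyase family tree in this talk; using them as dictionary keys, and the function name itself, are assumptions of this sketch rather than part of any published tool.

```python
from typing import Iterable, Optional

# Group labels follow the photolyase family tree; the mapping to a
# predicted activity is an illustrative assumption of this sketch.
ACTIVITY_BY_GROUP: dict[str, Optional[str]] = {
    "class I CPD photolyase": "CPD photoreactivation",
    "8-HDF type CPD photolyase": "CPD photoreactivation",
    "6-4 photolyase": "6-4 photoreactivation",
    "blue-light receptor": None,  # family member without repair activity
}


def predict_photoreactivation(homolog_groups: Iterable[str]) -> str:
    """Predict photoreactivation from the tree position of each homolog."""
    groups = list(homolog_groups)
    if not groups:
        # No family member at all: activity is unlikely.
        return "photoreactivation likely absent"
    activities = sorted(
        {ACTIVITY_BY_GROUP.get(g) for g in groups} - {None}
    )
    if activities:
        return "predicted: " + ", ".join(activities)
    # Homologs exist, but none falls in a group with known repair activity.
    return "homolog present, but photolyase activity cannot be concluded"


print(predict_photoreactivation([]))
print(predict_photoreactivation(["blue-light receptor"]))
print(predict_photoreactivation(["6-4 photolyase", "class I CPD photolyase"]))
```

The asymmetry is deliberate: absence of the family is informative, while presence is informative only once the homolog has been placed in the tree.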
17. [Figure: phylogenetic tree of the photolyase gene family. Labeled
groups: class I CPD photolyases (MTHF type), 6-4 photolyases, blue-light
receptors, and CPD photolyases (8-HDF type). Starred entries: ORFA00965
and two V. cholerae ORFs (ORF02295, ORF01792).]
19. Alkyltransferases
• All known alkyltransferases are members of a single
gene family
• Found in most but not all species
• Likely present in LUCA
• Ada protein in E. coli originated by fusion between
an alkyltransferase and a transcription-regulatory
domain
• Gram-positive bacteria have the Ada domain fused to
an alkylation glycosylase instead of an alkyltransferase
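20. Alkylation Repair Genes
[Figure: domain architecture of alkylation repair proteins]
Proteins shown: Ada (E. coli, H. influenzae); Ogt (E. coli, H. influenzae,
Gram-positives, D. radiodurans); MGMT (eukaryotes); AlkA (Gram-positives,
E. coli)
Domain key:
– AlkA domain (alkylation glycosylase)
– Ogt domain (O6-Me-G alkyltransferase)
– Ada domain (transcription regulator)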
21. DNA Ligases
• Two major ligase families
• Ligase I
– NAD dependent
– Found in all bacteria and only in bacteria
• Ligase II
– ATP dependent
– Found in all Archaea and eukaryotes
– Found in some bacteria
– Duplicated in many eukaryotes
23. Mismatch Excision Repair
• Core of process highly homologous between bacteria and
eukaryotes (all use MutS and MutL homologs).
• Eukaryotes encode multiple MutS and MutL homologs, not all
of which are involved in mismatch repair.
• Two major MutS groups: MutS-I proteins, involved in MMR,
and MutS-II proteins, involved in chromosome segregation.
• MutS1 and MutL missing from many bacteria, especially
pathogens. Other MMR proteins also defective in some.
• Few homologs in Archaea – some encode MutS2, none encode
MutS1, and some may encode MutL.
• Some evolutionary and functional relationships to restriction-
modification systems (MutH, MED1, Vsr).
24. [Figure: phylogenetic tree of the MutS family with bootstrap values.
MutS-I group (mismatch repair): MSH6, MSH3, MSH2, MSH1, MutS1.
MutS-II group (chromosome crossover and segregation): MSH5, MSH4, MutS2.
A proposed duplication separates the two groups.]
25. Ancient Duplication in MutS Family
[Figure, panels A and B: trees of bacterial MutS1 and MutS2 homologs with
a gene duplication marked. Species shown include B. burgdorferi,
S. pyogenes, T. pallidum, B. subtilis, D. radiodurans, Synechocystis sp.,
A. aeolicus, M. genitalium, M. pneumoniae, H. pylori, N. gonorrhoeae,
H. influenzae and E. coli.]
26. Parallel Loss of MutLS
• Lost in the mycoplasmal lineage (present in B. subtilis and S.
pyogenes)
• Lost in the M. tuberculosis lineage (found in some other high-GC
Gram-positives)
• Lost in the H. pylori / C. jejuni lineage (present in many other
Proteobacteria)
• Possibly lost in the Euryarchaeota lineage
• Defective in many “wild” E. coli and S. typhimurium strains
• Loss of these genes may give an advantage in some conditions by
increasing the mutation rate or the rate of recombination between
species.
27. Nucleotide Excision Repair
• Bacterial and eukaryotic systems are not homologous,
despite having very similar mechanisms
• Most of the eukaryotic and bacterial proteins originated
within each of these domains
• Some of the eukaryotic proteins are shared with Archaea
(Rad1, Rad2, Rad25).
• All free-living bacteria encode UvrABCD. The symbiont B. aphidicola
encodes Mfd but not UvrABCD.
• UvrABC also found in one archaeon.
• Some functional and evolutionary relationships with drug
resistance and transport
28. Evolution of UvrA Family
[Figure: two trees, A and B]
A. ABC Transporters: NrtDC, OppDF, UUP, NodI, LivF, XylG, UvrA1, UvrA2,
PstB, MDR, HlyB, TAP1, CFTR, SUR
B. UvrA Subfamily (a duplication in the UvrA family separates UvrA1
from UvrA2)
– UvrA1: UvrA from H. influenzae, E. coli, N. gonorrhoeae, R. prowazekii,
S. mutans, S. pyogenes, S. pneumoniae, B. subtilis, M. luteus,
M. tuberculosis, M. thermoautotrophicum, H. pylori, C. jejuni,
P. gingivalis, C. tepidum, D. radiodurans (uvra1), T. thermophilus,
T. pallidum, B. burgdorferi, T. maritima, A. aeolicus, Synechocystis sp.
– UvrA2: UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans
29. UvrA Evolution
UvrA1C UvrA1N UvrA2C UvrA2N
Gene Duplication
UvrAC UvrAN
Tandem Duplication
ABC2 ABC1 UvrA
Diversification of ABC family
ABC
30. Base Excision Repair Glycosylases
• Distribution patterns highly uneven but some glycosylases
have been found in all species
• Some are ancient enzymes, probably present in LUCA (e.g.,
MutY-Nth), others more recent (e.g., TagI).
• Many families are distantly related to each other (e.g., Ogg,
AlkA, MutY-Nth)
• Many cases of gene duplication, loss and possibly transfer,
especially from organellar genomes to nucleus
• Orthologs frequently have different specificity
31. A. thaliana TAG homologs
[Figure: phylogenetic tree of TAG homologs. Sequences shown:]
C. crescentus
A. thaliana_5 K23L20 1
A. thaliana_3 MBK21.7
A. thaliana_1 F23A5.15
A. thaliana_1 T24D18.7
A. thaliana_5 MTI20 23
A. thaliana_1 F9E10.6
V. cholerae
H. influenzae
E. coli
M. tuberculosis
N. meningitidis A
N. meningitidis B
32. AP Endonucleases
• All species encode either Nfo or Xth homologs. Some encode
both.
• Only Nfo: mycoplasmas, Aquifex, M. jannaschii, yeast
• Only Xth: many bacteria, A. fulgidus, humans (so far)
• Both: E. coli, B. subtilis, M. tuberculosis, M.
thermoautotrophicum
• Both Nfo and Xth are likely ancient.
• Many cases of gene loss of one or the other, but never both
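The distribution above can be written down as a small table and checked mechanically. The sketch below uses only the species named on this slide, as reported at the time of the talk; the helper names are hypothetical.

```python
from typing import Dict, List, Set

# Nfo/Xth distribution as listed on the slide (status at the time of the talk).
AP_ENDONUCLEASES: Dict[str, Set[str]] = {
    "mycoplasmas": {"Nfo"},
    "Aquifex": {"Nfo"},
    "M. jannaschii": {"Nfo"},
    "yeast": {"Nfo"},
    "A. fulgidus": {"Xth"},
    "humans": {"Xth"},
    "E. coli": {"Nfo", "Xth"},
    "B. subtilis": {"Nfo", "Xth"},
    "M. tuberculosis": {"Nfo", "Xth"},
    "M. thermoautotrophicum": {"Nfo", "Xth"},
}


def classify(genes: Set[str]) -> str:
    """Label a genome by which AP endonuclease families it encodes."""
    if genes == {"Nfo", "Xth"}:
        return "both"
    if genes == {"Nfo"}:
        return "only Nfo"
    if genes == {"Xth"}:
        return "only Xth"
    return "neither"


def group_by_class(table: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group species names by their Nfo/Xth class."""
    groups: Dict[str, List[str]] = {}
    for species, genes in table.items():
        groups.setdefault(classify(genes), []).append(species)
    return groups


groups = group_by_class(AP_ENDONUCLEASES)
for label, species in groups.items():
    print(f"{label}: {', '.join(species)}")
# The slide's last bullet: one family may be lost, but never both.
print("any genome with neither:", "neither" in groups)
```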
33. Recombinational Repair
• RecA homologs found in all free-living species (the symbiont B.
aphidicola encodes RecBCD but not RecA)
• Most recombination initiation pathways are of recent origin
– RecBCD, RecE within Proteobacteria/Gram-positives
– RecF within bacteria
– AddAB within low-GC Gram-positives
– SbcCD may be of ancient origin (possibly homologous to
MRE11/Rad50)
• Resolution pathways are also of somewhat recent origin
– CCE1 within eukaryotes
– RuvABC, RecG near origin of bacteria
– Rus within bacteria (phage origin?)
• Many cases of gene loss in initiation, resolution pathways.
34. [Figure: phylogenetic tree of sequences from Proteobacteria, grouped
into the γ, β, α, δ and ε subdivisions. Taxa shown include E. coli,
V. cholerae, S. putrefaciens, H. influenzae, N. gonorrhoeae,
A. tumefaciens, M. xanthus (two sequences), H. pylori and C. jejuni.
One entry is starred. Scale bar: 0.1.]
42. Repair Genes in Archaea
• All species: RecA, MRE11, Rad50, MutY-Nth,
Ogt, Rad2, Lig-II, PCNA
• UvrABCD in M. thermoautotrophicum
• PhrI and PhrII in some species
• Variety of glycosylases in some species
• No Ung homologs in any species, but
alternative glycosylases have Ung activity
• Rad1 in many species.
• New Holliday junction resolvase
44. DNA Repair Genes in D. radiodurans Complete Genome
Process                       Genes in D. radiodurans
Nucleotide Excision Repair    UvrABCD, UvrA2
Base Excision Repair          AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease               Xth
Mismatch Excision Repair      MutS, MutL
Recombination
  Initiation                  RecFJNRQ, SbcCD, RecD
  Recombinase                 RecA
  Migration and resolution    RuvABC, RecG
Replication                   PolA, PolC, PolX, phage Pol
Ligation                      DnlJ
dNTP pools, cleanup           MutTs, RRase
Other                         LexA, RadA, HepA, UVDE, MutS2
45. Problem:
The list of DNA repair gene homologs
in the D. radiodurans genome is not
significantly different from that of other
bacterial genomes of similar size
46. Unusual Features of D. radiodurans DNA Repair Genes
Process                       Genes
Nucleotide excision repair    Two UvrAs
Base excision repair          Four MutY-Nths
Recombination                 RecD but not RecBC
Replication                   Four Pol genes
dNTP pools                    Many MutTs, two RRases
Other                         UVDE
48. Repair Studies in Different Species
(determined by Medline searches as of 1998)
Humans 7028
E. coli 3926
S. cerevisiae 988
Drosophila 387
B. subtilis 284
S. pombe 116
Xenopus 56
C. elegans 25
A. thaliana 20
Methanogens 16
Haloferax 5
Giardia 0
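A quick calculation shows how uneven these counts are. The numbers are the ones on this slide; grouping Methanogens and Haloferax together as the archaeal entries is this sketch's own labeling, and the variable names are illustrative.

```python
# Medline hit counts as listed on the slide (searches as of 1998).
MEDLINE_COUNTS_1998 = {
    "Humans": 7028,
    "E. coli": 3926,
    "S. cerevisiae": 988,
    "Drosophila": 387,
    "B. subtilis": 284,
    "S. pombe": 116,
    "Xenopus": 56,
    "C. elegans": 25,
    "A. thaliana": 20,
    "Methanogens": 16,
    "Haloferax": 5,
    "Giardia": 0,
}

# Assumption of this sketch: these two rows are the archaeal entries.
ARCHAEAL_ROWS = ("Methanogens", "Haloferax")

total = sum(MEDLINE_COUNTS_1998.values())
archaeal = sum(MEDLINE_COUNTS_1998[name] for name in ARCHAEAL_ROWS)
top_two = MEDLINE_COUNTS_1998["Humans"] + MEDLINE_COUNTS_1998["E. coli"]

archaeal_share = archaeal / total
top_two_share = top_two / total

print(f"total studies counted: {total}")
print(f"humans + E. coli share: {top_two_share:.1%}")
print(f"archaeal share: {archaeal_share:.2%}")
```

Humans and E. coli together account for the large majority of the listed studies, while the archaeal rows contribute well under one percent.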
49. Evolution of Repair Summary
• Mycoplasmas have lost many repair genes, which may
explain their high mutation rate.
• Mismatch repair genes absent in many pathogens (is high
mutation rate advantageous?)
• Whole pathways frequently lost as units (e.g., MutLS).
• May be able to predict pathway interactions by correlated
loss of genes.
• Archaeal genomes have few homologs of bacterial or
eukaryotic repair proteins.
• Some eukaryotic repair proteins have likely mitochondrial
and plastid ancestry
• Many ancient duplications (MutS, SNF2, UvrC).
• Some unusual distributions (XPB, UvrABCD)
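The idea of predicting pathway interactions from correlated gene loss can be illustrated with presence/absence profiles. The sketch below is a toy example: the six-genome profiles are hypothetical, invented only to show the comparison, and the thresholds are arbitrary choices rather than values from the talk.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical 0/1 presence profiles across six genomes (illustration only).
PROFILES: Dict[str, Tuple[int, ...]] = {
    "MutS": (1, 1, 0, 1, 0, 1),
    "MutL": (1, 1, 0, 1, 0, 1),
    "RecA": (1, 1, 1, 1, 1, 1),
    "Phr": (0, 1, 1, 0, 0, 1),
}


def agreement(a: Tuple[int, ...], b: Tuple[int, ...]) -> float:
    """Fraction of genomes in which both genes share the same state."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def shared_losses(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Number of genomes in which both genes are absent."""
    return sum(x == 0 and y == 0 for x, y in zip(a, b))


def candidate_partners(
    profiles: Dict[str, Tuple[int, ...]],
    min_agreement: float = 1.0,
    min_shared_losses: int = 1,
) -> List[Tuple[str, str]]:
    """Gene pairs whose profiles match and that are lost together."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        a, b = profiles[g1], profiles[g2]
        if agreement(a, b) >= min_agreement and shared_losses(a, b) >= min_shared_losses:
            pairs.append((g1, g2))
    return pairs


print(candidate_partners(PROFILES))
```

Requiring at least one shared loss keeps universally present genes from pairing with each other simply because they are never absent. A fuller analysis would also map losses onto a species tree so that one ancestral loss is not counted several times.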
51. Acknowledgements
• TIGR: Craig Venter, Claire Fraser, John Heidelberg, Owen White,
Steve Salzberg
• NIEHS: Ben Van Houten
• Louisiana State University: John Battista
• Stanford: Phil Hanawalt, Rick Myers, D. Crowley
• U.C. Berkeley: Michael Eisen, A. J. Clark
• Other: J. Laval, F. Taddei, A. Britt, J. Miller
• Funding: DOE OBER, NIH, NSF
53. Unusual Distributions
• XP-B like gene in some bacteria and some Archaea.
• Ligase II in M. tuberculosis, B. subtilis, and A. aeolicus
• UvrABCD in M. thermoautotrophicum
• Mycoplasmas and some low-GC Gram-positives do not have
any homologs of Holliday junction resolution proteins (RuvC,
RecG, Rus)
• Mycoplasmas are the only species without MutY-Nth
homologs
• MutS2 unevenly distributed among bacteria, Archaea
• Genes in RecF pathway not always present as a unit
• Uracil glycosylase missing from Archaea and some bacteria
55. Genes Lost in Mycoplasmal Lineage
Process                         Protein
Base excision repair            MutY/Nth, AlkA
Recombination initiation        RecF pathway, SbcCD
Recombination resolution        RecG, RuvC
Mismatch repair                 MutLS
Transcription-coupled repair    Mfd
Induction                       LexA
Direct repair                   PhrI, Ogt
AP endonuclease                 Xth
Other                           MutT, Dut, PriA, SMS
56. Parallel Loss of MutLS
• Lost in the mycoplasmal lineage (present in B. subtilis and S.
pyogenes)
• Lost in the M. tuberculosis lineage (found in some other high-GC
Gram-positives)
• Lost in the H. pylori lineage (present in many other Proteobacteria)
• Possibly lost in the Euryarchaeota lineage
• Defective in many “wild” E. coli and S. typhimurium strains
• Loss of these genes may give an advantage in some conditions by
increasing the mutation rate or the rate of recombination between
species.
57. Need for Experimental Studies in Archaea
• No novel repair genes cloned in Archaea. All
repair genes show homology to repair genes in
other species.
• Many novel repair genes found in bacteria and
eukaryotes because of experimental work in
these species.
• Since novel repair pathways appear to evolve
frequently in bacteria and eukaryotes, there is
a need for more genetic and experimental
studies of repair in Archaea.
58. Repair Genes in All Archaea
Process                       Protein
Nucleotide excision repair    Rad2, Rad1 ±
Recombination                 RecA, Mre11, Rad50
Replication                   PolB, PCNA
Ligase                        Ligase II
Base excision repair          MutY-Nth
dNTP pools                    MutT family
Alkyltransferase              Ogt in all species
59. DNA Repair Gene Summary
• Most of the standard eukaryotic DNA repair
genes are found
• Some likely plastid repair genes are found
• Some duplications relative to other species
60. Acknowledgements
• Genome duplications: S. Salzberg, J. Heidelberg, O.
White, A. Stoltzfus, J. Peterson
• Genome sequences and analysis: J. Heidelberg, T.
Read, H. Tettelin, K. Nelson, J. Peterson, R.
Fleischmann, D. Bryant
• Horizontal transfers: K. Nelson, W. F. Doolittle
• TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul,
Seqcore
• $$$: DOE, NSF, NIH, ONR
61. Evolution of Uracil Glycosylase
• Many non-homologous proteins have uracil-DNA
glycosylase activity (Ung, GAPDH, MUG, cyclin)
• Therefore, absence of homologs of these genes
should not be used to infer likely absence of
activity
• However, presence of homologs of Ung and MUG
genes can be used to indicate presence of activity
because all homologs of these genes have this
activity
62. Ambiguous Origin
Process                       Proteins
Base excision                 3MG, GT MMR, Ung
Nucleotide excision repair    Rad25
Recombination initiation      RecQ
Other                         Dut
64. Present in All Bacteria
Process                       Proteins
Nucleotide Excision Repair
Recombinase
Replication                   PolA,C
Single-strand DNA Binding     SSB
Ligase                        LigaseI
65. Present in All Free-Living Bacteria
Process                       Proteins
Nucleotide Excision Repair    UvrABCD
Recombinase                   RecA
Replication                   PolA,C
Single-strand DNA Binding     SSB
Ligase                        LigaseI
66. Present in Most Bacteria
Process                         Protein
Nucleotide excision repair      UvrABCD
Holliday junction resolution    RuvABC
Recombination                   RecA; RecJ, RecG
Replication                     PolA,C; PriA; SSB
Ligase                          DnlJ
Transcription-coupled repair    Mfd
Base excision repair            Ung, MutY-Nth
AP endonuclease                 Xth
67. Present in Bacteria or Eukaryotes (But Not Both)
Process                         Bacteria        Eukaryotes
Transcription-coupled repair                    CSB, CSA
Mismatch strand recognition     MutH            -
Nucleotide excision repair      UvrABC          XPs, TFIIH, etc.
Recombination initiation        RecBCD, RecF    KU, DNA-PK
Holliday junction resolution    RuvABC          CCE1
Base excision                   -
Inducible responses             LexA            p53
68. Evolution of Alkyltransferases
• All known alkyltransferases share a conserved,
homologous alkyltransferase domain
• Therefore, if a species does not encode any protein
with this domain, it likely does not have
alkyltransferase activity
• If a species does encode a member of this gene
family, it likely has alkyltransferase activity
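The alkyltransferase case is the simplest form of a rule that this talk applies differently to different families. The sketch below encodes that rule for three families discussed in the talk (alkyltransferases, photolyases, and the Ung/MUG uracil glycosylases). The class and function names are hypothetical, and the two flags are a simplification of the slide text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FamilyRule:
    # True if all known enzymes with the activity belong to this family.
    family_required: bool
    # True if all known members of the family have the activity.
    members_all_active: bool


# Flags summarize the statements made in the talk for each family.
RULES: Dict[str, FamilyRule] = {
    "alkyltransferase": FamilyRule(family_required=True, members_all_active=True),
    "photolyase": FamilyRule(family_required=True, members_all_active=False),
    "Ung/MUG uracil glycosylase": FamilyRule(family_required=False, members_all_active=True),
}


def infer_activity(family: str, homolog_present: bool) -> str:
    """Infer likely activity from presence or absence of a family homolog."""
    rule = RULES[family]
    if homolog_present:
        return "likely present" if rule.members_all_active else "cannot conclude"
    return "likely absent" if rule.family_required else "cannot conclude"


for family in RULES:
    for present in (True, False):
        print(f"{family}, homolog present={present}: {infer_activity(family, present)}")
```

Only the alkyltransferase family supports inference in both directions; for photolyases absence is the informative observation, and for Ung/MUG presence is.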