ABSTRACT
THE 4R GENOME DUPLICATION IN SALMONINE FISHES:
INSIGHTS FROM CONSERVED NON-CODING ELEMENTS
Anibal H. Castillo
University of Guelph, 2008
Co-Advisors: Professors Dr. Moira M. Ferguson and Dr. Roy G. Danzmann
Gene and genome duplications are important processes in evolution. Salmonids are ideal animal model systems in which to study these processes, as they originated from a tetraploid ancestor. Conserved non-coding elements (CNEs) are of interest because their DNA consensus motifs are highly conserved across lineages as divergent as humans and fish. The main goals of this study are to test CNEs as a tool for studying genome duplications and to revisit the “4R” hypothesis and the phylogeny of three salmonine fishes (Salmonidae), Salmo salar, Salvelinus alpinus and Oncorhynchus mykiss, through the study of copy number and nucleotide variation in six pairs of CNEs. Allele numbers for most CNE sequence pairs are consistent with the 4R hypothesis, as is the symmetric phylogenetic topology shown by some CNE pairs; the estimated date of CNE duplication is consistent with the only reported range of 25-100 Mya. However, the phylogenetic relationships within Salmoninae remain unresolved.
1. The 4R Genome Duplication
in Salmonine Fishes:
Insights from Conserved
Non-Coding Elements
by Aníbal H. Castillo
February 5th, 2008
2. Outline-
•Introduction
•Polyploidy
•Salmonids: a recently polyploid taxon
•How to study polyploidy
•CNEs: a potential tool for studying genome evolution
•Hypotheses & predictions
•Methods
•Marker development
•Phylogenetic analyses
•Molecular clock
•Results and Discussion
•Phylogeny and 4R
•Dating 4R
•Salmoninae phylogeny
3. •Gene duplications and polyploidization
events are important in the evolution of
vertebrates (Ohno 1970; 1999)
•The recent surge of genomic data provides an
opportunity for innovative testing of established
hypotheses
•Potential to extend new resources to organisms
of biological but non-commercial interest
Introduction- Polyploidy
4. •Common in plants, fungi and animals
•Among vertebrates: fishes, reptiles and
amphibians
• Within fishes:
•non-teleosts: paddlefish, sturgeon and
spotted gar
•teleosts: carps and Salmonids
Introduction- Polyploid taxa
5. Introduction- Whole Genome Duplications in Vertebrates
[Figure, modified from Froschauer et al. 2006: vertebrate phylogeny with a 450 MYA time label, marking the 1R, 2R, 3R and 4R whole genome duplications. Taxa shown: Outgroup, Jawless fish, Cartilaginous fish, Lobe finned fish, Tetrapods, Ray finned fish, Bichir, Sturgeons, Bowfin, Teleosts, Eel, Zebrafish, Catfish, Salmonids, Stickleback, Tilapia, Cichlids, Pufferfish, Platyfish, Medaka.]
6. Introduction- Approaches to WGDs
•Genomic approaches:
•chromosomal and genome evolution,
•gene regulation
•genetic architecture of phenotypic
variation
• Patterns of gene and genome duplications:
•how many duplication events
•when they occurred
•mechanisms behind these events
7. •Duplications can involve individual genes,
chromosome segments, entire chromosomes and
finally entire genomes
•The most rigorous way of testing genome
doublings is identifying paralogous
chromosomal regions or block duplications
•Once multiple conserved syntenic blocks
are identified, a WGD can be inferred
Introduction- Approaches to WGDs
8. •Require a map-based dataset with the position
of markers in the species’ genome
•Substantial effort added for each species
included
•Inherently expensive
Introduction- Limitations of usual approaches
9. Introduction- WGD can also be studied through
a phylogenetic approach
[Figure: two schematic gene trees over time, each with an outgroup and duplicated genes from Sp. A to Sp. D. In the "WGD event" tree the duplicates form two clades that each contain the species; in the "Individual duplication events" tree the two copies from each species group together.]
10. Introduction- WGD pattern in Vertebrates
[Figure, modified from Froschauer et al. 2006: gene tree illustrating the WGD pattern in vertebrates, with tips labelled Outgroup, Mammals, Birds, Fishes a and Fishes b.]
11. Introduction-
-Ancestrally tetraploid
-Monophyletic; Phylogeny unresolved
-Intermediate development as a genomic model
-Genetic maps
-Sequencing projects (O. mykiss & S. salar)
Subfamily Salmoninae, family Salmonidae
Rainbow trout, Oncorhynchus mykiss
Atlantic salmon, Salmo salar
Arctic charr, Salvelinus alpinus
Image credits: www.lofotakvariet.no, www.fishbase.org
12. Introduction- Whole Genome Duplications in Vertebrates- 4R
[Figure, modified from Froschauer et al. 2006: the vertebrate phylogeny with a 450 MYA time label and the 1R, 2R, 3R and 4R whole genome duplications, shown again with the 4R event in Salmonids called out.]
13. •Comparing the genome of humans and
Japanese pufferfish (Fugu rubripes), 1373
CNEs were identified
•~90% conserved
•Average of 199bp, maximum length of 736bp
•Occur throughout the human genome;
regulating developmental genes
•Unique to vertebrates
Introduction- Conserved Non-coding Elements,
CNEs (Woolfe et al. 2005)
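To make the "~90% conserved" figure concrete, here is a minimal sketch that assumes conservation is read as pairwise identity over aligned, ungapped columns. The two sequences are hypothetical and are not taken from Woolfe et al. (2005).

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of ungapped alignment columns where both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical 20-column alignment (illustration only, not real CNE data).
human_cne = "ACGTTGCAAGCTTAGGCTAA"
fugu_cne = "ACGTTGCATGCTTAGGCTGA"
identity = percent_identity(human_cne, fugu_cne)
print(f"{identity:.1f}% identical")
```

Real CNE comparisons would start from a proper alignment; the function only counts matches once the sequences are already aligned.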
14. •CNEs in Salmonids
•Evolution of CNEs
•Relative importance of 4R vs. other
processes
•Potentially useful via both mapping and
phylogenetic approaches
•Test CNEs in Salmonids for application to non-
classic organisms:
• e.g., South American rodents, Xenopus sp.
Introduction- CNEs: a tool to study genome
evolution
15. Hypotheses:
1. Members of a CNE family will show a
symmetric phylogeny consistent with the 4R
hypothesis in Salmonid fishes
2. The date of inferred CNE duplications will be
consistent with the range of 25-100Mya
(Allendorf & Thorgaard 1984)
3. Salmoninae phylogeny: are Oncorhynchus
and Salvelinus sister groups (Crespi &
Fulton 2003)?
Introduction-
16. -Members of a CNE family will show a
symmetric phylogeny consistent with the 4R
hypothesis in Salmonid fishes
[Figure: predicted tree over time. After the WGD, CNE duplicate I and CNE duplicate II each form a clade containing Sp. A, Sp. B and Sp. C, with an Outgroup at the base.]
Hypothesis 1- Prediction
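The prediction for Hypothesis 1 can be phrased as a simple check on tree shape. The sketch below is only an illustration under stated assumptions: trees are written as nested Python tuples, each leaf is a species name, the outgroup has already been removed, and the function name `matches_wgd_pattern` is hypothetical. It is not the tree-building procedure used in the study.

```python
def leaf_species(tree):
    """Return the species labels at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    labels = []
    for child in tree:
        labels.extend(leaf_species(child))
    return labels


def matches_wgd_pattern(ingroup):
    """True if the basal bifurcation yields two clades with the same species,
    each species appearing once per clade (the symmetric 4R-style topology)."""
    if isinstance(ingroup, str) or len(ingroup) != 2:
        return False
    left = leaf_species(ingroup[0])
    right = leaf_species(ingroup[1])
    no_repeats = len(set(left)) == len(left) and len(set(right)) == len(right)
    return no_repeats and set(left) == set(right)


# One duplication before speciation: duplicate I clade vs. duplicate II clade.
wgd_tree = ((("A", "B"), "C"), (("A", "B"), "C"))
# Independent duplications: the two copies of each species group together.
independent_tree = ((("A", "A"), ("B", "B")), ("C", "C"))

print(matches_wgd_pattern(wgd_tree))
print(matches_wgd_pattern(independent_tree))
```

This strict version would also reject a tree in which one duplicate has been lost in a species, so a real analysis would need to tolerate missing copies.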
17. -The date of inferred CNE duplications will be
consistent with the range of 25-100Mya
(Allendorf & Thorgaard 1984)
[Figure: time axis with the split between CNE duplicates I and II placed at 25-100Mya.]
Hypothesis 2- Prediction
20. • Tree building
• Bayesian analyses
• Maximum Likelihood
• Maximum Parsimony
• Model selection
• Bayesian Information Criterion
• Akaike Information Criterion
• Likelihood Ratio Test
Methods- Phylogenetic analyses
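The three model-selection criteria listed above reduce to short formulas once each candidate model has a log-likelihood. The sketch below uses the standard definitions (AIC = 2k - 2 lnL, BIC = k ln n - 2 lnL, LRT statistic = 2(lnL_complex - lnL_simple)). The model names, parameter counts, log-likelihoods and the use of alignment length as the sample size n are all assumptions for illustration, not values from this study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: lower is better."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def lrt_statistic(lnl_simple: float, lnl_complex: float) -> float:
    """Likelihood ratio statistic for two nested models."""
    return 2 * (lnl_complex - lnl_simple)


# Hypothetical candidates: name -> (log-likelihood, free substitution parameters).
candidates = {
    "JC69": (-520.0, 0),
    "HKY85": (-505.0, 4),
    "GTR": (-503.0, 8),
}
n_sites = 200  # hypothetical alignment length, used as the sample size

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
jc_vs_hky = lrt_statistic(candidates["JC69"][0], candidates["HKY85"][0])

print(best_by_aic, best_by_bic, jc_vs_hky)
```

BIC penalizes extra parameters more heavily than AIC whenever ln n exceeds 2, so the two criteria can disagree on larger alignments.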
21. •Strict and relaxed clocks
•Calibration points:
•Oncorhynchus fossils 6MY
•Salmonine fossils 20MY
Methods- Molecular clock
[Figure: time axis with duplicates I and II.]
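As a rough illustration of how a strict clock turns a fossil calibration into a date, the sketch below derives a substitution rate from one calibrated node and applies it to the distance between duplicates. It assumes a single constant rate and pairwise distances in substitutions per site. The distances are hypothetical, and only the 20 My calibration age comes from the slide. Relaxed clocks, which allow the rate to vary among branches, are not covered.

```python
def strict_clock_rate(pairwise_distance: float, node_age_my: float) -> float:
    """Rate per site per My; the distance accumulates along both lineages."""
    return pairwise_distance / (2.0 * node_age_my)


def divergence_time_my(pairwise_distance: float, rate: float) -> float:
    """Age of the node separating two sequences under a strict clock."""
    return pairwise_distance / (2.0 * rate)


def within_range(age_my: float, low_my: float = 25.0, high_my: float = 100.0) -> bool:
    """Check an estimate against the 25-100 Mya range of Hypothesis 2."""
    return low_my <= age_my <= high_my


# Hypothetical distances (illustration only, not data from the study).
calibration_distance = 0.03  # across the node calibrated at 20 My
duplicate_distance = 0.06    # between CNE duplicates I and II

rate = strict_clock_rate(calibration_distance, node_age_my=20.0)
estimate = divergence_time_my(duplicate_distance, rate)
print(round(estimate, 1), within_range(estimate))
```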
23. Results- 1. WGD pattern
[Figure: two schematic gene trees over time, each with an outgroup and duplicated genes from Sp. A to Sp. D, contrasting the topology expected from a single WGD event with that expected from individual duplication events.]
27. •Three markers had signal at the 4R level
•Basal bifurcation with symmetric topology
consistent with the 4R hypothesis
•Three markers had no signal; these do not
refute the 4R hypothesis
•Suggestive evidence that one duplicate from
CNE7061-7063 was lost in Atlantic salmon
Discussion- 1. Phylogeny at the 4R level
28. Results- 2. Dating 4R
[Figure: dated tree in which CNE duplicate I and CNE duplicate II, each covering Subfamily Salmoninae (Oncorhynchus, Salvelinus and Salmo), split at 38 - 47 Mya, with node labels of 20 Mya and 6 Mya.]
Marker          Clock type   Mean (Mya)
CNE6820-6816    strict       47
CNE7061-7063    relaxed      43
CNE7060-7061    relaxed      38
29. •Stochastic nature of the molecular clock
•Uncertainty of the assigned fossil dates and
classification
•The correspondence between fossils and
nodes in the tree
•Only the subfamily Salmoninae is included;
including more subfamilies within Salmonidae
(e.g., Thymallinae) would yield more robust
results
Discussion- 2. Molecular clock- Uncertainties…
30. •First estimates for the date of the 4R since
Allendorf & Thorgaard (1984)
•First ones ever based on nucleotide sequence
data
•Narrower estimate of the date of the 4R,
38Mya to 47Mya
Discussion- 2. Molecular clock
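The comparison with the earlier range is simple arithmetic, sketched here with the three mean dates reported on the dating slide and the 25-100 Mya range attributed to Allendorf & Thorgaard (1984). The variable names are illustrative.

```python
# Mean duplication dates (Mya) reported for the three dated CNE pairs.
cne_mean_dates_mya = [47, 43, 38]
# Previously reported range for the 4R event (Mya).
reported_low, reported_high = 25, 100

all_within = all(reported_low <= d <= reported_high for d in cne_mean_dates_mya)
new_span = max(cne_mean_dates_mya) - min(cne_mean_dates_mya)
old_span = reported_high - reported_low

print(all_within, new_span, old_span)
```

The spans compare point estimates only; the uncertainty around each mean is not represented here.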
33. Discussion- 3. Salmonine phylogeny
•No conclusive evidence supporting a sister
relationship between Oncorhynchus and
Salvelinus
•Duplicates within one locus support alternative
phylogenies
•One locus suggests a sister relationship
between Salmo and Salvelinus, never reported
before
•Hard polytomy, reticulation
34. •Phylogeny showing a basal bifurcation and
symmetric topology in some CNE pairs,
consistent with 4R
•The estimated date of CNE duplication is
consistent with the reported range of 25-
100Mya
•However, the phylogenetic relationships
within Salmoninae remain unresolved
•CNEs are a suitable tool for preliminary
approaches to the study of Whole Genome
Duplications
Conclusions-
35. Acknowledgements-
•My advisors, Dr. Moira M. Ferguson and Dr. Roy G.
Danzmann
•Past and current lab members, especially Hooman,
Janet and Michael
•From my advisory committee: Dr. T. Ryan Gregory
•From my examination committee: Dr. J. Ballantyne
and Dr. R. Hanner
•Dr. Tom Nudds
•Members of the honourable Zoology House, John
Urquhart, Joe Crowley, Emilia Argue, Han Xu, Renji
Lu, Jackie Porter, Alison Fischer and Liyan Qing
•Derek Wong, Dan Noble, Vitali Rosen, Momina Mir
and last but not least, Jessica-Margaret Paige
Speech:
-I took many approaches both for the tree-building itself and for the selection of the models of molecular evolution upon which the trees were built.
-The ones shown are those done by Bayesian Analyses, obtained with models chosen by BIC.
-These agree in all major points with the ones obtained by the other methods.
-I’d be happy to discuss this in more detail after the talk…