Investigation into the phylogeny of Odobenus rosmarus

A report for Nello Cristianini for the unit EMATM0004 Computational Genomics and Bioinformatics Algorithms
By Samuel R Neaves SN0550
November 2011
Introduction

This project investigates the evolutionary history of Odobenus rosmarus (the walrus). The evolution of the Pinnipedia (Odobenidae: walruses; Otariidae: eared seals, including sea lions and fur seals; Phocidae: earless seals) is said to be enigmatic, with the exact relationships between the families in dispute. The majority of authors support a monophyletic origin of the pinnipeds from a caniform ancestor; however, others suggest a diphyletic origin, with the Phocidae being related to the mustelids (themselves a disputed family) (Arnason et al., 1995).

A further dispute is that some authors divide the walrus into three subspecies of Odobenus rosmarus (rosmarus, divergens and laptevi); however, recent work by Lindqvist et al. (2009) concludes that laptevi is not a distinct subspecies from divergens. The aim of this investigation is to gather evidence for the true phylogeny.

Data Description

The primary species for this investigation is Odobenus rosmarus rosmarus. The GenBank accession number of its complete mitochondrial DNA is NC_004029(.2). Its phylogeny will be computed in relation to Erignathus barbatus (bearded seal, representing Phocidae), Zalophus californianus (California sea lion, representing Otariidae), Ursus maritimus (polar bear, representing Caniformia) and Gulo gulo (wolverine, representing the mustelids). Homo sapiens is used as an outgroup to root the phylogenetic trees. For the full table of accession numbers see Appendix A.

Sequence statistics

The mitochondrial DNA of Odobenus rosmarus rosmarus was statistically analyzed, with the following results.

The size of the genome is 16565 base pairs.

The number of each base:

A       C       G       T
5401    4310    2414    4440

The base frequencies:

A       C       G       T
0.3260  0.2602  0.1457  0.2680

This shows that there are more than twice as many A's as G's, with roughly the same number of C's and T's over the whole genome. This is an interesting break from the norm of A and T content being similar and G and C content being similar. To investigate further, and to consider local fluctuations in the nucleotide frequencies, we employ sliding windows of size 5000, 2000 and 500 and plot the frequencies.
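The sliding-window frequencies described above could be computed along the following lines. This is only a minimal sketch: the function names (`base_frequencies`, `sliding_window_frequencies`, `at_gc_content`) and the step size are illustrative assumptions, not the code used for this report, and the toy string stands in for the real genome sequence.

```python
from collections import Counter

BASES = "ACGT"

def base_frequencies(seq):
    """Relative frequency of each of A, C, G, T in seq (other symbols ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES)
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: counts[b] / total for b in BASES}

def sliding_window_frequencies(seq, window, step=1):
    """Yield (start, frequencies) for every window of the given size."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, base_frequencies(seq[start:start + window])

def at_gc_content(freqs):
    """Collapse per-base frequencies into (A+T, G+C) content."""
    return freqs["A"] + freqs["T"], freqs["G"] + freqs["C"]

# Toy example; the real input would be the 16565 bp genome sequence.
toy = "AAAATTTTGGGGCCCC"
for start, freqs in sliding_window_frequencies(toy, window=8, step=4):
    print(start, at_gc_content(freqs))
```

Plotting each base's frequency against the window start position for windows of 5000, 2000 and 500 would give curves of the kind described in this section.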
[Figure: nucleotide density (A, C, G, T) and A-T / C-G density against genome position (0 to 18000) for sliding windows of size 5000, 2000 and 500.]

A sliding window of size 5000 does not show a great deal of variation in composition; however, the smaller windows clearly show peaks and troughs. This indicates that the nucleotides are not drawn from an independent and identically distributed process, since the distribution changes along the genome. With the caveat that the aggregate frequencies already depart from the usual balance of A with T and G with C, the AT and GC content are also plotted; at the smallest window size this seems to show six distinct waves of variation in both AT and GC content.

Next we employ an ab initio method to find protein-encoding genes. The single-nucleotide permutation test assesses the significance of open reading frames (ORFs). With the threshold set to be longer than every ORF found in a randomly permuted sequence, it finds 1 gene. If we instead set α to 5%, 12 genes are found. We are careful to use the correct genetic code, the vertebrate mitochondrial code. We translate these genes into amino acid sequences and identify cytochrome B and cytochrome C by running BLAST on them. Once these are identified, we run further protein BLAST searches using both cytochromes to identify the nearest other species.
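The ORF search and permutation threshold described above can be sketched as follows. This is a hedged illustration rather than the procedure actually run: the stop codons are those of the vertebrate mitochondrial code (NCBI translation table 2), but restricting start codons to ATG and ATA, scanning all six frames, and the helper names and default number of permutations are assumptions made for the example.

```python
import random

# Vertebrate mitochondrial code: TGA encodes Trp, AGA/AGG are stops.
STOPS = {"TAA", "TAG", "AGA", "AGG"}
STARTS = {"ATG", "ATA"}  # assumption: only the Met codons count as starts

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq):
    """Return (strand, start, end, length) for each start-to-stop ORF.

    Coordinates are relative to the strand being scanned.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    orfs.append((strand, start, i + 3, i + 3 - start))
                    start = None
    return orfs

def longest_orf_length(seq):
    return max((orf[3] for orf in find_orfs(seq)), default=0)

def permutation_threshold(seq, n_perm=100, alpha=None, seed=0):
    """Length threshold from single-nucleotide permutations of seq.

    alpha=None: longest ORF seen in any permuted sequence (strict).
    alpha=0.05: the (1 - alpha) quantile of the longest-ORF lengths.
    """
    rng = random.Random(seed)
    bases = list(seq)
    lengths = []
    for _ in range(n_perm):
        rng.shuffle(bases)
        lengths.append(longest_orf_length("".join(bases)))
    lengths.sort()
    if alpha is None:
        return lengths[-1]
    index = min(len(lengths) - 1, int((1 - alpha) * len(lengths)))
    return lengths[index]

def significant_orfs(seq, threshold):
    """ORFs strictly longer than the permutation threshold."""
    return [orf for orf in find_orfs(seq) if orf[3] > threshold]
```

Relaxing the threshold from the strict maximum to the 5% level can only admit more ORFs, which matches the direction of the change from 1 to 12 genes reported above.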
Results of CYTB BLAST:

Rank  Latin name                 Common name               Total score (max 760)
1     Halichoerus grypus         Grey seal                 681
2     Gulo gulo                  Wolverine                 680
3     Phoca vitulina stejnegeri  Harbour seal              679
4     Erignathus barbatus        Bearded seal              679
5     Ictonyx libyca             Saharan striped polecat   679

These results are interesting because they do not include any Otariidae, suggesting that the Pinnipedia have a diphyletic origin from the ancient caniform, with the Odobenidae, Phocidae and the mustelids on one branch and the Otariidae on another.

Results of CYTC BLAST:

Rank  Latin name                 Common name               Total score
1     Tremarctos ornatus         Spectacled bear           447
2     Otaria byronia             South American sea lion   446
3     Arctocephalus townsendi    Guadalupe fur seal        445
4     Neophoca cinerea           Australian sea lion       444
5     Callorhinus ursinus        Northern fur seal         444

This result is in contrast to the CYTB BLAST results, this time with many Otariidae, no Phocidae, a high-ranking caniform and no mustelids. This data appears to support the monophyletic origin hypothesis, or a diphyletic origin but with the Odobenidae on the branch of the Otariidae. To further the investigation we add the initially selected organisms to the data set and compute the genetic distance between each pair. We use the Jukes-Cantor correction to account for multiple substitutions that have occurred at the same site:

$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}d\right)$$

This states that the number of substitutions per site between two sequences (K) can be estimated from the observed fraction of sites that differ (d).

With this applied to cytochrome B it is clear that the polar bear is very distantly related compared to the other species. It is interesting that the data suggest that the spectacled bear is a closer relation to the pinnipeds than the polar bear.
If we remove the polar bear to allow us to zoom in, we can see five distinct groups. The data show the walrus is about equally distant from the Otariidae and the Phocidae, with the Otariidae closer to the mustelids and as far from the Phocidae as they are from the spectacled bear.

Performing the same procedure for cytochrome C we get similar results; however, this time the Phocidae are clearly grouped with the mustelids, along with the polar bear. The spectacled bear is once again on its own, slightly closer to the Phocidae than the Otariidae. This leaves the walrus again as an outlier, roughly equidistant from the two major clusters.
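The pairwise distances discussed above can be sketched as below, using the nucleotide form of the Jukes-Cantor correction, K = -(3/4) ln(1 - (4/3) d). The function names and the handling of gaps (columns with a gap or ambiguous symbol are skipped) are assumptions for illustration; the sequences are assumed to be already aligned to equal length.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Observed fraction d of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]  # skip gaps and ambiguous bases
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(d):
    """Estimated substitutions per site K from the observed fraction d."""
    if d >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * d / 3)

def distance_matrix(sequences):
    """Map each unordered pair of names to its Jukes-Cantor distance."""
    return {
        (n1, n2): jukes_cantor(p_distance(sequences[n1], sequences[n2]))
        for n1, n2 in combinations(sorted(sequences), 2)
    }
```

The correction always gives K greater than or equal to d, and the gap between the two grows as sequences diverge, which is why distant taxa such as the polar bear in the cytochrome B data stand out so strongly.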
Four phylogenetic trees were built, one for each cytochrome from both amino acid and nucleotide sequences. In order to build the cytochrome C nucleotide tree, the amino acid sequences of a number of animals, including Odobenus rosmarus, had to be back-translated to nucleotides because nucleotide sequence data were unavailable. As this is not a one-to-one mapping, it introduces some random substitutions, which may affect the accuracy of this tree.

The results present a confused picture, with many contradictions between the four trees. However, if we discount the cytochrome C nucleotide tree there appears to be some consensus: the Otariidae and Phocidae are consistently grouped together, and Odobenus rosmarus is seen to split first from the common ancestor of the Otariidae and Phocidae, which then diverged at a later date. This stands in contrast to the results of Arnason et al. (1995), which show the Phocidae splitting first, with a later split between the Otariidae and Odobenus rosmarus. However, Lento et al. (1995) does offer some evidence for Odobenus rosmarus being an early divergence from the common pinniped ancestor, which would be consistent with these results. There are major differences in the placing of Ursus maritimus, Tremarctos ornatus and Gulo gulo between the cytochrome B and C trees: the cytochrome C tree puts the mustelids, Ursus maritimus and Tremarctos ornatus on the same branch as the Phocidae, whereas the cytochrome B tree has the mustelids and Tremarctos ornatus close to the Otariidae, with Ursus maritimus being a distant relation. Castresana (2001) presents evidence that cytochrome B is more reliable for constructing trees at the genus and family level, and therefore this tree may be taken as a more reliable indicator of the true phylogeny.

The online taxonomy browser collated by NCBI lists the Odobenidae, Phocidae and Otariidae as three distinct families within the suborder Caniformia and does not have any one group as an ancestor of the others.

Multiple alignments

In order to build multiple alignments and identify polymorphic sites, the heuristic CLUSTALW tool was used to align both the cytochrome B and cytochrome C protein sequences. This was set to use the BLOSUM protein weight matrix with the gap open penalty set to 10, the gap extension penalty set to 0.20, gap distances set to 5 and No End Gaps set to 'No'. To see the full alignments refer to Appendix B. Both alignments are very good, apart from the outgroup and the polar bear in cytochrome B. The majority of polymorphic sites in cytochrome B are consistent with the groupings of Odobenidae, Phocidae and Otariidae. They include both indels and point mutations. The sites are fairly sporadic across the sequences, in contrast to the polymorphic sites in cytochrome C, which mostly lie between the 50th and 100th amino acid, with the extremities remaining constant.
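Once an alignment is available, the polymorphic sites mentioned above can be listed with a short scan over its columns. The sketch below assumes the alignment has already been produced (for example by CLUSTALW) and loaded as a mapping from name to gapped sequence; the function name and the two-way classification into "indel" and "substitution" are illustrative choices, not part of the original analysis.

```python
def polymorphic_sites(alignment):
    """List the variable columns of an alignment.

    alignment: dict mapping sequence name -> aligned sequence, all of the
    same length, with '-' marking gaps.
    Returns (1-based column, kind, residues in input order) per variable site.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(length):
        column = [s[col] for s in seqs]
        residues = set(column)
        if len(residues) > 1:
            # A column containing a gap is reported as an indel.
            kind = "indel" if "-" in residues else "substitution"
            sites.append((col + 1, kind, "".join(column)))
    return sites

# Toy protein alignment with made-up names and residues.
toy = {"seq_a": "MKT-L", "seq_b": "MRT-L", "seq_c": "MKTAL"}
for site in polymorphic_sites(toy):
    print(site)
```

Counting how many of the reported columns fall inside a given range (such as residues 50 to 100) is then a one-line filter on the first element of each tuple.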
Addressing the question of how many subspecies of Odobenus rosmarus there are, we use a selection of walrus samples from the Lindqvist et al. (2009) study. These sequences are ATL25tRNA-Trp and tRNA-Pro genes from the mtDNA region of the genomes. We follow the same procedure as earlier, computing the genetic distance between the samples using the Jukes-Cantor correction and plotting these on a graph. We use this computation to build an unrooted phylogenetic tree. Both the tree and the distance plot conform with the conclusion of Lindqvist et al. (2009) that the walruses sampled from the Laptev Sea are indeed just a subgroup of the Pacific walrus, because they sit in a sub-branch of Odobenus rosmarus divergens and their genetic distances are mixed amongst the Pacific samples. This data and analysis therefore do not justify labeling these as a separate subspecies.

A further point of note is that the Atlantic walrus genetic data show signs of a genetic bottleneck, given the lack of diversity compared to the Pacific walrus. This fits the historical fact that the Atlantic walrus was almost hunted to extinction by the 1950s, with numbers beginning to recover since then, whereas the more remote locations inhabited by the Pacific walrus protected them from human hunting, which has allowed their numbers to remain much higher throughout the 20th century and therefore accounts for the greater genetic diversity shown in the samples. If further, larger samples are collected and more detailed analyses show the same results, then it may be time to change the current NCBI taxonomy browser to show only two subspecies of Odobenus rosmarus.

[Figure legend: Atlantic, Pacific, Laptevi]
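One simple way to put a number on the diversity comparison above is the mean pairwise proportion of differing sites within each group of samples. The report does not state which statistic was used, so this is purely an assumed illustration; the function names and the toy sequences are hypothetical and are not data from the study.

```python
from itertools import combinations

def proportion_different(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_pairwise_diversity(seqs):
    """Average proportion_different over all pairs of samples in a group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(proportion_different(a, b) for a, b in pairs) / len(pairs)

# Made-up toy groups, for illustration only.
group_low = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
group_high = ["ACGTACGT", "ACTTACGA", "GCGTAGGT"]
print(mean_pairwise_diversity(group_low), mean_pairwise_diversity(group_high))
```

A markedly lower value for one population than another is the kind of pattern that would be consistent with a past bottleneck, although small sample sizes make such comparisons fragile.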
Conclusion

The analysis that we have performed presents results that stand in contrast to the two papers Arnason et al. (1995) and Lento et al. (1995). This shows that the question of pinniped evolution is indeed very interesting, with a variety of hypotheses still in contention. The examination of whether there are two or three walrus subspecies came to the same conclusion as Lindqvist et al. (2009), despite using different techniques and methods. It must be said that the same data were used for this study and that of Lindqvist et al. (2009). Taken together with the low number of samples, the use of amplicons, and the inherent difficulty of sampling Odobenus rosmarus, which potentially leads to sampling errors such as close relatives being sampled, this leaves the hypothesis very much still open to refutation.

While the evolution of the pinnipeds remains inconclusive, there remains a need for further, more in-depth studies to allow reliable conclusions to be drawn, so that wise actions can be taken to protect this charismatic and vulnerable Arctic creature from the threats of hunting and habitat destruction that continue to push many creatures to extinction.

A pair of curious walruses (image from http://www.free-extras.com/images/walrus-8927.htm)
Appendix C: Bibliography

Andersen et al. (1998). Population structure and gene flow of the Atlantic walrus (Odobenus rosmarus rosmarus) in the eastern Atlantic Arctic based on mitochondrial DNA and microsatellite variation. Molecular Ecology(7), 1323-1336.

Arnason, U., Bodin, K., Gullberg, A., Ledje, C., & Mouchaty, S. (1995). A molecular view of pinniped relationships with particular emphasis on the true seals. Journal of Molecular Evolution(40), 78-85.

Castresana, J. (2001). Cytochrome b phylogeny and the taxonomy of great apes and mammals. Molecular Biology and Evolution(18), 465-471.

Lento et al. (1995). Use of spectral analysis to test hypotheses on the origin of pinnipeds. Molecular Biology and Evolution(12), 28-52.

Lindqvist et al. (2009). The Laptev Sea walrus Odobenus rosmarus laptevi: an enigma revisited. Zoologica Scripta(38), 113-127.