Genome exploration in A-T G-C space: introducing Icarus, a DNA walking program. Jonathan Blakes, MSc Biotechnology and Computation, Department of Biosciences, Faculty of Science, Technology and Medical Studies
Hypothesis: Can DNA sequences be plotted in such a way that long sequences can be easily interpreted by humans without a priori knowledge? “It seems that the simplest method of visualizing some properties of genomes is to send a virtual walker for a genomic walk, ask "it" to talk about what it has seen and note its observations. If our walker doesn't move with a Brownian-like motion, it is possible to extract from its walk a lot of information.” Stanislaw Cebrat, the principal Polish proponent of DNA walks. Assigning a cardinal direction (north, south, east or west) to each of the four nucleotide bases (A, T, G, C) and taking a step in that direction as each base is read will produce a ‘walk’ of the sequence, in which repetitive DNA elements appear as repetitive 2-dimensional ‘structures’.
DNA walks are plots of DNA or RNA sequences in which each of the four nucleotide bases is assigned a direction and distance. The sequence is read off one nucleotide at a time, and for each nucleotide the virtual walker takes a step in the designated direction, creating a 'walk' of the sequence that reveals elements of structure in the nucleotide composition. DNA walking: from the Comparative Genometrics website, L'Université de Lausanne.
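The walking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Icarus implementation: the function name `dna_walk`, the unit step length, and the choice of which base points north or east are assumptions. Only the pairing (A opposite T, G opposite C) comes from the talk.

```python
# Assumed A-T G-C mapping: A opposes T on the y axis, G opposes C on the x axis.
# The specific compass direction given to each base is an illustrative choice.
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def dna_walk(sequence, steps=STEPS):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        # Symbols without a mapping (e.g. N) take no step in this sketch.
        dx, dy = steps.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


# Short example: A and T cancel, then two steps east and one back west.
print(dna_walk("ATGGC"))
```

Plotting the returned positions as a connected line gives the picture of the walk; a tandem repeat in the sequence would retrace the same shape once per copy, offset by the net displacement of one repeat unit.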
Icarus Live Demonstration Could someone please suggest a mammalian gene to walk?
Mapping: there are 24 possible assignments of the four bases to the four cardinal vectors: 4 rotations of each of the 3 above mappings, and 4 rotations of each of their reflections about the x or y axis. Choosing which 3 ‘unique’ mappings to use from those 24 is a matter of parsimony.
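The count of 24 mappings in 3 families can be checked by enumeration. The sketch below assumes unit cardinal vectors and groups every base-to-vector assignment by which bases end up pointing in opposite directions; rotations and reflections never change that pairing, so each family should hold 8 members. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

BASES = "ATGC"
VECTORS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # north, east, south, west


def opposing_pairs(mapping):
    """Return the pairs of bases whose vectors point in opposite directions."""
    pairs = set()
    for a in BASES:
        for b in BASES:
            ax, ay = mapping[a]
            bx, by = mapping[b]
            if a < b and (ax, ay) == (-bx, -by):
                pairs.add(a + b)
    return frozenset(pairs)


def classify_mappings():
    """Group all 4! = 24 assignments by their opposing base pairs."""
    classes = defaultdict(list)
    for perm in permutations(VECTORS):
        mapping = dict(zip(BASES, perm))
        classes[opposing_pairs(mapping)].append(mapping)
    return classes


for pairs, members in classify_mappings().items():
    print(sorted(pairs), len(members))
```

Each family is identified by its two opposing pairs, for example A opposite T together with C opposite G, which is the mapping the talk calls A-T G-C.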
The A-T G-C mapping consistently gives the smallest picture. Smaller pictures can contain more information in less space and are therefore more amenable to publication; hence ‘Genome Exploration in A-T G-C space’.
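One way to compare picture sizes across mappings is to measure the bounding box of each walk. The sketch below is an assumption about how "smallest" might be quantified; the talk does not say which measure was used. The labels "A-G C-T" and "A-C G-T" and the axis orientation of each mapping are likewise illustrative.

```python
# One representative per family of mappings (axis orientation is assumed).
MAPPINGS = {
    "A-T G-C": {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)},
    "A-G C-T": {"A": (0, 1), "G": (0, -1), "C": (1, 0), "T": (-1, 0)},
    "A-C G-T": {"A": (0, 1), "C": (0, -1), "G": (1, 0), "T": (-1, 0)},
}


def bounding_box(sequence, steps):
    """Return (width, height) of the smallest box containing the walk."""
    x = y = 0
    min_x = max_x = min_y = max_y = 0
    for base in sequence.upper():
        dx, dy = steps.get(base, (0, 0))  # unmapped symbols take no step
        x, y = x + dx, y + dy
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
    return max_x - min_x, max_y - min_y


def compare_mappings(sequence):
    """Bounding box of the same sequence under each representative mapping."""
    return {name: bounding_box(sequence, steps) for name, steps in MAPPINGS.items()}


print(compare_mappings("ATAT"))
```

For the toy input `ATAT` the A-T G-C walk oscillates in place, while the other two mappings drift diagonally; whether this holds for a real genomic sequence depends on its composition.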
Duplications: (figure legend: exons, introns) a 7-fold contiguous duplication in the male Y chromosome. Members of the TSPY (testis-specific Y-encoded protein) family were identified by Skaletsky et al.1 using a combination of a whole-chromosome dotplot with a 2-kb window and a custom Perl script running BLAST alignments of all 5-kb sequence segments, in 2-kb steps, of the entire MSY (Male Specific Y). In contrast, I stumbled upon this purely by accident. 1. Skaletsky et al. Nature. 2003. 423.
DNA walks for phylogenetics: Imagine a 1-dimensional textual DNA sequence. The distance from the first base to the last is simply the number of bases in the sequence. A comparison of aligned sequences on the basis of this spatial distance (a much simpler measure than the Jukes-Cantor definition of evolutionary distance) will be unable to discriminate between them, because aligned sequences all have the same length. But for a DNA walk the spatial distance between the first and last bases is a function of: <ul><li>the nucleotide composition of a sequence and the 2D mapping</li><li>the order of the bases, since A might oppose T and C oppose G.</li></ul>Seven previously aligned, 1798-nucleotide-long small ribosomal subunit sequences of Candida and Saccharomyces species, as detailed in Gilfillan et al.1, were walked and their total Euclidean distances used to produce a phylogeny, which was compared to Gilfillan’s. 1. Gilfillan GD, et al. Microbiology. 1998. 144: 829-838.
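The first-to-last distance of a walk can be sketched as below, using the Euclidean and Manhattan measures. This is an illustrative calculation under an assumed A-T G-C mapping with unit steps; it is not the procedure used to build the phylogeny, and the function names are hypothetical.

```python
import math

# Assumed A-T G-C mapping with unit steps (axis orientation is illustrative).
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def end_displacement(sequence, steps=STEPS):
    """Return the (dx, dy) offset from the start of the walk to its end."""
    x = y = 0
    for base in sequence.upper():
        dx, dy = steps.get(base, (0, 0))  # unmapped symbols take no step
        x, y = x + dx, y + dy
    return x, y


def euclidean_distance(sequence, steps=STEPS):
    """Straight-line distance between the first and last positions."""
    dx, dy = end_displacement(sequence, steps)
    return math.hypot(dx, dy)


def manhattan_distance(sequence, steps=STEPS):
    """City-block distance between the first and last positions."""
    dx, dy = end_displacement(sequence, steps)
    return abs(dx) + abs(dy)


# Two sequences of equal length but different composition.
for seq in ("AAGG", "ATGC"):
    print(seq, euclidean_distance(seq), manhattan_distance(seq))
```

Both example sequences are four bases long, so a 1-dimensional length comparison cannot separate them, whereas their walk displacements differ.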
Does summing distances from 3 mappings eliminate bias and produce a better phylogeny? NO. A better distance measure is needed.
Conclusion <ul><li>Icarus is a DNA-walk-based genome browser that can retrieve sequences and annotate walks using Ensembl.</li><li>DNA walks can demonstrate the existence of duplications in DNA to the untrained eye.</li><li>Spatial distance measures can produce phylogenies, but a better measure is needed than Manhattan or Euclidean distance.</li></ul>
Acknowledgements: I would like to thank Dr. Gary Robinson, Dr. Colin Johnson, Dr. Anthony Baines, and everyone I have met during the Biotechnology and Computation MSc.