Genome Exploration in A-T G-C space (mk1)

My masters presentation demonstrating Icarus and DNA walking


  1. Genome exploration in A-T G-C space: introducing Icarus, a DNA walking program. Jonathan Blakes, MSc Biotechnology and Computation, Department of Biosciences, Faculty of Science, Technology and Medical Studies
  2. Problem: too much information!
  3. Genome browsers: EnsEMBL, UCSC
  4. Hypothesis: Can DNA sequences be plotted in such a way that long sequences can be easily interpreted by humans without a priori knowledge? “It seems that the simplest method of visualizing some properties of genomes is to send a virtual walker for a genomic walk, ask "it" to talk about what it has seen and note its observations. If our walker doesn't move with a Brownian-like motion, it is possible to extract from its walk a lot of information.” (Stanislaw Cebrat, the principal Polish proponent of DNA walks.) Assigning a cardinal direction (north, south, east or west) to each of the four nucleotide bases (A, T, G, C) and taking a step in that direction as each base of the sequence is read will produce a ‘walk’ of the sequence in which repetitive DNA elements appear as repetitive 2-dimensional ‘structures’.
  5. DNA walking: DNA walks are plots of DNA or RNA sequences in which each of the four nucleotide bases is assigned a direction and distance. The sequence is read off one nucleotide at a time, and for each nucleotide the virtual walker takes a step in the designated direction, creating a 'walk' of the sequence that reveals elements of structure in the nucleotide composition. (From the Comparative Genometrics website, Université de Lausanne.)
  6. Icarus live demonstration: could someone please suggest a mammalian gene to walk?
  7. Mapping. There are 24 possible combinations of cardinal vectors: 4 rotations for each of the 3 above mappings, and 4 rotations of each of their reflections about the x or y axis. Choosing which 3 ‘unique’ mappings of those 24 to use is a matter of parsimony.
  8. A-T G-C
  9. A-G C-T
  10. A-C G-T
  11. A-T G-C
  12. A-T G-C is consistently smallest. Smaller pictures can contain more information in less space and are therefore more amenable to publication, hence ‘Genome Exploration in A-T G-C space’.
  13. Duplications (exons, introns): a 7-fold contiguous duplication in the male Y chromosome. Members of the TSPY (testis-specific Y-encoded protein) family were identified by Skaletsky et al. [1] using a combination of a whole-chromosome dot plot with a 2-kb window and a custom Perl script running BLAST alignments of all 5-kb sequence segments, in 2-kb steps, of the entire MSY (male-specific Y). In contrast, I stumbled upon this purely by accident. [1] Skaletsky et al. Nature 2003 423.
  14. DNA walks for phylogenetics. Imagine a 1-dimensional textual DNA sequence. The distance from the first base to the last is simply the number of bases in the sequence, so a comparison of aligned sequences on the basis of that distance (a much simpler measure than the Jukes-Cantor definition of evolutionary distance) will be unable to discriminate between them. But for a DNA walk the spatial distance between the first and last bases is a function of:
     - the nucleotide composition of the sequence and the 2D mapping
     - the order of the bases, since A might oppose T and C oppose G.
     Seven previously aligned, 1798-nucleotide-long small ribosomal subunit sequences of Candida and Saccharomyces species, as detailed in Gilfillan et al. [1], were walked and their total Euclidean distances used to produce a phylogeny, which was compared to Gilfillan’s. [1] Gilfillan GD, et al. Microbiology. 1998. 144: 829-838.
  15. Phylogeny algorithms: neighbour joining; Icarus’ UPGMA; distance matrix
  16. Phylogeny demonstration
  17. Newick format and distance matrix output. Newick format is a string representation of a tree: (Bovine:0.69395, (Gibbon:0.36079, (Orang:0.33636, (Gorilla:0.17147, (Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939, Mouse:1.21460);
  18. Phylogenies with DNA walks
  19. Does summing distances from 3 mappings eliminate bias and produce a better phylogeny? No. A better distance measure is needed.
  20. Conclusion
     - Icarus is a DNA-walk-based genome browser that can retrieve sequences and annotate walks using Ensembl.
     - DNA walks can demonstrate the existence of duplications in DNA to the untrained eye.
     - Spatial distance measures can produce phylogenies, but a better measure is needed than Manhattan or Euclidean distance.
  21. Acknowledgements: I would like to thank Dr. Gary Robinson, Dr. Colin Johnson, Dr. Anthony Baines, and everyone I have met during the Biotechnology and Computation MSc.
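
The walking procedure described on slides 4 and 5, and the count of 24 mappings on slide 7, can be illustrated with a minimal Python sketch. This is not Icarus' code, which the slides do not show. The function names, the unit step length, the choice to skip ambiguity codes such as N, and the particular orientation chosen for "A-T G-C" (A north, T south, G east, C west) are all assumptions made for illustration.

```python
from itertools import permutations

# Unit steps (dx, dy) for the four cardinal directions.
NORTH, SOUTH, EAST, WEST = (0, 1), (0, -1), (1, 0), (-1, 0)

# Assumed orientation of the "A-T G-C" mapping: A opposes T, G opposes C.
AT_GC = {"A": NORTH, "T": SOUTH, "G": EAST, "C": WEST}


def dna_walk(sequence, mapping=AT_GC):
    """Return every (x, y) point visited by the walker, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for base in sequence.upper():
        if base not in mapping:  # assumption: skip N and other ambiguity codes
            continue
        dx, dy = mapping[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def bounding_box_area(points):
    """Area, in grid cells, of the smallest axis-aligned box containing the walk."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


def all_mappings():
    """All 24 assignments of the four bases to the four cardinal directions."""
    directions = (NORTH, SOUTH, EAST, WEST)
    return [dict(zip("ATGC", dirs)) for dirs in permutations(directions)]


def opposing_pairs(mapping):
    """Describe a mapping by which bases point in opposite directions."""
    pairs = set()
    for base, (dx, dy) in mapping.items():
        partner = next(b for b, v in mapping.items() if v == (-dx, -dy))
        pairs.add(frozenset((base, partner)))
    return frozenset(pairs)


if __name__ == "__main__":
    print(dna_walk("AAG"))
    print(len(all_mappings()), "mappings in total")
    print(len({opposing_pairs(m) for m in all_mappings()}), "distinct pairings")
```

Grouping the 24 mappings by which bases oppose each other leaves three families (A-T G-C, A-G C-T and A-C G-T), which matches the three mappings named on slides 8 to 10. `bounding_box_area` is one possible way to compare picture sizes between mappings; the slides do not say how "smallest" was measured.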

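Slides 14 to 17 move from walk distances to a distance matrix, UPGMA clustering and Newick output. The sketch below shows one way those pieces could fit together. It is an illustration under stated assumptions, not a reconstruction of Icarus: the slides do not define the distance measure precisely, so the Euclidean distance between walk end points is assumed here, and `endpoint`, `walk_distance_matrix` and `upgma` are hypothetical names.

```python
import itertools
import math

# Assumed orientation of the A-T G-C mapping (A north, T south, G east, C west).
AT_GC = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def endpoint(sequence, mapping=AT_GC):
    """Position of the walker after the last base (the walk starts at the origin)."""
    x = y = 0
    for base in sequence.upper():
        dx, dy = mapping.get(base, (0, 0))  # unknown symbols do not move the walker
        x, y = x + dx, y + dy
    return x, y


def walk_distance_matrix(seqs, mapping=AT_GC):
    """Pairwise Euclidean distance between walk end points (an assumed measure)."""
    ends = {name: endpoint(seq, mapping) for name, seq in seqs.items()}
    return {
        frozenset((a, b)): math.dist(ends[a], ends[b])
        for a, b in itertools.combinations(ends, 2)
    }


def upgma(names, dist):
    """Cluster with UPGMA and return the tree as a Newick string.

    names: list of unique labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    # cluster key -> (Newick text, number of leaves, height of the cluster's node)
    clusters = {name: (name, 1, 0.0) for name in names}
    dist = dict(dist)  # work on a copy so the caller's matrix is untouched
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: dist[frozenset(pair)])
        text_a, n_a, h_a = clusters.pop(a)
        text_b, n_b, h_b = clusters.pop(b)
        height = dist[frozenset((a, b))] / 2
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted average.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        text = f"({text_a}:{height - h_a:.5f},{text_b}:{height - h_b:.5f})"
        clusters[merged] = (text, n_a + n_b, height)
    (text, _, _), = clusters.values()
    return text + ";"


if __name__ == "__main__":
    toy = {"s1": "AAGG", "s2": "AAGC", "s3": "TTCC"}  # made-up sequences
    print(upgma(list(toy), walk_distance_matrix(toy)))
```

UPGMA always yields a rooted, ultrametric tree, so every leaf ends up at the same distance from the root. Summing several mappings' matrices, as asked on slide 19, would amount to adding the dictionaries entry by entry before calling `upgma`.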