Complementing Computation with Visualization in Genomics

A look at genome assembly visualization with ABySS-Explorer, and at complementing genome browsing with clustering and interactive data exploration.

Published in: Technology

Complementing Computation with Visualization in Genomics

  1. 1. British Columbia Cancer Agency<br />Genome Sciences Centre<br />Vancouver . British Columbia . Canada<br />Complementing Computation with Visualization in Genomics<br />March 11, 2010<br />EBI Interfaces Interest Forum<br />Cydney Nielsen<br />
  2. 2. Discovery path<br />Biological Sample<br />Genomic Data<br />Scientific Insight<br />
  4. 4. Components of Data Analysis<br />Automation<br />Analysis<br />Genomic Data<br />Scientific Insight<br />Human Judgment<br />
  5. 5. Outline<br />Genome Assembly Visualization<br />ABySS-Explorer<br />Complement to genome browsing <br />Using clustering and interactive data exploration<br />
  7. 7. Genome Sequencing<br />cell population<br />extracted DNA<br />Shotgun approach<br />sheared DNA<br />sequencing reads<br />AGCGGATTGCATGACAGT<br />GTACAGCCTGACAGAAGC<br />GCGCTACGATCAGATCAA<br />CATGACAGTCCGAGTACA<br />TTCAGAATGGTACAGCAG<br />
  8. 8. ABySS – Assembly By Short Sequences<br />Simpson et al. Genome Res 2009<br />Sequencing read set (read length = 7 nt):<br />GGACATC<br />GGACAGA<br />Corresponding de Bruijn graph (k = 5 nt):<br />
  9. 9. ABySS – Assembly By Short Sequences<br />Simpson et al. Genome Res 2009<br />Sequencing read set (read length = 7 nt):<br />GGACATC<br />GGACAGA<br />Corresponding de Bruijn graph (k = 5 nt):<br />ABySS merges unambiguously connected vertices to form contigs<br />
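
The read set and k value on this slide are small enough to work through by hand. The following is a minimal sketch, not the ABySS implementation: it builds a de Bruijn graph with k-mers as vertices and merges chains of unambiguously connected vertices into contigs. The function names `de_bruijn` and `merge_contigs` are illustrative, and the sketch ignores reverse complements, sequencing errors, and purely circular paths.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map every k-mer to the set of k-mers that directly follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph[kmer]  # touch the key so vertices without successors are kept
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return dict(graph)


def merge_contigs(graph):
    """Merge chains of vertices whose links are unambiguous in both directions."""
    preds = defaultdict(set)
    for vertex, successors in graph.items():
        for nxt in successors:
            preds[nxt].add(vertex)

    def sole_successor(vertex):
        # Next vertex if it is the only successor and has no other predecessor.
        if len(graph[vertex]) != 1:
            return None
        (nxt,) = graph[vertex]
        return nxt if len(preds[nxt]) == 1 else None

    contigs = []
    for vertex in sorted(graph):
        # Skip vertices that are the unambiguous continuation of another vertex.
        if len(preds[vertex]) == 1:
            (prev,) = preds[vertex]
            if len(graph[prev]) == 1:
                continue
        contig, current = vertex, sole_successor(vertex)
        while current is not None:
            contig += current[-1]  # each step adds one new base
            current = sole_successor(current)
        contigs.append(contig)
    return contigs


graph = de_bruijn(["GGACATC", "GGACAGA"], k=5)
print(merge_contigs(graph))  # ['GACAGA', 'GACATC', 'GGACA']
```

The shared prefix GGACA branches into two successors, so it stays a separate contig, while each branch merges into a single 6 nt contig.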
  10. 10. Assembly Ambiguities<br />True genome sequence<br />GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG<br />
  11. 11. Assembly Ambiguities<br />True genome sequence<br />GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG<br />Assembled sequence <br />de Bruijn graph representation<br />
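
One way to see why this sequence is ambiguous: any k-mer that occurs more than once collapses to a single vertex in a de Bruijn graph, so the graph alone cannot say how long each poly-A run is or which flanking sequences belong together. The sketch below simply counts repeated k-mers in the slide's example sequence; `repeated_kmers` is an illustrative helper and k = 5 is an assumed value carried over from the earlier ABySS slide.

```python
from collections import Counter


def repeated_kmers(seq, k):
    """Return the k-mers that occur more than once in seq, with their counts."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# True genome sequence from the slide.
genome = "GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG"

for kmer, n in sorted(repeated_kmers(genome, 5).items()):
    print(kmer, n)
```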
  12. 12. Starting Point<br />Shaun Jackman<br />
  13. 13. Example of existing tools: Consed<br />
  18. 18. Properties of DNA<br />
  19. 19. Capture sequence strand<br />AAAAAT<br />2+<br />1+<br />
  20. 20. Capture sequence strand<br />AAAAAT<br />2+<br />1+<br />TTTTTA<br />2-<br />1-<br />
  21. 21. Capture sequence strand<br />AAAAAT<br />1+<br />2+<br />TTTTTA<br />
  22. 22. Capture sequence strand<br />AAAAAT<br />1-<br />2-<br />TTTTTA<br />
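
The strand slides pair AAAAAT with TTTTTA, which is the base-by-base complement; read in the conventional 5' to 3' direction, the opposite strand is the reverse complement, ATTTTT. The sketch below shows both operations. It also shows one plausible reading of the `1+ 2+` and `2- 1-` labels, namely that a path of oriented contigs on one strand corresponds to the reversed path with flipped signs on the other; `reverse_path` is a hypothetical helper, not part of ABySS-Explorer.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, keeping the original left-to-right order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def reverse_path(path):
    """Assumed convention: the same path seen from the other strand."""
    flip = {"+": "-", "-": "+"}
    return [label[:-1] + flip[label[-1]] for label in reversed(path)]


print(complement("AAAAAT"))          # TTTTTA
print(reverse_complement("AAAAAT"))  # ATTTTT
print(reverse_path(["1+", "2+"]))    # ['2-', '1-']
```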
  24. 24. Capture sequence length<br />one oscillation = 100 nt<br />
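
The length encoding can be sketched as a sine wave whose number of oscillations is the contig length divided by 100 nt. Only the 100 nt per oscillation figure comes from the slide; the sampling density, the amplitude, and the function names below are assumptions for illustration, and the actual ABySS-Explorer drawing code may differ.

```python
import math

NT_PER_OSCILLATION = 100  # from the slide: one oscillation = 100 nt


def oscillations(contig_length_nt):
    """Number of wave periods used to draw a contig of the given length."""
    return contig_length_nt / NT_PER_OSCILLATION


def wave_points(contig_length_nt, samples_per_oscillation=16, amplitude=1.0):
    """Sample (position, offset) pairs along an edge; position runs from 0 to 1."""
    n = oscillations(contig_length_nt)
    total = max(1, round(n * samples_per_oscillation))
    return [
        (i / total, amplitude * math.sin(2 * math.pi * n * i / total))
        for i in range(total + 1)
    ]


print(oscillations(250))      # 2.5
print(len(wave_points(200)))  # 33
```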
  25. 25. Genome Sequencing<br />cell population<br />extracted DNA<br />read pair information<br />read<br />sheared DNA<br />dsDNA<br />fragment<br />(known size)<br />sequencing reads<br />(typically produce millions)<br />AGCGGATTGCATGACAGT<br />read<br />GTACAGCCTGACAGAAGC<br />GCGCTACGATCAGATCAA<br />CATGACAGTCCGAGTACA<br />TTCAGAATGGTACAGCAG<br />
  26. 26. Capture read pair information<br />After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.<br />
  27. 27. Capture read pair information<br />Paired-end read information is used to construct paired-end (PE) contigs<br />… 13+ 44- 46+ 4+ 79+ 70+ …<br />blue gradient = paired-end contig<br />orange = selected single-end contig<br />
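
The paired-end contig on this slide is written as an ordered path of single-end contig ids, each with a strand sign. A minimal sketch of reading that notation follows. It parses only the text shown on the slide; it is not a parser for any ABySS output file, and `parse_path` is an illustrative name.

```python
def parse_path(text):
    """Turn '13+ 44- 46+' into [(13, '+'), (44, '-'), (46, '+')]."""
    path = []
    for token in text.split():
        if token in ("…", "..."):
            continue  # the slide elides the rest of the path
        contig_id, strand = token[:-1], token[-1]
        if strand not in ("+", "-") or not contig_id.isdigit():
            raise ValueError(f"unexpected token: {token!r}")
        path.append((int(contig_id), strand))
    return path


path = parse_path("… 13+ 44- 46+ 4+ 79+ 70+ …")
edges = list(zip(path, path[1:]))  # consecutive contigs to highlight in the graph
print(path)
print(len(edges))  # 5
```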
  28. 28. ABySS-Explorer<br />Visual representation of: contig adjacency information, contig strand, contig length, paired-end relationships, paired-end contigs<br />Implemented using the Java Universal Network/Graph Framework (JUNG)<br />Applied the Kamada-Kawai layout algorithm (JUNG implementation)<br />Uses ABySS files as input (version 1.1.0 and higher)<br />
  37. 37. http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer<br />
  38. 38. Part 1: Conclusions and Future Work<br />Graph encoding provides an integrated display of genome assemblies and associated metadata<br />This representation is particularly powerful for revealing high-level genome assembly structure, which is not readily viewable in any other interactive tool<br />Future work includes: support for other assembly algorithm outputs; flexible annotation display; integration with existing assembly editing tools<br />
  43. 43. Outline<br />Genome Assembly Visualization<br />ABySS-Explorer<br />Complement to genome browsing<br />Using clustering and interactive data exploration<br />
  44. 44. Genome Sequencing<br />cell population<br />extracted DNA<br />sheared DNA<br />sequencing reads<br />(typically produce millions)<br />AGCGGATTGCATGACAGT<br />GTACAGCCTGACAGAAGC<br />GCGCTACGATCAGATCAA<br />CATGACAGTCCGAGTACA<br />TTCAGAATGGTACAGCAG<br />
  46. 46. Genome Sequencing<br />cell population<br />Chromatin Immunoprecipitation and Sequencing <br />(ChIP-Seq)<br />extracted DNA<br />selection<br />sheared DNA<br />sequencing reads<br />(typically produce millions)<br />AGCGGATTGCATGACAGT<br />GTACAGCCTGACAGAAGC<br />GCGCTACGATCAGATCAA<br />GTACAGCCTGACAGAAGC<br />CATGACAGTCCGAGTACA<br />TTCAGAATGGTACAGCAG<br />TTCAGAATGGTACAGCAG<br />
  47. 47. Align sequences to the genome<br />CCGAGTACAGCCTGACAGA<br />GCATGACAGTCCGAGTAC<br />TTGCATGACAGTCCGAGT<br />AGCGGATTGCATGACAGT<br />AGCGGATTGCATGACAGT<br />AGCGGATTGCATGACAGT<br />Reference Genome<br />AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA<br />Read coverage<br />Genomic coordinate<br />
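
The alignment and coverage idea on this slide can be sketched with the slide's own reads and reference. This is a toy under strong assumptions: exact matching on the forward strand only, first match wins. Real aligners tolerate mismatches, search both strands, and use an index. The names `align_exact` and `coverage` are illustrative.

```python
def align_exact(read, reference):
    """Return the 0-based start of the first exact match, or None."""
    pos = reference.find(read)
    return pos if pos >= 0 else None


def coverage(reads, reference):
    """Count how many aligned reads overlap each reference position."""
    depth = [0] * len(reference)
    for read in reads:
        start = align_exact(read, reference)
        if start is None:
            continue  # unaligned reads contribute nothing
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth


# Reference and reads as shown on the slide.
reference = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"
reads = [
    "CCGAGTACAGCCTGACAGA",
    "GCATGACAGTCCGAGTAC",
    "TTGCATGACAGTCCGAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
]

depth = coverage(reads, reference)
print(depth)
```

Plotting `depth` against the genomic coordinate gives the read coverage track that a genome browser displays.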
  48. 48. Genome browser can reveal local patterns<br />H3K4me3<br />H3K36me3<br />H3K27me3<br />H3K9me3<br />H3K9Ac<br />MRE<br />
  49. 49. Difficult to get global overview<br />
  50. 50. Focus on regions of interest<br />1. For example, transcriptional start sites (TSS +/- 3000 nt)<br />H3K4me3<br />H3K9Ac<br />H3K4me1<br />H3K36me3<br />MeDIP<br />MRE<br />2. Extract data matrices<br />Normalization for bin i, sample h:<br />3. Cluster matrices (k-means clustering with Euclidean distance)<br />
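
Steps 2 and 3 can be sketched as follows: average a per-base signal into fixed-width bins across a window around each TSS, then group the resulting rows with k-means using Euclidean distance. The normalization formula is not preserved in this transcript, so the sketch leaves it out. The bin size, the seeded random initialization, the iteration cap, and all function names are assumptions; `bin_window` also assumes the window lies entirely inside the signal.

```python
import math
import random


def bin_window(signal, tss, flank, bin_size):
    """Average a per-base signal in bins across [tss - flank, tss + flank)."""
    bins = []
    for lo in range(tss - flank, tss + flank, bin_size):
        chunk = signal[lo:lo + bin_size]
        bins.append(sum(chunk) / len(chunk))
    return bins


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(rows, k, iterations=100, seed=0):
    """Plain k-means: returns one cluster label per row, plus the centroids."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids


# Toy rows: each row stands for one TSS region, each column for one bin.
rows = [[10, 0], [9, 1], [10, 1], [0, 10], [1, 9], [0, 9]]
labels, centroids = kmeans(rows, k=2)
print(labels)
```

For several marks, one option is to concatenate the binned profiles of all marks into a single row per TSS before clustering.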
  53. 53. Enable interactive exploration<br />4. Interactive cluster visualization (data from H1 cells)<br />cluster size indicator (total n= 15,618)<br />H3K4me3<br />H3K9Ac<br />H3K4me1<br />H3K36me3<br />H3K27me3<br />H3K9me3<br />MeDIP<br />MRE<br />mRNA<br />H3K4me3<br />H3K9Ac<br />H3K4me1<br />H3K36me3<br />H3K27me3<br />H3K9me3<br />MeDIP<br />MRE<br />mRNA<br />cluster <br />(average values displayed)<br />individual TSS<br />H3K4me3<br />H3K9Ac<br />H3K4me1<br />H3K36me3<br />H3K27me3<br />H3K9me3<br />MeDIP<br />MRE<br />mRNA<br />HOXC12 gene<br />scroll bar to explore all cluster members<br />5. Link-out to UCSC genome browser<br />
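
The cluster view on this slide shows a size indicator and average values per cluster. Assuming each TSS already has a cluster label, those two summaries could be computed as in the sketch below; `summarize_clusters` and the toy data are illustrative and are not taken from the tool.

```python
from collections import defaultdict


def summarize_clusters(rows, labels):
    """Return the size and column-wise mean profile of every cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: {
            "size": len(members),
            "mean": [sum(col) / len(members) for col in zip(*members)],
        }
        for label, members in groups.items()
    }


# Toy data: three regions, two signal columns, two clusters.
summary = summarize_clusters(
    rows=[[1.0, 3.0], [3.0, 5.0], [10.0, 0.0]],
    labels=[0, 0, 1],
)
print(summary[0])  # {'size': 2, 'mean': [2.0, 4.0]}
```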
  57. 57. Part 2: Conclusions and Future Work<br />Clustering reveals patterns that were not obvious using a genome browser<br />Access to both global and detailed views is valuable<br />Future work includes: search functionality (e.g. by region id); integration with other clustering tools; richer analysis functionality (e.g. interactive clustering)<br />
  62. 62. Acknowledgements<br />NIH Epigenomics Roadmap<br />ABySS-Explorer<br />Joe Costello, UCSF<br />Peggy Farnham, UC Davis<br />Thea Tlsty, UCSF<br />Marco Marra<br />Martin Hirst<br />Yongjun Zhao<br />Nina Thiessen<br />Richard Varhol<br />Shaun Jackman<br />İnanç Birol<br />Jason Chang<br />Lymphoma Project Analyst<br />Karen Mungall<br />Supervisor<br />Primary Data Generation<br />Steven Jones<br />Lymphoma Genomics Team<br />
  65. 65. Complementing Computation with Visualization in Genomics<br />March 11, 2010<br />Cydney Nielsen<br />BC Cancer Agency<br />Genome Sciences Centre<br />Vancouver, Canada<br />
