BactoGeNIE talk at Friendly Friday


visualizing lots of genomes on big displays

Published in: Engineering

  1. 1. Toward a 1000-genome visualization for scalable resolution display environments Jillian Aurisano April 12, 2013
  2. 2. The birth of genomics • A decade ago: – First complete human genome sequence released. – 13 years and 2.7 billion dollars to generate the sequence of the 3.3 billion nucleotides in the human genome. – Intensive work for years to identify the 26,000 genes in this sequence
  3. 3. Genome sequencing boom • Since 2003: – Sequencing costs have decreased faster than Moore’s Law • Today: – 1000 Genomes Project – Thousands of bacterial strains sequenced – Metagenomics
  4. 4. As a result • Scalable computational approaches to genome analysis needed – Lots of progress in data mining, bioinformatics, parallel algorithms, cloud computing But – More work needed to develop scalable genome visualization approaches • Why is visualization important? – 1/3 of human brain devoted to processing visual information – Keep expert ‘in the loop’ to verify computational results – Human experts needed to translate data into knowledge
  5. 5. Technical opportunity • High resolution displays – High density information presentation is possible • Large, high-resolution displays – Multiple high-density visualizations can be juxtaposed and compared • Multi-user environments – Collaborative data analysis and data set mashups • New visualization paradigm, opportunity for new visualization approaches
  6. 6. Specific sub-problem: analyzing local variations in gene content • Help bacterial genomics researchers look for conserved sets of genes in bacterial genomes • Why: In bacterial genomes, genes that occur close to each other in the genome may be functionally connected. • Also: Differences in gene content and order in related bacterial strains have evolutionary implications
  7. 7. Other approaches: browsers Pubmed: UCSC genome browser
  8. 8. Other approaches: large scale comparisons in circular representation Circos: An information aesthetic for comparative genomics
  9. 9. GeneRiViT
  10. 10. Goals • New type of genome visualization approach that is: – Scalable: not just comparing 4 genomes, but hundreds • Less text; high-density presentation – Interactive: enable analysis of these bacterial strains – Flexible: not locked to one reference coordinate system; genomes reorderable – Connected: links between similar elements are shown to enable comparisons
  11. 11. How my prototype works • User specifies 2 types of files: – Genome features: start, stop coordinates of genes and other genome elements. Additional annotation info. – Sequence file: sequences of these genome features • A genome for one species not always in one piece • Use algorithm (cd-hit) to cluster sequences based on similarity • Create a local database, cache a subset to visualize
  12. 12. Demo
  13. 13. What’s next • Lots of analysis features I’d like to explore • Then port this to run on the wall in EVL cybercommons • Long term: Build it into an integrative genomics vis framework for large, high-resolution environments
  14. 14. GenoSAGE • Multiple high-resolution genome visualization in one view, plus associated visualizations of other data types – Can view data at multiple levels of detail – Evaluate details in context • Connections between visualizations – Integration across visualizations • Spatial organization of visualizations to encode information • Multi-user visualization environment
  15. 15. Thanks! • Questions? • Contact me at: (my other project)
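
The pipeline outlined on slide 11 (a feature file plus a sequence file, similarity clustering, then a local database) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the prototype's actual code: the tab-separated feature layout, the table schema, and every function name are hypothetical, and `greedy_cluster` is a toy stand-in for cd-hit (a longest-first greedy pass scored with Python's `difflib`), not cd-hit itself.

```python
import sqlite3
from difflib import SequenceMatcher

# Hypothetical inputs: tab-separated features (genome, gene id, start, stop)
# and a FASTA-style sequence file keyed by the same gene ids.
FEATURES = "strainA\tgA1\t1\t12\nstrainA\tgA2\t20\t31\nstrainB\tgB1\t5\t16\n"
FASTA = ">gA1\nATGAAACCCGGG\n>gA2\nTTTTTTTTTTTT\n>gB1\nATGAAACCCGGG\n"


def parse_features(text):
    """Return (genome, gene_id, start, stop) tuples from the assumed layout."""
    features = []
    for line in text.strip().splitlines():
        genome, gene_id, start, stop = line.split("\t")
        features.append((genome, gene_id, int(start), int(stop)))
    return features


def parse_fasta(text):
    """Return {gene_id: sequence} from FASTA-formatted text."""
    seqs, name = {}, None
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def greedy_cluster(seqs, threshold=0.9):
    """Toy stand-in for cd-hit: visit sequences longest first and join the
    first cluster whose representative is similar enough, else start a new one."""
    representatives = []  # (cluster_id, representative sequence)
    assignment = {}
    for gene_id in sorted(seqs, key=lambda g: (-len(seqs[g]), g)):
        seq = seqs[gene_id]
        for cluster_id, rep in representatives:
            score = SequenceMatcher(None, rep, seq, autojunk=False).ratio()
            if score >= threshold:
                assignment[gene_id] = cluster_id
                break
        else:
            # No representative was close enough: this sequence founds a cluster.
            assignment[gene_id] = len(representatives)
            representatives.append((len(representatives), seq))
    return assignment


def build_database(features, assignment):
    """Store features and their cluster ids in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (genome TEXT, gene_id TEXT, "
        "start_pos INTEGER, stop_pos INTEGER, cluster_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
        [(g, i, s, e, assignment[i]) for g, i, s, e in features],
    )
    conn.commit()
    return conn


clusters = greedy_cluster(parse_fasta(FASTA))
conn = build_database(parse_features(FEATURES), clusters)
# Genes that share a cluster across strains are candidates for linking in the view.
shared = conn.execute(
    "SELECT genome, gene_id FROM gene WHERE cluster_id = 0 ORDER BY genome"
).fetchall()
```

Keeping the cluster id in the same table as the coordinates means a viewer could fetch only the genes in the visible region and still draw links between similar genes across genomes, which is one plausible way to read "cache a subset to visualize".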