Comparative Genomics, Visualization and Big Displays

Talk presented at Art of Science Chicago on June 9, 2016.

Published in: Science

  1. 1. Art of Science: Comparative Genomics, Visualization and Big Displays Jillian Aurisano June 9, 2016
  2. 2. About me
  3. 3. Outline • Big data and how it changes the way we do science • How visualization and big displays may help • Walk through the process of building a new visualization with scientists for big displays • Discuss current and future work and where I think we are headed
  4. 4. My grandma • Lab tech at a university specializing in blood – Blood types, blood groups, antibodies, transfusion – Everything to do with blood • Bottom-up understanding of how science works • Informed my understanding of science
  5. 5. My grandma’s story of science • Collect samples • Look through the microscope • See patterns • See anomalies • Call upon expertise
  6. 6. My grandma’s recipe for good science Science! - unlimited observations - deep expertise - endless curiosity - lots of grunt work - a pinch of luck - a community of scientists - Know where your samples come from - Record observations - Be careful of your assumptions - Don't be in a tunnel - Consider alternate explanations Bake for 40+ years. Serves your community and humanity. Eureka!
  7. 7. Science and scientists • Looking at other scientists, something in common: – People looking, observing, thinking, exploring, communicating, making decisions – using human expertise, curiosity, confusion, excitement… • Deeply human process to investigate the world and produce new knowledge
  8. 8. Science research in college • In college I started doing biology research in an immunology lab studying how immune cells developed • My grandma would ask me if I looked under the microscope and made observations
  9. 9. My story of science Generate samples Apply to a chip
  10. 10. My “big data” story of science • Traditional methods: a student would focus on collecting one or two data points – Closer to my grandma’s experience – You couldn’t directly observe these molecules, but you were isolating a small picture and collecting a small result that could be understood • New methods: digitized data collection allowed one student to collect thousands of data points – Potentially more comprehensive picture – Fast, efficient and cheap – But very difficult to directly understand and control
  11. 11. Science not from my grandma’s recipe Science! - unlimited observations - deep expertise - endless curiosity - lots of grunt work - a pinch of luck - a community of scientists - Know where your samples come from - Record observations - Be careful of your assumptions - Don't be in a tunnel - Consider alternate explanations Bake for 40+ years. Serves your community and humanity. • I was removed from generating the data (black box) • Few observations: – The processes we studied were too small to observe – Our big data result was hard to map onto what I knew • Assumptions and tunnel thinking were baked into every step of this process – By necessity – Too hard to consider alternate explanations
  12. 12. The opportunity and cost of big data • Measuring more, drawing a big picture • But our capacity to understand does not grow at the same rate as our data • For me, for my grandma: disorientation from losing direct connection with science “No one looks under a microscope anymore. It is all DNA and computers and chips. How do we make discoveries?”
  13. 13. Role for computational approaches in big data and science • Automated systems will help with big data • But it is not just about computers giving us answers • People can build and transmit new scientific knowledge • We need to give scientists access points to computational methods • To do this, borrowing my grandma’s image: we need some sort of big data microscope Picard: “Computer: scan everything, run diagnostics, and tell us the answer.” Computer: “The answer is 42.”
  14. 14. There is a computer science field which focuses on bringing scientists back ‘into the loop’ and building ways for scientists to observe, explore, use prior knowledge, share findings… (all the things that have made science work)
  15. 15. There is a computer science field which focuses on bringing scientists back ‘into the loop’ and building ways for scientists to observe, explore, use prior knowledge, share findings… (all the things that have made science work) DATA VISUALIZATION
  16. 16. What is data visualization? Goal: Visually representing data on interactive devices so that users can view, explore and analyze data and share findings with others
  17. 17. Powerful human visual system • Around 60% of our brain is involved in processing visual information • Evolved to recognize visual patterns, outliers and trends • To bring our expertise to bear on data, we just need good visual representations of data
  18. 18. Research in visualization • Several international research conferences and journals on data visualization – http://www.ieeevis.org/ • Questions: – How to best represent data of different types – How to design efficient algorithms for representing data – How to help users perform different kinds of tasks – How to design new ways to interact with visualizations • Combines computer science, psychology, art, math/statistics, diverse application domains (sciences, engineering, business, humanities, journalism, sports…)
  19. 19. My lucky break into the data vis world • Just starting my MS degree in computer science, I discovered the Electronic Visualization Lab at University of Illinois at Chicago – Big displays, touch displays, stereoscopic displays, gesture recognition • One day later: A group of biologists had a big data problem and believed new visualizations and big displays could help • No one in the lab knew biology
  20. 20. My lucky break into the data vis world • Just starting my MS degree in computer science, I discovered the Electronic Visualization Lab at University of Illinois at Chicago – Big displays, touch displays, stereoscopic displays, gesture recognition • One day later: A group of biologists had a big data problem and believed new visualizations and big displays could help • No one in the lab knew biology This is amazing. I need to work here!
  21. 21. My lucky break into the data vis world • Just starting my MS degree in computer science, I discovered the Electronic Visualization Lab at University of Illinois at Chicago – Big displays, touch displays, stereoscopic displays, gesture recognition • One day later: A group of biologists had a big data problem and believed new visualizations and big displays could help • No one in the lab knew biology This is amazing. I need to work here!
  22. 22. EVL history • Founded in 1973 • art/CS lab • Developing new environments for visualizing data and collaborating
  23. 23. EVL today: Big displays for big data • Big data revolution in science • At the same time: display resolutions and sizes also increasing • Improved rendering power from graphics cards • Tiled display walls using – Display clusters – Single machine with multiple graphics cards
  24. 24. Can big displays help with big data? • These environments are cool and futuristic and beautiful but… • Can they help us solve big data problems?
  25. 25. BactoGeNIE overview • Worked with a team of biologists who had thousands of bacterial genomes and a large tiled display wall • We learned that we needed new visualizations that would – Scale up to the wall – Scale up to large data volumes Next: the motivating problem and how I came up with the design, as an example of how big displays could help with big data.
  26. 26. My biology collaborators and their genome sequencing boom • In 2000 it took billions of dollars, hundreds of researchers to sequence the human genome • Since then, changes in genome sequencing technology enabled cheap and fast genome sequencing • My bacterial genomics collaborators suddenly could sequence thousands of complete genome sequences of closely related bacterial strains
  27. 27. Why are bacterial genome sequences important? • Understanding bacterial genomes will help us – Develop antibiotics – Understand antibiotic resistance – Find genes that may be useful in drug development and agriculture • Finding subtle differences between genomes in related strains may help us explain why strains of bacteria have different properties – E.g. one is antibiotic resistant, another is not https://www.patricbrc.org/portal/portal/patric/Home
  28. 28. What is a genome sequence? What does the data look like? • Genome: the complete genetic material of an organism, consisting of a set of long sequences of nucleotides, the chemical building blocks of DNA • Gene: a short sequence of nucleotides within a genome that encodes a product, such as a protein, which performs functions in an organism • Genomic data includes – Sequence: a linear sequence of subunits called nucleotides – Annotation: the positions of genes and other elements within the genome sequence • With the gene sequences, we can identify related genes across different genomes: orthologs
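The sequence, annotation and ortholog ideas on this slide can be sketched as a small data model. This is only an illustration: the class names `Gene` and `Genome`, the `ortholog_group` label and the helper `group_orthologs` are hypothetical and are not taken from BactoGeNIE or any real genomics library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    """One annotated gene (hypothetical structure, for illustration only)."""
    gene_id: str
    start: int           # 0-based start position in the genome sequence, inclusive
    end: int             # end position, exclusive
    strand: str          # "+" or "-" (orientation)
    ortholog_group: str  # assumed label shared by related genes across genomes


@dataclass
class Genome:
    """A genome: the nucleotide sequence plus its annotation (gene positions)."""
    name: str
    sequence: str
    genes: list = field(default_factory=list)

    def gene_sequence(self, gene):
        # Slice the nucleotides covered by the gene's annotated span.
        return self.sequence[gene.start:gene.end]


def group_orthologs(genomes):
    """Map each ortholog-group label to the (genome name, gene id) pairs in it."""
    groups = defaultdict(list)
    for genome in genomes:
        for gene in genome.genes:
            groups[gene.ortholog_group].append((genome.name, gene.gene_id))
    return dict(groups)


# Made-up toy data: two tiny "genomes" that share one related gene.
strain1 = Genome("strain1", "ATGAAATAGCCC", [Gene("s1_g1", 0, 9, "+", "grpA")])
strain2 = Genome("strain2", "GGATGAAATAG", [Gene("s2_g1", 2, 11, "+", "grpA")])
orthologs = group_orthologs([strain1, strain2])
```

In this toy data the two genes carry the same group label, so `orthologs["grpA"]` lists one gene from each strain.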
  29. 29. Specific problem: Comparative Gene Neighborhood Analysis • In bacteria, a gene’s neighbors in the genome may be involved in similar functions. • Sequencing genomes would allow researchers to compare neighborhoods around interesting genes • This would allow my collaborators to – Explore to find new genes – Dig into differences between gene neighborhoods in related bacterial strains gene1 gene2 gene3 gene4 Biological process ? ?
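As a rough sketch of what comparing gene neighborhoods could mean computationally, the code below takes each genome as an ordered list of ortholog-group labels, extracts a window around a gene of interest, and reports which neighbors are shared by all strains and which vary. The function names, the fixed-radius window and the toy strains are assumptions made for illustration; they do not describe how the collaborators' analysis was actually implemented.

```python
def neighborhood(gene_order, target_index, radius):
    """Return the genes within `radius` positions of the target, in genome order."""
    lo = max(0, target_index - radius)
    hi = min(len(gene_order), target_index + radius + 1)
    return gene_order[lo:hi]


def compare_neighborhoods(genomes, target_group, radius=2):
    """For each genome containing the target, extract the target's neighborhood.

    `genomes` maps a strain name to its list of ortholog-group labels in order.
    Only the first occurrence of the target in each genome is used.
    """
    result = {}
    for name, order in genomes.items():
        if target_group in order:
            result[name] = neighborhood(order, order.index(target_group), radius)
    return result


def shared_and_variable(neighborhoods):
    """Split neighborhood content into genes present in all strains vs. only some."""
    sets = [set(genes) for genes in neighborhoods.values()]
    if not sets:
        return set(), set()
    shared = set.intersection(*sets)
    variable = set.union(*sets) - shared
    return shared, variable


# Made-up example: three related strains, gene of interest "C".
strains = {
    "strain1": ["A", "B", "C", "D", "E"],
    "strain2": ["A", "B", "C", "X", "E"],
    "strain3": ["B", "C", "D"],
}
hoods = compare_neighborhoods(strains, "C", radius=1)
shared, variable = shared_and_variable(hoods)
```

Here the variable set is where a researcher would start digging: strain2 carries a different gene next to the gene of interest than the other two strains.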
  30. 30. What we needed • We needed a visualization that would – Show the interesting differences and similarities around genes of interest – Scale to lots of genomes – Scale to big displays • How should we ‘draw’ this genomic data to help the researchers do their work?
  31. 31. First: looked at existing visualizations • Could they find the features that interested them? • Did these scale up to larger numbers of genomes? • Designed for – Small collections of genomes (2-9), small numbers of genes – Because the ability to sequence so many genomes is new! • Why didn’t they scale? – Line connections and text: visual clutter as you scale up – Color to show similarity, but not enough colors • Conclusion: encodings and layouts incompatible with large numbers of gene neighborhoods References: McKay et al. "Using the Generic Synteny Browser (GBrowse_syn)." Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons. Fong, Christine, et al. "PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes." BMC Bioinformatics 9.1 (2008): 170.
  32. 32. Designed for this Not for this
  33. 33. Next: What did they want to observe? • Content • Order and orientation • Context for addressing errors in data verification [Figure: gene neighborhoods (genes A, B, C, D) in Strain 1, Strain 2 and Strain 3 compared against the ground truth, with breaks, break points and gaps marked]
  34. 34. Developed our basic encoding
  35. 35. How to make a high density design? • Traditional visualizations use lines and text to indicate related genes in different genomes – Low density – Lots of visual clutter – Hard to follow on compressed, high pixel-density displays • Our solution: High density encoding – Color to encode similarity – Removed the text, made it available ‘on demand’ [Figure: existing, low-density approaches label every gene with its gene id and use orthology lines: >100 pixels. BactoGeNIE high-density display uses color, not orthology lines, with identification on demand: 8-16 pixels]
  36. 36. How to design for large displays? • Goal: consider whether the design scales up spatially across a big display. • An increase in display size could hamper the perception of data and relationships. – Related entities may end up on opposite ends of the display, preventing direct comparison
  37. 37. How to design for large displays? • Solution: clustering, grouping and alignment features that bring related genes and their neighborhoods together to enable comparisons
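A quick back-of-the-envelope calculation shows why the pixel figures on this slide matter. The sketch below assumes, as an interpretation not stated on the slide, that ">100 pixels" and "8-16 pixels" are the extent one gene occupies along a row, and it uses a hypothetical wall width of 8,000 pixels chosen only to make the arithmetic concrete.

```python
def genes_per_row(display_width_px, gene_width_px):
    """How many genes fit side by side in one row of the display."""
    if gene_width_px <= 0:
        raise ValueError("gene_width_px must be positive")
    return display_width_px // gene_width_px


# Hypothetical wall width; not the resolution of any particular display.
WALL_WIDTH_PX = 8000

low_density = genes_per_row(WALL_WIDTH_PX, 100)   # labelled, low-density glyphs
high_density_16 = genes_per_row(WALL_WIDTH_PX, 16)  # upper end of the 8-16 px range
high_density_8 = genes_per_row(WALL_WIDTH_PX, 8)    # lower end of the 8-16 px range
```

Under these assumptions a single row goes from 80 genes to between 500 and 1,000 genes, which is the kind of gain that makes removing labels and lines worthwhile.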
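One simple way to picture the alignment idea is to pad each strain's row so that the gene of interest lands in the same column everywhere, which makes neighbors directly comparable by column. This is a minimal sketch under that assumption; the function `align_on_target` and the use of `None` as an empty cell are hypothetical and say nothing about how BactoGeNIE lays out its rows internally.

```python
def align_on_target(neighborhoods, target):
    """Left-pad each row so the target gene sits in the same column in every row.

    `neighborhoods` maps a strain name to its ordered list of gene labels;
    every row is assumed to contain `target`. Empty cells are filled with None.
    """
    # The column where the target must end up is its largest index in any row.
    anchor = max(order.index(target) for order in neighborhoods.values())
    aligned = {}
    for name, order in neighborhoods.items():
        pad = anchor - order.index(target)
        aligned[name] = [None] * pad + list(order)
    return aligned


# Made-up rows in which the target "C" starts out at different positions.
rows = {
    "strain1": ["A", "B", "C", "D"],
    "strain2": ["B", "C", "D"],
    "strain3": ["X", "A", "B", "C"],
}
aligned_rows = align_on_target(rows, "C")
target_columns = {name: row.index("C") for name, row in aligned_rows.items()}
```

After alignment every row has "C" in column 3, so a difference such as the extra "X" in strain3 shows up as a column mismatch rather than as a shifted row.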
  38. 38. Design to target perceptual scalability • Perceptual scalability: – Allow someone to look across a large number of entities on a big display surface and see patterns • Interaction design: gene targeting function: – User selects a gene of interest – Scene is reconfigured – A gradient is applied to neighbors and orthologs • Upstream (yellow to green) • Downstream (yellow to blue) • Encodes distance to target, order and orientation • Outcome: Make priority features stand out
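The gene targeting gradient can be sketched as a color interpolation keyed on each gene's signed distance from the selected gene. The slide only states the color directions (yellow to green upstream, yellow to blue downstream); the specific RGB values, the linear interpolation, the `max_distance` cutoff and the choice to draw the target itself in yellow are all assumptions made for this illustration.

```python
# Assumed RGB endpoints; the slide names the colors but not their exact values.
YELLOW = (255, 255, 0)
GREEN = (0, 160, 0)
BLUE = (0, 0, 255)


def lerp_color(a, b, t):
    """Linearly interpolate between RGB colors a and b for t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def gradient_color(offset, max_distance):
    """Color for a gene `offset` positions from the target.

    Negative offsets are upstream (yellow to green), positive offsets are
    downstream (yellow to blue). Distances beyond `max_distance` are clamped.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    t = min(abs(offset), max_distance) / max_distance
    end = GREEN if offset < 0 else BLUE
    return lerp_color(YELLOW, end, t)


def color_neighborhood(order, target, max_distance):
    """Return (gene, color) pairs for one row, relative to the target's position."""
    idx = order.index(target)
    return [(gene, gradient_color(i - idx, max_distance))
            for i, gene in enumerate(order)]


# Made-up row with the target "C" in the middle.
colored = color_neighborhood(["A", "B", "C", "D", "E"], "C", max_distance=2)
```

Because the color depends on signed distance from the target, a rearranged or inverted neighborhood in another strain shows up as a break in the expected green-yellow-blue progression, which is how the encoding can carry distance, order and orientation at once.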
  39. 39. Case Study: Neighborhood around orthologs of a hypothetical protein in 673 draft genomes from E. coli.
  40. 40. Video
  41. 41. What’s next?
  42. 42. Multiple views showing different kinds of biological data (mock ups) • Biologists don’t typically examine just one data type, but many at once • Difficult to do on small displays
  43. 43. How: Sage2 • Web-based • Multi-window • Collaborative • Tiled-display wall system
  44. 44. Articulate: Natural language inputs
  45. 45. Supporting big data in ecology
  46. 46. 5 years from now? 10 years from now? • Digital wallpaper • Naturalistic inputs to visualization • ‘Smart’ systems to track your behavior • Advances in graphics • Enable: High resolution ‘smart rooms’ for science?
  47. 47. What excites me • At their best, scientists are expressing curiosity, passion, interest, joy through their work • Big data presents a fantastic opportunity, but needs data visualization to keep scientists in the loop • Need data visualization so we can make big decisions about how to use science • I hope to see a more visually rich and beautiful world of technology for science
  48. 48. Thanks! www.evl.uic.edu Jillian.aurisano@gmail.com • Acknowledgements: – Andy Johnson, my advisor – Jason Leigh, my former advisor – The scientists I have worked with – All of EVL – Lance Long, for the beautiful pictures – My grandma and my family
