Comparative Genomics, Visualization and Big Displays

Art of Science: Comparative Genomics, Visualization and Big Displays
Jillian Aurisano
June 9, 2016

Outline
• Big data and how it changes the way we do
science
• How visualization and big displays may help
• Walk through the process of building a new
visualization with scientists for big displays
• Discuss current and future work and where I
think we are headed

My grandma
• Lab tech at a university specializing in
blood
– Blood types, blood groups, antibodies,
transfusion
– Everything to do with blood
• Bottom-up understanding of how science
works
• Informed my understanding of science

My grandma’s story of science
Collect samples
Look through the microscope
See patterns
See anomalies
Call upon expertise

My grandma’s recipe for good science
Science!
- unlimited observations
- deep expertise
- endless curiosity
- lots of grunt work
- a pinch of luck
- a community of scientists
- Know where your samples come from
- Record observations
- Be careful of your assumptions
- Don't be in a tunnel
- Consider alternate explanations
Bake for 40+ years.
Serves your community and humanity
Eureka!

Science and scientists
• Looking at other scientists, something in common:
– People looking, observing, thinking, exploring, communicating, making
decisions
– using human expertise, curiosity, confusion, excitement…
• Deeply human process to investigate the world and produce new
knowledge

Science research in college
• In college I started doing
biology research in an
immunology lab studying how
immune cells developed
• My grandma would ask me if I
looked under the microscope
and made observations

My story of science
Generate samples
Apply to a chip

My “big data” story of science
• Traditional methods: a student would focus on
collecting one or two data points
– Closer to my grandma’s experience
– Couldn’t directly observe these molecules, but you
were isolating a small picture and collecting a small
result that could be understood
• New methods: digitized data collection allowed
one student to collect thousands of data points
– Potentially more comprehensive picture
– Fast, efficient and cheap
– But very difficult to directly understand and control

Science not from my grandma’s recipe
Science!
- unlimited observations
- deep expertise
- endless curiosity
- lots of grunt work
- a pinch of luck
- a community of scientists
- Know where your samples come from
- Record observations
- Be careful of your assumptions
- Don't be in a tunnel
- Consider alternate explanations
Bake for 40+ years.
Serves your community and humanity
• I was removed from generating the
data (black box)
• Few observations:
– The processes we studied were
too small to observe
– Our big data result was hard to
map into what I knew
• Assumptions and tunnel thinking were
baked into every step of this
process
– By necessity
– Too hard to consider alternate
explanations

The opportunity and cost of big data
• Measuring more, drawing a
big picture
• But our capacity to
understand does not grow
at the same rate as our
data
• For me, for my grandma:
disorientation from losing
direct connection with
science
“No one looks under a
microscope anymore.
It is all DNA and
computers and chips.
How do we make
discoveries?”

Role for computational approaches in big data and science
• Automated systems will help with big data
• But it is not just about computers giving us answers
• People can build and transmit new scientific knowledge
• We need to give scientists access points to computational methods
• To borrow my grandma’s image: to do this, we need some sort of big data microscope
Picard: “Computer: scan everything, run diagnostics, and tell us the answer.”
Computer: “The answer is 42.”

There is a computer science field that focuses on
bringing scientists back ‘into the loop’ and
building ways for scientists to observe, explore,
use prior knowledge, share findings…
(all the things that have made science work)

DATA VISUALIZATION

What is data visualization?
Goal: Visually representing data on interactive
devices so that users can view, explore and
analyze data and share findings with others

Powerful human visual system
• Around 60% of our brain is involved in
processing visual information
• Evolved to recognize visual patterns, outliers
and trends
• To bring our expertise to bear on data, we just
need good visual representations of data

Research in visualization
• Several international research conferences and journals on data visualization
– http://www.ieeevis.org/
• Questions:
– How to best represent data of different types
– How to design efficient algorithms for representing data
– How to help users perform different kinds of tasks
– How to design new ways to interact with visualizations
• Combines computer science, psychology, art, math/statistics, diverse application
domains (sciences, engineering, business, humanities, journalism, sports…)

My lucky break into the data vis world
• Just starting my MS degree in computer science, I discovered the
Electronic Visualization Lab at University of Illinois at Chicago
– Big displays, touch displays, stereoscopic displays, gesture recognition
• One day later: A group of biologists had a big data problem and believed
new visualizations and big displays could help
• No one in the lab knew biology

This is amazing. I need to work here!

EVL history
• Founded in 1973
• Art/CS lab
• Developing new environments for visualizing data and collaborating

EVL today: Big displays for big data
• Big data revolution in science
• At the same time: display resolutions and sizes also increasing
• Improved rendering power from graphics cards
• Tiled display walls using
– Display clusters
– Single machine with multiple graphics cards

Can big displays help with big data?
• These environments are cool and futuristic and
beautiful but…
• Can they help us solve big data problems?

BactoGeNIE overview
• Worked with a team of biologists who had thousands of bacterial genomes and a large tiled display wall
• We learned that we needed new visualizations that would
– Scale up to the wall
– Scale up to large data volumes
Next: the motivating problem and how I arrived at the design, as an example of how big displays could
help with big data.

My biology collaborators and their
genome sequencing boom
• In 2000 it took billions of dollars and
hundreds of researchers to
sequence the human genome
• Since then, changes in genome
sequencing technology have enabled
cheap and fast genome
sequencing
• My bacterial genomics
collaborators suddenly could
sequence thousands of complete
genomes of closely
related bacterial strains

Why are bacterial genome sequences important?
• Understanding bacterial genomes will help us
– Develop antibiotics
– Understand antibiotic
resistance
– Find genes that may be useful
in drug development and
agriculture
• Finding subtle differences
between genomes in related
strains may help us explain why
strains of bacteria have
different properties
– E.g., one is antibiotic resistant, another is not
https://www.patricbrc.org/portal/portal/patric/Home

What is a genome sequence? What
does the data look like?
• Genome: the complete genetic material of an organism,
consisting of a set of long sequences of nucleotides, the
chemical building blocks of DNA
• Gene: a short sequence of nucleotides within a
genome that encodes a product, such as a protein,
which performs functions in an organism
• Genomic data includes
– Sequence: a linear series of subunits called nucleotides
– Annotation: positions of genes and other elements within the
genome sequence
• With the gene sequences, we can identify related genes
across different genomes: orthologs
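
To make the data description above concrete, here is a minimal sketch of how annotated genes and their ortholog groupings might be held in memory. The `Gene` fields, the cluster labels and the `group_orthologs` helper are illustrative assumptions, not the format used by the tools discussed in this talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    genome: str    # which genome (strain) the gene belongs to
    gene_id: str   # identifier within that genome
    start: int     # annotated start position in the genome sequence
    end: int       # annotated end position in the genome sequence
    strand: str    # "+" or "-": orientation of the gene
    ortholog: str  # hypothetical label shared by related genes


def group_orthologs(genes):
    """Map each ortholog label to the genes that carry it."""
    clusters = defaultdict(list)
    for gene in genes:
        clusters[gene.ortholog].append(gene)
    return dict(clusters)


# Made-up annotation records for two strains.
genes = [
    Gene("strain1", "s1_g1", 100, 400, "+", "clusterA"),
    Gene("strain1", "s1_g2", 450, 900, "+", "clusterB"),
    Gene("strain2", "s2_g1", 120, 420, "-", "clusterA"),
]
clusters = group_orthologs(genes)
```

Here `clusters["clusterA"]` holds one gene from each strain, which is the kind of cross-genome relationship the rest of the talk visualizes.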

Specific problem: Comparative Gene
Neighborhood Analysis
• In bacteria, a gene’s neighbors in the
genome may be involved in similar
functions.
• Sequencing genomes would allow
researchers to compare neighborhoods
around interesting genes
• This would allow my collaborators to
– Explore to find new genes
– Dig into differences between gene
neighborhoods in related bacterial strains
[Figure: gene1, gene2, gene3 and gene4 in a row; labels: "Biological process", "?", "?"]
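
As a rough illustration of what a "gene neighborhood" means computationally, the sketch below pulls out the genes flanking a gene of interest. It assumes a genome is simply a list of gene names in order, treats that list as linear (it does not wrap around the ends), and uses a hypothetical `flank` parameter for the window size.

```python
def neighborhood(ordered_genes, target, flank=2):
    """Return the target gene plus up to `flank` genes on each side.

    `ordered_genes` is a list of gene names in genome order.
    Raises ValueError if `target` is not in the list.
    """
    i = ordered_genes.index(target)
    lo = max(0, i - flank)  # clamp at the start of the list
    return ordered_genes[lo:i + flank + 1]


genome = ["gene1", "gene2", "gene3", "gene4", "gene5", "gene6"]
window = neighborhood(genome, "gene3", flank=1)
```

Running the same function on every genome in a collection gives the set of neighborhoods that the researchers wanted to compare side by side.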

What we needed
• We needed a visualization that would
– Show the interesting differences and similarities
around genes of interest
– Scale to lots of genomes
– Scale to big displays
• How should we ‘draw’ this genomic data to
help the researchers do their work?

First: looked at existing visualizations
• Could they find the features that interested
them?
• Did these scale up to larger numbers of
genomes?
• Designed for
– Small collections of genomes (2-9), small numbers of genes
– Because the ability to sequence so many genomes is new!
• Why didn’t they scale?
– Line connections and text: visual clutter as you scale up
– Color to show similarity, but not enough colors
• Conclusion: encodings and layouts incompatible
with large numbers of gene neighborhoods
McKay et al. Using the Generic Synteny Browser (GBrowse_syn). Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons.
Fong, Christine, et al. "PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes." BMC Bioinformatics 9.1 (2008): 170.

Designed for this
Not for this

Next: What did they want to observe?
• Content
• Order and orientation
• Context for addressing errors in data (verification, breaks)
[Figure: panels comparing genes A to D across Strain 1, Strain 2 and Strain 3 against a ground truth, with break points and a gap marked]
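
The comparisons named on this slide (content, order, orientation) can be sketched as simple checks between two neighborhoods. This is an illustrative sketch only: it assumes each neighborhood is a list of (ortholog label, strand) pairs in genome order, that a label appears at most once per neighborhood, and the `compare` function is a hypothetical helper.

```python
def compare(neigh_a, neigh_b):
    """Compare two gene neighborhoods.

    Each neighborhood is a list of (ortholog_label, strand) pairs
    in genome order, with each label appearing at most once.
    """
    labels_a = [label for label, _ in neigh_a]
    labels_b = [label for label, _ in neigh_b]
    return {
        # Same set of genes, regardless of arrangement.
        "content": set(labels_a) == set(labels_b),
        # Same genes in the same sequence.
        "order": labels_a == labels_b,
        # Every shared gene points the same way in both.
        "orientation": dict(neigh_a) == dict(neigh_b),
    }


strain1 = [("A", "+"), ("B", "+"), ("C", "+"), ("D", "+")]
strain2 = [("A", "+"), ("C", "+"), ("B", "+"), ("D", "+")]  # B and C swapped
strain3 = [("A", "+"), ("B", "-"), ("C", "+"), ("D", "+")]  # B flipped

swapped = compare(strain1, strain2)
flipped = compare(strain1, strain3)
```

Checks like these say whether two neighborhoods differ. The visualization's job is to let a person see where and how they differ across hundreds of strains at once.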

How to make a high density
design?
• Traditional visualizations use lines and text to
indicate related genes in different genomes
– Low density
– Lots of visual clutter
– Hard to follow on compressed, high pixel-density displays
• Our solution: High density encoding
– Color to encode similarity
– Removed the text, made it available ‘on
demand’
[Figure: existing, low-density approaches label every gene with its gene id and use >100 pixels; BactoGeNIE uses 8-16 pixels, with a high-density display, color instead of orthology lines, and identification on demand]
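
One simple way to encode similarity with color rather than connecting lines is to give every ortholog group its own hue. The sketch below is an assumption for illustration, not BactoGeNIE's actual color scheme: it spaces hues evenly around the color wheel using Python's standard `colorsys` module, with saturation and brightness values chosen arbitrarily.

```python
import colorsys


def cluster_colors(labels):
    """Assign each distinct ortholog label an evenly spaced hue.

    Returns a dict mapping label -> (r, g, b), channels in 0..255.
    """
    unique = sorted(set(labels))
    colors = {}
    for i, label in enumerate(unique):
        hue = i / len(unique)  # evenly spaced in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 0.9)
        colors[label] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


palette = cluster_colors(["A", "B", "A", "C"])
```

Genes that share a label get the same color in every genome, so related genes can be matched by eye without drawing lines between rows. With many groups, evenly spaced hues become hard to tell apart, which is one reason to apply color selectively.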

How to design for large displays?
• Goal: consider whether the design scales up spatially across a big display.
• An increase in display size could hamper the perception of data and relationships.
– For example, related entities on opposite ends of the display cannot be
compared directly

How to design for large displays?
• Solution: clustering, grouping and alignment
features that bring related genes and their
neighborhoods together to enable comparisons
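
As a sketch of what alignment could look like in code, the function below pads each row so that a chosen gene lands in the same column for every strain. The input layout (a dict of strain name to ortholog labels) and the `align_on_target` helper are assumptions for illustration, not the tool's implementation.

```python
def align_on_target(neighborhoods, target):
    """Pad each row on the left so `target` falls in the same column.

    `neighborhoods` maps a strain name to a list of ortholog labels
    in genome order. Strains lacking the target are left out.
    """
    offsets = {
        strain: genes.index(target)
        for strain, genes in neighborhoods.items()
        if target in genes
    }
    if not offsets:
        return {}
    column = max(offsets.values())  # rightmost position of the target
    return {
        strain: [None] * (column - off) + neighborhoods[strain]
        for strain, off in offsets.items()
    }


rows = {
    "strain1": ["A", "B", "C", "D"],
    "strain2": ["B", "C", "D"],
    "strain3": ["X", "A", "B", "C"],
}
aligned = align_on_target(rows, "C")
```

After alignment the target gene sits in one vertical column, so differences in its neighbors line up visually instead of being scattered across the wall.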

Design to target perceptual scalability
• Perceptual scalability:
– Allow someone to look across a large
number of entities on a big display
surface and see patterns
• Interaction design: gene targeting function:
– User selects a gene of interest
– Scene is reconfigured
– A gradient is applied to neighbors and
orthologs
• Upstream (yellow to green)
• Downstream (yellow to blue)
• Encodes distance to target, order and
orientation
• Outcome: Make priority features stand out
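
The gene targeting gradient can be sketched as a color interpolation driven by each gene's distance from the target. Several details below are assumptions made for illustration: the exact RGB values, the `max_distance` cutoff, coloring the target itself yellow, and treating lower positions as upstream without accounting for strand.

```python
YELLOW = (255, 255, 0)
GREEN = (0, 128, 0)
BLUE = (0, 0, 255)


def lerp_color(a, b, t):
    """Linearly interpolate between RGB colors a and b for t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def target_gradient(n_genes, target_index, max_distance=5):
    """Color each position by its signed distance from the target gene.

    Positions before the target fade from yellow to green,
    positions after it fade from yellow to blue.
    """
    colors = []
    for i in range(n_genes):
        offset = i - target_index
        t = min(abs(offset), max_distance) / max_distance
        end = GREEN if offset < 0 else BLUE
        colors.append(lerp_color(YELLOW, end, t))
    return colors


gradient = target_gradient(7, target_index=3, max_distance=3)
```

Because the two sides of the target use different end colors, a gene's color shows both how far it is from the target and on which side it sits, so rearranged neighborhoods show up as breaks in the gradient.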

Case study: neighborhood around orthologs of a hypothetical
protein in 673 draft E. coli genomes.

Multiple views showing different kinds of biological
data (mock-ups)
• Biologists don’t typically examine just one data type,
but many at once
• Difficult to do on small displays

How: SAGE2
• Web-based
• Multi-window
• Collaborative
• Tiled-display wall system

Articulate: Natural language inputs

Supporting big data in ecology

5 years from now? 10 years from now?
• Digital wallpaper
• Naturalistic inputs to visualization
• ‘Smart’ systems to track your behavior
• Advances in graphics
• Enable: high-resolution ‘smart rooms’ for science?

What excites me
• At their best, scientists express curiosity, passion,
interest and joy through their work
• Big data presents a fantastic opportunity, but needs
data visualization to keep scientists in the loop
• Need data visualization so we can make big decisions
about how to use science
• I hope to see a more visually rich and beautiful world
of technology for science

Thanks!
www.evl.uic.edu
Jillian.aurisano@gmail.com
• Acknowledgements:
– Andy Johnson, my advisor
– Jason Leigh, my former advisor
– The scientists I have worked with
– All of EVL
– Lance Long, for the beautiful pictures
– My grandma and my family
