Art of Science: Comparative
Genomics, Visualization and Big
Displays
Jillian Aurisano
June 9, 2016
About me
Outline
• Big data and how it changes the way we do
science
• How visualization and big displays may help
• Walk through the process of building a new
visualization with scientists for big displays
• Discuss current and future work and where I
think we are headed
My grandma
• Lab tech at a university specializing in
blood
– Blood types, blood groups, antibodies,
transfusion
– Everything to do with blood
• Bottom-up understanding of how science
works
• Informed my understanding of science
My grandma’s story of science
Collect samples
Look through the
microscope
See patterns
See anomalies
Call upon expertise
My grandma’s recipe for good science
Science!
- unlimited observations
- deep expertise
- endless curiosity
- lots of grunt work
- a pinch of luck
- a community of scientists
- Know where your samples come from
- Record observations
- Be careful of your assumptions
- Don't be in a tunnel
- Consider alternate explanations
Bake for 40+ years.
Serves your community
and humanity
Eureka!
Science and scientists
• Looking at other scientists, something in common:
– People looking, observing, thinking, exploring, communicating, making
decisions
– using human expertise, curiosity, confusion, excitement…
• Deeply human process to investigate the world and produce new
knowledge
Science research in college
• In college I started doing
biology research in an
immunology lab studying how
immune cells developed
• My grandma would ask me if I
looked under the microscope
and made observations
My story of science
Generate samples
Apply to a chip
My “big data” story of science
• Traditional methods: a student would focus on
collecting one or two data points
– Closer to my grandma’s experience
– Couldn’t directly observe these molecules, but you
were isolating a small picture and collecting a small
result that could be understood
• New methods: digitized data collection allowed
one student to collect thousands of data points
– Potentially more comprehensive picture
– Fast, efficient and cheap
– But very difficult to directly understand and control
Science not from my grandma’s recipe
Science!
- unlimited observations
- deep expertise
- endless curiosity
- lots of grunt work
- a pinch of luck
- a community of scientists
- Know where your samples come from
- Record observations
- Be careful of your assumptions
- Don't be in a tunnel
- Consider alternate explanations
Bake for 40+ years.
Serves your community
and humanity
• I was removed from generating the
data (black box)
• Few observations:
– The processes we studied were
too small to observe
– Our big data result was hard to
map into what I knew
• Assumptions and tunnel thinking were
baked into every step of this
process
– By necessity
– Too hard to consider alternate
explanations
The opportunity and cost of big data
• Measuring more, drawing a
big picture
• But our capacity to
understand does not grow
at the same rate as our
data
• For me, for my grandma:
disorientation from losing
direct connection with
science
“No one looks under a
microscope anymore.
It is all DNA and
computers and chips.
How do we make
discoveries?”
• Automated systems will help with
big data
• But it is not just about computers
giving us answers
• People can build and transmit
new scientific knowledge
• We need to give scientists access
points to computational methods
• To do this, borrowing my grandma’s
image, we need some sort of big
data microscope
Picard: “Computer: scan everything, run
diagnostics, and tell us the answer.”
Computer: ”The answer is 42.”
Role for computational approaches in
big data and science
There is a computer science field that focuses on
bringing scientists back ‘into the loop’ and
building ways for scientists to observe, explore,
use prior knowledge, share findings…
(all the things that have made science work)
DATA VISUALIZATION
What is data visualization?
Goal: Visually representing data on interactive
devices so that users can view, explore and
analyze data and share findings with others
Powerful human visual system
• Around 60% of our brain is involved in
processing visual information
• Evolved to recognize visual patterns, outliers
and trends
• To bring our expertise to bear on data, we just
need good visual representations of data
Research in visualization
• Several international research conferences and journals on data visualization
– http://www.ieeevis.org/
• Questions:
– How to best represent data of different types
– How to design efficient algorithms for representing data
– How to help users perform different kinds of tasks
– How to design new ways to interact with visualizations
• Combines computer science, psychology, art, math/statistics, diverse application
domains (sciences, engineering, business, humanities, journalism, sports…)
My lucky break into the data vis world
• Just starting my MS degree in computer science, I discovered the
Electronic Visualization Lab at University of Illinois at Chicago
– Big displays, touch displays, stereoscopic displays, gesture recognition
• One day later: A group of biologists had a big data problem and believed
new visualizations and big displays could help
• No one in the lab knew biology
This is amazing.
I need to work
here!
EVL history
• Founded in
1973
• art/CS lab
• Developing
new
environments
for visualizing
data and
collaborating
EVL today: Big displays for big data
• Big data revolution in
science
• At the same time:
display resolutions and
sizes also increasing
• Improved rendering
power from graphics
cards
• Tiled display walls
using
– Display clusters
– Single machine with
multiple graphics
cards
Can big displays help with big data?
• These environments are cool and futuristic and
beautiful but…
• Can they help us solve big data problems?
BactoGeNIE overview
• Worked with a team of biologists who had thousands of bacterial genomes and a large tiled display
wall
• We learned that we needed new visualizations that would
– Scale up to the wall
– Scale up to large data volumes
Next: the motivating problem and how I came up with the design, as an example of how big displays could
help with big data.
My biology collaborators and their
genome sequencing boom
• In 2000 it took billions of dollars,
hundreds of researchers to
sequence the human genome
• Since then, changes in genome
sequencing technology enabled
cheap and fast genome
sequencing
• My bacterial genomics
collaborators suddenly could
sequence thousands of complete
genome sequences of closely
related bacterial strains
Why are bacterial genome sequences
important?
• Understanding bacterial
genomes will help us
– Develop antibiotics
– Understand antibiotic
resistance
– Find genes that may be useful
in drug development and
agriculture
• Finding subtle differences
between genomes in related
strains may help us explain why
strains of bacteria have
different properties
– E.g., one is antibiotic resistant,
another is not
https://www.patricbrc.org/portal/portal/patric/Home
What is a genome sequence? What
does the data look like?
• Genome: the complete genetic material of an organism,
consisting of a set of long sequences of nucleotides,
the chemical building blocks of DNA
• Gene: a short sequence of nucleotides within a
genome that encodes a product, such as a protein,
which performs functions in an organism
• Genomic data includes
– Sequence: a linear sequence of subunits called
nucleotides
– Annotation: the position of genes and other elements within the
genome sequence
• With the gene sequences, we can identify related genes
across different genomes: orthologs
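The two kinds of genomic data above can be sketched as a small record type. This is a minimal illustration, not the format my collaborators used: the `Gene` fields, the `ortholog_group` label and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """One annotated gene (hypothetical fields, for illustration only)."""
    genome: str          # which strain / genome the gene belongs to
    gene_id: str         # identifier within that genome
    start: int           # annotation: start position in the genome sequence
    end: int             # annotation: end position in the genome sequence
    strand: int          # +1 forward, -1 reverse
    ortholog_group: str  # assumed label shared by related genes across genomes


def group_orthologs(genes):
    """Collect genes from different genomes that share an ortholog label."""
    groups = defaultdict(list)
    for gene in genes:
        groups[gene.ortholog_group].append(gene)
    return dict(groups)


# Made-up sample data: two strains, three ortholog groups.
genes = [
    Gene("strain1", "s1_0001", 100, 900, +1, "OG_A"),
    Gene("strain1", "s1_0002", 950, 1500, +1, "OG_B"),
    Gene("strain2", "s2_0001", 120, 920, +1, "OG_A"),
    Gene("strain2", "s2_0002", 1000, 1600, -1, "OG_C"),
]
orthologs = group_orthologs(genes)
```

Here `orthologs["OG_A"]` holds one gene from each strain, which is the relationship a comparative view has to make visible.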
Specific problem: Comparative Gene
Neighborhood Analysis
• In bacteria, a gene’s neighbors in the
genome may be involved in similar
functions.
• Sequencing genomes would allow
researchers to compare neighborhoods
around interesting genes
• This would allow my collaborators to
– Explore to find new genes
– Dig into differences between gene
neighborhoods in related bacterial strains
[Figure: neighboring genes gene1, gene2, gene3 and gene4, labeled "Biological process", with "?" marks]
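A rough sketch of what comparing neighborhoods means in practice, assuming each genome is reduced to an ordered list of ortholog labels. The function name, the window size `k` and the sample strains are hypothetical.

```python
from collections import Counter


def neighborhood(gene_order, target, k):
    """Return the target gene plus up to k genes on each side of it."""
    i = gene_order.index(target)
    return gene_order[max(0, i - k): i + k + 1]


# Made-up gene orders for three related strains.
genomes = {
    "strain1": ["A", "B", "C", "D", "E"],
    "strain2": ["A", "B", "X", "D", "E"],
    "strain3": ["B", "C", "D"],
}

target = "D"
hoods = {
    name: neighborhood(order, target, 1)
    for name, order in genomes.items()
    if target in order
}

# Count how often each gene appears next to the target across strains.
neighbor_counts = Counter(
    gene for hood in hoods.values() for gene in hood if gene != target
)
```

With this sample data, "C" and "E" each sit next to "D" in two strains, while "X" appears only in strain2. That kind of odd-one-out is what the researchers wanted to spot.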
What we needed
• We needed a visualization that would
– Show the interesting differences and similarities
around genes of interest
– Scale to lots of genomes
– Scale to big displays
• How should we ‘draw’ this genomic data to
help the researchers do their work?
First: looked at existing visualizations
• Could they find the features that interested
them?
• Did these scale up to larger numbers of
genomes?
• Designed for
– Small collections of genomes (2-9), small numbers of
genes
– Because the ability to sequence so many genomes is
new!
• Why didn’t they scale?
– Line connections and text: visual clutter as you scale up
– Color to show similarity, but not enough colors
• Conclusion: encodings and layouts incompatible
with large numbers of gene neighborhoods
McKay et al. Using the Generic Synteny Browser (GBrowse_syn). Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons.
Fong, Christine, et al. "PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes." BMC Bioinformatics 9.1 (2008): 170.
Designed for this
Not for this
Next: What did they want to observe?
Content Order and orientation
Context for addressing errors in data
verification
[Figure: gene order (A, B, C, D) in Strain 1, Strain 2 and Strain 3 compared against the ground truth, with labels "Break", "Break pt" and "Gap"]
Developed our basic encoding
How to make a high density
design?
• Traditional visualizations use lines and text to
indicate related genes in different genomes
– Low density
– Lots of visual clutter
– Hard to follow on compressed, high pixel-
density displays
• Our solution: High density encoding
– Color to encode similarity
– Removed the text, made it available ‘on
demand’
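One simple way to encode similarity with color is to give each ortholog group its own hue. This is only a sketch under that assumption, not the scheme BactoGeNIE uses. The function name and the saturation and brightness values are made up.

```python
import colorsys


def assign_colors(ortholog_groups):
    """Map each distinct ortholog group to an evenly spaced hue (hex string)."""
    ordered = sorted(set(ortholog_groups))
    colors = {}
    for i, group in enumerate(ordered):
        hue = i / len(ordered)  # spread hues around the color wheel
        r, g, b = colorsys.hsv_to_rgb(hue, 0.65, 0.95)
        colors[group] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


palette = assign_colors(["OG_A", "OG_B", "OG_C", "OG_A"])
```

Evenly spaced hues get hard to tell apart as the number of groups grows. That is the "not enough colors" problem in existing tools, so a naive palette like this one only goes so far.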
[Figure: rows of genes labeled "gene id", comparing existing low-density approaches with BactoGeNIE]
high-density display
color, not orthology lines
identification on-demand
Existing, low-density approaches: >100 pixels
BactoGeNIE: 8-16 pixels
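To get a feel for what the pixel figures above mean, here is a back-of-the-envelope calculation. It assumes the figures are the space one item (for example one genome row) takes along one display dimension, and it uses a hypothetical display dimension of 4096 pixels. Neither assumption comes from the slides.

```python
def items_that_fit(display_px, px_per_item):
    """How many whole items fit along one display dimension."""
    return display_px // px_per_item


DISPLAY_PX = 4096  # hypothetical display dimension, in pixels

low_density = items_that_fit(DISPLAY_PX, 100)      # existing approaches
high_density_16 = items_that_fit(DISPLAY_PX, 16)   # BactoGeNIE, upper figure
high_density_8 = items_that_fit(DISPLAY_PX, 8)     # BactoGeNIE, lower figure

print(low_density, high_density_16, high_density_8)  # prints "40 256 512"
```

Under these assumptions the high-density encoding shows roughly six to thirteen times more items in the same space.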
How to design for large displays?
• Goal: consider whether the design scales up spatially across a big display.
• An increase in display size could hamper the perception of data and relationships.
– Related entities may end up on opposite ends of the display, preventing direct
comparison
How to design for large displays?
• Solution: clustering, grouping and alignment features
that bring related genes and their
neighborhoods together to enable comparisons
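The alignment idea can be sketched as follows, assuming each genome is an ordered list of (label, strand) pairs. Flipping rows where the target sits on the reverse strand is my assumption about what "alignment" could involve. The function and sample data are hypothetical.

```python
def align_on_target(genomes, target):
    """Pad each row so the target gene lands in the same column everywhere."""
    rows = {}
    for name, genes in genomes.items():
        labels = [label for label, _ in genes]
        if target not in labels:
            continue  # this genome has no ortholog of the target
        i = labels.index(target)
        if genes[i][1] < 0:
            # Target is on the reverse strand: flip the whole row (assumption).
            labels = labels[::-1]
            i = len(labels) - 1 - i
        rows[name] = (i, labels)
    width = max((i for i, _ in rows.values()), default=0)
    return {
        name: [None] * (width - i) + labels
        for name, (i, labels) in rows.items()
    }


# Made-up gene orders; strain3 carries the same genes in reverse orientation.
genomes = {
    "strain1": [("A", 1), ("B", 1), ("C", 1), ("D", 1)],
    "strain2": [("B", 1), ("C", 1), ("D", 1)],
    "strain3": [("D", -1), ("C", -1), ("B", -1), ("A", -1)],
}
aligned = align_on_target(genomes, "C")
```

After alignment, "C" sits in the same column in every row, so differences in its neighbors line up vertically and can be compared directly.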
Design to target perceptual scalability
• Perceptual scalability:
– Allow someone to look across a large
number of entities on a big display
surface and see patterns
• Interaction design: gene targeting function:
– User selects a gene of interest
– Scene is reconfigured
– A gradient is applied to neighbors and
orthologs
• Upstream (yellow to green)
• Downstream (yellow to blue)
• Encodes distance to target, order and
orientation
• Outcome: Make priority features stand out
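The gradient in the gene targeting function can be sketched as a linear color interpolation. The RGB values, the `max_dist` cutoff and the choice to color the target itself yellow are assumptions for illustration. The slides only give the color names.

```python
YELLOW = (255, 255, 0)
GREEN = (0, 255, 0)   # assumed end color for upstream genes
BLUE = (0, 0, 255)    # assumed end color for downstream genes


def lerp(a, b, t):
    """Linearly interpolate between two RGB colors, t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def gradient_color(offset, max_dist):
    """Color for a gene `offset` positions from the target.

    Negative offsets are upstream (yellow to green), positive offsets are
    downstream (yellow to blue). Distances beyond max_dist are clamped.
    """
    t = min(abs(offset), max_dist) / max_dist
    end = GREEN if offset < 0 else BLUE
    return lerp(YELLOW, end, t)


colors = [gradient_color(offset, 4) for offset in range(-4, 5)]
```

Because hue changes with both direction and distance, a single glance at a gene's color says which side of the target it is on and roughly how far away.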
Case Study: Neighborhood around orthologs of a hypothetical
protein in 673 draft genomes of E. coli.
Video
What’s next?
Multiple views showing different kinds of biological
data (mock ups)
• Biologists don’t typically examine just one data type,
but many at once
• Difficult to do on small displays
How: SAGE2
• Web-based
• Multi-window
• Collaborative
• Tiled-display wall
system
Articulate: Natural language inputs
Supporting big data in ecology
5 years from now? 10 years from
now?
• Digital wallpaper
• Naturalistic inputs to visualization
• ‘smart’ systems to track your behavior
• Advances in graphics
• Enable: High resolution ‘smart rooms’ for science?
What excites me
• At its best, science lets scientists express curiosity, passion,
interest and joy through their work
• Big data presents a fantastic opportunity, but needs
data visualization to keep scientists in the loop
• Need data visualization so we can make big decisions
about how to use science
• I hope to see a more visually rich and beautiful world
of technology for science
Thanks!
www.evl.uic.edu
Jillian.aurisano@gmail.com
• Acknowledgements:
– Andy Johnson, my advisor
– Jason Leigh, my former advisor
– The scientists I have worked with
– All of EVL
– Lance Long, for the beautiful pictures
– My grandma and my family

Comparative Genomics, Visualization and Big Displays

  • 1. Art of Science: Comparative Genomics, Visualization and Big Displays Jillian Aurisano June 9, 2016
  • 3. Outline • Big data and how it changes the way we do science • How visualization and big displays may help • Walk through the process of building a new visualization with scientists for big displays • Discuss current and future work and where I think we are headed
  • 4. My grandma • Lab tech at a university specializing in blood – Blood types, blood groups, antibodies, transfusion – Everything to do with blood • Bottom-up understanding of how science works • Informed my understanding of science
  • 5. My grandma’s story of science Collect samples Look through the microscope See patterns See anomaliesCall upon expertise
  • 6. My grandma’s recipe for good science Science! - unlimited observat ions - deep expert ise - endless curiosity - lots of grunt work - a pinch of luck - a community of scient ists - Know where your samples come from - Record observat ions - Be careful of your assumpt ions - Don't be in a t unnel - Consider alternate explanat ions Bake for 40+ years. Serves your community and humanity Eureka!
  • 7. Science and scientists • Looking at other scientists, something in common: – People looking, observing, thinking, exploring, communicating, making decisions – using human expertise, curiosity, confusion, excitement… • Deeply human process to investigate the world and produce new knowledge
  • 8. Science research in college • In college I started doing biology research in an immunology lab studying how immune cells developed • My grandma would ask me if I looked under the microscope and made observations
  • 9. My story of science Generate samples Apply to a chip
  • 10. My “big data” story of science • Traditional methods: a student would focus on collecting one or two data points – Closer to my grandma’s experience – Couldn’t directly observe these molecules, but you were isolating a small picture and collecting a small result that could be understood • New methods: digitized data collection allowed one student to collect thousands of data points – Potentially more comprehensive picture – Fast, efficient and cheap – But very difficult to directly understand and control
  • 11. Science not from my grandma’s recipe Science! - unlimited observations - deep expertise - endless curiosity - lots of grunt work - a pinch of luck - a community of scientists - Know where your samples come from - Record observations - Be careful of your assumptions - Don't be in a tunnel - Consider alternate explanations Bake for 40+ years. Serves your community and humanity • I was removed from generating the data (black box) • Few observations: – The processes we studied were too small to observe – Our big data result was hard to map into what I knew • Assumptions and tunnel thinking were baked into every step of this process – By necessity – Too hard to consider alternate explanations
  • 12. The opportunity and cost of big data • Measuring more, drawing a big picture • But our capacity to understand does not grow at the same rate as our data • For me, for my grandma: disorientation from losing direct connection with science “No one looks under a microscope anymore. It is all DNA and computers and chips. How do we make discoveries?”
  • 13. Role for computational approaches in big data and science • Automated systems will help with big data • But it is not just about computers giving us answers • People can build and transmit new scientific knowledge • We need to give scientists access points to computational methods • To do this, using my grandma’s image: we need some sort of big data microscope Picard: “Computer: scan everything, run diagnostics, and tell us the answer.” Computer: “The answer is 42.”
  • 14. There is a computer science field which focuses on bringing scientists back ‘into the loop’ and building ways for scientists to observe, explore, use prior knowledge, share findings… (all the things that have made science work)
  • 15. There is a computer science field which focuses on bringing scientists back ‘into the loop’ and building ways for scientists to observe, explore, use prior knowledge, share findings… (all the things that have made science work) DATA VISUALIZATION
  • 16. What is data visualization? Goal: Visually representing data on interactive devices so that users can view, explore and analyze data and share findings with others
  • 17. Powerful human visual system • Around 60% of our brain is involved in processing visual information • Evolved to recognize visual patterns, outliers and trends • To bring our expertise to bear on data, we just need good visual representations of data
  • 18. Research in visualization • Several international research conferences and journals on data visualization – http://www.ieeevis.org/ • Questions: – How to best represent data of different types – How to design efficient algorithms for representing data – How to help users perform different kinds of tasks – How to design new ways to interact with visualizations • Combines computer science, psychology, art, math/statistics, diverse application domains (sciences, engineering, business, humanities, journalism, sports…)
  • 19. My lucky break into the data vis world • Just starting my MS degree in computer science, I discovered the Electronic Visualization Lab at University of Illinois at Chicago – Big displays, touch displays, stereoscopic displays, gesture recognition • One day later: A group of biologists had a big data problem and believed new visualizations and big displays could help • No one in the lab knew biology
  • 20. My lucky break into the data vis world • Just starting my MS degree in computer science, I discovered the Electronic Visualization Lab at University of Illinois at Chicago – Big displays, touch displays, stereoscopic displays, gesture recognition • One day later: A group of biologists had a big data problem and believed new visualizations and big displays could help • No one in the lab knew biology This is amazing. I need to work here!
  • 22. EVL history • Founded in 1973 • art/CS lab • Developing new environments for visualizing data and collaborating
  • 23. EVL today: Big displays for big data • Big data revolution in science • At the same time: display resolutions and sizes also increasing • Improved rendering power from graphics cards • Tiled display walls using – Display clusters – Single machine with multiple graphics cards
  • 24. Can big displays help with big data? • These environments are cool and futuristic and beautiful but… • Can they help us solve big data problems?
  • 25. BactoGeNIE overview • Worked with a team of biologists who had thousands of bacterial genomes and a large tiled display wall • We learned that we needed new visualizations that would – Scale up to the wall – Scale up to large data volumes Next: the motivating problem and how I came up with the design. Example for how big displays could help with big data.
  • 26. My biology collaborators and their genome sequencing boom • In 2000 it took billions of dollars and hundreds of researchers to sequence the human genome • Since then, changes in genome sequencing technology have enabled cheap and fast genome sequencing • My bacterial genomics collaborators suddenly could obtain thousands of complete genome sequences of closely related bacterial strains
  • 27. Why are bacterial genome sequences important? • Understanding bacterial genomes will help us – Develop antibiotics – Understand antibiotic resistance – Find genes that may be useful in drug development and agriculture • Finding subtle differences between genomes in related strains may help us explain why strains of bacteria have different properties – E.g., one is antibiotic resistant, another is not https://www.patricbrc.org/portal/portal/patric/Home
  • 28. What is a genome sequence? What does the data look like? • Genome: the complete genetic material for an organism, consisting of a set of long sequences of nucleotides, the chemical building blocks of DNA • Genes: a gene is a small sequence of nucleotides within a genome that encodes a product, such as a protein, which performs functions in an organism. • Genomic data includes – Sequence: a linear sequence of subunits called nucleotides. – Annotation: position of genes and other elements within the genome sequence • With the gene sequences, we can identify related genes across different genomes: orthologs
  • 29. Specific problem: Comparative Gene Neighborhood Analysis • In bacteria, a gene’s neighbors in the genome may be involved in similar functions. • Sequencing genomes would allow researchers to compare neighborhoods around interesting genes • This would allow my collaborators to – Explore to find new genes – Dig into differences between gene neighborhoods in related bacterial strains gene1 gene2 gene3 gene4 Biological process ? ?
  • 30. What we needed • We needed a visualization that would – Show the interesting differences and similarities around genes of interest – Scale to lots of genomes – Scale to big displays • How should we ‘draw’ this genomic data to help the researchers do their work?
  • 31. First: looked at existing visualizations • Could they find the features that interested them? • Did these scale up to larger numbers of genomes? • Designed for – Small collections of genomes (2-9), small numbers of genes – Because the ability to sequence so many genomes is new! • Why didn’t they scale? – Line connections and text: visual clutter as you scale up – Color to show similarity, but not enough colors • Conclusion: encodings and layouts incompatible with large numbers of gene neighborhoods McKay et al. Using the Generic Synteny Browser (GBrowse_syn). Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons. Fong, Christine, et al. "PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes." BMC Bioinformatics 9.1 (2008): 170.
  • 33. Next: What did they want to observe? Content Order and orientation Context for addressing errors in data verification [Figure: diagrams comparing gene order (A, B, C, D) in Strain 1, Strain 2 and Strain 3 against ground truth; labels: Break, Break pt, Gap]
  • 35. How to make a high density design? • Traditional visualizations use lines and text to indicate related genes in different genomes – Low density – Lots of visual clutter – Hard to follow on compressed, high pixel-density displays • Our solution: High density encoding – Color to encode similarity – Removed the text, made it available ‘on demand’ [Figure labels: high-density display; color, not orthology lines; identification on-demand] Existing, low-density approaches: >100 pixels BactoGeNIE: 8-16 pixels
  • 36. How to design for large displays? • Goal: consider whether the design scales up spatially across a big display. • An increase in display size could hamper the perception of data and relationships. – For example, related entities may end up on opposite ends of the display, preventing direct comparison
  • 37. How to design for large displays? • Solution: clustering, grouping and alignment features that bring related genes and their neighborhoods together to enable comparisons
  • 38. Design to target perceptual scalability • Perceptual scalability: – Allow someone to look across a large number of entities on a big display surface and see patterns • Interaction design: gene targeting function: – User selects a gene of interest – Scene is reconfigured – A gradient is applied to neighbors and orthologs • Upstream (yellow to green) • Downstream (yellow to blue) • Encodes distance to target, order and orientation • Outcome: Make priority features stand out
  • 39. Case Study: Neighborhood around orthologs to a hypothetical protein in 673 draft genomes from E. coli.
  • 40. Video
  • 42. Multiple views showing different kinds of biological data (mock ups) • Biologists don’t typically examine just one data type, but many at once • Difficult to do on small displays
  • 43. How: SAGE2 • Web-based • Multi-window • Collaborative • Tiled-display wall system
  • 45. Supporting big data in ecology
  • 46. 5 years from now? 10 years from now? • Digital wallpaper • Naturalistic inputs to visualization • ‘Smart’ systems to track your behavior • Advances in graphics • Enable: High resolution ‘smart rooms’ for science?
  • 47. What excites me • At its best, scientists are expressing curiosity, passion, interest, joy through their work • Big data presents a fantastic opportunity, but needs data visualization to keep scientists in the loop • Need data visualization so we can make big decisions about how to use science • I hope to see a more visually rich and beautiful world of technology for science
  • 48. Thanks! www.evl.uic.edu Jillian.aurisano@gmail.com • Acknowledgements: – Andy Johnson, my advisor – Jason Leigh, my former advisor – The scientists I have worked with – All of EVL – Lance Long, for the beautiful pictures – My grandma and my family
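
Slide 38 describes the gene targeting function: the user picks a gene of interest, and its neighbors are colored with a gradient that runs from yellow to green upstream and from yellow to blue downstream, so that color encodes distance to the target. A minimal Python sketch of that idea follows. It is not BactoGeNIE's code: the function names, the exact RGB values, the linear interpolation, the `max_distance` cutoff and the choice to treat "upstream" as "earlier in the list" (ignoring strand orientation) are all assumptions made for illustration.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Assumed endpoint colours; the slides only name the hues.
YELLOW: RGB = (255, 255, 0)
GREEN: RGB = (0, 128, 0)
BLUE: RGB = (0, 0, 255)


def lerp_color(start: RGB, end: RGB, t: float) -> RGB:
    """Linearly interpolate between two RGB colours; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def target_gradient(
    genes: List[str], target: str, max_distance: int = 5
) -> List[Tuple[str, RGB]]:
    """Colour every gene in one neighbourhood by its distance to the target.

    Genes before the target (assumed upstream) fade from yellow to green,
    genes after it (assumed downstream) fade from yellow to blue.
    """
    center = genes.index(target)
    colored = []
    for position, gene in enumerate(genes):
        offset = position - center
        t = abs(offset) / max_distance
        end = GREEN if offset < 0 else BLUE
        colored.append((gene, lerp_color(YELLOW, end, t)))
    return colored


if __name__ == "__main__":
    # Hypothetical neighbourhood of five genes with gene3 as the target.
    for gene, color in target_gradient(
        ["gene1", "gene2", "gene3", "gene4", "gene5"], "gene3", max_distance=2
    ):
        print(gene, color)
```

Because the target itself always comes out pure yellow and the hue changes side at the target, the same coloring applied to many aligned neighborhoods would make shifts in order or orientation show up as breaks in the gradient.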

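Slide 35 contrasts existing low-density approaches (more than 100 pixels) with BactoGeNIE (8 to 16 pixels). Assuming those figures refer to the horizontal footprint of one gene on screen, a quick back-of-the-envelope calculation shows what the difference means for a single row on a display wall. The wall width used here is a hypothetical value, not the resolution of the EVL wall.

```python
def genes_per_row(display_width_px: int, glyph_width_px: int) -> int:
    """Number of whole gene glyphs that fit side by side in one row."""
    return display_width_px // glyph_width_px


# Hypothetical wall width in pixels, chosen only for illustration.
WALL_WIDTH_PX = 8192

if __name__ == "__main__":
    for label, glyph_width in [
        ("low-density encoding (100 px per gene)", 100),
        ("high-density encoding (16 px per gene)", 16),
        ("high-density encoding (8 px per gene)", 8),
    ]:
        print(label, genes_per_row(WALL_WIDTH_PX, glyph_width))
```

Under these assumptions the high-density encoding fits several times as many genes in each row, which is the sense in which the design scales to both more data and more pixels.
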
Editor's Notes

  1. My grandma was a lab tech at a university and she specialized in everything to do with blood: blood types, blood groups, antibodies, transfusions. If there was something we know about blood, my grandma was all about it. She had wanted to become a doctor, but it was the 50s and 60s and she faced some discouragement. But her career gave her this deep bottom-up perspective on how science worked, and she shared this understanding with me when I was a kid. Though it was odd to grow up with lots of stories about blood and someone deeply, deeply invested in blood.
  2. She told me lots of stories about the things she found and how her job went, and it generally went something like this:
  3. When she would tell me these stories, it usually ended with some advice for me. And this advice could be expressed as a recipe for good science. Some grandmas pass along recipes for oatmeal cookies. Others pass along how to do science. Know where your data comes from. Observations matter. Assumptions are the root of all problems. Try to avoid tunnel thinking. Think about alternate explanations. And of course bake for 50 years at room temperature.
  4. So, the core of this recipe, the core of how my grandma did science, was my grandma.  She was the one looking, observing, noticing oddities, following up, bringing all her curiosity and interest and persistence and… all these distinctively human things to the process.   And this is how science has historically been done.  My grandma in her community was echoing the same amazing, crazy, passionate insanity that has had such an immense impact on our world.  
  5. There are lots of visualizations that pertain to this problem, but it became clear, when discussing them with the scientists, that these ideas did not scale visually. They were fundamentally not designed for comparative tasks across large collections of genomes. There has been fantastic work on large displays by X and Y, but not to address this domain problem.