BIODATA ANALYSIS & VISUALIZATION

Jan Aerts
Faculty of Engineering - ESAT/SCD

http://saaientist.blogspot.com
@jandot



Tuesday 1 February 2011
Involved in genomics research:

    •chicken, cow, human genome DNA sequencing

    •search for genetic variation responsible for phenotype/
     disease

        Issues with

    •filtering: finding the correct set of parameters

    •pattern searching: grasping the significance and effect of
     the mutations

        => visual analytics
A. Filtering

                          Investigating parameter space...




putative mutations
    => filter 1 => filter 2 => filter 3

    different settings for filters: A, B, C

What we find...

    [Figure: results for filter settings A, B and C]

    State of the art: run many filter pipelines
    and take the intersection
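
A minimal sketch of the "run many filter pipelines and take the intersection" approach. Everything concrete here is an assumption for illustration: the `depth` and `quality` fields, the thresholds behind settings A, B and C, and the function names do not come from the slides.

```python
# Hypothetical putative mutations with two made-up attributes.
MUTATIONS = [
    {"id": "m1", "depth": 35, "quality": 50},
    {"id": "m2", "depth": 12, "quality": 45},
    {"id": "m3", "depth": 40, "quality": 20},
    {"id": "m4", "depth": 25, "quality": 38},
]

# Hypothetical filter settings A, B and C (thresholds are invented).
SETTINGS = {
    "A": {"min_depth": 10, "min_quality": 40},
    "B": {"min_depth": 20, "min_quality": 30},
    "C": {"min_depth": 30, "min_quality": 10},
}


def run_pipeline(mutations, setting):
    """Return the ids of mutations that pass every filter of one setting."""
    return {
        m["id"]
        for m in mutations
        if m["depth"] >= setting["min_depth"]
        and m["quality"] >= setting["min_quality"]
    }


def intersect_pipelines(mutations, settings):
    """Run one pipeline per setting and keep only what all of them agree on."""
    results = {name: run_pipeline(mutations, s) for name, s in settings.items()}
    consensus = set.intersection(*results.values())
    return results, consensus


results, consensus = intersect_pipelines(MUTATIONS, SETTINGS)
for name in sorted(results):
    print(name, sorted(results[name]))
print("intersection", sorted(consensus))
```

With this toy data each setting keeps two mutations, yet only one survives the intersection, which is the kind of loss the slides hint at.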

What we should have found...

    [Figure: results for filter settings A, B and C]



parameter-space




sometimes: bypass parameter-setting




Aim: use interactive visualization of the “raw” data
        to:

    •peep inside the black box
    •get a feel for the data
    •get a feel for how filter settings influence each other

        Eradicate disease
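
One way to get a feel for how filter settings influence each other is to sweep a grid of settings and count what survives each combination; an interactive view could then draw that grid. The sketch below assumes two hypothetical filters (minimum depth and minimum quality) and made-up calls, none of which appear in the slides.

```python
from itertools import product

# Hypothetical calls as (depth, quality) pairs.
CALLS = [(35, 50), (12, 45), (40, 20), (25, 38), (8, 60), (30, 30)]


def sweep(calls, depth_thresholds, quality_thresholds):
    """Count the calls passing each (min_depth, min_quality) combination."""
    grid = {}
    for min_depth, min_quality in product(depth_thresholds, quality_thresholds):
        grid[(min_depth, min_quality)] = sum(
            1 for depth, quality in calls
            if depth >= min_depth and quality >= min_quality
        )
    return grid


DEPTHS = [10, 20, 30]
QUALITIES = [20, 40]
grid = sweep(CALLS, DEPTHS, QUALITIES)

# Print the grid as a small text table: rows are depth thresholds,
# columns are quality thresholds.
for min_depth in DEPTHS:
    counts = [grid[(min_depth, min_quality)] for min_quality in QUALITIES]
    print(min_depth, counts)
```

In this toy grid, raising the quality threshold makes the depth threshold matter much less, an interaction that is hard to see when a single setting is chosen up front.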


B. Pattern searching

                          Making sense of our data...




Typical example: gene networks
        => can we identify patterns?

    [Figure label: same network]




How do these networks differ?




Hive Plots, taken from http://mkweb.bcgsc.ca/linnet/




Aim: help researchers make sense of complicated
        data:

    • gene networks

    • structural variation in the genome

    • linked data

    • ...

        Eradicate disease



Hurdles:

    • big data (millions/billions of datapoints)

            => makes interactivity difficult

            solution: indexing methods, data formats,
            dimensionality reduction, ...

    • visual encoding
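
As a rough illustration of the "indexing methods" idea for big data, the sketch below keeps positions sorted so that a zoomed-in view can fetch a range quickly and a zoomed-out view can draw per-bin counts instead of every point. The `PositionIndex` class and its data are hypothetical, not a tool named in the talk.

```python
from bisect import bisect_left, bisect_right


class PositionIndex:
    """Sorted-array index over integer positions (hypothetical helper)."""

    def __init__(self, positions):
        self._positions = sorted(positions)

    def query(self, start, end):
        """Return all positions in the closed interval [start, end]."""
        lo = bisect_left(self._positions, start)
        hi = bisect_right(self._positions, end)
        return self._positions[lo:hi]

    def binned_counts(self, start, end, n_bins):
        """Count positions in n_bins equal-width half-open bins over [start, end)."""
        width = (end - start) / n_bins
        # Each edge is located by binary search, so the cost depends on the
        # number of bins rather than on the number of datapoints drawn.
        edges = [bisect_left(self._positions, start + i * width)
                 for i in range(n_bins + 1)]
        return [edges[i + 1] - edges[i] for i in range(n_bins)]


index = PositionIndex([90, 5, 41, 12, 77, 18, 40, 25])
zoomed_in = index.query(10, 41)
zoomed_out = index.binned_counts(0, 100, 4)
print(zoomed_in)
print(zoomed_out)
```

The design choice here is to pay the sorting cost once and answer every later pan or zoom with binary searches, which is one simple way to keep a view responsive as the data grows.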


Tools




User groups:

        researcher => clinician => patient




Bioinformatics
                          Visualization

So:

    •visual analytics: visually identifying patterns in large
        datasets to inform statistical analysis

    •use visualization to make sense of complex data




