Humanizing bioinformatics
 


In this talk, I explain the need for basic visualization know-how in bioinformatics.

Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Humanizing bioinformatics. Jan Aerts, Assistant Professor - ESAT/SCD, BioData Analysis & Visualization, Faculty of Engineering, Leuven University. jan.aerts@esat.kuleuven.be, @jandot
    • whoami Leuven
    • whoami Wageningen
    • whoami Roslin
    • whoami Hinxton
    • whoami Leuven
    • why “humanizing bioinformatics”?
    • what I’ll talk about: scientific research paradigms - big & complex data - what about the user? - data visualization
    • scientific research throughout time
    • Science Paradigms (Jim Gray): 1st, 1,000s of years ago: empirical; 2nd, 100s of years ago: theoretical; 3rd, last few decades: computational; 4th, today: data exploration
    • Science Paradigms (Jim Gray): 1st, 1,000s of years ago: empirical; 2nd, 100s of years ago: theoretical; 3rd, last few decades: computational (computational biology); 4th, today: data exploration (bioinformatics)
    • ever bigger datasets, ever more complicated mining algorithms
    • case in point: genome sequencing
    • why do we sequence?
    • transcriptionally active sites; protein-DNA interactions; alternative splicing; gene expression; variation discovery; copy number variation; miRNA expression & discovery
    • single nucleotide polymorphisms: coverage, reads, polymorphisms, gene model
    • structural variation (Robberecht et al, 2010; Molecular Biology of the Cell, 4th Edition)
    • Human Genome Project
    • automate, automate, automate
    • HGP: 15 years, $3 billion, tens of labs => 1 genome; now: 1 week, $5000, 1 technician => 1 genome
    • genome sequencing throughput (Mardis, 2010)
    • genome sequencing throughput: “next-generation” sequencing platforms (Mardis, 2010)
    • NHGRI
    • Metzker et al, 2010
    • big throughput => big data
    • advanced data structures
    • advanced data mining: support vector machine, recursive feature elimination, manifold learning, adaptive cascade sharing trees
    • “Dammit Jim, I’m a doctor, not a bioinformatician!” Christophe Lambert
    • “Dammit Jim, I’m a doctor, not a bioinformatician!” We’re alienating the user... too much data; blind trust (?) in bioinformatician
    • but... what’s the question? what parameters should I use? can I trust this output? I can’t wrap my head around this...
    • what’s the question? 4th paradigm question -> hypothesis -> generate data
    • what’s the question? 4th paradigm question -> hypothesis -> generate data; generate data -> see what we can do with it
    • Gene interaction data: “A regulates B”
    • what parameters should I use?
    • peak
    • but is this?
    • van de Wiel et al, 2010
    • T. Voet
    • can I trust this output? data filtering: putative mutations; filter 1, filter 2, filter 3; A, B, C: different settings for filters
    • A B C
    • A B C. State of the art: run many filter pipelines and take intersection
    • What we should have found... A B C
    • different algorithms for finding the same thing
    • I can’t wrap my head around this... too much (?) info
    • treatment plan for cancer patients. heterogeneous datasets: multiple abstraction levels, multiple sources, multiple formats; patient/clinical data, population/family data, tissue samples, MR/CT/X-ray, pathways, gene expression data. collaborative data examination: pathologist, geneticist, biologist
    • researcher is lost...
    • data visualization
    • “... the use of computer-supported, interactive, visual representations of data to amplify cognition” (S Card, J Mackinlay & B Shneiderman). “... computer-based visualization systems providing visual representations of datasets intended to help people carry out some task more effectively.” (T Munzner)
    • cognitive task => perceptive task
    • datasets I-IV:
             I              II             III              IV
           x      y       x      y       x      y       x      y
        10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
         8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
        13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
         9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
        11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
        14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
         6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
         4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
        12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
         7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
         5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.80
      n = 11; mean x = 9.0; variance x = 11.0; mean y = 7.5; variance y = 4.12; correlation x & y = 0.816; regression line: y = 3 + 0.5x
    • exploration explanation
    • exploration explanation; pictorial superiority effect: “information” 72hr “informa” “i” 65% 1%
    • exploration explanation J van Wijk
    • exploration explanation
    • some of the principles (taken from T Munzner): know your visual encodings; power of the plane; danger of depth; eyes beat memory; overview, zoom and filter, details on demand; ...
    • visual encoding channels: position on common scale, position on unaligned scale, 2D size, 3D size (Mackinlay)
    • “power of the plane”: position on common scale, position on unaligned scale, 2D size, 3D size
    • examples of sub-optimal encoding
    • Florence Nightingale
    • Florence Nightingale
    • Don’t believe everything you see
    • networks... <sigh>
    • same network (Martin Krzywinski)
    • different networks! (Martin Krzywinski)
    • 3D, anyone?
    • 3D, anyone? occlusion; interaction complexity; perspective distortion; text legibility
    • Gene interaction data: “A regulates B”
    • regulator, workhorse, manager
    • “lie factor” = (size of effect shown in graphic) / (size of effect in data)
    • Humanizing bioinformatics
    • Humanizing bioinformatics: there and back again; put the user back in the loop!
    • Thank you
    • Acknowledgments: graphics creators, Tamara Munzner, Martin Krzywinski
    • Image attributions ... got lost ... If you find something that’s yours, let me know!
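
The summary statistics quoted on the four-dataset slide (datasets I to IV) can be recomputed with a few lines of Python. This is a minimal sketch and not part of the original talk: the names `DATASETS` and `summarize` are hypothetical helpers, the values are copied from the slide as transcribed, and the use of the sample (n - 1) variance is an assumption about how the slide's variances were computed.

```python
from math import sqrt

# Values copied from the slide as transcribed.
# Datasets I-III share the same x column; dataset IV has its own.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X_FOUR = [8.0] * 7 + [19.0] + [8.0] * 3

DATASETS = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_FOUR, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                    5.25, 12.50, 5.56, 7.91, 6.80]),
}


def summarize(xs, ys):
    """Return mean, sample variance, Pearson r and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance (assumption)
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (xs, ys) in DATASETS.items():
        s = summarize(xs, ys)
        print(
            f"{name:>3}: mean x={s['mean_x']:.2f} var x={s['var_x']:.2f} "
            f"mean y={s['mean_y']:.2f} var y={s['var_y']:.2f} "
            f"r={s['r']:.3f} line: y={s['intercept']:.2f}+{s['slope']:.3f}x"
        )
```

All four datasets give nearly the same summary values, which is presumably the slide's point: the numbers alone do not distinguish them, and the differences only show up once the points are plotted.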