Humanizing bioinformatics

In this talk, I explain the need for basic visualization know-how in bioinformatics.

Published in: Design, Technology, Education

Transcript

  • 1. Humanizing bioinformatics. Jan Aerts, Assistant Professor - ESAT/SCD, BioData Analysis & Visualization, Faculty of Engineering, Leuven University. jan.aerts@esat.kuleuven.be, @jandot
  • 2. whoami Leuven
  • 3. whoami Wageningen
  • 4. whoami Roslin
  • 5. whoami Hinxton
  • 6. whoami Leuven
  • 7. why “humanizing bioinformatics”?
  • 8. what I’ll talk about: scientific research paradigms - big & complex data - what about the user? - data visualization
  • 9. scientific research throughout time
  • 10. Science Paradigms (Jim Gray): 1st, 1,000s of years ago: empirical; 2nd, 100s of years ago: theoretical; 3rd, last few decades: computational; 4th, today: data exploration
  • 11. Science Paradigms (Jim Gray): 1st, 1,000s of years ago: empirical; 2nd, 100s of years ago: theoretical; 3rd, last few decades: computational (computational biology); 4th, today: data exploration (bioinformatics)
  • 12. ever bigger datasets, ever more complicated mining algorithms
  • 13. case in point: genome sequencing
  • 14. why do we sequence?
  • 15. transcriptionally active sites; protein-DNA interactions; alternative splicing; gene expression; variation discovery; copy number variation; miRNA expression & discovery
  • 16. single nucleotide polymorphisms: coverage, reads, polymorphisms, gene model
  • 17. structural variation (Robberecht et al, 2010; Molecular Biology of the Cell, 4th Edition)
  • 18. Human Genome Project
  • 19. automate, automate, automate
  • 20. HGP: 15 years, $3 billion, tens of labs => 1 genome; now: 1 week, $5000, 1 technician => 1 genome
  • 21. genome sequencing throughput (Mardis, 2010)
  • 22. genome sequencing throughput: “next-generation” sequencing platforms (Mardis, 2010)
  • 23. NHGRI
  • 24. Metzker et al, 2010
  • 25. big throughput => big data
  • 26. advanced data structures
  • 27. advanced data mining: support vector machine recursive feature elimination; manifold learning; adaptive cascade sharing trees
  • 28. “Dammit Jim, I’m a doctor, not a bioinformatician!” Christophe Lambert
  • 29. “Dammit Jim, I’m a doctor, not a bioinformatician!” We’re alienating the user... too much data; blind trust (?) in bioinformatician
  • 30. but... what’s the question? what parameters should I use? can I trust this output? I can’t wrap my head around this...
  • 31. what’s the question? 4th paradigm question -> hypothesis -> generate data
  • 32. what’s the question? 4th paradigm question -> hypothesis -> generate data generate data -> see what we can do with it
  • 33. Gene interaction data: “A regulates B”
  • 34. what parameters should I use?
  • 35. peak
  • 36. but is this?
  • 37. van de Wiel et al, 2010
  • 38. T. Voet
  • 39. can I trust this output? data filtering: putative mutations -> filter 1 -> filter 2 -> filter 3; different settings for filters give results A, B, C
  • 40. B A C
  • 41. B A C. State of the art: run many filter pipelines and take intersection
  • 42. What we should have found... B A C
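
The "run many filter pipelines and take intersection" approach from slide 41 can be sketched with plain sets. This is only an illustration: the three pipelines, their outputs, and the variant identifiers below are invented, not taken from the talk. Counting how many pipelines support each call is one possible way to keep visible what the intersection discards.

```python
from collections import Counter

# Hypothetical outputs of three filter pipelines (A, B, C); the variant
# identifiers are invented for illustration only.
calls = {
    "A": {"chr1:1042", "chr2:2210", "chr3:877", "chr7:5001"},
    "B": {"chr1:1042", "chr2:2210", "chr5:310"},
    "C": {"chr1:1042", "chr3:877", "chr5:310", "chr9:42"},
}

# "Take the intersection": keep only variants reported by every pipeline.
consensus = set.intersection(*calls.values())

# Count how many pipelines support each variant, so the calls that the
# intersection silently drops stay visible.
support = Counter(v for found in calls.values() for v in found)
dropped = {v: n for v, n in support.items() if n < len(calls)}

print("consensus:", sorted(consensus))
for variant, n in sorted(dropped.items()):
    print(f"{variant}: found by {n} of {len(calls)} pipelines")
```

In this toy case the intersection keeps a single variant, while five others are supported by one or two pipelines and would never reach the researcher.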
  • 43. different algorithms for finding the same thing
  • 44. I can’t wrap my head around this... too much (?) info
  • 45. treatment plan for cancer patients. heterogeneous datasets: multiple abstraction levels, multiple sources, multiple formats (patient/clinical data, population/family data, tissue samples, MR/CT/X-ray, pathways, gene expression data). collaborative data examination: pathologist, geneticist, biologist
  • 46. researcher is lost...
  • 47. data visualization
  • 48. “... the use of computer-supported, interactive, visual representations of data to amplify cognition” (S Card, J Mackinlay & B Shneiderman). “... computer-based visualization systems providing visual representations of datasets intended to help people carry out some task more effectively.” (T Munzner)
  • 49. cognitive task => perceptive task
  • 50.
           I             II            III           IV
        x      y      x      y      x      y      x      y
      10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
       8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
      13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
       9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
      11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
      14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
       6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
       4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
      12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
       7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
       5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
    n = 11; mean x = 9.0; variance x = 11.0; mean y = 7.5; variance y = 4.12; correlation x & y = 0.816; regression line: y = 3 + 0.5x
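
The four datasets on slide 50 are Anscombe's quartet. As a minimal sketch, the summary statistics quoted on the slide can be recomputed with the Python standard library, using Anscombe's published values (the last y of set IV is 6.89). The names `quartet` and `summarize` are made up for this example.

```python
# Anscombe's quartet: sets I-III share the same x values.
x123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return n, means, sample variances, Pearson r and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "n": n,
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),   # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


stats = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in stats.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

All four sets agree on these numbers to about two decimal places, so the differences between them only become apparent when the points are plotted.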
  • 51. exploration explanation
  • 52. exploration explanation. pictorial superiority effect: “information” 72hr “informa” “i” 65% 1%
  • 53. exploration explanation J van Wijk
  • 54. exploration explanation
  • 55. some of the principles (taken from T Munzner): know your visual encodings; power of the plane; danger of depth; eyes beat memory; overview, zoom and filter, details on demand; overview, zoom and filter, details on demand; overview, zoom and filter, details on demand; overview, zoom and filter, details on demand; overview, zoom and filter, details on demand; ...
  • 56. visual encoding channels (Mackinlay): position on common scale; position on unaligned scale; 2D size; 3D size
  • 57. “power of the plane”: position on common scale; position on unaligned scale; 2D size; 3D size
  • 58. examples of sub-optimal encoding
  • 59. Florence Nightingale
  • 60. Florence Nightingale
  • 61. Don’t believe everything you see
  • 62. networks... <sigh>
  • 63. same network (Martin Krzywinski)
  • 64. different networks! (Martin Krzywinski)
  • 65. 3D, anyone?
  • 66. 3D, anyone? occlusion; interaction complexity; perspective distortion; text legibility
  • 67. Gene interaction data: “A regulates B”
  • 68. regulator, workhorse, manager
  • 69. “lie factor” = (size of effect shown in graphic) / (size of effect in data)
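
The lie factor on slide 69 can be worked through numerically. The sketch below assumes Tufte's usual reading of "size of effect" as the relative change between two values; the function name `lie_factor` and the numbers in the example are hypothetical.

```python
def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of the relative change drawn to the relative change in the data.

    Assumes "size of effect" means (second - first) / first, as in Tufte's
    definition; a value near 1 means the graphic is faithful to the data.
    """
    effect_in_graphic = (shown_second - shown_first) / shown_first
    effect_in_data = (data_second - data_first) / data_first
    return effect_in_graphic / effect_in_data


# Hypothetical chart: the data rise from 100 to 150 (+50%), but the bars
# are drawn 1 cm and 3 cm tall (+200%).
factor = lie_factor(1.0, 3.0, 100.0, 150.0)
print(f"lie factor: {factor:.1f}")
```

Here the graphic exaggerates the change fourfold; a truncated axis or scaling an icon in two dimensions for a one-dimensional quantity are typical ways this happens.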
  • 70. Humanizing bioinformatics
  • 71. Humanizing bioinformatics: there and back again. put the user back in the loop!
  • 72. Thank you
  • 73. Acknowledgments: graphics creators; Tamara Munzner; Martin Krzywinski
  • 74. Image attributions ... got lost ... If you find something that’s yours, let me know!