Visualising Multi Dimensional Data

Even though exploring data visually is an integral part of the data analytics pipeline, we struggle to explore data visually once the number of dimensions goes beyond three. This talk focuses on showcasing techniques for visually exploring multi-dimensional data (p > 3). The aim is to show examples of each of the following techniques, potentially using one exemplar dataset. The talk was given at the Strata + Hadoop World conference in Singapore, 2015, and at the Fifth Elephant conference in Bangalore, 2015.

Published in: Data & Analytics

  1. Amit Kapoor, @amitkaps. Visualising Multi-Dimensional Data. [Axes: x, w, z, y]
  2. Flatland: A Romance of Many Dimensions, by Edwin A. Abbott (1884). [Figure: the Square]
  3. A square is but a line in 2d. [Figure labels: eye, The Square]
  4. The disappearing circle: the Sphere visits the 2d Flatland. [Figure labels: The Sphere rising out of 2d space; The Sphere on the point of vanishing; eye]
  5. The square is a cube in 3d! The Square sees the world in a new way! [Figure label: eye]
  6. Show, not Tell
  7. Visual Wired Brain: 70% of the sensory receptors are in the eyes; 50% of the brain is used for visual processing; 100 ms to get a sense of the visual scene
  8. Symbolic Abstraction, Visual Abstraction, Phenomena. Source: Bret Victor
  9. “Visualisation is the transformation of the symbolic into geometric”
  10. Small Data, Large Data, Big Data, Wide Data
  11. Visualise Small Data. Area and Sales (Rs.): North 5, East 25, West 15, South 20, Central 10
  12. Acquire Data. Area and Sales: North 5, East 25, West 15, South 20, Central 10
  13. Acquire Data, Parse Variables. Area and Sales: North 5, East 25, West 15, South 20, Central 10. (x, y): (1, 5), (2, 25), (3, 15), (4, 20), (5, 10); x (C) = Area, y (Q) = Sales
  14. Acquire Data, Parse Variables, Encode Shape & Select Scales. Area and Sales: North 5, East 25, West 15, South 20, Central 10. (x, y): (1, 5), (2, 25), (3, 15), (4, 20), (5, 10); x (C) = Area, y (Q) = Sales. Encoding: x - position, y - bar; scale - 200 x 200. [Axes: x, y; ticks: 20, 60, 100, 140, 180]
  15. Acquire Data, Parse Variables, Encode Shape & Select Scales, Render with Coordinates. Area and Sales: North 5, East 25, West 15, South 20, Central 10. (x, y): (1, 5), (2, 25), (3, 15), (4, 20), (5, 10); x (C) = Area, y (Q) = Sales. Encoding: x - position, y - bar; scale - 200 x 200; coordinates - cartesian. [Axes: x, y; ticks: 20, 60, 100, 140, 180]
  16. Create Visualisations: Points, Line, Bar, Bar - Stacked, Bar - Stagger; Coordinates System
  17. Coordinates: Cartesian (x, y). Dot Plot, Line Chart, Column Chart, Waterfall, Stacked Column
  18. Coordinates: Cartesian - Flip (y, x). Dot Plot, Line Chart, Bar Chart, Cascade, Stacked Bar
  19. Polar Coordinate - X (r, θ; x = θ, y = r). Marked Radar, Line Radar, CoxComb, Polar Waterfall, Bullseye
  20. Polar Coordinate - Y (r, θ; x = r, y = θ). Target, Line Track, Wind Rose, Polar Cascade, Pie Chart
  21. Data Viz Process (Small Data): Acquire Data, Parse Variables, Encode Shape, Select Scales, Render Coordinates
  22. gadfly, bokeh, ggplot2, matplotlib, graphics
  23. Small Data, Large Data, Big Data, Wide Data
  24. Visualise Large Data: Pincodes in India, ~24,000 pincodes. E.g. Pincode: 560076, Latitude: 12.8843049°, Longitude: 77.5967384°, Place: Bannerghatta
  25. Pincode Map: Scatter plot, play with alpha to show density. But what if I want to show the geographic nature of the pincodes?
  26. Pincode+ Map: Exploration of large data is iterative! Refine Data (Filter, Transform)
  27. Data Viz Process (Large Data): Acquire Data, Parse Variables, Encode Shape, Select Scales, Render Coordinates, with an added Refine Data step
  28. Small Data, Large Data, Big Data, Wide Data
  29. Visualise Big Data: x, y => 1,000,000. Comparable to the number of pixels on my MacBook Air (1440 x 900). [Label: Data]
  30. Data, Sample: Sampling can be effective (with overweighting of unusual values). It requires multiple plots or careful tuning of parameters
  31. Data, Sample, Model: Models are great as they scale nicely. But visualisation is required, as “I don’t know what I don’t know.”
  32. Data, Sample, Model, Binning: Binning can solve a lot of these challenges. “Bin - Summarize - Smooth: A framework for visualising big data” - Hadley Wickham (2013); “imMens: Real-time Visual Querying of Big Data” - Liu, Jiang, Heer (2013)
  33. Tools Matter, Defaults Matter
  34. “We are calling 2015 the year of the histogram” - Amanda Cox
  35. “Visualising big data is the process of creating generalized histograms” - Amit Kapoor
  36. Data Viz Process (Big Data): Acquire Data, Parse Variables, Encode Shape, Select Scales, Render Coordinates, with added Filter Data and Aggregate Data steps
  37. Small Data, Large Data, Big Data, Wide Data
  38. Multi Dimensional Viz. Standard 2d/3d: Scatterplot, SPLOM, Trellis / Facets, Multiple View. Glyph Approach: Star plots, Stick Figure, Chernoff Faces, Color Icons. Geometric Transforms: Parallel Coord, Table lens, Star Coords, Tours. Pixel Based Approach: Space Filling, Pixel Bar Chart, Spiral Technique. Stacking Approach: Treemaps, Dimensional Stacking, Hierarchical Axis
  39. Multi Dimensional Viz. Standard 2d/3d: Scatterplot, SPLOM, Trellis / Facets, Multiple View. Glyph Approach: Star plots, Stick Figure, Chernoff Faces, Color Icons. Geometric Transforms: Parallel Coord, Table lens, Star Coords, Tours. Pixel Based Approach: Space Filling, Pixel Bar Chart, Spiral Technique. Stacking Approach: Treemaps, Dimensional Stacking, Hierarchical Axis. [Axis labels: Need for Interaction, Ease of Interpretation]
  40. Diamonds dataset: 50K+ observations of 10 dimensions
  41. Diamonds dataset: 50K+ observations of 10 dimensions. Price of diamonds is related to the 4C’s. price: in US$; carat: weight (⅕ of a gram); cut: 5 levels [Fair to Ideal]; colour: 7 levels [J to D]; clarity: 8 levels [I1 to IF]
  42. Diamonds dataset: 50K+ observations of 10 dimensions. x: length mm; y: width mm; z: height mm; depth: z depth %; table: table width %. [Figure labels: z depth, table width, x, y, z]
  43. Diamonds dataset: 50K+ observations of 10 dimensions
      price  carat  cut        color  clarity  x     y     z     depth  table
      326    0.23   Ideal      E      SI2      3.95  3.98  2.43  61.5   55
      326    0.21   Premium    E      SI1      3.89  3.84  2.31  59.8   61
      327    0.23   Good       E      VS1      4.05  4.07  2.31  56.9   65
      334    0.29   Premium    I      VS2      4.2   4.23  2.63  62.4   58
      335    0.31   Good       J      SI2      4.34  4.35  2.75  63.3   58
      336    0.24   Very Good  J      VVS2     3.94  3.96  2.48  62.8   57
  44. 1d: x. 2d: x, y
  45. Chart Options. Columns: Points, Bars, Lines, Areas. Rows: 1d Quantitative; 1d Categorical; 2d Quantitative + Categorical; 2d Categorical + Categorical; 2d Quantitative + Quantitative
  46. Chart Options
                                      Points        Bars        Lines       Areas
      1d Quantitative                 Strip Plot    Histogram   Freq Poly   Density Plots
      1d Categorical                  Dot Plot      Bar Chart   Avoid       Avoid
      2d Quantitative + Categorical   Strip Plot    Box Plot    Freq Poly   Density Plots
      2d Categorical + Categorical    Avoid         Bar Chart   Avoid       Mosaic Plot
      2d Quantitative + Quantitative  Scatter Plot  Table Lens  Slopegraph  Avoid
  47. 2d Scatter Plot
  48. 2d Scatter Plot. Interaction: Annotation, e.g. (4.13, 17329), (4.50, 18531), (5.01, 18081)
  49. 2d Scatter Plot: log transformation
  50. 2d Scatter Plot: Select or Filter Area of Interest (Carat > 1, Price > 10,000)
  51. 2d Scatter Plot. Interaction: Pan & Zoom
  52. 3d: x, y, z
  53. Use aesthetics for 3d: Size, Color, Shape
  54. 3d Scatter Plot: Size for Quantitative Dim
  55. 3d Scatter Plot: Color for Categorical Dim
  56. 3d Scatter Plot: Shapes don’t scale well
  57. 3d Scatter Plot: depth perspective is not good
  58. 3d Scatter Plot. Interaction: Rotation
  59. 4d to 6d: x, y, z, w, v, u
  60. 4d Bubble Plot: Color and Size
  61. 5d Bubble Plot: Color, Size and Time. The Joy of Stats - Hans Rosling
  62. Trellis / Facets: Create Small Multiples
  63. Trellis / Facet Grid: Create Small Multiples
  64. SPLOM (Scatterplot Matrix): Price, Carat, Table, Depth
  65. Subplots: Binned Plot Distribution
  66. Multiple View: Create Many Small Charts
  67. Multiple View. Interaction: Brushing & Linking
  68. 6d & more: x, y, z, w, v, u
  69. Icon based Approach: Star, Stick, Chernoff
  70. Star Plot: Matrix Layout. Axes: color, clarity, depth, cut, table
  71. Star Plot: Plot on X-Y location. Axes: color, clarity, depth, cut, table
  72. Orthogonal, Parallel
  73. Parallel Coord. Interaction: Sorting
  74. Parallel Coord. Interaction: Selection
  75. Table Plot. Interaction: Bin & Sort
  76. Table Plot. Interaction: Zoom & Filter
  77. Stacked: Mosaic Plot of cut, color and clarity. Interaction: Brushing. Other example - Treemaps
  78. Geometric Transforms: Star Coordinates (Eser Kandogan); Tours & Projections (tourr package)
  79. Pixel Based Approach: Spiral, Pixel Curve, Pixel Bar Chart. Pixel Bar Chart - Keim; VisDB - Keim
  80. Data Viz Process (Wide Data): Acquire Data, Parse Variables, Encode Shape, Select Scales, Render Algorithm, with added Filter Data, Aggregate Data, Make Views and Add Interactivity steps
  81. Data Viz Process (Wide Data): Acquire Data, Parse Variables, Encode Shape, Select Scales, Render Algorithm, with added Filter Data, Aggregate Data, Make Views and Add Interactivity steps. (1) Encode wisely; (2) Use space and multiples; (3) Add interactivity; (4) Reduce dimensions
  82. Code for these Slides: https://github.com/amitkaps/multidim. R libraries: ggplot2, GGally, ggsubplot, scales, iplots/Mondrian, ggvis, tourr, rgl, scatterplot3d, dplyr, tabplot, grid, gridExtra
  83. “The greatest value of a picture is when it forces us to notice what we never expected to see” - John Tukey
  84. Amit Kapoor, @amitkaps, amitkaps.com, narrativeviz.com. Data, Visual, Story
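
The small-data process on slides 11 to 15 (acquire, parse, encode, select scales) can be sketched roughly as below. This is a minimal Python sketch, not the talk's own R code. The function names are hypothetical, and two things are assumptions: that the ticks 20, 60, 100, 140, 180 are the centres of five equal bands on a 200-wide canvas, and that the largest sales value maps to the full 200-unit height.

```python
# Sketch of the small-data pipeline; all function names are hypothetical.

def acquire():
    """Acquire: the Area / Sales table shown on the slides."""
    return [("North", 5), ("East", 25), ("West", 15), ("South", 20), ("Central", 10)]

def parse(rows):
    """Parse: x is categorical (Area, indexed 1..n), y is quantitative (Sales)."""
    return [(i + 1, sales) for i, (_, sales) in enumerate(rows)]

def band_centre(index, count, width=200):
    """Assumed band scale: split the width into equal bands and
    return the centre of band `index` (1-based)."""
    band = width / count
    return (index - 0.5) * band

def linear(value, domain_max, height=200):
    """Assumed linear scale from [0, domain_max] to [0, height]."""
    return value * height / domain_max

def encode(points, width=200, height=200):
    """Encode: x becomes a position, y becomes a bar height."""
    count = len(points)
    y_max = max(y for _, y in points)
    return [(band_centre(x, count, width), linear(y, y_max, height))
            for x, y in points]

rows = acquire()
bars = encode(parse(rows))
for (area, _), (centre, bar_height) in zip(rows, bars):
    print(f"{area:8s} x={centre:6.1f} bar height={bar_height:6.1f}")
```

Rendering is left out on purpose: the same (position, height) pairs could be drawn in cartesian, flipped or polar coordinates, which is the point of slides 17 to 20.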

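
Slide 32 proposes binning for big data, and slide 35 describes the result as a generalized histogram. A minimal Python sketch of that idea follows: each point is assigned to a cell of a fixed grid and only the per-cell counts are kept, so the plot size depends on the number of bins and not on the number of points. The helper names, the 10 x 10 grid and the synthetic uniform data are all assumptions for illustration; this is not the implementation from the papers cited on the slide.

```python
import random
from collections import Counter

def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to a bin number 0..bins-1 (hi goes in the last bin)."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    index = int((value - lo) / (hi - lo) * bins)
    return min(index, bins - 1)

def bin_2d(points, x_range, y_range, bins=(10, 10)):
    """Bin, then summarise: count the points that fall in each grid cell."""
    counts = Counter()
    for x, y in points:
        cell = (bin_index(x, x_range[0], x_range[1], bins[0]),
                bin_index(y, y_range[0], y_range[1], bins[1]))
        counts[cell] += 1
    return counts

# Synthetic stand-in for a large dataset (assumption: uniform in the unit square).
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(100_000)]

grid = bin_2d(points, (0.0, 1.0), (0.0, 1.0), bins=(10, 10))
print(len(points), "points reduced to", len(grid), "cells")
```

Each cell count can then be encoded as colour or size, which turns the scatter plot into a heatmap-style histogram.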