Image credit: Frax (fract.al)
DATA SCIENCE
Seeing what your eyes can’t
Robert Rouse
Robert Rouse
About Me
What is Data Science?
helps us discover and apply patterns and relationships our eyes can’t see
Data Science
statistics
predictive analytics
data mining
machine learning
deep learning
algorithms
Types of Analytics
Descriptive – what happened?
Diagnostic – why did it happen?
Predictive – what will happen?
Prescriptive – how can we make it happen? Or, what's the best option?
*Categories and definitions according to Gartner
The Power of Data Visualization
Anscombe’s Quartet – raw data
        I             II            III             IV
      x      y      x      y      x      y      x      y
   10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
    8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
   13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
    9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
   11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
   14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
    6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
    4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
   12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
    7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
    5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
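The four datasets in the table look different row by row, yet their summary statistics nearly coincide. The sketch below checks this with plain Python. The function name `summarize` and the dictionary layout are illustrative choices, not part of the original deck; the numbers are copied from the table.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet, copied from the raw-data table.
x_shared = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, Pearson correlation, and least-squares line for one dataset."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

Rounded to two decimals, the four rows printed by the loop are essentially the same, which is why the plotted view on the next slide is needed to tell the datasets apart.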
The Power of Data Visualization
Anscombe’s Quartet – visualized
Visual Analytics
What action might we take from this view?
A Scientific Mindset
Finding the right questions to ask
What if we can raise profit by 2% in these categories?
What if we focus on items with negative profit?
The Limits of Data Visualization
Fisher’s Iris Dataset – can you identify the three species?
Descriptive: Clustering
Fisher’s Irises
k-means clusters vs. actual species
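For readers who want to see what k-means does mechanically, here is a minimal pure-Python sketch. It is not the implementation behind the slide. The twelve (petal length, petal width) pairs are made-up values chosen to form three loose groups, not Fisher's measurements, and the deterministic farthest-point initialization is an assumption made so the result is reproducible (k-means is more commonly started from random points).

```python
from math import dist

# Hypothetical (petal length, petal width) pairs in cm; NOT the real iris data.
points = [
    (1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (1.6, 0.2),   # small flowers
    (4.5, 1.5), (4.7, 1.4), (4.2, 1.3), (4.0, 1.2),   # medium flowers
    (6.0, 2.5), (5.8, 2.2), (6.3, 2.4), (5.9, 2.1),   # large flowers
]


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption: deterministic start. Take the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


labels, centroids = kmeans(points, k=3)
print(labels)
```

The cluster numbers themselves are arbitrary. Comparing them with known species labels, as the slide does, is a separate step that is only possible when the true labels are available.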
Descriptive: Outliers
standard deviation outlier detection
(can be multivariate)
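A small sketch can show why looking at one variable at a time may miss an outlier. The data below are hypothetical, and the second function uses residuals from a fitted line as one simple way to judge a point "given another variable". This is a stand-in for fuller multivariate methods such as Mahalanobis distance, not necessarily the method used in the slide.

```python
from statistics import mean, stdev

# Hypothetical data: y roughly doubles x, except at x = 9.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2, 10.0, 19.9]


def zscore_outliers(values, threshold=2.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) / s > threshold]


def residual_outliers(xs, ys, threshold=2.0):
    """Indices of points whose y is unusual given x (least-squares residuals)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return zscore_outliers(residuals, threshold)


univariate_flags = zscore_outliers(ys)          # y considered on its own
conditional_flags = residual_outliers(xs, ys)   # y considered given x
print(univariate_flags, conditional_flags)
```

On its own, y = 10.0 sits near the middle of the y values, so the one-variable check does not flag it. Only when x is taken into account does the point at x = 9 stand out.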
Predictive: Forecasting
Dashboard by Alex Lentz, InterWorks
Prescriptive: What-if & Optimization
Examples from Bora Beran, Tableau R integration demos
Principal Takeaways
• Start with the right questions
• Have a scientific mindset and eagerness for discovery
• Filter out the noise to avoid “analysis paralysis”
• Apply the tools of science to put your data under a microscope
Questions?


Editor's Notes

  • #4 Definitions can be slippery, and buzzwords are not much help. Many things fall under the very broad category known as “data science.” This is my definition: we can do a lot with data visualization, but it isn't always enough. Data science helps us move beyond the limits of what our eyes can effectively perceive.
  • #5 This is from Gartner’s analytic maturity model. There can be a lot of overlap, and each category builds on the one before it. Visualization is limited to the first two categories, but it is often used to convey the results of advanced analytics in the other categories. Data science can apply in any category, as we will see with examples.
  • #6 We can run statistics on these four datasets and get nearly identical values for the means, correlation, regression line, etc.
  • #7 Visualization helps us see the “shape” of the data in ways simple statistics can’t.
  • #8 Visual analytics draws us to certain conclusions about the data, leading to “actionable intelligence.” Thinking about the definition, “what our eyes can’t see”: sometimes we can’t see it because we choose not to, or because it isn’t on standard reports and dashboards. Or our attention is drawn to the wrong place by strong visual cues (even with best practices in place). What if I told you Tables in the East wasn’t the right place to look?
  • #9 Looking at it in different ways: this is still visual analytics, but we are applying the principle of questioning, testing conclusions, and “experimenting.” A scientific mindset is one that does not casually accept canned answers. What about “analysis paralysis”? Algorithms and focused, efficient efforts yield value.
  • #10 But what about this data? If you only know some characteristics of a group and need to tell one member from another, can you do it with visualization alone? You know there are supposed to be three groups, but which is which? (Ronald Fisher, 1936)
  • #11 k-means (a common clustering technique) correctly identifies most of the species according to their properties. This is based on only four measurements. What if you had more variables or measurements? Clustering often falls into the predictive category because you’re “predicting” the classification. Descriptive is a prerequisite to predictive. Business problem: which types of customers have similar behavior? Given that knowledge, how might shifts in demographics affect your company?
  • #12 Standard deviation can help show outliers in linear trends but can only deal with one variable at a time. Multivariate outlier detection can test whether a point is an outlier given other variables of interest.
  • #13 Tableau’s built-in forecasting is best for getting a quick “feel” for things. More scientific, controllable methods would require R integration.
  • #14 Forecasting can be taken to another level by applying “descriptive” models to the future, given user input or data about expectations in related areas. This enables “what-if” scenarios to inform choices about the optimal scenario. R integration also allows optimization toward a target given variables such as the allocation of stocks.