
1030 track 1 wilson_using his laptop

#EMSNYCDAY1

Published in: Marketing

  1. 1. Chi Square You to Dive into Statistics Tim Wilson Analytics Demystified @tgwilson | #eMetrics
  2. 2. Welcome to the AI Track! @tgwilson | #eMetrics
  3. 3. NOT a Smurf NOT a data scientist NOT a statistician @tgwilson | #eMetrics
  4. 4. If you are a web analyst… …you will need to start thinking about data differently. @tgwilson | #eMetrics
  5. 5. If you are a data scientist… …you will need to recognize where web analysts and digital marketers are coming from. @tgwilson | #eMetrics Source: DARPA
  6. 6. The dream… @tgwilson | #eMetrics “I’ve fully optimized your marketing, Dave.”
  7. 7. The dream… Artificial Intelligence! @tgwilson | #eMetrics “I’ve fully optimized your marketing, Dave.” ?
  8. 8. @tgwilson | #eMetrics “In God we trust; all others bring data.” - W. Edwards Deming
  9. 9. @tgwilson | #eMetrics “In God we trust; all others bring data.” - W. Edwards Deming
  10. 10. @tgwilson | #eMetrics “Marketing (and analytics) is inherently decision-making in conditions of uncertainty.” - Matt Gershoff
  11. 11. @tgwilson | #eMetrics “Marketing (and analytics) is inherently decision-making in conditions of uncertainty. There is a cost to reducing uncertainty.” - Matt Gershoff
  12. 12. @tgwilson | #eMetrics “Marketing (and analytics) is inherently decision-making in conditions of uncertainty. There is a cost to reducing uncertainty. Uncertainty cannot be entirely eliminated.” - Matt Gershoff
  13. 13. This is one definition of statistical thinking. @tgwilson | #eMetrics
  14. 14. Statistics ≠ Machine Learning ≠ Artificial Intelligence @tgwilson | #eMetrics
  15. 15. @tgwilson | #eMetrics Statistical Thinking Machine Learning Artificial Intelligence
  16. 16. Let’s dive deep into one simple example. @tgwilson | #eMetrics
  17. 17. How much traffic did we get last month? @tgwilson | #eMetrics
  18. 18. 163,236 sessions @tgwilson | #eMetrics
  19. 19. Let’s trend that! @tgwilson | #eMetrics
  20. 20. @tgwilson | #eMetrics How did that traffic break down by channel?
  21. 21. Simple: Channel Overview @tgwilson | #eMetrics
  22. 22. Do sessions really differ by channel? @tgwilson | #eMetrics
  23. 23. How many rows of data? @tgwilson | #eMetrics 6!
  24. 24. Ask a data scientist… @tgwilson | #eMetrics 163,236
  25. 25. Ask a data scientist… @tgwilson | #eMetrics 163,236
  26. 26. Ask a data scientist… @tgwilson | #eMetrics
  27. 27. This is the difference between aggregation and observations. @tgwilson | #eMetrics
  28. 28. Web analytics platforms have conditioned us to work with aggregated data. @tgwilson | #eMetrics
  29. 29. BUT @tgwilson | #eMetrics
  30. 30. They have started implicitly acknowledging the need for observation-level detail. @tgwilson | #eMetrics
  31. 31. Adobe Analytics Data Feeds @tgwilson | #eMetrics
  32. 32. Google Analytics → Google BigQuery @tgwilson | #eMetrics
  33. 33. Back to our example @tgwilson | #eMetrics 6 vs. 163,236
  34. 34. Is there a middle ground? @tgwilson | #eMetrics 6 163,236 Maximum aggregation Maximum detail ?
  35. 35. We can go from 6 to 186… @tgwilson | #eMetrics
  36. 36. Yes, there is a middle ground. @tgwilson | #eMetrics 6 163,236 Maximum aggregation Maximum detail
  37. 37. From 6 to 186 by adding a dimension. @tgwilson | #eMetrics 6 163,236 Maximum aggregation Maximum detail 186 Add date as a dimension
  38. 38. This is less aggregated data. @tgwilson | #eMetrics 6 163,236 Maximum aggregation Maximum detail 186 Add date as a dimension
  39. 39. That’s enough to dig in a bit. @tgwilson | #eMetrics
  40. 40. Let’s look at the variability in the data. @tgwilson | #eMetrics
  41. 41. Overall… we see some variability @tgwilson | #eMetrics
  42. 42. And…we see variability within each channel @tgwilson | #eMetrics
  43. 43. What if we ignore the “time” aspect of “date?” @tgwilson | #eMetrics
  44. 44. We can now make a histogram! @tgwilson | #eMetrics
  45. 45. Or…a density plot. @tgwilson | #eMetrics
  46. 46. We can make a histogram for each channel. @tgwilson | #eMetrics
  47. 47. Which means a density plot for each channel. @tgwilson | #eMetrics
  48. 48. Back to our overall distribution. @tgwilson | #eMetrics
  49. 49. Let’s flip it. @tgwilson | #eMetrics
  50. 50. And squish and rotate it. @tgwilson | #eMetrics
  51. 51. And squish and rotate it. @tgwilson | #eMetrics
  52. 52. And then break out each channel @tgwilson | #eMetrics
  53. 53. And then break out each channel @tgwilson | #eMetrics
  54. 54. We can do an ANalysis Of VAriance @tgwilson | #eMetrics
  55. 55. Also known as…a 1-Way ANOVA @tgwilson | #eMetrics
  56. 56. Also known as…a 1-Way ANOVA @tgwilson | #eMetrics
  57. 57. We can reject the null hypothesis! @tgwilson | #eMetrics
  58. 58. @tgwilson | #eMetrics @#$&*?!!!!
  59. 59. We can be reasonably confident that there is a “real” difference in sessions across different channels. @tgwilson | #eMetrics
  60. 60. We can dig in further! @tgwilson | #eMetrics Results of a Tukey post hoc test I’m not really qualified to do this with confidence
  61. 61. Let’s do a quick little visual check. @tgwilson | #eMetrics
  62. 62. Go figure! @tgwilson | #eMetrics
  63. 63. I know. We could see most of this visually. @tgwilson | #eMetrics
  64. 64. But this was a simple example mainly for illustrative purposes. @tgwilson | #eMetrics
  65. 65. Things can easily get more involved. @tgwilson | #eMetrics Sessions by Channel Sessions by Device Type ?
  66. 66. Things can easily get more involved. @tgwilson | #eMetrics Sessions by Channel Sessions by Device Type 2-Way ANOVA
  67. 67. Many statistical techniques require moving away from aggregation and towards observations. @tgwilson | #eMetrics
  68. 68. Time is something we aggregate all the…time. @tgwilson | #eMetrics
  69. 69. Let’s look at another example. @tgwilson | #eMetrics
  70. 70. Did the metric move enough for me to care? @tgwilson | #eMetrics
  71. 71. Last week’s decline: signal or noise? @tgwilson | #eMetrics bit.ly/hw-forecast
  72. 72. Start by looking at the data by day @tgwilson | #eMetrics bit.ly/hw-forecast
  73. 73. Let’s break that data into two groups. @tgwilson | #eMetrics bit.ly/hw-forecast
  74. 74. We have the data of interest… @tgwilson | #eMetrics bit.ly/hw-forecast
  75. 75. …and data for context. @tgwilson | #eMetrics bit.ly/hw-forecast
  76. 76. …or “training data.” @tgwilson | #eMetrics bit.ly/hw-forecast
  77. 77. Let’s take our training data… @tgwilson | #eMetrics bit.ly/hw-forecast
  78. 78. …and “decompose” it. @tgwilson | #eMetrics bit.ly/hw-forecast
  79. 79. …and “decompose” it. @tgwilson | #eMetrics bit.ly/hw-forecast
  80. 80. We have seasonality on a 7-day cycle. @tgwilson | #eMetrics = bit.ly/hw-forecast
  81. 81. A moving average reveals trending. @tgwilson | #eMetrics = + bit.ly/hw-forecast
  82. 82. And what’s left is just noise. @tgwilson | #eMetrics = + + bit.ly/hw-forecast
  83. 83. Trend + Seasonality = Forecast @tgwilson | #eMetrics bit.ly/hw-forecast
  84. 84. The forecast won’t be perfect. @tgwilson | #eMetrics bit.ly/hw-forecast
  85. 85. Historical “noise” → a prediction interval @tgwilson | #eMetrics bit.ly/hw-forecast
  86. 86. We can compare to the actual results. @tgwilson | #eMetrics bit.ly/hw-forecast
  87. 87. Now we have meaningful context. @tgwilson | #eMetrics bit.ly/hw-forecast
  88. 88. Now we have meaningful context. @tgwilson | #eMetrics bit.ly/hw-forecast
  89. 89. What this can look like in the real world. @tgwilson | #eMetrics bit.ly/hw-forecast
  90. 90. And…drilling down and summarizing @tgwilson | #eMetrics
  91. 91. Mark Edmondson’s Pre-/Post- Analysis Using a Bayesian Structural Time-Series Method @tgwilson | #eMetrics bit.ly/ga-effect
  92. 92. Of course… @tgwilson | #eMetrics
  93. 93. …there’s more. @tgwilson | #eMetrics
  94. 94. A LOT more! @tgwilson | #eMetrics Flickr / Neil Piddock ANOVA; Holt-Winters Forecasting; Bayesian Structural Time-Series; Crosstab with a χ² Test; Correlation; Regression; Median Absolute Deviation (MAD); K-Means Clustering. Is there a statistically significant difference between these groups (dimensions + metric)? Did that metric move enough for me to care? Can I quantify the impact of a change? ?
  95. 95. @tgwilson | #eMetrics “Marketing (and analytics) is inherently decision-making in conditions of uncertainty. There is a cost to reducing uncertainty. Uncertainty cannot be entirely eliminated.” - Matt Gershoff
  96. 96. Two possible reactions… @tgwilson | #eMetrics
  97. 97. Excitement? @tgwilson | #eMetrics
  98. 98. Terror? @tgwilson | #eMetrics
  99. 99. Both? @tgwilson | #eMetrics
  100. 100. So, now what? @tgwilson | #eMetrics
  101. 101. Get inspired. @tgwilson | #eMetrics “Most of this sounds a little over-the-horizon and science-fiction-ish, and it is. But it’s only just over the horizon.” - Jim Sterne
  102. 102. Join the discussion. @tgwilson | #eMetrics http://join.measure.chat #data-science
  103. 103. (Maybe) Give it a try. @tgwilson | #eMetrics
  104. 104. Slack/Twitter: @tgwilson LinkedIn: linkedin.com/in/tgwilson Email: tim@analyticsdemystified.com Podcast: analyticshour.io @tgwilson | #eMetrics Thank you! Measure Slack: join.measure.chat Holt-Winters Forecasting: bit.ly/hw-forecast GA Effect: bit.ly/ga-effect Learn R: dartistics.com
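
A minimal sketch of the 1-Way ANOVA idea from slides 33 to 59, written in Python rather than the R the deck points to. It is not the author's code. The channel names and session levels are invented; only the shape of the data (6 channels by 31 days, giving 186 rows) follows the slides. The helper `one_way_anova_f` is a hypothetical name, and it computes only the F statistic by hand to show what the test compares. In practice a library routine such as `scipy.stats.f_oneway` would also supply the p-value used to reject, or fail to reject, the null hypothesis.

```python
import random
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for {group name: [observations]}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    group_means = {name: mean(values) for name, values in groups.items()}

    # Between-group variability: how far each channel's mean sits from the overall mean.
    ss_between = sum(
        len(values) * (group_means[name] - grand_mean) ** 2
        for name, values in groups.items()
    )
    # Within-group variability: day-to-day spread inside each channel.
    ss_within = sum(
        (v - group_means[name]) ** 2
        for name, values in groups.items()
        for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: 6 channels x 31 days = 186 rows. Channel names and
# session levels are invented for illustration, not taken from the deck.
random.seed(42)
typical_daily_sessions = {
    "Organic Search": 2200, "Direct": 1300, "Referral": 700,
    "Paid Search": 500, "Social": 350, "Email": 200,
}
daily_sessions = {
    channel: [random.gauss(level, 0.15 * level) for _ in range(31)]
    for channel, level in typical_daily_sessions.items()
}

n_rows = sum(len(values) for values in daily_sessions.values())
f_stat, df_between, df_within = one_way_anova_f(daily_sessions)
print(f"{n_rows} rows, F({df_between}, {df_within}) = {f_stat:.1f}")
```

The same calculation on the 6-row channel summary is impossible: with one number per channel there is no within-channel variability to compare against, which is why the deck moves from aggregated totals to daily observations first.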

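
A rough sketch of the "signal or noise?" walkthrough from slides 70 to 88: split daily data into training data and data of interest, decompose the training data into a 7-day seasonal pattern, a moving-average trend, and leftover noise, then use the noise to put an interval around a forecast. This is a simplified classical additive decomposition, not the Holt-Winters method the deck links to at bit.ly/hw-forecast. The function name `decompose_and_forecast`, the flat-trend forecast, the ±1.96 standard deviation interval, and all of the data are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def decompose_and_forecast(training, horizon=7, period=7, z=1.96):
    """Additive decomposition of daily data, then a naive forecast with an interval.

    Assumes an odd `period` (7 for a weekly cycle) so the moving average can be centered.
    """
    n = len(training)
    half = period // 2

    # Trend: centered moving average over one full cycle.
    trend = {t: mean(training[t - half:t + half + 1]) for t in range(half, n - half)}

    # Seasonality: average detrended value at each position in the cycle,
    # re-centered so the seasonal effects sum to zero.
    detrended = {t: training[t] - trend[t] for t in trend}
    seasonal = [
        mean(v for t, v in detrended.items() if t % period == p)
        for p in range(period)
    ]
    offset = mean(seasonal)
    seasonal = [s - offset for s in seasonal]

    # Noise: whatever trend and seasonality do not explain.
    noise = [detrended[t] - seasonal[t % period] for t in trend]
    sigma = stdev(noise)

    # Forecast: hold the last trend value flat and re-apply the weekly pattern.
    last_trend = trend[max(trend)]
    forecast = [last_trend + seasonal[t % period] for t in range(n, n + horizon)]
    lower = [f - z * sigma for f in forecast]
    upper = [f + z * sigma for f in forecast]
    return forecast, lower, upper


# Hypothetical daily sessions: six weeks of training data plus one week of interest.
random.seed(7)
weekly = [300, 250, 200, 150, 0, -500, -400]  # invented day-of-week effects


def simulate_day(t):
    return 5000 + 2 * t + weekly[t % 7] + random.gauss(0, 150)


training = [simulate_day(t) for t in range(42)]
actual = [simulate_day(t) for t in range(42, 49)]
actual[-3:] = [0.6 * v for v in actual[-3:]]  # simulate a decline in the last three days

forecast, lower, upper = decompose_and_forecast(training)
flags = [
    "below" if a < lo else "above" if a > hi else "within"
    for a, lo, hi in zip(actual, lower, upper)
]
for day, (a, lo, hi, flag) in enumerate(zip(actual, lower, upper, flags), start=1):
    print(f"day {day}: actual {a:7.0f}, expected {lo:7.0f} to {hi:7.0f} -> {flag}")
```

Days flagged "below" or "above" fall outside what the historical noise would suggest, which is the "meaningful context" the slides describe. A production version would more likely use a forecasting library with proper prediction intervals.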