Big Data Camp LA: The Future's So Bright (Tim Shea)

Big Data Camp LA 2014: The Future's So Bright (You Can Barely Make Any Predictions About It), by Timothy Shea of DataSift

  1. Powering The Social Economy
  2. How do we Make Good Forecasts?
  3. The Architecture vs. The Practice (aka: Form vs. Function): Platforms for Big Data storage, processing & analytics vs. Actual applications of Data-at-Scale
  4. Themes for This Morning: How DataSift Manages, Processes & Delivers Data; Visualization via Tableau; Causal Inference & Statistical Modeling; Movies & Coffee
  5. Who am I? Tim Shea (@SheaNineSeven), Data Scientist & Sales Engineer at DataSift
  6. Focus on Alliances & Channels: Tableau, Alteryx, MicroStrategy, Informatica, SAP. Data Science as a Practice: Disambiguation, Classification, Causality
  7. What is DataSift? Social Data Platform; Full “Firehose” Access; 2 Billion Posts per Day; ½ Trillion Posts Historical Archive
  8. Really Intense Architecture Diagram
  9. We Make it Simple for You: Focus on Filtering: Big Data < Relevant Data. Enrichments: Demographics, Links, Emotion & Intent, Learned Classification
  10. Demo
  11. DataSift: Beyond “Social Listening”. Ex.: “Does Social have anything to do with my Business?” Line Charts and Graphs vs. Operationalized Decision Making
  12. “The Enterprise”: DataSift Enterprise customers are building: (1) Demand Forecasting; (2) Critical Event Processing; (3) Market Segmentation/Statistical Classification; (4) Establishing Correlative Relationships (**)
  13. Causality
  14. Necessary… Connection? Does Event A cause Event B?
  15. Fighting Crime… Fights Crime(?)
  16. Does The Past have anything at all to do with The Future?
  17. Defending Your Hypotheses: How can I create & defend my Hypotheses? How do I communicate my findings to Laypeople (non-Data Scientists), like your Boss?
  18. Risk Management in Hollywood
  19. Movies Through the Lens of: DataSift (what we do as a Social Data Platform); Tableau (how to make sense of a mountain of data). Good Data & Good Tools
  20. Risk Management is Hard. Q: What is a “Sure Bet”? Q: Should I spend $100MM making this movie? Q: How can I make this process less risky?
  21. Enter DataSift & Tableau
  22. Example: Return every Tweet, Facebook Post, Instagram Photo, and Bitly Click. For what? Every single Movie released in 2013
  23. Compare it With
  24. Tableau
  25. What Data do we Have?
  26. (1) Intuition
  27. (2) Social => Box Office Correlation?
  28. (3) Prove It
  29. (4) Defend the Model
  30. The Model: Y = a + bX, where Y = Box Office (the predicted value), X = Social Volume (the predictor), b = Coefficient (slope), a = Offset (intercept)
  31. Defend the Model v1. P-value: If the Null Hypothesis were true, there would be an X% chance of seeing a relationship at least as strong as the one observed. Null Hypothesis: The linear coefficient (b) is equal to zero.
  32. Defend the Model v2. P-value (again): A p-value of X% lets us reject the Null Hypothesis at the (100 - X)% confidence level, so the correlation we're seeing is unlikely to be chance alone. R-Squared: Our model explains about Y% of the variability in Box Office; the rest shows up as points scattered off the regression line (both quantities come from the least-squares sums of squares).
  33. Defend the Model v3. Every Bitly click predicts about $240 in Box Office sales. I’m extremely confident (99%) that this is not due to chance. With ~96% confidence we can rely on this model in the future.
  34. The Model (cont.): Y “is predicted by” a + bX. Box Office = 0 + $240 * (# Bitly clicks); Box Office = 0 + $130 * (# tweets)
  35. Benchmarking: If my Bitly numbers drop below $240 per click; if my Twitter numbers drop below $130 per tweet; if my Instagram numbers drop below $2,809 per photo; if my Facebook numbers drop below $3,871 per post
  36. Other Considerations
  37. Other Considerations: Residuals; Other Regression types (Logarithmic, Exponential, Polynomial); “Overfitting”
  38. Additional Dimensions: DataSift Social Data: Gender, Income, Geography, “Influence”, Industry vs. Consumers
  39. Getting Started: tim.shea@datasift.com, @sheanineseven, http://bit.ly/DataSiftBigDataCamp
  40. Thanks for Listening!
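
Slide 27 asks whether social volume and box office move together before any model is built. A minimal sketch of that check is the Pearson correlation coefficient below, assuming per-movie totals are already available as two parallel lists; the function name and the sample figures are hypothetical placeholders, not DataSift data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of X and Y around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-movie totals, for illustration only.
    social_volume = [1_000, 2_500, 4_000, 8_000, 12_000]
    box_office = [300_000, 550_000, 1_000_000, 1_900_000, 2_800_000]
    print(f"r = {pearson_r(social_volume, box_office):.3f}")
```

A value near +1 suggests a strong positive linear association; on its own it says nothing about causality (slides 13 to 15).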
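
Slides 30 and 32 describe the model Y = a + bX, fitted by least squares and summarized with R-squared. The sketch below is an illustrative pure-Python version of ordinary least squares, not the Tableau trend-line implementation used in the talk; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in Y per unit of X
    a = mean_y - b * mean_x  # intercept: the fitted line passes through the means
    return a, b


def r_squared(xs, ys, a, b):
    """Fraction of Y's variance explained by the line: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [1, 3, 2, 4]
    a, b = fit_line(xs, ys)
    print(a, b, r_squared(xs, ys, a, b))
```

With the toy points (1, 1), (2, 3), (3, 2), (4, 4), the fit gives a = 0.5 and b = 0.8, and R-squared comes out at about 0.64, meaning the line explains roughly 64% of the variability in Y.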
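
Slide 31 frames the p-value around the Null Hypothesis that the slope b is zero. Statistics packages usually compute this with a t-test on the slope; as a dependency-free stand-in, the sketch below swaps in a permutation test, which estimates the same kind of probability by shuffling Y to simulate a world where X has no effect. The function names and the 10,000-permutation default are illustrative assumptions.

```python
import random


def slope(xs, ys):
    """Least-squares slope b of Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the Null Hypothesis b == 0.

    Shuffling Y breaks any link between X and Y, so the shuffled slopes
    show how large |b| gets by chance alone when the Null Hypothesis holds.
    """
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A small result (say below 0.01) means a slope this steep rarely appears when X and Y are unrelated, which is the "not due to chance" claim of slide 33.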
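
Slide 34 reads each fitted coefficient as dollars of box office per unit of social volume, with the offset set to 0. A minimal sketch, assuming a zero-intercept (regression-through-origin) fit is one way such a coefficient could be obtained; the $240 and $130 values come from the slide, while the function names and platform keys are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope b for Y = bX, with the intercept a fixed at 0."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Dollars of box office per unit of social volume, as quoted on slide 34.
DOLLARS_PER_UNIT = {"bitly_clicks": 240.0, "tweets": 130.0}


def predict_box_office(volume, platform):
    """Box Office = 0 + b * volume for the chosen platform."""
    return DOLLARS_PER_UNIT[platform] * volume
```

For example, `predict_box_office(50_000, "bitly_clicks")` returns 12000000.0, i.e. about $12MM predicted from 50,000 Bitly clicks.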
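
Slide 35 lists per-platform benchmarks without spelling out the comparison. One possible reading, assumed here, is that a film's box office divided by its volume on each platform is compared with that platform's benchmark, and platforms that fall short are flagged. The platform keys and the function name are hypothetical.

```python
# Benchmarks from slide 35: dollars of box office per unit of social volume.
BENCHMARKS = {
    "bitly": 240.0,
    "twitter": 130.0,
    "instagram": 2809.0,
    "facebook": 3871.0,
}


def below_benchmark(box_office, volumes):
    """Return {platform: dollars_per_unit} for platforms under their benchmark.

    Assumes `volumes` maps the hypothetical platform keys above to raw
    counts (clicks, tweets, photos, posts) for one film.
    """
    flagged = {}
    for platform, volume in volumes.items():
        if volume <= 0:
            continue  # no activity on this platform, so no ratio to compare
        per_unit = box_office / volume
        if per_unit < BENCHMARKS[platform]:
            flagged[platform] = per_unit
    return flagged
```

For a film earning $1,000,000 with 5,000 Bitly clicks and 5,000 tweets, only Bitly is flagged ($200 per click against a $240 benchmark); Twitter's $200 per tweet clears its $130 benchmark.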
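
Slide 37 names residuals and alternative regression forms as follow-up checks. The sketch below computes residuals (observed minus predicted) for a linear fit and fits a logarithmic model Y = a + b·ln(X) by regressing Y on ln(X). It is an illustrative assumption rather than the method shown in the talk, the function names are hypothetical, and it does not by itself guard against overfitting.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(xs, ys, a, b):
    """Observed minus predicted Y; a visible pattern hints the line has the wrong shape."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def fit_log(xs, ys):
    """Fit Y = a + b * ln(X) by running ordinary least squares on ln(X); needs X > 0."""
    return fit_line([math.log(x) for x in xs], ys)
```

Plotting the residuals against X (for instance in Tableau) is a quick way to see whether a curve such as the logarithmic form would describe the data better than a straight line.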
