Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyzing - StampedeCon 2016

The collection and use of Big Data has become an important part of modern business practice. The Internet of Things (IoT) movement promises to provide new opportunities for businesses interested in the intersection of people and technology. It is also fraught with pitfalls for practitioners and researchers who struggle to make sense of an increasing cacophony of signals. How should they poll and collect data from millions of signals in a way that is manageable, scalable, and statistically valid? How should they analyze and predict using these data? This presentation will discuss these challenges with applied examples from monitoring and managing one of the world’s largest computers.

Published in: Technology

  1. 1. Things we will cover (26-Jul-16, presented by Ryan Kirk at StampedeCon 2016) GOAL: Explain Cloud IoT, its challenges, and a principled, agile approach to prediction amidst uncertainty in such a way that people from a broad audience can (hopefully) relate. WILL cover: ► IoT, Cloud landscape, and CTL ► Prediction Lifecycle ► Challenges by business domain ► Data Science Lessons Learned WILL NOT cover: ► Big Data ► Architecture ► Algorithms ► Technology
  2. 2. WHO WE ARE
  3. 3. Who I am: I am interested in creating intelligent systems by incorporating humans and machines in an active learning loop. ► Decision Scientist with PhD in HCI from Iowa State ► Principal Data Scientist for CenturyLink Cloud ► Curricular Design, Educational Technology, Online Advertising, Online Retail, Big Data UX, Cloud, IoT, Physics ► Hiking, Data journalism, Stocks, Horse Racing ► ryankirk.info
  4. 4. Who we are: CenturyLink Cloud: CLOUD + COLOCATION + NETWORK + MANAGED SERVICES
  5. 5. What is IoT: Human desire to connect ourselves to each other via technology ► Modern plumbing… ► Telegraph → Telephone ► Telephone → Dial-up ► Dial-up → HSN ► HSN → WAN ► WAN → IoT. Human desire to connect ourselves to each other via technology to empower each other
  6. 6. Internet growth > Hardware growth (image sources: motherboard.vice.com, newscientist.com)
  7. 7. CenturyLink Cloud IoT Advantage ► 37 states ► 550,000 miles of network ► Innovative Gigabit fiber network ► 25MM+ consumer endpoints ► 60+ DCS
  8. 8. PROBLEM STATEMENT
  9. 9. Problem statement: ► Prevent incidents through early detection ► Reduce MTTR by facilitating root-cause analytics ► Facilitate domain experts and harvest their knowledge
  10. 10. GOAL: Build a real-time artificial intelligence capable of analyzing all incoming streams of data in order to know which actions our machines need to automatically take. It’s simple, really… build Skynet
  11. 11. PREDICTION LANDSCAPE
  12. 12. Prediction Adoption Model: Stage I: INTRODUCTION 1. Design 2. Measure; Stage II: GROWTH 3. Describe 4. Detect; Stage III: MATURITY 5. Predict 6. Act; Stage IV: DECLINE 7. Feedback 8. Obsolescence. Chart axes: TIME and SOPHISTICATION; phases: INTRO, GROWTH, MATURITY, DECLINE
  13. 13. Prediction Adoption Model (actual): Stage I: CHECK THIS OUT 1. It runs 2. Results are promising; Stage II: OH NO, OH NO, OH NO! 3. It works but it’s terrible 4. It will never scale; Stage III: HAHA, IT WORKED! 5. I surprise myself sometimes 6. I found a shortcut to scale it; Stage IV: I NEVER SAID IT WOULD… 7. How do I prove it is still working? 8. There is no way to apply it to this scenario. Chart axes: TIME and SOPHISTICATION
  14. 14. Stage I: INTRODUCTION 1. Design ► What should we measure? ► What are the core business processes? ► What is the unit of analysis? ► What are our research questions/hypotheses? 2. Measure ► Do we push or pull? ► How often should we measure? ► How long do we need the data? ► How do we represent the data schema?
  15. 15. Stage II: GROWTH 3. Describe ► Which metrics relate to our outcomes of interest? ► What is the typical value of each metric? ► How do you visualize each metric? 4. Detect ► What do we expect to happen? ► Which values/events are unexpected? ► When should we alert? ► How will we scale our analysis?
  16. 16. Stage III: MATURITY 5. Predict ► Are there patterns? ► Are there more complex relationships? ► What is going to happen? ► How do we get training data? 6. Act ► What actions should we take? ► How can we incorporate new outcomes into the current model?
  17. 17. Stage IV: DECLINE 7. Feedback ► Is my model primarily basing its decisions upon its previous decisions? ► Can I separate the model from its parameters? ► Can I still evaluate accuracy? 8. Obsolescence ► Are my business scenarios still grounded? ► Do my model assumptions still hold? ► Does it still scale? ► Is the intervention still needed?
  18. 18. Domain process involvement: BUSINESS ► Is involved early in defining requirements; ENGINEERING ► Builds MVP ► Solidifies solution; RESEARCH ► Builds prototype and suggests solution
  19. 19. SOLUTION
  20. 20. Working backwards. ITEM: 1 Skynet; 2 Action mapping; 3 Action landscape; 4 Prediction; 5 Categorical learning; 6 Training Data; 7 Feedback loop; 8 High SNR; 9 Unsupervised learning; 10 Anomaly Detection; 11 Normalization; 12 Retention; 13 Sampling; 14 Collection; 15 Approach; 16 Domain model. “In life, unless you’re more gifted than Einstein, inversion [i.e. working backwards] will help you solve problems.” Charlie Munger
  21. 21. Working backwards (cont.). ITEM (STAGE): 1 Skynet (ACT); 2 Action mapping (ACT); 3 Action landscape (ACT); 4 Prediction (PREDICT); 5 Categorical learning (PREDICT); 6 Training Data (PREDICT); 7 Feedback loop (PREDICT); 8 High SNR (DETECT); 9 Unsupervised learning (DETECT); 10 Anomaly Detection (DETECT); 11 Normalization (DESCRIBE); 12 Retention (DESCRIBE); 13 Sampling (MEASURE); 14 Collection (MEASURE); 15 Approach (DESIGN); 16 Domain model (DESIGN). Chart axes: TIME and SOPHISTICATION; phases: INTRO, GROWTH, MATURITY, DECLINE
  22. 22. Working backwards (cont.). ITEM (STAGE, PRIMARY DOMAIN): 1 Skynet (ACT, ENGINEERING); 2 Action mapping (ACT, BUSINESS); 3 Action landscape (ACT, RESEARCH); 4 Prediction (PREDICT, RESEARCH); 5 Categorical learning (PREDICT, RESEARCH); 6 Training Data (PREDICT, ENGINEERING); 7 Feedback loop (PREDICT, BUSINESS); 8 High SNR (DETECT, RESEARCH); 9 Unsupervised learning (DETECT, RESEARCH); 10 Anomaly Detection (DETECT, RESEARCH); 11 Normalization (DESCRIBE, RESEARCH); 12 Retention (DESCRIBE, ENGINEERING); 13 Sampling (MEASURE, RESEARCH); 14 Collection (MEASURE, ENGINEERING); 15 Approach (DESIGN, RESEARCH); 16 Domain model (DESIGN, BUSINESS)
  23. 23. This is a WIP. ITEM (STAGE, PRIMARY DOMAIN): 1 Skynet (ACT, ENGINEERING); 2 Action mapping (ACT, BUSINESS); 3 Action landscape (ACT, RESEARCH); 4 Prediction (PREDICT, RESEARCH); 5 Categorical learning (PREDICT, RESEARCH); 6 Training Data (PREDICT, ENGINEERING); 7 Feedback loop (PREDICT, BUSINESS); 8 High SNR (DETECT, RESEARCH); 9 Unsupervised learning (DETECT, RESEARCH); 10 Anomaly Detection (DETECT, RESEARCH); 11 Normalization (DESCRIBE, RESEARCH); 12 Sampling (MEASURE, RESEARCH); 13 Collection (MEASURE, ENGINEERING); 14 Domain model (DESIGN, BUSINESS). Status legend: QUEUED (StampedeCon 2017?), WORKING, PRODUCTION
  24. 24. LESSONS LEARNED
  25. 25. 16. DOMAIN MODEL (stage: DESIGN) ► 938,076 metrics ► Verify the unique stream of data across systems ► Key-based
  26. 26. 15. APPROACH (stage: DESIGN, cont.) VARIABILITY ► Changes in observed state ► Plan for variability; UNCERTAINTY ► Unobserved state(s) ► Design for uncertainty
  27. 27. 14. COLLECTION (stage: MEASURE) ► Agreement of signals ► Cacophony of signals ► How often should we measure? ► We have no labeled training data ► An approach we can build upon in the future
  28. 28. 13. SAMPLING (stage: MEASURE, cont.) Shannon-Nyquist Paradox ► The more you measure something, the more it varies ► Bias related to time and variability ► E.g., "temperature yesterday was 68 degrees"
  29. 29. 12. RETENTION (stage: DESCRIBE) ► Recall that precision relates to sampling consistency ► Not all metrics are created equal ► Coverage remains problematic
  30. 30. 11. NORMALIZATION (stage: DESCRIBE, cont.) Simpson’s Paradox ► aggregate trend != sum of individual trends ► Applies to all aggregates: sums, averages, correlations, etc. ► What is the unit of analysis? Source: Kievit, R.A., Frankenhuis, et al. (2013). Simpson’s paradox in psychological science. Frontiers in Psychology.
  31. 31. 10. ANOMALY DETECTION (stage: DETECT) ► Capture the time series data for each piece of connected platform technology ► Find implicit anomalies within a time series vector ► Values that are surprising ► Highly scalable. Chart (marked CenturyLink Confidential) with series: Predicted, Actual, Boundary
  32. 32. 9. UNSUPERVISED LEARNING and 8. HIGH SNR (stage: DETECT, cont.) ► Time series data shows the context behind anomalies that co-occur ► Group anomalous vectors based upon structural properties and co-occurrence ► Up-level anomalies into higher-order alerts using contextual information
  33. 33. 7. FEEDBACK LOOP (stage: PREDICT) ► We have also built a search engine for time series data that allows us to build cool-looking graphs in real time ► We basically do all of this to empower Slack alerts ► Allows tags to propagate forwards
  34. 34. 6. TRAINING DATA (stage: PREDICT, cont.) ► Evaluate ALL assumptions regarding training data ► Ideally use an active learning approach or risk becoming tautological
  35. 35. RESULTS
  36. 36. Prediction Results ► 38,392,438 predictions every 24 hr ► Anomaly rate < 0.01% (0.0001), ~3K anomalies/day ► Accuracy is ~90% ► Prediction latency ~3.0 seconds ► ~30 higher-order alerts/day
  37. 37. Want to join me? Let’s connect: ► @ryan_kirk Try CenturyLink Cloud free: ► ctl.io We are hiring: ► ctl.io/careers/jobs Thanks to: ► StampedeCon2016 ► pixabay.com
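
The anomaly detection slide (10. ANOMALY DETECTION) shows a chart with Predicted, Actual, and Boundary series but does not say which model produces them. The following is only a minimal sketch of that idea, assuming a trailing-window mean as the prediction and a multiple of the trailing standard deviation as the boundary. The function name `detect_anomalies`, the `window` and `k` parameters, and the sample series are all hypothetical and are not taken from the talk.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices of values that fall outside predicted +/- boundary.

    Assumption (not from the talk): the prediction is the mean of the
    previous `window` values and the boundary is k standard deviations
    of those same values.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        predicted = mean(history)
        boundary = k * stdev(history)
        # A value is "surprising" when it lands outside the boundary.
        if abs(values[i] - predicted) > boundary:
            anomalies.append(i)
    return anomalies


# Invented sample metric with one obvious spike at index 6.
series = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8, 10.2]
print(detect_anomalies(series))
```

Because each point only needs its own trailing window, a check like this can run independently per metric, which is one plausible reading of the slide's "highly scalable" claim.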

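The normalization slide (11. NORMALIZATION) states that an aggregate trend need not match the trends of its parts (Simpson's paradox). As a small illustration with invented numbers, the sketch below fits a least-squares slope to two hypothetical groups of (x, y) observations separately and then to the pooled data. The helper `slope` and the groups `group_a` and `group_b` are assumptions made for this example only.

```python
def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# Invented data: within each group, y falls as x rises.
group_a = [(1, 5), (2, 4), (3, 3)]
group_b = [(6, 10), (7, 9), (8, 8)]

print(slope(group_a))            # trend inside group A
print(slope(group_b))            # trend inside group B
print(slope(group_a + group_b))  # trend after pooling both groups
```

Each group has a negative slope, yet the pooled slope is positive because group B sits higher on both axes. This is why the slide asks what the unit of analysis is before aggregating.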