This document summarizes Ryan Kirk's presentation at StampedeCon 2016 about predicting outcomes in cloud IoT environments. The presentation covers IoT and cloud computing landscapes, challenges with prediction in different business domains, and lessons learned from data science projects. It discusses stages of a prediction lifecycle model and how different domains like business, engineering and research are involved in each stage. Key challenges and solutions addressed include developing a domain model, approaches for handling variability and uncertainty, techniques for anomaly detection, and the importance of feedback loops and training data evaluation.
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyzing - StampedeCon 2016
2. Things we will cover
26-Jul-16, presented by Ryan Kirk at StampedeCon 2016
GOAL
Explain Cloud IoT, its challenges, and a principled, agile approach to prediction amidst uncertainty in such a way that people from a broad audience can (hopefully) relate.
WILL
► IoT, Cloud landscape, and CTL
► Prediction Lifecycle
► Challenges by business domain
► Data Science Lessons Learned
WILL NOT
► Big Data
► Architecture
► Algorithms
► Technology
4. Who I am
I am interested in creating intelligent systems through incorporating humans and machines in an active learning loop.
► Decision Scientist with PhD in HCI from Iowa State
► Principal Data Scientist for CenturyLink Cloud
► Curricular Design, Educational Technology, Online Advertising, Online Retail, Big Data UX, Cloud, IoT, Physics
► Hiking, Data journalism, Stocks, Horse Racing
ryankirk.info
5. Who we are: CenturyLink Cloud
CLOUD + COLOCATION + NETWORK + MANAGED SERVICES
6. What is IoT
Human desire to connect ourselves to each other via technology
► Modern plumbing…
► Telegraph → Telephone
► Telephone → Dial-up
► Dial-up → HSN
► HSN → WAN
► WAN → IoT
Human desire to connect ourselves to each other via technology to empower each other
7. Internet growth > Hardware growth
motherboard.vice.com
newscientist.com
8. CenturyLink Cloud IoT Advantage
► 37 states
► 550,000 miles of network
► Innovative Gigabit fiber network
► 25MM+ consumer endpoints
► 60+ DCs
10. Problem statement:
► Prevent incidents through early detection
► Reduce MTTR by facilitating root-cause analytics
► Facilitate domain experts and harvest their knowledge
11. GOAL
Build a real-time artificial intelligence capable of analyzing all incoming streams of data in order to know which actions our machines need to automatically take.
It’s simple, really… build Skynet
13. Prediction Adoption Model
Stage I: INTRODUCTION
1. Design
2. Measure
Stage II: GROWTH
3. Describe
4. Detect
Stage III: MATURITY
5. Predict
6. Act
Stage IV: DECLINE
7. Feedback
8. Obsolescence
(chart: sophistication rising over time through INTRO, GROWTH, MATURITY, then DECLINE)
14. Prediction Adoption Model (actual)
(chart: the same sophistication-vs.-time curve, relabeled per stage)
Stage I: CHECK THIS OUT
1. It runs
2. Results are promising
Stage II: OH NO, OH NO, OH NO!
3. It works but it’s terrible
4. It will never scale
Stage III: HAHA, IT WORKED!
5. I surprise myself sometimes
6. I found a shortcut to scale it
Stage IV: I NEVER SAID IT WOULD…
7. How do I prove it is still working?
8. There is no way to apply it to this scenario
15. Stage I: INTRODUCTION
1. Design
► What should we measure?
► What are the core business processes?
► What is the unit of analysis?
► What are our research questions/hypotheses?
2. Measure
► Do we push or pull?
► How often should we measure?
► How long do we need the data?
► How do we represent the data schema?
16. Stage II: GROWTH
3. Describe
► Which metrics relate to our outcomes of interest?
► What is the typical value of each metric?
► How do you visualize each metric?
4. Detect
► What do we expect to happen?
► Which values/events are unexpected?
► When should we alert?
► How will we scale our analysis?
17. Stage III: MATURITY
5. Predict
► Are there patterns?
► Are there more complex relationships?
► What is going to happen?
► How do we get training data?
6. Act
► What actions should we take?
► How can we incorporate new outcomes into the current model?
18. Stage IV: DECLINE
7. Feedback
► Is my model primarily basing its decisions upon its previous decisions?
► Can I separate the model from its parameters?
► Can I still evaluate accuracy?
8. Obsolescence
► Are my business scenarios still grounded?
► Do my model assumptions still hold?
► Does it still scale?
► Is the intervention still needed?
19. Domain process involvement
BUSINESS
► Is involved early in defining requirements
ENGINEERING
► Builds MVP
► Solidifies solution
RESEARCH
► Builds prototype and suggests solution
21. Working backwards
ITEM
1 Skynet
2 Action mapping
3 Action landscape
4 Prediction
5 Categorical learning
6 Training Data
7 Feedback loop
8 High SNR
9 Unsupervised learning
10 Anomaly Detection
11 Normalization
12 Retention
13 Sampling
14 Collection
15 Approach
16 Domain model
“In life, unless you’re more gifted than Einstein, inversion [i.e. working backwards] will help you solve problems.”
Charlie Munger
22. Working backwards (cont.)
ITEM STAGE
1 Skynet ACT
2 Action mapping ACT
3 Action landscape ACT
4 Prediction PREDICT
5 Categorical learning PREDICT
6 Training Data PREDICT
7 Feedback loop PREDICT
8 High SNR DETECT
9 Unsupervised learning DETECT
10 Anomaly Detection DETECT
11 Normalization DESCRIBE
12 Retention DESCRIBE
13 Sampling MEASURE
14 Collection MEASURE
15 Approach DESIGN
16 Domain model DESIGN
23. Working backwards (cont.)
ITEM STAGE PRIMARY DOMAIN
1 Skynet ACT ENGINEERING
2 Action mapping ACT BUSINESS
3 Action landscape ACT RESEARCH
4 Prediction PREDICT RESEARCH
5 Categorical learning PREDICT RESEARCH
6 Training Data PREDICT ENGINEERING
7 Feedback loop PREDICT BUSINESS
8 High SNR DETECT RESEARCH
9 Unsupervised learning DETECT RESEARCH
10 Anomaly Detection DETECT RESEARCH
11 Normalization DESCRIBE RESEARCH
12 Retention DESCRIBE ENGINEERING
13 Sampling MEASURE RESEARCH
14 Collection MEASURE ENGINEERING
15 Approach DESIGN RESEARCH
16 Domain model DESIGN BUSINESS
24. This is a WIP
ITEM STAGE PRIMARY DOMAIN
1 Skynet ACT ENGINEERING
2 Action mapping ACT BUSINESS
3 Action landscape ACT RESEARCH
4 Prediction PREDICT RESEARCH
5 Categorical learning PREDICT RESEARCH
6 Training Data PREDICT ENGINEERING
7 Feedback loop PREDICT BUSINESS
8 High SNR DETECT RESEARCH
9 Unsupervised learning DETECT RESEARCH
10 Anomaly Detection DETECT RESEARCH
11 Normalization DESCRIBE RESEARCH
12 Sampling MEASURE RESEARCH
13 Collection MEASURE ENGINEERING
14 Domain model DESIGN BUSINESS
Status: PRODUCTION / WORKING / QUEUED (StampedeCon 2017?)
26. 16. DOMAIN MODEL
► 938,076 metrics
► Verify the unique stream of data across systems
► Key-based
DESIGN
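As an illustration of what "key-based" might mean in practice, each metric stream can be identified by one composite, hashable key that stays stable across systems. The field names below are invented for illustration, not CenturyLink's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricKey:
    """One unique, hashable key per metric stream (hypothetical fields)."""
    datacenter: str
    host: str
    service: str
    metric: str

    def __str__(self):
        return ".".join((self.datacenter, self.host, self.service, self.metric))

key = MetricKey("dc1", "host42", "hypervisor", "cpu_percent")
print(key)  # dc1.host42.hypervisor.cpu_percent
```

Because the key is frozen and hashable, the same identifier can index per-stream state across the collection, retention, and detection stages.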
27. 15. APPROACH
VARIABILITY
► Changes in observed state
► Plan for variability
UNCERTAINTY
► Unobserved state(s)
► Design for uncertainty
DESIGN (cont.)
28. 14. COLLECTION
► Agreement of signals
► Cacophony of signals
► How often should we measure?
► We have no labeled training data
► An approach we can build upon in the future
MEASURE
29. 13. SAMPLING
Shannon-Nyquist Paradox
► The more often you measure something, the more it varies
► Bias related to time and variability
► E.g., “the temperature yesterday was 68 degrees” depends on when and how often it was sampled
MEASURE (cont.)
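The "how often should we measure?" question has a classical floor: the Nyquist-Shannon sampling theorem says a signal must be sampled at more than twice its highest frequency, or it aliases into a different-looking signal. A minimal sketch of the effect (my own illustration, not from the talk):

```python
import numpy as np

def sample(signal_hz, sample_hz, seconds=1.0):
    """Sample a sine wave of signal_hz at sample_hz samples per second."""
    n = int(sample_hz * seconds)
    t = np.arange(n) / sample_hz
    return np.sin(2 * np.pi * signal_hz * t)

faithful = sample(10, 100)  # 100 Hz is well above 2 x 10 Hz: recoverable
aliased = sample(10, 12)    # 12 Hz is below the Nyquist rate of 20 Hz

# The undersampled 10 Hz wave is indistinguishable from a 2 Hz wave
# (sign-flipped), because 10 Hz aliases to 10 - 12 = -2 Hz.
assert np.allclose(aliased, -sample(2, 12))
```

For metrics, the same bias appears in reverse: sample a volatile signal too slowly and you will report a smooth, misleading summary of it.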
30. 12. RETENTION
► Recall that precision relates to sampling consistency
► Not all metrics are created equal
► Coverage remains problematic
DESCRIBE
31. 11. NORMALIZATION
Kievit, R. A., Frankenhuis, W. E., et al. (2013). Simpson’s paradox in psychological science. Frontiers in Psychology.
Simpson’s Paradox
► aggregate trend != sum of individual trends
► Applies to all aggregates: sums, averages, correlations, etc.
► What is the unit of analysis?
DESCRIBE (cont.)
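Simpson's paradox is easy to reproduce with toy counts. In the classic pattern below (illustrative numbers, not from the talk), variant A wins inside each subgroup yet loses overall, because the group sizes are skewed:

```python
# (successes, trials) per variant, per subgroup; sizes deliberately skewed.
groups = {
    "small": {"A": (81, 87),   "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# A beats B inside every subgroup...
for arms in groups.values():
    assert rate(*arms["A"]) > rate(*arms["B"])

# ...yet B beats A in the aggregate: 273/350 = 78% vs. 289/350 = 83%.
total_a = (81 + 192, 87 + 263)
total_b = (234 + 55, 270 + 80)
assert rate(*total_a) < rate(*total_b)
```

This is why the unit-of-analysis question matters: any aggregate can point the opposite way from every one of its subgroups.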
32. (figure: predicted values vs. actual boundary)
10. ANOMALY DETECTION
► Capture the time series data for each piece of connected platform technology
► Find implicit anomalies within a time series vector
► Values that are surprising
► Highly scalable
DETECT
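The talk does not name its detection algorithm; one common, highly scalable baseline for flagging "values that are surprising" is a rolling z-score computed independently per stream. A sketch under that assumption, not the production system:

```python
import numpy as np

def zscore_anomalies(series, window=30, threshold=3.0):
    """Flag indices whose value sits more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = history.mean(), history.std()
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

rng = np.random.default_rng(0)
cpu = rng.normal(50, 2, size=200)  # a steady metric stream
cpu[150] = 90                      # inject one spike
flags = zscore_anomalies(cpu)      # index 150 is among the flagged points
```

Because each stream needs only its own trailing window of state, this style of detector parallelizes trivially across hundreds of thousands of metrics.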
33. 9. UNSUPERVISED LEARNING + 8. HIGH SNR
DETECT (cont.)
► Time series data shows the context behind anomalies that co-occur
► Group anomalous vectors based upon structural properties and co-occurrence
► Up-level anomalies into higher-order alerts using contextual information
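One simple way to realize the co-occurrence idea: bucket anomaly events whose timestamps fall within a short window of each other, and treat each bucket as a candidate higher-order alert. The event format (timestamp, metric name) and the window size are invented for illustration:

```python
def group_by_cooccurrence(anomalies, window_s=60):
    """Group (timestamp, metric) events that occur within window_s
    seconds of the previous event in the same group."""
    events = sorted(anomalies)
    groups, current = [], [events[0]]
    for event in events[1:]:
        if event[0] - current[-1][0] <= window_s:
            current.append(event)
        else:
            groups.append(current)
            current = [event]
    groups.append(current)
    return groups

events = [(0, "cpu"), (12, "disk_io"), (15, "net"), (500, "cpu")]
groups = group_by_cooccurrence(events)
# two groups: the first three events co-occur; the last stands alone
```

Condensing many raw anomalies into a few contextual groups is what raises the signal-to-noise ratio of the alerts humans actually see.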
34. 7. FEEDBACK LOOP
PREDICT
► We have also built a search engine for time series data that allows us to build cool-looking graphs in real time
► We basically do all of this to empower Slack alerts
► Allows tags to propagate forwards
35. 6. TRAINING DATA
► Evaluate ALL assumptions with regard to training data
► Ideally use an active learning approach, or risk becoming tautological
PREDICT (cont.)
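A minimal uncertainty-sampling loop, the usual shape of active learning: ask the expert to label only the points the model is least confident about, then refit. The toy 1-D model and oracle below are invented for illustration:

```python
class ThresholdModel:
    """Toy 1-D classifier: predicts 1 when x > cut."""
    def __init__(self, cut=0.5):
        self.cut = cut

    def confidence(self, x):
        return abs(x - self.cut)  # far from the boundary = confident

    def fit(self, labeled):
        ones = [x for x, y in labeled if y == 1]
        zeros = [x for x, y in labeled if y == 0]
        if ones and zeros:  # place the cut midway between the classes
            self.cut = (max(zeros) + min(ones)) / 2

def active_learning_round(model, pool, oracle, labeled, budget=3):
    """Query the `budget` least-confident points, then refit."""
    queries = sorted(pool, key=model.confidence)[:budget]
    labeled += [(x, oracle(x)) for x in queries]
    model.fit(labeled)
    return [x for x in pool if x not in queries]

oracle = lambda x: int(x > 0.7)  # hidden ground truth the expert knows
pool, labeled = [0.1, 0.4, 0.55, 0.6, 0.72, 0.9], []
model = ThresholdModel()
for _ in range(2):
    pool = active_learning_round(model, pool, oracle, labeled)
# model.cut has moved from 0.5 toward the true boundary at 0.7
```

The safeguard against tautology is structural: every label comes from an outside oracle, never from the model's own previous decisions.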
37. Prediction Results
► 38,392,438 predictions every 24hr.
► Anomaly rate < 0.01% (0.0001): ~3K anomalies/day
► Accuracy is ~90%
► Prediction latency ~3.0 seconds
► ~30 Higher order alerts/day
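These headline figures are mutually consistent; a quick arithmetic check (my own, not from the deck):

```python
predictions_per_day = 38_392_438
anomaly_rate_cap = 0.0001  # the slide's "< 0.01%" upper bound

# ~3.8K/day upper bound, in line with the quoted "~3K anomalies/day",
# which ~30 higher-order alerts then condense by roughly 100x.
anomalies_cap = predictions_per_day * anomaly_rate_cap
print(round(anomalies_cap))
```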
38. Want to join me?
Let’s connect:
► @ryan_kirk
Try CenturyLink Cloud free:
► ctl.io
We are hiring
► ctl.io/careers/jobs
Thanks to:
► StampedeCon2016
► pixabay.com