There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grows, partly as a result of the expanding IoT, it is critical to develop techniques that better extract actionable insights.
One current application, anomaly detection, is a necessary but insufficient step: anomaly detection over a set of live data streams may result in anomaly fatigue, which limits effective decision making. One way to address this is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps surface actionable insights faster.
In this talk, we explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
Topics include:
* An overview of correlation analysis
* Robust correlation analysis
* Overview of alternative measures, such as co-median
* Trade-offs between speed and accuracy
* Correlation analysis in large dimensions
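The second approach in the abstract (detect anomalies per stream, then use correlation to suppress false positives) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method presented in the talk: the median/MAD detector, the `min_corr` and `tolerance` parameters, and the stream names are all hypothetical choices made for the example.

```python
import math
import statistics


def pearson(a, b):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    if s_aa == 0 or s_bb == 0:
        return 0.0
    return s_ab / math.sqrt(s_aa * s_bb)


def flag_anomalies(series, threshold=3.5):
    """Indices whose median/MAD-based robust z-score exceeds the threshold."""
    med = statistics.median(series)
    mad = statistics.median(abs(v - med) for v in series)
    if mad == 0:
        return set()
    return {i for i, v in enumerate(series)
            if 0.6745 * abs(v - med) / mad > threshold}


def corroborated_anomalies(streams, min_corr=0.8, tolerance=1):
    """Keep an anomaly only if a strongly correlated stream flags one nearby."""
    flags = {name: flag_anomalies(vals) for name, vals in streams.items()}
    kept = {}
    for name, vals in streams.items():
        peers = []
        for other, other_vals in streams.items():
            if other == name:
                continue
            # Correlate only the points that neither stream flagged (assumption).
            skip = flags[name] | flags[other]
            clean = [i for i in range(len(vals)) if i not in skip]
            if len(clean) < 2:
                continue
            r = pearson([vals[i] for i in clean],
                        [other_vals[i] for i in clean])
            if abs(r) >= min_corr:
                peers.append(other)
        kept[name] = sorted(
            i for i in flags[name]
            if any(abs(i - j) <= tolerance for p in peers for j in flags[p])
        )
    return kept


# Hypothetical streams: cpu and latency move together, disk does not.
cpu = [float(t) for t in range(50)]
latency = [2.0 * t + 5.0 for t in range(50)]
disk = [float((7 * t) % 11) for t in range(50)]
cpu[10] += 100.0       # spike that no correlated stream corroborates
disk[10] += 100.0      # disk spikes too, but disk is not correlated with cpu
cpu[30] += 100.0       # spike that latency echoes one step later
latency[31] += 200.0

result = corroborated_anomalies({"cpu": cpu, "latency": latency, "disk": disk})
print(result)
```

In this toy setup the spike at index 10 is dropped for both cpu and disk, because the only stream that spikes alongside each of them is uncorrelated with it; that is the kind of false-positive reduction the abstract alludes to.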
3. MOBILE DATA TRAFFIC
17% of total IP traffic by 2021
VR/AR Traffic
CAGR of 82% from 2016-2021
ANNUAL GLOBAL IP TRAFFIC
3.3 ZB by 2021
BROADBAND SPEEDS
Reach 53 Mbps by 2021
INTERNET VIDEO SURVEILLANCE
3.4% of all Internet video traffic
by 2021
LIVE INTERNET VIDEO
13% of Internet video traffic
by 2021
THE NUMBERS
https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html
4.
DATA FUSION
CHARACTERISTICS
DISTRIBUTED
HETEROGENEOUS
NON-LINEAR
INTERNET OF THINGS
Sensors and Actuators
SMART CITIES
SMART HEALTH CARE
POLLUTION MONITORING
INSIGHTS
PROBABILISTIC METHODS
THEORY OF EVIDENCE
MACHINE LEARNING
FUZZY LOGIC
CHALLENGES
DATA: INCONSISTENCY
MISSINGNESS, NOISE
LATENCY
BATTERY CONSUMPTION
INTELLIGENT DECISION MAKING
10. Common representation learning
Example: Across audio and video modalities
in a single live stream
Multimodal
Linear/non-linear correlation
Example: Across univariate time series of
operations data
Unimodal
CORRELATION ANALYSIS
FLAVORS: AT A HIGH LEVEL
11. CROSS-MODAL ANALYSIS
Joint Representation of Text-
Audio-Video
INTERPRETABILITY
Comparative analysis of deep
representations
AUTHENTICITY
Live streams vs. Recording
OBJECT IDENTIFICATION
Multiple cameras
MULTI-STREAM
SYNCHRONIZATION
Different vantage points
Unreliable timestamps
LOCALIZATION
Video ⟷ Audio
Audio ⟷ Text
CORRELATION ANALYSIS
WHY BOTHER?
13. CORRELATION ANALYSIS
A REAL LIFE EXAMPLE
Root cause analysis
Expose investment avenues
Surface optimization opportunities
Risk minimization
Medical diagnosis
Learning
15.
Symmetric
Lack of context
Spurious correlations
Non-actionable
Network topology
CORRELATION MATRIX
Scalability
Hundreds of millions of time series
Use cases
Multiple Regression
Discriminant Analysis
Mahalanobis Distance
16. CORRELATION MATRIX
* Figure borrowed from [Mueen et al. 2010]
350 × 350 correlation matrix
Thresholded (= 0.5) correlation matrix
20. STOCHASTICITY
Why bother? Multiple flavors
If X, Y are independent, then 𝜌(X, Y) = 0
However, the converse is not true, as only
the first two moments are considered
e.g., 𝜌(X, Y) = 0 even when Y = X² (for X symmetric about zero)
Not invariant under nonlinear strictly
increasing transformations
Pearson’s correlation only recognizes linear
dependence
var(X) and var(Y) have to be finite
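The two limitations above (zero correlation despite full dependence, and lack of invariance under strictly increasing nonlinear transformations) can be checked numerically. A minimal sketch, assuming a symmetric grid of X values chosen purely for illustration:

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# X on a grid that is symmetric about zero (illustrative choice).
xs = [i / 10 for i in range(-50, 51)]

# Y = X^2 is fully determined by X, yet the linear correlation vanishes.
rho_square = pearson(xs, [x * x for x in xs])

# exp is strictly increasing, yet the correlation with X is well below 1.
rho_exp = pearson(xs, [math.exp(x) for x in xs])

print(rho_square, rho_exp)
```

A rank-based measure would report perfect agreement for the exponential case, which is one motivation for looking beyond Pearson's coefficient.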
Non-stationary random processes
Stochastic correlation process
Applications
Finance (Brownian motion), biology
Time-varying correlation
Rolling correlation - lagged indicator
Dynamic Conditional Correlation [Engle’02]
Local Correlation [Langnau’09]
Wishart autoregressive process [Gourieroux’09]
Transformations [van Emmerich’06]
arctan and
Modified Jacobi process [Ma’09]
tanh transformation [Teng’16]
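Of the time-varying approaches listed above, rolling correlation is the simplest to illustrate. The sketch below assumes a fixed-length trailing window (the `window` parameter is an illustrative choice) and shows why it behaves as a lagged indicator: the estimate only fully reflects a change in the relationship once the window has moved past the change point.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def rolling_correlation(xs, ys, window):
    """Correlation over each trailing window of the given length."""
    return [
        pearson(xs[end - window:end], ys[end - window:end])
        for end in range(window, len(xs) + 1)
    ]


# The relationship flips sign at t = 10 (synthetic data).
xs = [float(t) for t in range(20)]
ys = [float(t) if t < 10 else float(-t) for t in range(20)]

rolling = rolling_correlation(xs, ys, window=5)
for end, r in enumerate(rolling, start=5):
    print(f"window ending at t={end - 1}: r={r:+.2f}")
```

Windows that straddle t = 10 report intermediate values, so the sign flip is only fully visible several steps after it occurred.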
22. SPURIOUS CORRELATION
* Figure borrowed from [Anscombe 1973]
WHY CONTEXT IS IMPORTANT?
Identical r=0.816
Clearly spurious
Linear correlation is perhaps not the right metric
Identical summary statistics
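The point of the figure can be reproduced from the four datasets published in [Anscombe 1973]. The sketch below recomputes r for each; the dataset labels I to IV are just dictionary keys chosen for this example.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Anscombe's quartet: datasets I to III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson(x, y) for name, (x, y) in QUARTET.items()}
for name, r in correlations.items():
    print(f"dataset {name}: r = {r:.4f}")
```

All four coefficients agree to about three decimal places even though only the first dataset looks like a noisy linear relationship when plotted.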
23.
ROBUSTNESS
r = 0.8 (bivariate normal distribution) vs. r = 0.2 (contaminated normal distribution)
r is highly sensitive to a slight change, as measured by the Kolmogorov distance, in one of the marginal distributions
PEARSON CORRELATION
* Figure borrowed from “Introduction to Robust Estimation and Hypothesis Testing” by Rand R. Wilcox
Influence Function (IF) of Pearson’s Correlation
Unbounded
Pearson’s correlation does not have infinitesimal
robustness
[Figure: surface plots with axes x, y, z]
Recall, first order approximation of sample IF of r is:
r_{-i}: correlation coefficient based on all but the i-th observation
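To make the leave-one-out idea concrete, the sketch below computes a sample influence value for each observation. It assumes the common jackknife-style form (n − 1)(r − r_{-i}); that exact form is an assumption made here for illustration and is not confirmed by the slide text.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def sample_influence(xs, ys):
    """Assumed form: (n - 1) * (r - r_minus_i) for every observation i."""
    n = len(xs)
    r_all = pearson(xs, ys)
    influences = []
    for i in range(n):
        # r_minus_i: correlation with the i-th observation left out.
        r_minus_i = pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        influences.append((n - 1) * (r_all - r_minus_i))
    return influences


# Ten nearly collinear points plus one gross outlier (synthetic data).
xs = [float(i) for i in range(10)] + [20.0]
ys = [i + 0.1 * (-1) ** i for i in range(10)] + [-20.0]

influences = sample_influence(xs, ys)
most_influential = max(range(len(xs)), key=lambda i: abs(influences[i]))
print(most_influential, influences[most_influential])
```

A single observation dominates the influence values here, which is what an unbounded influence function looks like in a finite sample.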
24.
ROBUSTNESS
* Figure borrowed from “Introduction to Robust Estimation and Hypothesis Testing” by Rand R. Wilcox
Sensitivity to anomalies
Linear relationship between x & y but r = -0.21
PEARSON CORRELATION
Quadrant (signum) correlation coefficient [Blomqvist, 1950]
Correlation median estimator [Pasman and Shevlyakov, 1987]
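The sensitivity described above can be contrasted with two median-based measures. The sketch below implements the quadrant (signum) coefficient and, as a stand-in for the robust family, a co-median (comedian) coefficient normalized by the two MADs. The latter is named plainly as a substitute: it is not the Pasman and Shevlyakov estimator, whose exact form is not given in the slide text.

```python
import math
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def sign(value):
    return (value > 0) - (value < 0)


def quadrant_correlation(xs, ys):
    """Mean of sign(x - med x) * sign(y - med y)."""
    med_x, med_y = statistics.median(xs), statistics.median(ys)
    return sum(sign(x - med_x) * sign(y - med_y)
               for x, y in zip(xs, ys)) / len(xs)


def comedian_correlation(xs, ys):
    """med((x - med x)(y - med y)) / (MAD x * MAD y)."""
    med_x, med_y = statistics.median(xs), statistics.median(ys)
    comedian = statistics.median((x - med_x) * (y - med_y)
                                 for x, y in zip(xs, ys))
    mad_x = statistics.median(abs(x - med_x) for x in xs)
    mad_y = statistics.median(abs(y - med_y) for y in ys)
    if mad_x == 0 or mad_y == 0:
        raise ValueError("MAD is zero; coefficient undefined")
    return comedian / (mad_x * mad_y)


# y = x exactly, except for a single gross anomaly (synthetic data).
xs = [float(i) for i in range(21)]
ys = list(xs)
ys[20] = -1000.0

r_pearson = pearson(xs, ys)
r_quadrant = quadrant_correlation(xs, ys)
r_comedian = comedian_correlation(xs, ys)
print(r_pearson, r_quadrant, r_comedian)
```

One anomalous point is enough to flip the sign of Pearson's r in this example, while both median-based measures still report a strong positive association.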
29. STREAMING CORRELATION
Incremental, One pass
CHARACTERISTICS
Other correlation measures are not amenable to incremental
computation
Applications
Security
Correlation power analysis
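The "incremental, one pass" characteristic of Pearson's correlation can be sketched with running means and co-moments, in the style of Welford's update. The class and method names below are illustrative, not taken from the talk.

```python
import math


class StreamingPearson:
    """One-pass Pearson correlation using constant memory."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0   # running sum of squared deviations of x
        self.m2_y = 0.0   # running sum of squared deviations of y
        self.c_xy = 0.0   # running sum of cross deviations

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self):
        """Current correlation, or None while it is undefined."""
        if self.m2_x == 0 or self.m2_y == 0:
            return None
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


# Synthetic stream of (x, y) pairs consumed one at a time.
stream = [(0.5, 1.1), (1.5, 1.9), (2.0, 3.2), (4.0, 3.8), (3.5, 5.1), (6.0, 6.4)]
estimator = StreamingPearson()
for x, y in stream:
    estimator.update(x, y)
print(estimator.value())
```

Each update touches a fixed number of scalars, so the cost per observation does not depend on how much of the stream has been seen.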
31. STREAMING CORRELATION
Wide Spectrum of Approaches
Sliding Windows, Damped Windows
Reduction
Smoothing
Down sampling
DFT [Agrawal et al. 1993, Zhu and Shasha 2002,
Qiu et al. 2018]
DWT [Chan and Fu 1999, Popivanov & Miller 2002]
PCA, SVD
PAA [Faloutsos and Yi 2000]
APAC [Chakrabarti et al. 2001]
Random projections [Grellmann et al. 2016]
LSH [Sundaram et al. 2013]
DATA SKETCHES
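As one illustration of the DFT-based reduction listed above, the sketch below keeps only the first few DFT coefficients of each normalized series and uses them to bound the correlation from above. It relies on two standard facts: for zero-mean, unit-norm series the correlation equals 1 − d²/2, where d is their Euclidean distance, and by Parseval's theorem a subset of coefficients can only underestimate d². The function names and the naive O(n·k) transform are choices made for this example.

```python
import cmath
import math


def unit_normalize(series):
    """Shift to zero mean and scale to unit Euclidean norm."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant series has undefined correlation")
    return [c / norm for c in centered]


def dft_coefficients(series, count):
    """Coefficients 1..count of the unitary DFT (coefficient 0 is the mean)."""
    n = len(series)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(series)) / math.sqrt(n)
        for k in range(1, count + 1)
    ]


def exact_correlation(x, y):
    return sum(a * b for a, b in zip(unit_normalize(x), unit_normalize(y)))


def dft_correlation_bound(x, y, count):
    """Upper bound on the correlation from `count` DFT coefficients."""
    n = len(x)
    if len(y) != n or not 1 <= count < n / 2:
        raise ValueError("need equal lengths and 1 <= count < n / 2")
    cx = dft_coefficients(unit_normalize(x), count)
    cy = dft_coefficients(unit_normalize(y), count)
    # Each kept coefficient has a conjugate twin, so d^2 >= 2 * partial.
    partial = sum(abs(a - b) ** 2 for a, b in zip(cx, cy))
    return 1.0 - partial


# Two synthetic series of length 32.
x = [math.sin(2 * math.pi * t / 16) + 0.3 * math.sin(2 * math.pi * t / 5)
     for t in range(32)]
y = [math.sin(2 * math.pi * (t - 1) / 16) + 0.2 * math.cos(2 * math.pi * t / 7)
     for t in range(32)]

exact = exact_correlation(x, y)
bound_3 = dft_correlation_bound(x, y, 3)
bound_10 = dft_correlation_bound(x, y, 10)
print(exact, bound_10, bound_3)
```

Because the bound never falls below the true value, pairs whose bound is under a threshold can be discarded without computing the exact correlation, which is where the speed-up comes from.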
32. Bursts
Page views, Clicks, Retweets
Correlated burstiness (e.g., data center operations)
Root-cause analysis
STREAMING CORRELATION
* Figure borrowed from [Sakurai et al. 2005]
Lag between time series
Lagged correlation/Cross-correlation
ACTIONABLE INSIGHTS
[Zhu and Shasha 2002, Levine et al. 2016, Wu et al. 2017]
[Vlachos et al. 2008, Kotov et al. 2011, Shafer et al. 2012, Kusmierczyk and Norvag 2015]
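Lagged (cross-) correlation can be illustrated with a brute-force scan over candidate lags. In this sketch a positive lag means the second series trails the first; `max_lag` and the synthetic data are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] over the overlapping part."""
    n = len(x)
    if lag >= 0:
        return pearson(x[:n - lag], y[lag:])
    return pearson(x[-lag:], y[:n + lag])


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: lagged_correlation(x, y, lag))


# y repeats x three steps later (synthetic data, fixed seed).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(80)]
y = [0.0, 0.0, 0.0] + x[:-3]

lag = best_lag(x, y, max_lag=10)
print(lag, lagged_correlation(x, y, lag))
```

The scan costs one correlation per candidate lag, so on live streams it is usually paired with one of the reduction techniques rather than run on raw data.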
34. PREDICTION MOTION
1952
“… prediction motion is the continuation of tracking motion
after a target disappears from view.”
1955
1962
Unlike duration of target presentation, target speed
exerts an influence on prediction accuracy.
EXPLORED >5 DECADES BACK!
35. PREDICTION MOTION
Human Motion/Robotics
[Bütepage et al. 2017, Martinez et al. 2017, Byravan & Fox]
Traffic Prediction
[Hermes et al. 2010, Walker et al. 2014, Yu et al. 2017]
360-Degree Video (AR/VR)
[Bao et al. 2017, Vishwanath et al., 2017]
Potpourri: Deep Learning Based Approaches
[Oh et al. 2015, Mathieu et al. 2016, Liang et al. 2017]
37. AUDIO-VIDEO CORRELATION
1952
People utilize visual and postural experiences in perceiving the position
of an object in the field, of the whole field.
1897
Localization of sounds varied, being different when the source of sound was in sight from what it
was when this was out of sight, and also in the latter case differing with different directions of
attention, or with different suggestions as to the direction from which the sound came.
1941
SOUND LOCALIZATION: AN APPLICATION
38. AUDIO-VIDEO CORRELATION
1977
1960
1976
LIP READING: AN APPLICATION
Demonstrated the influence of vision on speech perception
McGurk and MacDonald
Established the relationship of the visually perceived
symbols to the underlying linguistic system
Use visual information as an aid when white noise made speech difficult to hear
1954
There’s a great opportunity for the visual contribution
at low speech-to-noise ratios
39. AUDIO-VIDEO CORRELATION
2007
2004
Combined acoustic and visual feature vectors to distinguish live synchronous audio-video
recordings from replay attacks that use audio with a still photo.
LIVENESS AND SYNCHRONIZATION: AN APPLICATION
Extracted the correlated components of audio and lip features
based on Canonical Correlation Analysis.
2009
Showed that there exists a relationship between perception of video presented
in screen and accompanying audio signals, both stereo and spatial
40.
AUDIO-VIDEO-TEXT
Speaker identification in multi-speaker scenarios
[Huang & Kingsbury 2013, Chung & Zisserman 2016, Torfi et al. 2017]
Cross-Modal Correlation Learning in Audio and Lyrics
[Yu et al. 2017, Tang et al. 2017]
Speech enhancement
[Xu et al. 2014, Hou et al. 2017, Kolbæk et al. 2017]
Action Recognition and Video Highlight Detection
[Wu et al. 2013, Sun et al. 2013, Takahashi et al. 2017]
DEEP LEARNING WAY
Emotion Recognition
[Tzirakis et al. 2017, Pini et al. 2017]
42.
Online anomaly detection: speed-accuracy trade-off
On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data*
by Choudhary et al. 2017
Breakouts/Changepoints
Skew in location of anomalies
SUSCEPTIBILITY TO ANOMALIES
* https://arxiv.org/pdf/1710.04735.pdf
44.
Potential sources: Event based monitoring via sensors, Occlusion in a video
Techniques: Resampling
Lomb-Scargle Fourier Transform [Andersson 2007, Rehfeld et al. 2011]
Kernel - such as Laplacian, Gaussian - based methods
IRREGULARLY SPACED TIME SERIES
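A kernel-based estimate for irregularly spaced series might look like the sketch below, which weights every cross pair of observations by a Gaussian kernel on the difference of their timestamps, so no resampling onto a common grid is needed. The bandwidth, the jittered sampling times and the function names are assumptions made for this example, not values from the cited papers.

```python
import math
import random


def standardize(values):
    """Zero mean, unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def kernel_correlation(tx, x, ty, y, bandwidth, lag=0.0):
    """Gaussian-kernel weighted correlation of two irregular series."""
    zx, zy = standardize(x), standardize(y)
    num = 0.0
    den = 0.0
    for ti, xi in zip(tx, zx):
        for tj, yj in zip(ty, zy):
            # Pairs observed close together in time get the most weight.
            w = math.exp(-0.5 * ((tj - ti - lag) / bandwidth) ** 2)
            num += w * xi * yj
            den += w
    return num / den


# Two sensors observe the same signal at different, jittered times.
rng = random.Random(7)
tx = [0.20 * i + rng.uniform(-0.08, 0.08) for i in range(200)]
ty = [0.25 * j + rng.uniform(-0.10, 0.10) for j in range(160)]
x = [math.sin(t) for t in tx]
y = [math.sin(t) for t in ty]

rho_same = kernel_correlation(tx, x, ty, y, bandwidth=0.2)
rho_flipped = kernel_correlation(tx, x, ty, [-v for v in y], bandwidth=0.2)
print(rho_same, rho_flipped)
```

The double loop is quadratic in the number of observations; in practice pairs further apart than a few bandwidths can be skipped.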
45. TIME VARYING CORRELATION
* Figure borrowed from [Fu et al. 2013]
TVCC: Time Varying Correlation Coefficient
fMRI time-courses of 4 Regions of Interest (ROIs)
✦ Time varying joint distribution
✦ Parameter non-constancy/Instability
F Test [Chow’60]
SupF Test [Quandt’60]
Lagrange Multiplier Test
[Nabeya and Tanaka’88, Nyblom’89]
✦ Co-integrated processes
I(1) [Hansen’90]
✦ Stochastic vs. Deterministic
✦ Dynamic Conditional Correlation (DCC)
[Engle’02]
Post stimulus period with significant difference
(p<0.05) w.r.t. pre-stimulus interval
46. STREAMING CORRELATION
Missing data
Packets being dropped owing to, say, unexpected
high traffic
Data collection: Every, say, 5 seconds
How to scale analysis to millisecond granularity?
Unequal length time series
Different sampling rates
Small samples
Bootstrapping
Low SNR (Signal to Noise Ratio)
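For the small-sample case, bootstrapping can attach an uncertainty interval to a correlation estimate. The sketch below uses a plain percentile bootstrap; the number of resamples, the 95% level and the sample data are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation, or None if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        return None
    return s_xy / math.sqrt(s_xx * s_yy)


def bootstrap_ci(xs, ys, resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    if pearson(xs, ys) is None:
        raise ValueError("correlation undefined for constant input")
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < resamples:
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:          # skip degenerate resamples
            estimates.append(r)
    estimates.sort()
    k = int(alpha / 2 * resamples)
    return estimates[k], estimates[resamples - 1 - k]


# Ten paired observations (synthetic data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [1.2, 1.9, 3.5, 3.1, 5.6, 5.2, 7.9, 7.1, 9.4, 9.0]

point = pearson(xs, ys)
lower, upper = bootstrap_ci(xs, ys)
print(point, lower, upper)
```

With so few points the interval can be wide, which is a useful warning before acting on a correlation computed from a short window.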
49.
SELF-LEARNING
EARLY WORK: GAMES
1950 1914
constructed a device which played an end game of king
and rook against king. The machine played the side with king and rook and would
force checkmate in a few moves however its human opponent played.
Gerald Tesauro
1995 1959 2002
50. 2002
2017
Amongst the first and most famous was the
chess-playing automaton constructed in 1769
by Baron Kempelen …
1953
Alan Turing
2012
1970
GAME PLAYING
POTPOURRI
1996 2007
The game of checkers has roughly 500 billion billion
possible positions (5 × 10²⁰)
51. SELF-LEARNING
2016
REINFORCEMENT LEARNING
2016 2017 2017 2018
AlphaGo
Mastering the game of Go with deep neural networks and tree search
by Silver et al.
AlphaGo Zero
Mastering the game of Go without human knowledge
by Silver et al.
AlphaZero
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
by Silver et al.
Libratus
Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
by Brown and Sandholm
52.
53.
54.
Rank Correlation Methods
by Kendall and Gibbons
READINGS
Correlation and Regression
by Bobko
Correlation
by Garson
Robust Correlation: Theory and Applications
by Shevlyakov and Oja
A Mathematical Theory of Evidence
by Shafer
BOOKS
55.
Fast Approximate Correlation for Massive Time-series Data
by Mueen et al., 2010
READINGS
Local Correlation Detection with Linearity Enhancement in
Streaming Data
by Xie et al., 2013
Fast Distributed Correlation Discovery Over Streaming
Time-Series Data
by Guo et al., 2015
Random Projection for Fast and Efficient Multivariate
Correlation Analysis of High-Dimensional Data: A New Approach
by Grellmann et al., 2016
Detection of Highly Correlated Live Data Streams
by Alseghayer et al., 2017
STREAMING CORRELATION
56.
Perception of body position and of the position of the visual
field
by Witkin, 1949
READINGS
Autocorrelation, a principle for the evaluation of sensory
information by the nervous system
by Reichardt, 1961
EARLY RESEARCH IN AUDIO-VISUAL CORRELATION
Binocular cross-correlation in time and space
by Tyler and Julesz, 1978
Neurontropy, an entropy-like measure of neural correlation
by Julesz and Tyler, 1976
Cross correlation of sensory stimuli and electroencephalogram
by Morgan, 1969
57.
Patch to the future: Unsupervised visual prediction
by Walker et al., 2014
READINGS
Motion-Prediction-based Multicast for 360-Degree Video
Transmissions
by Bao et al., 2017
Dual Motion GAN for Future-Flow Embedded Video
Prediction
by Liang et al., 2017
Deep representation learning for human motion prediction
and classification
by Bütepage et al., 2017
On human motion prediction using recurrent neural
networks
by Martinez et al., 2017
RECENT WORKS IN PREDICTION MOTION
58.
On Deep Multi-View Representation Learning
by Wang et al., 2015
READINGS
Correlational Neural Networks
by Chandar et al., 2017
Common Representation Learning Using Step-based
Correlation Multi-Modal CNN
by Bhatt et al., 2017
Objects that Sound
by Arandjelović and Zisserman, 2017
Deep Correlation Feature Learning for Face Verification in
the Wild
by Deng et al., 2017
COMMON REPRESENTATION LEARNING
59.
READINGS
Audiovisual Synchronization and Fusion Using Canonical
Correlation Analysis
by Sargin et al., 2007
The predictive power of trajectory motion
by Watamaniuk, 2005
Seeing motion behind occluders
by Watamaniuk and McKee, 1995
Temporal Coherence Theory For The Detection And
Measurement Of Visual Motion
by Grzywacz et al., 1995
Probabilistic Motion Estimation Based On Temporal Coherence
by Burgi et al., 2000
AUDIO-VISUAL RESEARCH
60.
A New Method of Audio-Visual Correlation Analysis
by Kunka and Kostek, 2009
READINGS
Uncertainty in ontologies: Dempster–Shafer theory for data
fusion applications
by Bellenger and Gatepaille, 2011
City Data Fusion: Sensor Data Fusion in the Internet of Things
by Wang et al., 2015
Data Fusion and IoT for Smart Ubiquitous Environments: A
Survey
by Alam et al., 2017
Correlation Analysis of Audio and Video Contents: A Metadata
based Approach
by Algur et al., 2015
POTPOURRI
61.
READINGS
POTPOURRI
Correlation detection as a general mechanism for
multisensory integration
by Parise and Ernst, 2016
Origin of information-limiting noise correlations
by Kanitscheider et al., 2015
Temporal structure and complexity affect audio-visual
correspondence detection
by Denison et al., 2013
Correlation versus causation in multisensory perception
by Mitterer and Jesse, 2010
62.
READINGS
MATHEMATICS
A Bayesian approach to problems in stochastic estimation
and control
by Ho and Lee, 1964
A Generalization of Bayesian Inference
by Dempster, 1968
Multidimensional Scaling
by Kruskal and Wish, 1978