@JOTB19
From Billions to Quintillions: Paving the road to real-time pattern mining using the Matrix Profile
Zachary Zimmerman
Contents
1. Introduction to the Matrix Profile
2. Applications of the Matrix Profile
3. Computing the Matrix Profile
4. Scaling the Matrix Profile
5. Future work
What is the Matrix Profile?
• The Matrix Profile (MP) is a data structure that annotates a time series.
• Key Claim: Given the MP, most time series data mining problems are trivial or easy!
• There are many problems that are trivial given the MP, including motif discovery,
density estimation, anomaly detection, rule discovery, joins, segmentation, clustering
etc. However, you can use the MP to solve your problems, or to solve a problem listed
above, but in a different way, tailored to your interests/domain.
• Key Insight: The MP has many highly desirable properties, and any algorithm you build on top of it will inherit those properties.
• Say you use the MP to create: An Algorithm to Segment Sleep States
• Then, for free, you have also created:
  An Anytime Algorithm to Segment Sleep States
  An Online Algorithm to Segment Sleep States
  A Parallelizable Algorithm to Segment Sleep States
  A GPU Accelerated Algorithm to Segment Sleep States
  An Algorithm to Segment Sleep States with Missing Data
Developing a Visual Intuition for the Matrix Profile
In the following slides we are going to develop a visual intuition for the matrix profile, without regard to how we obtain it.
We are ignoring the elephant in the room: the MP seems to be much too slow to compute to be practical. I will address this later in the talk.
Note that algorithms that use the MP do not require us to visualize the MP, but it happens that just visualizing the MP can be 99% of solving a problem.
Intuition behind the Matrix Profile: Assume we have a time series T; let's start with a synthetic one...
|T| = n = 3,000
Note that for most time series data mining tasks, we are not interested in any global properties of the time series; we are only interested in small local subsequences of this length, m (here m = 100).
These subsequences might be about the length of individual heartbeats (for ECGs), individual days (for social media behavior), individual words (for speech analysis), etc.
We can create a companion “time series”, called a Matrix Profile or MP.
The matrix profile at the ith location records the distance of the subsequence in T, at the ith location, to its nearest neighbor under z-normalized Euclidean Distance.
For example, in the figure, the subsequence starting at 921 happens to have a distance of 177.0 to its nearest neighbor (wherever it is).
We can create another companion sequence, called a matrix profile index (MPI).
The MPI contains integers that are used as pointers. As a practical matter, even 32 bits suffice for a long MP: a signed 32-bit integer indexes up to 2,147,483,647 positions (over a year of data at 60 Hz) and an unsigned one up to 4,294,967,295 (over two years). An unsigned 64-bit integer gives us roughly ten billion years at 60 Hz.
In the following slides we won’t bother to show the matrix profile index, but be aware it exists, and it allows us to find the nearest neighbor to any subsequence in constant time.
[Figure: zoom-in on the matrix profile index: 1373 1375 1389 … 368 378 378 234 …]
Why is it called the Matrix Profile?
One naïve way to compute it would be to construct a distance matrix of all pairs of subsequences of length m.
For each column, we could then “project” down the smallest (non-diagonal) value to a vector, and that vector would be the Matrix Profile.
While in general we could never afford the memory to do this (4TB for just |T| = one million), for most applications the Matrix Profile is the only thing we need from the full matrix, and we can compute and store it very efficiently (as we will see later).
Key:
Small distances are blue
Large distances are red
Dark stripe is excluded
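The naïve projection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy series, and the exclusion-zone half-width of m/2 (the "dark stripe") are not from the slides, and the sketch assumes no subsequence is constant. It computes one row of the distance matrix at a time and keeps only the running minimum, so the full matrix is never stored.

```python
import math

def znorm(x):
    """Z-normalize a subsequence (assumes it is not constant)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def naive_matrix_profile(ts, m):
    """Brute-force matrix profile and index; the full matrix is never stored."""
    k = len(ts) - m + 1                      # number of subsequences
    subs = [znorm(ts[i:i + m]) for i in range(k)]
    excl = m // 2                            # assumed exclusion-zone half-width
    profile = [math.inf] * k
    index = [-1] * k
    for i in range(k):                       # one row of the distance matrix
        for j in range(k):
            if abs(i - j) <= excl:           # skip the excluded stripe
                continue
            d = math.dist(subs[i], subs[j])  # z-normalized Euclidean distance
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Toy series (made up): the shape [0, 1, 3, 1, 0] occurs at positions 0 and 10.
ts = [0, 1, 3, 1, 0, 5, 2, 7, 4, 9, 0, 1, 3, 1, 0, 8, 6, 2, 5, 3]
profile, index = naive_matrix_profile(ts, m=5)
```

On this toy input the two copies of the repeated shape point at each other through the index, which is the motif behaviour described in the following slides.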
How to “read” a Matrix Profile
Where you see relatively low values, you know that the subsequence in the original time
series must have (at least one) relatively similar subsequence elsewhere in the data (such
regions are “motifs” or reoccurring patterns)
Where you see relatively high values, you know that the subsequence in the original time
series must be unique in its shape (such areas are “discords” or anomalies).
Must be an anomaly in the original data, in
this region.
We call these Time Series Discords
Must be conserved shapes (motifs) in the original data, in these three
regions
* Vipin Kumar performed an extensive empirical evaluation and noted that “...on 19 different publicly available data sets, comparing 9 different techniques (time series discords) is the best overall technique.” V. Chandola, D. Cheboli, V. Kumar. Detecting Anomalies in a Time Series Database. UMN TR09-004
How to “read” a Matrix Profile: Synthetic Anomaly Example
Where you see relatively high values, you know that the subsequence in the original time
series must be unique in its shape. In fact, the highest point is exactly the definition of Time
Series Discord, perhaps the best anomaly detector for time series*
Must be an anomaly in the original
data, in this region
How to “read” a Matrix Profile: Synthetic Motif Example
Where you see relatively low values, you know that the subsequence in the original time
series must have (at least one) relatively similar subsequence elsewhere in the data.
In fact, the lowest points must be a tying pair, and correspond exactly to the classic definition of time series motifs.
The corresponding subsequence in the raw data at this location, must have at least one similar subsequence somewhere
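Reading the top motif and the top discord off a matrix profile amounts to an argmin and an argmax. A minimal Python sketch follows; the function name and the profile and index values are made up for illustration and do not come from the talk.

```python
def top_motif_and_discord(profile, index):
    """Return ((motif_a, motif_b), discord) read off a matrix profile."""
    positions = range(len(profile))
    motif_a = min(positions, key=lambda i: profile[i])   # lowest MP value
    motif_b = index[motif_a]                              # its nearest neighbor
    discord = max(positions, key=lambda i: profile[i])   # highest MP value
    return (motif_a, motif_b), discord

# Made-up profile and index, for illustration only.
mp = [3.0, 0.5, 2.0, 6.0, 0.5, 2.5]
mpi = [4, 4, 5, 0, 1, 2]
motif_pair, discord = top_motif_and_discord(mp, mpi)
```

Note that the two motif positions carry the same profile value, which is the tying pair mentioned above.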
How to “read” a Matrix Profile:
Now that we understand what a Matrix Profile is, and we have some practice
interpreting them on synthetic data, let us spend the next five minutes to see
some examples on real data.
Note that we will typically create algorithms that use the Matrix Profile,
without actually having humans look at it.
Nevertheless, in many exploratory time series data mining tasks, just looking
at the Matrix Profile can give us unexpected and actionable insights.
Ready to begin?
Matrix Profile Applications
Taxi Example: Part I
Given a long time series, where should you examine carefully?
The problem is called “Attention Prioritization”; a group at Stanford is working on this [a].
However, we think that the Matrix Profile can be used for this “for free”.
Below is the data: the hourly average of the number of NYC taxi passengers over 75 days in Fall of 2014.
Let's compute the Matrix Profile for it. We choose a subsequence length corresponding to two days... (next slide)
[a] http://futuredata.stanford.edu/ASAP/extended.pdf
Taxi Example: Part II
• The highest value corresponds to Thanksgiving (the uniqueness of Thanksgiving was the only thing the Stanford team noted).
• We find a secondary peak around Nov 6th. What could it be? Daylight Saving Time! The clock going backwards one hour gives an apparent doubling of taxi load.
• We find a tertiary peak around Oct 13th. What could it be? Columbus Day! Columbus Day is largely ignored in much of America, but is still a big deal in NY, with its large Italian American community.
Taxi Example: Part III
• What about the lowest values? (the best motifs)
• They are exactly seven days apart, suggesting that in this dataset, there might be a periodicity of
seven days, in addition to the more obvious periodicity of one day.
Electrocardiogram (MIT-BIH Long-Term ECG Database)
The first discord: premature ventricular contraction
The second discord: ectopic beat
In this case there are two anomalies annotated by MIT cardiologists. The Matrix Profile clearly indicates them.
Here the subsequence length was set to 150, but we still find these anomalies if we halve or triple that length.
Seismology
[Figure: Seismic Time Series and its Matrix Profile, with a Zoom-In; axis labels: 0 to 20 seconds, and 0 to 9,000]
Time:19:23:48.44 Latitude:37.57 Longitude:-118.86 Depth: 5.60 Magnitude: 1.29
Time:20:08:01.13 Latitude:37.58 Longitude:-118.86 Depth: 4.93 Magnitude: 1.09
Thanks to C. Yoon, O. O’Reilly, K. Bergen and G. Beroza of Stanford for this
The corresponding subsequence in the raw data at this location must have at least one similar earthquake somewhere.
If we see low values in the MP of a seismograph, it means there must have been a repeated earthquake.
Repeated earthquakes can happen decades apart.
Many fundamental problems in seismology, including the discovery of foreshocks, aftershocks, triggered earthquakes, swarms, volcanic activity and induced seismicity, can be reduced to the discovery of these repeated patterns.
Summary
We could do this all day! We have applied the MP to hundreds of datasets.
It is worth reminding ourselves of the following:
• The MP can find structure in taxi demand, seismology, DNA, power demand, heartbeats, music and bird vocalization data. However, the MP does not “know” anything about these domains; it was not tweaked, tuned or adjusted in any way.
• This domain-agnostic nature of the MP is a great strength: you typically get very good results “out-of-the-box” no matter what your domain.
The following is also worth stating again:
• We spent time looking at the MP to gain confidence that it is doing something useful.
However, most of the time, only a higher-level algorithm will look at the MP, with no
human intervention or inspection (we will make that clearer in the following slides).
Computing the MP
Computing the Matrix Profile: Distance Profile
Query: the 1st subsequence in the time series.
Obtain the z-normalized Euclidean distance between Query and each window (subsequence) in the time series. We would obtain a vector like this:
$D_1 = [\, d_{1,1}\,(= 0),\ d_{2,1},\ \ldots,\ d_{n-m+1,1} \,]$
$d_{i,j}$ is the distance between the $i$th subsequence and the $j$th subsequence.
We can obtain $D_2, D_3, \ldots, D_{n-m+1}$ similarly.
O(nm) complexity using the naïve method.
O(n log n) complexity using FFTs with MASS.
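A distance profile computed the naïve O(nm) way can be sketched as follows. This is a hedged illustration rather than the MASS algorithm: the function names and the toy data are assumptions, and MASS would replace the inner work with FFT-based sliding dot products to reach O(n log n). The sketch assumes no window is constant.

```python
import math

def znorm(x):
    """Z-normalize a subsequence (assumes it is not constant)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def distance_profile(ts, query):
    """Z-normalized Euclidean distance from query to every window of ts."""
    m = len(query)
    q = znorm(query)
    return [math.dist(q, znorm(ts[i:i + m])) for i in range(len(ts) - m + 1)]

# Made-up data: the first window [1, 3, 2, 5] reappears at position 7.
ts = [1, 3, 2, 5, 4, 6, 2, 1, 3, 2, 5, 4]
m = 4
d1 = distance_profile(ts, ts[0:m])   # D_1: the query is the 1st subsequence
```

As on the slide, the first entry of D_1 is zero because the query is compared with itself; the second zero at position 7 marks the repeated shape.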
Computing the Matrix Profile: Distance Matrix
$d_{i,j}$ is the distance between the $i$th window and the $j$th window of the time series.
\[
\begin{pmatrix}
d_{1,1} & d_{1,2} & \cdots & \cdots & d_{1,n-m+1} \\
d_{2,1} & d_{2,2} & \cdots & \cdots & d_{2,n-m+1} \\
\vdots & \vdots & \ddots & & \vdots \\
d_{i,1} & d_{i,2} & \cdots & d_{i,j} & d_{i,n-m+1} \\
\vdots & \vdots & & \ddots & \vdots \\
d_{n-m+1,1} & d_{n-m+1,2} & \cdots & \cdots & d_{n-m+1,n-m+1}
\end{pmatrix}
\]
The rows are the distance profiles $D_1, D_2, \ldots, D_i, \ldots, D_{n-m+1}$.
The distance matrix is symmetric.
The diagonal is zero.
Cells close to the diagonal are very small.
Matrix Profile: a vector of the distance between each subsequence and its nearest neighbor:
$P_i = \min(D_i)$, giving $P_1, P_2, \ldots, P_{n-m+1}$.
STAMP: Our (not completely) naïve method of computing the MP
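% T: time series (row vector), m: subsequence length; MASS is assumed to return a row vector
n = length(T);
MP = inf(1, n-m+1);                 % matrix profile, initialised to infinity
for i = 1:n-m+1
    d = MASS(T, T(i:i+m-1));        % distance profile of the ith subsequence
    d(max(1,i-floor(m/2)):min(n-m+1,i+floor(m/2))) = inf;  % mask trivial matches (assumed m/2 zone)
    MP = min([MP; d]);              % element-wise running minimum
end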
Time complexity of STAMP is $O(n^2 \log n)$.
Space complexity of STAMP is $O(n)$.
Scaling the Matrix Profile
Scaling the Matrix Profile Calculation
● Over the last couple of years, we have slowly integrated insights from the
high performance computing space into the matrix profile.
● Key Lessons Learned:
■ Optimization is an iterative process with many moving parts
■ Fast code comes as a result of being aware of considerations across the
entire computing stack, from hardware to OS/application to the compiler
■ There are significant tradeoffs between speed and extensibility/readability; achieving all of these things requires a very good design and is not even possible for some problems.
Scaling the Matrix Profile calculation
● Performance for an input time series of length 2 million:
■ Initial CPU implementation: 1 CPU thread -> 4.2 days
■ Initial GPU implementation: K80 GPU -> 3.2 hours
■ Optimized CPU implementation: 4 CPU threads -> 6.5 minutes
■ Optimized GPU implementation: V100 GPU -> 5 seconds
● A cloud implementation on a 40-GPU cluster allowed us to process a time series of length 1 billion in < 10 hours
■ This is on the order of 10^18 (a quintillion) pairwise comparisons
■ Cost: ~500 USD (~0.80 USD per quadrillion comparisons)
Scaling the Matrix Profile Calculation
● These speedups came as the result of improvements in the following areas:
■ Better Algorithmic Complexity
■ Use of Modern Hardware
■ Use of Relevant Hardware Features
■ Architecture Aware Code
■ Tiling into subproblems
■ Algebraic Improvements to Problem Formulation
Scaling the Matrix Profile Calculation
Disclaimer:
● The following slides show some examples of ways we scaled our
application.
● This is a very rough overview, in the interest of time. I won’t get too far into
the details in this talk.
● If you are interested in more technical details and/or source code please see
the reference slide at the end of this talk for links to our supporting webpage
Scaling the Matrix Profile calculation: Algorithmic Complexity
The dot product of two subsequences is a sum of $m$ products:
$QT_{i,j} = \sum_{k=0}^{m-1} t_{i+k}\, t_{j+k}$
Moving one step along a diagonal drops one product and adds one:
$QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$
$O(N \log N) \rightarrow O(N)$ time complexity per subsequence!
$O(N^2)$ algorithm
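The diagonal update rule $QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$ can be checked with a short Python sketch. The function names and the integer toy data are assumptions made for illustration; indices are zero-based here, which leaves the form of the update unchanged.

```python
def direct_dot(ts, i, j, m):
    """Dot product of the windows starting at i and j, computed in O(m)."""
    return sum(ts[i + k] * ts[j + k] for k in range(m))

def diagonal_dot_products(ts, m, offset):
    """QT[i, i + offset] for every valid i, walking down one diagonal."""
    k = len(ts) - m + 1                      # number of subsequences
    qt = direct_dot(ts, 0, offset, m)        # O(m) work only for the first cell
    out = [qt]
    for i in range(k - offset - 1):
        j = i + offset
        # O(1) update: drop the oldest product, add the newest one.
        qt = qt - ts[i] * ts[j] + ts[i + m] * ts[j + m]
        out.append(qt)
    return out

# Made-up integer data, so the comparison below is exact.
ts = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]
m, offset = 4, 3
diag = diagonal_dot_products(ts, m, offset)
expected = [direct_dot(ts, i, i + offset, m)
            for i in range(len(ts) - m + 1 - offset)]
```

Each cell after the first costs a constant number of operations, so a whole diagonal costs O(N) and all diagonals together O(N^2).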
SCAMP: Scalable Matrix Profile
Output Arrays: $P_1, P_2, P_3, \ldots, P_{n-m+1}$ and $I_1, I_2, I_3, \ldots, I_{n-m+1}$
Precomputed Arrays: $\mu_1, \mu_2, \mu_3, \ldots, \mu_{n-m+1}$ and $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_{n-m+1}$
[Figure: matrix of subsequence dot products, with $T_1 T_1, T_1 T_2, T_1 T_3, \ldots, T_1 T_{n-m}, T_1 T_{n-m+1}$ along the first row, $T_2 T_1, \ldots, T_{i-1} T_1, T_i T_1, \ldots, T_{n-m} T_1, T_{n-m+1} T_1$ down the first column, and $T_{n-m+1} T_{n-m+1}$ in the last cell]
We can exploit the fact that our only dependency is along the diagonal of the distance matrix to speed up the computations.
On the GPU we can assign each thread a diagonal and compute the distances along the diagonals.
We can use a similar strategy to improve performance on the CPU.
Scaling the Matrix Profile calculation: Algebraic Improvements
Old (Distance) formula:
$d_{i,j} = \sqrt{2m\left(1 - \frac{QT_{i,j} - m\mu_i\mu_j}{m\sigma_i\sigma_j}\right)}$
Old update formula:
$QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$
New (Correlation) formula (the normalization factors are precomputed):
$P_{i,j} = QT_{i,j} \cdot \frac{1}{\lVert T_{i,m} - \mu_i \rVert} \cdot \frac{1}{\lVert T_{j,m} - \mu_j \rVert}$
New update formula:
$QT_{i,j} = QT_{i-1,j-1} + df_i\, dg_j + df_j\, dg_i$
Smaller intermediate values
→ Less cancellation error
→ More stable algorithm
→ Single precision is an option
Fewer floating point operations
→ More efficient computation
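The correlation-based formulation can be sketched in Python as below. The slides do not spell out the two precomputed difference terms, so the definitions used here are an assumption: $df_i = (t_{i+m-1} - t_{i-1})/2$ and $dg_i = (t_{i+m-1} - \mu_i) + (t_{i-1} - \mu_{i-1})$, with $df_0 = dg_0 = 0$, applied to a mean-centered dot product. The function names and toy data are also illustrative. The sketch walks one diagonal, turns each value into a Pearson correlation, converts it to a distance with $d = \sqrt{2m(1 - P)}$, and compares the result with the directly computed z-normalized distance.

```python
import math

def znorm(x):
    """Z-normalize a subsequence (assumes it is not constant)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def correlations_on_diagonal(ts, m, offset):
    """Pearson correlation of windows (i, i + offset) via an O(1) update."""
    k = len(ts) - m + 1
    mu = [sum(ts[i:i + m]) / m for i in range(k)]
    # norm[i] = || T_{i,m} - mu_i ||, precomputed once per subsequence.
    norm = [math.sqrt(sum((v - mu[i]) ** 2 for v in ts[i:i + m]))
            for i in range(k)]
    # Assumed definitions of the precomputed difference terms.
    df = [0.0] + [(ts[i + m - 1] - ts[i - 1]) / 2 for i in range(1, k)]
    dg = [0.0] + [(ts[i + m - 1] - mu[i]) + (ts[i - 1] - mu[i - 1])
                  for i in range(1, k)]
    # Mean-centered dot product for the first cell of the diagonal.
    qt = sum((ts[a] - mu[0]) * (ts[offset + a] - mu[offset]) for a in range(m))
    corr = [qt / (norm[0] * norm[offset])]
    for i in range(1, k - offset):
        j = i + offset
        qt += df[i] * dg[j] + df[j] * dg[i]      # new update formula
        corr.append(qt / (norm[i] * norm[j]))    # new correlation formula
    return corr

def corr_to_distance(p, m):
    """Z-normalized Euclidean distance from a Pearson correlation."""
    return math.sqrt(max(0.0, 2 * m * (1 - p)))

ts = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]       # made-up data
m, offset = 4, 3
corr = correlations_on_diagonal(ts, m, offset)
distances = [corr_to_distance(p, m) for p in corr]
direct = [math.dist(znorm(ts[i:i + m]), znorm(ts[i + offset:i + offset + m]))
          for i in range(len(corr))]
```

Working with correlations keeps intermediate values in [-1, 1] and defers the square root until a distance is actually needed.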
Scaling the Matrix Profile calculation: Architecture Awareness / Feature Utilization (GPU Example)
Scaling the Matrix Profile calculation: Tiling and Distributed Computation
[Figure: a Big Time Series is split by a Mapper into tiles, processed on Cloud GPUs (AWS, GCP, Azure…), and combined by Reducers]
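One plausible reading of the tiling diagram is a map/reduce layout: each tile of the distance matrix is processed independently and the partial results are merged with an element-wise minimum. The Python sketch below illustrates that idea only; the function names, the tile size, the m/2 exclusion zone and the toy data are all assumptions, and a real deployment would ship raw tile data to GPU workers rather than pre-normalize every subsequence in one process.

```python
import math
from itertools import product

def znorm(x):
    """Z-normalize a subsequence (assumes it is not constant)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def tile_join(subs, rows, cols, excl):
    """'Mapper' work for one tile: best match per row among the tile's columns."""
    best = {}
    for i in rows:
        best_d, best_j = math.inf, -1
        for j in cols:
            if abs(i - j) <= excl:               # skip trivial matches
                continue
            d = math.dist(subs[i], subs[j])
            if d < best_d:
                best_d, best_j = d, j
        best[i] = (best_d, best_j)
    return best

def reduce_tiles(partials, k):
    """'Reducer': element-wise minimum over all partial profiles."""
    profile, index = [math.inf] * k, [-1] * k
    for part in partials:
        for i, (d, j) in part.items():
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

def tiled_matrix_profile(ts, m, tile):
    k = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(k)]
    ranges = [range(s, min(s + tile, k)) for s in range(0, k, tile)]
    # Every (row-tile, column-tile) pair is an independent unit of work.
    partials = [tile_join(subs, r, c, m // 2) for r, c in product(ranges, ranges)]
    return reduce_tiles(partials, k)

ts = [0, 1, 3, 1, 0, 5, 2, 7, 4, 9, 0, 1, 3, 1, 0, 8, 6, 2, 5, 3]  # made up
tiled = tiled_matrix_profile(ts, m=5, tile=4)
untiled = tiled_matrix_profile(ts, m=5, tile=len(ts))   # a single tile
```

Because the minimum is associative and commutative, tiles can be computed in any order and on any machine, and the merged result matches the single-tile computation.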
Scaling the Matrix Profile: Results
Dataset                  Parkfield 1B       Cascadia Subduction Zone
Size                     1 Billion          1 Billion
Tile Size                ~52M (1 month)     ~25M (2 weeks)
Total GPU time           375.2 hours        375.3 hours
Spot Job Time            2.5 days           10 hours 3 min
Approximate Spot Cost    480 USD            620 USD
Intermediate Data Size   102.2 GB           196.4 GB
Spot cost is based on demand. GPU time was the same in both experiments, but the spot price was higher during the Cascadia experiment.
[Figure: Matrix Profile of Parkfield, 580 days @ 20 Hz]
What’s Next?
• Many of these comparisons are redundant. Can we compress the time series somehow? Is there some representative subset of the data which will let us approximate the MP computation?
• Use this approximation to perform pattern mining at the edge.
• What else can we do with this computational pattern?
Real-time pattern mining intuition
• Be fast enough to operate in real-time
• We want O(1) updates
• Be able to operate with power/memory/disk constraints, e.g. on IoT devices.
[Figure labels: Time Series Subsequence; Compressed data representation; Matrix Profile Value: 0.972]
References
• Supporting Webpage (papers can be found here): https://www.cs.ucr.edu/~eamonn/MatrixProfile.html
• SCAMP source code: https://github.com/zpzim/SCAMP
Chin-Chia Michael Yeh, Yan Zhu, Abdullah Mueen
Nurjahan Begum, Yifei Ding, Hoang Anh Dau
Diego Furtado Silva, Liudmila Ulanova, Eamonn Keogh
Zachary Zimmerman, Nader S. Senobari, Philip Brisk
Shaghayegh Gharghabi, Kaveh Kamgar
..and others that have inspired us, forgive any omissions.
Thanks for listening!
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
A race of two compilers: GraalVM JIT versus HotSpot JIT C2. Which one offers ...
 
Leadership at every level
Leadership at every levelLeadership at every level
Leadership at every level
 
Machine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind LibrariesMachine Learning: The Bare Math Behind Libraries
Machine Learning: The Bare Math Behind Libraries
 

Recently uploaded

What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2
 
WSO2Con2024 - Software Delivery in Hybrid Environments
WSO2Con2024 - Software Delivery in Hybrid EnvironmentsWSO2Con2024 - Software Delivery in Hybrid Environments
WSO2Con2024 - Software Delivery in Hybrid EnvironmentsWSO2
 
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfAzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfryanfarris8
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2
 
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next IntegrationWSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next IntegrationWSO2
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 

Recently uploaded (20)

What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
 
WSO2Con2024 - Software Delivery in Hybrid Environments
WSO2Con2024 - Software Delivery in Hybrid EnvironmentsWSO2Con2024 - Software Delivery in Hybrid Environments
WSO2Con2024 - Software Delivery in Hybrid Environments
 
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfAzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
 
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next IntegrationWSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 

From Billions to Quintillions: Paving the way to real-time motif discovery in time series

  • 1. @JOTB19 From Billions to Quintillions: Paving the road to real-time pattern mining using the Matrix Profile Zachary Zimmerman
  • 2. Contents: 1. Introduction to the Matrix Profile 2. Applications of the Matrix Profile 3. Computing the Matrix Profile 4. Scaling the Matrix Profile 5. Future work
  • 3. What is the Matrix Profile?
  • 4. What is the Matrix Profile? • The Matrix Profile (MP) is a data structure that annotates a time series. • Key Claim: Given the MP, most time series data mining problems are trivial or easy! • There are many problems that are trivial given the MP, including motif discovery, density estimation, anomaly detection, rule discovery, joins, segmentation, clustering etc. However, you can use the MP to solve your problems, or to solve a problem listed above, but in a different way, tailored to your interests/domain.
  • 5. What is the Matrix Profile? (continued) • Key Insight: The MP has many highly desirable properties, and any algorithm you build on top of it will inherit those properties. • Say you use the MP to create: An Algorithm to Segment Sleep States. • Then, for free, you have also created: An Anytime Algorithm to Segment Sleep States; An Online Algorithm to Segment Sleep States; A Parallelizable Algorithm to Segment Sleep States; A GPU Accelerated Algorithm to Segment Sleep States; An Algorithm to Segment Sleep States with Missing Data.
  • 6. Developing a Visual Intuition for the Matrix Profile In the following slides we are going to develop a visual intuition for the matrix profile, without regard to how we obtain it. We are ignoring the elephant in the room; the MP seems to be much too slow to compute to be practical. I will address this later in the talk. Note that algorithms that use the MP do not require us to visualize the MP, but it happens that just visualizing the MP can be 99% of solving a problem.
  • 7. Intuition behind the Matrix Profile: Assume we have a time series T; let's start with a synthetic one... |T| = n = 3,000 [Figure: synthetic time series, x-axis 0 to 3,000]
  • 8. Note that for most time series data mining tasks, we are not interested in any global properties of the time series; we are only interested in small local subsequences of length m (here m = 100). These subsequences might be about the length of individual heartbeats (for ECGs), individual days (for social media behavior), individual words (for speech analysis), etc.
  • 9. We can create a companion “time series”, called a Matrix Profile or MP. The matrix profile at the ith location records the distance of the subsequence in T, at the ith location, to its nearest neighbor under z-normalized Euclidean Distance. For example, in the figure, the subsequence starting at 921 happens to have a distance of 177.0 to its nearest neighbor (wherever it is).
  • 10. We can create another companion sequence, called a matrix profile index (MPI). The MPI contains integers that are used as pointers. As a practical matter, even a signed 32-bit integer lets us index an MP of length 2,147,483,647, which is more than a year of data at 60Hz (an unsigned 32-bit integer doubles that to over two years); a 64-bit integer covers billions of years at 60Hz. In the following slides we won’t bother to show the matrix profile index, but be aware it exists, and it allows us to find the nearest neighbor to any subsequence in constant time. [Figure: zoom-in on the matrix profile index, showing integer pointers such as 1373, 1375, 1389, …]
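
The index-width arithmetic on slide 10 can be checked in a few lines. This is only a worked calculation: the 60Hz rate comes from the slide, while the helper name `years_addressable` and the 365.25-day year are assumptions made for illustration.

```python
# Worked arithmetic: how many years of 60 Hz data each index width can address.
# The helper name and the 365.25-day year are illustrative assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_addressable(max_index: int, hz: float = 60.0) -> float:
    """Years of data sampled at `hz` that fit in `max_index` positions."""
    return max_index / hz / SECONDS_PER_YEAR


int32_years = years_addressable(2**31 - 1)   # signed 32-bit maximum
uint32_years = years_addressable(2**32 - 1)  # unsigned 32-bit maximum
int64_years = years_addressable(2**63 - 1)   # signed 64-bit maximum

print(f"int32: {int32_years:.2f} years")
print(f"uint32: {uint32_years:.2f} years")
print(f"int64: {int64_years:.2e} years")
```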
  • 11. Why is it called the Matrix Profile? One naïve way to compute it would be to construct a distance matrix of all pairs of subsequences of length m. For each column, we could then “project” down the smallest (non-diagonal) value to a vector, and that vector would be the Matrix Profile. While in general we could never afford the memory to do this (4TB for just |T| = one million), for most applications the Matrix Profile is the only thing we need from the full matrix, and we can compute and store it very efficiently (as we will see later). Key: small distances are blue, large distances are red, and the dark stripe is excluded.
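
The naïve construction on slide 11 can be sketched directly for small inputs. This is a minimal illustration, not the method the talk recommends: the function names are hypothetical, the half-window exclusion zone around the diagonal is an assumed width for the "dark stripe", and the code assumes no subsequence is constant (so every standard deviation is non-zero).

```python
import numpy as np


def znorm_subsequences(ts: np.ndarray, m: int) -> np.ndarray:
    """Return all length-m subsequences of ts, each z-normalized."""
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)  # (n-m+1, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)  # assumed non-zero
    return (windows - mu) / sigma


def brute_force_matrix_profile(ts, m, exclusion=None):
    """Matrix profile and index via the full distance matrix (small n only)."""
    z = znorm_subsequences(np.asarray(ts, dtype=float), m)
    # Each z-normalized row has squared norm m, so
    # ||z_i - z_j||^2 = 2m - 2 * (z_i . z_j).
    sq = 2.0 * m - 2.0 * (z @ z.T)
    dist = np.sqrt(np.maximum(sq, 0.0))
    if exclusion is None:
        exclusion = m // 2  # assumed width of the excluded stripe
    idx = np.arange(len(z))
    dist[np.abs(idx[:, None] - idx[None, :]) <= exclusion] = np.inf
    mpi = dist.argmin(axis=0)  # "project down" the smallest value per column
    mp = dist[mpi, idx]
    return mp, mpi


# Toy series: random noise with the same pattern planted at 50 and 200.
rng = np.random.default_rng(0)
ts = rng.standard_normal(300)
pattern = 3.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 30))
ts[50:80] = pattern
ts[200:230] = pattern
mp, mpi = brute_force_matrix_profile(ts, m=30)
print(int(mpi[50]), int(mpi[200]))
```

The full matrix needs memory quadratic in the number of subsequences, which is exactly why the slide rules this approach out for long series.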
  • 12. How to “read” a Matrix Profile: Where you see relatively low values, you know that the subsequence in the original time series must have (at least one) relatively similar subsequence elsewhere in the data (such regions are “motifs” or reoccurring patterns). Where you see relatively high values, you know that the subsequence in the original time series must be unique in its shape (such areas are “discords” or anomalies). Figure annotations: the high region must be an anomaly in the original data (we call these Time Series Discords); the three low regions must be conserved shapes (motifs) in the original data.
  • 13. How to “read” a Matrix Profile: Synthetic Anomaly Example. Where you see relatively high values, you know that the subsequence in the original time series must be unique in its shape. In fact, the highest point is exactly the definition of Time Series Discord, perhaps the best anomaly detector for time series*. Figure annotation: there must be an anomaly in the original data, in this region. * Vipin Kumar performed an extensive empirical evaluation and noted that “..on 19 different publicly available data sets, comparing 9 different techniques (time series discords) is the best overall technique.” V. Chandola, D. Cheboli, V. Kumar. Detecting Anomalies in a Time Series Database. UMN TR09-004.
  • 14. How to “read” a Matrix Profile: Synthetic Motif Example. Where you see relatively low values, you know that the subsequence in the original time series must have (at least one) relatively similar subsequence elsewhere in the data. In fact, the lowest points must be a tying pair, and correspond exactly to the classic definition of time series motifs. Figure annotation: the corresponding subsequence in the raw data at this location must have at least one similar subsequence somewhere.
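
Reading the two extremes described on slides 12 to 14 off a computed profile takes one `argmax` and one `argmin`. The sketch below assumes the MP and MPI are already available as NumPy arrays; the function names and the tiny hand-made arrays are hypothetical.

```python
import numpy as np


def top_discord(mp: np.ndarray) -> int:
    """Index of the highest finite matrix profile value (the top discord)."""
    finite = np.where(np.isfinite(mp), mp, -np.inf)
    return int(np.argmax(finite))


def top_motif_pair(mp: np.ndarray, mpi: np.ndarray) -> tuple:
    """Lowest matrix profile value and the neighbor its index points to."""
    i = int(np.argmin(mp))
    return i, int(mpi[i])


# Hand-made toy profile: positions 1 and 4 are each other's nearest neighbor.
toy_mp = np.array([3.1, 0.4, 2.9, 7.5, 0.4, 3.3])
toy_mpi = np.array([2, 4, 0, 5, 1, 0])

discord = top_discord(toy_mp)
motif = top_motif_pair(toy_mp, toy_mpi)
print(discord, motif)
```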
  • 15. How to “read” a Matrix Profile: Now that we understand what a Matrix Profile is, and we have some practice interpreting them on synthetic data, let us spend the next five minutes to see some examples on real data. Note that we will typically create algorithms that use the Matrix Profile, without actually having humans look at it. Nevertheless, in many exploratory time series data mining tasks, just looking at the Matrix Profile can give us unexpected and actionable insights. Ready to begin?
  • 17. Taxi Example: Part I. Given a long time series, where should you examine carefully? The problem is called “Attention Prioritization”, and a group at Stanford is working on this [a]. However, we think that the Matrix Profile can be used for this, “for free”. The data is the hourly average of the number of NYC taxi passengers over 75 days in Fall of 2014. Let's compute the Matrix Profile for it; we choose a subsequence length corresponding to two days… (next slide) [a] http://futuredata.stanford.edu/ASAP/extended.pdf
  • 18. Taxi Example: Part II • The highest value corresponds to Thanksgiving (the uniqueness of Thanksgiving was the only thing the Stanford team noted). • We find a secondary peak around Nov 6th. What could it be? Daylight Saving Time! The clock going backwards one hour gives an apparent doubling of taxi load. • We find a tertiary peak around Oct 13th. What could it be? Columbus Day! Columbus Day is largely ignored in much of America, but is still a big deal in NY, with its large Italian American community.
  • 19. Taxi Example: Part III • What about the lowest values (the best motifs)? • They are exactly seven days apart, suggesting that in this dataset there might be a periodicity of seven days, in addition to the more obvious periodicity of one day.
  • 20. Electrocardiogram (MIT-BIH Long-Term ECG Database). The first discord: premature ventricular contraction. The second discord: ectopic beat. In this case there are two anomalies annotated by MIT cardiologists. The Matrix Profile clearly indicates them. Here the subsequence length was set to 150, but we still find these anomalies if we halve or triple that length.
  • 21. Seismology. If we see low values in the MP of a seismograph, it means there must have been a repeated earthquake. Repeated earthquakes can happen decades apart. Many fundamental problems in seismology, including the discovery of foreshocks, aftershocks, triggered earthquakes, swarms, volcanic activity and induced seismicity, can be reduced to the discovery of these repeated patterns. Figure annotations (seismic time series and its Matrix Profile, with a 20-second zoom-in): first event, Time: 19:23:48.44, Latitude: 37.57, Longitude: -118.86, Depth: 5.60, Magnitude: 1.29; second event, Time: 20:08:01.13, Latitude: 37.58, Longitude: -118.86, Depth: 4.93, Magnitude: 1.09. The corresponding subsequence in the raw data at this location must have at least one similar earthquake somewhere. Thanks to C. Yoon, O. O’Reilly, K. Bergen and G. Beroza of Stanford for this zoom-in.
  • 22. Summary. We could do this all day! We have applied the MP to hundreds of datasets. It is worth reminding ourselves of the following: • The MP can find structure in taxi demand, seismology, DNA, power demand, heartbeats, music and bird vocalization data. However, the MP does not “know” anything about these domains; it was not tweaked, tuned or adjusted in any way. • This domain-agnostic nature of the MP is a great strength: you typically get very good results “out-of-the-box” no matter what your domain. The following is also worth stating again: • We spent time looking at the MP to gain confidence that it is doing something useful. However, most of the time, only a higher-level algorithm will look at the MP, with no human intervention or inspection (we will make that clearer in the following slides).
  • 24. Computing the Matrix Profile: Distance Profile. Take the 1st subsequence in the time series as the Query, and obtain the z-normalized Euclidean distance between Query and each window (subsequence) in the time series. We would obtain a vector like this: $D_1 = [d_{1,1} (= 0), d_{2,1}, \ldots, d_{n-m+1,1}]$, where $d_{i,j}$ is the distance between the ith subsequence and the jth subsequence. We can obtain $D_2, D_3, \ldots, D_{n-m+1}$ similarly. Each distance profile has $O(nm)$ complexity using the naïve method, and $O(n \log n)$ complexity using FFTs with MASS.
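
The FFT route to a single distance profile can be sketched as follows. This is written in the spirit of MASS as the slide describes it, not taken from the published MASS code: the function names are hypothetical, rolling statistics come from cumulative sums, and the dot products are converted to z-normalized distances with the standard identity $d = \sqrt{2m(1 - \rho)}$, where $\rho$ is the Pearson correlation of the two windows.

```python
import numpy as np


def sliding_dot_product_fft(query: np.ndarray, ts: np.ndarray) -> np.ndarray:
    """Dot product of `query` with every window of `ts`, via FFT convolution."""
    n, m = len(ts), len(query)
    fft_len = 1 << (n + m - 1).bit_length()  # power of two > n + m - 1
    spectrum = np.fft.rfft(ts, fft_len) * np.fft.rfft(query[::-1], fft_len)
    conv = np.fft.irfft(spectrum, fft_len)
    return conv[m - 1:n]  # entries where the query fully overlaps ts


def rolling_mean_std(ts: np.ndarray, m: int):
    """Mean and population std of every length-m window, in O(n)."""
    c1 = np.cumsum(np.insert(ts, 0, 0.0))
    c2 = np.cumsum(np.insert(ts * ts, 0, 0.0))
    mean = (c1[m:] - c1[:-m]) / m
    var = (c2[m:] - c2[:-m]) / m - mean * mean
    return mean, np.sqrt(np.maximum(var, 0.0))


def mass_distance_profile(query, ts) -> np.ndarray:
    """z-normalized Euclidean distance from `query` to every window of `ts`."""
    query = np.asarray(query, dtype=float)
    ts = np.asarray(ts, dtype=float)
    m = len(query)
    qt = sliding_dot_product_fft(query, ts)
    mu_t, sigma_t = rolling_mean_std(ts, m)
    corr = (qt - m * query.mean() * mu_t) / (m * query.std() * sigma_t)
    return np.sqrt(np.maximum(2.0 * m * (1.0 - corr), 0.0))


rng = np.random.default_rng(1)
ts = rng.standard_normal(200)
m = 20
query = ts[37:37 + m]
profile = mass_distance_profile(query, ts)
print(int(np.argmin(profile)))
```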
  • 25. Computing the Matrix Profile: Distance Matrix. Stacking the distance profiles $D_1, D_2, \ldots, D_i, \ldots, D_{n-m+1}$ gives the distance matrix, where $d_{i,j}$ is the distance between the ith window and the jth window of the time series. The distance matrix is symmetric, the diagonal is zero, and cells close to the diagonal are very small. Matrix Profile: a vector of the distance between each subsequence and its nearest neighbor, $P_i = \min(D_i)$ for $i = 1, \ldots, n-m+1$.
  • 26. STAMP: Our (not completely) naïve method of computing the MP: MP(1:n-m+1) = inf; for i = 1:n-m+1, d = MASS(T, T(i:i+m-1)); MP = min([MP; d]); end. Matrix Profile: a vector of the distance between each subsequence and its nearest neighbor, $P_i = \min(D_i)$. Time complexity of STAMP is $O(n^2 \log n)$. Space complexity of STAMP is $O(n)$.
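
A Python rendering of the STAMP loop from slide 26 might look like the sketch below. Two things here are assumptions rather than content from the slide: the distance profile uses `np.correlate` (an O(nm) stand-in for MASS, chosen to keep the example short), and an exclusion zone of half a window is applied around each query, since without one every subsequence would match itself at distance 0. The names `distance_profile` and `stamp` are hypothetical.

```python
import numpy as np


def distance_profile(ts, i, m, mu, sigma):
    """z-normalized distances from subsequence i to every subsequence."""
    qt = np.correlate(ts, ts[i:i + m], mode="valid")  # sliding dot products
    corr = (qt - m * mu[i] * mu) / (m * sigma[i] * sigma)
    return np.sqrt(np.maximum(2.0 * m * (1.0 - corr), 0.0))


def stamp(ts, m, exclusion=None):
    """Matrix profile and index by merging one distance profile per query."""
    ts = np.asarray(ts, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mu, sigma = windows.mean(axis=1), windows.std(axis=1)
    n_sub = len(mu)
    if exclusion is None:
        exclusion = m // 2  # assumed exclusion zone width
    mp = np.full(n_sub, np.inf)
    mpi = np.full(n_sub, -1)
    for i in range(n_sub):
        d = distance_profile(ts, i, m, mu, sigma)
        d[max(0, i - exclusion):i + exclusion + 1] = np.inf  # skip self-matches
        better = d < mp  # elementwise min, as in MP = min([MP; d])
        mp[better] = d[better]
        mpi[better] = i
    return mp, mpi


rng = np.random.default_rng(2)
ts = rng.standard_normal(300)
pattern = 2.0 * np.sin(np.linspace(0.0, 3.0 * np.pi, 25))
ts[40:65] = pattern
ts[180:205] = pattern
mp, mpi = stamp(ts, m=25)
print(int(mpi[40]), int(mpi[180]))
```

Only the running `mp` and `mpi` vectors are kept between iterations, which is where the O(n) space bound on the slide comes from.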
  • 28. Scaling the Matrix Profile Calculation ● Over the last couple of years, we have slowly integrated insights from the high performance computing space into the matrix profile. ● Key Lessons Learned: ■ Optimization is an iterative process with many moving parts ■ Fast code comes as a result of being aware of considerations across the entire computing stack, from hardware to OS/application to the compiler ■ There are significant tradeoffs between speed and extensibility/readability; achieving all of these things requires a very good design and may not even be possible for some problems.
  • 29. Scaling the Matrix Profile calculation ● Performance for an input time series of length 2 million: ■ Initial CPU Implementation: 1 CPU thread -> 4.2 days ■ Initial GPU Implementation: K80 GPU -> 3.2 hours ■ Optimized CPU implementation: 4 CPU threads -> 6.5 minutes ■ Optimized GPU Implementation: V100 GPU -> 5 seconds ● Cloud implementation: a 40 GPU cluster allowed us to do 1 Billion in < 10 hours ■ This is on the order of 10^18 (a quintillion) pairwise comparisons ■ COST ~ 500 USD (~ 0.80 USD per quadrillion comparisons)
  • 30. Scaling the Matrix Profile Calculation ● These speedups came as the result of improvements in the following areas: ■ Better Algorithmic Complexity ■ Use of Modern Hardware ■ Use of Relevant Hardware Features ■ Architecture Aware Code ■ Tiling into subproblems ■ Algebraic Improvements to Problem Formulation
  • 31. Scaling the Matrix Profile Calculation Disclaimer: ● The following slides show some examples of ways we scaled our application. ● This is a very rough overview, in the interest of time. I won’t get too far into the details in this talk. ● If you are interested in more technical details and/or source code please see the reference slide at the end of this talk for links to our supporting webpage
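  • 32. Scaling the Matrix Profile calculation: Algorithmic Complexity. The dot products of neighboring subsequence pairs share all but one term: $QT_{i,j} = \sum_{k=0}^{m-1} t_{i+k} t_{j+k}$ and $QT_{i+1,j+1} = \sum_{k=1}^{m} t_{i+k} t_{j+k}$, so $QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$. This takes us from $O(N \log N)$ to $O(N)$ time complexity per subsequence, giving an $O(N^2)$ algorithm.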
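
The update $QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$ from slide 32 can be exercised along one diagonal of the matrix. In this minimal sketch the function name and the diagonal offset `k` (so that j = i + k) are illustrative choices; only the first dot product costs O(m), and every later cell costs O(1).

```python
import numpy as np


def qt_along_diagonal(ts, m: int, k: int) -> np.ndarray:
    """Dot products QT[i, i+k] for every valid i, using the O(1) update."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    out = np.empty(n_sub - k)
    out[0] = np.dot(ts[0:m], ts[k:k + m])  # the only O(m) step
    for i in range(n_sub - k - 1):
        j = i + k
        # Drop the oldest product and add the newest one.
        out[i + 1] = out[i] - ts[i] * ts[j] + ts[i + m] * ts[j + m]
    return out


rng = np.random.default_rng(3)
ts = rng.standard_normal(100)
m, k = 10, 7
diag = qt_along_diagonal(ts, m, k)
print(diag.shape)
```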
  • 33. SCAMP: Scalable Matrix Profile. Output Arrays: $P_1, P_2, P_3, \ldots, P_{n-m+1}$ and $I_1, I_2, I_3, \ldots, I_{n-m+1}$. Precomputed Arrays: $\mu_1, \mu_2, \mu_3, \ldots, \mu_{n-m+1}$; $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_{n-m+1}$; and the products $T_1T_1, T_1T_2, T_1T_3, \ldots, T_1T_{n-m}, T_1T_{n-m+1}$ and $T_1T_1, T_2T_1, \ldots, T_{i-1}T_1, T_iT_1, \ldots, T_{n-m}T_1, T_{n-m+1}T_1$ along the first row and first column of the matrix, which ends at $T_{n-m+1}T_{n-m+1}$. We can exploit the fact that our only dependency is along the diagonal of the distance matrix to speed up the computations. On the GPU we can assign each thread a diagonal and compute the distances along the diagonals. We can use a similar strategy to improve performance on the CPU.
  • 34. Scaling the Matrix Profile calculation: Algebraic Improvements. Old (Distance) formula: $d_{i,j} = \sqrt{2m\left(1 - \frac{QT_{i,j} - m\mu_i\mu_j}{m\sigma_i\sigma_j}\right)}$, with the statistics $\mu$ and $\sigma$ precomputed. Old update formula: $QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$.
  • 35. Scaling the Matrix Profile calculation: Algebraic Improvements. New (Correlation) formula: $P_{i,j} = QT_{i,j} \cdot \frac{1}{\lVert T_{i,m} - \mu_i \rVert} \cdot \frac{1}{\lVert T_{j,m} - \mu_j \rVert}$, with the norm factors precomputed. New update formula: $QT_{i,j} = QT_{i-1,j-1} + df_i\, dg_j + df_j\, dg_i$. Smaller intermediate values → Less cancellation error → More stable algorithm → single precision is an option. Fewer floating point operations → More efficient computation.
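
The correlation-form update on slide 35 can be sketched along one diagonal. The slide does not define $df$ and $dg$, so the definitions below are an assumption: $df_i = (t_{i+m-1} - t_{i-1})/2$ and $dg_i = (t_{i+m-1} - \mu_i) + (t_{i-1} - \mu_{i-1})$, with $df_0 = dg_0 = 0$. They make the update exact when $QT$ is read as the mean-centered dot product of the two windows. The function name is hypothetical.

```python
import numpy as np


def correlations_along_diagonal(ts, m: int, k: int) -> np.ndarray:
    """Pearson correlation of subsequence i with subsequence i+k, for all i."""
    ts = np.asarray(ts, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mu = windows.mean(axis=1)
    inv_norm = 1.0 / np.linalg.norm(windows - mu[:, None], axis=1)
    n_sub = len(mu)

    # Assumed definitions of the precomputed update terms (see lead-in).
    df = np.zeros(n_sub)
    dg = np.zeros(n_sub)
    df[1:] = (ts[m:] - ts[:-m]) / 2.0
    dg[1:] = (ts[m:] - mu[1:]) + (ts[:-m] - mu[:-1])

    cov = np.empty(n_sub - k)
    cov[0] = np.dot(windows[0] - mu[0], windows[k] - mu[k])  # O(m), once
    for i in range(1, n_sub - k):
        j = i + k
        cov[i] = cov[i - 1] + df[i] * dg[j] + df[j] * dg[i]  # O(1) update

    rows = np.arange(n_sub - k)
    return cov * inv_norm[rows] * inv_norm[rows + k]


rng = np.random.default_rng(4)
ts = rng.standard_normal(120)
m, k = 16, 20
corr = correlations_along_diagonal(ts, m, k)
print(corr.shape)
```

A correlation converts back to the z-normalized distance of slide 34 through $d_{i,j} = \sqrt{2m(1 - P_{i,j})}$, so only the final result needs the square root.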
  • 36. Scaling the Matrix Profile calculation: Architecture Awareness / Feature Utilization (GPU Example)
  • 37. Scaling the Matrix Profile calculation: Tiling and Distributed Computation [Diagram labels: Big Time Series, Mapper, Cloud GPU (AWS, GCP, Azure…), Reducer]
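
The tiling idea on slide 37 rests on one property: a matrix profile computed from any block of the distance matrix is a partial answer, and partial answers merge with an elementwise minimum. The sketch below illustrates only that property on a single machine with a brute-force tile kernel. It is not how SCAMP schedules work across GPUs, and the function names, the tile size and the half-window exclusion zone are assumptions.

```python
import numpy as np


def tile_profile(z, rows: slice, cols: slice, exclusion: int) -> np.ndarray:
    """Partial profile for columns `cols`, using only neighbors in `rows`."""
    m = z.shape[1]
    sq = 2.0 * m - 2.0 * (z[rows] @ z[cols].T)
    dist = np.sqrt(np.maximum(sq, 0.0))
    r = np.arange(rows.start, rows.stop)[:, None]
    c = np.arange(cols.start, cols.stop)[None, :]
    dist[np.abs(r - c) <= exclusion] = np.inf  # skip trivial matches
    return dist.min(axis=0)


def tiled_matrix_profile(ts, m: int, tile: int, exclusion=None) -> np.ndarray:
    """Matrix profile assembled from independent square tiles."""
    ts = np.asarray(ts, dtype=float)
    w = np.lib.stride_tricks.sliding_window_view(ts, m)
    z = (w - w.mean(axis=1, keepdims=True)) / w.std(axis=1, keepdims=True)
    n_sub = len(z)
    if exclusion is None:
        exclusion = m // 2  # assumed exclusion zone width
    mp = np.full(n_sub, np.inf)
    starts = range(0, n_sub, tile)
    for r0 in starts:  # "map": every (rows, cols) tile is independent
        for c0 in starts:
            rows = slice(r0, min(r0 + tile, n_sub))
            cols = slice(c0, min(c0 + tile, n_sub))
            partial = tile_profile(z, rows, cols, exclusion)
            mp[cols] = np.minimum(mp[cols], partial)  # "reduce": keep the min
    return mp


rng = np.random.default_rng(5)
ts = rng.standard_normal(250)
mp_tiled = tiled_matrix_profile(ts, m=20, tile=37)
mp_single = tiled_matrix_profile(ts, m=20, tile=10**6)  # one tile covers all
print(bool(np.allclose(mp_tiled, mp_single)))
```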
  • 38. Scaling the Matrix Profile: Results
      Dataset                  Parkfield 1B      Cascadia Subduction Zone
      Size                     1 Billion         1 Billion
      Tile Size                ~52M (1 month)    ~25M (2 weeks)
      Total GPU time           375.2 hours       375.3 hours
      Spot Job Time            2.5 days          10 hours 3 min
      Approximate Spot Cost    480 USD           620 USD
      Intermediate Data Size   102.2 GB          196.4 GB
    Spot cost is based on demand: GPU time was essentially the same in both experiments, but the spot price was higher during the Cascadia experiment. [Figure: Parkfield, 580 days @ 20Hz, with its Matrix Profile]
  • 39. What’s Next? • Many of these comparisons are redundant. Can we compress the time series somehow? Is there some representative subset of the data which will let us approximate the MP computation? • Use this approximation to perform pattern mining at the edge. • What else can we do with this computational pattern?
  • 40. Real-time pattern mining intuition • Be fast enough to operate in real-time • We want O(1) updates • Be able to operate with power/memory/disk constraints (IoT devices). [Figure labels: Compressed data representation; Time Series Subsequence; Matrix Profile Value 0.972]
  • 41. References • Supporting Webpage (papers can be found here): https://www.cs.ucr.edu/~eamonn/MatrixProfile.html • SCAMP source code: https://github.com/zpzim/SCAMP
  • 42. Chin-Chia Michael Yeh, Yan Zhu, Abdullah Mueen, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Liudmila Ulanova, Eamonn Keogh, Zachary Zimmerman, Nader S. Senobari, Philip Brisk, Shaghayegh Gharghabi, Kaveh Kamgar ..and others that have inspired us, forgive any omissions. Thanks for listening!