Scalable Discovery of Multi-Dimensional Time Series Shapelets
Nurjahan Begum¹, Jianhui Chen², Makoto Yamada² and Eamonn Keogh¹
¹University of California, Riverside   ²Yahoo Labs
Why Are Shapelets Important?
Introduction
Admissible Pruning – Entropy Pruning
State-of-the-Art Algorithms
Multi-Dimensional Shapelets
Time series shapelets are small, local patterns in a time series that are
highly predictive of a class and are thus very useful features for building
classifiers and for certain visualization and summarization tasks.
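A shapelet is compared with a time series by sliding it along the series and keeping the smallest distance to any window of the same length. The sketch below assumes plain (non-normalized) Euclidean distance; the function name `subsequence_distance` is illustrative and is not taken from the poster or from [1].

```python
import math
from typing import Sequence


def subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between `shapelet` and any equal-length
    window of `series`. No z-normalization is applied in this sketch."""
    m, n = len(series), len(shapelet)
    if n == 0 or n > m:
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf  # smallest squared distance seen so far
    for start in range(m - n + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(n))
        best = min(best, sq)
    return math.sqrt(best)


# The shapelet [1, 3, 1] occurs exactly at offset 1, so this prints 0.0.
print(subsequence_distance([0.0, 1.0, 3.0, 1.0, 0.0], [1.0, 3.0, 1.0]))
```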
[Figure: example time series from two classes, Horned Lizards and Turtles]
Exact Algorithm [1]
[Figure: orderline from 0 to ∞ holding the distances to a candidate shapelet, with a split point]
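In the exact algorithm [1], every training series is placed on the orderline at its distance from the candidate, and the candidate is scored by the information gain of the best split point. A minimal sketch, assuming string class labels and entropy in bits; `entropy` and `best_split_gain` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split_gain(orderline: List[Tuple[float, str]]) -> Tuple[float, float]:
    """Return (information gain, split threshold) of the best split point on an
    orderline of (distance, label) pairs. Returns (0.0, inf) if no split helps."""
    line = sorted(orderline)
    labels = [lab for _, lab in line]
    n, base = len(line), entropy(labels)
    best_gain, best_thr = 0.0, math.inf
    for k in range(1, n):
        if line[k - 1][0] == line[k][0]:
            continue  # no split point exists between tied distances
        gain = base - (k / n) * entropy(labels[:k]) - ((n - k) / n) * entropy(labels[k:])
        if gain > best_gain:
            best_gain, best_thr = gain, (line[k - 1][0] + line[k][0]) / 2
    return best_gain, best_thr


# Two class-A series lie close to the candidate, two class-B series far away.
print(best_split_gain([(0.5, "A"), (4.0, "B"), (1.0, "A"), (5.0, "B")]))  # (1.0, 2.5)
```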
Admissible Pruning – Early Abandon
[Figure: locked shapelet; full Euclidean distance vs. early-abandoned Euclidean distance]
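Early abandoning stops summing squared differences for a window as soon as the partial sum reaches the best-so-far value, because that window can no longer be the minimum. The sketch assumes the same plain sliding-window search as the exact algorithm (no normalization); `subsequence_distance_ea` is a hypothetical name. It returns exactly the same distance as the full computation, only with fewer arithmetic operations.

```python
import math
from typing import Sequence


def subsequence_distance_ea(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance from `shapelet` to any window of `series`,
    abandoning each window once it cannot beat the best so far."""
    m, n = len(series), len(shapelet)
    if n == 0 or n > m:
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf  # best-so-far squared distance
    for start in range(m - n + 1):
        acc = 0.0
        for i in range(n):
            acc += (series[start + i] - shapelet[i]) ** 2
            if acc >= best:
                break  # abandon: this window cannot improve on best
        else:
            best = acc  # loop finished without abandoning, so acc < best
    return math.sqrt(best)


print(subsequence_distance_ea([5.0, 5.0, 0.0, 0.0], [0.0, 0.0]))  # 0.0
```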
[Figure: orderline for the best-so-far (bsf) shapelet candidate, with its optimal split point]
Question: Of the 30,240 distinct ways the remaining five distances could be added to this line, could any of them result in an information gain better than the best so far?
Answer: A constant-time answer! Trick: consider the most optimistic scenarios and test them.
[Figure: optimistically assumed placements]
Information gain of the better of the two possibilities:
50% of the distance calculations are pruned!
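The optimistic test can be sketched for the two-class case: objects whose distances are not yet computed are placed at the two ends of the orderline (one class at the near end, the other at the far end), both assignments are tried, and the better gain serves as the optimistic estimate. If even that estimate does not beat the best so far, the candidate is abandoned. This assumes exactly two classes; `optimistic_gain` and `can_prune` are hypothetical names, and `-inf`/`inf` stand in for the ends of the orderline.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_gain(orderline: List[Tuple[float, str]]) -> float:
    """Information gain of the best split point on an orderline."""
    line = sorted(orderline)
    labels = [lab for _, lab in line]
    n, base, best = len(line), entropy(labels), 0.0
    for k in range(1, n):
        if line[k - 1][0] == line[k][0]:
            continue  # no split point between tied distances
        gain = base - (k / n) * entropy(labels[:k]) - ((n - k) / n) * entropy(labels[k:])
        best = max(best, gain)
    return best


def optimistic_gain(partial: Sequence[Tuple[float, str]], remaining: Dict[str, int],
                    classes: Tuple[str, str]) -> float:
    """Better gain of the two optimistic placements of the unseen objects."""
    a, b = classes
    gains = []
    for near, far in ((a, b), (b, a)):
        line = list(partial)
        line += [(-math.inf, near)] * remaining.get(near, 0)  # pushed to the near end
        line += [(math.inf, far)] * remaining.get(far, 0)     # pushed to the far end
        gains.append(best_gain(line))
    return max(gains)


def can_prune(partial, remaining, classes, bsf_gain: float) -> bool:
    """True if no completion of the orderline can beat the best-so-far gain."""
    return optimistic_gain(partial, remaining, classes) <= bsf_gain
```

For example, with partial orderline `[(1.0, "A"), (2.0, "B")]` and one unseen object of each class, the optimistic gain is 1.0, so the candidate survives against a best-so-far of 0.9.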
[Figure: multi-dimensional motion time series (Right wrist, Right Side of Pelvis, Right Ankle) for Subject A and Subject B]
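One simple way to compare a multi-dimensional shapelet with a multi-dimensional series is to slide all dimensions together and sum the squared differences across dimensions at each offset. This synchronized definition is an assumption for illustration, since the poster does not state which distance it uses; `md_shapelet_distance` is a hypothetical name.

```python
import math
from typing import List, Sequence


def md_shapelet_distance(series: List[Sequence[float]], shapelet: List[Sequence[float]]) -> float:
    """series: d dimensions of length m; shapelet: d dimensions of length n.
    All dimensions share the same start offset (synchronized assumption)."""
    d = len(series)
    if len(shapelet) != d:
        raise ValueError("series and shapelet must have the same number of dimensions")
    m, n = len(series[0]), len(shapelet[0])
    if n == 0 or n > m:
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(m - n + 1):
        acc = 0.0
        for dim in range(d):  # sum squared differences over every dimension
            for i in range(n):
                acc += (series[dim][start + i] - shapelet[dim][i]) ** 2
        best = min(best, acc)
    return math.sqrt(best)


# Both dimensions match exactly at offset 1.
print(md_shapelet_distance([[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]],
                           [[1.0, 2.0], [2.0, 1.0]]))  # 0.0
```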
Why Is Discovering Multi-Dimensional Shapelets Expensive?
Brute-force algorithm complexity:
The number of candidate shapelets is high; for the lagged shapelet problem, the candidate set is even larger.
A large number of distance computations is required.
Each distance computation takes longer, because the problem is now multi-dimensional.
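The growth of the candidate set can be illustrated with simple arithmetic. For N series of length m, each candidate length w contributes m - w + 1 windows per series, so allowing every length from 1 to m gives N·m(m+1)/2 candidates. The lagged count uses a hypothetical model in which dimension 0 fixes the start and each of the other d - 1 dimensions may shift by up to `lag` positions in either direction; it ignores clipping at the series boundaries, so it is an upper estimate. Both function names are illustrative.

```python
def count_candidates(n_series: int, length: int, min_len: int, max_len: int) -> int:
    """Number of single-dimension subsequence candidates of lengths min_len..max_len."""
    if not 1 <= min_len <= max_len <= length:
        raise ValueError("need 1 <= min_len <= max_len <= length")
    return n_series * sum(length - w + 1 for w in range(min_len, max_len + 1))


def count_lagged_candidates(n_series: int, length: int, min_len: int, max_len: int,
                            dims: int, lag: int) -> int:
    # Hypothetical lag model: (2 * lag + 1) choices of shift for each of the
    # dims - 1 non-anchor dimensions; boundary clipping is ignored.
    return count_candidates(n_series, length, min_len, max_len) * (2 * lag + 1) ** (dims - 1)


print(count_candidates(10, 100, 1, 100))               # 50500
print(count_lagged_candidates(10, 100, 1, 100, 3, 2))  # 1262500
```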
[Figure: hashtag frequency time series, x-axis in hours. Event: Iran Revolution (#newiran, #Montazeri, #Neda; dates marked: 16 June, 2009 and 20 June, 2009). Second event: #Taylor, #Kanye, #taylorswift, #kanyewest]
Overall time complexity: O(N²m⁴)
Lagged Shapelet
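A lagged shapelet lets the pattern in different dimensions occur at slightly different times, as with the related hashtags in the event example. The sketch assumes dimension 0 is the anchor and every other dimension may match anywhere within `lag` positions of the anchor's start; with `lag=0` it reduces to the synchronized multi-dimensional distance. This definition and the name `lagged_distance` are assumptions for illustration, not the poster's formulation.

```python
import math
from typing import List, Sequence


def _sq_dist(x: Sequence[float], start: int, pattern: Sequence[float]) -> float:
    """Squared Euclidean distance between `pattern` and x[start:start+len(pattern)]."""
    return sum((x[start + i] - p) ** 2 for i, p in enumerate(pattern))


def lagged_distance(series: List[Sequence[float]], shapelet: List[Sequence[float]],
                    lag: int) -> float:
    """Multi-dimensional shapelet distance where non-anchor dimensions may
    start up to `lag` positions before or after the anchor dimension."""
    d = len(series)
    if len(shapelet) != d or lag < 0:
        raise ValueError("dimension mismatch or negative lag")
    m, n = len(series[0]), len(shapelet[0])
    last = m - n  # last valid start position
    if n == 0 or last < 0:
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for s in range(last + 1):
        total = _sq_dist(series[0], s, shapelet[0])  # anchor dimension
        for dim in range(1, d):
            lo, hi = max(0, s - lag), min(last, s + lag)
            total += min(_sq_dist(series[dim], t, shapelet[dim]) for t in range(lo, hi + 1))
        best = min(best, total)
    return math.sqrt(best)


# The same bump appears one step later in dimension 1.
series = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
shapelet = [[0.0, 5.0], [0.0, 5.0]]
print(lagged_distance(series, shapelet, 0))  # 5.0
print(lagged_distance(series, shapelet, 1))  # 0.0
```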
Our Observations
[Figure: histogram of candidate-shapelet information gain (x-axis: Information Gain, y-axis: frequency); information gain of individual shapelets at Lag = 0, Lag = 1 and Lag = 2]
The number of useful shapelet candidates is small.
Improving only the useful shapelet candidates is enough.
References:
[1] Ye, L. and Keogh, E. Time Series Shapelets: A New Primitive for Data Mining. KDD, 2009.
Algorithm Outline
Phase 1:
Use a logistic-regression-type model to identify useful candidate shapelets.
Phase 2:
Consider only the useful candidate shapelets in the lagged scenario, with a fixed lag parameter (relaxed assumption).
Phase 3:
Estimate the optimal lag width, heuristically or theoretically.
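Phase 1 can be sketched under assumptions the poster does not spell out: each candidate shapelet becomes one feature (for example, the distance from each training series to that candidate), a plain unregularized logistic regression is fit by batch gradient descent on standardized features, and candidates are ranked by the magnitude of their weights. `fit_logistic` and `useful_candidates` are hypothetical names, and the choice of model details here is illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X: List[List[float]]) -> List[List[float]]:
    """Scale each column (one column per candidate) to zero mean, unit variance."""
    cols = []
    for col in zip(*X):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        cols.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*cols)]


def fit_logistic(X: List[List[float]], y: Sequence[int], lr: float = 0.1,
                 epochs: int = 500) -> Tuple[List[float], float]:
    """Unregularized logistic regression fit by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            gb += err
            for j in range(p):
                gw[j] += err * row[j]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return w, b


def useful_candidates(X: List[List[float]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k candidates whose standardized weights are largest in magnitude."""
    w, _ = fit_logistic(standardize(X), y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


# Column 0 separates the classes; column 1 has the same mean in both classes.
X = [[0.1, 5.0], [0.2, 3.0], [0.15, 4.0], [0.9, 4.5], [1.0, 3.5], [0.95, 4.0]]
y = [0, 0, 0, 1, 1, 1]
print(useful_candidates(X, y, 1))  # [0]
```

Only the selected indices would then go on to the lagged search of Phase 2.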
Types of Multi-Dimensional Shapelets
