1 I NAME OF PRESENTER
An Efficient Approach to Mine Flexible Periodic Patterns in Time Series Databases
Supervised by
Dr. Chowdhury Farhan Ahmed (Associate Professor)
Md. Samiullah (Lecturer)
Presented by
Ashis Kumar Chanda
Swapnil Saha
Department of Computer Science and Engineering
University of Dhaka
Topics to be covered
1. Introduction
2. Problem Definitions
3. Existing Algorithms
4. Motivation
5. Contribution
6. The Proposed Algorithm
7. Experimental Results
8. Conclusion
Introduction
Data Mining:
- Extracting hidden patterns or structure
- Gaining information from huge data
Example:
The periodic amount of money withdrawn in fixed time intervals from an ATM booth at a specific location
Day | Time slot | Money amount (million)
Sun | 12 am – 8 am | 2
Sun | 8 am – 4 pm | 6
Sun | 4 pm – 12 am | 9
Mon | 12 am – 8 am | 1.2
Mon | 8 am – 4 pm | 12
Mon | 4 pm – 12 am | 9
Thu | 12 am – 8 am | 1.5
Thu | 8 am – 4 pm | 3
Thu | 4 pm – 12 am | 4.5
Introduction (cont.)
Flexible Periodic Pattern:
A pattern that skips one or a few particular intermediate characters or events that are not interesting from the user's point of view.
Example:
Consider T = {abc adc abc}. The segments F = 'abc' and 'adc' are both captured by the flexible pattern 'a*c', where '*' stands for any unimportant intermediate event.
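A minimal Python sketch of the wildcard idea, assuming '*' matches exactly one arbitrary event (the slide does not say whether it may also match zero or several events). The helper name `matches` is hypothetical.

```python
def matches(pattern: str, segment: str) -> bool:
    """True if segment fits pattern, where '*' matches any single event."""
    if len(pattern) != len(segment):
        return False
    # Every position must either be a wildcard or match the event exactly.
    return all(p == "*" or p == s for p, s in zip(pattern, segment))


# T = {abc adc abc}, split into its three segments
T = ["abc", "adc", "abc"]
print([matches("a*c", seg) for seg in T])  # [True, True, True]
print([matches("abc", seg) for seg in T])  # [True, False, True]
```

The flexible pattern covers all three segments, while the rigid pattern 'abc' misses 'adc'.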
Problem Definition
Flexible Pattern Mining:
Given a time series database represented as a sequence of n characters or events, S = {e1, e2, e3, ..., en}, a user-specified maximum event-skipping threshold ϴ, and a support threshold σ, mine all possible flexible periodic sequences of events
FP = {e1, e2, e3, ..., ei} ∈ S, where i ≤ n,
that satisfy σ, considering a variable starting position st and allowing at most ϴ unimportant intermediate events.
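One way to read "at most ϴ unimportant intermediate events" is as a limit on how many interior events of a candidate may be replaced by '*'. The sketch below enumerates such variants by brute force; it assumes wildcards replace interior events only (so a variant starts and ends with a concrete event, as in 'a*c'), and the name `flexible_variants` is hypothetical. It is not the SSES-tree method proposed in these slides.

```python
from itertools import combinations


def flexible_variants(sequence: str, theta: int) -> list[str]:
    """Replace up to `theta` interior events of `sequence` with '*'."""
    interior = range(1, len(sequence) - 1)  # first and last events stay concrete
    variants = []
    for k in range(theta + 1):  # k = number of skipped events, 0..theta
        for positions in combinations(interior, k):
            chars = list(sequence)
            for pos in positions:
                chars[pos] = "*"
            variants.append("".join(chars))
    return variants


print(flexible_variants("abc", 1))   # ['abc', 'a*c']
print(flexible_variants("abcd", 2))  # ['abcd', 'a*cd', 'ab*d', 'a**d']
```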
Existing Algorithms
Most notable algorithms:
Algorithm | Mechanism | Authors | Year | Drawbacks
Effective periodic pattern mining | Apriori-based sequential pattern mining | Nishi et al. | 2013 | Huge candidate set, false pattern generation
CONV | Convolution process | M. G. Elfeky et al. | 2005 | Fails in the presence of insertions and deletions
WARP | Time warping technique | M. G. Elfeky et al. | 2005 | Only detects segment periodicity
STNR | Suffix tree | Rasheed et al. | 2010 | Cannot skip intermediate events
Motivation
- Avoid the Apriori-based approach
- Vary the starting positions of generated sequences
- Detect all three types of periodicity in one run
Contribution
- Developed a new algorithm that uses a suffix-tree-like data structure to generate flexible periodic patterns
- Proposed a new periodicity detection algorithm
- Capable of mining all three types of periodicity in a single run
- Considered variable starting positions in the given time series sequence
- Reduced redundant patterns
Terms & Definitions
T = {acbd afbd agbd} (positions 0 to 11, period = 4)
Occurrence vector:
• occ_vec[a] = [0, 4, 8]
• occ_vec[c] = [1]
• occ_vec[b] = [2, 6, 10]
Perfect periodicity = ⌈(endpos − stpos + 1) / period⌉
Confidence = actual periodicity / perfect periodicity
Confidence of 'a' (stpos = 0, endpos = 11):
• actual periodicity = 3
• perfect periodicity = 3
• Confidence = 3 / 3 = 100%
Confidence of 'c' (stpos = 1, endpos = 11):
• actual periodicity = 1
• perfect periodicity = 3
• Confidence = 1 / 3 ≈ 33%
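The two definitions can be checked with a short Python sketch. It assumes period = 4, endpos = the last position of T (11), stpos = the event's first occurrence, and that perfect periodicity is rounded up, which is what the slide's value of 3 for 'c' requires. The function names are hypothetical.

```python
from math import ceil


def occurrence_vectors(series: str) -> dict[str, list[int]]:
    """Map each event to the sorted list of positions where it occurs."""
    occ_vec: dict[str, list[int]] = {}
    for pos, event in enumerate(series):
        occ_vec.setdefault(event, []).append(pos)
    return occ_vec


def confidence(positions: list[int], period: int, stpos: int, endpos: int) -> float:
    """actual periodicity / perfect periodicity for one event."""
    # Perfect periodicity: number of slots stpos, stpos + period, ... up to endpos.
    perfect = ceil((endpos - stpos + 1) / period)
    hits = set(positions)
    # Actual periodicity: how many of those slots the event really occupies.
    actual = sum(1 for k in range(perfect) if stpos + k * period in hits)
    return actual / perfect


T = "acbdafbdagbd"  # T = {acbd afbd agbd}
occ_vec = occurrence_vectors(T)
print(occ_vec["a"], occ_vec["c"], occ_vec["b"])    # [0, 4, 8] [1] [2, 6, 10]
print(confidence(occ_vec["a"], 4, 0, len(T) - 1))  # 1.0
print(confidence(occ_vec["c"], 4, 1, len(T) - 1))  # 0.3333333333333333
```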
Terms & Definitions (cont.)
[Figure: SSES tree for T = {abb$}, nodes A1 to A9, edges labelled a, b and $]
Length vector:
• len_vec[A2] = [3]
• len_vec[A6] = [2, 1]
Ladder factor (support threshold σ = 50%):
lad_fact = n-th max(len_vec), where n = ⌈size of len_vec × σ⌉
• lad_fact[A2] = 3
• lad_fact[A6] = 2
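A small Python sketch of the ladder factor rule. Rounding n up (and keeping it at least 1) is an assumption: with len_vec[A2] = [3] and σ = 50%, size × σ = 0.5, and the slide's answer of 3 only follows if n becomes 1. The function name is hypothetical, and σ is assumed to be at most 1.

```python
from math import ceil


def ladder_factor(len_vec: list[int], sigma: float) -> int:
    """n-th largest value of len_vec, with n = ceil(size of len_vec * sigma)."""
    n = max(1, ceil(len(len_vec) * sigma))  # assumption: round n up, at least 1
    return sorted(len_vec, reverse=True)[n - 1]


sigma = 0.5  # support threshold σ = 50%
print(ladder_factor([3], sigma))     # 3  (lad_fact[A2])
print(ladder_factor([2, 1], sigma))  # 2  (lad_fact[A6])
```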
The Proposed Algorithm
Key Features:
- Apply a discretization technique to the given database
- Construct the Single Symbol Edge based Suffix (SSES) tree
- Calculate occurrence vectors during construction
- Traverse the tree level-wise
- Mine patterns following the joining property
- Check each generated pattern with the proposed periodicity detection algorithm
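The slides do not say which discretization technique is used. As an illustration only, the sketch below assumes simple equal-width binning and turns the ATM withdrawal amounts from the introduction into a symbol sequence; the function name, the bin count, and the alphabet are all hypothetical choices, not the authors' encoding.

```python
def discretize(values: list[float], n_bins: int, alphabet: str = "abcdefghij") -> str:
    """Map each numeric value to a symbol using equal-width bins (assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    symbols = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        symbols.append(alphabet[min(idx, n_bins - 1)])  # clamp the maximum into the last bin
    return "".join(symbols)


# Withdrawal amounts (million) from the ATM example, in table order
amounts = [2, 6, 9, 1.2, 12, 9, 1.5, 3, 4.5]
print(discretize(amounts, 3))  # abcaccaaa
```

The resulting string can then be fed to the tree construction like any other event sequence.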
SSES Tree Construction
T = {abcabbabb$}, Period = 3
[Figure: SSES tree for T, built from the root with a single symbol (a, b, c or $) on each edge]
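A minimal Python sketch of a tree with one symbol per edge, assuming each node's occ_vec stores the start positions of the pattern spelled from the root to that node (this reading reproduces the slides' b [1, 4, 5, 7, 8] and bb [4, 7]). It is a naive suffix trie that inserts every suffix, so it can take quadratic time and space; it is not Ukkonen's linear-time construction and not necessarily the authors' implementation. `Node` and `build_sses_tree` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    occ_vec: list[int] = field(default_factory=list)  # start positions of this path's pattern


def build_sses_tree(series: str) -> Node:
    """Insert every suffix, one symbol per edge, recording start positions."""
    root = Node()
    for start in range(len(series)):
        node = root
        for symbol in series[start:]:
            node = node.children.setdefault(symbol, Node())
            node.occ_vec.append(start)  # occurrence vector filled during construction
    return root


root = build_sses_tree("abcabbabb$")
b = root.children["b"]
print(b.occ_vec)                                      # [1, 4, 5, 7, 8]
print({e: n.occ_vec for e, n in b.children.items()})  # {'c': [1], 'b': [4, 7], 'a': [5], '$': [8]}
```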
Algorithm Demonstration (σ = 50%)
Occurrence vector calculation:
Unique event | occ_vec
b | [1, 4, 5, 7, 8]
Confidence calculation:
Pattern | occ_vec | confidence | status
b | [1, 4, 7] | 100% | √
Patterns:
Pattern | occ_vec
b | [1, 4, 5, 7, 8]
[Figure: SSES tree for T = {abcabbabb$} with node labels L1, L4, L5, L7, L8]
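The periodicity check on this slide can be sketched as follows, under several assumptions that are ours rather than the slides': endpos = 8 (the last position before '$'), every occurrence is tried as a variable starting position, and the run with the highest confidence is kept (ties keep the earliest start). `periodic_check` is a hypothetical name, and the sketch is not claimed to reproduce every confidence value shown in this demonstration.

```python
from math import ceil


def periodic_check(occ_vec: list[int], period: int, endpos: int,
                   sigma: float) -> tuple[list[int], float, bool]:
    """Return (periodic positions, confidence, passes σ) for one pattern."""
    if not occ_vec:
        return [], 0.0, False
    hits = set(occ_vec)
    best_run, best_conf = [], -1.0
    for stpos in occ_vec:  # variable starting position: try every occurrence
        perfect = ceil((endpos - stpos + 1) / period)
        run = [stpos + k * period for k in range(perfect) if stpos + k * period in hits]
        conf = len(run) / perfect
        if conf > best_conf:  # strict '>' so ties keep the earliest start
            best_run, best_conf = run, conf
    return best_run, best_conf, best_conf >= sigma


# 'b' in T = abcabbabb$, period 3, σ = 50%
print(periodic_check([1, 4, 5, 7, 8], period=3, endpos=8, sigma=0.5))  # ([1, 4, 7], 1.0, True)
```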
Algorithm Demonstration (cont., σ = 50%)
Occurrence vector calculation (events following 'b'):
Unique event | occ_vec
b | [4, 7]
a | [5]
c | [1]
Confidence calculation (√ = accepted, χ = rejected):
Pattern | occ_vec | confidence | status
bb | [4, 7] | 100% | √
ba | [5] | 100% | √
bc | [1] | 33% | χ
Patterns:
Pattern | occ_vec
b | [1, 4, 5, 7, 8]
bb | [4, 7]
ba | [5]
Join: b* | [1, 4, 5, 7]
[Figure: SSES tree for T = {abcabbabb$} with node labels L1, L4, L5, L7, L8]
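The joined entry b* [1, 4, 5, 7] equals the union of the start positions of bc [1], bb [4, 7] and ba [5], with the 'b$' branch (position 8) left out. A minimal Python sketch of that reading; excluding the '$' branch is inferred from the slide values, and `join_wildcard` is a hypothetical name.

```python
def join_wildcard(children: dict[str, list[int]], end_symbol: str = "$") -> list[int]:
    """Occurrence vector of 'x*': union of the start positions of all children of x."""
    merged: set[int] = set()
    for event, occ in children.items():
        if event != end_symbol:  # assumption: the terminal '$' branch is not joined
            merged.update(occ)
    return sorted(merged)


# Children of 'b' in the SSES tree for T = abcabbabb$ (starts of bc, bb, ba, b$)
children_of_b = {"c": [1], "b": [4, 7], "a": [5], "$": [8]}
print(join_wildcard(children_of_b))  # [1, 4, 5, 7]
```

Note that 'bc' contributes to b* even though it was rejected at its own level, which matches the slide values.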
Algorithm Demonstration (cont., σ = 50%)
Occurrence vector calculation:
Unique event | occ_vec
a | [1]
a | [1, 4]
b | [5]
Confidence calculation:
Pattern | occ_vec | confidence | status
bba | [4] | 50% | √
baa | [] | 0% | χ
bab | [5] | 100% | √
bbb | [] | 0% | χ
b*a | [1, 4] | 66% | √
b*b | [5] | 100% | √
Patterns:
Pattern | occ_vec
ba | [5]
Join: b* | [1, 4, 5, 7]
bb | [4, 7]
b*a | [1, 4]
bab | [5]
bba | [4]
b*b | [5]
[Figure: SSES tree for T = {abcabbabb$} with node labels L1, L4, L5, L7, L8]
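The level-3 occurrence vectors can be reproduced by extending each level-2 pattern by one event: keep only the start positions whose next symbol is that event. For brevity this sketch reads symbols from the raw series instead of walking tree nodes (an assumption), and it counts '*' as one event when computing the offset. `extend` is a hypothetical name.

```python
def extend(pattern: str, occ_vec: list[int], event: str, series: str) -> list[int]:
    """Start positions of pattern + event: keep starts whose next symbol is `event`."""
    nxt = len(pattern)  # offset of the appended event from the pattern's start
    return [s for s in occ_vec if s + nxt < len(series) and series[s + nxt] == event]


T = "abcabbabb$"
level2 = {"bb": [4, 7], "ba": [5], "b*": [1, 4, 5, 7]}
for pattern, occ in level2.items():
    for event in "ab":
        print(pattern + event, extend(pattern, occ, event, T))
# bba [4], bbb [], baa [], bab [5], b*a [1, 4], b*b [5]
```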
Final Result
Mined patterns for T = {abcabbabb$}:
Pattern | occ_vec
a | [0, 3, 6]
ab | [0, 3, 6]
abb | [3, 6]
a*b | [3, 6]
b | [1, 4, 7]
bb | [4, 7]
ba | [5]
b*a | [1, 4]
bab | [5]
b*b | [5]
bba | [4]
c | [2]
ca | [2]
cab | [2]
c*b | [2]
[Figure: SSES tree for T, root node A1]
Experimental Result
Conclusion
Future Works:
- Improve the proposed procedure so it can be compared in terms of noise-resilient features
- Develop an efficient way to execute the algorithm in parallel on time series databases
- Reduce memory consumption
Summary:
- Mine flexible periodic patterns using a suffix-tree-like structure
- Improve performance by pruning the tree
- Consider variable starting positions in the given time sequence
References
1. Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. Periodicity detection in time series databases. IEEE Trans. Knowl. Data Eng., 17(7):875-887, 2005.
2. Faraz Rasheed, Mohammed Al-Shalalfa, and Reda Alhajj. Adapting machine learning technique for periodicity detection in nucleosomal locations in sequences. In IDEAL, pages 870-879, 2007.
3. Manziba Akanda Nishi, Chowdhury Farhan Ahmed, Md. Samiullah, and Byeong-Soo Jeong. Effective periodic pattern mining in time series databases. Expert Syst. Appl., 40(8):3015-3027, 2013.
4. Mohamed G. Elfeky, Walid G. Aref, and Ahmed K. Elmagarmid. WARP: Time warping for periodicity detection. In ICDM, pages 138-145, 2005.
5. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
6. Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
7. Piotr Indyk, Nick Koudas, and S. Muthukrishnan. Identifying representative trends in massive time series data sets using sketches. In VLDB, pages 363-372, 2000.
8. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In FOCS, pages 596-604, 1999.
9. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Meichun Hsu. PrefixSpan: Mining sequential patterns by prefix-projected growth. In ICDE, pages 215-224, 2001.
10. Faraz Rasheed, Mohammed Al-Shalalfa, and Reda Alhajj. Efficient periodicity mining in time series databases using suffix trees. IEEE Trans. Knowl. Data Eng., 23(1):79-94, 2011.
11. Faraz Rasheed and Reda Alhajj. STNR: A suffix tree based noise resilient algorithm for periodicity detection in time series databases. Appl. Intell., 32(3):267-278, 2010.
12. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995.
13. Andreas S. Weigend and Neil A. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1994.
14. Huei-Wen Wu and Anthony J. T. Lee. Mining closed flexible patterns in time-series databases. Expert Syst. Appl., 37(3):2098-2107, 2010.
15. Ramakrishnan Srikant and Rakesh Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT, pages 3-17, 1996.
16. Anthony K. H. Tung, Hongjun Lu, Jiawei Han, and Ling Feng. Breaking the barrier of transactions: Mining inter-transaction association rules. In KDD, pages 297-301, 1999.
17. Chang Sheng, Wynne Hsu, and Mong-Li Lee. Mining dense periodic patterns in time series data. In ICDE, page 115, 2006.
18. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov., 1(3):259-289, 1997.
19. Sheng Ma and Joseph L. Hellerstein. Mining partially periodic event patterns with unknown periods. In ICDE, pages 205-214, 2001.
20. Earl F. Glynn, Jie Chen, and Arcady R. Mushegian. Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3):310-316, 2006.
21. Walid G. Aref, Mohamed G. Elfeky, and Ahmed K. Elmagarmid. Incremental, online, and merge mining of partial periodic patterns in time-series databases. IEEE Trans. Knowl. Data Eng., 16(3):332-342, 2004.
Questions?
Thank You

FPPM algorithm
