Periodic pattern mining
1. Periodic Pattern Mining in Time Series Databases
Ashis Kumar Chanda
Swapnil Saha
Department of Computer Science and Engineering
University of Dhaka
2. Topics to be covered
- Introduction
- Time Series Database
- Key Terms
- Suffix Tree Generation
- Periodic Pattern Detection
- Conclusion
3. Introduction
What is a time-series database?
A time-series database consists of
sequences of values or events obtained
over repeated measurements of time,
typically at fixed time intervals (e.g., hourly, daily,
weekly).
4. A time series is a set of observations taken at
specified times
Consider a time series involving a variable Y
If the values y1, y2, y3 ... are observed
at times t1, t2, t3 ..., then we can
write Y as a function of time: Y = F(t)
5. Long-term movements
Cyclic movements
Seasonal movements
Irregular or random movements
We can denote these movements by the variables L, C, S, I
respectively
The time series variable is then Y = L + C + S + I (additive model)
or Y = L * C * S * I (multiplicative model)
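The two models above can be illustrated with a small sketch. The component values below are made up for illustration only; in the multiplicative case C, S and I are assumed to be ratios around 1 rather than absolute amounts.

```python
def additive(L, C, S, I):
    """Combine the four components point by point as Y = L + C + S + I."""
    return [l + c + s + i for l, c, s, i in zip(L, C, S, I)]


def multiplicative(L, C, S, I):
    """Combine the four components point by point as Y = L * C * S * I."""
    return [l * c * s * i for l, c, s, i in zip(L, C, S, I)]


# Hypothetical components over four time steps (additive model).
print(additive([10, 11, 12, 13], [0, 1, 0, -1], [2, -2, 2, -2], [0, 0, 1, 0]))
# -> [12, 10, 15, 10]

# Hypothetical components over two time steps (multiplicative model).
print(multiplicative([10, 20], [1.0, 1.5], [2.0, 0.5], [1.0, 1.0]))
# -> [20.0, 15.0]
```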
8. Periodicity in a Subsection of a Time Series
T = gbxy asdf abpq abmn
stPos = 8
endPos = 15
So the subsection is positions 8 to 15 of T: abpq abmn
9. Periodicity with Time Tolerance
We cannot always get noise-free time series data,
so we check a few positions around each expected position
of the target sequence
This extra allowance is known as the time tolerance (tt)
If X is a pattern with period p in T, then we check for X
at stPos, stPos+p±tt, stPos+2p±tt, ...
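A minimal sketch of this check, under a few assumptions: the function name `find_with_tolerance` is hypothetical, the first slot (stPos) must match exactly, and the expected slots stay on the grid stPos + k*p even when an earlier match was shifted. The noisy string in the example is made up.

```python
def find_with_tolerance(T, X, p, st_pos, end_pos, tt):
    """Return the positions where X is found at or near each expected slot.

    Expected slots are st_pos, st_pos + p, st_pos + 2p, ... up to end_pos.
    The first slot is matched exactly; later slots may be off by up to tt.
    """
    hits = []
    expected = st_pos
    while expected <= end_pos:
        slack = 0 if expected == st_pos else tt
        # Try the exact slot first, then offsets of growing distance.
        for delta in sorted(range(-slack, slack + 1), key=abs):
            pos = expected + delta
            if pos >= 0 and T.startswith(X, pos):
                hits.append(pos)
                break
        expected += p
    return hits


# Made-up noisy series: the third "ab" arrives one position late (9, not 8).
T = "abxyabpqzabmn"
print(find_with_tolerance(T, "ab", 4, 0, 9, tt=0))  # -> [0, 4]
print(find_with_tolerance(T, "ab", 4, 0, 9, tt=1))  # -> [0, 4, 9]
```

With tt = 0 the late occurrence is missed; with tt = 1 it is accepted as the match for the slot at position 8.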
10. A period in a time series may be represented
by a 5-tuple
(S, p, stPos, endPos, Conf)
S = the periodic pattern (a sequence of symbols)
p = the period: the pattern is checked every p characters
stPos, endPos = the starting and ending
positions of the segment in which the pattern is matched
Conf = confidence
11. Suppose T = abxy acpq abdd abmn
Then (ab, 4, 0, 11, 1) means:
look for the pattern ab in T from position 0 to
position 11, checking every 4 characters
a b x y a c p q a b d  d   abmn
0 1 2 3 4 5 6 7 8 9 10 11
12. Occurrence Vector:
a b c a b b a b b a $
0 1 2 3 4 5 6 7 8 9
Occurrence vector of a : (0 3 6 9)
Occurrence vector of ab : (0 3 6)
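As a rough illustration, an occurrence vector can be computed by a direct scan of the string. The helper name `occurrence_vector` is an assumption made for this sketch; it is a brute-force scan, not the suffix-tree method covered later in the slides.

```python
def occurrence_vector(T, X):
    """Return every 0-based position at which pattern X starts in T."""
    return [i for i in range(len(T) - len(X) + 1) if T.startswith(X, i)]


T = "abcabbabba$"
print(occurrence_vector(T, "a"))   # -> [0, 3, 6, 9]
print(occurrence_vector(T, "ab"))  # -> [0, 3, 6]
```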
13. Difference Vector:
a b c a b b a b b a $
0 1 2 3 4 5 6 7 8 9
Occurrence vector of a : (0 3 6 9)
Difference vector : (3 3 3)
Occurrence vector of bb : (4 7)
Difference vector : (3)
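A difference vector is simply the list of gaps between consecutive entries of an occurrence vector. A minimal sketch (the function name `difference_vector` is chosen here for illustration):

```python
def difference_vector(occurrences):
    """Return the gaps between consecutive positions of an occurrence vector."""
    return [b - a for a, b in zip(occurrences, occurrences[1:])]


print(difference_vector([0, 3, 6, 9]))  # occurrences of a  -> [3, 3, 3]
print(difference_vector([4, 7]))        # occurrences of bb -> [3]
```

Each entry of the difference vector is a candidate period for the pattern.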
14. How do we get a string from
a transactional database?
Discretization Technique
16. We need to define ranges or groups over the values in the DB
and represent each range by a unique
ASCII character
Suppose, in our previous example:
log in is denoted by a
log out is denoted by x
before log in is denoted by b
before log out is denoted by c
after log out is denoted by d
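The mapping above can be sketched as a lookup table. The event sequence in the example is hypothetical, and appending the terminator $ is an assumption made here so that the result matches the strings used on the suffix-tree slides.

```python
# Symbol table taken from the slide: one ASCII character per event group.
SYMBOLS = {
    "log in": "a",
    "log out": "x",
    "before log in": "b",
    "before log out": "c",
    "after log out": "d",
}


def discretize(events):
    """Turn a sequence of event labels into a string, ending with '$'."""
    return "".join(SYMBOLS[event] for event in events) + "$"


# Hypothetical sequence of events, in time order.
events = ["before log in", "log in", "before log out", "log out", "after log out"]
print(discretize(events))  # -> bacxd$
```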
19. 'abcabbabb$' has the following ten suffixes. We
can ignore the 10th suffix when generating the
suffix tree
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
9. b$
10. $
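The suffix list can be reproduced with a one-line slice over every starting position. This is a small sketch; the function name `suffixes` is chosen here for illustration.

```python
def suffixes(text):
    """Return all suffixes of text, longest first."""
    return [text[i:] for i in range(len(text))]


for number, suffix in enumerate(suffixes("abcabbabb$"), start=1):
    print(number, suffix)
```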
23. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
[Figure: suffix tree built from the strings listed above]
24. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
[Figure: suffix tree built from the strings listed above]
25. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
[Figure: suffix tree built from the strings listed above]
26. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
[Figure: suffix tree built from the strings listed above]
27. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
[Figure: suffix tree built from the strings listed above]
28. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
[Figure: suffix tree built from the strings listed above]
29. Strings:
1. abcabbabb$
2. bcabbabb$
3. cabbabb$
4. abbabb$
5. bbabb$
6. babb$
7. abb$
8. bb$
9. b$
[Figure: suffix tree built from the strings listed above]
30. abcabbabb$
Each leaf node holds
a number that represents
the starting position
of its suffix
Each intermediate node holds a number which
is the length of the substring read from the root
to that intermediate node
[Figure: suffix tree for abcabbabb$ with the leaf and intermediate-node numbers described above]
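The annotations described on this slide can be sketched with a naive, uncompressed suffix trie. This is an assumption-laden simplification: it takes O(n^2) time and space and is not the compact suffix tree of the slides, and the dictionary keys `depth`, `children` and `start` are names chosen for this sketch. It does show how the leaf numbers yield occurrence vectors.

```python
def build_suffix_trie(text):
    """Insert every suffix of text, one character per edge."""
    root = {"depth": 0, "children": {}, "start": None}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            child = node["children"].get(ch)
            if child is None:
                # depth = length of the substring read from the root
                child = {"depth": node["depth"] + 1, "children": {}, "start": None}
                node["children"][ch] = child
            node = child
        node["start"] = start  # leaf: starting position of this suffix
    return root


def occurrences(root, pattern):
    """Collect the leaf numbers below the node reached by reading pattern."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current["start"] is not None:
            found.append(current["start"])
        stack.extend(current["children"].values())
    return sorted(found)


trie = build_suffix_trie("abcabbabb$")
print(occurrences(trie, "ab"))  # -> [0, 3, 6]
print(occurrences(trie, "bb"))  # -> [4, 7]
```

Because the terminator $ occurs only once, every suffix ends at its own leaf, so the sorted leaf numbers under a node form the occurrence vector of the substring spelled out on the path to that node.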
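34. Input: a time series of size n
Output: positions of periodic patterns
Process:
for each occurrence vector of size k
  for each position j from 0 to k
    take stPos = occurrence j and find the candidate period p
    check each later occurrence that lies a multiple of p characters after stPos
    count the matches and compute the confidence
    add to the list if the confidence is at least the threshold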
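The process above can be sketched as follows. Several details are assumptions made for this sketch rather than taken from the slides: the candidate period is the gap to the next occurrence, endPos is the last occurrence, the start position itself is counted as a match, and confidence is the number of matches divided by the number of slots stPos, stPos + p, ... that fit before endPos. The function name `detect_periods` is hypothetical.

```python
def detect_periods(occ, threshold):
    """Scan one sorted occurrence vector for candidate periods.

    Returns a list of (p, st_pos, end_pos, confidence) tuples whose
    confidence is at least threshold.
    """
    results = []
    if len(occ) < 2:
        return results
    end_pos = occ[-1]
    for j in range(len(occ) - 1):
        st_pos = occ[j]
        p = occ[j + 1] - occ[j]  # candidate period from the difference vector
        # Count occurrences that fall exactly on the grid st_pos + m * p.
        count = sum(1 for pos in occ[j:] if (pos - st_pos) % p == 0)
        expected = (end_pos - st_pos) // p + 1  # slots available on that grid
        confidence = count / expected
        if confidence >= threshold:
            results.append((p, st_pos, end_pos, confidence))
    return results


print(detect_periods([0, 3, 6], 0.8))
# -> [(3, 0, 6, 1.0), (3, 3, 6, 1.0)]
print(detect_periods([0, 4, 7, 10], 0.8))
# -> [(3, 4, 10, 1.0), (3, 7, 10, 1.0)]
```

For the occurrence vector (0, 4, 7, 10) and a threshold of 0.8, this sketch rejects the candidate p = 4 starting at position 0 and keeps p = 3 starting at positions 4 and 7.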
35. abcabbabb$
ab - (0,3,6)
abb - (3,6)
bb - (4,7)
b - (1,4,5,7,8)
stPos = 0
endPos = 6
p = 3 - 0 = 3
Now check the occurrence vector of ab:
if a difference equals p,
increment the count
Check the confidence
Add to the pattern list if confidence >= Θ
36. abcdabcabcab$
ab - (0,4,7,10)
stPos = 0
endPos = 10
p = 4 - 0 = 4
Now check the occurrence vector of ab:
if a difference equals p,
increment the count
Only one match is found from position 0 to 10 with p = 4
37. abcdabcabcab$
ab - (0,4,7,10)
stPos = 4
endPos = 10
p = 7 - 4 = 3
Now check the occurrence vector of ab:
if a difference equals p,
increment the count
3 matches are found from position 4 to 10 with p = 3
39. - Elfeky proposed two separate algorithms, CONV and WARP,
to detect symbol and segment periodicity.
However, they do not handle periodicity within a sub-sequence,
and their complexities are O(n log n) and O(n^2) respectively
- The algorithm in Han's paper handles sub-sequences,
but it needs user input
40. - From this perspective, the algorithm discussed
here improves on the previous ones
- Complexity: O(n log n)
- Works online
41. References
- Periodic pattern mining using suffix tree,
by Rasheed, Al-Shalalfa, & Alhajj, 2011
- Effective periodic pattern mining in time series database,
by Nishi, Farhan, Samiullah, Jeong
- Data Mining: Concepts and Techniques,
by J. Han & M. Kamber
- Database System Concepts,
by Abraham Silberschatz, Korth, Sudarshan