SlideShare a Scribd company logo
1 of 6
Download to read offline
Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3285
Motifs In Time Series For Prediction
A naïve approach compared to ARIMA
Nertila Ismailaja
Programme Analyst
Armundia Factory
Tirana, Albania
n.ismailaja@armundiafactory.com
Abstract—Time series is a subject that
includes two key factors - observations and time.
It is obvious that observations are time-
dependent. Another interest field is motif
discovery, composed solely by time series
subsequences. During this time, loads of
similarity measures have been presented. In their
article, Dhamo et al [5] concluded that the best
performance according to the quality in motif
discovery was achieved by Chouakria’s index with
CID (Chouakria’s index, proposed by Chouakria et
al [4] and CID is proposed by Batista et al [2]).
The following step is to use this distance and
other time series features to make predictions.
Various tests are made over time series with high
level of complexity. The results achieved by this
approach are compared to ARIMA models. All the
tests are made in R.
Keywords—motif discovery;ARIMA;time series;
forecasting; Chouakria with CID; R
I. INTRODUCTION
Motif discovery has been widely exploited in
Biological sciences, to detect anomalies in heart
beatings, pressure, etc. The effort was to deal with a
large amount of data (known as big data). The first
approach was made by reducing data dimensions’
[1][3]. Some methods were Discrete Fourier
Transformation[1], Discrete Wavelet Transformation[3],
Piecewise Linear Approximation [12], etc. The first
scientists to formally use motif discovery in time series,
were Agrawal et al [3], Lin et al [6], Mueen[7][8][9][10],
etc. Since then, several tactics have been proposed.
The studies have always followed the similarity search
throughout the time series, regarding to the problem.
The most recent method is ε-queries, proposed as a
probability approach to motif discovery. An expansion
of motif discovery is to compare time series measures
to each-other, based in four criterions - algorithm
complexity, number of discovered motifs, accuracy and
quality. In their article, Dhamo et al. [4] concluded that
Chouakria index with CID gave better results than CID
alone. Taking in consideration that Chouakria index
with CID is a well-performing similarity measure, we
move on in the following step- applying motif discovery
in forecasting. There are used several well-known time
series, for each of which is created a model, and then
compared to the model given by ARIMA (to construct
the model was used a predefined package of R,
”forecast”). According to the results, by using as norm
error in forecasting, we can deduce that our model
provides better performance.
II. TIME SERIES
A. Basic Concepts
A time series A time series may be defined as a
collection of data, where time is a relevant component.
Definition 1 A time series T is an ordered collection
of data, observed in n - regular intervals of time
[𝑇1, 𝑇2, … , 𝑇𝑛].
Definition 2 A motif M of length m in a time series T
of length n is a subsequence of 𝑇 which repeats itself
in T.
Definition 3 In ε-query search for similarity between
two time series’ subsequences 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and
𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚] of length m, in a time series T of
length n, P and Q are considered similar if the distance
between P and Q is laid in an interval with absolute
error equal to ε .An illustration of a motif is given in Fig.
1., with dataset unemp.
Fig.1. Algorithm for motif discovery
B. Distances Used For Time Series
While comparing two notions, it is indispensable to
use a criterion for comparison, in order to claim
whether these notions mean or not the same. In time
series, this criterion is undoubtedly the similarity
measure, which is based in the concept of the
distance.
Definition 4 A distance is a function that complies
with the following properties:
 Identity: ∀𝑥, 𝑦 ∈ 𝑅, 𝑑(𝑥, 𝑦) = 0 ↔ 𝑥 = 𝑦
 Symmetry: ∀𝑥, 𝑦 ∈ 𝑅, 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥)
 Non-negativity: ∀𝑥, 𝑦 ∈ 𝑅, 𝑑(𝑥, 𝑦) ≥ 0
 Transitive: ∀𝑥, 𝑦 ∈ 𝑅, ∃𝑧 ∈ 𝑅|𝑑(𝑥, 𝑦) ≤
𝑑(𝑥, 𝑧) + 𝑑(𝑧, 𝑦)
Definition 5 A similarity measure is a function that
disobeys at least one of the conditions to be distance.
0 100 200 300
200500800
CID
Index
Observedvalues
Time Series
Motif
Similar subseq
2 6 12
200500800
Similar subseq
Motif's length
Values
Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3286
In other words, a similarity measure is able to
measure the similarity between two time series’
subsequences. Reported as a key factor in the quality
and quantity in motif discovery, there are several
similarity measures. In contrary to its’ wide usage,
Euclid distance provides dissatisfactory results if it is
used as a similarity measure.
Definition 6 Euclid distance between two
subsequences with length m, 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and
𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚], is the index, measured as below:
𝑑 𝐸𝑢𝑐(𝑃, 𝑄) = ∑ (𝑃𝑖 − 𝑄𝑖)2𝑚
𝑖=1 (1)
As we mentioned above, the most performing
similarity measure is Chouakria’s index with CID. In
this case, it is therefore necessary to explain what is
CID, and then to give the definition of Chouakria’s
index.
Definition 7 CID distance between two sub-
sequences with length m, 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and
𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚], is the index, measured as below:
𝑑 𝐶𝐼𝐷(𝑃, 𝑄) = 𝑑 𝐸𝑢𝑐𝑙(𝑃, 𝑄)
max{𝐶𝐸(𝑃),𝐶𝐸(𝑄)}
min{𝐶𝐸(𝑃),𝐶𝐸(𝑄)}
(2)
, where:
𝐶𝐸(𝑃) = ∑ (𝑃𝑖 − 𝑃𝑖+1)2𝑚−1
𝑖=1 (3)
This similarity measure is used for nonlinear time
series subsequences. One main feature of CID is that,
in case of two nonlinear subsequences P, Q of length
m:
𝑑 𝐶𝐼𝐷(𝑃, 𝑄) > 𝑑 𝐸𝑢𝑐𝑙(𝑃, 𝑄) (4)
Moreover, the closer to 1 is the fraction
max{𝐶𝐸(𝑃),𝐶𝐸(𝑄)}
min{𝐶𝐸(𝑃),𝐶𝐸(𝑄)}
, the more similar is the behavior of the
two subsequences.
Another similarity measures is Chouakria’s index.
This measure is composed by two factors, one of
which is responsible for behavior and the other for
proximity of values[]. Chouakria’s index is measured,
as below:
Definition 8 Chouakria’s index between two sub-
sequences with length m, 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and
𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚], is the index, measured as below:
𝑑 𝐶𝐼𝐷(𝑃, 𝑄) =
2
1+𝑒 𝑘∗𝛿(𝑃,𝑄) ∗ 𝐶𝑂𝑅𝑡(𝑃, 𝑄) (5)
, where:
𝐶𝑂𝑅𝑡(𝑃, 𝑄) =
∑ (𝑃 𝑖−𝑃 𝑖+1)(𝑄 𝑖−𝑄 𝑖+1)𝑚−1
𝑖=1
√∑ (𝑃 𝑖−𝑃 𝑖+1)2𝑚−1
𝑖=1 √∑ (𝑄 𝑖−𝑄 𝑖+1)2𝑚−1
𝑖=1
(6)
and 𝑘 ∈ 𝑅+
; and 𝛿(𝑃, 𝑄) may be Euclid, etc. In their
article, Dhamo et al. [] proposed CID as distance
𝛿(𝑃, 𝑄) and 𝑘 = 2.
III. MODELLING TIME SERIES
Time series is a novel concept, which enrolls two
main concepts - determinism and randomness. A time
series is composed by many parts, such as trend,
seasonality, cycles, etc. After the detection of these
components in a time series, the remains are
considered a random part. In other words, a time
series T of length n, can be decomposed, as below:
𝑇𝑛 = 𝑡 𝑛 + 𝑐 𝑛 + 𝑠 𝑛 + 𝜀 𝑛 (7)
, where 𝑡 𝑛 is trend, 𝑐 𝑛 is cycle, 𝑠 𝑛 is seasonal
variation and 𝜀 𝑛 is noise. Usually a time series can be
decomposed according to these features. The only
characteristic to be modeled is the noise, which is a
random variable.
A. ARIMA Models
As cited above, a time series is a regular collection
of data. An ARIMA model is composed by
implementing other forecasting models, such as AR,
MA and the operator ∆. Let us first define the key
concepts of a stochastic process.
Definition 9 Let {𝜀𝑡, 𝑡 ∈ 𝑇} be a stochastic process.
𝜀𝑡~𝑊𝑁(0, 𝜎2) is considered to be a white noise, if it
obeys the following rules:
{
𝐸(𝜀𝑡) = 0
𝐸(𝜀𝑡 𝜀𝑠) = {
𝜎2
, 𝑠 = 𝑡
0, 𝑠 ≠ 𝑡
(8)
Definition 10 Let {𝑌𝑡, 𝑡 ∈ 𝑇} be a stochastic process
and 𝜀𝑡~𝑊𝑁(0, 𝜎2). An AR(p) model is constructed, as
below:
𝜀𝑡 = ∑ 𝜑𝑖
𝑝
𝑖=0 𝑌𝑡−𝑖 (9)
, where 𝜑𝑖, 𝑖 = 1, 𝑝̅̅̅̅̅ are coefficients defined dynami-
cally.
Definition 10 Let {𝑌𝑡, 𝑡 ∈ 𝑇} be a stochastic process.
Let 𝜔𝑖~𝑊𝑁(0, 𝜎2), 𝑖 = 0,1,2, … . A MA(q) model is
constructed, as below:
𝑌𝑡 = ∑ 𝜓𝑖
𝑞
𝑖=0 𝜔𝑡−𝑖 (10)
, where 𝜓𝑖, 𝑖 = 1, 𝑝̅̅̅̅̅ are coefficients defined dynami-
cally.
Logically, a ARMA(p,q) model, is constructed, as
below:
Definition 12 A stochastic process {𝑌𝑡, 𝑡 ∈ 𝑇} is said
to be ARMA(p,q) model (autoregressive moving
average of order (p,q)) if it can be expressed as:
𝑌𝑡 = 𝜇 𝑡 + ∑ 𝜑𝑖
𝑝
𝑖=0 𝑌𝑡−𝑖 + ∑ 𝜓𝑖
𝑞
𝑖=0 𝜔𝑡−𝑖 (11)
, where 𝜇 𝑡, 𝜑𝑖, 𝑖 = 0, 𝑝̅̅̅̅̅ and 𝜓𝑖, 𝑖 = 0, 𝑝̅̅̅̅̅ are constants
and 𝜔𝑡 is white noise.
An important parameter in ARIMA(p,d,q) is also the
difference operator (∆), defined as below:
Definition 13 Let { 𝑌𝑡, 𝑡 ∈ 𝑇} be a stochastic process.
The ∆ 𝑑
operator is though defined, as:
∆ 𝑑
= 𝑌𝑡 − 𝑌𝑡−𝑑 (12)
, where 𝑑 = 1,2, …
Having all the necessary concepts, we can
introduce ARIMA concept.
Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3287
Definition 14 The stochastic process {𝑌𝑡, 𝑡 ∈ 𝑇} is
considered to be an ARIMA(p,d,q) process only if the
process
𝑋𝑡 = ∆ 𝑑
𝑌𝑡 (13)
is an ARMA(p,q) process.
Of course that there are several ways to determine
the most efficient values of p,q and d. But this will be
discussed in the following sections.
B. The Proposed Approach
When trying to detect a motif in a time series, it is
agreed that there is a pattern that repeats itself in time
series, independently from observation. An example is
given in Fig. 2.
Fig.2. The relation between time series’ sub-
sequences (time series co2)
It is obvious the relationship that exists between
the subsequences of length 12. The relation is made
visible by the normalization of these subsequences,
where is impossible to distinguish subsequences from
each-other.
Definition 15 Normalization of the time series 𝑇 with
length 𝑛, is the time series 𝑇’, defined as:
𝑇′ =
𝑇−𝑚𝑒𝑎𝑛(𝑇)
𝑠𝑑(𝑇)
(14)
If there is a such strong relation between two
subsequences 𝑃 = [𝑇𝑖, 𝑇𝑖+1, … , 𝑇𝑖+𝑚−1] and 𝑄 =
[𝑇𝑗, 𝑇𝑗+1, … , 𝑇𝑗+𝑚−1] with length m, from the time series
T, it is presumable that the relation would be of the
same strength even if the length of the subsequence is
greater. The situation is shown in Fig. 3.
Fig. 3. Widening the motif’s length
It is clear from Fig. 3. that, even the length of the
subsequences has been enlarged, from 12 to 15, the
difference is still small.
C. Reasoning to Reach to Forecasting
When we say that there is a motif in a time series,
we assume that there is a large similarity between
different subsequences of the same time series.
Moreover, these subsequences are generally spread
throughout all the time series. Given a fixed length m
as motif’s length, we are able to find, according to
Brute-Force algorithm, all patterns in the time series.
In contrary to this algorithm, in order to predict, there
are some changes being made. Those changes are
reflected not only in code, but even in structure:
Suppose that we require to use a subsequence of
length m in our time series T of length n. Moreover, is
found out that the most approximate subsequence is
stated in position j. In this case, the dependency factor
would be calculated, as below:
𝑑𝑒𝑝𝑓𝑎𝑐𝑡 =
𝑇[𝑗+𝑚]
𝑇[𝑗+𝑚−1]
(15)
Our prediction is 𝑇[𝑛 + 1], which is calculated as:
𝑇[𝑛 + 1] = 𝑑𝑒𝑝 𝑓𝑎𝑐𝑡 ∗ 𝑇[𝑛] (16)
We define 𝑎 = 𝑇[(n − m + 1) : (n + 1)]. We can create a
95% confidence interval for our prediction, as below:
𝐶𝐼(𝑇[𝑛 + 1]) =]𝐸(𝑎) − 1.96
𝑠𝑑(𝑎)
√𝑚+1
, 𝐸(𝑎) + 1.96
𝑠𝑑(𝑎)
√𝑚+1
(17)
An example of our algorithm in R, in time series
AirPassengers, is given in Fig. 4.
Fig.4. Snapshot of the proposed approach
CO2 time series
Time
Observation
0 100 200 300 400
320350
2
320350
Subsequences
Index
Observation
2
-1.50.01.5
Normalized
Index
Observation
CO2 time series
Time
Observation
0 100 200 300 400
320350
2
320350
Subsequences
Index
Observation
2
-2.0-0.51.0
Normalized
Index
Observation
Brute -force algorithm
Brute_force=function(T,m,epsilon)
1.#n=length(T)
2.threshold=min{d(Ai,Bj)}, Ai=T[i : (i+m-1)],i=1: (n-m+2) and
j=(i+1): (n-m+2)
3.motif_index=argmax{d(Ai,Bj)<threshold+epsilon}
4.motif=T[motif_index: (motif_index+m-1)]
5. similar_seq=T[where(d(motif,Bj)<threshold+epsilon)], Bj=T[j :
(j+m-1)] and j>motif_index
end function
Short-term algorithm
Short_term=function(T,m)
1.Keep the last subsequence of length m constant (also considered as
motif, 𝑇[(n − m + 1) : n].).
2.Choose one suitable distance for time series (Chouakria’s index)
3. Normalize time series’ subsequences, by using (8).
4.Detect the strongest relationship between our motif and other
subsequences of our time series. Keep track of the initial index
where the strongest relationship is proven.
5.Create a dependency factor that is being used during the
prediction.
end function
Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3288
D. Forecasting in a Long-term Period
In the previous paragraph, we constructed an
algorithm to predict the next observation, based on
previous ones. What is more, there is a possibility to
find a confidence interval for the provided prediction.
But, in practice, there is a long-term necessity for
prediction. In order to get to a long-term forecasting,
we construct the following algorithm, as below:
According to this algorithm, we can keep track of
the original time series and the forecasts provided. An
example is provided below, in Fig. 5.
Fig.5. Forecasting in long-term period
In Fig.5. is obviously seen the proximity in values
and even in shape. This means that the model that is
built is strong.
IV. ARIMA VS PROPOSED METHOD
In many forecasting models, such as AR, MA,
ARIMA, etc, the basis of fitting the best parameters is
low error in curve fitting. The lower the error it is, the
better are the parameters.
Definition 16 Given T a time series of length n, and
T’ the ARIMA fitted model. The error in curve fitting is
called the vector, as below:
𝜀 = 𝑇 − 𝑇′ (18)
In practice, most commonly used is the Sum
Square Error (𝜀′𝜀). There are also other measures that
supply with information about the quality of the
constructed model, such as AIC (Akaike Information
Criterion) or BIC (Bayes Information Criterion).
Definition 11 Given T a time series of length n, and
T’ the ARIMA fitted model. AIC is measured, as below:
𝐴𝐼𝐶 = ln (
1
𝑛
∑ 𝑒𝑖
2𝑛
𝑖=1 ) +
2(𝑘+1)
𝑛
(19)
, where 𝑒𝑖
2
= (𝑇𝑖 − 𝑇𝑖′)2
and k is the degree of
freedom of T.
Definition 12 Given T a time series of length n, and
T’ the ARIMA fitted model. BIC is measured, as below:
𝐵𝐼𝐶 = ln (
1
𝑛
∑ 𝑒𝑖
2𝑛
𝑖=1 ) +
(𝑘+1)∗ln(𝑛)
𝑛
(20)
, where 𝑒𝑖
2
= (𝑇𝑖 − 𝑇𝑖′)2
and k is the degree of
freedom of T.
Generally, between AIC and BIC criterions, the
most useful is BIC, which is considered to be more
accurate.
A. Detecting Error in Forecast in the Proposed
Model
In the proposed algorithm, we mentioned that the
method does not create a single model - it does not
create parameters. Our model is a dynamic model,
which, in a long-term period, changes constantly. This
means that AIC or BIC criterion is inapplicable.
In order to be able to proof which model is better,
and why, we use the concept of error in forecasting.
This means that we keep a certain percentage of
values unused (the last observations), in order to
detect which algorithm provides a smaller Sum Square
Error in Predictions. The number of predictions is kept
constant, 10, in any case that we studied. This is done
in order to prevent reducing the amount of information
in disposal.
In Table 1, there are some examples of errors
made during the forecast, for several time series.
TABLE I. COMPARISON OF TWO METHODS
Time
series
Error in forecasting
ARIMA
Prop.
approach
star 4184.963 4245.082
prodn 16.14206 18.56693
rosewine Inf 768.4422
unemp 15006.22 4095.185
penguin 596901.4 321.071
Al_births Inf 3489920
hsales 671.1774 274.1702
AirPassen
gers
281852.9 167063.2
ARIMA vs Proposed approach
In Table 1, is obvious the advantage of our
approach in relation to ARIMA. In 2 cases, ARIMA
could not give an appropriate model – AIC criterion
could not be approximated. In our trials, resulted that
in 66.7% of the cases, our approach provided better
performance than ARIMA’s model. In this percentage,
is also included the 12.5% when ARIMA failed to
provide a suitable model.
An example of how much differ the real results
from the ones provided by ARIMA, is given in Fig.6.
Long-Term Algorithm
Long_Term=function(T,m,k)
1.#n=length(T); k is the number of lags we want to forecast
2.for (i in seq(1,k)){
3.prediction=Short_Term(T,m)
4.T=c(T,prediction)
5. }
end function
0 10 20 30 40 50 60 70
100200300
Comparison real vs forecast
Time index
Observation
Time series
Prediction
Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3289
The time series taken in consideration is penguin,
while the length of the motif is 7.
Fig.6. ARIMA, the prop. method and real values
It is clearly visible that the shape and the values
gained by the proposed method are more compatible
with the real ones. Whereas ARIMA’s curve of
prediction is far more unrealistic. In many cases,
ARIMA provided central values, very close to a linear
curve. Whereas the proposed method provides
different shapes, a nonlinear vector of forecasts.
B. Detecting factors that influence results
In order to define where this result comes from, so,
whether it is because of the complexity of the time
series, we use a complexity measure for time series’
complexity-fluctuations[9].
Definition 17 Given T a time series of length n,
fluctuations are measured, as below:
𝑓𝑙𝑢𝑐𝑡(𝑇) =
1
𝑛−1
∑ (𝑇𝑖 − 𝑇𝑖+1)2𝑛−1
𝑖=1 (21)
In Fig.7. is shown the fluctuation of some time
series.
Fig.7. Fluctuation of some time series
According to the traditional definition to the
correlation between two factors, as below:
Definition 18 The correlation between two random
variables X and Y with length n is the index, measured
as below:
𝐶𝑂𝑅(𝑃, 𝑄) =
∑ (𝑃 𝑖−𝑃̅)∗(𝑄 𝑖−𝑄̅)𝑛
𝑖=1
√∑ (𝑃 𝑖−𝑃̅)2𝑛
𝑖=1 √∑ (𝑄 𝑖−𝑄̅)2𝑛
𝑖=1
(22)
it results that, in both cases, the relation between
fluctuation and error in forecasting is strong (in each
case, greater than 99%). This means that there is a
strong dependency between these two factors.
Another important factor that influences the
forecast, might be the quantity of the data. It is naively
deduced that the longer the time series is, the smaller
will be the error. There have been made various tests,
by increasing the amount of data. It has been chosen a
high-complexity time series, such as hsales. In Fig.8.
we see what happens with the error in forecasting
when this time series is enlarged, with fixed length of
the time series m=12 and m=9.
Fig.8. Error in forecasting for different lengths
In 50% of the cases, ARIMA (the proposed
method with m=12) were better than the other. In a low
percentage, the proposed method with m=9 is more
performant than the others. What is more, we notice
that there is no defined trend of the error in
forecasting. This means that, if we increase the
amount of data, we cannot presume whether the error
will decrease or not.
An important parameter during our method would
be the length of the motif, m. The time series selected
for this evaluation is again hsales, due to its’
complexity. We keep a fixed length of the time series-
n=170. An example of this algorithm is shown in Fig.9.
Fig.9. a) hsales b) Detecting the best value of m
We can see that the lower error in prediction is
made for m=9, equal to the periodicity of the time
series. But this contradicts the upper results, where the
proposed approach with m=12 was better than the
proposed approach with m=9. This means that, in
order to get better results, the length of the motif
should be defined according to a careful study of
periodicity, variations, etc.
ACKNOWLEDGMENT
I would like to thank Ms. Ilda Drini, employee at
Department of Financial Stability and Statistics, for all
the help during the creation of this method.
REFERENCES
Comparison of fluctuations
Time series
Fluctuations
050000150000
star
prodn
rosewine
unemp
penguin
Al_births
hsales
Air_pass
60 80 100 120 140 160
040008000
Comparison
Length of t.s.
Errorinpred.
ARIMA
Pr. m. m=12
Pr. m. m=9
0 50 100 200
30507090
Time series hsales
Time
Observations
4 6 8 10 12
10003000
Detect best length of m
Length of motif
Errorinpred.
Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3290
All time series that are used in this article, can be
found in the next websites:
http://new.censusatschool.org.nz/resource/time-series-
data-sets-2013/
www.instat.gov.al
https://datamarket.com/data/list/?q=provider%3Atsdl
http://www.cs.ucr.edu/~eamonn/discords/
http://vincentarelbundock.github.io/Rdatasets/datasets.ht
ml
http://cran.r-project.org/web/packages/astsa/index.htm
[1] Agrawal, R., Faloutsos, C., Swami, A., (1993):
“Efficient similarity search in sequence databases,” in:
Fourth International Conference on Foundations of
Data Organization, D. Lomet, Ed., Heidelberg:
SpringerVerlag, pp. 69–84
[2] Batista, G., Keogh, E. J. Tataw, O., de Souza,
V., “CID: an efficient complexity-invariant distance for
time series”
[3] Chan, K. & Fu, W. (1999). Efficient time series
matching by wavelets. Proceedings of the 15 th IEEE
International Conference on Data Engineering.
[4] Chouakria, A., D., Diallo A., Giroud F., (2007)
“Adaptive clustering of time series”. International
Association for Statistical Computing (IASC), Statistics
for Data Mining, Learning and Knowledge Extraction,
Aveiro, Portugal
[5] Dhamo, E., Ismailaja, N., Kalluçi, E., (2015):
“Comparing the efficiency of CID distance and CORT
coefficient for finding similar subsequences in time
series”, Sixth International Conference ISTI, 5-6 June.
[6] Lin, J., Keogh, E. , Lonardi, S. and Patel, P.
(2002): “Finding Motifs in Time Series”, in Proc. of 2nd
Workshop on Temporal Data Mining
[7] Mueen A., Keogh, E., J., (2010A): “Online
discovery and maintenance of time series motifs”.
KDD, pg. 1089-1098
[8] Mueen, A., Keogh, E., Zhu, Q., Cash, S.,
Westover, B., (2009A) “Exact Discovery of Time
Series Motifs”, SDM, pg. 473-484
[9] Mueen A., Keogh, E., J., Bigdely- Shamlo N.,
(2009): “Finding Time Series Motifs in Disk-Resident
Data”. ICDM, pg 367-376
[10]Mueen A., (2013):” Enumeration of Time
Series Motifs of All Lengths”. ICDM,pg. 547-556
[11]Yan, Ch., Fang, J., Wu, L., Ma, Sh., (2013):
“An Approach of Time Series Piecewise Linear
Representation Based on Local Maximum Minimum
and Extremum”, Journal of Information &
Computational Science 10:9 2747–2756
[12]Yi, B.-K., Faloutsos Ch., (2000): “Fast time
sequence indexing for arbitrary Lp norms”. In The
VLDB Journal, pages 385–394.

More Related Content

What's hot

MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhangcanlin zhang
 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural NetworkAshwani Jha
 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapterBan Bang
 
Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)mathsjournal
 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapHa Phuong
 
directed-research-report
directed-research-reportdirected-research-report
directed-research-reportRyen Krusinga
 
Advanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filteringAdvanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filteringLoc Nguyen
 
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...ijfls
 
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)Mohanlal Sukhadia University (MLSU)
 
Free Ebooks Download
Free Ebooks Download Free Ebooks Download
Free Ebooks Download Edhole.com
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
A technique to construct linear trend free fractional factorial design using...
A technique to construct  linear trend free fractional factorial design using...A technique to construct  linear trend free fractional factorial design using...
A technique to construct linear trend free fractional factorial design using...Premier Publishers
 
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...TELKOMNIKA JOURNAL
 
Varaiational formulation fem
Varaiational formulation fem Varaiational formulation fem
Varaiational formulation fem Tushar Aneyrao
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...IAEME Publication
 
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!idescitation
 

What's hot (17)

MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhang
 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural Network
 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapter
 
Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)
 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature Map
 
directed-research-report
directed-research-reportdirected-research-report
directed-research-report
 
Advanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filteringAdvanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filtering
 
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
 
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
 
Free Ebooks Download
Free Ebooks Download Free Ebooks Download
Free Ebooks Download
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
A technique to construct linear trend free fractional factorial design using...
A technique to construct  linear trend free fractional factorial design using...A technique to construct  linear trend free fractional factorial design using...
A technique to construct linear trend free fractional factorial design using...
 
Goldberg etal 2006
Goldberg etal 2006Goldberg etal 2006
Goldberg etal 2006
 
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
 
Varaiational formulation fem
Varaiational formulation fem Varaiational formulation fem
Varaiational formulation fem
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...
 
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n 70 Lakhs!
 

Viewers also liked

Taufiq nur r manajemen pembiayaan lpi
Taufiq nur r manajemen pembiayaan lpiTaufiq nur r manajemen pembiayaan lpi
Taufiq nur r manajemen pembiayaan lpisiti nurjannah
 
Teoria de la diversió
Teoria de la diversióTeoria de la diversió
Teoria de la diversióRubenM2803
 
DanHickey_2014_Folio
DanHickey_2014_FolioDanHickey_2014_Folio
DanHickey_2014_FolioDaniel Hickey
 
Tot art és esport i tot esport és art
Tot art és esport i tot esport és artTot art és esport i tot esport és art
Tot art és esport i tot esport és artlaiatr3
 
ScholarshipPlanFall2016
ScholarshipPlanFall2016ScholarshipPlanFall2016
ScholarshipPlanFall2016Emma McAdams
 
El desarrollo humano
El desarrollo humanoEl desarrollo humano
El desarrollo humanonico ruiz
 
Power point resume
Power point resumePower point resume
Power point resumeKara Kressin
 
แบ่งหน้าที่ Cut
แบ่งหน้าที่ Cutแบ่งหน้าที่ Cut
แบ่งหน้าที่ CutWattana Chamchad
 
Disponibilidad del agua
Disponibilidad del aguaDisponibilidad del agua
Disponibilidad del aguaA. Laura Perez
 
A Recipe for Virality
A Recipe for ViralityA Recipe for Virality
A Recipe for ViralityCult LDN
 

Viewers also liked (13)

Taufiq nur r manajemen pembiayaan lpi
Taufiq nur r manajemen pembiayaan lpiTaufiq nur r manajemen pembiayaan lpi
Taufiq nur r manajemen pembiayaan lpi
 
Teoria de la diversió
Teoria de la diversióTeoria de la diversió
Teoria de la diversió
 
Misión Vision
Misión VisionMisión Vision
Misión Vision
 
DanHickey_2014_Folio
DanHickey_2014_FolioDanHickey_2014_Folio
DanHickey_2014_Folio
 
Tot art és esport i tot esport és art
Tot art és esport i tot esport és artTot art és esport i tot esport és art
Tot art és esport i tot esport és art
 
ScholarshipPlanFall2016
ScholarshipPlanFall2016ScholarshipPlanFall2016
ScholarshipPlanFall2016
 
El desarrollo humano
El desarrollo humanoEl desarrollo humano
El desarrollo humano
 
Power point resume
Power point resumePower point resume
Power point resume
 
CCFsafari
CCFsafariCCFsafari
CCFsafari
 
Standar pelayanan
Standar pelayananStandar pelayanan
Standar pelayanan
 
แบ่งหน้าที่ Cut
แบ่งหน้าที่ Cutแบ่งหน้าที่ Cut
แบ่งหน้าที่ Cut
 
Disponibilidad del agua
Disponibilidad del aguaDisponibilidad del agua
Disponibilidad del agua
 
A Recipe for Virality
A Recipe for ViralityA Recipe for Virality
A Recipe for Virality
 

Similar to Jmestn42351212

On Modeling Murder Crimes in Nigeria
On Modeling Murder Crimes in NigeriaOn Modeling Murder Crimes in Nigeria
On Modeling Murder Crimes in NigeriaScientific Review SR
 
On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction cscpconf
 
ON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTION
ON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTIONON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTION
ON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTIONcscpconf
 
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cutEnhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cutTELKOMNIKA JOURNAL
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399Nafas Esmaeili
 
Arima model
Arima modelArima model
Arima modelJassika
 
arimamodel-170204090012.pdf
arimamodel-170204090012.pdfarimamodel-170204090012.pdf
arimamodel-170204090012.pdfssuserdca880
 
Data Driven Choice of Threshold in Cepstrum Based Spectrum Estimate
Data Driven Choice of Threshold in Cepstrum Based Spectrum EstimateData Driven Choice of Threshold in Cepstrum Based Spectrum Estimate
Data Driven Choice of Threshold in Cepstrum Based Spectrum Estimatesipij
 
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...BRNSS Publication Hub
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLVijaySharma802
 
Seminar Report (Final)
Seminar Report (Final)Seminar Report (Final)
Seminar Report (Final)Aruneel Das
 
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...ijaia
 
Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...
Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...
Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...ITIIIndustries
 
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...ijdpsjournal
 
Tracking the tracker: Time Series Analysis in Python from First Principles
Tracking the tracker: Time Series Analysis in Python from First PrinciplesTracking the tracker: Time Series Analysis in Python from First Principles
Tracking the tracker: Time Series Analysis in Python from First Principleskenluck2001
 

Similar to Jmestn42351212 (20)

On Modeling Murder Crimes in Nigeria
On Modeling Murder Crimes in NigeriaOn Modeling Murder Crimes in Nigeria
On Modeling Murder Crimes in Nigeria
 
On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction On Selection of Periodic Kernels Parameters in Time Series Prediction
On Selection of Periodic Kernels Parameters in Time Series Prediction
 
ON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTION
ON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTIONON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTION
ON SELECTION OF PERIODIC KERNELS PARAMETERS IN TIME SERIES PREDICTION
 
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cutEnhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut
Enhance interval width of crime forecasting with ARIMA model-fuzzy alpha cut
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399esmaeili-2016-ijca-911399
esmaeili-2016-ijca-911399
 
Arima model
Arima modelArima model
Arima model
 
arimamodel-170204090012.pdf
arimamodel-170204090012.pdfarimamodel-170204090012.pdf
arimamodel-170204090012.pdf
 
Social_Distancing_DIS_Time_Series
Social_Distancing_DIS_Time_SeriesSocial_Distancing_DIS_Time_Series
Social_Distancing_DIS_Time_Series
 
How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS
 
Data Driven Choice of Threshold in Cepstrum Based Spectrum Estimate
Data Driven Choice of Threshold in Cepstrum Based Spectrum EstimateData Driven Choice of Threshold in Cepstrum Based Spectrum Estimate
Data Driven Choice of Threshold in Cepstrum Based Spectrum Estimate
 
04_AJMS_288_20.pdf
04_AJMS_288_20.pdf04_AJMS_288_20.pdf
04_AJMS_288_20.pdf
 
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
Investigation of Parameter Behaviors in Stationarity of Autoregressive and Mo...
 
Different Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIMLDifferent Models Used In Time Series - InsideAIML
Different Models Used In Time Series - InsideAIML
 
Seminar Report (Final)
Seminar Report (Final)Seminar Report (Final)
Seminar Report (Final)
 
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
THE APPLICATION OF BAYES YING-YANG HARMONY BASED GMMS IN ON-LINE SIGNATURE VE...
 
Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...
Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...
Software of Time Series Forecasting based on Combinations of Fuzzy and Statis...
 
ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]ARIMA Models - [Lab 3]
ARIMA Models - [Lab 3]
 
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
 
Tracking the tracker: Time Series Analysis in Python from First Principles
Tracking the tracker: Time Series Analysis in Python from First PrinciplesTracking the tracker: Time Series Analysis in Python from First Principles
Tracking the tracker: Time Series Analysis in Python from First Principles
 

Recently uploaded

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

Jmestn42351212

  • 1. Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 Vol. 2 Issue 11, November - 2015 www.jmest.org JMESTN42351212 3285 Motifs In Time Series For Prediction A naïve approach compared to ARIMA Nertila Ismailaja Programme Analyst Armundia Factory Tirana, Albania n.ismailaja@armundiafactory.com Abstract—Time series is a subject that includes two key factors - observations and time. It is obvious that observations are time- dependent. Another interest field is motif discovery, composed solely by time series subsequences. During this time, loads of similarity measures have been presented. In their article, Dhamo et al [5] concluded that the best performance according to the quality in motif discovery was achieved by Chouakria’s index with CID (Chouakria’s index, proposed by Chouakria et al [4] and CID is proposed by Batista et al [2]). The following step is to use this distance and other time series features to make predictions. Various tests are made over time series with high level of complexity. The results achieved by this approach are compared to ARIMA models. All the tests are made in R. Keywords—motif discovery;ARIMA;time series; forecasting; Chouakria with CID; R I. INTRODUCTION Motif discovery has been widely exploited in Biological sciences, to detect anomalies in heart beatings, pressure, etc. The effort was to deal with a large amount of data (known as big data). The first approach was made by reducing data dimensions’ [1][3]. Some methods were Discrete Fourier Transformation[1], Discrete Wavelet Transformation[3], Piecewise Linear Approximation [12], etc. The first scientists to formally use motif discovery in time series, were Agrawal et al [3], Lin et al [6], Mueen[7][8][9][10], etc. Since then, several tactics have been proposed. The studies have always followed the similarity search throughout the time series, regarding to the problem. The most recent method is ε-queries, proposed as a probability approach to motif discovery. An expansion of motif discovery is to compare time series measures to each-other, based in four criterions - algorithm complexity, number of discovered motifs, accuracy and quality. In their article, Dhamo et al. [4] concluded that Chouakria index with CID gave better results than CID alone. Taking in consideration that Chouakria index with CID is a well-performing similarity measure, we move on in the following step- applying motif discovery in forecasting. There are used several well-known time series, for each of which is created a model, and then compared to the model given by ARIMA (to construct the model was used a predefined package of R, ”forecast”). According to the results, by using as norm error in forecasting, we can deduce that our model provides better performance. II. TIME SERIES A. Basic Concepts A time series A time series may be defined as a collection of data, where time is a relevant component. Definition 1 A time series T is an ordered collection of data, observed in n - regular intervals of time [𝑇1, 𝑇2, … , 𝑇𝑛]. Definition 2 A motif M of length m in a time series T of length n is a subsequence of 𝑇 which repeats itself in T. Definition 3 In ε-query search for similarity between two time series’ subsequences 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and 𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚] of length m, in a time series T of length n, P and Q are considered similar if the distance between P and Q is laid in an interval with absolute error equal to ε .An illustration of a motif is given in Fig. 1., with dataset unemp. Fig.1. Algorithm for motif discovery B. Distances Used For Time Series While comparing two notions, it is indispensable to use a criterion for comparison, in order to claim whether these notions mean or not the same. In time series, this criterion is undoubtedly the similarity measure, which is based in the concept of the distance. Definition 4 A distance is a function that complies with the following properties:  Identity: ∀𝑥, 𝑦 ∈ 𝑅, 𝑑(𝑥, 𝑦) = 0 ↔ 𝑥 = 𝑦  Symmetry: ∀𝑥, 𝑦 ∈ 𝑅, 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥)  Non-negativity: ∀𝑥, 𝑦 ∈ 𝑅, 𝑑(𝑥, 𝑦) ≥ 0  Transitive: ∀𝑥, 𝑦 ∈ 𝑅, ∃𝑧 ∈ 𝑅|𝑑(𝑥, 𝑦) ≤ 𝑑(𝑥, 𝑧) + 𝑑(𝑧, 𝑦) Definition 5 A similarity measure is a function that disobeys at least one of the conditions to be distance. 0 100 200 300 200500800 CID Index Observedvalues Time Series Motif Similar subseq 2 6 12 200500800 Similar subseq Motif's length Values
  • 2. Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 Vol. 2 Issue 11, November - 2015 www.jmest.org JMESTN42351212 3286 In other words, a similarity measure is able to measure the similarity between two time series’ subsequences. Reported as a key factor in the quality and quantity in motif discovery, there are several similarity measures. In contrary to its’ wide usage, Euclid distance provides dissatisfactory results if it is used as a similarity measure. Definition 6 Euclid distance between two subsequences with length m, 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and 𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚], is the index, measured as below: 𝑑 𝐸𝑢𝑐(𝑃, 𝑄) = ∑ (𝑃𝑖 − 𝑄𝑖)2𝑚 𝑖=1 (1) As we mentioned above, the most performing similarity measure is Chouakria’s index with CID. In this case, it is therefore necessary to explain what is CID, and then to give the definition of Chouakria’s index. Definition 7 CID distance between two sub- sequences with length m, 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and 𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚], is the index, measured as below: 𝑑 𝐶𝐼𝐷(𝑃, 𝑄) = 𝑑 𝐸𝑢𝑐𝑙(𝑃, 𝑄) max{𝐶𝐸(𝑃),𝐶𝐸(𝑄)} min{𝐶𝐸(𝑃),𝐶𝐸(𝑄)} (2) , where: 𝐶𝐸(𝑃) = ∑ (𝑃𝑖 − 𝑃𝑖+1)2𝑚−1 𝑖=1 (3) This similarity measure is used for nonlinear time series subsequences. One main feature of CID is that, in case of two nonlinear subsequences P, Q of length m: 𝑑 𝐶𝐼𝐷(𝑃, 𝑄) > 𝑑 𝐸𝑢𝑐𝑙(𝑃, 𝑄) (4) Moreover, the closer to 1 is the fraction max{𝐶𝐸(𝑃),𝐶𝐸(𝑄)} min{𝐶𝐸(𝑃),𝐶𝐸(𝑄)} , the more similar is the behavior of the two subsequences. Another similarity measures is Chouakria’s index. This measure is composed by two factors, one of which is responsible for behavior and the other for proximity of values[]. Chouakria’s index is measured, as below: Definition 8 Chouakria’s index between two sub- sequences with length m, 𝑃 = [𝑃1, 𝑃2, … , 𝑃𝑚] and 𝑄 = [𝑄1, 𝑄2, … , 𝑄 𝑚], is the index, measured as below: 𝑑 𝐶𝐼𝐷(𝑃, 𝑄) = 2 1+𝑒 𝑘∗𝛿(𝑃,𝑄) ∗ 𝐶𝑂𝑅𝑡(𝑃, 𝑄) (5) , where: 𝐶𝑂𝑅𝑡(𝑃, 𝑄) = ∑ (𝑃 𝑖−𝑃 𝑖+1)(𝑄 𝑖−𝑄 𝑖+1)𝑚−1 𝑖=1 √∑ (𝑃 𝑖−𝑃 𝑖+1)2𝑚−1 𝑖=1 √∑ (𝑄 𝑖−𝑄 𝑖+1)2𝑚−1 𝑖=1 (6) and 𝑘 ∈ 𝑅+ ; and 𝛿(𝑃, 𝑄) may be Euclid, etc. In their article, Dhamo et al. [] proposed CID as distance 𝛿(𝑃, 𝑄) and 𝑘 = 2. III. MODELLING TIME SERIES Time series is a novel concept, which enrolls two main concepts - determinism and randomness. A time series is composed by many parts, such as trend, seasonality, cycles, etc. After the detection of these components in a time series, the remains are considered a random part. In other words, a time series T of length n, can be decomposed, as below: 𝑇𝑛 = 𝑡 𝑛 + 𝑐 𝑛 + 𝑠 𝑛 + 𝜀 𝑛 (7) , where 𝑡 𝑛 is trend, 𝑐 𝑛 is cycle, 𝑠 𝑛 is seasonal variation and 𝜀 𝑛 is noise. Usually a time series can be decomposed according to these features. The only characteristic to be modeled is the noise, which is a random variable. A. ARIMA Models As cited above, a time series is a regular collection of data. An ARIMA model is composed by implementing other forecasting models, such as AR, MA and the operator ∆. Let us first define the key concepts of a stochastic process. Definition 9 Let {𝜀𝑡, 𝑡 ∈ 𝑇} be a stochastic process. 𝜀𝑡~𝑊𝑁(0, 𝜎2) is considered to be a white noise, if it obeys the following rules: { 𝐸(𝜀𝑡) = 0 𝐸(𝜀𝑡 𝜀𝑠) = { 𝜎2 , 𝑠 = 𝑡 0, 𝑠 ≠ 𝑡 (8) Definition 10 Let {𝑌𝑡, 𝑡 ∈ 𝑇} be a stochastic process and 𝜀𝑡~𝑊𝑁(0, 𝜎2). An AR(p) model is constructed, as below: 𝜀𝑡 = ∑ 𝜑𝑖 𝑝 𝑖=0 𝑌𝑡−𝑖 (9) , where 𝜑𝑖, 𝑖 = 1, 𝑝̅̅̅̅̅ are coefficients defined dynami- cally. Definition 10 Let {𝑌𝑡, 𝑡 ∈ 𝑇} be a stochastic process. Let 𝜔𝑖~𝑊𝑁(0, 𝜎2), 𝑖 = 0,1,2, … . A MA(q) model is constructed, as below: 𝑌𝑡 = ∑ 𝜓𝑖 𝑞 𝑖=0 𝜔𝑡−𝑖 (10) , where 𝜓𝑖, 𝑖 = 1, 𝑝̅̅̅̅̅ are coefficients defined dynami- cally. Logically, a ARMA(p,q) model, is constructed, as below: Definition 12 A stochastic process {𝑌𝑡, 𝑡 ∈ 𝑇} is said to be ARMA(p,q) model (autoregressive moving average of order (p,q)) if it can be expressed as: 𝑌𝑡 = 𝜇 𝑡 + ∑ 𝜑𝑖 𝑝 𝑖=0 𝑌𝑡−𝑖 + ∑ 𝜓𝑖 𝑞 𝑖=0 𝜔𝑡−𝑖 (11) , where 𝜇 𝑡, 𝜑𝑖, 𝑖 = 0, 𝑝̅̅̅̅̅ and 𝜓𝑖, 𝑖 = 0, 𝑝̅̅̅̅̅ are constants and 𝜔𝑡 is white noise. An important parameter in ARIMA(p,d,q) is also the difference operator (∆), defined as below: Definition 13 Let { 𝑌𝑡, 𝑡 ∈ 𝑇} be a stochastic process. The ∆ 𝑑 operator is though defined, as: ∆ 𝑑 = 𝑌𝑡 − 𝑌𝑡−𝑑 (12) , where 𝑑 = 1,2, … Having all the necessary concepts, we can introduce ARIMA concept.
  • 3. Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 Vol. 2 Issue 11, November - 2015 www.jmest.org JMESTN42351212 3287 Definition 14 The stochastic process {𝑌𝑡, 𝑡 ∈ 𝑇} is considered to be an ARIMA(p,d,q) process only if the process 𝑋𝑡 = ∆ 𝑑 𝑌𝑡 (13) is an ARMA(p,q) process. Of course that there are several ways to determine the most efficient values of p,q and d. But this will be discussed in the following sections. B. The Proposed Approach When trying to detect a motif in a time series, it is agreed that there is a pattern that repeats itself in time series, independently from observation. An example is given in Fig. 2. Fig.2. The relation between time series’ sub- sequences (time series co2) It is obvious the relationship that exists between the subsequences of length 12. The relation is made visible by the normalization of these subsequences, where is impossible to distinguish subsequences from each-other. Definition 15 Normalization of the time series 𝑇 with length 𝑛, is the time series 𝑇’, defined as: 𝑇′ = 𝑇−𝑚𝑒𝑎𝑛(𝑇) 𝑠𝑑(𝑇) (14) If there is a such strong relation between two subsequences 𝑃 = [𝑇𝑖, 𝑇𝑖+1, … , 𝑇𝑖+𝑚−1] and 𝑄 = [𝑇𝑗, 𝑇𝑗+1, … , 𝑇𝑗+𝑚−1] with length m, from the time series T, it is presumable that the relation would be of the same strength even if the length of the subsequence is greater. The situation is shown in Fig. 3. Fig. 3. Widening the motif’s length It is clear from Fig. 3. that, even the length of the subsequences has been enlarged, from 12 to 15, the difference is still small. C. Reasoning to Reach to Forecasting When we say that there is a motif in a time series, we assume that there is a large similarity between different subsequences of the same time series. Moreover, these subsequences are generally spread throughout all the time series. Given a fixed length m as motif’s length, we are able to find, according to Brute-Force algorithm, all patterns in the time series. In contrary to this algorithm, in order to predict, there are some changes being made. Those changes are reflected not only in code, but even in structure: Suppose that we require to use a subsequence of length m in our time series T of length n. Moreover, is found out that the most approximate subsequence is stated in position j. In this case, the dependency factor would be calculated, as below: 𝑑𝑒𝑝𝑓𝑎𝑐𝑡 = 𝑇[𝑗+𝑚] 𝑇[𝑗+𝑚−1] (15) Our prediction is 𝑇[𝑛 + 1], which is calculated as: 𝑇[𝑛 + 1] = 𝑑𝑒𝑝 𝑓𝑎𝑐𝑡 ∗ 𝑇[𝑛] (16) We define 𝑎 = 𝑇[(n − m + 1) : (n + 1)]. We can create a 95% confidence interval for our prediction, as below: 𝐶𝐼(𝑇[𝑛 + 1]) =]𝐸(𝑎) − 1.96 𝑠𝑑(𝑎) √𝑚+1 , 𝐸(𝑎) + 1.96 𝑠𝑑(𝑎) √𝑚+1 (17) An example of our algorithm in R, in time series AirPassengers, is given in Fig. 4. Fig.4. Snapshot of the proposed approach CO2 time series Time Observation 0 100 200 300 400 320350 2 320350 Subsequences Index Observation 2 -1.50.01.5 Normalized Index Observation CO2 time series Time Observation 0 100 200 300 400 320350 2 320350 Subsequences Index Observation 2 -2.0-0.51.0 Normalized Index Observation Brute -force algorithm Brute_force=function(T,m,epsilon) 1.#n=length(T) 2.threshold=min{d(Ai,Bj)}, Ai=T[i : (i+m-1)],i=1: (n-m+2) and j=(i+1): (n-m+2) 3.motif_index=argmax{d(Ai,Bj)<threshold+epsilon} 4.motif=T[motif_index: (motif_index+m-1)] 5. similar_seq=T[where(d(motif,Bj)<threshold+epsilon)], Bj=T[j : (j+m-1)] and j>motif_index end function Short-term algorithm Short_term=function(T,m) 1.Keep the last subsequence of length m constant (also considered as motif, 𝑇[(n − m + 1) : n].). 2.Choose one suitable distance for time series (Chouakria’s index) 3. Normalize time series’ subsequences, by using (8). 4.Detect the strongest relationship between our motif and other subsequences of our time series. Keep track of the initial index where the strongest relationship is proven. 5.Create a dependency factor that is being used during the prediction. end function
  • 4. Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 Vol. 2 Issue 11, November - 2015 www.jmest.org JMESTN42351212 3288 D. Forecasting in a Long-term Period In the previous paragraph, we constructed an algorithm to predict the next observation, based on previous ones. What is more, there is a possibility to find a confidence interval for the provided prediction. But, in practice, there is a long-term necessity for prediction. In order to get to a long-term forecasting, we construct the following algorithm, as below: According to this algorithm, we can keep track of the original time series and the forecasts provided. An example is provided below, in Fig. 5. Fig.5. Forecasting in long-term period In Fig.5. is obviously seen the proximity in values and even in shape. This means that the model that is built is strong. IV. ARIMA VS PROPOSED METHOD In many forecasting models, such as AR, MA, ARIMA, etc, the basis of fitting the best parameters is low error in curve fitting. The lower the error it is, the better are the parameters. Definition 16 Given T a time series of length n, and T’ the ARIMA fitted model. The error in curve fitting is called the vector, as below: 𝜀 = 𝑇 − 𝑇′ (18) In practice, most commonly used is the Sum Square Error (𝜀′𝜀). There are also other measures that supply with information about the quality of the constructed model, such as AIC (Akaike Information Criterion) or BIC (Bayes Information Criterion). Definition 11 Given T a time series of length n, and T’ the ARIMA fitted model. AIC is measured, as below: 𝐴𝐼𝐶 = ln ( 1 𝑛 ∑ 𝑒𝑖 2𝑛 𝑖=1 ) + 2(𝑘+1) 𝑛 (19) , where 𝑒𝑖 2 = (𝑇𝑖 − 𝑇𝑖′)2 and k is the degree of freedom of T. Definition 12 Given T a time series of length n, and T’ the ARIMA fitted model. BIC is measured, as below: 𝐵𝐼𝐶 = ln ( 1 𝑛 ∑ 𝑒𝑖 2𝑛 𝑖=1 ) + (𝑘+1)∗ln(𝑛) 𝑛 (20) , where 𝑒𝑖 2 = (𝑇𝑖 − 𝑇𝑖′)2 and k is the degree of freedom of T. Generally, between AIC and BIC criterions, the most useful is BIC, which is considered to be more accurate. A. Detecting Error in Forecast in the Proposed Model In the proposed algorithm, we mentioned that the method does not create a single model - it does not create parameters. Our model is a dynamic model, which, in a long-term period, changes constantly. This means that AIC or BIC criterion is inapplicable. In order to be able to proof which model is better, and why, we use the concept of error in forecasting. This means that we keep a certain percentage of values unused (the last observations), in order to detect which algorithm provides a smaller Sum Square Error in Predictions. The number of predictions is kept constant, 10, in any case that we studied. This is done in order to prevent reducing the amount of information in disposal. In Table 1, there are some examples of errors made during the forecast, for several time series. TABLE I. COMPARISON OF TWO METHODS Time series Error in forecasting ARIMA Prop. approach star 4184.963 4245.082 prodn 16.14206 18.56693 rosewine Inf 768.4422 unemp 15006.22 4095.185 penguin 596901.4 321.071 Al_births Inf 3489920 hsales 671.1774 274.1702 AirPassen gers 281852.9 167063.2 ARIMA vs Proposed approach In Table 1, is obvious the advantage of our approach in relation to ARIMA. In 2 cases, ARIMA could not give an appropriate model – AIC criterion could not be approximated. In our trials, resulted that in 66.7% of the cases, our approach provided better performance than ARIMA’s model. In this percentage, is also included the 12.5% when ARIMA failed to provide a suitable model. An example of how much differ the real results from the ones provided by ARIMA, is given in Fig.6. Long-Term Algorithm Long_Term=function(T,m,k) 1.#n=length(T); k is the number of lags we want to forecast 2.for (i in seq(1,k)){ 3.prediction=Short_Term(T,m) 4.T=c(T,prediction) 5. } end function 0 10 20 30 40 50 60 70 100200300 Comparison real vs forecast Time index Observation Time series Prediction
  • 5. Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 Vol. 2 Issue 11, November - 2015 www.jmest.org JMESTN42351212 3289 The time series taken in consideration is penguin, while the length of the motif is 7. Fig.6. ARIMA, the prop. method and real values It is clearly visible that the shape and the values gained by the proposed method are more compatible with the real ones. Whereas ARIMA’s curve of prediction is far more unrealistic. In many cases, ARIMA provided central values, very close to a linear curve. Whereas the proposed method provides different shapes, a nonlinear vector of forecasts. B. Detecting factors that influence results In order to define where this result comes from, so, whether it is because of the complexity of the time series, we use a complexity measure for time series’ complexity-fluctuations[9]. Definition 17 Given T a time series of length n, fluctuations are measured, as below: 𝑓𝑙𝑢𝑐𝑡(𝑇) = 1 𝑛−1 ∑ (𝑇𝑖 − 𝑇𝑖+1)2𝑛−1 𝑖=1 (21) In Fig.7. is shown the fluctuation of some time series. Fig.7. Fluctuation of some time series According to the traditional definition to the correlation between two factors, as below: Definition 18 The correlation between two random variables X and Y with length n is the index, measured as below: 𝐶𝑂𝑅(𝑃, 𝑄) = ∑ (𝑃 𝑖−𝑃̅)∗(𝑄 𝑖−𝑄̅)𝑛 𝑖=1 √∑ (𝑃 𝑖−𝑃̅)2𝑛 𝑖=1 √∑ (𝑄 𝑖−𝑄̅)2𝑛 𝑖=1 (22) it results that, in both cases, the relation between fluctuation and error in forecasting is strong (in each case, greater than 99%). This means that there is a strong dependency between these two factors. Another important factor that influences the forecast, might be the quantity of the data. It is naively deduced that the longer the time series is, the smaller will be the error. There have been made various tests, by increasing the amount of data. It has been chosen a high-complexity time series, such as hsales. In Fig.8. we see what happens with the error in forecasting when this time series is enlarged, with fixed length of the time series m=12 and m=9. Fig.8. Error in forecasting for different lengths In 50% of the cases, ARIMA (the proposed method with m=12) were better than the other. In a low percentage, the proposed method with m=9 is more performant than the others. What is more, we notice that there is no defined trend of the error in forecasting. This means that, if we increase the amount of data, we cannot presume whether the error will decrease or not. An important parameter during our method would be the length of the motif, m. The time series selected for this evaluation is again hsales, due to its’ complexity. We keep a fixed length of the time series- n=170. An example of this algorithm is shown in Fig.9. Fig.9. a) hsales b) Detecting the best value of m We can see that the lower error in prediction is made for m=9, equal to the periodicity of the time series. But this contradicts the upper results, where the proposed approach with m=12 was better than the proposed approach with m=9. This means that, in order to get better results, the length of the motif should be defined according to a careful study of periodicity, variations, etc. ACKNOWLEDGMENT I would like to thank Ms. Ilda Drini, employee at Department of Financial Stability and Statistics, for all the help during the creation of this method. REFERENCES Comparison of fluctuations Time series Fluctuations 050000150000 star prodn rosewine unemp penguin Al_births hsales Air_pass 60 80 100 120 140 160 040008000 Comparison Length of t.s. Errorinpred. ARIMA Pr. m. m=12 Pr. m. m=9 0 50 100 200 30507090 Time series hsales Time Observations 4 6 8 10 12 10003000 Detect best length of m Length of motif Errorinpred.
  • 6. Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 3159-0040 Vol. 2 Issue 11, November - 2015 www.jmest.org JMESTN42351212 3290 All time series that are used in this article, can be found in the next websites: http://new.censusatschool.org.nz/resource/time-series- data-sets-2013/ www.instat.gov.al https://datamarket.com/data/list/?q=provider%3Atsdl http://www.cs.ucr.edu/~eamonn/discords/ http://vincentarelbundock.github.io/Rdatasets/datasets.ht ml http://cran.r-project.org/web/packages/astsa/index.htm [1] Agrawal, R., Faloutsos, C., Swami, A., (1993): “Efficient similarity search in sequence databases,” in: Fourth International Conference on Foundations of Data Organization, D. Lomet, Ed., Heidelberg: SpringerVerlag, pp. 69–84 [2] Batista, G., Keogh, E. J. Tataw, O., de Souza, V., “CID: an efficient complexity-invariant distance for time series” [3] Chan, K. & Fu, W. (1999). Efficient time series matching by wavelets. Proceedings of the 15 th IEEE International Conference on Data Engineering. [4] Chouakria, A., D., Diallo A., Giroud F., (2007) “Adaptive clustering of time series”. International Association for Statistical Computing (IASC), Statistics for Data Mining, Learning and Knowledge Extraction, Aveiro, Portugal [5] Dhamo, E., Ismailaja, N., Kalluçi, E., (2015): “Comparing the efficiency of CID distance and CORT coefficient for finding similar subsequences in time series”, Sixth International Conference ISTI, 5-6 June. [6] Lin, J., Keogh, E. , Lonardi, S. and Patel, P. (2002): “Finding Motifs in Time Series”, in Proc. of 2nd Workshop on Temporal Data Mining [7] Mueen A., Keogh, E., J., (2010A): “Online discovery and maintenance of time series motifs”. KDD, pg. 1089-1098 [8] Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B., (2009A) “Exact Discovery of Time Series Motifs”, SDM, pg. 473-484 [9] Mueen A., Keogh, E., J., Bigdely- Shamlo N., (2009): “Finding Time Series Motifs in Disk-Resident Data”. ICDM, pg 367-376 [10]Mueen A., (2013):” Enumeration of Time Series Motifs of All Lengths”. ICDM,pg. 547-556 [11]Yan, Ch., Fang, J., Wu, L., Ma, Sh., (2013): “An Approach of Time Series Piecewise Linear Representation Based on Local Maximum Minimum and Extremum”, Journal of Information & Computational Science 10:9 2747–2756 [12]Yi, B.-K., Faloutsos Ch., (2000): “Fast time sequence indexing for arbitrary Lp norms”. In The VLDB Journal, pages 385–394.