Time Series Design
Program Transcript
RICHARD BALKIN: Time series design simply refers to a study
done over time, as opposed to collecting data at one particular
instant. Often, a time series design is really a single-subject
design. But you can have multiple
participants in time
series design. In the article that we'll discuss for this week, we see a time
series design occur as an element of looking at changes across a
program over
time, and perceptions that participants have about that program
over time.
So time series design can be used to ask whether the data are
predictable, whether the information can be verified, and whether
it can be replicated over time. In other words, in a good time
series design, I should be
able to conduct a
study like this again, and get the same result. So in a time series
design, instead
of looking at how changes may occur between groups, we may
see how change
occurs with a single subject, or even within a group, or for a
program over a particular period of time.
And that period of time can even be longitudinal in nature. We can
look at changes across a few months, or we can look at changes
across years.
Additionally, if multiple subjects are used in a time series
design, and if the
research is longitudinal in nature, you need to take into
consideration attrition
rates. Are the participants who began the study the same
participants at the end
of the study? Was there attrition? And maybe consider why
attrition might occur.
For example, is the researcher able to keep up with all of the
participants at the beginning, intermediate, and latter stages of
the study? Attrition
is normal in any
research study, but it also needs to be accounted for.
An example of some time series research that I've conducted in
the past was when I worked as a therapist at a psychiatric hospital. At
that time, we were
very interested in seeing what happened to our clients once they
left the hospital. We knew how they were when they were admitted.
They were either a
danger to self or a danger to others. And we had an idea of how
stable they were
when they were discharged. But how were they doing one month, three
months, six
months, and 12 months after treatment?
So we had an aftercare program. And through that active aftercare
program, we
were able to do some post-care follow up with each of the
clients once they left
the hospital. One of our experiences was that after six months it
was very difficult
to continue to get feedback from the participants. One of the
reasons simply was
that working with this population, they were highly transient.
Phone numbers
would change. Addresses would change. And we just weren't
able to get a lot of
one-year follow up. Or perhaps a child had relapsed, and the
parents were angry at the treatment center and didn't want to
respond to our queries. So those elements can play a role too. As
I said before, attrition occurs.
© 2016 Laureate Education, Inc. 1
But that process of getting data from each client at the
one-month, three-month, six-month, and 12-month intervals was
essential in terms of doing a time series design, and finding out
whether kids relapsed or regressed to their previous high-risk
behavior after receiving treatment at the hospital. And what were
the influencing factors? We would also want to know, for example,
did they continue in outpatient counseling?
In examining an article that uses time series design, we've
selected an article
that's quite multi-faceted. So in this particular article, they use
a four-phase
design to conduct the time series research. The 12-month
baseline pre-exposure
phase assessed program and patient outcomes. In Phase II,
which occurs after
six months of training, MDFT experts train Adolescent Day
Treatment Program
staff and administrators. And then in Phase III they have an
implementation
stage. And this is at 14 months. And then at Phase IV, they have
a Durability
Practice Phase, which is around 18 months.
So let's take a look at how the program dimensions changed
over time through
this time series design. So these program dimensions included
aspects like
autonomy, and clarity, and program organization, and control.
And what they
notice is that as a result of implementing this MDFT program,
that participants,
patients within the program, noticed positive differences among
these program
dimensions. So here what we end up with is a statistically
significant difference in
the way a program is perceived by the primary stakeholders, in
this case, the
patients who are experiencing treatment in the day program.
So imagine being able to implement an intervention that across
time improves
your program and improves receptiveness to treatment. And that
was the
importance of the study. Hopefully, when practitioners see this,
they can see a
treatment model that affects the quality of care. And they may
be more apt to use
such a model in their programs.
In terms of multicultural, ethical, and legal considerations, we
might want to once again review: who was the sample? Who were the
participants in this study? We want to make sure that the
participants in the study are truly generalizable to the
population of interest.
Additionally, whenever doing a time series design, you want to
think about and
consider, what occurs during the study? What is the
intervention? What is the
change that we're looking at? Is this change positive or not? For
example, what would happen if, while the study was being
conducted, a negative consequence of the intervention immediately
began to occur? Well,
of course the
ethical thing to do would be to stop the study.
And then it would be important to note that maybe this is not a
good intervention
to use. The study was cut short. And none of the phases were
completed,
because an unforeseen event or negative consequence was
occurring. So that's
another element of time series design, particularly when the
study is longitudinal
in nature.
Time Series Design
Additional Content Attribution
MUSIC:
Creative Support Services
Los Angeles, CA
Dimension Sound Effects Library
Newnan, GA
Narrator Tracks Music Library
Stevens Point, WI
Signature Music, Inc
Chesterton, IN
Studio Cutz Music Library
Carrollton, TX
Time Series Analysis
Anne Senter
One definition of a time series is that of a collection of
quantitative observations that are evenly spaced in time and
measured
successively. Examples of time series include
the continuous monitoring of a person’s heart rate, hourly
readings of air
temperature, daily closing price of a company stock, monthly
rainfall data, and
yearly sales figures. Time series analysis is generally used when
there are 50
or more data points in a series. If the
time series exhibits seasonality, there should be 4 to 5 cycles of
observations
in order to fit a seasonal model to the data.
Goals of time series analysis:
1. Description: identify patterns in correlated data, such as trends and seasonal variation
2. Explanation: understand and model the data
3. Forecasting: predict short-term trends from previous patterns
4. Intervention analysis: determine how a single event changes the time series
5. Quality control: detect deviations of a specified size that indicate a problem
Time series are analyzed in order to understand the
underlying structure and function that produce the
observations. Understanding the mechanisms of a time series
allows a mathematical model to be developed that explains the
data in such a
way that prediction, monitoring, or control can occur.
Examples include prediction/forecasting,
which is widely used in economics and business.
Monitoring of ambient conditions, or of an input or an output, is
common
in science and industry. Quality control
is used in computer science, communications, and industry.
It is assumed that a time series data set has at least one
systematic pattern. The most common
patterns are trends and seasonality.
Trends are generally linear or quadratic. To find trends,
moving averages or regression
analysis is often used. Seasonality is a
trend that repeats itself systematically over time. A second
assumption is that the data exhibits
enough of a random process so that it is hard to identify the
systematic patterns
within the data. Time series analysis
techniques often employ some type of filter to the data in order
to dampen the
error. Other potential patterns have to
do with lingering effects of earlier observations or earlier
random errors.
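As a minimal sketch of how a moving average dampens random error so that an underlying trend stands out (pure Python; the data values and the window length of 3 are invented for illustration):

```python
def moving_average(series, window=3):
    """Smooth a series with a simple moving average; averaging adjacent
    points dampens random error so any underlying trend stands out."""
    if window > len(series):
        raise ValueError("window longer than series")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# A noisy upward trend: the smoothed values rise more steadily
data = [1.0, 2.4, 1.8, 3.1, 2.9, 4.2, 3.8, 5.1]
smoothed = moving_average(data, window=3)
```

Regression over the smoothed values then gives a cleaner estimate of the trend than the raw data would.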
There are numerous software programs that will analyze time
series, such as SPSS, JMP, and SAS/ETS.
For those who want to learn or are comfortable with coding,
Matlab, S-PLUS, and R are other software packages that can
perform time series analyses. Excel can be used if linear
regression analysis
is all that is required (that is, if all you want to find out is the
magnitude
of the most obvious trend). A word of
caution about using multiple regression techniques with time
series data:
because of the autocorrelated nature of time series data, such
data violate the assumption of independence of errors.
Type I error rates will increase substantially when
autocorrelation is
present. Also, inherent patterns in the
data may dampen or enhance the effect of an intervention; in
time series
analysis, patterns are accounted for within the analysis.
Observations made over time can be either discrete or
continuous. Both types of observations
can be equally spaced, unequally spaced, or have missing data.
Discrete measurements can be recorded at any
time interval, but are most often taken at evenly spaced
intervals. Continuous measurements can be spaced
randomly in time, such as measuring earthquakes as they occur
because an
instrument is constantly recording, or can entail constant
measurement of a
natural phenomenon such as air temperature, or a process such
as velocity of an
airplane.
Time series are very complex because each observation is
somewhat dependent upon the previous observation, and often is
influenced by
more than one previous observation.
Random error is also influential from one observation to
another. These influences are called
autocorrelation—dependent relationships between successive
observations of the
same variable. The challenge of time
series analysis is to extract the autocorrelation elements of the
data, either
to understand the trend itself or to model the underlying
mechanisms.
Time series reflect the stochastic nature of most
measurements over time. Thus, data may
be skewed, with mean and variation not constant, non-normally
distributed, and
not randomly sampled or independent.
Another non-normal aspect of time series observations is that
they are
often not evenly spaced in time due to instrument failure, or
simply due to
variation in the number of days in a month.
There are two main approaches used to analyze time series:
(1) in the time domain or (2) in the frequency domain. Many
techniques are available to analyze data
within each domain. Analysis in the time
domain is most often used for stochastic observations. One
common technique is the Box-Jenkins ARIMA
method, which can be used for univariate (a single
data set) or multivariate (comparing two or more data sets)
analyses. The ARIMA technique uses
moving averages, detrending, and regression methods to detect
and remove
autocorrelation in the data.
Below, I will demonstrate a Box-Jenkins ARIMA time domain
analysis of a
single data set.
Analysis in the frequency domain is often used for periodic
and cyclical observations. Common techniques are spectral
analysis, harmonic
analysis, and periodogram analysis. A
specialized technique is Fast Fourier Transform (FFT).
Mathematically, frequency domain techniques
use fewer computations than time domain techniques, thus for
complex data,
analysis in the frequency domain is most common. However,
frequency analysis is more difficult
to understand, so time domain analysis is generally used outside
of the
sciences.
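The frequency-domain idea can be sketched with a naive discrete Fourier transform that picks out the dominant period in a series (a toy illustration only; real packages use the much faster FFT, and the example wave is invented):

```python
import math

def dominant_period(series):
    """Naive discrete Fourier transform: return the period (in samples)
    whose frequency carries the largest spectral power, ignoring the
    zero-frequency (mean) term."""
    n = len(series)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(series[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(series[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k  # frequency index k corresponds to period n / k

# Six full 12-sample cycles: the spectrum should peak at period 12
wave = [math.sin(2 * math.pi * t / 12) for t in range(72)]
period = dominant_period(wave)
```

This is the same logic behind a spectral density plot: power is computed at each candidate frequency, and the largest spike marks the strongest cycle.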
Time series analysis using
ARIMA methods
Using the ARIMA (auto-regressive, integrated, moving average)
method is an iterative, exploratory process intended to best fit
your time series observations, using three steps (identification,
estimation, and diagnostic checking) to build an adequate model
for a time series. The auto-regressive component
(AR) in ARIMA is designated as p, the
integrated component (I) as d, and
moving average (MA) as q. The AR component represents the
lingering
effects of previous observations. The I component represents
trends, including
seasonality. And the MA component
represents lingering effects of previous random
shocks (or error). To fit an ARIMA
model to a time series, the order of each model component must
be selected.
Typically a small integer value (0, 1, or 2) is found for each
component. The goal is to find the most
parsimonious model with the smallest number of estimated
parameters needed to
adequately model the patterns in the observed data.
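To make the AR component concrete, here is a minimal sketch of estimating the single coefficient of an AR(1) model, y[t] = phi * y[t-1] + error, by least squares (pure Python; the example series is invented, and a real ARIMA fit would be done in a statistics package):

```python
def estimate_ar1(series):
    """Least-squares estimate of phi in the AR(1) model
    y[t] = phi * y[t-1] + error, assuming a mean-centered series
    (so no intercept term is fitted)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

# A hand-made decaying series: each value is roughly 0.8 of the last,
# so the lingering effect of the previous observation is strong
y = [1.0, 0.8, 0.7, 0.5, 0.45, 0.3, 0.28, 0.2]
phi = estimate_ar1(y)
```

A phi near zero would mean previous observations have little lingering effect; a phi near one means each observation strongly echoes the last.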
In order to demonstrate time series analysis, I introduce a
data set of monthly precipitation totals from Portola,
CA in the Sierra Nevada
in Table
1. When a time series has strong
seasonality, as my data set does, a slightly different type of
ARIMA (p,d,q) process is used, which is often called SARIMA
(p,d,q)*(P,D,Q), where S stands for seasonal. In this model, not
only are there possible
AR, I, and MA terms for the data, there is a second set of AR, I,
and MA terms
that take into account the seasonality of the data.
Time series data are correlated,
which means that measurements are related to one another and
change together to
some degree. Thus, each observation is
partially predictable from previous observations, or from
previous random
shocks, or from both. An assumption made
after analysis is that the correlations inherent in the data set
have been
adequately modeled. Thus after a model
has been built, any leftover variations are considered to be
independent and
normally distributed with mean zero and constant variance over
time. These leftover variations are used to
interpret the data.
Regardless of which technique is used, the first step in any
time series analysis is to plot the observed values against time.
A number of qualitative aspects are
noticeable as you visually inspect the graph.
In Figure 1, we see that there is a 12-month pattern of
seasonality, no evidence of a linear trend, and variation from the
mean that appears to be approximately equal across time.
Figure 1. Monthly precipitation data from the NOAA weather station
in Portola, CA, from January 1999 through April 2004.
Precipitation occurs cyclically; December falls on observations
12, 24, 36, 48, 60, and 72. Mean = 1.66 inches/month, standard
deviation = 2.09, n = 76.
Is there a trend to this data set? The simplest linear equation
would be y = b,
where b is the random shock, or error, of the data set. The
linear equation for my data set is y =
-0.0018x + 1.6688. With a slope of
-0.0018, there is no significant linear trend.
This data set needs no further work to eliminate a linear or
quadratic
trend.
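The slope check can be reproduced with an ordinary least-squares line fit. A minimal sketch in Python (the example values are invented for illustration and are not the Portola data):

```python
def linear_fit(y):
    """Ordinary least-squares fit of y = a*x + b against x = 0, 1, 2, ..."""
    n = len(y)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(y) / n
    slope = (sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, y))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# A flat, seasonal-looking series: the fitted slope is near zero,
# which is the "no linear trend" conclusion drawn in the text
y = [0.2, 1.5, 3.0, 1.4, 0.1, 0.2, 1.6, 2.9, 1.5, 0.2]
slope, intercept = linear_fit(y)
```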
If removal of the trend—detrending—is needed, I would
proceed to differencing. Ordinary least
squares analysis is another method used to detect and remove
trends. Differencing has advantages of ease of use
and simplicity, but also has disadvantages including over-
correcting for
trends, which skews the correlations in a negative direction.
There are other problems with differencing
that are covered in textbooks.
Differencing
means calculating the difference among pairs of observations at
some time
interval. A difference of one time interval apart is calculated by
subtracting value #1 from value #2, then #2 from #3, and so on,
then plotting that data to determine whether a mean of 0 and a
constant variance are present. If differencing of one does not
detrend the data, calculate a difference of 2 by subtracting
difference #2 from difference #3, and so on. Use a log
transformation on the
differences if necessary to stabilize the mean and variance.
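The differencing procedure described above can be sketched as follows (pure Python; the example values are invented):

```python
def difference(series, lag=1):
    """Difference a series at a given lag: value[t] - value[t - lag].
    A lag of 1 targets a linear trend; a lag of 12 would target a
    12-month seasonal pattern."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A steadily rising (trended) series: its first differences hover
# around 1, i.e. a roughly constant mean, the sign of a successful detrend
trended = [2.0, 3.1, 3.9, 5.2, 6.0, 7.1]
detrended = difference(trended, lag=1)
```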
Seasonal
autocorrelation is different from a linear or quadratic data trend
in that
it is predictably spaced in time. Our
precipitation data can be expected to have a 12-month seasonal
pattern, whereas
daily observations might have a 7-day pattern, and hourly
observations often
have a 24-hour pattern.
Equation 1: e_t = Y_t - Ybar (each residual is the observed value minus the series mean)
In order to detect seasonality, plot the autocorrelation function
(ACF) by
calculating and graphing the residuals
(observed minus mean for each data point).
The graph of the residuals against a specified time interval is
called a
lagged autocorrelation function or a correlogram. The null
hypothesis for the ACF is that the
time series observations are not correlated to one another, i.e.,
that any
pattern in the data is from random shocks only.
The residuals can be calculated using equation 1.
In time series analysis a lag is defined as: an event
occurring at time t + k (k > 0) is said to lag behind an event
occurring at
time t, the extent of the lag being k.
In 1970, Box and Jenkins wrote, “…to obtain a
useful estimate of the autocorrelation function, we would need
at least 50
observations and the estimated autocorrelations would be
calculated for k = 0,
1, …, k, where k was not larger than N/4”. For my data set of
78 observations, I
specified 19 autocorrelation lags (78/4 = 19.5).
A rule of thumb for an ACF is that plotted values more than 2
standard errors away from the zero mean indicate statistically
significant autocorrelation.
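The residual-based ACF and the 2-standard-error rule of thumb can be sketched in pure Python (an illustration only; a real analysis would use a package such as JMP or R, and the example wave is invented):

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag, computed from the
    residuals (observed value minus the series mean)."""
    n = len(series)
    mean = sum(series) / n
    resid = [y - mean for y in series]
    var = sum(r * r for r in resid)
    return [sum(resid[t] * resid[t - k] for t in range(k, n)) / var
            for k in range(1, max_lag + 1)]

def significant_lags(series, max_lag):
    """Lags whose autocorrelation is more than roughly 2 standard errors
    (about the 95% confidence limits) from zero."""
    threshold = 2 / math.sqrt(len(series))
    return [k + 1 for k, r in enumerate(acf(series, max_lag))
            if abs(r) > threshold]

# A strongly seasonal series shows significant autocorrelation at lags
# tied to its cycle, much as the precipitation data does at lags 6 and 12
wave = [math.sin(2 * math.pi * t / 12) for t in range(72)]
lags = significant_lags(wave, 19)
```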
In Figure 2, there are two values, at lag 6 and lag 12, that lie
more than 2 standard errors (that is, the approximate 95%
confidence limits) from the zero mean. I interpret this as a
6-month seasonal pattern that cycles between summer when
there is little to no
precipitation, and winter when precipitation is at its peak. So,
even though the linear equation reveals
no trend, graphing the ACF reveals seasonality.
I used the JMP software program from SAS to analyze my data
set. Though I will not cover how to
perform a time series analysis in the spectral domain, I did use
the spectral
density graph to verify that the biggest seasonal pattern occurs
at 12-month
intervals, not at 6-month intervals. In
Figure 3, notice the large spike at period 12.
Figure 2. Lagged autocorrelation function of the Portola, CA
precipitation data. Visual inspection shows significant deviations
from zero correlation at lags 1, 6, and 12, with values very close
to significance at lags 7 and 13, suggesting two seasonal (rainy
season and dry season) patterns spaced about 6 months apart.
Number of autocorrelation lags equals 19.
Figure 3. Spectral density as a function of period. A strong
signal appears at about period 12, corresponding to a yearly
cycle.
The partial autocorrelation function (PACF) is also used to
detect trends and seasonality. Figure 4
is the PACF of the precipitation data.
In general, the PACF is the amount of correlation between a
variable and
its lag that is not explained by correlations at all lower-order
lags. The equation to obtain partial
autocorrelations is very complex, and is best explained in time
series
textbooks.
Figure 4. Lagged partial autocorrelation function of the Portola,
CA precipitation data. Significant deviation from zero is evident
at lags 1, 6, and 12, suggesting the same 6-month seasonal
pattern.
Now that our observations have been graphed against time, along
with the ACF and PACF, we can begin to match our patterns to
idealized ARIMA models. The easy way to analyze a
time series data set is to simply input numerous variations of
ARIMA. There are also systematic steps that you can
take that will help suggest the best values for the AR, I, and MA
terms.
Here I present a few general rules to apply when working to
identify the best-fit ARIMA model. These
rules come from the Duke University
website http://www.duke.edu/~rnau/411home.htm, which, along
with other textbooks and websites listed
below, was instrumental in helping me understand time series
analysis, and
specifically in helping me understand the nuances of seasonally
affected time
series.
After adjusting the data by a seasonal difference of 1 using
JMP, a visual inspection shows that the ACF decays more slowly
than the PACF (Figure 5). I used Duke’s Rule #3 (the optimal order
of differencing is often the order of differencing at which the
standard deviation is lowest) to help me determine that my data
needed no
differencing for trend but did need to be differenced for
seasonality (both
options available in JMP). A seasonal
difference of 1 yields a standard deviation of 1.89, the lowest
value of the
iterations that I tried.
Figure 5. ACF and PACF after a seasonal difference of 1. All ACF
and PACF lags fall below significance levels, indicating that
autocorrelation has been eliminated.
Using the iterative approach of checking model values via
JMP, I found that the lowest values of Akaike’s Information
Criterion (AIC), Schwarz’s Bayesian Criterion, and
the
-2LogLikelihood for my data set are obtained with an ARIMA
(0,0,0)(1,1,1). According to Duke’s Rule 8, it is possible
for an AR term and an MA term to cancel each other out. They
suggested that I try a model with one
fewer AR term and one fewer MA term, particularly if it takes
more than 10
iterations for the model to converge. My
model took 6 iterations to converge.
Duke’s Rule 12 states that if a series has a strong and
consistent seasonal pattern, never use more than one order of
seasonal
differencing or more than 2 orders of total differencing
(seasonal + nonseasonal). Rule
13 states that if the autocorrelation at the seasonal period is
positive,
consider adding an SAR term, and if negative try adding an
SMA term to the
model. Do not mix SAR and SMA terms in
the same model.
Duke’s rules for seasonality suggest that I not accept a
mixed model as the best-fit model for my data.
I eliminated the AR and MA terms, but that model yielded a
higher value
of AIC, Schwarz’s Bayesian Criterion, and a much higher value
of the
-2LogLikelihood. I also successively
eliminated the AR or the MA term while leaving the other term
in, but still got
higher values for all test parameters. Based
on the parameter values, I believe that the ARIMA
(0,0,0)(1,1,1)
is the best model for my data.
Table 2. Parameter estimates of the most likely SARIMA models.

Model                            DF   Variance    AIC        SBC        RSquare   -2LogLH
Seasonal ARIMA(0,0,0)(1,1,0)12   62   3.5908132   83.784319  88.102085  -0.11     80.1373
Seasonal ARIMA(0,0,0)(0,1,1)12   62   3.5125921   82.374756  86.692522  -0.09     79.272251
Seasonal ARIMA(0,0,0)(0,1,0)12   63   3.6544726   83.93302   86.091903  -0.14     348.10154
Seasonal ARIMA(0,0,0)(1,1,1)12   61   2.8333581   69.581017  76.057666  -0.04     75.26258

Model #4, SARIMA (0,0,0)(1,1,1)12, has the lowest variance, AIC,
SBC, and -2LogLH. About 20 models were tested; these four had the
lowest scores.
I have demonstrated best-fitting an ARIMA model to a time
series using description and explanation phases of time series
analysis. If I were to continue with this exercise, I
could use this model to predict precipitation for the next year or
two. Most software programs are capable of
extrapolating values based on previous patterns in the data set.
This topic is covered in textbooks.
There are numerous books, websites, and software programs
available for working with time series.
I found that most of the books that were solely dedicated to
time series
were quite dense with formulas, thus difficult to understand.
Some websites were somewhat easier to
understand but only a couple offered a step-by-step process to
guide you
through an analysis. I used just one
software program, JMP, and used the help guide extensively.
The help guide was useful in understanding
the generated graphs, but offered definitions without
elaboration as to how to
interpret the defined data. If you are
going to analyze a time series, I suggest using multiple
resources; especially if you are new to time series analysis (as I
am), find a knowledgeable person who can help you with
interpretation of your results.
Books:
If the CD-ROM is available, this text will walk you through
many analyses.
Brockwell, P.J. and Davis,
R.A. 2002, 2nd ed. Introduction to time series
and forecasting. Springer, New York.
These guys wrote the book on ARIMA processes.
Box, G.E.P., Jenkins, G.M., and Reinsel,
G.C. 1994, 3rd ed. Time series analysis: Forecasting and
control. Prentice Hall, Englewood
Cliffs, NJ.
This book is pretty understandable, though still lots of
formulas.
Chatfield, C. 2004, 6th ed.
The analysis of time series – an introduction. Chapman and
Hall,
London, UK.
An excellent discussion of problems and
solutions to ARIMA techniques.
Glass, G.V., Willson, V.L., and Gottman, J.M. 1975. Design
and analysis of time-series experiments. Colorado
Associated University Press, Boulder, Colorado.
An interesting read about time series from a historical
perspective.
Klein, J.L. 1997. Statistical visions in time: a history of
time series analysis, 1662-1938.
Cambridge University Press, New York.
The time series chapter is understandable and easily
followed.
Tabachnick, B.G., and Fidell, L.S. 2001, 4th ed. Using
multivariate statistics. Allyn and Bacon, Needham
Heights, MA.
Websites:
This is the best
website that I found in my web searches.
It is a step-by-step guide to understanding many aspects of time
series,
including a series of ‘rules’ to use when analyzing your data.
http://www.duke.edu/~rnau/411home.htm
An introduction to
time series analysis from an engineering point of view, with two
worked
examples. Very
helpful.
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.ht
m
Extensive website
with LOTS of useful information once you get through the
business talk. Has applets for
determining stationarity, seasonality, mean,
variance, etc.
http://home.ubalt.edu/ntsbarsh/Business-stat/stat-
data/Forecast.htm
Useful for
definitions, would be great if they had examples of actual
analyses.
http://www.statsoftinc.com/textbook/stathome.html
Step-by-step explanation
of time series analysis, including examples of how to use Excel
to adjust for
seasonality and analyzing the data by using linear regression,
all in the
Crunching section.
http://www.bized.ac.uk/timeweb/index.htm
Type
in time series in product search to see available books that are
short but
sweet.
http://www.sagepub.com/Home.aspx
Website
for my precipitation data.
http://www.wrh.noaa.gov/cnrfc/monthly_precip.php
Website
for the software package that I used in this presentation.
http://www.jmp.com/
Extensive
and easy to use statistical software package.
http://www.spss.com/
Free software for
analyzing time series data sets, but you need to code.
http://www.r-project.org/
Free statistics and
forecasting software (didn’t try out, so can’t say how good)
http://www.wessa.net/
Yadav & Toshniwal, TRIM 7 (2) July - Dec 2011, 74
Graph Based Framework for Time Series Prediction
Vivek Yadav*
Durga Toshniwal**
Abstract
Purpose: A time series comprises a sequence of observations
ordered in time.
A major task of data mining with regard to time series data is
predicting the
future values. In time series there is a general notion that some
aspect of the past pattern will continue in the future. Existing
time series techniques
fail to capture the
knowledge present in databases to make good assumptions of
future values.
Design/Methodology/Approach: A graph matching technique is applied
to time series data in the paper.
Findings: The study found that the use of graph matching
techniques on time-series data can be useful for finding hidden
patterns in a time series database.
Research Implications: The study motivates mapping time series
data to graphs and using existing graph mining techniques to
discover patterns from time series data, then using the derived
patterns for making predictions.
Originality/Value: The study maps time-series data to graphs and
uses graph mining techniques to discover knowledge from time
series data.
Keywords: Data mining; Time Series Prediction; Graph Mining;
Graph Matching
Paper Type: Conceptual
Introduction
Data mining is the process of extracting meaningful and
potentially useful patterns from large datasets. Nowadays, data
mining is becoming an increasingly important tool for modern
business processes, transforming data into business intelligence
and giving businesses an informational advantage so that they can
make strategic decisions based on past observed patterns rather
than on intuitions or beliefs (Clifton, 2011). A graph based
framework for time series prediction is a step towards exploring a
new, efficient approach for time series prediction where
predictions are based on patterns observed in the past.
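The paper's specific graph matching technique is not detailed in this excerpt, so as a loose, hypothetical illustration of the general idea (mapping a series onto a graph and predicting from past patterns), here is a sketch: a discretized series becomes a directed graph whose edge weights count observed transitions, and the predicted next value is the most frequent successor of the current state. This is an assumption made for illustration, not the authors' method:

```python
from collections import defaultdict

def build_transition_graph(series):
    """Map a discretized time series onto a directed graph: nodes are
    observed values, edge weights count how often one value is followed
    by another (an illustrative stand-in for the paper's graph mapping)."""
    graph = defaultdict(lambda: defaultdict(int))
    for prev, curr in zip(series, series[1:]):
        graph[prev][curr] += 1
    return graph

def predict_next(graph, current):
    """Predict the next value as the most frequent successor of the
    current node, i.e. the past pattern assumed to continue in future."""
    successors = graph[current]
    if not successors:
        return None
    return max(successors, key=successors.get)

# Toy discretized series with a repeating pattern
series = ["low", "mid", "high", "mid", "low", "mid", "high", "mid", "low"]
g = build_transition_graph(series)
prediction = predict_next(g, "low")
```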
Time Series data consists of sequences of values or events
obtained over
repeated instances of time. Mostly these values or events are
collected at
equally spaced, discrete time intervals (e.g., hourly, daily,
weekly,
monthly, yearly, etc.). When there is only one variable upon which
observations are made with respect to (w.r.t.) time, it is called
a univariate time series. Data mining on time-series data is
popular in many
applications, such as stock market analysis, economic and sales
forecasting, budgetary analysis, utility studies, inventory
studies, yield
* Department of Electronics & Computer Engineering, IIT
Roorkee.
email: [email protected]
** Assistant Professor. Department of Electronics & Computer
Engineering, IIT Roorkee.
email: [email protected]
projections, workload projections, process and quality control,
observation of natural phenomena (such as atmosphere,
temperature,
wind, earthquake), scientific and engineering experiments, and
medical
treatments (Han & Kamber, 2006).
A time series dataset consists of values {Y1, Y2, Y3, …, Yt},
where each Yi represents the value of the variable under study at
time i. One of the major goals of data mining in time series is
forecasting, i.e., predicting the future value Yt+1. The
successive observations in
time series
are statistically dependent on time and time series modeling is
concerned
with techniques for analysis of such dependencies. In time
series analysis,
a basic assumption is made that some aspect of the past pattern
will continue in the future. Under this assumption, time series
prediction
is
assumed to be based on past values of the main variable Y. The
time series prediction can be useful in planning, and in measuring
the performance of predicted values on past data against the
actual observed values of the main variable Y.
Time series modeling is advantageous, as it can be used easily for
forecasting purposes: historical sequences of observations on the
main variable are readily available, since they are recorded as
past observations and can be purchased or gathered from published
secondary sources. In time series modeling, the
prediction of
values for future periods is based on the pattern of past values
of the
variable under study, but the model does not generally account
for
explanatory variable which may have affected the system. There
are two
reasons for resorting to such time models. First, the system may
not be
understood, and even if it is understood it may be extremely
difficult to
measure the cause and effect relationship of parameters
affecting the
time series. Second, the main concern may be only to predict
the next
value and not to explicitly know why it was observed (Box,
Jenkins &
Reinsel, 1976)
Time series analysis identifies four major components for
characterizing time-series data (Madsen, 2008). First, the trend
component indicates the general direction in which the series is
moving over a long interval of time, denoted by T. Second, the cyclic
component refers to long-term oscillations about a trend line or
curve, which may or may not be periodic, denoted by C. Third, the
seasonal component captures systematic, calendar-related variation,
denoted by S. Fourth, the random component characterizes the sporadic
motion of the series due to random or chance events, denoted by R.
Time-series modeling is also referred to as the decomposition of a
time series into these four basic components. The variable Y at time
t can be modeled as the product of the four components at time t,
i.e., Yt = Tt × Ct × St × Rt, using the multiplicative model (Box,
Jenkins & Reinsel, 1970), where Tt is the trend component at time t,
Ct the cyclic component, St the seasonal component, and Rt the random
component. Alternatively, an additive model (Balestra & Nerlove,
1966; Bollerslev, 1987) can be used, in which Yt = Tt + Ct + St + Rt,
with the symbols having the same meanings as above. Since the
multiplicative model is the most popular, we use it for the time
series decomposition. An example of time series data is the airline
passenger data set (Fig. 1), in which the main variable Y, the number
of passengers (in thousands) of an airline, is recorded w.r.t. time
on a monthly basis from January 1949 to December 1960. Clearly, the
time series is affected by an increasing trend as well as seasonal
and cyclic variations.
Fig. 1: Time series of the airline passenger data from 1949 to 1960,
represented on a monthly basis.
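The four-component model described above can be sketched numerically. The series below is synthetic and purely illustrative (the particular component shapes, periods, and amplitudes are assumptions, not values taken from the airline data):

```python
import math

n = 48  # four years of monthly observations

trend    = [100 + 2.0 * t for t in range(n)]                              # T_t: upward drift
cyclic   = [1 + 0.05 * math.sin(2 * math.pi * t / 36) for t in range(n)]  # C_t: 3-year cycle
seasonal = [1 + 0.2 * math.sin(2 * math.pi * t / 12) for t in range(n)]   # S_t: 12-month pattern
random_c = [1.0] * n                                                      # R_t: noise, held at 1 here

# Multiplicative model: Y_t = T_t * C_t * S_t * R_t
y_mult = [trend[t] * cyclic[t] * seasonal[t] * random_c[t] for t in range(n)]
# Additive alternative: Y_t = T_t + C_t + S_t + R_t
y_add = [trend[t] + cyclic[t] + seasonal[t] + random_c[t] for t in range(n)]
```

With multiplicative components, the seasonal swings grow as the trend rises, which is the behavior visible in the airline passenger series.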
Review of Literature
In time series analysis there is an important notion of
de-seasonalizing the time series (Box & Pierce, 1970). It relies on
the assumption that if the time series exhibits a seasonal pattern of
L periods, then by taking a moving average Mt of L periods we obtain
the mean value for the year, which is free of seasonality and
contains little randomness (owing to the averaging). Thus Mt = Tt ×
Ct (Box, Jenkins & Reinsel, 1976). To determine the seasonal
component, one simply divides the original series by the moving
average, i.e., Yt/Mt = (Tt × Ct × St × Rt)/(Tt × Ct) = St × Rt.
Averaging these ratios over months eliminates the randomness and
yields the seasonal component St. The de-seasonalized time series can
then be computed as Yt/St.
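The de-seasonalizing procedure above can be sketched as follows. A trailing L-period average is used for Mt here; the paper does not specify whether the window is trailing or centered, so treat the exact windowing as an assumption:

```python
def moving_average(y, L):
    # M_t, defined wherever a full window of L values ending at t exists
    return {t: sum(y[t - L + 1:t + 1]) / L for t in range(L - 1, len(y))}

def seasonal_indices(y, L):
    # Average the ratios Y_t / M_t per position in the seasonal cycle
    # to cancel the random component and isolate S_t
    m = moving_average(y, L)
    ratios = {}
    for t, mt in m.items():
        ratios.setdefault(t % L, []).append(y[t] / mt)
    return {pos: sum(r) / len(r) for pos, r in ratios.items()}

def deseasonalize(y, L):
    # De-seasonalized series: Y_t / S_t
    s = seasonal_indices(y, L)
    return [y[t] / s[t % L] for t in range(len(y)) if t % L in s]
```

For a purely seasonal series oscillating around a constant level, this recovers a flat de-seasonalized series, as expected from Mt = Tt × Ct.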
The approach described in Box et al. (1976) for predicting the time
series uses regression to fit a curve to the de-seasonalized time
series by the least squares method. To predict values, the model
projects the de-seasonalized time series into the future using the
regression and multiplies it by the seasonal component. The least
squares method is explained in Johnson and Wichern (2002).
Exponential smoothing (Shumway & Stoffer, 1982) is an extension to
the above method intended to make more accurate predictions. To
forecast the next value, the most recent observation Yt is weighted
by α and the most recent forecast Ft by (1 − α), where α lies between
0 and 1 (i.e., 0 ≤ α ≤ 1). Thus the forecast is given by
Ft+1 = α·Yt + (1 − α)·Ft. The optimal α is chosen as the one giving
the smallest MSE (mean square error) value during training.
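The smoothing recursion and the MSE-based choice of α can be sketched as follows. The initialization F1 = Y1 is a common convention and an assumption here, since the paper does not specify one:

```python
def smooth_forecasts(y, alpha):
    # F_{t+1} = alpha * Y_t + (1 - alpha) * F_t, with F_1 = Y_1 (assumed)
    f = [y[0]]
    for t in range(1, len(y)):
        f.append(alpha * y[t - 1] + (1 - alpha) * f[t - 1])
    return f

def best_alpha(y, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    # Pick the alpha with the smallest training MSE over a small grid
    def mse(alpha):
        f = smooth_forecasts(y, alpha)
        return sum((yt - ft) ** 2 for yt, ft in zip(y, f)) / len(y)
    return min(grid, key=mse)
```

On a steadily trending series a large α wins, because the forecast lag of exponential smoothing shrinks as α grows.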
The ARIMA (Auto-Regressive Integrated Moving Average) model has also
been proposed (Box, et al., 1970, 1976; Hamilton, 1989). An ARIMA
model is categorized as ARIMA(p, d, q), where p denotes the order of
auto-regression, d the order of differencing, and q the order of the
moving average. The modeling task is to find the values of p, d and q
that best fit the data. A hybrid ARIMA and neural network model has
also been proposed for time series forecasting, in which a neural
network complements the ARIMA model (Zhang, 2003).
Proposed Work: Graph Based Framework for Time Series Prediction
In this paper, I propose a graph based framework for time series
prediction. The motivation for using graphs is to capture the tacit
historical patterns present in the dataset. The idea behind creating
a graph over a time series is to exploit two facts. First, some
aspect of the time series pattern will continue in the future, and a
graph is a data structure well suited to modeling a pattern. Second,
similarity can be calculated between graphs to identify similar
patterns and their order of occurrence. Thus, graphs are created to
store patterns over the time series and to make predictions based on
the similarity of the observed pattern to historical data, as an
alternative to regression and curve fitting. The major shortcoming of
regression and curve fitting is that they require expert knowledge of
the curve equation and the number of parameters in it. If there are
too many parameters the model overfits, and if there are too few it
underfits (Han & Kamber, 2006). The complete pattern in a time series
is not known initially, and it is affected by the random component,
which makes regression harder; hence deciding the curve equation and
the number of parameters in it is a major issue.
To further explore the concept of a pattern, let there be a monthly
time series of N years whose first observation is in the first month
of year k: Data = {Y1(k) Y2(k) … Y12(k), Y1(k+1) Y2(k+1) … Y12(k+1),
…, Y1(k+N) Y2(k+N) … Y12(k+N)}, where Y1(k) is the value of the
variable under study for the first month of year k and Y12(k+N) is
the value for the twelfth month of year k+N. In general, let d be the
time interval that makes up a pattern: if a pattern is to be stored
yearly and data is available monthly, d = 12; if data is available
quarterly, d = 4; and so on. The successor of an observation Yij
(month i of year j), with 1 ≤ i ≤ 12 and k ≤ j ≤ (k+N), is Yi'j',
where i' = i + 1, j' = j if i < 12, and i' = 1, j' = j + 1 otherwise.
A graph is created over each run of d successive observations to
store the pattern; this is called the 'last-pattern-observed-graph'.
To make predictions we also store, in a second graph, the knowledge
of how the last observed pattern affects the next observation; this
is called the 'knowledge-graph'. For example, given the data above,
the last-pattern-observed-graph for January of year (k+1) is
generated from {Y1(k) Y2(k) … Y12(k)}, and the knowledge-graph for
January of year (k+1) is generated from {Y1(k) Y2(k) … Y12(k),
Y1(k+1)}. The knowledge-graph is created with the intuition of
capturing how the variable under study changed over the last d
observations and the effect of that change on the (d+1)-th
observation.
In time series data, the graph is created to model each observation
as a vertex and to represent the effect of variation between
observations over time as edges. The number of vertices in the graph
equals the time interval over which a pattern is to be stored. The
edges account for the effect of each observation on the others. Since
past values affect future values but not vice versa, edges are
created from each vertex to all subsequent observations, and each
edge measures the change as an angle with the horizontal. The graphs
can be represented in memory using either an adjacency matrix or an
adjacency list (Cormen, 2001). I have used the adjacency list
representation to save memory: each graph has n(n−1)/2 edges, so the
space required is n(n−1)/2 with an adjacency list, compared to n²
with an adjacency matrix.
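The graph construction can be sketched as follows. The paper does not give the exact edge-weight formula, so computing the angle via the arctangent of the value change over the time gap is one plausible reading, and `pattern_graph` is a hypothetical name:

```python
import math

def pattern_graph(window):
    """Build a pattern graph over the last d observations: one vertex per
    observation, and an edge from each earlier vertex i to every later
    vertex j (past affects future, not vice versa), labeled with the angle
    that the line joining the two observations makes with the horizontal.
    Stored as an adjacency list (dict of lists of (j, angle) pairs)."""
    d = len(window)
    adj = {i: [] for i in range(d)}
    for i in range(d):
        for j in range(i + 1, d):
            angle = math.degrees(math.atan2(window[j] - window[i], j - i))
            adj[i].append((j, angle))
    return adj
```

A window of d observations yields d(d−1)/2 edges, matching the space bound stated above for the adjacency list representation.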
The dataset of N tuples is partitioned into two sets: the first, of m
tuples, for training, and the remaining (N−m) tuples for testing and
validation of the model. During the training phase, a knowledge-graph
is generated over each run of d+1 consecutive observations
Yi(k) Y(i+1)(k) … Y(i+d)(k), Y(i+d+1)(k) in the training data, where
1 ≤ i ≤ 12 and the year index advances whenever the month index
exceeds 12. Thus, for d = 12, m − 12 knowledge-graphs are generated.
The generated graphs are partitioned into d sets (d = 12), where each
graph is stored under the interval whose knowledge it captures (i.e.,
the graphs for all Januaries are stored together, those for all
Februaries together, etc.). To implement this I have used an array of
size d of linked lists of graphs. Each linked list stores all the
knowledge-graphs corresponding to the interval they represent. The
graphs are partitioned to ease the search: when making a prediction,
the model queries for all patterns observed w.r.t. a particular
month, and since the graphs are already stored in partitioned form,
this query executes in O(1) time.
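The sliding-window generation and month-wise partitioning can be sketched as follows; the array of linked lists is rendered as a Python list of lists, and the windows stand in for the knowledge-graphs built from them:

```python
def knowledge_windows(y, d=12):
    """Each window of d+1 consecutive observations yields one knowledge
    graph, so m observations give m - d windows. Windows are bucketed by
    the month (position in the d-cycle) of the observation they predict,
    giving O(1) lookup of all patterns for a given month."""
    buckets = [[] for _ in range(d)]
    for start in range(len(y) - d):
        window = y[start:start + d + 1]
        target_month = (start + d) % d  # month index of the predicted value
        buckets[target_month].append(window)
    return buckets
```

With monthly data starting in January, the window beginning at index `start` predicts the observation at `start + d`, whose month index is `start % d`, so each bucket collects all historical patterns leading into the same calendar month.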
To predict the next value in the time series, the model takes the
last d known observations preceding the month for which the
prediction is to be made and computes the
last-pattern-observed-graph. The model then searches the
knowledge-graphs (stored in partitioned form under the month in
question) for the one most similar to the last-pattern-observed-graph,
considering only as many vertices of each knowledge-graph as the
last-pattern-observed-graph contains. To compute the similarity
between two graphs, the graph edit distance technique is used (Brown,
2004; Bunke & Riesen, 2008). The key idea of the graph edit distance
approach is to model structural variation by edit operations
reflecting modifications in structure and labeling. A standard set of
edit operations consists of insertions, deletions, and substitutions
of both nodes and edges. When calculating the edit distance between
time-series graphs g1 (source) and g2 (destination), only
substitutions of edges (changes in angle) are required to make g2
similar to g1, and the total cost of the edit operations is summed.
The graph with the least edit cost is the most similar and is
selected as the basis of the prediction.
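Because all pattern graphs share the same vertex and edge structure over d observations, the edit distance reduces to summed substitution costs on the edge labels. In the sketch below, graphs are given as flat lists of edge angles in a fixed (i, j) order, and the substitution cost is assumed to be the absolute angle difference (the paper does not state the cost function explicitly):

```python
def edit_distance(angles1, angles2):
    # Only edge substitutions are needed; cost = sum of |angle differences|
    return sum(abs(a - b) for a, b in zip(angles1, angles2))

def most_similar(last_pattern, knowledge_graphs):
    # knowledge_graphs: edge-angle lists already truncated to the same
    # number of vertices/edges as last_pattern; returns the index of the
    # graph with the least edit cost
    return min(range(len(knowledge_graphs)),
               key=lambda k: edit_distance(last_pattern, knowledge_graphs[k]))
```

The selected knowledge-graph then supplies the historical pattern on which the prediction is based.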
To make the prediction, the model takes into account the structural
difference between the two graphs in a vertex-ordered, weighted
average manner. When predicting from graph g1 (the
last-pattern-observed-graph) using graph g2 (the knowledge-graph most
similar to g1), every vertex in g1 predicts the angle between itself
and the value to be predicted, using the knowledge in g2 and taking
into account the differences between its own edges and those of its
corresponding vertex in g2 in a weighted average manner, where edge
differences involving vertices closer to the one being predicted
receive more weight (a technique analogous to applying exponential
smoothing within the graph based approach). In this way each vertex
predicts an angle, and the predicted value is the average of the
values predicted by the individual vertices. After the prediction is
made and the actual value is observed, a knowledge-graph is generated
to capture the pattern corresponding to the latest observation, and
in this way the model learns iteratively.
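A simplified version of this prediction step can be sketched as follows. It omits the weighted edge-difference correction described above and keeps only the per-vertex angle prediction and the final averaging; `predict_next` is a hypothetical name and the exact formula is an assumption:

```python
import math

def predict_next(window, similar_window):
    """Each vertex i of the current window predicts the next value by
    reusing the angle that vertex i had to the (d+1)-th observation in
    the most similar knowledge window; the forecast is the average of
    the per-vertex predictions. (Simplified: the weighted correction
    from the edge differences is omitted.)"""
    d = len(window)
    predictions = []
    for i in range(d):
        # angle from vertex i to the target vertex in the knowledge window
        angle = math.atan2(similar_window[d] - similar_window[i], d - i)
        # project the same angle forward from vertex i of the current window
        predictions.append(window[i] + math.tan(angle) * (d - i))
    return sum(predictions) / d
```

When the current window exactly matches the historical pattern, every vertex reproduces the historical next value, so the forecast equals it.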
Experimental Results
The code implementing the graph based time series prediction approach
discussed above is written in Java. The approach was applied to the
airline passenger data set, first used in Brown (1962) and then in
Box et al. (1976). It records the number of airline passengers, in
thousands, observed monthly between January 1949 and December 1960. I
have used two years of data (1949 and 1950) for training and
estimated the remaining data on a monthly basis, implementing
iterative learning as each observation is recorded.
Fig. 2 shows the actual and predicted numbers of passengers obtained
by applying the graph based framework to the time series of the
airline passenger data set, and Fig. 3 shows the corresponding
percentage error rate observed on a monthly basis. The average
percentage error recorded on the time series is 7.05. Fig. 4 shows
the actual and predicted numbers of passengers obtained by applying
the framework to the de-seasonalized time series (using the moving
average concept), and Fig. 5 shows the corresponding monthly
percentage error rate. The average percentage error recorded on the
de-seasonalized time series is 5.81.
Fig. 2: Actual and predicted number of passengers using the graph
based framework for time series prediction applied to the time series
of the airline passenger data set (APTS).
Fig. 3: Percentage error between actual and predicted values using
the graph based framework applied to the time series of the airline
passenger data set (APTS).
Fig. 4: Actual and predicted number of passengers using the graph
based framework applied to the de-seasonalized time series of the
airline passenger data set (APTS).
Fig. 5: Percentage error between actual and predicted values using
the graph based framework applied to the de-seasonalized time series
of the airline passenger data set (APTS).
Conclusion & Discussion
A new, graph based approach for time series prediction has been
proposed and implemented. The results reported show that the graph
based framework achieves 94.19 percent accuracy on the
de-seasonalized time series of the airline passenger data (computed
using the moving average concept) and 92.95 percent accuracy on the
direct time series. The accuracy on the de-seasonalized series is
better because that series contains only two factors, cyclic and
trend, leading to a lower error rate than direct application of the
proposed approach to the full time series, which contains all four
factors (cyclic, trend, seasonal and random) and is therefore harder
to predict. Thus the graph based framework used in conjunction with
the moving average offers good accuracy.
The graph based framework for time series prediction incorporates the
concepts of exponential smoothing, moving averages and graph mining
to enhance its accuracy. It is a good alternative to regression: in
the proposed approach there is no need for expert domain knowledge of
the curve equation and the number of parameters in it. The results
validate that the new approach has a good accuracy rate.
References
Balestra, P., & Nerlove, M. (1966). Pooling cross section and
time series
data in the estimation of a dynamic model: The demand for
natural gas. Econometrica, 34(3), 585-612.
Bollerslev, T. (1987). A conditionally heteroskedastic time
series model
for speculative prices and rates of return. The review of
economics and statistics, 69(3), 542-547.
Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1970). Time
series analysis.
Oakland, CA: Holden-Day.
Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1976). Time
series analysis:
forecasting and control (Vol. 16): San Francisco, CA: Holden-
Day.
Box, G. E. P., & Pierce, D. A. (1970). Distribution of residual
autocorrelations in autoregressive-integrated moving average
time series models. Journal of the American Statistical
Association, 65(332), 1509-1526.
Brown, R. G. (2004). Smoothing, forecasting and prediction of
discrete
time series. Mineola, NY: Dover Publications.
Brown, R. G. (1962). Smoothing, forecasting and prediction of
discrete time series. Englewood Cliffs, NJ: Prentice Hall.
Bunke, H., & Riesen, K. (2008). Graph Classification Based on
Dissimilarity
Space Embedding. In N. da Vitoria Lobo, T. Kasparis, F. Roli,
J.
Kwok, M. Georgiopoulos, G. Anagnostopoulos & M. Loog
(Eds.),
Structural, Syntactic, and Statistical Pattern Recognition (Vol.
5342, pp. 996-1007): Berlin / Heidelberg: Springer
Clifton, C. (2011). Data Mining. In Encyclopaedia Britannica.
Retrieved
from
http://www.britannica.com/EBchecked/topic/1056150/data-
mining
Cormen, T. H. (2001). Introduction to algorithms. Cambridge,
Mass: The
MIT press.
Hamilton, J. D. (1989). A new approach to the economic
analysis of
nonstationary time series and the business cycle. Econometrica,
57(2), 357-384.
Han, J., & Kamber, M. (2006). Data mining: concepts and
techniques:
Morgan Kaufmann.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate
statistical
analysis (Vol. 5): NJ: Prentice Hall Upper Saddle River.
Madsen, H. (2008). Time series analysis. Boca Raton: Chapman
and
Hall/CRC Press.
Shumway, R. H., & Stoffer, D. S. (1982). An approach to time
series
smoothing and forecasting using the EM algorithm. Journal of
time series analysis, 3(4), 253-264.
Zhang, G. P. (2003). Time series forecasting using a hybrid
ARIMA and
neural network model. Neurocomputing, 50, 159-175. doi:
10.1016/s0925-2312(01)00702-0
Copyright of Trends in Information Management is the property
of University of Kashmir and its content may
not be copied or emailed to multiple sites or posted to a listserv
without the copyright holder's express written
permission. However, users may print, download, or email
articles for individual use.
Midterm Lad Report 7Midterm Lab ReportIntroductionCell.docx
 
MicroEssay Identify a behavioral tendency that you believe.docx
MicroEssay Identify a behavioral tendency that you believe.docxMicroEssay Identify a behavioral tendency that you believe.docx
MicroEssay Identify a behavioral tendency that you believe.docx
 
MILNETVisionMILNETs vision is to leverage the diverse mili.docx
MILNETVisionMILNETs vision is to leverage the diverse mili.docxMILNETVisionMILNETs vision is to leverage the diverse mili.docx
MILNETVisionMILNETs vision is to leverage the diverse mili.docx
 
midtermAnswer all question with proper number atleast 1 and half.docx
midtermAnswer all question with proper number atleast 1 and half.docxmidtermAnswer all question with proper number atleast 1 and half.docx
midtermAnswer all question with proper number atleast 1 and half.docx
 
Midterm QuestionIs the movement towards human security a true .docx
Midterm QuestionIs the movement towards human security a true .docxMidterm QuestionIs the movement towards human security a true .docx
Midterm QuestionIs the movement towards human security a true .docx
 
MGT526 v1Wk 2 – Apply Organizational AnalysisMGT526 v1Pag.docx
MGT526 v1Wk 2 – Apply Organizational AnalysisMGT526 v1Pag.docxMGT526 v1Wk 2 – Apply Organizational AnalysisMGT526 v1Pag.docx
MGT526 v1Wk 2 – Apply Organizational AnalysisMGT526 v1Pag.docx
 
Microsoft Word Editing Version 1.0Software Requirement Speci.docx
Microsoft Word Editing  Version 1.0Software Requirement Speci.docxMicrosoft Word Editing  Version 1.0Software Requirement Speci.docx
Microsoft Word Editing Version 1.0Software Requirement Speci.docx
 
Microsoft Windows implements access controls by allowing organiz.docx
Microsoft Windows implements access controls by allowing organiz.docxMicrosoft Windows implements access controls by allowing organiz.docx
Microsoft Windows implements access controls by allowing organiz.docx
 
MGT520 Critical Thinking Writing Rubric - Module 10 .docx
MGT520  Critical Thinking Writing Rubric - Module 10   .docxMGT520  Critical Thinking Writing Rubric - Module 10   .docx
MGT520 Critical Thinking Writing Rubric - Module 10 .docx
 
Midterm PaperThe Midterm Paper is worth 100 points. It will .docx
Midterm PaperThe Midterm Paper is worth 100 points. It will .docxMidterm PaperThe Midterm Paper is worth 100 points. It will .docx
Midterm PaperThe Midterm Paper is worth 100 points. It will .docx
 
Miami Florida is considered ground zero for climate change, in parti.docx
Miami Florida is considered ground zero for climate change, in parti.docxMiami Florida is considered ground zero for climate change, in parti.docx
Miami Florida is considered ground zero for climate change, in parti.docx
 
MGT230 v6Nordstrom Case Study AnalysisMGT230 v6Page 2 of 2.docx
MGT230 v6Nordstrom Case Study AnalysisMGT230 v6Page 2 of 2.docxMGT230 v6Nordstrom Case Study AnalysisMGT230 v6Page 2 of 2.docx
MGT230 v6Nordstrom Case Study AnalysisMGT230 v6Page 2 of 2.docx
 

Recently uploaded

Major project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesMajor project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesAmanpreetKaur157993
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxAdelaideRefugio
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...EADTU
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxCeline George
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................MirzaAbrarBaig5
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismDabee Kamal
 
Trauma-Informed Leadership - Five Practical Principles
Trauma-Informed Leadership - Five Practical PrinciplesTrauma-Informed Leadership - Five Practical Principles
Trauma-Informed Leadership - Five Practical PrinciplesPooky Knightsmith
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17Celine George
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSAnaAcapella
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17Celine George
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi RajagopalEADTU
 
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxAnalyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxLimon Prince
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code ExamplesPeter Brusilovsky
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptxPoojaSen20
 

Recently uploaded (20)

Major project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesMajor project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategies
 
Supporting Newcomer Multilingual Learners
Supporting Newcomer  Multilingual LearnersSupporting Newcomer  Multilingual Learners
Supporting Newcomer Multilingual Learners
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptx
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
Trauma-Informed Leadership - Five Practical Principles
Trauma-Informed Leadership - Five Practical PrinciplesTrauma-Informed Leadership - Five Practical Principles
Trauma-Informed Leadership - Five Practical Principles
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxAnalyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 

WAL_HUMN6100_08_A_EN-CC.mp4
Time Series Design
Program Transcript

RICHARD BALKIN: Time series design simply refers to a study done over time, as opposed to collecting data at one particular instant. Often, a time series design is really a single-subject design, but you can have multiple participants in a time series design. In the article that we'll discuss for this week, we see a time series design occur as an element of looking at changes across a program over time, and at the perceptions that participants have about that program over time.

So a time series design can be used to look at whether the data are predictable, whether the information can be verified, and whether this information can be replicated over time. In other words, in a good time series design, I should be able to conduct a study like this again and get the same result. So in a time series design, instead of looking at how changes may occur between groups, we may see how change occurs with a single subject, or even within a group, or for a program over a particular period of time.

And that period of time can even be more longitudinal in nature. We can look at changes across a few months, but we can also look at changes across years. Additionally, if multiple subjects are used in a time series design, and if the research is longitudinal in nature, you need to take into consideration attrition rates. Are the participants who began the study the same participants at the end of the study? Was there attrition? And maybe consider why attrition might occur. For example, is the researcher able to keep up with all of the participants at the beginning, intermediate, and latter stages of the study? Attrition is normal in any research study, but it also needs to be accounted for.

An example of some time series research that I've conducted in the past was when I worked as a therapist at a psychiatric hospital. At that time, we were very interested in seeing what happened to our clients once they left the hospital. We knew how they were when they were admitted: they were either a danger to self or a danger to others. And we had an idea of how stable they were when they discharged. But how were they doing one month, three months, six months, and 12 months after treatment?

So we had an aftercare program. And through the active aftercare program, we were able to do some post-care follow-up with each of the clients once they left the hospital. One of our experiences was that after six months it was very difficult to continue to get feedback from the participants. One of the reasons was simply that, working with this population, they were highly transient. Phone numbers would change. Addresses would change. And we just weren't able to get a lot of one-year follow-up. Or perhaps a child had relapsed, and the parents were angry at the treatment center and didn't want to respond to our queries. So those elements can play a role too. As I said before, attrition occurs.

But that process of getting data from each client at the one-month, three-month, six-month, and 12-month intervals was essential in terms of doing a time series design, and finding out: did kids relapse or regress to their previous high-risk behavior after receiving treatment at the hospital? And what were the influencing factors? We also would want to know information such as, did they continue in outpatient counseling, for example?

In examining an article that uses time series design, we've selected an article that's quite multifaceted. In this particular article, they use a four-phase design to conduct the time series research. The 12-month baseline pre-exposure phase assessed program and patient outcomes. In Phase II, which occurs after six months of training, MDFT experts train Adolescent Day Treatment Program staff and administrators. Then in Phase III they have an implementation stage, and this is at 14 months. And at Phase IV, they have a Durability Practice Phase, which is around 18 months.

So let's take a look at how the program dimensions changed over time through this time series design. These program dimensions included aspects like autonomy, clarity, program organization, and control. And what they noticed is that as a result of implementing this MDFT program, participants, the patients within the program, noticed positive differences among these program dimensions. So here what we end up with is a statistically significant difference in the way a program is perceived by the primary stakeholders, in this case the patients who are experiencing treatment in the day program.

So imagine being able to implement an intervention that, across time, improves your program and improves receptiveness to treatment. And that was the importance of the study. Hopefully, when practitioners see this, they can see a treatment model that affects the quality of care, and they may be more apt to use such a model in their programs.

In terms of multicultural, ethical, and legal considerations, we might want to once again review: who was the sample? Who are the participants in this study? So that we make sure that the participants in the study are truly generalizable to the population of interest. Additionally, whenever doing a time series design, you want to think about and consider: what occurs during the study? What is the intervention? What is the change that we're looking at? Is this change positive or not? For example, what would happen if the study was being conducted and immediately a negative consequence of the intervention occurred? Well, of course, the ethical thing to do would be to stop the study.

And then it would be important to note that maybe this is not a good intervention to use. The study was cut short, and none of the phases were completed, because an unforeseen event or negative consequence was occurring. So that's another element of time series design, particularly when the study is longitudinal in nature.

Additional Content Attribution

MUSIC: Creative Support Services
Los Angeles, CA

Dimension Sound Effects Library
Newnan, GA

Narrator Tracks Music Library
Stevens Point, WI

Signature Music, Inc
Chesterton, IN

Studio Cutz Music Library
Carrollton, TX

© 2016 Laureate Education, Inc.

Time Series Analysis
Anne Senter

One definition of a time series is a collection of quantitative observations that are evenly spaced in time and measured successively. Examples of time series include the continuous monitoring of a person's heart rate, hourly readings of air temperature, the daily closing price of a company stock, monthly rainfall data, and yearly sales figures. Time series analysis is generally used when there are 50 or more data points in a series. If the time series exhibits seasonality, there should be 4 to 5 cycles of
observations in order to fit a seasonal model to the data.

Goals of time series analysis:

1. Description: identify patterns in correlated data, such as trends and seasonal variation
2. Explanation: understand and model the data
3. Forecasting: predict short-term trends from previous patterns
4. Intervention analysis: how does a single event change the time series?
5. Quality control: deviations of a specified size indicate a problem

Time series are analyzed in order to understand the underlying structure and function that produce the observations. Understanding the mechanisms of a time series allows a mathematical model to be developed that explains the data in such a way that prediction, monitoring, or control can occur. Examples include prediction and forecasting, which are widely used in economics and business. Monitoring of ambient conditions, or of an input or an output, is common in science and industry. Quality control is used in computer science, communications, and industry.

It is assumed that a time series data set has at least one systematic pattern. The most common patterns are trends and seasonality. Trends are generally linear or quadratic. To find trends, moving averages or regression analysis is often used. Seasonality is a trend that repeats itself systematically over time. A second assumption is that the data exhibit enough of a random process that it is hard to identify the systematic patterns within the data. Time series analysis techniques often apply some type of filter to the data in order to dampen the error. Other potential patterns have to do with lingering effects of earlier observations or of earlier random errors.

There are numerous software programs that will analyze time series, such as SPSS, JMP, and SAS/ETS. For those who want to learn or are comfortable with coding, Matlab, S-PLUS, and R are other software packages that can perform time series analyses. Excel can be used if linear regression analysis is all that is required (that is, if all you want to find out is the magnitude of the most obvious trend).

A word of caution about using multiple regression techniques with time series data: because of the autocorrelated nature of time series, they violate the assumption of independence of errors, and Type I error rates will increase substantially when autocorrelation is present. Also, inherent patterns in the data may dampen or enhance the effect of an intervention; in time series analysis, patterns are accounted for within the analysis.

Observations made over time can be either discrete or continuous. Both types of observations can be equally spaced, unequally spaced, or have missing data. Discrete measurements can be recorded at any time interval, but are most often taken at evenly spaced intervals. Continuous measurements can be spaced randomly in time, such as measuring earthquakes as they occur
because an instrument is constantly recording, or can entail constant measurement of a natural phenomenon such as air temperature, or of a process such as the velocity of an airplane.

Time series are very complex because each observation is somewhat dependent upon the previous observation, and often is influenced by more than one previous observation. Random error is also influential from one observation to another. These influences are called autocorrelation: dependent relationships between successive observations of the same variable. The challenge of time series analysis is to extract the autocorrelation elements of the data, either to understand the trend itself or to model the underlying mechanisms.

Time series reflect the stochastic nature of most measurements over time. Thus, data may be skewed, with mean and variation not constant, non-normally distributed, and not randomly sampled or independent. Another non-normal aspect of time series observations is that they are often not evenly spaced in time, due to instrument failure, or simply due to variation in the number of days in a month.

There are two main approaches used to analyze time series: (1) in the time domain or (2) in the frequency domain. Many techniques are available to analyze data within each domain. Analysis in the time domain is most often used for stochastic observations. One common technique is the Box-Jenkins ARIMA method, which can be used for univariate (a single data set) or multivariate (comparing two or more data sets) analyses. The ARIMA technique uses moving averages, detrending, and regression methods to detect and remove autocorrelation in the data. Below, I will demonstrate a Box-Jenkins ARIMA time domain analysis of a single data set.

Analysis in the frequency domain is often used for periodic and cyclical observations. Common techniques are spectral analysis, harmonic analysis, and periodogram analysis. A specialized technique is the Fast Fourier Transform (FFT). Mathematically, frequency domain techniques use fewer computations than time domain techniques, so for complex data, analysis in the frequency domain is most common. However, frequency analysis is more difficult to understand, so time domain analysis is generally used outside of the sciences.

Time series analysis using ARIMA methods

Using the ARIMA (auto-regressive, integrated, moving average) method is an iterative, exploratory process intended to best fit your time series observations by using three steps (identification, estimation, and diagnostic checking) in the process of building an adequate model for a time series. The auto-regressive component (AR) in ARIMA is designated as p, the integrated component (I) as d, and the moving average component (MA) as q. The AR component represents the lingering effects of previous observations. The I component represents trends, including seasonality. And the MA component represents the lingering effects of previous random shocks (or error).

To fit an ARIMA model to a time series, the order of each model component must be selected. Usually a small integer value (usually 0, 1, or 2) is found for each component. The goal is to find the most parsimonious model with the smallest number of estimated parameters needed to adequately model the patterns in the observed data.

In order to demonstrate time series analysis, I introduce a data set of monthly precipitation totals from Portola, CA in the Sierra Nevada in Table 1. When a time series has strong seasonality, as my data set does, a slightly different type of ARIMA (p,d,q) process is used, which is often called SARIMA (p,d,q)*(P,D,Q), where S stands for seasonal. In this model, not only are there possible AR, I, and MA terms for the data, there is a second set of AR, I, and MA terms that take into account the seasonality of the data.

Time series data are correlated,
which means that measurements are related to one another and change together to some degree. Thus, each observation is partially predictable from previous observations, or from previous random shocks, or from both. An assumption made after analysis is that the correlations inherent in the data set have been adequately modeled. Thus, after a model has been built, any leftover variations are considered to be independent and normally distributed with mean zero and constant variance over time. These leftover variations are used to interpret the data.

Regardless of which technique is used, the first step in any time series analysis is to plot the observed values against time. A number of qualitative aspects are noticeable as you visually inspect the graph. In Figure 1, we see that there is a 12-month pattern of seasonality, no evidence of a linear trend, and variation from the mean that appears to be approximately equal across time.

Monthly precipitation data from the NOAA weather station in Portola, CA, from January 1999 through April 2004.
Figure 1. Precipitation occurs cyclically; December falls on numbers 12, 24, 36, 48, 60, and 72. Mean = 1.66 inches/month, standard deviation = 2.09, n = 76.

Is there a trend to this data set? The simplest linear equation would be y = b, where b is the random shock, or error, of the data set. The linear equation for my data set is y = -0.0018x + 1.6688. With a slope of -0.0018, there is no significant linear trend, so this data set needs no further work to eliminate a linear or quadratic trend. If removal of the trend (detrending) were needed, I would proceed to differencing. Ordinary least squares analysis is another method used to detect and remove trends. Differencing has the advantages of ease of use and simplicity, but it also has disadvantages, including overcorrecting for trends, which skews the correlations in a negative direction. There are other problems with differencing that are covered in textbooks.

Differencing means calculating the difference among pairs of observations at some time interval. A difference of one time interval apart is calculated by subtracting value #1 from value #2, then #2 from #3, and so on, and plotting that data to determine whether a mean of 0 and a constant variance are present. If differencing of one does not detrend the data, calculate a difference of 2 by subtracting difference #2 from difference #3, and so on. Use a log transformation on the differences if necessary to stabilize the mean and variance.

Seasonal autocorrelation is different from a linear or quadratic data trend in that it is predictably spaced in time. Our precipitation data can be expected to have a 12-month seasonal pattern, whereas daily observations might have a 7-day pattern, and hourly observations often have a 24-hour pattern.

In order to detect seasonality, plot the autocorrelation function (ACF) by calculating and graphing the residuals (observed minus mean for each data point). The residuals can be calculated using Equation 1:

Equation 1. e_t = y_t - ȳ, where y_t is the observation at time t and ȳ is the mean of the series.

The graph of the residuals against a specified time interval is called a lagged autocorrelation function, or a correlogram. The null hypothesis for the ACF is that the time series observations are not correlated to one another, i.e., that any pattern in the data is from random shocks only. In time series analysis, a lag is defined as follows: an event occurring at time t + k (k > 0) is said to lag behind an event occurring at
time t, the extent of the lag being k.

In 1970, Box and Jenkins wrote, "to obtain a useful estimate of the autocorrelation function, we would need at least 50 observations and the estimated autocorrelations would be calculated for k = 0, 1, ..., k, where k was not larger than N/4". For my data set of 78 observations, I specified 19 autocorrelation lags (78/4 = 19.5). A rule of thumb for an ACF is that plotted residuals greater than 2 standard errors away from the zero mean indicate statistically significant autocorrelation.

In Figure 2, there are two residual values, at lag 6 and lag 12, that lie more than 2 standard errors (that is, the approximate 95% confidence limits) from the zero mean. I interpret this as a 6-month seasonal pattern that cycles between summer, when there is little to no precipitation, and winter, when precipitation is at its peak. So, even though the linear equation reveals no trend, graphing the ACF reveals seasonality.

I used the JMP software program from SAS to analyze my data set. Though I will not cover how to perform a time series analysis in the spectral domain, I did use the spectral density graph to verify that the biggest seasonal pattern occurs at 12-month intervals, not at 6-month intervals. In Figure 3, notice the large spike at period 12.

Lagged autocorrelation function of Portola, CA precipitation data.
Figure 2. Visual inspection shows significant deviations from zero correlation at lags 1, 6, and 12, and very nearly at lags 7 and 13. Interpretation suggests that there are two seasonal (rainy season and dry season) patterns spaced about 6 months apart. The number of autocorrelation lags equals 19.

Spectral density as a function of period.
Figure 3. A strong signal appears at about period 12, corresponding to a yearly cycle.

The partial autocorrelation function (PACF) is also used to detect trends and seasonality. Figure 4 is the PACF of the precipitation data. In general, the PACF is the amount of correlation between a variable and its lag that is not explained by correlations at all lower-order lags. The equation to obtain partial autocorrelations is very complex, and is best explained in time
• 19. series textbooks. Lagged partial autocorrelation function of Portola, CA precipitation data. Figure 4. Significant deviation from zero is evident at lags 1, 6, and 12, suggesting the same 6-month seasonal pattern. Now that our observations have been graphed against time, along with the ACF and PACF, we can begin to match our patterns to idealized ARIMA models. The easy way to analyze a time series data set is simply to fit numerous variations of ARIMA models. There are also systematic steps you can take that will help suggest the best values for the AR, I, and MA terms. Here I present a few general rules to apply when working to identify the best-fit ARIMA model. These rules come from the Duke University website http://www.duke.edu/~rnau/411home.htm, which, along with the other textbooks and websites listed below, was instrumental in helping me understand time series analysis, and specifically in helping me understand the nuances of seasonally affected time series.
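For readers who do not want to chase the textbook derivation, the partial autocorrelations can be computed with the Durbin-Levinson recursion. A minimal sketch; the AR(1) test series and the function names are mine, for illustration only:

```python
import numpy as np

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion:
    phi[k, k] is the correlation at lag k after removing the
    influence of all lower-order lags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xm = x - x.mean()
    denom = np.sum(xm ** 2)
    r = np.array([1.0] + [np.sum(xm[: n - k] * xm[k:]) / denom
                          for k in range(1, nlags + 1)])
    phi = np.zeros((nlags + 1, nlags + 1))
    phi[1, 1] = r[1]
    for k in range(2, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        # update the lower-order coefficients for the next step
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
    return phi.diagonal()[1:]

# Sanity check on a simulated AR(1) series: the PACF should spike near
# the AR coefficient (0.6) at lag 1 and drop to roughly zero afterwards.
rng = np.random.default_rng(1)
e = rng.normal(0, 1, 2000)
y = np.zeros(2000)
for i in range(1, 2000):
    y[i] = 0.6 * y[i - 1] + e[i]
p = pacf(y, nlags=5)
print(np.round(p, 2))
```

The same cutoff-versus-decay behavior of the PACF is what the identification rules below exploit when choosing AR and MA orders.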
• 20. After adjusting the data by a seasonal difference of 1 using JMP, a visual inspection shows that the ACF decays more slowly than the PACF (Figure 5). I used Duke's Rule 3 ("the optimal order of differencing is often the order of differencing at which the standard deviation is lowest") to help me determine that my data needed no differencing for trend but did need to be differenced for seasonality (both options are available in JMP). A seasonal difference of 1 yields a standard deviation of 1.89, the lowest value of the iterations that I tried. ACF and PACF after seasonal differencing of 1. Figure 5. All ACF and PACF lags fall below significance levels, indicating that autocorrelation has been eliminated. Using the iterative approach of checking model values via JMP, I found that the lowest values of Akaike's Information Criterion (AIC), Schwarz's Bayesian Criterion, and the -2LogLikelihood for my data set are obtained with an ARIMA (0,0,0)(1,1,1). According to Duke's Rule 8, it is possible
• 21. for an AR term and an MA term to cancel each other out. The rule suggests trying a model with one fewer AR term and one fewer MA term, particularly if it takes more than 10 iterations for the model to converge. My model took 6 iterations to converge. Duke's Rule 12 states that if a series has a strong and consistent seasonal pattern, you should never use more than one order of seasonal differencing or more than 2 orders of total differencing (seasonal + nonseasonal). Rule 13 states that if the autocorrelation at the seasonal period is positive, consider adding an SAR term, and if it is negative, try adding an SMA term; do not mix SAR and SMA terms in the same model. Duke's rules for seasonality suggest that I not accept a mixed model as the best-fit model for my data. I eliminated both the AR and MA terms, but that model yielded higher values of AIC and Schwarz's Bayesian Criterion, and a much higher value of the -2LogLikelihood. I also successively eliminated the AR or the MA term while leaving the other term in, but still got higher values for all test parameters. Based on the parameter values, I believe that the ARIMA (0,0,0)(1,1,1) is the best model for my data. Parameter estimates of the most likely SARIMA models
• 22.
Model                                DF   Variance    AIC        SBC        RSquare   -2LogLH
Seasonal ARIMA(0, 0, 0)(1, 1, 0)12   62   3.5908132   83.784319  88.102085  -0.11     80.1373
Seasonal ARIMA(0, 0, 0)(0, 1, 1)12   62   3.5125921   82.374756  86.692522  -0.09     79.272251
Seasonal ARIMA(0, 0, 0)(0, 1, 0)12   63   3.6544726   83.93302   86.091903  -0.14     348.10154
Seasonal ARIMA(0, 0, 0)(1, 1, 1)12   61   2.8333581   69.581017  76.057666  -0.04     75.26258
• 23. Table 2. Model #4, SARIMA (0,0,0)(1,1,1), has the lowest variance, AIC, SBC, and -2LogLH. About 20 models were tested; these four had the lowest scores. I have demonstrated best-fitting an ARIMA model to a time series using the description and explanation phases of time series analysis. If I were to continue with this exercise, I could use this model to predict precipitation for the next year or two. Most software programs are capable of extrapolating values based on previous patterns in the data set; this topic is covered in the textbooks. There are numerous books, websites, and software programs available for working with time series. I found that most of the books dedicated solely to time series were quite dense with formulas, and thus difficult to understand. Some websites were somewhat easier to understand, but only a couple offered a step-by-step process to guide you through an analysis. I used just one software program, JMP, and used its help guide extensively. The help guide was useful in understanding the generated graphs, but offered definitions without elaboration as to how to
• 24. interpret the defined data. If you are going to analyze a time series, I suggest using multiple resources; especially if you are new to time series analysis (as I am), find a knowledgeable person who can help you with the interpretation of your results. Books: If the CD-ROM is available, this text will walk you through many analyses. Brockwell, P.J. and Davis, R.A. 2002, 2nd ed. Introduction to time series and forecasting. Springer, New York. These guys wrote the book on ARIMA processes. Box, G.E.P., Jenkins, G.M., and Reinsel, G.C. 1994, 3rd ed. Time series analysis: Forecasting and control. Prentice Hall, Englewood Cliffs, NJ. This book is quite understandable, though still full of formulas. Chatfield, C. 2004, 6th ed. The analysis of time series - an introduction. Chapman and Hall, London, UK. An excellent discussion of problems and solutions in ARIMA techniques. Glass, G.V., Willson, V.L., and Gottman, J.M. 1975. Design and analysis of time-series experiments. Colorado Associated University Press, Boulder, Colorado.
  • 25. An interesting read about time series from a historical perspective. Klein, J.L. 1997. Statistical visions in time: a history of time series analysis, 1662-1938. Cambridge University Press, New York. The time series chapter is understandable and easily followed. Tabachnick, B.G., and Fidell, L.S. 2001, 4th ed. Using multivariate statistics. Allyn and Bacon, Needham Heights, MA. Websites: This is the best website that I found in my web searches. It is a step-by-step guide to understanding many aspects of time series, including a series of ‘rules’ to use when analyzing your data. http://www.duke.edu/~rnau/411home.htm An introduction to time series analysis from an engineering point of view, with two worked examples. Very helpful. http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.ht m Extensive website with LOTS of useful information once you get through the business talk. Has applets for determining stationarity, seasonality, mean, variance, etc. http://home.ubalt.edu/ntsbarsh/Business-stat/stat-
  • 26. data/Forecast.htm Useful for definitions, would be great if they had examples of actual analyses. http://www.statsoftinc.com/textbook/stathome.html Step-by-step explanation of time series analysis, including examples of how to use Excel to adjust for seasonality and analyzing the data by using linear regression, all in the Crunching section. http://www.bized.ac.uk/timeweb/index.htm Type in time series in product search to see available books that are short but sweet. http://www.sagepub.com/Home.aspx Website for my precipitation data. http://www.wrh.noaa.gov/cnrfc/monthly_precip.php Website for the software package that I used in this presentation. http://www.jmp.com/ Extensive and easy to use statistical software package. http://www.spss.com/ Free software for analyzing time series data sets, but you need to code. http://www.r-project.org/
• 27. Free statistics and forecasting software (didn't try out, so can't say how good) http://www.wessa.net/
• 29. Graph Based Framework for Time Series Prediction. Yadav & Toshniwal, TRIM 7(2), July-Dec 2011, p. 74. Graph Based Framework for Time Series Prediction. Vivek Yadav*, Durga Toshniwal**. Abstract. Purpose: A time series comprises a sequence of observations ordered in time. A major task of data mining with regard to time series data is predicting future values. In time series there is a general notion that some aspect of a past pattern will continue in the future. Existing time series techniques fail to capture the
• 30. knowledge present in databases to make good estimates of future values. Design/Methodology/Approach: Graph matching techniques are applied to time series data. Findings: The study found that applying graph matching techniques to time-series data can be useful for finding hidden patterns in a time series database. Research Implications: The study motivates mapping time series data to graphs and using existing graph mining techniques to discover patterns from time series data and make predictions from them. Originality/Value: The study maps time-series data to graphs and uses graph mining techniques to discover knowledge from time series data. Keywords: Data mining; Time Series Prediction; Graph Mining; Graph Matching. Paper Type: Conceptual. Introduction. Data mining is the process of extracting meaningful and potentially useful patterns from large datasets. Nowadays, data mining is becoming an increasingly important tool used by modern businesses to transform data into business intelligence, giving business processes an informational advantage to make strategic decisions based on past observed patterns rather than on intuitions or beliefs (Clifton, 2011). A graph based framework for time series prediction is a step towards exploring a new efficient
• 31. approach for time series prediction where predictions are based on patterns observed in the past. (Author footnotes: * Department of Electronics & Computer Engineering, IIT Roorkee. email: [email protected]. ** Assistant Professor, Department of Electronics & Computer Engineering, IIT Roorkee. email: [email protected].) Time series data consists of sequences of values or events obtained over repeated instances of time. Mostly these values or events are collected at equally spaced, discrete time intervals (e.g., hourly, daily, weekly, monthly, yearly, etc.). When there is only one variable upon which observations are made with respect to (w.r.t.) time, the series is called a univariate time series. Data mining on time-series data is popular in many applications, such as stock market analysis, economic and sales forecasting, budgetary analysis, utility studies, inventory studies, yield projections, workload projections, process and quality control,
• 32. observation of natural phenomena (such as atmosphere, temperature, wind, and earthquakes), scientific and engineering experiments, and medical treatments (Han & Kamber, 2006). A time series dataset consists of values {Y1, Y2, Y3, ..., Yt}, where each Yi represents the value of the variable under study at time i. One of the major goals of data mining on time series is forecasting, i.e., predicting the future value Yt+1. Successive observations in a time series are statistically dependent on time, and time series modeling is concerned with techniques for the analysis of such dependencies. In time series analysis, a basic assumption is made that some aspect of the past pattern will continue in the future. Under this assumption, time series prediction is based on past values of the main variable Y. Time series prediction can be useful in planning, and in measuring the performance of predicted values against actual observed values of the main variable Y. Time series modeling is advantageous for forecasting because historical sequences of observations on the main variable are readily available: they are recorded as past observations and can be purchased or gathered from published secondary sources. In time series modeling, the prediction of
• 33. values for future periods is based on the pattern of past values of the variable under study, but the model does not generally account for explanatory variables which may have affected the system. There are two reasons for resorting to such time models. First, the system may not be understood, and even if it is understood it may be extremely difficult to measure the cause and effect relationships of the parameters affecting the time series. Second, the main concern may be only to predict the next value and not to explicitly know why it was observed (Box, Jenkins & Reinsel, 1976). Time series analysis considers four major components for characterizing time-series data (Madsen, 2008). First, the trend component, denoted T, indicates the general direction in which a time series is moving over a long interval of time. Second, the cyclic component, denoted C, refers to the cycles, that is, the long-term oscillations about a trend line or curve, which may or may not be periodic. Third, the seasonal component, denoted S, is systematic or calendar related. Fourth, the random component, denoted R, characterizes the sporadic motion of the time series due to random or chance events. Time-series modeling is also referred to as the decomposition of a time
• 34. series into these four basic components. The time-series variable Y at time t can be modeled as the product of the four components at time t (i.e., Yt = Tt × Ct × St × Rt) using the multiplicative model proposed by Box, Jenkins and Reinsel (1970), where Tt is the trend component at time t, Ct the cyclic component at time t, St the seasonal component at time t, and Rt the random component at time t. As an alternative, an additive model (Balestra & Nerlove, 1966; Bollerslev, 1987) can also be used, in which Yt = Tt + Ct + St + Rt, with Yt, Tt, Ct, St, and Rt having the same meaning as above. Since the multiplicative model is the most popular, we will use it for the time series decomposition. An example of time series data is the airline passenger data set (Fig. 1), in which the main variable Y, the number of passengers (in thousands) of an airline, is recorded w.r.t. time, with one observation per month from January 1949 to December 1960. Clearly, the time series is
• 35. affected by an increasing trend and by seasonal and cyclic variations. Fig. 1: Time series of the airline passenger data from 1949 to 1960, represented on a monthly basis. Review of Literature. In time series analysis there is an important notion of de-seasonalizing the time series (Box & Pierce, 1970). It rests on the assumption that if the time series has a seasonal pattern of L periods, then taking a moving average Mt over L periods yields the mean value for the year, which is free of seasonality and contains little randomness (owing to averaging). Thus Mt = Tt × Ct (Box, Jenkins & Reinsel, 1976). To determine the seasonal component, one simply divides the original series by the moving average, i.e., Yt/Mt = (Tt × Ct × St × Rt)/(Tt × Ct) = St × Rt. Averaging over months eliminates randomness and yields the seasonal component St. The de-seasonalized time series can then be computed as Yt/St. The approach described in Box et al. (1976) for predicting the time series uses regression to fit a curve to the de-seasonalized time series using
• 36. the least squares method. To predict values of the time series, the model projects the de-seasonalized time series into the future using regression and multiplies it by the seasonal component. The least squares method is explained in Johnson and Wichern (2002). Exponential smoothing (Shumway & Stoffer, 1982) extends the above method to make more accurate predictions. It makes the next forecast by weighting the most recent observation Yt by α and the most recent forecast Ft by (1 - α), where α lies between 0 and 1 (i.e., 0 ≤ α ≤ 1). Thus the forecast is given by Ft+1 = α·Yt + (1 - α)·Ft. The optimal α is chosen as the one yielding the smallest MSE (mean square error) during training. ARIMA (Auto-Regressive Integrated Moving Average) models have also been proposed (Box, et al., 1970, 1976; Hamilton, 1989). An ARIMA model is categorized as ARIMA(p, d, q), where p denotes the order of auto-regression, d the order of differencing, and q the order of the moving average. The model tries to find the value of p, q, and
• 37. d that best fits the data. A hybrid ARIMA and neural network model has also been proposed that tries to find p, d, and q using a neural network (Zhang, 2003). Proposed Work: Graph Based Framework for Time Series Prediction. In this paper, I propose a graph based framework for time series prediction. The motivation for using graphs is to capture the tacit historical pattern present in the dataset. The idea behind creating a graph over a time series is to exploit two facts. First, some aspect of the time series pattern will continue in the future, and a graph is a data structure well suited to modeling a pattern. Second, similarity can be calculated between graphs to identify similar patterns and their order of occurrence. Thus, a graph is created to store a pattern over the time series and to make predictions based on the similarity of the observed pattern to historical data, as an alternative to regression and curve fitting. The major shortcoming of regression and curve fitting is that it requires expert knowledge of the curve equation and the number of parameters in it. If there are too many parameters there is a problem of over-fitting, and if there are too few, the model suffers from a problem of
• 38. under-fitting (Han & Kamber, 2006). The complete pattern in a time series is not known initially, and it is affected by the random component, which makes regression harder; hence deciding on the curve equation and the number of parameters in it is a major issue. To further explore the concept of a pattern, let there be a time series of monthly data over N years where the first observation was in the first month of year m, Data = {Y1(k)Y2(k)...Y12(k), Y1(k+1)Y2(k+1)...Y12(k+1), ..., Y1(k+N)Y2(k+N)...Y12(k+N)}, where Y1(k) means the value of the variable under study for the first month of year k
  • 39. each successive d observation is created to store the pattern. This is called ‘last- pattern- observed-graph’. To make the prediction we also store the knowledge in each graph that how the last pattern observed effect the next observation. This is called ‘knowledge-graph’. Example If we consider the data {Y1(k)Y2(k)…Y12(k), Y1(k+1) Y2(k+1) …Y12(k+1),…, Y1(k+N) Y2(k+N)…Y12(k+N)}, last- pattern-observed-graph for Jan of year (k+1) will be generated using data {Y1(k)Y2(k)…Y12(k)} and knowledge-graph of Jan for year (k+1) will be generated using {Y1(k)Y2(k)…Y12(k), Y1(k+1)} data. Knowledge graph is created with intuition to capture how the variable under study changed over last d observations and its effect on d+1 observation. In time series data, the graph is created with the motivation to model each observation as vertex and represent the effect of variation in observations with respect to time in form of edges. The number of vertices in graph is equal to time interval over which a pattern has to be stored. The edges are created to take into account the effect of each observation on other. Since the past values will affect the future values, but future values would not affect the past values and hence the edges are created between vertices corresponding to it and all the subsequent observations which measure the change in angle with
  • 40. horizontal. The graphs generated can be represented in computer memory either by using Adjacency matrix representation or Adjacency list representation (Cormen, 2001). I have used Adjacency list representation to save the memory required to store the graph as each graph will have n(n- 1)/2 edges thus space required will be n(n-1)/2 using adjacency list representation as compared to n 2 space using adjacency matrix representation. Dataset of N tuples is partitioned into two sets. First set for training data of m tuples and second {N-m} tuples for training and validation of model. During the training phase, a Knowledge-Graph is generated over training data tuples over each subsequent d+1 observation. Yi(k)Y(i+1)(k)…Y(i+12)(k), Y(i+13) (k) where i has bounds 1≤i≤12 and if i>12 then i=1 & k=k+1 for all m tuples in training Dataset. Thus m-12 Knowledge-Graphs are generated. These generated graphs are partitioned into d sets (d=12), where each graph is stored in the interval over which knowledge they have captured (i.e. graph for all Jan’s are stored together, all Feb’s stored together, etc.). To implement this we have used an array of size d of linked list of graphs.
• 41. Each linked list stores all the knowledge-graphs corresponding to the interval whose knowledge they represent. The graphs are partitioned to ease the search: while making a prediction, the model queries for all patterns observed w.r.t. a particular month, and since the graphs are already stored in partitioned form, the time taken to execute this query is O(1). To predict the next value in the time series, the model takes the last d known observations previous to the month for which the prediction has to be made and computes the 'last-pattern-observed-graph'. The model then searches for a knowledge-graph (stored in the partition corresponding to the month for which the prediction has to be made) that is most similar to the last-pattern-observed-graph, considering only a number of vertices of the knowledge-graph equal to that of the last-pattern-observed-graph. To compute the similarity between two graphs, the graph edit distance technique has been used (Brown, 2004; Bunke & Riesen, 2008). The key idea of the graph-
• 42. edit distance approach is to model structural variation by edit operations reflecting modifications in structure and labeling. A standard set of edit operations is given by insertions, deletions, and substitutions of both nodes and edges. Calculating the graph edit distance for time-series graphs g1 (source graph) and g2 (destination graph) requires only substitutions of edges (changes in angle) in g2 to make it similar to g1, and the total cost incurred by the edit operations is calculated. The graph with the least edit cost is the most similar, and it is selected as the graph that will form the basis of the prediction. To make the prediction, the model takes into account the structural difference between the two graphs in a vertex-ordered weighted average manner. To make the prediction on graph g1 (the last-pattern-observed-graph) using graph g2 (the knowledge-graph most similar to g1), every vertex in g1 predicts the angle between itself and the value to be predicted, using the knowledge of g2 and taking into account the differences between its edges and those of its corresponding vertex in g2 in a weighted average manner (edge differences for vertices closer to the value being predicted are given more weight, a technique that applies exponential smoothing to the graph based prediction approach), and thus in
• 43. this way each vertex predicts the angle. Every vertex makes a prediction, and the predicted value is the average of the values predicted by the vertices. After the prediction is made and the actual observed value is known, a knowledge-graph is generated to capture the pattern corresponding to the last observation, and in this way the model learns in an iterative manner. Experimental Results. The code implementing the graph based time series prediction approach discussed above is written in Java. The approach was applied to the airline passenger data set, which was first used in Brown (1962) and then in Box et al. (1976). It represents the number of airline passengers, in thousands, observed between January 1949 and December 1960 on a monthly basis. I have used 2 years of data for training, i.e., 1949 and 1950, and estimated the remaining data on a monthly basis, implementing iterative learning as each observation is recorded. Fig. 2 represents the actual and predicted number of passengers
• 44. using the graph based framework for time series prediction applied to the time series of the airline passenger data set. Fig. 3 represents the corresponding percentage error rate observed on a monthly basis. The average percentage error recorded on the time series is 7.05. Fig. 4 represents the actual and predicted number of passengers using the graph based framework for time series prediction applied to the de-seasonalized time series of the airline passenger data set (using the concept of the moving average). Fig. 5 represents the corresponding percentage error rate observed on a monthly basis. The average percentage error recorded on the de-seasonalized time series is 5.81. Fig. 2: Actual and predicted number of passengers using the graph based framework for time series prediction applied on the time series of the airline passenger data set (APTS). Fig. 3: Percentage error between actual and predicted values using the graph based framework for time series prediction applied on the time series of the airline passenger data set (APTS).
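The de-seasonalizing step behind Figs. 4 and 5 is the Yt/Mt moving-average method from the literature review. A sketch under stated assumptions: the data below is a synthetic trend-times-seasonal series, not the airline values, and `deseasonalize` is my own helper, not the paper's code.

```python
import numpy as np

def deseasonalize(y, L=12):
    """Classical multiplicative decomposition, Yt = Tt*Ct*St*Rt:
    a centered L-period moving average Mt estimates Tt*Ct, the ratio
    Yt/Mt isolates St*Rt, averaging the ratios per season gives St,
    and Yt/St is the de-seasonalized series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # centered moving average for even L (half weight on the endpoints)
    kernel = np.r_[0.5, np.ones(L - 1), 0.5] / L
    m = np.convolve(y, kernel, mode="valid")        # estimates Tt*Ct
    ratio = y[L // 2 : L // 2 + len(m)] / m         # St*Rt
    season_of = np.arange(L // 2, L // 2 + len(m)) % L
    s = np.array([ratio[season_of == j].mean() for j in range(L)])
    s *= L / s.sum()                                # indices average to 1
    return y / s[np.arange(n) % L]

# Synthetic multiplicative series: linear trend times a 12-month index.
t = np.arange(48)
trend = 100 + 2.0 * t
seasonal = 1 + 0.2 * np.cos(2 * np.pi * t / 12)
y = trend * seasonal
d = deseasonalize(y)
print(np.max(np.abs(d / trend - 1)))  # small: seasonality mostly removed
```

Feeding the graph based predictor the de-seasonalized series, as the experiments above do, leaves it only the trend and cyclic structure to learn.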
• 45. Fig. 4: Actual and predicted number of passengers using the graph based framework for time series prediction applied on the de-seasonalized time series of the airline passenger data set (APTS). Fig. 5: Percentage error between actual and predicted values using the graph based framework for time series prediction applied on the de-seasonalized time series of the airline passenger data set (APTS). Conclusion & Discussion. A new approach for time series prediction, based on graphs, has been proposed and implemented. The reported results show that the graph based framework for time series prediction achieves 94.19 percent accuracy on the de-seasonalized time series of the airline passenger data (computed using the concept of the moving average), while on the direct time series of the airline passenger data it achieves 92.95 percent
• 46. accuracy. The accuracy on the de-seasonalized time series is better because that series has only two factors, the cyclic and trend factors, which leads to a lower error rate compared with direct application of the proposed approach to the time series, which has all four factors (cyclic, trend, seasonal, and random), making prediction more difficult. Thus the application of the graph based framework in conjunction with the moving average offers good accuracy. The graph based framework approach for time series prediction has incorporated the concepts of exponential smoothing, the moving average, and graph mining to enhance its accuracy. It is a good alternative to regression: in the proposed approach there is no need for domain expert knowledge of the curve equation and the number of parameters in it. The results validate that the new approach has a good accuracy rate. References
• 47. Balestra, P., & Nerlove, M. (1966). Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas. Econometrica, 34(3), 585-612. Bollerslev, T. (1987). A conditionally heteroskedastic time series model for speculative prices and rates of return. The Review of Economics and Statistics, 69(3), 542-547. Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1970). Time series analysis. Oakland, CA: Holden-Day. Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1976). Time series analysis: Forecasting and control (Vol. 16). San Francisco, CA: Holden-Day. Box, G. E. P., & Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association, 65(332), 1509-1526. Brown, R. G. (2004). Smoothing, forecasting and prediction of discrete time series. Mineola, NY: Dover Publications. Brown, R. G. (1962). Smoothing, forecasting and prediction of discrete time series. Englewood Cliffs, NJ: Prentice Hall. Bunke, H., & Riesen, K. (2008). Graph classification based on dissimilarity space embedding. In N. da Vitoria Lobo, T. Kasparis, F. Roli,
• 48. J. Kwok, M. Georgiopoulos, G. Anagnostopoulos & M. Loog (Eds.), Structural, Syntactic, and Statistical Pattern Recognition (Vol. 5342, pp. 996-1007). Berlin/Heidelberg: Springer. Clifton, C. (2011). Data mining. In Encyclopaedia Britannica. Retrieved from http://www.britannica.com/EBchecked/topic/1056150/data-mining Cormen, T. H. (2001). Introduction to algorithms. Cambridge, MA: The MIT Press. Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2), 357-384. Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques. Morgan Kaufmann. Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (Vol. 5). Upper Saddle River, NJ: Prentice Hall.
  • 49. Madsen, H. (2008). Time series analysis. Boca Raton: Chapman and Hall/CRC Press. Shumway, R. H., & Stoffer, D. S. (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of time series analysis, 3(4), 253-264. Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175. doi: 10.1016/s0925-2312(01)00702-0 Copyright of Trends in Information Management is the property of University of Kashmir and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.