http://publicationslist.org/junio
Data Analysis
Time as a variable: time-series analysis
Prof. Dr. Jose Fernando Rodrigues Junior
ICMC-USP
What is it about?
Time series are an incredibly common kind of data
 Stock market
 CPU utilization
 Meteorology - daily rainfall, wind speed, and temperature
 Sociology - crime figures, employment figures
 Software engineering – number of errors
 Networks – number of nodes, and edges
First examples
Consider a data set with the concentration (ppm) of carbon
dioxide (CO2) in the atmosphere, as measured by the
observatory on Mauna Loa on Hawaii, recorded at monthly
intervals since 1959
The plot shows two
common features in
time series:
 Trend: a steady, long-term growth
 Seasonality: a regular periodic pattern with a
12-month cycle
First examples
Consider the data set with the price of long-distance phone
calls in the US over the last century
The plot shows a strong
nonlinear trend
The semi-log plot (inset)
shows that the data follow an
approximately exponential decay
(a straight line on semi-log axes), a usual
behavior of growth/decay
processes
This example calls for closer inspection:
• Has the long-distance call service changed over
time?
• Were the prices adjusted for inflation?
• What explains the uncharacteristically low prices for a
couple of years in the late 1970s? Did the breakup
of the AT&T system have anything to do with it?
First examples
Consider the data set with the development of the Japanese
stock market as represented by the Nikkei Stock Index over
the last 40 years shown with a 31-point Gaussian smoothing
filter
The plot shows a change in
the behavior after 1990
(the big Japanese bubble),
after which a long-term
increasing trend turned into
an oscillatory decreasing
trend
The seasonality also
changed significantly after
that point
First examples
Consider a data set with the number of daily calls placed in a
call center for a time period slightly longer than two years
 This example is far more
challenging because of its complex
structure
 Actually, it is not clear whether
the high-frequency variation in
the plot is noise or has some
form of regularity
 In an initial analysis, not many
conclusions can be drawn from
the plot – apparently, no
trend, no seasonality, and
no change in behavior
As time-series analysis commonly relies on long-term data, it
is important to verify that the data acquisition was
homogeneous over the whole period; otherwise, the series
may change its behavior in ways that are hard to
make sense of
Main components
As we have seen, the main components observed are:
 Trend: linear or non-linear, with a characteristic magnitude
 Seasonality: additive, for example, every 12 months the sales
increase by 3 million; or multiplicative, for example, every 12
months the sales increase by 1.4 times what was observed in the last
cycle
 Noise: some form of random variation, quite common
 Other: change in behavior, special outliers, missing data, and anything
remarkable
Assumptions
Standard methods of time-series analysis make a number of
assumptions, which are frequently violated in real-world
scenarios:
 Data points have been taken at equally spaced time steps, with
no missing data points: otherwise, interpolation is needed for missing
points, or re-sampling for insufficiently sampled data
 The time series is sufficiently long (at least 50 points): otherwise,
smoothing methods are needed to define a continuous curve, even where
there are no points
 The series is stationary: it has no trend, no seasonality, and the
character (amplitude and frequency) of any noise does not change
with time; otherwise, the series may have to be broken into multiple
segments to be analyzed separately
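The interpolation of missing points mentioned in the first assumption can be sketched as below. This is only an illustration: the function name `fill_missing`, the use of `None` to mark a missing point, and the choice of linear interpolation are assumptions made here, not part of the slides.

```python
def fill_missing(values):
    """Fill gaps (None) by linear interpolation between the nearest known neighbors.

    Assumes equally spaced time steps. Gaps before the first or after the
    last known point are left as None, since there is nothing to interpolate.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)  # relative position inside the gap
            filled[i] = values[a] + frac * (values[b] - values[a])
    return filled


series = [1.0, None, 3.0, None, None, 6.0]
print(fill_missing(series))
```

Linear interpolation is the simplest choice; it may be a poor fit for strongly seasonal series, where a gap can hide a whole oscillation.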
Smoothing
Just as with two-variable data, it is useful to fit a curve
according to the available data (actually, a time series is a
special case of two-variable data)
Smoothing helps in:
 Reducing noise
 Interpolating missing/insufficient values
Running averages
The method known as the running (moving, rolling, or floating)
average is straightforward: for any odd number 2k+1 of consecutive
points, replace the centermost value with the average of
all the points in the window, the center included
The smoothed point s_i is given by:
$s_i = \frac{1}{2k+1} \sum_{j=-k}^{k} x_{i+j}$
where x_i are the data points
For example, for a 5-point (k=2) moving average, consider
point x10 = 4 and its neighbors x8 = 4, x9 = 7, x11 = 2, x12 = 9, so
s10 = 1/5*(4+7+4+2+9) = 1/5*26 = 5.2
And so forth for any other point
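A minimal sketch of the running average follows, reproducing the 5-point example above. The function name `moving_average` is chosen here for illustration, and the choice to return values only where the full window fits is an assumption; the slides do not say how the edges are treated.

```python
def moving_average(xs, k):
    """Centered (2k+1)-point running average.

    Returns one smoothed value per position where the whole window fits,
    so the result is 2k points shorter than the input.
    """
    width = 2 * k + 1
    return [sum(xs[i - k:i + k + 1]) / width for i in range(k, len(xs) - k)]


window = [4, 7, 4, 2, 9]  # x8 .. x12 from the example
print(moving_average(window, 2))  # [5.2]
```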
Weighted running averages
Running averages do not work well in the presence of
outliers, which may distort the curve
The weighted running average technique lessens this
problem by using weights w_j, as in $s_i = \sum_{j=-k}^{k} w_j x_{i+j}$,
that give more importance to points at the center of the moving window
The weights w_j can be defined manually; for instance, for a
5-point window they could be (1/9, 2/9, 1/3, 2/9, 1/9)
Or they can be defined by a function, in which case the Gaussian
is the first choice
In either case, the weights must peak at
the center, drop toward the edges, and add up to 1
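The weighted running average can be sketched as below, with both the manual 5-point weights and Gaussian weights. The function names and the `sigma` parameter (the width of the Gaussian) are assumptions made for this illustration; the slides do not prescribe how the Gaussian is parameterized.

```python
import math


def gaussian_weights(k, sigma):
    """2k+1 Gaussian weights: peaked at the center, normalized to add up to 1."""
    raw = [math.exp(-0.5 * (j / sigma) ** 2) for j in range(-k, k + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_moving_average(xs, weights):
    """Centered weighted average; len(weights) must be odd (2k+1)."""
    k = len(weights) // 2
    return [
        sum(w * xs[i + j - k] for j, w in enumerate(weights))
        for i in range(k, len(xs) - k)
    ]


data = [4, 7, 4, 2, 9]
manual = [1 / 9, 2 / 9, 1 / 3, 2 / 9, 1 / 9]  # weights from the slide
print(weighted_moving_average(data, manual))                   # one value, 43/9
print(weighted_moving_average(data, gaussian_weights(2, 1.0)))
```

With the manual weights, the center point counts three times as much as the outermost points, so a spike entering at the edge of the window moves the curve less than in the plain running average.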
Running averages
For example, consider synthetic data (filled line) and an 11-point
moving average
 The plot shows that the simple
technique represents the data
reasonably well, but
whenever an outlier (spike)
enters the window, the curve is abruptly
distorted until the outlier
leaves the window
 The weighted version of the
technique gives better
results: instead of abrupt
distortions, it shows
smoothed peaks that point
out the original outliers
Single exponential smoothing
 Running averages are intrinsically local and may not capture the global
behavior of the series
 An improved method is exponential smoothing, which, in its single
form, starts from a simple recursive definition
$s_i = \alpha x_i + (1 - \alpha) s_{i-1}$
with $0 \le \alpha \le 1$, and $s_0 = x_0$, or $s_0 = \frac{1}{n} \sum_{i=0}^{n-1} x_i$, for n initial values
 That is, the i-th smoothed point is a mix between the actual point x_i
and the previous smoothed point s_{i-1}, where α can be chosen by trial
and error
 By mathematical induction, this recursion leads to the exponential
expression: $s_i = \alpha \sum_{j=0}^{i-1} (1 - \alpha)^j x_{i-j} + (1 - \alpha)^i s_0$
which gives any smoothed value s_i as a function of all the previous
values of x, with weights that decay exponentially with their age
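The recursion of single exponential smoothing can be sketched as below. The function name is chosen here for illustration, and the initialization s_0 = x_0 is one of the two options mentioned above.

```python
def exponential_smoothing(xs, alpha):
    """Single exponential smoothing: s_i = alpha*x_i + (1 - alpha)*s_{i-1}."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = [xs[0]]  # s_0 = x_0
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([4, 7, 4, 2, 9], 0.5))
# [4, 5.5, 4.75, 3.375, 6.1875]
```

With alpha close to 1 the curve follows the data closely; with alpha close to 0 it reacts slowly and smooths more heavily.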
Single exponential smoothing
 Single exponential smoothing provides good smoothing curves and,
in some cases, forecasting
 It is limited, though, for series that present trend or seasonality,
situations in which the technique cannot be used for accurate
prediction
 There are two more advanced exponential smoothing techniques:
 Double exponential smoothing, for series with trend but without
seasonality
 Triple exponential smoothing, for series with trend and seasonality; this
technique is called the Holt–Winters method
 The Holt–Winters method is a powerful technique able to reproduce
the full behavior of additive or multiplicative time series
Double and triple exponential smoothing
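 Double exponential smoothing: adds a trend factor to the smoothed value
 Additive triple exponential smoothing: adds a trend factor and an additive seasonality factor
 Multiplicative triple exponential smoothing: adds a trend factor and a multiplicative seasonality factor
 The smoothed value and these factors are combined in a forecasting equation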
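As a sketch of what the three methods compute, one common textbook formulation is written below. The symbols are assumptions and are not taken from the slide: t_i is the trend factor, p_i the seasonality factor, k the length of one season (not the window half-width used for running averages), h ≤ k the forecast horizon, and β and γ the additional mixing parameters. Other references use slightly different but equivalent conventions.

```latex
% Double exponential smoothing
\begin{aligned}
s_i &= \alpha x_i + (1-\alpha)(s_{i-1} + t_{i-1}) \\
t_i &= \beta (s_i - s_{i-1}) + (1-\beta)\, t_{i-1} && \text{(trend factor)} \\
\hat{x}_{i+h} &= s_i + h\, t_i && \text{(forecasting)}
\end{aligned}

% Additive triple exponential smoothing
\begin{aligned}
s_i &= \alpha (x_i - p_{i-k}) + (1-\alpha)(s_{i-1} + t_{i-1}) \\
t_i &= \beta (s_i - s_{i-1}) + (1-\beta)\, t_{i-1} && \text{(trend factor)} \\
p_i &= \gamma (x_i - s_i) + (1-\gamma)\, p_{i-k} && \text{(seasonality factor)} \\
\hat{x}_{i+h} &= s_i + h\, t_i + p_{i-k+h} && \text{(forecasting)}
\end{aligned}

% Multiplicative triple exponential smoothing
\begin{aligned}
s_i &= \alpha \frac{x_i}{p_{i-k}} + (1-\alpha)(s_{i-1} + t_{i-1}) \\
t_i &= \beta (s_i - s_{i-1}) + (1-\beta)\, t_{i-1} && \text{(trend factor)} \\
p_i &= \gamma \frac{x_i}{s_i} + (1-\gamma)\, p_{i-k} && \text{(seasonality factor)} \\
\hat{x}_{i+h} &= (s_i + h\, t_i)\, p_{i-k+h} && \text{(forecasting)}
\end{aligned}
```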
 Exponential smoothing depends on mixing parameters,
which are required by software packages:
• Single exponential smoothing: α
• Double exponential smoothing: α and β
• Triple exponential smoothing: α, β, and γ
 More on time-series analysis:
http://www.statsoft.com/textbook/time-series-analysis/
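To show how the mixing parameters enter a computation, below is a minimal sketch of double exponential smoothing with a linear forecast. It assumes the common recursion in which a trend term t is smoothed alongside the value s; the function names, the parameter names `alpha` and `beta`, and the initialization (s_0 = x_0, t_0 = x_1 - x_0) are assumptions made here, and software packages may initialize differently.

```python
def double_exponential_smoothing(xs, alpha, beta):
    """Return the smoothed values s and trend factors t.

    s_i = alpha*x_i + (1 - alpha)*(s_{i-1} + t_{i-1})
    t_i = beta*(s_i - s_{i-1}) + (1 - beta)*t_{i-1}
    Needs at least two points for the assumed initial trend.
    """
    s = [xs[0]]
    t = [xs[1] - xs[0]]  # assumed initial trend: the first difference
    for x in xs[1:]:
        s_new = alpha * x + (1 - alpha) * (s[-1] + t[-1])
        t_new = beta * (s_new - s[-1]) + (1 - beta) * t[-1]
        s.append(s_new)
        t.append(t_new)
    return s, t


def forecast(s, t, h):
    """Extrapolate h steps past the last point along the last trend."""
    return s[-1] + h * t[-1]


line = [2 * i + 1 for i in range(10)]  # perfectly linear synthetic series
s, t = double_exponential_smoothing(line, alpha=0.5, beta=0.5)
print(forecast(s, t, 3))  # 25.0, the value the line reaches 3 steps later
```

On a perfectly linear series the method follows the data exactly, which is why the forecast lands on the line; single exponential smoothing would lag behind such a trend.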
Triple exponential smoothing
For example, the additive Holt–Winters plot for a dataset with the
number of US monthly international flight passengers
The years 1949 through 1957 were used to “train” the algorithm,
and the years 1958 through 1960 were forecasted
Note how well the forecast agrees with the actual data
Autocorrelation and correlogram
As mentioned, time series are mainly characterized by trend
and seasonality
Trend is analyzed by means of smoothing, function fitting
(modeling), and plotting
Seasonality can be analyzed with the autocorrelation function and
the correlogram
Autocorrelation and correlogram
The correlation between two time series is obtained as
follows:
 Subtract from each series its own mean
 For each time step i, multiply the two resulting deviations
 Sum up all the products
 Normalize, dividing by the square root of the product of the two sums of squared deviations
The correlation for two identical series is 1, and it is -1 for
series that are exactly inverted in relation to each
other
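The steps above can be sketched as below. The function name `correlation` is chosen here for illustration; the computation is the usual Pearson correlation coefficient, assuming two series of equal length that are not constant.

```python
import math


def correlation(xs, ys):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]  # deviations from the mean
    dev_y = [y - mean_y for y in ys]
    products = sum(a * b for a, b in zip(dev_x, dev_y))
    norm = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    return products / norm


series = [1.0, 3.0, 2.0, 5.0, 4.0]
inverted = [-v for v in series]
print(correlation(series, series))    # 1.0
print(correlation(series, inverted))  # -1.0
```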
Autocorrelation and correlogram
 Seasonality:
 Formally defined as the correlation between each i-th element and
the (i+k)-th element, where k is usually called the lag
 Measured by the Autocorrelation Function (ACF), i.e., the correlation
between the terms xi and xi+k
 If the measurement error is not too large, seasonality can be
visually identified as a pattern that repeats every k time
steps
Autocorrelation and correlogram
If seasonality is present, then the behavior of the series
should repeat every k time units, where k is called the lag
The problem, hence, is: how to identify analytically
the lag of the series?
 The answer is: compare the time series with itself,
shifted by increasing values (lags) of k; for each value calculate
the correlation
Hence, the autocorrelation of a given series at lag k is given
by
$c(k) = \frac{\sum_{i=1}^{N-k} (x_i - \mu)(x_{i+k} - \mu)}{\sum_{i=1}^{N} (x_i - \mu)^2}$, with $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
The denominator normalizes according to lag 0,
that is, to the correlation of the
series with itself
Autocorrelation and correlogram
Autocorrelation basic algorithm:
1. Let k = 0
2. Start with two copies of the series (original and copy)
3. Subtract the mean from all values in both series
4. Multiply the values at corresponding time steps with each other
5. Sum up the results for all time steps
6. Normalize by the result obtained at lag 0 (proportional to the
variance of the original series) → this is the correlation for lag k, that is, c(k)
7. Shift the copy by 1 time step
8. Let k ← k+1
9. Continue from step 4 while k < kmax
 According to this algorithm:
 Initially (lag 0), the two signals are perfectly aligned and the
correlation is 1
 Then, as we shift the signals they slowly move out of phase and
the correlation drops
 How quickly it drops tells us how much “memory” there is
in the data:
 If quickly, we know that, after a few steps, the signal has lost all
memory of its recent past
 If slowly, then we know that we are dealing with a process that
is relatively steady over longer periods of time
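The basic algorithm can be sketched as below. The function name `autocorrelation` and the parameter `max_lag` (kmax) are chosen here for illustration; instead of physically shifting a copy, the sketch indexes the same list at i and i + k, which is equivalent.

```python
def autocorrelation(xs, max_lag):
    """Return [c(0), c(1), ..., c(max_lag)] for a non-constant series xs."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]       # subtract the mean
    norm = sum(d * d for d in dev)     # lag-0 sum, used for normalization
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / norm
        for k in range(max_lag + 1)
    ]


alternating = [1, -1] * 10  # synthetic series that flips sign at every step
acf = autocorrelation(alternating, 4)
print(acf[0])  # 1.0: at lag 0 the two copies are perfectly aligned
```

For this synthetic series the correlation is strongly negative at odd lags and strongly positive at even lags, and its magnitude shrinks slowly only because fewer terms overlap as the lag grows.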
Autocorrelation and correlogram
The correlogram is the plot of the correlation as a function of the lag for a
given time series
For example: consider a data set with the number of daily calls
placed in a call center for a time period slightly longer than two
years, as presented earlier
[Figures: the time series and its (auto)correlogram, with the lag on the x axis, 0 <= lag <= 500]
 From the correlogram we can observe that:
 The series has a long “memory” (long cycles): it takes the
correlation almost 100 days to fall to zero, indicating that the
frequency of calls changes more or less once per quarter but not
more frequently
 There is a pronounced secondary peak at a lag of 365 days: the
call center data is highly seasonal and repeats itself on a yearly
basis, when the series repeats its response behavior (high
correlation)
 There is a small but regular sawtooth structure; if we look
closely, we will find that the first peak of the sawtooth is at a lag of
7 days and that all repeating ones occur at multiples of 7. This is
the signature of the high-frequency component that we see in the
plot of the series; that is, the traffic to the call center exhibits a
secondary seasonal component with 7-day periodicity: the
traffic depends on the day of the week
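The way a weekly component shows up in the correlogram can be reproduced on synthetic data, as sketched below. The weekday call volumes are invented for this illustration and are not the call-center data of the slides; the helper repeats the autocorrelation definition so that the block is self-contained.

```python
def autocorrelation(xs, max_lag):
    """Return [c(0), ..., c(max_lag)], normalized by the lag-0 sum."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    norm = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / norm
        for k in range(max_lag + 1)
    ]


# Hypothetical call volumes, Monday to Sunday, repeated for 20 weeks
week = [120, 115, 110, 108, 105, 40, 30]
calls = week * 20

acf = autocorrelation(calls, 20)
# Lag (other than 0) at which the series is most similar to itself
best_lag = max(range(1, 21), key=lambda k: acf[k])
print(best_lag)  # 7: the series repeats every 7 days
```

In real data the peaks are less sharp because of noise and trend, so the lag is usually read from the correlogram rather than taken blindly from the maximum.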
Example
CO2 measurements above Mauna Loa in Hawaii
Consider again the data set with the concentration (ppm) of
carbon dioxide (CO2) in the atmosphere, as measured by the
observatory on Mauna Loa on Hawaii, recorded at monthly
intervals since 1959
CO2 measurements above Mauna Loa in Hawaii
 The series is easier to analyze numerically if the horizontal axis is expressed as
incremental monthly indexes and the graph goes through the origin (vertical
translation of -315)
 This can be achieved in Gnuplot with:
plot "data" using 0:($2-315) with lines
CO2 measurements above Mauna Loa in Hawaii
 The series has a trend that seems to be a power law of the form b(x/a)^k, with k
greater than 1 because the curve is convex downward; a first guess is k=2, b=35, and
a=350 (from the upper rightmost part of the series)
 This guess can be plotted in Gnuplot with:
plot "data" using 0:($2-315) with lines, 35*(x/350)**2
CO2 measurements above Mauna Loa in Hawaii
 By trial and error, a better guess for k is 1.35
 This can be achieved in Gnuplot with:
plot "data" using 0:($2-315) with lines, 35*(x/350)**1.35
CO2 measurements above Mauna Loa in Hawaii
 To verify the accuracy of the model function, we can plot the residual by subtracting
the trend from the data
 This can be achieved in Gnuplot with:
plot "data" using 0:($2-315 - 35*($0/350)**1.35) with lines
CO2 measurements above Mauna Loa in Hawaii
 The model seems fine except for the seasonality, which consists of regular oscillations
that can be captured by a sine, since the series starts at (0,0); also, the series is
monthly-based with a cycle of one year, so a guess is that the data repeats every 12
points; the amplitude is around 3, as we can observe in the former plots
 With f(x) defined as the trend model, we can compare the residual and our seasonality model in Gnuplot with:
f(x) = 315 + 35*(x/350)**1.35
plot "data" using 0:($2-f($0)) with lines, 3*sin(2*pi*x/12) with lines
 At this point the model is given by the power-law function plus the sine
function
f(x) = 315 + 35*(x/350)**1.35 + 3*sin(2*pi*x/12)
plot "data" using 0:2 with lines, f(x)
which is pretty close to the actual phenomenon
CO2 measurements above Mauna Loa in Hawaii
With the final model, it becomes possible to predict future values
for the series
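As a sketch of such a prediction, the final model can be written as a function of the monthly index and evaluated beyond the recorded range. The function name `co2_model` and the future index used below are assumptions made for this illustration; index 0 corresponds to the first record of the series, as in the Gnuplot commands. The result is an extrapolation of a hand-fitted model, not a measurement.

```python
import math


def co2_model(month_index):
    """Hand-fitted model: offset + power-law trend + yearly sine (in ppm)."""
    trend = 315 + 35 * (month_index / 350) ** 1.35
    seasonality = 3 * math.sin(2 * math.pi * month_index / 12)
    return trend + seasonality


print(co2_model(0))  # 315.0, the offset at the start of the series

# Hypothetical future index: 600 months after the first record
print(round(co2_model(600), 1))
```

Because the exponent and amplitude were found by trial and error, the further the index lies from the fitted range, the less the extrapolated value should be trusted.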
References
 Philipp K. Janert, Data Analysis with Open Source Tools,
O’Reilly, 2010.
 Wikipedia, http://en.wikipedia.org
 Wolfram MathWorld, http://mathworld.wolfram.com/

Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale GraphsVertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
 
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing ModelFast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
StructMatrix: large-scale visualization of graphs by means of structure detec...
StructMatrix: large-scale visualization of graphs by means of structure detec...StructMatrix: large-scale visualization of graphs by means of structure detec...
StructMatrix: large-scale visualization of graphs by means of structure detec...
 
Apresentacao vldb
Apresentacao vldbApresentacao vldb
Apresentacao vldb
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media images
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
 
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkSupervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring network
 
Graph-based Relational Data Visualization
Graph-based RelationalData VisualizationGraph-based RelationalData Visualization
Graph-based Relational Data Visualization
 
Reviewing Data Visualization: an Analytical Taxonomical Study
Reviewing Data Visualization: an Analytical Taxonomical StudyReviewing Data Visualization: an Analytical Taxonomical Study
Reviewing Data Visualization: an Analytical Taxonomical Study
 
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
 
Dawarehouse e OLAP
Dawarehouse e OLAPDawarehouse e OLAP
Dawarehouse e OLAP
 
Visualization tree multiple linked analytical decisions
Visualization tree multiple linked analytical decisionsVisualization tree multiple linked analytical decisions
Visualization tree multiple linked analytical decisions
 

Recently uploaded

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 

Recently uploaded (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 

Data analysis03 timeasa-variable

  • 1. http://publicationslist.org/junio Data Analysis Time as a variable: time-series analysis Prof. Dr. Jose Fernando Rodrigues Junior ICMC-USP
  • 2. What is it about?
      Time series are an extremely common kind of data:
      • Stock market
      • CPU utilization
      • Meteorology: daily rainfall, wind speed, and temperature
      • Sociology: crime figures, employment figures
      • Software engineering: number of errors
      • Networks: number of nodes and edges
  • 3. First examples
      Consider a data set with the concentration (ppm) of carbon dioxide (CO2) in the atmosphere, as measured by the observatory on Mauna Loa in Hawaii, recorded at monthly intervals since 1959.
      The plot shows two common features of time series:
      • Trend: a steady, long-term, roughly linear growth
      • Seasonality: a regular periodic pattern with a 12-month cycle
  • 5. First examples
      Consider the data set with the price of long-distance phone calls in the US over the last century.
      The plot shows a strong nonlinear trend.
      The single-log plot (inset) shows that the data fall close to a straight line on a logarithmic vertical axis, that is, they follow an exponential decay, the usual behavior of growth/decay processes (a power law would instead appear as a straight line on a double-log plot).
      This example asks for closer inspection:
      • Has the long-distance call service changed over time?
      • Were the prices adjusted for inflation?
      • What explains the uncharacteristically low prices for a couple of years in the late 1970s? Did the breakup of the AT&T system have anything to do with it?
  • 6. First examples
      Consider the data set with the development of the Japanese stock market, as represented by the Nikkei Stock Index over the last 40 years, shown with a 31-point Gaussian smoothing filter.
      The plot shows a change in behavior after 1990 (the big Japanese bubble), when a long-term increasing trend turned into an oscillatory decreasing trend.
      The seasonality also changed significantly after that point.
  • 8. First examples
      Consider a data set with the number of daily calls placed in a call center over a period slightly longer than two years.
      • This example is far more challenging because of its complex structure
      • It is not clear whether the high-frequency variation in the plot is noise or has some form of regularity
      • In an initial analysis, not many conclusions can be drawn from the plot: apparently there is no trend, no seasonality, and no change in behavior
      Because time series commonly rely on long-term data, it is important to verify that the data acquisition was homogeneous throughout the period; otherwise the series may change its behavior in ways that are hard to make sense of.
  • 9. Main components
      As we have seen, the main components observed are:
      • Trend: linear or nonlinear, with a characteristic magnitude
      • Seasonality: additive (for example, every 12 months the sales increase by 3 million) or multiplicative (for example, every 12 months the sales increase to 1.4 times what was observed in the last cycle)
      • Noise: some form of random variation, quite common
      • Other: change in behavior, special outliers, missing data, and anything else remarkable
  • 10. Assumptions
      Standard methods of time-series analysis make a number of assumptions, all of which are commonly violated in real-world scenarios:
      • Data points have been taken at equally spaced time steps, with no missing data points. When violated, this demands interpolation (missing points) or re-sampling (insufficient sampling)
      • The time series is sufficiently long (at least 50 points). When violated, this requires smoothing methods to define a continuous curve, even where there are no points
      • The series is stationary: it has no trend, no seasonality, and the character (amplitude and frequency) of any noise does not change with time. When violated, this may require breaking the series into multiple segments to be analyzed separately
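The first assumption often has to be restored by hand. A minimal sketch of filling gaps by linear interpolation follows; the function name `fill_missing` and the use of `None` to mark a missing point are assumptions made for illustration, not part of the slides.

```python
def fill_missing(values):
    """Linearly interpolate None entries between their known neighbours.

    Gaps at the start or end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known points")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Fraction of the way from the left neighbour to the right one.
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


series = [1.0, None, 3.0, None, None, 6.0]
print(fill_missing(series))
```

Linear interpolation is only one option; for long gaps a smoothing method may be a better fit.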
  • 11. Smoothing
      Just as with two-variable data, it is useful to fit a curve to the available data (a time series is in fact a special case of two-variable data).
      Smoothing helps in:
      • Reducing noise
      • Interpolating missing/insufficient values
  • 12. Running averages
      The method known as the running (moving, rolling, or floating) average is straightforward: for any odd number 2k+1 of consecutive points, replace the centermost value with the average of all points in the window.
      The smoothed point $s_i$ is given by
      $s_i = \frac{1}{2k+1} \sum_{j=-k}^{k} x_{i+j}$
      where the $x_i$ are the data points.
      For example, for a 5-point (k=2) moving average, consider the point x10 = 4 and its neighbors x8 = 4, x9 = 7, x11 = 2, x12 = 9, so s10 = 1/5*(4+7+4+2+9) = 1/5*26 = 5.2.
      The same is done for every other point.
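The running average can be sketched in a few lines of Python. The function name `running_average` and the choice to leave the first and last k points as `None` (the window does not fit there) are assumptions of this sketch.

```python
def running_average(x, k):
    """Centered (2k+1)-point running average.

    Points closer than k to either end have no full window and stay None.
    """
    n = len(x)
    s = [None] * n
    for i in range(k, n - k):
        window = x[i - k : i + k + 1]
        s[i] = sum(window) / len(window)
    return s


# The slide's example: x8..x12 = 4, 7, 4, 2, 9 with a 5-point window (k=2).
x = [4, 7, 4, 2, 9]
print(running_average(x, 2)[2])  # 5.2
```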
  • 14. Weighted running averages
      Running averages do not work well in the presence of outliers, which may distort the curve.
      The weighted running average technique lessens this problem by using weights that give more importance to points at the center of the moving window: $s_i = \sum_{j=-k}^{k} w_j x_{i+j}$.
      The weights $w_j$ can be defined manually; for instance, for a 5-point window they could be (1/9, 2/9, 1/3, 2/9, 1/9).
      Or they can be defined by a function, in which case the Gaussian is the first choice.
      In either case, the weights must be peaked at the center, drop toward the edges, and add up to 1.
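A minimal sketch of both ways of choosing the weights follows. The helper names `gaussian_weights` and `weighted_running_average`, and the width parameter `sigma`, are illustrative assumptions; the manual weights are the ones quoted on the slide.

```python
import math


def gaussian_weights(k, sigma):
    """Gaussian weights for a (2k+1)-point window, normalized to sum to 1."""
    raw = [math.exp(-0.5 * (j / sigma) ** 2) for j in range(-k, k + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_running_average(x, weights):
    """Centered weighted running average; len(weights) must be odd."""
    if len(weights) % 2 == 0:
        raise ValueError("window size must be odd")
    k = len(weights) // 2
    s = [None] * len(x)
    for i in range(k, len(x) - k):
        s[i] = sum(w * x[i - k + j] for j, w in enumerate(weights))
    return s


manual = [1 / 9, 2 / 9, 1 / 3, 2 / 9, 1 / 9]   # weights from the slide
x = [4, 7, 4, 2, 9]
print(weighted_running_average(x, manual)[2])   # 43/9, about 4.78
print(gaussian_weights(2, sigma=1.0))
```

With the same five points as before, the weighted result (about 4.78) sits closer to the central value 4 than the plain average 5.2, because the outer points 4 and 9 count less.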
  • 15. Running averages
      For example, consider synthetic data (filled line) and an 11-point moving average.
      • The plot shows that the simple technique represents the data reasonably well, but whenever an outlier (spike) enters the window, the curve is abruptly distorted until the outlier leaves the window
      • The weighted version of the technique gives better results: instead of abrupt distortions, it shows smoothed peaks that point out the original outliers
  • 16. Single exponential smoothing
      • Running averages are intrinsically local and may not capture the global behavior of the series
      • An improved method is exponential smoothing, which, in its single form, starts from a simple recursive definition:
        $s_i = \alpha x_i + (1 - \alpha) s_{i-1}$, with $0 \le \alpha \le 1$, and $s_0 = x_0$, or $s_0 = \frac{1}{n} \sum_{j=0}^{n-1} x_j$ for n initial values
      • That is, the i-th smoothed point is a mix of the actual point $x_i$ and the previous smoothed point $s_{i-1}$, where $\alpha$ can be chosen by trial and error
      • By mathematical induction, this recursion leads to the exponential expression
        $s_i = \alpha \sum_{j=0}^{i-1} (1 - \alpha)^j x_{i-j} + (1 - \alpha)^i s_0$
        which gives any smoothed value $s_i$ as a function of all the previous values x; the last term, which carries the initial value, vanishes as i grows
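The recursion and its unrolled form can be checked against each other with a short sketch. The function names are illustrative assumptions, and the initial value is taken as s_0 = x_0.

```python
def exponential_smoothing(x, alpha):
    """Single exponential smoothing: s_i = alpha*x_i + (1-alpha)*s_{i-1}, s_0 = x_0."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s = [x[0]]
    for xi in x[1:]:
        s.append(alpha * xi + (1 - alpha) * s[-1])
    return s


def unrolled(x, alpha, i):
    """The same s_i written as an exponentially weighted sum of past points."""
    past = alpha * sum((1 - alpha) ** j * x[i - j] for j in range(i))
    return past + (1 - alpha) ** i * x[0]


x = [2, 4, 6, 5, 7]
s = exponential_smoothing(x, alpha=0.5)
print(s)
print(unrolled(x, 0.5, 4))  # matches s[4]
```

A small alpha gives a smoother curve that reacts slowly; alpha = 1 reproduces the data unchanged.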
  • 17. Single exponential smoothing
      • Single exponential smoothing provides good smoothing curves and, in some cases, forecasting
      • It is limited, though, for series that present trend or seasonality, situations in which the technique cannot be used accurately for prediction
      • There are two more advanced exponential smoothing techniques:
        • Double exponential smoothing, for series with trend but without seasonality
        • Triple exponential smoothing, for series with trend and seasonality; this technique is called the Holt–Winters method
      • The Holt–Winters method is a powerful technique able to reproduce the full behavior of additive or multiplicative time series
  • 19. Double and triple exponential smoothing
      • Double exponential smoothing
      • Additive triple exponential smoothing
      • Multiplicative triple exponential smoothing
      [Equation annotations on the slide: trend factor, seasonality factor, forecasting; the equations themselves did not survive in this transcript]
      • Exponential smoothing depends on mixing parameters, which are required by software packages:
        • Single exponential smoothing: $\alpha$
        • Double exponential smoothing: $\alpha$, $\beta$
        • Triple exponential smoothing: $\alpha$, $\beta$, $\gamma$
      • More on time-series analysis: http://www.statsoft.com/textbook/time-series-analysis/
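For reference, the three variants are commonly written as below. These are the standard textbook forms, given here as an assumption; they were not recovered from the slide and its notation may differ. The symbols are: $s_i$ the smoothed level, $t_i$ the trend factor, $p_i$ the seasonality factor, $k$ the length of the seasonal period, $h$ the forecast horizon, and $\alpha$, $\beta$, $\gamma$ the mixing parameters.

```latex
% Double exponential smoothing (trend, no seasonality)
\begin{align*}
s_i &= \alpha x_i + (1-\alpha)(s_{i-1} + t_{i-1}) \\
t_i &= \beta (s_i - s_{i-1}) + (1-\beta)\, t_{i-1} \\
x_{i+h} &= s_i + h\, t_i
\end{align*}

% Additive triple exponential smoothing (Holt--Winters)
\begin{align*}
s_i &= \alpha (x_i - p_{i-k}) + (1-\alpha)(s_{i-1} + t_{i-1}) \\
t_i &= \beta (s_i - s_{i-1}) + (1-\beta)\, t_{i-1} \\
p_i &= \gamma (x_i - s_i) + (1-\gamma)\, p_{i-k} \\
x_{i+h} &= s_i + h\, t_i + p_{i-k+h}
\end{align*}

% Multiplicative triple exponential smoothing (Holt--Winters)
\begin{align*}
s_i &= \alpha \frac{x_i}{p_{i-k}} + (1-\alpha)(s_{i-1} + t_{i-1}) \\
t_i &= \beta (s_i - s_{i-1}) + (1-\beta)\, t_{i-1} \\
p_i &= \gamma \frac{x_i}{s_i} + (1-\gamma)\, p_{i-k} \\
x_{i+h} &= (s_i + h\, t_i)\, p_{i-k+h}
\end{align*}
```

In each case the last line is the forecast h steps ahead (for h up to one period k in the seasonal variants); the only difference between the additive and multiplicative forms is whether the seasonal factor is subtracted and added or divided and multiplied.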
  • 20. Triple exponential smoothing
      For example, consider the additive Holt–Winters plot for a data set with the monthly number of US international flight passengers.
      The years 1949 through 1957 were used to "train" the algorithm, and the years 1958 through 1960 were forecast.
      Note how well the forecast agrees with the actual data.
  • 21. Autocorrelation and correlogram
      As mentioned, time series are mainly characterized by trend and seasonality.
      Trend is analyzed by means of smoothing, function fitting (modeling), and plotting.
      Seasonality can be analyzed with the techniques of autocorrelation and the correlogram.
  • 22. Autocorrelation and correlogram
      The correlation between two time series is obtained as follows:
      • For each time step $x_i$, multiply the response values ($y_i$) of the two series, each taken as a deviation from the mean of its own series
      • Sum up all the products
      • Normalize
      The correlation of two identical series is 1, and it is -1 for two series that are exact inversions of each other.
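The three steps above can be sketched as follows. The function name `correlation` is an assumption, and the normalization used is the usual one (the product of the two series' root sums of squared deviations), which is what makes identical series score exactly 1.

```python
import math


def correlation(a, b):
    """Correlation of two equally long series, in the range [-1, 1]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Step 1: deviations from the mean, multiplied pairwise.
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    # Step 2: sum of the products.
    total = sum(p * q for p, q in zip(da, db))
    # Step 3: normalize.
    norm = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    return total / norm


y = [3.0, 5.0, 4.0, 8.0, 6.0]
print(correlation(y, y))                 # identical series
print(correlation(y, [-v for v in y]))   # exactly inverted series
```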
  • 23. Autocorrelation and correlogram
      • Seasonality:
        • Formally defined as the correlation between each i-th element and the (i+k)-th element, where k is usually called the lag
        • Measured by the autocorrelation function (ACF), i.e., the correlation between the two terms $x_i$ and $x_{i+k}$
        • If the measurement error is not too large, seasonality can be identified visually as a pattern that repeats every k moments in time
  • 24. Autocorrelation and correlogram
      If seasonality is present, the behavior of the series should repeat every k time units, where k is called the lag.
      The problem, hence, is how to identify the lag of the series analytically.
      • The answer is to compare the time series with itself, shifted by increasing values (lags) of k; for each lag, calculate the correlation
      Hence, the autocorrelation of a given series of N points at lag k is given by
      $c(k) = \frac{\sum_{i=1}^{N-k} (x_i - \mu)(x_{i+k} - \mu)}{\sum_{i=1}^{N} (x_i - \mu)^2}$, with $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
      The denominator normalizes according to lag 0, that is, to the correlation of the series with itself.
  • 26. Autocorrelation and correlogram
      Autocorrelation basic algorithm:
      1. Let k = 0
      2. Start with two copies of the series (original and copy)
      3. Subtract the mean from all values in both series
      4. Multiply the values at corresponding time steps with each other
      5. Sum up the results for all time steps
      6. Normalize with the variance of the original series; this is the correlation for lag k, that is, c(k)
      7. Shift the copy by 1 time step
      8. Let k ← k+1
      9. Continue from step 4 while k < kmax
      According to this algorithm:
      • Initially (lag 0), the two signals are perfectly aligned and the correlation is 1
      • Then, as we shift the signals, they slowly move out of phase and the correlation drops
      • How quickly it drops tells us how much "memory" there is in the data:
        • If quickly, we know that, after a few steps, the signal has lost all memory of its recent past
        • If slowly, we know that we are dealing with a process that is relatively steady over longer periods of time
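The algorithm can be sketched compactly in Python. The function name `autocorrelation` and the synthetic weekly series are assumptions for illustration; the shift of the copy is expressed through the index `i + k` rather than by physically moving a second list.

```python
def autocorrelation(x, k_max):
    """Return [c(0), c(1), ..., c(k_max)], normalized so that c(0) = 1."""
    n = len(x)
    if not 0 <= k_max < n:
        raise ValueError("k_max must be between 0 and len(x) - 1")
    mean = sum(x) / n
    d = [v - mean for v in x]            # step 3: subtract the mean
    c0 = sum(v * v for v in d)           # lag-0 sum, used for normalization
    if c0 == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    acf = []
    for k in range(k_max + 1):           # steps 7-9: increasing lags
        # Steps 4-5: multiply overlapping points and sum them up.
        total = sum(d[i] * d[i + k] for i in range(n - k))
        acf.append(total / c0)           # step 6: normalize
    return acf


# Synthetic series with a spike every 7 steps (a "weekly" pattern).
weekly = [1.0 if i % 7 == 0 else 0.0 for i in range(70)]
acf = autocorrelation(weekly, 14)
best_lag = max(range(1, 15), key=lambda k: acf[k])
print(best_lag)  # 7
```

Plotting `acf` against the lag gives the correlogram; the lag with the highest secondary peak is a candidate for the seasonal period.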
  • 28. Autocorrelation and correlogram
      The correlogram is the plot of correlation versus lag for a given time series.
      For example, consider the data set with the number of daily calls placed in a call center over a period slightly longer than two years, as presented earlier.
      [Figures: the time series; its (auto)correlogram, with the lag on the x axis, 0 <= lag <= 500]
      From the correlogram we can observe that:
      • The series has a long "memory" (long cycles): it takes the correlation almost 100 days to fall to zero, indicating that the frequency of calls changes more or less once per quarter but not more frequently
      • There is a pronounced secondary peak at a lag of 365 days: the call center data is highly seasonal and repeats itself on a yearly basis, which is when the series repeats its response behavior (high correlation)
      • There is a small but regular sawtooth structure; looking closely, the first peak of the sawtooth is at a lag of 7 days and all repeating ones occur at multiples of 7. This is the signature of the high-frequency component seen in the plot of the series; that is, the traffic to the call center has a secondary seasonal component with 7-day periodicity: the traffic depends on the day of the week
  • 30. CO2 measurements above Mauna Loa in Hawaii
      Consider again the data set with the concentration (ppm) of carbon dioxide (CO2) in the atmosphere, as measured by the observatory on Mauna Loa in Hawaii, recorded at monthly intervals since 1959.
  • 31. CO2 measurements above Mauna Loa in Hawaii
      • The series can be analyzed numerically more easily if the horizontal axis is expressed as an incremental monthly index and the graph goes through the origin (vertical translation of -315)
      • This can be achieved in Gnuplot with:
        plot "data" using 0:($2-315) with lines
  • 32. CO2 measurements above Mauna Loa in Hawaii
      • The series has a trend that seems to be a power law of the form b*(x/a)**k, with k greater than 1 because the curve is convex downward; a first guess is k=2, with b=35 and a=350 read from the upper rightmost part of the series
      • This can be plotted in Gnuplot with:
        plot "data" using 0:($2-315) with lines, 35*(x/350)**2
  • 33. CO2 measurements above Mauna Loa in Hawaii
      • By trial and error, a better guess for k is 1.35
      • This can be plotted in Gnuplot with:
        plot "data" using 0:($2-315) with lines, 35*(x/350)**1.35
  • 34. CO2 measurements above Mauna Loa in Hawaii
      • To verify the accuracy of the model function, we can plot the residual by subtracting the trend from the data
      • This can be achieved in Gnuplot with:
        plot "data" using 0:($2-315 - 35*($0/350)**1.35) with lines
  • 36. CO2 measurements above Mauna Loa in Hawaii
      • The model seems fine except for the seasonality, which consists of regular oscillations that can be captured by a sine, since the series starts at (0,0); the series is also monthly, with a cycle of one year, so a guess is that the data repeat every 12 points; the amplitude is around 3, as we can observe in the former plots
      • We can compare the residual and our seasonality model in Gnuplot, where f(x) here denotes the trend-only model 315 + 35*(x/350)**1.35:
        plot "data" using 0:($2-f($0)) with lines, 3*sin(2*pi*x/12) with lines
      • At this point the full model is given by the power-law function plus the sine function:
        f(x) = 315 + 35*(x/350)**1.35 + 3*sin(2*pi*x/12)
        plot "data" using 0:2 with lines, f(x)
        which is pretty close to the actual phenomenon
  • 37. CO2 measurements above Mauna Loa in Hawaii
      With the final model, it becomes possible to predict future values of the series.
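As a minimal sketch of such a prediction, the final model can be evaluated at any monthly index. The Python function names and the convention that index 0 corresponds to January 1959 are assumptions made for illustration; the coefficients are the ones fitted by trial and error above.

```python
import math


def trend(x):
    """Power-law trend from the slides: 315 + 35*(x/350)**1.35."""
    return 315 + 35 * (x / 350) ** 1.35


def model(x):
    """Trend plus the 12-month sine seasonality of amplitude 3."""
    return trend(x) + 3 * math.sin(2 * math.pi * x / 12)


def month_index(year, month, start_year=1959, start_month=1):
    """Monthly index of a date; assumes index 0 is January 1959 (hypothetical)."""
    return (year - start_year) * 12 + (month - start_month)


x = month_index(1995, 6)
print(x, round(model(x), 1))
```

The further the index lies beyond the fitted range (roughly 0 to 350), the less the hand-tuned exponent and amplitude should be trusted.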
  • 38. References
      • Philipp K. Janert, Data Analysis with Open Source Tools, O’Reilly, 2010.
      • Wikipedia, http://en.wikipedia.org
      • Wolfram MathWorld, http://mathworld.wolfram.com/