1
Data Mining:
Concepts and Techniques
2
Mining Time-Series Data
 A time series is a sequence of data points, measured typically at
successive times, spaced at (often uniform) time intervals
 Time series analysis: a subfield of statistics comprising methods
that attempt to explain such time series, often either to
understand the underlying context of the data points or to make
forecasts (or predictions)
 Methods for time series analyses
 Frequency-domain methods: Model-free analyses, well-suited to
exploratory investigations
 spectral analysis vs. wavelet analysis
 Time-domain methods: Auto-correlation and cross-correlation
analysis
 Motif-based time-series analysis
 Applications
 Financial: stock price, inflation
 Industry: power consumption
 Scientific: experiment results
 Meteorological: precipitation
3
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in Time
Series Data
Summary
4
Time-Series Data Analysis: Prediction &
Regression Analysis
 (Numerical) prediction is similar to classification
 construct a model
 use model to predict continuous or ordered value for a given input
 Prediction is different from classification
 Classification predicts categorical class labels
 Prediction models continuous-valued functions
 Major method for prediction: regression
 model the relationship between one or more independent or
predictor variables and a dependent or response variable
 Regression analysis
 Linear and multiple regression
 Non-linear regression
 Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees
5
What is Regression?
 Modeling the relationship between one response
variable and one or more predictor variables
 Analyzing the confidence of the model
 E.g., height vs. weight
6
Regression Yields Analytical Model
 Discrete data points →Analytical model
 General relationship
 Easy calculation
 Further analysis
 Application - Prediction
7
Application - Detrending
 Obtain the trend for irregular data series
 Subtract trend
 Reveal oscillations
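A minimal sketch of the detrending steps above, assuming a straight-line trend fitted by least squares. The function name `detrend_linear` and the synthetic series are made up for illustration; a real series may need a different trend model.

```python
import numpy as np

def detrend_linear(t, y):
    """Fit y ~ w0 + w1*t by least squares and subtract the fitted trend."""
    w1, w0 = np.polyfit(t, y, deg=1)  # polyfit returns the highest degree first
    trend = w0 + w1 * t
    return trend, y - trend

# Synthetic series (assumed data): a linear trend plus a 12-step oscillation.
t = np.arange(48, dtype=float)
y = 2.0 + 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 12)

trend, residual = detrend_linear(t, y)
# `residual` now holds the oscillation with the trend removed.
```

Subtracting the fitted line leaves a residual that sums to zero, so the oscillation can be inspected around a flat baseline.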
8
Linear Regression - Single Predictor
 Model is linear
y = w0 + w1 x
where w0 (y-intercept) and w1
(slope) are regression
coefficients
 Method of least squares:
y: response variable
x: predictor variable
$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}$$
$$w_0 = \bar{y} - w_1 \bar{x}$$
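The least-squares estimates for w0 and w1 can be computed directly from the sample means. The sketch below assumes plain Python lists as input; `fit_simple_linear` is an illustrative name and the four data points are invented so that the answer is easy to check by hand.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = w0 + w1*x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimate.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Made-up points lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w0, w1 = fit_simple_linear(xs, ys)
```

The function assumes at least two distinct x values; otherwise the denominator is zero.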
9
Linear Regression – Multiple Predictor
 Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
 E.g., for 2-D data:
y = w0 + w1 x1 + w2 x2
 Solvable by
 Extension of the least-squares method (normal equations):
(X^T X) W = X^T Y → W = (X^T X)^-1 X^T Y
 Commercial software (SAS, S-Plus)
[Figure: axes x1, x2, y]
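For several predictors, the least-squares weights solve the normal equations (X^T X) W = X^T Y, where X carries a leading column of ones for w0. This is a sketch with invented data; in practice `numpy.linalg.lstsq` is usually preferred over forming X^T X explicitly because it is numerically more stable.

```python
import numpy as np

# Made-up training data generated from y = 1 + 2*x1 - 3*x2.
X_raw = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X_raw[:, 0] - 3.0 * X_raw[:, 1]

# Design matrix: a column of ones (for w0) followed by the predictors.
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Solve the normal equations (X^T X) w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent least-squares solve that avoids forming X^T X.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```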
10
Nonlinear Regression with Linear Method
 Polynomial regression model
 E.g., y = w0 + w1 x + w2 x^2 + w3 x^3
Let x2 = x^2, x3 = x^3
y = w0 + w1 x + w2 x2 + w3 x3
 Log-linear regression model
 E.g., y = exp(w0 + w1 x + w2 x2 + w3 x3)
Let y’ = log(y)
y’ = w0 + w1 x + w2 x2 + w3 x3
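Both transformations reduce a nonlinear model to ordinary linear least squares. The sketch below uses noise-free, made-up data so the recovered weights can be checked. With noisy data, fitting log(y) minimizes error on the log scale, which is not the same as minimizing error in y, and it requires y > 0.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 9)

# Polynomial regression: treat x, x^2 and x^3 as three separate predictors.
y_poly = 1.0 + 0.5 * x - 2.0 * x**2 + 0.25 * x**3
X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])
w_poly, *_ = np.linalg.lstsq(X_poly, y_poly, rcond=None)

# Log-linear regression: y = exp(w0 + w1*x), so log(y) is linear in x.
y_exp = np.exp(0.3 + 1.2 * x)
X_lin = np.column_stack([np.ones_like(x), x])
w_log, *_ = np.linalg.lstsq(X_lin, np.log(y_exp), rcond=None)
```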
11
Generalized Linear Regression
 Response y
 Distribution function in the exponential family
 Variance of y is a function of E(y) rather than a constant
 E(y) = g^-1(w0 + w1 x + w2 x2 + w3 x3), where g is the link function
 Examples
 Logistic regression (binomial regression): probability of
some event occurring
 Poisson regression: number of customers
 …
 References: Nelder and Wedderburn, 1972; McCullagh and
Nelder, 1989
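As one concrete instance of a generalized linear model, the sketch below fits a Poisson regression with a log link, E(y) = exp(w0 + w1 x). The fitting method, iteratively reweighted least squares (IRLS), is a standard choice but is an assumption here, since the slides do not name a fitting algorithm. The function name and data are illustrative.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Poisson regression with log link, E(y) = exp(X @ w), fitted by IRLS."""
    # Start from an ordinary least-squares fit on the log scale.
    w, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(n_iter):
        eta = X @ w                      # linear predictor
        mu = np.exp(eta)                 # current mean estimate
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # X^T W with W = diag(mu)
        w_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# Made-up, noise-free expected counts so the result is checkable;
# real count data would be non-negative integers.
x = np.linspace(0.0, 5.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(0.5 + 0.3 * x)
w = fit_poisson_irls(X, y)
```

The weights `mu` in each step reflect the point made above: for a Poisson response the variance equals the mean, so it changes with E(y).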
12
Regression Tree (Breiman et al., 1984)
Figure source: http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf
 Partition the domain space
 Leaf: (1) a continuous-valued
prediction; (2) average value
13
Model Tree (Quinlan, 1992)
 Leaf – a linear equation
 More general than regression tree
Figure source: http://datamining.ihe.nl/research/model-trees.htm
14
Regression Trees and Model Trees
 Regression tree: proposed in CART system (Breiman et al. 1984)
 CART: Classification And Regression Trees
 Each leaf stores a continuous-valued prediction
 It is the average value of the predicted attribute for the training tuples
that reach the leaf
 Model tree: proposed by Quinlan (1992)
 Each leaf holds a regression model—a multivariate linear equation for the
predicted attribute
 A more general case than regression tree
 Regression and model trees tend to be more accurate than linear
regression when the data cannot be represented well by a simple
linear model
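To make the regression-tree leaf concrete, here is a sketch of the smallest possible tree: a single split on one predictor, with each leaf predicting the average of the training values that reach it. This is a simplified illustration, not the full CART procedure; the function names and data are invented. A model tree would store a fitted linear equation in each leaf instead of the mean.

```python
def fit_regression_stump(xs, ys):
    """One-split regression tree on a single predictor; leaves predict the mean."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return threshold, mean_l, mean_r

def predict_stump(stump, x):
    threshold, mean_l, mean_r = stump
    return mean_l if x <= threshold else mean_r

# Made-up data with two clearly separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_regression_stump(xs, ys)
```

The sketch assumes at least two distinct x values; a full tree would apply the same split search recursively to each side.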
15
Predictive Modeling in
Multidimensional Databases
 Predictive modeling: Predict data values or construct
generalized linear models based on the database data
 One can only predict value ranges or category
distributions
 Method outline
 Minimal generalization
 Attribute relevance analysis
 Generalized linear model construction
 Prediction
 Determine the major factors which influence the prediction
 Data relevance analysis: uncertainty measurement,
entropy analysis, expert judgment, etc.
 Multi-level prediction: drill-down and roll-up analysis
17
Prediction: Numerical Data
18
References
 Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models.
Journal of the Royal Statistical Society A, 135, 370-384.
 Chatfield, C. (1984). The Analysis of Time Series: An Introduction, 3rd ed.
Chapman & Hall.
 McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd ed.
Chapman and Hall, London.
 Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and
Regression Trees. Chapman & Hall (Wadsworth, Inc.), New York.
 Quinlan, J.R. (1992). Learning with continuous classes. In: Adams and Sterling
(Eds.), Proceedings of artificial intelligence'92, World Scientific, Singapore, pp.
343-348.
 Acknowledgment
 This presentation integrates Xiaopeng Li’s slides in his CS 512 class
presentation
19
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in Time
Series Data
Summary
20
 A time series can be illustrated as a time-series graph
which describes a point moving with the passage of time
21
Categories of Time-Series Movements
 Categories of Time-Series Movements
 Long-term or trend movements (trend curve): general direction in
which a time series is moving over a long interval of time
 Cyclic movements or cycle variations: long term oscillations about
a trend line or curve
 e.g., business cycles, may or may not be periodic
 Seasonal movements or seasonal variations
 i.e., almost identical patterns that a time series appears to
follow during corresponding months of successive years
 Irregular or random movements
 Time series analysis: decomposition of a time series into these four
basic movements
 Additive model: TS = T + C + S + I
 Multiplicative model: TS = T × C × S × I
22
Estimation of Trend Curve
 The freehand method
 Fit the curve by looking at the graph
 Costly and barely reliable for large-scale data mining
 The least-square method
 Find the curve minimizing the sum of the squares of
the deviation of points on the curve from the
corresponding data points
 The moving-average method
23
Moving Average
 Moving average of order n
 Smooths the data
 Tends to eliminate cyclic, seasonal, and irregular movements
 Loses the data at the beginning or end of a series
 Sensitive to outliers (the effect can be reduced with a weighted
moving average)
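A short sketch of a moving average of order n, with an optional weighted variant. The function name and the nine-point series are illustrative. Only full windows are kept, which is why the output is n - 1 points shorter than the input.

```python
import numpy as np

def moving_average(y, n, weights=None):
    """Moving average of order n; optional weights give a weighted average."""
    y = np.asarray(y, dtype=float)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # np.convolve flips its kernel, so pass w reversed to apply weights[0]
    # to the oldest point in each window. 'valid' keeps only full windows.
    return np.convolve(y, w[::-1], mode="valid")

y = [3, 7, 2, 0, 4, 5, 9, 7, 2]
ma3 = moving_average(y, 3)                      # 4, 3, 2, 3, 6, 7, 6
wma3 = moving_average(y, 3, weights=[1, 4, 1])  # 5.5, 2.5, 1, 3.5, 5.5, 8, 6.5
```

Giving the center point a larger weight, as in (1, 4, 1), is one way to keep a single extreme neighbor from dominating each window.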
24
Trend Discovery in Time-Series (1):
Estimation of Seasonal Variations
 Seasonal index
 Set of numbers showing the relative values of a variable during
the months of the year
 E.g., if the sales during October, November, and December are
80%, 120%, and 140% of the average monthly sales for the
whole year, respectively, then 80, 120, and 140 are seasonal
index numbers for these months
 Deseasonalized data
 Data adjusted for seasonal variations for better trend and cyclic
analysis
 Divide the original monthly data by the seasonal index numbers
for the corresponding months
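One simple way to compute seasonal index numbers is to express each month's average as a percentage of the overall monthly average, then divide the data by the index to deseasonalize it. The sketch below assumes a (years × 12) array and invented sales figures with no trend. It is only one of several estimation methods and does not correct for trend.

```python
import numpy as np

def seasonal_index(monthly):
    """monthly: array of shape (years, 12). Returns 12 index numbers (100 = average month)."""
    monthly = np.asarray(monthly, dtype=float)
    month_means = monthly.mean(axis=0)
    return 100.0 * month_means / month_means.mean()

def deseasonalize(monthly, index):
    """Divide each month's value by its seasonal index (as a fraction)."""
    return np.asarray(monthly, dtype=float) / (index / 100.0)

# Made-up sales: a flat level of 50 scaled by a fixed monthly pattern, two years.
pattern = np.array([90, 85, 95, 100, 105, 110, 100, 95, 100, 80, 120, 120]) / 100.0
sales = 50.0 * np.tile(pattern, (2, 1))

index = seasonal_index(sales)
adjusted = deseasonalize(sales, index)
```

With this synthetic input the index reproduces the monthly pattern and the adjusted series is flat at 50.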
November 17, 2023 Data Mining: Concepts and Techniques 25
Seasonal Index
[Figure: seasonal index by month; x-axis: Month (1 to 12), y-axis: Seasonal Index (0 to 160)]
Raw data from http://www.bbk.ac.uk/manop/man/docs/QII_2_2003%20Time%20series.pdf
26
Trend Discovery in Time-Series (2)
 Estimation of cyclic variations
 If (approximate) periodicity of cycles occurs, cyclic
index can be constructed in much the same manner as
seasonal indexes
 Estimation of irregular variations
 By adjusting the data for trend, seasonal and cyclic
variations
 With the systematic analysis of the trend, cyclic, seasonal,
and irregular components, it is possible to make long- or
short-term predictions with reasonable quality
27
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in Time
Series Data
Summary
28
Similarity Search in Time-Series Analysis
 A normal database query finds exact matches
 Similarity search finds data sequences that differ only
slightly from the given query sequence
 Two categories of similarity queries
 Whole matching: find sequences that are similar, as a whole, to the
query sequence
 Subsequence matching: find sequences that contain subsequences
similar to the query sequence
 Typical Applications
 Financial market
 Market basket data analysis
 Scientific databases
 Medical diagnosis
29
Data Transformation
 Many techniques for signal analysis require the data to
be in the frequency domain
 Usually data-independent transformations are used
 The transformation matrix is determined a priori
 discrete Fourier transform (DFT)
 discrete wavelet transform (DWT)
 The Euclidean distance between two signals in the time domain
equals their Euclidean distance in the frequency domain, provided
the transform is orthonormal
30
Discrete Fourier Transform
 DFT does a good job of concentrating energy in the first
few coefficients
 If we keep only the first few DFT coefficients, the distance
computed from them is a lower bound on the actual distance
 Feature extraction: keep the first few coefficients (F-index)
as representative of the sequence
31
DFT (continued)
 Parseval’s Theorem
 The Euclidean distance between two signals in the time
domain is the same as their distance in the frequency
domain (with the DFT scaled by 1/√n)
$$\sum_{t=0}^{n-1} |x_t|^2 = \sum_{f=0}^{n-1} |X_f|^2$$
 Keeping only the first k coefficients (say, k = 3) underestimates the
distance, so there will be no false dismissals:
$$\sum_{t=0}^{n-1} |S[t] - Q[t]|^2 \le \varepsilon \;\Rightarrow\; \sum_{f=0}^{k-1} |F(S)[f] - F(Q)[f]|^2 \le \varepsilon$$
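Both DFT properties can be checked numerically. The sketch below uses NumPy's FFT with `norm="ortho"`, which applies the 1/√n scaling that makes the transform distance-preserving. The two random-walk sequences and the choice of k = 3 coefficients are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.cumsum(rng.standard_normal(64))  # made-up random-walk sequences
Q = np.cumsum(rng.standard_normal(64))

# Orthonormal DFT: scaled by 1/sqrt(n) so that distances are preserved.
FS = np.fft.fft(S, norm="ortho")
FQ = np.fft.fft(Q, norm="ortho")

d_time = np.linalg.norm(S - Q)          # distance in the time domain
d_freq = np.linalg.norm(FS - FQ)        # distance using all n coefficients

k = 3
d_feat = np.linalg.norm(FS[:k] - FQ[:k])  # distance using the first k only
```

Because `d_feat` drops non-negative terms from the sum, it can never exceed `d_time`, which is why filtering on it cannot dismiss a true match.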
32
Multidimensional Indexing in Time-Series
 Multidimensional index construction
 Constructed for efficient accessing using the first few
Fourier coefficients
 Similarity search
 Use the index to retrieve the sequences that are at
most a certain small distance away from the query
sequence
 Perform post-processing by computing the actual
distance between sequences in the time domain and
discard any false matches
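The two-step search can be sketched as filter-and-refine. A real system would store the features in a multidimensional index such as an R-tree; here a plain linear scan stands in for the index, and all names and data are hypothetical.

```python
import numpy as np

def dft_features(seq, k=3):
    """First k coefficients of the orthonormal DFT."""
    return np.fft.fft(np.asarray(seq, dtype=float), norm="ortho")[:k]

def range_query(database, query, eps, k=3):
    """Return (candidates, matches) for sequences within distance eps of query."""
    q = np.asarray(query, dtype=float)
    q_feat = dft_features(q, k)
    # Filter: feature distance is a lower bound, so no true match is lost.
    # (A linear scan stands in for the multidimensional index.)
    candidates = [i for i, s in enumerate(database)
                  if np.linalg.norm(dft_features(s, k) - q_feat) <= eps]
    # Refine: compute the actual time-domain distance and drop false matches.
    matches = [i for i in candidates
               if np.linalg.norm(np.asarray(database[i], dtype=float) - q) <= eps]
    return candidates, matches

t = np.arange(32)
base = np.sin(2 * np.pi * t / 32)
database = [
    base,                                        # identical to the query
    base + 0.05,                                 # small offset: true match
    -base,                                       # very different
    base + 0.2 * np.cos(2 * np.pi * 8 * t / 32), # differs only at a high frequency
]
candidates, matches = range_query(database, base, eps=0.5)
```

The last sequence differs from the query only in a coefficient that the features do not keep, so it passes the filter and is removed in post-processing.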
33
Subsequence Matching
 Break each sequence into a set of
windows (pieces) of length w
 Extract the features of the
subsequence inside the window
 Map each sequence to a “trail” in
the feature space
 Divide the trail of each sequence
into “subtrails” and represent each
of them with minimum bounding
rectangle
 Use a multi-piece assembly
algorithm to search for longer
sequence matches
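The first four steps can be sketched as follows. The assumptions are that windows slide one point at a time, that the features are the real and imaginary parts of the first k DFT coefficients, and that subtrails are fixed-size chunks of the trail (a simplification). The multi-piece assembly step is not shown, and all names are illustrative.

```python
import numpy as np

def window_trail(seq, w, k=2):
    """Slide a length-w window over seq and map each window to 2k real features."""
    seq = np.asarray(seq, dtype=float)
    points = []
    for start in range(len(seq) - w + 1):
        coeffs = np.fft.fft(seq[start:start + w], norm="ortho")[:k]
        points.append(np.concatenate([coeffs.real, coeffs.imag]))
    return np.array(points)  # one feature-space point per window: the "trail"

def subtrail_mbrs(trail, size):
    """Cut the trail into chunks of `size` points; bound each by a (low, high) rectangle."""
    return [(trail[i:i + size].min(axis=0), trail[i:i + size].max(axis=0))
            for i in range(0, len(trail), size)]

# Made-up sequence of 100 points.
seq = np.sin(np.linspace(0.0, 6.0 * np.pi, 100))
trail = window_trail(seq, w=16)
mbrs = subtrail_mbrs(trail, size=20)
```

Storing a handful of rectangles per sequence instead of every trail point is what keeps the index small.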
34
Analysis of Similar Time Series
35
Enhanced Similarity Search Methods
 Allow for gaps within a sequence or differences in offsets
or amplitudes
 Normalize sequences with amplitude scaling and offset
translation
 Two subsequences are considered similar if one lies within
an envelope of ε width around the other, ignoring outliers
 Two sequences are said to be similar if they have enough
non-overlapping time-ordered pairs of similar
subsequences
 Parameters specified by a user or expert: sliding window
size, width of an envelope for similarity, maximum gap,
and matching fraction
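A sketch of normalization followed by the envelope test, under two assumptions: the envelope is read as ±eps around the first subsequence, and "ignoring outliers" is modeled as a maximum count of points allowed outside it. The function and parameter names are hypothetical.

```python
import numpy as np

def normalize(seq):
    """Offset translation (subtract the mean) and amplitude scaling (divide by the std)."""
    seq = np.asarray(seq, dtype=float)
    std = seq.std()
    centered = seq - seq.mean()
    return centered / std if std > 0 else centered

def within_envelope(a, b, eps, max_outliers=0):
    """True if normalized b stays within +/- eps of normalized a,
    except at no more than max_outliers points."""
    a, b = normalize(a), normalize(b)
    return int(np.sum(np.abs(a - b) > eps)) <= max_outliers

a = [1, 2, 3, 4, 5, 4, 3, 2]
b = [10 + 3 * v for v in a]   # same shape, different offset and amplitude
c = a[::-1]                   # time-reversed: a different shape

same_shape = within_envelope(a, b, eps=0.1)
different_shape = within_envelope(a, c, eps=0.1)
```

After normalization `a` and `b` coincide, so they match even though their raw values are far apart.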
36
Steps for Performing a Similarity Search
 Atomic matching
 Find all pairs of gap-free windows of a small length that
are similar
 Window stitching
 Stitch similar windows to form pairs of large similar
subsequences allowing gaps between atomic matches
 Subsequence Ordering
 Linearly order the subsequence matches to determine
whether enough similar pieces exist
37
Similar Time Series Analysis
[Figure: VanEck International Fund and Fidelity Selective Precious Metal and Mineral Fund]
Two similar mutual funds in different fund groups
38
Query Languages for Time Sequences
 Time-sequence query language
 Should be able to specify sophisticated queries like
Find all of the sequences that are similar to some sequence in class
A, but not similar to any sequence in class B
 Should be able to support various kinds of queries: range queries,
all-pair queries, and nearest neighbor queries
 Shape definition language
 Allows users to define and query the overall shape of time
sequences
 Uses human readable series of sequence transitions or macros
 Ignores the specific details
 E.g., the pattern up, Up, UP can be used to describe
increasing degrees of rising slopes
 Macros: spike, valley, etc.
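The idea of describing a sequence by its transitions can be sketched by mapping each step to a symbol based on its slope. Only up, Up, and UP come from the slide; the thresholds and the symbols "down", "Down", "DOWN", and "stable" are hypothetical additions for illustration, not part of any particular shape definition language.

```python
def slope_symbols(seq, small=0.5, large=2.0):
    """Map consecutive differences to transition symbols (thresholds are assumed)."""
    symbols = []
    for prev, cur in zip(seq, seq[1:]):
        diff = cur - prev
        if diff == 0:
            symbols.append("stable")
            continue
        rising = ("up", "Up", "UP")
        falling = ("down", "Down", "DOWN")
        names = rising if diff > 0 else falling
        size = abs(diff)
        # Larger steps get the more emphatic symbol.
        level = 0 if size < small else 1 if size < large else 2
        symbols.append(names[level])
    return symbols

# Made-up sequence with increasingly steep rises, a plateau, then a sharp drop.
shape = slope_symbols([0.0, 0.2, 1.2, 4.2, 4.2, 1.0])
```

A macro such as spike could then be defined as a pattern over these symbols, for example a steep rise followed by a steep fall.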
39
Mining Time-Series Data
Regression Analysis
Trend Analysis
Similarity Search in Time Series Data
Motif-Based Search and Mining in Time
Series Data
Summary
40
Sequence Distance
 A function that measures the dissimilarity of two
sequences (of possibly unequal length)
 Example: Euclidean distance between time series Q and C
(which requires both to have the same length n)
$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$
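The Euclidean distance between two time series Q and C translates directly into code. The sketch below raises an error for unequal lengths, since this particular distance is only defined point by point; the function name is illustrative.

```python
import math

def euclidean_distance(q, c):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(c):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))

d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```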
Data Mining Time-Series Concepts

  • 2. 2 Mining Time-Series Data  A time series is a sequence of data points, measured typically at successive times, spaced at (often uniform) time intervals  Time series analysis: A subfield of statistics, comprises methods that attempt to understand such time series, often either to understand the underlying context of the data points or to make forecasts (or predictions)  Methods for time series analyses  Frequency-domain methods: Model-free analyses, well-suited to exploratory investigations  spectral analysis vs. wavelet analysis  Time-domain methods: Auto-correlation and cross-correlation analysis  Motif-based time-series analysis  Applications  Financial: stock price, inflation  Industry: power consumption  Scientific: experiment results  Meteorological: precipitation
  • 3. 3 Mining Time-Series Data Regression Analysis Trend Analysis Similarity Search in Time Series Data Motif-Based Search and Mining in Time Series Data Summary
  • 4. 4 Time-Series Data Analysis: Prediction & Regression Analysis  (Numerical) prediction is similar to classification  construct a model  use model to predict continuous or ordered value for a given input  Prediction is different from classification  Classification refers to predict categorical class label  Prediction models continuous-valued functions  Major method for prediction: regression  model the relationship between one or more independent or predictor variables and a dependent or response variable  Regression analysis  Linear and multiple regression  Non-linear regression  Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees
  • 5. 5 What is Regression?  Modeling the relationship between one response variable and one or more predictor variables  Analyzing the confidence of the model  E.g, height v.s weight
  • 6. 6 Regression Yields Analytical Model  Discrete data points →Analytical model  General relationship  Easy calculation  Further analysis  Application - Prediction
  • 7. 7 Application - Detrending  Obtain the trend for irregular data series  Subtract trend  Reveal oscillations trend
  • 8. 8 Linear Regression - Single Predictor  Model is linear y = w0 + w1 x where w0 (y-intercept) and w1 (slope) are regression coefficients  Method of least squares: y: response variable x: predictor variable w1 w0 | | 1 | | 2 1 ( )( ) 1 ( ) D i i i D i i x x y y x x w         x w y w 1 0  
  • 9. 9 Linear Regression – Multiple Predictors  Training data is of the form $(X_1, y_1), (X_2, y_2), \ldots, (X_{|D|}, y_{|D|})$  E.g., for 2-D data, $y = w_0 + w_1 x_1 + w_2 x_2$  Solvable by  Extension of the least-squares method: $(X^T X) W = X^T Y \Rightarrow W = (X^T X)^{-1} X^T Y$  Commercial software (SAS, S-Plus)
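For several predictors, the least-squares coefficients satisfy the normal equations $(X^T X) W = X^T Y$. The sketch below builds and solves them in plain Python with a small Gauss-Jordan routine. The helper names `solve` and `fit_multiple` and the data are assumptions for illustration; a numerical library would normally be used instead, and the sketch does not handle a singular $X^T X$.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(rows, ys):
    """Return [w0, w1, ..., wk] minimizing the sum of squared errors."""
    X = [[1.0] + list(r) for r in rows]      # leading 1 models the intercept w0
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative 2-D data generated from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
w = fit_multiple(rows, ys)
```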
  • 10. 10 Nonlinear Regression with Linear Method  Polynomial regression model  E.g., $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$. Let $x_2 = x^2$, $x_3 = x^3$; then $y = w_0 + w_1 x + w_2 x_2 + w_3 x_3$  Log-linear regression model  E.g., $y = \exp(w_0 + w_1 x + w_2 x_2 + w_3 x_3)$. Let $y' = \log(y)$; then $y' = w_0 + w_1 x + w_2 x_2 + w_3 x_3$
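The log transformation can be illustrated with a single predictor, a simplification of the slide's three-term model: fit $y = \exp(w_0 + w_1 x)$ by running ordinary least squares on $\log(y)$. The function name `fit_exponential` and the data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = exp(w0 + w1 * x) via linear least squares on log(y)."""
    log_ys = [math.log(y) for y in ys]       # requires every y > 0
    n = len(xs)
    x_bar = sum(xs) / n
    ly_bar = sum(log_ys) / n
    sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = ly_bar - w1 * x_bar
    return w0, w1


# Illustrative data generated from y = exp(0.5 + 0.3 * x)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]
w0, w1 = fit_exponential(xs, ys)
```

Note that this minimizes squared error in log space, which is not the same as minimizing squared error on the original scale of y.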
  • 11. 11 Generalized Linear Regression  Response $y$  Distribution function in the exponential family  Variance of $y$ depends on $E(y)$, not a constant  $E(y) = g^{-1}(w_0 + w_1 x + w_2 x_2 + w_3 x_3)$  Examples  Logistic regression (binomial regression): probability of some event occurring  Poisson regression: number of customers  …  References: Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989
  • 12. 12 Regression Tree (Breiman et al., 1984) Figure source: http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf  Partition the domain space  Leaf: (1) a continuous-valued prediction; (2) average value
  • 13. 13 Model Tree (Quinlan, 1992)  Leaf – a linear equation  More general than regression tree Figure source: http://datamining.ihe.nl/research/model-trees.htm
  • 14. 14 Regression Trees and Model Trees  Regression tree: proposed in CART system (Breiman et al. 1984)  CART: Classification And Regression Trees  Each leaf stores a continuous-valued prediction  It is the average value of the predicted attribute for the training tuples that reach the leaf  Model tree: proposed by Quinlan (1992)  Each leaf holds a regression model—a multivariate linear equation for the predicted attribute  A more general case than regression tree  Regression and model trees tend to be more accurate than linear regression when the data cannot be represented well by a simple linear model
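To make the "leaf stores the average" idea concrete, here is a minimal sketch of a one-split regression tree (a stump) on a single predictor. It chooses the threshold that minimizes the total squared error and stores the mean of the training targets in each leaf. This is an illustrative simplification, not the full CART procedure; `best_split` and `predict` are hypothetical names, and the sketch assumes at least two distinct x values.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) with the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, mean(left), mean(right))
    return best[1:]


def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Illustrative data with two clearly separated regimes
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = best_split(xs, ys)
```

A model tree would replace each leaf's stored mean with a linear equation fitted to the tuples that reach that leaf.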
  • 15. 15 Predictive Modeling in Multidimensional Databases  Predictive modeling: Predict data values or construct generalized linear models based on the database data  One can only predict value ranges or category distributions  Method outline  Minimal generalization  Attribute relevance analysis  Generalized linear model construction  Prediction  Determine the major factors which influence the prediction  Data relevance analysis: uncertainty measurement, entropy analysis, expert judgment, etc.  Multi-level prediction: drill-down and roll-up analysis
  • 18. 18 References  Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135, 370-384.  Chatfield, C. (1984). The Analysis of Time Series: An Introduction, 3rd ed. Chapman & Hall.  McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.  Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees. Chapman & Hall (Wadsworth, Inc.), New York.  Quinlan, J.R. (1992). Learning with continuous classes. In: Adams and Sterling (Eds.), Proceedings of Artificial Intelligence'92, World Scientific, Singapore, pp. 343-348.  Acknowledgment  This presentation integrates Xiaopeng Li's slides from his CS 512 class presentation
  • 20. 20  A time series can be illustrated as a time-series graph which describes a point moving with the passage of time
  • 21. 21 Categories of Time-Series Movements  Long-term or trend movements (trend curve): general direction in which a time series is moving over a long interval of time  Cyclic movements or cyclic variations: long-term oscillations about a trend line or curve  e.g., business cycles; may or may not be periodic  Seasonal movements or seasonal variations  i.e., almost identical patterns that a time series appears to follow during corresponding months of successive years  Irregular or random movements  Time series analysis: decomposition of a time series into these four basic movements  Additive model: $TS = T + C + S + I$  Multiplicative model: $TS = T \times C \times S \times I$
  • 22. 22 Estimation of Trend Curve  The freehand method  Fit the curve by looking at the graph  Costly and barely reliable for large-scale data mining  The least-squares method  Find the curve minimizing the sum of the squared deviations of the points on the curve from the corresponding data points  The moving-average method
  • 23. 23 Moving Average  Moving average of order n  Smooths the data  Reduces cyclic, seasonal and irregular movements  Loses the data at the beginning or end of a series  Sensitive to outliers (the effect can be reduced by a weighted moving average)
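A moving average of order n replaces each run of n consecutive values with their mean, so the result is n - 1 values shorter than the input. The sketch below shows a simple and a weighted variant; the function names, the sample series and the weights (1, 4, 1) are illustrative choices.

```python
def moving_average(series, n):
    """Simple moving average of order n; returns len(series) - n + 1 values."""
    return [sum(series[i:i + n]) / n for i in range(len(series) - n + 1)]


def weighted_moving_average(series, weights):
    """Weighted moving average; the order is len(weights)."""
    n = len(weights)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, series[i:i + n])) / total
        for i in range(len(series) - n + 1)
    ]


data = [3, 7, 2, 0, 4, 5, 9, 7, 2]
ma3 = moving_average(data, 3)                     # [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]
wma3 = weighted_moving_average(data, [1, 4, 1])   # [5.5, 2.5, 1.0, 3.5, 5.5, 8.0, 6.5]
```

Giving the center value more weight keeps the smoothed curve closer to the original series while still damping the influence of neighboring extreme values.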
  • 24. 24 Trend Discovery in Time-Series (1): Estimation of Seasonal Variations  Seasonal index  Set of numbers showing the relative values of a variable during the months of the year  E.g., if the sales during October, November, and December are 80%, 120%, and 140% of the average monthly sales for the whole year, respectively, then 80, 120, and 140 are seasonal index numbers for these months  Deseasonalized data  Data adjusted for seasonal variations for better trend and cyclic analysis  Divide the original monthly data by the seasonal index numbers for the corresponding months
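The seasonal index and the deseasonalizing step can be sketched as follows, assuming one year of monthly figures. The function names and the sales numbers are made up for illustration; they are chosen so that October, November and December come out at 80, 120 and 140.

```python
def seasonal_index(monthly):
    """Index for each month: 100 * value / average monthly value."""
    average = sum(monthly) / len(monthly)
    return [100.0 * value / average for value in monthly]


def deseasonalize(values, index):
    """Divide each value by its month's seasonal index (expressed as a percentage)."""
    return [value * 100.0 / idx for value, idx in zip(values, index)]


# Hypothetical monthly sales for one year (average is 100)
sales = [100, 100, 100, 100, 100, 100, 100, 100, 60, 80, 120, 140]
index = seasonal_index(sales)
adjusted = deseasonalize(sales, index)
```

In practice the index would be estimated from several years of data rather than a single year, so that deseasonalizing a new year's figures does not simply flatten them to their own average.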
  • 25. November 17, 2023 Data Mining: Concepts and Techniques 25 Seasonal Index [Figure: chart of the seasonal index by month; x-axis: Month (1-12), y-axis: Seasonal Index (0-160)] Raw data from http://www.bbk.ac.uk/manop/man/docs/QII_2_2003%20Time%20series.pdf
  • 26. 26 Trend Discovery in Time-Series (2)  Estimation of cyclic variations  If (approximate) periodicity of cycles occurs, cyclic index can be constructed in much the same manner as seasonal indexes  Estimation of irregular variations  By adjusting the data for trend, seasonal and cyclic variations  With the systematic analysis of the trend, cyclic, seasonal, and irregular components, it is possible to make long- or short-term predictions with reasonable quality
  • 28. 28 Similarity Search in Time-Series Analysis  A normal database query finds exact matches  Similarity search finds data sequences that differ only slightly from the given query sequence  Two categories of similarity queries  Whole matching: find sequences that are similar, as a whole, to the query sequence  Subsequence matching: find sequences that contain subsequences similar to the query sequence  Typical Applications  Financial market  Market basket data analysis  Scientific databases  Medical diagnosis
  • 29. 29 Data Transformation  Many techniques for signal analysis require the data to be in the frequency domain  Usually data-independent transformations are used  The transformation matrix is determined a priori  discrete Fourier transform (DFT)  discrete wavelet transform (DWT)  The Euclidean distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain
  • 30. 30 Discrete Fourier Transform  DFT does a good job of concentrating energy in the first few coefficients  If we keep only the first few DFT coefficients, we can compute a lower bound on the actual distance  Feature extraction: keep the first few coefficients (F-index) as representative of the sequence
  • 31. 31 DFT (continued)  Parseval's Theorem: $\sum_{t=0}^{n-1} |x_t|^2 = \sum_{f=0}^{n-1} |X_f|^2$  The Euclidean distance between two signals in the time domain is the same as their distance in the frequency domain  Keeping only the first few (say, 3) coefficients underestimates the distance, so there will be no false dismissals: $\sum_{f=0}^{3} |F(S)[f] - F(Q)[f]|^2 \le \sum_{t=0}^{n-1} |S[t] - Q[t]|^2 \le \varepsilon^2$
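The distance-preserving property and the lower bound can be checked numerically. The sketch below uses a direct O(n²) DFT scaled by 1/√n; this orthonormal scaling is an assumption of the sketch, and it is what makes Parseval's identity hold without an extra factor of n. The sequences `s` and `q` are made-up examples.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT: coefficients scaled by 1/sqrt(n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(n)
    ]


def dist(a, b):
    """Euclidean distance; works for real or complex sequences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


s = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 4.0, 2.0]
q = [1.5, 2.5, 3.0, 3.5, 4.0, 6.5, 3.0, 2.5]
S, Q = dft(s), dft(q)

d_time = dist(s, q)             # distance in the time domain
d_freq = dist(S, Q)             # same value in the frequency domain
d_lower = dist(S[:3], Q[:3])    # first 3 coefficients only: a lower bound
```

Because dropping coefficients only removes non-negative terms from the sum, `d_lower` can never exceed `d_time`, which is why filtering on the truncated features produces false alarms but no false dismissals.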
  • 32. 32 Multidimensional Indexing in Time-Series  Multidimensional index construction  Constructed for efficient accessing using the first few Fourier coefficients  Similarity search  Use the index to retrieve the sequences that are at most a certain small distance away from the query sequence  Perform post-processing by computing the actual distance between sequences in the time domain and discard any false matches
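The retrieve-then-verify procedure can be sketched as a two-step filter. Here a linear scan over the feature vectors stands in for the multidimensional index (an R-tree or similar structure is not implemented), and the names `features`, `similarity_search` and the tolerance `eps` are illustrative assumptions.

```python
import cmath
import math


def features(x, k=3):
    """First k coefficients of the orthonormal DFT of x."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(k)
    ]


def dist(a, b):
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def similarity_search(database, query, eps, k=3):
    fq = features(query, k)
    # Filter step: feature distance is a lower bound, so nothing within eps is lost.
    candidates = [s for s in database if dist(features(s, k), fq) <= eps]
    # Refine step: compute the true time-domain distance and drop false matches.
    return [s for s in candidates if dist(s, query) <= eps]


database = [
    [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0],
    [1.1, 2.1, 2.9, 4.2, 3.1, 1.9, 1.0, 0.1],
    [5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0],
]
query = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
matches = similarity_search(database, query, eps=0.5)
```

In a real system the features would be computed once and stored in the index, so the filter step touches only a small part of the database.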
  • 33. 33 Subsequence Matching  Break each sequence into pieces using a sliding window of length w  Extract the features of the subsequence inside the window  Map each sequence to a “trail” in the feature space  Divide the trail of each sequence into “subtrails” and represent each of them with a minimum bounding rectangle  Use a multi-piece assembly algorithm to search for longer sequence matches
  • 34. 34 Analysis of Similar Time Series
  • 35. 35 Enhanced Similarity Search Methods  Allow for gaps within a sequence or differences in offsets or amplitudes  Normalize sequences with amplitude scaling and offset translation  Two subsequences are considered similar if one lies within an envelope of width ε around the other, ignoring outliers  Two sequences are said to be similar if they have enough non-overlapping time-ordered pairs of similar subsequences  Parameters specified by a user or expert: sliding window size, width of an envelope for similarity, maximum gap, and matching fraction
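Normalization and the envelope test can be sketched as below. Two choices here are assumptions of the sketch rather than anything stated on the slide: offset translation and amplitude scaling are implemented as subtracting the mean and dividing by the standard deviation, and the envelope is taken to mean a pointwise difference of at most `eps`, with up to `max_outliers` points allowed outside it.

```python
import math


def normalize(seq):
    """Offset translation (subtract mean) and amplitude scaling (divide by std)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    if std == 0:
        return [0.0] * n                     # constant sequence
    return [(v - mean) / std for v in seq]


def within_envelope(a, b, eps, max_outliers=0):
    """True if b stays within eps of a at all but max_outliers positions."""
    outside = sum(1 for p, q in zip(a, b) if abs(p - q) > eps)
    return outside <= max_outliers


a = [10.0, 12.0, 14.0, 12.0, 10.0]
b = [105.0, 115.0, 125.0, 115.0, 105.0]      # same shape: b = 5 * a + 55

raw_similar = within_envelope(a, b, eps=1.0)
normalized_similar = within_envelope(normalize(a), normalize(b), eps=1e-9)
```

The two series differ greatly in offset and amplitude, so the raw comparison fails, while the normalized versions coincide.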
  • 36. 36 Steps for Performing a Similarity Search  Atomic matching  Find all pairs of gap-free windows of a small length that are similar  Window stitching  Stitch similar windows to form pairs of large similar subsequences allowing gaps between atomic matches  Subsequence Ordering  Linearly order the subsequence matches to determine whether enough similar pieces exist
  • 37. 37 Similar Time Series Analysis VanEck International Fund Fidelity Selective Precious Metal and Mineral Fund Two similar mutual funds in different fund groups
  • 38. 38 Query Languages for Time Sequences  Time-sequence query language  Should be able to specify sophisticated queries like “Find all of the sequences that are similar to some sequence in class A, but not similar to any sequence in class B”  Should be able to support various kinds of queries: range queries, all-pair queries, and nearest neighbor queries  Shape definition language  Allows users to define and query the overall shape of time sequences  Uses human-readable series of sequence transitions or macros  Ignores the specific details  E.g., the pattern up, Up, UP can be used to describe increasing degrees of rising slopes  Macros: spike, valley, etc.
  • 40. 40 Sequence Distance  A function that measures the dissimilarity of two sequences (of possibly unequal length)  Example: Euclidean distance between time series Q and C: $D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$
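A minimal sketch of the Euclidean distance follows. Unlike the general notion of sequence distance, this particular measure is only defined for sequences of equal length, so the hypothetical helper `euclidean` rejects mismatched inputs.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(c):
        raise ValueError("Euclidean distance requires sequences of equal length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


d = euclidean([0.0, 3.0], [4.0, 0.0])   # sqrt(16 + 9) = 5.0
```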