Multidimensional analysis and descriptive mining of complex data objects
Time Series and Sequence Data
• Archishman Bandyopadhyay (39136)
• Gaurang Dhume (39168)
Data Stream Mining
Key characteristics :
• Temporally ordered
• Fast changing
• Massive
• Potentially infinite
Streams of data flow in and
out of a computer system
continuously and with
varying update rates.
Data stream management
systems (DSMS) are used
to perform data mining on
the data stream
Introduction to Business Intelligence – Archishman Bandyopadhyay & Gaurang Dhume , SIBM Pune (MBA 39 , Marketing A)
Data Stream Mining is the process of extracting
knowledge structures from continuous, rapid
data records.
A data stream is an ordered sequence of
instances that in many applications of data
stream mining can be read only once or a small
number of times using limited computing and
storage capabilities.
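The read-once, limited-storage constraint can be illustrated with a single-pass summary. The sketch below is only an illustration, not part of the original slides: it keeps a running mean and variance (Welford's method) in constant memory, and the class name RunningStats and the sensor_readings stream are hypothetical.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method) in constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each record is folded into the summary and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of the values seen so far.
        return self._m2 / self.count if self.count else 0.0


def sensor_readings():
    # Hypothetical stream: a generator yields each record exactly once.
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

The summary never stores the stream itself, so memory use stays fixed however long the stream runs.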
Time Series
A time series is a collection of
observations made sequentially in
time.
e.g.
• Financial time series like stock
fluctuations with time,
• Sales revenue with time,
• Budgetary analysis,
• Utility studies, inventory studies,
• Yield projections,
• Workload projections,
• Process and quality control,
• Observation of natural phenomena
(such as atmosphere, temperature,
wind, earthquake)
Major time series data mining tasks :
1. Indexing
2. Clustering
3. Classification
4. Prediction
5. Anomaly Detection
Trend Analysis ; Y=f(t)
Goals for Time series analysis :
• Modelling time series (i.e., to gain
insight into the mechanisms or
underlying forces that generate the
time series )
• Forecasting time series (i.e., to
predict the future values of the
time-series variables).
Major components or movements :
• Trend or long-term movement
• Cyclic movement
• Seasonal movements
• Irregular/Random movements
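As a rough illustration of separating these movements, the sketch below assumes an additive model (value = trend + seasonal + irregular) and quarterly data. The sales figures are hypothetical, built from a linear trend plus a fixed quarterly effect; they are not the data behind the slide's charts.

```python
def centered_moving_average(series, period=4):
    """2 x period centered moving average for an even period; None at the edges."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # End points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_effects(series, trend, period=4):
    """Average detrended value for each position in the cycle (additive model)."""
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    return [sum(b) / len(b) for b in buckets]


# Hypothetical quarterly sales: linear trend plus a fixed quarterly effect.
quarter_effect = [-50, 100, 50, -100]
sales = [500 + 10 * t + quarter_effect[t % 4] for t in range(12)]

trend = centered_moving_average(sales)
seasonal = seasonal_effects(sales, trend)
print(trend)
print(seasonal)
```

Whatever remains after subtracting the trend and the seasonal effect from each observation is the irregular movement; cyclic movement would need a much longer series to estimate.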
[Figure: line chart of Germany long-term interest rate (%), Jan-01 to Jul-11]
[Figure: line chart of Ice Cream Sales (Rs. Mn), 97-Q1 to 99-Q4]
Chart labels: Long Term Trend, Seasonal Movement
Sequential Pattern Mining
A Sequential database is any database
that consists of sequences of ordered
events, with or without concrete
notions of time.
Examples
• Web page traversal sequences
• Customer shopping transaction
sequences (renting “Star Wars”, then
“Empire Strikes Back”, then “Return of
the Jedi”, in that order)
• Collection of ordered events within an
interval
Applications
• Targeted marketing
• Customer retention
• Weather prediction , etc
Pattern mining consists of discovering
interesting, useful, and unexpected
patterns in databases
Various types of patterns can be
discovered in databases, such as :
• Frequent item-sets,
• Associations,
• Subgraphs,
• Sequential rules, and
• Periodic patterns
Example :
Seq. ID   Sequence
1         {a, b}, {c}, {f, g}, {g}, {e}
2         {a, d}, {c}, {b}, {a, b, e, f}
3         {a}, {b}, {f, g}, {e}
4         {b}, {f, g}
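To make the example database concrete, the sketch below counts the support of a sequential pattern, taken here as the number of sequences that contain the pattern as a subsequence (each pattern item-set must be a subset of a later item-set of the sequence, in order). The function names contains and support are illustrative only.

```python
# The four sequences of the example database, each a list of item-sets.
database = [
    [{"a", "b"}, {"c"}, {"f", "g"}, {"g"}, {"e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"a"}, {"b"}, {"f", "g"}, {"e"}],
    [{"b"}, {"f", "g"}],
]


def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered subsequence."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest item-set that covers this pattern element.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(sequence, pattern) for sequence in database)


print(support([{"b"}, {"f", "g"}]))  # sequences 1, 3 and 4
print(support([{"a"}, {"e"}]))       # sequences 1, 2 and 3
print(support([{"a"}, {"c"}]))       # sequences 1 and 2
```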
Applying Sequential Pattern Mining to Time Series
It is possible to convert time
series to sequences by
discretizing the time series
(transforming the numbers into
symbols).
Techniques for analysing sequences
can then be applied to the time
series. One of the most popular
discretization algorithms is SAX
(Symbolic Aggregate approXimation).
Step 1 : PAA (piecewise
aggregate approximation)
Split the time series into 8
segments and replace each
segment by its average, to
reduce the dimensionality
n = 11 (original data points)
w = 4 (number of symbols)
v = 8 (PAA data points)
Step 2 : Each data point is
replaced by a symbol (to
represent various intervals of
values such that each is equally
probable under the normal
distribution)
Four symbols are created:
a = (-Infinity, 4.50)
b = [4.50, 6.2)
c = [6.2, 7.90)
d = [7.90, Infinity)
The result is a sequence of symbols: a, a, c, d, d, c, c, b
Step 3 : This sequence is the symbolic representation of the
time series.
Once a time series has been converted to a sequence, traditional
pattern mining algorithms can be applied to it.
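The steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the slide's worked example: the input series is hypothetical, its length is assumed to be a multiple of the number of segments, and the series is z-normalised first so that the breakpoints can be taken from the standard normal distribution (which is why they differ from the raw-unit breakpoints 4.50, 6.2 and 7.90 above).

```python
from statistics import NormalDist, mean, pstdev


def paa(series, segments):
    """Piecewise aggregate approximation.

    Assumes len(series) is a multiple of segments.
    """
    size = len(series) // segments
    return [mean(series[i * size : (i + 1) * size]) for i in range(segments)]


def sax(series, segments, alphabet="abcd"):
    """Return the SAX symbols for series using the given alphabet."""
    # z-normalise so the standard normal breakpoints apply.
    mu, sigma = mean(series), pstdev(series)
    normalised = [(x - mu) / sigma for x in series]
    # Breakpoints split N(0, 1) into len(alphabet) equally probable intervals.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    symbols = []
    for value in paa(normalised, segments):
        # The symbol index is the number of breakpoints below the value.
        index = sum(value > b for b in breakpoints)
        symbols.append(alphabet[index])
    return symbols


# Hypothetical series of 16 points, reduced to 8 PAA segments and 4 symbols.
series = list(range(1, 17))
symbols = sax(series, segments=8)
print(symbols)
```

Handling a series whose length is not a multiple of the segment count, as in the slide's n = 11 and v = 8, needs fractional weighting of boundary points, which is left out here.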
Common Sequential Pattern Mining Algorithms
I. Apriori-based Approaches (If a sequence S is not frequent, then
none of the supersequences of S is frequent) :
• GSP (Generalized Sequential Pattern)
• SPADE (Sequential PAttern Discovery using Equivalence classes)
II. Pattern-Growth-based Approaches (Sequence databases are
recursively projected into a set of smaller projected databases based on
the current sequential pattern(s), and sequential patterns are grown in
each projected database by exploring only locally frequent fragments) :
• FreeSpan
• PrefixSpan (Prefix-Projected Sequential Pattern Growth)
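The pattern-growth idea can be sketched with a simplified prefix-projection miner. The version below is an assumption-laden illustration, not the published PrefixSpan algorithm: it handles only sequences of single events (no item-sets), and the web traversal data is hypothetical.

```python
def prefixspan(database, min_support):
    """Return {pattern tuple: support} for sequences of single events."""
    results = {}

    def grow(prefix, projected):
        # Count each event at most once per projected sequence.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue  # only locally frequent events extend the prefix
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep what follows the first occurrence of the event.
            next_projected = [
                s[s.index(item) + 1 :] for s in projected if item in s
            ]
            grow(pattern, next_projected)

    grow((), [list(s) for s in database])
    return results


# Hypothetical web page traversal sequences.
traversals = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["search", "product"],
    ["home", "search", "cart"],
]
patterns = prefixspan(traversals, min_support=3)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Each recursive call works on a smaller projected database, so no candidate sequences are generated and tested against the full database as in the Apriori-based approaches.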
Common Sequential Pattern Mining Algorithms
Algorithm (Top Down – Apriori , Bottom Up – Pattern Growth)
A large supermarket tracks sales data by
stock-keeping unit (SKU) for each item: each
item, such as "butter" or "bread", is
identified by a numerical SKU. The
supermarket has a database of transactions
where each transaction is a set of SKUs that
were bought together.
Let the database of transactions consist of
the following item sets:
Item sets
{1,2,3,4}
{1,2,4}
{1,2}
{2,3,4}
{2,3}
{3,4}
{2,4}
Count the number of occurrences, called
the support, of each member item
separately. Scanning the database for the
first time gives the following result:
Item   Support
{1}    3
{2}    6
{3}    4
{4}    5
Say that an item set is frequent if it appears in at least 3
transactions of the database: the value 3 is the support
threshold.
All the itemsets of size 1 have a support of at least 3, so they
are all frequent.
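The first scan can be reproduced with a short sketch that uses the seven transactions listed above and the threshold of 3; the variable names are illustrative.

```python
from collections import Counter

# The seven transactions from the example, as sets of SKUs.
transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]

# One pass over the database counts how many transactions contain each item.
item_support = Counter(item for t in transactions for item in t)

min_support = 3
frequent_items = sorted(i for i, s in item_support.items() if s >= min_support)

print(dict(sorted(item_support.items())))
print(frequent_items)
```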
The next step is to generate a list of all pairs of the frequent
items.
For example, regarding the pair {1,2}: the table of item sets
above shows items 1 and 2 appearing together in three
of the itemsets; therefore, the itemset {1,2} has a support of
three.
Item    Support
{1,2}   3
{1,3}   1
{1,4}   2
{2,3}   3
{2,4}   4
{3,4}   3
The pairs {1,2}, {2,3}, {2,4}, and
{3,4} all meet or exceed the
minimum support of 3, so they
are frequent. The pairs {1,3}
and {1,4} are not. Now, because
{1,3} and {1,4} are not
frequent, any larger set which
contains {1,3} or {1,4} cannot
be frequent. In this way, we can
prune sets: we will now look
for frequent triples in the
database.
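The level-wise search with pruning can be sketched as below. This is a minimal illustration on the example's seven transactions, not an optimized implementation; the function name apriori and its join and prune steps follow the description above. On this data the only triple that survives pruning is {2,3,4}, which occurs in two transactions and so falls below the threshold of 3.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search; returns {frozenset: support} for frequent itemsets."""

    def support(candidate):
        return sum(candidate <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while level:
        # Scan the database once per level and keep the frequent candidates.
        counted = {c: support(c) for c in level}
        current = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent


transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]
frequent = apriori(transactions, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```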
- Thank You -
Sources :
• http://data-mining.philippe-fournier-viger.com/introduction-time-series-mining-spmf/
• http://data-mining.philippe-fournier-viger.com/introduction-sequential-pattern-mining/
• http://web.engr.illinois.edu/~hanj/cs512/bk2chaps/chapter_8.pdf
• Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery 15, 107–144 (2007)

Analysis of Time Series Data & Pattern Sequencing
