SlideShare a Scribd company logo

Data Mining Lecture_8(b).pptx

Lecture 8b: Clustering Validity, Minimum Description Length (MDL), Introduction to Information Theory, Co-clustering using MDL. (ppt,pdf) Chapter 2, Evimaria Terzi, Problems and Algorithms for Sequence Segmentations, Ph.D. Thesis (PDF)

1 of 23
Download to read offline
DATA MINING
LECTURE 8B
Time series analysis and
Sequence Segmentation
Subrata Kumer Paul
Assistant Professor, Dept. of CSE, BAUET
sksubrata96@gmail.com
Sequential data
• Sequential data (or time series) refers to data that appear
in a specific order.
• The order defines a time axis, that differentiates this data from
other cases we have seen so far
• Examples
• The price of a stock (or of many stocks) over time
• Environmental data (pressure, temperature, precipitation etc) over
time
• The sequence of queries in a search engine, or the frequency of a
query over time
• The words in a document as they appear in order
• A DNA sequence of nucleotides
• Event occurrences in a log over time
• Etc…
• Time series: usually we assume that we have a vector of
numeric values that change over time.
Time-series data
 Financial time series, process monitoring…
Why deal with sequential data?
• Because all data is sequential 
• All data items arrive in the data store in some order
• In some (many) cases the order does not matter
• E.g., we can assume a bag of words model for a
document
• In many cases the order is of interest
• E.g., stock prices do not make sense without the time
information.
Time series analysis
• The addition of the time axis defines new sets of
problems
• Discovering periodic patterns in time series
• Defining similarity between time series
• Finding bursts, or outliers
• Also, some existing problems need to be revisited
taking sequential order into account
• Association rules and Frequent Itemsets in sequential
data
• Summarization and Clustering: Sequence
Segmentation
Sequence Segmentation
• Goal: discover structure in the sequence and
provide a concise summary
• Given a sequence T, segment it into K contiguous
segments that are as homogeneous as possible
• Similar to clustering but now we require the
points in the cluster to be contiguous
• Commonly used for summarization of histograms
in databases

Recommended

More Related Content

Similar to Data Mining Lecture_8(b).pptx

DATA STRUCTURES unit 1.pptx
DATA STRUCTURES unit 1.pptxDATA STRUCTURES unit 1.pptx
DATA STRUCTURES unit 1.pptxShivamKrPathak
 
chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................nourhandardeer3
 
Tri Merge Sorting Algorithm
Tri Merge Sorting AlgorithmTri Merge Sorting Algorithm
Tri Merge Sorting AlgorithmAshim Sikder
 
Replica exchange MCMC
Replica exchange MCMCReplica exchange MCMC
Replica exchange MCMC. .
 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexityshowkat27
 
Lca seminar modified
Lca seminar modifiedLca seminar modified
Lca seminar modifiedInbok Lee
 
l1.ppt
l1.pptl1.ppt
l1.pptImXaib
 
Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004jeronimored
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةFares Al-Qunaieer
 
Recursion and Sorting Algorithms
Recursion and Sorting AlgorithmsRecursion and Sorting Algorithms
Recursion and Sorting AlgorithmsAfaq Mansoor Khan
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!ChenYiHuang5
 
Counting Sort Lowerbound
Counting Sort LowerboundCounting Sort Lowerbound
Counting Sort Lowerbounddespicable me
 

Similar to Data Mining Lecture_8(b).pptx (20)

DATA STRUCTURES unit 1.pptx
DATA STRUCTURES unit 1.pptxDATA STRUCTURES unit 1.pptx
DATA STRUCTURES unit 1.pptx
 
chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................
 
l1.ppt
l1.pptl1.ppt
l1.ppt
 
Data Structures 6
Data Structures 6Data Structures 6
Data Structures 6
 
Merge sort
Merge sortMerge sort
Merge sort
 
Ch07 linearspacealignment
Ch07 linearspacealignmentCh07 linearspacealignment
Ch07 linearspacealignment
 
Tri Merge Sorting Algorithm
Tri Merge Sorting AlgorithmTri Merge Sorting Algorithm
Tri Merge Sorting Algorithm
 
Replica exchange MCMC
Replica exchange MCMCReplica exchange MCMC
Replica exchange MCMC
 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexity
 
Lec35
Lec35Lec35
Lec35
 
Lca seminar modified
Lca seminar modifiedLca seminar modified
Lca seminar modified
 
Linear sorting
Linear sortingLinear sorting
Linear sorting
 
Encoding survey
Encoding surveyEncoding survey
Encoding survey
 
l1.ppt
l1.pptl1.ppt
l1.ppt
 
l1.ppt
l1.pptl1.ppt
l1.ppt
 
Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004Welcome to Introduction to Algorithms, Spring 2004
Welcome to Introduction to Algorithms, Spring 2004
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
Recursion and Sorting Algorithms
Recursion and Sorting AlgorithmsRecursion and Sorting Algorithms
Recursion and Sorting Algorithms
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
 
Counting Sort Lowerbound
Counting Sort LowerboundCounting Sort Lowerbound
Counting Sort Lowerbound
 

More from Subrata Kumer Paul

Chapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.pptChapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.pptSubrata Kumer Paul
 
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptChapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptSubrata Kumer Paul
 
Chapter 8. Classification Basic Concepts.ppt
Chapter 8. Classification Basic Concepts.pptChapter 8. Classification Basic Concepts.ppt
Chapter 8. Classification Basic Concepts.pptSubrata Kumer Paul
 
Chapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.pptChapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.pptSubrata Kumer Paul
 
Chapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.pptChapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.pptSubrata Kumer Paul
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptSubrata Kumer Paul
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptSubrata Kumer Paul
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptSubrata Kumer Paul
 
Chapter 3. Data Preprocessing.ppt
Chapter 3. Data Preprocessing.pptChapter 3. Data Preprocessing.ppt
Chapter 3. Data Preprocessing.pptSubrata Kumer Paul
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptSubrata Kumer Paul
 
Data Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxData Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxSubrata Kumer Paul
 

More from Subrata Kumer Paul (20)

Chapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.pptChapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.ppt
 
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.pptChapter 13. Trends and Research Frontiers in Data Mining.ppt
Chapter 13. Trends and Research Frontiers in Data Mining.ppt
 
Chapter 8. Classification Basic Concepts.ppt
Chapter 8. Classification Basic Concepts.pptChapter 8. Classification Basic Concepts.ppt
Chapter 8. Classification Basic Concepts.ppt
 
Chapter 2. Know Your Data.ppt
Chapter 2. Know Your Data.pptChapter 2. Know Your Data.ppt
Chapter 2. Know Your Data.ppt
 
Chapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.pptChapter 12. Outlier Detection.ppt
Chapter 12. Outlier Detection.ppt
 
Chapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.pptChapter 7. Advanced Frequent Pattern Mining.ppt
Chapter 7. Advanced Frequent Pattern Mining.ppt
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.ppt
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.ppt
 
Chapter 3. Data Preprocessing.ppt
Chapter 3. Data Preprocessing.pptChapter 3. Data Preprocessing.ppt
Chapter 3. Data Preprocessing.ppt
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
 
Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Data Mining Lecture_8(a).pptx
Data Mining Lecture_8(a).pptxData Mining Lecture_8(a).pptx
Data Mining Lecture_8(a).pptx
 
Data Mining Lecture_9.pptx
Data Mining Lecture_9.pptxData Mining Lecture_9.pptx
Data Mining Lecture_9.pptx
 
Data Mining Lecture_7.pptx
Data Mining Lecture_7.pptxData Mining Lecture_7.pptx
Data Mining Lecture_7.pptx
 
Data Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptxData Mining Lecture_10(b).pptx
Data Mining Lecture_10(b).pptx
 
Data Mining Lecture_6.pptx
Data Mining Lecture_6.pptxData Mining Lecture_6.pptx
Data Mining Lecture_6.pptx
 
Data Mining Lecture_11.pptx
Data Mining Lecture_11.pptxData Mining Lecture_11.pptx
Data Mining Lecture_11.pptx
 
Data Mining Lecture_4.pptx
Data Mining Lecture_4.pptxData Mining Lecture_4.pptx
Data Mining Lecture_4.pptx
 

Recently uploaded

Application of eddy current in industry and domestic purposes.pptx
Application of eddy current in industry and domestic purposes.pptxApplication of eddy current in industry and domestic purposes.pptx
Application of eddy current in industry and domestic purposes.pptxsukantatechedu
 
ChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhh
ChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhhChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhh
ChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhhhazhamina
 
FSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdf
FSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdfFSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdf
FSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdfArnold Saludares
 
Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!
Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!
Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!Mathias Magdowski
 
Laser And its Application's - Engineering Physics
Laser And its Application's - Engineering PhysicsLaser And its Application's - Engineering Physics
Laser And its Application's - Engineering PhysicsPurva Nikam
 
Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...
Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...
Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...drezdzond
 
Paper Machine Troubleshooting manual for paper makers
Paper Machine Troubleshooting manual for paper makersPaper Machine Troubleshooting manual for paper makers
Paper Machine Troubleshooting manual for paper makersNoman khan
 
Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)MANICKAVASAHAM G
 
Lesson2 Stoichiometry and mass balance.pdf
Lesson2 Stoichiometry and mass balance.pdfLesson2 Stoichiometry and mass balance.pdf
Lesson2 Stoichiometry and mass balance.pdff1002753214
 
【文凭定制】坎特伯雷大学毕业证学历认证
【文凭定制】坎特伯雷大学毕业证学历认证【文凭定制】坎特伯雷大学毕业证学历认证
【文凭定制】坎特伯雷大学毕业证学历认证muvgemo
 
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdfForged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdfVikasKumar11936
 
seminar power point presentation by Getahun Shanko.pptx
seminar power point presentation by Getahun Shanko.pptxseminar power point presentation by Getahun Shanko.pptx
seminar power point presentation by Getahun Shanko.pptxGetahunShankoKefeni
 
Amplitude modulation and Demodulation Techniques
Amplitude modulation and Demodulation TechniquesAmplitude modulation and Demodulation Techniques
Amplitude modulation and Demodulation TechniquesRich171473
 
fat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refiningfat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refiningteddymebratie
 
Integrity Constraints in Database Management System.pptx
Integrity Constraints in Database Management System.pptxIntegrity Constraints in Database Management System.pptx
Integrity Constraints in Database Management System.pptxPallaviPatil905338
 
MedTech R&D - Tamer Emara - resume @2024
MedTech R&D - Tamer Emara - resume @2024MedTech R&D - Tamer Emara - resume @2024
MedTech R&D - Tamer Emara - resume @2024Tamer Emara
 
Microservices: Benefits, drawbacks and are they for me?
Microservices: Benefits, drawbacks and are they for me?Microservices: Benefits, drawbacks and are they for me?
Microservices: Benefits, drawbacks and are they for me?Marian Marinov
 
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputBasic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputGaurav Singh Rajput
 

Recently uploaded (20)

Application of eddy current in industry and domestic purposes.pptx
Application of eddy current in industry and domestic purposes.pptxApplication of eddy current in industry and domestic purposes.pptx
Application of eddy current in industry and domestic purposes.pptx
 
ChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhh
ChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhhChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhh
ChoiAccess.pptbuhuhujhhuhuhuhuhuhhhhhhhhhhhhh
 
FSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdf
FSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdfFSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdf
FSMR - FIRE SAFETY MAINTENANCE REPORT FORM.pdf
 
Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!
Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!
Robust, Precise, Fast - Chose Two for Radiated EMC Measurements!
 
Laser And its Application's - Engineering Physics
Laser And its Application's - Engineering PhysicsLaser And its Application's - Engineering Physics
Laser And its Application's - Engineering Physics
 
Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...
Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...
Chase Commerce Center History Nordberg manufacturing Rexnord Global power com...
 
Paper Machine Troubleshooting manual for paper makers
Paper Machine Troubleshooting manual for paper makersPaper Machine Troubleshooting manual for paper makers
Paper Machine Troubleshooting manual for paper makers
 
Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)Microstructure of Hadfield Steels (Robert Hadfield)
Microstructure of Hadfield Steels (Robert Hadfield)
 
Lesson2 Stoichiometry and mass balance.pdf
Lesson2 Stoichiometry and mass balance.pdfLesson2 Stoichiometry and mass balance.pdf
Lesson2 Stoichiometry and mass balance.pdf
 
Présentation IIRB 2024 Prévibest T. Leborgne
Présentation IIRB 2024 Prévibest T. LeborgnePrésentation IIRB 2024 Prévibest T. Leborgne
Présentation IIRB 2024 Prévibest T. Leborgne
 
Présentation de F. Joudelat Congrès IIRB février 2024
Présentation de F. Joudelat Congrès IIRB février 2024Présentation de F. Joudelat Congrès IIRB février 2024
Présentation de F. Joudelat Congrès IIRB février 2024
 
【文凭定制】坎特伯雷大学毕业证学历认证
【文凭定制】坎特伯雷大学毕业证学历认证【文凭定制】坎特伯雷大学毕业证学历认证
【文凭定制】坎特伯雷大学毕业证学历认证
 
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdfForged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
Forged Fitting Socket Welding Standard- ASME-B16.11-2001.pdf
 
seminar power point presentation by Getahun Shanko.pptx
seminar power point presentation by Getahun Shanko.pptxseminar power point presentation by Getahun Shanko.pptx
seminar power point presentation by Getahun Shanko.pptx
 
Amplitude modulation and Demodulation Techniques
Amplitude modulation and Demodulation TechniquesAmplitude modulation and Demodulation Techniques
Amplitude modulation and Demodulation Techniques
 
fat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refiningfat and edible oil processsing.ppt, refining
fat and edible oil processsing.ppt, refining
 
Integrity Constraints in Database Management System.pptx
Integrity Constraints in Database Management System.pptxIntegrity Constraints in Database Management System.pptx
Integrity Constraints in Database Management System.pptx
 
MedTech R&D - Tamer Emara - resume @2024
MedTech R&D - Tamer Emara - resume @2024MedTech R&D - Tamer Emara - resume @2024
MedTech R&D - Tamer Emara - resume @2024
 
Microservices: Benefits, drawbacks and are they for me?
Microservices: Benefits, drawbacks and are they for me?Microservices: Benefits, drawbacks and are they for me?
Microservices: Benefits, drawbacks and are they for me?
 
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputBasic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
 

Data Mining Lecture_8(b).pptx

  • 1. DATA MINING LECTURE 8B Time series analysis and Sequence Segmentation Subrata Kumer Paul Assistant Professor, Dept. of CSE, BAUET sksubrata96@gmail.com
  • 2. Sequential data • Sequential data (or time series) refers to data that appear in a specific order. • The order defines a time axis, that differentiates this data from other cases we have seen so far • Examples • The price of a stock (or of many stocks) over time • Environmental data (pressure, temperature, precipitation etc) over time • The sequence of queries in a search engine, or the frequency of a query over time • The words in a document as they appear in order • A DNA sequence of nucleotides • Event occurrences in a log over time • Etc… • Time series: usually we assume that we have a vector of numeric values that change over time.
  • 3. Time-series data  Financial time series, process monitoring…
  • 4. Why deal with sequential data? • Because all data is sequential  • All data items arrive in the data store in some order • In some (many) cases the order does not matter • E.g., we can assume a bag of words model for a document • In many cases the order is of interest • E.g., stock prices do not make sense without the time information.
  • 5. Time series analysis • The addition of the time axis defines new sets of problems • Discovering periodic patterns in time series • Defining similarity between time series • Finding bursts, or outliers • Also, some existing problems need to be revisited taking sequential order into account • Association rules and Frequent Itemsets in sequential data • Summarization and Clustering: Sequence Segmentation
  • 6. Sequence Segmentation • Goal: discover structure in the sequence and provide a concise summary • Given a sequence T, segment it into K contiguous segments that are as homogeneous as possible • Similar to clustering but now we require the points in the cluster to be contiguous • Commonly used for summarization of histograms in databases
  • 7. Example t R t R Segmentation into 4 segments Homogeneity: points are close to the mean value (small error)
  • 8. Basic definitions • Sequence T = {t1,t2,…,tN}: an ordered set of N d-dimensional real points tiЄRd • A K-segmentation S: a partition of T into K contiguous segments {s1,s2,…,sK}. • Each segment sЄS is represented by a single vector μsЄRd (the representative of the segment -- same as the centroid of a cluster) • Error E(S): The error of replacing individual points with representatives • Different error functions, define different representatives. • Sum of Squares Error (SSE): 𝐸 𝑆 = 𝑠∈𝑆 𝑡∈𝑠 𝑡 − 𝜇𝑠 2 • Representative of segment s with SSE: mean 𝜇𝑠 = 1 |𝑠| 𝑡∈𝑠 𝑡
  • 9. Basic Definitions • Observation: a K-segmentation S is defined by K+1 boundary points 𝑏0, 𝑏1, … , 𝑏𝐾−1, 𝑏𝐾. • 𝑏0 = 0, 𝑏𝑘 = 𝑁 + 1 always. • We only need to specify 𝑏1, … , 𝑏𝐾−1 t R 𝑏0 𝑏1 𝑏2 𝑏3 𝑏4
  • 10. The K-segmentation problem • Similar to K-means clustering, but now we need the points in the clusters to respect the order of the sequence. • This actually makes the problem easier.  Given a sequence T of length N and a value K, find a K-segmentation S = {s1, s2, …,sK} of T such that the SSE error E is minimized.
  • 11. Optimal solution for the k-segmentation problem  [Bellman’61: The K-segmentation problem can be solved optimally using a standard dynamic- programming algorithm • Dynamic Programming: • Construct the solution of the problem by using solutions to problems of smaller size • Define the dynamic programming recursion • Build the solution bottom up from smaller to larger instances • Define the dynamic programming table that stores the solutions to the sub-problems
  • 12. Rule of thumb • Most optimization problems where order is involved can be solved optimally in polynomial time using dynamic programming. • The polynomial exponent may be large though
  • 13. Dynamic Programming Recursion • Terminology: • 𝑇[1, 𝑛]: subsequence {t1,t2,…,tn} for 𝑛 ≤ 𝑁 • 𝐸 𝑆[1, 𝑛], 𝑘 : error of optimal segmentation of subsequence 𝑇[1, 𝑛] with 𝑘 segments for 𝑘 ≤ 𝐾 • Dynamic Programming Recursion: 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j≤n−1 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 + 𝑗+1≤𝑡≤𝑛 𝑡 − 𝜇 𝑗+1,𝑛 2 Error of k-th (last) segment when the last segment is [j+1,n] Error of optimal segmentation S[1,j] with k-1 segments Minimum over all possible placements of the last boundary point 𝑏𝑘−1
  • 14. • Two−dimensional table 𝐴[1 … 𝐾, 1 … 𝑁] 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j≤n−1 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 + 𝑗+1≤𝑡≤𝑛 𝑡 − 𝜇 𝑗+1,𝑛 2 • Fill the table top to bottom, left to right. N 1 1 K Dynamic programming table k n 𝐴 𝑘, 𝑛 = 𝐸 𝑆 1, 𝑛 , 𝑘 Error of optimal K-segmentation
  • 15. Example R n-th point k = 3 Where should we place boundary 𝑏2 ? N 1 1 2 3 4 n 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j≤n−1 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 𝑏2 𝑏1
  • 16. Example R n-th point k = 3 Where should we place boundary 𝑏2 ? N 1 1 2 3 4 n 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j≤n−1 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 𝑏2 𝑏1
  • 17. Example R n-th point k = 3 Where should we place boundary 𝑏2 ? N 1 1 2 3 4 n 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j≤n−1 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 𝑏2 𝑏1
  • 18. Example R n-th point k = 3 Where should we place boundary 𝑏2 ? N 1 1 2 3 4 n 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j≤n−1 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 𝑏2 𝑏1
  • 19. Example R n-th point k = 3 Optimal segmentation S[1:n] N 1 1 2 3 4 n 𝑏2 𝑏1 The cell A[3,n] stores the error of the optimal solution 3-segmentation of T[1,n] In the cell (or in a different table) we also store the position n-3 of the boundary so we can trace back the segmentation n-3
  • 20. Dynamic-programming algorithm • Input: Sequence T, length N, K segments, error function E() • For i=1 to N //Initialize first row – A[1,i]=E(T[1…i]) //Error when everything is in one cluster • For k=1 to K // Initialize diagonal – A[k,k] = 0 // Error when each point in its own cluster • For k=2 to K – For i=k+1 to N • A[k,i] = minj<i{A[k-1,j]+E(T[j+1…i])} • To recover the actual segmentation (not just the optimal cost) store also the minimizing values j
  • 21. Algorithm Complexity • What is the complexity? • NK cells to fill • Computation per cell 𝐸 𝑆 1, 𝑛 , 𝑘 = min 𝑘≤j<n 𝐸 𝑆 1, 𝑗 , 𝑘 − 1 + 𝑗+1≤𝑡≤𝑛 𝑡 −
  • 22. Heuristics • Top-down greedy (TD): O(NK) • Introduce boundaries one at the time so that you get the largest decrease in error, until K segments are created. • Bottom-up greedy (BU): O(NlogN) • Merge adjacent points each time selecting the two points that cause the smallest increase in the error until K segments • Local Search Heuristics: O(NKI) • Assign the breakpoints randomly and then move them so that you reduce the error
  • 23. Other time series analysis • Using signal processing techniques is common for defining similarity between series • Fast Fourier Transform • Wavelets • Rich literature in the field