SlideShare a Scribd company logo
GUIDE : MS. ANAGHA CHAUDHARI
A sequence : < (ef) (ab) (df) c b >
A sequence database
SID        sequence             An element may contain a set of items.
                                Items within an element are unordered
10     <a(abc)(ac)d(cf)>
                                and we list them alphabetically.
20      <(ad)c(bc)(ae)>
30      <(ef)(ab)(df)cb>        <a(bc)df> is a subsequence of
40        <eg(af)cbc>           <a(abc)(ac)d(cf)>

  Given support threshold min_sup =2, <(ab)c> is a sequential
  pattern                                                                6
CHALLENGES ON SEQUENTIAL
PATTERN MINING
 A huge number of possible sequential patterns are hidden in
  databases

 A mining algorithm should
    find the complete set of patterns, when possible, satisfying the
     minimum support (frequency) threshold
    be highly efficient, scalable, involving only a small number of
     database scans
    be able to incorporate various kinds of user-specific
     constraints

                                                        7
The Apriori Algorithm—An Example
                      Supmin = 2      Itemset       sup
                                                                     Itemset     sup
Database TDB                             {A}         2
 Tid        Items
                                                           L1          {A}         2
                               C1        {B}         3
                                                                       {B}         3
 10         A, C, D                      {C}         3
                          1st scan                                     {C}         3
 20         B, C, E                      {D}         1
                                                                       {E}         3
 30     A, B, C, E                       {E}         3
 40          B, E
                              C2     Itemset    sup               C2         Itemset
                                      {A, B}     1
 L2    Itemset        sup                                 2nd scan            {A, B}
                                      {A, C}     2
        {A, C}         2                                                      {A, C}
                                      {A, E}     1
        {B, C}         2
                                      {B, C}     2                            {A, E}
        {B, E}         3
                                      {B, E}     3                            {B, C}
        {C, E}         2
                                      {C, E}     2                            {B, E}
                                                                              {C, E}

              Itemset
                              3rd scan         L3   Itemset     sup
       C3
              {B, C, E}                             {B, C, E}    2
                                                                                       10
The Apriori Algorithm [Pseudo-Code]

Ck: Candidate itemset of size k
Lk : frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ; k++) do begin
  Ck+1 = candidates generated from Lk;
  for each transaction t in database do
    increment the count of all candidates in Ck+1 that are
     contained in t
  Lk+1 = candidates in Ck+1 with min_support
  end
return k Lk;
                                                             11
APRIORI ADV/DISADV

 Advantages:
   Uses large itemset property.
   Easily parallelized
   Easy to implement.

 Disadvantages:
   Assumes transaction database is memory resident.
   Requires up to m database scans.
   J. Han, J. Pei, and Y. Yin 2000
   Depth-first search
   Avoid explicit candidate generation
   Adopt divide-and-conquer strategy
   Two step approach
    Step1:Build a compact data
          structure called FP tree
    Step2:Extract frequent itemsets
           from FP tree.
Step 1: FP-Tree Construction
 FP-Tree is constructed using 2 passes over the data-set:

  Pass 1:
    Scan data and find support for each item.
    Discard infrequent items.
    Sort frequent items in decreasing order based on
      their support.
Pass 2:

Nodes correspond to items and have a counter

1.     FP-Growth reads 1 transaction at a time and maps it to a path

2.     Fixed order is used, so paths can overlap when transactions share items (when
       they have the same prfix ).
     – In this case, counters are incremented

3.      Pointers are maintained between nodes containing the same item, creating singly
       linked lists (dotted lines)
     – The more paths that overlap, the higher the compression. FP-tree may fit in
       memory.

4.     Frequent itemsets extracted from the FP-Tree.
 Start from each frequent length-1 pattern (as an initial suffix
  pattern)
 construct its conditional pattern base (a ―subdatabase,‖which
  consists of the set of prefix paths in the FP-tree co-occurring
  with the suffix pattern)
 Construct its (conditional) FP-tree, and perform mining
  recursively on such a tree.
 The pattern growth is achieved by the concatenation of the
  suffix pattern with the frequent patterns generated from a
  conditional FP-tree.
Table : Table after
                             first scan of database
Table : Transactional data
Fig . FP – Tree Construction
EXAMPLE CONT




Table:Mining FP Tree by creating conditional (sub)-pattern bases
EXAMPLE CONT




Fig.The conditional FP-tree associated with the conditiona node I3
FP-FROWTH ADV/DISADV

Advantages of FP-Growth
  • only 2 passes over data-set
  • ―compresses‖ data-set
  • no candidate generation
  • much faster than Apriori

Disadvantages of FP-Growth
  • FP-Tree may not fit in memory!!
  • FP-Tree is expensive to build
APPLICATIONS



Customer shopping sequences:
   First buy computer, then CD-ROM, and then digital camera, within 3
    months.

Medical treatments, natural disasters (e.g., earthquakes), science
 & eng. processes, stocks and markets, etc.
Telephone calling patterns, Weblog click streams
DNA sequences and gene structures


                                                                  22
THANK YOU
Sequential pattern mining

More Related Content

What's hot

What's hot (20)

Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
Density based methods
Density based methodsDensity based methods
Density based methods
 
Fp growth
Fp growthFp growth
Fp growth
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Dynamic Itemset Counting
Dynamic Itemset CountingDynamic Itemset Counting
Dynamic Itemset Counting
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learning
 
Content Based Image Retrieval
Content Based Image Retrieval Content Based Image Retrieval
Content Based Image Retrieval
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Image Restoration
Image RestorationImage Restoration
Image Restoration
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
 
APRIORI ALGORITHM -PPT.pptx
APRIORI ALGORITHM -PPT.pptxAPRIORI ALGORITHM -PPT.pptx
APRIORI ALGORITHM -PPT.pptx
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 

Similar to Sequential pattern mining

Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
Albert Orriols-Puig
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
Shani729
 
ARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .pptARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .ppt
ChellamuthuHaripriya
 
data structure and algorithm Array.pptx btech 2nd year
data structure and algorithm  Array.pptx btech 2nd yeardata structure and algorithm  Array.pptx btech 2nd year
data structure and algorithm Array.pptx btech 2nd year
palhimanshi999
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
Ashish Garg
 
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxConsider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
maxinesmith73660
 

Similar to Sequential pattern mining (20)

Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
Speeding Up Distributed Machine Learning Using Codes
Speeding Up Distributed Machine Learning Using CodesSpeeding Up Distributed Machine Learning Using Codes
Speeding Up Distributed Machine Learning Using Codes
 
My6asso
My6assoMy6asso
My6asso
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
 
ARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .pptARM_03_FPtreefrequency pattern data warehousing .ppt
ARM_03_FPtreefrequency pattern data warehousing .ppt
 
A PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHM
A PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHMA PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHM
A PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHM
 
A PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHM
A PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHMA PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHM
A PREFIXED-ITEMSET-BASED IMPROVEMENT FOR APRIORI ALGORITHM
 
FPGA based BCH Decoder
FPGA based BCH DecoderFPGA based BCH Decoder
FPGA based BCH Decoder
 
Mining Approach for Updating Sequential Patterns
Mining Approach for Updating Sequential PatternsMining Approach for Updating Sequential Patterns
Mining Approach for Updating Sequential Patterns
 
Computer science ms
Computer science msComputer science ms
Computer science ms
 
Datastage real time scenario
Datastage real time scenarioDatastage real time scenario
Datastage real time scenario
 
data structure and algorithm Array.pptx btech 2nd year
data structure and algorithm  Array.pptx btech 2nd yeardata structure and algorithm  Array.pptx btech 2nd year
data structure and algorithm Array.pptx btech 2nd year
 
3rd Semester Computer Science and Engineering (ACU-2022) Question papers
3rd Semester Computer Science and Engineering  (ACU-2022) Question papers3rd Semester Computer Science and Engineering  (ACU-2022) Question papers
3rd Semester Computer Science and Engineering (ACU-2022) Question papers
 
Adobe
AdobeAdobe
Adobe
 
2nd Semester M Tech: Structural Engineering (June-2015) Question Papers
2nd  Semester M Tech: Structural Engineering  (June-2015) Question Papers2nd  Semester M Tech: Structural Engineering  (June-2015) Question Papers
2nd Semester M Tech: Structural Engineering (June-2015) Question Papers
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
 
CS Sample Paper 1
CS Sample Paper 1CS Sample Paper 1
CS Sample Paper 1
 
Dat 305 dat305 dat 305 education for service uopstudy.com
Dat 305 dat305 dat 305 education for service   uopstudy.comDat 305 dat305 dat 305 education for service   uopstudy.com
Dat 305 dat305 dat 305 education for service uopstudy.com
 
Data structure-question-bank
Data structure-question-bankData structure-question-bank
Data structure-question-bank
 
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxConsider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
 

Recently uploaded

Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 

Recently uploaded (20)

Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
The Benefits and Challenges of Open Educational Resources
The Benefits and Challenges of Open Educational ResourcesThe Benefits and Challenges of Open Educational Resources
The Benefits and Challenges of Open Educational Resources
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resources
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdf
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
[GDSC YCCE] Build with AI Online Presentation
[GDSC YCCE] Build with AI Online Presentation[GDSC YCCE] Build with AI Online Presentation
[GDSC YCCE] Build with AI Online Presentation
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdfINU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
INU_CAPSTONEDESIGN_비밀번호486_업로드용 발표자료.pdf
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
 
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
Operations Management - Book1.p  - Dr. Abdulfatah A. SalemOperations Management - Book1.p  - Dr. Abdulfatah A. Salem
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
 
NLC-2024-Orientation-for-RO-SDO (1).pptx
NLC-2024-Orientation-for-RO-SDO (1).pptxNLC-2024-Orientation-for-RO-SDO (1).pptx
NLC-2024-Orientation-for-RO-SDO (1).pptx
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 

Sequential pattern mining

  • 1. GUIDE : MS. ANAGHA CHAUDHARI
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. A sequence : < (ef) (ab) (df) c b > A sequence database SID sequence An element may contain a set of items. Items within an element are unordered 10 <a(abc)(ac)d(cf)> and we list them alphabetically. 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> <a(bc)df> is a subsequence of 40 <eg(af)cbc> <a(abc)(ac)d(cf)> Given support threshold min_sup =2, <(ab)c> is a sequential pattern 6
  • 7. CHALLENGES ON SEQUENTIAL PATTERN MINING A huge number of possible sequential patterns are hidden in databases A mining algorithm should  find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold  be highly efficient, scalable, involving only a small number of database scans  be able to incorporate various kinds of user-specific constraints 7
  • 8.
  • 9.
  • 10. The Apriori Algorithm—An Example Supmin = 2 Itemset sup Itemset sup Database TDB {A} 2 Tid Items L1 {A} 2 C1 {B} 3 {B} 3 10 A, C, D {C} 3 1st scan {C} 3 20 B, C, E {D} 1 {E} 3 30 A, B, C, E {E} 3 40 B, E C2 Itemset sup C2 Itemset {A, B} 1 L2 Itemset sup 2nd scan {A, B} {A, C} 2 {A, C} 2 {A, C} {A, E} 1 {B, C} 2 {B, C} 2 {A, E} {B, E} 3 {B, E} 3 {B, C} {C, E} 2 {C, E} 2 {B, E} {C, E} Itemset 3rd scan L3 Itemset sup C3 {B, C, E} {B, C, E} 2 10
  • 11. The Apriori Algorithm [Pseudo-Code] Ck: Candidate itemset of size k Lk : frequent itemset of size k L1 = {frequent items}; for (k = 1; Lk != ; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1 = candidates in Ck+1 with min_support end return k Lk; 11
  • 12. APRIORI ADV/DISADV  Advantages:  Uses large itemset property.  Easily parallelized  Easy to implement.  Disadvantages:  Assumes transaction database is memory resident.  Requires up to m database scans.
  • 13. J. Han, J. Pei, and Y. Yin 2000  Depth-first search  Avoid explicit candidate generation  Adopt divide-and-conquer strategy  Two step approach Step1:Build a compact data structure called FP tree Step2:Extract frequent itemsets from FP tree.
  • 14. Step 1: FP-Tree Construction  FP-Tree is constructed using 2 passes over the data-set: Pass 1:  Scan data and find support for each item.  Discard infrequent items.  Sort frequent items in decreasing order based on their support.
  • 15. Pass 2: Nodes correspond to items and have a counter 1. FP-Growth reads 1 transaction at a time and maps it to a path 2. Fixed order is used, so paths can overlap when transactions share items (when they have the same prfix ). – In this case, counters are incremented 3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines) – The more paths that overlap, the higher the compression. FP-tree may fit in memory. 4. Frequent itemsets extracted from the FP-Tree.
  • 16.  Start from each frequent length-1 pattern (as an initial suffix pattern)  construct its conditional pattern base (a ―subdatabase,‖which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern)  Construct its (conditional) FP-tree, and perform mining recursively on such a tree.  The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from a conditional FP-tree.
  • 17. Table : Table after first scan of database Table : Transactional data
  • 18. Fig . FP – Tree Construction
  • 19. EXAMPLE CONT Table:Mining FP Tree by creating conditional (sub)-pattern bases
  • 20. EXAMPLE CONT Fig.The conditional FP-tree associated with the conditiona node I3
  • 21. FP-FROWTH ADV/DISADV Advantages of FP-Growth • only 2 passes over data-set • ―compresses‖ data-set • no candidate generation • much faster than Apriori Disadvantages of FP-Growth • FP-Tree may not fit in memory!! • FP-Tree is expensive to build
  • 22. APPLICATIONS Customer shopping sequences:  First buy computer, then CD-ROM, and then digital camera, within 3 months. Medical treatments, natural disasters (e.g., earthquakes), science & eng. processes, stocks and markets, etc. Telephone calling patterns, Weblog click streams DNA sequences and gene structures 22
  • 23.
  • 24.
  • 25.