Sequence mining algorithm



           Monica Dăgădiţă
                        ISI
 Introduction to sequence mining
 Why sequence mining?
 Sequence mining algorithms
 SPADE
    Motivation
    Definitions and examples
    Algorithm
    Implementation




                     Data Mining   11/8/2011   2
 Aim - finding statistically relevant patterns
 between data examples where the values are
 delivered in a sequence

 Originally introduced for market basket
 analysis - customer behaviour predictions

2    types of sequence mining:
     string mining – biology (gene/protein sequences)
     itemset mining - marketing and CRM applications

 Discovering patterns:
    Bookstore: 70% of the people who buy Jane
     Austen’s “Pride and Prejudice” also buy “Emma”
     within a month
    Website: finding sequences of most frequently
     accessed pages

 Usage:
    Promotions
    Shelf placement
    Restructure the website
    Recommender systems

 Apriori
 GSP  (Generalized Sequential Pattern)
 FreeSpan (Frequent pattern-projected
  Sequential pattern mining)
 PrefixSpan (Prefix-projected Sequential
  pattern mining)
 SPADE (Sequential PAttern Discovery using
  Equivalence classes)




 Problems of existing solutions
    Repeated database scans
    Complex internal data structures


 Key features of SPADE:
    Fixed number of database scans
    Vertical id-list database format
    Decomposition of search space into smaller
     pieces – processed independently




 Itemset: set of m distinct items (the alphabet)
   I = {i1, i2, …, im}
 Event: non-empty, unordered collection of items
   (i1 i2 … ik)
 Sequence: ordered list of events
   <e1 -> e2 -> … -> en>
 k-sequence: sequence with k items in total
   (B->AC) is a 3-sequence
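
These definitions can be sketched in Python. The representation is an assumption made for illustration: an event is a frozenset of single-letter items, a sequence is a tuple of events, and the helper names (`make_sequence`, `k_of`, `show`) are hypothetical.

```python
# Hypothetical representation: event = frozenset of items,
# sequence = tuple of events in temporal order.

def make_sequence(*events):
    """Build a sequence from strings such as "B", "AC" (one letter per item)."""
    seq = tuple(frozenset(e) for e in events)
    if any(len(e) == 0 for e in seq):
        raise ValueError("an event must be a non-empty collection of items")
    return seq

def k_of(sequence):
    """Return k, the total number of items in the sequence."""
    return sum(len(event) for event in sequence)

def show(sequence):
    """Render a sequence in the slides' arrow notation, e.g. B->AC."""
    return "->".join("".join(sorted(event)) for event in sequence)

b_ac = make_sequence("B", "AC")
print(show(b_ac), "is a", f"{k_of(b_ac)}-sequence")
```

Note that k counts items, not events: B->AC has two events but three items.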



 Subsequence: given two sequences α = <a_1 a_2 … a_n>
 and β = <b_1 b_2 … b_m>, α is called a subsequence of
 β, denoted α ⊆ β, if there exist integers 1 ≤ j_1 < j_2
 < … < j_n ≤ m such that a_1 ⊆ b_{j_1}, a_2 ⊆ b_{j_2}, …, a_n ⊆ b_{j_n}

  Examples:
  1. (B->AC) is a subsequence of (AB->E->ACD): B ⊆ AB and AC ⊆ ACD, in that order
  2. (AB->E) is not a subsequence of (ABE): E would have to occur in a later event than AB
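
The subsequence test can be sketched in Python as a greedy scan. This assumes events are plain sets of single-letter items; `parse` and `is_subsequence` are hypothetical helper names, not part of any library.

```python
def is_subsequence(alpha, beta):
    """True if alpha is a subsequence of beta: the events of alpha map,
    in order, into distinct events of beta that contain them."""
    j = 0
    for a in alpha:
        # Greedy: advance to the earliest remaining event of beta containing a
        while j < len(beta) and not set(a) <= set(beta[j]):
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next event of alpha must match a strictly later event
    return True

def parse(text):
    """Parse arrow notation such as "AB->E->ACD" into a list of item sets."""
    return [set(event) for event in text.split("->")]

print(is_subsequence(parse("B->AC"), parse("AB->E->ACD")))  # True
print(is_subsequence(parse("AB->E"), parse("ABE")))         # False
```

Matching each event of α to the earliest possible event of β is sufficient, because an earlier match never rules out a later one.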




Id-lists of the most frequent items (1-sequences)
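
A minimal sketch of how the vertical id-lists could be built from a horizontal database. The toy database below is an assumption: it is chosen to be consistent with the id-lists shown later in these slides, since the original figure is not reproduced here. The threshold `min_sup = 2` and the function names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical horizontal database: (sid, eid, items of the event).
# Chosen to agree with the D->B, D->BF and D->BF->A id-lists in these slides.
DATABASE = [
    (1, 10, "CD"), (1, 15, "ABC"), (1, 20, "ABF"), (1, 25, "ACDF"),
    (2, 15, "ABF"), (2, 20, "E"),
    (3, 10, "ABF"),
    (4, 10, "DGH"), (4, 20, "BF"), (4, 25, "AGH"),
]

def vertical_idlists(database):
    """Map each item to its id-list: the sorted (sid, eid) pairs where it occurs."""
    idlists = defaultdict(list)
    for sid, eid, items in database:
        for item in items:
            idlists[item].append((sid, eid))
    return {item: sorted(pairs) for item, pairs in idlists.items()}

def support(idlist):
    """Support = number of distinct sequences (sids) in the id-list."""
    return len({sid for sid, _ in idlist})

idlists = vertical_idlists(DATABASE)
min_sup = 2  # assumed threshold
frequent = {item: ids for item, ids in idlists.items() if support(ids) >= min_sup}
for item in sorted(frequent):
    print(item, support(frequent[item]), frequent[item])
```

With this data and threshold, only A, B, D and F survive as frequent 1-sequences.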




 D->BF->A
    Step 1: D->B




    Step 2: D->BF




 D->BF->A
    Step 3 : D->BF->A




 Not space-efficient
    Solution: keep only 2 columns, (sid, eid), for each sequence
    eid: id of the event that contains the sequence's last item


 D->BF->A (space-efficient id-list joins)

    D->B           D->BF          D->BF->A
    SID  EID       SID  EID       SID  EID
    1    15        1    20        1    25
    1    20        4    20        4    25
    4    20

 Complete lattice representation




 Decomposing the lattice => smaller pieces
 that can be solved independently

 Equivalence classes
 Two sequences are in the same class (Θk) if they
  share a common prefix of length k
 Example
   k=1 : Θ1 -> {[A],[B],[D],[F]}
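
Grouping by prefix can be sketched in Python. The assumptions here are illustrative: a sequence is a tuple of events, each event a tuple of sorted single-letter items, and the prefix of length k is taken as the first k items. The input list and the helper names are hypothetical.

```python
from collections import defaultdict

def parse(text):
    """Parse "D->BF" into (("D",), ("B", "F")): a tuple of events."""
    return tuple(tuple(sorted(event)) for event in text.split("->"))

def prefix(sequence, k):
    """Return the first k items of the sequence, keeping event boundaries."""
    result, remaining = [], k
    for event in sequence:
        if remaining == 0:
            break
        result.append(event[:remaining])
        remaining -= len(event[:remaining])
    return tuple(result)

def equivalence_classes(sequences, k):
    """Group sequences by their common k-length prefix."""
    classes = defaultdict(list)
    for seq in sequences:
        classes[prefix(seq, k)].append(seq)
    return dict(classes)

def show(sequence):
    return "->".join("".join(event) for event in sequence)

# Hypothetical input: some sequences over the items A, B, D, F
seqs = [parse(s) for s in ["AB", "AF", "B->A", "BF", "D->A", "D->B", "D->BF", "F->A"]]
classes = equivalence_classes(seqs, 1)
for pre, members in classes.items():
    print(f"[{show(pre)}]:", ", ".join(show(m) for m in members))
```

Each class depends only on its own members, so the classes [A], [B], [D] and [F] can be processed independently.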




 SPADE(min_sup, D)
   // min_sup: minimum support
   // D: initial dataset
   F1 <- {frequent items, i.e. 1-sequences}
   F2 <- {frequent 2-sequences}
   E <- {equivalence classes [X] of Θ1}
   for all [X] in E
     Enumerate_frequent_seq([X], min_sup)




 Enumerate_frequent_seq(S, min_sup)
   for all Ai in S
     Ti <- {}
     for all Aj in S, with j ≥ i
       R <- Ai v Aj   // join of the two id-lists
       if R satisfies min_sup
         Ti <- Ti U {R}
     end
     if depth-first search
       Enumerate_frequent_seq(Ti, min_sup)   // DFS
   end
   if breadth-first search
     for all non-empty Ti
       Enumerate_frequent_seq(Ti, min_sup)   // BFS
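
The enumeration can be sketched in Python, depth-first only. This is an illustration under stated assumptions, not a reference implementation:
 the toy database is hypothetical
 all frequent items are treated as one root class instead of computing F2 separately
 the inner loop visits every pair (Ai, Aj) and lets the join rules decide which candidates exist, rather than restricting to j ≥ i
 all function names are illustrative

```python
from collections import defaultdict

# Hypothetical toy database: (sid, eid, items of the event)
DATABASE = [
    (1, 10, "CD"), (1, 15, "ABC"), (1, 20, "ABF"), (1, 25, "ACDF"),
    (2, 15, "ABF"), (2, 20, "E"),
    (3, 10, "ABF"),
    (4, 10, "DGH"), (4, 20, "BF"), (4, 25, "AGH"),
]

def support(idlist):
    """Number of distinct sequences (sids) in an id-list."""
    return len({sid for sid, _ in idlist})

def equality_join(a, b):
    """Both patterns end in the same event."""
    return sorted(set(a) & set(b))

def temporal_join(a, b):
    """Pattern b ends strictly after pattern a in the same sequence."""
    earliest = {}
    for sid, eid in a:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(s, e) for s, e in b if s in earliest and earliest[s] < e]

def enumerate_frequent_seq(atoms, min_sup, found):
    """Depth-first enumeration of one equivalence class.
    Each atom is (pattern, idlist, is_event_atom); a pattern is a tuple of
    events, each event a tuple of sorted items."""
    for pat_i, ids_i, ev_i in atoms:
        x = pat_i[-1][-1]  # last item of Ai
        t_i = []
        for pat_j, ids_j, ev_j in atoms:
            y = pat_j[-1][-1]  # last item of Aj
            candidates = []
            if ev_i == ev_j and y > x:  # y joins the last event of Ai
                candidates.append((pat_i[:-1] + (pat_i[-1] + (y,),),
                                   equality_join(ids_i, ids_j), True))
            if not ev_j:                # y follows Ai as a new event
                candidates.append((pat_i + ((y,),),
                                   temporal_join(ids_i, ids_j), False))
            for pattern, ids, is_event in candidates:
                if support(ids) >= min_sup:
                    found[pattern] = support(ids)
                    t_i.append((pattern, ids, is_event))
        enumerate_frequent_seq(t_i, min_sup, found)  # DFS

def spade(database, min_sup):
    """Build item id-lists, keep the frequent items, then enumerate."""
    idlists = defaultdict(list)
    for sid, eid, items in database:
        for item in items:
            idlists[item].append((sid, eid))
    found, atoms = {}, []
    for item in sorted(idlists):
        ids = sorted(idlists[item])
        if support(ids) >= min_sup:
            found[((item,),)] = support(ids)
            atoms.append((((item,),), ids, False))
    enumerate_frequent_seq(atoms, min_sup, found)
    return found

def show(pattern):
    return "->".join("".join(event) for event in pattern)

result = spade(DATABASE, min_sup=2)
for pattern in sorted(result, key=lambda p: (sum(map(len, p)), show(p))):
    print(show(pattern), result[pattern])
```

A breadth-first variant would collect every non-empty Ti first and recurse only after the outer loop finishes, at the cost of holding more id-lists in memory at once.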


 The R Project for Statistical Computing
    Similar to the S language, which was developed at
     Bell Laboratories (formerly AT&T, now Lucent
     Technologies) by John Chambers and colleagues

    R is a different implementation of the S language

    arulesSequences package




