Sequence Mining Automata: a New Technique for Mining Frequent Sequences Under Regular Expressions Roberto Trasarti, Francesco Bonchi, Bart Goethals
Problem Definition (1): Given a database of sequences D, the support of a sequence S ∈ Σ* is the number of sequences in D that are supersequences of S: sup(S) = |{T ∈ D | S ⊑ T}|. Given a regular expression R, a sequence S is valid if it can be generated by R.
[Figure: example database of three sequences with minimum support 3 and RE A*BC*; one valid subsequence has support 3, another has support 2.]
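The two definitions above can be sketched in a few lines of Python. This is only an illustration: the helper names (`is_subsequence`, `support`, `is_valid`) and the three-sequence database are hypothetical and are not taken from the slides, and sequences are modelled as plain strings of single-character items.

```python
import re

def is_subsequence(s, t):
    """True if s can be obtained from t by deleting zero or more symbols."""
    remaining = iter(t)
    # Each `in` test consumes the iterator up to the match, so order is respected.
    return all(symbol in remaining for symbol in s)

def support(s, database):
    """Number of sequences in the database that are supersequences of s."""
    return sum(1 for t in database if is_subsequence(s, t))

def is_valid(s, regex):
    """A sequence is valid if the whole sequence is generated by the regex."""
    return re.fullmatch(regex, s) is not None

# Hypothetical database, chosen only for illustration.
database = ["AABBC", "ABCCD", "CBAAB"]
regex = "A*BC*"

for candidate in ["AB", "ABC", "BA"]:
    print(candidate, support(candidate, database), is_valid(candidate, regex))
```

With a minimum support of 3, only candidates that are both valid and contained in all three sequences would be reported.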
Previous approaches and our contribution: Previous approaches [1,2,3] solve the problem by focusing on its search space, exploiting in different ways the pruning power of the regular expression R over unpromising patterns. Our solution instead focuses on the input dataset and the given regular expression: while reading the input database, we produce, for each sequence in the database, all and only the valid patterns contained in that sequence.
[1] H. Albert-Lorincz and J.-F. Boulicaut. Mining frequent sequential patterns under regular expressions: A highly adaptive strategy for pushing constraints. In Proc. of SDM’03.
[2] M. N. Garofalakis, R. Rastogi, and K. Shim. Spirit: Sequential pattern mining with regular expression constraints. In Proc. of VLDB’99.
[3] J. Pei, J. Han, and W. Wang. Mining sequential patterns with constraints in large databases. In Proc. of CIKM’02.
Sequence Mining Automata (1): Our Sequence Mining Automaton (SMA) is a specialized kind of Petri net, which can be constructed from a DFA by transforming each edge of the DFA into a transition with two arcs: one from its input place and one to its output place. Moreover, it has the following peculiarities:
• Transitions do not consume tokens
• Parallel execution
• External signal
The initial marking consists of only the token representing the empty sequence ε in the starting place.
Example RE: A*B(B|C)D*E
Sequence Mining Automata (2): Each transition applies a process that is activated only if the external signal is equal to the label of the corresponding edge. This process produces a new set of tokens in the destination place. Example RE: A*B(B|C)D*E
Sequence Mining Automata (3, Example): Given R ≡ A*B(B|C)D*E and S ≡ ACDBFAEBCFDE.
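One way to read the mechanism on this example is the small Python sketch below. It is an interpretation of the slides, not the authors' implementation: the DFA for A*B(B|C)D*E is built by hand, the place numbering and the names `TRANSITIONS`, `START_PLACE`, `ACCEPTING_PLACES` and `run_sma` are hypothetical, and tokens are modelled as strings. Each input symbol acts as the external signal; every transition whose label matches it fires in parallel on the marking as it was before the signal, and tokens are copied rather than consumed.

```python
# Hypothetical hand-built DFA for R = A*B(B|C)D*E, one (source, label, destination)
# triple per edge. The state numbering is an assumption made for this sketch.
TRANSITIONS = [
    (0, "A", 0),
    (0, "B", 1),
    (1, "B", 2),
    (1, "C", 2),
    (2, "D", 2),
    (2, "E", 3),
]
START_PLACE = 0
ACCEPTING_PLACES = {3}

def run_sma(sequence):
    """Return the set of tokens that reach an accepting place while reading sequence."""
    places = {place: set() for src, _, dst in TRANSITIONS for place in (src, dst)}
    places[START_PLACE].add("")  # initial marking: the empty sequence
    for signal in sequence:
        # Parallel execution: all transitions read the marking as it was
        # before this signal arrived.
        before = {place: frozenset(tokens) for place, tokens in places.items()}
        for src, label, dst in TRANSITIONS:
            if label == signal:
                # Tokens are not consumed: the source place keeps its tokens and
                # extended copies are added to the destination place.
                places[dst].update(token + signal for token in before[src])
    return set().union(*(places[p] for p in ACCEPTING_PLACES))

patterns = run_sma("ACDBFAEBCFDE")
print(sorted(patterns))
```

Under these assumptions the accepting place ends up holding ten tokens for S, for example BCE and AABCDE, each of which is both a subsequence of S and a string generated by R.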
One-Pass Solution (SMA-1P) and Full-Cut (SMA-FC): SMA-1P simply applies the SMA to each transaction and, at the end, computes the support of each extracted sequence, filtering by the support threshold. The support threshold is not used during generation: we compute all the sequences in the dataset that are valid w.r.t. the RE.
For SMA-FC: given an SMA, a valid set of cuts is a partition p1, . . . , pn of the places of the SMA such that there is no path from a place in pj to a place in pi if j > i. For each cut we apply SMA-1P to the whole DB. At the end of the i-th scan we obtain intermediate information about frequent patterns, which can be used in subsequent scans to remove the infrequent tokens.
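The one-pass variant can be sketched as follows, again as an illustration under assumptions rather than the authors' code. The hand-built DFA for A*B(B|C)D*E, the function names `valid_patterns` and `sma_one_pass`, and the small database are all hypothetical. The sketch covers only SMA-1P; the cut-based pruning of SMA-FC is not shown.

```python
from collections import Counter

# Hypothetical hand-built DFA for A*B(B|C)D*E (state numbering is an assumption).
TRANSITIONS = [(0, "A", 0), (0, "B", 1), (1, "B", 2),
               (1, "C", 2), (2, "D", 2), (2, "E", 3)]
START_PLACE = 0
ACCEPTING_PLACES = {3}

def valid_patterns(sequence):
    """All valid patterns contained in one sequence, as produced by the SMA sketch."""
    places = {place: set() for src, _, dst in TRANSITIONS for place in (src, dst)}
    places[START_PLACE].add("")
    for signal in sequence:
        before = {place: frozenset(tokens) for place, tokens in places.items()}
        for src, label, dst in TRANSITIONS:
            if label == signal:
                places[dst].update(token + signal for token in before[src])
    return set().union(*(places[p] for p in ACCEPTING_PLACES))

def sma_one_pass(database, min_support):
    """Single scan: generate every valid pattern, then filter by support at the end."""
    counts = Counter()
    for sequence in database:
        # valid_patterns returns a set, so each pattern counts once per sequence.
        counts.update(valid_patterns(sequence))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}

# Hypothetical database, chosen only for illustration.
database = ["ABCE", "ABBDE", "BCE"]
frequent = sma_one_pass(database, min_support=2)
print(frequent)
```

Because the threshold is applied only after the scan, every valid pattern of every sequence is materialised, which is the cost that the multi-scan SMA-FC variant aims to reduce by discarding infrequent tokens between scans.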
Experiments (Synthetic Data): (D=dataset size, N=number of items, C=average length)
Experiments (Mobility data): From San Jose to San Francisco and back – via CA-101 (west-bound of the bay), i.e., passing through San Mateo (cell H9 of our map); or via I-880 (east-bound of the bay), i.e., passing through Hayward (cell J8 of our map).
Conclusions: We have introduced “Sequence Mining Automata”, a new mechanism for mining frequent sequences under regular expressions. Around this basic mechanism we built a family of algorithms embedding different techniques. The efficiency of our proposal has been demonstrated empirically on synthetic and mobility data. The SMA is a very simple and fundamental mechanism, opening the door to many possible extensions.
