DFA Minimization in Map-Reduce
Gösta Grahne, Shahab Harrafi, Iraj Hedayati, Ali Moallemi
Department of Computer Science and Software Engineering, Concordia University
Introduction
DFA: $A = (Q, \Sigma, \delta, q_s, F)$, where $Q$ is the set of states, $\Sigma$ the alphabet, $\delta$ the transition function, $q_s$ the start state, and $F$ the set of final states.
DFA minimization: the process of finding a DFA that is equivalent to a given one and has the minimum number of states.
Hopcroft: Hopcroft's algorithm is considered superior due to its running time of $O(n \log n)$.
Figure: Hopcroft minimization method (two state diagrams over states p1-p4 and q1-q4 with transitions labeled a; diagrams not reproduced)
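As an illustration of the splitter-based refinement that Hopcroft's method relies on, here is a minimal sequential sketch in Python. It is not the authors' Hopcroft-MR implementation, and it omits the bookkeeping needed to reach the $O(n \log n)$ bound, since it scans every block for each splitter. The function name `hopcroft_partition` and the dictionary encoding of the transition function are assumptions made for this example.

```python
from collections import defaultdict

def hopcroft_partition(states, alphabet, delta, finals):
    """Split `states` into equivalence classes by refining against splitters."""
    states, finals = set(states), set(finals)
    # inverse[a][q] = all states p with delta[(p, a)] == q
    inverse = {a: defaultdict(set) for a in alphabet}
    for (p, a), q in delta.items():
        inverse[a][q].add(p)
    # Start from the final / non-final split, dropping an empty side.
    partition = {frozenset(b) for b in (finals & states, states - finals) if b}
    worklist = set(partition)
    while worklist:
        splitter = worklist.pop()
        for a in alphabet:
            # States that reach the splitter on symbol a.
            preds = set()
            for q in splitter:
                preds |= inverse[a][q]
            for block in list(partition):
                inside, outside = block & preds, block - preds
                if not inside or not outside:
                    continue  # the splitter does not cut this block
                partition.remove(block)
                partition.update((inside, outside))
                if block in worklist:
                    worklist.remove(block)
                    worklist.update((inside, outside))
                else:
                    # Hopcroft's trick: only the smaller half is queued.
                    worklist.add(min(inside, outside, key=len))
    return partition

# Hypothetical 3-state DFA over {a, b}; states 1 and 2 are both final.
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 2, (1, "b"): 0,
         (2, "a"): 2, (2, "b"): 0}
classes = hopcroft_partition(range(3), "ab", delta, {1, 2})
```

In this example states 1 and 2 behave identically, so they end up in the same class and the minimal DFA has two states.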
Moore: Iteratively computes the equivalence class of each state as
$p \equiv_i q \iff p \equiv_{i-1} q \,\land\, \forall a \in \Sigma:\ \delta(p, a) \equiv_{i-1} \delta(q, a)$
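The refinement rule above can be sketched sequentially as follows. This is a minimal illustration, not the authors' Moore-MR job; the function name `moore_classes`, the signature-renumbering trick, and the example automaton are assumptions. Two states share a class in round i exactly when they shared a class in round i-1 and their successors did too, which the code captures by comparing signatures.

```python
def moore_classes(states, alphabet, delta, finals):
    """Return a dict mapping each state to the id of its equivalence class."""
    states, alphabet = list(states), sorted(alphabet)
    # Round 0: final and non-final states are distinguishable.
    cls = {q: int(q in finals) for q in states}
    while True:
        # Signature of p: its own class plus the classes of its successors.
        sig = {p: (cls[p],) + tuple(cls[delta[(p, a)]] for a in alphabet)
               for p in states}
        # Give each distinct signature a fresh class id.
        ids = {}
        new_cls = {p: ids.setdefault(sig[p], len(ids)) for p in states}
        # Classes only ever split, so an unchanged count means a fixpoint.
        if len(ids) == len(set(cls.values())):
            return new_cls
        cls = new_cls

# Hypothetical 4-state cycle over {a}; states 0 and 2 are final.
cycle = {(i, "a"): (i + 1) % 4 for i in range(4)}
classes = moore_classes(range(4), "a", cycle, {0, 2})
```

Here states 0 and 2 fall into one class and states 1 and 3 into another, so the cycle of length four collapses to a cycle of length two.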
Map-Reduce: a parallel programming model that can work over large clusters of commodity computers.
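To make the programming model concrete, the sketch below simulates a single map, shuffle, and reduce round in memory and uses it to compute, for every state, its current class together with the classes of its successors. It is only a toy under stated assumptions: the helper `run_map_reduce`, the record layout `(p, a, q, class of p, class of q)`, and the example automaton are all hypothetical, and a real job would need an extra step to attach class labels to the transition records.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Simulate one round: map every record, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle" step
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def mapper(record):
    # Route each transition to the reducer responsible for its source state.
    p, a, q, cls_p, cls_q = record
    yield p, (a, cls_p, cls_q)

def reducer(p, values):
    # Sort by symbol so that signatures of different states are comparable.
    values = sorted(values)
    cls_p = values[0][1]
    yield p, (cls_p,) + tuple(cls_q for _, _, cls_q in values)

# Hypothetical 3-state DFA with current classes: non-final = 0, final = 1.
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 2, (1, "b"): 0,
         (2, "a"): 2, (2, "b"): 0}
cls = {0: 0, 1: 1, 2: 1}
records = [(p, a, q, cls[p], cls[q]) for (p, a), q in delta.items()]
signatures = dict(run_map_reduce(records, mapper, reducer))
```

States with equal signatures stay in the same class in the next round, so repeating such rounds until no class splits gives an iterative, round-based computation.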
Challenges
Huge amount of data
Complex graph-based structure
Iterative problem
Moore’s algorithm in MapReduce
Hopcroft’s algorithm in MapReduce
Communication Cost of the Algorithms
Communication cost can be calculated as:
Number of rounds × (Replication rate × Input size + Output size)
Number of rounds: O(n)
Replication rate: O(1)
Input size = Output size
Moore-MR: The record size of the output of the first job is $\Theta(k \log n)$. Thus the communication cost of each round is $\Theta(k^2 n \log n)$, and the total communication cost is $O(k^2 n^2 \log n)$.
Hopcroft-MR: There are $O(n \log n)$ updates in parallel execution at each round. Thus it requires $O(k n^2 \log n)$ bits of communication.
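The Moore-MR bound can be traced through the cost formula step by step. The derivation below is a sketch that assumes $n$ is the number of states, $k$ is the alphabet size, and each round carries one record per transition, that is $kn$ records; the poster states only the resulting bounds, so the record count is an assumption. Here $r$ denotes the replication rate, and $|I|$ and $|O|$ the input and output sizes.

```latex
\begin{align*}
\text{cost per round} &= r \cdot |I| + |O|, \qquad r = O(1), \quad |I| = |O| \\
|I| &= \underbrace{kn}_{\text{records}} \cdot \underbrace{\Theta(k \log n)}_{\text{record size}}
     = \Theta(k^2 n \log n) \\
\text{total cost} &= \underbrace{O(n)}_{\text{rounds}} \cdot
     \bigl( O(1) \cdot \Theta(k^2 n \log n) + \Theta(k^2 n \log n) \bigr)
     = O(k^2 n^2 \log n)
\end{align*}
```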
Experimental Results
Findings
Figure: Evenly distributed DFA
Figure: Effect of number of rounds
Figure: Effect of number of alphabet symbols
Figure: Effect of skewness
Conclusion
Hopcroft-MR outperforms Moore-MR:
in communication cost when the cardinality of the alphabet is at least 16,
in wall-clock time when the cardinality of the alphabet is at least 32,
in communication cost when the number of rounds is more than 128.
Both algorithms are equally sensitive to skewness in the input data.
Future work:
There is potential to reduce skew sensitivity in Moore-MR.
Investigate the average communication cost.
Reducer capacity vs. number of rounds.
Presented at the ACM SIGMOD Beyond MR Workshop in San Francisco, CA, July 2016
{grahne, s_harraf, h_iraj, moa_ali}@encs.concordia.ca

