This document describes the ClustBigFIM algorithm for frequent itemset mining of big data using pre-processing based on the MapReduce framework. The ClustBigFIM algorithm first applies k-means clustering to generate clusters from large datasets. It then mines frequent itemsets from the generated clusters using the Apriori and Eclat algorithms within the MapReduce programming model. Experimental results on several datasets show that the ClustBigFIM algorithm increases execution efficiency compared to the BigFIM algorithm by applying k-means clustering as a pre-processing step before frequent itemset mining.
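The cluster-then-mine idea described above can be illustrated with a minimal single-machine sketch. It is not the ClustBigFIM implementation: there is no MapReduce here, only Eclat is shown for the mining step, and the function names (`kmeans`, `eclat`, `cluster_then_mine`), the binary item encoding, and the per-cluster support threshold are all assumptions made for illustration.

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def eclat(transactions, min_support):
    """Eclat: depth-first itemset search using transaction-id set intersections."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the remaining items to grow the itemset.
            extend(itemset, [(o, tids & otids) for o, otids in candidates[i + 1:]])

    extend((), sorted(tidsets.items()))
    return frequent

def cluster_then_mine(transactions, items, k, min_support):
    """Encode transactions as 0/1 vectors, cluster them, then mine each cluster."""
    vectors = [[1.0 if it in t else 0.0 for it in items] for t in transactions]
    labels = kmeans(vectors, k)
    return {c: eclat([t for t, l in zip(transactions, labels) if l == c], min_support)
            for c in range(k)}
```

Note that mining each cluster separately yields itemsets that are frequent within that cluster, which is not the same as being frequent in the whole dataset.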
Scalable frequent itemset mining using heterogeneous computing par apriori a...ijdpsjournal
Association rule mining is one of the dominant tasks of data mining. It is concerned with finding frequent itemsets in large volumes of data in order to produce summarized models of mined rules. These models are extended to generate association rules in applications such as e-commerce, bioinformatics, associations between image contents and non-image features, and analysis of the effectiveness of sales in the retail industry. In rapidly growing databases, the major challenge is mining frequent itemsets in a very short period of time. As the data grows, the time taken to process it should remain almost constant. Because high-performance computing offers many processors and many cores, consistent runtime performance for association rule mining on such very large databases can be achieved. We must therefore rely on high-performance parallel and/or distributed computing. In the literature survey, we studied the sequential Apriori algorithms and identified the fundamental problems in sequential and parallel environments. In our proposed ParApriori, we present a parallel algorithm for GPGPU and analyze the results of our GPU parallel algorithm. We find that the proposed algorithm improves computing time and consistency of performance under increasing load. The empirical analysis also shows that the algorithm's efficiency and scalability are verified over the series of datasets tested on a many-core GPU platform.
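As a rough illustration of the Apriori scheme this abstract builds on, the sketch below runs the level-wise join and prune steps and splits support counting across data chunks. It is plain Python rather than GPU code, and it is not ParApriori itself; the chunking scheme and the names `count_support` and `apriori` are assumptions.

```python
from collections import Counter
from itertools import combinations

def count_support(chunk, candidates):
    """Count how many transactions in one data chunk contain each candidate."""
    counts = Counter()
    for t in chunk:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts

def apriori(transactions, min_support, n_chunks=2):
    transactions = [frozenset(t) for t in transactions]
    # Split the data; each chunk could be counted by a separate worker.
    chunks = [transactions[i::n_chunks] for i in range(n_chunks)]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        total = Counter()
        for chunk in chunks:          # independent partial counts, then a merge
            total.update(count_support(chunk, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent
```

The partial counts are independent of each other, which is the property a parallel implementation would exploit; here they are simply computed in a loop.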
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
The ability to mine and extract useful information automatically from large datasets has been a common concern for organizations with large datasets over the last few decades. Data on the internet is growing rapidly, and consequently the capacity to collect and store very large data is increasing significantly. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets. However, the development of accurate and fast data classification algorithms for very large scale datasets is still a challenge. In this paper, various algorithms and techniques, especially an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving the minimum sum-of-squares clustering problems in very large datasets. This research also develops an accurate and real-time L2-DC algorithm based on the incremental approach to solve the minimum
Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, and the reducers perform combination operations by constructing small ultrametric trees and then mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We develop FiDoop-HD, an extension of FiDoop, to speed up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
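The abstract does not spell out what each of the three jobs computes, so the following is only a generic sketch of a MapReduce-style pass that counts items and keeps the frequent ones, a common first step in MapReduce itemset miners. It runs in plain Python with no Hadoop involved, and the function names are hypothetical.

```python
from collections import defaultdict

def map_items(transaction):
    """Mapper: emit (item, 1) for every distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_count(item, values, min_support):
    """Reducer: sum the counts and keep the item only if it is frequent."""
    total = sum(values)
    return (item, total) if total >= min_support else None

def frequent_items(transactions, min_support):
    pairs = [p for t in transactions for p in map_items(t)]
    results = (reduce_count(k, v, min_support) for k, v in shuffle(pairs).items())
    return dict(r for r in results if r is not None)
```

Because each mapper call looks at a single transaction and each reducer call at a single key, both phases can be distributed without coordination.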
Job Scheduling on the Grid Environment using Max-Min Firefly AlgorithmEditor IJCATR
Grid computing is the next generation of distributed systems. Its goal is to create a powerful, large, autonomous virtual computer built from countless heterogeneous resources for the purpose of sharing resources. Scheduling is one of the main steps in exploiting the capabilities of emerging computing systems such as the grid. Because of heterogeneous resources, scheduling jobs in computational grids is known to be an NP-complete problem. Grid resources belong to different management domains, and each applies different management policies. Since the grid is heterogeneous and dynamic by nature, techniques used in traditional systems cannot be applied to grid scheduling, so new methods must be found. This paper proposes a new algorithm that combines the firefly algorithm with the Max-Min algorithm for scheduling jobs on the grid. The firefly algorithm is a new technique based on swarm behavior, inspired by the social behavior of fireflies in nature. Fireflies move through the search space of the problem to find optimal or near-optimal solutions. The goals of this paper are to minimize the makespan and the flowtime of completing jobs simultaneously. Experiments and simulation results show that the proposed method is more efficient than the other compared algorithms.
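For readers unfamiliar with the Max-Min heuristic named in this abstract, here is a minimal sketch of the classic version on its own, without the firefly component, which the abstract does not describe in enough detail to reproduce. The expected-time-to-compute matrix `etc` and the tie-breaking rule are assumptions.

```python
def max_min_schedule(etc):
    """etc[t][m] is the expected time to compute task t on machine m.
    Returns (assignment dict task -> machine, makespan)."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unassigned = set(range(len(etc)))
    assignment = {}
    while unassigned:
        # For each task, find the machine giving its minimum completion time.
        best = {}
        for t in unassigned:
            m = min(range(n_machines), key=lambda j: ready[j] + etc[t][j])
            best[t] = (ready[m] + etc[t][m], m)
        # Max-Min: schedule the task whose minimum completion time is largest
        # (ties broken in favor of the lower task index).
        task = max(unassigned, key=lambda t: (best[t][0], -t))
        finish, machine = best[task]
        assignment[task] = machine
        ready[machine] = finish
        unassigned.remove(task)
    return assignment, max(ready)

# Three tasks on two machines.
assignment, makespan = max_min_schedule([[2, 4], [6, 3], [5, 5]])
```

Scheduling the longest tasks first tends to keep them from being stranded at the end of the schedule, which is why Max-Min is often used as a baseline for makespan.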
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...IAEME Publication
This paper presents an MDL-based reduction of frequent patterns. The ideal outcome of any pattern mining process is to gain new insights into the data, which also requires eliminating non-interesting patterns that describe noise. The major problem in frequent pattern mining is identifying the interesting patterns. Instead of performing association rule mining on all frequent itemsets, it is feasible to select a subset of frequent itemsets and perform the mining task on it. Selecting a small set of frequent itemsets from a large number of interesting ones is a difficult task. In our approach, an MDL-based algorithm is used to reduce the number of frequent itemsets used for association rule mining.
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIijtsrd
Matrix multiplication is a concept used in technology applications such as digital image processing, digital signal processing, and graph problem solving. Multiplication of huge matrices requires a lot of computing time because its complexity is $O(n^3)$. Because most engineering science applications require higher computational throughput in minimum time, many sequential and parallel algorithms have been developed. In this paper, methods of matrix multiplication are selected, implemented, and analyzed. A performance analysis is carried out, and some recommendations are given for using the OpenMP and MPI methods of parallel computing. Adamu Abubakar I | Oyku A | Mehmet K | Amina M. Tako "Comprehensive Performance Evaluation on Multiplication of Matrices using MPI"
Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-2 , February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30015.pdf
Paper Url : https://www.ijtsrd.com/engineering/electrical-engineering/30015/comprehensive-performance-evaluation-on-multiplication-of-matrices-using-mpi/adamu-abubakar-i
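The cubic cost mentioned in the matrix multiplication abstract comes from the triple loop below. The sketch also shows a row-block split of the kind a scatter/gather parallel program might use; it makes no MPI or OpenMP calls, and the partitioning scheme and function names are assumptions rather than the paper's method.

```python
def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B.
    Over all rows of an n x n problem this is the O(n^3) triple loop."""
    n_inner, n_cols = len(b), len(b[0])
    out = []
    for row in a_rows:
        out.append([sum(row[k] * b[k][j] for k in range(n_inner))
                    for j in range(n_cols)])
    return out

def block_row_matmul(a, b, n_workers):
    """Split A into contiguous row blocks, multiply each block, and concatenate."""
    size = -(-len(a) // n_workers)       # ceiling division
    blocks = [a[i:i + size] for i in range(0, len(a), size)]
    result = []
    for block in blocks:                 # each iteration could run in its own process
        result.extend(matmul_rows(block, b))
    return result
```

Each row block needs all of B but only its own rows of A, so the blocks can be computed independently and gathered in order.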
Scalable Rough C-Means clustering using Firefly algorithm
Abhilash Namdev and B.K. Tripathy
Significance of Embedded Systems to IoT
P. R. S. M. Lakshmi, P. Lakshmi Narayanamma and K. Santhi Sri
Cognitive Abilities, Information Literacy Knowledge and Retrieval Skills of Undergraduates: A Comparison of Public and Private Universities in Nigeria
Janet O. Adekannbi and Testimony Morenike Oluwayinka
Risk Assessment in Constructing Horseshoe Vault Tunnels using Fuzzy Technique
Erfan Shafaghat and Mostafa Yousefi Rad
Evaluating the Adoption of Deductive Database Technology in Augmenting Criminal Intelligence in Zimbabwe: Case of Zimbabwe Republic Police
Mahlangu Gilbert, Furusa Samuel Simbarashe, Chikonye Musafare and Mugoniwa Beauty
Analysis of Petrol Pumps Reachability in Anand District of Gujarat
Nidhi Arora
Fault-Tolerance Aware Multi Objective Scheduling Algorithm for Task Schedulin...csandit
A Computational Grid (CG) creates a large, heterogeneous, distributed paradigm for managing and executing computationally intensive applications. In grid scheduling, tasks are assigned to suitable processors in the grid system for execution, taking into account the execution policy and the optimization objectives. This paper considers and tries to optimize the makespan and the fault tolerance of the computational nodes of the grid, two important parameters for task execution. Because grid scheduling is considered NP-hard, meta-heuristic evolutionary techniques are often used to find a solution. We propose NSGA II for this purpose. The performance of the proposed Fault Tolerance Aware NSGA II (FTNSGA II) was estimated with a program written in Matlab. The simulation results evaluate the performance of the proposed algorithm, and a comparison with the existing Min-Min and Max-Min algorithms demonstrates the effectiveness of the model.
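A central building block of NSGA II is sorting candidate solutions into Pareto fronts. The sketch below shows that step alone for two minimized objectives, using a simple quadratic-time peeling loop instead of the bookkeeping of the original fast non-dominated sort. The sample schedules, given as (makespan, failure probability) pairs, are made-up illustrative values, not results from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.
    All objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(points):
    """Peel off successive Pareto fronts; returns lists of indices into points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(points[j], points[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical schedules as (makespan, failure probability) pairs.
schedules = [(10, 0.30), (12, 0.10), (11, 0.35), (15, 0.05), (13, 0.20)]
fronts = non_dominated_fronts(schedules)
```

The first front holds the schedules for which no other schedule is both faster and more reliable; NSGA II additionally ranks solutions within a front by crowding distance, which is omitted here.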
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index, and execution time.
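Of the performance measures listed in this abstract, the Silhouette validity index is the least self-explanatory, so a small sketch may help. It is written in Python rather than the MATLAB used in the paper, assumes Euclidean distance and at least two clusters, and uses the common convention of scoring singleton clusters as 0.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette value over all points (assumes at least two clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)           # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, while values near or below 0 suggest that points sit closer to another cluster than to their own.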
Comparative study of optimization algorithms on convolutional network for aut...IJECEIAES
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications. Deep learning, and in particular convolutional networks, have become a fundamental tool in the solution of problems related to environment identification, path planning, vehicle behavior, and motion control. In this paper, we perform a comparative study of the most used optimization strategies on the convolutional architecture residual neural network (ResNet) for an autonomous driving problem as a previous step to the development of an intelligent sensor. This sensor, part of our research in reactive systems for autonomous vehicles, aims to become a system for direct mapping of sensory information to control actions from real-time images of the environment. The optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated adaptive moment estimation (Nadam), and follow the regularized leader (Ftrl). The training of the deep model is evaluated in terms of convergence, accuracy, recall, and F1-score metrics. Preliminary results show better performance of the deep network when using the SGD function as an optimizer, while the Ftrl function presents the poorest performance.
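To make the difference between two of the listed optimizers concrete, the following sketch applies the standard SGD and Adam update rules to a toy one-dimensional loss. The loss f(x) = x^2, the starting point, and the hyperparameters are assumptions chosen for illustration; nothing here reproduces the paper's ResNet experiments.

```python
from math import sqrt

def grad(x):
    """Gradient of the toy loss f(x) = x**2."""
    return 2.0 * x

def sgd(x, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient with a fixed rate."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam: scale each step by running estimates of the gradient's moments."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (sqrt(v_hat) + eps)
    return x
```

On this toy problem both rules approach the minimum at 0; which optimizer works best on a real network depends on the model and data, as the comparison in the abstract suggests.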
An Improved Differential Evolution Algorithm for Data Stream ClusteringIJECEIAES
Researchers have implemented a few algorithms for clustering data streams. Most of these algorithms require the number of clusters (K) to be fixed by the user based on the input data and kept fixed throughout the clustering process. Stream clustering has therefore faced difficulties in choosing K. In this paper, we propose an efficient approach for data stream clustering that adopts an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, powerful, and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compare our proposed method with the Genetic Algorithm and find it to be an efficient optimization algorithm. The proposed technique achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30%, and an F-measure of 88.60%.
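The abstract does not say how IDE differs from standard differential evolution, so the sketch below shows only the classic DE/rand/1/bin scheme as a point of reference. The objective function, bounds, and parameter values are assumptions; for clustering, the objective would be some cluster-validity measure over encoded cluster centers, which is not specified in the abstract.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimize f over a box using the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:     # binomial crossover
                    value = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)      # clip to the box
                else:
                    value = pop[i][j]
                trial.append(value)
            # Greedy selection: keep the trial only if it is not worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
```

Because selection is greedy, the best fitness in the population never gets worse from one generation to the next.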
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
Clustering is one of the most important applications of data mining. It has attracted the attention of researchers in statistics and machine learning and is used in many applications such as information retrieval, image processing, and social network analytics. It helps the user understand the similarity and dissimilarity between objects, and cluster analysis makes complex and large datasets easier to understand. Various researchers have analyzed different types of clustering algorithms. Kmeans is the most popular partitioning-based algorithm because it provides good results through accurate calculation on numerical data. However, Kmeans gives good results for numerical data only. Big data is a combination of numerical and categorical data. The Kprototype algorithm deals with numerical as well as categorical data by combining the distances calculated from numeric and categorical attributes. With the growth of data from social networking websites, business transactions, scientific calculations, and so on, there is a vast collection of structured, semi-structured, and unstructured data, so Kprototype needs to be optimized so that these varieties of data can be analyzed efficiently. In this work, the Kprototype algorithm is implemented on MapReduce. Experiments show that Kprototype implemented on MapReduce gives a better performance gain on multiple nodes than on a single node. CPU execution time and speedup are used as evaluation metrics for comparison. An intelligent splitter is proposed that splits mixed big data into numerical and categorical data. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
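The combined distance that lets Kprototype handle mixed records can be sketched as follows. This uses the usual formulation, squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches; the weight value, the record layout, and the function names are assumptions, and no MapReduce machinery is shown.

```python
def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus gamma times the
    number of mismatching categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    categorical = sum(a != b for a, b in zip(x_cat, c_cat))
    return numeric + gamma * categorical

def assign(records, prototypes, gamma=1.0):
    """Assign each (numeric, categorical) record to its nearest prototype."""
    return [min(range(len(prototypes)),
                key=lambda k: mixed_distance(num, cat, *prototypes[k], gamma=gamma))
            for num, cat in records]

# Each record and prototype is a (numeric part, categorical part) pair.
records = [([0.0], ["a"]), ([10.0], ["b"])]
prototypes = [([1.0], ["a"]), ([9.0], ["b"])]
labels = assign(records, prototypes)
```

Keeping the numeric and categorical parts in separate lists mirrors the idea of splitting mixed data before computing distances.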
Multi-threaded approach in generating frequent itemset of Apriori algorithm b...TELKOMNIKA JOURNAL
This research applies multi-threading and trie data structures to the support calculation problem in the Apriori algorithm. The support calculation results can be used to search for association rules in market basket analysis problems. The support calculation is a bottleneck and can delay the subsequent process. This work observed five multi-threaded models based on Flynn’s taxonomy, namely single process, multiple data (SPMD); multiple process, single data (MPSD); multiple process, multiple data (MPMD); double SPMD first variant; and double SPMD second variant, to shorten the processing time of the support calculation. In addition to the processing time, this work also considers the time difference between the multi-threaded models as the number of item variants increases. The experimental times show that the multi-threaded model that applies a double SPMD variant structure can perform almost three times faster than the models that apply the SPMD structure, the MPMD structure, and the combination of MPMD and SPMD, based on the time difference between the 5-itemsets and 10-itemsets experimental results.
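A minimal sketch of trie-based support counting with one shared trie and several threads, each working on its own slice of the transactions, is given below. It corresponds loosely to the "same procedure, different data" idea and is not one of the five models from the paper. The `"$"` end marker (assumed never to be an item), the chunking, and the function names are assumptions, and in CPython the global interpreter lock means threads illustrate the structure rather than deliver a CPU speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def build_trie(candidates):
    """Store sorted candidate itemsets in a nested-dict trie.
    The key '$' marks the end of an itemset."""
    root = {}
    for c in candidates:
        node = root
        for item in sorted(c):
            node = node.setdefault(item, {})
        node["$"] = tuple(sorted(c))
    return root

def count_chunk(trie, chunk):
    """Walk the trie once per transaction and count every candidate it contains."""
    counts = Counter()

    def walk(node, items, start):
        if "$" in node:
            counts[node["$"]] += 1
        for i in range(start, len(items)):
            child = node.get(items[i])
            if child is not None:
                walk(child, items, i + 1)

    for t in chunk:
        walk(trie, sorted(set(t)), 0)
    return counts

def parallel_support(candidates, transactions, n_threads=2):
    trie = build_trie(candidates)        # shared and read-only during counting
    chunks = [transactions[i::n_threads] for i in range(n_threads)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for partial in pool.map(lambda ch: count_chunk(trie, ch), chunks):
            total.update(partial)
    return total
```

Each thread keeps a private counter and the partial results are merged at the end, so no locking is needed while counting.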
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
In this paper, the delay computation method for the Common Subexpression Elimination algorithm is implemented on the Cyclotomic Fast Fourier Transform. The Common Subexpression Elimination algorithm is combined with the delay computing method; the combination is known as the Gate Level Delay Computation with Common Subexpression Elimination (GLDC-CSE) algorithm. Common subexpression elimination is an effective optimization method used to reduce the number of adders in the cyclotomic Fourier transform. The delay computing method is based on a delay matrix and is suitable for implementation on computers. The gate level delay computation method is used to find the critical path delay, and it is analyzed on various finite field elements. The presented algorithm is demonstrated through a case study of the Cyclotomic Fast Fourier Transform over a finite field. If the Cyclotomic Fast Fourier Transform is implemented directly, the system has high additive complexity. Using the GLDC-CSE algorithm on the cyclotomic fast Fourier transform reduces the additive complexity as well as the area and the area-delay product.
Study of Density Based Clustering Techniques on Data StreamsIJERA Editor
Data streams are generated by many real-time systems. Stream data is massive and changes quickly. Traditional methods are not efficient for stream data mining, so many methodologies have been developed for stream data processing. Many applications require data to be grouped based on its characteristics, so clustering is applied to data streams. For clustering non-linear data, density-based clustering is used. Clustering algorithms and methodologies are reviewed and evaluated as to whether they meet user requirements. A study of density-based clustering algorithms is presented here because of the advantages of density-based clustering over other clustering methods.
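As background for the density-based methods this study reviews, here is a compact sketch of DBSCAN, the classic static density-based algorithm. It processes a fixed batch of points rather than a stream, uses a brute-force neighbor search, and the parameter names `eps` and `min_pts` follow common usage; none of this is taken from the paper itself.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search; the point itself counts as its own neighbor.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels
```

Unlike k-means, the number of clusters is not fixed in advance and isolated points are reported as noise, which are two of the advantages usually cited for density-based clustering.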
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
Any abnormal activity can be assumed to be an anomalous intrusion. Several techniques and algorithms for anomaly detection have been discussed in the literature. In most cases, true positive and false positive parameters have been used to compare their performance. However, depending on the application, a wrong true positive or wrong false positive may have severe detrimental effects. This necessitates the inclusion of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset, KDD-CUP-99, is very large, which in turn requires a certain amount of pre-processing. Our work in this paper starts by enumerating the need for cost-sensitive analysis with some real-life examples. After discussing KDD-CUP-99, an approach is proposed for feature elimination and then feature selection that retains the more relevant features, reducing the number of features directly and the size of KDD-CUP-99 indirectly. From the reported literature, general methods for anomaly detection that perform best for different types of attacks are selected. These different classifiers are combined to form an ensemble. A cost-opportunistic technique is suggested for allocating relative weights to the classifiers in the ensemble to generate the final result. The cost sensitivity of true positive and false positive results is analyzed, and a method is proposed for selecting the elements of the cost sensitivity metrics to further improve the results and achieve better overall performance. The impact on the performance trade-off of incorporating cost sensitivity is discussed.
ENHANCING ENGLISH WRITING SKILLS THROUGH INTERNET-PLUS TOOLS IN THE PERSPECTI...ijfcstjournal
This investigation delves into incorporating a hybridized memetic strategy within the framework of English
composition pedagogy, leveraging Internet Plus resources. The study aims to provide an in-depth analysis
of how this method influences students’ writing competence, their perceptions of writing, and their
enthusiasm for English acquisition. Employing an explanatory research design that combines qualitative
and quantitative methods, the study collects data through surveys, interviews, and observations of students’
writing performance before and after the intervention. Findings demonstrate a beneficial impact of
integrating the memetic approach alongside Internet Plus tools on the writing aptitude of English as a
Foreign Language (EFL) learners. Students reported increased engagement with writing, attributing it to
the use of Internet Plus tools. They also expressed that the memetic approach facilitated a deeper
understanding of cultural and social contexts in writing. Furthermore, the findings highlight a significant
improvement in students’ writing skills following the intervention. This study provides significant insights
into the practical implementation of the memetic approach within English writing education, highlighting
the beneficial contribution of Internet Plus tools in enriching students' learning journeys.
A SURVEY TO REAL-TIME MESSAGE-ROUTING NETWORK SYSTEM WITH KLA MODELLINGijfcstjournal
Message routing over a network is one of the most fundamental concepts in communication, which requires simultaneous transmission of messages from a source to a destination. Real-time routing adds a timing constraint under which messages should be received within a specified time delay. This study involves scheduling, algorithm design, and graph theory, which are essential parts of the Computer Science (CS) discipline. Our goal is to investigate an innovative and efficient way to present these concepts in the context of CS education. In this paper, we explore the fundamental modelling of routing real-time messages on networks. We study whether it is possible to have an optimal on-line algorithm for the Arbitrary Directed Graph network topology. In addition, we examine the algorithmic complexity of message routing by breaking down the complex mathematical proofs into concrete, visual examples. Next, we explore the Unidirectional Ring topology to find the transmission’s “makespan”. Lastly, we propose the same network modelling through the technique of Kinesthetic Learning Activity (KLA). We analyse the data collected and present the results in a case study to evaluate the effectiveness of the KLA approach compared to the traditional teaching method.
Similar to CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG DATA USING PRE-PROCESSING BASED ON MAPREDUCE FRAMEWORK
Scalable Rough C-Means clustering using Firefly algorithm
Abhilash Namdev and B.K. Tripathy
Significance of Embedded Systems to IoT
P. R. S. M. Lakshmi, P. Lakshmi Narayanamma and K. Santhi Sri
Cognitive Abilities, Information Literacy Knowledge and Retrieval Skills of Undergraduates: A Comparison of Public and Private Universities in Nigeria
Janet O. Adekannbi and Testimony Morenike Oluwayinka
Risk Assessment in Constructing Horseshoe Vault Tunnels using Fuzzy Technique
Erfan Shafaghat and Mostafa Yousefi Rad
Evaluating the Adoption of Deductive Database Technology in Augmenting Criminal Intelligence in Zimbabwe: Case of Zimbabwe Republic Police
Mahlangu Gilbert, Furusa Samuel Simbarashe, Chikonye Musafare and Mugoniwa Beauty
Analysis of Petrol Pumps Reachability in Anand District of Gujarat
Nidhi Arora
Fault-Tolerance Aware Multi Objective Scheduling Algorithm for Task Schedulin...csandit
Computational Grid (CG) creates a large heterogeneous and distributed paradigm to manage and execute computationally intensive applications. In grid scheduling, tasks are assigned to the proper processors in the grid system for execution by considering the execution policy and the optimization objectives. In this paper, makespan and the fault tolerance of the computational nodes of the grid, two important parameters for task execution, are considered and optimized. As grid scheduling is considered to be NP-hard, meta-heuristic evolutionary techniques are often used to find a solution. We have proposed an NSGA II for this purpose. The performance of the proposed Fault Tolerance Aware NSGA II (FTNSGA II) has been estimated with a program written in Matlab. The simulation results evaluate the performance of the proposed algorithm, and the results of the proposed model are compared with the existing Min-Min and Max-Min algorithms, which demonstrates the effectiveness of the model.
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in
exploratory data analysis. This paper presents results of an experimental study of different approaches to
k-Means clustering, comparing results on different datasets using the original k-Means and other
modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance
measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index and
execution time.
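As a point of reference for the baseline this abstract compares against, the following is a minimal sketch of the original (Lloyd-style) k-Means loop. It is not the paper's MATLAB code; the function names `sq_dist` and `kmeans`, the seeded random initialisation, and the `max_iter` cap are all assumptions made for illustration. It returns the iteration count because the abstract lists the number of iterations as one of its measures.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-Means: returns (centroids, labels, iterations used)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels, n_iter
```

Calling `kmeans(points, 2)` on two well-separated groups of points should give every point in a group the same label; the modified algorithms the abstract mentions would typically change the initialisation or the update step.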
Comparative study of optimization algorithms on convolutional network for aut...IJECEIAES
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications.
Deep learning, and in particular convolutional networks have become a fundamental
tool in the solution of problems related to environment identification, path planning,
vehicle behavior, and motion control. In this paper, we perform a comparative study of
the most used optimization strategies on the convolutional architecture residual neural network (ResNet) for an autonomous driving problem as a previous step to the
development of an intelligent sensor. This sensor, part of our research in reactive
systems for autonomous vehicles, aims to become a system for direct mapping of sensory information to control actions from real-time images of the environment. The
optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated
adaptive moment estimation (Nadam), and follow the regularized leader (Ftrl). The
training of the deep model is evaluated in terms of convergence, accuracy, recall, and
F1-score metrics. Preliminary results show a better performance of the deep network
when using the SGD function as an optimizer, while the Ftrl function presents the
poorest performances.
An Improved Differential Evolution Algorithm for Data Stream ClusteringIJECEIAES
A few algorithms have been implemented by researchers for clustering data streams. Most of these algorithms require that the number of clusters (K) be fixed by the user based on the input data and kept fixed throughout the clustering process. Stream clustering has faced difficulties in choosing K. In this paper, we propose an efficient approach for data stream clustering by adopting an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, powerful and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We demonstrate that our proposed method, compared with a Genetic Algorithm, is a proficient optimization algorithm. The proposed technique is evaluated and achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30% and an F-measure of 88.60%.
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
Clustering is one of the most important applications of data mining. It has attracted the attention of researchers in statistics and machine learning. It is used in many applications such as information retrieval, image processing and social network analytics. It helps the user to understand the similarity and dissimilarity between objects. Cluster analysis helps users understand complex and large data sets more clearly. Different types of clustering algorithms have been analyzed by various researchers. Kmeans is the most popular partitioning-based algorithm, as it provides good results because of accurate calculation on numerical data. But Kmeans gives good results for numerical data only. Big data is a combination of numerical and categorical data. The Kprototype algorithm is used to deal with numerical as well as categorical data. Kprototype combines the distances calculated from numeric and categorical data. With the growth of data due to social networking websites, business transactions, scientific calculation etc., there is a vast collection of structured, semi-structured and unstructured data. So there is a need to optimize Kprototype so that these varieties of data can be analyzed efficiently. In this work, the Kprototype algorithm is implemented on MapReduce. Experiments show that Kprototype implemented on MapReduce gives a better performance gain on multiple nodes as compared to a single node. CPU execution time and speedup are used as evaluation metrics for comparison. An intelligent splitter is proposed in this paper which splits mixed big data into numerical and categorical data. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
Multi-threaded approach in generating frequent itemset of Apriori algorithm b...TELKOMNIKA JOURNAL
This research is about the application of multi-threaded and trie data structures to the support calculation problem in the Apriori algorithm.
The support calculation results can be used to search for association rules in market basket analysis problems. The support calculation process is a bottleneck and can delay the processes that follow it. This work observed five multi-threaded models based on Flynn’s taxonomy, which are single process, multiple data (SPMD), multiple process, single data (MPSD), multiple process, multiple data (MPMD), double SPMD first variant, and double SPMD second variant, to shorten the processing time of the support calculation. In addition to the processing time, this work also considers the time difference between each multi-threaded model when the number of item variants increases. The time obtained from the experiment shows that the multi-threaded model that applies a double SPMD variant structure can perform almost three times faster than the multi-threaded models that apply the SPMD structure, the MPMD structure, and the combination of MPMD and SPMD, based on the time difference between the 5-itemsets and 10-itemsets experimental results.
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORMsipij
In this paper, the Delay Computation method for the Common Sub-expression Elimination algorithm is implemented on the Cyclotomic Fast Fourier Transform. The Common Sub-expression Elimination algorithm is combined with the delay computing method, and the combination is known as the Gate Level Delay Computation with Common Sub-expression Elimination (GLDC-CSE) algorithm. Common sub-expression elimination is an effective optimization method used to reduce adders in the cyclotomic Fourier transform. The delay computing method is based on a delay matrix and is suitable for implementation on computers. The gate level delay computation method is used to find the critical path delay, and it is analyzed on various finite field elements. The presented algorithm is established through a case study of the Cyclotomic Fast Fourier Transform over a finite field. If the Cyclotomic Fast Fourier Transform is implemented directly, the system will have high additive complexity. By using the GLDC-CSE algorithm on the cyclotomic fast Fourier transform, the additive complexity is reduced, and the area and area-delay product are also reduced.
Study of Density Based Clustering Techniques on Data StreamsIJERA Editor
Data streams are generated by many real-time systems. A data stream is fast-changing and massive. In stream data mining, traditional methods are not efficient, so many methodologies have been developed for stream data processing. Many applications require data to be grouped based on its characteristics, so clustering is applied to data streams. For clustering non-linear data, density-based clustering is used. Clustering algorithms and methodologies are reviewed and evaluated on whether they meet the requirements of users. A study of density-based clustering algorithms is presented here because of the advantages of density-based clustering over other clustering methods.
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
Any abnormal activity can be assumed to be an anomalous intrusion. In the literature, several techniques and
algorithms have been discussed for anomaly detection. In most cases, true positive and false positive
parameters have been used to compare their performance. However, depending upon the application, a
wrong true positive or wrong false positive may have severe detrimental effects. This necessitates the inclusion
of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset, KDD-CUP-99,
has a huge amount of data, which in turn requires a certain amount of pre-processing. Our work in this paper starts
with enumerating the necessity of cost-sensitive analysis with some real-life examples. After discussing
KDD-CUP-99, an approach is proposed for feature elimination and then feature selection to reduce the
number of more relevant features directly and the size of KDD-CUP-99 indirectly. From the reported
literature, general methods for anomaly detection are selected which perform best for different types of
attacks. These different classifiers are combined to form an ensemble. A cost opportunistic technique is
suggested to allocate the relative weights to the classifier ensemble for generating the final result. The cost
sensitivity of true positive and false positive results is analysed, and a method is proposed to select the elements
of cost sensitivity metrics for further improving the results to achieve better overall performance. The
impact on the performance trade-off due to incorporating the cost sensitivity is discussed.
A COMPARATIVE ANALYSIS ON SOFTWARE ARCHITECTURE STYLESijfcstjournal
Software architecture is the structural solution that achieves the overall technical and operational
requirements for software development. Software engineers apply software architectures in their
software system development; however, they are concerned about the basic benchmarks for selecting software
architecture styles, possible components, integration methods (connectors) and the exact application of
each style.
The objective of this research work was a comparative analysis of software architecture styles by their
weaknesses and benefits, to support selection by the programmer at design time. Finally, in this study,
the researcher has identified architectural styles, their weaknesses, strengths and application areas, together with
the component, connector and interface for the selected architectural styles.
SYSTEM ANALYSIS AND DESIGN FOR A BUSINESS DEVELOPMENT MANAGEMENT SYSTEM BASED...ijfcstjournal
A design of a sales system for professional services requires a comprehensive understanding of the
dynamics of sale cycles and how key knowledge for completing sales is managed. This research describes
a design model of a business development (sales) system for professional service firms based on the Saudi
Arabian commercial market, which takes into account the new advances in technology while preserving
unique or cultural practices that are an important part of the Saudi Arabian commercial market. The
design model has combined a number of key technologies, such as cloud computing and mobility, as an
integral part of the proposed system. An adaptive development process has also been used in implementing
the proposed design model.
AN ALGORITHM FOR SOLVING LINEAR OPTIMIZATION PROBLEMS SUBJECTED TO THE INTERS...ijfcstjournal
Frank t-norms are a parametric family of continuous Archimedean t-norms whose members are also strict
functions. Very often, this family of t-norms is also called the family of fundamental t-norms because of the
role it plays in several applications. In this paper, optimization of a linear objective function with fuzzy
relational inequality constraints is investigated. The feasible region is formed as the intersection of two
inequality fuzzy systems in which the Frank family of t-norms is considered as the fuzzy composition. First, the
resolution of the feasible solution set is studied where the two fuzzy inequality systems are defined with
max-Frank composition. Second, some related basic and theoretical properties are derived. Then, a
necessary and sufficient condition and three other necessary conditions are presented to conceptualize the
feasibility of the problem. Subsequently, it is shown that a lower bound is always attainable for the optimal
objective value. Also, it is proved that the optimal solution of the problem always results from the
unique maximum solution and a minimal solution of the feasible region. Finally, an algorithm is presented
to solve the problem and an example is described to illustrate the algorithm. Additionally, a method is
proposed to generate random feasible max-Frank fuzzy relational inequalities. By this method, we can
easily generate a feasible test problem and apply our algorithm to it.
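To make the max-Frank composition concrete, here is a small sketch based on the standard definition of the Frank t-norm, T_s(x, y) = log_s(1 + (s^x - 1)(s^y - 1)/(s - 1)) for s > 0 and s != 1. The function names `frank_tnorm` and `max_frank_compose` are hypothetical, and the code only evaluates the composition; it does not reproduce the paper's resolution method or its optimization algorithm.

```python
import math


def frank_tnorm(x, y, s):
    """Frank t-norm T_s(x, y) for x, y in [0, 1] and parameter s > 0, s != 1."""
    if s <= 0 or s == 1:
        raise ValueError("s must be positive and different from 1")
    return math.log(1 + (s ** x - 1) * (s ** y - 1) / (s - 1), s)


def max_frank_compose(matrix, x, s):
    """Max-Frank composition: component i is max_j T_s(matrix[i][j], x[j])."""
    return [
        max(frank_tnorm(a_ij, x_j, s) for a_ij, x_j in zip(row, x))
        for row in matrix
    ]
```

A candidate vector `x` could then be checked against a fuzzy relational inequality system by comparing each component of `max_frank_compose(matrix, x, s)` with the corresponding right-hand-side value; the direction of each inequality depends on the system being modelled and is not stated in the abstract.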
LBRP: A RESILIENT ENERGY HARVESTING NOISE AWARE ROUTING PROTOCOL FOR UNDER WA...ijfcstjournal
Underwater sensor networks are among the most challenging and fascinating research areas,
attracting many researchers to this field of study. In several underwater
sensor applications, nodes are measured and, through this, the energy is affected. Thus, the mobility
of each sensor node is measured through the water environment from the water flow for sensor-based
protocol formation. Researchers have developed many routing protocols; however, those have lost their appeal
with time. The demand of the age is therefore an energy-efficient,
scalable and robust routing protocol for underwater actuator networks. In this work, the authors
propose a routing protocol named level based routing protocol (LBRP), aiming to
offer robust, scalable and energy-efficient routing. LBRP also guarantees the best use
of total energy consumption and ensures packet transmission, which yields additional reliability
compared to other routing protocols. In this work, the authors have used the level of the forwarding node,
residual energy and distance from the forwarding node to the sending node as criteria in multicasting
technique comparisons. Throughout this work, the authors obtained a result of about
86.35% on average in node multicasting performance. Simulation has been carried out in both
noisy and quiet environments, which supports the higher performance of the proposed
protocol.
STRUCTURAL DYNAMICS AND EVOLUTION OF CAPSULE ENDOSCOPY (PILL CAMERA) TECHNOLO...ijfcstjournal
This research paper examines and re-evaluates the technological innovation, theory, structural dynamics
and evolution of Pill Camera (Capsule Endoscopy) technology in redirecting the manner of small
bowel (intestine) examination in humans. The Pill Camera (Endoscopy Capsule), made of sealed
biocompatible material to withstand acid, enzymes and other antibody chemicals in the stomach, is a
technology that helps medical practitioners, especially general physicians and
gastroenterologists, to examine and re-examine the intestine for possible bleeding or infection. Before the
advent of the Pill Camera (Endoscopy Capsule), colonoscopy was the local method used, but research
showed that some parts of the intestine (bowel) cannot be reached by the traditional method, hence the need
for the Pill Camera. Countless deaths from diseases such as polyps, inflammatory bowel disease
(Crohn’s disease), cancers, ulcers, anaemia and tumours of the small intestine, which ordinarily would have
been detected by sophisticated technology like the Pill Camera, have become the norm in developing nations.
This paper not only examines and re-evaluates the Pill Camera innovation, theory,
structural dynamics and evolution; it also aims to create awareness among both medical
practitioners and the public.
AN OPTIMIZED HYBRID APPROACH FOR PATH FINDINGijfcstjournal
Path finding algorithms address the problem of finding the shortest path from source to destination while avoiding
obstacles. Various search algorithms exist, namely A*, Dijkstra's and ant colony optimization. Unlike
most path finding algorithms, which require destination co-ordinates to compute a path, the proposed
algorithm comprises a new method which finds a path using backtracking without requiring destination
co-ordinates. Moreover, in existing path finding algorithms, the number of iterations required to find a path is
large. Hence, to overcome this, an algorithm is proposed which reduces the number of iterations required to
traverse the path. The proposed algorithm is a hybrid of backtracking and a new technique (modified
8-neighbor approach). The proposed algorithm can become an essential part of location-based, network and gaming
applications, grid traversal, navigation, mobile robots and Artificial Intelligence.
EAGRO CROP MARKETING FOR FARMING COMMUNITYijfcstjournal
The major occupation in India is agriculture; the people involved in agriculture belong to the poor
class and category. The people of the farming community are unaware of the new techniques and
Agro-machines, which would direct the world to greater heights in the field of agriculture. Though the farmers
work hard, they are cheated by agents in today’s market. This serves as an opportunity to solve
the problems that farmers face in the current world. The eAgro crop marketing will serve as a better
way for the farmers to sell their products within the country with some mediocre knowledge about using
the website. This would provide information to the farmers about the current market rate of agro-products,
their sale history and profits earned in a sale. This site will also help the farmers to know about the market
information and to view agricultural schemes of the Government provided to farmers.
EDGE-TENACITY IN CYCLES AND COMPLETE GRAPHSijfcstjournal
It is well known that the tenacity is a proper measure for studying vulnerability and reliability in graphs.
Here, a modified edge-tenacity of a graph is introduced based on the classical definition of tenacity.
Properties and bounds for this measure are introduced; meanwhile edge-tenacity is calculated for cycle
graphs and also for complete graphs.
COMPARATIVE STUDY OF DIFFERENT ALGORITHMS TO SOLVE N QUEENS PROBLEMijfcstjournal
This paper provides a brief description of the Genetic Algorithm (GA), the Simulated Annealing (SA)
Algorithm, the Backtracking (BT) Algorithm and the Brute Force (BF) Search Algorithm, and attempts to
explain how the Proposed Genetic Algorithm (GA), the Proposed Simulated Annealing (SA)
Algorithm using GA, the Backtracking (BT) Algorithm and the Brute Force (BF) Search Algorithm can be
employed to find the best solution of the N Queens Problem. It also makes a comparison between these
four algorithms. It is entirely a review-based work. The four algorithms were written as well as
implemented. From the results, it was found that the Proposed GA performed better
than the Proposed SA using GA, the BT Algorithm and the BF Search Algorithm, and that it also provided a
better fitness value (solution) than these three algorithms for different N values. It was also noticed
that the Proposed GA took more time to provide a result than the Proposed SA using GA.
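Of the four approaches compared, backtracking is the easiest to show compactly. The sketch below is a generic textbook backtracking solver, not the paper's implementation; the function name `solve_n_queens` and the representation of a solution as a tuple of column indices (one per row) are assumptions made for illustration.

```python
def solve_n_queens(n):
    """Return every solution as a tuple where entry r is the queen's column in row r."""
    solutions = []
    placement = []    # columns chosen so far, one per row
    used_cols = set()
    used_diag = set()  # row - col is constant along one diagonal direction
    used_anti = set()  # row + col is constant along the other direction

    def place(row):
        if row == n:
            solutions.append(tuple(placement))
            return
        for col in range(n):
            if col in used_cols or (row - col) in used_diag or (row + col) in used_anti:
                continue  # square is attacked by an earlier queen
            used_cols.add(col)
            used_diag.add(row - col)
            used_anti.add(row + col)
            placement.append(col)
            place(row + 1)
            # Undo the move before trying the next column (the backtracking step).
            placement.pop()
            used_cols.remove(col)
            used_diag.remove(row - col)
            used_anti.remove(row + col)

    place(0)
    return solutions
```

For example, `len(solve_n_queens(8))` gives the well-known count of 92 solutions. A brute-force search would instead enumerate every placement and filter, which is why it scales much worse as N grows.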
PSTECEQL: A NOVEL EVENT QUERY LANGUAGE FOR VANET’S UNCERTAIN EVENT STREAMSijfcstjournal
In recent years, complex event processing technology has been used to process the VANET’s temporal
and spatial event streams. However, we usually cannot get accurate data because of the limited sensing
accuracy of the system's devices. We can only get uncertain data from the complex and limited
environment of the VANET. Because the VANET’s event streams consist of uncertain data, they
are also uncertain. How to effectively express and process these uncertain event streams has become the core
issue for the VANET system. To solve this problem, we propose a novel complex event query language,
PSTeCEQL (probabilistic spatio-temporal constraint event query language). Firstly, we give the definition
of the possible world model of the VANET’s uncertain event streams. Secondly, we propose the event query
language PSTeCEQL and give the syntax and the operational semantics of the language. Finally, we
illustrate the validity of PSTeCEQL by an example.
A MUTATION TESTING ANALYSIS AND REGRESSION TESTINGijfcstjournal
Software testing is conducted to provide information to the client about the quality of the
product under test. Software testing can also provide an objective, independent view of the software to
allow the business to appreciate and understand the risks of software implementation. In this paper we
focus on two main kinds of software testing: mutation testing and regression testing. Mutation testing is a
procedural testing method, i.e. we use the structure of the code to guide the test program. A mutation is a
small change in a program. Such changes are applied to model low-level defects that arise in the process
of coding systems. Ideally, mutations should model low-level defect creation. Mutation testing is a process
of testing in which code is modified and then the mutated code is tested against test suites. The mutations applied to
source code are intended to mimic common programming errors. A good unit test typically detects the
program mutations and fails automatically. Mutation testing is used on many different platforms, including
Java, C++, C# and Ruby. Regression testing is a type of software testing that seeks to uncover
new software bugs, or regressions, in existing functional and non-functional areas of a system after
changes such as enhancements, patches or configuration changes have been made to them. When defects
are found during testing, the defect is fixed and that part of the software starts working as needed. But
there may be a case in which the fixes have introduced or uncovered a different defect in the
software. Regression testing is the way to detect these unexpected bugs and fix them. The main focus
of regression testing is to verify that changes in the software or program have not caused any adverse side
effects and that the software still meets its requirements. Regression tests are run whenever changes are
made to the software because of modified functions.
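The idea of a mutant being detected ("killed") by a test suite can be illustrated with a toy example. Everything here is hypothetical and chosen only for illustration: the function `is_adult`, its hand-written mutant, and the two test suites. Real mutation tools generate mutants automatically.

```python
def is_adult(age):
    """Original implementation under test."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the relational operator >= has been replaced by >."""
    return age > 18


def suite_passes(implementation, cases):
    """Run (input, expected) pairs against an implementation; True if all pass."""
    return all(implementation(arg) == expected for arg, expected in cases)


# A weak suite with no boundary value: the mutant survives it.
weak_suite = [(30, True), (5, False)]

# Adding the boundary case age == 18 lets the suite kill the mutant.
strong_suite = weak_suite + [(18, True)]
```

With these definitions, `suite_passes(is_adult_mutant, weak_suite)` is `True` (the mutant survives), while `suite_passes(is_adult_mutant, strong_suite)` is `False` (the mutant is killed). Re-running `strong_suite` after every later change to `is_adult` is a small instance of regression testing.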
GREEN WSN- OPTIMIZATION OF ENERGY USE THROUGH REDUCTION IN COMMUNICATION WORK...ijfcstjournal
Advances in micro-fabrication and communication techniques have led to an unimaginable proliferation of
WSN applications. Research is focussed on the reduction of setup operational energy costs. The bulk of operational
energy costs is linked to the communication activities of a WSN. Any progress towards energy efficiency has the
potential for huge savings globally. Therefore, every energy-efficient step is an endeavour to cut costs and
‘Go Green’. In this paper, we propose a framework to reduce communication workload through
in-network compression and multiple query synthesis at the base-station, and modification of query syntax
through the introduction of Static Variables. These are general approaches which can be used in
any WSN irrespective of application.
A NEW MODEL FOR SOFTWARE COSTESTIMATION USING HARMONY SEARCHijfcstjournal
Accurate and realistic estimation is always considered to be a great challenge in the software industry.
Software Cost Estimation (SCE) is the standard application used to manage software projects. Determining
the amount of estimation in the initial stages of the project depends on planning other activities of the
project. In fact, the estimation is confronted with a number of uncertainties and barriers, yet assessing
previous projects is essential to solve this problem. Several models have been developed for the analysis of
software projects. The classical reference method is the COCOMO model, but other methods
are also applied, such as Function Point (FP) and Line of Code (LOC); meanwhile, expert opinion
matters in this regard. In recent years, the growth and combination of meta-heuristic algorithms with
high accuracy have brought about great achievements in software engineering. Meta-heuristic algorithms,
which can analyze data from multiple dimensions and identify the optimum solution among them, are
analytical tools for the analysis of data. In this paper, we have used the Harmony Search (HS) algorithm for
SCE. The proposed model has been assessed on a collection of 60 standard projects from the NASA60
dataset. The experimental results show that the HS algorithm is a good way of determining the weights
of the similarity measure factors of software effort and of reducing the MRE error.
AGENT ENABLED MINING OF DISTRIBUTED PROTEIN DATA BANKSijfcstjournal
Mining biological data is an emergent area at the intersection between bioinformatics and data mining
(DM). The intelligent agent based model is a popular approach in constructing Distributed Data Mining
(DDM) systems to address scalable mining over large scale distributed data. The nature of associations
between different amino acids in proteins has also been a subject of great anxiety. There is a strong need to
develop new models and exploit and analyze the available distributed biological data sources. In this study,
we have designed and implemented a multi-agent system (MAS) called Agent enriched Quantitative
Association Rules Mining for Amino Acids in distributed Protein Data Banks (AeQARM-AAPDB). Such
globally strong association rules enhance understanding of protein composition and are desirable for
synthesis of artificial proteins. A real protein data bank is used to validate the system.
International Journal on Foundations of Computer Science & Technology (IJFCST)ijfcstjournal
International Journal on Foundations of Computer Science & Technology (IJFCST) is a Bi-monthly peer-reviewed and refereed open access journal that publishes articles which contribute new results in all areas of the Foundations of Computer Science & Technology. Over the last decade, there has been an explosion in the field of computer science to solve various problems from mathematics to engineering. This journal aims to provide a platform for exchanging ideas in new emerging trends that needs more focus and exposure and will attempt to publish proposals that strengthen our goals. Topics of interest include, but are not limited to the following:
Because technology has been widely used in recent decades, cybercrime has become a significant
international issue as a result of the huge damage that it causes to businesses and even to ordinary
users of technology. The main aim of this paper is to shed light on digital crimes and give an overview of
what a person working in computer science has to know about this new type of crime. The paper has
three sections: Introduction to Digital Crime, which gives fundamental information about digital crimes;
Digital Crime Investigation, which presents different investigation models; and a third section on
Cybercrime Law.
DISTRIBUTION OF MAXIMAL CLIQUE SIZE UNDER THE WATTS-STROGATZ MODEL OF EVOLUTI...ijfcstjournal
In this paper, we analyze the evolution of a small-world network and its subsequent transformation to a
random network using the idea of link rewiring under the well-known Watts-Strogatz model for complex
networks. Every link u-v in the regular network is considered for rewiring with a certain probability and if
chosen for rewiring, the link u-v is removed from the network and the node u is connected to a randomly
chosen node w (other than nodes u and v). Our objective in this paper is to analyze the distribution of the
maximal clique size per node by varying the probability of link rewiring and the degree per node (number
of links incident on a node) in the initial regular network. For a given probability of rewiring and initial
number of links per node, we observe the distribution of the maximal clique per node to follow a Poisson
distribution. We also observe the maximal clique size per node in the small-world network to be very close
to that of the average value and close to that of the maximal clique size in a regular network. There is no
appreciable decrease in the maximal clique size per node when the network transforms from a regular
network to a small-world network. On the other hand, when the network transforms from a small-world
network to a random network, the average maximal clique size value decreases significantly
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMSijfcstjournal
This research paper is a statistical comparative study of a few average case asymptotically optimal sorting
algorithms namely, Quick sort, Heap sort and K- sort. The three sorting algorithms all with the same
average case complexity have been compared by obtaining the corresponding statistical bounds while
subjecting these procedures over the randomly generated data from some standard discrete and continuous
probability distributions such as Binomial distribution, Uniform discrete and continuous distribution and
Poisson distribution. The statistical analysis is well supplemented by the parameterized complexity
analysis
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG DATA USING PRE-PROCESSING BASED ON MAPREDUCE FRAMEWORK
International Journal in Foundations of Computer Science & Technology (IJFCST), Vol.5, No.3, May 2015
DOI:10.5121/ijfcst.2015.5307
Sheela Gole¹ and Bharat Tidke²
¹Department of Computer Engineering, Flora Institute of Technology, Pune, India
ABSTRACT
Nowadays an enormous amount of data is generated through the Internet of Things (IoT) as technologies
advance and people use them in day-to-day activities. This data is termed Big Data, and it has its own
characteristics and challenges. Frequent Itemset Mining (FIM) algorithms aim to discover
frequent itemsets in a transactional database, but traditional FIM algorithms cannot handle datasets
as their size increases. The MapReduce programming model addresses the problem of large datasets,
but its high communication cost reduces execution efficiency. This paper proposes a new technique that
applies k-means clustering as a pre-processing step to the BigFIM algorithm. ClustBigFIM uses a hybrid
approach: the k-means algorithm generates clusters from huge datasets, and Apriori and Eclat mine frequent
itemsets from the generated clusters using the MapReduce programming model. Results show that applying
k-means clustering as a pre-processing technique before the BigFIM algorithm increases the execution
efficiency of the ClustBigFIM algorithm.
KEYWORDS
Association Rule Mining, Big Data, Clustering, Frequent Itemset Mining, MapReduce.
1. INTRODUCTION
Data mining and KDD (Knowledge Discovery in Databases) are essential techniques for discovering
hidden information in large datasets with various characteristics. Nowadays Big Data has
bloomed in various areas such as social networking, retail, web blogs, forums and online groups [1].
Frequent Itemset Mining (FIM) is one of the important techniques of Association Rule Mining (ARM). The goal
of FIM techniques is to reveal frequent itemsets in transactional databases. Agrawal et al. [2] put forward
the Apriori algorithm, which generates frequent itemsets having frequency greater than a given minimum
support. It is not efficient on a single computer when the dataset size increases. An enormous amount of
work has been put forward to uncover frequent items. There exist various parallel and distributed
algorithms which work on large datasets, but they have memory and I/O cost limitations and cannot
handle Big Data [3] [4].
MapReduce, developed by Google [5], along with the Hadoop Distributed File System (HDFS), is exploited to
find frequent itemsets in Big Data on large clusters. MapReduce uses a parallel computing
approach, and HDFS is a fault-tolerant system. MapReduce has Map and Reduce functions; the data
flow in MapReduce is shown in Figure 1.
Figure 1. Map-Reduce Data flow.
In this paper, a new algorithm based on BigFIM is proposed that optimizes the speed of the BigFIM
algorithm. First, clusters are generated from Big Datasets using parallel k-means clustering.
Then the clusters are mined using the ClustBigFIM algorithm, effectively increasing the
execution efficiency.
This paper is organized as follows. Section 2 gives an overview of related work on frequent
itemset mining. Section 3 gives an overview of the background theory for ClustBigFIM. Section 4
explains the pseudo code of ClustBigFIM. The experimental results with comparative analysis are
given in Section 5. Section 6 concludes the paper.
2. RELATED WORK
Various sequential and parallel frequent itemset mining algorithms are available [5] [6] [7] [8]
[9] [10], but there is a need for FIM algorithms which can handle Big Data. This section gives an
insight into frequent itemset mining approaches which exploit the MapReduce framework. The existing
algorithms face challenges when dealing with Big Data.
A parallel implementation of the traditional Apriori algorithm based on the MapReduce framework was put
forward by Lin et al. [11], and Li et al. [12] also proposed a parallel implementation of the Apriori
algorithm. Hammoud [13] put forward the MRApriori algorithm, which is based on the MapReduce
programming model and the classic Apriori algorithm. It uses iterative horizontal and vertical switching
and so does not require repetitive scans of the database. A parallel implementation of the FP-Growth
algorithm has been put forward in [14].
Liu et al. [15] put forward the IOMRA algorithm, a modified FAMR algorithm that
optimizes execution efficiency by pre-processing with Apriori TID, which removes all low-frequency
1-itemsets from the given database. The longest possible candidate itemset size is then
determined using the length of each transaction and the minimum support.
Moens et al. [16] put forward two algorithms, DistEclat and BigFIM. DistEclat is a
distributed version of the Eclat algorithm which mines a prefix tree and extracts frequent itemsets
faster but is not scalable enough. BigFIM applies the Apriori algorithm before DistEclat to handle
frequent itemsets up to size k, and the subsequent (k+1)-itemsets are extracted using the Eclat algorithm,
but the BigFIM algorithm is limited in speed. Both algorithms are based on the MapReduce framework.
Moens has also recently proposed implementations of the DistEclat and BigFIM algorithms using
Mahout.
Approximate frequent itemsets are mined using the PARMA algorithm, which was put forward
by Riondato et al. [17]. The k-means clustering algorithm is used for finding clusters, which are called
the sample list. Frequent itemsets are extracted very fast, reducing execution time.
Malek and Kadima [18] put forward parallel k-means clustering, which uses the MapReduce
programming model to generate clusters in parallel, improving on the performance of the traditional
k-means algorithm. It has Map, Combine and Reduce functions which operate on (key, value) pairs.
The distance between each sample point and the random centres is calculated for all points in the map
function. Intermediate output values from the map function are combined using the combiner function.
All samples are assigned to the closest cluster in the reduce function.
3. BACKGROUND
3.1. Problem Statement
Let I = {i1, i2, i3, …, in} be a set of items. A set X = {i1, i2, i3, …, ik} ⊆ I is called a
k-itemset. A transaction is denoted T = (tid, I_T), where tid is the transaction ID and I_T ⊆ I is the
set of items in the transaction. T ∈ D, where D is a transactional database. The cover of itemset X in D
is the set of IDs of the transactions that contain every item of X.
Cover(X, D) = {tid | (tid, I_T) ∈ D, X ⊆ I_T}
The support of an itemset X in D is the number of transactions that contain every item of X.
Support(X, D) = |Cover(X, D)|
An itemset is called frequent when its support is no less than the absolute minimum support threshold
σabs, with 0 ≤ σabs ≤ |D|.
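The definitions above can be illustrated with a minimal Python sketch. The toy database `db`, the function names and the inclusive comparison with the threshold are assumptions made for illustration only; they are not taken from the ClustBigFIM implementation.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy database D: transaction ID -> set of items in that transaction.
db: Dict[int, FrozenSet[str]] = {
    1: frozenset({"a", "b", "c"}),
    2: frozenset({"a", "c"}),
    3: frozenset({"b", "d"}),
    4: frozenset({"a", "b", "c", "d"}),
}

def cover(itemset: Set[str], database: Dict[int, FrozenSet[str]]) -> Set[int]:
    """IDs of the transactions that contain every item of the itemset."""
    return {tid for tid, items in database.items() if itemset <= items}

def support(itemset: Set[str], database: Dict[int, FrozenSet[str]]) -> int:
    """Support is the size of the cover."""
    return len(cover(itemset, database))

def is_frequent(itemset: Set[str], database: Dict[int, FrozenSet[str]], sigma_abs: int) -> bool:
    """Assumption: the threshold is inclusive (support >= sigma_abs)."""
    return support(itemset, database) >= sigma_abs

print(cover({"a", "c"}, db), support({"a", "c"}, db))
```

With a threshold of 3, the itemset {a, c} counts as frequent in this toy database while {d} does not.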
Partitioning the transactions into a set of groups is called clustering. Let S be the number of clusters;
then {C1, C2, C3, …, CS} is a set of clusters over {t1, t2, t3, …, tm}, where m is the number of
transactions. Each transaction is assigned to exactly one cluster, i.e. Cp ≠ ∅ and Cp ∩ Cq = ∅ for
1 ≤ p, q ≤ S with p ≠ q. Let µs be the mean of cluster Cs. The squared error between the mean of the
cluster and the transactions in the cluster is given by
$$J(C_s) = \sum_{t_i \in C_s} \| t_i - \mu_s \|^2$$
k-means is used for minimizing the sum of squared errors over all S clusters, which is given by
$$J(C) = \sum_{s=1}^{S} \sum_{t_i \in C_s} \| t_i - \mu_s \|^2$$
The k-means algorithm starts from an initial set of cluster centres and iteratively assigns each
transaction to the cluster that gives the minimum squared error.
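As a worked illustration of the two squared-error formulas, the following Python sketch computes the per-cluster error and the total error for a toy clustering. Representing each transaction as a numeric vector (here a 2-D point) is an assumption for the example; the paper does not state how transactions are encoded.

```python
from typing import List, Sequence

Vector = Sequence[float]

def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of the points in one cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def cluster_sse(points: List[Vector]) -> float:
    """Squared error of one cluster: sum over its points of ||t_i - mu_s||^2."""
    mu = mean(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in points)

def total_sse(clusters: List[List[Vector]]) -> float:
    """Sum of the per-cluster squared errors over all S clusters."""
    return sum(cluster_sse(c) for c in clusters)

# Hypothetical toy clustering with S = 2.
clusters = [[(1, 0), (3, 0)], [(0, 2), (0, 4), (0, 6)]]
print(cluster_sse(clusters[0]), cluster_sse(clusters[1]), total_sse(clusters))
```

The first cluster has mean (2, 0) and the second has mean (0, 4), so the per-cluster errors are 2 and 8 and the total is 10.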
3.2. Apriori Algorithm
Apriori is the first frequent itemset mining algorithm, put forward by Agrawal et
al. [19]. A transactional database has a transaction identifier and the set of items present in each
transaction. The Apriori algorithm scans the horizontal database and finds frequent itemsets of size 1
using the minimum support condition. From the frequent items discovered in iteration 1, candidate
itemsets are formed, and frequent itemsets of size two are extracted using the minimum support
condition. This process is repeated until either the list of candidate itemsets or the list of frequent
itemsets is empty. It requires repetitive scans of the database. The monotonicity property is used for
pruning candidate itemsets that have an infrequent subset.
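The level-wise procedure described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration rather than the authors' implementation: the function name `apriori`, the toy transactions and the inclusive threshold (`>=`) are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support (threshold assumed inclusive)."""
    db: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    # Iteration 1: count single items with one scan of the database.
    counts: Dict[FrozenSet[str], int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (monotonicity): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One more database scan to count the surviving candidates.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]
frequent_itemsets = apriori(transactions, min_support=2)
```

Each pass of the `while` loop corresponds to one database scan, which is the repeated-scan cost mentioned in the text.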
3.3. Eclat Algorithm
The Eclat algorithm was proposed by Zaki et al. [20] and works on a vertical database. The TID list of each
item is calculated, and the intersection of the TID lists of items is used for extracting frequent itemsets
of size k+1. It does not need iterative scans of the database, but large TID lists are expensive to
manipulate.
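A minimal Python sketch of the TID-list intersection idea follows. It is a single-machine illustration under stated assumptions: the helper names `vertical`, `eclat` and `mine`, the toy database and the inclusive threshold are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

def vertical(transactions: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Convert a horizontal database (tid -> items) into TID lists (item -> tids)."""
    tid_lists: Dict[str, Set[int]] = {}
    for tid, items in transactions.items():
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists

def eclat(prefix: Tuple[str, ...], items: List[Tuple[str, Set[int]]],
          min_support: int, out: Dict[Tuple[str, ...], int]) -> None:
    """Depth-first search: extend the prefix by intersecting TID lists."""
    for i, (item, tids) in enumerate(items):
        new_prefix = prefix + (item,)
        out[new_prefix] = len(tids)
        # Intersect with every later item; keep only frequent extensions.
        suffix = []
        for other, other_tids in items[i + 1:]:
            shared = tids & other_tids
            if len(shared) >= min_support:
                suffix.append((other, shared))
        if suffix:
            eclat(new_prefix, suffix, min_support, out)

def mine(transactions: Dict[int, Set[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    tid_lists = vertical(transactions)
    items = sorted(
        ((i, t) for i, t in tid_lists.items() if len(t) >= min_support),
        key=lambda pair: pair[0],
    )
    out: Dict[Tuple[str, ...], int] = {}
    eclat((), items, min_support, out)
    return out

db = {1: {"a", "b", "c"}, 2: {"a", "c"}, 3: {"b", "d"}, 4: {"a", "b", "c", "d"}}
found = mine(db, min_support=2)
```

The database is read once to build the TID lists; all later support counts come from set intersections, which is where the cost of large TID lists arises.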
3.4. k-means Algorithm
The k-means algorithm [21] is a well-known clustering technique which takes the number of clusters
as input. Random points are chosen as centres of gravity, and a distance measure is used to calculate
the distance of each point from each centre of gravity. Each point is assigned to only one cluster based on
high intra-cluster similarity and low inter-cluster similarity.
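The procedure can be sketched as a short, sequential Python function. This is a minimal sketch assuming squared Euclidean distance and 2-D points; the function names, the `seed` parameter and the sample data are assumptions for illustration.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]

def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points: List[Point], k: int, seed: int = 0, max_iter: int = 100):
    """Plain k-means: random initial centres, then alternate assignment and update."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster.
        new_centres = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:  # converged: centres no longer move
            break
        centres = new_centres
    return centres, clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centres, clusters = kmeans(points, k=2)
```

An empty cluster keeps its previous centre here; other implementations re-seed it instead, which is a design choice the text does not specify.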
4. CLUSTBIGFIM ALGORITHM
This section gives the high-level architecture of the ClustBigFIM algorithm and the pseudo code of the
phases used in the ClustBigFIM algorithm.
4.1. High Level Architecture
Figure 2. High Level Architecture of ClustBigFIM Algorithm
Clustering is applied on large datasets as one of the pre-processing techniques and then frequent
itemsets are mined from clustered data using frequent itemset mining algorithms, Apriori and
Eclat.
4.2. ClustBigFIM on MapReduce
The ClustBigFIM algorithm has the following phases:
a. Find Clusters
b. Finding k-FIs
c. Generate single global TID list
d. Mining of subtree
4.2.1. Find Clusters
The k-means clustering algorithm is used for finding clusters in the given large datasets.
Clusters of transactions are formed by minimizing the squared error given by the formula below,
$$J(C_s) = \sum_{t_i \in C_s} \| t_i - \mu_s \|^2$$
and each transaction is assigned to a cluster. The input to this phase is the transaction dataset and the
number of clusters; clusters of transactions are generated, such as C = {t1, t10, ..., t40000}.
Input : Number of clusters and dataset
Output : Clusters with size z
Steps :
1. Find the distance between the centres and the transactions in the map phase.
2. Use the combiner function to combine the results of the above step.
3. Compute the squared error using the formulas below and assign all points to clusters in the
reduce phase,
$$J(C_s) = \sum_{t_i \in C_s} \| t_i - \mu_s \|^2$$
$$J(C) = \sum_{s=1}^{S} \sum_{t_i \in C_s} \| t_i - \mu_s \|^2$$
4. Repeat steps 1-3 with updated centres and stop when the convergence criterion is
reached.
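One iteration of steps 1-3 can be simulated in plain Python without a Hadoop cluster. The sketch below follows the common formulation in which the mapper emits the nearest centre for each point, the combiner pre-aggregates partial sums per input split, and the reducer computes the new centres; this division of work and all function names are assumptions and may differ in detail from the authors' implementation.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, Tuple[Point, int]]  # (centre index, (vector sum, count))

def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_phase(split: List[Point], centres: List[Point]) -> Iterable[Pair]:
    """Map: emit (index of nearest centre, (point, 1)) for each point in a split."""
    for p in split:
        idx = min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
        yield idx, (p, 1)

def combine(pairs: Iterable[Pair]) -> List[Pair]:
    """Combine: sum vectors and counts per centre index to cut shuffle traffic."""
    partial = {}
    for idx, (vec, n) in pairs:
        if idx in partial:
            s, c = partial[idx]
            partial[idx] = (tuple(a + b for a, b in zip(s, vec)), c + n)
        else:
            partial[idx] = (tuple(vec), n)
    return list(partial.items())

def reduce_phase(pairs: List[Pair], centres: List[Point]) -> List[Point]:
    """Reduce: merge the partial sums and divide by the counts to get new centres."""
    new_centres = list(centres)
    for idx, (s, n) in combine(pairs):
        new_centres[idx] = tuple(x / n for x in s)
    return new_centres

def kmeans_iteration(splits: List[List[Point]], centres: List[Point]) -> List[Point]:
    shuffled: List[Pair] = []
    for split in splits:  # each split would be processed by its own mapper
        shuffled.extend(combine(map_phase(split, centres)))
    return reduce_phase(shuffled, centres)

splits = [[(0, 0), (10, 10)], [(0, 2), (10, 12)]]
new_centres = kmeans_iteration(splits, [(0, 0), (10, 10)])
```

Step 4 would call `kmeans_iteration` repeatedly, feeding the returned centres back in until they stop changing.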
4.2.2. Finding k-FIs
The transaction ID lists of large datasets cannot be handled by the Eclat algorithm, so frequent
itemsets of size k are mined from the clusters generated in the above phase using the Apriori
algorithm, based on the minimum support condition, which handles the problem of large datasets.
A prefix tree is generated from the frequent itemsets.
Input : Number of clusters S, minimum support threshold σ, prefix length l
Output : Prefixes with length l and k-FIs
Steps :
5. Find the support of all items in a cluster using the Apriori algorithm.
6. Apply Support(xi) > σ and calculate FIs using the monotonicity property.
7. Repeat steps 5-6 until all k-FIs are calculated, using mappers and reducers.
8. Repeat steps 5-7 for clusters 1 to S and find the final k-FIs.
9. Keep the created prefixes in lexicographic order using a lexicographic prefix
tree.
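The counting in steps 5-8 can be sketched as a map step per cluster followed by a reduce step that sums the local counts. The sketch below is a simplification under stated assumptions: it enumerates all k-subsets of each transaction instead of generating Apriori candidates level by level, it sums supports across clusters (the paper does not state how per-cluster results are merged), and it treats the threshold as inclusive (`>=`) where step 6 uses a strict inequality. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Itemset = Tuple[str, ...]

def map_count(cluster: Iterable[Set[str]], k: int) -> Counter:
    """Mapper: local support counts of every k-itemset in one cluster."""
    counts: Counter = Counter()
    for transaction in cluster:
        # Sorting first keeps every itemset in lexicographic order.
        for combo in combinations(sorted(transaction), k):
            counts[combo] += 1
    return counts

def reduce_counts(local_counts: Iterable[Counter], min_support: int) -> List[Itemset]:
    """Reducer: sum local counts and keep frequent itemsets in lexicographic order."""
    total: Counter = Counter()
    for counts in local_counts:
        total.update(counts)
    return sorted(p for p, n in total.items() if n >= min_support)

clusters = [
    [{"a", "b", "c"}, {"a", "c"}],
    [{"b", "d"}, {"a", "b", "c", "d"}],
]
prefixes = reduce_counts((map_count(c, k=2) for c in clusters), min_support=2)
```

The sorted output list plays the role of the lexicographically ordered prefixes from step 9; a real implementation would store them in a prefix tree.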
4.2.3. Generate single global TID list
The Eclat algorithm uses a vertical database: each item with the list of transactions in which the item
is present. The global TID list is generated by combining the local TID lists using mappers and reducers.
The generated TID list is used in the next phase.
Input : Prefix tree, minimum support σ
Output : Single TID list of all items
Steps :
10. Calculate TID list using prefix tree in map phase
11. Create single TID list from TID list generated in above step. Perform
pruning with support(ia) ≤ support(ib) ↔ a < b
12. Generate prefix groups, P_k = (P_k^1, P_k^2, …, P_k^n)
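The merging of local TID lists into one global list can be sketched as follows. This is a simplified illustration: it builds TID lists for single items rather than for the prefixes in the prefix tree, orders items by ascending support as suggested by step 11, and treats the threshold as inclusive. The function names and toy splits are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def local_tid_lists(split: Iterable[Tuple[int, Set[str]]]) -> Dict[str, Set[int]]:
    """Mapper: TID lists for the transactions of one input split."""
    tids: Dict[str, Set[int]] = defaultdict(set)
    for tid, items in split:
        for item in items:
            tids[item].add(tid)
    return tids

def global_tid_list(local_lists: Iterable[Dict[str, Set[int]]],
                    min_support: int) -> Dict[str, Set[int]]:
    """Reducer: union the local lists, prune infrequent items, order by support."""
    merged: Dict[str, Set[int]] = defaultdict(set)
    for local in local_lists:
        for item, tids in local.items():
            merged[item] |= tids
    frequent = {i: t for i, t in merged.items() if len(t) >= min_support}
    # Ascending support, ties broken by item name, so that a < b implies
    # support(i_a) <= support(i_b).
    return dict(sorted(frequent.items(), key=lambda kv: (len(kv[1]), kv[0])))

splits: List[List[Tuple[int, Set[str]]]] = [
    [(1, {"a", "b", "c"}), (2, {"a", "c"})],
    [(3, {"b", "d"}), (4, {"a", "b", "c", "d"})],
]
tid_list = global_tid_list((local_tid_lists(s) for s in splits), min_support=2)
```

Because transaction IDs are unique across splits, the union of the local sets gives the exact global TID list for each item.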
4.2.4. Mining of Subtree
The next (k+1)-FIs are mined using the Eclat algorithm. The prefix tree generated in phase 2 is mined
independently by the mappers, and frequent itemsets are generated.
Input : Prefix tree, minimum support σ
Output : k-FIs
Steps :
13. Apply Eclat algorithm and find FIs till size k.
14. Repeat step 13 for each Subtree in map phase.
15. Find all frequent items of size k and store them in compressed trie
format.
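The independent mining of subtrees can be sketched by giving each prefix group to a separate worker that runs a depth-first Eclat search below its prefixes. The sketch is sequential and illustrative: the grouping of prefixes, the function names and the inclusive threshold are assumptions, and the results are kept in a plain dictionary rather than a compressed trie.

```python
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]

def mine_subtree(prefix: Itemset, prefix_tids: Set[int],
                 extensions: List[Tuple[str, Set[int]]],
                 min_support: int) -> Dict[Itemset, int]:
    """Depth-first Eclat below one prefix; extensions are (item, TID list) pairs."""
    found: Dict[Itemset, int] = {}
    for i, (item, tids) in enumerate(extensions):
        shared = prefix_tids & tids
        if len(shared) >= min_support:
            itemset = prefix + (item,)
            found[itemset] = len(shared)
            found.update(mine_subtree(itemset, shared, extensions[i + 1:], min_support))
    return found

def mine_all(prefix_groups: List[List[Itemset]], tid_lists: Dict[str, Set[int]],
             min_support: int) -> Dict[Itemset, int]:
    results: Dict[Itemset, int] = {}
    for group in prefix_groups:  # each group would be handled by one mapper
        for prefix in group:
            prefix_tids = set.intersection(*(tid_lists[i] for i in prefix))
            # Only items that come after the last prefix item can extend it.
            extensions = sorted(
                ((i, t) for i, t in tid_lists.items() if i > prefix[-1]),
                key=lambda pair: pair[0],
            )
            results.update(mine_subtree(prefix, prefix_tids, extensions, min_support))
    return results

tid_lists = {"a": {1, 2, 4}, "b": {1, 3, 4}, "c": {1, 2, 4}, "d": {3, 4}}
prefix_groups = [[("a",)], [("b",), ("c",)]]
longer_itemsets = mine_all(prefix_groups, tid_lists, min_support=2)
```

No subtree needs information from another subtree, which is why the groups can be processed by different mappers without communication.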
5. EXPERIMENTS
This section gives an overview of the datasets used and the experimental results with comparative analysis.
Two machines are to be used for the experiments. Each machine contains an Intel® Core™ i5-3230M
CPU @ 2.60GHz and 6.00GB RAM, with Ubuntu 12.04 and Hadoop
1.1.2. Currently the algorithm runs on a single pseudo-distributed Hadoop cluster.
The datasets are taken from the standard UCI repository and FIMI repository in order to compare results with
existing systems such as DistEclat and BigFIM.
5.1. Dataset Information
Experiments are performed on the following datasets:
Mushroom: provided by the FIMI repository [22]; 119 items and 8,124 transactions.
T10I4D100K: provided by the UCI repository [23]; 870 items and 100,000 transactions.
Retail: provided by the UCI repository [23].
Pumsb: provided by the FIMI repository [22]; 49,046 transactions.
5.2. Results Analysis
Experiments are performed on the T10I4D100K, Retail, Mushroom and Pumsb datasets, and the execution
time required for generating k-FIs is compared for different numbers of mappers and Minimum
Support values. Results show that Dist-Eclat is faster than the BigFIM and ClustBigFIM algorithms on
T10I4D100K, but the Dist-Eclat algorithm does not work on large datasets such as Pumsb. Dist-Eclat
is not scalable enough and faces memory problems as the dataset size increases.
Experiments were performed on the T10I4D100K dataset in order to compare the execution time of
Dist-Eclat, BigFIM and ClustBigFIM for different Minimum Support values and numbers of mappers.
Table 1 shows the execution time (sec) for the T10I4D100K dataset with different values of Minimum
Support and 6 mappers. Figure 3 shows the timing comparison for the various methods on the
T10I4D100K dataset, which shows that Dist-Eclat is faster than the BigFIM and
ClustBigFIM algorithms. Execution time decreases as the Minimum Support value increases, which
shows the effect of Minimum Support on execution time.
Table 2 shows the execution time (sec) for the T10I4D100K dataset with different numbers
of mappers and Minimum Support 100. Figure 4 shows the timing comparison for the various methods
on the T10I4D100K dataset, which shows that Dist-Eclat is faster than the BigFIM and
ClustBigFIM algorithms. Execution time increases as the number of mappers increases, because the
communication cost between mappers and reducers increases.
Table 1. Execution Time (Sec) for T10I4D100K with different Support.
Dataset: T10I4D100K; No. of Mappers: 6; columns are Min. Support values
Algorithm   | 100 | 150 | 200 | 250 | 300
Dist-Eclat  | 12  | 10  | 9   | 9   | 10
BigFIM      | 33  | 22  | 19  | 16  | 15
ClustBigFIM | 30  | 21  | 18  | 15  | 15
Table 2. Execution Time (Sec) for T10I4D100K with different No. of Mappers
Dataset: T10I4D100K; Minimum Support: 100; columns are numbers of mappers
Algorithm   | 3  | 4  | 5  | 6  | 7
Dist-Eclat  | 6  | 7  | 7  | 9  | 9
BigFIM      | 21 | 25 | 29 | 32 | 37
ClustBigFIM | 19 | 23 | 25 | 30 | 36
Figure 3. Timing comparison for various methods and Minimum Support on T10I4D100K
Figure 4. Timing comparison for different methods and No. of Mappers on T10I4D100K
Results show that the ClustBigFIM algorithm works on Big Data. Experiments are
performed on the Pumsb dataset. The Dist-Eclat algorithm faced memory problems with the Pumsb dataset,
so the results of ClustBigFIM are compared with the BigFIM algorithm, which is scalable.
Table 3 and Table 4 show the execution time taken by the BigFIM and ClustBigFIM algorithms on the
Pumsb dataset with variable Minimum Support and number of mappers. The number of mappers is fixed at 20
when Minimum Support varies, and Minimum Support is fixed at 40000 when the number of mappers varies.
Figure 5 and Figure 6 show that the ClustBigFIM algorithm performs better than the BigFIM algorithm
due to pre-processing.
Table 3. Execution Time (Sec) for Pumsb with different Support.
Dataset: Pumsb; No. of Mappers: 20; columns are Min. Support values
Algorithm   | 25000 | 30000 | 35000 | 40000 | 45000
BigFIM      | 19462 | 6464  | 1256  | 453   | 36
ClustBigFIM | 18500 | 5049  | 1100  | 440   | 30
Table 4. Execution Time (Sec) for Pumsb with different No. of Mappers
Dataset: Pumsb; Minimum Support: 40000; columns are numbers of mappers
Algorithm   | 10  | 15  | 20  | 25  | 30
BigFIM      | 390 | 422 | 439 | 441 | 442
ClustBigFIM | 385 | 419 | 435 | 438 | 438
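To make the size of the improvement on Pumsb easier to read, the short Python sketch below computes the relative reduction in execution time of ClustBigFIM over BigFIM from the Table 3 values. The numbers are copied from the table; the function name and the choice of a percentage reduction as the metric are assumptions for illustration.

```python
from typing import List

# Execution times in seconds, copied from Table 3 (Pumsb, 20 mappers).
min_support = [25000, 30000, 35000, 40000, 45000]
bigfim = [19462, 6464, 1256, 453, 36]
clustbigfim = [18500, 5049, 1100, 440, 30]

def reduction_percent(baseline: List[float], improved: List[float]) -> List[float]:
    """Relative time saved with respect to the baseline, in percent."""
    return [100.0 * (b - c) / b for b, c in zip(baseline, improved)]

reductions = reduction_percent(bigfim, clustbigfim)
for s, r in zip(min_support, reductions):
    print(f"min support {s}: {r:.1f}% less time than BigFIM")
```

Under this metric the reduction ranges from roughly 3% to 22% across the tested Minimum Support values.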
Figure 5. Timing comparison for different methods and Minimum Support on Pumsb
Figure 6. Timing comparison for different methods and No. of Mappers on Pumsb
6. CONCLUSIONS
In this paper we implemented an FIM algorithm based on the MapReduce programming model. The
k-means clustering algorithm is used for pre-processing, frequent itemsets of size k are mined using the
Apriori algorithm, and the subsequent frequent itemsets are mined from these using the Eclat algorithm.
ClustBigFIM works on large datasets with increased execution efficiency due to pre-processing.
Experiments were done on transactional datasets, and the results show that ClustBigFIM works on Big
Data efficiently and with higher speed than BigFIM. We are planning to run the ClustBigFIM algorithm on
different datasets for further comparative analysis.
REFERENCES
[1] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. The KDD process for
extracting useful knowledge from volumes of data. Commun. ACM 39, 11 (November 1996), 27-34.
DOI=10.1145/240455.240464
[2] Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets
of items in large databases. SIGMOD Rec. 22, 2 (June 1993), 207-216.
DOI=10.1145/170036.170072.
[3] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithms for discovery of association
rules. Data Min. and Knowl. Disc., pages 343–373, 1997.
[4] G. A. Andrews. Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-
Wesley, 2000.
[5] J. Li, Y. Liu, W. k. Liao, and A. Choudhary. Parallel data mining algorithms for association rules and
clustering. In Intl. Conf. on Management of Data, 2008.
[6] E. Ozkural, B. Ucar, and C. Aykanat. Parallel frequent item set mining with selective item replication.
IEEE Trans. Parallel Distrib. Syst., pages 1632–1640, 2011.
[7] M. J. Zaki. Parallel and distributed association mining: A survey. IEEE Concurrency, pages 14–25,
1999.
[8] L. Zeng, L. Li, L. Duan, K. Lu, Z. Shi, M. Wang, W. Wu, and P. Luo. Distributed data mining: a
survey. Information Technology and Management, pages 403–409, 2012.
[9] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD Rec.,
pages 1–12, 2000.
[10] L. Liu, E. Li, Y. Zhang, and Z. Tang. Optimization of frequent itemset mining on multiple-core
processor. In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07,
pages 1275–1285. VLDB Endowment, 2007.
[11] M.-Y. Lin, P.-Y. Lee and S.C. Hsueh. Apriori-based frequent itemset mining algorithms on
MapReduce. In Proc. ICUIMC, pages 26–30. ACM, 2012.
[12] N. Li, L. Zeng, Q. He, and Z. Shi. Parallel implementation of Apriori algorithm based on MapReduce.
In Proc. SNPD, pages 236–241, 2012.
[13] S. Hammoud. MapReduce Network Enabled Algorithms for Classification Based on Association
Rules. Thesis, 2011.
[14] L. Zhou, Z. Zhong, J. Chang, J. Li, J. Huang, and S. Feng. Balanced parallel FP-Growth with
MapReduce. In Proc. YC-ICT, pages 243–246, 2010.
[15] Sheng-Hui Liu; Shi-Jia Liu; Shi-Xuan Chen; Kun-Ming Yu, "IOMRA - A High Efficiency Frequent
Itemset Mining Algorithm Based on the MapReduce Computation Model," Computational Science
and Engineering (CSE), 2014 IEEE 17th International Conference on , vol., no., pp.1290,1295, 19-21
Dec. 2014.doi: 10.1109/CSE.2014.247
[16] Moens, S.; Aksehirli, E.; Goethals, B., "Frequent Itemset Mining for Big Data," Big Data, 2013 IEEE
International Conference on , vol., no., pp.111,118, 6-9 Oct. 2013 doi:
10.1109/BigData.2013.6691742
[17] M. Riondato, J. A. DeBrabant, R. Fonseca, and E. Upfal. PARMA: a parallel randomized algorithm
for approximate association rules mining in MapReduce. In Proc. CIKM, pages 85–94. ACM, 2012.
[18] M. Malek and H. Kadima. Searching frequent itemsets by clustering data: towards a parallel approach
using mapreduce. In Proc. WISE 2011 and 2012 Workshops, pages 251–258. Springer Berlin
Heidelberg, 2013.
[19] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc.
VLDB, pages 487–499, 1994.
[20] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithms for discovery of association
rules. Data Min. and Knowl. Disc., pages 343–373, 1997.
[21] A K Jain, M N Murty, P. J. Flynn, ‘Data Clustering: A Review’, ACM COMPUTING SURVEYS,
1999.
[22] Frequent itemset mining dataset repository. http://fimi.ua.ac.be/data, 2004.
[23] T. De Bie. An information theoretic framework for data mining. In Proc. ACM SIGKDD, pages 564–
572, 2011.