Mining frequent itemsets from large transactional databases is an important task in data mining. Finding frequent itemsets is a key step in extracting association rules, which describe relationships among items in large datasets. Many algorithms have been developed to find frequent itemsets. This work presents a summary of them and a new parallel key-value pattern matching model, which shards a large-scale mining task into independent, parallel tasks. The model discovers the frequent itemsets in the database, is efficient in terms of time consumption, and avoids high computational cost.
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...IJDKP
In mining frequent itemsets, one of the most important algorithms is FP-growth. FP-growth compresses the information needed for mining frequent itemsets into an FP-tree and recursively constructs FP-trees to find all frequent itemsets. In this paper, we propose the EFP-growth (enhanced FP-growth) algorithm to improve on the quality of FP-growth. We implemented EFP-growth on the MapReduce framework using Hadoop. The new method achieves higher performance than basic FP-growth and can work with large datasets to discover frequent patterns in a transaction database. With our method, the execution time under different minimum supports is decreased.
Construction of a compact FP-tree ensures that subsequent mining can be performed with a rather compact data structure. This does not automatically guarantee that mining will be highly efficient, since one may still encounter the combinatorial problem of candidate generation if one simply uses this FP-tree to generate and check all the candidate patterns. We study how to explore the compact information stored in an FP-tree, develop the principles of frequent-pattern growth by examination of our running example, explore how to perform further optimization when there exists a single prefix path in an FP-tree, and propose a frequent-pattern growth algorithm, FP-growth, for mining the complete set of frequent patterns using an FP-tree.
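The compactness claim above can be made concrete with a small sketch of FP-tree construction. This is an illustrative Python version, not the authors' implementation: the names `FPNode`, `build_fp_tree` and `count_nodes` are hypothetical, ties between equally frequent items are broken alphabetically (an assumption), and the header table with node links used by the mining step is omitted.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with two passes over the transactions."""
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending support so that common prefixes are shared.
    root = FPNode(None)
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

For the four transactions `[a, b]`, `[b, c]`, `[a, b, c]`, `[b]` with a minimum support of 2, the eight item occurrences collapse into four tree nodes, because every transaction shares the prefix `b`.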
Abstract: Sequential pattern mining, which discovers correlations in ordered lists of events, is an important research field in data mining. In our study, we have developed a Sequential Pattern Tree structure that stores both frequent and non-frequent items from a sequence database. Because non-frequent items are stored as well, only one scan of the database is required to build the tree, which reduces the tree construction time considerably. We have then proposed an efficient Sequential Pattern Tree Mining algorithm which generates frequent sequential patterns from the Sequential Pattern Tree recursively. The main advantage of this algorithm is that it mines the complete set of frequent sequential patterns from the Sequential Pattern Tree without generating any intermediate projected tree. It also does not generate unnecessary candidate sequences and does not require repeated scanning of the original database. We have compared our proposed approach with three existing algorithms, and our performance study shows that our algorithm is much faster than the Apriori-based GSP algorithm and also faster than the existing PrefixSpan and Tree Based Mining algorithms, which are based on pattern-growth approaches.
Keywords: Data Mining, Sequence Database, Sequential Pattern, Sequential Pattern Mining, Frequent
Patterns, Tree Based Mining.
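What makes a sequential pattern "frequent" can be shown with a short support-counting sketch. It illustrates the definition only, not the Sequential Pattern Tree algorithm of the abstract; it assumes, for simplicity, that every event is a single item, and the function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database)
```

A pattern is frequent when `support(pattern, database)` reaches the chosen minimum support. Order matters: in a database holding `[a, b, c]`, `[a, c]` and `[b, a]`, the pattern `[a, c]` has support 2 while `[b, a]` has support 1.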
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Weighted frequent pattern mining finds the more important frequent patterns by considering a different weight for each item. Weighted frequent patterns are generated in weight-ascending and frequency-descending order by using a prefix tree structure. The generated weighted frequent patterns are then passed to a maximal frequent itemset mining algorithm. Maximal frequent pattern mining reduces the number of frequent patterns while keeping sufficient result information. In this paper, we propose an efficient algorithm to mine maximal weighted frequent patterns over data streams. A new efficient data structure, consisting of a prefix tree and a conditional tree, is used to dynamically maintain the information of transactions. Three mining strategies (incremental, interactive and maximal) are presented, and the details of the algorithms are discussed. We apply the approach to market basket analysis for an electronics shop. Experimental studies are performed to evaluate the effectiveness of our algorithm.
Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree rather than conventional FP-trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets, and the reducers perform combination operations by constructing small ultrametric trees and mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We develop FiDoop-HD, an extension of FiDoop, to speed up the mining performance for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...AshishDPatel1
Sequential pattern mining generates sequential patterns, which can be used as the input of another program for retrieving information from a large collection of data. It requires a large amount of memory as well as numerous I/O operations, and multistage operations reduce the efficiency of the algorithm. The proposed GACP is based on a graph representation and avoids recursively reconstructing intermediate trees during the mining process. The algorithm also eliminates the need to scan the database repeatedly. The graph used in GACP is a data structure accessed starting at its first node, called the root, and each node of the graph is either a leaf or an interior node. An interior node has one or more child nodes, so the path from the root to any node in the graph defines a sequence. After construction of the graph, a pruning technique called clustering is used to retrieve the records from the graph. The algorithm can be used to mine the database using compact memory-based data structures and clever pruning methods.
SPATIAL R-TREE INDEX BASED ON GRID DIVISION FOR QUERY PROCESSINGijdms
Tracking moving objects has become essential in everyday life and has many uses, such as GPS guidance, traffic monitoring and location-based services. Tracking the changing positions of objects has therefore become an important issue. The moving entities send their positions to the server through a network, and a large amount of data is generated by these objects with highly frequent updates, so we need an index structure that retrieves information as fast as possible. The index structure should be adaptive and dynamic in monitoring the locations of objects, and quick to respond to queries. The best-known kinds of queries in moving-object databases are range, point and k-nearest-neighbour queries. This study uses the R-tree method to obtain detailed range query results efficiently. Using an R-tree alone, however, generates much overlap and coverage between minimum bounding rectangles (MBRs). The R-tree is therefore combined with a grid-partition index, because a grid index can reduce the overlap and coverage between MBRs. Query performance is efficient when these methods are used together. We perform an extensive experimental study to compare the two approaches on modern hardware.
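The grid-partition half of this idea can be sketched in a few lines. The following is a plain uniform grid over 2-D points, not the combined R-tree and grid index of the abstract; the class name `GridIndex`, its methods and the `cell_size` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2-D points supporting position updates and range queries."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(dict)  # cell -> {object id: (x, y)}
        self.where = {}                 # object id -> cell currently holding it

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        """Insert an object or move it to a new position."""
        old = self.where.get(obj_id)
        if old is not None:
            del self.cells[old][obj_id]
        cell = self._cell(x, y)
        self.cells[cell][obj_id] = (x, y)
        self.where[obj_id] = cell

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return the ids of all objects inside the rectangle, sorted."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        # Only cells that intersect the query rectangle are inspected.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id, (x, y) in self.cells.get((cx, cy), {}).items():
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(obj_id)
        return sorted(hits)
```

Because grid cells never overlap, a frequent position update touches at most two cells, which is the property the abstract relies on to reduce overlap between MBRs.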
Data mining has been a very popular research topic over the years. Sequential pattern mining, or sequential rule mining, is a very useful application of data mining for prediction. In this paper, we present a review of sequential rule and sequential pattern mining. The advantages and drawbacks of each popular sequential mining method are discussed in brief.
Combined mining approach to generate patterns for complex datacsandit
In data mining applications, which often involve complex data such as multiple heterogeneous data sources, user preferences, decision-making actions and business impacts, a single data mining method cannot obtain the complete useful information in the form of informative patterns: joining large relevant data sources to discover patterns covering various aspects of useful information, where that is possible at all, would consume too much time and space. We consider combined mining as an approach for mining informative patterns from multiple data sources, multiple features or multiple methods, as the requirements dictate. In the multi-source combined mining approach, we apply the Lossy Counting algorithm to each data source to get the frequent itemsets and then derive the combined association rules. In the multi-feature combined mining approach, we obtain pair patterns and cluster patterns and then generate incremental pair patterns and incremental cluster patterns, which cannot be directly generated by the existing methods. In the multi-method combined mining approach, we combine FP-growth and a Bayesian Belief Network to build a classifier that yields more informative knowledge.
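The Lossy Counting step mentioned above can be sketched for single items as follows. This is a minimal Python rendering of the standard Lossy Counting scheme with error bound `epsilon`, not the authors' code; the abstract applies it to itemsets per data source, which is not shown, and the function names are hypothetical.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item counts over a stream with error at most epsilon * n."""
    width = math.ceil(1 / epsilon)  # number of items per bucket
    counts = {}                     # item -> (observed count, maximum undercount)
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            count, delta = counts[item]
            counts[item] = (count + 1, delta)
        else:
            counts[item] = (1, bucket - 1)
        # At each bucket boundary, drop entries that cannot be frequent.
        if n % width == 0:
            counts = {
                k: (c, d) for k, (c, d) in counts.items() if c + d > bucket
            }
    return counts, n


def frequent_items(counts, n, support, epsilon):
    """Items whose observed count reaches (support - epsilon) * n."""
    return sorted(
        k for k, (c, _) in counts.items() if c >= (support - epsilon) * n
    )
```

The design trade-off is memory against accuracy: a smaller `epsilon` keeps more entries alive but reports counts closer to the true frequencies.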
In this paper, we present a literature survey of existing frequent itemset mining algorithms. The concept of frequent itemset mining is discussed in brief, the working procedure of some modern techniques is given, and the merits and demerits of each method are described. We find that frequent itemset mining is still an active research topic.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S...IAEME Publication
Association rule mining plays an important role in decision support systems. In the era of the internet, online marketing sites and social networking sites generate enormous amounts of structured and semi-structured data in the form of sales data, tweets, emails, web pages and so on. This data is so large that processing and analyzing it with traditional systems is complex and time-consuming. This paper overcomes the main-memory bottleneck of a single computing system and has two major goals. A big sales dataset of AMUL dairy is first preprocessed using Hadoop MapReduce, which converts it into a transactional dataset. Then, after the null transactions are removed, the distributed frequent pattern mining algorithm MR-DARM (MapReduce based Distributed Association Rule Mining) is used to find the most frequent itemsets. Finally, strong association rules are generated from the frequent itemsets. The paper also compares the time efficiency of MR-DARM with the existing Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) distributed frequent pattern mining algorithms. The compared algorithms are presented together with the experimental results that lead to the final conclusions.
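The final step, generating strong association rules from frequent itemsets, can be illustrated with a short sketch. It is not MR-DARM itself: it assumes the support counts of all frequent itemsets and of all their subsets are already available in a dictionary keyed by `frozenset`, and the function name `generate_rules` is hypothetical.

```python
from itertools import combinations


def generate_rules(support_counts, min_confidence):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = count / support_counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

With `milk` in 4 transactions, `butter` in 2 and both together in 2, the rule butter → milk has confidence 2/2 = 1.0, while milk → butter has confidence 2/4 = 0.5 and is rejected at a threshold of 0.8.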
Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Fram...idescitation
With the rapid growth of information technology, mining frequent patterns and finding associations among them in many business applications requires handling large and distributed databases. Because the FP-tree is considered the best compact data structure for holding data patterns in memory, there have been efforts to make it parallel and distributed to handle large databases. However, this incurs a lot of communication overhead during mining. In this paper, a parallel and distributed frequent pattern mining algorithm using the Hadoop MapReduce framework is proposed, which shows the best performance results for large databases. The proposed algorithm partitions the database in such a way that it works independently at each local node and generates the frequent patterns locally by sharing the global frequent pattern header table. These local frequent patterns are merged at the final stage. This reduces the communication overhead during structure construction as well as during pattern mining. The itemset count is also taken into consideration to reduce processor idle time. The Hadoop MapReduce framework is used in all the steps of the algorithm. Experiments carried out on a PC cluster with 5 computing nodes show better execution time than other algorithms. The experimental results show that the proposed algorithm scales efficiently to very large databases.
Index Terms—
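The shared global header table described in this abstract can be sketched as a map step per partition followed by a merge. This is a single-process Python illustration under stated assumptions, not Hadoop code and not the authors' algorithm; `map_count` and `build_header_table` are hypothetical names.

```python
from collections import Counter
from functools import reduce


def map_count(partition):
    """Map step: count, within one partition, the transactions containing each item."""
    return Counter(item for t in partition for item in set(t))


def build_header_table(partitions, min_support):
    """Reduce step: merge the local counts and keep the globally frequent items."""
    total = reduce(lambda a, b: a + b, map(map_count, partitions), Counter())
    return {item: c for item, c in total.items() if c >= min_support}
```

Each node would then mine its own partition using only the items in this table, so that local results refer to the same global item set when they are merged.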
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...IAEME Publication
In this paper, an MDL-based reduction of frequent patterns is presented. The ideal outcome of any pattern mining process is to provide new insights into the data, which also requires eliminating the non-interesting patterns that describe noise. The major problem in frequent pattern mining is to identify the interesting patterns. Instead of performing association rule mining on all the frequent itemsets, it is feasible to select a subset of frequent itemsets and perform the mining task on that subset. Selecting a small set of frequent itemsets from a large number of interesting ones is a difficult task. In our approach, an MDL-based algorithm is used to reduce the number of frequent itemsets used for association rule mining.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
Apriori is one of the key algorithms to generate frequent itemsets. Analysing frequent itemset is a crucial
step in analysing structured data and in finding association relationship between items. This stands as an
elementary foundation to supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understand the behaviour of structured data. Most of the
structured data in scientific domain are voluminous. Processing such kind of data requires state of the art
computing machines. Setting up such an infrastructure is expensive. Hence a distributed environment
such as a clustered setup is employed for tackling such scenarios. Apache Hadoop distribution is one of
the cluster frameworks in distributed environment that helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on map/reduce design and implementation of
Apriori algorithm for structured data analysis.
In the recent years the scope of data mining has evolved into an active area of research because of the previously unknown and interesting knowledge from very large database collection. The data mining is applied on a variety of applications in multiple domains like in business, IT and many more sectors. In Data Mining the major problem which receives great attention by the community is the classification of the data. The classification of data should be such that it could be they can be easily verified and should be easily interpreted by the humans. In this paper we would be studying various data mining techniques so that we can find few combinations for enhancing the hybrid technique which would be having multiple techniques involved so enhance the usability of the application. We would be studying CHARM Algorithm, CM-SPAM Algorithm, Apriori Algorithm, MOPNAR Algorithm and the Top K Rules.
An efficient algorithm for sequence generation in data miningijcisjournal
Data mining is the method or the activity of analyzing data from different perspectives and summarizing it
into useful information. There are several major data mining techniques that have been developed and are
used in the data mining projects which include association, classification, clustering, sequential patterns,
prediction and decision tree. Among different tasks in data mining, sequential pattern mining is one of the
most important tasks. Sequential pattern mining involves the mining of the subsequences that appear
frequently in a set of sequences. It has a variety of applications in several domains such as the analysis of
customer purchase patterns, protein sequence analysis, DNA analysis, gene sequence analysis, web access
patterns, seismologic data and weather observations. Various models and algorithms have been developed
for the efficient mining of sequential patterns in large amount of data. This research paper analyzes the
efficiency of three sequence generation algorithms namely GSP, SPADE and PrefixSpan on a retail dataset
by applying various performance factors. From the experimental results, it is observed that the PrefixSpan
algorithm is more efficient than other two algorithms.
Multi-threaded approach in generating frequent itemset of Apriori algorithm b...TELKOMNIKA JOURNAL
This research is about the application of multi-threaded and trie data structures to the support calculation problem in the Apriori algorithm.
The support calculation results can search the association rule for market basket analysis problems. The support calculation process is a bottleneck process and can cause delays in the following process. This work observed five multi-threaded models based on Flynn’s taxonomy, which are single process, multiple data (SPMD), multiple process, single data (MPSD), multiple process, multiple data (MPMD), double SPMD first variant, and double SPMD second variant to shorten the processing time of the support calculation. In addition to the processing time, this works also consider the time difference between each multi-threaded model when the number of item variants increases. The time obtained from the experiment shows that the multi-threaded model that applies a double SPMD variant structure can perform almost three times faster than the multi-threaded model that applies the SPMD structure, MPMD structure, and combination of MPMD and SPMD based on the time difference of 5-itemsets and 10-itemsets experimental result.
Frequent pattern mining techniques helpful to find interesting trends or patterns in
massive data. Prior domain knowledge leads to decide appropriate minimum support threshold. This
review article show different frequent pattern mining techniques based on apriori or FP-tree or user
define techniques under different computing environments like parallel, distributed or available data
mining tools, those helpful to determine interesting frequent patterns/itemsets with or without prior
domain knowledge. Proposed review article helps to develop efficient and scalable frequent pattern
mining techniques.
An Efficient Compressed Data Structure Based Method for Frequent Item Set Miningijsrd.com
Frequent pattern mining is very important for business organizations. The major applications of frequent pattern mining include disease prediction and analysis, rain forecasting, profit maximization, etc. In this paper, we are presenting a new method for mining frequent patterns. Our method is based on a new compact data structure. This data structure will help in reducing the execution time.
Web Oriented FIM for large scale dataset using Hadoopdbpublications
In large scale datasets, mining frequent itemsets using existing parallel mining algorithm is to balance the load by distributing such enormous data between collections of computers. But we identify high performance issue in existing mining algorithms [1]. To handle this problem, we introduce a new approach called data partitioning using Map Reduce programming model.In our proposed system, we have introduced new technique called frequent itemset ultrametric tree rather than conservative FP-trees. An investigational outcome tells us that, eradicating redundant transaction results in improving the performance by reducing computing loads.
FP-Tree is also a huge hierarchical data structure and cannot fit into the main memory also it is not suitable for “Incremental-mining” nor used in “Interactive-mining” system
An incremental mining algorithm for maintaining sequential patterns using pre...Editor IJMTER
Mining useful information and helpful knowledge from large databases has evolved into
an important research area in recent years. Among the classes of knowledge derived, finding
sequential patterns in temporal transaction databases is very important since it can help model
customer behavior. In the past, researchers usually assumed databases were static to simplify datamining problems. In real-world applications, new transactions may be added into databases
frequently. Designing an efficient and effective mining algorithm that can maintain sequential
patterns as a database grows is thus important. In this paper, we propose a novel incremental mining
algorithm for maintaining sequential patterns based on the concept of pre-large sequences to reduce
the need for rescanning original databases.
Mining Frequent Item set Using Genetic Algorithmijsrd.com
By applying rule mining algorithms, frequent itemsets are generated from large data sets e.g. Apriori algorithm. It takes so much computer time to compute all frequent itemsets. We can solve this problem much efficiently by using Genetic Algorithm(GA). GA performs global search and the time complexity is less compared to other algorithms. Genetic Algorithms (GAs) are adaptive heuristic search & optimization method for solving both constrained and unconstrained problems based on the evolutionary ideas of natural selection and genetic. The main aim of this work is to find all the frequent itemsets from given data sets using genetic algorithm & compare the results generated by GA with other algorithms. Population size, number of generation, crossover probability, and mutation probability are the parameters of GA which affect the quality of result and time of calculation.
Parallel Key Value Pattern Matching Model
IJSRD - International Journal for Scientific Research & Development | Vol. 2, Issue 08, 2014 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 181
R. Senthamil Selvi¹, Dr. T. Abdul Razak²
¹Assistant Professor, ²Associate Professor
¹,²Department of Computer Science
¹,²Jamal Mohamed College (Autonomous), Tiruchirappalli
Abstract: Mining frequent itemsets from huge transactional
databases is an important task in data mining. Finding the
frequent itemsets in a database is a key step in extracting
association rules, which describe relationships among the
items in large datasets. Many algorithms have been
developed to find frequent itemsets. This work presents a
summary of these approaches and a new parallel key-value
pattern matching model, which shards a large-scale mining
task into independent, parallel tasks. The model discovers
the frequent itemsets in the database, is efficient in terms of
time consumption, and avoids high computational cost.
Keywords: Data Mining, FP-Growth, Frequent Itemset
Mining, Association Rule Mining
I. INTRODUCTION
Data mining is a collection of processes for the efficient
discovery of previously unknown, valid, useful and
understandable patterns in large databases. The patterns
should be actionable, so that they may be used in an
enterprise's decision-making processes. Many software
tools are available for analyzing the data in large databases.
Mining frequent patterns is an important concept
in data mining. An itemset is frequent if its support meets
a minimum support threshold. Association rule mining
discovers relations between variables in large databases. A
maximal frequent itemset is a frequent itemset that has no
frequent proper superset. Its main purpose is to represent a
large number of results as a single pattern.
A closed frequent itemset is a frequent itemset that
has no proper superset with the same support, so it is
linked to all the frequent itemsets it contains. Items can be
linked to other items to form closed groups. For example,
take four items s1, s2, s3 and s4. The first three items can
be linked to each other as (s1, s2), (s1, s3) and (s2, s3), so
these three items form a group between them and yield a
closed itemset. Here too, the main purpose is to represent
a large number of results as a single pattern.
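The notions of support, closed itemset and maximal itemset can be made concrete with a small sketch. It assumes the standard textbook definitions and a hypothetical toy database built on the item names s1 to s4 used above; the function names are illustrative only, and the brute-force enumeration is suitable for toy data, not for real databases.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the paper's data).
transactions = [
    {"s1", "s2", "s3"},
    {"s1", "s2", "s3"},
    {"s1", "s2"},
    {"s4"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets by brute force and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support(set(combo), transactions)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result


def closed_and_maximal(frequent):
    """Split frequent itemsets into closed and maximal ones."""
    closed, maximal = set(), set()
    for itemset, count in frequent.items():
        supersets = [s for s in frequent if itemset < s]
        if not supersets:
            maximal.add(itemset)  # no frequent proper superset
        if all(frequent[s] < count for s in supersets):
            closed.add(itemset)  # no superset with the same support
    return closed, maximal


frequent = frequent_itemsets(transactions, min_support=2)
closed, maximal = closed_and_maximal(frequent)
```

With this toy data and a minimum support of 2, the only maximal itemset is {s1, s2, s3}, while both {s1, s2} and {s1, s2, s3} are closed.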
II. MOTIVATION
Business data are stored on computers, which allows users
to navigate through the data in real time. The evolution of
data mining is supported by three technologies: data
collection, high performance computing and data mining
algorithms. Data mining has many algorithms for frequent
itemset mining, and each performs well in its own setting.
In particular, the FP-growth algorithm avoids the generation
of large numbers of candidate sets. The main idea of the
algorithm is to maintain a frequent pattern tree. However,
the existing frequent itemset mining algorithms do not use
the concept of a parallel key value model. Such a model
distributes the work across programs so that the frequent
pattern data can be searched and retrieved easily.
III. RELATED WORK
Jiawei Han et al. [1] proposed the FP-growth approach for
mining frequent itemsets without candidate generation. It
relies on the FP-tree, an extended prefix-tree structure for
storing quantitative information about frequent patterns.
Several optimizations are also available to speed up
FP-growth.
Christian Borgelt proposed a C implementation of the
FP-growth algorithm [2]. Pruning is achieved by traversing
the levels of the FP-tree from top to bottom. In this
implementation, the initial FP-tree is built from top to
bottom, starting from a main memory representation of the
transaction database as a simple list of integer arrays. In
terms of item ordering, the FP-growth algorithm behaves in
exactly the opposite way to Apriori, whose implementation
usually runs faster if items are sorted in ascending order.
Aiman Moyaid Said et al. [3] presented a
comparative study of FP-growth variations. FP-growth is an
alternative to the Apriori-based approach. It represents the
frequent itemsets in a frequent pattern tree, or FP-tree,
which retains the itemset information. Using this compact
tree structure, the FP-growth algorithm mines all the
frequent itemsets.
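To illustrate why the FP-tree is compact, the following minimal sketch shows only the prefix-tree insertion step. It assumes the transactions have already been filtered to frequent items and ordered by descending support, and it omits the header table and node links of the full structure in [1]; the class and function names are illustrative.

```python
class FPNode:
    """One node of a simplified FP-tree (no header table, no node links)."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path; shared prefixes are merged."""
    root = FPNode()
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # one more transaction passes through this node
            node = child
    return root


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical ordered transactions: 7 item occurrences in total.
tree = build_fp_tree([["a", "b"], ["a", "c"], ["a", "b", "c"]])
```

In this example the seven item occurrences are stored in four nodes, because all three transactions share the prefix "a" and two of them share "a", "b".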
B. Santhosh Kumar et al. [4] compared the memory
usage and running time of the Apriori algorithm and the
FP-growth algorithm. FP-growth uses a compact data
structure and eliminates repeated database scans. The
algorithm has advantages such as completeness and
compactness.
Haoyuan Li et al. [5] built on the divide and conquer
principle of FP-growth, which decomposes a mining task
into smaller tasks and totally avoids candidate generation.
In their paper, parallel algorithms were developed to reduce
memory use and computational cost on every machine.
Earlier work on parallelizing FP-growth suffers from high
communication cost. The authors proposed a MapReduce
model of the parallel FP-growth algorithm (PFP), which
slices a large-scale mining task into autonomous
computational tasks and maps them onto MapReduce jobs,
achieving near-linear speedup. The paper is based on a
novel data and computation distribution scheme, which
virtually eliminates communication among computers and
uses the MapReduce model. It is effective in mining tag-tag
associations and webpage-webpage associations to support
query recommendation or related search.
Bharat Gupta et al. [6] analyzed the FP-growth
algorithm, which compresses the database of frequent
itemsets into a frequent pattern tree recursively, in the
same order of magnitude as the number of frequent patterns.
It then divides the compressed database into a set of
conditional databases. The FP-growth technique constructs
the conditional frequent pattern trees and conditional
pattern bases from the parts of the database which satisfy
the minimum support.
Marek Wojciechowski et al. [7] adapted the common
counting method to work with the FP-Growth algorithm and
evaluated the efficiency of both methods when FP-Growth
is used as the basic mining algorithm. They consider the
problem of optimizing batches of frequent itemset queries.
The paper uses multiple query optimization methods, such
as common counting and mine merge. These methods reduce
the I/O cost by identifying common execution tasks and
executing them only once for the whole data. The
experiments show that common counting for FP-Growth
reduces the overall processing time.
E R Naganathan et al. [8] studied structured data
mining, a major research topic in data mining. One of the
common representations of structured data is the graph,
which is an alternative approach to modeling objects.
Graph-based data mining (GDM) offers a number of methods
to mine the relational aspects of data. It is the task of
finding novel and understandable graph-theoretic patterns
in a graph representation of data. The paper presents a new
normalization technique for the sub graphs obtained from
the FP-growth model. This process may serve as a ranking
scheme among the mined sub graphs, and such a ranking
scheme can play an efficient role in sub graph applications.
IV. EXISTING PARALLEL FP-GROWTH MODEL
Parallel FP-Growth (PFP) means mining the complete set of
frequent patterns by pattern fragment growth in parallel.
It generally depends on distributed machines, each of which
executes an independent group of mining tasks. The
FP-Growth algorithm runs much faster than Apriori, and the
parallel FP-Growth algorithm is faster still. PFP converts
the database into new databases of group-dependent
transactions, so that the FP-trees built from different
group-dependent transactions are independent. This
eliminates the computational dependencies between
machines. PFP has also been shown to be promising for
supporting query recommendation for search engines.
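The conversion into group-dependent transactions can be sketched as follows. This is a simplified illustration in the spirit of the scheme in [5], not its exact implementation: the round-robin assignment of items to groups and the function name are assumptions made for the example.

```python
from collections import defaultdict


def group_dependent_transactions(transactions, f_list, num_groups):
    """Build one independent list of transaction prefixes per item group.

    `f_list` holds the frequent items in descending order of support.
    Assumption: items are assigned to groups round-robin by position.
    """
    rank = {item: pos for pos, item in enumerate(f_list)}
    group_of = {item: pos % num_groups for pos, item in enumerate(f_list)}
    shards = defaultdict(list)
    for transaction in transactions:
        # Keep frequent items only, ordered as in the F-list.
        ordered = sorted((i for i in set(transaction) if i in rank),
                         key=rank.__getitem__)
        seen = set()
        # Scan from the least frequent item; each group receives the longest
        # prefix that ends at one of its own items, and receives it only once.
        for j in range(len(ordered) - 1, -1, -1):
            gid = group_of[ordered[j]]
            if gid not in seen:
                seen.add(gid)
                shards[gid].append(ordered[: j + 1])
    return dict(shards)


# Hypothetical data: "x" is not frequent and is dropped.
shards = group_dependent_transactions(
    [["d", "a", "c"], ["b", "a", "x"]],
    f_list=["a", "b", "c", "d"],
    num_groups=2,
)
```

Each group can then build and mine an FP-tree from its own shard without exchanging data with the other groups.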
The PFP work identifies the resource challenges of
the FP-Growth algorithm: storage, computation distribution,
costly communication and the support threshold value.
Given a transaction database, PFP uses three MapReduce
phases to parallelize FP-Growth.
The PFP framework has five stages of
computation: sharding, parallel counting, grouping items,
parallel FP-Growth and aggregating. The parallel counting
stage is a classical application of the MapReduce approach.
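Parallel counting can be sketched as a map step that counts items within each shard and a reduce step that sums the partial counts. The sketch below only imitates the MapReduce pattern on one machine with a thread pool; it is not Hadoop code, and the function names and the sharding by stride are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(shard):
    """Map step: count, per item, the transactions of one shard containing it."""
    counts = Counter()
    for transaction in shard:
        counts.update(set(transaction))  # each item counts once per transaction
    return counts


def parallel_count(transactions, num_shards=2):
    """Shard the database, count each shard independently, then sum the counts."""
    shards = [transactions[i::num_shards] for i in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partial_counts = list(pool.map(map_count, shards))
    # Reduce step: merge the partial counts into global supports.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


supports = parallel_count([["a", "b"], ["a", "c"], ["a", "b", "c"]])
```

Because each shard is counted without reference to the others, the map step needs no communication until the final summation.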
With the MapReduce approach, PFP shards a
large-scale mining task into independent computational
tasks. It is also able to address the issues of memory use
and fault tolerance. PFP is therefore effective in mining
tag-tag associations and webpage-webpage associations to
support query recommendation. The disadvantage of this
method is its dependence on distributed machines, because
every additional machine increases the cost.
V. THE PROPOSED MODEL
The proposed model uses frequent patterns and is designed
to work faster than other methods. It involves two tasks,
XModel and PModel. The frequent pattern results show that
the PModel task works faster than XModel and saves
computing time. The model is suitable for all the algorithms
in data mining.
The XModel process retrieves data from the user
interface and generates a frequent itemset. If the items are
equal, the process ends. Otherwise it returns to the
temporary database and executes the whole process again
until the condition is true.
The PModel process retrieves data from the user
interface and distributes the work using key-value pairs to
prepare frequent item sets. The intersection operation is
executed between the frequent item sets, and each
intersection produces another frequent list called an F-list.
The group of all F-lists is called the G-list. The process
then checks whether the G-list contains equal frequent
items; if so, the process ends. Otherwise the whole process
is repeated until the frequent item sets are equal.
A. Searching Algorithm
The searching algorithm illustrated in Fig. 1 sorts the set of
items in descending order and connects to a database using
JdbcOdbcDriver; this process is shown in steps 8 to 11.
It then checks whether the record set is null; if so, it moves
to the next record set, and otherwise the record set is
cleared. To select a frequent item from the database, it
executes a query statement that selects a product from the
transaction table. After the selection process is completed,
the database connection is closed. This process is shown in
steps 18 to 20. All items are collected to generate groups.
Fig. 1: The Searching Algorithm
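The connect, query and close sequence of the searching algorithm can be sketched as below. The sketch substitutes Python's built-in sqlite3 module and an in-memory database for the JdbcOdbcDriver connection described above; the table name `transaction_table`, its columns, the sample rows and the choice to sort by descending frequency are all assumptions made for illustration.

```python
import sqlite3


def fetch_products_by_frequency(conn):
    """Select each product with its frequency, most frequent first."""
    cursor = conn.execute(
        "SELECT product, COUNT(*) AS freq FROM transaction_table "
        "GROUP BY product ORDER BY freq DESC, product ASC"
    )
    return cursor.fetchall()


# Stand-in for the real database connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE TABLE transaction_table (tid INTEGER, product TEXT)")
    conn.executemany(
        "INSERT INTO transaction_table VALUES (?, ?)",
        [(1, "bread"), (1, "milk"), (2, "bread"),
         (3, "bread"), (3, "milk"), (3, "eggs")],
    )
    items = fetch_products_by_frequency(conn)
finally:
    # The connection is closed once the selection is complete.
    conn.close()
```

Closing the connection in a `finally` clause mirrors the explicit close step of the algorithm and also runs when the query fails.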
B. XModel Algorithm
Fig. 2: The XModel Algorithm
The XModel illustrated in Fig. 2 executes over the
whole database, and this takes more time for execution.
It first reads the database, then selects all items and
processes them, so that the frequent items can be displayed
at the end of the program. The XModel also prints the
starting and ending time. This process is shown in steps 14
to 18. The total duration is calculated as the difference
between the ending time and the starting time, and is
measured in milliseconds.
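The timing measurement amounts to recording a start and an end time around the task and taking their difference. A minimal sketch, assuming a hypothetical helper `timed_run` and using a trivial task as a placeholder for the mining program:

```python
import time


def timed_run(task, *args):
    """Run `task(*args)` and return its result with the duration in ms."""
    start = time.perf_counter()
    result = task(*args)
    end = time.perf_counter()
    # Duration = ending time minus starting time, converted to milliseconds.
    return result, (end - start) * 1000.0


# Placeholder task; a real run would call the mining routine instead.
result, elapsed_ms = timed_run(sum, [1, 2, 3])
```

`time.perf_counter` is used here because it is a monotonic clock intended for measuring short durations.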
C. PModel Algorithm
The PModel algorithm illustrated in Fig. 3 works as
follows:
(1) Scan the transaction database once to find all
frequent items and their supports.
(2) Sort the frequent items in descending order of their
support.
(3) Get the first transaction from the transaction
database. Remove all non-frequent items and list
the remaining items according to the order in the
sorted frequent items.
(4) Get the next transaction from the transaction
database. Remove all non-frequent items and list
the remaining items according to the order in the
sorted frequent items.
(5) Repeat step 4 until all transactions of the
database are processed.
(6) Group all sorted frequent itemsets and display the
start time and end time.
Fig. 3: The PModel Algorithm
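The scanning, sorting and filtering steps listed above can be sketched as follows. The function name, the alphabetical tie-breaking between items of equal support and the sample data are assumptions for illustration; the timing display is left out.

```python
from collections import Counter


def prepare_transactions(transactions, min_support):
    """Find frequent items, sort them, then filter and reorder transactions."""
    # Scan the database once to count the support of every item.
    counts = Counter(item for t in transactions for item in set(t))
    # Sort frequent items by descending support (ties broken alphabetically).
    frequent = [
        item
        for item, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if count >= min_support
    ]
    rank = {item: pos for pos, item in enumerate(frequent)}
    # For each transaction, drop non-frequent items and reorder the rest.
    ordered = [
        sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for t in transactions
    ]
    return frequent, ordered


# Hypothetical transactions; "d" occurs once and is removed.
frequent_items, ordered_transactions = prepare_transactions(
    [["b", "a", "d"], ["a", "c"], ["a", "b", "c"]], min_support=2
)
```

The reordered transactions are then ready to be grouped, since every transaction lists its items in the same global order.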
D. Architectural Diagram
Fig. 4: Parallel Key Value Pattern Matching Model Using
XModel
In Fig. 4, a single program searches for data in
the database with the help of a single key value, so it
produces a large frequent item set. Getting an exact
frequent item set in this way is difficult, because the
processes take a lot of time to execute. If the user gets
equal frequent item sets, the process ends. Otherwise
the process is repeated until the user gets the exact
frequent item set.
Fig. 5: Parallel Key Value Pattern Matching using PModel
The architectural diagram for the Parallel Key Value
Pattern Matching model using PModel is illustrated in Fig.
5. The PModel contains several programs, each of which
searches for data in the database with the help of a key
value. Each program therefore holds some frequent itemsets,
denoted as frequent set 1, frequent set 2 and frequent set 3.
The model then performs the intersection between frequent
set 1 and frequent set 2, frequent set 2 and frequent set 3,
and frequent set 3 and frequent set 1. These operations give
further frequent lists, called F-list I, F-list II and F-list III
respectively. All F-lists are then grouped to give a frequent
item set, denoted as the “G-List”. If the user gets equal
frequent item sets, the process ends. Otherwise the
process is repeated until the user gets the exact
frequent item set.
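The intersection and grouping steps of the PModel can be sketched as below. The text does not specify how the F-lists are grouped, so taking their union as the G-list is an assumption, as are the function names and the sample frequent sets.

```python
def pairwise_f_lists(frequent_sets):
    """Intersect each frequent set with the next one, wrapping around.

    For three sets this gives (1, 2), (2, 3) and (3, 1), as in the text.
    """
    n = len(frequent_sets)
    return [frequent_sets[i] & frequent_sets[(i + 1) % n] for i in range(n)]


def build_g_list(f_lists):
    """Group all F-lists; assumption: grouping means taking their union."""
    return set().union(*f_lists)


# Hypothetical frequent sets produced by three key-value programs.
frequent_set_1 = {"a", "b", "c"}
frequent_set_2 = {"b", "c", "d"}
frequent_set_3 = {"a", "c", "e"}

f_lists = pairwise_f_lists([frequent_set_1, frequent_set_2, frequent_set_3])
g_list = build_g_list(f_lists)
```

Each intersection depends on only two of the frequent sets, so the three F-lists could be computed independently of one another.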
VI. RESULTS AND DISCUSSION
This section presents the comparison table and graph of
XModel and PModel. The proposed model is applied to
transaction data.
The following table summarizes the results.
Number of Records    Finishing Time (Milliseconds)
                     XModel      PModel
25                   844         194
50                   1567        1477
75                   3010        2270
100                  7929        3509
Table 1: Values for Comparison Graph
The execution times of XModel and PModel are
compared in milliseconds. The execution time depends on
the number of records, as shown in Table 1. For example,
for 100 records, the XModel takes 7929 milliseconds for
execution while the PModel takes only 3509 milliseconds.
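The relative gain can be read from Table 1 by dividing the XModel time by the PModel time for each number of records. The short sketch below performs this arithmetic on the reported values; the variable names are illustrative.

```python
# Finishing times in milliseconds, copied from Table 1.
records = [25, 50, 75, 100]
xmodel_ms = [844, 1567, 3010, 7929]
pmodel_ms = [194, 1477, 2270, 3509]

# Speedup of PModel over XModel for each number of records.
speedups = {n: x / p for n, x, p in zip(records, xmodel_ms, pmodel_ms)}

for n in records:
    print(f"{n} records: speedup {speedups[n]:.2f}")
```

The ratios are roughly 4.35, 1.06, 1.33 and 2.26 for 25, 50, 75 and 100 records, so the PModel is faster in every reported case, although the size of the gain varies with the number of records.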
Fig. 6: XModel Vs PModel
Fig. 6 shows the comparison graph of XModel vs.
PModel. The X-axis represents the number of records and
the Y-axis the time in milliseconds. The bar chart contrasts
the XModel and the PModel. The experiments show that the
Parallel Key Value Pattern Matching Model reduces the
overall processing time.
VII. CONCLUSION
The Parallel Key Value Pattern Matching Model is suitable
for all algorithms in frequent itemset mining, which usually
operate on large-scale distributed data. The results
demonstrate that the parallel key value pattern matching
model is effective for discovering frequent itemsets. The
model comprises two methods, XModel and PModel, which
denote the existing model and the proposed model
respectively. The XModel takes more time to execute a
program. The comparison is based on speedup and
efficiency, and the PModel produced better results on both
than the XModel.
REFERENCES
[1] Jiawei Han, Jian Pei, Yiwen Yin, Runying Mao,
“Mining Frequent Patterns without Candidate
Generation: A Frequent-Pattern Tree Approach”,
Data Mining and Knowledge Discovery, 8, 53 – 87,
Kluwer Academic Publishers, Netherlands, 2004.
[2] Christian Borgelt, “An Implementation of the FP-
Growth Algorithm”, Department of Knowledge
Processing and Language Engineering, Germany,
2005.
[3] Aiman Moyaid Said, Dr. P D D. Dominic, Dr.
Azween B Abdullah, “A Comparative Study of FP-
Growth Variations”, Department of Computer and
Information Sciences, International Journal of
Computer Science and Network Security, Vol.9,
No.5, Petronas, May 2009.
[4] B. Santhosh Kumar and K.V. Rukmani,
“Implementation of Web Usage Mining using Apriori
and FP Growth Algorithms”, Department of
Computer Science, Int. J. of Advanced Networking
and Applications, Vol.1, Issue: 06, Pages: 400-404,
Ketti, the Nilgiris, Feb – April 2010.
[5] Haoyuan Li, Yi Wang, Dong Zhang, Ming Zhang,
Edward Chang, “PFP: Parallel FP-Growth for Query
Recommendation”, Google Beijing Research, China,
2010.
[6] Bharat Gupta and Dr. Deepak Garg, “FP-Tree Based
Algorithm Analysis: FP-Growth, COFI-Tree and CT-
PRO”, Department of Computer Science,
International Journal on Computer Science and
Engineering (IJCSE), ISSN: 0975-3397, Vol. 3, No.
7, Patiala, India, July 2011.
[7] Marek Wojciechowski, Krzysztof Galecki, and
Krzysztof Gawronek, “Concurrent Processing of
Frequent Itemset Queries using FP-Growth
Algorithm”, Department of Computer Science,
Poland.
[8] E R Naganathan, S.Narayanan and K. Ramesh
Kumar, “FP-growth Based new normalization for sub
graph ranking”, Department of Computer
Application, International Journal of Database
Management System(IJDMS), Vol. 3, No.1, Tamil
Nadu, February 2011.