CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM... (ijcsit)
High-utility itemset mining (HUIM) is an important research topic in the data mining field, and many algorithms have been proposed for it. However, existing HUIM methods present too many high-utility itemsets (HUIs), which reduces not only the efficiency but also the effectiveness of mining, since users have to sift through a large number of HUIs to find useful ones. Recently a new representation, the closed+ high-utility itemset (CHUI), was proposed; with this concept, the number of HUIs is reduced massively. Existing methods adopt two phases to discover CHUIs from a transaction database. In phase I, an itemset is first checked for closedness; if it is closed, an overestimation technique is used to set an upper bound on its utility in the database, and itemsets whose overestimated utilities are no less than a given threshold are selected as candidate CHUIs. In phase II, the candidate CHUIs generated in phase I are verified by computing their exact utilities in the database. However, these methods have two problems: 1) the number of candidate CHUIs is usually very large, so extensive memory is required; and 2) the method for computing closed itemsets is time-consuming. In this paper we therefore propose an efficient algorithm, CloHUI, for mining CHUIs from a transaction database. CloHUI does not generate any candidate CHUIs during the mining process, and it verifies closed itemsets from a tree structure; we also propose a strategy to make the verification faster. Extensive experiments on sparse and dense datasets compare CloHUI with the state-of-the-art algorithm CHUD. The results show that for dense datasets CloHUI significantly outperforms CHUD: it is more than an order of magnitude faster and consumes less memory.
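A minimal sketch of the two-phase scheme described above may help. It uses the standard transaction-weighted utilization (TWU) bound as the overestimation step; the closedness check is omitted, and all data, names, and thresholds below are illustrative rather than taken from CHUD or CloHUI.

```python
# Phase-I / Phase-II sketch with the TWU upper bound (illustrative data).

# Each transaction maps item -> utility (e.g. quantity * unit profit).
db = [
    {"a": 5, "b": 2},          # transaction utility (TU) = 7
    {"a": 5, "c": 1},          # TU = 6
    {"b": 2, "c": 1, "d": 4},  # TU = 7
]

def twu(itemset, db):
    """Overestimate: sum the full TU of every transaction containing itemset."""
    s = set(itemset)
    return sum(sum(t.values()) for t in db if s.issubset(t))

def utility(itemset, db):
    """Exact utility: sum only the itemset's own utilities."""
    s = set(itemset)
    return sum(sum(t[i] for i in s) for t in db if s.issubset(t))

min_util = 10
# Phase I: keep candidates whose overestimate reaches the threshold.
candidates = [x for x in [("a",), ("a", "b"), ("b", "c")] if twu(x, db) >= min_util]
# Phase II: verify with exact utilities.
verified = [x for x in candidates if utility(x, db) >= min_util]
print(candidates, verified)  # [('a',)] [('a',)]
```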
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u... (IRJET Journal)
This document presents a new algorithm, Efficient UP-Growth+, for mining high utility itemsets from transactional databases in an efficient manner. It aims to address issues with existing algorithms, which generate a large number of candidate itemsets and require multiple scans of the original database; the proposed algorithm generates its itemsets with only two scans of the database. It works by optimizing the minimum utility threshold value to generate a suitable number of potential high utility itemsets in the first phase, rather than relying on a user-specified threshold. Experimental results on real and synthetic datasets show that the proposed algorithm takes less time and generates fewer candidate itemsets than other state-of-the-art utility mining algorithms such as UP-Growth.
Mining High Utility Patterns in Large Databases using MapReduce Framework (IRJET Journal)
This document discusses mining high utility patterns from large databases using the MapReduce framework. It proposes using the d2HUP algorithm to efficiently mine high utility patterns from partitioned big data in parallel. The algorithm traverses a reverse set enumeration tree using depth-first search to identify high utility patterns based on a minimum utility threshold, without generating candidates. It partitions the data using MapReduce and mines patterns from each partition individually. The results are then combined to obtain the final high utility patterns. The proposed approach aims to improve efficiency over existing methods that are not scalable to large datasets.
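The partition-and-combine flow described here can be sketched in a few lines. The following toy, single-process version uses a trivial per-partition miner as a stand-in for d2HUP and plain Python functions as stand-ins for the map and reduce steps; no actual Hadoop/MapReduce API is used.

```python
# Toy map/reduce simulation of per-partition mining plus global combination.

def local_mine(partition):
    """Map step stand-in: exact single-item utilities within one partition."""
    out = {}
    for t in partition:
        for item, u in t.items():
            out[item] = out.get(item, 0) + u
    return out

def combine(partials, min_util):
    """Reduce step: merge per-partition utilities, keep globally high ones."""
    total = {}
    for part in partials:
        for item, u in part.items():
            total[item] = total.get(item, 0) + u
    return {i: u for i, u in total.items() if u >= min_util}

partitions = [[{"a": 4, "b": 1}], [{"a": 3}, {"b": 2, "c": 6}]]
print(combine([local_mine(p) for p in partitions], min_util=5))  # {'a': 7, 'c': 6}
```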
A Relative Study on Various Techniques for High Utility Itemset Mining from T... (IRJET Journal)
This document summarizes research on techniques for mining high utility itemsets from transactional databases. It discusses how traditional frequent itemset mining focuses only on item frequency and not utility, whereas high utility itemset mining considers both the frequency and the utility (e.g., profit or quantity) of itemsets to find those with high total utility. The document reviews related work on frequent itemset mining and introduces high utility itemset mining; it defines key concepts such as internal utility and external utility and discusses properties such as the utility bound property. Finally, it surveys several algorithms for high utility itemset mining, including Two-Phase, CTU-Mine, CTU-PRO, and CTU-PROL.
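The internal/external utility definitions mentioned here lend themselves to a small worked example. In the standard HUIM formulation, which this summary appears to follow, the internal utility is the quantity of an item in a transaction, the external utility is its unit profit, and an itemset's utility sums their products over all transactions containing it; the data below is made up for illustration.

```python
# Worked example of the standard HUIM utility definitions:
# u(X) = sum over transactions T containing X of sum_{i in X} q(i, T) * p(i)

external = {"a": 3, "b": 1, "c": 5}  # unit profits p(i)
transactions = [                      # internal utilities q(i, T)
    {"a": 2, "b": 4},
    {"a": 1, "c": 2},
    {"a": 3, "b": 1, "c": 1},
]

def itemset_utility(X, transactions, external):
    X = set(X)
    return sum(
        sum(q[i] * external[i] for i in X)
        for q in transactions if X.issubset(q)
    )

# u({a, c}) = (1*3 + 2*5) + (3*3 + 1*5) = 13 + 14 = 27
print(itemset_utility({"a", "c"}, transactions, external))
```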
The document proposes improved MapReduce algorithms (UP-Growth and UP-Growth+) for mining high utility itemsets from transactional databases. Existing algorithms often generate a huge number of candidate itemsets, degrading performance. The proposed algorithms use a UP-Tree structure and MapReduce framework on Hadoop to more efficiently identify high utility itemsets from large datasets in distributed storage. Experimental results show the improved algorithms outperform other methods, especially for databases with long transactions or low minimum utility thresholds. The goal is to address limitations of existing approaches for low-memory systems and databases with null transactions.
The document proposes the UP-Growth+ algorithm to efficiently mine high utility itemsets from transactional databases. It first constructs a UP-Tree to store transaction information using two database scans while removing unpromising items. The UP-Tree aims to reduce overestimated utilities. Potential high utility itemsets are then generated from the UP-Tree using the UP-Growth+ algorithm through two strategies to further decrease overestimations. Finally, actual high utility itemsets are identified from the potential set by considering real utilities in the database.
This document provides an overview of scalable pattern mining algorithms for large scale interval data. It discusses the need for scalable pattern mining due to the huge increase in data size. It covers serial frequent itemset mining methods like Apriori, Eclat, and FP-growth. It also discusses parallel itemset mining methods including FP-growth based PFP algorithm and ultrametric tree based FiDoop algorithm. Additionally, it covers pattern mining approaches for interval data, including interval sequences, temporal relations, and hierarchical representations. The document concludes by stating that while efforts have been made to modify classic algorithms for distributed processing, scalable mining of temporal relationships on large interval data remains an open issue.
An incremental mining algorithm for maintaining sequential patterns using pre... (Editor IJMTER)
Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important, since it can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems; in real-world applications, however, new transactions may be added to databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences, which reduces the need for rescanning original databases.
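The pre-large idea referred to above can be sketched as follows: alongside the usual minimum support S_u, a lower threshold S_l is maintained, and sequences whose support falls between the two are buffered as "pre-large" so that small batches of inserted transactions can be absorbed without rescanning. The sketch below shows only the classification step, with illustrative thresholds; the paper's exact safety bound on the number of tolerated insertions is omitted.

```python
# Classify patterns into large / pre-large / small by relative support.
# Thresholds are illustrative, not the paper's.

def classify(counts, n, s_l=0.3, s_u=0.5):
    """counts: {pattern: absolute support}, n: number of sequences."""
    large, pre_large = {}, {}
    for pattern, c in counts.items():
        if c / n >= s_u:
            large[pattern] = c
        elif c / n >= s_l:
            pre_large[pattern] = c   # buffered; updated on insertions
    return large, pre_large

counts = {("a",): 6, ("a", "b"): 4, ("c",): 2}
large, pre = classify(counts, n=10)
print(large)  # {('a',): 6}
print(pre)    # {('a', 'b'): 4}
```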
Mining high utility itemsets in data streams based on the weighted sliding wi... (IJDKP)
This document proposes an algorithm for mining high utility itemsets from data streams based on the weighted sliding window model. The weighted sliding window model allows the user to specify the number of windows, window size, and weight for each window. The algorithm, called HUI_W, makes a single pass over the data stream and takes advantage of reusing stored information to efficiently discover high utility itemsets. It first identifies high transaction-weighted utilization (HTWU) 1-itemsets, then generates candidate itemsets and determines actual high utility itemsets based on a transaction-weighted downward closure property to prune the search space.
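A small sketch of the weighted sliding-window model described above, assuming the natural reading that an itemset's overall measure is the weight-scaled sum of its per-window utilities; window contents, sizes, and weights below are made up.

```python
# Weighted sliding-window utility: weight-scaled sum over windows.

def util(itemset, batch):
    """Exact utility of itemset within one window's batch of transactions."""
    X = set(itemset)
    return sum(sum(t[i] for i in X) for t in batch if X.issubset(t))

def weighted_utility(itemset, windows, weights):
    return sum(w * util(itemset, batch) for batch, w in zip(windows, weights))

windows = [
    [{"a": 4, "b": 1}],            # oldest window
    [{"a": 2}, {"a": 1, "b": 3}],  # newest window
]
weights = [0.5, 1.0]               # newer data weighted more heavily
print(weighted_utility(("a",), windows, weights))  # 0.5*4 + 1.0*3 = 5.0
```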
Comparison Between High Utility Frequent Item sets Mining Techniques (ijsrd.com)
Data mining can be defined as an activity that extracts new, nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting statistical correlations between the items that appear more frequently in transaction databases. Also termed frequent itemset mining, these techniques were based on the rationale that itemsets which appear more frequently must be of more importance to the user from the business perspective. In this paper we throw light upon an emerging area called utility mining, which considers not only the frequency of itemsets but also the utility associated with them. The term utility refers to the importance or usefulness of the appearance of an itemset in transactions, quantified in terms such as profit, sales, or other user preferences. This paper presents a novel efficient algorithm, FUFM (Fast Utility-Frequent Mining), which finds all utility-frequent itemsets within the given utility and support thresholds. It is faster and simpler than the original 2P-UF (2-Phase Utility-Frequent) algorithm, as it is based on efficient methods for frequent itemset mining. Experimental evaluation on artificial datasets shows that, in contrast with 2P-UF, our algorithm can also be applied to mine large databases.
Data mining has been a very popular research topic over the years. Sequential pattern mining, or sequential rule mining, is a very useful application of data mining for prediction purposes. In this paper, we present a review of sequential rule and sequential pattern mining. The advantages and drawbacks of each popular sequential mining method are discussed in brief.
This document summarizes literature on frequent itemset mining on big data. It first defines key concepts like frequent itemsets, support, and confidence used in frequent itemset mining. It then discusses the Hadoop framework and MapReduce programming model for distributed processing of large datasets. Different algorithms for mining frequent itemsets on Hadoop like single-pass counting, fixed-pass combined counting, and dynamic-pass counting are described. Methods to distribute the search space like partitioning the prefix tree are also covered.
The document discusses various algorithms for searching data structures, including serial search with average time complexity Θ(n), binary search with average time complexity Θ(log n), and hashing techniques that can provide constant-time Θ(1) search by storing items in an array using a hash function. It provides pseudocode for binary search and discusses improvements like interpolation search, which can achieve Θ(log log n) search time on average.
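The summary mentions pseudocode for binary search; a runnable Python version of the classic Θ(log n) loop looks like this.

```python
def binary_search(a, target):
    """Return an index of target in sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2       # halve the search interval each step
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([2, 3, 5, 8, 13], 8))  # 3
print(binary_search([2, 3, 5, 8, 13], 7))  # -1
```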
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La... (IOSR Journals)
This document presents an improved item-based maxcover algorithm to protect sensitive patterns in large databases. The algorithm aims to minimize information loss when sanitizing databases to hide sensitive patterns. It works by identifying sensitive transactions containing restrictive patterns, then sorting these transactions by degree and size and selecting victim items to remove based on which items have the maximum cover across multiple patterns. This is done with only one scan of the source database. Experimental results on real datasets show the algorithm achieves zero hiding failure and a low misses cost between 0 and 2.43%, while keeping the sanitization rate between 40 and 68% and information loss below 1.1%.
An improved Apriori algorithm for association rules (ijnlc)
There are several mining algorithms for association rules. One of the most popular is Apriori, which is used to extract frequent itemsets from large databases and to derive association rules for knowledge discovery. Building on this algorithm, this paper points out the limitation of the original Apriori algorithm, namely the time wasted scanning the whole database when searching for frequent itemsets, and presents an improvement on Apriori that reduces that wasted time by scanning only some transactions. Experimental results with several groups of transactions and several minimum-support values, applied to both the original Apriori and our implemented improved Apriori, show that the improved Apriori reduces the time consumed by 67.38% in comparison with the original, making the Apriori algorithm more efficient and less time-consuming.
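A hedged sketch of the kind of scan reduction described above: a plain Apriori counting pass that, before the next level, discards transactions too short to contain any larger candidate, so later scans touch fewer rows. This illustrates the general idea only; the paper's exact transaction-selection rule may differ.

```python
from itertools import combinations

def apriori_level(transactions, candidates, min_sup):
    """One Apriori pass: count candidate supports, keep the frequent ones."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if set(c).issubset(t):
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_sup}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
freq1 = apriori_level(transactions, [("a",), ("b",), ("c",)], min_sup=2)
cands2 = [tuple(sorted(set(x) | set(y))) for x, y in combinations(freq1, 2)]
# Scan reduction: drop transactions too short to hold any 2-candidate.
transactions = [t for t in transactions if len(t) >= 2]
print(apriori_level(transactions, cands2, min_sup=2))  # {('a','b'): 2, ('a','c'): 2}
```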
This document presents a study on finding profitable and preferable itemsets from transactional datasets. It proposes algorithms to find the top-k profitable and popular product itemsets within given utility and support threshold constraints. The paper describes the system structure, which includes modules for preprocessing product and user datasets, identifying frequent itemsets, finding smaller frequent sets, price correlation analysis, and determining the top-k popular products. An example is provided to illustrate the frequent itemset mining process. An experimental comparative study on real and synthetic datasets demonstrates the effectiveness and efficiency of the proposed algorithms.
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce (Fabio Fumarola)
The document presents MrAdam, a parallel algorithm for approximate frequent itemset mining using MapReduce. MrAdam avoids expensive communication and synchronization costs by mining approximate frequent itemsets from big data with statistical error guarantees. It combines a statistical approach based on the Chernoff bound with MapReduce-based local model discovery and global combination through an SE-tree and structural interpolation. Experiments show MrAdam is 2 to 100 times faster than previous frequent itemset mining algorithms using MapReduce.
Modified Bit-Apriori Algorithm for Frequent Item-Sets in Data Mining (idescitation)
Mining frequent item-sets is one of the most important concepts in data mining; it is a fundamental and initial task. Apriori [3] is the most popular and most frequently used algorithm for finding frequent item-sets. Other algorithms, viz. Eclat [4] and FP-growth [5], are also used to find frequent item-sets. To improve the time efficiency of Apriori, Jiemin Zheng introduced the Bit-Apriori [1] algorithm with the following changes with respect to Apriori [3]:
1) Support counting is implemented by performing a bitwise "And" operation on binary strings.
2) Special equal-support pruning is applied.
In this paper, to improve the time efficiency of the Bit-Apriori [1] algorithm, a novel algorithm that deletes infrequent items during the second and subsequent tries is proposed and demonstrated with an example.
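The bitwise support counting described in point 1) is easy to sketch: each item keeps a bit string with bit j set iff transaction j contains the item, and the support of an itemset is the popcount of the AND of its items' strings. The data below is illustrative.

```python
N_TX = 4
ALL = (1 << N_TX) - 1  # bit mask covering all four transactions

bits = {
    "a": 0b1011,  # a appears in transactions 0, 1, 3 (bit 0 = rightmost)
    "b": 0b0011,  # b appears in transactions 0, 1
    "c": 0b1100,  # c appears in transactions 2, 3
}

def support(itemset, bits):
    v = ALL
    for item in itemset:
        v &= bits[item]        # the bitwise "And" step from the summary
    return bin(v).count("1")   # popcount = number of supporting transactions

print(support(["a", "b"], bits))  # AND(1011, 0011) = 0011 -> support 2
print(support(["a", "c"], bits))  # AND(1011, 1100) = 1000 -> support 1
```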
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect... (IOSR Journals)
The document analyzes pattern transformation algorithms for sensitive knowledge protection in data mining. It discusses:
1) Three main privacy preserving techniques - heuristic, cryptography, and reconstruction-based. The proposed algorithms use heuristic-based techniques.
2) Four proposed heuristic-based algorithms - item-based Maxcover (IMA), pattern-based Maxcover (PMA), transaction-based Maxcover (TMA), and Sensitivity Cost Sanitization (SCS) - that modify sensitive transactions to decrease support of restrictive patterns.
3) Performance improvements including parallel and incremental approaches to handle large, dynamic databases while balancing privacy and utility.
The FP-Tree is also a huge hierarchical data structure that cannot fit into main memory; moreover, it is suitable neither for incremental mining nor for use in interactive-mining systems.
Chapter 8.3, Data Mining: Concepts and Techniques, 2nd Ed. slides, Han & Kamber (error007)
The document discusses sequential pattern mining algorithms. It begins by introducing sequential patterns and challenges in mining them from transaction databases. It then describes the Apriori-based GSP algorithm, which generates candidate sequences level-by-level and scans the database multiple times. The document also introduces pattern-growth methods like PrefixSpan that avoid candidate generation by projecting databases based on prefixes. Finally, it discusses optimizations like pseudo-projection that speed up sequential pattern mining.
This document discusses frequent pattern mining algorithms. It describes the Apriori, AprioriTid, and FP-Growth algorithms. The Apriori algorithm uses candidate generation and database scanning to find frequent itemsets. AprioriTid tracks transaction IDs to reduce scans. FP-Growth avoids candidate generation and multiple scans by building a frequent-pattern tree. It finds frequent patterns by mining the tree.
Mining Of Big Data Using Map-Reduce Theorem (IOSR Journals)
This document discusses using MapReduce to efficiently extract large and complex data from big data sources. It proposes a MapReduce theorem for big data mining that is more efficient than the Heterogeneous Autonomous Complex and Evolving (HACE) theorem. MapReduce libraries support different programming languages and platforms, allowing for portable big data processing. The document outlines how MapReduce connects to Big Query to allow SQL queries to efficiently extract and analyze large datasets stored in the cloud. It also discusses data cleaning, sampling, and normalization as part of the big data mining process.
This document discusses using the R programming language and RHadoop libraries to perform association rule mining on big data stored in Hadoop. It first provides background on big data, association rule mining, and integrating R with Hadoop using RHadoop. It then describes setting up an 8-node Hadoop cluster using Ambari and installing RHadoop libraries to enable R scripts to run MapReduce jobs on the cluster. The goal is to use R and RHadoop to analyze a training dataset and discover interesting association rules.
A survey paper on sequence pattern mining with incremental... (Alexander Decker)
This document summarizes four algorithms for sequential pattern mining: GSP, ISM, FreeSpan, and PrefixSpan. GSP is an Apriori-based algorithm that takes into account time constraints and taxonomies. ISM extends SPADE to incrementally update the frequent pattern set when new data is added. FreeSpan uses frequent items to recursively project databases and grow subsequences. PrefixSpan also uses projection but claims to not require candidate generation. It recursively projects databases based on short prefix patterns. The document concludes that most previous studies used GSP or PrefixSpan and that future work could focus on improving time efficiency of sequential pattern mining.
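The prefix-projection step that PrefixSpan is built on can be sketched in a toy form, simplified to sequences of single items (the real algorithm handles sequences of itemsets and adds optimizations such as pseudo-projection).

```python
# Toy prefix projection: for each sequence containing the prefix item,
# keep only the suffix after its first occurrence.

def project(database, prefix_item):
    projected = []
    for seq in database:
        if prefix_item in seq:
            projected.append(seq[seq.index(prefix_item) + 1:])
    return projected

db = [["a", "b", "c"], ["b", "a", "c"], ["a", "c"], ["b", "c"]]
print(project(db, "a"))  # [['b', 'c'], ['c'], ['c']]
# PrefixSpan recurses: frequent items in the projection ('c' here, support 3)
# extend the prefix to <a, c> without generating candidates globally.
```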
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A cyber physical stream algorithm for intelligent software defined storage (Made Artha)
The document presents a new Cyber Physical Stream (CPS) algorithm for selecting predominant items from large data streams. The algorithm works well for item frequencies starting from 2%. It is designed for use in intelligent Software-Defined Storage systems combined with fuzzy indexing. Experiments show CPS improves accuracy and efficiency over previous algorithms. CPS is inspired by a brain model and works by incrementing a "voltage" value when items match and decrementing it otherwise, selecting the item with highest voltage. It performs well on both uniform random and Zipf's law distributed streams, with optimal parameter values depending on the distribution.
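The "voltage" scheme described above resembles the classic majority-vote idea: keep a candidate and a counter, increment on a match, decrement otherwise, and replace the candidate when the counter reaches zero. The sketch below is that classic scheme, not the CPS algorithm itself.

```python
# Classic majority-vote style selection of a predominant stream item.

def predominant(stream):
    candidate, voltage = None, 0
    for item in stream:
        if voltage == 0:
            candidate, voltage = item, 1   # adopt a new candidate
        elif item == candidate:
            voltage += 1                   # match: increment "voltage"
        else:
            voltage -= 1                   # mismatch: decrement
    return candidate

print(predominant(["a", "b", "a", "c", "a", "a"]))  # 'a'
```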
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding... (ijitcs)
Online mining of streaming data is one of the most important issues in data mining. In this paper, we propose an efficient one-pass algorithm (mining maximum frequent item sets over a transaction-sensitive sliding window) to mine the set of all frequent item sets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. The experiments show that the proposed algorithm not only attains highly accurate mining results but also runs significantly faster and consumes less memory than existing algorithms for mining frequent item sets over recent data streams. Our theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable and performs better for mining the set of all maximum frequent item sets over the entire history of the data streams.
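The bit-sequence representation credited above with cheap window sliding can be sketched as follows: each item keeps one bit per transaction in the current window, sliding is a shift plus a mask, and support is a popcount over ANDed strings. The window size and data are illustrative.

```python
WINDOW = 4
MASK = (1 << WINDOW) - 1

def arrive(bits, items_in_tx, all_items):
    """Slide the window by one transaction: shift every item's bit string,
    set the low bit for items present in the new transaction."""
    for item in all_items:
        bits[item] = ((bits.get(item, 0) << 1) | (item in items_in_tx)) & MASK
    return bits

def support(bits, itemset):
    v = MASK
    for item in itemset:
        v &= bits.get(item, 0)
    return bin(v).count("1")

bits, items = {}, {"a", "b"}
for tx in [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}]:
    arrive(bits, tx, items)
print(support(bits, {"a", "b"}))  # both present in 2 of the last 4 transactions
```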
International Journal of Engineering Research and Development (IJERD Editor)
The document summarizes research on mining high utility itemsets from transactional databases. It discusses how traditional frequent itemset mining algorithms do not account for item importance (weights/profits); utility mining aims to discover itemsets that generate high total utility based on item weights and quantities. The document reviews existing utility mining algorithms like Two-Phase and UP-Growth, and proposes a new algorithm called Miner. Miner uses a novel utility-list structure and an Estimated Utility Co-occurrence Pruning strategy to reduce the number of costly join operations during mining, achieving better performance than UP-Growth. Experimental results on real datasets show Miner performs up to 95% fewer joins and is up to six times faster than UP-Growth.
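The co-occurrence pruning attributed to Miner can be sketched with the usual structure from this line of work: a map from item pairs to the summed transaction utility (TWU) of transactions containing both, consulted before any costly utility-list join. Names and data below are illustrative, not the paper's.

```python
from itertools import combinations

def build_eucs(db):
    """db: list of {item: utility}. Returns {(i, j): pair TWU}."""
    eucs = {}
    for t in db:
        tu = sum(t.values())                       # transaction utility
        for i, j in combinations(sorted(t), 2):    # every item pair in t
            eucs[(i, j)] = eucs.get((i, j), 0) + tu
    return eucs

db = [{"a": 5, "b": 2}, {"a": 5, "c": 1}, {"a": 2, "b": 2, "c": 3}]
eucs = build_eucs(db)
min_util = 10
# Skip joining the utility lists of {a} and {c} if their pair TWU is too low.
print(eucs[("a", "c")], eucs[("a", "c")] >= min_util)  # 13 True
```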
Generation of Potential High Utility Itemsets from Transactional Databases (AM Publications)
Mining high utility item sets from a transactional database refers to the discovery of item sets with high utility. Previous algorithms such as Apriori and FP-Growth incur the problem of producing a large number of candidate item sets for high utility item sets, and such a large number of candidates degrades mining performance in terms of execution time. To improve mining performance, UP-Growth came into existence. UP-Growth effectively mines the potential high utility item sets from the transactional database. The information of high utility item sets is maintained in a tree-based data structure named the utility pattern tree (UP-Tree), such that candidate item sets can be generated efficiently with only two scans of the database. The performance of UP-Growth is compared with state-of-the-art algorithms on many types of both real and synthetic data sets.
A Fuzzy Algorithm for Mining High Utility Rare Itemsets - FHURI (idescitation)
Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of an itemset is based on the user's perspective, such as cost, profit, or revenue, and is of significant importance. Utility mining considers utility factors in data mining tasks. Utility-based descriptive data mining that aims at discovering itemsets with high total utility is termed high utility itemset mining. High utility itemsets may contain frequent as well as rare itemsets. Classical utility mining considers items and their utilities only as discrete values, whereas in real-world applications such utilities can be described by fuzzy sets. Itemset utility mining with fuzzy modeling thus allows item utility values to be fuzzy and dynamic over time. In this paper, an algorithm, FHURI (Fuzzy High Utility Rare Itemset Mining), is presented to efficiently and effectively mine very-high (and high) utility rare itemsets from databases by fuzzification of utility values. FHURI effectively extracts fuzzy high utility rare itemsets by integrating fuzzy logic with high utility rare itemset mining, and may have practical meaning for real-world marketing strategies. The results are shown using synthetic datasets.
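The fuzzification step FHURI relies on can be illustrated with a toy membership function that maps a crisp utility value to degrees of "low" and "high"; the shape and cut points below are invented for illustration and are not FHURI's actual definitions.

```python
# Toy linear membership functions for fuzzifying a crisp utility value.

def fuzzify(utility, lo=10.0, hi=50.0):
    """Return membership degrees for 'low' and 'high' utility (illustrative)."""
    if utility <= lo:
        high = 0.0
    elif utility >= hi:
        high = 1.0
    else:
        high = (utility - lo) / (hi - lo)  # linear ramp between the cut points
    return {"low": 1.0 - high, "high": high}

print(fuzzify(30.0))  # {'low': 0.5, 'high': 0.5}
```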
A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining (IJSRD)
Data mining is the process of extracting knowledge from data or data streams. Mining high utility itemsets focuses only on itemsets with high profit; since data or itemsets may be static or streaming, mining this type of data has become a significant research topic. In this paper we present the different methods available for mining high utility itemsets; developing an algorithm for this requires logic to mine the high utility itemsets from data streams. We have compared the following algorithms: the one-pass algorithm, the Two-Phase algorithm, mining top-k utility frequent itemsets, and sliding-window-based algorithms such as MHUI-BIT (Mining High Utility Itemsets Based on Bit Vectors) and MHUI-TID (Mining High Utility Itemsets Based on TID-Lists) over a lexicographical-tree-based summary data structure, together with the FHM (Fast High Utility Miner) algorithm using Estimated Utility Co-occurrence Pruning, which we will use in our proposed work.
Frequent pattern mining techniques are helpful for finding interesting trends or patterns in massive data. Prior domain knowledge helps in deciding an appropriate minimum support threshold. This review article presents different frequent pattern mining techniques, based on Apriori, FP-tree, or user-defined techniques, under different computing environments such as parallel or distributed systems and available data mining tools; these are helpful for determining interesting frequent patterns/itemsets with or without prior domain knowledge. The review thus helps in developing efficient and scalable frequent pattern mining techniques.
The International Journal of Engineering and Science (The IJES) (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer review to ensure originality, timeliness, relevance, and readability.
Optimized High-Utility Itemsets Mining for Effective Association Mining Paper (IJECEIAES)
Association rule mining is widely used for determining the frequent itemsets of a transactional database; however, the utility of itemsets needs to be considered in market-behavior applications. Apriori and FP-growth generate association rules without the utility factor of items. High-utility itemset mining (HUIM) is a well-known method that effectively determines itemsets based on high utility values; the resulting itemsets are known as high-utility itemsets. The fastest high-utility mining method (FHM) is an enhanced version of HUIM: FHM reduces the number of join operations during itemset generation, so it is faster than HUIM. For large datasets, both methods are very expensive. The proposed method addresses this issue by building a pruning-based utility co-occurrence structure (PEUCS) for the elimination of low-profit itemsets; it thus processes only an optimal number of high-utility itemsets, and is therefore called optimal FHM (OFHM). Experimental results show that OFHM takes less computational runtime and is therefore more efficient than other existing methods on benchmarked large datasets.
This document proposes a new framework for mining the top-k high utility itemsets from transactional databases without specifying a minimum utility threshold. It introduces the problem of setting an appropriate minimum utility threshold in traditional high utility itemset mining. The document then presents the Top-K Utility itemset mining (TKU) algorithm, which addresses four main challenges in mining top-k high utility itemsets: 1) the lack of an anti-monotone property, 2) adapting the TWU model, 3) not having a minimum utility threshold given beforehand, and 4) effectively raising the threshold without missing any top-k itemsets. Experimental results on real and synthetic datasets show that TKU has excellent performance and scalability compared to
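Challenge 4), raising the threshold without missing results, has a simple core that can be sketched: start the internal minimum utility at zero and raise it to the utility of the k-th best itemset found so far, so pruning tightens monotonically without losing any of the top-k. TKU's actual raising strategies are more elaborate; this shows only the underlying loop.

```python
import heapq

def top_k_update(heap, k, itemset, util):
    """Keep a min-heap of the current best k (utility, itemset) pairs and
    return the current internal threshold (utility of the worst kept one)."""
    if len(heap) < k:
        heapq.heappush(heap, (util, itemset))
    elif util > heap[0][0]:
        heapq.heapreplace(heap, (util, itemset))
    return heap[0][0] if len(heap) == k else 0

heap, k, threshold = [], 2, 0
for itemset, util in [("a", 5), ("ab", 12), ("ac", 9), ("bc", 3)]:
    threshold = top_k_update(heap, k, itemset, util)
print(sorted(heap, reverse=True), threshold)  # [(12, 'ab'), (9, 'ac')] 9
```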
Transaction Profitability Using HURI Algorithm [TPHURI] (ijbiss)
Business intelligence (BI) is the formulation of business strategies that help an organization achieve its objectives and predict its future. Data mining is often referred to as BI in the business domain. One of the major tasks in data mining is association rule mining (ARM). ARM techniques incorporated in BI systems can be utilized in business decision-making such as retail shelf management, catalog design, customer segmentation, cross-selling, quality improvement, and product-bundling marketing.
The ARM technique is used to identify frequent itemsets from huge databases and then generate strong association rules, considering each item as having the same value. But in a large number of real-world applications, items have different values according to their impact on the respective decision-making processes. Traditional ARM techniques cannot fulfil the demands arising from these applications. Data mining researchers are continuously improving the quality of the ARM technique by incorporating the utility of items. The utility of an item is decided by its contribution towards business profit, the quantity of the item sold, etc. Utility mining hence focuses on identifying itemsets with high utilities.
Jyothi et al. proposed the HURI algorithm in [2] for producing high utility rare itemsets according to users' interest. An algorithm, Transaction Profitability using HURI (TPHURI), is proposed in this paper as a modified version of HURI. TPHURI finds profitable transactions consisting of high utility rare items and also finds the share of such items in the overall profit of the transactions.
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET Journal
This document discusses classifying patterns from online shopping data using data mining techniques. It proposes using the Apriori algorithm to mine frequent patterns from transaction data stored in a data warehouse. Patterns mined from the data warehouse using Apriori would then be stored in a pattern warehouse. This would allow users to view product details and related patterns when browsing items online. The system aims to efficiently analyze large amounts of user data to discover useful patterns for improving the online shopping experience.
The document summarizes several improved algorithms that aim to address the drawbacks of the Apriori algorithm for association rule mining. It discusses six different approaches: 1) An intersection and record filter approach that counts candidate support only in transactions of sufficient length and uses set intersection; 2) An approach using set size and frequency to prune insignificant candidates; 3) An approach that reduces the candidate set and memory usage by only searching frequent itemsets once to delete candidates; 4) A partitioning approach that divides the database; 5) An approach using vertical data format to reduce database scans; and 6) A distributed approach to parallelize the algorithm across machines.
This document proposes improvements to existing algorithms for multidimensional sequential pattern mining. It summarizes existing research that combines sequential pattern mining with multidimensional analysis or incorporates multidimensional information into sequential pattern mining. The proposed algorithm first mines minimal atomic frequent sequences from the data and prunes hierarchies using an adapted PrefixSpan algorithm to efficiently generate sequential patterns associated with multidimensional information. This approach aims to improve over existing methods by leveraging the efficiency of PrefixSpan for multidimensional sequential pattern mining.
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesVenu Madhav
Frequent itemset mining and association rule generation are challenging tasks over data streams. Even though various algorithms have been proposed to solve the issue, it has been found that frequency alone does not decide the significance (interestingness) of the mined itemsets, and hence of the association rules. This motivates algorithms that mine association rules based on utility, i.e., the proficiency of the mined rules. However, few algorithms in the literature deal with utility, as most of them deal with reducing the complexity of frequent itemset/association rule mining. Also, those few algorithms consider only the overall utility of the association rules and not the consistency of the rules throughout a defined number of periods. To solve this issue, this paper proposes an enhanced association rule mining algorithm. The algorithm introduces a new weightage validation into conventional association rule mining to validate the utility of the mined association rules and its consistency. The utility is validated by an integrated calculation of the cost/price efficiency of the itemsets and their frequency. The consistency validation is performed at every defined number of windows using the probability distribution function, assuming that the weights are normally distributed. Hence, the validated rules are frequent and utility-efficient, and their interestingness is distributed throughout the entire time period. The algorithm is implemented and the resultant rules are compared against the rules that can be obtained from conventional mining algorithms.
Associationship is an important component of data mining. In real-world applications, the knowledge used to aid decision-making is always time-varying. However, most existing data mining approaches rely on the assumption that discovered knowledge is valid indefinitely. To support better decision making, it is desirable to identify the temporal features associated with interesting patterns or rules. This paper presents a novel approach for mining Efficient Temporal Association Rules (ETAR). The basic idea of ETAR is to first partition the database into time periods and then progressively accumulate the occurrence count of each itemset based on the intrinsic partitioning characteristics. Notably, the execution time of ETAR is orders of magnitude smaller than that of schemes directly extended from existing methods, because it scans the database only once.
This document presents an efficient temporal association rule mining (ETAR) algorithm that mines frequent patterns from transactional databases while considering the temporal aspect of the data. The ETAR algorithm partitions the database into time periods based on the transaction dates, then generates frequent itemsets for each time period in a single pass over the data. The experimental results on a sample database show that ETAR can discover time-related association rules that would be missed by traditional approaches. Different frequent itemsets were found for different time intervals, showing the importance of considering temporal factors.
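A minimal Python sketch of the single-scan idea described above (our illustration, not the ETAR authors' implementation): each transaction is bucketed into a time period as it is read, and per-period itemset counts are accumulated, so no second pass over the database is needed. The dates, items, monthly granularity and min_sup value are illustrative assumptions.

from collections import Counter, defaultdict
from itertools import combinations

transactions = [
    ("2013-01-05", {"bread", "milk"}),
    ("2013-01-20", {"bread", "jam"}),
    ("2013-02-02", {"umbrella", "milk"}),
]

period_counts = defaultdict(Counter)
for date, items in transactions:          # single pass over the database
    period = date[:7]                     # e.g. "2013-01": monthly periods
    for k in (1, 2):                      # count 1- and 2-itemsets per period
        for itemset in combinations(sorted(items), k):
            period_counts[period][itemset] += 1

min_sup = 2
for period, counts in period_counts.items():
    frequent = [s for s, c in counts.items() if c >= min_sup]
    print(period, frequent)               # different periods yield different rules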
A Quantified Approach for large Dataset Compression in Association MiningIOSR Journals
Abstract: With the rapid development of computer and information technology over the last several decades, enormous amounts of data in science and engineering continue to be generated at massive scale; data compression is needed to reduce cost and storage space. Compression, and discovering association rules by identifying relationships among sets of items in a transaction database, are important problems in data mining. Finding frequent itemsets is computationally the most expensive step in association rule discovery, and it has therefore attracted significant research attention. However, existing compression algorithms are not appropriate for mining large data sets. This research describes a new approach in which the original dataset is sorted in lexicographical order and the desired number of groups is formed to generate quantification tables. These quantification tables are used to generate the compressed dataset, yielding a more efficient algorithm for mining complete frequent itemsets from the compressed dataset. The experimental results show that the proposed algorithm performs better than the mining merge algorithm across different support thresholds and in execution time.
Keywords: Apriori algorithm, mining merge algorithm, quantification table
Hadoop Map-Reduce To Generate Frequent Item Set on Large Datasets Using Impro...BRNSSPublicationHubI
This document presents an improved Apriori algorithm for generating frequent item sets on large datasets using Hadoop MapReduce. The classical Apriori algorithm suffers from repeated database scans, high candidate generation costs, and memory issues. The proposed improved Apriori algorithm aims to address these issues by leveraging Hadoop MapReduce to parallelize the processing and reduce unnecessary database scans. It presents the pseudocode for the classical and improved algorithms. The improved algorithm is evaluated to show it provides better performance than the classical Apriori algorithm in terms of time and number of iterations required.
Similar to International Journal of Engineering Research and Development (IJERD) (20)
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksIJERD Editor
Distributed Denial of Service (DDoS) attacks have become a massive threat to the Internet. The traditional architecture of the Internet is vulnerable to attacks like DDoS. An attacker first acquires an army of zombies; that army is then instructed by the attacker when to start an attack and whom to attack. In this paper, the different techniques used to perform DDoS attacks, the tools used to perform them, and countermeasures for detecting attackers and eliminating Bandwidth Distributed Denial of Service (B-DDoS) attacks are reviewed. DDoS attacks are carried out using various flooding techniques.
The main purpose of this paper is to design an architecture which can reduce Bandwidth Distributed Denial of Service attacks and make the victim site or server available to normal users by eliminating the zombie machines. The primary focus is to discuss how normal machines turn into zombies (bots), how an attack is initiated, the DDoS attack procedure, and how an organization can save its server from becoming a DDoS victim. To demonstrate this, we implemented a simulated environment with Cisco switches, routers, a firewall, some virtual machines and some attack tools to display a real DDoS attack. By using time scheduling, resource limiting, system logs, access control lists and a modular policy framework we stopped the attack and identified the attacker (bot) machines.
Hearing loss is one of the most common human impairments. It is estimated that by the year 2015 more than 700 million people will suffer mild deafness. Most can be helped by hearing aid devices, depending on the severity of their hearing loss. This paper describes the implementation and characterization details of a dual channel transmitter front end (TFE) for digital hearing aid (DHA) applications that uses novel micro-electromechanical-systems (MEMS) audio transducers and ultra-low power-scalable analog-to-digital converters (ADCs), which enable a very low form factor, energy-efficient implementation for next-generation DHAs. The contribution of the design is the implementation of the dual channel MEMS microphones and the power-scalable ADC system.
Influence of tensile behaviour of slab on the structural Behaviour of shear c...IJERD Editor
A composite beam is composed of a steel beam and a slab connected by means of shear connectors, such as studs installed on the top flange of the steel beam, to form a structure behaving monolithically. This study analyzes the effects of the tensile behavior of the slab on the structural behavior of the shear connection, such as slip stiffness and maximum shear force, in composite beams subjected to hogging moment. The results show that shear studs located in crack-concentration zones due to large hogging moments sustain significantly smaller shear force and slip stiffness than those in other zones. Moreover, the reduction of slip stiffness in the shear connection also appears to be closely related to the change in the tensile strain of the rebar as the load increases. Further experimental and analytical studies shall be conducted considering variables such as the reinforcement ratio and the arrangement of shear connectors to achieve efficient design of the shear connection in composite beams subjected to hogging moment.
Gold prospecting using Remote Sensing "A case study of Sudan" IJERD Editor
Gold has been extracted from northeast Africa for more than 5000 years, and this may be the first place where the metal was extracted. The Arabian-Nubian Shield (ANS) is an exposure of Precambrian crystalline rocks on the flanks of the Red Sea. The crystalline rocks are mostly Neoproterozoic in age. The ANS spans the nations of Israel, Jordan, Egypt, Saudi Arabia, Sudan, Eritrea, Ethiopia, Yemen, and Somalia. The Arabian-Nubian Shield consists of juvenile continental crust that formed between 900 and 550 Ma, when intra-oceanic arcs welded together along ophiolite-decorated sutures. Primary Au mineralization probably developed in association with the growth of the intra-oceanic arcs and the evolution of back-arcs. Multiple episodes of deformation have obscured the primary metallogenic setting, but at least some of the deposits preserve evidence that they originated as sea-floor massive sulphide deposits.
The Red Sea Hills region is a vast span of rugged, harsh and inhospitable terrain with an inimical moon-like landscape; nevertheless, since ancient times it has been famed as an abode of gold and was a major source of wealth for the Pharaohs of ancient Egypt. The Pharaohs' old workings have been periodically rediscovered through time. Recent endeavours by the Geological Research Authority of Sudan led to the discovery of a score of occurrences of gold and massive sulphide mineralization. In the 1990s the Geological Research Authority of Sudan (GRAS), in cooperation with BRGM, utilized Landsat TM satellite data and the spectral ratio technique to map possible mineralized zones in the Red Sea Hills of Sudan. The outcome of the study mapped a gossan-type gold mineralization. The band ratio technique was applied to the Arbaat area and a signature of an alteration zone was detected; alteration zones are commonly associated with mineralization. A field check confirmed the existence of a stockwork of gold-bearing quartz in the alteration zone. Another type of gold mineralization discovered using remote sensing is the gold associated with metachert in the Atmur Desert.
Reducing Corrosion Rate by Welding DesignIJERD Editor
This document summarizes a study on reducing corrosion rates in steel through welding design. The researchers tested different welding groove designs (X, V, 1/2X, 1/2V) and preheating temperatures (400°C, 500°C, 600°C) on ferritic malleable iron samples. Testing found that X and V groove designs with 500°C and 600°C preheating had corrosion rates of 0.5-0.69% weight loss after 14 days, compared to 0.57-0.76% for 400°C preheating. Higher preheating reduced residual stresses, which decreased corrosion. Residual stresses were 1.7 MPa for the optimal X groove and 600°C
Router 1X3 – RTL Design and Verification IJERD Editor
Routing is the process of moving a packet of data from source to destination; it enables messages to pass from one computer to another and eventually reach the target machine. A router is a networking device that forwards data packets between computer networks. It is connected to two or more data lines from different networks (as opposed to a network switch, which connects data lines from one single network). This paper mainly emphasizes the study of the router device, its top-level architecture, and how the various sub-modules of the router, i.e., register, FIFO, FSM and synchronizer, are synthesized, simulated and finally connected to the top module.
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...IJERD Editor
This paper presents a component within the flexible AC transmission system (FACTS) family called the distributed power-flow controller (DPFC). The DPFC is derived from the unified power-flow controller (UPFC) with the common DC link eliminated. The DPFC has the same control capabilities as the UPFC, which comprise adjustment of the line impedance, the transmission angle, and the bus voltage. The active power exchange between the shunt and series converters, which is through the common DC link in the UPFC, is now through the transmission lines at the third-harmonic frequency. The DPFC employs multiple small-size single-phase converters, which reduces the cost of equipment, requires no voltage isolation between phases, and increases redundancy and thereby reliability. The principle and analysis of the DPFC are presented in this paper, and the corresponding simulation results, carried out on a scaled prototype, are also shown.
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRIJERD Editor
Power quality has become an increasingly pivotal issue from the industrial electricity consumer's point of view in recent times. Modern industries employ sensitive power electronic equipment, control devices and non-linear loads as part of automated processes to increase energy efficiency and productivity. Voltage disturbances are the most common power quality problem, as the use of large numbers of sophisticated and sensitive electronic equipment in industrial systems has increased. This paper discusses the design and simulation of a dynamic voltage restorer (DVR) for improving power quality and reducing the harmonic distortion of sensitive loads. Power quality problems occur as non-standard voltage, current and frequency, and electronic devices are very sensitive loads. In power systems, voltage sag, swell, flicker and harmonics are some of the problems affecting sensitive loads. The compensation capability of a DVR depends primarily on its maximum voltage injection ability and the amount of stored energy available within the restorer. The device is connected in series with the distribution feeder at medium voltage. A fuzzy logic controller is used to produce the gate pulses for the control circuit of the DVR, and the circuit is simulated using MATLAB/SIMULINK software.
Study on the Fused Deposition Modelling In Additive ManufacturingIJERD Editor
Additive manufacturing, also popularly known as 3-D printing, is a process where a product is created in a succession of layers. It is based on a novel incremental materials manufacturing philosophy. Unlike conventional manufacturing processes, where material is removed from a given workpiece to derive the final shape of a product, 3-D printing builds the product from scratch, obviating the necessity to cut away material and preventing wastage of raw materials. Commonly used raw materials for the process are ABS plastic, PLA and nylon. Recently the use of gold, bronze and wood has also been implemented. The complexity factor of this process is effectively zero, as an object of any shape and size can be manufactured.
Spyware triggering system by particular string valueIJERD Editor
This computer programme can be used for good or bad purposes, in hacking or for general use. It can be regarded as the next step beyond hacking techniques such as keyloggers and spyware. In this system, once the user or hacker stores a particular string as input, the software continually compares the user's typing activity with that stored string, and if a match occurs it launches the spyware programme.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
This paper presents a blind steganalysis technique to effectively attack JPEG steganographic schemes, i.e., Jsteg, F5, Outguess and DWT-based schemes. The proposed method exploits the correlations between block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of the characteristic functions of the test image are selected as features. The features are extracted from the BDCT JPEG 2-array. A Support Vector Machine with cross-validation is implemented for the classification. The proposed scheme gives improved outcomes in attacking these schemes.
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
Data over the cloud is transferred or transmitted between servers and users. The privacy of that data is very important, as it contains personal information. If the data gets hacked, it can be used to defame a person's social standing. Delays also occur during data transmission, e.g., in mobile communication where bandwidth is low. Hence compression algorithms are proposed for fast and efficient transmission, encryption is used for security purposes, and blurring is used to provide an additional layer of security. These algorithms are hybridized to achieve robust and efficient security and transmission over a cloud storage system.
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...IJERD Editor
A thorough review of existing literature indicates that the Buckley-Leverett equation only analyzes waterflood practices directly, without any adjustments for real reservoir scenarios. By doing so, quite a number of errors are introduced into these analyses. Also, for most waterflood scenarios, a radial investigation is more appropriate than a simplified linear system. This study investigates the adoption of the Buckley-Leverett equation to estimate the radius of invasion of the displacing fluid during waterflooding. The model is also adopted for a microbial flood, and a comparative analysis is conducted for both waterflooding and microbial flooding. The results of the analysis not only record success in determining the radial distance of the leading edge of water during the flooding process, but also give a clearer understanding of the applicability of microbes to enhance oil production through in-situ production of bio-products like biosurfactants, biogenic gases, bio-acids, etc.
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
Gesture gaming is a method by which users with a laptop/PC/Xbox play games using natural or bodily gestures. This paper presents a way of playing free flash games on the Internet using an ordinary webcam with the help of open-source technologies. Emphasis in human activity recognition is placed on pose estimation and consistency in the pose of the player. These are estimated with the help of an ordinary web camera, with resolutions ranging from VGA to 20 MP. Our work involved showing the user a 10-second documentary on how to play a particular game using gestures and the various kinds of gestures that can be performed in front of the system. The initial RGB values for the gesture component are obtained by instructing the user to place the component in a red box for about 10 seconds after the short documentary finishes. The system then opens the concerned game on popular flash game sites like Miniclip, Games Arcade, GameStop, etc., loads the game by clicking at various places, and brings the state to a point where the user only has to perform gestures to start playing. At any point of time the user can call off the game by hitting the Esc key, and the program will release all of the controls and return to the desktop. It was noted that the results obtained using an ordinary webcam matched those of the Kinect, and users could relive the gaming experience of free flash games on the net. Therefore, effective in-game advertising could also be achieved, resulting in disruptive growth for advertising firms.
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...IJERD Editor
The LLC resonant frequency converter is basically a combination of series and parallel resonant circuits. The LCC resonant converter has the disadvantage that, though it has two resonant frequencies, the lower resonant frequency lies in the ZCS region [5]; for this application, we are not able to design the converter to work at that resonant frequency. The LLC resonant converter has existed for a very long time, but because of its unknown characteristics it was used as a series resonant converter with a basically passive (resistive) load. Here, it is designed to operate at a switching frequency higher than the resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very much like a series resonant converter. The benefit of the LLC resonant converter is a narrow switching frequency range at light load [6]. The control circuit plays a very important role: the 555 timer used here provides a perfect square wave, since the control circuit introduces no slew rate, which makes the square wave sharp and robust. The dead-band circuit provides an exclusive dead band of a few microseconds to avoid the simultaneous firing of the two pairs of IGBTs when one pair switches off and the other switches on within the slightest period of time. An isolator circuit is associated with every circuit used, because it acts as a driver, and isolation for each IGBT is provided by a dedicated transformer supply [3]. The IGBTs are fired using the appropriate signals from the preceding boards, and finally a high-frequency rectifier circuit with a filtering capacitor is used to obtain an exact DC waveform. The basic goal of this analysis is to observe the waveforms and characteristics of converters with differently positioned passive elements in the form of tank circuits.
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...IJERD Editor
The LLC resonant frequency converter is basically a combination of series and parallel resonant circuits. The LCC resonant converter has the disadvantage that, though it has two resonant frequencies, the lower resonant frequency lies in the ZCS region [5]; for this application, we are not able to design the converter to work at that resonant frequency. The LLC resonant converter has existed for a very long time, but because of its unknown characteristics it was used as a series resonant converter with a basically passive (resistive) load. Here, it is designed to operate at a switching frequency higher than the resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very much like a series resonant converter. The benefit of the LLC resonant converter is a narrow switching frequency range at light load [6]. The control circuit plays a very important role: the 555 timer used here provides a perfect square wave, since the control circuit introduces no slew rate, which makes the square wave sharp and robust. The dead-band circuit provides an exclusive dead band of a few microseconds to avoid the simultaneous firing of the two pairs of IGBTs when one pair switches off and the other switches on within the slightest period of time. An isolator circuit is associated with every circuit used, because it acts as a driver, and isolation for each IGBT is provided by a dedicated transformer supply [3]. The IGBTs are fired using the appropriate signals from the preceding boards, and finally a high-frequency rectifier circuit with a filtering capacitor is used to obtain an exact DC waveform. The basic goal of this analysis is to observe the waveforms and characteristics of converters with differently positioned passive elements in the form of tank circuits. The supporting simulation is done with the PSIM 6.0 software tool.
An amateur radio operator, also known as a HAM, communicates with other HAMs through radio waves. Wireless communication in which the Moon is used as a natural satellite is called Moon-bounce or the EME (Earth-Moon-Earth) technique. Long-distance communication (DXing) using Very High Frequency (VHF) amateur HAM radio used to be difficult, but even with a modest setup comprising a good transceiver, a power amplifier and a high-gain antenna with high directivity, VHF DXing is possible. Generally a 2x11 Yagi antenna, along with a rotor to set the horizontal and vertical angles, is used. Moon-tracking software gives the exact location and visibility of the Moon at both stations and other vital data needed to acquire the real-time position of the Moon.
MS-Extractor: An Innovative Approach to Extract Microsatellites on 'Y' Chrom...IJERD Editor
Simple Sequence Repeats (SSRs), also known as microsatellites, have been extensively used as molecular markers due to their abundance and high degree of polymorphism. The nucleotide sequences of polymorphic forms of the same gene should be 99.9% identical, so extracting microsatellites from the gene is crucial. When microsatellite repeat counts are compared, a large difference indicates a disorder. The Y chromosome likely contains 50 to 60 genes that provide instructions for making proteins. Because only males have the Y chromosome, the genes on this chromosome tend to be involved in male sex determination and development. Several microsatellite extractors exist, but they fail to extract microsatellites from large data sets gigabytes and terabytes in size. The proposed tool, 'MS-Extractor: An Innovative Approach to Extract Microsatellites on the Y Chromosome', can extract both perfect and imperfect microsatellites from large data sets of the human 'Y' genome. The proposed system uses string matching with a sliding-window approach to locate microsatellites and extract them.
Importance of Measurements in Smart GridIJERD Editor
Driven by the need for reliable supply, independence from fossil fuels, and the capability to provide clean energy at a fixed and lower cost, the existing power grid is transforming into the Smart Grid. The development of a smart energy distribution grid is a current goal of many nations. A Smart Grid should have new capabilities such as self-healing, high reliability, energy management, and real-time pricing. This new era of the smart future grid will lead to major changes in existing technologies at the generation, transmission and distribution levels. The incorporation of renewable energy resources and distributed generators into the existing grid will increase the complexity, optimization problems and instability of the system. This will lead to a paradigm shift in the instrumentation and control requirements of Smart Grids for a high-quality, stable and reliable electricity supply. Monitoring of the grid system's state and stability relies on the availability of reliable measurement data. In this paper, the measurement areas that highlight new measurement challenges, the development of smart meters, and the critical parameters of electric energy to be monitored for improving the reliability of power systems are discussed.
Study of Macro level Properties of SCC using GGBS and Lime stone powderIJERD Editor
The document summarizes a study on the use of ground granulated blast furnace slag (GGBS) and limestone powder to replace cement in self-compacting concrete (SCC). Tests were conducted on SCC mixes with 0-50% replacement of cement with GGBS and 0-20% replacement with limestone powder. The results showed that replacing 30% of the cement with GGBS and 15% with limestone powder produced SCC with the highest compressive strength of 46 MPa while meeting fresh-property requirements. The study concluded that this ternary blend of cement, GGBS and limestone powder can improve SCC properties while reducing costs.
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 9, Issue 5 (December 2013), PP. 38-50
EUP-Growth+ - Efficient Algorithm for Mining High Utility Itemset
V. Jayasudha, Research Scholar, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, India
V. Umarani MPhil, (PhD), Assistant Professor, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women, Coimbatore, India
Abstract:- In recent years, utility mining has become an emerging topic in the field of data mining. The discovery, from a transaction database, of itemsets with high utility such as profit is referred to as high utility itemset mining. In this paper, a new algorithm named Enhanced Utility Pattern Growth+ (EUP-Growth+) is proposed for reducing the large number of candidate itemsets for high utility itemsets with a set of effective strategies. These strategies are used for pruning candidate itemsets effectively. By reducing a hefty number of candidate itemsets, the mining performance is upgraded in terms of execution time and space requirements. The selective information of potential high utility itemsets is stored in the appropriate memory using a hashing technique and maintained in a tree-based data structure named the Improved Utility Pattern Tree (IMUP-Tree). The performance of EUP-Growth+ is compared with state-of-the-art algorithms on many types of both real and synthetic data sets. Experimental and comparative results reveal that the proposed algorithm, EUP-Growth+, not only reduces the number of PHUIs effectively but also outperforms the other algorithms.
Index Terms:- Candidate pruning, utility mining, frequent itemset, potential high utility itemset.
I. INTRODUCTION
Data mining is the process of extracting non-trivial, formerly unknown and potentially useful information from large databases. Association Rule Mining (ARM) is one of the popular and well-researched techniques in data mining for finding interesting patterns between variables in a large database. The most well-known example of association rule mining is market basket analysis. Within ARM, the most important task is frequent pattern mining, a fundamental research topic that has been applied to different kinds of databases, such as transactional databases [1], [14], [21], streaming databases [18], [27], and time series databases [9], [12], and to various application domains, such as bioinformatics [8], [11], [20], Web click-stream analysis [7], [35], and mobile environments [15], [36]. The frequent itemsets identified by ARM reflect only the frequency of the existence or non-existence of an item. Hence, the major drawbacks of frequent pattern mining are: first, the impact of any other factor is not considered; next, non-frequent itemsets may contribute a large portion of the profit; finally, the relative importance of each item is not considered. Recently, to address these limitations of ARM, new variants of association rule mining were defined, namely weighted frequent pattern mining and utility mining.
In weighted frequent pattern mining, weights of items, such as price and profit, are considered in the transaction database. With this perception, even if some items appear infrequently, they might still be found if they have higher weights. However, in weighted pattern mining the quantities of items are still not considered. Therefore, it cannot satisfy the requirements of users who are interested in discovering the itemsets with high sales profits.
To overcome this, utility mining has become an emerging topic in the field of data mining. The discovery, from a transaction database, of itemsets with high utility such as profit is referred to as high utility itemset mining. In a transaction database, the utility of an item consists of two aspects: 1) external utility and 2) internal utility. External utility captures the importance of distinct items, for example their unit profit. Internal utility captures the importance of an item within a transaction, for example its purchased quantity. The utility of an itemset is defined as the product of its external utility and its internal utility. An itemset whose utility is no less than a user-specified minimum utility threshold is called a high utility itemset; otherwise, it is called a low-utility itemset.
However, mining high utility itemsets from databases is not an easy task, since the downward closure property [1] of frequent itemset mining does not hold. In other words, pruning the search space for high utility itemset mining is difficult because a superset of a low-utility itemset may be a high utility itemset. A simple method to address this problem is to enumerate all itemsets from the database by the principle of exhaustion. Obviously, this method requires a large search space, particularly when the database contains many long transactions or a low minimum utility threshold is set. Hence, reducing the search space while efficiently capturing all high utility itemsets with no miss is a crucial challenge in utility mining.
In existing studies, the performance of utility mining is improved by applying overestimation methods [3], [10], [16], [17], [19], [24], [29], [30]. In these methods, potential high utility itemsets (PHUIs) are identified first, and then one additional database scan is performed to identify their exact utilities. However, these methods generate a hefty number of potential high utility itemsets, and their mining performance degrades accordingly. This situation may become worse when databases contain many long transactions or low thresholds are set. The hefty number of PHUIs is a challenging problem for mining performance, since the more PHUIs an algorithm generates, the more processing time it consumes.
In this paper, the existing UP-Growth+ algorithm is enhanced to generate high utility itemsets efficiently for large datasets and to reduce execution time compared with existing algorithms. In the experimental section, experiments are conducted on our enhanced algorithm and the existing algorithm with a variety of synthetic and real-time datasets.
The rest of this paper is organized as follows: In Section 2, the background and related work for high utility itemset mining are discussed. In Section 3, the proposed data structure and algorithms are described in detail. In Section 4, the experimental results are shown, and conclusions are given in Section 5.
II. BACKGROUND
In this section, we first give the preliminary definitions of utility mining, and then introduce related work in utility mining.
2.1 Preliminary Work
Given a finite set of items I = {i1, i2, ..., im}, each item ip (1 ≤ p ≤ m) has a unit profit pr(ip). An itemset X is a set of k distinct items {i1, i2, ..., ik}, where ij ∈ I, 1 ≤ j ≤ k; k is the length of X. An itemset with length k is called a k-itemset. A transaction database D = {T1, T2, ..., Tn} contains a set of transactions, and each transaction Td (1 ≤ d ≤ n) has a unique identifier d, called its TID. Each item ip in transaction Td is associated with a quantity q(ip, Td), that is, the purchased quantity of ip in Td.
Definition: The utility of an item ip in a transaction Td is denoted u(ip, Td) and defined as pr(ip) × q(ip, Td).
Definition: The utility of an itemset X in Td is denoted u(X, Td) and defined as Σ{ip ∈ X ∧ X ⊆ Td} u(ip, Td).
Definition: The utility of an itemset X in D is denoted u(X) and defined as Σ{X ⊆ Td ∧ Td ∈ D} u(X, Td).
Definition: An itemset is called a high utility itemset if its utility is no less than a user-specified minimum utility threshold, denoted min_util.
Definition: The transaction utility of a transaction Td is denoted TU(Td) and defined as u(Td, Td).
Definition: The transaction-weighted utility of an itemset X is the sum of the transaction utilities of all the transactions containing X; it is denoted TWU(X) and defined as Σ{X ⊆ Td ∧ Td ∈ D} TU(Td).
Definition: An itemset X is called a high transaction-weighted utility itemset (abbreviated as HTWUI) if TWU(X) is no less than min_util.
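To make the definitions concrete, here is a small, self-contained Python sketch (our own illustration, with hypothetical items, profits and quantities) that computes u(ip, Td), u(X, Td), u(X), TU(Td) and TWU(X) on a toy database:

profits = {"A": 5, "B": 2, "C": 1}          # external utility pr(ip), hypothetical

database = {                                # each Td maps items to q(ip, Td)
    "T1": {"A": 1, "C": 10},
    "T2": {"A": 2, "B": 3},
    "T3": {"B": 4, "C": 2},
}

def u_item(item, tx):
    # u(ip, Td) = pr(ip) x q(ip, Td)
    return profits[item] * tx[item]

def u_itemset(itemset, tx):
    # u(X, Td): sum of item utilities when X is contained in Td, else 0
    if not set(itemset).issubset(tx):
        return 0
    return sum(u_item(i, tx) for i in itemset)

def u_db(itemset):
    # u(X): utility of X over the whole database D
    return sum(u_itemset(itemset, tx) for tx in database.values())

def tu(tx):
    # TU(Td) = u(Td, Td): utility of the whole transaction
    return sum(u_item(i, tx) for i in tx)

def twu(itemset):
    # TWU(X): sum of TU(Td) over all transactions containing X
    return sum(tu(tx) for tx in database.values() if set(itemset).issubset(tx))

min_util = 20
for X in (("A",), ("A", "B"), ("B", "C")):
    print(X, "u =", u_db(X), "TWU =", twu(X), "HUI?", u_db(X) >= min_util)

With this toy data, u({A}) = 15 while TWU({A}) = 31: TWU never underestimates the utility of an itemset or its supersets, which is exactly what makes it usable for pruning.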
2.2 Related Work
Extensive studies have been proposed for mining frequent patterns [1], [2], [13], [14], [21], [22], [34], [40]. Among these, Apriori [1] is the first association rule mining algorithm; it pioneered the use of support-based pruning to systematically control the exponential growth of candidate itemsets. The Apriori algorithm faces two problems when dealing with large datasets: first, it requires multiple scans of the transaction database, incurring a major time cost; second, it generates too many candidate sets, which take a lot of memory space. All of the Apriori-based mining algorithms [1], [2], [3] have time and space cost problems when handling a huge number of candidate sets and a large database.
Numerous pattern growth-based association rule mining algorithms are available in the literature [14], [21]. FP-Growth [14] is widely recognized. It achieves better performance than Apriori-based algorithms since it finds frequent itemsets without generating any candidate itemset and scans the database just twice. In the framework of frequent itemset mining, however, the importance of items is not considered, so it does not satisfy user requirements.
To overcome this, the topic of weighted association rule mining was brought to attention [4], [26], [28], [31], [37], [38], [39]. Cai et al. first proposed the concept of weighted items and weighted association rules [4]. However, since the framework of weighted association rules does not have the downward closure property, mining performance degrades accordingly. To address this problem, the weighted downward closure property [28] uses transaction weight and weighted support, which not only reflect the importance of an itemset but also maintain the downward closure property during the mining process. The goal is to steer the mining focus to those significant relationships involving items with significant weights rather than being flooded in the combinatorial explosion of insignificant relationships. Although weighted association rule mining considers the importance of items, the quantities of items in the transaction database are still not taken into consideration. Thus, the problem of high utility itemset mining arises, and many studies [3], [5], [10], [16], [17], [19], [24], [25], [29], [30], [32], [33] have elucidated this issue. Recent research has focused on efficient high utility mining using intermediate anti-monotone measures for pruning the search space. Liu et al. [19] proposed a two-phase algorithm to mine high utility itemsets. In phase I, it employs an Apriori-based level-wise method to enumerate HTWUIs: candidate itemsets of length k are generated from HTWUIs of length k-1, and in each pass the database is scanned once to compute their TWUs. After these steps, the whole set of HTWUIs is collected in phase I. In phase II, the high utility itemsets among the HTWUIs are identified with one additional database scan. Although the Two-Phase algorithm reduces the search space by using the TWDC property, it nonetheless generates too many candidates to obtain the HTWUIs and requires multiple database scans. A framework for high utility itemset mining was proposed by Yao et al. [16]; it is a mining method describing pruning strategies based on the mathematical properties of utility constraints. It developed an algorithm named UMining and another, heuristic-based, algorithm UMining_H to discover high utility itemsets. However, this framework is based on a mathematical approach and suffers from poor performance when mining dense datasets and long patterns, much like the Apriori algorithm for frequent pattern mining.
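The following Python sketch, reusing the toy database, twu and u_db helpers from the sketch in Section 2.1, illustrates the two-phase scheme of [19] as described above; it is an illustration of the idea, not the authors' code, and the min_util value is arbitrary.

from itertools import combinations

def two_phase(min_util):
    items = sorted({i for tx in database.values() for i in tx})
    htwuis, level = [], [(i,) for i in items]
    # Phase I: level-wise search kept feasible by the TWDC property.
    while level:
        kept = [c for c in level if twu(c) >= min_util]
        htwuis.extend(kept)
        # Join step: build (k+1)-candidates, prune any with a non-HTWUI k-subset.
        candidates = sorted({tuple(sorted(set(a) | set(b)))
                             for a in kept for b in kept
                             if len(set(a) | set(b)) == len(a) + 1})
        level = [c for c in candidates
                 if all(s in kept for s in combinations(c, len(c) - 1))]
    # Phase II: one additional scan keeps only the true high utility itemsets.
    return [(x, u_db(x)) for x in htwuis if u_db(x) >= min_util]

print(two_phase(15))   # e.g. [(('A',), 15), (('A', 'B'), 16), (('A', 'C'), 15)]

Even on this tiny example, phase I keeps candidates such as ('A', 'C') whose TWU clears the threshold, and phase II must rescan to confirm which of them are truly high utility, which is exactly the candidate overhead the paper targets.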
An isolated item discarding strategy (abbreviated as IIDS) was proposed by Li et al. [17] to reduce the number of candidates. During the level-wise search, isolated items are pruned, reducing the number of candidate itemsets. However, this algorithm still scans the database several times and uses a candidate generation-and-test scheme to find high utility itemsets, which increases time complexity.
Tseng et al. [30], [31] proposed two novel algorithms, utility pattern growth (UP-Growth) and UP-Growth+, together with a compact tree structure called the utility pattern tree (UP-Tree), for discovering high utility itemsets and maintaining important utility-pattern information within databases. Several strategies facilitate the mining processes of UP-Growth and UP-Growth+ by maintaining only essential information in the UP-Tree. With these strategies, the overestimated utilities of candidates are reduced by discarding the utilities of items that cannot be high utility or are not involved in the search space. The proposed strategies decrease the overestimated utilities of PHUIs and greatly reduce the number of candidates. However, although UP-Growth+ reduces the number of potential high utility itemsets on large datasets, it takes more execution time and I/O operations, and it still produces overestimated utility itemsets due to random memory allocation in the UP-Tree.
As stated above, the number of generated PHUIs is a critical issue for the performance of these algorithms, and memory allocation is a critical issue for the mining speed of the UP-Tree. Therefore, this study proposes several strategies for reducing memory, I/O operations, the number of PHUIs, and overestimated utilities. By applying the proposed strategies, the number of generated PHUIs can be greatly reduced and high utility itemsets can be identified more efficiently, while the memory used by the tree is also reduced.
III. PROPOSED METHODS
The proposed method consists of two steps. In the first step, two database scans are required to construct a global IMUP-Tree with the first two strategies (given in Section 3.1). In the second step, PHUIs are generated recursively from the global IMUP-Tree and local IMUP-Trees by EUP-Growth+ with the third and fourth strategies (given in Section 3.2).
3.1 The Proposed Data Structure: IMUP-Tree
To improve the mining speed, the UP-Tree [30], [31] is merged with a hashing technique to reduce memory consumption. The improved UP-Tree, named the IMUP-Tree, stores the information about transactions and high utility itemsets in appropriate memory locations and is maintained as a tree structure. Two strategies, together with the RH algorithm, are applied to reduce memory and to store the overestimated utility of each item during the construction of the global IMUP-Tree. The two strategies and the construction of the global IMUP-Tree with them are discussed in the following sections.
3.1.1 DGUM: Discarding Global Unpromising Items and Memory Allocation during the Construction of a Global IMUP-Tree
The construction of the global IMUP-Tree is accomplished with two scans of the original database. In the first scan, the transaction utility (TU) of each transaction is computed; at the same time, the transaction-weighted utility (TWU) of each single item is accumulated. By the TWDC property, an item is unpromising, and its supersets cannot be high utility itemsets, if its TWU is less than the minimum utility threshold. During the second scan of the database, transactions are inserted into the global IMUP-Tree. When a transaction is retrieved, the unpromising items are removed from it and their utilities are eliminated from the transaction's TU. This concept forms our first strategy.
Strategy 1. DGUM: Discard global unpromising items and their actual utilities from transactions and from the transaction utilities of the database.
According to DGUM, while the utilities of itemsets are being estimated, the utilities of unpromising items can be regarded as irrelevant and discarded, since unpromising items play no role in high utility itemsets.
The new TU after pruning unpromising items is called the reorganized transaction utility (abbreviated as RTU). The remaining promising items in the transaction are sorted in descending order of TWU, and memory is allocated for each promising item in this TWU order using the RH algorithm described in Section 3.1.2. Moreover, before constructing the IMUP-Tree, DGUM can be performed repeatedly until all promising items are allocated to appropriate memory space. Transactions are then inserted into the memory locations of the IMUP-Tree generated by the RH algorithm. A minimal sketch of this strategy is given below.
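The following Python sketch illustrates DGUM under the assumption that the database is given as {tid: {item: utility}}; the function and variable names are ours, not the authors' implementation.

def dgum_reorganize(db, min_util):
    # First scan: accumulate each transaction's TU and each item's TWU.
    twu = {}
    for trans in db.values():
        tu = sum(trans.values())
        for item in trans:
            twu[item] = twu.get(item, 0) + tu
    # TWDC pruning: items whose TWU is below the threshold are unpromising.
    promising = {item for item, w in twu.items() if w >= min_util}

    # Second scan: drop unpromising items and their utilities (yielding the
    # RTU), then sort the surviving items in descending order of TWU before
    # insertion into the global IMUP-Tree.
    reorganized = []
    for trans in db.values():
        kept = {i: u for i, u in trans.items() if i in promising}
        if kept:
            order = sorted(kept, key=lambda i: -twu[i])
            reorganized.append((order, kept, sum(kept.values())))  # RTU
    return reorganized, twu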
3.1.2 Random Hashing
Random hashing (RH) is a hash-based technique that places the potential high utility itemsets into memory without any collision, and it is a very efficient method for locating an exact item in a very short time. The RH algorithm places each item in a memory location for ease of access. The basic requirement for hashing is a hash function, which assigns a number to each input item so that the item can be stored in the memory location corresponding to that number. In this way, random hashing efficiently allocates memory for the itemsets in the IMUP-Tree, and the potential high utility itemsets are mined exactly by traversing the IMUP-Tree. The steps of the RH algorithm are listed in Table 3.
Table 3 Pseudocode for the RH Algorithm
1. Sort the promising items in descending order of TWU.
2. Allocate the memory space for the tree based on the number of items in the database: n = (number of items) + x, where x may be any integer and the memory size n must be the prime number nearest to the total number of items in the database.
3. Allocate the memory space for the first item based on the hash function h(k) = (((a * k) + b) mod s) mod n, where s is the total number of transactions in the table, a and b are random numbers between 1 and the number of items in a transaction, and k is the item's position in the TWU order.
4. Repeat the above step until memory is allocated for all the items in the TWU order.
5. If a collision occurs, change the random numbers in the hash function to allocate memory for the collided item.
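The Python sketch below mirrors the steps of Table 3. The nearest-prime search, the bounded retry count, and the linear-probe fallback that guarantees termination are our assumptions, not part of the published pseudocode.

import random

def _next_prime(m):
    # Smallest prime >= m, by trial division (adequate for small tables).
    def is_prime(v):
        if v < 2:
            return False
        d = 2
        while d * d <= v:
            if v % d == 0:
                return False
            d += 1
        return True
    while not is_prime(m):
        m += 1
    return m

def rh_allocate(twu, num_transactions, x=1):
    # Step 1: promising items in descending order of TWU.
    items = sorted(twu, key=lambda i: -twu[i])
    # Step 2: table size n is the prime nearest to (number of items + x).
    n = _next_prime(len(items) + x)
    slots = {}
    for k, item in enumerate(items, start=1):  # Steps 3-4: hash each item.
        a = random.randint(1, max(1, len(items)))
        b = random.randint(1, max(1, len(items)))
        h = ((a * k + b) % num_transactions) % n
        retries = 0
        # Step 5: on collision, redraw the random numbers (bounded retries),
        # then fall back to linear probing so allocation always terminates
        # (n exceeds the number of items, so a free slot always exists).
        while h in slots and retries < 32:
            a = random.randint(1, max(1, len(items)))
            b = random.randint(1, max(1, len(items)))
            h = ((a * k + b) % num_transactions) % n
            retries += 1
        while h in slots:
            h = (h + 1) % n
        slots[h] = item
    return slots, n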
Advantages of random hashing:
- The time and space complexity of the mining process are reduced.
- The speed of the mining process is increased.
3.1.3 DGNM: Decreasing Global Node Utilities and Memory Allocation during the Construction of a Global IMUP-Tree
It is shown in [3] that the tree-based framework for high utility itemset mining applies a divide-and-conquer technique, so the search space can be divided into smaller subspaces. From this viewpoint, our second strategy for decreasing overestimated utilities is to remove the utilities of descendant nodes from the node utilities in the global IMUP-Tree. This is performed during the construction of the global IMUP-Tree. By applying strategy DGNM, the utilities of the nodes are inserted into appropriate memory, which reduces memory usage and also reduces the utilities of the nodes closer to the root of the global IMUP-Tree. DGNM is especially suitable for databases containing many long transactions. In the following sections, the construction of a global IMUP-Tree with strategies DGUM and DGNM is described; a sketch of the insertion step follows.
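A hedged sketch of how a reorganized transaction could be inserted with DGNM-style node-utility reduction: each node on the path accumulates the transaction's RTU minus the utilities of the items that lie below it (its descendants), so nodes near the root carry smaller overestimates. The class and function names are illustrative assumptions.

class Node:
    def __init__(self, item=None):
        self.item = item
        self.count = 0       # support count of the paths through this node
        self.nu = 0          # node utility with descendant utilities removed
        self.children = {}

def insert_with_dgnm(root, order, utils, rtu):
    # order: promising items in descending TWU; utils: item -> utility in
    # this transaction; rtu: the reorganized transaction utility from DGUM.
    node = root
    for i, item in enumerate(order):
        below = sum(utils[j] for j in order[i + 1:])  # descendant utilities
        node = node.children.setdefault(item, Node(item))
        node.count += 1
        node.nu += rtu - below
    return root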
3.1.4 Constructing a Global IMUP-Tree by Applying DGUM and DGNM
Recall that the construction of the global IMUP-Tree is performed with two database scans. In the first scan, the transaction utility of each transaction is computed; at the same time, each 1-item's TWU is accumulated.
To evaluate the performance of the proposed technique, both real and synthetic data sets are used in the experiments. The synthetic transactional data sets are generated by the generator based on the fast algorithm for mining association rules [1]. Parameter descriptions and default values of the synthetic data sets are shown in Table 9. The real-world data sets Retail, Weblog, and Chess are obtained from the FIMI Repository [FIMI]. These data sets do not provide profit values or the quantity of each item in each transaction.
Table 9 Parameter settings of the synthetic data sets

Parameter: Description                                     Default
|D|: Total number of transactions                          100K
T:   Average transaction length                            10
|I|: Number of distinct items                              1000
F:   Average size of maximal potential frequent itemsets   6
Q:   Maximum number of purchased items in transactions     10
Following the performance evaluation settings of previous utility-based pattern mining studies [19], [16], unit profits of items in the utility tables are generated between 1 and 1,000 using a log-normal distribution, and quantities of items are generated randomly between 1 and 10.
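As an illustration only, profits and quantities of the kind described above could be generated as in the sketch below; the log-normal shape parameters mu and sigma are our assumptions, since the paper does not state them.

import random

def make_utility_table(items, mu=3.0, sigma=1.5):
    # Unit profits: log-normally distributed, clipped to the range [1, 1000].
    return {item: max(1, min(1000, round(random.lognormvariate(mu, sigma))))
            for item in items}

def draw_quantity():
    # Internal utility (purchase quantity): uniform in [1, 10].
    return random.randint(1, 10)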
Table 10 Characteristics of the real data sets

Dataset    |D|        T      |I|        Type
Retail     88,162     10.3   16,470     Sparse
Chess      3,196      37.0   75         Dense
Web Log    1,692,082  71.45  5,267,656  Sparse
Medical    16,487     32     497        Sparse
Finally, the results are evaluated using a real-life data set (Medical) with real utility values, collected from a medical shop located in Pollachi. The performance of the proposed algorithm is compared with the existing algorithms UP-Growth [30] and UP-Growth+ [31]. For convenience, PHUIs are called candidates in our experiments. The characteristics of the above data sets are shown in Table 10.
4.1 COMPARATIVE ANALYSIS OF THE PROPOSED ALGORITHM ON DIFFERENT DATA SETS
In this part, the performance comparison on three real data sets is shown: the dense data set Chess and the sparse data sets Retail and Weblog. First, we show the results on the real dense data set Chess in Fig. 4.
Figure 4 Candidates comparison on the Chess data set
4.1.1 Performance Analysis for the Chess Data Set
The Chess data set is extremely dense. Dense data sets have many long frequent as well as high utility patterns: because the probability of an item's occurrence in each transaction is very high, dense data sets yield many candidate patterns even at comparatively high thresholds. Here, we first compare the number of candidates.
Fig. 4 shows that the proposed method substantially outperforms the previous methods. The minimum utility thresholds range from 900% to 1300%. It is notable that, since Chess is a very dense data set, a large number of candidate patterns occur even at comparatively high thresholds, and the number of candidates increases rapidly below a utility threshold of 1000%; for utility thresholds of 900% and 1000%, the numbers of candidate patterns are too large for the existing algorithms. The runtime of UPT&UPG is the worst, followed by UPT&UPG+, and IMUPT&EUPG+ is the best. The main reason is that the performance of UPT&UPG and UPT&UPG+ is decided by the number of generated candidates.
Figure 5 Time comparison on the Chess data set
Figure 5 shows the runtime comparison on the Chess data set. The runtime of each method is roughly proportional to its number of candidates; that is, the more candidates a method produces, the greater its execution time. Due to several database scans over a large candidate set, the total time needed is also much larger than on sparse data sets.
4.1.2 Performance Analysis for the Retail Data Set
The Retail data set, provided by Tom Brijs (FIMI), contains retail market basket data from an anonymous Belgian retail store. It is an extremely sparse data set. Sparse data sets normally have many distinct items; although their average transaction length is small, they normally contain many transactions. Fig. 6 shows the comparison of the candidates on the Retail data set, using a minimum utility threshold range of 400 to 800. It is clear that the performance of UPT&UPG is the worst since it generates the most candidates.
Figure 6 Candidates comparison on the Retail data set
Figure 7 Time comparison on the Retail data set
Fig. 7 shows the running time comparison on the Retail data set. The runtime of UPT&UPG is the worst, followed by UPT&UPG+, and IMUPT&EUPG+ is the best. Moreover, although the numbers of candidates of UPT&UPG, UPT&UPG+, and IMUPT&EUPG+ are almost the same, the execution time of UPT&UPG is the worst among the three methods since UP-Growth+ and EUP-Growth+ efficiently prune the search space of local IMUP-Trees.
4.2 PERFORMANCE COMPARISON UNDER DIFFERENT PARAMETERS
The performance under varied average transaction lengths (T) is shown in Fig. 8.
Figure 8 Time comparison on the synthetic data sets
The experiment is performed on the synthetic data sets Tx.F6.|I|1,000.|D|100k with min_util set to 1 percent. Fig. 8 shows that the runtime of all algorithms increases with increasing T, because a larger T makes the transactions, and thus the database, longer; the runtime of the methods is again proportional to the number of candidates. The difference in performance between the methods appears when T is larger than 25: the best method is IMUPT&EUPG+ and the worst one is UPT&UPG.
Figure 9 Candidates comparison on the synthetic data sets
Fig. 9 shows that the number of candidates generated by IMUPT&EUPG+ is the smallest. This shows that EUP-Growth+ effectively prunes more candidates by decreasing overestimated utilities when transactions are longer; in other words, EUP-Growth+ is more efficient on data sets with longer transactions.
4.3 SCALABILITY OF THE PROPOSED METHODS
In this section, the scalability of the proposed method is evaluated on the synthetic data sets T10.F6.|I|1,000.|D|xk. The runtime and the numbers of candidates and high utility itemsets are shown in Fig. 10 and Table 10, respectively.
Fig. 10 Experimental results under varied database sizes
Fig. 10 shows significant differences in runtime across database sizes. The total runtime of IMUPT&EUPG+ is the best, followed by UPT&UPG+, with UPT&UPG being the worst; as the size of the database increases, the runtime for identifying high utility itemsets also increases, which again emphasizes the importance of runtime. From Table 10, it is clear that the number of PHUIs generated by IMUPT&EUPG+ is smaller than that of the other methods across the varied database sizes. Overall, IMUPT&EUPG+ outperforms the other compared algorithms as the database size increases since it generates the fewest PHUIs.
TABLE 10: Number of candidates and high utility itemsets under varied database sizes

Database   UPT&UPG   UPT&UPG+   IMUPT&EUPG+
200k       51,534    32,261     18,175
400k       56,844    39,450     18,976
600k       52,845    34,164     17,324
800k       52,491    35,645     18,188
1000k      50,073    32,789     17,144
4.4 MEMORY USAGE OF THE PROPOSED METHODS
In this section, the memory consumption of the methods (in GB) is shown in Tables 11 and 12, under varied min_util on the Retail data set and under varied database sizes on the synthetic data sets T10.F6.|I|1,000.|D|xk, respectively.
Table 11 shows that the memory usage of all methods increases with decreasing min_util, since a lower min_util makes the UP-Trees and IMUP-Trees larger. Generally, IMUPT&EUPG+ uses the least memory for storing the PHUIs in the IMUP-Tree; this is because the strategies effectively decrease the number of PHUIs in local IMUP-Trees.
Table 11 Memory usage (GB) under varied min_util on the Retail data set

min_util   UPT&UPG   UPT&UPG+   IMUPT&EUPG+
0.1%       1.017     0.478      0.299
0.08%      1.033     0.923      0.916
0.06%      1.132     1.069      1.021
0.04%      1.294     1.188      1.125
0.02%      1.364     1.288      1.174
Table 12 shows that memory usage increases with increasing database size. Generally, IMUPT&EUPG+ uses the least memory among the three methods, because the strategies effectively decrease the number of PHUIs in local IMUP-Trees; since IMUPT&EUPG+ generates fewer PHUIs, it consumes less memory.
Table 12 Memory usage (GB) under varied database sizes T10.F6.|I|1,000.|D|xK (min_util = 0.1 percent)

Database   UPT&UPG   UPT&UPG+   IMUPT&EUPG+
200K       1.221     1.028      0.979
400K       1.350     1.290      1.185
600K       1.490     1.354      1.252
800K       1.555     1.422      1.326
1000K      1.699     1.548      1.437
4.5 Summary of the Experimental Results
The experimental results in this section show that the proposed method outperforms the state-of-the-art algorithms in almost all cases on both real and synthetic data sets, for the following reasons.
First, the memory used by the global IMUP-Tree is much less than that used by the UP-Tree, since DGUM and DGNM, together with the RH algorithm, effectively decrease the memory consumed by utilities during the construction of the global IMUP-Tree.
Second, EUP-Growth+ generates far fewer candidates than UP-Growth and UP-Growth+, since the strategies DPU and DPN are applied during the construction of local IMUP-Trees. With these strategies, candidate generation is more efficient because many useless candidates are pruned.
Third, EUP-Growth+ utilizes minimal node utilities and a path node utility count to further decrease the overestimated utilities of itemsets; these are especially effective when the database contains many long transactions. For the reasons above, the proposed algorithm EUP-Growth+ achieves better performance than the UP-Growth and UP-Growth+ algorithms.
V. CONCLUSIONS
In this paper, an efficient algorithm called EUP-Growth+ is proposed for mining high utility itemsets with a set of effective strategies for pruning potential high utility itemsets. A data structure named the IMUP-Tree is proposed for maintaining the information of high utility itemsets in appropriate memory locations, using hashing techniques to reduce both memory and time. The EUP-Growth+ algorithm efficiently generates PHUIs from the IMUP-Tree with only two database scans. By developing four strategies, the mining performance is enhanced significantly since both the search space and the number of candidates are effectively reduced. The proposed algorithm builds the IMUP-Tree to reduce memory consumption while storing the utility itemsets; the IMUP-Tree is built only for the pruned database, which fits into main memory easily. The performance of such algorithms depends strongly on the minimum utility threshold and on the features of the data sets (their nature and size); the proposed algorithm is therefore designed so that time and memory are further reduced on both sparse and dense data sets. In the experiments, both real and synthetic data sets were used for performance evaluation. The experimental results reveal that the strategies considerably improve performance by reducing the search space, the runtime, and the number of candidates. It is evident from the experiments that the EUPG+ algorithm substantially outperforms the UPG+ and UPG algorithms.
REFERENCES
[1]. R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int'l Conf. Very Large Data Bases (VLDB), pp. 487-499, 1994.
[2]. R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. 11th Int'l Conf. Data Eng., pp. 3-14, Mar. 1995.
[3]. C.F. Ahmed, S.K. Tanbeer, B.-S. Jeong, and Y.K. Lee, "Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 12, pp. 1708-1721, Dec. 2009.
[4]. C.H. Cai, A.W.C. Fu, C.H. Cheng, and W.W. Kwong, "Mining Association Rules with Weighted Items," Proc. Int'l Database Eng. and Applications Symp. (IDEAS '98), pp. 68-77, 1998.
[5]. R. Chan, Q. Yang, and Y. Shen, "Mining High Utility Itemsets," Proc. IEEE Third Int'l Conf. Data Mining, pp. 19-26, Nov. 2003.
[6]. J.H. Chang, "Mining Weighted Sequential Patterns in a Sequence Database with a Time-Interval Weight," Knowledge-Based Systems, vol. 24, no. 1, pp. 1-9, 2011.
[7]. M.S. Chen, J.S. Park, and P.S. Yu, "Efficient Data Mining for Path Traversal Patterns," IEEE Trans. Knowledge and Data Eng., vol. 10, no. 2, pp. 209-221, Mar. 1998.
[8]. C. Creighton and S. Hanash, "Mining Gene Expression Databases for Association Rules," Bioinformatics, vol. 19, no. 1, pp. 79-86, 2003.
[9]. M.Y. Eltabakh, M. Ouzzani, M.A. Khalil, W.G. Aref, and A.K. Elmagarmid, "Incremental Mining for Frequent Patterns in Evolving Time Series Databases," Technical Report CSD TR#08-02, Purdue Univ., 2008.
[10]. A. Erwin, R.P. Gopalan, and N.R. Achuthan, "Efficient Mining of High Utility Itemsets from Large Data Sets," Proc. 12th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD), pp. 554-561, 2008.
[11]. E. Georgii, L. Richter, U. Ruckert, and S. Kramer, "Analyzing Microarray Data Using Quantitative Association Rules," Bioinformatics, vol. 21, pp. 123-129, 2005.
[12]. J. Han, G. Dong, and Y. Yin, "Efficient Mining of Partial Periodic Patterns in Time Series Database," Proc. Int'l Conf. Data Eng., pp. 106-115, 1999.
[13]. J. Han and Y. Fu, "Discovery of Multiple-Level Association Rules from Large Databases," Proc. 21st Int'l Conf. Very Large Data Bases, pp. 420-431, Sept. 1995.
[14]. J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 1-12, 2000.
[15]. S.C. Lee, J. Paik, J. Ok, I. Song, and U.M. Kim, "Efficient Mining of User Behaviors by Temporal Mobile Access Patterns," Int'l J. Computer Science Security, vol. 7, no. 2, pp. 285-291, 2007.
[16]. H.F. Li, H.Y. Huang, Y.C. Chen, Y.J. Liu, and S.Y. Lee, "Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams," Proc. IEEE Eighth Int'l Conf. Data Mining, pp. 881-886, 2008.
[17]. Y.C. Li, J.S. Yeh, and C.C. Chang, "Isolated Items Discarding Strategy for Discovering High Utility Itemsets," Data and Knowledge Eng., vol. 64, no. 1, pp. 198-217, Jan. 2008.
[18]. C.H. Lin, D.Y. Chiu, Y.H. Wu, and A.L.P. Chen, "Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window," Proc. SIAM Int'l Conf. Data Mining (SDM '05), 2005.
[19]. Y. Liu, W. Liao, and A. Choudhary, "A Fast High Utility Itemsets Mining Algorithm," Proc. Utility-Based Data Mining Workshop, 2005.
[20]. R. Martinez, N. Pasquier, and C. Pasquier, "GenMiner: Mining Nonredundant Association Rules from Integrated Gene Expression Data and Annotations," Bioinformatics, vol. 24, pp. 2643-2644, 2008.
[21]. J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, "H-Mine: Fast and Space-Preserving Frequent Pattern Mining in Large Databases," IIE Trans. Inst. of Industrial Engineers, vol. 39, no. 6, pp. 593-605, June 2007.
[22]. J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.C. Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 10, pp. 1424-1440, Oct. 2004.
[23]. J. Pisharath, Y. Liu, B. Ozisikyilmaz, R. Narayanan, W.K. Liao, A. Choudhary, and G. Memik, "NU-MineBench Version 2.0 Data Set and Technical Report," http://cucis.ece.northwestern.edu/projects/DMS/MineBench.html, 2012.
[24]. B.E. Shie, H.-F. Hsiao, V.S. Tseng, and P.S. Yu, "Mining High Utility Mobile Sequential Patterns in Mobile Commerce Environments," Proc. 16th Int'l Conf. Database Systems for Advanced Applications (DASFAA '11), vol. 6587/2011, pp. 224-238, 2011.
[25]. B.E. Shie, V.S. Tseng, and P.S. Yu, "Online Mining of Temporal Maximal Utility Itemsets from Data Streams," Proc. 25th Ann. ACM Symp. Applied Computing, Mar. 2010.
[26]. K. Sun and F. Bai, "Mining Weighted Association Rules without Preassigned Weights," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 4, pp. 489-495, Apr. 2008.
[27]. S.K. Tanbeer, C.F. Ahmed, B.S. Jeong, and Y.K. Lee, "Efficient Frequent Pattern Mining over Data Streams," Proc. ACM 17th Conf. Information and Knowledge Management, 2008.
[28]. F. Tao, F. Murtagh, and M. Farid, "Weighted Association Rule Mining Using Weighted Support and Significance Framework," Proc. ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD '03), pp. 661-666, 2003.
[29]. V.S. Tseng, C.J. Chu, and T. Liang, "Efficient Mining of Temporal High Utility Itemsets from Data Streams," Proc. ACM KDD Workshop Utility-Based Data Mining (UBDM '06), Aug. 2006.
[30]. V.S. Tseng, C.W. Wu, B.E. Shie, and P.S. Yu, "UP-Growth: An Efficient Algorithm for High Utility Itemsets Mining," Proc. 16th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD '10), pp. 253-262, 2010.
[31]. V.S. Tseng, C.W. Wu, B.E. Shie, and P.S. Yu, "Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases," IEEE Trans. Knowledge and Data Eng., vol. 25, no. 8, Aug. 2013.
[32]. W. Wang, J. Yang, and P. Yu, "Efficient Mining of Weighted Association Rules (WAR)," Proc. ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD '00), pp. 270-274, 2000.
[33]. H. Yao, H.J. Hamilton, and L. Geng, "A Unified Framework for Utility-Based Measures for Mining Itemsets," Proc. ACM SIGKDD Second Workshop Utility-Based Data Mining, pp. 28-37, Aug. 2006.
[34]. S.J. Yen and Y.S. Lee, "Mining High Utility Quantitative Association Rules," Proc. Ninth Int'l Conf. Data Warehousing and Knowledge Discovery (DaWaK), pp. 283-292, Sept. 2007.
[35]. C.H. Yun and M.S. Chen, "Using Pattern-Join and Purchase-Combination for Mining Web Transaction Patterns in an Electronic Commerce Environment," Proc. IEEE 24th Ann. Int'l Computer Software and Applications Conf., pp. 99-104, Oct. 2000.
[36]. C.H. Yun and M.S. Chen, "Mining Mobile Sequential Patterns in a Mobile Commerce Environment," IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Rev., vol. 37, no. 2, pp. 278-295, Mar. 2007.
[37]. U. Yun, "An Efficient Mining of Weighted Frequent Patterns with Length Decreasing Support Constraints," Knowledge-Based Systems, vol. 21, no. 8, pp. 741-752, Dec. 2008.
[38]. U. Yun and J.J. Leggett, "WFIM: Weighted Frequent Itemset Mining with a Weight Range and a Minimum Weight," Proc. SIAM Int'l Conf. Data Mining (SDM '05), pp. 636-640, 2005.
[39]. U. Yun and J.J. Leggett, "WIP: Mining Weighted Interesting Patterns with a Strong Weight and/or Support Affinity," Proc. SIAM Int'l Conf. Data Mining (SDM '06), pp. 623-627, Apr. 2006.