A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining (IJSRD)
Data mining is the process of extracting knowledge from data or data streams. High utility itemset mining focuses on itemsets with high profit; because data may be static or arrive as a stream, mining such itemsets has become a significant research topic. This paper presents the different methods available for mining high utility itemsets and the logic required to mine them from data streams. The following algorithms are compared: the one-pass algorithm, the two-phase algorithm, top-k utility frequent itemset mining, and sliding-window-based algorithms such as MHUI-BIT (Mining High Utility Itemsets based on bit vectors) and MHUI-TID (Mining High Utility Itemsets based on TID-lists) over a lexicographical-tree-based summary data structure, as well as the FHM (Fast High-utility Miner) algorithm with Estimated Utility Co-occurrence Pruning, which is used in the proposed work.
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI (idescitation)
Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of an itemset is based on the user's perspective, such as cost, profit or revenue, and is of significant importance. Utility mining incorporates such utility factors into data mining tasks; utility-based descriptive data mining that aims at discovering itemsets with high total utility is termed high utility itemset mining. High utility itemsets may contain frequent as well as rare itemsets. Classical utility mining treats items and their utilities as discrete values, but in real-world applications such utilities can be described by fuzzy sets, so itemset utility mining with fuzzy modeling allows item utility values to be fuzzy and dynamic over time. In this paper, an algorithm, FHURI (Fuzzy High Utility Rare Itemset mining), is presented to efficiently and effectively mine very-high (and high) utility rare itemsets from databases by fuzzifying utility values. FHURI extracts fuzzy high utility rare itemsets by integrating fuzzy logic with high utility rare itemset mining and may have practical value for real-world marketing strategies. Results are shown on synthetic datasets.
Weighted frequent pattern mining has been suggested to find more important frequent patterns by assigning different weights to each item. Weighted frequent patterns are generated in weight-ascending and frequency-descending order using a prefix-tree structure, and the generated patterns are then fed to a maximal frequent itemset mining algorithm. Maximal frequent pattern mining reduces the number of frequent patterns while keeping sufficient result information. In this paper, we propose an efficient algorithm to mine maximal weighted frequent patterns over data streams. A new efficient data structure, combining a prefix tree and a conditional tree, is used to dynamically maintain the information of transactions, and three mining strategies (incremental, interactive and maximal) are presented. The details of the algorithms are discussed, and the approach is applied to market basket analysis for an electronics shop. Experimental studies are performed to evaluate the effectiveness of the algorithm.
Parallel Key Value Pattern Matching Model (ijsrd.com)
Mining frequent itemsets from huge transactional databases is an important task in data mining, and finding frequent itemsets is a key step in extracting association rules. Association rule mining is used to find relationships among large datasets, and many algorithms have been developed to find frequent itemsets. This work presents a summary and a new parallel key-value pattern matching model that shards a large-scale mining task into independent, parallel tasks. The model produces frequent patterns efficiently in terms of time consumption, avoids high computational cost, and discovers frequent itemsets from the database.
A Survey on Improve Efficiency And Scability vertical mining using Agriculter... (Editor IJMTER)
The basic idea is that the search tree can be divided into equivalence classes. Since generating itemsets within one equivalence class is independent of the others, frequent itemset mining can be performed on the sub-trees of the equivalence classes in parallel. The straightforward approach to parallelizing Eclat is therefore to treat each equivalence class as a unit of data (here, agricultural data): data can be distributed to different nodes, and the nodes can work without any synchronization. Although sorting helps to produce smaller sets, sorting itself has a cost. Our analysis is that the size of an equivalence class is relatively small (always less than the size of the item base) and shrinks quickly as the search recurses deeper. On this basis, large amounts of data can be handled: we first implement the Eclat algorithm, then a parallel Eclat algorithm, and compare them on the same data with respect to running time, using support and confidence.
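To make the equivalence-class idea concrete, the following is a minimal sketch (in Python, with a hypothetical toy database) of the vertical, tid-list-based Eclat search this survey builds on; each recursive call works on one equivalence class, which is exactly the unit that could be handed to a separate node in a parallel version. This is an illustrative sketch, not the authors' implementation.

from collections import defaultdict

def eclat(transactions, min_support):
    """Minimal Eclat: mine frequent itemsets by intersecting tid-lists.

    transactions: list of item collections; min_support: absolute count.
    Returns a dict mapping frequent itemsets (frozensets) to their support.
    """
    # Build the vertical layout: item -> set of transaction ids.
    tidlists = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists[item].add(tid)

    frequent = {}

    def recurse(prefix, items):
        # 'items' holds (item, tidlist) pairs forming one equivalence class;
        # each class can be processed independently (hence in parallel).
        for i, (item, tids) in enumerate(items):
            support = len(tids)
            if support < min_support:
                continue
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = support
            # Extend the class by intersecting with the remaining items.
            suffix = []
            for other, other_tids in items[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    suffix.append((other, common))
            if suffix:
                recurse(itemset, suffix)

    recurse(set(), sorted(tidlists.items()))
    return frequent

# Hypothetical toy database for illustration.
db = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
print(eclat(db, min_support=2))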
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining (ijsrd.com)
Frequent pattern mining is very important for business organizations. Its major applications include disease prediction and analysis, rain forecasting, and profit maximization. In this paper, we present a new method for mining frequent patterns based on a new compact data structure that helps reduce execution time.
Nowadays an enormous amount of data is generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data and has its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets from transactional databases, but as dataset size increases, traditional frequent itemset mining cannot handle it. The MapReduce programming model addresses large datasets but has a high communication cost, which reduces execution efficiency. The proposed approach applies a new k-means pre-processing technique to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: k-means clustering to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying k-means clustering before the BigFIM algorithm as a pre-processing step.
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo... (iosrjce)
Optimized High-Utility Itemsets Mining for Effective Association Mining Paper (IJECEIAES)
Association rule mining is widely used for determining the frequent itemsets of a transactional database; however, the utility of itemsets needs to be considered in market behaviour applications. Apriori and FP-growth generate association rules without an item utility factor. High-utility itemset mining (HUIM) is a well-known method that determines itemsets based on a high utility value, and the resulting itemsets are known as high-utility itemsets. The fast high-utility miner (FHM) is an enhanced version of HUIM that reduces the number of join operations during itemset generation, so it is faster than HUIM. For large datasets, both methods are still very expensive. The proposed method addresses this issue by building a pruning-based estimated utility co-occurrence structure (PEUCS) to eliminate low-profit itemsets, so that only an optimal number of high-utility itemsets are processed; the method is therefore called optimal FHM (OFHM). Experimental results show that OFHM requires less runtime and is thus more efficient than other existing methods on benchmark large datasets.
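As a rough illustration of the co-occurrence pruning idea behind FHM and its variants, the sketch below builds a structure mapping item pairs to their summed transaction utility (a TWU-style upper bound) and consults it before attempting a costly join. The data layout, names and threshold are assumptions for illustration, not the paper's exact PEUCS implementation.

from collections import defaultdict
from itertools import combinations

def build_eucs(transactions):
    """Estimated utility co-occurrence structure: pair -> summed transaction utility.

    transactions: list of (items, utilities) where utilities[i] is the utility
    of items[i] in that transaction (illustrative layout).
    """
    eucs = defaultdict(int)
    for items, utilities in transactions:
        tu = sum(utilities)  # transaction utility (TU)
        for a, b in combinations(sorted(items), 2):
            eucs[(a, b)] += tu  # pair TWU = sum of TU over transactions containing both
    return eucs

def can_extend(eucs, item_a, item_b, min_utility):
    """Prune rule: if the pair's TWU is below min_utility, no superset
    containing both items can be high utility, so skip the costly join."""
    pair = tuple(sorted((item_a, item_b)))
    return eucs.get(pair, 0) >= min_utility

# Hypothetical transactions: (items, per-item utilities).
db = [(["a", "b", "c"], [5, 2, 1]),
      (["a", "c"], [10, 3]),
      (["b", "c"], [4, 6])]
eucs = build_eucs(db)
print(can_extend(eucs, "a", "b", min_utility=15))  # False: TWU(a,b) = 8
print(can_extend(eucs, "a", "c", min_utility=15))  # True:  TWU(a,c) = 21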
Generation of Potential High Utility Itemsets from Transactional Databases (AM Publications)
Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility. Earlier algorithms such as Apriori and FP-Growth incur the problem of producing a large number of candidate itemsets for high utility itemsets, and such a large number of candidates degrades mining performance in terms of execution time. UP-Growth came into existence to improve mining performance: it effectively mines potential high utility itemsets from the transactional database. The information about high utility itemsets is maintained in a tree-based data structure named the utility pattern tree (UP-Tree), such that candidate itemsets can be generated efficiently with only two scans of the database. The performance of UP-Growth is compared with state-of-the-art algorithms on many types of both real and synthetic datasets.
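The two-scan construction described above can be sketched roughly as follows: a first scan computes each item's transaction-weighted utility (TWU), unpromising items are dropped, and a second scan inserts the pruned, TWU-ordered transactions into a shared prefix tree. This is a simplified, hypothetical sketch of a UP-Tree-like structure; it omits the utility-discounting details of the actual UP-Growth algorithm.

from collections import defaultdict

class Node:
    """One node of a UP-Tree-like prefix tree (simplified sketch)."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.node_utility = 0
        self.children = {}

def build_tree(transactions, min_utility):
    """Two database scans, as in UP-Growth-style algorithms.

    transactions: list of dicts {item: utility of that item in the transaction}.
    """
    # Scan 1: TWU of every item.
    twu = defaultdict(int)
    for t in transactions:
        tu = sum(t.values())
        for item in t:
            twu[item] += tu

    promising = {i for i, w in twu.items() if w >= min_utility}

    # Scan 2: build the prefix tree from pruned, reordered transactions.
    root = Node()
    for t in transactions:
        kept = [(i, u) for i, u in t.items() if i in promising]
        kept.sort(key=lambda x: -twu[x[0]])  # global TWU-descending order
        node = root
        for item, utility in kept:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
            node.node_utility += utility
        # (The real UP-Tree also discounts utilities of removed items; omitted here.)
    return root, twu

# Hypothetical toy data.
db = [{"a": 6, "b": 2}, {"a": 4, "c": 5}, {"b": 3, "c": 1}]
root, twu = build_tree(db, min_utility=8)
print(dict(twu))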
Data mining techniques application for prediction in OLAP cube (IJECEIAES)
Data warehouses are collections of data organized to support decision-support processes and provide an appropriate solution for managing large volumes of data. OLAP (online analytical processing) complements data warehouses to make data usable and understandable by users, providing tools for visualization, exploration, and navigation of data cubes. Data mining, on the other hand, allows the extraction of knowledge from data with different methods of description, classification, explanation and prediction. In this work, we propose new ways to improve existing approaches in the decision-support process. Continuing work on coupling online analysis with data mining to integrate prediction into OLAP, an approach based on machine learning with clustering is proposed to partition an initial data cube into dense sub-cubes that serve as a learning set for building a prediction model. Regression trees are then applied to each sub-cube to predict the value of a cell.
A New Extraction Optimization Approach to Frequent 2 Item sets (ijcsa)
In this paper, we propose a new optimization of the reference APRIORI algorithm (AGR 94) for 2-itemsets (sets of cardinality 2). We start by computing the supports of the 1-itemsets (sets of cardinality 1), then prune the infrequent 1-itemsets and keep only those that are frequent (i.e., those whose support is greater than or equal to a fixed minimum threshold). During the second iteration, we sort the frequent 1-itemsets in descending order of their supports and then form the 2-itemsets; in this way the association rules are discovered more quickly. Experimentally, comparing our algorithm OPTI2I with APRIORI, PASCAL, CLOSE and MAXMINER shows its efficiency on weakly correlated data. Our work has also led to a classical model of side-by-side classification of items, obtained by establishing a relationship between the different sets of 2-itemsets.
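A minimal sketch of the described two-pass procedure (count 1-itemsets, prune, sort by descending support, then form and count 2-itemsets) might look like the following; the toy database and threshold are hypothetical and this is not the authors' OPTI2I code.

from collections import Counter
from itertools import combinations

def frequent_2_itemsets(transactions, min_support):
    """Sketch of the described procedure: count 1-itemsets, prune the
    infrequent ones, sort the survivors by descending support, then form
    and count candidate 2-itemsets."""
    # Pass 1: supports of 1-itemsets.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = [i for i, c in item_counts.items() if c >= min_support]
    # Sort frequent items by descending support before pairing.
    frequent_items.sort(key=lambda i: -item_counts[i])

    # Pass 2: count candidate pairs built only from frequent items.
    pair_counts = Counter()
    keep = set(frequent_items)
    for t in transactions:
        present = sorted(set(t) & keep)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}

# Hypothetical toy data.
db = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(frequent_2_itemsets(db, min_support=2))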
Design Issues for Search Engines and Web Crawlers: A Review (IOSR Journals)
Abstract: The World Wide Web is a huge source of hyperlinked information contained in hypertext documents. Search engines use web crawlers to collect these documents from the web for storage and indexing. The rapid growth of the World Wide Web has posed unprecedented challenges for the designers of search engines and web crawlers, which help users retrieve web pages in a reasonable amount of time. In this paper, a review of the need for and working of a search engine, and the role of a web crawler, is presented.
Keywords: Internet, WWW, search engine, types, design issues, web crawlers.
Frequent pattern mining techniques help find interesting trends or patterns in massive data, and prior domain knowledge helps decide an appropriate minimum support threshold. This review article presents different frequent pattern mining techniques based on Apriori, FP-tree, or user-defined techniques under different computing environments (parallel, distributed, or available data mining tools), which help determine interesting frequent patterns/itemsets with or without prior domain knowledge. The review is intended to help develop efficient and scalable frequent pattern mining techniques.
In recent years the scope of data mining has evolved into an active area of research because of the previously unknown and interesting knowledge hidden in very large database collections. Data mining is applied in a variety of domains, such as business, IT and many other sectors. A major problem that receives great attention in the community is the classification of data; the classification should be such that it can be easily verified and easily interpreted by humans. In this paper we study various data mining techniques in order to find combinations for an enhanced hybrid technique that involves multiple techniques and improves the usability of the application. We study the CHARM algorithm, the CM-SPAM algorithm, the Apriori algorithm, the MOPNAR algorithm, and top-k rules.
An incremental mining algorithm for maintaining sequential patterns using pre... (Editor IJMTER)
Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important since it can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems; in real-world applications, new transactions may be added to databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences, which reduces the need to rescan original databases.
Efficient discovery of frequent itemsets in large datasets is a crucial data mining task. In recent years, several approaches have been proposed for generating high utility patterns, but they produce a large number of candidate itemsets for high utility itemsets and thereby degrade mining performance in terms of speed and space. Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility, such as profit. Although a number of relevant approaches have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets, and such a large number of candidates degrades mining performance in terms of execution time and space requirements.
In this paper, we present a literature survey of existing frequent itemset mining algorithms. The concept of frequent itemset mining is discussed briefly, the working procedure of some modern frequent itemset mining techniques is given, and the merits and demerits of each method are described. Frequent itemset mining is found to be still an active research topic.
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules (Venu Madhav)
Frequent itemset mining and association rule generation are challenging tasks over data streams. Although various algorithms have been proposed, frequency alone does not decide the significance or interestingness of a mined itemset and hence of the association rules; this has motivated algorithms that mine association rules based on utility, i.e., the proficiency of the mined rules. However, few algorithms in the literature deal with utility, as most focus on reducing the complexity of frequent itemset/association rule mining; moreover, those few consider only the overall utility of the association rules and not the consistency of the rules over a defined number of periods. To address this, an enhanced association rule mining algorithm is proposed in this paper. The algorithm introduces a new weightage validation into conventional association rule mining to validate the utility of the mined association rules and its consistency. Utility is validated by an integrated calculation of the cost/price efficiency of the itemsets and their frequency, while consistency is validated at every defined number of windows using a probability distribution function, assuming that the weights are normally distributed. The resulting rules are thus frequent and utility-efficient, with their interestingness distributed throughout the entire time period. The algorithm is implemented and the resulting rules are compared against those obtainable from conventional mining algorithms.
The FP-tree is also a large hierarchical data structure that may not fit into main memory; it is also not suitable for incremental mining, nor is it used in interactive mining systems.
An improved apriori algorithm for association rules (ijnlc)
There are several algorithms for mining association rules. One of the most popular is Apriori, which is used to extract frequent itemsets from large databases and derive association rules for knowledge discovery. Based on this algorithm, this paper points out a limitation of the original Apriori algorithm, namely the time wasted scanning the whole database when searching for frequent itemsets, and presents an improvement that reduces this wasted time by scanning only some of the transactions. Experimental results with several groups of transactions and several minimum support values, applied to the original Apriori and our improved Apriori, show that the improved Apriori reduces the time consumed by 67.38% compared to the original Apriori, making the algorithm more efficient and less time-consuming.
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING... (IAEME Publication)
In this paper, MDL-based reduction of frequent patterns is presented. The ideal outcome of any pattern mining process is to explore the data for new insights, and non-interesting patterns that describe noise need to be eliminated. The major problem in frequent pattern mining is identifying the interesting patterns. Instead of performing association rule mining on all frequent itemsets, it is feasible to select a subset of the frequent itemsets and perform the mining task on it; selecting a small set of frequent itemsets from a large number of interesting ones is a difficult task. In our approach, an MDL-based algorithm is used to reduce the number of frequent itemsets to be used for association rule mining.
Association relationships are an important component of data mining. In real-world applications, the knowledge used to aid decision-making is always time-varying, yet most existing data mining approaches rely on the assumption that discovered knowledge is valid indefinitely. For better decision-making, it is desirable to identify the temporal features associated with the interesting patterns or rules. This paper presents a novel approach for mining Efficient Temporal Association Rules (ETAR). The basic idea of ETAR is to first partition the database into time periods and then progressively accumulate the occurrence count of each itemset based on the intrinsic partitioning characteristics. The execution time of ETAR is orders of magnitude smaller than that of schemes directly extended from existing methods, because it scans the database only once.
B017550814
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 5, Ver. V (Sep. – Oct. 2015), PP 08-14
www.iosrjournals.org
DOI: 10.9790/0661-17550814
Mining High Utility Itemsets from its Concise and Lossless Representations
Nasreen Ali A.¹, Arunkumar M.²
¹ (Computer Science and Engineering, Ilahia College of Engineering and Technology / Mahatma Gandhi University, India)
² (Computer Science and Engineering, Ilahia College of Engineering and Technology / Mahatma Gandhi University, India)
Abstract: Mining high utility items from databases using the utility of items is an emerging technology. Recent algorithms have a drawback at the performance level, considering memory and time. The novel strategy proposed here is the Miner algorithm. A vertical data structure is used to store the elements with their utility values, and a matrix representation is generated to identify element co-occurrences and reduce the join operations for the generated patterns. An extensive experimental study on the datasets shows that the resulting algorithm reduces the join operations by up to 95% compared with the state-of-the-art UP-Growth algorithm.
Keywords: Utility, utility list, co-occurrences, pruning
I. Introduction
Data mining is concerned with analyzing large volumes of data to automatically discover interesting regularities or relationships. The primary goal is to discover hidden patterns and unexpected trends in the data. Data mining activities use a combination of techniques from database technology, statistics, artificial intelligence and machine learning. Real-world applications include bioinformatics, genetics, medicine, clinical research, education, retail and marketing research.
Frequency mining [1] is a popular data mining task with a wide range of applications. Given a transaction database, it consists of discovering frequent itemsets, i.e., groups of items (itemsets) appearing frequently in transactions [1]. However, an important limitation of frequency mining is that all items have the same importance (weight, unit profit or value). This assumption often does not hold in real applications. For example, consider a database of customer transactions containing a unit profit for each item and different quantities of each item. Frequency mining algorithms would discard this information and may thus discover many frequent itemsets generating a low profit while failing to discover less frequent itemsets that generate a high profit.
Utility mining [5], [6], [7], [8] emerges as an important topic in data mining. In utility mining, each item has a weight (e.g., unit profit) and can appear more than once in each transaction (e.g., purchase quantity). The utility of an itemset represents its importance, which can be measured in terms of weight, profit, cost, quantity or other information depending on the user's preference; utility is a measure of how useful or profitable an itemset X is. The utility of items in a transaction database consists of two aspects: (1) the importance of distinct items, called the external utility, and (2) the importance of the items within the transaction, called the internal utility. The utility of an item is defined as the external utility multiplied by the internal utility. The utility of an itemset X, i.e., u(X), is the sum of the utilities of X in all the transactions containing X. An itemset X is called a high utility itemset if and only if u(X) > min_utility, where min_utility is a user-defined minimum utility threshold. However, mining high utility itemsets from databases is not an easy task, since the downward closure property of frequent itemset mining does not hold. In other words, pruning the search space for high utility itemset mining is difficult because a superset of a low-utility itemset may be a high utility itemset. A naïve method to address this problem is to enumerate all itemsets from the database by exhaustion. Obviously, this method suffers from a large search space, especially when databases contain many long transactions or a low minimum utility threshold is set. Hence, how to effectively prune the search space and efficiently capture all high utility itemsets with no miss is a crucial challenge in utility mining.
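To make these definitions concrete, the following small sketch (with a hypothetical database and unit-profit table) computes u(X) as the sum, over the transactions containing X, of each item's internal utility (quantity) multiplied by its external utility (unit profit), and compares it against min_utility; the numbers are illustrative only.

# Hypothetical example database: per-transaction purchase quantities (internal
# utility) and a unit-profit table (external utility).
transactions = [
    {"a": 2, "b": 1},         # T1
    {"a": 1, "c": 3},         # T2
    {"a": 2, "b": 2, "c": 1}  # T3
]
unit_profit = {"a": 5, "b": 3, "c": 1}  # external utility of each item

def utility(itemset, db=transactions, profit=unit_profit):
    """u(X): sum, over transactions containing X, of quantity * unit profit
    for every item of X in that transaction."""
    total = 0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total

min_utility = 20
x = {"a", "b"}
print(utility(x), utility(x) > min_utility)
# u({a,b}) = (2*5 + 1*3) + (2*5 + 2*3) = 13 + 16 = 29 > 20, so {a,b} is high utility.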
To identify high utility itemsets, most existing algorithms first generate candidate itemsets by
overestimating their utilities, and subsequently compute the exact utilities of these candidates. These algorithms
suffer from the problem that a very large number of candidates are generated, but most of the candidates turn out
not to be high utility once their exact utilities are computed. In this paper, we propose an algorithm, called
Miner, for high utility itemset mining. Miner uses a novel structure, called the utility-list, to store both the utility
information about an itemset and the heuristic information needed to prune the search space. By avoiding
the costly generation and utility computation of numerous candidate itemsets, Miner can efficiently mine high
utility itemsets from the utility-lists constructed from the mined database. To reduce the number of costly joins
that are performed, we propose a novel pruning strategy named EUCP (Estimated Utility Co-occurrence Pruning)
that can prune itemsets without performing joins. This strategy is easy to implement and very effective.
We compare the performance of Miner and UP-Growth on real-life datasets. Results show that Miner performs
up to 95% fewer join operations than UP-Growth and is up to six times faster than UP-Growth. Experimental
results also show that Miner outperforms UP-Growth in terms of both running time and memory consumption.
II. Review of literature
Fast Algorithms for Mining Association Rules by R. Agrawal and R. Srikant in 1994 proposed the Apriori
algorithm. Apriori is more efficient during the candidate generation process for two reasons: it employs a
different candidate generation method and a new pruning technique. There are two steps to find all
the large itemsets from the database in the Apriori algorithm. First the candidate itemsets are generated, and then the
database is scanned to check the actual support count of the corresponding itemsets. During the first scan of
the database the support count of each item is calculated and the large 1-itemsets are generated by pruning those
itemsets whose supports are below the predefined threshold. In each pass only those candidate itemsets that
include the same specified number of items are generated and checked.
Advantages: 1] It uses the large itemset property. 2] It is easily parallelized. 3] It is easy to implement. 4] It does not need
to generate conditional pattern bases. Disadvantages: 1] It requires multiple database scans. 2] It assumes the
transaction database is memory resident. 3] It must generate candidate itemsets.
Mining Frequent Patterns without Candidate Generation by J. Han, J. Pei, and Y. Yin in 2000
proposed a novel frequent pattern tree (FP-tree), an extended prefix-tree structure for storing
compressed, crucial information about frequent patterns, and developed an efficient FP-tree based mining method,
FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is
achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data
structure, which avoids costly, repeated database scans, (2) FP-tree-based mining adopts a pattern fragment
growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based,
divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined
patterns in conditional databases, which dramatically reduces the search space. The performance study shows that
the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an
order of magnitude faster than the Apriori algorithm and also faster than some recently reported frequent
pattern mining methods. However, item priority is not taken into consideration. Advantages: 1] It finds
frequent itemsets without generating any candidate itemsets. 2] It scans the database just twice.
Disadvantages: 1] It treats all items with the same importance/weight/price. 2] It consumes
more memory and performs badly with long-pattern data sets.
A Fast High Utility Itemsets Mining Algorithm by Y. Liu, W.-K. Liao and A. Choudhary in 2005
proposed the Two-Phase algorithm. Utility mining focuses on identifying the itemsets with high utilities. As the
"downward closure property" does not apply to utility mining, the generation of candidate itemsets is the most
costly step in terms of time and memory space. In this paper, a Two-Phase algorithm is presented to efficiently prune
down the number of candidates and precisely obtain the complete set of high utility itemsets. In the first
phase, a model is proposed that applies the "transaction-weighted downward closure property" on the search
space to expedite the identification of candidates. In the second phase, one extra database scan is performed to
identify the high utility itemsets. It performs very efficiently in terms of speed and memory cost, and shows
good scalability on multiple processors, even on large databases that are difficult for existing algorithms to
handle. Advantages: 1] It performs very efficiently in terms of speed and memory cost. Disadvantages: 1] It
generates too many candidates to obtain the high transaction-weighted utilization itemsets and requires multiple database scans.
UP-Growth: An Efficient Algorithm for High Utility Itemsets Mining by Vincent S. Tseng, Cheng-
Wei Wu, Bai-En Shie, and Philip S. Yu in 2010, proposed an efficient algorithm, namely UP-Growth (Utility
Pattern Growth), for mining high utility itemsets with a set of techniques for pruning candidate itemsets. The
information of high utility itemsets is maintained in a special data structure named UP-Tree (Utility Pattern
Tree) such that the candidate itemsets can be generated efficiently with only two scans of the database. The
experimental results show that UP-Growth not only reduces the number of candidates effectively but also
outperforms other algorithms substantially in terms of execution time, especially when the database contains lots
of long transactions.
Mining High Utility Itemsets without Candidate Generation by Mengchi Liu and Junfeng Qu proposed the
algorithm HUI-Miner. High utility itemsets refer to sets of items with high utility, such as profit, in a database,
and efficient mining of high utility itemsets plays a crucial role in many real-life applications and is an
important research issue in data mining area. To identify high utility itemsets, most existing algorithms first
generate candidate itemsets by overestimating their utilities, and subsequently compute the exact utilities of
these candidates. These algorithms incur the problem that a very large number of candidates are generated, but
most of the candidates are found not to be high utility after their exact utilities are computed. HUI-Miner
uses a novel structure, called utility-list, to store both the utility information about an itemset and the heuristic
information for pruning the search space of HUI-Miner. By avoiding the costly generation and utility
computation of numerous candidate itemsets, HUI-Miner can efficiently mine high utility itemsets from the
utility-lists constructed from a mined database. We compared HUI-Miner with the state-of-the-art algorithms on
various databases, and experimental results show that HUI-Miner outperforms these algorithms in terms of both
running time and memory consumption.
III. Current work
We first introduce important preliminary definitions.
A transaction database is a set of transactions D = {T1, T2, ..., Tn} such that each transaction Tc has a
unique identifier c called its Tid. Each item i ∈ I, where I is the set of items, is associated with a positive number
p(i), called its external importance or utility (e.g. unit profit). For each transaction Tc such that i ∈ Tc, a positive
number q(i, Tc) is called the internal importance or utility of i in Tc (e.g. purchase quantity).
Example 1. Consider the database of Table 1 (left), which will be used as our running example. This
database contains five transactions (T1, T2, ..., T5). Transaction T2 indicates that items a, c, e and g appear in this
transaction with internal utilities of respectively 2, 6, 2 and 5. Table 1 (right) indicates that the external utilities of
these items are respectively 5, 1, 3 and 1.
The importance or utility of an item i in a transaction Tc is denoted as u(i, Tc) and defined as p(i) × q(i, Tc).
The importance or utility of an itemset X (a group of items X ⊆ I) in a transaction Tc is denoted as u(X, Tc) and
defined as u(X, Tc) = ∑_{i ∈ X} u(i, Tc).
Example 2. The utility of item a in T2 is u(a,T2) = 5 × 2 = 10. The utility of the itemset {a, c}
in T2 is u ({a, c}, T2) = u (a, T2) + u(c, T2) = 5 × 2 + 1 × 6 = 16.
The importance or utility of an itemset X is denoted as u(X) and defined as u(X) = ∑_{Tc ∈ g(X)} u(X, Tc),
where g(X) is the set of transactions containing X.
Example 3. The importance or utility of the itemset {a,c} is u({a,c}) = u({a,c},T1) + u({a,c},T2) + u({a,c},T3) =
(5 + 1) + (10 + 6) + (5 + 1) = 28.
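The definitions above can be checked with a minimal Python sketch (an illustration, not the authors' implementation) built on the running-example data of Table 1; the names DB, PROFIT, u_item, u_itemset and u are chosen here for readability only.

# Running example of Table 1 (left): transactions with purchase quantities.
DB = {
    'T1': {'a': 1, 'c': 1, 'd': 1},
    'T2': {'a': 2, 'c': 6, 'e': 2, 'g': 5},
    'T3': {'a': 1, 'b': 2, 'c': 1, 'd': 6, 'e': 1, 'f': 5},
    'T4': {'b': 4, 'c': 3, 'd': 3, 'e': 1},
    'T5': {'b': 2, 'c': 2, 'e': 1, 'g': 2},
}
# External utilities (unit profits) of Table 1 (right).
PROFIT = {'a': 5, 'b': 2, 'c': 1, 'd': 2, 'e': 3, 'f': 1, 'g': 1}

def u_item(i, tc):                  # u(i, Tc) = p(i) x q(i, Tc)
    return PROFIT[i] * DB[tc][i]

def u_itemset(X, tc):               # u(X, Tc) = sum of u(i, Tc) over i in X
    return sum(u_item(i, tc) for i in X)

def u(X):                           # u(X) = sum over transactions containing X
    return sum(u_itemset(X, tc) for tc, items in DB.items()
               if set(X) <= set(items))

print(u_itemset({'a', 'c'}, 'T2'))  # 16, as in Example 2
print(u({'a', 'c'}))                # 28, as in Example 3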
The purpose of high-utility itemset mining is to discover all high-utility itemsets. An itemset X is a
high-utility itemset if its utility u(X) is no less than a user-specified minimum utility threshold min_util.
Otherwise, X is a low-utility itemset.
Example 4. If min_util = 30, the high-utility itemsets in the database of our running example are {b,d}, {a,c,e},
{b,c,d}, {b,c,e}, {b,d,e}, {b,c,d,e} and {a,b,c,d,e,f}, with utilities of respectively 30, 31, 34, 31, 36, 40 and 30.
The transaction utility (TU) of a transaction Tc is the sum of the utilities of the items of Tc in Tc, i.e.
TU(Tc) = ∑_{x ∈ Tc} u(x, Tc).
Example 5. Table 2 (left) shows the TU of transactions T1, T2, T3, T4 and T5 from our running example.
The transaction-weighted utility (TWU) of an itemset X is the sum of the transaction utilities of the transactions
containing X, i.e. TWU(X) = ∑_{Tc ∈ g(X)} TU(Tc).
Example 6. Table 2 (center) shows the TWU of the single items a, b, c, d, e, f, g. Consider item a: TWU({a}) =
TU(T1) + TU(T2) + TU(T3) = 8 + 27 + 30 = 65.
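Continuing the illustrative sketch above (it reuses the DB and PROFIT dictionaries already defined), the TU and TWU values of Table 2 can be derived as follows; tu and twu are assumed names.

def tu(tc):                          # TU(Tc) = sum of u(x, Tc) over x in Tc
    return sum(PROFIT[i] * q for i, q in DB[tc].items())

def twu(X):                          # TWU(X) = sum of TU(Tc) over transactions containing X
    return sum(tu(tc) for tc, items in DB.items() if set(X) <= set(items))

print([tu(t) for t in DB])           # [8, 27, 30, 20, 11], as in Table 2 (left)
print(twu({'a'}))                    # 65 = TU(T1) + TU(T2) + TU(T3), as in Example 6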
The TWU measure has three important properties that are used to prune the search space.
Property 1 (overestimation). The TWU of an itemset X is higher than or equal to its utility, i.e. TWU(X) ≥ u(X)
[8].
Property 2 (antimonotonicity). The TWU measure is anti-monotonic. Let X and Y be two itemsets. If X
⊂ Y, then TWU(X) ≥TWU(Y) [8].
Property 3 (pruning). Let X be an itemset. If TWU(X) < min_util, then the itemset X is a low-utility
itemset as well as all its supersets. Proof. This directly follows from Property 1 and Property 2.
The utility-list of an itemset X in a database D is a set of tuples such that there is a tuple (tid, iutil, rutil) for
each transaction Ttid containing X. The iutil element of a tuple is the utility of X in Ttid, i.e. u(X, Ttid). The rutil
element of a tuple is defined as ∑_{i ∈ Ttid ∧ i ≻ x, ∀x ∈ X} u(i, Ttid), i.e. the sum of the utilities of the items of
Ttid that appear after all the items of X according to the processing order ≻.
Example 7. The utility-list of {a} is {(T1, 5, 3), (T2, 10, 17), (T3, 5, 25)}. The utility-list of {e} is {(T2, 6, 5),
(T3, 3, 5), (T4, 3, 0), (T5, 3, 2)}. The utility-list of {a, e} is {(T2, 16, 5), (T3, 8, 5)}.
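The utility-list construction can be sketched as follows (again reusing the DB and PROFIT dictionaries of the earlier sketch). Example 7 is consistent with processing items in alphabetical order, so that ordering is assumed here; rutil sums the utilities of the items of the transaction that come after every item of X in that order. The function name utility_list is illustrative.

def utility_list(X):
    ul = []
    for tc, items in DB.items():
        if not set(X) <= set(items):
            continue                           # X does not occur in this transaction
        iutil = sum(PROFIT[i] * items[i] for i in X)
        last = max(X)                          # alphabetical processing order assumed
        rutil = sum(PROFIT[i] * q for i, q in items.items() if i > last)
        ul.append((tc, iutil, rutil))
    return ul

print(utility_list({'a'}))       # [('T1', 5, 3), ('T2', 10, 17), ('T3', 5, 25)]
print(utility_list({'a', 'e'}))  # [('T2', 16, 5), ('T3', 8, 5)], as in Example 7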
To obtain the high-utility itemsets, Miner performs a single database scan to create the utility-lists of
patterns containing single items. Then, larger patterns are obtained by joining the utility-lists of smaller
patterns. The search space is pruned using the two following properties.
Property 4 (sum of iutils). Let X be an itemset. If the sum of the iutil values in the utility-list of X is higher than or
equal to min_util, then X is a high-utility itemset. Otherwise, it is a low-utility itemset [7].
Property 5 (sum of iutils and rutils). Let X be an itemset and let the extensions of X be the itemsets that can be
obtained by appending an item y to X such that y ≻ i for all items i in X. If the sum of the iutil and rutil values in the
utility-list of X is less than min_util, then all extensions of X and their transitive extensions are low-utility itemsets
[7].
In the next section, we introduce our novel algorithm, which improves upon existing algorithms by
being able to eliminate low-utility itemsets without performing join operations.
Table 1. A transaction database (left) and external utility values (right)

Tid  Transaction (item, quantity)
T1   (a,1) (c,1) (d,1)
T2   (a,2) (c,6) (e,2) (g,5)
T3   (a,1) (b,2) (c,1) (d,6) (e,1) (f,5)
T4   (b,4) (c,3) (d,3) (e,1)
T5   (b,2) (c,2) (e,1) (g,2)

Item    a  b  c  d  e  f  g
Profit  5  2  1  2  3  1  1

Table 2. Transaction utilities (left), TWU values (center) and EUCS (right)

Tid  TU        Item  TWU
T1   8         a     65
T2   27        b     61
T3   30        c     96
T4   20        d     58
T5   11        e     88
               f     30
               g     38

EUCS (TWU of each pair of co-occurring items):
Item  a   b   c   d   e   f
b     30
c     65  61
d     38  50  58
e     57  61  88  50
f     30  30  30  30  30
g     27  11  38  0   38  0
IV. Algorithm
In this section, we present our proposal, the Miner algorithm. The main procedure (Algorithm 1) takes
as input a transaction database with utility values and the min_util threshold. The algorithm first scans the
database to calculate the TWU of each item. Then, the algorithm identifies the set I∗ of all items having a TWU
no less than min_util. The TWU values are used to define a total order ≻ on I∗, the order of ascending TWU
values. During the second database scan, items in transactions are reordered according to this total order, the
utility-list of each item i ∈ I∗ is built, and the novel structure named EUCS (Estimated Utility Co-Occurrence
Structure) is built. This structure is defined as a set of triples of the form (a, b, c) ∈ I∗ × I∗ × R, where a triple
(a, b, c) indicates that TWU({a, b}) = c.
The EUCS can be implemented as a triangular matrix, as shown in Table 2 (right), where only tuples
of the form (a, b, c) such that c ≠ 0 are kept. The EUCS is memory efficient because, in practice, only a few items
co-occur with other items. Building the EUCS is fast (it is performed with a single database scan) and it
occupies a small amount of memory, bounded by |I∗| × |I∗|, although in practice its size is much smaller because
a limited number of pairs of items co-occur in transactions. After the construction of the EUCS, the depth-first
search exploration of itemsets starts by calling the recursive procedure Search with the empty itemset ∅, the set
of single items I∗, min_util and the EUCS structure.
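As an illustration only (not the published implementation), this first phase — item TWUs, the promising items I∗, the ascending-TWU order and the EUCS of Table 2 (right) — could be computed along the following lines, reusing the DB and PROFIT dictionaries of the earlier sketch and assuming min_util = 30; all variable names are illustrative.

from collections import defaultdict
from itertools import combinations

min_util = 30
item_twu = defaultdict(int)
for tc, items in DB.items():
    t_u = sum(PROFIT[i] * q for i, q in items.items())    # TU(Tc)
    for i in items:
        item_twu[i] += t_u

I_star = {i for i, w in item_twu.items() if w >= min_util}
order = sorted(I_star, key=lambda i: item_twu[i])          # ascending-TWU total order

eucs = defaultdict(int)                                    # (a, b) -> TWU({a, b})
for tc, items in DB.items():
    t_u = sum(PROFIT[i] * q for i, q in items.items())
    kept = sorted(i for i in items if i in I_star)
    for a, b in combinations(kept, 2):
        eucs[(a, b)] += t_u

print(order)              # ['f', 'g', 'd', 'b', 'a', 'e', 'c'] for this data
print(eucs[('a', 'b')])   # 30, matching Table 2 (right)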
The Search procedure (Algorithm 2) takes as input (1) an itemset P, (2) the extensions of P, each having the form
Px, meaning that Px was previously obtained by appending an item x to P, (3) min_util and (4) the EUCS. The
Search procedure operates as follows. For each extension Px of P, the sum of the iutil values in its utility-list is
computed; if it is no less than min_util, then Px is a high-utility itemset and it is output. Then, if the sum of the
iutil and rutil values in the utility-list of Px is no less than min_util, the extensions of Px are explored. This is
performed by merging Px with each extension Py of P such that y ≻ x, to form extensions of the form Pxy
containing |Px| + 1 items. The utility-list of Pxy is then constructed by calling the Construct procedure (cf.
Algorithm 3) to join the utility-lists of P, Px and Py. Finally, a recursive call to the Search procedure with Pxy is
made to calculate its utility and explore its extensions.
The Search procedure is recursive: it starts from single items, repeatedly appends single items, and prunes the
search space based on Property 5. It can easily be seen from Properties 4 and 5 that this procedure is correct and
complete for discovering all high-utility itemsets.
The Construct procedure considers the revised transactions, where the items of each transaction are arranged
according to the ascending-TWU order. X/T denotes the set of all the items of T that appear after X, and ru(X/T)
is the remaining utility of itemset X in T, i.e. the sum of the utilities of all the items in X/T. No database scan is
needed here. First, the transactions shared by two single items are identified and their entries are combined to
form the two-item utility-lists. To construct the utility-list of a k-itemset, the utility-lists of two of its (k−1)-item
subsets (together with the utility-list of their common prefix) are combined.
Co-occurrence based Pruning. The main novelty in Miner is a pruning mechanism named EUCP
(Estimated Utility Co-occurrence Pruning), which uses the EUCS. EUCP avoids constructing the utility-list of a
low-utility extension Pxy and of all its transitive extensions. This is done on line 8 of the Search procedure. The
pruning condition is: do not explore Pxy and its supersets if there is a tuple (x, y, c) in the EUCS such that
c < min_util. This strategy is correct (it only prunes low-utility itemsets): by Property 3, if an itemset X contains
another itemset Y such that TWU(Y) < min_util, then X and its supersets are low-utility itemsets. Because the
Search procedure is recursive, all the other pairs of items in Pxy have already been checked in the previous
recursions leading to Pxy. For example, consider an itemset Z = {a1, a2, a3, a4}. To generate this itemset, the
Search procedure had to combine {a1, a2, a3} and {a1, a2, a4}, themselves obtained by combining {a1, a2} with
{a1, a3} and {a1, a2} with {a1, a4}, which were obtained by combining single items. It can easily be observed
that when generating Z, all pairs of items in Z have been checked by EUCP except {a3, a4}.
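For illustration, the EUCP check itself reduces to a single lookup; the small sketch below reuses the eucs dictionary built in the earlier sketch, and the helper name eucp_allows is hypothetical.

def eucp_allows(x, y, eucs, min_util):
    key = (x, y) if x < y else (y, x)     # the EUCS stores each unordered pair once
    return eucs.get(key, 0) >= min_util   # skip the join when the pair TWU is too low

print(eucp_allows('a', 'b', eucs, 30))    # True: TWU({a, b}) = 30
print(eucp_allows('d', 'g', eucs, 30))    # False: d and g never co-occur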
Algorithm 1: Miner Algorithm
input: D: a transaction database, min_util: a user-specified threshold
output: the set of high-utility itemsets
1 Scan D to calculate the TWU of single items;
2 I∗ ← each item i such that TWU(i) ≥ min_util;
3 Let ≻ be the total order of ascending TWU values on I∗;
4 Scan D to build the utility-list of each item i ∈ I∗ and build the EUCS structure;
5 Search (∅, I ∗, min_util, EUCS);
Algorithm 2: Search Algorithm
input: P: an itemset, ExtensionsOfP: a set of extensions of P, the min_util threshold, the EUCS
structure
output: the set of high-utility itemsets
1 foreach itemset Px ∈ ExtensionsOfP do
2 if SUM (Px.utilitylist.iutils) ≥ min_util then
3 output Px;
4 end
5 if SUM (Px.utilitylist.iutils) +SUM (Px.utilitylist.rutils) ≥ min_util then
6 ExtensionsOfPx ← ∅;
7 foreach itemset Py ∈ ExtensionsOfP such that y ≻ x do
8 if ∃(x, y, c) ∈ EUCS such that c ≥ min_util then
9 Pxy ← Px ∪ Py;
10 Pxy.utilitylist ← Construct (P, Px, Py);
11 ExtensionsOfPx ← ExtensionsOfPx ∪ Pxy;
12 end
13 end
14 Search (Px, ExtensionsOfPx, min_util, EUCS);
15 end
16 end
Algorithm 3: Construct Algorithm
input: P: an itemset, Px: the extension of P with an item x, Py: the extension of P with an item y
output: the utility-list of Pxy
1 UtilityListOfPxy ← ∅;
2 foreach tuple ex ∈ Px.utilitylist do
3 if ∃ey ∈ Py.utilitylist such that ey.tid = ex.tid then
4 if P.utilitylist ≠ ∅ then
5 Find the element e ∈ P.utilitylist such that e.tid = ex.tid;
6 exy ← (ex.tid, ex.iutil + ey.iutil − e.iutil, ey.rutil);
7 end
8 else
9 exy ← (ex.tid, ex.iutil + ey.iutil, ey.rutil);
10 end
11 UtilityListOfPxy ← UtilityListOfPxy ∪ {exy};
12 end
13 end
14 return UtilityListOfPxy;
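The join of Algorithm 3 can be written compactly in Python. The sketch below is an interpretation of the pseudocode rather than the authors' code; the names construct and utility_list (the latter from the earlier sketch) are illustrative, and the utility-list of the prefix P is used to avoid counting the utility of P twice.

def construct(ul_p, ul_px, ul_py):
    by_tid_py = {e[0]: e for e in ul_py}
    by_tid_p = {e[0]: e for e in ul_p}
    ul_pxy = []
    for tid, iutil_x, _ in ul_px:
        if tid not in by_tid_py:
            continue                          # Pxy does not occur in this transaction
        _, iutil_y, rutil_y = by_tid_py[tid]
        if ul_p:                              # subtract the prefix utility once (line 6)
            iutil = iutil_x + iutil_y - by_tid_p[tid][1]
        else:                                 # empty prefix: simply add the utilities (line 9)
            iutil = iutil_x + iutil_y
        ul_pxy.append((tid, iutil, rutil_y))
    return ul_pxy

# Joining the single-item lists of {a} and {e} yields the list of {a, e} from Example 7:
print(construct([], utility_list({'a'}), utility_list({'e'})))
# [('T2', 16, 5), ('T3', 8, 5)]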
V. Experimental study
We performed experiments to assess the performance of the proposed algorithm. Experiments were
performed on a computer with a 64-bit Core i5 processor running Windows 7 and 5 GB of free RAM. We
compared the performance of Miner with the state-of-the-art algorithm UP-Growth for high-utility itemset
mining. All memory measurements were done using the Java API. Experiments were carried out on a real-life
dataset with varied characteristics. The dataset contains 1,112,949 transactions with 46,086 distinct items and an
average transaction length of 7.26 items. External utilities for items were generated between 1 and 1,000 using
a log-normal distribution, and quantities of items were generated randomly between 1 and 5, following the
settings of [2, 7, 10].
Execution time: We first ran the Miner and UP-Growth algorithms on the dataset while decreasing the
min_util threshold until the algorithms took too long to execute, ran out of memory, or a clear winner was
observed. We recorded the execution time, the percentage of candidates pruned by the Miner algorithm and the
total size of the EUCS. The comparison of execution time against the minimum utility threshold is shown in
Fig. 1. Miner was up to six times faster than UP-Growth.
Fig. 1. Execution time against the minimum utility threshold
Pruning effectiveness: These results show that candidate pruning can be very effective, pruning up to
95% of the candidates. As expected, when more pruning was done, the performance gap between Miner and
UP-Growth became larger.
Fig. 2. Number of candidate itemsets against the minimum utility threshold
Memory overhead: We also studied the memory overhead of using the EUCS structure. We found that
the memory footprint of the EUCS was about six times smaller than that of UP-Growth. We therefore conclude
that the cost of using the EUCP strategy in terms of memory is low.
Fig. 3. Memory consumption against the minimum utility threshold
VI. Conclusion
In this paper, we have presented a novel algorithm for high-utility itemset mining named Miner. This
algorithm integrates a novel strategy named EUCP (Estimated Utility Co-occurrence Pruning) to reduce the
number of join operations performed when mining high-utility itemsets with the utility-list data structure. We
have performed an extensive experimental study on real-life datasets to compare the performance of Miner with
the state-of-the-art algorithm UP-Growth. Results show that the pruning strategy reduces the search space by up
to 95% and that Miner is up to eight times faster than UP-Growth.
Acknowledgements
The authors wish to thank the Management, the Principal and Head of the Department (CSE) of ICET
for the support and help in completing the work.
References
[1]. R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proc. Int. Conf. Very Large
Databases, 1994, pp. 487–499.
[2]. V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu, “UP-Growth: An efficient algorithm for high utility itemset mining,” in Proc.
ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2010, pp. 253–262.
[3]. Y. Liu, W. Liao, and A. Choudhary, “A fast high utility itemsets mining algorithm,” in Proc. Utility-Based Data Mining
Workshop,2005, pp. 90–99.
[4]. Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu,Fellow, IEEE, “Efficient Algorithms for Mining High Utility
Itemsets from Transactional Databases”, IEEE Transactions On Knowledge And Data Engineering, Vol. 25, No. 8, August 2013.
[5]. C. Lucchese, S. Orlando, and R. Perego, “Fast and memory efficient mining of frequent closed itemsets,” IEEE Trans. Knowl. Data
Eng., vol. 18, no. 1, pp. 21–36, Jan. 2006.
[6]. K. Chuang, J. Huang, and M. Chen, “Mining top-k frequent patterns in the presence of the memory constraint,” VLDB J., vol. 17,
pp. 1321–1344, 2008.
[7]. R. Chan, Q. Yang, and Y. Shen, “Mining high utility itemsets,” in Proc. IEEE Int. Conf. Data Min., 2003, pp. 19–26.
[8]. A. Erwin, R. P. Gopalan, and N. R. Achuthan, “Efficient mining of high utility itemsets from large datasets,” in Proc.
Pacific-Asia Conf. Knowl. Discovery Data Mining, 2008, pp. 554–561.
[9]. K. Gouda and M. J. Zaki, “Efficiently mining maximal frequent itemsets,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 163–
170.
[10]. M. Liu and J. Qu, “Mining high utility itemsets without candidate generation,” in Proc. 21st ACM Int. Conf. Information and
Knowledge Management (CIKM '12), 2012, pp. 55–64.
[11]. C. W. Wu, P. Fournier-Viger, P. S. Yu, and V. S. Tseng, “Efficient mining of a concise and lossless representation of high utility
itemsets,” in Proc. IEEE Int. Conf. Data Mining, 2011, pp. 824–833.
[12]. C. F. Ahmed, S. K. Tanbeer, B.-S. Jeong and Y.-K. Lee, “Efficient tree structures for high utility pattern mining in incremental
databases,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 12, pp. 1708– 1721, Dec. 2009.