This document discusses an optimized approach to deriving disjunctive positive and negative association rules using a genetic algorithm. It aims to address shortcomings of conventional algorithms such as Apriori by supporting disjunctive rules, using multiple minimum support thresholds, and effectively identifying negative rules. The proposed approach combines a modified FP-Growth algorithm with a genetic algorithm to generate conjunctive and disjunctive, positive and negative rules in an optimized manner, reducing candidate-generation time and capturing useful relationships among rare items.
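As a rough, hypothetical illustration of how a genetic algorithm can be layered on top of rule mining (not the exact procedure proposed in this document), the sketch below encodes a candidate rule as a pair of disjoint item sets, scores it by support and confidence over a toy transaction list, and evolves a small population. All names, parameters, and data here are invented for the example.

```python
import random

# Toy transaction database; items are strings. Purely illustrative.
TRANSACTIONS = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter", "jam"},
]
ITEMS = sorted({i for t in TRANSACTIONS for i in t})

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    if not itemset:
        return 0.0
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def fitness(rule):
    """Score a rule (antecedent, consequent) by support * confidence."""
    antecedent, consequent = rule
    if not antecedent or not consequent or antecedent & consequent:
        return 0.0
    sup_both = support(antecedent | consequent)
    sup_ante = support(antecedent)
    confidence = sup_both / sup_ante if sup_ante else 0.0
    return sup_both * confidence

def random_rule():
    """Build a random rule with disjoint antecedent and consequent."""
    k = random.randint(2, min(3, len(ITEMS)))
    chosen = random.sample(ITEMS, k)
    cut = random.randint(1, k - 1)
    return (set(chosen[:cut]), set(chosen[cut:]))

def mutate(rule):
    """Swap one random item of the rule for a random unused item."""
    antecedent, consequent = set(rule[0]), set(rule[1])
    side = antecedent if random.random() < 0.5 else consequent
    unused = [i for i in ITEMS if i not in antecedent | consequent]
    if side and unused:
        side.remove(random.choice(sorted(side)))
        side.add(random.choice(unused))
    return (antecedent, consequent)

def evolve(generations=30, pop_size=20):
    population = [random_rule() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]              # truncation selection
        children = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("best rule:", best[0], "=>", best[1], "fitness:", round(fitness(best), 3))
```

In a fuller implementation the fitness function would also reward negative or disjunctive structure and rare-item coverage; the sketch only shows the encode-evaluate-evolve loop.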
A model for profit pattern mining based on genetic algorithm (eSAT Journals)
Abstract
Mining profit-oriented patterns is a novel association rule mining technique in data mining that focuses on issues important to business. It is well known that every business aims to generate profit and to find ways to improve it. In earlier days, association rule mining was used for market basket analysis and targeted only some business and commercial aspects. Researchers then turned to the most prominent element of any business, namely profit, and devised innovative ways to generate association rules based on it. The profit-oriented pattern mining approach combines statistics-based pattern mining with value-based decision making to generate the patterns with maximum profit and to produce recommendations for future strategy. Traditional association rule mining alone is not sufficient to achieve this goal, so the strength of the genetic algorithm is combined with association rule mining to enhance its capability. The study shows that the genetic algorithm improves the effectiveness and efficiency of the association rule mining outcome, since genetic algorithms can handle uncertain, multi-dimensional, non-differentiable, non-continuous, non-parametric, non-linear, constrained, and multi-objective optimization problems. In this paper we apply the concept of profit pattern mining with a genetic algorithm to generate profit-oriented patterns that help with future business expansion and fulfill the business objective.
Keywords: Data Mining, Association Rule Mining, Profit Pattern Mining, Genetic Algorithm
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith... (ijsrd.com)
Data mining can be defined as the process of uncovering hidden, potentially useful patterns in data. The discovery of interesting association relationships among large amounts of business transactions is currently vital for making appropriate business decisions. Association rule analysis is the task of discovering association rules that occur frequently in a given transaction data set; its task is to find certain relationships among a set of items (itemsets) in the database. It has two measurements: support and confidence values. The confidence value is a measure of a rule's strength, while the support value corresponds to statistical significance. A variety of algorithms currently exist to discover association rules. Some of these algorithms depend on the use of minimum support to weed out uninteresting rules; others look for highly correlated items, that is, rules with high confidence. Traditional association rule mining techniques employ predefined support and confidence values. However, specifying the minimum support value of the mined rules in advance often leads to either too many or too few rules, which negatively impacts the performance of the overall system. This work proposes a way to efficiently mine association rules over dynamic databases using the Dynamic Matrix Apriori technique and Multiple Support Apriori (MSApriori). A modification of the Matrix Apriori algorithm is proposed to accommodate this approach. Experiments on a large set of databases have been conducted to validate the proposed framework. The results show a remarkable improvement in the overall performance of the system in terms of run time, the number of generated rules, and the number of frequent items used.
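To make the support and confidence measures described above concrete, here is a small illustrative snippet; the transactions and the rule are invented for the example.

```python
# Toy transaction data set; each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """support(X) = fraction of transactions that contain all of X."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """confidence(X => Y) = support(X union Y) / support(X)."""
    return support(antecedent | consequent) / support(antecedent)

# Rule {bread} => {milk}: support 3/5 = 0.6, confidence 3/4 = 0.75.
print(support({"bread", "milk"}))          # 0.6
print(confidence({"bread"}, {"milk"}))     # 0.75
```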
This document summarizes an article that proposes a new algorithm for efficiently mining both positive and negative association rules from transactional databases. The algorithm first constructs a frequent pattern tree (FP-tree) to store the transaction information. It then uses an FP-growth approach to iteratively find frequent patterns and generate the positive and negative association rules without candidate generation. The algorithm aims to overcome limitations of previous methods and efficiently find all valid comparative association rules.
IRJET - Mining Frequent Patterns, Associations and Correlations (IRJET Journal)
This document discusses mining frequent patterns, associations, and correlations from data. It begins by defining frequent patterns as patterns that occur often in a dataset. It then discusses market basket analysis and how it is used to find associations between frequently purchased items. The document outlines key concepts for mining patterns including support, confidence, and association rules. It also discusses different types of patterns that can be mined such as closed, maximal and approximate patterns. Finally, it provides an overview of the different methodologies used for pattern mining and applications.
A literature review of modern association rule mining techniques (ijctet)
This document discusses association rule mining techniques for extracting useful patterns from large datasets. It provides background on association rule mining and defines key concepts like support, confidence and frequent itemsets. The document then reviews several classic association rule mining algorithms like AIS, Apriori and FP-Growth. It explains that these algorithms aim to improve quality and efficiency by reducing database scans, generating fewer candidate itemsets and using pruning techniques.
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation (cscpconf)
Association rule mining has been an area of active research in the field of knowledge discovery. Data mining researchers have improved the quality of association rule mining for business development by incorporating influential factors such as value (utility), quantity of items sold (weight), and more into the mining of association patterns. In this paper, we propose an efficient approach that finds maximal frequent itemsets first. Most algorithms in the literature find minimal frequent itemsets first and then derive the maximal frequent itemsets from them, which consumes more time. To overcome this problem, we propose a novel approach that finds maximal frequent itemsets directly using the concept of subsets. The proposed method is found to be efficient in finding maximal frequent itemsets.
This document provides a literature review on data mining with Oracle 10g using clustering and classification algorithms. It discusses the data mining process and common algorithms used, including Naive Bayes, decision trees, k-means clustering, and neural networks. The review categorizes data mining techniques into supervised learning (classification, prediction) and unsupervised learning (clustering, association rule mining). It also outlines the typical 4-step data mining process of problem definition, data preparation, model building and evaluation, and knowledge deployment.
The document summarizes a survey of multi-objective evolutionary algorithms for association rule mining. It discusses how association rule mining seeks to find relationships between items in large datasets. It can be viewed as a multi-objective optimization problem, with objectives like support, confidence, and interestingness that need to be optimized. The document reviews several multi-objective evolutionary algorithms that have been applied to association rule mining, focusing on encoding techniques, objective functions used, evolutionary operators, and methods for selecting the final solution set.
Market Basket Analysis Revisited using SQL Pattern Matching (Shankar Somayajula)
This document provides an overview of market basket analysis and pattern matching. It discusses typical use cases for pattern matching in various industries. It then covers the basics of traditional market basket analysis (MBA) including common metrics like support, confidence and lift. The document proposes ways to revisit MBA, including augmenting transaction data with tags and analyzing rules using additional metrics and sequential patterns. It outlines considerations for the design of an MBA system and the typical architecture/process flow.
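As a quick reminder of how the lift metric mentioned above relates to support and confidence, the following tiny snippet computes it on invented numbers.

```python
# Illustrative values only: lift(X => Y) = confidence(X => Y) / support(Y).
support_x, support_y, support_xy = 0.40, 0.30, 0.18

confidence = support_xy / support_x   # 0.45
lift = confidence / support_y         # 1.5: X and Y co-occur 1.5x more often
                                      # than expected if they were independent
print(confidence, lift)
```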
Classification on multi label dataset using rule mining technique (eSAT Publishing House)
One of the most important problems in modern finance is finding efficient ways to summarize and visualize stock market data so as to give individuals or institutions useful information about market behavior for investment decisions. Investment can be considered one of the fundamental pillars of a national economy, so at present many investors look for criteria to compare stocks and select the best ones, and they choose strategies that maximize the earnings of the investment process. The enormous amount of valuable data generated by the stock market has therefore attracted researchers to explore this problem domain using different methodologies, and research in data mining has gained high attention due to the importance of its applications and the increasing generation of information. Data mining tools such as association rules, rule induction methods, and the Apriori algorithm are used to find associations between different scripts of the stock market, and much research and development has addressed the reasons for fluctuations in the Indian stock exchange. Nowadays two important factors, gold prices and US dollar prices, dominate the Indian stock market; statistical correlation is used to find the correlation between gold prices, dollar prices, and the BSE index, which helps the activities of stock operators, brokers, investors, and jobbers. These are based on forecasting the fluctuation of index share prices, gold prices, dollar prices, and customer transactions. Hence the researcher has considered these problems as a topic for research.
Data Mining For Supermarket Sale Analysis Using Association Rule (ijtsrd)
Data mining is a novel technology for discovering important information from data repositories and is widely used in almost all fields. Recently, mining of databases has become essential because of the growing amount of data and its wide applicability in retail industries for improving marketing strategies. Analysis of past transaction data can provide very valuable information on customer behavior and business decisions. The amount of data stored grows twice as fast as the speed of the fastest processor available to analyze it. The main purpose is to find the association relationships among the large number of database items, and these are used to describe the patterns of customer purchases in the supermarket. This is presented in this paper. Rajeshri Shelke, "Data Mining For Supermarket Sale Analysis Using Association Rule", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1, Issue-4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd94.pdf http://www.ijtsrd.com/engineering/computer-engineering/94/data-mining-for-supermarket-sale-analysis-using-association-rule/rajeshri-shelke
Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules (IJECEIAES)
Association rule mining plays an important role in the discovery of knowledge and information. It discovers a huge number of rules for any dataset at different support and confidence values, and many of them are redundant, especially in the case of multi-level datasets. Mining non-redundant association rules from multi-level datasets is a major concern in the field of data mining. In this paper, we present a definition of redundancy and a concise representation called the Reliable Exact basis for representing non-redundant association rules from multi-level datasets. The resulting non-redundant association rules are a lossless representation for any dataset.
Currently, the number of tuples in an enterprise database is increasing significantly. The associations among attributes in tuples are sometimes essential for the higher management of an organization to make plans or decisions for the future. The quantitative attributes in tuples must be split into two or more intervals. Because of the over- and under-estimation problem near interval boundaries in classical logic, fuzzy logic has been used to form the intervals for quantitative attributes; these fuzzy intervals form the basis for generating more realistic associations. This paper focuses on deriving association rules among the quantitative attributes and categorical attributes of a database by employing fuzzy logic and the Frequent Pattern (FP)-Growth algorithm. The effectiveness of the method has been demonstrated on a sample database.
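As a hypothetical illustration of the fuzzy-interval idea for a quantitative attribute (not the paper's actual membership functions), the snippet below maps an "age" value onto overlapping triangular fuzzy sets, so a value near an interval boundary belongs partly to both intervals instead of being forced into one.

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy intervals for a quantitative attribute "age".
age_sets = {
    "young":       lambda x: triangular(x, 0, 20, 40),
    "middle-aged": lambda x: triangular(x, 30, 45, 60),
    "old":         lambda x: triangular(x, 50, 70, 100),
}

age = 35   # near the boundary: partly "young", partly "middle-aged"
memberships = {name: round(f(age), 2) for name, f in age_sets.items()}
print(memberships)   # {'young': 0.25, 'middle-aged': 0.33, 'old': 0.0}
```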
Opinion mining framework using proposed RB-bayes model for text classification (IJECEIAES)
Data mining is a powerful concept with great potential to predict future patterns and behavior. It refers to the extraction of hidden information from large data sets using techniques such as statistical analysis, machine learning, clustering, neural networks, and genetic algorithms. Naive Bayes suffers from the problem of zero likelihood. This paper proposes the RB-Bayes method, based on Bayes' theorem, for prediction, removing the zero-likelihood problem. We also compare our method with a few existing methods, namely naive Bayes and SVM. We demonstrate that this technique outperforms some current techniques and, in particular, can analyze data sets more effectively. When the proposed approach is tested on real data sets, the results show improved accuracy in most cases. The RB-Bayes calculation achieves a precision of 83.333.
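The zero-likelihood problem mentioned above arises when a feature value never co-occurs with a class in training, making the whole product of probabilities collapse to zero. The snippet below shows the issue and the standard Laplace (add-one) smoothing workaround; this only illustrates the problem and is not the paper's RB-Bayes method.

```python
from collections import Counter

# Tiny invented training set: (word, class) observations for one feature.
observations = [("cheap", "spam"), ("cheap", "spam"), ("meeting", "ham"),
                ("meeting", "ham"), ("cheap", "ham")]
counts = Counter(observations)
class_totals = Counter(cls for _, cls in observations)
vocabulary = {"cheap", "meeting", "prize"}   # "prize" never seen in training

def likelihood(word, cls, alpha=0.0):
    """P(word | class), optionally with add-alpha (Laplace) smoothing."""
    return (counts[(word, cls)] + alpha) / (class_totals[cls] + alpha * len(vocabulary))

print(likelihood("prize", "spam"))                         # 0.0 -> zero-likelihood problem
print(round(likelihood("prize", "spam", alpha=1.0), 3))    # 0.2 after smoothing
```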
This document summarizes a study analyzing sales data at PT. Panca Putra Solusindo using the Apriori and FP-Growth algorithms. The study aimed to determine the most frequently purchased electronic products to help develop marketing strategies. Transaction data from 2016 was analyzed using the Apriori and FP-Growth algorithms in Weka software. The results showed the top-selling items, with PCs and HP notebooks having the highest support, over 46% and 30% respectively. The algorithms helped identify best-selling products to optimize promotions.
Efficient Utility Based Infrequent Weighted Item-Set Mining (IJTET Journal)
Abstract: Association Rule Mining (ARM) is one of the most popular data mining techniques. Most past work is based on frequent itemsets. In recent years, researchers have focused on infrequent itemset mining, the problem of discovering itemsets whose frequency in the data is less than or equal to a maximum threshold. This paper addresses the mining of infrequent itemsets. To do so, the IWI-support measure is defined as a weighted frequency of occurrence of an itemset in the analyzed data. Infrequent weighted itemset mining discovers itemsets from transactional databases using only item occurrence frequency, without considering item utility. In many real-world situations, however, the utility of itemsets from the user's perspective, such as cost, profit, or revenue, is of significant importance. Our proposed system, High Utility based Infrequent Weighted Itemset mining (HUIWIM), finds high-utility infrequent weighted itemsets based on minimum threshold values and user preferences. The proposed system efficiently and effectively mines high-utility infrequent weighted itemsets from databases and improves performance compared to the existing system.
The document discusses the Apriori algorithm for frequent itemset mining. It explains that the Apriori algorithm uses an iterative approach consisting of join and prune steps to discover frequent itemsets that occur together above a minimum support threshold. The algorithm first finds all frequent 1-itemsets, then generates and prunes longer candidate itemsets in each iteration until no further frequent itemsets are found.
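A minimal, self-contained sketch of the join-and-prune iteration described above (illustrative only, not tuned for large data):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets whose support count meets min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemsets):
        return {s: sum(s <= t for t in transactions) for s in itemsets}

    # Frequent 1-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {s: c for s, c in count(items).items() if c >= min_support}
    result, k = dict(frequent), 2

    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        keys = list(frequent)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {s: c for s, c in count(candidates).items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, cnt in sorted(apriori(data, min_support=3).items(),
                           key=lambda x: (len(x[0]), sorted(x[0]))):
    print(set(itemset), cnt)
```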
Statistics And Probability Tutorial | Statistics And Probability for Data Sci... (Edureka!)
Here are the steps to calculate the standard deviation of the numbers:
1) Find the mean (average) of the numbers: (9 + 2 + 5 + ... + 10 + 9 + 6 + 9 + 4) / 20 = 7
2) For each number, subtract the mean and square the result:
(9 - 7)² = 4
(2 - 7)² = 49
...
(4 - 7)² = 9
3) Sum all the squared differences: 4 + 49 + ... + 9 = S
4) Divide the sum by the number of values minus 1: S / (20 - 1)
5) Take the square root. This is the standard deviation.
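The same five steps can be checked quickly in code. The list below is an illustrative set of 20 numbers with mean 7; the slide's own values are partly elided above, so this is not necessarily the same data.

```python
import statistics

# Illustrative data only (20 values, mean 7); the slide's full list is not shown above.
numbers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

mean = sum(numbers) / len(numbers)                   # step 1
squared_diffs = [(x - mean) ** 2 for x in numbers]   # step 2
s = sum(squared_diffs)                               # step 3
variance = s / (len(numbers) - 1)                    # step 4 (sample variance)
std_dev = variance ** 0.5                            # step 5

print(round(std_dev, 4), round(statistics.stdev(numbers), 4))  # both give the same value
```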
ML practitioners and advocates are increasingly finding themselves becoming gatekeepers of the modern world. The models you create have the power to get people arrested or vindicated, get loans approved or rejected, determine what interest rate should be charged for such loans, who is shown to you in your long list of pursuits on Tinder, what news you read, and who gets called for a job phone screen or even a college admission... the list goes on. My goal in this talk is to summarize the kinds of disparate outcomes that are caused by cargo cult machine learning, and recent academic efforts to address some of them.
A novel association rule mining and clustering based hybrid method for music ... (eSAT Publishing House)
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-item-associations-methodology-and-a-case-study-in-apparel-retailing/
Association mining is the conventional data mining technique for analyzing market basket data, and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time, and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.
The potential role of AI in the minimisation and mitigation of project delay (Pieter Rautenbach)
Artificial intelligence (AI) can have wide-reaching applications within the construction industry; however, the actual application of this set of technologies is currently under-exploited. This paper considers the role that the application of AI can take in optimising the efficiency of project execution, and how this can potentially reduce project duration and minimise and mitigate delay on projects.
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK (IJDKP)
A retail company's data may be geographically spread across different locations because of the huge amount of data and the rapid growth in transactions, yet for decision making knowledge workers need integrated data from all sites. The main challenge is therefore to obtain generalized patterns or knowledge from transactional data spread across various locations. Transporting data from those locations to a server site increases the cost of data transport, and finding patterns in the huge data on the server increases time and space complexity. Multi-database mining therefore plays a vital role in extracting knowledge from different data sources. The proposed technique finds patterns at the various sites and, instead of transporting the data, only the patterns from the various locations are transported to the server to produce the final deliverable patterns. The technique uses a ranking algorithm to rank the items based on their profit, date of expiry, and stock available at each location. Association rule mining (ARM) is then used to extract patterns based on the ranking of items. Finally, all the patterns discovered at the various locations are merged using a pattern merger algorithm. The proposed algorithm has been implemented, and experimental results are reported both for classical association rule mining on the integrated data and for the datasets at the various sources. Finally, all patterns are combined to discover actionable patterns using the pattern merger algorithm given in section
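As a hypothetical sketch of the local ranking idea described above (the exact weights and scoring are not specified here), items at one site could be scored on profit, days to expiry, and stock, and only the per-site ranks then shipped to the server instead of the raw transactions.

```python
from datetime import date

# Invented per-site inventory records: (item, unit_profit, expiry_date, stock).
site_inventory = [
    ("milk",   0.40, date(2024, 6, 10),  80),
    ("bread",  0.25, date(2024, 6, 5),  120),
    ("cheese", 1.10, date(2024, 7, 1),   30),
]
today = date(2024, 6, 1)

def score(item, profit, expiry, stock, w_profit=0.5, w_expiry=0.3, w_stock=0.2):
    """Weighted score: high profit, short shelf life, and high stock rank first.
    The weights are arbitrary placeholders for this illustration."""
    days_left = max((expiry - today).days, 1)
    return w_profit * profit + w_expiry * (1.0 / days_left) + w_stock * (stock / 100.0)

ranked = sorted(site_inventory, key=lambda r: score(*r), reverse=True)
for rank, record in enumerate(ranked, start=1):
    print(rank, record[0])   # only these ranks, not the transactions, go to the server
```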
Team 7 presented their findings on anomaly detection in credit card transactions. They tested several techniques including random forests with SMOTE oversampling, one class SVMs, and threshold tuning to optimize AUC. Their best model predicted 98% of fraud transactions while maintaining high precision and recall. They demonstrated their model on a live credit card fraud detection system and for analyzing single transactions.
Detecting the High Level Similarities in Software Implementation Process Usin... (IOSR Journals)
1) The document discusses detecting higher-level structural clones in software, beyond just simple code clones. Structural clones show larger patterns of similarity than simple clones alone.
2) It proposes a technique called Clone Miner to detect structural clones using data mining. Clone Miner formulates the structural clone concept and applies data mining to detect these similarities.
3) Detecting structural clones can help with tasks like software maintenance, reengineering for reuse, and understanding the overall design of a system.
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases (IOSR Journals)
The document presents an algorithm for sanitizing a database to hide sensitive patterns while minimizing changes to the original data. It identifies sensitive transactions containing restrictive patterns to be hidden and sorts them by degree and size. It then selects items with the maximum cover across restrictive patterns and removes them from sensitive transactions, reducing support for the patterns. This process iterates until restrictive pattern support is reduced to 0. The sanitized database combines modified sensitive transactions with unmodified non-sensitive transactions. The algorithm is tested on sample databases to evaluate effectiveness with minimal impact on the original data.
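A simplified sketch of the greedy idea described above, assuming a single restrictive pattern and invented data; the algorithm in the paper additionally sorts sensitive transactions by degree and size and handles multiple patterns at once.

```python
# Invented transaction database and one restrictive (sensitive) pattern.
database = [
    {"bread", "milk", "beer"},
    {"bread", "beer"},
    {"milk", "beer"},
    {"bread", "milk"},
]
restrictive = {"bread", "beer"}        # pattern whose support must drop to 0

def support_count(pattern, db):
    return sum(pattern <= t for t in db)

# Greedy sanitization: while the pattern is still supported, pick a victim item
# from the pattern (here, the one covering the most transactions) and delete it
# from one sensitive transaction.
sanitized = [set(t) for t in database]
while support_count(restrictive, sanitized) > 0:
    sensitive = next(t for t in sanitized if restrictive <= t)
    victim = max(restrictive, key=lambda i: sum(i in t for t in sanitized))
    sensitive.discard(victim)          # only the sensitive transaction is modified

print(sanitized)                        # the restrictive pattern is no longer supported
```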
Video Streaming Compression for Wireless Multimedia Sensor Networks (IOSR Journals)
This document discusses video streaming compression for wireless multimedia sensor networks. It proposes a cross-layer system that jointly controls the video encoding rate, transmission rate, and uses an adaptive parity scheme. At the application layer, video is compressed and divided into packets. These packets are encoded at the transport layer and forwarded through the network layer. An active buffer management scheme and adaptive parity check are used to maximize received video quality over lossy wireless links. Simulation results show the proposed techniques can achieve higher throughput by resequencing dropped packets. The goal is to design an efficient system for wireless transmission of compressed video that optimizes video quality.
Performance of Web Services on Smart Phone Platforms (IOSR Journals)
This document discusses and compares the performance of Web Services on smart phone platforms using SOAP and REST. It begins with an introduction to Web Services and the problems with using SOAP on mobile devices due to its limitations in processing power, bandwidth usage, and flexibility. It then proposes using RESTful Web Services as an alternative as they avoid XML parsing and are based on the lightweight HTTP protocol. The document analyzes the performance of SOAP versus REST Web Services on a mobile device to determine which is more efficient for smart phones.
Automatic Learning Image Objects via Incremental Model (IOSR Journals)
This document presents a novel approach for automatically collecting and learning object category datasets and models in an incremental manner by leveraging web image resources. The proposed framework uses object recognition techniques to iteratively accumulate model knowledge and image examples, mimicking human learning. An incremental learning algorithm is used to automatically collect larger object category datasets than existing datasets like Caltech 101. As new images are classified and added in each iteration, the object model becomes more robust, leading to an even larger and more accurate collected dataset. Experiments show the approach is effective at collecting superior image datasets compared to existing resources.
Simulation And Hardware Analysis Of Three Phase PWM Rectifier With Power Fact... (IOSR Journals)
This document summarizes a research paper on simulating and analyzing a three-phase PWM rectifier with power factor correction. The paper describes the design of a three-phase PWM rectifier circuit to convert AC power input into DC power output at unity power factor. Simulation results show that the rectifier controls input currents to be sinusoidal and in phase with voltages, improving power quality. Hardware testing also demonstrates unity power factor with voltage and current waveforms in phase. The rectifier design and simulation aim to improve power quality by controlling reactive power and achieving unity power factor.
Emotion Recognition Based On Audio Speech (IOSR Journals)
This document summarizes a research paper on emotion recognition based on audio speech. It discusses how acoustic features are extracted from speech signals by applying preprocessing techniques like preemphasis and framing. It describes extracting features like Mel frequency cepstral coefficients (MFCCs) that capture characteristics of the vocal tract. Support vector machines (SVMs) are used as pattern classification methods to build models for each emotion and compare test speech features to recognize emotions. The paper confirms the advantage of its audio-based emotion recognition approach through experimental results and discusses potential improvements and future work on increasing efficiency and recognizing emotion intensity.
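To make the preprocessing steps mentioned above concrete, here is a small NumPy-only sketch of pre-emphasis and framing on a synthetic signal; the 0.97 coefficient, frame length, and hop size are typical illustrative values, and the MFCC and SVM stages are not shown.

```python
import numpy as np

# Synthetic 1-second "speech" signal at 16 kHz, purely for illustration.
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.05 * np.random.randn(sr)

# Pre-emphasis: boost high frequencies, y[n] = x[n] - 0.97 * x[n-1].
pre = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

# Framing: 25 ms windows with a 10 ms hop (400 and 160 samples at 16 kHz).
frame_len, hop = 400, 160
n_frames = 1 + (len(pre) - frame_len) // hop
frames = np.stack([pre[i * hop: i * hop + frame_len] for i in range(n_frames)])
frames *= np.hamming(frame_len)   # taper each frame before the FFT/MFCC stage

print(frames.shape)               # (n_frames, 400)
```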
A Soft-Switching DC/DC Converter with High Voltage Gain (IOSR Journals)
This document summarizes a proposed soft-switching DC/DC converter with high voltage gain. The proposed converter uses zero-voltage switching (ZVS) and zero current switching (ZCS) techniques to reduce switching losses and improve efficiency. It provides a continuous input current and high voltage gain using a coupled inductor cell. Experimental results on a 200W prototype show the converter achieves soft switching and provides a 24V to 360V conversion with high efficiency.
Survey of Real Time Scheduling Algorithms (IOSR Journals)
This document summarizes and reviews real-time scheduling algorithms. It discusses both static and dynamic scheduling algorithms for real-time tasks on uniprocessors and multiprocessors. The document reviews algorithms such as Earliest Deadline First, Least Laxity First, and Modified Instantaneous Utilization Factor Scheduling. It concludes that Modified Instantaneous Utilization Factor Scheduling provides better results than previous algorithms in terms of context switching, response time, and CPU utilization for uniprocessor scheduling.
Design of a micro device to electrically separate and count WBC from whole blood (IOSR Journals)
This document describes a micro device designed to electrically separate and count white blood cells (WBCs) from whole blood. The device consists of four main parts: 1) A micro mixer that dilutes the whole blood sample with a diluent. 2) A size-dependent electrical cell separator that uses dielectrophoresis to separate WBCs from other blood cells based on their size. 3) A micro nozzle that increases the flow velocity of cells. 4) A cell counting unit that uses a Coulter counter approach to count the number of separated WBCs by detecting changes in impedance as cells pass between electrodes. The design of each unit was simulated using software to verify it could perform its intended function. The overall
Face Recognition System under Varying Lighting ConditionsIOSR Journals
The document discusses a face recognition system that is robust to varying lighting conditions. It proposes combining illumination normalization preprocessing, local texture-based face representations like Local Binary Patterns (LBP) and Local Ternary Patterns (LTP), and distance transform-based matching. The preprocessing aims to eliminate lighting variations while preserving important appearance details, and local patterns make the representations less sensitive to noise. Experiments show the method outperforms other preprocessors across datasets and lighting conditions, providing an 88.1% verification rate at a 0.1% false acceptance rate. The system is implemented in MATLAB.
A Greybox Hospital Information System in the Medical Center Tobruk Libya base...IOSR Journals
This document presents a study on developing a greybox hospital information system for the Medical Center in Tobruk, Libya based on the Three-layer Graph-based Model (3LGM). The study aims to model the current information system and propose improvements using 3LGM. It describes modeling the main functions, logical and physical layers, use cases, and databases for patient, doctor, and clinical documentation data. Tables compare 3LGM to other models. Figures illustrate the domain layers, tools layers, use cases, and database tables. The conclusion is that all tasks were successfully completed to develop and implement an information system model to support management of patient, doctor, and clinical data using 3LGM.
Inflammatory Conditions Mimicking Tumours In Calabar: A 30 Year Study (1978-2...IOSR Journals
1. The document discusses different image fusion techniques, specifically wavelet transform and curvelet transform based fusion.
2. Wavelet transform is commonly used for image fusion due to its simplicity and ability to preserve time-frequency details. Curvelet transform is better for fusing images with curved edges.
3. The paper compares fusion results of medical images like MR and CT using wavelet and curvelet transforms, finding that curvelet transform provides superior results in metrics like entropy and peak signal-to-noise ratio.
This document discusses effective cryptography standards for conducting business and personal communications over the Internet. It begins by noting that while organizations are working to secure themselves from growing cyber threats, data breaches continue to occur frequently. The document then reviews the problem and proposes several possible solutions. It discusses different cryptography techniques from the past like private key algorithms, public key algorithms, and hash functions. It also covers more modern techniques like quantum cryptography and elliptic curve cryptography. The document concludes that with continued development of proven cryptography technologies, we may be able to better protect secrecy in digital communications.
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...IOSR Journals
This document presents two improved biological sequence compression algorithms that utilize a lookup table (LUT) and identification of tandem repeats in sequences. The first algorithm maps all possible 3-character combinations to ASCII characters using a 125-entry LUT. The second maps all possible 4-character combinations to ASCII characters using a 256-entry LUT. These algorithms aim to achieve high compression factors, saving percentages, and faster compression/decompression times compared to previous biological sequence compression methods.
Achieving Privacy in Publishing Search logsIOSR Journals
The document discusses algorithms for publishing search logs while preserving user privacy. It analyzes a search log using an algorithm that produces three types of outputs: query counts, a query-action graph showing query-result click counts, and a query-reformulation graph showing query suggestions clicked. The algorithm adds noise to query counts before publishing to achieve differential privacy. It aims to provide useful aggregated information for applications like search improvement while preventing re-identification of individual user data in the search log.
This document summarizes a research paper that proposes using an improved harmony search algorithm for automatic clustering. The improved harmony search algorithm uses variable parameters like pitch adjustment rate and bandwidth to evolve candidate cluster centers and determine the appropriate number of clusters. The algorithm aims to minimize within-cluster variance and maximize between-cluster variance. It was tested on several datasets and found to be able to accurately determine the number of clusters and locations of cluster centers.
Performance Evaluation of Routing Protocols for MAC Layer ModelsIOSR Journals
This document evaluates the performance of several routing protocols (Bellman-Ford, AODV, DSR, LAR1, WRP, FSR, ZRP) for different MAC layer models (802.11, CSMA, TSMA) using the GloMoSim simulator. Key metrics analyzed are packet delivery ratio, end-to-end delay, and throughput. The results show that for 802.11, AODV and LAR1 have the highest packet delivery ratios and throughput, while WRP and FSR have the lowest delays. For CSMA and TSMA, ZRP demonstrates the best overall performance due to its ability to use both proactive and reactive routing.
Public Verifiability in Cloud Computing Using Signcryption Based on Elliptic ...IOSR Journals
This document proposes a scheme for public verifiability in cloud computing using signcryption based on elliptic curves. The scheme aims to ensure the correctness of user's data stored on the cloud without demanding the user's time, feasibility, or resources. It utilizes erasure-correcting codes, homomorphic tokens, and distributed verification to guarantee data dependability and allow detection and localization of data errors on misbehaving servers. Signcryption and unsigncryption based on elliptic curves are used to enforce public verifiability, improving on a previous cloud storage model.
Multicast Dual Arti-Q System in Vehicular Adhoc NetworksIOSR Journals
This document discusses a dual Arti-Q system for efficient call taxi management in vehicular ad hoc networks (VANETs). The existing Artigence system uses an Arti-Q algorithm with two components: Arti-Q main and Arti-Q proxy. The dual Arti-Q system proposes distributing the functionalities of scheduling and message transmission between separate servers to process requests in parallel and reduce response times. By separating these functions, the dual Arti-Q system aims to improve efficiency over the existing Artigence approach for managing taxi reservations and communications in VANETs.
‘Erules’ [3] is an integrated algorithm used to mine a data warehouse and extract useful, reliable rule sets effectively. It generates positive and negative, conjunctive and disjunctive rules with the help of a genetic algorithm and modified FP-Growth and Apriori algorithms. It is an integrated algorithm for useful and effective association rule mining that captures even useful rare items, and the lift factor is used to analyse the strength of the derived rules. However, redundant rules remained a major challenge that was not addressed. This paper deals concisely with the elimination of redundant rule sets through appropriate modifications to the existing algorithm, so that positive and negative rule sets are generated only for non-redundant rules at lower cost. In addition, a voluminous pharmacy data set has been taken and the effectiveness and performance of ‘Erules’ measured on it.
Introduction To Multilevel Association Rule And Its MethodsIJSRD
Association rule mining is a popular and well researched method for discovering interesting relations between variables in large databases. In this paper we introduce the concepts of data mining, association rules and multilevel association rules with their different algorithms and advantages, along with the concepts of fuzzy logic and genetic algorithms. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
The document proposes an algorithm called MSApriori_VDB for efficiently mining rare association rules from transactional databases. It first converts the transaction database to a vertical data format to reduce the number of scans. It then uses a multiple minimum support framework where each item is assigned a minimum item support based on its frequency. The algorithm generates candidate itemsets, calculates their support, and prunes uninteresting itemsets to identify interesting rare associations with high confidence. Experimental results show the algorithm outperforms previous approaches in memory usage and runtime.
The document summarizes research on improving the Apriori algorithm for association rule mining. It first provides background on association rule mining and the standard Apriori algorithm. It then discusses several proposed improvements to Apriori, including reducing the number of database scans, shrinking the candidate itemset size, and using techniques like pruning and hash trees. Finally, it outlines some open challenges for further optimizing association rule mining.
This document summarizes research on improving the Apriori algorithm for mining association rules from transactional databases. It first provides background on association rule mining and describes the basic Apriori algorithm. The Apriori algorithm finds frequent itemsets by multiple passes over the database but has limitations of increased search space and computational costs as the database size increases. The document then reviews research on variations of the Apriori algorithm that aim to reduce the number of database scans, shrink the candidate sets, and facilitate support counting to improve performance.
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUEcscpconf
Data mining is the process of extracting hidden patterns from data. Association rule mining is an important data mining task that finds interesting associations among a large set of data items. It may disclose patterns and various kinds of sensitive information, which may need to be protected against unauthorized access. Association rule hiding is one of the techniques of privacy preserving data mining used to protect the rules generated by association rule mining. This paper adopts a data distortion technique for hiding sensitive association rules. Algorithms based on this technique either hide a specific rule using data alteration or hide rules depending on the sensitivity of the items to be hidden. In the proposed technique, the positions of sensitive items are altered while maintaining their support. The proposed technique uses the idea of representative rules to prune the rules first and then hides the sensitive rules.
A threshold free implication rule miningAjay Desai
This is a solution paper on logic-based pattern discovery; it addresses the space and time complexity problems encountered in the coherent rule mining approach through a pruning technique.
This document summarizes a research paper that proposes using a genetic algorithm to generate high-quality association rules for measuring data quality. The genetic algorithm evaluates rules based on four metrics: confidence, completeness, comprehensibility, and interestingness. It aims to discover high-level prediction rules that perform better than traditional greedy rule induction algorithms at handling attribute interactions. The genetic algorithm represents rules as chromosomes and uses the four metrics as an objective fitness function to evaluate the quality of each rule.
This document discusses data mining techniques, including the data mining process and common techniques like association rule mining. It describes the data mining process as involving data gathering, preparation, mining the data using algorithms, and analyzing and interpreting the results. Association rule mining is explained in detail, including how it can be used to identify relationships between frequently purchased products. Methods for mining multilevel and multidimensional association rules are also summarized.
A Survey on Fuzzy Association Rule Mining MethodologiesIOSR Journals
Abstract: Fuzzy association rule mining (Fuzzy ARM) uses fuzzy logic to generate interesting association rules. These association relationships can help in decision making for the solution of a given problem. Fuzzy ARM is a variant of classical association rule mining, which relies on the concept of crisp sets and for that reason has several drawbacks; the concept of fuzzy association rule mining was introduced to overcome them. Today a huge number of different fuzzy association rule mining algorithms exist in the literature and they keep improving, but as the problem domain also becomes more complex, research work is still ongoing. In this paper, we study several well-known methodologies and algorithms for fuzzy association rule mining. Four important methodologies are briefly discussed, showing the recent trends and future scope of research in this field.
Keywords: Knowledge discovery in databases, Data mining, Fuzzy association rule mining, Classical association rule mining, Very large datasets, Minimum support, Cardinality, Certainty factor, Redundant rule, Equivalence, Equivalent rules
This document summarizes an article that proposes an improved association rule mining method using correlation techniques. It integrates positive and negative rule mining concepts with frequent pattern mining to generate unique rules for classification without ambiguity. The method first constructs an FP-tree to efficiently find frequent itemsets. It then uses correlations between itemsets and class labels to determine whether rules belong to the positive or negative rule set. Classification of new data is performed by applying the matching rule subsets. The proposed method aims to address issues with traditional association rule and frequent pattern mining approaches for large datasets.
This document discusses and compares different measures of interestingness that can be used to evaluate association rules generated by data mining algorithms. It first reviews twelve common interestingness measures, including support, confidence, lift, conviction, and p-measure. It then applies these measures to association rules generated from a sample dataset about demographics and hobbies in the western region. The results are summarized in a table comparing the interestingness values calculated by each measure for different rules. The document aims to help users understand and select the most appropriate interestingness measure depending on their application needs.
An Ontological Approach for Mining Association Rules from Transactional DatasetIJERA Editor
This document discusses using an ontology relational weights measure (ORWM) approach to mine interesting infrequent item sets from transactional datasets. It introduces three algorithms: 1) Infrequent Weighted Item Set Miner, which mines infrequent item sets using a FP-growth-like approach; 2) Minimal Infrequent Weighted Item Set Miner, which avoids extracting non-minimal item sets; and 3) ORWM, which integrates user knowledge, prunes rules to minimize the number generated, and uses a weighted support measure to discover interesting patterns. The ORWM represents items as a directed graph and applies the HITS model to rank items and discover high authority/hub infrequent item sets of interest to users.
This document describes a Multilevel Relationship Algorithm (MRA) for improving association rule mining. MRA works in three stages: 1) It uses an Apriori algorithm to find level 1 associations between items within individual shops. 2) It uses the level 1 associations to find frequent itemsets across shops. 3) It uses Bayesian probability to determine dependencies between items across shops and generate learning rules. The algorithm aims to discover relationships between sales data from different shops to gain insights for business decisions.
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASEIJwest
Clustering is categorizing data into groups of similar objects, and data mining adds to the complexity of clustering a large dataset with various features. Among such datasets are those of electronic business stores which offer their products through the web. These stores require recommendation systems that can offer the user products he or she is likely to need. In this study, previous purchases of users are used to present a sorted list of products to the user; identifying associations related to users and finding centers increases the precision of the recommended list. Configuring associations and creating a profile for users is important in current studies. In the proposed method, association rules are used to model user interactions on the web, using the time a page is visited and the frequency of visits to weight pages and describe users' interest in page groups. The weight of each transaction item therefore describes the user's interest in that item. Analysis of the results shows that the proposed method presents a more complete model of user behavior because it combines the weight and membership degree of pages simultaneously when ranking candidate pages. The method obtains higher accuracy compared to other methods, even for a larger number of pages.
The document discusses hiding sensitive association rules in data mining. It proposes an approach that alters the position of sensitive items in transactions while keeping their overall support unchanged. This is unlike existing approaches that increase or decrease the size of the database or support of sensitive items. The key advantages are hiding more rules with fewer alterations while preserving the support of sensitive items and database size. An algorithm is presented and example demonstrations are provided. The performance is compared to prior approaches.
An Optimal Approach to derive Disjunctive Positive and Negative Rules from Association Rule Mining using Genetic Algorithm
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 13, Issue 1 (Jul.-Aug. 2013), PP 73-82
www.iosrjournals.org
An Optimal Approach to derive Disjunctive Positive and Negative Rules from Association Rule Mining using Genetic Algorithm
Kannika Nirai Vaani. M (Training Division, TechMahindra Ltd, India)
E. Ramaraj (Dept of Computer Science and Engineering, Alagappa University, India)
Abstract: Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. Association rule mining is one of the most important data mining techniques; it aims to induce associations among sets of items in transaction databases or other data repositories. Various algorithms have been developed and customized to derive effective rules that improve business decisions. Among them, the Apriori and FP-Growth algorithms play a vital role in finding frequent itemsets and subsequently deriving rule sets subject to business constraints. These conventional algorithms, however, have several shortfalls: i) candidate itemset generation consumes a lot of time on large datasets; ii) they mainly support conjunctive association rules; iii) a single minimum support threshold is not sufficient to generate effective rules; iv) support and confidence alone do not suffice to validate the generated rules; and v) negative rules are not addressed effectively. Points i) to iv) were addressed in earlier works [10][13], but identifying and deriving negative rules remains a challenge. The proposed work is an extended version of our earlier work [13]; it focuses on how negative rules can be derived effectively with the help of logical rule sets, which was not addressed in the earlier work. The earlier work is taken as the reference, and appropriate modifications and additions are applied wherever applicable. Using this approach, conjunctive and disjunctive, positive and negative rules can be generated effectively in an optimized manner.
Keywords - Logical rule set, FP-Growth Algorithm, Genetic Algorithm, Lift ratio, Multiple Minimum Support, Disjunctive Rules
I. INTRODUCTION
Data mining is the task of extracting useful, meaningful information from a data warehouse. It is the source of implicit, valid, potentially useful and important knowledge hidden in large volumes of raw data [8]. The extracted knowledge must be not only accurate but also readable, comprehensible and easy to understand. Association rules are basically used to find useful patterns and relations between items in a database of transactions [9]. Association rule mining generally aims to find all rules that satisfy user-specified minimum support and minimum confidence constraints [3]. The factor that makes association rule mining practical and useful is the minimum support, which limits the number of rules generated. However, using only a single minsup implicitly assumes that all items in the data are of the same nature and/or have similar frequencies in the database. This is often not the case in real-life applications: some items appear very frequently in the data, while others rarely appear. If the item frequencies vary a great deal, we encounter two problems:
1. If minsup is set too high, we will not find rules that involve infrequent or rare items in the data.
2. To find rules that involve both frequent and rare items, minsup has to be set very low. However, this may produce too many rules.
When one common minimum support is fixed for all items, rules that occur infrequently but contribute significantly to profit may be lost without notice. For example, in supermarket transaction data, to find rules involving infrequently purchased items such as food processors and cooking pans (which generate more profit per item), a very low minimum support has to be set; but then unwanted and genuinely rare items will not be pruned. Hence, fixing a separate minimum support for each item becomes significant.
Multiple Minimum Supports to handle rare items:
In many data mining applications [4], some items appear very frequently in the data, while others rarely appear. If minsup is set too high, rules that involve rare items will not be found; to find rules that involve both frequent and rare items, minsup has to be set very low. This may cause a combinatorial explosion, because the frequent items will be associated with one another in all possible ways. The disadvantage of a single support threshold is the rare item problem: items that occur very infrequently in the data set are pruned although they
would still produce interesting and potentially valuable rules. The rare item problem is particularly important for transaction data, which usually has a very uneven distribution of support across the individual items.
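To make the idea of item-specific thresholds concrete, the short sketch below assigns a minimum item support (MIS) to every item using the convention commonly adopted in multiple-minimum-support mining, MIS(i) = max(β · sup(i), LS). The transactions, the β value and the lower bound LS are illustrative assumptions; they are not parameters prescribed by this paper.

```python
from collections import Counter

def item_supports(transactions):
    """Fraction of transactions containing each item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}

def assign_mis(supports, beta=0.5, ls=0.05):
    """MIS(i) = max(beta * sup(i), LS): frequent items get a higher
    threshold while rare items are still allowed to form rules."""
    return {item: max(beta * sup, ls) for item, sup in supports.items()}

if __name__ == "__main__":
    # Toy supermarket-style transactions (illustrative only).
    transactions = [
        {"bread", "milk"}, {"bread", "milk", "pan"},
        {"bread"}, {"milk"}, {"bread", "milk", "processor"},
    ]
    sup = item_supports(transactions)
    print(assign_mis(sup))  # rare 'pan'/'processor' keep the low floor LS
```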
Algorithms for Association Rule Mining:
Apriori algorithm: In the earlier work [10], the Apriori algorithm was modified to include multiple minimum supports and disjunctive as well as conjunctive rules, and a genetic algorithm was used to generate useful rules effectively. The major shortfall of the modified Apriori was the time taken to generate frequent itemsets. In general, an Apriori-like algorithm may still suffer from two nontrivial costs: it is expensive to handle a huge number of candidate sets [12], and it is tedious to repeatedly scan the database and check a large set of candidates by pattern matching, which is especially true when mining long patterns. If the generation of a huge candidate set can be avoided, mining performance can be substantially improved. Hence the need for an algorithm that takes considerably less time was realized, which led to the modified FP-Growth algorithm.
FP-Growth algorithm: The FP-Growth approach is based on a divide-and-conquer strategy for producing frequent itemsets [11]. FP-Growth is mainly used for mining frequent itemsets without candidate generation.
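For readers who wish to reproduce frequent-itemset mining without candidate generation, the following sketch uses the open-source mlxtend implementation of FP-Growth; the transactions and the support threshold are illustrative assumptions, and this is not the modified FP-Growth algorithm proposed in this work.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Illustrative transactions (not the paper's dataset).
transactions = [
    ["A", "B", "D", "G"], ["B", "D", "E"], ["A", "B", "C", "E", "F"],
    ["B", "D", "E", "G"], ["A", "B", "C", "E", "F"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Mine frequent itemsets directly from the FP-tree, no candidate generation.
frequent = fpgrowth(onehot, min_support=0.4, use_colnames=True)
print(frequent.sort_values("support", ascending=False))
```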
Lift Ratio
A high confidence value suggests a strong association rule. However, this can be deceptive: if the antecedent and/or the consequent have high support, confidence can be high even when they are independent. A better measure of the strength of an association rule is to compare the confidence of the rule with a benchmark value obtained by assuming that the occurrence of the consequent itemset in a transaction is independent of the occurrence of the antecedent. This benchmark can be computed from the frequency counts of the frequent itemsets, which makes it possible to compute the lift ratio of a rule. The lift ratio is the confidence of the rule divided by the confidence expected under independence of the consequent from the antecedent. A lift ratio greater than 1.0 suggests that the rule has some usefulness; the larger the lift ratio, the greater the strength of the association. With the lift value, the importance of a rule can be validated in an effective manner.
Formula List:
Confidence and Lift factors are calculated as below:
Confidence(X ⟹ Y) = P(Y | X) = Support(X ∪ Y) / Support(X)
Lift(X ⟹ Y) = Confidence(X ⟹ Y) / Support(Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
These factors can be used while validating any rule set.
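The two measures defined above can be computed directly from transaction counts. The helper below is a minimal sketch with illustrative data: a lift greater than 1 indicates that the consequent occurs with the antecedent more often than independence would suggest.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of antecedent => consequent."""
    n = len(transactions)
    ant = sum(1 for t in transactions if antecedent <= t)
    con = sum(1 for t in transactions if consequent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = both / n
    confidence = both / ant if ant else 0.0
    lift = confidence / (con / n) if con else 0.0
    return support, confidence, lift

if __name__ == "__main__":
    data = [{"E", "B", "A"}, {"E", "B", "D"}, {"E", "A", "D", "C"},
            {"E", "B", "A", "F"}, {"B", "A", "D", "G"}]
    print(rule_metrics(data, {"E", "B"}, {"A"}))  # support, confidence, lift
```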
Disjunctive rules:
Association rule mining mostly deals with conjunctive rules. Using disjunctive rules, however, extensive rule sets that are very useful in mining the dataset can be found effectively; at times disjunctive rule sets are also preferred for inferring interesting rules. For example, the ideal rule for the query "If classes B or C or D are committed, what is the chance that A is also committed?" would be B OR C OR D ⟹ A.
Let X = {i1, i2, ..., in} be a set of items. XC = {i1 AND i2 AND ... AND in}, or simply {i1 i2 ... in}, is a conjunctive term of X, and XD = {i1 OR i2 OR ... OR in} is a disjunctive term of X. A possible rule is of one of the following types, involving conjunctive and disjunctive terms from the set of items X ∪ Y, with X ∩ Y = Φ [2][5]. The table below contains the set of possible positive rule types built from conjunctive and disjunctive terms.
Table I: Rule set of Conjunction and Disjunction for Positive Rules
Type        Support             Confidence
XC ⟹ YC    s = P(XC ∪ YC)      c = P(YC | XC)
XC ⟹ YD    s = P(XC ∪ YD)      c = P(YD | XC)
XD ⟹ YC    s = P(XD ∪ YC)      c = P(YC | XD)
XD ⟹ YD    s = P(XD ∪ YD)      c = P(YD | XD)
Rule sets List 1 (a worked evaluation sketch follows the list):
1. XYZ ⟹ VW (type XC ⟹ YC)
2. XYZ ⟹ V OR W (type XC ⟹ YD)
3. X OR Y OR Z ⟹ VW (type XD ⟹ YC)
4. X OR Y OR Z ⟹ V OR W (type XD ⟹ YD)
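Table I and Rule sets List 1 can be evaluated mechanically: a conjunctive term is satisfied when a transaction contains all of its items, while a disjunctive term is satisfied when it contains at least one of them. The sketch below, with illustrative transactions and rules, computes support and confidence for any of the four rule types.

```python
def satisfies(transaction, items, mode):
    """A conjunctive term needs all items, a disjunctive term needs any."""
    hits = [i in transaction for i in items]
    return all(hits) if mode == "AND" else any(hits)

def disjunctive_rule_metrics(transactions, lhs, lhs_mode, rhs, rhs_mode):
    """Support and confidence for rules of the forms listed in Table I."""
    n = len(transactions)
    lhs_count = sum(satisfies(t, lhs, lhs_mode) for t in transactions)
    both = sum(satisfies(t, lhs, lhs_mode) and satisfies(t, rhs, rhs_mode)
               for t in transactions)
    support = both / n
    confidence = both / lhs_count if lhs_count else 0.0
    return support, confidence

if __name__ == "__main__":
    data = [{"X", "Y", "Z", "V"}, {"X", "V", "W"}, {"Y", "Z", "W"},
            {"X", "Y", "Z", "V", "W"}, {"Z"}]
    # Rule 4 from the list: X OR Y OR Z => V OR W  (type XD => YD)
    print(disjunctive_rule_metrics(data, ["X", "Y", "Z"], "OR",
                                   ["V", "W"], "OR"))
```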
The number of rules found by the association mining function can be reduced by using rule filters when the number of frequent itemsets is high. Rule filters are a powerful way to limit the number of rules to be generated or the content of the rules; this parameter determines the maximum number of items that can occur in an association rule. In the example, the frequent itemset (E, A, C) has been used for analysis; the maximum rule length is 3, and at most 2 antecedent items and 2 consequent items are allowed.
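Such a filter is straightforward to apply after rule generation; the snippet below (with illustrative rule tuples) keeps only rules with at most 3 items in total, at most 2 antecedent items and at most 2 consequent items, mirroring the limits quoted above.

```python
def keep_rule(antecedent, consequent, max_len=3, max_ant=2, max_con=2):
    """Rule filter: bound total length, antecedent size and consequent size."""
    return (len(antecedent) + len(consequent) <= max_len
            and len(antecedent) <= max_ant
            and len(consequent) <= max_con)

rules = [({"E", "A"}, {"C"}), ({"E"}, {"A", "C"}), ({"E", "A", "C"}, {"B"})]
print([r for r in rules if keep_rule(*r)])  # the third rule is filtered out
```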
Disjunctive rules with Multiple Minimum Supports using the Modified FP-Growth Algorithm (E-Rules):
'E-Rules' [13] is an algorithm that derives positive conjunctive and disjunctive rules with the help of a genetic algorithm, for which the FP-Growth approach was modified accordingly. It is an integrated algorithm for useful and effective association rule mining that captures even useful rare items, generates conjunctive and disjunctive rule sets using a genetic algorithm, and uses the lift factor to analyse the strength of the derived rules. The previous work focused predominantly on the time taken to generate frequent itemsets, and a drastic reduction in this time was observed because the modified FP-Growth algorithm was used in place of the Apriori algorithm. The salient features of our earlier work were: i) the Apriori algorithm was replaced with the modified FP-Growth algorithm to reduce the time spent on candidate itemset generation; ii) a genetic algorithm is used to generate the rule sets; iii) an integrated approach combines multiple minimum supports, conjunctive and disjunctive rule generation, and the lift factor to validate the result set.
A. Summary of earlier work:
General framework of E-Rules:
The overall design framework of our earlier work can be understood easily from the diagram below. As the data set containing various transactions is given as input to E-Rules, it generates the frequent item sets and all possible LHS and RHS rule sets out of them using the modified FP-Growth algorithm. All possible positive rule sets (disjunctive and conjunctive) are generated as per the proposed rule guides [5]. For the rules generated, confidence and Lift factors are calculated to validate the strength of the rules. The output is considered as the Useful Rules.
Example: An example is given to demonstrate 'E-Rules'. Table II shows a dataset that contains 10 transactions and 7 items, together with the items of each transaction ordered by their frequency.
Table II :Dataset for 10 transactions:
TID Items Ordered Items
1 ABDG BADG
2 BDE EBD
3 ABCEF EBACF
4 BDEG EBDG
5 ABCEF EBACF
6 BEG EBG
7 ACDE EADC
8 ABE EBA
9 ABEF EBAF
10 ACDE EADC
[Framework diagram of E-Rules: Transaction Data Set → Modified FP-Growth Algorithm → Frequent Item Set → Disjunctive and Conjunctive Association Rules → Validate Strong Rules → Useful Rules]
The modified FP-Growth algorithm has been adopted for generating the frequent item sets in order to reduce the time taken to generate them. The major steps in FP-Growth are:
Step 1: It first compresses the database, representing the frequent items in an FP-tree. The FP-tree is built using two passes over the dataset.
Step 2: It divides the FP-tree into a set of conditional databases and mines each database separately, thus extracting the frequent item sets from the FP-tree directly. The FP-tree (frequent pattern tree) for the example dataset is shown below.
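The two-pass construction in Steps 1 and 2 can be sketched as follows. This is a minimal illustration (the class and function names are ours), assuming a single minimum support count of 3 for the Table II data, whereas E-Rules actually works with multiple minimum supports.

from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support_count):
    # Pass 1: count item frequencies and keep only the frequent items.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_support_count}

    root = FPNode(None, None)
    header = defaultdict(list)      # header table: item -> its node links

    # Pass 2: insert every transaction with its items ordered by frequency.
    for t in transactions:
        ordered = sorted((i for i in t if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, freq

transactions = ["ABDG", "BDE", "ABCEF", "BDEG", "ABCEF",
                "BEG", "ACDE", "ABE", "ABEF", "ACDE"]        # Table II
root, header, freq = build_fp_tree(transactions, 3)
print(sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])))
# [('E', 9), ('B', 8), ('A', 7), ('D', 5), ('C', 4), ('F', 3), ('G', 3)]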
FP-tree for the dataset in Table II:
Figure 1: FP-tree generation
Table III: Conditional pattern bases and frequent item generation:
Item | Conditional Path | Conditional Pattern Base | Conditional FP-tree
G | {(E:9,B:7),(E:9,B:7,D:2),(B:1,A:1,D:1)} | {(E:1,B:1),(E:1,B:1,D:1),(B:1,A:1,D:1)} | {(E:2,B:3)}|G
F | {(E:9,B:7,A:4),(E:9,B:7,A:4,C:2)} | {(E:1,B:1,A:1),(E:2,B:2,A:2,C:2)} | {(E:3,B:3,A:3)}|F
C | {(E:9,A:2,D:2),(E:9,B:7,A:4)} | {(E:2,A:2,D:2),(E:2,B:2,A:2)} | {(E:4,A:4)}|C
D | {(E:9,A:2),(E:9,B:7),(B:1,A:1)} | {(E:2,A:2),(E:2,B:2),(B:1,A:1)} | {(E:4,A:3,B:3)}|D
A | {(E:9),(E:9,B:7),(B:1)} | {(E:2),(E:4,B:4),(B:1)} | {(E:6,B:5)}|A
B | {(E:9),(ø)} | {(E:7),(ø:1)} | {(E:7)}|B
E | ø | ø | ø
From the FP-tree construction process, it is seen that exactly two scans of the transaction database are needed. Let us see how the conditional FP-tree is constructed for node C. For node C, it derives a frequent pattern (E:4, A:4) from two paths in the FP-tree, (E:9, A:2, D:2) and (E:9, B:7, A:4). The first path indicates that the string (E, A, D and C) appears twice in the database; to study which strings appear together with C, only C's prefix path {E:2, A:2, D:2} counts. Similarly, the second path indicates that the string (E, B, A and C) appears twice in the set of transactions in the database, so C's other prefix path is {E:2, B:2, A:2}. These two prefix paths of C form C's sub-pattern base, which is called C's conditional pattern base (i.e., the sub-pattern base under the condition of C's existence). Construction of an FP-tree on this conditional pattern base (which is called C's conditional FP-tree) leads to the branch (E:4, A:4). Hence the frequent patterns are generated as per Algorithm 2; they are {E, A, C, (E,A), (E,D), (E,C), (E,A,D), (E,A,C), (E,A,D,C)}, and the search for frequent patterns associated with C terminates. The frequent itemset (E, A, C) has been taken for demonstration purposes. The possible association rules can be generated as per the table below (a sketch expanding these combinations into candidate rules follows the table); the first column denotes whether the row is an Antecedent or a Consequent: 1 - Antecedent, 0 - Consequent.
[Figure 1: FP-tree for the dataset in Table II, rooted at NULL. Node counts in the tree: E:9, B:7, B:1, A:4, A:2, A:1, D:2, D:2, D:1, C:2, C:2, F:2, F:1, G:1, G:1, G:1. The item header table (head of node links) lists E:9, B:8, A:7, D:5, C:4, F:3, G:3.]
Table IV : Encoded Frequent Item sets for Antecedent and Consequents:
Antecedent/Consequent A C E
1 1 0 0
1 0 1 0
1 0 0 1
1 1 1 0
1 0 1 1
1 1 0 1
0 1 0 0
0 0 1 0
0 0 0 1
0 1 1 0
0 0 1 1
0 1 0 1
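The encoded combinations above can be expanded directly into candidate rules. The sketch below (the function name is ours) enumerates every antecedent/consequent split of a frequent item set and, for any side with two or more items, both its conjunctive ('&') and disjunctive ('+') forms; for (E, A, C) this reproduces the twelve candidate rules listed later in Table VI.

from itertools import combinations

def candidate_rules(itemset):
    # Split the frequent item set into antecedent/consequent pairs; any side
    # with two or more items is emitted both as a conjunction ('&') and as a
    # disjunction ('+'), mirroring Rule sets List 1.
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):
        for ante in combinations(items, r):
            cons = tuple(i for i in items if i not in ante)
            ante_forms = sorted({"&".join(ante), "+".join(ante)})
            cons_forms = sorted({"&".join(cons), "+".join(cons)})
            for a in ante_forms:
                for c in cons_forms:
                    rules.append(a + " => " + c)
    return rules

rules = candidate_rules(["E", "A", "C"])
print(len(rules))   # 12 candidate rules for the frequent item set (E, A, C)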
1) Bitwise items storage:
In order to read and scan the item sets for the calculations, each item is encoded with respect to the transactions as a bit pattern: 1 denotes the presence of the item in a transaction and 0 its absence. The bit patterns for the items are given below (a short encoding sketch follows the table).
Table V: Encoding pattern for each item:
Item | Transaction IDs | Bit Pattern | Count
A | {TID1, TID3, TID5, TID7, TID8, TID9, TID10} | 1010101111 | 7
B | {TID1, TID2, TID3, TID4, TID5, TID6, TID8, TID9} | 1111110110 | 8
C | {TID3, TID5, TID7, TID10} | 0010101001 | 4
D | {TID1, TID2, TID4, TID7, TID10} | 1101001001 | 5
E | {TID2, TID3, TID4, TID5, TID6, TID7, TID8, TID9, TID10} | 0111111111 | 9
F | {TID3, TID5, TID9} | 0010100010 | 3
G | {TID1, TID4, TID6} | 1001010000 | 3
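A minimal sketch of this encoding (the function name is ours) over the transactions of Table II reproduces the bit patterns and counts of Table V.

def bit_pattern(item, transactions):
    # 1 if the item occurs in the transaction, 0 otherwise, one bit per TID.
    return "".join("1" if item in t else "0" for t in transactions)

transactions = ["ABDG", "BDE", "ABCEF", "BDEG", "ABCEF",
                "BEG", "ACDE", "ABE", "ABEF", "ACDE"]        # Table II
for item in "ABCDEFG":
    pattern = bit_pattern(item, transactions)
    print(item, pattern, pattern.count("1"))
# A 1010101111 7, B 1111110110 8, C 0010101001 4, ... as in Table V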
Using the above encoding, the Genetic Algorithm derives the possible positive rules for the frequent item sets. They can be generated as per Rule sets List 1 and Table I: "If A is bought then C and E are also bought": A ⟹ CE; "If A is bought then C or E is also bought": A ⟹ C OR E. Similarly the possible rules below can be derived.
Table VI: Set of positive conjunctive and disjunctive rules:
Antecedent ⟹ Consequent | Confidence | Lift
E ⟹ AC 0.444444444 1.111111
C+E ⟹ A 0.666666667 0.952381
A ⟹ CE 0.571428571 1.428571
AE ⟹ C 0.666666667 1.666667
A ⟹ C+E 0.857142857 0.952381
E ⟹ A+C 0.666666667 0.952381
A+E ⟹ C 0.4 1
A+C ⟹ E 0.857142857 0.952381
C ⟹ AE 1 1.666667
CE ⟹ A 1 1.428571
AC ⟹ E 1 1.111111
C ⟹ A+E 1 1
In the above table the confidence and relative lift factors are calculated using the Formula List given earlier. Only conjunctive and disjunctive positive rules appear in the table; hence the scope for introducing negative rules is realized. As a final step the ultimately useful rules can be identified using their confidence and lift factors. The table below shows the final rules for the frequent item set (E, C and A). The predefined minimum confidence was 0.75, hence the rules whose confidence is below 0.75 are pruned by 'E-Rules'. The result set is as follows:
Table VII: Pruned rules:
Antecedent ⟹ Consequent | Confidence | Lift
A ⟹ C+E 0.857142857 0.952381
A+C ⟹ E 0.857142857 0.952381
C ⟹ AE 1 1.666667
CE ⟹ A 1 1.428571
AC ⟹ E 1 1.111111
C ⟹ A+E 1 1
Now the reliability and usefulness of the rules are validated. Compared with confidence, the Lift of a rule is a relative measure in the sense that it compares the degree of dependence between the consequent items and the antecedent items against their independence; rules with higher Lift exhibit stronger dependence. A lift ratio greater than 1.0 suggests that there is some usefulness to the rule, so the useful rules were finalized by checking the lift value (> 1) as well. 'E-Rules' generates conjunctive and disjunctive positive rules effectively, but negative rules are not addressed.
II. THE PROPOSED APPROACH: NEGATIVE ASSOCIATION RULES
In most association rule mining algorithms, result sets are derived for positive rules, and the useful rules are decided based on factors such as a predefined confidence, Lift, etc. In the previous work, Table VII shows such useful rules, which have confidence greater than 0.75 and Lift of almost 1 or more. These rules focus only on the item sets, or combinations of item sets, that are present in the dataset; the absence of these item sets is not dealt with. Observing the absence of items from a transaction record produces a more complete result [14]. Hence, to mine negative association rules, the information on all rule sets, regardless of their confidence/Lift, needs to be maintained so that a negative rule set can be produced for each positive rule set. Otherwise, the relationships involving the absence of these item sets cannot be discovered. If an infrequent rule set is discarded, then it is not possible to further discover negative association rules that involve the absence of this rule set, for example 𝑋⇒¬𝑌 and ¬𝑋⇒¬𝑌. Consequently, the opportunity to report a large number of negative association rules involving the absence of item sets will be lost. Thus, avoidance of infrequent patterns is a major hindrance in discovering negative association rules. The loss of negative association rules involving the same item sets used by the positive association rules can be analyzed as follows: for two item sets 𝑋 and 𝑌 used by a positive association rule 𝑋⇒𝑌, there can exist three other negative association rules 𝑋⇒¬𝑌, ¬𝑋⇒𝑌 and ¬𝑋⇒¬𝑌.
A. Negative rule set list for conjunctive and disjunctive rules:
Assume that the union of item sets X and Y is present in the data set. The corresponding association rule can be represented as XC ⇒ YC, where C denotes the conjunctive nature of the antecedent and consequent. The support for these item sets can be expressed using probability, 𝑃(XC ∪ YC). The support of not observing the same item sets in the same transaction records, also known as the absence of the item sets, is given by 𝑃(¬(XC ∪ YC)) = 1 − 𝑃(XC ∪ YC). As noted above, for the two item sets 𝑋 and 𝑌 used by a positive association rule 𝑋⇒𝑌 there can exist three other negative association rules 𝑋⇒¬𝑌, ¬𝑋⇒𝑌 and ¬𝑋⇒¬𝑌. Hence, for a set of items, the conjunctive and disjunctive negative rules can be derived as per the table below.
Table VIII: The modified list of conjunctive and disjunctive positive and negative rule sets
Positive Rule | Possible Negative Rules | Support for presence of rule set | Support for absence of rule set
XC ⟹ YC | XC ⟹ ¬YC | s = P(XC ∪ ¬YC) | 1 − P(XC ∪ ¬YC)
XC ⟹ YC | ¬XC ⟹ YC | s = P(¬XC ∪ YC) | 1 − P(¬XC ∪ YC)
XC ⟹ YC | ¬XC ⟹ ¬YC | s = P(¬XC ∪ ¬YC) | 1 − P(¬XC ∪ ¬YC)
XC ⟹ YD | XC ⟹ ¬YD | s = P(XC ∪ ¬YD) | 1 − P(XC ∪ ¬YD)
XC ⟹ YD | ¬XC ⟹ YD | s = P(¬XC ∪ YD) | 1 − P(¬XC ∪ YD)
XC ⟹ YD | ¬XC ⟹ ¬YD | s = P(¬XC ∪ ¬YD) | 1 − P(¬XC ∪ ¬YD)
XD ⟹ YC | XD ⟹ ¬YC | s = P(XD ∪ ¬YC) | 1 − P(XD ∪ ¬YC)
XD ⟹ YC | ¬XD ⟹ YC | s = P(¬XD ∪ YC) | 1 − P(¬XD ∪ YC)
XD ⟹ YC | ¬XD ⟹ ¬YC | s = P(¬XD ∪ ¬YC) | 1 − P(¬XD ∪ ¬YC)
XD ⟹ YD | XD ⟹ ¬YD | s = P(XD ∪ ¬YD) | 1 − P(XD ∪ ¬YD)
XD ⟹ YD | ¬XD ⟹ YD | s = P(¬XD ∪ YD) | 1 − P(¬XD ∪ YD)
XD ⟹ YD | ¬XD ⟹ ¬YD | s = P(¬XD ∪ ¬YD) | 1 − P(¬XD ∪ ¬YD)
Rule sets list of the type XC ⟹ YC:
1. XYZ ⟹ VW - Positive rule
2. XYZ ⟹ ¬VW - Negative rule
3. ¬XYZ ⟹ VW - Negative rule
4. ¬XYZ ⟹ ¬VW - Negative rule
The above shows the list of possible rules for the item sets XYZ and VW of the type XC ⟹ YC, where XC (antecedent of conjunctive nature) is XYZ and YC (consequent of conjunctive nature) is VW.
III. FRAMEWORK OF THE PROPOSED TASK:
The general overall framework of the proposed work can be represented by the diagram below. For any given transaction dataset, the proposed algorithm generates the positive and negative rule sets in an effective and optimized manner. As per the algorithm mentioned in section V, the set of positive and negative rules is generated before the rules are pruned according to the minimum confidence and Lift factor. Hence the chance of losing any useful rules is avoided, and the output of the algorithm consists of useful and reliable rules.
IV. GENETIC ALGORITHM:
The Genetic Algorithm (GA) [1] incorporates Darwinian evolutionary theory with sexual reproduction. GA is a stochastic search algorithm modeled on the process of natural selection that underlies biological evolution, and it has been successfully applied in many search, optimization, and machine learning problems. GA works in an iterative manner by generating new populations of strings from old ones. Every string is the encoded (binary, real, etc.) version of a candidate solution, and an evaluation function associates a fitness measure with every string, indicating its fitness for the problem. This type of representation is positional: the presence of 1 at the ith position indicates the occurrence of the item in transaction[i], and the presence of 0 at the jth position indicates the absence of the item in transaction[j]. For example, the bit pattern for A is 1010101111, based on its availability in the transactions (cf. Table V); similarly ¬A gives 0101010000, based on its absence in each transaction. Hence the conjunction and disjunction of rules can be evaluated and their relative counts calculated as follows (a bitwise sketch is given after Table IX).
Boolean operations on bit patterns for a few positive and negative rules:
Table IX: Bit pattern for all rules:
Items | Operation | Result | Count
A and C | {1010101111} and {0010101001} | {0010101001} | 4
A or C | {1010101111} or {0010101001} | {1010101111} | 7
(A or C) and E | ({1010101111} or {0010101001}) and {0111111111} | {0010101111} | 6
¬(A and C) | ¬({1010101111} and {0010101001}) | {1101010110} | 6
¬(A or C) | ¬({1010101111} or {0010101001}) | {0101010000} | 3
(A or C) and ¬E | ({1010101111} or {0010101001}) and {1000000000} | {1000000000} | 1
¬(A or C) and E | ¬({1010101111} or {0010101001}) and {0111111111} | {0101010000} | 3
¬(A or C) and ¬E | ¬({1010101111} or {0010101001}) and {1000000000} | {0000000000} | 0
From the above table it is clear that P(negative rule) = 1 − P(positive rule), where P denotes the probability. For instance, the frequency of the positive rule (A and C) is 4 in the data set, and the frequency of its negative rule ¬(A and C) is 6, which is consistent with the table above since the number of transactions is 10. The generated rules can be analysed with their confidence and Lift factors. Below are a few positive and negative rules with their confidence and Lift factors for the rule set (A+C) ⟹ E; the negative rule sets for the other positive rule sets can be generated similarly.
Table X: List of positive and negative rules
Antecedent ⟹ Consequent | Confidence | Lift
E ⟹ AC 0.444444444 1.111111
C+E ⟹ A 0.666666667 0.952381
A ⟹ CE 0.571428571 1.428571
AE ⟹ C 0.666666667 1.666667
A ⟹ C+E 0.857142857 0.952381
E ⟹ A+C 0.666666667 0.952381
A+E ⟹ C 0.4 1
A+C ⟹ E 0.857142857 0.952381
C ⟹ AE 1 1.666667
CE ⟹ A 1 1.428571
AC ⟹ E 1 1.111111
C ⟹ A+E 1 1
(A + C) ⟹ ¬E 0.142857143 1.42
¬(A + C) ⟹ E 1 1.1
¬(A+ C) ⟹ ¬E 0 0
From the above it is understood that the positive rules having confidence 1 and a lift factor greater than 1 are the most useful and promising rules. In the case of negative rules, a rule is considered validated if its confidence is 1 or its lift factor is greater than 1, because the absence of an item in a transaction may also mean that the item was not available for purchase. Hence the highlighted rules are obtained as the useful and promising rules, which play a vital role in analyzing and improving the business.
A. Genetic operators:
The Genetic Algorithm uses genetic operators [7] to generate the offspring of the existing population. This section describes the three operators of Genetic Algorithms used in the GA: selection, crossover and mutation.
1) Selection: The selection operator chooses a chromosome in the current population according to the fitness function and copies it without changes into the new population. The GA uses roulette-wheel selection, in which the fittest members of each generation have a greater chance of being selected.
2) Crossover: The crossover operator, according to a certain probability, produces two new chromosomes from two selected chromosomes by swapping segments of genes.
3) Mutation: The mutation operator is used for maintaining diversity. During the mutation phase, and according to the mutation probability (0.005 in the GA), the value of each gene in each selected chromosome is changed.
Genetic algorithms play a vital role in the overall algorithm. They are involved in: i) generating the bit patterns for each item and rule set and finding their counts; ii) generating positive and negative rules based on the proposed algorithm; iii) calculating the confidence and lift factors for each rule; iv) pruning the useless rules.
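A minimal sketch of the three operators described above follows (roulette-wheel selection, segment-swapping crossover, and bit-flip mutation with probability 0.005); the single-point crossover and the function names are our illustrative choices rather than details taken from the paper.

import random

def roulette_select(population, fitness):
    # Fitter chromosomes occupy a larger slice of the wheel and are therefore
    # more likely to be copied into the new population.
    total = sum(fitness)
    pick = random.uniform(0, total)
    running = 0.0
    for chromosome, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]

def crossover(parent1, parent2):
    # Swap the gene segments of two selected chromosomes at a random point
    # (single-point crossover is used here purely for illustration).
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(chromosome, probability=0.005):
    # Flip each gene with the mutation probability quoted above (0.005).
    return "".join(bit if random.random() > probability
                   else ("1" if bit == "0" else "0")
                   for bit in chromosome)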
V. TO GENERATE NEGATIVE RULES
In order to make 'E-Rules' generate negative rules, modifications are needed in the existing algorithm [13]. Each frequent item set derived can be given as an input to the algorithm below to derive the respective conjunctive and disjunctive rules. The frequent set (E, A, C) has been taken for demonstration purposes.
Pseudo code for generating positive and negative rules:
Step 1: Create a function whose parameters are the Dataset, the list of antecedents A, and the list of consequents C.
The lists A and C can be generated using the following conditions with the Genetic Algorithm:
Min(A) = Cnt(FIS) - [Cnt(FIS) - 1]; Max(A) = Cnt(FIS) - [Cnt(FIS) - (Cnt(FIS) - 1)]
For example, if the frequent item set size Cnt(FIS) = 3, then the maximum number of antecedents is 2 and the minimum is 1.
The same holds for consequents: Min(C) = Cnt(FIS) - [Cnt(FIS) - 1]; Max(C) = Cnt(FIS) - [Cnt(FIS) - (Cnt(FIS) - 1)]
[A - antecedent; C - consequent; Cnt - size of the frequent item set. Bit patterns for the frequent item set along with the operators are as in Table IV, and the function returns interesting rules as per the rule sets in Table X.]
Step 2: Find the allowed antecedents (A) and consequents (C). Here A and C contain lists of frequent item sets.
Step 3: Generation of negative rules:
P ← ø, N ← ø, AR ← ø (initially null)
P returns the positive rules; N returns the negative rules; AR is the union of the positive and negative rules.
For each item set A and for each item set C, create an offspring when bit pattern(A) is not equal to bit pattern(C).
a. If bit pattern(A) and/or bit pattern(C) >= min_predefined_min_support then return the rule P ∪ {A ⟹ C}
   i. If bit pattern(A) and/or bit pattern(¬C) >= min_predefined_min_support then return the rule N ∪ {A ⟹ ¬C}
   ii. If bit pattern(¬A) and/or bit pattern(C) >= min_predefined_min_support then return the rule N ∪ {¬A ⟹ C}
   iii. If bit pattern(¬A) and/or bit pattern(¬C) >= min_predefined_min_support then return the rule N ∪ {¬A ⟹ ¬C}
b. Else prune the rule.
Step 4: Return P ∪ N
The output of this algorithm is the combination of the positive rules (P) and the negative rules (N).
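Under the reading that "bit pattern(A) and/or bit pattern(C)" refers to the bitwise AND (conjunctive) or OR (disjunctive) combination of the antecedent and consequent patterns, Step 3 can be sketched as follows. This is our interpretation of the pseudo code, using the conjunctive case, illustrative names, and an assumed relative minimum support of 0.3; it is not the authors' implementation.

def generate_rules(patterns, antecedents, consequents, min_support, n_transactions):
    # patterns maps an item-set name to its integer bit pattern over the transactions.
    # Returns the union of the positive rules P and the negative rules N (Step 4).
    mask = (1 << n_transactions) - 1

    def support(bits):
        return bin(bits & mask).count("1") / n_transactions

    P, N = set(), set()
    for a in antecedents:
        for c in consequents:
            if a == c or patterns[a] == patterns[c]:
                continue                                  # no offspring created
            pa, pc = patterns[a], patterns[c]
            if support(pa & pc) >= min_support:
                P.add((a, c))                             # X => Y
            if support(pa & ~pc) >= min_support:
                N.add((a, "not " + c))                    # X => not Y
            if support(~pa & pc) >= min_support:
                N.add(("not " + a, c))                    # not X => Y
            if support(~pa & ~pc) >= min_support:
                N.add(("not " + a, "not " + c))           # not X => not Y
            # any combination that fails every test is pruned
    return P | N

patterns = {"A": 0b1010101111, "C": 0b0010101001, "E": 0b0111111111}
print(generate_rules(patterns, ["A", "C", "E"], ["A", "C", "E"], 0.3, 10))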
VI. CONCLUSION
'E-Rules' has become a complete algorithm that generates conjunctive and disjunctive, positive and negative rule sets effectively. The negative rule sets are generated using a logical rule set mapping for each positive rule set. Hence it is a useful and effective association rule mining algorithm that captures even useful rare items using a Genetic Algorithm in a more optimal manner. This work is unique in that it presents an algorithm to generate negative rule sets for disjunctive and conjunctive rules, which was not addressed in the earlier work [13].
REFERENCES
[1] Anandhavalli M, Suraj Kumar Sudhanshu, Ayush Kumar and Ghose M.K., "Optimized association rule mining using genetic algorithm", Advances in Information Mining, ISSN: 0975-3265, 1(2), 2009.
[2] Marcus C. Sampaio, Fernando H. B. Cardoso, Gilson P. dos Santos Jr.,Lile Hattori “Mining Disjunctive Association Rules” 15
Aug. 2008
[3] Bing Liu, Wynne Hsu and Yiming Ma “Mining Association Rules with Multiple Minimum Supports”; ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining (KDD-99), August 15-18, 1999, San Diego, CA, USA.
[4] Yeong-Chyi Lee, Tzung-Pei Hong, Wen-Yang Lin, "Mining Association Rules with Multiple Minimum Supports Using Maximum Constraints", Elsevier Science, November 22, 2004.
[5] Michelle Lyman, “Mining Disjunctive Association Rules Using Genetic Programming” The National Conference On
Undergraduate Research (NCUR); April 21-23, 2005
[6] Farah Hanna AL-Zawaidah, Yosef Hasan Jbara, Marwan AL-Abed Abu-Zanona, “An Improved Algorithm for Mining
Association Rules in Large Databases” ; Vol. 1, No. 7, 311-316, 2011
[7] Rupesh Dewang, Jitendra Agarwal, “A New Method for Generating All Positive and Negative Association Rules”; International
Journal on Computer Science and Engineering, vol.3,pp. 1649-1657,2011
[8] Olafsson Sigurdur, Li Xiaonan, and Wu Shuning.”Operations research and data mining,in”: European Journal of Operational
Research 187 (2008) pp:1429–1448.
[9] Agrawal R., Imielinski T. and Swami A., "Database mining: a performance perspective", IEEE Transactions on Knowledge and Data Engineering 5 (6), (1993), pp: 914-925.
[10] Kannika Nirai Vaani.M, Ramaraj E “An integrated approach to derive effective rules from association rule mining using genetic
algorithm”, Pattern Recognition, Informatics and Medical Engineering (PRIME), 2013 International Conference, (2013), pp: 90–95.
[11] Jiawei Han, Jian Pei, and Yiwen Yin. “Mining Frequent Patterns without Candidate Generation”, Data Mining and Knowledge
Discovery (8), (2004), pp: 53-87.
[12] Oskar Kohonen ,Popular Algorithms in Data Mining and Machine Learning (http://www.cis.hut.fi/Opinnot/T-61.6020/2008/fptree.pdf
;),2008.
[13] Kannika Nirai Vaani.M, Ramaraj E “E-Rules: An Enhanced Approach to derive disjunctive and useful Rules from Association Rule
Mining without candidate item generation”,VOL 72,NO 19,(June 2013).
[14] Alex Tze Hiang Sim, Maria Indrawan, Samar Zutshi, Bala Srinivasan, "Logic-Based Pattern Discovery", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 6, (June 2010).