This document proposes an algorithm to efficiently discover irregular association rules from large databases. Irregular rules represent patterns that occur rarely together, such as wrong decisions, illegal practices, or variability in decisions. The algorithm treats items as either actions (decisions, actions, outputs) or non-actions (facts, statements, criteria). It finds rules where the antecedent contains non-action items that occur frequently and the consequent contains rare action items. This approach can detect fraud, abuse, or other irregularities more effectively than methods that only consider item frequency. The algorithm's effectiveness is demonstrated on real patient data.
2. negative association rules, P → ¬Q, ¬P → Q and ¬P → ¬Q. To extract negative association rules, most papers employ different correlation measures between attributes [12-14]. In [13], the author proposed a level-wise search algorithm for mining both positive and negative association rules that employs rule dependency measures. In [14], the authors proposed another level-wise search algorithm for simultaneously extracting positive and negative association rules using the Pearson correlation coefficient. In [15], the authors proposed a detection model based on multilayer perceptron (MLP) neural networks to detect fraud and abuse in medical claims; it is designed to detect new, unusual and known fraudulent or abusive behaviors, but the detection model is very slow and requires a huge amount of memory to analyze an existing large database. In [16], the authors used positive association rules to build clinical pathways, which can detect fraud and abuse on new data; however, this model cannot detect fraud and abuse in the existing large healthcare data. Our proposed approach detects fraud and abuse from the existing large data.

3. Irregular Association Rules

Let D = {t1, t2, ..., tn} be a database of n transactions over a set of items I = {i1, i2, ..., im}. Let the set of action items of I be AI = {ai1, ai2, ..., aik}, where k is the number of action items, and let the set of non-action items of I be NAI = {nai1, nai2, ..., nai(m-k)}, where m - k is the number of non-action items. For an itemset P ⊆ I and a transaction t in D, we say that t supports P if t has values for all the attributes in P; for conciseness, we also write P ⊆ t. By DP we denote the set of transactions that contain all attributes in P. The support of P is computed as support(P) = |DP| / |D|, i.e. the fraction of transactions containing P. An irregular rule is of the form P → Q, with P ⊆ NAI, Q ⊆ AI and P ∩ Q = ∅. For the rule to hold, the following conditions must be met: P(P), i.e. support(P), must be greater than or equal to the minimum antecedent support; P(P, Q), i.e. support(P ∪ Q), must be less than or equal to the maximum antecedent consequent support; and the confidence P(P, Q) / P(P) must be less than or equal to the maximum confidence, where P(x) is the probability of x.
Figure 1 illustrates the data transformation. A dictionary is generated for each categorical attribute, for example a Diagnosis dictionary (Headache → 1, Fever → 2) and a Smoke dictionary (Yes → 1, No → 2). A rule base is built from medical domain knowledge, for example: if age <= 12 then 1; if 13 <= age <= 60 then 2; if age > 60 then 3; if smoke = Y then 1; if smoke = N then 2; if sex = M then 1; if sex = F then 2. The actual patient data, such as

Patient ID | Age | Smoke | Diagnosis
1020D      | 33  | Yes   | Headache
1021D      | 63  | No    | Fever

are mapped to integer items using the rule base and the dictionaries, yielding data suitable for knowledge discovery:

Patient ID | Age | Smoke | Diagnosis
1020D      | 2   | 1     | 1
1021D      | 3   | 2     | 2

Figure 1. Data transformation of medical data
4. Mapping complex medical data to mineable items

For knowledge discovery, the medical data have to be transformed into a suitable transaction format. We have addressed the problem of mapping complex medical data to items using a domain dictionary and a rule base, as shown in Figure 1. Medical data are of categorical, continuous numerical, Boolean, interval, percentage, fraction and ratio types. Medical domain experts have the knowledge of how to map ranges of numerical data for each attribute to a series of items; for example, there are certain conventions for considering a person young, adult or elderly with respect to age. A set of rules is created for each continuous numerical attribute using the knowledge of medical domain experts, and a rule engine is used to map continuous numerical data to items using these rules. We have used the domain dictionary approach to transform to numerical form the data for which medical domain expert knowledge is not applicable. As the cardinality of attributes other than continuous numeric data is not high in the medical domain, these attribute values are mapped to integer values using medical domain dictionaries. The mapping process is therefore divided into two phases. Phase 1: a rule base is constructed based on the knowledge of medical domain experts, and dictionaries are constructed for attributes where domain expert knowledge is not applicable. Phase 2: attribute values are mapped to integer values using the corresponding rule base and the dictionaries.
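A minimal sketch of this two-phase mapping is shown below; the rule thresholds and dictionary values mirror the Figure 1 example, while the function and attribute names are ours and purely illustrative:

```python
# Phase 1: a rule base is built from medical domain expert knowledge, and
# dictionaries are built for categorical attributes where expert rules do not apply.
def age_rule(age):
    """Expert convention from Figure 1: child -> 1, adult -> 2, elder -> 3."""
    if age <= 12:
        return 1
    if age <= 60:
        return 2
    return 3

rule_base = {'Age': age_rule, 'Smoke': lambda v: {'Yes': 1, 'No': 2}[v]}
dictionaries = {'Diagnosis': {'Headache': 1, 'Fever': 2}}

# Phase 2: each attribute value is mapped through its rule or dictionary.
def to_items(record):
    mapped = {}
    for attr, value in record.items():
        if attr in rule_base:
            mapped[attr] = rule_base[attr](value)
        elif attr in dictionaries:
            mapped[attr] = dictionaries[attr][value]
        else:
            mapped[attr] = value  # identifiers such as Patient ID pass through
    return mapped

print(to_items({'Patient ID': '1020D', 'Age': 33, 'Smoke': 'Yes', 'Diagnosis': 'Headache'}))
# -> {'Patient ID': '1020D', 'Age': 2, 'Smoke': 1, 'Diagnosis': 1}
```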
5. The proposed algorithm

The general intuition of this algorithm is as follows: given a set of lab tests with the same results, if 99% of doctors treat the patients as having disease x and 1% of doctors treat the patients as having other diseases, then there is a strong possibility that this 1% of doctors are engaged in illegal practice. In other words, if a consequent C occurs infrequently with an antecedent A, and A itself occurs frequently, then that is a strong candidate of variability. In every domain there is a set of facts, and decisions and actions are taken based on these facts. In a rule S → T, if S contains a set of facts and T contains a decision or action, then such a rule represents the decision T together with its corresponding facts S. If S → T has sufficient support and confidence, it represents that the decision or action T is taken routinely based on facts S. However, if S is highly frequent and the rule S → T has very low confidence, it indicates that some decision other than T is usually taken based on facts S, i.e. the decision T is taken only exceptionally based on these facts. The main features of the proposed algorithm are as follows:

- If only minimum support is used, as in conventional association mining algorithms, the desired itemsets that involve rarely appearing action items together with highly frequent non-action items will not be found. To find rules that involve both a frequent antecedent part and rare consequent items, we have used two support metrics: minimum antecedent support and maximum antecedent consequent support.
- The proposed algorithm uses a maximum confidence constraint instead of the widely used minimum confidence constraint to form the rules. Moreover, it partitions itemsets into action items and non-action items instead of performing subset generation to form rules.
- Rules have non-action items in the antecedent and action items in the consequent.
- In candidate generation, it does not check the every-subset-frequent property for candidates that contain one or more action items, so that such itemsets are kept.

Let MAS be the minimum antecedent support, MACS the maximum antecedent consequent support, Ij the itemsets of size j, Sm the desired itemsets of size m, and Ck the set of candidates of size k. Figure 2 shows the association mining algorithm for finding irregular rules. Like the Apriori algorithm, our algorithm is based on level-wise search. Each item consists of an attribute name and its value. On retrieving the information of a 1-itemset, we create a new 1-itemset if it has not been created already; otherwise we update its support. A non-action 1-itemset is selected if its support is greater than or equal to the minimum antecedent support. An action 1-itemset is selected whatever its support is.
If minimum support is only used like We have used two separate supports metrics to filter
conventional association mining algorithm, out candidates. An itemset with only non-action
desired itemsets that involve rarely appeared items is compared with minimum antecedent support
action items with the high frequent non-action metric as non-action items can only take part in
items will not be found. To find rules that antecedent part of irregular rule, which need to be
involve both frequent antecedent part and rare high frequent. An itemset with one or more action
consequent items, we have used two support items is compared with maximum antecedent
metrics: minimum antecedent support, consequent support metric to keep rare action items
maximum antecedent consequent support. with the high frequent non-action items. An itemset
The proposed algorithm uses maximum with only non-action items is selected if it has
confidence constraint instead of widely used support greater or equal to minimum antecedent
minimum confidence constraint to form the support. An itemset with one or more action items
rules. Moreover, it partitions itemsets into action is selected if it has support smaller or equal to
item and non-action items instead of subset maximum antecedent consequent support. By this
generation to form rules. way, itemsets are explored which has high support
Rules have non-action items in the antecedent for non-action items and low support for action items
and action items in the consequent. with high support non-action items. Here pruning is
In candidate generation, it does not check the based mostly on minimum antecedent support,
maximum antecedent consequent support and
checki
or more action items to keep that itemset.
Let MAS is minimum antecedent support, MACS
is maximum antecedent consequent support, Ij is the 5.3. Generating Association Rule
itemsets of size j, Sm is the desired itemset of size m;
Ck be the sets of candidates of size k. Figure 2 shows This problem needs association rules that represent
the association mining algorithm for finding irregular irregular relationships between action and non-action
rule. Like algorithm Apriori, our algorithm is also items that occur rarely together. For this reason, the
based on level wise search. Each item consists of proposed algorithm uses maximum confidence
attribute name and its value. Retrieving information constraint to form rules as it needs rule that has
of a 1-itemset, we make a new 1-itemset if this 1- high support in antecedent portion and has very low
itemset is not created already, otherwise update its support in itemset from which the rule is generated.
support. The non-action 1-itemset is selected if it has It selects a rule if its confidence is less or equal to
support greater or equal to minimum antecedent maximum confidence constraint. Moreover, it does
support. The action 1-itemset is selected whatever not use subset generation to the itemsets to form
support it has. By this way, 1-itemsets are explored rules. Here an itemset is partitioned into action item
which have high support for antecedent items and and non-action items. Action items are for
have arbitrary support for consequent items. consequent part and non-action items are for
65
4. antecedent part. Here each itemset is mapped to only
one rule.
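To make the itemset-to-rule mapping concrete, here is a minimal Python sketch of this step (illustrative only; items are assumed to be (name, is_action) pairs and support is assumed to return the fraction of transactions containing an itemset, neither of which is prescribed by the paper):

def generate_rule(itemset, support, max_confidence):
    # Partition the desired itemset (Section 5.3): non-action items form
    # the antecedent, action items form the consequent.
    antecedent = frozenset(i for i in itemset if not i[1])
    consequent = frozenset(i for i in itemset if i[1])
    if not antecedent or not consequent:
        return None
    # Keep the rule only under the *maximum* confidence constraint:
    # low confidence means the decision is rarely taken on these facts.
    confidence = support(itemset) / support(antecedent)
    if confidence <= max_confidence:
        return (antecedent, consequent, confidence)
    return None

Because the itemset is partitioned rather than enumerated over all its subsets, each desired itemset yields at most one rule, which is the property used in Lemma 1 below.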
Algorithm: Find itemsets which consist of non-action items with high support and action items with low support, based on candidate generation.
Input: Database, minimum antecedent support (MAS), maximum antecedent consequent support (MACS)
Output: Itemsets which are strong candidates of variability.
1. K = 1, S = {Ø};
2. Read the metadata about which attributes are action type and which are not.
3. Ik = Select 1-itemsets which either consist of a non-action item and have support greater than or equal to the minimum antecedent support, or consist of an action item.
4. While (Ik ≠ Ø) {
   4.1 K++;
   4.2 Ck = Candidate_generation(Ik-1)
   4.3 CalculateCandidatesSupport(Ck)
   4.4 Ik = SelectDesiredItemSetFromCandidates(Ck, Sk, MAS, MACS);
   4.5 S = S ∪ Sk }
5. return S

procedure SelectDesiredItemSetFromCandidates(Ck, Sk, MAS, MACS)
1. For each itemset c ∈ Ck
   1.1 If c contains only non-action items
       1.1.1 If c.support >= MAS
       1.1.2 Add it to I
   1.2 else if c contains one or more action items with non-action items
       1.2.1 If c.support <= MACS
       1.2.2 Add it to I and Sk
   1.3 If c contains only action items
       1.3.1 Add it to I
2. return I

procedure CalculateCandidatesSupport(Ck)
1. For each transaction t of Database
   1.1 CalculateSupportFromOneTransactionForCandidates(Ck, t);

procedure CalculateSupportFromOneTransactionForCandidates(Ck, t)
1. Ct = the subsets of Ck which are candidates contained in t
2. For each candidate c ∈ Ct
   2.1 c.count++

Algorithm: Find association rules for variability finding
Input: I (variability itemsets), maximumConfidence
Output: R (set of rules)
1. R = Ø
2. For each X ∈ I
   2.1 Antecedent set AS = {as1, as2, ..., asn}, where asi ∈ X and asi is a non-action item
   2.2 Consequent set CS = {cs1, cs2, ..., csn}, where csi ∈ X and csi is an action item
   2.3 If (support(AS ∪ CS) / support(AS)) <= maximumConfidence
       2.3.1 AS -> CS is a valid rule.
       2.3.2 R = R ∪ (AS -> CS)

procedure Candidate_generation(Ik-1)
1. For each itemset i1 ∈ Ik-1
   1.1 For each itemset i2 ∈ Ik-1
       1.1.1 New candidate NC = Union(i1, i2);
       1.1.2 If size of NC is k
             1.1.2.1 If NC contains one or more action items
                     1.1.2.1.1 Add it to Ck if every subset of its non-action items is frequent.
             1.1.2.2 else
                     1.1.2.2.1 If every subset of NC is frequent
                               1.1.2.2.1.1 Add it to Ck, otherwise remove it.
2. return Ck;

Figure 2. Association mining algorithm for finding irregular rules
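The itemset-mining part of Figure 2 can also be rendered as a runnable sketch. The reconstruction below is illustrative rather than the authors' C# code; it assumes a transaction is a frozenset of (name, is_action) pairs, and it prunes a mixed candidate by requiring that its non-action part was itself kept at an earlier level, which by downward closure implies that every subset of the non-action part is frequent:

def support(itemset, db):
    # Fraction of transactions that contain the itemset.
    return sum(1 for t in db if itemset <= t) / len(db)

def mine_variability_itemsets(db, mas, macs):
    items = {i for t in db for i in t}
    # Step 3: non-action 1-itemsets must pass MAS; action items always enter.
    level = {frozenset([i]) for i in items
             if i[1] or support(frozenset([i]), db) >= mas}
    kept_nonaction = {s for s in level if not any(i[1] for i in s)}
    desired = set()          # itemsets handed on to rule generation
    k = 1
    while level:
        k += 1
        # Candidate_generation: join pairs from the previous level.
        candidates = set()
        for a in level:
            for b in level:
                nc = a | b
                if len(nc) != k:
                    continue
                nonaction = frozenset(i for i in nc if not i[1])
                if any(i[1] for i in nc):
                    # Mixed candidate: prune only on its non-action part,
                    # which must itself have been kept at an earlier level.
                    if not nonaction or nonaction in kept_nonaction:
                        candidates.add(nc)
                elif all(nc - {i} in kept_nonaction for i in nc):
                    # Pure non-action candidate: classic Apriori pruning.
                    candidates.add(nc)
        # SelectDesiredItemSetFromCandidates: the two support metrics.
        level = set()
        for c in candidates:
            s = support(c, db)
            nonaction = frozenset(i for i in c if not i[1])
            if not nonaction:                  # only action items: keep
                level.add(c)
            elif len(nonaction) == len(c):     # only non-action items
                if s >= mas:
                    level.add(c)
                    kept_nonaction.add(c)
            elif s <= macs:                    # rare actions, frequent facts
                level.add(c)
                desired.add(c)
    return desired

The desired itemsets returned here would then be handed to a rule-formation step such as generate_rule above, together with the maximum confidence threshold.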
5.3.1. Lemma 1. The number of rules is equal to the number of desired itemsets, and the number of discarded rules is m^P - S, where S is the number of desired itemsets.

Proof: A single desired itemset consists of action type items and non-action type items. Action items and non-action items are mapped to the consequent and antecedent parts respectively. Let I = {i1, i2, ..., in} be the set of items to be mined, where items can be either action type or non-action type. Let AI = {ai1, ai2, ..., aiu} be the set of action items to be mined, and NAI = {nai1, nai2, ..., naiv} be the set of non-action items to be mined. Each nai must have support greater than or equal to the minimum antecedent support to be included as a 1-itemset, and all ai are included as 1-itemsets. Let C = {c1, c2, c3, ..., cn} be the set of candidate itemsets. A new candidate NC is added to C if its non-action part, named NCNA, holds the following property: support(each subset of NCNA) >= minimum antecedent support. A candidate c is selected for rule generation if and only if its confidence is less than or equal to the maximum confidence. The action items and non-action items of a desired itemset are mapped to the consequent and antecedent items of a rule respectively, so every desired itemset is mapped to a single valid rule; total rules = number of desired itemsets = S. Let m be the average number of distinct values each multidimensional attribute holds, and P the number of attributes to be mined. The number of possible different rules is then m^P, so the number of discarded rules is m^P - S, where S is the number of desired itemsets. (For illustration, if each of P = 5 attributes has on average m = 4 distinct values, there are m^P = 1024 possible rules; with S = 49 desired itemsets, 1024 - 49 = 975 potential rules are discarded.)

6. Results and discussion

The experiments were done on a PC with a Core 2 Duo processor with a clock rate of 1.8 GHz and 3 GB of main memory. The operating system was Microsoft Vista and the implementation language was C#. We used a patient dataset to verify our method. The dataset contains items which are either actions, including decision, diagnosis and cost, or non-actions, including lab tests, any symptom of a patient and any criterion of a disease. Each instance represents the data of one patient. We have filtered out instances which have noisy or missing values. The data set of interest was collected and preprocessed from different local hospitals of Bangladesh; it has 50273 instances and 514 attributes (150 discrete and 364 numerical attributes). All these data are converted into mineable items (integer representation) using the domain dictionary and rule base.
[Figures 3-8: plots omitted in this text version; captions and axis information retained below.]

Figure 3. Time comparison of Apriori and the proposed algorithm for the patient dataset (time in seconds vs. number of transactions: 10K, 25K, 50K).
Figure 4. Number of rules for varying maximum confidence (2, 5, 10), shown for MAS = .7, MACS = .1 and for MAS = .85, MACS = .5.
Figure 5. Time comparison of different minimum antecedent supports (90%, 70%, 50%), with MACS and MC held constant.
Figure 6. Time comparison of different maximum antecedent consequent supports (3%, 5%, 10%), with MAS and MC held constant.
Figure 7. Accuracy of the proposed algorithm for different minimum antecedent supports (50%, 70%, 90%).
Figure 8. Accuracy of the proposed algorithm for different maximum confidences (10%, 5%, 2%).
Table 1. Test result for patient dataset

  Minimum antecedent support                70%       85%
  Maximum antecedent consequent support     10%       5%
  Maximum confidence                        10%       5%
  Number of desired itemsets                49        31
  Number of desired rules                   5         3
  Time (seconds)                            922.2     1634.56

Table 1 shows the test results for the patient dataset after running the proposed algorithm with different parameters. The second column presents the result where we used a minimum antecedent support of 70%, a maximum antecedent consequent support of 10% and a maximum confidence of 10%: 49 desired itemsets were generated and 5 rules were discovered, taking about 922.2013 seconds. The third column presents the result where we used a minimum antecedent support of 85%, a maximum antecedent consequent support of 5% and a maximum confidence of 5%: 31 desired itemsets were generated and 3 rules were discovered, taking about 1634.5634 seconds.

Figure 3 shows that Apriori takes significantly more time than the proposed algorithm. This is because pruning in the proposed algorithm is based on the minimum antecedent support, the maximum antecedent consequent support and the check for action items. Figure 4 shows that as maximum confidence (MC) increases, the number of valid rules increases. Figure 5 shows how time varies with different minimum antecedent support (MAS) values for the irregular rule finding algorithm; here we measured performance in terms of MAS while keeping MACS, MC, the number of action items and the number of non-action items constant. Time does not vary significantly because MAS does not reduce disk access: the patient data set has candidates of all sizes for these MAS values. MAS only reduces the number of valid candidates generated, which can save some CPU time, so the three cases take slightly different times.

Figure 6 shows how time varies with different MACS values, keeping MAS, MC, the number of action items and the number of non-action items constant. Time again does not vary significantly, because MACS does not reduce disk access either (the patient data set has candidates of all sizes for these MACS values); it only reduces the number of valid candidates generated, which can save some CPU time, so the three cases take slightly different times. As the maximum antecedent consequent support
decreases, the number of valid candidates generated decreases. For this reason, the case with 5% MACS takes more time than the case with 3% MACS, and the case with 10% MACS takes more time than the case with 5% MACS. Figure 7 illustrates the accuracy of our proposed algorithm for different minimum antecedent supports; the value of minimum antecedent support for each presented result is indicated. The figure shows that MAS has no effect on accuracy, as it is not used as a parameter in selecting the valid rules. Figure 8 illustrates the accuracy of our proposed algorithm for different maximum confidences. The figure shows that maximum confidence does affect accuracy, as it is used as a parameter in selecting valid rules. As maximum confidence decreases, accuracy increases and the number of discovered rules decreases. This is because lower confidence indicates that antecedent and consequent occur more rarely together in the dataset.

7. Conclusion

Irregular patterns represent wrong decisions, illegal practices and variability in decisions. In this paper, we propose a level-wise search algorithm that works on action and non-action type data to find irregular association rules. The proposed algorithm has been applied to a real-world patient data set, and we have shown significant accuracy in its output. Although we use level-wise search for finding irregular patterns, each step of our algorithm is different from other level-wise search algorithms. Rule generation from desired itemsets is also different from conventional association mining algorithms.

8. References

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., 1993, pp. 207-216.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," in Proceedings of the 20th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1994, pp. 487-499.
[3] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic Itemset Counting and Implication Rules for Market Basket Data," in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, United States, 1997, pp. 255-264.
[4] H. Mannila, H. Toivonen, and A. I. Verkamo, "Efficient Algorithms for Discovering Association Rules," in AAAI Workshop on Knowledge Discovery in Databases, 1994, pp. 181-192.
[5] J. S. Park, M. S. Chen, and P. S. Yu, "An Effective Hash Based Algorithm for Mining Association Rules," in Proceedings of the ACM SIGMOD Conference on Management of Data, New York, NY, USA, 1995, pp. 175-186.
[6] A. Savasere, E. Omiecinski, and S. B. Navathe, "An Efficient Algorithm for Mining Association Rules in Large Databases," in Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pp. 432-444.
[7] B. Liu, W. Hsu, and Y. Ma, "Mining Association Rules with Multiple Minimum Supports," in SIGKDD Explorations, 1999, pp. 337-341.
[8] H. Yun, D. Ha, B. Hwang, and K. H. Ryu, "Mining Association Rules on Significant Rare Data Using Relative Support," Journal of Systems and Software, vol. 67, no. 3, pp. 181-191, 2003.
[9] M. Hahsler, "A Model-Based Frequency Constraint for Mining Associations from Transaction Data," Data Mining and Knowledge Discovery, vol. 13, no. 2, pp. 137-166, 2006.
[10] L. Zhou and S. Yau, "Association Rule and Quantitative Association Rule Mining among Infrequent Items," in International Conference on Knowledge Discovery and Data Mining, San Jose, California, 2007, pp. 156-167.
[11] R. U. Kiran and P. K. Reddy, "An Improved Multiple Minimum Support Based Approach to Mine Rare Association Rules," in The IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 2009, pp. 340-347.
[12] S. Brin, R. Motwani, and C. Silverstein, "Beyond Market Baskets: Generalizing Association Rules to Correlations," in Proceedings of SIGMOD, AZ, USA, 1997, pp. 265-276.
[13] X. Wu, C. Zhang, and S. Zhang, "Efficient Mining of Both Positive and Negative Association Rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381-405, 2004.
[14] M. L. Antonie and O. R. Zaïane, "Mining Positive and Negative Association Rules: An Approach for Confined Rules," in Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, 2004, pp. 27-38.
[15] P. A. Ortega, C. J. Figueroa, and G. A. Ruz, "A Medical Claim Fraud/Abuse Detection System Based on Data Mining: A Case Study in Chile," in DMIN, 2006, pp. 224-231.
[16] W. S. Yang and S. Y. Hwang, "A Process-Mining Framework for the Detection of Healthcare Fraud and Abuse," Expert Systems with Applications, vol. 31, no. 1, pp. 56-68, July 2006.