This document proposes an algorithm to discover both asymmetric and symmetric relationships between medical attributes from patient data. Existing algorithms find only asymmetric relationships or relationships between frequent items. The proposed algorithm allows medical researchers to specify constraints such as minimum support and confidence for groups of attributes, and to specify which attributes may appear in the antecedent, the consequent, or both. It maps complex medical data such as numbers and text to items, generates candidate itemsets based on the group constraints, and uses support to find the desired itemsets. The goal is to find meaningful symmetric relationships between specified medical attributes.
This document summarizes a research paper that proposes using machine learning algorithms and natural language processing techniques to extract disease and treatment information from medical texts. It discusses using a pipeline of tasks including identifying relevant sentences, representing the sentences, and classifying the relationships between diseases and treatments. The paper reviews previous work on using algorithms like naive Bayes and conditional naive Bayes for information extraction and relation classification. It proposes applying machine learning and natural language processing to biomedical texts from sources like Medline to automatically extract symptoms, causes, and treatments for diseases specified in user queries. This extracted information could help doctors and patients by providing structured medical knowledge in a time-saving manner.
Evidential reasoning based decision system to select health care location - IJAAS Team
The general public's demand for safe health care in Bangladesh is rising rapidly with the improvement of living standards. However, the allocation of limited and unevenly distributed medical resources is undermining the assurance of safe health care for the people. Constructing new hospitals with a rational allocation of resources is therefore imminent and significant. Site selection for a new hospital is one of the crucial policy decisions taken by planners and policy makers. The process of hospital site selection is inherently complicated because it involves many factors that must be measured and evaluated. These factors are expressed in both objective and subjective terms, and a hierarchical relationship exists among them. In addition, it is difficult to measure qualitative factors quantitatively, resulting in incomplete data and hence uncertainty. It is essential to address this uncertainty with an apt methodology; otherwise, the decision on a suitable site may prove inapt. This paper therefore demonstrates the application of a novel method, a belief rule-based inference methodology (RIMER) based intelligent decision system (IDS), which can identify a suitable hospital site while taking account of a large number of criteria of both subjective and objective nature.
Data mining techniques on heart failure diagnosis - Steve Iduye
The document discusses using data mining techniques to diagnose coronary artery disease (CAD) through three case studies. Case 1 uses association rule mining on the Cleveland dataset to identify risk factors for CAD. Case 2 uses decision trees and bagging algorithms on laboratory and echocardiography features to diagnose CAD. Case 3 applies classification algorithms like SMO and Naive Bayes as well as feature selection and creation to the Z-Alizadeh Sani dataset to predict artery stenosis. The studies demonstrate how data mining can effectively analyze medical data and extract rules to diagnose CAD.
This document describes a decision support system (DSS) that uses the Apriori algorithm, genetic algorithm, and fuzzy logic to analyze medical data and make accurate diagnostic decisions. The DSS first uses Apriori to extract association rules from pre-processed medical data. It then applies a genetic algorithm to optimize the results and determine optimal attribute values. Finally, it employs fuzzy logic for decision-making based on the optimized attribute values. The authors tested their DSS on diabetes data and found the results to be interesting. Their proposed system aims to help medical professionals make quicker and more accurate diagnostic decisions.
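The first stage of such a DSS, mining association rules with Apriori, can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the transaction data, thresholds, and function names are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return frequent itemsets (frozensets) mapped to their support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    freq = {}
    current = [frozenset([i]) for i in items]
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        freq.update(level)
        # Join step: build (k+1)-item candidates from surviving k-itemsets.
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return freq

def rules(freq, min_conf):
    """Derive antecedent -> consequent rules with confidence >= min_conf."""
    out = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / freq[ante]
                if conf >= min_conf:
                    out.append((ante, itemset - ante, conf))
    return out
```

In the described DSS, the rules produced here would then be passed to the genetic algorithm for optimization before the fuzzy decision step.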
International Journal of Computational Engineering Research (IJCER) - ijceronline
The International Journal of Computational Engineering Research (IJCER) is an international, English-language, monthly online journal. It publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Analysis on Data Mining Techniques for Heart Disease Dataset - IRJET Journal
This document analyzes various data mining techniques for classifying heart disease datasets. It compares the performance of classification algorithms like decision trees and lazy learning on aspects like time taken to build models. The algorithms are tested on a heart disease dataset from a public repository using the KEEL data mining tool. Decision trees and k-nearest neighbors are implemented using distance functions like Euclidean and HVDM across different validation modes. The results show that k-nearest neighbors with no validation is the most efficient algorithm for predicting heart disease, taking the least time to build models of the dataset. The study aims to determine the optimal classification algorithm for heart disease prediction systems.
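The k-nearest-neighbor classifier compared in that study can be sketched in a few lines. This minimal Python version uses only the Euclidean distance (the HVDM variant and the KEEL tool are not reproduced), and the feature vectors and labels are invented for illustration.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, query, k=3, dist=euclidean):
    """train: list of (feature_vector, label); return majority label of k nearest."""
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With a labeled heart-disease training set, `knn_predict(train, patient_features)` would return the predicted class for a new patient.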
Advanced Statistical Manual for Ayurveda Research - Ayurdata
These slides cover more advanced statistical applications, including those in data science.
Each concept is introduced first, followed by an illustration and its use in a real context.
This describes the techniques used for the prediction of heart disease using data mining concepts, including the IHPDS (Intelligent Heart Disease Prediction System).
Optimized Column-Oriented Model: A Storage and Search Efficient Representatio... - razanpaul
The document proposes a new data model called Optimized Column-Oriented Model (OCOM) to represent medical data. OCOM aims to be more storage and search efficient compared to the commonly used Entity-Attribute-Value (EAV) model. It stores data in a column-oriented format and uses a compact representation that concatenates the position and value of non-null elements into a single integer. This allows for efficient binary search and retrieval compared to EAV, which stores attribute names as data. An experimental evaluation showed that OCOM occupies less storage space and is more search efficient than EAV for medical data warehousing queries.
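The packing scheme described above can be illustrated with a small sketch. Assuming a 16-bit value width (the actual OCOM bit layout is not specified here), a record position and value are concatenated into one integer, so a sorted column supports binary search by position:

```python
import bisect

VALUE_BITS = 16  # assumed width for the value field; real OCOM sizing may differ

def pack(position, value):
    """Concatenate a record position and a cell value into one integer."""
    return (position << VALUE_BITS) | value

def unpack(code):
    return code >> VALUE_BITS, code & ((1 << VALUE_BITS) - 1)

def build_column(sparse_entries):
    """sparse_entries: {record_position: value} for non-null cells only.
    Sorting by the packed integer orders the column by position."""
    return sorted(pack(p, v) for p, v in sparse_entries.items())

def lookup(column, position):
    """Binary search the packed column for a record position; None means null."""
    lo = bisect.bisect_left(column, pack(position, 0))
    if lo < len(column):
        p, v = unpack(column[lo])
        if p == position:
            return v
    return None
```

Because nulls are simply absent and each non-null cell costs one integer, the column is compact, and `lookup` runs in O(log n) rather than scanning attribute-name rows as EAV does.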
Clustering Medical Data to Predict the Likelihood of Diseases - razanpaul
This document proposes a constraint k-Means-Mode clustering algorithm to predict the likelihood of diseases using medical data containing both continuous and categorical attributes. It first maps complex medical data to mineable items using domain dictionaries and rule bases. The developed algorithm can handle both continuous and discrete data, perform clustering based on anticipated likelihood attributes with core disease attributes, and was tested on a real-world patient dataset to demonstrate its effectiveness.
Search Efficient Representation of Healthcare Data based on the HL7 RIM - razanpaul
This document proposes a search efficient data model called Optimized Entity Attribute Value (OEAV) to represent healthcare data based on the HL7 Reference Information Model (RIM). OEAV aims to address the challenges of sparse, high-dimensional data with frequent schema changes in a more efficient manner than the commonly used Entity Attribute Value (EAV) model. The document describes the limitations of EAV, presents the organizational structure and benefits of OEAV, and evaluates its performance compared to EAV through implementation and analysis.
Mining Irregular Association Rules based on Action & Non-action Type Data - razanpaul
This document proposes an algorithm to efficiently discover irregular association rules from large databases. Irregular rules represent patterns that occur rarely together, such as wrong decisions, illegal practices, or variability in decisions. The algorithm treats items as either actions (decisions, actions, outputs) or non-actions (facts, statements, criteria). It finds rules where the antecedent contains non-action items that occur frequently and the consequent contains rare action items. This approach can detect fraud, abuse, or other irregularities more effectively than methods that only consider item frequency. The algorithm's effectiveness is demonstrated on real patient data.
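The core filter described above, a frequent non-action antecedent paired with a rare action consequent, can be sketched as follows. This is an illustrative simplification of the paper's algorithm; the item names, thresholds, and single-item rule form are assumptions.

```python
def irregular_rules(transactions, actions, min_ante_support, max_cons_support):
    """Flag (fact, action) pairs where a frequent non-action item co-occurs
    with a rare action item -- a candidate irregularity."""
    n = len(transactions)
    support = {}
    for t in transactions:
        for item in t:
            support[item] = support.get(item, 0) + 1
    found = []
    for fact, cnt in support.items():
        # Antecedent must be a non-action item that occurs frequently.
        if fact in actions or cnt / n < min_ante_support:
            continue
        for act in actions:
            co = sum(1 for t in transactions if fact in t and act in t)
            # Consequent action must co-occur, but only rarely.
            if 0 < co / n <= max_cons_support:
                found.append((fact, act, co / n))
    return found
```

For example, if "chest pain" (a fact) almost always leads to an ECG but in rare cases co-occurs with "discharged without tests" (an action), the rare pairing is flagged as a potential wrong decision.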
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
This document provides an overview of data mining techniques and tools. It discusses data mining processes like predictive and descriptive data mining. It describes various data mining tasks such as classification, clustering, regression, and association rule learning. It then examines specific techniques for prediction using data mining, including classification analysis, association rule learning, decision trees, neural networks, and clustering analysis. Finally, it reviews several popular open-source tools that can be used to implement these data mining techniques, such as RapidMiner, Oracle Data Mining, IBM SPSS Modeler, KNIME, Python, Orange, Kaggle, Rattle, and Weka.
IJCER (www.ijceronline.com) International Journal of computational Engineerin... - ijceronline
This document summarizes a research paper that proposes a machine learning approach to identify disease-treatment relationships from biomedical text. It extracts sentences mentioning diseases and treatments from medical publications and classifies the semantic relationships between them. The researchers evaluate their methodology on a dataset of sentences annotated with diseases, treatments and their relationships. Their results show the machine learning models can reliably extract this information and outperform previous methods on the same data. The proposed approach could be integrated into applications to disseminate healthcare information from published literature to medical professionals and patients.
HEALTH PREDICTION ANALYSIS USING DATA MINING - Ashish Salve
The health care industry relies heavily on assumptions that are tested and verified through various examinations, and patients must depend on the doctor's knowledge of the topic. We therefore built a system that uses data mining techniques to predict a person's health from medical test results. The system is currently designed only for heart conditions. For training we used the Statlog (Heart) Data Set from the UCI Machine Learning Repository, which includes attributes such as age, sex, chest pain type, cholesterol, blood sugar, and outcomes. Only a few general inputs need to be passed to generate a prediction. The predictions from all the algorithms are merged by calculating their mean value, which gives the final outcome of the prediction process; all of this runs in the background.
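The merging step described above, averaging the outputs of several algorithms into one score, can be sketched in a few lines. The probabilities and threshold here are illustrative assumptions, not values from the system.

```python
def ensemble_predict(probabilities, threshold=0.5):
    """Average the disease probabilities produced by several models;
    report positive if the mean exceeds the threshold."""
    mean = sum(probabilities) / len(probabilities)
    return mean, mean > threshold
```

For instance, if three classifiers score a patient at 0.8, 0.6, and 0.7, the mean of 0.7 is reported as the system's outcome.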
Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey - ijtsrd
In health care, clinical diagnosis is generally made from a doctor's knowledge and experience. Computer-aided decision support systems play a major role in the medical field. Data mining provides the methodology and technology to turn these growing volumes of data into valuable information for decision making; using data mining techniques, diseases can be predicted in less time and with greater accuracy. Given the expanding research on coronary disease prediction systems, it has become important to classify the research results and give readers an overview of current coronary disease prediction strategies. Data mining tools can answer questions that traditionally took a great deal of time to resolve. In this paper we survey papers in which at least one data mining algorithm is used for the prediction of coronary disease. The survey observes that the Naïve Bayes technique increases the accuracy of coronary disease prediction systems. The commonly used techniques for heart disease prediction and their complexities are outlined in this paper. D. Haripriya | Dr. M. Lovelin Ponn Felciah "Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26605.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26605/prognosis-of-cardiac-disease-using-data-mining-techniques-a-comprehensive-survey/d-haripriya
Android Based Questionnaires Application for Heart Disease Prediction System - ijtsrd
Classification techniques in data mining are among the most popular today for prediction and data exploration. This Heart Disease Prediction System (HDPS) uses Naive Bayesian classification, comparing simple probability with Jelinek-Mercer (JM) smoothing. It is implemented as an Android application: the user answers a questionnaire and can then view the result in different ways, namely whether heart disease is present or not, with predictions of No, Low, Average, High, or Very High. The system also provides the required suggestions, such as doctor details and medications, to patients. It is further shown that Naive Bayes enhanced with Jelinek-Mercer smoothing is effective at eliminating noise when predicting heart disease. The system can also compute classifier accuracy using precision and recall. Nan Yu Hlaing | Phyu Pyar Moe "Android Based Questionnaires Application for Heart Disease Prediction System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26750.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26750/android-based-questionnaires-application-for-heart-disease-prediction-system/nan-yu-hlaing
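Jelinek-Mercer smoothing interpolates a class-conditional maximum-likelihood estimate with a background (whole-collection) estimate: P(f|c) = λ·P_ml(f|c) + (1-λ)·P(f). A minimal Naive Bayes classifier using this smoothing is sketched below; the questionnaire features, λ = 0.7, and data structures are illustrative assumptions, not the HDPS implementation.

```python
import math

def jm_naive_bayes(train, lam=0.7):
    """train: list of (answers, label), answers being a dict feature -> value.
    Returns a classifier using Jelinek-Mercer smoothed likelihoods."""
    labels, global_counts, total = {}, {}, 0
    for answers, label in train:
        labels.setdefault(label, []).append(answers)
        for f, v in answers.items():
            global_counts[(f, v)] = global_counts.get((f, v), 0) + 1
            total += 1

    def classify(answers):
        best, best_lp = None, -math.inf
        for label, rows in labels.items():
            lp = math.log(len(rows) / len(train))  # class prior
            for f, v in answers.items():
                ml = sum(1 for r in rows if r.get(f) == v) / len(rows)
                bg = global_counts.get((f, v), 0) / total
                # JM smoothing: mix class-conditional and background estimates,
                # so an unseen feature value never zeroes out the whole product.
                p = lam * ml + (1 - lam) * bg
                lp += math.log(p) if p > 0 else -math.inf
            if lp > best_lp:
                best, best_lp = label, lp
        return best

    return classify
```

Because the background term keeps every likelihood positive for values seen anywhere in the data, the smoothed model is less sensitive to noisy or sparse questionnaire answers than plain maximum likelihood.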
Deliverable 3 - Evaluate Research and Data, Attempt 2 - EttaBenton28
The research question examined how AI integration in clinical radiology could disrupt the industry. Two articles presented credible data by citing authors with advanced degrees from reputable institutions. Participants generally agreed AI improved accuracy and efficiency in radiology IT departments. However, some radiologists viewed AI negatively as a potential job threat. More research is needed to validate AI operations and address professional concerns to facilitate acceptance.
Metadata from electronic health records was used to evaluate whether physicians in training reviewed patient radiographic images before presenting cases to attending surgeons. The metadata automatically recorded the unique identifiers of physicians when they accessed patient records and images, allowing researchers to analyze viewing patterns over a 5-month period. This demonstrated that metadata can profile clinical practices and replace traditional observation methods for evaluating physician behavior.
This document proposes a conceptual model for automatically matching individuals with health researchers for research studies using electronic medical record data. The model involves selecting relevant medical measurements for a "candidate" research participant, filtering individuals based on rules, reducing the data dimensions using principal component analysis, and calculating similarity between individuals' medical data using similarity coefficients. A simulation applies the model to a medical data set and demonstrates that it can significantly reduce the data needed to automatically match individuals for health research.
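The final matching step, scoring similarity between individuals' medical data, can be sketched with a cosine similarity coefficient. This is an illustrative stand-in: the PCA dimension-reduction and rule-filtering stages are omitted, and the measurement vectors and threshold are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two numeric measurement vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match(candidate, cohort, threshold=0.95):
    """Return indices of individuals whose (already reduced) measurement
    vectors are similar enough to the candidate participant's profile."""
    return [i for i, person in enumerate(cohort) if cosine(candidate, person) >= threshold]
```

In the proposed model, `candidate` and `cohort` would hold the principal-component scores produced by the dimension-reduction stage rather than raw measurements.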
Using rule based classifiers for the predictive analysis of breast cancer rec... - Alexander Decker
The document discusses using rule-based classifiers to predict breast cancer recurrence. It analyzed 286 cancer patient records using data mining tools including RIPPER, decision trees (DT), and decision tables with naive Bayes (DTNB). Experimental results found DTNB provided the most accurate predictions of recurrence compared to the other classifiers. The generated rule set from DTNB can be used to label new patients as developing or not developing recurrence based on their characteristics, assisting doctors in making faster decisions.
The document describes a proposed clinical decision support system that uses k-means clustering and an artificial neural network with particle swarm optimization to classify patient data and determine diagnoses. It begins with background on clinical decision making and existing systems. It then outlines the proposed system, which clusters patient data using k-means and trains an artificial neural network using particle swarm optimization and backpropagation to classify new patient data and determine optimal treatment. The combination of these techniques is meant to improve accuracy and efficiency while reducing time consumption and costs compared to other methods.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE - ijistjournal
This document discusses clustering dichotomous health care data using the K-means algorithm after transforming the data using Wiener transformation. It begins with an introduction to dichotomous data and the challenges of clustering medical data. It then describes the K-means clustering algorithm and various distance measures used for binary data clustering. The document proposes using Wiener transformation to first transform binary data to real values before applying K-means clustering. It evaluates the results on a lens dataset using inter-cluster and intra-cluster distances, finding the transformed data yields better clusters than the original binary data according to these metrics.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE - ijistjournal
Dichotomous data is a type of categorical data that is binary, with categories zero and one. Health care data is one of the most heavily used kinds of categorical data. Binary data is the simplest form of data used in health care databases, where close-ended questions can be asked; it is very efficient in computational cost and memory capacity for representing categorical data. Clustering health care or medical data is very tedious due to its complex data representation models, high dimensionality, and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real values by the Wiener transformation. The proposed algorithm can be used to determine the correlation of health disorders and symptoms observed in large medical and health binary databases. Computational results show that clustering based on the Wiener transformation is very efficient in terms of objectivity and subjectivity.
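The pipeline, transform binary answers into real values and then cluster, can be sketched as follows. The Wiener filter here is a simplified 1-D local mean/variance shrinkage standing in for the paper's transformation, the clustering is a plain two-cluster Lloyd's algorithm, and all data values are invented for illustration.

```python
def wiener_1d(signal, window=3):
    """Simplified 1-D Wiener filter: shrink each sample toward its local mean,
    with the noise power estimated as the mean of the local variances."""
    n = len(signal)
    half = window // 2
    means, variances = [], []
    for i in range(n):
        seg = signal[max(0, i - half):i + half + 1]
        m = sum(seg) / len(seg)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in seg) / len(seg))
    noise = sum(variances) / n
    out = []
    for x, m, v in zip(signal, means, variances):
        gain = (v - noise) / v if v > noise else 0.0
        out.append(m + gain * (x - m))
    return out

def kmeans2(points, iters=10):
    """Two-cluster Lloyd's algorithm, seeded with the first point and the
    point farthest from it; returns the two centroids."""
    d = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    c0 = list(points[0])
    c1 = list(max(points, key=lambda p: d(p, c0)))
    for _ in range(iters):
        g0 = [p for p in points if d(p, c0) <= d(p, c1)]
        g1 = [p for p in points if d(p, c0) > d(p, c1)]
        if g0:
            c0 = [sum(col) / len(g0) for col in zip(*g0)]
        if g1:
            c1 = [sum(col) / len(g1) for col in zip(*g1)]
    return c0, c1
```

Each patient's row of 0/1 answers is passed through `wiener_1d`, and the resulting real-valued vectors are clustered with `kmeans2`; the transformation spreads the binary values so the distance computations discriminate better than on the raw bits.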
AI and Big Data in Psychiatry: An Introduction and Overview - Carlo Carandang
Dr. Carlo Carandang, a psychiatrist and data scientist, talks about how Big Data can be implemented into clinical psychiatric practice to improve patient care and reduce costs. Dr. Carandang introduces Big Data topics, Big Data systems, machine learning algorithms, and AI psychiatry applications. Dr. Carandang presented this talk at the 2019 Presidential Symposium in Washington, DC, sponsored by the Washington Psychiatric Society.
Possible Solution for Managing the Worlds Personal Genetic Data - DNA Guide, ... - DNA Compass
World DNA Day and Genome Day, Dalian China 2011
"Possible Solution for Managing the Worlds Genetic Data" given by Alice Rathjen, Founder & President DNA Guide, Inc.
Proposes genetic tests be given a rating for quality of science, medical utility and viewing risk so as to facilitate the flow of genetic information in a responsible manner from the lab to the physician and patient. Explains how technology combined with public policy could enable both privacy and personalized medicine to thrive. Advocates individual ownership over personal genetic data and suggests the genome as a data format could provide the foundation for digital human rights.
tags: DNA, genetic testing, privacy, personalized medicine, FDA regulation
Optimized Column-Oriented Model: A Storage and Search Efficient Representatio...razanpaul
The document proposes a new data model called Optimized Column-Oriented Model (OCOM) to represent medical data. OCOM aims to be more storage and search efficient compared to the commonly used Entity-Attribute-Value (EAV) model. It stores data in a column-oriented format and uses a compact representation that concatenates the position and value of non-null elements into a single integer. This allows for efficient binary search and retrieval compared to EAV, which stores attribute names as data. An experimental evaluation showed that OCOM occupies less storage space and is more search efficient than EAV for medical data warehousing queries.
Clustering Medical Data to Predict the Likelihood of Diseasesrazanpaul
This document proposes a constraint k-Means-Mode clustering algorithm to predict the likelihood of diseases using medical data containing both continuous and categorical attributes. It first maps complex medical data to mineable items using domain dictionaries and rule bases. The developed algorithm can handle both continuous and discrete data, perform clustering based on anticipated likelihood attributes with core disease attributes, and was tested on a real-world patient dataset to demonstrate its effectiveness.
Search Efficient Representation of Healthcare Data based on the HL7 RIMrazanpaul
This document proposes a search efficient data model called Optimized Entity Attribute Value (OEAV) to represent healthcare data based on the HL7 Reference Information Model (RIM). OEAV aims to address the challenges of sparse, high-dimensional data with frequent schema changes in a more efficient manner than the commonly used Entity Attribute Value (EAV) model. The document describes the limitations of EAV, presents the organizational structure and benefits of OEAV, and evaluates its performance compared to EAV through implementation and analysis.
East Aurora has a strong basketball team that will be a formidable opponent. Their record and past performances indicate they are among the best teams in the league. This upcoming game will be a major test for the team and winning will require playing their best game of the season.
Mining Irregular Association Rules based on Action & Non-action Type Datarazanpaul
This document proposes an algorithm to efficiently discover irregular association rules from large databases. Irregular rules represent patterns that occur rarely together, such as wrong decisions, illegal practices, or variability in decisions. The algorithm treats items as either actions (decisions, actions, outputs) or non-actions (facts, statements, criteria). It finds rules where the antecedent contains non-action items that occur frequently and the consequent contains rare action items. This approach can detect fraud, abuse, or other irregularities more effectively than methods that only consider item frequency. The algorithm's effectiveness is demonstrated on real patient data.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
This document provides an overview of data mining techniques and tools. It discusses data mining processes like predictive and descriptive data mining. It describes various data mining tasks such as classification, clustering, regression, and association rule learning. It then examines specific techniques for prediction using data mining, including classification analysis, association rule learning, decision trees, neural networks, and clustering analysis. Finally, it reviews several popular open-source tools that can be used to implement these data mining techniques, such as RapidMiner, Oracle Data Mining, IBM SPSS Modeler, KNIME, Python, Orange, Kaggle, Rattle, and Weka.
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
This document summarizes a research paper that proposes a machine learning approach to identify disease-treatment relationships from biomedical text. It extracts sentences mentioning diseases and treatments from medical publications and classifies the semantic relationships between them. The researchers evaluate their methodology on a dataset of sentences annotated with diseases, treatments and their relationships. Their results show the machine learning models can reliably extract this information and outperform previous methods on the same data. The proposed approach could be integrated into applications to disseminate healthcare information from published literature to medical professionals and patients.
HEALTH PREDICTION ANALYSIS USING DATA MINING (Ashish Salve)
As we know, the health care industry relies heavily on assumptions, which are tested and verified via various tests, and the patient has to depend on the doctor's knowledge of the topic. We therefore made a system that uses data mining techniques to predict the health of a person based on various medical test results. The system is currently designed only for heart issues; for that we used the Statlog (Heart) Data Set from the UCI Machine Learning Repository, which includes attributes like age, sex, chest pain type, cholesterol, sugar, and outcomes, for training the system. Only a few general inputs need to be passed in order to generate the prediction, and the prediction results from all algorithms are merged together by calculating their mean value; that value is the actual outcome of the prediction process, which runs entirely in the background.
Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey (ijtsrd)
In the healthcare industry, clinical diagnosis is generally made based on a doctor's knowledge and practice. Computer-aided decision support systems play a major role in the medical field. Data mining provides the methodology and technology to turn these growing volumes of data into valuable information for decision making, and data mining techniques allow diseases to be predicted in less time and with more accuracy. Given the expanding research on coronary disease prediction systems, it has become important to classify the research results and give readers an overview of current coronary disease prediction strategies. In this paper we survey papers in which at least one data mining algorithm is used for the prediction of coronary disease. From the survey it is observed that the Naïve Bayes technique increases the accuracy of coronary disease prediction systems. The commonly used techniques for heart disease prediction and their complexities are outlined in this paper. D. Haripriya | Dr. M. Lovelin Ponn Felciah, "Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 3, Issue 5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd26605.pdf | Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26605/prognosis-of-cardiac-disease-using-data-mining-techniques-a-comprehensive-survey/d-haripriya
Android Based Questionnaires Application for Heart Disease Prediction System (ijtsrd)
Classification techniques are today among the most popular data mining methods for prediction and data exploration. This Heart Disease Prediction System (HDPS) uses Naive Bayesian classification, comparing simple probability with Jelinek-Mercer (JM) smoothing. It is implemented as an Android application: the user answers the questionnaire and can then view the result in different ways, namely whether heart disease is present or not, with prediction levels of No, Low, Average, High, or Very High. The system also provides required suggestions, such as doctor details and medications, to patients. It is also shown that enhanced Naive Bayes with the Jelinek-Mercer smoothing technique is effective at eliminating noise when predicting heart disease. The system can also calculate classifier accuracy using precision and recall. Nan Yu Hlaing | Phyu Pyar Moe, "Android Based Questionnaires Application for Heart Disease Prediction System", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 3, Issue 5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd26750.pdf | Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26750/android-based-questionnaires-application-for-heart-disease-prediction-system/nan-yu-hlaing
Deliverable 3 - Evaluate Research and Data, Attempt 2 (EttaBenton28)
The research question examined how AI integration in clinical radiology could disrupt the industry. Two articles presented credible data by citing authors with advanced degrees from reputable institutions. Participants generally agreed AI improved accuracy and efficiency in radiology IT departments. However, some radiologists viewed AI negatively as a potential job threat. More research is needed to validate AI operations and address professional concerns to facilitate acceptance.
Metadata from electronic health records was used to evaluate whether physicians in training reviewed patient radiographic images before presenting cases to attending surgeons. The metadata automatically recorded the unique identifiers of physicians when they accessed patient records and images, allowing researchers to analyze viewing patterns over a 5-month period. This demonstrated that metadata can profile clinical practices and replace traditional observation methods for evaluating physician behavior.
This document proposes a conceptual model for automatically matching individuals with health researchers for research studies using electronic medical record data. The model involves selecting relevant medical measurements for a "candidate" research participant, filtering individuals based on rules, reducing the data dimensions using principal component analysis, and calculating similarity between individuals' medical data using similarity coefficients. A simulation applies the model to a medical data set and demonstrates that it can significantly reduce the data needed to automatically match individuals for health research.
Using rule based classifiers for the predictive analysis of breast cancer rec... (Alexander Decker)
The document discusses using rule-based classifiers to predict breast cancer recurrence. It analyzed 286 cancer patient records using data mining tools including RIPPER, decision trees (DT), and decision tables with naive Bayes (DTNB). Experimental results found DTNB provided the most accurate predictions of recurrence compared to the other classifiers. The generated rule set from DTNB can be used to label new patients as developing or not developing recurrence based on their characteristics, assisting doctors in making faster decisions.
The document describes a proposed clinical decision support system that uses k-means clustering and an artificial neural network with particle swarm optimization to classify patient data and determine diagnoses. It begins with background on clinical decision making and existing systems. It then outlines the proposed system, which involves clustering patient data using k-means, and training an artificial neural network using particle swarm optimization and backpropagation to classify new patient data and determine optimal treatment. The combination of these techniques is meant to improve accuracy, efficiency, time consumption and costs compared to other methods.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE (ijistjournal)
This document discusses clustering dichotomous health care data using the K-means algorithm after transforming the data using Wiener transformation. It begins with an introduction to dichotomous data and the challenges of clustering medical data. It then describes the K-means clustering algorithm and various distance measures used for binary data clustering. The document proposes using Wiener transformation to first transform binary data to real values before applying K-means clustering. It evaluates the results on a lens dataset using inter-cluster and intra-cluster distances, finding the transformed data yields better clusters than the original binary data according to these metrics.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE (ijistjournal)
Dichotomous data is a type of categorical data that is binary, with categories zero and one. Health care data is one of the most heavily used kinds of categorical data. Binary data is the simplest form of data used in health care databases, where close-ended questions can be used; it is very efficient in terms of computational cost and memory capacity for representing categorical data. Clustering health care or medical data is very tedious due to its complex data representation models, high dimensionality, and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real values by the Wiener transformation. The proposed algorithm can be used to determine correlations between the health disorders and symptoms observed in large binary medical and health databases. Computational results show that clustering based on the Wiener transformation is very efficient in terms of objectivity and subjectivity.
AI and Big Data in Psychiatry: An Introduction and Overview (Carlo Carandang)
Dr. Carlo Carandang, a psychiatrist and data scientist, talks about how Big Data can be implemented into clinical psychiatric practice to improve patient care and reduce costs. Dr. Carandang introduces Big Data topics, Big Data systems, machine learning algorithms, and AI psychiatry applications. Dr. Carandang presented this talk at the 2019 Presidential Symposium in Washington, DC, sponsored by the Washington Psychiatric Society.
Possible Solution for Managing the Worlds Personal Genetic Data - DNA Guide, ... (DNA Compass)
World DNA Day and Genome Day, Dalian China 2011
"Possible Solution for Managing the Worlds Genetic Data" given by Alice Rathjen, Founder & President DNA Guide, Inc.
Proposes genetic tests be given a rating for quality of science, medical utility and viewing risk so as to facilitate the flow of genetic information in a responsible manner from the lab to the physician and patient. Explains how technology combined with public policy could enable both privacy and personalized medicine to thrive. Advocates individual ownership over personal genetic data and suggests the genome as a data format could provide the foundation for digital human rights.
tags: DNA, genetic testing, privacy, personalized medicine, FDA regulation
This document discusses developing a pet care application using machine learning. It aims to use CNNs to predict dog breeds from photos and decision trees to predict diseases. Accurately identifying breeds and diseases early could help pet owners provide better care and save pet lives. The document reviews related work using deep learning for tasks like image classification. It proposes a system to first use a CNN for breed prediction from images then evaluate algorithms like decision trees for disease prediction from pet data. The goal is to integrate machine learning into veterinary healthcare to optimize treatment and enable early diagnosis.
Patterns discovered from collected molecular profiles of patient tumour samples, together with clinical metadata, could be used to provide personalized cancer treatment to patients with similar molecular subtypes. Computational algorithms for cancer diagnosis, prognosis, and therapeutics are needed that can recognize specific functions and aid in building classifiers based on the plethora of publicly accessible cancer research outcomes. Machine learning, a branch of artificial intelligence, has a great deal of potential for problem solving in cryptic cancer datasets, as per a literature study. We focus on the current state of machine learning applications in cancer research in this study, illustrating trends and analysing major accomplishments, roadblocks, and challenges along the way to clinical implementation. In the context of noninvasive cancer treatment using diet-based and natural biomarkers, we propose a novel machine learning algorithm.
HEALTH PREDICTION ANALYSIS USING DATA MINING (Ashish Salve)
Data mining techniques are used for a variety of applications. In the healthcare industry, data mining plays an important role in predicting diseases. Detecting a disease normally requires a number of tests on the patient, but with data mining techniques the number of tests can be reduced, which matters for both time and performance. This report analyses data mining techniques that can be used for predicting different types of diseases, reviewing research papers that concentrate mainly on disease prediction.
DISEASE PREDICTION SYSTEM USING SYMPTOMS (IRJET Journal)
This document describes a disease prediction system that uses machine learning techniques like naive bayes classification to predict diseases based on patient symptoms. The system collects medical data from various sources to build a dataset with 5000 rows and 133 columns. It preprocesses the data and builds a model using naive bayes classification that is able to accurately predict diseases from the test data with 100% accuracy based on a confusion matrix. The system architecture allows patients to input symptoms and receives a predicted disease and confidence score. It then refers the patient to doctors specialized in the predicted disease to enable online consultation.
Similar to Finding Symmetric Association Rules to Support Medical Qualitative Research (20)
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels (Northern Engraving)
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
"Choosing proper type of scaling", Olena Syrota (Fwdays)
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
QA or the Highway - Component Testing: Bridging the gap between frontend appl... (zjhamm304)
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk (Fwdays)
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
AppSec PNW: Android and iOS Application Security with MobSF (Ajin Abraham)
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor... (GlobalLogic Ukraine)
During the talk we will answer the question of why application performance needs to be improved and what the most effective ways to do so are. We will also discuss what a cache is, what types of caches exist, and, most importantly, how to find a performance bottleneck.
Video and event details: https://bit.ly/45tILxj
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba (Fwdays)
This is a session that details how PostgreSQL's features and Azure AI Services can be effectively used to significantly enhance the search functionality in any application.
In this session, we'll share insights on how we used PostgreSQL to facilitate precise searches across multiple fields in our mobile application. The techniques include using LIKE and ILIKE operators and integrating a trigram-based search to handle potential misspellings, thereby increasing the search accuracy.
We'll also discuss how the azure_ai extension on PostgreSQL databases in Azure and Azure AI Services were utilized to create vectors from user input, a feature beneficial when users wish to find specific items based on text prompts. While our application's case study involves a drug search, the techniques and principles shared in this session can be adapted to improve search functionality in a wide range of applications. Join us to learn how PostgreSQL and Azure AI can be harnessed to enhance your application's search capability.
Session 1 - Intro to Robotic Process Automation.pdf (UiPathCommunity)
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... (Jason Yip)
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
From Natural Language to Structured Solr Queries using LLMs (Sease)
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive” gap) remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
Must Know Postgres Extension for DBA and Developer during Migration (Mydbops)
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
What is an RPA CoE? Session 2 – CoE Roles (DianaGray10)
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips (ScyllaDB)
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
Medical domain experts have the knowledge of how to map ranges of numerical data for each attribute to a series of items. For example, there are certain conventions for considering a person young, adult, or elderly with respect to age. A set of rules is created for each continuous numerical attribute using the knowledge of medical domain experts, and a rule engine maps continuous numerical data to items using these developed rules.

We have used a domain dictionary approach to transform the data for which medical domain expert knowledge is not applicable into numerical form. As the cardinality of attributes other than continuous numerical ones is not high in the medical domain, these attribute values are mapped to integer values using medical domain dictionaries. The mapping process is therefore divided into two phases. Phase 1: a rule base is constructed based on the knowledge of medical domain experts, and dictionaries are constructed for attributes where domain expert knowledge is not applicable. Phase 2: attribute values are mapped to integer values using the corresponding rule base and the dictionaries.
Figure 1. Data transformation of medical data. The figure shows the pipeline: a dictionary is generated for each categorical attribute, e.g. Diagnosis (Headache → 1, Fever → 2) and Smoke (Yes → 1, No → 2), while medical domain knowledge yields a rule base for the remaining attributes (if age <= 12 then 1; if 13 <= age <= 60 then 2; if age > 60 then 3; if smoke = Y then 1; if smoke = N then 2; if sex = M then 1; if sex = F then 2). Actual patient data, e.g. (1020D, age 33, smoke Yes, diagnosis Headache) and (1021D, 63, No, Fever), are mapped to integer items using the rule base and dictionaries, giving (1020D, 2, 1, 1) and (1021D, 3, 2, 2): data suitable for knowledge discovery.
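To make the two-phase mapping concrete, here is a minimal Python sketch using the dictionary entries and age/smoke rules shown in Figure 1. The function and variable names (map_age, transform, and so on) are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the two-phase transformation from Figure 1 (names are illustrative).

# Phase 1a: rule base for continuous attributes, from medical domain knowledge
def map_age(age):
    if age <= 12:
        return 1
    if age <= 60:
        return 2
    return 3

# Phase 1b: dictionaries for categorical attributes where expert rules
# are not applicable
DIAGNOSIS_DICT = {"Headache": 1, "Fever": 2}
SMOKE_DICT = {"Yes": 1, "No": 2}

# Phase 2: map every attribute value of a patient record to an integer item
def transform(record):
    return {
        "PatientID": record["PatientID"],              # identifier kept as-is
        "Age": map_age(record["Age"]),                 # via rule base
        "Smoke": SMOKE_DICT[record["Smoke"]],          # via dictionary
        "Diagnosis": DIAGNOSIS_DICT[record["Diagnosis"]],
    }

actual_data = [
    {"PatientID": "1020D", "Age": 33, "Smoke": "Yes", "Diagnosis": "Headache"},
    {"PatientID": "1021D", "Age": 63, "Smoke": "No", "Diagnosis": "Fever"},
]
mapped = [transform(r) for r in actual_data]
# mapped[0] -> {"PatientID": "1020D", "Age": 2, "Smoke": 1, "Diagnosis": 1}
# mapped[1] -> {"PatientID": "1021D", "Age": 3, "Smoke": 2, "Diagnosis": 2}
```

The mapped rows match the "data suitable for knowledge discovery" table in Figure 1.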
3. The proposed algorithm

The main theme of this algorithm is based on the following two statements: interesting relationships among various medical attributes are concealed in subsets of the attributes but do not emerge when all attributes are taken together, and interesting relationships among various medical attributes do not all have the same support and confidence. The algorithm constructs candidate itemsets based on group constraints and uses the corresponding support of each group in the candidate selection process to discover all desired itemsets of that group. The goals of this algorithm are to find the rules desired by medical researchers and to run fast. The features of the proposed algorithm are as follows:

- It allows grouping of attributes to find relationships among medical attributes, which provides control over the search process.
- Minimum confidence and support can vary from one group to another.
- One item can belong to several groups.
- Attributes are constrained to appear on the antecedent side, the consequent side, or both sides of a rule.
- It does not generate subsets of the full desired itemset, but only generates subsets for items that can appear in both the consequent and the antecedent.
- Uninteresting relationships among medical attributes are avoided in the candidate generation phase, which reduces the number of rules, finds only interesting relationships, and makes the algorithm fast.

Confidence is not a perfect method to rank symmetric medical relationships because it does not account for the frequency of the consequent together with the antecedent. For ranking a medical relationship, a direct measure of association between the variables is a more suitable scheme. For a medical relationship s → t, s is a group of medical items where each item is constrained to appear in the antecedent or both sides, and t is a group of medical items where each item is constrained to appear in the consequent or both sides. Moreover, s ∩ t = Ø. For this relationship, the support is defined as support = P(s, t) and the confidence is defined as confidence = P(s, t)/P(s), where P is the probability. The correlation coefficient (also known as the φ-coefficient) measures the degree of relationship between two random variables by measuring their degree of linear interdependency. It is defined as the covariance between the two variables divided by their standard deviations:

φ_st = Cov(s, t) / (σ_s σ_t)

Here Cov(s, t) represents the covariance of the two variables, and σ_s and σ_t stand for their standard deviations.
The covariance measures how two variables change together:

Cov(s, t) = P(s, t) − P(s)P(t)

As we know, the standard deviation is the square root of the variance, and the variance is a special case of the covariance where the two variables are identical:

σ_s = √Var(s) = √Cov(s, s) = √(P(s, s) − P(s)P(s)) = √(P(s) − P(s)²)

Similarly, σ_t = √(P(t) − P(t)²). Therefore

φ_st = (P(s, t) − P(s)P(t)) / √((P(s) − P(s)²)(P(t) − P(t)²))

Here P(s, t) is the support of the itemset consisting of both s and t; let this support be S_st. P(s) and P(t) are the supports of the antecedent s and the consequent t, respectively; let them be S_s and S_t. The values of S_st, S_s, and S_t are computed during the desired itemset generation of our proposed algorithm. Using these values, we can calculate the correlation of every medical relationship rule between one group of medical items and another:

φ_st = (S_st − S_s S_t) / √((S_s − S_s²)(S_t − S_t²))

The correlation value indicates to medical researchers how strong a medical relationship is from the perspective of historical data. So, putting the values of S_st, S_s, and S_t into the association rule generation phase, we obtain a single metric, the correlation coefficient, that represents how strongly the antecedent and consequent are medically related. For each medical relationship or rule, this metric indicates the degree of relationship between one group of items and another to support medical qualitative research. The range of values for φ is between −1 and +1. If two variables are independent, then φ equals 0. When φ equals +1, the variables are considered perfectly positively correlated. A positive correlation is evidence of a general tendency that when a group of attribute values s occurs for a patient, another group of attribute values t also occurs for the same patient.

Moreover, an item can belong to zero or more groups. A 1-itemset is selected if its support is greater than or equal to the support threshold of one of its corresponding groups. As medical attribute values contain patient information that is multidimensional, the algorithm performs the count operation by comparing the values of attributes, instead of merely determining the presence or absence of attribute values, to calculate support.

3.1. Candidate generation and selection

The intuition behind candidate generation in all level-wise algorithms like Apriori is based on the following simple fact: every subset of a frequent itemset is frequent, which reduces the number of itemsets that have to be checked. In contrast, the idea behind candidate generation in the proposed algorithm is that every item in an itemset has to be in the same group. This produces new candidates consisting of items from the same group and keeps itemsets that consist of both rare items and highly frequent items. If all the items in a new candidate set are in the same group, it is selected as a valid candidate; otherwise it is not added to the valid candidate itemsets. Here, each group has its own support and confidence thresholds, and each candidate itemset belongs to a particular group. After finding the group id of a candidate itemset, the algorithm uses the corresponding support threshold for candidate selection, whereas Apriori uses a single support threshold for all candidate itemsets. In this way, the itemsets desired by medical researchers are explored.

3.2. Generating association rules

Let AC(item) be the function which returns one of three values: 1 if the item is constrained to be in the antecedent of a rule, 2 if it is constrained to be in the consequent, and 0 if it can be in either. Using this function, an itemset is partitioned into an antecedent set, a consequent set, and a "both" set. Moreover, the algorithm does not apply subset generation to full itemsets to form rules like conventional association mining algorithms; it only applies subset generation to the items that can appear in both the antecedent and the consequent.
uses subset generation to both set. Each subset of
positive value means the relationship is more strong.
both set is added in antecedent part in one rule and is
When equals -1 the variables are considered
added in consequent part in another rule. Each
perfectly negatively correlated.
itemset belongs to a particular group. In addition to,
Figure 2 shows the association-mining algorithm
there is a different confidence for each group
to support medical research. Like Apriori, our
whereas Apriori uses a single confidence for all the
algorithm is also based on level wise search. The
itemsets. After finding group id of an itemset, the
major difference in our proposed algorithm is
algorithm uses corresponding confidence to form
candidate generation process with Apriori. Each item
rules. By this way, rules are explored which are
consists of attribute name and its value. Having
desired of medical researchers.
retrieved information of a 1-itemset, we make a new
1-itemset if this 1-itemset is not created already,
otherwise update its support. The 1-itemset can
83
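The correlation metric can be computed directly from the three supports recorded during itemset generation. The following minimal Python sketch is for illustration only (the paper's implementation is in C#), and the supports in the example are hypothetical:

```python
import math

def rule_correlation(s_st: float, s_s: float, s_t: float) -> float:
    """Correlation between antecedent s and consequent t, computed from the
    supports S_st, S_s and S_t recorded during itemset generation:
        rho = (S_st - S_s*S_t) / sqrt((S_s - S_s^2) * (S_t - S_t^2))
    Returns 0.0 when a marginal support is 0 or 1 (the variance degenerates)."""
    denom = math.sqrt((s_s - s_s ** 2) * (s_t - s_t ** 2))
    return (s_st - s_s * s_t) / denom if denom else 0.0

# Hypothetical supports: S_st = 0.3, S_s = 0.4, S_t = 0.5
rho = rule_correlation(0.3, 0.4, 0.5)  # positive: s and t co-occur more often than chance
```

A value near +1 flags a strong symmetric relationship worth reporting to the researcher, while a value near 0 indicates the antecedent and consequent co-occur about as often as independence would predict.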
Algorithm: Find itemsets which have high support and are in the same group.
Input: Data and metadata files.
Output: Itemsets which are desired by medical researchers.
1. K = 1;
2. Read the metadata about which attributes can only appear in the antecedent of a rule, which can only appear in the consequent, and which can appear in either.
3. Read the group information along with each group's support and confidence from the configuration file and build a dictionary, where the key is an attribute number and the value is the list of group numbers to which the corresponding attribute belongs.
4. Ik = Select 1-itemsets that have support greater than or equal to the support of at least one of their corresponding groups.
5. While (Ik != Ø)
   5.1 K++;
   5.2 CK = Candidate_generation(Ik-1)
   5.3 CalculateCandidatesSupport(CK)
   5.4 Ik = SelectDesiredItemSetFromCandidates(CK, GroupSupports);
   5.5 I = I U Ik
6. return I

procedure Candidate_generation(Ik-1: frequent (k-1)-itemsets)
1. for each itemset i1 in Ik-1
   1.1 for each itemset i2 in Ik-1
       1.1.1 new candidate NC = Union(i1, i2);
       1.1.2 if size of NC is k
             1.1.2.1 isInSameGroup = TestWhetherAllTheItemsInSameGroup(NC)
             1.1.2.2 if (isInSameGroup == true) add NC to CK, otherwise discard it.
2. return CK;

procedure SelectDesiredItemSetFromCandidates(CK, GroupSupports)
1. For each candidate c in CK
   1.1 j = FindGroupNoWhichHasMinimumSupportIfMultipleGroupsExist(c)
   1.2 If c.support >= GroupSupports[j]
       1.2.1 Add c to I
2. return I

Algorithm: Find association rules for decision supportability of medical researchers.
Input: I: itemsets, GroupConfidences
Output: R: set of rules
1. R = Ø
2. For each X in I
   2.1 j = FindGroupNoWhichHasMinimumConfidenceIfMultipleGroupsExist(X)
   2.2 Both set B = {b1, b2, ..., bn} where bi is in X and AC(bi) = 0
   2.3 Antecedent set AS = {as1, as2, ..., asn} where asi is in X and AC(asi) = 1
   2.4 Consequent set CS = {cs1, cs2, ..., csn} where csi is in X and AC(csi) = 2
   2.5 For each subset Y of B
       2.5.1 Y1 = B - Y;
       2.5.2 AS1 = AS U Y
       2.5.3 CS1 = CS U Y1
       2.5.4 if (support(AS1 U CS1) / support(AS1)) >= GroupConfidences[j]
             2.5.4.1 AS1 -> CS1 is a valid rule.
             2.5.4.2 R = R U {AS1 -> CS1}
       2.5.5 AS2 = AS U Y1
       2.5.6 CS2 = CS U Y
       2.5.7 if (support(AS2 U CS2) / support(AS2)) >= GroupConfidences[j]
             2.5.7.1 AS2 -> CS2 is a valid rule.
             2.5.7.2 R = R U {AS2 -> CS2}

Figure 2: Association mining algorithm to support medical research
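The core procedures of Figure 2 can be sketched as follows. This is a minimal Python illustration rather than the paper's C# implementation; the item ids, group assignments and thresholds are hypothetical, and items stand for (attribute, value) pairs already mapped to integers:

```python
from itertools import chain, combinations

# Hypothetical configuration: each item belongs to zero or more groups, and
# each group has its own minimum support and confidence (unlike Apriori's
# single global thresholds).
ITEM_GROUPS = {1: {0}, 2: {0}, 3: {0, 1}, 4: {1}}
GROUP_SUPPORTS = {0: 0.4, 1: 0.6}
GROUP_CONFIDENCES = {0: 0.6, 1: 0.8}
# AC(item): 1 = antecedent only, 2 = consequent only, 0 = either (the "both set")
AC = {1: 1, 2: 2, 3: 0, 4: 0}

def shared_groups(itemset):
    """Groups that every item of the itemset belongs to (the group constraint)."""
    return set.intersection(*(ITEM_GROUPS.get(i, set()) for i in itemset))

def candidate_generation(prev_level, k):
    """Join (k-1)-itemsets; keep a union only if it has size k and all of
    its items share at least one group."""
    return {frozenset(a | b) for a in prev_level for b in prev_level
            if len(a | b) == k and shared_groups(a | b)}

def select_desired(candidates, support):
    """Keep a candidate if its support reaches the threshold of the
    minimum-support group among the groups the whole candidate belongs to."""
    kept = set()
    for c in candidates:
        j = min(shared_groups(c), key=GROUP_SUPPORTS.get)
        if support(c) >= GROUP_SUPPORTS[j]:
            kept.add(c)
    return kept

def generate_rules(itemset, support):
    """Partition a desired itemset by AC(); each subset Y of the both set B
    joins the antecedent while B - Y joins the consequent, so one itemset
    yields at most 2^|B| rules (cf. Lemma 1)."""
    ante = {i for i in itemset if AC[i] == 1}
    cons = {i for i in itemset if AC[i] == 2}
    both = [i for i in itemset if AC[i] == 0]
    j = min(shared_groups(itemset), key=GROUP_CONFIDENCES.get)
    rules = []
    for y in chain.from_iterable(combinations(both, r) for r in range(len(both) + 1)):
        a, c = ante | set(y), cons | (set(both) - set(y))
        if support(a | c) / support(a) >= GROUP_CONFIDENCES[j]:
            rules.append((frozenset(a), frozenset(c)))
    return rules
```

Here `select_desired` mirrors `SelectDesiredItemSetFromCandidates`: the per-group support threshold is applied exactly where Apriori would apply a single global one, which is what lets rare-but-desired itemsets survive selection.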
3.2.1. Lemma 1. The number of rules is equal to sum_{i=1}^{k} 2^{L(D2_i)}, where k is the number of desired itemsets, L is the function which determines the number of items in an itemset, and D2 is the both set. The number of discarded rules is m^p - sum_{i=1}^{k} 2^{L(D2_i)}, where m is the average number of distinct values each multidimensional attribute holds and p is the number of attributes.

Proof: Let I = {i1, i2, ..., in} be the set of items, G = {g1, g2, ..., gq} the set of groups and R = {r1, r2, ..., rs} the set of restrictions. GS is the function which finds the group with the smallest threshold; if not all items are in the same group, GS returns NULL. A 1-itemset is selected if S(1-itemset) >= S(GS(1-itemset)), where S is the function which returns the support for an itemset. Let C = {c1, c2, ..., cx} be the set of candidate itemsets. A new candidate NC is added to C only if all of its items are in the same group, and a candidate ci is selected for rule generation if S(ci) >= S(GS(ci)). A desired itemset D is partitioned into three parts, D = {D0, D1, D2}: D0 is mapped to the antecedent items, D1 to the consequent items and D2 to both. Each subset d of D2 is added to both the antecedent and the consequent: when d is added to the antecedent, D2 - d is added to the consequent, and when d is added to the consequent, D2 - d is added to the antecedent. Each of the 2^{L(D2)} subsets of D2 yields two rules, and every rule is generated twice (once by d and once by D2 - d), so the number of rules from D is 2 * 2^{L(D2)} / 2 = 2^{L(D2)}. The total number of rules is therefore sum_{i=1}^{k} 2^{L(D2_i)}, where k is the number of desired itemsets. With m the average number of distinct values per multidimensional attribute and p the number of attributes, the number of possible different rules is m^p, so the number of discarded rules is m^p - sum_{i=1}^{k} 2^{L(D2_i)}.

4. Results and discussion

The experiments were done on a PC with a Core 2 Duo processor with a clock rate of 1.8 GHz and 3 GB of main memory. The operating system was Microsoft Vista and the implementation language was C#. We used one dataset to verify our method: a patient dataset collected and preprocessed from Bangladeshi hospitals, with 50273 instances and 514 attributes (150 discrete and 364 numerical). It contains all categories of healthcare data: ratio, interval, decimal, integer, percentage, etc. All these data were converted into mineable items (an integer representation) using a domain dictionary and rule base. We have taken the average value from 10 trials for each of the test results.

Table 1. Test result for patient dataset

Number of groups:                                   4                  | 8
Support for each group:                             .55, .47, .76, .45 | .84, .66, .64, .55, .85, .94, .86, .35
Correlation for each group:                         .71, .41, .51, .61 | .63, .85, .82, .76, .91, .73, .82, .71
Items constrained in antecedent for each group:     4, 4, 4, 4         | 5, 4, 5, 6, 4, 5, 5, 7
Items constrained in consequent for each group:     1, 2, 2, 1         | 1, 2, 2, 1, 1, 2, 2, 1
Items constrained in both for each group:           0, 0, 0, 0         | 1, 1, 1, 1, 1, 1, 1, 0
Total number of desired itemsets:                   125                | 311
Total number of desired rules:                      21                 | 28
Time (seconds):                                     173.09             | 556.11

Table 1 shows the test results for the patient dataset after running the proposed algorithm with different parameters. The second column presents the test in which we used 4 groups, minimum supports of 45%-76% and correlations of .41-.71 to mine symmetric association rules for medical researchers. The maximum number of items in a rule was 6; 125 desired itemsets were generated and 21 rules were discovered in total, and it took about 3461 seconds to find these rules. The third column presents the test in which we used 8 groups, minimum supports of 35%-94% and correlations of .63-.91. The maximum number of items in a rule was 8; 311 desired itemsets were generated and 28 rules were discovered in total, and it took about 11122 seconds to find these rules.

Figure 3: Time comparison of the proposed algorithm for the patient dataset based on number of groups (time in seconds versus 4, 8 and 12 groups, for group sizes 4, 10 and 18).

Figure 3 shows how time varies with the number of groups for the medical research algorithm. We measured the performance in terms of the number of groups, keeping the group size, the support and confidence of each group, and the antecedent and consequent constraints on attributes constant. Time does not vary significantly, because the number of groups does not reduce disk access: it affects neither the number of candidate generation phases nor the number of support calculation phases. The number of groups affects only the number of valid candidates generated, which can save some CPU time.

Figure 4: Time comparison of the proposed algorithm for the patient dataset based on group size (time in seconds versus group sizes 4, 8 and 12, for 4, 8 and 12 groups).

Figure 4 shows how time varies with group size for the medical research algorithm. Here we measured the performance in terms of group size, keeping the number of groups, the support and confidence of each group, and the antecedent and consequent constraints on attributes constant. Time varies significantly, because group size does affect disk access: it determines the number of candidate generation phases and the number of support calculation phases.

Figure 5: Accuracy of test results for the patient dataset based on correlation (accuracy versus correlation values 0.5, 0.7 and 0.85, for group sizes 4, 10 and 18).

Figure 5 illustrates the accuracy results for our proposed algorithm; the correlation value for each presented result is also indicated. For accuracy measurement, we intentionally discovered relationships among attributes for which the trends are known, and we calculated accuracy as the ratio between the number of correctly discovered relationships and the total number of discovered relationships. A discovered relationship is correct if it is one of the known trends of the medical domain. An average accuracy of 55% is achieved with correlation 0.5; the proposed algorithm achieves an average accuracy of 85.66% with correlation 0.7 and of 94.66% with correlation 0.85. As
accuracy refers to the rate of correct values in the data, the figure represents the success of our proposed data mining algorithm.

5. Conclusion

Medical researchers are interested in finding relationships among various diseases, lab tests, symptoms, etc. Due to the high dimensionality of medical data, conventional association mining algorithms discover a very large number of rules with many attributes, which are tedious and redundant for medical researchers and not among their desired set of attributes. In this paper, we have proposed an association rule mining algorithm for finding symmetric association rules to support medical qualitative research. The main theme of this algorithm is based on the following two statements: interesting relationships among various medical attributes are concealed in subsets of the attributes, but do not come out when all attributes are taken together; and interesting relationships among various medical attributes do not all have the same support and correlation. The algorithm constructs candidate itemsets based on group constraints and uses the corresponding support of each group in the candidate selection process to discover all possible desired itemsets of that group. We propose measuring the interestingness of known and unknown symmetric relationships via the correlation of antecedent items and consequent items. The proposed algorithm has been applied to a real-world medical data set, and we have shown significant accuracy in its output. Although we have used level-wise search for finding symmetric association rules, each step of our algorithm differs from any level-wise search algorithm, and rule generation from desired itemsets also differs from conventional association mining algorithms.