Electronic Medical Records (EMRs) contain patients’ history related to their medication, vaccine, test results and insurance information. EMRs need to be stored to facilitate the application of clinical treatment and prevention protocols. Clustering algorithms automate the process of information extraction and support health data management. Hence, in this mapping study, we systematically examine the literature on clustering algorithms used for analysing EMRs. We focus on studies published in 2016-2021 to present an overview of clustering techniques used in these studies to analyse medical data. We found 27 studies on clustering techniques, clustering technique problems and the evaluation parameters for analysing EMRs. However, although several studies have focused on this topic, only a few have taken the significant step of examining the clustering techniques used for analysing medical data particularly electronic medical record. Our results highlight that three clustering techniques have been used to analyse medical data, namely, the partitioning, the hierarchical and the density-based algorithms. We identified several clustering technique problems and 10 different evaluation parameters. The results suggest that researchers should focus on analysing medical data that will drive data-driven decision-making by management and promote a data-driven culture to ensure health care quality.
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAINijistjournal
Health care industry produces enormous quantity of data that clutches complex information relating to patients and their medical conditions. Data mining is gaining popularity in different research arenas due to its infinite applications and methodologies to mine the information in correct manner. Data mining techniques have the capabilities to discover hidden patterns or relationships among the objects in the medical data. In last decade, there has been increase in usage of data mining techniques on medical data for determining useful trends or patterns that are used in analysis and decision making. Data mining has an infinite potential to utilize healthcare data more efficiently and effectually to predict different kind of disease. This paper features various Data Mining techniques such as classification, clustering, association and also highlights related work to analyse and predict human disease.
Abstract:In health care domain, data mining plays a vital role for predicting diseases. For detecting a disease number of tests should be required from the patient but number of test should be reduced while using data mining techniques. The data mining technique analyze the test parameters and it concludes the associative relation between the parameters that reduces the tests and the reduced test plays key role in time and performance. In this study medical terms such as sex, blood pressure, and cholesterol like nineteen input attributes are used. In this paper association among various attributes which are the causative factors of heart diseases are analyzed. The patient’s records are observed before prediction and the factors are grouped as per its severity level. In this system the level of causative factors are categorized using K-Means clustering technique and it distinguishes the risky and non-risky factors. Frequent risk factors are mined from the clinical heart database using Apriori algorithm. The risk factors are taken for this study to predict the risk level and find the co-ordination among the factors that helps the medical people to predict the disease with minimum tests and treatments.
CLUSTERING ALGORITHM FOR A HEALTHCARE DATASET USING SILHOUETTE SCORE VALUEijcsit
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining
interesting research areas. Data mining tools and techniques help to discover and understand hidden
patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate
clustering method and optimal number of clusters in healthcare data can be confusing and difficult most
times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but
it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms.
This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable
algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans
and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means
algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed
DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and
different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms
have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm
performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining interesting research areas. Data mining tools and techniques help to discover and understand hidden patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate clustering method and optimal number of clusters in healthcare data can be confusing and difficult most times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms. This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAINijistjournal
Health care industry produces enormous quantity of data that clutches complex information relating to patients and their medical conditions. Data mining is gaining popularity in different research arenas due to its infinite applications and methodologies to mine the information in correct manner. Data mining techniques have the capabilities to discover hidden patterns or relationships among the objects in the medical data. In last decade, there has been increase in usage of data mining techniques on medical data for determining useful trends or patterns that are used in analysis and decision making. Data mining has an infinite potential to utilize healthcare data more efficiently and effectually to predict different kind of disease. This paper features various Data Mining techniques such as classification, clustering, association and also highlights related work to analyse and predict human disease.
Abstract:In health care domain, data mining plays a vital role for predicting diseases. For detecting a disease number of tests should be required from the patient but number of test should be reduced while using data mining techniques. The data mining technique analyze the test parameters and it concludes the associative relation between the parameters that reduces the tests and the reduced test plays key role in time and performance. In this study medical terms such as sex, blood pressure, and cholesterol like nineteen input attributes are used. In this paper association among various attributes which are the causative factors of heart diseases are analyzed. The patient’s records are observed before prediction and the factors are grouped as per its severity level. In this system the level of causative factors are categorized using K-Means clustering technique and it distinguishes the risky and non-risky factors. Frequent risk factors are mined from the clinical heart database using Apriori algorithm. The risk factors are taken for this study to predict the risk level and find the co-ordination among the factors that helps the medical people to predict the disease with minimum tests and treatments.
CLUSTERING ALGORITHM FOR A HEALTHCARE DATASET USING SILHOUETTE SCORE VALUEijcsit
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining
interesting research areas. Data mining tools and techniques help to discover and understand hidden
patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate
clustering method and optimal number of clusters in healthcare data can be confusing and difficult most
times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but
it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms.
This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable
algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans
and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means
algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed
DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and
different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms
have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm
performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining interesting research areas. Data mining tools and techniques help to discover and understand hidden patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate clustering method and optimal number of clusters in healthcare data can be confusing and difficult most times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms. This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
Clinical data science is a rapidly evolving field that utilizes advanced analytics and machine learning techniques to extract meaningful insights from large scale healthcare data. In recent years, there has been a significant increase in the availability of electronic health records, genomic data, wearable devices, and other digital health technologies, generating vast amounts of data. This article presents a comprehensive review of the current state of clinical data science and its future prospects. The review begins by providing an overview of the foundational concepts and methodologies employed in clinical data science. It explores various data sources, including structured and unstructured data, and highlights the challenges associated with data quality, privacy, and interoperability. The role of artificial intelligence and machine learning algorithms in data analysis and prediction is examined, along with the importance of data preprocessing and feature selection techniques. G. Dileepkumar | Nimisha Prajapati | Simhavalli Godavarthi "Clinical Data Science and its Future" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3 , June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd58588.pdf Paper URL: https://www.ijtsrd.com.com/pharmacy/pharmacy-practice/58588/clinical-data-science-and-its-future/g-dileepkumar
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH...ijdms
With the promises of predictive analytics in big data, and the use of machine learning algorithms,
predicting future is no longer a difficult task, especially for health sector, that has witnessed a great
evolution following the development of new computer technologies that gave birth to multiple fields of
research. Many efforts are done to cope with medical data explosion on one hand, and to obtain useful
knowledge from it, predict diseases and anticipate the cure on the other hand. This prompted researchers
to apply all the technical innovations like big data analytics, predictive analytics, machine learning and
learning algorithms in order to extract useful knowledge and help in making decisions. In this paper, we
will present an overview on the evolution of big data in healthcare system, and we will apply three learning
algorithms on a set of medical data. The objective of this research work is to predict kidney disease by
using multiple machine learning algorithms that are Support Vector Machine (SVM), Decision Tree (C4.5),
and Bayesian Network (BN), and chose the most efficient one.
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUEScscpconf
The health sector has witnessed a great evolution following the development of new computer technologies, and that pushed this area to produce more medical data, which gave birth to multiple fields of research. Many efforts are done to cope with the explosion of medical data on one hand, and to obtain useful knowledge from it on the other hand. This prompted researchers to apply all the technical innovations like big data analytics, predictive analytics, machine learning and learning algorithms in order to extract useful knowledge and help in making decisions. With the promises of predictive analytics in big data, and the use of machine learning
algorithms, predicting future is no longer a difficult task, especially for medicine because predicting diseases and anticipating the cure became possible. In this paper we will present an overview on the evolution of big data in healthcare system, and we will apply a learning algorithm on a set of medical data. The objective is to predict chronic kidney diseases by using Decision Tree (C4.5) algorithm.
The Case Study of an Early Warning Models for the Telecare Patients in TaiwanIJERA Editor
To propose a practical early warning analysis model for the telecare patients, this study applied data mining
technology as a basis to investigate the classification of patient groups by disease severity and incidence using
data contained in a telecare database regarding the number of a clinic. The ultimate purpose of this study was to
provide a new direction for telecare system planning and developing strategies.
The subject of this case study was a private clinic which is providing telecare system to patients in Taiwan, and
we used three data mining techniques including discriminant analysis, logistic regression and artificial neural
network to construct an early warning analysis model based on several factors such as: Demographic variables,
pathological signals, health management index, diagnosis and treatment records, emergency notification signal.
According the results, the telecare system can build stronger physician-patient relationship in advance through
previously paying attention to patients’ physiological conditions, reminding them to do self-management, even
taking them to the hospital for observation. A comparison of discriminative rates showed that the artificial neural
network model had the highest overall correct classification rate, 85.52%, and thus is a tool worthy of
recommendation
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
Environmental changes and food habits affect people's health with numerous diseases in today's life. Machine learning is a technique that plays a vital role in predicting diseases from collected data. The health sector has plenty of electronic medical data, which helps this technique to diagnose various diseases quickly and accurately. There has been an improvement in accuracy in medical data analysis as data continues to grow in the medical field. Doctors may have a hard time predicting symptoms accurately. This proposed work utilized Kaggle data to predict and diagnose heart and diabetic diseases. The diseases heart and diabetes are the foremost cause of higher death rates for people. The dataset contains target features for the diagnosis of heart disease. This work finds the target variable for diabetic disease by comparing the patient's blood sugars to normal levels. Blood pressure, body mass index (BMI), and other factors diagnose these diseases and disorders. This work justifies the filter method and principal component analysis for selecting and extracting the feature. The main aim of this work is to highlight the implementation of three ensemble techniques-Adaptive boost, Extreme Gradient boosting, and Gradient boosting-as well as the emphasis placed on the accuracy of the results.
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A web/mobile decision support system to improve medical diagnosis using a com...TELKOMNIKA JOURNAL
This research provides a system that integrates the work of data mining and expert system for different tasks in the process of medical diagnosis, and provides detailed steps to the process of reaching a diagnosis based on the described symptoms and mapping them with existing diagnosis available on the web or on a cloud of medical knowledge based, aggregate these data in a fuzzy manner and produce a satisfactory diagnosis of the persisting problem. The mobile phone interface would make the system user-friendly and provides mobility and accessibility to the user, while posting updates and reading in details the steps that led to the decision or diagnosis that is reached by the K-mean and the fuzzy logic inference engine. The achieved results indicate a promising diagnosis performance of the system as it achieved 90% accuracy and 92.9% F-Score.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
prediction of heart disease using machine learning algorithmsINFOGAIN PUBLICATION
The successful experiment of data mining in highly visible fields like marketing, e-business, and retail has led to its application in other sectors and industries. Healthcare is being discovered among these areas. There is an opulence of data available within the healthcare systems. However, there is a scarcity of useful analysis tool to find hidden relationships in data. This research intends to provide a detailed description of Naïve Bayes and decision tree classifier that are applied in our research particularly in the prediction of Heart Disease. Some experiment has been conducted to compare the execution of predictive data mining technique on the same dataset, and the consequence reveals that Decision Tree outperforms over Bayesian classification.
Case Retrieval using Bhattacharya Coefficient with Particle Swarm Optimizationrahulmonikasharma
Now a day, health information management and utilization is the demanding task to health informaticians for delivering the eminence healthcare services. Extracting the similar cases from the case database can aid the doctors to recognize the same kind of patients and their treatment details. Accordingly, this paper introduces the method called H-BCF for retrieving the similar cases from the case database. Initially, the patient’s case database is constructed with details of different patients and their treatment details. If the new patient comes for treatment, then the doctor collects the information about that patient and sends the query to the H-BCF. The H-BCF system matches the input query with the patient’s case database and retrieves the similar cases. Here, the PSO algorithm is used with the BCF for retrieving the most similar cases from the patient’s case database. Finally, the Doctor gives treatment to the new patient based on the retrieved cases. The performance of the proposed method is analyzed with the existing methods, such as PESM, FBSO-neural network, and Hybrid model for the performance measures accuracy and F-Measure. The experimental results show that the proposed method attains the higher accuracy of 99.5% and the maximum F-Measure of 99% when compared to the existing methods.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
Convolutional neural network with binary moth flame optimization for emotion ...IAESIJAI
Electroencephalograph (EEG) signals have the ability of real-time reflecting brain activities. Utilizing the EEG signal for analyzing human emotional states is a common study. The EEG signals of the emotions aren’t distinctive and it is different from one person to another as every one of them has different emotional responses to same stimuli. Which is why, the signals of the EEG are subject dependent and proven to be effective for the subject dependent detection of the Emotions. For the purpose of achieving enhanced accuracy and high true positive rate, the suggested system proposed a binary moth flame optimization (BMFO) algorithm for the process of feature selection and convolutional neural networks (CNNs) for classifications. In this proposal, optimum features are chosen with the use of accuracy as objective function. Ultimately, optimally chosen features are classified after that with the use of a CNN for the purpose of discriminating different emotion states.
A novel ensemble model for detecting fake newsIAESIJAI
Due the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of article news as being either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), stochastic gradient descent classifier and ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real world datasets, has shown outstanding results. In fact, it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to do perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
More Related Content
Similar to Clustering algorithms for analysing electronic medical record: A mapping study
Clinical data science is a rapidly evolving field that utilizes advanced analytics and machine learning techniques to extract meaningful insights from large scale healthcare data. In recent years, there has been a significant increase in the availability of electronic health records, genomic data, wearable devices, and other digital health technologies, generating vast amounts of data. This article presents a comprehensive review of the current state of clinical data science and its future prospects. The review begins by providing an overview of the foundational concepts and methodologies employed in clinical data science. It explores various data sources, including structured and unstructured data, and highlights the challenges associated with data quality, privacy, and interoperability. The role of artificial intelligence and machine learning algorithms in data analysis and prediction is examined, along with the importance of data preprocessing and feature selection techniques. G. Dileepkumar | Nimisha Prajapati | Simhavalli Godavarthi "Clinical Data Science and its Future" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3 , June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd58588.pdf Paper URL: https://www.ijtsrd.com.com/pharmacy/pharmacy-practice/58588/clinical-data-science-and-its-future/g-dileepkumar
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH...ijdms
With the promises of predictive analytics in big data, and the use of machine learning algorithms,
predicting future is no longer a difficult task, especially for health sector, that has witnessed a great
evolution following the development of new computer technologies that gave birth to multiple fields of
research. Many efforts are done to cope with medical data explosion on one hand, and to obtain useful
knowledge from it, predict diseases and anticipate the cure on the other hand. This prompted researchers
to apply all the technical innovations like big data analytics, predictive analytics, machine learning and
learning algorithms in order to extract useful knowledge and help in making decisions. In this paper, we
will present an overview on the evolution of big data in healthcare system, and we will apply three learning
algorithms on a set of medical data. The objective of this research work is to predict kidney disease by
using multiple machine learning algorithms that are Support Vector Machine (SVM), Decision Tree (C4.5),
and Bayesian Network (BN), and chose the most efficient one.
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUEScscpconf
The health sector has witnessed a great evolution following the development of new computer technologies, and that pushed this area to produce more medical data, which gave birth to multiple fields of research. Many efforts are done to cope with the explosion of medical data on one hand, and to obtain useful knowledge from it on the other hand. This prompted researchers to apply all the technical innovations like big data analytics, predictive analytics, machine learning and learning algorithms in order to extract useful knowledge and help in making decisions. With the promises of predictive analytics in big data, and the use of machine learning
algorithms, predicting future is no longer a difficult task, especially for medicine because predicting diseases and anticipating the cure became possible. In this paper we will present an overview on the evolution of big data in healthcare system, and we will apply a learning algorithm on a set of medical data. The objective is to predict chronic kidney diseases by using Decision Tree (C4.5) algorithm.
The Case Study of an Early Warning Models for the Telecare Patients in TaiwanIJERA Editor
To propose a practical early warning analysis model for the telecare patients, this study applied data mining
technology as a basis to investigate the classification of patient groups by disease severity and incidence using
data contained in a telecare database regarding the number of a clinic. The ultimate purpose of this study was to
provide a new direction for telecare system planning and developing strategies.
The subject of this case study was a private clinic which is providing telecare system to patients in Taiwan, and
we used three data mining techniques including discriminant analysis, logistic regression and artificial neural
network to construct an early warning analysis model based on several factors such as: Demographic variables,
pathological signals, health management index, diagnosis and treatment records, emergency notification signal.
According the results, the telecare system can build stronger physician-patient relationship in advance through
previously paying attention to patients’ physiological conditions, reminding them to do self-management, even
taking them to the hospital for observation. A comparison of discriminative rates showed that the artificial neural
network model had the highest overall correct classification rate, 85.52%, and thus is a tool worthy of
recommendation
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
Environmental changes and food habits affect people's health with numerous diseases in today's life. Machine learning is a technique that plays a vital role in predicting diseases from collected data. The health sector has plenty of electronic medical data, which helps this technique to diagnose various diseases quickly and accurately. There has been an improvement in accuracy in medical data analysis as data continues to grow in the medical field. Doctors may have a hard time predicting symptoms accurately. This proposed work utilized Kaggle data to predict and diagnose heart and diabetic diseases. The diseases heart and diabetes are the foremost cause of higher death rates for people. The dataset contains target features for the diagnosis of heart disease. This work finds the target variable for diabetic disease by comparing the patient's blood sugars to normal levels. Blood pressure, body mass index (BMI), and other factors diagnose these diseases and disorders. This work justifies the filter method and principal component analysis for selecting and extracting the feature. The main aim of this work is to highlight the implementation of three ensemble techniques-Adaptive boost, Extreme Gradient boosting, and Gradient boosting-as well as the emphasis placed on the accuracy of the results.
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A web/mobile decision support system to improve medical diagnosis using a com...TELKOMNIKA JOURNAL
This research provides a system that integrates the work of data mining and expert system for different tasks in the process of medical diagnosis, and provides detailed steps to the process of reaching a diagnosis based on the described symptoms and mapping them with existing diagnosis available on the web or on a cloud of medical knowledge based, aggregate these data in a fuzzy manner and produce a satisfactory diagnosis of the persisting problem. The mobile phone interface would make the system user-friendly and provides mobility and accessibility to the user, while posting updates and reading in details the steps that led to the decision or diagnosis that is reached by the K-mean and the fuzzy logic inference engine. The achieved results indicate a promising diagnosis performance of the system as it achieved 90% accuracy and 92.9% F-Score.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
prediction of heart disease using machine learning algorithmsINFOGAIN PUBLICATION
The successful experiment of data mining in highly visible fields like marketing, e-business, and retail has led to its application in other sectors and industries. Healthcare is being discovered among these areas. There is an opulence of data available within the healthcare systems. However, there is a scarcity of useful analysis tool to find hidden relationships in data. This research intends to provide a detailed description of Naïve Bayes and decision tree classifier that are applied in our research particularly in the prediction of Heart Disease. Some experiment has been conducted to compare the execution of predictive data mining technique on the same dataset, and the consequence reveals that Decision Tree outperforms over Bayesian classification.
Case Retrieval using Bhattacharya Coefficient with Particle Swarm Optimizationrahulmonikasharma
Now a day, health information management and utilization is the demanding task to health informaticians for delivering the eminence healthcare services. Extracting the similar cases from the case database can aid the doctors to recognize the same kind of patients and their treatment details. Accordingly, this paper introduces the method called H-BCF for retrieving the similar cases from the case database. Initially, the patient’s case database is constructed with details of different patients and their treatment details. If the new patient comes for treatment, then the doctor collects the information about that patient and sends the query to the H-BCF. The H-BCF system matches the input query with the patient’s case database and retrieves the similar cases. Here, the PSO algorithm is used with the BCF for retrieving the most similar cases from the patient’s case database. Finally, the Doctor gives treatment to the new patient based on the retrieved cases. The performance of the proposed method is analyzed with the existing methods, such as PESM, FBSO-neural network, and Hybrid model for the performance measures accuracy and F-Measure. The experimental results show that the proposed method attains the higher accuracy of 99.5% and the maximum F-Measure of 99% when compared to the existing methods.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
Similar to Clustering algorithms for analysing electronic medical record: A mapping study (20)
Convolutional neural network with binary moth flame optimization for emotion ...IAESIJAI
Electroencephalograph (EEG) signals have the ability of real-time reflecting brain activities. Utilizing the EEG signal for analyzing human emotional states is a common study. The EEG signals of the emotions aren’t distinctive and it is different from one person to another as every one of them has different emotional responses to same stimuli. Which is why, the signals of the EEG are subject dependent and proven to be effective for the subject dependent detection of the Emotions. For the purpose of achieving enhanced accuracy and high true positive rate, the suggested system proposed a binary moth flame optimization (BMFO) algorithm for the process of feature selection and convolutional neural networks (CNNs) for classifications. In this proposal, optimum features are chosen with the use of accuracy as objective function. Ultimately, optimally chosen features are classified after that with the use of a CNN for the purpose of discriminating different emotion states.
A novel ensemble model for detecting fake newsIAESIJAI
Due the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of article news as being either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), stochastic gradient descent classifier and ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real world datasets, has shown outstanding results. In fact, it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to do perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
K-centroid convergence clustering identification in one-label per type for di...IAESIJAI
Disease prediction is a high demand field which requires significant support from machine learning (ML) to enhance the result efficiency. The research works on application of K-means clustering supervised classification in disease prediction where each class only has one labeled data. The K-centroid convergence clustering identification (KC3 I) system is based on semi-K-means clustering but only requires single labeled data per class for the training process with the training dataset to update the centroid. The KC3 I model also includes a dictionary box to index all the input centroids before and after the updating process. Each centroid matches with a corresponding label inside this box. After the training process, each time the input features arrive, the trained centroid will put them to its cluster depending on the Euclidean distance, then convert them into the specific class name, which is coherent to that centroid index. Two validation stages were carried out and accomplished the expectation in terms of precision, recall, F1-score, and absolute accuracy. The last part demonstrates the possibility of feature reduction by selecting the most crucial feature with the extra tree classifier method. Total data are fed into the KC3 I system with the most important features and remain the same accuracy.
Plant leaf detection through machine learning based image classification appr...IAESIJAI
Since maize is a staple diet for people, especially vegetarians and vegans, maize leaf disease has a significant influence here on the food industry including maize crop productivity. Therefore, it should be understood that maize quality must be optimal; yet, to do so, maize must be safeguarded from several illnesses. As a result, there is a great demand for such an automated system that can identify the condition early on and take the appropriate action. Early disease identification is crucial, but it also poses a major obstacle. As a result, in this research project, we adopt the fundamental k-nearest neighbor (KNN) model and concentrate on building and developing the enhanced k-nearest neighbor (EKNN) model. EKNN aids in identifying several classes of disease. To gather discriminative, boundary, pattern, and structurally linked information, additional high-quality fine and coarse features are generated. This information is then used in the classification process. The classification algorithm offers high-quality gradient-based features. Additionally, the proposed model is assessed using the Plant-Village dataset, and a comparison with many standard classification models using various metrics is also done.
Backbone search for object detection for applications in intrusion warning sy...IAESIJAI
In this work, we propose a novel backbone search method for object detection for applications in intrusion warning systems. The goal is to find a compact model for use in embedded thermal imaging cameras widely used in intrusion warning systems. The proposed method is based on faster region-based convolutional neural network (Faster R-CNN) because it can detect small objects. Inspired by EfficientNet, the sought-after backbone architecture is obtained by finding the most suitable width scale for the base backbone (ResNet50). The evaluation metrics are mean average precision (mAP), number of parameters, and number of multiply–accumulate operations (MACs). The experimental results showed that the proposed method is effective in building a lightweight neural network for the task of object detection. The obtained model can keep the predefined mAP while minimizing the number of parameters and computational resources. All experiments are executed elaborately on the person detection in intrusion warning systems (PDIWS) dataset.
Deep learning method for lung cancer identification and classificationIAESIJAI
Lung cancer (LC) is calming many lives and is becoming a serious cause of concern. The detection of LC at an early stage assists the chances of recovery. Accuracy of detection of LC at an early stage can be improved with the help of a convolutional neural network (CNN) based deep learning approach. In this paper, we present two methodologies for Lung cancer detection (LCD) applied on Lung image database consortium (LIDC) and image database resource initiative (IDRI) data sets. Classification of these LC images is carried out using support vector machine (SVM), and deep CNN. The CNN is trained with i) multiple batches and ii) single batch for LC image classification as non cancer and cancer image. All these methods are being implemented in MATLAB. The accuracy of classification obtained by SVM is 65%, whereas deep CNN produced detection accuracy of 80% and 100% respectively for multiple and single batch training. The novelty of our experimentation is near 100% classification accuracy obtained by our deep CNN model when tested on 25 Lung computed tomography (CT) test images each of size 512×512 pixels in less than 20 iterations as compared to the research work carried out by other researchers using cropped LC nodule images.
Optically processed Kannada script realization with Siamese neural network modelIAESIJAI
Optical character recognition (OCR) is a technology that allows computers to recognize and extract text from images or scanned documents. It is commonly used to convert printed or handwritten text into machine-readable format. This Study presents an OCR system on Kannada Characters based on siamese neural network (SNN). Here the SNN, a Deep neural network which comprises of two identical convolutional neural network (CNN) compare the script and ranks based on the dissimilarity. When lesser dissimilarity score is identified, prediction is done as character match. In this work the authors use 5 classes of Kannada characters which were initially preprocessed using grey scaling and convert it to pgm format. This is directly input into the Deep convolutional network which is learnt from matching and non-matching image between the CNN with contrastive loss function in Siamese architecture. The Proposed OCR system uses very less time and gives more accurate results as compared to the regular CNN. The model can become a powerful tool for identification, particularly in situations where there is a high degree of variation in writing styles or limited training data is available.
Embedded artificial intelligence system using deep learning and raspberrypi f...IAESIJAI
Melanoma is a kind of skin cancer that originates in melanocytes responsible for producing melanin, it can be a severe and potentially deadly form of cancer because it can metastasize to other regions of the body if not detected and treated early. To facilitate this process, Recently, various computer-assisted low-cost, reliable, and accurate diagnostic systems have been proposed based on artificial intelligence (AI) algorithms, particularly deep learning techniques. This work proposed an innovative and intelligent system that combines the internet of things (IoT) with a Raspberry Pi connected to a camera and a deep learning model based on the deep convolutional neural network (CNN) algorithm for real-time detection and classification of melanoma cancer lesions. The key stages of our model before serializing to the Raspberry Pi: Firstly, the preprocessing part contains data cleaning, data transformation (normalization), and data augmentation to reduce overfitting when training. Then, the deep CNN algorithm is used to extract the features part. Finally, the classification part with applied Sigmoid Activation Function. The experimental results indicate the efficiency of our proposed classification system as we achieved an accuracy rate of 92%, a precision of 91%, a sensitivity of 91%, and an area under the curve- receiver operating characteristics (AUC-ROC) of 0.9133.
Deep learning based biometric authentication using electrocardiogram and irisIAESIJAI
Authentication systems play an important role in wide range of applications. The traditional token certificate and password-based authentication systems are now replaced by biometric authentication systems. Generally, these authentication systems are based on the data obtained from face, iris, electrocardiogram (ECG), fingerprint and palm print. But these types of models are unimodal authentication, which suffer from accuracy and reliability issues. In this regard, multimodal biometric authentication systems have gained huge attention to develop the robust authentication systems. Moreover, the current development in deep learning schemes have proliferated to develop more robust architecture to overcome the issues of tradition machine learning based authentication systems. In this work, we have adopted ECG and iris data and trained the obtained features with the help of hybrid convolutional neural network- long short-term memory (CNN-LSTM) model. In ECG, R peak detection is considered as an important aspect for feature extraction and morphological features are extracted. Similarly, gabor-wavelet, gray level co-occurrence matrix (GLCM), gray level difference matrix (GLDM) and principal component analysis (PCA) based feature extraction methods are applied on iris data. The final feature vector is obtained from MIT-BIH and IIT Delhi Iris dataset which is trained and tested by using CNN-LSTM. The experimental analysis shows that the proposed approach achieves average accuracy, precision, and F1-core as 0.985, 0.962 and 0.975, respectively.
Hybrid channel and spatial attention-UNet for skin lesion segmentationIAESIJAI
Melanoma is a type of skin cancer which has affected many lives globally. The American Cancer Society research has suggested that it a serious type of skin cancer and lead to mortality but it is almost 100% curable if it is detected and treated in its early stages. Currently automated computer vision-based schemes are widely adopted but these systems suffer from poor segmentation accuracy. To overcome these issue, deep learning (DL) has become the promising solution which performs extensive training for pattern learning and provide better classification accuracy. However, skin lesion segmentation is affected due to skin hair, unclear boundaries, pigmentation, and mole. To overcome this issue, we adopt UNet based deep learning scheme and incorporated attention mechanism which considers low level statistics and high-level statistics combined with feedback and skip connection module. This helps to obtain the robust features without neglecting the channel information. Further, we use channel attention, spatial attention modulation to achieve the final segmentation. The proposed DL based scheme is instigated on publically available dataset and experimental investigation shows that the proposed Hybrid Attention UNet approach achieves average performance as 0.9715, 0.9962, 0.9710.
Photoplethysmogram signal reconstruction through integrated compression sensi...IAESIJAI
The transmission of photoplethysmogram (PPG) signals in real-time is extremely challenging and facilitates the use of an internet of things (IoT) environment for healthcare- monitoring. This paper proposes an approach for PPG signal reconstruction through integrated compression sensing and basis function aware shallow learning (CSBSL). Integrated-CSBSL approach for combined compression of PPG signals via multiple channels thereby improving the reconstruction accuracy for the PPG signals essential in healthcare monitoring. An optimal basis function aware shallow learning procedure is employed on PPG signals with prior initialization; this is further fine-tuned by utilizing the knowledge of various other channels, which exploit the further sparsity of the PPG signals. The proposed method for learning combined with PPG signals retains the knowledge of spatial and temporal correlation. The proposed Integrated-CSBSL approach consists of two steps, in the first step the shallow learning based on basis function is carried out through training the PPG signals. The proposed method is evaluated using multichannel PPG signal reconstruction, which potentially benefits clinical applications through PPG monitoring and diagnosis.
Speaker identification under noisy conditions using hybrid convolutional neur...IAESIJAI
Speaker identification is biometrics that classifies or identifies a person from other speakers based on speech characteristics. Recently, deep learning models outperformed conventional machine learning models in speaker identification. Spectrograms of the speech have been used as input in deep learning-based speaker identification using clean speech. However, the performance of speaker identification systems gets degraded under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants in recent studies. However, there is no attempt conducted to use a hybrid CNN and enhanced RNN variants in speaker identification using cochleogram input to enhance the performance under noisy and mismatched conditions. In this study, a speaker identification using hybrid CNN and the gated recurrent unit (GRU) is proposed for noisy conditions using cochleogram input. VoxCeleb1 audio dataset with real-world noises, white Gaussian noises (WGN) and without additive noises were employed for experiments. The experiment results and the comparison with existing works show that the proposed model performs better than other models in this study and existing works.
Multi-channel microseismic signals classification with convolutional neural n...IAESIJAI
Identifying and classifying microseismic signals is essential to warn of mines’ dangers. Deep learning has replaced traditional methods, but labor-intensive manual identification and varying deep learning outcomes pose challenges. This paper proposes a transfer learning-based convolutional neural network (CNN) method called microseismic signals-convolutional neural network (MS-CNN) to automatically recognize and classify microseismic events and blasts. The model was instructed on a limited sample of data to obtain an optimal weight model for microseismic waveform recognition and classification. A comparative analysis was performed with an existing CNN model and classical image classification models such as AlexNet, GoogLeNet, and ResNet50. The outcomes demonstrate that the MS-CNN model achieved the best recognition and classification effect (99.6% accuracy) in the shortest time (0.31 s to identify 277 images in the test set). Thus, the MS-CNN model can efficiently recognize and classify microseismic events and blasts in practical engineering applications, improving the recognition timeliness of microseismic signals and further enhancing the accuracy of event classification.
Sophisticated face mask dataset: a novel dataset for effective coronavirus di...IAESIJAI
Efficient and accurate coronavirus disease (COVID-19) surveillance necessitates robust identification of individuals wearing face masks. This research introduces the sophisticated face mask dataset (SFMD), a comprehensive compilation of high-quality face mask images enriched with detailed annotations on mask types, fits, and usage patterns. Leveraging cutting-edge deep learning models—EfficientNet-B2, ResNet50, and MobileNet-V2—, we compare SFMD against two established benchmarks: the real-world masked face dataset (RMFD) and the masked face recognition dataset (MFRD). Across all models, SFMD consistently outperforms RMFD and MFRD in key metrics, including accuracy, precision, recall, and F1 score. Additionally, our study demonstrates the dataset's capability to cultivate robust models resilient to intricate scenarios like low-light conditions and facial occlusions due to accessories or facial hair.
Transfer learning for epilepsy detection using spectrogram imagesIAESIJAI
Epilepsy stands out as one of the common neurological diseases. The neural activity of the brain is observed using electroencephalography (EEG). Manual inspection of EEG brain signals is a slow and arduous process, which puts heavy load on neurologists and affects their performance. The aim of this study is to find the best result of classification using the transfer learning model that automatically identify the epileptic and the normal activity, to classify EEG signals by using images of spectrogram which represents the percentage of energy for each coefficient of the continuous wavelet. Dataset includes the EEG signals recorded at monitoring unit of epilepsy used in this study to presents an application of transfer learning by comparing three models Alexnet, visual geometry group (VGG19) and residual neural network ResNet using different combinations with seven different classifiers. This study tested the models and reached a different value of accuracy and other metrics used to judge their performances, and as a result the best combination has been achieved with ResNet combined with support vector machine (SVM) classifier that classified EEG signals with a high success rate using multiple performance metrics such as 97.22% accuracy and 2.78% the value of the error rate.
Deep neural network for lateral control of self-driving cars in urban environ...IAESIJAI
The exponential growth of the automotive industry clearly indicates that self-driving cars are the future of transportation. However, their biggest challenge lies in lateral control, particularly in urban bottlenecking environments, where disturbances and obstacles are abundant. In these situations, the ego vehicle has to follow its own trajectory while rapidly correcting deviation errors without colliding with other nearby vehicles. Various research efforts have focused on developing lateral control approaches, but these methods remain limited in terms of response speed and control accuracy. This paper presents a control strategy using a deep neural network (DNN) controller to effectively keep the car on the centerline of its trajectory and adapt to disturbances arising from deviations or trajectory curvature. The controller focuses on minimizing deviation errors. The Matlab/Simulink software is used for designing and training the DNN. Finally, simulation results confirm that the suggested controller has several advantages in terms of precision, with lateral deviation remaining below 0.65 meters, and rapidity, with a response time of 0.7 seconds, compared to traditional controllers in solving lateral control.
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...IAESIJAI
Recently, cardiovascular diseases (CVDs) have become a rapidly growing problem in the world, especially in developing countries. The latter are facing a lifestyle change that introduces new risk factors for heart disease, that requires a particular and urgent interest. Besides, cardiomegaly is a sign of cardiovascular diseases that refers to various conditions; it is associated with the heart enlargement that can be either transient or permanent depending on certain conditions. Furthermore, cardiomegaly is visible on any imaging test including Chest X-Radiation (X-Ray) images; which are one of the most common tools used by Cardiologists to detect and diagnose many diseases. In this paper, we propose an innovative deep learning (DL) model based on an attention module and MobileNet architecture to recognize Cardiomegaly patients using the popular Chest X-Ray8 dataset. Actually, the attention module captures the spatial relationship between the relevant regions in Chest X-Ray images. The experimental results show that the proposed model achieved interesting results with an accuracy rate of 81% which makes it suitable for detecting cardiomegaly disease.
Efficient commodity price forecasting using long short-term memory modelIAESIJAI
Predicting commodity prices, particularly food prices, is a significant concern for various stakeholders, especially in regions that are highly sensitive to commodity price volatility. Historically, many machine learning models like autoregressive integrated moving average (ARIMA) and support vector machine (SVM) have been suggested to overcome the forecasting task. These models struggle to capture the multifaceted and dynamic factors influencing these prices. Recently, deep learning approaches have demonstrated considerable promise in handling complex forecasting tasks. This paper presents a novel long short-term memory (LSTM) network-based model for commodity price forecasting. The model uses five essential commodities namely bread, meat, milk, oil, and petrol. The proposed model focuses on advanced feature engineering which involves moving averages, price volatility, and past prices. The results reveal that our model outperforms traditional methods as it achieves 0.14, 3.04%, and 98.2% for root mean square error (RMSE), mean absolute percentage error (MAPE), and R-squared (R2 ), respectively. In addition to the simplicity of the model, which consists of an LSTM single-cell architecture that reduced the training time to a few minutes instead of hours. This paper contributes to the economic literature on price prediction using advanced deep learning techniques as well as provides practical implications for managing commodity price instability globally.
1-dimensional convolutional neural networks for predicting sudden cardiacIAESIJAI
Sudden cardiac arrest (SCA) is a serious heart problem that occurs without symptoms or warning. SCA causes high mortality. Therefore, it is important to estimate the incidence of SCA. Current methods for predicting ventricular fibrillation (VF) episodes require monitoring patients over time, resulting in no complications. New technologies, especially machine learning, are gaining popularity due to the benefits they provide. However, most existing systems rely on manual processes, which can lead to inefficiencies in disseminating patient information. On the other hand, existing deep learning methods rely on large data sets that are not publicly available. In this study, we propose a deep learning method based on one-dimensional convolutional neural networks to learn to use discrete fourier transform (DFT) features in raw electrocardiogram (ECG) signals. The results showed that our method was able to accurately predict the onset of SCA with an accuracy of 96% approximately 90 minutes before it occurred. Predictions can save many lives. That is, optimized deep learning models can outperform manual models in analyzing long-term signals.
A deep learning-based approach for early detection of disease in sugarcane pl...IAESIJAI
In many regions of the nation, agriculture serves as the primary industry. The farming environment now faces a number of challenges to farmers. One of the major concerns, and the focus of this research, is disease prediction. A methodology is suggested to automate a process for identifying disease in plant growth and warning farmers in advance so they can take appropriate action. Disease in crop plants has an impact on agricultural production. In this work, a novel DenseNet-support vector machine: explainable artificial intelligence (DNet-SVM: XAI) interpretation that combines a DenseNet with support vector machine (SVM) and local interpretable model-agnostic explanation (LIME) interpretation has been proposed. DNet-SVM: XAI was created by a series of modifications to DenseNet201, including the addition of a support vector machine (SVM) classifier. Prior to using SVM to identify if an image is healthy or un-healthy, images are first feature extracted using a convolution network called DenseNet. In addition to offering a likely explanation for the prediction, the reasoning is carried out utilizing the visual cue produced by the LIME. In light of this, the proposed approach, when paired with its determined interpretability and precision, may successfully assist farmers in the detection of infected plants and recommendation of pesticide for the identified disease.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
Clustering algorithms for analysing electronic medical record: A mapping study
1. IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 12, No. 4, December 2023, pp. 1784~1792
ISSN: 2252-8938, DOI: 10.11591/ijai.v12.i4.pp1784-1792 1784
Journal homepage: http://ijai.iaescore.com
Clustering algorithms for analysing electronic medical record:
A mapping study
Siti Nur Shahidah Zaman Shah1
, Marshima Mohd Rosli1,2
1
College of Computing, Informatics & Media, Universiti Teknologi MARA, Selangor, Malaysia
2
Institute for Pathology, Laboratory and Forensic Medicine (I-PPerForM), University Teknologi MARA, Selangor, Malaysia
Article Info ABSTRACT
Article history:
Received Oct 6, 2022
Revised Jan 18, 2023
Accepted Jan 30, 2023
Electronic Medical Records (EMRs) contain patients’ history related to their
medication, vaccine, test results and insurance information. EMRs need to be
stored to facilitate the application of clinical treatment and prevention
protocols. Clustering algorithms automate the process of information
extraction and support health data management. Hence, in this mapping study,
we systematically examine the literature on clustering algorithms used for
analysing EMRs. We focus on studies published in 2016-2021 to present an
overview of clustering techniques used in these studies to analyse medical
data. We found 27 studies on clustering techniques, clustering technique
problems and the evaluation parameters for analysing EMRs. However,
although several studies have focused on this topic, only a few have taken the
significant step of examining the clustering techniques used for analysing
medical data particularly electronic medical record. Our results highlight that
three clustering techniques have been used to analyse medical data, namely,
the partitioning, the hierarchical and the density-based algorithms. We
identified several clustering technique problems and 10 different evaluation
parameters. The results suggest that researchers should focus on analysing
medical data that will drive data-driven decision-making by management and
promote a data-driven culture to ensure health care quality.
Keywords:
Clustering algorithm
Electronic medical record
Mapping study
Medical data
This is an open access article under the CC BY-SA license.
Corresponding Author:
Marshima Mohd Rosli
College of Computing, Informatics & Media, Universiti Teknologi MARA
Selangor, Malaysia
Email: marshima@tmsk.uitm.edu.my
1. INTRODUCTION
Several public hospitals use an electronic medical record (EMR) system to manage patient data, such
as data on patient care, diagnostic procedures, medical history and allergic conditions [1]–[3]. Owing to the
increasing number of EMRs globally, medical information is too complex and voluminous to be processed and
analysed using traditional methods. In recent years, the clinical information in EMRs is being fully used
through data mining to identify patterns using a combination of machine learning and statistics [4]–[6].
Data mining has been conducted ever since its emergence in the late 1980s. It is a cross-disciplinary
process that focuses on exploring very large datasets to uncover patterns [7]. Similarly, machine learning
resides within artificial intelligence and is closely linked with data mining, its predecessor. Machine learning
primarily uses learning algorithms to identify patterns in sample data and then uses these patterns to make
decisions without being specifically programmed for this purpose [8].
Machine learning has been widely applied in medical research by mining EMRs using data mining
methods to find patterns that can aid medical practitioners mainly in decision-making [9], [10]. Some machine
learning techniques are regression, classification, clustering, dimensional reduction and reinforcement
2. Int J Artif Intell ISSN: 2252-8938
Clustering algorithms for analysing electronic medical record … (Siti Nur Shahidah Zaman Shaha)
1785
learning. These techniques are commonly used as a tool for the solution of classification, forecasting and
prediction problems [11].
Clustering, a common machine learning technique, involves grouping objects according to similarities
in their data and classifying each into a specific group. The algorithm analyses the clustered data, identifies a
group of data that share the same traits and develops a model from these data [12]. For example, clustering is
used to group patients by gender, age and blood type based on their demographics.
In this study, we investigate the literature on clustering algorithms to identify challenges associated
with analysing medical data in recent research. Prior reviews on clustering techniques have paid less attention
to medical research. Therefore, we present the outcomes of a systematic mapping study to categorise recent
research on clustering algorithms used to analyse medical data. Through this study, we aim to address this
research gap by seeking answers to three main research questions: (RQ1) What are the studies that have
investigated clustering techniques for analysing EMRs? (RQ2) What are the studies that have discussed
problems associated with clustering techniques for analysing EMRs? (RQ3) What are the studies that have
used evaluation parameters on clustering techniques?
Through this systematic mapping study, we provide researchers and practitioners with an overview of
the existing clustering algorithms for analysing medical data. We also provide practical insights by exploring
these research questions. Overall, this systematic mapping study contributes to the knowledge base on
clustering algorithms for analysing medical data.
2. RELATED WORK
EMRs are the innovation that eliminate the need for traditional paper charts containing the patient’s
medical history [1]. The use of EMRs enables researchers to gain new insight into the depth of large datasets
[13]. It is another way to learn new medical knowledge, apart from through conducting clinical and biological
experiments. An EMR can store an extensive range of patient information, such as the history of medication,
vaccination, test results and insurance information.
EMRs are used to store all past clinical information of each patient in detail during their treatment at
medical institutions [14]. The incidence and clinical characteristics of diseases can be identified through mining
all previous clinical information recorded in the EMR system [15]–[17]. Data mining has become an important
technique to help researchers extract knowledge from large and complex data, such as medical records for
patients [18]–[20].
Data mining is also termed ‘knowledge discovery and data mining’. The data mining process mainly
aims to extract essential and useful information from a database [7]. It involves data pre-processing techniques,
such as data cleaning, integration, transformation and reduction; pattern discovery and knowledge evaluation
[21]. Various data mining techniques, such as association, classification and clustering, can be used to extract
interesting patterns from huge amounts of data.
The technique first used for data mining is association. It is a method used to detect interesting
relationships between variables in large databases [22]. It can help to identify the rules relations in data
collection between the medical authorities, such as medication, symptoms, disease and health condition
[23], [24]. Next, classification is a method to extract a model with significant classes described. Some examples
of classification techniques are the decision tree algorithm and the Naïve Bayes algorithm.
Clustering is another data mining technique. It involves grouping data with identical features into
classes or clusters [22]. It can be applied in different fields, such as artificial intelligence, pattern recognition
and neural networks [25]. In addition, clustering algorithms have been widely used for analysing medical data.
There are several clustering techniques, such as the partitioning algorithm, the hierarchical algorithm and the
density-based algorithm [26].
K-means is a known partitioning clustering algorithm commonly used to analyse EMRs. For example,
it is used to cluster EMRs to identify the relevant features of patients with kidney failure and heart disease [27].
K-means is the most popular clustering algorithm because of its success in yielding efficient clusters, its ease
of implementation, its improvement of prediction accuracy of and its flexibility [28].
The hierarchical algorithm is another clustering technique. It is quite popular owing to its visualisation
capability [23]. It focuses on building a cluster hierarchy [29]. Hierarchical clustering groups data by type in a
chain of importance, and it is used for probability analysis in healthcare [27]. This technique is commonly used
for measuring physiological features across all patient medical records [30]. The last clustering algorithm
technique is density-based clustering algorithm. Density-based spatial clustering of applications with noise
(DBSCAN) is one of the most famous density-based clustering techniques. DBSCAN is more efficient in
finding clusters, has the attribute of noise cancellation and is robust to outliers [31], [32]. The density-based
algorithm is also broadly used on healthcare and medical datasets such as biomedical images. For example,
DBSCAN is used on skin lesion images to detect homogeneous colour regions. In addition, DBSCAN is used
to identify populations with dengue fever [29].
3. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1784-1792
1786
3. METHOD
A systematic mapping study is a type of well-defined methodology used to primarily identify and
review all possible available research evidence that is relevant. Consequently, it is used to provide a clear
explanation for specific research questions in a study. In addition, these questions should have been considered
in the literature. In this mapping study, we adopted the guidelines suggested by researchers in software
engineering [33], [34].
We categorised and structured empirical evidence based on studies that discussed the clustering
algorithms used to analyse medical data. Thus, this study will help researchers and practitioners to gather,
classify and aggregate results in related studies to identify research gaps and challenges for improvement in
the body of knowledge. The mapping process consists of five activities: defining research questions,
implementing the search strategy, selecting studies, keywording of abstracts and extracting data [35], [36].
3.1. Research questions
Framing a clear, concise research question is an obligation for any systematic mapping study. Without
a specified question, writing a well-written, well-focused review can be very onerous. For that reason, this
study’s research questions are developed using the PICOC (Population, Intervention, Comparison, Outcome,
Context) model of Kitchenham et al. [34]. We constructed three research questions as follows:
RQ1: What are the studies that have investigated clustering techniques for analysing EMR?
RQ2: What are the studies that have discussed problems associated with clustering techniques for
analysing EMR?
RQ3: What are the studies that have used evaluation parameters on clustering techniques?
3.2. Search strategy
A thorough search is required for the process of identifying relevant literature. For this reason, a search
strategy is necessary, which involves identifying the right search string with the relevant terms and keywords.
This approach will ensure wide coverage and increase the chance of identifying the right publications.
Therefore, in this mapping study, we constructed the search string as follows:
− Identify keywords in relevant papers.
− Search synonyms for identified keywords.
− Formulate search strings using Boolean OR to include alternative spellings and synonyms, and Boolean
AND to combine major terms.
The search string: (“clustering technique" OR " clustering algorithm" OR "machine learning" OR
"data mining”) AND (“analyse”) AND (“disease”) AND (“electronic medical record" OR " EMR" OR
"medical data”) AND (LIMIT-TO (SUBJAREA, "COMP")) AND (LIMIT-TO (LANGUAGE, "English"))
In the search process, we included papers from conferences and journals on the research topics, such
as the System & Software Journal and the Biomedical Informatics Journal. We limited our research to the
computer science domain. We considered papers published in the five years from 2016 to 2021. We used the
search string to conduct a primary search on SCOPUS, ScienceDirect, SpringerLink and ACM (Association
for Computing Machinery) Digital. The search results are presented in Table 1.
Table 1. Primary search results
Online database Search results Duplicate papers Relevant papers
SCOPUS 92 25 4
Science Direct 329 15 13
SpringerLink 96 9 4
ACM Digital 182 1 6
Total 699 50 27
3.3. Selection of papers
The only inclusion criteria for the overall selection process were that the selected studies should be
empirical studies related to clustering algorithms in the domain of medical data. Therefore, the literature search
covered studies published within the 5-year period of 2016–2021. The specific inclusion criteria were as
follows: (a) Peer-reviewed papers that provide evidence on analysing and evaluating EMR using clustering
algorithms; (b) state the evaluation criteria for clustering techniques; and (c) use EMRs. The exclusion criteria
were as follows: (a) Papers that provide evidence on the clustering techniques/algorithms but do not use EMRs
or medical data and b) are not in English.
4. Int J Artif Intell ISSN: 2252-8938
Clustering algorithms for analysing electronic medical record … (Siti Nur Shahidah Zaman Shaha)
1787
On performing the automatic search, we obtained a set of 699 papers. In the screening process, first,
we screened the title and abstract of each paper in order to exclude unrelated and identical documents, which
reduced the number of papers to 121. Next, we scanned the introduction and conclusion of each paper, which
reduced the first set to only 27 papers.
3.4. Classification scheme
First, we observed the keywords and concepts in the abstract of each paper. Next, we combined the
keywords and concepts to produce an outline of the issues and challenges of the background research. We then
selected suitable keywords to structure the schemes or categories. Last, we determined three different
categories, which are clustering techniques for analysing EMRs (RQ1), problems in clustering techniques
(RQ2) and evaluation criteria used to evaluate clustering techniques (RQ3).
3.5. Data extraction and mapping of studies
In this process, we used Excel spreadsheet for data logging and output production. We mapped the
identified studies into three categories and obtained relevant information from these to answer the research
questions. We tabulated the results into tables and calculated the publication frequencies in each category.
4. RESULTS
4.1. Clustering techniques for analysing EMR (RQ1)
Clustering algorithms have been widely used to analyse medical data [31], [37]–[40]. Three clustering
techniques are commonly used for analysing EMRs: partitioning, hierarchical and density-based algorithms.
K-means is a famous partitioning clustering algorithm that is commonly used for analysing EMRs [37]–[39],
[41]. It is used to group a collection of patient records to identify the relevant features of patients with heart
disease [42], [43]. In addition, some research efforts have been devoted to exploiting clustering techniques on
data related to kidney failure [27]. Moreover, k-means clustering has also been used in a study to group patients
with Parkinson’s disease [44].
Next, the hierarchical algorithm is another clustering method discussed in recent studies owing to its
visualisation capability, which is requested by many physicians [26]. Hierarchical clustering aims to construct
a cluster hierarchy and can be regarded as a partitional clustering sequence. Its groups data included in a type
of chain of importance that are utilised for expectation in healthcare. Next, hierarchical clustering is commonly
used for physiological measurements across all patients [30].
Density-based clustering algorithms are widely used to find clusters of nonlinear and arbitrary shapes
based on the density of the connected points. DBSCAN is one of the most famous density-based clustering
techniques [45]–[47]. The density-based algorithm is broadly used on healthcare and medical records such as
biomedical images [48], [49]. DBSCAN is used to detect homogeneous colour regions in skin lesion images.
It is also used to discover the hidden cluster and pattern for heart disease [50]. Further, it is used to determine
the population with dengue fever [29].
We found 27 relevant studies, of which only 23 fulfilled the three inclusion criteria. We noticed that
some had included more than one clustering technique. For example, k-means and DBSCAN were both used
in one study [50]. Figure 1 presents the distribution of studies on clustering techniques for analysing EMR.
As Figure 1 shows, the number of studies on clustering techniques for analysing EMRs increased from
2016 to 2021. In 2019, the highest number of studies was published on the partitioning clustering technique,
namely, 7 studies. Meanwhile, six studies were on hierarchical clustering and three on density-based clustering.
4.2. Clustering technique problems for analysing EMR (RQ2)
There are many problems related to the clustering technique. However, we only show the related and
mentioned issues present in the papers without any clarification for categorization. In addition, papers that
present more than one problem were counted in every category, and the total sums up to more than 27. Table
2 shows the frequency of problems in the clustering techniques.
As previously mentioned, we identified 27 studies that met the inclusion criteria. As illustrated in
Table 2, the partitioning clustering algorithm is not recommended for datasets with noise and many outliers.
This is because it is highly susceptible to noise and outliers with abnormal values [27]. The k-means
partitioning clustering technique is biased to spherical clusters and is sensitive to the initial selection of
centroids. Moreover, k-means is less efficient in partitioning the initial collection of data into subsets of
different data distributions, including patients with different examination history [43].
Owing to space and time constraints, the hierarchical algorithm is sometimes not suitable for large
data. If no data or limited data are available, hierarchical clustering may be a suitable option because the number
of clusters is not predetermined. However, hierarchical clustering is expensive and not scalable. Therefore, it
is not suitable for big data [29]. In addition, it is hard to detect the gaps between the objects.
5. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1784-1792
1788
Figure 1. Distribution of studies on clustering techniques for analysing EMR
The process of density-based clustering is slow for large datasets. DBSCAN is able to find clusters
with different sizes and forms, but it is weak in identifying clusters of variant density [42]. This limitation can
be overcome by decomposing the clustering process in subsequent steps with the multiple-level DBSCAN
algorithm. Another issue is that the DBSCAN algorithm fails to track cluster centres and cannot be simply
trained and utilised with new data [32].
Table 2. Distribution of papers on clustering technique problems
Clustering techniques Problems No. of papers
Partitioning clustering − P1: Highly susceptible to noise and abnormal values outliers.
− P2: Biased to spherical clusters.
− P3: Sensitive to the initial selection of centroids.
− P4: Less efficient in partitioning initial collection of data into subsets.
10
4
2
2
Hierarchical clustering − H1: Not suitable for big data owing to space and time limitations.
− H2: Expensive and not scalable.
− H3: Hard to detect the gaps between the objects.
3
3
1
Density-based clustering − D1: Slow for large datasets.
− D2: Weak in identifying clusters of variant density.
− D3: Unable to track cluster centres.
− D4: Cannot be simply trained and utilised with new data.
3
1
2
1
Total 31
4.3. Evaluation parameters used in clustering techniques (RQ3)
All these algorithms are explained and analysed based on certain evaluation parameters, such as the
number of clusters, cluster distance, epsilon value and threshold value. Table 3 illustrates the distribution of
relevant papers for the common evaluation parameters used in 21 studies.
As Table 3 shows, several evaluation parameters are used in clustering techniques. The most used
evaluation parameter is the number of clusters-9 papers applied this parameter to evaluate clustering
performance. We found that two papers each used the sum of squared error, silhouette score, overall similarity,
cluster density and threshold value as an evaluation parameter of the clustering technique. In addition, some
other parameters can be used to evaluate the clustering technique, such as cluster distance, the Davies–Bouldin
Index, epsilon value and neighbourhood size.
4.5. Mapping
To provide an overview of clustering algorithms research analysing medical data in the literature, we
present a map in Figure 2 that depicts the distribution of papers on clustering algorithms analysing medical
6. Int J Artif Intell ISSN: 2252-8938
Clustering algorithms for analysing electronic medical record … (Siti Nur Shahidah Zaman Shaha)
1789
data according to their techniques, problems and common evaluation parameters used. This mapping study
presents each paper in a bubble form, where the size and number of each bubble represent the frequency of
papers classified for that category. Because a paper may contribute more than one technique, problem or
evaluation parameters, each of these related papers is divided into more than one factor in each category. For
that reason, the total number of paper counts in the bubble plot is not equal to the total of 27 relevant papers.
Table 3 Distribution of papers on evaluation parameters
No. Evaluation parameter No of papers
1. Sum of Squared Error 2
2. Silhouette Score 2
3. Overall Similarity 2
4. Cluster Density 2
5. Cluster Distance 1
6. Davies–Bouldin Index 1
7. Epsilon Value 1
8. Neighbourhood Size 1
9. Number of Clusters 9
10. Threshold Value 2
Total 21
Figure 2. Distribution of clustering algorithms research by techniques, problems and evaluation parameters
5. DISCUSSIONS
In this study, we aimed to determine the status of recent research on clustering algorithms for
analysing medical data. We performed a mapping study to identify the clustering techniques used, associated
problems and the evaluation parameters used in these techniques and found 27 such studies. Almost all the
reviewed papers were published between 2016 and 2021.
Thus, this study highlights the clustering techniques used to analyse EMRs. We found that 23 studies
have applied clustering techniques for analysing EMRs, namely partitioning, hierarchical and density-based
algorithms. Among all the techniques, the partitioning technique is the most popular. Some studies applied
more than one clustering technique for data analysis. Further, 18 studies used partitioning clustering to cluster
and analyse medical data, followed by five studies that used hierarchical clustering and three that used density-
based clustering.
Next, this study emphasises the problems involved in these clustering techniques. Partitioning
clustering is not recommended for noisy datasets because it is highly susceptible to noise. Next, hierarchical
clustering is sometimes not suitable for big data because it is expensive and not scalable. Meanwhile, density-
based clustering is weak in identifying clusters of variant density. Another issue is the failure to track cluster
centres; further, it cannot be simply trained and utilised with new data.
Last, this study also highlights the evaluation parameters used in clustering techniques. Clustering
techniques are analysed based on certain evaluation parameters. We identified 10 evaluation parameters: sum
7. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1784-1792
1790
of squared error, silhouette score, overall similarity, cluster density, cluster distance, the Davies–Bouldin Index,
epsilon value, neighbourhood size, number of clusters and threshold value. We found that the number of
clusters is the most popular evaluation parameter, with nine studies using it.
6. CONCLUSION
This systematic mapping study has provided an in-depth review of clustering algorithms used for
analysing medical data. Concerning the limitations of the review, as we stated, according to the selection
criteria, we included only studies that clearly examined clustering algorithms for analysing medical data. We
excluded studies that described clustering algorithms but did not use EMRs or medical data. The results show
that three main types of clustering techniques for analysing EMR which are partitioning, hierarchical and
density-based clustering. Meanwhile, based on the mapping results, the most addressed problems of clustering
techniques in the literature are issues that related with highly susceptible to noise and outliers with abnormal
values. However, the results also reveal a lack of significant research in addressing evaluation parameterr
related with cluster distance, the Davies–Bouldin Index, epsilon value and neighbourhood size. We encourage
researchers to consider clustering algorithms used to analyse medical data for improving the management of
medical data.
ACKNOWLEDGEMENTS
The authors would like to thank the Ministry of Education Malaysia and Universiti Teknologi MARA
for their financial support to this project under Lestari Grant No. 600-RMC/MYRA 5/3/LESTARI (078/2020).
We would also like to thank the College of Computing, Informatics & Media, Universiti Teknologi MARA,
Selangor, Malaysia for all the supports.
REFERENCES
[1] G. Canino, P. H. Guzzi, G. Tradigo, A. Zhang, and P. Veltri, “On the analysis of diseases and their related geographical data,” IEEE
Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 228–237, Jan. 2017, doi: 10.1109/JBHI.2015.2496424.
[2] L. Waithera, J. Muhia, and R. Songole, “Impact of electronic medical records on healthcare delivery in Kisii teaching and referral
hospital,” Medical & Clinical Reviews, vol. 03, no. 04, 2017, doi: 10.21767/2471-299x.1000062.
[3] R. Eden, A. Burton-Jones, A. Staib, and C. Sullivan, “Surveying perceptions of the early impacts of an integrated electronic medical
record across a hospital and healthcare service,” Australian Health Review, vol. 44, no. 5, pp. 690–698, 2020,
doi: 10.1071/AH19157.
[4] T. Giaedi, “The impact of electronic medical records on improvement of health care delivery,” Libyan Journal of Medicine, vol. 3,
no. 1, p. 4, 2008, doi: 10.4176/071118.
[5] G. Ferrante, A. Licari, S. Fasola, G. L. Marseglia, and S. La Grutta, “Artificial intelligence in the diagnosis of pediatric allergic
diseases,” Pediatric Allergy and Immunology, vol. 32, no. 3, pp. 405–413, 2021, doi: 10.1111/pai.13419.
[6] D. Schlauch and E. Al., “Development of a real-time risk model (RTRM) for predicting in-hospital COVID-19 mortality,” Nucl.
Phys., vol. 13, no. 1, pp. 104–116, 1959, doi: 10.1101/2021.04.26.21256138.
[7] D. Verma and N. Mishra, “Analysis and prediction of breast cancer and diabetes disease datasets using data mining classification
techniques,” Proceedings of the International Conference on Intelligent Sustainable Systems, ICISS 2017, pp. 533–538, 2018,
doi: 10.1109/ISS1.2017.8389229.
[8] J. R. Koza, F. H. Bennett, D. Andre, and M. A. Keane, “Automated design of both the topology and sizing of analog electrical
circuits using genetic programming,” Artificial Intelligence in Design ’96, pp. 151–170, 1996, doi: 10.1007/978-94-009-0279-4_9.
[9] J. M. Banda et al., “Finding missed cases of familial hypercholesterolemia in health systems using machine learning,” npj Digital
Medicine, vol. 2, no. 1, 2019, doi: 10.1038/s41746-019-0101-5.
[10] A. Pina et al., “Virtual genetic diagnosis for familial hypercholesterolemia powered by machine learning,” European Journal of
Preventive Cardiology, vol. 27, no. 15, pp. 1639–1646, 2020, doi: 10.1177/2047487319898951.
[11] S. Masrom, R. A. Rahman, N. Baharun, and A. S. A. Rahman, “Automated machine learning with genetic programming on real
dataset of tax avoidance classification problem,” in Proceedings of the 2020 9th International Conference on Educational and
Information Technology, Feb. 2020, pp. 139–143, doi: 10.1145/3383923.3383942.
[12] B. Huidong JIN and K.-S. Leung, “Scalable model-based clustering algorithms for large databases and their applications,” 2002.
[13] R. Wang, J. Zhao, L. Peng, B. Yang, L. Wang, and B. Li, “Medical entity recognition of esophageal carcinoma based on
word clustering,” 2018 International Conference on Security, Pattern Analysis, and Cybernetics, SPAC 2018, pp. 348–353, 2018,
doi: 10.1109/SPAC46244.2018.8965515.
[14] V. Ehrenstein, H. Kharrazi, H. Lehmann, and C. O. Taylor, “Obtaining data from electronic health records,” Tools and Technologies
for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, pp. 1–92, 2019.
[15] B. Song, Y. Feng, X. Li, Z. Sun, and Y. Yang, “Un-apriori: A novel association rule mining algorithm for unstructured EMRs,”
2017 IEEE 19th International Conference on e-Health Networking, Applications and Services, Healthcom 2017, vol. 2017-
December, pp. 1–6, 2017, doi: 10.1109/HealthCom.2017.8210792.
[16] M. Hosni et al., “A systematic mapping study for ensemble classification methods in cardiovascular disease,” Artificial Intelligence
Review, vol. 54, no. 4, pp. 2827–2861, 2021, doi: 10.1007/s10462-020-09914-6.
[17] F. Ahmad, N. A. Mat Isa, Z. Hussain, and M. K. Osman, “Intelligent medical disease diagnosis using improved hybrid genetic
algorithm - multilayer perceptron network,” Journal of Medical Systems, vol. 37, no. 2, 2013, doi: 10.1007/s10916-013-9934-7.
8. Int J Artif Intell ISSN: 2252-8938
Clustering algorithms for analysing electronic medical record … (Siti Nur Shahidah Zaman Shaha)
1791
[18] E. Saleh et al., “Learning ensemble classifiers for diabetic retinopathy assessment,” Artificial Intelligence in Medicine, vol. 85,
pp. 50–63, 2018, doi: 10.1016/j.artmed.2017.09.006.
[19] S. Bashir, U. Qamar, F. H. Khan, and L. Naseem, “HMV: A medical decision support framework using multi-layer classifiers for
disease prediction,” Journal of Computational Science, vol. 13, pp. 10–25, 2016, doi: 10.1016/j.jocs.2016.01.001.
[20] Y. Li, E. Porter, A. Santorelli, M. Popović, and M. Coates, “Microwave breast cancer detection via cost-sensitive ensemble
classifiers: Phantom and patient investigation,” Biomedical Signal Processing and Control, vol. 31, pp. 366–376, 2017,
doi: 10.1016/j.bspc.2016.09.003.
[21] M. Umamaheswari and P. I. Devi, “Prediction of myocardial infarction using K-medoid clustering algorithm,” Proceedings of
the 2017 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing, INCOS 2017,
vol. 2018-February, pp. 1–6, 2018, doi: 10.1109/ITCOSP.2017.8303128.
[22] S. Babu et al., “Heart disease diagnosis using data mining technique,” Proceedings of the International Conference on Electronics,
Communication and Aerospace Technology, ICECA 2017, vol. 2017-January, pp. 750–753, 2017,
doi: 10.1109/ICECA.2017.8203643.
[23] W. Sun, Z. Cai, F. Liu, S. Fang, and G. Wang, “A survey of data mining technology on electronic medical records,” 2017 IEEE
19th International Conference on e-Health Networking, Applications and Services, Healthcom 2017, vol. 2017-December,
pp. 1–6, 2017, doi: 10.1109/HealthCom.2017.8210774.
[24] M. Liao, Y. Li, F. Kianifard, E. Obi, and S. Arcona, “Cluster analysis and its application to healthcare claims data: A study of end-
stage renal disease patients who initiated hemodialysis epidemiology and health outcomes,” BMC Nephrology, vol. 17, no. 1, 2016,
doi: 10.1186/s12882-016-0238-2.
[25] M. Sampath Premkumar and S. Hari Ganesh, “A median based external initial centroid selection method for k-means clustering,”
Proceedings - 2nd World Congress on Computing and Communication Technologies, WCCCT 2017, pp. 143–146, 2017,
doi: 10.1109/WCCCT.2016.42.
[26] R. Fang, S. Pouyanfar, Y. Yang, S. C. Chen, and S. S. Iyengar, “Computational health informatics in the big data age: A survey,”
ACM Computing Surveys, vol. 49, no. 1, 2016, doi: 10.1145/2932707.
[27] A. Abugabah, A. Al Smadi, and A. Abuqabbeh, “Data mining in health care sector: Literature notes,” ACM International Conference
Proceeding Series, pp. 63–68, 2019, doi: 10.1145/3372422.3372451.
[28] N. Arora, S. Jain, and S. K. Verma, “Range clustering: An algorithm for empirical evaluation of classical clustering algorithms,”
2016 9th International Conference on Contemporary Computing, IC3 2016, 2017, doi: 10.1109/IC3.2016.7880242.
[29] M. I. Razzak, M. Imran, and G. Xu, “Big data analytics for preventive medicine,” Neural Computing and Applications, vol. 32,
no. 9, pp. 4417–4451, 2020, doi: 10.1007/s00521-019-04095-y.
[30] C. Fiarni, E. M. Sipayung, and S. Maemunah, “Analysis and prediction of diabetes complication disease using data mining
algorithm,” Procedia Computer Science, vol. 161, pp. 449–457, 2019, doi: 10.1016/j.procs.2019.11.144.
[31] R. Delshi Howsalya Devi and P. Deepika, “Performance comparison of various clustering techniques for diagnosis of breast cancer,”
2015 IEEE International Conference on Computational Intelligence and Computing Research, ICCIC 2015, 2016,
doi: 10.1109/ICCIC.2015.7435711.
[32] K. Magoev, V. V Krzhizhanovskaya, and S. V Kovalchuk, “Application of clustering methods for detecting critical acute coronary
syndrome patients,” Procedia Computer Science, vol. 136, pp. 370–379, 2018, doi: 10.1016/j.procs.2018.08.277.
[33] D. Budgen, M. Turner, P. Brereton, and B. Kitchenham, “Using mapping studies in software engineering,” Proceedings of PPIG
2008, vol. 2, pp. 195–204, 2008.
[34] B. Kitchenham, P. Brereton, and D. Budgen, “The educational value of mapping studies of software engineering literature,”
Proceedings - International Conference on Software Engineering, vol. 1, pp. 589–598, 2010, doi: 10.1145/1806799.1806887.
[35] B. A. Kitchenham, D. Budgen, and O. Pearl Brereton, “Using mapping studies as the basis for further research - A participant-
observer case study,” Information and Software Technology, vol. 53, no. 6, pp. 638–651, 2011, doi: 10.1016/j.infsof.2010.12.011.
[36] K. Petersen, S. Vakkalanka, and L. Kuzniarz, “Guidelines for conducting systematic mapping studies in software engineering: An
update,” Information and Software Technology, vol. 64, pp. 1–18, 2015, doi: 10.1016/j.infsof.2015.03.007.
[37] S. E. V. Haryanto, M. Y. Mashor, A. S. A. Nasir, and Z. Mohamed, “Identification of Giemsa staind of malaria using k-means
clustering segmentation technique,” 2018 6th International Conference on Cyber and IT Service Management, CITSM 2018, 2019,
doi: 10.1109/CITSM.2018.8674254.
[38] P. Manivannan and P. I. Devi, “Dengue fever prediction using K-means clustering algorithm,” Proceedings of the 2017 IEEE
International Conference on Intelligent Techniques in Control, Optimization and Signal Processing, INCOS 2017, vol. 2018-
February, pp. 1–5, 2018, doi: 10.1109/ITCOSP.2017.8303126.
[39] M. Guftar, S. H. Ali, A. A. Raja, and U. Qamar, “A novel framework for classification of syncope disease using K-means
clustering algorithm,” IntelliSys 2015 - Proceedings of 2015 SAI Intelligent Systems Conference, pp. 127–132, 2015,
doi: 10.1109/IntelliSys.2015.7361135.
[40] C. Prathibhamol, A. Suresh, and G. Suresh, “Prediction of cardiac arrhythmia type using clustering and regression approach (P-CA-
CRA),” 2017 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2017, vol. 2017-
Janua, pp. 51–54, 2017, doi: 10.1109/ICACCI.2017.8125815.
[41] M. Thangamani, R. Vijayalakshmi, M. Ganthimathi, M. Ranjitha, P. Malarkodi, and S. Nallusamy, “Efficient classification of
heart disease using K-means clustering algorithm,” International Journal of Engineering Trends and Technology, vol. 68, no. 12,
pp. 48–53, 2020, doi: 10.14445/22315381/IJETT-V68I12P209.
[42] T. Cerquitelli, S. Chiusano, and X. Xiao, “Exploiting clustering algorithms in a multiple-level fashion: A comparative study in the
medical care scenario,” Expert Systems with Applications, vol. 55, pp. 297–312, 2016, doi: 10.1016/j.eswa.2016.02.005.
[43] T. Cerquitelli, A. Servetti, and E. Masala, “Discovering users with similar internet access performance through cluster analysis,”
Expert Systems with Applications, vol. 64, pp. 536–548, 2016, doi: 10.1016/j.eswa.2016.08.025.
[44] K. Poczeta, L. Kubus, and A. Yastrebov, “Multidimensional medical data modeling based on fuzzy cognitive maps and k-means
clustering,” Procedia Computer Science, vol. 176, pp. 118–127, 2020, doi: 10.1016/j.procs.2020.08.013.
[45] M. F. Ijaz, M. Attique, and Y. Son, “Data-driven cervical cancer prediction model with outlier detection and over-sampling
methods,” Sensors (Switzerland), vol. 20, no. 10, 2020, doi: 10.3390/s20102809.
[46] K. Chandel, V. Kunwar, A. S. Sabitha, A. Bansal, and T. Choudhury, “Analysing thyroid disease using density-based
clustering technique,” International Journal of Business Intelligence and Data Mining, vol. 17, no. 3, pp. 273–297, 2020,
doi: 10.1504/IJBIDM.2020.109297.
[47] K. Mumtaz, M. Studies, and T. Nadu, “An analysis on density based clustering of multi dimensional spatial data,” Indian Journal
of Computer Science and Engineering, vol. 1, no. 1, pp. 8–12, 2010.
9. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1784-1792
1792
[48] A. Al-Shammari, R. Zhou, M. Naseriparsaa, and C. Liu, “An effective density-based clustering and dynamic maintenance
framework for evolving medical data streams,” International Journal of Medical Informatics, vol. 126, pp. 176–186, 2019,
doi: 10.1016/j.ijmedinf.2019.03.016.
[49] A. Mansoul and B. Atmani, “Representative case-based retrieval to support case-based reasoning for prediction,” International
Journal of Strategic Decision Sciences, vol. 12, no. 3, pp. 14–35, 2021, doi: 10.4018/ijsds.2021070102.
[50] R. M. Liaqat, B. Mehboob, N. A. Saqib, and M. A. Khan, “A framework for clustering cardiac patient’s records using unsupervised
learning techniques,” Procedia Computer Science, vol. 58, pp. 368–373, 2016, doi: 10.1016/j.procs.2016.09.056.
BIOGRAPHIES OF AUTHORS
Siti Nur Shahidah Zaman Shah received Bachelor Degree in Computer Science
from Universiti Teknologi MARA in 2019. She received the Master degree in Computer
Science from Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA
in 2021. She can be contacted at email sitinurshahidaa@gmail.com.
Marshima Mohd Rosli is a senior lecturer at the Department of Computer
Science, Universiti Teknologi MARA, Malaysia, where she has been a faculty member since
2007. Marshima graduated with Bsc (Hons) Information Technology from Universiti Utara
Malaysia in 2001 and an M.Sc. in Real Time Software Engineering from Universiti
Teknologi Malaysia in 2006. She completed her Ph.D in Computer Science from the
University of Auckland, New Zealand, in 2018. Her research interests are primarily in the
area of software engineering, artificial intelligent and data analytics. She can be contacted at
email: marshima@fskm.uitm.edu.my