Healthcare systems generate huge amounts of data from medical tests. Data mining is the computational
process of discovering patterns in large data sets such as medical examination records. Blood diseases are no
exception; a great deal of test data can be collected from patients. In this paper, we apply data
mining techniques to discover the relations between blood test characteristics and blood tumors in order to
predict the disease at an early stage, which can improve the chances of a cure. We conducted
experiments on our blood test dataset using three data mining techniques: association
rules, rule induction, and deep learning. The goal of our experiments is to generate models that can
distinguish patients with normal blood disease from patients who have a blood tumor. We evaluated our
results using different metrics on real data collected from the Gaza European Hospital in Palestine.
The final results showed that association rules can reveal the relationship between blood test
characteristics and blood tumors. They also demonstrated that deep learning classifiers have the best ability to
predict tumor types of blood diseases, with an accuracy of 79.45%. In addition, rule induction produced
rules that describe both blood tumors and normal hematology.
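To make the association-rule idea concrete, the sketch below computes the two standard rule measures, support and confidence, over a handful of made-up records. The item names (`wbc_high`, `hgb_low`, and so on) and the records themselves are hypothetical illustrations, not data or attributes from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / sup_a


# Hypothetical records: discretised blood test findings plus a class label.
records = [
    {"wbc_high", "hgb_low", "tumor"},
    {"wbc_high", "plt_low", "tumor"},
    {"wbc_normal", "hgb_low", "normal"},
    {"wbc_high", "hgb_low", "tumor"},
    {"wbc_normal", "plt_normal", "normal"},
]

rule_support = support(records, {"wbc_high", "tumor"})
rule_confidence = confidence(records, {"wbc_high"}, {"tumor"})
print(f"wbc_high -> tumor: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule miner such as Apriori enumerates candidate item sets and keeps the rules whose support and confidence exceed chosen thresholds; the functions above only score a rule that has already been proposed.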
Anemia is a state of poor health in which the bloodstream contains a low number of red blood cells.
This research aims to design a model for predicting anemia in children under 5 years of age using
Complete Blood Count reports. The data, 700 records, were collected from Kanti Children Hospital.
They were preprocessed, normalized, and balanced, and selected machine learning algorithms were then
applied, followed by verification, validation, and result analysis. Random Forest was the best
performer, with an accuracy of 98.4%. Finally, feature selection and the ensemble learning
methods voting, stacking, bagging, and boosting were applied to improve the performance of the algorithms.
Selecting the best-performing algorithm, stacking it with other algorithms, bagging it, and boosting it are
crucial for improving accuracy, despite the additional time cost, when predicting anemia in children below 5
years of age.
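Of the ensemble methods named above, hard voting is the simplest to illustrate. The sketch below combines the label predictions of three models by majority; the model names and predictions are invented for illustration and do not come from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority.

    `predictions` is a list such as [labels_from_model_a, labels_from_model_b, ...].
    """
    voted = []
    for labels in zip(*predictions):  # labels for one sample across all models
        winner, _count = Counter(labels).most_common(1)[0]
        voted.append(winner)
    return voted


# Hypothetical predictions for four children (1 = anemic, 0 = not anemic).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1]
```

Using an odd number of binary classifiers avoids ties; stacking, bagging, and boosting differ in that they train a meta-model, resample the data, or reweight hard samples rather than simply counting votes.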
This study aimed to develop an unbiased RNA profiling approach for the early detection of colorectal cancer (CRC) and advanced adenomas (AA) using blood samples. The researchers combined a literature review with microarray analysis of circulating RNA purified from plasma to identify RNA biomarker panels. They tested the panels on two cohorts, detecting CRC with 75% sensitivity and 93% specificity using an 8-gene panel, and detecting AA with 60% sensitivity and 87% specificity using a 2-gene panel. The study demonstrates the feasibility of unbiased molecular diagnosis of CRC and AA from blood and introduces circulating RNA profiling as a potential non-invasive screening approach.
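Sensitivity and specificity, as reported above, follow directly from confusion-matrix counts. The sketch below uses made-up counts chosen only so that the rates match the 75% and 93% quoted for the 8-gene panel; they are not the study's cohort sizes.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cases that are detected.
    specificity = TN / (TN + FP): share of non-cases correctly cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: 100 patients with CRC and 100 controls.
sens, spec = sensitivity_specificity(tp=75, fn=25, tn=93, fp=7)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```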
Heart Attack Prediction System Using Fuzzy C Means Classifier (IOSR Journals)
This document presents a heart attack prediction system using a fuzzy C-means classifier. The system utilizes 13 patient attributes as inputs to the fuzzy C-means classifier to determine the risk of a heart attack. The classifier was tested on medical records from 270 patients and achieved a classification accuracy of 92%. Fuzzy C-means clustering allows data points to belong to multiple clusters, providing a more efficient and cost-effective way to predict the likelihood of patients experiencing a heart attack compared to other algorithms.
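The property that a data point can belong to several clusters at once comes from the fuzzy C-means membership formula. The sketch below shows one membership update and one center update in plain Python, assuming Euclidean distance and the common fuzzifier m = 2; it illustrates the general algorithm, not the specific system described in the document, and the points are made up.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)); each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a center: split full membership
            # among the coinciding centers to avoid division by zero.
            zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([z / sum(zeros) for z in zeros])
            continue
        memberships.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return memberships


def fcm_update_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of all points weighted by u_ij ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            tuple(sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim))
        )
    return centers


# Hypothetical 2-D points and two starting centers.
points = [(1.0, 0.0), (0.5, 0.5), (4.0, 1.0), (5.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
centers = fcm_update_centers(points, u)
```

Alternating these two steps until the memberships stop changing gives the usual clustering loop; a classifier can then assign a new patient to the cluster with the highest membership.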
This document discusses the debate between randomized clinical trials (RCTs) and observational studies using big data. While RCTs are better for minimizing bias, observational studies can include more patients and answer questions RCTs cannot. The document outlines several large cancer databases that can help learn from every patient, including SEER and NCDB registries. It describes how these databases are being enriched with additional data sources like EHRs, genomic data, and mobile devices. This evolving use of big data from numerous sources can improve outcomes by better understanding toxicity, costs, and quality of cancer care.
Genome feature optimization and coronary artery disease prediction using cuck... (CSITiaesprime)
Cardiovascular diseases are among the major health problems, leading to millions of deaths every year. In the recent past, analyzing gene expression data, particularly using machine learning strategies to predict and classify unlabeled gene expression records, has become a prominent research issue. A substantial requirement here is feature optimization, since close to 25,000 genes are observed in the human body and 636 of them are cardiovascular-related. Training machine learning models on all of these cardiovascular gene features is therefore a complex process. This manuscript uses a bidirectional pooled variance strategy based on standard ANOVA to select optimal features. In addition, to overcome a constraint observed in traditional classifiers, namely unstable accuracy under k-fold cross-validation, this manuscript proposes a classification strategy built upon the swarm intelligence technique called cuckoo search. The experimental study indicates that the number of optimal features selected by the proposed model is substantially lower than that of a contemporary model that selects features using forward feature selection and classifies using a support vector machine classifier (FFS&SVM). The experimental study also showed that the proposed model, which selects features by bidirectional pooled variance estimation and classifies using the proposed cuckoo-search-based strategy (BPVE&CS), outperformed the selected contemporary model (FFS&SVM).
A KNOWLEDGE DISCOVERY APPROACH FOR BREAST CANCER MANAGEMENT IN THE KINGDOM OF... (hiij)
In this paper, we introduce an approach to improve and support the decision-making process for breast cancer management in the Kingdom of Saudi Arabia. This can be accomplished by applying different association rule mining algorithms to the cancer information system in Saudi Arabia. The approach also provides valuable information about the predicted distribution and segmentation of cancer in Saudi Arabia, which may be linked to possible risk factors. From the extracted patterns, the information that needs to be considered in the decision-making process can be identified, which leads to knowledge-based decisions. Consequently, health risk behaviors among target groups of patients can be identified and intervention and preventive measures adopted in order to decrease breast cancer incidence and prevalence and, ultimately, health care costs.
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING (IJDKP)
Heart disease is currently the most commonly reported disease in the United States among both sexes, and
according to official statistics about fifty percent of the American population suffers from some form of
cardiovascular disease. This paper performs chi-square tests and linear regression analysis to predict
heart disease based on symptoms such as chest pain and dizziness. This paper will help the healthcare sector
provide better assistance to patients suffering from heart disease by predicting it at an early stage of the
disease. A chi-square test is conducted to identify whether there is a relation between chest pain and heart
disease cases in the United States by analyzing a heart disease dataset from IEEE Data Port. The test results
and analysis show that males in the United States are the most likely to develop heart disease, with
symptoms such as chest pain, dizziness, shortness of breath, fatigue, and nausea. The tests also identify
a weak correlation of 0.5, which shows that people of all ages, including teens, can face
heart disease and that its prevalence increases with age. The tests further indicate that 90 percent of the
participants who experience severe chest pain suffer from heart disease, that the majority of the
heart disease cases identified are in males, and that only 10 percent of participants are identified as healthy.
The evaluated p-values are much greater than the statistical threshold of 0.05, from which the paper concludes that factors like
sex, Exercise angina, Cholesterol, old peak, ST_Slope, obesity, and blood sugar play a significant role in the
onset of cardiovascular disease. We tested the dataset with a prediction model built on logistic
regression and observed an accuracy of 85.12 percent.
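The chi-square test of independence mentioned above can be sketched in a few lines. The contingency table below (chest pain versus heart disease) holds made-up counts for illustration only; it is not taken from the dataset the paper analyzes.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. The expected count for a cell
    is row_total * column_total / grand_total under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = chest pain yes/no, columns = heart disease yes/no.
observed = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_square_statistic(observed)
print(f"chi-square={stat:.2f}, dof={dof}")  # chi-square=20.00, dof=1
```

For one degree of freedom, a statistic above 3.841 corresponds to a p-value below 0.05, so this made-up table would indicate an association between the two variables.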
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE... (mlaij)
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work applies three classification algorithms, namely
J48, Random Forest, and Naïve Bayes, to a medical dataset of heart disease to assess their accuracy. We also examine the
impact of the feature selection method. A comparative analysis was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O... (csitconf)
Feature Selection (FS) has become the focus of much research in decision support systems,
an area in which datasets with a tremendous number of variables are analyzed. In this paper we
present a new method for the diagnosis of Coronary Artery Disease (CAD) founded on Genetic
Algorithm (GA) wrapped Bayes Naïve (BN) based FS.
The CAD dataset contains two classes defined with 13 features. In the GA–BN algorithm, the GA
generates in each iteration a subset of attributes that is then evaluated using BN in the
second step of the selection procedure. The final set of attributes contains the most relevant
features, which increases the accuracy. The algorithm in this case produces 85.50%
classification accuracy in the diagnosis of CAD. The performance of the algorithm is then
compared with that of Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and the
C4.5 decision tree algorithm. The classification accuracies for those algorithms are
83.5%, 83.16%, and 80.85%, respectively. The GA wrapped BN algorithm is
likewise compared with other FS algorithms. The obtained results are very
promising for the diagnosis of CAD.
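The wrapper loop described above (the GA proposes a feature subset, a classifier scores it) can be sketched as follows. This is a generic genetic algorithm over 13-bit feature masks under stated assumptions: the population size, mutation rate, and single-point crossover are illustrative choices, and `toy_fitness` is a made-up stand-in for the step where the paper evaluates a subset with Naïve Bayes accuracy.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search for a high-fitness feature mask (list of 0/1) with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # keep the fitter half as parents
        children = [ranked[0][:]]           # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


def toy_fitness(mask):
    """Hypothetical score: reward three 'useful' features, penalise subset size.

    In a real wrapper this would be the cross-validated accuracy of a
    classifier trained on the selected columns.
    """
    useful = {0, 3, 7}
    hits = sum(1 for i in useful if mask[i])
    return hits - 0.1 * sum(mask)


best_mask = ga_select(13, toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
```

Replacing `toy_fitness` with a function that trains and scores a Naïve Bayes model on the masked columns would turn the sketch into the kind of wrapper the abstract describes.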
Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o... (cscpconf)
This document presents a new method for diagnosing coronary artery disease (CAD) using genetic algorithm (GA) wrapped Bayes naive (BN) feature selection. The method uses a GA to generate feature subsets that are evaluated using BN classification. Over multiple iterations, the GA selects the feature subset that provides the highest accuracy. The algorithm is tested on a CAD dataset containing 13 features and achieves 85.5% classification accuracy. This performance is compared to other machine learning algorithms like SVM, MLP and C4.5 decision trees, which achieve lower accuracies of 83.5%, 83.16% and 80.85% respectively. The proposed method is also compared to other feature selection techniques like best first search and sequential floating forward search wrapped
This document summarizes a presentation given by Peter Embi on clinical and translational research and informatics literature from 2012-2013. It begins with Embi's background and approach to identifying relevant papers. It then describes the topics covered in the presentation, which are grouped into categories like clinical data reuse, data management/discovery, researcher support/resources, and recruitment. For each category, 1-2 key papers are summarized in 1-3 sentences. The summaries highlight the papers' goals, methods, and conclusions. The document concludes by mentioning other notable papers and events from the past year.
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ... (BASMAJUMAASALEHALMOH)
This document discusses machine learning techniques for diagnosing cardiac disease. It evaluates three datasets using different machine learning algorithms and proposes a custom convolutional neural network and extreme gradient boosting hybrid model that shows better accuracy. It also proposes a custom sequential dense neural network model with seven layers that achieves 92.3% accuracy on a modified Cleveland dataset for diagnosing cardiac disease. Previous related work applying machine learning methods like decision trees, K-nearest neighbors, and neural networks to cardiac disease diagnosis is also reviewed.
Machine learning approach for predicting heart and diabetes diseases using da... (IAESIJAI)
This document describes a study that uses machine learning techniques to predict heart disease and diabetes from medical data. The study collected data from a public repository and preprocessed it to handle missing values. Feature selection was performed using chi-square and principal component analysis to identify important features. Three boosting classifiers - Adaptive boosting, Gradient boosting, and Extreme Gradient boosting - were trained on the data and evaluated based on accuracy. The results showed that the boosting classifiers achieved accurate prediction for both heart disease and diabetes, with the highest accuracy reported for specific classifiers and diseases.
Breast cancer diagnosis via data mining performance analysis of seven differe... (cseij)
According to the World Health Organization (WHO), breast cancer is the top cancer in women in both the
developed and the developing world. Increased life expectancy, urbanization, and adoption of western
lifestyles increase the occurrence of breast cancer in the developing world. Most cases are
diagnosed in the late phases of the illness, so early detection is crucial for improving breast cancer
outcomes and survival.
This study is intended to contribute to the early diagnosis of breast cancer. An analysis of breast
cancer diagnoses for patients is given. To this end, data about patients whose
cancers have already been diagnosed are first gathered and organized; these data are then used to predict
whether other patients have breast cancer. The predictions are made
with seven different algorithms, and the accuracy of each is
reported. The patient data were taken from the UCI Machine Learning Repository, thanks to Dr.
William H. Wolberg from the University of Wisconsin Hospitals, Madison. During the prediction process, the
RapidMiner 5.0 data mining tool is used to apply the desired algorithms.
SPATIAL CLUSTERING AND ANALYSIS ON HEPATITIS C VIRUS INFECTIONS IN EGYPT (IJDKP)
Many studies worldwide have examined the prevalence of Hepatitis C Virus (HCV) in human populations. Spatial data analysis and cluster detection are vital in HCV monitoring to discover high-risk areas and to help decision makers draw hypotheses about the cause of disease. Egypt has been declared one of the countries with the highest prevalence rate of HCV worldwide. The anomalous distribution of HCV infection in Egypt has led several research efforts to identify the reasons that contributed to such a wide spread of HCV in the country. One way to help identify the areas with the highest disease burden is to provide detailed knowledge about the geographical distribution of HCV in Egypt. To achieve that goal, data mining analytical tools integrated with GIS can help visualize the distribution. Thus, the main purpose of this paper is to present the spatial distribution of HCV in Egypt using case data obtained from the Egyptian health institute National Hepatology Tropical Medicine Research Institute (NHTMR). Visualizing the spatial analysis by means of GIS allows us to investigate statistical results in a form that is easily interpreted by non-experts.
javed_prethesis2608 on prediction of heart disease (javed75)
This document presents a thesis proposal for using deep neural networks to predict heart disease from patient data. The motivation is to improve prediction accuracy rates compared to previous machine learning models. The objectives are to explore patient data, preprocess the data, create a neural network model in TensorFlow, train the model, and predict heart disease. The methodology involves feature processing, feature selection, data preprocessing, creating a neural network, training the model, and predicting outcomes. The expected outcome is a prediction accuracy rate of over 75% using this deep learning approach.
Heart disease prediction by using novel optimization algorithm_ A supervised ... (BASMAJUMAASALEHALMOH)
This document discusses using a novel optimization algorithm called Salp Swarm Optimization (SSO) to predict heart disease. It aims to design a framework for heart disease prediction using major risk factors and different classifier algorithms like Naive Bayes, Support Vector Machine, K-Nearest Neighbors, and a Salp Swarm Optimized Neural Network (SSO-NN). The highest performance was obtained using a Bayesian Optimized Support Vector Machine with 93.3% accuracy, followed by SSO-NN with 86.7% accuracy. The results show that the proposed novel optimized algorithm can provide an effective healthcare monitoring system for early heart disease prediction.
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH... (ijdms)
This document discusses applying machine learning algorithms to predict chronic kidney disease. It:
1) Applied three algorithms (C4.5 decision tree, SVM, and Bayesian Network) to a chronic kidney disease dataset containing 400 patients and 24 attributes to classify patients as having chronic kidney disease or not.
2) Found that the C4.5 decision tree algorithm had the best performance based on accuracy (63%), error rate (0.37), kappa statistic (0.97), and other evaluation metrics. SVM and Bayesian Network performance was lower.
3) Concludes C4.5 decision tree is the most efficient algorithm for predicting chronic kidney disease based on this medical dataset.
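The evaluation metrics listed above (accuracy, error rate, kappa statistic) can all be derived from one confusion matrix. The sketch below computes them for a made-up two-class matrix; the counts are hypothetical and are not the results reported in the document.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, error_rate, kappa) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (accuracy) and p_e is the agreement expected by chance from the marginals.
    """
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, 1 - p_o, kappa


# Hypothetical matrix: rows = actual (CKD, not CKD), columns = predicted.
matrix = [
    [45, 5],
    [10, 40],
]
acc, err, kappa = accuracy_and_kappa(matrix)
print(f"accuracy={acc:.2f}, error rate={err:.2f}, kappa={kappa:.2f}")
```

Because the error rate is one minus the accuracy and kappa cannot exceed the accuracy by this formula, the three figures can be cross-checked against one another when reading a results table.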
Lung Nodule Feature Extraction and Classification using Improved Neural Netwo... (IRJET Journal)
1) The document presents a technique for lung nodule feature extraction and classification using an Improved Neural Network Algorithm (INNA).
2) Texture features are extracted from CT lung images containing nodules using a Grey Level Co-occurrence Matrix based gradient approach.
3) The extracted features are used to classify lung nodules using INNA, which utilizes an enhanced backpropagation learning rule.
4) Simulation results show the proposed INNA technique achieves 98.99% accuracy in classifying cancer datasets, outperforming other techniques.
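The texture step in the list above relies on a Grey Level Co-occurrence Matrix (GLCM). The sketch below builds a plain GLCM for one pixel offset and derives the contrast feature from it; the document's gradient-based variant is not specified here, so this shows only the basic construction, on a tiny made-up image.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Count how often grey level a has grey level b at offset (dx, dy).

    `image` is a list of rows of integer grey levels. The result is a
    levels x levels matrix where entry [a][b] is the number of pixel pairs.
    """
    if levels is None:
        levels = max(max(row) for row in image) + 1
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of (a - b)^2 weighted by normalised pair frequency."""
    total = sum(sum(row) for row in matrix)
    return sum((a - b) ** 2 * count
               for a, row in enumerate(matrix)
               for b, count in enumerate(row)) / total


# Hypothetical 3x3 image patch with grey levels 0..2.
patch = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
m = glcm(patch)  # horizontal neighbour, one pixel to the right
print(m)         # [[1, 2, 0], [0, 0, 2], [0, 0, 1]]
print(round(contrast(m), 4))
```

Features such as contrast, energy, and homogeneity computed from matrices at several offsets are what would typically be fed to the classifier in the following step.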
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI... (ijcsa)
Heart disease is one of the biggest health problems in the world because of the high mortality and morbidity
it causes. The use of data mining on medical data has brought valuable and effective
results and can enhance medical knowledge for making necessary decisions. Data mining plays an
important role in medical science in solving health problems and diagnosing ailments in both critical
and normal conditions. For this reason, in this paper, data mining techniques are used to
diagnose heart disease from a dataset that includes 200 samples from different patients. The techniques used
include Bagging, AdaBoostM1, Random Forest, Naive Bayes, RBF Network, IBK,
and NNge, all run with the Weka tool. These techniques are then
compared to determine which is the most accurate in the diagnosis of heart disease. According to the
results, the RBF Network, with an accuracy of 88.2%, is the most accurate classifier for
the diagnosis of heart disease.
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASE IJCSEIT Journal
Breast cancer is one of the leading cancers for women in developed countries including India. It is the
second most common cause of cancer death in women. The incidence of breast cancer in women has
increased significantly in recent years. In this paper we discuss various data mining approaches
that have been utilized for breast cancer diagnosis and prognosis. Breast cancer diagnosis
distinguishes benign from malignant breast lumps, and breast cancer prognosis predicts when breast
cancer is likely to recur in patients who have had their cancers excised. This paper summarizes various
review and technical articles on breast cancer diagnosis and prognosis. We also focus on current research
that uses data mining techniques to enhance breast cancer diagnosis and prognosis.
A Survey on Various Disease Prediction Techniques ijtsrd
Various diseases have been predicted using multiple data mining and text mining techniques. In this article we discuss six prediction techniques. Gene expression patterns are used to predict disease outcome, and a pathway-based approach classifies disease based on hyper-box principles. We also present a novel hybrid prediction model with missing value imputation (HPM-MI), which analyzes imputation using simple k-means clustering. A technique based on the Constraint Class Association Rule (CCAR) has been used to reduce the time needed to predict a particular disease. We discuss text mining techniques and their applications. Another technique studied concerns predicting hypertriglyceridemia from anthropometric measures, which diverge according to age and gender. Using multilayer classifiers for disease prediction, high diagnostic accuracy and high performance can be achieved. C. Leancy Jannet | G. V. Sumalatha, "A Survey on Various Disease Prediction Techniques", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-6, October 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18624.pdf
This paper helps predict diabetes by applying data mining techniques. The discovery of knowledge
from clinical datasets is important for making effective medical diagnoses. The aim of data mining is to
extract knowledge from data stored in a dataset and produce clear and understandable descriptions of patterns. Diabetes
is a chronic disease and a major public health challenge around the world. Using data mining
techniques on HbA1c test data to help predict diabetes has gained significant popularity. In this paper,
six classification models are used to classify patients as diabetic or non-diabetic and as male or female. The
dataset was gathered from the Diagnostic and Research Laboratory, Liaquat University of Medical and Health
Sciences, Jamshoro, which collects data on patients with and without diabetes by taking blood samples
and performing the HbA1c test. We used the Weka tool for the diabetic versus non-diabetic analysis. Out of six
classification algorithms, four achieve one hundred percent accuracy on training and test data.
KEY WORDS: Data mining, Diabetes, HbA1c, Classification models, Weka.
Are you interested in learning how to prevent hospital readmissions for your diabetic population? It is a popular belief that measuring blood glucose is the most predictive variable in determining a hospital readmission for a diabetic patient. However, many providers of care simply do not perform the test on known diabetic patients. This study looks at an advanced analytic method that works within the current healthcare provider's workflow to identify the likelihood of a future 30-day unplanned readmission before hospital discharge.
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...IJDKP
This document discusses examining the effect of feature selection on improving patient deterioration prediction in intensive care units. The authors apply feature selection techniques to laboratory test data from the MIMIC-II database to identify the most important laboratory tests for predicting patient deterioration. They find that feature selection can help reduce redundant tests, potentially saving costs and allowing earlier treatment. The selected features provide insights into critical tests without domain expertise. In future work, the authors plan to evaluate additional feature selection methods and classification algorithms on this task.
A PRACTICAL APPROACH TO PREDICTING DEPRESSION: VERBAL AND NON-VERBAL INSIGHTS...hiij
While global standards have been established for diagnosing depression, the reliance on expert judgement
and observation remains a challenge. This study delves into a potential approach of efficient data
collection to increase the practicability of machine learning models in accurately predicting depression
based on a comprehensive analysis of verbal and non-verbal cues exhibited by individuals.
Health Disparities: Differences in Veteran and Non-Veteran Populations using ...hiij
Introduction: This study investigated self-reported health status, health screenings, vision problems, and
vaccination rates among veteran and non-veteran groups to uncover health disparities that are critical for
informed health system planning for veteran populations.
Methods: Using public-use data from the National Health Interview Survey (2015-2018), this study adopts
an ecologic cross-sectional approach to conduct an in-depth analysis and visualization of the data assisted
by Generative AI, specifically ChatGPT-4. This integration of advanced AI tools with traditional
epidemiological principles enables systematic data management, analysis, and visualization, offering a
nuanced understanding of health dynamics across demographic segments and highlighting disparities
essential for veteran health system planning.
Findings: Disparities in self-reports of health outcomes, health screenings, vision problems, and
vaccination rates were identified, emphasizing the need for targeted interventions and policy adjustments.
Conclusion: Insights from this study could inform health system planning, using epidemiological data
assessment to suggest enhancements for veteran healthcare delivery. These findings highlight the value of
integrating Generative AI with epidemiological analysis in shaping public health policy and health
planning.
Similar to BLOOD TUMOR PREDICTION USING DATA MINING TECHNIQUES
A KNOWLEDGE DISCOVERY APPROACH FOR BREAST CANCER MANAGEMENT IN THE KINGDOM OF...hiij
In this paper, we introduce an approach to improve and support the decision-making process for breast cancer management in the Kingdom of Saudi Arabia. This is accomplished by applying different association rule mining algorithms to the cancer information system in Saudi Arabia. The approach also provides valuable information about the predicted distribution and segmentation of cancer in Saudi Arabia, which may be linked to possible risk factors. From the extracted patterns, the information that needs to be considered in the decision-making process can be identified and recognized, which leads to knowledge-based decisions. Consequently, health risk behaviors among target groups of patients can be identified, and interventional and preventive measures can be adopted in order to decrease breast cancer incidence and prevalence and, ultimately, health care costs.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative analysis was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
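The three evaluation metrics named in this abstract follow directly from the counts of a two-class confusion matrix. A minimal sketch is given here; the function name `binary_metrics` and the counts passed to it are hypothetical and do not come from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    tp/fn: diseased patients classified correctly / missed.
    tn/fp: healthy patients classified correctly / flagged by mistake.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, for illustration only.
accuracy, sensitivity, specificity = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(accuracy, sensitivity, specificity)
```

Reporting sensitivity and specificity alongside accuracy matters for diagnostic tasks, because a classifier can reach high accuracy on an imbalanced dataset while still missing many diseased patients.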
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...csitconf
Feature Selection (FS) has become the focus of much research in decision support systems,
an area in which datasets with a very large number of variables are analyzed. In this paper we
present a new method for the diagnosis of Coronary Artery Disease (CAD) founded on Genetic
Algorithm (GA) wrapped Bayes Naïve (BN) based FS.
The CAD dataset contains two classes defined with 13 features. In the GA–BN algorithm, the GA
generates in each iteration a subset of attributes that is then evaluated using the BN in the
second step of the selection procedure. The final set of attributes contains the most relevant
features, which increases the accuracy. The algorithm in this case achieves 85.50%
classification accuracy in the diagnosis of CAD. The performance of the algorithm is then
compared with that of Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and the
C4.5 decision tree algorithm. The classification accuracies of those algorithms are
83.5%, 83.16% and 80.85%, respectively. The GA wrapped BN algorithm is
also compared with other FS algorithms. The obtained results show very
promising outcomes for the diagnosis of CAD.
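The wrapper loop described in this abstract, in which the GA proposes feature subsets and a classifier scores them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `ga_feature_selection`, the population size, mutation rate, tournament selection and one-point crossover are all choices made for the example, and `toy_score` with its `INFORMATIVE` set is a hypothetical stand-in for the naive Bayes classification accuracy that the paper uses as fitness.

```python
import random


def ga_feature_selection(score, n_features, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximises score(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=score)
    return best, score(best)


# Hypothetical fitness: rewards three "useful" features, penalises mask size.
# In the paper's setting this would be the classifier accuracy on the subset.
INFORMATIVE = {0, 3, 7}


def toy_score(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask, best_score = ga_feature_selection(toy_score, n_features=13)
print(best_mask, best_score)
```

Replacing `toy_score` with a function that trains and evaluates a classifier on the masked columns turns the sketch into a wrapper method in the sense used above; the 13-feature mask length mirrors the CAD dataset described in the abstract.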
Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...cscpconf
This document presents a new method for diagnosing coronary artery disease (CAD) using genetic algorithm (GA) wrapped Bayes naive (BN) feature selection. The method uses a GA to generate feature subsets that are evaluated using BN classification. Over multiple iterations, the GA selects the feature subset that provides the highest accuracy. The algorithm is tested on a CAD dataset containing 13 features and achieves 85.5% classification accuracy. This performance is compared to other machine learning algorithms like SVM, MLP and C4.5 decision trees, which achieve lower accuracies of 83.5%, 83.16% and 80.85% respectively. The proposed method is also compared to other feature selection techniques like best first search and sequential floating forward search wrapped
This document summarizes a presentation given by Peter Embi on clinical and translational research and informatics literature from 2012-2013. It begins with Embi's background and approach to identifying relevant papers. It then describes the topics covered in the presentation, which are grouped into categories like clinical data reuse, data management/discovery, researcher support/resources, and recruitment. For each category, 1-2 key papers are summarized in 1-3 sentences. The summaries highlight the papers' goals, methods, and conclusions. The document concludes by mentioning other notable papers and events from the past year.
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...BASMAJUMAASALEHALMOH
This document discusses machine learning techniques for diagnosing cardiac disease. It evaluates three datasets using different machine learning algorithms and proposes a custom convolutional neural network and extreme gradient boosting hybrid model that shows better accuracy. It also proposes a custom sequential dense neural network model with seven layers that achieves 92.3% accuracy on a modified Cleveland dataset for diagnosing cardiac disease. Previous related work applying machine learning methods like decision trees, K-nearest neighbors, and neural networks to cardiac disease diagnosis is also reviewed.
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
This document describes a study that uses machine learning techniques to predict heart disease and diabetes from medical data. The study collected data from a public repository and preprocessed it to handle missing values. Feature selection was performed using chi-square and principal component analysis to identify important features. Three boosting classifiers - Adaptive boosting, Gradient boosting, and Extreme Gradient boosting - were trained on the data and evaluated based on accuracy. The results showed that the boosting classifiers achieved accurate prediction for both heart disease and diabetes, with the highest accuracy reported for specific classifiers and diseases.
Breast cancer diagnosis via data mining performance analysis of seven differe...cseij
According to the World Health Organization (WHO), breast cancer is the top cancer in women both in the
developed and the developing world. Increased life expectancy, urbanization and adoption of western
lifestyles trigger the occurrence of breast cancer in the developing world. Most cancer events are
diagnosed in the late phases of the illness, so early detection is crucial for improving breast cancer
outcome and survival.
This study is intended to contribute to the early diagnosis of breast cancer. An analysis of breast
cancer diagnoses for patients is given. For this purpose, data about patients whose
cancers have already been diagnosed are first gathered and arranged; these data are then used to predict
whether other patients have breast cancer. The predictions
are made with seven different algorithms, and the accuracy of each is
given. The patient data were taken from the UCI Machine Learning Repository, thanks to Dr.
William H. Wolberg of the University of Wisconsin Hospitals, Madison. During the prediction process,
the RapidMiner 5.0 data mining tool is used to apply the desired algorithms.
SPATIAL CLUSTERING AND ANALYSIS ON HEPATITIS C VIRUS INFECTIONS IN EGYPT IJDKP
Many studies worldwide have examined the prevalence of Hepatitis C Virus (HCV) in human populations. Spatial data analysis and cluster detection are vital in HCV monitoring to discover high-risk areas and to help decision makers draw hypotheses about the cause of disease. Egypt is reported to be one of the countries with the highest prevalence of HCV worldwide. The anomalous distribution of HCV infection in Egypt has led several studies to investigate the reasons that contributed to the wide spread of HCV in the country. One way to help identify the areas with the highest disease burden is to provide detailed knowledge of the geographical distribution of HCV in Egypt. To achieve that goal, data mining analytical tools integrated with GIS can help visualize the distribution. Thus, the main purpose of this paper is to present the spatial distribution of HCV in Egypt using case data obtained from the Egyptian health institute National Hepatology Tropical Medicine Research Institute (NHTMR). Visualizing the spatial analysis by means of GIS allows us to investigate statistical results that are easily interpreted by non-experts.
javed_prethesis2608 on predcition of heart diseasejaved75
This document presents a thesis proposal for using deep neural networks to predict heart disease from patient data. The motivation is to improve prediction accuracy rates compared to previous machine learning models. The objectives are to explore patient data, preprocess the data, create a neural network model in TensorFlow, train the model, and predict heart disease. The methodology involves feature processing, feature selection, data preprocessing, creating a neural network, training the model, and predicting outcomes. The expected outcome is a prediction accuracy rate of over 75% using this deep learning approach.
Heart disease prediction by using novel optimization algorithm_ A supervised ...BASMAJUMAASALEHALMOH
This document discusses using a novel optimization algorithm called Salp Swarm Optimization (SSO) to predict heart disease. It aims to design a framework for heart disease prediction using major risk factors and different classifier algorithms like Naive Bayes, Support Vector Machine, K-Nearest Neighbors, and a Salp Swarm Optimized Neural Network (SSO-NN). The highest performance was obtained using a Bayesian Optimized Support Vector Machine with 93.3% accuracy, followed by SSO-NN with 86.7% accuracy. The results show that the proposed novel optimized algorithm can provide an effective healthcare monitoring system for early heart disease prediction.
Health Informatics - An International Journal (HIIJ)hiij
Healthcare Informatics: An International Journal is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of health care.
The journal focuses on all aspects of the theory, practice, and applications of Digital Health Records, Knowledge Engineering in Health, E-Health Information, Information Management in Healthcare, Bio-Medical Expert Systems, ICT in health promotion, and related topics. Original contributions are solicited on topics covered under broad areas such as (but not limited to) those listed below:
Health Informatics - An International Journal (HIIJ)hiij
Healthcare Informatics: An International Journal is a quarterly open access peer-reviewed journal that Publishes articles which contribute new results in all areas of the health care.
The journal focuses on all of aspect in theory, practices, and applications of Digital Health Records, Knowledge Engineering in Health, E-Health Information, and Information Management in healthcare, Bio-Medical Expert Systems, ICT in health promotion and related topics. Original contributions are solicited on topics covered under the broad areas such as (but not limited to) listed below:
HEALTH DISPARITIES: DIFFERENCES IN VETERAN AND NON-VETERAN POPULATIONS USING ...hiij
Introduction: This study investigated self-reported health status, health screenings, vision problems, and
vaccination rates among veteran and non-veteran groups to uncover health disparities that are critical for
informed health system planning for veteran populations.
Methods: Using public-use data from the National Health Interview Survey (2015-2018), this study adopts
an ecologic cross-sectional approach to conduct an in-depth analysis and visualization of the data assisted
by Generative AI, specifically ChatGPT-4. This integration of advanced AI tools with traditional
epidemiological principles enables systematic data management, analysis, and visualization, offering a
nuanced understanding of health dynamics across demographic segments and highlighting disparities
essential for veteran health system planning.
Findings: Disparities in self-reports of health outcomes, health screenings, vision problems, and
vaccination rates were identified, emphasizing the need for targeted interventions and policy adjustments.
Conclusion: Insights from this study could inform health system planning, using epidemiological data
assessment to suggest enhancements for veteran healthcare delivery. These findings highlight the value of
integrating Generative AI with epidemiological analysis in shaping public health policy and health
planning.
Health Informatics - An International Journal (HIIJ)hiij
Healthcare Informatics: An International Journal is a quarterly open access peer-reviewed journal that Publishes articles which contribute new results in all areas of the health care.
The journal focuses on all of aspect in theory, practices, and applications of Digital Health Records, Knowledge Engineering in Health, E-Health Information, and Information Management in healthcare, Bio-Medical Expert Systems, ICT in health promotion and related topics. Original contributions are solicited on topics covered under the broad areas such as (but not limited to) listed below:
BRIEF COMMENTARY: USING A LOGIC MODEL TO INTEGRATE PUBLIC HEALTH INFORMATICS ...
The COVID-19 pandemic has been a watershed moment in public health surveillance, highlighting the
crucial role of data-driven insights in informing health actions and policies. Revisiting key concepts—
public health, epidemiology in public health practice, public health surveillance, and public health
informatics—lays the foundation for understanding how these elements converge to create a robust public
health surveillance system framework. Especially during the COVID-19 pandemic, this integration was
exemplified by the WHO efforts in data dissemination and the subsequent global response. The role of
public health informatics emerged as instrumental in this context, enhancing data collection, management,
analysis, interpretation, and dissemination processes. A logic model for public health surveillance systems
encapsulates the integration of these concepts. It outlines the inputs and outcomes and emphasizes the
crucial actions and resources for effective system operation, including the imperative of training and
capacity development.
AUTOMATIC AND NON-INVASIVE CONTINUOUS GLUCOSE MONITORING IN PAEDIATRIC PATIENTS
Glycated haemoglobin does not allow you to highlight the effects that food choices, physical activity and
medications have on your glycaemic control day by day. The best way to monitor and keep track of the
immediate effects that these have on your blood sugar levels is self-monitoring, and therefore the use of a
glucometer. This tool gives you prompt information that helps you to
intervene in the most appropriate way, bringing or keeping your blood sugar levels as close as possible to
the reference values indicated by your doctor. Currently, blood glucose meters are used to measure and
control blood glucose. Diabetes is a fairly complex disease, and it is important for those who suffer from it
to check their blood sugar periodically throughout the day to prevent dangerous
complications. Many children newly diagnosed with diabetes and their families may face unique challenges
when dealing with the everyday management of diabetes, including treatments, adapting to dietary
changes, and the routine monitoring of blood glucose. Many questions may also arise when selecting a
blood glucose meter for paediatric patients. With current blood glucose meters, even with multiple daily
self-tests, high and low blood glucose levels may not be detected. Key factors that may be considered when
selecting a meter include accuracy of the meter; size of the meter; small sample size required for testing;
ease of use and an easy-to-follow testing procedure; ability to use alternate testing sites; quick testing time and
availability of results; ease of portability to allow testing at school and during leisure time; easy-to-read
numbers on the display; memory options; and cost of the meter and supplies. In this study we present a new
automatic, portable, non-invasive and painless device for continuous daily monitoring (24 hours a day)
of blood glucose in paediatric patients.
INTEGRATING MACHINE LEARNING IN CLINICAL DECISION SUPPORT SYSTEMS
This review article examines the role of machine learning (ML) in enhancing Clinical Decision Support
Systems (CDSSs) within the modern healthcare landscape. Focusing on the integration of various ML
algorithms, such as regression, random forest, and neural networks, the review aims to showcase their
potential in advancing patient care. A rapid review methodology was utilized, involving a survey of recent
articles from PubMed and Google Scholar on ML applications in healthcare. Key findings include the
demonstration of ML's predictive power in patient outcomes, its ability to augment clinician knowledge,
and the effectiveness of ensemble algorithmic approaches. The review highlights specific applications of
diverse ML models, including moment kernel machines in predicting surgical outcomes, k-means clustering
in simplifying disease phenotypes, and extreme gradient boosting in estimating injury risk. Emphasizing
the potential of ML to tackle current healthcare challenges, the article highlights the critical role of ML in
evolving CDSSs for improved clinical decision-making and patient care. This comprehensive review also
addresses the challenges and limitations of integrating ML into healthcare systems, advocating for a
collaborative approach to refine these systems for safety, efficacy, and equity.
The Proposed Guidelines for Cloud Computing Migration for South African Rural...
It is now overdue for hospitals in South African rural areas to implement cloud computing technologies in order to access patient data quickly in an emergency. Medical practitioners sometimes take a long time to attend to patients because records are unavailable, leading either to lost time or to repeating processes to recapture lost patient files. The few studies that highlight the challenges faced by rural hospitals do not recommend strategies for migrating to cloud computing. The purpose of this paper was to review recent papers about the critical factors that influence South African hospitals in adopting cloud computing. The contribution of the study is to lay out the importance of cloud computing in the health sector and to suggest guidelines that South African rural hospitals can follow in order to migrate successfully to cloud computing. The existing literature revealed that hospitals may enhance their record-keeping procedures and conduct business more effectively with the help of cloud computing. In conclusion, if hospitals in South African rural areas are to fully benefit from cloud-based records management systems, challenges relating to data storage, privacy, security, and the digital divide must be overcome.
SUPPORTING LARGE-SCALE NUTRITION ANALYSIS BASED ON DIETARY SURVEY DATA
While online survey systems facilitate the collection of copious records on diet, exercise and other health-related data, scientists and other public health experts typically must download data from those systems
into external tools for conducting statistical analyses. A more convenient approach would enable
researchers to perform analyses online, without the need to coordinate additional analysis tools. This
paper presents a system illustrating such an approach, using as a testbed the WAVE project, which is a
5-year childhood obesity prevention initiative being conducted at Oregon State University by health scientists
utilizing a web application called WavePipe. This web application has enabled health scientists to create
studies, enrol subjects, collect physical activity data, and collect nutritional data through online surveys.
This paper presents a new sub-system that enables health scientists to analyse and visualize nutritional
profiles based on large quantities of 24-hour dietary recall records for sub-groups of study subjects over
any desired period of time. In addition, the sub-system enables scientists to enter new food information
from food composition databases to build a comprehensive food profile. Interview feedback from novice
health science researchers using the new functionality indicated that it provided a usable interface and
generated high receptiveness to using the system in practice.
AN EHEALTH ADOPTION FRAMEWORK FOR DEVELOPING COUNTRIES: A SYSTEMATIC REVIEW
The document summarizes a systematic literature review on factors influencing adoption of eHealth technologies in developing countries. The review analyzed 29 papers published between 2009-2021. Key findings included:
- Widely used frameworks for eHealth adoption in developing countries were TAM, UTAUT, and TOE, but these did not fully capture all relevant factors.
- Additional factors identified included socio-demographic, technological, information, socio-cultural, organizational, governance, ethical/legal, and financial dimensions.
- The review proposed a novel, context-specific eHealth adoption framework for developing countries with eight dimensions addressing the above factors.
Health Informatics - An International Journal (HIIJ) Vol.6, No.2, May 2017
DOI: 10.5121/hiij.2017.6202
BLOOD TUMOR PREDICTION USING DATA MINING TECHNIQUES
Alaa M. El-Halees1, Asem H. Shurrab2
1 Faculty of Information Technology, Islamic University of Gaza, Gaza, Palestine
2 M.Sc., Dept. of I.T., Faculty of Information Technology, Islamic University of Gaza, Gaza, Palestine
ABSTRACT
Healthcare systems generate huge amounts of data collected from medical tests. Data mining is the computing
process of discovering patterns in large data sets such as medical examinations. Blood diseases are no
exception; a large amount of test data can be collected from patients. In this paper, we applied data
mining techniques to discover the relations between blood test characteristics and blood tumor in order to
predict the disease at an early stage, which can improve the chances of a cure. We conducted
experiments on our blood test dataset using three different data mining techniques: association
rules, rule induction and deep learning. The goal of our experiments is to generate models that can
distinguish patients with other (non-tumor) blood diseases from patients who have blood tumor. We evaluated our
results using different metrics applied to real data collected from Gaza European Hospital in Palestine.
The final results showed that association rules can reveal the relationship between blood test
characteristics and blood tumor. They also demonstrated that deep learning classifiers have the best ability to
predict tumor types of blood diseases, with an accuracy of 79.45%. In addition, rule induction gave us
rules that describe both tumor in blood and normal hematology.
KEYWORDS
Hematology diseases, Blood tumor, Rule induction, Association rules, deep learning.
1. INTRODUCTION
Data generated in the healthcare domain are vast and complex. These data contain many hidden
patterns that can help to discover and predict diseases in the medical field. Predicting
these diseases can reduce mortality and enhance the quality of life of
the patients who suffer from them [1]. Data mining has been widely used in the healthcare
domain. For example, it can help to detect fraud and abuse in health insurance, support
management in making customer relationship management decisions, and help physicians identify
effective treatments and best practices [2].
Hematology is the study of blood diseases such as leukemia, thalassemia and lymphoma.
The medical aspect of hematology is concerned with the treatment of blood disorders [3].
Hematology, like any other healthcare field, generates an enormous amount of data.
Traditional statistics are not sufficient to analyze these data; data mining techniques are a better
alternative [4]. Much research has been done in this field, trying to discover new
knowledge or patterns that can help to detect diseases and, in the best case, predict them
before they occur, by applying different data mining techniques and methods.
Researchers who apply data mining techniques to hematologic diseases usually use data generated
from tests such as the Complete Blood Count (CBC) test. The CBC test measures the number of blood cells
circulating in the bloodstream. It is a common laboratory blood test that can be used to
detect blood tumor and monitor tumor treatment [5].
The aim of our study is to use data mining techniques to classify a CBC sample of a blood disease
patient as either normal hematology disease or blood tumor. We collected a data set of
CBC samples from patients in Europe Gaza Hospital in Palestine; the data belong to the
Department of Oncology. We then applied three data mining methods to the collected dataset:
association rules, rule induction and deep learning. Association rules are a method
used to discover interesting relations between variables, and they have been used in
many healthcare applications [6]. In our paper, we investigated which CBC test characteristics are
related to blood tumor samples. The second method we used was rule induction, which
discovers patterns hidden in data. We used rule induction to discover patterns that
are associated with the blood tumor and normal hematology classes. The third method we used was
deep learning, a machine learning method that uses hierarchical layers of
artificial neural networks to learn from training data. Deep learning has been used for
the analysis of medical data (e.g., Ravi et al. [7] gave a survey). We used deep learning because
of its ability to detect the target class more accurately than other machine learning methods, especially
in the healthcare domain [8].
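To make the idea of an association rule concrete, the following is a minimal Python sketch of how the support and confidence of one candidate rule can be computed. The records, the discretized item names (such as "WBC=high") and the resulting numbers are invented for illustration only; they are not taken from the collected dataset, and the sketch does not reproduce the tool or settings used in the experiments.

```python
# Toy transactions: each record is a set of discretized CBC items plus a class label.
# All item names and values are hypothetical, for illustration only.
records = [
    {"WBC=high", "HGB=low", "class=Tumor"},
    {"WBC=high", "HGB=normal", "class=Tumor"},
    {"WBC=normal", "HGB=low", "class=Hematology"},
    {"WBC=high", "HGB=low", "class=Tumor"},
    {"WBC=normal", "HGB=normal", "class=Hematology"},
    {"WBC=high", "HGB=low", "class=Hematology"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    hits = sum(1 for r in records if itemset <= r)
    return hits / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent, records) / support(antecedent, records)


# Candidate rule: WBC=high -> class=Tumor
antecedent = {"WBC=high"}
consequent = {"class=Tumor"}
rule_support = support(antecedent | consequent, records)
rule_confidence = confidence(antecedent, consequent, records)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data, three of the six records contain both the antecedent and the consequent, and four contain the antecedent, so the rule has support 0.50 and confidence 0.75. An association rule miner enumerates many such candidate rules and keeps those above chosen support and confidence thresholds.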
The rest of the paper is structured as follows: the second section discusses related work, the
third section describes our material and methods, the fourth section presents the experiments and
results, and the fifth section gives the conclusion and future work.
2. RELATED WORK
Because of the enormous amount of data available in medical fields today, many researchers
depend on data mining techniques to gain new knowledge. Some of this research has been
done on hematological diseases. For example, Abdullah and Al-Asmari in [9] used data mining to
specify the anemia type of anemic patients through a predictive model. They used real data
constructed from the Complete Blood Count (CBC) test results of the patients. These data were
filtered to eliminate undesirable variables and then run through five classification algorithms,
including Naïve Bayes, Multilayer Perceptron, J48 and SMO. They found that the J48 decision tree
and SMO performed best, with 93.75% accuracy at a percentage split of 60%. Shouval et al. in [4]
addressed data mining techniques in the field of Allogeneic Hematopoietic Stem Cell
Transplantation (SCT) for predicting transplantation outcome and donor selection. They proposed
to use decision trees, Artificial Neural Networks (ANNs) and Support Vector Machines (SVM); no
actual experiments were done. Al-shami and Al-halees in [10] used data mining techniques on
CBC tests to detect Thalassemia disease. They conducted four types of experiments on the data
with all attributes in their data set samples and then repeated the experiments after removing some
features from the dataset. They used three classifiers (Decision Tree, Naïve Bayes, and Neural
Network). The accuracy of their experiments exceeded 90%, and the results showed that the critical
point which can be the first indicator of thalassemia is MCV ≤ 77.65. Also, Minnie
and Srinivasan in [11] used data mining on Blood Cell Counter data to convert the raw data into
transformed data that can be used for generating knowledge. They applied association rules and
clustering to the collected data. Saichanma et al. in [12] used data mining to predict
abnormality in peripheral blood smears from 1,362 students by using a data set of 13 hematological
parameters gathered from an automated blood cell counter. They found that the decision tree
created by the algorithm can be used as a practical guideline for RBC morphology prediction
using four hematological parameters (MCV, MCH, Hct, and RBC). In addition, Amin and
Habib in [8] compared different classification techniques using WEKA for hematological data.
They investigated which algorithm is most suitable for users working on hematological data. Their
model can predict hematological data comments, and they developed a mobile application that can make
diagnostic comments on hematological data. The best algorithm on the hematological data was the
J48 classifier, with an accuracy of 97.16%. Finally, Vijayarani and Sudha in [13] developed a
weight-based k-means algorithm for identifying leukemia, inflammatory, bacterial or viral
infection, HIV infection and pernicious anemia from a data set of hemogram blood test
samples. They found that the clustering accuracy of the weight-based k-means algorithm is better
than that of k-means and fuzzy c-means.
3. MATERIAL AND METHODS
3.1 Collected Dataset
The dataset we used in this paper was collected from the European Gaza Hospital, Gaza Strip,
Palestine. The dataset belongs to the Department of Oncology and Blood Analysis Division. After
cleaning, the dataset contains 5350 CBC samples covering different blood diseases. We divided the
dataset into two groups: group one has 1764 CBC samples of blood tumor patients, which we labeled
'Tumor', and group two has 3586 CBC samples of patients who have other blood diseases, which we
labeled 'Hematology'.
The dataset has 14 attributes that represent the CBC features, as shown in Table 1. We added one
more feature, the gender of the patient, because of its importance. Name and Patient-ID were
dropped to protect the privacy of the blood sample’s owner.
Table 1: Attributes of CBC sample
No.  Symbol  Meaning
1    WBC     White Blood Cell
2    RBC     Red Blood Cell
3    HGB     Hemoglobin
4    HCT     Hematocrit
5    MID     Mid-range absolute count
6    MCV     Mean Cellular Volume
7    MCH     Mean Cellular Hemoglobin
8    MCHC    Mean Cellular Hemoglobin Concentration
9    RDW     RBC Distribution Width
10   PLT     Platelets Count
11   MPV     Platelet volume
12   GRAN    Percentage of white blood cells with granules in their cytoplasm
13   LYM     Lymphocyte percent
14   Gender  Male, Female
15   Class   Tumor, Hematology
3.2 Dataset Preprocessing
In the preprocessing stage, we eliminated useless attributes, filled in the missing values, removed
duplicate values and removed the outlier values of the collected samples.
In addition, the collected data was imbalanced: it has 1764 tumor patients
compared to 3586 hematology patients. To overcome this issue, we used the Synthetic Minority
Oversampling Technique (SMOTE). For each minority instance, a new synthetic
instance is generated by taking the difference between the feature vector of the instance and its
nearest neighbor belonging to the same class, multiplying it by a random number between 0 and 1
and then adding it to the instance [14]. After these operations, the dataset contained
7172 records, half of them in the tumor class and the other half in the hematology
class.
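The interpolation step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in our experiments: the function names, the fixed seed and the two-feature toy samples are hypothetical, and a single nearest neighbor is used, following the description in the text.

```python
import random

def nearest_same_class_neighbor(sample, minority):
    """Return the minority-class point closest to `sample` (excluding itself)."""
    others = [p for p in minority if p is not sample]
    return min(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(sample, p)))

def smote_point(sample, neighbor, rng):
    """Interpolate: sample + gap * (neighbor - sample), with gap drawn from [0, 1)."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]

def oversample(minority, n_new, seed=0):
    """Generate `n_new` synthetic minority points, cycling over the originals."""
    rng = random.Random(seed)
    synthetic = []
    for i in range(n_new):
        sample = minority[i % len(minority)]
        neighbor = nearest_same_class_neighbor(sample, minority)
        synthetic.append(smote_point(sample, neighbor, rng))
    return synthetic

# Hypothetical two-feature minority samples (made-up values, not from our dataset)
tumor = [[9.1, 3.2], [8.7, 3.0], [10.4, 3.9]]
new_points = oversample(tumor, n_new=4)
```

Each synthetic point lies on the line segment between a minority sample and its nearest same-class neighbor, which is why the method adds records without simply duplicating existing ones.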
In addition, association rule mining requires nominal rather than numerical data, so we transformed
the values of each attribute into three categories (Low, Normal, High) based on the work of The
WebMD Medical Team in [15].
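A minimal sketch of this discretization is given below. The reference ranges in the code are hypothetical placeholders chosen only to make the example run; they are not the values from [15] and should not be read as clinical thresholds. The function and variable names are also assumptions.

```python
# Hypothetical (low bound, high bound) pairs: placeholders for illustration only,
# NOT the ranges from [15] and not clinical guidance.
REFERENCE_RANGES = {
    "HGB": (12.0, 16.0),
    "MCV": (80.0, 100.0),
}

def discretize(attribute, value, ranges=REFERENCE_RANGES):
    """Map a numeric test value to 'Low', 'Normal' or 'High'."""
    low, high = ranges[attribute]
    if value < low:
        return "Low"
    if value > high:
        return "High"
    return "Normal"

def discretize_sample(sample, ranges=REFERENCE_RANGES):
    """Discretize every attribute of one CBC sample given as a dict."""
    return {name: discretize(name, value, ranges) for name, value in sample.items()}

# Made-up sample values
sample = {"HGB": 9.8, "MCV": 104.2}
nominal = discretize_sample(sample)
```

After this step every sample is a set of attribute=category items, which is the form that frequent itemset mining expects.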
3.3 Data Mining Methods
In this paper, we used three data mining methods: association rules, rule induction and
deep learning.
Association rule mining is one of the most important and well-researched data mining
techniques for descriptive tasks, initially used for market basket analysis. It finds all the rules
in a transactional database that satisfy given minimum support and minimum confidence
constraints. Association rules are expressed in the form of IF-THEN rules. In our experiments, we
used the FP-Growth method to generate frequent itemsets, which were then converted to
association rules [16]. However, the resulting rules were numerous. Therefore, we chose rules
according to our goal, keeping only strong rules, that is, rules whose support and confidence
exceed the chosen minimum support and minimum confidence. Classification
using association rule mining is a major predictive analysis technique that aims to discover a
small set of rules in the database that forms an accurate classifier [17]. Classification based on
association uses rules of the form <feature-set> -> class label. These rules are ranked first by
confidence and then by support [18].
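To make the notions of support, confidence and rule ranking concrete, the following sketch scores hand-written candidate rules over a toy set of transactions. It is an illustration only: it skips the FP-Growth step entirely, the transactions are made up, and all function names are assumptions rather than part of any tool we used.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by the support of its antecedent."""
    base = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / base if base else 0.0

def strong_rules(candidates, transactions, min_support, min_confidence):
    """Keep rules meeting both thresholds, ranked by confidence, then support."""
    scored = []
    for antecedent, consequent in candidates:
        s = support(antecedent | consequent, transactions)
        c = confidence(antecedent, consequent, transactions)
        if s >= min_support and c >= min_confidence:
            scored.append((antecedent, consequent, s, c))
    return sorted(scored, key=lambda r: (r[3], r[2]), reverse=True)

# Made-up discretized samples, one frozenset of items per sample
transactions = [
    frozenset({"HGB=Low", "HCT=Low", "class=Tumor"}),
    frozenset({"HGB=Low", "HCT=Low", "class=Tumor"}),
    frozenset({"HGB=Low", "HCT=Normal", "class=Hematology"}),
    frozenset({"HGB=Normal", "HCT=Normal", "class=Hematology"}),
]
# Hand-written candidate rules (antecedent, consequent)
candidates = [
    (frozenset({"HGB=Low", "HCT=Low"}), frozenset({"class=Tumor"})),
    (frozenset({"HGB=Low"}), frozenset({"class=Tumor"})),
]
rules = strong_rules(candidates, transactions, min_support=0.5, min_confidence=0.7)
```

In this toy data only the first candidate survives: the second rule has enough support but its confidence falls below the threshold, because one sample with low HGB belongs to the other class.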
The second classification method we used was rule induction, which extracts a set of rules that
show the relationships between the attributes of a dataset and the class label [19]. Since
regularities hidden in data are expressed in terms of rules, rule induction is one of the
fundamental methods of data mining. Usually, rules are expressions of the form if (attribute_1,
value_1) and (attribute_2, value_2) and ... and (attribute_n, value_n) then (class_name, class_label).
In our experiments, we used a covering algorithm, which is a strategy for generating a rule set
directly: for each class in turn, find a rule set that covers all instances in it (excluding instances
not in the class).
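The covering strategy can be sketched as follows for nominal attributes. This is a simplified, PRISM-style greedy version written for illustration; it is an assumption about how such an algorithm may look and not the exact procedure of the tool used in our experiments. The toy rows and all names are hypothetical.

```python
def covers(conditions, row):
    """True if the row satisfies every attribute=value test of a rule."""
    return all(row[a] == v for a, v in conditions.items())

def learn_rules_for_class(rows, target, label):
    """Greedy covering: add rules until every `label` row is covered."""
    rules = []
    remaining = list(rows)
    while any(r[target] == label for r in remaining):
        conditions, covered = {}, remaining
        # Grow one rule until it covers only `label` rows or no test is left.
        while any(r[target] != label for r in covered):
            best = None
            for attr in sorted(covered[0]):
                if attr == target or attr in conditions:
                    continue
                for value in sorted({r[attr] for r in covered}):
                    subset = [r for r in covered if r[attr] == value]
                    positives = sum(r[target] == label for r in subset)
                    if positives == 0:
                        continue
                    # Prefer higher precision, then larger coverage.
                    score = (positives / len(subset), positives)
                    if best is None or score > best[0]:
                        best = (score, attr, value, subset)
            if best is None:
                break
            _, attr, value, covered = best
            conditions[attr] = value
        rules.append((dict(conditions), label))
        # Remove the rows covered by the new rule before learning the next one.
        covered_ids = {id(r) for r in covered}
        remaining = [r for r in remaining if id(r) not in covered_ids]
    return rules

# Made-up discretized rows
rows = [
    {"HGB": "Low", "RDW": "High", "Class": "Tumor"},
    {"HGB": "Low", "RDW": "Normal", "Class": "Tumor"},
    {"HGB": "Normal", "RDW": "High", "Class": "Hematology"},
    {"HGB": "Normal", "RDW": "Normal", "Class": "Hematology"},
]
tumor_rules = learn_rules_for_class(rows, "Class", "Tumor")
```

On the toy rows a single test on HGB separates the classes, so one rule is enough; on real data the outer loop keeps adding rules until no instance of the class is left uncovered.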
The third method we used was deep learning. Deep learning is an advanced type of neural
network and a collection of algorithms used in machine learning. It is used to model high-level
abstractions in data through model architectures composed of multiple
nonlinear transformations, unlike a traditional shallow neural network, which passes the data
through far fewer transformations. An algorithm is considered to be deep if the input data is
passed through a series of nonlinearities or nonlinear transformations before it becomes output.
In deep learning, the manual identification of features in the data is removed; instead, the model
relies on its training process to discover the useful patterns in the input examples. This can make
building a model easier and faster, and it can yield a better result. In deep-learning networks,
each layer of nodes trains on a distinct set of features based on the previous layer’s output. The
further one advances into the neural net, the more complex the features the nodes can recognize,
since they aggregate and recombine features from the previous layer [20].
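The idea of passing the input through a series of nonlinear transformations can be illustrated with a small forward pass. The sketch below is an untrained toy network with random weights; the layer sizes, weight range and input values are hypothetical and unrelated to the network used in our experiments.

```python
import random

def relu(vector):
    """Rectifier nonlinearity applied element-wise."""
    return [max(0.0, x) for x in vector]

def dense(vector, weights, biases):
    """One affine map; `weights` holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer with n_in inputs and n_out units."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(vector, layers):
    """Apply each layer in turn: affine map followed by the nonlinearity."""
    activations = [vector]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations

rng = random.Random(0)
sizes = [4, 8, 8, 2]  # hypothetical: 4 inputs, two hidden layers, 2 outputs
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
activations = forward([0.3, -1.2, 0.8, 0.5], layers)
```

Each entry of `activations` is computed only from the previous one, which is the sense in which later layers recombine the features produced by earlier layers.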
4. EXPERIMENTS AND RESULTS DISCUSSION
In this section, we describe the experiments and discuss the results of applying the three
classifiers on our dataset. To conduct our experiments, we used cross-validation experimental
method with n=5, where we divided our dataset into five subsets; in each round, one subset is held
out for testing and the others are used for training. Then, we applied the classification step as follows:
4.1 Association Rules
We generated association rules from the given data set using a minimum support of 0.5 and a
minimum confidence of 0.7. Some examples of the rules associated with blood tumor
samples are presented in Table 2. From the table, we can conclude that tumor is associated mainly
with high RDW, low HGB, low HCT, and low LYM.
Table 2: Sample association rules related to tumor samples.
RDW = High, HGB = Low, HCT = Low -> class = tumor
LYM = Low, HGB = Low, HCT = Low -> class = tumor
MID = Normal, RDW = High, HCT = Low -> class = tumor
MID = Normal, RDW = High, HGB = Low -> class = tumor
LYM = Low, HGB = Low, HCT = Low -> class = tumor
MID = Normal, LYM = Low, HGB = Low -> class = tumor
LYM = Low, HGB = Low -> class = tumor
HGB = Low, HCT = Low -> class = tumor
In addition, Table 3 gives some examples of rules associated with normal hematology
samples. From the table, we can see that the attributes mainly related to normal hematology are:
normal GRAN, normal RBC, low MCV and normal MID.
Table 3: Sample association rules related to Hematology samples.
LYM = Low, GRAN = Normal, RBC = Normal, MCV = Low -> class = Hematology
GRAN = Normal, RBC = Normal, MCV = Low -> class = Hematology
MID = Normal, GRAN = Normal, RBC = Normal, MCV = Low -> class = Hematology
LYM = Low, MPV = Normal, HCT = Normal, HGB = Normal -> class = Hematology
4.2 Rule Induction
Figure 1 gives the most important rules that resulted from applying the rule induction method to
our data set.
If LYM ≤ 1.850 and RBC ≤ 3.635 then tumor.
If GRAN ≤ 5.100 and LYM > 2.250 and MCV ≤ 74.700 then Hematology.
If MCV > 84.650 and RDW > 14.700 and GRAN > 5.950 then tumor.
If MCV > 78.350 and GRAN > 2.800 and LYM ≤ 1.850 and RDW > 14.800 then tumor.
If PLT > 374 and GRAN ≤ 9.900 then Hematology.
If MCV ≤ 86.100 and RDW ≤ 15.200 and RBC ≤ 4.415 then Hematology.
If RBC ≤ 4.595 and HCT > 33.450 and MCV > 90.200 then tumor.
If MCH > 25.750 and GRAN > 4.600 and HCT > 36.250 then tumor.
Figure 1: Sample rules resulting from the rule induction method
From these results, we can conclude that the most important rules that can predict tumor from a
CBC sample are: LYM at most 1.850 and RBC at most 3.635; and MCV greater than
84.650, RDW greater than 14.700 and GRAN greater than 5.950.
For normal Hematology, the most important rules are: GRAN at most 5.100, LYM greater
than 2.250 and MCV at most 74.700; and PLT greater than 374 and GRAN at most
9.900.
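To show how the rules of Figure 1 could be applied to a new CBC sample, the sketch below encodes them as an ordered list and returns the label of the first rule that fires. The first-match ordering, the handling of samples that match no rule, and the sample values are assumptions made for illustration; the thresholds are copied from Figure 1.

```python
import operator

LE, GT = operator.le, operator.gt

# Rules from Figure 1 as (conditions, label); each condition is (attribute, test, threshold).
RULES = [
    ([("LYM", LE, 1.850), ("RBC", LE, 3.635)], "Tumor"),
    ([("GRAN", LE, 5.100), ("LYM", GT, 2.250), ("MCV", LE, 74.700)], "Hematology"),
    ([("MCV", GT, 84.650), ("RDW", GT, 14.700), ("GRAN", GT, 5.950)], "Tumor"),
    ([("MCV", GT, 78.350), ("GRAN", GT, 2.800), ("LYM", LE, 1.850), ("RDW", GT, 14.800)], "Tumor"),
    ([("PLT", GT, 374), ("GRAN", LE, 9.900)], "Hematology"),
    ([("MCV", LE, 86.100), ("RDW", LE, 15.200), ("RBC", LE, 4.415)], "Hematology"),
    ([("RBC", LE, 4.595), ("HCT", GT, 33.450), ("MCV", GT, 90.200)], "Tumor"),
    ([("MCH", GT, 25.750), ("GRAN", GT, 4.600), ("HCT", GT, 36.250)], "Tumor"),
]

def classify(sample, rules=RULES):
    """Return the label of the first matching rule, or None if no rule fires."""
    for conditions, label in rules:
        if all(test(sample[attr], threshold) for attr, test, threshold in conditions):
            return label
    return None

# Made-up sample values for illustration
sample = {"LYM": 2.6, "RBC": 4.8, "GRAN": 4.0, "MCV": 72.0,
          "RDW": 13.5, "PLT": 250, "HCT": 38.0, "MCH": 24.0}
prediction = classify(sample)
```

Because Figure 1 lists only sample rules, some inputs match none of them; a complete rule set would normally end with a default class.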
Table 4 gives the confusion matrix of using rule induction on the blood samples. These results
correspond to an accuracy of 71.66% and an F-measure of 71.75% for tumor prediction.
Table 4: Confusion matrix of using rule induction
                       True Tumor   True Hematology
Predicted Tumor        2583         1029
Predicted Hematology   1004         2558
4.3 Deep Learning
In this paper, we used H2O Deep Learning [21] to predict tumor from CBC samples.
H2O Deep Learning is based on a multi-layer feed-forward artificial neural network that is trained
with stochastic gradient descent using back-propagation.
To get the best results, we trained the system with 20 hidden layers of 100 neurons
each. We found that the best activation function is the Rectified Linear Unit (Rectifier) with a
hidden dropout ratio of 0.5 for each hidden layer. The number of epochs used in the experiments
was 10. The experiment used Huber as the loss function and the Bernoulli distribution function. The
rest of the parameters were left at their defaults. With these settings, the accuracy of the experiment
was 79.45%, as seen in Table 5. The F-measure for tumor was 77.84%, and the F-measure for
hematology was 80.83%.
From the experiment, we also found that the most influential attributes are MCV, HCT, RBC and
LYM.
Table 5: Confusion matrix when using deep learning
                       True Tumor   True Hematology
Predicted Tumor        2590         477
Predicted Hematology   997          3110
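The accuracy and F-measure values can be recomputed directly from the confusion matrices in Tables 4 and 5. The sketch below does this with the standard definitions; the function name and the choice of tumor as the positive class are assumptions. Recomputed from the printed counts, the accuracy and tumor F-measure agree with the reported figures to within roughly 0.01 percentage points.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy plus the F-measure of the positive and of the negative class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    f_positive = 2 * tp / (2 * tp + fp + fn)
    f_negative = 2 * tn / (2 * tn + fp + fn)
    return accuracy, f_positive, f_negative

# Rows of the tables are predictions, columns are true classes; tumor is "positive".
# Table 4 (rule induction)
rule_induction = metrics(tp=2583, fp=1029, fn=1004, tn=2558)
# Table 5 (deep learning)
deep_learning = metrics(tp=2590, fp=477, fn=997, tn=3110)

for name, (acc, f_tumor, f_hema) in [("rule induction", rule_induction),
                                     ("deep learning", deep_learning)]:
    print(f"{name}: accuracy={acc:.4f} F(tumor)={f_tumor:.4f} F(hematology)={f_hema:.4f}")
```

The F-measure written as 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean of precision and recall, so no separate precision and recall step is needed.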
5. CONCLUSION AND FUTURE WORK
Enhancing the quality of life is the major purpose of all healthcare research. In this paper, we
tried to add some knowledge to this field. We derived our knowledge from CBC blood test
characteristics. We conducted three experiments using three different types of classifiers:
classification based on association rules, rule induction and deep learning. Then, we evaluated
the results of our experiments using accuracy and F-measure. The experiments gave different
accuracy rates according to the type of blood disease and the type of classifier. We found that
the deep learning classifier has the best ability to detect tumor from blood samples; the
drawback of this technique is that it gives no explanation for its results. On the other hand, rule
induction has acceptable performance and gave us some important, understandable rules.
Also, association rules gave us some important relations among attributes in the data samples. In
our future work, we will test our model on a larger dataset; more classifiers can also be
used.
ACKNOWLEDGEMENTS
This research was supported by Qatar Charity under the Ibhath project for research grants, which is
funded by the Cooperation Council for the Arab States of the Gulf through the Islamic
Development Bank.
REFERENCES
[1] M. Durairaj and V. Ranjani, “Data Mining Applications in Healthcare: A Study,” Int. J. Sci. Technol.
Res., vol. 2, no. 10, pp. 29–35, 2013.
[2] H. C. Koh and G. Tan, “Data mining applications in healthcare,” J. Healthc. Inf. Manag., vol. 19, no.
2, p. 65, 2011.
[3] H. HemOnctoday, “What is hematology.” Available:
http://www.healio.com/hematology-oncology/news/online/{2dd178d0-7f92-46a8-add9-2c7d634d2cea}/what-is-hematology, 2016.
[4] R. Shouval, O. Bondi, H. Mishan, A. Shimoni, R. Unger, and A. Nagler, “Application of machine
learning algorithms for clinical predictive modeling: a data-mining approach in SCT,” Bone Marrow
Transplant., vol. 49, no. 3, pp. 332–7, 2014.
[5] Mayo Clinic, “Cancer blood tests: Lab tests used in cancer diagnosis,”
http://www.mayoclinic.org/diseases-conditions/cancer/in-depth/cancer-diagnosis/art-20046459, 2017.
[6] M. Rashid, M. Hoque, and A. Sattar, “Association Rules Mining Based Clinical Observations,” arXiv
Prepr. arXiv1401.2571, 2014.
[7] D. Ravi, C. Wong, F. Deligianni, M. Berthelot, J. Andreu Perez, B. Lo, and G.-Z. Yang, “Deep
Learning for Health Informatics,” IEEE J. Biomed. Heal. Informatics, vol. 21, no. 1, pp. 1–1, 2016.
[8] A. Santos and D. R. Carvalho, “Deep learning for healthcare management and diagnosis,”
Iberoamerican Journal of Applied Computing, vol. 5, no. 2, pp. 15–25, 2015.
[8] M. N. Amin and A. Habib, “Comparison of Different Classification Techniques Using WEKA for
Hematological Data,” Am. J. Eng. Res., no. 43, pp. 2320–847, 2015.
[9] M. Abdullah and S. Al-Asmari, “Anemia types prediction based on data mining classification
algorithms,” Communication, Management and Information Technology – Sampaio de Alencar (Ed.),
2017.
[10] I. H. Alshami and A. M. Alhalees, “Automated Diagnosis of Thalassemia Based on DataMining
Classifiers,” Int. Conf. Informatics Appl., pp. 440–445, 2012.
[11] D. Minnie and S. Srinivasan, “Clustering the Preprocessed Automated Blood Cell Counter Data using
modified K-Means Algorithms and Generation of Association Rules,” vol. 52, no. 17, pp. 38–42,
2012.
[12] S. Saichanma, S. Chulsomlee, N. Thangrua, P. Pongsuchart, and D. Sanmun, “The observation report
of red blood cell morphology in Thailand teenager by using data mining technique,” Adv. Hematol.,
vol. 2014, pp. 4–8, 2014.
[13] S. Vijayarani and S. Sudha, “An Efficient Clustering Algorithm for Predicting Diseases from
Hemogram Blood Test Samples,” Indian Journal of Science and Technology, vol. 8, no. 17, Aug. 2015.
[14] N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling
Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[15] The WebMD Medical Team, http://www.webmd.com/a-to-z-guides/complete-blood-count-cbc#4
[16] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” Proceedings of
the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1–12, May 15–18, 2000,
Dallas, Texas, USA.
[17] S. Soni and O. P. Vyas, “Using Associative Classifiers for Predictive Analysis in Health Care Data
Mining,” Int. J. Comput. Appl., vol. 4, no. 5, pp. 33–37, 2010.
[18] G. Chen, H. Liu, L. Yu, Q. Wei, and X. Zhang, “A new approach to classification based on
association rule mining,” Decision Support Systems, vol. 42, no. 2, pp. 674–689, 2006.
[19] N. Lavrac, “Rule induction,” in Intelligent Data Analysis, M. Berthold and D. Hand, pp. 1–19, 2003.
[20] Deeplearning4j Development Team. Deeplearning4j: Open-source distributed deep learning for the
JVM, Apache Software Foundation License 2.0. http://deeplearning4j.org
[21] A. Candel, V. Parmar, E. LeDell, and A. Arora. "Deep Learning with H2O." http://h2o.ai/resources.
Mar 2017.
AUTHORS
Alaa El-Halees is a professor in computing and Deputy Dean for the faculty of Information Technology at
Islamic University of Gaza, Palestine. He holds a PhD degree in data mining from Leeds Metropolitan
University, UK in 2004, an MSc degree in Software Engineering from Leeds Metropolitan University, UK in
1998 and a BSc in Computer Engineering from the University of Arizona, USA. Alaa has more than 24 years of
experience including leading a range of IT-related projects. Dr. Alaa supervises M.Sc. students in
Information Technology. He also leads and teaches modules at both BSc and MSc levels in Information
Technology. His research activities are in the area of data mining, in particular text mining, machine
learning and e-learning, Software Engineering and computer ethics.
Asem H. Shurrab received a B.A. in Computer Science from the Islamic University of Gaza (IUG) in 2004. He
has been studying for an MSc. in Information Technology at IUG since 2015, and he works as a teacher at public high schools.