Early prediction of liver disease is very important to save human life and take proper steps to control the
disease. Decision Tree algorithms have been successfully applied in various fields especially in medical
science. This research work explores the early prediction of liver disease using various decision tree
techniques. The liver disease dataset which is select for this study is consisting of attributes like total
bilirubin, direct bilirubin, age, gender, total proteins, albumin and globulin ratio. The main purpose of this
work is to calculate the performance of various decision tree techniques and compare their performance.
The decision tree techniques used in this study are J48, LMT, Random Forest, Random tree, REPTree,
Decision Stump, and Hoeffding Tree. The analysis proves that Decision Stump provides the highest
accuracy than other techniques
Existing model uses structured data to predict the patients of either high risk or low risk.
But for a complex disease, structured data is not a good way to describe the disease.
We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from hospital.
In this paper, we mainly focus on the risk prediction of cerebral infarction.
Machine Learning for Disease PredictionMustafa Oğuz
A great application field of machine learning is predicting diseases. This presentation introduces what is preventable diseases and deaths. Then examines three diverse papers to explain what has been done in the field and how the technology works. Finishes with future possibilities and enablers of the disease prediction technology.
Existing model uses structured data to predict the patients of either high risk or low risk.
But for a complex disease, structured data is not a good way to describe the disease.
We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from hospital.
In this paper, we mainly focus on the risk prediction of cerebral infarction.
Machine Learning for Disease PredictionMustafa Oğuz
A great application field of machine learning is predicting diseases. This presentation introduces what is preventable diseases and deaths. Then examines three diverse papers to explain what has been done in the field and how the technology works. Finishes with future possibilities and enablers of the disease prediction technology.
Heart is the most important organ of a human body. It circulates oxygen and other vital nutrients through blood to different parts of the body and helps in the metabolic activities. Apart from this it also helps in removal of the metabolic wastes. Thus, even minor problems in heart can affect the whole organism. Researchers are diverting a lot of data analysis work for assisting the doctors to predict the heart problem. So, an analysis of the data related to different health problems and its functioning can help in predicting with a certain probability for the wellness of this organ. In this paper we have analysed the different prescribed data of 1094 patients from different parts of India. Using this data, we have built a model which gets trained using this data and tries to predict whether a new out-of-sample data has a probability of having any heart attack or not. This model can help in decision making along with the doctor to treat the patient well and creating a transparency between the doctor and the patient. In the validation set of the data, it’s not only the accuracy that the model has to take care, rather the True Positive Rate and False-Negative Rate along with the AUC-ROC helps in building/fixing the algorithm inside the model.
We are predicting Heart Disease by Taking 14 Medical Parameters as an inputs through 2 data Minning Techniques(Decision Tree(Faster) And KNN neighbour Algorithms(Slower)).
And Visualizing The dataset.If the output 1 then it means Higher Chances of getting Heart Attack ,if 0 then it means Less chances of Heart Attack.
A major challenge facing healthcare organizations (hospitals, medical centers) is
the provision of quality services at affordable costs. Quality service implies diagnosing
patients correctly and administering treatments that are effective. Poor clinical decisions
can lead to disastrous consequences which are therefore unacceptable. Hospitals must
also minimize the cost of clinical tests. They can achieve these results by employing
appropriate computer-based information and/or decision support systems.
Most hospitals today employ some sort of hospital information systems to manage
their healthcare or patient data.
These systems are designed to support patient billing, inventory management and generation of simple statistics. Some hospitals use decision support systems, but they are largely limited. Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge rich data hidden in the database.
This practice leads to unwanted biases, errors and excessive medical costs which affects the quality of service provided to patients.
Disease prediction and doctor recommendation systemsabafarheen
This paper will tell you how the system will work in terms of disease prediction also will suggest you nearest hospital with experienced doctors, cheap fees
Prediction of Heart Disease using Machine Learning Algorithms: A Surveyrahulmonikasharma
According to recent survey by WHO organisation 17.5 million people dead each year. It will increase to 75 million in the year 2030[1].Medical professionals working in the field of heart disease have their own limitation, they can predict chance of heart attack up to 67% accuracy[2], with the current epidemic scenario doctors need a support system for more accurate prediction of heart disease. Machine learning algorithm and deep learning opens new door opportunities for precise predication of heart attack. Paper provideslot information about state of art methods in Machine learning and deep learning. An analytical comparison has been provided to help new researches’ working in this field.
following topics are discussed inside the PPT:
Introduction
Objective
Motivation
Literature Survey
Some Key Features of Disease
Plan of Action
Methodology Adopted
Data Collection
Steps to be Performed
Functional Architecture
Breast cancer diagnosis and recurrence prediction using machine learning tech...eSAT Journals
Abstract Breast Cancer has become the common cause of death among women. Due to long hours invested in manual diagnosis and lesser diagnostic system available emphasize the development of automated diagnosis for early diagnosis of the disease. Our aim is to classify whether the breast cancer is benign or malignant and predict the recurrence and non-recurrence of malignant cases after a certain period. To achieve this we have used machine learning techniques such as Support Vector Machine, Logistic Regression, KNN and Naive Bayes. These techniques are coded in MATLAB using UCI machine learning depository. We have compared the accuracies of different techniques and observed the results. We found SVM most suited for predictive analysis and KNN performed best for our overall methodology. Keywords: Breast Cancer, SVM, KNN, Naive Bayes, Logistic Regression, Classification.
Machine Learning - Breast Cancer DiagnosisPramod Sharma
Machine learning is helping in making smart decisions faster. In this presentation measurements carried out on FNAC was analysed. The results were validated using 20 percent of the data. The data used for POC is from UCI Repository/
A ppt based on predicting prices of houses. Also tells about basics of machine learning and the algorithm used to predict those prices by using regression technique.
A Tentative analysis of Liver Disorder using Data Mining Algorithms J48, Deci...MangaiK4
Abstract — Nowadays healthcare field has additional data mining process became a crucial role to use for disease prediction. Data mining is that the process of investigate up info from the huge information sets. The medical information is extremely voluminous. Therefore the investigator is extremely difficult to predict the disease is challenging. To overcome this issue the researchers use data mining processing technique like classification, clustering, association rules so on. The most objective of this analysis work is to predict disease supported common attributes intake of alcohol, smoking, obesity, diabetes, consumption of contaminated food, case history of liver disease using classification algorithm. The algorithms employed in this analysis work are J48, Naive Bayes. These classification algorithms are compared base on the performance factors accuracy and execution time. The investigational results could be a improved classifier for predict the liver disease.
AN IMPROVED MODEL FOR CLINICAL DECISION SUPPORT SYSTEMijaia
Misguided information in health care has caused much havoc that have led to the death of millions of people as a result of misclassification, and inconsistent health care records; hence the objective of this paper is to develop an improved clinical decision support system. This system incorporated hybrid system
of non-knowledge based and knowledge based decision support system for the diagnosis of diseases and proper health care delivery records using prostate cancer and diabetes datasets to train and validate the model. The min-max method was adopted in normalizing the datasets, while genetic algorithm was
deployed in initiating the training weights of the MLP. The result obtained in this paper yielded a classification accuracy of 98%, sensitivity of 0.98 and specificity of 100 for prostate cancer and accuracy of 94%, sensitivity of 0.94 and specificity of 0.67 for diabetes.
Heart is the most important organ of a human body. It circulates oxygen and other vital nutrients through blood to different parts of the body and helps in the metabolic activities. Apart from this it also helps in removal of the metabolic wastes. Thus, even minor problems in heart can affect the whole organism. Researchers are diverting a lot of data analysis work for assisting the doctors to predict the heart problem. So, an analysis of the data related to different health problems and its functioning can help in predicting with a certain probability for the wellness of this organ. In this paper we have analysed the different prescribed data of 1094 patients from different parts of India. Using this data, we have built a model which gets trained using this data and tries to predict whether a new out-of-sample data has a probability of having any heart attack or not. This model can help in decision making along with the doctor to treat the patient well and creating a transparency between the doctor and the patient. In the validation set of the data, it’s not only the accuracy that the model has to take care, rather the True Positive Rate and False-Negative Rate along with the AUC-ROC helps in building/fixing the algorithm inside the model.
We are predicting Heart Disease by Taking 14 Medical Parameters as an inputs through 2 data Minning Techniques(Decision Tree(Faster) And KNN neighbour Algorithms(Slower)).
And Visualizing The dataset.If the output 1 then it means Higher Chances of getting Heart Attack ,if 0 then it means Less chances of Heart Attack.
A major challenge facing healthcare organizations (hospitals, medical centers) is
the provision of quality services at affordable costs. Quality service implies diagnosing
patients correctly and administering treatments that are effective. Poor clinical decisions
can lead to disastrous consequences which are therefore unacceptable. Hospitals must
also minimize the cost of clinical tests. They can achieve these results by employing
appropriate computer-based information and/or decision support systems.
Most hospitals today employ some sort of hospital information systems to manage
their healthcare or patient data.
These systems are designed to support patient billing, inventory management and generation of simple statistics. Some hospitals use decision support systems, but they are largely limited. Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge rich data hidden in the database.
This practice leads to unwanted biases, errors and excessive medical costs which affects the quality of service provided to patients.
Disease prediction and doctor recommendation systemsabafarheen
This paper will tell you how the system will work in terms of disease prediction also will suggest you nearest hospital with experienced doctors, cheap fees
Prediction of Heart Disease using Machine Learning Algorithms: A Surveyrahulmonikasharma
According to recent survey by WHO organisation 17.5 million people dead each year. It will increase to 75 million in the year 2030[1].Medical professionals working in the field of heart disease have their own limitation, they can predict chance of heart attack up to 67% accuracy[2], with the current epidemic scenario doctors need a support system for more accurate prediction of heart disease. Machine learning algorithm and deep learning opens new door opportunities for precise predication of heart attack. Paper provideslot information about state of art methods in Machine learning and deep learning. An analytical comparison has been provided to help new researches’ working in this field.
following topics are discussed inside the PPT:
Introduction
Objective
Motivation
Literature Survey
Some Key Features of Disease
Plan of Action
Methodology Adopted
Data Collection
Steps to be Performed
Functional Architecture
Breast cancer diagnosis and recurrence prediction using machine learning tech...eSAT Journals
Abstract Breast Cancer has become the common cause of death among women. Due to long hours invested in manual diagnosis and lesser diagnostic system available emphasize the development of automated diagnosis for early diagnosis of the disease. Our aim is to classify whether the breast cancer is benign or malignant and predict the recurrence and non-recurrence of malignant cases after a certain period. To achieve this we have used machine learning techniques such as Support Vector Machine, Logistic Regression, KNN and Naive Bayes. These techniques are coded in MATLAB using UCI machine learning depository. We have compared the accuracies of different techniques and observed the results. We found SVM most suited for predictive analysis and KNN performed best for our overall methodology. Keywords: Breast Cancer, SVM, KNN, Naive Bayes, Logistic Regression, Classification.
Machine Learning - Breast Cancer DiagnosisPramod Sharma
Machine learning is helping in making smart decisions faster. In this presentation measurements carried out on FNAC was analysed. The results were validated using 20 percent of the data. The data used for POC is from UCI Repository/
A ppt based on predicting prices of houses. Also tells about basics of machine learning and the algorithm used to predict those prices by using regression technique.
A Tentative analysis of Liver Disorder using Data Mining Algorithms J48, Deci...MangaiK4
Abstract — Nowadays healthcare field has additional data mining process became a crucial role to use for disease prediction. Data mining is that the process of investigate up info from the huge information sets. The medical information is extremely voluminous. Therefore the investigator is extremely difficult to predict the disease is challenging. To overcome this issue the researchers use data mining processing technique like classification, clustering, association rules so on. The most objective of this analysis work is to predict disease supported common attributes intake of alcohol, smoking, obesity, diabetes, consumption of contaminated food, case history of liver disease using classification algorithm. The algorithms employed in this analysis work are J48, Naive Bayes. These classification algorithms are compared base on the performance factors accuracy and execution time. The investigational results could be a improved classifier for predict the liver disease.
AN IMPROVED MODEL FOR CLINICAL DECISION SUPPORT SYSTEMijaia
Misguided information in health care has caused much havoc that have led to the death of millions of people as a result of misclassification, and inconsistent health care records; hence the objective of this paper is to develop an improved clinical decision support system. This system incorporated hybrid system
of non-knowledge based and knowledge based decision support system for the diagnosis of diseases and proper health care delivery records using prostate cancer and diabetes datasets to train and validate the model. The min-max method was adopted in normalizing the datasets, while genetic algorithm was
deployed in initiating the training weights of the MLP. The result obtained in this paper yielded a classification accuracy of 98%, sensitivity of 0.98 and specificity of 100 for prostate cancer and accuracy of 94%, sensitivity of 0.94 and specificity of 0.67 for diabetes.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
Data mining visualization to support biochemical markers for liver fibrosis i...Waqas Tariq
The reference diagnostic test to detect fibrosis is liver biopsy (LB), a procedure subject to various limitations, including risk of patient injury and sampling error. FibroTest (FT) and ActiTest (AT) are biochemical markers (noninvasive tests) used in determining the level of fibrosis and the degree of necroinflammatory activity in the liver. The objective of this work is to discover the differences in the temporal patterns between noninvasive tests and liver biopsy by visualization tools, which made it easier to understand the relations of the complicated rules. This Study ware focused on the major serum fibrosis markers (FT/AT). The test uses a combination of serum biochemical markers with visualization technique to evaluate whether biochemical markers can be used to estimate the stage of liver fibrosis and necro-inflammatory activity in the liver.
The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISijcsit
The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical
research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more
number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well
as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
Predicting Chronic Kidney Disease using Data Mining Techniquesijtsrd
Kidney is a significant aspect of a human body. Kidney infection or disappointments are expanded in every year. Presently a day’s chronic kidney infection is the most well known disease for the individuals. Today numerous individuals pass on due to chronic kidney disease. The principle issue of CKD is, it will influence the kidney gradually. A few people dont have side effects at all and are analysed by a lab test. It depicts the steady loss of kidney work. Early recognition and therapy are viewed as basic variables in the management and control of chronic kidney disease. Data mining techniques is utilized to extract data from clinical and laboratory, which can be useful to help doctors to recognize the seriousness stage of patients. Using Probabilistic Neural Networks PNN algorithm will get better prediction for determining the severity stage of chronic kidney disease. Seethal V | Kuldeep Baban Vayadande "Predicting Chronic Kidney Disease using Data Mining Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1 , December 2020, URL: https://www.ijtsrd.com/papers/ijtsrd37974.pdf Paper URL : https://www.ijtsrd.com/computer-science/data-miining/37974/predicting-chronic-kidney-disease-using-data-mining-techniques/seethal-v
An Approach for Disease Data Classification Using Fuzzy Support Vector MachineIOSRJECE
: Data Mining has great scope in the field of medicine. In this article we introduced one new fuzzy approach for prediction of hepatitis disease. Many researchers have proposed the use of K-nearest neighbor (KNN) for diabetes disease prediction. Some have proposed a different approach by using K-means clustering for reprocessing and then using KNN for classification. In our approach Naive Bayes classifier is used to clean the data. Finally, the classification is done using Fuzzy SVM algorithm. Hepatitis diseases data set is used to test our method. We are able to obtain model more precise than any others available in the literature. The Fuzzy SVM approach produced better result than KNN with Fuzzy c-meansand Fuzzy KNN with Fuzzy c-means. Theintroduction of Fuzzy Support Vector Machine algorithm certainly has a positive effect on the outcome of hepatitis disease. This fuzzy SVM model led to remarkable increase in classification accuracy
Structure and development of a clinical decision support system: application ...komalicarol
Clinical decision requires to infer great, diverse and not suitably
organized quantity of information and having low time to decide.
The therapeutic choice is fundamental to formulate a strategy to
avoid complications and to achieve favorable results, being more
important in some specialties. In addition, medical decision-makers are overloaded with clinical tasks, have an intense work rate and
are subject to a great demand, and are prone to greater tiredness.
In this sense, computer tools can be extremely useful, as can deal
with a lot of information in much less time than decision-makers.
Thus, the existence of a tool that assists them in decision-making
is of crucial importance
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUESIAEME Publication
Diabetes mellitus is a common disease caused by a set of metabolic ailments
where the sugar stages over drawn-out period is very high. It touches diverse organs
of the human body which therefore harm a huge number of the body's system, in
precise the blood strains and nerves. Early prediction in such disease can be exact
and save human life. To achieve the goal, this research work mainly discovers
numerous factors associated to this disease using machine learning techniques.
Machine learning methods provide effectual outcome to extract knowledge by building
predicting models from diagnostic medical datasets together from the diabetic
patients. Quarrying knowledge from such data can be valuable to predict diabetic
patients. In this research, six popular used machine learning techniques, namely
Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), C4.5 Decision
Tree (DT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) are
compared in order to get outstanding machine learning techniques to forecast diabetic
mellitus. Our new outcome shows that Support Vector Machine (SVM) achieved
higher accuracy compared to other machine learning techniques.
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
Environmental changes and food habits affect people's health with numerous diseases in today's life. Machine learning is a technique that plays a vital role in predicting diseases from collected data. The health sector has plenty of electronic medical data, which helps this technique to diagnose various diseases quickly and accurately. There has been an improvement in accuracy in medical data analysis as data continues to grow in the medical field. Doctors may have a hard time predicting symptoms accurately. This proposed work utilized Kaggle data to predict and diagnose heart and diabetic diseases. The diseases heart and diabetes are the foremost cause of higher death rates for people. The dataset contains target features for the diagnosis of heart disease. This work finds the target variable for diabetic disease by comparing the patient's blood sugars to normal levels. Blood pressure, body mass index (BMI), and other factors diagnose these diseases and disorders. This work justifies the filter method and principal component analysis for selecting and extracting the feature. The main aim of this work is to highlight the implementation of three ensemble techniques-Adaptive boost, Extreme Gradient boosting, and Gradient boosting-as well as the emphasis placed on the accuracy of the results.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%
This paper helps in foreseeing diabetes by applying data mining strategy. The revelation of information
from clinical datasets is significant so as to make powerful medical determination. The point of data mining is to
extricate information from data put away in dataset and produce clear and reasonable depiction of examples. Diabetes
is an interminable sickness and a significant general wellbeing challenge around the world. Utilizing data mining
techniques by taking hba1c test data to help individuals to predict diabetes has increase significant fame. In this paper,
six classification models are used to classify a diabetic or non-diabetic patient and male and female patients. The
dataset utilized is gathered from a Diagnostics and research laboratory Liaquat university of medical and health
sciences Jamshoro, which gathers the data of patients with diabetes, without diabetes by taking blood sample of patient
and performing hba1c. We utilized Weka tool for the analysis diabetes, no-diabetic examination. Out of six
classification algorithms, four algorithms depict hundred percent accuracy on train and test data.
KEY WORDS: Data mining, Diabetes, HbA1c, Classification models, Weka.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
How to Create Map Views in the Odoo 17 ERPCeline George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
The Indian economy is classified into different sectors to simplify the analysis and understanding of economic activities. For Class 10, it's essential to grasp the sectors of the Indian economy, understand their characteristics, and recognize their importance. This guide will provide detailed notes on the Sectors of the Indian Economy Class 10, using specific long-tail keywords to enhance comprehension.
For more information, visit-www.vavaclasses.com
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
What is the purpose of the Sabbath Law in the Torah. It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
Digital Tools and AI for Teaching Learning and Research
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
1. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
DOI: 10.5121/ijdkp.2018.8201 1
LIVER DISEASE PREDICTION BY USING DIFFERENT
DECISION TREE TECHNIQUES
Nazmun Nahar1
and Ferdous Ara2
1
Department of Computer Science & Engineering, University of Chittagong
2
Lecturer, Department of Computer Science & Engineering
BGC Trust University Bangladesh
ABSTRACT
Early prediction of liver disease is very important to save human life and take proper steps to control the
disease. Decision Tree algorithms have been successfully applied in various fields especially in medical
science. This research work explores the early prediction of liver disease using various decision tree
techniques. The liver disease dataset which is select for this study is consisting of attributes like total
bilirubin, direct bilirubin, age, gender, total proteins, albumin and globulin ratio. The main purpose of this
work is to calculate the performance of various decision tree techniques and compare their performance.
The decision tree techniques used in this study are J48, LMT, Random Forest, Random tree, REPTree,
Decision Stump, and Hoeffding Tree. The analysis proves that Decision Stump provides the highest
accuracy than other techniques.
KEYWORDS
Data Mining, Decision Tree, Liver Disease
1. INTRODUCTION
The liver plays an important role in many bodily functions from protein production and blood
clotting to cholesterol, glucose (sugar), and iron metabolism. It has a range of functions, including
removing toxins from the body, and is crucial to survival. The loss of those functions can cause
significant damage to the body. When liver is infected with a virus, injured by chemicals, or
under attack from own immune system, the basic danger is the same – that liver will become so
damaged that it can no longer work to keep a person alive. Liver disease caused by hepatotrophic
viruses imposes a substantial burden on health care resources. Persistent infections from hepatitis
B virus (HBV), hepatitis C virus, and hepatitis delta virus result in chronic liver disease. The
most basic classification of liver disease is as acute and chronic. The definition of acute liver
disease is based on duration, with the history of the disease does not exceed six months. Acute
viral hepatitis and drug reactions account for the majority of cases of acute liver disease.
Liver disease is also referred to as hepatic disease. Usually nausea, vomiting, right upper quadrant
abdominal pain, fatigue and weakness are classic symptoms of liver disease. Symptoms of liver
patient include jaundice, abdominal pain, fatigue, nausea, vomiting, back pain, abdominal
swelling, weight loss, fluid in abnormal cavity, general itching, pale stool, enlarged spleen and
gallbladder [1]. Symptoms of liver disease can vary, but they often include swelling of the
abdomen and legs, bruising easily, changes in the colour of your stool and urine, and jaundice, or
yellowing of the skin and eyes. Sometimes there are no symptoms. Tests such as imaging tests
and liver function tests can check for liver damage and help to diagnose liver diseases.
2. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
2
The purpose of this study is to compare the decision tree algorithms such as J48, LMT, Random
Tree, Random Forest, REPTree, Decision Stump and Hoeffding Tree in diagnosis liver disease.
The liver dataset are analyzed using above decision tree algorithms and compare their
performance with respect to seven performance metrics (ACC%, MAE, PRE, REC, FME, Kappa
Statistics and runtime). The paper is organized as follows: In Section 2, the Related Works is
presented. In Section 3, Methodology used in this paper is given. Experiments and result is
presented in Section 4. The paper is concluded in Section 5.
2 RELATED WORKS
Machine learning has attracted a huge amount of researches and has been applied in various fields
in the world. In medicine, machine learning has proved its power in which it has been employed
to solve many emergency problems such as cancer treatment, heart disease, dengue fever
diagnosis and so on. Among several outstanding methods, Decision Tree algorithms have been
employed for many researches.
Liver disease of the patients has been continuously increasing because of inhale of harmful gases,
intake of contaminated food, different kinds of drugs and excessive consumption of alcohol.
Automatic classification tools may reduce burden on doctors [2]. The classification algorithms
based on classification of some liver patient datasets. For the algorithm he considered Naïve
Bayes classifier, C4.5, Back propagation Neural Network algorithm, and Support Vector
Machines which evaluated based on four criteria: Accuracy, Precision, Sensitivity and
Specificity. On the other hand, Aneeshkumar [3] used a methodology to effective classification of
liver and non-liver disease dataset. Pre-processing method is used to cleansing the data for
effective classification, after cleansing the data. 15 attributes of real medical data are collected
from dataset. C4.5 and Naive Bayes are the two algorithms used in his study. He divided datasets
into three different types of ratio based on average and standard deviation of each factor of both
class and evaluated the accuracy. The result in his study after evaluate the accuracy, he said C4.5
is gives better accuracy than Naive Bayes, because it gives more accuracy with the minimum time
taken. Naïve Bayes is sometimes better than FT growth algorithm with the use of machine
learning for detection of liver disease [4]. He compared among 29 datasets with 12 different
attributes. By comparing two decision tree algorithms which are FT growth and Naïve Bayes and
found that Naïve Bayes is better than FT growth algorithm with the use of machine learning
because, Naïve Bayes (75.54%) gives more accuracy than FT growth algorithm (72.66) using
WEKA Tool. Whereas, in comparison of Ft tree Naïve Baiyes and Kstar to predict the liver
disease disorder with evaluate using 10-fold cross validation, Rajeswari [5], identified FT tree
gives better role for increasing the accuracy of the dataset in classification technique algorithm.
The computer aided diagnosis (CAD) system consists of the segmentation of liver and lesion,
extraction of features from alesion and characterization of liver diseases by means of a classifier.
In an experiment Gunasundari [6] found conversional image processing operations, neural
networks and Genetic algorithm gives successful result for liver disease disorder diagnosis. In
future liver disease disorder diagnosis extended in many directions. Such as using effective
algorithms and more texture feature technique algorithms. CART uses a purity-based measure,
and the algorithm splits the training data set based on how probably the subsets become purer for
a class, and it spends more time to generate smaller trees. CART and C4.5 both algorithms are
gives good result with oversampling for liver disease disorder dataset. These algorithms could
reduce the minor class increment to smaller percentage [7]. Decision tree algorithm does not give
high priority for minor classes for that reason using duplication in BUPA liver disease disorder
3. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
3
dataset, increase the number of instances of minor class and proceed with two decision tree
algorithms and both algorithms gives good result in insufficiency of liver disease disorder data.
Case Based Reasoning (CBR) and Classification and Regression Tree (CART) techniques could
be useful to detect the liver disease [8]. Feature selection plays a vital role in text categorization.
A range of different methods have been developed, each having unique properties and selecting
different features. We show some results of an extensive study of feature selection approaches
using a wide range of combination methods. Bendi [9], proposed a Modified Rotation Forest
algorithm to calculate the accuracy of the liver classification techniques in UCI liver dataset using
the combo of feature selection technique and selected classification technique algorithm. Over the
past few years, the increasing attention on severe challenges in medical diagnosis process such as
sharply increased elderly patients, limited medical personnel, has led to a number of contributions
in the areas of the intelligent medical diagnosis methods. The early contributions can be found on
the neural networks, it provides a new significant way for intelligent medical diagnosis. A model
proposed by Kiruba [10] on intelligent agent based system to hike a precise and accurate of
diagnosis system. C4.5 decision tree algorithm and Random tree algorithm are used to predict.
Two different types of liver disease disorder dataset are combined and predict the accuracy of the
disease. And then conclude these both algorithms gives very good accuracy for diagnosing liver
disease disorder. Liver abscess is the commonest cause of hepatomegaly and it is due to
amoebiasis, followed by fatty liver, congestive cardic failure, hepatocellular carcinoma, and viral
hepatitis seen only in few patients [11].
3. METHODOLOGY
Main objective of this study is to identify that whether the patient has liver disease or not. Some
of the parameter are used for predicting the liver disease and compare the performance of the
various decision tree techniques. Weka is a data mining tool which is written in java and
developed at Waikato. WEKA is a very efficient data mining tool to classify the accuracy by
applying different algorithmic approaches and compare on the basis of datasets [12].It is also a
good tool for build new machine learning schemes. The result found from the liver disease dataset
by using Weka tool are in section 4.10-fold cross validation performed on the dataset.
The objective of this study is liver disease prediction using data mining tool. The main task in this
study is:
• Various decision tree techniques are used for the Prediction of the liver disease.
• Comparing different decision tree techniques.
• Finding best decision tree for the liver disease prediction.
A. DECISION TREE
Data mining is a process where intelligent methods are used to find out data patterns. It is an
important process of discovering pattern and knowledge from large volume of data. Now a day,
the usefulness of the methods has been proven in medical field by trying different algorithms.
One of the algorithm is data classification is the process of finding a model that can explain
different data classes. Some classification algorithms are Decision tree, Support vector Machine,
K-NN, Neural networks , Association rule, Bayesian networks, etc. However, for this work
decision tree is used because it provides the more accurate results than other algorithms. The
large datasets easily classified by the decision tree which is easy to understand by the human. The
structure of the Decision tree is looks like a tree structure. Decision tree is made of a root, leaf
4. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
4
nodes and internal nodes. Seven decision tree techniques have been used in this study. They are
J48, LMT, Random Forest, Random tree, REPTree, Decision Stump, and Hoeffding. Their
performance was analyzed using Accuracy(ACC),Precision(PRE),Recall(REC),Mean Absolute
Error(MAE),F-Measure(FME),Kappa Statistic and Run time. The descriptions of decision tree
technique that are used in this study are given below:
J48:
J48 is advance version of C4.5. The technique of this algorithm is to use divide-and-conquer
method. It uses pruning method to construct tree. It is a common method which is used in
information gain or entropy measure. Thus it is like tree structure with root node, intermediate
and leaf nodes. Node holds the decision and helps to acquire the result.
REP Tree:
REP Tree is a fast decision tree learner. Builds a decision/regression tree using entropy as
impurity measure and prunes it using reduced-error pruning. It only sorts values for numeric
attributes once.
Random Tree:
Random Tree is a group learning algorithm that creates many individual learners. It is an
algorithm for build a tree that treats K random features at each node. It involves a bagging idea to
create a random set of data for building a decision tree. For building a standard tree each node is
split using the best split among all variables.
Decision Stump:
Decision stumps are basically decision trees with a single label. A stump is opposed to a tree
which has multiple layers. It basically stops after the first split. Decision stumps are usually used
in large data. Hardly, they also help to make simple yes/no decision model for smaller dataset.
LMT:
LMT means logistic model tree. LMT is a classification model with an associated supervised
training algorithm. It combines decision tree learning and logistic prediction. Logistic model trees
use a decision tree that has linear regression models at its leaves to provide a section wise linear
regression model.
Random Forest:
Random forests are a group learning method for regression, classification and other works that
operate by building a multitude of decision trees at training period and outputting the class that is
the mode of the classification or mean prediction of the individual trees. Random forests average
multiple deep decision trees, which are trained on various parts of the same training set, with the
aim of minimizing the variance.
Hoeffding Tree
Hoeffding Tree is known as the streaming decision tree induction. The name is derived from the
Hoeffding bound that is used in the tree induction. The basic idea is, Hoeffding bound provides
5. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, M
particular level of confidence on the best attribute to split the tree, hence we can
model based on particular number of instances that
B. Dataset
The data are collected from UCI
based on the given attributes. The data set has eleven
The attributes description is given
set is built on both numerical
attribute such as total bilirubin, direct bilirubin, age, gender, total proteins, albumin, albumin and
globulin ratio which is the symptoms of liver disease.
liver instances. The Dataset used in the study consist of
416 are positively tested. Class value “
liver disease. The entire attribute and their types are given in Table 1.
shown in Fig1 which is loaded in the Weka tool.
Attribute Name
Age of the patient
Gender of the Patient
Total Bilirubin
Direct Bilirubin
Alkphos Alkaline Phospotase
Sgpt Alamine Aminotransferase
Total proteins
Albumin
Albumin and Globulin Ratio
Class
Fig1:
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, M
level of confidence on the best attribute to split the tree, hence we can
number of instances that has used.
The data are collected from UCI Machine Learning Repository [13] and it predicts
attributes. The data set has eleven attributes which predict the
given below. Based on data types the attributes are given
and nominal data types. In our study the dataset contains the
total bilirubin, direct bilirubin, age, gender, total proteins, albumin, albumin and
which is the symptoms of liver disease. The data sets consist of 538 liver and non
The Dataset used in the study consist of 167 negative tested for liver disease
Class value “Yes” means having liver disease and “No” means negative
The entire attribute and their types are given in Table 1.A portion of liver dataset is
shown in Fig1 which is loaded in the Weka tool.
Table 1: Attribute Description
Attribute Name Possible value
Age of the patient Numeric
Gender of the Patient Nominal
Total Bilirubin Numeric
Direct Bilirubin Numeric
Alkphos Alkaline Phospotase Numeric
Sgpt Alamine Aminotransferase Numeric
proteins Numeric
Numeric
Albumin and Globulin Ratio Numeric
Nominal
Fig1: Dataset for Liver Disease prediction
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
5
level of confidence on the best attribute to split the tree, hence we can construct the
predicts liver disease
attributes which predict the liver disease.
the attributes are given. The data
In our study the dataset contains the
total bilirubin, direct bilirubin, age, gender, total proteins, albumin, albumin and
538 liver and non-
liver disease and
means negative
A portion of liver dataset is
6. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
6
4. EXPERIMENTS AND RESULT
The comparison of various decision tree algorithms performed on liver disease data is shown in
Table 2.Decision Stump hast the highest accuracy rate. The accuracy rate of this algorithm is
70.67%.It is seen that Decision Stump is the most powerful classifier for this example. This result
shows that the liver disease of a new patient is predicted successfully with an acceptable ratio
70.67%.Random Tree, LMT and Hoeffding Tree has the accuracy rate as 69.47%, 69.30% and
69.75%.It is also seen that J48 has the worst accuracy rate with 65.69%.The comparison of
decision tree algorithm with respect to accuracy shown in Fig2. The comparison of decision tree
algorithm with respect to kappa statistics and runtime is shown in Fig3 and Fig4 respectively.
REPTree, Random Tree and Decision Tree are faster than other algorithms.LMT algorithm takes
a long time even though a small dataset is used. A decision tree is shown in Fig5 which is
generated by J48 algorithm.
Table 2: Comparing the various decision tree algorithms carried out liver dataset
.
Fig2: Comparison of the decision tree algorithms according to Accuracy
7. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
7
Fig3: Comparison of the decision tree algorithms according to Kappa statistics
Fig4: Comparison of the decision tree algorithms according to runtime
8. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
8
Fig5: Decision Tree Generated by J48
5. CONCLUSIONS
The study employed some decision tree algorithm such as J48, LMT, Random Forest, Random
tree, REPTree, Decision Stump and Hoeffding Tree to predict the liver disease at an earlier stage.
These algorithm gives various result based on Accuracy, Mean Absolute Error, Precision, Recall,
Kappa statistics and Runtime. These techniques were evaluated and their performance was
compared. From the analysis, Decision Stump outperforms well than other algorithms and its
achieved accuracy is 70.67%. The performance measure used for comparison are listed in the
table (Table 2) The application of Decision tree in predicting liver disease will benefit in
managing the health of individuals. However, in future, we will collect the very recent data from
various regions across the world for liver disease diagnosis. The results of this study will
encourage us to continue developing other advanced decision trees such as CART.
REFERENCES
[1] D. Sindhuja and R. J. Priyadarsini, “A survey on classification techniques in data mining for
analyzing liver disease disorder”, International Journal of Computer Science and Mobile Computing,
Vol.5, no.5 (2016), pp. 483-488.
[2] B. V. Ramana, M. R. P. Babu and N.B. Venkaeswarlu, “A Critical Study of Selected Classification
Algorithms for Liver Disease Diagnosis”, International Journal of Database Management Systems
(IJDMS), Vol.3, no.2, (2011) , pp. 101-114.
[3] A.S.Aneeshkumar and C.J. Venkateswaran, “Estimating the Surveillance of Liver Disorder using
Classification Algorithms”, International Journal of Computer Applications (0975 –8887) , Vol. 57,
no. 6, (2012), pp. 39-42.
[4] S.Dhamodharan, “Liver Disease Prediction Using Bayesian Classification”, 4th National Conference
on Advanced Computing, Applications & Technologies, Special Issue, May 2014.
9. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.8, No.2, March 2018
9
[5] P.Rajeswari and G.S. Reena, “Analysis of Liver Disorder Using data mining Algorithms”, Global
Journal of Computer Science and Technology, Vol.10, no. 14 (2010), PP. 48- 52.
[6] G. Selvara and S. Janakiraman, “A Study of Textural Analysis Methods for the Diagnosis of Liver
Disease from Abdominal Computed Tomography”, International Journal of Computer Applications
(0975-8887), Vol. 74, no.11 (2013), PP.7-13.
[7] H. Sug, “ Improving the Prediction Accuracy of Liver Disorder Disease with Oversampling”, Applied
Mathematics in Electrical and Computer Engineering, American-MATH 12/CEA12 proceedings of
the 6th Applications and proceedings on the 2012 American Conference on Appied Mathematics
(2012), PP. 331-335.
[8] R.H.Lin, “An Intelligent model for liver disease diagnosis”, Artificial Intelligence in Medical, Vol.
47, no. 1 (2009), PP. 53-62.
[9] B. V. Ramanaland and M.S. P. Babu, “Liver Classification Using Modified Rotation Forest”,
International Journal of Engineering Research and Development ISSN: 2278-067X, Vol. 1, no. 6
(2012), PP.17-24.
[10] H.R. Kiruba and G. T. arasu, “An Intelligent Agent based Framework for Liver Disorder Diagnosis
Using Artificial Intelligence Techniques”, Journal of Theoretical and Applied Information
Technology, Vol. 69 , no.1 (2014), pp. 91-100.
[11] C.K. Ghosk, F. Islam, E. Ahmed, D.K. Ghosh, A. Haque and Q.K. Islam, “Etiological and clinical
patterns of Isolated Hepatomegaly” Journal of Hepato-Gastroenterology, vol.2, no. 1, PP. 1-4.
[12] S. S. Aksenova , ‘Machine Learning with WEKA -WEKA Explorer Tutorial for WEKA Version 3.4’,
2004.
[13] Machine Learning Repository, Center for Machine Learning and Intelligent Systems
https://archive.ics.uci.edu/ml/index.php