The goal of this project is to find the best tool for predicting the life expectancy of people with Hepatitis B. Many Machine Learning methods have been studied extensively and applied by researchers around the world. Hepatitis B is a worldwide disease with a high mortality rate, and different researchers have used different methods to predict the life expectancy of Hepatitis B patients. Machine Learning models and algorithms such as classification models, Logistic Regression, the Recursive Feature Elimination algorithm, a cirrhosis mortality model, Extreme Gradient Boosting (XGBoost), Random Forest and Decision Tree have been used for this purpose. Some algorithms and models showed very interesting and promising results, whereas others performed less well. Area under the ROC curve (AUROC) analysis was used to assess the various models. The AUROC value of the PSO model was the lowest, while the ADT model had the highest accuracy. XGBoost showed adequate predictive performance, and all other models showed good calibration.
Predicting Life Expectancy of Hepatitis B Patients
1. International Conference on Distributed Computing and Electrical Circuits and Electronics
(ICDCECE-2022)
organized by
Ballari Institute of Technology and Management, Ballari
(Autonomous Institute under VTU, Belagavi | Approved by AICTE, New Delhi | Recognized by Govt. of Karnataka)
Technical Co-Sponsor
Predicting Life Expectancy of Hepatitis B Patients using Machine Learning
1. Nabeel Ali BBDITM
2. Dolley Srivastava BBDITM
3. Aditya Tiwari BBDITM
4. Akash Pandey BBDITM
5. Akshat Sahu BBDITM
6. Abhay Kumar Pandey BBDITM
3. • The goal of this work is to find the best tool for predicting the life expectancy of people with Hepatitis B.
• Many Machine Learning methods have been studied extensively and applied by researchers all over the world.
• Machine Learning models and algorithms such as classification models, Logistic Regression, the Recursive Feature Elimination algorithm, a cirrhosis mortality model, Extreme Gradient Boosting, Random Forest and Decision Tree have been used by different researchers to predict the life expectancy of Hepatitis B patients.
Abstract
4. • Life expectancy is the number of years a person is expected to live, based on statistical averages.
• Hepatitis B is a serious disease that compromises liver function. Its symptoms are caused mainly by infection of the liver with the hepatitis B virus.
• Hepatitis B symptoms include yellowing of the eyes, stomach pain and dark urine, among others.
• The two most crucial elements for predicting the life expectancy of a patient with any disease are:
i. the selection of appropriate parameters, and
ii. proper data analysis by people with the relevant expertise.
Introduction
5. • Several studies in this field have addressed prognosis, outcome prediction and patients' life expectancy.
• Tao Wang [1] used traditional statistical methods to study and model the life expectancy of chronic Hepatitis B carriers.
• Somaya Hashem et al. [5] compared several machine learning methods for predicting advanced fibrosis in chronic Hepatitis C patients using serum biomarkers.
• Mingxue Yu et al. [10] developed and validated a model for predicting acute-on-chronic liver failure in chronic Hepatitis B patients using the Recursive Feature Elimination technique.
• Xiaolu Tian et al. [7] used multiple machine learning methods to predict Hepatitis B surface antigen (HBsAg) seroclearance in Hepatitis B patients.
Literature Survey
6. • Supervised data mining techniques have been applied successfully to hepatitis diagnosis on several datasets.
• Many hepatitis diagnosis methods have been developed with the aid of data mining techniques.
• Most of these methods rely on a single learning technique and do not use ensemble learning.
• Combining the outputs of several predictors can improve accuracy in classification problems.
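Combining the outputs of several predictors can be illustrated with hard majority voting, which is one common way to build an ensemble. This minimal sketch assumes that each model has already produced one class label per patient. The function name `majority_vote` and the "live"/"die" labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class labels from several models by hard majority vote.

    predictions_per_model: a list of equal-length label lists, one per model.
    Ties go to the label seen first for that patient, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    combined = []
    for labels in zip(*predictions_per_model):  # all models' labels for one patient
        label, _count = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three models for four patients.
lr_pred = ["live", "die", "live", "live"]
dt_pred = ["live", "live", "die", "live"]
knn_pred = ["die", "die", "die", "live"]
print(majority_vote([lr_pred, dt_pred, knn_pred]))  # ['live', 'die', 'die', 'live']
```

With an odd number of models and two classes, a tie cannot occur.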
Problem Definition
7. • In this study, we compare and evaluate the usefulness of different machine learning techniques for predicting the life expectancy of Hepatitis B patients by developing classification models.
• Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbour (KNN) prediction models were developed.
• The proposed models should be easy to use and inexpensive, and should give accurate numerical results in real time. They are intended to predict patients' life expectancy with high accuracy.
Proposed Work
8. Techniques involved:
• EDA: Exploratory Data Analysis is the process of investigating a dataset to discover patterns and anomalies (outliers) and to form hypotheses based on our understanding of the data.
• Outlier Detection: Outliers are observations that do not fit the rest of the dataset. They can skew statistical measures and data distributions and misrepresent the underlying data and relationships.
• Feature Selection: Feature selection is the process of reducing the number of input variables when developing a predictive model. It involves evaluating the relationship between each input variable and the target variable using statistics, then selecting the input variables that have the strongest relationship with the target.
• RFE: Recursive Feature Elimination selects the features (columns) in a training dataset that are most relevant for predicting the target variable, by repeatedly fitting a model and removing the weakest feature(s).
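The outlier-detection and feature-selection steps above can be sketched in plain Python. This illustrates the ideas under stated assumptions and is not the authors' pipeline:

- Outliers are flagged with Tukey's IQR fences (k = 1.5).
- The univariate filter scores each feature by its absolute Pearson correlation with the target.
- RFE is not shown, because it differs by repeatedly refitting a model.
- The data and the column names `x1`..`x3` are hypothetical.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_top_k(columns, target, k):
    """Univariate filter: rank features by |correlation with target| and keep the top k names."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data, not from the study.
bilirubin = [1.0, 1.2, 0.9, 1.1, 1.0, 8.0]
print(iqr_outliers(bilirubin))  # [5]

target = [0, 0, 1, 1]
columns = {"x1": [0, 0, 1, 1], "x2": [1, 0, 1, 0], "x3": [1, 2, 2, 4]}
print(select_top_k(columns, target, k=2))  # ['x1', 'x3']
```

Whether a flagged value should be removed, capped or kept is a separate, domain-specific decision. An extreme bilirubin reading may be clinically real.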
Methodology & Implementation
9. • Extra Trees Classifier: The Extremely Randomized Trees Classifier (Extra Trees Classifier) is an ensemble learning technique that aggregates the results of multiple decorrelated decision trees collected in a “forest” to output its classification result.
• Confusion Matrix: A confusion matrix summarizes the performance of a classification model on a set of test data by counting correct and incorrect predictions for each class. It can only be computed if the true labels of the test data are known.
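Here is a minimal sketch of a confusion matrix for a binary outcome, together with the accuracy score derived from it. Treating "die" as the positive class, and the labels themselves, are assumptions for illustration; the study does not state its label encoding.

```python
def confusion_matrix(y_true, y_pred, positive):
    """Count TP, FP, FN and TN for a binary problem, treating `positive` as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Fraction of correct predictions: (TP + TN) / total."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


# Hypothetical test-set labels (true outcome vs. model prediction).
y_true = ["die", "live", "live", "die", "live"]
y_pred = ["die", "live", "die", "live", "live"]
cm = confusion_matrix(y_true, y_pred, positive="die")
print(cm)            # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}
print(accuracy(cm))  # 0.6
```

Accuracy alone can hide poor detection of the rarer class. The FN count, which here means patients who died but were predicted to live, is usually worth reporting alongside it.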
Cont.
10. • Data was collected from various online medical records and by surveying Hepatitis B patients from different backgrounds. The data was thoroughly analyzed and cleaned before use.
• The data collected included demographics (age, sex), use of steroids and antivirals, fatigue, malaise, anorexia, liver big, liver firm, spleen palpable, spiders, ascites, varices, bilirubin, alk phosphate, sgot, albumin, protime and histology.
• To explore the predictive power of individual variables, we first developed a univariate logistic model for each variable.
• Logistic Regression (LR), K-Nearest Neighbour (KNN) and Decision Tree (DT) algorithms were used as the classification and prediction tools for predicting the life expectancy of Hepatitis B patients.
• These models were trained and tested on the 14 best variables selected.
• Logistic Regression achieved an accuracy score of 0.72, while KNN and Decision Tree each achieved an accuracy score of 0.74.
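The univariate step described above fits one single-variable logistic model per column. This sketch fits each model by plain batch gradient descent on the log-loss. The learning rate, number of epochs, feature names and data are hypothetical; the study does not say which fitting procedure or software it used. The sign of each slope `b1` shows whether higher values of that variable raise or lower the predicted probability of the outcome.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_univariate_logistic(x, y, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # d(log-loss)/d(linear term)
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def univariate_screen(columns, y):
    """Fit one single-variable logistic model per column; return {name: (b0, b1)}."""
    return {name: fit_univariate_logistic(vals, y) for name, vals in columns.items()}


# Hypothetical, already standardised feature values and outcomes (1 = event).
y = [0, 0, 0, 1, 1, 1]
columns = {
    "feature_a": [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0],  # rises with the outcome
    "feature_b": [1.5, 1.0, 0.8, -0.8, -1.0, -1.5],  # falls with the outcome
}
for name, (b0, b1) in univariate_screen(columns, y).items():
    print(name, round(b1, 2))
```

On perfectly separable toy data like this, the slopes keep growing with more epochs. Real clinical data overlaps, and a library implementation with regularisation would normally be preferred.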
Results & Discussions
11. • Logistic regression is a classic statistical classification method. It models the relationship between a binary dependent variable and one or more independent variables by estimating probabilities with a logistic function.
• A decision tree is a nonparametric supervised learning method used for classification and regression. It predicts the value of a target variable with a tree-like model of decisions built from simple decision rules inferred from the data features.
• The K-nearest neighbours (KNN) algorithm uses ‘feature similarity’ to predict values for new data points: a new data point is assigned a value based on how closely it matches the points in the training set.
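The ‘feature similarity’ idea behind KNN can be sketched in a few lines. This version assumes Euclidean distance and a majority vote among the k nearest training points. The value k = 3 and the two-feature training data are hypothetical; the study does not report its distance metric or k.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points.

    Distance is Euclidean (math.dist), so features should be on comparable scales.
    """
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _point, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training points with outcomes.
train_X = [(1.0, 4.0), (1.2, 3.9), (3.5, 2.8), (3.8, 2.5), (4.0, 2.6)]
train_y = ["live", "live", "die", "die", "die"]
print(knn_predict(train_X, train_y, (1.1, 4.1)))  # live
print(knn_predict(train_X, train_y, (3.7, 2.7)))  # die
```

KNN is distance-based, so variables measured on large scales (for example alk phosphate versus albumin) would dominate unless the features are standardised first.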
Contd.
12. • All of the models gave reasonable estimates.
• Logistic Regression, KNN and Decision Tree showed similar accuracy scores on the best features available:
i. Logistic Regression: 72%
ii. Decision Tree: 74%
iii. KNN: 74%
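A decision tree learns its simple decision rules by repeatedly choosing the split that best separates the classes. This minimal sketch shows that core step on one numeric feature and assumes Gini impurity as the split criterion. Gini is a common choice, but the study does not state which criterion its tree used. The feature values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best rule `value <= threshold` on one feature.

    threshold is None if no split lowers the impurity of the parent node.
    """
    n = len(labels)
    best_threshold, best_score = None, gini(labels)
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send at least one sample each way
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical single feature and outcomes, not from the study.
values = [2.1, 2.8, 3.0, 3.9, 4.2, 4.5]
labels = ["die", "die", "die", "live", "live", "live"]
print(best_split(values, labels))  # (3.0, 0.0)
```

A full tree repeats this search over every feature, splits on the best rule found, and recurses into each child until a stopping rule such as a maximum depth is met.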
Conclusion
13. • Tao Wang [2009]. Model of Life Expectancy of Chronic Hepatitis B Carriers in an Endemic Region. Journal of Epidemiology. [1]
• Brent C. Taylor [2009]. Clinical Outcomes in Adults with Chronic Hepatitis B in Association with Patient and Viral Characteristics. Hepatology Communications. [2]
• Mamta K. Jain [2009]. Mortality in Patients Coinfected with Hepatitis B Virus and HIV. Clinical Infectious Diseases. [3]
• J. Wolfson [2015]. A Naïve Bayes machine learning approach to risk prediction using censored, time-to-event data. US National Library of Medicine, National Institutes of Health (NCBI). [4]
• Somaya Hashem, Gamal Esmat [2017]. Comparison of Machine Learning Approaches for Prediction of Advanced Liver Fibrosis in Chronic Hepatitis C Patients. IEEE/ACM Transactions on Computational Biology and Bioinformatics. [5]
• Yaming Zhang [2018]. Modeling for the prediction of Hepatitis B incidence based on integrated online search indexes. Informatics in Medicine Unlocked. [6]
• Xiaolu Tian, Yutian Chong [2019]. Using Machine Learning Algorithms to Predict Hepatitis B Surface Antigen Seroclearance. Hindawi Computational and Mathematical Methods in Medicine. [7]
• Hailemichael Desalegn [2019]. Predictors of mortality in patients under treatment for chronic hepatitis B in Ethiopia. BMC Gastroenterology. [8]
• Fasiha Kanwal, Thomas J. Taylor [2020]. Development, Validation, and Evaluation of a Simple Machine Learning Model to Predict Cirrhosis Mortality. JAMA Network Open. [9]
• Mingxue Yu, Xiangyong Li [2021]. Development and Validation of a Novel Risk Prediction Model Using Recursive Feature Elimination Algorithm for Acute-on-Chronic Liver Failure in Chronic Hepatitis B Patients with Severe Acute Exacerbation. Frontiers in Medicine. [10]
References