Rajandeep Kaur
Ph.D. Scholar
18803003
Chronic Kidney Disease Prediction with
Attribute Reduction using Data Mining
Classifiers
Content
• Introduction
• What is Chronic Kidney Disease (CKD)
• Data Mining & Classification
• Role of Attribute Selection
• Literature Review
• Dataset Used
• Performance Parameters
• Results & Discussion
• Conclusion
• References
Introduction
• Past records show that chronic kidney disease (CKD) caused 5.21 million deaths in India in 2008, and this number is projected to rise to 7.63 million by 2020 [4].
• CKD needs to be detected at an early stage, before it gets worse.
• To reduce the mortality rate, an efficient technique is required to predict and classify the disease.
Need of Study
General problems:
• The complete dataset requires a large amount of storage space
• Computation time is long
• Accuracy is not good enough
Aim of study:
To predict chronic kidney disease more accurately and faster, using a reduced set of attributes.
What is Chronic Kidney Disease (CKD)
Structural or functional abnormalities of the kidneys for >3 months, as manifested by either:
1. Kidney damage, with or without decreased GFR, as defined by
    • pathologic abnormalities
    • markers of kidney damage, including abnormalities in the composition of the blood or urine or abnormalities in imaging tests
2. GFR <60 ml/min/1.73 m², with or without kidney damage, where GFR is the Glomerular Filtration Rate.
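The definition above can be read as a simple rule. The following is a minimal sketch of that rule only; the function and parameter names are illustrative assumptions, and it is not a clinical tool.

```python
def meets_ckd_definition(duration_months, kidney_damage, gfr):
    """Check the slide's definition of CKD (illustrative helper, not clinical advice).

    duration_months: how long the abnormality has persisted
    kidney_damage:   True if pathologic abnormalities or damage markers are present
    gfr:             glomerular filtration rate in ml/min/1.73 m^2
    """
    persistent = duration_months > 3   # abnormality lasting more than 3 months
    low_gfr = gfr < 60                 # criterion 2: GFR below 60
    # Criterion 1 (damage) or criterion 2 (low GFR), and persistence in both cases.
    return persistent and (kidney_damage or low_gfr)


print(meets_ckd_definition(6, False, 55))  # low GFR for 6 months
print(meets_ckd_definition(2, True, 40))   # too short to count as chronic
```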
Stages in Progression of Chronic Kidney Disease and Therapeutic Strategies
[Figure: progression runs from Normal through Increased risk, Damage, decreased GFR and Kidney failure to CKD death, with complications arising along the way. Matching strategies: screening for CKD risk factors; CKD risk reduction and screening for CKD; diagnosis and treatment, treating comorbid conditions and slowing progression; estimating progression, treating complications and preparing for replacement; replacement by dialysis and transplant.]
Data Mining & Classification
• Data mining refers to extracting meaningful information and hidden patterns from a dataset [2].
• Data mining techniques are very useful in health informatics [16, 17].
• Classification techniques play a vital role in identifying diseases from symptoms and medical test results.
Attribute Selection
• Before inducing a model we almost always do input engineering
• The most useful part of this is attribute selection (also called feature selection)
    • Select relevant attributes
    • Remove redundant and/or irrelevant attributes
• Select the most "relevant" subset of attributes according to some selection criterion.
Why?
Reasons for Attribute Selection
• Simpler model
    • More transparent
    • Easier to interpret
• Faster model induction
• Structural knowledge
    • Knowing which attributes are important may be inherently important to the application
• Reduced storage requirement
What about the accuracy?
Attribute Selection Contd…
• Attribute selection can be done using one of two methods:
    • Filter
    • Wrapper
Filter Method
• Results in either:
    • A ranked list of attributes
        • Typical when each attribute is evaluated individually
        • Must select how many to keep
    • A selected subset of attributes, found by a search such as:
        • Forward selection
        • Best first
        • Random search such as a genetic algorithm
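The ranked-list form of the filter method can be sketched as follows. This is a minimal illustration that scores each attribute on its own by absolute Pearson correlation with the class; the scoring measure, the toy data and the names `attr_a`, `attr_b`, `attr_c` are assumptions made for the example and are not the evaluators used in this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return 0.0 if sd_x == 0 or sd_y == 0 else cov / (sd_x * sd_y)


def rank_attributes(rows, labels, names):
    """Score each attribute individually and return names from best to worst."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), name))
    return [name for _, name in sorted(scores, reverse=True)]


# Toy data: attr_a tracks the class closely, attr_c weakly, attr_b not at all.
rows = [
    [1.2, 5.0, 0.3],
    [1.0, 4.0, 0.5],
    [3.9, 5.0, 0.6],
    [4.1, 4.0, 0.8],
]
labels = [0, 0, 1, 1]
names = ["attr_a", "attr_b", "attr_c"]

ranking = rank_attributes(rows, labels, names)
top_k = ranking[:2]  # with a ranked list, the number of attributes to keep is chosen by hand
print(ranking, top_k)
```

Because each attribute is scored in isolation, a ranking like this cannot detect redundancy between attributes, which is one reason subset-based evaluation is also used.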
Wrapper Method
• "Wraps around" the learning algorithm
• Always evaluates subsets
• Returns the best subset of attributes
• Uses the same search methods as the filter method
• The wrapper approach is generally more accurate but also more computationally expensive
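The wrapper idea can be illustrated with a small sketch: a greedy forward search in which every candidate subset is scored by actually running a classifier on it. The 1-nearest-neighbour classifier, the leave-one-out scoring and the toy data are assumptions chosen to keep the example self-contained; this is not WEKA's implementation.

```python
def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # the held-out instance may not vote for itself
            dist = sum((row[a] - other[a]) ** 2 for a in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, n_attributes):
    """Greedy forward search: add the attribute that improves accuracy most, stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(n_attributes))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [a]), a) for a in remaining]
        score, attr = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(attr)
        remaining.remove(attr)
        best_score = score
    return selected, best_score


# Toy data: attribute 0 separates the classes, attributes 1 and 2 are noise.
rows = [
    [0.1, 7.0, 3.0],
    [0.2, 1.0, 9.0],
    [0.3, 4.0, 6.0],
    [0.9, 6.5, 3.2],
    [1.0, 1.2, 8.7],
    [1.1, 4.1, 6.3],
]
labels = [0, 0, 0, 1, 1, 1]

subset, accuracy = forward_selection(rows, labels, 3)
print(subset, accuracy)
```

Each step retrains and re-evaluates the classifier once per remaining attribute, which is where the extra computational cost of the wrapper approach comes from.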
Literature Review

| Researcher | Year | Classifier | Accuracy | Remarks |
|---|---|---|---|---|
| K.R. Lakshmi [6] | 2014 | ANN | 93.8521% | Performed better than decision tree and logistic regression classifiers |
| Naganna Chetty [7] | 2015 | Naïve Bayes, SMO, IBK | 99%, 98.25%, 100% | Attribute reduction using the wrapper method |
| S. Vijayarani [8] | 2015 | SVM | 76.32% | 584 instances and six attributes |
| L. Jerlin Rubini [9] | 2015 | Multilayer Perceptron | 99.75% | Performed better than radial basis function network and logistic regression |
| Uma N. Dulhare [10] | 2016 | Naïve Bayes | 97.5% | Attribute reduction using OneR |
| Huseyin Polat [11] | 2017 | SVM | 98.5% | Attribute reduction |
| Wala A. [12] | 2017 | Decision tree | 99% | Missing values replaced with the mean |
Dataset Used
chronic_kidney_disease from the UCI Machine Learning Repository
The dataset contains:
• 400 instances
• 25 attributes
    • 14 are nominal
    • 11 are numeric
PERFORMANCE ANALYSIS PARAMETERS
• Accuracy
• Precision
• Recall
• RMSE (Root Mean Square Error)
• MAE (Mean Absolute Error)
• Execution time
• Kappa statistic
• ROC (Receiver Operating Characteristic)
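Most of the parameters listed above can be derived from a two-class confusion matrix and the predicted class probabilities. The sketch below is one way to compute them, under stated assumptions: labels are 0/1, precision and recall are reported for the positive class only, and RMSE and MAE are taken over the predicted probability of the positive class. The function name and the toy values are illustrative, and a tool such as WEKA may aggregate these figures differently.

```python
from math import sqrt


def evaluate(y_true, p_pos, threshold=0.5):
    """Return accuracy, precision, recall, kappa, MAE and RMSE for a binary problem."""
    y_pred = [1 if p >= threshold else 0 for p in p_pos]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Cohen's kappa: agreement beyond what the class frequencies alone would give.
    expected = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    errors = [p - t for t, p in zip(y_true, p_pos)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "kappa": kappa, "mae": mae, "rmse": rmse}


# Toy example: 8 instances with a predicted probability for the positive class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
p_pos = [0.9, 0.8, 0.4, 0.1, 0.2, 0.6, 0.7, 0.3]
metrics = evaluate(y_true, p_pos)
print(metrics)
```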
RESULT AND DISCUSSION
• Tool
    • WEKA 3.8 (Waikato Environment for Knowledge Analysis)
• Classifiers
    • J48, Decision Table and IBK
• Attribute selection
    • CfsSubsetEval, ClassifierSubsetEval and WrapperSubsetEval
• Search technique
    • GreedyStepwise and BestFirst search
RESULT OF J48, DECISION TABLE AND IBK CLASSIFIERS ON CKD

| Algorithm | Accuracy | Precision | Recall | Kappa statistic | Execution time | RMSE |
|---|---|---|---|---|---|---|
| J48 | 99% | 0.990 | 0.990 | 0.9786 | 0.13 | 0.0807 |
| Decision Table | 99% | 0.990 | 0.990 | 0.9786 | 0.46 | 0.2507 |
| IBK | 95.75% | 0.962 | 0.958 | 0.9113 | 0.01 | 0.2056 |

General observations:
• J48 and Decision Table both provide 99% accuracy
• J48 gives the lowest RMSE
• IBK takes the least time to execute
Attribute Reduction

| Classifier | Attribute selection method | Attributes in original dataset | Attributes after reduction | Attribute reduction (%) |
|---|---|---|---|---|
| J48 | CfsSubsetEval + GreedyStepwise | 25 | 17 | 32 |
| J48 | ClassifierSubsetEval + GreedyStepwise | 25 | 4 | 84 |
| J48 | WrapperSubsetEval + BestFirst | 25 | 13 | 48 |
| Decision Table | CfsSubsetEval + GreedyStepwise | 25 | 17 | 32 |
| Decision Table | ClassifierSubsetEval + GreedyStepwise | 25 | 4 | 84 |
| Decision Table | WrapperSubsetEval + BestFirst | 25 | 7 | 72 |
| IBK | CfsSubsetEval + GreedyStepwise | 25 | 17 | 32 |
| IBK | ClassifierSubsetEval + GreedyStepwise | 25 | 5 | 80 |
| IBK | WrapperSubsetEval + BestFirst | 25 | 7 | 72 |
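The reduction percentages in this table follow from the attribute counts: the share of the 25 original attributes that was removed. A short check, with a helper name chosen for illustration:

```python
def reduction_percent(original, retained):
    """Percentage of attributes removed when `retained` of `original` are kept."""
    return 100 * (original - retained) / original


# Retained-attribute counts that appear in the table, out of 25 original attributes.
for retained in (17, 13, 7, 5, 4):
    print(retained, reduction_percent(25, retained))
```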
Accuracy of Reduced Dataset

| Classifier | Attribute selection method | Attribute reduction (%) | Accuracy without reduction (%) | Accuracy with reduction (%) |
|---|---|---|---|---|
| J48 | CfsSubsetEval + GreedyStepwise | 32 | 99 | 99 |
| J48 | ClassifierSubsetEval + GreedyStepwise | 84 | 99 | 98.25 |
| J48 | WrapperSubsetEval + BestFirst | 48 | 99 | 99 |
| Decision Table | CfsSubsetEval + GreedyStepwise | 32 | 99 | 98.75 |
| Decision Table | ClassifierSubsetEval + GreedyStepwise | 84 | 99 | 99.25 |
| Decision Table | WrapperSubsetEval + BestFirst | 72 | 99 | 99 |
| IBK | CfsSubsetEval + GreedyStepwise | 32 | 95.75 | 98 |
| IBK | ClassifierSubsetEval + GreedyStepwise | 80 | 95.75 | 99.75 |
| IBK | WrapperSubsetEval + BestFirst | 72 | 95.75 | 100 |
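To make the comparison easier to read, the change in accuracy for each configuration can be computed directly from the figures in this table. The snippet below only restates those figures; the variable names are illustrative.

```python
# (classifier, evaluator, accuracy without reduction, accuracy with reduction), in percent
results = [
    ("J48", "CfsSubsetEval", 99.0, 99.0),
    ("J48", "ClassifierSubsetEval", 99.0, 98.25),
    ("J48", "WrapperSubsetEval", 99.0, 99.0),
    ("Decision Table", "CfsSubsetEval", 99.0, 98.75),
    ("Decision Table", "ClassifierSubsetEval", 99.0, 99.25),
    ("Decision Table", "WrapperSubsetEval", 99.0, 99.0),
    ("IBK", "CfsSubsetEval", 95.75, 98.0),
    ("IBK", "ClassifierSubsetEval", 95.75, 99.75),
    ("IBK", "WrapperSubsetEval", 95.75, 100.0),
]

# Accuracy change in percentage points after attribute reduction.
gains = [(clf, ev, after - before) for clf, ev, before, after in results]
best = max(gains, key=lambda g: g[2])

for clf, ev, gain in gains:
    print(f"{clf:15s} {ev:22s} {gain:+.2f}")
print("largest gain:", best)
```

On these figures the gains are concentrated in IBK, while J48 and Decision Table change by at most 0.75 percentage points in either direction.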
[Figure: Comparison of accuracy for the J48, Decision Table and IBK classifiers with the original and reduced datasets]
CONCLUSION
• The accuracy of IBK on the original dataset is 95.75%
    • With the dataset reduced by 72%, IBK reaches 100% accuracy using the WrapperSubsetEval attribute evaluator with BestFirst search.
• J48 and Decision Table provide better results than IBK on the original dataset
    • IBK, however, performs better on the reduced dataset than on the original dataset.
• IBK can be used to predict CKD efficiently and quickly with reduced attributes.
References
[1] L. Jena and N. Ku. Kamila, "Distributed data mining classification algorithms for prediction of chronic-kidney-disease," International Journal of Emerging Research in Management & Technology, vol. 4, no. 11, pp. 110-118, November 2015.
[2] K. Chandel, V. Kunwar, S. Sabitha, T. Choudhury, and S. Mukherjee, "A comparative study on thyroid disease detection using K-nearest neighbor and Naive Bayes classification techniques," CSI Transactions on ICT, vol. 4, no. 2-4, pp. 313-319, 2016.
[3] Sudhir B. Jagtap, "Census data mining and data analysis using WEKA," arXiv preprint arXiv:1310.4647, 2013.
[4] S. Dilli Arasu and R. Thirumalaiselvi, "Review of Chronic Kidney Disease based on Data Mining Techniques," International Journal of Applied Engineering Research, vol. 12, pp. 13498-13505, 2017.
[5] S. Zeynu and Shruti Patil, "Survey on Prediction of Chronic Kidney Disease Using Data Mining Classification Techniques and Feature Selection," International Journal of Pure and Applied Mathematics, vol. 118, no. 8, pp. 149-156, 2018.
[6] K. R. Lakshmi, Y. Nagesh, and M. Veera Krishna, "Performance comparison of three data mining techniques for predicting kidney dialysis survivability," International Journal of Advances in Engineering & Technology, vol. 7, pp. 242-254, 2014.
[7] N. Chetty, Kunwar Singh Vaisla, and Sithu D. Sudarsan, "Role of attributes selection in classification of Chronic Kidney Disease patients," International Conference on Computing, Communication and Security (ICCCS), IEEE, 2015.
[8] S. Vijayarani and S. Dhayanand, "Data mining classification algorithms for kidney disease prediction," International Journal on Cybernetics and Informatics (IJCI), 2015.
[9] L. Jerlin Rubini and P. Eswaran, "Generating comparative analysis of early stage prediction of Chronic Kidney Disease," International Journal of Modern Engineering Research (IJMER), vol. 5, no. 7, pp. 49-55, July 2015.
[10] Uma N. Dulhare and Mohammad Ayesha, "Extraction of action rules for chronic kidney disease using Naïve Bayes classifier," IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), IEEE, 2016.
[11] H. Polat, Homay Danaei Mehr, and Aydin Cetin, "Diagnosis of chronic kidney disease based on support vector machine by feature selection methods," Journal of Medical Systems, Feb. 2017.
[12] W. Abedalkhader and Noora Abdulrahman, "Missing Data Classification of Chronic Kidney Disease," International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 7, no. 5/6, November 2017.
[13] Abeer Y. Al-Hyari, "Chronic Kidney Disease Prediction System Using Classifying Data Mining Techniques," Library of University of Jordan, 2012.
[14] Jiliang Tang, Salem Alelyani, and Huan Liu, "Feature selection for classification: A review," Data Classification: Algorithms and Applications, 2014.
[15] Geoffrey Holmes, Andrew Donkin, and Ian H. Witten, "Weka: A machine learning workbench," Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, IEEE, 1994.
[16] Mary K. Obenshain, "Application of data mining techniques to healthcare data," Infection Control & Hospital Epidemiology, vol. 25, no. 8, pp. 690-695, 2004.
[17] Li-Chen Cheng, Ya-Han Hu, and Shr-Han Chiou, "Applying the Temporal Abstraction Technique to the Prediction of Chronic Kidney Disease Progression," Journal of Medical Systems, vol. 41, April 2017.
[18] Neeraj Bhargava, Girja Sharma, Ritu Bhargava, and Manish Mathuria, "Decision tree analysis on J48 algorithm for data mining," Proceedings of International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, pp. 1114-1119, June 2013.
[19] Hongjun Lu and Hongyan Liu, "Decision tables: Scalable classification exploring RDBMS capabilities," Proceedings of the 26th International Conference on Very Large Data Bases (VLDB'00), 2000.
