Data mining algorithms for Recognition and Codification of Glandular Disorder
Pimpri Chinchwad College of Engineering, Nigdi, Pune, India
Presented by:
1. Amrita Chavan
Guided by:
1. Dr. K. Rajeswari (HOD, Dept. of CSE)
2. Prof. Rupali Bhondve (Dept. of CSE)
Contents
1. Introduction
2. Data Set Description
3. Algorithms
4. System Implementation
5. Accuracy Calculation
6. Graphical Representation
7. Conclusion
8. References
Introduction
● People with thyroid disorders fall ill because the thyroid gland produces too little or too much hormone
● Thyroid imbalance takes two forms:
○ Hypothyroidism (underproduction of thyroid hormone)
○ Hyperthyroidism (overproduction of thyroid hormone)
● Data mining is becoming a strategically important tool
● Data mining is expected to become a mainstay of disease detection
Pimpri Chinchwad College of Engineering, Nigdi, Pune, India, 02.16.2018
Data Set
A. Data Set Details
● Downloaded from the University of California, Irvine (UCI) Machine Learning Repository
● The dataset has 29 features
● 3772 samples in total:
○ 3481 from the negative category
○ 194 from the compensated hypothyroid category
○ 95 from the primary hypothyroid category
○ 2 from the secondary hypothyroid category
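The class counts above sum to the stated total (3481 + 194 + 95 + 2 = 3772), and they are heavily imbalanced. The short Python sketch below computes each class's share and the accuracy of a trivial classifier that always answers "negative", a useful baseline when reading the accuracies in Tables 1 and 2. The label spellings and function names are assumptions for illustration.

```python
# Class counts as listed on the Data Set slide; the label spellings are
# assumptions modelled on the category names, not read from the data file.
CLASS_COUNTS = {
    "negative": 3481,
    "compensated_hypothyroid": 194,
    "primary_hypothyroid": 95,
    "secondary_hypothyroid": 2,
}


def class_shares(counts):
    """Return (total, {label: percentage of all samples})."""
    total = sum(counts.values())
    return total, {label: 100.0 * n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Accuracy (%) of a classifier that always predicts the largest class."""
    _, shares = class_shares(counts)
    return max(shares.values())


if __name__ == "__main__":
    total, shares = class_shares(CLASS_COUNTS)
    print(f"total samples: {total}")
    for label, pct in shares.items():
        print(f"{label:>24}: {pct:6.2f}%")
    print(f"majority-class baseline: {majority_baseline(CLASS_COUNTS):.2f}%")
```

Run directly, it reports a majority-class baseline of about 92.29%, so the trained classifiers earn their advantage on the remaining 7.7% of samples.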
Data Set contd...
B. Loading and Filtering Files
● WEKA has built-in file format converters, e.g. for
○ .csv files
● If WEKA cannot load a file based on its extension, it tries to interpret it as ARFF format
○ .arff
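WEKA performs this conversion itself through its CSV converter (CSVLoader). To show what the ARFF target format contains, here is a minimal standalone Python sketch, not WEKA's implementation. The function name `csv_to_arff`, the default relation name, the treatment of empty cells and "?" as missing, and the rule that all-numeric columns become `numeric` while every other column becomes nominal are all assumptions.

```python
import csv
import io

MISSING = {"", "?"}  # assumption: empty cells and "?" both mean "missing"


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _quote(text):
    """Single-quote names/values that contain ARFF special characters."""
    if any(ch in text for ch in " ,{}'\"%\t"):
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return text


def csv_to_arff(csv_text, relation="hypothyroid"):
    """Convert CSV text (first row = header) into ARFF text.

    Columns whose non-missing values all parse as numbers become `numeric`;
    every other column becomes a nominal attribute listing its distinct values.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {_quote(relation)}", ""]
    for col, name in enumerate(header):
        present = [r[col] for r in data if r[col] not in MISSING]
        if all(_is_number(v) for v in present):
            kind = "numeric"
        else:
            kind = "{" + ",".join(_quote(v) for v in sorted(set(present))) + "}"
        lines.append(f"@attribute {_quote(name)} {kind}")
    lines += ["", "@data"]
    for r in data:
        # ARFF marks missing values with "?"
        lines.append(",".join("?" if v in MISSING else _quote(v) for v in r))
    return "\n".join(lines) + "\n"
```

For a hypothetical two-row CSV with columns `age,sex,TSH,class`, `age` and `TSH` become `numeric` (the "?" in `TSH` is kept as a missing value) and `sex` and `class` become nominal attributes such as `{F,M}`.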
Algorithms Used
1. Bayesian Network (BayesNet)
2. J48 (Decision Tree)
3. REP Tree (Reduced Error Pruning Tree)
4. CART (Classification and Regression Tree)
5. Decision Stump
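J48 is WEKA's implementation of the C4.5 decision tree, which chooses splits by gain ratio (information gain divided by split information). The Python sketch below computes that criterion for nominal attributes only. It is a simplified illustration, not WEKA's code: it omits C4.5's numeric thresholds, missing-value handling, pruning, and its rule of considering only attributes with at least average information gain. The helper `best_attribute` and the toy columns are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute_values, labels):
    """C4.5-style gain ratio of splitting `labels` on a nominal attribute."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes that create many small branches.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(columns, labels):
    """Name of the column (dict: name -> values) with the highest gain ratio."""
    return max(columns, key=lambda name: gain_ratio(columns[name], labels))
```

For example, `best_attribute({"goitre": ["t", "t", "f", "f"], "sex": ["F", "M", "F", "M"]}, ["x", "x", "y", "y"])` returns `"goitre"`, because that toy column separates the two classes perfectly (gain ratio 1.0) while `sex` carries no information (gain ratio 0.0).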
System Implementation
Fig 1. System Implementation
Accuracy Calculation
Table 1. Accuracy by class for various classifiers
Algorithm        Accuracy (%)  TP Rate  FP Rate  Precision  Recall
Bayes Net        98.59         0.993    0.086    0.993      0.993
J48              99.57         0.999    0.021    0.998      0.999
REP Tree         99.57         0.998    0.007    0.999      0.998
CART             99.52         0.998    0.01     0.999      0.998
Decision Stump   95.39         0.978    0.009    0.999      0.978
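Each measure in Table 1 can be derived one-vs-rest from a confusion matrix. The Python sketch below assumes a matrix indexed as `confusion[actual][predicted]`; the function names and the choice to report 0 for an undefined ratio are assumptions. Recall is defined as the TP rate, TP / (TP + FN), which is why the TP Rate and Recall columns of Table 1 agree. WEKA reports these measures per class and as a weighted average.

```python
def _ratio(num, den):
    return num / den if den else 0.0  # assumption: report 0 when undefined


def accuracy(confusion):
    """Percentage of all instances that lie on the matrix diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * _ratio(correct, total)


def class_metrics(confusion, labels, positive):
    """One-vs-rest TP rate, FP rate, precision and recall for class `positive`.

    confusion[a][p] counts instances of actual class a predicted as class p.
    """
    k = labels.index(positive)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp  # positives predicted as another class
    fp = sum(confusion[a][k] for a in range(len(labels))) - tp  # others predicted as positive
    tn = total - tp - fn - fp
    tp_rate = _ratio(tp, tp + fn)
    return {
        "tp_rate": tp_rate,
        "fp_rate": _ratio(fp, fp + tn),
        "precision": _ratio(tp, tp + fp),
        "recall": tp_rate,  # recall is the TP rate by definition
    }
```

With the hypothetical two-class matrix `[[90, 2], [1, 7]]` and `negative` as the positive class, `accuracy` gives 97.0 and `class_metrics` gives a TP rate of 90/92, an FP rate of 1/8 and a precision of 90/91.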
Accuracy Calculation contd...
Table 2. Accuracy of J48 with k-fold cross-validation
k (folds)  Accuracy (%)  TP Rate  FP Rate  Precision  Recall
2          99.46         0.998    0.01     0.998      0.999
4          99.57         0.998    0.014    0.999      0.998
6          99.6          0.999    0.021    0.998      0.999
8          99.54         0.998    0.021    0.998      0.998
10         99.57         0.999    0.021    0.998      0.999
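In k-fold cross-validation the data are split into k folds; each fold serves once as the test set while the remaining k - 1 folds train the model, and the predictions from all folds are pooled. WEKA also randomizes the data and stratifies the folds by class when the class is nominal, which matters here because the secondary hypothyroid class has only 2 samples. The Python sketch below illustrates stratified k-fold under stated assumptions: `stratified_folds`, `cross_validated_accuracy` and the `count_correct` callback are hypothetical names, and the shuffling differs from WEKA's, so it will not reproduce Table 2.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=1):
    """Split sample indices into k test folds, spreading each class across folds."""
    if not 2 <= k <= len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0  # keep dealing round-robin across classes so fold sizes differ by at most 1
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validated_accuracy(labels, k, count_correct, seed=1):
    """Pooled k-fold accuracy (%): correct predictions summed over all folds / n.

    count_correct(train_idx, test_idx) is a hypothetical callback that trains a
    classifier on train_idx and returns how many test_idx samples it gets right.
    """
    n = len(labels)
    correct = 0
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        correct += count_correct(train_idx, test_idx)
    return 100.0 * correct / n


if __name__ == "__main__":
    # Class sizes from the Data Set slide; label spellings are assumptions.
    labels = (["negative"] * 3481 + ["compensated_hypothyroid"] * 194
              + ["primary_hypothyroid"] * 95 + ["secondary_hypothyroid"] * 2)

    def majority_class(train_idx, test_idx):
        # Stand-in "classifier": predict the most frequent training label.
        guess = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        return sum(labels[i] == guess for i in test_idx)

    for k in (2, 4, 6, 8, 10):
        print(k, round(cross_validated_accuracy(labels, k, majority_class), 2))
```

The demo plugs in a majority-class stand-in; because every training split is dominated by negative samples, it scores exactly the 3481/3772 (about 92.29%) baseline for every k, which a real classifier such as J48 would replace.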
Graphical Representation
Fig 2. Comparison of the different classifiers
Graphical Representation contd...
Fig 3. Comparison of k-fold cross-validation results for the J48 classifier
Conclusion
● The WEKA tool was used for all computations
● The hypothyroid dataset was used to train and test the data mining classifiers
● Several classification techniques were applied to diagnose the glandular (thyroid) disorder
● Performance was evaluated with the following measures:
○ Accuracy
○ Recall
○ Precision
○ FP Rate
○ TP Rate
● J48 evaluated with k-fold cross-validation gave the best results among the algorithms compared, reaching 99.6% accuracy with k = 6 (Table 2); in Table 1 it ties with REP Tree at 99.57%
References
1. Pandey, Shivanee, Rohit Miri, and S. R. Tandan. "Diagnosis and classification of hypothyroid disease using data mining techniques." IJERT, ISSN 2278-0181 (2013).
2. Patil, Tina R., and S. S. Sherekar. "Performance analysis of Naive Bayes and J48 classification algorithm for data classification." International Journal of Computer Science and Applications 6.2 (2013): 256-261.
3. Wisaeng, Kittipol. "A comparison of decision tree algorithms for UCI repository classification." Int. J. Eng. Trends Technol. 4 (2013): 3393-3397.
4. "UCI Machine Learning Repository." University of California, Irvine, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/
References contd...
5. Han, Jiawei, Jian Pei, and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier, 2011.
6. Dash, Shreela, M. N. Das, and Brojo Kishore Mishra. "Implementation of an optimized classification model for prediction of hypothyroid disease risks." International Conference on Inventive Computation Technologies (ICICT), Vol. 2. IEEE, 2016.
7. http://www.eee.metu.edu.tr/~halici/courses/543LectureNotes/lecturenotes-pdf/ch9.pdf
8. Banu, G. Rasitha. "A role of decision tree classification data mining technique in diagnosing thyroid disease." International Journal of Computer Sciences and Engineering 4.11 (2016): 64-70.