COMPARISON OF MACHINE
LEARNING TECHNIQUES USING WEKA
ENVIRONMENT
2011 20th IEEE International Workshops on Enabling
Technologies: Infrastructure for Collaborative Enterprises
Dwi Riyono D10207801
Dani Pranata M10307804
Nurul Retno Nurwulan D10301807
INTRODUCTION
• Machine learning can assist in reaching a correct diagnosis and
in deciding on further treatment or a potential therapy change
for a specific patient.
• The medical data provided were used to evaluate the
performance of several classification techniques.
• This study analyzed and evaluated the therapy-change decision
that a doctor makes when a number of blood test parameters,
mainly Prostate Specific Antigen (PSA), are measured every
3 months.
• Prostate cancer is the most common non-cutaneous cancer
and the second-leading cause of cancer death in men in the USA.
It is prevalent in many countries and exhibits a wide spectrum of
aggressiveness.
Material and Methods
• Medical Problem Description and Data
• The WEKA environment
• Techniques
MEDICAL PROBLEM
DESCRIPTION AND DATA
• Advising an effective treatment is a challenge for the
physician who treats patients with prostate cancer.
• Selecting an appropriate treatment requires assessing
the tumor's potential aggressiveness and the patient's
general health, life expectancy, and quality-of-life
preferences.
• Parameters chosen: Hematocrit (HCT), White Blood
Cells (WBC), free Prostate Specific Antigen (PSA free),
total Prostate Specific Antigen (PSA total), ratio PSA
(PSAfree/PSAtotal), Prostatic Acid Phosphatase (PAP),
and the potential therapy change decision (yes/no).
• Real data from 40 patients were obtained. The dataset
has 280 rows and 7 columns, i.e. 1960 individual
values.
THE WEKA ENVIRONMENT
• WEKA implements various machine learning
classification techniques and algorithms for regression
and clustering, along with a number of visualization
tools, and has been accepted as a powerful and
adequate environment for data mining.
• All data analyzed and mined with the aid of WEKA are
saved in the ARFF file format, which uses special tags
to designate the attributes, values, and names of the
data.
• All of the chosen parameters (blood test parameters)
were numerical values, and the doctor's therapy-change
decision was stored in a simple yes/no format.
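As a rough illustration of the ARFF layout described above, the following Python sketch writes a tiny file with six numeric attributes and a yes/no class. The relation name, the attribute names and the two data rows are placeholders invented for this example; they are not values from the study.

```python
# Sketch: build the text of a minimal ARFF file.
# All names and rows below are illustrative placeholders, not study data.
ATTRIBUTES = ["hct", "wbc", "psa_free", "psa_total", "psa_ratio", "pap"]


def build_arff(relation, rows):
    """Return ARFF text: header tags first, then comma-separated data rows."""
    lines = [f"@relation {relation}", ""]
    for name in ATTRIBUTES:
        lines.append(f"@attribute {name} numeric")
    # Nominal class attribute: the doctor's therapy-change decision.
    lines.append("@attribute therapy_change {yes,no}")
    lines += ["", "@data"]
    for values, decision in rows:
        lines.append(",".join(str(v) for v in values) + "," + decision)
    return "\n".join(lines) + "\n"


placeholder_rows = [
    ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "no"),
    ([6.0, 5.0, 4.0, 3.0, 2.0, 1.0], "yes"),
]
arff_text = build_arff("therapy_change_q1", placeholder_rows)
print(arff_text)
```

The `@relation`, `@attribute` and `@data` tags are the "special tags" of the format; everything after `@data` is one instance per line.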
TECHNIQUES
• Decision Trees – J48
• Neural Network (Multilayer Perceptron –
MLP)
• Naïve Bayes
• Radial Basis Function (RBF)
• K-Nearest Neighbor (IBk)
TECHNIQUES (2)
• Decision Trees – J48
A decision tree represents a mapping of the given attributes
and consists of nodes that link to two or more sub-trees. Each
node computes an outcome based on an attribute value of the
instance, and each possible outcome is linked to one of the
sub-trees. J48, WEKA's implementation of the C4.5 algorithm,
is an efficient method for estimation and classification of
fuzzy data.
TECHNIQUES (3)
• Neural Network (Multilayer Perceptron – MLP)
An adaptive system that changes its structure based
on external or internal information flowing through
the network during an initial learning phase. In more
practical terms, a neural network (NN) is a non-linear
statistical data modeling tool. It can be used to model
complex relationships between inputs and outputs or to
find patterns in data.
An MLP trained with the back-propagation algorithm was
applied to categorize the practitioner's decision
(therapy change), using two nodes for the decision
(no = 0, yes = 1).
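To make the idea of information flowing through the network concrete, here is a minimal sketch of a forward pass through a one-hidden-layer perceptron with sigmoid units. The layer sizes and all weights are hypothetical and untrained, and the back-propagation training step is deliberately omitted.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer -> output layer (one score per class)."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Hypothetical weights: 2 inputs -> 2 hidden units -> 2 outputs (no, yes).
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
out_w = [[1.0, -1.0], [-1.0, 1.0]]
out_b = [0.0, 0.0]

scores = mlp_forward([0.0, 0.0], hidden_w, hidden_b, out_w, out_b)
print(scores)
```

In a trained network the class with the larger output score would be taken as the prediction.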
TECHNIQUES (4)
• Naïve Bayes
A simple form of the Bayesian classifier that produces
probabilistic rules and has received noteworthy
attention for classification purposes. Classification is
performed by applying the well-known Bayes rule under
the assumption that the attributes are independent
given the class, and computing the probability of each
class value. Although the model is straightforward, it
gives quite promising results on many real-world
datasets.
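The description above can be sketched in a few lines of Python. This is a minimal Gaussian Naïve Bayes under the assumption that each numeric attribute is normally distributed within a class; the toy rows and labels are invented for illustration and it is not the WEKA implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(var, 1e-9)))  # floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior (Bayes' rule)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Independence assumption: per-attribute log likelihoods simply add up.
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data with two numeric attributes.
toy_rows = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.2], [3.2, 2.9]]
toy_labels = ["no", "no", "yes", "yes"]
model = fit_gaussian_nb(toy_rows, toy_labels)
print(predict(model, [0.9, 1.1]), predict(model, [3.1, 3.0]))
```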
TECHNIQUES (5)
• Radial Basis Function (RBF)
Initially introduced to address a variety of problems
(pattern recognition, clustering, function
approximation, etc.), the RBF network is now
acknowledged to be one of the most important NN
models for classification. It is based on a two-layer
feed-forward model with a hidden layer between the
input and output sets. A Gaussian function is preferred
as the basis function for classification, and a key factor
for a successful implementation is finding suitable
centers.
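A small sketch of the Gaussian basis function and of the weighted sum formed by the output layer may help here. The centers, width and weights are hypothetical; in practice the centers would be chosen from the data, for example by clustering.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian basis function: exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_output(x, centers, width, weights, bias=0.0):
    """Linear output layer: weighted sum of the hidden-unit activations."""
    return bias + sum(
        w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers)
    )


# Hypothetical centers and output weights for a two-attribute input.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [1.0, -1.0]

print(gaussian_rbf([0.0, 0.0], centers[0], 1.0))  # activation at the center
print(rbf_output([0.0, 0.0], centers, 1.0, weights))
```

Each hidden unit responds most strongly to inputs near its center, which is why the choice of centers matters so much.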
TECHNIQUES (6)
• K-Nearest Neighbor (IBk)
One of the simplest classification algorithms.
Instance-based methods are described as statistical
learning algorithms and are built simply by storing the
given data. A distance metric is chosen, and any new
data item is compared against the already "memorized"
data items for classification. The new item is assigned
to the class that is most common among its k nearest
neighbors. IBk is an implementation of k-nearest
neighbor in which the number of nearest neighbors (k)
can be set manually or determined automatically
using cross-validation.
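The procedure described above can be sketched as follows, assuming Euclidean distance and a simple majority vote. The function name and the toy data are invented for illustration; this is not the IBk code.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored items."""
    # "Training" is just keeping the data; all work happens at query time.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented toy data with two numeric attributes.
toy_rows = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
toy_labels = ["no", "no", "no", "yes", "yes"]

print(knn_predict(toy_rows, toy_labels, [0.2, 0.2], k=3))
print(knn_predict(toy_rows, toy_labels, [5.0, 5.5], k=3))
```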
EXPERIMENT AND
RESULT
DATA SOURCE
Blood tests are used to collect the data from patients.
Six parameters are measured:
1. Hct (Hematocrit), the volume percentage (%) of red
blood cells in blood
2. WBC (White Blood Cells)
3. PAP (Prostatic Acid Phosphatase)
4. PSA Free (free Prostate-Specific Antigen)
5. PSA Total
6. PSAf/PSAt
The percentage of PSA in the free or complexed isoforms
was used to predict the patient's state over a period of
2 years.
BLOOD TEST PARAMETERS
Blood tests are used to collect health information
from each patient diagnosed with prostate cancer.
TABLE 1.
BLOOD TEST PARAMETERS AND THEIR CRITICAL VALUES
Blood Test Parameter          Critical Value
HCT                           >28%
WBC                           >4000/mL
PSA Free                      0.03 ng/dl
PSA Total                     0.05 ng/dl
PSAf/PSAt                     >0.2
Prostatic Acid Phosphatase    <3.5 ng/ml
The difference between each parameter and its critical value in every quarter
is used, along with the patient's history and previous blood test results, to
decide on a potential change of therapy plan.
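The difference features can be sketched as a simple subtraction against the critical values in Table 1. The sign convention (measured minus critical) and the dictionary keys are assumptions made for this example, and the measured values are placeholders rather than patient data.

```python
# Critical values taken from Table 1 (units as given there).
CRITICAL_VALUES = {
    "HCT": 28.0,
    "WBC": 4000.0,
    "PSA Free": 0.03,
    "PSA Total": 0.05,
    "PSAf/PSAt": 0.2,
    "PAP": 3.5,
}


def differences(measured):
    """Return measured minus critical for every parameter (assumed convention)."""
    return {
        name: measured[name] - critical
        for name, critical in CRITICAL_VALUES.items()
    }


# Placeholder measurement: equal to the critical values except for HCT.
placeholder = dict(CRITICAL_VALUES)
placeholder["HCT"] = 30.0

diffs = differences(placeholder)
print(diffs)
```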
CLASSIFICATION PROCESS OF THE TARGET
PARAMETER (THERAPY CHANGE)
These difference values are provided to the WEKA
toolbox for classification of the target parameter
(therapy change).
Five machine learning algorithms are used to obtain the
results.
1. J48
2. MLP
3. Naïve Bayes
4. RBF
5. IBk
CLASSIFICATION RESULTS FOR EACH
EXAMINED ALGORITHM FOR QUARTER 1
Simulation Results for Quarter 1
WEKA Technique   Correctly Classified   Incorrectly Classified   Time taken (sec)   Kappa statistic
J48              85% (34)               15% (6)                  0.03               0.4146
MLP              85% (34)               15% (6)                  0.13               0.4146
Naïve Bayes      90% (36)               10% (4)                  0.01               0.6098
RBF              90% (36)               10% (4)                  0.11               0.6098
IBk              82.5% (33)             17.5% (7)                0.01               0.2708
The table above summarizes the accuracy of each machine learning algorithm for
all 40 patients, along with the time taken and the Kappa statistic for each
algorithm.
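For readers unfamiliar with the Kappa statistic, the sketch below computes Cohen's kappa from a 2x2 confusion matrix. The slides do not report confusion matrices, so the counts used here are hypothetical: they are one arrangement of 40 patients that is consistent with 36 correct classifications and happens to reproduce the reported value of 0.6098.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa: agreement beyond what class frequencies alone would give."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Chance agreement from the actual and predicted class totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts (not from the study): 36 of 40 correct.
kappa = cohen_kappa(tp=4, fn=1, fp=3, tn=32)
print(round(kappa, 4))
```

Kappa corrects accuracy for agreement expected by chance, which matters when one class (here, no therapy change) dominates.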
TRAINING AND SIMULATION
ERROR FOR QUARTER 1
Simulation Results for Quarter 1
WEKA Technique   Mean Absolute Error   Root Mean Squared Error   Relative Absolute Error (%)   Root Relative Squared Error (%)
J48              0.1737                0.3638                    57.409                        94.407
MLP              0.1899                0.3651                    62.735                        95.039
Naïve Bayes      0.1014                0.3163                    33.494                        81.334
RBF              0.1423                0.3127                    47.007                        81.406
IBk              0.1921                0.408                     63.478                        106.212
The table above summarizes the different error rates for each algorithm.
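The four error measures in the table can be sketched with their standard definitions for numeric predictions. This is an illustration only: the baseline here is assumed to be "always predict the mean of the actual values", the toy numbers are invented, and WEKA's own computation on class probability estimates may differ in detail.

```python
import math


def error_rates(actual, predicted):
    """Return MAE, RMSE and the two relative errors (in percent)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean.
    base_abs = sum(abs(mean_actual - a) for a in actual)
    base_sq = sum((mean_actual - a) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Invented toy example: 0 = no, 1 = yes, predictions are probabilities of yes.
rates = error_rates([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(rates)
```

Under these definitions a relative error above 100% means the model did worse than the baseline predictor.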
DISCUSSION
&
CONCLUSIONS AND FUTURE WORK
DISCUSSION
Based on the results obtained for the 1st quarter of the therapy
plan for all patients examined, several useful conclusions
can be drawn about the performance and error rates
of the chosen algorithms.
1. The Naïve Bayes and RBF Network algorithms achieve a
relatively high accuracy rate (90%) with a Kappa
score of 0.6098.
2. Of the two, Naïve Bayes is much faster, taking
only 0.01 seconds compared with 0.11 seconds for
RBF.
3. The IBk algorithm has the worst accuracy (82.5%) and a low
Kappa statistic of 0.2708, although it takes only 0.01 s.
4. Naïve Bayes has the lowest mean absolute, relative
absolute and root relative squared error rates, which
indicates stronger classification capabilities.
5. This study therefore identifies Naïve Bayes and RBF as
the best algorithms for Q1.
6. IBk is the algorithm with the highest error rates.
DISCUSSION
The performance of J48 is also worth mentioning, with an
accuracy rate of 85% and the decision tree visualization
derived from running the algorithm for Q1. This decision
tree is given in Figure 1. For any given patient:
1. If the difference of PSA free is greater than 2.07,
then there is definitely a need for a therapy
change.
2. If not, the ratioPSA (i.e. PSAfree/PSAtotal) is
examined: if it is greater than 0.15 (0.17 for a
physician), then there may be no need to change therapy.
3. If ratioPSA is lower than 0.15, then the last
parameter the algorithm takes into consideration is
Prostatic Acid Phosphatase, with the decision made
according to a difference of 2.3 ng/ml.
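The three rules above can be written as nested conditions. The function and argument names are hypothetical. The slides do not say which side of the 2.3 ng/ml PAP threshold leads to a therapy change, nor what happens when ratioPSA is exactly 0.15, so the sketch leaves those outcomes open instead of guessing.

```python
def q1_tree_decision(psa_free_diff, ratio_psa):
    """Follow the Q1 decision tree rules as stated on the slide."""
    if psa_free_diff > 2.07:
        return "change therapy"
    if ratio_psa > 0.15:
        return "change probably not needed"
    # Last split: Prostatic Acid Phosphatase difference against 2.3 ng/ml.
    # The direction of this split is not given in the slides.
    return "decide on PAP difference relative to 2.3 ng/ml"


print(q1_tree_decision(psa_free_diff=3.0, ratio_psa=0.10))
print(q1_tree_decision(psa_free_diff=1.0, ratio_psa=0.20))
print(q1_tree_decision(psa_free_diff=1.0, ratio_psa=0.10))
```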
DISCUSSION (2)
• Regarding classification error (Table III), Naïve Bayes
has the lowest mean absolute, relative absolute and
root relative squared error rates, indicating strong
classification capabilities.
• The algorithm with the highest error rates is clearly
IBk.
DISCUSSION
• A few observations can be made from Table IV. The most
important is that a physician rarely changes the therapy of
many patients within a single quarter; in several quarters
(Q2, Q4, Q6 and Q7 in Table IV) the therapy plan was changed
for only one or two patients.
• A closer look at the measured parameter values, together
with a discussion of this finding with the physicians
(doctors in Urology), showed that these cases were
considered 'problematic' in terms of measured values.
Specifically, these patients were not responding to the
treatment given, and their measured blood parameters were
constantly extremely high or very low.
• Mean classification accuracy for each algorithm, over all
quarters examined, was 92% for J48, 89% for MLP, 86% for
Naïve Bayes, 95% for RBF Network and 92% for IBk.
CONCLUSIONS AND FUTURE WORK
• This study presented a comparison of five machine learning
algorithms on real medical data.
• Useful results were obtained concerning the performance and
error rates of the algorithms.
• The experiments showed that, for the prostate cancer data
given, the best algorithm is the RBF Network technique.
• The RBF algorithm performed quite well in terms of classification
accuracy and Kappa score, and gave relatively low error rates
for the Q1 results presented.
• One way of improving the results is to propose a new hybrid
algorithm, one that combines the difference between the
measured values and the critical values with the difference
in the measured values between two subsequent quarters.
• As future work, more clinical cases have to be evaluated to
justify these results as more become available from the
Dept. of Urology.
REFERENCES
• Nils J. Nilsson (1999) Introduction to Machine Learning. California, United States of America
• Jemal A et al: Cancer statistics, 2005. CA Cancer J Clin 2005; 55(1):10–30.
• Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009); The
WEKA Data Mining Software: An Update; SIGKD Explorations, Volume 11, Issue 1
• Quinlan, J.R. (1990). Decision trees and decision making. IEEE Trans System, Man and Cybernetics 20, (2),
339–346.
• Duda et al. Pattern classification, John Wiley & Sons, 2001
• Langley, P., Iba, W. Thompson, K. 1992 , An analysis of Bayesian classifiers, in Proceedings of the tenth
national conference on artificial intelligence, AAAI Press and MIT Press, pp. 223--228.
• A. G. Bors, "Introduction of the Radial Basis Function (RBF) Networks," Online Symposium for Electronics
Engineers, issue 1, vol. 1, DSP Algorithms: Multimedia, http://www.osee.net/, Feb. 13 2001, pp. 1-7.
• Aha, D. (1992) “Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms.”
International Journal of Man-Machine Studies, Vol. 36, 267–287.
• Quinlan, J.R. (1993). C4.5: Programs for machine learning, Morgan Kaufmann, San Mateo, CA, USA.
• Jang, J.,-S., R., Sun, C., T., Mizutani, E ,: Neuro-fuzzy and soft computing: a computational approach to
learning and machine intelligence, Prentice Hall, Upper Saddle River, NJ, 1997.
• Bishop, C.M. (1996). Neural networks for pattern recognition, First edition, Oxford University Press, USA
• Bors, A.G., Pitas, I., (1996) “Median radial basis functions neural network,” IEEE Trans. on Neural Networks,
vol. 7, no. 6, pp. 1351-1364.
Final presentation dwi riyono

More Related Content

What's hot

Classification of Health Care Data Using Machine Learning Technique
Classification of Health Care Data Using Machine Learning TechniqueClassification of Health Care Data Using Machine Learning Technique
Classification of Health Care Data Using Machine Learning Techniqueinventionjournals
 
Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...IJECEIAES
 
Estimating survival data from published Kaplan-Meier curves: A comparison of ...
Estimating survival data from published Kaplan-Meier curves:A comparison of ...Estimating survival data from published Kaplan-Meier curves:A comparison of ...
Estimating survival data from published Kaplan-Meier curves: A comparison of ...York Health Economics Consortium (YHEC)
 
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...IRJET Journal
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...Vahid Taslimitehrani
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarAnn-Marie Roche
 
PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...
PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...
PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...ijcsa
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
 
Accuracy, Sensitivity and Specificity Measurement of Various Classification T...
Accuracy, Sensitivity and Specificity Measurement of Various Classification T...Accuracy, Sensitivity and Specificity Measurement of Various Classification T...
Accuracy, Sensitivity and Specificity Measurement of Various Classification T...IOSR Journals
 
Efficiency of Prediction Algorithms for Mining Biological Databases
Efficiency of Prediction Algorithms for Mining Biological  DatabasesEfficiency of Prediction Algorithms for Mining Biological  Databases
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
 
Boost model accuracy of imbalanced covid 19 mortality prediction
Boost model accuracy of imbalanced covid 19 mortality predictionBoost model accuracy of imbalanced covid 19 mortality prediction
Boost model accuracy of imbalanced covid 19 mortality predictionBindhuBhargaviTalasi
 
A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).Waqas Tariq
 
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISAN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISijcsit
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesDmitry Grapov
 
An Artificial Neural Network Model for Neonatal Disease Diagnosis
An Artificial Neural Network Model for Neonatal Disease DiagnosisAn Artificial Neural Network Model for Neonatal Disease Diagnosis
An Artificial Neural Network Model for Neonatal Disease DiagnosisWaqas Tariq
 
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET Journal
 
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...rahulmonikasharma
 

What's hot (18)

Classification of Health Care Data Using Machine Learning Technique
Classification of Health Care Data Using Machine Learning TechniqueClassification of Health Care Data Using Machine Learning Technique
Classification of Health Care Data Using Machine Learning Technique
 
Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...Classification of medical datasets using back propagation neural network powe...
Classification of medical datasets using back propagation neural network powe...
 
Estimating survival data from published Kaplan-Meier curves: A comparison of ...
Estimating survival data from published Kaplan-Meier curves:A comparison of ...Estimating survival data from published Kaplan-Meier curves:A comparison of ...
Estimating survival data from published Kaplan-Meier curves: A comparison of ...
 
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw...
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
 
0 introduction
0  introduction0  introduction
0 introduction
 
PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...
PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...
PERFORMANCE ANALYSIS OF MULTICLASS SUPPORT VECTOR MACHINE CLASSIFICATION FOR ...
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
 
Accuracy, Sensitivity and Specificity Measurement of Various Classification T...
Accuracy, Sensitivity and Specificity Measurement of Various Classification T...Accuracy, Sensitivity and Specificity Measurement of Various Classification T...
Accuracy, Sensitivity and Specificity Measurement of Various Classification T...
 
Efficiency of Prediction Algorithms for Mining Biological Databases
Efficiency of Prediction Algorithms for Mining Biological  DatabasesEfficiency of Prediction Algorithms for Mining Biological  Databases
Efficiency of Prediction Algorithms for Mining Biological Databases
 
Boost model accuracy of imbalanced covid 19 mortality prediction
Boost model accuracy of imbalanced covid 19 mortality predictionBoost model accuracy of imbalanced covid 19 mortality prediction
Boost model accuracy of imbalanced covid 19 mortality prediction
 
A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).
 
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISAN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological Studies
 
An Artificial Neural Network Model for Neonatal Disease Diagnosis
An Artificial Neural Network Model for Neonatal Disease DiagnosisAn Artificial Neural Network Model for Neonatal Disease Diagnosis
An Artificial Neural Network Model for Neonatal Disease Diagnosis
 
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET-  	  Classification of Chemical Medicine or Drug using K Nearest Neighb...
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
 
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
 

Viewers also liked

LatentView Overview
LatentView OverviewLatentView Overview
LatentView OverviewLatentView
 
Lecture 4: The Weka Package
Lecture 4: The Weka PackageLecture 4: The Weka Package
Lecture 4: The Weka PackageMarina Santini
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...Rogue Wave Software
 
Analytics machine learning in weka
Analytics machine learning in wekaAnalytics machine learning in weka
Analytics machine learning in wekaSudhakar Chavan
 
Incremental Learning using WEKA
Incremental Learning using WEKAIncremental Learning using WEKA
Incremental Learning using WEKAvrohit13
 
台科逆向簡報
台科逆向簡報台科逆向簡報
台科逆向簡報耀德 蔡
 

Viewers also liked (8)

LatentView Overview
LatentView OverviewLatentView Overview
LatentView Overview
 
Lecture 4: The Weka Package
Lecture 4: The Weka PackageLecture 4: The Weka Package
Lecture 4: The Weka Package
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 
Analytics machine learning in weka
Analytics machine learning in wekaAnalytics machine learning in weka
Analytics machine learning in weka
 
weka data mining
weka data mining weka data mining
weka data mining
 
Incremental Learning using WEKA
Incremental Learning using WEKAIncremental Learning using WEKA
Incremental Learning using WEKA
 
台科逆向簡報
台科逆向簡報台科逆向簡報
台科逆向簡報
 
Wekatutorial
WekatutorialWekatutorial
Wekatutorial
 

Similar to Final presentation dwi riyono

Classification of Heart Diseases Patients using Data Mining Techniques
Classification of Heart Diseases Patients using Data Mining TechniquesClassification of Heart Diseases Patients using Data Mining Techniques
Classification of Heart Diseases Patients using Data Mining TechniquesLovely Professional University
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 InternshipTaylor Martell
 
Automatic Sleep Staging Using State Machine-controlled Decision Trees
Automatic Sleep Staging Using State Machine-controlled Decision TreesAutomatic Sleep Staging Using State Machine-controlled Decision Trees
Automatic Sleep Staging Using State Machine-controlled Decision TreesAnas Imtiaz
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...IJDKP
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...IJDKP
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...IJDKP
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшаваValeriya Simeonova
 
Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Alexander Decker
 
Too good to be true? How validate your data
Too good to be true? How validate your dataToo good to be true? How validate your data
Too good to be true? How validate your dataAlex Henderson
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsValery Tkachenko
 
Enhanced Detection System for Trust Aware P2P Communication Networks
Enhanced Detection System for Trust Aware P2P Communication NetworksEnhanced Detection System for Trust Aware P2P Communication Networks
Enhanced Detection System for Trust Aware P2P Communication NetworksEditor IJCATR
 
Comparative Study of Diabetic Patient Data’s Using Classification Algorithm i...
Comparative Study of Diabetic Patient Data’s Using Classification Algorithm i...Comparative Study of Diabetic Patient Data’s Using Classification Algorithm i...
Comparative Study of Diabetic Patient Data’s Using Classification Algorithm i...Editor IJCATR
 
C omparative S tudy of D iabetic P atient D ata’s U sing C lassification A lg...
C omparative S tudy of D iabetic P atient D ata’s U sing C lassification A lg...C omparative S tudy of D iabetic P atient D ata’s U sing C lassification A lg...
C omparative S tudy of D iabetic P atient D ata’s U sing C lassification A lg...Editor IJCATR
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
Diagnosing Chronic Kidney Disease using Machine Learning
Diagnosing Chronic Kidney Disease using Machine LearningDiagnosing Chronic Kidney Disease using Machine Learning
Diagnosing Chronic Kidney Disease using Machine LearningIRJET Journal
 

Final presentation dwi riyono

  • 1. COMPARISON OF MACHINE LEARNING TECHNIQUES USING WEKA ENVIRONMENT
      2011 20th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises
      Dwi Riyono D10207801, Dani Pranata M10307804, Nurul Retno Nurwulan D10301807
  • 2. INTRODUCTION
      • Machine learning can assist in reaching a correct diagnosis and in deciding on further treatment or a potential therapy change for a specific patient.
      • The medical data provided were used to evaluate the performance of several classification techniques.
      • This study analyzed and evaluated the therapy-change decision that a doctor makes when a number of blood test parameters, mainly Prostate Specific Antigen (PSA), are measured every 3 months.
      • Prostate cancer is the most common non-cutaneous cancer and the second-leading cause of cancer death in men in the USA. It is prevalent in many countries and exhibits a wide spectrum of aggressiveness.
  • 3. MATERIAL AND METHODS
      • Medical Problem Description and Data
      • The WEKA Environment
      • Techniques
  • 4. MEDICAL PROBLEM DESCRIPTION AND DATA
      • Advising an effective treatment is a challenge for the physician who treats patients with prostate cancer.
      • Selecting an appropriate treatment requires assessing the tumor’s potential aggressiveness and the patient’s general health, life expectancy, and quality-of-life preferences.
      • Parameters chosen: Hematocrit (HCT), White Blood Cells (WBC), free Prostate Specific Antigen (PSA free), total Prostate Specific Antigen (PSA total), PSA ratio (PSAfree/PSAtotal), Prostatic Acid Phosphatase (PAP), and the potential therapy change decision (yes/no).
      • Real data from 40 patients were obtained. There were 1960 unique instances, arranged as 280 rows and 7 columns.
  • 5. THE WEKA ENVIRONMENT
      • WEKA implements a range of machine learning classification techniques and algorithms for regression and clustering, along with a number of visualization tools. It has been accepted as a powerful and adequate environment for data mining.
      • All data analyzed and mined with the aid of WEKA are saved in the ARFF file format, which uses special tags to distinguish the attributes, values, and names of the data.
      • All of the chosen parameters (the blood test parameters) were numerical values, and the doctor’s therapy change decision was recorded in a simple yes/no format.
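The ARFF layout described on this slide can be sketched in a few lines of Python. This is only an illustration: the attribute names, the relation name, and the data row are assumptions, since the slides do not show the actual ARFF header or any patient records. The tags themselves (`@relation`, `@attribute`, `@data`) are the standard ARFF tags.

```python
# Hypothetical attribute names; the slides do not give the real ARFF header.
NUMERIC_ATTRIBUTES = ["hct", "wbc", "psa_free", "psa_total", "psa_ratio", "pap"]


def to_arff(relation, rows):
    """Render rows of (six numeric values..., 'yes' or 'no') as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name in NUMERIC_ATTRIBUTES:
        lines.append(f"@attribute {name} numeric")
    # The class attribute is nominal, matching the yes/no decision.
    lines.append("@attribute therapy_change {yes,no}")
    lines += ["", "@data"]
    for *values, decision in rows:
        if len(values) != len(NUMERIC_ATTRIBUTES):
            raise ValueError("each row needs six numeric values plus a decision")
        if decision not in ("yes", "no"):
            raise ValueError("decision must be 'yes' or 'no'")
        lines.append(",".join(str(v) for v in values) + "," + decision)
    return "\n".join(lines) + "\n"


# Made-up example row, for illustration only (not patient data).
example = to_arff("therapy_change_q1", [(30.0, 5000, 0.5, 2.0, 0.25, 1.2, "no")])
print(example)
```

Writing the class as a nominal attribute with the two values `yes` and `no` is what lets a classifier treat therapy change as the target.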
  • 6. TECHNIQUES
      • Decision Trees – J48
      • Neural Network (Multilayer Perceptron – MLP)
      • Naïve Bayes
      • Radial Basis Function (RBF)
      • K-Nearest Neighbor (IBk)
  • 7. TECHNIQUES (2)
      • Decision Trees – J48
        A decision tree represents a mapping of the given attributes and consists of nodes, each of which links to two or more sub-trees. A node computes an outcome from the attribute value of the instance, and each possible outcome is linked to one of the sub-trees. The J48 algorithm is an efficient method for estimation and classification of fuzzy data.
  • 8. TECHNIQUES (3)
      • Neural Network (Multilayer Perceptron – MLP)
        A neural network (NN) is an adaptive system that changes its structure based on external or internal information that flows through the network during an initial learning phase. In more practical terms, an NN is a non-linear statistical data modeling tool. It can be used to model complex relationships between inputs and outputs or to find patterns in data. An MLP trained with the back propagation algorithm was applied to categorize the practitioner’s decision (therapy change), using two input nodes (no = 0, yes = 1).
  • 9. TECHNIQUES (4)
      • Naïve Bayes
        A form of Bayesian classifier that produces probabilistic rules and has received noteworthy attention for classification purposes. Classification is performed by applying the well-known Bayes rule, with each attribute of the model treated as independent given the class variable, and computing the probability of each class. Although the model is straightforward, it gives quite promising results on many real-world datasets.
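The Bayes-rule computation on this slide can be sketched as follows. This is not WEKA's NaiveBayes implementation; it is a minimal version that assumes each numeric attribute follows a normal distribution within each class, and the toy data are made up.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance positive for constant attributes.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Independence assumption: per-attribute log likelihoods simply add up.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Made-up two-attribute toy data, not patient data.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.6]]
y = ["no", "no", "no", "yes", "yes", "yes"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))
```

Working in log space avoids underflow when many small probabilities are multiplied.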
  • 10. TECHNIQUES (5)
      • Radial Basis Function (RBF)
        RBF networks were initially introduced to address a variety of problems (pattern recognition, clustering, function approximation, etc.) and are now acknowledged as one of the most important NN models for classification. The basic model is a two-layer feed-forward network with a hidden layer between the inputs and the outputs. A Gaussian function is the preferred basis function for classification, and a key factor for a successful implementation is finding suitable centers.
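A minimal sketch of the forward pass of such a network is given below, assuming Gaussian hidden units and a linear output layer. The centers, widths, and weights are hand-picked, hypothetical values; the slides do not say how they were chosen in the study.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian basis function exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_network_output(x, centers, widths, weights, bias=0.0):
    """Hidden layer of Gaussian units followed by a linear output layer."""
    hidden = [gaussian_rbf(x, c, w) for c, w in zip(centers, widths)]
    return bias + sum(wt * h for wt, h in zip(weights, hidden))


# Hypothetical centers, widths, and weights, chosen by hand for illustration.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [-1.0, 1.0]
score = rbf_network_output([1.0, 1.0], centers, widths, weights)
print(score)
```

Each hidden unit responds most strongly to inputs near its own center, which is why the choice of centers matters so much.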
  • 11. TECHNIQUES (6)
      • K-Nearest Neighbor (IBk)
        One of the simplest classification algorithms. It is described as a statistical learning algorithm, and the model is generated by simply storing the given data. A distance metric is chosen, and any new data item is compared against the already “memorized” data items for classification. The new item is assigned to the class that is most common among its k nearest neighbors. IBk is an implementation of k-nearest neighbor in which the number of nearest neighbors (k) can be set manually or determined automatically using cross-validation.
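The store-and-compare procedure described here fits in a few lines. The sketch below assumes Euclidean distance and a fixed k; it is not the IBk code, and the training points are made up.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest stored items."""
    # "Training" is just keeping train_X and train_y; all work happens here.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data, not patient data.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = ["no", "no", "no", "yes", "yes", "yes"]
print(knn_predict(train_X, train_y, [0.15, 0.1], k=3))
```

An odd k avoids tied votes when there are only two classes.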
  • 13. DATA SOURCE
      Blood tests are used to collect the data from patients. Six parameters are measured:
      1. Hct (Hematocrit), the volume percentage (%) of red blood cells in blood
      2. WBC (White Blood Cells)
      3. PAP (Prostatic Acid Phosphatase)
      4. PSA Free (Prostate-Specific Antigen)
      5. PSA Total
      6. PSAf/PSAt
      The percentage of PSA in the free or complex isoforms was used to predict the patient’s state over a period of 2 years.
  • 14. BLOOD TEST PARAMETERS
      Blood tests are used to collect health information from each patient diagnosed with prostate cancer. Six parameters are measured:
      1. Hct (Hematocrit), the volume percentage (%) of red blood cells in blood
      2. WBC (White Blood Cells)
      3. PAP (Prostatic Acid Phosphatase)
      4. PSA Free (Prostate-Specific Antigen)
      5. PSA Total
      6. PSAf/PSAt
      The percentage of PSA in the free or complex isoforms was used to predict the patient’s state over a period of 2 years.
  • 15. TABLE 1. BLOOD TEST PARAMETERS AND THEIR CRITICAL VALUES

      Blood Test Parameter          Critical Value
      HCT                           >28%
      WBC                           >4000/mL
      PSA Free                      0.03 ng/dl
      PSA Total                     0.05 ng/dl
      PSAf/PSAt                     >0.2
      Prostatic Acid Phosphatase    <3.5 ng/ml

      The difference between each parameter and its critical value in every quarter is used, along with the patient’s history and previous blood test results, to decide on a potential change of therapy plan.
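The difference features described under Table 1 could be computed as sketched below. The critical values are copied from the table; the dictionary keys, the sign convention (measured minus critical), and the sample measurement are assumptions made for illustration.

```python
# Critical values as listed in Table 1 (units as printed on the slide).
CRITICAL_VALUES = {
    "hct": 28.0,        # %
    "wbc": 4000.0,      # per mL
    "psa_free": 0.03,   # ng/dl
    "psa_total": 0.05,  # ng/dl
    "psa_ratio": 0.2,   # PSAf/PSAt, unitless
    "pap": 3.5,         # ng/ml
}


def differences_from_critical(measured):
    """Return measured minus critical for every Table 1 parameter.

    The sign convention is an assumption; the slides only say that the
    difference from the critical value is used.
    """
    missing = set(CRITICAL_VALUES) - set(measured)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    return {
        name: measured[name] - critical
        for name, critical in CRITICAL_VALUES.items()
    }


# Made-up measurement for one quarter, for illustration only.
sample = {"hct": 30.0, "wbc": 5000.0, "psa_free": 0.53,
          "psa_total": 2.05, "psa_ratio": 0.25, "pap": 1.5}
diffs = differences_from_critical(sample)
print(diffs)
```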
  • 16. CLASSIFICATION PROCESS OF THE TARGET PARAMETER (THERAPY CHANGE)
      These difference values are provided to the WEKA toolbox for classification of the target parameter (therapy change). Five machine learning algorithms are used to obtain the results:
      1. J48
      2. MLP
      3. Naïve Bayes
      4. RBF
      5. IBk
  • 17. CLASSIFICATION RESULTS FOR EACH EXAMINED ALGORITHM FOR QUARTER 1

      WEKA simulation results for Quarter 1

      Technique     Correctly Classified   Incorrectly Classified   Time taken (sec)   Kappa statistic
      J48           85% (34)               15% (6)                  0.03               0.4146
      MLP           85% (34)               15% (6)                  0.13               0.4146
      Naïve Bayes   90% (36)               10% (4)                  0.01               0.6098
      RBF           90% (36)               10% (4)                  0.11               0.6098
      IBk           82.5% (33)             17.5% (7)                0.01               0.2708

      The table above summarizes the accuracy of each machine learning algorithm over all 40 patients, along with the time taken and the Kappa statistic for each algorithm.
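The Kappa statistic in this table measures agreement beyond what chance would produce, and it can be computed from a confusion matrix as sketched below. The slides do not report any confusion matrix. The counts used here are hypothetical, picked only because they reproduce the 90% accuracy and Kappa of 0.6098 reported for Naïve Bayes and RBF.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows actual, columns predicted)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts (not from the slides): rows are actual yes/no,
# columns are predicted yes/no, for 40 patients.
hypothetical = [[4, 3],
                [1, 32]]
kappa = cohen_kappa(hypothetical)
print(round(kappa, 4))
```

With most patients in the "no" class, a classifier can reach high accuracy by chance alone, which is why Kappa is noticeably lower than the accuracy figure.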
  • 18. TRAINING AND SIMULATION ERROR FOR QUARTER 1

      WEKA simulation results for Quarter 1

      Technique     Mean Absolute Error   Root Mean Squared Error   Relative Absolute Error (%)   Root Relative Squared Error (%)
      J48           0.1737                0.3638                    57.409                        94.407
      MLP           0.1899                0.3651                    62.735                        95.039
      Naïve Bayes   0.1014                0.3163                    33.494                        81.334
      RBF           0.1423                0.3127                    47.007                        81.406
      IBk           0.1921                0.408                     63.478                        106.212

      The table above is an overall synopsis based on different error rates.
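The four error measures in this table follow standard definitions, sketched below for predicted probabilities of the positive class. The relative measures here use the mean of the actual values as the baseline predictor, which is an assumption; WEKA's own computation may differ in detail. The numbers are made up.

```python
import math


def error_summary(actual, predicted):
    """Mean absolute, root mean squared, and relative error measures."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline: always predict the mean of the actual values (assumption).
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Made-up values: 1 means "yes", 0 means "no"; predictions are probabilities.
actual = [1, 0, 0, 0]
predicted = [0.8, 0.1, 0.2, 0.1]
summary = error_summary(actual, predicted)
print(summary)
```

A relative error above 100%, as in the IBk row for root relative squared error, means the classifier did worse than the baseline on that measure.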
  • 20. DISCUSSION
      Based on the results for the 1st quarter of the therapy plan for all patients examined, several useful conclusions can be drawn about the performance and error rates of the chosen algorithms.
      1. The Naïve Bayes and RBF Network algorithms achieve a relatively high accuracy rate (90%) with a Kappa score of 0.6098.
      2. Of the two, Naïve Bayes is much faster, taking only 0.01 seconds compared with the 0.11 seconds that RBF takes.
      3. The IBk algorithm has the worst accuracy (82.5%) and a small Kappa statistic of 0.2708, although it takes only 0.01 s.
      4. Naïve Bayes has the lowest mean absolute, relative absolute, and root relative squared error rates, which indicates more powerful classification capabilities.
      5. This study therefore identifies Naïve Bayes and RBF as the best algorithms.
      6. IBk is the algorithm with the highest error rates.
  • 22. DISCUSSION
      The performance of J48 is also worth mentioning, with an accuracy rate of 85% and the decision tree visualization derived from running the algorithm for Q1. This decision tree is given in Figure 1. For any given patient:
      1. If the difference in PSA free is greater than 2.07, then there is definitely a need for a therapy change.
      2. If not, the ratioPSA (i.e. PSAfree/PSAtotal) is examined; if it is greater than 0.15 (0.17 for a physician), then there may be no need to change therapy.
      3. If the ratioPSA is lower than 0.15, the last parameter the algorithm takes into consideration is Prostatic Acid Phosphatase, and the decision is made according to a difference of 2.3 ng/ml.
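The three rules above can be written as a small function, sketched here under stated assumptions. The thresholds 2.07, 0.15, and 2.3 come from the slide. The slide does not say which side of the 2.3 ng/ml PAP threshold leads to a therapy change, so the caller has to supply that direction through the hypothetical parameter `pap_above_means_change`. Treating a ratio of exactly 0.15 as falling through to the PAP test is also an assumption.

```python
def q1_tree_decision(psa_free_diff, psa_ratio, pap_diff, pap_above_means_change):
    """Return 'yes' or 'no' for therapy change, following the three Q1 rules."""
    # Rule 1: a large PSA free difference always means a therapy change.
    if psa_free_diff > 2.07:
        return "yes"
    # Rule 2: otherwise a high PSAfree/PSAtotal ratio suggests no change.
    if psa_ratio > 0.15:
        return "no"
    # Rule 3: fall back to the PAP difference against 2.3 ng/ml.
    # The direction of this split is not stated on the slide, so it is
    # passed in explicitly rather than guessed.
    above = pap_diff > 2.3
    return "yes" if above == pap_above_means_change else "no"


# Made-up inputs, for illustration only.
print(q1_tree_decision(3.0, 0.10, 0.0, pap_above_means_change=True))
print(q1_tree_decision(1.0, 0.20, 0.0, pap_above_means_change=True))
```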
  • 24. DISCUSSION (2)
      • Based on Table III, concerning classification error, Naïve Bayes has the lowest mean absolute, relative absolute, and root relative squared error rates, which indicates powerful classification capabilities.
      • The algorithm with the highest error rates, as can easily be seen, is IBk.
  • 27. DISCUSSION
      • A few observations can be made from Table IV. The most important is that a physician rarely changes the therapy of many patients within a single quarter; the therapy plan was changed for only one or two patients in several quarters, as shown in Table IV (Q2, Q4, Q6, and Q7).
      • After a closer look at the values of the measured parameters, and after discussing this finding with the physicians (doctors in Urology), it turned out that these cases were considered ‘problematic’ in terms of measured values. Specifically, these patients were not responding to the treatment given, and their measured blood parameters were constantly extremely high or very low.
      • The mean classification accuracy of each algorithm over all quarters examined was 92% for J48, 89% for MLP, 86% for Naïve Bayes, 95% for the RBF network, and 92% for IBk.
  • 29. CONCLUSIONS AND FUTURE WORK
      • This study presented a comparison of five machine learning algorithms on real medical data.
      • Useful results were obtained concerning the performance and error rates of the algorithms.
      • The experiments showed that the best algorithm for the given prostate cancer data is the RBF Network technique.
      • The RBF algorithm performed quite well in terms of classification accuracy and Kappa score, and gave relatively low error rates for the Q1 results presented.
      • One way of improving the results is to propose a new hybrid algorithm, one that uses both the difference between the measured values and the critical values and the difference in the measured values between two subsequent quarters.
      • As future work, more clinical cases have to be evaluated to justify these results, as more become available from the Dept. of Urology.

Editor's Notes

  1. PAP is an enzyme produced by the prostate. The highest levels of PAP are found in the bodies of prostate cancer patients. PSA is present in small quantities in the healthy prostate.