SlideShare a Scribd company logo
1 of 6
Download to read offline
TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 18, No. 3, June 2020, pp. 1433~1438
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v18i3.14837  1433
Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
Prediction schizophrenia using random forest
Zuherman Rustam, Glori Stephani Saragih
Department of Mathematics, Faculty of Mathematics and Science, Universitas Indonesia, Indonesia
Article Info ABSTRACT
Article history:
Received Aug 15, 2019
Revised Feb 12, 2020
Accepted Feb 21, 2020
Schizophrenia is a mental illness with a very bad impact on sufferers, attacking
the part of human brain that disables the ability to think clearly. In 2018,
Rustam and Rampisela classified Schizophrenia by using Northwestern
University Schizophrenia Data, based on 66 variables consisting of group,
demographic, and questionnaires statistics, based on the scale for the
assessment of negative symptoms (SANS), and scale for the assessment of
positive symptoms (SAS), and then classifiers that used are SVM with
Gaussian kernel and Twin SVM with linear and Gaussian kernel. Furthermore,
this research is novel based on the use of random forest as a classifier, in order
to predict Schizophrenia. The result obtained is reported in percentage of
accuracy, both in training and testing of random forest, which was 100%. This
classification, therefore, shows the best value in contrast with prior methods,
even though only 40% of training data set was used. This is very important,
especially in the cases of rare disease, including schizophrenia.
Keywords:
Classification
Machine learning
Random forest
Schizophrenia
This is an open access article under the CC BY-SA license.
Corresponding Author:
Zuherman Rustam,
Department of Mathematics,
Faculty of Mathematics and Science,
Universitas Indonesia,
Margonda Raya St., Pondok Cina, Beji, Depok, Jawa Barat 16424, Indonesia.
Email: rustam@ui.ac.id
1. INTRODUCTION
Schizophrenia is a mental illness that has a very bad impact on sufferers, based on its ability to
attack parts of the human brain, thus disabling the persons ability to think clearly [1]. Generally, patients
experience a change either behaviorally or on the mind, which subsequently affects reality. This disease
possesses the propensity to attack everyone at any age, the average attacks starting at the age of 20s in men,
while in women, it was observed at the end of 20s [1]. It is, therefore, important to proficiently identify
the symptoms on time, in order to prevent the disease from occuring in earnest. Schizophrenia is generally
divided with three symptoms, including 1) the positive, presented with unnecessitated extra brain activities,
including hallucinations, 2) the negative, indicated by the loss of brain activities, 3) the cognitive, which is
exhibited as challenges with ability to remember and think. These are severe facts that confer potential
negative effects on the life of sufferers, and their mortality rates are 2 to 2.5 times higher than the general
population [2], 10% commited suicide and 20-40% attempted suicide at least once [3]. Then, the cause of
schizophrenia has not been determined for sure [4] and this illness has been evaluated to get worse due to the
incompetence of early detection, hence, the need to identify other approaches for detection, and one of which
involves the use of computational methods, comprising of machine learning.
However, several papers have attempted utilizing this technique in the diagnosis of schizophrenia,
including SVM with Gaussian kernel, Twin SVM with linear and Gaussian kernel [5], linear discriminant
 ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 3, June 2020: 1433 - 1438
1434
analysis and k-Nearest neighbor [6], fisher linear discriminant analysis [7], Elastic Net, as well as least
absolute shrinkage and selection operator [8]. This current research involved the use of random forest as a
classify, although it has widely been used in various studies, including the prediction of bank financial
failures, with accuracy of 93% [9], diabetes mellitus at 80.8% [10], automated diagnosis of heart disease at
83.6%, applying the weighted random forest [11], classify prostate cancer [12], chronic kidney disease [13]
and osteoarthritis disease [14]. Hence, the selected approach has proven the capability of this classifier to
apply to any problem, exhibiting good model performance in the process. This research is organized as
follows: section 1 provides background of details, while the second specifies data and research method used.
In addition, result and analysis were discussed in section 3, and finally, conclusions are included
in section 4.
2. DATA AND RESEARCH METHOD
2.1. Data
Information from the database of Northwestern University Schizophrenia Data was used in this
study [1]. Furthermore, there were 392 observations divided into 4 groups, with distributions as seen in
Table 1. The study followed the grouping used by Rustam and Rampisela in the paper entitled “Support vector
machines and twin support vector machines in the classification of schizophrenia data” [5]. This was because of
the similarity in research focus, which was based on the categorization into schizophrenics and
non-schizophrenics only, which served as the group variable [2]. Non-schizophrenics group consists of healthy
siblings of the patients, the control, and siblings of control. A total of 66 data variables were collected, including
the group, and demographics, consisting of gender, dominant hand, race, ethnicity, and age, using
questionnaires statistics of scale for the assessment of negative symptoms (SANS) [15] and scale for
the assessment of negative symptoms (SAPS) [16], as shown in Table 2.
Table 1. Distribution of group
Group Number of observations
Schizophrenics 171
Siblings of Schizophrenics 44
Control
Siblings of Control
111
66
Table 2. The variable of schizophrenia data
𝑛th
Variable
Data Group Variable Description
1 - 34 Questionnaires
of SAPS
SAPS𝑖,
𝑖 = 1, … , 34
SAPS is used to evaluate the positive symptoms of schizophrenia, SAPS is
divided into 4 main sections containing 34 different symptoms, namely
hallucination, delusion, bizarre behavior and thought disorder [16]. The data is in
scale (0.5)
35 - 60 Questionnaires
of SANS
SANS𝑗,
𝑗
= 35, … ,60
SANS is used to evaluate the negative symptoms of schizophrenia, SANS is
divided into 5 main sections containing 25 different symptoms, namely emotional
reaction decline, alogia, avolition and apathy, anhedonia and asociality, and
attention [15]. The data is in scale (0.5)
61 Demographic Gender Gender is divided into 2 categories: 1) Male and 2) Female
62 Demographic Dominant
Hand
Dominant Hand is divided into 2 categories: 1) Left and 2) right
63 Demographic Race
64 Demographic Ethnicity Ethnicity is divided into 3 categories: 1) Kaukasia, 2) Africa-America, and
3) others
65 Demographic Age Integer (13.66)
66 Group Group Group is divided into 2 classes: 0) Non-Schizophrenics and 1) Schizophrenics
2.2. Research method
2.2.1. Bootstrap
In 1948, Queneiville introduced Jackknife as a resampling method, while Bradley Efron initiated
bootstrap as its revolution in 1979, serving as a resampling method with replacement [17]. This allows for
the creation of new data sets from the original, through repeatedly sampling the observations, as this is more
feasible, in contrast with the method that required obtaining data from the population frequently [18].
This approach is a training data set, denoted as 𝐵, which contains 𝑛 observations, randomly selected to
TELKOMNIKA Telecommun Comput El Control 
Prediction schizophrenia using random forest (Zuherman Rustam)
1435
produce the new set, 𝐵∗1
, indicating a similarity is size with the original. In addition, sampling was
performed with replacement, which signifies the propensity of same observations appearing more
than once [19]. The process is conducted continuously to the point where 𝑇 new data set, 𝑇 is capable
of setting personally (e.g 𝑇 = 100) or tune by an independent validation data set.
2.2.2. Random forest
In 2001, Breiman introduced random forest as a categorization tool, consisting of a collection
of tree-structured classifiers {ℎ( 𝒙,  𝑘), 𝑘 = 1, … }, where { 𝑘} are identical independent distributed random
vectors, where each tree casts a unit vote for the most popular class at input 𝒙 [20].The approach used is
aimed at improving stability and accuracy of the decision tree, through the creation of numerous units from
existing training data, using the bootstrap method [21]. Random forest is capable of improving accuracy
through randomization and voting methods, and it is also able to reduce the correlation between trees,
without significantly reducing the strength of each [22]. Therefore, when overfitting is observed in a
particular training data, others do not behave in the same manner [20]. Based on Breiman’s paper,
the process building of numerous trees does not create an overfit, although it produces a generalization error
that converges to a value [20].
Algorithm random forest for classification [9].
1. Given the training data set, with 𝑛 and 𝑚 as observations and variables, respectively
2. For 𝑏 = 1 to 𝐵
a. Draw a bootstrap sample with 𝑛 number of observations from the training (original) data set
b. Build the decision tree 𝑇 𝑏
from each new result derived, where individual nodes are chosen at
random.
i. Select 𝑝 variable at random from 𝑚, with 𝑝 ≤ 𝑚, where 𝑝 = 1,2, … 𝑜𝑟 𝑙𝑒𝑠𝑠 𝑡ℎ𝑎𝑛 √ 𝑚 [23].
ii. Choose the best feature that provides satisfactory Information Gain or Gini Index [24].
iii. Split the node
c. Grow each without pruning
3. Output the ensemble of decision trees {𝑇 𝑏
}1
𝐵
4. Conduct voting, i.e., if 𝐶̂ 𝑏(𝑥) is the class prediction of the 𝑏th random forest tree,
then 𝐶̂ 𝑟𝑓
𝐵 ( 𝑥) = 𝑚𝑎𝑗𝑜𝑟𝑖𝑡𝑦 𝑣𝑜𝑡𝑒 {𝐶̂ 𝑏( 𝑥)}1
𝐵
The algorithm above can be representative with Figure 1.
Figure 1. Flow of random forest
2.2.3. Evaluation of model performance
The evaluation of model performance is important, due to its ability to provides knowledge on
the tools’ efficiency of classifying data. This was assessed through the measurement of accuracy, obtained
from the result of model with confusion matrix, where a high value indicates a good condition of
the classification model [21]. Basically, it is known to contain comparable information with the result made
by the model, including the binary type of classification, which indicates the presence of 2 output class,
encompassing schizophrenics and non-schizophrenics. However, there are 4 parts to this confusion matrix:
 ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 3, June 2020: 1433 - 1438
1436
1) TP: True Positive, observed on instances where schizophrenia is detected as Schizophrenics
2) TN: True Negative, is when non-schizophrenia is detected as non-schizophrenics
3) FP: False Positive, is seen in cases where non-schizophrenia is detected as schizophrenics
4) FN: False Negative, is when schizophrenia is detected as non-schizophrenics
Table 3 shows the confusion matrix used in this reserch.
Table 3. Confusion matrix
Actual Class
Schizophrenics Non-Schizophrenics
Predicted Class
Schizophrenics TP FN
Non-Schizophrenics FP TN
Therefore, based on the value of TP, TN, FP, and FN from the matrix, it is possible to obtain the valued
accuracy with the following formula:
Accuracy =
TP+TN
TP + TN + FP + FN
(1)
3. RESULTS AND ANALYSIS
3.1. Preprocessing data
There 66 variables in schizophrenia Data, therefore, feature selection was conducted before fitting
the model, in order to improve accuracy. This was based on the percentage of missing data from variables
less than 10%, thus, the feature is selected. Then, If the individual variance is more than that in the data
collected from group, then choices are made based on those conditions, with 60 used in the model. Another
means of promoting accuracy is by tuning the hyperparameters, which include those that affect the model
structure and also the result output, therefore, there is need to identify its optimal set [25]. This is obtained by
learning various algorithms with different sets, and subsequently comparing the results of eachs’
performance, also known as tuning the model. Furthermore, the paper by Saragih and Rustam was followed
for the use criterion (Gini and Entropy) [9]. The number of estimators/trees serves as hyperparameters to
improve the accuracy, which collectively with the criterion is a function that measures the equality of a split,
entropy for the Information Gain and Gini index [9].
3.2. Result and analysis
A study [5] applied 4 types of SVM on the same schizophrenia data, therefore, the main goal in
this research is novel, through the use of random forest, in order to enhance predictability. This research
required that the algorithm was run 10 times, and the repetition was performed due to the presence of element
random in this experiment. As mention in section 3.1, the model tuning was conducted with the use
of 2 hyperparameter combinations, and Table 4 provides the result of classification, using random forest with
entropy, while gini was the criterion in Table 5. In this research, we used scikit-learn library. From Table 4,
random forest with entropy in accuracy of training data was able to correctly classify schizophrenia data,
with a 100% accuracy level for all compositions of training data set, and number of trees. This occurs on
instances where the number of trees is 50 and 100 for 50-80% of the composition training data. From
Table 5, the gini in accuracy of training data for random forest, provides the same result as entropy in the
classification of schizophrenia data, which is 100% for all data set, and number of tree. Therefore, if this
percentage was obtainable with entropy for testing in 50-80% training data set, then the result of gini in 40-
80%, with the number of trees is 100, is correct.
Table 4. Accuracy of schizophrenia data classification using random forest with entropy as criterion
Percentage of
Data Training
(%)
Number of Tree
10 50 100 10 50 100
Accuracy of Testing Data (%) Accuracy of Training Data (%)
10 94 95 98 100 100 100
20 95 96 98 100 100 100
30 97 98 99 100 100 100
40 96 99 99 100 100 100
50 97 100 100 100 100 100
60 99 100 100 100 100 100
70 100 100 100 100 100 100
80 100 100 100 100 100 100
TELKOMNIKA Telecommun Comput El Control 
Prediction schizophrenia using random forest (Zuherman Rustam)
1437
Table 5. Accuracy of schizophrenia data classification using random forest with gini as criterion
Percentage of
Data Training
(%)
Number of Tree
10 50 100 10 50 100
Accuracy of Testing Data (%) Accuracy of Training Data (%)
10 93 96 98 100 100 100
20 95 95 98 100 100 100
30 96 98 98 100 100 100
40 96 99 100 100 100 100
50 96 99 100 100 100 100
60 97 100 100 100 100 100
70 100 100 100 100 100 100
80 100 100 100 100 100 100
Based on Table 4 and Table 5, it is seen that a greater composition used is able to produce higher
accuracy, due to the presence of more data learned by the model. Hence, tuning indicates the best accuracy
on instances where the criterion entropy is with number of estimators as 100. However, it is seen that the
results are similar for each composition training data and number of estimators. As said in Section 2,
schizophrenia data was followed that same way as in the paper written by Rustam and Rampisela [5], due to
a desire to compare performance results (accuracy) of the different methods used. Table 6, therefore, shows
the comparison of each methods’ Testing accuracy. Table 6 shows the highest accuracy occurring at random
forest, which was in contrast with other SVM methods used in the paper of Rustam and Rampisela [5]. This
was recorded at a level of 100%, when the percentage of training data is 40-80%, while others only achieved
90% at 60-80%
Table 6. Performance results
Training Data
(%)
Linear SVM
(%)
Gaussian SVM
(%)
Linear Twin
SVM (%)
Gaussian Twin
SVM (%)
Random Forest (%)
Accuracy (%)
Std
deviation
10 88 88 89 89 98 0.0001
20 89 89 89 89 98 0.0002
30 89 89 89 89 98 0.0001
40 89 89 89 89 100 0
50 89 89 89 89 100 0
60 90 90 90 90 100 0
70 90 90 90 90 100 0
80 90 90 90 90 100 0
4. CONCLUSION
Classification of schizophrenia has previously been conducted by Rustam and Rampisela, using
SVM models, therefore, this research was novel in the use of random forest to predict based on the
information collected from the database of Northwestern University Schizophrenia Data, which was also
used by Rustam and Rampisela. Therefore, it was established that random forest with entropy and gini shows
similar results, although it only slightly performs better with gini and the number of estimators being 100.
Subsequently, this technique was also able to predict with good accuracy, using 40% training data,
which was in contrast with other methods used in prior studies which insist on the use of 80%, in order to
obtain 90% accuracy. This is very important, especially in the prediction of rare disease, where data is
difficult to obtain.
In comparison with past studies using the same data, random forest was observed to show better
accuracy, at 100% for training, and also testing. This approach is, therefore, expected to be relevant in the
medical field, especially in the prediction of schizophrenia, and subsequently in other diseases, which is
currently hard in diagnosis, hence, enhancing the accuracy of the medical team in providing treatment. It is
suggested that successive research use feature selection to identify the important feature assists the medical
practitioners to focus on several data points, and also that the application of random forest is adopted in other
dataset with updated and superior dimension.
ACKNOWLEDGMENT
This research supported financially by University of indonesia with a DRPM PUTI Q2 2020
grant scheme.
 ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 3, June 2020: 1433 - 1438
1438
REFERENCES
[1] Wang L., et al., “Northwestern University Schizophrenia Data and Software Tool (NUSDAST),” Frontiers in
Neuroinformatics, vol. 7, pp. 25, 2013.
[2] Yasami M. T., et al., “Living A Healthy Life with Schizophrenia: Paving the Road to Recovery,” World Federation
for Mental Heatlh: World Health Organization, 2014.
[3] Schizophrenia.com. Schizophrenia symptoms and diagnosis of Schizophrenia retrieved from
http://schizophrenia.com/szfacts.htm on 22 October 2019.
[4] American Psychiatric Association. “Diagnostic and statistical manual of mental disorders,” fifth edition
(Arlington, VA: American Psychiatric Association) https://doi.org/10.1176/appi.books.9780890425596. 2013.
[5] Rustam Z., Rampisela T. V. “Support vector machines and twin support vector machines for classification of
schizophrenia data,” International Journal of Engineering & Technology, vol. 7, no. 4, pp. 6378-6877, 2018.
[6] Ahn M., Hong J. H., and Jun S. C., “Feasibility of approaches combining sensor and source features in
brain-computer interface.” Journal of neuroscience methods, vol. 204, no. 1, pp. 168–178, 2012.
[7] Neuhaus A. H., et al., “Single-subject classification of schizophrenia using event-related potentials obtained during
auditory and visual oddball paradigms,” European Archives of Psychiatry and Clinical Neuroscience, vol. 263,
no. 3, pp. 241-247, 2013.
[8] Hettige N. C., et al., “Classification of suicide attempters in schizophrenia using sociocultural and clinical features:
A machine learning approach,” Gen Hosp Psychiatry, vol. 47, pp. 20-28, 2017.
[9] Rustam Z., and Saragih G., “Predict bank financial failures using random forest,” Institute of Electrical and
Electronics Engineers. doi: 10.1109/IWBIS.2018.8471718. pp.81-86, 2018.
[10] Zou Q., et al., “Predicting Diabetes Mellitus with Machine Learning Techniques,” Frontiers in Genetics, vol. 9,
no. 515, 2018.
[11] Patil P. R., and Kinariwala S. A., “Automated Diagnosis og Heart Disease using Random Forest Algorithm,”
International Journal of Advance Research, Ideas and Innovations in Technology. vol. 3, no. 2, pp. 579–589, 2017.
[12] Huljanah M., et al., “Feature Selection using Random Forest Classifier for Predicting Prostate Cancer,” IOP
Conference Series: Materials Science and Engineering, vol. 546, no. 5, 2019.
[13] Rustam Z., Sudarsono E., and Sarwinda D., “Random-Forest (RF) and Support Vector Machine (SVM)
Implementation for Analysis of Gene Expression Data in Chronic Kidney Disease (CKD),” IOP Conference Series:
Materials Science and Engineering, vol. 546, no. 5, 2019.
[14] Aprilliani U., and Rustam Z., “Osteoarthritis Disease Prediction Based on Random Forest,” Institute of Electrical
and Electronics Engineers. DOI: 10.1109/ICACSIS.2018.8618166. pp. 237-240, 2018.
[15] Andreasen N. C., “Scale for the Assessment of Negative Symptoms (SANS).” conceptual and theoretical
foundations. The British journal of psychiatry, pp. 49-52, 1989.
[16] Andreasen N. C., “Scale for the Assessment of Positive Symptoms” (SApS. Iowa City: University of Iowa). 1984.
[17] Efron B., and Tibshirani R., “An Introduction to the Bootstra,” New York: Chapman & Hall. 1993.
[18] Hastie T., Tibshirani R., and Friedman J., “The element of statistical learning data mining, inference, and
prediction,” California: Springer. 2009.
[19] James G., et al., “First Edition; An Introduction to Statistical Learning with Application in R,” Springer. New
York: Springer-Verlag New York, vol. 112, pp. 3-7, 2013.
[20] Breiman L., “Random forests,” Mach. Learn, vol. 45, no. 1, pp.5–32, 2001.
[21] Mindermann S., “Random forests,” Thesis. Amsterdam: Korteweg-de Vries Instituut voor Wiskunde, Universiteit
van Amsterdam, 2016.
[22] Strobl C., “Statistical Issues in Machine Learning-Towards Reliable Split Selection and Variable Importance
Meaasures,” Munchen, 2008.
[23] Breiman L., et al., “Classification and regression trees” Wadsworth Int, 1984.
[24] Grabczewski K., “Meta-Learning in decision tree induction,” Torun: Springer, vol. 1, 2014.
[25] Kohavi R. and Provost F., “On applied research in machine learning,” Editorial for the Special Issue on Application
Machine Learning and the Knowledge Discovery Process, Columbia University, vol. 30, pp. 127-132, 1998.

More Related Content

What's hot

PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...
PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...
PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...Sivagowry Shathesh
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease predictionKOYELMAJUMDAR1
 
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...ijcsa
 
The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...
The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...
The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...IJECEIAES
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment systemKOYELMAJUMDAR1
 
IRJET- Disease Prediction using Machine Learning
IRJET-  	  Disease Prediction using Machine LearningIRJET-  	  Disease Prediction using Machine Learning
IRJET- Disease Prediction using Machine LearningIRJET Journal
 
Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease  Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease mlaij
 
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...IAEME Publication
 
Heart Disease Identification Method Using Machine Learnin in E-healthcare.
Heart Disease Identification Method Using Machine Learnin in E-healthcare.Heart Disease Identification Method Using Machine Learnin in E-healthcare.
Heart Disease Identification Method Using Machine Learnin in E-healthcare.SUJIT SHIBAPRASAD MAITY
 
Paper id 42201618
Paper id 42201618Paper id 42201618
Paper id 42201618IJRAT
 
Machine learning in disease diagnosis
Machine learning in disease diagnosisMachine learning in disease diagnosis
Machine learning in disease diagnosisSushrutaMishra1
 
A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...
A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...
A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...ijfcstjournal
 
Predicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining TechniquesPredicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining Techniquesijtsrd
 
Risk Factors and Identifiers for Alzheimer’s Disease: A Data Mining Analysis
Risk Factors and Identifiers for Alzheimer’s Disease:  A Data Mining AnalysisRisk Factors and Identifiers for Alzheimer’s Disease:  A Data Mining Analysis
Risk Factors and Identifiers for Alzheimer’s Disease: A Data Mining Analysisertekg
 
Prognosis of Diabetes by Performing Data Mining of HbA1c
 Prognosis of Diabetes by Performing Data Mining of HbA1c Prognosis of Diabetes by Performing Data Mining of HbA1c
Prognosis of Diabetes by Performing Data Mining of HbA1cIJCSIS Research Publications
 

What's hot (18)

Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...
PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...
PSO-An Intellectual Technique for Feature Reduction on Heart Malady Anticipat...
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease prediction
 
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
 
The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...
The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...
The Analysis of Performace Model Tiered Artificial Neural Network for Assessm...
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment system
 
IRJET- Disease Prediction using Machine Learning
IRJET-  	  Disease Prediction using Machine LearningIRJET-  	  Disease Prediction using Machine Learning
IRJET- Disease Prediction using Machine Learning
 
Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease  Machine Learning Based Approaches for Prediction of Parkinson's Disease
Machine Learning Based Approaches for Prediction of Parkinson's Disease
 
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A...
 
Heart Disease Identification Method Using Machine Learnin in E-healthcare.
Heart Disease Identification Method Using Machine Learnin in E-healthcare.Heart Disease Identification Method Using Machine Learnin in E-healthcare.
Heart Disease Identification Method Using Machine Learnin in E-healthcare.
 
Statistics
StatisticsStatistics
Statistics
 
Paper id 42201618
Paper id 42201618Paper id 42201618
Paper id 42201618
 
Machine learning in disease diagnosis
Machine learning in disease diagnosisMachine learning in disease diagnosis
Machine learning in disease diagnosis
 
A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...
A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...
A NOVEL PROBABILISTIC ARTIFICIAL NEURAL NETWORKS APPROACH FOR DIAGNOSING HEAR...
 
Predicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining TechniquesPredicting Chronic Kidney Disease using Data Mining Techniques
Predicting Chronic Kidney Disease using Data Mining Techniques
 
Risk Factors and Identifiers for Alzheimer’s Disease: A Data Mining Analysis
Risk Factors and Identifiers for Alzheimer’s Disease:  A Data Mining AnalysisRisk Factors and Identifiers for Alzheimer’s Disease:  A Data Mining Analysis
Risk Factors and Identifiers for Alzheimer’s Disease: A Data Mining Analysis
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Prognosis of Diabetes by Performing Data Mining of HbA1c
 Prognosis of Diabetes by Performing Data Mining of HbA1c Prognosis of Diabetes by Performing Data Mining of HbA1c
Prognosis of Diabetes by Performing Data Mining of HbA1c
 

Similar to Prediction schizophrenia using random forest

CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...indexPub
 
Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
 
IRJET - An Effective Stroke Prediction System using Predictive Models
IRJET -  	  An Effective Stroke Prediction System using Predictive ModelsIRJET -  	  An Effective Stroke Prediction System using Predictive Models
IRJET - An Effective Stroke Prediction System using Predictive ModelsIRJET Journal
 
A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...BASMAJUMAASALEHALMOH
 
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...IJECEIAES
 
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...BRNSSPublicationHubI
 
Diabetes Prediction Using ML
Diabetes Prediction Using MLDiabetes Prediction Using ML
Diabetes Prediction Using MLIRJET Journal
 
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...Scott Faria
 
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...IJARIIT
 
Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...
Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...
Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...ijtsrd
 
Predictive machine learning applying cross industry standard process for data...
Predictive machine learning applying cross industry standard process for data...Predictive machine learning applying cross industry standard process for data...
Predictive machine learning applying cross industry standard process for data...IAESIJAI
 
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...BASMAJUMAASALEHALMOH
 
A computational intelligent analysis of autism spectrum disorder using machin...
A computational intelligent analysis of autism spectrum disorder using machin...A computational intelligent analysis of autism spectrum disorder using machin...
A computational intelligent analysis of autism spectrum disorder using machin...IAESIJAI
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
 
Graphical Model and Clustering-Regression based Methods for Causal Interactio...
Graphical Model and Clustering-Regression based Methods for Causal Interactio...Graphical Model and Clustering-Regression based Methods for Causal Interactio...
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
 
Running head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docx
Running head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docxRunning head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docx
Running head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docxtoltonkendal
 
8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school children8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school childrenElsa von Licy
 
Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...
Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...
Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...IRJET Journal
 
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHindexPub
 

Similar to Prediction schizophrenia using random forest (20)

CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
 
Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...
 
IRJET - An Effective Stroke Prediction System using Predictive Models
IRJET -  	  An Effective Stroke Prediction System using Predictive ModelsIRJET -  	  An Effective Stroke Prediction System using Predictive Models
IRJET - An Effective Stroke Prediction System using Predictive Models
 
A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...
 
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...
Performance Analysis of Data Mining Methods for Sexually Transmitted Disease ...
 
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
 
Diabetes Prediction Using ML
Diabetes Prediction Using MLDiabetes Prediction Using ML
Diabetes Prediction Using ML
 
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica...
 
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
 
Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...
Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...
Development of a Hybrid Dynamic Expert System for the Diagnosis of Peripheral...
 
Predictive machine learning applying cross industry standard process for data...
Predictive machine learning applying cross industry standard process for data...Predictive machine learning applying cross industry standard process for data...
Predictive machine learning applying cross industry standard process for data...
 
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
 
A computational intelligent analysis of autism spectrum disorder using machin...
A computational intelligent analysis of autism spectrum disorder using machin...A computational intelligent analysis of autism spectrum disorder using machin...
A computational intelligent analysis of autism spectrum disorder using machin...
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
 
Graphical Model and Clustering-Regression based Methods for Causal Interactio...
Graphical Model and Clustering-Regression based Methods for Causal Interactio...Graphical Model and Clustering-Regression based Methods for Causal Interactio...
Graphical Model and Clustering-Regression based Methods for Causal Interactio...
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
 
Running head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docx
Running head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docxRunning head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docx
Running head PHASE 1 SCENARIO NCLEX MEMOORIAL HOSPITAL1PHASE .docx
 
8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school children8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school children
 
Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...
Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...
Analysis and Prediction of Diabetes Diseases using Machine Learning Algorithm...
 
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
 

More from TELKOMNIKA JOURNAL

Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...TELKOMNIKA JOURNAL
 
Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
 
Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
 
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...TELKOMNIKA JOURNAL
 
Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...TELKOMNIKA JOURNAL
 
Efficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaEfficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaTELKOMNIKA JOURNAL
 
Design and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireDesign and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireTELKOMNIKA JOURNAL
 
Wavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkWavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkTELKOMNIKA JOURNAL
 
A novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsA novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsTELKOMNIKA JOURNAL
 
Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...TELKOMNIKA JOURNAL
 
Brief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesBrief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesTELKOMNIKA JOURNAL
 
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...TELKOMNIKA JOURNAL
 
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemEvaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemTELKOMNIKA JOURNAL
 
Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
 
Reagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorReagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
 
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
 
A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
 
Electroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksElectroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
 
Adaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingAdaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
 
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
 

More from TELKOMNIKA JOURNAL (20)

Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...
 
Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...
 
Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...
 
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
 
Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...
 
Efficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaEfficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antenna
 
Design and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireDesign and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fire
 
Wavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkWavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio network
 
A novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsA novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bands
 
Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...
 
Brief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesBrief note on match and miss-match uncertainties
Brief note on match and miss-match uncertainties
 
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
 
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemEvaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
 
Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...
 
Reagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorReagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensor
 
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
 
A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...
 
Electroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksElectroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networks
 
Adaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingAdaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imaging
 
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
 

Recently uploaded

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf203318pmpc
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 

Recently uploaded (20)

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 

Prediction schizophrenia using random forest

  • 1. TELKOMNIKA Telecommunication, Computing, Electronics and Control Vol. 18, No. 3, June 2020, pp. 1433~1438 ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018 DOI: 10.12928/TELKOMNIKA.v18i3.14837  1433 Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA Prediction schizophrenia using random forest Zuherman Rustam, Glori Stephani Saragih Department of Mathematics, Faculty of Mathematics and Science, Universitas Indonesia, Indonesia Article Info ABSTRACT Article history: Received Aug 15, 2019 Revised Feb 12, 2020 Accepted Feb 21, 2020 Schizophrenia is a mental illness with a very bad impact on sufferers, attacking the part of human brain that disables the ability to think clearly. In 2018, Rustam and Rampisela classified Schizophrenia by using Northwestern University Schizophrenia Data, based on 66 variables consisting of group, demographic, and questionnaires statistics, based on the scale for the assessment of negative symptoms (SANS), and scale for the assessment of positive symptoms (SAS), and then classifiers that used are SVM with Gaussian kernel and Twin SVM with linear and Gaussian kernel. Furthermore, this research is novel based on the use of random forest as a classifier, in order to predict Schizophrenia. The result obtained is reported in percentage of accuracy, both in training and testing of random forest, which was 100%. This classification, therefore, shows the best value in contrast with prior methods, even though only 40% of training data set was used. This is very important, especially in the cases of rare disease, including schizophrenia. Keywords: Classification Machine learning Random forest Schizophrenia This is an open access article under the CC BY-SA license. Corresponding Author: Zuherman Rustam, Department of Mathematics, Faculty of Mathematics and Science, Universitas Indonesia, Margonda Raya St., Pondok Cina, Beji, Depok, Jawa Barat 16424, Indonesia. Email: rustam@ui.ac.id 1. INTRODUCTION Schizophrenia is a mental illness that has a very bad impact on sufferers, based on its ability to attack parts of the human brain, thus disabling the persons ability to think clearly [1]. Generally, patients experience a change either behaviorally or on the mind, which subsequently affects reality. This disease possesses the propensity to attack everyone at any age, the average attacks starting at the age of 20s in men, while in women, it was observed at the end of 20s [1]. It is, therefore, important to proficiently identify the symptoms on time, in order to prevent the disease from occuring in earnest. Schizophrenia is generally divided with three symptoms, including 1) the positive, presented with unnecessitated extra brain activities, including hallucinations, 2) the negative, indicated by the loss of brain activities, 3) the cognitive, which is exhibited as challenges with ability to remember and think. These are severe facts that confer potential negative effects on the life of sufferers, and their mortality rates are 2 to 2.5 times higher than the general population [2], 10% commited suicide and 20-40% attempted suicide at least once [3]. Then, the cause of schizophrenia has not been determined for sure [4] and this illness has been evaluated to get worse due to the incompetence of early detection, hence, the need to identify other approaches for detection, and one of which involves the use of computational methods, comprising of machine learning. However, several papers have attempted utilizing this technique in the diagnosis of schizophrenia, including SVM with Gaussian kernel, Twin SVM with linear and Gaussian kernel [5], linear discriminant
  • 2.  ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 3, June 2020: 1433 - 1438 1434 analysis and k-Nearest neighbor [6], fisher linear discriminant analysis [7], Elastic Net, as well as least absolute shrinkage and selection operator [8]. This current research involved the use of random forest as a classify, although it has widely been used in various studies, including the prediction of bank financial failures, with accuracy of 93% [9], diabetes mellitus at 80.8% [10], automated diagnosis of heart disease at 83.6%, applying the weighted random forest [11], classify prostate cancer [12], chronic kidney disease [13] and osteoarthritis disease [14]. Hence, the selected approach has proven the capability of this classifier to apply to any problem, exhibiting good model performance in the process. This research is organized as follows: section 1 provides background of details, while the second specifies data and research method used. In addition, result and analysis were discussed in section 3, and finally, conclusions are included in section 4. 2. DATA AND RESEARCH METHOD 2.1. Data Information from the database of Northwestern University Schizophrenia Data was used in this study [1]. Furthermore, there were 392 observations divided into 4 groups, with distributions as seen in Table 1. The study followed the grouping used by Rustam and Rampisela in the paper entitled “Support vector machines and twin support vector machines in the classification of schizophrenia data” [5]. This was because of the similarity in research focus, which was based on the categorization into schizophrenics and non-schizophrenics only, which served as the group variable [2]. Non-schizophrenics group consists of healthy siblings of the patients, the control, and siblings of control. A total of 66 data variables were collected, including the group, and demographics, consisting of gender, dominant hand, race, ethnicity, and age, using questionnaires statistics of scale for the assessment of negative symptoms (SANS) [15] and scale for the assessment of negative symptoms (SAPS) [16], as shown in Table 2. Table 1. Distribution of group Group Number of observations Schizophrenics 171 Siblings of Schizophrenics 44 Control Siblings of Control 111 66 Table 2. The variable of schizophrenia data 𝑛th Variable Data Group Variable Description 1 - 34 Questionnaires of SAPS SAPS𝑖, 𝑖 = 1, … , 34 SAPS is used to evaluate the positive symptoms of schizophrenia, SAPS is divided into 4 main sections containing 34 different symptoms, namely hallucination, delusion, bizarre behavior and thought disorder [16]. The data is in scale (0.5) 35 - 60 Questionnaires of SANS SANS𝑗, 𝑗 = 35, … ,60 SANS is used to evaluate the negative symptoms of schizophrenia, SANS is divided into 5 main sections containing 25 different symptoms, namely emotional reaction decline, alogia, avolition and apathy, anhedonia and asociality, and attention [15]. The data is in scale (0.5) 61 Demographic Gender Gender is divided into 2 categories: 1) Male and 2) Female 62 Demographic Dominant Hand Dominant Hand is divided into 2 categories: 1) Left and 2) right 63 Demographic Race 64 Demographic Ethnicity Ethnicity is divided into 3 categories: 1) Kaukasia, 2) Africa-America, and 3) others 65 Demographic Age Integer (13.66) 66 Group Group Group is divided into 2 classes: 0) Non-Schizophrenics and 1) Schizophrenics 2.2. Research method 2.2.1. Bootstrap In 1948, Queneiville introduced Jackknife as a resampling method, while Bradley Efron initiated bootstrap as its revolution in 1979, serving as a resampling method with replacement [17]. This allows for the creation of new data sets from the original, through repeatedly sampling the observations, as this is more feasible, in contrast with the method that required obtaining data from the population frequently [18]. This approach is a training data set, denoted as 𝐵, which contains 𝑛 observations, randomly selected to
  • 3. TELKOMNIKA Telecommun Comput El Control  Prediction schizophrenia using random forest (Zuherman Rustam) 1435 produce the new set, 𝐵∗1 , indicating a similarity is size with the original. In addition, sampling was performed with replacement, which signifies the propensity of same observations appearing more than once [19]. The process is conducted continuously to the point where 𝑇 new data set, 𝑇 is capable of setting personally (e.g 𝑇 = 100) or tune by an independent validation data set. 2.2.2. Random forest In 2001, Breiman introduced random forest as a categorization tool, consisting of a collection of tree-structured classifiers {ℎ( 𝒙,  𝑘), 𝑘 = 1, … }, where { 𝑘} are identical independent distributed random vectors, where each tree casts a unit vote for the most popular class at input 𝒙 [20].The approach used is aimed at improving stability and accuracy of the decision tree, through the creation of numerous units from existing training data, using the bootstrap method [21]. Random forest is capable of improving accuracy through randomization and voting methods, and it is also able to reduce the correlation between trees, without significantly reducing the strength of each [22]. Therefore, when overfitting is observed in a particular training data, others do not behave in the same manner [20]. Based on Breiman’s paper, the process building of numerous trees does not create an overfit, although it produces a generalization error that converges to a value [20]. Algorithm random forest for classification [9]. 1. Given the training data set, with 𝑛 and 𝑚 as observations and variables, respectively 2. For 𝑏 = 1 to 𝐵 a. Draw a bootstrap sample with 𝑛 number of observations from the training (original) data set b. Build the decision tree 𝑇 𝑏 from each new result derived, where individual nodes are chosen at random. i. Select 𝑝 variable at random from 𝑚, with 𝑝 ≤ 𝑚, where 𝑝 = 1,2, … 𝑜𝑟 𝑙𝑒𝑠𝑠 𝑡ℎ𝑎𝑛 √ 𝑚 [23]. ii. Choose the best feature that provides satisfactory Information Gain or Gini Index [24]. iii. Split the node c. Grow each without pruning 3. Output the ensemble of decision trees {𝑇 𝑏 }1 𝐵 4. Conduct voting, i.e., if 𝐶̂ 𝑏(𝑥) is the class prediction of the 𝑏th random forest tree, then 𝐶̂ 𝑟𝑓 𝐵 ( 𝑥) = 𝑚𝑎𝑗𝑜𝑟𝑖𝑡𝑦 𝑣𝑜𝑡𝑒 {𝐶̂ 𝑏( 𝑥)}1 𝐵 The algorithm above can be representative with Figure 1. Figure 1. Flow of random forest 2.2.3. Evaluation of model performance The evaluation of model performance is important, due to its ability to provides knowledge on the tools’ efficiency of classifying data. This was assessed through the measurement of accuracy, obtained from the result of model with confusion matrix, where a high value indicates a good condition of the classification model [21]. Basically, it is known to contain comparable information with the result made by the model, including the binary type of classification, which indicates the presence of 2 output class, encompassing schizophrenics and non-schizophrenics. However, there are 4 parts to this confusion matrix:
  • 4.  ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 3, June 2020: 1433 - 1438 1436 1) TP: True Positive, observed on instances where schizophrenia is detected as Schizophrenics 2) TN: True Negative, is when non-schizophrenia is detected as non-schizophrenics 3) FP: False Positive, is seen in cases where non-schizophrenia is detected as schizophrenics 4) FN: False Negative, is when schizophrenia is detected as non-schizophrenics Table 3 shows the confusion matrix used in this reserch. Table 3. Confusion matrix Actual Class Schizophrenics Non-Schizophrenics Predicted Class Schizophrenics TP FN Non-Schizophrenics FP TN Therefore, based on the value of TP, TN, FP, and FN from the matrix, it is possible to obtain the valued accuracy with the following formula: Accuracy = TP+TN TP + TN + FP + FN (1) 3. RESULTS AND ANALYSIS 3.1. Preprocessing data There 66 variables in schizophrenia Data, therefore, feature selection was conducted before fitting the model, in order to improve accuracy. This was based on the percentage of missing data from variables less than 10%, thus, the feature is selected. Then, If the individual variance is more than that in the data collected from group, then choices are made based on those conditions, with 60 used in the model. Another means of promoting accuracy is by tuning the hyperparameters, which include those that affect the model structure and also the result output, therefore, there is need to identify its optimal set [25]. This is obtained by learning various algorithms with different sets, and subsequently comparing the results of eachs’ performance, also known as tuning the model. Furthermore, the paper by Saragih and Rustam was followed for the use criterion (Gini and Entropy) [9]. The number of estimators/trees serves as hyperparameters to improve the accuracy, which collectively with the criterion is a function that measures the equality of a split, entropy for the Information Gain and Gini index [9]. 3.2. Result and analysis A study [5] applied 4 types of SVM on the same schizophrenia data, therefore, the main goal in this research is novel, through the use of random forest, in order to enhance predictability. This research required that the algorithm was run 10 times, and the repetition was performed due to the presence of element random in this experiment. As mention in section 3.1, the model tuning was conducted with the use of 2 hyperparameter combinations, and Table 4 provides the result of classification, using random forest with entropy, while gini was the criterion in Table 5. In this research, we used scikit-learn library. From Table 4, random forest with entropy in accuracy of training data was able to correctly classify schizophrenia data, with a 100% accuracy level for all compositions of training data set, and number of trees. This occurs on instances where the number of trees is 50 and 100 for 50-80% of the composition training data. From Table 5, the gini in accuracy of training data for random forest, provides the same result as entropy in the classification of schizophrenia data, which is 100% for all data set, and number of tree. Therefore, if this percentage was obtainable with entropy for testing in 50-80% training data set, then the result of gini in 40- 80%, with the number of trees is 100, is correct. Table 4. Accuracy of schizophrenia data classification using random forest with entropy as criterion Percentage of Data Training (%) Number of Tree 10 50 100 10 50 100 Accuracy of Testing Data (%) Accuracy of Training Data (%) 10 94 95 98 100 100 100 20 95 96 98 100 100 100 30 97 98 99 100 100 100 40 96 99 99 100 100 100 50 97 100 100 100 100 100 60 99 100 100 100 100 100 70 100 100 100 100 100 100 80 100 100 100 100 100 100
  • 5. TELKOMNIKA Telecommun Comput El Control  Prediction schizophrenia using random forest (Zuherman Rustam) 1437 Table 5. Accuracy of schizophrenia data classification using random forest with gini as criterion Percentage of Data Training (%) Number of Tree 10 50 100 10 50 100 Accuracy of Testing Data (%) Accuracy of Training Data (%) 10 93 96 98 100 100 100 20 95 95 98 100 100 100 30 96 98 98 100 100 100 40 96 99 100 100 100 100 50 96 99 100 100 100 100 60 97 100 100 100 100 100 70 100 100 100 100 100 100 80 100 100 100 100 100 100 Based on Table 4 and Table 5, it is seen that a greater composition used is able to produce higher accuracy, due to the presence of more data learned by the model. Hence, tuning indicates the best accuracy on instances where the criterion entropy is with number of estimators as 100. However, it is seen that the results are similar for each composition training data and number of estimators. As said in Section 2, schizophrenia data was followed that same way as in the paper written by Rustam and Rampisela [5], due to a desire to compare performance results (accuracy) of the different methods used. Table 6, therefore, shows the comparison of each methods’ Testing accuracy. Table 6 shows the highest accuracy occurring at random forest, which was in contrast with other SVM methods used in the paper of Rustam and Rampisela [5]. This was recorded at a level of 100%, when the percentage of training data is 40-80%, while others only achieved 90% at 60-80% Table 6. Performance results Training Data (%) Linear SVM (%) Gaussian SVM (%) Linear Twin SVM (%) Gaussian Twin SVM (%) Random Forest (%) Accuracy (%) Std deviation 10 88 88 89 89 98 0.0001 20 89 89 89 89 98 0.0002 30 89 89 89 89 98 0.0001 40 89 89 89 89 100 0 50 89 89 89 89 100 0 60 90 90 90 90 100 0 70 90 90 90 90 100 0 80 90 90 90 90 100 0 4. CONCLUSION Classification of schizophrenia has previously been conducted by Rustam and Rampisela, using SVM models, therefore, this research was novel in the use of random forest to predict based on the information collected from the database of Northwestern University Schizophrenia Data, which was also used by Rustam and Rampisela. Therefore, it was established that random forest with entropy and gini shows similar results, although it only slightly performs better with gini and the number of estimators being 100. Subsequently, this technique was also able to predict with good accuracy, using 40% training data, which was in contrast with other methods used in prior studies which insist on the use of 80%, in order to obtain 90% accuracy. This is very important, especially in the prediction of rare disease, where data is difficult to obtain. In comparison with past studies using the same data, random forest was observed to show better accuracy, at 100% for training, and also testing. This approach is, therefore, expected to be relevant in the medical field, especially in the prediction of schizophrenia, and subsequently in other diseases, which is currently hard in diagnosis, hence, enhancing the accuracy of the medical team in providing treatment. It is suggested that successive research use feature selection to identify the important feature assists the medical practitioners to focus on several data points, and also that the application of random forest is adopted in other dataset with updated and superior dimension. ACKNOWLEDGMENT This research supported financially by University of indonesia with a DRPM PUTI Q2 2020 grant scheme.
  • 6.  ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 18, No. 3, June 2020: 1433 - 1438 1438 REFERENCES [1] Wang L., et al., “Northwestern University Schizophrenia Data and Software Tool (NUSDAST),” Frontiers in Neuroinformatics, vol. 7, pp. 25, 2013. [2] Yasami M. T., et al., “Living A Healthy Life with Schizophrenia: Paving the Road to Recovery,” World Federation for Mental Heatlh: World Health Organization, 2014. [3] Schizophrenia.com. Schizophrenia symptoms and diagnosis of Schizophrenia retrieved from http://schizophrenia.com/szfacts.htm on 22 October 2019. [4] American Psychiatric Association. “Diagnostic and statistical manual of mental disorders,” fifth edition (Arlington, VA: American Psychiatric Association) https://doi.org/10.1176/appi.books.9780890425596. 2013. [5] Rustam Z., Rampisela T. V. “Support vector machines and twin support vector machines for classification of schizophrenia data,” International Journal of Engineering & Technology, vol. 7, no. 4, pp. 6378-6877, 2018. [6] Ahn M., Hong J. H., and Jun S. C., “Feasibility of approaches combining sensor and source features in brain-computer interface.” Journal of neuroscience methods, vol. 204, no. 1, pp. 168–178, 2012. [7] Neuhaus A. H., et al., “Single-subject classification of schizophrenia using event-related potentials obtained during auditory and visual oddball paradigms,” European Archives of Psychiatry and Clinical Neuroscience, vol. 263, no. 3, pp. 241-247, 2013. [8] Hettige N. C., et al., “Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach,” Gen Hosp Psychiatry, vol. 47, pp. 20-28, 2017. [9] Rustam Z., and Saragih G., “Predict bank financial failures using random forest,” Institute of Electrical and Electronics Engineers. doi: 10.1109/IWBIS.2018.8471718. pp.81-86, 2018. [10] Zou Q., et al., “Predicting Diabetes Mellitus with Machine Learning Techniques,” Frontiers in Genetics, vol. 9, no. 515, 2018. [11] Patil P. R., and Kinariwala S. A., “Automated Diagnosis og Heart Disease using Random Forest Algorithm,” International Journal of Advance Research, Ideas and Innovations in Technology. vol. 3, no. 2, pp. 579–589, 2017. [12] Huljanah M., et al., “Feature Selection using Random Forest Classifier for Predicting Prostate Cancer,” IOP Conference Series: Materials Science and Engineering, vol. 546, no. 5, 2019. [13] Rustam Z., Sudarsono E., and Sarwinda D., “Random-Forest (RF) and Support Vector Machine (SVM) Implementation for Analysis of Gene Expression Data in Chronic Kidney Disease (CKD),” IOP Conference Series: Materials Science and Engineering, vol. 546, no. 5, 2019. [14] Aprilliani U., and Rustam Z., “Osteoarthritis Disease Prediction Based on Random Forest,” Institute of Electrical and Electronics Engineers. DOI: 10.1109/ICACSIS.2018.8618166. pp. 237-240, 2018. [15] Andreasen N. C., “Scale for the Assessment of Negative Symptoms (SANS).” conceptual and theoretical foundations. The British journal of psychiatry, pp. 49-52, 1989. [16] Andreasen N. C., “Scale for the Assessment of Positive Symptoms” (SApS. Iowa City: University of Iowa). 1984. [17] Efron B., and Tibshirani R., “An Introduction to the Bootstra,” New York: Chapman & Hall. 1993. [18] Hastie T., Tibshirani R., and Friedman J., “The element of statistical learning data mining, inference, and prediction,” California: Springer. 2009. [19] James G., et al., “First Edition; An Introduction to Statistical Learning with Application in R,” Springer. New York: Springer-Verlag New York, vol. 112, pp. 3-7, 2013. [20] Breiman L., “Random forests,” Mach. Learn, vol. 45, no. 1, pp.5–32, 2001. [21] Mindermann S., “Random forests,” Thesis. Amsterdam: Korteweg-de Vries Instituut voor Wiskunde, Universiteit van Amsterdam, 2016. [22] Strobl C., “Statistical Issues in Machine Learning-Towards Reliable Split Selection and Variable Importance Meaasures,” Munchen, 2008. [23] Breiman L., et al., “Classification and regression trees” Wadsworth Int, 1984. [24] Grabczewski K., “Meta-Learning in decision tree induction,” Torun: Springer, vol. 1, 2014. [25] Kohavi R. and Provost F., “On applied research in machine learning,” Editorial for the Special Issue on Application Machine Learning and the Knowledge Discovery Process, Columbia University, vol. 30, pp. 127-132, 1998.