
Classification of Headache Disorder Using Random Forest Algorithm.pptx


  1. SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTING, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING. ARTICLE CRITICAL REVIEW: CLASSIFICATION OF HEADACHE DISORDER USING RANDOM FOREST ALGORITHM. PRESENTER: ALEMU GUDETA
  2. About the Article  Authors: Dhiyaussalam, Adi Wibowo, Fajar Agung Nugroho, Eko Adi Sarwoko, I Made Agus Setiawan  Title: Classification of Headache Disorder Using Random Forest Algorithm  Venue: 2020 4th International Conference on Informatics and Computational Sciences (ICICoS)  Publisher: IEEE
  3. General Summary  Headache disorder is one of the most common illnesses.  The purpose of the article is to help people figure out at home what kind of headache they are experiencing, without seeking medical attention.  The majority of headache sufferers prefer to treat themselves at home when they feel a headache coming on.  A model for categorizing the different forms of headache and generating feature importances was created using the Random Forest algorithm.  To create the best model, the authors used the Migbase dataset and adjusted numerous algorithmic parameters. The Migbase dataset has 850 records with 39 features and three class labels: migraine, tension, and cluster, with respective proportions of 71.73%, 21.67%, and 6.60%.
  4. Key Findings  Random Forest - an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time.  n_estimators - determines the number of trees in the Random Forest model.  max_features - determines the maximum number of features considered at each split.  max_depth - determines the maximum depth of a tree.  Bagging - Bootstrapping (random sampling of the training data with replacement) + Aggregation (combining the trees' results and picking the dominant prediction).  Gini impurity - a measurement of how evenly class labels are distributed among nodes.
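The parameters named above can be sketched with Scikit-learn's `RandomForestClassifier`, which the article's model is built on. This is a minimal illustration on a synthetic dataset shaped roughly like Migbase (850 samples, 39 features, 3 classes); it is not the authors' actual data or settings.

```python
# Minimal Random Forest sketch showing n_estimators, max_features, max_depth.
# The dataset is synthetic (make_classification), not the Migbase data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in mimicking Migbase's shape: 850 samples, 39 features, 3 classes.
X, y = make_classification(n_samples=850, n_features=39, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest
    max_features=6,     # max features considered at each split
    max_depth=5,        # max depth of each tree
    criterion="gini",   # split quality measured by Gini impurity
    random_state=0,
)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Because each tree sees a bootstrap sample and a random subset of features, individual trees differ, and aggregating their votes is what reduces the variance of a single decision tree.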
  5. Research Gap  Particularly in developing nations, patients often do not see a doctor for a headache. However, as the headache wears on, negative effects may result.  Using a computer, people experiencing primary headaches can independently analyze their condition and choose the best course of treatment.  In some cases it may be challenging for doctors to diagnose from the symptoms; in this approach, the random forest algorithm can identify a headache type with greater accuracy than a doctor.
  6. Objective / Purpose  The article's goal is to provide the optimal model for classifying different types of headaches using a dataset that records the symptoms patients experience.  Different scholars have attempted the same problem with various techniques, including decision trees. The authors of this research employed the Random Forest method to classify the different types of headaches.  The resulting model helps a person with a headache determine its nature and initiate self-care measures.
  7. Methodology  Data Collection - the Migbase dataset, downloaded from the internet, with 850 rows and 39 features.  Algorithm - Random Forest. RF combines a number of Decision Trees to produce classification and regression results, and is less prone to overfitting.  Gini impurity - calculates how class labels are distributed within a node. In the formula, t is the node, j is the number of children at node t, n_ci is the number of samples with value x_i belonging to class c, and m_i is the number of samples with value x_i at node t.
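The Gini impurity idea above can be made concrete with the standard form Gini(t) = 1 - Σ_c p(c|t)², where p(c|t) is the fraction of samples at node t belonging to class c. A minimal sketch with made-up example labels:

```python
# Gini impurity for a single node: 1 - sum of squared class proportions.
# The label lists below are hypothetical examples, not Migbase records.
from collections import Counter

def gini_impurity(labels):
    """Gini(t) = 1 - sum_c p(c|t)^2, with p(c|t) the fraction of samples
    at node t belonging to class c."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; an even three-way split gives 1 - 3*(1/3)^2 = 2/3.
print(gini_impurity(["migraine"] * 5))                    # 0.0
print(gini_impurity(["migraine", "tension", "cluster"]))  # 0.666...
```

A split is chosen to maximize the impurity reduction from the parent node to its children, which is also what the forest's feature importances are built from.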
  8. Methodology … cntd  Feature importance - calculated from the average impurity reduction across all the Decision Trees in a Random Forest, without assuming whether the data is linearly separable.  In the formula, FI_i is the importance of feature i in a Decision Tree, and k ranges over all nodes.  Data Processing - feature extraction was done manually, and a correlation matrix against the target label was also used to assess feature importance. Features whose correlation falls outside the range -0.5 to 0.5 are removed.
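The two data-processing ideas above, filtering features by their correlation with the target and reading impurity-based importances from a fitted forest, can be sketched as follows. The data is synthetic and the 0.5 threshold mirrors the slide's rule; this is an interpretation of the described procedure, not the authors' code.

```python
# Sketch: (1) drop features whose correlation with the target falls outside
# the -0.5..0.5 band, as the slide describes; (2) read impurity-based
# feature importances from a fitted Random Forest. Synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=850, n_features=39, n_informative=10,
                           random_state=0)

# (1) Correlation filter: keep features with |corr(feature, target)| <= 0.5.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.abs(corr) <= 0.5
X_kept = X[:, keep]

# (2) Impurity-based importances, averaged over all trees in the forest.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_kept, y)
top = np.argsort(clf.feature_importances_)[::-1][:5]
print("kept features:", X_kept.shape[1], "| top-5 importance indices:", top)
```

Scikit-learn normalizes `feature_importances_` so they sum to 1, which makes importances comparable across models with different feature counts.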
  9. Methodology … cntd  Model Development - the classification model using the Random Forest algorithm was built following the Scikit-learn library documentation.  Model Evaluation - the model was evaluated on 10 samples: 4 of the migraine class, 3 of the tension class, and 3 of the cluster class. (The slide shows the confusion matrix of the test result for one of the models.)
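The 10-sample evaluation above can be reproduced in shape with Scikit-learn's `confusion_matrix`. The predicted labels here are made up (a perfect run) just to show how the matrix reads; the slide's actual predictions are in its figure.

```python
# Confusion matrix for the 10-sample test set described on the slide:
# 4 migraine, 3 tension, 3 cluster. Predictions are hypothetical.
from sklearn.metrics import confusion_matrix

labels = ["migraine", "tension", "cluster"]
y_true = ["migraine"] * 4 + ["tension"] * 3 + ["cluster"] * 3
y_pred = ["migraine"] * 4 + ["tension"] * 3 + ["cluster"] * 3  # perfect run

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows = true class, columns = predicted class
```

A diagonal of [4, 3, 3] means every sample was classified correctly; off-diagonal entries would count misclassifications between specific class pairs.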
  10. Methodology … cntd  Parameter Optimization  To get the best performance from the Random Forest model, some parameters need to be set manually.  Manually setting the number of trees (n_estimators) can reduce the error rate of a Random Forest model.  Accordingly, the n_estimators values used are 10, 20, 50, and 100; the max_depth values are 4, 5, and None (no limit); and the other manually set parameter is max_features, with three choices: 6, 14, or 33.
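The manual sweep above (4 x 3 x 3 = 36 parameter combinations) can be automated with Scikit-learn's `GridSearchCV`. This is a sketch on a small synthetic stand-in, not the authors' procedure or their Migbase results.

```python
# Grid search over the parameter values listed on the slide:
# n_estimators in {10, 20, 50, 100}, max_depth in {4, 5, None},
# max_features in {6, 14, 33}. Synthetic data, kept small for speed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=39, n_informative=10,
                           n_classes=3, random_state=0)

param_grid = {
    "n_estimators": [10, 20, 50, 100],
    "max_depth": [4, 5, None],
    "max_features": [6, 14, 33],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Cross-validated selection like this also guards against picking a parameter combination that only happens to work well on one particular train/test split.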
  11. Methodology … cntd (slide contains a figure only)
  12. Key Findings  By adjusting n_estimators, max_features, and max_depth, the authors achieved the performance accuracies shown on the slide.
  13. Contributions  The article assessed the effectiveness of the random forest algorithm for classifying headache types, a task on which previous authors were unable to reach the reported performance accuracy.  Vandewiele et al. (2018), Krawczyk et al. (2013), and Aljaaf et al. (2015) conducted research and suggested decision trees, but the authors of this article obtained better performance accuracy than those earlier works.  Practically, it made life easier and increased societal awareness of a person's ability to diagnose their headaches at home.
  14. Critique  The majority of the data processing, including choosing the number of features, the depth, and the number of estimators, is done manually.  Additionally, the authors made no suggestions and did not indicate what they would do to improve the work in the future.  They did not anticipate new symptoms, which may appear at any time; in that situation, the model's performance will suffer.
  15. Conclusion  According to the authors, the model performs better than models trained earlier with decision trees on the same dataset by Vandewiele et al. (2018), Krawczyk et al. (2013), and Aljaaf et al. (2015).  By adjusting n_estimators, max_features, and max_depth, the highest performance reached an accuracy of 99.56% (with values 100, 33, and 5 respectively), and the lowest was 97.79%.
  16. THANK YOU!
