Data mining: Computer Assignment 2

Computer assignment 2 of the course Data mining. This assignment is about running the program Weka on an external database and showing the resulting decision tree.

Transcript

Data mining: 'Baseline & Decision Trees'

COMPUTER ASSIGNMENT 2

Barry Kollee, 10349863
1. Evaluation in Machine Learning

Copy the file weather.arff to your home directory. This file contains data for deciding when to play a certain sport given weather conditions. Run the J48 classifier using "weather.arff" as the training set.

  1. Report how many instances are correctly and incorrectly classified on the training set.
  2. The classifier weka.classifiers.rules.ZeroR simply assigns the most common classification in a training set to any new classifications and can be used as a baseline for evaluating other machine learning schemes. Invoke the ZeroR classifier using weather.arff. Report the number of correctly classified and misclassified instances both for the training set and cross-validation.
  3. What are baselines used for? Is ZeroR a reasonable baseline? Can you think of other types of baselines?
  4. What is the difference between a development set and a test set?
  5. What is the difference between accuracy and precision? Give small examples for which both precision and accuracy scores differ greatly.

1. I loaded the supplied database file "weather.arff" into Weka and ran the J48 classifier on the supplied training set. J48 is one of Weka's tree classifiers. Besides the textual description of the model, Weka can also display the decision tree built from weather.arff. The results of the J48 classifier are listed below; the correctly and incorrectly classified instance counts are given in red.

  Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
  Relation:     weather
  Instances:    14
  Attributes:   5
                outlook
                temperature
                humidity
                windy
                play
  Test mode:    evaluate on training data

  === Classifier model (full training set) ===

  J48 pruned tree
  ------------------

  outlook = sunny
  |   humidity <= 75: yes (2.0)
  |   humidity > 75: no (3.0)
  outlook = overcast: yes (4.0)
  outlook = rainy
  |   windy = TRUE: no (2.0)
  |   windy = FALSE: yes (3.0)

  Number of Leaves  : 5
  Size of the tree  : 8

  Time taken to build model: 0.01 seconds

  === Evaluation on training set ===
  === Summary ===

  Correctly Classified Instances          14              100      %
  Incorrectly Classified Instances         0                0      %
  Kappa statistic                          1
  Mean absolute error                      0
  Root mean squared error                  0
  Relative absolute error                  0      %
  Root relative squared error              0      %
  Total Number of Instances               14

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         0         1           1        1           1          yes
                 1         0         1           1        1           1          no
  Weighted Avg.  1         0         1           1        1           1

  === Confusion Matrix ===

   a b   <-- classified as
   9 0 | a = yes
   0 5 | b = no
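The same run can also be reproduced through Weka's Java API. The following is a minimal sketch, not the assignment's own code: it assumes weka.jar is on the classpath, that weather.arff is in the working directory, and the class name is illustrative.

  import weka.classifiers.Evaluation;
  import weka.classifiers.trees.J48;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class WeatherJ48 {
      public static void main(String[] args) throws Exception {
          // Load the data; the last attribute ("play") is the class.
          Instances data = DataSource.read("weather.arff");
          data.setClassIndex(data.numAttributes() - 1);

          // Build J48 with its defaults (-C 0.25 -M 2).
          J48 tree = new J48();
          tree.buildClassifier(data);

          // Evaluate on the training data itself, as in question 1.
          Evaluation eval = new Evaluation(data);
          eval.evaluateModel(tree, data);
          System.out.println(tree);                   // the pruned tree
          System.out.println(eval.toSummaryString()); // correct/incorrect counts
      }
  }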
2. Next, the ZeroR classifier was used to classify "weather.arff". First I performed a 10-fold cross-validation. The correctly and incorrectly classified instance counts are given in red.

  Scheme:       weka.classifiers.rules.ZeroR
  Relation:     weather
  Instances:    14
  Attributes:   5
                outlook
                temperature
                humidity
                windy
                play
  Test mode:    10-fold cross-validation

  === Classifier model (full training set) ===

  ZeroR predicts class value: yes

  Time taken to build model: 0 seconds

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances           9               64.2857 %
  Incorrectly Classified Instances         5               35.7143 %
  Kappa statistic                          0
  Mean absolute error                      0.4762
  Root mean squared error                  0.4934
  Relative absolute error                100      %
  Root relative squared error            100      %
  Total Number of Instances               14

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         1         0.643       1        0.783       0.178      yes
                 0         0         0           0        0           0.178      no
  Weighted Avg.  0.643     0.643     0.413       0.643    0.503       0.178

  === Confusion Matrix ===

   a b   <-- classified as
   9 0 | a = yes
   5 0 | b = no
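The cross-validated ZeroR run can be scripted the same way. A sketch under the same assumptions as before (weka.jar on the classpath, illustrative class name), using Weka's Evaluation.crossValidateModel:

  import java.util.Random;

  import weka.classifiers.Evaluation;
  import weka.classifiers.rules.ZeroR;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class ZeroRBaseline {
      public static void main(String[] args) throws Exception {
          Instances data = DataSource.read("weather.arff");
          data.setClassIndex(data.numAttributes() - 1);

          // Stratified 10-fold cross-validation, as in the run above.
          Evaluation eval = new Evaluation(data);
          eval.crossValidateModel(new ZeroR(), data, 10, new Random(1));
          System.out.println(eval.toSummaryString());
      }
  }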
Then I used the option 'use training set' instead of 10-fold cross-validation. The correctly and incorrectly classified instance counts are given in red. The model is listed below:

  Scheme:       weka.classifiers.rules.ZeroR
  Relation:     weather
  Instances:    14
  Attributes:   5
                outlook
                temperature
                humidity
                windy
                play
  Test mode:    evaluate on training data

  === Classifier model (full training set) ===

  ZeroR predicts class value: yes

  Time taken to build model: 0 seconds

  === Evaluation on training set ===
  === Summary ===

  Correctly Classified Instances           9               64.2857 %
  Incorrectly Classified Instances         5               35.7143 %
  Kappa statistic                          0
  Mean absolute error                      0.4643
  Root mean squared error                  0.4795
  Relative absolute error                100      %
  Root relative squared error            100      %
  Total Number of Instances               14

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         1         0.643       1        0.783       0.5        yes
                 0         0         0           0        0           0.5        no
  Weighted Avg.  0.643     0.643     0.413       0.643    0.503       0.5

  === Confusion Matrix ===

   a b   <-- classified as
   9 0 | a = yes
   5 0 | b = no

3. a) A baseline is a simple approach to a given problem against which other approaches are compared, in order to see whether those approaches perform better. Outside data mining the term is also common in business, where a company can define a baseline for its goals or strategy and measure alternative approaches against it.

b) No, it is not a strong baseline. ZeroR simply predicts the most common class (or the median, in the case of numeric values); it only tests how well a class can be predicted without considering any attributes.

c) An example of a better type of baseline is NaiveBayes. This classifier does take the attributes into account, which makes the resulting model more informative: instead of only looking at the most common class, it uses every attribute of the instances, giving a more detailed picture than ZeroR. You can see that in the snippet below:
  === Run information ===

  Scheme:       weka.classifiers.bayes.NaiveBayes
  Relation:     weather
  Instances:    14
  Attributes:   5
                outlook
                temperature
                humidity
                windy
                play
  Test mode:    10-fold cross-validation

  === Classifier model (full training set) ===

  Naive Bayes Classifier

                    Class
  Attribute           yes      no
                   (0.63)  (0.38)
  ===============================
  outlook
    sunny             3.0     4.0
    overcast          5.0     1.0
    rainy             4.0     3.0
    [total]          12.0     8.0

  temperature
    mean          72.9697 74.8364
    std. dev.      5.2304   7.384
    weight sum          9       5
    precision      1.9091  1.9091

  humidity
    mean          78.8395 86.1111
    std. dev.      9.8023  9.2424
    weight sum          9       5
    precision      3.4444  3.4444

  windy
    TRUE              4.0     4.0
    FALSE             7.0     3.0
    [total]          11.0     7.0

  Time taken to build model: 0 seconds

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances           9               64.2857 %
  Incorrectly Classified Instances         5               35.7143 %
  Kappa statistic                          0.1026
  Mean absolute error                      0.4649
  Root mean squared error                  0.543
  Relative absolute error                 97.6254 %
  Root relative squared error            110.051  %
  Total Number of Instances               14

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.889     0.8       0.667       0.889    0.762       0.444      yes
                 0.2       0.111     0.5         0.2      0.286       0.444      no
  Weighted Avg.  0.643     0.554     0.607       0.643    0.592       0.444

  === Confusion Matrix ===

   a b   <-- classified as
   8 1 | a = yes
   4 1 | b = no
4. The training set is used to fit the classifier; the more data we train on, the more accurate the resulting model tends to be. The other two sets are used to evaluate the performance of the classifier. The development set is used to evaluate the accuracy of different configurations of the classifier; it is called the development set because we evaluate classification performance on it continuously while tuning. The test set, by contrast, is only used at the end: it gives an estimate of how well the final model will deal with new, unseen data.

5. Accuracy means getting a result that is close to the actual value/answer/data point. Precision means getting (nearly) the same result on every new prediction for every new data point. For example, in darts, playing accurately means hitting the bullseye; playing precisely means hitting the exact same spot with every throw.
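To make the difference concrete in classification terms, here is a small worked example with made-up counts (they do not come from weather.arff). Suppose a test set has 1000 instances, 10 of them positive, and a classifier predicts 100 positives, among them all 10 true ones (TP = 10, FP = 90, TN = 900, FN = 0):

  accuracy  = (TP + TN) / N   = (10 + 900) / 1000 = 0.91
  precision = TP / (TP + FP)  = 10 / 100          = 0.10

Conversely, on a set of 10 instances with 9 positives, a classifier that predicts positive exactly once, and correctly (TP = 1, FP = 0, FN = 8, TN = 1), has precision 1/1 = 1.0 but accuracy (1 + 1)/10 = 0.2. So the two scores can differ greatly in both directions.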
2. Decision Trees

This assignment uses the WEKA implementation of C4.5, a decision tree learner. To invoke this learner, e.g. using a file called "train.arff" as a training set, you can type:

  java weka.classifiers.trees.J48 -t train.arff

This will construct a decision tree from train.arff and then apply it to train.arff. After that it will perform a 10-fold cross-validation on train.arff.

2.1. Copy the files zoo.arff, zoo.train.arff and zoo.test.arff from Blackboard. This data includes instances of animals described by their features (hairy, feathered, etc.) and classifications of those animals (e.g. mammal, bird, reptile). Invoke the J48 classifier using zoo.train.arff and zoo.test.arff as the training and testing files respectively. Note that zoo.{train,test}.arff together contain the same data as zoo.arff, i.e. the latter was split to create the training and testing sets.

  1. Report the number of correctly and incorrectly classified instances for the test data for Decision Trees.
  2. Include in your report a description of the decision tree constructed by the J48 classifier and explain how the decision tree is used to classify a new instance.
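Assuming weka.jar is on the classpath, the corresponding command-line invocation for this exercise should combine the -t (training file) and -T (test file) options:

  java weka.classifiers.trees.J48 -t zoo.train.arff -T zoo.test.arff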
1. I opened the zoo training file in Weka and ran the J48 classifier, supplying zoo.test.arff as the test set. The numbers of correctly and incorrectly classified instances are given in red. This is the run using the test set:

  === Run information ===

  Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
  Relation:     zoo
  Instances:    81
  Attributes:   18
                animal
                hair
                feathers
                eggs
                milk
                airborne
                aquatic
                predator
                toothed
                backbone
                breathes
                venomous
                fins
                legs
                tail
                domestic
                catsize
                type
  Test mode:    user supplied test set: size unknown (reading incrementally)

  === Classifier model (full training set) ===

  J48 pruned tree
  ------------------

  feathers = false
  |   milk = false
  |   |   toothed = false
  |   |   |   airborne = false: invertebrate (8.0/1.0)
  |   |   |   airborne = true: insect (5.0)
  |   |   toothed = true
  |   |   |   fins = false
  |   |   |   |   legs <= 2: reptile (3.0)
  |   |   |   |   legs > 2: amphibian (3.0)
  |   |   |   fins = true: fish (10.0)
  |   milk = true: mammal (36.0)
  feathers = true: bird (16.0)

  Number of Leaves  : 7
  Size of the tree  : 13

  Time taken to build model: 0.01 seconds

  === Evaluation on test set ===
  === Summary ===

  Correctly Classified Instances          17               85      %
  Incorrectly Classified Instances         3               15      %
  Kappa statistic                          0.8187
  Mean absolute error                      0.0464
  Root mean squared error                  0.1965
  Relative absolute error                 20.0843 %
  Root relative squared error             55.849  %
  Total Number of Instances               20

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         0         1           1        1           1          mammal
                 1         0         1           1        1           1          bird
                 0         0         0           0        0           0.5        reptile
                 1         0         1           1        1           1          fish
                 1         0.053     0.5         1        0.667       0.974      amphibian
                 0.5       0         1           0.5      0.667       0.944      insect
                 1         0.118     0.6         1        0.75        0.941      invertebrate
  Weighted Avg.  0.85      0.02      0.815       0.85     0.813       0.934

  === Confusion Matrix ===

   a b c d e f g   <-- classified as
   5 0 0 0 0 0 0 | a = mammal
   0 4 0 0 0 0 0 | b = bird
   0 0 0 0 1 0 1 | c = reptile
   0 0 0 3 0 0 0 | d = fish
   0 0 0 0 1 0 0 | e = amphibian
   0 0 0 0 0 1 1 | f = insect
   0 0 0 0 0 0 3 | g = invertebrate
We can also evaluate the classifier on the training file alone (and not the test set), in which case Weka performs a 10-fold cross-validation. The numbers of correctly and incorrectly classified instances are given in red:

  === Run information ===

  Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
  Relation:     zoo
  Instances:    81
  Attributes:   18
                animal
                hair
                feathers
                eggs
                milk
                airborne
                aquatic
                predator
                toothed
                backbone
                breathes
                venomous
                fins
                legs
                tail
                domestic
                catsize
                type
  Test mode:    10-fold cross-validation

  === Classifier model (full training set) ===

  J48 pruned tree
  ------------------

  feathers = false
  |   milk = false
  |   |   toothed = false
  |   |   |   airborne = false: invertebrate (8.0/1.0)
  |   |   |   airborne = true: insect (5.0)
  |   |   toothed = true
  |   |   |   fins = false
  |   |   |   |   legs <= 2: reptile (3.0)
  |   |   |   |   legs > 2: amphibian (3.0)
  |   |   |   fins = true: fish (10.0)
  |   milk = true: mammal (36.0)
  feathers = true: bird (16.0)

  Number of Leaves  : 7
  Size of the tree  : 13

  Time taken to build model: 0.03 seconds

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances          75               92.5926 %
  Incorrectly Classified Instances         6                7.4074 %
  Kappa statistic                          0.8987
  Mean absolute error                      0.0232
  Root mean squared error                  0.1465
  Relative absolute error                 10.882  %
  Root relative squared error             45.1077 %
  Total Number of Instances               81

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         0         1           1        1           1          mammal
                 1         0         1           1        1           1          bird
                 0.333     0.013     0.5         0.333    0.4         0.66       reptile
                 1         0.014     0.909       1        0.952       0.993      fish
                 0.667     0.013     0.667       0.667    0.667       0.827      amphibian
                 0.667     0.013     0.8         0.667    0.727       0.818      insect
                 0.857     0.027     0.75        0.857    0.8         0.907      invertebrate
  Weighted Avg.  0.926     0.006     0.921       0.926    0.922       0.959

  === Confusion Matrix ===

    a  b  c  d  e  f  g   <-- classified as
   36  0  0  0  0  0  0 | a = mammal
    0 16  0  0  0  0  0 | b = bird
    0  0  1  1  1  0  0 | c = reptile
    0  0  0 10  0  0  0 | d = fish
    0  0  1  0  2  0  0 | e = amphibian
    0  0  0  0  0  4  2 | f = insect
    0  0  0  0  0  1  6 | g = invertebrate

2. Weka produces the following decision tree:

  feathers = false
  |   milk = false
  |   |   toothed = false
  |   |   |   airborne = false: invertebrate (8.0/1.0)
  |   |   |   airborne = true: insect (5.0)
  |   |   toothed = true
  |   |   |   fins = false
  |   |   |   |   legs <= 2: reptile (3.0)
  |   |   |   |   legs > 2: amphibian (3.0)
  |   |   |   fins = true: fish (10.0)
  |   milk = true: mammal (36.0)
  feathers = true: bird (16.0)
The decision tree is listed above. To classify a new instance, we start at the root and follow the branch that matches the instance's attribute values at each test, until we reach a leaf, which gives the predicted class. The tree contains six different tests:

  1. Does the creature have feathers? (true/false)
  2. Does the creature produce milk? (true/false)
  3. Is the creature toothed? (true/false)
  4. Is the creature airborne? (true/false)
  5. Does the creature have fins? (true/false)
  6. Does the creature have more than 2 legs? (numerical check)

A small sketch of classifying a single new instance this way is given below.
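The sketch uses Weka's Java API under the same assumptions as before (weka.jar on the classpath, illustrative class name); classifyInstance routes one instance from the root test down to a leaf:

  import weka.classifiers.trees.J48;
  import weka.core.Instance;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class ClassifyAnimal {
      public static void main(String[] args) throws Exception {
          Instances train = DataSource.read("zoo.train.arff");
          train.setClassIndex(train.numAttributes() - 1);

          J48 tree = new J48();
          tree.buildClassifier(train);

          // Take one instance from the test file; the tree routes it from
          // the root test (feathers) down to a leaf, which is the prediction.
          Instances test = DataSource.read("zoo.test.arff");
          test.setClassIndex(test.numAttributes() - 1);
          Instance animal = test.instance(0);

          double label = tree.classifyInstance(animal);
          System.out.println(test.classAttribute().value((int) label));
      }
  }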
2.2. If you invoke a WEKA classifier with a training set but no testing set, WEKA will automatically perform a 10-fold cross-validation on the training set and report how many instances are correctly and incorrectly classified when those instances are used as test data during the cross-validation.

  1. When you ran the J48 classifier with the zoo.arff file, a 10-fold cross-validation was performed. Report the number of instances correctly and incorrectly classified during the cross-validation.

I loaded the supplied zoo.arff file and performed a 10-fold cross-validation with the J48 classifier. The numbers of correctly and incorrectly classified instances are given in red.

  === Run information ===

  Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
  Relation:     zoo
  Instances:    101
  Attributes:   18
                animal
                hair
                feathers
                eggs
                milk
                airborne
                aquatic
                predator
                toothed
                backbone
                breathes
                venomous
                fins
                legs
                tail
                domestic
                catsize
                type
  Test mode:    10-fold cross-validation

  === Classifier model (full training set) ===

  J48 pruned tree
  ------------------

  feathers = false
  |   milk = false
  |   |   backbone = false
  |   |   |   airborne = false
  |   |   |   |   predator = false
  |   |   |   |   |   legs <= 2: invertebrate (2.0)
  |   |   |   |   |   legs > 2: insect (2.0)
  |   |   |   |   predator = true: invertebrate (8.0)
  |   |   |   airborne = true: insect (6.0)
  |   |   backbone = true
  |   |   |   fins = false
  |   |   |   |   tail = false: amphibian (3.0)
  |   |   |   |   tail = true: reptile (6.0/1.0)
  |   |   |   fins = true: fish (13.0)
  |   milk = true: mammal (41.0)
  feathers = true: bird (20.0)

  Number of Leaves  : 9
  Size of the tree  : 17

  Time taken to build model: 0.02 seconds

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances          93               92.0792 %
  Incorrectly Classified Instances         8                7.9208 %
  Kappa statistic                          0.8955
  Mean absolute error                      0.0225
  Root mean squared error                  0.14
  Relative absolute error                 10.2478 %
  Root relative squared error             42.4398 %
  Total Number of Instances              101

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         0         1           1        1           1          mammal
                 1         0         1           1        1           1          bird
                 0.6       0.01      0.75        0.6      0.667       0.793      reptile
                 1         0.011     0.929       1        0.963       0.994      fish
                 0.75      0         1           0.75     0.857       0.872      amphibian
                 0.625     0.032     0.625       0.625    0.625       0.92       insect
                 0.8       0.033     0.727       0.8      0.762       0.986      invertebrate
  Weighted Avg.  0.921     0.008     0.922       0.921    0.92        0.976

  === Confusion Matrix ===

    a  b  c  d  e  f  g   <-- classified as
   41  0  0  0  0  0  0 | a = mammal
    0 20  0  0  0  0  0 | b = bird
    0  0  3  1  0  1  0 | c = reptile
    0  0  0 13  0  0  0 | d = fish
    0  0  1  0  3  0  0 | e = amphibian
    0  0  0  0  0  5  3 | f = insect
    0  0  0  0  0  2  8 | g = invertebrate
  2. With the WEKA classifiers, the "-x [value]" option can be used to specify how many folds to use in a cross-validation; e.g. "-x 5" will specify a 5-fold cross-validation. Suppose you wish to perform a "leave one out" cross-validation on the zoo.arff data. How many folds must you specify to achieve this?

Because zoo.arff has 101 instances, we need to perform a total of 101 folds, i.e. invoke the classifier with "-x 101".

2.3. Make a copy of the weather.arff file and modify it to train a classifier with multiple (discrete) target values. The classifier is supposed to assign your favorite sport according to weather conditions (take, for example, "swimming", "badminton" and "none"). Modify the training data according to your settings (create at least 20 training examples).

  1. Train the J48 classifier on the original data in weather.arff and on your modified version and report the number of correctly and incorrectly classified instances during the 10-fold cross-validation.

First I performed the 10-fold cross-validation on weather.arff. The numbers of correctly and incorrectly classified instances are given in red.

  === Run information ===

  Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
  Relation:     weather
  Instances:    14
  Attributes:   5
                outlook
                temperature
                humidity
                windy
                play
  Test mode:    10-fold cross-validation

  === Classifier model (full training set) ===

  J48 pruned tree
  ------------------

  outlook = sunny
  |   humidity <= 75: yes (2.0)
  |   humidity > 75: no (3.0)
  outlook = overcast: yes (4.0)
  outlook = rainy
  |   windy = TRUE: no (2.0)
  |   windy = FALSE: yes (3.0)

  Number of Leaves  : 5
  Size of the tree  : 8

  Time taken to build model: 0 seconds

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances           9               64.2857 %
  Incorrectly Classified Instances         5               35.7143 %
  Kappa statistic                          0.186
  Mean absolute error                      0.2857
  Root mean squared error                  0.4818
  Relative absolute error                 60      %
  Root relative squared error             97.6586 %
  Total Number of Instances               14

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.778     0.6       0.7         0.778    0.737       0.789      yes
                 0.4       0.222     0.5         0.4      0.444       0.789      no
  Weighted Avg.  0.643     0.465     0.629       0.643    0.632       0.789

  === Confusion Matrix ===

   a b   <-- classified as
   7 2 | a = yes
   3 2 | b = no
This tree can be read as follows:

  1. We first check the discrete attribute 'outlook' of the weather, which has three values: sunny, overcast and rainy.
  2. If the outlook is sunny, we check the humidity.
     • If the humidity is lower than or equal to 75, we play.
     • If the humidity is higher than 75, we don't play.
  3. If the outlook is rainy, we check whether it is windy. If it is windy we don't play; if it is not windy we do play.
  4. If the outlook is overcast, we play.

Next I edited the weather.arff file to use discrete class values instead of a true/false value. I chose the sports football (FB), indoor tennis (TN) and 'none'/nothing. The correctly and incorrectly classified instance counts are given in red.

  === Run information ===

  Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
  Relation:     weather
  Instances:    24
  Attributes:   5
                outlook
                temperature
                humidity
                windy
                play
  Test mode:    10-fold cross-validation

  === Classifier model (full training set) ===

  J48 pruned tree
  ------------------

  outlook = sunny
  |   humidity <= 90: FB (5.0)
  |   humidity > 90: TN (3.0)
  outlook = overcast
  |   humidity <= 79
  |   |   temperature <= 85: FB (4.0)
  |   |   temperature > 85: TN (2.0)
  |   humidity > 79: TN (3.0)
  outlook = rainy
  |   humidity <= 79: TN (2.0)
  |   humidity > 79: none (5.0)

  Number of Leaves  : 7
  Size of the tree  : 12

  Time taken to build model: 0 seconds

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances          16               66.6667 %
  Incorrectly Classified Instances         8               33.3333 %
  Kappa statistic                          0.4947
  Mean absolute error                      0.212
  Root mean squared error                  0.4295
  Relative absolute error                 48.9316 %
  Root relative squared error             92.0147 %
  Total Number of Instances               24

  === Detailed Accuracy By Class ===

                 TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.667     0.2       0.667       0.667    0.667       0.763      FB
                 0.5       0.214     0.625       0.5      0.556       0.754      TN
                 1         0.105     0.714       1        0.833       0.984      none
  Weighted Avg.  0.667     0.186     0.659       0.667    0.655       0.805

  === Confusion Matrix ===

   a b c   <-- classified as
   6 3 0 | a = FB
   3 5 2 | b = TN
   0 0 5 | c = none
  2. Include your training data and the corresponding decision tree in your report and comment on its structure.

The training data I used for the model listed above is:

  @relation weather

  @attribute outlook {sunny, overcast, rainy}
  @attribute temperature real
  @attribute humidity real
  @attribute windy {TRUE, FALSE}
  @attribute play {FB, TN, none}

  @data
  sunny, 95, 95, FALSE, TN
  sunny, 65, 65, FALSE, FB
  sunny, 65, 65, TRUE, FB
  sunny, 76, 78, TRUE, FB
  sunny, 80, 85, TRUE, FB
  sunny, 75, 95, FALSE, TN
  sunny, 80, 80, TRUE, FB
  sunny, 60, 95, FALSE, TN
  overcast, 65, 65, TRUE, FB
  overcast, 75, 65, TRUE, FB
  overcast, 85, 90, FALSE, TN
  overcast, 85, 90, TRUE, TN
  overcast, 65, 68, FALSE, FB
  overcast, 90, 65, TRUE, TN
  overcast, 65, 95, FALSE, TN
  overcast, 90, 65, FALSE, TN
  overcast, 85, 65, TRUE, FB
  rainy, 70, 96, FALSE, none
  rainy, 68, 80, FALSE, none
  rainy, 65, 70, TRUE, TN
  rainy, 76, 79, FALSE, TN
  rainy, 74, 96, TRUE, none
  rainy, 60, 80, TRUE, none
  rainy, 74, 96, TRUE, none
When running the J48 classifier on this data, Weka produced the decision tree shown in the previous run. It can be read as follows:

  1. We first check the discrete attribute 'outlook' of the weather, which has three values: sunny, overcast and rainy.
  2. If the outlook is sunny, we check the humidity.
     • If the humidity is lower than or equal to 90, we play football.
     • If the humidity is higher than 90, we play indoor tennis.
  3. If the outlook is overcast, we can play either indoor tennis or football.
     • If the humidity is higher than 79, we play indoor tennis.
     • If the humidity is lower than or equal to 79, it depends on the temperature:
       § If the temperature is lower than or equal to 85, we play football.
       § If the temperature is higher than 85, we play indoor tennis.
  4. If the outlook is rainy, we check how high the humidity is; we either play indoor tennis or stay at home.
     • If the humidity is lower than or equal to 79, we play indoor tennis.
     • If the humidity is higher than 79, we stay at home.

  3. The "-U" option can be used to turn pruning off. What does pruning do and why? What happens to the tree learned from your data when pruning is "off"? Comment on the differences or explain why there is no difference.

With pruning we eliminate branches of the model in order to generalize over parts of the data. If this results in higher accuracy, the pruned version of the model is kept. If pruning is turned off, every branch in the model of the data remains in place.

For my own model there were no changes when turning pruning on and off. I think this is because there are not enough branches (or sub-branches) in my tree for pruning to remove. A tree that keeps every branch describes the training data too well, which leads to overfitting; pruning counteracts that by generalizing. We would need more branches to profit from pruning, and adding more attributes could be a way to get them.
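As a sketch, pruning can be switched off either on the command line or through the Java API, under the same assumptions as the earlier snippets (weka.jar on the classpath, illustrative class name):

  java weka.classifiers.trees.J48 -U -t weather.arff

  import weka.classifiers.trees.J48;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class UnprunedTree {
      public static void main(String[] args) throws Exception {
          Instances data = DataSource.read("weather.arff");
          data.setClassIndex(data.numAttributes() - 1);

          J48 tree = new J48();
          tree.setUnpruned(true);   // same effect as the -U command-line flag
          tree.buildClassifier(data);
          System.out.println(tree); // compare against the pruned tree
      }
  }

On this data the printed tree should come out identical with and without -U, matching the observation above that pruning changes nothing for this small model.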