Interpreting Deep Neural Networks
Based on Decision Trees
University of Aizu
System Intelligence Laboratory
s1240183 Tsukasa Ueno
Supervised by Qiangfu Zhao
Outline
・Background
・Experiment
・Result
・Discussion
・Future Work
Background
・Since the 1980s, Neural Networks (NNs) have been studied and applied successfully to many problems.
・Since the 2010s, Deep Neural Networks (DNNs) have attracted attention for their strong results.
・DNNs are becoming a core technology of machine learning.
・Image recognition, voice recognition, anomaly detection
Background
・However, it is difficult for humans to understand why a DNN produces its outputs.
・This is called "the black-box problem".
・Therefore, it is difficult to apply DNNs to problems that must be solved carefully
・Medicine, politics, judicature, etc.
Background
・3 types of approaches for interpretation
・Decompositional approach [1]
・Transforms each neuron, one by one, into a logic formula
・Computational cost is high (exponential in the number of inputs)
・Pedagogical approach [2][3]
・Uses the trained NN as a teacher and trains another interpretable model, such as a Decision Tree
・Computational cost is low, but generalization ability is poor
[1] H. Tsukimoto, "Extracting rules from trained neural networks," IEEE Transactions on Neural Networks, Vol. 11, No. 2, pp. 377-389, 2000.
[2] S. Ardiansyah, M. A. Majid, and J. M. Zain, "Knowledge of extraction from trained neural network by using decision tree," 2nd IEEE International Conference on Science in Information Technology (ICSITech), pp. 220-225, 2016.
[3] M. Sato and H. Tsukimoto, "Rule extraction from neural networks via decision tree induction," Proceedings of the International Joint Conference on Neural Networks (IJCNN'01), Vol. 3, pp. 1870-1875, 2001.
Background
・The third approach
・Eclectic approach
・Combines the decompositional and pedagogical approaches
・Strikes a balance between computational cost and performance
・Our approach belongs to this category
・The pedagogical approach treats the whole NN as the teacher
・Our approach treats the outputs of a hidden layer as the teacher
Experiment
・This experiment tries to interpret a DNN using a Decision Tree (DT).
・DTs are known as interpretable models.
・We build a DT from the outputs of the hidden neurons of the DNN.
Experiment
・Preceding study: 1-5 hidden layers
・A DT extracted from a hidden layer closer to the output layer can be more accurate
・And the DT can be simpler, in the sense that the number of nodes is smaller
・This shows the possibility of extracting more accurate and more understandable knowledge from a well-trained DNN.
・This study is an extension of the preceding study.
・Here, we study NNs with 1-15 hidden layers using more datasets
・5 layers were not enough to see the trend
Experiment
・Experimental flow
・Step 1: Train the NN using back-propagation
・Step 2: Build a DT from the outputs of the hidden layer closest to the output layer
・Step 3: Add a new hidden layer between the existing layers and the output layer
- Before that, we fix the weights of the existing layers
・We repeat these steps until the number of hidden layers reaches 15
・We would like to confirm how large the difference between the accuracy of the DNNs and the DTs is
・and whether the tree size depends on the number of hidden layers.
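The three-step flow above can be sketched with scikit-learn. This is a hypothetical reconstruction, not the study's code: tanh stands in for the bi-polar sigmoid, and the sketch simply retrains from scratch at each depth instead of freezing the existing layers' weights when a new layer is added.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize for stable SGD
n_features = X.shape[1]

results = []
for n_layers in (1, 2, 3):                 # the study goes up to 15
    # Step 1: train an NN whose hidden width equals the number of features.
    nn = MLPClassifier(hidden_layer_sizes=(n_features,) * n_layers,
                       activation="tanh", solver="sgd",
                       learning_rate_init=0.05, max_iter=300,
                       random_state=0)
    nn.fit(X, y)

    # Step 2: feed the data forward to the last hidden layer by hand.
    H = X
    for W, b in zip(nn.coefs_[:-1], nn.intercepts_[:-1]):
        H = np.tanh(H @ W + b)

    # Extraction: train a tree to mimic the NN's predictions from the
    # hidden representation, then record tree size and fidelity.
    teacher = nn.predict(X)
    dt = DecisionTreeClassifier(random_state=0).fit(H, teacher)
    results.append((n_layers, dt.tree_.node_count, dt.score(H, teacher)))
```

Because the fully grown tree is fit directly on the hidden-layer outputs and the NN's own predictions, its fidelity to the NN on the training data is essentially perfect; the interesting quantity is how the node count changes with depth.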
Experiment
・Datasets
・From the UCI Machine Learning Repository

Data        Classes  Features  Instances
australian        2        14        690
cancer            2        24        683
german            2        24       1000
BHP               4        22       1075
statlog           7        19       2310
wine              3        13        178
Experiment
・NN Settings
・Num of hidden layers: 1 ~ L_max (in this study, L_max = 15)
・Activation function: bi-polar sigmoid
・Solver: SGD
・Learning rate: 0.05
・Num of hidden neurons: same as the number of features of the data
・Validation: 10-fold cross-validation
Result(NN)
・From these results, deep NNs do not improve the performance significantly compared with shallow NNs.
・The only exception is the dataset BHP
・For this dataset, the accuracy becomes approximately 100% when the number of hidden layers is above 6.
Result(DT)
・These results show that the DTs also perform very well for the datasets under concern.
Result(difference between NN and DT)
・The difference in most cases is smaller than 1%
・This means that the DTs can approximate the original NNs very closely.
Result(Tree Size)
・The tree size decreases as the number of hidden layers increases
・When the number of hidden layers reaches a certain point, however, the tree size often stops changing.
・In some cases, the tree size may even increase.
Result(BHP Trees): number of nodes in the DT extracted from each hidden layer

Hidden layer   1   2   3   4   5   6   7-15
Nodes         71  35  17  19  15  11     7
Discussion
・The performance of the NNs is almost the same as that of the DTs.
・When there are enough hidden layers, the tree size does not decrease any further.
・We can use the tree size as a criterion to determine the number of layers needed for solving a given problem.
・For example, for most datasets considered here, the proper number of hidden layers should be less than 6 or 7.
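The tree-size criterion can be illustrated with the BHP node counts from the result slides: pick the smallest depth at which the extracted tree first reaches its minimum size. The helper below is a hypothetical illustration, not part of the study.

```python
# Node counts of the DTs extracted from BHP (taken from the result slides).
nodes_by_layer = {1: 71, 2: 35, 3: 17, 4: 19, 5: 15, 6: 11}
nodes_by_layer.update({k: 7 for k in range(7, 16)})

def layers_needed(nodes):
    """First number of hidden layers at which the minimum tree size is
    reached, i.e. the depth where adding layers stops simplifying the tree."""
    smallest = min(nodes.values())
    return min(layer for layer, n in nodes.items() if n == smallest)

print(layers_needed(nodes_by_layer))   # 7: deeper networks add no simplification
```

For BHP the plateau begins at 7 layers, consistent with the slide's suggestion that around 6 or 7 layers suffice.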
Future Work
・Investigate the effect of training parameters
・number of hidden neurons per layer
・number of epochs
・Experiment with larger datasets or datasets for regression
・Define the meaning of hidden neuron outputs
Thank you for listening
Appendix: per-dataset result charts (australian, cancer, german, BHP, statlog, wine)
Bi-polar sigmoid function
・f(x) = (1 - e^(-bx)) / (1 + e^(-bx))
・In this study, b = 1
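The activation can be written out explicitly (assuming the standard bi-polar sigmoid, since the slide's formula image is not preserved here); it is algebraically identical to tanh(b·x/2).

```python
import math

def bipolar_sigmoid(x, b=1.0):
    """f(x) = (1 - e^(-b*x)) / (1 + e^(-b*x)); maps any real x into (-1, 1)."""
    return (1.0 - math.exp(-b * x)) / (1.0 + math.exp(-b * x))

# Sanity check: equals tanh(b*x / 2) up to floating-point error.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(bipolar_sigmoid(x) - math.tanh(x / 2.0)) < 1e-12
```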