Interpreting Deep Neural Networks
Based on Decision Trees
University of Aizu
System Intelligence Laboratory
s1240183 Tsukasa Ueno
Supervised by Qiangfu Zhao
Outline
・Background
・Experiment
・Result
・Discussion
・Future Work
Background
・Since the 1980s, Neural Networks (NNs) have been studied and applied successfully to many problems.
・Since the 2010s, Deep Neural Networks (DNNs) have attracted attention for their strong results.
・DNNs are becoming a core technology of machine learning.
・Image recognition, voice recognition, anomaly detection
Background
・However, it is difficult for humans to understand why a DNN produces its outputs.
・This is called "the black-box problem".
・Therefore, it is difficult to apply DNNs to problems that must be solved carefully
・Medicine, politics, judicature, etc.
Background
・3 types of approaches for interpretation
・Decompositional approach [1]
・Transforms each neuron, one by one, into a logic formula
・Computational cost is high (exponential in the number of inputs)
・Pedagogical approach [2][3]
・Uses the trained NN as a teacher and trains another interpretable model, such as a Decision Tree
・Computational cost is low, but generalization ability is poor
[1] H. Tsukimoto, "Extracting rules from trained neural networks," IEEE Transactions on Neural Networks, Vol. 11, No. 2, pp. 377-389, 2000.
[2] S. Ardiansyah, M. A. Majid, and J. M. Zain, "Knowledge of extraction from trained neural network by using decision tree," 2nd IEEE International Conference on Science in Information Technology (ICSITech), pp. 220-225, 2016.
[3] M. Sato and H. Tsukimoto, "Rule extraction from neural networks via decision tree induction," Proceedings of the International Joint Conference on Neural Networks (IJCNN'01), Vol. 3, pp. 1870-1875, 2001.
Background
・The third approach
・Eclectic approach
・Combines the decompositional and pedagogical approaches
・Strikes a balance between computational cost and performance
・Our approach belongs to this category
・The pedagogical approach treats the whole NN as the teacher
・Our approach treats the outputs of a hidden layer as the teacher
Experiment
・This experiment tries to interpret a DNN using a Decision Tree (DT).
・DTs are known as interpretable models.
・We build a DT from the outputs of the hidden neurons of the DNN.
Experiment
・Preceding study: 1-5 hidden layers
・A DT extracted from a hidden layer closer to the output layer can be more accurate
・And the DT can be simpler, in the sense that the number of nodes is smaller
・This shows the possibility of extracting more accurate and more understandable knowledge from a well-trained DNN.
・This study is an extension of the preceding study.
・Here, we study NNs with 1-15 hidden layers using more datasets
・5 layers were not enough to see the trend
Experiment
・Experimental flow
・Step 1: Train the NN using back-propagation
・Step 2: Build a DT from the outputs of the hidden layer closest to the output layer
・Step 3: Add a new hidden layer between the existing layers and the output layer
- Before that, we fix the weights of the existing layers
・We repeat these steps until the number of hidden layers reaches 15
・We would like to confirm how large the difference between the accuracy of the DNNs and the DTs is
・and whether the tree size depends on the number of hidden layers.
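The three-step flow above can be sketched with scikit-learn. This is a hypothetical reconstruction, not the study's code: tanh stands in for the bi-polar sigmoid, and the sketch simply retrains from scratch at each depth instead of freezing the existing layers' weights when a new layer is added.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize for stable SGD
n_features = X.shape[1]

results = []
for n_layers in (1, 2, 3):                 # the study goes up to 15
    # Step 1: train an NN whose hidden width equals the number of features.
    nn = MLPClassifier(hidden_layer_sizes=(n_features,) * n_layers,
                       activation="tanh", solver="sgd",
                       learning_rate_init=0.05, max_iter=300,
                       random_state=0)
    nn.fit(X, y)

    # Step 2: feed the data forward to the last hidden layer by hand.
    H = X
    for W, b in zip(nn.coefs_[:-1], nn.intercepts_[:-1]):
        H = np.tanh(H @ W + b)

    # Extraction: train a tree to mimic the NN's predictions from the
    # hidden representation, then record tree size and fidelity.
    teacher = nn.predict(X)
    dt = DecisionTreeClassifier(random_state=0).fit(H, teacher)
    results.append((n_layers, dt.tree_.node_count, dt.score(H, teacher)))
```

Because the fully grown tree is fit directly on the hidden-layer outputs and the NN's own predictions, its fidelity to the NN on the training data is essentially perfect; the interesting quantity is how the node count changes with depth.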
Experiment
・Datasets
・From the UCI Machine Learning Repository

Data        Classes  Features  Instances
australian        2        14        690
cancer            2        24        683
german            2        24       1000
BHP               4        22       1075
statlog           7        19       2310
wine              3        13        178
Experiment
・NN Settings
・Num of hidden layers: 1 ~ L_max (in this study, L_max = 15)
・Activation function: bi-polar sigmoid
・Solver: SGD
・Learning rate: 0.05
・Num of hidden neurons: same as the number of features of the data
・Validation: 10-fold cross-validation
Result(NN)
・From these results, deep NNs do not improve the performance significantly compared with shallow NNs.
・The only exception is the dataset BHP
・For this dataset, the accuracy becomes approximately 100% when the number of hidden layers is above 6.
Result(DT)
・These results show that the DTs also perform very well for the datasets under concern.
Result(difference between NN and DT)
・The difference in most cases is smaller than 1%
・This means that the DTs can approximate the original NNs very closely.
Result(Tree Size)
・The tree size decreases as the number of hidden layers increases
・When the number of hidden layers reaches a certain point, however, the tree size often stops changing.
・In some cases, the tree size may even increase.
Result(BHP Trees): number of nodes in the DT extracted from each hidden layer

Hidden layer   1   2   3   4   5   6   7-15
Nodes         71  35  17  19  15  11     7
Discussion
・The performance of the NNs is almost the same as that of the DTs.
・When there are enough hidden layers, the tree size does not decrease any further.
・We can use the tree size as a criterion to determine the number of layers needed for solving a given problem.
・For example, for most datasets considered here, the proper number of hidden layers should be less than 6 or 7.
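The tree-size criterion can be illustrated with the BHP node counts from the result slides: pick the smallest depth at which the extracted tree first reaches its minimum size. The helper below is a hypothetical illustration, not part of the study.

```python
# Node counts of the DTs extracted from BHP (taken from the result slides).
nodes_by_layer = {1: 71, 2: 35, 3: 17, 4: 19, 5: 15, 6: 11}
nodes_by_layer.update({k: 7 for k in range(7, 16)})

def layers_needed(nodes):
    """First number of hidden layers at which the minimum tree size is
    reached, i.e. the depth where adding layers stops simplifying the tree."""
    smallest = min(nodes.values())
    return min(layer for layer, n in nodes.items() if n == smallest)

print(layers_needed(nodes_by_layer))   # 7: deeper networks add no simplification
```

For BHP the plateau begins at 7 layers, consistent with the slide's suggestion that around 6 or 7 layers suffice.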
Future Work
・Investigate the effect of training parameters
・number of hidden neurons per layer
・number of epochs
・Experiment with larger datasets or datasets for regression
・Define the meaning of hidden neuron outputs
Thank you for listening
Appendix: per-dataset result charts (australian, cancer, german, BHP, statlog, wine)
Bi-polar sigmoid function
・f(x) = (1 - e^(-bx)) / (1 + e^(-bx))
・In this study, b = 1
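The activation can be written out explicitly (assuming the standard bi-polar sigmoid, since the slide's formula image is not preserved here); it is algebraically identical to tanh(b·x/2).

```python
import math

def bipolar_sigmoid(x, b=1.0):
    """f(x) = (1 - e^(-b*x)) / (1 + e^(-b*x)); maps any real x into (-1, 1)."""
    return (1.0 - math.exp(-b * x)) / (1.0 + math.exp(-b * x))

# Sanity check: equals tanh(b*x / 2) up to floating-point error.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(bipolar_sigmoid(x) - math.tanh(x / 2.0)) < 1e-12
```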