SlideShare a Scribd company logo
Towards Better Analysis of
Deep Convolutional Neural Networks
Mengchen Liu, Jiaxin Shi, Zhen Li, Chongxuan Li, Jun Zhu, Shixia Liu
IEEE VAST 2016
2
IEEE VAST 2016
• Visualizing Social Media Content with SentenTree
• http://ieeevis.org/year/2016/info/overview-amp-topics/papers
3
INTRODUCTION
• deep CNNs have been employed in deep reinforcement learning to extract
robust representations and which has led to humanlevel performance in
intelligent tasks such as Atari games and the game of Go.
• However, a deep CNN is often treated as a “black box”, It is generally difficult
for machine learning experts to understand the role of each component
(neuron, connection) due to the large number of interacting, non-linear parts
in a CNN.
• For example, training a single deep CNN on a large dataset may take several
days or even weeks.
4
INTRODUCTION
• In most cases, it is hard to summarize reusable knowledge from a failed or succ
essful training case and transfer it to the development of other relevant deep le
arning models.
• To tackle these challenges, we have developed an interactive, visual analytics sy
stem called CNNVis, which aims to help machine learning experts better unders
tand, diagnose, and refine CNNs.
5
INTRODUCTION
• The key technical contributions of this work are:
1. A visual analytics system that helps experts understand, diagnose, and
refine deep CNNs.
2. A hybrid visualization that combines a DAG with rectangle packing, matrix
visualization, and a biclustering-based edge bundling method.
6
RELATED WORK
• Existing methods can be classified into two categories, namely, code inversion
and activation maximization.
• The code inversion method synthesizes an image from the activation vector of a
specific layer, which is produced by a real image. For example, Zeiler [59]
utilized a multi-layered Deconvolutional Network to project the activations ont
o the input pixel space.
( Visualizing and Understanding Convolutional Networks 2014, Matthew D.
Zeiler , Rob Fergus, Dept. of Computer Science, New York University)
• However, a simple projection without any consideration for priors will produce i
mages that do not resemble natural images.
7
RELATED WORK
• The activation maximization method aims to find an image that maximally
activates a given neuron. It can be modeled as an optimization problem over
the image space.
• As a result, most activation maximization methods focus on defining the regular
ization term using natural image priors.
• Visualizing deep convolutional neural networks using natural pre-images
(Aravindh Mahendran , Andrea Vedaldi University of Oxford April 15, 2016)
• Understanding Neural Networks Through Deep Visualization
(Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson)
BACKGROUND
• The typical architecture of a CNN.
• there are two kinds of hidden layers: convolutional layers and pooling layers.
8
BACKGROUND
• convolutional layers:
W =
1 0 1
0 1 0
1 0 1
內積
9
BACKGROUND
• Pooling layers:
10
BACKGROUND
• Activation function
• An activation function is a non-linear transformation that can prevent CNNs
from learning trivial linear combinations of the inputs
• ReLU : f(x)=max(0,x)
11
BACKGROUND
12
BACKGROUND
• Softmax:
13
BACKGROUND
• Loss function: A loss function is used to evaluate the difference between the
output of a CNN and a true image label (i.e., the loss).
14
BACKGROUND
• The typical architecture of a CNN:
• Intro To CNNs:
• https://classroom.udacity.com/courses/ud730/lessons/6377263405/concepts/6374
1833610923#.
15
CNNVIS
• CNNVis was designed with a team of deep learning experts (six researchers)
over the course of twelve months. For simplicity’s sake, we denote these
experts as Ei (i = 1,2,··· ,6).
• Common deep learning frameworks include Caffe, Theano , Torch , and
TensorFlow . Researchers use these frameworks to train, debug, and deploy CN
Ns.
16
Requirement Analysis
• Providing an overview of the learned features of neurons.
• Interactively modifying the neuron clustering results
• Exploring multiple facets of neurons.
• Revealing how low-level features are aggregated into highlevel features.
• Examining the debugging information.
17
System Overview
• The list of requirements have motivated us to develop a visual analytics system,
CNNVis. It consists of the following components:
18
System Overview
1. A DAG formulation module to convert a CNN to a DAG and to aggregate
neurons and layers for an overview.
2. A neuron cluster visualization module to disclose the multiple facets of each
neuron.
3. A biclustering-based edge bundling to reduce visual clutter caused by a large
number of connections.
4. An interaction module that provides a set of interactions such as interactive
clustering result modification and showing debug information on demand.
19
Dag Formulation
• Layer Clustering:
experts are interested in the output of an activation layer instead of that of a
convolutional layer. As the outputs of these two layers have a one-to-one
mapping relationship, we then merge these two layers and simply show the out
put of the activation layer
20
Dag Formulation
• Neuron Clustering:
We assume that neurons with similar activations have similar roles. Thus, we
aggregate the activations into an average activation vector over the set of
classes in the training set.
• In CNNVis, we employ two widely used clustering methods, K-Means and Mean
Shift.
21
Neuron Cluster Visualization
• Computing learned features of neurons:
We employ the method used in [15] to compute the learned feature of a
neuron because it is fast and the results are easier to understand.
• R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich
feature hierarchies for accurate object detection and semantic segmentation
• we formulate the layout of image patches as a rectangle packing problem,
aiming to pack the given rectangles into an enclosing rectangle of a minimum
area. We use the size of an image patch to encode the importance of the
corresponding neuron
22
Neuron Cluster Visualization
• Existing rectangle packing algorithms can handle a small number of rectangles
well (e.g., 15 rectangles in less than 0.1s). However, the computing time grows
exponentially as the number of packed rectangles increases (e.g., 25 rectangles
in more than one hour)
• To solve this problem, we have developed a hierarchical rectangle packing
algorithm. Specifically, our algorithm contains the following steps:
分類聚層(wiki解釋) Treemap 布局 矩陣打包算法
23
Activations as Matrix Visualization
• In particular, the color of a cell in the i-th row and j-th column represents the av
erage activation of the i-th neuron ni in class cj.
• The order of columns (classes) should be consistent in different neuron clusters.
As a result, we only reorder the rows (neurons) in the matrix
24
Biclustering-based Edge Bundling
• we developed a biclustering-based edge bundling method to bundle edges with
both similar and large absolute weights. For a given layer, a bicluster is a subset
of input neuron clusters and a subset of output neuron clusters.
• Biclustering :let maximum value wmax in W = {wi j
pos} ∪ {|wi j
neg |}. If wmax ∈
{wi j
pos }, then we select the edges satisfying: |wi j
pos −wmax| < τ.
• Edge Bundling:. Inspired by BiSet , we also add an “in-between” layer between
the input neuron clusters and the output neuron clusters.
25
26
Overview - BaseCNN
• BaseCNN was designed based on a widely used deep which is often used in
image classification.
• BaseCNN consists of 10 convolutional layers and two fully connected layers.
The convolutional layers are organized into four groups, containing 2, 2, 3, and
3 convolutional layers, respectively. Each group is ended with a max-pooling
layer.
27
Overview - BaseCNN
• Framework: caffe.
• Dataset: CIFAR10.
• The BaseCNN model achieves 11.32% error on the test set.
28
Design of Case Studies
• We have worked closely with the expert team to design three case studies from
their current research on CNNs.
• based on BaseCNN, the expert team constructed several variants and aimed to
study the influence of the network architecture on the performance.
• Second, the expert team required to diagnose a training process that failed to
converge. For example, E3 changed the output activation function and the loss
function of BaseCNN. However,the training failed. The expert team wanted to
diagnose the training process and find potential issues.
29
Case Study: Influence of Network Architecture
• Network Depth: E2 further investigated how the depth of the network affects
the features detected by the neurons.
• He compared BaseCNN with two variant models: (1)ShallowCNN, which cuts off
the 4-th group of convolutional layers, and (2)DeepCNN, which doubles the nu
mber of convolutional layers.
Error ConvLayers Layers
ShallowCNN 11.94% 7 30
BaseCNN 11.33% 10 40
DeepCNN 14.77% 20 70
Table 1. Performance comparison between CNNs with different depth. “#ConvLayers” is the
number of convolutional layers and “#Layers” is the number of layers that can be visualized.
30
Case Study: Influence of Network Architecture
• In ShallowCNN, he identified that there
were indeed a lot more “impure” clusters
in the top convolutional layers compared
to those in BaseCNN .
• which indicates that a model without a
sufficiently large depth is often incapable
of distinguishing the images, which can l
ead to a decrease of the performance.
<= Fig. 13(a), Influence of the model depth:
high level features of a shallowCNN
31
• In DeepCNN, expert E2 noticed that almost all the
weights in 4-th group were positive (Fig. 13 (b)).
• By further expanding the 4-th group of convolutional l
ayers, E2 identified several consecutive layers that
have a similar pattern (Fig. 14).
• such redundancy may hurt overall performance and
make the learning process computationally
expensive and statistically ineffective.
Case Study: Influence of Network Architecture
<= Fig. 13(b) , A convolutional layer whose weights
are almost positive in DeepCNN.
32
Case Study: Influence of Network Architecture
• Network Width: Another important factor that influences performance is the
width of a CNN. To have a comprehensive understanding of its influence, there
are several variants of BaseCNN, named by BaseCNN×w.
Error #params Training loss Testing loss
BaseCNN×4 12.33% 4.22M 0.04 0.51
BaseCNN×2 11.47% 2.11M 0.07 0.43
BaseCNN 11.33% 1.05M 0.16 0.40
BaseCNN×0.5 12.61% 0.53M 0.34 0.40
BaseCNN×0.25 17.39% 0.26M 0.65 0.53
Table 2. Performance comparison between CNNs with different widths. #params is the number
of parameters in the model, which is measured in millions.
33
Case Study: Influence of Network Architecture
• E2 wanted to examine the influence of overfitting on CNNs, He visualized
BaseCNN×4 with our visual analytics system. After examining the higher level
features, the expert did not found much difference compared to BaseCNN.
• Then he switched to examine low level features. He instantly found that several
neurons learn to detect almost the same features (Fig. 15 (a))
34
Case Study: Influence of Network Architecture
• We often use a quantitative criterion (e.g., accuracy) to evaluate the quality of
a model. However, a quantitative criterion itself cannot provide sufficient intuiti
on and clear guidelines. Even I know a CNN overfits, it is hard to decide which la
yer to narrow down or remove. While CNNVis can guide me to locate the candi
date layers, which is very useful in my research.
35
Case Study: Influence of Network Architecture
• E2 then compared the performance of BaseCNN with narrower networks
(BaseCNN×0.5 and BaseCNN×0.25).
• The expert selected two similar classes,
automobile and truck, to examine the
patterns of the relevant neurons.
After analyzing low level features,
he did not find much difference
compared to BaseCNN.
• Thus, he switched his attention to
high level features, and he found
that there were several
“impure” neuron clusters.
36
Case Study: Training Diagnosis
• He wanted to find what influence the negative weights had on the model.
expert E3 switched to examine the activation matrix. He spotted some neuron
clusters where all the neurons had zero activations on all classes.
• To further study this phenomenon, he sequentially expanded the second,
third, and fourth groups of convolutional layers. He found that the ratio of
neurons with zero activations became larger and larger from the lower layers
to the higher layers (Fig. 17).
• The activation functions of these neurons are ReLUs. He continued to zoom in
and further examine the inputs fed into the ReLUs, which he found were always
negative. If the input of a ReLU is less than zero, it generates a zero activation.
37
Case Study: Training Diagnosis
38
Case Study: Training Diagnosis
• Having learned why the training process got stuck, expert E3 proposed a
method to force the network away from that situation.
• He added a batch-normalization layer after each convolutional and fully-connec
ted layer, before the ReLU activation function. With batchnormalization, the
input fed into the ReLUs should no longer be mostly negative.
• The improved model achieved an average error of 9.43% on the CIFAR-10
dataset.
• “I have investigated this problem for a long time and inserted all kinds of code
fragments to print the debugging information during training. However, after
many unsuccessful attempts reading the debugging information, I eventually
gave up. It is awesome to have a toolkit like CNNVis, which intuitively illustrates
the training statistics and allows me to explore the training process from multip
le perspectives.”
39
• Three case studies were conducted to demonstrate the effectiveness and
usefulness of the system for comprehensive analysis of CNNs.
• Currently, CNNVis focuses on analyzing a snapshot of the CNN which is for
offline analysis. All the experts expressed the need to integrate CNNVis with the
online training process and continuously get an update of the training status.
• A key issue is the difficulty of selecting representative snapshots and
comparing them effectively.
• Another interesting venue for future work is to apply CNNVis to other types of
deep models that cannot be formulated as a DAG, such as recurrent neural net
work (RNN).
CONCLUSION
40
• http://shixialiu.com/publications/cnnvis/demo/
Demo
41
Thank you
NCCU DCT, Effy Tseng, 20170111

More Related Content

What's hot

Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Vishal Mishra
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkItachi SK
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural NetworkJunho Cho
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksWilly Marroquin (WillyDevNET)
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learningViet-Trung TRAN
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKMd Rajib Bhuiyan
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Dmytro Mishkin
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksWilly Marroquin (WillyDevNET)
 

What's hot (20)

Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
 
Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.Convolutional Neural Network and RNN for OCR problem.
Convolutional Neural Network and RNN for OCR problem.
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
Convolutional Neural Network
Convolutional Neural NetworkConvolutional Neural Network
Convolutional Neural Network
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional Networks
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORK
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks
 

Similar to Towards better analysis of deep convolutional neural networks

04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptxZainULABIDIN496386
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용홍배 김
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptxnitin571047
 
A Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingA Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingShivam Sawhney
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSitakanta Mishra
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
Teach a neural network to read handwriting
Teach a neural network to read handwritingTeach a neural network to read handwriting
Teach a neural network to read handwritingVipul Kaushal
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Randa Elanwar
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognitionvatsal199567
 
final_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxfinal_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxshwetabhagat25
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun YooJaeJun Yoo
 
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Alex Conway
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 

Similar to Towards better analysis of deep convolutional neural networks (20)

04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Dl
DlDl
Dl
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptx
 
A Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingA Neural Network that Understands Handwriting
A Neural Network that Understands Handwriting
 
Lecture-7 Applied ML.pptx
Lecture-7 Applied ML.pptxLecture-7 Applied ML.pptx
Lecture-7 Applied ML.pptx
 
Deep learning
Deep learning Deep learning
Deep learning
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
DL.pdf
DL.pdfDL.pdf
DL.pdf
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
Teach a neural network to read handwriting
Teach a neural network to read handwritingTeach a neural network to read handwriting
Teach a neural network to read handwriting
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
 
final_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxfinal_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptx
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
 
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 

Recently uploaded

0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic AbusersOWASP Beja
 
05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community Networking05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community NetworkingMichael Orias
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesIP ServerOne
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Orkestra
 
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfOracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfSkillCertProExams
 
Hi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptxHi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptxShivamM16
 
The Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDFThe Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDFRahsaan L. Browne
 
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22LHelferty
 
527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdf527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdfrajpreetkaur75080
 
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...Rahsaan L. Browne
 
123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptx123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptxgargh1099
 
Introduction of Biology in living organisms
Introduction of Biology in living organismsIntroduction of Biology in living organisms
Introduction of Biology in living organismssoumyapottola
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerVladimir Samoylov
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationAccess Innovations, Inc.
 

Recently uploaded (14)

0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
 
05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community Networking05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community Networking
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
 
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfOracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
 
Hi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptxHi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptx
 
The Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDFThe Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDF
 
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
 
527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdf527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdf
 
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
 
123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptx123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptx
 
Introduction of Biology in living organisms
Introduction of Biology in living organismsIntroduction of Biology in living organisms
Introduction of Biology in living organisms
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
 

Towards better analysis of deep convolutional neural networks

  • 1. Towards Better Analysis of Deep Convolutional Neural Networks Mengchen Liu, Jiaxin Shi, Zhen Li, Chongxuan Li, Jun Zhu, Shixia Liu IEEE VAST 2016
  • 2. 2 IEEE VAST 2016 • Visualizing Social Media Content with SentenTree • http://ieeevis.org/year/2016/info/overview-amp-topics/papers
  • 3. 3 INTRODUCTION • deep CNNs have been employed in deep reinforcement learning to extract robust representations and which has led to humanlevel performance in intelligent tasks such as Atari games and the game of Go. • However, a deep CNN is often treated as a “black box”, It is generally difficult for machine learning experts to understand the role of each component (neuron, connection) due to the large number of interacting, non-linear parts in a CNN. • For example, training a single deep CNN on a large dataset may take several days or even weeks.
  • 4. 4 INTRODUCTION • In most cases, it is hard to summarize reusable knowledge from a failed or succ essful training case and transfer it to the development of other relevant deep le arning models. • To tackle these challenges, we have developed an interactive, visual analytics sy stem called CNNVis, which aims to help machine learning experts better unders tand, diagnose, and refine CNNs.
  • 5. 5 INTRODUCTION • The key technical contributions of this work are: 1. A visual analytics system that helps experts understand, diagnose, and refine deep CNNs. 2. A hybrid visualization that combines a DAG with rectangle packing, matrix visualization, and a biclustering-based edge bundling method.
  • 6. 6 RELATED WORK • Existing methods can be classified into two categories, namely, code inversion and activation maximization. • The code inversion method synthesizes an image from the activation vector of a specific layer, which is produced by a real image. For example, Zeiler [59] utilized a multi-layered Deconvolutional Network to project the activations ont o the input pixel space. ( Visualizing and Understanding Convolutional Networks 2014, Matthew D. Zeiler , Rob Fergus, Dept. of Computer Science, New York University) • However, a simple projection without any consideration for priors will produce i mages that do not resemble natural images.
  • 7. 7 RELATED WORK • The activation maximization method aims to find an image that maximally activates a given neuron. It can be modeled as an optimization problem over the image space. • As a result, most activation maximization methods focus on defining the regular ization term using natural image priors. • Visualizing deep convolutional neural networks using natural pre-images (Aravindh Mahendran , Andrea Vedaldi University of Oxford April 15, 2016) • Understanding Neural Networks Through Deep Visualization (Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson)
  • 8. BACKGROUND • The typical architecture of a CNN. • there are two kinds of hidden layers: convolutional layers and pooling layers. 8
  • 9. BACKGROUND • convolutional layers: W = 1 0 1 0 1 0 1 0 1 內積 9
  • 11. BACKGROUND • Activation function • An activation function is a non-linear transformation that can prevent CNNs from learning trivial linear combinations of the inputs • ReLU : f(x)=max(0,x) 11
  • 14. BACKGROUND • Loss function: A loss function is used to evaluate the difference between the output of a CNN and a true image label (i.e., the loss). 14
  • 15. BACKGROUND • The typical architecture of a CNN: • Intro To CNNs: • https://classroom.udacity.com/courses/ud730/lessons/6377263405/concepts/6374 1833610923#. 15
  • 16. CNNVIS • CNNVis was designed with a team of deep learning experts (six researchers) over the course of twelve months. For simplicity’s sake, we denote these experts as Ei (i = 1,2,··· ,6). • Common deep learning frameworks include Caffe, Theano , Torch , and TensorFlow . Researchers use these frameworks to train, debug, and deploy CN Ns. 16
  • 17. Requirement Analysis • Providing an overview of the learned features of neurons. • Interactively modifying the neuron clustering results • Exploring multiple facets of neurons. • Revealing how low-level features are aggregated into highlevel features. • Examining the debugging information. 17
  • 18. System Overview • The list of requirements have motivated us to develop a visual analytics system, CNNVis. It consists of the following components: 18
  • 19. System Overview 1. A DAG formulation module to convert a CNN to a DAG and to aggregate neurons and layers for an overview. 2. A neuron cluster visualization module to disclose the multiple facets of each neuron. 3. A biclustering-based edge bundling to reduce visual clutter caused by a large number of connections. 4. An interaction module that provides a set of interactions such as interactive clustering result modification and showing debug information on demand. 19
  • 20. Dag Formulation • Layer Clustering: experts are interested in the output of an activation layer instead of that of a convolutional layer. As the outputs of these two layers have a one-to-one mapping relationship, we then merge these two layers and simply show the out put of the activation layer 20
  • 21. Dag Formulation • Neuron Clustering: We assume that neurons with similar activations have similar roles. Thus, we aggregate the activations into an average activation vector over the set of classes in the training set. • In CNNVis, we employ two widely used clustering methods, K-Means and Mean Shift. 21
  • 22. Neuron Cluster Visualization • Computing learned features of neurons: We employ the method used in [15] to compute the learned feature of a neuron because it is fast and the results are easier to understand. • R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation • we formulate the layout of image patches as a rectangle packing problem, aiming to pack the given rectangles into an enclosing rectangle of a minimum area. We use the size of an image patch to encode the importance of the corresponding neuron 22
  • 23. Neuron Cluster Visualization • Existing rectangle packing algorithms can handle a small number of rectangles well (e.g., 15 rectangles in less than 0.1s). However, the computing time grows exponentially as the number of packed rectangles increases (e.g., 25 rectangles in more than one hour) • To solve this problem, we have developed a hierarchical rectangle packing algorithm. Specifically, our algorithm contains the following steps: 分類聚層(wiki解釋) Treemap 布局 矩陣打包算法 23
  • 24. Activations as Matrix Visualization • In particular, the color of a cell in the i-th row and j-th column represents the av erage activation of the i-th neuron ni in class cj. • The order of columns (classes) should be consistent in different neuron clusters. As a result, we only reorder the rows (neurons) in the matrix 24
  • 25. Biclustering-based Edge Bundling • we developed a biclustering-based edge bundling method to bundle edges with both similar and large absolute weights. For a given layer, a bicluster is a subset of input neuron clusters and a subset of output neuron clusters. • Biclustering :let maximum value wmax in W = {wi j pos} ∪ {|wi j neg |}. If wmax ∈ {wi j pos }, then we select the edges satisfying: |wi j pos −wmax| < τ. • Edge Bundling:. Inspired by BiSet , we also add an “in-between” layer between the input neuron clusters and the output neuron clusters. 25
  • 26. 26 Overview - BaseCNN • BaseCNN was designed based on a widely used deep which is often used in image classification. • BaseCNN consists of 10 convolutional layers and two fully connected layers. The convolutional layers are organized into four groups, containing 2, 2, 3, and 3 convolutional layers, respectively. Each group is ended with a max-pooling layer.
  • 27. 27 Overview - BaseCNN • Framework: caffe. • Dataset: CIFAR10. • The BaseCNN model achieves 11.32% error on the test set.
  • 28. 28 Design of Case Studies • We have worked closely with the expert team to design three case studies from their current research on CNNs. • based on BaseCNN, the expert team constructed several variants and aimed to study the influence of the network architecture on the performance. • Second, the expert team required to diagnose a training process that failed to converge. For example, E3 changed the output activation function and the loss function of BaseCNN. However,the training failed. The expert team wanted to diagnose the training process and find potential issues.
  • 29. 29 Case Study: Influence of Network Architecture • Network Depth: E2 further investigated how the depth of the network affects the features detected by the neurons. • He compared BaseCNN with two variant models: (1)ShallowCNN, which cuts off the 4-th group of convolutional layers, and (2)DeepCNN, which doubles the nu mber of convolutional layers. Error ConvLayers Layers ShallowCNN 11.94% 7 30 BaseCNN 11.33% 10 40 DeepCNN 14.77% 20 70 Table 1. Performance comparison between CNNs with different depth. “#ConvLayers” is the number of convolutional layers and “#Layers” is the number of layers that can be visualized.
  • 30. 30 Case Study: Influence of Network Architecture • In ShallowCNN, he identified that there were indeed a lot more “impure” clusters in the top convolutional layers compared to those in BaseCNN . • which indicates that a model without a sufficiently large depth is often incapable of distinguishing the images, which can l ead to a decrease of the performance. <= Fig. 13(a), Influence of the model depth: high level features of a shallowCNN
  • 31. 31 • In DeepCNN, expert E2 noticed that almost all the weights in 4-th group were positive (Fig. 13 (b)). • By further expanding the 4-th group of convolutional l ayers, E2 identified several consecutive layers that have a similar pattern (Fig. 14). • such redundancy may hurt overall performance and make the learning process computationally expensive and statistically ineffective. Case Study: Influence of Network Architecture <= Fig. 13(b) , A convolutional layer whose weights are almost positive in DeepCNN.
  • 32. 32 Case Study: Influence of Network Architecture • Network Width: Another important factor that influences performance is the width of a CNN. To have a comprehensive understanding of its influence, there are several variants of BaseCNN, named by BaseCNN×w. Error #params Training loss Testing loss BaseCNN×4 12.33% 4.22M 0.04 0.51 BaseCNN×2 11.47% 2.11M 0.07 0.43 BaseCNN 11.33% 1.05M 0.16 0.40 BaseCNN×0.5 12.61% 0.53M 0.34 0.40 BaseCNN×0.25 17.39% 0.26M 0.65 0.53 Table 2. Performance comparison between CNNs with different widths. #params is the number of parameters in the model, which is measured in millions.
  • 33. 33 Case Study: Influence of Network Architecture • E2 wanted to examine the influence of overfitting on CNNs, He visualized BaseCNN×4 with our visual analytics system. After examining the higher level features, the expert did not found much difference compared to BaseCNN. • Then he switched to examine low level features. He instantly found that several neurons learn to detect almost the same features (Fig. 15 (a))
  • 34. 34 Case Study: Influence of Network Architecture • We often use a quantitative criterion (e.g., accuracy) to evaluate the quality of a model. However, a quantitative criterion itself cannot provide sufficient intuiti on and clear guidelines. Even I know a CNN overfits, it is hard to decide which la yer to narrow down or remove. While CNNVis can guide me to locate the candi date layers, which is very useful in my research.
  • 35. 35 Case Study: Influence of Network Architecture • E2 then compared the performance of BaseCNN with narrower networks (BaseCNN×0.5 and BaseCNN×0.25). • The expert selected two similar classes, automobile and truck, to examine the patterns of the relevant neurons. After analyzing low level features, he did not find much difference compared to BaseCNN. • Thus, he switched his attention to high level features, and he found that there were several “impure” neuron clusters.
  • 36. 36 Case Study: Training Diagnosis • He wanted to find what influence the negative weights had on the model. expert E3 switched to examine the activation matrix. He spotted some neuron clusters where all the neurons had zero activations on all classes. • To further study this phenomenon, he sequentially expanded the second, third, and fourth groups of convolutional layers. He found that the ratio of neurons with zero activations became larger and larger from the lower layers to the higher layers (Fig. 17). • The activation functions of these neurons are ReLUs. He continued to zoom in and further examine the inputs fed into the ReLUs, which he found were always negative. If the input of a ReLU is less than zero, it generates a zero activation.
  • 38. 38 Case Study: Training Diagnosis • Having learned why the training process got stuck, expert E3 proposed a method to force the network away from that situation. • He added a batch-normalization layer after each convolutional and fully-connec ted layer, before the ReLU activation function. With batchnormalization, the input fed into the ReLUs should no longer be mostly negative. • The improved model achieved an average error of 9.43% on the CIFAR-10 dataset. • “I have investigated this problem for a long time and inserted all kinds of code fragments to print the debugging information during training. However, after many unsuccessful attempts reading the debugging information, I eventually gave up. It is awesome to have a toolkit like CNNVis, which intuitively illustrates the training statistics and allows me to explore the training process from multip le perspectives.”
  • 39. 39 • Three case studies were conducted to demonstrate the effectiveness and usefulness of the system for comprehensive analysis of CNNs. • Currently, CNNVis focuses on analyzing a snapshot of the CNN which is for offline analysis. All the experts expressed the need to integrate CNNVis with the online training process and continuously get an update of the training status. • A key issue is the difficulty of selecting representative snapshots and comparing them effectively. • Another interesting venue for future work is to apply CNNVis to other types of deep models that cannot be formulated as a DAG, such as recurrent neural net work (RNN). CONCLUSION
  • 41. 41 Thank you NCCU DCT, Effy Tseng, 20170111