3. 1. Introduction
GoogLeNet's motivation:
Quite simply, to improve the performance of an image recognition model.
The easiest way: increase the depth and width of the network.
Problems:
1. The number of parameters grows → overfitting
2. The required computing resources grow
5. 2. The idea behind the Inception network
[Diagram: a naive Inception module. The 28 × 28 × 192 input is fed in parallel through 1 × 1 convolutions (64 filters), 3 × 3 convolutions (128 filters), 5 × 5 convolutions (32 filters), and 3 × 3 max pooling (32 channels); concatenating the branch outputs along the channel axis gives a 28 × 28 × 256 output.]
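To make the parallel-branch idea concrete, here is a minimal PyTorch sketch of the naive module drawn above (a hypothetical illustration, not the paper's reference code). "Same" padding keeps every branch at 28 × 28 so the outputs can be concatenated; note that pooling alone would keep all 192 input channels, so a 1 × 1 convolution after the pool is assumed in order to reach the 32 channels shown on the slide.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception block: parallel 1x1 / 3x3 / 5x5 convs plus max pooling,
    concatenated along the channel axis (filter counts from the slide)."""
    def __init__(self, in_ch=192):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)              # 1x1 branch -> 64 channels
        self.b3 = nn.Conv2d(in_ch, 128, kernel_size=3, padding=1)  # 3x3 branch -> 128 channels
        self.b5 = nn.Conv2d(in_ch, 32, kernel_size=5, padding=2)   # 5x5 branch -> 32 channels
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # pooling keeps all 192 channels...
            nn.Conv2d(in_ch, 32, kernel_size=1),               # ...so a 1x1 conv trims it to 32
        )

    def forward(self, x):
        # Every branch preserves the 28 x 28 spatial size, so channel concat works.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(NaiveInception()(x).shape)  # torch.Size([1, 256, 28, 28]) = 64 + 128 + 32 + 32 channels
```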
6. 2. The idea behind the Inception network
The most basic idea: use several kinds of filters in parallel within a single layer, and let the network learn for itself which combination of parameters and filter sizes works best.
7. 2. The idea behind the Inception network
1. Simply put, how filters of several sizes came to be used in parallel at once:
a 1 × 1 filter has the narrowest scope, 3 × 3 a slightly wider scope, and 5 × 5 a wider scope still.
The idea: why not use all of them at once?
2. The filter size is itself a hyper-parameter.
Instead of agonizing over which filter size to use, just use them all at once!
8. 2. The idea behind the Inception network
Again, the basic idea is to use several kinds of filters in parallel within a single layer and let the network learn the combination of parameters and filter sizes on its own.
But there is one big problem with this.
9. 2. The idea behind the Inception network
The module ends up with vastly more parameters than an ordinary ConvNet layer, so performance suffered.
We need to keep that architecture while cutting the amount of computation…
[Diagram: an ordinary ConvNet vs. the naive Inception network approach]
13. 3. The role of the 1 × 1 filter
Benefits of using a 1 × 1 filter:
1. Control over the number of channels
2. Reduced computation
3. Extra nonlinearity
14. 3. The role of the 1 × 1 filter – reducing the channel count
Convolution: height, width, and the number of channels all shrink.
Pooling: only height and width shrink.
What if we want to shrink only the channels?
15. 3. The role of the 1 × 1 filter – reducing the channel count
If you want to shrink only the channels, a 1 × 1 filter is exactly the tool for the job.
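A quick sketch of this contrast (hypothetical shapes, chosen to match the slides): a 1 × 1 convolution maps each spatial position's 192-channel vector to a shorter vector, leaving height and width untouched.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)  # N x C x H x W

# Ordinary 5x5 conv (no padding): height, width, AND channels all change.
print(nn.Conv2d(192, 32, kernel_size=5)(x).shape)  # torch.Size([1, 32, 24, 24])

# 2x2 max pooling: only height and width shrink; channels stay at 192.
print(nn.MaxPool2d(kernel_size=2)(x).shape)        # torch.Size([1, 192, 14, 14])

# 1x1 conv: only the channel count changes; the 28 x 28 grid is preserved.
print(nn.Conv2d(192, 16, kernel_size=1)(x).shape)  # torch.Size([1, 16, 28, 28])
```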
16. 3. The role of the 1 × 1 filter – reducing computation
Let's count the multiplications for a direct 5 × 5 convolution:
192 × 5 × 5 × 28 × 28 × 32 ≈ 120 million
(the leading 192 is the cost contributed by the channel count)
17. 3. The role of the 1 × 1 filter – reducing computation
Now reduce the channels with a 1 × 1 filter first, then apply the 5 × 5 convolution:
192 × 1 × 1 × 28 × 28 × 16 ≈ 2.4 million, plus 16 × 5 × 5 × 28 × 28 × 32 ≈ 10 million
12.4 million / 120 million ≈ 0.1
(again, the leading 192 and 16 are the cost contributed by the channel count)
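A few lines of Python confirm the slide's arithmetic (multiply counts only; biases and ReLU ignored):

```python
H, W = 28, 28                       # output spatial size ("same" padding)
direct = 192 * 5 * 5 * H * W * 32   # 5x5 conv straight from 192 to 32 channels
squeeze = 192 * 1 * 1 * H * W * 16  # 1x1 "bottleneck" down to 16 channels
expand = 16 * 5 * 5 * H * W * 32    # 5x5 conv from the squeezed 16 channels

print(direct)                       # 120422400  (~120 million)
print(squeeze + expand)             # 12443648   (~12.4 million)
print((squeeze + expand) / direct)  # ~0.103, i.e. roughly a 10x saving
```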
18. 3. The role of the 1 × 1 filter – reducing computation
Before expensive convolutions such as 3 × 3 and 5 × 5, pass the input through a 1 × 1 filter first, as a preprocessing step, to reduce the number of channels.
Same input, same 5 × 5 convolutional layer, same output, yet the computation differs by a factor of ten: 120 million vs. 12.4 million multiplications.
19. 3. The role of the 1 × 1 filter – nonlinearity
Same input and same output, but when the 1 × 1 filter is used the signal passes through ReLU twice, giving stronger nonlinearity,
and therefore the ability to learn stronger, richer representations.
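As a sketch (hypothetical layer sizes matching the slides), the bottleneck path applies an activation after each convolution, so there are two ReLUs where the direct path has only one:

```python
import torch.nn as nn

# Direct path: one conv, one ReLU.
direct = nn.Sequential(
    nn.Conv2d(192, 32, kernel_size=5, padding=2),
    nn.ReLU(),
)

# Bottleneck path: 1x1 conv + ReLU, then 5x5 conv + ReLU -> two nonlinearities.
bottleneck = nn.Sequential(
    nn.Conv2d(192, 16, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
)
```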
20. 4. Dissecting GoogLeNet
[Diagram: an Inception module with dimension reduction. The previous activation is 28 × 28 × 192. Branch 1: 1 × 1 CONV alone → output 28 × 28 × 64. Branch 2: 1 × 1 CONV (96 filters) feeding a 3 × 3 CONV → output 28 × 28 × 128. Branch 3: 1 × 1 CONV (16 filters) feeding a 5 × 5 CONV → output 28 × 28 × 32. Branch 4: 3 × 3 MAXPOOL followed by a 1 × 1 CONV → output 28 × 28 × 32. The 192 input channels are too many, so each expensive branch passes through a 1 × 1 first.]
21. 4. Dissecting GoogLeNet
[Diagram: the same module as above, now with the final step shown: the four branch outputs (28 × 28 × 64, 28 × 28 × 128, 28 × 28 × 32, 28 × 28 × 32) are concatenated along the channel axis → output 28 × 28 × 256.]
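Putting the whole slide together, here is a minimal PyTorch sketch of this dimension-reduced module, using the slide's filter counts (an illustration, not the official implementation):

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, k):
    """Conv + ReLU with 'same' padding so every branch stays 28 x 28."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2), nn.ReLU())

class InceptionModule(nn.Module):
    def __init__(self, in_ch=192):
        super().__init__()
        self.b1 = conv(in_ch, 64, 1)                                   # 1x1 branch -> 64
        self.b3 = nn.Sequential(conv(in_ch, 96, 1), conv(96, 128, 3))  # 1x1 squeeze, then 3x3 -> 128
        self.b5 = nn.Sequential(conv(in_ch, 16, 1), conv(16, 32, 5))   # 1x1 squeeze, then 5x5 -> 32
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                conv(in_ch, 32, 1))                    # pool, then 1x1 -> 32

    def forward(self, x):
        # Channel concat: 64 + 128 + 32 + 32 = 256.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionModule()(x).shape)  # torch.Size([1, 256, 28, 28])
```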
23. 4. Dissecting GoogLeNet
At first the network had no side branches like this, but it is so deep that, even with ReLU, gradients failed to backpropagate effectively:
"Given the relatively large depth of the network, the ability to propagate gradients back through all the layers in an effective manner was a concern."
24. 4. Dissecting GoogLeNet
So side branches like these were added in the middle.
It is hard to see in the figure, but each has the same structure as the FC net, softmax, and output layer at the very end; in other words, they are side branches that make predictions partway through the network.
Remarkably, these intermediate branches showed strong performance, so they too are used during training.
25. 4. Dissecting GoogLeNet
The side branches backpropagate just like the main output; each auxiliary loss is weighted by 0.3 and added to the network's total loss.
This structure turned out to have a regularization effect.
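A sketch of the training objective under this scheme (hypothetical tensor names, and cross-entropy assumed as the classification loss; the 0.3 weight is from the slide):

```python
import torch.nn.functional as F

def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets):
    """Total loss = main classifier loss + 0.3 * each auxiliary classifier's loss."""
    main = F.cross_entropy(main_logits, targets)
    aux1 = F.cross_entropy(aux1_logits, targets)
    aux2 = F.cross_entropy(aux2_logits, targets)
    return main + 0.3 * (aux1 + aux2)
```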
26. 4. Dissecting GoogLeNet
The paper gives no detailed explanation, but my own speculation: the weights learned through side branch 1, those learned through side branch 2, and those learned at the final output get combined, which neutralizes the weights at the end that would otherwise have caused overfitting.
At final prediction time, the side branches are removed.
27. 5. GoogLeNet in the ILSVRC 2014 competition
Data: 1,000 class labels; 1.2 million training images; 50,000 validation images; 100,000 test images.
Metric: usually both the Top-1 and Top-5 error rates are reported, but ILSVRC 2014 used only the Top-5 error.
Top-5 error: the model proposes 5 class labels, and the prediction counts as correct if any one of them matches.
Top-1 error: the model proposes 1 class label, and the prediction counts as correct only if it matches.
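For concreteness, a small sketch of how the Top-5 error can be computed from model logits (hypothetical variable names):

```python
import torch

def top5_error(logits, targets):
    """Fraction of samples whose true label is NOT among the 5 highest-scoring classes."""
    top5 = logits.topk(5, dim=1).indices             # (N, 5) predicted class ids
    hit = (top5 == targets.unsqueeze(1)).any(dim=1)  # correct if any of the 5 matches
    return 1.0 - hit.float().mean().item()

logits = torch.randn(8, 1000)           # 8 samples, 1000 ILSVRC classes
targets = torch.randint(0, 1000, (8,))  # ground-truth labels
print(top5_error(logits, targets))
```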