This document introduces convolutional neural networks (CNNs). It discusses how CNNs extract features using filters and pooling to build up representations of images while reducing the number of parameters. The key operations of a CNN, including convolution, nonlinear activation, pooling and fully connected layers, are explained, and examples of CNN applications are given. The evolution of CNNs is then reviewed, from LeNet and AlexNet to VGGNet, GoogLeNet and ResNet, together with improvements such as ReLU, dropout and batch normalization that helped CNNs train better and go deeper. The deck closes with a brief introduction to invariance vs. equivariance, group-equivariant CNNs, and capsule networks.
Paper-review material. This deck explains, in an accessible way, the difference between invariance and equivariance, two terms that appear frequently in recent image-related deep learning papers, and introduces Group Equivariant Convolutional Neural Networks and Capsule Nets, which were proposed to produce features that are equivariant to image transformations.
2. Table of Contents
Convolutional Neural Nets?
Applications of Convolutional Neural Nets
How Convolutional Neural Nets Work
The Evolution of Convolutional Neural Nets
Brief intro: Invariance and Equivariance
Limitations of CNN
Group ConvNet
Capsule Net
3. CONVOLUTIONAL NEURAL NETS?
[Figure: the 28x28 input image x_image is reshaped into a 784x1 vector x and fed to a single-layer network y = softmax(Wx + b) with 10 outputs, one per digit]
Neural Nets
# of unknown parameters to estimate = # of weights + # of biases = 784x10 + 10 = 7,850 !!!
• An ordinary (fully connected) neural net starts directly from the raw pixel values of the input image
• Processing high-resolution images at speed is therefore infeasible
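The parameter count on the slide can be reproduced in a few lines (a minimal sketch; `dense_params` is an illustrative helper, not from the deck):

```python
# Parameter count of the single-layer softmax net on MNIST (28x28 inputs,
# 10 digit classes): every input pixel connects to every output unit.
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

n_params = dense_params(28 * 28, 10)
print(n_params)  # 7850, matching the slide's 784x10 + 10
```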
4. CONVOLUTIONAL NEURAL NETS?
Networks for deep-learning-based visual perception
• A CNN extracts features patch by patch, using simple-shaped patches (filters or kernels)
• Moving up the layers, these features are composed into the overall shape of the object
→ The number of parameters to estimate shrinks
5. CONVOLUTIONAL NEURAL NETS?
Types of inputs
• Color images are three-dimensional and so have a volume
• Time-domain speech signals are 1-D, while frequency-domain representations (e.g. MFCC vectors) take a 2-D form. They can also be viewed as a time sequence.
• Medical images (such as CT/MR/etc.) are multi-dimensional
• Videos have an additional temporal dimension compared to still images
• Variable-length sequences and time-series data are again multi-dimensional
• Hence it makes sense to model them as tensors instead of vectors.
6. Applications of CONVOLUTIONAL NEURAL NETS
CNNs are everywhere
• Image retrieval from databases
• Object detection
• Self-driving cars
• Semantic segmentation
• Face recognition (FB tagging)
• Pose estimation
• Disease detection
• Speech recognition
• Text processing
• Analysing satellite data
8. Applications of CONVOLUTIONAL NEURAL NETS
Object detection and recognition
Using visual perception to output the class and bounding box (BB) of each object
[Figure: detection results produced by a stack of convolution layers]
Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
10. Applications of CONVOLUTIONAL NEURAL NETS
End-to-end learning for a self-driving car
The car learns to map visual perception (a sequence of front-view frames, passed through convolution layers (CLs) and fully connected layers (FCLs)) to a driving action
https://youtu.be/qhUvQiKec2U
11. Applications of CONVOLUTIONAL NEURAL NETS
Smart picking robot based on deep learning
Training an industrial robot through visual perception and reinforcement learning
[Figure: 224x224-pixel input images mapped to a 90x1 output]
12. The Structure of CONVOLUTIONAL NEURAL NETS
A CNN consists of a feature extraction part and a classification part
[Figure: Feature Extraction Layer followed by Classification Layer]
19. How CONVOLUTIONAL NEURAL NETS Work
2x2 MAX POOLING WITH STRIDE = 1
Input          max pooling    Output
3 0 1                         3 2
0 0 2              →          2 3
0 2 3

1 0 1                         1 1
0 0 0              →          3 1
3 1 0
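The worked example above can be checked with a short NumPy sketch (`max_pool` is an illustrative helper; a real framework would use its own pooling op):

```python
import numpy as np

def max_pool(x, size=2, stride=1):
    """2D max pooling over a single-channel input, no padding."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            # Take the maximum over each size x size window.
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

a = np.array([[3, 0, 1],
              [0, 0, 2],
              [0, 2, 3]])
print(max_pool(a))  # [[3 2]
                    #  [2 3]]
```

Running it on the second 3x3 input from the slide gives [[1 1], [3 1]], matching the slide as well.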
20. How CONVOLUTIONAL NEURAL NETS Work
Why Pooling?
• Dimension reduction
• Adds spatial (translation & rotation) invariance to feature maps
Spatial invariance:
• Be able to recognize a feature regardless of angle, direction or skew.
• Does not care where a feature is, as long as it maintains its relative position to other features.
24. How CONVOLUTIONAL NEURAL NETS Work
Flattening
Flattening takes the pooled layer and flattens it in sequential order into a single vector.
• This vector is used as the input to the classifier
26. The Evolution of CONVOLUTIONAL NEURAL NETS
LeNet to ResNet: A Deep Journey
LeNet-5 (1998): the origin of the convolutional neural network
Characteristics
• Repeats of Convolution – Pooling – Nonlinearity
• Average pooling
• Sigmoid activation for the intermediate layers
• tanh activation at F6
• 5x5 convolution filters
• 7 layers and fewer than 1M parameters
Key Contributions
• Use of convolution to extract spatial features
• Subsampling using the spatial average of maps
• Sparse connection matrix between layers to avoid large computational cost
The Gap
• Slow to train
• Hard to train (neurons die quickly)
• Lack of data
27. The Evolution of CONVOLUTIONAL NEURAL NETS
• ImageNet is an image database organized according to the WordNet hierarchy
• Formally, it is a project aimed at (manually) labeling and categorizing images
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• Training data: 1.2 million images, 1000+ categories
• Validation and test data: 150K images (50K validation, the rest test)
• ImageNet data: http://image-net.org/challenges/LSVRC/2010/browse-synsets
• Multiple challenges: object recognition, localization, etc.
28. The Evolution of CONVOLUTIONAL NEURAL NETS
IMAGENET CLASSIFICATION RESULTS
<2012 Result>
• Krizhevsky et al. (AlexNet) – 16.4% top-5 error
• Next best (non-convnet) – 26.2% error
<2013 Result>
• All top-ranking entries use deep learning (convnets)
Revolution of Depth!
29. The Evolution of CONVOLUTIONAL NEURAL NETS
ALEXNET (2012)
Characteristics
• 11x11, 5x5 and 3x3 convolutions
• Max pooling
• 3 FC layers
• 60 million parameters
Key Contributions
• GPU training in parallel
• ReLU activation
• Dropout regularization
• Image augmentation
30. The Evolution of CONVOLUTIONAL NEURAL NETS
RELU NON-LINEARITY – A SIMPLER ACTIVATION
A 4-layer CNN with ReLUs is 6 times faster than an equivalent network with tanh in reaching a 25% error rate on the CIFAR-10 dataset
31. Deep learning - ReLU (Ljubljana, June 2016)
How does the sigmoid function affect learning?
• It enables easy computation of the derivative, but has negative effects:
– A neuron never quite reaches 1 or 0: it saturates
– The gradient reduces the magnitude of the propagated error
• This leads to two problems:
• Slow learning when neurons are saturated, i.e. at large |z| values
• The vanishing gradient problem (the sigmoid's derivative is at most 0.25, so each layer passes back at most 25% of the error from the previous layer!!)
32. Deep learning - ReLU
• Alex Krizhevsky (2011) proposed the Rectified Linear Unit instead of the sigmoid function
• Main purpose of ReLU: reduces the saturation and vanishing-gradient issues
• Still not perfect:
– Stops learning at negative z values (can use a piecewise-linear variant - Parametric ReLU, He 2015 from Microsoft)
– Bigger risk of neurons saturating toward infinity
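The saturation argument above can be made concrete with a tiny sketch (the helper names are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # never exceeds 0.25, hence the "25%" remark

def relu_grad(z):
    # Gradient passes through unchanged for z > 0, is zero for z < 0.
    return 1.0 if z > 0 else 0.0

print(sigmoid_grad(0.0))   # 0.25, the maximum possible
print(sigmoid_grad(5.0))   # ~0.0066: a saturated neuron barely learns
print(relu_grad(5.0))      # 1.0
```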
33. Deep learning - dropout
• Too many weights cause overfitting issues
• Weight decay (regularization) helps but is not perfect
– It also adds another hyper-parameter to set up manually
• Srivastava et al. (2014) proposed a kind of "bagging" for deep nets (Alex Krizhevsky had in fact already used it in AlexNet)
• Main point:
– Robustify the network by disabling neurons
– Each neuron has a probability, usually 0.5, of being disabled
– The remaining neurons must adapt to work without them
• Applied only to fully connected layers
– Conv. layers are less susceptible to overfitting
Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
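A minimal sketch of the idea, using the common "inverted dropout" formulation (the function name and seed handling are illustrative assumptions):

```python
import numpy as np

def dropout(a, p_drop=0.5, train=True, seed=0):
    """Inverted dropout: disable each unit with probability p_drop during
    training and rescale the survivors, so inference needs no extra scaling."""
    if not train:
        return a                        # at inference the layer is a no-op
    rng = np.random.default_rng(seed)
    mask = rng.random(a.shape) >= p_drop   # True = neuron stays active
    return a * mask / (1.0 - p_drop)

acts = np.ones(1000)                 # activations of a fully connected layer
out = dropout(acts, p_drop=0.5)
print(round((out == 0).mean(), 2))   # close to 0.5: about half the units disabled
```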
34. Deep learning – batch norm
• Input needs to be whitened, i.e. normalized (LeCun 1998, Efficient BackProp)
– Usually done on the first layer's input only
• The same reason for normalizing the first layer's input applies to the other layers as well
• Ioffe and Szegedy, Batch Normalization, 2015
– Normalize the input to each layer
– Reduces internal covariate shift
– Too slow to normalize over all input data (>1M samples)
– Instead, normalize within each mini-batch only
– Learning: normalize over the mini-batch data
– Inference: normalize with statistics accumulated over the training data
• Better results while allowing a higher learning rate, higher decay, no dropout and no LRN.
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015
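A training-mode sketch of the per-feature normalization described above (gamma and beta would be learned parameters; here they are fixed scalars for illustration):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch (axis 0), then apply the
    scale gamma and shift beta. Inference would instead reuse running
    statistics accumulated during training."""
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return gamma * x_hat + beta

# A mini-batch of 64 samples, 10 features, deliberately off-center (mean 5, std 3).
batch = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 10))
y = batch_norm(batch)
print(y.mean(axis=0).round(6))  # per-feature mean ~0
print(y.std(axis=0).round(3))   # per-feature std ~1
```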
35. The Evolution of CONVOLUTIONAL NEURAL NETS
VGG (2014)
• Smaller 3x3 convolutions throughout the net
• A sequence of 3x3 convolutions can emulate larger receptive fields, e.g. 5x5 or 7x7
• Use of 1x1 convolutions
• Decrease in spatial volume and increase in depth of the input
What's the advantage of using 3 layers of 3x3 instead of one layer of 7x7?
• 3 non-linear rectification layers instead of one
• Fewer parameters: 27C^2 as opposed to 49C^2 (for C channels)
Key Points
• Depth is important
• Simplify the network to go deep
• 140M parameters (mostly due to the FC layers)
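The 27C^2-vs-49C^2 comparison checks out with a two-line count (biases ignored, as on the slide; `conv_params` is an illustrative helper):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution layer with c_in inputs, c_out outputs."""
    return k * k * c_in * c_out

C = 64                                  # any channel width; the ratio holds for all C
three_3x3 = 3 * conv_params(3, C, C)    # 27 * C^2
one_7x7 = conv_params(7, C, C)          # 49 * C^2
print(three_3x3, one_7x7)               # 110592 200704
```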
36. VGG (2ND PLACE IN 2014)
Uses only repeated 3x3 filters. Why??
• Stacking convolution filters yields a larger receptive field:
two stacked 3x3 filters cover the receptive field of one 5x5 filter,
three stacked 3x3 filters cover that of one 7x7 filter
• The parameter count drops compared to using the large filter
→ a regularization effect
"Very Deep Convolutional Networks for Large-Scale Image Recognition"
37. The Evolution of CONVOLUTIONAL NEURAL NETS
GOOGLENET, OR INCEPTION (2014)
• 22-layer CNN
• Heavy use of 1x1 'Network in Network' convolutions
• Use of average pooling before the classification layer
• Auxiliary classifiers connected to intermediate layers
• During training, the losses of the auxiliary classifiers are added with a discount weight (0.3)
38. GOOGLENET KEY IDEAS
• Which is better, 3x3 or 5x5? → Just use them all (the naïve version)
→ but the amount of computation explodes: way too many outputs!!!
• Modified idea: use 1x1 convolutions for dimensionality reduction
Why 1x1 convolution?
• Introduced as "Network in Network" in 2014
• A way to increase non-linearity and spatially combine features across feature maps
Only 4M parameters, compared to 60M in AlexNet
39. GOOGLENET KEY IDEAS
Dimension reduction with 1x1 convolutions:
halving the number of feature maps keeps the total amount of computation roughly the same
40. GOOGLENET KEY IDEAS
How 1x1 convolution reduces dimensions
[Figure: input feature maps x, weight vector w_k, and the k-th output feature map y]
x_ij : 1x256 vector (the input activations at spatial position (i,j)), w_k : 1x256 weight vector
y_ij,k = f(x_ij · w_k), where f(·) is a nonlinear function
The principle is identical to feature-dimension reduction with a fully connected NN.
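In NumPy, y_ij,k = f(x_ij · w_k) over every position is a single matrix multiply along the channel axis; a minimal sketch of a 256-to-64 reduction (shapes and ReLU are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 256))   # H x W x 256 input feature maps
W = rng.normal(size=(256, 64))     # one 1x256 weight vector per output map

# At every spatial position (i, j) the same fully connected map is applied,
# which is exactly what a 1x1 convolution does.
y = np.maximum(x @ W, 0.0)         # f = ReLU here
print(y.shape)                     # (8, 8, 64): spatial size kept, depth 256 -> 64
```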
41. The Evolution of CONVOLUTIONAL NEURAL NETS
RESNET (RESIDUAL NEURAL NETWORK) (2015)
• Introduces shortcut connections (which exist in prior literature in various forms)
• The key invention is to skip 2 layers; skipping a single layer didn't give much improvement for some reason
42. RESNET
Are more layers always better?
A network with 56 layers turns out to have larger training error than one with 20 layers.
43. RESNET
A deeper model should be able to reach lower training error, but deep models turn out to be hard to optimize (even learning the identity mapping is hard).
Causes: vanishing/exploding gradients, and the growth in the number of parameters to learn.
[Figure: a shallower model (18 layers) vs a deeper model (34 layers)]
"Deep Residual Learning for Image Recognition"
RESNET
44. RESNET의 KEY IDEA
Identity는 그대로 상위 layer로 전달하고, 나머지 부분만 학습
H(x)를 얻는 것이 목표가 아니라 F(x)=H(x)-x 를 목표로
F(x) ~0 이므로 수렴이 빠름
Identity shortcut을 통한 효과
- 깊은 망의 최적화도 가능
- 깊이에 비례해 정확도 개선
“Deep Residual Learning for Image Recognition”
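The shortcut structure above reduces to one line of code; a minimal sketch (the helper names are illustrative, and real residual branches would be conv layers rather than a lambda):

```python
import numpy as np

def residual_block(x, residual_f):
    """y = F(x) + x: the layers only have to learn the residual
    F(x) = H(x) - x, while the identity passes through the shortcut."""
    return residual_f(x) + x

# When the weights drive F(x) toward zero, the block collapses to the
# identity mapping, which is what keeps very deep stacks optimizable.
x = np.array([1.0, -2.0, 3.0])
y = residual_block(x, lambda v: np.zeros_like(v))
print(y)  # identical to the input
```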
45. RESNET'S KEY IDEA
BOTTLENECK: A PRACTICAL DESIGN
1x1 conv for dimension reduction → 3x3 conv → 1x1 conv for dimension expansion, in order to cut computation
• # parameters: 256x64 + 64x3x3x64 + 64x256 ≈ 70K
• # parameters using a single 3x3x256x256 conv layer instead ≈ 600K
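The two counts on the slide can be verified directly (`conv_params` is an illustrative helper; biases ignored, as on the slide):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution layer."""
    return k * k * c_in * c_out

# Bottleneck: 1x1 reduce (256 -> 64), 3x3 at the narrow width, 1x1 expand (64 -> 256)
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))
plain = conv_params(3, 256, 256)   # a single 3x3 layer at full 256-channel width
print(bottleneck, plain)           # 69632 (~70K) vs 589824 (~600K)
```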
46. Dilated convolutions
The goal of this layer is to increase the size of the receptive field (the input activations that are used to compute a given output) without using downsampling (in order to preserve local information).
Increasing the size of the receptive field allows the use of more context (information spatially further away).
The idea is to spread out the filter taps, filling the inserted positions with zeros, and then compute an ordinary convolution.
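A minimal sketch of that kernel-spreading step (the helper name and the rate parameter are illustrative; frameworks implement the same effect without materializing the zeros):

```python
import numpy as np

def dilate_kernel(k, rate):
    """Spread the kernel taps apart, filling the inserted positions with
    zeros; the tap count is unchanged but the receptive field grows."""
    size = (k.shape[0] - 1) * rate + 1
    d = np.zeros((size, size), dtype=k.dtype)
    d[::rate, ::rate] = k   # original taps land every `rate` positions
    return d

k = np.ones((3, 3))
d = dilate_kernel(k, 2)
print(d.shape)  # (5, 5): a 3x3 kernel now sees a 5x5 context, with no downsampling
```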
51. ConvNets are translation equivariant
How about rotation?
[Figure: demonstration of LeNet-5's behavior under small rotations (+/- 40 degrees)]
Limitations of the conventional ConvNet
52. 2D convolution is equivariant under translation, but not under rotation
Limitations of the conventional ConvNet
53. Invariance
[Diagram: two images X1 and X2 = Tg·X1 both map through Φ to the same feature Z]
X2 = Tg·X1 (Tg: a transformation of the image)
Z = Z1 = Φ(X1) = Z2 = Φ(X2) = Φ(Tg·X1)
: The mapping function Φ(·) is independent of the transformation Tg, for all Tg
54. Invariance
To make a Convolutional Neural Network (CNN) transformation-invariant, data augmentation with transformed training samples is generally used
55. Equivariance
[Diagram: X2 = Tg1·X1 in image space corresponds to Z2 = Tg2·Z1 in feature space, with both images mapped through Φ]
X2 = Tg1·X1, Z2 = Tg2·Z1
Z2 = Tg2·Z1 = Tg2·Φ(X1) = Φ(Tg1·X1)
: The mapping Φ(·) preserves the algebraic structure of the transformation;
Z1 ≠ Z2, but the relationship between them is kept
: Invariance is the special case of equivariance where Tg2 is the identity.
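A minimal NumPy check of the translation-equivariance claim, using circular (periodic) convolution so that shifts are exact at the borders (the helper names are illustrative, not from the deck):

```python
import numpy as np

def circ_conv2d(x, k):
    """2D cross-correlation with periodic (circular) boundary, same-size output."""
    n, m = x.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for a in range(k.shape[0]):
                for b in range(k.shape[1]):
                    out[i, j] += x[(i + a) % n, (j + b) % m] * k[a, b]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))
k = rng.normal(size=(3, 3))

# Equivariance: shifting the image then convolving equals convolving then shifting.
lhs = circ_conv2d(np.roll(x, 1, axis=1), k)
rhs = np.roll(circ_conv2d(x, k), 1, axis=1)
print(np.allclose(lhs, rhs))  # True

# Invariance as a special case: global max pooling discards the shift entirely.
print(np.isclose(circ_conv2d(np.roll(x, 1, axis=1), k).max(),
                 circ_conv2d(x, k).max()))  # True
```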
56. Equivariance: Group ConvNet
To capture the rotation or proportion change of a given entity, a group of filters (a combination of rotated and mirror-reflected versions of a filter) is adopted.
For example, the group p4 contains translations and rotations by multiples of ninety degrees, and p4m additionally contains mirror reflections.
[Figure: the rotated and mirror-reflected versions of a filter in the group]
57. Equivariance: Group ConvNet
A filter in a G-CNN detects co-occurrences of features that have the preferred relative pose, and can match such a feature constellation in every global pose through an operation called the G-convolution.
[Figure: filter group 1, filter group 2, ..., filter group N]
58. Equivariance: Group ConvNet
G-Convolution
[Figure: visualization of classic 2D convolution vs the G-convolution for the roto-translation group]
60. Equivariance: Group ConvNet
Latent representations learnt by a CNN and a G-CNN:
- The left part shows the result of a typical CNN, the right part that of a G-CNN.
- In both parts, the outer circles consist of the rotated input images while the inner circles consist of the learnt representations.
- Features produced by a G-CNN are equivariant to rotation, while those produced by a typical CNN are not.
61. Equivariance: Capsule Net
What we need: EQUIVARIANCE (not invariance)
"Equivariance makes a CNN understand the rotation or proportion change"
62. Equivariance: Capsule Net
"A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part."
63. Equivariance: Capsule Net
Equivariance of capsules: "A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part."
[Figure: activity-vector map of an object]