A
Moment
to Remember
October 31, 2016
Jinwon Lee / CAPP, SNU
What I will talk about is…
Catastrophic Forgetting
Today’s Papers
• Zhizhong Li, Derek Hoiem, “Learning without Forgetting”, arXiv:1606.09282v2
• Heechul Jung, Jeongwoo Ju, Minju Jung, Junmo Kim, “Less-forgetting Learning in Deep Neural Networks”, arXiv:1607.00122v1
Learning without Forgetting
How Can I Do It?
• My machine has already studied MNIST
• However, I want to make the machine solve the problem below
• I need to teach the machine to classify “+” and “=”. How??
Some Proposed Learning Methods
Comparisons between Various Methods
Learning without Forgetting
• Record the responses y_o of the original network (defined by θs and θo) for the old tasks on each new-task image
• Add nodes for each new class to the output layer, with randomly initialized weights θn (# of new classes × # of nodes in the last shared layer)
• Train the network to minimize the loss for all tasks plus a regularization term R, using stochastic gradient descent
• First, freeze θs and θo and train θn to convergence; then jointly train all weights until convergence (see the sketch below)
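Below is a minimal PyTorch-style sketch of one joint-training step, assuming a shared backbone (θs) with separate old-task and new-task heads (θo, θn); the distillation temperature T = 2 follows the paper, but the KL-divergence form of the old-task loss, the loss weight, and the module names are illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

T = 2.0            # distillation temperature (value used in the LwF paper)
LAMBDA_OLD = 1.0   # weight on the old-task (distillation) loss -- illustrative

def lwf_step(backbone, old_head, new_head, optimizer, x_new, y_new, y_o):
    """One SGD step of Learning without Forgetting on a new-task mini-batch.

    x_new : new-task images
    y_new : new-task labels
    y_o   : old-task logits recorded from the original network on x_new
    """
    feats = backbone(x_new)                   # shared parameters (theta_s)
    logits_new = new_head(feats)              # new-task head (theta_n)
    logits_old = old_head(feats)              # old-task head (theta_o)

    # New-task loss: ordinary cross-entropy against the new labels
    loss_new = F.cross_entropy(logits_new, y_new)

    # Old-task loss: match the recorded responses y_o (distillation with temperature T)
    log_p = F.log_softmax(logits_old / T, dim=1)
    q = F.softmax(y_o / T, dim=1)
    loss_old = F.kl_div(log_p, q, reduction="batchmean") * (T * T)

    loss = loss_new + LAMBDA_OLD * loss_old   # regularization R (weight decay) via the optimizer
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the warm-up phase the optimizer would be built only over new_head.parameters(), keeping θs and θo frozen; for the joint-training phase it covers all three modules.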
Procedure for Learning without Forgetting
ILSVRC 2012 & PASCAL VOC 2012
Places 2 in ILSVRC 2015
MIT Indoor Scene Classification (Scene)
Caltech-UCSD Birds-200-2011 Fine-Grained Classification (CUB)
Single New Task Scenario
Multiple New Task Scenario
Influence of Data Size
Something Else… (Some Alternatives)
Less-forgetting Learning in Deep
Neural Networks
Forgetting Problem (Fine-tuning)
Toy Example
• CIFAR-10 dataset
 60,000 images of size 32×32×3
 50,000 for training, 10,000 for testing
 10 classes
Experiment #1
• Split the training data into 40,000 images (group 1) and 10,000 images (group 2)
• Train the network on group 1 (test on group 2)
• Using the learned weights as initial weights, retrain on group 2 (the protocol is sketched below)
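The protocol can be sketched as follows, assuming a small illustrative CNN and torchvision's CIFAR-10 loader; the model, epoch count, and which accuracy is reported are placeholders, so this only illustrates the train-then-fine-tune-and-measure procedure, not the exact setup of the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
import torchvision
import torchvision.transforms as T

tfm = T.ToTensor()
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tfm)

group1 = Subset(train_set, range(0, 40000))       # 40,000 images (group 1)
group2 = Subset(train_set, range(40000, 50000))   # 10,000 images (group 2)

# Small illustrative CNN (not the architecture used in the slides)
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 10),
)

def train_on(dataset, epochs=5, lr=0.01):
    loader = DataLoader(dataset, batch_size=128, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()

def accuracy_on(dataset):
    loader = DataLoader(dataset, batch_size=256)
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

train_on(group1)                     # step 1: train on group 1
acc_before = accuracy_on(group1)     # performance on group-1 data before fine-tuning
train_on(group2)                     # step 2: fine-tune the same weights on group 2
acc_after = accuracy_on(group1)      # performance on group-1 data after fine-tuning
print(f"group-1 accuracy: {acc_before:.3f} -> {acc_after:.3f}")  # the drop is the forgetting
```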
Why?
• Could it be that the new dataset is so small that the network loses its ability to generalize while adapting to it???
Experiment #2
• Split the training data into 30,000 images (group 1) and 30,000 images (group 2)
• Train the network on group 1 (test on group 2)
• Using the learned weights as initial weights, retrain on group 2
Experiment #3
• Split the training data into 30,000 images (group 1) and 40,000 images (group 2), with an overlap of 10,000 images
• Train the network on group 1 (test on group 2)
• Using the learned weights as initial weights, retrain on group 2
Observation & Goal
• Observation
 Fine-tuning on a new dataset degrades performance on the original dataset
 If the old and new data are not very different and the new data is plentiful, the degradation is smaller
 If part of the old data can be reused, the degradation can be reduced further
• Goal
 Develop a method that, without retraining on the previous dataset, suffers no drop in recognition accuracy on the original dataset even after fine-tuning on a new dataset
http://giphy.com/gifs/3o8dpb4pFmGtvyJtqE
Less Forgetting Learning
• A learning method that forgets less of what was previously learned, even while learning new content
 Source data: data from the original environment
 Target data: data from the new environment
 Source network: the network trained on the original environment
 Target network: the network to be trained on the new environment
New Learning Scheme for Forgetting Less
• Property 1
 The decision boundary should not change even after training on the target data
• Property 2
 The high-level features of the source data extracted by the target network should be distributed at similar positions in feature space as the source features of the same class
• The source data cannot be accessed
New Learning Scheme
• Implementing Property 1 – freeze the weights of the softmax layer
• Implementing Property 2 – define two loss functions
 Softmax loss
 Euclidean loss
• Only target data is fed to the input layer
Algorithm Details
In the algorithm, Ni and Nb denote the number of training iterations and the mini-batch size, respectively (a training-loop sketch follows below).
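A minimal PyTorch-style sketch of the less-forgetting update, assuming both networks expose a `features()` method (last hidden layer) and a `classifier` softmax layer; freezing the softmax weights and combining the softmax loss with a Euclidean loss toward the source network's features follows the slides, while the loss weight, learning rate, and module names are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def less_forgetting_train(source_net, target_loader, Ni, lambda_e=1.0, lr=1e-3):
    """Train a target network on target data while forgetting less of the source task.

    source_net    : network trained on the source environment (kept frozen)
    target_loader : DataLoader over target data, with mini-batch size Nb
    Ni            : total number of training iterations
    lambda_e      : weight of the Euclidean (feature-preserving) loss -- illustrative
    """
    target_net = copy.deepcopy(source_net)        # target network starts from the source weights
    source_net.eval()
    for p in source_net.parameters():
        p.requires_grad_(False)

    # Property 1: freeze the softmax-layer weights so the decision boundary stays fixed
    for p in target_net.classifier.parameters():
        p.requires_grad_(False)

    trainable = [p for p in target_net.parameters() if p.requires_grad]
    opt = torch.optim.SGD(trainable, lr=lr, momentum=0.9)

    it = 0
    while it < Ni:
        for x, y in target_loader:                # only target data enters the input layer
            feat_t = target_net.features(x)       # high-level features of the target network
            with torch.no_grad():
                feat_s = source_net.features(x)   # reference features from the source network
            logits = target_net.classifier(feat_t)

            loss_softmax = F.cross_entropy(logits, y)   # softmax loss on target labels
            loss_euclid = F.mse_loss(feat_t, feat_s)    # Property 2: Euclidean feature loss
            loss = loss_softmax + lambda_e * loss_euclid

            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= Ni:
                break
    return target_net
```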
CIFAR-10 Feature Visualization
Experimental Results #1
• Of the 60,000 images in total, 50,000 are used for training and 10,000 for testing
• The training data is further split into 40,000 and 10,000 images, and the 10,000 images are converted to grayscale
• The 10,000 test images are also converted to grayscale, producing two kinds of test sets
• To test data with a different number of channels on the same network, the grayscale channel is simply replicated to 3 channels (see the sketch below)
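A small NumPy sketch of this preprocessing, assuming images stored as H×W×3 uint8 arrays; the RGB-to-gray weights are the standard luminance coefficients (not specified in the slides), and the single gray channel is copied three times so the image still fits a 3-channel network.

```python
import numpy as np

def to_gray_3ch(img_rgb):
    """Convert an HxWx3 RGB image to grayscale, then replicate the channel to 3.

    img_rgb : uint8 array of shape (H, W, 3)
    returns : uint8 array of shape (H, W, 3) with identical channels
    """
    # Standard luminance weights for RGB -> gray
    gray = (0.299 * img_rgb[..., 0]
            + 0.587 * img_rgb[..., 1]
            + 0.114 * img_rgb[..., 2]).astype(np.uint8)
    # Replicate the single gray channel so the input shape matches the 3-channel network
    return np.repeat(gray[..., None], 3, axis=2)

# Example on a random 32x32 CIFAR-like image
dummy = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(to_gray_3ch(dummy).shape)   # (32, 32, 3)
```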
Experimental Results #1
Realistic Dataset (ImageNet)
• Source dataset: ImageNet, about 1,200 images per class
• Target dataset: brightened or darkened images, about 100 per class
Experimental Results #2
• 50 classes selected
Experimental Results #2
• A GoogLeNet whose number of softmax outputs is changed to 50 is used as the base network
• The original GoogLeNet uses three loss functions in total, but in this experiment only the top-most loss function is kept and the others are removed
Experimental Results #2
Forgetting in General Learning Cases
Less-forgetting for General Learning Cases (Algorithm)
• If the value of Ns is large, the network does not adapt to new data well. The parameter Nf plays a similar role to Ns. As a result, our algorithm has the ability to forget less of the previously learned information. We set Ns smaller than Nf, using Ns = 100 and Nf = 1000 for all experiments (one possible reading of the schedule is sketched below).
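One possible reading of this schedule, written as a hedged sketch: assume the current weights are snapshotted as a temporary reference every Nf iterations, and for the next Ns iterations (Ns < Nf) the Euclidean feature loss toward that reference is added. The slides do not fully specify the scheduling, so this illustrates the roles of Ns and Nf rather than reproducing the authors' exact algorithm; the `features()`/`classifier` split is the same assumption as in the earlier sketch.

```python
import copy
import torch
import torch.nn.functional as F

def general_less_forgetting(net, loader, total_iters, Ns=100, Nf=1000, lambda_e=1.0, lr=1e-3):
    """Illustrative less-forgetting schedule for ordinary (single-dataset) training.

    Every Nf iterations the current network is frozen as a reference; for the
    following Ns iterations (Ns < Nf) a Euclidean feature loss toward that
    reference is added, which slows down forgetting of recently learned information.
    """
    opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
    reference = None
    it = 0
    while it < total_iters:
        for x, y in loader:
            if it % Nf == 0:
                reference = copy.deepcopy(net).eval()   # snapshot current weights as reference
                for p in reference.parameters():
                    p.requires_grad_(False)

            feat = net.features(x)                      # assumed .features()/.classifier split
            loss = F.cross_entropy(net.classifier(feat), y)

            if it % Nf < Ns:                            # apply the feature-preserving term
                with torch.no_grad():
                    feat_ref = reference.features(x)
                loss = loss + lambda_e * F.mse_loss(feat, feat_ref)

            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= total_iters:
                break
    return net
```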
Generalization Test