Lifelong Learning with DEN
(Dynamically Expandable Networks)
KAIST School of Computing, Machine Learning and Intelligence Lab (MLILAB)
Joonyoung Yi, M.S. student
2018-08-08
These slides were made to share the paper "Lifelong Learning with Dynamically Expandable Networks"
by Jaehong Yoon et al., presented at ICLR 2018, with my study group.
Contents
• Prerequisites
• Related Works
• Dynamically Expandable Networks
• Experiments
• Conclusion and Discussion
Introduction
l1-Regularizer
* Excerpted from Prof. Sung Ju Hwang's Spring 2018 KAIST Introduction to Deep Learning course slides.
Prerequisites
What effect does using an l1-regularizer have?
-> A dimension-reduction effect!
In other words, it makes the variables sparse. Why?
If we impose a hard constraint that the l1-norm must equal some constant,
the solution lands, with high probability, on a corner of the constraint region where some coordinates are exactly zero.
That is, the constraint pushes the solution toward sparsity,
and a sparse solution also gives the dimension-reduction effect.
However, when the l1 penalty is used as a regularizer rather than as a hard constraint,
the effect is that many weights merely become close to zero.
So if you actually want a sparse model, simply set every weight below a small threshold to zero,
as in the sketch below.
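A minimal sketch (mine, not from the paper or the lecture slides) of this behavior for a plain linear regression: the l1 term is handled with proximal (soft-thresholding) steps, and a final hard threshold turns "near zero" into exactly zero.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 0.5]                    # only 3 features actually matter
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(20)
lr, lam = 0.01, 0.1
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(y)            # gradient of the squared-error part
    w -= lr * grad
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)   # proximal step for the l1 penalty
w[np.abs(w) < 1e-3] = 0.0                        # hard-threshold the remaining near-zero weights
print("non-zero weights:", np.count_nonzero(w))  # only a few survive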
Multi-task Learning
• What is a task?
  • A job that a machine learning algorithm has to perform.
  • E.g., telling dogs from cats, or recognizing which digit a handwritten number is.
• Multi + Task
  • A methodology for learning several tasks at the same time.
• Why learn them at the same time?
  • Because performance can improve compared to learning each task separately!
  • If there is a task that tells dogs from cats and another that tells wolves from tigers,
    can't we expect better performance by training the two together rather than separately?
• Knowledge Transfer
  • When multi-task learning outperforms single-task learning, we say that knowledge
    from one task has been transferred to another task.
  • The goal of multi-task learning is to make this knowledge transfer happen!
Prerequisites
[Diagram: in single-task learning, each model is trained on its own training set
(Model 1 on Set 1, Model 2 on Set 2, Model 3 on Set 3); in multi-task learning,
the models are trained on the training sets jointly.]
Multi-task Learning
• Don't get confused about what a "set" means here.
• The situation in the right-hand diagram is also multi-task learning!
Prerequisites
[Diagram: on the left, each model has its own data (Model 1 on (X1, Y1), Model 2 on (X2, Y2),
Model 3 on (X3, Y3)); on the right, the models share the same inputs X but have different
labels (Model 1 on (X, Y1), Model 2 on (X, Y2), Model 3 on (X, Y3)).]
• Multi-task learning that jointly performs a task distinguishing photos of dogs and cats
  and a task distinguishing photos of wolves and tigers.
• Multi-task learning that looks at a photo of handwritten Hangul and predicts the initial,
  medial, and final consonants separately.
• A self-driving car takes camera input and must trace the lane while also locating obstacles
  at the same time.
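As a toy illustration of the "shared knowledge" idea (my own sketch, not from the paper), the simplest multi-task model shares one hidden layer across tasks and gives each task its own output head; the shared weights are updated by the summed gradients of all task losses, which is where transfer can happen.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                    # inputs shared by both tasks
Y = {0: rng.normal(size=(64, 1)),                # targets of task 0
     1: rng.normal(size=(64, 1))}                # targets of task 1

W_shared = rng.normal(scale=0.1, size=(10, 32))  # shared hidden layer
heads = {t: rng.normal(scale=0.1, size=(32, 1)) for t in Y}   # one head per task
lr = 0.01

for _ in range(200):
    H = np.maximum(X @ W_shared, 0.0)            # shared ReLU features
    grad_shared = np.zeros_like(W_shared)
    for t, W_head in heads.items():
        err = (H @ W_head - Y[t]) / len(X)       # per-task squared-error residual
        heads[t] = W_head - lr * (H.T @ err)     # task-specific update
        grad_H = (err @ W_head.T) * (H > 0)      # backprop through the shared layer
        grad_shared += X.T @ grad_H
    W_shared -= lr * grad_shared                 # a single update serves both tasks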
Online Learning
• Offline learning
  • The ordinary machine learning we are used to.
  • "Model, here is the whole training set at once, go learn!"
• Online learning
  • Machine learning when the data arrive as a stream.
  • "Model, the training data are too big to hand over at once, so I'll give them to you bit by bit, go learn!"
  • "And since what I give you is too big for you to store, use the training data and then throw them away!"
• In online learning there are always data that, once used, can never be used again.
  • This does not mean every piece of data can be used for only one epoch!
  • You may run several epochs on a received batch of data before discarding it.
Prerequisites
[Diagram: offline learning feeds the model the whole training set at once;
online learning feeds the model a sequence of training subsets, one at a time.]
Online Multi-task Learning
= Online Learning + Multi-task Learning
Lifelong Learning
• A kind of online multi-task learning.
• The training set for task t is given in sequence, one task at a time.
• Training Set t is no longer available when learning a later task t' (t' > t).
• Once the model has learned Training Set 1 for Task 1, Training Set 1 can never be accessed again;
  once it has learned Training Set 2 for Task 2, Training Set 2 can never be accessed again, and so on.
Prerequisites
[Diagram: Training Set 1 through Training Set 6 arrive as a stream; the model learns from each
in turn and then loses access to it.]
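A minimal sketch of this protocol (mine; train_on_task and evaluate are hypothetical placeholders): tasks arrive one at a time, the training data for a task are dropped right after training on it, and performance is tracked on every task seen so far.

def lifelong_learning(model, task_stream, train_on_task, evaluate):
    test_sets = []                                 # only small test sets are kept, never training data
    for t, (train_set, test_set) in enumerate(task_stream, start=1):
        train_on_task(model, train_set)            # training data for task t are used here ...
        del train_set                              # ... and are unavailable from now on
        test_sets.append(test_set)
        accs = [evaluate(model, ts) for ts in test_sets]   # how well do we still do on earlier tasks?
        print(f"after task {t}: per-task accuracy = {accs}")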
Lifelong Learning
• The lifelong learning setting has often appeared in scenarios where data arrive as a stream,
  such as autonomous driving or training robotic agents.
• In lifelong learning, network capacity must also be used efficiently.
• An agent ships with a fixed total amount of memory, so for the agent to keep learning over its
  lifetime it must spend as little network capacity as possible per task.
  • The paper says the method must be memory efficient; here "memory" includes storage.
• Because this is multi-task learning, knowledge transfer should also occur.
  • Forward knowledge transfer vs. backward knowledge transfer:
  • Forward knowledge transfer: having learned earlier tasks helps the learning of later tasks.
  • Backward knowledge transfer: learning later tasks improves performance on earlier tasks.
Prerequisites
Catastrophic Forgetting
• In lifelong learning, Task 1, Task 2, ..., Task N are learned in sequence.
• Let W_t denote the network weights obtained by learning task t.
• If task t+1 is then trained using W_t as the initial value,
  the network increasingly forgets what it learned for earlier tasks as t grows.
• This phenomenon is called catastrophic forgetting!
  • It is also sometimes called semantic drift,
  • because the meaning of the weights has drifted.
• Preventing catastrophic forgetting is one of the main challenges of lifelong learning.
Prerequisites
How can we prevent catastrophic forgetting?
• Naive approach 1: store W_t for every t.
  • Catastrophic forgetting cannot occur, but memory is used very inefficiently.
• Naive approach 2: use l2 regularization (see the sketch below).
  • Apply a regularizer that keeps the new model from straying too far from the previous model.
  • However, a simple regularizer such as the l2 penalty also hinders the model from acquiring
    knowledge about the new task.
Prerequisites
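A sketch of naive approach 2 (mine; task_loss is a hypothetical placeholder for the current task's loss): the penalty anchors the new weights to the previous task's weights, which limits forgetting but also limits how far the model can move to fit the new task.

import numpy as np

def regularized_loss(W, W_prev, X, y, task_loss, lam=0.1):
    drift_penalty = lam * np.sum((W - W_prev) ** 2)   # l2 distance to the previous task's weights
    return task_loss(W, X, y) + drift_penalty         # trade-off: stability vs. fitting the new task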
Prior work related to lifelong learning
• [1] and [13] propose new regularization methods that work well for both the previous tasks and the new task.
  • However, they have the limitation of considering the whole training trajectory rather than only the
    final weight values of the network.
  • In effect, they solve a relaxed version of the problem.
• [9] completely blocks modification of the previous network: back-propagation into the already-trained
  network is prevented so that it is never modified, and the network is expanded by a fixed capacity for
  each task.
  • Backward knowledge transfer cannot occur.
  • Still, it uses memory more efficiently than naive approach 1.
Related Works
Prior work related to dynamically expandable networks
• So far there has been little work on networks that expand their capacity by dynamically adding neurons
  during training.
• [14] proposes adding new neurons for data that are hard to learn and merging them with other neurons to
  avoid redundancy.
• [8] proposes a nonparametric neural network model that trains the network to minimize the loss and, under
  the assumption that every layer has infinitely many neurons, finds the minimal dimension of each layer
  that can still reduce the loss.
• [5] proposes a network that, based on boosting theory, adaptively learns both the network structure and
  the weights so as to minimize a given loss.
• None of the works above consider multi-task learning;
  they all involve a procedure that repeatedly adds neurons.
• [12] proposes incrementally training a network that forms a hierarchical structure as new classes are
  given to the model.
  • However, it cannot grow the number of neurons at every layer, and the model ends up with many branches.
Related Works
Goal of the paper
• A lifelong learning method with no catastrophic forgetting,
• in which the model is not hindered from acquiring knowledge about new tasks,
• in which forward knowledge transfer occurs,
• in which backward knowledge transfer also occurs,
• and which uses memory efficiently.
Dynamically Expandable Networks
Overall Algorithm
Dynamically Expandable Networks
[Figure: the three stages of DEN: Selective Retraining, Dynamic Expansion, Split and Duplication.]
For the first task, the network is trained just like an ordinary DNN,
except that an l1-regularizer is used so that the weights are learned sparsely!
From the second task on, training reuses the weights learned up to the previous task.
Training consists of three stages (a high-level sketch of the loop follows below):
1. Selective Retraining  2. Dynamic Expansion  3. Split and Duplication
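My paraphrase of the overall training loop (illustrative only; selective_retrain, dynamic_expand, split_and_duplicate and the network methods are hypothetical placeholders that are sketched in the later slides):

def train_den(network, task_stream, tau):
    for t, data in enumerate(task_stream, start=1):
        if t == 1:
            network.train_sparse_l1(data)           # ordinary training with an l1 penalty
            continue
        loss = selective_retrain(network, data, t)  # 1. retrain only the relevant subnetwork
        if loss > tau:                              # the new task was not fitted well enough
            dynamic_expand(network, data, t)        # 2. add new hidden units, prune useless ones
        split_and_duplicate(network, data, t)       # 3. duplicate units that drifted too much
        network.timestamp(t)                        # record which units existed at task t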
The first stage, Selective Retraining, is literally retraining only a selected part of the network:
to learn the new task, only the important weights are updated.
Instead of retraining every weight, only a few edges are selectively updated.
(Why the update is selective becomes clear once Split and Duplication has been explained.)
The second stage, Dynamic Expansion, applies when Selective Retraining did not learn the new task
well enough (= the loss on the new task is too large):
in that case, hidden units are added to the network.
(The quantity checked in the figure is the loss on the new task.)
The third stage, Split and Duplication, takes the edges that changed too much during Selective Retraining
and splits them off into new hidden units.
This is the step that prevents catastrophic forgetting.
(The quantity checked in the figure is the difference between the weights from the previous tasks and the
current weights.)
Q. Why does Selective Retraining update the weights only selectively?
If the update were not selective, the number of hidden units that must be duplicated in the
Split and Duplication stage could become too large,
and the lifelong-learning goal of learning efficiently could not be met.
Q. In the Split and Duplication stage, why are only the units whose weights changed by more than a
threshold duplicated, instead of the units for every weight that changed during Selective Retraining?
If everything were duplicated, catastrophic forgetting would never occur,
but memory would be used inefficiently and backward knowledge transfer could never happen.
By duplicating only what changed beyond a threshold, we leave some room for catastrophic forgetting
and, at the same time, room for backward knowledge transfer!
Selective Retraining
Dynamically Expandable Networks
The first stage, Selective Retraining, retrains only a selected part of the network:
to learn the new task, only the important weights are updated,
so a few edges are selectively updated instead of all weights.
The edges that play an important role are found through the l1-regularizer:
going from the output layer toward the input layer, the task-t weights W_{l,t}^t are obtained layer by
layer (W_{l,t}^t denotes the weights at layer l for task t).
The algorithm in the paper is written as if only the final layer and the layer just below it were computed,
but in fact these weights have to be computed in advance for all layers.
Once the edges whose weights changed have been found, the next step finds the hidden units connected to
those edges; together they form the selected subnetwork S.
According to the paper's description, the objective should be read as follows:
W_S^t is retrained with W_S^{t-1} as its initial value,
with l2 regularization so that W_S^t itself does not grow too large.
Because only a very small subnetwork is trained, this retraining finishes quickly.
In short, the l1 regularizer is used to find the subnetwork that should change,
and that subnetwork is then properly retrained, as in the sketch below.
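An illustrative sketch of the two steps (mine, not the authors' code; grad_fn is a hypothetical placeholder returning the gradients of the new-task loss plus the l2 term): breadth-first selection of the subnetwork connected to the new task's output unit, followed by masked retraining that freezes every edge outside the selection.

import numpy as np

def select_subnetwork(weights, task_output_index):
    # weights[l] is the matrix between layer l and layer l+1; the last matrix ends at the output units.
    # Walk from the new task's output unit toward the input, keeping units reached via non-zero edges.
    selected = [None] * (len(weights) + 1)
    out_mask = np.zeros(weights[-1].shape[1], dtype=bool)
    out_mask[task_output_index] = True
    selected[-1] = out_mask
    for l in range(len(weights) - 1, -1, -1):
        # a unit in layer l is selected if it has a non-zero edge into a selected unit of layer l+1
        selected[l] = np.abs(weights[l][:, selected[l + 1]]).sum(axis=1) > 0
    return selected

def retrain_selected(weights, selected, grad_fn, lr=0.01, steps=100):
    # Only edges whose both endpoints are selected receive updates; everything else stays frozen.
    masks = [np.outer(selected[l], selected[l + 1]) for l in range(len(weights))]
    for _ in range(steps):
        grads = grad_fn(weights)
        for W, g, m in zip(weights, grads, masks):
            W -= lr * (g * m)                     # frozen edges get a zero update
    return weights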
Dynamic Expansion
Dynamically Expandable Networks
The second stage, Dynamic Expansion, applies when Selective Retraining did not learn the new task well
enough (= the loss on the new task is still too large):
a fixed number k of hidden units is added to the network,
and the newly added hidden units are then trained.
Here, too, an l1-regularizer is used so that the new weights are learned sparsely,
together with the group sparsity regularizer of Wen et al. (2016),
in which each group g is the set of incoming weights of one unit;
this is useful for identifying unnecessary units.
Because training uses this objective, the unnecessary added units can be identified and removed
(useless units are dropped for memory efficiency), as in the sketch below.
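An illustrative sketch of expansion and pruning for a single layer (mine, not the authors' code; the actual training loop with the task loss and the l1 term is omitted): k units are appended, the group-sparsity penalty pushes whole incoming-weight groups of useless units toward zero, and those units are then removed.

import numpy as np

def expand_layer(W_in, k, rng):
    # W_in: (n_prev, n_units) incoming weights of one layer; append k new units (columns).
    new_cols = rng.normal(scale=0.01, size=(W_in.shape[0], k))
    return np.concatenate([W_in, new_cols], axis=1)

def group_sparsity_grad(W_new, eps=1e-8):
    # Gradient of sum_g ||w_g||_2, where each group g is one new unit's incoming weight vector (a column).
    norms = np.linalg.norm(W_new, axis=0, keepdims=True)
    return W_new / (norms + eps)

def prune_useless_units(W, n_old, tol=1e-3):
    # Keep all old units; among the added ones, keep only those whose incoming weights are not all ~0.
    added = W[:, n_old:]
    keep = np.linalg.norm(added, axis=0) > tol
    return np.concatenate([W[:, :n_old], added[:, keep]], axis=1)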
Split and Duplication
Dynamically Expandable Networks
The third stage, Split and Duplication, takes the hidden units whose edges changed too much during
Selective Retraining and splits them off into new hidden units;
this is the step that prevents catastrophic forgetting.
(If no Selective Retraining happened, no Split and Duplication happens either!)
For every hidden unit, its weights are compared with the weights learned up to the previous task;
any unit whose weights changed too much is duplicated.
After duplication, fine-tuning is performed starting from the duplicated state (see the sketch below).
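An illustrative sketch of the split step for one layer's incoming weights (mine, not the authors' code; for brevity only incoming weights are handled, and the fine-tuning that follows is omitted): units that drifted beyond a threshold keep their previous-task weights, while a duplicated copy carries the drifted weights for the new task.

import numpy as np

def split_and_duplicate(W_prev, W_curr, sigma=0.02):
    # W_prev / W_curr: (n_prev, n_units) incoming weights before and after learning the new task.
    drift = np.linalg.norm(W_curr - W_prev, axis=0)   # per-unit amount of semantic drift
    split = drift > sigma
    W_out = W_curr.copy()
    W_out[:, split] = W_prev[:, split]                # old units are restored to their previous-task role
    duplicates = W_curr[:, split]                     # duplicated copies keep the drifted weights
    return np.concatenate([W_out, duplicates], axis=1), split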
Time Stamping
• Whenever a task is learned, record (per layer) the number of units that were used for that task.
• Using the time stamps, each task is performed with only the units that correspond to that task
  (a small sketch of time-stamped inference follows below).
• This prevents catastrophic forgetting of earlier tasks caused by newly added hidden units.
• It is a more flexible strategy than the one in [9], which stores the weights learned up to each stage.
  • Unlike with [9], backward transfer can occur in DEN,
  • because units that were not split can still benefit from other units trained on later tasks.
Dynamically Expandable Networks
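A minimal sketch of time-stamped inference (mine, not the authors' code; stamps[t][l] is assumed to hold the number of units layer l had when task t was learned): units added after task t are simply ignored when predicting for task t.

import numpy as np

def predict_for_task(weights, stamps, t, x, task_output_index):
    # weights[l]: (n_l, n_{l+1}) matrix; stamps[t] has one entry per layer, including the output layer.
    h = x[: stamps[t][0]]
    for l, W in enumerate(weights):
        W_t = W[: stamps[t][l], : stamps[t][l + 1]]   # use only units that existed at task t
        h = h @ W_t
        if l < len(weights) - 1:
            h = np.maximum(h, 0.0)                    # hidden ReLU
    return h[task_output_index]                       # task t's own output unit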
Experimental Results
• There are many experiments, but here we focus on the MNIST-Variation dataset.
• What is the MNIST-Variation dataset?
  • A dataset made by applying random permutations to the MNIST images.
  • A dataset built for the lifelong learning setting.
  • A dataset in which knowledge transfer is bound to occur.
Experiments
Experimental Results
• A graph showing how little catastrophic forgetting occurred.
Experiments
[Figure] DEN performs better than existing lifelong learning algorithms.
Experimental Results
• A graph showing how little capacity was used.
Experiments
[Figure] One of the compared models is not a lifelong learning setting.
DEN used capacity the most efficiently, and DEN-finetune is even more efficient:
DEN uses only 11.9~60.3% of the capacity used by existing lifelong learning algorithms.
Additional Experiments
• There were parts of the DEN model I wanted to understand better, so I decided to implement it.
• Fortunately, the authors have released their code!
  • https://github.com/jaehong-yoon93/DEN
• However, the released code appears to have a bug that makes Selective Retraining almost never happen.
• So I fixed it and ran my own experiments.
  • https://github.com/JoonyoungYi/DEN-tensorflow
• I report results on the MNIST-Variations dataset.
Experiments
Experimental Results
• Is the model really not hindered from acquiring knowledge about new tasks?
  • The experiments show that it is not hindered.
Experiments
Experimental Results
• Less catastrophic forgetting occurred than in the results reported in the paper.
• The network capacity probably differs from that used in the paper's experiments.
Experiments
Experimental Results
• Backward knowledge transfer was observed.
Experiments
[Figure: x-axis is the progression of tasks, y-axis is the performance on each task.]
Backward knowledge transfer is observed from time to time
(performance on an earlier task sometimes increases after a later task is learned).
Given the nature of the MNIST-Variations dataset, a lot of backward knowledge transfer is unlikely.
Conclusion
• DEN showed less catastrophic forgetting than the other models,
• was not hindered from acquiring knowledge about new tasks,
• exhibits forward knowledge transfer,
• exhibits backward knowledge transfer as well,
• and is an efficient lifelong learning method, using only 11.9~60.3% of the memory of existing methods.
Conclusion and Discussion
Limitations
• There are too many hyperparameters, and each of them has to be set by hand.
• Follow-up work has since appeared to overcome this limitation:
  • "Overcoming Catastrophic Forgetting with Hard Attention to the Task",
  • which addresses catastrophic forgetting in lifelong learning through hard attention.
Conclusion and Discussion
NAME | Joonyoung Yi (이준영)
EMAIL | joonyoung.yi@mli.kaist.ac.kr, joonyoung.yi@kaist.ac.kr
PHONE | +82-10-9765-0885
Reference
[1] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. "Overcoming catastrophic forgetting in neural networks." Proceedings of the National Academy of Sciences (PNAS), 2017.
[3] Eric Eaton and Paul L. Ruvolo. "ELLA: An efficient lifelong learning algorithm." In ICML, volume 28, pp. 507-515, JMLR Workshop and Conference Proceedings, 2013.
[5] Corinna Cortes, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. "AdaNet: Adaptive structural learning of artificial neural networks." arXiv preprint arXiv:1607.01097, 2016.
[6] Abhishek Kumar and Hal Daume III. "Learning task grouping and overlap in multi-task learning." In ICML, 2012.
[7] Sang-Woo Lee, Jin-Hwa Kim, Jung-Woo Ha, and Byoung-Tak Zhang. "Overcoming catastrophic forgetting by incremental moment matching." arXiv preprint arXiv:1703.08475, 2017.
[8] George Philipp and Jaime G. Carbonell. "Nonparametric neural networks." In ICLR, 2017.
[9] Andrei Rusu, Neil Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. "Progressive neural networks." arXiv preprint arXiv:1606.04671, 2016.
[12] Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Yuxin Peng, and Zheng Zhang. "Error-driven incremental learning in deep convolutional neural network for large-scale image classification." In Proceedings of the 22nd ACM International Conference on Multimedia, pp. 177-186, ACM, 2014.
[13] Friedemann Zenke, Ben Poole, and Surya Ganguli. "Continual learning through synaptic intelligence." In ICML, pp. 3987-3995, 2017.
[14] Guanyu Zhou, Kihyuk Sohn, and Honglak Lee. "Online incremental feature learning with denoising autoencoders." In International Conference on Artificial Intelligence and Statistics, pp. 1453-1461, 2012.
Appendix
