SELF-ENSEMBLING
FOR VISUAL DOMAIN ADAPTATION
ICLR 2018
MEAN TEACHERS ARE
BETTER ROLE MODELS
Weight-averaged consistency targets improve
semi-supervised deep learning results
ANTTI TARVAINEN & HARRI VALPOLA
HOW
THE MEAN TEACHER
WORKS
[Figure: examples labeled dog, cat, and horse]
SUPERVISED LEARNING
The true label pulls
predictions in
its direction.
[Figure: the model’s prediction among the classes dog, cat, and horse]
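To make the "pull" concrete, here is a minimal sketch in plain Python. It assumes the cost is cross-entropy on a softmax output and, to stay short, treats the three logits themselves as the trainable parameters; the starting logits and the learning rate are made-up values, not taken from the slides.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_step(logits, label_index, lr=0.5):
    """One gradient-descent step on the cross-entropy cost w.r.t. the logits."""
    probs = softmax(logits)
    # For softmax + cross-entropy the gradient w.r.t. the logits is
    # (probs - one_hot(label)).
    grads = [p - (1.0 if i == label_index else 0.0) for i, p in enumerate(probs)]
    return [z - lr * g for z, g in zip(logits, grads)]

classes = ["dog", "cat", "horse"]
logits = [0.0, 1.0, 0.5]           # hypothetical starting scores, favouring "cat"
before = softmax(logits)
after = softmax(supervised_step(logits, classes.index("dog")))

print("P(dog) before:", round(before[0], 3))
print("P(dog) after: ", round(after[0], 3))
```

After the step, the probability assigned to the true label is higher than before, which is the pull the slide describes.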
WHAT ABOUT EXAMPLES
WITH UNKNOWN LABELS?
How should we
adjust the prediction?
[Figure: an example with an unknown label (???) among the classes dog, cat, and horse]
We need two predictions:
a student and a teacher.
Then, hopefully,
the student can learn
something from the
teacher.
So how do we do this?
Two ways.
1. MAKE THE
TASK HARDER.
We distort the student’s
input, making its task
more challenging.
And then we train
the harder task to
predict the easier
task’s output.
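A rough sketch of this first idea, under several assumptions not stated on the slide: the model is a tiny linear classifier with made-up weights, the distortion is Gaussian input noise, and the cost between the two predictions is a mean squared error. The names `predict`, `distort`, and `consistency_cost` are hypothetical helpers for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Linear classifier: one weight row per class, followed by softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def distort(x, rng, std=0.3):
    """Make the task harder by adding Gaussian noise to the input (assumed noise type)."""
    return [xi + rng.gauss(0.0, std) for xi in x]

def consistency_cost(student_probs, teacher_probs):
    """Mean squared difference between the two predictions (assumed cost)."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
weights = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]   # rows: dog, cat, horse (made up)
x = [0.9, 0.4]                                     # an example with no label

teacher_probs = predict(weights, x)                # easier task: clean input
student_probs = predict(weights, distort(x, rng))  # harder task: distorted input
cost = consistency_cost(student_probs, teacher_probs)
print("consistency cost:", cost)
```

No label is needed anywhere in this computation, which is why the same cost can be applied to examples with unknown labels.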
2. MAKE THE
TEACHER BETTER.
We maintain an exponential
moving average of weights
to create a better teacher.
A mean teacher.
Then we let the
student learn these
better predictions.
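Written as a formula, an exponential moving average of weights can look like the following. Here θ stands for the student weights and θ’ for the teacher weights; the step index t and the smoothing coefficient α (a number between 0 and 1) are notation assumed for this sketch.

```latex
\theta'_t = \alpha \, \theta'_{t-1} + (1 - \alpha) \, \theta_t
```

With α close to 1 the teacher changes slowly and averages over many recent student weights; with α close to 0 it follows the student almost exactly.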
Combining these
two ways works
even better.
The student
and the teacher
improve each other
in a virtuous cycle.
HOW TO START USING
THE MEAN TEACHER
Take a supervised model.
[Diagram: input, model weights θ, prediction, label, classification cost; example class: dog]
Make a copy of it.
[Diagram: a student with weights θ and a teacher with weights θ’]
Update teacher weights
after each training step.
[Diagram: the teacher weights θ’ are an exponential moving average of the student weights θ]
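The copy and the per-step update can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the authors' implementation: weights are stored in a dictionary, `train_step` is a hypothetical stand-in for a real gradient step, and the value `alpha=0.99` is an assumed smoothing coefficient.

```python
import copy

def update_teacher(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight, in place."""
    for name, student_value in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * student_value

def train_step(student, step):
    """Hypothetical stand-in for a gradient step: it only nudges the weights."""
    for name in student:
        student[name] += 0.1 if step % 2 == 0 else -0.05

student = {"w": 0.0, "b": 1.0}     # made-up student weights (theta)
teacher = copy.deepcopy(student)   # make a copy of it (theta')

for step in range(100):
    train_step(student, step)          # the student is trained as usual
    update_teacher(teacher, student)   # the teacher is updated after each step

print("student:", student)
print("teacher:", teacher)
```

The teacher is never trained directly: its weights change only through the averaging step, so it follows the student with a lag and smooths out the step-to-step jitter.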
Add a cost between the two predictions.
[Diagram: a consistency cost between the student and teacher predictions, next to the classification cost]
Maybe add some noise.
[Diagram: noise applied to both the student and the teacher]
Start using it for semi-supervised learning.
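One way to see why this setup suits semi-supervised learning is to write out the combined cost for a batch in which some labels are missing. The sketch below assumes cross-entropy for the classification cost, mean squared error for the consistency cost, and a hypothetical `consistency_weight` that balances the two; the predictions are made-up numbers.

```python
import math

def classification_cost(student_probs, label):
    """Cross-entropy against a known label (assumed cost)."""
    return -math.log(student_probs[label])

def consistency_cost(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions (assumed cost)."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)

def total_cost(student_batch, teacher_batch, labels, consistency_weight=1.0):
    """Labeled examples pay both costs; unlabeled ones (label None) only consistency."""
    labeled = [(s, y) for s, y in zip(student_batch, labels) if y is not None]
    classification = sum(classification_cost(s, y) for s, y in labeled) / max(len(labeled), 1)
    consistency = sum(
        consistency_cost(s, t) for s, t in zip(student_batch, teacher_batch)
    ) / len(student_batch)
    return classification + consistency_weight * consistency

# Class order: dog, cat, horse. The second example has no label.
student_batch = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]]
teacher_batch = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
labels = [0, None]

print("total cost:", total_cost(student_batch, teacher_batch, labels))
```

The unlabeled example still contributes a training signal through the consistency term, so every image in the data set can be used, not only the labeled ones.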
RESULTS
CIFAR-10 WITH 4000 LABELS
50000 images, 4000 labeled (example class: truck)
Test error rate (%):
  Virtual Adversarial Training: 10.6
  Mean Teacher (ResNet): 6.3
  State of the art using all labels: 2.9

IMAGENET 2012 WITH 10% OF THE LABELS
1280000 images, 10% labeled (example class: brambling)
Top-5 validation error rate (%):
  Variational Auto-Encoder: 35.2
  Mean Teacher (ResNet-18): 19.8
  Mean Teacher (ResNet-152): 9.1 (these results are included in the arXiv version of the paper)
  State of the art using all the labels: 3.8
WHY TO USE
THE MEAN TEACHER
WHY MEAN TEACHER
1. It is easy to add to your model.
2. It can be adapted to different situations.
3. It gives good results.
Self-ensembling for visual domain adaptation