Learning With Neighbor
Consistency for Noisy Labels
Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid, CVPR2022
橋口凌大 (Nagoya Institute of Technology)
2023/6/30
Overview
• Proposes a method for learning from noisy labels
• Proposes Neighbor Consistency Regularization (NCR)
  • Encourages examples with similar feature representations to make similar predictions
• Achieves higher accuracy than the baselines
• Families of methods for learning from noisy data [Song+, IEEE TNNLS 2022]:
  • Robust architectures
  • Robust regularization
  • Robust loss-function design
  • Sample selection
Prior Work
• Correction using model predictions [Han+, NeurIPS2018][Li+, ICLR2020]
• Label propagation algorithms [Iscen+, CVPR2019]
[Figure from Han+, NeurIPS2018] Figure 1: Comparison of error flow among MentorNet (M-Net) [17], Decoupling [26] and Co-teaching. Assume that the error flow comes from the biased selection of training instances, and error flow from network A or B is denoted by red arrows or blue arrows, respectively. Left panel: M-Net maintains only one network (A). Middle panel: Decoupling maintains two networks (A & B). The parameters of the two networks are updated when their predictions disagree (!=). Right panel: Co-teaching maintains two networks (A & B) simultaneously. In each mini-batch of data, each network samples its small-loss instances as the useful knowledge, and teaches such useful instances to its peer network for further training. Thus, the error flow in Co-teaching displays the zigzag shape.
[Figure from Li+, ICLR2020] Figure 1: DivideMix trains two networks (A and B) simultaneously. At each epoch, a network models its per-sample loss distribution with a GMM to divide the dataset into a labeled set (mostly clean) and an unlabeled set (mostly noisy), which is then used as training data for the other network (i.e. co-divide). At each mini-batch, a network performs semi-supervised training using an improved MixMatch method. We perform label co-refinement on the labeled samples and label co-guessing on the unlabeled samples.
2.2 SEMI-SUPERVISED LEARNING
SSL methods aim to improve the model's performance by leveraging unlabeled data. Current state-of-the-art SSL methods mostly involve adding an additional loss term on unlabeled data to regularize training. The regularization falls into two classes: consistency regularization (Laine & Aila, 2017; Tarvainen & Valpola, 2017; Miyato et al., 2019) enforces the model to produce consistent predictions on augmented input data; entropy minimization (Grandvalet & Bengio, 2004; Lee, 2013) encourages the model to give high-confidence predictions on unlabeled data. Recently, Berthelot et al. (2019) propose MixMatch, which unifies consistency regularization, entropy minimization, and the MixUp (Zhang et al., 2018) regularization into one framework.
[Figure from Iscen+, CVPR2019] Pipeline of transductive label propagation. The network f_θ is a feature extractor followed by FC + softmax. Phase 1: train for T epochs with L_s(X_L, Y_L; θ) on labeled examples only. Phase 2, iterated T' times: extract descriptors V, compute the affinity A, form W = A + A^T, normalize as D^{-1/2} W D^{-1/2}, solve the label propagation system to obtain pseudo-labels, then train for one epoch with L_w(X, Y_L, Ŷ_U; θ) on all examples (labels, missing labels, and pseudo-labels weighted proportionally to their certainty ω_i).
Model Overview
• An image is fed into the feature extractor g
• The classifier h predicts the class
• NCR forces each sample's prediction toward those of its k nearest neighbors in the feature space of g(x) (a minimal sketch of this two-part model follows)
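The sketch below is our own illustration of the backbone/classifier split described above, not the authors' released code; `backbone` and `feat_dim` are placeholder names for, e.g., a ResNet-18 trunk and its output dimension:

```python
import torch.nn as nn

class NCRModel(nn.Module):
    # Minimal sketch: feature extractor g and classifier h, as described above.
    # `backbone` is a hypothetical module mapping images to d-dim features.
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.g = backbone                           # v = g(x), features
        self.h = nn.Linear(feat_dim, num_classes)   # z = h(v), class scores

    def forward(self, x):
        v = self.g(x)   # features: used to find the k nearest neighbors for NCR
        z = self.h(v)   # logits: used for cross-entropy and the NCR loss
        return v, z
```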
[Figure: method overview. A minibatch passes through the backbone and classifier; the loss function combines Cross Entropy with Neighbor Consistency Regularization computed over the neighborhood in feature space.]
NCR
• NCR is defined to prevent memorization of label noise [Liu+, NeurIPS2020]
  • The prediction distributions of the k nearest neighbors are weighted by their similarity
  • The model is trained to bring its own distribution close to this weighted combination
• Computing the similarity
• Training the network
Our goal is to enforce neighbor consistency regularization by leveraging the structure of the feature space produced by g_θ to enhance the classifier h_W. More specifically, h_W(v_i) and h_W(v_j) should behave similarly if s_{i,j} is high, regardless of their labels y_i and y_j. This would prevent the network from over-fitting to an incorrect mapping between an example x_i and a label y_i, if either (or both) y_i and y_j are noisy.

To enforce NCR, we design an objective function which minimizes the distance between logits z_i and z_j if the corresponding feature representations v_i and v_j are similar:

L_{NCR}(X, Y; \theta, W) := \frac{1}{m} \sum_{i=1}^{m} D_{KL}\left( \sigma(z_i/T) \,\middle\|\, \sum_{j \in \mathrm{NN}_k(v_i)} \frac{s_{i,j}}{\sum_k s_{i,k}} \cdot \sigma(z_j/T) \right), \quad (3)

where D_KL is the KL-divergence loss to measure the difference between two distributions, T is the temperature and NN_k(v_i) denotes the set of k nearest neighbors of i in the feature space. We set T = 2 throughout our experiments. We normalize the similarity values so that the second term of the KL-divergence loss remains a probability distribution.
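To make equation (3) concrete, here is a minimal PyTorch sketch of the NCR term computed within a mini-batch. It is an illustrative reading of the formula, not the authors' released implementation; the function name `ncr_loss` and its defaults are our own:

```python
import torch
import torch.nn.functional as F

def ncr_loss(features, logits, k=10, temperature=2.0, eps=1e-8):
    """Sketch of Eq. (3): KL between each example's softened prediction and
    the similarity-weighted mixture of its k nearest neighbors' predictions."""
    # Cosine similarity s_ij between all pairs in the batch.
    v = F.normalize(features, dim=1)
    sim = (v @ v.t()).clamp(min=0)   # in [0, 1] for ReLU features
    sim.fill_diagonal_(0)            # s_ii = 0: self-similarity must not dominate
    # Restrict to the k nearest neighbors and normalize rows to a distribution.
    vals, idx = sim.topk(k, dim=1)
    w = torch.zeros_like(sim).scatter_(1, idx, vals)
    w = w / w.sum(dim=1, keepdim=True).clamp(min=eps)
    # Temperature-softened predictions sigma(z / T).
    p = F.softmax(logits / temperature, dim=1)
    q = w @ p                        # neighbor mixture: sum_j w_ij * sigma(z_j / T)
    # D_KL(p_i || q_i), averaged over the mini-batch.
    return F.kl_div(q.clamp(min=eps).log(), p, reduction="batchmean")
```

As in the paper, gradients flow through both terms of the divergence, i.e. to all inputs.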
Figure 1. To address the problem of noisy labels in the training set, we propose Neighbor Consistency Regularization, which encourages examples with similar feature representations to have similar outputs, thus mitigating the impact of training with incorrect labels.
MixMatch [3] proposed variants of mixup for semi-supervised learning in which predictions replace the labels for unsupervised examples. Xie et al. [44] introduced Unsupervised Data Augmentation for semi-supervised image classification, where a model is encouraged to be robust to label-preserving transformations even when labels are not available, by minimizing the divergence between predictions for transformed and non-transformed images. Most relevant to our work, [10] used prediction consistency with respect to image transformations for the express purpose of learning with noisy labels. While these forms of consistency are effective regularizers, neighbor consistency offers the ability to transfer supervision directly to mislabelled examples.
3. Preliminaries

Let g_θ and h_W correspond to the feature extractor and classifier. The feature extractor maps an image x_i to a d-dimensional feature vector v_i := g_θ(x_i) ∈ R^d. The classifier maps this vector to class scores z_i := h_W(v_i). Typically, the network parameters are learned by minimizing a loss function for supervised classification:

L_S(X, Y; \theta, W) := \frac{1}{m} \sum_{i=1}^{m} \ell(\sigma(z_i), y_i), \quad (1)

where X and Y correspond to the sets of images and labels in a mini-batch, m = |X| = |Y| denotes the size of the mini-batch, σ is the softmax function and ℓ(q, y) is the cross-entropy loss.
To overcome this issue, we propose Neighbor Consistency Regularization (NCR). Our main assumption is that the over-fitting occurs less dramatically before the classifier h_W. This is supported by MOIT [30], which shows that feature representations are robust enough to discriminate between noisy and clean examples when training a network. With that assumption, we can design a smoothness constraint similar to label propagation (2) when training the network. The overview of our method is shown in Figure 1.

Let us define the similarity between two examples by the cosine similarity of their feature representations, i.e. s_{i,j} = \cos(v_i, v_j) = v_i^\top v_j / (\|v_i\| \|v_j\|). Note that the feature representations contain non-negative values when obtained after a ReLU non-linearity, and therefore the cosine similarity is bounded in the interval [0, 1]. Our goal is to enforce neighbor consistency regularization by leveraging the structure of the feature space produced by g_θ to enhance the classifier h_W.
We set the self-similarity s_{i,i} = 0 so that it does not dominate the normalized similarity. Gradients will be back-propagated to all inputs.

The objective (3) ensures that the output of x_i will be consistent with the output of its neighbors regardless of its potentially noisy label y_i. We combine it with the supervised classification loss function (1) to obtain the final objective to minimize during training:

L(X, Y; \theta, W) := (1 - \alpha) \cdot L_S(X, Y; \theta, W) + \alpha \cdot L_{NCR}(X, Y; \theta, W), \quad (4)
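Putting the two terms together, the final objective (4) can be sketched as below, reusing the hypothetical `ncr_loss` from above; `alpha` is the mixing hyper-parameter α:

```python
import torch.nn.functional as F

def total_loss(v, z, y, alpha=0.7, k=10, temperature=2.0):
    # Eq. (4): convex combination of supervised CE (Eq. 1) and NCR (Eq. 3).
    ce = F.cross_entropy(z, y)                           # L_S (softmax + CE)
    ncr = ncr_loss(v, z, k=k, temperature=temperature)   # L_NCR over the batch
    return (1.0 - alpha) * ce + alpha * ncr
```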
Experimental Setup
• Datasets
  • CIFAR-10, CIFAR-100 [Krizhevsky, 2009]
  • mini-ImageNet-Blue, -Red [Jiang+, PMLR2020]
  • mini-WebVision [Li+, ICLR2020]
  • WebVision [Li+, arXiv2017]
  • Clothing1M [Xiao+, CVPR2015]
• Backbones
  • ResNet-18, ResNet-50
Table 4. List of network hyperparameters used in our experiments.

              CIFAR-{10, 100}   mini-{Red, Blue}   mini-WebVision   Clothing1M
Opt.          SGD (all)
Momentum      0.9 (all)
Batch         256               128                256              128
LR            0.1               0.1                0.1              0.002
LR Sched.     cosine decay with linear warmup (all)
Warmup        5 (all)
Epochs        250               130                130              80
Weight Dec.   5e-4              5e-4               1e-3             1e-3
Arch.         ResNet-18         ResNet-18          ResNet-50        ResNet-50
Effect of Hyperparameters
• Effect of the NCR weight α
  • Larger α is effective
• Effect of the number of neighbors k
  • k = 10 yields high accuracy
• Effect of the starting epoch e
  • e = 0 shows better performance
Figure 2. Ablation study. Impact of hyperparameters α, k and e are evaluated on the CIFAR-10 validation set with ResNet-18 (curves for 0%, 20% and 40% noise).
We now present our method, Neighbor Consistency Regularization, compare it to classical label propagation, and highlight its relationship to similar, online techniques.

When training with noisy labels, the network is prone to memorize the mapping from x_i to a noisy label in the training data [26]. This behavior typically results in poor classification performance in a clean evaluation, as the network does not generalize well. The objective (3) ensures that the output of x_i will be consistent with the output of its neighbors regardless of its potentially noisy label y_i, and is combined with the supervised loss (1) in the final objective (4), where the hyper-parameter α ∈ [0, 1] controls the impact of each loss term. Similar to label propagation, the final loss objective (4) has two terms: the first is the classification loss L_S, analogous to the fitting term in label propagation, while the NCR term encourages the outputs of nearby points in the graph to be similar.
One of the main limitations of label propagation is its transductive property. In transductive learning, the goal is to classify seen unlabeled examples. This is different to inductive learning, which learns a generic classifier to classify any unseen data. To apply label propagation on new test examples, a new graph W needs to be constructed each time a test example is seen. This makes it inefficient in practice. Another requirement for label propagation is that the feature space needs to be fixed to compute the affinity matrix W. This requires the feature extractor to be learned beforehand, potentially from the noisy data. Existing work [15] has tried to overcome this issue by alternating between optimizing the feature space and performing label propagation. However, this does not directly enforce smoothness, as the optimization of the two components is done separately.

Our goal is to overcome the limitations of label propagation by 1) adapting it to an inductive setting and 2) applying the smoothness constraint directly during optimization. In Section 4, we propose a simple and efficient approach which generalizes label propagation by enforcing smoothness in the form of a regularizer. As a result, we avoid constructing an explicit graph to propagate the information, and inference can be performed on any unseen test example.
Backbone: ResNet-18
Dataset: CIFAR-10
Comparison
• Mean over five trials is reported
• Up to 17.6% improvement in accuracy
• Improvement even with 0% noise
  • NCR also acts as a regularizer
Table 1. Baseline and oracle comparison. Classification accuracy is reported on the mini-ImageNet-{Blue, Red} datasets with a ResNet-18 architecture for each individual noise ratio (0%, 20%, 40%, 80%). We present the mean accuracy and standard deviation over five trials. The oracle model is trained on only the known, clean examples in the training set using a cross-entropy loss.
mini-ImageNet-Blue mini-ImageNet-Red
Method 0% 20% 40% 80% 0% 20% 40% 80%
BASELINES
Standard 65.8±0.4 49.5±0.4 36.6±0.5 13.1±1.0 63.5±0.5 55.3±0.9 49.5±0.7 36.4±0.4
Mixup 67.4±0.4 60.1±0.2 51.6±0.8 21.0±0.5 65.5±0.5 61.6±0.5 57.2±0.6 43.7±0.3
Bootstrap 66.4±0.4 54.4±0.5 44.8±0.5 2.9±0.3 64.3±0.3 56.2±0.2 51.3±0.6 38.2±0.3
Bootstrap + Mixup 67.5±0.3 61.9±0.4 51.0±0.7 1.3±0.1 65.9±0.4 62.7±0.2 58.3±0.5 43.5±0.6
Label smoothing 67.5±0.8 60.2±0.5 50.2±0.4 20.9±0.8 65.7±0.5 59.7±0.4 54.0±0.6 39.7±0.5
Label smoothing + Mixup 68.6±0.3 63.3±1.0 57.1±0.2 14.4±0.3 66.9±0.2 63.4±0.4 59.2±0.4 45.5±0.7
OURS
Ours: NCR 66.5±0.2 61.7±0.3 54.1±0.4 20.7±0.5 64.0±0.4 60.9±0.3 56.1±0.7 40.9±0.2
Ours: NCR + Mixup 67.9±0.6 64.3±0.1 59.2±0.6 14.2±0.4 66.3±0.5 64.6±0.6 60.4±0.3 45.4±0.4
OTHER WORKS
D-Mix [22] – – – – 55.8 50.3 50.9 35.4
ELR [26] – – – – 57.4 58.1 50.6 41.7
MOIT [30] – – – – 64.7 63.1 60.8 45.9
ORACLE: CLEAN SUBSET
Standard 65.8±0.4 63.9±0.5 60.6±0.4 45.4±0.8 63.5±0.5 61.7±0.1 58.4±0.3 41.5±0.5
Mixup 67.4±0.4 64.2±0.5 61.5±0.3 46.9±0.8 65.5±0.5 63.1±0.6 59.7±0.7 43.6±0.4
Comparison
• Confidence scores and their distribution over the data
  • With the baseline (top row), the noise is overfit
    • Both clean and noisy examples concentrate at p = 1
  • NCR (bottom row) avoids this overfitting
    • Most noisy examples lie near p = 0
[Figure: per-example confidence distributions for Standard (top) and NCR (bottom) on Blue-40%, Blue-80%, Red-40% and Red-80%.]
Similarity Distributions in Feature Space
• Distributions of cosine similarity between correctly and incorrectly labelled pairs
• On Blue-40%, the pairs are separated correctly
• On Red, correctly labelled pairs are separated, but incorrectly labelled pairs are not
  • In Red, mislabeling occurs between visually similar classes, so such examples behave similarly in feature space; their similarity stays high and separation fails
[Figure: similarity distributions for Standard (top) and NCR (bottom) on Blue-40%, Blue-80%, Red-40% and Red-80%; see the Figure 4 caption below.]
Comparison
Table 3. State-of-the-art comparison with synthetic noise on CIFAR. A-40% refers to 40% asymmetric noise. All of the other columns refer to symmetric noise.

             CIFAR-10                                   CIFAR-100
             20%    40%    50%    80%    90%    A-40%   20%    40%    50%    80%    90%    A-40%
Standard     83.9   68.3   58.5   25.9   17.3   77.3    61.5   46.2   37.4   10.4   4.1    43.9
MOIT+ [30]   94.1   92.0   -      75.8   -      93.2    75.9   67.4   -      51.4   -      74.0
D-Mix [22]   95.1   94.2   93.6   91.4   74.5   91.8    76.7   74.6   73.1   57.1   29.7   72.1
ELR+ [26]    94.9   94.4   93.9   90.9   74.5   88.9    76.3   74.0   72.0   57.2   30.9   75.8
Ours+ [26]   95.2   94.5   94.3   91.6   75.1   90.7    76.6   74.2   72.5   58.0   30.8   76.3

We use the same hyperparameters across all noise ratios, unlike Divide-Mix [22], which is a more realistic scenario in practice.
Figure 4. Similarity distributions. We compare the distribution of cosine similarities for training examples in mini-ImageNet that are
correctly and incorrectly labelled as the same class or different classes. For mini-ImageNet-Blue, the features learned using NCR achieve
significantly better class separation with 40% noise (or less, not pictured). For the more realistic mini-ImageNet-Red, NCR still achieves
better separation of the clean examples but fails to separate examples that are incorrectly labelled as the same class.
Table 2. State-of-the-art comparison with realistic noise. We compare NCR and our baselines to other methods on mini-WebVision, WebVision and Clothing1M. All results use ResNet-50, except for those marked by † which use Inception-ResNetV2.

                      mini-WebVision   WebVision   Clothing1M
Standard              75.8             74.9        71.7
Mixup                 77.2             75.5        72.2
Ours: NCR             77.1             75.7        74.4
Ours: NCR+Mixup       79.4             75.4        74.5
Ours: NCR+Mixup+DA    80.5             76.8        74.6
MLNT; 3 iter. [23]    –                –           73.5
CleanNet [21]         –                –           74.7
LDMI [40]             –                –           72.5
LongReMix [6]         –                –           73.0
ELR [26]              76.3†            –           –
ELR+ [26]             77.8†            –           74.8
DMix [22]             76.3             –           74.8
GJS [10]              79.3             –           –
MoPro [24]            –                73.9        –
MILe [32]             –                75.2        –
Heteroscedastic [5]   –                76.6†       –
CurrNet [12]          –                79.3†       –
6. Conclusion

This work introduced Neighbor Consistency Regularization and demonstrated that it is an effective strategy for deep learning with label noise. While our approach draws inspiration from multi-stage training procedures for semi-supervised learning that employ transductive label propagation, it consists of a comparatively simple training procedure, requiring only that an extra loss be added to the objective which is optimized in stochastic gradient descent. The efficacy of NCR is emphasized by the fact that it achieved state-of-the-art results under both synthetic (CIFAR-10 and -100) and realistic (mini-WebVision) noise scenarios.

Limitations and future work. A limitation of NCR is that our proposed loss assumes that it has access to an adequate feature representation of the training data. We overcome this limitation in practice by first training the network for e epochs before applying the NCR loss, but future work is to remove this additional training hyperparameter.
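In practice this warm-up can be a one-line gate in the training loop; a sketch under the assumption that `epoch` counts from zero and `e` is the hyper-parameter from Table 5:

```python
# Apply NCR only once the features are adequate: the mixing weight is zero
# for the first e warm-up epochs, then switches to its tuned value alpha.
alpha_t = 0.0 if epoch < e else alpha
loss = (1.0 - alpha_t) * ce + alpha_t * ncr
```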
• Top accuracy on the realistic datasets
• 1.2% more accurate than GJS [Englesson & Azizpour, NeurIPS2021], a similar method that adds a regularization term
• Existing methods can be extended with high accuracy simply by adding the regularization term
  • The "Ours+" model, which adds NCR to ELR, is compared against existing methods
Summary
• Proposed NCR for learning on data containing label noise
• Consists of a simple training procedure
• Effective for both synthetic and realistic noise
Supplementary Material
• Difference between mini-ImageNet Blue and Red [Jiang+, PMLR2020]
  • Blue noise: synthetic symmetric labels mixed into the clean mini-ImageNet classes
  • Red noise: real mislabeled web images retrieved by text search and image search
[Figure from the supplementary material of Jiang+, PMLR2020 ("Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels"): true positives vs. blue-noise and red-noise examples for the Ladybug and Orange classes.]
Supplementary Material
• NCR hyperparameters used in the experiments
• Effect of batch size on accuracy for WebVision, which contains 1000 classes
  • A large batch size is needed so that visually similar classes appear within each batch

Table 5. List of NCR hyperparameters used in our experiments.

     mini-ImageNet                     CIFAR-10   CIFAR-100   WebVision   Clothing1M
     0%     20%    40%    80%
α    0.7    0.7    0.7    0.5          0.1        0.1         0.5         0.9
k    50     1      1      1            10         10          10          1
e    100    50     50     0            50         200         0           40

Table 6. Effect of the batch size on our proposed NCR method, on the WebVision dataset containing 1000 classes.

Batch Size   256    512    1024   2048
Accuracy     73.9   75.0   75.7   75.6
More Related Content

What's hot

DNNの曖昧性に関する研究動向
DNNの曖昧性に関する研究動向DNNの曖昧性に関する研究動向
DNNの曖昧性に関する研究動向Naoki Matsunaga
 
強化学習における好奇心
強化学習における好奇心強化学習における好奇心
強化学習における好奇心Shota Imai
 
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision LearnersMasked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision LearnersGuoqingLiu9
 
SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習
SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習
SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習SSII
 
論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"Yuta Koreeda
 
[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanisms[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanismsDeep Learning JP
 
メタスタディ (Vision and Language)
メタスタディ (Vision and Language)メタスタディ (Vision and Language)
メタスタディ (Vision and Language)Shintaro Yamamoto
 
[DL輪読会]Understanding Black-box Predictions via Influence Functions
[DL輪読会]Understanding Black-box Predictions via Influence Functions [DL輪読会]Understanding Black-box Predictions via Influence Functions
[DL輪読会]Understanding Black-box Predictions via Influence Functions Deep Learning JP
 
[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...
[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...
[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...Deep Learning JP
 
強化学習その2
強化学習その2強化学習その2
強化学習その2nishio
 
【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings (EMNLP 2021)
【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings  (EMNLP 2021)【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings  (EMNLP 2021)
【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings (EMNLP 2021)Deep Learning JP
 
【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces
【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces
【DL輪読会】Efficiently Modeling Long Sequences with Structured State SpacesDeep Learning JP
 
Anomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたAnomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたぱんいち すみもと
 
[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder
[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder
[DL輪読会]NVAE: A Deep Hierarchical Variational AutoencoderDeep Learning JP
 
Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Yoshitaka Ushiku
 
最適輸送の解き方
最適輸送の解き方最適輸送の解き方
最適輸送の解き方joisino
 
【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...
【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...
【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...Deep Learning JP
 
近年のHierarchical Vision Transformer
近年のHierarchical Vision Transformer近年のHierarchical Vision Transformer
近年のHierarchical Vision TransformerYusuke Uchida
 

What's hot (20)

DNNの曖昧性に関する研究動向
DNNの曖昧性に関する研究動向DNNの曖昧性に関する研究動向
DNNの曖昧性に関する研究動向
 
強化学習における好奇心
強化学習における好奇心強化学習における好奇心
強化学習における好奇心
 
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision LearnersMasked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
 
SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習
SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習
SSII2020 [OS2-02] 教師あり事前学習を凌駕する「弱」教師あり事前学習
 
論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"論文紹介 "DARTS: Differentiable Architecture Search"
論文紹介 "DARTS: Differentiable Architecture Search"
 
[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanisms[DL輪読会]representation learning via invariant causal mechanisms
[DL輪読会]representation learning via invariant causal mechanisms
 
メタスタディ (Vision and Language)
メタスタディ (Vision and Language)メタスタディ (Vision and Language)
メタスタディ (Vision and Language)
 
[DL輪読会]Understanding Black-box Predictions via Influence Functions
[DL輪読会]Understanding Black-box Predictions via Influence Functions [DL輪読会]Understanding Black-box Predictions via Influence Functions
[DL輪読会]Understanding Black-box Predictions via Influence Functions
 
[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...
[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...
[DL輪読会]Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-...
 
強化学習その2
強化学習その2強化学習その2
強化学習その2
 
【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings (EMNLP 2021)
【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings  (EMNLP 2021)【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings  (EMNLP 2021)
【DL輪読会】SimCSE: Simple Contrastive Learning of Sentence Embeddings (EMNLP 2021)
 
【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces
【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces
【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces
 
continual learning survey
continual learning surveycontinual learning survey
continual learning survey
 
Anomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたAnomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめた
 
[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder
[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder
[DL輪読会]NVAE: A Deep Hierarchical Variational Autoencoder
 
Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)
 
最適輸送の解き方
最適輸送の解き方最適輸送の解き方
最適輸送の解き方
 
【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...
【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...
【DL輪読会】Standardized Max Logits: A Simple yet Effective Approach for Identifyi...
 
近年のHierarchical Vision Transformer
近年のHierarchical Vision Transformer近年のHierarchical Vision Transformer
近年のHierarchical Vision Transformer
 
BERT入門
BERT入門BERT入門
BERT入門
 

Similar to 論文紹介:Learning With Neighbor Consistency for Noisy Labels

A simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsA simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsDevansh16
 
論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning
論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning
論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image CaptioningToru Tamaki
 
Deep learning ensembles loss landscape
Deep learning ensembles loss landscapeDeep learning ensembles loss landscape
Deep learning ensembles loss landscapeDevansh16
 
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...cvpaper. challenge
 
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A SurveyToru Tamaki
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)Masahiro Suzuki
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksWilly Marroquin (WillyDevNET)
 
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREM ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREijcga
 
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Illustration Clamor Echelon Evaluation via Prime Piece PsychotherapyIllustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Illustration Clamor Echelon Evaluation via Prime Piece PsychotherapyIJMER
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...Pooyan Jamshidi
 
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptxthanhdowork
 
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION cscpconf
 
Joint3DShapeMatching - a fast approach to 3D model matching using MatchALS 3...
Joint3DShapeMatching  - a fast approach to 3D model matching using MatchALS 3...Joint3DShapeMatching  - a fast approach to 3D model matching using MatchALS 3...
Joint3DShapeMatching - a fast approach to 3D model matching using MatchALS 3...Mamoon Ismail Khalid
 
An Algorithm For Vector Quantizer Design
An Algorithm For Vector Quantizer DesignAn Algorithm For Vector Quantizer Design
An Algorithm For Vector Quantizer DesignAngie Miller
 
Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011Riccardo Satta
 
Continuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based SystemsContinuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based SystemsCHOOSE
 

Similar to 論文紹介:Learning With Neighbor Consistency for Noisy Labels (20)

A simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsA simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representations
 
論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning
論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning
論文紹介:Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning
 
Deep learning ensembles loss landscape
Deep learning ensembles loss landscapeDeep learning ensembles loss landscape
Deep learning ensembles loss landscape
 
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
 
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
文献紹介:Learning From Noisy Labels With Deep Neural Networks: A Survey
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
 
Visualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional NetworksVisualizing and Understanding Convolutional Networks
Visualizing and Understanding Convolutional Networks
 
ssc_icml13
ssc_icml13ssc_icml13
ssc_icml13
 
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREM ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASURE
 
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Illustration Clamor Echelon Evaluation via Prime Piece PsychotherapyIllustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
 
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image RecognitionDeep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
 
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
PERFORMANCE EVALUATION OF DIFFERENT TECHNIQUES FOR TEXTURE CLASSIFICATION
 
G04654247
G04654247G04654247
G04654247
 
Joint3DShapeMatching - a fast approach to 3D model matching using MatchALS 3...
Joint3DShapeMatching  - a fast approach to 3D model matching using MatchALS 3...Joint3DShapeMatching  - a fast approach to 3D model matching using MatchALS 3...
Joint3DShapeMatching - a fast approach to 3D model matching using MatchALS 3...
 
An Algorithm For Vector Quantizer Design
An Algorithm For Vector Quantizer DesignAn Algorithm For Vector Quantizer Design
An Algorithm For Vector Quantizer Design
 
Joint3DShapeMatching
Joint3DShapeMatchingJoint3DShapeMatching
Joint3DShapeMatching
 
Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011
 
Continuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based SystemsContinuous Architecting of Stream-Based Systems
Continuous Architecting of Stream-Based Systems
 

More from Toru Tamaki

論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...Toru Tamaki
 
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...Toru Tamaki
 
論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNetToru Tamaki
 
論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A surveyToru Tamaki
 
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex ScenesToru Tamaki
 
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...Toru Tamaki
 
論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video SegmentationToru Tamaki
 
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New HopeToru Tamaki
 
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...Toru Tamaki
 
論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt TuningToru Tamaki
 
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in MoviesToru Tamaki
 
論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICAToru Tamaki
 
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context RefinementToru Tamaki
 
論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...
論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...
論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...Toru Tamaki
 
論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...
論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...
論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...Toru Tamaki
 
論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion
論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion
論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusionToru Tamaki
 
論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous DrivingToru Tamaki
 
論文紹介:Spatio-Temporal Action Detection Under Large Motion
論文紹介:Spatio-Temporal Action Detection Under Large Motion論文紹介:Spatio-Temporal Action Detection Under Large Motion
論文紹介:Spatio-Temporal Action Detection Under Large MotionToru Tamaki
 
論文紹介:Vision Transformer Adapter for Dense Predictions
論文紹介:Vision Transformer Adapter for Dense Predictions論文紹介:Vision Transformer Adapter for Dense Predictions
論文紹介:Vision Transformer Adapter for Dense PredictionsToru Tamaki
 
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understandingToru Tamaki
 

More from Toru Tamaki (20)

論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
 
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
 
論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet
 
論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey
 
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
 
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
 
論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation
 
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
 
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
 
論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning
 
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
 
論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA
 
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
 
論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...
論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...
論文紹介:Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Lo...
 
論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...
論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...
論文紹介:MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Lon...
 
論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion
論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion
論文紹介:Revealing the unseen: Benchmarking video action recognition under occlusion
 
論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
論文紹介:Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
 
論文紹介:Spatio-Temporal Action Detection Under Large Motion
論文紹介:Spatio-Temporal Action Detection Under Large Motion論文紹介:Spatio-Temporal Action Detection Under Large Motion
論文紹介:Spatio-Temporal Action Detection Under Large Motion
 
論文紹介:Vision Transformer Adapter for Dense Predictions
論文紹介:Vision Transformer Adapter for Dense Predictions論文紹介:Vision Transformer Adapter for Dense Predictions
論文紹介:Vision Transformer Adapter for Dense Predictions
 
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Paper introduction: Learning With Neighbor Consistency for Noisy Labels

  • 1. Learning With Neighbor Consistency for Noisy Labels. Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid, CVPR 2022. Presented by 橋口凌大 (Nagoya Institute of Technology), 2023/6/30.
  • 2. Overview
    - Proposes a method for learning from noisy labels: Neighbor Consistency Regularization (NCR), which encourages examples with similar feature representations to make similar predictions.
    - Achieves higher accuracy than the baselines.
    - Families of methods for learning from noisy data [Song+, IEEE TNNLS 2022]: robust architectures, robust regularization, robust loss-function design, and sample selection.
  • 3. Prior work
    - Correction using model predictions [Han+, NeurIPS2018] [Li+, ICLR2020].
    - Label propagation algorithms [Iscen+, CVPR2019].
    [Figures from the cited papers: (1) comparison of error flow among MentorNet (M-Net), Decoupling and Co-teaching, where Co-teaching trains two networks that exchange their small-loss instances [Han+, NeurIPS2018]; (2) DivideMix, which trains two networks that each fit a GMM to per-sample losses to co-divide the data into clean/noisy sets and then apply MixMatch-style semi-supervised training with label co-refinement and co-guessing [Li+, ICLR2020]; (3) two-phase training with transductive label propagation over a nearest-neighbor graph, producing pseudo-labels weighted by certainty [Iscen+, CVPR2019].]
  • 5. NCR
    - NCR is defined to prevent memorization of noisy labels [Liu+, NeurIPS2020]:
      • weight the prediction distributions of the k nearest neighbors by similarity;
      • train each example so that its own distribution matches this aggregated distribution.
    - Notation: the feature extractor maps an image to $v_i = g_\theta(x_i) \in \mathbb{R}^d$ and the classifier maps features to logits $z_i = h_W(v_i)$.
    - Similarity computation: $s_{i,j} = \cos(v_i, v_j) = v_i^\top v_j / (\lVert v_i \rVert \lVert v_j \rVert)$. Features taken after a ReLU are non-negative, so $s_{i,j} \in [0, 1]$; self-similarity is set to $s_{i,i} = 0$ so it does not dominate the normalized weights.
    - Network training: the main assumption is that over-fitting occurs less dramatically before the classifier $h_W$ (supported by MOIT [30], which shows that feature representations remain robust enough to discriminate noisy from clean examples). NCR therefore enforces that $h_W(v_i)$ and $h_W(v_j)$ behave similarly whenever $s_{i,j}$ is high, regardless of the labels $y_i$ and $y_j$, which prevents the network from fitting an incorrect mapping between $x_i$ and a noisy $y_i$. The objective minimizes the distance between logits $z_i$ and $z_j$ when the corresponding features are similar:
      $$\mathcal{L}_{\mathrm{NCR}}(X, Y; \theta, W) := \frac{1}{m} \sum_{i=1}^{m} D_{\mathrm{KL}}\!\left( \sigma(z_i / T) \,\middle\|\, \sum_{j \in \mathrm{NN}_k(v_i)} \frac{s_{i,j}}{\sum_k s_{i,k}} \, \sigma(z_j / T) \right), \qquad (3)$$
      where $D_{\mathrm{KL}}$ is the KL divergence, $\sigma$ is the softmax, $T = 2$ is the temperature (used throughout the experiments), and $\mathrm{NN}_k(v_i)$ is the set of k nearest neighbors of $v_i$ in feature space. The similarities are normalized so the second argument remains a probability distribution, and gradients are back-propagated to all inputs. Objective (3) keeps the output for $x_i$ consistent with its neighbors regardless of its potentially noisy label $y_i$.
    - Final objective, combining NCR with the supervised loss $\mathcal{L}_S$ (see the PyTorch sketch below):
      $$\mathcal{L}(X, Y; \theta, W) := (1 - \alpha) \cdot \mathcal{L}_S(X, Y; \theta, W) + \alpha \cdot \mathcal{L}_{\mathrm{NCR}}(X, Y; \theta, W), \qquad (4)$$
      where $\alpha \in [0, 1]$ controls the weight of each term. Unlike transductive label propagation, this enforces smoothness directly during optimization, needs no explicit graph or fixed pre-learned feature space, and applies to unseen test examples (inductive).
    [Figure 1. Overview: a mini-batch passes through the backbone and classifier; the loss combines cross-entropy with Neighbor Consistency Regularization over the neighborhood in feature space.]
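To make Eqs. (3)-(4) concrete, here is a minimal PyTorch sketch written from the formulas above; it is our own illustration, not the authors' code. It assumes neighbors are searched within the mini-batch (consistent with the batch-size study on the last slide) and that k is smaller than the batch size; the names ncr_loss and total_loss are hypothetical.

    import torch
    import torch.nn.functional as F

    def ncr_loss(features, logits, k=10, temperature=2.0, eps=1e-8):
        # features: (m, d) embeddings v_i = g_theta(x_i); logits: (m, C) scores z_i = h_W(v_i).
        m = features.size(0)
        v = F.normalize(features, dim=1)
        sim = v @ v.t()                                  # cosine similarities s_ij, shape (m, m)
        eye = torch.eye(m, dtype=torch.bool, device=sim.device)
        sim = sim.masked_fill(eye, 0.0)                  # s_ii = 0: no self-neighbors
        topk_sim, topk_idx = sim.topk(k, dim=1)          # k nearest neighbors within the batch
        weights = topk_sim / (topk_sim.sum(dim=1, keepdim=True) + eps)   # rows sum to 1
        probs = F.softmax(logits / temperature, dim=1)   # sigma(z / T), T = 2 in the paper
        neighbor_mix = (weights.unsqueeze(-1) * probs[topk_idx]).sum(dim=1)  # (m, C) target mixture
        # Eq. (3): KL( sigma(z_i/T) || weighted neighbor mixture ); nothing is detached,
        # so gradients are back-propagated to all inputs, as stated on the slide.
        return F.kl_div((neighbor_mix + eps).log(), probs, reduction="batchmean")

    def total_loss(logits, labels, features, alpha=0.7, k=10):
        # Eq. (4): convex combination of supervised cross-entropy and NCR.
        return (1 - alpha) * F.cross_entropy(logits, labels) + alpha * ncr_loss(features, logits, k=k)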
  • 6. Experimental setup
    - Datasets: CIFAR-10/100 [Krizhevsky, 2009]; mini-ImageNet-Blue/Red [Jiang+, PMLR2020]; mini-WebVision [Li+, ICLR2020]; WebVision [Li+, arXiv2017]; Clothing1M [Xiao+, CVPR2015].
    - Backbones: ResNet-18 and ResNet-50.
    Table 4. Network hyperparameters used in the experiments:
                     CIFAR-{10,100}   mini-{Red,Blue}   mini-WebVision   Clothing1M
      Optimizer      SGD, momentum 0.9 (all datasets)
      Batch size     256              128               256              128
      Learning rate  0.1              0.1               0.1              0.002
      LR schedule    cosine decay with linear warmup (5 warmup epochs, all datasets)
      Epochs         250              130               130              80
      Weight decay   5e-4             5e-4              1e-3             1e-3
      Architecture   ResNet-18        ResNet-18         ResNet-50        ResNet-50
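A hedged sketch of the optimization recipe in Table 4, using the CIFAR column for concreteness. The warmup start factor and the model stand-in are our assumptions (the table only specifies SGD with momentum 0.9 and cosine decay with a linear warmup); assumes PyTorch >= 1.10 for SequentialLR.

    import torch
    from torchvision.models import resnet18

    model = resnet18(num_classes=10)      # CIFAR-10 column of Table 4
    epochs, warmup_epochs = 250, 5

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer,
        schedulers=[
            # Linear warmup for the first 5 epochs (start factor is our assumption).
            torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01,
                                              total_iters=warmup_epochs),
            # Cosine decay for the remaining epochs.
            torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=epochs - warmup_epochs),
        ],
        milestones=[warmup_epochs],
    )

    for epoch in range(epochs):
        # ... one pass over the noisy training set, minimizing Eq. (4) ...
        scheduler.step()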
  • 7. Effect of hyperparameters (backbone: ResNet-18; dataset: CIFAR-10 validation set)
    - NCR weight α: larger α is effective.
    - Number of neighbors k: k = 10 gives high accuracy.
    - Start epoch e: e = 0 performs best here (see the gating sketch below).
    [Figure 2. Ablation study: accuracy as a function of α, k and e on the CIFAR-10 validation set under 0%, 20% and 40% noise.]
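The start epoch e simply gates when the NCR term switches on: the network is first trained for e epochs with cross-entropy alone so the feature representation is adequate, after which Eq. (4) takes over. A one-line sketch with our own naming:

    def ncr_weight(epoch, e=0, alpha=0.7):
        # Cross-entropy only for the first e epochs, then the combined objective (4).
        return 0.0 if epoch < e else alpha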
  • 8. Comparison
    - Mean accuracy and standard deviation over five trials are reported.
    - Up to 17.6% improvement over the standard baseline.
    - Accuracy improves even at 0% noise: NCR also acts as a regularizer.
    Table 1. Baseline and oracle comparison on mini-ImageNet-{Blue, Red} with ResNet-18, per noise ratio (0%, 20%, 40%, 80%). The oracle is trained only on the known clean examples with cross-entropy.
                                mini-ImageNet-Blue                               mini-ImageNet-Red
      Method                    0%        20%       40%       80%                0%        20%       40%       80%
      Standard                  65.8±0.4  49.5±0.4  36.6±0.5  13.1±1.0           63.5±0.5  55.3±0.9  49.5±0.7  36.4±0.4
      Mixup                     67.4±0.4  60.1±0.2  51.6±0.8  21.0±0.5           65.5±0.5  61.6±0.5  57.2±0.6  43.7±0.3
      Bootstrap                 66.4±0.4  54.4±0.5  44.8±0.5   2.9±0.3           64.3±0.3  56.2±0.2  51.3±0.6  38.2±0.3
      Bootstrap + Mixup         67.5±0.3  61.9±0.4  51.0±0.7   1.3±0.1           65.9±0.4  62.7±0.2  58.3±0.5  43.5±0.6
      Label smoothing           67.5±0.8  60.2±0.5  50.2±0.4  20.9±0.8           65.7±0.5  59.7±0.4  54.0±0.6  39.7±0.5
      Label smoothing + Mixup   68.6±0.3  63.3±1.0  57.1±0.2  14.4±0.3           66.9±0.2  63.4±0.4  59.2±0.4  45.5±0.7
      Ours: NCR                 66.5±0.2  61.7±0.3  54.1±0.4  20.7±0.5           64.0±0.4  60.9±0.3  56.1±0.7  40.9±0.2
      Ours: NCR + Mixup         67.9±0.6  64.3±0.1  59.2±0.6  14.2±0.4           66.3±0.5  64.6±0.6  60.4±0.3  45.4±0.4
      D-Mix [22]                –         –         –         –                  55.8      50.3      50.9      35.4
      ELR [26]                  –         –         –         –                  57.4      58.1      50.6      41.7
      MOIT [30]                 –         –         –         –                  64.7      63.1      60.8      45.9
      Oracle: Standard          65.8±0.4  63.9±0.5  60.6±0.4  45.4±0.8           63.5±0.5  61.7±0.1  58.4±0.3  41.5±0.5
      Oracle: Mixup             67.4±0.4  64.2±0.5  61.5±0.3  46.9±0.8           65.5±0.5  63.1±0.6  59.7±0.7  43.6±0.4
  • 9. Comparison
    - Confidence scores and how the data are distributed over them:
      • with the baseline (top row), the model over-fits to the noise: both clean and noisy examples concentrate at p = 1;
      • with NCR (bottom row), over-fitting is avoided: most noisy examples lie near p = 0.
    [Figure: confidence-score histograms for Standard (top) vs. NCR (bottom) on mini-ImageNet Blue-40%, Blue-80%, Red-40% and Red-80%.]
  • 11. Comparison
    - Top accuracy on realistic noisy datasets.
    - 1.2% more accurate than GJS [Englesson & Azizpour, NeurIPS2021], a comparable method that likewise adds a regularization term.
    - Existing methods can be extended simply by adding the NCR regularizer: a model adding NCR to ELR (Ours+) is compared against existing methods on synthetic noise (Table 3), using the same hyperparameters across all noise ratios, unlike DivideMix.
    Table 2. State-of-the-art comparison with realistic noise on mini-WebVision, WebVision and Clothing1M. All results use ResNet-50, except those marked † which use Inception-ResNetV2.
                            mini-WebVision   WebVision   Clothing1M
      Standard              75.8             74.9        71.7
      Mixup                 77.2             75.5        72.2
      Ours: NCR             77.1             75.7        74.4
      Ours: NCR+Mixup       79.4             75.4        74.5
      Ours: NCR+Mixup+DA    80.5             76.8        74.6
      MLNT; 3 iter. [23]    –                –           73.5
      CleanNet [21]         –                –           74.7
      LDMI [40]             –                –           72.5
      LongReMix [6]         –                –           73.0
      ELR [26]              76.3†            –           –
      ELR+ [26]             77.8†            –           74.8
      DMix [22]             76.3             –           74.8
      GJS [10]              79.3             –           –
      MoPro [24]            –                73.9        –
      MILe [32]             –                75.2        –
      Heteroscedastic [5]   –                76.6†       –
      CurrNet [12]          –                79.3†       –
    Table 3. State-of-the-art comparison with synthetic noise on CIFAR. A-40% is 40% asymmetric noise; all other columns are symmetric noise.
                    CIFAR-10                                       CIFAR-100
      Method        20%   40%   50%   80%   90%   A-40%            20%   40%   50%   80%   90%   A-40%
      Standard      83.9  68.3  58.5  25.9  17.3  77.3             61.5  46.2  37.4  10.4   4.1  43.9
      MOIT+ [30]    94.1  92.0  –     75.8  –     93.2             75.9  67.4  –     51.4  –     74.0
      D-Mix [22]    95.1  94.2  93.6  91.4  74.5  91.8             76.7  74.6  73.1  57.1  29.7  72.1
      ELR+ [26]     94.9  94.4  93.9  90.9  74.5  88.9             76.3  74.0  72.0  57.2  30.9  75.8
      Ours+ [26]    95.2  94.5  94.3  91.6  75.1  90.7             76.6  74.2  72.5  58.0  30.8  76.3
    [Figure 4. Similarity distributions of correctly vs. incorrectly labelled pairs on mini-ImageNet: with 40% noise or less, features learned with NCR achieve much better class separation on Blue; on the more realistic Red, NCR still separates the clean examples better but fails to separate examples incorrectly labelled as the same class.]
  • 13. Supplementary material
    - Difference between mini-ImageNet Blue and Red [Jiang+, PMLR2020]:
    [Figure: for a class such as "Ladybug", blue noise replaces labels of true-positive mini-ImageNet images with synthetic symmetric label flips, while red noise draws mislabeled images from web text-to-image and image search results.]
  • 14. Supplementary material
    - NCR hyperparameters used in the experiments:
    Table 5.
           mini-ImageNet (0% / 20% / 40% / 80%)   CIFAR-10   CIFAR-100   WebVision   Clothing1M
      α    0.7 / 0.7 / 0.7 / 0.5                  0.1        0.1         0.5         0.9
      k    50 / 1 / 1 / 1                         10         10          10          1
      e    100 / 50 / 50 / 0                      50         200         0           40
    - Effect of the batch size on recognition accuracy on WebVision (1000 classes): a large batch size is needed so that visually similar classes appear within the same batch.
    Table 6.
      Batch size   256    512    1024   2048
      Accuracy     73.9   75.0   75.7   75.6