Humpback whale identification challenge: retrospective

Whale competition retrospective
or, the one-week challenge
@yu4u

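First submission on the team merger deadline (2/21)
• Went straight in with my own model for the first submission and died
• I had already been talking with acquaintances from my previous job about teaming up, so they let me merge even though I was a mere water flea (small fry)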

• Heard that recent metric learning can apparently be trained as classification
• Quickly tried an ArcFace-based method and got decent accuracy!
• ↓ Recommended reading
Modern deep metric learning methods: SphereFace, CosFace, ArcFace
https://qiita.com/yu4u/items/078054dfb5592cbb80cc
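To make "metric learning trained as classification" concrete, here is a minimal NumPy sketch of ArcFace-style logits. It is an illustration under assumptions, not the code used in the competition: the function names, the default values of `s` and `m`, and the omission of the usual handling for theta + m > pi are all choices made for this example.

```python
import numpy as np

def arcface_logits(features, weights, labels, s=30.0, m=0.5):
    """ArcFace-style logits (sketch).

    features: (N, D) embeddings, weights: (C, D) class centers,
    labels: (N,) integer class ids. s and m are illustrative defaults.
    """
    # L2-normalize both sides so the dot product equals cos(theta).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    # Add the angular margin m only to the ground-truth class of each sample.
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * np.cos(theta + margin)

def softmax_cross_entropy(logits, labels):
    """Plain softmax cross entropy, averaged over the batch."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Training minimizes `softmax_cross_entropy(arcface_logits(...), labels)` like an ordinary classifier; at inference only the normalized embeddings are compared, typically by cosine similarity.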

Summary
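• The top teams solved the problem with quite different methods
• Each method is simple in itself, and even a single model can reach high accuracy
• Ensembling different models is also effective
All in all, probably a pretty good competition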

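Whale competition overview
• train/test: cute tail images of 🐳
– Aligned and cropped to some extent
• Each train image has a whale_id
• Some images carry the whale_id “new_whale”, meaning the whale has not been identified
• The task is to predict the correct whale_id for each test image
• The metric is MAP@5; “new_whale” is the correct answer for just under 30% of the 🐳
[Figure: example train and test images]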

train: 25361 images (unique ids: 5004)
– new_whale: 9664
– w_23a388d: 73
– w_9b5109b: 65
– w_9c506f6: 62
– …
test: 7960 images
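The MAP@5 metric from the overview can be sketched as follows. The sketch assumes each test image has exactly one correct whale_id, so the average precision of one image reduces to the reciprocal rank of the first hit within the top 5; the function name is chosen for this example.

```python
def map_at_5(y_true, y_pred):
    """Mean average precision at 5 for single-label ground truth.

    y_true: list of correct ids.
    y_pred: list of ranked prediction lists (best guess first).
    """
    total = 0.0
    for truth, preds in zip(y_true, y_pred):
        for rank, pred in enumerate(preds[:5], start=1):
            if pred == truth:
                total += 1.0 / rank  # only the first hit counts
                break
    return total / len(y_true)
```

For example, one image answered correctly at rank 1 and another at rank 2 give (1 + 0.5) / 2 = 0.75.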

Pitfalls
• Some 🐳 images labeled new_whale show whales that do have a whale_id
• Some whales appear under several different whale_ids (lots of them)

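Whale competition overview
• Task
– As a problem, it is the same as face recognition
– In fact, it is also the same as the Google Landmark Recognition Challenge
• Possible approaches
– Metric learning (the de facto standard in face recognition; used in the landmark challenge)
– Solve it as classification (new_whale is the difficulty)
– Solve it with local feature matching (used in the landmark challenge)
So what actually happened?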

God-tier kernel ①
• 🐳 tail detector
• In face recognition, face detection always comes first as preprocessing
• It helps with every approach
https://www.kaggle.com/martinpiotte/bounding-box-model
There is also a mask kernel
https://www.kaggle.com/c/humpback-whale-identification/discussion/78453

God-tier kernel ②
https://www.kaggle.com/seesee/siamese-pretrained-0-822

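God-tier kernel ②
• The kernel that got everyone saying “SiameseNet, SiameseNet”
• A SiameseNet normally does metric learning with a contrastive loss
– Training is hard and unstable
• The SiameseNet in this kernel takes two images as input and outputs whether they show the same 🐳
– It is classification, so training is easy
– Accuracy is probably higher this way as well
[Figure: conventional SiameseNet + contrastive loss: 🐳, 🐳 -> CNN (weight share) -> feature vectors -> d^2 -> contrastive loss]
[Figure: the kernel's SiameseNet: 🐳, 🐳 -> CNN (weight share) -> x1, x2 -> x1+x2, x1*x2, |x1-x2|, |x1-x2|^2 (all pairwise operations) -> CNN -> 0~1 -> binary crossentropy; designed so that f(x1, x2) = f(x2, x1)]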
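The symmetric design f(x1, x2) = f(x2, x1) follows from feeding the head only order-independent combinations of the two feature vectors. A minimal NumPy sketch: `pairwise_features` mirrors the four operations in the figure, while `head_score` with its parameters `w` and `b` is a hypothetical stand-in for the small CNN head, not the kernel's actual layers.

```python
import numpy as np

def pairwise_features(x1, x2):
    """Stack the four symmetric element-wise combinations, shape (D, 4)."""
    diff = np.abs(x1 - x2)
    return np.stack([x1 + x2, x1 * x2, diff, diff ** 2], axis=-1)

def head_score(x1, x2, w, b):
    """Hypothetical head: weighted sum of the combinations, then sigmoid -> 0~1."""
    feats = pairwise_features(x1, x2)   # (D, 4)
    logit = np.sum(feats @ w) + b       # w has shape (4,)
    return 1.0 / (1.0 + np.exp(-logit))
```

Because every input to the head is unchanged when x1 and x2 are swapped, the score is symmetric regardless of the head's weights, so each pair only needs to be evaluated once.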

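In the kernel's SiameseNet, the weight-shared CNN is the feature extraction network and the head that outputs 0~1 is the classification network.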

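Training of god-tier kernel ②
1. Extract features from the train 🐳
– Forward the feature extraction NW
2. Extract positive pairs (within the same 🐳, make sure an image is never paired with itself)
3. Extract negative pairs
a. Compute the score (*-1) between all 🐳 feature vectors and use it as the cost matrix
(forward the classification NW over C(number of images, 2) pairs; feasible because the classification NW is light)
b. Set the cost matrix entries for the same 🐳 to infinity, including the diagonal
c. Solve the linear assignment problem (LAP) on the cost matrix to get a list of low-cost pairs
= pairs of different 🐳 that nevertheless score high
Set used pairs to infinite cost and reuse the matrix for 5 epochs
(early in training, add random numbers to the cost matrix to go easy on the model)
4. Train the whole network on the positive and negative pairs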
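Step 3 can be sketched with SciPy's `linear_sum_assignment`. This is an illustration under assumptions rather than the kernel's code: the function name, the `used_pairs` and `noise_scale` parameters, and the large finite constant standing in for "infinity" are all choices made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e6  # stands in for "infinity" so the assignment problem stays feasible

def mine_hard_negatives(score, whale_ids, used_pairs=(), noise_scale=0.0, seed=0):
    """Pick one hard negative partner per image by solving a LAP.

    score: (N, N) symmetric matrix from the classification network
    (higher = more likely the same whale). whale_ids: length-N labels.
    """
    ids = np.asarray(whale_ids)
    cost = -np.asarray(score, dtype=float)       # high score -> low cost
    if noise_scale > 0:
        # Random noise softens the mining early in training.
        cost = cost + noise_scale * np.random.default_rng(seed).random(cost.shape)
    same = ids[:, None] == ids[None, :]          # same whale, diagonal included
    cost[same] = BIG
    for i, j in used_pairs:                      # never reuse a pair
        cost[i, j] = cost[j, i] = BIG
    rows, cols = linear_sum_assignment(cost)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] < BIG]
```

Each image gets exactly one partner and each partner is used once, which spreads the hard negatives over the whole training set instead of letting a few confusing images dominate.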

[Figure: a single CNN that takes both 🐳 images and outputs 0~1; scoring every pair this way would be far too heavy]

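Inference of god-tier kernel ②
1. Extract features from the train 🐳 (forward the feature extraction NW)
2. Extract features from the test 🐳 (forward the feature extraction NW)
3. Compute the test 🐳 vs. train 🐳 scores (forward the classification NW)
4. for each test 🐳:
Add the 🐳 IDs of the train 🐳 to the answers in descending score order
However, once the score is at or below the threshold, add new_whale if it is not yet among the answers
(Taking the mean per whale_id might work better)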
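Step 4 can be written as a short loop. The sketch below assumes the train set passed in contains only known whale ids and uses a function name invented for this example; the threshold value is whatever was tuned elsewhere.

```python
def top5_with_new_whale(scores, train_ids, threshold):
    """Rank whale ids for one test image, inserting new_whale at the threshold.

    scores: similarity of the test image to each train image.
    train_ids: whale id of each train image (known ids only, by assumption).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    preds = []
    for i in order:
        # Once scores drop to the threshold or below, new_whale outranks the rest.
        if scores[i] <= threshold and "new_whale" not in preds:
            preds.append("new_whale")
        if train_ids[i] not in preds:
            preds.append(train_ids[i])
        if len(preds) >= 5:
            break
    return preds[:5]
```

With scores [0.9, 0.8, 0.3, 0.2] for ids [a, a, b, c] and a threshold of 0.5, the answer list becomes a, new_whale, b, c.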

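1st Solution
• Flip the 5004 classes to make 10008 classes, and do binary classification for each
[Figure: 🐳 input (512x256, BBOX, RGB+mask) -> network with global average pooling and pooling along the channel direction -> BCE + lovasz_loss]
At test time, the flipped image is also fed in and the predictions are averaged (the corresponding classes are known)
https://github.com/earhian/Humpback-Whale-Identification-1st-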
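The flip averaging at test time can be sketched as below. The class layout is an assumption made for this example (original whale k at index k, its flipped counterpart at index k + 5004); the actual repository may order the classes differently, and the function name is hypothetical.

```python
import numpy as np

def merge_flip_predictions(p_orig, p_flip):
    """Average original and flipped predictions back onto the real whale ids.

    p_orig, p_flip: (N, 2 * n_ids) per-class probabilities for the original and
    the horizontally flipped test images. Returns (N, n_ids) scores.
    """
    n_ids = p_orig.shape[1] // 2
    # Assumed layout: the original image should fire on classes [0, n_ids),
    # the flipped image on the matching "flipped" classes [n_ids, 2 * n_ids).
    return 0.5 * (p_orig[:, :n_ids] + p_flip[:, n_ids:])
```

Treating flipped flukes as separate classes doubles the training classes, and because the mapping between a class and its flipped twin is known, the two views can be merged at test time as a free TTA step.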

3rd Solution
• Train original bbox regressor (5 fold CV and trained 5 models)
• 320x320 input, DenseNet121 + ArcFace (s=65, m=0.5),
weight decay 0.0005, dropout 0.5
• Augmentation: average blur, motion blur;
add, multiply, grayscale;
scale, translate, shear, rotate;
align (single bbox) or no-align
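• Inference
– train: for each 🐳 image, compute feature vectors using 5 BBOXes, then further average per 🐳 ID
– test: for each 🐳 image, compute feature vectors using 5 BBOXes and compare with the above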

4th Solution
• Brute force over all pairs with SIFT+RANSAC!
1. Loop through all test/train pairs
2. Match keypoints using faiss
3. Double homography filtering of keypoints
(LMedS followed by RANSAC)
4. xgboost prediction to validate homography matrix
5. if # of matches > threshold, then use prediction
• The top-1 result is computed with the above, and top-2 ~ 5 are computed with a SiameseNet
Notes:
– Same approach as used in the Landmark competition
– Uses full-resolution 🐳 images
– Normalized with CLAHE (Contrast Limited Adaptive Histogram Equalization)
– Tail segmentation with UNet

5th Solution
• SiameseNet (DenseNet121 backbone)
• Original BBOX regressor
• Augmentation: shear, rotation, flipping, contrast, Gaussian noise, blurring, color augmentations, greying, random crops
• Run LAP on sub-blocks; the sub-blocks are randomly generated each time
• 4-fold stratified cross validation + 15-model ensemble
• pseudo label -> update folds
(e.g. LB 0.938 -> LB 0.950 -> LB 0.965, etc.)
• Stacking (not much effect)
https://weiminwang.blog/2019/03/01/whale-identification-5th-place-approach-using-siamese-networks-with-adversarial-training/
(About half of the post explains the kernel it was based on)

7th Solution
• SE-ResNeXt-50 -> global concat (max, avg) pool -> BN ->
Dropout -> Linear -> ReLU -> BN -> Dropout -> clf (5004)
• 4 head classification
• use bbox
• center loss, ring loss, GeM pooling
• verification by local features (Hessian-AffNet + HardNet)
https://github.com/ducha-aiki/mods-light-zmq
• For the backbone, various options were tried, but finetuning a teammate's network trained with metric learning (SE-ResNeXt-50) worked best
• new_whale is inserted to softmaxed predictions with constant
threshold, which is set on validation set by bruteforce search in
range from 0 to 0.95.
https://github.com/ducha-aiki/whale-identification-2018
Since they do something like metric learning, maybe that is why a softmax threshold was enough?
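The constant-threshold insertion and its brute-force search can be sketched as follows. The range 0 to 0.95 is from the write-up quoted above; the step size of 0.05, the function names, and the treatment of new_whale as an extra class whose score equals the threshold are assumptions made for this example.

```python
import numpy as np

def insert_new_whale(probs, class_ids, threshold):
    """Rank known classes by softmax probability, with new_whale scored at `threshold`."""
    ids = list(class_ids) + ["new_whale"]
    scores = np.append(np.asarray(probs, dtype=float), threshold)
    order = np.argsort(-scores, kind="stable")[:5]
    return [ids[i] for i in order]

def map_at_5(truths, ranked):
    """MAP@5 for single-label ground truth (reciprocal rank within the top 5)."""
    hits = [1.0 / (preds.index(t) + 1) if t in preds[:5] else 0.0
            for t, preds in zip(truths, ranked)]
    return sum(hits) / len(hits)

def search_threshold(val_probs, class_ids, val_truth, step=0.05):
    """Try thresholds from 0 to 0.95 and keep the first one with the best MAP@5."""
    best_score, best_th = -1.0, None
    for th in np.arange(0.0, 0.95 + 1e-9, step):
        ranked = [insert_new_whale(p, class_ids, th) for p in val_probs]
        score = map_at_5(val_truth, ranked)
        if score > best_score:
            best_score, best_th = score, float(th)
    return best_th, best_score
```

The search only makes sense if the validation set contains new_whale images in roughly the same proportion as the test set, since the best threshold depends directly on that ratio.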

7th Solution
• Metric learning based approach
– training on RGB images: 256x256, 384x384, 448x448, 360x720
– Augmentations: random erasing, affine transformations (scale, translation, shear), brightness/contrast
– Models: resnet34, resnet50, resnet101, densenet121, densenet162, seresnext50 (backbone architectures that we’ve tried), followed by GeM pooling layer + L2 + multiplier
– Loss: hard triplet loss
• Not used for the actual submission; used instead as the base network for the classification-based method

9th Solution
• Summary: Adam, Cosine with restarts, CosFace, ArcFace, High-resolution images, Weighted sampling, new_whale distillation, Pseudo labeled test, Resnet34, BNInception, Densenet121, AutoAugment, CoordConv, GAPNet
• 1024x1024 resnet34, 512x152 BNInception, 640x640 DenseNet121
• CosFace: s=32, m=0.35. ArcFace: m1=1.0, m2=0.4, m3=0.15
• Augmentation: Horizontal Flip, Rotate with 16 degree limit, ShiftScaleRotate with 16 degree limit, RandomBrightnessContrast, RandomGamma, Blur, Perspective transform: tile left, right and corner, Shear, MotionBlur, GridDistortion, ElasticTransform, Cutout
CosFace + ArcFace
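The three ArcFace parameters m1, m2, m3 presumably refer to the combined-margin formulation found in common ArcFace implementations; that reading is an assumption, since the slide does not spell it out. Under it, with θ_y the angle between the embedding and the center of the true class y, the loss for one sample is:

```latex
L = -\log \frac{e^{s\,(\cos(m_1 \theta_{y} + m_2) - m_3)}}
               {e^{s\,(\cos(m_1 \theta_{y} + m_2) - m_3)} + \sum_{j \neq y} e^{s \cos \theta_j}}
```

With m1=1.0, m2=0.4, m3=0.15 the target logit becomes s(cos(θ_y + 0.4) − 0.15), that is, ArcFace's additive angular margin and CosFace's additive cosine margin applied together, which would match the "CosFace + ArcFace" note.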

10th Solution
（SiameseNet part）
• Summary
– Siamese architecture
– Metric learning featuring brand-new CVPR 2019 method
(will be published soon)
– Classification on features
– Large blend for new whale/not new whale binary classification
• Tricks
– Flip augmentation for both positive and negative pairs
– ResNet-18, ResNet-34, SE-ResNeXt-50, ResNet-50,
image size: 299->384->512
– 0.929 LB -> ensemble 0.940

10th Solution
（Metric learning part）Another solution will be explained later in
detail by @asanakoy. In two words, it is metric learning with
multiple branches and margin loss, trained on multiple resolution
crops using bboxes, grayscale and RGB input images. He also used
his brand-new method from CVPR which allowed for 1-2% score
boost. (Apparently.)
（Classification part）concat features from branch models and
train classification model
（Post processing）took their TOP-4 predictions for each whale.
Then, for all of our models, we took their predictions on these set
of classes. We used a blend of LogReg, SVM, several KNN models,
and LightGBM to solve a binary classification problem.

15th Solution
• At the beginning, we used pure softmax to classify 5005
classes. The best result we obtained is around 0.86X using
seresnext50.
• Then we resort to sphereface. To use sphereface, we abandon
new whales, which means we only use around 19K images. This
gives us 0.920 using seresnext-50 (multi-layer fusion, 384*384),
0.919 using resnext50 (multi-layer fusion, 384*384).
• We also tried arcface, which gives us 0.911 using seresnext-50
(multi-layer fusion, 384*384).

My Solution①
• 768x256 🐳 (BBOX), resnext101_32x4d backbone, ArcFace
• Known 🐳 only; duplicate 🐳 IDs are merged into one for training
• 🐳 with 10 or fewer images are oversampled to at least 10 images
• Augmentation: grayscale, Gaussian noise, Gaussian blur, rotation, shear, piecewise affine, color shift, contrast, crop
• The cos similarity between train 🐳 and test 🐳 is averaged over each ID
[Figure: 768x256 input -> ResNeXt101 -> 24x8 -> bn, avepool(4) -> 6x2 -> flatten, dropout -> 24576 -> FC -> 512 (feature vector) -> FC (ArcFace) -> 5004 -> cross entropy]
private LB: 0.92239
public LB: 0.91156
NO VALIDATION SET ;D
due to time constraint
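Averaging the cosine similarity per ID can be sketched in NumPy as follows. The function name and the one-hot aggregation are choices made for this example, not the original implementation.

```python
import numpy as np

def id_similarity_matrix(test_feats, train_feats, train_ids):
    """Return (ids, S) where S[i, k] is the mean cosine similarity between
    test image i and all train images of whale ids[k]."""
    t = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    r = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sim = t @ r.T                                    # (n_test, n_train)
    ids, inverse = np.unique(np.asarray(train_ids), return_inverse=True)
    onehot = np.zeros((len(inverse), len(ids)))
    onehot[np.arange(len(inverse)), inverse] = 1.0   # image -> id membership
    return ids, (sim @ onehot) / onehot.sum(axis=0)  # sum per id / image count
```

The result is a test image vs. whale ID matrix, which is also a convenient common format when scores from different kinds of models need to be combined.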

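My Solution②
• Ensemble with 512x512 SiameseNet model
• Convert the test image vs. train image matrix into a test image vs. 🐳 ID matrix
• TTA: use the original bounding box scale plus 2 more scales
[Figure: models labeled 768x256 ArcFace, 512x512 SiameseNet, 768x256 ArcFace and 512x512 ArcFace produce score matrices P_1, P_2, P_3, ..., P_10, P_11, P_12 of size test 🐳 images (7960) x train 🐳 “ID” (5004), with individual values in 0~1]
$$P = \sum_{m=1}^{12} w_m P_m^{\alpha}$$
• α is in 0~1
– Close to 0 it behaves like voting
– 1 is an ordinary weighted average
– Personally I start with 0.5
• Cut P with a threshold, insert new_whale, and write the submission file
• Per-model scores shown in the figure:
– private: 0.92239, public: 0.91156 (the 768x256 ArcFace model of My Solution①)
– private: 0.92981, public: 0.91558
– private: 0.90242, public: 0.88183
– private: 0.89706, public: 0.86712
– …
• Thresholds are basically untuned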
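Milestones
• 2/21: First submission with my own model, water-flea level
• 2/22: Learned about ArcFace
• 2/24: 448x448 model 0.786
• 2/26: 768x256 model 0.879
• 2/27: 768x256 model 0.887
• 2/28: 768x256 model 0.910
• 2/28, dead of night: 3 models finished, ensemble implemented
private: 0.95448, public: 0.94632
• Won the ensemble gacha in 3 submissions through psychic hyperparameter tuning
• Score-based ensembling; ensembling completely different models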
