[DL Reading Group] Learning from Simulated and Unsupervised Images through Adversarial Training

Learning from Simulated and Unsupervised Images through Adversarial Training
Aug. 28, 2017
Shota Sugihara (杉原祥太), Department of Systems Innovation, Graduate School of Engineering

Bibliographic information
• Authors
• Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, Russell Webb
• Apple Inc.
• CVPR 2017, Best paper award
• arXiv: https://arxiv.org/abs/1612.07828
• Presentation: https://youtu.be/P3ayMdNdokg

Motivation
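• Preparing large amounts of labeled real data is costly.
• Labeled data generated by a simulator differs from real data (there is a gap between synthetic and real images), so models trained on it do not perform as well as expected.
• Using a GAN-style approach, refine simulator-generated images so that they look like real data, with unsupervised learning (no labels are needed for the real images).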

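SimGAN overview
• Refiner
• Refines simulator-generated images so that they look like real data
• Trained to fool the Discriminator while preserving the annotation information obtained from the simulator (e.g., gaze direction)
• Discriminator
• Distinguishes images generated by the Refiner from real images
(Figure: SimGAN overview; the Refiner is trained with a self-regularization (self-reg.) loss and an adversarial loss)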

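Related work
• Generative Adversarial Nets [Goodfellow et al. (2014)]
• Trains a Generator and a Discriminator adversarially
• UnityEyes [Wood et al. (2016)]
• Generates a large number of eye images with a simulator and estimates gaze by nearest-neighbor search
• Stacked Multichannel Autoencoder [Zhang et al. (2015)]
• Learns the gap between the data distributions of sketches and photos
• CG2Real [Johnson et al. (2011)]
• Estimates the individual regions of a CG image and combines them with real photographs to generate background images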

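Training Loss: Discriminator
• The Discriminator $D_{\boldsymbol{\phi}}$ distinguishes real images from refined images
• The parameters $\boldsymbol{\phi}$ are updated by SGD on each minibatch
• Label 0 for real images, label 1 for refined images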
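For reference, the SimGAN paper writes the Discriminator loss as $\mathcal{L}_D(\boldsymbol{\phi}) = -\sum_i \log D_{\boldsymbol{\phi}}(\tilde{\mathbf{x}}_i) - \sum_j \log\bigl(1 - D_{\boldsymbol{\phi}}(\mathbf{y}_j)\bigr)$, i.e. $D_{\boldsymbol{\phi}}$ outputs the probability that its input is refined, which matches the labels above. The minimal Python sketch below evaluates this cross-entropy on plain lists of Discriminator outputs; the function name `discriminator_loss` and the `eps` clamp are illustrative assumptions, and a real implementation would compute the loss inside a deep learning framework.

```python
import math

def discriminator_loss(d_refined, d_real, eps=1e-12):
    """Cross-entropy loss for D, where D(v) = P(v is a refined image).

    d_refined: D's outputs on refined images (target label 1)
    d_real:    D's outputs on real images    (target label 0)
    """
    loss = 0.0
    for p in d_refined:
        loss -= math.log(max(p, eps))        # label 1: -log D(x~_i)
    for p in d_real:
        loss -= math.log(max(1.0 - p, eps))  # label 0: -log(1 - D(y_j))
    return loss
```

A confident, correct Discriminator (e.g. 0.99 on refined images and 0.01 on real ones) gets a lower loss than one that outputs 0.5 everywhere.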

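Training Loss: Refiner
• Given real images $\mathbf{y}_i \in \mathcal{Y}$ and a simulator-generated image $\mathbf{x}$, we want to generate a refined image $\tilde{\mathbf{x}} := R_{\boldsymbol{\theta}}(\mathbf{x})$.
• Losses used to learn $\boldsymbol{\theta}$:
• $\ell_{\text{real}}$: whether the refined image looks like real data
• $\ell_{\text{reg}}$: whether the annotation information is preserved
• $\psi$ is a mapping into a feature space; here it is the identity map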
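The paper combines the two terms as $\mathcal{L}_R(\boldsymbol{\theta}) = \sum_i \ell_{\text{real}}(\boldsymbol{\theta}; \mathbf{x}_i, \mathcal{Y}) + \lambda\,\ell_{\text{reg}}(\boldsymbol{\theta}; \mathbf{x}_i)$, with $\ell_{\text{real}} = -\log\bigl(1 - D_{\boldsymbol{\phi}}(R_{\boldsymbol{\theta}}(\mathbf{x}_i))\bigr)$ and $\ell_{\text{reg}} = \lVert \psi(R_{\boldsymbol{\theta}}(\mathbf{x}_i)) - \psi(\mathbf{x}_i) \rVert_1$. A minimal sketch of evaluating it on flattened images stored as Python lists follows; the default `lam=0.1` is a hypothetical value not taken from the slides, and `psi` defaults to the identity map as stated above.

```python
import math

def l1_distance(a, b):
    """L1 norm of the difference between two flattened images."""
    return sum(abs(u - v) for u, v in zip(a, b))

def refiner_loss(d_on_refined, refined, synthetic, lam=0.1, psi=lambda img: img, eps=1e-12):
    """L_R = sum_i [ -log(1 - D(x~_i)) + lam * ||psi(x~_i) - psi(x_i)||_1 ].

    d_on_refined: D's outputs (P(refined)) on the refined images x~_i
    refined, synthetic: matching lists of flattened images x~_i and x_i
    lam: hypothetical weight of the self-regularization term
    """
    total = 0.0
    for p, x_ref, x_syn in zip(d_on_refined, refined, synthetic):
        l_real = -math.log(max(1.0 - p, eps))        # adversarial term: look "real" to D
        l_reg = l1_distance(psi(x_ref), psi(x_syn))  # stay close to the synthetic input
        total += l_real + lam * l_reg
    return total
```

If D is fooled (outputs 0 on refined images) and the Refiner leaves the image unchanged, both terms vanish; in practice the two terms pull in opposite directions and `lam` sets the trade-off.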

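Algorithm
• The Refiner $R_{\boldsymbol{\theta}}$ is a fully convolutional network without striding or pooling, so it modifies the image locally and preserves its global structure
• $\mathcal{L}_R(\boldsymbol{\theta})$ and $\mathcal{L}_D(\boldsymbol{\phi})$ are minimized alternately
• $\boldsymbol{\phi}$ is fixed while updating $R_{\boldsymbol{\theta}}$
• $\boldsymbol{\theta}$ is fixed while updating $D_{\boldsymbol{\phi}}$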
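To make the alternation concrete, here is a toy one-dimensional sketch. Everything about the setup is a hypothetical simplification, not the paper's networks: the "Refiner" only adds a learned offset, $R_\theta(x) = x + \theta$, and the "Discriminator" is logistic regression, $D_\phi(v) = \sigma(wv + b)$ with $\phi = (w, b)$, giving the probability that $v$ is refined. Each outer iteration performs `k_g` Refiner steps with $\phi$ fixed and then `k_d` Discriminator steps with $\theta$ fixed, using hand-derived gradients of the two losses above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_toy_simgan(synthetic, real, iters=200, lr=0.1, lam=0.01, k_g=1, k_d=1):
    """Toy 1-D alternating training (hypothetical setup).

    Refiner:       R_theta(x) = x + theta
    Discriminator: D_phi(v) = sigmoid(w * v + b) = P(v is refined), phi = (w, b)
    """
    theta, w, b = 0.0, 0.0, 0.0
    for _ in range(iters):
        # Refiner update: phi = (w, b) is held fixed.
        for _ in range(k_g):
            # d/dtheta of -log(1 - D(x + theta)) equals sigmoid(z) * w
            grad = sum(sigmoid(w * (x + theta) + b) * w for x in synthetic) / len(synthetic)
            # self-regularization lam * |R(x) - x| = lam * |theta| -> gradient lam * sign(theta)
            grad += lam * ((theta > 0) - (theta < 0))
            theta -= lr * grad
        # Discriminator update: theta is held fixed.
        for _ in range(k_d):
            gw = gb = 0.0
            for v in (x + theta for x in synthetic):  # label 1: -log D(v)
                s = sigmoid(w * v + b)
                gw += (s - 1.0) * v
                gb += s - 1.0
            for y in real:                             # label 0: -log(1 - D(y))
                s = sigmoid(w * y + b)
                gw += s * y
                gb += s
            n = len(synthetic) + len(real)
            w -= lr * gw / n
            b -= lr * gb / n
    return theta, w, b
```

With synthetic values near 0 and real values near 1, the Discriminator first learns a negative weight `w` (larger values look real), and the Refiner then shifts `theta` upward while the `lam * |theta|` term resists large offsets. As with any GAN, smooth convergence is not guaranteed.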

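Local Adversarial Loss
• Split the image fed to the Discriminator into small units called patches.
• If refined images closely resemble real images, they should also look similar locally.
• Compute, for each patch, the probability that it is real; the loss is the sum of the cross-entropy losses over all patches.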
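A minimal sketch of the per-patch loss, assuming non-overlapping patches of a grayscale image stored as a list of rows. In the paper the Discriminator is fully convolutional and directly outputs a w × h probability map with one value per patch, so the explicit splitting here is only for illustration. `patch_prob` is a hypothetical stand-in for the Discriminator; to match the labels of the Discriminator loss it returns the probability that a patch is refined (the complement of "real").

```python
import math

def split_into_patches(image, ph, pw):
    """Split a 2-D image (list of rows) into non-overlapping ph x pw patches."""
    patches = []
    for top in range(0, len(image) - ph + 1, ph):
        for left in range(0, len(image[0]) - pw + 1, pw):
            patches.append([row[left:left + pw] for row in image[top:top + ph]])
    return patches

def local_adversarial_loss(image, label, patch_prob, ph, pw, eps=1e-12):
    """Sum of per-patch cross-entropy losses.

    label: 1 for a refined image, 0 for a real image (every patch inherits it)
    patch_prob(patch): hypothetical per-patch classifier returning P(patch is refined)
    """
    total = 0.0
    for patch in split_into_patches(image, ph, pw):
        p = min(max(patch_prob(patch), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total
```

Because every patch contributes its own term, an artifact in one corner is penalized even if the rest of the image looks real, which is the intuition behind limiting the Discriminator's receptive field.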

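Using a History of Refined Images
• Problem: the Discriminator learns only from the most recently refined images.
• Images produced by the Refiner at any point in training are always "fake", but the Discriminator forgets about the earlier ones.
• The Discriminator should always classify these images as fake.
• Train the Discriminator using a history of refined images
• Half of each minibatch consists of images from the current Refiner; the other half is drawn from a buffer of previously refined images.
• After each iteration, randomly select b/2 images in the buffer (b: minibatch size) and replace them with newly refined images.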
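A minimal sketch of such a buffer, assuming images are arbitrary Python objects and a fixed buffer capacity B. The class name `ImageHistoryBuffer` is hypothetical, and the warm-up fallback (using current images while the buffer is still too small) is an assumption the slide does not cover.

```python
import random

class ImageHistoryBuffer:
    """Fixed-size history of refined images used to train the Discriminator (sketch)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity  # buffer size B, kept fixed
        self.images = []
        self.rng = rng if rng is not None else random.Random()

    def make_d_batch(self, current_refined):
        """Return a D minibatch: b/2 current refined images + b/2 from the history."""
        half = len(current_refined) // 2
        batch = list(current_refined[:half])
        if len(self.images) >= half:
            batch += self.rng.sample(self.images, half)
        else:
            # Assumption: while the buffer is still warming up, use current images instead.
            batch += list(current_refined[half:2 * half])
        return batch

    def update(self, new_refined):
        """Add new refined images; once full, overwrite randomly chosen distinct entries.

        Assumes len(new_refined) <= capacity (b/2 images are replaced per iteration).
        """
        free = self.capacity - len(self.images)
        self.images.extend(new_refined[:free])
        rest = new_refined[free:]
        if rest:
            slots = self.rng.sample(range(self.capacity), len(rest))
            for slot, img in zip(slots, rest):
                self.images[slot] = img
```

In a training loop one would call `make_d_batch` before each Discriminator update and `update(current_refined[:half])` afterwards, so that b/2 buffer entries are replaced per iteration.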

Datasets
• Eye Gaze Estimation
• MPIIGaze Dataset[Zhang et al.(2015)]
• 214K real images
• UnityEyes[Wood et al.(2016)]
• 1.2M synthetic images
• Hand Pose Estimation
• NYU hand pose dataset
• 72,757 training frames
• 8,251 testing frames
• Cropped to 224×224 during preprocessing

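Eye Gaze Estimation: Qualitative Evaluation
• Skin texture, sensor noise, and iris appearance are reproduced while the gaze direction is preserved.
• A case where $\psi$ is not the identity map:
• $\psi$ = the mean of the RGB channels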
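For color images, the alternative $\psi$ above averages the RGB channels before the L1 self-regularization distance is taken, so the Refiner is penalized only for changing each pixel's mean intensity, not its color balance. A minimal sketch, assuming images are nested Python lists of shape H × W × 3; the function names are illustrative.

```python
def psi_mean_rgb(image):
    """Map an H x W x 3 RGB image to an H x W map of per-pixel channel means."""
    return [[sum(pixel) / len(pixel) for pixel in row] for row in image]

def self_reg_loss(refined, synthetic, psi=psi_mean_rgb):
    """L1 distance between psi(refined) and psi(synthetic)."""
    a, b = psi(refined), psi(synthetic)
    return sum(abs(u - v) for row_a, row_b in zip(a, b) for u, v in zip(row_a, row_b))
```

For example, recoloring a gray pixel (0.5, 0.5, 0.5) to (0.75, 0.5, 0.25) keeps the channel mean at 0.5 and costs nothing, whereas brightening it would be penalized.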

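Eye Gaze Estimation: Visual Turing Test
• After being shown 20 examples of each kind, subjects classified images as real or refined (50 real + 50 refined images per subject, 10 subjects)
• Accuracy: 517/1000 ($p = 0.148$), i.e., close to chance
• In contrast, the accuracy for classifying real vs. original (unrefined) synthetic images was 168/200 ($p \le 10^{-8}$)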
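Assuming the reported p-values come from a one-sided binomial test against chance-level guessing (50%), which the slide does not state, they can be checked with an exact tail sum using only the standard library. The function name is illustrative.

```python
from math import comb

def one_sided_binomial_p(correct, trials):
    """P(X >= correct) for X ~ Binomial(trials, 1/2), i.e. under pure guessing."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials  # exact integer tail, divided once at the end
```

Under this assumption, `one_sided_binomial_p(517, 1000)` comes out close to the reported 0.148, while `one_sided_binomial_p(168, 200)` is far below 10^-8. Summing integer binomial coefficients and dividing once avoids underflow from multiplying many small probabilities.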

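Eye Gaze Estimation: Quantitative Evaluation
• Trained a CNN similar to that of Zhang et al. (its output is a 3D vector representing the gaze direction)
• Comparison of results across training data
• Percentage of test images whose gaze estimate is within d = 7 degrees of the ground truth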
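A minimal sketch of the "within d degrees" metric, assuming the error is measured as the angle between the predicted and ground-truth 3D gaze vectors (the slide gives only the threshold). The function names are illustrative.

```python
import math

def angular_error_deg(pred, truth):
    """Angle in degrees between predicted and ground-truth 3-D gaze vectors."""
    dot = sum(p * t for p, t in zip(pred, truth))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in truth))
    cos = max(-1.0, min(1.0, dot / norm))  # guard acos against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))

def percent_within(preds, truths, d=7.0):
    """Percentage of samples whose angular error is at most d degrees."""
    hits = sum(angular_error_deg(p, t) <= d for p, t in zip(preds, truths))
    return 100.0 * hits / len(truths)
```

Because the error is an angle between directions, the vectors do not need to be normalized beforehand.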

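Eye Gaze Estimation: Quantitative Evaluation
• Comparison across methods
• The proposed method achieved the best results.
• Does the Refiner properly preserve the gaze direction?
• Manually labeled 100 synthetic images and the 100 corresponding refined images
• Absolute difference between the estimated pupil centers: 1.1 ± 0.8 px (eye width: 55 px)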

Hand Pose Estimation: Qualitative Evaluation
• Noise found in real data, such as discontinuities at object boundaries, is reproduced.

Hand Pose Estimation: Quantitative Evaluation
• To isolate the effect of the Refiner, only the training data is varied in the comparison
• The proposed method achieved the best results.

Evaluation: Using the History of Refined Images
• Unnatural artifacts that appear around the eyes without the history are suppressed when the history is used.
• Gaze estimation error decreases
• With history: 7.8 degrees
• Without history: 12.2 degrees

Evaluation: Local Adversarial Loss
• With a global adversarial loss, the depth distribution near boundaries is unnatural; the local adversarial loss eliminates these artifacts.

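Summary
• Contributions
• Proposed S+U (Simulated+Unsupervised) learning, which refines synthetic data with unsupervised learning
• Trained the Refiner with a combination of an adversarial loss and a self-regularization loss
• Introduced modifications that stabilize GAN training and avoid unnatural artifacts
• Improved the realism of simulator-generated data both qualitatively and quantitatively, and achieved state-of-the-art results using data generated by the Refiner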

Eye Gaze Estimation: Networks
• Refiner Network
• Input 55×35, Conv3x3
• ResNet blocks (2 convs each, 64 feature maps) ×4
• Conv1x1, feature maps=1
• Discriminator
• (1) Conv3x3, stride=2, feature maps=96
• (2) Conv3x3, stride=2, feature maps=64
• (3) MaxPool3x3, stride=1
• (7) Softmax
• Eye gaze estimation network
• Input 35x55 grayscale
• (1) Conv3x3, feature maps=32
• (4) MaxPool3x3, stride=2
• (8) FC9600
• (9) FC1000
• (10) FC3
• (11) Euclidean loss
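The Refiner listed above has no strided layers, so its output keeps the spatial size of its input, whereas the Discriminator's stride-2 layers shrink the map. The sketch below computes spatial sizes with the standard formula floor((n + 2p - k) / s) + 1. The padding values ("same" padding of 1 for the Refiner's 3×3 convolutions, no padding for the Discriminator) and the 3×3 kernel size of the ResNet convolutions are assumptions not stated on the slide, and only Discriminator layers (1) to (3) are included because the others are not listed.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv or pooling layer along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def shape_through(layers, h, w):
    """Propagate an (h, w) size through a list of (kernel, stride, pad) layers."""
    for kernel, stride, pad in layers:
        h, w = conv_out(h, kernel, stride, pad), conv_out(w, kernel, stride, pad)
    return h, w

# Refiner: input 3x3 conv, 4 ResNet blocks x 2 convs (assumed 3x3, 'same' padding), final 1x1 conv
REFINER_LAYERS = [(3, 1, 1)] + [(3, 1, 1)] * 8 + [(1, 1, 0)]
# Discriminator layers (1)-(3) as listed, assuming no padding
DISC_LAYERS_1_TO_3 = [(3, 2, 0), (3, 2, 0), (3, 1, 0)]
```

For a 35×55 (height × width) eye image, the Refiner returns 35×55 under these assumptions, while the first three Discriminator layers reduce it to 6×11.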

Hand Pose Estimation: Networks
• Refiner Network
• Input 224x224, Conv7x7
• ResNet blocks (2 convs each, 64 feature maps) ×10
• Conv1x1, feature maps=1
• Discriminator
• (6) Conv1x1, stride=1, feature maps=2
• (7) Softmax
• Hand pose estimation network
• Input 224x224 grayscale
• Hourglass blocks x2
• Conv7x7, stride=2
• Residual module
• maxpooling
• Output 64x64
