KDD2018 paper reading

Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved.
KDD Paper Reading Session
Oct 11, 2018
Kosuke Kuzuoka
AI system Dept.
DeNA Co., Ltd.

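About me
■ Mar 2017: Joined a startup in Shibuya
⁃ Developed an iOS app built around the concept of construction x IT
⁃ Researched and developed architectural drawing analysis jointly with Shibaura Institute of Technology
■ June 2018: Moved to DeNA
⁃ Mainly develop computer vision technology in the AI System Dept.
■ Interests
⁃ Computer vision
⁃ Self-driving cars
■ Hobbies
⁃ Driving
⁃ Foreign TV dramas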
Facebook: Kousuke Kuzuoka
email: kosuke.kuzuoka@dena.com

Paper title

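Overview
■ Analyzes the cause of "gridding artifacts", a problem that arises with dilated convolution
■ Proposes two methods that mitigate the problem with almost no additional parameters
■ Visualizes the effect using effective receptive field (ERF) analysis
■ Compares against existing methods on semantic segmentation datasets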

What is dilated convolution?
Vanilla convolution
■ Dilation rate = 1
■ Has parameters such as stride and padding, and computes with a dense filter
Dilated convolution
■ Dilation rate > 1
■ Computes with a sparse filter obtained by inserting (dilation rate - 1) zeros between adjacent elements of the vanilla convolution kernel
[1]
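
The zero-insertion step can be sketched in a few lines of NumPy. This is only an illustration, not code from the paper; the helper name `dilate_kernel` is hypothetical.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighboring taps of a 2D kernel."""
    k_h, k_w = kernel.shape
    out_h = (k_h - 1) * rate + 1
    out_w = (k_w - 1) * rate + 1
    dilated = np.zeros((out_h, out_w), dtype=kernel.dtype)
    dilated[::rate, ::rate] = kernel  # original taps land on a regular grid
    return dilated

kernel = np.arange(1, 10).reshape(3, 3)   # dense 3x3 kernel (rate = 1)
sparse = dilate_kernel(kernel, rate=2)    # sparse 5x5 kernel (rate = 2)
print(sparse)
```

The number of learnable weights stays at 9; only the spatial extent of the filter grows.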

What is dilated convolution?
DeepLabv2
■ A segmentation model that uses ASPP (Atrous Spatial Pyramid Pooling) and dilated convolution
■ Uses a dilation rate of 2
DeepLabv3
■ An improved version that, like DeepLabv2, adopts ASPP but removes the DenseCRF post-processing
■ Uses several different dilation rates
[2] [3]

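What are gridding artifacts?
■ Adjacent elements of the output have receptive fields that are not adjacent to each other, so when dilated convolutions are stacked consecutively, spatial information is lost as the network gets deeper and accuracy degrades
■ In segmentation, this can produce blocky, jagged-looking outputs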
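
A small 1D sketch can make the disjoint receptive fields concrete. It assumes kernel size 3 and the same dilation rate in every layer; the function name `receptive_positions` is hypothetical and only tracks which input indices can reach a given output index.

```python
def receptive_positions(center, num_layers, rate, kernel_size=3):
    """Input indices that can influence output index `center`
    after `num_layers` stacked 1D dilated convolutions."""
    half = kernel_size // 2
    positions = {center}
    for _ in range(num_layers):
        positions = {p + rate * k
                     for p in positions
                     for k in range(-half, half + 1)}
    return sorted(positions)

left = receptive_positions(center=10, num_layers=3, rate=2)
right = receptive_positions(center=11, num_layers=3, rate=2)
print(left)                          # only even indices
print(right)                         # only odd indices
print(set(left).isdisjoint(right))   # neighbors share no input element
```

With `rate=1` the same function returns a contiguous range, so neighboring outputs overlap heavily.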

What are gridding artifacts?
[4]

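Decomposition view of Dilated Convolution
■ Split a dilated convolution into three steps
■ Periodic subsampling gathers the elements that the dilated convolution uses together into groups
■ A shared standard convolution is applied to each group of feature maps produced by periodic subsampling (the weights are shared)
■ The outputs of the shared standard convolution are interleaved back into their original sparse positions
■ A dilated convolution can also be computed directly with a sparse filter, but this kind of approach is often used because of its lower computational cost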
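
The three steps can be checked numerically. The sketch below is an illustration under simplifying assumptions (single channel, zero padding, odd kernel size, naive loops) rather than the paper's implementation; `conv2d_same` and `decomposed_dilated_conv` are hypothetical helper names.

```python
import numpy as np

def conv2d_same(x, kernel, rate=1):
    """Naive zero-padded 2D cross-correlation with an odd-sized kernel."""
    half_h, half_w = kernel.shape[0] // 2, kernel.shape[1] // 2
    pad_h, pad_w = half_h * rate, half_w * rate
    padded = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=float)
    for a in range(kernel.shape[0]):
        for b in range(kernel.shape[1]):
            out += kernel[a, b] * padded[a * rate:a * rate + x.shape[0],
                                         b * rate:b * rate + x.shape[1]]
    return out

def decomposed_dilated_conv(x, kernel, rate):
    out = np.zeros_like(x, dtype=float)
    for i in range(rate):
        for j in range(rate):
            group = x[i::rate, j::rate]                 # 1. periodic subsampling
            result = conv2d_same(group, kernel)         # 2. shared standard conv
            out[i::rate, j::rate] = result              # 3. re-interleave
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
direct = conv2d_same(x, kernel, rate=2)
decomposed = decomposed_dilated_conv(x, kernel, rate=2)
print(np.allclose(direct, decomposed))
```

For a dilation rate of 2 in 2D this produces 4 groups, each processed by the same 3x3 kernel and never mixed with the others.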

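Group Interaction Layers
■ In an existing dilated convolution there is no dependency between groups
■ New weights are defined so that the results of the shared standard convolution depend on one another across groups
■ A group interaction layer is added after the shared standard convolution
■ The layer's output is a linear combination of all groups using the new weights
■ The only added parameters are the combination weights (16 in the figure)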
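
A minimal sketch of such a layer, assuming the group outputs are stacked into one array of shape (groups, H, W) and mixed by a square weight matrix. The function name `group_interaction` and the example weight values are assumptions for illustration, not taken from the paper's code.

```python
import numpy as np

def group_interaction(groups, weights):
    """Linearly combine group feature maps.

    groups:  array of shape (G, H, W), one feature map per group
    weights: array of shape (G, G); row i gives the mix for output group i
    """
    return np.einsum("ij,jhw->ihw", weights, groups)

rate, dims = 2, 2
num_groups = rate ** dims                      # 4 groups for rate 2 in 2D
groups = np.arange(num_groups * 3 * 3, dtype=float).reshape(num_groups, 3, 3)

identity = np.eye(num_groups)                  # no interaction: plain dilated conv
mixing = np.full((num_groups, num_groups), 1.0 / num_groups)  # full averaging

unchanged = group_interaction(groups, identity)
mixed = group_interaction(groups, mixing)
print(mixing.size)                             # number of added weights
```

With the identity matrix the layer is a no-op, so it can be added to an existing network without changing its initial behavior; the 4 x 4 matrix accounts for the 16 extra parameters mentioned for the figure.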

Group Interaction Layers
R is the dilation rate
D is the number of dimensions
Linear combination across groups

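Separable and Shared Convolution
■ Separable and shared convolution (SS) is applied before periodic subsampling
■ Separable convolution uses an H x W x 1 filter and is applied channel-wise
■ The filter used by SS convolution is shared across all input channels
■ When the input has a single channel, separable convolution is identical to vanilla convolution
■ The only added parameters are the filter's elements (9 in the figure)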
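
The idea of one small filter applied to every channel independently can be sketched as follows. This assumes a (C, H, W) input, zero padding, and a 3x3 filter as in the figure; `separable_shared_conv` is a hypothetical name, and the filter values are arbitrary examples rather than learned weights.

```python
import numpy as np

def separable_shared_conv(x, shared_filter):
    """Apply the same 2D filter to each channel of x (shape C x H x W)
    independently, with zero padding. Channels are never mixed."""
    k_h, k_w = shared_filter.shape
    pad_h, pad_w = k_h // 2, k_w // 2
    padded = np.pad(x, ((0, 0), (pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=float)
    for a in range(k_h):
        for b in range(k_w):
            out += shared_filter[a, b] * padded[:, a:a + x.shape[1],
                                                b:b + x.shape[2]]
    return out

x = np.ones((3, 4, 4))                         # 3 channels, constant input
identity_filter = np.zeros((3, 3))
identity_filter[1, 1] = 1.0                    # passes the input through
average_filter = np.full((3, 3), 1.0 / 9.0)    # smooths neighboring positions

passthrough = separable_shared_conv(x, identity_filter)
smoothed = separable_shared_conv(x, average_filter)
print(average_filter.size)                     # number of added weights
```

Because the filter mixes neighboring positions before periodic subsampling, each group receives information from the other groups' elements, at a cost of 9 weights regardless of the number of channels.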

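Experiments
■ PASCAL VOC2012
⁃ Uses the semantic segmentation data
⁃ 21 classes in total (one of which is background)
⁃ Evaluation metric: mIoU
■ CityScapes
⁃ Semantic segmentation dataset
⁃ Images from vehicle-mounted cameras
⁃ 19 classes in total
⁃ Evaluation metric: mIoU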
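
For reference, mIoU (mean intersection over union) can be computed from a confusion matrix. The sketch below is a simplified version: it assumes integer label arrays, skips classes that appear in neither prediction nor ground truth, and does not handle the "ignore" labels that the official benchmark scripts exclude. The function name `mean_iou` is hypothetical.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[t, p] counts pixels with true class t predicted as class p
    confusion = np.bincount(target * num_classes + pred,
                            minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())

target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoU is 1/3, 2/3, 1/2, so the mean is 0.5
```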

Experiments
(Top) Results on PASCAL VOC2012 using weights pretrained on COCO
(Bottom) Results on PASCAL VOC2012 without COCO-pretrained weights
Rows are per class ("aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor")

Experiments
(Top) Results on PASCAL VOC2012 using weights pretrained on COCO
(Bottom) Results on PASCAL VOC2012 without COCO-pretrained weights
Rows are per class ("aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor")

Experiments
(Top) Results on CityScapes using weights pretrained on COCO
(Bottom) Results on CityScapes without COCO-pretrained weights
Rows are per class ("road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle")

Experiments
(Top) Results on CityScapes using weights pretrained on COCO
(Bottom) Results on CityScapes without COCO-pretrained weights
Rows are per class ("road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle")

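Experiments
■ Visualization using effective receptive field (ERF) analysis
■ A technique that visualizes how much each element of the input feature map influences an element of the output feature map
■ For the existing method, the cause of gridding artifacts is visible at a glance
■ The proposed methods are visibly smoother than the existing method
■ The figure shows that group interaction layers leave slightly more of the gridding artifact problem than SS convolution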
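
A toy 1D version of this comparison can be computed without a trained network. The sketch assumes a purely linear stack with all-ones kernels, for which the influence of each input position on one output unit equals the stack's impulse response. The averaging filter placed before each dilated convolution is only a stand-in for a smoothing step such as SS convolution, and the helper names are hypothetical.

```python
import numpy as np

def dilated_kernel_1d(kernel, rate):
    """Insert (rate - 1) zeros between the taps of a 1D kernel."""
    out = np.zeros((len(kernel) - 1) * rate + 1)
    out[::rate] = kernel
    return out

def erf_1d(layer_kernels):
    """Influence of each input offset on one output unit for a stack
    of linear 1D convolutions (symmetric kernels assumed)."""
    response = np.array([1.0])
    for k in layer_kernels:
        response = np.convolve(response, k)
    return response

dilated = dilated_kernel_1d(np.ones(3), rate=2)   # [1, 0, 1, 0, 1]
smoothing = np.ones(3) / 3.0                      # stand-in smoothing filter

plain = erf_1d([dilated, dilated])
smoothed = erf_1d([smoothing, dilated, smoothing, dilated])
print(plain)      # zeros at every other offset: the grid pattern
print(smoothed)   # no zero entries inside the field
```

The plain stack ignores every other input position, while the smoothed stack draws on every position within its extent.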

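Summary
■ Mitigates the known gridding artifacts problem while keeping additional parameters to a minimum
■ Proposes two new methods; in the accuracy comparison, both improved mIoU
■ Can easily be added to existing networks
■ Beyond image processing, wherever dilated convolution is used (audio, for example), adopting the proposed methods may bring improvements
■ Code is available: https://github.com/divelab/dilated/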

References
1. An Introduction to different Types of Convolutions in Deep Learning.
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
2. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2016. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915 (2016).
3. Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
4. Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, and Garrison Cottrell. 2017. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502 (2017).
