DL Seminar: Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation

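Copyright © 2020 Laboratory of Harmonious Systems Engineering, Composite Information Engineering, Division of Computer Science and Information Technology, Graduate School of Information Science and Technology, Hokkaido University. All rights reserved.
Paper Introduction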
Primitive Generation and
Semantic-related Alignment for
Universal Zero-Shot Segmentation
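Graduate School of Information Science and Technology, Hokkaido University
Division of Computer Science and Information Technology, Composite Information Engineering, Laboratory of Harmonious Systems Engineering
2nd-year master's student, 大倉博貴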

2
Paper Information
• Title
– Primitive Generation and Semantic-related Alignment
for Universal Zero-Shot Segmentation
• Authors
– Shuting He, Henghui Ding, and Wei Jiang
• Venue
– CVPR 2023
• June 19, 2023
• URL
– Project Page
• https://henghuiding.github.io/PADing/
– GitHub
• https://github.com/heshuting555/PADing

3
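Overview
• Proposes a zero-shot segmentation method that uses the semantic relationships
between objects to learn visual features
• The proposed method achieves state-of-the-art (SoTA) results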

4
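Background
• Zero-shot learning has been proposed to address segmentation's need for
large amounts of labeled training data
• Generative-model-based methods perform well because they mitigate the tendency
to misclassify objects into the seen (training) classes [1]
• However, several problems remain
– Features are generated per pixel, so they are not sufficiently robust
– The mapping from semantic embeddings to visual features ignores the fact that
images carry richer information than language
– Learning to generate features for unseen classes is difficult
[1] Farhad Pourpanah, Moloud Abdar, Yuxuan Luo, Xinlei Zhou, Ran Wang, Chee Peng Lim, and Xi-Zhao Wang. A review of generalized zero-shot learning methods. arXiv preprint arXiv:2011.08641, 2020.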

5
Proposed Method
• PADing
– A framework in which a Primitive Generator synthesizes visual features
of unseen classes
• Achieved through Relationship Alignment and Disentanglement

6
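Proposed Method
• PADing training procedure
– A pretrained predictor predicts class-agnostic masks and their
class embeddings
– The Primitive Generator is trained
– The predictor is fine-tuned using the real class embeddings together with
the synthesized unseen-class embeddings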
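To show why the last stage matters, here is a toy, pure-Python sketch of stage 3 only. A classifier trained on seen-class embeddings alone has no output for an unseen class. Adding synthesized embeddings for that class gives it one. Everything in the sketch is hypothetical. `sample_around` stands in for both the pretrained predictor's outputs and the trained Primitive Generator. The linear softmax classifier stands in for the real classification head. The 2-D "semantic embeddings" are made-up vectors.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_around(center, rng, n, noise=0.1):
    """Hypothetical stand-in for embeddings: Gaussian jitter around a center vector."""
    return [[v + rng.gauss(0.0, noise) for v in center] for _ in range(n)]


def train_classifier(samples, labels, num_classes, dim, epochs=200, lr=0.1):
    """Linear softmax classifier trained with per-sample SGD on cross-entropy."""
    w = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            logits = [sum(wc * xc for wc, xc in zip(w[c], x)) + b[c] for c in range(num_classes)]
            probs = softmax(logits)
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                for d in range(dim):
                    w[c][d] -= lr * grad * x[d]
                b[c] -= lr * grad
    return w, b


def predict(w, b, x):
    """Return the index of the highest-scoring class."""
    scores = [sum(wc * xc for wc, xc in zip(row, x)) + bc for row, bc in zip(w, b)]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
semantic = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [-1.0, -1.0]}  # class 2 plays the "unseen" class
samples, labels = [], []
for c in (0, 1):  # stand-in for real class embeddings of seen classes
    samples += sample_around(semantic[c], rng, 20)
    labels += [c] * 20
samples += sample_around(semantic[2], rng, 20)  # stand-in for synthesized unseen embeddings
labels += [2] * 20
w, b = train_classifier(samples, labels, num_classes=3, dim=2)  # "fine-tune the predictor"
print(predict(w, b, [-0.9, -1.1]))  # 2
```

Without the 20 synthesized samples for class 2, no training signal would ever push a score toward that class. This gap is what the generator is meant to fill.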

7
Primitive Generator
• Primitive Cross-Modal Generation
– Synthesizes class embeddings from primitives that capture fine-grained
attributes
• Semantic-Visual Relationship Alignment
– Relationship Alignment and Disentanglement, which make synthesis
for unseen classes possible

8
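Primitive Cross-Modal Generation
• The primitives are randomly initialized
• The primitives are learned with self-attention
– Each primitive represents a very fine-grained semantic attribute
• e.g., fur, color, shape
$P = \{p_i\}_{i=1}^{N},\ p_i \in \mathbb{R}^{d_k}$   ($N$: number of primitives, $d_k$: number of channels)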

9
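• Two separate linear layers $\omega_K$ and $\omega_V$ map the primitives to the Key ($K$)
and Value ($V$) of a cross-attention
• Cross-attention is then performed with the semantic embedding as the Query
$\chi'$: synthesized class embedding
$z$: sample from a Gaussian distribution
$\omega_1$: linear layer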
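The cross-attention step can be sketched in pure Python as a single attention head. The query comes from the semantic embedding, and the keys and values come from the primitives through $\omega_K$ and $\omega_V$. The slide does not show the exact equation, so several details here are assumptions. These are the hypothetical query projection `w_q` applied to the concatenation $[a; z]$, applying $\omega_1$ (`w_1`) to the attention output, the scaled dot-product form, and all sizes.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random bias-free linear-layer weights, scaled by 1/sqrt(cols)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize_embedding(a, z, primitives, w_q, w_k, w_v, w_1):
    """Single-head scaled dot-product cross-attention over the primitives.

    Assumption: query = w_q([a; z]) and chi' = w_1(attention output).
    """
    q = matvec(w_q, a + z)                         # list concatenation: [a; z]
    keys = [matvec(w_k, p) for p in primitives]    # K = omega_K(P)
    values = [matvec(w_v, p) for p in primitives]  # V = omega_V(P)
    scale = math.sqrt(len(q))
    attn = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
    mixed = [sum(wt * v[c] for wt, v in zip(attn, values)) for c in range(len(values[0]))]
    return matvec(w_1, mixed), attn                # chi' and the weight on each primitive


rng = random.Random(0)
d_sem, d_noise, d_k, n_prim = 8, 4, 16, 32        # hypothetical sizes
primitives = [[rng.gauss(0.0, 1.0) for _ in range(d_k)] for _ in range(n_prim)]  # random init
w_q = rand_matrix(d_k, d_sem + d_noise, rng)
w_k = rand_matrix(d_k, d_k, rng)
w_v = rand_matrix(d_k, d_k, rng)
w_1 = rand_matrix(d_k, d_k, rng)
a = [rng.gauss(0.0, 1.0) for _ in range(d_sem)]    # semantic embedding of one class
z = [rng.gauss(0.0, 1.0) for _ in range(d_noise)]  # Gaussian sample
chi, attn = synthesize_embedding(a, z, primitives, w_q, w_k, w_v, w_1)
print(len(chi), round(sum(attn), 6))  # 16 1.0
```

In this sketch the query depends on $z$. Different noise samples for the same class therefore mix the primitives differently, which yields many distinct synthesized embeddings per class.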

10
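• A loss is defined that minimizes the maximum mean discrepancy (MMD) between the
distributions of real and synthesized class embeddings
– It is computed over seen classes only
$\mathcal{L}_G = \sum_{f,\tilde{f} \in X_S} k(f,\tilde{f}) + \sum_{f',\tilde{f}' \in X'_S} k(f',\tilde{f}') - 2\sum_{f \in X_S}\sum_{f' \in X'_S} k(f,f')$
$X_S$: real class embeddings of seen classes
$X'_S$: synthesized class embeddings of seen classes
$k$: kernel function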
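$\mathcal{L}_G$ can be checked numerically with a minimal pure-Python sketch. It assumes a single Gaussian kernel with a hypothetical bandwidth `sigma`, since the slide does not specify the kernel. Each sum is divided by its number of pairs, which gives the usual biased MMD² estimate. When $|X_S| = |X'_S|$, this differs from the slide's unnormalized form only by a constant factor.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); the kernel choice is an assumption."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(real, fake, sigma=1.0):
    """Biased squared-MMD estimate between real (X_S) and synthesized (X'_S) embeddings."""
    def mean_kernel(A, B):
        return sum(gaussian_kernel(p, q, sigma) for p in A for q in B) / (len(A) * len(B))
    return mean_kernel(real, real) + mean_kernel(fake, fake) - 2.0 * mean_kernel(real, fake)


real = [[0.0, 0.0], [1.0, 0.0]]
same = [[0.0, 0.0], [1.0, 0.0]]
shifted = [[3.0, 3.0], [4.0, 3.0]]
print(mmd2(real, same))     # 0.0
print(mmd2(real, shifted))  # large: the synthesized set sits far from the real one
```

Minimizing this quantity pulls the generator's seen-class outputs toward the real seen-class embeddings as a distribution, not point by point.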

11
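Semantic-Visual Relationship Alignment
• Disentangle
– Encoders are applied to the class embeddings to separate out the
semantic-related information
• Relationship Alignment
– Aligns the relationships in the semantic-related visual space with those in the semantic space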

12
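Disentangle
• Two different encoders are applied to each class embedding to produce a
semantic-related and a semantic-unrelated feature
– Semantic-related: $\hat{x}_i = E_R(x_i),\quad \mathcal{L}_R = -\sum_i \sum_k \mathbb{1}[\bar{x}_i = k] \log \frac{\exp(\hat{x}_i a_k/\tau)}{\sum_{k'} \exp(\hat{x}_i a_{k'}/\tau)}$
– Semantic-unrelated: $\check{x}_i = E_U(x_i),\quad \mathcal{L}_U = \sum_i D_{KL}[\check{x}_i \,\|\, \mathcal{N}(0,1)]$
$\hat{x}_i$ / $\check{x}_i$: semantic-related / semantic-unrelated feature of $x_i$
$E_R$: encoder for semantic-related features
$E_U$: encoder for semantic-unrelated features
$\bar{x}$: index of the ground-truth class of $x$
$a_k$: semantic embedding of class $k$, $\tau$: temperature
$D_{KL}$: KL divergence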
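A minimal pure-Python sketch of the two disentangling losses follows. $\mathcal{L}_R$ is written as a cross-entropy over dot-product logits $\hat{x}_i a_k / \tau$, as on the slide. For $\mathcal{L}_U$ the sketch assumes a VAE-style $E_U$ that outputs a mean and a log-variance per dimension. This assumption is not stated on the slide. With it, the KL divergence to $\mathcal{N}(0,1)$ has the closed form $\frac{1}{2}\sum(\mu^2 + \sigma^2 - 1 - \log\sigma^2)$. The function names are hypothetical.

```python
import math


def semantic_related_loss(related_feats, labels, class_embs, tau=0.1):
    """L_R: cross-entropy over logits (x_hat_i . a_k) / tau, summed over samples."""
    total = 0.0
    for x, y in zip(related_feats, labels):
        logits = [sum(xi * ai for xi, ai in zip(x, a)) / tau for a in class_embs]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(s - m) for s in logits))  # log-sum-exp
        total += log_norm - logits[y]  # -log softmax(logits)[y]
    return total


def semantic_unrelated_loss(mus, logvars):
    """L_U: KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over samples and dimensions.

    Assumption: E_U outputs a mean and a log-variance per dimension (VAE-style).
    """
    total = 0.0
    for mu, logvar in zip(mus, logvars):
        total += 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))
    return total


class_embs = [[1.0, 0.0], [0.0, 1.0]]  # a_k for two classes
lr_right = semantic_related_loss([[1.0, 0.0]], [0], class_embs)
lr_wrong = semantic_related_loss([[1.0, 0.0]], [1], class_embs)
print(lr_right < lr_wrong)                                 # True
print(semantic_unrelated_loss([[0.0, 0.0]], [[0.0, 0.0]]))  # 0.0
print(semantic_unrelated_loss([[1.0, 0.0]], [[0.0, 0.0]]))  # 0.5
```

The two losses pull in opposite directions. $\mathcal{L}_R$ makes $\hat{x}_i$ predictive of the class, while $\mathcal{L}_U$ pushes $\check{x}_i$ toward a class-agnostic prior.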

13
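Disentangle
• To extract the semantic-related information more effectively, a decoder
reconstructs the original feature from the two parts
• The overall Disentangle loss is defined as follows
$\mathcal{L}_{recon} = \sum_i \lVert x_i - D(\hat{x}_i, \check{x}_i) \rVert$   ($D$: reconstruction decoder)
$\hat{x}_i$, $\check{x}_i$: semantic-related and semantic-unrelated features of $x_i$
$\mathcal{L}_{\mathbb{D}} = \mathcal{L}_R + \mathcal{L}_U + \mathcal{L}_{recon}$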
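The reconstruction term and the combined Disentangle loss can be sketched in pure Python. `toy_decoder` is a hypothetical stand-in for the MLP decoder that simply adds the two parts element-wise. Using the Euclidean norm without squaring follows the slide as extracted, and it is an assumption.

```python
import math


def reconstruction_loss(originals, related, unrelated, decoder):
    """L_recon: sum over i of the Euclidean norm ||x_i - D(x_hat_i, x_check_i)||."""
    total = 0.0
    for x, r, u in zip(originals, related, unrelated):
        recon = decoder(r, u)
        total += math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))
    return total


def disentangle_loss(l_r, l_u, l_recon):
    """L_D = L_R + L_U + L_recon (unweighted sum, as on the slide)."""
    return l_r + l_u + l_recon


def toy_decoder(r, u):
    """Hypothetical decoder standing in for the MLP: element-wise sum of the two parts."""
    return [ri + ui for ri, ui in zip(r, u)]


x = [[1.0, 2.0]]
x_hat = [[0.5, 1.0]]
print(reconstruction_loss(x, x_hat, [[0.5, 1.0]], toy_decoder))     # 0.0 (perfect reconstruction)
l_recon = reconstruction_loss(x, x_hat, [[0.0, 0.0]], toy_decoder)  # sqrt(0.5**2 + 1.0**2)
print(disentangle_loss(0.2, 0.3, l_recon))
```

The reconstruction term prevents a trivial split. If $\check{x}_i$ carried nothing, the decoder could not recover $x_i$ from $\hat{x}_i$ alone whenever $\hat{x}_i$ discards non-semantic detail.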

14
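Relationship Alignment
• Aligns the relationships in the semantic-related visual space with those in the semantic space
– Pushes the similarity between any two features toward the similarity between
their corresponding semantic embeddings
$\mathcal{L}_A = D_{KL}\left[ \frac{\hat{x}_i^{\top} \hat{x}_j}{\tau\,\lVert\hat{x}_i\rVert\,\lVert\hat{x}_j\rVert} \,\middle\|\, \frac{a[\bar{x}_i]^{\top} a[\bar{x}_j]}{\tau\,\lVert a[\bar{x}_i]\rVert\,\lVert a[\bar{x}_j]\rVert} \right]$
$\hat{x}_i$: semantic-related feature of $x_i$
$\bar{x}$: index of the ground-truth class of $x$
$a[k]$: semantic embedding of class $k$, $\tau$: temperature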

15
Primitive Generator
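• Loss function
– Enables the synthesis of unseen-class embeddings that preserve
semantic relationships
$\mathcal{L}_{total} = \mathcal{L}_G + \lambda(\mathcal{L}_{\mathbb{D}} + \mathcal{L}_A)$
$\mathcal{L}_G$: loss on seen classes (MMD)
$\mathcal{L}_{\mathbb{D}}$: Disentangle loss
$\mathcal{L}_A$: Relationship Alignment loss
$\lambda$: weight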

16
Experimental Setup
• Models
– Semantic embeddings
• CLIP text embeddings [2]
• Word2vec [3]
– Classifier
• Mask2Former [5] with a ResNet-50 [4] backbone
– Encoder and decoder for Disentangle
• MLP
– Baseline
• GMMN [6]
[2] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, 2021.
[3] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NeurIPS, 2013.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[5] Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In CVPR, 2022.
[6] Maxime Bucher, Tuan-Hung Vu, Matthieu Cord, and Patrick Pérez. Zero-shot semantic segmentation. In NeurIPS, 2019.

17
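Experimental Setup
• Datasets
– A ZSP (Zero-Shot Panoptic Segmentation) dataset is built from MSCOCO
• Following prior work [7] that built a ZSS (Zero-Shot Semantic Segmentation) dataset
• Evaluation metrics
– Each metric is reported as the harmonic mean (HM) of the seen-class and unseen-class scores
• PQ (Panoptic Quality)
• SQ (Segmentation Quality)
• RQ (Recognition Quality)
• mAP (mean Average Precision)
• mIoU (mean Intersection-over-Union)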
[7] Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, and Zeynep Akata. Semantic projection network for zero- and few-label semantic segmentation. In CVPR, 2019.
$HM = \frac{2 \times P_{seen} \times P_{unseen}}{P_{seen} + P_{unseen}}$
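The HM formula is short enough to check directly. This minimal Python version shows why it is used: it stays close to the weaker of the two scores, so a method cannot hide poor unseen-class performance behind strong seen-class numbers. Returning 0 when both scores are 0 is an added convention to avoid division by zero, and it is not from the slide.

```python
def harmonic_mean(p_seen, p_unseen):
    """HM = 2 * P_seen * P_unseen / (P_seen + P_unseen); 0 if both are 0 (added convention)."""
    if p_seen + p_unseen == 0:
        return 0.0
    return 2 * p_seen * p_unseen / (p_seen + p_unseen)


print(harmonic_mean(30.0, 60.0))  # 40.0
print(harmonic_mean(50.0, 10.0))  # about 16.67, well below the arithmetic mean of 30
```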

18
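Experiment 1: Ablation Study
• ZSP (Zero-Shot Panoptic Segmentation) task
– PADing achieves high accuracy
– The Primitive Generator outperforms the baseline
• Other zero-shot tasks
– The method is effective across segmentation tasks in general
G/P: GMMN / Primitive Generator
A: Relationship Alignment
D: Disentangle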

19
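Experiment 2: Comparison with SoTA
• ZSS (Zero-Shot Semantic Segmentation) task
– Outperforms the previous best method, ZegFormer-seg [8]
• even though prior methods use ResNet-101 while the proposed method uses ResNet-50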
[8] Jian Ding, Nan Xue, Gui-Song Xia, and Dengxin Dai. Decoupling zero-shot semantic segmentation. In CVPR, 2022.

20
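Experiment 3: Qualitative Results
• ZSP (Zero-Shot Panoptic Segmentation) task
– Correctly classifies unseen classes that the baseline misses
• e.g., suitcase, grass, frisbee, road, tree, skateboard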

21
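Summary
• Proposes a zero-shot segmentation method that uses the semantic relationships
between objects to learn visual features
• The proposed method achieves state-of-the-art (SoTA) results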
