20180420 csn learning type-aware embeddings for fashion compatibility

Paper review
CSN | Learning Type-Aware
Embeddings for Fashion Compatibility
author: Vasileva, Mariya I et al.
2018

abstract
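- Proposes a model for outfit recommendation
- Proposes new, stricter datasets for evaluation
- Learns similarity and compatibility jointly
- Learns compatibility separately for each pair of categories, which addresses the improper triangle problem
- SOTA on both tasks on the new, harder datasets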

Table of Contents
- Introduction
- Method
- Experiments & Results
- Conclusion

Introduction - Problems with prior work
- improper triangle
- the test dataset is too easy (explained in the Experiments section)

Introduction - Related Work
- [(A. Veit et al. 2015) SiameseNet | Learning Visual Clothing Style with
Heterogeneous Dyadic Co-occurrences.](https://arxiv.org/pdf/1509.07473.pdf)

Introduction - Improper Triangle
Figure source: [(K. Yamaguchi et al. 2015) Mix and Match: Joint Model for Clothing and Attribute Recognition.](http://vision.is.tohoku.ac.jp/~kyamagu)
- For compatibility, the following triangle-inequality-style (transitive) relation does not necessarily hold:
- "tops A and bottoms B are compatible" and "bottoms B and shoes C are compatible" → "tops A and shoes C are compatible"
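Why a single embedding space runs into this: any metric distance obeys the triangle inequality, so two compatible pairs that share an item put an upper bound on the distance of the third pair. A minimal sketch in plain Python; the 2-D coordinates are made-up illustrative values, not taken from the paper:

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings in ONE shared space (illustrative values only).
tops_a = (0.0, 0.0)
bottoms_b = (0.3, 0.0)
shoes_c = (0.6, 0.0)

d_ab = dist(tops_a, bottoms_b)   # tops A is close to bottoms B
d_bc = dist(bottoms_b, shoes_c)  # bottoms B is close to shoes C
d_ac = dist(tops_a, shoes_c)

# Triangle inequality: d(A, C) can never exceed d(A, B) + d(B, C),
# so A and C are forced to be fairly close whether or not they match.
assert d_ac <= d_ab + d_bc + 1e-12
```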

Introduction - Related Work
- [(X. Han et al. 2017) Bi-LSTM | Learning Fashion Compatibility with
Bidirectional LSTMs.](https://arxiv.org/pdf/1707.05691.pdf)

Method - Data
- Originally sourced from Polyvore
- outfit = item image sequence
- text
- The following 3 variants were used:
- Maryland Polyvore (X. Han et al. 2017)
- test data is too easy
- Polyvore Outfits-D (ours)
- Polyvore Outfits (ours)

Method - Data
- Maryland Polyvore:
- The test data is unsuitable for quantitative evaluation because it is too easy. (Explained in the Experiments section.)
- The text information is sparse.

Method - Model: CSN
- Based on Veit, A., Belongie, S., Karaletsos, T.: Conditional similarity networks. In: CVPR. (2017).

Method - CSN の input/output
[Figure: CSN input/output. Labels: image x; category (u: bottoms, v: tops); text t; compatible; image-text/text-text distance; image-image distance.]

Method - Model: CSN
- Consists of 3 modules
- similarity
- VSE = Visual Semantic Embedding: embeds text and images so that semantically close ones end up close in distance
- Sim: a SiameseNet that embeds text/images of the same category close together
- compatibility
- Type-Specific Embed
- Projects from the Sim embedding space into a space specific to each category pair,
- and embeds compatible images close together

Method - VSE: Visual Semantic Embedding
[Figure: image x, compatible]
Pull an item's own image and text together, and push the text of other items away.
Supplying text information in addition to the image embeds similar items closer together.
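The pull/push behavior described here is commonly written as a hinge triplet loss. A minimal sketch, assuming Euclidean distance and a margin of 0.2 (both are assumptions for illustration; the slide does not state them), with made-up 2-D embeddings:

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# Hypothetical embeddings (illustrative values only).
image_x = (0.0, 0.0)
own_text = (0.1, 0.0)    # text describing the same item
other_text = (1.0, 0.0)  # text describing a different item

# Own text is already much closer than the other text: no penalty.
loss_ok = triplet_loss(image_x, own_text, other_text)
# Roles swapped: the "positive" is far away, so the loss is large.
loss_bad = triplet_loss(image_x, other_text, own_text)
```

The loss is zero once the item's own text is closer than the other text by at least the margin, so training effort goes to the triplets that still violate it.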

Method - Sim
[Figure: image x, compatible]
Pull images/text of the same category together, and push different categories apart.
Prior work applied the compatibility triplet loss at this stage.

Method - Type-Specific Projection
[Figure: image x, compatible]
A projection first splits the embedding into one space per category pair; compatibility is then learned as similarity within each of those spaces.
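A minimal sketch of measuring distance in a space specific to each category pair, assuming the projection is an element-wise mask over a shared embedding (one possible form of projection; the masks and 4-D embeddings below are hand-picked hypothetical values, where in practice the masks would be learned):

```python
import math

# Hypothetical general embeddings (illustrative 4-D values).
embeddings = {
    "tops_a": [1.0, 0.0, 0.2, 0.9],
    "bottoms_b": [1.0, 0.1, 0.8, 0.1],
    "shoes_c": [0.0, 0.9, 0.8, 0.1],
}

# One mask per (alphabetically sorted) category pair; hand-picked here.
masks = {
    ("bottoms", "tops"): [1.0, 1.0, 0.0, 0.0],
    ("bottoms", "shoes"): [0.0, 0.0, 1.0, 1.0],
    ("shoes", "tops"): [0.0, 1.0, 0.0, 1.0],
}

def pair_distance(x, type_x, y, type_y):
    """Euclidean distance after projecting both items with the pair's mask."""
    mask = masks[tuple(sorted((type_x, type_y)))]
    return math.sqrt(sum((m * (a - b)) ** 2 for m, a, b in zip(mask, x, y)))

d_tb = pair_distance(embeddings["tops_a"], "tops", embeddings["bottoms_b"], "bottoms")
d_bs = pair_distance(embeddings["bottoms_b"], "bottoms", embeddings["shoes_c"], "shoes")
d_ts = pair_distance(embeddings["tops_a"], "tops", embeddings["shoes_c"], "shoes")

# Each distance lives in a different subspace, so tops A and shoes C can be
# far apart even though both are close to bottoms B.
assert d_ts > d_tb + d_bs
```

Because the three distances are measured in different subspaces, no triangle inequality ties them together, which is how the pair-wise split sidesteps the improper triangle.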

Method - Type-Specific Projection

Method - Model: CSN
- Consists of 3 modules (recap)
- similarity
- VSE = Visual Semantic Embedding: embeds text and images so that semantically close ones end up close in distance
- Sim: a SiameseNet that embeds text/images of the same category close together
- compatibility
- Type-Specific Embed
- Projects from the Sim embedding space into a space specific to each category pair,
- and embeds compatible images close together

Experiments - Evaluation - task & metric
- 2 task
- FITB = Fill in the Blank
- Compatibility Prediction
- 5 dataset
- Maryland (All Negatives)
- Maryland (Composition Filtering)
- Maryland (Category-Aware Negative) (same as above?)
- Polyvore Outfits
- Polyvore Outfits-D

Experiments - task(1/2) - FITB
- Choose the compatible (correct) item from 4 candidates: 1 correct, 3 wrong
- metric: Accuracy
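A minimal sketch of how FITB accuracy could be computed. The scoring rule (mean pairwise compatibility with the remaining outfit items) and the toy `toy_score` function are assumptions for illustration; items are plain numbers standing in for embeddings:

```python
def fitb_accuracy(questions, score):
    """questions: list of (context_items, candidates, index_of_correct_candidate)."""
    correct = 0
    for context, candidates, answer_idx in questions:
        # Score each candidate by its mean compatibility with the context items.
        scores = [
            sum(score(cand, item) for item in context) / len(context)
            for cand in candidates
        ]
        if scores.index(max(scores)) == answer_idx:
            correct += 1
    return correct / len(questions)

def toy_score(a, b):
    """Hypothetical compatibility score: closer numbers are more compatible."""
    return -abs(a - b)

questions = [
    ([1.0, 2.0], [1.5, 9.0, -5.0, 7.0], 0),  # candidate 0 scores best: counted correct
    ([4.0], [0.0, 10.0, 4.5, 8.0], 1),       # candidate 2 scores best: counted wrong
]
accuracy = fitb_accuracy(questions, toy_score)
```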

Experiments - task(2/2) - Compatibility Prediction
- Binary classification of outfits as compatible/incompatible
- compatible (positive sample)
- All outfits on Polyvore are treated as compatible.
- incompatible (negative sample)
- The sampling method differs depending on the dataset.
- metric: AUC
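AUC can be read as the probability that a randomly chosen compatible outfit receives a higher score than a randomly chosen incompatible one. A minimal pure-Python sketch of that definition, with made-up labels and scores:

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# 1 = compatible outfit, 0 = sampled negative outfit (illustrative values).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
result = auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

This quadratic version is only meant to make the definition concrete; a library routine would be the practical choice on a real test set.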

Experiments (1/3) - Maryland
- Maryland (All Negatives)
- Maryland (Composition Filtering): in the Maryland test data,
- FITB: candidate items come from obviously different categories → predicting the correct item is easy.
- Compat. Pred.: negative outfits have duplicated or missing categories → predicting them as incompatible is easy.
- These easy samples are removed.

Experiments(2/3) - Category-Aware Negative
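- Maryland (Category-Aware Negative): in the Maryland test data,
- FITB: candidate items come from obviously different categories → predicting the correct item is easy.
- Compat. Pred.: outfits have duplicated or missing categories → predicting them as incompatible is easy.
- Rather than only removing the easy samples, negatives are sampled with the category specified.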
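One plausible reading of category-aware negative sampling is sketched below: each item of a real outfit is swapped for a different random item of the same category, so the negative keeps a valid category composition. The function name, the item IDs, and the replace-every-item policy are assumptions for illustration, not details confirmed by the slides:

```python
import random

def category_aware_negative(outfit, items_by_category, rng):
    """Replace each (item_id, category) with another random item of that category."""
    negative = []
    for item_id, category in outfit:
        # Exclude the original item so the replacement is always different.
        pool = [i for i in items_by_category[category] if i != item_id]
        negative.append((rng.choice(pool), category))
    return negative

# Hypothetical catalogue and outfit (illustrative IDs only).
items_by_category = {
    "tops": ["t1", "t2", "t3"],
    "bottoms": ["b1", "b2"],
    "shoes": ["s1", "s2", "s3"],
}
outfit = [("t1", "tops"), ("b1", "bottoms"), ("s1", "shoes")]

negative = category_aware_negative(outfit, items_by_category, random.Random(0))
```

The same idea applies to FITB: drawing the wrong candidates from the correct answer's category means the category alone can no longer give the answer away.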

Experiments(3/3) - Polyvore Outfits(-D)
- Increased the number of items per outfit.
- Also increased the amount of text information.
- Negative sampling uses the category-aware method.
- D: in addition, no items are shared between train and test.

Conclusion
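- Proposes a model for outfit recommendation
- Proposes new, stricter datasets for evaluation
- Learns similarity and compatibility jointly
- Learns compatibility separately for each pair of categories, which addresses the improper triangle problem
- SOTA on both tasks on the new, harder datasets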
