
Computer Vision meets Fashion


Talk at the Computer Vision for Fashion workshop at ICCV 2017. A brief review of recent topics in fashion applications of computer vision, followed by an introduction to the automatic attribute discovery work from ECCV 2016.

Published in: Data & Analytics


  1. Computer Vision meets Fashion. Kota Yamaguchi, CyberAgent, Inc. CVF Workshop, Oct 29, 2017. github.com/kyamagu
  2. Computer vision and fashion? (style.com)
  3. (style.com)
  4. Computer vision provides machine perception to fashion
  5. For e-commerce • Garment search: Shorts, Blazer, T-shirt (query, results) • Clothing search [Liu 12] [Kalantidis 13] [Cushen 13] [Kiapour 15] [Liu 16] • Recommendation [Liu 12] [McAuley 15] [Veit 15] [Yang 17] • Categorization [Borras 03] [Berg 10] [Bourdev 11] [Chen 12] [Bossard 12] [Di 13]
  6. For social science • Social groups [Murillo 12] [Kwak 13] • Occupation [Song 11] [Shao 13] (Figures: [Kwak 13], [Shao 13]; the extracted figure text shows a 1-slack structural SVM algorithm and an occupation database with 14 occupations and over 7K images)
  7. For re-identification • Person identification [Anguelov 07] [Gallagher 08] [Vaquero 09] [Wang 11] (Figures: [Gallagher 08], [Vaquero 09])
  8. Fashion Industry • “The global apparel market is valued at 3 trillion dollars, 3,000 billion, and accounts for 2 percent of the world’s Gross Domestic Product (GDP).” – FashionUnited.com
  9. Amazon Echo Look: Hands-Free Camera and Style Assistant https://www.amazon.com/Echo-Hands-Free-Camera-Style-Assistant/dp/B0186JAEWK
  10. Start-ups • Cutting-edge computer vision in business: Fashwell, Wide Eyes Technologies, Shopagon, Vasily, Markable
  11. Start-ups • Virtual styling: Vue.ai • "Only two of these images were taken by a camera." https://qz.com/1090267/artificial-intelligence-can-now-show-you-how-those-pants-will-fit/
  12. Start-ups • Body measurement: Original Stitch, Makip
  13. Discussion • Fashion tech is strongly application-oriented • UX in e-commerce and social media • Computer vision as a building block • Deep learning almost solves recognition problems • From simple tasks to complex tasks • Data issues: research towards unsupervised / weakly-annotated data • Machine learning for creativity? • Can all the players in the fashion industry benefit from computer vision?
  14. Agenda • Recent topics: 1. Contexts in clothing recognition 2. Retrieval for e-commerce 3. Style / trend analysis 4. Learning from big data 5. Image synthesis • Attribute discovery • Clothing parsing
  15. Recent topics: 1. Contexts in clothing recognition 2. Retrieval for e-commerce 3. Style / trend analysis 4. Learning from big data 5. Image synthesis
  16. 1. Contexts in clothing recognition • A garment is related to body parts and to other garments • Semantic segmentation, pose estimation • Contexts, joint models, dependency [CVPR 2012]
  17. Mix and Match: Joint Model for Clothing and Attribute Recognition [BMVC 2015] (figure labels: matched, unmatched)
  18. Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences [Veit, ICCV 2015] • Learn from co-purchase data (figure labels: compatible, incompatible)
  19. Learning Fashion Compatibility with Bidirectional LSTMs [Han, ACM MM 2017]
  20. Recommending Outfits from Personal Closet [Tangseng, CVF 2017] • Scoring a set of images (figure labels: closet, outfit, good, bad, ?)
  21. 2. Retrieval for e-commerce • Street2shop • Attribute-based search • VQA shopping assistant • Exact retrieval, interaction, attributes (figure labels: Shorts, Blazer, T-shirt)
  22. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations [Liu, CVPR 2016] http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html • 800K images, 50 categories, 1K attributes, bbox, landmarks • Attribute prediction, street-to-shop, in-shop retrieval, landmark detection • Retrieval demo: http://fashion.sensetime.com/
  23. Where to Buy It: Matching Street Clothing Photos in Online Shops [Kiapour, ICCV 2015] • Finding the exact product given a snapshot
  24. Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search [Zhao, CVPR 2017]
  25. Visual Search at eBay [Yang, KDD 2017] • VQA in shopping scenario
  26. 3. Style / trend analysis • Learning style/outfit as a whole • Geographical trend • Temporal trend • Unsupervised learning, weakly supervised learning
  27. Fashion Style in 128 Floats: Joint Ranking and Classification Using Weak Data for Feature Extraction [Simo-Serra, CVPR 2016]
  28. StreetStyle: Exploring world-wide clothing styles from millions of photos [Matzen, arXiv 2017]
  29. Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images [Hsiao, ICCV 2017] • Polylingual LDA
  30. Fashion Forward: Forecasting Visual Style in Fashion [Al-Halah, ICCV 2017] • Popularity prediction by styles • Keyword popularity prediction
  31. 4. Learning from fashion big data • Social signal + visual signal • Textual signal
  32. Fashion Conversation Data on Instagram [Ha, ICWSM 2017] (figure labels: successful categorization, unsuccessful categorization)
  33. When Fashion Meets Big Data: Discriminative Mining of Best Selling Clothing Features [Chen, WWW 2017]
  34. The Elements of Fashion Style [Vaccaro, UIST 2016] • Learning semantic relationship between high-level and low-level fashion concepts in text
  35. 5. Image synthesis • Domain translation • Creative inspiration • Virtual fitting
  36. Pixel-Level Domain Transfer [Yoo, ECCV 2016] • Generating a product photo given a street snap https://github.com/fxia22/PixelDTGAN
  37. Pose Guided Person Image Generation [Ma, arXiv 2017]
  38. A Generative Model of People in Clothing [Lassner, ICCV 2017] • Generating people from pose map and styling pipeline
  39. Be Your Own Prada: Fashion Synthesis with Structural Coherence [Zhu, ICCV 2017]
  40. Recent topics: contextual modeling, retrieval, style and trend, multimodal learning, creativity (figures: [Kiapour 2015] [Al-Halah 2017] [Lassner 2017] [Ha 2017]) • From recognition to real-world tasks • Business impact?
  41. Attribute discovery, ECCV 2016
  42. Visual attributes • What does a _______ t-shirt look like? • yellow • large • surfer • comfy • original • popular ... (image sources: onehourtees.com, www.justclick.ie, www.matsongraphics.com, polyvore.com)
  43. Vocabulary of attributes • How many words can we use to describe a t-shirt? • My t-shirt looks __________.
  44. Emerging trends • Can we learn an image recognition model for all of the hashtags? (Instagram)
  45. Automatic attribute discovery • Finding vocabulary of visual attributes • Open-world recognition challenge • Using pre-trained deep neural networks to identify visual words in the Web data (yellow, large, surfer, comfy, original, popular)
  46. Our approach • 1. Get Web data 2. Analyze DNN's internal activity • Image + Text → Pre-trained deep CNN → Attributes: white, red, striped, wooden, silky, ... • Example text: "beautiful soft blush handmade leather ballet flats. ***please, note, our new blush ballet flats are without the beige trim line (around the edges), still just as beautiful and perhaps even more*** SIZING ✍ how to take measurements ✍ there are a number of ways to measure your feet, however we find the quickest and most reliable practice is by tracing your feet. Here is how to do it: stand on a piece of paper that's bigger than your feet, circle your feet around with a straight standing pencil (without pressing the pencil too hard to the edges of your feet). Once you have the tracing, measure distance between longest and widest points. Compare the measurements to the list below."
  47. Web data: unlimited vocabulary with images • Textual description: "Feel So Good ... Purple Halter Maxi Cotton dress 2 Sizes Available" • Tags: used, american casual, summer, shorts, t-shirt, surfer, printed, duffer • Etsy dataset: e-commerce • Wear dataset: fashion blog
  48. Discovery: intuition • Contrast positive and negative sets to identify the difference in semantics (e.g., pink vs. not pink)
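
The contrast on slide 48 starts from two image sets per candidate word. A minimal sketch of that split, assuming each Web item is an (image id, description) pair and that a whole-word, case-insensitive match decides membership; the function name `split_by_word` and the toy items are illustrative and not taken from the talk:

```python
import re


def split_by_word(items, word):
    """Split (image_id, text) pairs into positive and negative sets for `word`.

    Assumption: an image is "positive" when the word occurs as a whole word
    in its description (case-insensitive); everything else is "negative".
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    positive, negative = [], []
    for image_id, text in items:
        if pattern.search(text):
            positive.append(image_id)
        else:
            negative.append(image_id)
    return positive, negative


# Toy data, made up for illustration only.
items = [
    ("img1", "Pink cotton maxi dress"),
    ("img2", "Blue denim shorts"),
    ("img3", "Soft pink ballet flats"),
]
pos, neg = split_by_word(items, "pink")
print(pos, neg)
```

Because Web text is noisy, a real split would contain mislabeled images on both sides; slide 55 touches on that case.
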
  49. Identifying difference at neurons • Images (positive, negative) → deep neural network (conv1, conv2, conv3, conv4, conv5, fc6, fc7) → activation histograms per neuron (unit #1, unit #2, ...) → KL divergence
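
The pipeline on slide 49 (per-unit activation histograms for the positive and negative sets, compared by KL divergence) could look roughly like the following. This is a sketch under stated assumptions rather than the authors' implementation: the bin count, the additive smoothing, the direction KL(positive || negative), and all function names are choices made here for illustration, and the activations are toy numbers.

```python
import math


def histogram(values, bins, lo, hi, eps=1e-6):
    """Normalized histogram over [lo, hi]; eps smoothing keeps every bin non-zero."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp v == hi into the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def unit_kl_scores(pos_acts, neg_acts, bins=10):
    """Score each unit by the KL divergence between its activation histograms.

    pos_acts / neg_acts: one activation vector per image.
    """
    scores = []
    for u in range(len(pos_acts[0])):
        pos = [a[u] for a in pos_acts]
        neg = [a[u] for a in neg_acts]
        lo, hi = min(pos + neg), max(pos + neg)
        if hi == lo:  # constant unit carries no information
            scores.append(0.0)
            continue
        scores.append(kl_divergence(histogram(pos, bins, lo, hi),
                                    histogram(neg, bins, lo, hi)))
    return scores


# Toy activations: unit 0 separates the sets, unit 1 does not.
pos_acts = [[0.9, 0.10], [0.8, 0.20], [1.0, 0.15]]
neg_acts = [[0.1, 0.10], [0.2, 0.20], [0.0, 0.15]]
scores = unit_kl_scores(pos_acts, neg_acts)
print(scores)
```

Units with a high score respond differently to the two sets, which is the signal the slide uses to tie a word to visual stimuli.
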
  50. KL visualization: shorts • Positive vs. negative • pool5 KL, norm2 KL, average image
  51. KL visualization: red • Positive vs. negative • pool5 KL, norm2 KL, average image
  52. Is the attribute visual? • Which attributes are visually perceptible? • Measure the classification performance, and compare against humans (yellow, comfy, large, original, surfer, popular)
  53. Visualness • Visualness of word $u$ given a classifier $f$ and datasets $D_u^{+}$ (positive), $D_u^{-}$ (negative): $V(u \mid f) \equiv \mathrm{accuracy}(f, D_u^{+}, D_u^{-})$
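
The visualness definition on slide 53 is the accuracy of a classifier on the positive and negative sets of a word. A minimal sketch, assuming the classifier is any callable returning a truthy value for "positive" and that the two sets are truncated to equal size so that chance level is 0.5; the balancing step and the toy "redness" feature are assumptions made here, not details given in the slides:

```python
def visualness(classifier, positives, negatives):
    """Accuracy of `classifier` on a balanced positive/negative set.

    Assumption: both sets are truncated to the same size before scoring.
    """
    n = min(len(positives), len(negatives))
    if n == 0:
        raise ValueError("need at least one positive and one negative sample")
    pos, neg = positives[:n], negatives[:n]
    correct = sum(1 for x in pos if classifier(x))
    correct += sum(1 for x in neg if not classifier(x))
    return correct / (2 * n)


# Toy example: each image is reduced to a hypothetical scalar "redness" feature.
is_red = lambda redness: redness > 0.5
d_pos = [0.9, 0.8, 0.4]
d_neg = [0.1, 0.2, 0.7, 0.3]
v = visualness(is_red, d_pos, d_neg)
print(round(v, 3))
```

A word whose score stays near 0.5 is hard to recognize from pixels under this measure, while a score near 1.0 suggests the word is visually perceptible.
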
  54. Discovered attributes • lovely vs. NOT lovely • bright vs. NOT bright • orange vs. NOT orange • acrylic vs. NOT acrylic • elegant vs. NOT elegant
  55. Discovery in noisy data • Annotated floral vs. NOT annotated floral • Predicted MOST floral vs. predicted LEAST floral • False positives, false negatives
  56. Perceptual depth • Which layer affects attribute recognition? (layers: conv1, conv2, conv3, conv4, conv5, fc6, fc7; words: orange, bright, elegant, lovely, acrylic)
  57. Most salient words (Etsy)
      norm1     norm2     conv3   conv4       pool5       fc6       fc7
      orange    green     bright  flattering  lovely      many      sleeve
      colorful  red       pink    lovely      elegant     soft      sole
      vibrant   yellow    red     vintage     natural     new       acrylic
      bright    purple    purple  romantic    beautiful   upper     cold
      blue      colorful  green   deep        delicate    sole      flip
      welcome   blue      lace    waist       recycled    genuine   newborn
      exact     vibrant   yellow  front       chic        friendly  large
      yellow    ruffle    sweet   gentle      formal      sexy      floral
      red       orange    French  formal      decorative  stretchy  waist
      specific  only      black   delicate    romantic    great     American
  58. Saliency detection • Can we identify the salient region of a discovered attribute? (example: tulle-skirt)
  59. Saliency examples (columns: image, human, ours): sunglasses, shorts, sneakers, gingham check, white, style, yellow
  60. Attribute discovery • Learn open-world recognition models from Web data • Highly-activating neurons identify visual stimuli associated with the given word • Neural activations can further identify salient regions
  61. Clothing parsing: CVPR 2012, ICCV 2013, TPAMI 2014, arXiv
  62. Clothing parsing (style.com)
  63. Fully-convolutional Neural Networks (FCN) • CNN for semantic segmentation • All-convolution architecture • "Fully Convolutional Networks for Semantic Segmentation", Jonathan Long, Evan Shelhamer, Trevor Darrell, CVPR 2015
  64. Looking at outfit to parse clothing [Pongsate, arXiv 2017] • 1. Outfit prediction (clothing combination) 2. Filter out inappropriate garments 3. Smooth out boundary • Diagram blocks: Input, FCN, Conv, 3xConv, Pool1 to Pool5, Deconv, Crop, Sum, Outfit encoder, FC, FC, Sigmoid, Product, Softmax, CRF, Loss, Output
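
Steps 1 and 2 on slide 64, together with the Sigmoid, Product and Softmax blocks in the diagram, suggest that image-level garment predictions gate the per-pixel class scores. One plausible reading is sketched below; it is not the published architecture. It assumes non-negative per-pixel scores, one outfit logit per label, and element-wise multiplication before the softmax. All names and numbers are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_pixel_scores(pixel_scores, outfit_logits):
    """Down-weight labels the outfit predictor considers absent from the image.

    Assumption: gating is an element-wise product of the per-pixel scores
    with sigmoid(outfit_logits), followed by a softmax over labels.
    """
    gates = [sigmoid(z) for z in outfit_logits]
    return softmax([s * g for s, g in zip(pixel_scores, gates)])


labels = ["skin", "dress", "skirt"]
pixel_scores = [1.0, 2.0, 2.2]    # the raw scores slightly prefer "skirt"
outfit_logits = [4.0, 3.0, -4.0]  # the outfit predictor says: no skirt here

ungated = softmax(pixel_scores)
gated = gate_pixel_scores(pixel_scores, outfit_logits)
print(labels[ungated.index(max(ungated))], "->", labels[gated.index(max(gated))])
```

In this toy case the pixel flips from "skirt" to "dress" once the outfit-level prediction is taken into account, which relates to the dress vs. top+skirt confusion listed under Limitations. Step 3 (boundary smoothing with a CRF) is not covered by the sketch.
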
  65. Parsing results (columns: input, truth, prediction)
  66. Parsing results (columns: input, truth, prediction) • Labels, first example: skin, hair, dress, hat/headband, shoes • Labels, second example: skin, hair, bag, dress, jacket/blazer, necklace, shoes, sweater/cardigan, top/t-shirt, vest, watch/bracelet
  67. Performance [%]
      Dataset              Method                  Accuracy  IoU
      Fashionista v0.2     Paper doll [ICCV 2013]  84.68     -
                           Clothlets [ACCV 2014]   84.88     -
                           FCN-8s [CVPR 2015]      87.51     33.97
                           Our model               88.34     37.23
      Refined Fashionista  FCN-8s [CVPR 2015]      90.09     44.72
                           Our model               91.74     51.78
      CFPD                 CFPD [TMM 2015]         -         42.10
                           FCN-8s [CVPR 2015]      91.58     51.28
                           Our model               92.35     54.65
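
The two metrics reported on slide 67 can be computed from flattened label maps as follows. The slide does not say how IoU is aggregated, so this sketch assumes the common convention of averaging per-class IoU over every label that appears in either map; the toy label maps are made up.

```python
def pixel_accuracy(truth, pred):
    """Fraction of pixels whose predicted label equals the ground truth."""
    correct = sum(1 for t, p in zip(truth, pred) if t == p)
    return correct / len(truth)


def mean_iou(truth, pred):
    """Mean over classes of intersection / union of truth and predicted pixels."""
    labels = set(truth) | set(pred)
    ious = []
    for c in labels:
        inter = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        union = sum(1 for t, p in zip(truth, pred) if t == c or p == c)
        ious.append(inter / union)  # union > 0 because c occurs in at least one map
    return sum(ious) / len(ious)


# Flattened toy label maps (one entry per pixel).
truth = ["skin", "skin", "dress", "dress", "bag", "bag"]
pred = ["skin", "dress", "dress", "dress", "bag", "skin"]
acc = pixel_accuracy(truth, pred)
miou = mean_iou(truth, pred)
print(round(100 * acc, 2), round(100 * miou, 2))
```

Accuracy is dominated by large regions such as background and skin, while IoU penalizes missed small items, which may explain why the IoU column is so much lower than the accuracy column.
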
  68. Refined Fashionista dataset • 685 high-quality, manually annotated pictures • Major improvement from v0.2 (CVPR 2012) • Used to learn FCN by fine-tuning from a pre-trained model [Long 2015]
  69. Pixel-based annotation using superpixels https://github.com/kyamagu/js-segment-annotator
  70. Coarse-to-fine superpixels on the fly • SLIC superpixels computed on the client side • Takes just a second in modern browsers • Efficient annotation from large to small segments
  71. Limitations • CRF tends to trim small items • sunglasses • watch/bracelet • Dress vs. top+skirt distinction is still hard • Annotation cost (figure columns: input, truth, prediction)
  72. Computer Vision meets Fashion • Machine perception to fashion • Tool to analyze semantics of fashion • Research towards real-world challenges • Street2shop, style understanding, social influence, fashion trend, creativity • Technology is maturing for business
