Unreasonable Effectiveness of OCR in Visual Advertisement Understanding

Talk by the winning team of the Automatic Understanding of Visual Advertisements Challenge
Workshop page: http://people.cs.pitt.edu/~kovashka/ads_workshop/

  1. Unreasonable Effectiveness of OCR in Visual Advertisement Understanding. Mayu Otani, Yuki Iwazaki, Kota Yamaguchi (CyberAgent, Inc.)
  2. Challenges in visual ads: symbolism, complicated compositions, illustrations
  3. Key observation: the text in ads seems to be the most powerful clue
  4. Our approach: (1) extract image-to-text and OCR-to-text relevance; (2) rank the statements given both scores. Example: for an ad whose OCR clue reads "old jeans new homes", the statement "I should shop for jeans at this store because they use their profits to build homes for people" receives both an image-to-text relevance score and an OCR-to-text relevance score.
  5. Our model: [diagram: the ad image and the OCR words "old jeans new homes" are each matched against the statement; each branch outputs a score labelled "(negative relevance)"]
  6. Per-word weights: the weight of each OCR-detected word is the vector similarity between that word and the words in the statement text (e.g. OCR words "old", "jeans", "new", "homes" vs. statement words "shop", "jeans", "people").
  7. Image embedding module
  8. Text pre-processing: long-term dependencies hurt performance, so we trim the texts. Statements follow the template "I should <action> because <reason>"; e.g. "I should shop for jeans at this store because they use their profits to build homes for people" is trimmed to the action "shop for jeans at this store" and the reason "they use their profits to build homes for people". The action and the reason are each used separately for image/OCR relevance estimation.
  9. Training: contrastive loss over a label (1 = correct text, 0 = incorrect text) and the distance between the text embedding and the image/OCR embedding. The OCR branch compares the text embedding with the OCR word embedding ("old, jeans, new, homes, ...") using per-word attention; the image branch compares the text embedding with the image embedding using per-region attention.
  10. Examples of QA results: OCR text "old jeans new homes we partnered exclusively with brad make it right foundation non profit organization that builds sustainable homes for people in need", shown with per-word weights for the action and for the reason. A. I should shop for jeans at this store because they use their profits to build homes for people
  11. Examples of QA results: OCR text "toyota leadership award the spirit of the leader part courage part determination single minded dedication to personal excellence and the ability to inspire excel in others what makes leader as leader in automotive design and technology toyota recognizes and this spirit by presenting the toyota leader ship award to an each team competing in college football games …", shown with per-word weights for the reason. A. I should buy Toyota because they support college leaders
  12. Examples of QA results: no words detected. A. I should shop at the Gap because it will make me sexy
  13. How much does OCR help? P@1 when the statement is matched against the image alone: 0.42; with the OCR words ("old jeans new homes") added: 0.82
  14. How much does appearance help? P@1 on the validation set: 0.85 with both the image and the OCR words ("old jeans new homes"); 0.83 with the OCR words alone
  15. Is OCR quality crucial? A. Yes. Validation score: Google Cloud Vision OCR 0.85, Tesseract OCR v3 0.54. Example output on a Nestle Waters ad: Tesseract v3 reads ", 1‘2. .4, f4 Water, HydraIiOn and”", while Google Cloud Vision reads "Water, Hydration and Health: A Toolkit for Registered Dietitians / Nestle Waters / The Healthy Hydration Company".
  16. Things we didn't try: OCR result correction (e.g. "B L AC K B E R R Y\nR E M E M B E R\n..." should become "BLACKBERRY REMEMBER …"); multilingual support (e.g. for a Japanese Shiseido Maquillage lipstick ad, OCR returned noisy Japanese text including "Maquillage Dramatic Rouge, all 10 colors, new release" and "These lips are so feminine, sorry."; these words were detected but discarded, although they look important)
  17. Limitations • Multilingual support • Decorative fonts • Paintings and drawings
  18. Misc findings • Higher resolution helps, perhaps because CNNs can literally read text (or because it also aids object detection?) • Annotators really read the text, and perhaps the logos
  19. Symbolism in ads: symbolism might be used for rhetoric, but not necessarily for the surface message.
  20. Remaining questions • How well do humans understand ad messages without language clues? • What are the effects of design (colors, layout, font style, etc.) on conveying the message? • Is there a better task or dataset design? • Is hiding or blurring the text enough?
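
Slide 4 ranks candidate statements using an image-to-text score and an OCR-to-text score. The following is a minimal Python sketch of that ranking step, assuming relevance is the negative embedding distance (as the "(negative relevance)" labels in slide 5 suggest) and that the two scores are simply summed with a hypothetical `ocr_weight`. `Candidate`, `relevance` and `rank_statements` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One candidate statement for an ad (hypothetical container)."""
    statement: str
    image_distance: float          # distance between statement and image embeddings
    ocr_distance: Optional[float]  # distance to the OCR-word embedding; None if no words were detected


def relevance(c: Candidate, ocr_weight: float = 1.0) -> float:
    # Relevance is taken as negative distance (an assumption); the OCR term
    # is skipped when OCR found nothing in the ad.
    score = -c.image_distance
    if c.ocr_distance is not None:
        score -= ocr_weight * c.ocr_distance
    return score


def rank_statements(candidates: List[Candidate], ocr_weight: float = 1.0) -> List[Candidate]:
    """Return candidates sorted from most to least relevant."""
    return sorted(candidates, key=lambda c: relevance(c, ocr_weight), reverse=True)


if __name__ == "__main__":
    cands = [
        Candidate("A", image_distance=0.9, ocr_distance=0.8),
        Candidate("B", image_distance=0.5, ocr_distance=0.2),
        Candidate("C", image_distance=0.3, ocr_distance=0.9),
    ]
    print([c.statement for c in rank_statements(cands)])  # prints ['B', 'C', 'A']
```

All candidates for one ad share the same OCR output, so skipping the OCR term when nothing was detected (as in the Gap example of slide 12) does not favour any single candidate.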
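
Slide 6 defines per-word weights as the vector similarity between OCR-detected words and statement words. Here is a minimal sketch, assuming cosine similarity, taking each OCR word's best match among the statement words, and softmax-normalizing those scores into attention weights. The aggregation choice and all function names are assumptions. Real word embeddings would replace the made-up toy vectors.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def per_word_weights(ocr_vecs: List[Vector], stmt_vecs: List[Vector]) -> List[float]:
    """Weight each OCR word by its best similarity to any statement word.

    The max-then-softmax aggregation is an assumption for illustration.
    """
    if not ocr_vecs or not stmt_vecs:
        return []  # e.g. no OCR words detected
    scores = [max(cosine(o, s) for s in stmt_vecs) for o in ocr_vecs]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ocr_embedding(ocr_vecs: List[Vector], stmt_vecs: List[Vector]) -> Vector:
    """Weighted sum of OCR word vectors; empty if there is nothing to weight."""
    weights = per_word_weights(ocr_vecs, stmt_vecs)
    if not weights:
        return []
    dim = len(ocr_vecs[0])
    return [sum(w * o[k] for w, o in zip(weights, ocr_vecs)) for k in range(dim)]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real word embeddings (made up for the example).
    ocr = {"old": [1.0, 0.0, 0.0], "jeans": [0.0, 1.0, 0.0], "homes": [0.0, 0.0, 1.0]}
    stmt = {"shop": [0.2, 0.8, 0.0], "jeans": [0.0, 1.0, 0.0], "people": [0.0, 0.3, 0.7]}
    w = per_word_weights(list(ocr.values()), list(stmt.values()))
    print(dict(zip(ocr, (round(x, 3) for x in w))))
```

With these toy vectors "jeans" gets the largest weight because it matches a statement word exactly, "homes" comes next through its similarity to "people", and "old" gets the smallest.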
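
The trimming in slide 8 amounts to splitting each statement on its fixed template. Below is a minimal sketch using Python's `re` module, assuming statements follow "I should <action> because <reason>" (matched case-insensitively). `split_statement` is a hypothetical helper, and falling back to the full text when the template does not match is a choice made for this sketch.

```python
import re
from typing import Tuple

# Template assumed from slide 8: "I should <action> because <reason>".
_TEMPLATE = re.compile(
    r"^\s*i should\s+(?P<action>.+?)\s+because\s+(?P<reason>.+?)\s*[.!]?\s*$",
    re.IGNORECASE,
)


def split_statement(statement: str) -> Tuple[str, str]:
    """Return (action, reason); use the whole text for both if the template does not match."""
    match = _TEMPLATE.match(statement)
    if match is None:
        text = statement.strip()
        return text, text
    return match.group("action"), match.group("reason")


if __name__ == "__main__":
    print(split_statement(
        "I should shop for jeans at this store Because they use their profits to build homes for people"
    ))
    # prints ('shop for jeans at this store', 'they use their profits to build homes for people')
```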
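
Slide 9 names a contrastive loss over a label (1 = correct text, 0 = incorrect text) and the distance between text and image/OCR embeddings, but the formula itself did not survive extraction. The sketch below assumes the standard margin-based form, L = y·d² + (1 − y)·max(0, m − d)², with y the label, d the distance, and a hypothetical margin m. The slides do not confirm this exact form or any margin value.

```python
from typing import Sequence


def contrastive_loss(distance: float, label: int, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one (statement, image/OCR) pair.

    label: 1 for a correct statement, 0 for an incorrect one.
    distance: distance between the text embedding and the image/OCR embedding.
    margin: assumed hyperparameter (not given in the slides).
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    if label == 1:
        return distance ** 2  # pull correct statements toward the ad
    return max(0.0, margin - distance) ** 2  # push incorrect ones at least `margin` away


def mean_contrastive_loss(distances: Sequence[float], labels: Sequence[int], margin: float = 1.0) -> float:
    """Average the per-pair loss over a batch."""
    if len(distances) != len(labels) or not distances:
        raise ValueError("need equally sized, non-empty inputs")
    return sum(contrastive_loss(d, y, margin) for d, y in zip(distances, labels)) / len(distances)
```

An incorrect statement that is already farther than the margin contributes zero loss, so training effort goes to the confusable negatives.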
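
Slides 13 to 15 report P@1 (precision at 1): the fraction of ads for which the top-ranked statement is a correct one. A minimal sketch of that metric follows. The input layout (one ranked list and one set of correct statements per ad) is an assumption.

```python
from typing import List, Sequence, Set


def precision_at_1(rankings: Sequence[List[str]], correct: Sequence[Set[str]]) -> float:
    """Fraction of ads whose top-ranked statement is among the correct ones."""
    if len(rankings) != len(correct):
        raise ValueError("one ranking and one set of correct statements per ad")
    if not rankings:
        return 0.0
    hits = sum(1 for ranked, gold in zip(rankings, correct) if ranked and ranked[0] in gold)
    return hits / len(rankings)


if __name__ == "__main__":
    rankings = [["a1", "a2"], ["b1", "b2"], ["c1"]]
    correct = [{"a1"}, {"b2"}, {"c1"}]
    print(precision_at_1(rankings, correct))  # 2 of 3 ads have a correct top-1 statement
```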
