Image search at facebook - making sense of one of the largest image databases in the world

Fedor Borisyuk, Technical Leader in the Domain of Computer Vision at Facebook

Published in: Technology

  1. Image Search at Facebook: Making sense of one of the largest image databases in the world. Fedor Borisyuk, engineering leader at Facebook
  2. A bit about me
      • Fedor Borisyuk
      • At Facebook since April 2017
      • Lead ML teams in the domains:
          • Computer vision
  3. Agenda
      1. Photo Search product
      2. Photo Search at FB
      3. Deep dive: Large scale image classification
      4. Deep dive: Optical character recognition
      5. Q & A
  4. Section 1: Overview of Photo Search product
  5. Photo Search at Facebook
      • Social photos – posted by friends
      • Public photos – posted by people to be publicly visible
      • Over a billion images uploaded every day
  6. What people are searching for
      • Categories: Friends photos, Celebrities, Products, Memes, Recipes, Music/Movies, Places, Sport events, News
      • Photo sources: https://unsplash.com/photos/eIvu9C94UfY https://unsplash.com/photos/c9H7UzXK7uk https://unsplash.com/photos/yihlaRCCvd4 https://unsplash.com/photos/UWw9OD3pIMo https://unsplash.com/photos/4V07cUP8Sxc https://unsplash.com/photos/FBXuXp57eM0 https://unsplash.com/photos/PGnqT0rXWLs https://unsplash.com/photos/EzH46XCDQRY https://www.nps.gov/locations/alaska/news.htm
  7. What people are searching for
      • Query: running dog meme (photo: https://unsplash.com/photos/yihlaRCCvd4)
      • Query: child pink skirt (photo: https://unsplash.com/photos/DIZBFTl7c-A)
      • Query: strelitzia (photo: https://en.wikipedia.org/wiki/Strelitzia#/media/File:Strelitzia_larger.jpg)
  8. Section 2: Photo Search at FB
  9. Unicorn: Infrastructure of search
      • Reference: Mike Curtiss et al., Unicorn: A System for Searching the Social Graph, VLDB, 2013
  10. Photo Search Ranking pipeline
      • Diagram labels: Search request, Retrieval, 1st stage ranking, 2nd stage ranking, Models
      • Source: https://code.fb.com/ml-applications/under-the-hood-photo-search/
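The pipeline slide lists retrieval followed by two ranking stages. A minimal sketch of that cascade shape, assuming a cheap first-stage scorer that trims the candidate set and a costlier second-stage scorer that orders the survivors. The function names, the term-overlap retrieval, and both scorers are hypothetical illustrations, not the production system.

```python
def retrieve(index, query_terms):
    """Return ids of photos whose term set shares at least one term with the query."""
    query = set(query_terms)
    return [photo_id for photo_id, terms in index.items() if query & terms]


def cascade_rank(candidates, cheap_score, expensive_score, first_stage_k, final_k):
    """Two-stage ranking: keep the top first_stage_k by the cheap scorer,
    then order those by the expensive scorer and return the top final_k."""
    first_stage = sorted(candidates, key=cheap_score, reverse=True)[:first_stage_k]
    return sorted(first_stage, key=expensive_score, reverse=True)[:final_k]


# Hypothetical toy index: photo id -> set of terms attached to the photo.
index = {
    "p1": {"dog", "meme"},
    "p2": {"cat"},
    "p3": {"dog"},
    "p4": {"running", "dog", "meme"},
}
query_terms = ["running", "dog", "meme"]

# Cheap scorer: number of query terms present on the photo.
cheap = lambda photo_id: len(set(query_terms) & index[photo_id])
# Expensive scorer: stand-in for a learned model's output (made-up numbers).
model_scores = {"p1": 0.9, "p3": 0.99, "p4": 0.5}
expensive = lambda photo_id: model_scores[photo_id]

candidates = retrieve(index, query_terms)
ranked = cascade_rank(candidates, cheap, expensive, first_stage_k=2, final_k=2)
```

The point of the cascade is cost: the expensive scorer only runs on the few candidates that survive the first stage.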
  11. Overview of ML Technologies
      • CNNs for large scale image classification
      • Ranking
          • Neural networks
          • GBDTs
      • Features based on:
          • Image clustering
          • Image tagging
          • Image quality
          • Multimodal relationship between Query and Image
          • Optical character recognition
  12. Modeling similarity between query and image
      • Multilingual query embeddings trained using fastText (https://github.com/facebookresearch/fastText)
      • Image embeddings from a ResNeXt model
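The slide does not say how query and image embeddings are compared. A minimal sketch, assuming both embeddings live in a shared vector space and that cosine similarity is the score; the vectors and image ids below are toy values, not real fastText or ResNeXt outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Return image ids ordered from most to least similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), image_id)
              for image_id, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Toy 2-dimensional embeddings (hypothetical values).
query_vec = [1.0, 0.0]
image_vecs = {"a": [0.0, 1.0], "b": [1.0, 0.1], "c": [-1.0, 0.0]}
order = rank_images(query_vec, image_vecs)
```

A score like this could serve as one of the multimodal query/image features fed to the ranking models.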
  13. Extending Photos with textual description
      • Publication: Multi-model similarity propagation and its application for web image retrieval, Xin-Jing Wang et al.
      • Photo sources: https://unsplash.com/photos/3WhQe8sEBZU https://unsplash.com/photos/ie8giTVBVxE https://unsplash.com/photos/9FWfFy4N4R8 https://unsplash.com/photos/a90WklNaPBM https://unsplash.com/photos/9EwxGJdTJNo
  14. Section 3: Deep dive: Large scale image classification
  15. Large scale Image classification (ECCV, 2018)
      • Architecture: ResNeXt 101 with >800 million parameters
      • Training data: 3.5 billion public images and 17,000 hashtags
      • Figure labels: Supervised (ImageNet: cat, dog, …), Weakly supervised (#cat, #dog, …), Unsupervised
  16. Large scale Image classification: Noise
  17. Large scale Image classification
      • Label collisions
          • Use WordNet to merge some hashtags into a single canonical form (e.g., #brownbear and #ursusarctos are merged)
      • Skewed label distribution
          • Square root sampling
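The slide names square root sampling without giving a formula. A minimal sketch of one common reading, in which a hashtag is drawn with probability proportional to the square root of its frequency, so rare hashtags are seen more often than under raw-frequency sampling. The counts and function name are hypothetical, and the exact scheme used in the talk may differ.

```python
import math


def sqrt_sampling_probs(counts):
    """Map each label to a sampling probability proportional to sqrt(count)."""
    weights = {label: math.sqrt(count) for label, count in counts.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one label must have a positive count")
    return {label: weight / total for label, weight in weights.items()}


# Hypothetical hashtag frequencies with a heavy skew.
counts = {"#dog": 10000, "#strelitzia": 100}
probs = sqrt_sampling_probs(counts)

# Share of the rare hashtag under raw-frequency sampling, for comparison.
raw_rare_share = counts["#strelitzia"] / sum(counts.values())
```

With these toy counts the rare hashtag's share rises from about 1% under raw-frequency sampling to about 9% under square root sampling.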
  18. Section 4: Deep dive: Optical character recognition
  19. Optical Character Recognition
      • OCR is the process of converting electronic images into machine-encoded text
  20. Optical character recognition (KDD, 2018)
  21. OCR End-to-end Process
  22. Text Detection Model
      • Faster R-CNN performs detection and object recognition by:
          • Learning a CNN image representation
          • Learning a region proposal network to produce bounding boxes
          • Learning a classifier to recognize if a box contains text
          • Removing duplicate overlapping boxes
          • Learning a regression to refine box coordinates
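The step "remove duplicate overlapping boxes" is commonly done with greedy non-maximum suppression (NMS). A minimal sketch under that assumption; the box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and the sample boxes are illustrative choices, not values from the talk.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two near-duplicate detections of the same word plus one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with IoU 81/119, so only the higher-scoring one of the pair survives.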
  23. Text Recognition Model
      • CNN ResNet-18 architecture
      • Cast as sequence prediction problem:
          • Input: the image containing the text
          • Output: sequence of characters
      • Use Connectionist Temporal Classification (CTC) loss to train
  24. Text Recognition Model
      • Recognition model inference:
          • in linear time by greedily taking the most likely character at every position
          • recognize words of arbitrary length and out-of-vocabulary words
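Greedy inference for a CTC-trained model can be sketched as follows: take the most likely symbol at each position, collapse consecutive repeats, then drop the blank symbol. This is the standard greedy CTC decoding rule; the alphabet, the blank index 0, and the toy probabilities are assumptions for illustration.

```python
def ctc_greedy_decode(probs, alphabet, blank=0):
    """Decode per-position probability rows into text.

    probs: one row of class probabilities per position.
    alphabet: alphabet[i] is the character for class i; the entry at the
    blank index is a placeholder and is never emitted.
    """
    # Most likely class at every position: a single linear pass.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]

    chars = []
    previous = None
    for index in best_path:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


def one_hot_rows(path, num_classes):
    """Helper for the demo: build probability rows whose argmax follows path."""
    return [[1.0 if k == i else 0.0 for k in range(num_classes)] for i in path]


alphabet = "-cat"  # index 0 is the blank placeholder
decoded = ctc_greedy_decode(one_hot_rows([1, 1, 0, 2, 2, 0, 3], 4), alphabet)
doubled = ctc_greedy_decode(one_hot_rows([2, 0, 2], 4), alphabet)
```

Because the output is assembled character by character, nothing restricts it to a fixed vocabulary or word length; a blank between two identical classes is what allows a doubled letter.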
  25. Curriculum learning training
      • The CTC model was harder to train because it consistently diverged
      • Curriculum learning – start easy:
          • short words <= 5 characters
          • low learning rate so the model doesn’t diverge
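A minimal sketch of such a curriculum, assuming training proceeds in stages that each cap the word length and set a learning rate. Only the first stage (words of at most 5 characters, low learning rate) comes from the slide; the later stages, the learning-rate values, and the sample data are hypothetical.

```python
def curriculum_stages(samples, stages):
    """Yield (subset, learning_rate) per stage.

    samples: list of (image_path, text) pairs.
    stages: list of (max_word_length, learning_rate); a max length of None
    means no length limit.
    """
    for max_length, learning_rate in stages:
        subset = [
            (image, text)
            for image, text in samples
            if max_length is None or len(text) <= max_length
        ]
        yield subset, learning_rate


# Hypothetical labelled word crops.
samples = [
    ("a.png", "cat"),
    ("b.png", "facebook"),
    ("c.png", "recognition"),
]

# Stage 1 mirrors the slide; stages 2 and 3 are made-up continuations.
stages = [(5, 1e-4), (10, 5e-4), (None, 1e-3)]

plan = list(curriculum_stages(samples, stages))
stage_sizes = [len(subset) for subset, _ in plan]
```

A training loop would run some number of epochs on each subset with the paired learning rate before moving to the next stage.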
  26. Q & A