Compact Descriptors for Visual Search

Published in: Technology
  • Thanks for quoting Moodstocks (http://www.moodstocks.com/) on slide 7! I would like to add a few clarifications: Moodstocks is not an application per se but an image recognition platform [1] dedicated to mobile devices [2]. Also, thanks to our client-side SDK for iOS and Android, we do perform on-device matching, i.e. we provide a full-stack local search as described by the (c) model on slide 11 and the 'descriptor extraction and matching in mobile devices' on slide 14.
    Let me know if you have more questions.
    [1] https://developers.moodstocks.com/register
    [2] http://help.moodstocks.com/customer/portal/articles/550150-1-overview

Compact Descriptors for Visual Search

  1. Compact Descriptors 4 Visual Search • Danilo Pau (danilo.pau@st.com), Senior Principal Engineer, Senior Member of Technical Staff, SMIEEE, SI/CVRP, STMicroelectronics/AST • Courtesy: M. Funamizu
  2. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers Presentation Title 15/01/2013
  3. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers
  4. Visual Search Context • Millions of images and videos continue to be uploaded to remote servers all over the world • Each day 300 million photos are uploaded to Facebook • Roughly 58 photos are uploaded each second • One hour of video is uploaded to YouTube every second
  5. Content Based Image Recognition • CBIR covers search that analyzes the actual content of the image rather than relying on metadata. • Its development incorporated many algorithms and techniques from fields such as statistics, pattern recognition and computer vision. • CBIR attracted a lot of attention and, after many years of research, has expanded into the marketplace. • CBIR applied to the mobile market is called Mobile Visual Search. • Visual Search is the capability to initiate a search using an image that captures a rigid object as the query. • The market potential of mobile visual search covers any mobile device with a camera (phones, tablets and hybrids).
  6. CBIR vs QR Codes • Quick Response (QR) codes are a type of two-dimensional barcode. • The code is scanned by the mobile imager to produce a URL for redirection and browsing. • QR codes are used by 6.2% of smartphone users in the USA.
  7. Lots of Existing Applications • Google’s Goggles • Nokia’s Point and Find • oMoby • Like.com • Kooaba • Moodstocks • Snaptell • pixlinQ • Bing
  8. Existing Apps use JPEG • Previous applications use the mobile imager to send JPEG-compressed queries [Diagram: Mobile device → Send JPEG images → Remote server with Database → Visual search result]
  9. An Example of Visual Search [Figure panels: Query, Interest Point Description, Descriptor pairing, Inliers] Courtesy Telecom Italia
  10. The Rise of Compressed Descriptors • Alternatively, send “compact features” extracted from raw images • For example, Scale Invariant Feature Transform (SIFT) visual descriptors • Consider 1200 descriptors per frame, each one 128 bytes plus 4 bytes for coordinates, at 30 fps: the network load is nearly 38 Mbit/s, which is unacceptable [Chart: size in KB of a VGA image as JPEG High, JPEG Low and SIFT]
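The 38 Mbit/s figure on slide 10 can be checked with a few lines of arithmetic. The sketch below only uses the numbers quoted on the slide; the function and constant names are illustrative, and it assumes every frame carries the full 1200 descriptors with no protocol overhead.

```python
# Back-of-the-envelope network load for uncompressed SIFT descriptors,
# using the figures quoted on slide 10 (names are illustrative).
DESCRIPTORS_PER_FRAME = 1200
DESCRIPTOR_BYTES = 128   # one SIFT descriptor
COORDINATE_BYTES = 4     # keypoint coordinates
FRAMES_PER_SECOND = 30


def sift_bitrate_mbit_per_s(
    descriptors=DESCRIPTORS_PER_FRAME,
    descriptor_bytes=DESCRIPTOR_BYTES,
    coordinate_bytes=COORDINATE_BYTES,
    fps=FRAMES_PER_SECOND,
):
    """Return the raw descriptor bitrate in Mbit/s (1 Mbit = 10**6 bits)."""
    bytes_per_frame = descriptors * (descriptor_bytes + coordinate_bytes)
    bits_per_second = bytes_per_frame * 8 * fps
    return bits_per_second / 1e6


if __name__ == "__main__":
    print(f"{sift_bitrate_mbit_per_s():.3f} Mbit/s")
```

Each frame costs 1200 × 132 = 158,400 bytes, so 30 frames per second come to about 38 Mbit/s, which matches the slide.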
  11. Systems Considered • Instead of sending images (a), • an application can send compact descriptors (b) • and even perform the search locally (c).
  12. Previous Attempts • Hashing: Locality Sensitive Hashing [Yeo et al., 2008]; Similarity Sensitive Coding [Torralba et al., 2008]; Spectral Hashing [Weiss et al., 2008] • Transform Coding: Karhunen-Loève Transform [Chandrasekhar et al., 2009]; ICA based Transform [Narozny et al., 2008] • Vector Quantization: Product Quantization [Jegou et al., 2010]; Tree Structured Vector Quantization [Nistér et al., 2006] • Alternative to SIFT: Compressed Histogram of Gradients [Chandrasekhar et al., 2011]
  13. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers
  14. Is a standard on Visual Search needed? • Reduce load on wireless networks carrying visual search-related information • Ensure interoperability of visual search applications and databases • Enable hardware support for descriptor extraction and matching in mobile devices • Enable high level of performance of implementations conformant to the standard • Simplify design of descriptor extraction and matching for visual search applications
  15. What is a suitable standardization body? • Informal title: Moving Picture Experts Group (MPEG) • Formal title: ISO/IEC JTC1 SC29 WG11 (Coding of Moving Pictures and Audio) • Parent SDOs: ISO: International Organization for Standardization; IEC: International Electrotechnical Commission; JTC 1: Joint Technical Committee One; SC29: Subcommittee 29: Coding of Audio, Picture, Multimedia and Hypermedia Information • Members: National Bodies (25 voting, 16 observers) [Diagram: JTC 1 → SC29 → WG11 (MPEG)]
  16. 16
  17. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers
  18. CDVS: Scope • Descriptor extraction process needed to ensure interoperability. • Bitstream of compact descriptors. [Diagram: Query Image → Descriptor extraction → Descriptor bitstream → Descriptor matching (against Database) → Geometric verification → List of results; a box labelled "Standard" marks the standardized part]
  19. Requirements • Robustness: High matching accuracy shall be achieved at least for images of textured rigid objects, landmarks, and printed documents. The matching accuracy shall be robust to changes in vantage points, camera parameters, lighting conditions, as well as in the presence of partial occlusions. • Sufficiency: Descriptors shall be self-contained, in the sense that no other data are necessary for matching. • Compactness: Shall minimize lengths/size of image descriptors. • Scalability: Shall allow adaptation of descriptor lengths to support the required performance level and database size. Shall enable design of web-scale visual search applications and databases.
  20. How to achieve robustness • Image content is transformed into visual features with coordinates that are invariant to illumination, scale, rotation, affine and perspective transforms
  21. Types of invariance • Illumination
  22. Types of invariance • Illumination • Scale
  23. Types of invariance • Illumination • Scale • Rotation
  24. Types of invariance • Illumination • Scale • Rotation • Affine Transform
  25. Types of invariance • Illumination • Scale • Rotation • Affine Transform • Full Perspective
  26. Compactness [Chart: size in KB of a VGA image as JPEG High, JPEG Low and SIFT, compared with compact descriptors of 512 B, 1 KB, 2 KB, 4 KB, 8 KB and 16 KB]
  27. Extraction Pipeline [Diagram blocks: Image, Resizing, Extraction (DoG, SIFT local description), Keypoint selection, Mode selection, Encoding (H mode: Transform & SQ, Arithmetic coding; S mode: MSVQ encoding; Coordinate coding; SCFV descriptor), Compact descriptors] • H-Mode uses SQ encoding (256B) • S-Mode uses MSVQ encoding (38KB) • Both modes use SCFV (49KB)
  28. Properties of SIFT • David Lowe’s local descriptor detection and extraction (1999-2004) • Extraordinarily robust matching technique • Can handle changes in viewpoint: up to about 30 degrees of out-of-plane rotation • Can handle significant changes in illumination: sometimes even day vs. night (pictured on the slide) • Lots of code available: http://www.vlfeat.org (BSD license)
  29. Pyramid of DoG [Diagram: Octave 1 to Octave n, each spanning Scale 1 to Scale m with the corresponding DoGs]
  30. Actual Interest Point Detector Output
  31. Building a Descriptor • Take a 16x16 patch window around the detected interest point • Subdivide the patch into 4x4 sub-patches • For each sub-patch, create an 8-bin histogram over edge orientations (angles from 0 to 2π), weighted by gradient magnitude • This yields a 4x4x8 = 128 element vector: the SIFT descriptor
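The steps on slide 31 can be sketched in plain Python. This is a simplified illustration, not Lowe's full method: it assumes the 16x16 patch is already extracted and aligned, and it leaves out Gaussian weighting, interpolation between bins, and the clipping step of real SIFT. The function name `sift_like_descriptor` is hypothetical.

```python
import math


def sift_like_descriptor(patch):
    """Build a 128-element SIFT-like vector from a 16x16 grey-level patch.

    Simplified sketch: no Gaussian weighting, no bin interpolation,
    no rotation to a dominant orientation.
    """
    assert len(patch) == 16 and all(len(row) == 16 for row in patch)
    hist = [0.0] * 128  # 4x4 sub-patches x 8 orientation bins
    for y in range(16):
        for x in range(16):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, 15)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, 15)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # in [0, 2*pi)
            orientation_bin = int(angle / (2 * math.pi) * 8) % 8
            cell = (y // 4) * 4 + (x // 4)  # which 4x4-pixel sub-patch
            hist[cell * 8 + orientation_bin] += magnitude
    # L2-normalise so the vector is less sensitive to contrast changes.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


if __name__ == "__main__":
    ramp = [[x for x in range(16)] for _ in range(16)]  # horizontal gradient
    descriptor = sift_like_descriptor(ramp)
    print(len(descriptor), sum(1 for v in descriptor if v > 0))
```

For a purely horizontal intensity ramp, all gradient energy falls into the first orientation bin of each of the 16 sub-patches.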
  32. Key point selection • Basic idea: inlier features behave differently, in a statistical sense, from outlier features. • A relevance value is computed for each key point by taking into account distance from center, scale, orientation, peak, mean and variance of the SIFT descriptor.
  33. Local Descriptor Compression H mode • Main idea is to generate a compressed descriptor from uncompressed SIFT by: simple linear combinations of histograms; scalar quantisation of resultant values; adaptive arithmetic coding • Main benefits: very low computational complexity; negligible memory requirements; highly scalable; allows for very efficient matching and retrieval
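A rough sketch of the first two H mode steps named on slide 33 follows. The particular linear combinations (differences of neighbouring orientation bins), the three-level quantiser and the threshold are assumptions made for illustration; the slide does not give the actual values. The adaptive arithmetic coder is replaced by a static entropy estimate, so the bit count is only indicative.

```python
import math
from collections import Counter


def h_mode_sketch(descriptor, threshold=0.05):
    """Transform and scalar-quantise a 128-element descriptor.

    Assumed for illustration: the linear combinations are differences of
    neighbouring orientation bins, quantised to the symbols -1, 0, +1.
    """
    assert len(descriptor) == 128
    symbols = []
    for cell in range(16):
        h = descriptor[cell * 8:(cell + 1) * 8]
        for i in range(8):
            value = h[i] - h[(i + 1) % 8]      # simple linear combination
            if value > threshold:              # scalar quantisation
                symbols.append(1)
            elif value < -threshold:
                symbols.append(-1)
            else:
                symbols.append(0)
    return symbols


def estimated_bits(symbols):
    """Static zeroth-order entropy estimate of the coded size in bits.

    Stands in for the adaptive arithmetic coder, which is not implemented.
    """
    n = len(symbols)
    return sum(c * math.log2(n / c) for c in Counter(symbols).values())


if __name__ == "__main__":
    example = [0.0] * 128
    example[0] = 1.0
    coded = h_mode_sketch(example)
    print(coded[:8], round(estimated_bits(coded), 1))
```

The sketch only shows why the scheme is cheap: each output symbol needs a subtraction and two comparisons, and no codebook has to be stored.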
  34. Vector Quantizer Scheme: S-Mode
  35. Location Encoding • Histogram Map: The positions of the nonzero bins are encoded as binary words by scanning the columns, and the words are compressed by arithmetic coding. • Histogram Count: The number of coordinates in the nonzero bins is encoded iteratively, by specifying first which bins contain more than 1 key point, then which among these contain more than 2 key points, and so forth.
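The two parts of the location code on slide 35 can be sketched as follows. The block size, the function names and the exact layout of the binary words are assumptions for illustration, and the final arithmetic coding step is left out.

```python
def location_histogram(keypoints, width, height, block=8):
    """Count key points per spatial bin (block x block pixels, assumed size).

    Assumes 0 <= x < width and 0 <= y < height for every key point.
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    counts = [[0] * cols for _ in range(rows)]
    for x, y in keypoints:
        counts[int(y) // block][int(x) // block] += 1
    return counts


def histogram_map_words(counts):
    """Histogram Map: one binary word per column, '1' marks a nonzero bin."""
    rows, cols = len(counts), len(counts[0])
    return [
        "".join("1" if counts[r][c] else "0" for r in range(rows))
        for c in range(cols)
    ]


def histogram_count_layers(counts):
    """Histogram Count: layer k flags which surviving bins hold more than k points."""
    rows, cols = len(counts), len(counts[0])
    # Visit nonzero bins in the same column-by-column order as the map.
    active = [(r, c) for c in range(cols) for r in range(rows) if counts[r][c] > 0]
    layers, k = [], 1
    while active:
        flags = [1 if counts[r][c] > k else 0 for r, c in active]
        layers.append(flags)
        active = [bin_ for bin_, flag in zip(active, flags) if flag]
        k += 1
    return layers


if __name__ == "__main__":
    points = [(1, 1), (2, 2), (3, 3), (9, 1), (12, 12), (13, 13)]
    hist = location_histogram(points, width=16, height=16)
    print(histogram_map_words(hist))
    print(histogram_count_layers(hist))
```

Each layer only addresses the bins that survived the previous one, so sparse images with mostly single-point bins produce very short count layers.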
  36. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers
  37. Extraction times • SIFT interest point detection and feature extraction make the biggest contribution • Global descriptors are as complex as interest point detection • Local descriptor and coordinate encoding are very fast • Quantitative evaluation of CDVS extraction and pairwise matching
  38. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers
  39. Mobile Visual Search: Music CDs [Diagram: Query → Stream Music…]
  40. Visual Search: eReaders, Printers [Diagram labels: Snapshot • Paper-copy • Initiate Visual Search • Send Compact Query • Mass Storage • 3D models and markers • Transmission of markers and 3D models • Multimedia Content Retrieval from the cloud • Augmentation • Augmentation Rendering • 2D / 3D Rendering • Composition of augmentations and image • Content Augmentation • Selective quality & content printing]
  41. News Finder • Still Pictures - Visual Search
  42. Application and Use Cases from Broadcaster point of view • Logo Detection • Interactive Fruition • Courtesy RAI
  43. Automotive 3D Top View [Diagram: four cameras (Cam) connected to an ECU]
  44. Automotive 3D Top View
  45. Moving Pictures Visual Search • Courtesy Telecom Design
  46. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers
  47. Inter Predicted Descriptors • Desirable properties: • An inter descriptor is coded in a compact visual stream • It is expressed in terms of one or more temporally neighboring descriptors • The "inter" part of the term refers to the use of Inter Frame Prediction • Designed to achieve higher compression rates and/or better precision-recall performance
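One way to picture a descriptor "expressed in terms of temporally neighboring descriptors" is residual coding against the descriptor of the same key point in the previous frame. The sketch below is an assumption made for illustration, not the scheme proposed in the talk: descriptors are taken to be integer vectors, one reference is used, and the mode decision uses a crude sum-of-absolute-values cost. All names are hypothetical.

```python
def encode_descriptor(current, reference=None):
    """Return ("inter", residual) or ("intra", values) for one descriptor.

    Assumed for illustration: integer descriptors, a single reference from
    the previous frame, and a sum-of-absolute-values cost as a size proxy.
    """
    if reference is None:
        return "intra", list(current)
    residual = [c - r for c, r in zip(current, reference)]
    cost_intra = sum(abs(v) for v in current)
    cost_inter = sum(abs(v) for v in residual)
    if cost_inter < cost_intra:
        return "inter", residual
    return "intra", list(current)


def decode_descriptor(mode, payload, reference=None):
    """Invert encode_descriptor; inter mode needs the same reference."""
    if mode == "inter":
        return [r + d for r, d in zip(reference, payload)]
    return list(payload)


if __name__ == "__main__":
    previous = [10, 0, 3, 7]
    current = [11, 0, 3, 6]
    mode, payload = encode_descriptor(current, previous)
    print(mode, payload, decode_descriptor(mode, payload, previous))
```

When consecutive frames show the same object, the residual is mostly zeros and small values, which is where the higher compression rate mentioned on the slide would come from.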
  48. 3D Mobile Devices Will Surpass 148 Million in 2015 • Advances in 3D technology are very fast • Industry adoption opens new opportunities for 3D Visual Search • From In-Stat studies: ~30% of all handheld game consoles will be 3D by 2015; 3D mobile devices will increase demand for image sensors by 130%; in 2012, the notebook will be the first 3D-enabled mobile device to reach 1 million units; by 2014, 18% of all tablets will be 3D • Nintendo, Fuji, GoPro, Sony, ViewSonic, LG, Origin, Toshiba, Fujitsu, HP, ASUS, Lenovo, Dell, Alienware, HTC and Sharp are focusing on autostereoscopic mobile technologies
  49. Microsoft Kinect • Asus Xtion • LG Optimus 3D P920 • LG Optimus Pad • 3DS by Nintendo • Google 3D Warehouse • HTC EVO 3D • Sharp Aquos SH-12C
  50. 3D Object Recognition with Kinect • SHOT: Unique Signatures of Histograms for Local Surface Description • http://www.youtube.com/watch?v=eRW1zG_aONk • Courtesy: CV laboratory, University of Bologna
  51. Agenda • Visual Search: Context • MPEG initiative on Visual Search • Compact Descriptors for Visual Search • Implementation • Use Cases • Visual Search Evolution: Moving Pictures and 3D • Questions and Answers