This document discusses compact descriptors for visual search. It begins by providing context on visual search and content-based image recognition. It then discusses MPEG's initiative to standardize compact descriptors to enable interoperable visual search applications. The presentation describes the requirements and techniques for developing compact yet robust descriptors, including achieving various types of invariance. It presents different methods for compacting descriptors and their properties. Finally, it discusses potential use cases and the evolution of visual search to include video, 3D objects, and augmented reality.
1. Compact Descriptors 4 Visual Search
Danilo Pau (danilo.pau@st.com)
Senior Principal Engineer
Senior Member of Technical Staff
SMIEEE
SI/CVRP
STMicroelectronics/AST
Courtesy: M. Funamizu
2. Agenda
• Visual Search: Context
• MPEG initiative on Visual Search
• Compact Descriptors for Visual Search
• Implementation
• Use Cases
• Visual Search Evolution: Moving Pictures and 3D
• Questions and Answers
Presentation Title 15/01/2013
4. Visual Search Context
• Millions of images and videos continue to be uploaded to remote servers all over the world
• Each day, 300 million photos are uploaded to Facebook
• roughly 3,500 photos uploaded each second
• One hour of video is uploaded to YouTube every second
5. Content-Based Image Retrieval
• CBIR refers to search that analyzes the actual content of an image rather than relying on metadata.
• The field draws on algorithms and techniques from statistics, pattern recognition and computer vision.
• CBIR attracted a lot of attention and, after many years of research, has moved into the marketplace.
• CBIR applied to the mobile market is called Mobile Visual Search.
• Visual Search is the ability to initiate a search using, as the query, an image that captures a rigid object.
• The potential market for mobile visual search includes any camera-equipped mobile device (phones, tablets and hybrids).
6. CBIR vs QR Codes
• Quick Response (QR) codes are a type of two-dimensional barcode.
• The code is scanned by the mobile device's camera to produce a URL for redirection and browsing.
• QR codes are used by 6.2% of smartphone users in the USA.
7. Lots of Existing Applications
• Google Goggles
• Nokia Point & Find
• oMoby
• Like.com
• Kooaba
• Moodstocks
• SnapTell
• pixlinQ
• Bing
8. Existing Apps Use JPEG
• Existing applications use the mobile device's camera and send JPEG-compressed images as queries
[Figure: the mobile device sends JPEG images to a remote server, which searches a database and returns the visual search result]
9. An Example of Visual Search
[Figure: query image showing interest point description, descriptor pairing and inliers]
Courtesy Telecom Italia
10. The Rise of Compressed Descriptors
• Alternatively, send “compact features” extracted from the raw images
• For example, Scale-Invariant Feature Transform (SIFT) visual descriptors
• Consider 1200 descriptors, each of 128 bytes plus 4 bytes for coordinates: at 30 fps the network load is nearly 38 Mbit/s, which is unacceptable
[Chart: query size in KB (0 to 160) for a VGA image encoded as JPEG High, JPEG Low and SIFT]
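The 38 Mbit/s figure can be checked with a few lines of Python. The function name is hypothetical, the defaults are the numbers quoted on the slide, and 1 Mbit is taken as 10^6 bits.

```python
def sift_stream_mbps(num_descriptors=1200, descriptor_bytes=128,
                     coord_bytes=4, fps=30):
    """Uncompressed network load, in Mbit/s, of streaming SIFT descriptors.

    Defaults are the figures quoted on the slide; 1 Mbit = 10**6 bits.
    """
    bytes_per_frame = num_descriptors * (descriptor_bytes + coord_bytes)  # 158,400 B
    bits_per_second = bytes_per_frame * fps * 8
    return bits_per_second / 1e6


print(f"{sift_stream_mbps():.1f} Mbit/s")  # prints "38.0 Mbit/s"
```

Each frame costs about 158 KB of raw descriptors, already comparable to the JPEG sizes in the chart, which is why the descriptors themselves must be compressed.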
11. Systems Considered
• Instead of sending images (a),
• the application can send compact descriptors (b)
• and can even perform the search locally (c).
12. Previous Attempts
• Hashing
• Locality Sensitive Hashing [Yeo et al., 2008]
• Similarity Sensitive Coding [Torralba et al., 2008]
• Spectral Hashing [Weiss et al., 2008]
• Transform Coding
• Karhunen-Loève Transform [Chandrasekhar et al., 2009]
• ICA-based Transform [Narozny et al., 2008]
• Vector Quantization
• Product Quantization [Jégou et al., 2010]
• Tree-Structured Vector Quantization [Nistér et al., 2006]
• Alternative to SIFT
• Compressed Histogram of Gradients [Chandrasekhar et al., 2011]
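As an illustration of the hashing family, here is a minimal sketch of random-hyperplane (sign) LSH, which maps a 128-D SIFT descriptor to a short binary code compared by Hamming distance. It shows the general technique only, not the exact scheme of any paper cited above; the 64-bit code length and the function names are assumptions.

```python
import numpy as np


def make_projections(dim=128, num_bits=64, seed=0):
    """Random Gaussian hyperplanes, one per output bit."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_bits, dim))


def lsh_code(descriptor, projections):
    """One bit per hyperplane: which side of it the descriptor falls on.

    SIFT values are non-negative, so descriptors may be mean-centred
    before hashing in practice; that step is omitted here.
    """
    return (projections @ np.asarray(descriptor, float) > 0).astype(np.uint8)


def hamming(code_a, code_b):
    """Number of differing bits; similar descriptors tend to differ in few bits."""
    return int(np.count_nonzero(code_a != code_b))


P = make_projections()
x = np.random.default_rng(1).random(128)
code = lsh_code(x, P)
print(np.packbits(code).nbytes, "bytes instead of 128")  # 8 bytes instead of 128
```

The code length trades compactness against how faithfully Hamming distance tracks the original Euclidean distance.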
14. Is a Standard for Visual Search Needed?
• Reduce the load on wireless networks carrying visual search-related information.
• Ensure interoperability of visual search applications and databases.
• Enable hardware support for descriptor extraction and matching in mobile devices.
• Enable a high level of performance for implementations conforming to the standard.
• Simplify the design of descriptor extraction and matching for visual search applications.
15. What Is a Suitable Standardization Body?
• Informal title: Moving Picture Experts Group (MPEG)
• Formal title: ISO/IEC JTC1 SC29 WG11 (Coding of Moving Pictures and Audio)
• Parent SDOs:
• ISO: International Organization for Standardization
• IEC: International Electrotechnical Commission
• JTC 1: Joint Technical Committee One
• SC29: Subcommittee 29: Coding of Audio, Picture, Multimedia and Hypermedia Information
• Members: National Bodies (25 voting, 16 observers)
[Figure: hierarchy ISO/IEC → JTC 1 → SC29 → WG11 (MPEG)]
18. CDVS: Scope
• The descriptor extraction process, which must be specified to ensure interoperability.
• The bitstream of compact descriptors.
[Figure: query image → descriptor extraction → descriptor bitstream → descriptor matching against the database → geometric verification → list of results; the standard covers descriptor extraction and the descriptor bitstream]
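The standard fixes extraction and the bitstream; matching is left outside its scope. As a hedged sketch of the descriptor matching stage, the code below uses Lowe's nearest-neighbour ratio test, a common choice for SIFT-like descriptors but not necessarily what a CDVS implementation uses. The 0.8 ratio and the function name are assumptions, and geometric verification is not shown.

```python
import numpy as np


def ratio_test_matches(query, reference, ratio=0.8):
    """Pair each query descriptor with its nearest reference descriptor,
    keeping the pair only if it is clearly closer than the second nearest.

    query: (n, d) array; reference: (m, d) array with m >= 2.
    Returns a list of (query_index, reference_index) pairs.
    """
    query = np.asarray(query, float)
    reference = np.asarray(reference, float)
    # Squared Euclidean distances via |q|^2 + |r|^2 - 2 q.r, shape (n, m)
    d2 = (np.sum(query ** 2, axis=1)[:, None]
          + np.sum(reference ** 2, axis=1)[None, :]
          - 2.0 * query @ reference.T)
    dist = np.sqrt(np.maximum(d2, 0.0))  # clamp round-off negatives
    order = np.argsort(dist, axis=1)
    matches = []
    for i in range(len(query)):
        best, second = order[i, 0], order[i, 1]
        if dist[i, best] < ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches
```

The surviving pairs are the candidates that geometric verification then splits into inliers and outliers.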
19. Requirements
Robustness
High matching accuracy shall be achieved at least for images of textured
rigid objects, landmarks, and printed documents.
The matching accuracy shall be robust to changes in vantage points,
camera parameters, lighting conditions, as well as in the presence of partial
occlusions.
Sufficiency
Descriptors shall be self-contained, in the sense that no other data are
necessary for matching.
Compactness
Shall minimize the length (size) of image descriptors.
Scalability
Shall allow adaptation of descriptor lengths to support the required
performance level and database size.
Shall enable the design of web-scale visual search applications and databases.
20. How to Achieve Robustness
• Image content is transformed into visual features (with their coordinates) that are invariant to illumination, scale, rotation, affine and perspective transforms
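27. Extraction Pipeline
[Figure: CDVS extraction pipeline. Extraction stages: image resizing, DoG, keypoint selection and SIFT local description. Encoding: local descriptors in H Mode (transform & SQ, then arithmetic coding) or S Mode (MSVQ encoding), plus coordinate coding and the SCFV descriptor, producing the compact descriptors.]
• H-Mode uses SQ (scalar quantization) encoding (256B)
• S-Mode uses MSVQ (multi-stage vector quantization) encoding (38KB)
• Both modes use SCFV (Scalable Compressed Fisher Vector) (49KB)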
28. Properties of SIFT
David Lowe’s local feature detection and description method (1999-2004)
Extraordinarily robust matching technique
• Can handle changes in viewpoint
• Up to about 30 degrees of out-of-plane rotation
• Can handle significant changes in illumination
• Sometimes even day vs. night
• Lots of code available, e.g. http://www.vlfeat.org (BSD license)
29. Pyramid of DoG
[Figure: Difference-of-Gaussians (DoG) pyramid with octaves 1 to n, each holding scales 1 to m and the DoGs computed between adjacent scales]
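A simplified way to build the DoG pyramid with NumPy is sketched below. It follows Lowe's layout of s + 3 Gaussian images (hence s + 2 DoGs) per octave with σ0 = 1.6, but the zero-padded separable blur, the function names and the default parameters are assumptions for illustration; extremum detection across space and scale is not shown.

```python
import numpy as np


def gaussian_kernel(sigma):
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def _conv_same(signal, kernel):
    # Centred 'same'-size convolution that also works when the kernel is
    # longer than the signal (zero padding at the borders).
    r = (len(kernel) - 1) // 2
    return np.convolve(signal, kernel, mode="full")[r:r + len(signal)]


def blur(img, sigma):
    """Separable Gaussian blur: filter rows, then columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(_conv_same, 1, img, k)
    return np.apply_along_axis(_conv_same, 0, rows, k)


def dog_pyramid(img, num_octaves=3, scales_per_octave=3, sigma0=1.6):
    """List of octaves, each a list of scales_per_octave + 2 DoG images."""
    k = 2.0 ** (1.0 / scales_per_octave)
    base = blur(np.asarray(img, float), sigma0)  # assumes an unblurred input
    pyramid = []
    for _ in range(num_octaves):
        gaussians = [base]
        for i in range(1, scales_per_octave + 3):
            # Extra blur that takes the base (at sigma0) to sigma0 * k**i
            extra = np.sqrt((sigma0 * k ** i) ** 2 - sigma0 ** 2)
            gaussians.append(blur(base, extra))
        pyramid.append([b - a for a, b in zip(gaussians, gaussians[1:])])
        # The image at 2 * sigma0, downsampled by 2, is again at sigma0
        base = gaussians[scales_per_octave][::2, ::2]
    return pyramid
```

Each octave halves the resolution, so a 32x32 input yields DoG images of 32x32, 16x16 and 8x8 for three octaves.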
31. Building a Descriptor
• Take a 16x16 window around the detected interest point
• Subdivide the window into a 4x4 grid of sub-patches
• For each sub-patch, create an 8-bin histogram of edge orientations weighted by gradient magnitude
[Figure: angle histogram over 0 to 2π]
• This yields a 4x4x8 = 128-element vector: the SIFT descriptor
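The steps above can be sketched as follows for a single 16x16 patch. This is a simplified, SIFT-like descriptor: it omits the Gaussian weighting, trilinear interpolation and rotation to the keypoint's dominant orientation that full SIFT uses, keeping only Lowe's normalize, clip at 0.2, renormalize step. The function name is hypothetical.

```python
import numpy as np


def sift_like_descriptor(patch):
    """Simplified SIFT-style 128-D descriptor of a 16x16 grayscale patch."""
    patch = np.asarray(patch, float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)                      # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)    # orientation in [0, 2*pi)
    bins = np.minimum((angle / (2 * np.pi) * 8).astype(int), 7)

    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Each 4x4-pixel sub-patch accumulates its own 8-bin histogram
            desc[r // 4, c // 4, bins[r, c]] += magnitude[r, c]

    desc = desc.ravel()                              # 4 * 4 * 8 = 128 values
    norm = np.linalg.norm(desc)
    if norm > 0:
        desc = np.minimum(desc / norm, 0.2)          # limit large gradients
        desc = desc / np.linalg.norm(desc)
    return desc
```

For a horizontal intensity ramp every gradient points along 0 rad, so only bin 0 of each of the 16 sub-patch histograms is non-zero.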
32. Keypoint Selection
• Basic idea: in a statistical sense, inlier features do not behave like outlier features.
• Each keypoint receives a relevance value that takes into account its distance from the image center, scale, orientation and peak, and the mean and variance of its SIFT descriptor.
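One plausible way to turn these characteristics into a relevance value is sketched below, assuming each characteristic is mapped to a score through a per-bin lookup table learned offline and the scores are multiplied. The table contents, bin edges and function names are hypothetical; the slide does not give the actual CDVS tables.

```python
import numpy as np


def bin_lookup(values, edges, table):
    """Map each value to the score of the histogram bin it falls in."""
    idx = np.clip(np.digitize(values, edges) - 1, 0, len(table) - 1)
    return np.asarray(table, float)[idx]


def select_keypoints(features, score_tables, max_keypoints):
    """Indices of the max_keypoints most relevant keypoints.

    features: dict name -> per-keypoint values (e.g. distance, scale, peak)
    score_tables: dict name -> (bin edges, score per bin), hypothetical tables
    """
    n = len(next(iter(features.values())))
    relevance = np.ones(n)
    for name, values in features.items():
        edges, table = score_tables[name]
        relevance *= bin_lookup(np.asarray(values, float), edges, table)
    # Highest relevance first; the stable sort keeps input order on ties
    order = np.argsort(-relevance, kind="stable")
    return order[:max_keypoints]


features = {"distance": [0.1, 0.9, 0.5], "scale": [3.0, 1.0, 2.0]}
tables = {
    "distance": ([0.0, 0.33, 0.66, 1.0], [0.9, 0.6, 0.2]),
    "scale": ([0.0, 1.5, 2.5, 10.0], [0.3, 0.6, 0.9]),
}
print(select_keypoints(features, tables, 2))  # [0 2]
```

Keeping only the top-ranked keypoints bounds the descriptor size, which is how selection supports the compactness requirement.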
33. Local Descriptor Compression: H Mode
• The main idea is to generate a compressed descriptor from the uncompressed SIFT descriptor by
• Simple linear combinations of histograms
• Scalar quantization of the resulting values
• Adaptive arithmetic coding
• Main benefits
• Very low computational complexity
• Negligible memory requirements
• Highly scalable
• Allows very efficient matching and retrieval
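The first two H-mode steps can be sketched as below. The slide does not specify the linear combinations or the quantization thresholds, so the cyclic bin differences and the ternary quantizer with a fixed threshold are hypothetical placeholders. Instead of implementing adaptive arithmetic coding, the sketch reports the empirical entropy of the symbols, the ideal size under their empirical distribution that a good entropy coder approximately reaches.

```python
import numpy as np


def transform_histograms(desc):
    """Hypothetical linear combinations: cyclic differences of adjacent bins
    within each 8-bin orientation histogram of a 128-D SIFT descriptor."""
    h = np.asarray(desc, float).reshape(16, 8)
    return (h - np.roll(h, -1, axis=1)).ravel()    # h[k] - h[(k + 1) % 8]


def ternary_quantize(values, threshold):
    """Scalar quantization of each value to {-1, 0, +1}."""
    q = np.zeros(values.shape, dtype=np.int8)
    q[values > threshold] = 1
    q[values < -threshold] = -1
    return q


def empirical_entropy_bits(symbols):
    """Total bits needed under the symbols' empirical distribution."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(counts * np.log2(p)).sum())
```

Because most quantized values are zero for typical descriptors, the entropy coder spends far fewer than log2(3) bits per element.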
35. Location Encoding
• Histogram Map: the positions of the nonzero bins are encoded as binary words by scanning the columns, and the words are compressed with arithmetic coding.
• Histogram Count: the number of keypoints in the nonzero bins is encoded iteratively, first specifying which bins contain more than 1 keypoint, then which of these contain more than 2 keypoints, and so forth.
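A minimal sketch of both steps, assuming the keypoint coordinates are first quantized onto a regular grid (the 8-pixel cell size is an assumption) and leaving out the arithmetic coding stage. The function names are hypothetical.

```python
import numpy as np


def location_histogram(xy, width, height, cell=8):
    """Count keypoints per cell of a regular grid (cell size hypothetical)."""
    rows, cols = -(-height // cell), -(-width // cell)  # ceiling division
    hist = np.zeros((rows, cols), dtype=int)
    for x, y in xy:
        hist[min(int(y) // cell, rows - 1), min(int(x) // cell, cols - 1)] += 1
    return hist


def histogram_map(hist):
    """Binary word marking nonzero bins, scanning column by column."""
    return [int(v > 0) for v in hist.flatten(order="F")]


def histogram_count_flags(hist):
    """Iterative count coding of the nonzero bins.

    Level 1 flags which nonzero bins hold more than 1 keypoint, level 2 which
    of those hold more than 2, and so on until no bins remain.
    """
    active = [int(v) for v in hist.flatten(order="F") if v > 0]
    levels, k = [], 1
    while active:
        levels.append([int(c > k) for c in active])
        active = [c for c in active if c > k]
        k += 1
    return levels


pts = [(1, 1), (2, 3), (10, 1), (9, 12), (12, 13), (15, 15)]
h = location_histogram(pts, width=16, height=16)
print(histogram_map(h))           # [1, 0, 1, 1]
print(histogram_count_flags(h))   # [[1, 0, 1], [0, 1], [0]]
```

A decoder can rebuild every count from these flags, since a bin's count is one plus the number of levels at which its flag is 1.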
37. Extraction Times
• SIFT interest point detection and feature extraction make the biggest contribution
• Global descriptor extraction is about as complex as interest point detection
• Local descriptor and coordinate encoding are very fast
Quantitative evaluation of CDVS extraction and pairwise matching 15/01/2013
40. Visual Search: eReaders, Printers
[Figure: a snapshot of a paper copy initiates a visual search; the compact query is sent and multimedia content is retrieved from the cloud (mass storage of 3D models and markers); markers and 3D models are transmitted back for 2D/3D augmentation rendering and composition of augmentations and image, enabling content augmentation and selective quality & content printing]
41. News Finder
Still Pictures - Visual Search
42. Applications and Use Cases from the Broadcaster Point of View
• Logo Detection
• Interactive Fruition
Courtesy RAI
47. Inter Predicted Descriptors
Desirable Properties:
• An inter descriptor is coded in a compact visual stream
• It is expressed in terms of one or more temporally neighboring descriptors
• The "inter" part of the term refers to the use of inter-frame prediction
• Designed to achieve higher compression rates and/or better precision-recall performance
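A minimal sketch of inter prediction for one keypoint's descriptor tracked across frames, assuming the previous frame's reconstructed descriptor is the predictor and residuals are uniformly quantized with a hypothetical step of 0.05; the slide does not specify the actual prediction or quantization scheme. Predicting from the reconstruction rather than the original keeps encoder and decoder in sync.

```python
import numpy as np


def encode_inter(descriptors, step=0.05):
    """Code the first descriptor directly ('intra') and each later one as the
    quantized residual from the previous reconstruction ('inter')."""
    symbols, recon = [], None
    for d in descriptors:
        d = np.asarray(d, float)
        pred = np.zeros_like(d) if recon is None else recon
        q = np.round((d - pred) / step).astype(int)
        symbols.append(q)
        recon = pred + q * step          # what the decoder will reconstruct
    return symbols


def decode_inter(symbols, step=0.05):
    """Rebuild the descriptors by accumulating the dequantized residuals."""
    out, recon = [], None
    for q in symbols:
        pred = np.zeros(q.shape) if recon is None else recon
        recon = pred + q * step
        out.append(recon)
    return out
```

When consecutive descriptors change little, the inter residual symbols cluster around 0 and cost far fewer bits to entropy-code than the intra symbols.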
48. 3D Mobile Devices Will Surpass 148 Million in 2015
• 3D technology is advancing very fast
• Industry adoption opens new opportunities for 3D Visual Search
• From In-Stat studies:
• ~30% of all handheld game consoles will be 3D by 2015.
• 3D mobile devices will increase demand for image sensors by 130%.
• In 2012, notebooks will be the first 3D-enabled mobile devices to reach 1 million units.
• By 2014, 18% of all tablets will be 3D.
• Nintendo, Fuji, GoPro, Sony, ViewSonic, LG, Origin, Toshiba, Fujitsu, HP, ASUS, Lenovo, Dell, Alienware, HTC and Sharp are focusing on autostereoscopic mobile technologies
49. [Figure: Microsoft Kinect, Asus Xtion, LG Optimus 3D P920, LG Optimus Pad, 3DS by Nintendo, Google 3D Warehouse, HTC EVO 3D, Sharp Aquos SH-12C]
50. 3D Object Recognition with Kinect
SHOT: Unique Signatures of Histograms for Local Surface Description
http://www.youtube.com/watch?v=eRW1zG_aONk
Courtesy: Computer Vision Laboratory, University of Bologna