Compact Descriptors for Visual Search

Compact Descriptors 4 Visual Search
Danilo Pau (danilo.pau@st.com)
Senior Principal Engineer
Senior Member of Technical Staff
SMIEEE
SI/CVRP
STMicroelectronics/AST

Courtesy: M. Funamizu

Agenda 2

• Visual Search: Context

• MPEG initiative on Visual Search

• Compact Descriptors for Visual Search

• Implementation

• Use Cases

• Visual Search Evolution: Moving Pictures and 3D

• Questions and Answers

Presentation Title 15/01/2013

Visual Search Context 4

• Millions of images and videos continue to be uploaded to remote
servers all over the world

• Each day on Facebook 300 million photos are uploaded

• roughly 58 photos uploaded each second

• One hour of video uploaded to YouTube every second


Content-Based Image Retrieval 5

• CBIR covers the concept of search that analyzes the actual content in
the image, rather than relying on metadata.

• The development of this concept incorporated many algorithms and
techniques from fields such as statistics, pattern recognition and
computer vision.

• CBIR attracted a lot of attention and after many years of research, it
has expanded towards the marketplace.

• CBIR applied to the mobile market is called Mobile Visual Search

• Visual Search is the capability to initiate a search using an image of a
rigid object as the query
• The market potential of mobile visual search covers any mobile device with a camera
(phones, tablets and hybrids).


CBIR vs QR Codes 6

• Quick Response (QR) codes are a type of two-dimensional barcode.

• The code is scanned by the mobile imager to produce a URL for
redirection and browsing.

• QR codes are used by 6.2% of smartphone users in the USA


Lots of Existing Applications 7

• Google’s Goggles
• Nokia’s Point and Find
• oMoby
• Like.com
• Kooaba
• Moodstocks
• Snaptell
• pixlinQ
• Bing


Existing Apps Use JPEG 8

• Previous applications use the mobile imager to send JPEG-compressed
queries

[Diagram: the mobile device sends JPEG images to a remote server, which searches its database and returns the visual search result]


An Example of Visual Search 9

Interest Point Description
Descriptor pairing
Inliers

Query
Courtesy Telecom Italia

The Rise of Compressed Descriptors 10

• Alternatively send “compact features” extracted from raw images

• For example Scale Invariant Feature Transform – SIFT visual
descriptors

• Consider 1200 descriptors, each of 128 bytes plus 4 bytes for
coordinates, sent at 30 fps: the network load is nearly 38 Mbit/s,
which is unacceptable

[Bar chart "VGA Image": size in KB (axis from 0 to 160) for JPEG High, JPEG Low and SIFT]
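The network-load estimate above can be checked with a few lines of arithmetic. This is a minimal sketch: the function and parameter names are illustrative, and 1 Mbit is assumed to be 10^6 bits.

```python
def sift_stream_mbit_per_s(num_descriptors=1200, descriptor_bytes=128,
                           coordinate_bytes=4, fps=30):
    """Raw bit rate of uncompressed SIFT features sent for every frame."""
    bytes_per_frame = num_descriptors * (descriptor_bytes + coordinate_bytes)
    bits_per_second = bytes_per_frame * fps * 8
    return bits_per_second / 1e6  # assumption: 1 Mbit = 10**6 bits


if __name__ == "__main__":
    # 1200 * 132 bytes * 30 fps * 8 bits = 38,016,000 bit/s
    print(f"{sift_stream_mbit_per_s():.1f} Mbit/s")  # 38.0 Mbit/s
```

Each frame carries 158,400 bytes of features, which is why uncompressed SIFT is comparable to the JPEG image itself.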

Systems Considered 11

• Instead of sending images (a),

• an application can send compact descriptors (b)

• and even perform the search locally (c).

Previous Attempts 12

• Hashing
• Locality Sensitive Hashing [Yeo et al., 2008]
• Similarity Sensitive Coding [Torralba et al., 2008]
• Spectral Hashing [Weiss et al., 2008]

• Transform Coding
• Karhunen-Loève Transform [Chandrasekhar et al., 2009]
• ICA based Transform [Narozny et al., 2008]

• Vector Quantization
• Product Quantization [Jegou et al., 2010]
• Tree Structured Vector Quantization [Nistér et al., 2006]

• Alternative to SIFT
• Compressed Histogram of Gradients [Chandrasekhar et al., 2011]


Is a Standard on Visual Search Needed? 14

• Reduce load on wireless networks carrying visual search-related
information

• Ensure interoperability of visual search applications and databases

• Enable hardware support for descriptor extraction and matching in
mobile devices

• Enable a high level of performance of implementations conformant to
the standard

• Simplify design of descriptor extraction and matching for visual search
applications

What is a suitable standardization body? 15

• Informal title:
• Moving Picture Experts Group (MPEG)

• Formal title:
• ISO/IEC JTC1 SC29 WG11 (Coding of Moving Pictures and Audio)

• Parent SDOs:
• ISO: International Organization for Standardization
• IEC: International Electrotechnical Commission
• JTC 1: Joint Technical Committee One
• SC29: Subcommittee 29: Coding of Audio, Picture, Multimedia and Hypermedia Information

• Members: National Bodies (25 voting, 16 observers)

[Diagram: hierarchy JTC 1, SC29, WG11 (MPEG)]

CDVS : Scope 18

• Descriptor extraction process needed to ensure interoperability.

• Bitstream of compact descriptors

[Diagram, with a box labelled "Standard": Query image → Descriptor extraction → Descriptor bitstream → Descriptor matching → Geometric verification → List of results; a Database block is also shown]

Requirements 19

• Robustness
• High matching accuracy shall be achieved at least for images of textured
rigid objects, landmarks, and printed documents.
• The matching accuracy shall be robust to changes in vantage point,
camera parameters and lighting conditions, as well as in the presence of
partial occlusions.

• Sufficiency
• Descriptors shall be self-contained, in the sense that no other data are
necessary for matching.

• Compactness
• The length/size of image descriptors shall be minimized.

• Scalability
• Shall allow adaptation of descriptor lengths to support the required
performance level and database size.
• Shall enable design of web-scale visual search applications and
databases.

How to achieve robustness 20

• Image content is transformed into visual features, with coordinates,
that are invariant to illumination, scale, rotation, affine and
perspective transforms

Types of invariance 21

• Illumination

• Scale

• Rotation

• Affine Transform

• Full Perspective

Compactness 26

[Bar chart "VGA Image": size in KB (axis from 0 to 160) for JPEG High, JPEG Low, SIFT, and descriptor lengths of 512B, 1KB, 2KB, 4KB, 8KB and 16KB]


Extraction Pipeline 27

[Block diagram. Extraction blocks: Image Resizing, DoG, SIFT, Keypoint selection. Encoding blocks: Local Description (H Mode: Transform & SQ, Arithmetic coding; S Mode: MSVQ encoding), Coordinate coding, SCFV Descriptor. Output: Compact descriptors]

H-Mode uses SQ encoding (256B)

S-Mode uses MSVQ encoding (38KB)

Both modes use SCFV (49KB)

Properties of SIFT 28

David Lowe’s local feature detection and description method (1999-2004)
Extraordinarily robust matching technique
• Can handle changes in viewpoint
• Up to about 30 degrees of out-of-plane rotation
• Can handle significant changes in illumination
• Sometimes even day vs. night (below)
• Lots of code available: http://www.vlfeat.org (BSD license)

Pyramid of DoG 29

[Diagram: pyramid of Difference of Gaussians (DoG) images, organized in octaves 1 to n, each spanning scales 1 to m]
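As a rough illustration of the DoG pyramid, the sketch below builds it for a 1-D signal rather than an image, to stay short. Each DoG layer is the difference of two Gaussian-blurred copies at neighbouring scales, and each new octave halves the resolution. The starting sigma of 1.6, the number of scales, the border clamping and the plain subsampling between octaves are assumptions or simplifications, not details taken from the slides.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [sum(w * signal[min(max(i + k - radius, 0), last)]
                for k, w in enumerate(kernel))
            for i in range(len(signal))]


def dog_octave(signal, sigma0=1.6, scales=3):
    """DoG layers of one octave: blur at k**(s+1)*sigma0 minus blur at k**s*sigma0."""
    k = 2 ** (1.0 / scales)
    blurred = [blur(signal, sigma0 * k ** s) for s in range(scales + 1)]
    return [[b - a for a, b in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]


def dog_pyramid(signal, octaves=2, **kwargs):
    """One list of DoG layers per octave; each octave halves the resolution."""
    pyramid = []
    for _ in range(octaves):
        pyramid.append(dog_octave(signal, **kwargs))
        signal = signal[::2]  # simplification: plain subsampling for the next octave
    return pyramid
```

Interest points are then typically taken at local extrema of these layers across position and scale; that search is omitted here.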

Actual Interest Point Detector Output 30

Building a Descriptor 31

• Take a 16x16 patch window around the detected interest point

• Subdivide the patch into a 4x4 grid of sub-patches

• For each sub-patch, create an 8-bin histogram over edge orientations,
weighted by gradient magnitude

[Figure: angle histogram, orientation axis from 0 to 2π]

• These lead to a 4x4x8=128 element vector: the SIFT descriptor
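The steps above can be sketched in plain Python as follows. This is a simplified illustration, not Lowe's full method: it omits Gaussian weighting, interpolation between bins and rotation normalization, and the function name is hypothetical.

```python
import math


def sift_like_descriptor(patch):
    """128-element descriptor from a 16x16 grayscale patch (list of 16 rows).

    Simplified: no Gaussian weighting, no interpolation between bins,
    no rotation normalization.
    """
    assert len(patch) == 16 and all(len(row) == 16 for row in patch)
    descriptor = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, 15)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, 15)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # orientation in [0, 2*pi)
            orientation_bin = int(angle / (2 * math.pi) * 8) % 8
            cell = (y // 4) * 4 + (x // 4)  # index of the 4x4-pixel sub-patch
            descriptor[cell * 8 + orientation_bin] += magnitude
    # Normalize to unit length so the vector is less sensitive to contrast.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```

For a patch whose intensity grows from left to right, every gradient points along the x axis, so only the first orientation bin of each sub-patch is filled.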


Key point selection 32

• Basic idea: inlier features behave differently, in a statistical sense,
from outlier features.

• A relevance value is computed for each key point from its distance from
the image center, scale, orientation, peak, and the mean and variance of
the SIFT descriptor.
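One plausible way to turn such characteristics into a selection rule is to score each one separately, multiply the scores into a single relevance value, and keep the highest-ranked key points. The sketch below assumes exactly that; the `Keypoint` fields, the scoring formulas and their constants are all made up for illustration, and only three of the listed characteristics (distance from center, scale, peak) are used.

```python
from dataclasses import dataclass
import math


@dataclass
class Keypoint:
    x: float
    y: float
    scale: float
    peak: float  # absolute detector response at the key point


def relevance(kp, width, height):
    """Hypothetical relevance: product of independent per-characteristic scores."""
    cx, cy = width / 2.0, height / 2.0
    # Distance from the image center, normalized to [0, 1].
    dist = math.hypot(kp.x - cx, kp.y - cy) / math.hypot(cx, cy)
    center_score = 1.0 - 0.5 * dist         # made-up: central points score higher
    scale_score = min(kp.scale / 4.0, 1.0)  # made-up: larger scales score higher
    peak_score = min(kp.peak / 0.1, 1.0)    # made-up: stronger peaks score higher
    return center_score * scale_score * peak_score


def select_keypoints(keypoints, width, height, budget):
    """Keep the `budget` key points with the highest relevance."""
    ranked = sorted(keypoints, key=lambda kp: relevance(kp, width, height),
                    reverse=True)
    return ranked[:budget]
```

In a real system the per-characteristic scores would be learned from statistics of inlier and outlier features rather than hand-written.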

Local Descriptor Compression H mode 33

• Main idea is to generate a compressed descriptor from
uncompressed SIFT by
• Simple linear combinations of histograms
• Scalar quantisation of resultant values
• Adaptive Arithmetic coding

• Main benefits
• Very low computational complexity
• Negligible memory requirements
• Highly scalable
• Allows for very efficient matching and retrieval
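A minimal sketch of the first two steps, assuming a 128-element SIFT vector laid out as 16 sub-patch histograms of 8 bins each. The specific linear combination (differences of neighbouring bins), the three-level quantizer and the threshold value are assumptions chosen for illustration; the arithmetic coding stage is omitted.

```python
def transform_cell(hist):
    """Hypothetical transform: differences of neighbouring orientation bins."""
    assert len(hist) == 8
    return [hist[i] - hist[(i + 1) % 8] for i in range(8)]


def quantize(value, threshold):
    """Three-level scalar quantizer (an assumption): -1, 0 or +1."""
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0


def compress_descriptor(sift, threshold=0.05):
    """Map a 128-element SIFT vector to 128 quantized symbols."""
    assert len(sift) == 128
    symbols = []
    for cell in range(16):
        hist = sift[cell * 8:(cell + 1) * 8]
        symbols.extend(quantize(v, threshold) for v in transform_cell(hist))
    return symbols


def symbol_distance(a, b):
    """L1 distance between two symbol vectors, usable for fast matching."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because the symbols take only a few small integer values, comparing two compressed descriptors needs no decompression, which is consistent with the low-complexity matching claimed above.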

Vector Quantizer Scheme: S-Mode 34

Location Encoding 35

• Histogram Map: The positions of the nonzero bins are encoded as
binary words by scanning the columns, and the words are compressed by
arithmetic coding.
• Histogram Count: The number of coordinates in the nonzero bins is
encoded iteratively, by specifying first which bins contain more than
1 key point, then which among these contain more than 2 key points,
and so forth.
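The two parts can be sketched as follows, leaving out the arithmetic coder. The block size of 8 pixels and all function names are assumptions, and coordinates are assumed to satisfy 0 <= x < width and 0 <= y < height.

```python
def location_histogram(points, width, height, block=8):
    """Count key points per block x block cell; returns rows of counts."""
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    hist = [[0] * cols for _ in range(rows)]
    for x, y in points:
        hist[int(y) // block][int(x) // block] += 1
    return hist


def histogram_map(hist):
    """One binary word per column: 1 where the bin is nonzero."""
    rows, cols = len(hist), len(hist[0])
    return [[1 if hist[r][c] > 0 else 0 for r in range(rows)]
            for c in range(cols)]


def histogram_count(hist):
    """Layered flags: layer k marks which surviving bins hold more than k points."""
    rows, cols = len(hist), len(hist[0])
    # Same column-wise scan order as histogram_map.
    surviving = [hist[r][c] for c in range(cols) for r in range(rows)
                 if hist[r][c] > 0]
    layers, k = [], 1
    while surviving:
        flags = [1 if n > k else 0 for n in surviving]
        layers.append(flags)
        surviving = [n for n in surviving if n > k]
        k += 1
    return layers


if __name__ == "__main__":
    pts = [(1, 1), (2, 2), (3, 3), (9, 1), (1, 9)]
    hist = location_histogram(pts, 16, 16)
    print(histogram_map(hist))    # [[1, 1], [1, 0]]
    print(histogram_count(hist))  # [[1, 0, 0], [1], [0]]
```

A decoder reads the map to find the occupied bins, then walks the layers: a bin whose flag is 0 in layer k holds exactly k key points.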

Extraction times 37

• SIFT interest point detection and feature extraction make the biggest
contribution to extraction time

• Global descriptor extraction is as complex as interest point detection

• Local descriptor compression and coordinate encoding are very fast

Quantitative evaluation of CDVS extraction and pairwise matching 15/01/2013

Mobile Visual Search: Music CDs 39

Query

Stream Music

… …

Visual Search: eReaders, Printers 40

[Diagram. Labels: Snapshot; Paper-copy; Initiate Visual Search; Send Compact Query; Mass Storage; Augmentation; 3D models and markers; Transmission of markers and 3D models; Augmentation Rendering; 2D / 3D Rendering; Selective quality & content printing; Multimedia Content Retrieval from the cloud; Composition of augmentations and image; Content Augmentation]

News Finder: Still Pictures - Visual Search 41


Applications and Use Cases from the Broadcaster's Point of View 42

• Logo Detection

• Interactive Fruition (interactive viewing)

Courtesy RAI

Automotive 3D Top View 43

[Diagram: four cameras (Cam) connected to an ECU]

Moving Pictures Visual Search 45

Courtesy Telecom Design

Inter Predicted Descriptors 47

Desirable Properties:

• An inter descriptor is coded in a compact visual stream and expressed
in terms of one or more temporally neighboring descriptors.
• The "inter" part of the term refers to the use of Inter Frame
Prediction.
• Designed to achieve higher compression rates and/or better
precision-recall performance
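To illustrate the idea of expressing a descriptor in terms of a temporally neighboring one, the sketch below codes the descriptor of a tracked key point as a residual against its counterpart in the previous frame, and falls back to sending it unchanged when that is cheaper. The mode names, the cost proxy and the integer descriptor values are assumptions for illustration only.

```python
def encode_inter(current, reference):
    """Residual of the current descriptor with respect to a reference one."""
    return [c - r for c, r in zip(current, reference)]


def decode_inter(residual, reference):
    """Rebuild the descriptor from the residual and the same reference."""
    return [d + r for d, r in zip(residual, reference)]


def cost(values):
    """Crude cost proxy: number of nonzero symbols left to entropy-code."""
    return sum(1 for v in values if v != 0)


def encode_descriptor(current, reference=None):
    """Use inter coding when it is cheaper than sending the descriptor as is."""
    if reference is not None:
        residual = encode_inter(current, reference)
        if cost(residual) < cost(current):
            return ("inter", residual)
    return ("intra", list(current))


if __name__ == "__main__":
    prev = [3, 0, 5, 1, 0, 2, 7, 4]
    curr = [3, 0, 6, 1, 0, 2, 7, 4]
    print(encode_descriptor(curr, prev))  # ('inter', [0, 0, 1, 0, 0, 0, 0, 0])
```

When consecutive frames are similar the residual is mostly zeros, which is where the higher compression rate would come from.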


3D Mobile Devices Will Surpass 148 Million in 2015 48

• 3D technology is advancing very fast

• Industry adoption opens new opportunities for 3D Visual Search

• From In-Stat studies:
• ~ 30 % of all handheld game consoles will be 3D by 2015.
• 3D mobile devices will increase demand for image sensors by 130 %.
• In 2012, notebooks will be the first 3D-enabled mobile devices to reach 1 million
units.
• By 2014, 18 % of all tablets will be 3D.
• Nintendo, Fuji, GoPro, Sony, ViewSonic, LG, Origin, Toshiba, Fujitsu, HP, ASUS,
Lenovo, Dell, Alienware, HTC and Sharp are focusing on autostereoscopic mobile
technologies


49

• Microsoft Kinect
• Asus Xtion
• LG Optimus 3D P920
• LG Optimus Pad
• 3DS by Nintendo
• Google 3D Warehouse
• HTC EVO 3D
• Sharp Aquos SH-12C


3D Object Recognition with Kinect 50

SHOT: Unique Signatures of Histograms for Local Surface Description

http://www.youtube.com/watch?v=eRW1zG_aONk
Courtesy: CV laboratory University of Bologna
