Large-scale search
with polysemous codes
Florent Perronnin
Naver Labs Europe
Problem statement
Given a query, find closest match(es) in large database of “entities”
Example entities: image, video, text, post, user, ad, …
Example applications:
• video copy detection (query = video, database = video)
• blog recommendation (query = user, database = blogs)
• ad placement (query = user, database = ads)
→ very large-scale problems
2
Problem statement
Visual signatures: compact, fast, accurate
DB
query
3
Image signatures
Step 1: embedding in a common Euclidean space (approx. 1-10K dim)
Step 2: compression
“the cat”
“the dog”
4
Step 1: embedding in a common Euclidean space (approx. 1-10K dim)
Step 2: compression → our focus in this talk
“the cat”
“the dog”
Image signatures
5
CONTENTS
1. Background: similarity search with compact codes
2. Polysemous codes
3. Application: knn-graph construction
6
1. Background: large-scale search with compact codes
1.1 Binary codes
[Figure: 3-bit Hamming cube with vertices 000…111]
Idea: design/learn a function mapping the original space into a
compact Hamming space
Neighbors w.r.t. the Hamming distance should reflect neighbors in the original space
Advantages: compact descriptor, fast distance computation
LSH example: random projection + thresholding
[Charikar’02] shows that Hamming distance gives a cosine estimator 8
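A minimal sketch of this LSH scheme (dimensions, seed, and the number of bits are made up; many bits are used so the angle estimate is tight):

```python
import numpy as np

def lsh_encode(X, W):
    """Random projection followed by thresholding at zero -> binary codes."""
    return (X @ W.T > 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, nbits = 128, 1024                      # many bits so the estimate is tight
W = rng.standard_normal((nbits, d))       # random projection directions

x, y = rng.standard_normal(d), rng.standard_normal(d)
bx = lsh_encode(x[None], W)[0]
by = lsh_encode(y[None], W)[0]

# [Charikar'02]: each bit differs with probability angle(x, y) / pi,
# so the Hamming distance yields an estimate of the angle, hence of the cosine.
ham = int(np.sum(bx != by))
cos_est = np.cos(np.pi * ham / nbits)
cos_true = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```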
1.2 Product quantization (PQ)
y = y1 y2 y3 y4
Decompose the feature space as a product space
• use a distinct quantizer in each subspace, typically k-means
• estimate distances by using look-ups and additions only
[Jégou’11]
9
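A minimal numpy sketch of PQ encoding and of the look-up-based (asymmetric) distance; the codebooks here are random stand-ins, whereas in practice each is learned with k-means on its sub-space:

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, K = 64, 4, 256      # 4 sub-quantizers of 256 centroids -> 4-byte codes

# Hypothetical codebooks: in practice each is learned with k-means.
codebooks = rng.standard_normal((M, K, d // M))

def pq_encode(y):
    """Quantize each sub-vector to its nearest centroid; store M small indices."""
    subs = y.reshape(M, d // M)
    return np.array([np.argmin(((codebooks[m] - subs[m]) ** 2).sum(1))
                     for m in range(M)], dtype=np.uint8)

def pq_distance(x, code):
    """Asymmetric distance: precompute per-sub-space look-up tables, then the
    distance to any code costs only M look-ups and M-1 additions."""
    subs = x.reshape(M, d // M)
    tables = ((codebooks - subs[:, None, :]) ** 2).sum(-1)   # shape (M, K)
    return sum(tables[m, code[m]] for m in range(M))

y = rng.standard_normal(d)
code = pq_encode(y)
approx = pq_distance(y, code)   # distance of y to its own reconstruction
```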
1.3 Binary codes vs PQ
Binary codes (ITQ) [Gong’11] Product quantization
e.g. [01000110…01] e.g. [2 63 27 227]
context-free comparison need quantizer centroids
1,190M comparisons / sec 222M comparisons / sec
precision = 0.143 precision = 0.442
How to get the best of both worlds?
Seen as competing methods in literature
10
2.
Polysemous codes
[Polysemous codes, Douze, Jégou, Perronnin, ECCV’16]
2.1 A naïve approach
Encode all DB items with binary and PQ codes:
q_bin = binary_encode(x)          # compute query binary code
d_min = ∞
for i = 1..n                      # loop over database items
    db_bin = db_bin_codes[i]      # get binary code for item i
    if hamming(q_bin, db_bin) < threshold
        db_pq = db_pq_codes[i]    # get PQ code for item i
        d = PQ_distance(x, db_pq)
        if d < d_min
            nearest_neighbor, d_min = i, d
12
2.2 A naïve approach
Encode all DB items with binary and PQ codes
→ memory increase (x2)
Could we use the same codes for both the Hamming and
PQ distances?
→ polysemous codes
13
2.3 Channel optimized vector quantization
Channel-optimized vector quantizers: “pseudo-Gray coding”
Minimize the overall expected distortion (both from source and channel)
Optimize the index assignment → neighboring codes encode similar info
enc=01001100 dec=01011100
14
2.4 Index assignment optimization
Given a k-means quantizer, learn a permutation of the codes such that the
binary comparison reflects centroid distances
15
2.5 The polysemous approach
Interpret PQ codes as binary codes:
q = encode(x)                     # compute query code
d_min = ∞
for i = 1..n                      # loop over database items
    db = db_codes[i]              # get code for item i
    if hamming(q, db) < threshold
        d = PQ_distance(x, db)
        if d < d_min
            nearest_neighbor, d_min = i, d
16
2.5 The polysemous approach
→ no memory increase
17
2.6 Objective function
Find a permutation π(·) such that the Hamming distance between permuted indices matches the distance between centroids:

minimize over π:  Σ over all pairs of centroids (i, j) of
    w(d(c_i, c_j)) × [ f(h(π(i), π(j))) − d(c_i, c_j) ]²

where
• w(·) is a weighting that favors nearby centroids
• h(·,·) is the Hamming distance between the permuted (binary) indices
• f(·) is a monotonic (linear) function correcting the scale
18
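A direct transcription of this objective on a toy codebook, with assumed forms for the two free components: a Gaussian weighting for w(·) and a linear scale correction f(h) = α·h:

```python
import numpy as np

rng = np.random.default_rng(0)
K, nbits = 16, 4                    # 16 centroids, 4-bit indices
centroids = rng.standard_normal((K, 8))
D = np.linalg.norm(centroids[:, None] - centroids[None], axis=-1)

POP = np.array([bin(v).count("1") for v in range(K)])  # popcount table

def loss(perm, alpha=1.0, sigma=1.0):
    """Weighted squared error between the scaled Hamming distance of the
    permuted indices and the true inter-centroid distance.
    Assumed forms: w = exp(-d^2/sigma^2), f(h) = alpha * h."""
    total = 0.0
    for i in range(K):
        for j in range(K):
            h = POP[perm[i] ^ perm[j]]               # Hamming dist of indices
            w = np.exp(-D[i, j] ** 2 / sigma ** 2)   # favor nearby centroids
            total += w * (alpha * h - D[i, j]) ** 2
    return total

identity = np.arange(K)
base = loss(identity)
```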
2.7 Optimization
Simulated annealing:
• initialization: random permutation
• swap two entries in the permutation
• converges in approx. 200k iterations (<10s)
22
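A compact sketch of this annealing loop on a toy codebook (random stand-in distances and far fewer iterations than the ~200k reported above):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 16                                      # toy codebook: 16 centroids, 4-bit indices
D = np.abs(rng.standard_normal((K, K)))
D = (D + D.T) / 2
np.fill_diagonal(D, 0)                      # hypothetical inter-centroid distances
POP = np.array([bin(v).count("1") for v in range(K)])

def loss(perm):
    H = POP[perm[:, None] ^ perm[None, :]]  # Hamming distances of permuted indices
    return float(((H - D) ** 2).sum())

perm = rng.permutation(K)                   # initialization: random permutation
cur = loss(perm)
best_perm, best = perm.copy(), cur
T = 1.0
for _ in range(5000):                       # the slide reports ~200k iterations, <10 s
    i, j = rng.integers(0, K, size=2)
    cand = perm.copy()
    cand[i], cand[j] = cand[j], cand[i]     # swap two entries of the permutation
    c = loss(cand)
    if c < cur or rng.random() < np.exp((cur - c) / T):
        perm, cur = cand, c                 # accept: always if better, else by chance
        if cur < best:
            best_perm, best = perm.copy(), cur
    T *= 0.999                              # cool down
```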
2.8 Optimization
23
2.9 Results: exhaustive search
1M SIFT vectors: exhaustive comparison (16 bytes)
24
2.10 Results: non-exhaustive search
90M CNN descriptors from the Flickr100M dataset (16 bytes)
BIGANN academic benchmark (1B vectors) → 2-2.5× speed-up
25
3. The knn-graph of a collection
3.1 Building a graph on images
Testbed: Flickr100M
• public dataset of CC images
• described with AlexNet FC7 features
normalized, PCA to 256D, encoded as 32 bytes,
coarse quantizer size 4096
Each image in turn is a query
• compute 100-NN
• build index = 14h, search = 7h
• storage for the graph = 2 x 40 GB RAM
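At toy scale the graph can be built exactly; the snippet below is a brute-force stand-in for the compressed-domain index described above (made-up data; each node ends up with its k nearest neighbours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 32, 10
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # L2-normalized features

# Exact brute force for illustration only; at Flickr100M scale the setting
# above (PCA to 256D, 32-byte codes, coarse quantizer of size 4096) is used.
D = ((X[:, None] - X[None]) ** 2).sum(-1)       # pairwise squared distances
np.fill_diagonal(D, np.inf)                     # each image queries the others
knn = np.argsort(D, axis=1)[:, :k]              # n x k neighbour lists = the graph
```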
3.2 Graph modes
Graph seen as a Markov model
→ compute stationary distribution [Cho’12]
Sparse matrix–vector multiplication
• 200 iterations (30s / iter)
• mode = local maximum over nodes
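The stationary-distribution computation can be sketched as a power iteration on a row-stochastic transition matrix built from the kNN graph. The damping (teleport) term is an assumption added here to guarantee convergence, not something stated on the slide; the graph itself is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5
# Hypothetical kNN graph: each node links to k random neighbours, equal weights.
P = np.zeros((n, n))
for i in range(n):
    P[i, rng.choice(n, size=k, replace=False)] = 1.0 / k   # row-stochastic

# Power iteration; at Flickr100M scale this is a sparse matrix-vector
# product per iteration (the slide: 200 iterations, ~30 s each).
pi = np.full(n, 1.0 / n)
damping = 0.85            # assumption: teleport term guaranteeing convergence
for _ in range(200):
    pi = damping * (P.T @ pi) + (1 - damping) / n

modes = np.argsort(pi)[::-1][:10]   # candidate modes: highest stationary mass
```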
3.3 Paths in the graph
Almost all images are connected: find path between pairs of images
→ morphing from one image to another
Which paths?
• shortest path
• minimize sum of distances
• minimize max of distances
29
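Both path criteria fit one Dijkstra-style search with a pluggable cost combiner: summing edge weights gives the usual shortest path, taking the max gives the bottleneck (minimize-the-worst-edge) path. The tiny graph and its weights are made up:

```python
import heapq

def best_path(adj, src, dst, combine):
    """Dijkstra with a pluggable cost: combine(path_cost, edge_weight) is
    addition for shortest path, max for the bottleneck criterion."""
    best = {src: 0.0}
    heap = [(0.0, src, [src])]
    while heap:
        cost, u, path = heapq.heappop(heap)
        if u == dst:
            return cost, path
        if cost > best.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in adj[u]:
            c = combine(cost, w)
            if c < best.get(v, float("inf")):
                best[v] = c
                heapq.heappush(heap, (c, v, path + [v]))
    return float("inf"), []

# Toy image graph: nodes are images, weights are descriptor distances (made up).
adj = {0: [(1, 1.0), (2, 3.0)], 1: [(3, 4.0)], 2: [(3, 3.0)], 3: []}
sum_cost, sum_path = best_path(adj, 0, 3, lambda c, w: c + w)  # minimize sum
max_cost, max_path = best_path(adj, 0, 3, max)                 # minimize max edge
```

Note that the two criteria can disagree: here the sum-optimal path takes the cheap first edge, while the bottleneck-optimal path avoids the single expensive edge.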
3.4 Path: flower to flower
30
3.5 Path: lion to panther
31
3.6 Path: dog to cat
32
3.7 Path: 3D morphing
33
3.8 Path: different objects
34
3.9 Path: non-intuitive…
35
Q & A
Thank you
