2. CONTENTS
• ANN on the Hypercube
  • Hypercube and Hamming distance
  • Constructing NNbr for the Hamming cube
  • Construction of the near-neighbour data-structure
• LSH and ANN in Euclidean Space
  • Preliminaries
  • Locality Sensitive Hashing
  • ANN in High Dimensional Euclidean Space
3. NEAREST NEIGHBOR SEARCH
Given a set P of n distinct points in d-dimensional space, we have to build a data structure which, given a query point q ∈ R^d, returns a point p ∈ P minimizing ‖p − q‖.
4. CURSE OF DIMENSIONALITY
In R^2 the Voronoi diagram has size O(n) and a query takes O(log n) time.
In R^d the Voronoi diagram has complexity O(n^⌈d/2⌉).
We can also perform a linear scan: O(dn) space, O(dn) query time.
The volume-to-distance ratio explodes with the dimension.
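The linear-scan baseline mentioned above can be sketched in a few lines of Python (a minimal illustration; the point set and query are made up for the example):

```python
import math

# Brute-force nearest neighbor: O(dn) space to store P, O(dn) time per query.
def nearest_neighbor(P, q):
    return min(P, key=lambda p: math.dist(p, q))

P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(nearest_neighbor(P, (0.9, 0.9)))  # (1.0, 1.0)
```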
5. (1+ε)-APPROXIMATE NEAREST NEIGHBOR SEARCH
(1+ε)-approximate nearest neighbor search is a relaxation of the nearest neighbor search problem.
A valid answer to a (1+ε)-approximate nearest neighbor query is any point within distance cr = (1+ε)r of the query point, where r is the distance between the query point and its true nearest neighbor.
7. HYPERCUBE AND HAMMING DISTANCE
The set of points H^d = {0, 1}^d is the d-dimensional hypercube.
A point p = (p_1, …, p_d) ∈ H^d can be interpreted, naturally, as a binary string p_1 p_2 … p_d.
The Hamming distance d_H(p, q) between p, q ∈ H^d is the number of coordinates where p and q disagree.
Examples: d_H(100, 011) = 3; d_H(010, 111) = 2.
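The definition translates directly into code; a minimal sketch in Python, reproducing the two examples above:

```python
# Hamming distance between two equal-length binary strings:
# the number of coordinates where they disagree.
def hamming(p, q):
    assert len(p) == len(q)
    return sum(a != b for a, b in zip(p, q))

print(hamming("100", "011"))  # 3
print(hamming("010", "111"))  # 2
```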
9. THE NNbr≈ DATA STRUCTURE
Our goal is to create a data structure that solves the ANN problem.
Given a query point q, the data structure NNbr≈ behaves as follows:
• If d(q, P) ≤ r, then NNbr≈ outputs a point p ∈ P such that d(p, q) ≤ (1 + ε)r.
• If d(q, P) ≥ (1 + ε)r, then NNbr≈ outputs that "d(q, P) ≥ r".
• If r ≤ d(q, P) ≤ (1 + ε)r, either of the above answers is acceptable.
Given such a data structure NNbr≈(P, r, (1+ε)r), one can construct a data structure that answers ANN queries using O(log(n/ε)) NNbr≈ queries.
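The interface of the reduction can be sketched as follows. This is only an illustration of how decision queries combine into an ANN answer: the naive geometric sweep below uses more queries than the O(log(n/ε)) bound from the slide (which relies on an HST), and the brute-force `make_nnbr` oracle is a made-up stand-in for a real NNbr≈ structure:

```python
import math

def ann_from_decision(q, nnbr, r_lo, r_hi, eps):
    """Answer a (1+eps)-ANN query by probing decision structures
    NNbr~(P, r, (1+eps)r) at geometrically increasing radii.
    `nnbr(r, q)` is assumed to return a point p with d(p, q) <= (1+eps)r,
    or None, meaning d(q, P) >= r."""
    r = r_lo
    while r <= r_hi:
        p = nnbr(r, q)
        if p is not None:
            return p
        r *= (1 + eps)
    return None

# Stand-in decision oracle built by brute force, for illustration only.
def make_nnbr(P, eps):
    def nnbr(r, q):
        best = min(P, key=lambda p: math.dist(p, q))
        return best if math.dist(best, q) <= (1 + eps) * r else None
    return nnbr

P = [(0.0, 0.0), (10.0, 10.0)]
print(ann_from_decision((1.0, 0.0), make_nnbr(P, 0.5), 0.1, 100.0, 0.5))
```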
12. CONSTRUCTING NNbr≈ FOR THE HAMMING CUBE
We are interested in building an NNbr≈ for balls of radius r in the Hamming distance.
A family F = {h : S → [0, U]} of functions is (r, R, α, β)-sensitive if for any p, q ∈ S:
• If p ∈ b(q, r), then Pr[h(p) = h(q)] ≥ α.
• If p ∉ b(q, R), then Pr[h(p) = h(q)] ≤ β.
Here h is picked randomly from F, r < R, and α > β.
14. DEFINE A FAMILY F
For the hypercube H^d = {0,1}^d, let F be the set of coordinate-projection functions
F = { h_i(b) = b_i | b = (b_1, …, b_d) ∈ H^d, i = 1, …, d }.
Then for any r and ε, the family F is (r, (1+ε)r, 1 − r/d, 1 − r(1+ε)/d)-sensitive.
Proof: If u, v ∈ {0,1}^d are at distance at most r from each other (under the Hamming distance), then they differ in at most r coordinates. The probability that a random h ∈ F projects onto a coordinate on which u and v agree is therefore ≥ 1 − r/d.
Similarly, if d_H(u, v) ≥ (1 + ε)r, then u and v differ in at least r(1 + ε) coordinates, so the probability that h maps to a coordinate on which they agree is ≤ 1 − r(1+ε)/d.
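The family F and its collision probability can be sketched in Python. For this family the collision probability is not just bounded but exact: Pr[h(u) = h(v)] equals the fraction of coordinates on which u and v agree (the strings below are made-up examples):

```python
import random

# Draw h from F: h_i(b) = b_i for a uniformly random coordinate i.
def sample_h(d, rng=random):
    i = rng.randrange(d)
    return lambda b: b[i]

# Exact Pr[h(u) = h(v)] for h uniform over F: the fraction of agreeing
# coordinates, which equals 1 - d_H(u, v)/d.
def collision_prob(u, v):
    return sum(a == b for a, b in zip(u, v)) / len(u)

print(collision_prob("10000", "10011"))  # 0.6, i.e. 1 - 2/5
```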
15. DEFINE A FAMILY G TO EXTEND F
Let k be a parameter to be specified shortly. Let
G(F) = { g : {0,1}^d → {0,1}^k | g(u) = (h_1(u), …, h_k(u)), for h_1, …, h_k ∈ F }.
Intuitively, G extends F by probing k random coordinates instead of only one, so g(u) is a k-bit string such as (0, 0, …, 1).
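A function from G is just a tuple of k functions from F applied together; a minimal sketch (the dimensions and seed below are arbitrary):

```python
import random

# Draw g from G(F): probe k independent random coordinates and
# concatenate the resulting bits into a k-tuple.
def sample_g(d, k, rng=random):
    idx = [rng.randrange(d) for _ in range(k)]
    return lambda b: tuple(b[i] for i in idx)

g = sample_g(6, 3, random.Random(0))
print(g("010110"))  # a 3-bit signature of the input string
```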
16. DEFINE A FAMILY G TO EXTEND F
Values of α and β for G:
α′ = (1 − r/d)^k = α^k
β′ = (1 − r(1+ε)/d)^k = β^k
Thus the family G is (r, R, α^k, β^k)-sensitive.
17. CONSTRUCTING HASH TABLES
Choose g_1, g_2, …, g_τ uniformly at random from G.
Construct τ hash tables H_1, H_2, …, H_τ.
Hash all the points of P: store g_i(p_1), …, g_i(p_n) in table H_i.
τ is a parameter, to be chosen later.
18. CONSTRUCTING HASH TABLES
Given a query point q:
• Hash q with each of g_1, g_2, …, g_τ.
• Check the colliding points for an ANN.
• Stop and return "fail" if more than 4τ collisions are encountered.
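The two slides above can be sketched end to end for the Hamming cube. This is a toy illustration, not a tuned implementation: the point set, parameters, and seed are made up, and `g` is drawn from the coordinate-projection family of slide 15:

```python
import random
from collections import defaultdict

def build_tables(P, d, k, tau, rng):
    """Hash every point of P into tau hash tables, one per random g in G."""
    gs, tables = [], []
    for _ in range(tau):
        idx = [rng.randrange(d) for _ in range(k)]
        g = lambda b, idx=idx: tuple(b[i] for i in idx)
        table = defaultdict(list)
        for p in P:
            table[g(p)].append(p)
        gs.append(g)
        tables.append(table)
    return gs, tables

def query(q, gs, tables, r, eps, tau):
    """Probe q's bucket in each table; abort after more than 4*tau candidates."""
    hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
    checked = 0
    for g, table in zip(gs, tables):
        for p in table.get(g(q), []):
            if hamming(p, q) <= (1 + eps) * r:
                return p          # a (1+eps)-approximate near neighbor
            checked += 1
            if checked > 4 * tau:
                return None       # fail: too many far collisions
    return None                   # no near point found

P = ["0000", "1111"]
gs, tables = build_tables(P, d=4, k=2, tau=8, rng=random.Random(1))
print(query("0001", gs, tables, r=1, eps=1, tau=8))
```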
19. CHOOSING PARAMETERS
We choose k and τ so that, with constant probability (say larger than half), we have the following two properties:
1. If there is a point u ∈ P such that d_H(u, q) ≤ r, then g_j(u) = g_j(q) for some j.
2. If d_H(u, q) ≥ (1+ε)r for all u ∈ P, the total number of points colliding with q in the τ hash tables is smaller than 4τ.
Define: ρ = ln(1/α) / ln(1/β).
Choose: k = ln n / ln(1/β), τ = 2n^ρ.
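Plugging in the hypercube family's α = 1 − r/d and β = 1 − r(1+ε)/d, the parameter choices can be computed directly (the inputs n, d, r, ε below are arbitrary example values; the rounding to integers is an implementation choice):

```python
import math

# Parameter choices for the Hamming-cube family, following the slide:
# alpha = 1 - r/d, beta = 1 - r(1+eps)/d, rho = ln(1/alpha)/ln(1/beta),
# k = ln(n)/ln(1/beta) (rounded up), tau = 2*n^rho (rounded up).
def lsh_parameters(n, d, r, eps):
    alpha = 1 - r / d
    beta = 1 - r * (1 + eps) / d
    rho = math.log(1 / alpha) / math.log(1 / beta)
    k = math.ceil(math.log(n) / math.log(1 / beta))
    tau = math.ceil(2 * n ** rho)
    return alpha, beta, rho, k, tau

alpha, beta, rho, k, tau = lsh_parameters(n=1000, d=100, r=5, eps=1.0)
print(alpha, beta, round(rho, 3), k, tau)
```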
20. DEFINING VALUES OF PARAMETERS
Theorem (1.1): If there is an (r, R, α, β)-sensitive family F of functions for the hypercube, then there exists an NNbr≈(P, r, (1 + ε)r) which uses O(dn + n^(1+ρ)) space and O(n^ρ) hash probes per query, where
ρ = ln(1/α) / ln(1/β).
This data structure succeeds with constant probability.
Proof:
Case 1: probability of collision when there is no ANN.
Consider a point p ∉ b(q, r(1 + ε)) and a hash function g ∈ G:
Pr[g(p) = g(q)] ≤ β^k = exp(ln β · ln n / ln(1/β)) ≤ 1/n
22. DEFINING VALUES OF PARAMETERS
Case 2: probability of finding an ANN when there is a NN.
Pr[g_i(p) = g_i(q)] ≥ α^k = α^(ln n / ln(1/β)) = n^(−ln(1/α) / ln(1/β)) = n^(−ρ)
Pr[g_i hashes p and q to different locations] ≤ 1 − n^(−ρ)
Pr[p and q collide at least once in the τ tables] ≥ 1 − (1 − n^(−ρ))^τ
By setting τ = 2n^ρ we get 1 − (1 − n^(−ρ))^(2n^ρ) ≥ 1 − e^(−2) > 4/5.
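The final bound can be checked numerically: since (1 − x)^(2/x) ≤ e^(−2) for x ∈ (0, 1], the success probability is at least 1 − e^(−2) ≈ 0.8647 > 4/5 for every n (ρ = 0.5 below is an arbitrary example value):

```python
# Numerical check: with tau = 2*n^rho tables, the probability that p and q
# collide at least once is 1 - (1 - n^-rho)^(2*n^rho) >= 1 - e^-2 > 4/5.
def collision_success(n, rho):
    tau = 2 * n ** rho
    return 1 - (1 - n ** (-rho)) ** tau

for n in (10, 1000, 10**6):
    assert collision_success(n, 0.5) > 4 / 5
print("bound holds for all tested n")
```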
23. LEMMA
There exists an NNbr≈(P, r, (1 + ε)r) which uses O(dn + n^(1+ρ)) space and O(n^ρ) hash probes for each query. The probability of success is a constant.
24. TIME COMPLEXITY
Substituting the value of ρ into the statement of Theorem 1.1 gives the space and time complexity of LSH (Theorem 1.2).
To amplify the success probability, build O(log n) independent structures.
26. LOCALITY SENSITIVE HASHING
• Hashes similar input items into the same "buckets" with high probability.
• Hash collisions are maximized, not minimized.
27. LSH
Family of hash functions:
• Map close points to same buckets, faraway
points to different buckets
• Choose a random function and hash P
• Only store non-empty buckets
• Hash q in the table
• Test every point in q’s bucket for ANN
Problem: q’s bucket may be empty
28. LSH
• Solution: use a number of hash tables. We are done if any ANN is found.
• Problem: poor resolution yields too many candidates; the search stops after reaching a limit, and succeeds only with small probability.
We want a hash function h, randomly picked from a family F, such that for r < R and α > β:
• If p ∈ b(q, r), then Pr[h(p) = h(q)] ≥ α.
• If p ∉ b(q, R), then Pr[h(p) = h(q)] ≤ β.
29. 2-STABLE DISTRIBUTIONS
Let X = (X_1, …, X_d) be a vector of d independent variables, each with distribution N(0, 1).
For any v = (v_1, …, v_d) ∈ R^d, the dot product v · X = Σ_i v_i X_i is distributed as ‖v‖ Z, where Z ∼ N(0, 1).
A d-dimensional distribution with this property is called a 2-stable distribution. Gaussian normal distributions are 2-stable.
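The 2-stability property can be checked empirically: for v = (3, 4) with ‖v‖ = 5, the dot product v · X should behave like 5 · N(0, 1), so its empirical standard deviation should be close to 5 (the vector, seed, and sample size are arbitrary choices for the demo):

```python
import math
import random

# Empirical check of 2-stability: v . X with X_i ~ N(0,1) i.i.d.
# should be distributed as ||v|| * Z, Z ~ N(0,1).
rng = random.Random(42)
v = (3.0, 4.0)  # ||v|| = 5
samples = [v[0] * rng.gauss(0, 1) + v[1] * rng.gauss(0, 1) for _ in range(20000)]

# Empirical standard deviation of v . X, expected to be near ||v|| = 5.
std = math.sqrt(sum(s * s for s in samples) / len(samples))
print(round(std, 2))
```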
30. LSH BY PROJECTIONS
The Idea:-
• Hash function is a projection to a line of random orientation
• One composite hash function is a random grid
• Hashing buckets are grid cells
• Multiple grids are used for prob. amplification
• Jitter grid offset randomly (check only one cell)
• Double hashing: Do not store empty grid cells
31. PROJECTION
• The idea: if two points are close together, then after projection these points remain close together.
• Let p, q be two points in R^d. We want to decide whether ‖p − q‖ ≤ 1 or ‖p − q‖ ≥ η, where η = 1 + ε. We randomly choose a vector v from the d-dimensional normal distribution N^d(0, 1) (which is 2-stable).
• r is a parameter which signifies the size of the bucket, and t is a random number from the interval [0, r].
• For p ∈ R^d, consider the random hash function:
h(p) = ⌊(p · v + t) / r⌋
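The hash function above can be sketched directly (the dimension, bucket width, and test points are made-up example values):

```python
import math
import random

# One projection-based LSH function: project onto a random Gaussian
# direction v, shift by a random offset t in [0, r], and bucket the
# line into cells of width r.
def make_projection_hash(d, r, rng=random):
    v = [rng.gauss(0, 1) for _ in range(d)]
    t = rng.uniform(0, r)
    def h(p):
        return math.floor((sum(pi * vi for pi, vi in zip(p, v)) + t) / r)
    return h

h = make_projection_hash(d=3, r=1.0, rng=random.Random(0))
print(h((0.0, 0.0, 0.0)), h((0.05, 0.0, 0.0)))  # nearby points often share a cell
```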
32. If p and q are at distance η from each other, and the distance between their projections on v is t, then the probability that they get the same hash value is 1 − t/r. The probability of collision is:
α(η) = Pr[h(p) = h(q)] = ∫₀^r Pr[|p·v − q·v| = t] (1 − t/r) dt
Since v is drawn from a 2-stable distribution, we have that p·v − q·v = (p − q)·v ∼ N(0, ‖p − q‖²). Taking the absolute value doubles the density:
α(η, r) = ∫₀^r (2 / (η√(2π))) exp(−t² / (2η²)) (1 − t/r) dt
We would like to maximize the difference α(1 + ε, r) − α(1, r) as much as possible by choosing the right value of r. Through numerical computation we obtain that we can choose an r such that:
ρ(1 + ε) ≤ 1 / (1 + ε)
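The integral for α(η, r) has no closed form, but it is easy to evaluate numerically and confirm that close pairs (η = 1) collide more often than far pairs (η = 1 + ε); the values of ε, r, and the step count below are arbitrary choices for the demo:

```python
import math

# Midpoint-rule evaluation of the collision probability
# alpha(eta, r) = integral_0^r 2/(eta*sqrt(2*pi)) * exp(-t^2/(2*eta^2)) * (1 - t/r) dt
def alpha(eta, r, steps=100_000):
    dt = r / steps
    c = 2.0 / (eta * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += c * math.exp(-t * t / (2.0 * eta * eta)) * (1.0 - t / r) * dt
    return total

# Close pairs (distance 1) should collide more often than far pairs (distance 1+eps).
eps, r = 1.0, 4.0
print(alpha(1.0, r), alpha(1.0 + eps, r))
```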
33. LOCALITY SENSITIVE HASH FUNCTIONS
Theorem 2.1: For a set P of n points in R^d, with ε > 0 and r > 0, we construct NNbr≈ = NNbr≈(P, r, (1 + ε)r) such that, for a query point q:
• If b(q, r) ∩ P ≠ ∅, then NNbr≈ returns a point u ∈ P such that ‖u − q‖ ≤ (1 + ε)r.
• If b(q, (1 + ε)r) ∩ P = ∅, then NNbr≈ returns that no point is within distance r of q.
In any other case, either answer is acceptable.
Performance: the query time is O(dn^(1/(1+ε)) log n) and the space used is O(dn + n^(1+1/(1+ε)) log n). The result returned is correct with high probability.
34. ANN IN HIGH DIMENSIONAL EUCLIDEAN SPACE
Recap:
Theorem (2.2): Given a set P of n points in R^d, one can construct a data structure D that answers (1 + ε)-ANN queries by performing O(log(n/ε)) NNbr≈ queries. The total number of points stored in the NNbr≈ data structures of D is O(nε^(−1) log(n/ε)).
This theorem requires constructing a low-quality HST, but such constructions are exponential in the dimension or take quadratic time, so we need to present a faster scheme.
35. THE OVERALL RESULT
Given a set P of n points in R^d and parameters ε > 0 and r > 0, one can build an ANN data structure using
O(dn + n^(1+1/(1+ε)) ε^(−2) log³(n/ε))
space, such that given a query point q, one can return a (1 + ε)-ANN in P in
O(dn^(1/(1+ε)) (log n) log(n/ε))
time. The result returned is correct with high probability. The construction time is O(dn + n^(1+1/(1+ε)) ε^(−2) log³(n/ε)).
36. THE OVERALL RESULT
Proof:
Theorem (2.3): Let P be a set of n points in R^d. One can compute an nd-HST of P in O(nd log² n) time (note that the constant hidden by the O notation does not depend on d).
We compute the low-quality HST using Theorem 2.3; this takes O(nd log² n) time. Using this HST, we construct the data structure D of Theorem 2.2, except that we do not yet compute the NNbr≈ data structures. We then traverse the tree D and construct the NNbr≈ data structures using Theorem 2.1.
It remains to prove the bound on the space. We need to store each point only once, since every other place can refer to the point by a pointer; this gives the O(nd) space term. The other term comes from plugging the bound of Theorem 2.1 into the bound of Theorem 2.2.