ANN IN HIGHER
DIMENSIONS
Shelly Chouhan
190101082
Shrey Verma 190101083
CONTENTS
• ANN on the Hypercube
  • Hypercube and Hamming distance
  • Constructing NNbr≈ for the Hamming cube
  • Construction of the near-neighbour data structure
• LSH and ANN in Euclidean Space
  • Preliminaries
  • Locality Sensitive Hashing
  • ANN in High-Dimensional Euclidean Space
NEAREST NEIGHBOR SEARCH
Given a set P of n distinct points in d-dimensional space, we have to build a data structure which, given a query point q ∈ R^d, returns a point p ∈ P minimizing ‖p − q‖.
CURSE OF DIMENSIONALITY
• In R^2, the Voronoi diagram has size O(n) and a query takes O(log n) time.
• In R^d, the Voronoi diagram has size O(n^⌈d/2⌉).
• We can also perform a linear scan: O(dn) space, O(dn) query time.
• The volume-to-distance ratio explodes with the dimension.
(1+ε)-APPROXIMATE NEAREST NEIGHBOR SEARCH
(1+ε)-approximate nearest neighbor search is a relaxation of the nearest neighbor search problem.
A solution to the (1+ε)-approximate nearest neighbor search is any point within distance cr = (1+ε)r of the query point, where r is the distance between the query point and its true nearest neighbor.
ANN ON THE
HYPERCUBE
HYPERCUBE AND HAMMING DISTANCE
The set of points H^d = {0, 1}^d is the d-dimensional hypercube.
A point p = (p_1, …, p_d) ∈ H^d can be interpreted, naturally, as the binary string p_1 p_2 … p_d.
The Hamming distance d_H(p, q) between p, q ∈ H^d is the number of coordinates where p and q disagree.
Examples: 100→011 has distance 3; 010→111 has distance 2.
HYPERCUBE AND HAMMING DISTANCE
Examples: 0100→1001 has distance 3; 0110→1110 has distance 1.
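As a quick illustrative sketch (not part of the slides), Hamming distance on bit strings in Python, checked against the examples above:

```python
def hamming_distance(p: str, q: str) -> int:
    """Number of coordinates where the bit strings p and q disagree."""
    assert len(p) == len(q)
    return sum(1 for a, b in zip(p, q) if a != b)

# Examples from the slides:
assert hamming_distance("100", "011") == 3
assert hamming_distance("010", "111") == 2
assert hamming_distance("0100", "1001") == 3
assert hamming_distance("0110", "1110") == 1
```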
NNBR≈ DATA STRUCTURE
Our goal is to create a data structure that solves the ANN problem.
This data structure works as follows for a given query point q:
• If d(q, P) ≤ r, then NNbr≈ outputs a point p ∈ P such that d(p, q) ≤ (1 + ε)r.
• If d(q, P) ≥ (1 + ε)r, then NNbr≈ outputs that "d(q, P) ≥ r".
• If r ≤ d(q, P) ≤ (1 + ε)r, either of the above answers is acceptable.
Given such a data structure NNbr≈(P, r, (1+ε)r), one can construct a data structure that answers ANN queries using O(log(n/ε)) NNbr≈ queries.
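To make this contract concrete, here is a brute-force reference sketch of what any NNbr≈ structure must return; the function name and signature are mine, for illustration only.

```python
from typing import Callable, Optional, Sequence

def nnbr_brute_force(q, P: Sequence, r: float, eps: float,
                     dist: Callable) -> Optional[object]:
    """Brute-force reference for the NNbr~ contract:
    - if d(q, P) <= r, return some p with d(p, q) <= (1 + eps) * r;
    - if d(q, P) >= (1 + eps) * r, return None, meaning "d(q, P) >= r";
    - in between, either answer is acceptable.
    """
    best = min(P, key=lambda p: dist(p, q))
    return best if dist(best, q) <= (1 + eps) * r else None
```

The point of the LSH construction below is to achieve this behaviour in sublinear time rather than by the linear scan used here.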
CONSTRUCTING NNBR≈ FOR THE HAMMING CUBE
We are interested in building an NNbr≈ for balls of radius r in the Hamming distance.
A family F = {h : S → [0, U]} of functions is (r, R, α, β)-sensitive if, for any p, q ∈ S:
• If p ∈ b(q, r), then Pr[h(p) = h(q)] ≥ α.
• If p ∉ b(q, R), then Pr[h(p) = h(q)] ≤ β.
Here h is picked randomly from F, r < R, and α > β.
DEFINE A FAMILY F
For the hypercube H^d = {0, 1}^d and a point b = (b_1, …, b_d) ∈ H^d, let F be the set of functions
F = { h_i(b) = b_i | b = (b_1, …, b_d) ∈ H^d, for i = 1, …, d }.
Then for any r and ε, the family F is (r, (1+ε)r, 1 − r/d, 1 − r(1+ε)/d)-sensitive.
Proof:
If u, v ∈ {0, 1}^d are at distance smaller than r from each other (under the Hamming distance), then they differ in at most r coordinates. The probability that a random h ∈ F projects onto a coordinate on which u and v agree is therefore ≥ 1 − r/d.
Similarly, if d_H(u, v) ≥ (1 + ε)r, then the probability that h maps onto a coordinate on which u and v agree is ≤ 1 − r(1+ε)/d.
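A minimal simulation sketch of this coordinate-projection family (my own illustration; the values d = 32, r = 4, ε = 1 are arbitrary). It empirically checks the collision probabilities at the boundary distances r and (1+ε)r:

```python
import random

d, r, eps = 32, 4, 1.0

def sample_h():
    """Pick h_i uniformly at random from F: h(b) returns the i-th bit of b."""
    i = random.randrange(d)
    return lambda b: b[i]

def flip(p, k):
    """Return a copy of p with exactly k coordinates flipped."""
    q = p[:]
    for i in random.sample(range(d), k):
        q[i] ^= 1
    return q

def collision_rate(p, q, trials=20000):
    """Fraction of randomly drawn h in F with h(p) == h(q)."""
    hits = 0
    for _ in range(trials):
        h = sample_h()
        hits += h(p) == h(q)
    return hits / trials

u = [random.randint(0, 1) for _ in range(d)]
close = flip(u, r)                    # d_H(u, close) = r
far = flip(u, int((1 + eps) * r))     # d_H(u, far) = (1 + eps) * r

print(collision_rate(u, close), "vs alpha =", 1 - r / d)
print(collision_rate(u, far), "vs beta  =", 1 - (1 + eps) * r / d)
```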
DEFINE A FAMILY G TO EXTEND F
Let k be a parameter to be specified shortly. Let
G(F) = { g : {0,1}^d → {0,1}^k | g(u) = (h_1(u), …, h_k(u)), for h_1, …, h_k ∈ F }.
DEFINE A FAMILY G TO EXTEND F
Values of α and β for G:
α′ = (1 − r/d)^k = α^k
β′ = (1 − r(1+ε)/d)^k = β^k
Thus the family G is (r, R, α^k, β^k)-sensitive.
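A member of G is just a tuple of k independently chosen coordinate probes; a small sketch (illustrative, names are mine):

```python
import random

def sample_g(d: int, k: int):
    """g(u) = (h_1(u), ..., h_k(u)): probe k independently chosen coordinates of u."""
    idx = [random.randrange(d) for _ in range(k)]
    return lambda u: tuple(u[i] for i in idx)

# Example: g maps a point of the 8-dimensional cube to a 3-bit bucket key.
g = sample_g(d=8, k=3)
print(g([0, 1, 1, 0, 1, 0, 0, 1]))
```

Since the k coordinates are chosen independently, the collision probability of g is the k-th power of the collision probability of a single h, which is where α^k and β^k come from.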
CONSTRUCTING HASH TABLES
• Choose g_1, g_2, …, g_τ uniformly at random from G.
• Construct τ hash tables H_1, H_2, …, H_τ.
• Hash all the points in P: store g_i(p_1), …, g_i(p_n) in table H_i.
• τ is a parameter we will choose later.
CONSTRUCTING HASH TABLES
When a query point q is given:
• Hash q with each of g_1, g_2, …, g_τ.
• Check the colliding points for an ANN.
• Stop if more than 4τ collisions are seen and return "fail".
(See the sketch below.)
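A compact sketch of the preprocessing and query steps on the Hamming cube (illustrative only; the function names are mine, and sample_g is repeated from the earlier sketch so the block is self-contained):

```python
import random
from collections import defaultdict

def sample_g(d, k):
    idx = [random.randrange(d) for _ in range(k)]
    return lambda u: tuple(u[i] for i in idx)

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def preprocess(P, d, k, tau):
    """Build tau hash tables; table i keys every point of P by g_i."""
    gs = [sample_g(d, k) for _ in range(tau)]
    tables = [defaultdict(list) for _ in range(tau)]
    for p in P:
        for g, H in zip(gs, tables):
            H[g(p)].append(p)
    return gs, tables

def query(q, gs, tables, r, eps, tau):
    """Scan q's buckets; return a (1+eps)r-near point, or None after 4*tau far collisions."""
    checked = 0
    for g, H in zip(gs, tables):
        for p in H.get(g(q), []):
            if hamming(p, q) <= (1 + eps) * r:
                return p
            checked += 1
            if checked > 4 * tau:
                return None   # too many collisions: report "d(q, P) >= r"
    return None
```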
CHOOSING PARAMETERS
We choose k and τ so that, with constant probability (say larger than half), we have the following two properties:
1. If there is a point u ∈ P such that d_H(u, q) ≤ r, then g_j(u) = g_j(q) for some j.
2. The number of points p ∈ P with d_H(p, q) ≥ (1+ε)r that collide with q in the τ hash tables is smaller than 4τ.
Define: ρ = ln(1/α) / ln(1/β)
Choose: k = ln(n) / ln(1/β), τ = 2n^ρ
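For concreteness, a small helper (my own sketch, with arbitrary example values) that derives α, β, ρ, k and τ from n, d, r and ε for the coordinate family above:

```python
import math

def lsh_parameters(n: int, d: int, r: float, eps: float):
    """alpha, beta for the coordinate family; then rho, k, tau as chosen on the slide."""
    alpha = 1 - r / d
    beta = 1 - (1 + eps) * r / d
    rho = math.log(1 / alpha) / math.log(1 / beta)
    k = math.ceil(math.log(n) / math.log(1 / beta))
    tau = math.ceil(2 * n ** rho)
    return alpha, beta, rho, k, tau

# Example: n = 10000 points on the 128-dimensional Hamming cube, r = 8, eps = 1.
print(lsh_parameters(10000, 128, 8, 1.0))
```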
DEFINING VALUES OF PARAMETERS
Theorem (1.1): If there is an (r, R, α, β)-sensitive family F of functions for the hypercube, then there exists an NNbr≈(P, r, (1 + ε)r) which uses O(dn + n^(1+ρ)) space and O(n^ρ) hash probes for each query, where
ρ = ln(1/α) / ln(1/β).
This data structure succeeds with constant probability.
Proof:
Case 1: Probability of collision if there is no ANN.
Consider a point p ∉ b(q, r(1 + ε)) and a hash function g ∈ G. Then
Pr[g(p) = g(q)] ≤ β^k = exp(k · ln β) ≤ exp(ln β · ln(n)/ln(1/β)) = 1/n.
DEFINING VALUES OF PARAMETERS
Hence the expected number of such far points colliding with q in one table is at most n · (1/n) = 1, and in all τ tables it is at most τ.
By Markov's inequality, Pr[≥ 4τ collisions] ≤ τ/(4τ) = 1/4, so Pr[< 4τ collisions] ≥ 3/4.
Case 2: Probability of finding an ANN if there is a NN.
Let p ∈ P be a point with d_H(p, q) ≤ r. For a single table,
Pr[g_i(p) = g_i(q)] ≥ α^k = α^(ln(n)/ln(1/β)) = n^(−ln(1/α)/ln(1/β)) = n^(−ρ)
Pr[g_i hashes p and q to different locations] ≤ 1 − n^(−ρ)
Pr[p and q collide at least once in the τ tables] ≥ 1 − (1 − n^(−ρ))^τ
Setting τ = 2n^ρ gives a probability of at least 1 − (1 − n^(−ρ))^(2n^ρ) ≥ 1 − 1/e² > 4/5.
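A quick numeric check of this bound (illustrative; n and ρ are sample values):

```python
import math

n, rho = 10000, 0.5          # sample values only; in our setting rho <= 1/(1+eps)
tau = 2 * n ** rho
p_success = 1 - (1 - n ** (-rho)) ** tau
print(p_success, ">=", 1 - 1 / math.e ** 2, "> 4/5")
```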
LEMMA
There exists an NNbr≈(P, r, (1 + ε)r) which uses O(dn + n^(1+ρ)) space and O(n^ρ) hash probes for each query. The probability of success is a constant.
Claim used: for α = 1 − r/d and β = 1 − (1+ε)r/d, we have ρ = ln(1/α)/ln(1/β) ≤ 1/(1+ε).
TIME COMPLEXITY
Putting the value of ρ (which, by the claim, is at most 1/(1+ε)) into the statement of Theorem 1.1 gives the space and time complexity of LSH.
Theorem (1.2): There exists an NNbr≈(P, r, (1 + ε)r) using O(dn + n^(1+1/(1+ε))) space and O(n^(1/(1+ε))) hash probes per query, succeeding with constant probability.
For amplifying the success probability to high probability, we build O(log n) independent structures, which multiplies the space and the number of probes by O(log n).
LSH AND ANN IN
EUCLIDEAN SPACE
LOCALITY SENSITIVE HASHING
• Hashes similar input items into the same "buckets" with high probability.
• Hash collisions are maximized, not minimized.
LSH
Family of hash functions:
• Map close points to the same buckets and faraway points to different buckets.
• Choose a random function and hash P.
• Only store non-empty buckets.
• Hash q into the table.
• Test every point in q's bucket for an ANN.
Problem: q's bucket may be empty.
LSH
• Solution: use a number of hash tables; we are done if any ANN is found.
• Problem: with poor resolution there are too many candidates, the search stops after reaching a limit, and the success probability is small.
We want to find a hash family F such that, for h picked randomly from F, with r < R and α > β:
• If p ∈ b(q, r), then Pr[h(p) = h(q)] ≥ α.
• If p ∉ b(q, R), then Pr[h(p) = h(q)] ≤ β.
2-STABLE DISTRIBUTION
Let X = (X_1, …, X_d) be a vector of d independent variables, each with distribution N(0, 1).
For any v = (v_1, …, v_d) ∈ R^d, we have that v · X = Σ_i v_i X_i is distributed as ‖v‖ Z, where Z ∼ N(0, 1).
A d-dimensional distribution with this property is called a 2-stable distribution.
Gaussian (normal) distributions are 2-stable.
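A quick empirical illustration (my own sketch) of 2-stability: the dot product v · X behaves like ‖v‖ times a standard normal, so its standard deviation is about ‖v‖.

```python
import math
import random

d = 16
v = [random.uniform(-1, 1) for _ in range(d)]
norm_v = math.sqrt(sum(x * x for x in v))

# Sample v . X for X ~ N(0, I_d); the standard deviation should be close to ||v||.
samples = []
for _ in range(50000):
    X = [random.gauss(0, 1) for _ in range(d)]
    samples.append(sum(vi * xi for vi, xi in zip(v, X)))

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(std, "vs ||v|| =", norm_v)
```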
LSH BY PROJECTIONS
The idea:
• The hash function is a projection onto a line of random orientation.
• One composite hash function is a random grid.
• Hashing buckets are grid cells.
• Multiple grids are used for probability amplification.
• Jitter the grid offset randomly (check only one cell).
• Double hashing: do not store empty grid cells.
PROJECTION
• The idea: if two points are close together, then after projection these points will remain close together.
• Let p, q be two points in R^d. We want to decide whether ‖p − q‖ ≤ 1 or ‖p − q‖ ≥ η, where η = 1 + ε. We randomly choose a vector v from the d-dimensional normal distribution N^d(0, 1) (which is 2-stable).
• r is a parameter which signifies the size of the bucket, and t is a random number from the interval [0, r].
• For p ∈ R^d, consider the random hash function
h(p) = ⌊(p · v + t) / r⌋.
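A minimal sketch of this projection hash (illustrative; the helper name and sample values are mine):

```python
import math
import random

def make_projection_hash(d: int, r: float):
    """h(p) = floor((p . v + t) / r), with v ~ N(0, I_d) and t uniform in [0, r]."""
    v = [random.gauss(0, 1) for _ in range(d)]
    t = random.uniform(0, r)
    def h(p):
        return math.floor((sum(pi * vi for pi, vi in zip(p, v)) + t) / r)
    return h

# Nearby points usually share a bucket; distant points usually do not.
h = make_projection_hash(d=8, r=4.0)
p = [0.0] * 8
q = [0.1] * 8
print(h(p), h(q))
```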
LSH
If p and q are at distance η from each other, and the distance between their projections onto v is t, then the probability that they get the same hash value is 1 − t/r. The probability of collision is
α(η) = Pr[h(p) = h(q)] = ∫_0^r Pr[|p · v − q · v| = t] (1 − t/r) dt.
Since v comes from a 2-stable distribution, we have that p · v − q · v = (p − q) · v ∼ N(0, ‖p − q‖²). Taking the absolute value multiplies the density by 2:
α(η, r) = ∫_0^r (2 / (η√(2π))) exp(−t²/(2η²)) (1 − t/r) dt.
We would like to make the difference α(1 + ε, r) − α(1, r) as large as possible by choosing the right value of r. Through numerical computation we obtain that we can choose an r such that
ρ(1 + ε) = ln(1/α(1, r)) / ln(1/α(1 + ε, r)) ≤ 1/(1 + ε).
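A small numerical sketch (mine, with arbitrary example values ε = 1, r = 4) that evaluates this integral and the resulting exponent ρ(1 + ε):

```python
import math

def alpha(eta: float, r: float, steps: int = 20000) -> float:
    """Midpoint-rule integration of alpha(eta, r):
    int_0^r 2/(eta*sqrt(2*pi)) * exp(-t^2/(2*eta^2)) * (1 - t/r) dt."""
    total, dt = 0.0, r / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        density = 2.0 / (eta * math.sqrt(2 * math.pi)) * math.exp(-t * t / (2 * eta * eta))
        total += density * (1 - t / r) * dt
    return total

eps, r = 1.0, 4.0
a_close, a_far = alpha(1.0, r), alpha(1.0 + eps, r)
rho = math.log(1 / a_close) / math.log(1 / a_far)
print(a_close, a_far, rho, "<= 1/(1+eps) =", 1 / (1 + eps))
```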
LOCALITY SENSITIVE HASH FUNCTIONS
Theorem 2.1: For a set P of n points in R^d, with ε > 0 and r > 0, we can construct NNbr≈ = NNbr≈(P, r, (1 + ε)r) such that, for a query point q:
• If b(q, r) ∩ P ≠ ∅, then NNbr≈ returns a point u ∈ P such that ‖u − q‖ ≤ (1 + ε)r.
• If b(q, (1 + ε)r) ∩ P = ∅, then NNbr≈ returns that no point is within distance r of q.
In any other case, either answer is acceptable.
Performance: the query time is O(d n^(1/(1+ε)) log n) and the space used is O(dn + n^(1+1/(1+ε)) log n). The result returned is correct with high probability.
ANN IN HIGH DIMENSIONAL EUCLIDEAN SPACE
Recap:
Theorem (2.2): Given a set P of n points in R^d, one can construct a data structure D that answers (1 + ε)-ANN queries by performing O(log(n/ε)) NNbr≈ queries.
The total number of points stored at the NNbr≈ data structures of D is O(n ε^(−1) log(n/ε)).
This theorem requires constructing a low-quality HST, but such constructions are exponential in the dimension or take quadratic time. So we need a faster scheme.
THE OVERALL RESULT
Given a set P of n points in R^d and parameters ε > 0 and r > 0, one can build an ANN data structure using
O(dn + n^(1+1/(1+ε)) ε^(−2) log³(n/ε))
space, such that given a query point q, one can return a (1 + ε)-ANN in P in
O(d n^(1/(1+ε)) (log n) log(n/ε))
time. The result returned is correct with high probability.
The construction time is O(dn + n^(1+1/(1+ε)) ε^(−2) log³(n/ε)).
THE OVERALL RESULT
Proof:
Theorem (2.3): Let P be a set of n points in R^d. One can compute an nd-HST of P in O(nd log² n) time (note that the constant hidden by the O notation does not depend on d).
We compute the low-quality HST using Theorem (2.3). This takes O(nd log² n) time. Using this HST, we can construct the data structure D of Theorem (2.2), where we do not yet compute the NNbr≈ data structures. We then traverse the tree D and construct the NNbr≈ data structures using Theorem 2.1.
We only need to prove the bound on the space. We need to store each point only once, since other places can refer to the point by a pointer; this gives the O(nd) term of the space requirement. The other term comes from plugging the bound of Theorem 2.2 on the number of stored points into the space bound of Theorem 2.1.

Editor's Notes

  • #5: As we know, volume is related exponentially to the order of the radius. To relax the problem, we solve it approximately.
  • #8: A hypercube is an n-dimensional analogue of a square and a cube. It consists of groups of opposite parallel line segments aligned in each of the space's dimensions, perpendicular to each other and of the same length. Every point is a binary string. Hamming distance (r): the number of differing coordinates.
  • #10: If there is a NN, return yes and output one ANN.
  • #11: If there is no ANN, return no.
  • #12: Otherwise, return either.
  • #13: From this slide we can conclude that, if we can construct an (r, R, α, β)-sensitive family, then we can distinguish between two points which are close together and two points which are far away from each other. Of course, the probabilities α and β might be very close to each other, and we need a way to do amplification.
  • #14: Here we can see the hashing working visually.
  • #15: Defining a family F. Intuition: compare a random coordinate. The hash function is a projection onto a line of random orientation.
  • #16: Intuitively, G is a family that extends F by probing k coordinates instead of only one coordinate. We compare k random coordinates; one composite hash function is a random grid.
  • #17: We calculate the value of the function h for a point k times, hence the probability is raised to the power k.
  • #18: Let τ be (yet another) parameter to be specified shortly. We pick g_1, …, g_τ functions randomly and uniformly from G. For each point u ∈ P we compute g_1(u), …, g_τ(u). We construct a hash table H_i to store all the values of g_i(p_1), …, g_i(p_n), for i = 1, …, τ.
  • #19: Given a query point q ∈ H^d, we compute g_1(q), …, g_τ(q), and retrieve all the points stored in those buckets in the hash tables H_1, …, H_τ, respectively. For every point retrieved, we compute its distance to q, and if this distance is ≤ R we return it. If we encounter more than 4τ points, we abort and return "fail". If no close point is encountered, we return that there is no near point.