ANN IN HIGHER
DIMENSIONS
Shelly Chouhan
190101082
Shrey Verma 190101083
CONTENTS
• ANN on the Hypercube
  • Hypercube and Hamming distance
  • Constructing NNbr≈ for the Hamming cube
  • Construction of the near-neighbour data structure
• LSH and ANN in Euclidean Space
  • Preliminaries
  • Locality Sensitive Hashing
  • ANN in High-Dimensional Euclidean Space
NEAREST NEIGHBOR SEARCH
Given a set P of n distinct points in d-dimensional space, we have to build a data structure which, given a query point q ∈ R^d, returns a point p ∈ P minimizing ‖p − q‖.
CURSE OF DIMENSIONALITY
• In R^2, the Voronoi diagram has size O(n) and a query takes O(log n) time.
• In R^d, the Voronoi diagram has size O(n^⌈d/2⌉).
• We can also perform a linear scan: O(dn) space, O(dn) query time.
• The volume-to-distance ratio explodes with the dimension.
(1+ε)-APPROXIMATE NEAREST NEIGHBOR SEARCH
(1+ε)-approximate nearest neighbor search is a relaxation of the nearest neighbor search problem.
A solution to the (1+ε)-approximate nearest neighbor search is any point within distance cr = (1+ε)r of the query point, where r is the distance between the query point and its true nearest neighbor.
ANN ON THE
HYPERCUBE
HYPERCUBE AND HAMMING DISTANCE
The set of points H^d = {0, 1}^d is the d-dimensional hypercube.
A point p = (p_1, …, p_d) ∈ H^d can be interpreted, naturally, as the binary string p_1 p_2 … p_d.
The Hamming distance d_H(p, q) between p, q ∈ H^d is the number of coordinates where p and q disagree.
Examples: 100→011 has distance 3; 010→111 has distance 2.
HYPERCUBE AND HAMMING DISTANCE
Examples: 0100→1001 has distance 3; 0110→1110 has distance 1.
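As a quick illustrative sketch (not part of the slides), Hamming distance on bit strings in Python, checked against the examples above:

```python
def hamming_distance(p: str, q: str) -> int:
    """Number of coordinates where the bit strings p and q disagree."""
    assert len(p) == len(q)
    return sum(1 for a, b in zip(p, q) if a != b)

# Examples from the slides:
assert hamming_distance("100", "011") == 3
assert hamming_distance("010", "111") == 2
assert hamming_distance("0100", "1001") == 3
assert hamming_distance("0110", "1110") == 1
```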
NNBR≈ DATA STRUCTURE
Our goal is to create a data structure that solves the ANN problem.
This data structure works as follows for a given query point q:
• If d(q, P) ≤ r, then NNbr≈ outputs a point p ∈ P such that d(p, q) ≤ (1 + ε)r.
• If d(q, P) ≥ (1 + ε)r, then NNbr≈ outputs that "d(q, P) ≥ r".
• If r ≤ d(q, P) ≤ (1 + ε)r, either of the above answers is acceptable.
Given such a data structure NNbr≈(P, r, (1+ε)r), one can construct a data structure that answers ANN queries using O(log(n/ε)) NNbr≈ queries.
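To make this contract concrete, here is a brute-force reference sketch of what any NNbr≈ structure must return; the function name and signature are mine, for illustration only.

```python
from typing import Callable, Optional, Sequence

def nnbr_brute_force(q, P: Sequence, r: float, eps: float,
                     dist: Callable) -> Optional[object]:
    """Brute-force reference for the NNbr~ contract:
    - if d(q, P) <= r, return some p with d(p, q) <= (1 + eps) * r;
    - if d(q, P) >= (1 + eps) * r, return None, meaning "d(q, P) >= r";
    - in between, either answer is acceptable.
    """
    best = min(P, key=lambda p: dist(p, q))
    return best if dist(best, q) <= (1 + eps) * r else None
```

The point of the LSH construction below is to achieve this behaviour in sublinear time rather than by the linear scan used here.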
CONSTRUCTING NNBR≈ FOR THE HAMMING CUBE
We are interested in building an NNbr≈ for balls of radius r in the Hamming distance.
A family F = {h : S → [0, U]} of functions is (r, R, α, β)-sensitive if, for any p, q ∈ S:
• If p ∈ b(q, r), then Pr[h(p) = h(q)] ≥ α.
• If p ∉ b(q, R), then Pr[h(p) = h(q)] ≤ β.
Here h is picked randomly from F, r < R, and α > β.
DEFINE A FAMILY F
For the hypercube H^d = {0, 1}^d and a point b = (b_1, …, b_d) ∈ H^d, let F be the set of functions
F = { h_i(b) = b_i | b = (b_1, …, b_d) ∈ H^d, for i = 1, …, d }.
Then for any r and ε, the family F is (r, (1+ε)r, 1 − r/d, 1 − r(1+ε)/d)-sensitive.
Proof:
If u, v ∈ {0, 1}^d are at distance smaller than r from each other (under the Hamming distance), then they differ in at most r coordinates. The probability that a random h ∈ F projects onto a coordinate on which u and v agree is therefore ≥ 1 − r/d.
Similarly, if d_H(u, v) ≥ (1 + ε)r, then the probability that h maps onto a coordinate on which u and v agree is ≤ 1 − r(1+ε)/d.
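A minimal simulation sketch of this coordinate-projection family (my own illustration; the values d = 32, r = 4, ε = 1 are arbitrary). It empirically checks the collision probabilities at the boundary distances r and (1+ε)r:

```python
import random

d, r, eps = 32, 4, 1.0

def sample_h():
    """Pick h_i uniformly at random from F: h(b) returns the i-th bit of b."""
    i = random.randrange(d)
    return lambda b: b[i]

def flip(p, k):
    """Return a copy of p with exactly k coordinates flipped."""
    q = p[:]
    for i in random.sample(range(d), k):
        q[i] ^= 1
    return q

def collision_rate(p, q, trials=20000):
    """Fraction of randomly drawn h in F with h(p) == h(q)."""
    hits = 0
    for _ in range(trials):
        h = sample_h()
        hits += h(p) == h(q)
    return hits / trials

u = [random.randint(0, 1) for _ in range(d)]
close = flip(u, r)                    # d_H(u, close) = r
far = flip(u, int((1 + eps) * r))     # d_H(u, far) = (1 + eps) * r

print(collision_rate(u, close), "vs alpha =", 1 - r / d)
print(collision_rate(u, far), "vs beta  =", 1 - (1 + eps) * r / d)
```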
DEFINE A FAMILY G TO EXTEND F
Let k be a parameter to be specified shortly. Let
G(F) = { g : {0,1}^d → {0,1}^k | g(u) = (h_1(u), …, h_k(u)), for h_1, …, h_k ∈ F }.
DEFINE A FAMILY G TO EXTEND F
Values of α and β for G:
α′ = (1 − r/d)^k = α^k
β′ = (1 − r(1+ε)/d)^k = β^k
Thus the family G is (r, R, α^k, β^k)-sensitive.
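A member of G is just a tuple of k independently chosen coordinate probes; a small sketch (illustrative, names are mine):

```python
import random

def sample_g(d: int, k: int):
    """g(u) = (h_1(u), ..., h_k(u)): probe k independently chosen coordinates of u."""
    idx = [random.randrange(d) for _ in range(k)]
    return lambda u: tuple(u[i] for i in idx)

# Example: g maps a point of the 8-dimensional cube to a 3-bit bucket key.
g = sample_g(d=8, k=3)
print(g([0, 1, 1, 0, 1, 0, 0, 1]))
```

Since the k coordinates are chosen independently, the collision probability of g is the k-th power of the collision probability of a single h, which is where α^k and β^k come from.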
CONSTRUCTING HASH TABLES
• Choose g_1, g_2, …, g_τ uniformly at random from G.
• Construct τ hash tables H_1, H_2, …, H_τ.
• Hash all the points in P: store g_i(p_1), …, g_i(p_n) in table H_i.
• τ is a parameter we will choose later.
CONSTRUCTING HASH TABLES
When a query point q is given:
• Hash q with each of g_1, g_2, …, g_τ.
• Check the colliding points for an ANN.
• Stop if more than 4τ collisions are seen and return "fail".
(See the sketch below.)
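A compact sketch of the preprocessing and query steps on the Hamming cube (illustrative only; the function names are mine, and sample_g is repeated from the earlier sketch so the block is self-contained):

```python
import random
from collections import defaultdict

def sample_g(d, k):
    idx = [random.randrange(d) for _ in range(k)]
    return lambda u: tuple(u[i] for i in idx)

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def preprocess(P, d, k, tau):
    """Build tau hash tables; table i keys every point of P by g_i."""
    gs = [sample_g(d, k) for _ in range(tau)]
    tables = [defaultdict(list) for _ in range(tau)]
    for p in P:
        for g, H in zip(gs, tables):
            H[g(p)].append(p)
    return gs, tables

def query(q, gs, tables, r, eps, tau):
    """Scan q's buckets; return a (1+eps)r-near point, or None after 4*tau far collisions."""
    checked = 0
    for g, H in zip(gs, tables):
        for p in H.get(g(q), []):
            if hamming(p, q) <= (1 + eps) * r:
                return p
            checked += 1
            if checked > 4 * tau:
                return None   # too many collisions: report "d(q, P) >= r"
    return None
```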
CHOOSING PARAMETERS
We choose k and τ so that, with constant probability (say larger than half), we have the following two properties:
1. If there is a point u ∈ P such that d_H(u, q) ≤ r, then g_j(u) = g_j(q) for some j.
2. The number of points p ∈ P with d_H(p, q) ≥ (1+ε)r that collide with q in the τ hash tables is smaller than 4τ.
Define: ρ = ln(1/α) / ln(1/β)
Choose: k = ln(n) / ln(1/β), τ = 2n^ρ
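For concreteness, a small helper (my own sketch, with arbitrary example values) that derives α, β, ρ, k and τ from n, d, r and ε for the coordinate family above:

```python
import math

def lsh_parameters(n: int, d: int, r: float, eps: float):
    """alpha, beta for the coordinate family; then rho, k, tau as chosen on the slide."""
    alpha = 1 - r / d
    beta = 1 - (1 + eps) * r / d
    rho = math.log(1 / alpha) / math.log(1 / beta)
    k = math.ceil(math.log(n) / math.log(1 / beta))
    tau = math.ceil(2 * n ** rho)
    return alpha, beta, rho, k, tau

# Example: n = 10000 points on the 128-dimensional Hamming cube, r = 8, eps = 1.
print(lsh_parameters(10000, 128, 8, 1.0))
```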
DEFINING VALUES OF PARAMETERS
Theorem (1.1): If there is an (r, R, α, β)-sensitive family F of functions for the hypercube, then there exists an NNbr≈(P, r, (1 + ε)r) which uses O(dn + n^(1+ρ)) space and O(n^ρ) hash probes for each query, where
ρ = ln(1/α) / ln(1/β).
This data structure succeeds with constant probability.
Proof:
Case 1: Probability of collision if there is no ANN.
Consider a point p ∉ b(q, r(1 + ε)) and a hash function g ∈ G. Then
Pr[g(p) = g(q)] ≤ β^k = exp(k · ln β) ≤ exp(ln β · ln(n)/ln(1/β)) = 1/n.
DEFINING VALUES OF PARAMETERS
Hence the expected number of such far points colliding with q in one table is at most n · (1/n) = 1, and in all τ tables it is at most τ.
By Markov's inequality, Pr[≥ 4τ collisions] ≤ τ/(4τ) = 1/4, so Pr[< 4τ collisions] ≥ 3/4.
Case 2: Probability of finding an ANN if there is a NN.
Let p ∈ P be a point with d_H(p, q) ≤ r. For a single table,
Pr[g_i(p) = g_i(q)] ≥ α^k = α^(ln(n)/ln(1/β)) = n^(−ln(1/α)/ln(1/β)) = n^(−ρ)
Pr[g_i hashes p and q to different locations] ≤ 1 − n^(−ρ)
Pr[p and q collide at least once in the τ tables] ≥ 1 − (1 − n^(−ρ))^τ
Setting τ = 2n^ρ gives a probability of at least 1 − (1 − n^(−ρ))^(2n^ρ) ≥ 1 − 1/e² > 4/5.
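A quick numeric check of this bound (illustrative; n and ρ are sample values):

```python
import math

n, rho = 10000, 0.5          # sample values only; in our setting rho <= 1/(1+eps)
tau = 2 * n ** rho
p_success = 1 - (1 - n ** (-rho)) ** tau
print(p_success, ">=", 1 - 1 / math.e ** 2, "> 4/5")
```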
LEMMA
There exists an NNbr≈(P, r, (1 + ε)r) which uses O(dn + n^(1+ρ)) space and O(n^ρ) hash probes for each query. The probability of success is a constant.
Claim used: for α = 1 − r/d and β = 1 − (1+ε)r/d, we have ρ = ln(1/α)/ln(1/β) ≤ 1/(1+ε).
TIME COMPLEXITY
Putting the value of ρ (which, by the claim, is at most 1/(1+ε)) into the statement of Theorem 1.1 gives the space and time complexity of LSH.
Theorem (1.2): There exists an NNbr≈(P, r, (1 + ε)r) using O(dn + n^(1+1/(1+ε))) space and O(n^(1/(1+ε))) hash probes per query, succeeding with constant probability.
For amplifying the success probability to high probability, we build O(log n) independent structures, which multiplies the space and the number of probes by O(log n).
LSH AND ANN IN
EUCLIDEAN SPACE
LOCALITY SENSITIVE HASHING
• Hashes similar input items into the same "buckets" with high probability.
• Hash collisions are maximized, not minimized.
LSH
Family of hash functions:
• Map close points to the same buckets and faraway points to different buckets.
• Choose a random function and hash P.
• Only store non-empty buckets.
• Hash q into the table.
• Test every point in q's bucket for an ANN.
Problem: q's bucket may be empty.
LSH
• Solution: use a number of hash tables; we are done if any ANN is found.
• Problem: with poor resolution there are too many candidates, the search stops after reaching a limit, and the success probability is small.
We want to find a hash family F such that, for h picked randomly from F, with r < R and α > β:
• If p ∈ b(q, r), then Pr[h(p) = h(q)] ≥ α.
• If p ∉ b(q, R), then Pr[h(p) = h(q)] ≤ β.
2-STABLE DISTRIBUTION
Let X = (X_1, …, X_d) be a vector of d independent variables, each with distribution N(0, 1).
For any v = (v_1, …, v_d) ∈ R^d, we have that v · X = Σ_i v_i X_i is distributed as ‖v‖ Z, where Z ∼ N(0, 1).
A d-dimensional distribution with this property is called a 2-stable distribution.
Gaussian (normal) distributions are 2-stable.
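A quick empirical illustration (my own sketch) of 2-stability: the dot product v · X behaves like ‖v‖ times a standard normal, so its standard deviation is about ‖v‖.

```python
import math
import random

d = 16
v = [random.uniform(-1, 1) for _ in range(d)]
norm_v = math.sqrt(sum(x * x for x in v))

# Sample v . X for X ~ N(0, I_d); the standard deviation should be close to ||v||.
samples = []
for _ in range(50000):
    X = [random.gauss(0, 1) for _ in range(d)]
    samples.append(sum(vi * xi for vi, xi in zip(v, X)))

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(std, "vs ||v|| =", norm_v)
```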
LSH BY PROJECTIONS
The idea:
• The hash function is a projection onto a line of random orientation.
• One composite hash function is a random grid.
• Hashing buckets are grid cells.
• Multiple grids are used for probability amplification.
• Jitter the grid offset randomly (check only one cell).
• Double hashing: do not store empty grid cells.
PROJECTION
• The idea: if two points are close together, then after projection these points will remain close together.
• Let p, q be two points in R^d. We want to decide whether ‖p − q‖ ≤ 1 or ‖p − q‖ ≥ η, where η = 1 + ε. We randomly choose a vector v from the d-dimensional normal distribution N^d(0, 1) (which is 2-stable).
• r is a parameter which signifies the size of the bucket, and t is a random number from the interval [0, r].
• For p ∈ R^d, consider the random hash function
h(p) = ⌊(p · v + t) / r⌋.
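A minimal sketch of this projection hash (illustrative; the helper name and sample values are mine):

```python
import math
import random

def make_projection_hash(d: int, r: float):
    """h(p) = floor((p . v + t) / r), with v ~ N(0, I_d) and t uniform in [0, r]."""
    v = [random.gauss(0, 1) for _ in range(d)]
    t = random.uniform(0, r)
    def h(p):
        return math.floor((sum(pi * vi for pi, vi in zip(p, v)) + t) / r)
    return h

# Nearby points usually share a bucket; distant points usually do not.
h = make_projection_hash(d=8, r=4.0)
p = [0.0] * 8
q = [0.1] * 8
print(h(p), h(q))
```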
LSH
If p and q are at distance η from each other, and the distance between their projections onto v is t, then the probability that they get the same hash value is 1 − t/r. The probability of collision is
α(η) = Pr[h(p) = h(q)] = ∫_0^r Pr[|p · v − q · v| = t] (1 − t/r) dt.
Since v comes from a 2-stable distribution, we have that p · v − q · v = (p − q) · v ∼ N(0, ‖p − q‖²). Taking the absolute value multiplies the density by 2:
α(η, r) = ∫_0^r (2 / (η√(2π))) exp(−t²/(2η²)) (1 − t/r) dt.
We would like to make the difference α(1 + ε, r) − α(1, r) as large as possible by choosing the right value of r. Through numerical computation we obtain that we can choose an r such that
ρ(1 + ε) = ln(1/α(1, r)) / ln(1/α(1 + ε, r)) ≤ 1/(1 + ε).
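A small numerical sketch (mine, with arbitrary example values ε = 1, r = 4) that evaluates this integral and the resulting exponent ρ(1 + ε):

```python
import math

def alpha(eta: float, r: float, steps: int = 20000) -> float:
    """Midpoint-rule integration of alpha(eta, r):
    int_0^r 2/(eta*sqrt(2*pi)) * exp(-t^2/(2*eta^2)) * (1 - t/r) dt."""
    total, dt = 0.0, r / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        density = 2.0 / (eta * math.sqrt(2 * math.pi)) * math.exp(-t * t / (2 * eta * eta))
        total += density * (1 - t / r) * dt
    return total

eps, r = 1.0, 4.0
a_close, a_far = alpha(1.0, r), alpha(1.0 + eps, r)
rho = math.log(1 / a_close) / math.log(1 / a_far)
print(a_close, a_far, rho, "<= 1/(1+eps) =", 1 / (1 + eps))
```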
LOCALITY SENSITIVE HASH FUNCTIONS
Theorem 2.1: For a set P of n points in R^d, with ε > 0 and r > 0, we can construct NNbr≈ = NNbr≈(P, r, (1 + ε)r) such that, for a query point q:
• If b(q, r) ∩ P ≠ ∅, then NNbr≈ returns a point u ∈ P such that ‖u − q‖ ≤ (1 + ε)r.
• If b(q, (1 + ε)r) ∩ P = ∅, then NNbr≈ returns that no point is within distance r of q.
In any other case, either answer is acceptable.
Performance: the query time is O(d n^(1/(1+ε)) log n) and the space used is O(dn + n^(1+1/(1+ε)) log n). The result returned is correct with high probability.
ANN IN HIGH DIMENSIONAL EUCLIDEAN SPACE
Recap:
Theorem (2.2): Given a set P of n points in R^d, one can construct a data structure D that answers (1 + ε)-ANN queries by performing O(log(n/ε)) NNbr≈ queries.
The total number of points stored at the NNbr≈ data structures of D is O(n ε^(−1) log(n/ε)).
This theorem requires constructing a low-quality HST, but such constructions are exponential in the dimension or take quadratic time. So we need a faster scheme.
THE OVERALL RESULT
Given a set P of n points in R^d and parameters ε > 0 and r > 0, one can build an ANN data structure using
O(dn + n^(1+1/(1+ε)) ε^(−2) log³(n/ε))
space, such that given a query point q, one can return a (1 + ε)-ANN in P in
O(d n^(1/(1+ε)) (log n) log(n/ε))
time. The result returned is correct with high probability.
The construction time is O(dn + n^(1+1/(1+ε)) ε^(−2) log³(n/ε)).
THE OVERALL RESULT
Proof:
Theorem (2.3): Let P be a set of n points in R^d. One can compute an nd-HST of P in O(nd log² n) time (note that the constant hidden by the O notation does not depend on d).
We compute the low-quality HST using Theorem (2.3). This takes O(nd log² n) time. Using this HST, we can construct the data structure D of Theorem (2.2), where we do not yet compute the NNbr≈ data structures. We then traverse the tree D and construct the NNbr≈ data structures using Theorem 2.1.
We only need to prove the bound on the space. We need to store each point only once, since other places can refer to the point by a pointer; this gives the O(nd) term of the space requirement. The other term comes from plugging the bound of Theorem 2.2 on the number of stored points into the space bound of Theorem 2.1.

Editor's Notes

  • #5: As we know, volume is related exponentially to the order of the radius. To relax the problem, we solve it approximately.
  • #8: A hypercube is an n-dimensional analogue of a square and a cube. It consists of groups of opposite parallel line segments aligned in each of the space's dimensions, perpendicular to each other and of the same length. Every point is a binary string. Hamming distance (r): the number of differing coordinates.
  • #10: If there is a NN, return yes and output one ANN.
  • #11: If there is no ANN, return no.
  • #12: Otherwise, return either.
  • #13: From this slide we can conclude that, if we can construct an (r, R, α, β)-sensitive family, then we can distinguish between two points which are close together and two points which are far away from each other. Of course, the probabilities α and β might be very close to each other, and we need a way to do amplification.
  • #14: Here we can see the hashing working visually.
  • #15: Defining a family F. Intuition: compare a random coordinate. The hash function is a projection onto a line of random orientation.
  • #16: Intuitively, G is a family that extends F by probing k coordinates instead of only one coordinate. We compare k random coordinates; one composite hash function is a random grid.
  • #17: We calculate the value of the function h for a point k times, hence the probability is raised to the power k.
  • #18: Let τ be (yet another) parameter to be specified shortly. We pick g_1, …, g_τ functions randomly and uniformly from G. For each point u ∈ P we compute g_1(u), …, g_τ(u). We construct a hash table H_i to store all the values of g_i(p_1), …, g_i(p_n), for i = 1, …, τ.
  • #19: Given a query point q ∈ H^d, we compute g_1(q), …, g_τ(q), and retrieve all the points stored in those buckets in the hash tables H_1, …, H_τ, respectively. For every point retrieved, we compute its distance to q, and if this distance is ≤ R we return it. If we encounter more than 4τ points, we abort and return "fail". If no close point is encountered, we return that there is no near point.