2. Introduction
The Internet has brought us a wealth of data, all now available
at our fingertips.
Even with the rapid growth of computing power, we do not have
the processing power to search this amount of data by brute
force.
3. Finding similar objects
Given a query point, we wish to find the points in a large
dataset that are closest to the query.
In many applications, objects are not identical, yet they share
large portions of their content. Examples:
i. Movie ratings
ii. Online purchasing
iii. Articles from the same source
4. ● This problem can be solved easily by iterating
through each point in the database and calculating
its distance to the query object.
● But our database may contain billions of objects,
each described by a vector with hundreds of
dimensions.
● Therefore, the processing time grows linearly with
the number of items and the complexity of the objects.
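The brute-force approach described above can be sketched as follows. This is a minimal illustration, not code from the slides: the function name `nearest_neighbor` and the choice of Euclidean distance are assumptions. It touches every one of the n points and every one of the d dimensions, so the cost per query is proportional to n times d.

```python
import math


def nearest_neighbor(query, database):
    """Linear scan: return (index, distance) of the point closest to `query`.

    Hypothetical helper for illustration; assumes Euclidean distance and
    that every point has the same number of dimensions as the query.
    """
    best_index, best_distance = -1, math.inf
    for index, point in enumerate(database):
        # One distance computation per stored point: O(d) work each time.
        distance = math.dist(query, point)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


if __name__ == "__main__":
    points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    print(nearest_neighbor((0.9, 1.2), points))
```

With billions of points and hundreds of dimensions, this loop becomes the bottleneck that motivates the next slide.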
5. LSH
LSH (locality-sensitive hashing) allows us to quickly find
similar entries in large databases.
LSH is a randomized algorithm, which means it does not
guarantee an exact answer but instead provides a
high-probability guarantee that it will return the correct
answer or one close to it.
LSH reduces the dimensionality of high-dimensional data,
and it does not depend on a linear search of the database.
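To make the idea concrete, here is a minimal sketch of one possible LSH scheme. The slides do not name a hash family, so the choice of random-hyperplane hashing (suited to cosine similarity) is an assumption, as are the class name `RandomHyperplaneLSH` and its parameters. Each vector is reduced to a short bit signature, and a query is compared only with the entries that share its signature instead of with the whole database.

```python
import random
from collections import defaultdict


class RandomHyperplaneLSH:
    """Hypothetical single-table LSH index using random hyperplanes."""

    def __init__(self, dim, num_bits=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane contributes one bit of the signature.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
        ]
        self.buckets = defaultdict(list)

    def _signature(self, vector):
        # Bit i records which side of hyperplane i the vector lies on.
        return tuple(
            sum(p * v for p, v in zip(plane, vector)) >= 0.0
            for plane in self.planes
        )

    def add(self, key, vector):
        self.buckets[self._signature(vector)].append(key)

    def candidates(self, vector):
        # Only the matching bucket is inspected, not the whole database.
        return list(self.buckets.get(self._signature(vector), []))


if __name__ == "__main__":
    index = RandomHyperplaneLSH(dim=3, num_bits=8, seed=1)
    index.add("a", [1.0, 2.0, 3.0])
    print(index.candidates([2.0, 4.0, 6.0]))
```

Vectors separated by a small angle agree on most bits and are likely, though not certain, to share a bucket, which is where the probabilistic guarantee comes from. Practical systems usually build several such tables with independent hyperplanes to reduce the chance of missing a true neighbor, then rank the returned candidates by exact distance.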