Nearest Neighbor Search (similarity search): the general problem is, given a set of objects (e.g., images), to construct a data structure so that later, given a query object, one can efficiently find the most similar object from the database.
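As a point of reference, the trivial solution builds no data structure at all and compares the query against every object. The sketch below assumes objects are equal-length bit vectors compared under Hamming distance; the function names and the toy data are illustrative and do not come from the talk.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of coordinates where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(database: List[Sequence[int]], query: Sequence[int]) -> int:
    """Exact nearest neighbor by linear scan.

    Returns the index of the closest database object.
    Cost per query is O(n * d) for n objects of dimension d.
    """
    return min(range(len(database)), key=lambda i: hamming(database[i], query))


# Toy data (assumed for illustration only).
database = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
query = [0, 1, 1, 0, 0, 0]

best = nearest_neighbor(database, query)
print(best, database[best])
```

The linear scan is exact but touches the whole database on every query, which is the cost that indexing and hashing schemes try to avoid.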

Streaming framework: we must solve a problem on a large collection of items that is streamed through once (i.e., the algorithm's memory footprint is much smaller than the dataset itself). For example, how can a router with 1 Mb of memory estimate the number of distinct IP addresses it sees in a multi-gigabyte stream of real-time traffic?
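One standard way to approach the distinct-count question is a "k minimum values" sketch: hash every item to a pseudo-random point in [0, 1) and remember only the k smallest hash values seen. This is a minimal sketch of that idea, not necessarily the method presented in the talk; the helper names, the use of SHA-1 as the hash, and the value of k are assumptions made for illustration.

```python
import hashlib
import heapq


def hash01(item: str) -> float:
    """Map an item to a pseudo-random point in [0, 1) via SHA-1 (assumed hash)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, k: int = 256) -> int:
    """Estimate the number of distinct items in one pass, storing k hash values."""
    heap = []     # max-heap (negated values) holding the k smallest hashes
    kept = set()  # the same hashes, used to skip repeated items
    for item in stream:
        h = hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h replaces the current largest of the k smallest hashes.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct items seen: count is exact
    # If n distinct items hash uniformly, the k-th smallest hash is about k / n.
    return int((k - 1) / -heap[0])


# Synthetic traffic: 10,000 distinct addresses, each seen twice.
addresses = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
print(estimate_distinct(addresses + addresses))
```

Memory is proportional to k regardless of stream length, and the relative error shrinks roughly like 1/sqrt(k).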

Parallel framework: we look at problems where neither the data nor the output fits on a single machine. For example, given a set of 2D points, how can we compute the minimum spanning tree over a cluster of machines?

The focus will be on techniques such as sketching, dimensionality reduction, sampling, hashing, and others.


- 1. Sketching, Sampling and other Sublinear Algorithms: Nearest Neighbor Search. Alex Andoni (MSR SVC)
- 2. Nearest Neighbor Search (NNS)
- 3. Motivation
  - Generic setup: points model objects (e.g. images); distance models a (dis)similarity measure
  - Application areas: machine learning (k-NN rule); speech/image/video/music recognition, vector quantization, bioinformatics, etc.
  - Distance can be: Hamming, Euclidean, edit distance, Earth-mover distance, etc.
  - Primitive for other problems: find the similar pairs in a set D, clustering, ...
  - (Figure: two example objects drawn as 6×6 binary bitmaps.)
- 4. Lecture Plan: (1) Locality-Sensitive Hashing, (2) LSH as a Sketch, (3) Towards Embeddings
- 5. 2D case
- 6. High-dimensional case (table with columns Algorithm, Query time, Space; rows: full indexing, and no indexing, i.e. linear scan)
- 7. Approximate NNS (figure: query q, point p, radii r and cr)
- 8. Heuristic for Exact NNS (figure: query q, point p, radii r and cr)
- 9. Approximation Algorithms for NNS. A vast literature:
  - milder dependence on dimension: [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman-Wu’98], [Kleinberg’97], [Har-Peled’02], … [Aiger-Kaplan-Sharir’13]
  - little to no dependence on dimension: [Indyk-Motwani’98], [Kushilevitz-Ostrovsky-Rabani’98], [Indyk’98, ’01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A-Indyk’06], … [A-Indyk-Nguyen-Razenshteyn’??]
- 10. Locality-Sensitive Hashing [Indyk-Motwani’98] (figure labels: q, p, 1, “not-so-small”)
- 11. Locality sensitive hash functions
- 12. Formal description
- 13. Analysis of LSH Scheme
- 14. Analysis: Correctness
- 15. Analysis: Runtime
- 16. LSH in the wild: safety not guaranteed; fewer false positives; fewer tables
- 17. LSH Zoo (figure: the phrases “To be or not to be” and “To sketch or not to sketch” represented as word-count vectors …21102… and …01122…, as bit vectors …11101… and …01111…, and as word sets {be,not,or,to} and {not,or,to,sketch})
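
The slide bodies for the LSH scheme did not survive in this transcript, so the following is only a minimal sketch of the classic bit-sampling construction for Hamming distance, in the style of [Indyk-Motwani’98]. Each hash table keys a point by k randomly chosen coordinates; a query looks up its bucket in every table and scans only the candidates found there. The class name, the parameters `k` and `num_tables`, and the random test data are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Hypothetical minimal LSH index for bit vectors under Hamming distance."""

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        # One hash function per table: a list of k sampled coordinate indices.
        self.projections = [
            [rng.randrange(dim) for _ in range(k)] for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    @staticmethod
    def _key(point, coords):
        return tuple(point[i] for i in coords)

    def add(self, point):
        idx = len(self.points)
        self.points.append(point)
        for coords, table in zip(self.projections, self.tables):
            table[self._key(point, coords)].append(idx)

    def query(self, q):
        """Return the index of the closest colliding point, or None if no collision."""
        candidates = set()
        for coords, table in zip(self.projections, self.tables):
            candidates.update(table.get(self._key(q, coords), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: hamming(self.points[i], q))


# Random 64-bit points (assumed data), then a query that is a slightly
# perturbed copy of one of them.
rng = random.Random(1)
points = [[rng.randint(0, 1) for _ in range(64)] for _ in range(200)]
index = BitSamplingLSH(dim=64, k=8, num_tables=10)
for p in points:
    index.add(p)

q = list(points[5])
q[0] ^= 1  # flip one bit
print(index.query(q))
```

Nearby points agree on most coordinates, so they are likely to share a bucket in at least one table, while distant points rarely do. Larger `k` reduces false positives per table, and more tables raise the chance that a true near neighbor is found; a query can still miss, which is why the result is approximate.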
