Nearest Neighbor Search (similarity search): the general problem is, given a set of objects (e.g., images), to construct a data structure so that later, given a query object, one can efficiently find the most similar object from the database.

Streaming framework: we must solve a problem on a large collection of items that we stream through only once (i.e., the algorithm's memory footprint is much smaller than the dataset itself). For example, how can a router with 1 MB of memory estimate the number of distinct IP addresses it sees in a multi-gigabyte stream of real-time traffic?
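One standard way to approach the router example is a "k minimum values" sketch: hash every item to a pseudo-random number in [0, 1), keep only the k smallest distinct hash values, and estimate the distinct count from the k-th smallest one. The following is a minimal sketch under that assumption, not necessarily the method presented in the talk; the names `hash01` and `estimate_distinct` and the default `k` are hypothetical.

```python
import hashlib
import heapq


def hash01(item: str) -> float:
    """Map an item to a pseudo-random float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, k: int = 256) -> float:
    """Estimate the number of distinct items in one pass with O(k) memory."""
    heap = []     # negated hashes: heap[0] is minus the largest kept hash
    kept = set()  # hash values currently stored, used to skip repeats
    for item in stream:
        h = hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is among the k smallest: evict the largest kept hash.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct items: count is exact
    # The k-th smallest of n uniform values is about k / n, so n ~ (k - 1) / v_k.
    return (k - 1) / (-heap[0])
```

The memory used depends only on `k`, not on the stream length; the relative error shrinks roughly like 1/sqrt(k).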

Parallel framework: we look at problems where neither the data nor the output fits on a single machine. For example, given a set of 2D points, how can we compute the minimum spanning tree over a cluster of machines?

The focus will be on techniques such as sketching, dimensionality reduction, sampling, hashing, and others.
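As an illustration of dimensionality reduction in Euclidean space, here is a minimal sketch of a random linear projection in the Johnson-Lindenstrauss style. It assumes a matrix of independent Gaussian entries scaled by 1/sqrt(k), which is one common construction; the function names and parameters are hypothetical and not taken from the slides.

```python
import math
import random


def make_projection(d: int, k: int, seed: int = 0):
    """Return a k x d matrix of N(0, 1) entries scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional point x to k dimensions (matrix-vector product)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def euclidean(x, y):
    """Plain Euclidean (l2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    rng = random.Random(1)
    d, k = 500, 200
    x = [rng.uniform(-1, 1) for _ in range(d)]
    y = [rng.uniform(-1, 1) for _ in range(d)]
    G = make_projection(d, k)
    # The two distances should agree up to a small relative error.
    print(euclidean(x, y), euclidean(project(G, x), project(G, y)))
```

Because the map is linear, the distance between two projected points equals the norm of the projected difference, so it is enough to reason about how a single vector's length is preserved.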


- 1. Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS. Alex Andoni (MSR SVC)
- 2. A Sketching Problem: are "To be or not to be" and "To sketch or not to sketch" similar? Compare short sketches of the two strings (010110 vs. 010101) instead of the strings themselves.
- 3. Sketch from LSH [Broder’97]: for Jaccard coefficient
- 4. General Theory: embeddings. Reduce <problem P under hard metric> to <P under simpler metric> via a map f. Metrics: Euclidean distance (ℓ2), Hamming distance, edit distance between two strings, Earth-Mover (transportation) Distance. Problems: compute distance between two points; diameter/close-pair of a set S; clustering, MST, etc.; Nearest Neighbor Search.
- 5. Embeddings: landscape
- 6. Dimension Reduction
- 7. Main intuition
- 8. 1D embedding
- 9. 1D embedding
- 10. Full Dimension Reduction
- 11. Concentration
- 12. Dimension Reduction: wrap-up
- 13. NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni’04]
- 14. Near-Optimal LSH [A-Indyk’06]: Regular grid → grid of balls. p can hit empty space, so take more such grids until p is in a ball. This needs (too) many grids of balls, so start by projecting into dimension t. Analysis gives [expression not recoverable]. Choice of reduced dimension t? Tradeoff between the number of hash tables (a power of n) and the time to hash, $t^{O(t)}$. Total query time: $d\,n^{1/c^2+o(1)}$. (Figure: a point p in 2D and in $\mathbb{R}^t$.)
- 15. Open question: [Prob. needle of length 1 is not cut] ≥ [Prob. needle of length c is not cut]$^{1/c^2}$
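- 16. Time-Space Trade-offs (table columns: Space, Time, Comment, Reference). Regimes range from low space / high query time, through medium space / medium query time, to high space / low query time (one hash table lookup!). References across the regimes: [KOR’98, IM’98, Pan’06], [IM’98], [DIIM’04, AI’06], [Ind’01, Pan’06], [AI’06]. Lower bounds: space $n^{o(1/\varepsilon^2)}$ requires ω(1) memory lookups [AIP’06]; space $n^{1+o(1/c^2)}$ requires ω(1) memory lookups [PTW’08, PTW’10].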
- 17. NNS beyond LSH
- 18. Finale
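To make the Euclidean-space NNS slides more concrete, here is a minimal sketch of locality-sensitive hashing with hash functions of the form h(x) = floor((a·x + b) / w), where a is a Gaussian vector and b is uniform in [0, w), in the style of [Datar-Immorlica-Indyk-Mirrokni’04]. The class name `EuclideanLSH`, its parameters, and their default values are hypothetical and untuned; this illustrates the idea and is not the speaker's implementation.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Several hash tables, each keyed by a tuple of random-projection buckets."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # For every table: a list of (Gaussian direction a, random offset b).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, t, x):
        """Concatenate the bucket ids floor((a.x + b) / w) for table t."""
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
            for a, b in self.funcs[t]
        )

    def add(self, x):
        """Insert point x into every table and return its index."""
        idx = len(self.points)
        self.points.append(x)
        for t, table in enumerate(self.tables):
            table[self._key(t, x)].append(idx)
        return idx

    def query(self, q):
        """Return the index of the closest colliding point, or None."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], q))
```

Nearby points fall into the same bucket with higher probability than distant ones, so a query only computes exact distances to the points it collides with. More tables raise the chance of finding a near neighbor at the cost of space, which is the kind of time-space trade-off listed on slide 16.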
