Sketching, Sampling, and other Sublinear Algorithms 2 (Lecture by Alex Andoni)

We will learn about modern algorithmic techniques for handling large datasets, often by using imprecise but concise representations of the data such as a sketch or a sample of the data. The lectures will cluster around three themes:

Nearest Neighbor Search (similarity search): the general problem is, given a set of objects (e.g., images), to construct a data structure so that later, given a query object, one can efficiently find the most similar object in the database.
Streaming framework: we are required to solve a certain problem on a large collection of items that one streams through once (i.e., the algorithm's memory footprint is much smaller than the dataset itself). For example, how can a router with 1Mb of memory estimate the number of distinct IPs it sees in multiple gigabytes of real-time traffic?
Parallel framework: we look at problems where neither the data nor the output fits on a single machine. For example, given a set of 2D points, how can we compute the minimum spanning tree over a cluster of machines?

The focus will be on techniques such as sketching, dimensionality reduction, sampling, hashing, and others.

Transcript

  • 1. Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS. Alex Andoni (MSR SVC)
  • 2. A Sketching Problem: are two objects similar? E.g., the bit strings 010110 and 010101, or the texts "To be or not to be" and "To sketch or not to sketch" (compared as sets of words).
  • 3. Sketch from LSH [Broder'97]: for the Jaccard coefficient
  • 4. General Theory: embeddings. Metrics: Euclidean distance (ℓ2), Hamming distance, edit distance between two strings, Earth-Mover (transportation) distance. Problems: compute the distance between two points; diameter/close pair of a set S; clustering, MST, etc.; Nearest Neighbor Search. An embedding f reduces <P under hard metric> to <P under simpler metric>.
  • 5. Embeddings: landscape
  • 6. Dimension Reduction
  • 7. Main intuition
  • 8. 1D embedding
  • 9. 1D embedding
  • 10. Full Dimension Reduction
  • 11. Concentration
  • 12. Dimension Reduction: wrap-up
  • 13. NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni'04]
  • 14. Near-Optimal LSH [A-Indyk'06]: Regular grid → grid of balls. p can hit empty space, so take more such grids until p is in a ball. This needs (too) many grids of balls, so start by projecting to dimension t. The analysis gives an LSH exponent ρ that approaches 1/c² as t grows. Choice of reduced dimension t? It is a tradeoff between the number of hash tables, n^ρ, and the time to hash, t^{O(t)}. Total query time: d·n^{1/c²+o(1)}.
  • 15. Open question: find a partition of space such that log(1/Pr[needle of length c is not cut]) / log(1/Pr[needle of length 1 is not cut]) ≥ c².
  • 16. Time-Space Trade-offs (space vs. query time): low space, high time [Ind'01, Pan'06], [AI'06]; medium space, medium time [IM'98], [DIIM'04, AI'06]; high space, low time, one hash table lookup! [KOR'98, IM'98, Pan'06]. Lower bounds: space n^{o(1/ε²)} ⇒ ω(1) memory lookups [AIP'06]; space n^{1+o(1/c²)} ⇒ ω(1) memory lookups [PTW'08, PTW'10].
  • 17. NNS beyond LSH
  • 18. Finale
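Slide 3 builds a sketch from LSH for the Jaccard coefficient [Broder'97]. The following is a minimal Python sketch of that min-hash idea under stated assumptions: each of the k sketch coordinates uses a random affine hash (a·x + b) mod a prime as a stand-in for a truly random permutation, and words are first mapped to integers with zlib.crc32. The helper names make_minhash, estimate_jaccard and jaccard are hypothetical, not taken from the lecture.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus (illustrative choice)


def make_minhash(k, seed=0):
    """Return a function mapping a non-empty set of strings to a k-entry min-hash sketch.

    Assumption: each coordinate uses a random affine hash x -> (a*x + b) mod PRIME
    instead of a truly random permutation of the universe.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

    def sketch(items):
        keys = [zlib.crc32(s.encode("utf-8")) for s in items]  # strings -> integers
        # Coordinate i keeps the minimum hash value over the set.
        return [min((a * x + b) % PRIME for x in keys) for a, b in params]

    return sketch


def estimate_jaccard(s1, s2):
    """Fraction of sketch coordinates on which the two sets agree."""
    return sum(u == v for u, v in zip(s1, s2)) / len(s1)


def jaccard(A, B):
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|."""
    return len(A & B) / len(A | B)
```

For the word sets of "To be or not to be" and "To sketch or not to sketch", the exact coefficient is 3/5, and with a few hundred coordinates the sketch estimate should land close to it; only the sketches (k numbers each) need to be stored.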
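Slides 8 and 9 (1D embedding) appear here only as titles. Assuming they describe the standard Gaussian projection, a vector g with i.i.d. N(0, 1) entries gives the linear map f(x) = ⟨g, x⟩, and f(x) − f(y) = ⟨g, x − y⟩ ~ N(0, ‖x − y‖²), so E[(f(x) − f(y))²] = ‖x − y‖². A minimal sketch that estimates a squared Euclidean distance by averaging many such 1D embeddings (function names are hypothetical):

```python
import random


def gaussian_1d_embedding(d, rng):
    """Draw g with i.i.d. N(0, 1) entries and return the linear map x -> <g, x>."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return lambda x: sum(gi * xi for gi, xi in zip(g, x))


def estimate_sq_distance(x, y, trials=2000, seed=0):
    """Average (f(x) - f(y))^2 over independent 1D embeddings f.

    Since f(x) - f(y) = <g, x - y> ~ N(0, ||x - y||^2), each term has
    expectation ||x - y||^2, so the average is an unbiased estimate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        f = gaussian_1d_embedding(len(x), rng)
        total += (f(x) - f(y)) ** 2
    return total / trials
```

A single projection is a poor estimator on its own: its variance is 2‖x − y‖⁴, which is why the full construction combines many independent coordinates.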
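For slide 10 (Full Dimension Reduction), stacking k independent 1D embeddings and scaling by 1/√k gives the Gaussian Johnson-Lindenstrauss map f(x) = Gx/√k, where G is a k × d matrix of i.i.d. N(0, 1) entries. This is assumed to be the construction the slide refers to; jl_map and dist are hypothetical helper names. The sketch below builds the map; the usage note checks that pairwise distances survive the projection.

```python
import math
import random


def jl_map(d, k, seed=0):
    """Gaussian JL map f(x) = G x / sqrt(k), G a k x d matrix of i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    G = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)

    def f(x):
        # Each output coordinate is one independent 1D embedding, rescaled by 1/sqrt(k).
        return [scale * sum(g * xj for g, xj in zip(row, x)) for row in G]

    return f


def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For instance, mapping 10 random Gaussian points from d = 600 down to k = 300 and comparing dist before and after for all 45 pairs should give ratios close to 1. Because f is linear, preserving distances is the same as preserving the norms of the difference vectors.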
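Slides 11 and 12 (Concentration; wrap-up) appear here only as titles. Assuming they cover the usual Johnson-Lindenstrauss argument, a commonly used form of the concentration bound for a k × d matrix G with i.i.d. N(0, 1) entries and any fixed x ∈ ℝ^d is shown below, followed by the union bound over all pairs of n points that yields the target dimension:

$$
\Pr_G\left[\,\left|\frac{\|Gx\|_2^2}{k} - \|x\|_2^2\right| > \varepsilon\,\|x\|_2^2\right] \le 2e^{-\Omega(\varepsilon^2 k)}, \qquad 0 < \varepsilon < 1,
$$

$$
\binom{n}{2}\cdot 2e^{-\Omega(\varepsilon^2 k)} < \tfrac{1}{2} \quad\text{once}\quad k = C\,\varepsilon^{-2}\log n \text{ for a large enough constant } C.
$$

Applying the first bound to every difference vector x = p − q shows that, with probability at least 1/2, all pairwise squared distances are preserved up to a factor 1 ± ε in dimension k = O(ε^{-2} log n), independent of d.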
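Slide 13 cites [Datar-Immorlica-Indyk-Mirrokni'04], whose LSH family for Euclidean space projects a point onto a random Gaussian direction a and cuts the line into segments of width w with a random offset b ∈ [0, w): h(x) = ⌊(a·x + b)/w⌋. A minimal sketch of that family; the bucket width w and the helper names make_e2lsh_hash and lsh_collision_rate are illustrative choices:

```python
import math
import random


def make_e2lsh_hash(d, w, seed=0):
    """One hash h(x) = floor((<a, x> + b) / w), a ~ N(0, I_d), b ~ Uniform[0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

    return h


def lsh_collision_rate(x, y, w, trials=2000, seed=0):
    """Empirical Pr[h(x) == h(y)] over independently drawn hash functions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_e2lsh_hash(len(x), w, seed=rng.randrange(2**32))
        hits += h(x) == h(y)
    return hits / trials
```

With w = 4, a pair at distance 1 should collide noticeably more often than a pair at distance 4. That gap between near and far collision probabilities is what an LSH-based NNS data structure amplifies by concatenating hashes and using several tables.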
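Slide 14 replaces the regular grid with grids of balls after projecting to dimension t [A-Indyk'06]. The following is a simplified sketch of that "take grids until p lands in a ball" idea, not the paper's exact construction: the grid spacing 4w, the ball radius w, the cap max_grids, and the helper names are hypothetical choices for illustration.

```python
import math
import random


def make_ball_carving_hash(d, t, w, max_grids=200, seed=0):
    """Simplified grid-of-balls hash (hypothetical parameters).

    1. Project x from R^d to R^t with a random Gaussian matrix.
    2. For grids j = 0, 1, ...: shift a regular grid of spacing 4w randomly and
       place a ball of radius w at every grid point (balls never overlap).
    3. Return (j, grid point) for the first ball that contains the projection.
    """
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(t)]
    spacing = 4.0 * w
    # Grid j has its points at spacing * m - shifts[j] for integer vectors m.
    shifts = [[rng.uniform(0.0, spacing) for _ in range(t)] for _ in range(max_grids)]

    def h(x):
        p = [sum(a * xi for a, xi in zip(row, x)) for row in proj]
        for j, s in enumerate(shifts):
            # Nearest grid point of the j-th shifted grid: the only ball that can contain p.
            m = tuple(round((pi + si) / spacing) for pi, si in zip(p, s))
            center = [spacing * mi - si for mi, si in zip(m, s)]
            if math.dist(p, center) <= w:
                return (j, m)
        return None  # p hit empty space in every grid (very unlikely for small t)

    return h


def ball_collision_rate(x, y, trials=300, **params):
    """Empirical Pr[h(x) == h(y)] over independently seeded hash functions."""
    hits = 0
    for s in range(trials):
        h = make_ball_carving_hash(len(x), seed=s, **params)
        hits += h(x) == h(y)
    return hits / trials
```

With these choices a ball covers only π/16 of its grid cell when t = 2, and that fraction shrinks quickly as t grows. This is the "need (too) many grids" issue on the slide and one reason t is kept small after the projection.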