Converting High Dimensional Problems to Low Dimensional Ones

  1. Converting High Dimensional Problems to Low Dimensional Ones
  2. General Paradigm: Reduce and Conquer
     • Large Problem → Small Problem
       – Break array into two parts
       – Consider odd and even elements
       – Sample edges in a graph to obtain a smaller graph
       – Represent a graph by a collection of trees
       – Take number modulo a small prime
       – Multiply matrix by a random vector
       – Project high dimensional point sets into fewer dimensions
  3. The Problem
     • Given n points in D dimensional space
     • Project them into d << D dimensions
       – so that the (Euclidean) distance between every pair of points is (almost) preserved
     • How does d compare to n?
  4. Application
     • Hierarchical clustering
     • Say ten thousand samples, each over a few million SNPs
     • Few million → few hundreds/thousands of dimensions? And fast?
  5. First Attempt
     • Can we make d = n-1?
       – X axis through 2 of the points
       – Y axis so the 3rd point is in the XY plane
       – Z axis so the 4th point is in the XYZ 3-d space
       – And so on
  6. First Attempt
     • Time taken
       – Each new axis has to be made orthogonal to all previous axes
       – O(n²D)
       – Too slow
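A minimal sketch of this first attempt (assuming NumPy; the function name is just illustrative), mainly to make the O(n²D) cost visible: every new axis has to be orthogonalized against all the axes built so far.

```python
import numpy as np

def embed_exactly(points):
    """Embed n points into at most n-1 dimensions, preserving distances exactly."""
    points = np.asarray(points, dtype=float)
    origin = points[0]
    axes = []                                   # orthonormal axes built so far
    for p in points[1:]:
        v = p - origin
        for a in axes:                          # orthogonalize against every previous axis
            v -= (v @ a) * a                    # this inner loop is what costs O(n^2 * D) overall
        norm = np.linalg.norm(v)
        if norm > 1e-12:                        # skip points already in the current span
            axes.append(v / norm)
    A = np.array(axes).reshape(-1, points.shape[1])   # (d, D) with d <= n-1
    return (points - origin) @ A.T                     # coordinates of each point on the new axes
```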
  7. Second Attempt: Use Random Projections
     • Take d random vectors r1..rd
     • For every point p, take the d dimensional point
       – [ p·r1  p·r2  …  p·rd ] * scaling-factor
     • Do these d-dim points preserve inter-point distances approximately? How large should d be?
  8. Random Projections: Further Simplification
     • Take any vector p in D dimensions
     • Suppose we show that
       – [ p·r1  p·r2  …  p·rd ] * scaling-factor has length ~ |p|
       – with failure prob < 1/n³
     • Then, by a union bound over the ~n² difference vectors, the prob that even one of their lengths is not preserved is < n²/n³ = 1/n
  9. Random Projections: What is a random vector?
     • No directional bias
  10. Normal Distributions
     • Pr of being between x and x+dx: for N(0,1), ~ e^(-x²/2)
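For reference, the "~" on this slide hides the normalizing constant; the full N(0,1) density is

```latex
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},
\qquad
\Pr[\,x \le X \le x + dx\,] \approx f(x)\, dx .
```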
  11. Generating Random Vectors without Directional Bias
     • Take D numbers (X1...XD), each N(0,1), independently
     • Distribution of each number X
       – Pr of being between a..a+da ~ e^(-a²/2) da
     • Pr that X1 is in a1..a1+da1, X2 in a2..a2+da2, …, XD in aD..aD+daD:
       – e^(-a1²/2) · e^(-a2²/2) · … · e^(-aD²/2) da1 da2 … daD
       – = e^(-(a1²+a2²+…+aD²)/2) da1 da2 … daD
       – = e^(-l²/2) da1 da2 … daD
     • So no dependence on direction, only on the length l!
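A minimal sketch of this construction (assuming NumPy; the function name is illustrative): draw D independent N(0,1) coordinates, whose joint density depends only on the length, so normalizing yields a direction with no bias. The algorithm on the next slide uses the unnormalized Gaussian vectors directly; the normalization here is only to illustrate the uniform direction.

```python
import numpy as np

def random_direction(D, rng=None):
    """Return a unit vector in D dimensions with no directional bias."""
    rng = np.random.default_rng() if rng is None else rng
    # D independent N(0,1) coordinates: joint density ~ e^(-l^2/2),
    # which depends only on the length l, not on the direction.
    x = rng.standard_normal(D)
    # Normalizing therefore gives a uniformly random direction.
    return x / np.linalg.norm(x)
```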
  12. The Algorithm
     • Take d random vectors r1..rd
       – Each ri = [Xi1 Xi2 … XiD], where the X’s are chosen from N(0,1) independently
     • For every point p, take the d dimensional point
       – [ p·r1  p·r2  …  p·rd ] * sqrt(1/d)
     • Time: O(n·d·D)
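A minimal sketch of this algorithm (assuming NumPy; names are illustrative). The whole projection is one matrix multiply, which gives the stated n·d·D running time.

```python
import numpy as np

def random_project(points, d, rng=None):
    """Project an (n, D) array of points down to (n, d) dimensions."""
    rng = np.random.default_rng() if rng is None else rng
    n, D = points.shape
    # d random vectors r1..rd, each entry drawn independently from N(0,1),
    # stored as the columns of a D x d matrix.
    R = rng.standard_normal((D, d))
    # Row i of the result is [ p_i.r1  p_i.r2 ... p_i.rd ] * sqrt(1/d);
    # the matrix multiply costs O(n*d*D).
    return points @ R * np.sqrt(1.0 / d)
```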
  13. Simplifying Further
     • Take any vector p in D dimensions
     • We need to show that
       – [ p·r1  p·r2  …  p·rd ] * sqrt(1/d) has length ~ |p|
       – Failure prob < 1/n³
     • We can assume p to be [1 0 0 0 0 0 …]
       – because the random vectors have no directional bias
       – Then [ p·r1  p·r2  …  p·rd ] * sqrt(1/d) = [X11 X21 … Xd1] * sqrt(1/d)
  14. Analysis
     • We need to show that
       – [X1 X2 … Xd] * sqrt(1/d) has length ~ 1
       – Failure prob < 1/n³
     • Or: (X1²+…+Xd²)/d ~ 1, failure prob < 1/n³
     • Or: (X1²+…+Xd²) ~ d, failure prob < 1/n³
     • Note that each Xi² has mean 1 and s.d. sqrt(2)
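The last bullet is the standard moment calculation for the squares: with Xi drawn from N(0,1),

```latex
\mathbb{E}[X_i^2] = 1,
\qquad
\mathbb{E}[X_i^4] = 3,
\qquad
\operatorname{Var}(X_i^2) = \mathbb{E}[X_i^4] - \mathbb{E}[X_i^2]^2 = 3 - 1 = 2,
\qquad
\text{s.d.}(X_i^2) = \sqrt{2}.
```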
  15. Law of Large Numbers
     • Y1..Yd, each with any (decent) distribution with mean 1 and s.d. sqrt(2)
     • Then Y1+…+Yd tends to a Normal distribution with mean d and s.d. sqrt(2d) (for large d)
     • Pr(Y1+…+Yd not in (1-∆)d .. (1+∆)d) <
       – e^(-(∆d)²/(2·2d)) = e^(-∆²d/4)
     • Choose d = 12 ln n / ∆²; then this is < 1/n³, as needed
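Plugging the choice d = 12 ln n / ∆² into the tail bound confirms the claim:

```latex
e^{-\Delta^2 d / 4}
= e^{-\Delta^2 \cdot \frac{12 \ln n}{\Delta^2} \cdot \frac{1}{4}}
= e^{-3 \ln n}
= \frac{1}{n^3}.
```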
  16. Conclusion
     • n points in D dimensions
       – can be projected to 12 ln n / ∆² dimensions
       – all distances stretch only by a factor of (1 ± ∆)
       – with prob > 1 - 1/n
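A quick numerical check of the conclusion, reusing the random_project sketch from slide 12 (the values of n, D, and ∆ below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, delta = 200, 10_000, 0.3
d = int(np.ceil(12 * np.log(n) / delta**2))       # about 707 dimensions

points = rng.standard_normal((n, D))
projected = random_project(points, d, rng)        # sketch from slide 12

# Compare a few pairwise distances before and after projection.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    before = np.linalg.norm(points[i] - points[j])
    after = np.linalg.norm(projected[i] - projected[j])
    print(f"pair ({i},{j}): ratio = {after / before:.3f}")  # within 1 ± 0.3, typically much closer
```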