Presentation to the NYC HUG on March 5, 2013 regarding the upcoming Mahout release.


- 1. News From Mahout ©MapR Technologies - Confidential
- 2. whoami – Ted Dunning – Chief Application Architect, MapR Technologies – committer and member, Apache Software Foundation – particularly Mahout, Zookeeper and Drill (we're hiring) – contact me at tdunning@maprtech.com, tdunning@apache.org, ted.dunning@gmail.com, @ted_dunning
- 3. Slides and such (available late tonight): http://www.mapr.com/company/events/nyhug-03-05-2013 – hash tags: #mapr #nyhug #mahout
- 4. New in Mahout – 0.8 is coming soon (1–2 months) – gobs of fixes – QR decomposition is 10x faster, which makes ALS 2–3 times faster – may include Bayesian Bandits – super-fast k-means: fast, online (!?!), fast – possible new edition of MiA coming: Japanese and Korean editions released, Chinese coming
- 7. Real-time Learning
- 8. We have a product to sell … from a web-site
- 9. What picture? What tag-line? What call to action? "Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5!"
- 10. The Challenge – design decisions affect probability of success – cheesy web-sites don't even sell cheese – the best designers do better when allowed to fail – exploration juices creativity – but failing is expensive – if only because we could have succeeded – but also because offending or disappointing customers is bad
- 11. More Challenges – too many designs: 5 pictures, 10 tag-lines, 4 calls to action, 3 background colors => 5 x 10 x 4 x 3 = 600 designs – it gets worse quickly: what about changes on the back-end? search engine variants? checkout process variants?
- 12. Example – A/B testing in real-time – I have 15 versions of my landing page – each visitor is assigned to a version – which version? – a conversion or sale or whatever can happen – how long to wait? – some versions of the landing page are horrible – don't want to give them traffic
- 13. A Quick Diversion – you see a coin – what is the probability of heads? could it be larger or smaller than that? – I flip the coin and while it is in the air ask again – I catch the coin and ask again – I look at the coin (and you don't) and ask again – why does the answer change? and did it ever have a single value?
- 14. A Philosophical Conclusion – probability as expressed by humans is subjective and depends on information and experience
- 15. I Dunno
- 16. 5 heads out of 10 throws
- 17. 2 heads out of 12 throws
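These two slides are the whole Bayesian update: starting from a uniform prior, h heads and t tails leave a Beta(h + 1, t + 1) posterior over the coin's heads probability. A minimal sketch of that belief (the function name and sample counts are illustrative):

```python
import random

def posterior_summary(heads, tails, n_samples=100_000, seed=42):
    """Sample the Beta(heads + 1, tails + 1) posterior implied by a
    uniform prior and summarize the belief about P(heads)."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(heads + 1, tails + 1)
                   for _ in range(n_samples))
    mean = sum(draws) / len(draws)
    # rough central 90% interval from the sorted samples
    interval = (draws[len(draws) // 20], draws[-len(draws) // 20])
    return mean, interval

print(posterior_summary(5, 5))    # 5 heads in 10 throws: belief centered near 0.5
print(posterior_summary(2, 10))   # 2 heads in 12 throws: belief centered near 0.21
```

The interval for 5-of-10 is much wider than a point estimate suggests, which is exactly the "did it ever have a single value?" point from the coin diversion.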
- 18. So now you understand Bayesian probability
- 19. Another Quick Diversion – let's play a shell game – this is a special shell game – it costs you nothing to play – the pea has constant probability of being under each shell (trust me) – how do you find the best shell? – how do you find it while maximizing the number of wins?
- 20. Pause for short con-game
- 21. Interim Thoughts – can you identify winners or losers without trying them out? – can you ever completely eliminate a shell with a bad streak? – should you keep trying apparent losers?
- 22. So now you understand multi-armed bandits
- 23. Conclusions – can you identify winners or losers without trying them out? No – can you ever completely eliminate a shell with a bad streak? No – should you keep trying apparent losers? Yes, but at a decreasing rate
- 24. Is there an optimum strategy?
- 25. Bayesian Bandit – compute distributions based on data so far – sample p1, p2 and p3 from these distributions – pick shell i where i = argmax_i p_i – Lemma 1: the probability of picking shell i will match the probability that it is the best shell – Lemma 2: this is as good as it gets
- 26. And it works! [Chart: regret versus n (0–1100) for ε-greedy with ε = 0.05 and for the Bayesian Bandit with a Gamma-Normal prior.]
- 27. Video Demo
- 28. The Code – select an alternative:

```r
select <- function(k) {
  n <- dim(k)[1]
  p0 <- rep(0, length.out = n)
  for (i in 1:n) {
    p0[i] <- rbeta(1, k[i, 2] + 1, k[i, 1] + 1)
  }
  which(p0 == max(p0))
}
```

Select and learn:

```r
learn <- function(k, steps) {
  for (z in 1:steps) {
    i <- select(k)
    j <- test(i)          # run the trial for alternative i
    k[i, j] <- k[i, j] + 1
  }
  k
}
```

But we already know how to count!
- 29. The Basic Idea – we can encode a distribution by sampling – sampling allows unification of exploration and exploitation – can be extended to more general response models
- 30. The Original Problem [Figure: the landing page – "Bogus Dog Food is the Best! Now available in handy 1 ton bags! Buy 5!" – with its design elements labeled x1, x2 and x3.]
- 31. Response Function – p(win) = w(Σ_i θ_i x_i) [Figure: w is a logistic curve rising from 0 to 1, centered at x = 0.]
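Evaluating the response function is a one-liner once w is fixed; here w is taken to be the logistic function from the plot, and the weights are made up:

```python
import math

def w(x):
    """Logistic link: maps any real score to a win probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def p_win(theta, x):
    """p(win) = w(sum_i theta_i * x_i) for one design encoded as x."""
    return w(sum(t_i * x_i for t_i, x_i in zip(theta, x)))

print(p_win([0.0, 0.0], [1.0, 1.0]))   # score 0 -> probability 0.5
print(p_win([0.8, -0.5], [1.0, 1.0]))  # score 0.3 -> probability about 0.57
```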
- 32. Generalized Banditry – suppose we have an infinite number of bandits – suppose they are each labeled by two real numbers x and y in [0,1] – also that expected payoff is a parameterized function of x and y: E[z] = f(x, y | θ) – now assume a distribution for θ that we can learn online – selection works by sampling θ, then computing f – learning works by propagating updates back to θ – if f is linear, this is very easy – for special other kinds of f it isn't too hard – we don't have to have just two labels; we could have labels and context
- 33. Context Variables [Figure: the same landing page with x1, x2 and x3, plus context variables user.geo, env.time, env.day_of_week and env.weekend.]
- 34. Caveats – the original Bayesian Bandit only requires real-time – the generalized bandit may require access to long history for learning – pseudo-online learning may be easier than true online – bandit variables can include content, time of day, day of week – context variables can include user id, user features – bandit × context variables provide the real power
- 35. You can do this yourself!
- 36. Super-fast k-means Clustering
- 37. Rationale
- 38. What is Quality? – robust clustering is not a goal – we don't care if the same clustering is replicated – generalization is critical – agreement with a "gold standard" is a non-issue
- 39. An Example
- 40. An Example
- 41. Diagonalized Cluster Proximity
- 42. Clusters as Distribution Surrogate
- 43. Clusters as Distribution Surrogate
- 44. Theory
- 45. For Example – D_4^2(X) > (1/σ^2) D_5^2(X) – grouping these two clusters seriously hurts squared distance
- 46. Algorithms
- 47. Typical k-means Failure – selecting two seeds here cannot be fixed with Lloyd's algorithm – the result is that these two clusters get glued together
- 48. Ball k-means – provably better for highly clusterable data – tries to find initial centroids in the "core" of each real cluster – avoids outliers in centroid computation:

      initialize centroids randomly with a distance-maximizing tendency
      for each of a very few iterations:
          for each data point:
              assign point to nearest cluster
          recompute centroids using only points much closer than the closest cluster
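The trimming idea in the pseudocode above can be sketched directly (the trim factor, the blob data and the seeds are all illustrative, and this keeps only the trimmed-recompute step, not the provable seeding):

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ball_kmeans(points, centroids, iterations=4, trim=0.5):
    """Ball k-means sketch: assign every point to its nearest centroid,
    then recompute each centroid from only the points much closer to it
    than to the runner-up centroid (the 'ball'), keeping outliers and
    boundary points out of the mean."""
    for _ in range(iterations):
        balls = [[] for _ in centroids]
        for p in points:
            order = sorted(range(len(centroids)),
                           key=lambda i: dist2(p, centroids[i]))
            nearest, runner_up = order[0], order[1]
            if dist2(p, centroids[nearest]) < trim ** 2 * dist2(p, centroids[runner_up]):
                balls[nearest].append(p)
        for i, ball in enumerate(balls):
            if ball:
                centroids[i] = [sum(c) / len(ball) for c in zip(*ball)]
    return centroids

# two tight, well-separated blobs: the centroids land on the blob centers
rng = random.Random(0)
pts = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(100)]
       + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(100)])
print(sorted(ball_kmeans(pts, [[1.0, 1.0], [4.0, 4.0]])))
```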
- 50. Still Not a Win – ball k-means is nearly guaranteed with k = 2 – probability of successful seeding drops exponentially with k – an alternative strategy has high probability of success, but takes O(nkd + k³d) time – but for big data, k gets large
- 51. Surrogate Method – start with a sloppy clustering into lots of clusters, κ = k log n – use this sketch as a weighted surrogate for the data – results are provably good for highly clusterable data
- 53. Algorithm Costs – surrogate methods – fast, sloppy single-pass clustering with κ = k log n – fast, sloppy search for the nearest cluster: O(d log κ) = O(d (log k + log log n)) per point – fast, in-memory, high-quality clustering of the κ weighted centroids: O(κkd + k³d) = O(k²d log n + k³d) for small k, high quality; O(κd log k) or O(d log k (log k + log log n)) for larger k, looser quality – the result is k high-quality centroids – for many purposes, even the sloppy surrogate may suffice
- 54. Algorithm Costs – how much faster for the sketch phase? – take k = 2000, d = 10, n = 100,000 – k d log n = 2000 × 10 × 26 ≈ 500,000 – d (log k + log log n) = 10 × (11 + 5) = 170 – 3,000 times faster is a bona fide big deal
- 56. How It Works – for each point: find the approximately nearest centroid (distance = d); if d > threshold, start a new centroid; else if u > d/threshold (u a uniform random draw), start a new cluster; else add the point to the nearest centroid – if the number of centroids exceeds κ ≈ C log N, recursively cluster the centroids with a higher threshold
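A minimal single-pass version of this sketch in Python, using the common rule that a point starts a new centroid with probability proportional to its distance; the recursive re-clustering step and threshold growth are elided, and the data and parameters are illustrative:

```python
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def streaming_sketch(points, threshold=1.0, seed=2):
    """One pass over the data: each point either merges into its nearest
    weighted centroid or starts a new centroid, with distant points more
    likely to start one."""
    rng = random.Random(seed)
    centroids = []  # each entry: [coordinates, weight]
    for p in points:
        if centroids:
            c = min(centroids, key=lambda c: dist(p, c[0]))
            d = dist(p, c[0])
            if d <= threshold and rng.random() >= d / threshold:
                # merge: move the centroid toward p by its weight
                w = c[1]
                c[0] = [(w * ci + pi) / (w + 1) for ci, pi in zip(c[0], p)]
                c[1] = w + 1
                continue
        centroids.append([list(p), 1])  # start a new centroid
    return centroids

rng = random.Random(0)
pts = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(200)]
       + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(200)])
sketch = streaming_sketch(pts)
print(len(sketch), sum(w for _, w in sketch))  # far fewer centroids than points; weights sum to 400
```

The weighted centroids are exactly the surrogate that the in-memory ball k-means then clusters down to k.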
- 57. Implementation
- 58. But Wait, … – finding the nearest centroid is the inner loop – this could take O(dκ) per point, and κ can be big – happily, approximate nearest-centroid search works fine
- 59. Projection Search – total ordering!
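Projection search can be sketched with a single random projection (real implementations use several projections and keep the best candidate; all names and parameters here are illustrative):

```python
import bisect
import random

def build_index(points, seed=7):
    """Project every point onto one random unit direction; sorting the
    projections imposes a total ordering on the points."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in points[0]]
    norm = sum(x * x for x in u) ** 0.5
    u = [x / norm for x in u]
    proj = sorted((sum(ui * pi for ui, pi in zip(u, p)), i)
                  for i, p in enumerate(points))
    return u, proj

def approx_nearest(q, points, index, window=8):
    """Check only the points whose projections bracket the query's
    projection; a wider window trades speed for accuracy."""
    u, proj = index
    qp = sum(ui * qi for ui, qi in zip(u, q))
    pos = bisect.bisect_left(proj, (qp,))
    candidates = proj[max(0, pos - window):pos + window]
    return min(candidates,
               key=lambda t: sum((points[t[1]][j] - q[j]) ** 2
                                 for j in range(len(q))))[1]

rng = random.Random(3)
pts = [[rng.uniform(0, 1) for _ in range(5)] for _ in range(50)]
idx = build_index(pts)
print(approx_nearest(pts[10], pts, idx))  # a point is its own nearest neighbor: 10
```

Only O(window) distances are computed per query instead of O(κ), which is the point of the "But Wait" slide.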
- 60. LSH Bit-match Versus Cosine [Chart: cosine similarity (y axis, -1 to 1) versus number of matching bits (x axis, 0 to 64).]
- 61. Results
- 62. Parallel Speedup? [Chart: time per point (μs) versus number of threads (1–20) for the threaded version, compared with the non-threaded version at about 200 μs and a perfect-scaling curve.]
- 63. Quality – the ball k-means implementation appears significantly better than simple k-means – streaming k-means + ball k-means appears to be about as good as ball k-means alone – all evaluations on 20 newsgroups with held-out data – the figure of merit is mean and median squared distance to the nearest cluster
- 64. Contact Me! – we're hiring at MapR in US and Europe – MapR software available for research use – get the code as part of Mahout trunk (or 0.8 very soon) – contact me at tdunning@maprtech.com or @ted_dunning – share news with @apachemahout
