A quick BOF talk I gave in Paris at Devoxx


- 1. Practical Machine Learning with Mahout
- 2. whoami – Ted Dunning
  - Chief Application Architect, MapR Technologies
  - Committer, member, Apache Software Foundation – particularly Mahout, Zookeeper and Drill (we’re hiring)
  - Contact me at tdunning@maprtech.com, tdunning@apache.com, ted.dunning@gmail.com, @ted_dunning
- 3. Agenda
  - What works at scale
  - Recommendation
  - Unsupervised - Clustering
- 6. What Works at Scale
  - Logging
  - Counting
  - Session grouping
  - Really. Don’t bet on anything much more complex than these
  - These are harder than they look
- 7. Recommendations
- 8. Recommendations
  - Special case of reflected intelligence
  - Traditionally “people who bought x also bought y”
  - But soooo much more is possible
- 9. Examples
  - Customers buying books (Linden et al.)
  - Web visitors rating music (Shardanand and Maes) or movies (Riedl et al.), (Netflix)
  - Internet radio listeners not skipping songs (Musicmatch)
  - Internet video watchers watching >30 s
- 10. Dyadic Structure
  - Functional
    - Interaction: actor -> item*
  - Relational
    - Interaction ⊆ Actors x Items
  - Matrix
    - Rows indexed by actor, columns by item
    - Value is count of interactions
  - Predict missing observations
- 11. Recommendations Analysis
  - R(x,y) = # people who bought x also bought y
  - select A.item_id as x, B.item_id as y, count(*) from (select distinct user_id, item_id from log) A join (select distinct user_id, item_id from log) B on A.user_id = B.user_id group by A.item_id, B.item_id
- 17. Recommendations Analysis: $R_{ij} = \sum_u A_{ui} B_{uj}$, that is, $R = A^T B$
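
The counting query and the matrix form above describe the same computation. The following is a minimal sketch in Python, assuming each log is a list of (user, item) pairs; the function name `cooccurrence` and the sample log are illustrative and do not come from the talk.

```python
from collections import defaultdict
from itertools import product


def cooccurrence(log_a, log_b):
    """Return R[(x, y)] = number of users with x in log_a and y in log_b.

    With binary user-by-item matrices A and B this is R = A^T B.
    """
    # "select distinct user_id, item_id": collapse repeats into one set per user.
    items_a = defaultdict(set)
    items_b = defaultdict(set)
    for user, item in log_a:
        items_a[user].add(item)
    for user, item in log_b:
        items_b[user].add(item)

    # "join on user_id ... group by x, y": count item pairs within each user.
    counts = defaultdict(int)
    for user in items_a.keys() & items_b.keys():
        for x, y in product(items_a[user], items_b[user]):
            counts[(x, y)] += 1
    return dict(counts)


# Hypothetical purchase log; u1 buys the same book twice.
log = [("u1", "book"), ("u1", "lamp"), ("u1", "book"),
       ("u2", "book"), ("u2", "lamp"), ("u3", "book")]
R = cooccurrence(log, log)
```

Passing the same log twice gives the self-join case, where the diagonal entries count the users who touched each item. Passing two different logs gives a cross count between two kinds of behavior.
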
- 18. Fundamental Algorithmic Structure
  - Cooccurrence: $K = A^T A$
  - Matrix approximation by factoring: $A \approx U S V^T$, $K \approx V S^2 V^T$, $r = V S^2 V^T h$
  - LLR: $r = \mathrm{sparsify}(A^T A)\, h$
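
The LLR variant can be sketched as follows. This is an illustration under stated assumptions rather than the Mahout implementation: it scores each item pair with the standard G² log-likelihood ratio on a 2x2 contingency table, keeps pairs above a caller-chosen threshold, and then scores items against a user history h. The names `llr`, `sparsify`, `recommend` and the threshold value are hypothetical.

```python
import math


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # an empty cell contributes nothing
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def sparsify(cooc, item_counts, n_users, threshold):
    """Keep off-diagonal cooccurrences whose LLR score exceeds the threshold."""
    kept = {}
    for (x, y), k11 in cooc.items():
        if x == y:
            continue
        k12 = item_counts[x] - k11        # users with x but not y
        k21 = item_counts[y] - k11        # users with y but not x
        k22 = n_users - k11 - k12 - k21   # users with neither
        if llr(k11, k12, k21, k22) > threshold:
            kept[(x, y)] = 1
    return kept


def recommend(indicators, history):
    """r = sparsify(A^T A) h: score unseen items linked to the history h."""
    scores = {}
    for x, y in indicators:
        if x in history and y not in history:
            scores[y] = scores.get(y, 0) + 1
    return scores
```

The point of the sparsification step is that raw cooccurrence counts favor popular items, while the LLR score keeps only pairs that cooccur more often than their individual popularity would explain.
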
- 19. But Wait!
  - Cooccurrence: $K = A^T A$
  - Cross occurrence: $K = B^T A$
- 20. For example
  - Users enter queries (A)
    - (actor = user, item = query)
  - Users view videos (B)
    - (actor = user, item = video)
  - A’A gives query recommendation
    - “did you mean to ask for”
  - B’B gives video recommendation
    - “you might like these videos”
- 21. The punch-line
  - B’A recommends videos in response to a query
    - (isn’t that a search engine?)
    - (not quite, it doesn’t look at content or meta-data)
- 22. Real-life example
  - Query: “Paco de Lucia”
  - Conventional meta-data search results:
    - “hombres del paco” times 400
    - not much else
  - Recommendation based search:
    - Flamenco guitar and dancers
    - Spanish and classical guitar
    - Van Halen doing a classical/flamenco riff
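
A small sketch of the B’A idea, assuming two logs of (user, query) and (user, video) pairs. The function names and the sample data are hypothetical; only user behavior is used, never the content of the videos.

```python
from collections import defaultdict


def cross_occurrence(query_log, view_log):
    """Return K[video][query] = number of users who did both (K = B^T A)."""
    queries = defaultdict(set)
    for user, query in query_log:
        queries[user].add(query)
    videos = defaultdict(set)
    for user, video in view_log:
        videos[user].add(video)

    k = defaultdict(lambda: defaultdict(int))
    for user, seen in videos.items():
        for video in seen:
            for query in queries.get(user, ()):
                k[video][query] += 1
    return k


def videos_for_query(k, query, top_n=3):
    """Rank videos by their cross-occurrence count with one query."""
    scored = [(row[query], video) for video, row in k.items() if query in row]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [video for _, video in scored[:top_n]]


# Hypothetical behavior logs.
query_log = [("u1", "paco de lucia"), ("u2", "paco de lucia"), ("u3", "van halen")]
view_log = [("u1", "flamenco guitar"), ("u2", "flamenco guitar"),
            ("u2", "spanish guitar"), ("u3", "eruption")]
K = cross_occurrence(query_log, view_log)
results = videos_for_query(K, "paco de lucia")
```

In practice the raw counts would be sparsified (for example with an LLR test) before ranking, so that universally popular videos do not dominate every query.
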
- 23. Real-life example
- 24. Hypothetical Example
  - Want a navigational ontology?
  - Just put labels on a web page with traffic
    - This gives A = users x label clicks
  - Remember viewing history
    - This gives B = users x items
  - Cross recommend
    - B’A = label to item mapping
  - After several users click, results are whatever users think they should be
- 25. Super-fast k-means Clustering
- 26. RATIONALE
- 27. What is Quality?
  - Robust clustering is not a goal
    - we don’t care if the same clustering is replicated
  - Generalization is critical
  - Agreement with a “gold standard” is a non-issue
- 28. An Example
- 29. An Example
- 30. Diagonalized Cluster Proximity
- 31. Clusters as Distribution Surrogate
- 32. Clusters as Distribution Surrogate
- 33. THEORY
- 34. For Example: $\Delta_4^2(X) > \frac{1}{\sigma^2}\,\Delta_5^2(X)$. Grouping these two clusters seriously hurts squared distance
- 35. ALGORITHMS
- 36. Typical k-means Failure
  - Selecting two seeds here cannot be fixed with Lloyd’s algorithm
  - Result is that these two clusters get glued together
- 37. Ball k-means
  - Provably better for highly clusterable data
  - Tries to find initial centroids in the “core” of each real cluster
  - Avoids outliers in centroid computation
  - Pseudocode:
    - initialize centroids randomly with distance maximizing tendency
    - for each of a very few iterations:
      - for each data point: assign point to nearest cluster
      - recompute centroids using only points much closer than closest cluster
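
The pseudocode above can be sketched in Python roughly as follows. Several details are assumptions, not taken from the talk: seeding uses k-means++ style sampling as one way to get a "distance maximizing tendency", and "much closer than closest cluster" is read as "within `trim` times the distance to the nearest other centroid", with `trim = 1/3` chosen arbitrarily.

```python
import math
import random


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeanspp_seeds(points, k, rng):
    """Pick seeds at random, favoring points far from the seeds chosen so far."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(dist(p, s) for s in seeds) ** 2 for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def ball_kmeans(points, seeds, iterations=3, trim=1 / 3):
    centroids = [tuple(s) for s in seeds]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        members = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            members[nearest].append(p)
        # Recompute each centroid from the points inside its ball only.
        updated = []
        for i, c in enumerate(centroids):
            others = [dist(c, o) for j, o in enumerate(centroids) if j != i]
            radius = trim * min(others) if others else math.inf
            core = [p for p in members[i] if dist(p, c) <= radius]
            updated.append(mean(core) if core else c)
        centroids = updated
    return centroids


# Two tight groups plus one outlier at (4, 4).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = ball_kmeans(points, seeds=[(0, 0), (11, 11)])
random_seeds = kmeanspp_seeds(points, 2, random.Random(0))
```

With the fixed seeds shown, the outlier is assigned to the first cluster but falls outside its ball, so it does not pull the centroid away from the core. A plain mean over the same assignment would include it.
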
- 39. Still Not a Win
  - Ball k-means is nearly guaranteed with k = 2
  - Probability of successful seeding drops exponentially with k
  - Alternative strategy has high probability of success, but takes O(nkd + k³d) time
  - But for big data, k gets large
- 40. Surrogate Method
  - Start with sloppy clustering into lots of clusters: κ = k log n clusters
  - Use this sketch as a weighted surrogate for the data
  - Results are provably good for highly clusterable data
- 42. Algorithm Costs
  - Surrogate methods
    - fast, sloppy single-pass clustering with κ = k log n
    - fast, sloppy search for nearest cluster, O(d log κ) = O(d (log k + log log n)) per point
    - fast, in-memory, high-quality clustering of κ weighted centroids
      - O(κ k d + k³ d) = O(k² d log n + k³ d) for small k, high quality
      - O(κ d log k) or O(d log k (log k + log log n)) for larger k, looser quality
    - result is k high-quality centroids
  - For many purposes, even the sloppy surrogate may suffice
- 43. Algorithm Costs
  - How much faster for the sketch phase?
    - take k = 2000, d = 10, n = 100,000
    - k d log n = 2000 x 10 x 26 ≈ 500,000
    - d (log k + log log n) = 10 (11 + 5) = 160
    - roughly 3,000 times faster is a bona fide big deal
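
The arithmetic can be rechecked with a few lines of Python. The slide does not state the base of the logarithm, so base 2 is an assumption here, and constant factors are ignored. Under that assumption log n is about 17 rather than 26 for n = 100,000, and the ratio comes out at roughly 2,200, the same order of magnitude as the figure on the slide.

```python
import math


def sketch_costs(k, d, n):
    """Per-point search cost in coordinate operations, using base-2 logs."""
    kappa = k * math.log2(n)             # number of sketch centroids
    brute_force = kappa * d              # compare against every sketch centroid
    fast_search = d * math.log2(kappa)   # equals d (log k + log log n)
    return brute_force, fast_search, brute_force / fast_search


brute, fast, speedup = sketch_costs(k=2000, d=10, n=100_000)
```
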
- 45. How It Works
  - For each point
    - Find approximately nearest centroid (distance = d)
    - If (d > threshold): new centroid
    - Else if (u < d/threshold), with u drawn uniformly from [0, 1): new cluster
    - Else add to nearest centroid
  - If number of centroids > κ ≈ C log N
    - Recursively cluster centroids with higher threshold
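
A minimal single-pass sketch of these steps in Python follows. It rests on several assumptions that are not in the talk: u is taken to be a uniform random draw and a new centroid is started with probability d/threshold, the nearest centroid is found exactly rather than approximately, and the threshold grows by a hypothetical factor `growth = 1.5` each time the sketch is collapsed. `max_centroids` must be at least 1.

```python
import math
import random


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def add_point(centroids, point, weight, threshold, rng):
    """Insert one weighted point into the sketch, a list of [center, weight]."""
    if centroids:
        nearest = min(centroids, key=lambda c: dist(point, c[0]))
        d = dist(point, nearest[0])
        # Far points always start a new centroid; nearer ones do so with
        # probability d / threshold (assumed reading of the slide's u).
        if d <= threshold and rng.random() >= d / threshold:
            center, w = nearest
            total = w + weight
            nearest[0] = tuple((c * w + x * weight) / total
                               for c, x in zip(center, point))
            nearest[1] = total
            return
    centroids.append([tuple(point), weight])


def streaming_sketch(points, max_centroids, threshold, rng, growth=1.5):
    centroids = []
    for p in points:
        add_point(centroids, p, 1.0, threshold, rng)
        while len(centroids) > max_centroids:
            # Too many centroids: raise the threshold and re-cluster the sketch.
            threshold *= growth
            old, centroids = centroids, []
            for center, w in old:
                add_point(centroids, center, w, threshold, rng)
    return centroids


# Hypothetical data: 200 points drawn from 21 distinct locations.
data = [(float(i % 7), float(i % 3)) for i in range(200)]
sketch = streaming_sketch(data, max_centroids=10, threshold=0.5, rng=random.Random(1))
```

The weights are what make the result usable as a surrogate: each centroid carries the number of original points it absorbed, so a later in-memory clustering can treat the sketch as a small weighted data set.
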
- 46. IMPLEMENTATION
- 47. But Wait, …
  - Finding nearest centroid is inner loop
  - This could take O(d κ) per point and κ can be big
  - Happily, approximate nearest centroid works fine
- 48. Projection Search: total ordering!
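
One way to read "projection search" is sketched below: project all centroids onto a few directions, keep each projection sorted (a total ordering), and examine only the neighbors of the query in those orderings. The class name, the `window` parameter and the fixed directions are assumptions for illustration; a real implementation would typically use random directions.

```python
import bisect
import math


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class ProjectionSearch:
    """Approximate nearest-neighbor search using one-dimensional projections."""

    def __init__(self, points, directions, window=2):
        self.points = list(points)
        self.directions = list(directions)
        self.window = window
        # One sorted list of (projection, index) per direction.
        self.orders = [sorted((dot(p, u), i) for i, p in enumerate(self.points))
                       for u in self.directions]

    def nearest(self, query):
        candidates = set()
        for u, order in zip(self.directions, self.orders):
            # Binary search for the query's position in this ordering.
            pos = bisect.bisect_left(order, (dot(query, u),))
            lo, hi = max(0, pos - self.window), pos + self.window
            candidates.update(i for _, i in order[lo:hi])
        # Rank the few candidates by true distance.
        best = min(candidates, key=lambda i: dist(query, self.points[i]))
        return self.points[best]


centers = [(0.0, 0.0), (1.0, 5.0), (2.0, 0.0), (3.0, 5.0), (10.0, 0.0)]
search = ProjectionSearch(centers, directions=[(1.0, 0.0), (0.0, 1.0)])
```

Each lookup costs a binary search per direction plus a handful of full distance computations, instead of one distance computation per centroid. The answer is not guaranteed to be the true nearest centroid, which the preceding slide argues is acceptable for this use.
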
- 49. LSH Bit-match Versus Cosine [figure: plot with x-axis from 0 to 64 and y-axis from -1 to 1]
- 50. RESULTS
- 51. Parallel Speedup? [figure: time per point (μs) versus number of threads, showing the non-threaded version, the threaded version, and perfect scaling]
- 52. Quality
  - Ball k-means implementation appears significantly better than simple k-means
  - Streaming k-means + ball k-means appears to be about as good as ball k-means alone
  - All evaluations on 20 newsgroups with held-out data
  - Figure of merit is mean and median squared distance to nearest cluster
- 53. Contact Me!
  - We’re hiring at MapR in US and Europe
  - MapR software available for research use
  - Get the code as part of Mahout trunk (or 0.8 very soon)
  - Contact me at tdunning@maprtech.com or @ted_dunning
  - Share news with @apachemahout
