Mahout and Recommendations

These are the slides from my talk at DFW Big Data. This includes the first version of the Mahout dog and pony show.

Speaker notes:
  • Note to speaker: Move quickly through the first two slides, just to set the tone: familiar use cases with somewhat complicated math and algorithms under the covers. You don't need to explain or discuss these examples at this point; just mention one or two. Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering that nasty spam out of email.
  • Talk track: Under the covers, machine learning looks very complicated. So how do you get from there to the familiar examples? Tonight's presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.

1. Introduction to Mahout
   And How To Build a Recommender
   ©MapR Technologies 2013- Confidential
2. Me, Us
   • Ted Dunning, Chief Application Architect, MapR
     – Committer and PMC member: Mahout, Zookeeper, Drill
     – Bought the beer at the first HUG
   • MapR
     – Distributes more open source components for Hadoop
     – Adds major technology for performance, HA, industry-standard APIs
   • Tonight
     – Hash tag: #dfwbd #mapr
     – See also: @ApacheMahout @ApacheDrill @ted_dunning and @mapR
3. Requested Topic For Tonight
   • What is Mahout?
   • What makes it different?
   • How can big data technology solve impossible problems?
   • How is big data affecting the world?
4. Also
   • What is MapR?
   • What is MapR doing?
   • How does MapR’s technology work?
   • How are customers making use of MapR?
   • How can anyone make use of MapR to solve problems?
5. Oh … Also This
   • Detailed breakdown of a live machine learning system running with Mahout on MapR
   • With code examples
6. I may have to summarize
7. I may have to summarize just a bit
8. Part 1: 5 minutes of math
9. Part 2: 12 minutes: I want a pony
10. Part 3: A working example
11. What Does Machine Learning Look Like?
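12. What Does Machine Learning Look Like?
   $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
   $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}, \qquad r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
   $$O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d) \quad \text{for small } k \text{, high quality}$$
   $$O(\kappa d \log k) \text{ or } O(d \log \kappa \log k) \quad \text{for larger } k \text{, looser quality}$$
   But tonight we’re going to show you how to keep it simple yet powerful…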
13. Comparison of Three Main ML Topics
   • Recommendation:
     – Involves observing interactions between people taking action (users) and items; these interactions are the input data for the recommender model
     – Goal is to suggest additional appropriate or desirable interactions
     – Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
16. Part 1: A bit of math (the math of bits)
17. Mahout Math
   • Goals are
     – basic linear algebra,
     – and statistical sampling,
     – and good clustering,
     – decent speed,
     – extensibility,
     – especially for sparse data
   • But not
     – totally badass speed
     – comprehensive set of algorithms
     – optimization, root finders, quadrature
18. Matrices and Vectors
   • At the core:
     – DenseVector, RandomAccessSparseVector
     – DenseMatrix, SparseRowMatrix
   • Highly composable API
   • Important ideas:
     – view*, assign and aggregate
     – iteration
   m.viewDiagonal().assign(v)
19. Assign? View?
   • Why assign?
     – Copying is the major cost for naïve matrix packages
     – In-place operations are critical to reasonable performance
     – Many kinds of updates are required, so a functional style is very helpful
   • Why view?
     – In-place operations are often required for blocks, rows, columns or diagonals
     – With views, we need #assign + #views methods
     – Without views, we need #assign × #views methods
   • Synergies
     – With both views and assign, many loops become a single line
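The view-plus-assign idea can be sketched without Mahout. In this minimal Java sketch, `VectorView`, `rowView` and `diagonalView` are hypothetical names, not Mahout classes. Each view is a read/write window onto the same backing array, and a single default `assign` method serves every kind of view. That is the #assign + #views trade-off from the slide.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Plain-Java sketch of "views + assign" (no Mahout). All names are hypothetical.
public class ViewAssignSketch {

    /** A read/write window onto some cells of a backing array. */
    public interface VectorView {
        int size();
        double get(int i);
        void set(int i, double value);

        /** One assign method works for every kind of view: #assign + #views, not #assign x #views. */
        default VectorView assign(DoubleUnaryOperator f) {
            for (int i = 0; i < size(); i++) {
                set(i, f.applyAsDouble(get(i)));
            }
            return this;
        }
    }

    /** Row r of a row-major matrix with the given number of columns. */
    public static VectorView rowView(double[] data, int cols, int r) {
        return new VectorView() {
            public int size() { return cols; }
            public double get(int i) { return data[r * cols + i]; }
            public void set(int i, double v) { data[r * cols + i] = v; }
        };
    }

    /** Main diagonal of an n x n row-major matrix. */
    public static VectorView diagonalView(double[] data, int n) {
        return new VectorView() {
            public int size() { return n; }
            public double get(int i) { return data[i * n + i]; }
            public void set(int i, double v) { data[i * n + i] = v; }
        };
    }

    public static void main(String[] args) {
        double[] m = {1, 2, 3,
                      4, 5, 6,
                      7, 8, 9};
        diagonalView(m, 3).assign(x -> 0.0);    // writes through to m, no copy
        rowView(m, 3, 2).assign(x -> 10 * x);   // scales the last row in place
        System.out.println(Arrays.toString(m)); // [0.0, 2.0, 3.0, 4.0, 0.0, 6.0, 70.0, 80.0, 0.0]
    }
}
```

Adding a new kind of view (a column, a block) means writing one more small view class. Every existing `assign` then works on it unchanged.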
20. Examples
   • A = α
     double alpha; a.assign(alpha);
   • A = αB + β
     a.assign(b, Functions.chain(Functions.plus(beta), Functions.times(alpha)));
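Here is a dependency-free sketch of the same functional, in-place update. It assumes plain `double[]` arrays in place of Mahout vectors. The helpers `assign` and `chain` are hypothetical, not Mahout's `Functions` API. `chain(g, h)` composes as g(h(x)) using the JDK's `DoubleUnaryOperator.compose`, and the update allocates no temporary arrays.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Dependency-free sketch of functional, in-place assignment: a = alpha * b + beta.
// assign(...) and chain(...) are hypothetical helpers, not Mahout's API.
public class AssignSketch {

    /** Overwrites a[i] with f(b[i]) for every i; no temporary arrays are allocated. */
    public static double[] assign(double[] a, double[] b, DoubleUnaryOperator f) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = f.applyAsDouble(b[i]);
        }
        return a;
    }

    /** chain(g, h) returns the function x -> g(h(x)). */
    public static DoubleUnaryOperator chain(DoubleUnaryOperator g, DoubleUnaryOperator h) {
        return g.compose(h);
    }

    public static void main(String[] args) {
        double alpha = 2.0, beta = 1.0;
        double[] a = new double[3];
        double[] b = {1.0, 2.0, 3.0};

        Arrays.fill(a, alpha);                                  // A = alpha
        assign(a, b, chain(x -> x + beta, x -> alpha * x));     // A = alpha * B + beta
        System.out.println(Arrays.toString(a));                 // [3.0, 5.0, 7.0]
    }
}
```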
21. More Examples
   • The trace of a matrix
   • Set diagonal to zero
   • Set diagonal to negative of row sums
22. Examples
   • The trace of a matrix: m.viewDiagonal().zSum()
   • Set diagonal to zero
   • Set diagonal to negative of row sums
23. Examples
   • The trace of a matrix: m.viewDiagonal().zSum()
   • Set diagonal to zero: m.viewDiagonal().assign(0)
   • Set diagonal to negative of row sums
24. Examples
   • The trace of a matrix: m.viewDiagonal().zSum()
   • Set diagonal to zero: m.viewDiagonal().assign(0)
   • Set diagonal to negative of row sums excluding the diagonal:
     Vector diag = m.viewDiagonal().assign(0);
     diag.assign(m.rowSums().assign(Functions.NEGATE));
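For comparison, this sketch shows what the three one-liners have to do when written as explicit loops over a plain `double[][]`. It is not Mahout code, and the method names are illustrative. Zeroing the diagonal first is what makes the row sums exclude it. After the last operation every row sums to zero.

```java
import java.util.Arrays;

// Plain-Java sketch of the three diagonal examples on a dense double[][] (no Mahout).
public class DiagonalExamples {

    /** Trace: sum of the main diagonal. */
    public static double trace(double[][] m) {
        double sum = 0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    /** Sets every diagonal element to zero, in place. */
    public static void zeroDiagonal(double[][] m) {
        for (int i = 0; i < m.length; i++) {
            m[i][i] = 0;
        }
    }

    /** Replaces each diagonal element with minus the sum of the other entries in its row. */
    public static void diagonalToNegativeRowSums(double[][] m) {
        zeroDiagonal(m);  // so the row sums below exclude the diagonal
        for (int i = 0; i < m.length; i++) {
            double rowSum = 0;
            for (double x : m[i]) {
                rowSum += x;
            }
            m[i][i] = -rowSum;
        }
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 0}, {2, 5, 3}, {0, 3, 9}};
        System.out.println(trace(m));  // 15.0
        diagonalToNegativeRowSums(m);
        System.out.println(Arrays.deepToString(m));
        // [[-2.0, 2.0, 0.0], [2.0, -5.0, 3.0], [0.0, 3.0, -3.0]]
    }
}
```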
25. Clustering and Such
   • Streaming k-means and ball k-means
     – streaming reduces very large data to a cluster sketch
     – ball k-means is a high quality k-means implementation
     – the cluster sketch is also usable for other applications
     – single machine threaded and map-reduce versions available
   • SVD and friends
     – stochastic SVD has in-memory, single machine out-of-core and map-reduce versions
     – good for reducing very large sparse matrices to tall skinny dense ones
   • Spectral clustering
     – based on SVD, allows massive dimensional clustering
26. Mahout Math Summary
   • Matrices, Vectors
     – views
     – in-place assignment
     – aggregations
     – iterations
   • Functions
     – lots built-in
     – cooperate with sparse vector optimizations
   • Sampling
     – abstract samplers
     – samplers as functions
   • Other stuff … clustering, SVD
27. Part 2: How recommenders work (I still want a pony)
28. Recommendations
   Behavior of a crowd helps us understand what individuals will do
29. Recommendations
   Alice got an apple and a puppy
   Charles got a bicycle
30. Recommendations
   Alice got an apple and a puppy
   Charles got a bicycle
   Bob got an apple
31. Recommendations
   What else would Bob like??
32. Recommendations
   What if everybody gets a pony?
   Now what does Bob want?
33. Log Files
   users: Alice Bob Charles Alice Bob Charles Alice
34. Log Files
   users:  u1 u3 u2 u1 u3 u2 u1
   things: t1 t2 t3 t4 t3 t3 t1
35. Log Files and Dimensions
   users:  u1 u3 u2 u1 u3 u2 u1
   things: t1 t2 t3 t4 t3 t3 t1
   Things: t1, t2, t3, t4
   Users: u1 (Alice), u3 (Bob), u2 (Charles)
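This is a minimal sketch of the step from log records to the history matrix. It assumes each log record is a (user, thing) pair. The records below come from the Alice/Bob/Charles story on the earlier slides (including the pony everybody gets), not from a real log, and `HistoryMatrixSketch` is a hypothetical name. Repeated records collapse into a single check mark, so the rows come out with three, two and two marks.

```java
import java.util.*;

// Sketch: turn (user, thing) log records into a binary users x things history matrix.
public class HistoryMatrixSketch {

    public final Map<String, Integer> userIndex = new LinkedHashMap<>();
    public final Map<String, Integer> thingIndex = new LinkedHashMap<>();
    public final boolean[][] history;

    public HistoryMatrixSketch(List<String[]> log) {
        // First pass: give each user and thing a dense index in first-seen order.
        for (String[] record : log) {
            userIndex.putIfAbsent(record[0], userIndex.size());
            thingIndex.putIfAbsent(record[1], thingIndex.size());
        }
        // Second pass: mark one cell per (user, thing); duplicates hit the same cell.
        history = new boolean[userIndex.size()][thingIndex.size()];
        for (String[] record : log) {
            history[userIndex.get(record[0])][thingIndex.get(record[1])] = true;
        }
    }

    /** Number of distinct things in one user's row. */
    public int countFor(String user) {
        int n = 0;
        for (boolean marked : history[userIndex.get(user)]) {
            if (marked) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String[]> log = List.of(
            new String[] {"Alice", "apple"},
            new String[] {"Alice", "puppy"},
            new String[] {"Charles", "bicycle"},
            new String[] {"Bob", "apple"},
            new String[] {"Alice", "pony"},
            new String[] {"Bob", "pony"},
            new String[] {"Charles", "pony"},
            new String[] {"Alice", "apple"});  // repeated record, same cell
        HistoryMatrixSketch h = new HistoryMatrixSketch(log);
        for (Map.Entry<String, Integer> user : h.userIndex.entrySet()) {
            System.out.println(user.getKey() + " " + Arrays.toString(h.history[user.getValue()]));
        }
    }
}
```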
36. History Matrix
   Alice    ✔ ✔ ✔
   Bob      ✔ ✔
   Charles  ✔ ✔
37. Cooccurrence Matrix
   (figure cell values: 1 2 1 1 1 1 2 1)
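Cooccurrence counts are the off-diagonal entries of $A^TA$. One way to sketch them is to count every pair of distinct things in each user's history. This assumes each history fits in memory as a set of things, and it is an illustration, not how Mahout's map-reduce job is implemented. With the story data, the pony co-occurs with every other item. That is exactly the problem the following slides take up.

```java
import java.util.*;

// Sketch: item-item cooccurrence counts from per-user histories, pair by pair.
public class CooccurrenceSketch {

    /** Counts, for each unordered pair of distinct things, how many users have both. */
    public static Map<String, Integer> cooccurrence(Map<String, Set<String>> histories) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> things : histories.values()) {
            List<String> sorted = new ArrayList<>(new TreeSet<>(things));  // stable pair keys
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + "+" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> histories = new LinkedHashMap<>();
        histories.put("Alice", Set.of("apple", "puppy", "pony"));
        histories.put("Bob", Set.of("apple", "pony"));
        histories.put("Charles", Set.of("bicycle", "pony"));
        System.out.println(cooccurrence(histories));
        // {apple+pony=2, apple+puppy=1, bicycle+pony=1, pony+puppy=1}
    }
}
```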
38. Indicator Matrix
   ✔
39. Indicator Matrix
   ✔
   id: t4
   title: puppy
   desc: The sweetest little puppy ever.
   keywords: puppy, dog, pet
   indicators: (t1)
40. Problems with Raw Cooccurrence
   • Very popular items co-occur with everything
     – Welcome document
     – Elevator music
   • That isn’t interesting
     – We want anomalous cooccurrence
41. Recommendation Basics
   • Cooccurrence
                 t3    not t3
     t1           2       1
     not t1       1       1
42. Spot the Anomaly
   • Root LLR is roughly like standard deviations

               A      not A
     B         13     1000
     not B     1000   100,000      root LLR 0.44

               A      not A
     B         1      0
     not B     0      2            root LLR 0.98

               A      not A
     B         1      0
     not B     0      10,000       root LLR 2.26

               A      not A
     B         10     0
     not B     0      100,000      root LLR 7.15
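The scores on this slide come from the log-likelihood ratio (G²) test applied to each 2×2 table. The Java sketch below uses the usual entropy formulation with a signed square root. The class and method names are illustrative, and the code is not copied from Mahout. Scaling conventions for "root LLR" vary, so the printed values may not equal the slide's numbers, but they rank the four tables in the same order. The argument order assumes k11 = (B, A), k12 = (B, not A), k21 = (not B, A) and k22 = (not B, not A), matching the table layout above.

```java
// Sketch of the log-likelihood ratio (G^2) for a 2x2 table, plus a signed root.
public class Llr {

    /** x * ln(x), with 0 * ln(0) taken as 0. */
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of a list of counts: xLogX(total) - sum of xLogX(count). */
    private static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            total += k;
            sumXLogX += xLogX(k);
        }
        return xLogX(total) - sumXLogX;
    }

    /** G^2 for the table [[k11, k12], [k21, k22]]; rows are B / not B, columns A / not A. */
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Square root of G^2, negated when A is less frequent with B than without it. */
    public static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        double aGivenB = (double) k11 / (k11 + k12);
        double aGivenNotB = (double) k21 / (k21 + k22);
        return aGivenB < aGivenNotB ? -root : root;
    }

    public static void main(String[] args) {
        long[][] tables = {          // {k11, k12, k21, k22}, in slide order
            {13, 1000, 1000, 100000},
            {1, 0, 0, 2},
            {1, 0, 0, 10000},
            {10, 0, 0, 100000},
        };
        for (long[] t : tables) {
            System.out.printf("%.2f%n", rootLogLikelihoodRatio(t[0], t[1], t[2], t[3]));
        }
    }
}
```

The first table has by far the largest counts but the weakest evidence of association. Large raw counts alone do not make a cooccurrence interesting.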
43. A Quick Simplification
   • Users who do h (a vector of things a user has done)
   • Also do r
     $Ah$: A translates things into users
     $A^T(Ah)$: user-centric recommendations (transpose translates back to things)
     $(A^TA)h$: item-centric recommendations (change the order of operations)
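This is a quick numeric check of the order-of-operations claim. It assumes a small dense users × things matrix built from the story slides (rows Alice, Bob, Charles; columns apple, puppy, bicycle, pony) and uses Bob's history as h. The helper names are illustrative. Both orders give the same scores, and among the items Bob does not have yet, the puppy scores highest.

```java
import java.util.Arrays;

// Checks that A^T (A h) and (A^T A) h give the same scores. Dense arrays, no Mahout.
public class RecommendSketch {

    public static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    public static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < b.length; k++) {
                for (int j = 0; j < b[0].length; j++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        // Rows: Alice, Bob, Charles. Columns: apple, puppy, bicycle, pony.
        double[][] a = {
            {1, 1, 0, 1},
            {1, 0, 0, 1},
            {0, 0, 1, 1},
        };
        double[] h = {1, 0, 0, 1};  // Bob: apple and pony

        double[] userCentric = multiply(transpose(a), multiply(a, h));  // A^T (A h)
        double[] itemCentric = multiply(multiply(transpose(a), a), h);  // (A^T A) h
        System.out.println(Arrays.toString(userCentric));              // [4.0, 2.0, 1.0, 5.0]
        System.out.println(Arrays.equals(userCentric, itemCentric));   // true
    }
}
```

The item-centric order lets $A^TA$ be computed once off-line, so each request needs only one multiplication by h.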
44. Symmetry Gives Cross Recommendations
   • $(A^TA)h$: conventional recommendations with off-line learning
   • $(B^TA)h$: cross recommendations
45. For example
   • Users enter queries (A)
     – (actor = user, item = query)
   • Users view videos (B)
     – (actor = user, item = video)
   • $A^TA$ gives query recommendation
     – “did you mean to ask for”
   • $B^TB$ gives video recommendation
     – “you might like these videos”
46. The punch-line
   • $B^TA$ recommends videos in response to a query
     – (isn’t that a search engine?)
     – (not quite, it doesn’t look at content or meta-data)
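Here is cross recommendation in miniature. The sketch assumes two small, made-up behavior matrices that share the same user rows: A (users × queries) and B (users × videos). Computing $B^TA$ gives a videos × queries table, and multiplying it by a query vector scores videos for that query. All names and data are hypothetical.

```java
import java.util.Arrays;

// Cross recommendation sketch: B^T A turns query activity into video scores.
// Both matrices share the same user rows; all data below is hypothetical.
public class CrossRecommendSketch {

    /** Returns B^T A (videos x queries) for B (users x videos) and A (users x queries). */
    public static double[][] crossCooccurrence(double[][] b, double[][] a) {
        int videos = b[0].length, queries = a[0].length;
        double[][] result = new double[videos][queries];
        for (int u = 0; u < a.length; u++) {           // one pass over users
            for (int v = 0; v < videos; v++) {
                if (b[u][v] == 0) {
                    continue;                           // skip empty cells
                }
                for (int q = 0; q < queries; q++) {
                    result[v][q] += b[u][v] * a[u][q];
                }
            }
        }
        return result;
    }

    public static double[] times(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += m[i][j] * x[j];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 0}, {1, 0}, {0, 1}};           // users x queries (q1, q2)
        double[][] b = {{1, 1, 0}, {1, 0, 0}, {0, 1, 1}};  // users x videos (v1, v2, v3)
        double[] query = {1, 0};                           // the user entered q1
        double[] scores = times(crossCooccurrence(b, a), query);
        System.out.println(Arrays.toString(scores));       // [2.0, 1.0, 0.0]
    }
}
```

Nothing here reads video titles or descriptions. The ranking comes only from which users did both things, which is the slide's point about content and meta-data.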
47. Real-life example
   • Query: “Paco de Lucia”
   • Conventional meta-data search results:
     – “hombres del paco” times 400
     – not much else
   • Recommendation-based search:
     – Flamenco guitar and dancers
     – Spanish and classical guitar
     – Van Halen doing a classical/flamenco riff
48. Real-life example
49. Hypothetical Example
   • Want a navigational ontology?
   • Just put labels on a web page with traffic
     – This gives A = users × label clicks
   • Remember viewing history
     – This gives B = users × items
   • Cross recommend
     – $B^TA$ = label-to-item mapping
   • After several users click, results are whatever users think they should be
50. Nice. But we can do better?
51. $A$ (rows: users; columns: things)
52. $\begin{bmatrix} A_1 & A_2 \end{bmatrix}$ (rows: users; columns: thing type 1, thing type 2)
53. $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
   $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}, \qquad r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
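Here is a small numeric sketch of the block form with made-up matrices. A1 holds one kind of behavior and A2 a second kind for the same three users. Scoring type-1 things needs only the two blocks in the top row, so $r_1 = A_1^TA_1h_1 + A_1^TA_2h_2$. For this data the result equals the first two entries of the full product $[A_1\ A_2]^T[A_1\ A_2][h_1; h_2]$. The helper names are illustrative.

```java
import java.util.Arrays;

// Block-form sketch: r1 = A1^T A1 h1 + A1^T A2 h2. All data is made up.
public class MultiModalSketch {

    /** Returns x^T y for two matrices with the same number of rows (users). */
    public static double[][] transposeTimes(double[][] x, double[][] y) {
        double[][] c = new double[x[0].length][y[0].length];
        for (int u = 0; u < x.length; u++) {
            for (int i = 0; i < x[0].length; i++) {
                for (int j = 0; j < y[0].length; j++) {
                    c[i][j] += x[u][i] * y[u][j];
                }
            }
        }
        return c;
    }

    public static double[] times(double[][] m, double[] v) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                y[i] += m[i][j] * v[j];
            }
        }
        return y;
    }

    public static double[] plus(double[] p, double[] q) {
        double[] s = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            s[i] = p[i] + q[i];
        }
        return s;
    }

    /** Scores for type-1 things using both kinds of history. */
    public static double[] scoreTypeOne(double[][] a1, double[][] a2, double[] h1, double[] h2) {
        return plus(times(transposeTimes(a1, a1), h1), times(transposeTimes(a1, a2), h2));
    }

    public static void main(String[] args) {
        double[][] a1 = {{1, 0}, {1, 1}, {0, 1}};  // users x type-1 things
        double[][] a2 = {{1}, {0}, {1}};           // users x type-2 things
        double[] h1 = {1, 0};                      // one user's type-1 history
        double[] h2 = {1};                         // the same user's type-2 history
        System.out.println(Arrays.toString(scoreTypeOne(a1, a2, h1, h2)));  // [3.0, 2.0]
    }
}
```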
54. Part 3: What about that worked example?
55. (diagram; components numbered as on the slide)
   (1) User behavior generator
   (2) Presentation tier
   (3) Session collector
   (4) Search engine
   (5) Metrics and logs
   (6) History collector
   (7) Cooccurrence analysis
   (8) Post to search engine
   (9) Diagnostic browsing
   http://bit.ly/18vbbaT
56. Analyze with Map-Reduce
   (diagram) Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexing (Solr indexers), Index shards
57. Deploy with Conventional Search System
   (diagram) User history, Web tier, Solr search (Solr indexers), Item meta-data, Index shards
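The deployment step can be imitated in memory to show the idea. Each item document carries an indicators field, and the user's recent history is the query. This sketch is a stand-in, not Solr. Solr would rank matches with its own relevance scoring (term weighting, field boosts), whereas here the score is simply the number of history items found in a document's indicators. The `t4`/`t1` document mirrors the indicator example slide; the `t9`/`t7` document and all class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for search-based recommendation (not Solr).
public class IndicatorSearchSketch {

    /** An item document as it might be indexed: id, title and an indicators field. */
    public static final class ItemDoc {
        public final String id;
        public final String title;
        public final Set<String> indicators;

        public ItemDoc(String id, String title, Set<String> indicators) {
            this.id = id;
            this.title = title;
            this.indicators = indicators;
        }
    }

    /** Ranks documents by how many of the user's history items appear in their indicators. */
    public static List<String> recommend(List<ItemDoc> docs, Set<String> history, int limit) {
        Map<String, Integer> scores = new HashMap<>();
        for (ItemDoc doc : docs) {
            if (history.contains(doc.id)) {
                continue;  // skip things the user already has
            }
            int matches = 0;
            for (String item : history) {
                if (doc.indicators.contains(item)) {
                    matches++;
                }
            }
            if (matches > 0) {
                scores.put(doc.id, matches);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by id so the order is deterministic.
        ranked.sort((x, y) -> {
            int byScore = Integer.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }

    public static void main(String[] args) {
        List<ItemDoc> docs = List.of(
            new ItemDoc("t4", "puppy", Set.of("t1")),          // from the indicator slide
            new ItemDoc("t9", "hypothetical", Set.of("t7")));  // made-up document
        System.out.println(recommend(docs, Set.of("t1", "t3"), 10));  // [t4]
    }
}
```

Because the indicators live in an ordinary document field, the heavy cooccurrence analysis stays off-line. The on-line path is then just a conventional search query.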
58. Objective Results
   • At a very large credit card company
   • History is all transactions
   • Development time to minimum viable product: about 4 months
   • General release 2–3 months later
   • Search-based recs at or equal in quality to other techniques
59. Summary
   • Input: Multiple kinds of behavior on one set of things
   • Output: Recommendations for one kind of behavior with a different set of things
   • Cross recommendation is a special case
61. Me, Us
   • Ted Dunning, Chief Application Architect, MapR
     – Committer and PMC member: Mahout, Zookeeper, Drill
     – Bought the beer at the first HUG
     – tdunning@{apache.org,maprtech.com}, ted.dunning@gmail.com
   • MapR
     – Distributes more open source components for Hadoop
     – Adds major technology for performance, HA, industry-standard APIs
   • Tonight
     – Hash tag: #dfwbd #mapr
     – See also: @ApacheMahout @ApacheDrill @ted_dunning and @mapR