# MIW Chapter 8.ppt

• Notation for the prediction formula: v̂_{a,j} = v̄_a + C · Σ_{i=1..n} w(a,i)(v_{i,j} − v̄_i), where v̂_{a,j} is the predicted value of the missing vote for item j in target user a's vector; C is a normalizing constant; w(a,i) is the weight for comparing users i and a; v̄_i = (1/|L_i|) Σ_{j ∈ L_i} v_{i,j} is user i's mean vote over L_i, the set of items user i has voted on; a is the target user and i ranges over the users in the training data set.
• K = number of mixture components; P(c = k) = probability that a randomly chosen row belongs to component k.

### 1. E-Commerce

### 2. Outline
- Introduction
- Customer Data on the Web
- Automated Recommender Systems
- Networks and Recommendations
- Web Path Analysis for Purchase Prediction

### 3. Introduction
- Some motivating questions:
  - Can we design algorithms to help recommend new products to visitors based on their browsing behavior?
  - Can we better understand the factors influencing how customers make purchases on a website?
  - Can we predict in real time who will make purchases, based on their observed navigation patterns?

### 4. Customer Data on the Web
- Data collection on the client side, the server side, and anywhere in between
- Goal: determine who is purchasing which products
- Tracking customer data
  - Web logs, e-commerce logs, cookies, explicit login
  - Data is then used to provide personalized content to site users to:
    - Assist customers in locating their target selections
    - "Encourage" customers to make certain selections
### 5. Automated Recommender Systems
- The problem is framed in two ways:
  - Users "vote" for pages/items (binary)
  - Users rank pages/items (multivalued)
- Results are captured in a generally sparse matrix (users × items)
- Complication: a missing vote can itself be informative, because users do not vote on items they do not like (Breese et al. 1998)
  - Ignored by most recommender systems
### 6. Automated Recommender Systems

### 7. Evaluating Recommender Systems
- Cautions in data interpretation:
  - Users may purchase items regardless of recommendations
  - Users may also avoid purchases they might otherwise have made, based on recommendations
- Approaches to recommender algorithms:
  - Nearest-neighbor
  - Model-based collaborative filtering
  - Others?

### 8. Nearest-Neighbor Collaborative Filtering
- Basic principle: use a user's vote history to predict future votes/recommendations
  - Find the users most similar to the target user in the training matrix, and fill in the target user's missing vote values based on these "nearest neighbors"
- A typical normalized prediction scheme:
- Goal: predict the vote for item j based on other users, weighted toward those whose past votes are similar to target user a's
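A minimal sketch of this kind of memory-based prediction, assuming a Pearson-correlation weight w(a, i) and an invented toy votes matrix (both are illustrative choices, not the chapter's prescribed implementation):

```python
import numpy as np

def predict_vote(votes, a, j):
    """Predict target user a's missing vote on item j from the
    mean-adjusted votes of other users, weighted by similarity.
    `votes` is a users x items array with np.nan marking missing votes."""

    def mean_vote(i):
        # User i's mean vote over the items they actually voted on
        return np.nanmean(votes[i])

    def weight(a, i):
        # Pearson correlation over the items both users have voted on
        both = ~np.isnan(votes[a]) & ~np.isnan(votes[i])
        if both.sum() < 2:
            return 0.0
        va, vi = votes[a, both], votes[i, both]
        denom = np.std(va) * np.std(vi)
        if denom == 0:
            return 0.0
        return np.mean((va - va.mean()) * (vi - vi.mean())) / denom

    num, norm = 0.0, 0.0
    for i in range(votes.shape[0]):
        if i == a or np.isnan(votes[i, j]):
            continue
        w = weight(a, i)
        num += w * (votes[i, j] - mean_vote(i))
        norm += abs(w)
    # Normalize so the weighted correction stays on the vote scale
    return mean_vote(a) + (num / norm if norm > 0 else 0.0)

votes = np.array([[5.0, 4.0, np.nan],
                  [5.0, 4.0, 2.0],
                  [1.0, 2.0, 5.0]])
pred = predict_vote(votes, a=0, j=2)  # user 0's missing vote on item 2
```

In the toy matrix, user 1 votes like the target and user 2 votes the opposite way, so the prediction is pulled below the target user's own mean vote.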
### 9. Nearest-Neighbor Collaborative Filtering
- Another challenge: defining the weights
  - What is the optimal weight calculation to use?
    - Requires fine-tuning the weighting algorithm for the particular data set
  - What do we do when the target user has not voted enough to provide a reliable set of nearest neighbors?
    - One approach: use default votes (popular items) to populate the matrix for items that neither the target user nor the nearest neighbor has voted on
    - A different approach: model-based prediction using Dirichlet priors to smooth the votes (see Chapter 7)
    - Other factors include relative vote counts for all items between users, thresholding, and clustering (see Sarwar 2000)
### 10. Nearest-Neighbor Collaborative Filtering
- Structure-based recommendations
  - Recommendations based on similarities between items with positive votes (as opposed to the votes of other users)
  - The structure of item dependencies is modeled through dimensionality reduction via singular value decomposition (SVD), aka latent semantic indexing (see Chapter 4)
    - Approximate the set of row-vector votes as a linear combination of basis column vectors
      - i.e., find the set of columns that minimizes, in the least-squares sense, the difference between the row estimates and their true values
    - Perform nearest-neighbor calculations to project predictions for all items
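The SVD step can be sketched with NumPy; the toy votes matrix and the choice of rank k = 2 are assumptions for illustration only:

```python
import numpy as np

# Toy users x items votes matrix (values are illustrative only)
votes = np.array([[5.0, 4.0, 1.0, 1.0],
                  [4.0, 5.0, 1.0, 2.0],
                  [1.0, 1.0, 5.0, 4.0],
                  [2.0, 1.0, 4.0, 5.0]])

# SVD factors the matrix; truncating to the top k singular values gives
# the best rank-k least-squares approximation of the row vectors.
U, s, Vt = np.linalg.svd(votes, full_matrices=False)
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Items now live in a k-dimensional basis space, so item-item similarity
# (here, cosine) can be computed there instead of over raw sparse votes.
item_factors = Vt[:k, :].T  # items x k
sim_01 = item_factors[0] @ item_factors[1] / (
    np.linalg.norm(item_factors[0]) * np.linalg.norm(item_factors[1]))
```

The Frobenius error of the rank-k approximation equals the root sum of squares of the discarded singular values, which is exactly the least-squares property the slide describes.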
### 11. Model-Based Collaborative Filtering
- Recommendations based on a model of the relationships between items, built from historical voting patterns in the training set
  - Better performance than nearest-neighbor analysis
- Joint distribution modeling
  - Uses one model as the basis for predictions
- Conditional distribution modeling
  - A model for each item, predicting its future vote based on the votes for each of the other items
### 12. Model-Based Collaborative Filtering
- Joint distribution modeling: a practical approach
  - Model the joint distribution as a finite mixture of simpler distributions
  - Additional simplification is achieved by assuming that votes are independent of one another within a component
- Limitation: assumes that each user can be described by a single one of the K mixture components
  - Hofmann and Puzicha (1999) propose a workaround asserting that each row of votes can represent up to K mixture components, rather than a single component
### 13. Model-Based Collaborative Filtering
- Another limitation: all predictions are based on the (static) training set
- Conditional distribution modeling
  - Better results from creating a model for each item conditioned on the others, rather than using a single joint density model
  - Decision trees, Heckerman et al. (2000)
    - Greedy approach to approximating the tree structure
    - Predictions are made for each item not purchased or visited
    - Performance
      - Accuracy nearly equal to Bayesian networks
      - Offline memory usage significantly less than Bayesian networks
      - Offline computation time complexity better than Bayesian networks
### 14. Model-Based Combining of Votes and Content
- Combine content-specific information with other information (e.g., structure, votes)
  - Useful for determining item similarity (Mooney and Roy 2000) and for creating user models
  - Useful when there is no vote history
  - Implementation (Popescul et al. 2000)
    - Extension of Hofmann and Puzicha (1999)
    - The joint density is determined by assuming a hidden latent variable that makes users, documents, and words conditionally independent, i.e.:
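The equation the slide points at is an image that did not survive this transcript; in the standard three-way aspect model that Popescul et al. (2000) build on, the conditional-independence assumption gives a factorization of the form

```latex
P(u, d, w) = \sum_{z} P(z)\, P(u \mid z)\, P(d \mid z)\, P(w \mid z)
```

where u is a user, d a document, w a word, and z the hidden topic variable.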
### 15. Model-Based Combining of Votes and Content
- The hidden variable represents the multiple (hidden) topics of a document
- Conditional probabilities involving the hidden variable are calculated using EM
- Sparsity still remains a problem for content-based modeling

### 16. Challenges
- Noisy data
  - The same user may use multiple IP addresses/logins
  - Different users may use the same IP address/login
- Privacy
  - No cookies!
- Changing user habits
  - Previous history may not accurately predict present purchase selections
- Continuous updating of user activities
### 17. Networks & Recommendation
- Word-of-mouth
  - Needs little explicit advertising
  - Products are recommended to friends, family, co-workers, etc.
  - This is the primary form of advertising behind the growth of Google

### 18. Email Product Recommendation
- Hotmail
  - Very little direct advertising in the beginning
  - Launched in July 1996
    - 20,000 subscribers after one month
    - 100,000 subscribers after 3 months
    - 1,000,000 subscribers after 6 months
    - 12,000,000 subscribers after 18 months
    - By April 2002, Hotmail had 110 million subscribers

### 19. Email Product Recommendation
- What was Hotmail's primary form of advertising?
  - A small link to the sign-up page at the bottom of every email sent by a subscriber
    - "Spreading activation"
    - An implicit recommendation
### 20. Spreading Activation
- Network effects
  - Even if only a small fraction (~0.1%) of the people who receive the message subscribe, the service will spread rapidly
  - This can be contrasted with the practice of spam
    - Spam is not sent by friends, family, or co-workers
    - No implicit recommendation
    - Spam is often viewed as not providing a good service
### 21. Modeling Spreading Activation
- Diffusion model
  - Montgomery (2002)
    - Applied models from the marketing literature, Bass (1969), to the Hotmail phenomenon
    - Similar word-of-mouth networks have been used in selling consumer electronics such as refrigerators and televisions
    - We want to predict how many individuals k(t) will have adopted the product by time t, out of a population of N possible adopters
### 22. Modeling Spreading Activation
- Diffusion model
  - Two ways individuals will subscribe:
    - Direct advertising
      - At time t, N − k(t) individuals have not subscribed
      - A fraction α ≥ 0 of these individuals will subscribe due to direct advertising
    - Word-of-mouth
      - At time t, there are k(t)(N − k(t)) possible connections between subscribers and non-subscribers
      - A fraction β ≥ 0 of these connections will cause a non-subscriber to subscribe
### 23. Modeling Spreading Activation
- Combining these two mechanisms gives a differential equation for k(t)
- Solving it yields the adoption curve over time
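The slide's equation images are not reproduced in this transcript. Combining the two mechanisms from the previous slide in the standard Bass (1969) way gives dk/dt = (α + β·k(t))(N − k(t)); the sketch below integrates this assumed form numerically, with purely illustrative parameter values (not fitted to the Hotmail data):

```python
import numpy as np

def simulate_bass(N, alpha, beta, t_max, dt=0.1):
    """Euler integration of dk/dt = (alpha + beta*k) * (N - k), where
    alpha is the direct-advertising rate acting on the N - k
    non-subscribers, and beta is the word-of-mouth rate acting on the
    k*(N - k) subscriber/non-subscriber connections."""
    steps = int(t_max / dt)
    k = np.empty(steps + 1)
    k[0] = 0.0
    for i in range(steps):
        k[i + 1] = k[i] + dt * (alpha + beta * k[i]) * (N - k[i])
    return k

# Illustrative parameters: 1M potential adopters, weak direct advertising,
# word-of-mouth dominating once k(t) grows.
k = simulate_bass(N=1_000_000, alpha=0.001, beta=1e-8, t_max=1000)
```

The resulting curve is the familiar S-shape: slow early growth driven by direct advertising, a steep word-of-mouth middle phase, then saturation as N − k(t) shrinks.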
### 24. Modeling Spreading Activation

### 25. Modeling Spreading Activation
### 26. Modeling Spreading Activation
- Diffusion model
  - This does not completely model what actually occurred
  - However, it is simple and provides a lot of interesting (and useful) information
  - Other work
    - Domingos & Richardson (2001): Markov random field model
    - Daley & Gani (1999): various deterministic and stochastic models
### 27. Purchase Prediction
- We want to predict whether or not a shopper will make a purchase
  - We know the demographics
  - We know the page-view patterns
  - Can we accurately predict whether the user will make a purchase or not?
### 28. Purchase Prediction
- Li et al. (2002)
  - Studied 1160 shoppers at www.barnesandnoble.com between April 1 and April 30, 2002
  - The data was collected client-side, so they knew exactly which pages were displayed to the user
  - They also knew the demographics (predominantly well-educated and affluent)
### 29. Purchase Prediction
- Li et al. (2002)
  - There were 14,512 page views, which they divided into 1659 sessions
    - Mean: 8.75
    - Median: 5
    - Standard deviation: 16.4
    - Min: 1
    - Max: 570
    - 7% of sessions contained a purchase
### 30. Purchase Prediction
- Li et al. (2002)
  - Divided the pages into 8 classes:
    - Home (H): main page
    - Account (A): account information pages
    - List (L): pages with lists of items
    - Product (P): pages with a single item
    - Information (I): informational pages (shipping, etc.)
    - Shopping cart (S)
    - Order (O): indicates a completed order
    - Entry or Exit (E): entering or leaving the site
### 31. Purchase Prediction
- Li et al. (2002)
  - Each session was represented by a string of the form: I H H I I L I I E
  - A session containing an O is considered to have made a purchase
  - The average length of a session with a purchase was 34.5 page views; without a purchase, only 6.8
### 32. Purchase Prediction
- Markov transition matrix
  - For sessions with no purchase
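The transition matrix on the slide is an image and is not reproduced here, but estimating such a matrix from session strings is straightforward; the sessions below are invented for illustration and are not Li et al.'s data:

```python
from collections import defaultdict

def transition_matrix(sessions):
    """Maximum-likelihood transition probabilities between page classes,
    estimated from space-separated session strings like 'E H L P E'."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        pages = session.split()
        for cur, nxt in zip(pages, pages[1:]):  # consecutive page pairs
            counts[cur][nxt] += 1
    # Normalize each row so the outgoing probabilities sum to 1
    return {cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
            for cur, row in counts.items()}

sessions = ["E H L P L P S E",   # invented no-purchase sessions
            "E H I L P E",
            "E I H H L E"]
T = transition_matrix(sessions)
```

Each row of T is the conditional distribution over the next page class; separate matrices can be estimated for purchase and no-purchase sessions and then compared.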
### 33. Purchase Prediction
- Li et al. (2002)
  - They built several models based on this data
    - Tested on predicting the next page and predicting a purchase
    - The best models were 64% accurate at predicting the next page
    - After 2 page views, the best models predicted 12% true positives and 5.3% false positives
    - After 6 page views: 13.1% true positives and 2.9% false positives