Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Predictive analytics in action real... by The Marketing Dis... 13589 views
- Predictive Analytics using R by Jeffrey Stricklan... 11803 views
- Practical Predictive Analytics by Jen Underwood 29437 views
- Predictive Analytics - An Overview by MachinePulse 14169 views
- Introduction To Predictive Analytic... by jayroy 15123 views
- Predictive Analytics: Context and U... by Kimberley Mitchell 15396 views

20,231 views

Published on

Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group: http://www.meetup.com/NYC-Predictive-Analytics/ on April 1, 2010 (no joke!) :)

Published in:
Technology

No Downloads

Total views

20,231

On SlideShare

0

From Embeds

0

Number of Embeds

1,528

Shares

0

Downloads

9

Comments

1

Likes

41

No embeds

No notes for slide

- 1. BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com
- 2. Outline Predictivemodeling methodology k-Nearest Neighbor (kNN) algorithm Singular value decomposition (SVD) method for dimensionality reduction Using a synthetic data set to test and improve your model Experiment and results 2
- 3. The Business Problem Design product recommender solution that will increase revenue. $$ 3
- 4. How Do We Increase Revenue? Increase Conversion Increase Revenue Increase Unit Price Increase Avg. Order Value Increase Units / Order 4
- 5. Example Is this recommendation effective? Increase Unit Price Increase Units / Order 5
- 6. What am I going to do? 6
- 7. Predictive Model Framework ML Prediction Data Features Algorithm Output What data? What feature? Which Algorithm ? Cross-sell & Up-sell Recommendation 7
- 8. What Data to Use? Explicit data Ratings Comments Implicit data Order history / Return history Cart events Page views Click-thru Search log In today’s talk we only use Order history and Cart events 8
- 9. Predictive Model ML Prediction Data Features Algorithm Output Order History What feature? Which Algorithm ? Cross-sell & Up-sell Cart Events Recommendation 9
- 10. What Features to Use? We know that a given product tends to get purchased by customers with similar tastes or needs. Use user engagement data to describe a product. users 1 2 3 4 5 6 7 8 9 10 … n item 17 1 .25 .25 1 .25 user engagement vector 10
- 11. Data Representation / Features When we merge every item’s user engagement vector, we got a m x n item-user matrix users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .25 1 .25 2 .25 items 3 1 .25 1 4 .25 1 .25 1 1 1 … m 11
- 12. Data Normalization Ensurethe magnitudes of the entries in the dataset matrix are appropriate users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .5 1 .9 1 .92 1 .49 2 1 .79 items 3 1 .67 1 .46 1 .73 4 1 .39 1 .82 1 .76 1 .69 1 1 … … .52 .8 m Removecolumn average – so frequent buyers 12 don’t dominate the model
- 13. Data Normalization Differentengagement data points (Order / Cart / Page View) should have different weights Common normalization strategies: Remove column average Remove row average Remove global mean Z-score Fill-in the null values 13
- 14. Predictive Model ML Prediction Data Features Algorithm Output Order History User engagement Which Algorithm ? Cross-sell & Up-sell Cart Events vector Recommendation Data Normalization 14
- 15. Which Algorithm? How do we find the items that have similar user engagement data? users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .25 1 1 2 1 items 17 1 1 1 .25 .25 18 1 .25 1 1 1 1 … .25 m We can find the items that have similar user 15 engagement vectors with kNN algorithm
- 16. k-Nearest Neighbor (kNN) Find the k items that have the most similar user engagement vectors users 1 2 3 4 5 6 7 8 9 10 … n 1 .5 1 1 1 2 1 .5 1 items 3 1 1 1 1 4 1 .5 1 1 .5 1 … m 1 .5 Nearest Neighbors of Item 4 = [2,3,1] 16
- 17. Similarity Measure for kNN users 1 2 3 4 5 6 7 8 9 10 … n items 2 1 .5 1 4 1 .5 1 1 Jaccard coefficient: (1+ 1) sim(a,b) = (1+ 1+ 1) + (1+ 1+ 1+ 1) − (1+ 1) Cosine similarity: a•b (1*1+ 0.5 *1) sim(a,b) = cos(a,b) = = € a ∗ b (12 + 0.5 2 + 12 ) * (12 + 0.5 2 + 12 + 12 ) 2 2 Pearson Correlation: € corr(a,b) = ∑ (r − r )(r − r ) i ai a bi b = m∑ aibi − ∑ ai ∑ bi ∑ (r − r ) ∑ (r − r ) i ai a 2 i bi b 2 m∑ ai2 − (∑ ai ) 2 m∑ bi2 − (∑ bi ) 2 17 match _ cols * Dotprod(a,b) − sum(a) * sum(b) = match _ cols * sum(a 2 ) − (sum(a)) 2 match _ cols * sum(b 2 ) − (sum(b)) 2
- 18. k-Nearest Neighbor (kNN) Item feature space Similarity Measure (cosine similarity) 7 9 8 2 1 4 6 5 3 kNN k=5 Nearest Neighbors(8) = [9,6,3,1,2] 18
- 19. Predictive Model Ver. 1: kNN ML Prediction Data Features Algorithm Output Order History User engagement k-Nearest Neighbor Cross-sell & Up-sell Cart Events vector (kNN) Recommendation Data Normalization 19
- 20. Cosine Similarity – Code fragment long i_cnt = 100000; // number of items 100K long u_cnt = 2000000; // number of users 2M double data[i_cnt][u_cnt]; // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation) double norm[i_cnt]; // assume data matrix is loaded …… // calculate vector norm for each user engagement vector for (i=0; i<i_cnt; i++) { norm[i] = 0; for (f=0; f<u_cnt; f++) { norm[i] += data[i][f] * data [i][f]; } 1. 100K rows x 100K rows x 2M features --> scalability problem norm[i] = sqrt(norm[i]); kd-tree, Locality sensitive hashing, } MapReduce/Hadoop, Multicore/Threading, Stream Processors // cosine similarity calculation 2. data[i] is high-dimensional and sparse, similarity measures for (i=0; i<i_cnt; i++) { // loop thru 100Knot reliable --> accuracy problem are for (j=0; j<i_cnt; j++) { // loop thru 100K dot_product = 0; This leads us to The SVD dimensionality reduction ! for (f=0; f<u_cnt; f++) { // loop thru entire user space 2M dot_product += data[i][f] * data[j][f]; } printf(“%d %d %lfn”, i, j, dot_product/(norm[i] * norm[j])); } 20 // find the Top K nearest neighbors here …….
- 21. Singular Value Decomposition (SVD) A = U × S ×VT A U S VT m x n matrix m x r matrix r x r matrix r x n matrix € items items rank = k k<r users users users Ak = U k × Sk × VkT Low rank approx. Item profile is U k * Sk items Low rank approx. User profile is S k *VkT 21 € Low rank approx. Item-User matrix is € U k * Sk * Sk *VkT €
- 22. Reduced SVD Ak = U k × Sk × VkT Ak Uk Sk VkT 100K x 2M matrix 100K x 3 matrix 3 x 3 matrix 3 x 2M matrix 7 0 0 0 3 0 items items 0 0 1 users rank = 3 Descending Singular Values users Low rank approx. Item profile is U k * Sk 22 €
- 23. SVD Factor Interpretation S 3 x 3 matrix Singular values plot (rank=512) 7 0 0 0 3 0 0 0 1 Descending Singular Values 23 More Significant Latent Factors Noises + Others Less Significant
- 24. SVD Dimensionality Reduction U k * Sk <----- latent factors -----> # of users € items 3 rank Need to find the most optimal low rank !! 10 24
- 25. Missing values Difference between “0” and “unknown” Missing values do NOT appear randomly. Value = (Preference Factors) + (Availability) – (Purchased elsewhere) – (Navigation inefficiency) – etc. Approx. Value = (Preference Factors) +/- (Noise) Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set 25
- 26. Singular Value Decomposition (SVD) Use SVD to reduce dimensionality, so neighborhood formation happens in reduced user space SVD helps model to find the low rank approx. dataset matrix, while retaining the critical latent factors and ignoring noise. Optimal low rank needs to be tuned SVD is computationally expensive SVD Libraries: Matlab [U, S, V] = svds(A,256); SVDPACKC http://www.netlib.org/svdpack/ SVDLIBC http://tedlab.mit.edu/~dr/SVDLIBC/ GHAPACK http://www.dcs.shef.ac.uk/~genevieve/ml.html 26
- 27. Predictive Model Ver. 2: SVD+kNN ML Prediction Data Features Algorithm Output Order History User engagement k-Nearest Neighbors Cross-sell & Up-sell Cart Events vector (kNN) in reduced Recommendation space Data Normalization SVD 27
- 28. Synthetic Data Set Why do we use synthetic data set? Sowe can test our new model in a controlled environment 28
- 29. Synthetic Data Set 16latent factors synthetic e-commerce data set Dimension: 1,000 (items) by 20,000 (users) 16 user preference factors 16 item property factors (non-negative) Txn Set: n = 55,360 sparsity = 99.72 % Txn+Cart Set: n = 192,985 sparsity = 99.03% Download: http://www.IntelligentMining.com/dataset/ user_id item_id type 10 42 0.25 10 997 0.25 10 950 0.25 29 11 836 0.25 11 225 1
- 30. Synthetic Data Set Item property User preference Purchase Likelihood score 1K x 20K matrix factors factors 1K x 16 matrix 16 x 20K matrix X11 X12 X13 X14 X15 X16 x X21 X22 X12 X24 X25 X26 y items X31 X32 X33 X34 X35 X36 a b c z X41 X42 X43 X44 X45 X46 X51 X52 X53 X54 X55 X56 users X32 = (a, b, c) . (x, y, z) = a * x + b * y + c * z X32 = Likelihood of Item 3 being purchased by User 2 30
- 31. Synthetic Data Set X11 X31 X51 Based on the distribution, pre-determine # of items X21 X41 purchased by an user X41 (# of item=2) X31 Sort by Purchase X21 From the top, select and skip X31 likelihood Score certain items to create data X41 X51 sparsity. X21 X51 X11 X11 User 1 purchased Item 4 and Item 1 31
- 32. Experiment Setup Each model (Random / kNN / SVD+kNN) will generate top 20 recommendations for each item. Compare model output to the actual top 20 provided by synthetic data set Evaluation Metrics : Precision %: Overlapping of the top 20 between model output and actual (higher the better) {Found _ Top20 _ items} ∩ {Actual _ Top20 _ items} Precision = {Found _ Top20 _ items} Quality metric: Average of the actual ranking in the model output (lower the better) € 32 1 2 30 47 50 21 1 2 368 62 900 510
- 33. Experimental Result kNN vs. Random (Control) Precision % Quality (higher is better) (Lower is better) 33
- 34. Experimental Result Precision % of SVD+kNN Recall % (higher is better) Improvement 34 SVD Rank
- 35. Experimental Result Quality of SVD+kNN Quality (Lower is better) Improvement 35 SVD Rank
- 36. Experimental Result The effect of using Cart data Precision % (higher is better) 36 SVD Rank
- 37. Experimental Result The effect of using Cart data Quality (Lower is better) 37 SVD Rank
- 38. Outline Predictivemodeling methodology k-Nearest Neighbor (kNN) algorithm Singular value decomposition (SVD) method for dimensionality reduction Using a synthetic data set to test and improve your model Experiment and results 38
- 39. References J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainity in Artificial Intelligence (UAI 1998), 1998. B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl "Application of Dimensionality Reduction in Recommender System A Case Study" In ACM WebKDD 2000 Web Mining for E-Commerce Workshop Apache Lucene Mahout http://lucene.apache.org/mahout/ Cofi: A Java-Based Collaborative Filtering Library http://www.nongnu.org/cofi/ 39
- 40. Thank you Any question or comment? 40

No public clipboards found for this slide

While you're there, also check out our blog.

Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group: http://www.meetup.com/NYC-Predictive-Analytics/ on April 1, 2010 (no joke!) :)