Data Mining and Recommendation Systems

Published in: Technology

  1. Data Mining and Recommendation Systems - Salil Navgire
  2. Introduction
     • Data mining is the discovery of models for data
     • Example: if the data is a set of numbers, we may assume it comes from a Gaussian distribution and estimate the parameters (mean and standard deviation) that define it completely
     • Recognizing meaningful patterns in data -> data mining
     • Predicting outcomes from known patterns -> ML
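
The Gaussian example above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the sample values and the function names `fit_gaussian` and `gaussian_pdf` are assumptions made for the example, not part of the slides.

```python
import math
import statistics


def fit_gaussian(samples):
    """Estimate the two parameters that define a Gaussian: mean and std."""
    mu = statistics.fmean(samples)
    # Population standard deviation (divides by n), i.e. the
    # maximum-likelihood estimate under the Gaussian assumption.
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    """Density of the fitted model at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical data set of numbers.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_gaussian(data)
print(f"mean={mu:.2f}, std={sigma:.2f}, density at mean={gaussian_pdf(mu, mu, sigma):.4f}")
```

Once the two parameters are estimated, the model can score how typical any new number is, which is the sense in which the model "defines" the data.
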
  3. Data Mining Techniques
     • Classification
       • Predicting the class of a new item, given a set of classes and past labeled instances
       • Example: loan approval based on decision tree classifiers
       • [Figure: decision tree that splits first on Job (Engineer, Carpenter, Doctor) and then on Income, with leaves labeled Good or Bad. Income splits shown: <30K Bad / >50K Good; <40K Bad / >90K Good; and a third split at 50K and 100K]
  4. • Clustering
       • Clustering algorithms find groups of items that are similar
       • They divide a dataset so that records with similar content are in the same group and the groups are as different as possible from each other
       • K-Nearest Neighbor: a classification method that classifies a point based on its distances to the points in the training dataset
       • Example: car sales
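
The K-Nearest Neighbor idea above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and majority voting; the car-sales features (price in thousands, engine size in litres), the labels, and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical car-sales records: (price in $1000s, engine size in litres).
cars = [
    ((15.0, 1.2), "economy"),
    ((18.0, 1.4), "economy"),
    ((20.0, 1.6), "economy"),
    ((55.0, 3.0), "luxury"),
    ((60.0, 3.5), "luxury"),
    ((70.0, 4.0), "luxury"),
]

print(knn_classify(cars, (22.0, 1.8)))
print(knn_classify(cars, (58.0, 3.2)))
```

In practice the features would be scaled first, since here the price dominates the distance simply because its numeric range is larger.
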
  5. • Regression
       • Deals with predicting a numeric value rather than a class
       • Given x1, x2, x3, ..., predict Y
       • Use linear regression to estimate the coefficients a0, a1, a2, ... in Y = a0 + a1x1 + a2x2 + ...
       • Use line-fitting and curve-fitting methods
       • Example: find a relationship between smoking and cancer-related illness in patients
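
For the single-variable case Y = a0 + a1x1, the coefficients can be estimated with ordinary least squares. The sketch below assumes that method and uses made-up data points; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical observations lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_line(xs, ys)
print(f"Y = {a0:.2f} + {a1:.2f} * x1")
```

With several predictors x1, x2, ... the same idea applies, but the coefficients are usually found by solving the normal equations with a linear algebra library.
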
  6. • Association Rules
       • These algorithms create rules that describe how often events occur together
       • Example: when a customer buys a hammer, 90% of the time they also buy nails
       • Spam classification based on conditional probability
       • Support is the fraction of the population that satisfies both the antecedent and the consequent of the rule
       • Confidence is how often the consequent is true when the antecedent is true
     • Outlier Analysis
       • Most data mining methods discard outliers as noise or exceptions
       • However, in some applications, such as fraud detection, these rare events can be more interesting than the regular ones
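
Support and confidence, as defined above, can be computed directly from a list of transactions. The baskets below are invented for illustration (so the numbers differ from the 90% figure on the slide), and the function names are hypothetical.

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent AND consequent."""
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, fraction that also
    contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = antecedent | consequent
    return sum(1 for t in with_antecedent if both <= t) / len(with_antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"hammer", "nails"},
    {"hammer", "nails", "saw"},
    {"hammer"},
    {"nails"},
    {"saw"},
]

rule_support = support(baskets, {"hammer"}, {"nails"})
rule_confidence = confidence(baskets, {"hammer"}, {"nails"})
print(f"hammer -> nails: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here two of five baskets contain both items (support 0.4), and two of the three baskets with a hammer also contain nails (confidence about 0.67).
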
  7. Knowledge Discovery Process
     • Data collection
     • Data cleaning
     • Data integration
     • Data selection
     • Data transformation
     • Data mining
     • Evaluation
     • Knowledge presentation
  8. Applications of Data Mining
     • Marketing
       • Analysis of consumer behavior
       • Advertising campaigns
       • Targeted mailings
       • Segmentation of customers, stores, or products
     • Manufacturing
       • Optimization of resources
       • Optimization of manufacturing processes
       • Product design based on customer requirements
     • Finance
       • Creditworthiness of clients
       • Performance analysis of financial investments
       • Fraud detection
     • Health Care
       • Discovering patterns in X-ray images
       • Analyzing side effects of drugs
       • Effectiveness of treatments
  9. Privacy Concerns
     • Effective data mining requires large sources of data
     • To achieve a wide spectrum of data, multiple data sources are linked
     • Linking sources can be problematic for privacy. For example, if the following histories of a customer were linked:
       • Shopping history
       • Credit history
       • Bank history
       • Employment history
     • then the user's life story could be reconstructed from the collected data
  10. Recommendation Systems
     • Definition: recommendation systems (RS) are a subclass of information filtering systems that seek to predict the rating or preference a user would give to an item
     • Enhance the user experience by helping users find information and by reducing search and navigation time
     • Increase productivity and credibility
     • Reduce the long-tail problem by surfacing less popular items
     • Types of RS
       • Content-based RS
       • Collaborative filtering RS
       • Hybrid RS
  11. • Content-based RS
       • Recommend items similar to those the user preferred in the past
       • User profiling is the key
       • Items/content are usually described by keywords
       • Limitations
         • Not all content is well represented by keywords (e.g., images)
         • Unrated items are not shown
         • Users with thousands of purchases are a problem
       • Example: Pandora uses the properties of a song from the Music Genome Project to play similar songs
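
A content-based recommender along the lines described above can be sketched with keyword sets. This is only an illustration: it assumes Jaccard overlap as the similarity measure and a user profile built as the union of the keywords of liked items; the catalog and the names `jaccard` and `recommend` are hypothetical and do not reflect how Pandora works internally.

```python
def jaccard(a, b):
    """Keyword overlap between two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(items, liked, top_n=2):
    """Rank items the user has not liked yet by similarity to the profile."""
    # User profile = all keywords of the items the user liked in the past.
    profile = set().union(*(items[name] for name in liked))
    scored = [
        (jaccard(profile, keywords), name)
        for name, keywords in items.items()
        if name not in liked
    ]
    # Highest score first; break ties alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical catalog: item -> descriptive keywords.
catalog = {
    "song_a": {"rock", "guitar", "fast"},
    "song_b": {"rock", "guitar", "slow"},
    "song_c": {"jazz", "piano", "slow"},
    "song_d": {"rock", "drums", "fast"},
}

print(recommend(catalog, liked=["song_a"]))
```

The keyword limitation from the slide is visible here: an item with no descriptive keywords in common with the profile can never be recommended.
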
  12. • Collaborative filtering method
       • Uses other users' ratings to make recommendations
       • The key is to find users or user groups whose interests match those of the current user
       • More users and more ratings give better results
       • Limitations
         • Cold start problem
         • Large computation power required
         • Sparsity
       • Example: Last.fm and Spotify recommend songs based on a user's listening history compared with that of other users. Facebook and LinkedIn use collaborative filtering to recommend new friends and connections
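
User-based collaborative filtering as described above can be sketched like this. The example assumes cosine similarity over co-rated items and a similarity-weighted average for the prediction, which is one common choice among several; the ratings and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += sim
    # None signals that nobody comparable has rated the item (cold start).
    return numerator / denominator if denominator else None


# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"track_a": 5, "track_b": 4},
    "bob": {"track_a": 5, "track_b": 4, "track_c": 5},
    "carol": {"track_a": 1, "track_b": 5, "track_c": 2},
}

estimate = predict(ratings, "alice", "track_c")
print(f"predicted rating for alice on track_c: {estimate:.2f}")
```

Bob's tastes match Alice's exactly, so his rating pulls the estimate toward 5 more strongly than Carol's pulls it toward 2. The `None` return illustrates the cold start limitation: an item nobody has rated cannot be scored.
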
  13. • Hybrid RS
       • In some cases, combining content-based and collaborative filtering is more effective than either alone
       • Can overcome the sparsity and cold start problems
       • Netflix Prize: offered a prize of $1 million to the team that could improve the accuracy of Netflix's rating predictions by 10%. The competition ran from 2006 to 2009 and was won by BellKor's Pragmatic Chaos, who used an ensemble of 107 algorithms for a single prediction!
     • Amazon item-to-item collaborative filtering
       • Compute the similarity between item pairs
       • Combine the similar items into a recommendation list
       • Each vector corresponds to an item, and its dimensions correspond to the customers who have purchased that item
       • The similar-items table is built offline
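
The item-to-item steps above can be sketched as follows. This is a simplified illustration, not Amazon's production algorithm: it assumes binary purchase vectors, cosine similarity, and a brute-force loop over all item pairs; the purchase data and function names are hypothetical.

```python
import math
from itertools import combinations


def item_vectors(purchases):
    """Turn customer -> items into item -> set of customers who bought it."""
    vectors = {}
    for customer, items in purchases.items():
        for item in items:
            vectors.setdefault(item, set()).add(customer)
    return vectors


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as customer sets."""
    return len(a & b) / math.sqrt(len(a) * len(b))


def build_similar_items(purchases):
    """Offline step: for every item, list the other items by similarity."""
    vectors = item_vectors(purchases)
    table = {item: [] for item in vectors}
    for x, y in combinations(sorted(vectors), 2):
        sim = cosine(vectors[x], vectors[y])
        if sim > 0:
            table[x].append((y, sim))
            table[y].append((x, sim))
    for item in table:
        table[item].sort(key=lambda pair: (-pair[1], pair[0]))
    return table


# Hypothetical purchase histories: customer -> items bought.
purchases = {
    "c1": {"hammer", "nails"},
    "c2": {"hammer", "nails", "saw"},
    "c3": {"hammer", "saw"},
    "c4": {"nails"},
}

similar = build_similar_items(purchases)
print(similar["hammer"])
```

At recommendation time only a table lookup is needed, which is why building the table offline keeps the online step cheap.
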
  14. • Measuring similarity
  15. Examples
     • E-commerce: Amazon.com, eBay, Etsy
     • Music: Spotify, Pandora
     • Movies: Netflix.com, IMDB
     • News: Digg, Summly
     • Social networks: LinkedIn, Facebook, Quora, YouTube
     • Apps: Playstore, Cover
