# Data Mining and Recommendation Systems

Published in: Technology

### Transcript

• 1. Data Mining and Recommendation Systems - Salil Navgire
• 2. Introduction • Discovery of models for data • Example: if the data is a set of numbers, we can assume it comes from a Gaussian distribution and estimate the parameters (mean and standard deviation) that define it completely • Recognize meaningful patterns in data -> data mining • Predict outcomes from known patterns -> ML
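
The Gaussian example on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the helper names `fit_gaussian` and `gaussian_pdf` and the sample numbers are assumptions made up for the example.

```python
import math
import statistics


def fit_gaussian(samples):
    """Estimate the two parameters that define a Gaussian: mean and standard deviation."""
    mu = statistics.fmean(samples)
    # pstdev divides by n (population form), which is the maximum-likelihood estimate.
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    """Density of the fitted model at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Made-up data, used only to exercise the functions.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_gaussian(data)
print(f"mean={mu}, std={sigma}, density at mean={gaussian_pdf(mu, mu, sigma):.4f}")
```

Once `mu` and `sigma` are known, the model is fully specified, which is the sense in which the slide says the parameters "define it completely".
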
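• 3. Data Mining Techniques • Classification • Predicting the class of a new item, given past instances that are each labeled with one of several classes • Example: loan approval based on decision tree classifiers • [Figure: decision tree that splits first on Job (Engineer, Carpenter, Doctor) and then on Income, with low-income leaves labeled Bad and high-income leaves labeled Good; income thresholds shown in the figure: <30K / >50K, <40K / >90K, <50K / >100K]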
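
A decision tree classifier of the kind used in the loan approval example can be written by hand as nested rules. The sketch below is only illustrative: the function name `classify_applicant` and the single income cut-off per job are assumptions, not the values from the slide's figure, and a real tree would be learned from past instances rather than hard-coded.

```python
# Hypothetical income cut-offs per job, chosen only for illustration.
INCOME_THRESHOLD = {
    "engineer": 50_000,
    "carpenter": 90_000,
    "doctor": 100_000,
}


def classify_applicant(job, income):
    """Walk a two-level decision tree: split on job first, then on income."""
    threshold = INCOME_THRESHOLD.get(job.lower())
    if threshold is None:
        return "unknown"  # this job is not covered by the tree
    return "good" if income >= threshold else "bad"


print(classify_applicant("Engineer", 60_000))
print(classify_applicant("Doctor", 40_000))
```
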
• 4. • Clustering • Clustering algorithms find groups of items that are similar • A clustering divides a dataset so that records with similar content are in the same group and the groups are as different as possible from each other • K-Nearest Neighbor: a classification method that classifies a point by calculating the distances between it and the points in the training dataset • Example: Car Sales
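
The K-Nearest Neighbor idea from this slide can be sketched as follows, assuming Euclidean distance and a majority vote among the `k` closest training points. The function name `knn_classify` and the toy points are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    training: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-dimensional points forming two well-separated groups.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 7.0), "B"),
]
print(knn_classify(training, (1.2, 1.4)))
print(knn_classify(training, (6.8, 6.9)))
```

Every query scans the whole training set, so this brute-force form is only practical for small datasets.
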
• 5. • Regression • Deals with prediction of a value rather than a class • Given x1, x2, x3, …, predict Y • Use linear regression to estimate the coefficients a0, a1, a2, … in Y = a0 + a1x1 + a2x2 + … • Use line fitting and curve fitting methods • Example: find a relationship between smoking and cancer-related illness in patients
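
For the single-variable case Y = a0 + a1x1, the line-fitting step has a closed-form least-squares solution. The sketch below assumes one input variable only (the multi-variable case needs a linear solver); `fit_line` and the sample points are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1 * x with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx            # slope
    a0 = mean_y - a1 * mean_x  # intercept
    return a0, a1


# Made-up points that lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a0, a1 = fit_line(xs, ys)
print(f"Y = {a0} + {a1} * x1")
```
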
• 6. • Association Rules • These algorithms create rules that describe how often events occur together • Example: when a customer buys a hammer, 90% of the time they also buy nails • Spam classification based on conditional probability • Support is the fraction of the population that satisfies both the antecedent and the consequent of the rule • Confidence is how often the consequent is true when the antecedent is true • Outlier Analysis • Most data mining methods discard outliers as noise or exceptions • However, in some applications, such as fraud detection, these rare events can be more interesting
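
Support and confidence, as defined on this slide, can be computed directly from a list of transactions. This is a minimal sketch; the function names and the five made-up baskets are assumptions, so the resulting numbers do not match the 90% figure in the hammer-and-nails example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often the consequent holds among transactions containing the antecedent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never applies in this dataset
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support


# Made-up shopping baskets.
baskets = [
    {"hammer", "nails"},
    {"hammer", "nails", "saw"},
    {"hammer"},
    {"nails"},
    {"saw"},
]
print(support(baskets, {"hammer", "nails"}))       # rule support
print(confidence(baskets, {"hammer"}, {"nails"}))  # rule confidence
```
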
• 7. Knowledge Discovery Process • Data Collection • Data Cleaning • Data Integration • Data Selection • Data Transformation • Data Mining • Evaluation • Knowledge Presentation
• 8. Applications of Data Mining • Marketing: analysis of consumer behavior; advertising campaigns; targeted mailings; segmentation of customers, stores, or products • Manufacturing: optimization of resources; optimization of manufacturing processes; product design based on customer requirements • Finance: creditworthiness of clients; performance analysis of finance investments; fraud detection • Health Care: discovering patterns in X-ray images; analyzing side effects of drugs; effectiveness of treatments
• 9. Privacy Concerns • Effective data mining requires large sources of data • To achieve a wide spectrum of data, multiple data sources are linked • Linking sources can be problematic for privacy. For example, if the following histories of a customer were linked: • Shopping History • Credit History • Bank History • Employment History • then the user's life story could be painted from the collected data
• 10. Recommendation Systems • Definition: recommendation systems (RS) are a subclass of information filtering systems that seek to predict the rating or preference that a user would give to an item • Enhance user experience by helping the user find information and by reducing search and navigation time • Increase productivity and credibility • Decrease the long-tail phenomenon • Types of RS • Content-based RS • Collaborative filtering RS • Hybrid RS
• 11. • Content-based RS • Recommend items similar to those the user preferred in the past • User profiling is the key • Items/content are usually described by keywords • Limitations • Not all content is well represented by keywords (e.g., images) • Unrated items are not shown • Users with thousands of purchases are a problem • Example: Pandora uses properties of a song in the Music Genome Project to play similar songs
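
A content-based recommender along these lines can be sketched with keyword sets: build a user profile from the keywords of liked items, then rank the remaining items by overlap with that profile. Jaccard similarity is one possible choice and is an assumption here, as are the function names and the toy catalog; this is not a description of how Pandora works.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(catalog, liked, top_n=2):
    """Rank items the user has not liked yet by similarity to the user's profile."""
    # The profile is the union of keywords of everything the user liked.
    profile = set().union(*(catalog[item] for item in liked))
    scored = [
        (jaccard(profile, keywords), item)
        for item, keywords in catalog.items()
        if item not in liked
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for score, item in scored[:top_n] if score > 0]


# Made-up catalog: item name -> descriptive keywords.
songs = {
    "song_a": {"rock", "guitar", "fast"},
    "song_b": {"rock", "guitar", "slow"},
    "song_c": {"jazz", "piano", "slow"},
    "song_d": {"classical", "violin"},
}
print(recommend(songs, ["song_a"]))
```

The sketch also shows one of the listed limitations: an item with no matching keywords never surfaces, however good it is.
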
• 12. • Collaborative Filtering method • Uses other users' ratings for recommendation • The key is to find users or user groups whose interests match those of the current user • More users and more ratings give better results • Limitations • Cold-start problem • Large computation power required • Sparsity • Example: Last.fm and Spotify recommend songs based on a user's listening history compared with that of other users. Facebook and LinkedIn use collaborative filtering to recommend new friends and connections
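
User-based collaborative filtering can be sketched as a similarity-weighted average of other users' ratings. The version below assumes positive ratings and cosine similarity over co-rated items; the function names and the three-user rating table are made up for illustration and do not describe any of the services named above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Predict `user`'s rating of `item` from similar users who rated it."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        weight_total += sim
    # None signals the cold-start case: nobody comparable has rated the item.
    return weighted_sum / weight_total if weight_total else None


# Made-up ratings on a 1 to 5 scale.
ratings = {
    "ann": {"m1": 5, "m2": 1},
    "bob": {"m1": 5, "m2": 1, "m3": 5},
    "cat": {"m1": 1, "m2": 5, "m3": 1},
}
print(predict(ratings, "ann", "m3"))  # pulled toward bob's rating, since ann resembles bob
print(predict(ratings, "ann", "m9"))  # no one has rated m9
```
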
• 13. • Hybrid RS • In some cases, combining content-based and collaborative filtering is more effective than either alone • Can overcome the sparsity and cold-start problems • Netflix Prize: offered a prize of $1 million to the team that could improve the accuracy of Netflix's rating predictions by 10%. The competition ran from 2006 to 2009 and was won by BellKor's Pragmatic Chaos, who used an ensemble of 107 algorithms for a single prediction! • Amazon item-to-item collaborative filtering • Compute the similarity between item pairs • Combine the similar items into a recommendation list • Each vector corresponds to an item, and its dimensions correspond to the customers who have purchased it • The similar-items table is built offline
• 14. • Measuring similarity
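
One common way to measure similarity between two item vectors is the cosine of the angle between them. The sketch below assumes binary purchase vectors (a customer either bought the item or did not), in which case the cosine reduces to the number of shared buyers divided by the square root of the product of the two buyer counts. The function names and the purchase data are illustrative assumptions, and the table-building loop is a naive all-pairs version rather than Amazon's actual algorithm.

```python
import math
from itertools import combinations


def item_cosine(buyers_a, buyers_b):
    """Cosine similarity of two items represented as sets of purchasing customers."""
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / math.sqrt(len(buyers_a) * len(buyers_b))


def build_similar_items(purchases):
    """Precompute, for every item, its similar items sorted by descending score."""
    table = {item: [] for item in purchases}
    for a, b in combinations(purchases, 2):
        score = item_cosine(purchases[a], purchases[b])
        if score > 0:
            table[a].append((b, score))
            table[b].append((a, score))
    for similar in table.values():
        similar.sort(key=lambda pair: -pair[1])
    return table


# Made-up purchase data: item -> customers who bought it.
purchases = {
    "hammer": {"c1", "c2", "c3"},
    "nails": {"c1", "c2"},
    "saw": {"c3"},
    "book": {"c4"},
}
table = build_similar_items(purchases)
print(table["hammer"])
```

Because the table depends only on past purchases, it can be computed ahead of time and looked up when a recommendation list is needed.
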
• 15. Examples • E-Commerce: Amazon.com, eBay, Etsy • Music: Spotify, Pandora • Movies: Netflix.com, IMDb • News: Digg, Summly • Social Networks: LinkedIn, Facebook, Quora, YouTube • Apps: Play Store, Cover