Data Mining and Recommendation Systems


Published in: Technology
  1. Data Mining and Recommendation Systems - Salil Navgire
  2. Introduction • Discovery of models for data • Example: if the data is a set of numbers, we can assume it comes from a Gaussian distribution and estimate the parameters (mean and standard deviation) that define it completely • Recognizing meaningful patterns in data -> data mining • Predicting outcomes from known patterns -> machine learning (ML)
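As a minimal sketch of the Gaussian example, assuming the "model" is just the maximum-likelihood mean and standard deviation, the two parameters can be estimated with Python's `statistics` module. The function names `fit_gaussian` and `gaussian_pdf` and the sample data are hypothetical, chosen for illustration.

```python
import math
from statistics import fmean, pstdev


def fit_gaussian(data):
    """Maximum-likelihood estimates of a Gaussian's mean and standard deviation."""
    mu = fmean(data)
    # Population standard deviation (divides by n), which is the MLE for a Gaussian
    sigma = pstdev(data, mu)
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    """Density of the fitted Gaussian at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu, sigma = fit_gaussian(data)
print(mu, sigma)  # 5.0 2.0
```

Once the two parameters are known, `gaussian_pdf` describes the whole data-generating model, which is the sense in which the parameters "define it completely".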
  3. Data Mining Techniques • Classification • Predicting the class of a new item, given a set of items with several classes and past instances • Example: loan approval using a decision tree classifier that splits first on Job, then on Income (Engineer: Income <30K → Bad, >50K → Good; Carpenter: Income <40K → Bad, >90K → Good; Doctor: Income <50K → Bad, >100K → Good)
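The loan example can be sketched as a hand-coded tree walk. This does not learn the tree from data; it hard-codes the splits, assuming the branch-to-threshold mapping recovered from the slide's diagram. The function name `loan_decision` and the `None` result for incomes the tree does not cover are illustrative assumptions.

```python
def loan_decision(job, income_k):
    """Classify a loan applicant by walking the example tree from the slide.

    The root splits on job; each job branch then splits on income (in
    thousands). Returns "Good", "Bad", or None when the income falls in a
    gap the tree does not cover (e.g. an Engineer earning 40K).
    """
    # Hypothetical encoding: (Bad if income below, Good if income above) per job,
    # assuming the branch order recovered from the slide.
    income_splits = {
        "Engineer": (30, 50),
        "Carpenter": (40, 90),
        "Doctor": (50, 100),
    }
    if job not in income_splits:
        return None  # job not covered by the tree
    bad_below, good_above = income_splits[job]
    if income_k < bad_below:
        return "Bad"
    if income_k > good_above:
        return "Good"
    return None  # income between the two thresholds: no leaf on the slide


print(loan_decision("Engineer", 60))   # Good
print(loan_decision("Carpenter", 35))  # Bad
print(loan_decision("Doctor", 70))     # None
```

A learned decision tree would pick these thresholds automatically from past loan outcomes; classifying a new applicant is still just this root-to-leaf walk.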
  4. • Clustering • Clustering algorithms find groups of similar items • They divide a dataset so that records with similar content fall in the same group and the groups are as different from each other as possible • K-Nearest Neighbor (KNN) – a classification method that classifies a point by calculating its distances to the other points in the training dataset • Example: car sales
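The KNN description above (compute distances to all training points, then look at the closest ones) can be sketched in a few lines. The car-sales features (age, income) and labels are hypothetical, and majority vote among the k nearest neighbours with Euclidean distance is assumed as the decision rule.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training points.

    training: list of (feature_tuple, label) pairs.
    """
    # Euclidean distance from the query to every training point, nearest first
    by_distance = sorted(
        (math.dist(query, features), label) for features, label in training
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical car-sales data: (customer age, income in thousands) -> car type bought
training = [
    ((25, 30), "hatchback"),
    ((27, 35), "hatchback"),
    ((45, 80), "SUV"),
    ((50, 90), "SUV"),
    ((48, 85), "SUV"),
]
print(knn_classify((46, 82), training))  # SUV
print(knn_classify((26, 32), training))  # hatchback
```

In practice features on different scales (years vs. thousands) are usually normalized first, since otherwise the larger-valued feature dominates the distance.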
  5. • Regression • Predicts a value rather than a class • Given x1, x2, x3, …, predict Y • Linear regression estimates the coefficients a0, a1, a2, … in Y = a0 + a1x1 + a2x2 + … • Uses line-fitting and curve-fitting methods • Example: finding the relationship between smoking and cancer-related illness in patients
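For the single-input case Y = a0 + a1x1, the least-squares coefficients have a closed form. This sketch assumes ordinary least squares as the fitting criterion; `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a0 + a1*x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); both scaled by n, which cancels
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # The fitted line always passes through the point of means
    a0 = mean_y - a1 * mean_x
    return a0, a1


print(fit_line([1, 2, 3, 4], [5, 7, 9, 11]))  # (3.0, 2.0)
```

With several inputs x1, x2, … the same criterion is usually solved as a matrix least-squares problem, for example with `numpy.linalg.lstsq`.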
  6. • Association Rules • These algorithms create rules that describe how often events occur together • Example: when a customer buys a hammer, 90% of the time they also buy nails • Spam classification based on conditional probability • Support measures what fraction of the population satisfies both the antecedent and the consequent of the rule • Confidence measures how often the consequent is true when the antecedent is true • Outlier Analysis • Most data mining methods discard outliers as noise or exceptions • However, in some applications, such as fraud detection, these rare events can be more interesting
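The support and confidence definitions translate directly into counts over transactions. This is a minimal sketch for a single rule such as {hammer} → {nails}; the basket data is hypothetical and no rule-mining algorithm (such as Apriori) is implemented here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    with_both = sum(1 for t in transactions if both <= set(t))
    return with_both / with_antecedent if with_antecedent else 0.0


# Hypothetical shopping baskets
transactions = [
    {"hammer", "nails"},
    {"hammer", "nails", "glue"},
    {"hammer", "nails", "saw"},
    {"hammer", "tape"},
    {"screws", "glue"},
]
print(support(transactions, {"hammer", "nails"}))       # 0.6
print(confidence(transactions, {"hammer"}, {"nails"}))  # 0.75
```

Here 3 of 5 baskets contain both items (support 0.6), and 3 of the 4 hammer baskets contain nails (confidence 0.75).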
  7. Knowledge Discovery Process • Data collection • Data cleaning • Data integration • Data selection • Data transformation • Data mining • Evaluation • Knowledge presentation
  8. Applications of Data Mining • Marketing: analysis of consumer behavior; advertising campaigns; targeted mailings; segmentation of customers, stores, or products • Manufacturing: optimization of resources; optimization of manufacturing processes; product design based on customer requirements • Finance: creditworthiness of clients; performance analysis of financial investments; fraud detection • Health Care: discovering patterns in X-ray images; analyzing side effects of drugs; effectiveness of treatments
  9. Privacy Concerns • Effective data mining requires large sources of data • To achieve a wide spectrum of data, multiple data sources are linked • Linking sources can be problematic for privacy: if the following histories of a customer were linked • Shopping history • Credit history • Bank history • Employment history • The user's life story could be painted from the collected data
  10. Recommendation systems • Definition – RS are a subclass of information filtering systems that seek to predict the rating or preference that a user would give to an item • Enhance user experience by helping users find information and reducing search and navigation time • Increase productivity and credibility • Mitigate the long-tail problem by surfacing less popular items that users would otherwise rarely find • Types of RS • Content-based RS • Collaborative filtering RS • Hybrid RS
  11. • Content-based RS • Recommend items similar to those the user preferred in the past • User profiling is the key • Items/content are usually described by keywords • Limitations • Not all content is well represented by keywords (e.g., images) • Unrated items are not shown • Users with thousands of purchases are a problem • Example: Pandora uses the properties of a song in the Music Genome Project to play similar songs
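A content-based recommender can be sketched by describing each item with keywords, building a user profile from the items the user liked, and ranking unseen items by cosine similarity to that profile. The keyword catalogue, the count-based profile, and the function names are illustrative assumptions, not how Pandora or the Music Genome Project actually work.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_profile(liked_items, item_keywords):
    """User profile: keyword counts summed over the items the user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_keywords[item])
    return profile


def recommend_by_content(liked_items, item_keywords, top_n=2):
    """Rank items the user has not seen by similarity to the user's profile."""
    profile = build_profile(liked_items, item_keywords)
    scores = {
        item: cosine(profile, Counter(keywords))
        for item, keywords in item_keywords.items()
        if item not in liked_items
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Drop items that share no keywords with the profile
    return [item for item in ranked if scores[item] > 0][:top_n]


# Hypothetical catalogue: each song described by a set of keywords
item_keywords = {
    "song_a": {"acoustic", "folk", "mellow"},
    "song_b": {"acoustic", "folk", "female_vocals"},
    "song_c": {"electronic", "dance", "upbeat"},
    "song_d": {"acoustic", "mellow", "piano"},
    "song_e": {"dance", "upbeat", "synth"},
}
print(recommend_by_content({"song_a", "song_b"}, item_keywords))  # ['song_d']
```

The sketch also shows the slide's first limitation: an item whose keywords do not overlap the profile scores zero, however good it is.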
  12. • Collaborative filtering • Uses other users' ratings to make recommendations • The key is to find users or user groups whose interests match those of the current user • More users and more ratings give better results • Limitations • Cold start problem • Requires substantial computing power • Sparsity • Example: Spotify recommends songs based on a user's listening history compared with that of other users • Facebook and LinkedIn use collaborative filtering to recommend new friends and connections
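User-based collaborative filtering can be sketched as: measure how similar the current user is to every other user over the items both have rated, then predict a missing rating as a similarity-weighted average of the most similar users' ratings. The ratings data, the choice of cosine similarity, and `k=2` neighbours are illustrative assumptions.

```python
import math


def user_similarity(ratings, u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average rating from the k most similar users who rated `item`."""
    neighbours = sorted(
        (
            (user_similarity(ratings, user, other), other)
            for other in ratings
            if other != user and item in ratings[other]
        ),
        reverse=True,
    )[:k]
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0:
        return None  # no comparable users have rated this item (cold start / sparsity)
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total_sim


# Hypothetical user -> {song: rating} data
ratings = {
    "alice": {"song1": 5, "song2": 3, "song3": 4},
    "bob": {"song1": 5, "song2": 3, "song3": 4, "song4": 5},
    "carol": {"song1": 1, "song2": 5, "song4": 2},
}
print(round(predict_rating(ratings, "alice", "song4"), 2))  # 3.79
print(predict_rating(ratings, "alice", "song9"))             # None
```

Cosine on raw ratings tends to rate most users as fairly similar (carol still gets weight here despite very different tastes); mean-centred similarity (Pearson correlation) is a common alternative. Comparing every pair of users is also what makes the approach computationally expensive at scale.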
  13. • Hybrid RS • In some cases, combining content-based and collaborative filtering is more effective • Can overcome the sparsity and cold start problems • Netflix Prize: Netflix offered $1 million to the first team that could improve the accuracy of its own rating-prediction algorithm (Cinematch) by 10%. The competition ran from 2006 to 2009 and was won by BellKor's Pragmatic Chaos, who used an ensemble of 107 algorithms for a single prediction! • Amazon item-to-item collaborative filtering • Compute the similarity between item pairs • Combine the similar items into a recommendation list • Each item is a vector whose dimensions correspond to the customers who have purchased it • The similar-items table is built offline
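The item-to-item idea can be sketched in two phases: an offline step that treats each item as a binary vector over customers and stores the cosine similarity of every co-purchased item pair, and an online step that sums similarities for the items in a basket. This is a simplified illustration under those assumptions, not Amazon's production implementation; the purchase data and function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_similar_items_table(purchases):
    """Offline step: cosine similarity between every pair of co-purchased items.

    purchases: dict customer -> set of items bought.
    With binary item vectors over customers, cosine(A, B) =
    |common buyers| / sqrt(|buyers of A| * |buyers of B|).
    """
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])
        if common:  # only store pairs that share at least one customer
            sim = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
            table[a][b] = sim
            table[b][a] = sim
    return table


def recommend_from_basket(table, basket, top_n=3):
    """Online step: sum similarities from items in the basket, skipping owned items."""
    scores = defaultdict(float)
    for item in basket:
        for other, sim in table.get(item, {}).items():
            if other not in basket:
                scores[other] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical purchase histories
purchases = {
    "c1": {"hammer", "nails", "glue"},
    "c2": {"hammer", "nails"},
    "c3": {"nails", "screws"},
    "c4": {"paint", "brush"},
    "c5": {"paint", "brush", "tape"},
}
table = build_similar_items_table(purchases)
print(recommend_from_basket(table, {"hammer"}))  # ['nails', 'glue']
```

The expensive pairwise work happens once, offline; at request time only a few table lookups are needed, which is the point of building the similar-items table ahead of time. Comparing all item pairs as done here is quadratic in the number of items, so a larger system would only visit pairs that actually share customers.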
  14. • Measuring similarity
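The body of the "Measuring similarity" slide did not survive extraction. As a hedged sketch, three measures commonly used in recommendation systems are shown below; which measures the original slide covered is unknown. Vectors are assumed to be equal-length rating lists, and the sample values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson_correlation(u, v):
    """Cosine similarity after subtracting each vector's mean (corrects for rating bias)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


def jaccard_similarity(set_a, set_b):
    """|A ∩ B| / |A ∪ B| for sets such as 'items bought' or 'songs liked'."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


u = [5, 3, 4]  # hypothetical ratings by one user
v = [4, 2, 3]  # a stricter user: same tastes, one point lower everywhere
print(round(cosine_similarity(u, v), 3), round(pearson_correlation(u, v), 3))  # 0.998 1.0
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Pearson correlation treats the two users as identical in taste even though one rates consistently lower, while Jaccard ignores rating values entirely and is suited to yes/no data such as purchases.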
  15. Examples • E-commerce: eBay, Etsy • Music: Spotify, Pandora • Movies: IMDB • News: Digg, Summly • Social networks: LinkedIn, Facebook, Quora, YouTube • Apps: Play Store, Cover