This project proposes a similarity measure aimed at recommendation performance under the cold-start problem (the difficulty of making recommendations for new items and new users that have no rating history in the system); the measure is also well suited to sparse data sets.
The technique addresses the cold-start problem in recommender systems and improves the quality of the recommendations made to users.
A New Similarity Measurement Based on Hellinger Distance for Collaborative Filtering in Sparse Data Set
1. A New Similarity Measurement Based on Hellinger Distance
for Collaborative Filtering in Sparse Data Set
Submitted in Fulfillment of Requirements for the
Degree of
MASTER OF TECHNOLOGY IN
COMPUTER SCIENCE AND ENGINEERING
specialization in
Information Security
by
Prabhu Kumar (15MT000624)
Under the guidance of
Dr. Rajendra Pamula
(Assistant Professor)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY (INDIAN SCHOOL OF MINES), DHANBAD
INDIA
MAY 2017
2. Outlines
• Introduction of recommender system
• Source of information
• Types of recommendation system
• Architecture
• Similarity measurements
• Proposed method
• Result
• References
3. Introduction
What is a Recommender System?
• It is a general machine learning technique, or information filtering system, that predicts a user's preferences.
4. Example of Recommender System
• Recommender systems are widely used for movie, news, and music recommendation, among others.
5. Source of Information
• The data collected for recommendation comes from content, demographic, and social media information.
7. Types of Recommendation
1. Collaborative filtering recommendation system: It is based on the way humans have made decisions throughout history, and on the ratings that users have given to items they have already used. The algorithm analyzes these ratings and predicts items to recommend.
2. Content-based recommendation system: It is based on the choices the user made in the past, in the form of the content the user liked most.
3. Hybrid recommendation system: A combination of both. If techniques A and B are used together for recommendation, then B compensates for A's disadvantages and A compensates for B's.
11. • For the matching process in a recommender system:
"The KNN algorithm is one of the most useful algorithms for recommending items to users."
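The neighbour-selection step of KNN can be sketched in Python as follows. This is only an illustration under stated assumptions: every user has a complete rating vector, cosine similarity is used for matching, and the function names, the toy ratings, and user U6 are hypothetical rather than taken from the slides.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def k_nearest_neighbours(active, ratings, k, similarity=cosine):
    """Return the k users most similar to `active`, most similar first."""
    others = [u for u in ratings if u != active]
    others.sort(key=lambda u: similarity(ratings[active], ratings[u]),
                reverse=True)
    return others[:k]

# Hypothetical toy data; U6 is invented purely for illustration.
ratings = {"U1": [4, 3, 5, 4], "U3": [4, 3, 4, 4], "U6": [1, 5, 1, 2]}
print(k_nearest_neighbours("U1", ratings, k=1))
```

Any other similarity measure can be passed through the `similarity` parameter without changing the selection logic.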
14. Similarity Measurements
• Cosine Similarity:
"It measures the angle between two rating vectors: the smaller the angle, the higher the similarity."
\[ \mathrm{sim}(u, v)^{\cos} = \frac{\vec{r}_u \cdot \vec{r}_v}{\lVert \vec{r}_u \rVert \, \lVert \vec{r}_v \rVert} \]
"A vector is a quantity that has both magnitude and direction."
Drawbacks:
• If the two vectors lie on the same line, for example a = (2,2,2,2) and b = (3,3,3,3), the cosine value is 1 (the angle is 0), so the users are treated as identical even though their ratings differ.
• It suffers when there are few co-rated items.
• Similarity measurement is a technique that finds the nearest neighbors of a specific active user for further processing of the recommendation.
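The collinear-vector drawback of cosine similarity can be checked with a short Python sketch. The function name is hypothetical, and returning 0.0 for an all-zero vector is an assumption made here, not something the slides specify.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_a * norm_b)

a = (2, 2, 2, 2)
b = (3, 3, 3, 3)
# The vectors point in the same direction, so the similarity is 1
# although every rating differs by one point.
print(cosine_similarity(a, b))
```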
15. • ACOS (Adjusted Cosine Similarity): "Some people tend to rate high even when they do not like an item very much, whereas others rate low even when they like an item a lot. ACOS was introduced to account for this."
\[ \mathrm{sim}(u, v)^{ACOS} = \frac{\sum_{i=1}^{n} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i=1}^{n} (r_{ui} - \bar{r}_u)^2} \, \sqrt{\sum_{i=1}^{n} (r_{vi} - \bar{r}_v)^2}} \]
where n is the total number of co-rated items.
Drawbacks:
• Similar rating problems
• Few co-rated item problems
• Pearson's correlation coefficient (PCC): "It finds the linear correlation between two rating vectors."
\[ \mathrm{sim}(u, v)^{PCC} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2} \cdot \sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}} \]
Drawbacks:
• If a rating vector is flat, for example a = (2,2,2,2) compared with b = (1,2,3,4), PCC cannot be calculated because the denominator is zero.
• If there is only one co-rated item, PCC will be "0", so it suffers from the few co-rated items problem.
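The flat-vector drawback of PCC can be illustrated with a small Python sketch. It assumes the means are taken over the co-rated items only, and it returns None when the denominator is zero; both choices, like the function name, are assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors of co-rated ratings.

    Returns None when either vector has zero variance, because the
    denominator is then zero and the coefficient is undefined.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - mean_a) ** 2 for x in a))
           * math.sqrt(sum((y - mean_b) ** 2 for y in b)))
    if den == 0:
        return None
    return num / den

print(pearson((2, 2, 2, 2), (1, 2, 3, 4)))  # flat vector a: undefined
print(pearson((1, 2, 3, 4), (2, 4, 6, 8)))  # perfectly linear relation
```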
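18. Proposed method
Hellinger Distance:
• It is used to quantify the similarity between two vectors.
• The minimum Hellinger distance is zero, which occurs when no item has been rated by either user or when both users have rated all items exactly the same.
• The unnormalised sum \sum_i (\sqrt{p_i} - \sqrt{q_i})^2 ranges from 0 to 2 when P and Q are probability distributions.
• The factor 2 in the definition normalises the distance so that H(P, Q) ≤ 1 between any two users whose ratings are expressed as such distributions.
\[ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2} \]
Let P = {2, 3, 1} and Q = {3, 2, 3}. Then
\[ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{(\sqrt{2} - \sqrt{3})^2 + (\sqrt{3} - \sqrt{2})^2 + (\sqrt{1} - \sqrt{3})^2} = \frac{1}{\sqrt{2}} \sqrt{0.101021 + 0.101021 + 0.535898} = \frac{1}{\sqrt{2}} \times 0.85903 = 0.60743 \]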
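The worked example can be reproduced with a few lines of Python. This is a minimal sketch of the distance as written on the slide, applied directly to the raw rating vectors P and Q; the function name is hypothetical.

```python
import math

def hellinger(p, q):
    """Hellinger distance: (1/sqrt(2)) * sqrt(sum((sqrt(p_i) - sqrt(q_i))^2))."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)

P = [2, 3, 1]
Q = [3, 2, 3]
print(round(hellinger(P, Q), 5))  # agrees with the slide's 0.60743
```

Identical vectors give a distance of zero, which matches the minimum stated in the bullets.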
19. Local similarity:
• It plays an important role in capturing local information about the users' ratings.
• It must provide positive as well as negative correlation between two users.
• It is used to find the actual relation between two users according to their ratings.
\[ \mathrm{loc}_{med}(r_{ui}, r_{vj}) = \frac{(r_{ui} - r_{med})(r_{vj} - r_{med})}{\sqrt{\sum_{k \in I_u} (r_{uk} - r_{med})^2} \, \sqrt{\sum_{k \in I_v} (r_{vk} - r_{med})^2}} \]
where
k ranges over all items rated by the user (I_u for user u, I_v for user v),
r_ui is the rating by user u for the ith item,
r_vj is the rating by user v for the jth item,
r_med is the average of the ratings by users.
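A hedged Python sketch of the local similarity term follows. The value r_med = 3 (the midpoint of a 1 to 5 scale) is an assumption chosen for illustration, as are the function name and the decision to return 0.0 when a user's ratings all equal r_med.

```python
import math

def loc_med(r_ui, r_vj, ratings_u, ratings_v, r_med):
    """Local similarity of one rating pair, centred on r_med.

    ratings_u and ratings_v hold all ratings given by users u and v.
    """
    norm_u = math.sqrt(sum((r - r_med) ** 2 for r in ratings_u))
    norm_v = math.sqrt(sum((r - r_med) ** 2 for r in ratings_v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: no deviation from r_med, no signal
    return (r_ui - r_med) * (r_vj - r_med) / (norm_u * norm_v)

R_MED = 3  # assumed midpoint of a 1-5 rating scale
user1 = [4, 3, 5, 4]
user4 = [2, 1]
# A rating above r_med paired with one below r_med gives a negative value.
print(loc_med(4, 2, user1, user4, R_MED))
```

The sign of the result shows whether the two ratings fall on the same side of r_med, which is how the term provides both positive and negative correlation.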
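20. Proposed method equation:
\[ S(u, v) = H(u, v) \cdot \sum_{i \in u} \sum_{j \in v} \mathrm{loc}(r_{ui}, r_{vj}) + \mathrm{Jaccard}(u, v) \]
where
i ranges over the items rated by user u and j over the items rated by user v,
H(u, v) is the Hellinger distance,
loc(r_ui, r_vj) is the local similarity between the users' ratings of those items,
Jaccard(u, v) measures the proportion of ratings the two users have in common.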
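The Jaccard term can be sketched in Python as below. The slides do not spell out the formula, so this assumes the standard Jaccard index over the sets of rated items, |I_u ∩ I_v| / |I_u ∪ I_v|; the function name and the 0.0 result for two empty sets are also assumptions. The item sets follow the toy rating table on the Result slide.

```python
def jaccard(items_u, items_v):
    """Share of items rated by both users among items rated by either."""
    set_u, set_v = set(items_u), set(items_v)
    union = set_u | set_v
    if not union:
        return 0.0  # assumption: two users with no ratings share nothing
    return len(set_u & set_v) / len(union)

# Items rated by each user in the toy table ("-" means unrated).
rated = {
    "User1": {"Item1", "Item2", "Item3", "Item4"},
    "User2": {"Item1", "Item2"},
    "User4": {"Item1", "Item2"},
}
print(jaccard(rated["User1"], rated["User2"]))
print(jaccard(rated["User2"], rated["User4"]))
```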
21. Result:
• In this graph, the flat item-rating and few common ratings problems are solved using the proposed method.
• U1-U3 and U2-U4 are flat-rating cases; U4-U5 shows the improvement in common rating proportion.
• U3-U5 has the few co-rated items problem.
        Item1  Item2  Item3  Item4
User1   4      3      5      4
User2   5      3      -      -
User3   4      3      4      4
User4   2      1      -      -
User5   4      2      -      -
22. • The problems of identical co-rated vectors and few co-rated items are improved using the proposed method, and the simultaneous difference of rating problem is also solved.
• U1 and U3 have the same co-rated vector; the proposed method improves this case.
• U1 and U5 suffer from few co-rated items.
• U4 and U5 have the simultaneous difference problem.
23. • The problems of local similarity and proportion of ratings are improved using the proposed method.
• U4 and U5 have the proportion of rating problem under PIP, which the proposed method improves.
• U1 and U4 have the few co-rated items problem.
• U2 and U4 show the local similarity improvement.
24. Evaluation of Proposed method in large dataset
• The method is evaluated on two large MovieLens datasets. ML-100K contains 100,000 ratings from 943 users on 1682 movies. ML-1M contains 1,000,209 ratings from 6040 users on 3952 movies. Each user has rated at least 20 movies.
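How sparse these datasets are can be estimated from the counts above. The sketch below assumes the common definition sparsity = 1 - ratings / (users × movies), which the slides do not state; the function name is hypothetical.

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds no rating."""
    return 1.0 - n_ratings / (n_users * n_items)

ml_100k = sparsity(100_000, 943, 1682)
ml_1m = sparsity(1_000_209, 6040, 3952)
print(f"ML-100K sparsity: {ml_100k:.4f}")
print(f"ML-1M sparsity:   {ml_1m:.4f}")
```

Under this definition, well over 90% of the possible user-movie pairs in both datasets have no rating.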
25. • Movie recommendation using Cosine Similarity and the proposed method.
26. • Movie recommendation using PIP (proximity-impact-popularity) and the proposed method.
27. References
• J. Bobadilla, F. Ortega, A. Hernando, A. Gutiérrez, Recommender systems survey, Knowl.-Based Syst. 46 (2013) 109–132.
• P. Resnick, H.R. Varian, Recommender systems, Commun. ACM 40 (3) (1997) 56–58.
• G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Comput. 7 (1) (2003) 76–80.
• Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 426–434.
• C. Desrosiers, G. Karypis, A comprehensive survey of neighborhood-based recommendation methods, in: Recommender Systems Handbook, 2011, pp. 107–144.
• M.J. Pazzani, D. Billsus, Content-based recommendation systems, The Adaptive Web (2007) 325–341.
• H. Junming, C. Xueqi, G. Jiafeng, S. Huawei, Y. Kun, Social recommendation with interpersonal influence, ECAI 10 (2010) 601–606.
28. Thank You!
Special thanks to my project guide, Dr. Rajendra Pamula, for guiding and motivating me and for providing valuable information throughout the development of this project work.
My sincere gratitude to the panel of teachers present for giving their precious time to listen to and evaluate my project presentation.