Learning to Recommend with User Generated Content
Yueshen Xu¹, Zhiyuan Chen², Jianwei Yin¹, Zizheng Wu¹ and Taojun Yao¹
¹ School of Computer Science and Technology, Zhejiang University
² University of Illinois at Chicago
xyshzjucs@zju.edu.cn; xyshzjucs@gmail.com
2015/6/9, Zhejiang University
Junxiang Wang

Yueshen Xu, WAIM, 2015
Outline
 Background
 Introduction
 Related Work
 Recommendation with UGC in User Side
 Matrix Factorization
 Topic Analysis for Items through Topic Modeling
 User Interest Distribution
 User Topic Regularization
 Recommendation with UGC in Item Side
 Item Topic Regularization
 Experiment and Evaluation
 Reference
Keywords: Recommendation, User Generated Content, Topic Modeling, Matrix Factorization

Background
 Recommendation in General
 Collaborative Filtering (CF)
− Matrix Factorization (MF)
 Content-based approach
− Pandora music genome project
 User Generated Content (UGC)
 social tags, reviews, question answering, blogs, tweets, etc.
 tag-based / review-based recommendation
 Problems in existing work
 not every website has all kinds of UGC
 the item-word / user-word space is highly sparse
 synonymy & polysemy
 most work focuses on only a single kind of UGC
        item1  item2  item3  item4
user1   r11
user2          r22
user3
user4   r41                  r44
user5                 r53
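
How sparse such a rating matrix is can be made concrete with a small sketch. The observed positions follow the table above; the numeric rating values are placeholders invented for illustration, and the helper name `indicator` is hypothetical.

```python
# Observed ratings keyed by (user, item). Positions follow the table above;
# the numeric values are made-up placeholders.
ratings = {
    (1, 1): 4.0,  # r11
    (2, 2): 3.0,  # r22
    (4, 1): 5.0,  # r41
    (4, 4): 2.0,  # r44
    (5, 3): 4.0,  # r53
}
num_users, num_items = 5, 4


def indicator(i, j):
    """I_ij: 1 if user i rated item j, else 0."""
    return 1 if (i, j) in ratings else 0


# Fraction of the user-item matrix that is actually observed.
density = len(ratings) / (num_users * num_items)

# Users with no rating at all (cold-start users for pure CF).
users_without_ratings = [
    i for i in range(1, num_users + 1)
    if not any(indicator(i, j) for j in range(1, num_items + 1))
]
print(density, users_without_ratings)
```

Only a quarter of the cells are filled even in this toy case, and user3 has no ratings at all, which is the situation where side information such as UGC becomes useful.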

Background
 Other related work
 social / trust-based recommendation → helpful but limited
− many sites have no social relationships: Amazon, eBay, Newegg, Jingdong, Expedia, etc.
− but UGC is available on these sites
 Description/Profile-based recommendation
− static content
− fail to distinguish different items
− unrelated to a user’s preference
 UGC, in contrast:
 emphasizes an item’s features
− the words an item receives most frequently
 grows dynamically
 is associated with a user’s preferences / topics of interest
− e.g., I like science fiction films, so I write many movie reviews containing words like fiction, tech, super, hero, robotic, machine
 provides natural chunking (social tags)

Contribution
 Main contributions
 We study UGC in learning user interests and learning item features
 We propose a novel user-oriented collaborative filtering model and a
novel item-oriented collaborative filtering model
 We propose a way to utilize different types of UGC in a unified way in
recommender systems
 We expand an existing dataset by crawling new data, and conduct
extensive experiments on three real-world datasets, which demonstrate the
effectiveness of the proposed models.

Recommendation with UGC in User Side
 Topic analysis for items through topic modeling
 Terms in UGC are combined together to compose the term set W
 each item owns an aggregated term list
 pLSA/LDA/HDP/nCRP/PAM: all are OK
 $\Theta = \{\theta_j\}$, where $\theta_j = (\theta_{j1}, \theta_{j2}, \ldots, \theta_{jK})$ is the topic/aspect distribution of document $j$ (i.e., item $j$) → this is what we need
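
The aggregation step above (one term list per item, all terms pooled into the term set W) can be sketched as follows. The UGC records and the helper names are hypothetical; the resulting item-by-term counts are the usual input to whichever topic model is chosen, and the topic model itself is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical UGC records: (user, item, text). Tags and reviews are treated alike.
ugc = [
    ("u1", "film_a", "fiction robot hero"),
    ("u2", "film_a", "robot machine"),
    ("u1", "book_b", "magic wizard"),
    ("u3", "book_b", "wizard school magic"),
]


def aggregate_terms(records):
    """Merge all UGC terms attached to an item into one document per item."""
    docs = defaultdict(list)
    for _user, item, text in records:
        docs[item].extend(text.lower().split())
    return dict(docs)


def build_term_set(docs):
    """Term set W: every distinct term, in sorted order."""
    return sorted({t for terms in docs.values() for t in terms})


def term_counts(docs, vocab):
    """Item-by-term count rows, one row per item document."""
    rows = {}
    for item, terms in docs.items():
        counts = Counter(terms)
        rows[item] = [counts[t] for t in vocab]
    return rows


docs = aggregate_terms(ugc)
W = build_term_set(docs)
X = term_counts(docs, W)
print(W)
print(X)
```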
 User Interest Distribution
 Cluster items into groups according to the similarity of their
topics (K-Means/GMM/K-Medoid: all are OK)
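
A minimal sketch of the clustering step, using plain k-means on the topic distributions. The distributions below are made up, and seeding the centroids with the first k items is a simplifying assumption, not something the slides specify.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each item goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda q: sq_dist(p, centroids[q]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for q in range(k):
            members = [p for p, lab in zip(points, labels) if lab == q]
            if members:  # keep the old centroid if a cluster empties
                centroids[q] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up topic distributions theta_j over K = 3 topics for five items.
theta = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
]
labels, centroids = kmeans(theta, k=2)
print(labels)
```

Items dominated by the same topic end up in the same cluster regardless of their product category, which is the grouping the user interest distribution is built on.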

Recommendation with UGC in User Side
 User Interest Distribution (cont.)
 Intuition : find items with similar topics, although they are in
different categories: clothes, gadget, book, toy, DVD all about
Harry Potter
 Aggregate each user’s consumption records on each cluster $C_q$
 $Sim(i, l)$ = PCC, cosine or KL divergence
 the weight of $l$ as one of user $i$’s neighbors: $e_{il} = \frac{Sim(i, l)}{\sum_{l' \in L(i)} Sim(i, l')}$
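
The two steps above can be sketched as follows, using cosine similarity. Taking L(i) to be the `top_n` most similar users is an assumption made here for illustration, and the cluster assignment and consumption records are made up.

```python
import math


def interest_distribution(consumed_items, item_cluster, num_clusters):
    """Share of a user's consumption records that falls in each cluster C_q."""
    counts = [0] * num_clusters
    for item in consumed_items:
        counts[item_cluster[item]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def neighbor_weights(i, dists, top_n):
    """e_il = Sim(i, l) / sum of Sim(i, l') over l' in L(i).

    L(i) is taken to be the top_n most similar users (an assumption).
    """
    sims = {l: cosine(dists[i], d) for l, d in dists.items() if l != i}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:top_n]
    total = sum(sims[l] for l in neighbors)
    return {l: sims[l] / total for l in neighbors} if total else {}


# Hypothetical item-to-cluster assignment and consumption records.
item_cluster = {"a": 0, "b": 0, "c": 1, "d": 1}
consumed = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["c", "d"],
    "u4": ["a", "c", "d"],
}
dists = {
    u: interest_distribution(items, item_cluster, 2)
    for u, items in consumed.items()
}
e_u1 = neighbor_weights("u1", dists, top_n=2)
print(e_u1)
```

The weights of each user's neighbors sum to 1, so the weighted sum in the regularization term is a weighted average of the neighbors' latent vectors.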
 A novel regularization : user topic regularization (UTR)
 $\min \sum_{i=1}^{M} \| U_i - \sum_{l \in L(i)} e_{il} U_l \|_F^2$
 Intuition: users interested in similar topics tend to have similar
latent features
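
To make the regularizer concrete, the sketch below evaluates the UTR penalty for a toy set of latent vectors. The vectors, the weights and the function name `utr_penalty` are illustrative assumptions.

```python
def utr_penalty(U, e):
    """Sum over users i of || U_i - sum_{l in L(i)} e_il * U_l ||^2."""
    total = 0.0
    for i, u_i in U.items():
        # Weighted average of the neighbors' latent vectors.
        avg = [0.0] * len(u_i)
        for l, w in e.get(i, {}).items():
            for k, v in enumerate(U[l]):
                avg[k] += w * v
        total += sum((a - b) ** 2 for a, b in zip(u_i, avg))
    return total


# Made-up 2-dimensional latent vectors and neighbor weights e_il.
U = {"u1": [1.0, 0.0], "u2": [1.0, 0.0], "u3": [0.0, 1.0]}
e = {
    "u1": {"u2": 1.0},
    "u2": {"u1": 1.0},
    "u3": {"u1": 0.5, "u2": 0.5},
}
before = utr_penalty(U, e)

# Moving u3 onto its neighbors' average removes the penalty.
U_aligned = dict(U, u3=[1.0, 0.0])
after = utr_penalty(U_aligned, e)
print(before, after)
```

The penalty is zero exactly when every user's latent vector equals the weighted average of its neighbors' vectors, which is how the term pulls topically similar users together.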

Recommendation with UGC in User Side
 A new MF model (UTR-MF)
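 $\min_{U,V} L = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij} (R_{ij} - U_i^T V_j)^2 + \frac{\lambda_U}{2} \| U \|_F^2 + \frac{\lambda_V}{2} \| V \|_F^2 + \frac{\alpha}{2} \sum_{i=1}^{M} \| U_i - \sum_{l \in L(i)} e_{il} U_l \|_F^2$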
 gradient descent/ coordinate descent
 Gradient Descent
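 $\frac{\partial L}{\partial U_i} = \sum_{j=1}^{N} I_{ij} (R_{ij} - U_i^T V_j)(-V_j) + \lambda_U U_i + \alpha (U_i - \sum_{l \in L(i)} e_{il} U_l) + \alpha \sum_{g \in G(i)} (U_g - \sum_{l' \in L(g)} e_{gl'} U_{l'}) \times (-e_{gi})$
 $\frac{\partial L}{\partial V_j} = \sum_{i=1}^{M} I_{ij} (R_{ij} - U_i^T V_j)(-U_i) + \lambda_V V_j$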
 𝐺(𝑖) is a set consisting of those users whose neighborhoods
include user 𝑖
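
A minimal gradient-descent sketch of UTR-MF follows. It assumes the squared-error term carries a factor of 1/2, so that the gradients above hold as written. The ratings, neighbor weights, hyperparameter values and function names are all illustrative assumptions rather than the authors' settings.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(U, e, i):
    """U_i - sum_{l in L(i)} e_il U_l."""
    r = list(U[i])
    for l, w in e.get(i, {}).items():
        for k in range(len(r)):
            r[k] -= w * U[l][k]
    return r


def loss(R, U, V, e, lam_u, lam_v, alpha):
    """UTR-MF objective, with an assumed 1/2 on the squared-error term."""
    val = 0.5 * sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items())
    val += 0.5 * lam_u * sum(dot(u, u) for u in U)
    val += 0.5 * lam_v * sum(dot(v, v) for v in V)
    res = [residual(U, e, i) for i in range(len(U))]
    return val + 0.5 * alpha * sum(dot(r, r) for r in res)


def gradients(R, U, V, e, lam_u, lam_v, alpha):
    K = len(U[0])
    gU = [[lam_u * x for x in u] for u in U]
    gV = [[lam_v * x for x in v] for v in V]
    for (i, j), r in R.items():  # only observed ratings (I_ij = 1)
        err = r - dot(U[i], V[j])
        for k in range(K):
            gU[i][k] -= err * V[j][k]
            gV[j][k] -= err * U[i][k]
    res = [residual(U, e, g) for g in range(len(U))]
    for g in range(len(U)):
        for k in range(K):
            gU[g][k] += alpha * res[g][k]  # user g's own UTR term
        for i, w in e.get(g, {}).items():  # g belongs to G(i)
            for k in range(K):
                gU[i][k] -= alpha * w * res[g][k]
    return gU, gV


def train(R, M, N, e, K=2, lam_u=0.01, lam_v=0.01, alpha=0.1,
          lr=0.02, steps=500, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(M)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(N)]
    history = []
    for _ in range(steps):
        gU, gV = gradients(R, U, V, e, lam_u, lam_v, alpha)
        U = [[x - lr * g for x, g in zip(u, gu)] for u, gu in zip(U, gU)]
        V = [[x - lr * g for x, g in zip(v, gv)] for v, gv in zip(V, gV)]
        history.append(loss(R, U, V, e, lam_u, lam_v, alpha))
    return U, V, history


# Toy data: ratings keyed by (user index, item index), and neighbor weights e_il.
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}
e = {0: {1: 1.0}, 1: {0: 0.6, 2: 0.4}, 2: {1: 1.0}}
U, V, history = train(R, M=3, N=2, e=e)
print(history[0], history[-1])
```

The second loop in `gradients` is the G(i) term: user g contributes to the gradient of every user i that appears in g's neighborhood, weighted by e_gi.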

Recommendation with UGC in Item Side
 Intuition for items: similar UGC → similar topic distribution → similar latent features
 $Sim(j, h)$: similarity between items $j$ and $h$ → PCC, cosine or KL divergence
 $w_{jh} = \frac{Sim(j, h)}{\sum_{h' \in H(j)} Sim(j, h')}$
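
A sketch of the item-side weights, this time using PCC between topic distributions. Because PCC can be negative, the sketch keeps only positively correlated items in H(j); that filter, the `top_n` cutoff and the toy distributions are assumptions made here, not details from the slides.

```python
import math


def pcc(a, b):
    """Pearson correlation between two topic distributions."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def item_weights(j, theta, top_n):
    """w_jh = Sim(j, h) / sum of Sim(j, h') over h' in H(j)."""
    sims = {h: pcc(theta[j], t) for h, t in theta.items() if h != j}
    ranked = sorted(sims, key=sims.get, reverse=True)
    # Assumption: only positively correlated items qualify as neighbors.
    neighbors = [h for h in ranked if sims[h] > 0][:top_n]
    total = sum(sims[h] for h in neighbors)
    return {h: sims[h] / total for h in neighbors}


# Made-up topic distributions for items from different categories.
theta = {
    "wand_toy": [0.8, 0.1, 0.1],
    "robe": [0.7, 0.2, 0.1],
    "power_drill": [0.1, 0.1, 0.8],
    "novel": [0.6, 0.3, 0.1],
}
w = item_weights("wand_toy", theta, top_n=3)
print(w)
```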
 A novel regularization: item topic regularization (ITR)
 $\min \sum_{j=1}^{N} \| V_j - \sum_{h \in H(j)} w_{jh} V_h \|_F^2$
 A new MF model (ITR-MF):
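‒ $\min_{U,V} L = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij} (R_{ij} - U_i^T V_j)^2 + \frac{\lambda_U}{2} \| U \|_F^2 + \frac{\lambda_V}{2} \| V \|_F^2 + \frac{\alpha}{2} \sum_{j=1}^{N} \| V_j - \sum_{h \in H(j)} w_{jh} V_h \|_F^2$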
 A natural combination: UTR + ITR
 gradient descent/coordinate descent
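
By symmetry with the user-side model, the ITR-MF gradients can be written out as below. This is a sketch rather than a formula taken from the slides: it assumes the squared-error term carries a factor of 1/2, and it introduces $Q(j)$, a symbol used only here, for the set of items whose neighborhoods include item $j$.

$$
\frac{\partial L}{\partial U_i} = \sum_{j=1}^{N} I_{ij} (R_{ij} - U_i^T V_j)(-V_j) + \lambda_U U_i
$$

$$
\frac{\partial L}{\partial V_j} = \sum_{i=1}^{M} I_{ij} (R_{ij} - U_i^T V_j)(-U_i) + \lambda_V V_j + \alpha \Big( V_j - \sum_{h \in H(j)} w_{jh} V_h \Big) + \alpha \sum_{g \in Q(j)} \Big( V_g - \sum_{h' \in H(g)} w_{gh'} V_{h'} \Big) (-w_{gj})
$$

In the combined UTR + ITR model, under the same assumptions, the user gradient would additionally carry the two $\alpha$ terms of the user-side model, each regularizer with its own weight if desired.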

Experiment and Evaluation
 Real-world dataset
 Movielens (social tag + rating)
 Last.fm (expanded, social tag + rating)
 Yelp (review + rating)
 Evaluation Metric: RMSE and MAE
 Compared baseline models: UserCF, ItemCF, PMF, TF-IDF MF, CTR
 In social tag case:
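
The two evaluation metrics named on this slide, RMSE and MAE, can be computed as in the sketch below. The rating values are made up for illustration and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired ratings."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error over paired ratings."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Made-up held-out ratings and model predictions.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture; lower is better for each.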

 Experimental results (cont.)
 UTR-MF and ITR-MF outperform the other baselines in all cases
 For example, on the Last.fm dataset, ITR-MF achieves a 14%
improvement over PMF and an 8% improvement over CTR
 ITR-MF performs better than UTR-MF: a user’s preference is harder to
infer, probably because it can change dynamically

 Experimental results (cont.)
 in the review case → the improvement is similar to that in the social tag
case
 UTR-MF and ITR-MF outperform the other baselines in all cases
 ITR-MF performs better than UTR-MF: a user’s preference is harder to
infer
 The improvements are significant according to the paired t-test ($p < 0.001$)
 For more details, please refer to our paper
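
For readers unfamiliar with the paired t-test mentioned above, the sketch below computes its test statistic. The slides do not say what is paired; the sketch assumes per-fold errors of a baseline and a proposed model, and the numbers are invented for illustration.

```python
import math


def paired_t(errors_a, errors_b):
    """t statistic and degrees of freedom for paired samples."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


# Invented per-fold RMSE values, not results from the paper.
baseline_rmse = [1.00, 0.90, 1.10, 1.00, 0.95]
model_rmse = [0.90, 0.85, 0.95, 0.90, 0.90]
t_stat, dof = paired_t(baseline_rmse, model_rmse)
print(t_stat, dof)
```

The statistic is then compared against a t distribution with `dof` degrees of freedom to obtain the p-value; that lookup is left out here to keep the sketch dependency-free.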

Conclusion
 Conclusion
 We demonstrate that different types of UGC can be integrated
into the MF model in a unified way
 User preferences and item features can be learned from UGC
text
 Our two novel regularization terms are effective for modeling user
preferences and item features
 Our two MF-extended models achieve significant improvements over the baselines
 Future Work
 Study other types of UGC, such as tweets and blogs, to learn user
preferences and influential events in SNS

Thank you!
Q&A
