This document discusses recommender systems that incorporate user-generated content (UGC) such as tags, reviews, questions/answers, blogs, and tweets. It proposes two new matrix factorization (MF)-based recommendation models: 1) UTR-MF, which regularizes user latent factors based on the topics users are interested in, as learned from UGC, and 2) ITR-MF, which regularizes item latent factors based on item topic distributions learned from the associated UGC. The models are evaluated on three real-world datasets and outperform the baselines by using UGC to better learn user preferences and item features. Future work could explore other UGC types, such as tweets and blogs.
Learning to recommend with user generated content
1. Learning to Recommend with User-Generated Content
Yueshen Xu¹, Zhiyuan Chen², Jianwei Yin¹, Zizheng Wu¹ and Taojun Yao¹
¹School of Computer Science and Technology, Zhejiang University
²University of Illinois at Chicago
xyshzjucs@zju.edu.cn; xyshzjucs@gmail.com
2015/6/9, Zhejiang University
Junxiang Wang
2. Yueshen Xu, WAIM, 2015
Outline
Background
Introduction
Related Work
Recommendation with UGC on the User Side
− Matrix Factorization
− Topic Analysis for Items through Topic Modeling
− User Interest Distribution
− User Topic Regularization
Recommendation with UGC on the Item Side
− Item Topic Regularization
Experiment and Evaluation
References
Keywords: Recommendation, User-Generated Content, Topic Modeling, Matrix Factorization
3. Background
Recommendation in General
Collaborative Filtering (CF)
− Matrix Factorization (MF)
Content-based approach
− Pandora Music Genome Project
User-Generated Content (UGC)
social tags, reviews, questions and answers, blogs, tweets, etc.
tag-based / review-based recommendation
Problems in existing work
not every website has all kinds of UGC
the item-word / user-word space is highly sparse
synonymy and polysemy
most work focuses on only a single kind of UGC
Sparse user-item rating matrix (blank cells are unobserved):
        item1  item2  item3  item4
user1   r11
user2          r22
user3
user4   r41                  r44
user5                 r53
4. Background
Other related work
Social/trust-based recommendation: helpful but limited
− many sites have no social relationships: Amazon, eBay, Newegg, Jingdong, Expedia, etc.
− but they do have UGC √
Description/profile-based recommendation
− static content
− fails to distinguish between different items
− unrelated to a user's preferences
UGC, in contrast:
emphasizes an item's features
− the words an item receives frequently
grows dynamically
is associated with a user's preferences / topics of interest
− "I like science fiction films, so I wrote a lot of movie reviews that contain words like fiction, tech, super, hero, robotic, machine."
provides natural chunking (social tags)
5. Contributions
Main contributions
We study how UGC can be used to learn user interests and item features
We propose a novel user-oriented collaborative filtering model and a novel item-oriented collaborative filtering model
We propose a unified way to utilize different types of UGC in recommender systems
We expand an existing dataset by crawling new data and conduct experiments on three real-world datasets, which attest to the effectiveness of the proposed models
6. Recommendation with UGC on the User Side
Topic analysis for items through topic modeling
The terms in UGC are combined to compose the term set $W$
Each item owns an aggregated term list, which is treated as one document
Any topic model works: pLSA, LDA, HDP, nCRP, or PAM
What we need is $\Theta = \{\theta_j\}$, where $\theta_j = (\theta_{j1}, \theta_{j2}, \dots, \theta_{jK})$ is the topic/aspect distribution of document $j$ (i.e., item $j$)
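The slide leaves the topic model open. As a minimal sketch, assuming LDA fitted by collapsed Gibbs sampling (the sampler described by Griffiths and Steyvers [8]), the step could look like this: every UGC term attached to an item is concatenated into one document, and θ_j is read off the smoothed document-topic counts. The function names, the hyperparameters `alpha`, `beta`, and `iters`, and the toy tags are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def aggregate_terms(ugc_records):
    """Build one document per item by concatenating all UGC terms attached to it.

    ugc_records: iterable of (item_id, terms) pairs, e.g. one user's tags on an item.
    """
    docs = defaultdict(list)
    for item, terms in ugc_records:
        docs[item].extend(terms)
    return dict(docs)


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns theta[j][k] = p(topic k | document j)."""
    rng = random.Random(seed)
    vocab = {w: v for v, w in enumerate(sorted({w for d in docs for w in d}))}
    n_vocab = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]            # document-topic counts
    n_kw = [[0] * n_vocab for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                             # tokens assigned to each topic
    z = []                                           # current topic of every token
    for j, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[j].append(k)
            n_dk[j][k] += 1
            n_kw[k][vocab[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for j, doc in enumerate(docs):
            for t, w in enumerate(doc):
                v, k = vocab[w], z[j][t]
                # take the token out of the counts before resampling its topic
                n_dk[j][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # unnormalised full conditional p(z = q | all other assignments)
                weights = [(n_dk[j][q] + alpha) * (n_kw[q][v] + beta) / (n_k[q] + n_vocab * beta)
                           for q in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[j][t] = k
                n_dk[j][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # smoothed topic proportions; each row sums to 1
    return [[(n_dk[j][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
            for j, doc in enumerate(docs)]


# hypothetical (item, tags) records
records = [("m1", ["robot", "space"]), ("m1", ["robot", "future"]),
           ("m2", ["space", "alien", "robot"]), ("m3", ["love", "wedding"]),
           ("m3", ["romance", "love"]), ("m4", ["wedding", "romance"])]
item_docs = aggregate_terms(records)
items = sorted(item_docs)
theta = lda_gibbs([item_docs[i] for i in items], n_topics=2)
```

Any of the other listed models could replace `lda_gibbs`, as long as it yields one topic distribution per item.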
User Interest Distribution
Cluster items into groups according to the similarity of their topic distributions (K-Means, GMM, or K-Medoids all work)
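A minimal sketch of the clustering step, assuming plain K-Means (Lloyd's algorithm) with squared Euclidean distance on the rows of Θ and NumPy being available; the slide equally allows GMM or K-Medoids. The function name, the number of clusters, and the toy Θ are hypothetical.

```python
import numpy as np


def kmeans(theta, n_clusters, iters=100, seed=0):
    """Lloyd's K-Means on item topic distributions; returns (labels, centers)."""
    theta = np.asarray(theta, dtype=float)
    rng = np.random.default_rng(seed)
    # initialise the centers with distinct, randomly chosen items
    centers = theta[rng.choice(len(theta), size=n_clusters, replace=False)]
    for _ in range(iters):
        # assign every item to its nearest center (squared Euclidean distance)
        dists = ((theta[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its items; keep it in place if its cluster is empty
        new_centers = np.array([theta[labels == q].mean(axis=0) if np.any(labels == q) else centers[q]
                                for q in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# hypothetical Θ for four items and K = 2 topics
theta = np.array([[0.90, 0.10], [0.85, 0.15], [0.10, 0.90], [0.20, 0.80]])
labels, centers = kmeans(theta, n_clusters=2)   # items 0-1 and items 2-3 land in separate clusters
```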
7. Recommendation with UGC on the User Side
User Interest Distribution (cont.)
Intuition: find items with similar topics even when they belong to different categories, e.g., clothes, gadgets, books, toys, and DVDs that are all about Harry Potter
Aggregate each user's consumption records over each cluster $C_q$
$Sim(i,l)$: similarity between users $i$ and $l$, computed with PCC, cosine, or KL divergence
The weight of $l$ as one of user $i$'s neighbors:
$$e_{il} = \frac{Sim(i,l)}{\sum_{l' \in L(i)} Sim(i,l')}$$
A novel regularization: user topic regularization (UTR)
$$\min \sum_{i=1}^{M} \Big\| U_i - \sum_{l \in L(i)} e_{il} U_l \Big\|_F^2$$
Intuition: users interested in similar topics tend to have similar latent features
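The user-side neighborhood can be sketched as follows, assuming NumPy, a binary user-item consumption matrix, item cluster labels from the previous step, and cosine similarity between interest distributions (one of the three options on the slide; KL divergence is a divergence rather than a similarity and would first need converting). The slide does not say how L(i) is formed, so taking the k most similar other users is an assumption, as are the function names and toy data.

```python
import numpy as np


def user_interest_distribution(consumed, labels, n_clusters):
    """Row i = share of user i's consumption records that fall in each item cluster C_q."""
    consumed = np.asarray(consumed, dtype=float)   # users x items, 1 = consumed/rated
    onehot = np.eye(n_clusters)[labels]            # items x clusters
    counts = consumed @ onehot                     # users x clusters
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.where(totals == 0, 1.0, totals)


def neighbor_weights(P, k):
    """e[i, l] = Sim(i, l) / sum_{l' in L(i)} Sim(i, l'), cosine Sim, L(i) = top-k users."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    unit = P / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)                 # a user is not its own neighbor
    E = np.zeros_like(sim)
    for i in range(len(P)):
        nbrs = np.argsort(sim[i])[::-1][:k]        # L(i): the k most similar users
        s = np.clip(sim[i, nbrs], 0.0, None)
        if s.sum() > 0:
            E[i, nbrs] = s / s.sum()
    return E


def utr_penalty(U, E):
    """sum_i || U_i - sum_{l in L(i)} e_il U_l ||^2 for latent user factors U (M x d)."""
    return float(((U - E @ U) ** 2).sum())


# hypothetical data: 4 users, 4 items, items 0-1 in cluster 0 and items 2-3 in cluster 1
consumed = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
P = user_interest_distribution(consumed, [0, 0, 1, 1], n_clusters=2)
E = neighbor_weights(P, k=1)
```

Keeping E as a dense M×M matrix is only for readability; a real system would store the few neighbors per user sparsely.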
8. Recommendation with UGC on the User Side
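A new MF model (UTR-MF)
$$\min_{U,V} \mathcal{L} = \sum_{i=1}^{M}\sum_{j=1}^{N} I_{ij}\,(R_{ij} - U_i^T V_j)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + \frac{\alpha}{2}\sum_{i=1}^{M}\Big\|U_i - \sum_{l \in L(i)} e_{il} U_l\Big\|_F^2$$
where $I_{ij}$ equals 1 if user $i$ rated item $j$ and 0 otherwise ($M$ users, $N$ items)
Optimization: gradient descent / coordinate descent
Gradient Descent
$$\frac{\partial \mathcal{L}}{\partial U_i} = 2\sum_{j=1}^{N} I_{ij}(R_{ij} - U_i^T V_j)(-V_j) + \lambda_U U_i + \alpha\Big(U_i - \sum_{l \in L(i)} e_{il} U_l\Big) + \alpha\sum_{g \in G(i)}\Big(U_g - \sum_{l' \in L(g)} e_{gl'} U_{l'}\Big)(-e_{gi})$$
$$\frac{\partial \mathcal{L}}{\partial V_j} = 2\sum_{i=1}^{M} I_{ij}(R_{ij} - U_i^T V_j)(-U_i) + \lambda_V V_j$$
$G(i)$ is the set of users whose neighborhoods include user $i$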
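The UTR-MF objective and its gradients can be sketched in matrix form with NumPy, assuming full-batch gradient descent with a fixed learning rate (the slide also allows coordinate descent). Here `E` is a hypothetical M×M matrix whose row i holds the weights e_il, so the G(i) sum in ∂L/∂U_i becomes the product E^T(U − EU). The squared-error term of the objective as written carries no 1/2, so its gradient contributes a factor of 2. Function names and hyperparameter values are illustrative, not the settings used in the paper.

```python
import numpy as np


def utr_mf_loss(R, I, U, V, E, lam_u, lam_v, alpha):
    """UTR-MF objective: squared error on observed ratings + L2 terms + user topic regularization."""
    err = I * (R - U @ V.T)            # I_ij (R_ij - U_i^T V_j); zero for unobserved pairs
    D = U - E @ U                      # row i: U_i - sum_{l in L(i)} e_il U_l
    return ((err ** 2).sum() + 0.5 * lam_u * (U ** 2).sum()
            + 0.5 * lam_v * (V ** 2).sum() + 0.5 * alpha * (D ** 2).sum())


def utr_mf_grads(R, I, U, V, E, lam_u, lam_v, alpha):
    """Gradients of utr_mf_loss with respect to U (M x d) and V (N x d)."""
    err = I * (R - U @ V.T)
    D = U - E @ U
    # (E.T @ D)[i] = sum_{g in G(i)} e_gi (U_g - sum_{l'} e_gl' U_l')
    grad_u = -2.0 * err @ V + lam_u * U + alpha * D - alpha * (E.T @ D)
    grad_v = -2.0 * err.T @ U + lam_v * V
    return grad_u, grad_v


def train_utr_mf(R, I, E, d=2, lam_u=0.1, lam_v=0.1, alpha=0.5, lr=0.01, epochs=1000, seed=0):
    """Full-batch gradient descent from a small random initialisation."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], d))
    V = 0.1 * rng.standard_normal((R.shape[1], d))
    for _ in range(epochs):
        grad_u, grad_v = utr_mf_grads(R, I, U, V, E, lam_u, lam_v, alpha)
        U -= lr * grad_u
        V -= lr * grad_v
    return U, V


# hypothetical 4 x 4 rating matrix; 0 marks an unobserved rating
R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]], dtype=float)
I = (R > 0).astype(float)
E = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
U, V = train_utr_mf(R, I, E)
predicted = U @ V.T                    # predicted ratings, including the unobserved cells
```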
9. Recommendation with UGC on the Item Side
Intuition for items: similar UGC → similar topic distribution → similar latent features
$Sim(j,h)$: similarity between items $j$ and $h$ (PCC, cosine, or KL divergence)
$$w_{jh} = \frac{Sim(j,h)}{\sum_{h' \in H(j)} Sim(j,h')}$$
A novel regularization: item topic regularization (ITR)
$$\min \sum_{j=1}^{N} \Big\| V_j - \sum_{h \in H(j)} w_{jh} V_h \Big\|_F^2$$
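A new MF model (ITR-MF):
$$\min_{U,V} \mathcal{L} = \sum_{i=1}^{M}\sum_{j=1}^{N} I_{ij}\,(R_{ij} - U_i^T V_j)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + \frac{\alpha}{2}\sum_{j=1}^{N}\Big\|V_j - \sum_{h \in H(j)} w_{jh} V_h\Big\|_F^2$$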
A natural combination: UTR + ITR
gradient descent/coordinate descent
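The item side mirrors the user side. A minimal sketch, assuming NumPy, cosine similarity on the item topic distributions θ_j, H(j) taken as the k most similar other items (an assumption; the slide does not specify how H(j) is formed), and full-batch gradient descent on the ITR-MF objective as written. The gradient with respect to V_j follows the same pattern as the UTR-MF gradient with respect to U_i, with the item weight matrix `W` in place of the user weight matrix. All names and values are illustrative.

```python
import numpy as np


def item_weights(theta, k):
    """w[j, h] = Sim(j, h) / sum_{h' in H(j)} Sim(j, h'), cosine Sim on topic distributions."""
    theta = np.asarray(theta, dtype=float)
    unit = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)              # an item is not its own neighbor
    W = np.zeros_like(sim)
    for j in range(len(theta)):
        nbrs = np.argsort(sim[j])[::-1][:k]     # H(j): the k most similar items
        s = np.clip(sim[j, nbrs], 0.0, None)
        if s.sum() > 0:
            W[j, nbrs] = s / s.sum()
    return W


def itr_mf_loss(R, I, U, V, W, lam_u, lam_v, alpha):
    err = I * (R - U @ V.T)                     # errors on observed ratings only
    D = V - W @ V                               # row j: V_j - sum_{h in H(j)} w_jh V_h
    return ((err ** 2).sum() + 0.5 * lam_u * (U ** 2).sum()
            + 0.5 * lam_v * (V ** 2).sum() + 0.5 * alpha * (D ** 2).sum())


def itr_mf_grads(R, I, U, V, W, lam_u, lam_v, alpha):
    err = I * (R - U @ V.T)
    D = V - W @ V
    grad_u = -2.0 * err @ V + lam_u * U
    # (W.T @ D)[j] sums over the items h whose neighborhood H(h) contains item j
    grad_v = -2.0 * err.T @ U + lam_v * V + alpha * D - alpha * (W.T @ D)
    return grad_u, grad_v


def train_itr_mf(R, I, W, d=2, lam_u=0.1, lam_v=0.1, alpha=0.5, lr=0.01, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], d))
    V = 0.1 * rng.standard_normal((R.shape[1], d))
    for _ in range(epochs):
        grad_u, grad_v = itr_mf_grads(R, I, U, V, W, lam_u, lam_v, alpha)
        U -= lr * grad_u
        V -= lr * grad_v
    return U, V


# hypothetical topic distributions for 4 items (K = 2) and a 4-user rating matrix
theta = np.array([[0.90, 0.10], [0.85, 0.15], [0.10, 0.90], [0.20, 0.80]])
W = item_weights(theta, k=1)
R = np.array([[5, 4, 0, 1], [4, 0, 1, 0], [1, 0, 5, 4], [0, 1, 4, 0]], dtype=float)
I = (R > 0).astype(float)
U, V = train_itr_mf(R, I, W)
```

Combining UTR and ITR would amount to adding both regularization terms (and both extra gradient terms) to the same loop.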
10. Experiment and Evaluation
Real-world datasets
MovieLens (social tags + ratings)
Last.fm (expanded; social tags + ratings)
Yelp (reviews + ratings)
Evaluation metrics: RMSE and MAE
Compared baselines: UserCF, ItemCF, PMF, TF-IDF MF, CTR
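RMSE and MAE are computed over the held-out test ratings T: RMSE = sqrt(Σ (R_ij − R̂_ij)² / |T|) and MAE = Σ |R_ij − R̂_ij| / |T|, where R̂_ij is the predicted rating. A small sketch of both; the function names and the (true, predicted) pair format are assumptions for illustration.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, predicted_rating) pairs."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)


# hypothetical held-out ratings and predictions
test_pairs = [(4, 3.5), (2, 3), (5, 5)]
print(rmse(test_pairs), mae(test_pairs))
```

Because RMSE squares each error, it penalizes a few large mistakes more heavily than MAE does.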
In the social tag case:
11. Experiment and Evaluation
Experimental results (cont.)
UTR-MF and ITR-MF outperform the baselines in all cases
For example, on the Last.fm dataset, ITR-MF achieves a 14% improvement over PMF and an 8% improvement over CTR
ITR-MF performs better than UTR-MF because a user's preferences are harder to infer; the main reason is probably that a user's preferences can change dynamically
12. Experiment and Evaluation
Experimental results (cont.)
In the review case, the improvement is similar to that in the social tag case
UTR-MF and ITR-MF outperform the baselines in all cases
ITR-MF performs better than UTR-MF: a user's preferences are harder to infer
The improvements are significant according to the paired t-test ($p < 0.001$)
For more details, please refer to our paper
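The slides do not say what the paired samples are. Assuming they are matched error values of two models on the same evaluation runs (for example, repeated train/test splits), the paired t statistic is the mean of the per-run differences divided by its standard error, with n − 1 degrees of freedom. The sketch below computes only the statistic with the standard library; SciPy's `scipy.stats.ttest_rel` returns the same statistic together with its p-value. The numbers are made up and are not results from the paper.

```python
import math
import statistics


def paired_t(errors_a, errors_b):
    """Paired t statistic and degrees of freedom for matched error values of two models."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # standard error of the mean difference: sample std dev / sqrt(n)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


# made-up error values for a baseline (a) and a proposed model (b) on three runs
t, df = paired_t([1.0, 2.0, 3.0], [0.5, 1.0, 2.5])
```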
13. Conclusion
We demonstrate that different types of UGC can be integrated into the MF model in a unified way
User preferences and item features can be learned from UGC text
Our two novel regularization terms are effective for modeling user preferences and item features
Our two extended MF models achieve significant improvements over the baselines
Future Work
Study other types of UGC, such as tweets and blogs, to learn user preferences and influential events in social networking services (SNS)
14. References
[1] Adomavicius, G. and Tuzhilin, A.: Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE TKDE, 17(6):734-749 (2005)
[2] Aggarwal, C.C. and Zhai, C.: Mining Text Data. Springer, New York (2012)
[3] Bischoff, K., Firan, C.S., Nejdl, W., and Paiu, R.: Can all tags be used for search? In: ACM CIKM, pp. 193-202 (2008)
[4] Blei, D.M., Ng, A.Y., and Jordan, M.I.: Latent Dirichlet allocation. In: JMLR, 3:993-1022 (2003)
[5] Cantador, I., Brusilovsky, P., and Kuflik, T.: HetRec workshop. In: ACM RecSys, New York, USA (2011)
[6] Chen, C., Zheng, X., Wang, Y., Hong, F., and Lin, Z.: Context-Aware Collaborative Topic Regression with Social Matrix Factorization for Recommender Systems. In: AAAI, pp. 9-15 (2014)
[7] Fang, Y. and Si, L.: Matrix co-factorization for recommendation with rich side information and implicit feedback. In: HetRec (workshop of RecSys), pp. 65-69 (2011)
[8] Griffiths, T.L. and Steyvers, M.: Finding scientific topics. In: PNAS (2004)
[9] Koren, Y., Bell, R., and Volinsky, C.: Matrix factorization techniques for recommender systems. In: Computer, 42(8):30-37 (2009)
[10] Liang, H., Xu, Y., Li, Y., Nayak, R., and Tao, X.: Connecting users and items with weighted tags for personalized item recommendations. In: Hypertext, pp. 51-60 (2010)
[11] Liu, X. and Aberer, K.: SoCo: a social network aided context-aware recommender system. In: WWW, pp. 781-802 (2013)
[12] Ma, H., Zhou, D., Liu, C., Lyu, M.R., and King, I.: Recommender systems with social regularization. In: ACM WSDM, pp. 287-296 (2011)
15. References (cont.)
[13] McAuley, J.J. and Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: ACM RecSys, pp. 165-172 (2013)
[14] Moens, M.-F., Li, J., and Chua, T.-S.: Mining User Generated Content. Chapman and Hall/CRC (2014)
[15] Pandora: Music genome project. http://www.pandora.com/about/mgp
[16] Purushotham, S. and Liu, Y.: Collaborative topic regression with social matrix factorization for recommendation systems. In: ICML, pp. 759-766 (2012)
[17] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J.: GroupLens: An open architecture for collaborative filtering of netnews. In: CSCW, pp. 175-186 (1994)
[18] Rovi: Recommendations API version 2.0. http://proddoc.rovicorp.com/mashery/index.php/Recommendations
[19] Salakhutdinov, R. and Mnih, A.: Probabilistic matrix factorization. In: NIPS
[20] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: WWW, pp. 285-295 (2001)
[21] Wang, C. and Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: ACM SIGKDD, pp. 448-456 (2011)
[22] Yang, X., Steck, H., and Liu, Y.: Circle-based recommendation in online social networks. In: ACM SIGKDD, pp. 1267-1275 (2012)
[23] Zhang, Y., Lai, G., Zhang, M., Zhang, Y., Liu, Y., and Ma, S.: Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In: ACM SIGIR, pp. 83-92 (2014)