The document describes a proposed approach for inferring implicit topical interests of users on Twitter. It discusses related work on detecting user interests from social media using bag-of-words, topic modeling, and bag-of-concepts approaches. The proposed approach models user interests as a graph-based link prediction problem over a heterogeneous graph incorporating user followerships, explicit interests, and topic relatedness. It evaluates different variants of the model and finds semantic relatedness of topics to be most effective for identifying implicit user interests.
Tag recommendation is a very useful technique for detecting the type of a message; for example, Gmail divides its inbox into the tabs Primary, Social and Promotions, alongside labels such as Spam and Important. It can also be applied in other settings, such as social bookmarking and search engines.
Summary slides of "Unsupervised Model for Topic Viewpoint Discovery in Online Debates Leveraging Author Interactions", published at ICWSM 2018, from Amine Trabelsi and Osmar R. Zaïane.
Recommender systems are knowledge-based systems which support human decision-making. In an era of overwhelming choice, they help us decide which
products, services and information to consume. The focus of attention in recommender systems research and development has been on making recommendations to individual consumers. This places the focus on the easier case, but ignores the fact that it is as common, if not more common, for us to consume items in groups such as couples, families and parties of friends. The choice of a date movie, a family holiday destination, or a restaurant for a celebration meal all require the balancing of the preferences of multiple consumers.
Summary of a Recommender Systems Survey paper, by Changsung Moon
This is the summary of the following paper:
J. Bobadilla, F. Ortega, A. Hernando and A. Gutierrez, “Recommender Systems Survey,” Knowledge-Based Systems, Vol. 46, 2013, pp. 109-132.
Recommender systems aim to predict the content that a user would like based on observations of users' online behaviour. Research in the Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models, to how to build effective recommender systems (note: last Friday, we won the ACM RecSys 2013 News Recommender Systems challenge). We would like to develop a general methodology to diagnose weaknesses and strengths of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours) that will provide the basis for the recommendation to be made. The purpose is to shed light on why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood as well as properties of the nearest-neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.
Textual information exchanged among users on online social network platforms provides deep insight into users' interests and behavioral patterns. However, unlike traditional text-dominant settings such as online publishing, one distinct feature of online social networks is users' rich interactions with the textual content, which, unfortunately, have not yet been well incorporated in existing topic modeling frameworks.
In this paper, we propose an LDA-based behavior-topic
model (B-LDA) which jointly models user topic interests and behavioral patterns. We focus the study of the model on online social network settings such as microblogs like Twitter, where the textual content is relatively short but user interactions on it are rich. We conduct experiments on real Twitter data to demonstrate that the topics obtained by our model are both informative and insightful. As an application of our B-LDA model, we also propose a Twitter followee recommendation algorithm combining B-LDA and LDA, which we show in a quantitative experiment outperforms LDA by a significant margin.
The increase in the amount of structured data published using the principles of Linked Data means that it is now more likely to find resources on the Web of Data that describe real-life concepts. However, discovering resources related to any given resource is still an open research area. This thesis studies recommender systems that use Linked Data as a source for generating recommendations, exploiting the large number of available resources and the relationships between them. Accordingly, a framework named AlLied to execute recommendation algorithms is proposed. This framework can be used as the main component for recommendations in a given architecture because it allows application developers to execute and evaluate recommendation algorithms in different contexts. Two implementations of this framework are presented and compared. The first one relies on graph-based algorithms and the second one on machine learning algorithms. Finally, a new recommendation algorithm that adapts dynamically to the linking features of the datasets used is also proposed.
Recommender systems are software tools and techniques providing suggestions for items that are likely to be of interest to a user. In recent years, recommender systems have proved to be a valuable means of helping Web users by providing useful and effective recommendations or suggestions.
This slide deck provides a survey of research in the field of coreference resolution. We survey 10 significant research papers, provide a detailed description of the problem, and suggest future research directions.
Recommender systems analyze patterns of user interest in
products to provide personalized recommendations. They seek to predict the rating or preference that a user would
give to an item. Some of the most successful realizations of latent factor models are based on matrix factorization...
Content-Based Social Recommendation with Poisson Matrix Factorization (ECML-P..., by Eliezer Silva
Presentation from ECML-PKDD 2017 of joint work with my supervisors Helge Langseth and Heri Ramampiaro, about a Poisson factorization model for recommendations that integrates topic models in the items set and social connections between users.
code: https://github.com/zehsilva/poissonmf_cs
preprint version: https://inajourney.files.wordpress.com/2012/11/poissonmf_cs.pdf
In social networks, where users send messages to each other, the issue of what triggers communication between unrelated users arises: does communication between previously unrelated users depend on friend-of-a-friend type of relationships, common interests, or other factors? In this work, we study the problem of predicting directed communication
intention between two users. Link prediction is similar to communication intention in that it uses network structure for prediction. However, these two problems exhibit fundamental
differences that originate from their focus. Link prediction uses evidence to predict network structure evolution, whereas our focal point is directed communication initiation between
users who are previously not structurally connected. To address this problem, we employ topological evidence in conjunction with transactional information in order to predict communication intention. It is not intuitive whether methods that work well for link prediction would work well in this case. In fact, we show in this work that network or content evidence, when considered separately, is not a sufficiently accurate predictor. Our novel approach, which jointly considers local structural properties of users in a social network in conjunction with their generated content, captures numerous interactions, direct and indirect, social and contextual, which have to date been considered independently. We performed an empirical study to evaluate our method using an extracted network of directed @-messages sent between users of a corporate microblogging service which resembles Twitter. We find that our method outperforms state-of-the-art techniques for link prediction. Our findings have implications for a wide range of social web applications, such as contextual expert recommendation for Q&A, creation of new friendship relationships, and targeted content delivery.
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ..., by GUANGYUAN PIAO
In this paper, we propose user modeling strategies which use Concept Frequency-Inverse Document Frequency (CF-IDF) as a weighting scheme and incorporate either or both of the dynamics and semantics of user interests. To this end, we first provide a comparative study of different user modeling strategies from previous literature that consider the dynamics of user interests, to present their comparative performance. In addition, we investigate different types of information (i.e., categories, classes and entities connected via various properties) for entities from DBpedia, and combinations of them, for extending user interest profiles. Finally, we build our user modeling strategies incorporating either or both of the best-performing methods in each dimension. Results show that our strategies significantly outperform two baseline strategies in the context of link recommendations on Twitter.
Data Mining In Social Networks Using K-Means Clustering Algorithm, by nishant24894
This topic deals with the K-Means clustering algorithm, which is used to categorize a data set into clusters depending upon similarities such as common interests, organizations or colleges. It categorizes the data into clusters on the basis of mutual friendship.
On Joint Modeling of Topical Communities and Personal Interest in Microblogs, by PC LO
Hoang, T. A., & Lim, E. P. (2014, November). On joint modeling of topical communities and personal interest in microblogs. In International Conference on Social Informatics (pp. 1-16). Springer, Cham.
Evolving Swings (topics) from Social Streams using Probability Model, by IJERA Editor
Evolving swings from social streams is receiving renewed interest, motivated by the growth of social media and social streams. Non-conventional approaches that include text, images, URLs and videos can be appropriate. The focus is on evolving topics via the social aspects of the networks and the mention links between users, which are generated intentionally or unintentionally through replies, mentions and retweets. A probability model of the mentioning behavior is proposed, and the proposed model detects the evolving topic from the anomalies measured. Several experiments show that mention-anomaly-based approaches detect the evolving swing as early as text-anomaly-based approaches.
Analyzing User Modeling on Twitter for Personalized News Recommendations, by GUANGYUAN PIAO
Presentation for a reading group, 30/09/2015. Check out my recent work on user modeling at http://parklize.github.io/#research, which is motivated by this presentation.
Despite being controversial, research metrics are becoming a key component of research evaluation processes globally. Nevertheless, accessing research metrics to support these processes in a timely manner is not a straightforward task, as it requires either having access to expensive commercial solutions such as Elsevier SciVal or Clarivate Analytics' InCites, or having substantial knowledge of existing APIs and data sources as well as the ability and skills needed to analyse large amounts of raw scholarly data in-house. This is especially the case on a department or institutional level where large amounts of data have to be aggregated prior to analysis. To alleviate this problem we have designed and prototyped CORE Analytics Dashboard – a tool for analytical evaluation of research outputs of universities. The aim of the CORE Analytics Dashboard is to help universities analyse their performance using a variety of metrics captured from openly available data sources, including citation counts and social media metrics, and to help them compare their performance with other institutions. This paper presents the motivation behind developing this dashboard and its main features.
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU..., by Denis Parra Santander
- First version was a guest lecture about Network Visualization in the class "Data Visualization" taught by Dr. Sharon Hsiao in the QMSS program at Columbia University http://www.columbia.edu/~ih2240/dataviz/index.htm
- This updated version was delivered in our class on SNA at PUC Chile in the MPGI master program.
In this talk we shall introduce the main ideas of TruSIS (Trust in Social Internetworking Systems), a Marie Curie Fellowship financed by the European Union and hosted at VU University, Department of Computer Science, Business and Web group. The goal of TruSIS is to study the behaviour of users who affiliate with multiple social networking sites and are active in them (e.g., users may publish personal profiles on sites like MySpace and post videos on sites like YouTube). We briefly refer to this scenario as a SIS (Social Internetworking System).
As a first research contribution, we implemented a crawler to gather data about users and link their profiles on multiple social networking websites. To this purpose we used the Google Social Graph API, a powerful API released by Google in 2008. We obtained a sample of about 1.3 million user accounts and 36 million connections between them.
Parameters from social network theory (like average clustering coefficient, network modularity and so on) were used to study the structural properties of the gathered sample and how these properties depend on user behaviour.
A second contribution concerns the computation of the distance between two users in a SIS on the basis of their social ties. We used a popular parameter from social network theory known as the Katz coefficient, and provide a computationally efficient approach to computing it which relies on a popular tool from linear algebra known as the Sherman-Morrison formula.
Finally, we shall describe our work on extending the notion of trust from single social networks to a SIS. We describe the main research challenges tied to the definition of trust and how they relate to Semantic Web technologies.
A Method for Detecting Behavior-Based User Profiles in Collaborative Ontology..., by Sven Van Laere
Ontology engineering is far from trivial, and most collaborative methods and tools start from a predefined set of roles that stakeholders can have in the ontology engineering process. We, however, believe that the different types of user behavior are not known a priori and depend on the ontology engineering project. The detection of such user profiles based on unsupervised learning allows finding roles and responsibilities among peers in a collaborative setting. In this paper, we present a method for automatic detection of user profiles in a collaborative ontology engineering environment by means of the K-means clustering algorithm, looking only at the type of interactions a user makes. We use the GOSPL ontology engineering tool and method to demonstrate this method. The data used to demonstrate the method stems from two ontology engineering projects involving respectively 42 and 36 users.
For my final year project I used data analysis techniques to investigate user behavior pattern recognition with respect to similar interests and culture versus offline geographical location. This was an out-of-the-box topic, which I selected due to my love of data analysis, in particular social network analysis in the Internet era.
1. Laboratory of Systems, Software and Semantics (LS3)
Ryerson University, Canada
Inferring Implicit Topical Interests on Twitter
Fattane Zarrinkalam
Hossein Fani
Ebrahim Bagheri
Mohsen Kahani
3. 3
Introduction
• Due to the increasing growth of user-generated content on the web, it is desirable for users to receive only information that is related to their interests.
• Personalization and recommender systems
• Social networks like Twitter enable users to freely communicate with each other and share recent news, ongoing activities or views about different topics.
• They can be seen as a viable source of information about the users
and their interests
User interest detection from Twitter
4. 4
Related Work
• Bag of Words approach
• It suffers from known problems in natural language processing, such as polysemy and synonymy
• Topic modeling approach (e.g. LDA)
• Sparsity problem
• Tweets are short, noisy and informal (limited to 140 characters)
• The number of topics in LDA is assumed to be fixed
• They don’t consider the underlying semantics of the phrases
5. 5
Related Work
• Bag of Concepts approach
• Usually, external knowledge bases such as DBpedia, Freebase and
Yago are used as a source for extracting concepts.
6. 6
Related Work
Limitations of bag of concepts approach
An interest is often modeled using one single concept:
• They cannot infer that a user is interested in a more specific topic, which is actually a combination of multiple related concepts.
Interests are confined to a set of predefined concepts:
• Interest in recent events that are not among that set cannot be discovered on the fly.
• [Zarrinkalam et al., WI2015] We view each topic of interest as a
conjunction of several semantic concepts which are temporally
correlated on Twitter.
• Topic of interest: {Premier League, Arsenal F.C., Tottenham Hotspur
F.C., Arsène Wenger}
• represents rivalry between Spurs and Arsenal
7. 7
Related Work
• Many previous works relate to explicit interest detection:
• Interests that are directly derivable from a user’s tweets
• Little is known about detecting implicit interests: topics that the user never explicitly engaged with but might have an interest in.
• Homophily theory
• Semantic relatedness between topics
• They view each topic as a single concept,
• the relationship between two topics is predefined in the external
knowledge base.
8. 8
Proposed Approach
• The main objective of our work:
• Determining implicit interests of users over the emerging topics on Twitter
• Our Model:
• A graph-based link prediction schema that operates over a
heterogeneous graph which uses three types of information:
• Users’ Explicit interest profile
• Theory of Homophily (user followership relations)
• Relationship between emerging topics
Which of these three types of information, or which combination of them, is most effective in allowing us to accurately identify a user’s implicit interests?
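The heterogeneous graph just described can be sketched with networkx; the node naming, edge attributes and example weights below are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative sketch (not the paper's exact construction) of the
# heterogeneous graph with its three types of information.
import networkx as nx

G = nx.Graph()
# User-user edge: followership relation (theory of homophily)
G.add_edge(("user", "u1"), ("user", "u2"), kind="follows")
# User-topic edge: degree of explicit interest
G.add_edge(("user", "u1"), ("topic", "z1"), kind="interest", weight=0.7)
# Topic-topic edge: relatedness (semantic, collaborative, or hybrid)
G.add_edge(("topic", "z1"), ("topic", "z2"), kind="related", weight=0.4)

# Implicit interest detection then becomes link prediction between
# "user" nodes and "topic" nodes they are not yet connected to.
```

Tagging each node with its type keeps the graph heterogeneous while still letting standard link prediction scores run over it.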
10. 10
Representation Model (User-Topic graph)
• Emerging Topic:
• z = {(c, w(c, z)) | c ∈ C}
• w(c, z) : the importance of concept c in topic z.
• The weight of each edge e_uz ∈ E_UZ:
• The degree of u’s explicit interest in topic z
• Our intuition:
• the more a user tweets about a certain topic, the more interested the
user would be in that topic.
• Occurrence Ratio of topic z in tweet m:
• e_uz is calculated by averaging the value of OR(z, m) over all tweets posted by the specific user u with regard to topic z.
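A minimal sketch of the weighting just described. The exact form of OR(z, m) is not given on the slide, so it is assumed here to be the weighted share of topic z's concepts that occur in tweet m.

```python
# Sketch of the explicit-interest weight e_uz.
# Assumption (not specified on the slide): OR(z, m) is the weighted
# fraction of topic z's concepts that appear in tweet m.

def occurrence_ratio(topic, tweet_concepts):
    """topic: dict concept -> w(c, z); tweet_concepts: set of concepts in tweet m."""
    if not tweet_concepts:
        return 0.0
    hit = sum(w for c, w in topic.items() if c in tweet_concepts)
    total = sum(topic.values())
    return hit / total if total else 0.0

def explicit_interest(topic, user_tweets):
    """e_uz: average of OR(z, m) over all tweets m posted by user u."""
    if not user_tweets:
        return 0.0
    return sum(occurrence_ratio(topic, m) for m in user_tweets) / len(user_tweets)
```

A user half of whose tweets mention a topic's concepts thus gets a weight around 0.5 for that topic, matching the intuition that more tweeting about a topic means more interest.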
11. 11
Representation Model (Topic-Topic graph)
• Topic Relatedness
1. Semantic relatedness
• Semantic relatedness of their constituent concepts
• Using a Wikipedia-based measure [Witten et al, AAAI2008]
2. Collaborative relatedness
• Based on users’ overlapping explicit contributions toward these topics
• Using collaborative filtering approach
3. Hybrid approach
• Based on both the semantic relatedness of the concepts within each
topic as well as users’ contributions towards the emerging topics
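The cited Wikipedia-based measure is commonly computed as a normalized link distance over the concepts' in-link sets; the sketch below follows that formulation, and the exact variant used in the paper is an assumption here.

```python
import math

def wikipedia_relatedness(links_a, links_b, n_articles):
    """Wikipedia link-based relatedness between two concepts.
    links_a, links_b: sets of articles linking to each concept;
    n_articles: total number of Wikipedia articles |W|."""
    inter = len(links_a & links_b)
    if inter == 0:
        return 0.0
    num = math.log(max(len(links_a), len(links_b))) - math.log(inter)
    den = math.log(n_articles) - math.log(min(len(links_a), len(links_b)))
    d = num / den  # normalized link distance; smaller means more related
    return max(0.0, 1.0 - d)
```

Two concepts whose in-link sets fully overlap score 1.0; concepts with no shared in-links score 0.0.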
12. 12
Representation Model (Topic-Topic graph)
• Collaborative relatedness
• Adopting a factored item-item collaborative filtering method [Kabbur et
al., SIGKDD2013]
• Input: a user-item rating matrix R (here, the user-topic graph information)
• P and Q (latent factors of items) can be learnt by minimizing an optimization objective
• Output: item-item similarities as the product of two low-rank matrices, P and Q (the collaborative relatedness of topics)
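As a toy illustration of item-item similarity expressed as a product of two low-rank matrices P and Q: for brevity this sketch obtains the factors in closed form from a truncated SVD of a topic co-engagement matrix, rather than by the optimization used in FISM [Kabbur et al., SIGKDD2013].

```python
import numpy as np

def factored_topic_similarity(R, k=2):
    """Topic-topic similarity as the product of two rank-k factor matrices.
    R: user-topic matrix (rows = users, columns = topics)."""
    C = (R.T @ R).astype(float)   # observed topic co-engagement counts
    np.fill_diagonal(C, 0.0)      # a topic's similarity to itself is excluded
    U, s, Vt = np.linalg.svd(C)
    P = U[:, :k] * s[:k]          # plays the role of FISM's P
    Q = Vt[:k, :].T               # plays the role of FISM's Q
    S = P @ Q.T                   # S[i, j]: collaborative relatedness of topics i, j
    np.fill_diagonal(S, 0.0)
    return S
```

Topics engaged with by the same users come out strongly related, while topics with no overlapping audience stay near zero.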
13. 13
Representation Model (Topic-Topic graph)
• Hybrid approach
• We follow the assumption of [Yu et al., 2014] to add item attribute information into the optimization problem of the factored collaborative filtering method.
• S is a matrix in which S_ii′ denotes the similarity between topic z_i and topic z_i′ based on their attributes.
• The attributes of each topic are its constituent concepts
• S_ii′ : the semantic relatedness of the two topics
14. 14
Link Prediction
• Unsupervised link prediction strategies:
• There is no single superior method among existing work and
their quality is dependent on the structure of the underlying
graph. [Liben-Nowell, J. Am. Soc. Inf. Sci., 2007]
• Adamic/Adar
• Common Neighbors
• Jaccard’s coefficient
• Katz
• SimRank
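The first three neighbourhood-based scores in this list can be sketched as follows, for an undirected graph stored as an adjacency dict:

```python
import math

# Neighbourhood-based link prediction scores for a graph given as
# an adjacency dict {node: set of neighbours}.

def common_neighbors(G, u, v):
    return len(G[u] & G[v])

def jaccard(G, u, v):
    union = G[u] | G[v]
    return len(G[u] & G[v]) / len(union) if union else 0.0

def adamic_adar(G, u, v):
    # Shared neighbours are discounted by the log of their degree,
    # so rarely-connected common neighbours count more.
    return sum(1.0 / math.log(len(G[w])) for w in G[u] & G[v] if len(G[w]) > 1)
```

Katz and SimRank, by contrast, aggregate over all paths between the two nodes rather than just shared neighbours, which is why their behaviour depends more heavily on the global graph structure.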
15. 15
Experiments
• Dataset
• Twitter dataset: 3M tweets posted by approximately 135K
users
• TAGME as a semantic annotator
• Evaluation Methodology
• leave-one-out method
• Metrics
• Area Under Receiver Operating Characteristic (AUROC) curve
• Area Under the Precision-Recall (AUPR) curve
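With scikit-learn, both metrics can be computed from held-out labels and link-prediction scores; AUPR is approximated here by average precision, and the labels and scores below are made-up examples, not the paper's data.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical example: y_true marks held-out user-topic interest pairs,
# scores are link-prediction scores for the same pairs.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.8, 0.4, 0.2, 0.5]

auroc = roc_auc_score(y_true, scores)           # area under the ROC curve
aupr = average_precision_score(y_true, scores)  # area under precision-recall
```

AUPR is usually the more informative of the two here, since true implicit-interest edges are rare relative to all candidate user-topic pairs.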
16. 16
Experiments
Seven variants of our representation model to compare
Followership information (F)
Semantic relatedness (S)
Collaborative relatedness (C)
Hybrid relatedness (CS)
22. 22
Conclusion and Future work
• Conclusion:
• We modeled the user implicit interest detection problem as a link prediction task over a graph including three types of information: followerships, users' explicit interests in emerging topics, and topic relatedness.
• We investigated the impact of these types of information on the accuracy of implicit interest detection by comparing different variants of our representation model and applying several well-known link prediction strategies.
• Future work:
• Using link prediction methods introduced for heterogeneous graphs
• Including temporal behavior of users toward topics in our model
23. 23
References
• L.M. Aiello, G. Petkos, C. Martin, D. Corney, S. Papadopoulos, R. Skraba, A. Goker, I.
Kompatsiaris, A. Jaimes, Sensing Trending Topics in Twitter, IEEE Transactions on
Multimedia, vol. 15, no. 6, pp. 1268 - 1282, 2013.
• M. Cataldi, L. Di Caro, and C. Schifanella. Emerging topic detection on twitter based
on temporal and social terms evaluation. In Proceedings of the Tenth International
Workshop on Multimedia Data Mining, MDMKDD ’10, pages 4:1–4:10, New York,
NY, USA, 2010. ACM.
• F. Zarrinkalam, H. Fani, E. Bagheri, M. Kahani, W. Du, “Semantics-enabled User
Interest Detection from Twitter”, IEEE/WIC/ACM Web Intelligence Conference,
2015.
• Abel, F., Gao, Q., Houben, G.J., Tao, K.: Analyzing user modeling on twitter for
personalized news recommendations. In: 19th International Conference on User
Modeling, Adaption and Personalization (UMAP ‘11), pp. 1-12. Springer (2011)
• Ferragina, P., Scaiella, U.: Fast and Accurate Annotation of Short Texts with
Wikipedia Pages. J. IEEE Software 29(1), pp. 70-75. IEEE (2012)
24. 24
References
• Michelson, M., Macskassy, S.A.: Discovering Users’ Topics of Interest on Twitter: A
First Look. In: 4th Workshop on Analytics for Noisy Unstructured Text Data
(AND'10), pp. 73-80 (2010)
• Abel, F., Gao, Q., Houben, G.J., Tao, K.: Semantic Enrichment of Twitter Posts for
User Profile Construction on the Social Web. In: 8th Extended Semantic Web
Conference (ESWC ’11), pp. 375-389. Springer (2011)
Kapanipathi, P., Jain, P., Venkataramani, C., Sheth, A.: User Interests Identification on
Twitter Using a Hierarchical Knowledge Base. In: 11th Extended Semantic Web
Conference (ESWC ’14), pp. 99-113. Springer (2014)
• Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know:
Inferring user profiles in online social networks. In: 3rd ACM international
conference on Web search and data mining (WSDM’10), pp. 251-260. ACM (2010)
• Wang, J., Zhao, W.X., He, Y., Li, X.: Infer User Interests via Link Structure
Regularization. ACM Transactions on Intelligent Systems and Technology (TIST) -
Special Issue on Linking Social Granularity and Functions 5(2), ACM (2014)
25. 25
References
• Santosh Kabbur, Xia Ning, George Karypis, FISM: Factored item similarity models for
top-N recommender systems, Proceedings of the 19th ACM SIGKDD international
conference on Knowledge discovery and data mining, pp. 659-667, 2013.
• Yu, Y., Wang, C., Gao, Y.: Attributes Coupling Based Item Enhanced Matrix Factorization Technique for Recommender Systems. arXiv preprint arXiv:1405.0770 (2014)
• Liben-Nowell, D., Kleinberg, J.: The Link-Prediction Problem for Social Networks. Journal of the American Society for Information Science and Technology 58, pp. 1019-1031. doi:10.1002/asi.20591 (2007)
• Cheng, X., Yan, X., Lan, Y., Guo, J.: BTM: Topic Modeling over Short Texts. IEEE Transactions on Knowledge and Data Engineering 26(12), pp. 2928-2941. IEEE (2014)
• Bhattacharya, P., Zafar, M.B., Ganguly, N., Ghosh, S., Gummadi, K.P.: Inferring User Interests in the Twitter Social Network. In: 8th ACM Conference on Recommender Systems (RecSys '14), pp. 357-360. ACM (2014)
My presentation is about inferring users' implicit interests from Twitter.
As you know, there is a huge amount of user-generated content on the web, so users want to receive only the information that is related to their interests.
Therefore, the main step in all personalization and recommender systems, such as news recommendation, is user interest detection.
The works in this field can be divided into three categories in terms of how they represent user interests:
Bag of words: each user interest is represented as a term extracted from the user's content.
Topic modeling approaches such as LDA, which implicitly use co-occurrence patterns of terms, do not perform well on tweets, which are short, noisy, and informal.
Generally, both of these approaches are term-based, so they do not consider the underlying semantics of the tweets.
There is another line of work that uses concepts defined in external knowledge bases such as DBpedia to represent user interests. These works use different existing semantic annotators, such as Zemanta, TagMe, and OpenCalais.
For example, in this slide you can see the results of TagMe for this real tweet: Arsenal is annotated with Arsenal F.C. and Spurs with Tottenham Hotspur F.C.
These works have some limitations; the most important one is that they view each topic as a single concept, so they cannot capture the specific interests of users. So in our previous work, published at WI 2015, we view each topic as a combination of several concepts that are temporally correlated on Twitter.
Independent of how they represent user interests, most previous works have focused on extracting explicit interests by analyzing only the users' textual content.
However, little is known about detecting implicit interests. By implicit interests, we mean topics that the user has never explicitly engaged with but might be interested in.
The few works that do address implicit interests only consider the homophily theory or a predefined relatedness between topics.
Based on this theory, users tend to connect to users with common interests or preferences.
In this paper, a graph-based link prediction schema is proposed to infer users' implicit interests towards emerging topics on Twitter. The underlying graph of this schema uses three types of information: users' followerships, users' explicit interests in the topics, and the relatedness of the topics.
We propose a comprehensive graph-based representation model that includes these three types of information. This heterogeneous graph is composed of three subgraphs: a user graph, which is unweighted and directed and represents the followership relations between users on Twitter;
a topic graph, which shows the potential relationships between the detected emerging topics in ℤ; and finally a user-topic graph, which represents the explicit interests of users.
Here, in line with our previous work, we view each topic as a set of weighted concepts extracted from Wikipedia.
Our intuition for calculating the value of each user's explicit interest in each topic is that the more the user tweets about a topic, the more interested she is in it. So we first calculate the relatedness of each of her tweets to that topic and then average over them,
where (c, m) is 1 if tweet m is annotated with concept c, and 0 otherwise.
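As a rough illustration of this averaging step, the following sketch (not the paper's code; the topic weights and tweets are made-up toy values) treats a topic as a dict of concept weights and each tweet as the set of Wikipedia concepts TagMe annotated it with:

```python
# Minimal sketch of the explicit-interest weight described above.
# Assumptions (not from the slides): a topic is a dict {concept: weight}
# and each tweet is the set of Wikipedia concepts it was annotated with.

def tweet_topic_relatedness(tweet_concepts, topic):
    # (c, m) = 1 if tweet m is annotated with concept c, else 0, so the
    # relatedness is the sum of the topic weights of the concepts that
    # actually occur in the tweet.
    return sum(w for c, w in topic.items() if c in tweet_concepts)

def explicit_interest(user_tweets, topic):
    # Average the per-tweet relatedness over all of the user's tweets.
    if not user_tweets:
        return 0.0
    return sum(tweet_topic_relatedness(t, topic) for t in user_tweets) / len(user_tweets)

topic = {"Arsenal F.C.": 0.6, "Tottenham Hotspur F.C.": 0.4}
tweets = [{"Arsenal F.C."}, {"Tottenham Hotspur F.C.", "Arsenal F.C."}, {"Barack Obama"}]
print(explicit_interest(tweets, topic))  # (0.6 + 1.0 + 0.0) / 3
```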
We use three approaches to compute the relatedness between our topics.
The first one is semantic relatedness. Based on this approach, two topics are considered similar if their concepts are semantically similar. Since the concepts are Wikipedia concepts, we utilize an existing Wikipedia-based relatedness measure to compute the relatedness between topics.
In the collaborative relatedness approach, the relatedness of two topics is determined by a collaborative filtering strategy over the explicit interests of users.
The hybrid approach is based on both semantic and collaborative relatedness.
The semantic relatedness of two emerging topics can be calculated as the average pairwise semantic relatedness between the concepts of the two topics, using a Wikipedia-based relatedness measure.
In our experiments, we use WLM [22], which computes the relatedness of two Wikipedia concepts through link-structure analysis.
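For reference, the WLM measure and the pairwise averaging over topic concepts can be sketched as follows; the link sets and article count here are toy values, not real Wikipedia data:

```python
import math

# Sketch of WLM: relatedness of two Wikipedia concepts from the sets of
# articles that link to them, out of `total_articles` articles in total.
# relatedness = 1 - (log max(|A|,|B|) - log |A∩B|) / (log |W| - log min(|A|,|B|))

def wlm_relatedness(links_a, links_b, total_articles):
    inter = links_a & links_b
    if not inter:
        return 0.0
    num = math.log(max(len(links_a), len(links_b))) - math.log(len(inter))
    den = math.log(total_articles) - math.log(min(len(links_a), len(links_b)))
    return max(0.0, 1.0 - num / den)

def topic_semantic_relatedness(concepts_a, concepts_b, links, total_articles):
    # Average pairwise WLM relatedness between the concepts of two topics.
    pairs = [(x, y) for x in concepts_a for y in concepts_b]
    return sum(wlm_relatedness(links[x], links[y], total_articles) for x, y in pairs) / len(pairs)

a = {1, 2, 3, 4}   # toy set of article ids linking to concept a
b = {3, 4, 5}      # toy set of article ids linking to concept b
links = {"Arsenal F.C.": a, "Tottenham Hotspur F.C.": b}
print(topic_semantic_relatedness(["Arsenal F.C."], ["Tottenham Hotspur F.C."], links, 1000))
```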
Given a user-topic graph GUℤ, we regard the problem of computing the collaborative relatedness of topics as an instance of a model-based collaborative filtering problem.
For the collaborative relatedness approach, we adopt an existing factored item-item collaborative filtering method (FISM), presented at the SIGKDD conference.
It takes a user-topic rating matrix as input,
solves the optimization problem shown on the slide, and learns the latent factors of the items, P and Q. Finally, the collaborative relatedness of topics can be computed as the product of these two matrices.
In our work, each item is a topic, and we build the user-item rating matrix based on the explicit interests of users in topics.
where R_u+ is the set of topics that user u is interested in, p_j and q_i are the learned topic latent factors, n_u+ is the number of topics that user u is interested in, and α is a user-specified parameter between 0 and 1. According to [24], the matrices P and Q can be learned by minimizing a regularized optimization problem:
where the vectors b_u and b_i correspond to the biases of user u and topic z_i, respectively.
The optimization problem can be solved using Stochastic Gradient Descent to learn the two matrices P and Q. Given P and Q as the latent factors of the topics, the collaborative relatedness of two topics z_i and z_j is computed as the dot product of the corresponding factors from P and Q, i.e., p_i and q_j.
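A toy end-to-end sketch of this FISM-style model (not the authors' implementation; the data, dimensions, and hyperparameters are invented, and the interest matrix is binarized for simplicity where the paper uses the explicit interest weights):

```python
import random

# FISM-style estimate: r_ui = b_u + b_i + |R_u+ \ {i}|^(-alpha) * sum_{j in R_u+ \ {i}} p_j . q_i
# After training, the collaborative relatedness of topics z_i, z_j is p_i . q_j.

random.seed(0)
n_users, n_topics, k = 4, 5, 3
R = [{0, 1}, {0, 2}, {1, 3}, {2, 4}]   # R_u+: topics each user is explicitly interested in
P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
bu = [0.0] * n_users                   # user biases
bi = [0.0] * n_topics                  # topic biases
alpha, lr, reg = 0.5, 0.05, 0.01

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def predict(u, i):
    others = R[u] - {i}
    if not others:
        return bu[u] + bi[i]
    agg = [sum(P[j][f] for j in others) for f in range(k)]
    return bu[u] + bi[i] + len(others) ** (-alpha) * dot(agg, Q[i])

# Plain SGD over every (user, topic) cell of the binary interest matrix.
for _ in range(200):
    for u in range(n_users):
        for i in range(n_topics):
            r = 1.0 if i in R[u] else 0.0
            e = r - predict(u, i)
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            others = R[u] - {i}
            if others:
                c = len(others) ** (-alpha)
                agg = [sum(P[j][f] for j in others) for f in range(k)]
                for f in range(k):
                    qf = Q[i][f]
                    Q[i][f] += lr * (e * c * agg[f] - reg * qf)
                    for j in others:
                        P[j][f] += lr * (e * c * qf - reg * P[j][f])

def collab_relatedness(i, j):
    # Collaborative relatedness of topics z_i and z_j.
    return dot(P[i], Q[j])

print(round(collab_relatedness(0, 1), 3))
```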
Here, we follow the approach of the paper published in the TKDE journal [IEEE Transactions on Knowledge and Data Engineering] to add item attribute information to the optimization problem of the factored item-item collaborative filtering method. By adding this term to the optimization problem from the previous slide, two topic latent vectors are considered similar if the topics are similar according to their attribute information. In our work, the attributes are the concepts of each topic, and S is calculated by measuring the semantic relatedness of topics.
In this term, p and q are the latent factors of the topics, and the matrix S denotes the similarity between topics based on their attributes.
where λ is a parameter that controls the impact of the topic concept information, and S is a matrix in which S_ii' denotes the similarity between topics z_i and z_i' based on their attributes. In our proposed approach, the attributes of each topic are its constituent concepts, and S_ii' is calculated by measuring the semantic relatedness of the two topics, as introduced earlier.
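One common way to realize such an attribute-coupling term, shown purely for illustration (the paper's exact formulation may differ), is a penalty that pulls together the latent vectors of topics with high attribute similarity S_ii':

```python
# Hypothetical attribute-coupling penalty added to the factorization loss:
#   sum_{i,i'} S[i][i'] * ||q_i - q_{i'}||^2
# Q and S below are toy values, not learned or measured quantities.

def coupling_penalty(Q, S):
    n = len(Q)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Topics with high similarity S[i][j] pay a large price
            # for having distant latent vectors.
            total += S[i][j] * sum((a - b) ** 2 for a, b in zip(Q[i], Q[j]))
    return total

Q = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
S = [[0, 0.9, 0.1], [0.9, 0, 0.1], [0.1, 0.1, 0]]
print(coupling_penalty(Q, S))
```

Minimizing this term alongside the FISM objective is what makes semantically similar topics end up with similar latent factors.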
Following the TKDE paper, two item latent feature vectors are considered similar if they are similar according to their attribute information.
After building the representation model, our goal is to find the missing links of the user-topic graph by adopting an unsupervised link prediction strategy. Because no single approach is known to be superior, we apply several well-known link prediction methods.
The first three methods are neighborhood-based; Katz and SimRank are path-based methods.
Vertex neighborhood methods are based on the idea that two vertices x and y are more likely to be linked if they have many common neighbors. Path-based methods consider the ensemble of all paths between two vertices.
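As a concrete illustration, the neighborhood-based scores can be sketched on a toy adjacency dict (the actual graph in the paper is the heterogeneous user-topic graph):

```python
import math

# Neighborhood-based link-prediction scores on a toy undirected graph,
# given as {vertex: set of neighbors}. The vertices and edges are invented.

graph = {
    "u1": {"z1", "z2"},
    "u2": {"z1", "z3"},
    "z1": {"u1", "u2"},
    "z2": {"u1"},
    "z3": {"u2"},
}

def common_neighbors(g, x, y):
    return len(g[x] & g[y])

def jaccard(g, x, y):
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0

def adamic_adar(g, x, y):
    # Common neighbors weighted by the inverse log of their degree:
    # rare shared neighbors count more than hubs.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)

print(common_neighbors(graph, "u1", "u2"))  # both neighbor z1
```

In the unsupervised setting, each candidate missing edge is simply ranked by one of these scores.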
For our experiments, we use a publicly available Twitter dataset of about 3 million tweets sampled over two months of 2010. Further, we utilize TagMe to annotate the tweets with Wikipedia concepts.
As our evaluation strategy, we use the leave-one-out method. Each time, we pick one edge of the user-topic graph for testing and use the rest of the representation model as the training set. We repeat this procedure for all pairs.
We use two metrics to evaluate the results against the test set: the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall curve (AUPR).
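AUROC, for instance, can be computed directly from its rank-statistic interpretation: the probability that a randomly chosen positive (held-out) edge scores higher than a randomly chosen non-edge. A minimal sketch with toy scores (not the paper's results):

```python
# AUROC via the rank statistic: fraction of (positive, negative) pairs
# where the positive edge is scored higher; ties count as 0.5.

def auroc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(auroc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5 of 6 pairs ranked correctly
```

A random predictor scores about 0.5 on this metric, which is exactly how SimRank behaves in our results.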
Because we want to investigate the impact of the different types of information on the accuracy of implicit interest detection from Twitter, we consider seven variants of our representation model for comparison.
For example, the variant named F only uses followership information in addition to the users' explicit interests in the representation model.
The final results, in terms of the two evaluation metrics and across the different link prediction strategies, are reported in this table.
As illustrated in this table, the SimRank link prediction method does not show good performance on any of the variants. Based on our results, it actually acts as a random predictor, because its AUROC value is about 0.5 for most of the models. So we ignore its results when analyzing the influence of the different types of information in our representation model.
As a first observation, all three models that use topic relationships (C, S, and CS) noticeably outperform model F in terms of both metrics. This means that considering the relationships between the topics considerably improves the accuracy of inferring implicit interests compared to using only followership information.
By comparing S, C, and CS among themselves, it can be observed that using semantic relatedness yields higher accuracy than the other two. This is an interesting observation: users tend to be interested in topics that are similar to the topics they already engage with. For instance, the two topics z1 = {Chelsea F.C., Arsenal F.C.} and z2 = {FC Barcelona, Real Madrid C.F.} are the most semantically similar topics in our data. It is reasonable to infer that a user who is explicitly interested in one of these derbies is probably also interested in the other one.
By comparing C and CS in this table, it can also be concluded that adding semantic relatedness to the collaborative relatedness measure improves accuracy.
The observation that S is the best is even more interesting when we compare the computational complexity of these methods: computing C and CS requires solving an optimization problem through Stochastic Gradient Descent, which is expensive compared to S.
As another observation, model SF adds the followership information to S. Based on the results, no uniform improvement can be observed in any of the cases.
In other words, the followership information does not seem to have a noticeable impact on the results, so through our experiments we were not able to show the impact of the homophily theory.
We can also summarize all the previous observations by comparing their curves.
To conclude, we have modeled the identification of users' implicit interests as a link prediction task over a heterogeneous graph that includes three kinds of information.
To investigate the impact of the different types of information, we compared different variants of our representation model and concluded that considering the relationships between topics considerably outperforms the method that only uses followership information. Further, among the topic relatedness methods, semantic relatedness is the best.
In summary, model S, which relies solely on the semantic relatedness of topics and users' explicit contributions to these topics, shows the best performance across all seven variants. Model SF shows the same performance as S: the additional followership information does not seem to have impacted the final results.