The document describes a proposed approach for inferring implicit topical interests of users on Twitter. It discusses related work on detecting user interests from social media using bag-of-words, topic modeling, and bag-of-concepts approaches. The proposed approach models user interests as a graph-based link prediction problem over a heterogeneous graph incorporating user followerships, explicit interests, and topic relatedness. It evaluates different variants of the model and finds semantic relatedness of topics to be most effective for identifying implicit user interests.
Tag recommendation is a very useful technique for detecting the type of a message; for example, Gmail divides its inbox into the tabs Primary, Social and Promotions, alongside labels such as Spam and Important. It can also be applied in other settings, such as social bookmarking and search engines.
Summary slides of "Unsupervised Model for Topic Viewpoint Discovery in Online Debates Leveraging Author Interactions", published at ICWSM 2018, from Amine Trabelsi and Osmar R. Zaïane.
Recommender systems are knowledge-based systems which support human decision-making. In an era of overwhelming choice, they help us decide which
products, services and information to consume. The focus of attention in recommender systems research and development has been on making recommendations to individual consumers. This places the focus on the easier case, but ignores the fact that it is as common, if not more common, for us to consume items in groups such as couples, families and parties of friends. The choice of a date movie, a family holiday destination, or a restaurant for a celebration meal all require the balancing of the preferences of multiple consumers.
Summary of a Recommender Systems Survey paper, by Changsung Moon
This is the summary of the following paper:
J. Bobadilla, F. Ortega, A. Hernando and A. Gutierrez, “Recommender Systems Survey,” Knowledge-Based Systems, Vol. 46, 2013, pp. 109-132.
Recommender systems aim to predict the content that a user would like based on observations of users' online behaviour. Research in the Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models, to how to build effective recommender systems (note: last Friday, we won the ACM RecSys 2013 News Recommender Systems challenge). We would like to develop a general methodology to diagnose weaknesses and strengths of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours) that will provide the basis for the recommendation to be made. The purpose is to shed light on why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood as well as properties of the nearest-neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.
Textual information exchanged among users on online social network platforms provides deep insight into users' interests and behavioral patterns. However, unlike traditional text-dominant settings such as online publishing, one distinct feature of online social networks is users' rich interactions with the textual content, which, unfortunately, have not yet been well incorporated in existing topic modeling frameworks.
In this paper, we propose an LDA-based behavior-topic
model (B-LDA) which jointly models user topic interests and behavioral patterns. We focus the study of the model on online social network settings such as microblogs like Twitter, where the textual content is relatively short but user interactions on it are rich. We conduct experiments on real Twitter data to demonstrate that the topics obtained by our model are both informative and insightful. As an application of our B-LDA model, we also propose a Twitter followee recommendation algorithm combining B-LDA and LDA, which we show in a quantitative experiment outperforms LDA by a significant margin.
The increase in the amount of structured data published using the principles of Linked Data means that it is now more likely to find resources on the Web of Data that describe real-life concepts. However, discovering resources related to any given resource is still an open research area. This thesis studies recommender systems that use Linked Data as a source for generating recommendations, exploiting the large number of available resources and the relationships between them. Accordingly, a framework named AlLied to execute recommendation algorithms is proposed. This framework can be used as the main component for recommendations in a given architecture because it allows application developers to execute and evaluate recommendation algorithms in different contexts. Two implementations of this framework are presented and compared. The first one relies on graph-based algorithms and the second one on machine learning algorithms. Finally, a new recommendation algorithm that adapts dynamically to the linking features of the datasets used is also proposed.
Recommender systems are software tools and techniques providing suggestions for items that are likely to be of interest to a user. In recent years, recommender systems have proved to be a valuable means of helping Web users by providing useful and effective recommendations or suggestions.
This slide deck provides a survey of research in the field of coreference resolution. We survey 10 significant research papers, provide a detailed description of the problem, and suggest future research directions.
Recommender systems analyze patterns of user interest in
products to provide personalized recommendations. They seek to predict the rating or preference that a user would
give to an item. Some of the most successful realizations of latent factor models are based on matrix factorization...
Content-Based Social Recommendation with Poisson Matrix Factorization (ECML-P..., by Eliezer Silva
Presentation from ECML-PKDD 2017 of joint work with my supervisors Helge Langseth and Heri Ramampiaro, about a Poisson factorization model for recommendations that integrates topic models in the items set and social connections between users.
code: https://github.com/zehsilva/poissonmf_cs
preprint version: https://inajourney.files.wordpress.com/2012/11/poissonmf_cs.pdf
In social networks, where users send messages to each other, the issue of what triggers communication between unrelated users arises: does communication between previously unrelated users depend on friend-of-a-friend type of relationships, common interests, or other factors? In this work, we study the problem of predicting directed communication
intention between two users. Link prediction is similar to communication intention in that it uses network structure for prediction. However, these two problems exhibit fundamental
differences that originate from their focus. Link prediction uses evidence to predict network structure evolution, whereas our focal point is directed communication initiation between
users who are previously not structurally connected. To address this problem, we employ topological evidence in conjunction with transactional information in order to predict communication intention. It is not intuitive whether methods that work well for link prediction would work well in this case. In fact, we show in this work that network or content evidence, when considered separately, is not a sufficiently accurate predictor. Our novel approach, which jointly considers local structural properties of users in a social network in conjunction with their generated content, captures numerous interactions, direct and indirect, social and contextual, which have to date been considered independently. We performed an empirical study to evaluate our method using an extracted network of directed @-messages sent between users of a corporate microblogging service which resembles Twitter. We find that our method outperforms state-of-the-art techniques for link prediction. Our findings have implications for a wide range of social web applications, such as contextual expert recommendation for Q&A, creation of new friendship relationships, and targeted content delivery.
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ..., by GUANGYUAN PIAO
In this paper, we propose user modeling strategies which use Concept Frequency-Inverse Document Frequency (CF-IDF) as a weighting scheme and incorporate either or both of the dynamics and semantics of user interests. To this end, we first provide a comparative study of different user modeling strategies from previous literature that consider the dynamics of user interests, to present their comparative performance. In addition, we investigate different types of information (i.e., categories, classes and entities connected via various properties) for entities from DBpedia, and combinations of them, for extending user interest profiles. Finally, we build our user modeling strategies incorporating either or both of the best-performing methods in each dimension. Results show that our strategies significantly outperform two baseline strategies in the context of link recommendations on Twitter.
Data Mining In Social Networks Using K-Means Clustering Algorithm, by nishant24894
This topic deals with the K-Means clustering algorithm, which is used to categorize a data set into clusters depending upon similarities such as common interests, organizations or colleges. It categorizes the data into clusters on the basis of mutual friendship.
On Joint Modeling of Topical Communities and Personal Interest in Microblogs, by PC LO
Hoang, T. A., & Lim, E. P. (2014, November). On joint modeling of topical communities and personal interest in microblogs. In International Conference on Social Informatics (pp. 1-16). Springer, Cham.
Evolving Swings (topics) from Social Streams using Probability Model, by IJERA Editor
Evolving swings from social streams is receiving renewed interest, motivated by the growth of social media and social streams. Non-conventional approaches that include text, images, URLs and videos can be appropriate. The focus is on evolving topics via the social aspects of the networks and the mention links between users, which are generated intentionally or unintentionally through replies, mentions and retweets. A probability model of the mentioning behavior is proposed, and the proposed model detects the evolving topic from the anomalies measured. Several experiments show that mention-anomaly-based approaches detect the evolving swing as early as text-anomaly-based approaches.
Analyzing User Modeling on Twitter for Personalized News Recommendations, by GUANGYUAN PIAO
Presentation for a reading group, 30/09/2015. Check out my recent work on user modeling at http://parklize.github.io/#research, which is motivated by this presentation.
Despite being controversial, research metrics are becoming a key component of research evaluation processes globally. Nevertheless, accessing research metrics to support these processes in a timely manner is not a straightforward task, as it requires either having access to expensive commercial solutions such as Elsevier SciVal or Clarivate Analytics' InCites, or having substantial knowledge of existing APIs and data sources as well as the ability and skills needed to analyse large amounts of raw scholarly data in-house. This is especially the case on a department or institutional level where large amounts of data have to be aggregated prior to analysis. To alleviate this problem we have designed and prototyped CORE Analytics Dashboard – a tool for analytical evaluation of research outputs of universities. The aim of the CORE Analytics Dashboard is to help universities analyse their performance using a variety of metrics captured from openly available data sources, including citation counts and social media metrics, and to help them compare their performance with other institutions. This paper presents the motivation behind developing this dashboard and its main features.
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU..., by Denis Parra Santander
- First version was a guest lecture about Network Visualization in the class "Data Visualization" taught by Dr. Sharon Hsiao in the QMSS program at Columbia University http://www.columbia.edu/~ih2240/dataviz/index.htm
- This updated version was delivered in our class on SNA at PUC Chile in the MPGI master program.
In this talk we shall introduce the main ideas of TruSIS (Trust in Social Internetworking Systems), a Marie Curie Fellowship financed by the European Union and hosted at VU University, Department of Computer Science, Business and Web group. The goal of TruSIS is to study the behaviour of users who affiliate with multiple social networking sites and are active in them (e.g., users may publish personal profiles on sites like MySpace and post videos on sites like YouTube). We briefly refer to this scenario as a SIS (Social Internetworking System).
As a first research contribution, we implemented a crawler to gather data about users and link their profiles on multiple social networking websites. To this purpose we used the Google Social Graph API, a powerful API released by Google in 2008. We obtained a sample of about 1.3 million user accounts and 36 million connections between them.
Parameters from social network theory (like average clustering coefficient, network modularity and so on) were used to study the structural properties of the gathered sample and how these properties depend on user behaviour.
A second contribution concerns the computation of the distance between two users in a SIS on the basis of their social ties. We used a popular parameter from social network theory known as the Katz coefficient, and provide a computationally efficient approach to computing it which relies on a popular tool from linear algebra known as the Sherman-Morrison formula.
Finally, we shall describe our work on extending the notion of trust from single social networks to a SIS. We describe the main research challenges tied to the definition of trust and how they relate to Semantic Web technologies.
A Method for Detecting Behavior-Based User Profiles in Collaborative Ontology..., by Sven Van Laere
Ontology engineering is far from trivial, and most collaborative methods and tools start from a predefined set of roles that stakeholders can have in the ontology engineering process. We, however, believe that the different types of user behavior are not known a priori and depend on the ontology engineering project. The detection of such user profiles based on unsupervised learning allows finding roles and responsibilities among peers in a collaborative setting. In this paper, we present a method for automatic detection of user profiles in a collaborative ontology engineering environment by means of the K-means clustering algorithm, looking only at the type of interactions a user makes. We use the GOSPL ontology engineering tool and method to demonstrate this method. The data used to demonstrate the method stems from two ontology engineering projects involving respectively 42 and 36 users.
For my final year project I used data analysis techniques to investigate user behavior pattern recognition with respect to similar interests and culture versus offline geographical location. This was an out-of-the-box topic, which I selected due to my love of data analysis, in particular social network analysis in the Internet era.
1. Laboratory of Systems, Software and Semantics (LS3)
Ryerson University, Canada
Inferring Implicit Topical Interests on Twitter
Fattane Zarrinkalam
Hossein Fani
Ebrahim Bagheri
Mohsen Kahani
3. 3
Introduction
• Due to the increasing growth of user-generated content on the web, it is desirable for users to receive only information that is related to their interests.
• Personalization and recommender systems
• Social networks like Twitter enable users to freely communicate with each other and share recent news, ongoing activities or views about different topics.
• They can be seen as a viable source of information about the users
and their interests
User interest detection from Twitter
4. 4
Related Work
• Bag of Words approach
• It suffers from known problems in natural language processing, such as polysemy and synonymy
• Topic modeling approach (e.g. LDA)
• Sparsity problem
• Tweets are short, noisy and informal (limited to 140 characters)
• The number of topics in LDA is assumed to be fixed
• They don’t consider the underlying semantics of the phrases
5. 5
Related Work
• Bag of Concepts approach
• Usually, external knowledge bases such as DBpedia, Freebase and
Yago are used as a source for extracting concepts.
6. 6
Related Work
Limitations of bag of concepts approach
An interest is often modeled using one single concept:
• They cannot infer that a user is interested in a more specific topic, which is actually a combination of multiple related concepts.
Interests are confined to a set of predefined concepts:
• Interest in recent events that are not among that set cannot be discovered on the fly.
• [Zarrinkalam et al., WI2015] We view each topic of interest as a
conjunction of several semantic concepts which are temporally
correlated on Twitter.
• Topic of interest: {Premier League, Arsenal F.C., Tottenham Hotspur
F.C., Arsène Wenger}
• represents rivalry between Spurs and Arsenal
7. 7
Related Work
• Many previous works relate to explicit interest detection:
• Interests that are directly derivable from a user’s tweets
• Little is known about detecting implicit interests: topics that the user never explicitly engaged with but might have an interest in.
• Homophily theory
• Semantic relatedness between topics
• They view each topic as a single concept,
• the relationship between two topics is predefined in the external
knowledge base.
8. 8
Proposed Approach
• The main objective of our work:
• Determining implicit interests of users over the emerging topics on Twitter
• Our Model:
• A graph-based link prediction schema that operates over a
heterogeneous graph which uses three types of information:
• Users’ Explicit interest profile
• Theory of Homophily (user followership relations)
• Relationship between emerging topics
Which of these three types of information, or which combination of them, is most effective in allowing us to accurately identify a user’s implicit interests?
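The heterogeneous graph just described can be sketched with networkx; the node naming, edge attributes and example weights below are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative sketch (not the paper's exact construction) of the
# heterogeneous graph with its three types of information.
import networkx as nx

G = nx.Graph()
# User-user edge: followership relation (theory of homophily)
G.add_edge(("user", "u1"), ("user", "u2"), kind="follows")
# User-topic edge: degree of explicit interest
G.add_edge(("user", "u1"), ("topic", "z1"), kind="interest", weight=0.7)
# Topic-topic edge: relatedness (semantic, collaborative, or hybrid)
G.add_edge(("topic", "z1"), ("topic", "z2"), kind="related", weight=0.4)

# Implicit interest detection then becomes link prediction between
# "user" nodes and "topic" nodes they are not yet connected to.
```

Tagging each node with its type keeps the graph heterogeneous while still letting standard link prediction scores run over it.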
10. 10
Representation Model (User-Topic graph)
• Emerging Topic:
• z = {(c, w(c, z)) | c ∈ C}
• w(c, z) : the importance of concept c in topic z.
• The weight of each edge e_uz ∈ E_UZ:
• The degree of u’s explicit interest in topic z
• Our intuition:
• the more a user tweets about a certain topic, the more interested the
user would be in that topic.
• Occurrence Ratio of topic z in tweet m:
• e_uz is calculated by averaging the value of OR(z, m) over all tweets posted by the specific user u with regard to topic z.
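A minimal sketch of the weighting just described. The exact form of OR(z, m) is not given on the slide, so it is assumed here to be the weighted share of topic z's concepts that occur in tweet m.

```python
# Sketch of the explicit-interest weight e_uz.
# Assumption (not specified on the slide): OR(z, m) is the weighted
# fraction of topic z's concepts that appear in tweet m.

def occurrence_ratio(topic, tweet_concepts):
    """topic: dict concept -> w(c, z); tweet_concepts: set of concepts in tweet m."""
    if not tweet_concepts:
        return 0.0
    hit = sum(w for c, w in topic.items() if c in tweet_concepts)
    total = sum(topic.values())
    return hit / total if total else 0.0

def explicit_interest(topic, user_tweets):
    """e_uz: average of OR(z, m) over all tweets m posted by user u."""
    if not user_tweets:
        return 0.0
    return sum(occurrence_ratio(topic, m) for m in user_tweets) / len(user_tweets)
```

A user half of whose tweets mention a topic's concepts thus gets a weight around 0.5 for that topic, matching the intuition that more tweeting about a topic means more interest.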
11. 11
Representation Model (Topic-Topic graph)
• Topic Relatedness
1. Semantic relatedness
• Semantic relatedness of their constituent concepts
• Using a Wikipedia-based measure [Witten et al, AAAI2008]
2. Collaborative relatedness
• Based on users’ overlapping explicit contributions toward these topics
• Using collaborative filtering approach
3. Hybrid approach
• Based on both the semantic relatedness of the concepts within each
topic as well as users’ contributions towards the emerging topics
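The cited Wikipedia-based measure is commonly computed as a normalized link distance over the concepts' in-link sets; the sketch below follows that formulation, and the exact variant used in the paper is an assumption here.

```python
import math

def wikipedia_relatedness(links_a, links_b, n_articles):
    """Wikipedia link-based relatedness between two concepts.
    links_a, links_b: sets of articles linking to each concept;
    n_articles: total number of Wikipedia articles |W|."""
    inter = len(links_a & links_b)
    if inter == 0:
        return 0.0
    num = math.log(max(len(links_a), len(links_b))) - math.log(inter)
    den = math.log(n_articles) - math.log(min(len(links_a), len(links_b)))
    d = num / den  # normalized link distance; smaller means more related
    return max(0.0, 1.0 - d)
```

Two concepts whose in-link sets fully overlap score 1.0; concepts with no shared in-links score 0.0.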
12. 12
Representation Model (Topic-Topic graph)
• Collaborative relatedness
• Adopting a factored item-item collaborative filtering method [Kabbur et
al., SIGKDD2013]
• Input: a user-item rating matrix R (here, the user-topic graph information)
• P and Q (latent factors of items) can be learnt by minimizing an optimization objective
• Output: item-item similarities as the product of two low-rank matrices, P and Q (the collaborative relatedness of topics)
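As a toy illustration of item-item similarity expressed as a product of two low-rank matrices P and Q: for brevity this sketch obtains the factors in closed form from a truncated SVD of a topic co-engagement matrix, rather than by the optimization used in FISM [Kabbur et al., SIGKDD2013].

```python
import numpy as np

def factored_topic_similarity(R, k=2):
    """Topic-topic similarity as the product of two rank-k factor matrices.
    R: user-topic matrix (rows = users, columns = topics)."""
    C = (R.T @ R).astype(float)   # observed topic co-engagement counts
    np.fill_diagonal(C, 0.0)      # a topic's similarity to itself is excluded
    U, s, Vt = np.linalg.svd(C)
    P = U[:, :k] * s[:k]          # plays the role of FISM's P
    Q = Vt[:k, :].T               # plays the role of FISM's Q
    S = P @ Q.T                   # S[i, j]: collaborative relatedness of topics i, j
    np.fill_diagonal(S, 0.0)
    return S
```

Topics engaged with by the same users come out strongly related, while topics with no overlapping audience stay near zero.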
13. 13
Representation Model (Topic-Topic graph)
• Hybrid approach
• We follow the assumption of [Yu et al., 2014] to add item attribute information into the optimization problem of the factored collaborative filtering method.
• S is a matrix in which S_ii′ denotes the similarity between topic z_i and topic z_i′ based on their attributes.
• The attributes of each topic are its constituent concepts
• S_ii′ : the semantic relatedness of the two topics
14. 14
Link Prediction
• Unsupervised link prediction strategies:
• There is no single superior method among existing work and
their quality is dependent on the structure of the underlying
graph. [Liben-Nowell, J. Am. Soc. Inf. Sci., 2007]
• Adamic/Adar
• Common Neighbors
• Jaccard’s coefficient
• Katz
• SimRank
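The first three neighbourhood-based scores in this list can be sketched as follows, for an undirected graph stored as an adjacency dict:

```python
import math

# Neighbourhood-based link prediction scores for a graph given as
# an adjacency dict {node: set of neighbours}.

def common_neighbors(G, u, v):
    return len(G[u] & G[v])

def jaccard(G, u, v):
    union = G[u] | G[v]
    return len(G[u] & G[v]) / len(union) if union else 0.0

def adamic_adar(G, u, v):
    # Shared neighbours are discounted by the log of their degree,
    # so rarely-connected common neighbours count more.
    return sum(1.0 / math.log(len(G[w])) for w in G[u] & G[v] if len(G[w]) > 1)
```

Katz and SimRank, by contrast, aggregate over all paths between the two nodes rather than just shared neighbours, which is why their behaviour depends more heavily on the global graph structure.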
15. 15
Experiments
• Dataset
• Twitter dataset: 3M tweets posted by approximately 135K
users
• TAGME as a semantic annotator
• Evaluation Methodology
• leave-one-out method
• Metrics
• Area Under Receiver Operating Characteristic (AUROC) curve
• Area Under the Precision-Recall (AUPR) curve
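With scikit-learn, both metrics can be computed from held-out labels and link-prediction scores; AUPR is approximated here by average precision, and the labels and scores below are made-up examples, not the paper's data.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical example: y_true marks held-out user-topic interest pairs,
# scores are link-prediction scores for the same pairs.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.8, 0.4, 0.2, 0.5]

auroc = roc_auc_score(y_true, scores)           # area under the ROC curve
aupr = average_precision_score(y_true, scores)  # area under precision-recall
```

AUPR is usually the more informative of the two here, since true implicit-interest edges are rare relative to all candidate user-topic pairs.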
16. 16
Experiments
Seven variants of our representation model to compare
Followership information (F)
Semantic relatedness (S)
Collaborative relatedness (C)
Hybrid relatedness (CS)
22. 22
Conclusion and Future work
• Conclusion:
• We modeled the user implicit interest detection problem as a link prediction task over a graph including three types of information: followerships, users' explicit interests in emerging topics, and topic relatedness.
• We investigated the impact of these types of information on the accuracy of implicit interest detection by comparing different variants of our representation model and applying several well-known link prediction strategies.
• Future work:
• Using link prediction methods introduced for heterogeneous graphs
• Including temporal behavior of users toward topics in our model
23. 23
References
• L.M. Aiello, G. Petkos, C. Martin, D. Corney, S. Papadopoulos, R. Skraba, A. Goker, I.
Kompatsiaris, A. Jaimes, Sensing Trending Topics in Twitter, IEEE Transactions on
Multimedia, vol. 15, no. 6, pp. 1268 - 1282, 2013.
• M. Cataldi, L. Di Caro, and C. Schifanella. Emerging topic detection on twitter based
on temporal and social terms evaluation. In Proceedings of the Tenth International
Workshop on Multimedia Data Mining, MDMKDD ’10, pages 4:1–4:10, New York,
NY, USA, 2010. ACM.
• F. Zarrinkalam, H. Fani, E. Bagheri, M. Kahani, W. Du, “Semantics-enabled User
Interest Detection from Twitter”, IEEE/WIC/ACM Web Intelligence Conference,
2015.
• Abel, F., Gao, Q., Houben, G.J., Tao, K.: Analyzing user modeling on twitter for
personalized news recommendations. In: 19th International Conference on User
Modeling, Adaption and Personalization (UMAP ‘11), pp. 1-12. Springer (2011)
• Ferragina, P., Scaiella, U.: Fast and Accurate Annotation of Short Texts with
Wikipedia Pages. J. IEEE Software 29(1), pp. 70-75. IEEE (2012)
24. 24
References
• Michelson, M., Macskassy, S.A.: Discovering Users’ Topics of Interest on Twitter: A
First Look. In: 4th Workshop on Analytics for Noisy Unstructured Text Data
(AND'10), pp. 73-80 (2010)
• Abel, F., Gao, Q., Houben, G.J., Tao, K.: Semantic Enrichment of Twitter Posts for
User Profile Construction on the Social Web. In: 8th Extended Semantic Web
Conference (ESWC ’11), pp. 375-389. Springer (2011)
Kapanipathi, P., Jain, P., Venkataramani, C., Sheth, A.: User Interests Identification on
Twitter Using a Hierarchical Knowledge Base. In: 11th Extended Semantic Web
Conference (ESWC ’14), pp. 99-113. Springer (2014)
• Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know:
Inferring user profiles in online social networks. In: 3rd ACM international
conference on Web search and data mining (WSDM’10), pp. 251-260. ACM (2010)
• Wang, J., Zhao, W.X., He, Y., Li, X.: Infer User Interests via Link Structure
Regularization. ACM Transactions on Intelligent Systems and Technology (TIST) -
Special Issue on Linking Social Granularity and Functions 5(2), ACM (2014)
25. 25
References
• Santosh Kabbur, Xia Ning, George Karypis, FISM: Factored item similarity models for
top-N recommender systems, Proceedings of the 19th ACM SIGKDD international
conference on Knowledge discovery and data mining, pp. 659-667, 2013.
• Yu, Y., Wang, C., Gao, Y.: Attributes Coupling Based Item Enhanced Matrix Factorization Technique for Recommender Systems. arXiv preprint arXiv:1405.0770 (2014)
• Liben-Nowell, D., Kleinberg, J.: The Link-Prediction Problem for Social Networks. Journal of the American Society for Information Science and Technology 58, pp. 1019-1031. doi:10.1002/asi.20591 (2007)
• Cheng, X., Yan, X., Lan, Y., Guo, J.: BTM: Topic Modeling over Short Texts. IEEE Transactions on Knowledge and Data Engineering 26(12), pp. 2928-2941. IEEE (2014)
• Bhattacharya, P., Zafar, M.B., Ganguly, N., Ghosh, S., Gummadi, K.P.: Inferring User Interests in the Twitter Social Network. In: 8th ACM Conference on Recommender Systems (RecSys '14), pp. 357-360. ACM (2014)
My presentation is about inferring users' implicit interests from Twitter.
As you know, there is a huge amount of user-generated content on the web, so users want to receive only the information that is related to their interests.
Therefore, the main step in all personalization and recommender systems, such as news recommendation, is user interest detection.
The works in this field can be divided into three categories in terms of how they represent user interests:
Bag of words: each user interest is represented as a term extracted from the user's content.
Topic modeling approaches such as LDA, which implicitly use co-occurrence patterns of terms, do not perform well on tweets, which are short, noisy, and informal.
Generally, both of these approaches are term-based, so they do not consider the underlying semantics of the tweets.
There is another line of work that uses concepts defined in external knowledge bases such as DBpedia to represent user interests. These works use different existing semantic annotators, such as Zemanta, TagMe, and OpenCalais.
For example, in this slide you can see the results of TagMe for this real tweet: Arsenal is annotated with Arsenal F.C. and Spurs with Tottenham Hotspur F.C.
These works have some limitations; the most important one is that they view each topic as a single concept, so they cannot capture the specific interests of users. So in our previous work, published at WI 2015, we view each topic as a combination of several concepts that are temporally correlated on Twitter.
Independent of how they represent user interests, most previous works have focused on extracting explicit interests by analyzing only the users' textual content.
However, little is known about detecting implicit interests. By implicit interests, we mean topics that the user has never explicitly engaged with but might be interested in.
The few works that do address implicit interests only consider the homophily theory or a predefined relatedness between topics.
Based on this theory, users tend to connect to users with common interests or preferences.
In this paper, a graph-based link prediction schema is proposed to infer users' implicit interests towards emerging topics on Twitter. The underlying graph of this schema uses three types of information: users' followerships, users' explicit interests in the topics, and the relatedness of the topics.
We propose a comprehensive graph-based representation model that includes these three types of information. This heterogeneous graph is composed of three subgraphs: a user graph, which is unweighted and directed and represents the followership relations between users on Twitter;
a topic graph, which shows the potential relationships between the detected emerging topics in ℤ; and finally a user-topic graph, which represents the explicit interests of users.
Here, in line with our previous work, we view each topic as a set of weighted concepts extracted from Wikipedia.
Our intuition for calculating the value of each user's explicit interest in each topic is that the more the user tweets about a topic, the more interested she is in it. So we first calculate the relatedness of each of her tweets to that topic and then average over them,
where (c, m) is 1 if tweet m is annotated with concept c, and 0 otherwise.
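As a rough illustration of this averaging step, the following sketch (not the paper's code; the topic weights and tweets are made-up toy values) treats a topic as a dict of concept weights and each tweet as the set of Wikipedia concepts TagMe annotated it with:

```python
# Minimal sketch of the explicit-interest weight described above.
# Assumptions (not from the slides): a topic is a dict {concept: weight}
# and each tweet is the set of Wikipedia concepts it was annotated with.

def tweet_topic_relatedness(tweet_concepts, topic):
    # (c, m) = 1 if tweet m is annotated with concept c, else 0, so the
    # relatedness is the sum of the topic weights of the concepts that
    # actually occur in the tweet.
    return sum(w for c, w in topic.items() if c in tweet_concepts)

def explicit_interest(user_tweets, topic):
    # Average the per-tweet relatedness over all of the user's tweets.
    if not user_tweets:
        return 0.0
    return sum(tweet_topic_relatedness(t, topic) for t in user_tweets) / len(user_tweets)

topic = {"Arsenal F.C.": 0.6, "Tottenham Hotspur F.C.": 0.4}
tweets = [{"Arsenal F.C."}, {"Tottenham Hotspur F.C.", "Arsenal F.C."}, {"Barack Obama"}]
print(explicit_interest(tweets, topic))  # (0.6 + 1.0 + 0.0) / 3
```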
We use three approaches to compute the relatedness between our topics.
The first one is semantic relatedness. Based on this approach, two topics are considered similar if their concepts are semantically similar. Since the concepts are Wikipedia concepts, we utilize an existing Wikipedia-based relatedness measure to compute the relatedness between topics.
In the collaborative relatedness approach, the relatedness of two topics is determined by a collaborative filtering strategy over the explicit interests of users.
The hybrid approach is based on both semantic and collaborative relatedness.
The semantic relatedness of two emerging topics can be calculated as the average pairwise semantic relatedness between the concepts of the two topics, using a Wikipedia-based relatedness measure.
In our experiments, we use WLM [22], which computes the relatedness of two Wikipedia concepts through link-structure analysis.
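For reference, the WLM measure and the pairwise averaging over topic concepts can be sketched as follows; the link sets and article count here are toy values, not real Wikipedia data:

```python
import math

# Sketch of WLM: relatedness of two Wikipedia concepts from the sets of
# articles that link to them, out of `total_articles` articles in total.
# relatedness = 1 - (log max(|A|,|B|) - log |A∩B|) / (log |W| - log min(|A|,|B|))

def wlm_relatedness(links_a, links_b, total_articles):
    inter = links_a & links_b
    if not inter:
        return 0.0
    num = math.log(max(len(links_a), len(links_b))) - math.log(len(inter))
    den = math.log(total_articles) - math.log(min(len(links_a), len(links_b)))
    return max(0.0, 1.0 - num / den)

def topic_semantic_relatedness(concepts_a, concepts_b, links, total_articles):
    # Average pairwise WLM relatedness between the concepts of two topics.
    pairs = [(x, y) for x in concepts_a for y in concepts_b]
    return sum(wlm_relatedness(links[x], links[y], total_articles) for x, y in pairs) / len(pairs)

a = {1, 2, 3, 4}   # toy set of article ids linking to concept a
b = {3, 4, 5}      # toy set of article ids linking to concept b
links = {"Arsenal F.C.": a, "Tottenham Hotspur F.C.": b}
print(topic_semantic_relatedness(["Arsenal F.C."], ["Tottenham Hotspur F.C."], links, 1000))
```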
Given a user-topic graph GUℤ, we regard the problem of computing the collaborative relatedness of topics as an instance of a model-based collaborative filtering problem.
For the collaborative relatedness approach, we adopt an existing factored item-item collaborative filtering method (FISM), presented at the SIGKDD conference.
It takes a user-topic rating matrix as input,
solves the optimization problem shown on the slide, and learns the latent factors of the items, P and Q. Finally, the collaborative relatedness of topics can be computed as the product of these two matrices.
In our work, each item is a topic, and we build the user-item rating matrix based on the explicit interests of users in topics.
where R_u+ is the set of topics that user u is interested in, p_j and q_i are the learned topic latent factors, n_u+ is the number of topics that user u is interested in, and α is a user-specified parameter between 0 and 1. According to [24], the matrices P and Q can be learned by minimizing a regularized optimization problem:
where the vectors b_u and b_i correspond to the biases of user u and topic z_i, respectively.
The optimization problem can be solved using Stochastic Gradient Descent to learn the two matrices P and Q. Given P and Q as the latent factors of the topics, the collaborative relatedness of two topics z_i and z_j is computed as the dot product of the corresponding factors from P and Q, i.e., p_i and q_j.
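A toy end-to-end sketch of this FISM-style model (not the authors' implementation; the data, dimensions, and hyperparameters are invented, and the interest matrix is binarized for simplicity where the paper uses the explicit interest weights):

```python
import random

# FISM-style estimate: r_ui = b_u + b_i + |R_u+ \ {i}|^(-alpha) * sum_{j in R_u+ \ {i}} p_j . q_i
# After training, the collaborative relatedness of topics z_i, z_j is p_i . q_j.

random.seed(0)
n_users, n_topics, k = 4, 5, 3
R = [{0, 1}, {0, 2}, {1, 3}, {2, 4}]   # R_u+: topics each user is explicitly interested in
P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
bu = [0.0] * n_users                   # user biases
bi = [0.0] * n_topics                  # topic biases
alpha, lr, reg = 0.5, 0.05, 0.01

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def predict(u, i):
    others = R[u] - {i}
    if not others:
        return bu[u] + bi[i]
    agg = [sum(P[j][f] for j in others) for f in range(k)]
    return bu[u] + bi[i] + len(others) ** (-alpha) * dot(agg, Q[i])

# Plain SGD over every (user, topic) cell of the binary interest matrix.
for _ in range(200):
    for u in range(n_users):
        for i in range(n_topics):
            r = 1.0 if i in R[u] else 0.0
            e = r - predict(u, i)
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            others = R[u] - {i}
            if others:
                c = len(others) ** (-alpha)
                agg = [sum(P[j][f] for j in others) for f in range(k)]
                for f in range(k):
                    qf = Q[i][f]
                    Q[i][f] += lr * (e * c * agg[f] - reg * qf)
                    for j in others:
                        P[j][f] += lr * (e * c * qf - reg * P[j][f])

def collab_relatedness(i, j):
    # Collaborative relatedness of topics z_i and z_j.
    return dot(P[i], Q[j])

print(round(collab_relatedness(0, 1), 3))
```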
Here, we follow the approach of the paper published in the TKDE journal [IEEE Transactions on Knowledge and Data Engineering] to add item attribute information to the optimization problem of the factored item-item collaborative filtering method. By adding this term to the optimization problem from the previous slide, two topic latent vectors are considered similar if the topics are similar according to their attribute information. In our work, the attributes are the concepts of each topic, and S is calculated by measuring the semantic relatedness of topics.
In this term, p and q are the latent factors of the topics, and the matrix S denotes the similarity between topics based on their attributes.
where λ is a parameter that controls the impact of the topic concept information, and S is a matrix in which S_ii' denotes the similarity between topics z_i and z_i' based on their attributes. In our proposed approach, the attributes of each topic are its constituent concepts, and S_ii' is calculated by measuring the semantic relatedness of the two topics, as introduced earlier.
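One common way to realize such an attribute-coupling term, shown purely for illustration (the paper's exact formulation may differ), is a penalty that pulls together the latent vectors of topics with high attribute similarity S_ii':

```python
# Hypothetical attribute-coupling penalty added to the factorization loss:
#   sum_{i,i'} S[i][i'] * ||q_i - q_{i'}||^2
# Q and S below are toy values, not learned or measured quantities.

def coupling_penalty(Q, S):
    n = len(Q)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Topics with high similarity S[i][j] pay a large price
            # for having distant latent vectors.
            total += S[i][j] * sum((a - b) ** 2 for a, b in zip(Q[i], Q[j]))
    return total

Q = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
S = [[0, 0.9, 0.1], [0.9, 0, 0.1], [0.1, 0.1, 0]]
print(coupling_penalty(Q, S))
```

Minimizing this term alongside the FISM objective is what makes semantically similar topics end up with similar latent factors.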
Following the TKDE paper, two item latent feature vectors are considered similar if they are similar according to their attribute information.
After building the representation model, our goal is to find the missing links of the user-topic graph by adopting an unsupervised link prediction strategy. Because no single approach is known to be superior, we apply several well-known link prediction methods.
The first three methods are neighborhood-based; Katz and SimRank are path-based methods.
Vertex neighborhood methods are based on the idea that two vertices x and y are more likely to be linked if they have many common neighbors. Path-based methods consider the ensemble of all paths between two vertices.
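As a concrete illustration, the neighborhood-based scores can be sketched on a toy adjacency dict (the actual graph in the paper is the heterogeneous user-topic graph):

```python
import math

# Neighborhood-based link-prediction scores on a toy undirected graph,
# given as {vertex: set of neighbors}. The vertices and edges are invented.

graph = {
    "u1": {"z1", "z2"},
    "u2": {"z1", "z3"},
    "z1": {"u1", "u2"},
    "z2": {"u1"},
    "z3": {"u2"},
}

def common_neighbors(g, x, y):
    return len(g[x] & g[y])

def jaccard(g, x, y):
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0

def adamic_adar(g, x, y):
    # Common neighbors weighted by the inverse log of their degree:
    # rare shared neighbors count more than hubs.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)

print(common_neighbors(graph, "u1", "u2"))  # both neighbor z1
```

In the unsupervised setting, each candidate missing edge is simply ranked by one of these scores.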
For our experiments, we use a publicly available Twitter dataset of about 3 million tweets sampled over two months of 2010. Further, we utilize TagMe to annotate the tweets with Wikipedia concepts.
As our evaluation strategy, we use the leave-one-out method. Each time, we pick one edge of the user-topic graph for testing and use the rest of the representation model as the training set. We repeat this procedure for all pairs.
We use two metrics to evaluate the results against the test set: the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall curve (AUPR).
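AUROC, for instance, can be computed directly from its rank-statistic interpretation: the probability that a randomly chosen positive (held-out) edge scores higher than a randomly chosen non-edge. A minimal sketch with toy scores (not the paper's results):

```python
# AUROC via the rank statistic: fraction of (positive, negative) pairs
# where the positive edge is scored higher; ties count as 0.5.

def auroc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(auroc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5 of 6 pairs ranked correctly
```

A random predictor scores about 0.5 on this metric, which is exactly how SimRank behaves in our results.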
Because we want to investigate the impact of the different types of information on the accuracy of implicit interest detection from Twitter, we consider seven variants of our representation model for comparison.
For example, the variant named F only uses followership information in addition to the users' explicit interests in the representation model.
The final results, in terms of the two evaluation metrics and across the different link prediction strategies, are reported in this table.
As illustrated in this table, the SimRank link prediction method does not show good performance on any of the variants. Based on our results, it actually acts as a random predictor, because its AUROC value is about 0.5 for most of the models. So we ignore its results when analyzing the influence of the different types of information in our representation model.
As a first observation, all three models that use topic relationships (C, S, and CS) noticeably outperform model F in terms of both metrics. This means that considering the relationships between the topics considerably improves the accuracy of inferring implicit interests compared to using only followership information.
By comparing S, C, and CS among themselves, it can be observed that using semantic relatedness yields higher accuracy than the other two. This is an interesting observation: users tend to be interested in topics that are similar to the topics they already engage with. For instance, the two topics z1 = {Chelsea F.C., Arsenal F.C.} and z2 = {FC Barcelona, Real Madrid C.F.} are the most semantically similar topics in our data. It is reasonable to infer that a user who is explicitly interested in one of these derbies is probably also interested in the other one.
By comparing C and CS in this table, it can also be concluded that adding semantic relatedness to the collaborative relatedness measure improves accuracy.
The observation that S is the best is even more interesting when we compare the computational complexity of these methods: computing C and CS requires solving an optimization problem through Stochastic Gradient Descent, which is expensive compared to S.
As another observation, model SF adds the followership information to S. Based on the results, no uniform improvement can be observed in any of the cases.
In other words, the followership information does not seem to have a noticeable impact on the results, so through our experiments we were not able to show the impact of the homophily theory.
We can also summarize all the previous observations by comparing their curves.
To conclude, we have modeled the identification of users' implicit interests as a link prediction task over a heterogeneous graph that includes three kinds of information.
To investigate the impact of the different types of information, we compared different variants of our representation model and concluded that considering the relationships between topics considerably outperforms the method that only uses followership information. Further, among the topic relatedness methods, semantic relatedness is the best.
In summary, model S, which relies solely on the semantic relatedness of topics and users' explicit contributions to these topics, shows the best performance across all seven variants. Model SF shows the same performance as S: the additional followership information does not seem to have impacted the final results.