Laboratory for Systems, Software and Semantics (LS3)
Ryerson University, Canada
Inferring Implicit Topical Interests on Twitter
1
Fattane Zarrinkalam
Hossein Fani
Ebrahim Bagheri
Mohsen Kahani
2
Outline
• Introduction
• Related Work and Motivation
• Proposed Approach
• Evaluation
• Conclusion and future work
3
Introduction
• With the rapid growth of user-generated content on the
web, users want to receive only the information that
is relevant to their interests.
• Personalization and recommender systems
• Social networks such as Twitter enable users to freely communicate
with each other and share recent news, ongoing activities, or
views about different topics.
• They can be seen as a viable source of information about users
and their interests.
User interest detection from Twitter
4
Related Work
• Bag of Words approach
• It suffers from well-known natural language processing problems such as
polysemy and synonymy
• Topic modeling approach (e.g. LDA)
• Sparsity problem
• Tweets are short, noisy and informal (limited to 140 characters)
• The number of topics in LDA is assumed to be fixed
• They don’t consider the underlying semantics of the phrases
5
Related Work
• Bag of Concepts approach
• External knowledge bases such as DBpedia, Freebase and
YAGO are usually used as the source for extracting concepts.
6
Related Work
Limitations of the bag of concepts approach
• An interest is often represented by a single concept.
• Such models cannot infer that a user is interested in a more specific topic that is
actually a combination of multiple related concepts.
• Interests are confined to a set of predefined concepts.
• Interests in recent events that are not in that set cannot be discovered on the
fly.
• [Zarrinkalam et al., WI2015] We view each topic of interest as a
conjunction of several semantic concepts that are temporally
correlated on Twitter.
• Topic of interest: {Premier League, Arsenal F.C., Tottenham Hotspur
F.C., Arsène Wenger}
• Represents the rivalry between Spurs and Arsenal
7
Related Work
• Most previous work addresses explicit interest detection:
• Interests that are directly derivable from a user’s tweets
• Little is known about detecting implicit interests: topics that the user has never
explicitly engaged with but might be interested in.
• Homophily theory
• Semantic relatedness between topics
• Existing approaches view each topic as a single concept, and
the relationship between two topics is predefined in the external
knowledge base.
8
Proposed Approach
• The main objective of our work:
• Determining users’ implicit interests over the emerging topics on Twitter
• Our model:
• A graph-based link prediction scheme that operates over a
heterogeneous graph built from three types of information:
• Users’ explicit interest profiles
• Theory of homophily (user followership relations)
• Relationships between emerging topics
• Which of these three types of information, or which combination of them, is most
effective in allowing us to accurately identify a user’s implicit interests?
9
Representation Model
User graph G_U
Topic graph G_ℤ
User-topic graph G_Uℤ
10
Representation Model (User-Topic graph)
• Emerging topic:
• z = {(c, w(c, z)) | c ∈ C}
• w(c, z): the importance of concept c in topic z
• The weight of each edge e_uz ∈ E_Uℤ:
• The degree of u’s explicit interest in topic z
• Our intuition:
• The more a user tweets about a certain topic, the more interested the
user is in that topic.
• Occurrence ratio of topic z in tweet m: OR(z, m)
• e_uz is calculated by averaging OR(z, m) over all tweets posted
by user u with regard to topic z.
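The averaging step can be sketched in Python. The formula for OR(z, m) is not reproduced in this transcript, so `occurrence_ratio` below is a hypothetical stand-in (the weight share of topic z's concepts that appear in tweet m). The concept weights are made up, and the average runs over all of the user's tweets, which is one reading of the slide.

```python
from statistics import mean

def occurrence_ratio(topic, tweet_concepts):
    """Stand-in for OR(z, m): weight share of topic z's concepts found in tweet m.

    ASSUMPTION: hypothetical definition, used only to illustrate the averaging.
    """
    total = sum(topic.values())
    if total == 0:
        return 0.0
    hit = sum(w for c, w in topic.items() if c in tweet_concepts)
    return hit / total

def explicit_interest(topic, user_tweets, ratio=occurrence_ratio):
    """e_uz: the average of OR(z, m) over the tweets m posted by user u."""
    if not user_tweets:
        return 0.0
    return mean(ratio(topic, m) for m in user_tweets)

# Topic z as {concept c: w(c, z)}; the weights are illustrative only.
derby = {"Premier League": 0.2, "Arsenal F.C.": 0.4, "Tottenham Hotspur F.C.": 0.4}

# Each tweet is the set of concepts an annotator found in it.
tweets = [
    {"Arsenal F.C.", "Tottenham Hotspur F.C."},
    {"Premier League"},
    {"Coffee"},
]

weight = explicit_interest(derby, tweets)
print(round(weight, 3))
```

Passing a different `ratio` function swaps in another definition of OR(z, m) without touching the averaging.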
11
Representation Model (Topic-Topic graph)
• Topic relatedness
1. Semantic relatedness
• Semantic relatedness of the topics’ constituent concepts
• Using a Wikipedia-based measure [Witten et al., AAAI2008]
2. Collaborative relatedness
• Based on users’ overlapping explicit contributions toward the topics
• Using a collaborative filtering approach
3. Hybrid approach
• Based on both the semantic relatedness of the concepts within each
topic and users’ contributions toward the emerging topics
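As a rough illustration of the semantic option, the sketch below assumes the cited Wikipedia-based measure is the link-based relatedness of Milne and Witten, which compares the sets of articles linking to two concepts. Lifting it to two topics by a weight-averaged sum over concept pairs is an assumption made here, not something the slides specify. All names are hypothetical.

```python
import math

def wlm_relatedness(inlinks_a, inlinks_b, n_articles):
    """Link-based relatedness of two concepts, clipped to [0, 1].

    inlinks_a / inlinks_b: sets of article ids linking to each concept.
    n_articles: total number of articles; assumed larger than either set.
    """
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    distance = (math.log(big) - math.log(common)) / (
        math.log(n_articles) - math.log(small)
    )
    return max(0.0, 1.0 - distance)

def topic_relatedness(topic_a, topic_b, inlinks, n_articles):
    """ASSUMPTION: weight-averaged relatedness over all concept pairs."""
    num = den = 0.0
    for ca, wa in topic_a.items():
        for cb, wb in topic_b.items():
            if ca == cb:
                rel = 1.0
            else:
                rel = wlm_relatedness(inlinks[ca], inlinks[cb], n_articles)
            num += wa * wb * rel
            den += wa * wb
    return num / den if den else 0.0

# Toy inlink sets; the ids and the article count are made up.
rel = wlm_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, n_articles=100)
print(round(rel, 3))
```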
12
Representation Model (Topic-Topic graph)
• Collaborative relatedness
• Adopting a factored item-item collaborative filtering method [Kabbur et
al., SIGKDD2013]
• Input: a user-item rating matrix R <user-topic graph information>
• P and Q (latent factors of items) are learned by solving an
optimization problem
• Output: item-item similarities as the product of two low-rank matrices, P
and Q <collaborative relatedness of topics>
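A minimal sketch of the idea behind a factored item similarity model, in plain Python. It is a simplification for illustration rather than the method as used in this work: bias terms are left out, every (user, item) pair is visited instead of sampling unobserved ones, and the hyperparameter values and toy ratings are made up.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train(R, n_items, k=2, alpha=0.5, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn item factors P and Q by SGD on squared error.

    R maps user -> {item index: rating}; a missing item counts as rating 0.
    The score of item i for a user is the normalised sum of P over the user's
    other rated items, dotted with Q[i].
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for ratings in R.values():
            for i in range(n_items):
                others = [j for j in ratings if j != i]  # leave item i out
                if not others:
                    continue
                norm = len(others) ** -alpha
                x = [norm * sum(P[j][f] for j in others) for f in range(k)]
                err = ratings.get(i, 0.0) - dot(x, Q[i])
                q_old = list(Q[i])
                for f in range(k):
                    Q[i][f] += lr * (err * x[f] - reg * Q[i][f])
                for j in others:
                    for f in range(k):
                        P[j][f] += lr * (err * norm * q_old[f] - reg * P[j][f])
    return P, Q

def item_similarity(P, Q):
    """sim[j][i] = p_j . q_i, i.e. the matrix product P Q^T."""
    return [[dot(p, q) for q in Q] for p in P]

# Toy user-topic weights: topics 0 and 1 co-occur, as do topics 2 and 3.
R = {
    "u1": {0: 1.0, 1: 1.0},
    "u2": {0: 1.0, 1: 1.0},
    "u3": {2: 1.0, 3: 1.0},
    "u4": {2: 1.0, 3: 1.0},
}
P, Q = train(R, n_items=4)
sim = item_similarity(P, Q)
```

Here `sim[j][i]` plays the role of the collaborative relatedness between topics j and i. The diagonal entries are not meaningful, because an item is left out when scoring itself.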
13
Representation Model (Topic-Topic graph)
• Hybrid approach
• We follow the assumption of [Yu et al., TKDE 2014] and add item
attribute information to the optimization problem of the factored collaborative
filtering method.
• S is a matrix in which S_ii’ denotes the similarity between topic z_i and topic
z_i’ based on their attributes.
• The attributes of each topic are its constituent concepts
• S_ii’: semantic relatedness of the two topics
14
Link Prediction
• Unsupervised link prediction strategies:
• No single method is superior among existing approaches;
their quality depends on the structure of the underlying
graph. [Liben-Nowell and Kleinberg, J. Am. Soc. Inf. Sci., 2007]
• Adamic/Adar
• Common Neighbors
• Jaccard’s coefficient
• Katz
• SimRank
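Four of the listed scores can be sketched in plain Python over an undirected adjacency map. The toy graph, the node names, and the truncation of Katz at a maximum walk length are assumptions for illustration. SimRank is omitted to keep the sketch short.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def common_neighbors(adj, x, y):
    return len(adj[x] & adj[y])

def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def adamic_adar(adj, x, y):
    """Shared neighbours weighted by 1 / log(degree)."""
    return sum(
        1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y] if len(adj[z]) > 1
    )

def katz(adj, x, y, beta=0.005, max_len=5):
    """Truncated Katz: sum over l of beta**l * (number of length-l walks x -> y)."""
    walks = {x: 1}  # node -> number of walks of the current length from x
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score

# Toy graph: two users, three topics, one follow edge, one topic-topic edge.
adj = build_adjacency([
    ("u1", "t1"), ("u1", "t2"),   # user-topic (explicit interests)
    ("u2", "t2"), ("u2", "t3"),
    ("t1", "t3"),                 # topic-topic relatedness
    ("u1", "u2"),                 # followership
])

# Score the missing link (u1, t3), a candidate implicit interest.
print(common_neighbors(adj, "u1", "t3"))
print(round(jaccard(adj, "u1", "t3"), 3))
print(round(adamic_adar(adj, "u1", "t3"), 3))
print(katz(adj, "u1", "t3", beta=0.05, max_len=3))
```

Ranking all missing user-topic pairs by one of these scores gives the unsupervised prediction. The scores differ only in how they weight shared neighbours and longer paths.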
15
Experiments
• Dataset
• Twitter dataset: 3M tweets posted by approximately 135K
users
• TAGME as a semantic annotator
• Evaluation methodology
• Leave-one-out method
• Metrics
• Area Under the Receiver Operating Characteristic curve (AUROC)
• Area Under the Precision-Recall curve (AUPR)
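Both metrics can be computed from a list of link scores and binary labels. The sketch below uses the pairwise-ranking form of AUROC and average precision as a step-wise estimate of AUPR. Whether the reported numbers were computed this way is not stated, so treat it as one common choice. The scores and labels are made up.

```python
def auroc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of precision at each positive, ranking by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))
print(round(average_precision(scores, labels), 3))
```

A scorer that assigns the same value to every candidate link gets an AUROC of exactly 0.5 under this definition, which is the chance level.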
16
Experiments
Seven variants of our representation model are compared (F, S, SF, C, CF, CS, CSF), built from:
• Followership information (F)
• Semantic relatedness (S)
• Collaborative relatedness (C)
• Hybrid relatedness (CS)
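One way to read the variant codes is as a choice of which edge types go into the graph before link prediction runs. The sketch below assumes that user-topic edges are always present and that all edges are treated as undirected. Neither point is stated on the slide, and the edge-type names are hypothetical.

```python
from collections import defaultdict

RELATEDNESS = {"S": "semantic", "C": "collaborative", "CS": "hybrid"}

def edge_types(variant):
    """Map a variant code such as 'CSF' to the edge types it keeps."""
    kinds = ["user_topic"]                  # ASSUMPTION: always included
    follows = variant.endswith("F")
    core = variant[:-1] if follows else variant
    if core:
        kinds.append(RELATEDNESS[core])     # one topic-topic relatedness type
    if follows:
        kinds.append("follow")              # user-user followership edges
    return kinds

def build_graph(variant, edges_by_type):
    """Merge the selected edge lists into one undirected adjacency map."""
    adj = defaultdict(set)
    for kind in edge_types(variant):
        for a, b in edges_by_type.get(kind, []):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

# Toy edge lists; all node names are made up.
edges_by_type = {
    "user_topic": [("u1", "t1"), ("u2", "t2")],
    "follow": [("u1", "u2")],
    "semantic": [("t1", "t2")],
    "collaborative": [("t1", "t3")],
}

graph = build_graph("SF", edges_by_type)
print(sorted(graph["t1"]))
```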
17
Results
Model  Metric  Adamic/Adar  Common Neighbors  Jaccard Coefficient  Katz (0.0005)  Katz (0.005)  Katz (0.05)  SimRank (0.8)
F      AUROC   0.500        0.500             0.500                0.524          0.524         0.528        0.510
F      AUPR    0.438        0.438             0.438                0.454          0.454         0.458        0.422
S      AUROC   0.791        0.790             0.774                0.790          0.790         0.788        0.500
S      AUPR    0.740        0.739             0.723                0.740          0.739         0.734        0.438
SF     AUROC   0.791        0.790             0.762                0.757          0.753         0.720        0.520
SF     AUPR    0.740        0.739             0.707                0.660          0.652         0.602        0.430
C      AUROC   0.712        0.710             0.700                0.714          0.715         0.728        0.500
C      AUPR    0.657        0.651             0.610                0.657          0.661         0.680        0.438
CF     AUROC   0.773        0.771             0.758                0.742          0.738         0.716        0.517
CF     AUPR    0.717        0.714             0.692                0.647          0.640         0.602        0.428
CS     AUROC   0.762        0.761             0.748                0.763          0.763         0.767        0.500
CS     AUPR    0.697        0.695             0.661                0.699          0.699         0.707        0.438
CSF    AUROC   0.762        0.761             0.738                0.736          0.732         0.707        0.520
CSF    AUPR    0.697        0.695             0.652                0.640          0.632         0.595        0.428
AUROC/AUPR values showing the performance of the different model variants. The value in parentheses after Katz and SimRank is the parameter setting used.
Results
The ROC curves for comparing the seven variants
22
Conclusion and Future work
• Conclusion:
• We modeled the detection of users’ implicit interests as a link prediction
task over a graph that includes three types of information: followerships, users’
explicit interests in the emerging topics, and topic relatedness.
• We investigated the impact of these types of information on the accuracy of implicit
interest detection by comparing different variants of our representation
model and applying several well-known link prediction strategies.
• Future work:
• Using link prediction methods introduced for heterogeneous graphs
• Including temporal behavior of users toward topics in our model
23
References
• L.M. Aiello, G. Petkos, C. Martin, D. Corney, S. Papadopoulos, R. Skraba, A. Goker, I.
Kompatsiaris, A. Jaimes, Sensing Trending Topics in Twitter, IEEE Transactions on
Multimedia, vol. 15, no. 6, pp. 1268-1282, 2013.
• M. Cataldi, L. Di Caro, and C. Schifanella. Emerging topic detection on twitter based
on temporal and social terms evaluation. In Proceedings of the Tenth International
Workshop on Multimedia Data Mining, MDMKDD ’10, pages 4:1–4:10, New York,
NY, USA, 2010. ACM.
• F. Zarrinkalam, H. Fani, E. Bagheri, M. Kahani, W. Du, “Semantics-enabled User
Interest Detection from Twitter”, IEEE/WIC/ACM Web Intelligence Conference,
2015.
• Abel, F., Gao, Q., Houben, G.J., Tao, K.: Analyzing user modeling on twitter for
personalized news recommendations. In: 19th International Conference on User
Modeling, Adaption and Personalization (UMAP ‘11), pp. 1-12. Springer (2011)
• Ferragina, P., Scaiella, U.: Fast and Accurate Annotation of Short Texts with
Wikipedia Pages. J. IEEE Software 29(1), pp. 70-75. IEEE (2012)
24
References
• Michelson, M., Macskassy, S.A.: Discovering Users’ Topics of Interest on Twitter: A
First Look. In: 4th Workshop on Analytics for Noisy Unstructured Text Data
(AND'10), pp. 73-80 (2010)
• Abel, F., Gao, Q., Houben, G.J., Tao, K.: Semantic Enrichment of Twitter Posts for
User Profile Construction on the Social Web. In: 8th Extended Semantic Web
Conference (ESWC ’11), pp. 375-389. Springer (2011)
• Kapanipathi, P., Jain, P., Venkataramani, C., Sheth, A.: User Interests Identification on
Twitter Using a Hierarchical Knowledge Base. In: 11th Extended Semantic Web
Conference (ESWC ’14), pp. 99-113. Springer (2014)
• Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know:
Inferring user profiles in online social networks. In: 3rd ACM international
conference on Web search and data mining (WSDM’10), pp. 251-260. ACM (2010)
• Wang, J., Zhao, W.X., He, Y., Li, X.: Infer User Interests via Link Structure
Regularization. ACM Transactions on Intelligent Systems and Technology (TIST) -
Special Issue on Linking Social Granularity and Functions 5(2), ACM (2014)
25
References
• Santosh Kabbur, Xia Ning, George Karypis, FISM: factored item similarity models for
top-N recommender systems, Proceedings of the 19th ACM SIGKDD international
conference on Knowledge discovery and data mining, pp. 659-667, 2013.
• Yu, Y.; Wang, C.; and Gao, Y. 2014. Attributes coupling based item enhanced matrix
factorization technique for recommender systems. arXiv preprint arXiv:1405.0770
• Liben-Nowell, D. and Kleinberg, J. (2007), The link-prediction problem for social
networks. J. Am. Soc. Inf. Sci., 58: 1019–1031. doi: 10.1002/asi.20591
• Cheng, Xueqi, Yan, Xiaohui, Lan, Yanyan, Guo, Jiafeng: BTM: Topic Modeling over
Short Texts. IEEE Transactions on Knowledge and Data Engineering 26(12), 2928-
2941, IEEE (2014)
• Parantapa Bhattacharya, Muhammad Bilal Zafar, Niloy Ganguly, Saptarshi Ghosh,
and Krishna P. Gummadi. 2014. Inferring user interests in the Twitter social
network. In Proceedings of the 8th ACM Conference on Recommender systems
(RecSys '14). ACM, New York, NY, USA, 357-360.
26

Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
 
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
Mastering the Concepts Tested in the Databricks Certified Data Engineer Assoc...
 
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
 
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie WellsCollapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
Collapsing Narratives: Exploring Non-Linearity • a micro report by Rosie Wells
 
Gregory Harris - Cycle 2 - Civics Presentation
Gregory Harris - Cycle 2 - Civics PresentationGregory Harris - Cycle 2 - Civics Presentation
Gregory Harris - Cycle 2 - Civics Presentation
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
 

Slides ecir2016

  • 1. Laboratory of systems, software and semantics (LS3) Ryerson university of Canada Inferring Implicit Topical Interests on Twitter 1 Fattane Zarrinkalam Hossein Fani Ebrahim Bagheri Mohsen Kahani
  • 2. 2 Outline • Introduction • Related Work and Motivation • Proposed Approach • Evaluation • Conclusion and future work
  • 3. 3 Introduction • Due to the increasing growth of user-generated content on the web, users want to receive only the information that is related to their interests. • Personalization and recommender systems • Social networks like Twitter enable users to freely communicate with each other and share recent news, ongoing activities or views about different topics. • They can be seen as a viable source of information about the users and their interests • User interest detection from Twitter
  • 4. 4 Related Work • Bag of Words approach • It suffers from known problems in natural language processing like Polysemy and Synonymy • Topic modeling approach (e.g. LDA) • Sparsity problem • Tweets are short, noisy and informal (limited to 140 characters) • The number of topics in LDA is assumed to be fixed • They don’t consider the underlying semantics of the phrases
  • 5. 5 Related Work • Bag of Concepts approach • Usually, external knowledge bases such as DBpedia, Freebase and Yago are used as a source for extracting concepts.
  • 6. 6 Related Work • Limitations of the bag of concepts approach • An interest is often modeled as one single concept: these approaches cannot infer that a user is interested in a more specific topic, which is actually a combination of multiple related concepts. • Interests are confined to a set of predefined concepts: interests in recent events that are not among that set cannot be discovered on the fly. • [Zarrinkalam et al., WI2015] We view each topic of interest as a conjunction of several semantic concepts which are temporally correlated on Twitter. • Topic of interest: {Premier League, Arsenal F.C., Tottenham Hotspur F.C., Arsène Wenger} • represents the rivalry between Spurs and Arsenal
  • 7. 7 Related Work • Many previous works address the detection of explicit interests: • Interests that are directly derivable from a user’s tweets • Little is known about detecting implicit interests: topics that the user never explicitly engaged with but might be interested in. • Existing work relies on: • Homophily theory • Semantic relatedness between topics • These works view each topic as a single concept, • and the relationship between two topics is predefined in the external knowledge base.
  • 8. 8 Proposed Approach • The main objective of our work: • Determining implicit interests of users over the emerging topics on Twitter • Our Model: • A graph-based link prediction schema that operates over a heterogeneous graph which uses three types of information: • Users’ explicit interest profile • Theory of homophily (user followership relations) • Relationship between emerging topics • Which of these three types of information, or what combination of them, is most effective in allowing us to accurately identify a user’s implicit interests?
  • 9. 9 Representation Model User Graph GU Topic Graph Gℤ User-Topic Graph GUℤ
  • 10. 10 Representation Model (User-Topic graph) • Emerging Topic: • z = {(c, w(c, z)) | c ∈ C} • w(c, z): the importance of concept c in topic z. • The weight of each edge euz ∈ EUℤ: • The degree of u’s explicit interest in topic z • Our intuition: • The more a user tweets about a certain topic, the more interested the user would be in that topic. • Occurrence Ratio of topic z in tweet m: OR(z, m) • euz is calculated by averaging the value of OR(z, m) over all tweets posted by the specific user u with regard to topic z.
  • 11. 11 Representation Model (Topic-Topic graph) • Topic Relatedness 1. Semantic relatedness • Semantic relatedness of their constituent concepts • Using a Wikipedia-based measure [Witten et al., AAAI2008] 2. Collaborative relatedness • Based on users’ overlapping explicit contributions toward these topics • Using a collaborative filtering approach 3. Hybrid approach • Based on both the semantic relatedness of the concepts within each topic and users’ contributions towards the emerging topics.
  • 12. 12 Representation Model (Topic-Topic graph) • Collaborative relatedness • Adopting a factored item-item collaborative filtering method [Kabbur et al., SIGKDD2013] • Input: a user-item rating matrix R <user-topic graph information> • P and Q (latent factors of items) can be learnt by solving an optimization problem • Output: item-item similarities as the product of two low-rank matrices, P and Q <collaborative relatedness of topics>
  • 13. 13 Representation Model (Topic-Topic graph) • Hybrid approach • We follow the assumption of [Yu et al., TKDE 2014] and add the item attribute information to the optimization problem of the factored collaborative filtering method. • S is a matrix in which Sii’ denotes the similarity between topic zi and topic zi’ based on their attributes. • The attributes of each topic are its constituent concepts • Sii’: semantic relatedness of the two topics
  • 14. 14 Link Prediction • Unsupervised link prediction strategies: • There is no single superior method among existing work and their quality is dependent on the structure of the underlying graph. [Liben-Nowell, J. Am. Soc. Inf. Sci., 2007] • Adamic/Adar • Common Neighbors • Jaccard’s coefficient • Katz • SimRank
  • 15. 15 Experiments • Dataset • Twitter dataset: 3M tweets posted by approximately 135K users • TAGME as a semantic annotator • Evaluation Methodology • leave-one-out method • Metrics • Area Under Receiver Operating Characteristic (AUROC) curve • Area Under the Precision-Recall (AUPR) curve
  • 16. 16 Experiments • Seven variants of our representation model to compare (F, S, SF, C, CF, CS, CSF), built from: • followership information (F) • semantic relatedness (S) • collaborative relatedness (C) • hybrid relatedness (CS)
  • 17. 17 Results
      Model  Metric  Adamic/  Common    Jaccard      Katz      Katz     Katz    SimRank
                     Adar     Neighbor  Coefficient  (0.0005)  (0.005)  (0.05)  (0.8)
      F      AUROC   0.500    0.500     0.500        0.524     0.524    0.528   0.510
      F      AUPR    0.438    0.438     0.438        0.454     0.454    0.458   0.422
      S      AUROC   0.791    0.790     0.774        0.790     0.790    0.788   0.500
      S      AUPR    0.740    0.739     0.723        0.740     0.739    0.734   0.438
      SF     AUROC   0.791    0.790     0.762        0.757     0.753    0.720   0.520
      SF     AUPR    0.740    0.739     0.707        0.660     0.652    0.602   0.430
      C      AUROC   0.712    0.710     0.700        0.714     0.715    0.728   0.500
      C      AUPR    0.657    0.651     0.610        0.657     0.661    0.680   0.438
      CF     AUROC   0.773    0.771     0.758        0.742     0.738    0.716   0.517
      CF     AUPR    0.717    0.714     0.692        0.647     0.640    0.602   0.428
      CS     AUROC   0.762    0.761     0.748        0.763     0.763    0.767   0.500
      CS     AUPR    0.697    0.695     0.661        0.699     0.699    0.707   0.438
      CSF    AUROC   0.762    0.761     0.738        0.736     0.732    0.707   0.520
      CSF    AUPR    0.697    0.695     0.652        0.640     0.632    0.595   0.428
      The AUROC/AUPR values showing the performance of different model variants (Katz and SimRank parameter values in parentheses).
  • 18. 18 Results
      Model  Metric  Adamic/  Common    Jaccard      Katz      Katz     Katz
                     Adar     Neighbor  Coefficient  (0.0005)  (0.005)  (0.05)
      F      AUROC   0.500    0.500     0.500        0.524     0.524    0.528
      F      AUPR    0.438    0.438     0.438        0.454     0.454    0.458
      S      AUROC   0.791    0.790     0.774        0.790     0.790    0.788
      S      AUPR    0.740    0.739     0.723        0.740     0.739    0.734
      C      AUROC   0.712    0.710     0.700        0.714     0.715    0.728
      C      AUPR    0.657    0.651     0.610        0.657     0.661    0.680
      CS     AUROC   0.762    0.761     0.748        0.763     0.763    0.767
      CS     AUPR    0.697    0.695     0.661        0.699     0.699    0.707
      The AUROC/AUPR values showing the performance of different model variants (Katz parameter values in parentheses).
  • 19. 19 Results
      Model  Metric  Adamic/  Common    Jaccard      Katz      Katz     Katz
                     Adar     Neighbor  Coefficient  (0.0005)  (0.005)  (0.05)
      S      AUROC   0.791    0.790     0.774        0.790     0.790    0.788
      S      AUPR    0.740    0.739     0.723        0.740     0.739    0.734
      C      AUROC   0.712    0.710     0.700        0.714     0.715    0.728
      C      AUPR    0.657    0.651     0.610        0.657     0.661    0.680
      CS     AUROC   0.762    0.761     0.748        0.763     0.763    0.767
      CS     AUPR    0.697    0.695     0.661        0.699     0.699    0.707
      The AUROC/AUPR values showing the performance of different model variants (Katz parameter values in parentheses).
  • 20. 20 Results
      Model  Metric  Adamic/  Common    Jaccard      Katz      Katz     Katz
                     Adar     Neighbor  Coefficient  (0.0005)  (0.005)  (0.05)
      S      AUROC   0.791    0.790     0.774        0.790     0.790    0.788
      S      AUPR    0.740    0.739     0.723        0.740     0.739    0.734
      SF     AUROC   0.791    0.790     0.762        0.757     0.753    0.720
      SF     AUPR    0.740    0.739     0.707        0.660     0.652    0.602
      C      AUROC   0.712    0.710     0.700        0.714     0.715    0.728
      C      AUPR    0.657    0.651     0.610        0.657     0.661    0.680
      CF     AUROC   0.773    0.771     0.758        0.742     0.738    0.716
      CF     AUPR    0.717    0.714     0.692        0.647     0.640    0.602
      CS     AUROC   0.762    0.761     0.748        0.763     0.763    0.767
      CS     AUPR    0.697    0.695     0.661        0.699     0.699    0.707
      CSF    AUROC   0.762    0.761     0.738        0.736     0.732    0.707
      CSF    AUPR    0.697    0.695     0.652        0.640     0.632    0.595
      The AUROC/AUPR values showing the performance of different model variants (Katz parameter values in parentheses).
  • 21. 21 Results The ROC curves for comparing the seven variants
  • 22. 22 Conclusion and Future work • Conclusion: • We modeled the user implicit interest detection problem as a link prediction task over a graph that includes three types of information: followerships, users’ explicit interests over the emerging topics, and topic relatedness. • We investigated the impact of these types of information on the accuracy of implicit interest detection by comparing different variants of our representation model and applying some well-known link prediction strategies. • Future work: • Using link prediction methods introduced for heterogeneous graphs • Including the temporal behavior of users toward topics in our model
  • 23. 23 References • L.M. Aiello, G. Petkos, C. Martin, D. Corney, S. Papadopoulos, R. Skraba, A. Goker, I. Kompatsiaris, A. Jaimes, Sensing Trending Topics in Twitter, IEEE Transactions on Multimedia, vol. 15, no. 6, pp. 1268-1282, 2013. • M. Cataldi, L. Di Caro, and C. Schifanella. Emerging topic detection on twitter based on temporal and social terms evaluation. In Proceedings of the Tenth International Workshop on Multimedia Data Mining, MDMKDD ’10, pages 4:1–4:10, New York, NY, USA, 2010. ACM. • F. Zarrinkalam, H. Fani, E. Bagheri, M. Kahani, W. Du, “Semantics-enabled User Interest Detection from Twitter”, IEEE/WIC/ACM Web Intelligence Conference, 2015. • Abel, F., Gao, Q., Houben, G.J., Tao, K.: Analyzing user modeling on twitter for personalized news recommendations. In: 19th International Conference on User Modeling, Adaption and Personalization (UMAP ‘11), pp. 1-12. Springer (2011) • Ferragina, P., Scaiella, U.: Fast and Accurate Annotation of Short Texts with Wikipedia Pages. J. IEEE Software 29(1), pp. 70-75. IEEE (2012)
  • 24. 24 References • Michelson, M., Macskassy, S.A.: Discovering Users’ Topics of Interest on Twitter: A First Look. In: 4th Workshop on Analytics for Noisy Unstructured Text Data (AND'10), pp. 73-80 (2010) • Abel, F., Gao, Q., Houben, G.J., Tao, K.: Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web. In: 8th Extended Semantic Web Conference (ESWC ’11), pp. 375-389. Springer (2011) • Kapanipathi, P., Jain, P., Venkataramani, C., Sheth, A.: User Interests Identification on Twitter Using a Hierarchical Knowledge Base. In: 11th Extended Semantic Web Conference (ESWC ’14), pp. 99-113. Springer (2014) • Mislove, A., Viswanath, B., Gummadi, K.P., Druschel, P.: You are who you know: Inferring user profiles in online social networks. In: 3rd ACM international conference on Web search and data mining (WSDM’10), pp. 251-260. ACM (2010) • Wang, J., Zhao, W.X., He, Y., Li, X.: Infer User Interests via Link Structure Regularization. ACM Transactions on Intelligent Systems and Technology (TIST) - Special Issue on Linking Social Granularity and Functions 5(2), ACM (2014)
  • 25. 25 References • Santosh Kabbur, Xia Ning, George Karypis, FISM: factored item similarity models for top-N recommender systems, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 659-667, 2013. • Yu, Y.; Wang, C.; and Gao, Y. 2014. Attributes coupling based item enhanced matrix factorization technique for recommender systems. arXiv preprint arXiv:1405.0770 • Liben-Nowell, D. and Kleinberg, J. (2007), The link-prediction problem for social networks. J. Am. Soc. Inf. Sci., 58: 1019–1031. doi: 10.1002/asi.20591 • Cheng, Xueqi, Yan, Xiaohui, Lan, Yanyan, Guo, Jiafeng: BTM: Topic Modeling over Short Texts. IEEE Transactions on Knowledge and Data Engineering 26(12), 2928-2941, IEEE (2014) • Parantapa Bhattacharya, Muhammad Bilal Zafar, Niloy Ganguly, Saptarshi Ghosh, and Krishna P. Gummadi. 2014. Inferring user interests in the Twitter social network. In Proceedings of the 8th ACM Conference on Recommender systems (RecSys '14). ACM, New York, NY, USA, 357-360.

Editor's Notes

  1. My presentation is about inferring users’ implicit interests from Twitter.
  2. As you know, there is a great deal of user-generated content on the web, so users want to receive only the information that is related to their interests. Therefore, the main step in personalization and recommender systems, such as news recommendation, is user interest detection.
  3. Work in this field can be divided into three categories according to how it represents the user’s interests. Bag of words: each user interest is represented as a term extracted from the user’s content. Topic modeling approaches like LDA, which implicitly use co-occurrence patterns of terms, don’t perform well on tweets, which are short, noisy and informal. Generally, both of these approaches are based on terms, so they don’t consider the underlying semantics of the tweets.
  4. There is another line of work that uses the concepts defined in external knowledge bases like DBpedia to represent user interests. These works use different existing semantic annotators, like Zemanta, TagMe and OpenCalais. For example, in this slide you can see the results of TagMe for this real tweet: Arsenal is annotated with Arsenal F.C. and Spurs with Tottenham Hotspur F.C.
  5. These works have some limitations. The most important one is that they view each topic as a single concept, so they can’t capture the specific interests of users. So in our previous work, which was published at WI2015, we view each topic as a combination of several concepts which are temporally correlated on Twitter.
  6. Independent of how they represent user interests, most of the previous works have focused on extracting explicit interests by analysing only the users’ textual content. However, little is known about detecting implicit interests. By implicit interests we mean the topics that the user never explicitly engaged with but might be interested in. The few works that do address them consider only homophily theory or predefined relatedness between topics. According to homophily theory, users tend to connect to users with common interests or preferences.
  7. In this paper, a graph-based link prediction schema is proposed to infer the implicit interests of users towards emerging topics on Twitter. The underlying graph of this schema uses three types of information: users’ followerships, users’ explicit interests towards the topics, and the relatedness of the topics.
  8. We propose a comprehensive graph-based representation model that includes these three types of information. This heterogeneous graph is composed of three subgraphs: the user graph, an unweighted and directed graph representing followership relations between users on Twitter; the topic graph, which shows potential relationships between the detected emerging topics in ℤ; and finally the user-topic graph, which represents the explicit interests of users.
  9. Here, in line with our previous work, we view each topic as a set of weighted concepts extracted from Wikipedia. Our intuition for calculating the explicit interest of each user in each topic is that the more the user tweets about the topic, the more interested the user is in that topic. So, we first calculate the relatedness of each of her tweets to that topic and then average over them. In the formula on the slide, the indicator for (c, m) is 1 if tweet m is annotated with concept c, and 0 otherwise.
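The averaging step described in this note can be sketched in Python as follows. The exact Occurrence Ratio formula did not survive in the transcript, so `occurrence_ratio` below assumes one plausible form: the weight of the topic's concepts that annotate the tweet, divided by the topic's total concept weight. The function names, the sample weights and that normalisation are assumptions made for illustration, not the authors' definition.

```python
from typing import Dict, List, Set

Topic = Dict[str, float]  # concept -> w(c, z), the importance of concept c in topic z


def occurrence_ratio(topic: Topic, tweet_concepts: Set[str]) -> float:
    """ASSUMED form of OR(z, m): share of the topic's concept weight found in tweet m."""
    total = sum(topic.values())
    if total == 0:
        return 0.0
    # The membership test plays the role of the 0/1 indicator for (c, m).
    present = sum(w for c, w in topic.items() if c in tweet_concepts)
    return present / total


def explicit_interest(topic: Topic, user_tweets: List[Set[str]]) -> float:
    """Edge weight e_uz: the average of OR(z, m) over the user's tweets."""
    if not user_tweets:
        return 0.0
    return sum(occurrence_ratio(topic, m) for m in user_tweets) / len(user_tweets)


# Hypothetical topic and annotated tweets (each tweet is the set of its concepts).
topic = {"Arsenal F.C.": 0.5, "Tottenham Hotspur F.C.": 0.3, "Premier League": 0.2}
tweets = [{"Arsenal F.C.", "Premier League"}, {"Tottenham Hotspur F.C."}, set()]
weight = explicit_interest(topic, tweets)
```

With these toy values the three tweets contribute 0.7, 0.3 and 0, so the edge weight is their mean, roughly 0.33.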
  10. We use three approaches to compute the relatedness between our topics. The first one is semantic relatedness. In this approach, two topics are considered similar if their concepts are semantically similar. Because the concepts are Wikipedia concepts, we use an existing Wikipedia-based relatedness measure to compute the relatedness between topics. In the collaborative relatedness approach, the relatedness of two topics is determined by a collaborative filtering strategy over the explicit interests of users. The hybrid approach is based on both semantic and collaborative relatedness. The semantic relatedness of two emerging topics can be calculated as the average pairwise semantic relatedness between the concepts of the two topics, using a Wikipedia-based relatedness measure. In our experiments, we use WLM [22], which computes the relatedness of two Wikipedia concepts through link structure analysis. Given a user-topic graph GUℤ, we regard the problem of computing the collaborative relatedness of topics as an instance of a model-based collaborative filtering problem.
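A minimal Python sketch of the average pairwise computation described in this note. The concept-level measure is passed in as a function; `toy_relatedness` and its hand-picked scores are hypothetical stand-ins for a Wikipedia-based measure such as WLM, and the average here is unweighted, which is an assumption.

```python
from itertools import product
from typing import Callable, Iterable


def topic_semantic_relatedness(
    concepts_a: Iterable[str],
    concepts_b: Iterable[str],
    concept_relatedness: Callable[[str, str], float],
) -> float:
    """Average pairwise relatedness between the concepts of two topics."""
    pairs = list(product(concepts_a, concepts_b))
    if not pairs:
        return 0.0
    return sum(concept_relatedness(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores; a real system would query a Wikipedia-based measure instead.
TOY_SCORES = {
    frozenset(["Chelsea F.C.", "FC Barcelona"]): 0.6,
    frozenset(["Chelsea F.C.", "Real Madrid C.F."]): 0.5,
    frozenset(["Arsenal F.C.", "FC Barcelona"]): 0.7,
    frozenset(["Arsenal F.C.", "Real Madrid C.F."]): 0.4,
}


def toy_relatedness(a: str, b: str) -> float:
    """Symmetric lookup: 1.0 for identical concepts, 0.0 for unknown pairs."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset([a, b]), 0.0)


z1 = ["Chelsea F.C.", "Arsenal F.C."]
z2 = ["FC Barcelona", "Real Madrid C.F."]
score = topic_semantic_relatedness(z1, z2, toy_relatedness)
```

Passing the concept measure as a parameter keeps the averaging logic independent of whichever relatedness service is actually used.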
  11. For the collaborative filtering approach, we adopt an existing factored item-item collaborative filtering method presented at the SIGKDD conference. It takes a user-item rating matrix as input, solves the optimization problem shown on the slide, and learns the item latent factors P and Q. Finally, the collaborative relatedness of topics can be computed as the product of these two matrices. In our work, each item is a topic, and we build the user-item rating matrix from the explicit interests of users in topics. /////////////////////////////////////////////// where Ru+ is the set of topics that user u is interested in, pj and qi are the learned topic latent factors, nu+ is the number of topics that user u is interested in, and α is a user-specified parameter between 0 and 1. According to [24], matrices P and Q can be learnt by minimizing a regularized optimization problem, where the vectors bu and bi correspond to the user u and topic zi biases, respectively. The optimization problem can be solved using Stochastic Gradient Descent to learn the two matrices P and Q. Given P and Q as latent factors of topics, the collaborative relatedness of two topics zi and zj is computed as the dot product between the corresponding factors from P and Q, i.e., pi and qj.
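To make the last step of this note concrete, here is a small pure-Python sketch of how learned factors might be used: the relatedness of two topics as a dot product, and a factored item-item estimate of a user's interest. The factor values are made up rather than learned, the training by Stochastic Gradient Descent is omitted, and the parameter name `alpha` and the choice to exclude the target topic from the sum are assumptions of this sketch.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def collaborative_relatedness(P: List[Vector], Q: List[Vector], i: int, j: int) -> float:
    """Relatedness of topics z_i and z_j: dot product of p_i and q_j."""
    return dot(P[i], Q[j])


def predict_interest(
    P: List[Vector],
    Q: List[Vector],
    interested_topics: List[int],
    i: int,
    b_u: float = 0.0,
    b_i: float = 0.0,
    alpha: float = 0.5,
) -> float:
    """Estimate user u's interest in topic i from the topics u already engaged with."""
    others = [j for j in interested_topics if j != i]  # assumed: topic i itself is left out
    if not others:
        return b_u + b_i
    scale = len(others) ** (-alpha)  # normalisation by the number of topics, damped by alpha
    return b_u + b_i + scale * sum(dot(P[j], Q[i]) for j in others)


# Made-up rank-2 factors for three topics (placeholders for learned P and Q).
P = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

relatedness_0_2 = collaborative_relatedness(P, Q, 0, 2)
estimate = predict_interest(P, Q, interested_topics=[0, 1], i=2)
```

With alpha = 0 the estimate is a plain sum over the user's topics, and with alpha = 1 it is their average, so alpha controls how strongly users with many topics are discounted.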
  12. In this approach, we follow the assumption of the paper published in the TKDE journal [IEEE Transactions on Knowledge and Data Engineering] and add item attribute information to the optimization problem of the factored item-item collaborative filtering method. In their approach, adding this term to the optimization problem from the previous slide means that two topic latent vectors are considered similar if the topics are similar according to their attribute information. In our work, the attributes are the concepts of each topic, and S is calculated by measuring the semantic relatedness of topics. In this term, p and q are latent factors of topics, and the matrix S denotes the similarity between topics based on their attributes. ///////////////////////////////////// where the coefficient of the added term is a parameter that controls the impact of topic concept information, and S is a matrix in which Sii’ denotes the similarity between topic zi and topic zi’ based on their attributes. In our proposed approach, the attributes of each topic are its constituent concepts, and Sii’ is calculated by measuring the semantic relatedness of the two topics as introduced earlier. From IEEE Transactions on Knowledge and Data Engineering: two item latent feature vectors would be considered similar if they are similar according to their attribute information.
  13. After building the representation model, our goal is to find the missing links of the user-topic graph by adopting an unsupervised link prediction strategy. Because no single approach is superior, we apply different well-known link prediction methods. The first three methods are neighborhood-based, and Katz and SimRank are path-based methods. Vertex neighborhood methods are based on the idea that two vertices x and y are more likely to have a link if they have many common neighbors. Path-based methods consider the ensemble of all paths between two vertices.
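The three neighborhood-based scores named in this note can be sketched in a few lines of Python. The sketch uses their standard textbook definitions on an unweighted, undirected toy graph; treating the heterogeneous user-topic graph this way, and the node names used, are simplifying assumptions for illustration.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets built from an edge list."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def common_neighbors(g: Graph, x: str, y: str) -> int:
    return len(g[x] & g[y])


def jaccard(g: Graph, x: str, y: str) -> float:
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def adamic_adar(g: Graph, x: str, y: str) -> float:
    # Shared neighbors with few links of their own count for more.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)


# Toy graph: user-topic edges, one followership edge, and topic-topic edges.
g = build_graph([
    ("u1", "z1"), ("u1", "z2"), ("u2", "z1"),  # explicit interests
    ("u1", "u2"),                              # followership
    ("z1", "z3"), ("z2", "z3"),                # topic relatedness
])

# Score the missing link (u1, z3): is u1 implicitly interested in z3?
cn = common_neighbors(g, "u1", "z3")
jc = jaccard(g, "u1", "z3")
aa = adamic_adar(g, "u1", "z3")
```

Here u1 and z3 share the neighbors z1 and z2, so all three scores are positive even though u1 never engaged with z3 directly.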
  14. For our experiments, we use a Twitter dataset available on the web, with about 3 million tweets sampled over two months of 2010. Further, we use TagMe to annotate the tweets with Wikipedia concepts. As our evaluation strategy, we use the leave-one-out method: each time, we pick one edge from the user-topic graph for testing and use the rest of the representation model as the training set. We repeat this procedure for all pairs. We use two metrics to evaluate the results by comparing them with the test set: the Area Under the Receiver Operating Characteristic (AUROC) curve and the Area Under the Precision-Recall (AUPR) curve.
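As a rough illustration of the first metric, AUROC can be computed as the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link, with ties counted as one half. The Python sketch below uses that pairwise definition on made-up scores; it is not the evaluation code used in the experiments.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up link-prediction scores; label 1 marks a held-out true user-topic edge.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
value = auroc(scores, labels)

# A predictor that gives every candidate the same score cannot rank at all.
constant = auroc([0.2, 0.2, 0.2, 0.2], labels)
```

In the toy data three of the four positive-negative pairs are ordered correctly, giving 0.75, while the constant predictor lands at 0.5, the value expected from random ranking.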
  15. Because we want to investigate the impact of different types of information on the accuracy of implicit interest detection from Twitter, we consider seven variants of our representation model for comparison. As an example, the variant named F uses only followership information, in addition to the users’ explicit interests, in the representation model.
  16. The final results, in terms of the two evaluation metrics and for the different link prediction strategies, are reported in this table. We can clearly see that the SimRank link prediction method has not performed well on any of the variants. Based on our results, it actually behaves like a random predictor, because for most of the models its AUROC value is about 0.5. So, we ignore its results when analyzing the influence of the different types of information in our representation model.
  17. As the first observation, in this table we can see that all three models that use topic relationships, C, S and CS, noticeably outperform model F on both metrics. This means that considering the relationships between the topics considerably improves the accuracy of inferring implicit interests compared with using only followership information.
  18. Here, by comparing S, C and CS with each other, it can be observed that using semantic relatedness gives higher accuracy than the others. It is an interesting observation that users are interested in topics that are semantically close to the topics they already follow. For instance, the two topics z1 = {Chelsea F.C., Arsenal F.C.} and z2 = {FC Barcelona, Real Madrid C.F.} are the most semantically similar topics in our data. It is reasonable to infer that a user who is explicitly interested in one of these derbies is probably also interested in the other one. In this table, by comparing C and CS, it can also be concluded that adding semantic relatedness to the collaborative relatedness measure improves accuracy. The observation that S is the best is more interesting when we compare the computational complexity of these methods: computing C and CS requires solving an optimization problem through Stochastic Gradient Descent, which is an expensive operation compared to S.
  19. As another observation, the model SF adds the followership information to S. Based on the results, no uniform observation can be made across the cases. I mean, the followership information does not seem to have a noticeable impact on the results. So, through our experiments we were not able to show the impact of homophily theory.
  20. We can also draw all the previous conclusions by comparing the ROC curves of the variants.
  21. To conclude, we have modeled the identification of users’ implicit interests as a link prediction task over a heterogeneous graph that includes three kinds of information. To investigate the impact of the different kinds of information, we compared different variants of our representation model and concluded that considering the relationships between topics considerably outperforms the method that uses only followership information. Further, among the topic relatedness methods, semantic relatedness is the best. In summary, model S, which relies solely on the semantic relatedness of topics and users’ explicit contributions to these topics, shows the best performance across all seven variants. The model SF shows the same performance as S with Adamic/Adar and Common Neighbors, so the additional followership information included in the model does not seem to have impacted the final results.