Textual information exchanged among users on online social network platforms provides deep insight into users' interests and behavioral patterns. However, unlike traditional text-dominant settings such as online publishing, one distinctive feature of online social networks is users' rich interactions with the textual content, which, unfortunately, have not yet been well incorporated into existing topic modeling frameworks.
In this paper, we propose an LDA-based behavior-topic model (B-LDA) which jointly models user topic interests and behavioral patterns. We focus the study of the model on online social network settings such as microblogs like Twitter, where the textual content is relatively short but user interactions with it are rich. We conduct experiments on real Twitter data to demonstrate that the topics obtained by our model are both informative and insightful. As an application of our B-LDA model, we also propose a Twitter followee recommendation algorithm combining B-LDA and LDA, which we show in a quantitative experiment outperforms LDA by a significant margin.
13 sdm-blda-slides
1. It Is Not Just What We Say, But How We Say Them:
Joint Behaviour-Topic Modelling
Minghui QIU, Feida ZHU and Jing JIANG
Singapore Management University
2. Microblogs
• Rich user interactions with textual information
(posting behaviors)
POST
RETWEET
REPLY
MENTION
Why do we need to consider user behaviors?
3. Observation 1: users with similar topics of
interest can have different behavioral patterns
• Users who are interested in the `politics’ topic
The different behaviors people exhibit on Twitter suggest different
motivations for using the platform.
4. Observation 2: user clusters with distinct behavioral
patterns usually represent different user profiles
• Top 5 users who frequently post tweets about
the topic `politics’
Official news media accounts
5. It Is Not Just What We Say, But
How We Say Them
• The way people interact with text is critical in
understanding user behavior patterns and
modeling user interest in social networks
• Goal: jointly model a user’s topic interests and
the user’s interactions with those topics in
microblogs like Twitter
6. Outline
• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee Recommendation
• Summary
7. Topic Modeling in Twitter
• Twitter
– 140 character limit
– Noisy tweets
• Comparison between LDA and Twitter-LDA [Zhao et al.,
ECIR’10]
– Document: all tweets of a given Twitter user (LDA and T-LDA)
– Words: words in the user’s tweets (LDA and T-LDA)
– Topic assignment: each word has a topic (LDA); each tweet has a topic (T-LDA)
– Word pools: topical words (LDA); topical words or background words (T-LDA)
Goal: extend T-LDA to jointly model
the topic interests and interactions of a user.
8. LDA-based Behaviour-Topic Modelling – B-LDA
[Figure: plate diagram of B-LDA. Variables: θ (user’s topic distribution), z, b, w, y, γ. Distributions: topic’s behavior distribution, topic’s word distribution, background word distribution. Plates: T, L, N, U.]
• U: # of users
• N: # of tweets
• L: # of words
• z: a topic label
• y: a switch
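The generative story suggested by the diagram and by the editor's note (y = 0 selects the background word distribution) can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the function name `generate_tweets`, the switch probability `p_topical`, the toy sizes and the flat Dirichlet draws are all hypothetical, and the per-word switch follows the Twitter-LDA convention.

```python
import numpy as np

BEHAVIORS = ["POST", "RETWEET", "REPLY", "MENTION"]

def generate_tweets(rng, theta, psi, phi, phi_bg, p_topical, n_tweets, n_words):
    """Sample (topic, behavior, word ids) for each tweet of a single user.

    theta:  (T,)   the user's topic distribution
    psi:    (T, B) each topic's behavior distribution
    phi:    (T, V) each topic's word distribution
    phi_bg: (V,)   background word distribution
    p_topical: assumed probability that the switch y picks a topical word
    """
    n_topics, vocab_size = phi.shape
    tweets = []
    for _ in range(n_tweets):
        z = int(rng.choice(n_topics, p=theta))               # one topic label per tweet
        b = BEHAVIORS[rng.choice(len(BEHAVIORS), p=psi[z])]  # behavior drawn from the topic
        words = []
        for _ in range(n_words):
            y = rng.random() < p_topical                     # switch: True = topical, False = background
            words.append(int(rng.choice(vocab_size, p=phi[z] if y else phi_bg)))
        tweets.append((z, b, words))
    return tweets

# Toy sizes and flat Dirichlet draws, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
n_topics, vocab_size = 4, 20
theta = rng.dirichlet(np.ones(n_topics))
psi = rng.dirichlet(np.ones(len(BEHAVIORS)), size=n_topics)
phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)
phi_bg = rng.dirichlet(np.ones(vocab_size))
tweets = generate_tweets(rng, theta, psi, phi, phi_bg, p_topical=0.7, n_tweets=3, n_words=5)
```

The point of the sketch is the dependency structure: the topic is chosen once per tweet, and both the behavior and the topical words depend on that single topic label.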
9. Outline
• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee Recommendation
• Summary
10. Data sets
• Base data set
– 151,055 Twitter users in Singapore and their tweets
• Our data set
– 5,000 randomly selected users, of whom 1,000 were further
selected to obtain their followees; 9,688 users in total
– Tweets from Sep 1, 2011 to Nov 30, 2011
– Total tweets: 11,882,441
• Preprocessing
– Remove stop words
– Remove words with non-standard characters (URLs, emoticons, etc.)
• Parameter settings (LDA, Twitter-LDA, B-LDA):
– # of topics: T = 80
– α = 50/T, β = 0.01
11. Topic Analysis
• Do the resulting topics in B-LDA have
dominant behaviors?
• Entropy of a topic’s behavior distribution
– B-LDA: p(b|t) is learnt by the model
– LDA and T-LDA: p(b|t) is estimated from co-occurrence counts
– C(t,b): # of times topic t co-occurs with behavior b
– δ: normalization factor
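A minimal sketch of the entropy comparison for the count-based case. The slide's own formula is not preserved in this transcript, so the estimate p(b|t) = C(t,b)/δ with δ = Σ_b C(t,b) is an assumption based on δ being described as a normalization factor; the function names and the counts are hypothetical.

```python
import math

def behavior_distribution(counts):
    """Estimate p(b|t) from co-occurrence counts C(t, b) for one topic t."""
    delta = sum(counts.values())  # assumed normalization factor: total count for the topic
    return {b: c / delta for b, c in counts.items()}

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Hypothetical counts: one topic dominated by POST, one spread evenly over behaviors.
dominant = behavior_distribution({"POST": 97, "RETWEET": 1, "REPLY": 1, "MENTION": 1})
mixed = behavior_distribution({"POST": 25, "RETWEET": 25, "REPLY": 25, "MENTION": 25})
```

With these toy counts the POST-dominated topic has a much lower entropy than the evenly spread one, which is the sense in which low entropy signals a dominant behavior.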
12. Topic Analysis
• Do the resulting topics in B-LDA have
dominant behaviors?
– Low entropy means the topic has dominant behaviors
– B-LDA: topics are enhanced by dominant behavior patterns
13. Topic Analysis
• Topics of distinct behavior patterns
Topics that are similar but have different behaviors
[Figure panels: POST, RETWEET, REPLY]
14. Topic Analysis
• A topic in T-LDA may be split into several topics with different
behavioral patterns in B-LDA
[Figure: T-LDA topics 1 … T are matched to B-LDA topics 1 … T; distance: KL-divergence; matched topics form a topic group]
– 15 topic groups each with two topics
– 1 topic group with three topics
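One way such topic groups could be found is sketched below, under assumptions: each B-LDA topic is assigned to its nearest T-LDA topic by KL-divergence between word distributions, and a T-LDA topic that attracts two or more B-LDA topics forms a group. The direction of the divergence, the smoothing constant `eps`, and the function names are hypothetical choices, not taken from the slides.

```python
from collections import defaultdict

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, smoothed to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def topic_groups(tlda_topics, blda_topics):
    """Map each B-LDA topic to its closest T-LDA topic; keep groups of two or more."""
    groups = defaultdict(list)
    for j, q in enumerate(blda_topics):
        nearest = min(range(len(tlda_topics)),
                      key=lambda i: kl_divergence(tlda_topics[i], q))
        groups[nearest].append(j)
    return {i: js for i, js in groups.items() if len(js) >= 2}

# Toy word distributions over a three-word vocabulary.
tlda = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
blda = [[0.7, 0.2, 0.1], [0.75, 0.1, 0.15], [0.1, 0.2, 0.7]]
groups = topic_groups(tlda, blda)
```

In the toy data, B-LDA topics 0 and 1 are both closest to T-LDA topic 0, so they form one group of two topics.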
15. Topic Analysis
• Topics split by different behavioral patterns
Topic 16 is mainly contributed by news
media accounts, but topic 13 is not.
Topic 61 is a retweet topic and contains
more words with hashtags.
Topics in B-LDA have more distinct
behavioral patterns than those in T-LDA
16. Applications – followee recommendation
• Followee recommendation
– Existing approaches build the user profile from the user’s or the user’s followees’ textual content
– They do not consider behavior patterns
• Behavior matters
– People who use Twitter as an instant messenger follow users
they may interact with
– People who use Twitter as an information source and news
feed follow official news media channels
– Twitter: news media or social network? [Kwak et al., WWW’10]
• Definition: users who care about the behavioral patterns of
their followees, explicitly or implicitly, are “behavior-driven
followers”.
17. Applications – followee recommendation
• Finding behavior-driven followers
– A behavior-driven follower’s followees will naturally form a
small number of clusters within each of which the
followees would share similar behavioral patterns.
– k-nearest-neighbor distance
• S: a given space; U: a set of users; each user v in U has
a set of k nearest neighbors in S
– Behavior-driven index
• ST: the topic space, SB: the joint behavior-topic space, Fu: followees of u
• Behavior-driven follower has a large βK
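The formulas for the k-nearest-neighbor distance and the index are not preserved in this transcript, so the following is only one plausible reading, stated as an assumption: the distance averages each user's distance to its k nearest neighbors within the set, and the index is the ratio of that distance in the topic space to the distance in the joint behavior-topic space, so that followees who cluster tightly by behavior give a large value. Euclidean distance and the function names are also hypothetical.

```python
import numpy as np

def knn_distance(points, k=1):
    """Average distance from each point to its k nearest neighbors (Euclidean, assumed)."""
    x = np.asarray(points, dtype=float)
    if len(x) <= k:
        raise ValueError("need more than k points")
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a user is not its own neighbor
    nearest = np.sort(dists, axis=1)[:, :k]  # the k smallest distances per user
    return float(nearest.mean())

def behavior_driven_index(followees_topic, followees_behavior_topic, k=1):
    """Assumed form of the index: topic-space k-NN distance over behavior-topic-space k-NN distance."""
    return knn_distance(followees_topic, k) / knn_distance(followees_behavior_topic, k)

# Toy followee coordinates: spread out in topic space, two tight clusters in behavior-topic space.
topic_space = [[0, 0], [1, 0], [0, 1], [1, 1]]
behavior_topic_space = [[0, 0], [0, 0.1], [5, 5], [5, 5.1]]
index = behavior_driven_index(topic_space, behavior_topic_space, k=1)
```

Under this reading, followees that look scattered by topic alone but form tight clusters once behavior is included yield an index well above 1. The sketch does not guard against a zero distance in the denominator.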
18. Applications – followee recommendation
• Definition
– βK ≥ τ : behavior-driven follower
– βK < τ : topic-driven follower
• Behavior-driven index
– K = 1, topic space: LDA, joint behavior-topic space: B-LDA
– Half of the users are to some extent behavior-driven
19. Applications – followee recommendation
• Followee recommendation approach [Chen et al., WWW’09]
– For a target user u, we randomly pick one followee from
u’s current followee set, and then combine her with
another m randomly-selected non-followees.
– For these m + 1 users, any recommendation algorithm
would generate a ranking of them in descending order.
– The performance is measured by examining how high the
real followee is ranked.
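The protocol above can be sketched as a small helper. The scoring function `score(target, candidate)` stands in for whichever recommendation model is being evaluated and is hypothetical, as are the helper name and the toy user ids.

```python
import random

def rank_of_real_followee(score, target, followees, non_followees, m, rng):
    """Hide one real followee among m random non-followees and return its 1-based rank."""
    real = rng.choice(sorted(followees))                        # one followee picked at random
    candidates = [real] + rng.sample(sorted(non_followees), m)  # the m + 1 users to rank
    ranked = sorted(candidates, key=lambda v: score(target, v), reverse=True)
    return ranked.index(real) + 1

followees = {"a", "b"}
non_followees = {"c", "d", "e", "f"}

def perfect(target, v):
    """Toy scorer that always prefers real followees."""
    return 1.0 if v in followees else 0.0

def inverted(target, v):
    """Toy scorer that always prefers non-followees."""
    return 0.0 if v in followees else 1.0

best = rank_of_real_followee(perfect, "u", followees, non_followees, m=3, rng=random.Random(0))
worst = rank_of_real_followee(inverted, "u", followees, non_followees, m=3, rng=random.Random(0))
```

A scorer that always prefers real followees gives rank 1, and one that always prefers non-followees gives rank m + 1, which bounds the values the metric can take.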
21. Applications – followee recommendation
• Evaluation
– Rank of the real followee
– Mean reciprocal rank
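Both metrics are simple to compute from a list of 1-based ranks of the real followee, one per test case. The sketch below uses hypothetical rank lists to show that the two metrics can disagree.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over test cases; ranks are 1-based."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def average_rank(ranks):
    """Arithmetic mean of the ranks; lower is better."""
    return sum(ranks) / len(ranks)

erratic = [1, 1, 10]  # hypothetical model: often perfect, occasionally very poor
steady = [2, 2, 2]    # hypothetical model: consistently near the top
```

Here the erratic list has the higher MRR (0.7 against 0.5) but the worse average rank (4 against 2), because MRR rewards top-1 hits heavily while the average rank is pulled up by a single bad case.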
22. Applications – followee recommendation
• Evaluation
– A smaller neighbourhood size K gives better results
– B-LDA and T-LDA rank real followees higher than LDA, with a smaller
deviation
– Adding behaviours to topic modelling helps the task: B-LDA > T-LDA
– LDA: better MRR but poorer average rank; LDA is not robust and performs
particularly well or particularly badly on some sets of users
23. Applications – followee recommendation
• Study on behavior-driven index
– Correlation between DKNN and Rank of the real followee
– Correlation between βK and relative rank rLDA/rBLDA
– β1 is used to judge whether a given user is a behavior-driven or a
topic-driven follower
24. Applications – followee recommendation
• Topic-driven follower vs. Behavior-driven follower
• Results on behavior-driven followers
B-LDA performs significantly better than
LDA for behavior-driven followers.
25. Applications – followee recommendation
• A combined followee recommendation method
(comModel)
– Using behavior-driven index to choose model
• Model selection
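A minimal sketch of the selection rule, assuming it follows the earlier definition (βK ≥ τ marks a behavior-driven follower): B-LDA is used for behavior-driven followers and LDA for topic-driven ones. The function name and the threshold value used in the example are hypothetical; the slides do not state τ.

```python
def choose_model(beta_k, tau):
    """Pick the profiling model for one target user from the behavior-driven index."""
    # Behavior-driven followers (index at or above the threshold) get B-LDA,
    # topic-driven followers get plain LDA.
    return "B-LDA" if beta_k >= tau else "LDA"

# Hypothetical threshold, used only for illustration.
TAU = 1.0
choices = [choose_model(b, TAU) for b in (0.8, 1.0, 1.5)]
```

The combined recommender then scores candidates with whichever model was chosen for that target user.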
26. Applications – followee recommendation
• Comparisons of comModel, B-LDA and LDA
– Rank of the real followee and MRR
– Cumulative distribution of ranks (CDR) for real followees
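The cumulative distribution of ranks can be computed as the fraction of test cases in which the real followee is ranked at or above each position. The exact definition used in the slides is not given, so this is an assumed, conventional form with a hypothetical function name and toy ranks.

```python
def cumulative_distribution_of_ranks(ranks, max_rank):
    """For each position k = 1..max_rank, the fraction of cases with rank <= k."""
    n = len(ranks)
    return [sum(1 for r in ranks if r <= k) / n for k in range(1, max_rank + 1)]

# Toy ranks of the real followee over four test cases.
cdr = cumulative_distribution_of_ranks([1, 2, 2, 4], max_rank=4)
```

A curve that rises faster means more real followees are ranked near the top.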
27. Summary
• We propose B-LDA, a behaviour-integrated topic model
based on LDA
• Comparison of B-LDA with LDA and Twitter-LDA
– Experimental results show that B-LDA can find topics with dominant
behaviours
– We propose an index βK to characterize users who are behaviour-driven
followers, and demonstrate that B-LDA significantly
outperforms other models on followee recommendation for
behaviour-driven followers.
– Based on the βK index, we propose a new recommendation framework
combining B-LDA and LDA which gives promising recommendations.
29. References
• [Zhao et al., ECIR’10] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan,
and X. Li, “Comparing Twitter and traditional media using topic models,”
in ECIR, 2011, pp. 338–349.
• [Kwak et al., WWW’10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is
Twitter, a social network or a news media?” in WWW, 2010, pp. 591–600.
• [Chen et al., WWW’09] W.-Y. Chen, J.-C. Chu, J. Luan, H. Bai, Y. Wang, and E. Y. Chang,
“Collaborative filtering for Orkut communities: discovery of user latent
behavior,” in WWW, 2009, pp. 681–690.
Editor's Notes
y = 0: the word is drawn from the background word distribution
In contrast to LDA, B-LDA generates topics each enhanced by a user behaviour distribution, which is denoted as ψt,b in the output. Just as LDA is expected to generate topics each containing the words most relevant to a coherent topic, we would like B-LDA to generate topics which are identified with some dominant behaviour. Delta is a normalization factor.
For the “PO” dimension, topic 16 is related to daily news and is contributed mainly by news media accounts, which mostly show no behavior other than posting. Topic 23 consists mostly of users’ daily personal updates, which seldom prompt others to retweet or reply. Topic 71 is also related to personal updates, but more about things such as cell phones and laptops, and it tends to have more sentiment words like ‘omg’ and ‘damn’ than topic 23. The top-4 topics in the “RT” dimension are topic 51, a mixture of jokes and interesting things shared by users @SoSingaporean and @BvsSG; topic 70, popular quotes; topic 54, daily horoscopes; and topic 52, which is related to a music event, the MAMA concert. We can also tell that the topical words used in replies are more informal than those of the other behavior types, which makes these topics hard to label.
This observation seems to echo the findings in [] that Twitter functions both as ..
Existing approaches consider only topical interests, regardless of behavior patterns.
E.g., a music fan may follow clusters of users: music celebrities, official media channels, fan clubs, etc.
The model itself is not designed for followee recommendation; we just want to try these models on the followee recommendation task to see which one is more suitable for profiling users for the task.