It Is Not Just What We Say, But How We Say Them:
Joint Behaviour-Topic Modelling

Minghui QIU, Feida ZHU and Jing JIANG

Singapore Management University
Microblogs
• Rich user interactions with textual information (posting behaviors): POST, RETWEET, REPLY, MENTION

Why do we need to consider user behaviors?
Observation 1: users with similar topics of
interest can have different behavioral patterns
• Users who are interested in the ‘politics’ topic

Different behaviors that people exhibit on Twitter suggest different motivations for using the platform.
Observation 2: user clusters with distinct behavioral
patterns usually represent different user profiles

• Top 5 users who most frequently post tweets about the topic ‘politics’

Official news media accounts

It Is Not Just What We Say, But How We Say Them
• The way people interact with text is critical to understanding user behavior patterns and modeling user interests in social networks

• Goal: jointly model users' topic interests and how they interact with those topics in microblogs such as Twitter
Outline
• Topic Modeling in Twitter
• Joint behavior-topic model
• Applications and Empirical Results
– Topic analysis
– User clustering
– Followee Recommendation

• Summary
Topic Modeling in Twitter
• Twitter
– 140 character limit
– Noisy tweets
• Comparison between LDA and Twitter-LDA [Zhao et al., ECIR’10]

                     LDA                      T-LDA (Twitter-LDA)
  Document           Document                 All tweets of a given Twitter user
  Words              Words                    Words in the user's tweets
  Topic assignment   Each word has a topic    Each tweet has a topic
  Word pools         Topical words            Topical words or background words

• Goal: extend T-LDA to jointly model a user's topic interests and interactions.
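LDA-based Behaviour-Topic Modelling – B-LDA
[Figure: B-LDA plate diagram, with plates over U users, N tweets and L words, and T topic-level distributions]
• U: # of users
• N: # of tweets
• L: # of words
• θ: a user's topic distribution
• z: a topic label
• b: a behavior
• y: a switch between topical words and background words (y = 0: background word)
• Per topic (T in total): the topic's word distribution and the topic's behavior distribution
• One background word distribution
• γ: a hyperparameter shown in the diagram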
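The plate diagram can be read as a generative process. The sketch below is a reconstruction from the diagram under stated assumptions, not the authors' code: symmetric Dirichlet priors, a single global Beta(γ, γ) prior on the switch probability (as in Twitter-LDA), a behavior set taken from the POST/RETWEET/REPLY/MENTION list, and hypothetical parameter names such as `behavior_prior`.

```python
import random

# Assumed behavior set, taken from the POST/RETWEET/REPLY/MENTION list on the Microblogs slide.
BEHAVIORS = ["POST", "RETWEET", "REPLY", "MENTION"]


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) of dimension k by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # every draw underflowed (possible for tiny alpha); fall back to uniform
        return [1.0 / k] * k
    return [d / total for d in draws]


def generate_corpus(num_users, tweets_per_user, words_per_tweet, vocab_size,
                    num_topics, alpha, beta, behavior_prior, gamma, seed=0):
    """Sample a toy corpus from a B-LDA-style generative process (hypothetical sketch)."""
    rng = random.Random(seed)
    topics = range(num_topics)
    vocab = range(vocab_size)
    # T topic-level distributions: words (phi_t) and behaviors (psi_t).
    phi = [sample_dirichlet(beta, vocab_size, rng) for _ in topics]
    psi = [sample_dirichlet(behavior_prior, len(BEHAVIORS), rng) for _ in topics]
    # One background word distribution shared by all tweets.
    phi_background = sample_dirichlet(beta, vocab_size, rng)
    # Probability that a word is topical (y = 1) rather than background (y = 0).
    pi = rng.betavariate(gamma, gamma)

    corpus = []
    for _ in range(num_users):
        theta = sample_dirichlet(alpha, num_topics, rng)  # user's topic distribution
        tweets = []
        for _ in range(tweets_per_user):
            z = rng.choices(topics, weights=theta)[0]        # one topic per tweet
            b = rng.choices(BEHAVIORS, weights=psi[z])[0]    # behavior drawn from psi_z
            words = []
            for _ in range(words_per_tweet):
                y = 1 if rng.random() < pi else 0            # the switch
                dist = phi[z] if y == 1 else phi_background
                words.append(rng.choices(vocab, weights=dist)[0])
            tweets.append({"topic": z, "behavior": b, "words": words})
        corpus.append(tweets)
    return corpus
```

For example, `generate_corpus(3, 4, 5, 20, 2, alpha=25.0, beta=0.01, behavior_prior=0.1, gamma=20.0)` returns three users' tweets, each a dict holding its topic, behavior and word ids. Inference (e.g. Gibbs sampling) would invert this process; it is not shown here.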
Data sets
• Base data set
– 151,055 Twitter users in Singapore and their tweets

• Our data set
– 5,000 randomly selected users; for 1,000 of them we also collected their followees, giving 9,688 users in total
– Tweets from Sep 1, 2011 to Nov 30, 2011
– Total: 11,882,441 tweets

• Preprocessing
– Remove stop words
– Remove words with non-standard characters (URLs, emoticons, etc.)

• Parameter settings (LDA, Twitter-LDA, B-LDA):
– # of topics: T = 80
– α = 50/T (= 0.625), β = 0.01
Topic Analysis
• Do the topics learned by B-LDA have dominant behaviors?
• Entropy of a topic's behavior distribution: H(t) = −Σ_b p(b|t) log p(b|t)

– B-LDA: p(b|t) is learned directly by the model
– LDA and T-LDA: p(b|t) = C(t,b) / δ

– C(t,b): # of times topic t co-occurs with behavior b
– δ: normalization factor, δ = Σ_b C(t,b)
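For LDA and T-LDA, the behavior distribution and its entropy can be computed from co-occurrence counts. A minimal sketch, assuming each observation is a (topic, behavior) pair: one per tweet for T-LDA (each tweet has a topic) and presumably one per word for LDA. Natural logarithms are an assumption; the base only rescales H(t).

```python
import math
from collections import Counter


def behavior_distribution(counts):
    """p(b|t) = C(t,b) / delta, with delta = sum over b of C(t,b)."""
    delta = sum(counts.values())
    return {b: c / delta for b, c in counts.items()}


def entropy(dist):
    """H = -sum_b p(b) log p(b); zero-probability behaviors contribute nothing."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def topic_behavior_entropies(assignments):
    """assignments: iterable of (topic, behavior) pairs. Returns {topic: H(topic)}."""
    counts = {}
    for topic, behavior in assignments:
        counts.setdefault(topic, Counter())[behavior] += 1  # C(t, b)
    return {t: entropy(behavior_distribution(c)) for t, c in counts.items()}
```

A topic whose tweets are all POST gets entropy 0 (a single dominant behavior), while a topic split evenly between POST and RETWEET gets log 2.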
Topic Analysis
• Do the topics learned by B-LDA have dominant behaviors?

– Low entropy means the topic has a dominant behavior
– B-LDA: topics are enhanced by dominant behavior patterns
Topic Analysis
• Topics with distinct behavior patterns
[Figure: top topics for the POST, RETWEET and REPLY behaviors, including topics that are similar in content but associated with different behaviors]
Topic Analysis
• A topic in T-LDA may be split into several B-LDA topics with different behavioral patterns
[Figure: T-LDA topics and B-LDA topics (1 to T each) are matched using KL-divergence as the distance; matched topics form a topic group]
– 15 topic groups with two topics each
– 1 topic group with three topics
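The matching step can be sketched as follows. The matching direction and the grouping rule are assumptions: each B-LDA topic is assigned to the T-LDA topic with the smallest KL(B-LDA topic || T-LDA topic), and a T-LDA topic counts as split when two or more B-LDA topics are assigned to it. The `eps` smoothing is also an assumption, added to avoid log(0).

```python
import math
from collections import defaultdict


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two word distributions over the same vocabulary; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def group_topics(blda_topics, tlda_topics):
    """Assign each B-LDA topic to its closest T-LDA topic, then keep the T-LDA topics
    that received two or more B-LDA topics (i.e. topics split by behavior)."""
    groups = defaultdict(list)
    for i, p in enumerate(blda_topics):
        best = min(range(len(tlda_topics)), key=lambda j: kl_divergence(p, tlda_topics[j]))
        groups[best].append(i)
    return {j: members for j, members in groups.items() if len(members) >= 2}
```

With two T-LDA topics and three B-LDA topics, two of which resemble the first T-LDA topic, the result is `{0: [0, 1]}`: one topic group with two topics.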
Topic Analysis
• Topics split by different behavioral patterns

Topic 16 is mainly contributed by news media accounts, but topic 13 is not.
Topic 61 is a retweet topic and contains more words with hashtags.
Topics in B-LDA have more distinct behavioral patterns than those in T-LDA
Applications – followee recommendation
• Followee recommendation
– Existing approaches build the user profile from the textual content of the user or of the user's followees
– They do not consider behavior patterns

• Behavior matters
– People who use Twitter as an instant messenger follow users they are likely to interact with
– People who use Twitter as an information source and news feed follow official news media channels
– Is Twitter a news media or a social network? [Kwak et al., WWW’10]

• Definition: users who care about the behavioral patterns of their followees, explicitly or implicitly, are “behavior-driven followers”.
Applications – followee recommendation
• Finding behavior-driven followers
– A behavior-driven follower's followees naturally form a small number of clusters, within each of which the followees share similar behavioral patterns.
– k-nearest-neighbor distance

• Defined over a given space S and a set of users U, using each user v's k nearest neighbors

– Behavior-driven index

• S_T: the topic space, S_B: the joint behavior-topic space, F_u: the followees of u
• A behavior-driven follower has a large βK
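The exact formulas for the k-nearest-neighbor distance and βK are not recoverable from these slides, so the following is only one plausible form, stated as an assumption: D_KNN(S, U) is the average over users v in U of the mean Euclidean distance to v's k nearest neighbors in U, and βK(u) = D_KNN(S_T, F_u) / D_KNN(S_B, F_u). Under this form, followees that cluster tightly once behaviors are included give a large βK, which matches the slide's description. The Euclidean metric and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance(vectors, k, dist=euclidean):
    """Assumed D_KNN: average over users of the mean distance to their k nearest other users."""
    n = len(vectors)
    if n <= k:
        raise ValueError("need more than k users")
    total = 0.0
    for i, v in enumerate(vectors):
        nearest = sorted(dist(v, w) for j, w in enumerate(vectors) if j != i)[:k]
        total += sum(nearest) / k
    return total / n


def behavior_driven_index(topic_vectors, joint_vectors, k=1):
    """Hypothetical beta_K(u): kNN distance of u's followees in the topic space S_T
    divided by their kNN distance in the joint behavior-topic space S_B."""
    denominator = knn_distance(joint_vectors, k)
    if denominator == 0.0:
        return math.inf
    return knn_distance(topic_vectors, k) / denominator
```

In practice the two spaces may need normalizing so that their distances are comparable; that step is not shown.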
Applications – followee recommendation
• Definition (τ: a threshold)
– βK ≥ τ: behavior-driven follower
– βK < τ: topic-driven follower

• Behavior-driven index
– Setting: K = 1, topic space from LDA, joint behavior-topic space from B-LDA
– Half of the users are behavior-driven to some extent
Applications – followee recommendation
• Followee recommendation approach [Chen et al., WWW’09]
– For a target user u, we randomly pick one followee from u’s current followee set and combine her with m other randomly selected non-followees.
– For these m + 1 users, a recommendation algorithm generates a ranking in descending order of score.
– Performance is measured by how high the real followee is ranked.
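One trial of this protocol can be sketched as below. `score(user, candidate)` is a hypothetical stand-in for whichever model (LDA, T-LDA or B-LDA) produces recommendation scores; the shuffle before sorting is a design choice so that score ties do not favor the real followee.

```python
import random


def rank_of_real_followee(score, user, followees, non_followees, m, rng):
    """One trial: mix one real followee with m random non-followees, rank the m + 1
    candidates by descending score, and return the real followee's 1-based rank."""
    real = rng.choice(sorted(followees))
    candidates = [real] + rng.sample(sorted(non_followees), m)
    rng.shuffle(candidates)  # so that score ties are not broken in favor of the real followee
    ranked = sorted(candidates, key=lambda c: score(user, c), reverse=True)
    return ranked.index(real) + 1
```

Repeating the trial over many target users yields the list of ranks that the evaluation metrics summarize.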
Applications – followee recommendation
• Evaluation
– Rank of the real followee

– Mean reciprocal rank (MRR), the average of 1/rank over target users
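Both metrics can be computed from the per-user ranks of the real followee. A minimal sketch; the function names are illustrative, and the population standard deviation is an assumed choice for the "deviation" mentioned in the results.

```python
from statistics import mean, pstdev


def mean_reciprocal_rank(ranks):
    """MRR = average of 1 / rank of the real followee (1 is best)."""
    return mean([1.0 / r for r in ranks])


def rank_summary(ranks):
    """Average rank of the real followee (lower is better) and its standard deviation."""
    return mean(ranks), pstdev(ranks)
```

The two metrics can disagree. With made-up ranks, a model that ranks [1, 1, 50] gets MRR ≈ 0.673 but a mean rank of ≈ 17.3, while one that ranks [2, 2, 2] gets MRR 0.5 and a mean rank of 2. A model that is very good for some users and poor for others can therefore win on MRR yet lose on average rank.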
Applications – followee recommendation
• Evaluation

– A smaller neighbourhood size K gives better results
– B-LDA and T-LDA rank real followees higher than LDA, with smaller deviation
– Adding behaviours to topic modelling helps the task: B-LDA > T-LDA
– LDA: better MRR but worse average rank; LDA is not robust, performing particularly well on some sets of users and poorly on others
Applications – followee recommendation
• Study on the behavior-driven index
– Correlation between DKNN and the rank of the real followee

– Correlation between βK and the relative rank rLDA/rBLDA

– β1 is used to judge whether a given user is a behavior-driven or a topic-driven follower
Applications – followee recommendation
• Topic-driven followers vs. behavior-driven followers

• Results on behavior-driven followers

B-LDA performs significantly better than LDA for behavior-driven followers.
Applications – followee recommendation
• A combined followee recommendation method (comModel)
– Uses the behavior-driven index to choose which model to apply

• Model selection
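Following the threshold definition of behavior-driven and topic-driven followers, the model-selection step can be sketched as below. The threshold value τ is not given in the slides, and the per-model scorer functions are hypothetical placeholders for B-LDA and LDA recommendation scores.

```python
from typing import Callable, Dict, List

# A scorer maps (user, candidate) to a recommendation score; higher is better.
Scorer = Callable[[str, str], float]


def choose_model(beta_k: float, tau: float) -> str:
    """Behavior-driven followers (beta_K >= tau) get B-LDA; topic-driven ones get LDA."""
    return "B-LDA" if beta_k >= tau else "LDA"


def com_model_rank(user: str, candidates: List[str], beta_k: float, tau: float,
                   scorers: Dict[str, Scorer]) -> List[str]:
    """Rank candidates for `user` with the model selected by the behavior-driven index."""
    score = scorers[choose_model(beta_k, tau)]
    return sorted(candidates, key=lambda c: score(user, c), reverse=True)
```

The design keeps a single recommender per user rather than blending scores, which mirrors the "choose model" wording on the slide.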
Applications – followee recommendation
• Comparisons of comModel, B-LDA and LDA
– Rank of the real followee and MRR

– Cumulative distribution of ranks (CDR) for real followees
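The cumulative distribution of ranks can be read as: for each cut-off r, the fraction of trials in which the real followee was ranked at position r or better. A minimal sketch, where `max_rank` would be m + 1 in the evaluation protocol; the function name is illustrative.

```python
def cumulative_rank_distribution(ranks, max_rank):
    """CDR(r) = fraction of trials whose real followee is ranked at position <= r, r = 1..max_rank."""
    n = len(ranks)
    return [sum(1 for x in ranks if x <= r) / n for r in range(1, max_rank + 1)]
```

For ranks [1, 2, 2, 4] and max_rank = 4 this gives [0.25, 0.75, 0.75, 1.0]; a curve that rises faster indicates a better recommender.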
Summary
• We propose B-LDA, a behaviour-integrated topic model based on LDA
• Comparison of B-LDA with LDA and Twitter-LDA
– Experimental results show that B-LDA can find topics with dominant behaviours
– We propose an index βK to characterize users who are behaviour-driven followers, and demonstrate that B-LDA significantly outperforms other models on followee recommendation for behaviour-driven followers.
– Based on the βK index, we propose a new recommendation framework combining B-LDA and LDA, which gives promising recommendations.
• Thanks!
References
• [Zhao et al., ECIR’10] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in ECIR, 2011, pp. 338–349.
• [Kwak et al., WWW’10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010, pp. 591–600.
• [Chen et al., WWW’09] W.-Y. Chen, J.-C. Chu, J. Luan, H. Bai, Y. Wang, and E. Y. Chang, “Collaborative filtering for Orkut communities: discovery of user latent behavior,” in WWW, 2009, pp. 681–690.

More Related Content

What's hot

Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deli
Vinay Singri
 
Tag And Tag Based Recommender
Tag And Tag Based RecommenderTag And Tag Based Recommender
Tag And Tag Based Recommender
gu wendong
 
Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deli
Vinay Singri
 

What's hot (20)

Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deli
 
Filtering content bbased crs
Filtering content bbased crsFiltering content bbased crs
Filtering content bbased crs
 
Tag And Tag Based Recommender
Tag And Tag Based RecommenderTag And Tag Based Recommender
Tag And Tag Based Recommender
 
Email Classification
Email ClassificationEmail Classification
Email Classification
 
Tag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deliTag recommendation in social bookmarking sites like deli
Tag recommendation in social bookmarking sites like deli
 
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
 
Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and Applications
 
[ADMA 2017] Identification of Grey Sheep Users By Histogram Intersection In R...
[ADMA 2017] Identification of Grey Sheep Users By Histogram Intersection In R...[ADMA 2017] Identification of Grey Sheep Users By Histogram Intersection In R...
[ADMA 2017] Identification of Grey Sheep Users By Histogram Intersection In R...
 
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similari...
 
Social recommender system
Social recommender systemSocial recommender system
Social recommender system
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
Tutorial on query auto completion
Tutorial on query auto completionTutorial on query auto completion
Tutorial on query auto completion
 
Tag based recommender system
Tag based recommender systemTag based recommender system
Tag based recommender system
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
 
Recommender systems
Recommender systemsRecommender systems
Recommender systems
 
Developing a Secured Recommender System in Social Semantic Network
Developing a Secured Recommender System in Social Semantic NetworkDeveloping a Secured Recommender System in Social Semantic Network
Developing a Secured Recommender System in Social Semantic Network
 
Content based filtering
Content based filteringContent based filtering
Content based filtering
 
Selection of Tags for Tag Clouds
Selection of Tags for Tag CloudsSelection of Tags for Tag Clouds
Selection of Tags for Tag Clouds
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative Filtering
 

Viewers also liked

Social Media Conference: Eric Goubin - wat verandert in social media
Social Media Conference: Eric Goubin -  wat verandert in social mediaSocial Media Conference: Eric Goubin -  wat verandert in social media
Social Media Conference: Eric Goubin - wat verandert in social media
Opening-up.eu
 
Brett McCoy - Turning Fans into Fanatics
Brett McCoy - Turning Fans into FanaticsBrett McCoy - Turning Fans into Fanatics
Brett McCoy - Turning Fans into Fanatics
SocialCrush
 

Viewers also liked (8)

Social Media Conference: Eric Goubin - wat verandert in social media
Social Media Conference: Eric Goubin -  wat verandert in social mediaSocial Media Conference: Eric Goubin -  wat verandert in social media
Social Media Conference: Eric Goubin - wat verandert in social media
 
Brett McCoy - Turning Fans into Fanatics
Brett McCoy - Turning Fans into FanaticsBrett McCoy - Turning Fans into Fanatics
Brett McCoy - Turning Fans into Fanatics
 
Strategies for Stimulating Traffic and Loyalty
Strategies for Stimulating Traffic and LoyaltyStrategies for Stimulating Traffic and Loyalty
Strategies for Stimulating Traffic and Loyalty
 
Social Media as Part of the Wider Mediasphere
Social Media as Part of the Wider MediasphereSocial Media as Part of the Wider Mediasphere
Social Media as Part of the Wider Mediasphere
 
Social Media 101
Social Media 101Social Media 101
Social Media 101
 
Social Crush: Social Media Case Study
Social Crush: Social Media Case StudySocial Crush: Social Media Case Study
Social Crush: Social Media Case Study
 
Social Media Strategy
Social Media StrategySocial Media Strategy
Social Media Strategy
 
Social media - and why you should use it!
Social media - and why you should use it!Social media - and why you should use it!
Social media - and why you should use it!
 

Similar to 13 sdm-blda-slides

MyPlan - similarity metrics for matching lifelong learner timelines
MyPlan - similarity metrics for matching lifelong learner timelinesMyPlan - similarity metrics for matching lifelong learner timelines
MyPlan - similarity metrics for matching lifelong learner timelines
Nicolas Van Labeke
 
#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics
Soudé Fazeli
 

Similar to 13 sdm-blda-slides (20)

Recommenders.ppt
Recommenders.pptRecommenders.ppt
Recommenders.ppt
 
Recommenders.ppt
Recommenders.pptRecommenders.ppt
Recommenders.ppt
 
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
 
Analyzing User Modeling on Twitter for Personalized News Recommendations
Analyzing User Modeling on Twitter for Personalized News RecommendationsAnalyzing User Modeling on Twitter for Personalized News Recommendations
Analyzing User Modeling on Twitter for Personalized News Recommendations
 
MyPlan - similarity metrics for matching lifelong learner timelines
MyPlan - similarity metrics for matching lifelong learner timelinesMyPlan - similarity metrics for matching lifelong learner timelines
MyPlan - similarity metrics for matching lifelong learner timelines
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems -
 
Predicting Answering Behaviour in Online Question Answering Communities
Predicting Answering Behaviour in Online Question Answering CommunitiesPredicting Answering Behaviour in Online Question Answering Communities
Predicting Answering Behaviour in Online Question Answering Communities
 
Lecture 5: How to make the Social Web Personalized? (VU Amsterdam Social Web ...
Lecture 5: How to make the Social Web Personalized? (VU Amsterdam Social Web ...Lecture 5: How to make the Social Web Personalized? (VU Amsterdam Social Web ...
Lecture 5: How to make the Social Web Personalized? (VU Amsterdam Social Web ...
 
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
Exploring Generative Models of Tripartite Graphs for Recommendation in Social...
 
#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics
 
The User Side of Personalization: How Personalization Affects the Users
The User Side of Personalization: How Personalization Affects the UsersThe User Side of Personalization: How Personalization Affects the Users
The User Side of Personalization: How Personalization Affects the Users
 
NISO Update ODI June 2014 Morse
NISO Update ODI June 2014 MorseNISO Update ODI June 2014 Morse
NISO Update ODI June 2014 Morse
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
 
The Power of Known Peers: A Study in Two Domains
The Power of Known Peers: A Study in Two DomainsThe Power of Known Peers: A Study in Two Domains
The Power of Known Peers: A Study in Two Domains
 
Scalable Topic-Specific Influence Analysis on Microblogs
Scalable Topic-Specific Influence Analysis on MicroblogsScalable Topic-Specific Influence Analysis on Microblogs
Scalable Topic-Specific Influence Analysis on Microblogs
 
Are topic-specific search term, journal name and author name recommendations ...
Are topic-specific search term, journal name and author name recommendations ...Are topic-specific search term, journal name and author name recommendations ...
Are topic-specific search term, journal name and author name recommendations ...
 
Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling ...
Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling ...Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling ...
Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling ...
 
case based recommendation approach for market basket data
case based recommendation approach for market basket datacase based recommendation approach for market basket data
case based recommendation approach for market basket data
 
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in MicroblogsOn Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
 
Lagace - Copyright Clearance Center April 2, 2015
Lagace - Copyright Clearance Center April 2, 2015Lagace - Copyright Clearance Center April 2, 2015
Lagace - Copyright Clearance Center April 2, 2015
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

13 sdm-blda-slides

  • 1. It Is Not Just What We Say, But How We Say Them: Joint Behaviour-Topic Modelling Minghui QIU, Feida ZHU and Jing JIANG Singapore Management University
  • 2. Microblogs • Rich user interactions with textual information (posting behaviors) POST RETWEET REPLY MENTION Why do we need to consider user behaviors? 2
  • 3. Observation 1: users with similar topics of interest can have different behavioral patterns • Users who are interested in `politics’ topic Different behaviors people exhibit in Twitter suggest different motivations using the platform. 3
  • 4. Observation 2: user clusters with distinct behavioral patterns usually represent different user profiles • Top 5 users who frequently post tweets about the topic `politics’ Official news media accounts 4
  • 5. IT Is Not Just What We Say, But How we Say Them • The way people interact with text is critical in understanding user behavior patterns and modeling user interest in social networks • To joint model the topic interests and interactions of a user with the topic in Microbloggs like Twitter 5
  • 6. Outline • Topic Modeling in Twitter • Joint behavior-topic model • Applications and Empirical Results – Topic analysis – User clustering – Followee Recommendation • Summary
  • 7. Topic Modeling in Twitter • Twitter – 140 character limit – Noisy tweets • Comparison between LDA and Twitter-LDA [Zhao et al., ECIR’10] LDA Document T-LDA (Twitter-LDA) All tweets of a given Twitter user Words Words in user’s tweets Topic assignment Each word has a topic Each tweet has a topic Word pools Topical words Topical words or background words To extend T-LDA to jointly model the topic interests and interactions of a user.
  • 8. LDA-based Behaviour-Topic Modelling – B-LDA topic’s behavior T distribution b user’s topic distribution θ z w T y •U: # of users •N: # of tweets •L: # of words •z: a topic label •y: a switch L topic’s word distribution N U γ background word distribution. background word distribution. 8
  • 9. Outline • Topic Modeling in Twitter • Joint behavior-topic model • Applications and Empirical Results – Topic analysis – User clustering – Followee Recommendation • Summary
  • 10. Data sets • Base data set – 151,055 twitter users in Singapore and their tweets • Our data set – Randomly selected 5000 users, among whom 1000 are further selected to obtain their followees, totally 9688 users – Tweets from Sep 1, 2011 to Nov 30.2011 – Total tweets: 11,882,441 tweets • Preprocess – Remove stop words – Remove words with non-standard characters (url, emoticon etc.) • Parameters setting (LDA, Twitter-LDA, B-LDA): – # of topics: T = 80 – α = 50/T, β = 0.01 10
  • 11. Topic Analysis • Whether the resulting topics in B-LDA has some dominant behaviors? • Entropy on topic’s behavior distribution – B-LDA: p(b|t) could be learnt – LDA and T-LDA: – C(t,b): # of times topic t co-occurs with behavior b – δ: normalization factor 11
  • 12. Topic Analysis • Whether the resulting topics in B-LDA has some dominant behaviors? – Low entropy means the topic is with dominant behaviors – B-LDA: topic is enhanced by dominant behavior patterns 12
  • 13. Topic Analysis • Topics of distinct behavior patterns Topics that are similar but with different behaviors POST RETWEET REPLY 13
  • 14. Topics Analysis • Topics in T-LDA would be split by different behavioral patterns in B-LDA T-LDA B-LDA 1 1 2 Distance: KL-divergence … … T 2 Topic group T – 15 topic groups each with two topics – 1 topic group with three topics 14
  • 15. Topics Analysis • Topics split by different behavioral patterns Topic 16 is mainly contributed by new media accounts, but topic 13 is not. Topic 61 is a retweet topic and contains more words with hashtags. Topics in B-LDA are with more distinct behavioral pattern than those in T-LDA 15
  • 16. Applications – followee recommendation • Followee recommendation – User profile: user’s or user’s followees’ textual content – Does not consider behavior patterns • Behavior-matters – People who use Twitter as instant massager: follow users who they may interact with – People who use Twitter as information source and news feeds: follow official new media channels. – Twitter - news media or social network [Kwak et al., WWW’10] • Definition: users who cares about the behavioral patterns of their followees, explicitly or implicitly, are “behavior-driven followers”. 16
  • 17. Applications – followee recommendation • Finding behavior-driven followers – A behavior-driven follower’s followees will naturally form a small number of clusters within each of which the followees would share similar behavioral patterns. – k-nearest-neighbor distance • S: a given space, U: a set of users, : user v’s k-nearest-neighbors – Behavior-driven index • ST: the topic space, SB: the joint behavior-topic space, Fu: followees of u • Behavior-driven follower has a large βK 17
  • 18. Applications – followee recommendation • Definition – βK ≥ τ : behavior-driven follower – βK < τ : topic-driven follower • Behavior-driven index – K = 1, topic space: LDA, joint behavior-topic space: B-LDA – Half of users are to some extent behavior-driven 18
  • 19. Applications – followee recommendation • Followee recommendation approach [Chen et al., WWW’09] – For a target user u, we randomly pick one followee from u’s current followee set, and then combine her with another m randomly-selected non-followees. – For these m + 1 users, any recommendation algorithm would generate a ranking of them in descending order. – The performance is measured by examining how high the real followee is ranked. 19
  • 20. Applications – followee recommendation • Followee recommendation approach [Chen et al., WWW’09] 20
  • 21. Applications – followee recommendation • Evaluation – Rank of the real followee – Mean reciprocal rank
  • 22. Applications – followee recommendation • Evaluation – Smaller neighbourhood size K has better results – BLDA and TLDA ranks real followees higher than LDA with a smaller deviation than LDA – Adding behaviours to topic modelling help the task: BLDA > TLDA – LDA: better MRR but low average rank: LDA is not robust and performs 22 particular well or worse on some set of users
  • 23. Applications – followee recommendation • Study on behavior-driven index – Correlation between DKNN and Rank of the real followee – Correlation between βK and relative rank rLDA/rBLDA – β1 will be used judge whether a given user is behavior-driven or topic driven follower
  • 24. Applications – followee recommendation • Topic-driven follower vs. Behavior-driven follower • Results on behavior-driven follower BLDA significantly performs better than LDA on behavior-driven followees. 24
  • 25. Applications – followee recommendation • A combined followee recommendation method (comModel) – Using behavior-driven index to choose model • Model selection
  • 26. Applications – followee recommendation • Comparison of comModel, B-LDA and LDA – Rank of the real followee and MRR – Cumulative distribution of ranks (CDR) for real followees
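The CDR curve is read here as: for each cut-off r, the fraction of test users whose real followee appears within the top r of the m + 1 candidates. That definition is an assumption, since the slide only names the measure; the function name is hypothetical.

```python
from typing import List, Sequence


def cumulative_rank_distribution(ranks: Sequence[int], max_rank: int) -> List[float]:
    """Entry r-1 is the fraction of users whose real followee was ranked at position <= r."""
    n = len(ranks)
    return [sum(1 for r in ranks if r <= cutoff) / n for cutoff in range(1, max_rank + 1)]


print(cumulative_rank_distribution([1, 2, 2, 5], max_rank=5))  # [0.25, 0.75, 0.75, 0.75, 1.0]
```

A curve that rises faster indicates a model that places more real followees near the top; unlike a single MRR value, it shows where in the ranking one model overtakes another.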
  • 27. Summary • We propose B-LDA, a behaviour-integrated topic model based on LDA • Comparison of B-LDA with LDA and Twitter-LDA – Experimental results show that B-LDA can find topics with dominant behaviours – We propose an index βK to characterize users who are behaviour-driven followers, and demonstrate that B-LDA significantly outperforms the other models on followee recommendation for behaviour-driven followers. – Based on the βK index, we propose a new recommendation framework combining B-LDA and LDA, which gives promising recommendations.
  • 29. References • [Zhao et al., ECIR’10] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in ECIR, 2011, pp. 338–349. • [Kwak et al., WWW’10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010, pp. 591–600. • [Chen et al., WWW’09] W.-Y. Chen, J.-C. Chu, J. Luan, H. Bai, Y. Wang, and E. Y. Chang, “Collaborative filtering for Orkut communities: discovery of user latent behavior,” in WWW, 2009, pp. 681–690.

Editor's Notes

  1. y = 0: the word is drawn from the background word distribution.
  2. In contrast to LDA, B-LDA generates topics each enhanced by a user behaviour distribution, which is denoted as ψt,b in the output. Just like LDA is expected to generate topics each containing words most relevant to a coherent topic, we would like B-LDA to generate topics which are identified with some dominant behaviour. Delta is a normalization factor.
  3. In contrast to LDA, B-LDA generates topics each enhanced by a user behavior distribution, which is denoted as ψt,b in the output. Just like LDA is expected to generate topics each containing words most relevant to a coherent topic, we would like B-LDA to generate topics which are identified with some dominant behaviour.
  4. For the “PO” dimension, topic 16 is related to daily news, contributed mainly by news media accounts, which mostly have no behavior other than posting. Topic 23 consists mostly of users’ daily personal updates, which seldom prompt others to retweet or reply. Topic 71 is also related to personal updates, but more about things like cell phones and laptops, and it tends to contain more sentiment words such as ‘omg’ and ‘damn’ than topic 23. The top 4 topics in the “RT” dimension are topic 51, a mixture of jokes and interesting things shared by users @SoSingaporean and @BvsSG; topic 70, popular quotes; topic 54, daily horoscopes; and topic 52, related to a music event, the MAMA concert. We can also tell that topical words used in replies are more informal than those of the other behavior types, which makes reply topics hard to label.
  5. This observation seems to echo the findings in [] that Twitter functions both as ... Only topical interests are considered, regardless of behavior patterns.
  6. E.g., a music fan may follow music celebrities, official media channels, fan clubs, etc.
  7. A music fan may follow clusters of users: music celebrities, official media channels, fan clubs, etc.
  8. The model itself is not designed for followee recommendation; we simply apply these models to the followee recommendation task to see which one profiles users best for it.
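Notes 1 to 3, read together with the B-LDA plate diagram earlier in the deck (user topic distribution θ, per-tweet topic z, word switch y, background word distribution), suggest a generative story along the lines sketched below. This is a minimal sketch under assumptions, not the authors' specification. It assumes one topic per tweet as in Twitter-LDA, draws the tweet's behavior from that topic's behavior distribution ψt,b, gives each word a switch y with a fixed probability of being topical, and lets y = 0 select the background distribution as in note 1. Dirichlet priors, the normalization factor from note 2, and inference are all omitted; the class, field and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BEHAVIORS = ["POST", "RETWEET", "REPLY", "MENTION"]  # behavior types listed on the opening slides


@dataclass
class BLDAParams:
    theta: Sequence[float]          # one user's topic distribution
    psi: Sequence[Sequence[float]]  # psi[t][b]: topic t's behavior distribution
    phi: Sequence[Sequence[float]]  # phi[t][w]: topic t's word distribution
    phi_bg: Sequence[float]         # background word distribution
    p_topical: float                # assumed fixed P(y = 1), i.e. word comes from the topic


def generate_tweet(params: BLDAParams, vocab: Sequence[str], length: int,
                   rng: random.Random) -> Tuple[int, str, List[Tuple[str, int]]]:
    """Sample (topic, behavior, [(word, y), ...]) for one tweet under the assumed story."""
    z = rng.choices(range(len(params.theta)), weights=params.theta)[0]  # one topic per tweet
    b = rng.choices(BEHAVIORS, weights=params.psi[z])[0]                 # behavior drawn from topic z
    words = []
    for _ in range(length):
        y = 1 if rng.random() < params.p_topical else 0                  # switch: 0 = background word
        dist = params.phi[z] if y == 1 else params.phi_bg
        words.append((rng.choices(vocab, weights=dist)[0], y))
    return z, b, words
```

Because behavior is tied to the tweet's topic rather than to the user directly, fitting such a model would push each topic toward a dominant behavior, which is the property notes 2 and 3 describe.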