Lamjed Ben Jabeur, Lynda Tamine, Mohand Boughanem. Uprising microblogs: A Bayesian network retrieval model for tweet search (regular paper). In: ACM Symposium on Applied Computing (SAC 2012), Riva del Garda (Trento), Italy, 26/03/12-30/03/12, March 2012. http://dx.doi.org/10.1145/2245276.2245459
In this paper we investigate the problem of accessing real-time information and propose a Bayesian network
retrieval model for tweet search. The proposed model interprets tweet relevance as a conditional probability and
estimates it using different sources of evidence. In particular, we introduce a social search model that considers, in
addition to text similarity measures, the microblogger’s influence, the time magnitude and the presence of hashtags.
To evaluate our model, we conducted a series of experiments on the TREC Tweets2011 corpus. Experiments with the “Arab
Spring” topic set show that both social and temporal features improve tweet search for different types of queries.
Final results also show that our model outperforms traditional information retrieval baselines.
1. Uprising microblogs: A Bayesian network retrieval model for tweet search
Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem
IRIT, Université Paul Sabatier
2. A Bayesian network retrieval model for tweet search
Outline
1. Microblogging service
2. Tweet search
3. Bayesian network topology
4. Computing conditional probabilities
5. Experimental evaluation
6. Conclusion and future work
3. Microblogging service
Microblog?
“Microblogging is a new form of communication [...] that enables users to broadcast and share information about their activities, opinions and status.” (Java et al., 2007)
• Microblog post
– Short (140 characters)
– Real-time
– Social motivation
– Mobile device
• Figures
– 1 billion publications/week
– 50 million publications/day
– 177 million publications in March 2011
– +106 million user accounts
4. Microblogging service
Tweet, retweet and hashtag?
Jack Dorsey, 21 March 2006 (first tweet)
“inviting coworkers #oilspill”
Stephen Colbert, 21 June 2010 (Golden Tweet Award 2010)
“In honor of oil-soaked birds, 'tweets' are now 'gurgles.' http://bit.ly/cIhZNf”
Wendy's, 8 June 2011 (Golden Tweet Award 2011)
“RT for a good cause. Each Retweet sends 50¢ to help kids in foster care. #TreatItFwd”
CORIA11, 16 March 2010
“CORIA 2011 : Université d'Avignon #CORIA11 http://yfrog.com/h3y”
MohBoughanem, 17 March 2010
“@coria2011 well visualized, quickly found”
MohBoughanem CORIA11, 17 March 2010
“@coria2011 well visualized, quickly found”
6. Tweet search
Microblog IR
• Users overwhelmed by the huge quantity of tweets
– High publication rate
– Diverse sources of information
Difficulty accessing interesting posts
• Microblog IR tasks
– Person search and follower suggestion
– Trend extraction
– Opinion search
– Tweet search
7. Tweet search
Tweet search task
“real-time search task, where the user wishes to see the most recent but relevant information to the query.” (Ounis et al., 2011)
“adhoc search on Twitter, where a user’s information need is represented by a query at a specific time.” (Ounis et al., 2011)
• Search motivations
– access to concise and credible information
– access to fresh and real-time news
– follow an event
– collect opinions and public sentiments
8. Tweet search
Related work
1. Spatio-temporal context
– TwitterStand (Sankaranarayanan et al., 2009), TweetSieve (Grinev et al., 2009)
2. Microblog features
– followership, tweets, retweets, reply, hashtags, URLs
– Linear combination (Nagmoti et al., 2010)
– Learning to rank (Duan et al., 2010)
9. Tweet search
Related work
3. Social network structure
– Indegree, retweet and mention influence (Cha et al., 2010), TweetRank, FollowerRank (Nagmoti et al., 2010)
– Authority (Kwak et al., 2010)
– Influence (Kwak et al., 2010), TwitterRank (Weng et al., 2010), Popularity (Duan et al., 2010)
10. Tweet search
Contributions
• Relevance features:
– Term occurrence
– Social influence
– Time magnitude
• Bayesian network model
[Figure: relevance dimensions: topical, temporal, social]
11. Bayesian network topology
Definitions and notations
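• Query: $q \in \{0,1\}$, with states $q, \bar{q}$
• Term: $k_i \in \{0,1\}$, with states $k_i, \bar{k}_i$
• Term configuration: $\vec{k}$
example for $k_1, k_2$: $\vec{k} \in \{(k_1, k_2), (k_1, \bar{k}_2), (\bar{k}_1, k_2), (\bar{k}_1, \bar{k}_2)\}$
• Tweet: $t_j \in \{0,1\}$, with states $t_j, \bar{t}_j$
• Microblogger: $u_k \in \{0,1\}$, with states $u_k, \bar{u}_k$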
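As an illustration of the term-configuration notation, here is a minimal Python sketch that enumerates every on/off assignment for a small set of terms. The function name `term_configurations` and the dictionary representation are illustrative choices, not part of the original slides.

```python
from itertools import product


def term_configurations(terms):
    """Yield every on/off assignment of the given terms as a dict term -> bool."""
    for states in product([True, False], repeat=len(terms)):
        yield dict(zip(terms, states))


# Two terms give 2**2 = 4 configurations, matching the k1, k2 example.
configs = list(term_configurations(["k1", "k2"]))
for config in configs:
    print(config)
```

The number of configurations doubles with each added term, so exhaustive enumeration is only practical for small term sets.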
13. Computing conditional probabilities
Query evaluation
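[Figure: network topology with four layers: query q; terms k1, k2, k3; tweets t1, t2, t3; microbloggers u1, u2]
$$P(q \wedge t_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(\vec{k} \mid t_j)\, P(t_j \mid u_k)\, P(u_k)$$
$$P(q \wedge t_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_j \mid u_k)\, P(u_k) \prod_{k_i \mid on(i,\vec{k})=1} P(k_i \mid t_j) \prod_{k_i \mid on(i,\vec{k})=0} P(\bar{k}_i \mid t_j)$$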
14. Computing conditional probabilities
Query
$$P(q \mid \vec{k}) = \prod_{i \,:\, k_i \in q} on(i, \vec{k})$$
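The query evaluation can be sketched in Python by summing over all term configurations. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that P(q | k) equals 1 when every query term is on in the configuration and 0 otherwise, and it takes the per-term probabilities P(k_i | t_j), P(t_j | u_k) and P(u_k) as given inputs. All function and variable names, and the example values, are hypothetical.

```python
from itertools import product
from math import prod


def p_query_given_config(query_terms, config):
    # Assumption: P(q | k) is 1 when every query term is "on" in k, else 0.
    return 1.0 if all(config[term] for term in query_terms) else 0.0


def p_config_given_tweet(config, p_term_given_tweet):
    # P(k_i | t_j) for terms that are on, 1 - P(k_i | t_j) for terms that are off.
    return prod(
        p_term_given_tweet[term] if is_on else 1.0 - p_term_given_tweet[term]
        for term, is_on in config.items()
    )


def score(query_terms, p_term_given_tweet, p_tweet_given_user, p_user):
    """Sum over all configurations of the terms listed in p_term_given_tweet.

    Every query term must appear as a key of p_term_given_tweet.
    """
    terms = list(p_term_given_tweet)
    total = 0.0
    for states in product([True, False], repeat=len(terms)):
        config = dict(zip(terms, states))
        total += (p_query_given_config(query_terms, config)
                  * p_config_given_tweet(config, p_term_given_tweet))
    return total * p_tweet_given_user * p_user


# Hypothetical values, chosen only to exercise the function.
p_terms = {"egypt": 0.8, "protest": 0.5, "cairo": 0.3}
result = score(["egypt", "protest"], p_terms, p_tweet_given_user=0.1, p_user=0.2)
print(result)
```

Under the assumed P(q | k), terms outside the query sum out, so the score reduces to the product of the query-term probabilities times the two microblogger factors.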
15. Computing conditional probabilities
Tweet
P(k i | t j ) (1 ) F (ki , t j ) H (ki , t j ) T (ki , t j ) L(t j )
Term occurrence Tweet properties
$$P(\bar{k}_i \mid t_j) = 1 - P(k_i \mid t_j)$$
16. Computing conditional probabilities
Term frequency
$$F(k_i, t_j) = \begin{cases} 1 - \dfrac{a}{tf_{k_i,t_j}} & \text{if } k_i \in t_j \\ 0 & \text{otherwise} \end{cases}$$
[Figure: F(k_i, t_j) plotted against tf_{k_i,t_j} from 0 to 10, for a = 0.1, 0.25, 0.5, 0.75 and 1]
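A minimal Python sketch of the normalized term frequency, assuming the slide's formula reads F = 1 - a / tf when the term occurs in the tweet and 0 otherwise. The function name `normalized_tf` and the default value of `a` are illustrative assumptions.

```python
def normalized_tf(tf, a=0.5):
    """F(k_i, t_j), assumed to be 1 - a/tf when the term occurs, 0 otherwise."""
    if tf <= 0:
        return 0.0
    return 1.0 - a / tf


# Larger values of a penalise low term frequencies more strongly.
for a in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(a, [round(normalized_tf(tf, a), 3) for tf in range(0, 6)])
```

Under this reading, F grows towards 1 as the term frequency increases, and a controls how quickly it saturates.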
17. Computing conditional probabilities
Hashtag
$$H(k_i, t_j) = \begin{cases} 1 - \dfrac{b}{tf_{\#k_i,t_j}} & \text{if } \#k_i \in t_j \\ b & \text{otherwise} \end{cases}$$
18. Computing conditional probabilities
Time magnitude
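$$T(k_i, t_j) = \frac{df_{k_i,\theta_j}}{|\theta_j|}$$
where $\theta_j$ is the set of tweets $t_k$ published within a time window $\Delta t$ of $t_j$.
[Figure: number of tweets over time, with tweets t1 and t2 marked on the time axis]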
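One way to read the time magnitude feature is as the share of tweets published close in time to t_j that contain the term k_i. The sketch below follows that reading; the symmetric window, the `(timestamp, terms)` tuple layout and the function name `time_magnitude` are assumptions made for illustration rather than details confirmed by the slides.

```python
def time_magnitude(term, tweet_time, tweets, window):
    """Fraction of tweets within `window` of `tweet_time` that contain `term`.

    tweets: list of (timestamp, set_of_terms) pairs (hypothetical layout).
    """
    neighbours = [terms for ts, terms in tweets if abs(ts - tweet_time) <= window]
    if not neighbours:
        return 0.0
    return sum(term in terms for terms in neighbours) / len(neighbours)


# Toy timeline with made-up timestamps and terms.
timeline = [
    (1, {"egypt"}),
    (2, {"egypt", "cairo"}),
    (3, {"tunisia"}),
    (10, {"egypt"}),
]
burst = time_magnitude("egypt", tweet_time=2, tweets=timeline, window=1)
print(burst)
```

A term that dominates the tweets published around t_j gets a value close to 1, which is how a burst of activity on a topic would raise the score of tweets posted during that burst.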
19. Computing conditional probabilities
Tweet length
1
L(t j )
1 avgtl tltj
20. Computing conditional probabilities
Microblogger
1
P( t j | u k )
u k
21. Computing conditional probabilities
Social influence
$$P(u_k) = Inf(u_k)$$
PageRank on the retweet social network:
$$Inf_G^{k}(u_i) = d \cdot \frac{1}{|U|} + (1 - d) \sum_{u_j \,:\, e(u_j, u_i) \in E} w_{j,i}\, \frac{Inf_G^{k-1}(u_j)}{O(u_j)}$$
(u j ) (u j )
w j ,i
(u j )
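The PageRank-style influence score on this slide can be sketched as a simple power iteration. The sketch assumes that an edge (u_j, u_i) means u_j retweets u_i, that O(u_j) is the out-degree of u_j, that the edge weights w_{j,i} are supplied by the caller, and that d weights the uniform term as the slide's formula appears to do. The function name, the value of d and the toy graph are hypothetical.

```python
def influence(users, edges, d=0.15, iterations=50):
    """Weighted PageRank-style scores.

    users: list of user ids.
    edges: dict mapping (u_j, u_i) -> weight w_{j,i}; assumed to mean u_j retweets u_i.
    """
    out_degree = {user: 0 for user in users}
    for src, _dst in edges:
        out_degree[src] += 1

    scores = {user: 1.0 / len(users) for user in users}
    for _ in range(iterations):
        # Uniform part d / |U|, then the weighted contributions of retweeters.
        new_scores = {user: d / len(users) for user in users}
        for (src, dst), weight in edges.items():
            new_scores[dst] += (1 - d) * weight * scores[src] / out_degree[src]
        scores = new_scores
    return scores


# Toy retweet graph: users a and b both retweet c.
toy_scores = influence(["a", "b", "c"], {("a", "c"): 1.0, ("b", "c"): 1.0})
print(toy_scores)
```

Scores are not renormalised here, so users without outgoing edges leak probability mass; a production variant would need to handle such dangling nodes.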
22. Computing conditional probabilities
Social influence
(u j ) (ui )
wi , j
(ui )
31. A Bayesian network retrieval model for tweet search
Conclusion and future work
• Tweet search model
– Normalized Term frequency
– Time magnitude
– Social influence
• Integrating relevance factors within a Bayesian network
• The query profile affects feature performance.
• Our model outperforms traditional IR baselines.
• Future work
– Automatically detect optimal time window
– Select appropriate features depending on the query profile
32. Thank you for your attention!
Follow me on Twitter!
http://twitter.com/amjedbj