Exploiting Ranking Factorization Machines for Microblog Retrieval

Learning-to-rank methods have been proposed for practical applications in information retrieval. When they are employed in microblog retrieval, however, the significant interactions among the various features involved are rarely considered. In this paper, we propose a Ranking Factorization Machine (Ranking FM) model, which applies the Factorization Machine model to microblog ranking on the basis of pairwise classification. In this way, the proposed model combines the generality of the learning-to-rank framework with the ability of factorization models to estimate interactions between features, leading to better retrieval performance. Moreover, three groups of features (content relevance features, semantic expansion features, and quality features) and their interactions are used in the Ranking FM model, which is optimized with stochastic gradient descent and with adaptive regularization. Experimental results on a real Twitter dataset demonstrate its superiority over several baseline systems in terms of P@30 and MAP. Furthermore, it outperforms the best-performing result in the TREC'12 Real-Time Search Task.

Transcript of "Exploiting Ranking Factorization Machines for Microblog Retrieval"

  1. 北京大学计算机科学技术研究所 / Institute of Computer Science & Technology, Peking University
     CIKM 2013
     Exploiting Ranking Factorization Machines for Microblog Retrieval
     Runwei Qiang, Feng Liang, Jianwu Yang
     Institute of Computer Science and Technology, Peking University
  2. Problem Definition
     Each query Qi is issued at a timestamp ti, giving (Q1, t1), (Q2, t2), …, (Qn, tn). For each query, tweets from the tweet collection are ranked by relevance.
     Real-time search (TREC 2011): at time t, find tweets about topic X. Tweets posted after the query time t are not available.
  3. Motivations
     IR for microblogs is a non-trivial problem:
     - Documents are very short => severe vocabulary mismatch. How can query expansion be applied?
     - Shortened URLs are abundant => they offer a way to expand documents, but how can they be used?
     - Large quantities of pointless babble => how can tweet quality be used to filter out non-informative messages?
  4. Motivations
     Learning-to-rank methods can make full use of different models or factors in microblog retrieval (different factors => different features).
     Many features have been shown to be useful:
     - semantic features between query and document;
     - tweet quality features, e.g. link, retweet, and mention counts or binary indicators.
  5. Limitations
     - Features are considered independent, yet some are closely related: RT and @ symbols frequently occur in the same tweet.
     - Feature utilization: the link feature is only binary, although the linked page carries semantic information, e.g. the page title "Small plane crashes at big airport; no one notices - CNN.com".
  6. Proposal
     - Employ a Ranking FM framework that adopts FM as the ranking function to model interactions between features.
     - Utilize several effective features that are neglected in existing work.
     - Optimize Ranking FM with two methods: stochastic gradient descent and adaptive regularization.
  7. Outline
     - Ranking FM for Microblog Retrieval: Ranking FM Framework; Optimization Methods; Feature Description
     - Experiments
     - Summary
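  8. Ranking FM Framework
     Pairwise approach: a pair of labeled examples $((\mathbf{x}_p, y_p), (\mathbf{x}_q, y_q))$ becomes the training instance $((\mathbf{x}_p, \mathbf{x}_q), z)$, where
     $$z = \begin{cases} +1 & \text{if } y_p \succ y_q \\ -1 & \text{if } y_q \succ y_p \end{cases}$$
     Loss function, combining the FM ranking function $f$, the hinge loss $\ell_t$, and a regularization term:
     $$\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell_t\big(f(\Theta; \mathbf{x}_p^{(t)}, \mathbf{x}_q^{(t)}), z^{(t)}\big) + \lambda \lVert \Theta \rVert^2$$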
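  9. Factorization Machines Model
     $$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j$$
     The nested interactions use factorized parameters, with factorization dimensionality $k$:
     $$\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$$
     Reformulating the interaction term gives a model that can be computed in $O(k \cdot n)$:
     $$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$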
  10. Learn Ranking FM
     - Stochastic gradient descent: a grid search on the validation set finds the best λ, which is time-consuming.
     - Adaptive regularization [2]: adapts the regularization automatically by alternating between the training set $S_T$ and the validation set $S_V$:
       $$\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(\mathbf{x}, y) \in S_T} l\big(\hat{y}(\mathbf{x} \mid \Theta), y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$$
       $$\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda \ge 0} \sum_{(\mathbf{x}, y) \in S_V} l\big(\hat{y}(\mathbf{x} \mid \Theta^{(t+1)}), y\big)$$
  11. Feature Description
     - Content relevance features (3): BM25, TF-IDF, and language model scores between the query and the tweet.
     - Semantic expansion features (3 x 3 = 9): BM25, TF-IDF, and language model scores for each of three pairs: query & topic info; expanded query & tweet; expanded query & topic info.
     - Quality features (5): mention, retweet, hashtag, and link binary features; tweet length.
  12. Experimental Setup
     Dataset:
     - TREC Tweet11 corpus: about 2 weeks of Twitter data.
     - TopicInfo corpus: the title field of linked pages.
     - Queries: TREC'11 (50 queries) and TREC'12 (60 queries).
     Evaluation metrics: P@30 and MAP.

     Summary statistics of the Tweet11 corpus:
       HTTP code       # of tweets
       200 OK            8,084,724
       302 Found           815,794
       403 Forbidden       817,273
       404 Not Found       868,667
       Null                 67,011
       Searchable        8,900,518

     Summary statistics of the TopicInfo corpus:
       HTTP code       # of tweets
       200 OK            1,225,947
       302 Found               688
       403 Forbidden         5,050
       404 Not Found        92,378
       Null                265,468
       Searchable        1,226,635
  13. Baselines
     - KL2SFBLoc [3]: expanded language model with two-stage query expansion; performed very well in the TREC'11 real-time search task.
     - hitURLrun3 [4]: uses a logistic regression model to learn a pairwise ranking for microblog retrieval; the best-performing system in the TREC'12 real-time search task.
     - RSVM_Full: Ranking SVM with a linear kernel, using the same feature set as Ranking FM.
  14. Ranking FM Performance
       Metric  KL2SFBLoc  RSVM_Full  hitURLrun3  RFM_FullSGD  RFM_FullAR
       P@30    0.2441     0.2616     0.2701      0.2808       0.2746
       MAP     0.2506     0.2597     0.2642      0.2694       0.2678
     hitURLrun3 is the TREC'12 best run; RFM_FullSGD and RFM_FullAR are Ranking FM optimized by stochastic gradient descent and adaptive regularization, respectively. On P@30, RFM_FullSGD improves by about 7% over RSVM_Full and by about 4% over hitURLrun3.
  15. Feature Study
     [Figure: P@N for N from 0 to 30 with the full feature set (Full), with one feature group removed (-Quality, -Document Expansion, -Query Expansion, -Content Relevance), and with content relevance features only (Only Content Relevance); Ranking FM with k = 3 optimized by SGD.]
  16. Influence of the hyper-parameter k
     [Figure: MAP and P@30 of RFM_FullSGD as functions of k, for k from 0 to 15; Ranking FM optimized by SGD.]
  17. Stochastic gradient descent vs. adaptive regularization
     [Figure: training time in seconds (axis scale x 10^4) of stochastic gradient descent and adaptive regularization as a function of k, for k from 0 to 15.]
       Method       P@5     P@10    P@30    MAP
       RFM_FullSGD  0.4068  0.3695  0.2808  0.2694
       RFM_FullAR   0.4034  0.3678  0.2746  0.2678
  18. Summary
     - Ranking FM framework: a pairwise approach that uses Factorization Machines as the ranking function.
     - Two optimization methods: stochastic gradient descent and adaptive regularization.
     - Three groups of features: content relevance, semantic expansion, and quality features.
  19. References
     [1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC 2011 Microblog Track. In Proceedings of TREC 2011, 2012.
     [2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM '12), pages 133–142. ACM, 2012.
     [3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. In Proceedings of JCDL '12, pages 267–276. ACM, 2012.
     [4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. Hit at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.
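The real-time search setting on slide 2 can be sketched as a filter followed by a ranking. This is a minimal sketch assuming, as in the TREC real-time task, that only tweets posted at or before the query time t may be returned. The `Tweet` record, the integer timestamps, and the toy `term_overlap` scorer are hypothetical stand-ins; in the talk the ranking comes from the learned Ranking FM, not from term overlap.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tweet:
    tweet_id: int
    timestamp: int  # hypothetical representation: seconds since some epoch
    text: str


def term_overlap(query: str, text: str) -> float:
    """Toy relevance score (hypothetical): fraction of query terms found in the tweet."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def realtime_search(query: str, query_time: int, collection: List[Tweet],
                    score: Callable[[str, str], float],
                    top_k: int = 30) -> List[Tuple[int, float]]:
    # Tweets posted after the query time are not available to the system.
    candidates = [t for t in collection if t.timestamp <= query_time]
    ranked = sorted(((t.tweet_id, score(query, t.text)) for t in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

The default `top_k = 30` simply mirrors the P@30 cut-off used in the evaluation; it is an illustrative choice.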
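The pairwise transformation and loss on slide 8 can be sketched as follows. The slide does not spell out the ranking function f, so this sketch assumes f(Θ; x_p, x_q) = ŷ(x_p) − ŷ(x_q), the usual choice in pairwise ranking, and takes the hinge loss to be max(0, 1 − z·f). The helper names `make_pairs` and `pairwise_objective` are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def make_pairs(examples: List[Tuple[Vector, int]]) -> List[Tuple[Vector, Vector, int]]:
    """Turn labeled (x, y) examples into ((x_p, x_q), z) instances.

    Pairs with equal relevance labels express no preference and are skipped.
    """
    pairs = []
    for (x_p, y_p), (x_q, y_q) in combinations(examples, 2):
        if y_p > y_q:
            pairs.append((x_p, x_q, +1))
        elif y_q > y_p:
            pairs.append((x_p, x_q, -1))
    return pairs


def hinge_loss(f_value: float, z: int) -> float:
    # max(0, 1 - z * f): zero once the pair is ranked correctly with margin 1.
    return max(0.0, 1.0 - z * f_value)


def pairwise_objective(score: Callable[[Vector], float],
                       pairs: List[Tuple[Vector, Vector, int]],
                       params_sq_norm: float, lam: float) -> float:
    """Sum of hinge losses plus lam * ||Theta||^2, assuming f = score(x_p) - score(x_q)."""
    total = sum(hinge_loss(score(x_p) - score(x_q), z) for x_p, x_q, z in pairs)
    return total + lam * params_sq_norm
```

Any scoring function can be plugged in as `score`; with an FM it would be ŷ(x) from slide 9.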
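The O(k·n) reformulation on slide 9 can be checked numerically. This sketch evaluates the FM equation twice, once directly with a nested loop over feature pairs and once with the reformulated interaction term; the two results should agree up to floating-point rounding. Plain Python lists are used to keep it self-contained, with `V[i][f]` standing for v_{i,f}.

```python
from typing import List, Sequence


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: List[Sequence[float]]) -> float:
    """Direct O(k * n^2) evaluation of the FM model equation."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x: Sequence[float], w0: float, w: Sequence[float],
                    V: List[Sequence[float]]) -> float:
    """O(k * n) evaluation using the reformulated interaction term."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))             # sum_i v_{i,f} x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        y += 0.5 * (s * s - s_sq)
    return y
```

For example, with x = [1, 2], zero linear weights, and one-dimensional factors v_1 = [1], v_2 = [3], both functions return 6.0.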
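The stochastic gradient descent option on slide 10 can be sketched under several stated assumptions. f is again taken to be ŷ(x_p) − ŷ(x_q), so w0 cancels and is omitted. The loss is the hinge loss plus the L2 penalty λ‖Θ‖². Each candidate λ is scored by pairwise accuracy on the validation pairs. The talk does not say which validation criterion was used, so pairwise accuracy, the learning rate, the epoch count, the initialization, and the λ grid are all hypothetical. The factor gradient uses ∂ŷ/∂v_{i,f} = x_i Σ_j v_{j,f} x_j − v_{i,f} x_i². Adaptive regularization [2] is not shown; it replaces the outer grid search with alternating updates of Θ and λ.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float], int]  # (x_p, x_q, z)


class RankingFM:
    """Hypothetical minimal Ranking FM trained with SGD on a pairwise hinge loss."""

    def __init__(self, n: int, k: int, init_std: float = 0.01, seed: int = 0):
        rng = random.Random(seed)
        self.n, self.k = n, k
        # w0 cancels in y(x_p) - y(x_q), so only w and V are learned.
        self.w: List[float] = [0.0] * n
        self.V: List[List[float]] = [[rng.gauss(0.0, init_std) for _ in range(k)]
                                     for _ in range(n)]

    def _factor_sums(self, x: Sequence[float]) -> List[float]:
        # s_f = sum_i v_{i,f} x_i for every factor f.
        return [sum(self.V[i][f] * x[i] for i in range(self.n)) for f in range(self.k)]

    def predict(self, x: Sequence[float]) -> float:
        y = sum(self.w[i] * x[i] for i in range(self.n))
        for f, s in enumerate(self._factor_sums(x)):
            s_sq = sum((self.V[i][f] * x[i]) ** 2 for i in range(self.n))
            y += 0.5 * (s * s - s_sq)
        return y

    def sgd_step(self, x_p, x_q, z: int, lr: float, lam: float) -> None:
        # The hinge loss has a non-zero gradient only inside the margin.
        active = z * (self.predict(x_p) - self.predict(x_q)) < 1.0
        s_p, s_q = self._factor_sums(x_p), self._factor_sums(x_q)  # before any update
        for i in range(self.n):
            g = -z * (x_p[i] - x_q[i]) if active else 0.0
            self.w[i] -= lr * (g + 2.0 * lam * self.w[i])
            for f in range(self.k):
                v = self.V[i][f]
                d_p = x_p[i] * s_p[f] - v * x_p[i] ** 2  # d y(x_p) / d v_{i,f}
                d_q = x_q[i] * s_q[f] - v * x_q[i] ** 2  # d y(x_q) / d v_{i,f}
                g = -z * (d_p - d_q) if active else 0.0
                self.V[i][f] -= lr * (g + 2.0 * lam * v)

    def fit(self, pairs: List[Pair], epochs: int = 20, lr: float = 0.05,
            lam: float = 0.01, seed: int = 0) -> "RankingFM":
        rng = random.Random(seed)
        order = list(pairs)
        for _ in range(epochs):
            rng.shuffle(order)
            for x_p, x_q, z in order:
                self.sgd_step(x_p, x_q, z, lr, lam)
        return self


def pairwise_accuracy(model: RankingFM, pairs: List[Pair]) -> float:
    correct = sum(1 for x_p, x_q, z in pairs
                  if z * (model.predict(x_p) - model.predict(x_q)) > 0)
    return correct / len(pairs)


def grid_search_lambda(train: List[Pair], valid: List[Pair], n: int, k: int,
                       grid: Sequence[float] = (0.0, 0.001, 0.01, 0.1)) -> float:
    # One full training run per candidate lambda: this is the time-consuming part.
    return max(grid, key=lambda lam: pairwise_accuracy(
        RankingFM(n, k).fit(train, lam=lam), valid))
```

The outer loop retrains the model once per λ value, which illustrates why the slide calls grid search time-consuming.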
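The five quality features on slide 11 can be extracted directly from raw tweet text. The regular expressions below (for @mentions, the RT marker, #hashtags, and http(s) links) and measuring tweet length in characters are assumptions; the talk does not define how each feature is detected.

```python
import re
from typing import List

# Hypothetical patterns; real tweets may need more careful tokenization.
MENTION = re.compile(r"@\w+")
RETWEET = re.compile(r"\bRT\b")
HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")


def quality_features(text: str) -> List[float]:
    """Return [mention, retweet, hashtag, link, length]; the first four are binary."""
    return [
        1.0 if MENTION.search(text) else 0.0,
        1.0 if RETWEET.search(text) else 0.0,
        1.0 if HASHTAG.search(text) else 0.0,
        1.0 if LINK.search(text) else 0.0,
        float(len(text)),
    ]
```

Because the retweet and mention indicators often fire together, an FM can learn a weight for their product through ⟨v_retweet, v_mention⟩.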
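Each content relevance feature on slide 11 is a retrieval score between two texts. One example is a BM25 scorer over tokenized documents, as in the sketch below. The IDF variant log((N − df + 0.5)/(df + 0.5) + 1), which stays positive, and the defaults k1 = 1.2 and b = 0.75 are common choices assumed here, not settings reported in the talk. The same function could score an expanded query, or topic info in place of the tweet, to produce the semantic expansion features.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: List[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(s)
    return scores
```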
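The evaluation metrics on slide 12, P@30 and MAP, can be computed from ranked lists of tweet IDs and sets of relevant IDs. This is a simplified sketch: relevance is treated as binary, MAP is averaged over the queries present in the run, and TREC-specific details (graded judgments, tie handling, official trec_eval behavior) are not modeled.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 30) -> float:
    # A ranking shorter than k counts the missing positions as non-relevant.
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    return sum(average_precision(runs[q], qrels.get(q, set())) for q in runs) / len(runs)
```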
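The percentages on slide 14 can be reproduced from its table. This sketch assumes they compare RFM_FullSGD against RSVM_Full and hitURLrun3 on P@30, which matches both figures. Relative improvement is computed as (new − old) / old.

```python
# P@30 and MAP values transcribed from the slide 14 table.
RESULTS = {
    "KL2SFBLoc":   {"P@30": 0.2441, "MAP": 0.2506},
    "RSVM_Full":   {"P@30": 0.2616, "MAP": 0.2597},
    "hitURLrun3":  {"P@30": 0.2701, "MAP": 0.2642},
    "RFM_FullSGD": {"P@30": 0.2808, "MAP": 0.2694},
    "RFM_FullAR":  {"P@30": 0.2746, "MAP": 0.2678},
}


def relative_improvement(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for baseline in ("KL2SFBLoc", "RSVM_Full", "hitURLrun3"):
        for metric in ("P@30", "MAP"):
            gain = relative_improvement(RESULTS["RFM_FullSGD"][metric], RESULTS[baseline][metric])
            print(f"RFM_FullSGD vs {baseline} on {metric}: {gain:+.1f}%")
```

With these values the P@30 gains come out at roughly 7.3% over RSVM_Full and 4.0% over hitURLrun3.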