My talk on fast filter-based feature selection algorithms at the ACM International Conference on the Theory of Information Retrieval (ICTIR 2016), held in Newark, DE, USA
1. ACM International Conference on the
Theory of Information Retrieval
University of Delaware, Newark, DE, USA September 13-16, 2016
Fast Feature Selection Algorithms
for Learning to Rank
Andrea Gigli
Department of Computer Science, University of Pisa & ISTI – CNR Pisa
Franco Maria Nardini, Claudio Lucchese, Raffaele Perego
ISTI – CNR Pisa & istella*, Pisa
7. Proposed Algorithms for Feature Selection
We propose the following algorithms:
• Naïve Greedy search Algorithm for feature Selection (N-GAS)
• eXtended naïve Greedy search Algorithm for feature Selection (X-GAS)
• Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS)
ICTIR 2016, Newark, DE
8. We compare them with the Greedy search Algorithm for feature Selection (GAS) proposed by Geng, Liu, Qin, and Li (SIGIR '07).
• All the competing FSAs belong to the filter-methods family.
• Competing FSAs try to maximise the importance of a feature w.r.t. the relevance judgements and to minimise the similarity among selected features.
• Both X-GAS and GAS require hyper-parameter calibration.
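The importance-vs-similarity trade-off that these filter methods share can be sketched as follows, in the spirit of the GAS baseline: a hedged reconstruction from the description above, not code from the talk, where the penalty weight c is the hyper-parameter that needs calibration.

```python
import numpy as np

def gas(importance, sim, n, c=0.01):
    """Sketch of a GAS-style greedy filter (after Geng et al., SIGIR '07).

    importance: 1-D array, importance of each feature w.r.t. the judgements
    sim:        2-D symmetric array, pairwise similarity between features
    n:          number of features to select
    c:          similarity-penalty weight (the hyper-parameter to calibrate)
    """
    w = np.asarray(importance, dtype=float).copy()
    sim = np.asarray(sim, dtype=float)
    selected = []
    for _ in range(n):
        masked = w.copy()
        masked[selected] = -np.inf          # never re-select a feature
        i = int(np.argmax(masked))          # most important remaining feature
        selected.append(i)
        w = w - 2.0 * c * sim[i]            # penalise features similar to i
    return selected
```

Each pick maximises the (penalised) importance, and the update discounts every feature proportionally to its similarity to the feature just selected.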
9. Proposed Algorithms for Feature Selection
• Naïve Greedy search Algorithm for feature Selection (N-GAS)
• eXtended naïve Greedy search Algorithm for feature Selection (X-GAS)
• Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS)
10. Proposed Algorithms for Feature Selection: N-GAS
The graph is built and the subset S of n=4 selected features is initialized.
[Figure: feature graph. Node weights encode the importance of a feature w.r.t. the query-offer relevance judgements (e.g. the 8th feature); edge weights encode the similarity between features (e.g. between the 6th and the 7th).]
11. Proposed Algorithms for Feature Selection: N-GAS
Start by adding the node with the highest importance to S (Node ❶ in this example).
12. Proposed Algorithms for Feature Selection: N-GAS
• Let the first candidate be the node having the lowest similarity w.r.t. Node ❶
• Let the second candidate be the node having the highest similarity w.r.t. the first candidate
13. Proposed Algorithms for Feature Selection: N-GAS
From these two candidates, select the node with the highest importance and add it to S.
14. Proposed Algorithms for Feature Selection: N-GAS
• Let ❷ be the node having the lowest similarity w.r.t. the node just added to S
• Let ❸ be the node having the highest similarity w.r.t. Node ❷
15. Proposed Algorithms for Feature Selection: N-GAS
From (❷, ❸) select the node with the highest importance and add it to S (Node ❷ in the example).
16. Proposed Algorithms for Feature Selection: N-GAS
• Let ❹ be the node having the lowest similarity w.r.t. Node ❷
• Let ❽ be the node having the highest similarity w.r.t. Node ❹
17. Proposed Algorithms for Feature Selection: N-GAS
From (❹, ❽) select the node with the highest importance and add it to S (Node ❹ in the example).
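The N-GAS walkthrough above can be sketched as follows; a minimal reconstruction of the described steps, assuming an importance vector and a symmetric similarity matrix as inputs (function and variable names are mine, not from the slides).

```python
import numpy as np

def n_gas(importance, sim, n):
    """Sketch of N-GAS as walked through above: start from the most
    important feature; at each step pair the candidate least similar to
    the last pick with that candidate's most similar remaining neighbour,
    and keep the more important of the two."""
    importance = np.asarray(importance, dtype=float)
    sim = np.asarray(sim, dtype=float)
    selected = [int(np.argmax(importance))]        # node with highest importance
    while len(selected) < n:
        last = selected[-1]
        rest = [f for f in range(len(importance)) if f not in selected]
        u = min(rest, key=lambda f: sim[last, f])  # least similar to last pick
        others = [f for f in rest if f != u]
        # most similar to u among the remaining candidates (u itself if none left)
        v = max(others, key=lambda f: sim[u, f]) if others else u
        selected.append(u if importance[u] >= importance[v] else v)
    return selected
```

Note that N-GAS, as described, has no hyper-parameter: both candidates are fully determined by the graph.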
19. Proposed Algorithms for Feature Selection: X-GAS
The graph is built and the subset S of n=4 selected features is initialized.
20. Proposed Algorithms for Feature Selection: X-GAS
Start by adding the node with the highest importance to S (Node ❶ in this example).
21. Proposed Algorithms for Feature Selection: X-GAS
Select the 50% of nodes least similar to ❶ (this 50% cut-off is the filter hyper-parameter).
22. Proposed Algorithms for Feature Selection: X-GAS
From this selection, take the node with the highest importance and add it to S.
23. Proposed Algorithms for Feature Selection: X-GAS
Select the 50% of nodes least similar to the node just added.
24. Proposed Algorithms for Feature Selection: X-GAS
From this selection, take the node with the highest importance and add it to S (Node ❸ in the example).
25. Proposed Algorithms for Feature Selection: X-GAS
Select the 50% of nodes least similar to ❸.
26. Proposed Algorithms for Feature Selection: X-GAS
From this selection, take the node with the highest importance and add it to S (Node ❹ in the example).
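The X-GAS walkthrough above can be sketched as follows; again a hedged reconstruction with names of my own, where p is the filter hyper-parameter (p = 0.5 reproduces the 50% cut-off of the example).

```python
import numpy as np

def x_gas(importance, sim, n, p=0.5):
    """Sketch of X-GAS as walked through above: after each pick, keep only
    the fraction p of remaining features least similar to it, then add the
    most important survivor.  p is the hyper-parameter to calibrate."""
    importance = np.asarray(importance, dtype=float)
    sim = np.asarray(sim, dtype=float)
    selected = [int(np.argmax(importance))]        # node with highest importance
    while len(selected) < n:
        last = selected[-1]
        # remaining features, ordered from least to most similar to the last pick
        rest = sorted((f for f in range(len(importance)) if f not in selected),
                      key=lambda f: sim[last, f])
        keep = rest[:max(1, int(np.ceil(p * len(rest))))]   # least-similar fraction p
        selected.append(max(keep, key=lambda f: importance[f]))
    return selected
```

Compared with N-GAS, the similarity filter widens the candidate pool from two nodes to a tunable fraction of the graph, at the cost of calibrating p.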
30. Application to Web Search Engine Data
Bing data: http://research.microsoft.com/en-us/projects/mslr/
Yahoo! data: http://webscope.sandbox.yahoo.com

Yahoo! dataset     Train      Validation   Test
# queries          19,944     2,994        6,983
# urls             473,134    71,083       165,660
# features         519

Bing dataset       Train      Validation   Test
# queries          18,919     6,306        6,306
# urls             723,412    235,259      241,521
# features         136
31. Experimental Framework
Importance, I(f): NDCG@10 obtained using feature f alone as a ranking model
Similarity, S(f_i, f_j): Spearman rank-correlation coefficient
Distance, D(f_i, f_j) = 1 - S(f_i, f_j)^2
L2R Algorithm: LambdaMART
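The importance and similarity measures of the experimental framework can be sketched as follows; a minimal single-query version (the per-query averaging used in practice is omitted for brevity, and the helper names are mine, not from the talk).

```python
import numpy as np
from scipy.stats import spearmanr

def ndcg_at_k(scores, labels, k=10):
    """NDCG@k of ranking documents by `scores` against graded `labels`."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    gains = 2.0 ** labels - 1.0
    discounts = 1.0 / np.log2(np.arange(2, labels.size + 2))
    dcg = float(np.sum(gains[order][:k] * discounts[:k]))
    ideal = float(np.sum(np.sort(gains)[::-1][:k] * discounts[:k]))
    return dcg / ideal if ideal > 0 else 0.0

def feature_importance(X, y, k=10):
    """Importance I(f): NDCG@10 obtained by using feature f alone
    as the ranking model (single query list here, for brevity)."""
    return np.array([ndcg_at_k(X[:, j], y, k) for j in range(X.shape[1])])

def feature_similarity(X):
    """Similarity S(f_i, f_j): Spearman rank correlation between columns."""
    K = X.shape[1]
    S = np.eye(K)
    for i in range(K):
        for j in range(i + 1, K):
            S[i, j] = S[j, i] = spearmanr(X[:, i], X[:, j]).correlation
    return S
```

A feature whose values reproduce the relevance labels gets importance 1.0, and the similarity matrix feeds directly into the graph used by the GAS-family algorithms.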
32. Experimental Protocol
❶ Select a subset of n < K features using a given FSA
❷ Train LambdaMART using the n features
❸ Measure LambdaMART performance on the test set
❹ Compare FSAs using average NDCG@10
Repeat for different n in {5%K, 10%K, 20%K, 30%K, 40%K, 50%K, 75%K, K}
Repeat from ❶ for each FSA
33. Results on the "Bing" dataset
[Figure: NDCG@10 vs. feature subset size as a % of the feature set size (K)]
34. Results on the "Yahoo!" dataset
[Figure: NDCG@10 vs. feature subset size as a % of the feature set size (K)]
36. Conclusion
X-GAS and H-GAS perform at least as well as the benchmark model.
H-GAS and N-GAS are more efficient than the others because they do not need any hyper-parameter calibration.
Future work:
• experiments on the new LtR dataset provided by istella* (http://blog.istella.it/istella-learning-to-rank-dataset/)
• application to other ML contexts, sorting problems and ensemble learning.
37. ACM International Conference on the
Theory of Information Retrieval
University of Delaware, Newark, DE, USA September 13-16, 2016
Thank you, and special thanks to ACM-SIGIR for the Travel Grant support.
Andrea Gigli Email: andrgig@gmail.com
Twitter: @andrgig
http://www.slideshare.net/andrgig