My talk on fast filter-based feature selection algorithms at the ACM International Conference on the Theory of Information Retrieval (ICTIR 2016), held in Newark, DE, USA
1. ACM International Conference on the
Theory of Information Retrieval
University of Delaware, Newark, DE, USA September 13-16, 2016
Fast Feature Selection Algorithms
for Learning to Rank
Andrea Gigli
Department of Computer Science, University of Pisa & ISTI – CNR Pisa
Franco Maria Nardini, Claudio Lucchese, Raffaele Perego
ISTI – CNR Pisa & istella*, Pisa
7. Proposed Algorithms for Feature Selection
We propose the following algorithms:
• Naïve Greedy search Algorithm for feature Selection (N-GAS)
• eXtended naïve Greedy search Algorithm for feature Selection (X-GAS)
• Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS)
ICTIR 2016, Newark, DE
8. We compare them with the Greedy search Algorithm for feature Selection (GAS) proposed by Geng, Liu, Qin, and Li (SIGIR '07).
• All the competing FSAs belong to the filter-methods family.
• Competing FSAs try to maximise the importance of a feature w.r.t. the relevance judgements and to minimise the similarity among selected features.
• Both X-GAS and GAS require hyper-parameter calibration.
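The importance-vs-similarity trade-off that these filter methods share can be sketched as follows, in the spirit of the GAS baseline: a hedged reconstruction from the description above, not code from the talk, where the penalty weight c is the hyper-parameter that needs calibration.

```python
import numpy as np

def gas(importance, sim, n, c=0.01):
    """Sketch of a GAS-style greedy filter (after Geng et al., SIGIR '07).

    importance: 1-D array, importance of each feature w.r.t. the judgements
    sim:        2-D symmetric array, pairwise similarity between features
    n:          number of features to select
    c:          similarity-penalty weight (the hyper-parameter to calibrate)
    """
    w = np.asarray(importance, dtype=float).copy()
    sim = np.asarray(sim, dtype=float)
    selected = []
    for _ in range(n):
        masked = w.copy()
        masked[selected] = -np.inf          # never re-select a feature
        i = int(np.argmax(masked))          # most important remaining feature
        selected.append(i)
        w = w - 2.0 * c * sim[i]            # penalise features similar to i
    return selected
```

Each pick maximises the (penalised) importance, and the update discounts every feature proportionally to its similarity to the feature just selected.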
9. Proposed Algorithms for Feature Selection
• Naïve Greedy search Algorithm for feature Selection (N-GAS)
• eXtended naïve Greedy search Algorithm for feature Selection (X-GAS)
• Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS)
10. Proposed Algorithms for Feature Selection: N-GAS
The graph is built and the subset S of n=4 selected features is initialized.
[Figure: feature graph. Node weights encode the importance of a feature w.r.t. the query-offer relevance judgements (e.g. the 8th feature); edge weights encode the similarity between features (e.g. between the 6th and the 7th).]
11. Proposed Algorithms for Feature Selection: N-GAS
Start by adding the node with the highest importance to S (Node ❶ in this example).
12. Proposed Algorithms for Feature Selection: N-GAS
• Let the first candidate be the node having the lowest similarity w.r.t. Node ❶
• Let the second candidate be the node having the highest similarity w.r.t. the first candidate
13. Proposed Algorithms for Feature Selection: N-GAS
From these two candidates, select the node with the highest importance and add it to S.
14. Proposed Algorithms for Feature Selection: N-GAS
• Let ❷ be the node having the lowest similarity w.r.t. the node just added to S
• Let ❸ be the node having the highest similarity w.r.t. Node ❷
15. Proposed Algorithms for Feature Selection: N-GAS
From (❷, ❸) select the node with the highest importance and add it to S (Node ❷ in the example).
16. Proposed Algorithms for Feature Selection: N-GAS
• Let ❹ be the node having the lowest similarity w.r.t. Node ❷
• Let ❽ be the node having the highest similarity w.r.t. Node ❹
17. Proposed Algorithms for Feature Selection: N-GAS
From (❹, ❽) select the node with the highest importance and add it to S (Node ❹ in the example).
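The N-GAS walkthrough above can be sketched as follows; a minimal reconstruction of the described steps, assuming an importance vector and a symmetric similarity matrix as inputs (function and variable names are mine, not from the slides).

```python
import numpy as np

def n_gas(importance, sim, n):
    """Sketch of N-GAS as walked through above: start from the most
    important feature; at each step pair the candidate least similar to
    the last pick with that candidate's most similar remaining neighbour,
    and keep the more important of the two."""
    importance = np.asarray(importance, dtype=float)
    sim = np.asarray(sim, dtype=float)
    selected = [int(np.argmax(importance))]        # node with highest importance
    while len(selected) < n:
        last = selected[-1]
        rest = [f for f in range(len(importance)) if f not in selected]
        u = min(rest, key=lambda f: sim[last, f])  # least similar to last pick
        others = [f for f in rest if f != u]
        # most similar to u among the remaining candidates (u itself if none left)
        v = max(others, key=lambda f: sim[u, f]) if others else u
        selected.append(u if importance[u] >= importance[v] else v)
    return selected
```

Note that N-GAS, as described, has no hyper-parameter: both candidates are fully determined by the graph.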
19. Proposed Algorithms for Feature Selection: X-GAS
The graph is built and the subset S of n=4 selected features is initialized.
20. Proposed Algorithms for Feature Selection: X-GAS
Start by adding the node with the highest importance to S (Node ❶ in this example).
21. Proposed Algorithms for Feature Selection: X-GAS
Select the 50% of nodes least similar to ❶ (this 50% cut-off is the filter hyper-parameter).
22. Proposed Algorithms for Feature Selection: X-GAS
From this selection, take the node with the highest importance and add it to S.
23. Proposed Algorithms for Feature Selection: X-GAS
Select the 50% of nodes least similar to the node just added.
24. Proposed Algorithms for Feature Selection: X-GAS
From this selection, take the node with the highest importance and add it to S (Node ❸ in the example).
25. Proposed Algorithms for Feature Selection: X-GAS
Select the 50% of nodes least similar to ❸.
26. Proposed Algorithms for Feature Selection: X-GAS
From this selection, take the node with the highest importance and add it to S (Node ❹ in the example).
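The X-GAS walkthrough above can be sketched as follows; again a hedged reconstruction with names of my own, where p is the filter hyper-parameter (p = 0.5 reproduces the 50% cut-off of the example).

```python
import numpy as np

def x_gas(importance, sim, n, p=0.5):
    """Sketch of X-GAS as walked through above: after each pick, keep only
    the fraction p of remaining features least similar to it, then add the
    most important survivor.  p is the hyper-parameter to calibrate."""
    importance = np.asarray(importance, dtype=float)
    sim = np.asarray(sim, dtype=float)
    selected = [int(np.argmax(importance))]        # node with highest importance
    while len(selected) < n:
        last = selected[-1]
        # remaining features, ordered from least to most similar to the last pick
        rest = sorted((f for f in range(len(importance)) if f not in selected),
                      key=lambda f: sim[last, f])
        keep = rest[:max(1, int(np.ceil(p * len(rest))))]   # least-similar fraction p
        selected.append(max(keep, key=lambda f: importance[f]))
    return selected
```

Compared with N-GAS, the similarity filter widens the candidate pool from two nodes to a tunable fraction of the graph, at the cost of calibrating p.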
30. Application to Web Search Engine Data
Bing data: http://research.microsoft.com/en-us/projects/mslr/
Yahoo! data: http://webscope.sandbox.yahoo.com

Yahoo! dataset     Train      Validation   Test
# queries          19,944     2,994        6,983
# urls             473,134    71,083       165,660
# features         519

Bing dataset       Train      Validation   Test
# queries          18,919     6,306        6,306
# urls             723,412    235,259      241,521
# features         136
31. Experimental Framework
Importance, I(f): NDCG@10 obtained using feature f alone as a ranking model
Similarity, S(f_i, f_j): Spearman rank-correlation coefficient
Distance, D(f_i, f_j) = 1 - S(f_i, f_j)^2
L2R Algorithm: LambdaMART
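The importance and similarity measures of the experimental framework can be sketched as follows; a minimal single-query version (the per-query averaging used in practice is omitted for brevity, and the helper names are mine, not from the talk).

```python
import numpy as np
from scipy.stats import spearmanr

def ndcg_at_k(scores, labels, k=10):
    """NDCG@k of ranking documents by `scores` against graded `labels`."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    gains = 2.0 ** labels - 1.0
    discounts = 1.0 / np.log2(np.arange(2, labels.size + 2))
    dcg = float(np.sum(gains[order][:k] * discounts[:k]))
    ideal = float(np.sum(np.sort(gains)[::-1][:k] * discounts[:k]))
    return dcg / ideal if ideal > 0 else 0.0

def feature_importance(X, y, k=10):
    """Importance I(f): NDCG@10 obtained by using feature f alone
    as the ranking model (single query list here, for brevity)."""
    return np.array([ndcg_at_k(X[:, j], y, k) for j in range(X.shape[1])])

def feature_similarity(X):
    """Similarity S(f_i, f_j): Spearman rank correlation between columns."""
    K = X.shape[1]
    S = np.eye(K)
    for i in range(K):
        for j in range(i + 1, K):
            S[i, j] = S[j, i] = spearmanr(X[:, i], X[:, j]).correlation
    return S
```

A feature whose values reproduce the relevance labels gets importance 1.0, and the similarity matrix feeds directly into the graph used by the GAS-family algorithms.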
32. Experimental Protocol
❶ Select a subset of n < K features using a given FSA
❷ Train LambdaMART using the n features
❸ Measure LambdaMART performance on the test set
❹ Compare FSAs using average NDCG@10
Repeat for different n in {5%K, 10%K, 20%K, 30%K, 40%K, 50%K, 75%K, K}
Repeat from ❶ for each FSA
33. Results on the "Bing" dataset
[Figure: NDCG@10 vs. feature subset size as a % of the feature set size (K)]
34. Results on the "Yahoo!" dataset
[Figure: NDCG@10 vs. feature subset size as a % of the feature set size (K)]
36. Conclusion
X-GAS and H-GAS perform at least as well as the benchmark model.
H-GAS and N-GAS are more efficient than the others because they do not need any hyper-parameter calibration.
Future work:
• experiments on the new LtR dataset provided by istella* (http://blog.istella.it/istella-learning-to-rank-dataset/)
• application to other ML contexts, sorting problems and ensemble learning.
37. ACM International Conference on the
Theory of Information Retrieval
University of Delaware, Newark, DE, USA September 13-16, 2016
Thank you, and special thanks to ACM-SIGIR for the Travel Grant support.
Andrea Gigli Email: andrgig@gmail.com
Twitter: @andrgig
http://www.slideshare.net/andrgig