ACM International Conference on the
Theory of Information Retrieval
University of Delaware, Newark, DE, USA September 13-16, 2016
Fast Feature Selection Algorithms
for Learning to Rank
Andrea Gigli
Department of Computer Science, University of Pisa & ISTI – CNR Pisa
Franco Maria Nardini, Claudio Lucchese, Raffaele Perego
ISTI – CNR Pisa & istella*, Pisa
Outline
Introduction
Proposed Feature Selection Algorithms (FSA)
Application to Learning to Rank
ICTIR 2016, Newark, DE
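How to Rank Documents using Supervised Learning
[Figure: Training phase feeding a Learning System; Prediction phase in which a Ranking System orders the Indexed Documents]
Training data: for each query $q_i$, the documents $d_{i,1}, d_{i,2}, \ldots$ associated with it and their relevance labels $y_{i,1}, y_{i,2}, \ldots$, where
$q_i$: query $i$
$d_{i,j}$: document $j$ associated with query $i$
$y_{i,j}$: relevance label for the $j$-th document associated with the $i$-th query
$f(q_i, d_{i,j})$: scoring function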
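Learning to Rank
Each query $q_i$ comes with its documents $d_{i,1}, d_{i,2}, \ldots$; every query-document pair is described by a feature vector
$\mathbf{x}_{i,j} = \left(x^{(1)}_{i,j}, x^{(2)}_{i,j}, x^{(3)}_{i,j}, \ldots, x^{(K)}_{i,j}\right)$
and carries a relevance label $y_{i,j}$ (columns of the figure: Query, Documents, Query-Document features, Labels). The scoring function is learned on these vectors:
$f(q_i, d_{i,j}) \approx f(\mathbf{x}_{i,j})$
K is in the order of hundreds or thousands.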
Proposed Algorithms for Feature Selection
We propose the following algorithms:
Naïve Greedy search Algorithm for feature Selection (N-GAS)
eXtended naïve Greedy search Algorithm for feature Selection (X-GAS)
Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS)
We compare them with the Greedy search Algorithm for feature Selection (GAS) proposed by Geng, Liu, Qin and Li (SIGIR 2007).
All the competing FSAs belong to the family of filter methods.
The competing FSAs try to maximize the importance of each feature w.r.t. the relevance judgements and to minimize the similarity among the selected features.
Both X-GAS and GAS require hyper-parameter calibration.
Proposed Algorithms for Feature Selection: N-GAS
The graph is built (one node per feature) and the subset S of n = 4 selected features is initialized.
[Figure: feature graph; each node carries the importance of that feature w.r.t. the query-offer relevance judgements (e.g. the 8th feature), each edge carries the similarity between two features (e.g. features 6 and 7)]
Start by adding the node with the highest importance to S (Node ❶ in this example).
• Let $a$ be the node having the lowest similarity w.r.t. Node ❶.
• Let $b$ be the node having the highest similarity w.r.t. Node $a$.
From $(a, b)$ select the node with the highest importance and add it to S.
• Let ❷ be the node having the lowest similarity w.r.t. the node just added to S.
• Let ❸ be the node having the highest similarity w.r.t. Node ❷.
From (❷, ❸) select the node with the highest importance and add it to S (Node ❷ in the example).
• Let ❹ be the node having the lowest similarity w.r.t. Node ❷.
• Let ❽ be the node having the highest similarity w.r.t. Node ❹.
From (❹, ❽) select the node with the highest importance and add it to S (Node ❹ in the example).
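The N-GAS steps above can be sketched in Python as follows. This is a minimal reading of the slides, not the authors' code: it assumes importance and similarity are precomputed (for instance NDCG@10 per feature and Spearman correlation between features, as in the experimental framework), that $a$ and $b$ are drawn only from features not yet in S, and that ties go to the lowest feature index. The function name `n_gas` and its parameters are hypothetical.

```python
from typing import List, Sequence


def n_gas(importance: Sequence[float],
          similarity: Sequence[Sequence[float]],
          n: int) -> List[int]:
    """Sketch of N-GAS (hypothetical helper): n feature indices in selection order.

    importance[i]    -- importance of feature i w.r.t. the relevance labels
    similarity[i][j] -- symmetric similarity between features i and j
    """
    k = len(importance)
    if not 1 <= n <= k:
        raise ValueError("n must satisfy 1 <= n <= K")

    # Start from the most important feature (Node 1 in the slides' example).
    last = max(range(k), key=lambda i: importance[i])
    selected = [last]
    remaining = set(range(k)) - {last}

    while len(selected) < n:
        pool = sorted(remaining)  # sorted so that ties go to the lowest index (assumption)
        # a: the not-yet-selected feature least similar to the last one added.
        a = min(pool, key=lambda j: similarity[last][j])
        candidates = [a]
        # b: the not-yet-selected feature (other than a) most similar to a.
        others = [j for j in pool if j != a]
        if others:
            candidates.append(max(others, key=lambda j: similarity[a][j]))
        # Add whichever of (a, b) is more important.
        last = max(candidates, key=lambda i: importance[i])
        selected.append(last)
        remaining.discard(last)
    return selected
```

For example, with `importance = [0.9, 0.5, 0.7, 0.3]` and a 4 x 4 similarity matrix whose off-diagonal entries are 0.2 (0,1), 0.8 (0,2), 0.1 (0,3), 0.4 (1,2), 0.6 (1,3) and 0.3 (2,3), `n_gas(importance, similarity, 3)` returns `[0, 1, 2]`. Each step only scans the remaining features twice, which is why no hyper-parameter is involved.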
Proposed Algorithms for Feature Selection: X-GAS
The graph is built and the subset S of n = 4 selected features is initialized.
Start by adding the node with the highest importance to S (Node ❶ in this example).
Select the 50% of the nodes least similar to ❶ (this percentage is the filter parameter).
From this selection, take the node with the highest importance and add it to S.
Select the 50% of the nodes least similar to the node just added to S.
From this selection, take the node with the highest importance and add it to S (Node ❸ in the example).
Select the 50% of the nodes least similar to ❸.
From this selection, take the node with the highest importance and add it to S (Node ❹ in the example).
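A possible Python sketch of the X-GAS loop is shown below. It assumes the filter parameter is the fraction p of the remaining features that is kept (0.5 in the example above), rounded up and never below one; the results tables report X-GAS with p = 0.05 and p = 0.8, but these slides do not define p further, so treat that reading as an assumption. The function name `x_gas` is hypothetical.

```python
import math
from typing import List, Sequence


def x_gas(importance: Sequence[float],
          similarity: Sequence[Sequence[float]],
          n: int,
          p: float = 0.5) -> List[int]:
    """Sketch of X-GAS (hypothetical helper): n feature indices in selection order."""
    k = len(importance)
    if not 1 <= n <= k:
        raise ValueError("n must satisfy 1 <= n <= K")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")

    # Start from the most important feature.
    last = max(range(k), key=lambda i: importance[i])
    selected = [last]
    remaining = [i for i in range(k) if i != last]

    while len(selected) < n:
        # Filter: order the remaining features from least to most similar to
        # the last one added and keep the first ceil(p * m) of them
        # (rounding up, with at least one kept, is an assumption).
        ranked = sorted(remaining, key=lambda j: similarity[last][j])
        keep = max(1, math.ceil(p * len(ranked)))
        # From the filtered set, take the most important feature.
        last = max(ranked[:keep], key=lambda i: importance[i])
        selected.append(last)
        remaining.remove(last)
    return selected
```

With p = 1.0 the filter keeps every remaining feature, so X-GAS degenerates to "pick the next most important feature"; smaller p pushes the selection towards features dissimilar to the last one chosen, which is why p needs calibration.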
Proposed Algorithms for Feature Selection: H-GAS
[Figure: hierarchical clustering dendrograms over the 8 feature nodes]
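The slides show H-GAS only as dendrograms, and the results compare "single" and "ward" linkage. One plausible reading, which is an assumption rather than something these slides state, is: cluster the features hierarchically on the distance 1 - S until n clusters remain, then keep the most important feature of each cluster. The sketch below implements only single linkage in plain Python; Ward linkage would need a different merge criterion (`scipy.cluster.hierarchy.linkage` supports both `method="single"` and `method="ward"`). The function name `h_gas_single` is hypothetical.

```python
from typing import List, Sequence


def h_gas_single(importance: Sequence[float],
                 distance: Sequence[Sequence[float]],
                 n: int) -> List[int]:
    """Hypothetical H-GAS sketch: single-linkage clustering into n clusters,
    then one representative (the most important feature) per cluster."""
    k = len(importance)
    if not 1 <= n <= k:
        raise ValueError("n must satisfy 1 <= n <= K")

    clusters: List[List[int]] = [[i] for i in range(k)]
    # Agglomerative clustering: repeatedly merge the two closest clusters.
    while len(clusters) > n:
        best = None  # (linkage distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(distance[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is still valid

    # One representative per cluster: its most important feature.
    chosen = [max(c, key=lambda i: importance[i]) for c in clusters]
    return sorted(chosen, key=lambda i: importance[i], reverse=True)
```

The naive merge loop is cubic in K per merge, which is fine for a sketch; for hundreds of features a library implementation of hierarchical clustering is the practical choice.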
Application to Web Search Engine Data
Bing data: http://research.microsoft.com/en-us/projects/mslr/
Yahoo! data: http://webscope.sandbox.yahoo.com

Yahoo! dataset
              Train      Validation   Test
# queries     19,944     2,994        6,983
# urls        473,134    71,083       165,660
# features    519

Bing dataset
              Train      Validation   Test
# queries     18,919     6,306        6,306
# urls        723,412    235,259      241,521
# features    136
Experimental Framework
Importance $I(f_i)$: NDCG@10 obtained using each feature $f_i$ as a ranking model
Similarity $S(f_i, f_j)$: Spearman rank correlation coefficient
Distance $d(f_i, f_j) = 1 - S(f_i, f_j)$
L2R algorithm: LambdaMART
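The three quantities above can be sketched in Python as follows. Assumptions not stated in the slides: NDCG uses the common exponential gain 2^rel - 1 with a log2 discount, a feature is used as a ranker by sorting each query's documents by descending feature value, queries without relevant documents score 0, and Spearman correlation is computed over all query-document rows pooled together, with tied values sharing their average rank. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def dcg_at_k(labels: Sequence[float], k: int = 10) -> float:
    # Exponential gain 2^rel - 1, discounted by log2(position + 1), 1-based positions.
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[float], k: int = 10) -> float:
    # Rank by descending score; the sort is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    idcg = dcg_at_k(sorted(labels, reverse=True), k)
    if idcg == 0:
        return 0.0  # assumption: a query with no relevant documents scores 0
    return dcg_at_k([labels[i] for i in order], k) / idcg


def feature_importance(qids: Sequence[Hashable], feature: Sequence[float],
                       labels: Sequence[float], k: int = 10) -> float:
    """I(f): mean NDCG@k over queries when feature f alone is the ranking score."""
    rows_by_query: Dict[Hashable, List[int]] = defaultdict(list)
    for row, q in enumerate(qids):
        rows_by_query[q].append(row)
    per_query = [ndcg_at_k([feature[r] for r in rows], [labels[r] for r in rows], k)
                 for rows in rows_by_query.values()]
    return sum(per_query) / len(per_query)


def _average_ranks(x: Sequence[float]) -> List[float]:
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # tied values share their mean rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """S(f_i, f_j): Pearson correlation of the two rank vectors."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm if norm > 0 else 0.0  # assumption: constant features get 0


def distance(x: Sequence[float], y: Sequence[float]) -> float:
    """d(f_i, f_j) = 1 - S(f_i, f_j), ranging from 0 (identical order) to 2."""
    return 1.0 - spearman(x, y)
```

In practice `scipy.stats.spearmanr` computes the same coefficient; the hand-written version just keeps the sketch dependency-free.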
Experimental Protocol
1. Select a subset of n ≤ K features using a given FSA.
2. Train LambdaMART using the n features.
3. Measure LambdaMART performance on the test set.
4. Compare the FSAs using average NDCG@10.
Repeat for different n in {5%K, 10%K, 20%K, 30%K, 40%K, 50%K, 75%K, K}.
Repeat from step 1 for each FSA.
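The protocol loop can be sketched as below. It is only scaffolding: `fsas` and `train_and_eval` are hypothetical callbacks (the latter standing in for "train LambdaMART on these features and return its NDCG@10 on the test set"), and rounding n to the nearest integer, with duplicates collapsed, is an assumption.

```python
from typing import Callable, Dict, List, Sequence

# Subset sizes from the protocol: 5%K, 10%K, 20%K, 30%K, 40%K, 50%K, 75%K and K.
FRACTIONS = (0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.75, 1.00)


def subset_sizes(k: int, fractions: Sequence[float] = FRACTIONS) -> List[int]:
    # Round to the nearest integer, at least 1; duplicate sizes are collapsed (assumption).
    return sorted({max(1, round(f * k)) for f in fractions})


def run_protocol(fsas: Dict[str, Callable[[int], List[int]]],
                 train_and_eval: Callable[[List[int]], float],
                 k: int) -> Dict[str, Dict[int, float]]:
    """Hypothetical driver: for each FSA and each subset size n, select n
    features, train the ranker on them and record its test-set score."""
    results: Dict[str, Dict[int, float]] = {}
    for name, select in fsas.items():                    # repeat for each FSA
        results[name] = {}
        for n in subset_sizes(k):                         # repeat for different n
            features = select(n)                          # 1. select n features
            results[name][n] = train_and_eval(features)   # 2.-3. train and measure
    return results
```

Under this rounding, the Bing feature set (K = 136) gives subset sizes 7, 14, 27, 41, 54, 68, 102 and 136; comparing the FSAs (step 4) is then a matter of reading across `results`.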
Results on “Bing” dataset
[Figure: NDCG@10 vs. feature subset size, as % of the feature set size (K)]
Results on “Yahoo!” dataset
[Figure: NDCG@10 vs. feature subset size, as % of the feature set size (K)]
Results (NDCG@10 by feature subset dimension, as % of K)

Bing dataset
FSA                 5%        10%       20%       30%       40%       100%
N-GAS               0.4011▼   0.4459    0.471     0.4739▼   0.4813    0.4863
X-GAS, p = 0.05     0.4376▲   0.4528    0.4577▼   0.4825    0.4834    0.4863
H-GAS, "single"     0.4423▲   0.4643▲   0.4870▲   0.4854    0.4848    0.4863
H-GAS, "ward"       0.4289    0.4434▼   0.4820    0.4879    0.4853    0.4863
GAS, c = 0.01       0.4294    0.4515    0.4758    0.4848    0.4863    0.4863

Yahoo! dataset
FSA                 5%        10%       20%       30%       40%       100%
N-GAS               0.7430▼   0.7601    0.7672    0.7717    0.7724    0.7753
X-GAS, p = 0.8      0.7655    0.7666    0.7723    0.7742    0.7751    0.7753
H-GAS, "single"     0.7350▼   0.7635    0.7666    0.7738    0.7742    0.7753
H-GAS, "ward"       0.7570▼   0.7626    0.7704    0.7743    0.7755    0.7753
GAS, c = 0.01       0.7628    0.7649    0.7671    0.773     0.7737    0.7753
Conclusion
X-GAS and H-GAS show performance greater than or equal to that of the benchmark model (GAS).
H-GAS and N-GAS are more efficient than the others because they do not need any hyper-parameter calibration.
Future work:
experiments on the new LtR dataset provided by istella* (http://blog.istella.it/istella-learning-to-rank-dataset/);
application to other ML contexts, sorting problems and ensemble learning.
Thank you, and special thanks to ACM SIGIR for the Travel Grant support.
Andrea Gigli Email: andrgig@gmail.com
Twitter: @andrgig
http://www.slideshare.net/andrgig

More Related Content

What's hot

Linear search Algorithm
Linear search AlgorithmLinear search Algorithm
Linear search Algorithmamit kumar
 
Sampling methods for graphs
Sampling methods for graphsSampling methods for graphs
Sampling methods for graphsAntoine Rebecq
 
Sampling the Twitter graph
Sampling the Twitter graphSampling the Twitter graph
Sampling the Twitter graphAntoine Rebecq
 
Silicon valleycodecamp2013
Silicon valleycodecamp2013Silicon valleycodecamp2013
Silicon valleycodecamp2013Sanjeev Mishra
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxRuby Shrestha
 

What's hot (7)

OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
Linear search Algorithm
Linear search AlgorithmLinear search Algorithm
Linear search Algorithm
 
Sampling methods for graphs
Sampling methods for graphsSampling methods for graphs
Sampling methods for graphs
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
 
Sampling the Twitter graph
Sampling the Twitter graphSampling the Twitter graph
Sampling the Twitter graph
 
Silicon valleycodecamp2013
Silicon valleycodecamp2013Silicon valleycodecamp2013
Silicon valleycodecamp2013
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
 

Viewers also liked

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkSimon Hughes
 
Learning to Rank: An Introduction to LambdaMART
Learning to Rank: An Introduction to LambdaMARTLearning to Rank: An Introduction to LambdaMART
Learning to Rank: An Introduction to LambdaMARTJulian Qian
 
Mine the Wine by Andrea Gigli
Mine the Wine by Andrea GigliMine the Wine by Andrea Gigli
Mine the Wine by Andrea GigliAndrea Gigli
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
22 Machine Learning Feature Selection
22 Machine Learning Feature Selection22 Machine Learning Feature Selection
22 Machine Learning Feature SelectionAndres Mendez-Vazquez
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
Bank: Trends, Tech and Future
Bank: Trends, Tech and FutureBank: Trends, Tech and Future
Bank: Trends, Tech and FutureIvano Digital
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 
2016 Banking Trends
2016 Banking Trends2016 Banking Trends
2016 Banking TrendsMX
 

Viewers also liked (14)

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
 
Learning to Rank: An Introduction to LambdaMART
Learning to Rank: An Introduction to LambdaMARTLearning to Rank: An Introduction to LambdaMART
Learning to Rank: An Introduction to LambdaMART
 
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selectio...
 
Mine the Wine by Andrea Gigli
Mine the Wine by Andrea GigliMine the Wine by Andrea Gigli
Mine the Wine by Andrea Gigli
 
Feature selection
Feature selectionFeature selection
Feature selection
 
22 Machine Learning Feature Selection
22 Machine Learning Feature Selection22 Machine Learning Feature Selection
22 Machine Learning Feature Selection
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
Bank: Trends, Tech and Future
Bank: Trends, Tech and FutureBank: Trends, Tech and Future
Bank: Trends, Tech and Future
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
2016 Banking Trends
2016 Banking Trends2016 Banking Trends
2016 Banking Trends
 

Similar to Fast Feature Selection for Learning to Rank - ACM International Conference on Information Retrieval 2016 - Newark, DE, USA

Feature Selection for Document Ranking
Feature Selection for Document RankingFeature Selection for Document Ranking
Feature Selection for Document RankingAndrea Gigli
 
Development Infographic
Development InfographicDevelopment Infographic
Development InfographicRealMassive
 
Sparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R usersSparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R usersICTeam S.p.A.
 
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAMSparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAMData Science Milan
 
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit
 
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Eugenio Villar
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...IRJET Journal
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1kavya R
 
Engineering C-programing module1 ppt (18CPS13/23)
Engineering C-programing module1 ppt (18CPS13/23)Engineering C-programing module1 ppt (18CPS13/23)
Engineering C-programing module1 ppt (18CPS13/23)kavya R
 
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...EuroIoTa
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationCSCJournals
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Ada lab manual
Ada lab manualAda lab manual
Ada lab manualaman713418
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 
Product Recommendation System​ By Using Collaborative Filtering and Network B...
Product Recommendation System​ By Using Collaborative Filtering and Network B...Product Recommendation System​ By Using Collaborative Filtering and Network B...
Product Recommendation System​ By Using Collaborative Filtering and Network B...NathanonKaewsamertha
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Dr Sulaimon Afolabi
 

Similar to Fast Feature Selection for Learning to Rank - ACM International Conference on Information Retrieval 2016 - Newark, DE, USA (20)

Feature Selection for Document Ranking
Feature Selection for Document RankingFeature Selection for Document Ranking
Feature Selection for Document Ranking
 
Sorting_project_2.pdf
Sorting_project_2.pdfSorting_project_2.pdf
Sorting_project_2.pdf
 
Development Infographic
Development InfographicDevelopment Infographic
Development Infographic
 
Sparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R usersSparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R users
 
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAMSparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
Sparklyr: Big Data enabler for R users - Serena Signorelli, ICTEAM
 
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
 
Project
ProjectProject
Project
 
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1
 
Engineering C-programing module1 ppt (18CPS13/23)
Engineering C-programing module1 ppt (18CPS13/23)Engineering C-programing module1 ppt (18CPS13/23)
Engineering C-programing module1 ppt (18CPS13/23)
 
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Ada lab manual
Ada lab manualAda lab manual
Ada lab manual
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
Product Recommendation System​ By Using Collaborative Filtering and Network B...
Product Recommendation System​ By Using Collaborative Filtering and Network B...Product Recommendation System​ By Using Collaborative Filtering and Network B...
Product Recommendation System​ By Using Collaborative Filtering and Network B...
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1
 

More from Andrea Gigli

How organizations can become data-driven: three main rules
How organizations can become data-driven: three main rulesHow organizations can become data-driven: three main rules
How organizations can become data-driven: three main rulesAndrea Gigli
 
Equity Value for Startups.pdf
Equity Value for Startups.pdfEquity Value for Startups.pdf
Equity Value for Startups.pdfAndrea Gigli
 
Introduction to recommender systems
Introduction to recommender systemsIntroduction to recommender systems
Introduction to recommender systemsAndrea Gigli
 
Data Analytics per Manager
Data Analytics per ManagerData Analytics per Manager
Data Analytics per ManagerAndrea Gigli
 
Balance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVABalance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVAAndrea Gigli
 
Reasons behind XVAs
Reasons behind XVAs Reasons behind XVAs
Reasons behind XVAs Andrea Gigli
 
Recommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial ServicesRecommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial ServicesAndrea Gigli
 
Using R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective DashboardUsing R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective DashboardAndrea Gigli
 
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...Andrea Gigli
 
Comparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningComparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningAndrea Gigli
 
Electricity Derivatives
Electricity DerivativesElectricity Derivatives
Electricity DerivativesAndrea Gigli
 
Crawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - ItalianoCrawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - ItalianoAndrea Gigli
 
Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015Andrea Gigli
 
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQLA Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQLAndrea Gigli
 
Search Engine Query Suggestion Application
Search Engine Query Suggestion ApplicationSearch Engine Query Suggestion Application
Search Engine Query Suggestion ApplicationAndrea Gigli
 
From real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cvaFrom real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cvaAndrea Gigli
 
Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014Andrea Gigli
 
Lean Methods for Business & Social Innovation
Lean Methods for Business & Social InnovationLean Methods for Business & Social Innovation
Lean Methods for Business & Social InnovationAndrea Gigli
 
Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013Andrea Gigli
 
Come calcolare l’equity value di una startup
Come calcolare l’equity value di una startupCome calcolare l’equity value di una startup
Come calcolare l’equity value di una startupAndrea Gigli
 

More from Andrea Gigli (20)

How organizations can become data-driven: three main rules
How organizations can become data-driven: three main rulesHow organizations can become data-driven: three main rules
How organizations can become data-driven: three main rules
 
Equity Value for Startups.pdf
Equity Value for Startups.pdfEquity Value for Startups.pdf
Equity Value for Startups.pdf
 
Introduction to recommender systems
Introduction to recommender systemsIntroduction to recommender systems
Introduction to recommender systems
 
Data Analytics per Manager
Data Analytics per ManagerData Analytics per Manager
Data Analytics per Manager
 
Balance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVABalance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVA
 
Reasons behind XVAs
Reasons behind XVAs Reasons behind XVAs
Reasons behind XVAs
 
Recommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial ServicesRecommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial Services
 
Using R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective DashboardUsing R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective Dashboard
 
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
 
Comparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningComparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text Mining
 
Electricity Derivatives
Electricity DerivativesElectricity Derivatives
Electricity Derivatives
 
Crawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - ItalianoCrawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - Italiano
 
Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015
 
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQLA Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
 
Search Engine Query Suggestion Application
Search Engine Query Suggestion ApplicationSearch Engine Query Suggestion Application
Search Engine Query Suggestion Application
 
From real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cvaFrom real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cva
 
Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014
 
Lean Methods for Business & Social Innovation
Lean Methods for Business & Social InnovationLean Methods for Business & Social Innovation
Lean Methods for Business & Social Innovation
 
Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013
 
Come calcolare l’equity value di una startup
Come calcolare l’equity value di una startupCome calcolare l’equity value di una startup
Come calcolare l’equity value di una startup
 


Fast Feature Selection for Learning to Rank - ACM International Conference on the Theory of Information Retrieval (ICTIR) 2016 - Newark, DE, USA

  • 1. ACM International Conference on the Theory of Information Retrieval University of Delaware, Newark, DE, USA September 13-16, 2016 Fast Feature Selection Algorithms for Learning to Rank Andrea Gigli Department of Computer Science, University of Pisa & ISTI – CNR Pisa Franco Maria Nardini, Claudio Lucchese, Raffaele Perego ISTI – CNR Pisa & istella*, Pisa
  • 2. Outline Introduction Proposed Feature Selection Algorithms (FSA) Application to Learning to Rank ICTIR 2016, Newark, DE
  • 3. Outline Introduction Proposed Feature Selection Algorithms (FSA) Application to Learning to Rank
  • 4. How to Rank Documents using Supervised Learning [Figure: a Training phase with a Learning System and a Prediction phase with a Ranking System over the Indexed Documents.] Notation: $q_i$: query $i$; $d_{i,j}$: document $j$ associated with query $i$; $y_{i,j}$: relevance label of the $j$-th document associated with the $i$-th query; $f(\cdot)$: scoring function.
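  • 5. Learning to Rank [Figure: a table with columns Query, Documents, Query-Document features and Labels.] For query $q_i$ the training examples are $(q_i, d_{i,1}, y_{i,1}), \dots, (q_i, d_{i,n_i}, y_{i,n_i})$, and each query-document pair is described by a feature vector $x_{i,j} = (x_{i,j}^{(1)}, x_{i,j}^{(2)}, x_{i,j}^{(3)}, \dots, x_{i,j}^{(K)})$. The scoring function is learned so that $y_{i,j} \approx f(x_{i,j})$. K is in the order of hundreds or thousands.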
  • 6. Outline Introduction Proposed Feature Selection Algorithms (FSA) Application to Learning to Rank
  • 7. Proposed Algorithms for Feature Selection We propose the following algorithms: Naïve Greedy search Algorithm for feature Selection (N-GAS); eXtended naïve Greedy search Algorithm for feature Selection (X-GAS); Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS).
  • 8. Proposed Algorithms for Feature Selection We compare them with the Greedy search Algorithm for feature Selection (GAS) proposed by Geng, Liu, Qin and Li (SIGIR 2007). All the competing FSAs belong to the family of filter methods: they try to maximize the importance of each selected feature w.r.t. the relevance judgements while minimizing the similarity among the selected features. Both X-GAS and GAS require hyper-parameter calibration.
  • 9. Proposed Algorithms for Feature Selection Naïve Greedy search Algorithm for feature Selection (N-GAS); eXtended naïve Greedy search Algorithm for feature Selection (X-GAS); Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS).
  • 10. Proposed Algorithms for Feature Selection: N-GAS The graph is built and the subset S of n = 4 selected features is initialized. [Figure annotations: the importance of the 8th feature w.r.t. the query-offer relevance judgements; the similarity between the 6th and 7th features.]
  • 11. Proposed Algorithms for Feature Selection: N-GAS Start by adding the node with the highest importance to S (Node ❶ in this example)
  • 12. Proposed Algorithms for Feature Selection: N-GAS • Let be the node having the lowest similarity w.r.t. Node ❶ • Let be the node having the highest similarity w.r.t. Node
  • 13. Proposed Algorithms for Feature Selection: N-GAS From ( , ) select the Node with the highest importance and add it to S (Node in the example).
  • 14. Proposed Algorithms for Feature Selection: N-GAS • Let ❷ be the node having the lowest similarity w.r.t. Node • Let ❸ be the node having the highest similarity w.r.t. Node ❷
  • 15. Proposed Algorithms for Feature Selection: N-GAS From (❷, ❸) select the node with the highest importance and add it to S (Node ❷ in the example).
  • 16. Proposed Algorithms for Feature Selection: N-GAS • Let ❹ be the node having the lowest similarity w.r.t. Node ❷ • Let ❽ be the node having the highest similarity w.r.t. Node ❹
  • 17. Proposed Algorithms for Feature Selection: N-GAS From (❹, ❽) select the node with the highest importance and add it to S (Node ❹ in the example).
  • 18. Proposed Algorithms for Feature Selection Naïve Greedy search Algorithm for feature Selection (N-GAS); eXtended naïve Greedy search Algorithm for feature Selection (X-GAS); Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS).
  • 19. Proposed Algorithms for Feature Selection: X-GAS The graph is built and the Subset S of n=4 selected features is initialized.
  • 20. Proposed Algorithms for Feature Selection: X-GAS Start by adding the node with the highest importance to S (Node ❶ in this example)
  • 21. Proposed Algorithms for Feature Selection: X-GAS Select the 50% of the nodes least similar to ❶ (this percentage is the filter parameter).
  • 22. Proposed Algorithms for Feature Selection: X-GAS From the selection, take the node with the highest importance and add it to S (Node in the example).
  • 23. Proposed Algorithms for Feature Selection: X-GAS Select the 50% of the nodes least similar to
  • 24. Proposed Algorithms for Feature Selection: X-GAS From the selection, take the node with the highest importance and add it to S (Node ❸ in the example).
  • 25. Proposed Algorithms for Feature Selection: X-GAS Select the 50% of the nodes least similar to ❸.
  • 26. Proposed Algorithms for Feature Selection: X-GAS From the selection, take the node with the highest importance and add it to S (Node ❹ in the example).
  • 27. Proposed Algorithms for Feature Selection Naïve Greedy search Algorithm for feature Selection (N-GAS); eXtended naïve Greedy search Algorithm for feature Selection (X-GAS); Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS).
  • 28. Proposed Algorithms for Feature Selection: H-GAS [Figure: dendrograms from the hierarchical clustering of the 8 example features (only the node labels survive).]
  • 29. Outline Introduction Proposed Feature Selection Algorithms (FSA) Application to Learning to Rank
  • 30. Application to Web Search Engine Data
    Bing data (http://research.microsoft.com/en-us/projects/mslr/)
                   Train     Validation   Test
      # queries    18,919    6,306        6,306
      # urls       723,412   235,259      241,521
      # features   136
    Yahoo! data (http://webscope.sandbox.yahoo.com)
                   Train     Validation   Test
      # queries    19,944    2,994        6,983
      # urls       473,134   71,083       165,660
      # features   519
  • 31. Experimental Framework Importance $I(F_k)$: NDCG@10 obtained using feature $F_k$ alone as a ranking model. Similarity $S(F_k, F_l)$: Spearman rank correlation coefficient. Distance $D(F_k, F_l) = 1 - S(F_k, F_l)$. L2R algorithm: LambdaMART.
  • 32. Experimental Protocol ❶ Select a subset of n < K features using a given FSA. ❷ Train LambdaMART using the n features. ❸ Measure LambdaMART performance on the test set. Repeat for different n in {5%K, 10%K, 20%K, 30%K, 40%K, 50%K, 75%K, K}. ❹ Compare the FSAs using the average NDCG@10. Repeat from ❶ for each FSA.
  • 33. Results on “Bing” dataset [Figure: NDCG@10 against the feature subset size as % of the feature set size (K).]
  • 34. Results on “Yahoo!” dataset [Figure: NDCG@10 against the feature subset size as % of the feature set size (K).]
  • 35. Results (feature subset dimension as % of K)
    Bing dataset
      Algorithm         5%        10%       20%       30%       40%       100%
      N-GAS             0.4011▼   0.4459    0.471     0.4739▼   0.4813    0.4863
      X-GAS, p = 0.05   0.4376▲   0.4528    0.4577▼   0.4825    0.4834    0.4863
      H-GAS, "single"   0.4423▲   0.4643▲   0.4870▲   0.4854    0.4848    0.4863
      H-GAS, "ward"     0.4289    0.4434▼   0.4820    0.4879    0.4853    0.4863
      GAS, c = 0.01     0.4294    0.4515    0.4758    0.4848    0.4863    0.4863
    Yahoo! dataset
      Algorithm         5%        10%       20%       30%       40%       100%
      N-GAS             0.7430▼   0.7601    0.7672    0.7717    0.7724    0.7753
      X-GAS, p = 0.8    0.7655    0.7666    0.7723    0.7742    0.7751    0.7753
      H-GAS, "single"   0.7350▼   0.7635    0.7666    0.7738    0.7742    0.7753
      H-GAS, "ward"     0.7570▼   0.7626    0.7704    0.7743    0.7755    0.7753
      GAS, c = 0.01     0.7628    0.7649    0.7671    0.773     0.7737    0.7753
  • 36. Conclusion X-GAS and H-GAS perform as well as or better than the benchmark model (GAS). H-GAS and N-GAS are more efficient than the others because they do not need any hyper-parameter calibration. Future work: experiments on the new LtR dataset provided by istella* (http://blog.istella.it/istella-learning-to-rank-dataset/); application to other ML contexts, sorting problems and ensemble learning.
  • 37. Thank you, and special thanks to ACM-SIGIR for the Travel Grant support. Andrea Gigli Email: andrgig@gmail.com Twitter: @andrgig http://www.slideshare.net/andrgig
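As an illustration of the prediction step described on slides 4 and 5, the sketch below scores the documents of a single query with a scoring function f and sorts them by decreasing score. It is a minimal sketch under stated assumptions: the linear form f(x) = w · x and the names `linear_score` and `rank_documents` are hypothetical, chosen only for illustration; the slides do not fix the model form (the experiments use LambdaMART, a tree ensemble).

```python
from typing import List, Sequence


def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical scoring function f(x) = w . x for one query-document feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def rank_documents(doc_features: List[Sequence[float]], weights: Sequence[float]) -> List[int]:
    """Return the indices of one query's documents, best-scored first."""
    scores = [linear_score(x, weights) for x in doc_features]
    # sorted() is stable (also with reverse=True): equal scores keep document order.
    return sorted(range(len(doc_features)), key=lambda j: scores[j], reverse=True)


# One query q_i with three candidate documents, each described by K = 3 features.
docs = [
    [0.2, 1.0, 0.0],
    [0.9, 0.1, 0.5],
    [0.4, 0.4, 0.4],
]
w = [1.0, 0.5, -0.2]  # hypothetical learned weights
print(rank_documents(docs, w))  # [1, 0, 2]
```

Any learned model (including a LambdaMART ensemble) can replace `linear_score`; the ranking step only needs one score per query-document pair.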
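Slide 8 names GAS (Geng, Liu, Qin and Li, SIGIR 2007) as the benchmark. Below is a minimal sketch of a GAS-style greedy filter, assuming the update rule as we recall it from that paper: after picking the feature with the largest current importance, every remaining feature j has its importance reduced by 2c times its similarity to the picked feature, so c (c = 0.01 in the results tables) trades importance against redundancy. The function name `gas_select` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def gas_select(importance: Sequence[float],
               similarity: Sequence[Sequence[float]],
               n: int, c: float) -> List[int]:
    """GAS-style greedy filter (our reading of Geng et al., SIGIR 2007).

    importance[j]    -- importance of feature j (e.g. its NDCG@10 as a lone ranker)
    similarity[i][j] -- similarity between features i and j (symmetric)
    c                -- penalty weight; larger c favours less redundant features
    """
    weights = list(importance)             # working copy, penalised as we go
    remaining = list(range(len(weights)))
    selected: List[int] = []
    while remaining and len(selected) < n:
        # max() returns the first maximum, so ties go to the lower index.
        best = max(remaining, key=lambda j: weights[j])
        selected.append(best)
        remaining.remove(best)
        # Penalise the remaining features in proportion to their similarity to `best`.
        for j in remaining:
            weights[j] -= 2 * c * similarity[best][j]
    return selected


importance = [0.50, 0.48, 0.30, 0.45]
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.2, 0.3, 1.0],
]
print(gas_select(importance, similarity, n=3, c=0.1))  # [0, 3, 1]
print(gas_select(importance, similarity, n=3, c=0.5))  # [0, 3, 2]
```

With c = 0.1 feature 1, which is almost a copy of feature 0, still gets selected; with c = 0.5 its penalty pushes it below feature 2. This sensitivity is why GAS needs its hyper-parameter calibrated.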
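The N-GAS walkthrough on slides 10 to 17 can be sketched as below. Assumptions not fixed by the slides: `importance` is a list of per-feature scores and `similarity` a symmetric matrix (for example NDCG@10 and Spearman correlation, as on slide 31); the "lowest" and "highest" similarity searches run only over features not yet in S; when a single candidate remains it is taken directly; ties go to the lower index. The function name `n_gas` and the toy data are hypothetical.

```python
from typing import List, Sequence


def n_gas(importance: Sequence[float],
          similarity: Sequence[Sequence[float]],
          n: int) -> List[int]:
    """Sketch of N-GAS following the slide walkthrough (hypothetical helper)."""
    k = len(importance)
    # Seed S with the most important feature.
    first = max(range(k), key=lambda j: importance[j])
    selected = [first]
    remaining = [j for j in range(k) if j != first]
    while remaining and len(selected) < n:
        last = selected[-1]
        # Candidate a: unselected node least similar to the last node added to S.
        a = min(remaining, key=lambda j: similarity[last][j])
        # Candidate b: unselected node (other than a) most similar to a.
        others = [j for j in remaining if j != a]
        if others:
            b = max(others, key=lambda j: similarity[a][j])
            # Keep the more important node of the pair (a wins ties).
            pick = a if importance[a] >= importance[b] else b
        else:
            pick = a
        selected.append(pick)
        remaining.remove(pick)
    return selected


importance = [0.9, 0.2, 0.6, 0.5, 0.3]
similarity = [
    [1.0, 0.1, 0.8, 0.4, 0.5],
    [0.1, 1.0, 0.3, 0.7, 0.2],
    [0.8, 0.3, 1.0, 0.6, 0.1],
    [0.4, 0.7, 0.6, 1.0, 0.35],
    [0.5, 0.2, 0.1, 0.35, 1.0],
]
print(n_gas(importance, similarity, n=3))  # [0, 3, 4]
```

In the first step, feature 1 (least similar to feature 0) is paired with feature 3 (most similar to feature 1), and feature 3 is kept because it is the more important of the two. No hyper-parameter is involved, which matches the efficiency argument on slide 36.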
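A sketch of X-GAS as walked through on slides 19 to 26: after seeding S with the most important feature, keep the fraction p of unselected features that are least similar to the last feature added, then take the most important of them. Assumptions: the number of kept candidates is ceil(p × remaining), with at least one; similarity is measured against the last feature added only, as in the slides' example; `x_gas` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def x_gas(importance: Sequence[float],
          similarity: Sequence[Sequence[float]],
          n: int, p: float) -> List[int]:
    """Sketch of X-GAS following the slide walkthrough (hypothetical helper).

    p is the filter parameter: the fraction of unselected features, least
    similar to the last feature added, among which the next one is chosen.
    """
    k = len(importance)
    first = max(range(k), key=lambda j: importance[j])
    selected = [first]
    remaining = [j for j in range(k) if j != first]
    while remaining and len(selected) < n:
        last = selected[-1]
        # Order unselected features from least to most similar to the last one added.
        by_similarity = sorted(remaining, key=lambda j: similarity[last][j])
        keep = max(1, math.ceil(p * len(by_similarity)))  # assumed rounding rule
        shortlist = by_similarity[:keep]
        # Among the shortlisted features, take the most important one.
        pick = max(shortlist, key=lambda j: importance[j])
        selected.append(pick)
        remaining.remove(pick)
    return selected


importance = [0.9, 0.2, 0.6, 0.5, 0.3]
similarity = [
    [1.0, 0.1, 0.8, 0.4, 0.5],
    [0.1, 1.0, 0.3, 0.7, 0.2],
    [0.8, 0.3, 1.0, 0.6, 0.1],
    [0.4, 0.7, 0.6, 1.0, 0.35],
    [0.5, 0.2, 0.1, 0.35, 1.0],
]
print(x_gas(importance, similarity, n=3, p=0.5))  # [0, 3, 2]
print(x_gas(importance, similarity, n=3, p=1.0))  # [0, 2, 3]
```

With p = 1 the filter keeps every candidate and X-GAS reduces to ranking features by importance; smaller values of p give more weight to diversity. This is the knob that slide 35 reports as p = 0.05 and p = 0.8.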
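Slide 28 shows H-GAS only as dendrograms, and the results tables mention "single" and "ward" linkage. The sketch below assumes that H-GAS agglomeratively clusters the features with distance D = 1 - S (slide 31) until n clusters remain, then takes the most important feature from each cluster. Only single linkage is implemented, in plain Python so the block is self-contained (Ward linkage needs a different merge criterion). `single_linkage_clusters` and `h_gas` are hypothetical names, and the selected features are returned in decreasing order of importance.

```python
from typing import List, Sequence


def single_linkage_clusters(distance: Sequence[Sequence[float]],
                            n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering with single linkage (O(k^3) overall; fine for a sketch)."""
    clusters = [[i] for i in range(len(distance))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair of clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(distance[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def h_gas(importance: Sequence[float],
          similarity: Sequence[Sequence[float]],
          n: int) -> List[int]:
    """Sketch of H-GAS under the assumptions stated above (hypothetical helper)."""
    k = len(importance)
    distance = [[1.0 - similarity[i][j] for j in range(k)] for i in range(k)]
    clusters = single_linkage_clusters(distance, n)
    # One representative per cluster: its most important feature.
    picks = [max(cluster, key=lambda j: importance[j]) for cluster in clusters]
    return sorted(picks, key=lambda j: importance[j], reverse=True)


importance = [0.9, 0.2, 0.6, 0.5, 0.3]
similarity = [
    [1.0, 0.1, 0.8, 0.4, 0.5],
    [0.1, 1.0, 0.3, 0.7, 0.2],
    [0.8, 0.3, 1.0, 0.6, 0.1],
    [0.4, 0.7, 0.6, 1.0, 0.35],
    [0.5, 0.2, 0.1, 0.35, 1.0],
]
print(h_gas(importance, similarity, n=3))  # [0, 3, 4]
```

On the toy data, the clusters are {0, 2}, {1, 3} and {4}, so the redundant pair 0/2 contributes only feature 0. Like N-GAS, this procedure has no numeric hyper-parameter; only the linkage criterion has to be chosen.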
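The importance, similarity and distance measures of slide 31 can be computed as sketched below. Assumptions not stated on the slide: DCG uses the exponential gain 2^rel - 1 with a log2(rank + 1) discount (1-based ranks); a query whose labels are all zero gets NDCG 0; a single feature ranks documents by decreasing value; the Spearman coefficient is the Pearson correlation of average ranks, computed over paired feature values (for example over all query-document pairs). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    """DCG@k with gain 2^rel - 1 and discount log2(rank + 1), rank 1-based."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """NDCG@k of one query; ranked_labels holds all its labels in ranked order."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


def feature_importance(queries: List[Tuple[List[List[float]], List[int]]],
                       feature: int, k: int = 10) -> float:
    """Mean NDCG@k over queries when `feature` alone ranks each query's documents."""
    scores = []
    for doc_features, labels in queries:
        order = sorted(range(len(labels)), key=lambda j: doc_features[j][feature], reverse=True)
        scores.append(ndcg_at_k([labels[j] for j in order], k))
    return sum(scores) / len(scores)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var) if var > 0 else 0.0


def distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance between two features, D = 1 - S."""
    return 1.0 - spearman(x, y)


queries = [
    # One query: 4 documents x 2 features, with graded relevance labels.
    ([[0.9, 5.0], [0.8, 1.0], [0.1, 3.0], [0.5, 2.0]], [3, 2, 0, 1]),
]
col0 = [row[0] for row in queries[0][0]]
col1 = [row[1] for row in queries[0][0]]
print(feature_importance(queries, 0))  # 1.0 (feature 0 orders the documents ideally)
print(spearman(col0, col1), distance(col0, col1))
```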
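The protocol on slide 32 is a double loop over FSAs and subset sizes. The following is a sketch of that loop. `train_and_evaluate` is a hypothetical placeholder for training LambdaMART on the chosen features and returning the test NDCG@10. The rounding of n (half up, at least 1) is an assumption, since the slide lists only percentages of K.

```python
import math
from typing import Callable, Dict, List

# Subset sizes from slide 32, as fractions of the full feature set size K.
PERCENTAGES = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.75, 1.00]


def subset_sizes(k: int) -> List[int]:
    """n for each percentage of K, rounded half up and kept >= 1 (assumed rounding)."""
    return [max(1, math.floor(p * k + 0.5)) for p in PERCENTAGES]


def run_protocol(fsas: Dict[str, Callable[[int], List[int]]],
                 k: int,
                 train_and_evaluate: Callable[[List[int]], float]) -> Dict[str, Dict[int, float]]:
    """Return results[fsa_name][n] = test score of the model trained on the n selected features."""
    results: Dict[str, Dict[int, float]] = {}
    for name, select in fsas.items():               # repeat for each FSA
        results[name] = {}
        for n in subset_sizes(k):                    # repeat for each subset size n
            features = select(n)                     # step 1: select n features with this FSA
            results[name][n] = train_and_evaluate(features)  # steps 2-3: train, score on test set
    return results  # step 4: compare the FSAs size by size


# Toy stand-ins: a "first n features" selector and a fake evaluator.
toy = run_protocol({"first-n": lambda n: list(range(n))}, k=136,
                   train_and_evaluate=lambda feats: len(feats) / 136)
print(subset_sizes(136))  # [7, 14, 27, 41, 54, 68, 102, 136]
```

Passing the FSAs as callables keeps the loop independent of how each FSA computes importance and similarity. With the rounding assumed here, the Bing feature set (K = 136) yields the subset sizes printed above.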