Slides from my MSc thesis presentation. My work focused on finding academic experts. The state of the art was mainly based on textual similarities between documents and queries. In my work, I proposed supervised learning-to-rank techniques to improve on the current state of the art.


Slides

  1. Instituto Superior Técnico, Universidade Técnica de Lisboa. Learning to Rank Academic Experts. Catarina Moreira
  2. Outline: ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results (✓ Rank Aggregation Framework ✓ Learning to Rank Framework) ✓ Conclusions and Future Work
  3. Expert Finding. Information Retrieval: Gerard Salton, Ricardo Baeza-Yates, Bruce Croft
  4. State of the Art Problems: usage of generative probabilistic models; heuristics are too simple and do not reflect expertise; heuristics are based only on the documents' textual contents
  5. Contributions: 1. Different Sets of Features to Estimate Expertise; 2. Rank Aggregation Framework for Expert Finding; 3. Learning to Rank (L2R) Framework for Expert Finding
  6. Outline: ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results (✓ Rank Aggregation Framework ✓ Learning to Rank Framework) ✓ Conclusions and Future Work
  7. Features: Hypothesis. Multiple estimators of expertise, based on different sources of evidence, will enable the construction of more accurate and reliable ranking models!
  8. Textual Similarity: Term Frequency, Inverse Document Frequency, TF.IDF, BM25 (a scoring sketch follows after the slide list)
  9. Profile Information: ✓ Number of Publications with(out) query topics ✓ Number of Journals with(out) query topics ✓ Years Between Publications with(out) query topics ✓ Average Number of Publications per year
  10. Graphs: ✓ Total/Max/Avg citations of the authors' papers ✓ Total Number of Unique Collaborators ✓ Publications' PageRank ✓ Academic Indexes (a toy graph-feature sketch follows after the slide list)
  11. Hirsch Index (see the h-index sketch after the slide list)
  12. Other Indexes: a-Index; Contemporary h-Index (extension of the h-Index); Trend h-Index (extension of the h-Index)
  13. Datasets: DBLP - Computer Science Dataset - covers journal and conference publications - contains abstracts and citation links - all this information was processed and stored in a database
  14. Datasets: Arnetminer - Validation - contains experts for 13 query topics - experts collected from important Program Committees related to the query topics
  15. Outline: ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results (✓ Rank Aggregation Framework ✓ Learning to Rank Framework) ✓ Conclusions and Future Work
  16. Question: How can we combine these features?
  17. Answer: Traditional IR techniques use frameworks inspired by traditional search engines to combine different sources of evidence!
  18. Rank Aggregation Framework for Expert Finding
  19. Data Fusion Algorithms (a fusion sketch follows after the slide list): ✓ Positional - based on the position that a candidate occupies in a ranked list; algorithms: Borda Fuse and Reciprocal Rank Fuse. ✓ Score Aggregation - based on the score that a candidate achieved in a ranked list; algorithms: CombSUM, CombMNZ and CombANZ. ✓ Majoritarian - based on pairwise comparisons between candidates; algorithm: Condorcet Fusion
  20. Results - Rank Aggregation (MAP): CombMNZ 48.43% [-10.25%]; Cond. Fusion 43.82%; CombSUM 41.34% [+6.00%]; Borda Fuse 39.99% [+9.58%]; Rec. Rank Fuse 39.99% [+9.58%]; CombANZ* 35.61% [+23.06%]. (* significance tests at 0.95 confidence; MAP = Mean Average Precision)
  21. Impact of the Features with Condorcet Fusion (MAP): Graph 43.86% [-0.09%]; Text + Profile + Graph 43.82%; Profile + Graph 41.65% [+4.95%]; Text + Graph* 39.08% [+10.82%]; Profile* 36.87% [+15.86%]; Text + Profile* 32.67% [+25.45%]; Text* 29.75% [+32.11%]. (* significance tests at 0.95 confidence; MAP = Mean Average Precision)
  22. Outline: ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results (✓ Rank Aggregation Framework ✓ Learning to Rank Framework) ✓ Conclusions and Future Work
  23. Question: How can we combine these features in an optimal way?
  24. Answer: The IR literature focuses on machine learning techniques, which enable the combination of multiple estimators in an optimal way!
  25. The L2R Framework for Expert Finding
  26. L2R Algorithms (a pairwise L2R sketch follows after the slide list): ✓ Pointwise - input: single candidate; goal: use scoring functions to predict relevance; algorithm: Additive Groves. ✓ Pairwise - input: pair of candidates; goal: loss function that minimizes the number of misclassified candidate pairs; algorithms: RankBoost, SVMrank and RankNet. ✓ Listwise - input: list of candidates; goal: loss function that directly optimizes an IR metric; algorithms: SVMmap, Coordinate Ascent and AdaRank
  27. Results - Learning to Rank (MAP): Additive Groves 89.40%; SVMmap 87.02% [+2.66%]; SVMrank 83.11% [+7.04%]; RankBoost* 78.40% [+12.30%]; Coord. Ascent* 75.77% [+15.25%]; RankNet* 65.30% [+26.96%]; AdaRank* 64.78% [+27.54%]. (* significance tests at 0.95 confidence; MAP = Mean Average Precision)
  28. Impact of the Features with Additive Groves (MAP): Text + Profile + Graph 89.40%; Text + Graph* 88.25% [+1.29%]; Profile 87.28% [+2.37%]; Text + Profile* 87.14% [+2.53%]; Text 86.60% [+3.13%]; Graph 85.26% [+4.63%]; Profile + Graph* 82.37% [+7.86%]. (* significance tests at 0.95 confidence; MAP = Mean Average Precision)
  29. Comparison with the State of the Art (MAP): Balog's Model 2 39.15% [+56.21%]; Deng's AuthorRank 49.06% [+45.12%]; Yang's SVMrank 63.56% [+28.90%]; Moreira's Additive Groves 89.40%. (MAP = Mean Average Precision)
  30. Prototype
  31. Outline: ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results (✓ Rank Aggregation Framework ✓ Learning to Rank Framework) ✓ Conclusions and Future Work
  32. Conclusions: ✓ Effectiveness of the Learning to Rank Framework - best algorithms: Additive Groves, SVMmap and SVMrank. ✓ Effectiveness of the Rank Aggregation Approach - best algorithms: CombMNZ and Condorcet Fusion. ✓ Effectiveness of the Proposed Features - the full set of features performs best
  33. Future Work: ✓ Feature Selection Techniques (e.g., PCA) ✓ Expert Finding in an organizational environment (TREC dataset) ✓ Tasks beyond expert finding: Natural Language Processing, Geographic Information Retrieval
  34. Publications: ✓ C. Moreira, P. Calado and B. Martins, Learning to Rank for Expert Search in Digital Libraries of Academic Publications, in Proceedings of the 15th Portuguese Conference on Artificial Intelligence, 2011. ✓ C. Moreira, B. Martins and P. Calado, Using Rank Aggregation for Expert Search in Academic Digital Libraries, in Simpósio de Informática (INFORUM), 2011. ✓ C. Moreira, A. Mendes, L. Coheur and B. Martins, Towards the Rapid Development of a Natural Language Understanding Module, in Proceedings of the 11th Conference on Intelligent Virtual Agents, 2011
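Sketch for slide 8 (textual similarity). A minimal illustration of how TF-IDF and BM25 scores of a query against a document can be computed; the toy corpus, the query, and the BM25 parameters k1 and b are illustrative assumptions, not the thesis setup.

```python
import math
from collections import Counter

docs = [
    "learning to rank for expert search",
    "probabilistic models for information retrieval",
    "rank aggregation of expertise estimators",
]
query = ["expert", "rank"]

tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for d in tokenized for t in set(d))   # document frequency per term

def tf_idf(doc, q):
    tf = Counter(doc)
    return sum(tf[t] * math.log(N / df[t]) for t in q if df[t])

def bm25(doc, q, k1=1.2, b=0.75):                    # k1, b: assumed default values
    tf = Counter(doc)
    score = 0.0
    for t in q:
        if df[t] == 0:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score

for doc in tokenized:
    print(tf_idf(doc, query), bm25(doc, query))
```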
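Sketch for slide 10 (graph features). A toy example of deriving citation-based features and a PageRank-style score for each candidate's papers; the paper ids, edges, and author-paper assignments are made up, and the networkx library is assumed to be available.

```python
import networkx as nx

citation_graph = nx.DiGraph()
citation_graph.add_edges_from([
    ("p1", "p2"), ("p3", "p2"), ("p3", "p1"), ("p4", "p1"),
])  # an edge (a, b) means paper a cites paper b

author_papers = {"alice": ["p1", "p2"], "bob": ["p3", "p4"]}  # hypothetical authors

pagerank = nx.pagerank(citation_graph)
for author, papers in author_papers.items():
    cites = [citation_graph.in_degree(p) for p in papers]
    features = {
        "total_citations": sum(cites),
        "max_citations": max(cites),
        "avg_citations": sum(cites) / len(cites),
        "avg_pagerank": sum(pagerank[p] for p in papers) / len(papers),
    }
    print(author, features)
```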
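Sketch for slide 11 (Hirsch index). The h-index is the largest h such that the author has h papers with at least h citations each; the citation counts below are made up.

```python
def h_index(citations):
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:      # the i-th most-cited paper still has >= i citations
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```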
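Sketch for slide 19 (data fusion). Three of the listed fusion strategies on toy per-feature rankings of candidate experts: CombSUM and CombMNZ aggregate scores, Borda Fuse aggregates positions. The candidate names and scores are illustrative.

```python
ranked_lists = [                       # one score dict per expertise estimator
    {"alice": 0.9, "bob": 0.4, "carol": 0.7},
    {"alice": 0.2, "bob": 0.8, "carol": 0.6},
]

def comb_sum(lists):
    names = {n for l in lists for n in l}
    return {n: sum(l.get(n, 0.0) for l in lists) for n in names}

def comb_mnz(lists):
    names = {n for l in lists for n in l}
    return {n: sum(l.get(n, 0.0) for l in lists) * sum(n in l for l in lists)
            for n in names}

def borda_fuse(lists):
    names = {n for l in lists for n in l}
    scores = dict.fromkeys(names, 0)
    for l in lists:
        order = sorted(l, key=l.get, reverse=True)
        for pos, n in enumerate(order):          # best rank gets most points
            scores[n] += len(order) - pos
    return scores

for fuse in (comb_sum, comb_mnz, borda_fuse):
    print(fuse.__name__, sorted(fuse(ranked_lists).items(), key=lambda x: -x[1]))
```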
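Sketch for slide 26 (pairwise L2R). A minimal illustration of the pairwise idea, in the spirit of SVMrank but not the thesis implementation: graded candidates are turned into preference pairs and a linear scoring function is learned over the differences. The feature vectors and relevance labels are made up; numpy and scikit-learn are assumed available.

```python
import numpy as np
from sklearn.svm import LinearSVC

# one row per candidate expert: [textual, profile, graph] feature scores (illustrative)
X = np.array([[0.9, 0.3, 0.7],
              [0.2, 0.8, 0.1],
              [0.5, 0.5, 0.4],
              [0.1, 0.2, 0.2]])
y = np.array([2, 1, 1, 0])          # graded relevance for one query

pairs, prefs = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if y[i] > y[j]:             # candidate i should be ranked above candidate j
            pairs.append(X[i] - X[j]); prefs.append(1)
            pairs.append(X[j] - X[i]); prefs.append(-1)

model = LinearSVC(C=1.0).fit(np.array(pairs), np.array(prefs))
scores = X @ model.coef_.ravel()    # rank candidates by the learned linear scores
print(np.argsort(-scores))          # indices of candidates, best first
```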
