A language modeling framework for expert finding

Transcript
  • 1. A language modeling framework for expert finding
    Written by Krisztian Balog, Leif Azzopardi, Maarten de Rijke
    Presented by Saúl Vargas Sandoval
  • 2. Contents (I)
    • Expert finding: what and why?
    • Language modeling: the basics
    • The expert finding task
      • Formalization and models
      • Candidate model
      • Document model
      • Document-candidate association
    • Testing the models
      • Experimental setup
  • 8. Contents (II)
      • Results and discussion
    • Conclusions
    • Questions
  • 10. Expert finding: what and why? (I)
    • First example: IBM
      • ~400,000 employees
      • 161 countries
    • An example closer to home: Universität Bielefeld
      • ~1,500 staff and ~18,000 students
      • 13 faculties
    • Who is the right person to ask?
  • 13. Expert finding: what and why? (II)
  • 22. Language modeling: the basics
    • A language model assigns a probability to a sequence of m words.
    • In information retrieval:
      • A language model is estimated for each document.
      • A query q is scored by the probability that the document's model generates q (see the reconstruction below).
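
    The slide's formulas were images and did not survive extraction. A plausible reconstruction in standard language-modeling notation (the symbols θ_d and n(t,q) are assumptions chosen to stay consistent across this transcript):

        % A language model assigns a probability to any sequence of m words:
        P(w_1, w_2, \ldots, w_m)
        % In IR, each document d gets its own (smoothed) model \theta_d, and
        % documents are ranked by the probability of generating the query q:
        p(q \mid \theta_d) = \prod_{t \in q} p(t \mid \theta_d)^{n(t,q)}
        % where n(t,q) is the number of times term t occurs in q.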
  • 25. The expert finding task: formalization
    • “What is the probability of a candidate ca being an expert, given the query topic q?”
    • Apply Bayes' Theorem and a simplifying assumption (see below):
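
    A reconstruction of the missing formula, using the standard query-likelihood decomposition:

        % Bayes' Theorem:
        p(ca \mid q) = \frac{p(q \mid ca) \, p(ca)}{p(q)}
        % p(q) is constant for a given query; further assuming a uniform
        % candidate prior p(ca), ranking candidates reduces to estimating
        % the query likelihood:
        p(ca \mid q) \propto p(q \mid ca)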
  • 27. The expert finding task: models
    [Diagram: two pipelines. Candidate model: Query → Candidates (built from Documents) → Result. Document model: Query → Ranked documents → Candidates → Result.]
  • 28. Candidate model
    • Each candidate is represented as a multinomial unigram probability distribution.
    • Smoothing with the background collection (see below):
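
    A reconstruction of the candidate model's query likelihood and its smoothing; the Jelinek-Mercer-style mixture with weight λ is an assumption based on the standard form:

        % Query likelihood under the candidate's language model:
        p(q \mid \theta_{ca}) = \prod_{t \in q} p(t \mid \theta_{ca})^{n(t,q)}
        % Smoothing: mix the candidate's distribution with the background
        % collection model p(t):
        p(t \mid \theta_{ca}) = (1 - \lambda) \, p(t \mid ca) + \lambda \, p(t)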
  • 30. Candidate model (II)
    • Documents act as a bridge between terms and candidates.
    • Let's do the math!
  • 32. Candidate model (III)
    • First approach: conditional independence.
    • First estimation (see below):
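
    A reconstruction of the first estimation, using documents as the bridge (assuming, as the later "conditional independence vs. windows" slides suggest, that terms and candidates are conditionally independent given a document):

        % Marginalize over documents associated with the candidate:
        p(t \mid ca) = \sum_{d} p(t \mid d) \, p(d \mid ca)
        % p(t|d) is the document's term distribution; p(d|ca) is the
        % document-candidate association estimated later in the talk.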
  • 34. Candidate model (IV)
    • Second approach: windows.
    • Second estimation (see the sketch below):
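
    The second estimation's formula is also missing; a sketch of the window-based idea, with the window size W as the parameter tuned in the experiments (the exact notation here is an assumption):

        % Count a term for a candidate only when it occurs within a window
        % of W terms around a mention of ca in document d:
        p(t \mid ca) = \sum_{d} p(t \mid d, ca) \, p(d \mid ca)
        % where p(t|d, ca) is estimated from the text inside those windows
        % rather than from the whole document.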
  • 36. Document model (I)
    • Documents are ranked according to the query q.
    • Ranked documents are then associated with the candidate.
    • Assuming independence between documents:
  • 39. Document model (II)
    • First approach: conditional independence.
    • First estimation (see below):
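
    A reconstruction of the document model's estimation; the query likelihood p(q | θ_d) reuses the per-document models from the language-modeling slide:

        % Score a candidate through the documents retrieved for q:
        p(q \mid ca) = \sum_{d} p(q \mid \theta_d) \, p(d \mid ca)
        % with the standard query likelihood for each document:
        p(q \mid \theta_d) = \prod_{t \in q} p(t \mid \theta_d)^{n(t,q)}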
  • 41. Document model (III)
    • Second approach: window size again, analogous to the candidate model's windows.
    • Second estimation:
  • 43. Document-candidate association
    • Bayes' Theorem and simplifying conditions:
    • Boolean model:
    • Frequency-based approach: TF-IDF (see the sketch below).
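
    The slide only names the two association methods; a minimal Python sketch of how they could be implemented (the function names, the mention counts, and the normalization are illustrative assumptions, not the paper's exact estimators):

        import math

        def boolean_association(mentions: dict[str, int]) -> dict[str, float]:
            """Boolean model: p(d|ca) is uniform over the documents
            that mention the candidate at least once."""
            docs = [d for d, n in mentions.items() if n > 0]
            return {d: 1.0 / len(docs) for d in docs}

        def tfidf_association(mentions: dict[str, int],
                              doc_length: dict[str, int],
                              n_docs: int) -> dict[str, float]:
            """Frequency-based model: weight each document by a TF-IDF-style
            score of the candidate's mentions, normalized into p(d|ca)."""
            df = sum(1 for n in mentions.values() if n > 0)  # docs mentioning ca
            scores = {d: (n / doc_length[d]) * math.log(n_docs / df)
                      for d, n in mentions.items() if n > 0}
            total = sum(scores.values())
            return {d: s / total for d, s in scores.items()}

    Under the Boolean model every associated document contributes equally; the frequency-based variant favors documents that mention the candidate often, discounted by how widely the candidate appears across the collection.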
  • 46. Testing the models
    • Questions:
      • Which is better: the candidate model or the document model?
      • For each model: the independence assumption or windows? Which window size?
      • Document-candidate association: the Boolean model or the frequency-based approach?
  • 49. Experimental setup (I)
    • Test sets from the 2005 and 2006 TREC Enterprise tracks: documents from the W3C website (several GB).
      • TREC Enterprise 2005:
        • Topics: names of W3C working groups (50 topics).
        • Experts: the members of those working groups.
      • TREC Enterprise 2006:
        • Topics and experts: assessed manually (49 topics).
    • All documents are treated as plain text, with stopwords removed but no stemming.
    • Each person is identified in the documents by name, e-mail address, ID number, abbreviations, ...
  • 52. Experimental setup (II)
    • Evaluation metrics (see the definitions below):
      • Mean average precision (MAP)
      • Mean reciprocal rank (MRR)
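
    The metric formulas were images; these are the standard definitions, assuming R_q is the set of relevant experts for topic q and rel(k) indicates relevance at rank k:

        % Average precision for one topic q:
        \mathrm{AP}(q) = \frac{1}{|R_q|} \sum_{k} \mathrm{Prec@}k \cdot \mathrm{rel}(k)
        % MAP and MRR average over the topic set Q; rank_q is the rank of
        % the first relevant expert returned for topic q:
        \mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q), \qquad
        \mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\mathrm{rank}_q}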
  • 54. Candidate model vs. document model

        Model            MAP 2005   MAP 2006   MRR 2005   MRR 2006
        Candidate (1st)  0.1883     0.3206     0.4692     0.7264
        Document (1st)   0.2503     0.4660     0.6088     0.9354

        (1st = first estimation; 2nd = window-based estimation.)
  • 55. Window sizes (best window size per metric and year)

        Model            MAP 2005   MAP 2006   MRR 2005   MRR 2006
        Candidate (2nd)  25         100        15         15
        Document (2nd)   125        250        15         75
  • 56. Conditional independence vs. windows (I)

        Model            MAP 2005   MAP 2006   MRR 2005   MRR 2006
        Candidate (1st)  0.1883     0.3206     0.4692     0.7264
        Candidate (2nd)  0.2020     0.4254     0.5928     0.9048
        Document (1st)   0.2053     0.4660     0.6088     0.9354
        Document (2nd)   0.2194     0.4544     0.6096     0.9235

        (Window sizes optimized for MAP.)
  • 57. Conditional independence vs. windows (II)

        Model            MAP 2005   MAP 2006   MRR 2005   MRR 2006
        Candidate (1st)  0.1883     0.3206     0.4692     0.7264
        Candidate (2nd)  0.2012     0.3848     0.6275     0.9558
        Document (1st)   0.2053     0.4660     0.6088     0.9354
        Document (2nd)   0.1964     0.4463     0.6371     0.9531

        (Window sizes optimized for MRR.)
  • 58. Conditional independence vs. windows (III)
  • 59. Document-candidate association methods (I)
  • 60. Conclusions
    • Best overall: the DOCUMENT MODEL (with conditional independence!)
  • 61. The end
    • Questions? Fragen?
    • Thanks! Danke!