Probabilistic Retrieval Models – Lecture 8 – Sean A. Golliher
• Need to quickly cover some old material to understand the new methods
• Relevance is a complex concept that has been studied for some time
• Many factors to consider
• People often disagree when making relevance judgments
• Retrieval models make various assumptions about relevance to simplify the problem
  ○ e.g., topical vs. user relevance
  ○ e.g., binary vs. multi-valued relevance
• Older models
  ○ Boolean retrieval
  ○ Vector space model
• Probabilistic models
  ○ BM25
  ○ Language models
• Combining evidence
  ○ Inference networks
  ○ Learning to rank
• Two possible outcomes for query processing: TRUE and FALSE
• "Exact-match" retrieval; the simplest form of ranking
• Queries are usually specified using Boolean operators: AND, OR, NOT
• Advantages
  ○ Results are predictable and relatively easy to explain
  ○ Many different features can be incorporated
  ○ Efficient processing, since many documents can be eliminated from the search
• Disadvantages
  ○ Effectiveness depends entirely on the user
  ○ Simple queries usually don't work well
  ○ Complex queries are difficult
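A minimal sketch of the exact-match Boolean retrieval described above, using a toy inverted index of Python sets (the terms and document IDs are made up); each Boolean operator maps directly to a set operation:

    # Toy inverted index: term -> set of document IDs (hypothetical data).
    index = {
        "president": {1, 2, 4},
        "lincoln":   {2, 4, 5},
        "car":       {3, 5},
    }
    all_docs = {1, 2, 3, 4, 5}

    # AND -> intersection, OR -> union, NOT -> complement over the collection.
    def AND(a, b): return a & b
    def OR(a, b):  return a | b
    def NOT(a):    return all_docs - a

    # Query: president AND lincoln AND NOT car -- exact match, no ranking.
    result = AND(AND(index["president"], index["lincoln"]), NOT(index["car"]))
    print(sorted(result))  # [2, 4]

Every document either matches or does not (TRUE/FALSE); nothing in the result is ranked above anything else.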
• Documents and the query are represented by vectors of term weights
• The collection is represented by a matrix of term weights
• 3-D pictures are useful, but can be misleading for high-dimensional spaces
The Euclidean distance between q and d2 is large even though the distribution of terms in the query q and the distribution of terms in the document d2 are very similar.
• Thought experiment: take a document d and append it to itself; call this document d′
• "Semantically" d and d′ have the same content
• The Euclidean distance between the two documents can be quite large
• The angle between the two documents is 0, corresponding to maximal similarity (cos(0) = 1)
• Key idea: rank documents according to their angle with the query (see the sketch below)
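A quick numeric check of the thought experiment, with made-up term counts: appending d to itself doubles every component of its vector, which makes the Euclidean distance large while the angle stays at 0.

    import math

    d  = [3, 1, 4]            # hypothetical term counts for document d
    dp = [2 * x for x in d]   # d' = d appended to itself: every count doubles

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    print(euclidean(d, dp))   # ~5.1 -- large distance despite identical content
    print(cosine(d, dp))      # 1.0 -- angle of 0, maximal similarity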
• In Euclidean space, define the dot product of vectors a and b as a·b = ||a|| ||b|| cos θ, where ||a|| is the length of a and θ is the angle between a and b
• Using the Law of Cosines, we can compute the coordinate-dependent definition in 3-space: a·b = a_x b_x + a_y b_y + a_z b_z
• cos θ = a·b / (||a|| ||b||)
• cos(0°) = 1, cos(90°) = 0
• Documents are ranked by the distance between the points representing the query and the documents
• A similarity measure is more common than a distance or dissimilarity measure
  ○ e.g., cosine correlation
• Consider two documents D1, D2 and a query Q
  ○ D1 = (0.5, 0.8, 0.3), D2 = (0.9, 0.4, 0.2), Q = (1.5, 1.0, 0)
Dot product over unit vectors:

  cos(q, d) = q·d / (||q|| ||d||) = (q/||q||) · (d/||d||) = Σ_i q_i d_i / ( sqrt(Σ_i q_i²) · sqrt(Σ_i d_i²) )

• q_i is the tf-idf weight of term i in the query; d_i is the tf-idf weight of term i in the document
• cos(q, d) is the cosine similarity of q and d, or equivalently, the cosine of the angle between q and d
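A short sketch applying this formula to the example vectors D1, D2, and Q from the previous slide:

    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    D1 = (0.5, 0.8, 0.3)
    D2 = (0.9, 0.4, 0.2)
    Q  = (1.5, 1.0, 0)

    print(round(cosine(Q, D1), 3))  # ≈ 0.869
    print(round(cosine(Q, D2), 3))  # ≈ 0.966

Under cosine similarity, D2 is ranked above D1 for this query.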
• tf.idf weight (older retrieval model)
  ○ tf: term frequency, the number of times the term occurs in a document
  ○ idf: inverse document frequency, e.g., log(N/n)
     N is the total number of documents; n is the number of documents that contain the term
  ○ idf is a measure of the "importance" of a term: the more documents a term appears in, the less discriminating the term is
  ○ The log dampens the effect
• The collection frequency of t is the number of occurrences of t in the collection, counting multiple occurrences
• The document frequency df is the number of documents that contain the term t

    Word        Collection frequency    Document frequency
    insurance   10440                   3997
    try         10422                   8760

• Which of these is more useful?
• The tf-idf weight of a term is the product of its tf weight and its idf weight
• Best-known weighting scheme in information retrieval
  ○ Note: the "-" in tf-idf is a hyphen, not a minus sign
  ○ Alternative names: tf.idf, tf × idf
• Increases with the number of occurrences within a document
• Increases with the rarity of the term in the collection
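A minimal sketch of tf-idf weighting as described above, using log(N/n) for idf (natural log here; the base only changes the scale) and a raw in-document count for tf; the collection size and document frequencies are hypothetical:

    import math

    N = 1000  # hypothetical total number of documents in the collection

    # Hypothetical document frequencies: number of documents containing each term.
    df = {"auto": 950, "insurance": 40, "lincoln": 12}

    def idf(term):
        return math.log(N / df[term])

    def tf_idf(term, count_in_doc):
        # Weight grows with the in-document count (tf) and with the rarity
        # of the term in the collection (idf).
        return count_in_doc * idf(term)

    for term in df:
        print(term, round(idf(term), 3), round(tf_idf(term, 3), 3))
    # "auto" appears in nearly every document, so its idf (and tf-idf) is near 0;
    # "lincoln" is rare, so the same raw count gets a much higher weight.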
• Rocchio algorithm (paper topic)
• Optimal query
  ○ Maximizes the difference between the average vector representing the relevant documents and the average vector representing the non-relevant documents
• Modifies the query according to the Rocchio update formula (sketched below)
  ○ α, β, and γ are parameters; typical values are 8, 16, 4
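The update formula itself is not in the transcript; as a sketch, the standard Rocchio update is q′ = α·q + β·(centroid of relevant documents) − γ·(centroid of non-relevant documents), shown below with the typical parameter values from the slide and made-up vectors:

    def centroid(vectors):
        n = len(vectors)
        return [sum(col) / n for col in zip(*vectors)]

    def rocchio(query, relevant, non_relevant, alpha=8, beta=16, gamma=4):
        # Move the query toward the relevant centroid and away from the
        # non-relevant centroid (all vectors have the same length).
        rel = centroid(relevant)
        nonrel = centroid(non_relevant)
        return [alpha * q + beta * r - gamma * nr
                for q, r, nr in zip(query, rel, nonrel)]

    # Hypothetical 3-term vectors.
    q = [1.0, 0.5, 0.0]
    relevant = [[0.9, 0.8, 0.1], [0.7, 0.6, 0.0]]
    non_relevant = [[0.1, 0.0, 0.9]]
    print(rocchio(q, relevant, non_relevant))
    # query shifts toward terms 1 and 2, away from term 3 (roughly [20.4, 15.2, -2.8])

In practice, negative weights in the modified query are often clipped to zero.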
• The most dominant paradigm used today
• Probability theory is a strong foundation for representing the uncertainty that is inherent in the IR process
• Robertson (1977): "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data."
• Probability Ranking Principle (Robertson, 1970s; Maron & Kuhns, 1959)
• Information Retrieval as Probabilistic Inference (van Rijsbergen et al., since the 1970s)
• Probabilistic Indexing (Fuhr et al., late 1980s to 1990s)
• Bayesian Nets in IR (Turtle & Croft, 1990s)
• Probabilistic Logic Programming in IR (Fuhr et al., 1990s)
• P(a | b): conditional probability, the probability of a given that b occurred
• Basic definitions
  ○ (a ∪ b): a or b
  ○ (a ∩ b): a and b
Let a, b be two events.

  p(a | b) p(b) = p(a ∩ b) = p(b | a) p(a)

  p(a | b) = p(b | a) p(a) / p(b)    (Bayes' Rule)
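A small numeric check of Bayes' Rule on made-up joint counts, computing p(a | b) both from the definition and via p(b | a) p(a) / p(b):

    # Hypothetical joint counts over 100 trials.
    n_ab, n_a_notb, n_nota_b, n_nota_notb = 10, 20, 15, 55
    total = n_ab + n_a_notb + n_nota_b + n_nota_notb

    p_a  = (n_ab + n_a_notb) / total   # 0.30
    p_b  = (n_ab + n_nota_b) / total   # 0.25
    p_ab = n_ab / total                # p(a and b) = 0.10

    p_b_given_a = p_ab / p_a
    direct    = p_ab / p_b                 # p(a | b) from the definition
    via_bayes = p_b_given_a * p_a / p_b    # Bayes' Rule
    print(direct, via_bayes)               # both ≈ 0.4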
• Let D be a document in the collection
• Let R represent relevance of a document with respect to a given (fixed) query, and let NR represent non-relevance
• How do we find P(R|D), the probability that a retrieved document is relevant? This is an abstract concept
• P(R) is the probability that a retrieved document is relevant; it is not clear how to calculate this
• Can we calculate P(D|R), the probability of document D occurring given that the relevant set is known?
• If we KNOW we have a relevant set of documents (perhaps from human judgments), we could calculate how often specific words occur in that set
Let D be a document in the collection. Let R represent relevance of a document w.r.t. a given (fixed) query and let NR represent non-relevance. We need to find p(R|D), the probability that a retrieved document D is relevant.

  p(R | D) = p(D | R) p(R) / p(D)
  p(NR | D) = p(D | NR) p(NR) / p(D)

• p(R), p(NR): prior probabilities of retrieving a (non-)relevant document
• p(D|R), p(D|NR): probability that if a (non-)relevant document is retrieved, it is D
  p(R | D) = p(D | R) p(R) / p(D)
  p(NR | D) = p(D | NR) p(NR) / p(D)

Ranking Principle (Bayes' Decision Rule): if p(R|D) > p(NR|D), then D is relevant; otherwise D is not relevant.
• Bayes Decision Rule
  ○ A document D is relevant if P(R|D) > P(NR|D)
• Estimating probabilities
  ○ Use Bayes' Rule
  ○ Classify a document as relevant if P(D|R)/P(D|NR) > P(NR)/P(R)
  ○ The left side is the likelihood ratio (see the sketch below)
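A sketch of that decision rule: classify D as relevant when the likelihood ratio P(D|R)/P(D|NR) exceeds P(NR)/P(R) (the shared P(D) denominator cancels); the probability estimates below are hypothetical:

    def is_relevant(p_d_given_r, p_d_given_nr, p_r, p_nr):
        # Equivalent to P(R|D) > P(NR|D) after dividing out P(D).
        return p_d_given_r / p_d_given_nr > p_nr / p_r

    # Hypothetical estimates for one document.
    print(is_relevant(p_d_given_r=0.0006, p_d_given_nr=0.0001, p_r=0.2, p_nr=0.8))
    # True: the likelihood ratio is 6.0, which exceeds P(NR)/P(R) = 4.0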
• Can we calculate P(D|R), the probability that if a relevant document is returned, it is D?
• If we KNOW we have a relevant set of documents (perhaps from human judgments), we could calculate how often specific words occur in that set
• Example: we have information on how often specific words occur in the relevant set, so we can calculate how likely it is for those words to appear in a document
• Example: the probability of "president" in the relevant set is 0.02 and the probability of "lincoln" in the relevant set is 0.03; if a new document contains both "president" and "lincoln", then the probability is 0.02 × 0.03 = 0.0006
• Suppose we have a vector representing the presence and absence of terms: (1, 0, 0, 1, 1). Terms 1, 4, and 5 are present. What is the probability of this document occurring in the relevant set?
• pi is the probability that term i occurs in a relevant document; (1 − pi) is the probability that the term does not occur
• This gives us: p1 × (1 − p2) × (1 − p3) × p4 × p5
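A sketch of that calculation: given each term's probability pi of occurring in a relevant document (hypothetical values) and a binary presence/absence vector, multiply pi for present terms and (1 − pi) for absent ones:

    # Hypothetical per-term probabilities of occurring in a relevant document.
    p = [0.02, 0.10, 0.25, 0.03, 0.40]
    doc = [1, 0, 0, 1, 1]   # terms 1, 4, and 5 are present

    prob = 1.0
    for p_i, present in zip(p, doc):
        prob *= p_i if present else (1 - p_i)   # p1 * (1-p2) * (1-p3) * p4 * p5

    print(prob)  # 0.02 * 0.90 * 0.75 * 0.03 * 0.40 ≈ 0.000162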
• Assume independence between terms
• Binary independence model
  ○ A product over the terms that have value one, times a product over the terms that have value zero
  ○ pi is the probability that term i occurs (i.e., has value 1) in a relevant document; si is the probability of occurrence in a non-relevant document
• The scoring function is the sum, over terms that appear in both the document and the query, of log( pi (1 − si) / ( si (1 − pi) ) )
  ○ (The last term in the derivation was the same for all documents, so it can be ignored)
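A sketch of that scoring function: sum log(pi(1 − si) / (si(1 − pi))) over the terms that are present in both the document and the query; all probability estimates below are made up:

    import math

    def bim_score(doc, query, p, s):
        # Binary independence model score over terms present in both
        # the document and the query.
        score = 0.0
        for i in range(len(doc)):
            if doc[i] == 1 and query[i] == 1:
                score += math.log(p[i] * (1 - s[i]) / (s[i] * (1 - p[i])))
        return score

    # Hypothetical estimates for a 5-term vocabulary.
    p = [0.8, 0.6, 0.5, 0.7, 0.4]   # P(term present | relevant)
    s = [0.3, 0.3, 0.5, 0.1, 0.35]  # P(term present | non-relevant)
    query = [1, 1, 0, 1, 0]
    print(bim_score([1, 0, 0, 1, 1], query, p, s))  # terms 1 and 4 contribute
    print(bim_score([0, 1, 1, 0, 0], query, p, s))  # only term 2 contributes

Documents that match more of the query's high-pi, low-si terms score higher.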
• Jump to machine learning and web search: lots of training data is available from web search queries; learning-to-rank models
• http://www.bradblock.com/A_General_Language_Model_for_Information_Retrieval.pdf
Speaker notes:
  • Angle captures relative proportion of terms
  • http://nlp.stanford.edu/IR-book/html/htmledition/inverse-document-frequency-1.html … For example, "auto" in the auto industry: all documents contain the word "auto", so we want to decrease the value of that term as it occurs more often, because it is non-discriminating in a search query. df is more useful; look at the range.
  • Tf is the number of times the word occurs in document d.
  • D is a document in the collection. R is relevance. P(R) is the probability that a retrieved document is relevant.
  • Use logs since we get lots of small numbers. pi is the probability that term i occurs in the relevant set.