확률모델

  • 1,345 views
Uploaded on

정보검색시스템 강의노트 강승식교수님

정보검색시스템 강의노트 강승식교수님

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
1,345
On Slideshare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
12
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. 2.5.4 Probabilistic Model
    • Motivation
      • Attempt to capture the IR problem within a probabilistic framework
    • Assumption (Probabilistic Principle)
      • Probability of relevance depends on the query and the document representations only
      • There is a subset of all documents which the user prefers as the answer set for the query q
      • Such an ideal answer set is labeled R and should maximize the overall probability of relevance to the user
      • Documents in the set R are predicted to be relevant to the query
      • Documents not in this set are predicted to be non-relevant
  • 2. Probabilistic Model
    • Definition
    23 Using Bayes’ rule
  • 3. Probabilistic Model
    • Definition (Cont.)
    23 Assuming independence of index terms Taking logarithms, ignoring constants,
  • 4. Probabilistic Model
    • Initial Probability
    • Improving Probability
    23 Adjustment the problems for small values of V and V i
  • 5. Probabilistic Model
    • Advantage
      • Documents are ranked in decreasing order of their probability of being relevant
    • Disadvantage
      • Guess the initial separation of documents into relevant and non-relevant sets
      • Binary weights
        • Does not take into account the frequency with which an index term occurs inside a document
      • Independence assumption for index terms
    23
  • 6. 2.5.5 Brief Comparison of Classic Models
    • Boolean Model
      • Weakest classic method
      • Inability to recognize partial matches
    • Vector Model
      • Popular retrieval model
    • Vector Model v.s Probabilistic Model
      • Croft
        • Probabilistic model provides a better retrieval performance
      • Salton, Buckley
        • Vector model is expected to outperform the probabilistic model with general collections
    23
  • 7. 참고 : Probabilistic Model Example of Probabilistic Retrieval Model
    • Query : q(k 1 , k 2 )
    • 5 documents
    • Assume d 2 and d 4 are relevant
    23 d 1 d 2 d 3 d 4 d 5 k 1 1 0 1 1 0 k 2 0 1 0 1 1
  • 8. Probabilistic Model Example of Probabilistic Model
      • Q: “gold silver truck”
      • D1: “Shipment of gold damaged in a fire.”
      • D2: “Delivery of silver arrived in a silver truck.”
      • D3: “Shipment of gold arrived in a truck.”
      • Assume that documents D2 and D3 are relevant to the query
      • N= number of documents in the collection
      • R= number of relevant documents for a given query Q
      • n= number of documents that contain term t
      • r= number of relevant documents that contain term t
    23 variable Gold Silver Truck N 3 3 3 R 2 2 2 n 2 1 2 r 1 1 2
  • 9. Probabilistic Model Example of Probabilistic Model (Cont.)
    • Computing Term Relevance Weight
    • Similarity Coefficient for each document
      • D1: -0.477, D2: 1.653, D3: 0.699
    23
  • 10. 참고 : Review of Probability Theory
    • Try to predict whether or not a baseball team (e.g. LA Dodgers) will win one of its games
    • Let’s assume the independence of evidences
    • By Bayes’ Theorem
    23 We can’t calculate it, so… Probability to lose
  • 11. Probabilistic Model Review of Probability Theory (Cont.)
    • Independent Assumption
    • Making Substitutions
    23
  • 12. Probabilistic Model Review of Probability Theory (Cont.)
    • Making Substitutions (Cont.)
    The probability of LA Dodgers will win its game on sunny days when Chanho plays