• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
확률모델
 

확률모델

on

  • 1,664 views

정보검색시스템 강의노트 강승식교수님

정보검색시스템 강의노트 강승식교수님

Statistics

Views

Total Views
1,664
Views on SlideShare
1,154
Embed Views
510

Actions

Likes
0
Downloads
10
Comments
0

3 Embeds 510

http://onbranding.kr 338
http://mylucky8.tistory.com 171
http://www.slideshare.net 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    확률모델 확률모델 Presentation Transcript

    • 2.5.4 Probabilistic Model
      • Motivation
        • Attempt to capture the IR problem within a probabilistic framework
      • Assumption (Probabilistic Principle)
        • Probability of relevance depends on the query and the document representations only
        • There is a subset of all documents which the user prefers as the answer set for the query q
        • Such an ideal answer set is labeled R and should maximize the overall probability of relevance to the user
        • Documents in the set R are predicted to be relevant to the query
        • Documents not in this set are predicted to be non-relevant
    • Probabilistic Model
      • Definition
      23 Using Bayes’ rule
    • Probabilistic Model
      • Definition (Cont.)
      23 Assuming independence of index terms Taking logarithms, ignoring constants,
    • Probabilistic Model
      • Initial Probability
      • Improving Probability
      23 Adjustment the problems for small values of V and V i
    • Probabilistic Model
      • Advantage
        • Documents are ranked in decreasing order of their probability of being relevant
      • Disadvantage
        • Guess the initial separation of documents into relevant and non-relevant sets
        • Binary weights
          • Does not take into account the frequency with which an index term occurs inside a document
        • Independence assumption for index terms
      23
    • 2.5.5 Brief Comparison of Classic Models
      • Boolean Model
        • Weakest classic method
        • Inability to recognize partial matches
      • Vector Model
        • Popular retrieval model
      • Vector Model v.s Probabilistic Model
        • Croft
          • Probabilistic model provides a better retrieval performance
        • Salton, Buckley
          • Vector model is expected to outperform the probabilistic model with general collections
      23
    • 참고 : Probabilistic Model Example of Probabilistic Retrieval Model
      • Query : q(k 1 , k 2 )
      • 5 documents
      • Assume d 2 and d 4 are relevant
      23 d 1 d 2 d 3 d 4 d 5 k 1 1 0 1 1 0 k 2 0 1 0 1 1
    • Probabilistic Model Example of Probabilistic Model
        • Q: “gold silver truck”
        • D1: “Shipment of gold damaged in a fire.”
        • D2: “Delivery of silver arrived in a silver truck.”
        • D3: “Shipment of gold arrived in a truck.”
        • Assume that documents D2 and D3 are relevant to the query
        • N= number of documents in the collection
        • R= number of relevant documents for a given query Q
        • n= number of documents that contain term t
        • r= number of relevant documents that contain term t
      23 variable Gold Silver Truck N 3 3 3 R 2 2 2 n 2 1 2 r 1 1 2
    • Probabilistic Model Example of Probabilistic Model (Cont.)
      • Computing Term Relevance Weight
      • Similarity Coefficient for each document
        • D1: -0.477, D2: 1.653, D3: 0.699
      23
    • 참고 : Review of Probability Theory
      • Try to predict whether or not a baseball team (e.g. LA Dodgers) will win one of its games
      • Let’s assume the independence of evidences
      • By Bayes’ Theorem
      23 We can’t calculate it, so… Probability to lose
    • Probabilistic Model Review of Probability Theory (Cont.)
      • Independent Assumption
      • Making Substitutions
      23
    • Probabilistic Model Review of Probability Theory (Cont.)
      • Making Substitutions (Cont.)
      The probability of LA Dodgers will win its game on sunny days when Chanho plays