확률모델
Upcoming SlideShare
Loading in...5
×
 

확률모델

on

  • 1,788 views

정보검색시스템 강의노트 강승식교수님

정보검색시스템 강의노트 강승식교수님

Statistics

Views

Total Views
1,788
Views on SlideShare
1,274
Embed Views
514

Actions

Likes
0
Downloads
10
Comments
0

3 Embeds 514

http://onbranding.kr 341
http://mylucky8.tistory.com 172
http://www.slideshare.net 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

확률모델 확률모델 Presentation Transcript

  • 2.5.4 Probabilistic Model
    • Motivation
      • Attempt to capture the IR problem within a probabilistic framework
    • Assumption (Probabilistic Principle)
      • Probability of relevance depends on the query and the document representations only
      • There is a subset of all documents which the user prefers as the answer set for the query q
      • Such an ideal answer set is labeled R and should maximize the overall probability of relevance to the user
      • Documents in the set R are predicted to be relevant to the query
      • Documents not in this set are predicted to be non-relevant
  • Probabilistic Model
    • Definition
    23 Using Bayes’ rule
  • Probabilistic Model
    • Definition (Cont.)
    23 Assuming independence of index terms Taking logarithms, ignoring constants, View slide
  • Probabilistic Model
    • Initial Probability
    • Improving Probability
    23 Adjustment the problems for small values of V and V i View slide
  • Probabilistic Model
    • Advantage
      • Documents are ranked in decreasing order of their probability of being relevant
    • Disadvantage
      • Guess the initial separation of documents into relevant and non-relevant sets
      • Binary weights
        • Does not take into account the frequency with which an index term occurs inside a document
      • Independence assumption for index terms
    23
  • 2.5.5 Brief Comparison of Classic Models
    • Boolean Model
      • Weakest classic method
      • Inability to recognize partial matches
    • Vector Model
      • Popular retrieval model
    • Vector Model v.s Probabilistic Model
      • Croft
        • Probabilistic model provides a better retrieval performance
      • Salton, Buckley
        • Vector model is expected to outperform the probabilistic model with general collections
    23
  • 참고 : Probabilistic Model Example of Probabilistic Retrieval Model
    • Query : q(k 1 , k 2 )
    • 5 documents
    • Assume d 2 and d 4 are relevant
    23 d 1 d 2 d 3 d 4 d 5 k 1 1 0 1 1 0 k 2 0 1 0 1 1
  • Probabilistic Model Example of Probabilistic Model
      • Q: “gold silver truck”
      • D1: “Shipment of gold damaged in a fire.”
      • D2: “Delivery of silver arrived in a silver truck.”
      • D3: “Shipment of gold arrived in a truck.”
      • Assume that documents D2 and D3 are relevant to the query
      • N= number of documents in the collection
      • R= number of relevant documents for a given query Q
      • n= number of documents that contain term t
      • r= number of relevant documents that contain term t
    23 variable Gold Silver Truck N 3 3 3 R 2 2 2 n 2 1 2 r 1 1 2
  • Probabilistic Model Example of Probabilistic Model (Cont.)
    • Computing Term Relevance Weight
    • Similarity Coefficient for each document
      • D1: -0.477, D2: 1.653, D3: 0.699
    23
  • 참고 : Review of Probability Theory
    • Try to predict whether or not a baseball team (e.g. LA Dodgers) will win one of its games
    • Let’s assume the independence of evidences
    • By Bayes’ Theorem
    23 We can’t calculate it, so… Probability to lose
  • Probabilistic Model Review of Probability Theory (Cont.)
    • Independent Assumption
    • Making Substitutions
    23
  • Probabilistic Model Review of Probability Theory (Cont.)
    • Making Substitutions (Cont.)
    The probability of LA Dodgers will win its game on sunny days when Chanho plays