# Probabilistic Model

Information Retrieval Systems lecture notes, Prof. 강승식

### Probabilistic Model

1. 2.5.4 Probabilistic Model
   - Motivation
     - Attempt to capture the IR problem within a probabilistic framework
   - Assumption (Probabilistic Principle)
     - The probability of relevance depends only on the query and document representations
     - There is a subset of all documents that the user prefers as the answer set for the query q
     - This ideal answer set is labeled R and should maximize the overall probability of relevance to the user
     - Documents in the set R are predicted to be relevant to the query
     - Documents not in this set are predicted to be non-relevant
2. Probabilistic Model
   - Definition
     - Using Bayes’ rule
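The definition named on this slide is presumably the standard odds-based similarity of the classic probabilistic model. A sketch under that assumption, where $\vec{d_j}$ (the document representation) and $\bar{R}$ (the complement of the relevant set $R$) are notation assumed here rather than taken from the slide text:

$$
sim(d_j, q) = \frac{P(R \mid \vec{d_j})}{P(\bar{R} \mid \vec{d_j})}
= \frac{P(\vec{d_j} \mid R)\, P(R)}{P(\vec{d_j} \mid \bar{R})\, P(\bar{R})}
\sim \frac{P(\vec{d_j} \mid R)}{P(\vec{d_j} \mid \bar{R})}
$$

The second equality is Bayes' rule applied to numerator and denominator. The last step drops $P(R)/P(\bar{R})$, which is the same for every document and so does not affect the ranking.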
3. Probabilistic Model
   - Definition (Cont.)
     - Assuming independence of index terms
     - Taking logarithms and ignoring constants
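The two steps named here can be written out as follows. This is a sketch of the usual textbook derivation, not a transcription of the slide: $k_i$ is an index term, $g_i(\vec{d_j})$ is 1 when $k_i$ occurs in document $d_j$ and 0 otherwise, and $w_{i,q}, w_{i,j} \in \{0, 1\}$ are binary term weights for the query and the document. All of these symbols are assumed notation.

$$
sim(d_j, q) \sim
\frac{\prod_{g_i(\vec{d_j})=1} P(k_i \mid R)\; \prod_{g_i(\vec{d_j})=0} P(\bar{k_i} \mid R)}
     {\prod_{g_i(\vec{d_j})=1} P(k_i \mid \bar{R})\; \prod_{g_i(\vec{d_j})=0} P(\bar{k_i} \mid \bar{R})}
$$

Taking logarithms, using $P(k_i \mid R) + P(\bar{k_i} \mid R) = 1$, and dropping factors that are constant across documents for a given query:

$$
sim(d_j, q) \sim \sum_{i=1}^{t} w_{i,q}\, w_{i,j}
\left( \log \frac{P(k_i \mid R)}{1 - P(k_i \mid R)}
     + \log \frac{1 - P(k_i \mid \bar{R})}{P(k_i \mid \bar{R})} \right)
$$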
4. Probabilistic Model
   - Initial Probability
   - Improving Probability
     - Adjustment for the problems caused by small values of V and V_i
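The estimates behind these three headings are commonly stated as below. This is a hedged reconstruction from the standard formulation: $N$ (collection size) and $n_i$ (number of documents containing $k_i$) are assumed notation, $V$ is taken to be the number of top-ranked documents retrieved by the current ranking, and $V_i$ the number of those that contain $k_i$.

Initial guess, before any documents have been retrieved:

$$
P(k_i \mid R) = 0.5, \qquad P(k_i \mid \bar{R}) = \frac{n_i}{N}
$$

Improved estimates after an initial ranking:

$$
P(k_i \mid R) = \frac{V_i}{V}, \qquad P(k_i \mid \bar{R}) = \frac{n_i - V_i}{N - V}
$$

Adjusted estimates that stay well defined when $V$ or $V_i$ is very small (for example $V = 1$, $V_i = 0$):

$$
P(k_i \mid R) = \frac{V_i + 0.5}{V + 1}, \qquad P(k_i \mid \bar{R}) = \frac{n_i - V_i + 0.5}{N - V + 1}
$$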
5. Probabilistic Model
   - Advantage
     - Documents are ranked in decreasing order of their probability of being relevant
   - Disadvantage
     - Requires guessing the initial separation of documents into relevant and non-relevant sets
     - Binary weights
       - Does not take into account the frequency with which an index term occurs inside a document
     - Independence assumption for index terms
6. 2.5.5 Brief Comparison of Classic Models
   - Boolean Model
     - Weakest classic method
     - Inability to recognize partial matches
   - Vector Model
     - Popular retrieval model
   - Vector Model vs. Probabilistic Model
     - Croft
       - The probabilistic model provides better retrieval performance
     - Salton, Buckley
       - The vector model is expected to outperform the probabilistic model with general collections
7. Reference: Example of Probabilistic Retrieval Model
   - Query: q(k1, k2)
   - 5 documents
   - Assume d2 and d4 are relevant

   | | d1 | d2 | d3 | d4 | d5 |
   |---|---|---|---|---|---|
   | k1 | 1 | 0 | 1 | 1 | 0 |
   | k2 | 0 | 1 | 0 | 1 | 1 |
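The five-document example can be scored with a few lines of Python. This is a minimal sketch, assuming the smoothed estimates `(V_i + 0.5) / (V + 1)` and `(n_i - V_i + 0.5) / (N - V + 1)` with the assumed relevant set {d2, d4} standing in for V, and natural logarithms; the slide's own worked numbers are not available, so the values below are not claimed to match it.

```python
import math

# Term-document incidence from the example: 1 means the term occurs in the document.
incidence = {
    "k1": {"d1": 1, "d2": 0, "d3": 1, "d4": 1, "d5": 0},
    "k2": {"d1": 0, "d2": 1, "d3": 0, "d4": 1, "d5": 1},
}
docs = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d2", "d4"}  # assumed relevant, as stated in the example


def term_weight(term):
    """Log-odds weight of a term, using 0.5-smoothed estimates (an assumption)."""
    N = len(docs)
    V = len(relevant)
    n_i = sum(incidence[term].values())
    V_i = sum(incidence[term][d] for d in relevant)
    p_rel = (V_i + 0.5) / (V + 1)            # estimate of P(k_i | R)
    p_non = (n_i - V_i + 0.5) / (N - V + 1)  # estimate of P(k_i | not R)
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_non) / p_non)


weights = {term: term_weight(term) for term in incidence}

# Binary weights: a document's score is the sum of the weights of the query terms it contains.
scores = {d: sum(weights[t] for t in incidence if incidence[t][d]) for d in docs}
ranking = sorted(docs, key=lambda d: scores[d], reverse=True)

for d in ranking:
    print(d, round(scores[d], 4))
```

Under these assumptions k2 gets a positive weight because it occurs in both relevant documents, while k1 gets a negative one, so documents containing only k2 rank above d4, which contains both terms.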
8. Probabilistic Model: Example of Probabilistic Model
   - Q: “gold silver truck”
   - D1: “Shipment of gold damaged in a fire.”
   - D2: “Delivery of silver arrived in a silver truck.”
   - D3: “Shipment of gold arrived in a truck.”
   - Assume that documents D2 and D3 are relevant to the query
   - N = number of documents in the collection
   - R = number of relevant documents for a given query Q
   - n = number of documents that contain term t
   - r = number of relevant documents that contain term t

   | Variable | Gold | Silver | Truck |
   |---|---|---|---|
   | N | 3 | 3 | 3 |
   | R | 2 | 2 | 2 |
   | n | 2 | 1 | 2 |
   | r | 1 | 1 | 2 |
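The counts in the table follow directly from the three documents and the relevance assumption. A small sketch that derives them, assuming lowercase whitespace tokenization with punctuation stripped (the tokenizer and the helper names are illustrative, not from the slides):

```python
import string

docs = {
    "D1": "Shipment of gold damaged in a fire.",
    "D2": "Delivery of silver arrived in a silver truck.",
    "D3": "Shipment of gold arrived in a truck.",
}
query = "gold silver truck"
relevant = {"D2", "D3"}  # assumed relevant, as stated in the example


def tokens(text):
    """Lowercase the text, drop punctuation, and return the set of distinct words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


doc_terms = {name: tokens(text) for name, text in docs.items()}


def counts(term):
    """Return N, R, n, r for one query term."""
    return {
        "N": len(docs),
        "R": len(relevant),
        "n": sum(term in terms for terms in doc_terms.values()),
        "r": sum(term in doc_terms[name] for name in relevant),
    }


table = {term: counts(term) for term in query.split()}
for term, row in table.items():
    print(term, row)
```

Because documents are reduced to sets of words, the repeated "silver" in D2 counts once, which matches the binary-weight assumption of the model.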
9. Probabilistic Model: Example of Probabilistic Model (Cont.)
   - Computing Term Relevance Weight
   - Similarity Coefficient for each document
     - D1: -0.477, D2: 1.653, D3: 0.699
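The weight formula itself is not part of the slide text. The sketch below assumes the 0.5-smoothed term relevance weight `log10(((r + 0.5) / (R - r + 0.5)) / ((n - r + 0.5) / (N - n - R + r + 0.5)))`; that choice is an assumption, made because it reproduces the three similarity coefficients listed above from the N, R, n, r values of the example.

```python
import math

# (N, R, n, r) per query term, taken from the example's table.
stats = {
    "gold": (3, 2, 2, 1),
    "silver": (3, 2, 1, 1),
    "truck": (3, 2, 2, 2),
}

# Query terms present in each document (binary: repeats count once).
doc_query_terms = {
    "D1": ["gold"],
    "D2": ["silver", "truck"],
    "D3": ["gold", "truck"],
}


def term_weight(N, R, n, r):
    """Assumed smoothed relevance weight, base-10 logarithm."""
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    nonrelevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log10(relevant_odds / nonrelevant_odds)


weights = {term: term_weight(*values) for term, values in stats.items()}

# A document's similarity coefficient is the sum of the weights of its query terms.
scores = {
    doc: sum(weights[term] for term in terms)
    for doc, terms in doc_query_terms.items()
}

for doc, score in scores.items():
    print(doc, round(score, 3))
```

With these inputs the term weights are about -0.477 for gold, 0.477 for silver, and 1.176 for truck, so D2 ranks first, then D3, then D1.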
10. Reference: Review of Probability Theory
    - Try to predict whether or not a baseball team (e.g. LA Dodgers) will win one of its games
    - Let’s assume the independence of the pieces of evidence
    - By Bayes’ Theorem
      - We can’t calculate it, so…
      - Probability to lose
11. Probabilistic Model: Review of Probability Theory (Cont.)
    - Independence Assumption
    - Making Substitutions
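One way these steps are usually carried out is sketched below. It is a reconstruction under assumptions rather than the slide's own derivation: $W$ and $L$ stand for winning and losing, and $x$ and $y$ are two pieces of evidence about the game (all assumed notation).

Bayes' theorem applied to both outcomes, then divided so that the joint term $P(x, y)$ cancels:

$$
\frac{P(W \mid x, y)}{P(L \mid x, y)} = \frac{P(x, y \mid W)\, P(W)}{P(x, y \mid L)\, P(L)}
$$

Independence assumption, for each outcome:

$$
P(x, y \mid W) = P(x \mid W)\, P(y \mid W), \qquad P(x, y \mid L) = P(x \mid L)\, P(y \mid L)
$$

Substituting $P(x \mid W) = P(W \mid x)\, P(x) / P(W)$ and the analogous expressions, the factors $P(x)$ and $P(y)$ cancel:

$$
\frac{P(W \mid x, y)}{P(L \mid x, y)} =
\frac{P(W \mid x)\, P(W \mid y) / P(W)}{P(L \mid x)\, P(L \mid y) / P(L)}
$$

Every quantity on the right can be estimated from simple win/loss records, which is the point of the rewriting.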
12. Probabilistic Model: Review of Probability Theory (Cont.)
    - Making Substitutions (Cont.)
    - The probability that the LA Dodgers will win a game on a sunny day when Chanho plays
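The final quantity can be computed by combining odds. The sketch below assumes the evidence is conditionally independent given either outcome, so the posterior odds equal the prior odds times one ratio `odds(P(win | evidence)) / odds(P(win))` per piece of evidence. The function names and the three input probabilities are hypothetical, chosen only for illustration; they are not the figures used in the lecture.

```python
def odds(p):
    """Convert a probability (strictly between 0 and 1) to odds."""
    return p / (1 - p)


def combine(prior, *conditionals):
    """Posterior P(win | all evidence) under conditional independence.

    prior        -- P(win) with no evidence
    conditionals -- P(win | evidence_i), one value per piece of evidence
    """
    combined = odds(prior)
    for p in conditionals:
        combined *= odds(p) / odds(prior)
    return combined / (1 + combined)  # convert odds back to a probability


# Hypothetical inputs (not from the slides):
p_win = 0.5          # overall winning rate
p_win_sunny = 0.75   # winning rate on sunny days
p_win_chanho = 0.6   # winning rate when Chanho plays

print(round(combine(p_win, p_win_sunny, p_win_chanho), 4))
```

With no evidence the function returns the prior unchanged, which is a quick sanity check on the odds arithmetic.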