Set Models and the Extended Boolean Model

Lecture notes for an Information Retrieval Systems course, by Prof. Kang Seung-Shik (강승식)

Transcript

  • 1. 2.6 Alternative Set Theoretic Models
    • Fuzzy Set Model
    • Extended Boolean Model
  • 2. 2.6.1 Fuzzy Set Model
    • Fuzzy Set Theory
      • Deals with the representation of classes whose boundaries are not well defined
      • Membership in a fuzzy set is intrinsically gradual rather than abrupt (as in conventional Boolean logic)
    [Figure: membership functions for "tall" and "very tall" plotted against height; fuzzy membership rises gradually from 0 to 1, while conventional membership jumps abruptly between 0 and 1.]
  • 3. Fuzzy Set Model (Cont.)
    • Definition
      • A fuzzy subset A of a universe of discourse U is characterized by a membership function mu_A : U -> [0, 1], which associates with each element u of U a number mu_A(u) in [0, 1]
      • Standard fuzzy set operations (see the sketch below): complement mu_~A(u) = 1 - mu_A(u); union mu_A∪B(u) = max(mu_A(u), mu_B(u)); intersection mu_A∩B(u) = min(mu_A(u), mu_B(u))
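A minimal sketch, in Python, of a gradual membership function and the standard fuzzy set operations just defined. The "tall" breakpoints (160 cm and 190 cm) are illustrative assumptions, not from the slides.

```python
# Gradual membership in the fuzzy set "tall" plus the standard
# fuzzy set operations (complement, union, intersection).

def mu_tall(height_cm: float) -> float:
    """Degree of membership in 'tall' (assumed breakpoints)."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30  # linear ramp between the breakpoints

def fuzzy_complement(mu: float) -> float:
    return 1.0 - mu

def fuzzy_union(mu_a: float, mu_b: float) -> float:
    return max(mu_a, mu_b)

def fuzzy_intersection(mu_a: float, mu_b: float) -> float:
    return min(mu_a, mu_b)

for h in (155, 170, 185, 195):
    print(h, round(mu_tall(h), 2))  # 0.0, 0.33, 0.83, 1.0
```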
  • 4. Fuzzy Set Model (Cont.)
    • Fuzzy information retrieval
      • Representing documents and queries through sets of keywords yields descriptions that are only partially related to the real semantic content of the respective documents and queries
      • Each query term defines a fuzzy set
      • Each document has a degree of membership in this set
    • Rank the documents relative to the user query
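A minimal sketch of fuzzy ranking under these definitions: each query term defines a fuzzy set, each document carries a degree of membership per term, and a conjunctive query combines degrees with fuzzy intersection (min). The membership values below are invented for illustration; fuzzy IR models in the literature typically derive them from a term-term correlation matrix.

```python
# Rank documents by their degree of membership in the fuzzy sets
# defined by the query terms (conjunctive query: fuzzy AND = min).

docs = {  # per-term membership degrees (assumed for illustration)
    "d1": {"gold": 0.9, "silver": 0.1},
    "d2": {"gold": 0.4, "silver": 0.8},
    "d3": {"gold": 0.7, "silver": 0.6},
}

def rank(query_terms, docs):
    scored = {d: min(mu.get(t, 0.0) for t in query_terms)
              for d, mu in docs.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

print(rank(["gold", "silver"], docs))
# [('d3', 0.6), ('d2', 0.4), ('d1', 0.1)]
```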
  • 5. 2.6.2 Extended Boolean Model
    • Motivation
      • Boolean Model
        • Simple and elegant
        • No provision for term weighting
        • No ranking of the answer set
        • Output might be too large or too small
      • Vector space Model
        • Simple, fast, better retrieval performance
      • Extended Boolean Model
        • Combines Boolean query formulations with characteristics of the vector space model
  • 6. Extended Boolean Model (Cont.)
    • The model is based on a critique of a basic assumption of Boolean logic:
      • Conjunctive Boolean query q = k_x AND k_y :
        • A document containing only one of the terms k_x and k_y is as irrelevant as a document containing neither of them
      • Disjunctive Boolean query q = k_x OR k_y :
        • A document containing either of the terms k_x and k_y is as relevant as a document containing both of them
  • 7. Extended Boolean Model (Cont.)
    • When only two terms are considered, queries and documents can be plotted on a two-dimensional map
    [Figure: unit squares with corners (0,0), (0,1), (1,0), and (1,1) for the queries k_x AND k_y and k_x OR k_y, with documents d_j and d_j+1 plotted as points; (1,1) is the ideal spot for the conjunction, (0,0) the spot to avoid for the disjunction.]
  • 8. Extended Boolean Model (Cont.)
    • Disjunctive query :
      • Point (0,0) is the spot to be avoided
      • Measure of similarity
        • Distance from the point (0,0)
    • Conjunctive query :
      • Point (1,1) is the most desirable spot
      • Measure of similarity
        • Complement of the distance from the point (1,1)
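A minimal sketch of these two similarity measures for a two-term query, using Euclidean distances normalized by sqrt(2) so that scores stay in [0, 1]; x and y denote the document's weights for k_x and k_y.

```python
from math import sqrt

def sim_or(x: float, y: float) -> float:
    # Distance from the undesirable point (0, 0), normalized by sqrt(2).
    return sqrt((x**2 + y**2) / 2)

def sim_and(x: float, y: float) -> float:
    # Complement of the normalized distance from the ideal point (1, 1).
    return 1 - sqrt(((1 - x)**2 + (1 - y)**2) / 2)

print(sim_or(1, 0), sim_and(1, 0))  # 0.707... 0.292...
```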
  • 9. Extended Boolean Model (Cont.)
    • P-norm Model
      • Generalizes the notion of distance to include not only Euclidean distance but also p-distances, where 1 <= p <= infinity
      • The value of p is specified at query time
      • Generalized disjunctive query: q_or = k_1 OR(p) k_2 OR(p) ... OR(p) k_m
      • Generalized conjunctive query: q_and = k_1 AND(p) k_2 AND(p) ... AND(p) k_m
  • 10. Extended Boolean Model (Cont.)
    • P-norm model query-document similarity:
      • sim(q_or, d_j) = [ (x_1^p + x_2^p + ... + x_m^p) / m ]^(1/p)
      • sim(q_and, d_j) = 1 - [ ((1 - x_1)^p + (1 - x_2)^p + ... + (1 - x_m)^p) / m ]^(1/p)
      • where x_i is the weight of index term k_i in document d_j
    • Example: see the sketch below
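A minimal sketch of the p-norm similarities, evaluated on assumed weights. Note how p = 1 collapses OR and AND to a plain average (vector-like behavior), while large p approaches strict Boolean behavior.

```python
def sim_or(weights, p):
    m = len(weights)
    return (sum(x**p for x in weights) / m) ** (1 / p)

def sim_and(weights, p):
    m = len(weights)
    return 1 - (sum((1 - x)**p for x in weights) / m) ** (1 / p)

doc = [0.8, 0.3]  # weights of k_1, k_2 in some document (assumed)
for p in (1, 2, 10):
    print(p, round(sim_or(doc, p), 3), round(sim_and(doc, p), 3))
# p=1:  0.55  0.55   (OR and AND coincide: plain average)
# p=2:  0.604 0.485
# p=10: 0.746 0.347  (OR dominated by the best term, AND by the worst)
```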
  • 11. 2.7 Alternative Algebraic Models
    • Generalized Vector Space Model
    • Latent Semantic Indexing Model
    • Neural Network Model
  • 12. 2.7.1 Generalized Vector Space Model
    • Three classic models
      • Assume independence of index terms
    • Generalized vector space model
      • Index term vectors are assumed linearly independent but are not pairwise orthogonal
      • Co-occurrence of index terms inside documents in the collection induces dependencies among these index terms
      • Document ranking is based on the combination of the standard term-document weights with the term-term correlation factors
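The slides do not fix a construction for the correlation factors (the usual GVSM derivation goes through minterms), so the sketch below uses a common simplification: take the term-term correlation matrix as C = A Aᵀ, induced by term co-occurrence in the term-document matrix A, and rank documents with q C A.

```python
# GVSM-style ranking under a simplifying assumption: co-occurrence
# induces the term-term correlation factors via C = A @ A.T.
import numpy as np

A = np.array([  # rows: terms t1..t3, columns: documents d1..d3 (assumed weights)
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
])
C = A @ A.T                    # term-term correlation factors
q = np.array([1.0, 0.0, 0.0])  # query containing only t1

# Combine standard term-document weights with term-term correlations:
scores = q @ C @ A
print(scores)  # documents containing terms correlated with t1 also score
```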
  • 13. 2.7.2 Latent Semantic Indexing Model
    • Motivation
      • Problem of lexical matching method
        • There are many ways to express a given concept ( synonymy )
          • Relevant documents which are not indexed by any of the query keywords are not retrieved
        • Most words have multiple meanings ( polysemy )
          • Many unrelated documents might be included in the answer set
    • Idea
      • Map each document and query vector into a lower dimensional space which is associated with concepts
        • Can be done by Singular Value Decomposition
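A minimal sketch of LSI: factor the term-document matrix with an SVD, keep the k largest singular values, and compare the query with the documents in the resulting k-dimensional concept space. The matrix values and k = 2 are illustrative assumptions.

```python
import numpy as np

A = np.array([  # rows: terms, columns: documents (assumed weights)
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U S V^T
k = 2                                             # latent concepts to keep
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

docs_k = np.diag(sk) @ Vtk            # documents in concept space (k x n_docs)
q = np.array([1, 1, 0, 0], dtype=float)
q_k = np.diag(1 / sk) @ Uk.T @ q      # fold the query into the same space

# Rank documents by cosine similarity in the concept space.
cos = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k))
print(np.argsort(-cos))               # document indices, best match first
```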
  • 14. 2.7.3 Neural Network Model
    • Motivation
      • In a conventional IR system,
        • Document vectors are compared with query vectors for the computation of a ranking
        • Index terms in documents and queries have to be matched and weighted for computing this ranking
      • Neural networks are known to be good pattern matchers and can be an alternative IR model
      • A neural network is a simplified graph representation of the mesh of interconnected neurons in the human brain
        • Node: processing unit; edge: synaptic connection
        • Weight: strength of a connection
        • Spreading activation: signals propagate across the network
  • 15. Neural Network Model (Cont.)
    • Three layers
      • query terms, document terms, documents
    • Spread activation process
      • In the first phase, the query term nodes initiate the process by sending signals to the document term nodes, which in turn generate signals to the document nodes
      • The document nodes then generate new signals back to the document term nodes, and the document term nodes again fire new signals to the document nodes (this process repeats)
      • Signals become weaker at each iteration, and the process eventually halts (see the sketch following the example below)
  • 16. Neural Network Model (Cont.)
    • Example
      • D1
        • Cats and dogs eat.
      • D2
        • The dog has a mouse
      • D3
        • Mice eat anything
      • D4
        • Cats play with mice and rats
      • D5
        • Cats play with rats
      • Query
        • Do cats play with mice?
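A minimal sketch of spreading activation on this example collection. Tokenization (plain whitespace, no stemming, so "mouse" and "mice" stay distinct), unit link weights, and the 0.5 decay factor are all assumptions the slides leave open.

```python
# Three layers: query terms -> document terms -> documents, with signals
# fired back and forth and decaying each iteration.

docs = {
    "D1": "cats and dogs eat",
    "D2": "the dog has a mouse",
    "D3": "mice eat anything",
    "D4": "cats play with mice and rats",
    "D5": "cats play with rats",
}
query_terms = {"cats", "play", "mice"}  # content words of "Do cats play with mice?"

terms = {d: set(text.split()) for d, text in docs.items()}

# Phase 1: query term nodes signal matching document term nodes,
# which in turn signal the documents containing them.
doc_act = {d: sum(1.0 for t in query_terms if t in ts) for d, ts in terms.items()}

# Later phases: documents fire back to their term nodes and forward
# again, with signals decaying until activity dies out.
decay = 0.5
for _ in range(3):
    term_act = {}
    for d, ts in terms.items():
        for t in ts:
            term_act[t] = term_act.get(t, 0.0) + decay * doc_act[d]
    doc_act = {d: sum(decay * term_act[t] for t in ts) for d, ts in terms.items()}

print(sorted(doc_act.items(), key=lambda kv: -kv[1]))  # D4 ranks first here
```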
  • 17. 2.8 Alternative Probabilistic Models
    • Bayesian Networks
    • Inference Network Model
    • Belief Network Model
  • 18. 2.8.1 Bayesian Networks
    • Bayesian networks are directed acyclic graphs(DAGs)
      • Nodes: random variables
        • The parents of a node are those judged to be direct causes of it
      • Arcs: causal relationships between variables
        • The strengths of causal influences are expressed by conditional probabilities
    [Figure: an example Bayesian network over random variables x_1, ..., x_5.]
  • 19. 2.8.2 Inference Network Model
    • Use evidential reasoning to estimate the probability that a document will be relevant to a query
    • The ranking of a document d_j with respect to a query q is a measure of how much evidential support the observation of d_j provides to the query q
  • 20. Inference Network Model(Cont.)
    • A simple inference network:
    [Figure: a simple inference network with nodes A, B, C, D, E, F, X, Y.]
  • 21. Inference Network Model(Cont.)
    • Link Matrices
      • Indicate the strength by which parents (either by themselves or in conjunction with other parents) affect children in the inference network
    With prior probabilities P(D) = 0.8 and P(E) = 0.4, the link matrix for Y is:
    |              | ~D,~E | ~D,E | D,~E | D,E |
    | P(Y = true)  | 0.05  | 0.2  | 0.8  | 0.9 |
    | P(Y = false) | 0.95  | 0.8  | 0.2  | 0.1 |
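A minimal sketch that marginalizes over the parent configurations to compute P(Y = true) from the priors and the link matrix above, assuming the column assignment shown (D alone contributing 0.8, E alone 0.2).

```python
p_d, p_e = 0.8, 0.4
p_y_true = {  # P(Y = true | D, E), from the link matrix above
    (False, False): 0.05, (False, True): 0.2,
    (True, False): 0.8,   (True, True): 0.9,
}

total = 0.0
for d in (False, True):
    for e in (False, True):
        p_config = (p_d if d else 1 - p_d) * (p_e if e else 1 - p_e)
        total += p_y_true[(d, e)] * p_config

print(round(total, 3))  # 0.694
```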
  • 22. Inference Network Model(Cont.)
    • Inference Network Example
      • Three Layers: document layer, term layer, and query layer
      • Documents are represented as nodes, with a link from each document node to the nodes of the terms it contains
    [Figure: a three-layer inference network: document layer (d_1, d_2, d_3), concept layer (t_1, ..., t_4), and query layer (Q).]
  • 23. Inference Network Model(Cont.)
    • Relevance Ranking with Inference Network
      • Processing begins when a document, say D_1, is instantiated (we assume D_1 has been observed)
      • This instantiates all term nodes contained in D_1
      • All links emanating from the term nodes just activated are then traversed, and the query node is activated
      • The query node computes the belief in the query given D_1; this belief is used as the similarity coefficient for D_1
      • The process is repeated until every document has been instantiated
  • 24. Inference Network Model(Cont.)
    • Example of computing similarity coefficient
    Q : “gold silver truck”
    D_1 : “Shipment of gold damaged in a fire.”
    D_2 : “Delivery of silver arrived in a silver truck.”
    D_3 : “Shipment of gold arrived in a truck.”

    |      | t_1 a | t_2 arrived | t_3 damaged | t_4 delivery | t_5 fire | t_6 gold | t_7 in | t_8 of | t_9 shipment | t_10 silver | t_11 truck |
    | idf  | 0     | 0.41        | 1.10        | 1.10         | 1.10     | 0.41     | 0      | 0      | 0.41         | 0.41        | 0.41       |
    | nidf | 0     | 0.37        | 1           | 1            | 1        | 0.37     | 0      | 0      | 0.37         | 0.37        | 0.37       |
    | D_1  | 1     | 0           | 1           | 0            | 1        | 1        | 1      | 1      | 1            | 0           | 0          |
    | D_2  | 0.5   | 0.5         | 0           | 0.5          | 0        | 0        | 0.5    | 0.5    | 0            | 1           | 0.5        |
    | D_3  | 1     | 1           | 0           | 0            | 0        | 1        | 1      | 1      | 1            | 0           | 1          |

    (Document rows give normalized term frequencies ntf; the terms t_1 ... t_11 are the alphabetized vocabulary of the collection.)
  • 25. Inference Network Model(Cont.)
    • Constructing Link Matrix for Terms
      • Computing the belief in a given term k_i , given a document d_j :
        • p_ij = 0.5 + 0.5 (ntf_ij)(nidf_i)
        • p_gold,3 = 0.5 + 0.5 (0.37)(1) = 0.685
      • Link Matrix
    Link matrix for gold (parent documents D_1, D_3):
    |                 | neither | D_1   | D_3   |
    | P(gold = true)  | 0       | 0.685 | 0.685 |
    | P(gold = false) | 1       | 0.315 | 0.315 |

    Link matrix for silver (parent document D_2):
    |                   | neither | D_2   |
    | P(silver = true)  | 0       | 0.685 |
    | P(silver = false) | 1       | 0.315 |

    Link matrix for truck (parent documents D_2, D_3):
    |                  | neither | D_2   | D_3   |
    | P(truck = true)  | 0       | 0.592 | 0.685 |
    | P(truck = false) | 1       | 0.408 | 0.315 |
  • 26. Inference Network Model(Cont.)
    • Computing Similarity Coefficient
      • A link matrix for a query node
      • bel(gold|D_1) = 0.685, bel(silver|D_1) = 0, bel(truck|D_1) = 0
      • Bel(Q|D_1) = 0.1(0.315)(1)(1) + 0.3(0.685)(1)(1) + 0.3(0.315)(0)(1) + 0.5(0.685)(0)(1) + 0.5(0.315)(1)(0) + 0.7(0.685)(1)(0) + 0.7(0.315)(0)(0) + 0.9(0.685)(0)(0) = 0.237
      • bel(gold|D_2) = 0, bel(silver|D_2) = 0.685, bel(truck|D_2) = 0.592, Bel(Q|D_2) = 0.589
      • bel(gold|D_3) = 0.685, bel(silver|D_3) = 0, bel(truck|D_3) = 0.685, Bel(Q|D_3) = 0.511
    Link matrix for the query node Q (columns show which of gold (g), silver (s), truck (t) are true):
    |              | none | g   | s   | t   | g,s | g,t | s,t | g,s,t |
    | P(Q = true)  | 0.1  | 0.3 | 0.3 | 0.5 | 0.5 | 0.7 | 0.7 | 0.9   |
    | P(Q = false) | 0.9  | 0.7 | 0.7 | 0.5 | 0.5 | 0.3 | 0.3 | 0.1   |
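As an arithmetic check, a minimal sketch that recomputes the similarity coefficient from the link matrices above; it reproduces Bel(Q|D_1) = 0.237 and Bel(Q|D_3) = 0.511 with the term beliefs given on the slides.

```python
from itertools import product

# P(Q = true | which of gold (g), silver (s), truck (t) are true),
# taken from the query link matrix above.
p_q = {frozenset(): 0.1, frozenset("g"): 0.3, frozenset("s"): 0.3,
       frozenset("t"): 0.5, frozenset("gs"): 0.5, frozenset("gt"): 0.7,
       frozenset("st"): 0.7, frozenset("gst"): 0.9}

def bel_q(bel):
    """Sum P(Q|config) over all on/off configurations of (g, s, t),
    weighting each configuration by the given term beliefs."""
    total = 0.0
    for bits in product((False, True), repeat=3):
        config = frozenset(term for term, on in zip("gst", bits) if on)
        weight = 1.0
        for term, on in zip("gst", bits):
            weight *= bel[term] if on else 1.0 - bel[term]
        total += p_q[config] * weight
    return total

# Term beliefs when D_1 (resp. D_3) is instantiated, from the link matrices.
print(round(bel_q({"g": 0.685, "s": 0.0, "t": 0.0}), 3))    # 0.237
print(round(bel_q({"g": 0.685, "s": 0.0, "t": 0.685}), 3))  # 0.511
```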