Set Model and Extended Boolean Model

Information Retrieval Systems Lecture Notes (Prof. 강승식)
    Set Model and Extended Boolean Model: Presentation Transcript

    • 2.6 Alternative Set Theoretic Models
      • Fuzzy Set Model
      • Extended Boolean Model
    • 2.6.1 Fuzzy Set Model
      • Fuzzy Set Theory
        • Deals with the representation of classes whose boundaries are not well defined
        • Membership in a fuzzy set is a notion intrinsically gradual instead of abrupt (as in conventional Boolean logic)
      [Figure: membership functions for "tall" and "very tall" versus height: fuzzy membership grades rise gradually from 0 to 1, whereas conventional (Boolean) membership switches abruptly between 0 and 1]
    • Fuzzy Set Model (Cont.)
      • Definition
        • A fuzzy subset A of a universe U is characterized by a membership function μ_A : U → [0,1], which associates with each element u of U a number μ_A(u) in [0,1] (the degree of membership of u in A)
      • Definition
        • Complement: μ_¬A(u) = 1 − μ_A(u)
        • Union: μ_A∪B(u) = max(μ_A(u), μ_B(u))
        • Intersection: μ_A∩B(u) = min(μ_A(u), μ_B(u))
    • Fuzzy Set Model (Cont.)
      • Fuzzy information retrieval
        • Representing documents and queries through sets of keywords yields descriptions which are only partially related to the real semantic contents of the respective documents and queries
        • Each query term defines a fuzzy set
        • Each document has a degree of membership in this set
      • Rank the documents relative to the user query
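The ranking idea above can be sketched in a few lines of Python. This is a minimal illustration with made-up membership degrees (real systems derive them, e.g., from term-term correlations); a conjunctive query is scored with the fuzzy intersection (min) and a disjunctive query with the fuzzy union (max):

```python
# Fuzzy retrieval sketch: membership degrees of each document in the
# fuzzy sets of two query terms (values are hypothetical, for illustration).
membership = {
    "d1": {"k1": 0.8, "k2": 0.3},
    "d2": {"k1": 0.5, "k2": 0.6},
    "d3": {"k1": 0.1, "k2": 0.9},
}

def score_and(doc):
    # fuzzy intersection: min of the membership degrees
    return min(doc["k1"], doc["k2"])

def score_or(doc):
    # fuzzy union: max of the membership degrees
    return max(doc["k1"], doc["k2"])

# rank documents for the query "k1 AND k2"
ranking = sorted(membership, key=lambda d: score_and(membership[d]), reverse=True)
print(ranking)  # d2 first: min(0.5, 0.6) = 0.5 beats 0.3 and 0.1
```

Unlike the conventional Boolean model, every document gets a graded score, so the answer set is ranked rather than a flat set.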
    • 2.6.2 Extended Boolean Model
      • Motivation
        • Boolean Model
          • Simple and elegant
          • No provision for term weighting
          • No ranking of the answer set
          • Output might be too large or too small
        • Vector Space Model
          • Simple, fast, better retrieval performance
        • Extended Boolean Model
          • Combine Boolean query formulations with characteristics of the vector model
    • Extended Boolean Model (Cont.)
      • The model is based on a critique of a basic assumption of Boolean logic
        • Conjunctive Boolean query q = k_x ∧ k_y :
          • A document which contains only one of the terms k_x and k_y is considered as irrelevant as a document which contains neither of them
        • Disjunctive Boolean query q = k_x ∨ k_y :
          • A document which contains either of the terms k_x and k_y is considered as relevant as a document which contains both of them
    • Extended Boolean Model (Cont.)
      • When only two terms are considered, queries and documents are plotted in a two dimensional map
      [Figure: documents d_j and d_(j+1) plotted in the two-dimensional space of weights for k_x and k_y; for the query "k_x and k_y" the best point is (1,1), for "k_x or k_y" the worst point is (0,0)]
    • Extended Boolean Model (Cont.)
      • Disjunctive query q_or = k_x ∨ k_y :
        • Point (0,0) is the spot to be avoided
        • Measure of similarity
          • Distance from the point (0,0): sim(q_or, d) = sqrt((x² + y²)/2), where x and y are the weights of k_x and k_y in d
      • Conjunctive query q_and = k_x ∧ k_y :
        • Point (1,1) is the most desirable spot
        • Measure of similarity
          • Complement of the distance from the point (1,1): sim(q_and, d) = 1 − sqrt(((1 − x)² + (1 − y)²)/2)
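These two distance-based measures translate directly into code. A small sketch, with x and y the weights of k_x and k_y in the document (assumed normalized to [0,1]):

```python
import math

def sim_or(x, y):
    # disjunctive query: normalized distance from the undesirable point (0,0)
    return math.sqrt((x**2 + y**2) / 2)

def sim_and(x, y):
    # conjunctive query: complement of the normalized distance from (1,1)
    return 1 - math.sqrt(((1 - x)**2 + (1 - y)**2) / 2)

print(sim_or(0, 0), sim_or(1, 1))    # 0.0 1.0
print(sim_and(0, 0), sim_and(1, 1))  # 0.0 1.0
```

A document containing exactly one of the two terms gets sim_or(1, 0) = sqrt(1/2) ≈ 0.71, strictly between the all-or-nothing scores of the pure Boolean model.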
    • Extended Boolean Model (Cont.)
      • P-norm Model
        • Generalizes the notion of distance to include not only Euclidean distance but also p-distances, 1 ≤ p ≤ ∞
        • The value of p is specified at query time
        • Generalized disjunctive query: q_or = k_1 ∨^p k_2 ∨^p … ∨^p k_m
        • Generalized conjunctive query: q_and = k_1 ∧^p k_2 ∧^p … ∧^p k_m
    • Extended Boolean Model (Cont.)
      • P-norm Model query-document similarity
        • sim(q_or, d) = ((x_1^p + x_2^p + … + x_m^p) / m)^(1/p)
        • sim(q_and, d) = 1 − (((1 − x_1)^p + (1 − x_2)^p + … + (1 − x_m)^p) / m)^(1/p)
      • Example
        • With p = 2 and weights x_1 = x_2 = 0.5, sim(q_and, d) = 1 − sqrt((0.25 + 0.25)/2) = 0.5
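The p-norm similarities can be sketched as follows, assuming term weights x_i in [0,1] and a query-time parameter p ≥ 1. At p = 1 both measures reduce to a simple average of the weights (vector-model-like behaviour), while for large p they approach strict Boolean max/min behaviour:

```python
def sim_or(weights, p):
    # generalized disjunction: normalized p-distance from the origin
    m = len(weights)
    return (sum(x**p for x in weights) / m) ** (1 / p)

def sim_and(weights, p):
    # generalized conjunction: complement of the p-distance from (1, ..., 1)
    m = len(weights)
    return 1 - (sum((1 - x)**p for x in weights) / m) ** (1 / p)

w = [0.2, 0.8]
print(sim_or(w, 1))     # 0.5  (average of the weights, vector-model-like)
print(sim_or(w, 1000))  # ~0.8 (dominated by the largest weight, Boolean-like)
```

This is why p acts as a "softness" dial between the vector and Boolean interpretations of the same query.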
    • 2.7 Alternative Algebraic Models
      • Generalized Vector Space Model
      • Latent Semantic Indexing Model
      • Neural Network Model
    • 2.7.1 Generalized Vector Space Model
      • Three classic models
        • Assume independence of index terms
      • Generalized vector space model
        • Index term vectors are assumed linearly independent but are not pairwise orthogonal
        • Co-occurrence of index terms inside documents in the collection induces dependencies among these index terms
        • Document ranking is based on the combination of the standard term-document weights with the term-term correlation factors
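The combination of term-document weights with term-term correlation factors can be illustrated with a toy matrix. A sketch, not the model's full minterm construction: correlations are taken as raw co-occurrence counts, and a document is scored as d·C·q, so correlated terms contribute even without an exact match:

```python
# Toy GVSM-style scoring sketch.  W[i][j] = weight of term i in document j.
W = [
    [1, 0, 1],  # term 0
    [1, 1, 0],  # term 1
    [0, 1, 1],  # term 2
]
n_terms, n_docs = len(W), len(W[0])

# term-term correlation induced by co-occurrence inside documents:
# c[i][l] = sum over documents j of w_ij * w_lj
C = [[sum(W[i][j] * W[l][j] for j in range(n_docs))
      for l in range(n_terms)] for i in range(n_terms)]

def score(doc, query):
    # d^T C q : standard weights combined with correlation factors
    return sum(W[i][doc] * C[i][l] * query[l]
               for i in range(n_terms) for l in range(n_terms))

q = [1, 0, 0]  # query containing only term 0
print([score(j, q) for j in range(n_docs)])
```

Note that document 1 contains no query term at all, yet still receives a nonzero score purely through the co-occurrence correlations, which is exactly what dropping pairwise orthogonality buys.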
    • 2.7.2 Latent Semantic Indexing Model
      • Motivation
        • Problem of lexical matching method
          • There are many ways to express a given concept (synonymy)
            • Relevant documents which are not indexed by any of the query keywords are not retrieved
          • Most words have multiple meanings (polysemy)
            • Many unrelated documents might be included in the answer set
      • Idea
        • Map each document and query vector into a lower dimensional space which is associated with concepts
          • Can be done by Singular Value Decomposition
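The SVD-based mapping can be sketched with NumPy (assuming it is available). Documents are represented in a k-dimensional concept space, and a query is folded into the same space as q^T U_k Σ_k^{-1} before cosine comparison; the matrix here is a toy example:

```python
import numpy as np

# toy term-document matrix (rows = terms, columns = documents)
A = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                  # number of latent concepts to keep
Uk, Sk, Vtk = U[:, :k], S[:k], Vt[:k, :]

# documents in concept space: columns of diag(Sk) @ Vtk
docs_k = (Sk[:, None] * Vtk).T         # shape (n_docs, k)

# fold a query vector into the concept space: q_hat = q^T U_k diag(1/Sk)
q = np.array([1.0, 0.0, 0.0, 0.0])     # query containing only term 0
q_hat = (q @ Uk) / Sk

# rank documents by cosine similarity in the lower-dimensional space
sims = docs_k @ q_hat / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_hat))
print(np.argsort(-sims))
```

Truncating to k concepts is what lets synonymous terms, which co-occur across documents, land near each other in the reduced space.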
    • 2.7.3 Neural Network Model
      • Motivation
        • In a conventional IR system,
          • Document vectors are compared with query vectors for the computation of a ranking
          • Index terms in documents and queries have to be matched and weighted for computing this ranking
        • Neural networks are known to be good pattern matchers and can serve as an alternative IR model
        • A neural network is a simplified graph representation of the mesh of interconnected neurons in the human brain
          • Node: processing unit; edge: synaptic connection
          • Weight: strength of a connection
          • Spreading activation: signals propagate across the network
    • Neural Network Model (Cont.)
      • Three layers
        • query terms, document terms, documents
      • Spread activation process
        • In the first phase, the query term nodes initiate the process by sending signals to the document term nodes, which in turn generate signals to the document nodes
        • The document nodes generate new signals back to the document term nodes, and the document term nodes again fire new signals to the document nodes (this process repeats)
        • Signals become weaker at each iteration, and the process eventually halts
    • Neural Network Model (Cont.)
      • Example
        • D1
          • Cats and dogs eat.
        • D2
          • The dog has a mouse
        • D3
          • Mice eat anything
        • D4
          • Cats play with mice and rats
        • D5
          • Cats play with rats
        • Query
          • Do cats play with mice?
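The three-layer process can be simulated for this toy collection. A sketch under explicit assumptions: terms are used as-is (no stemming, so "dogs"/"dog" and "mice"/"mouse" stay distinct), stopwords are dropped, document-term weights are normalized by document length, and feedback signals decay each iteration:

```python
import math

# toy collection from the slide, stopwords removed, terms kept as-is
docs = {
    "D1": ["cats", "dogs", "eat"],
    "D2": ["dog", "mouse"],
    "D3": ["mice", "eat", "anything"],
    "D4": ["cats", "play", "mice", "rats"],
    "D5": ["cats", "play", "rats"],
}
query = ["cats", "play", "mice"]

terms = sorted({t for ts in docs.values() for t in ts})
# document-term weights, normalized by document length
w = {(t, d): 1 / math.sqrt(len(ts)) for d, ts in docs.items() for t in ts}

# phase 1: query term nodes send signals to the document term nodes
a_term = {t: 1 / math.sqrt(len(query)) if t in query else 0.0 for t in terms}
a_doc = {d: 0.0 for d in docs}

decay = 0.2  # signals weaken at each feedback iteration
for _ in range(3):
    # document term nodes fire signals to the document nodes
    for d in docs:
        a_doc[d] += sum(w.get((t, d), 0.0) * a_term[t] for t in terms)
    # document nodes send weakened signals back to the term nodes
    a_term = {t: decay * sum(w.get((t, d), 0.0) * a_doc[d] for d in docs)
              for t in terms}

ranking = sorted(a_doc, key=a_doc.get, reverse=True)
print(ranking)  # "D4" ranks first: it matches all three query terms
```

Because no stemming is applied, D2 ("dog", "mouse") shares no exact term with the query and its activation stays at zero; with a stemmer collapsing "mice"/"mouse", the feedback phase would pull it in.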
    • 2.8 Alternative Probabilistic Models
      • Bayesian Networks
      • Inference Network Model
      • Belief Network Model
    • 2.8.1 Bayesian Networks
      • Bayesian networks are directed acyclic graphs(DAGs)
        • node : random variables
          • The parents of a node are those judged to be direct causes for it.
        • arcs : causal relationships between variables
          • The strengths of causal influences are expressed by conditional probabilities.
      [Figure: example Bayesian network: a directed acyclic graph over random variables x_1, …, x_5]
    • 2.8.2 Inference Network Model
      • Use evidential reasoning to estimate the probability that a document will be relevant to a query
      • The ranking of a document d j with respect to a query q is a measure of how much evidential support the observation of d j provides to the query q
    • Inference Network Model(Cont.)
      • Simple inference Networks
      [Figure: a simple inference network over nodes A, B, C, D, E, X, Y, F]
    • Inference Network Model(Cont.)
      • Link Matrices
        • Indicate the strength by which parents (either by themselves or in conjunction with other parents) affect children in the inference network
      Link matrix for node Y given parents D and E, with priors P(D) = 0.8 and P(E) = 0.4:

                    D=F,E=F   D=F,E=T   D=T,E=F   D=T,E=T
      Y = true        0.05      0.2       0.8       0.9
      Y = false       0.95      0.8       0.2       0.1
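A quick check of how such a link matrix is used: assuming P(Y=true | D,E) takes the values 0.05, 0.2, 0.8, 0.9 for the parent configurations (F,F), (F,T), (T,F), (T,T) (a reading of this slide's matrix; the column order is inferred), with P(D) = 0.8 and P(E) = 0.4, the prior belief in Y follows by marginalizing over the parents:

```python
from itertools import product

# P(Y = true | D, E), keyed by (D, E); the column assignment is an assumption
p_y = {(False, False): 0.05, (False, True): 0.2,
       (True, False): 0.8, (True, True): 0.9}
p_d, p_e = 0.8, 0.4

# marginalize over the parents: P(Y) = sum over D,E of P(Y|D,E) P(D) P(E)
p_y_true = sum(p_y[(d, e)]
               * (p_d if d else 1 - p_d)
               * (p_e if e else 1 - p_e)
               for d, e in product([False, True], repeat=2))
print(round(p_y_true, 3))  # 0.694
```

This is the same computation the inference network performs at every node, just written out for a single two-parent case.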
    • Inference Network Model(Cont.)
      • Inference Network Example
        • Three layers: document layer, term (concept) layer, and query layer
        • Documents are represented as nodes, and a link runs from each document node to the nodes of the terms it contains
      [Figure: inference network with document layer (d_1, d_2, d_3), concept layer (t_1 … t_4), and query layer (Q)]
    • Inference Network Model(Cont.)
      • Relevance Ranking with Inference Network
        • Processing begins when a document, say D_1, is instantiated (we assume D_1 has been observed)
        • This instantiates all term nodes in D_1
        • All links emanating from the term nodes just activated are instantiated, and the query node is activated
        • The query node then computes the belief in the query given D_1; this is used as the similarity coefficient for D_1
        • This process continues until all documents have been instantiated
    • Inference Network Model(Cont.)
      • Example of computing similarity coefficient
      Q : “gold silver truck”
      D_1 : “Shipment of gold damaged in a fire.”
      D_2 : “Delivery of silver arrived in a silver truck.”
      D_3 : “Shipment of gold arrived in a truck.”

      Term-weight table (terms t_1 … t_11 are the vocabulary in alphabetical order; document rows give normalized term frequencies):

              t_1  t_2      t_3      t_4       t_5   t_6   t_7  t_8  t_9     t_10      t_11
              a    arrived  damaged  delivery  fire  gold  in   of   silver  shipment  truck
      idf     0    0.41     1.10     1.10      1.10  0.41  0    0    0.41    0.41      0.41
      nidf    0    0.37     1        1         1     0.37  0    0    0.37    0.37      0.37
      D_1     1    0        1        0         1     1     1    1    0       1         0
      D_2     0.5  0.5      0        0.5       0     0     0.5  0.5  1       0         0.5
      D_3     1    1        0        0         0     1     1    1    0       1         1
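The idf and nidf rows of the table are consistent with a natural-log idf normalized by its maximum (an inference from the numbers; the slide does not state the logarithm base):

```python
import math

N = 3  # number of documents in the collection
# document frequency of each query term
df = {"gold": 2, "silver": 1, "truck": 2}

idf = {t: math.log(N / df[t]) for t in df}    # ln(3/2) ~ 0.41, ln(3) ~ 1.10
max_idf = math.log(N / 1)                      # rarest possible term: df = 1
nidf = {t: idf[t] / max_idf for t in idf}      # normalized idf

print(round(idf["gold"], 2), round(nidf["gold"], 2))      # 0.41 0.37
print(round(idf["silver"], 2), round(nidf["silver"], 2))  # 1.1 1.0
```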
    • Inference Network Model(Cont.)
      • Constructing Link Matrix for Terms
      • Constructing Link Matrix for Terms
        • Computing the belief in a given term (k_i)
          • Given a document (d_j)
          • P_ij = 0.5 + 0.5(ntf_ij)(nidf_i)
          • P_gold,3 = 0.5 + 0.5(0.37)(1) = 0.685
        • Link Matrix
      Link matrices (one column per instantiated parent document):

      gold (parents D_1, D_3):        neither   D_1     D_3
        gold = true                      0      0.685   0.685
        gold = false                     1      0.315   0.315

      silver (parent D_2):                       D_2
        silver = true                           0.685
        silver = false                          0.315

      truck (parents D_2, D_3):       neither   D_2     D_3
        truck = true                     0      0.592   0.685
        truck = false                    1      0.408   0.315
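The term beliefs in these link matrices follow directly from the formula P_ij = 0.5 + 0.5(ntf_ij)(nidf_i):

```python
def term_belief(ntf, nidf):
    # P_ij = 0.5 + 0.5 * ntf_ij * nidf_i
    return 0.5 + 0.5 * ntf * nidf

print(term_belief(1.0, 0.37))  # gold in D3  -> 0.685
print(term_belief(0.5, 0.37))  # truck in D2 -> 0.5925 (0.592 in the slide)
```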
    • Inference Network Model(Cont.)
      • Computing Similarity Coefficient
        • A link matrix for a query node
        • bel(gold|D_1) = 0.685, bel(silver|D_1) = 0, bel(truck|D_1) = 0
        • Bel(Q|D_1) = 0.1(0.315)(1)(1) + 0.3(0.685)(1)(1) + 0.3(0.315)(0)(1) + 0.5(0.685)(0)(1) + 0.5(0.315)(1)(0) + 0.7(0.685)(1)(0) + 0.7(0.315)(0)(0) + 0.9(0.685)(0)(0) = 0.237
        • bel(gold|D_2) = 0, bel(silver|D_2) = 0.685, bel(truck|D_2) = 0.592, Bel(Q|D_2) = 0.589
        • bel(gold|D_3) = 0.685, bel(silver|D_3) = 0, bel(truck|D_3) = 0.685, Bel(Q|D_3) = 0.511
      Link matrix for the query node Q, with parents gold (g), silver (s), truck (t):

        parents true   none  g    s    t    gs   gt   st   gst
        Q = true       0.1   0.3  0.3  0.5  0.5  0.7  0.7  0.9
        Q = false      0.9   0.7  0.7  0.5  0.5  0.3  0.3  0.1
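The eight-term sums above can be reproduced mechanically. A sketch using the query link matrix implied by the computation (P(Q=true) of 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9 for the parent subsets ∅, {g}, {s}, {t}, {g,s}, {g,t}, {s,t}, {g,s,t}); it reproduces Bel(Q|D_1) = 0.237 and Bel(Q|D_3) = 0.511:

```python
from itertools import product

# P(Q = true | which of gold (g), silver (s), truck (t) are true)
p_q = {frozenset(): 0.1,
       frozenset("g"): 0.3, frozenset("s"): 0.3, frozenset("t"): 0.5,
       frozenset("gs"): 0.5, frozenset("gt"): 0.7, frozenset("st"): 0.7,
       frozenset("gst"): 0.9}

def bel_q(bel):
    # sum over all true/false configurations of the three term nodes
    total = 0.0
    for cfg in product([False, True], repeat=3):
        on = frozenset(term for term, v in zip("gst", cfg) if v)
        prob = 1.0
        for term, v in zip("gst", cfg):
            prob *= bel[term] if v else 1 - bel[term]
        total += p_q[on] * prob
    return total

print(round(bel_q({"g": 0.685, "s": 0.0, "t": 0.0}), 3))    # D1 -> 0.237
print(round(bel_q({"g": 0.685, "s": 0.0, "t": 0.685}), 3))  # D3 -> 0.511
```

Each summand pairs a link-matrix entry with the product of the parent beliefs, which is exactly the expansion written out term by term in the slide.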