Signature files

  1. Name: DEEPALI RAIKAR; Roll No: 11150157; MSC.IT (PART – I)
  2. SIGNATURE FILES
  3. Typically, a “signature file” treats each document as just a “bag of words”.
     • Signature files are a technique used for document retrieval.
     • The main idea behind signature files is to create a quick link to the documents that match the queries passed by the user.
     • This is done by creating a signature for each document.
  4. A signature is created as an “abstraction” of a document.
     • A signature is a compressed representation of the document it describes.
     • All the signatures that represent the documents are kept in a single file, called the “signature file”.
     • The signatures are produced by hashing, which makes it easy to retrieve the documents.
  5. Characteristics of signature files
     • Word-oriented index structure
     • Low overhead
     • Suitable for text collections that are not very large
     • Suitable for conventional databases
     • For most applications, inverted files outperform signature files.
  6. There are several types of signatures:
     • Word signatures: a fixed-length bit-string representation of a word
     • Document signatures
     • Query signatures
  7. How word signatures are generated
     • Word signatures are built from the “triplets” of a word.
     • Each word is divided into overlapping triplets of characters.
     • Each triplet is given a numeric value.
     • That number is used as the input to a hash function.
     • The hash function produces a number that represents the bit position of the triplet in the word signature.
  8. Example of a word signature
     • 111000111001 is a signature created for the word “SIGNATURE”.
     • Triplets: RE*, *SI, SIG, IGN, GNA, NAT, ATU, TUR, URE
     • Numeric value of each triplet: 12, 3, 7, 3, 2, 9, 1, 12, 8
     • Final word signature generated using the hash function: 111000111001 (bits 1, 2, 3, 7, 8, 9 and 12 are set, counting from the left).
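The triplet-and-hash procedure from slides 7 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how a triplet becomes a number or which hash function is used, so `triplet_value` (the triplet's bytes read as an integer) and `bit_position` (a plain modulo) are hypothetical stand-ins. The bits it sets for “SIGNATURE” will therefore generally differ from the slide's 111000111001.

```python
SIGNATURE_BITS = 12  # same width as the slide's example signature


def triplets(word, pad="*"):
    """Overlapping 3-character windows of the word, padded with '*' at both ends."""
    padded = f"{pad}{word.upper()}{pad}"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def triplet_value(triplet):
    """Assumption: the numeric value is the triplet's bytes read as one integer."""
    return int.from_bytes(triplet.encode("utf-8"), "big")


def bit_position(value, bits=SIGNATURE_BITS):
    """Assumption: a simple modulo hash giving a 0-based position, leftmost bit first."""
    return value % bits


def word_signature(word, bits=SIGNATURE_BITS):
    """Set one bit per triplet; triplets that collide share a bit."""
    signature = ["0"] * bits
    for triplet in triplets(word):
        signature[bit_position(triplet_value(triplet), bits)] = "1"
    return "".join(signature)


if __name__ == "__main__":
    print(triplets("SIGNATURE"))
    print(word_signature("SIGNATURE"))
```

Padding with `*` reproduces the nine triplets listed on the slide. Because several triplets may hash to the same position, the signature can have fewer set bits than the word has triplets, as in the slide's example, where nine triplets set seven bits.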
  9. Document signature
     • Can be created using two methods:
       • Concatenation of word signatures
       • Superimposed coding
     • Characteristics of document signatures:
       • The length can vary
       • A fixed number of bits may precede
       • Fixing the length of the document signature is possible
       • The length can be set to that of the longest document in the collection
       • For shorter documents, extra “0” bits can be added as padding.
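The two construction methods on slide 9 can be compared with a minimal sketch. It assumes a hypothetical word-signature function (triplet bytes read as an integer, hashed with a modulo), since the slides do not specify one. The names `superimposed` and `concatenated`, and the `max_words` parameter, are illustrative only.

```python
BITS = 12  # width of one word signature (assumed, matching the slide's example)


def word_signature(word, bits=BITS):
    """Hypothetical word signature: one bit per overlapping triplet, as an int."""
    padded = f"*{word.upper()}*"
    signature = 0
    for i in range(len(padded) - 2):
        value = int.from_bytes(padded[i:i + 3].encode("utf-8"), "big")
        signature |= 1 << (value % bits)
    return signature


def superimposed(words, bits=BITS):
    """Superimposed coding: OR all word signatures into one fixed-length string."""
    document = 0
    for word in words:
        document |= word_signature(word, bits)
    return format(document, f"0{bits}b")


def concatenated(words, max_words, bits=BITS):
    """Concatenation: join the word signatures, then pad with 0s to a fixed length."""
    if len(words) > max_words:
        raise ValueError("document is longer than the assumed longest document")
    parts = [format(word_signature(w, bits), f"0{bits}b") for w in words]
    return "".join(parts).ljust(max_words * bits, "0")


if __name__ == "__main__":
    docs = [["signature", "files"], ["document", "retrieval", "with", "signatures"]]
    longest = max(len(d) for d in docs)
    for d in docs:
        print(superimposed(d), concatenated(d, longest))
```

Superimposed coding always yields a signature of `BITS` bits regardless of document length. Concatenation grows with the number of words, which is why the slide suggests fixing the length to that of the longest document and padding shorter ones with zeros.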
  10. Example of signature file
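As an illustration of how a small signature file might be built and queried, the sketch below stores one superimposed signature per document and uses a query signature as a quick filter. Everything here is an assumption made for the example: the sample documents, the 64-bit width, the triplet hash, and the function names. Because different words can set the same bits, a signature match only marks a candidate, so the sketch re-checks each candidate against the text.

```python
BITS = 64  # assumed width; wider signatures produce fewer false matches


def word_signature(word, bits=BITS):
    """Hypothetical word signature: one bit per overlapping triplet, as an int."""
    padded = f"*{word.upper()}*"
    signature = 0
    for i in range(len(padded) - 2):
        value = int.from_bytes(padded[i:i + 3].encode("utf-8"), "big")
        signature |= 1 << (value % bits)
    return signature


def text_signature(text, bits=BITS):
    """Superimpose (OR) the signatures of all words; used for documents and queries."""
    signature = 0
    for word in text.split():
        signature |= word_signature(word, bits)
    return signature


def build_signature_file(docs):
    """The 'signature file': one signature per document id."""
    return {doc_id: text_signature(text) for doc_id, text in docs.items()}


def candidates(query, signature_file):
    """Documents whose signature contains every bit of the query signature."""
    q = text_signature(query)
    return [doc_id for doc_id, sig in signature_file.items() if (sig & q) == q]


def search(query, docs, signature_file):
    """Filter by signature, then verify against the text to drop false matches."""
    wanted = {w.upper() for w in query.split()}
    return [doc_id for doc_id in candidates(query, signature_file)
            if wanted <= {w.upper() for w in docs[doc_id].split()}]


if __name__ == "__main__":
    docs = {
        "d1": "signature files support document retrieval",
        "d2": "inverted files are popular for information retrieval",
        "d3": "a hash function maps triplets to bit positions",
    }
    signature_file = build_signature_file(docs)
    print(search("document retrieval", docs, signature_file))
```

The filter never misses a true match, because every bit set by a query word is also set in any document that contains that word. It can, however, return extra candidates, which the verification step removes.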
  11. Which is better: an inverted file or a signature file?
      • Inverted files:
        • Accurate
        • Easy to maintain
        • Slow retrieval
      • Inverted files are the most popular storage structure for information retrieval.
