NAME : DEEPALI RAIKAR
ROLL NO : 11150157
MSC.IT (PART – I)
SIGNATURE FILES
Typically, a “SIGNATURE FILE” is just a “BAG OF WORDS”. Signature files are a technique applied to “Document Retrieval”. The main idea behind signature files is to create a quick link to the documents that match the queries passed by the user. This is done by creating a signature for each document.
A signature is created as an “abstraction” of a document. Taken together, the signatures form a compressed version of the database. All signatures that represent the documents are kept in a file called the “SIGNATURE FILE”. The signatures are stored in the form of “HASH TABLES” to make retrieving the documents easy.
Characteristics of a signature file
  • Word-oriented index structure
  • Low overhead
  • Suitable for text collections that are not very large
  • Suitable for conventional databases
  • For most applications, inverted files outperform signature files.
There are various types of signatures, namely:
  • Word signatures: a fixed-length bit-string representation of a word
  • Document signatures
  • Query signatures
How word signatures are generated
  • Using “TRIPLETS” of the word.
  • Each word is divided into overlapping triplets of characters.
  • Each triplet is given some numeric value.
  • The number is used as the input to the hash function.
  • The hash function produces a number that represents the bit position of the triplet in the word signature.
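The steps above can be sketched in Python. This is a minimal illustration, not the method used on the slides: the names `triplets`, `numeric_value`, `bit_position` and `word_signature` are hypothetical, the single `*` padding on each side of the word is inferred from the “SIGNATURE” example, and the numeric value and hash function (a byte encoding followed by a modulo) are stand-ins, since the slides do not say which ones were used.

```python
def triplets(word, pad="*"):
    """Split a word into overlapping character triplets.

    Assumption: the word is padded with one marker on each side.
    """
    padded = pad + word + pad
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def numeric_value(triplet):
    """Stand-in numeric value: the triplet's bytes read as one integer."""
    return int.from_bytes(triplet.encode("utf-8"), "big")


def bit_position(value, width):
    """Stand-in hash function: map the value to a 0-based bit position."""
    return value % width


def word_signature(word, width=12):
    """Fixed-length bit string with one bit set per triplet."""
    bits = ["0"] * width
    for t in triplets(word):
        bits[bit_position(numeric_value(t), width)] = "1"
    return "".join(bits)


print(triplets("SIGNATURE"))
print(word_signature("SIGNATURE"))
```

Because the hash here is only a placeholder, the bit string it prints will generally differ from the one on the slide; only the overall shape (fixed length, at most one bit per triplet) carries over.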
Example of a word signature
111000111001 is a signature created for the word “SIGNATURE”.

Triplet:        RE*  *SI  SIG  IGN  GNA  NAT  ATU  TUR  URE
Numeric value:   12    3    7    3    2    9    1   12    8

Setting the bits at these positions (counted from the left, starting at 1) gives 111000111001, the final word signature generated using the hash function.
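The example can be checked mechanically. The sketch below assumes the nine values are 1-based bit positions counted from the left, an interpretation inferred from the figures in the example; `signature_from_positions` is a hypothetical helper name.

```python
def signature_from_positions(positions, width):
    """Build a bit string of the given width with a 1 at each listed position.

    Positions are 1-based and counted from the left; repeated positions
    simply set the same bit again.
    """
    bits = ["0"] * width
    for p in positions:
        bits[p - 1] = "1"
    return "".join(bits)


# Values listed for the triplets RE*, *SI, SIG, IGN, GNA, NAT, ATU, TUR, URE
slide_values = [12, 3, 7, 3, 2, 9, 1, 12, 8]
print(signature_from_positions(slide_values, 12))  # 111000111001
```

Two pairs of triplets share a position (3 and 12), so only seven of the twelve bits end up set.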
Document signature
Can be created using two methods:
  • Concatenation of word signatures
  • Superimposed coding
Characteristics of document signatures
  • The length can vary.
  • A fixed number of bits may precede
  • Fixing the length of the document signature is possible.
  • The length can be set to that of the longest document in the collection.
  • For shorter documents, extra “0” bits can be added.
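Both construction methods can be sketched in a few lines of Python. The function names and the second word signature below are hypothetical; appending the extra zeros at the end is an assumption, since the slides do not say where they go. The `may_contain` check shows how a query signature could be tested against a superimposed document signature; a match only means the document is a candidate, because different words can set the same bits.

```python
def concatenate(word_sigs):
    """Document signature as the word signatures placed end to end."""
    return "".join(word_sigs)


def pad_to(signature, length):
    """Extend a shorter signature with extra 0 bits (assumed: at the end)."""
    return signature.ljust(length, "0")


def superimpose(word_sigs):
    """Document signature as the bitwise OR of equal-length word signatures."""
    width = len(word_sigs[0])
    combined = 0
    for sig in word_sigs:
        combined |= int(sig, 2)
    return format(combined, "0{}b".format(width))


def may_contain(doc_sig, query_sig):
    """True if every bit set in the query is also set in the document."""
    d, q = int(doc_sig, 2), int(query_sig, 2)
    return (d & q) == q


sig_a = "111000111001"  # word signature from the "SIGNATURE" example
sig_b = "000100100011"  # made-up signature for a second word

print(concatenate([sig_a, sig_b]))       # 24 bits
print(pad_to(sig_a, 24))                 # padded to the longest document
print(superimpose([sig_a, sig_b]))       # 111100111011
print(may_contain(superimpose([sig_a, sig_b]), sig_a))  # True
```

Concatenation keeps every word signature intact but grows with the document, which is why padding to a fixed length comes up; superimposed coding keeps the length fixed at the cost of possible false matches.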
Example of signature file
Which is better: inverted file or signature file?
Inverted files
  • Accurate
  • Easy to maintain
  • Slow retrieval
Inverted files are the most popular storage structure for “INFORMATION RETRIEVAL”.