This document describes a file search engine that indexes files on a system to allow users to search for files by name or content. It discusses different indexing techniques considered for the project, including sparse, dense, hash, and tree indexing, and explains why multi-level indexing was implemented. Multi-level indexing uses three databases - the hard disk drive, a mirror database, and a cache database. The document also outlines some potential future work, such as adding specific search for music/video files and enabling searches across systems on a LAN.
2. INTRODUCTION
In systems with large file storage, remembering the path of every file is a challenging
job; often the user cannot recall the exact path where a particular file was stored. This
project provides a user-friendly window in which the user enters the name of a file and
gets its exact path as output.
The proposed system overcomes a disadvantage of existing search tools: if the user does
not know where a file is stored, its name (or even a substring of the name), its modified
date, or its size, those tools are of no help. This tool can still find the file from a
single attribute supplied by the user: some text contained in the file. That is, the tool
also searches files by reading them, and a file whose contents match the requested text
is shown in the results.
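The content search described above can be sketched as follows. This is only a minimal illustration, not the project's implementation: the function name `search_by_content`, its parameters, and the choice to treat every file as UTF-8 text are assumptions made for the example.

```python
import os

def search_by_content(root, text, encoding="utf-8"):
    """Return paths of files under `root` that contain `text` (hypothetical helper)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # Read line by line so large files are not loaded whole.
                # Text that spans a line break is not matched by this sketch.
                with open(path, "r", encoding=encoding, errors="ignore") as handle:
                    if any(text in line for line in handle):
                        matches.append(path)
            except OSError:
                # Assumption: unreadable files are skipped silently.
                continue
    return matches
```

Calling `search_by_content(folder, "budget")` would return the path of every readable file under `folder` that has the word "budget" on some line.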
4. INDEXING USED IN OUR PROJECT
Sparse indexing
Dense indexing
Hash indexing
Tree indexing
Multi-level indexing (implemented)
5. SPARSE INDEXING
In a sparse index, the data is first sorted, and the index then holds a set of anchor
pointers, each keyed on the initial characters of the indexed attribute.
For example, if we search for the file name “chetan”, the files starting with “chetan” are
stored together in sorted order, much as in a dictionary.
Once the first matching file is fetched, all subsequent matches can be accessed simply by
incrementing the address by one.
This is efficient if and only if the initial characters of a file name match the requested
name. If the keyword “chetan” appears in the middle of a file name, that file is indexed
under a different value, because its name begins with different characters.
Hence, a sparse index is useful only when the requested name matches the indexed value.
We do not expect the user to enter the beginning of the required file name; he or she
should be free to enter any substring of it.
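The limitation can be seen in a small sketch. It is a simplified illustration under stated assumptions, not the project's code: the anchors are keyed on the first character only, file names are assumed to be non-empty, and the sample names and the helpers `build_anchors` and `prefix_lookup` are hypothetical.

```python
def build_anchors(sorted_names):
    """Map each initial character to the position of its first entry."""
    anchors = {}
    for pos, name in enumerate(sorted_names):
        anchors.setdefault(name[0], pos)  # keep only the first position per initial
    return anchors

def prefix_lookup(sorted_names, anchors, prefix):
    """Jump to the anchor for the prefix's initial, then step forward one entry at a time."""
    start = anchors.get(prefix[0])
    if start is None:
        return []
    matches = []
    for name in sorted_names[start:]:
        if name[0] != prefix[0]:
            break  # left the block of names sharing this initial
        if name.startswith(prefix):
            matches.append(name)
    return matches

# Hypothetical sample data, sorted like a dictionary.
NAMES = sorted(["chetan_notes.txt", "chetan_resume.doc",
                "report_chetan.pdf", "budget.xls"])
ANCHORS = build_anchors(NAMES)

print(prefix_lookup(NAMES, ANCHORS, "chetan"))
# ['chetan_notes.txt', 'chetan_resume.doc']
```

The file `report_chetan.pdf` contains the keyword but sits under the anchor for "r", so the lookup never reaches it.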
6. DENSE INDEXING
We use a recursive algorithm to visit each and every file on the hard disk.
If the files on the hard disk are unsorted, fetching them in this way enters them into
the database in unsorted order.
Even if the files within each folder are sorted, so that the recursive algorithm returns
each folder's files in sorted order, the per-folder outputs are combined into a single
list, and the combined list is not sorted as a whole.
Because the resulting entries in the database are unsorted, it makes sense to use a
dense index, which keeps an entry for every file, to fetch the required files.
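A minimal sketch of this idea, assuming the database can be modelled as an in-memory list: a recursive crawl records one entry per file in whatever order the folders are visited, and a search checks every entry for the requested substring. The names `crawl` and `search` are hypothetical, and folders that cannot be read are not handled here.

```python
import os

def crawl(root, index=None):
    """Recursively collect one (file name, full path) entry for every file under root."""
    if index is None:
        index = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                crawl(entry.path, index)  # descend into the sub-folder
            elif entry.is_file(follow_symlinks=False):
                index.append((entry.name, entry.path))
    return index

def search(index, keyword):
    """Scan every entry; the keyword may appear anywhere in the file name."""
    keyword = keyword.lower()
    return [path for name, path in index if keyword in name.lower()]
```

Because every file has its own entry, `search(crawl(folder), "chetan")` finds the keyword wherever it occurs in a name, at the cost of examining the whole list.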
7. HASH INDEXING
If we want to fetch a file by specifying its exact file name, we can
use a hash index. However, very few users will know the exact file
name.
Hence, we are not using hash indexing.
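As a rough illustration of why exact names are required, the sketch below models a hash index as a Python dictionary keyed on the full file name. The helper `build_hash_index`, the sample names, and the sample paths are all hypothetical.

```python
from collections import defaultdict

def build_hash_index(entries):
    """Map each exact file name to the list of paths that carry it."""
    index = defaultdict(list)
    for name, path in entries:
        index[name].append(path)
    return index

# Hypothetical sample entries: (file name, full path).
ENTRIES = [
    ("chetan_resume.doc", "/docs/chetan_resume.doc"),
    ("report_chetan.pdf", "/work/report_chetan.pdf"),
]
INDEX = build_hash_index(ENTRIES)

print(INDEX.get("chetan_resume.doc", []))  # exact name: ['/docs/chetan_resume.doc']
print(INDEX.get("chetan", []))             # substring only: []
```

The second lookup returns nothing because the key is hashed as a whole; a partial name produces a different hash and never reaches the stored entry.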
TREE INDEXING
Tree indexing is suited to queries whose results are expected to fall within a specific
range. In our file search engine this range concept is not applicable.
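To make the range concept concrete, the sketch below uses a sorted list with binary search as a stand-in for a tree index (an assumption made for brevity; a real tree index such as a B-tree would store the keys differently). The helper `range_query` and the sample names are hypothetical.

```python
import bisect

def range_query(sorted_keys, low, high):
    """Return every key k with low <= k <= high from a sorted list."""
    first = bisect.bisect_left(sorted_keys, low)
    last = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[first:last]

# Hypothetical sample data.
NAMES = sorted(["chetan_notes.txt", "chetan_resume.doc",
                "report_chetan.pdf", "budget.xls"])

print(range_query(NAMES, "c", "d"))
# ['chetan_notes.txt', 'chetan_resume.doc']
```

A range such as "c" to "d" selects names by where they fall in sorted order, so `report_chetan.pdf` is excluded even though it contains the keyword; a substring search cannot be phrased as a single range.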
12. FUTURE WORK
The software can be equipped with a specific search facility for music
and videos, which are often the most frequently searched files.
The software can also be extended to systems connected over a LAN, so
that files can be searched from any other computer.