Research Report on Document Indexing-Nithish Kumar

Research Paper for CSCI 6370, Topics in Computer Science
Name: Sai Nithish Kumar Posani
SID : 20356909
Professor: Zhixiang Chen

Abstract:
The main aim of information retrieval is to return the exact response to a user's specific
query.
Existing model:
In the existing model, information retrieval works by analyzing the entire document to answer a
given query and extracting the terms related to that query. Indexing weight plays a major role
here: it is applied to all the terms, and at the end the response is provided to the user. The
existing model does not take context into consideration, so information cannot be retrieved
efficiently.
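The context-free weighting described above can be sketched roughly as follows. The report does not say how the existing model computes its indexing weight, so relative term frequency is used here purely as an assumption, and the function names `term_weights` and `score` are hypothetical.

```python
import re
from collections import Counter


def term_weights(document: str) -> dict:
    """Weight every term by its relative frequency in the whole document.

    Assumption: relative frequency stands in for the unspecified indexing weight.
    No distinction is made between content-carrying and background terms.
    """
    terms = re.findall(r"[a-z]+", document.lower())
    counts = Counter(terms)
    total = len(terms)
    return {term: count / total for term, count in counts.items()}


def score(document: str, query: str) -> float:
    """Sum the weights of the query terms that occur in the document."""
    weights = term_weights(document)
    query_terms = re.findall(r"[a-z]+", query.lower())
    return sum(weights.get(term, 0.0) for term in query_terms)
```

Because every term is weighted, frequent background words such as "and" receive weights just like topical words, which is the weakness the proposed model addresses.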
Proposed Model:
In this paper, to make information retrieval more efficient, the authors propose a
context-sensitive document indexing approach. Using a lexical resource, the approach separates
content-carrying terms from background terms. The indexing weight is then calculated for the
content-carrying terms. The sentences with the highest indexing weight are taken as the most
salient; these sentences are retrieved and used to summarize the document.
When a user enters a query, it is treated as a keyword, and the search is carried out on that
keyword.
The keyword is then matched against the summarized document. If it matches, the related
sentences are extracted using the indexing algorithm. Finally, these sentences are returned as
the response to the query. Following this flow, the information retrieval process completes
successfully.
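The flow above can be illustrated with a minimal sketch. It is not the paper's algorithm: the lexical resource is replaced by a small hypothetical list of background terms, the indexing weight is assumed to be a plain term count, and all names (`BACKGROUND_TERMS`, `summarize`, `answer`) are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the lexical resource: a tiny list of background terms.
BACKGROUND_TERMS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it", "this", "that"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(document):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def summarize(document, top_k=2):
    """Return the top_k sentences ranked by the weight of their content-carrying terms."""
    content_terms = [t for t in tokenize(document) if t not in BACKGROUND_TERMS]
    weights = Counter(content_terms)  # assumed indexing weight: raw term count

    def sentence_weight(sentence):
        distinct = set(tokenize(sentence)) - BACKGROUND_TERMS
        return sum(weights[t] for t in distinct)

    ranked = sorted(split_sentences(document), key=sentence_weight, reverse=True)
    return ranked[:top_k]


def answer(document, query, top_k=2):
    """Treat the query as keywords and return the summary sentences that contain any of them."""
    keywords = set(tokenize(query)) - BACKGROUND_TERMS
    return [s for s in summarize(document, top_k) if keywords & set(tokenize(s))]
```

In this sketch a query only matches sentences that made it into the summary, so a keyword that appears only in a low-weight sentence returns nothing.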
Overview:
Internet and computer usage has increased rapidly; even in villages, people now use computers or personal systems. Compared with two decades ago, the use of this technology has doubled, tripled, or quadrupled. The main use here is connecting to the internet, searching different web pages, and gathering information from them. The websites may be about entertainment, education, business, personal matters, social topics, history, politics, mechanics, industry, or anything else. Whatever we enter in the browser is treated as a query. Different client programs, known as browsers, are available, for example Google Chrome, Mozilla Firefox, Safari, and Torch.
Web search matters more now than ever: if we type something and press Enter, a response that takes 2 minutes is useless, so everyone needs a fast response from the server. A second issue is relevance: getting output that is unrelated to what you typed is a separate problem. For example, if you type "apple" looking for information about the company Apple, you may get results such as apple fruit, pineapple, or the advantages of eating apples. Whenever and wherever you search, you should get good results, and related content should be displayed. Achieving such results requires a very well-written program for the search engine. Here I explain the document indexing approach, which can deliver good results of the kind discussed above. This requires a good understanding of search engines: how the user is going to search, what he expects from the server, and what he shows the most interest in. There are different techniques for search engine development, such as document indexing, web crawling, keyword search, document clustering, and link-based ranking. The term "search" carries great importance in web terminology. Text mining is the process of extracting useful, high-quality information from a document.
The original document on a site contains a huge amount of information, and the user finds it hard to grasp its main theme. To overcome this difficulty, information is retrieved from a summarized document. Document summarization comes in different types: single-document summarization condenses a single document, while multi-document summarization condenses the content of one or more documents. The fundamental aim of information retrieval is to satisfy the information need of a user. Its general task is to retrieve the terms relevant to the user's queries within an acceptable response time. Its primary aim is to return the information sets that match the keywords of a query. Information retrieval mainly deals with the representation, storage, and organization of, and access to, information items. It is also used to reduce the problem called information overload, which refers to a person's difficulty in understanding an issue because of the presence of too much information. When the user enters a query, the information is retrieved: queries are executed, and results retrieved from the stored records, using SQL. The query is matched against the document, the query-related information is extracted, and the best match is displayed to the user as the response.
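As a rough illustration of SQL-based retrieval and best-match ranking, the sketch below uses Python's standard `sqlite3` module. The report names neither a database nor a schema, so the in-memory SQLite store, the `sentences` table, and the ranking by number of matched keywords are all assumptions.

```python
import sqlite3


def build_store(sentences):
    """Load sentences into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany("INSERT INTO sentences (body) VALUES (?)", [(s,) for s in sentences])
    conn.commit()
    return conn


def search(conn, query):
    """Return sentences containing at least one keyword, best match first."""
    keywords = query.lower().split()
    if not keywords:
        return []
    # Each LIKE comparison evaluates to 1 or 0 in SQLite; their sum counts matched keywords.
    score_expr = " + ".join("(lower(body) LIKE ?)" for _ in keywords)
    sql = (
        f"SELECT body FROM (SELECT id, body, {score_expr} AS score FROM sentences) "
        "WHERE score > 0 ORDER BY score DESC, id"
    )
    # Note: % and _ inside a keyword would act as LIKE wildcards in this simple sketch.
    patterns = [f"%{keyword}%" for keyword in keywords]
    return [row[0] for row in conn.execute(sql, patterns)]
```

Substring matching with `LIKE` is the simplest choice here; it also matches "apple" inside "pineapple", which is exactly the kind of irrelevant hit a context-aware index is meant to avoid.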
Advantages:
• Fewer commands need to be known to the user for a given level of output.
• Fewer clicks or keystrokes are required to carry out a given operation.
• Consistent behavior can be pre-programmed or altered by the user.
• Fewer choices appear on screen at one time (i.e., less "clutter").
• Sentence splitting: the content-carrying terms are used to convey the main idea of the content.
• Lexical association
• Context-based indexing approach
Disadvantages:
• Context-sensitive actions may be perceived as dumbing down the user interface, leaving the operator at a loss as to what to do when the computer decides to perform an unwanted action.
• Non-automatic actions may be hidden or covered by the context-sensitive interface, increasing user workload for operations the developers did not foresee.
Improvements:
The paper concentrates on document indexing, which is useful and raises the efficiency of searching web pages. In my view, however, concentrating on document indexing alone is not enough for web pages: a single engine should implement two or three techniques together, such as document indexing, page ranking, crawling, and page clustering. All of these affect how information is searched and retrieved. Because information retrieval plays a major role in every one of these aspects, we need to address each angle to get output from the server as quickly as possible.
Information search and retrieval is a large process. To achieve it, we need to develop an effective application that uses techniques such as document indexing, page ranking, and clustering. Among these, the document index plays a vital role in search: instead of scanning hundreds of thousands of documents, the engine goes directly to the relevant index entry and returns the output. Our main concern here is indexing, which means storing an index to optimize speed and performance in finding the documents that correspond to the user's query.
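The idea of going straight to an index entry instead of scanning every document is commonly realized with an inverted index. The sketch below is a minimal version under simple assumptions (lowercased alphabetic tokens, all query terms must match); the names `build_index` and `lookup` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents containing every query term, without scanning any document text."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The index is built once; after that, each query costs a few dictionary lookups and set intersections rather than a pass over the whole collection.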
My conclusion is that the context-based indexing approach is well suited to query retrieval, working mainly from the source document. Rather than searching every page on the server, locating content through an index is the better technique. It saves time and reduces the load on the server.
