134 • Ellen Riloff and Lee Hollaar
1971] is a well-known method for automatic indexing that views each document
and query as a vector in an N-dimensional space, where N is the number of
relevant terms in the database. The
query vector is compared to all of the
document vectors using a similarity metric. Another retrieval model for automatic
indexing uses probability estimates to determine whether a document satisfies a
user’s query. For example, Bayesian inference networks have been used to compute the belief associated with a query for
each document in a database.
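As a rough illustration of the vector comparison described above, the following sketch ranks documents against a query by cosine similarity. The raw term counts, the whitespace tokenizer, and the function names (`term_vector`, `cosine_similarity`, `rank_documents`) are assumptions made for this example; the text does not prescribe a particular weighting scheme or similarity metric.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Map a text to raw term counts, one dimension per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return (score, document index) pairs, highest score first."""
    # Assumption: the vocabulary is simply every distinct word in the collection.
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    query_vec = term_vector(query, vocabulary)
    scored = [
        (cosine_similarity(query_vec, term_vector(doc, vocabulary)), index)
        for index, doc in enumerate(documents)
    ]
    # Ties are broken by document index so the ordering is deterministic.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

In practice, systems of this kind typically replace raw counts with weighted terms (for example, term frequency combined with inverse document frequency), but the comparison step has the same shape.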
Relevance feedback techniques can
improve performance by asking the user
for feedback about the retrieved texts
[Salton 1989; Van Rijsbergen 1979]. The
user labels a subset of the retrieved
texts as relevant, and this information
is fed back into the system to modify the
original query, usually by adding new
terms or by changing the weights of the
original query terms. Relevance feedback has consistently been shown to
improve the performance of IR systems.
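The query modification step can be sketched as follows. This is a simplified, positive-feedback-only update in the style of the Rocchio formula, which is one common formulation and is not named in the text above. The function name `rocchio_update` and the parameters `alpha` and `beta` (with their default values) are assumptions chosen for illustration.

```python
def rocchio_update(query, relevant_docs, alpha=1.0, beta=0.75):
    """Return a modified query given documents the user marked as relevant.

    query: dict mapping term -> weight.
    relevant_docs: list of dicts mapping term -> weight.
    alpha scales the original query; beta scales the mean relevant-document vector.
    """
    # Rescale the weights of the original query terms.
    updated = {term: alpha * weight for term, weight in query.items()}
    if not relevant_docs:
        return updated
    # Add the averaged weights from the relevant documents. Terms absent from
    # the original query enter as new terms; existing terms are reweighted.
    for doc in relevant_docs:
        for term, weight in doc.items():
            updated[term] = updated.get(term, 0.0) + beta * weight / len(relevant_docs)
    return updated
```

Both effects mentioned above appear here: a term found only in the relevant documents is added to the query, and a term already in the query has its weight increased.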
Experiments with richer text representations have also been conducted using natural-language processing (NLP)
techniques. Syntactic approaches have
been used to generate more complex
indexing terms consisting of phrases
and head-modifier structures. Knowledge-based NLP systems have been
used to generate conceptual meaning
representations of queries and documents. Information extraction techniques [Lehnert and Sundheim 1991]
have also been shown to be effective for
text classification problems, and represent a compromise between word-based
techniques and in-depth natural-language processing.
The future holds great promise for
integrating information-retrieval techniques with natural-language processing systems. The strengths of these
methodologies are largely complementary. IR systems use shallow text representations, which allows them to process large amounts of text quickly and
efficiently. But the accuracy of these
systems often suffers because of a lack
of semantic analysis, especially for complex information requests. Natural-language processing systems, on the other
hand, usually perform conceptual analyses, which allows them to produce
richer meanings and representations.
However, NLP techniques are more
computationally expensive and therefore are more difficult to scale up to
large text collections.
The information-retrieval community is
facing new challenges posed by larger
and more heterogeneous text databases,
which have led to an explosion of new
approaches and methodologies. As longer
texts become available on-line, new approaches are needed to process texts that
discuss multiple topics. A variety of techniques for subtopic identification and passage-based retrieval are actively being explored. Another area of active research is
intelligent information retrieval, which
draws upon techniques from artificial intelligence to generate richer text representations. Natural-language processing
methods (such as information extraction),
case-based reasoning techniques, and machine learning algorithms are all being
applied to information retrieval tasks in
the hope of building more effective retrieval systems (for example, see ACM
[1995]). Intelligent information retrieval
is an exciting new direction for IR research.
REFERENCES
ACM. 1995. Proceedings of the 18th Annual
International ACM SIGIR Conference on
Research and Development in Information Retrieval. ACM, New York.
BELKIN, N. AND CROFT, W. B. 1992. Information
filtering and information retrieval: Two sides
of the same coin? Commun. ACM 35, 12,
29–38.
FRAKES, W. B. AND BAEZA-YATES, R., EDS.
1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ.
HARMAN, D., ED. 1994. The Second Text REtrieval Conference (TREC-2). National Institute of Standards and Technology Special
Publication 500-215, Gaithersburg, MD.
LEHNERT, W. G. AND SUNDHEIM, B. 1991. A performance
evaluation of text analysis technologies. AI Mag. 12, 3, 81–94.
SALTON, G., ED. 1971. The SMART Retrieval
System: Experiments in Automatic Document
Processing. Prentice-Hall, Englewood Cliffs,
NJ.
Text Databases and Information Retrieval • 135
SALTON, G. 1989. Automatic Text Processing:
The Transformation, Analysis, and Retrieval
of Information by Computer. Addison-Wesley,
Reading, MA.
VAN RIJSBERGEN, C. J. 1979. Information Retrieval (2nd Ed.). Butterworths, London.
ACM Computing Surveys, Vol. 28, No. 1, March 1996