2. Information retrieval is the activity of obtaining information
resources relevant to an information need from a collection of
information resources.
An information retrieval process begins when a user enters a query
into the system. Queries are formal statements of information
needs.
User queries are matched against the information stored in the
database. Depending on the application, the data objects may be,
for example, text documents, images, audio, mind maps or videos.
Most IR systems compute a numeric score on how well each object in
the database matches the query, and rank the objects according to
this value.
The top ranking objects are then shown to the user. The process
may then be iterated if the user wishes to refine the query.
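The score-then-rank step described above can be sketched as follows. This is only a minimal illustration: the scoring rule (counting occurrences of query terms) and the names `score`, `rank` and `collection` are assumptions made for the example, not how any particular IR system works.

```python
from collections import Counter


def score(query: str, document: str) -> int:
    """Numeric score: how often the query's distinct terms occur in the document."""
    doc_counts = Counter(document.lower().split())
    return sum(doc_counts[term] for term in set(query.lower().split()))


def rank(query: str, collection: dict, top_k: int = 3) -> list:
    """Return the ids of the top-ranking objects, best score first."""
    scored = [(score(query, text), doc_id) for doc_id, text in collection.items()]
    # Drop objects that do not match at all, then sort by score (ties: by id).
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical toy collection, for illustration only.
collection = {
    "d1": "information retrieval ranks documents",
    "d2": "cooking with fresh herbs",
    "d3": "retrieval of information resources from an information collection",
}
results = rank("information retrieval", collection)
print(results)  # ['d3', 'd1']
```

Refining the query and calling `rank` again corresponds to the iteration mentioned above.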
3. Every online database, every search engine,
everything that is searched online is based, in
some way or another, on principles developed
in IR.
◦ IR is at the heart of searching used in systems such
as DIALOG, LexisNexis & others
Understanding the basics of IR is a
prerequisite for understanding how searching
of online systems works.
4. “Information retrieval embraces the intellectual
aspects of the description of information and
its specification for search, and also whatever
systems, techniques, or machines are
employed to carry out the operation.”
Calvin Mooers, 1951
Objective:
Provide the users with effective access to &
interaction with information resources.
5. 1. Document subsystem
a) Acquisition
b) Representation
c) File organization
2. User subsystem
a) Problem
b) Representation
c) Query
3. Searching/Retrieval subsystem
a) Matching
b) Retrieved objects
6.
7. Acquisition
(Document subsystem)
Selection of documents & other objects from
various web resources
Mostly text based documents
◦ full texts, titles, abstracts ...
◦ but also other objects:
🞄 data, statistics, images, maps, trademarks, sounds ...
The data are collected by a web crawler and
stored in a database.
8. Representation of documents, objects
(Document subsystem)
Indexing – many ways:
◦ free text terms (even in full texts)
◦ controlled vocabulary - thesaurus
◦ manual & automatic techniques
Abstracting; summarizing
Bibliographic description:
◦ author, title, sources, date…
◦ metadata
Classifying, clustering
Organizing in fields & limits
◦ Basic Index, Additional Index, Limits
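As a rough illustration of the difference between free-text and controlled-vocabulary indexing, the sketch below indexes one text both ways. The thesaurus entries and the function name `index_terms` are hypothetical; a real thesaurus is much larger and is usually maintained by hand.

```python
# Hypothetical thesaurus: maps free-text terms to preferred (controlled) terms.
THESAURUS = {
    "ir": "information retrieval",
    "searching": "information retrieval",
    "db": "databases",
}


def index_terms(text: str, thesaurus: dict) -> dict:
    """Represent a text by its free-text terms and by controlled-vocabulary terms."""
    free_text = sorted(set(text.lower().split()))
    controlled = sorted({thesaurus[t] for t in free_text if t in thesaurus})
    return {"free_text": free_text, "controlled": controlled}


representation = index_terms("Searching db systems", THESAURUS)
print(representation["free_text"])   # ['db', 'searching', 'systems']
print(representation["controlled"])  # ['databases', 'information retrieval']
```

Free-text indexing keeps every word the author used, while the controlled vocabulary collapses variants onto one preferred term.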
9. File organization
(Document subsystem)
Sequential
◦ record (document) by record
Inverted
◦ term by term; list of records under each term
Combination
◦ indexes inverted, documents sequential
When only citations are retrieved, separate
document files are needed
Large file approaches
◦ for efficient retrieval by computers
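A minimal sketch of the combination approach, assuming whitespace tokenization and small integer record ids (both are simplifications chosen for the example): the records stay in a sequential file, and an inverted index lists the records under each term.

```python
from collections import defaultdict


def build_inverted_index(records: dict) -> dict:
    """Term by term: build the sorted list of record ids under each term."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.lower().split():
            index[term].add(record_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Sequential file: record (document) by record. Hypothetical contents.
records = {
    1: "information retrieval systems",
    2: "database systems",
    3: "information need",
}
index = build_inverted_index(records)

print(index["information"])  # [1, 3]
# Look up the term in the inverted index, then fetch from the sequential file.
matches = [records[record_id] for record_id in index["systems"]]
print(matches)  # ['information retrieval systems', 'database systems']
```

The lookup touches only the list stored under the term, so the whole file does not have to be scanned for each query.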
10. Problem
(user subsystem)
Related to user's task, situation
◦ vary in specificity, clarity
Produces information need
◦ ultimate criterion for effectiveness of retrieval
🞄 how well was the need met?
Information need for the same problem may
change, evolve, shift during the IR process -
adjustment in searching
◦ often more than one search for same problem over
time
🞄 you will experience this in your term project
11. Representation
(User subsystem)
Converting a concept into a query.
What we search for.
Query terms are stemmed and spell-corrected using
a dictionary.
Focus toward a good result
Subject to feedback changes
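The stemming and dictionary-correction step might look like the sketch below. It is a deliberately crude illustration: the dictionary, the suffix list and the 0.9 similarity cutoff are assumptions made for the example, and production systems use proper stemmers and spell-checkers.

```python
import difflib

# Hypothetical dictionary of known terms.
DICTIONARY = ["information", "retrieval", "history", "search", "index"]


def correct(term: str) -> str:
    """Replace a term with its closest dictionary entry, if one is close enough."""
    matches = difflib.get_close_matches(term, DICTIONARY, n=1, cutoff=0.9)
    return matches[0] if matches else term


def stem(term: str) -> str:
    """Naive suffix stripping (not a real stemming algorithm)."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def represent(raw_query: str) -> list:
    """Turn what the user typed into corrected, stemmed query terms."""
    return [stem(correct(term)) for term in raw_query.lower().split()]


terms = represent("Informaton retrieval searches")
print(terms)  # ['information', 'retrieval', 'search']
```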
12. Query - search statement
(user & system)
Translation into the system's requirements & limits
◦ start of human-computer interaction
🞄 query is the thing that goes into the computer
Selection of files, resources
Search strategy - selection of:
◦ search terms & logic
◦ possible fields, delimiters
◦ controlled & uncontrolled vocabulary
◦ variations in effectiveness tactics
Reiterations from feedback
◦ several feedback types: relevance feedback, magnitude
feedback ...
◦ query expansion & modification
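Query expansion can be sketched as adding related terms to the original search terms. The example below assumes a small hand-made thesaurus; `THESAURUS` and `expand_query` are hypothetical names, and real systems may instead derive the added terms from relevance feedback.

```python
# Hypothetical thesaurus of related terms.
THESAURUS = {
    "history": ["historical", "development"],
    "retrieval": ["search"],
}


def expand_query(terms: list, thesaurus: dict) -> list:
    """Add related terms after each original term, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded = expand_query(["history", "information", "retrieval"], THESAURUS)
print(expanded)
# ['history', 'historical', 'development', 'information', 'retrieval', 'search']
```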
13. A question is what the user asks, and what you
may then have elaborated
A query is what is asked of the computer to
match – what is put in
The question is transformed into a query
Question:
◦ I am interested in major historical developments
in the area of information retrieval.
Query
◦ history information retrieval (in Google)
14. Matching - searching
(Searching subsystem)
Process of matching, comparing
◦ search: what documents in the file match the query as
stated?
Various search algorithms:
◦ exact match - Boolean
🞄 still available in most, if not all systems
◦ best match - ranking by relevance
🞄 increasingly used e.g. on the web
◦ hybrids incorporating both
🞄 e.g. Target, Rank in DIALOG
Each has strengths, weaknesses
◦ no 'perfect' method exists
🞄 and probably never will
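To make the contrast between exact match and best match concrete, here is a small sketch. It assumes a Boolean AND over all query terms for exact match and simple term overlap as the relevance score for best match; both are simplifications, and the document texts are hypothetical.

```python
def boolean_and(query_terms: list, docs: dict) -> list:
    """Exact match: keep only documents that contain every query term."""
    wanted = set(query_terms)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if wanted <= set(text.lower().split())
    )


def best_match(query_terms: list, docs: dict) -> list:
    """Best match: rank documents by how many query terms they contain."""
    wanted = set(query_terms)
    scored = []
    for doc_id, text in docs.items():
        overlap = len(wanted & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    "a": "history of information retrieval",
    "b": "information retrieval evaluation",
    "c": "history of printing",
}
query = ["history", "information", "retrieval"]

exact = boolean_and(query, docs)
ranked = best_match(query, docs)
print(exact)   # ['a']
print(ranked)  # ['a', 'b', 'c']
```

Exact match returns an unordered set that either satisfies the query or not, while best match also surfaces partial matches, in ranked order.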
15. Retrieved documents - from system
to user (IR subsystem)
What a user (or you) sees, gets,
judges – can be specified
Various orders of output:
◦ Last In First Out (LIFO); sorted
◦ ranked by relevance
◦ ranked by other characteristics
Various forms of output
When citations only: possible links to
document delivery
Base for relevance, utility evaluation by users
Relevance feedback
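The different output orders amount to sorting the same retrieved set by different keys. In the sketch below, the `Hit` fields and their values are hypothetical: `added` stands in for the order in which records entered the file, and `relevance` for a score produced by the matching step.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    added: int        # hypothetical sequence number; larger means added later
    relevance: float  # hypothetical score from the matching step


hits = [Hit("d1", 1, 0.4), Hit("d2", 2, 0.9), Hit("d3", 3, 0.1)]

# Last In First Out: the most recently added record is shown first.
lifo = [h.doc_id for h in sorted(hits, key=lambda h: h.added, reverse=True)]
# Ranked by relevance: the highest score is shown first.
by_relevance = [h.doc_id for h in sorted(hits, key=lambda h: h.relevance, reverse=True)]

print(lifo)          # ['d3', 'd2', 'd1']
print(by_relevance)  # ['d2', 'd1', 'd3']
```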
16. We described three parts: the Document subsystem,
the User subsystem and the Searching/Retrieval
subsystem.
There are many search engines, such as Google,
Bing and Yahoo, but they do not publicly disclose
the full details of their retrieval methods.
There is a lot more to know about information
retrieval.