This document provides an outline for a course on Information Storage and Retrieval. It includes information on the course code, credits, target group, instructor contact details, course description and objectives. The course syllabus outlines 8 chapters covering topics like introduction to information retrieval systems, text operations and indexing, retrieval models and evaluation, query languages and operations, and current issues in IR. Student assessment will include assignments, tests, exams and a project. Reference books for the course are also listed.
Chapter 1: Introduction to Information Storage and Retrievalcaptainmactavish1996
Course material for 3rd year Information Technology students. Information Storage and Retrieval Course. Chapter 1: Introduction to Information storage and retrieval
Chapter 1: Introduction to Information Storage and Retrievalcaptainmactavish1996
Course material for 3rd year Information Technology students. Information Storage and Retrieval Course. Chapter 1: Introduction to Information storage and retrieval
This chapter introduces the notion of Information Retrieval (IR). it discusses after a survey of classification of various IR systems and major components of an IR system, the notion of Boolean Retrieval model and Invertex Index and extended Boolean are presented.
This slides are for a presentation at the 2009 IEEE/WIC/ACM International Conference on Web Intelligence. The major emphasis to this paper is concentrating on how to provide more personalized search support for a specific user considering his/her historical interests or recent interests. Cognitive memory retention like models are proposed and implemented in this system. Other supporting functionalities, such as domain distribution support, etc. are briefly mentioned. The whole paper can be downloaded from http://www.iwici.org/~yizeng/papers/WI2009-camera-ready.pdf
This chapter introduces the notion of Information Retrieval (IR). it discusses after a survey of classification of various IR systems and major components of an IR system, the notion of Boolean Retrieval model and Invertex Index and extended Boolean are presented.
This slides are for a presentation at the 2009 IEEE/WIC/ACM International Conference on Web Intelligence. The major emphasis to this paper is concentrating on how to provide more personalized search support for a specific user considering his/her historical interests or recent interests. Cognitive memory retention like models are proposed and implemented in this system. Other supporting functionalities, such as domain distribution support, etc. are briefly mentioned. The whole paper can be downloaded from http://www.iwici.org/~yizeng/papers/WI2009-camera-ready.pdf
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
How to Create Map Views in the Odoo 17 ERPCeline George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
This is a presentation by Dada Robert in a Your Skill Boost masterclass organised by the Excellence Foundation for South Sudan (EFSS) on Saturday, the 25th and Sunday, the 26th of May 2024.
He discussed the concept of quality improvement, emphasizing its applicability to various aspects of life, including personal, project, and program improvements. He defined quality as doing the right thing at the right time in the right way to achieve the best possible results and discussed the concept of the "gap" between what we know and what we do, and how this gap represents the areas we need to improve. He explained the scientific approach to quality improvement, which involves systematic performance analysis, testing and learning, and implementing change ideas. He also highlighted the importance of client focus and a team approach to quality improvement.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
2. Course Outline
Course Title:
Course Code:
Information Storage and Retrieval
ITec3081
ECTS Credits (CP): 5
Target Group: B.Sc. 3rd year
Information Technology Students (Regular
Program)
Year /Semester: Year: III, Semester: II
Status of the Course: Core
Contact Hours (per
week)
Section A S
e
c
t
i
Instructor: Habtamu Abate (M.Sc.)
Email: habate999@gmail.com
2
3. Course Outline
Course Description :
This course will cover introductory concepts of Information Storage
and Retrieval; automatic text operation including automatic indexing;
data and file structure for information retrieval; retrieval models;
evaluation of information retrieval systems and techniques for
enhancing retrieval effectiveness; query languages, query operations,
string manipulation and search algorithms; and Current Issues in IR
etc.
Course Objective
At the end of the course students will be able to:
Understand the various Information Retrieval Systems and
processes
Know the retrieval model and evaluation of Information Retrieval
Systems
Understand the processes of information storage and retrieval
Design ,develop and evaluate information retrieval models
Understand evaluation issues in IR
Understand current issues in IR 3
4. Course Syllabus
Chapter One
Introduction to ISR
IR and IR systems
Data versus information retrieval
IR and the retrieval process
Basic structure of an IR system
Chapter Two
Text/Document Operations and
Automatic Indexing
Index term selection (Luhn’s selection and
Zipf’s law in IR)
Document pre-processing (Lexical
analysis, Stop word Elimination, stemming)
Term extraction (Term weighting and
similarity measures)
Chapter Three
Indexing Structures
Inverted files
Tries, Suffix Trees and Suffix Arrays
Signature files
4
5. Course Syllabus
Chapter Four
IR Models
Introduction of IR Models
Chapter Five
Retrieval
Evaluation
Chapter Six
Query Languages
Chapter Seven
Query Operations
Chapter Eight
Current Issues in IR
Boolean model
Vector space model
Probabilistic model
Evaluation of IR systems
Relevance judgment
Performance measures (Recall, Precision, etc.)
Keyword-based queries
Pattern matching
Structural queries
Relevance feedback
Query expansion
Research in IR (Multimedia Retrieval, Web
Retrieval, Question answering. etc.)
5
6. Assessment
Assignments=10% ,Test=10% / Lab Exam=10%, Project work= 20 % ; Mid
Exam=20% ; Final examination= 40%
Text Book
Ricardo A. Baeza-Yates, Berthier Ribeiro-Neto, Modern Information
Retrieval, ACM Press.,2008.
Other Reference Books:
Salton, G. and McGill, M. J. Introduction to Modern Information Retrieval,
McGraw-Hill Co., 1983.
Robert R. Korfhage, Information Storage and Retrieval, John Wiley and
Sons, 1997.
Information Retrieval: Data Structures and Algorithms by W. B. Frakes
and R. Baeza-Yates (Eds.) (Prentice-Hall) 1992, ISBN 0-13-463837-9.
Spärck Jones, K. and Willett, P. (eds.). Readings in information retrieval.
San Francisco: Morgan Kaufmann, 1997.
6
7. Chapter 1: Introduction to Information
Storage and retrieval
Outline
• IR and IR systems
• Data versus information retrieval
• IR and the retrieval process
• Basic structure of an IR
7
8. Information Retrieval
Information retrieval (IR) is the process of
finding material (usually documents) of an
unstructured nature (usually text) that satisfies
an information need from within large collections
(usually stored on computers).
Information is organized into (a large number of)
documents
Large collections of documents from various
sources: news articles, research papers, books,
digital libraries, Web pages, etc.
Example: Web Search Engines like Google claim to
index over 1 Trillion pages 8
9. Information Retrieval
Document: Any object that can be
stored and can be retrieved
• Information Need: What kind of
information you need to find
– what you have in mind
• Query: Statement of an information
need or representation of an
information need.
• The user has an information need (IN)
in his/her mind.
The Retrieval System can’t
understand the IN directly.
• IN has to be abstracted into a form
that matches the information system
• This abstraction is a query
for
Examples
Good textbook
Information retrieval
9
10. General Goal of Information Retrieval
To help users find useful information based
on their information needs (with a minimum
effort) despite:
Increasing complexity of Information
Changing needs of user
Provide immediate random access to the
document collection.
10
11. Information Retrieval Systems?
• Document (Web page)
retrieval in response to a
query
– Quite effective (at
some things)
– Commercially
successful (some of
them)
• But what goes on behind
the scenes?
– How do they work?
– What happens beyond
the Web?
11
Web search systems
Lycos, Excite, Yahoo, Google, Live,
Northern Light, Teoma, HotBot, Baidu,
…
12. Examples of IR systems
Conventional (library catalog): Search by keyword,
title, author, etc.
Text-based (Lexis-Nexis, Google, FAST): Search
by keywords. Limited search using queries in natural
language.
Multimedia (QBIC, WebSeek, SaFe): Search b
y
visual appearance (shapes, colors,… ).
Question answering systems (AskJeeves,
Answerbus): Search in (restricted) natural language
Other:
Cross language information retrieval,
Music retrieval
12
13. Information Retrieval vs. Data Retrieval
Emphasis of IR is on the retrieval of information, rather
than on the retrieval of data
■ Data retrieval
Consists mainly of determining which documents contain a
set of keywords in the user query (which is not enough to
satisfy the user information need)
Aims at retrieving all objects that satisfy well defined
semantics
a single erroneous object among a thousand retrieved
objects implies failure
■ Information retrieval
Is concerned with retrieving information about a subject or
topic than retrieving data which satisfies a given query
semantics is frequently loose: the retrieved objects might
be inaccurate
small errors are tolerated
13
14. Information Retrieval vs. Data Retrieval
• Example of data retrieval system is a relational database
Data Retrieval Info
Retrieval
Data organization Structured Unstructured
Fields Clear Semantics
(ID, Name, age,…)
No fields (other
than text)
Query Language Artificial (defined,
SQL)
Free text (“natural
language”), Boolean
Matching Exact (results are
always “correct”)
Partial match, best match
Query specification Complete Incomplete
Items wanted Matching Relevant
Accuracy 100% < 50%
Error response Sensitive Insensitive
14
15. Basic Concepts in Information
Retrieval:
(i) User Task and (ii) Logical View of documents
The User Task:
two user task – retrieval and browsing
Retrieval
Browsing
DB
USER
15
16. The User Task
Retrieval
It is the process of retrieving information whereby the
main objective is clearly defined from the onset of
searching process.
The user of a retrieval system has to translate his
information need into a query in the language provided
by the system.
In this context (i.e. by specifying a set of words), the
user searches for useful information executing a
retrieval task
English Language Statement :
I want a book by Hadis Alemayehu titled Fikir Esk
Mekaber
16
17. Browsing
It is the process of retrieving information, whereby
the main objective is not clearly defined from the
beginning and whose purpose might change during the
interaction with the system.
E.g. User might search for documents about ‘car
racing’. Meanwhile S/he might find interesting
documents about ‘car manufacturers’. While reading
about car manufacturers in Addis, S/he might turn
his/her attention to a document providing ‘direction
to Addis’, and from this to documents which cover
‘Tourism in Ethiopia’.
In this context, user is said to be browsing in the
collection and not searching, since a user may has an
interest glancing around
17
18. Logical View of Documents
■ Documents in a collection are frequently represented by a
set of index terms or keywords
■ Such keywords are mostly extracted directly from the
text of the document
■ These representative keywords provide a logical view of
the document
■ Document representation viewed as a continuum, in which
logical view of documents might shift from full text to
index terms
Tokenization stemming Indexing
Docs stop words
Full
text
Index terms
18
19. Logical view of documents
• If full text :
– Each word in the text is a keyword
– Most complex form
– Expensive
• If full text is too large, the set of
representative keywords can be reduced
through transformation process called text
operation
• It reduce the complexity of the document
representation and allow moving the logical view
from that of a full text to a set of index terms
19
20. Structure of an IR System
An Information Retrieval System serves as a bridge between
the world of authors and the world of readers/users,
That is, writers present a set of ideas in a document using a set
of concepts. Then Users seek the IR system for relevant
documents that satisfy their information need.
Black box
User Documents
The black box is the information retrieval system.
To be effective in its attempt to satisfy information need of users,
the IR system must ‘interpret’ the contents of documents in a
collection and rank them according to their degree of relevance to the
user query.
Thus the notion of relevance is at the centre of IR
The primary goal of an IR system is to retrieve all the documents
which are relevant to a user query while retrieving as few non-
relevant documents as possible
20
21. Typical IR Task
Given:
A corpus of textual natural-language
documents.
A user query in the form of a textual
string.
Find:
A ranked set of documents that are
relevant to the query.
21
22. Typical IR System Architecture
IR
System
Query
String
Document
corpus
Ranked
Documents
1. Doc1
2. Doc2
3. Doc3
.
.
22
24. What is Information Retrieval ?
• A good formal definition of information retrieval is given
in Baeze-Yates & Riberio-Neto (1990, p1)
“Information retrieval deals with representation,
storage, organization of, and access to information
items. The organization and access of information items
should provide the user with easy access to the
information in which S/he is interested”
• The definition incorporates all important features of a
good information retrieval system
– Representation
– Storage
– Organization
– Access
• The focus is on the user information need
24
26. • It is necessary to define the text database before
any of the retrieval processes are initiated
• This is usually done by the manager of the database
and includes specifying the following
– The documents to be used
– The operations to be performed on the text
– The text model to be used (the text structure
and what elements can be retrieved)
• The text operations transform the original
documents and the information needs and generate a
logical view of them
The Retrieval Process
26
27. • Once the logical view of the documents is defined, the
database module builds an index of the text
– An index is a critical data structure
– It allows fast searching over large volumes of data
• Different index structures might be used , but the most
popular one is the inverted file.
• Given that the document database is indexed, the
retrieval process can be initiated
The user first specifies a user need which is then parsed
and transformed by the same text operation applied to
the text
Next the query operations is applied before the actual
query, which provides a system representation for the
user need, is generated
The Retrieval Process …
27
28. • The query is then processed to obtain the retrieved documents
Before the retrieved documents are sent to the user, the retrieved
documents are ranked according to the likelihood of relevance
The user then examines the set of ranked documents in the
search for useful information. Two choices for the user:
(i) reformulate query, run on entire collection or
(ii) reformulate query, run on result set
At this point, s/he might pinpoint a subset of the documents
seen as definitely of interest and initiate a user feedback cycle
In such a cycle, the system uses the documents selected by
the user to change the query formulation.
Hopefully, this modified query is a better representation of
the real user need
The Retrieval Process …
28
29. User
Interface
Text Operations
Query Language
& Operations Indexing
Searching
Ranking
Index
Text
Query
User
need
User
feedback
Ranked docs
Retrieved docs
Logical view
logical view
Inverted file
DB
manager
Module
Text
Database
Text
Detail view of the Retrieval
Process
29
30. Subsystems of an IR system
• The two subsystems of an IR system:
–Searching: is an online process of finding relevant
documents in the index list as per users query
–Indexing: is an offline process of organizing
documents using keywords extracted from the
collection
• Indexing and searching: are unavoidably connected
–you cannot search what was not first indexed in
some manner or other
–indexing of documents or objects is done in
order to be searchable
–there are many ways to do indexing
–to index one needs an indexing language
–there are many indexing languages
–even taking every word in a document is an indexing language
• Knowing searching is knowing indexing 30
32. Searching Subsystem
Index
Stemming & Normalize
stemmed terms
Stop list
non-stoplist
tokens
Similarity
Measure
ranking
Index terms
query parse query
query tokens
ranked
document
set
relevant
document set
Query Term weighting
terms
32
33. Issues that arise in IR
Text representation
what makes a “good” representation?
how is a representation generated from text?
what are retrievable objects and how are they
organized?
information needs representation
what is an appropriate query language?
how can interactive query formulation and
refinement be supported?
Comparing representations (to identify relevant
documents)
What weighting scheme and similarity measure to
be used?
what is a “good” model of retrieval?
Evaluating effectiveness of retrieval
what are good metrics?
what constitutes a good experimental test bed?
33
34. Focus in IR System Design
Our focus during IR system design is:
• In improving performance effectiveness of the system
–Effectiveness of the system is measured in terms of
precision, recall, …
–Stemming, stop words, weighting schemes, matching
algorithms
• In improving performance efficiency
–The concern here is storage space usage, access time,
searching time, data transfer time …
–Concern regarding space – time tradeoffs !!
–Use Compression techniques, data/file structures, etc.
34