1. Ms. T. Primya
Assistant Professor
Department of Computer Science and Engineering
Dr. N. G. P. Institute of Technology
Coimbatore
2. Information: facts provided or learned about something or someone.
what is conveyed or represented by a particular arrangement
or sequence of things.
informing, telling, thing told, knowledge, items of knowledge,
news
knowledge communicated or received concerning a particular
fact or circumstance
3. Knowledge: knowing; familiarity gained by experience
person’s range of information
a theoretical or practical understanding of the sum of what is
known
5. Data
The raw material of information
Information
Data organized and presented in a particular manner
Knowledge
“Justified true belief”
Information that can be acted upon
Wisdom
Distilled and integrated knowledge
Demonstrative of high-level “understanding”
6. Data
98.6 °F, 99.5 °F, 100.3 °F, 101 °F, …
Information
Hourly body temperature: 98.6 °F, 99.5 °F, 100.3 °F, 101 °F, …
Knowledge
If you have a temperature above 100 °F, you most likely have
a fever
Wisdom
If you don’t feel well, go see a doctor
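The first three levels of this example can be sketched in code. This is only an illustration: the names `readings`, `hourly`, and `likely_fever` and the hour labels are hypothetical, while the temperatures and the 100 °F threshold come from the slide.

```python
# Data: raw numbers with no context (values from the slide).
readings = [98.6, 99.5, 100.3, 101.0]

# Information: the same data organized and labeled as hourly body temperature.
# The hour labels 0, 1, 2, ... are an assumption made for illustration.
hourly = {hour: temp for hour, temp in enumerate(readings)}

# Knowledge: a rule that can be acted upon (threshold from the slide).
FEVER_THRESHOLD_F = 100.0

def likely_fever(temp_f: float) -> bool:
    """Return True when the temperature suggests a fever."""
    return temp_f > FEVER_THRESHOLD_F

fever_hours = [hour for hour, temp in hourly.items() if likely_fever(temp)]
print(fever_hours)  # [2, 3]
```

Wisdom, the fourth level, is deliberately left out: deciding to see a doctor depends on judgment that a threshold rule does not capture.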
7. Information as process
Information as communication
Information as message transmission and reception
8. Information = characteristics of the output of a process
◦ Tells us something about the process and the input
Information-generating processes do not occur in isolation
(separation)
10. Communication = producing the same message at the
destination that was sent at the source
The message must be encoded for transmission across a
medium (called a channel)
But the channel is noisy and can distort the message
Semantics (meaning) is irrelevant
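A minimal sketch of this model is shown below. It assumes a message made of bits, a channel that flips each bit independently, and a three-fold repetition code with majority-vote decoding. The repetition code is chosen here only because it is simple; it is not taken from the slides, and all function names are hypothetical.

```python
import random

def encode(bits, repeat=3):
    """Repetition code: send every bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability `flip_prob`."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(bits, repeat=3):
    """Majority vote over each group of `repeat` received bits."""
    groups = [bits[i:i + repeat] for i in range(0, len(bits), repeat)]
    return [1 if sum(g) * 2 > repeat else 0 for g in groups]

message = [1, 0, 1, 1, 0, 0, 1]
rng = random.Random(0)  # fixed seed so that runs are repeatable
received = noisy_channel(encode(message), flip_prob=0.05, rng=rng)

# Communication succeeds when the destination reproduces the source message.
print(decode(received) == message)
```

The code never looks at what the bits mean, which mirrors the point that semantics is irrelevant in this view of communication.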
11. Fetch something that’s been stored
Recover a stored state of knowledge
Search through stored messages to find some messages
relevant to the task at hand
12. The tracing and recovery of specific information from stored
data.
It is the activity of obtaining information system resources
relevant to an information need from a collection of
information resources. Searches can be based on full-text or
other content-based indexing.
Information retrieval is the science of searching for
information in a document, searching for documents
themselves, and also searching for metadata that describe data,
and for databases of texts, images or sounds.
13. An information retrieval process begins when a user enters a
query into the system.
Queries are formal statements of information needs, for
example search strings in web search engines.
In information retrieval a query does not uniquely identify a
single object in the collection.
Instead, several objects may match the query, perhaps with
different degrees of relevancy.
An object is an entity that is represented by information in a
content collection or database. User queries are matched
against the database information.
14. In information retrieval the results returned may or may not
match the query, so results are typically ranked.
This ranking of results is a key difference between information
retrieval searching and database searching.
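The contrast can be sketched as follows. This is an illustrative toy, not a real retrieval model: `exact_match` and `ranked_search` are hypothetical names, and the score (the number of distinct query terms found in a document) is an assumption chosen for simplicity.

```python
def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def exact_match(records, key, value):
    """Database-style lookup: a record either matches or it does not."""
    return [r for r in records if r[key] == value]

def ranked_search(query, docs):
    """IR-style search: score documents by distinct query terms they contain."""
    terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(tokenize(text)))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by doc_id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored]

accounts = [{"id": 1, "owner": "A"}, {"id": 2, "owner": "B"}]
print(exact_match(accounts, "owner", "B"))  # [{'id': 2, 'owner': 'B'}]

docs = {
    "d1": "endangered mammals and their habitat",
    "d2": "habitat loss threatens many species",
    "d3": "chocolate makers of the world",
}
print(ranked_search("endangered mammals habitat", docs))  # [('d1', 3), ('d2', 1)]
```

Several documents match the same query to different degrees, so the IR-style function returns an ordered list rather than a single exact answer.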
15. Retrospective
“Searching the past”
Different queries posed against a static collection
Time invariant
Prospective
“Searching the future”
Static query posed against a dynamic collection
Time dependent
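The two modes can be sketched as two small functions. The matching rule used here (every query term must occur in the document) and the names `matches`, `retrospective`, and `prospective` are assumptions made for illustration.

```python
def matches(query_terms, text):
    """True when every query term occurs in the text (a deliberately simple test)."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in query_terms)

def retrospective(queries, collection):
    """Different queries posed against a static collection."""
    return {q: [d for d in collection if matches(q.split(), d)] for q in queries}

def prospective(query, stream):
    """One static query posed against documents as they arrive."""
    terms = query.split()
    for doc in stream:
        if matches(terms, doc):
            yield doc

collection = ["oil exports rise", "world series results", "oil prices fall"]
print(retrospective(["oil", "world series"], collection))

incoming = iter(["team wins series", "oil demand grows"])
print(list(prospective("oil", incoming)))  # ['oil demand grows']
```

In the retrospective case the collection is fixed and the queries vary; in the prospective case the query is fixed and new documents keep arriving.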
16. Ad hoc retrieval: find documents “about this”
Compile a list of mammals that are considered to be
endangered, identify their habitat and, if possible, specify what
threatens them.
Known item search
Find Jimmy Lin’s homepage.
What’s the ISBN number of “Introduction to Information
Retrieval”?
Directed exploration
Who makes the best chocolates?
17. Question answering
“Factoid”
Who discovered America?
When did Tamil Nadu become a state?
What team won the World Series in 1998?
“List”
What countries export oil?
Name Indian cities that have tourist spots.
“Definition”
What is information?
What is retrieval?
18. Filtering:
Make a binary decision about each incoming document
Ex: Spam or not spam
Routing:
Sort incoming documents into different bins
Ex: Categorize news headlines:
World, Nation, Metro, Sports
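Both tasks can be sketched with simple keyword rules. The keyword sets, the default bin, and the function names below are hypothetical and exist only for illustration; practical systems usually learn such decisions from labeled examples rather than from hand-written lists.

```python
SPAM_WORDS = {"winner", "prize", "free"}  # hypothetical keyword list

BINS = {  # hypothetical routing rules, one keyword set per bin
    "Sports": {"match", "team", "score"},
    "World": {"summit", "treaty", "global"},
    "Metro": {"city", "traffic", "council"},
}

def is_spam(text):
    """Filtering: a binary decision for one incoming document."""
    return bool(SPAM_WORDS & set(text.lower().split()))

def route(headline, default="Nation"):
    """Routing: place a headline in the bin whose keywords it overlaps most."""
    words = set(headline.lower().split())
    best_bin, best_overlap = default, 0
    for name, keywords in BINS.items():
        overlap = len(keywords & words)
        if overlap > best_overlap:
            best_bin, best_overlap = name, overlap
    return best_bin

print(is_spam("You are a winner"))        # True
print(route("Home team wins the match"))  # Sports
print(route("Budget debate continues"))   # Nation (no keyword matched)
```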
Definition of a database:
A structured set of data held in a computer, especially one
that is accessible in various ways.
Example:
Banks storing account information
Retailers storing inventories
Universities storing student grades
21. Database vs. IR
What we’re retrieving
Database: Structured data. Clear semantics based on a formal model.
IR: Mostly unstructured. Free text with some metadata.
Queries we’re posing
Database: Formally defined queries. Unambiguous.
IR: Vague, imprecise information needs.
Results we get
Database: Exact. Always correct in a formal sense.
IR: Sometimes relevant, often not.
Interaction with system
Database: One-shot queries.
IR: Interaction is important.
Other issues
Database: Concurrency, recovery, atomicity are all critical.
IR: Issues downplayed.
23. Precision: What fraction of the returned results is relevant
to the information need?
Recall: What fraction of the relevant documents in the
collection was returned by the system?
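Both measures can be computed from two sets of document identifiers. The sketch below uses a hypothetical function name and made-up identifiers; returning 0.0 when a set is empty is a convention assumed here, not something the slides specify.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents returned, three relevant in the collection, two in common.
p, r = precision_recall(returned=["d1", "d2", "d3", "d4"],
                        relevant=["d1", "d3", "d7"])
print(p)  # 0.5
print(round(r, 3))  # 0.667
```

Returning more documents can raise recall while lowering precision, which is why the two measures are usually reported together.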
26. Crawling:
The system browses the document collection and fetches
documents
Indexing:
The system builds an index of the documents fetched during
crawling
Ranking:
The system retrieves documents that are relevant to the query
from the index and displays them to the user
Relevance feedback:
The initial results returned from a given query may be used to
refine the query itself
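The indexing, ranking, and relevance feedback steps above can be sketched on a tiny in-memory collection. Crawling is not implemented: the `docs` dictionary stands in for documents that a crawler would have fetched. The function names, the scoring rule (number of query terms matched), and the feedback rule (append the most frequent new terms from documents judged relevant) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Indexing: map each term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def rank(query, index):
    """Ranking: order documents by how many query terms they contain."""
    scores = Counter()
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

def expand_query(query, feedback_docs, docs, extra_terms=2):
    """Relevance feedback: add frequent terms from documents judged relevant."""
    seen = set(query.lower().split())
    counts = Counter(
        term
        for doc_id in feedback_docs
        for term in docs[doc_id].lower().split()
        if term not in seen
    )
    new_terms = sorted(counts, key=lambda t: (-counts[t], t))[:extra_terms]
    if not new_terms:
        return query
    return query + " " + " ".join(new_terms)

# Stand-in for the output of a crawl.
docs = {
    "d1": "tiger habitat loss",
    "d2": "tiger conservation program",
    "d3": "chocolate recipes",
}
index = build_index(docs)
print(rank("tiger", index))                 # ['d1', 'd2']
refined = expand_query("tiger", ["d1"], docs)
print(refined)                              # tiger habitat loss
print(rank(refined, index))                 # ['d1', 'd2']
```

After feedback, the two documents keep their order, but the refined query separates them more clearly: "d1" now matches three terms and "d2" only one.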