Ms. T. Primya
Assistant Professor
Department of Computer Science and Engineering
Dr. N. G. P. Institute of Technology
Coimbatore
Information:
• Facts provided or learned about something or someone.
• What is conveyed or represented by a particular arrangement or sequence of things.
• Informing, telling, thing told, knowledge, items of knowledge, news.
• Knowledge communicated or received concerning a particular fact or circumstance.
Knowledge:
• Knowing; familiarity gained by experience.
• A person’s range of information.
• A theoretical or practical understanding of the sum of what is known.
• Data
◦ The raw material of information
• Information
◦ Data organized and presented in a particular manner
• Knowledge
◦ “Justified true belief”
◦ Information that can be acted upon
• Wisdom
◦ Distilled and integrated knowledge
◦ Demonstrative of high-level “understanding”
• Data
◦ 98.6 °F, 99.5 °F, 100.3 °F, 101 °F, …
• Information
◦ Hourly body temperature: 98.6 °F, 99.5 °F, 100.3 °F, 101 °F, …
• Knowledge
◦ If you have a temperature above 100 °F, you most likely have a fever
• Wisdom
◦ If you don’t feel well, go see a doctor
• Information as process
• Information as communication
• Information as message transmission and reception
• Information = characteristics of the output of a process
◦ Tells us something about the process and the input
• Information-generating processes do not occur in isolation (separation)
• Communication = transmission of information
• Communication = producing the same message at the destination that was sent at the source
◦ The message must be encoded for transmission across a medium (called the channel)
◦ But the channel is noisy and can distort the message
• Semantics (meaning) is irrelevant to this transmission problem
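The encode, transmit, decode view above can be sketched in a few lines. This is only an illustration under stated assumptions: the repetition code, the bit-flipping channel, and the names `encode`, `noisy_channel`, and `decode` are hypothetical choices made for the example, not something the slides prescribe.

```python
import random

def encode(bits, n=3):
    """Repetition code (an assumption for this sketch): send each bit n times."""
    return [b for b in bits for _ in range(n)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each transmitted bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(bits, n=3):
    """Majority vote over each group of n received bits."""
    return [1 if sum(bits[i:i + n]) > n // 2 else 0
            for i in range(0, len(bits), n)]

if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    message = [1, 0, 1, 1, 0]
    received = noisy_channel(encode(message), flip_prob=0.1, rng=rng)
    print("sent:   ", message)
    print("decoded:", decode(received))
```

Note that the code never looks at what the bits mean; it only tries to reproduce them at the destination, which is the sense in which semantics is irrelevant here.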
• Fetch something that’s been stored
• Recover a stored state of knowledge
• Search through stored messages to find some messages relevant to the task at hand
• The tracing and recovery of specific information from stored data.
• It is the activity of obtaining information system resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing.
• Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.
• An information retrieval process begins when a user enters a query into the system.
• Queries are formal statements of information needs, for example search strings in web search engines.
• In information retrieval a query does not uniquely identify a single object in the collection.
• Instead, several objects may match the query, perhaps with different degrees of relevancy.
• An object is an entity that is represented by information in a content collection or database. User queries are matched against the database information.
• In information retrieval the returned results may match the query only partially, or not at all, so results are typically ranked.
• This ranking of results is a key difference between information retrieval searching and database searching.
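To make the contrast with database lookup concrete, here is a minimal sketch of ranked matching. The scoring rule (count of distinct query terms a document shares with the query) and the names `tokenize` and `ranked_search` are assumptions chosen for illustration; real systems use richer scoring.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def ranked_search(query, docs):
    """Return ids of documents sharing at least one term with the query,
    ordered by how many distinct query terms they contain."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_terms & set(tokenize(text)))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

if __name__ == "__main__":
    docs = {
        "d1": "endangered mammals and their habitat",
        "d2": "mammals of india",
        "d3": "chocolate makers",
    }
    # A database key lookup returns exactly one record or nothing:
    print(docs.get("d2"))
    # An IR query returns several partial matches in ranked order:
    print(ranked_search("endangered mammals", docs))
```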
• Retrospective
◦ “Searching the past”
◦ Different queries posed against a static collection
◦ Time invariant
• Prospective
◦ “Searching the future”
◦ Static query posed against a dynamic collection
◦ Time dependent
Ad hoc retrieval: find documents “about this”
• Compile a list of mammals that are considered to be endangered, identify their habitat and, if possible, specify what threatens them.
Known item search
• Find Jimmy Lin’s homepage.
• What’s the ISBN of “Introduction to Information Retrieval”?
Directed exploration
• Who makes the best chocolates?
Question answering
“Factoid”
• Who discovered America?
• When did Tamil Nadu become a state?
• What team won the World Series in 1998?
“List”
• What countries export oil?
• Name Indian cities that have tourist spots.
“Definition”
• What is information?
• What is retrieval?
• Filtering:
◦ Make a binary decision about each incoming document
◦ Ex: Spam or not
• Routing:
◦ Sort incoming documents into different bins
◦ Ex: Categorize news headlines: World, Nation, Metro, Sports
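The difference between the two tasks shows up in the return type: filtering yields a yes/no decision per document, routing yields a bin. The sketch below uses hand-picked keyword sets (`SPAM_WORDS`, `BINS`) that are purely hypothetical; a practical system would learn these decisions from data.

```python
# Hypothetical keyword lists, chosen only to make the example runnable.
SPAM_WORDS = {"lottery", "winner", "free"}
BINS = {
    "World": {"un", "treaty", "summit"},
    "Nation": {"parliament", "election", "budget"},
    "Metro": {"traffic", "metro", "municipal"},
    "Sports": {"cricket", "match", "score"},
}

def is_spam(document):
    """Filtering: a binary decision for each incoming document."""
    return bool(set(document.lower().split()) & SPAM_WORDS)

def route(headline):
    """Routing: pick the bin whose keywords overlap the headline most."""
    words = set(headline.lower().split())
    best = max(BINS, key=lambda name: len(words & BINS[name]))
    # If no bin shares any keyword, leave the headline unsorted.
    return best if words & BINS[best] else "Unsorted"

if __name__ == "__main__":
    print(is_spam("You are a lottery winner"))
    print(route("India wins cricket match"))
```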
Definition:
A structured set of data held in a computer, especially one that is accessible in various ways.
Examples:
• Banks storing account information
• Retailers storing inventories
• Universities storing student grades
Database vs. IR
• What we’re retrieving
◦ Database: Structured data. Clear semantics based on a formal model.
◦ IR: Mostly unstructured. Free text with some metadata.
• Queries we’re posing
◦ Database: Formally defined queries. Unambiguous.
◦ IR: Vague, imprecise information needs.
• Results we get
◦ Database: Exact. Always correct in a formal sense.
◦ IR: Sometimes relevant, often not.
• Interaction with system
◦ Database: One-shot queries.
◦ IR: Interaction is important.
• Other issues
◦ Database: Concurrency, recovery, atomicity are all critical.
◦ IR: Issues downplayed.
• Precision: What fraction of the returned results is relevant to the information need?
• Recall: What fraction of the relevant documents in the collection was returned by the system?
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

                 Relevant               Non-relevant
Retrieved        True positives (TP)    False positives (FP)
Not retrieved    False negatives (FN)   True negatives (TN)
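As a worked instance of the two formulas, the following sketch computes precision and recall from a set of retrieved document ids and a set of relevant ones. The function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from retrieved and relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # retrieved and relevant
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    # Returning 0.0 for an empty denominator is a convention chosen here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

if __name__ == "__main__":
    # 4 documents returned, 3 relevant in the collection, 2 in common:
    # precision = 2/4, recall = 2/3.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

True negatives do not appear in either formula, which is why neither measure depends on how many irrelevant documents were correctly left out.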
Crawling:
• The system browses the document collection and fetches documents
Indexing:
• The system builds an index of the documents fetched during crawling
Ranking:
• The system retrieves documents that are relevant to the query from the index and displays them to the user in ranked order
Relevance feedback:
• The initial results returned from a given query may be used to refine the query itself
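The four stages above can be sketched end to end on a toy collection. Everything here is an illustrative assumption: an in-memory dictionary stands in for crawled documents, the index is a simple inverted index of term counts, ranking sums term frequencies, and feedback is reduced to adding the most frequent new term from documents the user marked as relevant. The names `build_index`, `rank`, and `expand_query` are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Indexing: map each term to the documents containing it, with counts."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index

def rank(query_terms, index):
    """Ranking: score each document by the summed counts of the query terms."""
    scores = Counter()
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

def expand_query(query_terms, feedback_ids, docs, extra=1):
    """Relevance feedback: add the most frequent new terms found in the
    documents the user judged relevant."""
    counts = Counter()
    for doc_id in feedback_ids:
        counts.update(t for t in docs[doc_id].lower().split()
                      if t not in query_terms)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ordered[:extra]]

if __name__ == "__main__":
    # "Crawling" is simulated: assume these documents were already fetched.
    docs = {
        "d1": "tiger habitat loss",
        "d2": "tiger tiger conservation",
        "d3": "chocolate recipes",
    }
    index = build_index(docs)
    first = rank(["tiger"], index)
    refined = expand_query(["tiger"], ["d1"], docs)
    print(first, refined, rank(refined, index))
```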
Information retrieval (introduction)