Information Retrieval-1
By
SUJIT KUMAR DAS
Information Retrieval (IR) is finding material (usually documents) of an
unstructured nature (usually text) that satisfies an information need from
within large collections (usually stored on computers).
These days we frequently think first of web search, but
there are many other applications:
1. E-mail search
2. Searching your laptop
3. Corporate knowledge bases
4. Legal information retrieval
Collection: a set of documents.
Assume it is a static collection for the moment.
Goal: retrieve documents with information that
is relevant to the user's information need and
helps the user complete a task.
The classic search model:
User task -> Info need -> Query -> Search engine (over the Collection) -> Results
Results -> Query refinement -> back to Query
Example:
User task: get rid of mice in a politically correct way
Info need: information about removing mice without killing them
Query: how trap mice alive
Precision: the fraction of retrieved docs that are relevant to
the user's information need.
Recall: the fraction of relevant docs in the collection that are
retrieved.
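The two definitions above reduce to simple set arithmetic. A minimal sketch, assuming the retrieved results and the relevance judgments are available as sets of document IDs (the function name and the sample IDs are hypothetical); an empty result set is given precision 0.0 by convention here, although some texts leave it undefined.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: doc IDs returned by the system
    relevant:  doc IDs judged relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant docs that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 docs returned, 3 of them relevant;
# the collection contains 6 relevant docs in total.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 3, 4, 7, 8, 9])
print(p, r)  # 0.75 0.5
```

Returning more documents can only keep recall the same or raise it, but it usually lowers precision, which is why the two are reported together.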
The Boolean retrieval model (BRM) can answer any query that is a Boolean
expression:
- Queries use AND, OR and NOT to join query terms.
- It views each document as a set of terms.
- It is precise: a document either matches the condition or it does not.
Many professional searchers (e.g., lawyers) still like
Boolean queries:
- You know exactly what you're getting.
Example: e-mail search.
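The set-of-terms view maps AND, OR and NOT directly onto set intersection, union and complement. A minimal sketch, assuming a hypothetical three-e-mail collection and a naive lower-case word tokenizer; it scans every document for each term, whereas a real system would look terms up in an inverted index.

```python
import re

def to_terms(text):
    """Lower-case the text and keep alphabetic words: the document becomes a set of terms."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical toy collection of three short e-mails.
DOCS = {
    1: "Meeting moved to Friday, bring the budget report",
    2: "Budget approved; the party is on Friday",
    3: "Lunch order for the team meeting",
}
TERMS = {doc_id: to_terms(text) for doc_id, text in DOCS.items()}
ALL_IDS = set(TERMS)

def having(term):
    """IDs of the documents whose term set contains `term`."""
    return {doc_id for doc_id, terms in TERMS.items() if term in terms}

# budget AND friday AND NOT party: intersection, plus complement w.r.t. the collection
q1 = having("budget") & having("friday") & (ALL_IDS - having("party"))
# lunch OR party: union
q2 = having("lunch") | having("party")

print(sorted(q1))  # [1]
print(sorted(q2))  # [2, 3]
```

Each document either is or is not in the result set, which is exactly the "precise" behaviour described above; nothing in the output says which match is better.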
Levels of IR systems:
- Higher level, e.g., web search
- Intermediate level, e.g., enterprise search and domain-specific
  (vertical) search such as Medline
- Lower level, e.g., desktop search
Example of Boolean search: Westlaw
- The largest commercial legal search service in terms of number of
  paying subscribers.
- Over half a million subscribers performing millions of
  searches a day over tens of terabytes of text data.
- The service was started in 1975.
- Boolean search (called "terms and connectors" by Westlaw) is
  still the default and is used by a large percentage of users,
  although ranked retrieval has been available since 1992.
Information need: information on the legal theories involved
in preventing the disclosure of trade secrets by employees
formerly employed by a competing company.
In other words: suppose you work for one company and then move
to a rival company. What laws prevent you from disclosing
information from your previous employer to your new one?
Query: "trade secret" /s disclos! /s prevent /s employe!
(! is a trailing wildcard, so disclos! matches disclose, disclosure, and so on.)
- Long (avg. 10 words), precise queries that use proximity
  operators (e.g., /p = same paragraph, /s = same sentence).
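The Westlaw query above combines a phrase, trailing wildcards and the same-sentence operator. A minimal sketch of how such a query could be evaluated, assuming a simplified reading in which "t1 /s t2 /s ..." means all terms occur somewhere in one sentence; the function names, the naive sentence splitter and the sample texts are hypothetical and do not reproduce Westlaw's actual matching rules.

```python
import re

def sentences(text):
    """Naive sentence splitter: break after . ? or ! followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def tokens(sentence):
    """Lower-cased alphabetic tokens of one sentence."""
    return re.findall(r"[a-z]+", sentence.lower())

def term_matches(term, toks):
    """True if `term` occurs in the token list `toks`.

    'disclos!'      -> trailing wildcard: any token starting with 'disclos'
    'trade secret'  -> phrase: the words must be consecutive
    'prevent'       -> exact token
    """
    parts = term.lower().split()
    for i in range(len(toks) - len(parts) + 1):
        window = toks[i:i + len(parts)]
        if all(w.startswith(p[:-1]) if p.endswith("!") else w == p
               for p, w in zip(parts, window)):
            return True
    return False

def same_sentence(text, *terms):
    """Simplified reading of 't1 /s t2 /s ...': all terms occur in one sentence."""
    return any(all(term_matches(t, tokens(s)) for t in terms)
               for s in sentences(text))

QUERY = ("trade secret", "disclos!", "prevent", "employe!")
hit = ("Courts may act to prevent disclosure of a trade secret "
       "by employees who join a competitor.")
miss = "A trade secret was disclosed. The employer sought to prevent it."
print(same_sentence(hit, *QUERY), same_sentence(miss, *QUERY))  # True False
```

The second text contains every query term, but they are spread over two sentences, so the /s constraint rejects it; this is how proximity operators make Boolean queries more precise.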
Limitations of Boolean search:
- Not tolerant of spelling mistakes.
- Does not give more weight to documents containing
  more occurrences of the query terms.
- No ranking of returned results.
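The last two limitations can be addressed by scoring documents instead of just matching them. A minimal sketch, assuming the simplest possible score: the raw count of query-term occurrences in each document (plain term frequency, not tf-idf or any specific engine's formula). The function name and the sample documents are hypothetical.

```python
import re
from collections import Counter

def tf_rank(query, docs):
    """Rank docs by total occurrences of the query terms (raw term frequency).

    Unlike a Boolean match, a document that mentions the terms more often
    scores higher, and every matching document gets a score, so results
    can be ordered.
    """
    q_terms = re.findall(r"[a-z]+", query.lower())
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        score = sum(counts[t] for t in q_terms)  # Counter returns 0 for absent terms
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = {
    "a": "mice trap review",
    "b": "how to trap mice alive: a live trap keeps mice unharmed",
    "c": "garden pests and how to deter them",
}
print(tf_rank("trap mice alive", docs))  # [('b', 5), ('a', 2)]
```

Raw counts still do nothing about spelling mistakes and favour long documents; practical ranked-retrieval systems normalize and weight these counts.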
