SlideShare a Scribd company logo
1 of 31
Course overview and introduction
Modern Information Retrieval
University of Qom
Z. Imanimehr
Spring 2023
 Instructor: Zahra Imanimehr
 Email: z.imanimehr@qom.ac.ir
 Eitaa ID: @Zahra.imanimehr
 Teacher Assistant: Sepideh Chehreh
 Eitaa Channel
Course info
2
Text books
 Main:
 Introduction to Information Retrieval, C.D. Manning, P.
Raghavan and H. Schuetze, Cambridge University Press,
2008.
 Free online version is available at: http://informationretrieval.org/
3
 Midterm: 15%
 Final Exam: 40%
 Exercises: 15%
 Project: 20%
 Class activity: 10%
 Rule 1: If the student's exercises and midterm total score is less
than 3, her project will not be delivered.
 Rule 2: If the student's exercises and midterm and project total
score is less than 5, she will not pass the lesson.
 Rule 3: If the student's final exam score is less than 4, she will
not pass
the lesson.
Marking scheme
4
Typical IR system
 Given: corpus & user query
 Find: A ranked set of docs relevant to the query.
Corpus: A collection of documents
5
Query
A list of
Ranked
Document
s
IR System
Document
corpus
Information Retrieval (IR)
 Information Retrieval (IR) is finding material
(usually documents) of an unstructured nature
(usually text) that satisfies an information need
from within large collections [IIR Book].
 Retrieving relevant documents to a query (while
retrieving as few non-relevant documents as
possible)
 especially from large sets of documents efficiently.
6
Information Retrieval (IR)
7
– These days we frequently think first of web search,
but there are many other cases:
• Corporate search engines
• E-mail search
• Searching your laptop
• Legal information retrieval
8
9
10
11
Basic Definitions
 Document: a unit decided to build a retrieval system
over
 textual: a sequence of words, punctuation, etc that
express ideas about some topic in a natural language.
 Corpus or collection: a set of documents
 Information need: information required by the user
about some topics
 Query: formulation of the information need
12
Heuristic nature of IR
 Problem: Semantic gap between query and docs
 A doc is relevant if the user perceives that this doc
contains his information need
 How to extract information from docs and how to use it to
decide relevance
 Solution: IR system must interpret and rank docs
according to the amount of relevance to the user’s
query.
 “The notion of relevance is at the center of IR.”
13
Minimize search overhead
 Search overhead: Time spent in all steps leading to
the reading of items containing the needed
information
 Steps: query generation, query execution, scanning
results, reading non-relevant items, etc.
 The amount of online data has grown at least as
quickly as the speed of computers
14
Condensing the data (indexing)
15
 Indexing the corpus to speed up the searching task
 Using the index instead of linearly scanning the docs that
is computationally expensive for large collections
 Indexing depends on the query language and IR model
 Term (index unit): A word, phrase, and other groups
of symbols used for retrieval
 Index terms are useful for remembering the document
themes
Typical IR system architecture
16
User
Interface
Text Operations
Query
Operations
Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
Corpus
Text
IR system components
 Text Operations forms index terms
 Tokenization, stop word removal, stemming, …
 Indexing constructs an index for a corpus of docs.
 Query Operations transform the query to improve
retrieval:
 Query expansion using a thesaurus or query
transformation using relevance feedback
 Searching retrieves docs that are related to the
query.
17
IR system components (continued)
 Ranking scores retrieved documents according to
their relevance.
 User Interface manages interaction with the user:
 Query input and visualization of results
 Relevance feedback
18
Structured vs. unstructured docs
 Unstructured text (free text): a continuous sequence
of tokens
 Structured text (fielded text): text is broken into fields
that are distinguished by tags or other markup
 Semi-structured text
 e.g. web page
19
Databases vs. IR:
Structured vs. unstructured data
 Structured: data tends to refer to information in
“tables”
Typically allows numerical range and exact match (for
text) queries, e.g.,
GPA < 16 AND Supervisor = Chang.
20
Student
Name
Student ID Supervisor
Name
GPA
Smith 20116671 Joes 12
Joes 20114190 Chang 14.1
Lee 20095900 Chang 19
Semi-structured data
 In fact almost no data is “unstructured”
 E.g., this slide has distinctly identified zones such as the
Title and Bullets
 Facilitates “semi-structured” search such as
 Title contains data AND Bullets contain search
… to say nothing of linguistic structure
21
Semi-structured data
22
Unstructured (text) vs. structured (database)
data in the mid-nineties
23
0
50
100
150
200
250
Data volume Market Cap
Unstructured
Structured
Unstructured (text) vs. structured (database)
data today
24
0
50
100
150
200
250
Data volume Market Cap
Data retrieval vs. information retrieval
 Data retrieval
 which items contain a set of keywords? Or satisfy the
given (e.g., regular expression like) user query?
 well defined structure and semantics
 a single erroneous object implies failure!
 Information retrieval
 information about a subject
 semantics is frequently loose (natural language is not well
structured and may be ambiguous)
 small errors are tolerated
25
Evaluation of results
 Precision: Fraction of retrieved docs that are relevant
to user’s information need
 Precision = relevant retrieved / total retrieved
= Retrieved  Relevant  / Retrieved 
 Recall: Fraction of relevant docs that are retrieved
 Recall = relevant retrieved / relevant exist
= Retrieved  Relevant  /  Relevant 
Sec. 1.1
26
Relevan
t
Retrieve
d
Retrieved &
Relevant
Example
27
 Assume that there are 8 relevant docs to the query
𝑄.
 List of the retrieved docs for 𝑄 :
 d1: R
 d2: NR
 d3: R
 d4: R
 d5: NR
 d6: NR
 d7: NR
𝑃 =
3
7
𝑅 =
3
8
Web Search
 Application of IR to (HTML) documents on the World
Wide Web.
 Web IR
 collect doc corpus by crawling the web
 exploit the structural layout of docs
 Beyond terms, exploit the link structure (ideas
from social networks)
 link analysis, clickstreams ...
28
Web IR
Web
29
Query
A list of
Ranked
Pages
IR System
corpus
Crawler
The web and its challenges
 Web collection properties
 Distributed nature of the web collection
 Size of the collection and volume of the user queries
 Web advertisement (web is a medium for business too)
 Predicting relevance on the web
 Docs change uncontrollably (dynamic and volatile data
)
 Unusual and diverse (heterogeneous) docs, users, and
queries
30
Course main topics
 Introduction
 Indexing & text operations
 IR Models
 Boolean, vector space, probabilistic
 Evaluation of IR systems
 Web IR: crawling, link-based algorithms,
personalization, and etc
31

More Related Content

Similar to Introduction.pptx

PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Templatebutest
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Jean Brenda
 
Modern information Retrieval-Relevance Feedback
Modern information Retrieval-Relevance FeedbackModern information Retrieval-Relevance Feedback
Modern information Retrieval-Relevance FeedbackHasanulFahmi2
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...csandit
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...cscpconf
 
A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...csandit
 
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...IJDKP
 
Modern Association Rule Mining Methods
Modern Association Rule Mining MethodsModern Association Rule Mining Methods
Modern Association Rule Mining Methodsijcsity
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...IJwest
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methodsijcsity
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
Trends and innovations in database course
Trends and innovations in database courseTrends and innovations in database course
Trends and innovations in database courseNeetu Sardana
 
A Survey And Taxonomy Of Distributed Data Mining Research Studies A Systemat...
A Survey And Taxonomy Of Distributed Data Mining Research Studies  A Systemat...A Survey And Taxonomy Of Distributed Data Mining Research Studies  A Systemat...
A Survey And Taxonomy Of Distributed Data Mining Research Studies A Systemat...Sandra Long
 

Similar to Introduction.pptx (20)

PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval
 
Modern information Retrieval-Relevance Feedback
Modern information Retrieval-Relevance FeedbackModern information Retrieval-Relevance Feedback
Modern information Retrieval-Relevance Feedback
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
 
A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...A semantic based approach for knowledge discovery and acquistion from multipl...
A semantic based approach for knowledge discovery and acquistion from multipl...
 
Lecture 0 INT306.pptx
Lecture 0 INT306.pptxLecture 0 INT306.pptx
Lecture 0 INT306.pptx
 
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
 
Modern Association Rule Mining Methods
Modern Association Rule Mining MethodsModern Association Rule Mining Methods
Modern Association Rule Mining Methods
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Db lec 01
Db lec 01Db lec 01
Db lec 01
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrieval
 
Dbms
DbmsDbms
Dbms
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
 
Az31349353
Az31349353Az31349353
Az31349353
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methods
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Trends and innovations in database course
Trends and innovations in database courseTrends and innovations in database course
Trends and innovations in database course
 
Cs501 intro
Cs501 introCs501 intro
Cs501 intro
 
A Survey And Taxonomy Of Distributed Data Mining Research Studies A Systemat...
A Survey And Taxonomy Of Distributed Data Mining Research Studies  A Systemat...A Survey And Taxonomy Of Distributed Data Mining Research Studies  A Systemat...
A Survey And Taxonomy Of Distributed Data Mining Research Studies A Systemat...
 

Recently uploaded

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 

Introduction.pptx

  • 1. Course overview and introduction Modern Information Retrieval University of Qom Z. Imanimehr Spring 2023
  • 2.  Instructor: Zahra Imanimehr  Email: z.imanimehr@qom.ac.ir  Eitaa ID: @Zahra.imanimehr  Teacher Assistant: Sepideh Chehreh  Eitaa Channel Course info 2
  • 3. Text books  Main:  Introduction to Information Retrieval, C.D. Manning, P. Raghavan and H. Schuetze, Cambridge University Press, 2008.  Free online version is available at: http://informationretrieval.org/ 3
  • 4.  Midterm: 15%  Final Exam: 40%  Exercises: 15%  Project: 20%  Class activity: 10%  Rule 1: If the student's exercises and midterm total score is less than 3, her project will not be delivered.  Rule 2: If the student's exercises and midterm and project total score is less than 5, she will not pass the lesson.  Rule 3: If the student's final exam score is less than 4, she will not pass the lesson. Marking scheme 4
  • 5. Typical IR system  Given: corpus & user query  Find: A ranked set of docs relevant to the query. Corpus: A collection of documents 5 Query A list of Ranked Document s IR System Document corpus
  • 6. Information Retrieval (IR)  Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections [IIR Book].  Retrieving relevant documents to a query (while retrieving as few non-relevant documents as possible)  especially from large sets of documents efficiently. 6
  • 7. Information Retrieval (IR) 7 – These days we frequently think first of web search, but there are many other cases: • Corporate search engines • E-mail search • Searching your laptop • Legal information retrieval
  • 8. 8
  • 9. 9
  • 10. 10
  • 11. 11
  • 12. Basic Definitions  Document: a unit decided to build a retrieval system over  textual: a sequence of words, punctuation, etc that express ideas about some topic in a natural language.  Corpus or collection: a set of documents  Information need: information required by the user about some topics  Query: formulation of the information need 12
  • 13. Heuristic nature of IR  Problem: Semantic gap between query and docs  A doc is relevant if the user perceives that this doc contains his information need  How to extract information from docs and how to use it to decide relevance  Solution: IR system must interpret and rank docs according to the amount of relevance to the user’s query.  “The notion of relevance is at the center of IR.” 13
  • 14. Minimize search overhead  Search overhead: Time spent in all steps leading to the reading of items containing the needed information  Steps: query generation, query execution, scanning results, reading non-relevant items, etc.  The amount of online data has grown at least as quickly as the speed of computers 14
  • 15. Condensing the data (indexing) 15  Indexing the corpus to speed up the searching task  Using the index instead of linearly scanning the docs that is computationally expensive for large collections  Indexing depends on the query language and IR model  Term (index unit): A word, phrase, and other groups of symbols used for retrieval  Index terms are useful for remembering the document themes
  • 16. Typical IR system architecture 16 User Interface Text Operations Query Operations Indexing Searching Ranking Index Text query user need user feedback ranked docs retrieved docs Corpus Text
  • 17. IR system components  Text Operations forms index terms  Tokenization, stop word removal, stemming, …  Indexing constructs an index for a corpus of docs.  Query Operations transform the query to improve retrieval:  Query expansion using a thesaurus or query transformation using relevance feedback  Searching retrieves docs that are related to the query. 17
  • 18. IR system components (continued)  Ranking scores retrieved documents according to their relevance.  User Interface manages interaction with the user:  Query input and visualization of results  Relevance feedback 18
  • 19. Structured vs. unstructured docs  Unstructured text (free text): a continuous sequence of tokens  Structured text (fielded text): text is broken into fields that are distinguished by tags or other markup  Semi-structured text  e.g. web page 19
  • 20. Databases vs. IR: Structured vs. unstructured data  Structured: data tends to refer to information in “tables” Typically allows numerical range and exact match (for text) queries, e.g., GPA < 16 AND Supervisor = Chang. 20 Student Name Student ID Supervisor Name GPA Smith 20116671 Joes 12 Joes 20114190 Chang 14.1 Lee 20095900 Chang 19
  • 21. Semi-structured data  In fact almost no data is “unstructured”  E.g., this slide has distinctly identified zones such as the Title and Bullets  Facilitates “semi-structured” search such as  Title contains data AND Bullets contain search … to say nothing of linguistic structure 21
  • 23. Unstructured (text) vs. structured (database) data in the mid-nineties 23 0 50 100 150 200 250 Data volume Market Cap Unstructured Structured
  • 24. Unstructured (text) vs. structured (database) data today 24 0 50 100 150 200 250 Data volume Market Cap
  • 25. Data retrieval vs. information retrieval  Data retrieval  which items contain a set of keywords? Or satisfy the given (e.g., regular expression like) user query?  well defined structure and semantics  a single erroneous object implies failure!  Information retrieval  information about a subject  semantics is frequently loose (natural language is not well structured and may be ambiguous)  small errors are tolerated 25
  • 26. Evaluation of results  Precision: Fraction of retrieved docs that are relevant to user’s information need  Precision = relevant retrieved / total retrieved = Retrieved  Relevant  / Retrieved   Recall: Fraction of relevant docs that are retrieved  Recall = relevant retrieved / relevant exist = Retrieved  Relevant  /  Relevant  Sec. 1.1 26 Relevan t Retrieve d Retrieved & Relevant
  • 27. Example 27  Assume that there are 8 relevant docs to the query 𝑄.  List of the retrieved docs for 𝑄 :  d1: R  d2: NR  d3: R  d4: R  d5: NR  d6: NR  d7: NR 𝑃 = 3 7 𝑅 = 3 8
  • 28. Web Search  Application of IR to (HTML) documents on the World Wide Web.  Web IR  collect doc corpus by crawling the web  exploit the structural layout of docs  Beyond terms, exploit the link structure (ideas from social networks)  link analysis, clickstreams ... 28
  • 29. Web IR Web 29 Query A list of Ranked Pages IR System corpus Crawler
  • 30. The web and its challenges  Web collection properties  Distributed nature of the web collection  Size of the collection and volume of the user queries  Web advertisement (web is a medium for business too)  Predicting relevance on the web  Docs change uncontrollably (dynamic and volatile data )  Unusual and diverse (heterogeneous) docs, users, and queries 30
  • 31. Course main topics  Introduction  Indexing & text operations  IR Models  Boolean, vector space, probabilistic  Evaluation of IR systems  Web IR: crawling, link-based algorithms, personalization, and etc 31

Editor's Notes

  1. IR system: interpret contents of information items generate a ranking which reflects relevance notion of relevance is most important
  2. Document might be a paragraph, a section, a chapter, a Web page, an article, or a whole book.
  3. Needed information: Either Sufficient information in the system to complete a task. All information in the system relevant to the user needs. Example –shopping: Looking for an item to purchase. Looking for an item to purchase at minimal cost. Example –researching: Looking for a bibliographic citation that explains a particular term. Building a comprehensive bibliography on a particular subject.
  4. Databases vs. Information Retrieval DATABASES We know the schema in advance, so semantic correlation between queries and data is clear. We can get exact answers Strong theoretical foundation (at least with relational…) IR No schema, but rather unstructured natural language text. The result is that there is not a clear semantic correlation between queries and data. We get inexact, estimated answers Theory not well understood (especially Natural Language Processing) Full text searching: Methods that compare the query with every word in the text, without distinguishing the function of the various words. Fielded searching: Methods that search on specific bibliographic or structural fields, such as author or title.
  5. To the surprise of many, the search box has become the preferred method of information access. Customers ask: Why can’t I search my database in the same way?
  6. subjective measures