FOUNDATIONS OF
INFORMATION RETRIEVAL
Lynda Tamine-Lechani
lynda.lechani@irit.fr
https://www.irit.fr/~Lynda.Tamine-Lechani/
• Course description
Study the theory, design, and implementation of information retrieval systems
from the perspectives of:
✓ Information representation: focus on texts
✓ Theoretical information retrieval models: focus on language models and learning-based models
✓ Performance evaluation: focus on system-centred evaluation
• Learning objectives
✓ Index and represent textual information;
✓ Recall and discuss well-known information retrieval models;
✓ Design, implement and evaluate the performance of information retrieval
systems using retrieval algorithms and models discussed in class.
© L. Tamine-Lechani
• Organization
o 12 h of lectures, 6 h of tutorials: Lynda Tamine-Lechani
o 10 h of hands-on work: Jesus-Lovon Melgajero, José G. Moréno and Lynda Tamine-Lechani
• Prerequisites
o Python programming
o Basics of probability and statistics
• Course material
o Copies of the lecture slides are posted on the MOODLE site
o Book and reading references are provided
• Grading
o 1st session
✓ Hands-on work with techniques discussed in class: 30% of the final grade
✓ Final written exam in class: 70% of the final grade
o 2nd session
✓ Final written exam in class: 100% of the final grade
• Schedule
Lecture | Topic
1       | Course introduction; text indexing, vector semantics
2       | Static embeddings, contextual embeddings
3       | Information retrieval (IR) models: query reformulation, learning to rank
4       | Tutorial 1: text indexing and representation
5       | Neural models for IR
6       | PageRank, performance evaluation
7       | Tutorial 2: information retrieval techniques and models
8       | Question answering systems and chatbots
9       | Tutorial 3: performance evaluation
Books
- Information Retrieval: Algorithms and Heuristics
  David A. Grossman, Ophir Frieder, Kluwer Academic Publishers, 1998
- Modern Information Retrieval
  R. Baeza-Yates, B. Ribeiro-Neto, ACM Press / Addison-Wesley, 1999
- Recherche d'information : applications, modèles et algorithmes
  M.-R. Amini and E. Gaussier, Eyrolles, 2012
- Search Engines: Information Retrieval in Practice
  B. Croft, D. Metzler, T. Strohman, Pearson, 2010
Introduction
Information Retrieval (IR): definitions
Calvin Mooers, 1951:
"Information retrieval (IR) is the name for the process or method whereby a prospective user of
information is able to convert his need for information into an actual list of citations to documents in
storage containing information useful to him. ... Information retrieval is crucial to documentation and
organization of knowledge." (Mooers, 1951, p. 25)
Salton, 1980:
"Information retrieval systems are designed to help analyze and describe the items stored in a file, to
organize them and search among them, and finally to retrieve them in response to a user's query.
Designing and using a retrieval system involves four major activities: information analysis, information
organization and search, query formulation, and information retrieval and dissemination."
A more recent definition:
"Information retrieval (IR) in computing and information science is the process of obtaining
information system resources that are relevant to an information need from a
collection of those resources. Searches can be based on full-text or other content-based
indexing."
Do these definitions refer to well-known search engines?
... Yes, but they also refer to:
- Search in digital libraries
- Search in a company corpus
- Search in specialized corpora (health, legal, biology-related resources)
- Search for a location
- Search for answers
- Item recommendation
- Review summarization
- ...
• Wide variety of search systems and interaction environments
o Web search engines
o Conversational agents
o E-commerce: Amazon, Airbnb, ...
o Media recommendation: Netflix, Spotify, ...
... and different forms of user-system interaction:
- From search to conversation
- Search and navigation on maps
- Cross-device search
- Heatmaps on SERPs (search engine result pages)
- ... with voice only!
Focus in this lecture
(Web) search systems that select, from a corpus of text documents, those that are
relevant to a user's information need, expressed by the user as a query.
[Diagram: an information need is expressed as a query; the system selects documents
from the corpus and returns them as its answer to the query]
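The selection step described on this slide can be illustrated with a minimal Python sketch. It is not the retrieval model taught in the course: the toy corpus, the `tokenize` and `select` helper names, and the scoring rule (number of distinct query keywords found in a document) are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def select(query, corpus):
    """Return (doc_id, score) pairs for documents sharing at least one
    keyword with the query, sorted by decreasing number of shared keywords."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        score = len(query_terms & set(tokenize(text)))
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document identifier.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Toy corpus, invented for this example.
corpus = {
    "d1": "Information retrieval systems index and search text documents",
    "d2": "Relational databases answer exact queries over structured data",
    "d3": "Web search engines rank documents for a user query",
}

answer = select("search documents", corpus)
print(answer)
```

Here the system's answer contains d1 and d3, which each share two keywords with the query, while d2 shares none and is not selected.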
Basic notions: Document
• Document: the information unit being searched
- Document
- Paragraph
- Phrase
- Structural unit (section, chapter, ...)
• Different views
- Content: "This course introduces the basics of information retrieval"
- Metadata: Date: 15/01/2013; Author: Albert; Language: French; ...
- Structure: "1. Introduction: Information retrieval ..."; "2. Basics: The notion of query ..."
• Different media
- Text (monomedia)
- Image
- Multimedia
- Video
• Different forms
- Document
- Blog
- Tweet
- News
- Presentation
- E-mail
- ...
Basic notions: information need, query
• What the user seeks: an information need
• How the user expresses this information need: a query
In this course, a query is a list of keywords.
Basic notions: Relevance
• A key concept in information retrieval
A document is relevant if it matches the information need. There are numerous types of
relevance:
o Topical relevance (aboutness): the document covers the query topic
o Situational relevance: the document matches the user's situation (e.g., task,
location, ...)
o Cognitive relevance: the document fits the user's knowledge state
o ...
and numerous relevance criteria:
- Novelty
- Freshness
- Language
- Specificity
- Trust
- ...
The main focus in this course is topical relevance: it is useful and "easy" to define and to
measure, but it does not cover every aspect of relevance.
What makes information retrieval challenging?
[Figure, credit: © NIST (TREC)]
• Deluge of information
o Large-scale information
o Often only a small fraction of the information is relevant and/or useful for a query
o Information is noisy
o Information is not always trustworthy
o Heterogeneous information forms and sources
o ...
Information is everywhere
Increasing volumes of information are available from an increasing number of sources:
social applications, mobile devices, sensors, ...
[Timeline figure, source: infographic]
- 1972: ARPANET
- 1990: WWW
- 1994: E-commerce
- 1995: Directories
- 1998: Search
- 1999: Blogs
- 2001: Wikis
- 2003: Social networks
Focus on Web 3.0: the digital world today
• 1st place: platforms for publishing/sharing texts (mostly),
newsletters, podcasts, videos, photos
o Wikipedia, Blogger, Google Podcasts, YouTube, Flickr, TripAdvisor, ...
• 2nd place: platforms for messaging
o Facebook, Messenger, Telegram, ...
• 3rd place: platforms for conversations
o Quora, StackExchange, Reddit, Facebook groups, Google Groups, ...
• 4th place: platforms for collaboration
o Facebook Workplace, TeamWork, Chatter, ...
Image credit: https://fredcavazza.net/2021/05/06/panorama-des-medias-sociaux-2021/
Some statistics 2020-2021: information and users
Statistics on the usage of information access systems, 2014-2020
• In 2020, Google processed more than 7 billion queries every day, 15% of which
had never been submitted before (new queries)
• The number of social media users in the world is estimated at 2.77 billion,
versus 2.46 billion in 2017
• 51% of all advertising money spent worldwide in 2019, or more than 240 billion
dollars, was expected to go to digital media
• Online sales were expected to reach 3.45 trillion dollars in 2020
• 47.3% of the world's population was expected to shop online in 2020
Source:
https://datastudio.google.com/embed/reporting/1sImC_rjeWqNXdgQt5MtmrQMbH44qFjtA/page/1fzh
• Users and information shared live in 2021
Image credit: https://www.internetlivestats.com/
What makes information retrieval challenging?
• Information needs are ambiguous
o Queries are generally short and ambiguous
o The mapping between queries and intents is many-to-many (M-N)
1 query → N intents:
- Roi lion (Lion King)
M queries → 1 intent:
- Master UPS Intelligence artificielle
- Université Paul Sabatier IA
- Formation IA Toulouse
- Matsre IAFA
- ...
• Relevance is subjective
✓ User-dependent
✓ Situation-dependent
✓ Topicality is often the minimum threshold for relevance
• Relevance faces vocabulary mismatch between queries and documents
o Matching as word overlap: is it really semantic overlap?
Q: "most jurisdictions exercise a high degree of regulation over banks" [bank: financial institution]
D1: "I was robbed when I withdrew the money from the bank" [bank: building]
D2: "fish lined the bank of the stream" [bank: the land alongside or sloping down to a river or lake]
o Matching is not exact: only rough matching between queries and documents
Q: "Presidential Elections in France"
D1: "Election campaign is running"
[relevant, but missing 'presidential' and 'France']
D2: "Macron, the President of France is attending COP21"
[irrelevant, although it matches 'France' and 'President']
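The "Presidential Elections in France" example can be checked with a short Python sketch of exact word-overlap matching. The `tokenize` and `word_overlap` helpers are hypothetical names introduced here, and exact matching of lowercased tokens is an assumption; it is not necessarily the matching function studied later in the course.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def word_overlap(query, document):
    """Return the set of tokens shared by the query and the document."""
    return tokenize(query) & tokenize(document)

q = "Presidential Elections in France"
d1 = "Election campaign is running"                        # relevant
d2 = "Macron, the President of France is attending COP21"  # irrelevant

overlap_d1 = word_overlap(q, d1)
overlap_d2 = word_overlap(q, d2)
print("D1:", overlap_d1)
print("D2:", overlap_d2)
```

With exact token matching, the relevant document D1 shares no token with the query ("election" differs from "elections"), whereas the irrelevant document D2 shares "france". Matching "president" with "presidential" would additionally require some normalization such as stemming, which this sketch deliberately leaves out.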
• Queries and documents vary in length
o Models must handle variable-length input
o Relevant documents contain irrelevant content
Q: "Omicron variant symptoms"
D: "The Omicron variant has already reached several patients in France after first appearing in South Africa. While it seems more
transmissible, it is reportedly no more virulent. But what are its symptoms?
On 26 November, the World Health Organization (WHO) classified the Omicron variant, newly emerged in South
Africa, as "of concern" on the basis of how quickly it spreads. Many cases have since begun to emerge around the
world, including a few in France.
But as for how dangerous it is or what its symptoms are, much remains unclear. So, what do we know?
Based on the situations in South Africa and the United Kingdom, the WHO stated in a technical briefing that the
Omicron variant appears to spread faster than Delta.
However, unlike Delta, its symptoms appear to be less severe.
No loss of taste or smell
Interviewed by the BBC, Dr Angelique Coetzee, chair of the South African Medical Association, who was one of the first to
encounter Omicron, said that the symptoms she observed seem less specific than those of the original
disease. "It started with a male patient around 33 years old," she explained in that interview.
"He said that he had been extremely tired for the past few days and complained of body aches and a mild headache." But
the man had not lost his sense of taste or smell; he had a "scratchy throat", rather than a sore throat and
a cough as with previous variants.
She also said that the other patients examined that same day "presented the same mild symptoms"."
Source: https://www.leprogres.fr/magazine-sante/2021/12/13/variant-omicron-quels-sont-les-premiers-symptomes-detectes
What makes information retrieval similar to vs. different from data retrieval (databases)?
                           | Information retrieval                   | Data retrieval
Information unit           | Information                             | Data (attribute-value)
Query                      | Vague expression of an information need | Vague expressio…
Language of the query      | Natural language                        | Formal language
Matching query-information | Approximate                             | Exact
Selected information       | Information relevant to the query       | All the data that satisfies the query
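The contrast between exact and approximate matching in the table can be sketched in Python. Everything here is an illustrative assumption: the toy records, the helper names `data_retrieval` and `information_retrieval`, and the keyword-overlap score used for ranking.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def data_retrieval(records, attribute, value):
    """Exact matching: return all records whose attribute equals the value."""
    return [r for r in records if r.get(attribute) == value]

def information_retrieval(records, query):
    """Approximate matching: rank record ids by the number of query words
    found in the title, keeping only records with a non-zero score."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(r["title"])), r["id"]) for r in records]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

# Toy records, invented for this example.
records = [
    {"id": 1, "title": "Introduction to text indexing", "year": 2020},
    {"id": 2, "title": "Neural ranking models for text search", "year": 2021},
    {"id": 3, "title": "Evaluation of search systems", "year": 2021},
]

exact = data_retrieval(records, "year", 2021)
ranked = information_retrieval(records, "text search models")
print([r["id"] for r in exact])
print(ranked)
```

The exact query returns every record that satisfies the condition, in no particular order of usefulness, while the keyword query returns a ranking in which partially matching records still appear, below the best match.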
The basic process of information retrieval
[Diagram]
- Indexing: documents → document representations
- Expression: information need → query
- Matching: query and document representations → selected documents
- Feedback: from the selected documents back to the query
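The four steps of this process (indexing, expression, matching, feedback) can be walked through in a minimal Python sketch. The inverted index, the term-frequency score, and the naive feedback rule (adding frequent terms of a selected document to the query) are assumptions chosen for brevity; they stand in for the models presented in later chapters, and all function names and the toy documents are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index(documents):
    """Indexing: map each term to {doc_id: term frequency}."""
    inverted = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            inverted[term][doc_id] = freq
    return inverted

def match(query_terms, inverted):
    """Matching: score documents by the summed frequency of the query terms."""
    scores = Counter()
    for term in set(query_terms):
        for doc_id, freq in inverted.get(term, {}).items():
            scores[doc_id] += freq
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

def feedback(query_terms, selected_text, n_terms=2):
    """Feedback: extend the query with frequent new terms of a selected document."""
    counts = Counter(t for t in tokenize(selected_text) if t not in query_terms)
    return list(query_terms) + [term for term, _ in counts.most_common(n_terms)]

# Toy documents, invented for this example.
documents = {
    "d1": "text indexing builds an index of text documents",
    "d2": "ranking models score documents for a query",
    "d3": "evaluation measures the quality of a ranking",
}

inverted = index(documents)                  # indexing
query_terms = tokenize("text documents")     # expression of the information need
selected = match(query_terms, inverted)      # matching
expanded = feedback(query_terms, documents[selected[0][0]])  # feedback
print(selected)
print(expanded)
```

In this toy run d1 is ranked first (score 3) and d2 second (score 1), and the feedback step produces a four-term query that could be matched again against the same index.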
• Lecture structure
o Introduction
o Chapter 1: Text indexing and representation
"How to transform raw texts into machine-processable representations?"
Keywords: indexing, words, documents, representation learning of texts
o Chapter 2: Information retrieval (IR) models
"How to score the relevance of a document as an answer to a user's
query?"
Keywords: relevance status value, retrieval model
o Chapter 3: Performance evaluation of an IR system
"How to measure the performance of an information retrieval system?"
Keywords: evaluation metrics, test collections
o Chapter 4: From question-answering systems to chatbots
"How to interact with systems while searching for information?"
Keywords: conversation, turn, clarification

More Related Content

Similar to Introduction to irs notes easy way learning

L yuan alt c 3
L yuan alt c 3L yuan alt c 3
L yuan alt c 3cetisli
 
ESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionErik Mannens
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsMarina Santini
 
Information Management
Information ManagementInformation Management
Information ManagementNadeem Raza
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciencesopenminted_eu
 
Rise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsRise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsLibrary_Connect
 
Synthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationSynthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationGeorge Roberts
 
Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?PEDAGOGY.IR
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaEnno Meijers
 
FORCE11: Creating a data and tools ecosystem
FORCE11:  Creating a data and tools ecosystemFORCE11:  Creating a data and tools ecosystem
FORCE11: Creating a data and tools ecosystemMaryann Martone
 
Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)UKOLN, University of Bath
 
Tacit knowledge sharing in virtual teams: is it even possible?
Tacit knowledge sharing in virtual teams:is it even possible?Tacit knowledge sharing in virtual teams:is it even possible?
Tacit knowledge sharing in virtual teams: is it even possible?Amanda Lam
 
De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012Anna De Liddo
 
Semi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termSemi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termPERICLES_FP7
 
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning AnalyticsEurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning AnalyticsSpeakApps Project
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...Prateek Singh
 
A presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxA presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxROHITSHARMA779690
 

Similar to Introduction to irs notes easy way learning (20)

L yuan alt c 3
L yuan alt c 3L yuan alt c 3
L yuan alt c 3
 
ESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking Session
 
Lecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive DatasetsLecture 7: Learning from Massive Datasets
Lecture 7: Learning from Massive Datasets
 
Information Management
Information ManagementInformation Management
Information Management
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
 
Rise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsRise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen Rombouts
 
Synthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationSynthesising JISC Institutional Innovation
Synthesising JISC Institutional Innovation
 
Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?Pedagogical theory for e-Learning Design: From ideals to reality?
Pedagogical theory for e-Learning Design: From ideals to reality?
 
Competitive & Saleable E-Content for Philippine Libraries
Competitive & Saleable E-Content for Philippine LibrariesCompetitive & Saleable E-Content for Philippine Libraries
Competitive & Saleable E-Content for Philippine Libraries
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL India
 
FORCE11: Creating a data and tools ecosystem
FORCE11:  Creating a data and tools ecosystemFORCE11:  Creating a data and tools ecosystem
FORCE11: Creating a data and tools ecosystem
 
Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)Digital Repositories in Teaching and Learning (ppt)
Digital Repositories in Teaching and Learning (ppt)
 
Universities Without Borders
Universities Without BordersUniversities Without Borders
Universities Without Borders
 
Tacit knowledge sharing in virtual teams: is it even possible?
Tacit knowledge sharing in virtual teams:is it even possible?Tacit knowledge sharing in virtual teams:is it even possible?
Tacit knowledge sharing in virtual teams: is it even possible?
 
De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012
 
Semi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-termSemi-automated metadata extraction in the long-term
Semi-automated metadata extraction in the long-term
 
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning AnalyticsEurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
Eurocall2014 SpeakApps Presentation - SpeakApps and Learning Analytics
 
ICTConcepts.ppt
ICTConcepts.pptICTConcepts.ppt
ICTConcepts.ppt
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
 
A presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxA presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptx
 

Recently uploaded

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 

Recently uploaded (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 

Introduction to irs notes easy way learning

  • 1. FOUNDATIONS OF INFORMATION RETRIEVAL Lynda Tamine-Lechani lynda.lechani@irit.fr https://www.irit.fr/~Lynda.Tamine-Lechani/
  • 2. FOUNDATIONS OF INFORMATION RETRIEVAL 2 •Course description Study the theory, design, and implementation of information retrieval systems from the perspectives of: ü information representation: focus on texts ü theoretical information retrieval model: focus on language model and learning- based models ü Performance evaluation: focus on system-centred evaluation •Learning objectives ü Index and represent textual information; ü Recall and discuss well-known information retrieval models; ü Design, implement and evaluate the performance of information retrieval systems using retrieval algorithms and models discussed in class. © L. Tamine-Lechani
  • 3. FOUNDATIONS OF INFORMATION RETRIEVAL 3 •Organization o 12H course, 6H tutorial: Lynda Tamine-Lechani o 10H hands-on work: Jesus-Lovon Melgajero, José G. Moréno and Lynda Tamine- Lechani •Prerequisites o Python programming o Basics in probability and statistics •Course material o Copies of the lecture slides are posted on the MOODLE site o Book and readings references are provided •Grading o 1st session üHands-on experience with techniques discussed in class: assignment of 30% of the final score üFinal written exam in class: assignment of 70% of the final score o 2nd session üFinal written exam in class: assignment of 100% of the final score © L. Tamine-Lechani
  • 4. FOUNDATIONS OF INFORMATION RETRIEVAL 4 •Schedule Lecture Topic 1 Course Introduction; Text indexing, vector semantics 2 Static embeddings, contextual embeddings 3 Infomation retrieval (IR) models: query reformulation, learning to rank 4 Tutorial 1: Text indexing and representation 5 Neural models for IR 6 Page Rank, Performance evaluation 7 Tutorial 2: information retrieval techniques and models 8 Question answering systems and chatbots 9 Tutorial 3: performance evaluation © L. Tamine-Lechani
  • 5. Information retrieval: Algorithms and Heuristics David A. Grossamnn, Ophir Frieder, Kluwer Academic Publishers, 1998 Modern information retrieval R.B Yates, R. Neto, ACM Press Addisson Wesley, 1999 Recherche d'information, applications, modèles et algorithmes M.R Amini et E. Gaussier, Eyrolles 2012 Search engines in practice B. Croft, D. Metzler, T. Trohman, Pearson 2010 Books 5 FOUNDATIONS OF INFORMATION RETRIEVAL © L. Tamine-Lechani
  • 6. Introduction. Information Retrieval (IR): definitions
      o Calvin Mooers, 1951: "Information retrieval (IR) is the name for the process or method whereby a prospective user of information is able to convert his need for information into an actual list of citations to documents in storage containing information useful to him. ... Information retrieval is crucial to documentation and organization of knowledge." (Mooers, 1951, p. 25)
      o Salton, 1980: "Information retrieval systems are designed to help analyze and describe the items stored in a file, to organize them and search among them, and finally to retrieve them in response to a user's query. Designing and using a retrieval system involves four major activities: information analysis, information organization and search, query formulation, and information retrieval and dissemination."
      o Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing.
  • 7. Do these definitions refer to well-known search engines?
      Yes, but they also refer to:
      - Search in digital libraries
      - Search in company corpora
      - Search in specialized corpora (health, legal, biological resources)
      - Search for a location
      - Search for answers
      - Recommend items
      - Summarize reviews
      - ...
  • 8. From search to conversation
      o A wide variety of search systems and interaction environments
          - Web search engines
          - Conversational agents
          - E-commerce: Amazon, Airbnb, ...
          - Media recommendation: Netflix, Spotify, ...
      o ...and different forms of user-system interaction
          - Search and navigate on maps
          - Cross-device search
          - Heatmaps on the search engine results page (SERP)
          - ...with voice only!
  • 9. Focus in this lecture
      (Web) search systems that select, from a corpus of text documents, those that are relevant to an information need that the user expresses as a query.
      [Figure: Information need → Query; Corpus → Documents; Selection produces the system's answer to the query]
  • 10. Basic notions: Document
      o Document: the information unit being searched
          - Document
          - Paragraph
          - Phrase
          - Structure unit (section, chapter, ...)
      o Different views
          - Content: "This course introduces the basics of information retrieval"
          - Metadata: Date: 15/01/2013; Author: Albert; Language: French; ...
          - Structure: "1. Introduction: Information retrieval ...", "2. Basics: The notion of query ..."
  • 11. Basic notions: Document
      o Different media
          - Text (monomedia)
          - Image
          - Video
          - Multimedia
  • 13. Basic notions: information need, query
      o What the user is looking for: an information need
      o How the user expresses this information need: a query
      In this course, a query is a list of keywords.
  • 14. Basic notions: Relevance
      o A key concept in information retrieval: a document is relevant if it matches the information need.
      o Numerous types of relevance:
          - Topical (aboutness) relevance: the document covers the query topic
          - Situational relevance: the document matches the user's situation (e.g., task, location, ...)
          - Cognitive relevance: the document fits the user's knowledge state
          - ...
      o ...and numerous criteria of relevance:
          - Novelty
          - Freshness
          - Language
          - Specificity
          - Trust
          - ...
      The main focus in this course is topical relevance: it is useful and "easy" to define and to measure, but it does not cover every aspect of relevance.
  • 15. What makes information retrieval challenging?
      [Figure, © NIST (TREC)]
  • 16. What makes information retrieval challenging?
      o Deluge of information
          - Large-scale information
          - Often only a small fraction of the information is relevant and/or useful for a query
          - Information is noisy
          - Information is not always trustworthy
          - Heterogeneous information forms and sources
          - ...
  • 17. Information is everywhere
      Increasing volumes of information are available from a growing number of sources: social applications, mobile devices, sensors, ...
      [Timeline, source: infographic]
          1972 ARPANET
          1990 WWW
          1994 E-commerce
          1995 Directory
          1998 Search
          1999 Blogs
          2001 Wiki
          2003 Social networks
  • 18. Focus on Web 3.0: The digital world today
      o 1st place: platforms for publishing/sharing texts (mostly), newsletters, podcasts, videos, photos
          - Wikipedia, Blogger, Google Podcasts, YouTube, Flickr, TripAdvisor, ...
      o 2nd place: platforms for messaging
          - Facebook, Messenger, Telegram, ...
      o 3rd place: platforms for conversations
          - Quora, StackExchange, Reddit, Facebook groups, Google Groups, ...
      o 4th place: platforms for collaboration
          - Facebook Workplace, TeamWork, Chatter, ...
      Image credit: https://fredcavazza.net/2021/05/06/panorama-des-medias-sociaux-2021/
  • 19. Some statistics 2020-2021: information and users
      o Statistics on the usage of information access systems, 2014-2020
          - In 2020, Google processed more than 7 billion queries every day, 15% of which had never been submitted before (new queries)
          - The number of social media users worldwide is estimated at 2.77 billion, against 2.46 billion in 2017
          - 51% of all advertising money spent worldwide in 2019, or more than 240 billion dollars, is expected to go to digital media
          - Online sales are expected to reach 3.45 trillion dollars in 2020
          - 47.3% of the world population is expected to shop online in 2020
        Source: https://datastudio.google.com/embed/reporting/1sImC_rjeWqNXdgQt5MtmrQMbH44qFjtA/page/1fzh
      o Users and information shared live, 2021
        Image credit: https://www.internetlivestats.com/
  • 20. What makes information retrieval challenging?
      o Information needs are ambiguous
          - Queries are generally short and ambiguous
          - The matching between queries and intents is M-N
      o 1 query → N intents
          - "Roi lion" ("Lion King")
      o M queries → 1 intent
          - "Master UPS Intelligence artificielle"
          - "Université paul Sabatier IA"
          - "Formation IA Toulouse"
          - "Matsre IAFA"
          - ...
  • 21. What makes information retrieval challenging?
      o Relevance is subjective
          - User-dependent
          - Situation-dependent
          - Topicality is often the minimum (threshold) requirement for relevance
      o Relevance faces vocabulary mismatch between queries and documents
          - Matching as word overlap: is it really semantic overlap?
              Q: "most jurisdictions exercise a high degree of regulation over banks" [financial institution]
              D1: "I was robbed when I withdrew the money from the bank" [building]
              D2: "fish lined the bank of the stream" [the land alongside or sloping down to a river or lake]
          - Matching is not exact: queries and documents only match roughly
              Q: "Presidential Elections in France"
              D1: "Election campaign is running" [relevant, but missing 'presidential' and 'France']
              D2: "Macron, the President of France is attending COP21" [irrelevant, yet matching 'France' and 'President']
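The rough-matching example on this slide can be made concrete with a minimal sketch. It assumes the simplest possible matcher, one that counts the distinct lowercase words shared by a query and a document, with no stemming and no stop-word removal; the function names `tokens` and `overlap` are illustrative and do not come from the course material.

```python
import re

def tokens(text):
    """Lowercase a text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(query, document):
    """Number of distinct words shared by the query and the document."""
    return len(tokens(query) & tokens(document))

query = "Presidential Elections in France"
d1 = "Election campaign is running"                        # relevant
d2 = "Macron, the President of France is attending COP21"  # irrelevant

print(overlap(query, d1))  # 0: "election" and "elections" are different tokens
print(overlap(query, d2))  # 1: only "france" matches exactly
```

Under this naive matcher the irrelevant document D2 outscores the relevant document D1. Reducing words to a common stem would let "election" match "elections", but it would also let "president" match "presidential", so word overlap alone still does not capture relevance.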
  • 22. What makes information retrieval challenging?
      o Queries and documents vary in length
          - Models must handle variable-length input
          - Relevant documents contain irrelevant content
      Q: "Omicron variant symptoms"
      D: "The Omicron variant has already infected several patients in France after first appearing in South Africa. Although it seems more transmissible, it is reportedly no more virulent. But what are its symptoms? On 26 November, the World Health Organization (WHO) classified the Omicron variant, newly emerged in South Africa, as 'of concern' on the basis of how fast it spreads. Many cases have since begun to emerge around the world, including a few in France. But as for how dangerous it is or what its symptoms are, great uncertainty remains. So, what do we know? Based on the situations in South Africa and the United Kingdom, the WHO indicated in a technical brief that the Omicron variant seems to spread faster than Delta. However, unlike Delta, its symptoms are reportedly less severe. No loss of taste or smell. Interviewed by the BBC, Dr Angelique Coetzee, chair of the South African Medical Association, who was one of the first to encounter Omicron, said that the symptoms she has observed seem less specific than those of the original disease. 'It started with a male patient about 33 years old,' she explained in the interview. 'He said he had been extremely tired for the past few days and complained of body aches and a slight headache.' But the man had not lost his sense of taste or smell; he had a 'scratchy throat', rather than a sore throat and a cough as with previous variants. She also said that the other patients examined the same day 'had the same mild symptoms'."
      Source: https://www.leprogres.fr/magazine-sante/2021/12/13/variant-omicron-quels-sont-les-premiers-symptomes-detectes
  • 23. How is information retrieval similar to, and different from, data retrieval (databases)?
                                    | Information retrieval                    | Data retrieval
      Information unit              | Information                              | Data (attribute-value)
      Query                         | Vague expression of an information need  | Precise expression of the requested data
      Language of the query         | Natural language                         | Formal language
      Matching query-information    | Approximate                              | Exact
      Selected information          | Information relevant to the query        | All the data that satisfies the query
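The contrast between exact and approximate matching in this comparison can be sketched in a few lines of Python. This is an illustration under assumptions, not the course's method: the toy `records` (including the author name "Marie") and the function names are hypothetical, and keyword counting stands in for a real retrieval model.

```python
# Toy collection: each record has attribute-value metadata and free text.
records = [
    {"id": 1, "author": "Albert", "text": "This course introduces the basics of information retrieval"},
    {"id": 2, "author": "Albert", "text": "The notion of query"},
    {"id": 3, "author": "Marie", "text": "Retrieval of information from text documents"},
]

def data_retrieval(records, attribute, value):
    """Exact matching: return the id of every record whose attribute equals the value."""
    return [r["id"] for r in records if r[attribute] == value]

def information_retrieval(records, keywords):
    """Approximate matching: rank records by how many query keywords their text contains."""
    scored = []
    for r in records:
        words = set(r["text"].lower().split())
        score = sum(1 for k in keywords if k.lower() in words)
        if score > 0:
            scored.append((score, r["id"]))
    # Highest score first; ties broken by ascending id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

print(data_retrieval(records, "author", "Albert"))
print(information_retrieval(records, ["information", "retrieval", "query"]))
```

The first call returns an unordered set of records that all satisfy the condition, whereas the second returns a ranking in which a record can appear even though it contains only some of the keywords.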
  • 24. The basic process of information retrieval
      [Figure]
      o Indexing: documents → document representations
      o Expression: information need → query
      o Matching: query and document representations → selected documents
      o Feedback
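The indexing and matching steps of this process can be sketched as follows. It is a minimal illustration that assumes an inverted index over whitespace-separated lowercase terms and a score equal to the number of distinct query terms a document contains; the function names and the three toy documents are hypothetical, and the feedback step is left out.

```python
from collections import defaultdict

def index(documents):
    """Indexing: map each term to the set of ids of the documents that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted

def match(inverted, query):
    """Matching: rank documents by the number of distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in inverted.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

documents = {
    "d1": "information retrieval systems",
    "d2": "database systems",
    "d3": "retrieval of information from text",
}
inverted = index(documents)
print(match(inverted, "information retrieval"))
```

Building the index once and consulting only the posting sets of the query terms means that matching never has to scan documents that share no term with the query.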
  • 25. FOUNDATIONS OF INFORMATION RETRIEVAL
      • Lecture structure
          o Introduction
          o Chapter 1: Text indexing and representation
            "How to transform raw texts into machine-processable representations?"
            Keywords: indexing, words, documents, representation learning of texts
          o Chapter 2: Information retrieval (IR) models
            "How to score the relevance of a document as an answer to a user's query?"
            Keywords: relevance status value, retrieval model
          o Chapter 3: Performance evaluation of an IR system
            "How to measure the performance of an information retrieval system?"
            Keywords: evaluation metrics, test collections
          o Chapter 4: From question-answering systems to chatbots
            "How to interact with systems while searching for information?"
            Keywords: conversation, turn, clarification