Introduction to Information Retrieval
Introduction to
Information Retrieval
Hinrich Schütze and Christina Lioma
Lecture 15-2: Learning to Rank
1
Introduction to Information Retrieval
Overview
❶ Learning Boolean Weights
❷ Learning Real-Valued Weights
❸ Rank Learning as Ordinal Regression
2
Introduction to Information Retrieval
Outline
❶ Learning Boolean Weights
❷ Learning Real-Valued Weights
❸ Rank Learning as Ordinal Regression
3
Introduction to Information Retrieval
4
Main Idea
 The aim of term weights (e.g. TF-IDF) is to measure term
salience
 Summing up the term weights for a document allows us to
measure the relevance of the document to a query, and hence
to rank the document
 Think of this as a text classification problem
 Term weights can be learned using training examples that have
been judged
 This methodology falls under a general class of approaches
known as machine learned relevance or learning to rank
4
Introduction to Information Retrieval
5
Learning weights
Main methodology
 Given a set of training examples, each of which is a tuple of: a
query q, a document d, a relevance judgment for d on q
 Simplest case: R(d, q) is either relevant (1) or nonrelevant (0)
 More sophisticated cases: graded relevance judgments
 Learn weights from these examples, so that the learned
scores approximate the relevance judgments in the training
examples
Example? Weighted zone scoring
5
Introduction to Information Retrieval
6
What is weighted zone scoring?
 Given a query and a collection where documents have three
zones (a.k.a. fields): author, title, body
 Weighted zone scoring requires a separate weight for each
zone, e.g. g1, g2, g3
 Not all zones are equally important:
e.g. author < title < body
→ g1 = 0.2, g2 = 0.3, g3 = 0.5 (so that they add up to 1)
 Score for a zone = 1 if the query term occurs in that zone, 0
otherwise (Boolean)
Example
Query term appears in title and body only
Document score: (0.3 ・ 1) + (0.5 ・ 1) = 0.8.
6
Introduction to Information Retrieval
7
Weighted zone scoring: let’s generalise
Given q and d, weighted zone scoring assigns to the pair (q, d) a
score in the interval [0,1] by computing a linear combination of
document zone scores, where each zone contributes a value
 Consider a set of documents, each of which has l zones
 Let g1, ..., gl ∈ [0, 1], such that Σ_{i=1}^{l} gi = 1
 For 1 ≤ i ≤ l, let si be the Boolean score denoting a match (or
non-match) between q and the ith zone
 E.g. si could be any Boolean function that maps the presence
of query terms in a zone to {0, 1}
Weighted zone score (a.k.a. ranked Boolean retrieval):
score(d, q) = Σ_{i=1}^{l} gi si
7
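A minimal sketch of this scoring scheme in Python (the zone names, weights, and toy document below are illustrative, not from the slides):

```python
# Ranked Boolean retrieval (weighted zone scoring): a minimal sketch.
# Zone names, weights, and the toy document/query are illustrative only.

def weighted_zone_score(query_terms, doc_zones, weights):
    """doc_zones maps zone name -> zone text; weights maps zone name -> g_i (summing to 1)."""
    score = 0.0
    for zone, g in weights.items():
        tokens = doc_zones.get(zone, "").lower().split()
        # Boolean zone score s_i: 1 if every query term occurs in the zone, else 0
        s = 1.0 if all(t in tokens for t in query_terms) else 0.0
        score += g * s
    return score

weights = {"author": 0.2, "title": 0.3, "body": 0.5}  # g1 + g2 + g3 = 1
doc = {"author": "christina lioma",
       "title": "learning to rank",
       "body": "this lecture is about learning to rank"}
print(weighted_zone_score(["rank"], doc, weights))    # matches title and body only -> 0.8
```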
Introduction to Information Retrieval
8
Weighted zone scoring and learning weights
 Weighted zone scoring may be viewed as learning a linear
function of the Boolean match scores contributed by the
various zones
 Bad news: labour-intensive assembly of user-generated
relevance judgments from which to learn the weights
 Especially in a dynamic collection (such as the Web)
 Good news: reduce the problem of learning the weights gi to a
simple optimisation problem
8
Introduction to Information Retrieval
9
Learning weights in weighted zone scoring
 Simple case: let documents have two zones: title, body
 The weighted zone scoring formula we saw before:
score(d, q) = Σ_{i=1}^{l} gi si (2)
 Given q and d, we want to compute sT(d, q) and sB(d, q),
depending on whether the title or body zone of d matches
query q
 We compute a score between 0 and 1 for each (d, q) pair
using sT (d, q) and sB(d, q) by using a constant g ∈ [0, 1]:
score(d, q) = g ・ sT (d, q) + (1 − g) ・ sB(d, q) (3)
9
Introduction to Information Retrieval
10
Learning weights: determine g from training
examples
 Training examples: triples of the form Φj = (dj, qj, r(dj, qj))
 A given training document dj and a given training query qj are
assessed by a human who decides r (dj , qj ) (either relevant or
nonrelevant)
Example
10
Introduction to Information Retrieval
11
Learning weights: determine g from training
examples
 For each training example Φj we have Boolean values sT(dj, qj)
and sB(dj, qj) that we use to compute a score:
score(dj, qj) = g ・ sT(dj, qj) + (1 − g) ・ sB(dj, qj) (4)
Example
11
Introduction to Information Retrieval
12
Learning weights
 We compare this computed score (score(dj , qj )) with the
human relevance judgment for the same document-query
pair (dj , qj )
 We quantise each relevant judgment as 1 and each
nonrelevant judgment as 0
 We define the error of the scoring function with weight g as
ϵ(g, Φj) = (r(dj, qj) − score(dj, qj))² (5)
 Then, the total error of a set of training examples is given by
Σj ϵ(g, Φj) (6)
 The problem of learning the constant g from the given training
examples then reduces to picking the value of g that
minimises the total error
12
Introduction to Information Retrieval
13
Exercise: Find the value of g that minimises total
error ϵ
❶ Quantise: relevant as 1, and nonrelevant as 0
❷Compute score:
score(dj, qj) = g ・ sT ( dj, qj ) + (1 − g ) ・ sB( dj, qj )
❸ Compute total error: Σj ϵ(g, Φj), where
ϵ(g, Φj) = (r(dj, qj) − score(dj, qj))²
❹Pick the value of g that minimises the total error
13
Training example (as used in the solution on the next slide):
Φj    sT(dj, qj)    sB(dj, qj)    r(dj, qj)
Φ1        1             1         relevant
Φ2        0             1         nonrelevant
Φ3        0             1         relevant
Φ4        0             0         nonrelevant
Φ5        1             1         relevant
Φ6        0             1         relevant
Φ7        1             0         nonrelevant
Introduction to Information Retrieval
14
Exercise solution
❶ Compute the scores score(dj, qj)
score(d1, q1) = g ・ 1 + (1 − g) ・ 1 = g + 1 − g = 1
score(d2, q2) = g ・ 0 + (1 − g) ・ 1 = 0 + 1 − g = 1 − g
score(d3, q3) = g ・ 0 + (1 − g) ・ 1 = 0 + 1 − g = 1 − g
score(d4, q4) = g ・ 0 + (1 − g) ・ 0 = 0 + 0 = 0
score(d5, q5) = g ・ 1 + (1 − g) ・ 1 = g + 1 − g = 1
score(d6, q6) = g ・ 0 + (1 − g) ・ 1 = 0 + 1 − g = 1 − g
score(d7, q7) = g ・ 1 + (1 − g) ・ 0 = g + 0 = g
❷ Compute total error Σj ϵ(g, Φj):
(1 − 1)² + (0 − 1 + g)² + (1 − 1 + g)² + (0 − 0)² + (1 − 1)² +
(1 − 1 + g)² + (0 − g)² = 3g² + (1 − g)²
❸ Pick the value of g that minimises the total error
Either range g between 0.1 and 0.9 and pick the g value that
minimises the error, or set the derivative 8g − 2 to zero,
giving the exact minimum g = 1/4
14
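The whole exercise can be checked with a few lines of Python; the training tuples are the (sT, sB, r) values from the table on the previous slide:

```python
# Learn g for two-zone weighted zone scoring by minimising the total
# squared error over the seven training examples (s_T, s_B, r) above.
examples = [(1, 1, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0),
            (1, 1, 1), (0, 1, 1), (1, 0, 0)]

def total_error(g):
    return sum((r - (g * sT + (1 - g) * sB)) ** 2 for sT, sB, r in examples)

# Grid search over g = 0.1, ..., 0.9, as the slide suggests
for g in [k / 10 for k in range(1, 10)]:
    print(g, total_error(g))
# g = 0.2 and g = 0.3 tie on this grid (error 0.76); the exact
# minimum of 3g^2 + (1 - g)^2 is at g = 1/4 (error 0.75)
```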
Introduction to Information Retrieval
Outline
❶ Learning Boolean Weights
❷ Learning Real-Valued Weights
❸ Rank Learning as Ordinal Regression
15
Introduction to Information Retrieval
16
A simple example of machine learned scoring
 So far, we considered a case where we had to combine
Boolean indicators of relevance
 Now consider more general factors that go beyond Boolean
functions of query term presence in document zones
16
Introduction to Information Retrieval
17
A simple example of machine learned scoring
 Setting: the scoring function is a linear combination of two
factors:
❶ the vector space cosine similarity between query and
document (denoted α)
❷ the minimum window width within which the query
terms lie (denoted ω)
 query term proximity is often very indicative of topical
relevance
 query term proximity gives an implementation of implicit
phrases
 Thus, we have one factor that depends on the statistics of
query terms in the document as a bag of words, and another
that depends on proximity weighting
17
Introduction to Information Retrieval
18
A simple example of machine learned scoring
Given a set of training examples, each with a relevance judgment
r(dj, qj). For each example we compute:
 vector space cosine similarity α
 window width ω
The result is a training set, with two real-valued features (α, ω)
Example
18
Introduction to Information Retrieval
19
The same thing seen graphically on a 2-D plane
19
Introduction to Information Retrieval
20
A simple example of machine learned scoring
 Again, let’s say: relevant = 1 and nonrelevant = 0
 We now seek a scoring function that combines the values of
the features to generate a value that is (close to) 0 or 1
 We want this function to agree with our set of training
examples as much as possible
Without loss of generality, a linear classifier will use a linear
combination of features of the form:
Score(d, q) = Score(α, ω) = aα + bω + c, (7)
with the coefficients a, b, c to be learned from the training data
20
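As a sketch, the coefficients a, b, c can be fit by least squares; the feature values and judgments below are invented for illustration:

```python
# Fit Score(α, ω) = a·α + b·ω + c by least squares on judged examples.
# The (α, ω, r) values are made up; real ones would come from assessors.
import numpy as np

alpha = np.array([0.03, 0.40, 0.75, 0.95, 0.62, 0.05])  # cosine similarity α
omega = np.array([37.0, 5.0, 3.0, 2.0, 4.0, 45.0])      # min window width ω
r = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 0.0])            # relevance judgment

X = np.column_stack([alpha, omega, np.ones_like(alpha)])  # columns: α, ω, bias
(a, b, c), *_ = np.linalg.lstsq(X, r, rcond=None)

# Threshold the learned plane at θ to declare relevant/nonrelevant (eq. 7)
theta = 0.5
print((a * alpha + b * omega + c) > theta)
```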
Introduction to Information Retrieval
21
A simple example of machine learned scoring
 The function Score(α, ω)
represents a plane “hanging
above” the figure
 Ideally this plane assumes
values close to 1 above the
points marked R, and values
close to 0 above the points
marked N
21
Introduction to Information Retrieval
22
A simple example of machine learned scoring
 We use thresholding: for a
(query, document) pair, we pick
a threshold value θ
 if Score(α, ω) > θ, we declare
the document relevant,
otherwise we declare it
nonrelevant
 As we know from SVMs, all
points that satisfy Score(α, ω)
= θ form a line (dashed here)
→ linear classifier that
separates relevant from
nonrelevant instances
22
Introduction to Information Retrieval
23
A simple example of machine learned scoring
Thus, the problem of making a binary relevant/nonrelevant
judgment given training examples turns into one of learning the
dashed line in the figure separating relevant from nonrelevant
training examples
 In the α-ω plane, this line can be written as a linear equation
involving α and ω, with two parameters (slope and intercept)
 We have already seen linear classification methods for choosing
this line
 Provided we can build a sufficiently rich collection of training
samples, we can thus altogether avoid hand-tuning score
functions
 Bottleneck: maintaining a suitably representative set of training
examples, whose relevance assessments must be made by
experts
23
Introduction to Information Retrieval
24
Result ranking by machine learning
 The above ideas can be readily generalized to functions of many
more than two variables
 In addition to cosine similarity and query term window, there
are lots of other indicators of relevance, e.g. PageRank-style
measures, document age, zone contributions, document length,
etc.
 If these measures can be calculated for a training document
collection with relevance judgments, any number of such
measures can be used to train a machine learning classifier
24
Introduction to Information Retrieval
25
134 features released by Microsoft Research on
16 June 2010
http://research.microsoft.com/en-us/projects/mslr/feature.aspx
Zones: body, anchor, title, url, whole document
Features: query term number, query term ratio, stream length,
idf, sum of term frequency, min of term frequency, max of term
frequency, mean of term frequency, variance of term frequency,
sum of stream length normalized term frequency, min of stream
length normalized term frequency, max of stream length
normalized term frequency, mean of stream length normalized term
frequency, variance of stream length normalized term frequency,
sum of tf*idf, min of tf*idf, max of tf*idf, mean of tf*idf, variance
of tf*idf, boolean model, vector space model, BM25, LMIR.ABS,
LMIR.DIR, LMIR.JM, number of slash in url, length of url, inlink
number, outlink number, PageRank, SiteRank, QualityScore,
QualityScore2, query-url click count, url click count, url dwell time.
25
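As an illustration, a few of the per-zone aggregates above can be computed as follows (toy text and tokenisation; this is not the MSLR feature code):

```python
# Toy computation of some MSLR-style aggregate features for one query/zone pair.
body = "learning to rank applies machine learning to ranking in retrieval".split()
query = ["learning", "rank"]

tf = [body.count(t) for t in query]   # term frequency of each query term in the zone
stream_length = len(body)
features = {
    "query term number": len(query),
    "query term ratio": sum(t in body for t in query) / len(query),
    "stream length": stream_length,
    "sum of term frequency": sum(tf),
    "min of term frequency": min(tf),
    "max of term frequency": max(tf),
    "mean of term frequency": sum(tf) / len(tf),
    "sum of stream length normalized term frequency": sum(tf) / stream_length,
}
print(features)
```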
Introduction to Information Retrieval
26
Result ranking by machine learning
However, approaching IR ranking like this is not necessarily the
right way to think about the problem
 Statisticians normally first divide problems into classification
problems (where a categorical variable is predicted) versus
regression problems (where a real number is predicted)
 In between is the specialised field of ordinal regression
where a ranking is predicted
 Machine learning for ad hoc retrieval is most properly
thought of as an ordinal regression problem, where the goal
is to rank a set of documents for a query, given training data
of the same sort
26
Introduction to Information Retrieval
Outline
❶ Learning Boolean Weights
❷ Learning Real-Valued Weights
❸ Rank Learning as Ordinal Regression
27
Introduction to Information Retrieval
28
IR ranking as ordinal regression
Why formulate IR ranking as an ordinal regression problem?
 because documents can be evaluated relative to other candidate
documents for the same query, rather than having to be mapped to a
global scale of goodness
 hence, the problem becomes easier: only a ranking is required,
rather than an absolute measure of relevance
Especially germane in web search, where the ranking at the very
top of the results list is exceedingly important
Structural SVM for IR ranking
Such work has been pursued using the structural SVM framework,
where the class being predicted is a ranking of results for a query
28
Introduction to Information Retrieval
29
The construction of a ranking SVM
 We begin with a set of judged queries
 For each training query q, we have a set of documents
returned in response to the query, which have been totally
ordered by a person for relevance to the query
 We construct a vector of features ψj = ψ(dj , q) for each
document/query pair, using features such as those discussed,
and many more
 For two documents di and dj, we then form the vector of
feature differences:
Φ(di, dj, q) = ψ(di, q) − ψ(dj, q) (8)
29
Introduction to Information Retrieval
30
The construction of a ranking SVM
 By hypothesis, one of di and dj has been judged more relevant
 If di is judged more relevant than dj, denoted di ≺ dj (di should
precede dj in the results ordering), then we will assign the
vector Φ(di, dj, q) the class yijq = +1; otherwise −1
 The goal then is to build a classifier which will return
wT Φ(di, dj, q) > 0 iff di ≺ dj (9)
30
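A sketch of this construction on synthetic data, using a plain linear SVM over the difference vectors (the structural SVM machinery is not shown; all data below is invented):

```python
# Pairwise transformation for a ranking SVM: a sketch with synthetic data.
# The ψ(d, q) vectors and the total order are invented for illustration.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# One query: 6 documents ordered most-to-least relevant, 4 features each;
# better-ranked documents get systematically larger feature values here
psi = rng.random((6, 4)) + np.linspace(1.0, 0.0, 6)[:, None]

X, y = [], []
for i in range(len(psi)):
    for j in range(len(psi)):
        if i != j:
            X.append(psi[i] - psi[j])      # Φ(di, dj, q) = ψ(di, q) − ψ(dj, q)
            y.append(+1 if i < j else -1)  # +1 iff di ≺ dj in the judged ordering

clf = LinearSVC(fit_intercept=False)       # learn w so that w·Φ > 0 iff di ≺ dj
clf.fit(np.array(X), np.array(y))
w = clf.coef_[0]

# Rank documents for a query by sorting on w·ψ(d, q), descending
print(np.argsort(-(psi @ w)))              # ideally [0 1 2 3 4 5]
```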
Introduction to Information Retrieval
31
Ranking SVM
 This approach has been used to build ranking functions which
outperform standard hand-built ranking functions in IR
evaluations on standard data sets
 See the references for papers that present such results (page
316)
31
Introduction to Information Retrieval
32
Note: Linear vs. nonlinear weighting
 Both of the methods that we’ve seen use a linear weighting of
document features that are indicators of relevance, as has
most work in this area
 Much of traditional IR weighting involves nonlinear scaling of
basic measurements (such as log-weighting of term frequency,
or idf)
 At the present time, machine learning is very good at
producing optimal weights for features in a linear
combination, but it is not good at coming up with good
nonlinear scalings of basic measurements
 This area remains the domain of human feature engineering
32
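For instance, the classic sublinear tf weighting and idf are hand-crafted nonlinear scalings of exactly this kind (N = collection size, df(t) = document frequency of term t):
wf(t, d) = 1 + log tf(t, d) if tf(t, d) > 0, and 0 otherwise
idf(t) = log(N / df(t))
A learned linear combination then only assigns weights to such pre-scaled features.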
Introduction to Information Retrieval
33
Recap
 The idea of learning ranking functions has been around for a
number of years, but it is only very recently that sufficient
machine learning knowledge, training document collections,
and computational power have come together to make this
method practical and exciting
 While skilled humans can do a very good job at defining
ranking functions by hand, hand tuning is difficult, and it has
to be done again for each new document collection and class
of users
33