SlideShare a Scribd company logo
 Is a document simply a sequence of words?
 Many structural components like authors, title, date of publication …..
 Metadata – data about documents
 Fields – document features where possible values are finite. Example – dates,
ISBN
 Zones – document features whose content can be arbitrary text fields. Example –
title, abstract
A user may specify requirements on fields and zones
One parametric index for each zone/field
Dictionary comes from a fixed vocabulary
Separate inverted index is build for each zone of the document
 Dictionary structure whatever vocabulary stems from the text of that zone
 Advantages:
 Reduced size of the dictionary
 Efficient query answering using weighted zone scoring
 Different field/zones have different importance in evaluating how a document
matches a query
 For a query q and a document d, weighted zone scoring assigns a pair to (q, d)
[(query, document)] a score in range [0, 1] by computing a linear combination of
zone scores
 Let each document has l zones. Let g1….gl belongs to [0,1] such that 𝑖=1
𝑙
𝑔𝑖 = 1
 Each field/zone contributes a Boolean value – let si be the Boolean score denoting a
match or absence between q and the ith zone
 The weighted zone is 𝑖=1
𝑙
𝑔𝑖 𝑠𝑖
 Consider the query Shakespeare in a collection in which each document has three
zones: author, title and body
 Boolean score function take the value 1 if the query term Shakespeare is present
in the zone otherwise 0
 Weight score term require three weights gbody, gtitle and gauthor
 Let gbody=0.5, gtitle=0.3 and gauthor=0.2
 Here author zone is least important, title zone is somewhat more and body
contributes the most
 Could have been specified by an expert
 Can be judged editorially
 Each training example is a tuple consisting of a query q and a document d and a
relevance judgment of q on d
 The judgment can be binary
 A judgment score can also be used
 Compute the weights such that the learned scores approximate the relevance
judgments as much as possible
 An optimization problem
 Consider two zones: title and body
 Compute Boolean variables sT(d, q), sB(d, q) depending on the query matching
 Compute a score between 0 and 1 by using the relation:
 Score (d, q) = g sT(d, q) + (1-g)sB(d, q)
 Constant g is determined from a set of training examples µj = (dj, qj, r(dj, qj))
 In each training example, each training document and a training query is accessed by a
human editor who delivers a relevant judgment r(dj, qj).
 For each training example µj ,we have Boolean values sT(d, q) and sB(d, q), that we use to
compute a score
 Error of scoring function
 µ(g, µj) = (r(dj, qj) – score(dj, qj))2
 Total error 𝑗 µ(g, µj)
 Let n01r (n01n) be the numbers of training examples that STitle = 0 and SBody = 1 and the
judgment is relevant (irrelevant). The contribution of those examples that STitle = 0 and
SBody = 1 to the total error is
 [1-(1-g)]2n01r + [0-(1-g)]2n01n
 The total error is (n01r+n10n)g2 + (n10r + n01n)(1-g)2 +n00r + n11n
 By differentiating with respect to g and setting the result to 0, the optimal value of g is

𝒏 𝟏𝟎𝒓
+𝒏 𝟎𝟏𝒏
𝒏 𝟏𝟎𝒓
+𝒏 𝟏𝟎𝒏
+𝒏 𝟎𝟏𝒓
+𝒏 𝟎𝟏𝒏

More Related Content

What's hot

Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification
Mahmoud Alfarra
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
guest0edcaf
 
Using SweetSpotSimilarity for Solr Fulltext Indexing
Using SweetSpotSimilarity for Solr Fulltext IndexingUsing SweetSpotSimilarity for Solr Fulltext Indexing
Using SweetSpotSimilarity for Solr Fulltext Indexing
Jay Luker
 
Document clustering for forensic analysis
Document clustering for forensic analysisDocument clustering for forensic analysis
Document clustering for forensic analysis
srinivasa teja
 
A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...
A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...
A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...
marxliouville
 
Similarity Measurement Preliminary Results
Similarity  Measurement  Preliminary ResultsSimilarity  Measurement  Preliminary Results
Similarity Measurement Preliminary Resultsxiaojuzheng
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
Sai Kiran Kadam
 
Av33274282
Av33274282Av33274282
Av33274282
IJERA Editor
 
Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositoriesfeiwin
 
Mini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingMini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic Modeling
Tomonari Masada
 
Cluster
ClusterCluster
Cluster
guest1babda
 
Algorithm and complexity
Algorithm and complexityAlgorithm and complexity
Algorithm and complexity
HABIB FIGA GUYE
 
Java -lec-6
Java -lec-6Java -lec-6
Java -lec-6
Zubair Khalid
 
Search: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalSearch: Probabilistic Information Retrieval
Search: Probabilistic Information Retrieval
Vipul Munot
 
Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
Prakash Dubey
 

What's hot (18)

Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Using SweetSpotSimilarity for Solr Fulltext Indexing
Using SweetSpotSimilarity for Solr Fulltext IndexingUsing SweetSpotSimilarity for Solr Fulltext Indexing
Using SweetSpotSimilarity for Solr Fulltext Indexing
 
Document clustering for forensic analysis
Document clustering for forensic analysisDocument clustering for forensic analysis
Document clustering for forensic analysis
 
A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...
A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...
A+Novel+Approach+Based+On+Prototypes+And+Rough+Sets+For+Document+And+Feature+...
 
Similarity Measurement Preliminary Results
Similarity  Measurement  Preliminary ResultsSimilarity  Measurement  Preliminary Results
Similarity Measurement Preliminary Results
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
Av33274282
Av33274282Av33274282
Av33274282
 
Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositories
 
Bl24409420
Bl24409420Bl24409420
Bl24409420
 
Lect4
Lect4Lect4
Lect4
 
Mini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingMini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic Modeling
 
A-Study_TopicModeling
A-Study_TopicModelingA-Study_TopicModeling
A-Study_TopicModeling
 
Cluster
ClusterCluster
Cluster
 
Algorithm and complexity
Algorithm and complexityAlgorithm and complexity
Algorithm and complexity
 
Java -lec-6
Java -lec-6Java -lec-6
Java -lec-6
 
Search: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalSearch: Probabilistic Information Retrieval
Search: Probabilistic Information Retrieval
 
Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
 

Similar to Scoring, term weighting and the vector space

Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...
Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...
Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...
Dwaipayan Roy
 
learning boolean weight learning real valued weights rank learning as ordina...
learning boolean weight learning real valued weights  rank learning as ordina...learning boolean weight learning real valued weights  rank learning as ordina...
learning boolean weight learning real valued weights rank learning as ordina...
jaishriramm0
 
Search Engines
Search EnginesSearch Engines
Search Enginesbutest
 
Ir 08
Ir   08Ir   08
Text categorization
Text categorizationText categorization
Text categorization
KU Leuven
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
Primya Tamil
 
Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrieval
YI-JHEN LIN
 
Part 1
Part 1Part 1
Part 1butest
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
Datamining Tools
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
DataminingTools Inc
 
Subjective evaluation answer ppt
Subjective evaluation answer pptSubjective evaluation answer ppt
Subjective evaluation answer ppt
Mrunal Pagnis
 
IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptx
thenmozhip8
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
Rebecca Bilbro
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
PyData
 
Chapter 4 IR Models.pdf
Chapter 4 IR Models.pdfChapter 4 IR Models.pdf
Chapter 4 IR Models.pdf
Habtamu100
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligencevini89
 
TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
naveenchaurasia
 

Similar to Scoring, term weighting and the vector space (20)

Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...
Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...
Representing Documents and Queries as Sets of Word Embedded Vectors for Infor...
 
learning boolean weight learning real valued weights rank learning as ordina...
learning boolean weight learning real valued weights  rank learning as ordina...learning boolean weight learning real valued weights  rank learning as ordina...
learning boolean weight learning real valued weights rank learning as ordina...
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
Ir 08
Ir   08Ir   08
Ir 08
 
Text categorization
Text categorizationText categorization
Text categorization
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 
Adaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrievalAdaptive relevance feedback in information retrieval
Adaptive relevance feedback in information retrieval
 
Part 1
Part 1Part 1
Part 1
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
[ppt]
[ppt][ppt]
[ppt]
 
[ppt]
[ppt][ppt]
[ppt]
 
Subjective evaluation answer ppt
Subjective evaluation answer pptSubjective evaluation answer ppt
Subjective evaluation answer ppt
 
IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptx
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
Chapter 4 IR Models.pdf
Chapter 4 IR Models.pdfChapter 4 IR Models.pdf
Chapter 4 IR Models.pdf
 
Lec1
Lec1Lec1
Lec1
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 

More from Ujjawal

fMRI in machine learning
fMRI in machine learningfMRI in machine learning
fMRI in machine learning
Ujjawal
 
Random forest
Random forestRandom forest
Random forestUjjawal
 
Neural network for machine learning
Neural network for machine learningNeural network for machine learning
Neural network for machine learningUjjawal
 
Information retrieval
Information retrievalInformation retrieval
Information retrievalUjjawal
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithmUjjawal
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
Ujjawal
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
Ujjawal
 
Vector space classification
Vector space classificationVector space classification
Vector space classification
Ujjawal
 
Bayes’ theorem and logistic regression
Bayes’ theorem and logistic regressionBayes’ theorem and logistic regression
Bayes’ theorem and logistic regressionUjjawal
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
Ujjawal
 

More from Ujjawal (10)

fMRI in machine learning
fMRI in machine learningfMRI in machine learning
fMRI in machine learning
 
Random forest
Random forestRandom forest
Random forest
 
Neural network for machine learning
Neural network for machine learningNeural network for machine learning
Neural network for machine learning
 
Information retrieval
Information retrievalInformation retrieval
Information retrieval
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithm
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
Vector space classification
Vector space classificationVector space classification
Vector space classification
 
Bayes’ theorem and logistic regression
Bayes’ theorem and logistic regressionBayes’ theorem and logistic regression
Bayes’ theorem and logistic regression
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 

Recently uploaded

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Scoring, term weighting and the vector space

  • 1.
  • 2.  Is a document simply a sequence of words?  Many structural components like authors, title, date of publication …..  Metadata – data about documents  Fields – document features where possible values are finite. Example – dates, ISBN  Zones – document features whose content can be arbitrary text fields. Example – title, abstract
  • 3. A user may specify requirements on fields and zones
  • 4. One parametric index for each zone/field Dictionary comes from a fixed vocabulary Separate inverted index is build for each zone of the document
  • 5.  Dictionary structure whatever vocabulary stems from the text of that zone  Advantages:  Reduced size of the dictionary  Efficient query answering using weighted zone scoring
  • 6.  Different field/zones have different importance in evaluating how a document matches a query  For a query q and a document d, weighted zone scoring assigns a pair to (q, d) [(query, document)] a score in range [0, 1] by computing a linear combination of zone scores  Let each document has l zones. Let g1….gl belongs to [0,1] such that 𝑖=1 𝑙 𝑔𝑖 = 1  Each field/zone contributes a Boolean value – let si be the Boolean score denoting a match or absence between q and the ith zone  The weighted zone is 𝑖=1 𝑙 𝑔𝑖 𝑠𝑖
  • 7.  Consider the query Shakespeare in a collection in which each document has three zones: author, title and body  Boolean score function take the value 1 if the query term Shakespeare is present in the zone otherwise 0  Weight score term require three weights gbody, gtitle and gauthor  Let gbody=0.5, gtitle=0.3 and gauthor=0.2  Here author zone is least important, title zone is somewhat more and body contributes the most
  • 8.  Could have been specified by an expert  Can be judged editorially  Each training example is a tuple consisting of a query q and a document d and a relevance judgment of q on d  The judgment can be binary  A judgment score can also be used  Compute the weights such that the learned scores approximate the relevance judgments as much as possible  An optimization problem
  • 9.  Consider two zones: title and body  Compute Boolean variables sT(d, q), sB(d, q) depending on the query matching  Compute a score between 0 and 1 by using the relation:  Score (d, q) = g sT(d, q) + (1-g)sB(d, q)  Constant g is determined from a set of training examples µj = (dj, qj, r(dj, qj))  In each training example, each training document and a training query is accessed by a human editor who delivers a relevant judgment r(dj, qj).  For each training example µj ,we have Boolean values sT(d, q) and sB(d, q), that we use to compute a score  Error of scoring function  µ(g, µj) = (r(dj, qj) – score(dj, qj))2  Total error 𝑗 µ(g, µj)
  • 10.  Let n01r (n01n) be the numbers of training examples that STitle = 0 and SBody = 1 and the judgment is relevant (irrelevant). The contribution of those examples that STitle = 0 and SBody = 1 to the total error is  [1-(1-g)]2n01r + [0-(1-g)]2n01n  The total error is (n01r+n10n)g2 + (n10r + n01n)(1-g)2 +n00r + n11n  By differentiating with respect to g and setting the result to 0, the optimal value of g is  𝒏 𝟏𝟎𝒓 +𝒏 𝟎𝟏𝒏 𝒏 𝟏𝟎𝒓 +𝒏 𝟏𝟎𝒏 +𝒏 𝟎𝟏𝒓 +𝒏 𝟎𝟏𝒏