SlideShare a Scribd company logo
1 of 11
Information Retrieval : 8
Term Weighting
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Term Weighting
• The terms of a document are not equally useful
for describing the document contents
• In fact, there are index terms which are simply
vaguer than others
• There are properties of an index term which are
useful for evaluating the importance of the term
in a document
– For instance, a word which appears in all documents
of a collection is completely useless for retrieval tasks
Term Weighting
• To characterize term importance, we associate a
weight wi,j > 0 with each term ki that occurs in the
document dj
– If ki that does not appear in the document dj , then
wi,j = 0.
• The weight wi,j quantifies the importance of the
index term ki for describing the contents of
document dj
• These weights are useful to compute a rank for
each document in the collection with regard to a
given query
Term Weighting
Term Weighting
• The weights wi,j can be computed using the frequencies of
occurrence of the terms within documents
• Let fi,j be the frequency of occurrence of index term ki in
• the document dj
• The total frequency of occurrence Fi of term ki in the
collection is defined as
• where N is the number of documents in the collection
Term Weighting
• The document frequency ni of a term ki is the number of
documents in which it occurs
• Notice that ni < Fi or ni = Fi.
• For instance, in the document collection below, the values fi,j ,
Fi and ni associated with the term do are
Term-term correlation matrix
• For classic information retrieval models, the index term
weights are assumed to be mutually independent
– This means that wi,j tells us nothing about wi+1,j
• This is clearly a simplification because occurrences of
index terms in a document are not uncorrelated
• For instance, the terms computer and network tend to
appear together in a document about computer
networks
– In this document, the appearance of one of these terms
attracts the appearance of the other
• Thus, they are correlated and their weights should
reflect this correlation.
Term-term correlation matrix
Term-term correlation matrix
TF-IDF Weights
• TF-IDF term weighting scheme:
– Term frequency (TF)
– Inverse document frequency (IDF)
– Foundations of the most popular term weighting
scheme in IR
Assignment
• Discuss in detail the concept of Term
Weighting and Term Correlation matrix

More Related Content

What's hot

Lecture 14 run time environment
Lecture 14 run time environmentLecture 14 run time environment
Lecture 14 run time environmentIffat Anjum
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notesBAIRAVI T
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Primya Tamil
 
block ciphers
block ciphersblock ciphers
block ciphersAsad Ali
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction) Primya Tamil
 
Secure communication
Secure communicationSecure communication
Secure communicationTushar Swami
 
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix ArrayHarshit Agarwal
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methodsProf.Nilesh Magar
 
Public Key Cryptosystem
Public Key CryptosystemPublic Key Cryptosystem
Public Key CryptosystemDevakumar Kp
 
Network security - OSI Security Architecture
Network security - OSI Security ArchitectureNetwork security - OSI Security Architecture
Network security - OSI Security ArchitectureBharathiKrishna6
 
Distributed Query Processing
Distributed Query ProcessingDistributed Query Processing
Distributed Query ProcessingMythili Kannan
 
Authentication Protocols
Authentication ProtocolsAuthentication Protocols
Authentication ProtocolsTrinity Dwarka
 
Classical encryption techniques
Classical encryption techniquesClassical encryption techniques
Classical encryption techniquesDr.Florence Dayana
 

What's hot (20)

Lecture 14 run time environment
Lecture 14 run time environmentLecture 14 run time environment
Lecture 14 run time environment
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 
block ciphers
block ciphersblock ciphers
block ciphers
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Vector space model in information retrieval
Vector space model in information retrievalVector space model in information retrieval
Vector space model in information retrieval
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction)
 
Secure communication
Secure communicationSecure communication
Secure communication
 
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix Array
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
Public Key Cryptosystem
Public Key CryptosystemPublic Key Cryptosystem
Public Key Cryptosystem
 
Firewalls
FirewallsFirewalls
Firewalls
 
Network security - OSI Security Architecture
Network security - OSI Security ArchitectureNetwork security - OSI Security Architecture
Network security - OSI Security Architecture
 
Distributed Query Processing
Distributed Query ProcessingDistributed Query Processing
Distributed Query Processing
 
Lec1,2
Lec1,2Lec1,2
Lec1,2
 
Authentication Protocols
Authentication ProtocolsAuthentication Protocols
Authentication Protocols
 
Steganography
SteganographySteganography
Steganography
 
Classical encryption techniques
Classical encryption techniquesClassical encryption techniques
Classical encryption techniques
 

Similar to Term Weighting and Correlation in Information Retrieval

IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptxthenmozhip8
 
Chapter 4 IR Models.pdf
Chapter 4 IR Models.pdfChapter 4 IR Models.pdf
Chapter 4 IR Models.pdfHabtamu100
 
Information retrieval 9 tf idf weights
Information retrieval 9 tf idf weightsInformation retrieval 9 tf idf weights
Information retrieval 9 tf idf weightsVaibhav Khanna
 
Information retrieval 6 ir models
Information retrieval 6 ir modelsInformation retrieval 6 ir models
Information retrieval 6 ir modelsVaibhav Khanna
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.pptBereketAraya
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.pptBereketAraya
 
IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsi...
IRS-Lecture-Notes irsirs    IRS-Lecture-Notes irsirs   IRS-Lecture-Notes irsi...IRS-Lecture-Notes irsirs    IRS-Lecture-Notes irsirs   IRS-Lecture-Notes irsi...
IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsi...onlmcq
 
Information retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based modelsInformation retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based modelsVaibhav Khanna
 
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...IJERA Editor
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdfHabtamu100
 
Using Class Frequency for Improving Centroid-based Text Classification
Using Class Frequency for Improving Centroid-based Text ClassificationUsing Class Frequency for Improving Centroid-based Text Classification
Using Class Frequency for Improving Centroid-based Text ClassificationIDES Editor
 
IRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET Journal
 
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...rahulmonikasharma
 
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...rahulmonikasharma
 

Similar to Term Weighting and Correlation in Information Retrieval (20)

IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptx
 
Chapter 4 IR Models.pdf
Chapter 4 IR Models.pdfChapter 4 IR Models.pdf
Chapter 4 IR Models.pdf
 
Ir models
Ir modelsIr models
Ir models
 
Information retrieval 9 tf idf weights
Information retrieval 9 tf idf weightsInformation retrieval 9 tf idf weights
Information retrieval 9 tf idf weights
 
UNIT 3 IRT.docx
UNIT 3 IRT.docxUNIT 3 IRT.docx
UNIT 3 IRT.docx
 
Information retrieval 6 ir models
Information retrieval 6 ir modelsInformation retrieval 6 ir models
Information retrieval 6 ir models
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.ppt
 
4-IR Models_new.ppt
4-IR Models_new.ppt4-IR Models_new.ppt
4-IR Models_new.ppt
 
IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsi...
IRS-Lecture-Notes irsirs    IRS-Lecture-Notes irsirs   IRS-Lecture-Notes irsi...IRS-Lecture-Notes irsirs    IRS-Lecture-Notes irsirs   IRS-Lecture-Notes irsi...
IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsirs IRS-Lecture-Notes irsi...
 
IR.pptx
IR.pptxIR.pptx
IR.pptx
 
Information retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based modelsInformation retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based models
 
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdf
 
Using Class Frequency for Improving Centroid-based Text Classification
Using Class Frequency for Improving Centroid-based Text ClassificationUsing Class Frequency for Improving Centroid-based Text Classification
Using Class Frequency for Improving Centroid-based Text Classification
 
IRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF Metric
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
3_Indexing.ppt
3_Indexing.ppt3_Indexing.ppt
3_Indexing.ppt
 
Hc3612711275
Hc3612711275Hc3612711275
Hc3612711275
 
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
 
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
Context based Document Indexing and Retrieval using Big Data Analytics - A Re...
 

More from Vaibhav Khanna

Information and network security 47 authentication applications
Information and network security 47 authentication applicationsInformation and network security 47 authentication applications
Information and network security 47 authentication applicationsVaibhav Khanna
 
Information and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithmInformation and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithmVaibhav Khanna
 
Information and network security 45 digital signature standard
Information and network security 45 digital signature standardInformation and network security 45 digital signature standard
Information and network security 45 digital signature standardVaibhav Khanna
 
Information and network security 44 direct digital signatures
Information and network security 44 direct digital signaturesInformation and network security 44 direct digital signatures
Information and network security 44 direct digital signaturesVaibhav Khanna
 
Information and network security 43 digital signatures
Information and network security 43 digital signaturesInformation and network security 43 digital signatures
Information and network security 43 digital signaturesVaibhav Khanna
 
Information and network security 42 security of message authentication code
Information and network security 42 security of message authentication codeInformation and network security 42 security of message authentication code
Information and network security 42 security of message authentication codeVaibhav Khanna
 
Information and network security 41 message authentication code
Information and network security 41 message authentication codeInformation and network security 41 message authentication code
Information and network security 41 message authentication codeVaibhav Khanna
 
Information and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithmInformation and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithmVaibhav Khanna
 
Information and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithmInformation and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithmVaibhav Khanna
 
Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...Vaibhav Khanna
 
Information and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authenticationInformation and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authenticationVaibhav Khanna
 
Information and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theoremInformation and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theoremVaibhav Khanna
 
Information and network security 34 primality
Information and network security 34 primalityInformation and network security 34 primality
Information and network security 34 primalityVaibhav Khanna
 
Information and network security 33 rsa algorithm
Information and network security 33 rsa algorithmInformation and network security 33 rsa algorithm
Information and network security 33 rsa algorithmVaibhav Khanna
 
Information and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystemsInformation and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystemsVaibhav Khanna
 
Information and network security 31 public key cryptography
Information and network security 31 public key cryptographyInformation and network security 31 public key cryptography
Information and network security 31 public key cryptographyVaibhav Khanna
 
Information and network security 30 random numbers
Information and network security 30 random numbersInformation and network security 30 random numbers
Information and network security 30 random numbersVaibhav Khanna
 
Information and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithmInformation and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithmVaibhav Khanna
 
Information and network security 28 blowfish
Information and network security 28 blowfishInformation and network security 28 blowfish
Information and network security 28 blowfishVaibhav Khanna
 
Information and network security 27 triple des
Information and network security 27 triple desInformation and network security 27 triple des
Information and network security 27 triple desVaibhav Khanna
 

More from Vaibhav Khanna (20)

Information and network security 47 authentication applications
Information and network security 47 authentication applicationsInformation and network security 47 authentication applications
Information and network security 47 authentication applications
 
Information and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithmInformation and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithm
 
Information and network security 45 digital signature standard
Information and network security 45 digital signature standardInformation and network security 45 digital signature standard
Information and network security 45 digital signature standard
 
Information and network security 44 direct digital signatures
Information and network security 44 direct digital signaturesInformation and network security 44 direct digital signatures
Information and network security 44 direct digital signatures
 
Information and network security 43 digital signatures
Information and network security 43 digital signaturesInformation and network security 43 digital signatures
Information and network security 43 digital signatures
 
Information and network security 42 security of message authentication code
Information and network security 42 security of message authentication codeInformation and network security 42 security of message authentication code
Information and network security 42 security of message authentication code
 
Information and network security 41 message authentication code
Information and network security 41 message authentication codeInformation and network security 41 message authentication code
Information and network security 41 message authentication code
 
Information and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithmInformation and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithm
 
Information and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithmInformation and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithm
 
Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...
 
Information and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authenticationInformation and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authentication
 
Information and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theoremInformation and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theorem
 
Information and network security 34 primality
Information and network security 34 primalityInformation and network security 34 primality
Information and network security 34 primality
 
Information and network security 33 rsa algorithm
Information and network security 33 rsa algorithmInformation and network security 33 rsa algorithm
Information and network security 33 rsa algorithm
 
Information and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystemsInformation and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystems
 
Information and network security 31 public key cryptography
Information and network security 31 public key cryptographyInformation and network security 31 public key cryptography
Information and network security 31 public key cryptography
 
Information and network security 30 random numbers
Information and network security 30 random numbersInformation and network security 30 random numbers
Information and network security 30 random numbers
 
Information and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithmInformation and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithm
 
Information and network security 28 blowfish
Information and network security 28 blowfishInformation and network security 28 blowfish
Information and network security 28 blowfish
 
Information and network security 27 triple des
Information and network security 27 triple desInformation and network security 27 triple des
Information and network security 27 triple des
 

Recently uploaded

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 

Recently uploaded (20)

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 

Term Weighting and Correlation in Information Retrieval

  • 1. Information Retrieval : 8 Term Weighting Prof Neeraj Bhargava Vaibhav Khanna Department of Computer Science School of Engineering and Systems Sciences Maharshi Dayanand Saraswati University Ajmer
  • 2. Term Weighting • The terms of a document are not equally useful for describing the document contents • In fact, there are index terms which are simply vaguer than others • There are properties of an index term which are useful for evaluating the importance of the term in a document – For instance, a word which appears in all documents of a collection is completely useless for retrieval tasks
  • 3. Term Weighting • To characterize term importance, we associate a weight wi,j > 0 with each term ki that occurs in the document dj – If ki that does not appear in the document dj , then wi,j = 0. • The weight wi,j quantifies the importance of the index term ki for describing the contents of document dj • These weights are useful to compute a rank for each document in the collection with regard to a given query
  • 5. Term Weighting • The weights wi,j can be computed using the frequencies of occurrence of the terms within documents • Let fi,j be the frequency of occurrence of index term ki in • the document dj • The total frequency of occurrence Fi of term ki in the collection is defined as • where N is the number of documents in the collection
  • 6. Term Weighting • The document frequency ni of a term ki is the number of documents in which it occurs • Notice that ni < Fi or ni = Fi. • For instance, in the document collection below, the values fi,j , Fi and ni associated with the term do are
  • 7. Term-term correlation matrix • For classic information retrieval models, the index term weights are assumed to be mutually independent – This means that wi,j tells us nothing about wi+1,j • This is clearly a simplification because occurrences of index terms in a document are not uncorrelated • For instance, the terms computer and network tend to appear together in a document about computer networks – In this document, the appearance of one of these terms attracts the appearance of the other • Thus, they are correlated and their weights should reflect this correlation.
  • 10. TF-IDF Weights • TF-IDF term weighting scheme: – Term frequency (TF) – Inverse document frequency (IDF) – Foundations of the most popular term weighting scheme in IR
  • 11. Assignment • Discuss in detail the concept of Term Weighting and Term Correlation matrix