Information Retrieval : 20
Divergence from Randomness
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Divergence from Randomness
• A distinct probabilistic model has been
proposed by Amati and van Rijsbergen
• The idea is to compute term weights by
measuring the divergence between a term
distribution produced by a random process
and the actual term distribution
• Thus, the name divergence from randomness
• The model is based on two fundamental
assumptions, as follows.
First assumption:
• Not all words are equally important for describing
the content of the documents
• Words that carry little information are assumed to
be randomly distributed over the whole
document collection C
• Given a term ki, its probability distribution over
the whole collection is referred to as P(ki|C)
• The amount of information associated with this
distribution is given by −log P(ki|C)
• By modifying this probability function, we can
implement distinct notions of term randomness
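To make the information measure −log P(ki|C) concrete, here is a minimal Python sketch. The base-2 logarithm and the function name information_content are assumptions made for illustration; the slides do not fix a base.

```python
import math


def information_content(p: float) -> float:
    """Return -log2(p): the information, in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


# A distribution that is likely under the random model carries little
# information; an unlikely one carries a lot.
for p in (0.5, 0.01, 1e-6):
    print(f"P = {p:g} -> {information_content(p):.2f} bits")
```

The smaller P(ki|C) is, the larger the value returned, which matches the idea that terms diverging from the random model are the informative ones.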
Second assumption:
• A complementary term distribution can be obtained by
considering just the subset of documents that contain
term ki
• This subset is referred to as the elite set
• The corresponding probability distribution, computed
with regard to document dj, is referred to as P(ki|dj)
• The smaller the probability of observing a term ki in a
document dj, the rarer and more important the term is
considered to be
• Thus, the amount of information associated with the
term in the elite set is defined as 1 − P(ki|dj)
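In the DFR framework the two information measures are usually combined by multiplying them, giving a term weight of the form (−log P(ki|C)) × (1 − P(ki|dj)). The Python sketch below illustrates that product. The function name dfr_weight, the base-2 logarithm, and the sample probabilities are assumptions chosen for illustration, not values from the slides.

```python
import math


def dfr_weight(p_collection: float, p_elite: float) -> float:
    """Multiply the two information measures into a single term weight.

    p_collection: P(ki|C), probability of the term distribution under the random model.
    p_elite:      P(ki|dj), probability computed over the elite set for document dj.
    """
    if not 0.0 < p_collection <= 1.0:
        raise ValueError("p_collection must lie in (0, 1]")
    if not 0.0 <= p_elite <= 1.0:
        raise ValueError("p_elite must lie in [0, 1]")
    info_random = -math.log2(p_collection)  # information from the first assumption
    info_elite = 1.0 - p_elite              # information from the second assumption
    return info_random * info_elite


# Hypothetical probabilities: a rarer term (smaller values) gets a larger weight.
print(dfr_weight(0.25, 0.5))
print(dfr_weight(0.001, 0.1))
```

A term that is both unlikely under the random model and unlikely within its elite set receives the highest weight.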
Random Distribution
• To compute the distribution of terms in the collection,
distinct probability models can be considered
• For instance, consider that Bernoulli trials are used to
model the occurrences of a term in the collection
• To illustrate, consider a collection with 1,000 documents
and a term ki that occurs 10 times in the collection
• Then, the probability of observing 4 occurrences of term
ki in a document is given by a binomial distribution with
10 trials and success probability 1/1000
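With one trial per occurrence (10 trials) and a probability p = 1/1000 that an occurrence lands in a given document, this is the binomial probability C(10, 4) p^4 (1 − p)^6. A minimal Python sketch evaluates it; the function and parameter names are illustrative assumptions.

```python
import math


def binomial_term_probability(occurrences_in_doc: int,
                              total_occurrences: int,
                              num_docs: int) -> float:
    """Probability that a given document receives `occurrences_in_doc` of the
    term's `total_occurrences`, when each occurrence falls in that document
    independently with probability 1 / num_docs."""
    p = 1.0 / num_docs
    k, n = occurrences_in_doc, total_occurrences
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


# The example from the slides: 1,000 documents, 10 occurrences, 4 in one document.
prob = binomial_term_probability(4, 10, 1000)
print(f"{prob:.3e}")
```

The result is on the order of 2 × 10^-10, so seeing 4 of the 10 occurrences in a single document would be very unlikely if the term were randomly distributed.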
• For a large collection, where the probability of a term
falling in any one document is small, we can approximate
the binomial distribution by a Poisson process
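The standard Poisson limit applies when the per-document probability p becomes small while λ = p × F stays fixed, where F is the term's total number of occurrences. It gives P(ki|C) ≈ e^(−λ) λ^f / f! for f occurrences in a document. The Python sketch below compares this approximation with the exact binomial. The symbols λ, F and f and all function names are assumptions introduced here. With only 10 occurrences the two values differ noticeably; the agreement improves as F and the number of documents grow with λ held constant.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Slide example: F = 10 occurrences, N = 1,000 documents, so lambda = 0.01.
small_exact = binomial_pmf(4, 10, 1 / 1_000)
small_approx = poisson_pmf(4, 10 / 1_000)

# Same lambda = 0.01, but F = 10,000 occurrences over N = 1,000,000 documents.
large_exact = binomial_pmf(4, 10_000, 1 / 1_000_000)
large_approx = poisson_pmf(4, 10_000 / 1_000_000)

print(f"F = 10:     exact/approx = {small_exact / small_approx:.4f}")
print(f"F = 10,000: exact/approx = {large_exact / large_approx:.4f}")
```

The ratio moves toward 1 in the second case, which is the regime where replacing the binomial with the cheaper Poisson form is reasonable.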
Distribution over the Elite Set
Normalization
Assignment
• Explain the Information Retrieval Model of
Divergence from Randomness

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 

Information retrieval 20 divergence from randomness

  • 1. Information Retrieval : 20 Divergence from Randomness Prof Neeraj Bhargava Vaibhav Khanna Department of Computer Science School of Engineering and Systems Sciences Maharshi Dayanand Saraswati University Ajmer
  • 2. Divergence from Randomness • A distinct probabilistic model has been proposed by Amati and van Rijsbergen • The idea is to compute term weights by measuring the divergence between a term distribution produced by a random process and the actual term distribution • Hence the name divergence from randomness • The model is based on two fundamental assumptions, as follows.
  • 3. First assumption: • Not all words are equally important for describing the content of the documents • Words that carry little information are assumed to be randomly distributed over the whole document collection C • Given a term ki, its probability distribution over the whole collection is referred to as P(ki|C) • The amount of information associated with this distribution is given by −log P(ki|C) • By modifying this probability function, we can implement distinct notions of term randomness
  • 4. Second assumption • A complementary term distribution can be obtained by considering just the subset of documents that contain term ki • This subset is referred to as the elite set • The corresponding probability distribution, computed with regard to document dj, is referred to as P(ki|dj) • The smaller the probability of observing a term ki in a document dj, the rarer and more important the term is considered to be • Thus, the amount of information associated with the term in the elite set is defined as 1 − P(ki|dj)
  • 6. Random Distribution • To compute the distribution of terms in the collection, distinct probability models can be considered • For instance, consider that Bernoulli trials are used to model the occurrences of a term in the collection • To illustrate, consider a collection with 1,000 documents and a term ki that occurs 10 times in the collection • Then, the probability of observing 4 occurrences of term ki in a document is given by
  • 8. Random Distribution • Under these conditions, we can approximate the binomial distribution by a Poisson process, which yields
  • 12. Assignment • Explain the Information Retrieval Model of Divergence from Randomness
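
The formulas on the Random Distribution slides did not survive in this transcript, so the following is only a minimal sketch of the kind of calculation they describe, not the authors' exact expressions. It assumes that each occurrence of a term falls into a given document with probability p = 1/N (N being the number of documents), that the binomial count is approximated by a Poisson distribution with mean λ = n·p, and that the logarithm in −log P is base 2; the slides state none of these. The function names and the larger collection used for the Poisson comparison are hypothetical.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def information_content(prob: float) -> float:
    """-log2(prob); base 2 is an assumption, the slides give no base."""
    return -math.log2(prob)


# Slide example: 1,000 documents, term occurs 10 times in the collection,
# and we ask for 4 of those occurrences to land in one given document.
n_docs = 1_000
total_occurrences = 10
k = 4
p = 1 / n_docs  # assumed per-occurrence probability of hitting the document

p_slide = binomial_pmf(k, total_occurrences, p)
info_slide = information_content(p_slide)

# Hypothetical larger collection with the same mean (lambda = 0.01):
# many trials and a tiny p, the regime where the Poisson form is a
# reasonable stand-in for the binomial.
big_docs = 1_000_000
big_occurrences = 10_000
p_big = 1 / big_docs
lam = big_occurrences * p_big

p_big_binomial = binomial_pmf(k, big_occurrences, p_big)
p_big_poisson = poisson_pmf(k, lam)
relative_gap = abs(p_big_binomial - p_big_poisson) / p_big_binomial

print(f"binomial, slide example:       {p_slide:.4e}")
print(f"information content (bits):    {info_slide:.2f}")
print(f"binomial, larger collection:   {p_big_binomial:.4e}")
print(f"Poisson,  larger collection:   {p_big_poisson:.4e}")
print(f"relative gap:                  {relative_gap:.2%}")
```

With only 10 trials, as in the slide example, the Poisson value would be a poor match for the binomial one, which is why the comparison uses a larger hypothetical collection: the approximation relies on many trials and a very small per-trial probability.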