SlideShare a Scribd company logo
1 of 2
Download to read offline
Here is a simplified example of the vector space retrieval model. Consider a very
small collection C that consists in the following three documents:
d1: “new york times”
d2: “new york post”
d3: “los angeles times”
Some terms appear in two documents, some appear only in one document. The total
number of documents is N=3. Therefore, the idf values for the terms are:
angles log2(3/1)=1.584
los log2(3/1)=1.584
new log2(3/2)=0.584
post log2(3/1)=1.584
times log2(3/2)=0.584
york log2(3/2)=0.584
For all the documents, we calculate the tf scores for all the terms in C. We assume the
words in the vectors are ordered alphabetically.
angeles los new post times york
d1 0 0 1 0 1 1
d2 0 0 1 1 0 1
d3 1 1 0 0 1 0
Now we multiply the tf scores by the idf values of each term, obtaining the following
matrix of documents-by-terms: (All the terms appeared only once in each document in our
small collection, so the maximum value for normalization is 1.)
angeles los new post times york
d1 0 0 0.584 0 0.584 0.584
d2 0 0 0.584 1.584 0 0.584
d3 1.584 1.584 0 0 0.584 0
Given the following query: “new new times”, we calculate the tf-idf vector for the
query, and compute the score of each document in C relative to this query, using the cosine
similarity measure. When computing the tf-idf values for the query terms we divide the
frequency by the maximum frequency (2) and multiply with the idf values.
q 0 0 (2/2)*0.584=0.584 0 (1/2)*0.584=0.292 0
We calculate the length of each document and of the query:
Length of d1 = sqrt(0.584^2+0.584^2+0.584^2)=1.011
Length of d2 = sqrt(0.584^2+1.584^2+0.584^2)=1.786
Length of d3 = sqrt(1.584^2+1.584^2+0.584^2)=2.316
Length of q = sqrt(0.584^2+0.292^2)=0.652
Then the similarity values are:
cosSim(d1,q) = (0*0+0*0+0.584*0.584+0*0+0.584*0.292+0.584*0) / (1.011*0.652) = 0.776
cosSim(d2,q) = (0*0+0*0+0.584*0.584+1.584*0+0*0.292+0.584*0) / (1.786*0.652) = 0.292
cosSim(d3,q) = (1.584*0+1.584*0+0*0.584+0*0+0.584*0.292+0*0) / (2.316*0.652) = 0.112
According to the similarity values, the final order in which the documents are
presented as result to the query will be: d1, d2, d3.

More Related Content

What's hot

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingSaurabh Kaushik
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierNeha Kulkarni
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)Shuntaro Yada
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural language processing
Natural language processing Natural language processing
Natural language processing Md.Sumon Sarder
 
Natural language processing
Natural language processingNatural language processing
Natural language processingHansi Thenuwara
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & UnderfittingSOUMIT KAR
 
Text classification presentation
Text classification presentationText classification presentation
Text classification presentationMarijn van Zelst
 
Natural language processing
Natural language processingNatural language processing
Natural language processingAbash shah
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - IntroductionChristian Perone
 
Text classification
Text classificationText classification
Text classificationJames Wong
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data miningZHAO Sam
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingMercy Rani
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 

What's hot (20)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural language processing
Natural language processing Natural language processing
Natural language processing
 
NLP
NLPNLP
NLP
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Text summarization
Text summarization Text summarization
Text summarization
 
Text classification presentation
Text classification presentationText classification presentation
Text classification presentation
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
Text classification
Text classificationText classification
Text classification
 
NLP
NLPNLP
NLP
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 

Similar to tf idf example machine learning

Cosine tf idf_example
Cosine tf idf_exampleCosine tf idf_example
Cosine tf idf_examplehellangel13
 
ON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHT
ON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHTON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHT
ON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHTijitjournal
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqPradeep Bisht
 
Fast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s methodFast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s methodIAEME Publication
 
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIFast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIijeei-iaes
 
Options pricing using Lattice models
Options pricing using Lattice modelsOptions pricing using Lattice models
Options pricing using Lattice modelsQuasar Chunawala
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptxpallavidhade2
 
Binomial Theorem
Binomial TheoremBinomial Theorem
Binomial Theoremitutor
 
Fast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSPFast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSProykousik2020
 
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCES
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCESON RUN-LENGTH-CONSTRAINED BINARY SEQUENCES
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCESijitjournal
 

Similar to tf idf example machine learning (20)

Cosine tf idf_example
Cosine tf idf_exampleCosine tf idf_example
Cosine tf idf_example
 
renato_barros_arantes_article
renato_barros_arantes_articlerenato_barros_arantes_article
renato_barros_arantes_article
 
rs_1.pptx
rs_1.pptxrs_1.pptx
rs_1.pptx
 
ON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHT
ON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHTON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHT
ON THE COVERING RADIUS OF CODES OVER Z4 WITH CHINESE EUCLIDEAN WEIGHT
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conq
 
5th Semester Electronic and Communication Engineering (2013-June) Question Pa...
5th Semester Electronic and Communication Engineering (2013-June) Question Pa...5th Semester Electronic and Communication Engineering (2013-June) Question Pa...
5th Semester Electronic and Communication Engineering (2013-June) Question Pa...
 
2013-June: 5th Semester E & C Question Papers
2013-June: 5th Semester E & C Question Papers2013-June: 5th Semester E & C Question Papers
2013-June: 5th Semester E & C Question Papers
 
Lec 4,5
Lec 4,5Lec 4,5
Lec 4,5
 
Fast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s methodFast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s method
 
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-IIFast Algorithm for Computing the Discrete Hartley Transform of Type-II
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
 
E33018021
E33018021E33018021
E33018021
 
Options pricing using Lattice models
Options pricing using Lattice modelsOptions pricing using Lattice models
Options pricing using Lattice models
 
Rsa encryption
Rsa encryptionRsa encryption
Rsa encryption
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
 
Binomial theorem
Binomial theorem Binomial theorem
Binomial theorem
 
Binomial Theorem
Binomial TheoremBinomial Theorem
Binomial Theorem
 
Fast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSPFast Fourier Transform (FFT) Algorithms in DSP
Fast Fourier Transform (FFT) Algorithms in DSP
 
QUARTILE DEVIATION
QUARTILE DEVIATIONQUARTILE DEVIATION
QUARTILE DEVIATION
 
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCES
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCESON RUN-LENGTH-CONSTRAINED BINARY SEQUENCES
ON RUN-LENGTH-CONSTRAINED BINARY SEQUENCES
 
Unit 3
Unit 3Unit 3
Unit 3
 

Recently uploaded

Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.k64182334
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 

Recently uploaded (20)

Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 

tf idf example machine learning

  • 1. Here is a simplified example of the vector space retrieval model. Consider a very small collection C that consists in the following three documents: d1: “new york times” d2: “new york post” d3: “los angeles times” Some terms appear in two documents, some appear only in one document. The total number of documents is N=3. Therefore, the idf values for the terms are: angles log2(3/1)=1.584 los log2(3/1)=1.584 new log2(3/2)=0.584 post log2(3/1)=1.584 times log2(3/2)=0.584 york log2(3/2)=0.584 For all the documents, we calculate the tf scores for all the terms in C. We assume the words in the vectors are ordered alphabetically. angeles los new post times york d1 0 0 1 0 1 1 d2 0 0 1 1 0 1 d3 1 1 0 0 1 0 Now we multiply the tf scores by the idf values of each term, obtaining the following matrix of documents-by-terms: (All the terms appeared only once in each document in our small collection, so the maximum value for normalization is 1.) angeles los new post times york d1 0 0 0.584 0 0.584 0.584 d2 0 0 0.584 1.584 0 0.584 d3 1.584 1.584 0 0 0.584 0 Given the following query: “new new times”, we calculate the tf-idf vector for the query, and compute the score of each document in C relative to this query, using the cosine similarity measure. When computing the tf-idf values for the query terms we divide the frequency by the maximum frequency (2) and multiply with the idf values. q 0 0 (2/2)*0.584=0.584 0 (1/2)*0.584=0.292 0
  • 2. We calculate the length of each document and of the query: Length of d1 = sqrt(0.584^2+0.584^2+0.584^2)=1.011 Length of d2 = sqrt(0.584^2+1.584^2+0.584^2)=1.786 Length of d3 = sqrt(1.584^2+1.584^2+0.584^2)=2.316 Length of q = sqrt(0.584^2+0.292^2)=0.652 Then the similarity values are: cosSim(d1,q) = (0*0+0*0+0.584*0.584+0*0+0.584*0.292+0.584*0) / (1.011*0.652) = 0.776 cosSim(d2,q) = (0*0+0*0+0.584*0.584+1.584*0+0*0.292+0.584*0) / (1.786*0.652) = 0.292 cosSim(d3,q) = (1.584*0+1.584*0+0*0.584+0*0+0.584*0.292+0*0) / (2.316*0.652) = 0.112 According to the similarity values, the final order in which the documents are presented as result to the query will be: d1, d2, d3.