1
Introduction to Information
Retrieval
Jian-Yun Nie
University of Montreal
Canada
2
Outline
 What is the IR problem?
 How to organize an IR system? (Or the
main processes in IR)
 Indexing
 Retrieval
 System evaluation
 Some current research topics
3
The problem of IR
 Goal = find documents relevant to an information
need from a large document set
[Diagram: a user's information need is expressed as a query to the IR system, which performs retrieval over the document collection and returns an answer list.]
4
Example
[Screenshot: a Google Web search]
5
IR problem
 First applications: in libraries (1950s)
ISBN: 0-201-12227-8
Author: Salton, Gerard
Title: Automatic text processing: the transformation,
analysis, and retrieval of information by computer
Publisher: Addison-Wesley
Date: 1989
Content: <Text>
 external attributes and internal attribute (content)
 Search by external attributes = Search in DB
 IR: search by content
6
Possible approaches
1. String matching (linear search in
documents)
- Slow
- Difficult to improve
2. Indexing (*)
- Fast
- Flexible for further improvement
7
Indexing-based IR
[Diagram: documents undergo indexing, and queries undergo indexing (query analysis); each yields a keyword representation, and the two representations are compared during query evaluation.]
8
Main problems in IR
 Document and query indexing
 How to best represent their contents?
 Query evaluation (or retrieval process)
 To what extent does a document correspond
to a query?
 System evaluation
 How good is a system?
 Are the retrieved documents relevant?
(precision)
 Are all the relevant documents retrieved?
(recall)
9
Document indexing
 Goal = Find the important meanings and create an
internal representation
 Factors to consider:
 Accuracy to represent meanings (semantics)
 Exhaustiveness (cover all the contents)
 Ease of manipulation by computer
 What is the best representation of contents?
 Char. string (char trigrams): not precise enough
 Word: good coverage, not precise
 Phrase: poor coverage, more precise
 Concept: poor coverage, precise
[Figure: moving from string to word to phrase to concept, coverage (recall) decreases while accuracy (precision) increases.]
10
Keyword selection and weighting
 How to select important keywords?
 Simple method: using middle-frequency words
[Figure: word frequency and informativity plotted against rank; informativity is highest for middle-frequency words, between an upper (Max.) and a lower (Min.) frequency cut-off.]
11
tf*idf weighting scheme
 tf = term frequency
 frequency of a term/keyword in a document
The higher the tf, the higher the importance (weight) for the doc.
 df = document frequency
 no. of documents containing the term
 distribution of the term
 idf = inverse document frequency
 the unevenness of term distribution in the corpus
 the specificity of a term to a document
The more evenly a term is distributed over the corpus, the less specific it is to a document.
weight(t,D) = tf(t,D) * idf(t)
12
Some common tf*idf schemes
 Common tf variants:
tf(t, D) = freq(t,D)
tf(t, D) = log[freq(t,D)]
tf(t, D) = log[freq(t,D)] + 1
tf(t, D) = freq(t,D) / Max[freq(t,D)]
 idf(t) = log(N/n), where n = #docs containing t and N = #docs in corpus
weight(t,D) = tf(t,D) * idf(t)
 Normalization: Cosine normalization, /max, …
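To make the scheme concrete, here is a minimal Python sketch (not from the slides) combining the log-tf variant, idf, and cosine normalization; the toy documents reuse the stems from the indexing example later in the deck.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute tf*idf document vectors: tf = log(freq)+1, idf = log(N/n)."""
    N = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        freq = Counter(doc)
        vec = {t: (math.log(f) + 1) * math.log(N / df[t]) for t, f in freq.items()}
        # cosine normalization: divide by the vector's Euclidean length
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

docs = [["comput", "architect", "comput"], ["comput", "network"]]
print(tf_idf_vectors(docs))  # "comput" gets idf 0: it occurs in every document
```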
13
Document Length
Normalization
 Sometimes, additional normalizations are used, e.g. for document length:
pivoted_weight(t,D) = weight(t,D) / ((1 - slope) + slope * normalized_weight(t,D) / pivot)
[Figure: probability of relevance and probability of retrieval plotted against document length; the two curves cross at the pivot, and slope controls the strength of the correction.]
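A small sketch of the correction in Python, assuming the reconstruction above; the parameter values are made up (slopes around 0.2–0.3 are typical in the pivoted-normalization literature):

```python
def pivoted_weight(weight, norm, pivot, slope=0.25):
    """Pivoted length normalization: penalize documents shorter than the
    pivot and boost longer ones less than plain normalization would."""
    return weight / ((1 - slope) + slope * norm / pivot)

# e.g. norm = document length, pivot = average document length in the corpus
print(pivoted_weight(weight=0.8, norm=120, pivot=200))
```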
14
Stopwords / Stoplist
 function words do not bear useful information for IR
of, in, about, with, I, although, …
 Stoplist: contains stopwords, not to be used as index terms
 Prepositions
 Articles
 Pronouns
 Some adverbs and adjectives
 Some frequent words (e.g. document)
 The removal of stopwords usually improves IR effectiveness
 A few “standard” stoplists are commonly used.
15
Stemming
 Reason:
 Different word forms may bear similar meaning
(e.g. search, searching): create a “standard”
representation for them
 Stemming:
 Removing some endings of words:
computer, compute, computes, computing, computed, computation → comput
16
Porter algorithm
(Porter, M.F., 1980, An algorithm for suffix stripping,
Program, 14(3) :130-137)
 Step 1: plurals and past participles
 SSES -> SS caresses -> caress
 (*v*) ING -> motoring -> motor
 Step 2: adj->n, n->v, n->adj, …
 (m>0) OUSNESS -> OUS callousness -> callous
 (m>0) ATIONAL -> ATE relational -> relate
 Step 3:
 (m>0) ICATE -> IC triplicate -> triplic
 Step 4:
 (m>1) AL -> revival -> reviv
 (m>1) ANCE -> allowance -> allow
 Step 5:
 (m>1) E -> probate -> probat
 (m > 1 and *d and *L) -> single letter controll -> control
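To experiment with the algorithm without re-implementing the rules, an existing implementation can be used; this sketch assumes NLTK is installed:

```python
from nltk.stem import PorterStemmer  # assumes: pip install nltk

stemmer = PorterStemmer()
for word in ["caresses", "motoring", "relational", "allowance", "controll"]:
    print(word, "->", stemmer.stem(word))
```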
17
Lemmatization
 transform to standard form according to syntactic category.
E.g. verb + ing → verb
noun + s → noun
 Need POS tagging
 More accurate than stemming, but needs more resources
 Crucial to choose stemming/lemmatization rules:
noise vs. recognition rate
 Compromise between precision and recall:
light/no stemming: +precision, -recall; severe stemming: +recall, -precision
18
Result of indexing
 Each document is represented by a set of weighted
keywords (terms):
D1 → {(t1, w1), (t2, w2), …}
e.g. D1 → {(comput, 0.2), (architect, 0.3), …}
D2 → {(comput, 0.1), (network, 0.5), …}
 Inverted file:
comput → {(D1, 0.2), (D2, 0.1), …}
Inverted file is used during retrieval for higher efficiency.
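A minimal Python sketch of the two structures, using the weights from the example above:

```python
from collections import defaultdict

# direct file: document -> weighted terms
documents = {
    "D1": {"comput": 0.2, "architect": 0.3},
    "D2": {"comput": 0.1, "network": 0.5},
}

# inverted file: term -> postings list of (document, weight)
inverted = defaultdict(list)
for doc_id, terms in documents.items():
    for term, weight in terms.items():
        inverted[term].append((doc_id, weight))

print(inverted["comput"])  # [('D1', 0.2), ('D2', 0.1)]
```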
19
Retrieval
 The problems underlying retrieval
 Retrieval model
 How is a document represented with the
selected keywords?
 How are document and query representations
compared to calculate a score?
 Implementation
20
Cases
 1-word query:
The documents to be retrieved are those that
include the word
- Retrieve the inverted list for the word
- Sort in decreasing order of the weight of the word
 Multi-word query?
- Combining several lists
- How to interpret the weight?
(IR model)
21
IR models
 Matching score model
 Document D = a set of weighted keywords
 Query Q = a set of non-weighted keywords
 R(D, Q) = Σi w(ti, D)
where ti is in Q.
22
Boolean model
 Document = Logical conjunction of keywords
 Query = Boolean expression of keywords
 R(D, Q) = 1 iff D → Q
e.g. D = t1 ∧ t2 ∧ … ∧ tn
Q = (t1 ∧ t2) ∨ (t3 ∧ t4)
D → Q, thus R(D, Q) = 1.
Problems:
 R is either 1 or 0 (unordered set of documents)
 the answer may contain too many or too few documents
 End-users cannot manipulate Boolean operators correctly
E.g. documents about kangaroos and koalas
23
Extensions to Boolean model
(for document ordering)
 D = {…, (ti, wi), …}: weighted keywords
 Interpretation:
 D is a member of class ti to degree wi.
 In terms of fuzzy sets: μti(D) = wi
A possible evaluation:
R(D, ti) = μti(D);
R(D, Q1 ∧ Q2) = min(R(D, Q1), R(D, Q2));
R(D, Q1 ∨ Q2) = max(R(D, Q1), R(D, Q2));
R(D, ¬Q1) = 1 - R(D, Q1).
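A small Python sketch of this evaluation; the nested-tuple query encoding is a made-up convenience, not something from the slides:

```python
def eval_query(doc, query):
    """Evaluate a fuzzy Boolean query against doc = {term: weight}.
    Query encoding (hypothetical): a term string, or ('and', q1, q2, ...),
    ('or', q1, q2, ...), ('not', q1)."""
    if isinstance(query, str):
        return doc.get(query, 0.0)  # membership degree of the term
    op, *args = query
    if op == "and":
        return min(eval_query(doc, q) for q in args)
    if op == "or":
        return max(eval_query(doc, q) for q in args)
    if op == "not":
        return 1.0 - eval_query(doc, args[0])
    raise ValueError(f"unknown operator: {op}")

doc = {"t1": 0.7, "t2": 0.4, "t3": 0.9}
print(eval_query(doc, ("or", ("and", "t1", "t2"), "t3")))  # max(min(0.7,0.4), 0.9) = 0.9
```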
24
Vector space model
 Vector space = all the keywords encountered
<t1, t2, t3, …, tn>
 Document
D = < a1, a2, a3, …, an>
ai = weight of ti in D
 Query
Q = < b1, b2, b3, …, bn>
bi = weight of ti in Q
 R(D,Q) = Sim(D,Q)
25
Matrix representation
t1 t2 t3 … tn
D1 a11 a12 a13 … a1n
D2 a21 a22 a23 … a2n
D3 a31 a32 a33 … a3n
…
Dm am1 am2 am3 … amn
Q b1 b2 b3 … bn
Term vector
space
Document space
26
Some formulas for Sim
Dot product: Sim(D, Q) = Σi (ai * bi)
Cosine: Sim(D, Q) = Σi (ai * bi) / (sqrt(Σi ai^2) * sqrt(Σi bi^2))
Dice: Sim(D, Q) = 2 * Σi (ai * bi) / (Σi ai^2 + Σi bi^2)
Jaccard: Sim(D, Q) = Σi (ai * bi) / (Σi ai^2 + Σi bi^2 - Σi (ai * bi))
[Figure: D and Q drawn as vectors in the plane spanned by t1 and t2; the angle between them reflects their similarity.]
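The four measures sketched in Python over sparse vectors stored as {term: weight} dicts (the sparse representation is an assumption, consistent with the implementation slides that follow):

```python
import math

def dot(d, q):
    return sum(w * q[t] for t, w in d.items() if t in q)

def cosine(d, q):
    nd = math.sqrt(sum(w * w for w in d.values()))
    nq = math.sqrt(sum(w * w for w in q.values()))
    return dot(d, q) / (nd * nq) if nd and nq else 0.0

def dice(d, q):
    s = sum(w * w for w in d.values()) + sum(w * w for w in q.values())
    return 2 * dot(d, q) / s if s else 0.0

def jaccard(d, q):
    s = sum(w * w for w in d.values()) + sum(w * w for w in q.values()) - dot(d, q)
    return dot(d, q) / s if s else 0.0

D = {"t1": 0.2, "t2": 0.3}
Q = {"t1": 1.0}
print(dot(D, Q), cosine(D, Q), dice(D, Q), jaccard(D, Q))
```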
27
Implementation (space)
 Matrix is very sparse: a few hundred terms for
a document, and a few terms for a query,
while the term space is large (~100k)
 Stored as:
D1 → {(t1, a1), (t2, a2), …}
t1 → {(D1, a1), …}
28
Implementation (time)
 The implementation of VSM with dot product:
 Naïve implementation: O(m*n)
 Implementation using inverted file:
Given a query = {(t1,b1), (t2,b2)}:
1. find the sets of related documents through inverted file for
t1 and t2
2. calculate the score of the documents to each weighted term
(t1,b1) → {(D1, a1*b1), …}
3. combine the sets and sum the weights (Σ)
 O(|Q|*n)
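A sketch of this term-at-a-time evaluation in Python; the inverted index is the dict-of-postings structure sketched earlier:

```python
from collections import defaultdict

def retrieve(query, inverted):
    """query: {term: weight}. Accumulate partial dot products through the
    postings lists, one pass per query term, then sort by score."""
    scores = defaultdict(float)
    for term, b in query.items():
        for doc_id, a in inverted.get(term, []):  # postings list lookup
            scores[doc_id] += a * b               # accumulate a_i * b_i
    return sorted(scores.items(), key=lambda x: -x[1])

inverted = {"t1": [("D1", 0.2)], "t2": [("D1", 0.3), ("D2", 0.5)]}
print(retrieve({"t1": 1.0, "t2": 0.5}, inverted))  # [('D1', 0.35), ('D2', 0.25)]
```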
29
Other similarities
 Cosine:
Sim(D, Q) = Σi (ai * bi) / (sqrt(Σj aj^2) * sqrt(Σj bj^2))
- use sqrt(Σj aj^2) and sqrt(Σj bj^2) to normalize the
weights after indexing
- then query evaluation reduces to a dot product
(Similar operations do not apply to Dice and
Jaccard)
30
Probabilistic model
 Given D, estimate P(R|D) and P(NR|D)
 P(R|D)=P(D|R)*P(R)/P(D) (P(D), P(R) constant)
 P(D|R)
D = {t1=x1, t2=x2, …}
where xi = 1 if ti is present in D, 0 if absent
P(D|R) = Πi P(ti = xi | R)
= Πi P(ti = 1 | R)^xi * P(ti = 0 | R)^(1-xi)
= Πi pi^xi * (1 - pi)^(1-xi)
P(D|NR) = Πi P(ti = 1 | NR)^xi * P(ti = 0 | NR)^(1-xi)
= Πi qi^xi * (1 - qi)^(1-xi)
31
Prob. model (cont’d)
Odd(D) = log [P(D|R) / P(D|NR)]
= Σi xi * log(pi / qi) + Σi (1 - xi) * log[(1 - pi) / (1 - qi)]
= Σi xi * log [pi (1 - qi) / (qi (1 - pi))] + Σi log [(1 - pi) / (1 - qi)]
The last sum does not depend on the document, so for document ranking only
Σi xi * log [pi (1 - qi) / (qi (1 - pi))] matters.
32
Prob. model (cont’d)
 How to estimate pi and qi?
 A set of N relevant and
irrelevant samples:
            Rel. doc.    Irrel. doc.         Total
with ti     ri           ni - ri             ni
without ti  Ri - ri      N - Ri - ni + ri    N - ni
Total       Ri           N - Ri              N
Estimates: pi = ri / Ri    qi = (ni - ri) / (N - Ri)
33
Prob. model (cont’d)
 Smoothing (Robertson-Sparck-Jones formula)
 When no sample is available:
pi=0.5,
qi = (ni + 0.5) / (N + 0.5) ≈ ni / N
 May be implemented as VSM
Odd(D) = Σi xi * log [pi (1 - qi) / (qi (1 - pi))]
= Σ_ti∈D log [ ri (N - Ri - ni + ri) / ((Ri - ri)(ni - ri)) ]
With 0.5 added to each count for smoothing (the Robertson-Sparck-Jones weight):
Odd(D) = Σ_ti∈D log [ (ri + 0.5)(N - Ri - ni + ri + 0.5) / ((Ri - ri + 0.5)(ni - ri + 0.5)) ] = Σ_ti∈D wi
34
BM25
 k1, k2, k3, b: parameters
 qtf: query term frequency
 dl: document length
 avdl: average document length
Score(D, Q) = Σ_t∈Q w * [(k1 + 1) * tf / (K + tf)] * [(k3 + 1) * qtf / (k3 + qtf)]
              + k2 * |Q| * (avdl - dl) / (avdl + dl)
where K = k1 * ((1 - b) + b * dl / avdl)
and w is the term weight (e.g. the Robertson-Sparck-Jones weight).
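A Python sketch transcribing the formula above; the default parameter values (k1=1.2, b=0.75, k3=1000, k2=0) are common choices from the literature, not values given on the slide:

```python
def bm25_score(query_tf, doc_tf, weights, dl, avdl,
               k1=1.2, k2=0.0, k3=1000.0, b=0.75):
    """query_tf: {term: qtf}, doc_tf: {term: tf in D},
    weights: {term: w}, e.g. the RSJ/idf weight."""
    K = k1 * ((1 - b) + b * dl / avdl)
    score = 0.0
    for t, qtf in query_tf.items():
        tf = doc_tf.get(t, 0)
        score += (weights.get(t, 0.0)
                  * ((k1 + 1) * tf) / (K + tf)
                  * ((k3 + 1) * qtf) / (k3 + qtf))
    # document-length correction term
    score += k2 * len(query_tf) * (avdl - dl) / (avdl + dl)
    return score

print(bm25_score({"comput": 1}, {"comput": 3}, {"comput": 4.6}, dl=120, avdl=200))
```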
35
(Classic) Presentation of results
 Query evaluation result is a list of documents,
sorted by their similarity to the query.
 E.g.
doc1 0.67
doc2 0.65
doc3 0.54
…
36
System evaluation
 Efficiency: time, space
 Effectiveness:
 How capable is a system of retrieving relevant
documents?
 Is one system better than another?
 Metrics often used (together):
 Precision = retrieved relevant docs / retrieved docs
 Recall = retrieved relevant docs / relevant docs
[Figure: Venn diagram of the relevant and retrieved document sets; precision and recall measure their overlap.]
37
General form of precision/recall
Precision
1.0
Recall
1.0
-Precision changes w.r.t. recall (not a fixed point)
-Systems cannot be compared at a single precision/recall point
-Average precision (over 11 points of recall: 0.0, 0.1, …, 1.0)
38
An illustration of P/R
calculation
List Rel?
Doc1 Y
Doc2
Doc3 Y
Doc4 Y
Doc5
…
[Plot: precision against recall for this list, through the points
(0.2, 1.0), (0.2, 0.5), (0.4, 0.67), (0.6, 0.75), (0.6, 0.6), …]
Assume: 5 relevant docs.
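The calculation scripted as a Python sketch; it reproduces the points above from the ranked list, assuming 5 relevant documents in total:

```python
def pr_points(ranked_rel_flags, total_relevant):
    """Yield (recall, precision) after each retrieved document."""
    points, rel_so_far = [], 0
    for rank, is_rel in enumerate(ranked_rel_flags, start=1):
        rel_so_far += is_rel
        points.append((rel_so_far / total_relevant, rel_so_far / rank))
    return points

# Doc1..Doc5 with Doc1, Doc3, Doc4 relevant
print(pr_points([True, False, True, True, False], total_relevant=5))
# ≈ [(0.2, 1.0), (0.2, 0.5), (0.4, 0.67), (0.6, 0.75), (0.6, 0.6)]
```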
39
MAP (Mean Average Precision)
 rij = rank of the j-th relevant document for Qi
 |Ri| = #rel. doc. for Qi
 n = # test queries
MAP = (1/n) Σ_Qi (1/|Ri|) Σ_Dj∈Ri (j / rij)
 E.g. two test queries; query 1 has 3 relevant docs, ranked 1, 5, 10;
query 2 has 2 relevant docs, ranked 4, 8:
MAP = 1/2 * [ 1/3 * (1/1 + 2/5 + 3/10) + 1/2 * (1/4 + 2/8) ]
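A Python sketch of the computation, reproducing the two-query example above:

```python
def mean_average_precision(rel_ranks_per_query):
    """rel_ranks_per_query: for each query, the ranks r_ij of its
    relevant documents, in increasing order."""
    ap = [sum(j / r for j, r in enumerate(ranks, start=1)) / len(ranks)
          for ranks in rel_ranks_per_query]
    return sum(ap) / len(ap)

# query 1: relevant docs at ranks 1, 5, 10; query 2: at ranks 4, 8
print(mean_average_precision([[1, 5, 10], [4, 8]]))
# = 1/2 * [1/3*(1/1 + 2/5 + 3/10) + 1/2*(1/4 + 2/8)] ≈ 0.408
```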
40
Some other measures
 Noise = retrieved irrelevant docs / retrieved docs
 Silence = non-retrieved relevant docs / relevant docs
 Noise = 1 – Precision; Silence = 1 – Recall
 Fallout = retrieved irrel. docs / irrel. docs
 Single value measures:
 F-measure = 2 P * R / (P + R)
 Average precision = average at 11 points of recall
 Precision at n documents (often used for Web IR)
 Expected search length (number of irrelevant documents to read
before obtaining n relevant docs)
41
Test corpus
 Compare different IR systems on the same
test corpus
 A test corpus contains:
 A set of documents
 A set of queries
 Relevance judgment for every document-query pair
(desired answers for each query)
 The results of a system are compared with the
desired answers.
42
An evaluation example
(SMART)
Run number: 1 2
Num_queries: 52 52
Total number of documents over
all queries
Retrieved: 780 780
Relevant: 796 796
Rel_ret: 246 229
Recall - Precision Averages:
at 0.00 0.7695 0.7894
at 0.10 0.6618 0.6449
at 0.20 0.5019 0.5090
at 0.30 0.3745 0.3702
at 0.40 0.2249 0.3070
at 0.50 0.1797 0.2104
at 0.60 0.1143 0.1654
at 0.70 0.0891 0.1144
at 0.80 0.0891 0.1096
at 0.90 0.0699 0.0904
at 1.00 0.0699 0.0904
Average precision for all points
11-pt Avg: 0.2859 0.3092
% Change: 8.2
Recall:
Exact: 0.4139 0.4166
at 5 docs: 0.2373 0.2726
at 10 docs: 0.3254 0.3572
at 15 docs: 0.4139 0.4166
at 30 docs: 0.4139 0.4166
Precision:
Exact: 0.3154 0.2936
At 5 docs: 0.4308 0.4192
At 10 docs: 0.3538 0.3327
At 15 docs: 0.3154 0.2936
At 30 docs: 0.1577 0.1468
43
The TREC experiments
 Once per year
 A set of documents and queries are distributed
to the participants (the standard answers are
unknown) (April)
 Participants work (very hard) to construct and fine-tune
their systems, and submit the answers
(1000/query) at the deadline (July)
 NIST people manually evaluate the answers
and provide correct answers (and classification
of IR systems) (July – August)
 TREC conference (November)
44
TREC evaluation methodology
 Known document collection (>100K) and query set
(50)
 Submission of 1000 documents for each query by
each participant
 Merge the first 100 documents from each participant ->
global pool
 Human relevance judgment of the global pool
 The other documents are assumed to be irrelevant
 Evaluation of each system (with 1000 answers)
 Partial relevance judgments
 But stable for system ranking
45
Tracks (tasks)
 Ad Hoc track: given document collection, different
topics
 Routing (filtering): stable interests (user profile),
incoming document flow
 CLIR: Ad Hoc, but with queries in a different
language
 Web: a large set of Web pages
 Question-Answering: When did Nixon visit China?
 Interactive: put users into action with system
 Spoken document retrieval
 Image and video retrieval
 Information tracking: new topic / follow up
46
CLEF and NTCIR
 CLEF = Cross-Language Evaluation
Forum
 for European languages
 organized by Europeans
 Once per year (March – Oct.)
 NTCIR:
 Organized by NII (Japan)
 For Asian languages
 a 1.5-year cycle
47
Impact of TREC
 Provide large collections for further
experiments
 Compare different systems/techniques on
realistic data
 Develop new methodology for system
evaluation
 Similar experiments are organized in other
areas (NLP, Machine translation,
Summarization, …)
48
Some techniques to
improve IR effectiveness
 Interaction with user (relevance feedback)
- Keywords only cover part of the contents
- User can help by indicating relevant/irrelevant
document
 The use of relevance feedback
 To improve query expression:
Qnew = α*Qold + β*Rel_d - γ*NRel_d
where Rel_d = centroid of relevant documents
NRel_d = centroid of non-relevant documents
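A Python sketch of this update with vectors as {term: weight} dicts; the coefficient values α=1.0, β=0.75, γ=0.25 are common textbook defaults, not values from the slide:

```python
from collections import defaultdict

def centroid(docs):
    """Mean vector of a list of {term: weight} dicts."""
    acc = defaultdict(float)
    for d in docs:
        for t, w in d.items():
            acc[t] += w / len(docs)
    return acc

def rocchio(q_old, rel_docs, nrel_docs, alpha=1.0, beta=0.75, gamma=0.25):
    rel_c, nrel_c = centroid(rel_docs), centroid(nrel_docs)
    terms = set(q_old) | set(rel_c) | set(nrel_c)
    q_new = {t: alpha * q_old.get(t, 0) + beta * rel_c[t] - gamma * nrel_c[t]
             for t in terms}
    return {t: w for t, w in q_new.items() if w > 0}  # drop negative weights

print(rocchio({"retrieval": 1.0},
              rel_docs=[{"retrieval": 0.5, "index": 0.4}],
              nrel_docs=[{"table": 0.6}]))
```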
49
Effect of RF
[Figure: documents plotted as relevant (*) and irrelevant (x); the 1st retrieval around Q captures a mix of both, while after feedback the shifted query Qnew lands among the relevant documents, so the 2nd retrieval captures mostly relevant ones.]
50
Modified relevance feedback
 Users usually do not cooperate (e.g.
AltaVista in early years)
 Pseudo-relevance feedback (Blind RF)
 Using the top-ranked documents as if they
were relevant:
 Select m terms from n top-ranked documents
 One can usually obtain about 10% improvement
51
Query expansion
 A query contains only part of the important words
 Add new (related) terms into the query
 Manually constructed knowledge base/thesaurus
(e.g. WordNet)
 Q = information retrieval
 Q’ = (information + data + knowledge + …)
(retrieval + search + seeking + …)
 Corpus analysis:
 two terms that often co-occur are related (Mutual
information)
 Two terms that co-occur with the same words are
related (e.g. T-shirt and coat with wear, …)
52
Global vs. local context analysis
 Global analysis: use the whole document
collection to calculate term relationships
 Local analysis: use the query to retrieve a
subset of documents, then calculate term
relationships
 Combine pseudo-relevance feedback and term co-
occurrences
 More effective than global analysis
53
Some current research topics:
Go beyond keywords
 Keywords are not perfect representatives of concepts
 Ambiguity:
table = data structure, furniture?
 Lack of precision:
“operating”, “system” less precise than “operating_system”
 Suggested solution
 Sense disambiguation (difficult due to the lack of contextual
information)
 Using compound terms (no complete dictionary of
compound terms, variation in form)
 Using noun phrases (syntactic patterns + statistics)
 Still a long way to go
54
Theory …
 Bayesian networks
 P(Q|D)
[Figure: a Bayesian network in which document nodes D1 … Dm point to term nodes t1 … tn, which point to concept nodes c1 … cl; the query Q is evaluated by inference over the network, with revision.]
 Language models
55
Logical models
 How to describe the relevance
relation as a logical relation?
D => Q
 What are the properties of this
relation?
 How to combine uncertainty with a
logical framework?
 The problem: What is relevance?
56
Related applications:
Information filtering
 IR: changing queries on stable document collection
 IF: incoming document flow with stable interests
(queries)
 yes/no decision (instead of ordering documents)
 Advantage: the description of user’s interest may be
improved using relevance feedback (the user is more willing
to cooperate)
 Difficulty: adjust threshold to keep/ignore document
 The basic techniques used for IF are the same as those for
IR – “Two sides of the same coin”
[Diagram: an incoming stream (… doc3, doc2, doc1) enters the IF system, which matches each document against the user profile and decides to keep or ignore it.]
57
IR for (semi-)structured
documents
 Using structural information to assign weights
to keywords (Introduction, Conclusion, …)
 Hierarchical indexing
 Querying within some structure (search in
title, etc.)
 INEX experiments
 Using hyperlinks in indexing and retrieval
(e.g. Google)
 …
58
PageRank in Google
 Assign a numeric value to each page
 The more a page is referred to by important pages, the more this
page is important
 d: damping factor (0.85)
 Many other criteria: e.g. proximity of query words
 “…information retrieval …” better than “… information … retrieval …”
PR(A) = (1 - d) + d * Σi PR(Ii) / C(Ii)
where the Ii are the pages pointing to A and C(Ii) is the number of outgoing links of Ii.
[Figure: pages I1 and I2 point to page A, which points to page B.]
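A Python sketch of the iterative computation on a tiny made-up link graph:

```python
def pagerank(links, d=0.85, iterations=50):
    """links: {page: [pages it points to]}. Iterates
    PR(A) = (1 - d) + d * sum(PR(I) / C(I)) over the in-links I of A."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1 - d for p in pages}
        for page, outs in links.items():
            for target in outs:
                new[target] += d * pr[page] / len(outs)
        pr = new
    return pr

print(pagerank({"I1": ["A"], "I2": ["A"], "A": ["B"], "B": ["I1"]}))
```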
59
IR on the Web
 No stable document collection (spider,
crawler)
 Invalid documents, duplication, etc.
 Huge number of documents (partial
collection)
 Multimedia documents
 Great variation of document quality
 Multilingual problem
 …
60
Final remarks on IR
 IR is related to many areas:
 NLP, AI, database, machine learning, user
modeling…
 library, Web, multimedia search, …
 Relatively weak theories
 Very strong tradition of experiments
 Many remaining (and exciting) problems
 Difficult area: Intuitive methods do not
necessarily improve effectiveness in practice
61
Why is IR difficult
 Vocabulary mismatch
 Synonymy: e.g. car vs. automobile
 Polysemy: table
 Queries are ambiguous; they are only a partial specification
of the user's need
 Content representation may be inadequate and
incomplete
 The user is the ultimate judge, but we don’t know
how the judge judges…
 The notion of relevance is imprecise, context- and user-
dependent
 But how rewarding it is to gain a 10%
improvement!