Search Engine Technology
(3)
Prof. Dragomir R. Radev
radev@umich.edu
SET Fall 2009
…
5. Evaluation of IR systems
Reference collections
TREC
…
Relevance
• Difficult to change: fuzzy, inconsistent
• Methods: exhaustive, sampling, pooling,
search-based
Contingency table
                retrieved     not retrieved
relevant        w = tp        x = fn           n1 = w + x
not relevant    y = fp        z = tn
                n2 = w + y                     N
Precision and Recall
Recall: R = w / (w + x)
Precision: P = w / (w + y)
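A minimal sketch of the two ratios, assuming the contingency-table counts are available as plain integers. The function names and the sample counts are illustrative and do not come from the slides.

```python
def recall(w, x):
    """Fraction of relevant documents that were retrieved: w / (w + x)."""
    return w / (w + x)


def precision(w, y):
    """Fraction of retrieved documents that are relevant: w / (w + y)."""
    return w / (w + y)


# Hypothetical counts: w = true positives, x = false negatives, y = false positives
w, x, y = 30, 20, 10
print(recall(w, x))      # 30 relevant retrieved out of 50 relevant
print(precision(w, y))   # 30 relevant retrieved out of 40 retrieved
```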
Exercise
Go to Google (www.google.com) and search for documents on
Tolkien’s “Lord of the Rings”. Try different ways of ph...
n   Doc. no   Relevant?   Recall   Precision
1   588       x           0.2      1.00
2   589       x           0.4      1.00
3   576                   0.4      0.67
4   590       x           0.6      0.75
5   986                   0.6      0.60
6 59...
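The Recall and Precision columns can be recomputed from the relevance marks alone. The sketch below assumes the marks from the full 14-row version of this table in the slide transcript (relevant documents at ranks 1, 2, 4, 6 and 13, five relevant documents in total); the function name is illustrative.

```python
def precision_recall_at_ranks(relevant_flags, total_relevant):
    """Return (rank, recall, precision) after each position of a ranked list."""
    rows = []
    hits = 0
    for n, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
        rows.append((n, hits / total_relevant, hits / n))
    return rows


# True where the table marks the document with an "x"
salton_flags = [n in {1, 2, 4, 6, 13} for n in range(1, 15)]

for n, r, p in precision_recall_at_ranks(salton_flags, total_relevant=5):
    print(f"{n:2d}  recall={r:.1f}  precision={p:.2f}")
```

Recall only moves when a relevant document appears, while precision drops at every non-relevant rank, which is what produces the sawtooth shape of a P/R graph.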
P/R graph
[Figure: precision (y-axis) plotted against recall (x-axis), both axes from 0 to 1]
P/R graph
[Figure: precision (y-axis) plotted against recall (x-axis), both axes from 0 to 1]
Interpolated av...
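A hedged sketch of interpolation, assuming the usual convention that the interpolated precision at recall level r is the highest precision observed at any recall greater than or equal to r (the slide only poses the question, so this definition is an assumption). The data are the relevance marks from the 14-document table; all names are illustrative.

```python
def interpolated_precision(points, recall_level):
    """Highest precision among (recall, precision) points with recall >= recall_level."""
    candidates = [p for r, p in points if r >= recall_level]
    return max(candidates) if candidates else 0.0


# Rebuild the (recall, precision) points from the ranked list in the table
relevant_ranks = {1, 2, 4, 6, 13}
points, hits = [], 0
for n in range(1, 15):
    hits += n in relevant_ranks
    points.append((hits / len(relevant_ranks), hits / n))

# 11-point interpolated average precision: recall levels 0.0, 0.1, ..., 1.0
levels = [i / 10 for i in range(11)]
eleven_point = [interpolated_precision(points, r) for r in levels]
average = sum(eleven_point) / len(eleven_point)

print(interpolated_precision(points, 0.5))  # 0.75
```

Under this convention the answer to "what is precision at recall=0.5?" is taken from the next recall level actually reached (0.6), where precision is 0.75.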
Issues
• Why not use accuracy A=(w+z)/N?
• Average precision
• Average P at given “document cutoff
values”
• Report when P...
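The F measure listed on this slide, F=(β²+1)PR/(β²P+R), combines P and R into one number. A minimal sketch follows; the function name, the zero guard, and the sample values are assumptions for illustration.

```python
def f_measure(p, r, beta=1.0):
    """F = (beta^2 + 1) * P * R / (beta^2 * P + R); beta = 1 gives F1."""
    if p == 0 and r == 0:
        return 0.0  # avoid division by zero when nothing relevant is retrieved
    b2 = beta * beta
    return (b2 + 1) * p * r / (b2 * p + r)


# With beta = 1 this is the harmonic mean of P and R
print(f_measure(0.75, 0.6))
print(2 / (1 / 0.6 + 1 / 0.75))
```

Values of β above 1 weight recall more heavily, and values below 1 weight precision more heavily.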
Kappa
• N: number of items (index i)
• n: number of categories (index j)
• k: number of annotators
\kappa = \frac{P(A) - P(E)}{1 - P(E)}
...
Kappa example
        J1+    J1-    TOTAL
J2+     300    10     310
J2-     20     70     90
TOTAL   320    80     400
Kappa (cont’d)
• P(A) = 370/400 = 0.925
• P (-) = (10+20+70+70)/800 = 0.2125
• P (+) = (10+20+300+300)/800 = 0.7875
• P (E...
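The arithmetic of this example can be checked with a short sketch for the two-annotator, two-category case. It follows the pooled-marginal calculation shown on the slide; the function and argument names are illustrative.

```python
def kappa_two_annotators(both_pos, j2_pos_j1_neg, j2_neg_j1_pos, both_neg):
    """Kappa for two judges and two categories, using pooled marginals."""
    n = both_pos + j2_pos_j1_neg + j2_neg_j1_pos + both_neg
    p_agree = (both_pos + both_neg) / n
    # Each item contributes two judgments, so the pool holds 2 * n judgments
    p_pos = (2 * both_pos + j2_pos_j1_neg + j2_neg_j1_pos) / (2 * n)
    p_neg = (2 * both_neg + j2_pos_j1_neg + j2_neg_j1_pos) / (2 * n)
    p_chance = p_pos ** 2 + p_neg ** 2
    return (p_agree - p_chance) / (1 - p_chance)


# Cell counts from the Kappa example table
print(round(kappa_two_annotators(300, 10, 20, 70), 3))  # 0.776
```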
Sample TREC query
<top>
<num> Number: 305
<title> Most Dangerous Vehicles
<desc> Description:
Which are the most crashwort...
<DOCNO> LA031689-0177 </DOCNO>
<DOCID> 31701 </DOCID>
<DATE><P>March 16, 1989, Thursday, Home Edition </P></DATE>
<SECTION...
TREC (cont’d)
• http://trec.nist.gov/tracks.html
• http://trec.nist.gov/presentations/presentations.html
Most used reference collections
• Generic retrieval: OHSUMED, CRANFIELD,
CACM
• Text classification: Reuters, 20newsgroups...
Comparing two systems
• Comparing A and B
• One query?
• Average performance?
• Need: A to consistently outperform B
[this...
The sign test
• Example 1:
– A > B (12 times)
– A = B (25 times)
– A < B (3 times)
– p < 0.035 (significant at the 5% leve...
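The p-values in the sign test examples can be reproduced with an exact two-sided binomial test in which ties are discarded. This is a sketch under that assumption; the function name is illustrative and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test; queries where A = B are left out."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins for the weaker system under a fair coin
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p(12, 3))   # roughly 0.035
print(sign_test_p(18, 9))   # roughly 0.122
```

The first result falls below 0.05 and the second does not, in line with the significance statements on the slide.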
Other tests
• Student t-test: takes into account the actual
performances, not just which system is better
– http://www.fon...
SET Fall 2009
…
6. Automated indexing/labeling
Compression
…
Indexing methods
• Manual: e.g., Library of Congress subject
headings, MeSH
• Automatic: e.g., TF*IDF based
LOC subject headings
http://www.loc.gov/catdir/cpso/lcco/lcco.html
A -- GENERAL WORKS
B -- PHILOSOPHY. PSYCHOLOGY. RELIGIO...
Medicine
CLASS R - MEDICINE
Subclass R
R5-920 Medicine (General)
R5-130.5 General works
R131-687 History of medicine. Medi...
Automatic methods
• TF*IDF: pick terms with the highest
TF*IDF scores
• Centroid-based: pick terms that appear in
the cent...
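A rough sketch of picking index terms by TF*IDF. The slides do not give a formula, so the weighting used here (raw term frequency times log(N/df)) is an assumption, and the tiny tokenized corpus and the function name are hypothetical.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, doc_index, k):
    """Return the k terms of docs[doc_index] with the highest tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_index])
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Sort by descending score, breaking ties alphabetically
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    ["huffman", "coding", "tree", "coding"],
    ["lz77", "coding", "window"],
    ["huffman", "tree", "frequency"],
]
print(top_tfidf_terms(docs, 1, 2))
```

Terms that occur in every document get a score of zero under this weighting, so they are never preferred as labels.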
Compression
• Methods
– Fixed length codes
– Huffman coding
– Ziv-Lempel codes
Fixed length codes
• Binary representations
– ASCII
– Representational power (2^k symbols, where k
is the number of bits)
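Since k bits can represent 2^k symbols, the code length needed for a given alphabet follows directly. A small sketch, with an illustrative function name:

```python
def bits_needed(num_symbols):
    """Smallest k >= 1 such that 2**k >= num_symbols."""
    return max(1, (num_symbols - 1).bit_length())


print(bits_needed(128))  # 7, the size of the ASCII character set
print(bits_needed(26))   # 5, enough for the letters A-Z
```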
Variable length codes
• Alphabet:
A .- N -. 0 -----
B -... O --- 1 .----
C -.-. P .--. 2 ..---
D -.. Q --.- 3 ...--
E . R ....
Most frequent letters in English
• Most frequent letters:
– E T A O I N S H R D L U
• Demo:
– http://www.amstat.org/public...
Huffman coding
• Developed by David Huffman (1952)
• Average of 5 bits per character (37.5%
compression)
• Based on freque...
Symbol Frequency
A 7
B 4
C 10
D 5
E 2
F 11
G 15
H 3
I 7
J 8
[Figure: Huffman tree built from the symbol frequencies above; each branch is labeled 0 or 1 and the leaves are the symbols a through j]
Symbol Code
A 0110
B 0010
C 000
D 0011
E 01110
F 010
G 10
H 01111
I 110
J 111
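The tree-building step can be sketched with a priority queue. Ties between equal frequencies may be broken differently than on the slides, so individual codewords can differ, but the total number of bits of an optimal code does not. The names below are illustrative; the frequency and code tables are copied from the slides.

```python
import heapq
from itertools import count

FREQ = {"A": 7, "B": 4, "C": 10, "D": 5, "E": 2,
        "F": 11, "G": 15, "H": 3, "I": 7, "J": 8}
SLIDE_CODE = {"A": "0110", "B": "0010", "C": "000", "D": "0011", "E": "01110",
              "F": "010", "G": "10", "H": "01111", "I": "110", "J": "111"}


def huffman_code_lengths(freq):
    """Repeatedly merge the two least frequent nodes; return codeword lengths."""
    tie = count()  # unique tie-breaker so the symbol lists are never compared
    heap = [(f, next(tie), [s]) for s, f in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # every symbol under the new node moves one level down
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return lengths


def total_bits(freq, lengths):
    """Bits needed to encode a text with these symbol counts."""
    return sum(freq[s] * lengths[s] for s in freq)


slide_lengths = {s: len(cw) for s, cw in SLIDE_CODE.items()}
print(total_bits(FREQ, huffman_code_lengths(FREQ)))  # 227
print(total_bits(FREQ, slide_lengths))               # 227
```

For these 72 symbol occurrences that is about 3.15 bits per symbol, compared with 4 bits for a fixed-length code over ten symbols.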
Exercise
• Consider the bit string:
011011011110001001100011101001110
00110101101011101
• Use the Huffman code from the ex...
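A sketch of a decoder for experimenting with this exercise, using the code table from the example. Because no codeword is a prefix of another, bits can be consumed greedily. The function names are illustrative, and the decoder also returns any trailing bits that do not form a complete codeword, which helps when bits are inserted or deleted.

```python
SLIDE_CODE = {"A": "0110", "B": "0010", "C": "000", "D": "0011", "E": "01110",
              "F": "010", "G": "10", "H": "01111", "I": "110", "J": "111"}


def huffman_encode(text, code):
    """Concatenate the codeword of every symbol in text."""
    return "".join(code[ch] for ch in text)


def huffman_decode(bits, code):
    """Return (decoded symbols, leftover bits that match no complete codeword)."""
    by_codeword = {cw: sym for sym, cw in code.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in by_codeword:
            decoded.append(by_codeword[buffer])
            buffer = ""
    return "".join(decoded), buffer


# Round trip on a made-up message (not taken from the slides)
encoded = huffman_encode("BADGE", SLIDE_CODE)
print(encoded)
print(huffman_decode(encoded, SLIDE_CODE))
```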
Extensions
• Word-based
• Domain/genre dependent models
Ziv-Lempel coding
• Two types - one is known as LZ77 (used
in GZIP)
• Code: set of triples <a,b,c>
• a: how far back in th...
• <0,0,p> p
• <0,0,e> pe
• <0,0,t> pet
• <2,1,r> peter
• <0,0,_> peter_
• <6,1,i> peter_pi
• <8,2,r> peter_piper
• <6,3,c>...
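The triple-by-triple trace can be reproduced with a short decoder, sketched here under the slide's definition of <a,b,c> (go back a characters, copy b characters, then append c). The function name is illustrative, and the triples are the full sequence from the slide transcript.

```python
def lz77_decode(triples):
    """Decode a list of (back, length, char) triples into text."""
    out = []
    for back, length, ch in triples:
        start = len(out) - back
        # Copy one character at a time so a copy may overlap its own output
        for i in range(length):
            out.append(out[start + i])
        out.append(ch)
    return "".join(out)


triples = [
    (0, 0, "p"), (0, 0, "e"), (0, 0, "t"), (2, 1, "r"), (0, 0, "_"),
    (6, 1, "i"), (8, 2, "r"), (6, 3, "c"), (0, 0, "k"), (7, 1, "d"),
    (7, 1, "a"), (9, 2, "e"), (9, 2, "_"), (0, 0, "o"), (0, 0, "f"),
    (17, 5, "l"), (12, 1, "d"), (16, 3, "p"), (3, 2, "r"), (0, 0, "s"),
]
print(lz77_decode(triples))  # peter_piper_picked_a_peck_of_pickled_peppers
```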
Links on text compression
• Data compression:
– http://www.data-compression.info/
• Calgary corpus:
– http://links.uwaterl...
100 alternative search engines
• http://rss.slashdot.org/~r/Slashdot/slashdot/~3/83468703/article.pl
Readings
• 2: MRS9
• 3: MRS13, MRS14
• 4: MRS15, MRS16
IR3.ppt

  1. 1. Search Engine Technology (3) Prof. Dragomir R. Radev radev@umich.edu
  2. 2. SET Fall 2009 … 5. Evaluation of IR systems Reference collections TREC …
  3. 3. Relevance • Difficult to change: fuzzy, inconsistent • Methods: exhaustive, sampling, pooling, search-based
  4. 4. Contingency table
                        retrieved     not retrieved
        relevant        w = tp        x = fn           n1 = w + x
        not relevant    y = fp        z = tn
                        n2 = w + y                     N
  5. 5. Precision and Recall Recall: R = w / (w + x) Precision: P = w / (w + y)
  6. 6. Exercise Go to Google (www.google.com) and search for documents on Tolkien’s “Lord of the Rings”. Try different ways of phrasing the query: e.g., Tolkien, “JRR Tolkien”, +“JRR Tolkien” +“Lord of the Rings”, etc. For each query, compute the precision (P) based on the first 10 documents returned by the search engine. Note! Before starting the exercise, have a clear idea of what a relevant document for your query should look like. Try different information needs. Later, try different queries.
  7. 7. Ranked retrieval example
        n    Doc. no   Relevant?   Recall   Precision
        1    588       x           0.2      1.00
        2    589       x           0.4      1.00
        3    576                   0.4      0.67
        4    590       x           0.6      0.75
        5    986                   0.6      0.60
        6    592       x           0.8      0.67
        7    984                   0.8      0.57
        8    988                   0.8      0.50
        9    578                   0.8      0.44
        10   985                   0.8      0.40
        11   103                   0.8      0.36
        12   591                   0.8      0.33
        13   772       x           1.0      0.38
        14   990                   1.0      0.36
        [From Salton’s book]
  8. 8. P/R graph [Figure: precision (y-axis) plotted against recall (x-axis), both axes from 0 to 1]
  9. 9. P/R graph [Figure: precision (y-axis) plotted against recall (x-axis), both axes from 0 to 1] Interpolated average precision (e.g., 11pt) Interpolation – what is precision at recall=0.5?
  10. 10. Issues • Why not use accuracy A=(w+z)/N? • Average precision • Average P at given “document cutoff values” • Report when P=R • F measure: F=(β²+1)PR/(β²P+R) • F1 measure: F1 = 2/(1/R+1/P): harmonic mean of P and R
  11. 11. Kappa • N: number of items (index i) • n: number of categories (index j) • k: number of annotators • m_{ij}: number of annotators who assign item i to category j
        \kappa = \frac{P(A) - P(E)}{1 - P(E)}
        P(A) = \frac{1}{N k (k-1)} \sum_{i=1}^{N} \sum_{j=1}^{n} m_{ij}^{2} - \frac{1}{k-1}
        P(E) = \sum_{j=1}^{n} \left( \frac{\sum_{i=1}^{N} m_{ij}}{N k} \right)^{2}
  12. 12. Kappa example
                J1+    J1-    TOTAL
        J2+     300    10     310
        J2-     20     70     90
        TOTAL   320    80     400
  13. 13. Kappa (cont’d) • P(A) = 370/400 = 0.925 • P (-) = (10+20+70+70)/800 = 0.2125 • P (+) = (10+20+300+300)/800 = 0.7875 • P (E) = 0.2125 * 0.2125 + 0.7875 * 0.7875 = 0.665 • K = (0.925-0.665)/(1-0.665) = 0.776 • Kappa higher than 0.67 is tentatively acceptable; higher than 0.8 is good
  14. 14. Sample TREC query <top> <num> Number: 305 <title> Most Dangerous Vehicles <desc> Description: Which are the most crashworthy, and least crashworthy, passenger vehicles? <narr> Narrative: A relevant document will contain information on the crashworthiness of a given vehicle or vehicles that can be used to draw a comparison with other vehicles. The document will have to describe/compare vehicles, not drivers. For instance, it should be expected that vehicles preferred by 16-25 year-olds would be involved in more crashes, because that age group is involved in more crashes. I would view number of fatalities per 100 crashes to be more revealing of a vehicle's crashworthiness than the number of crashes per 100,000 miles, for example. </top> LA031689-0177 FT922-1008 LA090190-0126 LA101190-0218 LA082690-0158 LA112590-0109 FT944-136 LA020590-0119 FT944-5300 LA052190-0048 LA051689-0139 FT944-9371 LA032390-0172 LA042790-0172 LA021790-0136 LA092289-0167 LA111189-0013 LA120189-0179 LA020490-0021 LA122989-0063 LA091389-0119 LA072189-0048 FT944-15615 LA091589-0101 LA021289-0208
  15. 15. <DOCNO> LA031689-0177 </DOCNO> <DOCID> 31701 </DOCID> <DATE><P>March 16, 1989, Thursday, Home Edition </P></DATE> <SECTION><P>Business; Part 4; Page 1; Column 5; Financial Desk </P></SECTION> <LENGTH><P>586 words </P></LENGTH> <HEADLINE><P>AGENCY TO LAUNCH STUDY OF FORD BRONCO II AFTER HIGH RATE OF ROLL-OVER ACCIDENTS </P></HEADLINE> <BYLINE><P>By LINDA WILLIAMS, Times Staff Writer </P></BYLINE> <TEXT> <P>The federal government's highway safety watchdog said Wednesday that the Ford Bronco II appears to be involved in more fatal roll-over accidents than other vehicles in its class and that it will seek to determine if the vehicle itself contributes to the accidents. </P> <P>The decision to do an engineering analysis of the Ford Motor Co. utility-sport vehicle grew out of a federal accident study of the Suzuki Samurai, said Tim Hurd, a spokesman for the National Highway Traffic Safety Administration. NHTSA looked at Samurai accidents after Consumer Reports magazine charged that the vehicle had basic design flaws. </P> <P>Several Fatalities </P> <P>However, the accident study showed that the "Ford Bronco II appears to have a higher number of single-vehicle, first event roll-overs, particularly those involving fatalities," Hurd said. The engineering analysis of the Bronco, the second of three levels of investigation conducted by NHTSA, will cover the 1984-1989 Bronco II models, the agency said. </P> <P>According to a Fatal Accident Reporting System study included in the September report on the Samurai, 43 Bronco II single-vehicle roll-overs caused fatalities, or 19 of every 100,000 vehicles. There were eight Samurai fatal roll-overs, or 6 per 100,000; 13 involving the Chevrolet S10 Blazers or GMC Jimmy, or 6 per 100,000, and six fatal Jeep Cherokee roll-overs, for 2.5 per 100,000. After the accident report, NHTSA declined to investigate the Samurai. </P> ... 
</TEXT> <GRAPHIC><P> Photo, The Ford Bronco II "appears to have a higher number of single-vehicle, first event roll-overs," a federal official said. </P></GRAPHIC> <SUBJECT> <P>TRAFFIC ACCIDENTS; FORD MOTOR CORP; NATIONAL HIGHWAY TRAFFIC SAFETY ADMINISTRATION; VEHICLE INSPECTIONS; RECREATIONAL VEHICLES; SUZUKI MOTOR CO; AUTOMOBILE SAFETY </P> </SUBJECT> </DOC>
  16. 16. TREC (cont’d) • http://trec.nist.gov/tracks.html • http://trec.nist.gov/presentations/presentations.html
  17. 17. Most used reference collections • Generic retrieval: OHSUMED, CRANFIELD, CACM • Text classification: Reuters, 20newsgroups • Question answering: TREC-QA • Web: DOTGOV, wt100g • Blogs: Buzzmetrics datasets • TREC ad hoc collections, 2-6 GB • TREC Web collections, 2-100GB
  18. 18. Comparing two systems • Comparing A and B • One query? • Average performance? • Need: A to consistently outperform B [this slide: courtesy James Allan]
  19. 19. The sign test • Example 1: – A > B (12 times) – A = B (25 times) – A < B (3 times) – p < 0.035 (significant at the 5% level) • Example 2: – A > B (18 times) – A < B (9 times) – p < 0.122 (not significant at the 5% level) – http://www.fon.hum.uva.nl/Service/Statistics/Sign_Test.html [this slide: courtesy James Allan]
  20. 20. Other tests • Student t-test: takes into account the actual performances, not just which system is better – http://www.fon.hum.uva.nl/Service/Statistics/Student_t_Test.html – http://www.socialresearchmethods.net/kb/stat_t.php • Wilcoxon Matched-Pairs Signed-Ranks Test – http://www.fon.hum.uva.nl/Service/Statistics/Signed_Rank_Test.html
  21. 21. SET Fall 2009 … 6. Automated indexing/labeling Compression …
  22. 22. Indexing methods • Manual: e.g., Library of Congress subject headings, MeSH • Automatic: e.g., TF*IDF based
  23. 23. LOC subject headings http://www.loc.gov/catdir/cpso/lcco/lcco.html A -- GENERAL WORKS B -- PHILOSOPHY. PSYCHOLOGY. RELIGION C -- AUXILIARY SCIENCES OF HISTORY D -- HISTORY (GENERAL) AND HISTORY OF EUROPE E -- HISTORY: AMERICA F -- HISTORY: AMERICA G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION H -- SOCIAL SCIENCES J -- POLITICAL SCIENCE K -- LAW L -- EDUCATION M -- MUSIC AND BOOKS ON MUSIC N -- FINE ARTS P -- LANGUAGE AND LITERATURE Q -- SCIENCE R -- MEDICINE S -- AGRICULTURE T -- TECHNOLOGY U -- MILITARY SCIENCE V -- NAVAL SCIENCE Z -- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)
  24. 24. Medicine CLASS R - MEDICINE Subclass R R5-920 Medicine (General) R5-130.5 General works R131-687 History of medicine. Medical expeditions R690-697 Medicine as a profession. Physicians R702-703 Medicine and the humanities. Medicine and disease in relation to history, literature, etc. R711-713.97 Directories R722-722.32 Missionary medicine. Medical missionaries R723-726 Medical philosophy. Medical ethics R726.5-726.8 Medicine and disease in relation to psychology. Terminal care. Dying R727-727.5 Medical personnel and the public. Physician and the public R728-733 Practice of medicine. Medical practice economics R735-854 Medical education. Medical schools. Research R855-855.5 Medical technology R856-857 Biomedical engineering. Electronics. Instrumentation R858-859.7 Computer applications to medicine. Medical informatics R864 Medical records R895-920 Medical physics. Medical radiology. Nuclear medicine
  25. 25. Automatic methods • TF*IDF: pick terms with the highest TF*IDF scores • Centroid-based: pick terms that appear in the centroid with high scores • The maximal marginal relevance principle (MMR) • Related to summarization, snippet generation
  26. 26. Compression • Methods – Fixed length codes – Huffman coding – Ziv-Lempel codes
  27. 27. Fixed length codes • Binary representations – ASCII – Representational power (2^k symbols, where k is the number of bits)
  28. 28. Variable length codes • Alphabet:
        A .-      N -.      0 -----
        B -...    O ---     1 .----
        C -.-.    P .--.    2 ..---
        D -..     Q --.-    3 ...--
        E .       R .-.     4 ....-
        F ..-.    S ...     5 .....
        G --.     T -       6 -....
        H ....    U ..-     7 --...
        I ..      V ...-    8 ---..
        J .---    W .--     9 ----.
        K -.-     X -..-
        L .-..    Y -.--
        M --      Z --..
        • Demo: – http://www.scphillips.com/morse/
  29. 29. Most frequent letters in English • Most frequent letters: – E T A O I N S H R D L U • Demo: – http://www.amstat.org/publications/jse/secure/v7n2/count-c • Also: bigrams: – TH HE IN ER AN RE ND AT ON NT
  30. 30. Huffman coding • Developed by David Huffman (1952) • Average of 5 bits per character (37.5% compression) • Based on frequency distributions of symbols • Algorithm: iteratively build a tree of symbols starting with the two least frequent symbols
  31. 31. Symbol Frequency A 7 B 4 C 10 D 5 E 2 F 11 G 15 H 3 I 7 J 8
  32. 32. [Figure: Huffman tree built from the symbol frequencies above; each branch is labeled 0 or 1 and the leaves are the symbols a through j]
  33. 33. Symbol Code A 0110 B 0010 C 000 D 0011 E 01110 F 010 G 10 H 01111 I 110 J 111
  34. 34. Exercise • Consider the bit string: 011011011110001001100011101001110 00110101101011101 • Use the Huffman code from the example to decode it. • Try inserting, deleting, and switching some bits at random locations and try decoding.
  35. 35. Extensions • Word-based • Domain/genre dependent models
  36. 36. Ziv-Lempel coding • Two types - one is known as LZ77 (used in GZIP) • Code: set of triples <a,b,c> • a: how far back in the decoded text to look for the upcoming text segment • b: how many characters to copy • c: new character to add to complete segment
  37. 37. • <0,0,p> p • <0,0,e> pe • <0,0,t> pet • <2,1,r> peter • <0,0,_> peter_ • <6,1,i> peter_pi • <8,2,r> peter_piper • <6,3,c> peter_piper_pic • <0,0,k> peter_piper_pick • <7,1,d> peter_piper_picked • <7,1,a> peter_piper_picked_a • <9,2,e> peter_piper_picked_a_pe • <9,2,_> peter_piper_picked_a_peck_ • <0,0,o> peter_piper_picked_a_peck_o • <0,0,f> peter_piper_picked_a_peck_of • <17,5,l> peter_piper_picked_a_peck_of_pickl • <12,1,d> peter_piper_picked_a_peck_of_pickled • <16,3,p> peter_piper_picked_a_peck_of_pickled_pep • <3,2,r> peter_piper_picked_a_peck_of_pickled_pepper • <0,0,s> peter_piper_picked_a_peck_of_pickled_peppers
  38. 38. Links on text compression • Data compression: – http://www.data-compression.info/ • Calgary corpus: – http://links.uwaterloo.ca/calgary.corpus.html • Huffman coding: – http://www.compressconsult.com/huffman/ – http://en.wikipedia.org/wiki/Huffman_coding • LZ – http://en.wikipedia.org/wiki/LZ77
  39. 39. 100 alternative search engines • http://rss.slashdot.org/~r/Slashdot/slashdot/~3/83468703/article.pl
  40. 40. Readings • 2: MRS9 • 3: MRS13, MRS14 • 4: MRS15, MRS16