# IR3.ppt



2. SET Fall 2009 … 5. Evaluation of IR systems: reference collections, TREC …
3. Relevance
   • Difficult to change: fuzzy, inconsistent
   • Methods: exhaustive, sampling, pooling, search-based
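4. Contingency table

|               | relevant   | not relevant | total      |
|---------------|------------|--------------|------------|
| retrieved     | w = tp     | y = fp       | n2 = w + y |
| not retrieved | x = fn     | z = tn       |            |
| total         | n1 = w + x |              | N          |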
5. Precision and Recall
   • Recall: $R = \frac{w}{w + x}$
   • Precision: $P = \frac{w}{w + y}$
6. Exercise
   • Go to Google (www.google.com) and search for documents on Tolkien’s “Lord of the Rings”.
   • Try different ways of phrasing the query: e.g., Tolkien, “JRR Tolkien”, +“JRR Tolkien” +“Lord of the Rings”, etc.
   • For each query, compute the precision (P) based on the first 10 documents returned by the search engine.
   • Note! Before starting the exercise, have a clear idea of what a relevant document for your query should look like.
   • Try different information needs. Later, try different queries.
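7. Recall and precision down a ranked list [From Salton’s book]

| n  | Doc. no | Relevant? | Recall | Precision |
|----|---------|-----------|--------|-----------|
| 1  | 588     | x         | 0.2    | 1.00      |
| 2  | 589     | x         | 0.4    | 1.00      |
| 3  | 576     |           | 0.4    | 0.67      |
| 4  | 590     | x         | 0.6    | 0.75      |
| 5  | 986     |           | 0.6    | 0.60      |
| 6  | 592     | x         | 0.8    | 0.67      |
| 7  | 984     |           | 0.8    | 0.57      |
| 8  | 988     |           | 0.8    | 0.50      |
| 9  | 578     |           | 0.8    | 0.44      |
| 10 | 985     |           | 0.8    | 0.40      |
| 11 | 103     |           | 0.8    | 0.36      |
| 12 | 591     |           | 0.8    | 0.33      |
| 13 | 772     | x         | 1.0    | 0.38      |
| 14 | 990     |           | 1.0    | 0.36      |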
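The recall and precision columns can be recomputed from the relevance marks alone. The sketch below assumes, as the table implies, that the five marked documents are all the relevant documents in the collection; the names `RELEVANT` and `precision_recall_at_ranks` are illustrative, not from the slides.

```python
# Relevance marks for the 14 ranked documents in Salton's table
# (True where the table shows an x).
RELEVANT = [True, True, False, True, False, True, False,
            False, False, False, False, False, True, False]


def precision_recall_at_ranks(relevant_flags):
    """Return one (recall, precision) pair per rank cutoff.

    Assumes every relevant document in the collection appears
    somewhere in the ranked list.
    """
    total_relevant = sum(relevant_flags)
    rows = []
    hits = 0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
        rows.append((hits / total_relevant, hits / rank))
    return rows


if __name__ == "__main__":
    rows = precision_recall_at_ranks(RELEVANT)
    for rank, (recall, precision) in enumerate(rows, start=1):
        print(f"{rank:2d}  recall={recall:.1f}  precision={precision:.2f}")
```

Precision drops each time a non-relevant document is added and recovers partially at each relevant one, which is what produces the sawtooth shape of the P/R graph.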
8. P/R graph (figure: precision plotted against recall, both axes running from 0 to 1)
9. P/R graph (figure: precision plotted against recall, both axes running from 0 to 1)
   • Interpolated average precision (e.g., 11pt)
   • Interpolation: what is precision at recall=0.5?
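One common way to answer the interpolation question is to define the interpolated precision at recall level r as the highest precision observed at any recall of r or more. The slide does not say which rule it intends, so the sketch below is an assumption; it reuses the (recall, precision) points at which relevant documents were retrieved in the ranked list from slide 7.

```python
# (recall, precision) at the ranks where a relevant document was retrieved
# in the 14-document ranked list from slide 7 (ranks 1, 2, 4, 6 and 13).
POINTS = [(0.2, 1 / 1), (0.4, 2 / 2), (0.6, 3 / 4), (0.8, 4 / 6), (1.0, 5 / 13)]


def interpolated_precision(points, recall_level):
    """Highest precision at any observed recall >= recall_level (0.0 if none).

    This is the 'max precision to the right' convention, assumed here.
    """
    candidates = [p for r, p in points if r >= recall_level - 1e-9]
    return max(candidates, default=0.0)


def eleven_point_average(points):
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    levels = [i / 10 for i in range(11)]
    return sum(interpolated_precision(points, lv) for lv in levels) / len(levels)


if __name__ == "__main__":
    print(interpolated_precision(POINTS, 0.5))
    print(round(eleven_point_average(POINTS), 3))
```

Under this rule the precision at recall 0.5 is the value reached at recall 0.6, since no document in the list yields a recall of exactly 0.5.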
10. Issues
   • Why not use accuracy $A = (w + z)/N$?
   • Average precision
   • Average P at given “document cutoff values”
   • Report when P = R
   • F measure: $F = \frac{(\beta^2 + 1)PR}{\beta^2 P + R}$
   • F1 measure: $F_1 = \frac{2}{1/R + 1/P}$, the harmonic mean of P and R
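A small sketch of the F measure and of the accuracy question, using the contingency-table symbols w, x, y, z from slide 4. The collection sizes in the example (1,000 documents, 10 relevant) are made up for illustration.

```python
def f_measure(precision, recall, beta=1.0):
    """F = (beta^2 + 1) P R / (beta^2 P + R); returns 0.0 when P = R = 0."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


def accuracy(w, x, y, z):
    """A = (w + z) / N with w=tp, x=fn, y=fp, z=tn."""
    return (w + z) / (w + x + y + z)


if __name__ == "__main__":
    # Hypothetical collection: 1000 documents, 10 relevant, nothing retrieved.
    # Accuracy looks excellent although recall is zero.
    print(accuracy(w=0, x=10, y=0, z=990))
    # F1 is the harmonic mean, so it stays close to the lower of P and R.
    print(f_measure(0.9, 0.1))
```

Because nearly all documents are non-relevant for a typical query, a system that retrieves nothing scores high accuracy, which is one reason precision, recall and F are preferred.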
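11. Kappa
   • N: number of items (index i)
   • n: number of categories (index j)
   • k: number of annotators
   • $m_{ij}$: number of annotators who assign item i to category j

$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$

$$P(A) = \frac{1}{Nk(k-1)} \sum_{i=1}^{N} \sum_{j=1}^{n} m_{ij}^2 - \frac{1}{k-1}$$

$$P(E) = \sum_{j=1}^{n} \left( \frac{\sum_{i=1}^{N} m_{ij}}{Nk} \right)^2$$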
12. Kappa example

|       | J1+ | J1- | TOTAL |
|-------|-----|-----|-------|
| J2+   | 300 | 10  | 310   |
| J2-   | 20  | 70  | 90    |
| TOTAL | 320 | 80  | 400   |
13. Kappa (cont’d)
   • P(A) = 370/400 = 0.925
   • P(-) = (10+20+70+70)/800 = 0.2125
   • P(+) = (10+20+300+300)/800 = 0.7875
   • P(E) = 0.2125 * 0.2125 + 0.7875 * 0.7875 = 0.665
   • κ = (0.925-0.665)/(1-0.665) = 0.776
   • Kappa higher than 0.67 is tentatively acceptable; higher than 0.8 is good
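The arithmetic above can be checked with a short sketch for the two-judge, two-category case. It uses pooled marginals for P(E), as the slide does; the function and argument names are illustrative.

```python
def kappa_two_judges(both_yes, only_j2_yes, only_j1_yes, both_no):
    """Kappa for two judges and two categories, with pooled marginals.

    Arguments are the four cells of the 2x2 agreement table.
    """
    total = both_yes + only_j2_yes + only_j1_yes + both_no
    p_agree = (both_yes + both_no) / total
    # Each item receives two judgments, so there are 2 * total judgments.
    p_yes = (2 * both_yes + only_j2_yes + only_j1_yes) / (2 * total)
    p_no = (2 * both_no + only_j2_yes + only_j1_yes) / (2 * total)
    p_chance = p_yes ** 2 + p_no ** 2
    return (p_agree - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Cells from the kappa example: J1+/J2+ = 300, J1-/J2+ = 10,
    # J1+/J2- = 20, J1-/J2- = 70.
    print(round(kappa_two_judges(300, 10, 20, 70), 3))
```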
14. Sample TREC query

<top>
<num> Number: 305
<title> Most Dangerous Vehicles
<desc> Description: Which are the most crashworthy, and least crashworthy, passenger vehicles?
<narr> Narrative: A relevant document will contain information on the crashworthiness of a given vehicle or vehicles that can be used to draw a comparison with other vehicles. The document will have to describe/compare vehicles, not drivers. For instance, it should be expected that vehicles preferred by 16-25 year-olds would be involved in more crashes, because that age group is involved in more crashes. I would view number of fatalities per 100 crashes to be more revealing of a vehicle's crashworthiness than the number of crashes per 100,000 miles, for example.
</top>

LA031689-0177 FT922-1008 LA090190-0126 LA101190-0218 LA082690-0158
LA112590-0109 FT944-136 LA020590-0119 FT944-5300 LA052190-0048
LA051689-0139 FT944-9371 LA032390-0172 LA042790-0172 LA021790-0136
LA092289-0167 LA111189-0013 LA120189-0179 LA020490-0021 LA122989-0063
LA091389-0119 LA072189-0048 FT944-15615 LA091589-0101 LA021289-0208
15. Sample TREC document

<DOC>
<DOCNO> LA031689-0177 </DOCNO>
<DOCID> 31701 </DOCID>
<DATE><P>March 16, 1989, Thursday, Home Edition </P></DATE>
<SECTION><P>Business; Part 4; Page 1; Column 5; Financial Desk </P></SECTION>
<LENGTH><P>586 words </P></LENGTH>
<HEADLINE><P>AGENCY TO LAUNCH STUDY OF FORD BRONCO II AFTER HIGH RATE OF ROLL-OVER ACCIDENTS </P></HEADLINE>
<BYLINE><P>By LINDA WILLIAMS, Times Staff Writer </P></BYLINE>
<TEXT>
<P>The federal government's highway safety watchdog said Wednesday that the Ford Bronco II appears to be involved in more fatal roll-over accidents than other vehicles in its class and that it will seek to determine if the vehicle itself contributes to the accidents. </P>
<P>The decision to do an engineering analysis of the Ford Motor Co. utility-sport vehicle grew out of a federal accident study of the Suzuki Samurai, said Tim Hurd, a spokesman for the National Highway Traffic Safety Administration. NHTSA looked at Samurai accidents after Consumer Reports magazine charged that the vehicle had basic design flaws. </P>
<P>Several Fatalities </P>
<P>However, the accident study showed that the "Ford Bronco II appears to have a higher number of single-vehicle, first event roll-overs, particularly those involving fatalities," Hurd said. The engineering analysis of the Bronco, the second of three levels of investigation conducted by NHTSA, will cover the 1984-1989 Bronco II models, the agency said. </P>
<P>According to a Fatal Accident Reporting System study included in the September report on the Samurai, 43 Bronco II single-vehicle roll-overs caused fatalities, or 19 of every 100,000 vehicles. There were eight Samurai fatal roll-overs, or 6 per 100,000; 13 involving the Chevrolet S10 Blazers or GMC Jimmy, or 6 per 100,000, and six fatal Jeep Cherokee roll-overs, for 2.5 per 100,000. After the accident report, NHTSA declined to investigate the Samurai. </P>
...
</TEXT>
<GRAPHIC><P> Photo, The Ford Bronco II "appears to have a higher number of single-vehicle, first event roll-overs," a federal official said. </P></GRAPHIC>
<SUBJECT>
<P>TRAFFIC ACCIDENTS; FORD MOTOR CORP; NATIONAL HIGHWAY TRAFFIC SAFETY ADMINISTRATION; VEHICLE INSPECTIONS; RECREATIONAL VEHICLES; SUZUKI MOTOR CO; AUTOMOBILE SAFETY </P>
</SUBJECT>
</DOC>
16. TREC (cont’d)
   • http://trec.nist.gov/tracks.html
   • http://trec.nist.gov/presentations/presentations.html
17. Most used reference collections
   • Generic retrieval: OHSUMED, CRANFIELD, CACM
   • Text classification: Reuters, 20newsgroups
   • Question answering: TREC-QA
   • Web: DOTGOV, wt100g
   • Blogs: Buzzmetrics datasets
   • TREC ad hoc collections, 2-6 GB
   • TREC Web collections, 2-100 GB
18. Comparing two systems
   • Comparing A and B
   • One query?
   • Average performance?
   • Need: A to consistently outperform B
   [this slide: courtesy James Allan]
19. The sign test
   • Example 1:
      – A > B (12 times)
      – A = B (25 times)
      – A < B (3 times)
      – p ≈ 0.035 (significant at the 5% level)
   • Example 2:
      – A > B (18 times)
      – A < B (9 times)
      – p ≈ 0.122 (not significant at the 5% level)
      – http://www.fon.hum.uva.nl/Service/Statistics/Sign_Test.html
   [this slide: courtesy James Allan]
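The p-values in both examples are consistent with a two-sided exact sign test in which ties are discarded; the slide does not state the variant, so that is an assumption. A minimal sketch, with an illustrative function name:

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided exact sign test p-value.

    Ties (A = B) are assumed to have been dropped before calling.
    Under the null hypothesis each non-tied query is a fair coin flip.
    """
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Probability of seeing k or fewer wins for the weaker system.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print(round(sign_test_p_value(12, 3), 4))   # Example 1, 25 ties ignored
    print(round(sign_test_p_value(18, 9), 4))   # Example 2
```

The 25 ties in Example 1 play no part in the computation, so only the 15 queries on which the systems differ carry evidence.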
20. Other tests
   • Student t-test: takes into account the actual performances, not just which system is better
      – http://www.fon.hum.uva.nl/Service/Statistics/Student_t_Test.html
      – http://www.socialresearchmethods.net/kb/stat_t.php
   • Wilcoxon Matched-Pairs Signed-Ranks Test
      – http://www.fon.hum.uva.nl/Service/Statistics/Signed_Rank_Test.html
21. SET Fall 2009 … 6. Automated indexing/labeling, compression …
22. Indexing methods
   • Manual: e.g., Library of Congress subject headings, MeSH
   • Automatic: e.g., TF*IDF based
23. LOC subject headings
   http://www.loc.gov/catdir/cpso/lcco/lcco.html
   A -- GENERAL WORKS
   B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
   C -- AUXILIARY SCIENCES OF HISTORY
   D -- HISTORY (GENERAL) AND HISTORY OF EUROPE
   E -- HISTORY: AMERICA
   F -- HISTORY: AMERICA
   G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION
   H -- SOCIAL SCIENCES
   J -- POLITICAL SCIENCE
   K -- LAW
   L -- EDUCATION
   M -- MUSIC AND BOOKS ON MUSIC
   N -- FINE ARTS
   P -- LANGUAGE AND LITERATURE
   Q -- SCIENCE
   R -- MEDICINE
   S -- AGRICULTURE
   T -- TECHNOLOGY
   U -- MILITARY SCIENCE
   V -- NAVAL SCIENCE
   Z -- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)
24. Medicine
   CLASS R - MEDICINE
   Subclass R
   R5-920 Medicine (General)
   R5-130.5 General works
   R131-687 History of medicine. Medical expeditions
   R690-697 Medicine as a profession. Physicians
   R702-703 Medicine and the humanities. Medicine and disease in relation to history, literature, etc.
   R711-713.97 Directories
   R722-722.32 Missionary medicine. Medical missionaries
   R723-726 Medical philosophy. Medical ethics
   R726.5-726.8 Medicine and disease in relation to psychology. Terminal care. Dying
   R727-727.5 Medical personnel and the public. Physician and the public
   R728-733 Practice of medicine. Medical practice economics
   R735-854 Medical education. Medical schools. Research
   R855-855.5 Medical technology
   R856-857 Biomedical engineering. Electronics. Instrumentation
   R858-859.7 Computer applications to medicine. Medical informatics
   R864 Medical records
   R895-920 Medical physics. Medical radiology. Nuclear medicine
25. Automatic methods
   • TF*IDF: pick terms with the highest TF*IDF scores
   • Centroid-based: pick terms that appear in the centroid with high scores
   • The maximal marginal relevance principle (MMR)
   • Related to summarization, snippet generation
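As a rough illustration of the first method, the sketch below scores each term of one document by raw term frequency times log(N/df) and keeps the top k. The weighting variant, the three toy documents and the function name are all assumptions made for the example; the slides do not fix a particular TF*IDF formula.

```python
import math
from collections import Counter

# Toy, already-tokenized collection (hypothetical).
DOCS = [
    ["huffman", "coding", "compresses", "text"],
    ["text", "retrieval", "ranks", "text"],
    ["retrieval", "evaluation", "uses", "precision"],
]


def top_tfidf_terms(docs, doc_index, k):
    """Return the k terms of docs[doc_index] with the highest tf * idf.

    tf is the raw count in the document; idf is log(N / df).
    Ties are broken alphabetically so the result is deterministic.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    term_freq = Counter(docs[doc_index])
    scores = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    print(top_tfidf_terms(DOCS, 0, 3))
```

In the first document, "text" is pushed out of the top three because it also occurs in another document, which lowers its idf.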
26. Compression
   • Methods
      – Fixed length codes
      – Huffman coding
      – Ziv-Lempel codes
27. Fixed length codes
   • Binary representations
      – ASCII
      – Representational power ($2^k$ symbols where k is the number of bits)
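28. Variable length codes
   • Alphabet:

| Letter | Code   | Letter | Code   | Digit | Code    |
|--------|--------|--------|--------|-------|---------|
| A      | `.-`   | N      | `-.`   | 0     | `-----` |
| B      | `-...` | O      | `---`  | 1     | `.----` |
| C      | `-.-.` | P      | `.--.` | 2     | `..---` |
| D      | `-..`  | Q      | `--.-` | 3     | `...--` |
| E      | `.`    | R      | `.-.`  | 4     | `....-` |
| F      | `..-.` | S      | `...`  | 5     | `.....` |
| G      | `--.`  | T      | `-`    | 6     | `-....` |
| H      | `....` | U      | `..-`  | 7     | `--...` |
| I      | `..`   | V      | `...-` | 8     | `---..` |
| J      | `.---` | W      | `.--`  | 9     | `----.` |
| K      | `-.-`  | X      | `-..-` |       |         |
| L      | `.-..` | Y      | `-.--` |       |         |
| M      | `--`   | Z      | `--..` |       |         |

   • Demo:
      – http://www.scphillips.com/morse/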
29. Most frequent letters in English
   • Most frequent letters:
      – E T A O I N S H R D L U
   • Demo:
      – http://www.amstat.org/publications/jse/secure/v7n2/count-c
   • Also: bigrams:
      – TH HE IN ER AN RE ND AT ON NT
30. Huffman coding
   • Developed by David Huffman (1952)
   • Average of 5 bits per character (37.5% compression relative to 8-bit characters)
   • Based on frequency distributions of symbols
   • Algorithm: iteratively build a tree of symbols, starting with the two least frequent symbols
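31. Example symbol frequencies

| Symbol | Frequency |
|--------|-----------|
| A      | 7         |
| B      | 4         |
| C      | 10        |
| D      | 5         |
| E      | 2         |
| F      | 11        |
| G      | 15        |
| H      | 3         |
| I      | 7         |
| J      | 8         |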
32. (figure: Huffman tree for the symbols a through j, with each branch labeled 0 or 1)
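33. Resulting Huffman code

| Symbol | Code  |
|--------|-------|
| A      | 0110  |
| B      | 0010  |
| C      | 000   |
| D      | 0011  |
| E      | 01110 |
| F      | 010   |
| G      | 10    |
| H      | 01111 |
| I      | 110   |
| J      | 111   |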
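The tree-building step can be sketched with a priority queue, using the symbol frequencies from slide 31. Ties between equal weights may be broken differently than on the slide, so the individual codewords can differ from the table above, but any Huffman tree for these frequencies needs the same total number of bits. The function names are illustrative.

```python
import heapq
from itertools import count

# Symbol frequencies from slide 31.
FREQ = {"A": 7, "B": 4, "C": 10, "D": 5, "E": 2,
        "F": 11, "G": 15, "H": 3, "I": 7, "J": 8}


def huffman_codes(freq):
    """Build a Huffman code by repeatedly merging the two lightest nodes."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def total_bits(freq, codes):
    """Bits needed to encode a text with the given symbol counts."""
    return sum(freq[s] * len(codes[s]) for s in freq)


if __name__ == "__main__":
    codes = huffman_codes(FREQ)
    print(codes)
    print(total_bits(FREQ, codes), "bits for", sum(FREQ.values()), "symbols")
```

For these frequencies the total comes to 227 bits for 72 symbols, roughly 3.15 bits per symbol, compared with 4 bits per symbol for a fixed-length code over ten symbols.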
34. Exercise
   • Consider the bit string: 011011011110001001100011101001110 00110101101011101
   • Use the Huffman code from the example to decode it.
   • Try inserting, deleting, and switching some bits at random locations and try decoding.
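For experimenting with the exercise, a prefix-code decoder can be sketched as below. It uses the code table from slide 33 and returns any trailing bits that do not complete a codeword; the helper names and the sample word are illustrative.

```python
# Code table from slide 33.
SLIDE_CODE = {"A": "0110", "B": "0010", "C": "000", "D": "0011", "E": "01110",
              "F": "010", "G": "10", "H": "01111", "I": "110", "J": "111"}


def encode(text, code):
    """Concatenate the codeword of each symbol."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decode a bit string greedily, one codeword at a time.

    Returns (decoded_text, leftover_bits). Because no codeword is a
    prefix of another, the first match is always the right one.
    """
    reverse = {bits_: symbol for symbol, bits_ in code.items()}
    symbols = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            symbols.append(reverse[buffer])
            buffer = ""
    return "".join(symbols), buffer


if __name__ == "__main__":
    bits = encode("BADGE", SLIDE_CODE)
    print(bits, decode(bits, SLIDE_CODE))
    # Flip the first bit and decode again to see how far the error spreads.
    flipped = ("1" if bits[0] == "0" else "0") + bits[1:]
    print(flipped, decode(flipped, SLIDE_CODE))
```

Flipping, inserting or deleting a single bit usually shifts the codeword boundaries, so the damage can extend well past the symbol that was hit.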
35. Extensions
   • Word-based
   • Domain/genre dependent models
36. Ziv-Lempel coding
   • Two types; one is known as LZ77 (used in GZIP)
   • Code: set of triples <a,b,c>
   • a: how far back in the decoded text to look for the upcoming text segment
   • b: how many characters to copy
   • c: new character to add to complete segment
37. LZ77 example (triple, then decoded text so far)
   • <0,0,p> p
   • <0,0,e> pe
   • <0,0,t> pet
   • <2,1,r> peter
   • <0,0,_> peter_
   • <6,1,i> peter_pi
   • <8,2,r> peter_piper
   • <6,3,c> peter_piper_pic
   • <0,0,k> peter_piper_pick
   • <7,1,d> peter_piper_picked
   • <7,1,a> peter_piper_picked_a
   • <9,2,e> peter_piper_picked_a_pe
   • <9,2,_> peter_piper_picked_a_peck_
   • <0,0,o> peter_piper_picked_a_peck_o
   • <0,0,f> peter_piper_picked_a_peck_of
   • <17,5,l> peter_piper_picked_a_peck_of_pickl
   • <12,1,d> peter_piper_picked_a_peck_of_pickled
   • <16,3,p> peter_piper_picked_a_peck_of_pickled_pep
   • <3,2,r> peter_piper_picked_a_peck_of_pickled_pepper
   • <0,0,s> peter_piper_picked_a_peck_of_pickled_peppers
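The decoding side of this example is short enough to sketch directly from the <a,b,c> definition on slide 36: step back a characters, copy b of them, then append c. The list `TRIPLES` is transcribed from the slide; the function name is illustrative.

```python
# Triples from slide 37: (distance back, characters to copy, next character).
TRIPLES = [
    (0, 0, "p"), (0, 0, "e"), (0, 0, "t"), (2, 1, "r"), (0, 0, "_"),
    (6, 1, "i"), (8, 2, "r"), (6, 3, "c"), (0, 0, "k"), (7, 1, "d"),
    (7, 1, "a"), (9, 2, "e"), (9, 2, "_"), (0, 0, "o"), (0, 0, "f"),
    (17, 5, "l"), (12, 1, "d"), (16, 3, "p"), (3, 2, "r"), (0, 0, "s"),
]


def lz77_decode(triples):
    """Rebuild the text from (back, length, next_char) triples."""
    out = []
    for back, length, next_char in triples:
        start = len(out) - back
        # Copy one character at a time so a copy may overlap its own output.
        for i in range(length):
            out.append(out[start + i])
        out.append(next_char)
    return "".join(out)


if __name__ == "__main__":
    print(lz77_decode(TRIPLES))
```

Decoding needs no search at all, which is why LZ77-style formats decompress quickly; the cost sits in the encoder, which must find the longest earlier match for each position.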
38. Links on text compression
   • Data compression:
      – http://www.data-compression.info/
   • Calgary corpus:
      – http://links.uwaterloo.ca/calgary.corpus.html
   • Huffman coding:
      – http://www.compressconsult.com/huffman/
      – http://en.wikipedia.org/wiki/Huffman_coding
   • LZ
      – http://en.wikipedia.org/wiki/LZ77
39. 100 alternative search engines
   • http://rss.slashdot.org/~r/Slashdot/slashdot/~3/83468703/article.pl
40. Readings
   • 2: MRS9
   • 3: MRS13, MRS14
   • 4: MRS15, MRS16