
Reverted Indexing for Expansion and Feedback


Pickens, J., Cooper, M., and Golovchinsky, G. Reverted Indexing for Expansion and Feedback. In Proc. CIKM 2010, Toronto, Canada, ACM Press. See http://fxpal.com/?p=abstract&abstractID=581

Published in: Technology

Reverted Indexing for Expansion and Feedback

  1. Reverted Indexing for Feedback and Expansion
     Jeremy Pickens, Matthew Cooper, Gene Golovchinsky
  2. Reverted Indexing for Feedback and Expansion
     Jeremy Pickens, Catalyst Repository Systems
  3. Query-Document Duality has long history
     • Using queries to label documents
     • Queries and documents as bipartite graph
       – Used for random walks
       – Used for partitioning
     • Reverse Querying
  4. Motivation – Three R’s
     • Retrievability
     • Reuse (Algorithmic)
     • Recall-Oriented Tasks
  5. Our Key Contribution
     We treat query result sets as unstructured text “documents” -- and index them
  6. Outline
     • Reverted Documents
     • Reverted Indexing
     • Experimental Setup
     • Results
       – Effectiveness
       – Efficiency
     • Related Work
     • Future Extensions
  7. Reverted Document
     • ID (Basis Query): Query Expression, Ranking Algorithm
     • Body: Results (docid), Results (score)
  8. Basis Query (Reverted Document ID)

     Query Expression                  Ranking Algorithm
     giraffe                           BM25
     cheetah                           BM25
     gazelle                           BM25
     gazelle                           Language Model
     gazelle                           PL2 (Divergence from Randomness)
     gazelle                           Y
     gazelle                           B
     gazelle                           G
     fast cheetah                      BM25
     cheetah AND NOT gazelle           Boolean
     Latitude+Longitude of Zanzibar    Euclidean distance

  9. Reverted Document Body
     • Results (docid): Canonical URL and/or docid
     • Results (score):
       1. Probability of Relevance
       2. Cosine similarity
       3. KL Divergence
       4. Raw Rank
       5. 1 or 0 (Boolean)
  10. Result Set→Document Body

      rank  docid  score  shift-scale  Anh&Moffat
      1     #415   0.82   10.0         10
      2     #32    0.73   8.92         9
      3     #63    0.62   7.57         8
      4     #7     0.49   5.95         6
      5     #56    0.35   4.24         4
      6     #12    0.14   1.72         2
      7     #108   0.12   1.36         1
      8     #115   0.09   1.09         1
      9     #42    0.08   1.0          1
      10    #85    0.08   1.0          1
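The score columns in slide 10 can be approximated with a short sketch. It assumes the shift-scale step is a linear min-max mapping onto the range 1 to 10, and it uses plain rounding as a stand-in for the integer quantization, which the slides do not spell out. The function name `shift_scale` is hypothetical. Intermediate values differ slightly from the slide (for example about 8.91 rather than 8.92), possibly because the slide shows rounded scores, but the final integers agree with the table.

```python
def shift_scale(scores, low=1.0, high=10.0):
    """Linearly map scores so the minimum lands on `low` and the maximum on `high`.

    This mapping is an assumption; the slides only name the step "shift-scale".
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores equal, so give every result the top value.
        return [high for _ in scores]
    return [low + (high - low) * (s - lo) / (hi - lo) for s in scores]


# Retrieval scores copied from the slide, in rank order.
scores = [0.82, 0.73, 0.62, 0.49, 0.35, 0.14, 0.12, 0.09, 0.08, 0.08]

scaled = shift_scale(scores)
# Plain rounding is a stand-in for the quantization step named on the slide.
impacts = [round(x) for x in scaled]

print([f"{x:.2f}" for x in scaled])
print(impacts)
```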
  11. Result Set→Document Body

      docid  Anh&Moffat
      #415   10
      #32    9
      #63    8
      #7     6
      #56    4
      #12    2
      #108   1
      #115   1
      #42    1
      #85    1

      <text>
      415 415 415 415 415 415 415 415 415 415
      32 32 32 32 32 32 32 32 32
      63 63 63 63 63 63 63 63
      7 7 7 7 7 7
      56 56 56 56
      12 12
      108 115 42 85
      </text>
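Turning the integer column of slide 11 into a document body amounts to repeating each docid as many times as its integer score. A minimal sketch follows; the function name `reverted_body` is hypothetical, and the data is copied from the slide.

```python
def reverted_body(impacts):
    """Repeat each docid `impact` times, keeping the ranked order."""
    tokens = []
    for docid, impact in impacts:
        tokens.extend([docid] * impact)
    return " ".join(tokens)


# (docid, integer score) pairs copied from the slide.
impacts = [
    ("415", 10), ("32", 9), ("63", 8), ("7", 6), ("56", 4),
    ("12", 2), ("108", 1), ("115", 1), ("42", 1), ("85", 1),
]

body = reverted_body(impacts)
print(body)
```

Encoding the score as repetition lets an ordinary text indexer pick it up as term frequency without any changes to the indexer itself.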
  12. Reverted Document
      • ID (Basis Query): Query Expression, Ranking Algorithm
      • Body: Results (docid), Results (score)
  13. Reverted Document
      <document>
      <docid> [gazelle : BM25] </docid>
      <text>
      415 415 415 415 415 415 415 415 415 415
      32 32 32 32 32 32 32 32 32
      63 63 63 63 63 63 63 63
      7 7 7 7 7 7
      56 56 56 56
      12 12
      108 115 42 85
      </text>
      </document>
  14. Questions?
  15. Outline
      • Reverted Documents
      • Reverted Indexing
      • Experimental Setup
      • Results
        – Effectiveness
        – Efficiency
      • Related Work
      • Future Extensions
  16. Reverted Indexing
      1. Choose a set of basis queries
      2. For each basis query:
         1. Execute the query, producing results up to cutoff depth k
         2. Use results to create a “reverted document”
         3. Add the reverted document to the index
      How basis queries are chosen (in these experiments): all singleton terms (unigrams) with df ≥ 2. The ranking algorithm for all basis queries is PL2.
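The procedure on slide 16 can be sketched on a toy corpus. This is an illustration under stated assumptions, not the authors' implementation: a raw term count stands in for PL2, the three-document corpus is invented, and `build_reverted_docs` is a hypothetical name. Basis queries are the unigrams with df ≥ 2, as in the slide.

```python
from collections import Counter


def build_reverted_docs(corpus, k=2, min_df=2):
    """Return {basis_query: [(docid, score), ...]} with at most k results each.

    Scoring is a raw term count, a toy stand-in for the PL2 ranking
    named on the slide.
    """
    tokens = {docid: text.lower().split() for docid, text in corpus.items()}

    # Step 1: choose basis queries, here every unigram with df >= min_df.
    df = Counter(t for toks in tokens.values() for t in set(toks))
    basis_queries = sorted(t for t, n in df.items() if n >= min_df)

    reverted = {}
    for q in basis_queries:
        # Step 2.1: execute the query and keep the top k results.
        scored = [(d, toks.count(q)) for d, toks in tokens.items() if q in toks]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        # Steps 2.2 and 2.3: the result list becomes the reverted document,
        # stored under the basis query as its ID.
        reverted[q] = scored[:k]
    return reverted


# Invented toy corpus, for illustration only.
corpus = {
    "d1": "poaching threatens wildlife",
    "d2": "wildlife preserves protect wildlife",
    "d3": "poaching poaching in preserves",
}

reverted = build_reverted_docs(corpus)
for basis_query, results in reverted.items():
    print(basis_query, results)
```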
  17. Standard Index
  18. Reverted Index
  19. Reverted Index Statistics
      • Term Frequency: Retrieval Score of docid
      • Document Length: Sum of Retrieval Scores of all docids retrieved by a Basis Query
      • Document Frequency: Number of Basis Queries that docid was retrieved by
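The three statistics on slide 19 can be computed directly from a set of reverted documents. The sketch below uses invented data (the `gazelle` and `cheetah` entries and their scores are illustrative only) and hypothetical helper names.

```python
from collections import Counter

# Reverted documents: basis query -> {docid: integer retrieval score}.
# All entries are invented for illustration.
reverted_docs = {
    "gazelle": {"#415": 10, "#32": 9, "#63": 8},
    "cheetah": {"#415": 4, "#7": 6},
}


def term_frequency(basis_query, docid):
    """TF of a docid within a reverted document is its retrieval score."""
    return reverted_docs[basis_query].get(docid, 0)


# Document length: sum of the retrieval scores of all docids that the
# basis query retrieved.
doc_length = {bq: sum(results.values()) for bq, results in reverted_docs.items()}

# Document frequency: number of basis queries that retrieved the docid.
doc_frequency = Counter(d for results in reverted_docs.values() for d in results)

print(term_frequency("gazelle", "#415"))
print(doc_length)
print(dict(doc_frequency))
```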
  20. Outline
      • Reverted Documents
      • Reverted Indexing
      • Experimental Setup
      • Results
        – Effectiveness
        – Efficiency
      • Related Work
      • Future Extensions
  21. Experiment: Relevance Feedback
      1. Run initial query using PL2 (Terrier platform): [poaching wildlife preserves]
      2. Judge top k documents for relevance
      3. Select and weight query expansion terms, using one of:
         • KL Divergence
         • Bo1
         • PL2 retrieval on the Reverted Index
      4. Expand using top 500 terms (strongest baseline @ 500)
      5. Run expanded query using PL2
      6. Evaluate
  22. Reverted Index→Expansion
      1. Original query = [poaching wildlife preserves]
      2. Reverted query = [#415 #56 #42 #85]
      3. Expanded query = [poaching^2.0 wildlife^1.24 preserves^1.0 poachers^0.57 tsavo^0.56 leakey^0.41 tusks^0.39 …]

      term       original  retrieved  weight
      poaching   1         1.0        2.0
      poachers   0         0.57       0.57
      tsavo      0         0.56       0.56
      leakey     0         0.41       0.41
      tusks      0         0.39       0.39
      elephants  0         0.34       0.34
      wildlife   1         0.24       1.24
      kws        0         0.2        0.2
      …          …         …          …
      preserves  1         0          1.0
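The weights on slide 22 are consistent with a simple rule: each term's weight is its score from the reverted index plus 1 if the term was in the original query. A minimal sketch under that reading follows; the function name `expansion_weights` and the output formatting are assumptions, and the scores are copied from the slide.

```python
def expansion_weights(original_terms, retrieved_scores):
    """Add 1.0 to the retrieved score of every original query term."""
    weights = dict(retrieved_scores)
    for term in original_terms:
        weights[term] = weights.get(term, 0.0) + 1.0
    return weights


original = ["poaching", "wildlife", "preserves"]

# Scores of the basis queries (terms) retrieved by the reverted query
# [#415 #56 #42 #85], copied from the slide.
retrieved = {
    "poaching": 1.0, "poachers": 0.57, "tsavo": 0.56, "leakey": 0.41,
    "tusks": 0.39, "elephants": 0.34, "wildlife": 0.24, "kws": 0.2,
}

weights = expansion_weights(original, retrieved)
ranked = sorted(weights.items(), key=lambda pair: -pair[1])
query = " ".join(f"{term}^{weight:.2f}" for term, weight in ranked)
print(query)
```

Note that `preserves` receives weight 1.0 even though the reverted index did not retrieve it, which matches the last row of the table.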
  23. Outline
      • Reverted Documents
      • Reverted Indexing
      • Experimental Setup
      • Results
        – Effectiveness
        – Efficiency
      • Related Work
      • Future Extensions
  24. MAP
  25. %Change
  26. Residual MAP
  27. %Change
  28. Efficiency
      • Two components to query expansion
        – Selection and Weighting
        – Execution of Expanded Query
  29. Avg Selection Time
  30. Avg Execution Time
  31. Why would execution be faster?
  32. Expansion terms and scores

      Bo1                       Reverted_PL2
      Term              Score   Term              Score
      leakey            0.88    poaching          1.00
      poaching          0.74    poachers          0.56
      wildlife          0.73    tsavo             0.56
      kenya             0.52    leakey            0.41
      ivory             0.47    tusks             0.39
      elephants         0.46    elephants         0.34
      elephant          0.32    wildlife          0.24
      deer              0.30    kws               0.20
      poachers          0.28    kez               0.17
      conservation      0.27    ivory             0.14
      species           0.23    jealousies        0.14
      tusks             0.19    elephant          0.14
      african           0.19    conservationists  0.09
      namibia           0.19    kenya             0.09
      animals           0.17    fiefdom           0.08
      africa            0.15    safaris           0.04
      zimbabwe          0.15    conservationist   0.03
      tsavo             0.14    egos              0.01
      kenyan            0.13    kierie            0.00
      conservationists  0.12    aphrodisiacs      0.00
  33. Expansion terms and document frequency (DF)

      Bo1                       Reverted_PL2
      Term              DF      Term              DF
      africa            20390   wildlife          2891
      african           10636   kenya             1163
      conservation      4298    ivory             1014
      animals           3928    elephant          743
      species           3479    elephants         356
      wildlife          2891    poaching          331
      kenya             1163    conservationists  293
      ivory             1014    egos              269
      zimbabwe          966     kez               173
      deer              748     fiefdom           129
      elephant          743     conservationist   125
      namibia           483     poachers          117
      kenyan            436     safaris           57
      elephants         356     jealousies        56
      poaching          331     tusks             42
      conservationists  293     leakey            22
      poachers          117     tsavo             12
      tusks             42      aphrodisiacs      12
      leakey            22      kws               9
      tsavo             12      kierie            2
      Average DF        2617    Average DF        391
  34. Expansion terms and document frequency (DF)

      Bo1                       Reverted_PL2
      Term              DF      Term              DF
      los               46748   transportation    15262
      angeles           45147   freeway           3506
      metro             39849   tunnel            2643
      safety            22569   disasters         1822
      fire              21257   subway            805
      foot              13120   extinguished      452
      traffic           12410   rtd               227
      feet              12034   caved             193
      hollywood         7677    shoring           158
      heat              6004    roper             147
      rail              5747    timbers           98
      downtown          5390    shored            97
      engineers         4308    pilgrimages       73
      freeway           3506    asphyxiation      71
      disasters         1822    smolder           29
      firefighters      1489    busway            22
      subway            805     grouting          21
      rtd               227     smoldered         19
      timbers           98      lutgen            10
      busway            22      droped            2
      Average DF        12511   Average DF        1283
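The "Average DF" rows can be checked with a few lines of arithmetic. The sketch below recomputes them for the poaching example on slide 33, with the DF values copied from that slide. One plausible reading of the question on slide 31, offered here as an assumption rather than a claim from the slides, is that lower-DF expansion terms have shorter posting lists and so cost less to evaluate.

```python
# Document frequencies of the top 20 expansion terms, copied from slide 33.
bo1_df = [
    20390, 10636, 4298, 3928, 3479, 2891, 1163, 1014, 966, 748,
    743, 483, 436, 356, 331, 293, 117, 42, 22, 12,
]
reverted_df = [
    2891, 1163, 1014, 743, 356, 331, 293, 269, 173, 129,
    125, 117, 57, 56, 42, 22, 12, 12, 9, 2,
]


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


bo1_avg = average(bo1_df)
reverted_avg = average(reverted_df)

print(round(bo1_avg), round(reverted_avg))
# Ratio of total postings touched if every posting list is read in full.
print(round(sum(bo1_df) / sum(reverted_df), 1))
```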
  35. Outline
      • Reverted Documents
      • Reverted Indexing
      • Experimental Setup
      • Results
        – Effectiveness
        – Efficiency
      • Related Work
      • Future Extensions
  36. Related Work
      Inspiration: “Retrievability: An Evaluation Measure for Higher Order Information Access Tasks” -- Azzopardi and Vinay, CIKM 2008
      Azzopardi & Vinay take a document-centric approach, examining whether documents (n)ever appear among top k results to any query
  37. Related Work
      Query-Document Duality has long history
      – S. E. Robertson. “Query-Document Symmetry and Dual models.” Journal of Documentation, 50(3), 1994
      – B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel. “Query Expansion Using Associated Queries.” CIKM '03
      – N. Craswell and M. Szummer. “Random walks on the Query-Click Graph.” SIGIR 2007
      – Reverse Querying / alerting (various)
  38. Future Extensions
      Basis queries
      – Query expression may be arbitrarily complex
      – Ranking function may be arbitrarily complex (remember: ranking function is a part of the basis query)
      Reverted queries
      – Best Match: [#415 #56 #42 #85]
      – Boolean: (#415 AND #56) OR (#42 AND #85)
      – Other query operators:
        [SYNONYM(#415 #56) #42 #85]
        [ORDERED(#415 #56) #42 #85]
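The best-match and Boolean reverted queries on slide 38 can be illustrated on a toy reverted index. In the sketch below the postings are invented, each docid maps to the set of basis queries that retrieved it, and best match is reduced to counting matching docids. None of this is taken from the authors' system.

```python
from collections import Counter

# Toy reverted index: docid -> basis queries that retrieved it.
# All postings are invented for illustration.
postings = {
    "#415": {"poaching", "poachers", "wildlife"},
    "#56": {"poaching", "tsavo"},
    "#42": {"poaching", "ivory", "wildlife"},
    "#85": {"ivory", "wildlife"},
}

# Boolean reverted query: (#415 AND #56) OR (#42 AND #85)
boolean_hits = (postings["#415"] & postings["#56"]) | (
    postings["#42"] & postings["#85"]
)

# Best-match reverted query: [#415 #56 #42 #85]. Each basis query is scored
# by how many of the query docids it retrieved, a simplification of a real
# ranking function.
counts = Counter(
    bq for docid in ["#415", "#56", "#42", "#85"] for bq in postings[docid]
)
best_match = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

print(sorted(boolean_hits))
print(best_match)
```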
  39. Motivation – Three R’s
      • Retrievability
      • Reuse (Algorithmic)
      • Recall-Oriented Tasks
  40. Questions?
