Sigir 2011 proceedings
Transcript

  • 1. Summary of Papers of the SIGIR 2011 Workshop on Query Representation and Understanding (Chetana Gavankar)
  • 2. Ricardo Campos, Alipio Jorge, Gael Dias: "Using Web Snippets and Query-logs to Measure Implicit Temporal Intents in Queries"
  • 3. Types of temporal queries: 1. Atemporal: queries not sensitive to time, e.g. "plan my trip". 2. Temporally unambiguous: queries about one concrete time period, e.g. "Haiti earthquake" (2010). 3. Temporally ambiguous: queries with multiple instances over time, e.g. "Cricket World Cup", which occurs every four years.
  • 4. Web snippets and query logs. Content-related resources, based on a web-content approach, simply require the set of web search results. Query-log resources, based on similar year-qualified queries, imply that some versions of the query have already been issued.
  • 5. Identifying implicit temporal queries. 1. Web snippets (temporal evidence within web pages): TA(q) = Σ_{f ∈ I} w_f · f(q), where I = {TSnippet(·), TTitle(·), TUrl(·)}. Each feature is weighted differently via w_f: 18.14 for TTitle(·), 50.91 for TSnippet(·), and 30.95 for TUrl(·). If TA(q) < 10%, the query is classified as atemporal. Note that dates appearing in the query and in the documents may not match. TSnippet(q) = (# snippets retrieved with dates) / (# snippets retrieved).
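The weighted snippet-based score TA(q) can be sketched as follows. The weights are the ones quoted on the slide (they sum to 100, so TA(q) reads as a percentage); the regex-based year detector and the exact per-feature inputs are simplifying assumptions, not the paper's method:

```python
import re

# Feature weights quoted on the slide; they sum to 100, so TA(q) is a
# percentage when each feature value is a fraction in [0, 1].
WEIGHTS = {"TTitle": 18.14, "TSnippet": 50.91, "TUrl": 30.95}
DATE_RE = re.compile(r"\b(?:19|20)\d{2}\b")  # crude year detector (assumption)

def fraction_with_dates(texts):
    """e.g. TSnippet(q) = (# snippets retrieved with dates) / (# snippets retrieved)."""
    return sum(bool(DATE_RE.search(t)) for t in texts) / len(texts) if texts else 0.0

def temporal_score(snippets, titles, urls):
    """TA(q) = sum over features f in I of w_f * f(q)."""
    values = {
        "TSnippet": fraction_with_dates(snippets),
        "TTitle": fraction_with_dates(titles),
        "TUrl": fraction_with_dates(urls),
    }
    return sum(WEIGHTS[f] * v for f, v in values.items())

def is_atemporal(snippets, titles, urls, threshold=10.0):
    """Slide rule: if TA(q) < 10%, the query is classified as atemporal."""
    return temporal_score(snippets, titles, urls) < threshold
```

A query whose retrieved snippets, titles, and URLs contain no dates scores 0 and is flagged atemporal; one where dates dominate all three features scores near 100.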
  • 6. Identifying implicit temporal queries. 2. Web query logs: temporal activity can be inferred from the date and time of each request and from user activity. The number of times a query q is pre- or post-qualified by a year y is WA(q, y) = #(y, q) + #(q, y), and α(q) = Σ_y WA(q, y) / (Σ_x #(x, q) + Σ_x #(q, x)). If the query is only ever qualified with a year, then α(q) = 1.
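The log-based ratio α(q) can be sketched directly from the slide's definition. The pre/post count dictionaries and the upstream year detection are assumptions about how the log has been preprocessed:

```python
def alpha(pre_counts, post_counts, years):
    """
    alpha(q) = sum_y WA(q, y) / (sum_x #(x, q) + sum_x #(q, x)),
    with WA(q, y) = #(y, q) + #(q, y).

    pre_counts[x]  = #(x, q): times the log contains "x q"
    post_counts[x] = #(q, x): times the log contains "q x"
    years: the tokens treated as years, e.g. {"2010", "2011"}
    (assumption: year detection happens upstream).
    """
    wa = sum(pre_counts.get(y, 0) + post_counts.get(y, 0) for y in years)
    total = sum(pre_counts.values()) + sum(post_counts.values())
    return wa / total if total else 0.0
```

If the only qualifiers ever attached to q are years, the numerator equals the denominator and α(q) = 1, matching the slide.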
  • 7. Results. An additional analysis showed that temporal information is more frequent in web snippets than in either the Google or the Yahoo! query log. Overall, while most queries have a TSnippet(·) value around 20%, TLogYahoo(·) and TLogGoogle(·) are mostly near 0%.
  • 8. Conclusion
    • Future dates are more common in snippets than in query logs
    • 9. A query containing a date does not necessarily have temporal intent (as seen in the Google and Yahoo! query logs). Ex: the movie title "October Sky"
    • 10. Web snippets are statistically more relevant in terms of temporal intent than query logs
  • 11. Rishiraj Saha Roy, Niloy Ganguly, Monojit Choudhury, Naveen Singh: "Complex Network Analysis Reveals Kernel-Periphery Structure in Web Search Queries"
  • 12. Search queries. The search query language is modeled as a bag of segments. Word occurrence network: an edge between units i and j exists iff P_ij > P_i · P_j. Eight complex network models for query logs:
    • Query-unrestricted wordnet (local and global)
    • 13. Query-restricted wordnet (local and global)
    • 14. Query-unrestricted SegmentNet (local and global)
    • 15. Query-restricted SegmentNet (local and global)
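The edge criterion from slide 12 (connect units i and j when P_ij > P_i · P_j) can be sketched as follows. Whitespace tokenisation and query-fraction probability estimates are simplifying assumptions; the paper operates on segmented multiword units:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(queries):
    """
    Build the occurrence network from the slide: nodes are units, and an
    edge (i, j) exists iff the joint probability P_ij exceeds the
    independence baseline P_i * P_j, with probabilities estimated as the
    fraction of queries containing the unit(s).
    """
    n = len(queries)
    unit_counts, pair_counts = Counter(), Counter()
    for q in queries:
        units = set(q.split())  # assumption: whitespace tokenisation
        unit_counts.update(units)
        pair_counts.update(frozenset(p) for p in combinations(sorted(units), 2))
    edges = set()
    for pair, c in pair_counts.items():
        i, j = tuple(pair)
        if c / n > (unit_counts[i] / n) * (unit_counts[j] / n):
            edges.add(pair)
    return edges
```

Units that co-occur more often than chance get connected; incidental co-occurrences below the independence baseline are dropped.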
  • 16. Kernel and peripheral lexicons. Two regimes in the degree distribution (DD) of the word occurrence network: 1. Kernel lexicon (K-Lex, or modifiers):
    • Units popular in queries (high degrees)
    • 17. Generic and domain-independent
    2. Peripheral lexicon (P-Lex, or HEADs): rare units with degree much lower than those in the kernel. Examples of K-Lex (popular segments): how to, wiki, free, and, who is in, videos. Examples of P-Lex (rarer segments): matthew brodrick, accessories, police officer, australia, epson tx800, star trek next gen.
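Given node degrees from such a network, the kernel/periphery split can be sketched with a degree cutoff. Note the paper identifies the split by fitting the crossover between the two power-law regimes; the fixed `cutoff` parameter here is a placeholder assumption:

```python
def split_lexicon(degree, cutoff):
    """
    Partition units into a kernel (high-degree, generic) lexicon and a
    peripheral (low-degree, rare) lexicon. `degree` maps unit -> degree
    in the occurrence network; `cutoff` stands in for the crossover
    point of the two-regime degree distribution (assumption).
    """
    kernel = {u for u, d in degree.items() if d >= cutoff}
    periphery = set(degree) - kernel
    return kernel, periphery
```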
  • 18. Degree distribution. |N| = number of nodes, |E| = number of edges, C = average clustering coefficient, d = mean shortest path length between nodes. C_rand and d_rand are the corresponding values in a random graph: C_rand ≈ k'/|N|, d_rand ≈ ln(|N|)/ln(k'), where k' is the average degree of the graph. Degree distribution: p(k) = (# nodes with degree k) / (total # nodes).
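The quantities on this slide are straightforward to compute; a minimal sketch, using the average degree k' = 2|E|/|N| for an undirected graph:

```python
import math
from collections import Counter

def degree_distribution(degree):
    """p(k) = (# nodes with degree k) / (total # nodes)."""
    n = len(degree)
    return {k: c / n for k, c in Counter(degree.values()).items()}

def random_baselines(n_nodes, n_edges):
    """
    Random-graph reference values from the slide:
    C_rand ~ k'/|N|,  d_rand ~ ln(|N|)/ln(k'),
    with k' = average degree = 2|E|/|N| (undirected graph).
    """
    k_avg = 2 * n_edges / n_nodes
    return k_avg / n_nodes, math.log(n_nodes) / math.log(k_avg)
```

Comparing the measured C and d against C_rand and d_rand is the standard test for small-world structure, which is what the paper's conclusion refers to.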
  • 19. Two-regime power law
  • 20. Conclusion
    • Like natural language, queries reflect the kernel-periphery distinction
    • 21. Unlike natural language, the query networks lack the small-world property that supports quickly retrieving words from the mental lexicon
    • 22. It is more difficult to understand the context of a segment in a query
    • 23. The peripheral network consists of a large number of small disconnected components
    • 24. The ability of peripheral units to stand on their own makes POS identification hard in queries
    • 25. Socio-cultural factors govern the kernel-periphery distinction in queries
  • 26. Lidong Bing, Wai Lam: "Investigation of Web Query Refinement via Topic Analysis and Learning with Personalization"
  • 27. Web Query Refinement
    • Query Refinement
    • Generate some candidate queries first, and score the quality of these candidates.
  • 34. Latent topic analysis in the query log. A query log record is (user_id, query, clicked_url, time). Pseudo-document generation: queries related to the same host are aggregated. General sites like "en.wikipedia.org" are not suitable for latent topic analysis and are eliminated. Latent Dirichlet Allocation (LDA) is used to conduct latent semantic topic analysis on the collection of host-based pseudo-documents. Z = set of latent topics z_i; each z_i is associated with a multinomial distribution over terms, P(t_k | z_i) = probability of term t_k given topic z_i.
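The pseudo-document generation step can be sketched as follows; the exclusion list containing only `en.wikipedia.org` mirrors the slide's example, and treating a pseudo-document as a plain concatenation of query strings is an assumption:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hosts considered too general for topic analysis (slide's example).
GENERAL_HOSTS = {"en.wikipedia.org"}

def host_pseudo_documents(log_records):
    """
    Aggregate queries by clicked host into pseudo-documents, the input
    to LDA. log_records: iterable of (user_id, query, clicked_url, time).
    """
    docs = defaultdict(list)
    for _user, query, url, _time in log_records:
        host = urlparse(url).netloc
        if host and host not in GENERAL_HOSTS:
            docs[host].append(query)
    return {host: " ".join(queries) for host, queries in docs.items()}
```

Each resulting host document is then fed to an off-the-shelf LDA implementation to estimate the topic set Z and the term distributions P(t_k | z_i).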
  • 35. Personalization. π_u = {π_u^1, π_u^2, …, π_u^|Z|} = profile of the user u, where π_u^i = P(z_i | u) = probability that the user u prefers the topic z_i. Generate a user-based pseudo-document U for user u; {P(z_1 | U), P(z_2 | U), …, P(z_|Z| | U)} = profile of u. For a candidate query q: t_1, …, t_n, the topic of term t_r is z_r.
  • 36. Topic-based scoring with personalization. Candidate query score: the model parameter P(z_j | z_i) captures the relationship between two topics. With a personal profile, P(z_1 | u) = probability that user u prefers the topic z_1.
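The slide names the ingredients of the score (term topics z_r, topic transitions P(z_j | z_i), and the user profile P(z_i | u)) but not the exact combination rule, so the product form below is a hypothetical sketch, not the paper's formula:

```python
def candidate_score(candidate_terms, term_topic, topic_topic, user_profile):
    """
    Hypothetical topic-based score with personalization (assumption:
    product-of-probabilities combination). Each term t_r maps to its
    topic z_r via `term_topic`; `topic_topic[(z_i, z_j)]` = P(z_j | z_i)
    captures topic coherence; `user_profile[z]` = P(z | u) weights the
    first topic by the user's preference.
    """
    topics = [term_topic[t] for t in candidate_terms]
    score = user_profile.get(topics[0], 0.0)
    for z_prev, z_next in zip(topics, topics[1:]):
        score *= topic_topic.get((z_prev, z_next), 0.0)
    return score
```

A refinement candidate whose terms sit in topics the user prefers, linked by high P(z_j | z_i), outranks topically incoherent or off-profile candidates.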
  • 37. Conclusion. The framework that incorporates personalization achieves the best performance. With user profiles, the topic-based scoring component is more reliable.
