Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab


Published on: Lab overview talk given at the 7th International Conference of the CLEF Association (CLEF 2016)


  1. Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab. http://living-labs.net @livinglabsnet. “Give us your ranking, we’ll have it clicked!” Krisztian Balog (University of Stavanger), Liadh Kelly (Trinity College Dublin), Anne Schuth (Blendle). 7th International Conference of the CLEF Association (CLEF 2016), Évora, Portugal, 2016
  2. Living Labs for IR Evaluation
  3. Motivation - Overall goal: make information retrieval evaluation more realistic. [Diagram: new retrieval method, users, live site, interaction data] #1: How to test a new method with real users in their natural task environment (i.e., on the live site)? #2: How to make interaction data available for method development?
  4. Key idea: new retrieval methods and the live site's users and data (docs/products, logs, etc.) are connected through an API. K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14
  5. Key idea #1: An API orchestrates all data exchange between the live site and experimental systems. (Balog et al., CIKM'14)
  6. Key idea #2: Focus on frequent (head) queries: ranked result lists can be generated offline, and there is enough traffic on them (historical & live). (Balog et al., CIKM'14)
  7. Key idea #3: Target medium to large organizations with a fair amount of search volume; they typically lack their own R&D department. (Balog et al., CIKM'14)
  8. Methodology, step 1: queries, candidate documents, and historical search and click data are made available through the API. Example query objects: { "queries": [ { "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", "qid": "R-q1", "qstr": "monster high", "type": "train" }, { "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", "qid": "R-q51",
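The query feed above can be consumed with a few lines of code. A minimal sketch, assuming only the response shape shown on the slide; the query string for R-q51 ("lego duplo") is an invented placeholder, and in the lab the JSON would be fetched from the API with a participant key:

```python
import json

# Response in the shape shown on the slide (trimmed). In the lab this JSON
# would come from the Living Labs API; "lego duplo" for R-q51 is invented.
response = """
{
  "queries": [
    {"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
     "qid": "R-q1", "qstr": "monster high", "type": "train"},
    {"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
     "qid": "R-q51", "qstr": "lego duplo", "type": "test"}
  ]
}
"""

def queries_by_type(raw, qtype):
    """Return the query strings of the given type ("train" or "test")."""
    return [q["qstr"] for q in json.loads(raw)["queries"] if q["type"] == qtype]

print(queries_by_type(response, "train"))
```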
  9. Methodology, step 1 (continued): example candidate document list: { "doclist": [ { "docid": "R-d1291", "site_id": "R", "title": "LEGO DUPLO Hamupipőke hintója 6153" }, { "docid": "R-d1306", "site_id": "R", "title": "LEGO Rendőrkapitányság 5681"
  10. Methodology, step 1 (continued): example document content: { "content": { "age_max": 3, "age_min": 1, "arrived": "2014-08-28", "available": 0, "brand": "Lego", "category": "LEGO", "category_id": "38", "characters": [], "description": "Lego Duplo - Építő- és j
  11. Methodology, step 2: rankings are generated for each query and uploaded through the API: { "qid": "U-q22", "runid": "82", "creation_time": "Wed, 04 Jun 2014 15:03:56 -0000", "doclist": [ { "docid": "U-d4" }, { "docid": "U-d2" }, ...
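A run matching the shape shown above can be assembled as follows. This is a sketch assuming only the fields visible on the slide; the upload URL in the comment is hypothetical (the real endpoint is in the platform documentation):

```python
import json

def make_run(qid, runid, ranked_docids):
    """Assemble a run for one query in the shape shown on the slide."""
    return {
        "qid": qid,
        "runid": runid,
        "doclist": [{"docid": d} for d in ranked_docids],
    }

run = make_run("U-q22", "82", ["U-d4", "U-d2"])
payload = json.dumps(run)
# Uploading would be an HTTP PUT of this payload to the participant's run
# endpoint, e.g. requests.put(API_BASE + "/run/" + KEY + "/U-q22", data=payload)
# (URL shape hypothetical; see the platform documentation for the real one).
```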
  12. Methodology, step 3: when any of the test queries is fired, the live site requests rankings from the API and interleaves them with those of the production system.
  13. Interleaving - The site provides the set of candidate items that can be re-ranked (a safety mechanism). - The experimental ranking is interleaved with the production ranking. - Needs 1-2 orders of magnitude less data than A/B testing (it is also a within-subject rather than a between-subject design). Example: system A ranks [doc 1, doc 2, doc 3, doc 4, doc 5] and system B ranks [doc 2, doc 4, doc 7, doc 1, doc 3]; a possible interleaved list is [doc 1, doc 2, doc 4, doc 3, doc 7]. Inference: A > B.
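The interleaving step can be sketched as team-draft interleaving: a simplified illustration, not the platform's own code, using the example rankings from the slide:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng=random):
    """Team-draft interleaving (sketch): each round, a coin flip decides which
    system drafts first; each system then adds its highest-ranked document not
    yet in the interleaved list. The contributing team is remembered per
    document so that clicks can later be credited to A or B."""
    interleaved, teams = [], {}
    while len(interleaved) < length:
        before = len(interleaved)
        order = [("A", ranking_a), ("B", ranking_b)]
        rng.shuffle(order)  # coin flip: who drafts first this round
        for team, ranking in order:
            doc = next((d for d in ranking if d not in teams), None)
            if doc is not None and len(interleaved) < length:
                interleaved.append(doc)
                teams[doc] = team
        if len(interleaved) == before:  # both rankings exhausted
            break
    return interleaved, teams

# The example from the slide: system A ranks docs 1-5, system B ranks 2,4,7,1,3.
a = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
b = ["doc 2", "doc 4", "doc 7", "doc 1", "doc 3"]
mixed, teams = team_draft_interleave(a, b, 5, random.Random(42))
```

Because the coin flips are random, different impressions of the same query can yield different interleaved lists; only the per-document team labels matter for crediting clicks.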
  14. Methodology, step 4: participants get detailed feedback on user interactions (clicks): { "feedback": [ { "qid": "S-q1", "runid": "baseline", "type": "tdi", "doclist": [ { "docid": "S-d1", "clicked": true, "team": "site",
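Crediting clicks from such feedback is straightforward. A sketch assuming the field names shown on the slide; "participant" as the second team label is an assumption, since the snippet only shows "site":

```python
import json
from collections import Counter

# Feedback in the shape shown on the slide (trimmed). The "team" field says
# which side contributed the clicked document; "participant" is assumed here.
feedback = json.loads("""
{
  "feedback": [
    {"qid": "S-q1", "runid": "baseline", "type": "tdi",
     "doclist": [
       {"docid": "S-d1", "clicked": true, "team": "site"},
       {"docid": "S-d3", "clicked": true, "team": "participant"},
       {"docid": "S-d2", "clicked": false, "team": "participant"}
     ]}
  ]
}
""")

def clicks_per_team(fb):
    """Count clicks credited to each team across all impressions."""
    counts = Counter()
    for impression in fb["feedback"]:
        for doc in impression["doclist"]:
            if doc["clicked"]:
                counts[doc["team"]] += 1
    return counts
```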
  15. Methodology, step 5: the ultimate measure is the number of “wins” against the production system, aggregated over a period of time: Outcome = #Wins / (#Wins + #Losses)
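The outcome measure can be computed directly from per-impression click counts. A minimal sketch with hypothetical numbers, treating an impression as a win when the participant's documents receive more clicks than the site's:

```python
def outcome(impressions):
    """Outcome = #Wins / (#Wins + #Losses). Each impression is a
    (participant_clicks, site_clicks) pair; more participant clicks is a win,
    fewer is a loss, and ties are ignored."""
    wins = sum(1 for p, s in impressions if p > s)
    losses = sum(1 for p, s in impressions if p < s)
    return wins / (wins + losses) if wins + losses else 0.0

# Hypothetical impressions: 2 wins, 1 loss, 1 tie -> outcome = 2/3
print(outcome([(2, 1), (0, 1), (1, 1), (3, 0)]))
```

An outcome above 0.5 means the experimental ranker beat the production system over the aggregation period.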
  16. What is in it for participants? - Access to privileged commercial data (search and click-through data). - Opportunity to test IR systems with real, unsuspecting users in a live setting (not the same as crowdsourcing!). - Continuous evaluation is possible, not limited to a yearly evaluation cycle.
  17. The Living Labs Platform
  18. Source code: https://bitbucket.org/living-labs/ll-api
  19. Documentation: http://doc.living-labs.net/
  20. Dashboard: http://dashboard.living-labs.net/
  21. CLEF LL4IR
  22. Use-cases: • Product search (REGIO Játék) • Web search (Seznam)
  23. Benchmark organization (by query type and period):
     - train queries, training period: feedback available; individual feedback; update possible
     - test queries, training period: feedback available; no individual feedback; update possible
     - test queries, test period: no feedback available; no individual feedback; update not possible
  24. Product search - Ad-hoc retrieval over a product catalog - Several thousand products - Limited amount of text, lots of structure - Categories, characters, brands, etc.
  25. Product data
  26. Product data fields: product name; price / bonus price; short description; recommended age from/to; gender recommendation; categories; brands; long description; (links to) photos
  27. { "content": { "age_max": 10, "age_min": 6, "arrived": "2014-08-28", "available": 1, "brand": "Mattel", "category": "Babák, kellékek", "category_id": "25", "characters": [], "description": "A Monster High® iskola szörnycsemetéi […]", "gender": 2, "main_category": "Baba, babakocsi", "main_category_id": "3", "photos": [ "http://regiojatek.hu/data/regio_images/normal/20777_0.jpg", "http://regiojatek.hu/data/regio_images/normal/20777_1.jpg", […] ], "price": 8675.0, "product_name": "Monster High Scaris Paravárosi baba többféle", "queries": { "clawdeen": "0.037", "monster": "0.222", "monster high": "0.741" }, "short_description": "A Monster High® iskola szörnycsemetéi első külföldi útjukra indulnak..." }, "creation_time": "Mon, 11 May 2015 04:52:59 -0000", "docid": "R-d43", "site_id": "R", "title": "Monster High Scaris Paravárosi baba többféle" } (The "queries" object lists the frequent queries that led to the product.)
  28. Queries - Typically very short: monster high, magnetiz, duplo, lego friends, geomag, trash+pack, barbie, monopoly, lego duplo, transformers, star wars, nerf, carrera, baba
  29. Results (2015). [Chart: Outcome (0-0.6) over evaluation rounds 0-5 for Baseline, UiS, GESIS, and IRIT]
  30. Inventory changes. [Chart: number of products per day (05-01 to 05-15) that were new arrivals, became available, or became unavailable]
  31. Summary and Outlook
  32. Summary - Successes: - The experimental methodology; many interesting opportunities to address current limitations (come to the NewsREEL & LL4IR session tomorrow). - The living labs platform: open source, can be used for a variety of tasks. - Some interesting work for product search (see the best of the labs session). - Lack of success: - Raising sufficient interest in the use-cases at CLEF.
  33. Limitations / Open issues - Head queries only: a considerable portion of traffic, but only popular information needs. - Lack of context: no knowledge of the searcher’s location, previous searches, etc. - No real-time feedback: the API provides detailed feedback, but it is not immediate. - Limited control: experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list. - Ultimate measure of success: search is only a means to an end, not the ultimate goal.
  34. TREC Open Search: http://trec-open-search.org/ - Use-case: academic search - Ad-hoc document search - Sites: CiteSeerX; SSOAR (German Social Sciences); Microsoft Academic Search - Round #3 runs from Oct 1 to Nov 15
  35. We ♥ you! living-labs.net Thanks to
