Looking Forward
Prof.dr.ir. Arjen P. de Vries
arjen@acm.org
Gaithersburg MD, November 15th, 2016
Q: “TREC Anniversary”
Top Result:
 50 years of Star Trek
(Article on The Verge about Facebook Like buttons)
Science Fiction
 Defining a TREC task or a track is like time-travel in Back to the Future
Note to the audience: that is just 74 characters
You could even add the hashtags #TREC #TRECCelebrations and my Twitter handle @arjenpdevries
Better Search – “Deep Personalization”
 “Even more broadly than trying to get people the right content based on their context, we as a community need to be thinking about how to support people through the entire search experience.”
Jaime Teevan on “Slow Search”
 Search as a dialogue
My first journal paper:
De Vries, Van der Veer and Blanken: Let’s talk about it: dialogues with multimedia databases (1998)
Moving Forward
 Elements of the “Slow Search movement” at TREC today:
- Sessions
- Tasks
- Dynamic domains
- Total recall
- Complex Answer Retrieval (new!)
Missing from TREC!
 Access to rich personal data including email, browsing history, documents read and contents of the user’s home directory…
Trade log data!
Feild, H., Allan, J., and Glatt, J. (2011). “CrowdLogging: Distributed, private, and anonymous search logging.” In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR ’11), pp. 375–384.
“We describe an approach for distributed search log collection, storage, and mining, with the dual goals of preserving privacy and making the mined information broadly available. [..] The approach works with any search behavior artifact that can be extracted from a search log, including queries, query reformulations, and query-click pairs.”
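A minimal sketch of the thresholding idea behind this kind of privacy-preserving log aggregation, under stated assumptions: all names and the cutoff K are illustrative, and the actual CrowdLogging protocol uses encryption with distributed storage rather than the plaintext counting shown here. Each user's client extracts artifacts locally; the aggregator releases only artifacts reported by at least K distinct users.

from collections import defaultdict
from hashlib import sha256

K = 10  # assumed release threshold: at least K distinct users per artifact

def extract_artifacts(local_log):
    # Client side: turn a private search log into shareable artifacts
    # (queries, query reformulations, and query-click pairs).
    artifacts, prev = [], None
    for entry in local_log:  # entry: {"query": str, "click": str or None}
        artifacts.append(("query", entry["query"]))
        if prev is not None:
            artifacts.append(("reformulation", prev, entry["query"]))
        if entry.get("click"):
            artifacts.append(("query-click", entry["query"], entry["click"]))
        prev = entry["query"]
    return artifacts

class Aggregator:
    # Server side: count distinct users per artifact and reveal only those
    # that clear the threshold. (The real system encrypts artifacts so the
    # server cannot read anything below threshold; plaintext storage here
    # is only to keep the sketch short.)
    def __init__(self):
        self.users = defaultdict(set)  # artifact key -> reporting user ids
        self.artifacts = {}            # artifact key -> artifact tuple

    def report(self, user_id, artifacts):
        for a in artifacts:
            key = sha256(repr(a).encode()).hexdigest()
            self.users[key].add(user_id)
            self.artifacts[key] = a

    def release(self):
        return [self.artifacts[k] for k, u in self.users.items() if len(u) >= K]

Seen this way, the open challenges on the next slide map onto which artifacts a user lets the client emit, and what a released artifact is worth.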
Open challenges
 How to select the part of your log data you are willing to trade?
 How to estimate the value of this log data?
 And a social challenge, not so much a scientific one: how to get people to participate?
Branding
Branding (NL)
The TREC Brand
 A community that creates reusable test collections
Extra Slides
Reproducibility vs. Representativeness
 Increasing representativeness of a TREC task should not come at the cost of sacrificing reproducibility
(104 characters)
Samar, T., Bellogín, A. & de Vries, A.P. Inf Retrieval J (2016) 19: 230.
doi: 10.1007/s10791-015-9276-9
Baltimore
 Title query of TREC topic 478 for the information need “Who is the mayor of Baltimore”
 “The honest conclusion of this year’s evaluation should be that we underestimated the problem of handling Web data. Surprising is the performance of the title-only queries doing better than queries including description or even narrative. It seems that the web-track topics are really different from the previous TREC topics in the ad-hoc task, for which we never weighted title terms different from description or narrative.”
(Quote from the CWI TREC-9 paper)
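The mismatch described in the quote is what per-field term weighting addresses: give title terms a larger weight than description or narrative terms when building the query. A minimal sketch of that idea, where the field weights and the toy additive scoring are illustrative assumptions rather than the CWI system:

# Per-field query weighting for a TREC topic; weights are assumed values.
FIELD_WEIGHTS = {"title": 3.0, "desc": 1.0, "narr": 0.5}

def weighted_query(topic):
    # Merge all topic fields into one bag of weighted query terms.
    weights = {}
    for field, boost in FIELD_WEIGHTS.items():
        for term in topic.get(field, "").lower().split():
            weights[term] = weights.get(term, 0.0) + boost
    return weights

def score(document, query_weights):
    # Toy scorer: sum the weights of query terms present in the document.
    terms = set(document.lower().split())
    return sum(w for t, w in query_weights.items() if t in terms)

topic_478 = {"title": "baltimore",
             "desc": "who is the mayor of baltimore"}
q = weighted_query(topic_478)
print(score("the new mayor of baltimore took office", q))

With weights like these, the stopword-heavy description terms can no longer drown out the single title term, which is one reading of why the title-only runs came out ahead in TREC-9.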
