2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via

The New York Times has had search for a long time, but 2018 was the year in which the company engaged with relevance in a deep way. The aim of this talk is to share what we've learned as we've increased our search sophistication, along with some of the challenges we still face.

Some of the techniques we've adopted in this past year include offline metrics testing, reflective testing, and user engagement metrics. We now have a process in place to quickly get mapping changes out to production. As a team, we now also have a vocabulary for talking about relevance and can use it to discuss trade-offs and goals in conjunction with our metrics.

We hope this talk is of use to those who've put off working on search relevance due to fear, uncertainty, or ambivalence. We will talk about how we went from working on everything but search relevance to finally pulling back the curtain on the search system. We hope what we've learned can help others get started.
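
The offline metrics mentioned in the description above can be made concrete with a small sketch. The summary slide names precision, NDCG, and ERR; the code below computes precision at k and NDCG for a single ranked list. The grades 5, 5, 2, 1 are taken from the Offline Metrics slide, but reading them as per-result relevance judgments is an assumption, as are the function names, the linear gain, and the threshold of 3 for counting a result as relevant. This is not the team's actual tooling.

```python
import math


def dcg(grades):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))


def ndcg(grades):
    """DCG divided by the DCG of the same grades in ideal (descending) order."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


def precision_at_k(grades, k, relevant_at=3):
    """Share of the top k results whose grade reaches the (assumed) threshold."""
    if k <= 0:
        return 0.0
    return sum(1 for g in grades[:k] if g >= relevant_at) / k


# Assumed judgments for the four "apple iphone" results, in ranked order.
ranked_grades = [5, 5, 2, 1]

print("NDCG:", ndcg(ranked_grades))
print("P@4:", precision_at_k(ranked_grades, 4))
print("NDCG if the ranking were reversed:", ndcg(ranked_grades[::-1]))
```

Because the assumed grades are already in descending order, this ranking scores the maximum NDCG, while the reversed list scores lower. Tracking such a number across a fixed set of judged queries is one way to tell whether a mapping or query change helped before it ships.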


  1. How The New York Times Tackles Relevance, or how to find "bruce lambert hot dogs". Jeremiah Via
  2. Background: • Surface all Times content • ~18 million assets • Powers a lot of systems at The Times
  3. Background: Ingest API, Elasticsearch, Data Processing, Mappings, Query Building
  4. Search was basic: "french toast"
  5. Query String Query: { "query_string" : { "query" : "french toast", "fields": ["headline", "body", "byline"], "operator": "AND", "minimum_should_match": "75%" } }
  6. Decay Function: Now, -100d, publication_date
  7. All Together: "french toast"
  8. Changes were Complaint Driven
  9. The Final Straw: bruce lambert hot dogs
  10. The Final Straw
  11. The Final Straw
  12. The Final Straw: (headline:bruce OR body:bruce OR byline:bruce) AND (headline:lambert OR body:lambert OR byline:lambert) AND (headline:hot OR body:hot OR byline:hot) AND (headline:dogs OR body:dogs OR byline:dogs)
  13. The Final Straw: Museum and Gallery Listings. An autumnal landscape called "Straw Dogs" exudes a stirring romance, part Sam Peckinpah, part Thomas Hardy. The lines dividing art, music and film were blurry enough to allow Laurie Anderson, Bruce Nauman and Yoko Ono, among others, to shift from object making to performance and back again. Yvon Lambert, 550 West 21st Street, Chelsea. The work is tightly installed, text-intensive and hot with information.
  14. Query 2.0
  15. Document Scores (query: us economy): The March Job Numbers Tell Us the Economy Is (Still) Doing Fin; What Booming Markets Are Telling Us About the Global Econom; The Terrible, Wonderful, Inscrutable Economy. Scores: 36.2, 32.7, 31.8
  16. Document Scores
  17. We needed to re-analyze data.
  18. Search Architecture: Ingest API, Elasticsearch
  19. Search Corpora. Date ranges: 1851-1859, 1860-1865, 1866-1959, 1960-1980, 1981-2004, 2004-2006, 2006-2019. Source labels as shown on the slide: Abstract XML, Full Text XML, Web Archive, CMS JSON, Civil War XML, Abstract XML, 60s-70s XML
  20. Search Architecture: ∞, 12 hours
  21. Patronymic Last Names: Danny O'Connor
  22. Patronymic Last Names: Danny O'Leary, Danny O'Connor, Danny O'Brien
  23. Patronymic Last Names: Danny O
  24. Noun Phrases: bret stephens first amendment
  25. Noun Phrases
  26. Noun Phrases
  27. Noun Phrases: first amendment, 1st_amendment
  28. We needed offline metrics.
  29. Offline Metrics (query: apple iphone): Apple Loses $900,000 To Fake iPhone Swaps (2019-04-06); It’s Your iPhone. Why Can’t You Fix It Yourself? (2019-04-06); Photo of iPhone is a fake, but buyers seem real (2006-09-12); An Apple Pie That Lasts for Days (2017-10-20). Numbers shown: 5, 5, 2, 1
  30. Hard Day's Night
  31. Hard Day's Night
  32. Hard Day's Night
  33. Offline Metrics
  34. P1
  35. OCR: A rEw I.ITEnAttYV181TOR: Zi'elteartllatbir. $gRT , the well- , is to make a tour in this . Mr. Smith, .has been eminently successful also as a lecturer; .treat us to descriptions of his recent travels in ~'. In England orer. ing houses."
  36. The Elasticsearch index contained 22 million unique tokens
  37. We needed online metrics.
  38. What do users actually do?
  39. Columns
  40. Trends Report
  41. Mapping Changes: The New York Times Index. https://www.flickr.com/photos/jichikawa
  42. The New York Times Index: { "subjects": [{ "uri": "nyt://subject/f12ea1c9-22f9-5034-9426-efd8039a80e1", "displayName": "Animal Abuse, Rights and Welfare", "vernacular": "Animal Abuse" }], "organizations": [{ "uri": "nyt://organization/84eb681b-64ef-5457-a600-b0679b8f9e87", "displayName": "Beyond Meat Inc", "vernacular": "Beyond Meat" }], "persons": [{ "uri": "nyt://person/c6755685-5568-56ba-b3da-2eca4bbe68a0", "displayName": "Friedrich, Bruce (1969- )", "vernacular": "Bruce Friedrich" }] }
  43. So much tagging: 1.4m Persons, 72K Locations, 207K Creative Works, 148K Subjects
  44. We Did It
  45. Summary
      • Understand search: You have to build a deep understanding of how search works and what all these systems are doing when you issue a query.
      • Make it possible to iterate: Improving search is about making lots of little improvements. You want to remove as much friction in this process as possible and get the infrastructure out of your way.
      • Know where you're going: You have to know whether your changes will improve things. This is where offline metrics like precision, NDCG, and ERR can help.
      • Learn about your users: Know how your users are using search. Without this, you can never really know what to focus on.
      • Leverage your company: You have colleagues with deep domain expertise. Engage them so you don't have to reinvent everything.
  46. Thank You
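
The "bruce lambert hot dogs" failure from slides 12 and 13 can be reproduced with a small sketch. It mimics the expanded boolean query, where each term only has to appear in any one of headline, body, or byline, and runs it against an abridged excerpt of the "Museum and Gallery Listings" text. The tokenizer here is a plain lowercase split standing in for a real Elasticsearch analyzer, and the function names and the empty byline are assumptions for illustration.

```python
import re

FIELDS = ("headline", "body", "byline")


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_fields(term, doc):
    """Return the fields whose tokens contain the term."""
    return [f for f in FIELDS if term in tokenize(doc.get(f, ""))]


def matches_and_query(query, doc):
    """True when every query term appears in at least one field."""
    return all(matching_fields(term, doc) for term in tokenize(query))


# Abridged excerpt of the listing shown on slide 13; byline left empty (assumed).
listing = {
    "headline": "Museum and Gallery Listings",
    "body": (
        'An autumnal landscape called "Straw Dogs" exudes a stirring romance. '
        "Laurie Anderson, Bruce Nauman and Yoko Ono, among others. "
        "Yvon Lambert, 550 West 21st Street, Chelsea. "
        "The work is tightly installed, text-intensive and hot with information."
    ),
    "byline": "",
}

query = "bruce lambert hot dogs"
for term in sorted(tokenize(query)):
    print(term, "->", matching_fields(term, listing))
print("document matches:", matches_and_query(query, listing))
```

Every term is found in the body, each in an unrelated sentence, so the listing satisfies the AND query even though it has nothing to do with Bruce Lambert or hot dogs. Term-by-term matching alone carries no notion of a name or a phrase, which is the gap the later slides on patronymic last names and noun phrases address.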
