Haystack keynote 2019: What is Search Relevance? - Max Irwin

Principal and Solr Consultant, OpenSource Connections
May 17, 2019

Editor's Notes

  1. <This slide will be shown while people are taking their seats.>
  2. <This slide will be shown when making announcements>
  3. This is a photo from last year’s Haystack. Raise your hand if you are in this photo! Clap icon by Berkah Icon from the Noun Project.
  4. This year, we’re going to talk about relevance, how it’s related to people, and how it’s related to machines. So what does relevance mean to each of them? And how do we unify them to bridge the gap?
  5. I like to talk about search quality that goes beyond relevance and considers experience and performance. And there’s an interesting parallel we can make if we try looking at relevance on its own. But the truth is that these three are inseparable. https://commons.wikimedia.org/wiki/File:Flower_jtca001.jpg
  6. Because we make big promises to customers. And we put our reputation on the line when catch phrases are espoused. So we need to take a closer look and understand what all this relevance stuff really means. Because none of those promises can be delivered unless we really know what’s relevant.
  7. Because for machines, you need an absolute and clearly defined mathematical rule to measure success. For search, that measure of success is relevance. https://commons.wikimedia.org/wiki/File:Gradient_method.svg
  8. But people and businesses have a very difficult time defining success when it relates to relevance – and even more so in a way that machines can understand. Because we have a complex and intimate understanding of the world around us. Simplifying it is not so easy. Let’s see why.
  9. While we pause to examine the cranial anatomy of the Relevance Engineer, we first ask them: “What is relevance?” http://clipart-library.com/clipart/8iEbGk88T.htm
  10. And they might show you this. Some of you may recognize this instantly but I’m sure it’s a mystery to many of you. This is the formula for “normalized Discounted Cumulative Gain”, better known as nDCG.
  11. And when we evaluate search with judgement data we get a number. Here’s a possible result. But what does it mean? How did we get that? Even if I showed you the query and the documents that produced this number, is there any real world comparison that you can ascribe it to?
  12. Let’s see how it works… and we’ll look at just the numerator, which is Discounted Cumulative Gain, or DCG. This will get you the DCG score for one query with p relevance-graded results (a minimal code sketch of this calculation follows these notes). <animate and read the steps>
  13. Recap
  14. Let’s look at all the possible combinations we’d ever see in the top 4 results, and what the nDCG score would be.
  15. Here’s what it looks like when you view the spectrum of all possible relevance combinations for the top five results of a query. The different colors represent how strict, or lenient, the graded punishment is. 1.0 nDCG is considered perfect success with relevance. 0.0 nDCG is complete irrelevance.
  16. We can also look at how strict we want to be in the score. These are just variations on the ‘punishment for lower rank’ part of nDCG. But you can see we have a good deal of control in tailoring nDCG to how we want to represent relevance (a sketch of these discount variations also follows these notes).
  17. But we’re still missing something. We have yet to define relevance at the atomic level, so that a machine can understand it.
  18. RELEVANCE DENOTES HOW WELL A RETRIEVED DOCUMENT MEETS A USER’S INFORMATION NEED. OK this is great. But what’s an information need?
  19. AN INFORMATION NEED IS A DESIRE TO LOCATE AND OBTAIN INFORMATION TO SATISFY A CONSCIOUS OR UNCONSCIOUS NEED. So that’s the really tricky part! How are we, the humble product and engineering folks, able to understand the needs of our customers when even they are not conscious of it?
  20. Well, we add judgements and we look at usage data, to try to dig into this problem and come up with a good shape and model of understanding to work towards.
  21. Let’s ask the humans for judgements first. While we would trust our experts to describe relevance, we need to be careful and make sure that it is done properly.
  22. So we look to something called inter-rater reliability. The field started in psychology. We draw from psychology because it has the tools we need to dive into the conscious and unconscious minds of our customers and raters! Inter-rater reliability came from the need to measure consensus on patient diagnoses.
  23. Let’s look to the work of Klaus Krippendorff. https://50.asc.upenn.edu/drupal/klaus-krippendorff
  24. He developed a coefficient now known as Krippendorff’s Alpha. https://en.wikipedia.org/wiki/Krippendorff%27s_alpha
  25. Which measures the deviation of agreement from chance. So if you were to pick your relevance judgements at random, it would give you zero. More agreement between raters gets you closer to 1. Interestingly, it’s possible to have a negative alpha if the disagreement is worse than random! (A minimal sketch of the calculation also follows these notes.)
  26. So you may get a number like this. And you’ll see that perhaps your raters don’t really agree. With a large group, what does that mean? I can’t trust anyone?
  27. Well, there’s a physics professor named Arpad Elo, who developed the Elo rating system for chess players.
  28. It gives you a way to rate someone in a competition, to see how likely they are to win against others. It was initially used for chess, but it can be used to measure almost any competitive system. Here’s an example of tennis champions and their Elo ratings. https://www.betfair.com.au/hub/an-introduction-to-tennis-modelling/
  29. Let’s play a game. We will give our raters a starting Elo rating. We will turn rating agreement into a contest, and award those who agree on relevance.
  30. If you agree with someone else, you win. If you disagree with everyone else, you lose. https://emojipedia.org/apple/
  31. Here’s an example of 106 crowdsourced raters being compared for agreement 50,000 times and getting an Elo rating. Everyone starts with a rating of ‘1’. After playing enough relevance judgement games, this shows each rater’s likelihood to agree or disagree, as represented by their rating (one possible sketch of this game, in code, follows these notes). There is vast diversity here. Some are very likely to agree, and some are very likely to disagree. But does that make those who disagree wrong?
  32. It might not. Because everyone has their own mind, and everyone has their own needs.
  33. OK, taking a step back, let’s look at the logs.
  34. A huge problem that many teams face is that relevance is really a data annotation and training problem. And it is difficult to attribute automatic data annotation to relevance success when you can’t understand the numbers being produced by something like nDCG. So you’ll be walking through the forest gathering data berries… …and then you get chased out by the reality werewolf of misunderstanding how to interpret data. So you need a plan and a methodology for taking the right path. https://upload.wikimedia.org/wikipedia/commons/9/91/Werewolf.svg https://www.flickr.com/photos/99873033@N08/17780417711
  35. Now we’ll turn to the logs. If you have mature enough analytics capture, you might have this data: what we call conversations, or sessions, in search.
  36. If we remember that we need a goal to… https://commons.wikimedia.org/wiki/File:Time_study_stopwatch.JPG
  37. MAKE THIS LESS CONFUSING BY ONLY USING THE BLUE LINE
  38. And we see the obvious connection that you probably already knew when we started. It’s time, it’s effort. It’s those factors which weigh heavily on whether our customers will be happy with search. Getting that needle down as low as possible is the great frontier of search. But this is really just one methodology! You have your own, because your product and your users are unique!
  39. We are the community that pushes advancements for open search methodology. We’ve got some great talks. We’ve got some great people. We’re here to learn, connect, and grow. https://commons.wikimedia.org/wiki/File:Flower_jtca001.jpg
  40. Welcome to Haystack!
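
Code sketches (referenced from the notes above)

A minimal Python sketch of the DCG and nDCG calculation described in note 12. It assumes graded relevance labels (e.g. 0–3) for the top p results and the common log2 rank discount; names like `gains` are illustrative, not taken from the talk.

```python
import math

def dcg(gains, p=None):
    # Discounted Cumulative Gain for one query: each result's gain is
    # discounted by log2(rank + 1), so lower-ranked hits count for less.
    gains = gains[:p] if p else gains
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, p=None):
    # Normalize by the ideal ordering (the same gains sorted best-first):
    # 1.0 means a perfect ranking, 0.0 means nothing relevant was returned.
    ideal = dcg(sorted(gains, reverse=True), p)
    return dcg(gains, p) / ideal if ideal > 0 else 0.0

# Graded judgements (3 = perfect, 0 = irrelevant) for the top 4 results:
print(ndcg([3, 2, 3, 0]))  # ~0.98, nearly the ideal ordering
print(ndcg([0, 0, 2, 3]))  # ~0.39, the good documents are ranked last
```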
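
Notes 14–16 enumerate every relevance combination in the top results and vary how harshly lower ranks are punished. A minimal sketch of that exercise, assuming binary relevance and two example discount functions; the exact discount variants plotted in the talk are not given, so these are stand-ins.

```python
import math
from itertools import product

def ndcg(gains, discount):
    # nDCG for one query with a pluggable rank-discount function.
    score = lambda g: sum(x * discount(rank) for rank, x in enumerate(g, start=1))
    ideal = score(sorted(gains, reverse=True))
    return score(gains) / ideal if ideal > 0 else 0.0

lenient = lambda rank: 1 / math.log2(rank + 1)  # the standard log2 discount
strict = lambda rank: 1 / rank                  # punishes low ranks much harder

# All 16 binary relevance combinations for the top 4 results of a query:
for combo in product([0, 1], repeat=4):
    print(combo,
          round(ndcg(list(combo), lenient), 3),
          round(ndcg(list(combo), strict), 3))
```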
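
Notes 23–25 introduce Krippendorff’s alpha, which is 1 minus the ratio of observed disagreement to the disagreement expected by chance. A minimal sketch for nominal relevance labels with no missing ratings; real judgement sets usually have missing cells, which a full implementation (e.g. the krippendorff package on PyPI) handles.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    # units: one list of labels per judged item (each item rated by 2+ raters).
    # alpha = 1 - observed disagreement / disagreement expected by chance.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        for a, b in permutations(labels, 2):   # ordered label pairs within an item
            coincidences[(a, b)] += 1 / (m - 1)
    totals = Counter()
    for (a, _), w in coincidences.items():
        totals[a] += w
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a, b in permutations(totals, 2)) / (n - 1)
    return 1 - observed / expected if expected else 1.0

# Three raters grading four documents as relevant (1) or not (0):
print(krippendorff_alpha_nominal([[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]))  # ~0.39
```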
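
Notes 29–31 turn rater agreement into an Elo-style game. The notes don’t spell out the exact pairing rules, so the sketch below is one possible interpretation: on each judged document a rater “wins” if their label matches the majority of the other raters and “loses” otherwise, with a draw when the others are split. The talk starts everyone at a rating of 1; the conventional Elo baseline and K-factor are used here instead.

```python
from collections import Counter

K = 32  # Elo step size; a conventional choice, not from the talk

def expected_score(r_a, r_b):
    # Standard Elo expectation: probability that a player rated r_a
    # beats a player rated r_b.
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def play_agreement_games(judgements, start=1000):
    # judgements maps item -> {rater: relevance label}.
    ratings = {}
    for labels in judgements.values():
        for rater, label in labels.items():
            others = [o for o in labels if o != rater]
            if not others:
                continue
            counts = Counter(labels[o] for o in others)
            (top_label, top_count), = counts.most_common(1)
            tied = sum(1 for c in counts.values() if c == top_count) > 1
            actual = 0.5 if tied else (1.0 if label == top_label else 0.0)
            mine = ratings.get(rater, start)
            field = sum(ratings.get(o, start) for o in others) / len(others)
            ratings[rater] = mine + K * (actual - expected_score(mine, field))
    return ratings

# Three raters labelling three documents as relevant (1) or not (0):
judgements = {
    "doc1": {"alice": 1, "bob": 1, "carol": 0},
    "doc2": {"alice": 1, "bob": 0, "carol": 0},
    "doc3": {"alice": 1, "bob": 1, "carol": 1},
}
print(play_agreement_games(judgements))  # per-rater agreement ratings
```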