Every information retrieval practitioner ordinarily struggles with evaluating how well a search engine is performing and with reproducing the performance achieved at a specific point in time.
Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
Additionally, it is extremely important to track the evolution of the search system over time and to be able to reproduce and measure the same performance (through metrics of interest such as precision@k, recall, NDCG@k...).
The talk will describe the Rated Ranking Evaluator from a researcher's and software engineer's perspective.
RRE is an open source search quality evaluation tool that can be used to produce a set of reports about the quality of a system, iteration after iteration, and that can be integrated within a continuous integration infrastructure to monitor quality metrics after each release.
Focus of the talk will be to raise public awareness of search quality evaluation and reproducibility, describing how RRE can help the industry.
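To make the metrics named above concrete, here is a minimal sketch (plain Python, not RRE's internals) of how precision@k and NDCG@k can be computed from a ranked result list and a set of rated judgments; being purely deterministic functions of ratings and results, they are exactly the kind of quantity that can be reproduced run after run:

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def ndcg_at_k(ranked_ids, gains, k):
    """gains maps doc id -> graded relevance (e.g. 0-3)."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["d3", "d1", "d7", "d2"]
print(precision_at_k(ranked, {"d1", "d2"}, k=3))            # 0.333...
print(ndcg_at_k(ranked, {"d1": 3, "d2": 2, "d9": 1}, k=3))  # deterministic, hence reproducible
```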
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation - Alessandro Benedetti
Every team working on Information Retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (at a specific point in time and historically).
Evaluating search quality is important both to understand and quantify the improvement or regression of your search application across development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements, a helpful tool must be:
- flexible and highly configurable for a technical user
- immediate, visual and concise for optimal business use
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often met with ad-hoc partial solutions that each require a considerable amount of development and customization effort.
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
This talk will introduce RRE, describe its latest developments and demonstrate how it can be integrated into a project to measure and assess the search quality of your search application.
The focus of the presentation will be a live demo showing an example project with a set of initial relevancy issues that we solve iteration after iteration, using RRE's output to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
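RRE itself is driven by JSON configuration and ratings files; without reproducing its exact schema here, the core loop such a tool runs can be sketched as follows (all names are illustrative, and ndcg_at_k is the helper from the earlier sketch):

```python
# Hypothetical evaluation loop; illustrative only, not RRE's actual API.
ratings = {
    "jeans": {"d12": 3, "d34": 2},   # query -> rated documents (gain 0-3)
    "red dress": {"d77": 3},
}

def evaluate(search_fn, ratings, k=10):
    """search_fn(query) returns a ranked list of doc ids from the live index."""
    return {q: ndcg_at_k(search_fn(q), gains, k) for q, gains in ratings.items()}

# Run once per configuration/release and diff the per-query reports to see
# exactly which queries improved or regressed between iterations.
```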
Search Quality Evaluation: a Developer Perspective - Andrea Gazzarini
Search quality evaluation is an evergreen topic every search engineer ordinarily struggles with. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
The slides will focus on how a search quality evaluation tool can be seen from a practical developer perspective, how it can be used to produce a deliverable artifact, and how it can be integrated within a continuous integration infrastructure.
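As a sketch of the continuous integration angle, a quality gate can be as simple as a test that fails the build when a headline metric regresses. The baseline value, the evaluate() helper from the previous sketch, and search_client are all illustrative assumptions:

```python
BASELINE_NDCG = 0.72  # hypothetical value recorded from the previous release

def test_search_quality_does_not_regress():
    # evaluate() and ratings come from the sketch above; search_client is
    # whatever callable queries your live index.
    report = evaluate(search_client, ratings, k=10)
    mean_ndcg = sum(report.values()) / len(report)
    assert mean_ndcg >= BASELINE_NDCG - 0.01, (
        f"mean NDCG@10 dropped to {mean_ndcg:.3f} (baseline {BASELINE_NDCG})")
```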
A Learning to Rank Project on a Daily Song Ranking Problem - Sease
Ranking data, i.e., ordered lists of items, naturally appear in a wide variety of situations; understanding how to adapt a specific dataset and design the best approach to solve a ranking problem in a real-world scenario is thus crucial. This talk aims to illustrate how to set up and build a Learning to Rank (LTR) project starting from the available data, in our case a Spotify dataset (available on Kaggle) on the Worldwide Daily Song Ranking, and ending with the implementation of a ranking model. A step-by-step (phased) approach to this task using open source libraries will be presented. We will examine in depth the most important part of the pipeline, the data preprocessing, and in particular how to model and manipulate the features in order to create the proper input dataset, tailored to the machine learning algorithm's requirements.
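As a hint of what that preprocessing can look like, here is a hedged pandas sketch: the column names (Position, Streams, Region) follow the Kaggle dataset, while the file name and binning thresholds are purely illustrative:

```python
import pandas as pd

df = pd.read_csv("spotify_daily_rankings.csv")  # hypothetical local copy

# Turn the chart position into a graded relevance label:
# the higher a song charted, the higher its gain.
df["relevance"] = pd.cut(df["Position"], bins=[0, 10, 50, 100, 200],
                         labels=[3, 2, 1, 0]).astype(int)

# Encode a categorical feature numerically, as most LTR algorithms require.
df["region_id"] = df["Region"].astype("category").cat.codes

features, labels = df[["Streams", "region_id"]], df["relevance"]
```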
Entity Search on Virtual Documents Created with Graph Embeddings - Sease
Entity Search is a search paradigm that aims to retrieve entities and all the information related to them. In the last few years the importance of this topic has grown considerably, given that nowadays about 40% of user queries mention specific entities.
This talk gives a first overview of the state-of-the-art methods used for entity retrieval and then describes the new approach Anna implemented and proposed in her master's thesis. The novelty introduced with this work exploits two machine learning techniques: neural networks and clustering.
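Without reproducing the thesis pipeline, the two techniques it combines can be illustrated in a few lines: entities are represented by graph-embedding vectors and then clustered, and each cluster can be merged into a "virtual document" that is indexed and retrieved like ordinary text. This scikit-learn sketch uses random placeholder embeddings and invented entity ids:

```python
import numpy as np
from sklearn.cluster import KMeans

entity_ids = ["Q42", "Q1", "Q937", "Q5"]
vectors = np.random.rand(4, 128)  # placeholder graph embeddings (e.g. node2vec output)

# Entities landing in the same cluster are candidates for one virtual document.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
for eid, c in zip(entity_ids, clusters):
    print(eid, "-> virtual document", c)
```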
How to Build your Training Set for a Learning To Rank Project - Sease
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, to the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
– model and collect the necessary feedback from users (implicit or explicit)
– calculate for each training sample a relevance label that is meaningful and not ambiguous (Click-Through Rate, Sales Rate …)
– transform the raw data collected into an effective training set (in the numerical vector format most LTR training libraries expect)
Join us as we explore real-world scenarios and dos and don'ts from the e-commerce industry.
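As a taste of those last two steps, here is a hedged sketch (all numbers, column names and thresholds invented) that derives a graded label from click-through rate and emits the SVMlight-style rows many LTR tools, such as RankLib, consume:

```python
import pandas as pd

logs = pd.DataFrame({
    "query_id":    [1, 1, 2],
    "doc_id":      ["a", "b", "c"],
    "impressions": [100, 100, 40],
    "clicks":      [30, 2, 20],
    "feat_price":  [9.9, 19.9, 5.0],
    "feat_rating": [4.5, 3.0, 4.8],
})

# Click-through rate as an implicit relevance signal, bucketed into grades
# so that noisy raw rates do not become ambiguous labels.
logs["ctr"] = logs["clicks"] / logs["impressions"]
logs["label"] = pd.cut(logs["ctr"], bins=[-1, 0.05, 0.15, 1.0],
                       labels=[0, 1, 2]).astype(int)

# "<label> qid:<query> <feature_id>:<value> ..." is the row format
# most LTR training libraries expect.
for _, r in logs.iterrows():
    print(f"{r.label} qid:{r.query_id} 1:{r.feat_price} 2:{r.feat_rating}")
```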
Haystack London - Search Quality Evaluation, Tools and Techniques - Andrea Gazzarini
Every search engineer ordinarily struggles with the task of evaluating how well a search engine is performing. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going. The talk will describe the Rated Ranking Evaluator from a developer perspective. RRE is an open source search quality evaluation tool that can be used to produce a set of deliverable reports and that can be integrated within a continuous integration infrastructure.
What the Rated Ranking Evaluator is and how to use it (for both Software Engineers and IT Managers). A talk given during the Chorus Workshops at Plainschwarz Salon.
How to Build your Training Set for a Learning To Rank Project - Haystack - Sease
Presented by Alessandro Benedetti of Sease. Learning to Rank (LTR) is the application of machine learning techniques (typically supervised) to the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular, organizations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
- model and collect the necessary feedback from users (implicit or explicit)
- calculate for each training sample a relevance label that is meaningful and not ambiguous (Click-Through Rate, Sales Rate ...)
- transform the raw data collected into an effective training set (in the numerical vector format most LTR training libraries expect)
Join us as we explore real-world scenarios and dos and don'ts from the e-commerce industry.
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval... - Sease
For more details:
https://sease.io/2020/04/the-importance-of-online-testing-in-learning-to-rank-part-1.html
https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, to the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017, and Elasticsearch has had an open source plugin since 2018), organizations struggle with the problem of how to evaluate the quality of the models they train.
This talk explores all the major points of both Offline and Online evaluation.
Setting up correct infrastructures and processes for a fair and effective evaluation of the trained models is vital for measuring the improvements/regressions of an LTR system.
The talk is intended for:
– Product Owners, Search Managers, Business Owners
– Software Engineers, Data Scientists, and Machine Learning Enthusiasts
Expect to learn:
– the importance of Offline testing from a business perspective
– how Offline testing can be done with Open Source libraries
– how to build a realistic test set from the original input data set, avoiding common mistakes in the process
– the importance of Online testing from a business perspective
– A/B testing and Interleaving approaches: details and pros/cons
– common mistakes and how they can skew the obtained results
Join us as we explore real-world scenarios and dos and don’ts from the e-commerce industry!
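As one concrete example from the online side, here is a minimal sketch of team-draft interleaving: the two rankers take turns picking their best not-yet-shown document, and clicks are credited to the team that contributed each result. Production systems add de-duplication, logging and significance testing on top of this:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, k=10):
    interleaved, team_of = [], {}
    picks_a = picks_b = 0
    while len(interleaved) < k:
        # The team with fewer picks drafts next; ties are broken randomly.
        a_turn = picks_a < picks_b or (picks_a == picks_b and random.random() < 0.5)
        source = ranking_a if a_turn else ranking_b
        doc = next((d for d in source if d not in team_of), None)
        if doc is None:
            break  # that ranking is exhausted
        team_of[doc] = "A" if a_turn else "B"
        interleaved.append(doc)
        if a_turn:
            picks_a += 1
        else:
            picks_b += 1
    # Clicks on A-picked vs B-picked documents decide the winner.
    return interleaved, team_of

shown, teams = team_draft_interleave(["d1", "d2", "d3"], ["d9", "d2", "d5"], k=4)
```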
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit... - Sease
RRE is an open-source search quality evaluation tool that can be used to produce a set of reports about the quality of a system, iteration after iteration, and that can be integrated within a continuous integration infrastructure to monitor quality metrics after each release.
Many aspects remained problematic, though:
– how do you directly evaluate a middle-layer search API that communicates with Apache Solr or Elasticsearch?
– how do you easily generate explicit and implicit ratings without spending hours on tedious JSON files?
– how do you better explore the evaluation results, with nice widgets and interesting insights?
Rated Ranking Evaluator Enterprise solves these problems and much more.
Join us as we introduce the next generation of open-source search quality evaluation tools, exploring the internals and real-world scenarios!
In the last few years, Artificial Intelligence applications have become more and more sophisticated and often operate as algorithmic “black boxes” for decision-making. Because of this, some questions naturally arise when working with these models: why should we trust a certain decision taken by these algorithms? Why and how was this prediction made? Which variables most influenced the prediction? The most crucial challenge with complex machine learning models is therefore their interpretability and explainability. This talk aims to give an overview of the most popular explainability techniques and their application in Learning to Rank. In particular, we will examine in depth a powerful library called SHAP, with both theoretical and practical insights; we will cover its tools for explaining model behaviour, especially how each feature impacts the model’s output, and we will explain how to interpret the results in a Learning to Rank scenario.
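The usage pattern described above is short in code. This sketch assumes a tree-based ranker trained with XGBoost on placeholder data (feature matrix X, graded labels y, sorted query ids qid); it is illustrative rather than the talk's exact material:

```python
import numpy as np
import shap
import xgboost as xgb

X = np.random.rand(200, 5)                    # placeholder feature matrix
y = np.random.randint(0, 4, 200)              # graded relevance labels
qid = np.sort(np.random.randint(0, 20, 200))  # query ids, grouped/sorted

model = xgb.XGBRanker(objective="rank:ndcg").fit(X, y, qid=qid)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one row per document, one column per feature

# Positive values pushed a document's predicted score up, negative pulled it down.
shap.summary_plot(shap_values, X)
```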
Whether your core domain involves real-world entities (such as hotels, restaurants, cars...) or text documents, searching for entities similar to a given one is a very common use case for most systems that involve information retrieval. This presentation will start by describing how widespread this problem is across a variety of different scenarios and how you can use the More Like This feature in the Apache Lucene library to solve it. Building on the introduction, the focus will be on how the More Like This module works internally, all the components involved end to end, the BM25 text similarity metric, and how it has been included through a conspicuous refactoring and testing process. The presentation will include real-world usage examples and future developments such as improved query building through positional phrase queries and term relevancy scoring pluggability.
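For reference, the BM25 scoring that underpins this work fits in a few lines. This is a simplified sketch of the formula, not Lucene's exact implementation (which differs in details such as norm encoding):

```python
import math

def bm25(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Saturating, length-normalized term frequency.
    tf_norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# e.g. a term occurring 3 times in a 120-term doc, present in 40 of 10,000 docs:
print(bm25(tf=3, df=40, doc_len=120, avg_doc_len=200, num_docs=10_000))
```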
Enterprise Search – How Relevant Is Relevance? - Sease
Enterprise search is the outlier among search applications. It has to work effectively with very large collections of uncurated content, often in multiple languages, to meet the requirements of employees who need to make business-critical decisions.
In this talk, I will outline the challenges of searching enterprise content. Recent research is revealing a unique pattern of search behaviour in which relevance is both very important and yet also irrelevant, and where recall is just as important as precision. This behaviour has implications for the use of standard metrics for search performance (especially in the case of federated search across multiple applications) and for the adoption of AI/ML techniques.
Interactive Questions and Answers - London Information Retrieval Meetup - Sease
Answers to some questions about Natural Language Search, Language Modelling (Google BERT, OpenAI GPT-3), Neural Search and Learning to Rank, asked during our London Information Retrieval Meetup (December).
The More Like This (MLT) search functionality is a key feature in Apache Lucene that allows you to find documents similar to an input (free text or an existing document). Though widely used, it is rarely explored, so this presentation will start by introducing how MLT works internally. The focus of the talk is to improve the general understanding of MLT and the ways you can benefit from it. Building on the introduction, the focus will be on the BM25 text similarity function and how it has been (tentatively) included in MLT through a conspicuous refactoring and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real-world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr - Sease
The University Seminar series aims to provide a basic understanding of Open Source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index, instead of having to generate and supply external training sets. Building on the introduction, the focus will be on the extensions of the Lucene Classification module that came in Lucene 6.0 and the module's incorporation into Solr 6.1. These extensions allow you to classify at a document level with individual field weighting, numeric field support, lat/lon fields, etc. The Solr ClassificationUpdateProcessor will be explored: how it works and how to use it, including basic and advanced features like multi-class support and classification context filtering. The presentation will include practical examples and real-world use cases.
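Conceptually (and sketched here in Python rather than the Lucene Java API), this style of classification is a k-nearest-neighbour vote over the index: retrieve the documents most similar to the new one and let their stored labels decide. search_similar is a hypothetical helper standing in for an MLT-style query:

```python
from collections import Counter

def classify(new_doc_text, search_similar, k=10):
    """search_similar(text, k) is assumed to return (doc_id, class_label)
    pairs for the k indexed documents most similar to the input text."""
    neighbours = search_similar(new_doc_text, k)
    votes = Counter(label for _doc_id, label in neighbours)
    return votes.most_common(1)[0][0]  # majority class among the neighbours
```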
This talk will feature some of my recent research into alternative uses for Solr facets and facet metadata. I will develop the idea that facets can be used to discover similarities between items and attributes in a search index, and show some interesting applications of this idea. A common takeaway is that using facets and facet metadata in non-conventional ways enables the semantic context of a query to be automatically tuned. This has important implications for user-centric and semantically focused relevance.
Intent Algorithms: The Data Science of Smart Information Retrieval Systems - Trey Grainger
Search engines, recommendation systems, advertising networks, and even data analytics tools all share the same end goal - to deliver the most relevant information possible to meet a given information need (usually in real-time). Perfecting these systems requires algorithms which can build a deep understanding of the domains represented by the underlying data, understand the nuanced ways in which words and phrases should be parsed and interpreted within different contexts, score the relationships between arbitrary phrases and concepts, continually learn from users' context and interactions to make the system smarter, and generate custom models of personalized tastes for each user of the system.
In this talk, we'll dive into both the philosophical questions associated with such systems ("how do you accurately represent and interpret the meaning of words?", "How do you prevent filter bubbles?", etc.), as well as look at practical examples of how these systems have been successfully implemented in production systems combining a variety of available commercial and open source components (inverted indexes, entity extraction, similarity scoring and machine-learned ranking, auto-generated knowledge graphs, phrase interpretation and concept expansion, etc.).
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q... - OpenSource Connections
Every team working on Information Retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (at a specific point in time and historically).
Evaluating search quality is important both to understand and quantify the improvement or regression of your search application across development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements, a helpful tool must be:
- flexible and highly configurable for a technical user
- immediate, visual and concise for optimal business use
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often met with ad-hoc partial solutions that each require a considerable amount of development and customization effort.
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
This talk will introduce RRE, describe its latest developments and demonstrate how it can be integrated into a project to measure and assess the search quality of your search application.
The focus of the presentation will be a live demo showing an example project with a set of initial relevancy issues that we solve iteration after iteration, using RRE's output to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
Rated Ranking Evaluator: an Open Source Approach for Search Quality Evaluation - Sease
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
This talk will introduce RRE, describe its latest developments and demonstrate how it can be integrated into a project to measure and assess the search quality of your search application.
Every search engineer ordinarily struggles with the task of evaluating how well a search engine is performing. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going. The talk will describe the Rated Ranking Evaluator from a developer perspective. RRE is an open source search quality evaluation tool that can be used to produce a set of deliverable reports and that can be integrated within a continuous integration infrastructure.
Search Quality Evaluation: a Developer Perspective - Sease
Search quality evaluation is an evergreen topic every search engineer ordinarily struggles with. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
The slides will focus on how a search quality evaluation tool can be seen from a practical developer perspective, how it can be used to produce a deliverable artifact, and how it can be integrated within a continuous integration infrastructure.
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation - Alessandro Benedetti
Every team working on information retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (currently and historically). Evaluating search quality is important both to understand and quantify the improvement or regression of your search application across development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements, a helpful tool must be:
- flexible and highly configurable for a technical user
- immediate, visual and concise for optimal business use
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often met with ad-hoc partial solutions that each require a considerable amount of development and customization effort. To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
This talk will introduce RRE, describe its functionalities and demonstrate how it can be integrated into a project and how it can help measure and assess the search quality of your search application. The focus of the presentation will be a live demo showing an example project with a set of initial relevancy issues that we solve iteration after iteration, using RRE's output to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
LSP (Logic Score Preference) - Rajan Dhabalia, San Francisco State University - dhabalia
Software quality analysis is a measure of the properties of a piece of software or its specifications. Directly measuring software quality is quite difficult due to the lack of quality-factor measurements. To resolve this measurement problem, there is a model that measures the quality of software in terms of its attributes, specifications and characteristics. This model is known as LSP (Logic Score Preference). When a client gives the specifications of the software to the developer, the client expects good-quality software in return. Hence, to decide the quality of software we can use the LSP model.
This model validates the following software quality attributes:
(1) Functionality
Suitability
Accuracy
Security
Interoperability
Compliance
(2) Usability
Understandability
Learnability
Operability
(3) Performance
Processing time
Throughput
Resource consumption
(4) Maintainability
(5) Portability
(6) Reusability
In LSP, the features are decomposed into the aggregation blocks above, and this decomposition continues within each block until all the lowest-level features are directly measurable, producing a tree of decomposed features. For each feature, an elementary criterion is defined. LSP calculates an elementary preference for each criterion and then aggregates all of them into a final global preference. This global preference reflects the quality of the software. We can calculate the global preference for different systems and thereby analyze and compare their quality.
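A small worked example may help: LSP-style aggregation is commonly realised as a weighted power mean of the elementary preferences, where the exponent r controls how conjunctive the aggregation is (a large negative r means every attribute must score well). The weights and exponent below are illustrative, not taken from the paper:

```python
def weighted_power_mean(prefs, weights, r):
    """prefs are elementary preferences in [0, 1]; weights must sum to 1."""
    if r == 0:  # the geometric mean is the limit case at r = 0
        out = 1.0
        for p, w in zip(prefs, weights):
            out *= p ** w
        return out
    return sum(w * p ** r for p, w in zip(prefs, weights)) ** (1.0 / r)

# Elementary preferences for, say, Functionality, Usability, Performance:
prefs, weights = [0.9, 0.6, 0.8], [0.5, 0.2, 0.3]
print(weighted_power_mean(prefs, weights, r=-0.7))  # mildly conjunctive global preference
```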
An introduction to Elasticsearch's advanced relevance ranking toolbox - Elasticsearch
The hallmark of a great search experience is always delivering the most relevant results, quickly, to every user. The difficulty lies behind the scenes in making that happen elegantly and at scale. From App Search’s intuitive drag-and-drop interface to the advanced relevance capabilities built into the core of Elasticsearch, Elastic offers a range of tools for developers to tune relevance ranking and create incredible search experiences. In this session, we’ll explore some of Elasticsearch’s advanced relevance ranking features, such as dense vector fields, BM25F, ranking evaluation, and more. Plus we’ll give you some ideas for how these features are being used by other Elastic users to create world-class, category-defining search experiences.
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, most of them architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limit during the query, as well as the presence of inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model, trained through Spark, Weka, or R, that is loaded as a plugin into Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but will also walk through practical examples, from loading a dataset into Elasticsearch to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any of the systems listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
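For contrast, the two-phase pattern the tutorial improves upon looks roughly like this hedged sketch: the index name, model and extract_features helper are invented, and the elasticsearch-py call style varies across client versions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Phase I: the engine returns the top-k candidates for the query.
resp = es.search(index="products",
                 query={"match": {"title": "running shoes"}},
                 size=100)
hits = resp["hits"]["hits"]

# Phase II: an external supervised model re-ranks those k candidates.
scores = model.predict([extract_features(h) for h in hits])  # hypothetical model/helper
reranked = [h for _, h in sorted(zip(scores, hits), key=lambda t: -t[0])]
```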
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... - S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical ones. Therefore, in domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn’t suffice, so relevance ranking is performed as a two-phase approach: 1) regular search, 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users’ response to the items served. The selection rates predicted in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining the set of recommendations to serve for a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (Solr/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and loaded as a plugin used at query time to compute custom scores.
• Explored and cleaned huge amounts of user activity logs (JSON) from a movies website using MapReduce jobs in Python.
• Classified user accounts into adults and children for targeted advertising by implementing a similarity-ranking algorithm.
• Grouped user sessions based on user behavior using K-means clustering to observe outliers and find distinctive groups.
• Predicted movie ratings using user-user and item-item based recommendation algorithms in Mahout.
Some highlights from RecSys 2018 presented to my team at Schibsted. Note this is a "biased" summary based on personal interest and work related to my team.
basic Function and Terminology of Recommendation Systems. Some Algorithmic Implementation with some sample Dataset for Understanding. It contains all the Layers of RS Framework well explained.
Similar to Search Quality Evaluation to Help Reproducibility: An Open-source Approach (20)
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
1. Search Quality Evaluation to Help Reproducibility: An Open-source Approach
Alessandro Benedetti, Software Engineer
18th April 2019
2. Who I am
Alessandro Benedetti
▪ Search Consultant
▪ R&D Software Engineer
▪ Master in Computer Science
▪ Apache Lucene/Solr Enthusiast
▪ Passionate about semantic search, NLP and machine learning technologies
▪ Beach Volleyball Player & Snowboarder
3. Sease: Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends: Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
4. Agenda
✓ Search Quality Evaluation
‣ Context overview
‣ Search System Status
‣ Information Need and Relevancy Ratings
‣ Evaluation Measures
➢ Rated Ranking Evaluator (RRE)
➢ Future Works
5. Context Overview
Search Quality Evaluation is the activity of assessing how good a search system is.
Defining what good means depends on the interests of who (stakeholder, developer, etc.) is doing the evaluation.
So it is necessary to measure multiple metrics to cover all the aspects of the perceived quality and understand how the system is behaving.
[Diagram: search quality factors. External factors: Correctness, Robustness, Extendibility, Reusability, Efficiency, Timeliness. Internal factors: Modularity, Readability, Maintainability, Testability, Understandability. Evaluation is focused on the external factors and primarily on Correctness.]
6. Search Quality: Correctness
In Information Retrieval, Correctness is the ability of a system to meet the information needs of its users.
For each internal (gray) and external (red) iteration it is vital to measure correctness variations.
Evaluation measures are used to assert how well the search results satisfy the user's query intent.
[Diagram: a new system evolves through internal iterations v0.1 … v0.9 up to the v1.0 release; an existing system keeps evolving (v1.1, v1.2, v1.3 … v2.0) driven by change requests ("We have a change request", "We found a bug") and by user complaints ("We need to improve our search system, users are complaining about junk in search results"). In terms of correctness, how can we know the system performance across the various versions?]
7. Search Quality: Relevancy Ratings
A key concept in the calculation of offline search quality metrics is the relevance of a document given a user information need (query).
Before assessing the correctness of the system it is necessary to associate a relevancy rating to each pair <query, document> involved in our evaluation.
[Diagram: a ratings set is assembled either through explicit feedback (a judgements collector) or implicit feedback (an interactions logger); e.g. for the query "Queen music", documents such as "Bohemian Rhapsody", "Dancing Queen" and "Queen Albums" receive relevance ratings.]
8. Search Quality: Measures
Evaluation measures for an information retrieval system try to formalise how well a search system satisfies its user information needs.
Measures are generally split into two categories: online and offline measures. In this context we will focus on offline measures.
[Diagram: offline measures include Precision, Recall, F-Measure, Average Precision, Mean Reciprocal Rank, NDCG, …; online measures include Click-through rate, Zero result rate, Session abandonment rate, Session success rate, …. RRE is mainly focused on offline measures.]
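For reference, the standard IR definitions of the most basic offline measures (these are textbook formulations, not RRE-specific; let R be the set of documents rated relevant for a query and S_k the top-k results returned by the system, with rel_i the gain of the document at rank i):

  \text{Precision@}k = \frac{|R \cap S_k|}{k}, \qquad \text{Recall} = \frac{|R \cap S_k|}{|R|}

  \mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i+1)}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}

where IDCG@k is the DCG@k of the ideal (perfectly ordered) ranking, so NDCG@k is normalised between 0 and 1.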
9. Search Quality: Evaluate a System
[Diagram: the evaluation takes as input an information need with ratings (e.g. a set of queries with the expected resulting documents annotated), a metric (e.g. Precision), and the system under test with its corpus of information; evaluating the results produces a metric score between 0 and 1.]
Reproducibility: keeping these factors locked, I am expecting the same metric score.
10. Agenda
➢ Search Quality Evaluation
✓ An Open Source Approach (RRE)
‣ Apache Solr/ES
‣ Search System Status
‣ Rated Ranking Evaluator
‣ Information Need and Relevancy Ratings
‣ Evaluation Measures
‣ Evaluation and Output
➢ Future Works
11. Open Source Search Engines
Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases.
12. Search System Status: Index
- Data: the documents in input
- Index Time Configurations: the indexing application pipeline, the update processing chain, the text analysis configuration
Together these produce the Index (Corpus of Information).
13. System Status: Query
- Search-API: builds the client query
- Query Time Configurations: query parser, query building
Together these express the Information Need.
[Diagram: the user query "The White Tiger" becomes the Search-API request ?q=the white tiger&qf=title,content^10&bf=popularity, which the query parser expands to title:the white tiger OR content:the white tiger …]
14. RRE: What is it?
• A set of search quality evaluation tools
• A search quality evaluation framework
• Multi (search) platform
• Written in Java
• It can also be used in non-Java projects
• Licensed under Apache 2.0
• Open to contributions
• Extremely dynamic!
https://github.com/SeaseLtd/rated-ranking-evaluator
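As a sketch of how RRE typically enters a Maven build: the platform-specific plugin is declared in the project's pom.xml and bound to the build. The coordinates and goal below are indicative only (from memory of the project documentation; check the GitHub repository for the current groupId, artifactId, goal and version):

  <plugin>
    <groupId>io.sease</groupId>
    <artifactId>rre-maven-elasticsearch-plugin</artifactId>
    <version><!-- see the repository for the current release --></version>
    <executions>
      <execution>
        <goals>
          <!-- runs the evaluation as part of the build -->
          <goal>evaluate</goal>
        </goals>
      </execution>
    </executions>
  </plugin>

A Solr project would use the corresponding Solr plugin artifact instead.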
15. RRE: Ecosystem
The picture illustrates the main modules composing the RRE ecosystem. All modules with a dashed border are planned for a future release.
RRE CLI has a double border because although the rre-cli module hasn't been developed, you can run RRE from a command line using the RRE Maven archetype, which is part of the current release.
As you can see, the current implementation includes two target search platforms: Apache Solr and Elasticsearch. The Search Platform API module provides a search platform abstraction for plugging in additional search systems.
[Diagram: the RRE ecosystem: Core, Search Platform API, platform plugins, Reporting plugin, RequestHandler, RRE Server, RRE CLI and Maven archetypes.]
16. RRE: Reproducibility in Evaluating a System
[Diagram: INPUT: the RRE ratings (a JSON representation of the information need with related annotated documents) and a metric (e.g. Precision). SYSTEM UNDER EVALUATION: Apache Solr/Elasticsearch with its index (corpus of information): data, index time configuration, query building (Search API) and query time configuration. Evaluating the results produces a metric score between 0 and 1.]
Reproducibility: running RRE with the same status, I am expecting the same metric score.
17. RRE: Information Need Domain Model
• Rooted tree (the root is the Evaluation)
• Each level enriches the details of the information need
• The corpus identifies the data collection
• The topic assigns a human-readable meaning
• Query groups gather query variants expected to return the same results
The benefit of having a composite structure is clear: we can see a metric value at different levels (e.g. a query, all queries belonging to a query group, all queries belonging to a topic, or at corpus level).
[Diagram: RRE domain model. The Evaluation (top level domain entity) contains 1..* Corpora (the datasets/collections to evaluate); each Corpus contains 1..* Topics (high level information needs); each Topic contains 1..* Query Groups (query variants); each Query Group contains 1..* Queries.]
18. RRE: Define Information Need and Ratings
Although the domain model structure is able to capture complex scenarios, sometimes we want to model simpler contexts. In order to avoid verbose and redundant ratings definitions it's possible to omit some levels.
The combinations accepted for each corpus are:
• only queries
• query groups and queries
• topics, query groups and queries
[Diagram: the same domain model tree, with Topic and Query Group marked as optional levels, and Evaluation, Corpus, Query and the rated documents (Doc 1, Doc 2, Doc 3 … Doc N) as required.]
19. RRE: Json Ratings
Ratings files associate the RRE domain model entities with relevance judgments. A ratings file provides the association between queries and relevant documents.
There must be at least one ratings file (otherwise no evaluation happens). Usually there's a 1:1 relationship between a ratings file and a dataset.
Judgments, the most important part of this file, consist of a list of all relevant documents for a query group. Each listed document has a corresponding "gain", which is the relevancy judgment we want to assign to that document.
[The slide shows two alternative JSON layouts for expressing the judgments.]
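A minimal sketch of what a ratings file can look like. Field names are from memory of the project documentation and may differ between versions; the index name, template, query and document identifiers below are illustrative. The slide's "OR" refers to the two accepted judgment layouts: gains attached per document (as here) or documents grouped under each gain value.

  {
    "index": "tmdb",
    "id_field": "id",
    "topics": [{
      "description": "Queen albums",
      "query_groups": [{
        "name": "queen live albums",
        "queries": [
          { "template": "only_q.json", "placeholders": { "$query": "queen" } }
        ],
        "relevant_documents": {
          "doc_1": { "gain": 3 },
          "doc_2": { "gain": 2 }
        }
      }]
    }]
  }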
20. RRE: Available metrics
These are the RRE built-in metrics which can be used out of the box. Most of them are computed at query level and then aggregated at the upper levels.
However, compound metrics (e.g. MAP or GMAP) are not explicitly declared or defined, because the computation doesn't happen at query level: the aggregation executed on the upper levels automatically produces these metrics.
E.g. the Average Precision computed for Q1, Q2, Q3, …, Qn becomes the Mean Average Precision at Query Group or Topic level.
Available Metrics:
Precision
Recall
Precision at 1 (P@1)
Precision at 2 (P@2)
Precision at 3 (P@3)
Precision at 10 (P@10)
Average Precision (AP)
Reciprocal Rank
Mean Reciprocal Rank (compound metric)
Mean Average Precision (MAP, compound metric)
Normalised Discounted Cumulative Gain (NDCG)
F-Measure
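A small worked example of this aggregation (numbers invented for illustration): suppose a query has 2 relevant documents and the system returns them at ranks 1 and 3. Average Precision averages the precision at each relevant rank:

  \mathrm{AP} = \frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right) \approx 0.83

If a query group contains two queries with AP values 0.83 and 0.50, the aggregation at query group level automatically yields

  \mathrm{MAP} = \frac{0.83 + 0.50}{2} \approx 0.67

which is why MAP never needs to be declared explicitly.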
21. RRE: Reproducibility in Evaluating a System
[Recap of the reproducibility diagram from slide 16: with the same ratings, the same metric and the same system status (data, index time configuration, query building, query time configuration), running RRE again is expected to produce the same metric score.]
22. System Status: Init Search Engine
[Diagram: the input layer provides data and configuration; in the evaluation layer RRE spins up an embedded search platform.]
- An instance of Elasticsearch/Solr is instantiated from the input configurations
- Data is populated from the input
- The instance is ready to respond to queries and be evaluated
N.B. an alternative approach we are working on is to target an already populated QA instance. In that scenario it is vital to keep the configuration and data under version control.
23. System Status: Configuration Sets
- Configurations evolve with time. Reproducibility: track them with version control systems!
- RRE can take various versions of the configurations as input to compare them
- The evaluation process allows you to define inclusion/exclusion rules (i.e. include only version 1.0 and 2.0)
These are the Index/Query Time Configuration.
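As an illustration, an Elasticsearch project might lay out its versioned configuration sets as below. The folder names and files are hypothetical (the actual locations are configurable through the plugin); the point is that each version directory is a complete, version-controlled snapshot of the index configuration:

  src/etc/configuration_sets/
      v1.0/index-shape.json      (initial mappings and settings)
      v1.1/index-shape.json      (e.g. adds a language-specific analyzer)
      v2.0/index-shape.json      (e.g. reworked fields and boosts)
  src/etc/corpora/
      tmdb.json
  src/etc/ratings/
      ratings.json
  src/etc/templates/
      only_q.json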
24. System Status: Feed the Data
An evaluation execution can involve more than one dataset targeting a given search platform. A dataset consists of representative domain data; although a compressed dataset can be provided, it generally has a small/medium size.
Within RRE, corpus, dataset and collection are synonyms.
Datasets must be located under a configurable folder. Each dataset is then referenced in one or more ratings files.
This is the Corpus Of Information (Data).
25. System Status: Build the Queries
For each query (or query group) it's possible to define a template, which is a kind of query shape containing one or more placeholders. Then, in the ratings file, you can reference one of those defined templates and provide a value for each placeholder.
Templates have been introduced in order to:
• allow a common query management between search platforms
• define complex queries
• define runtime parameters that cannot be statically determined (e.g. filters)
Example query templates: only_q.json, filter_by_language.json
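A minimal sketch of an Elasticsearch-flavoured only_q.json template (illustrative; the actual demo templates may differ). At evaluation time the $query placeholder is replaced with the value supplied in the ratings file, so the same ratings can drive differently shaped queries:

  {
    "query": {
      "query_string": {
        "query": "$query"
      }
    }
  }

A more elaborate template such as filter_by_language.json would presumably add a filter clause with its own placeholder (e.g. $lang), which is exactly the kind of runtime parameter mentioned above.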
26. RRE: Reproducibility in Evaluating a System
[Recap of the reproducibility diagram: same ratings, same metric, same system status; running RRE again is expected to produce the same metric score.]
27. RRE: Evaluation process overview (1/2)
[Diagram: the input layer (data, configuration, ratings) feeds the evaluation layer, which uses a search platform and produces evaluation data; in the output layer the evaluation data is marshalled as JSON, used for generating reports such as the RRE Console.]
28. RRE: Evaluation process overview (2/2)
[Diagram: within a runtime container, the RRE Core consumes rating files, datasets and queries; it starts the search platform (init system), creates and configures the index and indexes the data (set status), executes the queries and computes the metrics, outputs the evaluation data, and finally stops the search platform.]
29. RRE: Evaluation Output
The RRE Core itself is a library, so it outputs its result as a plain Java object that must be used programmatically. However, when wrapped within a runtime container, like the Maven plugin, the evaluation object tree is marshalled in JSON format.
Being interoperable, the JSON format can be used by some other component for producing a different kind of output. An example of such usage is the RRE Apache Maven Reporting Plugin, which can:
• output a spreadsheet
• send the evaluation data to a running RRE Server
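Purely as an illustration of the idea (this is not the exact schema produced by the core library, and the metric values are invented), the marshalled tree carries a value per version at every level of the domain model, mirroring the Evaluation > Corpus > Topic > Query Group > Query structure:

  {
    "corpora": [{
      "name": "tmdb",
      "metrics": {
        "P@10": { "v1.0": 0.54, "v1.1": 0.61 },
        "NDCG": { "v1.0": 0.48, "v1.1": 0.57 }
      },
      "topics": []
    }]
  }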
30. RRE: Workbook
The RRE domain model (topics, groups and queries) is on the left, and each metric (on the right section) has a value for each version/entity pair.
In case the evaluation process includes multiple datasets, there will be a spreadsheet for each of them.
This output format is useful when:
• you want to have (or keep somewhere) a snapshot of how the system performed at a given moment
• the comparison includes a lot of versions
• you want to include all available metrics
31. RRE: RRE Console
• A SpringBoot/AngularJS app that shows real-time information about evaluation results.
• Each time a build happens, the RRE reporting plugin sends the evaluation result to a RESTful endpoint provided by the RRE Console.
• The received data immediately refreshes the web dashboard.
• Useful during the development/tuning iterations (you don't have to open the Excel report again and again).
32. RRE: Iterative development & tuning
[Photo: one monitor for "dev, tune & build", one for checking the evaluation results. We are thinking about how to fill a third monitor.]
33. RRE: We are working on…
"I think if we could create a simplified pass/fail report for the business team, that would be ideal. So they could understand the tradeoffs of the new search."
"Many search engines process the user query heavily before it's submitted to the search engine in whatever DSL is required, and if you don't retain some idea of the original query in the system, how can you relate the test results back to user behaviour?"
"Do I have to write all judgments manually?"
"How can I use RRE if I have a custom search platform?"
"Java is not in my stack."
"Can I persist the evaluation data?"
34. RRE: Github Repository and Resources
Demo Project: https://github.com/SeaseLtd/rre-demo-walkthrough
• A sample RRE-enabled project
• No Java code, only configuration
• Search Platform: Elasticsearch 6.3.2
• Seven example iterations
• Index shapes & queries from Relevant Search [1]
• Dataset: TMDB (extract)
Github Repo: https://github.com/SeaseLtd/rated-ranking-evaluator
Blog article: https://sease.io/2018/07/rated-ranking-evaluator.html
[1] https://www.manning.com/books/relevant-search