Search Quality Evaluation to Help Reproducibility: An Open-source Approach - Alessandro Benedetti
Every information retrieval practitioner ordinarily struggles with the task of evaluating how well a search engine is performing and with reproducing the performance achieved at a specific point in time.
Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
Additionally, it is extremely important to track the evolution of the search system over time and to be able to reproduce and measure the same performance (through metrics of interest such as precision@k, recall, NDCG@k...).
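To make these metrics concrete, here is a minimal, self-contained sketch of how precision@k and NDCG@k can be computed from a ranked result list and a set of graded relevance judgements; the document ids and gain values are invented for illustration, and this is not RRE's internal implementation:

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc in top_k if doc in relevant_ids) / k

def ndcg_at_k(ranked_ids, gains, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(scores):
        return sum(g / math.log2(i + 2) for i, g in enumerate(scores))
    actual = dcg([gains.get(doc, 0) for doc in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

ranking = ["d3", "d1", "d7", "d2"]          # system output for one query
gains = {"d1": 3, "d2": 2, "d3": 1}         # graded relevance judgements
print(precision_at_k(ranking, set(gains), 3))  # 0.666...
print(ndcg_at_k(ranking, gains, 3))
```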
The talk will describe the Rated Ranking Evaluator from a researcher's and software engineer's perspective.
RRE is an open source search quality evaluation tool that can be used to produce a set of reports about the quality of a system, iteration after iteration, and that can be integrated within a continuous integration infrastructure to monitor quality metrics after each release.
The focus of the talk will be to raise public awareness of search quality evaluation and reproducibility, describing how RRE can help the industry.
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation - Alessandro Benedetti
Every team working on Information Retrieval software struggles with the task of evaluating how well their system performs in terms of search quality (at a specific point in time and historically).
Evaluating search quality is important both to understand and quantify the improvement or regression of your search application across development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements, a helpful tool must be:
- flexible and highly configurable for a technical user
- immediate, visual and concise for optimal business use
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often met with ad-hoc, partial solutions, each of which requires a considerable amount of development and customization effort.
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed of a core library and a set of modules and plugins that give it the flexibility to be integrated into automated evaluation processes and continuous integration flows.
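As an illustration of what wiring an evaluation tool into a continuous integration flow can look like, here is a hedged sketch of a quality gate; `run_evaluation()` and the report format are hypothetical stand-ins, not RRE's actual API:

```python
# Hypothetical CI quality gate: fail the build when a metric regresses.
import json
import sys

THRESHOLDS = {"precision@10": 0.70, "ndcg@10": 0.60}

def run_evaluation():
    # In a real pipeline this would invoke the evaluation tool and parse
    # the report it produces; here we just load a previously written file.
    with open("evaluation-report.json") as report:
        return json.load(report)["metrics"]  # e.g. {"precision@10": 0.73, ...}

metrics = run_evaluation()
failures = {name: value for name, value in THRESHOLDS.items()
            if metrics.get(name, 0.0) < value}
if failures:
    print(f"Search quality gate failed for: {failures}")
    sys.exit(1)  # a non-zero exit code fails the CI build
print("Search quality gate passed.")
```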
This talk will introduce RRE, describe its latest developments, and demonstrate how it can be integrated into a project to measure and assess the search quality of your search application.
The focus of the presentation will be a live demo showing an example project with a set of initial relevancy issues that we solve iteration after iteration, using RRE's output to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
Search Quality Evaluation: a Developer Perspective - Andrea Gazzarini
Search quality evaluation is an evergreen topic every search engineer ordinarily struggles with. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going.
The slides will focus on how a search quality evaluation tool can be viewed from a practical developer's perspective, how it can be used to produce a deliverable artifact, and how it can be integrated within a continuous integration infrastructure.
A Learning to Rank Project on a Daily Song Ranking Problem - Sease
Ranking data, i.e., ordered lists of items, naturally appears in a wide variety of situations; understanding how to adapt a specific dataset and design the best approach to solve a ranking problem in a real-world scenario is thus crucial.
This talk aims to illustrate how to set up and build a Learning to Rank (LTR) project starting from the available data, in our case a Spotify dataset (available on Kaggle) on the Worldwide Daily Song Ranking, and ending with the implementation of a ranking model. A step-by-step (phased) approach to this task using open source libraries will be presented.
We will examine in depth the most important part of the pipeline, the data preprocessing, and in particular how to model and manipulate the features in order to create the proper input dataset, tailored to the machine learning algorithm's requirements.
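As a rough illustration of that preprocessing phase, the sketch below turns chart positions into graded relevance labels and builds query groups; the column names are assumed from the Kaggle CSV, the file name is hypothetical, and the label thresholds are arbitrary choices:

```python
import pandas as pd

# Column names assumed from the Kaggle "Worldwide Daily Song Ranking" CSV:
# Position, Track Name, Artist, Streams, URL, Date, Region.
df = pd.read_csv("spotify_daily_ranking.csv")

# Turn chart positions into graded relevance labels (thresholds are arbitrary):
# the higher a song charts, the higher its label.
def position_to_label(position):
    if position <= 10:
        return 3
    if position <= 50:
        return 2
    if position <= 100:
        return 1
    return 0

df["label"] = df["Position"].apply(position_to_label)

# Encode a categorical feature numerically, as most LTR algorithms require...
df["artist_id"] = df["Artist"].astype("category").cat.codes

# ...and group samples into "queries" (one group per region/date chart),
# producing the group-size list ranking libraries typically expect.
df["qid"] = df["Region"].astype(str) + "_" + df["Date"].astype(str)
group_sizes = df.groupby("qid").size().tolist()
```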
Haystack London - Search Quality Evaluation, Tools and Techniques - Andrea Gazzarini
Every search engineer ordinarily struggles with the task of evaluating how well a search engine is performing. Improving the correctness and effectiveness of a search system requires a set of tools that help measure the direction in which the system is going. The talk will describe the Rated Ranking Evaluator from a developer perspective. RRE is an open source search quality evaluation tool that can be used to produce a set of deliverable reports and that can be integrated within a continuous integration infrastructure.
From Academic Papers To Production: A Learning To Rank Story - Alessandro Benedetti
This talk is about the journey of bringing Learning To Rank (LTR from now on) to the e-commerce domain in a real-world scenario, including all the pitfalls and disillusions involved.
LTR is a fantastic approach to solving complex ranking problems, but industry domains are far from the ideal world where those technologies were designed and experimented with: open source implementations do not work perfectly out of the box and require advanced tuning, and industry training data is dirty, noisy and incomplete.
This talk will guide you through the different phases and technologies involved in an LTR project with a pragmatic approach.
Feature Engineering, Domain Modelling, Training Set Building, Model Training, Search Integration and Online Evaluation: each of them presents different challenges in the real world and must be carefully approached.
Entity Search on Virtual Documents Created with Graph Embeddings - Sease
The document summarizes Anna Ruggero's presentation on entity search using graph embeddings. The presentation introduced an approach to entity search that creates virtual documents combining related entities based on their embeddings. Entities from DBPedia were represented as nodes in a graph and Node2Vec was used to generate embeddings. The embeddings were clustered to form documents, which were ranked using BM25. Systems that combined or fused the document rankings and entity rankings were evaluated on entity search tasks and found relevant entities that traditional methods did not.
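A minimal sketch of the embedding step, approximating Node2Vec with uniform random walks (Node2Vec with p = q = 1 degenerates to this) over a toy graph rather than DBPedia; the presentation's actual pipeline may differ:

```python
import random

import networkx as nx
from gensim.models import Word2Vec

# Toy graph standing in for the DBPedia entity graph used in the talk.
G = nx.karate_club_graph()

def random_walk(graph, start, length=10):
    """A uniform random walk starting from a given node."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Treat walks as "sentences" and nodes as "words" to learn node embeddings.
walks = [random_walk(G, node) for node in G.nodes() for _ in range(20)]
model = Word2Vec(walks, vector_size=32, window=5, min_count=1, epochs=5)

# Nearest neighbours in embedding space suggest entities to cluster together.
print(model.wv.most_similar("0", topn=5))
```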
How to Build your Training Set for a Learning To Rank Project - Sease
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, to the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models.
This talk is a technical guide to exploring and mastering various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
– model and collect the necessary feedback from the users (implicit or explicit)
– calculate for each training sample a relevance label which is meaningful and not ambiguous (Click-Through Rate, Sales Rate …)
– transform the raw data collected into an effective training set, in the numerical vector format most LTR training libraries expect (a sketch follows below)
Join us as we explore real-world scenarios and dos and don'ts from the e-commerce industry.
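A toy sketch of those last two steps, computing a naive Click-Through Rate label from raw click events and emitting the libsvm-style lines many LTR libraries consume; the queries, documents and feature values are invented for illustration, and a production pipeline would also correct for position bias:

```python
from collections import defaultdict

# Raw implicit feedback: (query, document, clicked) events.
events = [
    ("red shoes", "doc1", 1), ("red shoes", "doc1", 0),
    ("red shoes", "doc2", 0), ("red shoes", "doc2", 1),
    ("red shoes", "doc3", 0), ("red shoes", "doc3", 0),
]

impressions = defaultdict(int)
clicks = defaultdict(int)
for query, doc, clicked in events:
    impressions[(query, doc)] += 1
    clicks[(query, doc)] += clicked

# Hypothetical per-pair feature vectors (e.g. BM25 score, price match).
features = {
    ("red shoes", "doc1"): [0.8, 0.1],
    ("red shoes", "doc2"): [0.5, 0.7],
    ("red shoes", "doc3"): [0.3, 0.9],
}
query_ids = {"red shoes": 1}

# Click-Through Rate as a naive relevance label, emitted in the
# "label qid:n 1:x 2:y ..." format most LTR training libraries expect.
for (query, doc), n in impressions.items():
    ctr = clicks[(query, doc)] / n
    feats = " ".join(f"{i + 1}:{v}" for i, v in enumerate(features[(query, doc)]))
    print(f"{ctr:.2f} qid:{query_ids[query]} {feats}")
```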
This document provides an overview of using Apache Lucene to perform document similarity searches. It discusses how Lucene indexes documents and calculates term scores using techniques like TF-IDF and BM25. The document demonstrates Lucene's More Like This module, which finds similar documents by building a query from weighted interesting terms in the input document. It also proposes future work, such as leveraging term positions and fields across multiple input documents to improve similarity searches.
What the Rated Ranking Evaluator is and how to use it (for both Software Engineers and IT Managers). Talk given during the Chorus Workshops at Plainschwarz Salon.
This document provides an overview of explainability methods for learning to rank models. It introduces the SHAP (Shapley Additive Explanations) library as a popular method for explaining machine learning models using game theory. A case study demonstrates how to use Tree SHAP to generate summary, force, dependence, and decision plots to explain the features influencing a LambdaMART ranking model. Key advantages and limitations of SHAP are discussed, such as its ability to handle different data types but computational expense. Code examples are provided for generating SHAP values and plots in Python.
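A condensed sketch of that flow, training a LambdaMART-style ranker with LightGBM and explaining it with Tree SHAP; the data here is random toy data, and the exact plot types used in the slides may differ:

```python
import numpy as np
import lightgbm as lgb
import shap

# Random toy data: 100 documents with 2 features, 10 documents per query.
rng = np.random.default_rng(42)
X = rng.random((100, 2))
y = rng.integers(0, 3, 100)   # graded relevance labels
groups = [10] * 10            # group sizes, one entry per query

# A LambdaMART-style ranker (gradient-boosted trees, lambdarank objective).
model = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
model.fit(X, y, group=groups)

# Tree SHAP computes game-theoretic feature attributions per prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The summary plot is one of the plots discussed in the case study.
shap.summary_plot(shap_values, X, feature_names=["feature_1", "feature_2"])
```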
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit... - Sease
Rated Ranking Evaluator Enterprise is an enterprise version of the open source Rated Ranking Evaluator search quality evaluation tool. It features query discovery to automatically extract queries from a search API, rating generation using both explicit ratings and implicit feedback, and an interactive UI for exploring and comparing evaluation results. The UI provides overview, exploration, and comparison views of evaluation data to meet the needs of business stakeholders and software engineers. Future work aims to improve the tool's capabilities around configuration, multimedia support, insights generation, and click modeling.
The More Like This (MLT) search functionality is a key feature in Apache Lucene that allows you to find documents similar to an input one (text or document). Being widely used but rarely explored, this presentation will start by introducing how MLT works internally. The focus of the talk is to improve the general understanding of MLT and the ways you could benefit from it. Building on the introduction, the focus will be on the BM25 text similarity function and how it has been (tentatively) included in MLT through a conspicuous refactoring and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real-world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.
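For reference, a small Python rendition of the BM25 term-weighting idea discussed above; this follows the textbook formula, while Lucene's actual implementation differs in some details (e.g. how length normalization is encoded):

```python
import math

def bm25_weight(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Textbook BM25 term weight: a saturating, length-normalized term
    frequency multiplied by inverse document frequency."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    tf_part = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

# A term occurring 3 times in a 120-word document (corpus average 100 words),
# appearing in 15 of 1,000 documents:
print(bm25_weight(tf=3, doc_len=120, avg_doc_len=100, df=15, num_docs=1000))
```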
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr - Sease
The University Seminar series aims to provide a basic understanding of open source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
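In the spirit of the seminar title, a minimal toy inverted index, mapping each term to the postings list of documents that contain it (a simplified sketch, without the analysis chain a real Lucene index applies):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted postings list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = [
    "search quality evaluation",
    "open source search",
    "quality of open data",
]
index = build_inverted_index(docs)
print(index["search"])   # [0, 1]
print(index["quality"])  # [0, 2]
```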
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval... - Sease
1. The document discusses evaluating learning to rank models, including offline and online evaluation methods. Offline evaluation involves building a test set from labeled data and evaluating metrics like NDCG, while online evaluation uses methods like A/B testing and interleaving to directly measure user behavior and business metrics.
2. Common mistakes in offline evaluation include having only one sample per query, single relevance labels per query, and unrepresentative test samples. While offline evaluation provides efficiency, online evaluation allows observing real user interactions and model impact on key metrics.
3. Recommendations are given to test models both offline and online, with online testing providing advantages like measuring actual business outcomes and interpreting model effects.
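As a concrete illustration of one online method mentioned above, here is a small sketch of team-draft interleaving, which mixes two rankings and records which ranker gets credit for each result; real systems add click logging and statistical tests on top:

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Team-draft interleaving: in each round the two rankers, in random
    order, contribute their best not-yet-picked document."""
    interleaved, credit = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        for team, ranking in random.sample([("A", a), ("B", b)], 2):
            while ranking and ranking[0] in credit:
                ranking.pop(0)      # skip docs the other team already placed
            if ranking:
                doc = ranking.pop(0)
                credit[doc] = team  # clicks on doc count for this team
                interleaved.append(doc)
    return interleaved, credit

mixed, credit = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(mixed, credit)  # clicks are then tallied per team to pick a winner
```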
Interactive Questions and Answers - London Information Retrieval Meetup - Sease
Answers to some questions about Natural Language Search, Language Modelling (Google BERT, OpenAI GPT-3), Neural Search and Learning to Rank asked during our London Information Retrieval Meetup (December).
Enterprise Search – How Relevant Is Relevance? - Sease
Enterprise search is the outlier in search applications. It has to work effectively with very large collections of un-curated content, often in multiple languages, to meet the requirements of employees who need to make business-critical decisions.
In this talk, I will outline the challenges of searching enterprise content. Recent research is revealing a unique pattern of search behaviour in which relevance is both very important and yet also irrelevant, and where recall is just as important as precision. This behaviour has implications for the use of standard metrics for search performance (especially in the case of federated search across multiple applications) and for the adoption of AI/ML techniques.
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T... - Lucidworks
The document discusses leveraging Lucene/Solr as a knowledge graph and intent engine. It describes building an intent engine that incorporates type-ahead prediction, spelling correction, entity and entity-type resolution, semantic query parsing, and query augmentation using a knowledge graph. The intent engine aims to understand the user's intent beyond the literal query string and help express their intent through an interactive search experience.
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S... - Lucidworks
The document discusses implementing conceptual search in Solr. It describes how conceptual search aims to improve recall without reducing precision by matching documents based on concepts rather than keywords alone. It explains how Word2Vec can be used to learn related concepts from documents and represent words as vectors, which can then be embedded in Solr through synonym filters and payloads to enable conceptual search queries. This allows retrieving more relevant documents that do not contain the exact search terms but are still conceptually related.
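A minimal sketch of the Word2Vec step, training on a toy corpus and pulling nearest neighbours that could seed a Solr synonym file; the corpus, parameters and terms are illustrative only:

```python
from gensim.models import Word2Vec

# Toy corpus; in practice you would train on your own document collection.
sentences = [
    ["laptop", "notebook", "computer", "portable"],
    ["notebook", "ultrabook", "laptop", "lightweight"],
    ["phone", "smartphone", "mobile", "android"],
] * 200  # repeat so the toy model has enough co-occurrences to learn from

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=10)

# Terms nearest to "laptop" in the embedding space are candidate synonyms.
for term, score in model.wv.most_similar("laptop", topn=3):
    print(f"{term}\t{score:.2f}")
```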
Crowdsourced query augmentation through the semantic discovery of domain spec... - Trey Grainger
Talk Abstract: Most work in semantic search has thus far focused upon either manually building language-specific taxonomies/ontologies or upon automatic techniques such as clustering or dimensionality reduction to discover latent semantic links within the content that is being searched. The former is very labor intensive and is hard to maintain, while the latter is prone to noise and may be hard for a human to understand or to interact with directly. We believe that the links between similar users' queries represent a largely untapped source for discovering latent semantic relationships between search terms. The proposed system is capable of mining user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free.
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene... - Lucidworks
The document describes the NLP4L framework, which aims to improve search experiences for Lucene-based search systems. It provides reference implementations of plugins and corpora that use NLP/ML technologies to generate models, dictionaries, and indexes. It also includes an interface for users to examine output dictionaries since NLP/ML results may not be perfect. The framework is pluggable and processes data sources through various NLP tasks to generate tagged corpora, dictionaries, and document vectors to enhance search features like suggestions, translations, entity extraction, and ranking.
Self-learned Relevancy with Apache Solr - Trey Grainger
Search engines are known for "relevancy", but the relevancy models that ship out of the box (BM25, classic tf-idf, etc.) are just scratching the surface of what's needed for a truly insightful application.
What if your search engine could automatically tune its own domain-specific relevancy model based on user interactions? What if it could learn the important phrases and topics within your domain, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain? What if you could further use SQL queries to explore these relationships within your own BI tools and return results in ranked order to deliver relevance-driven analytics visualizations?
In this presentation, we'll walk through how you can leverage the myriad of capabilities in the Apache Solr ecosystem (such as the Solr Text Tagger, Semantic Knowledge Graph, Spark-Solr, Solr SQL, learning to rank, probabilistic query parsing, and Lucidworks Fusion) to build self-learning, relevance-first search, recommendations, and data analytics applications.
The document describes a Big Data project analyzing the Stack Overflow dataset. Key points:
- The dataset was obtained from Stack Exchange and consists of XML files totaling around 20GB that were parsed and loaded into HDFS.
- The data was analyzed to identify trending questions, unanswered questions, closed questions, dead questions, top tags, and more. PageRank analysis was also performed to rank posts (see the sketch after this list).
- The results were visualized in a web application using technologies like Hive, HBase, Pig, and Mahout recommendation. Performance issues were encountered during the MapReduce parsing of the large XML files.
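For reference, a compact power-iteration sketch of the PageRank computation mentioned above, over a toy post-link graph; it ignores dangling-node mass redistribution for brevity:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> outgoing links."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, outgoing in links.items():
            if not outgoing:
                continue  # simplification: dangling mass is dropped
            share = damping * rank[node] / len(outgoing)
            for target in outgoing:
                new_rank[target] = new_rank.get(target, 0.0) + share
        rank = new_rank
    return rank

posts = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p1"]}
print(pagerank(posts))
```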
Webinar: Simpler Semantic Search with Solr - Lucidworks
Hear from Lucidworks Senior Solutions Consultant Ted Sullivan about how you can leverage Apache Solr and Lucidworks Fusion to improve semantic awareness of your search applications.
Exploring Direct Concept Search - Steve Rowe, Lucidworks
This document discusses direct concept search using word embeddings. It describes mapping query and index terms to vector representations in a conceptual space to improve recall by expanding queries with related concepts. Word2vec is used to generate 127-dimensional word embeddings from Wikipedia text. The embeddings are indexed in Lucene to enable nearest neighbor search. Queries are expanded by searching for terms nearest to query terms in the embedding space. While building high-dimensional point indexes is slow in Lucene, this approach demonstrates the potential of using word embeddings for query expansion in information retrieval.
How to Build your Training Set for a Learning To Rank ProjectSease
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr supports it from Jan 2017), organisations struggle with the problem of how to collect and structure relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to :
– model and collect the necessary feedback from the users (implicit or explicit)
– calculate for each training sample a relevance label which is meaningful and not ambiguous (Click Through Rate, Sales Rate …)
– transform the raw data collected in an effective training set (in the numerical vector format most of the LTR training library expect)
Join us as we explore real world scenarios and dos and don’ts from the e-commerce industry.
This document provides an overview of using Apache Lucene to perform document similarity searches. It discusses how Lucene indexes documents and calculates term scores using techniques like TF-IDF and BM25. The document demonstrates Lucene's More Like This module, which finds similar documents by building a query from weighted interesting terms in the input document. It also proposes future work, such as leveraging term positions and fields across multiple input documents to improve similarity searches.
What is Rated Ranking Evaluator and how to use it (for both Software Engineer and IT Manager). Talk made during Chorus Workshops at Plainschwarz Salon.
This document provides an overview of explainability methods for learning to rank models. It introduces the SHAP (Shapley Additive Explanations) library as a popular method for explaining machine learning models using game theory. A case study demonstrates how to use Tree SHAP to generate summary, force, dependence, and decision plots to explain the features influencing a LambdaMART ranking model. Key advantages and limitations of SHAP are discussed, such as its ability to handle different data types but computational expense. Code examples are provided for generating SHAP values and plots in Python.
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Sease
Rated Ranking Evaluator Enterprise is an enterprise version of the open source Rated Ranking Evaluator search quality evaluation tool. It features query discovery to automatically extract queries from a search API, rating generation using both explicit ratings and implicit feedback, and an interactive UI for exploring and comparing evaluation results. The UI provides overview, exploration, and comparison views of evaluation data to meet the needs of business stakeholders and software engineers. Future work aims to improve the tool's capabilities around configuration, multimedia support, insights generation, and click modeling.
The More Like This search functionality is a key feature in Apache Lucene that allows to find similar documents to an input one (text or document). Being widely used but rarely explored, this presentation will start introducing how the MLT works internally. The focus of the talk is to improve the general understanding of MLT and the way you could benefit from it. Building on the introduction the focus will be on the BM25 text similarity function and how this has been (tentatively) included in the MLT through a conspicious refactor and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
The University Seminar series aim to provide a basic understanding of Open Source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Sease
1. The document discusses evaluating learning to rank models, including offline and online evaluation methods. Offline evaluation involves building a test set from labeled data and evaluating metrics like NDCG, while online evaluation uses methods like A/B testing and interleaving to directly measure user behavior and business metrics.
2. Common mistakes in offline evaluation include having only one sample per query, single relevance labels per query, and unrepresentative test samples. While offline evaluation provides efficiency, online evaluation allows observing real user interactions and model impact on key metrics.
3. Recommendations are given to test models both offline and online, with online testing providing advantages like measuring actual business outcomes and interpreting model effects.
Interactive Questions and Answers - London Information Retrieval MeetupSease
Answers to some questions about Natural Language Search, Language Modelling (Google Bert, OpenAI GPT-3), Neural Search and Learning to Rank made during our London Information Retrieval Meetup (December).
Enterprise Search – How Relevant Is Relevance?Sease
Enterprise search is the outlier in search applications. It has to work effectively with very large collections of un-curated content, often in multiple languages, to meet the requirements of employees who need to make business-critical decisions.
In this talk, I will outline the challenges of searching enterprise content. Recent research is revealing a unique pattern of search behaviour in which relevance is both very important and yet also irrelevant, and where recall is just as important as precision. This behaviour has implications for the use of standard metrics for search performance (especially in the case of federated search across multiple applications) and for the adoption of AI/ML techniques.
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
The document discusses leveraging Lucene/Solr as a knowledge graph and intent engine. It describes building an intent engine that incorporates type-ahead prediction, spelling correction, entity and entity-type resolution, semantic query parsing, and query augmentation using a knowledge graph. The intent engine aims to understand the user's intent beyond the literal query string and help express their intent through an interactive search experience.
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
The document discusses implementing conceptual search in Solr. It describes how conceptual search aims to improve recall without reducing precision by matching documents based on concepts rather than keywords alone. It explains how Word2Vec can be used to learn related concepts from documents and represent words as vectors, which can then be embedded in Solr through synonym filters and payloads to enable conceptual search queries. This allows retrieving more relevant documents that do not contain the exact search terms but are still conceptually related.
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
Talk Abstract: Most work in semantic search has thus far focused upon either manually building language-specific taxonomies/ontologies or upon automatic techniques such as clustering or dimensionality reduction to discover latent semantic links within the content that is being searched. The former is very labor intensive and is hard to maintain, while the latter is prone to noise and may be hard for a human to understand or to interact with directly. We believe that the links between similar user’s queries represent a largely untapped source for discovering latent semantic relationships between search terms. The proposed system is capable of mining user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free.
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...Lucidworks
The document describes the NLP4L framework, which aims to improve search experiences for Lucene-based search systems. It provides reference implementations of plugins and corpora that use NLP/ML technologies to generate models, dictionaries, and indexes. It also includes an interface for users to examine output dictionaries since NLP/ML results may not be perfect. The framework is pluggable and processes data sources through various NLP tasks to generate tagged corpora, dictionaries, and document vectors to enhance search features like suggestions, translations, entity extraction, and ranking.
Self-learned Relevancy with Apache SolrTrey Grainger
Search engines are known for "relevancy", but the relevancy models that ship out of the box (BM25, classic tf-idf, etc.) are just scratching the surface of what's needed for a truly insightful application.
What if your search engine could automatically tune its own domain-specific relevancy model based on user interactions? What if it could learn the important phrases and topics within your domain, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain? What if you could further use SQL queries to explore these relationships within your own BI tools and return results in ranked order to deliver relevance-driven analytics visualizations?
In this presentation, we'll walk through how you can leverage the myriad of capabilities in the Apache Solr ecosystem (such as the Solr Text Tagger, Semantic Knowledge Graph, Spark-Solr, Solr SQL, learning to rank, probabilistic query parsing, and Lucidworks Fusion) to build self-learning, relevance-first search, recommendations, and data analytics applications.
The document describes a Big Data project analyzing the Stack Overflow dataset. Key points:
- The dataset was obtained from Stack Exchange and consists of XML files totaling around 20GB that were parsed and loaded into HDFS.
- The data was analyzed to identify trending questions, unanswered questions, closed questions, dead questions, top tags, and more. PageRank analysis was also performed to rank posts.
- The results were visualized in a web application using technologies like Hive, HBase, Pig, and Mahout recommendation. Performance issues were encountered during the MapReduce parsing of the large XML files.
Webinar: Simpler Semantic Search with SolrLucidworks
Hear from Lucidworks Senior Solutions Consultant Ted Sullivan about how you can leverage Apache Solr and Lucidworks Fusion to improve semantic awareness of your search applications.
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
This document discusses direct concept search using word embeddings. It describes mapping query and index terms to vector representations in a conceptual space to improve recall by expanding queries with related concepts. Word2vec is used to generate 127-dimensional word embeddings from Wikipedia text. The embeddings are indexed in Lucene to enable nearest neighbor search. Queries are expanded by searching for terms nearest to query terms in the embedding space. While building high-dimensional point indexes is slow in Lucene, this approach demonstrates the potential of using word embeddings for query expansion in information retrieval.
Every search engineer ordinarily struggles with the task of evaluating how well a search engine is performing. Improving the correctness and effectiveness of a search system requires a set of tools which help measuring the direction where the system is going. The talk will describe the Rated Ranking Evaluator from a developer perspective. RRE is an open source search quality evaluation tool, that could be used for producing a set of deliverable reports and that could be integrated within a continuous integration infrastructure.
Rated Ranking Evaluator: an Open Source Approach for Search Quality EvaluationSease
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed by a core library and a set of modules and plugins that give it the flexibility to be integrated in automated evaluation processes and in continuous integrations flows.
This talk will introduce RRE, it will describe its latest developments and demonstrate how it can be integrated in a project to measure and assess the search quality of your search application.
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...OpenSource Connections
Every team working on Information Retrieval software struggles with the task of evaluating how well their system performs in terms of search quality(at a specific point in time and historically).
Evaluating search quality is important both to understand and size the improvement or regression of your search application across the development cycles, and to communicate such progress to relevant stakeholders.
To satisfy these requirements an helpful tool must be:
- flexible and highly configurable for a technical user
- immediate, visual and concise for an optimal business utilization
In the industry, and especially in the open source community, the landscape is quite fragmented: such requirements are often achieved using ad-hoc partial solutions that each time require a considerable amount of development and customization effort.
To provide a standard, unified and approachable technology, we developed the Rated Ranking Evaluator (RRE), an open source tool for evaluating and measuring the search quality of a given search infrastructure. RRE is modular, compatible with multiple search technologies and easy to extend. It is composed by a core library and a set of modules and plugins that give it the flexibility to be integrated in automated evaluation processes and in continuous integrations flows.
This talk will introduce RRE, it will describe its latest developments and demonstrate how it can be integrated in a project to measure and assess the search quality of your search application.
The focus of the presentation will be on a live demo showing an example project with a set of initial relevancy issues that we will solve iteration after iteration: using RRE output feedbacks to gradually drive the improvement process until we reach an optimal balance between quality evaluation measures.
Search Quality Evaluation: a Developer PerspectiveSease
Search quality evaluation is an ever-green topic every search engineer ordinarily struggles with. Improving the correctness and effectiveness of a search system requires a set of tools which help measuring the direction where the system is going.
The slides will focus on how a search quality evaluation tool can be seen under a practical developer perspective, how it could be used for producing a deliverable artifact and how it could be integrated within a continuous integration infrastructure.
This document summarizes a web recommender project. It describes three algorithms - baseline, HTML structure-based, and semantics-based - for recommending relevant web pages based on an input page. An evaluation of the algorithms on five topics found the structure-based algorithm performed best overall, particularly for the topic of "entropy". Error analysis revealed ways to improve the semantic algorithm, such as combining title words with named entities. The project demonstrated the importance of topic on recommendation success and provided insights into developing effective web recommending tools.
This document provides an overview of how to get started with the SimilarWeb API. It discusses signing up for a free trial account, making API requests, and monitoring usage. The API allows users to retrieve web traffic and engagement data, mobile app details, and related websites for a given domain. Requests require a domain, endpoint, format, and user key. Time granularity and date ranges can also be specified for some endpoints. Usage is monitored and quotas tracked to avoid going over limits.
Building multi billion (dollars, users, documents) search engines on open ... - Andrei Lopatenko
How to use open source technologies to build search engines serving billions of users, generating billions in revenue, and indexing billions of documents.
Keynote talk at the 16th International Conference on Open Source Systems.
Best Automated SEO Reporting Tools: 2023
Posted by the Editorial Staff of orangutaan.com, March 28, 2023
Automated SEO reporting tools have become an essential part of search engine optimization. For marketers, they automate the process of generating reports, saving time and effort that can be put into other crucial tasks.
In this article, we will discuss the features of some of the best automated SEO reporting tools, their pros and cons, and answer some frequently asked questions.
Features Of Automated SEO Reporting Tools
Automated SEO reporting tools come with 5 main features as outlined below. All these features help to generate comprehensive reports easily.
1. Scheduled Reports:
The “Scheduled Reports” feature allows you to automatically produce and send reports on a regular basis.
These reports can be customized to contain the particular KPIs and metrics the user wishes to monitor. Typically you register an account, link your data sources, and either build a custom report template or use a standard one.
By removing the need to manually prepare and deliver reports, scheduled reports can save time and effort.
Nevertheless, it is crucial to make sure the reports contain useful KPIs and data to effectively convey outcomes.
2. Keyword Tracking:
The “Keyword Tracking” feature allows you to track the rankings of your keywords on Search Engine Results Pages (SERPs) over time.
This feature gives insight into the performance of a website’s search engine optimization efforts and can help you identify areas for improvement.
Tools such as SEMrush and Ahrefs let you track hundreds or even thousands of keywords at the same time. They not only monitor keyword rankings but also provide other information, such as search volume, keyword difficulty and competitor analysis, to help you optimize your SEO approach.
The keyword tracking option in automated SEO reporting tools is useful for anyone who wants to monitor and optimize their online presence.
3. Competitor Analysis:
The “Competitor Analysis” feature allows users to collect and analyze data about rival websites in order to uncover patterns and trends. To begin an SEO competition study, look at competing websites and their top-performing pages, paying close attention to the content and keywords and how they have been designed to work together.
You can add many more keywords and other data points, such as keyword position, to make this analysis even more helpful. Several SEO rank-tracking software solutions allow you to categorize keywords and track rankings using these indexes. Surfer, Google AdWords Keyword Planner, and Moz’s Open Site Explorer are among the tools available for gathering information on competitors’ websites.
Overall, the competitor analysis capability of automated SEO reporting tools helps you gain insights into your rivals’ SEO techniques.
This document discusses search engine optimization (SEO) techniques. It begins with an introduction to digital marketing and SEO, then covers SEO metrics, components of search engines, keyword research factors, on-page and off-page optimization strategies, link building guidelines, and website monetization. It concludes with references in APA format.
Ranker is a social platform that allows users to rank and list items. It faces challenges of large data volumes and traffic spikes. To address this, Ranker employs strategies like caching frequently accessed data, de-normalizing databases, using proper hardware, pre-calculating aggregated data, processing events asynchronously, and indexing data for fast searching. Through these techniques, Ranker is able to handle its performance issues and scale with increasing traffic.
IRJET - Hybrid Recommendation System for Movies - IRJET Journal
This document describes a hybrid recommendation system for movies that combines collaborative and content-based filtering. It uses the MovieLens rating dataset and supplements it with additional data from IMDB, such as movie details. Algorithms like nearest neighbors collaborative filtering and content-based filtering are used to provide personalized movie recommendations to users. The system architecture and design are outlined, including user profiles, movie searching, and success prediction for upcoming movies. An evaluation of the system demonstrates how additional content features can improve recommendation accuracy over collaborative filtering alone.
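A toy sketch of the collaborative-filtering half of such a hybrid system: user-based nearest neighbours with cosine similarity over a small rating matrix; the data and weighting scheme are illustrative only:

```python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: movies; 0 = unrated).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def predict(user, item):
    """Predict a rating as the similarity-weighted average of the ratings
    other users gave this item (user-based collaborative filtering)."""
    weighted_sum, sim_total = 0.0, 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue
        s = cosine(R[user], R[other])
        weighted_sum += s * R[other, item]
        sim_total += abs(s)
    return weighted_sum / sim_total if sim_total else 0.0

print(predict(user=0, item=2))  # estimate user 0's rating for movie 2
```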
The Online Exams System fulfils institutes' requirements for conducting exams online. They do not have to commission a software developer to build a separate site in order to conduct exams online; they just register on the site and enter the exam details and the lists of the students who can sit the exam.
The document describes a system called UProRevs that aims to personalize web search results based on the user's profile and interests. It does this by taking the results from a normal search engine, calculating the relevance of each result to the user's profile, and displaying the results along with this relevance score. The system generates user profiles based on information provided during registration and updates them over time based on the user's feedback on search results. It calculates relevance by comparing keywords from the user profile and web page, and weighting them based on their ranks in each profile. The goal is to provide more useful search results tailored to each individual user's perspective.
6 key things UXers need to know while working with APIs - Margaret Hanley
This talk is for UX designers working on projects where the site is fed and created by APIs. It covers the six things they need to know as they design their interfaces and workflows around APIs.
Google faces challenges in providing relevant search results due to the massive amount of online data. They use algorithms, manual reviews, and filters to evaluate pages and rank them. Major algorithms include RankBrain, Panda, and Penguin which analyze factors like content quality and user experience. Google measures algorithm effectiveness by user behavior metrics and feedback. Moving forward, Google will focus on content quality, mobile experience, and voice search optimization to improve search results.
Similar to Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation (20)
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
artificial intelligence and data science contents.pptxGauravCar
What is artificial intelligence? Artificial intelligence is the ability of a computer or computer-controlled robot to perform tasks that are commonly associated with the intellectual processes characteristic of humans, such as the ability to reason.
› ...
Artificial intelligence (AI) | Definitio
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
6. FOSDEM 2019 - Context Overview: Search Quality

Search engineering is the production of quality search systems.
Search quality (and software quality in general) is a huge topic which can be described using internal and external factors. In the end, only external factors matter: those that can be perceived by users and customers. But the key to reaching optimal levels of those external factors is the internal ones.
One of the main differences between search quality and software quality (especially from a correctness perspective) is in the OK/KO judgment, which is, in general, more "deterministic" in software development.
External factors: correctness, robustness, extendibility, reusability, efficiency, timeliness, …
Internal factors: modularity, readability, maintainability, testability, understandability, reusability, …
Search quality evaluation is primarily focused on correctness.
7. FOSDEM 2019 - Search Quality: Correctness

Correctness is the ability of a system to perform its exact task, as defined by its specification. The search domain is critical from this perspective, because correctness depends on arbitrary user judgments.
For each internal (grey) and external (red) iteration we need to find a way to measure correctness. Evaluation measures for an information retrieval system are used to assert how well the search results satisfy the user's query intent.
The slide diagram sketches the lifecycle of a new system and an existing one: the requirements drive internal iterations (v0.1 … v0.9) up to a first release (v1.0); a month later, bug reports, change requests and complaints about junk in the search results drive further iterations (v1.1, v1.2, v1.3 … v2.0).
How can we know where our system is going between versions, in terms of correctness?
8. FOSDEM 2019 - Search Quality: Measures

Evaluation measures for an information retrieval system try to formalise how well a search system satisfies its users' information needs.
Measures are generally split into two categories: online and offline measures. In this context we will focus on offline measures: something that can help a search engineer during their ordinary day (i.e. in those phases previously called "internal iterations").
We will also see how the same tool can be used for broader purposes, like contributing to the continuous integration pipeline or even delivering value to functional stakeholders.
Offline measures (our main focus; a small illustrative sketch follows): Precision, Recall, F-Measure, Average Precision, Mean Reciprocal Rank, NDCG, …
Online measures: click-through rate, zero result rate, session abandonment rate, session success rate, …
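To make these offline measures concrete, here is a minimal, illustrative Java sketch (not RRE's actual implementation) of Precision@k and Average Precision for a single ranked result list, given the set of relevant document ids:

import java.util.List;
import java.util.Set;

public class OfflineMeasures {

    // Precision@k: the fraction of the top-k results that are relevant.
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    // Average Precision: the mean of Precision@i over every rank i
    // at which a relevant document appears.
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        double sum = 0;
        int hits = 0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return relevant.isEmpty() ? 0 : sum / relevant.size();
    }
}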
9. FOSDEM 2019 - Agenda

➢ Search Quality Evaluation
✓ Rated Ranking Evaluator (RRE)
  ‣ What is it?
  ‣ How does it work?
  ‣ Evaluation Process Input & Output
  ‣ Challenges
➢ Demo
➢ Future Works
➢ Q&A
10. FOSDEM 2019 - RRE: What is it?

• A set of search quality evaluation tools
• A search quality evaluation framework
• Multi (search) platform
• Written in Java
• It can also be used in non-Java projects
• Licensed under Apache 2.0
• Open to contributions
• Extremely dynamic!

https://github.com/SeaseLtd/rated-ranking-evaluator
11. FOSDEM 2019 - RRE: At a glance

Project growth over its first months:

Months | Contributors | Modules | Lines of Code
2      | 2            | 10      | 48,950
5      | 2            | 10      | 67,317
8      | 4            | 12      | 71,670
12. FOSDEM 2019 - RRE: Ecosystem

The picture illustrates the main modules composing the RRE ecosystem. All modules with a dashed border are planned for a future release.
RRE CLI has a double border because, although the rre-cli module hasn't been developed yet, you can run RRE from the command line using the RRE Maven archetype, which is part of the current release.
As you can see, the current implementation includes two target search platforms: Apache Solr and Elasticsearch. The Search Platform API module provides a search platform abstraction for plugging in additional search systems.
The ecosystem diagram shows: the Core, search platform plugins behind the Search Platform API, a Reporting Plugin, Maven archetypes, the RRE Server, the RRE CLI and a RequestHandler.
13. FOSDEM 2019 - RRE: Available metrics

These are the RRE built-in metrics which can be used out of the box. Most of them are computed at query level and then aggregated at the upper levels.
Compound metrics (e.g. MAP or GMAP), however, are not explicitly declared or defined, because their computation doesn't happen at query level: the aggregation executed at the upper levels automatically produces them. For example, the Average Precision computed for Q1, Q2, Q3, …, Qn becomes the Mean Average Precision at the Query Group or Topic level (a sketch of this aggregation follows).

Available metrics: Precision, Recall, Precision at 1 (P@1), Precision at 2 (P@2), Precision at 3 (P@3), Precision at 10 (P@10), Average Precision (AP), Reciprocal Rank, Normalised Discounted Cumulative Gain (NDCG), F-Measure; compound metrics: Mean Reciprocal Rank, Mean Average Precision (MAP).
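As a sketch of that aggregation rule (illustrative, not RRE's code): once Average Precision has been computed for every query in a group, the MAP of the group is simply the mean over the child queries.

import java.util.List;

class MapAggregation {
    // MAP over a query group: the mean of the queries' Average Precision values.
    static double meanAveragePrecision(List<Double> averagePrecisions) {
        return averagePrecisions.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}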
14. FOSDEM 2019 - RRE: Domain Model (1/2)

The RRE domain model is organised into a composite, tree-like structure where the relationships between entities are always one-to-many. The top-level entity is a placeholder representing an evaluation execution.
Versioned metrics are computed at query level and then reported, using an aggregation function, at the upper levels.
The benefit of having a composite structure is clear: we can read a metric value at different levels (e.g. a query, all queries belonging to a query group, all queries belonging to a topic, or the whole corpus). A minimal sketch follows.
The tree: Evaluation (top-level domain entity) → Corpus (test dataset / collection, 1..*) → Topic (information need, 1..*) → Query Group (query variants, 1..*) → Query (1..*). For each version (v1.0, v1.1, v1.2, …, v1.n) every entity carries its metrics (P@10, NDCG, AP, F-Measure, …).
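A minimal sketch of such a composite (hypothetical class names; the real RRE model is richer): every node aggregates the metric values of its children, so the same metric can be read at query, query group, topic or corpus level.

import java.util.ArrayList;
import java.util.List;

// Hypothetical composite node: Evaluation, Corpus, Topic, Query Group and Query
// would all share this shape; a Query is a leaf holding the computed value.
class EvalNode {
    final String name;
    final List<EvalNode> children = new ArrayList<>();
    double metricValue; // set directly on leaves (queries)

    EvalNode(String name) { this.name = name; }

    // Aggregated metric value: leaves return their own value, inner nodes
    // return the mean of their children (this is how AP becomes MAP).
    double metric() {
        if (children.isEmpty()) return metricValue;
        return children.stream().mapToDouble(EvalNode::metric).average().orElse(0.0);
    }
}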
15. FOSDEM 2019 - RRE: Domain Model (2/2)

Although the domain model structure is able to capture complex scenarios, sometimes we want to model simpler contexts. In order to avoid verbose and redundant ratings definitions it's possible to omit some levels. Specifically, a ratings file can declare:
• only queries
• query groups and queries
• topics, query groups and queries
The tree is the same as in the previous slide; the corpus and the queries are required, while the topic and query group levels are optional.
16. FOSDEM 2019 - RRE: Evaluation process overview (1/2)

The process is organised in three layers. The input layer consists of data (corpora), configuration sets and ratings. The evaluation layer uses a search platform and produces the evaluation data. In the output layer, that evaluation data (JSON) is used for generating the reports: a workbook, the RRE console, and more.
17. FOSDEM 2019 - RRE: Evaluation process overview (2/2)

Within a runtime container, the RRE Core executes a nested loop: for each ratings set and for each dataset, it creates and configures the index, indexes the data and starts the search platform; then, for each topic, for each query group and for each query, it executes the query against each configuration version and computes the metrics; finally it stops the search platform. At the end, the core outputs the evaluation data, which the runtime container then uses.
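In simplified pseudo-Java (hypothetical types and method names; the real core also handles topics, query groups, version filtering and the platform lifecycle in more detail), the orchestration looks roughly like this:

import java.util.List;
import java.util.Map;

class EvaluationLoopSketch {

    interface SearchPlatform {
        void createAndConfigureIndex(String dataset);
        void indexData(String dataset);
        void start();
        void stop();
        List<String> execute(String query, String version); // returns ranked doc ids
    }

    // ratingsSet: dataset name -> queries (topic / query group levels omitted for brevity)
    void evaluate(Map<String, List<String>> ratingsSet, List<String> versions, SearchPlatform platform) {
        ratingsSet.forEach((dataset, queries) -> {
            platform.createAndConfigureIndex(dataset);
            platform.indexData(dataset);
            platform.start();
            for (String query : queries) {
                for (String version : versions) {
                    List<String> results = platform.execute(query, version);
                    // compute the configured metrics on 'results' here
                }
            }
            platform.stop();
        });
        // finally, marshal the evaluation tree (e.g. as JSON) for the runtime container
    }
}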
18. FOSDEM 2019 - RRE: Corpora

An evaluation execution can involve more than one dataset targeting a given search platform. A dataset consists of representative domain data; although a compressed dataset can be provided, it generally has a small/medium size.
Within RRE, corpus, dataset and collection are synonyms.
Datasets must be located under a configurable folder. Each dataset is then referenced in one or more ratings files.
19. FOSDEM 2019 - RRE: Configuration Sets

The search platform configuration evolves over time (e.g. change requests, enhancements, bugs). RRE encourages an incremental approach for managing the configuration instances: even for internal or small iterations, each time we make a relevant change to the current configuration it's better to clone it and move forward with a new version.
In this way we end up having the historical progression of our system, and RRE will be able to make comparisons. The evaluation process also allows you to define inclusion/exclusion rules (e.g. include only versions 1.0 and 2.0).
20. FOSDEM 2019 - RRE: Query templates

For each query (or query group) it's possible to define a template, which is a kind of query shape containing one or more placeholders. Then, in the ratings file, you can reference one of those templates and provide a value for each placeholder.
Templates have been introduced in order to:
• allow a common query management across search platforms
• define complex queries
• define runtime parameters that cannot be statically determined (e.g. filters)
Example template files: only_q.json, filter_by_language.json (an illustrative shape follows).
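For illustration, a hypothetical Elasticsearch-style template (the $-placeholder syntax here is an assumption; see the RRE wiki for the exact convention). A filter_by_language.json could combine the query placeholder with a runtime filter:

{
  "query": {
    "bool": {
      "must": { "query_string": { "query": "$query" } },
      "filter": { "term": { "language": "$lang" } }
    }
  }
}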
21. FOSDEM 2019 - RRE: Ratings

Ratings files associate the RRE domain model entities with relevance judgments: a ratings file provides the association between queries and relevant documents. There must be at least one ratings file (otherwise no evaluation happens), and usually there's a 1:1 relationship between a ratings file and a dataset.
Judgments, the most important part of this file, consist of a list of all the relevant documents for a query group. Each listed document has a corresponding "gain", which is the relevance judgment we want to assign to that document. (The slide shows two alternative ratings layouts; an illustrative shape follows.)
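A minimal, illustrative shape of a ratings file (the field names are assumptions based on the model described above; the RRE wiki holds the authoritative schema):

{
  "index": "tmdb",
  "id_field": "id",
  "topics": [{
    "description": "Basketball movies",
    "query_groups": [{
      "name": "space jam",
      "queries": [{ "template": "only_q.json", "placeholders": { "$query": "space jam" } }],
      "relevant_documents": {
        "2300": { "gain": 3 },
        "9714": { "gain": 2 }
      }
    }]
  }]
}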
22. FOSDEM 2019 - RRE: Evaluation Output

The RRE Core itself is a library, so it outputs its result as a plain Java object that must be used programmatically. However, when wrapped within a runtime container like the Maven plugin, the evaluation object tree is marshalled in JSON format.
Being interoperable, the JSON format can be consumed by other components to produce different kinds of output. An example of such usage is the RRE Apache Maven Reporting Plugin, which can:
• output a spreadsheet
• send the evaluation data to a running RRE Server
23. FOSDEM 2019 - RRE: Workbook

The RRE domain model (topics, groups and queries) is on the left, and each metric (in the right section) has a value for each version/entity pair. If the evaluation process includes multiple datasets, there will be a spreadsheet for each of them.
This output format is useful when:
• you want to have (or keep somewhere) a snapshot of how the system performed at a given moment
• the comparison includes a lot of versions
• you want to include all available metrics
24. FOSDEM 2019 - RRE: RRE Server (1/2)

The RRE console is a Spring Boot/AngularJS application which shows real-time information about evaluation results. Each time a build happens, the RRE reporting plugin sends the evaluation result to a RESTful endpoint provided by the RRE Server; the received data immediately refreshes the web dashboard.
It is especially useful during the development/tuning iterations: you don't have to open the Excel report again and again.
25. FOSDEM 2019 - RRE: Iterative development & tuning

Dev, tune & build on one monitor, check the evaluation results on the other. We are thinking about how to fill a third monitor.
26. FOSDEM 2019 - RRE: Challenges

"I think if we could create a simplified pass/fail report for the business team, that would be ideal. So they could understand the tradeoffs of the new search."
"Many search engines process the user query heavily before it's submitted to the search engine in whatever DSL is required, and if you don't retain some idea of the original query in the system, how can you relate the test results back to user behaviour?"
Do I have to write all judgments manually?
How can I use RRE if I have a custom search platform?
Java is not in my stack.
Can I persist the evaluation data?
28. FOSDEM 2019 - RRE DEMO: Github Repository

• A sample RRE-enabled project
• No Java code, only configuration
• Search platform: Elasticsearch 6.3.2
• Seven example iterations
• Index shapes & queries from Relevant Search [1]
• Dataset: TMDB (extract)

https://github.com/SeaseLtd/rre-fosdem-walkthrough
[1] https://www.manning.com/books/relevant-search
29. FOSDEM 2019 - RRE DEMO: Let's start!

Project skeleton:
• Project skeleton with example files
• Generated using the RRE Maven Archetype
• Can be executed from the command line
• Can be imported in an IDE (e.g. IntelliJ, Eclipse)
• Prerequisites: Java >= 8, Apache Maven 3.x
• Detailed steps in the RRE Wiki

RRE artefacts are not yet available in the Maven Central repository. As a consequence, the Sease repository needs to be configured in your Maven settings; the RRE Wiki contains detailed info about that.
30. FOSDEM 2019 - RRE DEMO: Maven Archetype in Action

• Prerequisites: Java >= 8, Apache Maven 3.x
• -B stands for batch mode; without that parameter the skeleton build procedure is interactive
• groupId and artifactId can be whatever you like
• archetypeArtifactId and esVersion/solrVersion need to be configured according to your search infrastructure

> mvn archetype:generate
    -PSease
    -B
    -DarchetypeGroupId=io.sease
    -DarchetypeArtifactId=rre-maven-elasticsearch-archetype
    -DarchetypeVersion=1.0
    -DgroupId=com.yourcompany.rre
    -DartifactId=search-quality-evaluation
    -Dversion=1.0
    -DesVersion=6.3.2

Apache Solr folks will use the following properties:
…
-DarchetypeArtifactId=rre-maven-solr-archetype
…
-DsolrVersion=7.4.0
31. FOSDEM 2019 - RRE DEMO: Starter project (1/2)

• RRE enabled with the selected search platform
• Two sample iterations (v1.0, v1.1)
• Each folder contains:
  • an example of the expected content
  • a README.md
32. FOSDEM 2019 - RRE DEMO: Starter project (2/2)

In the pom.xml you can customise:
• the evaluation metrics
• the project folder structure
• the search platform version (e.g. 6.3.2)

You could even change the search platform (e.g. from Elasticsearch to Apache Solr). Note, however, that changing the search platform type on an existing project invalidates the example files, so it is better to restart from scratch.
33. FOSDEM 2019 - RRE DEMO: Prepare our data

What we need, at least for the first iteration (version):
• Corpus (tmdb.json)
• Ratings
• Query templates

You can load the demo repository directly in your IDE, or gradually copy the required data from there:
https://github.com/SeaseLtd/rre-fosdem-walkthrough/tree/master/src/etc/corpora
https://github.com/SeaseLtd/rre-fosdem-walkthrough/tree/master/src/etc/configuration_sets/v1.0
https://github.com/SeaseLtd/rre-fosdem-walkthrough/tree/master/src/etc/ratings
34. FOSDEM 2019 - RRE DEMO: Iteration #1 (1/3)

• Default configuration (empty config)
• See the execution plan in the logs
• Two different outputs:
  • Excel
  • RRE Server
The output type can be configured in the pom.xml.
37. FOSDEM 2019 - RRE DEMO: Iteration #2 (1/2)

Iteration #2 (v1.1):
• The index shape contains some specific settings
• Default English analyzer for the title and overview fields
• mmmm… everything is still red
• …however, the results are different
39. FOSDEM 2019 - RRE DEMO: Iteration #3 (1/2)

Iteration #3 (v1.2):
• replaces title^10 with title^0.1
• introduces shingles for names search
• the "Space Jam" search is green!
• other queries are better as well
42. FOSDEM 2019 - Future Works: Solr Rank Eval API

The RRE core can be used for implementing a RequestHandler exposing a ranking evaluation endpoint (e.g. /rank_eval?q=something&evaluate=true). That would provide the same functionality introduced in Elasticsearch 6.2 [1], with some differences:
• rich tree data model
• metrics framework
Note that in this case it doesn't make much sense to provide comparisons between versions. As part of the same module there could also be a SearchComponent for evaluating a single query interaction.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-rank-eval.html
43. FOSDEM 2019 - Future Works: Jenkins Plugin

The RRE Maven plugin already produces the evaluation data in a machine-readable format (JSON) which can be consumed by other components; the Maven RRE Report plugin and the RRE Server are just two examples of such consumers.
RRE can already be executed in a Jenkins CI build cycle (using the Maven plugin). By means of a dedicated Jenkins plugin, the evaluation data could be graphically displayed in the Jenkins dashboard; it could even be used to block builds which produce bad evaluation results.
44. FOSDEM 2019 - Future Works: Building the input

The main input for RRE is the ratings file, in JSON format. Writing a comprehensive JSON file to detail the ratings sets for your search ecosystem can be expensive! Two complementary sources can feed it:

Judgement Collector (explicit feedback):
1. Explicit feedback from users' judgements
2. An intuitive UI allows judges to run queries, see documents and rate them
3. The relevance label is explicitly assigned by domain experts

Interactions Logger (implicit feedback):
1. Implicit feedback from users' interactions (clicks, sales, …)
2. Log to disk / an internal Solr instance for analytics
3. Estimate the <q,d> relevance label based on click-through rate and sales rate

Both the explicit and the implicit feedback flow into the ratings set that RRE evaluates to produce the quality metrics.
45. FOSDEM 2019 - Future Works: Learning To Rank

Once we have collected the ratings, can we use them to actively improve the quality metrics?
"Learning to rank is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems." (Wikipedia)
The interactions logger and the judgement collector UI can, in turn, feed the training of a learning-to-rank model.
46. FOSDEM 2019 - Future Works: Training Set Building

Creating a learning-to-rank training set from the collected interactions is not going to be trivial. It normally requires ad hoc data manipulation depending on the use case, but some steps could be automated and made available as a generic, configurable approach:
▪ Null feature sanitisation
▪ Query id calculation
▪ Query/document feature generation
▪ Single/multi-valued categorical feature encoding

Configuration choices (a sketch of steps 2 and 4 follows):
1. Ad hoc category, artificial values, or keep NaN -> depends on the training library to use
2. Optional query-level features to be hashed as the query id
3. Intersect related query-level and document-level categorical features to generate ordinal query-document features
4. Label encoding? One-hot encoding? Binary encoding? [1] Beware the dummy variable trap.

[1] https://www.datacamp.com/community/tutorials/categorical-data
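As an illustration of steps 2 and 4 (a generic, assumed approach, not an existing RRE feature): hashing the chosen query-level features into a stable query id, and one-hot encoding a categorical feature against a fixed vocabulary.

import java.util.List;
import java.util.Objects;

class TrainingSetUtils {

    // Step 2: hash the selected query-level features into a stable query id.
    static int queryId(Object... queryLevelFeatures) {
        return Objects.hash(queryLevelFeatures);
    }

    // Step 4: one-hot encode a categorical value; unknown values stay all-zero.
    static double[] oneHot(String value, List<String> vocabulary) {
        double[] encoded = new double[vocabulary.size()];
        int idx = vocabulary.indexOf(value);
        if (idx >= 0) encoded[idx] = 1.0;
        return encoded;
    }
}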
47. FOSDEM 2019 - Future Works: Training Set Building

What about the relevance label for each training vector? Can we estimate it from the collected interactions?
▪ Interaction type counts
▪ Click-through rate / sales-through rate calculation
▪ Relevance label normalisation

Configuration choices (a sketch of such an estimation follows):
1. Impressions? Clicks? Bookmarks? Add-to-cart events? Sales?
2. Define the objective: clicks/impressions? sales/impressions?
3. Relevance label: 0…4
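A sketch of such an estimation (an assumed approach that normalises the click-through rate into the 0..4 label range; real pipelines usually also smooth CTR and correct for position bias):

class RelevanceLabelEstimator {

    static final double MAX_EXPECTED_CTR = 0.25; // assumed normalisation cap

    // Estimate a 0..4 relevance label for a <query, document> pair.
    static int relevanceLabel(long clicks, long impressions) {
        if (impressions == 0) return 0;
        double ctr = (double) clicks / impressions;
        return (int) Math.min(4, Math.round(ctr / MAX_EXPECTED_CTR * 4));
    }
}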
48. FOSDEM 2019 - Future Works: Learning To Rank Solr Configs

Can the generation of the features.json configuration be automated? The features.json file is a configuration file necessary for the Solr Learning To Rank extension to work: it describes how the features that were used at training time for the model can be extracted at query time. This file is therefore coupled both with the training-set features and with the query-time features.
In the envisaged pipeline, the users interactions logger feeds the training set builder, whose configuration would also drive the generation of features.json.
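For reference, a minimal features.json in the format documented by the Solr Learning To Rank extension (the feature names and the title field are just examples):

[
  {
    "name": "originalScore",
    "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params": {}
  },
  {
    "name": "titleMatchUserQuery",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!field f=title}${user_query}" }
  }
]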