Building Search Using OpenSearch: Limitations and Workarounds

LONDON INFORMATION RETRIEVAL+AI MEETUP
24/06/2025
Nazerke Seidan, Software Engineer in Search @Sease
Search Limitations and Workarounds in OpenSearch

WHO WE ARE
NAZERKE SEIDAN
‣ Software Engineer in Search @Sease
‣ Ex-Salesforce, Search (SolrCloud)
‣ Apache Solr Contributor
‣ Nature Lover

OVERVIEW
Hybrid Search Limitations in OpenSearch v2.17
Workarounds: Pagination + Reciprocal Rank Fusion
Native Support Introduced in OpenSearch v2.19
Semantic Highlighting Workaround
Native Support in OpenSearch v3.0
Did-you-mean Suggester Documentation Correction

WHY SHOULD WE USE HYBRID SEARCH AT ALL?
Hybrid search = traditional keyword search + semantic search
It provides more relevant results by matching both the exact query terms and their broader semantic meaning

HYBRID QUERY LIMITATIONS IN OPENSEARCH V2.17
● PAGINATION
- Hybrid query returns all results at once
● RANK-BASED COMBINATION TECHNIQUE
- Reciprocal Rank Fusion (RRF) is not supported
- Only score-based normalization (e.g. min-max) is supported

HYBRID SEARCH CUSTOM IMPLEMENTATION WORKAROUNDS
● multi-match query (lexical)
● k-NN query (semantic)
● Reciprocal Rank Fusion (RRF) combination technique

Reciprocal Rank Fusion (RRF)
- based on the concept of Reciprocal Rank, which is the inverse of the rank of the document;
- takes into account the position of the document and gives higher importance to documents that are ranked higher;
- reciprocal rank score:
1 / (rank + k)
where rank is the position of the document in a result list and k is a constant (typically k = 60)
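Written out for a document d that appears in several ranked result lists R (here, the lexical list and the semantic list), the fused score is the sum of the document's reciprocal rank scores. The worked line below uses k = 60 for a hypothetical document ranked 1st by the lexical query and 3rd by the k-NN query:

```latex
\begin{aligned}
\mathrm{RRF}(d) &= \sum_{r \in R} \frac{1}{\mathrm{rank}_r(d) + k}, \qquad k = 60 \\
\mathrm{RRF}(d) &= \frac{1}{1 + 60} + \frac{1}{3 + 60} \approx 0.0164 + 0.0159 \approx 0.0323
\end{aligned}
```

Because k is large relative to the top ranks, moving from rank 1 to rank 3 changes a single term only slightly, so appearing in both lists matters more than the exact position in either.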

How RRF Works in Hybrid Search
1. Obtain ranked search results from both the lexical (match) and semantic (k-NN) search queries
2. Calculate the reciprocal rank score of each search result using 1 / (rank + k)
3. Combine scores: sum the reciprocal rank scores of the same search result across both result lists
4. Sort the search results by their combined reciprocal rank scores, in descending order
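The four steps above can be sketched client-side as follows. This is a minimal sketch, assuming each sub-query's hits have already been reduced to a best-first list of document IDs; the function name `reciprocal_rank_fusion` and the sample IDs are hypothetical stand-ins for what the multi-match and k-NN responses would return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse best-first lists of document IDs with Reciprocal Rank Fusion.

    Returns (doc_id, fused_score) pairs, highest fused score first.
    """
    scores = defaultdict(float)
    for results in result_lists:  # step 1: one ranked list per sub-query
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (rank + k)  # steps 2 and 3: score, then sum
    # step 4: sort by fused score (descending); ties broken by doc ID for stable output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["d1", "d2", "d3"]   # hypothetical multi-match hits, best first
semantic = ["d3", "d1", "d4"]  # hypothetical k-NN hits, best first
fused = reciprocal_rank_fusion([lexical, semantic])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']
```

Here d1 ends up first because it ranks well in both lists, even though d3 tops the semantic list.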

Hybrid Search Improvements in OpenSearch v2.19

● PAGINATION SUPPORT
- Controlled by pagination_depth, from, and size params (PR)
● RECIPROCAL RANK FUSION (RRF)
- Normalization technique (PR)

Create a search pipeline using the RRF technique:

PUT /_search/pipeline/rrf-pipeline
{
  "description": "Post processor for hybrid RRF search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "combination": {
          "technique": "rrf",
          "rank_constant": 60
        }
      }
    }
  ]
}
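For teams scripting their setup, the same PUT request can be prepared from Python with only the standard library. A sketch under stated assumptions: the cluster address `http://localhost:9200` and the helper name `build_pipeline_request` are hypothetical, and authentication/TLS are not handled.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust for your deployment (no auth/TLS handling here).
OPENSEARCH_URL = "http://localhost:9200"

# Same pipeline definition as the PUT request above.
RRF_PIPELINE = {
    "description": "Post processor for hybrid RRF search",
    "phase_results_processors": [
        {
            "score-ranker-processor": {
                "combination": {"technique": "rrf", "rank_constant": 60}
            }
        }
    ],
}


def build_pipeline_request(name, body):
    """Build (but do not send) a PUT /_search/pipeline/<name> request."""
    return urllib.request.Request(
        url=f"{OPENSEARCH_URL}/_search/pipeline/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_pipeline_request("rrf-pipeline", RRF_PIPELINE)
# To send it against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

Building the request separately from sending it keeps the pipeline definition easy to inspect or version alongside the rest of the search configuration.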

Use the pipeline in a hybrid search query:

GET /web/_search?search_pipeline=rrf-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {/* lexical keyword search: multi-match query */},
        {/* semantic search: knn query */}
      ]
    }
  }
}

Add sub-queries to the hybrid search query, starting with the lexical multi-match query:

"hybrid": {
  "queries": [
    {
      "multi_match": {
        "query": "test",
        "fields": ["title", "description"],
        "type": "best_fields"
      }
    }
  ]
}

Then add the semantic k-NN query over the nested chunk embeddings as a second sub-query:

"hybrid": {
  "queries": [
    {
      "multi_match": {
        "query": "test",
        "fields": ["title", "description"],
        "type": "best_fields"
      }
    },
    {
      "nested": {
        "path": "chunks",
        "query": {
          "knn": {
            "chunks.embedding": {
              "vector": [/* LLM generated embeddings */],
              "k": 100
            }
          }
        }
      }
    }
  ]
}
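When the query text and its embedding come from application code, the hybrid body with both sub-queries can be assembled programmatically. A minimal sketch: the helper name `build_hybrid_query` and the toy 3-dimensional vector are hypothetical, while the field names (`title`, `description`, `chunks.embedding`) are taken from the examples above.

```python
import json


def build_hybrid_query(text, query_vector, k=100):
    """Hypothetical helper: hybrid query with a lexical and a semantic sub-query."""
    lexical = {
        "multi_match": {
            "query": text,
            "fields": ["title", "description"],
            "type": "best_fields",
        }
    }
    semantic = {
        "nested": {
            "path": "chunks",  # each document stores its chunks as nested objects
            "query": {
                "knn": {"chunks.embedding": {"vector": query_vector, "k": k}}
            },
        }
    }
    return {"query": {"hybrid": {"queries": [lexical, semantic]}}}


# Toy vector; a real one must match the embedding model's dimension.
body = build_hybrid_query("test", [0.12, -0.03, 0.48])
print(json.dumps(body, indent=2))
```

The resulting body is what would be sent to `GET /web/_search?search_pipeline=rrf-pipeline`, so the RRF combination happens in the pipeline rather than in client code.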

Use the built-in pagination via these params:
● pagination_depth: the maximum number of results to retrieve from each shard for every sub-query before the results are merged
● from & size: define the final result window returned to the user after merging, deduplication, and re-ranking
(Source: OpenSearch blog)

Use the built-in pagination:

GET /web/_search?search_pipeline=rrf-pipeline
{
  "from": 5,
  "size": 10,
  "query": {
    "hybrid": {
      "pagination_depth": 20,
      "queries": [
        {/* lexical keyword search: multi-match query */},
        {/* semantic search: knn query */}
      ]
    }
  }
}

HYBRID SEARCH PAGINATION
pagination_depth limitations:
● Although pagination_depth ensures stable results, it limits pagination to a fixed result set.
● When you reach the last available page for the specified pagination_depth, OpenSearch returns this error:
○ “Reached end of search results. Increase pagination_depth value to see more results.”
● Changing pagination_depth during pagination leads to inconsistent results.
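To make the "fixed result set" limitation concrete, here is a deliberately simplified model: a single shard, two sub-queries, RRF merging, and a from/size window over the merged list. It is an illustration only, not OpenSearch's implementation; the function `paginate_hybrid`, the sample IDs, and the exact condition that triggers the error are assumptions.

```python
def paginate_hybrid(lexical, semantic, from_, size, pagination_depth, k=60):
    """Simplified single-shard model of hybrid pagination with RRF (illustrative only)."""
    # Each sub-query contributes at most pagination_depth hits before merging.
    candidates = [lexical[:pagination_depth], semantic[:pagination_depth]]
    scores = {}
    for results in candidates:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    if from_ >= len(fused):
        # Message quoted from the slide; the trigger condition here is an assumption.
        raise ValueError(
            "Reached end of search results. "
            "Increase pagination_depth value to see more results."
        )
    return fused[from_:from_ + size]


lexical = [f"L{i}" for i in range(50)]   # hypothetical lexical hits, best first
semantic = [f"S{i}" for i in range(50)]  # hypothetical semantic hits, best first

page = paginate_hybrid(lexical, semantic, from_=30, size=10, pagination_depth=20)
print(page)  # last full page: only 40 merged candidates exist
try:
    paginate_hybrid(lexical, semantic, from_=40, size=10, pagination_depth=20)
except ValueError as err:
    print(err)
```

Even though each sub-query matched 50 documents, only 2 × 20 candidates survive the depth cap, so the fifth page is the end of the results until pagination_depth is raised.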

SEMANTIC HIGHLIGHTING
Relies on the semantic meaning of the context
➡ semantic keyword highlighting ➡ semantic sentence highlighting

SEMANTIC SENTENCE HIGHLIGHTING WORKAROUND
1. Retrieve the top-k documents using lexical search
2. Pre-filter by their doc IDs and run a k-NN search at the chunk level to extract the most semantically relevant snippet for each document, as a <doc_id, chunk> pair
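The two steps can be simulated in memory to show what the workaround computes. This sketch assumes the lexical step has already returned a list of doc IDs and that chunk embeddings are available locally; a brute-force cosine search over the filtered chunks stands in for the filtered k-NN query, and the function names, texts, and 2-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_snippets(top_doc_ids, chunks, query_vector):
    """Return the most similar chunk of each lexically retrieved doc as (doc_id, text)."""
    allowed = set(top_doc_ids)  # step 2: pre-filter by the doc IDs from step 1
    best = {}
    for doc_id, text, vector in chunks:
        if doc_id not in allowed:
            continue
        score = cosine(query_vector, vector)
        if doc_id not in best or score > best[doc_id][0]:
            best[doc_id] = (score, text)
    # Keep the lexical ranking order of the documents.
    return [(d, best[d][1]) for d in top_doc_ids if d in best]


chunks = [  # hypothetical (doc_id, chunk_text, embedding) triples
    ("doc1", "Exams cause academic stress.", [0.9, 0.1]),
    ("doc1", "The library opens at 9am.", [0.1, 0.9]),
    ("doc2", "Screen time affects mood.", [0.8, 0.3]),
    ("doc3", "Unrelated chunk.", [1.0, 0.0]),
]
top_doc_ids = ["doc2", "doc1"]  # step 1 output: hypothetical lexical top-k
print(best_snippets(top_doc_ids, chunks, [1.0, 0.0]))
```

Note that doc3 is never considered even though its chunk is the closest to the query: the pre-filter restricts the semantic step to the lexically retrieved documents.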

SEMANTIC SENTENCE HIGHLIGHTING IN OPENSEARCH V3.0
*The diagram originates from the OpenSearch Semantic Highlighting RFC

Index Sample Dataset
POST /neural-search-index/_doc/1
{
  "text": "Rising rates of mental health disorders such as anxiety, depression, and eating disorders are increasingly reported among adolescents and young adults. These issues are often linked to academic stress and lack of physical activity."
}
The other three documents are indexed the same way, one POST request per document:
{
  "text": "Youth today face more health issues due to digital lifestyle. Constant screen time, social comparison on social media make them depressed, build low self-esteem."
}
{
  "text": "Despite facing physical and mental health challenges, many young people today remain resilient and motivated. They actively seek self-improvement and engage in communities that promote healthy lifestyles. Their openness to discussing mental health and prioritizing well-being marks a shift toward greater awareness and proactive care."
}
{
  "text": "Practicing mindfulness, seeking help from counselors, and staying connected with supportive peers play a key role. Schools and families can support by promoting open conversations, mental health education. But young people have to be motivated in doing this."
}

Perform Semantic Sentence Highlighting
POST /neural-search-index/_search
{
  "query": {
    "neural": {
      "text_embedding": {
        "query_text": "youth health problems",
        "model_id": "1_jy7JYBs6feJmwsZPcP",
        "k": 2
      }
    }
  },
  "highlight": {
    "fields": {
      "text": { "type": "semantic" }
    },
    "options": {
      "model_id": "7_gE7ZYBs6feJmwscPf9"
    }
  }
}
● text embedding model: huggingface/sentence-transformers/all-MiniLM-L6-v2
● sentence highlighting model: opensearch-semantic-highlighter-v1

Semantic Sentence Highlighting Response
"hits": [
  {
    "highlight": {
      "text": [
        "Youth today face more health issues due to digital lifestyle. Constant screen time, social comparison on social media make them depressed, build low self-esteem."
      ]
    }
  },
  {
    "highlight": {
      "text": [
        "Rising rates of mental health disorders such as anxiety, depression, and eating disorders are increasingly reported among adolescents and young adults. These issues are often linked to academic stress and lack of physical activity."
      ]
    }
  }
]

PHRASE SUGGESTER DOCUMENTATION CORRECTION
● PHRASE SUGGESTER
- The OpenSearch doc states that it "uses n-gram models to suggest phrases"
➡ however, the field's text analysis doesn't have to break the text into n-grams
➡ the spellchecker still works with a plain analysis chain (standard tokenizer, language analyzers)
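To illustrate the correction, here is a sketch of a "did you mean" request body for a field analyzed with a plain chain (no shingle filter). The field name `title`, the suggestion name `did-you-mean`, and the helper `build_phrase_suggest` are hypothetical; `gram_size: 1` follows the phrase-suggester reference, which advises omitting `gram_size` or setting it to 1 when the field contains no shingles. The body would be sent to `POST /<index>/_search`.

```python
import json


def build_phrase_suggest(text, field="title"):
    """Hypothetical helper: phrase-suggester body for a field without shingles."""
    return {
        "suggest": {
            "text": text,
            "did-you-mean": {
                "phrase": {
                    "field": field,
                    "gram_size": 1,  # plain analysis chain: unigrams only, no shingles
                    "size": 1,       # return only the best correction
                    "direct_generator": [
                        {"field": field, "suggest_mode": "always"}
                    ],
                }
            },
        }
    }


body = build_phrase_suggest("opensaerch hybird search")
print(json.dumps(body, indent=2))
```

If the phrase suggester strictly required n-gram indexing, this configuration would not work; the point of the slide is that it does, with candidate terms coming from the direct generator.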
