LONDON INFORMATION RETRIEVAL+AI MEETUP
24/06/2025
Nazerke Seidan, Software Engineer in Search @Sease
Search Limitations and Workarounds
in OpenSearch
WHO WE ARE
NAZERKE SEIDAN
‣ Software Engineer in Search @Sease
‣ Ex-Salesforce, Search Solr Cloud
‣ Apache Solr Contributor
‣ Nature Lover
OVERVIEW
Hybrid Search Limitations in OpenSearch v2.17
Workarounds: Pagination + Reciprocal Rank Fusion
Native Support Introduced in OpenSearch v2.19
Semantic Highlighting Workaround
Native Support in OpenSearch v3.0
Did-you-mean Suggester Documentation Correction
Hybrid Search
WHY SHOULD WE USE HYBRID SEARCH AT ALL?
Hybrid search = traditional keyword search + semantic search
Provides more relevant results by matching both exact query terms and broader semantic meaning
HYBRID QUERY LIMITATIONS IN OPENSEARCH V2.17
● RANK-BASED COMBINATION TECHNIQUE
- Reciprocal Rank Fusion (RRF) is not supported
- Only score-based normalization (e.g. min-max) is supported
● PAGINATION
- Hybrid query returns all results at once
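To illustrate what score-based normalization means in contrast to a rank-based technique, here is a minimal Python sketch of min-max normalization. It is a simplified illustration, not OpenSearch's actual implementation; the document IDs and raw scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale raw scores to the [0, 1] range (simplified min-max)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


# Hypothetical raw scores: lexical and k-NN scores live on different scales.
lexical = {"doc1": 12.0, "doc2": 8.0, "doc3": 4.0}
semantic = {"doc2": 0.9, "doc3": 0.8, "doc4": 0.5}

lexical_norm = min_max_normalize(lexical)
semantic_norm = min_max_normalize(semantic)
```

Because the result depends on the actual score values, a single outlier score can shift every other normalized score, which is one motivation for rank-based fusion.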
HYBRID SEARCH CUSTOM IMPLEMENTATION WORKAROUNDS
● multi-match query (lexical)
● k-NN query (semantic)
● Reciprocal Rank Fusion (RRF) combination technique
Reciprocal Rank Fusion (RRF)
- based on the concept of reciprocal rank, which is the inverse of the rank of the document;
- takes into account the position of the document and gives higher importance to documents that are ranked higher;
- reciprocal rank score: 1/(rank + k)
where rank is the position of the document and k is a constant (k = 60)
How RRF Works in Hybrid Search?
1. Obtain ranked search results from both the lexical (match) and semantic (k-NN) search queries
2. Calculate a reciprocal rank score for each search result using 1/(rank + k)
3. Combine scores: sum the reciprocal rank scores of the same search result
4. Sort search results by their combined reciprocal rank scores
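The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation; the function name `rrf_fuse` and the sample rankings are hypothetical, and ranks are assumed to be 1-based.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top document has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank + k)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical_hits = ["doc1", "doc2", "doc3"]   # hypothetical multi-match ranking
semantic_hits = ["doc2", "doc4", "doc1"]  # hypothetical k-NN ranking
fused = rrf_fuse([lexical_hits, semantic_hits])
```

Here `doc2` ends up first because it ranks high in both lists (1/62 + 1/61), ahead of `doc1` (1/61 + 1/63), while documents found by only one query keep a single reciprocal rank score.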
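Since the v2.17 hybrid query could not page through results, one possible client-side workaround is to fuse a fixed set of candidates in the application and slice the fused list. The slides do not show the exact approach, so the sketch below is an assumption; `paginate` and its parameters are hypothetical names.

```python
def paginate(fused_results, from_=0, size=10):
    """Return one page of an already fused and sorted result list."""
    if from_ < 0 or size < 0:
        raise ValueError("from_ and size must be non-negative")
    return fused_results[from_:from_ + size]


# Hypothetical fused ranking (document IDs only), best first.
fused_ids = [f"doc{i}" for i in range(1, 26)]
first_page = paginate(fused_ids, from_=0, size=10)
third_page = paginate(fused_ids, from_=20, size=10)
```

The fused list has to be built from the same candidate depth for every page, otherwise documents can move between pages.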
Hybrid Search Improvements
in OpenSearch v2.19
● RECIPROCAL RANK FUSION (RRF)
- Normalization technique (PR)
● PAGINATION SUPPORT
- Controlled by pagination_depth, from, and size params (PR)
Create a search pipeline using the RRF technique:
PUT /_search/pipeline/rrf-pipeline
{
  "description": "Post processor for hybrid RRF search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "combination": {
          "technique": "rrf",
          "rank_constant": 60
        }
      }
    }
  ]
}
Use the pipeline in the hybrid search query:
GET /web/_search?search_pipeline=rrf-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {/* lexical keyword search: multi-match query */},
        {/* semantic search: knn query */}
      ]
    }
  }
}
Add sub-queries in the hybrid search query:
"hybrid": {
  "queries": [
    {
      "multi_match": {
        "query": "test",
        "fields": ["title", "description"],
        "type": "best_fields"
      }
    },
    {
      "nested": {
        "path": "chunks",
        "query": {
          "knn": {
            "chunks.embedding": {
              "vector": [/* LLM generated embeddings */],
              "k": 100
            }
          }
        }
      }
    }
  ]
}
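When the query is assembled in application code, the two sub-queries can be built programmatically. The following Python sketch mirrors the field names from the slide (`title`, `description`, `chunks.embedding`); the function name and the tiny sample vector are hypothetical.

```python
import json


def build_hybrid_query(text, vector, k=100):
    """Build a hybrid query with one lexical and one nested k-NN sub-query."""
    lexical = {
        "multi_match": {
            "query": text,
            "fields": ["title", "description"],
            "type": "best_fields",
        }
    }
    semantic = {
        "nested": {
            "path": "chunks",
            "query": {"knn": {"chunks.embedding": {"vector": vector, "k": k}}},
        }
    }
    return {"query": {"hybrid": {"queries": [lexical, semantic]}}}


# Hypothetical 3-dimensional embedding; real models use far more dimensions.
body = build_hybrid_query("test", [0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

The resulting body would be sent to `/web/_search?search_pipeline=rrf-pipeline` so that the RRF pipeline fuses both result lists.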
Use the built-in pagination via these params:
pagination_depth: specifies the max number of results to retrieve from each shard for every sub-query before the results are merged
from & size: define the final result window returned to the user after merging, deduplication, and re-ranking
[Figure: diagram from the OpenSearch blog]
Use the built-in pagination:
GET /web/_search?search_pipeline=rrf-pipeline
{
  "from": 5,
  "size": 10,
  "query": {
    "hybrid": {
      "pagination_depth": 20,
      "queries": [
        {/* lexical keyword search: multi-match query */},
        {/* semantic search: knn query */}
      ]
    }
  }
}
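A small helper can make sure every page of a session is requested with the same pagination_depth. This is a sketch under assumptions: the function names and the placeholder sub-queries are hypothetical, and `max_candidates` is simply an upper bound derived from the parameter definition given in the slides.

```python
def build_page_request(sub_queries, page, size=10, pagination_depth=20):
    """Build the search body for a zero-based page number."""
    return {
        "from": page * size,
        "size": size,
        "query": {
            "hybrid": {
                # Keep this value identical for every page of one session.
                "pagination_depth": pagination_depth,
                "queries": sub_queries,
            }
        },
    }


def max_candidates(pagination_depth, shards, sub_query_count):
    """Upper bound on hits collected before merging (depth x shards x sub-queries)."""
    return pagination_depth * shards * sub_query_count


# Hypothetical placeholder sub-queries standing in for the lexical and k-NN ones.
sub_queries = [{"match": {"title": "test"}}, {"match": {"description": "test"}}]
third_page_body = build_page_request(sub_queries, page=2)
```

With 3 shards and 2 sub-queries, a pagination_depth of 20 would collect at most 120 candidates before merging.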
HYBRID SEARCH PAGINATION
pagination_depth limitations:
● Although pagination_depth ensures stable results, it limits pagination to a fixed result set.
● When you reach the last available page for your specified pagination_depth, OpenSearch returns this error:
○ “Reached end of search results. Increase pagination_depth value to see more results.”
● Changing pagination_depth during pagination will lead to inconsistent results.
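A client loop can take these limitations into account by keeping pagination_depth fixed and stopping cleanly when the end-of-results error appears. The sketch below assumes a hypothetical `search` callable that raises `RuntimeError` with the server message; how a real client surfaces this error is not covered by the slides.

```python
END_OF_RESULTS = "Reached end of search results"


def iter_pages(search, size=10, pagination_depth=20, max_pages=100):
    """Yield pages of hits until the engine reports the end of results.

    `search` is a hypothetical callable taking (from_, size, pagination_depth)
    and returning a list of hits, raising RuntimeError on a search error.
    """
    for page in range(max_pages):
        try:
            # pagination_depth is fixed for the whole loop on purpose.
            hits = search(page * size, size, pagination_depth)
        except RuntimeError as err:
            if END_OF_RESULTS in str(err):
                return  # the last available page was already consumed
            raise
        if not hits:
            return
        yield hits


def fake_search(from_, size, pagination_depth):
    """Stand-in for a real client call, backed by a fixed 25-hit list."""
    data = list(range(25))
    if from_ >= len(data):
        raise RuntimeError(
            "Reached end of search results. "
            "Increase pagination_depth value to see more results."
        )
    return data[from_:from_ + size]


pages = list(iter_pages(fake_search))
```

Any other error is re-raised, so only the documented end-of-results condition ends the iteration silently.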
Semantic Highlighting
SEMANTIC HIGHLIGHTING
Relies on semantic meaning of the context
➡ semantic keyword highlighting
➡ semantic sentence highlighting
SEMANTIC SENTENCE HIGHLIGHTING WORKAROUND
1. Retrieve top-k documents using lexical search
2. Pre-filter by their doc IDs and run k-NN search at the chunk level to extract
the most semantically relevant snippet, <doc_id, chunk>
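The second step of this workaround can be sketched as follows. The sketch assumes a separate chunk-level index whose documents carry hypothetical `doc_id`, `chunk`, and `embedding` fields, and a flattened hit format with `doc_id`, `chunk`, and `score`; none of these names come from the slides.

```python
def build_chunk_knn_query(doc_ids, query_vector, k=50):
    """k-NN query over chunks, restricted to the lexically retrieved docs."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_vector,
                    "k": k,
                    # Filter so only chunks of the top-k documents compete.
                    "filter": {"terms": {"doc_id": doc_ids}},
                }
            }
        },
    }


def best_chunk_per_doc(chunk_hits):
    """Keep the highest-scoring chunk for each document."""
    best = {}
    for hit in chunk_hits:
        doc_id, score = hit["doc_id"], hit["score"]
        if doc_id not in best or score > best[doc_id]["score"]:
            best[doc_id] = hit
    return {doc_id: hit["chunk"] for doc_id, hit in best.items()}


# Hypothetical chunk-level hits returned by the filtered k-NN search.
hits = [
    {"doc_id": "1", "chunk": "Rising rates of mental health disorders.", "score": 0.71},
    {"doc_id": "1", "chunk": "These issues are linked to academic stress.", "score": 0.64},
    {"doc_id": "2", "chunk": "Youth today face more health issues.", "score": 0.83},
]
snippets = best_chunk_per_doc(hits)
```

Each entry in `snippets` is one <doc_id, chunk> pair that the application can display as the highlighted sentence for that document.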
SEMANTIC SENTENCE HIGHLIGHTING WORKAROUND
1. Retrieve top-k documents using lexical search
2. Pre-filter by their doc IDs and run k-NN search at the chunk level to extract
the most semantically relevant snippet, <doc_id, chunk>
SEMANTIC SENTENCE HIGHLIGHTING IN OPENSEARCH V3.0
*The diagram originates from the OpenSearch Semantic Highlighting RFC
Index Sample Dataset
POST /neural-search-index/_doc/1
{
"text": "Rising rates of mental health disorders such as anxiety,
depression, and eating disorders are increasingly reported among
adolescents and young adults. These issues are often linked to
academic stress and lack of physical activity."
}
POST /neural-search-index/_doc/2
{
"text": "Youth today face more health issues due to digital lifestyle.
Constant screen time, social comparison on social media make
them depressed, build low self-esteem."
}
POST /neural-search-index/_doc/3
{
"text": "Despite facing physical and mental health challenges,
many young people today remain resilient and motivated. They
actively seek self-improvement and engage in communities that
promote healthy lifestyles. Their openness to discussing mental
health and prioritizing well-being marks a shift toward greater
awareness and proactive care."
}
POST /neural-search-index/_doc/4
{
"text": "Practicing mindfulness, seeking help from counselors, and
staying connected with supportive peers play a key role. Schools
and families can support by promoting open conversations,
mental health education. But young people have to be motivated
in doing this."
}
Perform Semantic Sentence Highlighting
POST /neural-search-index/_search
{
  "query": {
    "neural": {
      "text_embedding": {
        "query_text": "youth health problems",
        "model_id": "1_jy7JYBs6feJmwsZPcP",
        "k": 2
      }
    }
  },
  "highlight": {
    "fields": {
      "text": { "type": "semantic" }
    },
    "options": {
      "model_id": "7_gE7ZYBs6feJmwscPf9"
    }
  }
}
text embedding model (model_id in the neural query):
huggingface/sentence-transformers/all-MiniLM-L6-v2
sentence highlighting model (model_id in the highlight options):
opensearch-semantic-highlighter-v1
Semantic Sentence Highlighting Response
"hits": [
{
"highlight": {
"text": [
"<em>Youth today face more health issues due to
digital lifestyle.</em> Constant screen time, social
comparison on social media make them depressed,
build low self-esteem."] }
},
{
"highlight": {
"text": [
"<em>Rising rates of mental health disorders such as
anxiety, depression, and eating disorders are
increasingly reported among adolescents and young
adults.</em> <em>These issues are often linked to
academic stress and lack of physical activity.</em>"]}
}]
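On the client side, the highlighted sentences can be pulled out of the `<em>` tags in each hit. The following Python sketch assumes the response shape shown above; the helper name `highlighted_sentences` is hypothetical.

```python
import re

EM_PATTERN = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def highlighted_sentences(hit):
    """Return the sentences wrapped in <em> tags for one search hit."""
    sentences = []
    for fragment in hit.get("highlight", {}).get("text", []):
        # Collapse line breaks and repeated spaces left by wrapping.
        sentences += [" ".join(s.split()) for s in EM_PATTERN.findall(fragment)]
    return sentences


hit = {
    "highlight": {
        "text": [
            "<em>Youth today face more health issues due to digital lifestyle.</em> "
            "Constant screen time, social comparison on social media make them "
            "depressed, build low self-esteem."
        ]
    }
}
sentences = highlighted_sentences(hit)
```

A hit with several `<em>` spans, like the second one in the response above, yields one list entry per highlighted sentence.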
Did-you-mean Suggester
PHRASE SUGGESTER DOCUMENTATION CORRECTION
● PHRASE SUGGESTER
- Uses n-gram models to suggest phrases (OpenSearch doc)
➡ text analysis doesn’t break the text into n-grams
➡ spellchecker still works with a plain analysis chain (standard tokenizer, language analyzers)
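A phrase suggester request against a plainly analyzed field might look like the body built below. This is only a sketch: the field name `title`, the suggestion name `did_you_mean`, and the misspelled sample text are hypothetical, and the field is assumed to use a standard analyzer without any n-gram or shingle filter, as the slide describes.

```python
def build_phrase_suggest(text, field="title", name="did_you_mean"):
    """Build a phrase-suggester request on a plainly analyzed text field."""
    return {
        "suggest": {
            name: {
                "text": text,
                "phrase": {"field": field},
            }
        }
    }


suggest_body = build_phrase_suggest("yuoth helth problems")
```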
THANK YOU!

Building Search Using OpenSearch: Limitations and Workarounds


Editor's Notes

  • #17 best_fields: uses the score of the best-matching field
  • #18 best_fields: uses the score of the best-matching field; embedding dimensions: OpenAI, Groq: 1536; Jina (image): 512