Querying Elasticsearch
Binary Studio Academy PRO 2016
binary-studio.com
Search Types
● Structured (field) search
● Full-text search
Search APIs
● Lite (query string) search
● Full-body search
Lite search
http 'localhost:9200/github/repository/_search?q=language:Javascript%20%2Bforks_count:%3E20000&sort=forks_count:desc&size=3'
Lite search
Expects all parameters to be passed in the query string and properly encoded, e.g.:
http localhost:9200/github/repository/_search?q=name:angular.js
Based on the _search API:
http localhost:9200/_search
http localhost:9200/user,repository
http localhost:9200/{index}/{type}/_search?q=field:value...
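The encoding requirement is easy to get wrong by hand, so here is a minimal Python sketch of building a lite-search URL. The helper name `lite_search_url` is hypothetical, and the base URL simply reuses the `github/repository` index and type from the slides. It relies only on the standard library.

```python
from urllib.parse import quote, urlencode


def lite_search_url(base, q, **params):
    """Build a lite-search URL with every parameter percent-encoded."""
    query = {"q": q, **params}
    # quote_via=quote encodes a space as %20 and a literal "+" as %2B,
    # so the "+" (must) operator is not mistaken for an encoded space.
    return f"{base}/_search?{urlencode(query, quote_via=quote)}"


url = lite_search_url(
    "http://localhost:9200/github/repository",  # assumed local node
    "+language:(php css)",
    sort="watchers_count:desc",
    size=3,
)
print(url)
```

Letting a library do the encoding keeps the readable query (`+language:(php css)`) in the source code and the cryptic percent-encoded form out of it.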
Lite search
Supports pagination:
http 'localhost:9200/github/repository/_search?size=2&from=50'
Supports obligatory conditions (+  -):
http 'localhost:9200/github/repository/_search?q=%2Blanguage:(php%20css)'
Supports sorting:
http 'localhost:9200/github/repository/_search?q=language:Java&sort=watchers_count:desc'
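The `size` and `from` parameters map onto pages in the usual way. As a rough sketch (the helper `page_params` is hypothetical and pages are assumed to be numbered from 1):

```python
def page_params(page, per_page):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # "from" is the number of hits to skip, "size" the number to return.
    return {"from": (page - 1) * per_page, "size": per_page}


params = page_params(page=26, per_page=2)
print(params)
```

With two results per page, page 26 corresponds to the `size=2&from=50` request shown above.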
Lite search
PROS
● Powerful
● Convenient for development and ad-hoc queries
● End users can run queries directly from their web browser
CONS
● Queries must be carefully encoded
● An open API lets users submit potentially slow queries that can even kill your cluster
● Not well suited to complex queries
FULL-BODY SEARCH
● Uses the same _search API
● Passes parameters in the request body, e.g.
curl localhost:9200/github/repository/_search -d '{"size": 2, "from": 10}'
● RFC 7231 defines no semantics for a body in a GET request, so handling depends on the server implementation. Elasticsearch therefore accepts both GET and POST for search.
● Instead of encoded URLs, queries are written in a convenient search query domain-specific language (DSL)
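As an illustration of the points above, the sketch below prepares the same full-body search as a POST request using Python's standard library. The helper name `search_request` and the local base URL are assumptions. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def search_request(base, body, method="POST"):
    """Prepare a full-body search request; the body travels as JSON."""
    return Request(
        f"{base}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = search_request(
    "http://localhost:9200/github/repository",  # assumed local node
    {"size": 2, "from": 10},
)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it. POST is used as the default because some HTTP clients and proxies drop the body of a GET request.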
SEARCH QUERY DSL
SEARCH QUERY CLAUSES
● Leaf clauses - compare a field to a query value (match, term, range)
● Compound clauses - combine other query clauses (bool, dis_max)
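To make the distinction concrete, here is a small Python sketch that builds clauses as plain dictionaries. The helper functions (`match`, `term`, `range_`, `bool_`) are hypothetical conveniences, not part of any client library. They only mirror the JSON shape of the DSL.

```python
import json


# Leaf clauses: each compares one field to a value.
def match(field, text):
    return {"match": {field: text}}


def term(field, value):
    return {"term": {field: value}}


def range_(field, **bounds):
    return {"range": {field: bounds}}


# Compound clause: wraps other clauses (leaf or compound).
def bool_(must=None, should=None, must_not=None):
    groups = (("must", must), ("should", should), ("must_not", must_not))
    return {"bool": {name: clauses for name, clauses in groups if clauses}}


query = bool_(
    must=[match("language", "Javascript")],
    should=[match("description", "library"), range_("forks_count", gt=10000)],
)
print(json.dumps({"query": query}, indent=2))
```

Because compound clauses accept other compound clauses, queries of arbitrary depth can be assembled from these few building blocks.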
SEARCH QUERY FORMAT
SEARCH QUERY DSL EXAMPLE
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "match": {
      "language": "Javascript"
    }
  }
}'
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "bool": {
      "must": {"match": {"language": "Javascript"}},
      "should": {"match": {"description": "library"}}
    }
  }
}'
SEARCH QUERY MATCHERS
FULL TEXT QUERIES
● match
● multi_match
● common_terms
● query_string
● simple_query_string
MATCH & MULTI_MATCH
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "match": {
      "language": "Javascript"
    }
  }
}'
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "multi_match": {
      "query": "javascript",
      "fields": ["language", "description"]
    }
  }
}'
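One way to picture `multi_match` is as one `match` query per field wrapped in a `dis_max` query, which is how its default `best_fields` type is described in the Elasticsearch documentation. The Python sketch below builds that expanded form. The function name is hypothetical, and the result is meant as an approximate equivalent for illustration only.

```python
import json


def multi_match_as_dis_max(text, fields, tie_breaker=0.0):
    """Expand a multi_match into the dis_max it roughly corresponds to."""
    return {
        "dis_max": {
            # One match query per field; the best-scoring field wins.
            "queries": [{"match": {field: text}} for field in fields],
            "tie_breaker": tie_breaker,
        }
    }


expanded = multi_match_as_dis_max("javascript", ["language", "description"])
print(json.dumps({"query": expanded}, indent=2))
```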
QUERY STRING QUERY
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "query_string": {
      "query": "language:(C OR PHP) AND watchers_count:[15000 TO *]"
    }
  }
}'
Supports the compact Lucene query string syntax
SIMPLE QUERY STRING QUERY
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "simple_query_string": {
      "fields": ["description"],
      "query": "(framework^2 realtime) + -(web port client)"
    }
  }
}'
Has a simplified query syntax
COMMON TERMS QUERY
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "common": {
      "description": {
        "query": "for is and web",
        "cutoff_frequency": 0.001
      }
    }
  }
}'
Divides query terms into two groups:
● More important - low frequency (searched first)
● Less important - high frequency (applied only to documents matched by the first group)
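The grouping step can be sketched in a few lines of Python. This is a toy model and not Elasticsearch's implementation. The function name and the document-frequency counts are made up for illustration. A `cutoff_frequency` below 1 is treated as a fraction of all documents, and a value of 1 or more as an absolute document count.

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency):
    """Split terms into (low_frequency, high_frequency) groups."""
    if cutoff_frequency < 1:
        threshold = cutoff_frequency * total_docs  # relative cutoff
    else:
        threshold = cutoff_frequency  # absolute cutoff
    low, high = [], []
    for term in terms:
        # Terms found in more documents than the threshold are "common".
        group = high if doc_freq.get(term, 0) > threshold else low
        group.append(term)
    return low, high


# Hypothetical document frequencies in an index of 10,000 documents.
doc_freq = {"for": 9000, "is": 8000, "and": 9500, "web": 7}
low, high = split_by_frequency(
    "for is and web".split(), doc_freq, total_docs=10_000, cutoff_frequency=0.001
)
print(low, high)
```

In this made-up index only `web` falls below the cutoff, so it would drive the first, more selective search.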
SEARCH QUERY FILTERS
● term
● terms
● range
● exists
● missing
● bool
● prefix
● wildcard
● regexp
● fuzzy
TERM AND RANGE FILTERS
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "term": {
      "language": "C++"
    }
  }
}'
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "range": {
      "watchers_count": {
        "gte": 5000,
        "lte": 15000
      }
    }
  }
}'
EXISTS AND MISSING FILTERS
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "language"
            }
          }
        }
      }
    }
  }
}'
BOOL FILTER
● must
○ All clauses must match, like AND
● must_not
○ No clause may match, like NOT
● should
○ At least one of the clauses must match, like OR
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "term": {
              "language": "JavaScript"
            }
          },
          "should": {
            "range": {
              "forks_count": {
                "gt": 10000
              }
            }
          }
        }
      }
    }
  }
}'
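The bool semantics listed above can be modelled with a toy in-memory evaluator. The Python sketch below is a simplification for illustration, not how Elasticsearch evaluates filters: it handles only `term`, `range`, `exists` and `bool`, compares raw values without any analysis, and follows the slide's reading that at least one `should` clause must match when any are given. The sample documents are invented.

```python
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge,
             "lt": operator.lt, "lte": operator.le}


def as_list(clauses):
    """Accept a single clause or a list of clauses."""
    return clauses if isinstance(clauses, list) else [clauses]


def matches(doc, flt):
    """Return True if the document (a dict) passes the filter."""
    (kind, spec), = flt.items()
    if kind == "term":
        (field, value), = spec.items()
        return doc.get(field) == value
    if kind == "range":
        (field, bounds), = spec.items()
        value = doc.get(field)
        return value is not None and all(
            RANGE_OPS[op](value, bound) for op, bound in bounds.items())
    if kind == "exists":
        return doc.get(spec["field"]) is not None
    if kind == "bool":
        must = as_list(spec.get("must", []))
        must_not = as_list(spec.get("must_not", []))
        should = as_list(spec.get("should", []))
        return (all(matches(doc, c) for c in must)
                and not any(matches(doc, c) for c in must_not)
                and (not should or any(matches(doc, c) for c in should)))
    raise ValueError(f"unsupported filter: {kind}")


flt = {"bool": {"must": {"term": {"language": "JavaScript"}},
                "should": {"range": {"forks_count": {"gt": 10000}}}}}
docs = [{"language": "JavaScript", "forks_count": 20000},
        {"language": "JavaScript", "forks_count": 5},
        {"language": "PHP", "forks_count": 20000}]
print([matches(doc, flt) for doc in docs])
```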
COMBINING FILTERS AND MATCHERS
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "has_issues": true
        }
      },
      "filter": {
        "term": {
          "language": "Objective-C"
        }
      }
    }
  }
}'
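For comparison, the same query/filter combination can be written in two shapes. The Python sketch below builds both: the `filtered` form used on these slides, and a `bool` query with a `filter` clause, which is the form that replaced `filtered` from Elasticsearch 2.0 onward. Both helper names are hypothetical.

```python
import json


def filtered(query, flt):
    """The 'filtered' form used in the examples above."""
    return {"filtered": {"query": query, "filter": flt}}


def bool_with_filter(query, flt):
    """The bool/filter form: 'must' is scored, 'filter' is not."""
    return {"bool": {"must": query, "filter": flt}}


scored = {"match": {"has_issues": True}}
unscored = {"term": {"language": "Objective-C"}}

old_style = filtered(scored, unscored)
new_style = bool_with_filter(scored, unscored)
print(json.dumps({"query": new_style}, indent=2))
```

In both shapes the filter narrows the result set without contributing to `_score`. Only the query part affects relevance.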
SORTING
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "has_issues": true
        }
      },
      "filter": {
        "term": {
          "language": "Objective-C"
        }
      }
    }
  },
  "sort": {
    "forks_count": {
      "order": "desc"
    }
  }
}'
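Sorting on more than one field uses a list of sort keys, applied in order. A minimal Python sketch follows. The helper `sort_clause` is hypothetical, and the secondary key `watchers_count` is only an example of a tie-breaker.

```python
import json


def sort_clause(*keys):
    """Build a multi-level sort from (field, order) pairs."""
    clause = []
    for field, order in keys:
        if order not in ("asc", "desc"):
            raise ValueError(f"invalid sort order: {order}")
        clause.append({field: {"order": order}})
    return clause


body = {
    "query": {"match_all": {}},
    # Sort by forks first; ties are broken by watchers.
    "sort": sort_clause(("forks_count", "desc"), ("watchers_count", "desc")),
}
print(json.dumps(body, indent=2))
```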
RELEVANCE
● How well a retrieved document or set of documents meets the information need (criteria) of the user
● A positive floating-point number stored in the _score field
● Calculated by the term frequency/inverse document frequency (TF/IDF) algorithm:
○ Term frequency (tf): the more often a term appears in the field, the more relevant
○ Inverse document frequency (idf): the more often a term appears in the index, the less relevant
○ Field-length norm (fieldNorm): the shorter the field, the more relevant
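The three factors can be written out with the formulas of Lucene's classic TF/IDF similarity. This Python sketch covers a single term in a single field and is illustrative only. It ignores query normalization, coordination and boosts, and the real field-length norm is stored with reduced precision, so actual `_score` values will differ.

```python
import math


def tf(freq):
    """Term frequency: square root of occurrences in the field."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length norm: shorter fields weigh more."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq, doc_freq, num_docs, num_terms):
    """Weight of one term in one field of one document."""
    return tf(freq) * idf(doc_freq, num_docs) * field_norm(num_terms)


# Hypothetical numbers: the term occurs 4 times in a 16-term field
# and appears in 99 of 1,000 documents.
weight = term_weight(freq=4, doc_freq=99, num_docs=1000, num_terms=16)
print(round(weight, 4))
```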
RELEVANCE EXPLANATION
curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "term": {
      "language": "C++"
    }
  },
  "size": 1,
  "explain": true
}'
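With `"explain": true`, each hit carries an `_explanation` object: a tree of nodes with `value`, `description` and nested `details`. The Python sketch below flattens such a tree into indented lines for reading. The sample tree is invented for illustration and is not real Elasticsearch output.

```python
def format_explanation(node, depth=0):
    """Render an explanation tree as a list of indented lines."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Made-up explanation tree with the same shape as an _explanation object.
sample = {
    "value": 1.5,
    "description": "weight(language:c++), product of:",
    "details": [
        {"value": 3.0, "description": "idf", "details": []},
        {"value": 0.5, "description": "fieldNorm", "details": []},
    ],
}
print("\n".join(format_explanation(sample)))
```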
TO BE CONTINUED...

Editor's Notes

  • #16
    ● match query: The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
    ● multi_match query: The multi-field version of the match query.
    ● common_terms query: A more specialized query which gives more preference to uncommon words.
    ● query_string query: Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
    ● simple_query_string: A simpler, more robust version of the query_string syntax suitable for exposing directly to users.