Elasticsearch in action
By Thijs Feryn
Explain in 1 slide
•Full-text search engine
•NoSQL database
•Analytics engine
•Written in Java
•Lucene based (~Solr)
•Inverted indices
•Easy to scale (~Elastic)
•RESTful interface (HTTP/JSON)
•Schemaless
•Real-time
•ELK stack
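To make the "inverted indices" bullet concrete, here is a minimal Python sketch of the idea: each term maps to the set of document ids that contain it. It is a toy illustration, not how Lucene stores its index; the documents and the `build_inverted_index` helper are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical documents, only for illustration.
docs = {
    1: "Varnish makes websites fly",
    2: "Elasticsearch makes search easy",
}
index = build_inverted_index(docs)

print(sorted(index["makes"]))   # ids of documents containing "makes"
print(sorted(index["search"]))  # ids of documents containing "search"
```

Looking up a term is a single dictionary access, which is why this structure suits full-text search.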
Still with me?
Hi, I’m Thijs
I’m @ThijsFeryn on Twitter
I’m an Evangelist at [logo]
I’m a board member at [logo]
https://www.elastic.co/downloads/elasticsearch
{
"name" : "node-1",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.2.0",
"build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
"build_timestamp" : "2016-01-27T13:32:39Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
http://localhost:9200
RDBMS      Elasticsearch
Database   Index
Table      Type
Row        Document
POST /blog
{"acknowledged":true}
Confirmation
POST /blog/post/6160
{
"language": "en-US",
"title": "WordPress 4.4 is available! And these are the new features…",
"date": "Tue, 15 Dec 2015 13:28:23 +0000",
"author": "Romy",
"category": [
"News",
"PHP",
"Sector news",
"Webdesign & development",
"CMS",
"content management system",
"wordpress",
"WordPress 4.4"
],
"guid": "6160"
}
{
"_index": "blog",
"_type": "post",
"_id": "6160",
"_version": 1,
"created": true
}
Confirmation
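As a hedged sketch of how the indexing call above could be issued from code, the Python below builds the same HTTP request with the standard library. The base URL assumes a local node on port 9200 as on the earlier slide, `build_index_request` is a hypothetical helper, and the document is abbreviated. The request is only built, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, as on the slide

def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = "{}/{}/{}/{}".format(BASE_URL, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Abbreviated version of the blog post shown above.
post = {"title": "WordPress 4.4 is available!", "author": "Romy", "guid": "6160"}
request = build_index_request("blog", "post", 6160, post)
print(request.get_method(), request.full_url)

# To actually send it, a node must be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```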
GET /blog/post/6160
{
"_index": "blog",
"_type": "post",
"_id": "6160",
"_version": 1,
"found": true,
"_source": {
"language": "en-US",
"title": "WordPress 4.4 is available! And these are the new features…",
"date": "Tue, 15 Dec 2015 13:28:23 +0000",
"author": "Romy",
"category": [
"News",
"PHP",
"Sector news",
"Webdesign & development",
"CMS",
"content management system",
"wordpress",
"WordPress 4.4"
],
"guid": "6160"
}
}
Retrieve document by id
Document & metadata
GET /blog/_mapping
{
"blog": {
"mappings": {
"post": {
"properties": {
"author": {
"type": "string"
},
"category": {
"type": "string"
},
"date": {
"type": "string"
},
"guid": {
"type": "string"
},
"language": {
"type": "string"
},
"title": {
"type": "string"
}
}
}
}
}
}
Schemaless?
Not really …
“Guesses” mapping on insert
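The guessing can be pictured with a rough Python sketch that picks a type name from the JSON value it sees. This is a simplified stand-in, not Elasticsearch's actual dynamic mapping logic (which also tries to detect dates in certain formats); `guess_field_type` and `guess_mapping` are hypothetical helpers.

```python
def guess_field_type(value):
    """Very rough stand-in for dynamic mapping: pick a type name from a JSON value."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):      # arrays take the type of their elements
        return guess_field_type(value[0]) if value else None
    return "string"

def guess_mapping(document):
    """Guess a type for every top-level field of a document."""
    return {field: guess_field_type(value) for field, value in document.items()}

# Fields taken from the blog post above; every value arrives as a JSON string.
post = {
    "language": "en-US",
    "date": "Tue, 15 Dec 2015 13:28:23 +0000",
    "category": ["News", "PHP"],
    "guid": "6160",
}
print(guess_mapping(post))
```

Because `date` and `guid` are sent as strings, they end up as strings in the guessed mapping, which is the motivation for an explicit mapping.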
Explicit mapping
POST /blog
{
"mappings" : {
"post" : {
"properties": {
"title" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format": "E, dd MMM YYYY HH:mm:ss Z"
},
"author": {
"type": "string"
},
"category": {
"type": "string"
},
"guid": {
"type": "integer"
}
}
}
}
}
Explicit mapping at index creation time
POST /blog
{
"mappings": {
"post": {
"properties": {
"author": {
"type": "string",
"index": "not_analyzed"
},
"category": {
"type": "string",
"index": "not_analyzed"
},
"date": {
"type": "date",
"format": "E, dd MMM YYYY HH:mm:ss Z"
},
"guid": {
"type": "integer"
},
"language": {
"type": "string",
"index": "not_analyzed"
},
"title": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"nl": {
"type": "string",
"analyzer": "dutch"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Alternative mapping
What’s with the analyzers?
Analyzed vs non-analyzed
Full-text vs exact value
By default strings are analyzed
… unless you mention it in the mapping
Analyzer
•Character filters: replace characters in the text before it is tokenized
•Tokenizers: break the text down into terms
•Token filters: add, modify or delete tokens
Built-in analyzers
•Standard
•Simple
•Whitespace
•Stop
•Keyword
•Pattern
•Language
•Snowball
•Custom
Standard tokenizer
Lowercase token filter
English stop word token filter
Input: Hey man, how are you doing?
Standard: hey man how are you doing
Whitespace: Hey man, how are you doing?
English: hei man how you do
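The difference between these analyzers can be approximated in a few lines of Python. This is a toy sketch, not the Lucene implementation: the regular expression only roughly imitates the standard tokenizer, the stop list is a tiny made-up sample, and stemming (which turns "doing" into "do" in the English analyzer) is left out.

```python
import re

def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()

def standard_like_analyzer(text):
    """Rough stand-in for the standard analyzer: split into words, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# Tiny illustrative stop list, not the real English one.
STOP_WORDS = {"are", "the", "a", "is"}

def stop_filter(tokens):
    """Token filter: drop tokens that appear in the stop list."""
    return [token for token in tokens if token not in STOP_WORDS]

sentence = "Hey man, how are you doing?"
print(whitespace_analyzer(sentence))
print(standard_like_analyzer(sentence))
print(stop_filter(standard_like_analyzer(sentence)))
```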
POST /blog/post/_search
{
"fields": ["title"],
"query": {
"match": {
"title": "working"
}
}
}
"total": 1,
"max_score": 1.7562683,
"hits": [
{
"_index": "blog",
"_type": "post",
"_id": "2742",
"_score": 1.7562683,
"fields": {
"title": [
"Hosted SharePoint 2010: working efficiently as a team"
]
}
}
]
}
}
POST /blog/post/_search
{
"fields": ["title"],
"query": {
"match": {
"title.en": "working"
}
}
}
"failed": 0
},
"hits": {
"total": 6,
"max_score": 2.4509864,
"hits": [
{
"_index": "blog",
"_type": "post",
"_id": "828",
"_score": 2.4509864,
"fields": {
"title": [
"Still a lot of work in store"
]
}
},
{
"_index": "blog",
"_type": "post",
"_id": "3873",
"_score": 2.144613,
"fields": {
"title": [
"SSL: what is it and how does it work?"
]
}
},
{
"_index": "blog",
Search
GET /blog/post/_search?pretty
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 963,
"max_score": 1,
"hits": [
{
"_index": "blog",
"_type": "post",
"_id": "6067",
"_score": 1,
"_source": {
"language": "en-US",
"title": "My Combell Power Tips: Registrant Templates and new domain name overview",
"date": "Tue, 24 Nov 2015 15:58:48 +0000",
"author": "Romy",
"category": [
"Combell news",
"Domain names",
"News",
"Tools",
"control panel",
"domain name",
"my combell",
"register",
"templates"
],
"guid": "6067"
GET /blog/post/_search?pretty
POST /blog/post/_search?pretty
{
"query": {
"match_all": {}
}
}
Search “lite” vs full query DSL
GET /blog/post/_search?pretty&q=title:Thijs
POST /blog/post/_search?pretty
{
"query": {
"match": {
"title": "Thijs"
}
}
}
Search “lite” vs full query DSL
POST /blog/post/_count
{
"query": {
"match": {
"title": "PROXY protocol support in Varnish"
}
}
}
162 posts
POST /blog/post/_count
{
"query": {
"filtered": {
"filter": {
"term": {
"title.raw": "PROXY protocol support in Varnish"
}
}
}
}
}
1 post
Filter vs Query
Filter
•Does it match? Yes or no
•When relevance doesn’t matter
•Faster & cacheable
•For non-analyzed data
Query
•How well does it match?
•For full-text search
•On analyzed/tokenized data
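The gap between the two counts above (162 posts for the match query, 1 post for the term filter) can be reproduced with a toy Python sketch. The scoring here is a naive "number of shared terms", not Lucene's relevance formula, and the titles are hypothetical.

```python
def analyze(text):
    """Simplified analysis: lowercase and split on whitespace."""
    return text.lower().split()

def match_query(titles, text):
    """Query: score each title by how many search terms it shares (toy score)."""
    wanted = set(analyze(text))
    scores = {}
    for title in titles:
        shared = wanted & set(analyze(title))
        if shared:
            scores[title] = len(shared)
    return scores

def term_filter(titles, value):
    """Filter: yes/no test against the exact, non-analyzed value."""
    return [title for title in titles if title == value]

# Hypothetical titles in the spirit of the blog index.
titles = [
    "PROXY protocol support in Varnish",
    "Varnish in action",
    "SSL support explained",
    "WordPress 4.4 is available",
]
search = "PROXY protocol support in Varnish"
print(len(match_query(titles, search)))  # every title sharing at least one term
print(len(term_filter(titles, search)))  # only the exact title
```

The query matches any title that shares a term and ranks them, while the filter only answers yes or no for the exact value.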
Match Query
Multi Match Query
Bool Query
Boosting Query
Common Terms Query
Constant Score Query
Dis Max Query
Filtered Query
Fuzzy Like This Query
Fuzzy Like This Field Query
Function Score Query
Fuzzy Query
GeoShape Query
Has Child Query
Has Parent Query
Ids Query
Indices Query
Match All Query
More Like This Query
Nested Query
Prefix Query
Query String Query
Simple Query String Query
Range Query
Regexp Query
Span First Query
Span Multi Term Query
Span Near Query
Span Not Query
Span Or Query
Span Term Query
Term Query
Terms Query
Top Children Query
Wildcard Query
Minimum Should Match
Multi Term Query Rewrite
Template Query
And Filter
Bool Filter
Exists Filter
Geo Bounding Box Filter
Geo Distance Filter
Geo Distance Range Filter
Geo Polygon Filter
GeoShape Filter
Geohash Cell Filter
Has Child Filter
Has Parent Filter
Ids Filter
Indices Filter
Limit Filter
Match All Filter
Missing Filter
Nested Filter
Not Filter
Or Filter
Prefix Filter
Query Filter
Range Filter
Regexp Filter
Script Filter
Term Filter
Terms Filter
Type Filter
Filter examples
POST /blog/post/_search?pretty
{
"query": {
"filtered": {
"filter": {
"ids": {
"values": [231,234,258]
}
}
}
}
}
POST /blog/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must" : [
{
"term" : {
"language" : "en-US"
}
},
{
"range" : {
"date" : {
"gte" : "2016-01-01",
"format" : "yyyy-MM-dd"
}
}
}
],
"must_not" : [
{
"term" : {
"category" : "joomla"
}
}
],
"should" : [
{
"term" : {
"category" : "Hosting"
}
},
{
"term" : {
"category" : "evangelist"
}
}
]
}
}
}
}
}
POST /blog/_search?pretty
{
"query": {
"filtered": {
"filter": {
"prefix": {
"title.raw": "Combell"
}
}
}
}
}
POST /cities/city/_search
{
"size": 200,
"sort": [
{
"city": {
"order": "asc"
}
}
],
"query": {
"filtered": {
"filter": {
"geo_distance_range": {
"lt": "5km",
"location": {
"lat": 51.033333,
"lon": 2.866667
}
}
}
}
}
}
Requires a “geo_point” typed field
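What a distance filter checks can be sketched in Python with the haversine formula. Haversine is chosen here for illustration and is an assumption; Elasticsearch offers several distance calculations. The two cities are hypothetical points near the centre used in the request.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_km(center, point, limit_km):
    """True when point lies closer than limit_km to center (like "lt": "5km")."""
    return haversine_km(center[0], center[1], point[0], point[1]) < limit_km

center = (51.033333, 2.866667)    # the point from the request above
nearby = (51.043333, 2.866667)    # hypothetical city, 0.01 degrees north
far_away = (51.533333, 2.866667)  # hypothetical city, 0.5 degrees north
print(within_km(center, nearby, 5))
print(within_km(center, far_away, 5))
```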
POST /cities/city/_search
{
"size": 200,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"geo_bounding_box": {
"location": {
"bottom_left": {
"lat": 51.1,
"lon": 2.6
},
"top_right": {
"lat": 51.2,
"lon": 2.7
}
}
}
}
}
}
}
Requires a “geo_point” typed field
Draw a “box”
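The "box" test itself is simple, as this Python sketch shows. It assumes the box does not cross the 180-degree meridian; `in_bounding_box` and the two sample locations are hypothetical.

```python
def in_bounding_box(point, bottom_left, top_right):
    """True when point lies inside the box spanned by the two corners.

    Simplification: assumes the box does not cross the 180-degree meridian.
    """
    return (bottom_left["lat"] <= point["lat"] <= top_right["lat"]
            and bottom_left["lon"] <= point["lon"] <= top_right["lon"])

bottom_left = {"lat": 51.1, "lon": 2.6}   # corners from the request above
top_right = {"lat": 51.2, "lon": 2.7}

inside = {"lat": 51.15, "lon": 2.65}      # hypothetical locations
outside = {"lat": 51.03, "lon": 2.87}
print(in_bounding_box(inside, bottom_left, top_right))
print(in_bounding_box(outside, bottom_left, top_right))
```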
Relevance
POST /blog/_search
{
"fields": ["title"],
"query": {
"bool": {
"must": [
{
"match": {
"title": "varnish thijs"
}
},
{
"filtered": {
"filter": {
"term": {
"language": "en-US"
}
}
}
}
]
}
}
}
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 1.984594,
"hits": [
{
"_index": "blog",
"_type": "post",
"_id": "4275",
"_score": 1.984594,
"fields": {
"title": [
"Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar"
]
}
},
{
"_index": "blog",
"_type": "post",
"_id": "6238",
"_score": 0.8335616,
"fields": {
"title": [
"PROXY protocol support in Varnish"
]
}
},
{
"_index": "blog",
Hits both terms. More relevant
POST /blog/_search?_source=false
{
"query": {
"filtered": {
"filter": {
"term": {
"category": "PHPBenelux"
}
}
}
}
}
Using a filter instead of a query
We don’t care about the source
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "blog",
"_type": "post",
"_id": "6254",
"_score": 1
},
{
"_index": "blog",
"_type": "post",
"_id": "11749",
"_score": 1
}
]
}
}
No relevance on filters
Score is always 1
POST /blog/_search
{
"fields": ["title", "category"],
"query": {
"bool": {
"must": [
{
"match": {
"title": "thijs feryn"
}
}
],
"should": [
{
"match": {
"category": "Varnish"
}
}
]
}
}
}
Only search for “thijs feryn”
Increase relevance if category contains “Varnish”
POST /blog/_search
{
"fields": ["title", "category"],
"query": {
"bool": {
"must_not": [
{
"filtered": {
"filter": {
"term": {
"author": "Romy"
}
}
}
}
],
"should": [
{
"match": {
"category": "Magento"
}
}
]
}
}
}
Increase relevance
Combining filters & queries
POST /blog/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "Magento",
"boost" : 3
}
}
},
{
"match": {
"title": {
"query": "Wordpress",
"boost" : 2
}
}
}
]
}
}
}
Increase relevance
Query-time boosting
Multi index multi type
/_search
/products/_search
/products/product/_search
/products,clients/_search
/pro*/_search
/pro*,cli*/_search
/products/product,invoice/_search
/products/pro*/_search
/_all/product/_search
/_all/product,invoice/_search
/_all/pro*/_search
Multi “all the things”
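The path patterns above follow one rule: comma-separated index names, then optional comma-separated type names, then `_search`. A small Python sketch of that rule follows; `search_path` is a hypothetical helper, not part of any client library.

```python
def search_path(indices=None, types=None):
    """Build a _search path from optional lists of index and type names."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")          # a type without an index needs _all
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)

print(search_path())
print(search_path(["products", "clients"]))
print(search_path(["pro*"], ["product", "invoice"]))
print(search_path(types=["product"]))
```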
Aggregations
Group by on steroids
SELECT author, COUNT(guid)
FROM blog.post
GROUP BY author
Aggregations in SQL
Metric: COUNT(guid)
Bucket: GROUP BY author
SELECT author, COUNT(guid)
FROM blog.post
GROUP BY author
POST /blog/post/_search?pretty&search_type=count
{
"aggs": {
"popular_bloggers": {
"terms": {
"field": "author"
}
}
}
}
Only aggs, no docs
"aggregations": {
"popular_bloggers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Romy",
"doc_count": 415
},
{
"key": "Combell",
"doc_count": 184
},
{
"key": "Tom",
"doc_count": 184
},
{
"key": "Jimmy Cappaert",
"doc_count": 157
},
{
"key": "Christophe",
"doc_count": 23
}
]
}
}
Aggregation output
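Conceptually, a terms aggregation is a "count per distinct value". The Python sketch below mimics the shape of the output for a handful of in-memory documents; it is an illustration of the idea only, and the documents and the `terms_aggregation` helper are made up.

```python
from collections import Counter

def terms_aggregation(documents, field, size=10):
    """Count documents per distinct value of field, most frequent first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{"key": key, "doc_count": count}
            for key, count in counts.most_common(size)]

# Hypothetical documents, only for illustration.
posts = [
    {"author": "Romy", "guid": 1},
    {"author": "Tom", "guid": 2},
    {"author": "Romy", "guid": 3},
]
print(terms_aggregation(posts, "author"))
```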
POST /blog/_search
{
"query": {
"match": {
"title": "varnish"
}
},
"aggs": {
"popular_bloggers": {
"terms": {
"field": "author",
"size": 10
},
"aggs": {
"used_languages": {
"terms": {
"field": "language",
"size": 10
}
}
}
}
}
}
Nested multi-group by alongside query
"aggregations": {
"popular_bloggers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Romy",
"doc_count": 4,
"used_languages": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "en-US",
"doc_count": 3
},
{
"key": "nl-NL",
"doc_count": 1
}
]
}
},
{
"key": "Combell",
"doc_count": 3,
"used_languages": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "nl-NL",
"doc_count": 3
}
]
}
},
Aggregation output
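Nested buckets are easy to flatten into rows once the response is parsed. The Python sketch below walks the part of the response shown above, trimmed to the fields it uses; `flatten_buckets` is a hypothetical helper.

```python
def flatten_buckets(aggregations, outer, inner):
    """Turn nested terms buckets into (outer_key, inner_key, doc_count) rows."""
    rows = []
    for outer_bucket in aggregations[outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append((outer_bucket["key"],
                         inner_bucket["key"],
                         inner_bucket["doc_count"]))
    return rows

# The part of the response shown above, trimmed to the fields used here.
aggregations = {
    "popular_bloggers": {
        "buckets": [
            {"key": "Romy", "doc_count": 4, "used_languages": {"buckets": [
                {"key": "en-US", "doc_count": 3},
                {"key": "nl-NL", "doc_count": 1},
            ]}},
            {"key": "Combell", "doc_count": 3, "used_languages": {"buckets": [
                {"key": "nl-NL", "doc_count": 3},
            ]}},
        ]
    }
}
rows = flatten_buckets(aggregations, "popular_bloggers", "used_languages")
for row in rows:
    print(row)
```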
Min Aggregation
Max Aggregation
Sum Aggregation
Avg Aggregation
Stats Aggregation
Extended Stats Aggregation
Value Count Aggregation
Percentiles Aggregation
Percentile Ranks Aggregation
Cardinality Aggregation
Geo Bounds Aggregation
Top hits Aggregation
Scripted Metric Aggregation
Global Aggregation
Filter Aggregation
Filters Aggregation
Missing Aggregation
Nested Aggregation
Reverse nested Aggregation
Children Aggregation
Terms Aggregation
Significant Terms Aggregation
Range Aggregation
Date Range Aggregation
IPv4 Range Aggregation
Histogram Aggregation
Date Histogram Aggregation
Geo Distance Aggregation
GeoHash grid Aggregation
Managing Elasticsearch
Plenty of ways
… for which we don’t have enough time
Clustering
Single node
2-node cluster
3-node cluster
Example config settings
node.rack: my-location
node.master: true
node.data: true
http.enabled: true
cluster.name: my-cluster
node.name: my-node
index.number_of_shards: 5
index.number_of_replicas: 1
discovery.zen.minimum_master_nodes: 2
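The value 2 for `discovery.zen.minimum_master_nodes` is what the usual quorum rule, (master-eligible nodes / 2) + 1, gives for a three-node cluster. The slides do not say how the value was chosen, so treat the Python sketch below as an assumption about the intent.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: more than half of them."""
    return master_eligible_nodes // 2 + 1

for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible nodes -> quorum", minimum_master_nodes(nodes))
```

Requiring a quorum is meant to stop two halves of a partitioned cluster from each electing their own master.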
GET /_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
Non-JSON output
GET /_cat/shards?v
index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
5 shards & a single replica by default
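Because `_cat` output is plain whitespace-separated text, it is easy to post-process. The Python sketch below parses a few rows of the output above into dictionaries and counts primaries and replicas. It is a naive parser that assumes no column is empty (unassigned shards, for example, would break it); `parse_cat_table` is a hypothetical helper.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output (with ?v header) into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# A few rows copied from the output above.
output = """
index shard prirep state docs store ip node
my-index 2 r STARTED 6 7.2kb 192.168.10.142 node3
my-index 2 p STARTED 6 9.5kb 192.168.10.142 node2
my-index 0 p STARTED 4 7.1kb 192.168.10.142 node3
my-index 0 r STARTED 4 4.8kb 192.168.10.142 node2
"""
shards = parse_cat_table(output)
primaries = [row for row in shards if row["prirep"] == "p"]
replicas = [row for row in shards if row["prirep"] == "r"]
print(len(primaries), "primaries,", len(replicas), "replicas")
```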
GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init
cluster   status node.total shards pri unassign init
mycluster green  3          12     6   0        0
Cluster health
The ELK stack
Logs → Parse & ship → Store → Visualize
Beats
•Filebeat
•Topbeat
•Packetbeat
•Winlogbeat
Logs → Ship → Parse → Store → Visualize
Integrating Elasticsearch
It’s REST, deal with it!
Or just use an API
PHP Java Perl
Python Ruby .NET
Try it yourself!
http://github.com/thijsferyn/elasticsearch_tutorial
https://blog.feryn.eu
https://talks.feryn.eu
https://youtube.com/thijsferyn
https://soundcloud.com/thijsferyn
https://twitter.com/thijsferyn
http://itunes.feryn.eu