ELASTICSEARCH INTRODUCTION
Kristijan Duvnjak & Mladen Maravić, Zagreb, 27.03.2015.

Elasticsearch as a search alternative to a relational database

The volume of data we work with grows every day, and its sheer size pushes us to find new, intelligent solutions to the problems in front of us. The Elasticsearch server has proved itself an excellent full-text search solution for large volumes of data.

  1. ELASTICSEARCH INTRODUCTION
     Elasticsearch as a search alternative to a relational database
     Kristijan Duvnjak & Mladen Maravić, Zagreb, 27.03.2015.
  2. PART 1: What is Elasticsearch?
  3. What is Elasticsearch (ES)?
     - Document-oriented, schema-free "database"
     - Built on top of Apache Lucene
     - Real-time search and data analytics
     - Full-text search
     - Distributed (horizontal scalability)
     - High availability
     - REST API
     - "Open Source (Apache 2) distributed RESTful search engine built on top of Lucene"
  4. ES for relational database users (Oracle → Elasticsearch)
     - Database → Index
     - Partition → Shard
     - Table → Type
     - Row → Document
     - Column → Field
     - Schema → Mapping
     - Index → (none: everything is indexed)
     - SQL → Query DSL
  5. Clustering – single node cluster
     - Node = a running instance of ES
     - Cluster = 1+ nodes with the same cluster.name
     - Every cluster has 1 master node
     - Clients can talk to any node in the cluster
     - One cluster can have any number of indexes
  6. About indexes & shards
     - All data is stored inside one or more indexes
     - An index has one or more shards (changing the number requires reindexing)
     - One index is one folder somewhere on disk
     - Back up an index? Just tar/zip the folder...
     - Each shard is one full instance of Lucene
     - Each shard can have zero or more replicas (this can be changed at any time)
     - Diagram: an index made up of shards
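Slide 6 notes that changing the number of shards requires reindexing. The reason is how a document finds its shard: Elasticsearch computes `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The Python sketch below assumes `zlib.crc32` as a stand-in for Elasticsearch's internal hash function (the real one differs); only the modulo step matters here. It counts how many documents would land on a different shard if the primary shard count went from 5 to 6.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a routing value.

    zlib.crc32 is an assumed stand-in for Elasticsearch's own hash;
    the point of the sketch is the modulo by the primary shard count.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Placement with the original shard count...
before = {d: shard_for(d, 5) for d in doc_ids}
# ...and after hypothetically changing it to 6 primary shards.
after = {d: shard_for(d, 6) for d in doc_ids}

moved = sum(1 for d in doc_ids if before[d] != after[d])
print(f"{moved} of {len(doc_ids)} documents would live on a different shard")
```

Most documents change shards, so existing data would be looked up in the wrong place; rebuilding the index is the only consistent fix.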
  7. Clustering – adding a second node
     - Example in the diagram: 3 indexes
     - Each index has one primary (P) and one replica (R) shard
  8. Clustering – adding a third node
     - More primary shards: faster indexing, more scale
     - More replicas: faster searching, more failover
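A minimal sketch of the rule behind slides 7 and 8: a replica is never allocated on the same node as its primary, so on a single node the replicas stay unassigned, and adding nodes lets them be placed. `allocate` is a hypothetical toy allocator (least-loaded node first), not Elasticsearch's real allocation logic.

```python
from collections import defaultdict


def allocate(indexes, primaries, replicas, nodes):
    """Toy allocator: never put two copies of the same shard on one node.

    A simplified stand-in for Elasticsearch's allocation. Returns
    ({node: [(index, shard, "P" or "R"), ...]}, [unassigned copies]).
    """
    layout = defaultdict(list)
    unassigned = []
    for index in indexes:
        for shard in range(primaries):
            used = set()  # nodes that already hold a copy of this shard
            for copy in range(1 + replicas):
                role = "P" if copy == 0 else "R"
                candidates = [n for n in nodes if n not in used]
                if not candidates:
                    unassigned.append((index, shard, role))
                    continue
                # least-loaded node first; ties go to the earlier node
                node = min(candidates, key=lambda n: len(layout[n]))
                layout[node].append((index, shard, role))
                used.add(node)
    return dict(layout), unassigned


for nodes in (["node1"], ["node1", "node2"], ["node1", "node2", "node3"]):
    layout, unassigned = allocate(["a", "b", "c"], 1, 1, nodes)
    print(len(nodes), "node(s):", layout, "unassigned:", unassigned)
```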
  9. About documents...
     - Documents are JSON-based
     - Schema-free, but not necessarily!
     - If there is no schema, ES guesses each field's type and indexes it
     - With a schema (explicit mapping):
       - a mapping applies to a specific document type (the type is just a label)
       - for each field, the mapping defines:
         - its kind (string, number, date...)
         - whether to index it
         - whether to store its data
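When no mapping exists, Elasticsearch 1.x infers a field's type from the first JSON value it sees: integers become `long`, floating-point numbers `double`, booleans `boolean`, date-like strings `date`, and other strings `string`. The sketch below approximates that guess with a simplified date regex (an assumption; the real detection uses configurable date formats). Applied to the comment document from slide 12, it reproduces the field types of the mapping shown there.

```python
import re

# Simplified date detection (assumption): ISO-like dates, optional time part.
DATE_LIKE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?)?$")


def guess_type(value):
    """Approximate the field type dynamic mapping would pick for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if DATE_LIKE.match(value) else "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"no guess for {type(value).__name__}")


def guess_mapping(doc):
    return {field: {"type": guess_type(value)} for field, value in doc.items()}


comment = {
    "user_id": 1,
    "date": "2015-04-01T13:12:12",
    "comment": "What's so cool about Elasticsearch?",
}
print(guess_mapping(comment))
```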
  10. About documents...
      - Each document has an ID (auto-generated or manually assigned)
      - You can force a document into a specific shard: routing!
      - Versioning is available: optimistic concurrency control!
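Versioning works roughly like this: every document carries a version number that increases on each write, and a client can send the version it last read (the `version` parameter in Elasticsearch 1.x); if the stored version has moved on in the meantime, the write is rejected with a conflict (HTTP 409). The in-memory `VersionedStore` class below is a hypothetical illustration of that check, not Elasticsearch code.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """In-memory sketch of optimistic concurrency control."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        # Reject the write if the caller's view of the document is stale.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version

    def get(self, doc_id):
        return self._docs[doc_id]


store = VersionedStore()
store.index("acc-1", {"balance": 100})  # version 1
version, _ = store.get("acc-1")  # two clients both read version 1
store.index("acc-1", {"balance": 80}, expected_version=version)  # ok: version 2
try:
    store.index("acc-1", {"balance": 90}, expected_version=version)  # stale write
except VersionConflict as err:
    print("rejected:", err)
```

No locks are held between read and write; conflicts are detected only at write time, which suits workloads where concurrent edits of the same document are rare.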
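  11. Index details: the inverted index
      Documents: "Elasticsearch Server 1.0" (doc 1), "Mastering Elasticsearch" (doc 2), "Apache Solr 4 Cookbook" (doc 3)

      | Term          | Count | Document(s) |
      |---------------|-------|-------------|
      | 1.0           | 1     | <1>         |
      | 4             | 1     | <3>         |
      | apache        | 1     | <3>         |
      | cookbook      | 1     | <3>         |
      | elasticsearch | 2     | <1>, <2>    |
      | mastering     | 1     | <2>         |
      | server        | 1     | <1>         |
      | solr          | 1     | <3>         |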
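The table above can be reproduced in a few lines of Python: split each title on whitespace, lowercase the tokens, and record which documents contain each term. Real Elasticsearch analyzers do more (see the filters on slide 13); plain whitespace splitting is an assumption that happens to match this example.

```python
from collections import defaultdict

titles = {
    1: "Elasticsearch Server 1.0",
    2: "Mastering Elasticsearch",
    3: "Apache Solr 4 Cookbook",
}


def build_inverted_index(docs):
    """Map each lowercased whitespace token to the sorted IDs of docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(titles)
for term, doc_ids in index.items():
    print(f"{term:<14} {len(doc_ids)}  {doc_ids}")
```

A search for `elasticsearch` is then a single dictionary lookup that returns documents 1 and 2, with no scan over the documents themselves.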
  12. Indexing example
      Index a document (with routing):
      ```
      POST /blog/blog_comment?routing=1
      {
        "user_id" : 1,
        "date" : "2015-04-01T13:12:12",
        "comment" : "What’s so cool about Elasticsearch?"
      }
      ```
      Inspect the mapping:
      ```
      GET /blog/_mapping
      {
        "blog": {
          "mappings": {
            "blog_comment": {
              "properties": {
                "comment": { "type": "string" },
                "date": { "type": "date", "format": "dateOptionalTime" },
                "user_id": { "type": "long" }
              }
            }
          }
        }
      }
      ```
      Search the index:
      ```
      GET /blog/_search
      {
        "took": 6,
        "timed_out": false,
        "_shards": { "total": 2, "successful": 2, "failed": 0 },
        "hits": {
          "total": 1,
          "max_score": 1,
          "hits": [
            {
              "_index": "blog",
              "_type": "blog_comment",
              "_id": "AUzhH9M9HW_GzrF8oLAj",
              "_score": 1,
              "_source": {
                "user_id": 1,
                "date": "2015-04-01T13:12:12",
                "comment": "What’s so cool about Elasticsearch?"
              }
            }
          ]
        }
      }
      ```
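The indexing call on slide 12 can be issued from any HTTP client. This Python sketch only builds the request with the standard library; the node address `http://localhost:9200` is an assumption (the default HTTP port), and the actual send is left commented out because it needs a running node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def index_request(index, doc_type, doc, routing=None):
    """Build (but do not send) a POST that indexes doc with an auto-generated ID."""
    url = f"{ES_URL}/{index}/{doc_type}"
    if routing is not None:
        url += f"?routing={routing}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_request(
    "blog",
    "blog_comment",
    {"user_id": 1, "date": "2015-04-01T13:12:12",
     "comment": "What's so cool about Elasticsearch?"},
    routing=1,
)
print(req.get_method(), req.full_url)
# with urllib.request.urlopen(req) as resp:  # requires a running node
#     print(json.load(resp)["_id"])
```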
  13. Storing data: indexing
      - Data input: REST, Java API, Rivers*
      - Data analysis: a tokenizer and one or more filters
      - Types of filters:
        - lowercase filter: lowercases all tokens
        - synonym filter: changes one token into another on the basis of synonym rules
        - language stemming filters: reduce tokens to their root or base form, the stem
      - Different data storage needs:
        - string fields: analyzed / not_analyzed field configuration
        - the _all field
        - in-memory field data or doc values
      - Segments, segment merging, throttling
      - Routing: indexing with routing
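The analysis chain on this slide (one tokenizer followed by token filters applied in order) can be sketched in plain Python. The synonym rule and the suffix-stripping "stemmer" are hypothetical simplifications; Elasticsearch's language stemmers are far more careful. Because the same analysis runs at index and query time (slide 15), the query text `quick searches` reduces to tokens that match the indexed ones.

```python
import re

SYNONYMS = {"quick": "fast"}  # hypothetical synonym rule


def tokenize(text):
    """Split on anything that is not an ASCII letter or digit (a stand-in tokenizer)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def synonym_filter(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]


def naive_stem_filter(tokens):
    """Crude suffix stripping; real analyzers use e.g. Porter or Snowball stemmers."""
    out = []
    for t in tokens:
        for suffix in ("ing", "es", "s"):
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


def analyze(text):
    tokens = tokenize(text)
    for token_filter in (lowercase_filter, synonym_filter, naive_stem_filter):
        tokens = token_filter(tokens)
    return tokens


print(analyze("Quick Searching of Documents"))  # index-time analysis
print(analyze("quick searches"))                # query-time analysis
```

Filter order matters: lowercasing runs before the synonym lookup so that `Quick` and `quick` hit the same rule.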
  14. We query them!
      So, we can store documents, and then what?!?
      - All the usual stuff (think of WHERE in SQL)
      - Full-text search with support for:
        - highlighting
        - stemming
        - ngrams & edge-ngrams
      - Aggregations: term facets, date histograms, ranges
      - Geo search: bounding box, distance, distance ranges, polygons
      - Percolators (or reverse search!)
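Ngrams and edge-ngrams make partial matching possible: the index stores substrings (or prefixes) of each token, so a search for `ela` can match `elastic` as an exact term. The functions below mimic what the `nGram` and `edgeNGram` token filters produce for a single token; the parameter names follow those filters' `min_gram`/`max_gram` settings, and the output order is this sketch's own choice.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, grouped by length."""
    return [
        token[i : i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    ]


def edge_ngrams(token, min_gram, max_gram):
    """Prefixes of length min_gram..max_gram (capped at the token length)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


print(ngrams("elastic", 3, 3))
print(edge_ngrams("elastic", 1, 4))
```

Edge-ngrams suit search-as-you-type (prefixes only, fewer terms); full ngrams also match fragments from the middle of a word at the cost of a larger index.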
  15. Query details
      - Search types (query_then_fetch, query_and_fetch ...)
      - The same type of analysis as at indexing time
      - Explain plan
      - Sorting and aggregating data with in-memory or on-disk values
      - Search filters: Boolean, And/Or/Not
      - Filter cache, BitSets
      - Routing: searching with routing
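In Elasticsearch 1.x, filters do not compute relevance scores, so their results can be cached as bitsets (one bit per document) and combined with cheap bitwise And/Or/Not operations. The sketch below uses Python integers as bitsets; the filter keys, the document numbers, and the dictionary used as a filter cache are all hypothetical.

```python
N_DOCS = 8
ALL_DOCS = (1 << N_DOCS) - 1  # bits 0..7 set: every document


def bitset(doc_ids):
    """Set bit d for every matching document number d."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits


def doc_ids(bits):
    return [d for d in range(N_DOCS) if bits >> d & 1]


filter_cache = {}  # hypothetical cache: filter description -> bitset


def cached_filter(key, compute):
    """Evaluate a filter once; later calls reuse the cached bitset."""
    if key not in filter_cache:
        filter_cache[key] = bitset(compute())
    return filter_cache[key]


account_x = cached_filter("account:X", lambda: [0, 2, 3, 5])
year_2015 = cached_filter("year:2015", lambda: [2, 3, 4, 7])

both = account_x & year_2015       # And
either = account_x | year_2015     # Or
not_2015 = ALL_DOCS & ~year_2015   # Not, masked to valid document numbers

print(doc_ids(both), doc_ids(either), doc_ids(not_2015))
```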
  16. PBZ use case
      - Turnovers by account: 600M documents, 200M/year
      - Routing by account number
      - Indexing performance: 30k-40k documents per second
      - DB performance in seconds, ES performance in ms (3500 queries/sec):
        - find the last 100 turnovers for a given account number: < 50 ms
        - find the last 100 turnovers for a given account number where the description contains some words: < 100 ms
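A hypothetical search body for the second query above (the last 100 turnovers for an account whose description contains some words), in Elasticsearch 1.x `filtered` query syntax. The index name `turnovers`, the field names, and the account number are assumptions, not PBZ's actual schema. Passing the account number as `routing` makes the search hit only the shard that holds that account, provided the same value was used as routing at indexing time.

```python
import json


def last_turnovers_query(account_number, words=None, size=100):
    """Hypothetical ES 1.x search body: newest turnovers for one account.

    Field names (account_number, booking_date, description) are assumptions.
    """
    query = {"match": {"description": words}} if words else {"match_all": {}}
    return {
        "size": size,
        "sort": [{"booking_date": {"order": "desc"}}],
        "query": {
            "filtered": {  # ES 1.x syntax; later versions use bool + filter
                "query": query,
                "filter": {"term": {"account_number": account_number}},
            }
        },
    }


account = "HR1234567890"  # hypothetical account number
body = last_turnovers_query(account, words="rent payment")
url = f"/turnovers/_search?routing={account}"  # route to the account's shard
print(url)
print(json.dumps(body, indent=2))
```

The account restriction sits in the filter (cacheable, unscored), while only the free-text part is scored as a query.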
  17. PART 2: Cluster architecture
  18. PBZ ES cluster architecture
      Diagram components: NETWORK DISPATCHER; Elasticsearch cluster with CLIENT nodes 1 and 2, MASTER nodes 1, 2 and 3, and DATA nodes 1 and 2. Cluster per datacenter.
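Node roles in a layout like this are set per node in `elasticsearch.yml` (Elasticsearch 1.x settings `node.master` and `node.data`): with dedicated masters, master nodes hold no data, data nodes are not master-eligible, and client nodes are neither. With three master-eligible nodes, `discovery.zen.minimum_master_nodes` is usually set to a quorum, (3 // 2) + 1 = 2, to avoid a split brain. The Python sketch below generates such settings; the cluster name `pbz-dc1` is hypothetical.

```python
# ES 1.x node-role settings for a cluster with dedicated master nodes.
ROLES = {
    "master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},
}


def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (n // 2) + 1."""
    return master_eligible // 2 + 1


def node_config(role, cluster_name, master_eligible=3):
    """Render elasticsearch.yml lines for one node of the given role."""
    settings = {"cluster.name": cluster_name, **ROLES[role]}
    settings["discovery.zen.minimum_master_nodes"] = minimum_master_nodes(master_eligible)
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = str(value).lower()  # YAML booleans: true / false
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


print(node_config("client", "pbz-dc1"))  # hypothetical cluster name
```

With a quorum of 2, a lone master-eligible node cut off from the other two cannot elect itself, so at most one side of a network partition can hold a master.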
  19. PBZ ES cluster architecture
      Diagram components: NETWORK DISPATCHER; Elasticsearch cluster with a CLIENT node, a MASTER node, and DATA nodes 1 and 2. Cluster per datacenter.
  20. Elasticsearch administration
      - Plugins:
        - Marvel: monitoring console (GC, throttling, CPU, memory, heap, search/indexing statistics ...)
        - Sense: REST UI for Elasticsearch
        - custom plugins (JDBC rivers ...)
      - Security:
        - Apache Web server
        - Elasticsearch Shield
      - Speeding up queries using warmers
  21. PART 3: ELK
  22. PART 4: Q & A
  23. Mladen Maravić & Kristijan Duvnjak
